Genomic DNA sequences are outstanding examples of complex multiscale systems in which randomness and structure coexist. One of the most important problems in genomics is gene finding. Indices that accurately discriminate coding from noncoding regions are crucial elements of a successful gene identification algorithm. The authors describe two novel and effective codon indices, derived from a new representation of DNA sequences that facilitates multiscale analysis of genome sequences. Their methods are unsupervised and nonparametric, and are more accurate than conventional methods.
This article is part of a special issue on data mining in bioinformatics.