Cluster Utility: A New Metric for Clustering Biological Sequences
Stanford, California August 08-August 11
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CSBW.2005.382005 IEEE Computational Systems Bioin ...
Jason Lee, School of Informatics, Indiana University
Sun Kim, Center for Genomics and Bioinformatics, Indiana University

The sequence clustering problem differs from traditional clustering problems in that the features of sequences are not observable and sequences cannot be placed in a metric space, which most clustering algorithms assume. The most widely used approach is to build a sequence graph from all-pairwise sequence comparison data and to use that graph to generate clusters of sequences. As with other clustering problems, a metric is needed to evaluate the results of a sequence clustering algorithm, but the metrics for traditional clustering problems are not readily applicable because they assume a metric space. We propose Cluster Utility (CU), a metric based on similarity within a cluster and difference between clusters, without a metric space assumption. CU showed a very high correlation with the quality index. CU scales very well with data size, and its strong correlation with the quality index remained nearly unchanged as the data size varied. CU can be used in two ways: to guide sequence clustering algorithms and to evaluate clustering results.
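
The graph-based approach described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the CU formula, so this is not CU: the function names, the toy similarity scores, the threshold, and the mean intra-cluster and inter-cluster similarity summary are all assumptions made for illustration. The sketch only shows how clusters and a within/between comparison can be derived from pairwise scores alone, with no coordinates or metric space.

```python
from collections import defaultdict


def cluster_by_threshold(scores, threshold):
    """Connected components of the graph whose edges are the pairs
    with similarity >= threshold (hypothetical clustering rule)."""
    nodes = set()
    adj = defaultdict(set)
    for (a, b), s in scores.items():
        nodes.update((a, b))  # keep sequences even if they have no strong edge
        if s >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    seen, clusters = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            component.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        clusters.append(component)
    return clusters


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def intra_inter_means(scores, clusters):
    """Mean pairwise similarity within clusters and between clusters.
    An illustrative stand-in, not the paper's Cluster Utility."""
    label = {n: i for i, c in enumerate(clusters) for n in c}
    intra, inter = [], []
    for (a, b), s in scores.items():
        (intra if label[a] == label[b] else inter).append(s)
    return _mean(intra), _mean(inter)


# Toy all-pairwise comparison data (made-up similarity scores in [0, 1]).
scores = {
    ("s1", "s2"): 0.9, ("s1", "s3"): 0.8, ("s2", "s3"): 0.85,
    ("s4", "s5"): 0.95,
    ("s1", "s4"): 0.1, ("s2", "s5"): 0.2, ("s3", "s4"): 0.15,
}
clusters = cluster_by_threshold(scores, threshold=0.5)
intra, inter = intra_inter_means(scores, clusters)
print(clusters, intra, inter)
```

On this toy input a good clustering shows high similarity within clusters and low similarity between them, which is the kind of contrast the abstract says CU is built on.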

Citation:
Jason Lee, Sun Kim, "Cluster Utility: A New Metric for Clustering Biological Sequences," csbw, pp.45-46, 2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05), 2005