Validation Measures for Clustering Algorithms Incorporating Biological Information
Hangzhou, Zhejiang, China June 20-June 24
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/IMSCCS.2006.1392006 First International Multi-Sympos ...
Susmita Datta, University of Louisville, USA
Somnath Datta, University of Louisville, USA
Cluster analysis is the most commonly performed procedure on a set of gene expression profiles and is often regarded as a first step. A closely related problem is selecting, from the long list of clustering algorithms that currently exist, one that is optimal in some sense. In this paper, we propose two validation measures, each with two parts: one measures the statistical consistency (stability) of the clusters produced, and the other represents their biological functional consistency. A good clustering algorithm should have small values for these measures.

We illustrate our methods using two sets of expression profiles obtained from a breast cancer data set. Six well-known clustering algorithms (UPGMA, K-Means, Diana, Fanny, Model-Based and SOM) were evaluated. Although the exact ordering depends on the particular data set (expression profiles) used and the validation measure employed, UPGMA appears to be the best overall for the cancer data set we considered.
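
The abstract does not define the two measures, so the following Python sketch only illustrates the general idea of scoring a clustering on a stability part and a functional part, where smaller is better. Everything in it is an assumption for illustration and not the authors' formulas or their R code: the stability part is a leave-one-column-out proportion of non-overlap, the functional part is the fraction of same-cluster gene pairs whose annotated classes differ, and the names `upgma_labels`, `non_overlap` and `functional_mismatch` are hypothetical.

```python
from itertools import combinations
from math import dist


def upgma_labels(rows, k):
    """Naive average-linkage (UPGMA) clustering of `rows` into k clusters."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest mean pairwise distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: sum(
                dist(rows[i], rows[j])
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ) / (len(clusters[ab[0]]) * len(clusters[ab[1]])),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    labels = [0] * len(rows)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def non_overlap(rows, k, cluster_fn=upgma_labels):
    """Stability part (assumed form): re-cluster with one column removed and
    average, over genes and removed columns, the share of a gene's original
    cluster mates that no longer share its cluster."""
    full = cluster_fn(rows, k)
    n_genes, n_cols = len(rows), len(rows[0])
    total = 0.0
    for drop in range(n_cols):
        reduced = [r[:drop] + r[drop + 1:] for r in rows]
        part = cluster_fn(reduced, k)
        for g in range(n_genes):
            full_mates = {h for h in range(n_genes) if full[h] == full[g]}
            part_mates = {h for h in range(n_genes) if part[h] == part[g]}
            total += 1 - len(full_mates & part_mates) / len(full_mates)
    return total / (n_genes * n_cols)


def functional_mismatch(labels, classes):
    """Functional part (assumed form): fraction of same-cluster gene pairs
    whose annotated functional classes differ."""
    pairs = [
        (g, h)
        for g, h in combinations(range(len(labels)), 2)
        if labels[g] == labels[h]
    ]
    if not pairs:
        return 0.0
    return sum(classes[g] != classes[h] for g, h in pairs) / len(pairs)


# Toy expression profiles (genes x samples) and made-up functional classes.
rows = [[0.0, 0.1, 0.2], [0.1, 0.0, 0.1], [5.0, 5.1, 5.2], [5.1, 5.0, 4.9]]
classes = ["a", "a", "b", "c"]
labels = upgma_labels(rows, 2)
stability = non_overlap(rows, 2)
mismatch = functional_mismatch(labels, classes)
```

Because `non_overlap` takes the clustering routine as a parameter, any function mapping `(rows, k)` to a list of labels could be compared on the same data; the naive UPGMA here is only a stand-in that is adequate for small inputs.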

R code: The R code used in this paper is available from the author upon request.

Citation:
Susmita Datta, Somnath Datta, "Validation Measures for Clustering Algorithms Incorporating Biological Information," imsccs, vol. 1, pp.131-135, 2006 First International Multi-Symposiums on Computer and Computational Sciences, 2006