An Unsupervised Learning Approach to Resolving the Data Imbalanced Issue in Supervised Learning Problems in Functional Genomics
Rio de Janeiro, Brazil December 06-December 09
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICHIS.2005.23
Fifth International Conference on Hyb ...
Kihoon Yoon, University of Texas at San Antonio
Stephen Kwek, University of Texas at San Antonio
Imbalanced data arise very frequently in functional genomics applications; a ratio of one positive example to thousands of negative instances is common in scientific applications. Unfortunately, traditional machine learning methods treat the extremely rare minority instances as noise. The standard remedy is to balance the training data by resampling. However, this results in a high number of false positive predictions. We therefore propose preprocessing the majority instances by partitioning them into clusters, which greatly reduces the ambiguity between the minority instances and the instances in each cluster. For moderately high imbalance ratios and low in-class complexity, our technique gives better prediction accuracy than the undersampling method. For extreme imbalance ratios, as in the splice site prediction problem, we demonstrate that the technique serves as a good filter with almost perfect recall: it reduces the amount of imbalance enough that traditional classification techniques can be deployed, yielding significant improvements over previous predictors. We also show that the technique works for subcellular localization and post-translational modification site prediction problems.
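The preprocessing idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the base classifier, or the rule for combining per-cluster decisions, so everything below is an assumption made for illustration: a plain k-means partitions the majority class, a nearest-centroid rule stands in for the base learner, and an instance is accepted as positive only if every per-cluster classifier accepts it. The function names (`kmeans`, `train_cluster_filter`, `predict_positive`) are hypothetical and not taken from the paper.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (assumed here; the abstract names no algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ended up empty.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return [c for c in clusters if c]


def train_cluster_filter(majority, minority, k):
    """One sub-model per majority cluster: that cluster vs. all minority.

    Each sub-model is a pair (cluster centroid, minority centroid); the
    nearest-centroid rule is a stand-in for an arbitrary base classifier.
    """
    minority_centroid = mean(minority)
    return [(mean(cluster), minority_centroid)
            for cluster in kmeans(majority, k)]


def predict_positive(model, x):
    """Assumed combination rule: positive only if every sub-model agrees."""
    return all(dist2(x, pos) < dist2(x, neg) for neg, pos in model)


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data: two majority blobs and a handful of minority points.
    majority = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
                + [(rng.gauss(10, 0.5), rng.gauss(0, 0.5)) for _ in range(200)])
    minority = [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(5)]

    model = train_cluster_filter(majority, minority, k=2)
    kept = sum(predict_positive(model, p) for p in majority)
    recall = sum(predict_positive(model, p) for p in minority) / len(minority)
    print(f"majority instances passing the filter: {kept} of {len(majority)}")
    print(f"minority recall: {recall:.2f}")
```

Each sub-model sees the minority class against only one cluster of majority instances, so the local imbalance ratio is lower than in the full data set. Used as a filter, the instances that survive could then be passed to a conventional classifier, which is the role the abstract describes for the extreme-imbalance case.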
Citation:
Kihoon Yoon, Stephen Kwek, "An Unsupervised Learning Approach to Resolving the Data Imbalanced Issue in Supervised Learning Problems in Functional Genomics," Fifth International Conference on Hybrid Intelligent Systems (HIS'05), pp. 303-308, 2005