loading...
Detecting Fractures in Classifier Performance
Omaha, Nebraska, USA October 28-October 31
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2007.1062007 Seventh IEEE International Confe ...
 This Article 
 
PDF
HTML
 
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
A fundamental tenet assumed by many classification algorithms is the presumption that both training and testing samples are drawn from the same distribution of data ? this is the stationary distribution assumption. This entails that the past is strongly indicative of the future. However, in real world applications, many factors may alter the One True Model responsible for generating the data distribution both significantly and subtly. In circumstances violating the stationary distribution assumption, traditional validation schemes such as ten-folds and hold-out become poor performance predictors and classifier rankers. Thus, it becomes critical to discover the fracture points in classifier performance by discovering the divergence between populations. In this paper, we implement a comprehensive evaluation framework to identify bias, enabling selection of a "correct" classifier given the sample bias. To thoroughly evaluate the performance of classifiers within biased distributions, we consider the following three scenarios: missing completely at random (akin to stationary); missing at random; and missing not at random. The latter reflects the canonical sample selection bias problem.
Citation:
David A. Cieslak, Nitesh V. Chawla, "Detecting Fractures in Classifier Performance," icdm, pp.123-132, 2007 Seventh IEEE International Conference on Data Mining, 2007
Usage of this product signifies your acceptance of the Terms of Use.