loading...
An Improved Categorization of Classifier?s Sensitivity on Sample Selection Bias
Houston, Texas November 27-November 30
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2005.24Fifth IEEE International Conference o ...
 This Article 
 
PDF
HTML
 
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Wei Fan, IBM T. J. Watson Research
Ian Davidson, State University of New York at Albany
Bianca Zadrozny, IBM T. J. Watson Research
Philip S. Yu, IBM T. J. Watson Research
A recent paper categorizes classifier learning algorithms according to their sensitivity to a common type of sample selection bias where the chance of an example being selected into the training sample depends on its feature vector x but not (directly) on its class label y. A classifier learner is categorized as "local" if it is insensitive to this type of sample selection bias, otherwise, it is considered "global". In that paper, the true model is not clearly distinguished from the model that the algorithm outputs. In their discussion of Bayesian classifiers, logistic regression and hard-margin SVMs, the true model (or the model that generates the true class label for every example) is implicitly assumed to be contained in the model space of the learner, and the true class probabilities and model estimated class probabilities are assumed to asymptotically converge as the training data set size increases. However, in the discussion of naive Bayes, decision trees and soft-margin SVMs, the model space is assumed not to contain the true model, and these three algorithms are instead argued to be "global learners". We argue that most classifier learners may or may not be affected by sample selection bias; this depends on the dataset as well as the heuristics or inductive bias implied by the learning algorithm and their appropriateness to the particular dataset.
Citation:
Wei Fan, Ian Davidson, Bianca Zadrozny, Philip S. Yu, "An Improved Categorization of Classifier?s Sensitivity on Sample Selection Bias," icdm, pp.605-608, Fifth IEEE International Conference on Data Mining (ICDM'05), 2005
Usage of this product signifies your acceptance of the Terms of Use.