We have developed a method for distinguishing correctly labeled from mislabeled data sampled from video sequences and used to construct a facial expression recognition classifier. The novelty of our approach lies in training a single, optimal classifier type (a Support Vector Machine, or SVM) on multiple representations of the data, involving different "discriminating" subspaces. Results of a preliminary study on discriminating "high stress" from "low stress" facial expression data by this method confirm that the approach can distinguish subproblems where labeling is highly reliable from those where mislabeling can lead to high error rates. By helping to detect data sub-samples that yield misleading classification results, the method also serves as a rapid, highly efficient cross-validated approach for eliminating outliers.
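The general idea of checking labels against several representations of the same data can be illustrated with a minimal sketch. It is not the procedure from the paper, whose details the abstract does not give. The sketch rests on several assumptions: a nearest-centroid classifier stands in for the SVM so that the code needs no external libraries, plain subsets of feature indices stand in for the "discriminating" subspaces, cross-validation is leave-one-out, and the data are a toy set. All names (`flag_suspects`, `min_votes`, and so on) are hypothetical.

```python
from statistics import mean


def project(x, dims):
    """Keep only the features listed in dims (a stand-in for a subspace)."""
    return [x[d] for d in dims]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def predict(train_X, train_y, x):
    """Nearest-centroid prediction (assumption: replaces the SVM)."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [r for r, l in zip(train_X, train_y) if l == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return min(centroids, key=lambda l: sq_dist(centroids[l], x))


def flag_suspects(X, y, subspaces, min_votes):
    """Return indices whose held-out prediction contradicts the given
    label in at least min_votes of the subspaces (leave-one-out)."""
    suspects = []
    for i in range(len(X)):
        votes = 0
        for dims in subspaces:
            train_X = [project(r, dims) for j, r in enumerate(X) if j != i]
            train_y = [l for j, l in enumerate(y) if j != i]
            if predict(train_X, train_y, project(X[i], dims)) != y[i]:
                votes += 1
        if votes >= min_votes:
            suspects.append(i)
    return suspects


# Toy data: two well-separated clusters; sample 3 lies in the first
# cluster but carries the label of the second (a deliberate mislabel).
X = [
    [0.0, 0.2, 0.1, 0.3],
    [0.3, 0.1, 0.2, 0.0],
    [0.1, 0.0, 0.3, 0.2],
    [0.2, 0.3, 0.0, 0.1],
    [4.0, 4.2, 4.1, 4.3],
    [4.3, 4.1, 4.2, 4.0],
    [4.1, 4.0, 4.3, 4.2],
]
y = [0, 0, 0, 1, 1, 1, 1]
subspaces = [(0, 1), (2, 3), (0, 3)]  # hypothetical feature subsets

suspects = flag_suspects(X, y, subspaces, min_votes=2)
print(suspects)  # [3]
```

Requiring agreement across several subspaces before flagging a sample is a design choice of this sketch: a sample that contradicts its label in only one representation may simply be hard to classify there, whereas one that contradicts it in most representations is a stronger candidate for a labeling error or an outlier.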
Citation:
Sundara Venkataraman, Dimitris Metaxas, Dmitriy Fradkin, Casimir Kulikowski, Ilya Muchnik, "Distinguishing Mislabeled Data from Correctly Labeled Data in Classifier Design," 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'04), pp. 668-672, 2004.