Text Classification with Evolving Label-Sets
Houston, Texas November 27-November 30
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2005.143
Fifth IEEE International Conference o ...
Shantanu Godbole, Indian Institute of Technology - Bombay
Ganesh Ramakrishnan, IBM India Research Lab
Sunita Sarawagi, Indian Institute of Technology - Bombay

We introduce the evolving label-set problem encountered in building real-world text classification systems. This problem arises when a text classification system trained on a label-set encounters documents of unseen classes at deployment time. We design a Class-Detector module that monitors unlabeled data, detects new classes, and suggests them to the administrator for inclusion in the label-set.

We propose abstractions that group tokens under human-understandable concepts and provide a mechanism for assigning importance to unseen terms. We present generative algorithms that leverage the notion of the support of a document in a model for (1) selecting documents of proposed new classes, and (2) automatically triggering detection of new classes. Experiments on three real-world taxonomies show that our methods select new-class documents with high precision, and trigger the emergence of new classes with low false-positive and false-negative rates.
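
The abstract does not define "support", so the following is only an illustrative sketch of one plausible reading, not the authors' algorithm. It assumes support means the best per-token log-likelihood of a document under any known-class unigram model with additive smoothing. The names `train`, `support`, `select_candidates`, and `should_trigger`, and the parameters `alpha`, `threshold`, and `min_fraction`, are all hypothetical.

```python
import math
from collections import Counter


def train(docs_by_class, alpha=1.0):
    """Fit one smoothed unigram model per known class (assumed model family)."""
    vocab = {tok for docs in docs_by_class.values() for doc in docs for tok in doc}
    models = {}
    for label, docs in docs_by_class.items():
        counts = Counter(tok for doc in docs for tok in doc)
        # The +1 reserves probability mass for tokens outside the vocabulary.
        denom = sum(counts.values()) + alpha * (len(vocab) + 1)
        models[label] = {
            "logp": {tok: math.log((counts[tok] + alpha) / denom) for tok in vocab},
            "unseen": math.log(alpha / denom),
        }
    return models


def support(doc, models):
    """Best per-token log-likelihood of a tokenized doc under any known class."""
    if not doc:
        return float("-inf")
    best = max(
        sum(m["logp"].get(tok, m["unseen"]) for tok in doc)
        for m in models.values()
    )
    return best / len(doc)


def select_candidates(unlabeled, models, threshold):
    """Return documents that no known class explains well (low support)."""
    return [doc for doc in unlabeled if support(doc, models) < threshold]


def should_trigger(unlabeled, models, threshold, min_fraction=0.2):
    """Signal a possible new class when enough documents have low support."""
    if not unlabeled:
        return False
    low = select_candidates(unlabeled, models, threshold)
    return len(low) / len(unlabeled) >= min_fraction
```

In this sketch, `select_candidates` corresponds to task (1) and `should_trigger` to task (2). The threshold and fraction would have to be tuned on held-out data, and the concept-level abstractions mentioned above are not modeled here.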

Citation:
Shantanu Godbole, Ganesh Ramakrishnan, Sunita Sarawagi, "Text Classification with Evolving Label-Sets," Fifth IEEE International Conference on Data Mining (ICDM'05), pp. 629-632, 2005