Better Rules, Few Features: A Semantic Approach to Selecting Features from Text
San Jose, California November 29-December 02
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2001.989501
First IEEE International Conference o ...
The choice of features used to represent a domain has a profound effect on the quality of the model produced; yet few researchers have investigated the relationship between the features used to represent text and the quality of the final model. We explored this relationship for medical texts by comparing association rules based on features at three different semantic levels: (1) words, (2) manually assigned keywords, and (3) automatically selected medical concepts. Our preliminary findings indicate that bi-directional association rules based on concepts or keywords are more plausible and more useful than those based on word features. The concept and keyword representations also required 90% fewer features than the word representation. This drastic dimensionality reduction suggests that the approach is well suited to large corpora of medical text, such as parts of the Web.
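
As a rough illustration of the kind of rule the abstract refers to, the following minimal sketch mines pairwise association rules from documents represented as feature sets. It is not the authors' implementation. The abstract does not define "bi-directional", so the sketch assumes a rule between features a and b counts as bi-directional when both a -> b and b -> a meet the confidence threshold. The function name, the thresholds, and the toy documents are all hypothetical.

```python
from itertools import combinations


def bidirectional_rules(docs, min_support=0.5, min_confidence=0.8):
    """Return (a, b, support, conf_ab, conf_ba) for qualifying feature pairs.

    docs: list of sets of features (words, keywords, or concepts).
    Assumption: "bi-directional" means both a -> b and b -> a reach
    min_confidence; the abstract does not spell out its definition.
    """
    n = len(docs)
    single = {}  # feature -> number of documents containing it
    pair = {}    # (a, b) with a < b -> number of documents containing both
    for features in docs:
        for f in features:
            single[f] = single.get(f, 0) + 1
        for a, b in combinations(sorted(features), 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1

    rules = []
    for (a, b), count in sorted(pair.items()):
        support = count / n          # fraction of documents with both features
        conf_ab = count / single[a]  # estimated P(b | a)
        conf_ba = count / single[b]  # estimated P(a | b)
        if support >= min_support and min(conf_ab, conf_ba) >= min_confidence:
            rules.append((a, b, support, conf_ab, conf_ba))
    return rules


# Toy documents with hypothetical concept-level features (illustrative only).
docs = [
    {"aspirin", "headache"},
    {"aspirin", "headache", "fever"},
    {"aspirin", "headache"},
    {"fever"},
]
rules = bidirectional_rules(docs)
```

The same function can be run on word, keyword, or concept feature sets for the same documents. Fewer distinct features means fewer candidate pairs to count, which is one way the dimensionality reduction mentioned above would matter in practice.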
Citation:
Catherine Blake, Wanda Pratt, "Better Rules, Few Features: A Semantic Approach to Selecting Features from Text," icdm, pp.59, First IEEE International Conference on Data Mining (ICDM'01), 2001