Learning Term Dependency Links Using Information Theoretic Inclusion Measure
Omaha, Nebraska, USA October 28-October 31
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2007.21
Seventh IEEE International Conference ...
An algorithm to identify and remove term redundancy is proposed for text classifiers that use ranking-based feature selection. The proposed method employs a normalized mutual information, called the inclusion measure, to estimate the asymmetric dependency between two terms. A dependency matrix is constructed from these pair-wise dependency measures. In this paper, an algorithm is proposed to learn term dependency links from the term dependency matrix and to visualize the dependency between terms in a graph called the term dependency tree. All nodes of the tree are categorized into two groups: hubs and links. Any node whose outdegree is less than two is assigned to the links group. We show that all link nodes are most likely redundant. We also introduce a criterion, called substitution cost, to decide whether to remove or retain a candidate redundant term. The proposed approach is applied to four well-known benchmark data sets with SVM and Rocchio classifiers, using a set of highly aggressive feature selection schemes. The results show the effectiveness of the proposed method, especially when applied to weak classifiers.
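The abstract does not give the formula for the inclusion measure or the tree-building procedure, so the following is only a minimal sketch under stated assumptions. It assumes the inclusion of term X in term Y is I(X;Y) / H(X), one common way to normalize mutual information asymmetrically, and it assumes the tree edges are already known. The function names (`inclusion`, `dependency_matrix`, `split_hubs_links`) and the toy occurrence vectors are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(x):
    """Shannon entropy (in bits) of a discrete occurrence vector."""
    n = len(x)
    return -sum((c / n) * math.log2(c / n) for c in Counter(x).values())


def mutual_information(x, y):
    """Mutual information (in bits) between two equally long occurrence vectors."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


def inclusion(x, y):
    """ASSUMED inclusion measure: I(X;Y) / H(X), asymmetric in X and Y."""
    hx = entropy(x)
    return mutual_information(x, y) / hx if hx > 0 else 0.0


def dependency_matrix(terms):
    """Pair-wise inclusion values as a nested dict: matrix[a][b] = inclusion(a, b)."""
    return {
        a: {b: inclusion(terms[a], terms[b]) for b in terms if b != a}
        for a in terms
    }


def split_hubs_links(names, edges):
    """Nodes with outdegree below two are links; all other nodes are hubs."""
    outdegree = Counter(src for src, _ in edges)
    hubs = [n for n in names if outdegree[n] >= 2]
    links = [n for n in names if outdegree[n] < 2]
    return hubs, links


# Toy data (hypothetical): presence of each term in eight documents.
terms = {
    "car": [1, 1, 1, 1, 0, 0, 0, 0],
    "auto": [1, 1, 0, 0, 0, 0, 0, 0],
}
matrix = dependency_matrix(terms)

# Hypothetical edge list (source, target); the paper's own construction
# of the term dependency tree is not reproduced here.
hubs, links = split_hubs_links(["a", "b", "c"], [("a", "b"), ("a", "c")])
```

In this toy data "auto" occurs only where "car" occurs, so `matrix["auto"]["car"]` comes out larger than `matrix["car"]["auto"]`, which illustrates the asymmetry the abstract refers to. The substitution cost criterion is not sketched, since the abstract does not define it.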
Citation:
Masoud Makrehchi, Mohamed S. Kamel, "Learning Term Dependency Links Using Information Theoretic Inclusion Measure," icdmw, pp.423-428, Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007), 2007