A Method for Calculating Term Similarity on Large Document Collections
Las Vegas, Nevada April 28-April 30
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ITCC.2003.1197526
International Conference on Informati ...
Wolfgang W. Bein, University of Nevada
Jeffrey S. Coombs, University of Nevada
Kazem Taghva, University of Nevada
We present an efficient algorithm, the Quadtree Heuristic, for identifying a list of similar terms for each unique term in a large document collection. Term similarity is defined using the Expected Mutual Information Measure (EMIM). Because the purpose of the similarity lists is to improve information retrieval (IR), we report an experiment on the performance of an IR engine designed to use them. Two methods were used to generate the similarity lists: a brute-force technique and the Quadtree Heuristic. Retrieval performance with the list generated by the Quadtree Heuristic was comparable to that with the brute-force list.
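As a rough illustration of the quantities the abstract refers to, the sketch below computes EMIM between two terms from document counts and builds similarity lists by brute force. It assumes the standard textbook form of EMIM (mutual information over the presence or absence of each term in a document); the abstract does not give the exact variant the authors used. The function names `emim` and `brute_force_lists`, the parameter `k`, and the toy documents are hypothetical, and the Quadtree Heuristic itself is not reproduced here because the abstract does not describe it.

```python
import math
from collections import defaultdict
from itertools import combinations


def emim(n_docs, n_a, n_b, n_ab):
    """EMIM between terms A and B (assumed textbook form, not the paper's code).

    n_docs: number of documents; n_a, n_b: documents containing A, B;
    n_ab: documents containing both.
    """
    # Document counts for the four presence (1) / absence (0) combinations.
    joint = {
        (1, 1): n_ab,
        (1, 0): n_a - n_ab,
        (0, 1): n_b - n_ab,
        (0, 0): n_docs - n_a - n_b + n_ab,
    }
    marg_a = {1: n_a, 0: n_docs - n_a}
    marg_b = {1: n_b, 0: n_docs - n_b}
    total = 0.0
    for (a, b), count in joint.items():
        if count == 0:
            continue  # a term of the form 0 * log(0) is taken to be 0
        # P(a,b) * log(P(a,b) / (P(a) * P(b))), written with raw counts.
        ratio = count * n_docs / (marg_a[a] * marg_b[b])
        total += (count / n_docs) * math.log(ratio)
    return total


def brute_force_lists(docs, k=3):
    """For every term, score every other term by EMIM and keep the top k."""
    n_docs = len(docs)
    df = defaultdict(int)  # document frequency of each term
    co = defaultdict(int)  # co-occurrence count of each sorted term pair
    for doc in docs:
        terms = sorted(set(doc))
        for t in terms:
            df[t] += 1
        for pair in combinations(terms, 2):
            co[pair] += 1
    lists = {}
    for t in df:
        scored = []
        for u in df:
            if u == t:
                continue
            key = (t, u) if t < u else (u, t)
            scored.append((emim(n_docs, df[t], df[u], co.get(key, 0)), u))
        # Highest score first; ties are broken alphabetically.
        scored.sort(key=lambda s: (-s[0], s[1]))
        lists[t] = [u for _, u in scored[:k]]
    return lists


# Hypothetical toy collection: "x" and "y" always occur together.
toy_docs = [{"x", "y"}, {"x", "y"}, {"z"}, {"w"}, {"x", "y", "z"}, {"w"}]
toy_lists = brute_force_lists(toy_docs, k=2)
```

The brute-force version scores every pair of terms, so its cost grows quadratically with vocabulary size, which is presumably the motivation for a heuristic on large collections. Note also that mutual information of this form is high for terms that systematically avoid each other as well as for terms that co-occur.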
Citation:
Wolfgang W. Bein, Jeffrey S. Coombs, Kazem Taghva, "A Method for Calculating Term Similarity on Large Document Collections," itcc, pp.199, International Conference on Information Technology: Computers and Communications, 2003