Improving OCR Text Categorization Accuracy with Electronic Abstracts
Lyon, France April 27-April 28
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/DIAL.2006.22
Second International Conference on Do ...
Linlin Li, National University of Singapore, Kent Ridge, Singapore
Chew Lim Tan, National University of Singapore, Kent Ridge, Singapore
Categorization of imaged documents is a useful technique for building digital libraries based on document images. This paper investigates techniques to improve categorization accuracy on OCR text, particularly that of biomedical imaged documents. Experiments with different feature selection methods were run to explore their effect on categorization performance. The results show that document frequency is a good feature selection method in terms of eliminating OCR errors. Furthermore, our categorization scheme IMP, which combines OCR text and electronic abstracts, shows consistent improvement in accuracy compared with categorizing on either abstracts or OCR text alone.
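The abstract does not say how document frequency was computed or thresholded, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes whitespace-tokenized documents and a hypothetical `min_df` cutoff; the function names and the toy corpus are illustrative. The intuition is that a misrecognized word such as "bindlng" tends to occur in very few documents, so a document frequency threshold drops it from the feature set.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        # Use a set so a term is counted once per document.
        df.update(set(tokens))
    return df


def select_by_df(docs, min_df=2):
    """Keep terms that occur in at least min_df documents (hypothetical cutoff)."""
    df = document_frequency(docs)
    return {term for term, count in df.items() if count >= min_df}


# Toy corpus; the second document simulates OCR errors.
docs = [
    "protein binding in the cell".split(),
    "protein bindlng in the ce11".split(),
    "cell membrane protein".split(),
]

vocabulary = select_by_df(docs, min_df=2)
print(sorted(vocabulary))  # ['cell', 'in', 'protein', 'the']
```

In this toy run the OCR-damaged tokens "bindlng" and "ce11" each appear in a single document and are removed, along with rare but valid terms such as "membrane". A real threshold would need tuning against the corpus.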
Citation:
Linlin Li, Chew Lim Tan, "Improving OCR Text Categorization Accuracy with Electronic Abstracts," dial, pp.82-87, Second International Conference on Document Image Analysis for Libraries (DIAL'06), 2006