Context-Sensitive Error Correction: Using Topic Models to Improve OCR
Curitiba, Parana, Brazil September 23-September 26
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDAR.2007.91
Ninth International Conference on Doc ...
M. Wick, University of Massachusetts Amherst
M. Ross, University of Massachusetts Amherst
E. Learned-Miller, University of Massachusetts Amherst
Modern optical character recognition software relies on human interaction to correct misrecognized characters. Even though the software often reliably identifies low-confidence output, the simple language and vocabulary models employed are insufficient to automatically correct mistakes. This paper demonstrates that topic models, which automatically detect and represent an article's semantic context, reduce error by 7% over a global word distribution in a simulated OCR correction task. Detecting and leveraging context in this manner is an important step towards improving OCR.
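The contrast the abstract draws, between a global word distribution and a topic-conditioned one, can be illustrated with a minimal sketch. This is not the authors' implementation. The candidate words, the two topics, and every probability below are invented for illustration, and the names `topic_word_prob` and `best_candidate` are hypothetical. The sketch assumes that a low-confidence word has already been flagged and that a topic mixture for the article is available from some topic model.

```python
# Toy numbers only: every probability below is invented for illustration.
# Global unigram distribution P(word), ignoring article context.
GLOBAL_WORD_PROB = {"bank": 0.006, "band": 0.004, "bark": 0.001}

# P(word | topic) for two hypothetical topics.
TOPIC_WORD_PROB = {
    "finance": {"bank": 0.020, "band": 0.001, "bark": 0.0005},
    "music": {"bank": 0.001, "band": 0.030, "bark": 0.0005},
}


def topic_word_prob(word, topic_mixture):
    """P(word | document) = sum over topics of P(topic | document) * P(word | topic)."""
    return sum(
        weight * TOPIC_WORD_PROB[topic].get(word, 0.0)
        for topic, weight in topic_mixture.items()
    )


def best_candidate(candidates, score):
    """Return the candidate word with the highest score."""
    return max(candidates, key=score)


# Candidate corrections for one low-confidence OCR token (hypothetical).
candidates = ["bank", "band", "bark"]

# Assumed topic mixture P(topic | document) for an article mostly about music.
music_article = {"finance": 0.1, "music": 0.9}

global_choice = best_candidate(candidates, lambda w: GLOBAL_WORD_PROB.get(w, 0.0))
topic_choice = best_candidate(candidates, lambda w: topic_word_prob(w, music_article))

print(global_choice, topic_choice)
```

With these toy numbers the global distribution prefers the more frequent word overall, while the topic-conditioned score prefers the word that fits the article's context. Which of the two is correct in practice depends on the document.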
Citation:
M. Wick, M. Ross, E. Learned-Miller, "Context-Sensitive Error Correction: Using Topic Models to Improve OCR," icdar, vol. 2, pp. 1168-1172, Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 2, 2007