OCR systems for printed documents typically require large numbers of font styles and character models to work well. When given an unseen font, performance degrades even in the absence of noise. In this paper, we perform OCR in an unsupervised fashion, without using any character models, by using a cryptogram decoding algorithm. We present results on real and artificial OCR data.
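The abstract does not spell out the decoding algorithm, so the following is only an illustrative sketch of the general idea suggested by the title: treating a word as a cryptogram and matching it against a lexicon by its pattern of repeated symbols. It assumes, hypothetically, that glyph images have already been clustered so that each word is a sequence of cluster IDs. The names `numerization`, `build_index`, and `candidates`, as well as the toy lexicon, are introduced here for illustration and are not taken from the paper.

```python
from collections import defaultdict


def numerization(symbols):
    """Return the order-of-first-appearance pattern of a symbol sequence.

    Example: "that" -> (1, 2, 3, 1), since 't' is the first distinct
    symbol, 'h' the second, 'a' the third, and 't' repeats.
    """
    first_seen = {}
    pattern = []
    for s in symbols:
        if s not in first_seen:
            first_seen[s] = len(first_seen) + 1
        pattern.append(first_seen[s])
    return tuple(pattern)


def build_index(lexicon):
    """Group lexicon words by their numerization pattern."""
    index = defaultdict(list)
    for word in lexicon:
        index[numerization(word)].append(word)
    return index


def candidates(cluster_ids, index):
    """Lexicon words whose repetition pattern matches the cluster-ID word."""
    return index.get(numerization(cluster_ids), [])


# Toy lexicon (assumption, for illustration only).
lexicon = ["that", "this", "noon", "deed", "else"]
index = build_index(lexicon)

# A word image whose four glyphs fell into clusters 7, 3, 3, 7.
print(candidates([7, 3, 3, 7], index))  # ['noon', 'deed']
```

Because the pattern depends only on which symbols repeat, not on their appearance, a lookup of this kind needs no font-specific character model. A full decoder would still have to resolve ambiguous matches such as the two candidates above by combining constraints across many words.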
Citation:
G. Huang, E. Learned-Miller, A. McCallum, "Cryptogram Decoding for OCR Using Numerization Strings," Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 1, pp. 208-212, 2007.