Character Duration Modeling for Speed Improvements in the BBN Byblos OCR System
Seoul, Korea August 31-September 01
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDAR.2005.71
Eighth International Conference on Do ...
Premkumar Natarajan, BBN Technologies, MA, USA
Ram Sundaram, BBN Technologies, MA, USA
Rohit Prasad, BBN Technologies, MA, USA
Ehry MacRostie, BBN Technologies, MA, USA
In this paper, we describe a recent enhancement to our HMM-based OCR system that results in a significant increase in the speed of the system without any impact on recognition accuracy. Recognition speed is, in part, a function of the number of distinct HMMs that constitute the model set. As a result, recognition is much slower for ideographic scripts such as Chinese and Japanese, which contain thousands of glyphs, than for alphabetic scripts such as Latin and Arabic. In our current OCR system, methods such as sub-character modeling and Gaussian shortlists are used to reduce the processing time. Here we describe a simple character-based duration modeling technique that constrains the number of frames for which a character can stay active. Character durations were obtained from automatically labeled training data, and a probability mass function (histogram) was used to model them. The use of a duration model yielded a 37% improvement in speed with no loss in accuracy.
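The duration model summarized in the abstract can be illustrated with a minimal Python sketch. It is not the Byblos implementation: the abstract does not say how the constraint is applied during decoding, so the hard frame limit used here is an assumption, and all names (`estimate_duration_pmfs`, `max_duration`, `may_stay_active`, the `mass` parameter) are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_duration_pmfs(labeled_segments):
    """Build a per-character duration histogram, normalized to a PMF.

    labeled_segments: iterable of (character, n_frames) pairs, e.g. taken
    from an automatic alignment of the training data (assumed input format).
    """
    counts = defaultdict(Counter)
    for char, n_frames in labeled_segments:
        counts[char][n_frames] += 1
    pmfs = {}
    for char, hist in counts.items():
        total = sum(hist.values())
        pmfs[char] = {d: c / total for d, c in hist.items()}
    return pmfs


def max_duration(pmf, mass=1.0):
    """Smallest duration d with P(D <= d) >= mass.

    With mass=1.0 this is the longest duration seen in training; a smaller
    mass gives a tighter (assumed, not source-specified) cutoff.
    """
    cumulative = 0.0
    for d in sorted(pmf):
        cumulative += pmf[d]
        if cumulative >= mass - 1e-12:  # tolerance for float rounding
            return d
    return max(pmf)


def may_stay_active(char, frames_active, limits):
    """Assumed hard constraint: drop a character hypothesis once it has
    been active for more frames than its duration limit allows."""
    return frames_active <= limits[char]


# Toy data: (character, duration in frames); values are made up.
segments = [("a", 3), ("a", 4), ("a", 4), ("a", 9), ("b", 2)]
pmfs = estimate_duration_pmfs(segments)
limits = {char: max_duration(pmf) for char, pmf in pmfs.items()}
```

In a decoder, a check like `may_stay_active` would be consulted at each frame so that hypotheses exceeding a character's limit are removed from the active set, which is one plausible way such a constraint could reduce search effort.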
Citation:
Premkumar Natarajan, Ram Sundaram, Rohit Prasad, Ehry MacRostie, "Character Duration Modeling for Speed Improvements in the BBN Byblos OCR System," icdar, pp.1136-1140, Eighth International Conference on Document Analysis and Recognition (ICDAR'05), 2005