A Multi-Font OCR System for Printed Telugu Text
Language Engineering Conference (LEC'02)
Hyderabad, India, December 13-December 15
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/LEC.2002.1182284
C. Vasantha Lakshmi, Dayal Educational Institute
C. Patvardhan, Dayal Educational Institute
This work describes the design and development of a Telugu Optical Character Recognition system for printed text (TOSP). The pre-processing tasks considered in this paper are conversion of a grey-scale image to a binary image, image rectification, skew detection and removal, and segmentation of text into lines, words and basic symbols. Basic symbols are the fundamental unit of segmentation, and it is these symbols that the recognizer identifies. To complete the recognition process, the system then determines which combinations of basic symbols form the characters and compound characters of Telugu. The special feature of TOSP is that it is designed to handle multiple font sizes and multiple fonts. Further, the output produced by TOSP can be opened and edited directly in any Indian-language software that supports transliteration into Telugu script; several such packages are popular and readily available.
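The abstract names the pre-processing stages but not the algorithms behind them. As a rough illustration only, the sketch below shows one common way to binarize a grey-scale image and then split it into text lines and horizontally separated pieces using projection profiles. The fixed threshold, the function names (`binarize`, `runs`, `segment_lines`, `segment_columns`) and the choice of projection profiles are all assumptions made for this example, not details taken from TOSP; a real Telugu segmenter would likely need more than blank-column gaps, since basic symbols can overlap.

```python
# Illustrative sketch only: fixed-threshold binarization and projection-profile
# segmentation. These are assumed techniques, not the method used by TOSP.

def binarize(gray, threshold=128):
    """Map a grey-scale image (rows of 0-255 ints) to 1 = ink, 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def runs(profile):
    """Return (start, end) pairs, end exclusive, for each run of non-zero entries."""
    found = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins here
        elif not value and start is not None:
            found.append((start, i))       # the run ended on the previous index
            start = None
    if start is not None:
        found.append((start, len(profile)))
    return found


def segment_lines(binary):
    """Split a binary image into text lines using the horizontal projection profile."""
    profile = [sum(row) for row in binary]
    return runs(profile)


def segment_columns(binary, top, bottom):
    """Split rows top..bottom-1 into pieces separated by blank pixel columns."""
    width = len(binary[0]) if binary else 0
    profile = [sum(binary[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(profile)
```

Each text line returned by `segment_lines` can be passed to `segment_columns` as a `(top, bottom)` pair; telling word gaps apart from gaps between symbols would then depend on a gap-width rule, which is left out here because the abstract does not describe one.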
Citation:
C. Vasantha Lakshmi, C. Patvardhan, "A Multi-Font OCR System for Printed Telugu Text," lec, pp.7, Language Engineering Conference (LEC'02), 2002