SVM Based Scheme for Thai and English Script Identification
Curitiba, Parana, Brazil, September 23-26, 2007
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDAR.2007.237
Ninth International Conference on Doc ...
S. Chanda, Indian Statistical Institute, Kolkata-108, India
O.R. Terrades, Universitat Autonoma de Barcelona, 08193, Barcelona, Spain
U. Pal, Indian Statistical Institute, Kolkata-108, India
In some Thai documents, a single text line of a page may contain both Thai and English script. For optical character recognition (OCR) of such a page, it is better to first identify the Thai and English portions and then apply the OCR system for the respective script to each identified portion. In this paper, an SVM-based method is proposed for word-wise identification of printed English and Thai script within a single line of a document page. The document is first segmented into lines, and the lines are then segmented into character groups (words). In the proposed scheme, the script of each character group is identified by combining character features obtained from structural shape, profile, component overlapping information, topological properties, the water reservoir concept, and so on. In experiments on 6110 data samples, the proposed scheme achieved 99.36% script identification accuracy.
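The abstract names three stages: line segmentation, segmentation of lines into character groups, and SVM classification of each group. The following minimal sketch illustrates those stages under stated assumptions. It is not the authors' implementation. The page is assumed to be a binarized image given as rows of 0/1, segmentation is done with projection profiles, and the word-gap threshold `min_gap` is a hypothetical parameter. `features` is a hypothetical toy feature vector, not the paper's structural, overlapping, topological, or water-reservoir features. The linear SVM is trained with Pegasos (stochastic sub-gradient descent), a swapped-in technique, because the abstract does not state the kernel or solver. The label convention (for example, +1 for Thai and -1 for English) is also an assumption.

```python
import random


def runs(profile):
    """Return (start, end) pairs of maximal runs where profile > 0 (end exclusive)."""
    out, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(profile)))
    return out


def segment_lines(img):
    """Split a binary page (rows of 0/1) into text lines via the horizontal projection profile."""
    return [img[a:b] for a, b in runs([sum(row) for row in img])]


def segment_words(line, min_gap=2):
    """Split a text line into character groups using the vertical projection profile.

    Column runs separated by fewer than min_gap blank columns (hypothetical
    threshold) are merged into one character group.
    """
    profile = [sum(row[c] for row in line) for c in range(len(line[0]))]
    groups = []
    for a, b in runs(profile):
        if groups and a - groups[-1][1] < min_gap:
            groups[-1] = (groups[-1][0], b)
        else:
            groups.append((a, b))
    return [[row[a:b] for row in line] for a, b in groups]


def features(word):
    """Hypothetical toy features (NOT the paper's): ink density,
    height/width ratio, and share of ink in the top quarter of rows."""
    h, w = len(word), len(word[0])
    ink = sum(map(sum, word))
    top = sum(map(sum, word[: max(1, h // 4)]))
    return [ink / (h * w), h / w, top / ink if ink else 0.0]


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Linear SVM trained with Pegasos (sub-gradient descent on the hinge-loss primal).

    Labels must be +1/-1. A constant 1.0 feature is appended as the bias,
    so the bias is regularized too (a simplification of the standard SVM).
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Xb)), len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularization step)
            if margin < 1:  # hinge loss active: step toward correct side
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of the decision function w . [x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

In use, each character group returned by `segment_words` would be mapped to a feature vector and passed to `predict` with weights learned from labelled Thai and English words. A kernel SVM (for example, one from an existing SVM library) could replace the linear Pegasos trainer without changing the segmentation stages.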
Citation:
S. Chanda, O.R. Terrades, U. Pal, "SVM Based Scheme for Thai and English Script Identification," in Proc. Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 1, pp. 551-555, 2007.