Statistical-Based Approach to Word Segmentation
Barcelona, Spain September 03-September 08
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICPR.2000.902980
15th International Conference on Patt ...
Yalin Wang, University of Washington
Robert Haralick, University of Washington
Ihsin T. Phillips, Seattle University
This paper presents a text word extraction algorithm that takes a set of glyph bounding boxes and their associated text lines for a given document and partitions the glyphs into a set of text words, using only the geometric information of the input glyphs. The algorithm is probability based: an iterative, relaxation-like method finds the partitioning that maximizes the joint probability. To evaluate the performance of our text word extraction algorithm, we used a three-fold validation method and developed a quantitative performance measure. The algorithm was evaluated on the UW-III database of some 1600 scanned document image pages. An area-overlap measure was used to find the correspondence between the detected entities and the ground truth. Of a total of 827,433 ground-truth words, the algorithm identified and segmented 806,149 correctly, an accuracy of 97.43%.
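The abstract does not define the area-overlap measure, so the following is only a minimal sketch of how such a correspondence check might look, not the authors' method. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max), counts a ground-truth word as correct only when it overlaps exactly one detected word (and that detection overlaps no other ground-truth word), and uses a hypothetical overlap threshold of 0.9. The names Box, overlap_area, count_correct and threshold are illustrative assumptions.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max), axis-aligned.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of a box; degenerate boxes have area 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def count_correct(ground_truth: List[Box], detected: List[Box],
                  threshold: float = 0.9) -> int:
    """Count ground-truth words with a one-to-one, high-overlap detection.

    The threshold is a hypothetical value chosen for illustration.
    """
    correct = 0
    for g in ground_truth:
        matches = [d for d in detected if overlap_area(g, d) > 0]
        if len(matches) != 1:
            continue  # missed, or split into several detected words
        d = matches[0]
        if sum(1 for g2 in ground_truth if overlap_area(g2, d) > 0) != 1:
            continue  # detection merges several ground-truth words
        inter = overlap_area(g, d)
        if inter >= threshold * area(g) and inter >= threshold * area(d):
            correct += 1
    return correct


# Toy example: the second ground-truth word is split into two detections.
gt = [(0, 0, 10, 5), (12, 0, 20, 5)]
det = [(0, 0, 10, 5), (12, 0, 16, 5), (16, 0, 20, 5)]
toy_accuracy = count_correct(gt, det) / len(gt)

# The reported accuracy follows directly from the counts in the abstract.
reported_accuracy = round(100 * 806149 / 827433, 2)
```

In the toy example only the first word is counted as correct, since the second is split in two. The last line reproduces the reported figure from the two word counts given in the abstract.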
Citation:
Yalin Wang, Robert Haralick, Ihsin T. Phillips, "Statistical-Based Approach to Word Segmentation," icpr, vol. 4, pp.4555, 15th International Conference on Pattern Recognition (ICPR'00) - Volume 4, 2000