An Efficient Word Segmentation Technique for Historical and Degraded Machine-Printed Documents
Curitiba, Parana, Brazil September 23-September 26
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDAR.2007.51
Ninth International Conference on Doc ...
M. Makridis, Democritus University of Thrace, 67 100 Xanthi, Greece
N. Nikolaou, Democritus University of Thrace, 67 100 Xanthi, Greece
B. Gatos, Democritus University of Thrace, 67 100 Xanthi, Greece
Word segmentation is a crucial step for segmentation-free document analysis systems and is used for creating an index based on word matching. In this paper, we propose a novel methodology for word segmentation in historical and degraded machine-printed documents. The proposed technique addresses problems such as text of varying size, text and non-text areas lying very close together, and non-straight or warped text lines. It is based on: (i) a dynamic run length smoothing algorithm that helps group together homogeneous text regions, (ii) removal of noise and punctuation marks, together with obstacle detection, to facilitate the segmentation process, and (iii) a draft text line estimation procedure that guides the final word segmentation result. After testing on numerous historical and degraded machine-printed documents, our methodology was found to perform better than current state-of-the-art word segmentation techniques for such documents.
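For readers unfamiliar with run length smoothing, the following is a minimal sketch of the classic fixed-threshold horizontal variant, not the dynamic algorithm proposed in the paper, whose details are not given in this abstract. The function names and the threshold value are assumptions chosen for illustration only. The idea is that short background gaps between foreground pixels are filled, so that the characters of a word merge into one connected region.

```python
def rlsa_horizontal(row, threshold):
    """Classic horizontal run length smoothing on one binary row.

    row: list of 0/1 ints, where 1 is a foreground (ink) pixel.
    threshold: a background run of at most this many pixels that lies
    between two foreground pixels is filled with 1 (assumed parameter).
    """
    out = list(row)
    last_ink = None  # index of the most recent foreground pixel seen
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_ink is not None:
                gap = i - last_ink - 1
                if 0 < gap <= threshold:
                    # Fill the short gap between two ink pixels.
                    for j in range(last_ink + 1, i):
                        out[j] = 1
            last_ink = i
    return out


def smooth_image(image, threshold):
    """Apply horizontal smoothing independently to every row of a binary image."""
    return [rlsa_horizontal(row, threshold) for row in image]


# Hypothetical example: a gap of 2 is filled, a gap of 4 is kept.
example = rlsa_horizontal([1, 0, 0, 1, 0, 0, 0, 0, 1, 0], threshold=2)
```

With a fixed threshold, the gap that separates characters from the gap that separates words has to be chosen once for the whole page. A dynamic variant such as the one the abstract describes would presumably adapt that choice to local text properties, which matters when text of different sizes appears on the same page.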
Citation:
M. Makridis, N. Nikolaou, B. Gatos, "An Efficient Word Segmentation Technique for Historical and Degraded Machine-Printed Documents," Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 1, pp. 178-182, 2007