Page Segmentation for Manhattan and Non-Manhattan Layout Documents via Selective CRLA
Seoul, Korea, August 31-September 1
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDAR.2005.185
Eighth International Conference on Do ...
Hung-Ming Sun, Kainan University, Taoyuan, Taiwan, R.O.C.
The Constrained Run-Length Algorithm (CRLA) is a well-known technique for page segmentation. The algorithm is fast and can partition documents with Manhattan layouts. It is not, however, suited to pages whose layouts go beyond the Manhattan format, such as irregular halftone images embedded in text paragraphs. This paper presents a modified version of the CRLA, named selective CRLA, which can process documents with both Manhattan and non-Manhattan layouts. The selective CRLA is run twice, with a different set of parameters each time, on a label image derived from the input document image. After both passes, the resulting text regions are extracted. The proposed method has been successfully applied to the extraction of text from commercial magazine pages with complicated layouts.
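The abstract describes the CRLA only in outline. As a rough sketch of the baseline that the paper modifies, assuming the classic run-length smoothing formulation (background runs no longer than a threshold C that lie between foreground pixels are filled; a horizontal and a vertical pass are combined with a logical AND, then smoothed horizontally once more), the Python below illustrates the idea. It does not implement the paper's selective CRLA or its label image. The function names and threshold parameters are hypothetical, and treating runs that touch the image border as unfillable is an assumption, since implementations differ on this point.

```python
from typing import List

Row = List[int]      # 1 = foreground (black), 0 = background (white)
Image = List[Row]


def smooth_row(row: Row, c: int) -> Row:
    """Fill background runs of length <= c that lie between two foreground pixels.

    Assumption: runs touching either end of the row are left unchanged.
    """
    out = list(row)
    n = len(row)
    i = 0
    while i < n:
        if row[i] == 0:
            # Find the end of this background run, row[i:j].
            j = i
            while j < n and row[j] == 0:
                j += 1
            # Fill only interior runs that are short enough.
            if i > 0 and j < n and (j - i) <= c:
                for k in range(i, j):
                    out[k] = 1
            i = j
        else:
            i += 1
    return out


def smooth_horizontal(img: Image, c: int) -> Image:
    """Apply run-length smoothing to every row."""
    return [smooth_row(r, c) for r in img]


def smooth_vertical(img: Image, c: int) -> Image:
    """Apply run-length smoothing to every column by transposing twice."""
    cols = [smooth_row(list(col), c) for col in zip(*img)]
    return [list(r) for r in zip(*cols)]


def crla(img: Image, c_h: int, c_v: int, c_final: int) -> Image:
    """Classic CRLA sketch: H-smooth AND V-smooth, then a final H-smooth."""
    h = smooth_horizontal(img, c_h)
    v = smooth_vertical(img, c_v)
    both = [[a & b for a, b in zip(rh, rv)] for rh, rv in zip(h, v)]
    return smooth_horizontal(both, c_final)
```

For example, `smooth_row([1, 0, 0, 1, 0, 0, 0, 1], 2)` fills the two-pixel gap but keeps the three-pixel gap, giving `[1, 1, 1, 1, 0, 0, 0, 1]`. The connected foreground blocks left after `crla` are the candidate regions; the paper's contribution is to make this filling selective so that non-rectangular (non-Manhattan) regions are not merged with text.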
Citation:
Hung-Ming Sun, "Page Segmentation for Manhattan and Non-Manhattan Layout Documents via Selective CRLA," icdar, pp.116-120, Eighth International Conference on Document Analysis and Recognition (ICDAR'05), 2005