Unsupervised Clustering of Text Entities in Heterogeneous Grey Level Documents
Quebec City, QC, Canada August 11-August 15
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICPR.2002.1047835
16th International Conference on Patt ...
Stéphane Bres, INSA de Lyon
Véronique Eglin, INSA de Lyon
Antoine Gagneux, INSA de Lyon
This paper presents a new method for the functional classification of text blocks in a document. It is based on texture analysis and unsupervised classification. Texture is used to define different classes of text blocks in the document and to suggest an order of exploration from the most eye-catching data to the least significant text block. The typographical properties of blocks are characterized by two main discriminating primitives: the complexity of the text drawing and the structural relief of the block. This analysis is the starting point of a three-class categorization into functional families (main headings, sub-headings and text paragraphs). Each block of text is described and classified through a labeling process based on a 3D feature space using the two previous features (complexity and structural relief) and a third one chosen among pattern primitives, block size and location in the document. This method provides a first approach to a global context-free classification of documents.
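The labeling step described in the abstract can be illustrated with a small sketch. The abstract does not say which clustering algorithm the authors use, so plain k-means stands in here as an assumption. The feature values are invented for illustration, block size is arbitrarily picked as the third feature, and the rule that ranks clusters by the sum of their centroid coordinates (higher meaning more eye-catching) is likewise hypothetical.

```python
from math import dist

def kmeans(points, k=3, iterations=50):
    """Plain k-means on tuples of floats; returns (labels, centroids)."""
    # Deterministic farthest-point initialisation (an assumption, for reproducibility).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )

    def assign(cs):
        # Index of the nearest centroid for every point.
        return [min(range(k), key=lambda j: dist(p, cs[j])) for p in points]

    for _ in range(iterations):
        labels = assign(centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New centroid is the per-axis mean of the cluster members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return assign(centroids), centroids

def name_clusters(centroids):
    """Hypothetical ranking rule: larger coordinate sum = more eye-catching."""
    names = ["main heading", "sub-heading", "text paragraph"]
    order = sorted(range(len(centroids)), key=lambda j: -sum(centroids[j]))
    return {cluster: names[rank] for rank, cluster in enumerate(order)}

# Invented (complexity, structural relief, block size) triples, scaled to [0, 1].
blocks = [
    (0.90, 0.85, 0.95), (0.88, 0.80, 0.90),
    (0.55, 0.50, 0.45), (0.50, 0.55, 0.50),
    (0.15, 0.10, 0.20), (0.10, 0.15, 0.15), (0.12, 0.12, 0.18),
]

labels, centroids = kmeans(blocks, k=3)
cluster_names = name_clusters(centroids)
roles = [cluster_names[lab] for lab in labels]
```

Here `roles` holds one functional family per block. With real documents the three features would come from the texture analysis rather than from hand-written values, and the ranking rule would need to reflect how the features are actually scaled.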
Citation:
Stéphane Bres, Véronique Eglin, Antoine Gagneux, "Unsupervised Clustering of Text Entities in Heterogeneous Grey Level Documents," icpr, vol. 3, pp. 30224, 16th International Conference on Pattern Recognition (ICPR'02) - Volume 3, 2002