Distance Measures for Layout-Based Document Image Retrieval
Lyon, France April 27-April 28
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/DIAL.2006.16
Second International Conference on Do ...
Joost Van Beusekom, Technical University of Kaiserslautern, Germany
Daniel Keysers, Technical University of Kaiserslautern, Germany
Faisal Shafait, Technical University of Kaiserslautern, Germany
Thomas M. Breuel, Technical University of Kaiserslautern, Germany
Most methods for document image retrieval rely solely on text information to find similar documents. This paper describes a way to use layout information for document image retrieval instead. A new class of distance measures is introduced for documents with Manhattan layouts, based on a two-step procedure: First, the distances between the blocks of two layouts are calculated. Then, the blocks of one layout are assigned to the blocks of the other layout in a matching step. Different block distances and matching methods are compared and evaluated using the publicly available MARG database. On this dataset, the layout type can be determined successfully in 92.6% of the cases using the best distance measure in a nearest neighbor classifier. The experiments show that the best distance measure for this task is the overlapping area combined with the Manhattan distance of the corner points as block distance together with the minimum weight edge cover matching.
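The two-step procedure described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say how the overlapping area and the corner-point Manhattan distance are combined, so the sum used in `block_distance` is a guess, and blocks are assumed to be axis-aligned rectangles with coordinates normalized to the unit page. The matching step is also simplified: instead of a true minimum weight edge cover, each block is linked to its nearest block in the other layout. The union of these links is a valid edge cover, but not necessarily one of minimum weight. All function names are hypothetical.

```python
from typing import List, Tuple

# A block is (x0, y0, x1, y1), assumed normalized to the unit page.
Block = Tuple[float, float, float, float]
Layout = List[Block]

def area(b: Block) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def overlap_area(a: Block, b: Block) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)

def corner_manhattan(a: Block, b: Block) -> float:
    # Manhattan distance between the top-left and bottom-right corners.
    return sum(abs(p - q) for p, q in zip(a, b))

def block_distance(a: Block, b: Block) -> float:
    # Assumed combination: corner distance plus the non-overlapping fraction.
    largest = max(area(a), area(b))
    ratio = overlap_area(a, b) / largest if largest > 0 else 0.0
    return corner_manhattan(a, b) + (1.0 - ratio)

def layout_distance(la: Layout, lb: Layout) -> float:
    # Simplified matching: nearest-block edge cover (not minimum weight).
    if not la or not lb:
        return 0.0 if not la and not lb else float("inf")
    edges = set()
    for i, a in enumerate(la):
        j = min(range(len(lb)), key=lambda k: block_distance(a, lb[k]))
        edges.add((i, j))
    for j, b in enumerate(lb):
        i = min(range(len(la)), key=lambda k: block_distance(la[k], b))
        edges.add((i, j))
    return sum(block_distance(la[i], lb[j]) for i, j in edges)

def classify(query: Layout, labeled: List[Tuple[Layout, str]]) -> str:
    # Nearest neighbor classification of the layout type.
    return min(labeled, key=lambda item: layout_distance(query, item[0]))[1]

# Made-up example layouts (not taken from the MARG database).
one_col = [(0.1, 0.1, 0.9, 0.2), (0.1, 0.25, 0.9, 0.9)]
two_col = [(0.1, 0.1, 0.9, 0.2), (0.1, 0.25, 0.48, 0.9), (0.52, 0.25, 0.9, 0.9)]
query = [(0.1, 0.12, 0.9, 0.2), (0.1, 0.26, 0.47, 0.9), (0.53, 0.26, 0.9, 0.9)]
label = classify(query, [(one_col, "one-column"), (two_col, "two-column")])
print(label)
```

Allowing a block to be linked to more than one partner is what distinguishes an edge cover from a one-to-one assignment: a single wide block in one layout can absorb two narrower columns in the other, as happens when the example query is compared against the one-column layout.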
Citation:
Joost Van Beusekom, Daniel Keysers, Faisal Shafait, Thomas M. Breuel, "Distance Measures for Layout-Based Document Image Retrieval," Second International Conference on Document Image Analysis for Libraries (DIAL'06), pp. 232-242, 2006