H. Guo, Stony Brook University, NY
In this paper, we describe a general-purpose approach for partitioning Web page content. The novelty of our approach lies in the use of detailed layout information from a Web page renderer to determine spatial locality and identify visual separators, and the use of relaxed matching over presentation style information to determine presentation style similarity. We present several examples to illustrate the generality of our approach.
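As a rough illustration of what relaxed matching over presentation style could look like, the sketch below treats two elements as similar in style when they agree on a sufficient fraction of their style attributes, rather than requiring an exact match. This is not the paper's algorithm: the function names, the attribute dictionaries, and the 0.75 threshold are all assumptions made for the example.

```python
from typing import Dict


def style_similarity(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Return the fraction of style attributes on which a and b agree.

    The fraction is taken over the union of attribute names, so an
    attribute present on only one element counts as a disagreement.
    (Hypothetical measure, chosen only for illustration.)
    """
    keys = set(a) | set(b)
    if not keys:
        return 1.0  # two elements with no style information match trivially
    agree = sum(1 for k in keys if a.get(k) == b.get(k))
    return agree / len(keys)


def similar_style(a: Dict[str, str], b: Dict[str, str],
                  threshold: float = 0.75) -> bool:
    """Relaxed match: similar if enough attributes agree (assumed threshold)."""
    return style_similarity(a, b) >= threshold


# Hypothetical style records, as a renderer might report them.
heading = {"font-family": "Arial", "font-size": "12px",
           "color": "#000", "font-weight": "bold"}
near_heading = {"font-family": "Arial", "font-size": "12px",
                "color": "#000", "font-weight": "normal"}
body_text = {"font-family": "Times", "font-size": "18px",
             "color": "#000", "font-weight": "bold"}
```

Under these assumptions, `heading` and `near_heading` differ in one attribute out of four and would be grouped together, while `heading` and `body_text` agree on only half of their attributes and would not.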
Citation:
H. Guo, J. Mahmud, Y. Borodin, A. Stent, I. Ramakrishnan, "A General Approach for Partitioning Web Page Content Based on Geometric and Style Information," Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, pp. 929-933, 2007