Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping
Houston, Texas, November 27-30
DOI: http://doi.ieeecomputersociety.org/10.1109/ICDM.2005.18
Fifth IEEE International Conference o ...
Mikhail Bilenko, University of Texas at Austin
Sugato Basu, University of Texas at Austin
Mehran Sahami, Google Inc.
The problem of record linkage focuses on determining whether two object descriptions refer to the same underlying entity. Addressing this problem effectively has many practical applications, e.g., elimination of duplicate records in databases and citation matching for scholarly articles. In this paper, we consider a new domain where the record linkage problem is manifested: Internet comparison shopping. We address the resulting linkage setting that requires learning a similarity function between record pairs from streaming data. The learned similarity function is subsequently used in clustering to determine which records are co-referent and should be linked. We present an online machine learning method for addressing this problem, where a composite similarity function based on a linear combination of basis functions is learned incrementally. We illustrate the efficacy of this approach on several real-world datasets from an Internet comparison shopping site, and show that our method is able to effectively learn various distance functions for product data with differing characteristics. We also provide experimental results that show the importance of considering multiple performance measures in record linkage evaluation.
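The abstract describes the method only at a high level: a composite similarity function, built as a linear combination of basis functions, is updated incrementally from a stream of labeled record pairs and then used in clustering to decide which records to link. The sketch below illustrates that shape under explicit assumptions. The basis functions (`token_jaccard`, `exact_match`), the perceptron-style update rule, the class and function names, and the threshold-based single-link clustering via union-find are all hypothetical choices for illustration, not the paper's published features or algorithm. The toy records are invented.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
BasisFn = Callable[[Record, Record], float]


def token_jaccard(field: str) -> BasisFn:
    """Hypothetical basis function: Jaccard overlap of lowercase tokens in one field."""
    def sim(a: Record, b: Record) -> float:
        ta = set(a.get(field, "").lower().split())
        tb = set(b.get(field, "").lower().split())
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0
    return sim


def exact_match(field: str) -> BasisFn:
    """Hypothetical basis function: 1.0 if the field values match (case-insensitive)."""
    def sim(a: Record, b: Record) -> float:
        return 1.0 if a.get(field, "").lower() == b.get(field, "").lower() else 0.0
    return sim


class OnlineLinearSimilarity:
    """Composite similarity: bias + sum_i w_i * phi_i(a, b), learned one pair at a time."""

    def __init__(self, basis: List[BasisFn], lr: float = 1.0) -> None:
        self.basis = basis
        self.w = [0.0] * len(basis)
        self.b = 0.0
        self.lr = lr

    def features(self, a: Record, b: Record) -> List[float]:
        return [phi(a, b) for phi in self.basis]

    def score(self, a: Record, b: Record) -> float:
        return self.b + sum(w * x for w, x in zip(self.w, self.features(a, b)))

    def update(self, a: Record, b: Record, same: bool) -> None:
        # Perceptron rule (an assumed learner): adjust weights only on a mistake.
        y = 1.0 if same else -1.0
        x = self.features(a, b)
        if y * self.score(a, b) <= 0:
            self.w = [w + self.lr * y * xi for w, xi in zip(self.w, x)]
            self.b += self.lr * y


def cluster(records: List[Record], model: OnlineLinearSimilarity,
            threshold: float = 0.0) -> List[List[int]]:
    """Link every pair scoring above `threshold`; return connected components."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if model.score(records[i], records[j]) > threshold:
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy labeled stream (invented for illustration), replayed for two passes.
r0 = {"title": "canon powershot sd400", "brand": "canon"}
r1 = {"title": "canon sd400 digital camera", "brand": "canon"}
r2 = {"title": "nikon coolpix 4600", "brand": "nikon"}
r3 = {"title": "nikon coolpix 4600 silver", "brand": "nikon"}
stream = [(r0, r1, True), (r0, r2, False)]

model = OnlineLinearSimilarity([token_jaccard("title"), exact_match("brand")])
for _ in range(2):
    for a, b, same in stream:
        model.update(a, b, same)

print(model.w, model.b)                  # [0.4, 1.0] -1.0
print(cluster([r0, r1, r2, r3], model))  # [[0, 1], [2, 3]]
```

In this toy run the pair (r2, r3) never appears in the training stream, yet the learned weights link it, while the cross-brand pairs score -1.0 and stay apart. A realistic setup would use many more basis functions and far more labeled pairs; the update rule and the clustering step are the parts most likely to differ from what the paper actually does.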
Citation:
Mikhail Bilenko, Sugato Basu, Mehran Sahami, "Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping," Fifth IEEE International Conference on Data Mining (ICDM'05), pp. 58-65, 2005.