Scaling up the ALIAS Duplicate Elimination System: A Demonstration
Bangalore, India, March 5-8, 2003
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDE.2003.1260867
19th International Conference on Data ...
Sunita Sarawagi, Indian Institute of Technology Bombay
Alok Kirpal, Indian Institute of Technology Bombay
Duplicate elimination is an important stage in integrating data from multiple sources. It poses two challenges: finding a robust deduplication function that can tell when two records are duplicates, and applying that function efficiently to very large lists of records. ALIAS eases the task of designing a deduplication function by learning it from examples of duplicates and non-duplicates, using active learning to find such examples effectively [1]. Here we investigate the issues involved in efficiently applying the learned deduplication system to large lists of records. We demonstrate how the ALIAS evaluation engine works and highlight the optimizations it uses to significantly reduce the number of record pairs that must be explicitly materialized.
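The abstract does not spell out which optimizations the ALIAS evaluation engine uses. As a rough illustration of the general idea of not materializing every one of the n(n-1)/2 record pairs, the sketch below uses token blocking with an inverted index, a common technique swapped in here and not necessarily what ALIAS does. The whitespace tokenizer, the `max_block` threshold, and the Jaccard similarity used as a stand-in for the learned deduplication function are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def tokens(record: str) -> set[str]:
    # Hypothetical tokenizer: lowercase, whitespace-separated words.
    return set(record.lower().split())


def candidate_pairs(records: list[str], max_block: int = 50) -> set[tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose records share at least one token.

    Tokens that occur in more than max_block records are skipped as too
    common to be useful for blocking (max_block is a hypothetical knob).
    """
    index: dict[str, list[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        for tok in tokens(rec):
            index[tok].append(i)  # each list is built in increasing index order

    pairs: set[tuple[int, int]] = set()
    for ids in index.values():
        if len(ids) > max_block:
            continue
        # ids is sorted, so combinations() always yields (i, j) with i < j.
        pairs.update(combinations(ids, 2))
    return pairs


def jaccard(a: str, b: str) -> float:
    # Placeholder similarity; NOT the classifier that ALIAS learns.
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def find_duplicates(records, is_duplicate, max_block=50):
    # Apply the (expensive, learned) predicate only to surviving candidates.
    return sorted(
        (i, j)
        for i, j in candidate_pairs(records, max_block)
        if is_duplicate(records[i], records[j])
    )
```

For example, with the four records `"Sunita Sarawagi IIT Bombay"`, `"S. Sarawagi IIT Bombay"`, `"Alok Kirpal IIT Bombay"` and `"John Smith Stanford"`, only 3 of the 6 possible pairs are generated, and `find_duplicates(records, lambda a, b: jaccard(a, b) >= 0.5)` returns `[(0, 1)]`. Skipping very frequent tokens trades a little recall for far fewer candidate pairs.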
Citation:
Sunita Sarawagi, Alok Kirpal, "Scaling up the ALIAS Duplicate Elimination System: A Demonstration," 19th International Conference on Data Engineering (ICDE'03), pp. 783, 2003.