On the Use of Semantic Blocking Techniques for Data Cleansing and Integration
Banff, Alberta, Canada September 06-September 08
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/IDEAS.2007.36
11th International Database Engineeri ...
Jordi Nin, IIIA, CSIC, Spain
Victor Muntes-Mulero, DAMA-UPC, Spain
Norbert Martinez-Bazan, DAMA-UPC, Spain
Josep-L. Larriba-Pey, DAMA-UPC, Spain
Record Linkage (RL) is an important component of data cleansing and integration. For years, many efforts have focused on improving the performance of the RL process by reducing either the number of record comparisons or the number of attribute comparisons. This reduces computation time, but very often degrades the quality of the results. However, the real bottleneck of RL is the post-process, in which experts must review the results and decide which pairs or groups of records are real links and which are false hits.

In this paper, we show that exploiting the relationships (e.g., foreign keys) established between one or more data sources makes it possible to obtain a new kind of semantic blocking method that increases the number of hits and reduces the review effort.

Index Terms:
Semantic information, blocking algorithms, record linkage, data integration, data cleansing.
Citation:
Jordi Nin, Victor Muntes-Mulero, Norbert Martinez-Bazan, Josep-L. Larriba-Pey, "On the Use of Semantic Blocking Techniques for Data Cleansing and Integration," ideas, pp.190-198, 11th International Database Engineering and Applications Symposium (IDEAS 2007), 2007