A Disc-based Approach to Data Summarization and Privacy Preservation
Vienna, Austria July 03-July 05
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/SSDBM.2006.6
18th International Conference on Scie ...
Rong Ge, Simon Fraser University
Martin Ester, Simon Fraser University
Wen Jin, Simon Fraser University
Zengjian Hu, Simon Fraser University
Data summarization has been recognized as a fundamental operation in database systems and data mining, with important applications such as data compression and privacy preservation. While existing methods such as CF-values and Data Bubbles may perform reasonably well, they cannot provide any guarantees on the quality of their results. In this paper, we introduce a summarization approach for numerical data based on discs, formalizing the notion of quality. Our objective is to find a minimal set of discs, i.e., spheres satisfying a radius and a significance constraint, covering the given dataset. Since the proposed problem is NP-complete, we design two different approximation algorithms. These algorithms have a quality guarantee, but they do not scale well to large databases. However, the machinery from approximation algorithms allows a precise characterization of a further, heuristic algorithm. This efficient heuristic algorithm exploits multi-dimensional index structures and can be well integrated with database systems. The experiments show that our heuristic algorithm generates summaries that outperform the state-of-the-art Data Bubbles in terms of internal measures as well as external measures when the data summaries are used as input for clustering methods.
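To make the covering objective concrete, the following is a minimal sketch of a generic greedy set-cover heuristic, not the approximation or heuristic algorithms of the paper. It assumes that the radius constraint means a fixed upper bound on the disc radius, that the significance constraint means a minimum number of points inside a disc, and that candidate discs are centered on data points. The names greedy_disc_cover, max_radius, and min_points are hypothetical and chosen for illustration only.

```python
import math


def greedy_disc_cover(points, max_radius, min_points):
    """Greedy set-cover sketch (illustrative, not the paper's algorithm).

    Assumptions: candidate discs are centered on data points, all discs use
    radius max_radius, and a disc is "significant" if it contains at least
    min_points points. Returns (discs, outliers).
    """
    n = len(points)
    # neighbors[i] holds the indices of all points inside the disc at points[i].
    neighbors = [
        {j for j in range(n) if math.dist(points[i], points[j]) <= max_radius}
        for i in range(n)
    ]
    # Keep only candidate discs that satisfy the significance constraint.
    candidates = [i for i in range(n) if len(neighbors[i]) >= min_points]
    # Points that lie in at least one significant disc can be covered.
    coverable = set().union(*(neighbors[i] for i in candidates))
    uncovered = set(coverable)
    discs = []
    while uncovered:
        # Pick the candidate disc that covers the most uncovered points.
        best = max(candidates, key=lambda i: len(neighbors[i] & uncovered))
        discs.append((points[best], max_radius))
        uncovered -= neighbors[best]
    # Points in no significant disc are reported separately.
    outliers = [points[i] for i in range(n) if i not in coverable]
    return discs, outliers


if __name__ == "__main__":
    data = [(0, 0), (0.5, 0), (1, 0), (10, 10), (10.5, 10), (50, 50)]
    discs, outliers = greedy_disc_cover(data, max_radius=1.0, min_points=2)
    print(len(discs), "discs;", len(outliers), "uncovered point(s)")
```

The pairwise distance computation is quadratic in the number of points, which is consistent with the abstract's remark that scalability is a concern; replacing it with range queries on a multi-dimensional index would be one way to reduce that cost, again as an assumption rather than a description of the authors' method.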
Citation:
Rong Ge, Martin Ester, Wen Jin, Zengjian Hu, "A Disc-based Approach to Data Summarization and Privacy Preservation," 18th International Conference on Scientific and Statistical Database Management (SSDBM'06), pp. 321-332, 2006