loading...
Privacy-preserving Distributed Clustering using Generative Models
Melbourne, Florida November 19-November 22
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2003.1250922Third IEEE International Conference o ...
 This Article 
 
PDF
HTML
IEEE Xplore Subscribers
 
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Srujana Merugu, University of Texas at Austin
Joydeep Ghosh, University of Texas at Austin
We present a framework for clustering distributed data in unsupervised and semi-supervised scenarios, taking into account privacy requirements and communication costs. Rather than sharing parts of the original or perturbed data, we instead transmit the parameters of suitable generative models built at each local data site to a central location. We mathematically show that the best representative of all the data is a certain "mean" model, and empirically show that this model can be approximated quite well by generating artificial samples from the underlying distributions using Markov Chain Monte Carlo techniques, and then fitting a combined global model with a chosen parametric form to these samples. We also propose a new measure that quantifies privacy based on information theoretic concepts, and show that decreasing privacy leads to a higher quality of the combined model and vice versa. We provide empirical results on different data types to highlight the generality of our framework. The results show that high quality distributed clustering can be achieved with little privacy loss and low communication cost.
Citation:
Srujana Merugu, Joydeep Ghosh, "Privacy-preserving Distributed Clustering using Generative Models," icdm, pp.211, Third IEEE International Conference on Data Mining (ICDM'03), 2003
Usage of this product signifies your acceptance of the Terms of Use.