Abstract
Privacy and security considerations can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery can alleviate this problem. We present a technique that uses EM mixture modeling to perform clustering on distributed data. This method controls data sharing, preventing disclosure of individual data items or any results that can be traced to an individual site.
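The abstract gives no protocol details, so the following is only a minimal sketch of how EM mixture modeling can run over distributed data while sharing aggregates rather than records. It assumes a one-dimensional Gaussian mixture, data split by rows across sites, and a toy masked sum standing in for a real secure-sum protocol. All function names (`local_stats`, `masked_sum`, `distributed_em`) are hypothetical and are not taken from the paper.

```python
import math
import random


def local_stats(xs, weights, means, variances):
    """E-step on one site's private data; returns only per-component sums."""
    k = len(weights)
    n, sx, sxx = [0.0] * k, [0.0] * k, [0.0] * k
    for x in xs:
        # Weighted Gaussian density of x under each component.
        dens = [
            weights[j]
            * math.exp(-((x - means[j]) ** 2) / (2 * variances[j]))
            / math.sqrt(2 * math.pi * variances[j])
            for j in range(k)
        ]
        total = sum(dens)
        for j in range(k):
            r = dens[j] / total  # responsibility of component j for x
            n[j] += r
            sx[j] += r * x
            sxx[j] += r * x * x
    return n + sx + sxx  # flat vector of 3k aggregates


def masked_sum(vectors, rng):
    """Toy stand-in for a secure sum (assumption, not cryptographically secure).

    A random mask starts the running total, each site adds its vector,
    and the mask is subtracted at the end, so intermediate totals do not
    expose any single site's contribution. Simulated in one process here.
    """
    mask = [rng.uniform(-1e6, 1e6) for _ in vectors[0]]
    running = list(mask)
    for v in vectors:
        running = [a + b for a, b in zip(running, v)]
    return [a - m for a, m in zip(running, mask)]


def distributed_em(sites, init_means, iters=50, seed=0):
    """Fit a 1-D Gaussian mixture using only summed statistics from the sites."""
    rng = random.Random(seed)
    k = len(init_means)
    weights = [1.0 / k] * k
    means = list(init_means)
    variances = [1.0] * k
    for _ in range(iters):
        per_site = [local_stats(xs, weights, means, variances) for xs in sites]
        stats = masked_sum(per_site, rng)
        n, sx, sxx = stats[:k], stats[k:2 * k], stats[2 * k:]
        total = sum(n)
        # M-step from the global aggregates only.
        weights = [n[j] / total for j in range(k)]
        means = [sx[j] / n[j] for j in range(k)]
        variances = [max(sxx[j] / n[j] - means[j] ** 2, 1e-6) for j in range(k)]
    return weights, means, variances


# Synthetic example: three sites, two clusters centred near 0 and 8.
data_rng = random.Random(1)
sites = [
    [data_rng.gauss(0.0, 1.0) for _ in range(200)],
    [data_rng.gauss(8.0, 1.0) for _ in range(200)],
    [data_rng.gauss(0.0, 1.0) for _ in range(100)]
    + [data_rng.gauss(8.0, 1.0) for _ in range(100)],
]
weights, means, variances = distributed_em(sites, init_means=[1.0, 6.0])
```

In this sketch each site exposes only three sums per component per iteration, and the coordinator sees only their total. Whether that matches the guarantees claimed in the abstract depends on the actual protocol in the paper, which this example does not reproduce.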
Lin, X., Clifton, C. & Zhu, M. Privacy-preserving clustering with distributed EM mixture modeling. Knowl Inf Syst 8, 68–81 (2005). https://doi.org/10.1007/s10115-004-0148-7
DOI: https://doi.org/10.1007/s10115-004-0148-7