Abstract
Privacy preserving data mining has gained considerable attention because of the increased concerns to ensure privacy of sensitive information. Amongst the two basic approaches for privacy preserving data mining, viz. Randomization based and Cryptography based, the later provides high level of privacy but incurs higher computational as well as communication overhead. Hence, it is necessary to explore alternative techniques that improve the overheads. In this work, we propose an efficient, collusion-resistant cryptography based approach for distributed K-Means clustering using Shamir’s secret sharing scheme. As we show from theoretical and practical analysis, our approach is provably secure and does not require a trusted third party. In addition, it has negligible computational overhead as compared to the existing approaches.
Chapter PDF
Similar content being viewed by others
References
Shaneck, M., Kim, Y., Kumar, V.: Privacy Preserving Nearest Neighbor Search. In: ICDM Workshops, pp. 541–545 (2006)
Aggarwal, C.C., Yu, P.S.: Privacy-Preserving Data Mining: A Survey. In: Michael, G., Sushil, J. (eds.) Handbook of Database Security, pp. 431–460. Springer (2007)
Wu., X., Chu, C.H., Wang, Y., Liu, F., Yue, D.: Privacy preserving data mining research: current status and key issues. In: 7th International Conference on Computational Science ICCS 2007, pp. 762–772 (2007)
Goldreich, O.: The Foundations of Cryptography, vol. 2. Cambridge Univ. Press, Cambridge (2004)
Pedersen, T.B., Saygin, Y., Savas, E.: Secret sharing vs. encryption-based techniques for privacy preserving data mining. In: UNECE/Eurostat Work Session on SDC (2007)
Oliveira, S.R.M.: Privacy preserving clustering by data transformation. In: 18th Brazilian Symposium on Databases, pp. 304–318 (2003)
Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: 9th ACM SIGKDD International Conf. on Knowledge Discovery and Data Mining. ACM Press (2003)
Inan, A., Kaya, S.V., Saygin, Y., Savas, E., Hintoglu, A.A., Levi, A.: Privacy preserving clustering on horizontally partitioned data. Data Knowl. Eng., 646–666 (2007)
Jagannathan, G., Wright, R.N.: Privacy-preserving distributed k-means clustering over arbitrarily partitioned data. In: KDD, pp. 593–599 (2005)
Bunn, P., Ostrovsky, R.: Secure two-party k-means clustering. In: ACM Conference on Computer and Communications Security, pp. 486–497 (2007)
Jha, S., Kruger, L., McDaniel, P.: Privacy Preserving Clustering. In: di Vimercati, S.d.C., Syverson, P.F., Gollmann, D. (eds.) ESORICS 2005. LNCS, vol. 3679, pp. 397–417. Springer, Heidelberg (2005)
Upmanyu, M., Namboodiri, A.M., Srinathan, K., Jawahar, C.V.: Efficient Privacy Preserving K-Means Clustering. In: Chen, H., Chau, M., Li, S.-h., Urs, S., Srinivasa, S., Wang, G.A. (eds.) PAISI 2010. LNCS, vol. 6122, pp. 154–166. Springer, Heidelberg (2010)
Doganay, M.C., Pedersen, T.B., Saygin, Y., Savas, E., Levi, A.: Distributed privacy preserving k-means clustering with additive secret sharing. In: 2008 International Workshop on Privacy and Anonymity in Information Society, Nantes, France, pp. 3–11 (2008)
Shamir, A.: How to share a secret. Communications of the ACM 22(11), 612–613 (1979)
Forgey, E.: Cluster analysis of multivariate data: Efficiency vs. interpretability of classification. Biometrics 21(768) (1965)
Lloyd, S.P.: Least squares quantization in PCM. IEEE Transactions on Information Theory 28, 129–137 (1982)
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–296 (1967)
Kantarcioglu, M., Clifton, C.: Privacy-preserving Distributed Mining of Association Rules on Horizontally Partitioned Data. In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD), pp. 639–644 (2002)
Verykios, S., Bertino, E., Fovino, I., Provenza, L., Saygin, Y., Theodoridis, Y.: Stateof- the-art in Privacy Preserving Data Mining. ACM SIGMOD Record 33(1), 50–57 (2004)
Bertino, E., Fovino, I., Provenza, L.: A Framework for Evaluating Privacy Preserving Data Mining Algorithms. Data Mining and Knowledge Discovery 11(2), 121–154 (2005)
Pinkas, B.: Cryptographic techniques for privacy-preserving data mining. SIGKDD Explor. Newslett. 4(2), 12–19 (2002), http://doi.acm.org/10.1145/772862.772865 , doi:10.1145/772862.772865
Lindell, Y., Pinkas, B.: Secure multiparty computation for privacy-preserving data mining. Journal of Privacy and Confidentiality 1(1), 59–98 (2009)
Ben-Or, M., Goldwasser, S., Wigderson, A.: Completeness theorems for non-cryptographic fault-tolerant distributed computation. In: 19th Annual ACM Conference on Theory of Computing (STOC), pp. 1–10. ACM Press (1988)
Ge, X., Yan, L., Zhu, J., Shi, W.: Privacy preserving distributed association rule mining based on a secret sharing technique. In: Second International Conference on Software Eng. and Data Mining, pp. 345–350 (2010)
http://www.uni-koeln.de/themen/statistik/data/cluster/milk.dat
Information and Computer Science. COIL 1999 Competition Data, The UCI KDD Archive. University of California Irvine (October 1999), http://kdd.ics.uci.edu/databases/coil/coil.html
Information and Computer Science. Japanese Vowels. University of California Irvine (June 2000), http://kdd.ics.uci.edu/databases/JapaneseVowels/JapaneseVowels.html
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 IFIP International Federation for Information Processing
About this paper
Cite this paper
Patel, S., Garasia, S., Jinwala, D. (2012). An Efficient Approach for Privacy Preserving Distributed K-Means Clustering Based on Shamir’s Secret Sharing Scheme. In: Dimitrakos, T., Moona, R., Patel, D., McKnight, D.H. (eds) Trust Management VI. IFIPTM 2012. IFIP Advances in Information and Communication Technology, vol 374. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29852-3_9
Download citation
DOI: https://doi.org/10.1007/978-3-642-29852-3_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29851-6
Online ISBN: 978-3-642-29852-3
eBook Packages: Computer ScienceComputer Science (R0)