Abstract
Data integration methods enable different data providers to flexibly integrate their expertise and deliver highly customizable services to their customers. Nonetheless, combining data from different sources could reveal person-specific sensitive information. Jiang and Clifton (Very Large Data Bases J (VLDBJ) 15(4):316–333, 2006) propose a secure Distributed k-Anonymity (DkA) framework for integrating two private data tables into a k-anonymous table, where each private table is a vertical partition of the same set of records. Their DkA framework does not scale to large data sets. Moreover, DkA is limited to a two-party scenario, and the parties are assumed to be semi-honest. In this paper, we propose two algorithms to securely integrate private data from multiple parties (data providers). Our first algorithm achieves the k-anonymity privacy model under a semi-honest adversary model. Our second algorithm employs a game-theoretic approach to thwart malicious participants and to ensure fair and honest participation of multiple data providers in the data integration process. Moreover, we study and resolve a real-life privacy problem in data sharing for the financial industry in Sweden. Experiments on the real-life data demonstrate that our proposed algorithms can effectively retain the essential information in anonymous data for data analysis and are scalable for anonymizing large data sets.
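To make the privacy risk mentioned in the abstract concrete, the following minimal sketch checks k-anonymity on two vertical partitions before and after they are joined. It is not the algorithm proposed in the paper: the toy records, the attribute names `Job` and `Age`, and the helpers `is_k_anonymous` and `join_vertical` are all hypothetical. The join is also performed in the clear purely for illustration, which a secure integration protocol would avoid.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values that
    occurs in `records` is shared by at least k records."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def join_vertical(part_a, part_b):
    """Join two vertical partitions that are keyed by a shared record ID."""
    return [{**part_a[rid], **part_b[rid]} for rid in part_a if rid in part_b]


# Hypothetical toy data (assumption, not from the paper):
# party A holds the Job attribute, party B holds the Age attribute.
party_a = {1: {"Job": "Professional"}, 2: {"Job": "Professional"},
           3: {"Job": "Artist"}, 4: {"Job": "Artist"}}
party_b = {1: {"Age": "[1-30)"}, 2: {"Age": "[30-60)"},
           3: {"Age": "[1-30)"}, 4: {"Age": "[30-60)"}}

joined = join_vertical(party_a, party_b)

# Each partition alone satisfies 2-anonymity on its own attribute...
a_alone = is_k_anonymous(list(party_a.values()), ["Job"], 2)
b_alone = is_k_anonymous(list(party_b.values()), ["Age"], 2)
# ...but the integrated table does not, because every (Job, Age)
# combination is unique after the join.
after_join = is_k_anonymous(joined, ["Job", "Age"], 2)

print(a_alone, b_alone, after_join)
```

In this toy case each party's table is 2-anonymous in isolation, yet the integrated table is not, which is the situation that generalizing the data before release is meant to address.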
References
Adam N.R., Wortman J.C.: Security control methods for statistical databases. ACM Comput. Surv. 21(4), 515–556 (1989)
Agrawal, R., Terzi, E.: On honesty in sovereign information sharing. In: Proceedings of the EDBT (2006)
Agrawal, R., Evfimievski, A., Srikant, R.: Information sharing across private databases. In: Proceedings of ACM SIGMOD, San Diego, CA (2003)
Axelrod R.: The Evolution of Cooperation. Basic Books, New York (1984)
Bayardo, R.J., Agrawal, R.: Data privacy through optimal k-anonymization. In: ICDE (2005)
Blum, A., Dwork, C., McSherry, F., Nissim, K.: Practical privacy: the SuLQ framework. In: PODS (2005)
Brodsky A., Farkas C., Jajodia S.: Secure databases: Constraints, inference channels, and monitoring disclosures. IEEE Trans. Knowl. Data Eng. 12, 900–919 (2000)
Clifton C., Kantarcioglu M., Vaidya J., Lin X., Zhu M.Y.: Tools for privacy preserving distributed data mining. ACM SIGKDD Explor. Newsl. 4(2), 28–34 (2002)
Dayal U., Hwang H.Y.: View definition and generalization for database integration in a multidatabase system. IEEE Trans. Softw. Eng. 10(6), 628–645 (1984)
Denning D., Schlorer J.: Inference controls for statistical databases. IEEE Comput. 16(7), 69–82 (1983)
Dinur, I., Nissim, K.: Revealing information while preserving privacy. In: PODS (2003)
Du, W., Zhan, Z.: Building decision tree classifier on private data. In: Workshop on Privacy, Security, and Data Mining at the IEEE ICDM (2002)
Du, W., Han, Y.S., Chen, S.: Privacy-preserving multivariate statistical analysis: linear regression and classification. In: Proceedings of the SIAM International Conference on Data Mining (SDM), Florida (2004)
Dwork, C.: Differential privacy. In: ICALP (2006)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: TCC (2006)
Farkas C., Jajodia S.: The inference problem: A survey. ACM SIGKDD Explor. Newsl. 4(2), 6–11 (2003)
Fung B.C.M., Wang K., Yu P.S.: Anonymizing classification data for privacy preservation. IEEE TKDE 19(5), 711–725 (2007)
Fung B.C.M., Wang K., Chen R., Yu P.S.: Privacy-preserving data publishing: A survey of recent developments. ACM Comput. Surv. 42(4), 14:1–14:53 (2010)
Hinke, T.: Inference aggregation detection in database management systems. In: IEEE S&P (1988)
Inan, A., Kantarcioglu, M., Bertino, E., Scannapieco, M.: A hybrid approach to private record linkage. In: Proceedings of the Int’l Conference on Data Engineering (2008)
Inan, A., Kantarcioglu, M., Ghinita, G., Bertino, E.: Private record matching using differential privacy. In: Proceedings of the EDBT (2010)
Iyengar, V.S.: Transforming data to satisfy privacy constraints. In: SIGKDD (2002)
Jiang, W., Clifton, C.: Privacy-preserving distributed k-anonymity. In: DBSec (2005)
Jiang W., Clifton C.: A secure distributed framework for achieving k-anonymity. Very Large Data Bases J. (VLDBJ) 15(4), 316–333 (2006)
Jiang W., Clifton C., Kantarcioglu M.: Transforming semi-honest protocols to ensure accountability. Data Knowl. Eng. 65(1), 57–74 (2008)
Jurczyk, P., Xiong, L.: Distributed anonymization: achieving privacy for both data subjects and data providers. In: DBSec (2009)
Kantarcioglu M., Kardes O.: Privacy-preserving data mining in the malicious model. Int. J. Inf. Comput. Secur. 2(4), 353–375 (2008)
Kantarcioglu, M., Xi, B., Clifton, C.: A game theoretical model for adversarial learning. In: Proceedings of the NGDM Workshop (2007)
Kardes, O., Kantarcioglu, M.: Privacy-preserving data mining applications in malicious model. In: Proceedings of the PADM Workshop (2007)
Kargupta, H., Das, K., Liu, K.: A game theoretic approach toward multi-party privacy-preserving distributed data mining. In: Proceedings of the PKDD (2007)
Kleinberg, J., Papadimitriou, C., Raghavan, P.: On the value of private information. In: TARK (2001)
Layfield, R., Kantarcioglu, M., Thuraisingham, B.: Incentive and trust issues in assured information sharing. In: Proceedings of the CollaborateComm (2008)
LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Workload-aware anonymization. In: SIGKDD (2006)
Li, N., Li, T., Venkatasubramanian, S.: t-closeness: privacy beyond k-anonymity and ℓ-diversity. In: ICDE (2007)
Lindell Y., Pinkas B.: Privacy preserving data mining. J. Cryptol. 15(3), 177–206 (2002)
Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: ℓ-diversity: privacy beyond k-anonymity. ACM TKDD 1(1) (2007)
Malvestuto F.M., Mezzini M., Moscarini M.: Auditing sum-queries to make a statistical database secure. ACM Trans. Inf. Syst. Secur. 9(1), 31–60 (2006)
Mohammed, N., Fung, B.C.M., Hung, P.C.K., Lee, C.: Anonymizing healthcare data: a case study on the blood transfusion service. In: SIGKDD (2009a)
Mohammed, N., Fung, B.C.M., Wang, K., Hung, P.C.K.: Privacy-preserving data mashup. In: EDBT (2009b)
Mohammed, N., Fung, B.C.M., Hung, P.C.K., Lee, C.: Centralized and distributed anonymization for high-dimensional healthcare data. ACM Trans. Knowl. Discov. Data (TKDD) 4(4), 18:1–18:33 (2010)
Nash J.: Non-cooperative games. Ann. Math. 54(2), 286–295 (1951)
Newman, D.J., Hettich, S., Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases. http://archive.ics.uci.edu/ml/ (1998)
Nisan, N.: Algorithms for selfish agents. In: Proceedings of the STACS (1999)
Osborne M.J., Rubinstein A.: A Course in Game Theory. The MIT Press, Cambridge, MA (1994)
Pinkas B.: Cryptographic techniques for privacy-preserving data mining. ACM SIGKDD Explor. Newsl. 4(2), 12–19 (2002)
Quinlan J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, Los Altos (1993)
Samarati, P.: Protecting respondents’ identities in microdata release. IEEE TKDE 13(6), 1010–1027 (2001)
Sweeney, L.: Datafly: a system for providing anonymity in medical data. In: Proceedings of the DBSec (1998)
Sweeney L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 571–588 (2002a)
Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. (2002b)
Thuraisingham, B.M.: Security checking in relational database management systems augmented with inference engines. Comput. Secur. 6(6), 479–492 (1987)
Vaidya, J., Clifton, C.: Privacy preserving association rule mining in vertically partitioned data. In: Proceedings of the ACM SIGKDD (2002)
Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the ACM SIGKDD (2003)
Wang K., Fung B.C.M., Yu P.S.: Handicapping attacker’s confidence: An alternative to k-anonymization. KAIS 11(3), 345–368 (2007)
Wiederhold, G.: Intelligent integration of information. In: Proceedings of ACM SIGMOD, pp 434–437 (1993)
Wong, R.C.W., Li, J., Fu, A.W.C., Wang, K.: (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In: SIGKDD (2006)
Xiao, X., Tao, Y.: Anatomy: simple and effective privacy preservation. In: VLDB (2006)
Xiao, X., Yi, K., Tao, Y.: The hardness and approximation algorithms for l-diversity. In: EDBT (2010)
Yang, Z., Zhong, S., Wright, R.N.: Privacy-preserving classification of customer data without loss of accuracy. In: Proceedings of the SDM (2005)
Yao, A.C.: Protocols for secure computations. In: Proceedings of the IEEE FOCS (1982)
Zhang, N., Zhao, W.: Distributed privacy preserving information sharing. In: Proceedings of the VLDB (2005)
Cite this article
Mohammed, N., Fung, B.C.M. & Debbabi, M. Anonymity meets game theory: secure data integration with malicious participants. The VLDB Journal 20, 567–588 (2011). https://doi.org/10.1007/s00778-010-0214-6
DOI: https://doi.org/10.1007/s00778-010-0214-6