Abstract
Social network data are increasingly being published in one form or another, and preserving privacy when publishing such data has become an important concern. With some local knowledge about individuals in a social network, an adversary may easily attack the privacy of some victims. Unfortunately, most previous studies on privacy-preserving data publishing deal with relational data only and cannot be applied to social network data. In this paper, we take an initiative toward preserving privacy in social network data. Specifically, we identify an essential type of privacy attack: neighborhood attacks. If an adversary has some knowledge about the neighbors of a target victim and the relationships among those neighbors, the victim may be re-identified from a social network even if the victim’s identity is protected using conventional anonymization techniques. To protect privacy against neighborhood attacks, we extend the conventional k-anonymity and l-diversity models from relational data to social network data. We show that the problems of computing optimal k-anonymous and l-diverse social networks are NP-hard, and we develop practical solutions to these problems. The empirical study indicates that social network data anonymized by our methods can still be used to answer aggregate network queries with high accuracy.
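To make the notion of a neighborhood attack concrete, the following minimal Python sketch shows how an adversary could match known neighborhood structure against a published graph. It is an illustration only, not the authors' algorithm. It assumes that a vertex's neighborhood is the subgraph induced by its direct neighbors, and it uses a brute-force isomorphism test that is practical only for very small neighborhoods. The toy graph `adj`, the adversary's knowledge `known`, and all function names are hypothetical.

```python
from itertools import permutations

def neighborhood(adj, v):
    """Subgraph induced by the direct neighbors of v, as (nodes, edges).
    Assumes an undirected graph without self-loops."""
    nodes = sorted(adj[v])
    edges = {frozenset((a, b)) for a in nodes for b in adj[a] if b in adj[v]}
    return nodes, edges

def isomorphic(g1, g2):
    """Brute-force isomorphism test; suitable only for tiny graphs."""
    (n1, e1), (n2, e2) = g1, g2
    if len(n1) != len(n2) or len(e1) != len(e2):
        return False
    for perm in permutations(n2):
        mapping = dict(zip(n1, perm))
        if {frozenset(mapping[x] for x in e) for e in e1} == e2:
            return True
    return False

def candidates(adj, known):
    """Vertices whose neighborhood matches the adversary's background knowledge."""
    return [v for v in adj if isomorphic(neighborhood(adj, v), known)]

def anonymity_level(adj):
    """Size of the smallest group of vertices sharing an isomorphic neighborhood.
    Under the assumption above, the graph resists this attack at level k
    only if this value is at least k."""
    hoods = {v: neighborhood(adj, v) for v in adj}
    return min(sum(isomorphic(hoods[v], hoods[u]) for u in adj) for v in adj)

# Hypothetical published graph with identities replaced by numbers.
adj = {
    1: {2, 3, 4},
    2: {1, 3},
    3: {1, 2},
    4: {1, 5},
    5: {4, 6},
    6: {5},
}

# Hypothetical adversary knowledge: the victim has three friends,
# exactly two of whom know each other.
known = (["a", "b", "c"], {frozenset(("a", "b"))})

matches = candidates(adj, known)
level = anonymity_level(adj)
print(matches, level)
```

In this toy graph only one vertex matches the adversary's knowledge, so removing names alone does not protect the victim. A k-anonymous release in this sense would need every vertex to share its neighborhood structure with at least k - 1 others.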
References
Adamic L, Adar E (2005) How to search a social network. Soc Netw 27(3): 187–203
Backstrom L, Huttenlocher D, Kleinberg J, Lan X (2006) Group formation in large social networks: membership, growth, and evolution. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’06), ACM Press, New York, pp 44–54
Backstrom L, Dwork C, Kleinberg J (2007) Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. In: Proceedings of the 16th international conference on World Wide Web (WWW’07), ACM Press, New York, pp 181–190
Bhagat S, Cormode G, Krishnamurthy B, Srivastava D (2009) Class-based graph anonymization for social network data. PVLDB 2(1): 766–777
Campan A, Truta TM (2008) A clustering approach for data and structural anonymity in social networks. In: Proceedings of the 2nd ACM SIGKDD international workshop on privacy, security, and trust in KDD (PinKDD’08), in conjunction with KDD’08, Las Vegas, Nevada
Chakrabarti D, Zhan Y, Faloutsos C (2004) R-mat: a recursive model for graph mining. In: Proceedings of the 2004 SIAM international conference on data mining (SDM’04), SIAM, Philadelphia
Cormen TH, Leiserson CE, Rivest RL, Stein C (2002) Introduction to algorithms, 2nd edn. MIT Press and McGraw-Hill, Cambridge
Cormode G, Srivastava D, Yu T, Zhang Q (2008) Anonymizing bipartite graph data using safe groupings. PVLDB 1(1): 833–844
Coull SE, Monrose F, Reiter MK, Bailey M (2009) The challenges of effectively anonymizing network data. In: Proceedings of the 2009 cybersecurity applications & technology conference for homeland security (CATCH’09), IEEE Computer Society, Washington, DC, pp 230–236
Dwork C (2008) Differential privacy: a survey of results. In: Proceedings of the 5th international conference on theory and applications of models of computation. Lecture notes in computer science, vol 4978. Springer, pp 1–19
Dwork C, Smith A (2008) Differential privacy for statistics: what we know and what we want to learn. In: Proceedings of NCHS/CDC data confidentiality workshop
Faloutsos M, Faloutsos P, Faloutsos C (1999) On power law relationships of the internet topology. In: Proceedings of the conference on applications, technologies, architectures, and protocols for computer communication (SIGCOMM’99), ACM Press, New York, pp 251–262
Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman & Co., New York
Getoor L, Diehl CP (2005) Link mining: a survey. ACM SIGKDD Explor Newsl 7(2): 3–12
Gkoulalas-Divanis A, Verykios VS (2009) Hiding sensitive knowledge without side effects. Knowl Inf Syst 20(3): 263–299
Hay M, Miklau G, Jensen D, Weis P, Srivastava S (2007) Anonymizing social networks. Tech. Rep. 07-19, University of Massachusetts Amherst
Hay M, Miklau G, Jensen D, Towsley D (2008) Resisting structural identification in anonymized social networks. PVLDB 1(1): 102–114
Hay M, Li C, Miklau G, Jensen D (2009) Accurate estimation of the degree distribution of private networks. In: Proceedings of the 2009 ninth IEEE international conference on data mining (ICDM’09), IEEE Computer Society, Washington, DC, pp 169–178
Hazan E, Safra S, Schwartz O (2003) On the complexity of approximating k-dimensional matching. In: Proceedings of the 6th international workshop on approximation algorithms for combinatorial optimization problems and of the 7th international workshop on randomization and computation techniques in computer science (RANDOM-APPROX’03), LNCS, vol 2764. Springer, Berlin, pp 83–97
Korolova A, Motwani R, Nabar SU, Xu Y (2008) Link privacy in social networks. In: Proceedings of the 24th international conference on data engineering (ICDE’08), IEEE, pp 1355–1357
Kossinets G, Watts DJ (2006) Empirical analysis of an evolving social network. Science 311(5757): 88–90
Kumar R, Novak J, Tomkins A (2006) Structure and evolution of online social networks. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD’06), ACM Press, New York, pp 611–617
Li N, Li T, Venkatasubramanian S (2007) t-Closeness: privacy beyond k-anonymity and l-diversity. In: Proceedings of the 23rd international conference on data engineering (ICDE’07), IEEE, pp 106–115
Liu K, Terzi E (2008) Towards identity anonymization on graphs. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data (SIGMOD’08), ACM Press, New York, pp 93–106
Liu K, Das K, Grandison T, Kargupta H (2008) Privacy-preserving data analysis on graphs and social networks. In: Kargupta H, Han J, Yu P, Motwani R, Kumar V (eds) Next generation data mining. CRC Press, Boca Raton
Luo H, Fan J, Lin X, Zhou A, Bertino E (2009) A distributed approach to enabling privacy-preserving model-based classifier training. Knowl Inf Syst 20(2): 157–185
Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M (2006) L-diversity: privacy beyond k-anonymity. In: Proceedings of the 22nd IEEE international conference on data engineering (ICDE’06), IEEE Computer Society, Washington, DC
Machanavajjhala A, Kifer D, Abowd JM, Gehrke J, Vilhuber L (2008) Privacy: theory meets practice on the map. In: Proceedings of the 24th international conference on data engineering (ICDE’08), pp 277–286
Meyerson A, Williams R (2004) On the complexity of optimal k-anonymity. In: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems (PODS’04), ACM, New York, pp 223–228
Muhlestein D, Lim S (2010) Online learning with social computing based interest sharing. Knowl Inf Syst. doi:10.1007/s10115-009-0265-4
Qiu L, Li Y, Wu X (2008) Protecting business intelligence and customer privacy while outsourcing data mining tasks. Knowl Inf Syst 17(2): 99–120
Rastogi V, Hay M, Miklau G, Suciu D (2009) Relationship privacy: output perturbation for queries with joins. In: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems (PODS’09), ACM, New York, pp 107–116
Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng (TKDE) 13(6): 1010–1027
Samarati P, Sweeney L (1998) Generalizing data to provide anonymity when disclosing information. In: Proceedings of the 7th ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems (PODS’98), ACM Press, New York, p 188
Sweeney L (2002) K-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5): 557–570
Wang DW, Liau CJ, Hsu TS (2006) Privacy protection in social network data disclosure based on granular computing. In: Proceedings of the 2006 IEEE international conference on fuzzy systems, Vancouver, BC, pp 997–1003
Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge University Press, New York
Xiao X, Tao Y (2006a) Anatomy: simple and effective privacy preservation. In: Dayal U, Whang KY, Lomet DB, Alonso G, Lohman GM, Kersten ML, Cha SK, Kim YK (eds) Proceedings of the 32nd international conference on very large data bases (VLDB’06), ACM, pp 139–150
Xiao X, Tao Y (2006b) Personalized privacy preservation. In: Proceedings of the 2006 ACM SIGMOD international conference on Management of data (SIGMOD’06), ACM Press, New York, pp 229–240
Xiao X, Tao Y (2007) M-invariance: towards privacy preserving re-publication of dynamic datasets. In: Proceedings of the 2007 ACM SIGMOD international conference on management of data (SIGMOD’07), ACM, New York, pp 689–700
Xiao X, Tao Y (2008) Output perturbation with query relaxation. PVLDB 1(1): 857–869
Xu J, Wang W, Pei J, Wang X, Shi B, Fu AWC (2006) Utility-based anonymization using local recoding. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’06), ACM Press, New York, pp 785–790
Yan X, Han J (2002) gspan: graph-based substructure pattern mining. In: Proceedings of the 2002 IEEE international conference on data mining (ICDM’02), IEEE Computer Society, Washington, DC, p 721
Yan X, Yu PS, Han J (2004) Graph indexing: a frequent structure-based approach. In: Proceedings of the 2004 ACM SIGMOD international conference on management of data (SIGMOD’04), ACM Press, New York, pp 335–346
Ying X, Wu X (2008) Randomizing social networks: a spectrum preserving approach. In: Proceedings of the 2008 SIAM international conference on data mining (SDM’08), SIAM, pp 739–750
Ying X, Wu X (2009a) On link privacy in randomizing social networks. In: Proceedings of the 13th Pacific-Asia conference on advances in knowledge discovery and data mining, Springer, pp 28–39
Ying X, Wu X (2009b) On randomness measures for social networks. In: Proceedings of the 2009 SIAM international conference on data mining, SIAM, pp 709–720
Zheleva E, Getoor L (2007) Preserving the privacy of sensitive relationships in graph data. In: Proceedings of the 1st ACM SIGKDD workshop on privacy, security, and trust in KDD (PinKDD’07)
Zhou B, Pei J (2008) Preserving privacy in social networks against neighborhood attacks. In: Proceedings of the 24th IEEE international conference on data engineering (ICDE’08), IEEE Computer Society, Cancun, pp 506–515
Zhou B, Pei J, Luk WS (2008) A brief survey on anonymization techniques for privacy preserving publishing of social network data. SIGKDD Explor 10(2): 12–22
Zou L, Chen L, Özsu MT (2009) K-automorphism: a general framework for privacy preserving network publication. PVLDB 2(1): 946–957
Additional information
A preliminary version of this paper appears as Zhou and Pei [49]. This research is supported in part by an NSERC Discovery Grant and an NSERC Discovery Accelerator Supplement Grant. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
Cite this article
Zhou, B., Pei, J. The k-anonymity and l-diversity approaches for privacy preservation in social networks against neighborhood attacks. Knowl Inf Syst 28, 47–77 (2011). https://doi.org/10.1007/s10115-010-0311-2
DOI: https://doi.org/10.1007/s10115-010-0311-2