Abstract
A matrix M is said to be k-anonymous if for each row r in M there are at least k − 1 other rows in M which are identical to r. The NP-hard k-Anonymity problem asks, given an n × m-matrix M over a fixed alphabet and an integer s > 0, whether M can be made k-anonymous by suppressing (blanking out) at most s entries. Complementing previous work, we introduce two new “data-driven” parameterizations for k-Anonymity—the number t in of different input rows and the number t out of different output rows—both modeling aspects of data homogeneity. We show that k-Anonymity is fixed-parameter tractable for the parameter t in , and that it is NP-hard even for t out = 2 and alphabet size four. Notably, our fixed-parameter tractability result implies that k-Anonymity can be solved in linear time when t in is a constant. Our computational hardness results also extend to the related privacy problems p-Sensitivity and ℓ-Diversity, while our fixed-parameter tractability results extend to p-Sensitivity and the usage of domain generalization hierarchies, where the entries are replaced by more general data instead of being completely suppressed.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Aggarwal G, Feder T, Kenthapadi K, Motwani R, Panigrahy R, Thomas D, Zhu A (2005) Anonymizing tables. In: Proceedings of the 10th international conference on database theory (ICDT ’05), Springer, LNCS, vol 3363, pp 246–258
Aggarwal G, Feder T, Kenthapadi K, Khuller S, Panigrahy R, Thomas D, Zhu A (2010) Achieving anonymity via clustering. ACM Trans Algorithms 6(3): 1–19
Blocki J, Williams R (2010) Resolving the complexity of some data privacy problems. In: Proceedings of the 37th international colloquium on automata, languages and programming (ICALP ’10), Springer, LNCS, vol 6199, pp 393–404
Bonizzoni P, Della Vedova G, Dondi R (2011a) Anonymizing binary and small tables is hard to approximate. J Comb Optim 22(1): 97–119
Bonizzoni P, Della Vedova G, Dondi R, Pirola Y (2011b) Parameterized complexity of k-anonymity: hardness and tractability. J Comb Optim (available online)
Bredereck R, Nichterlein A, Niedermeier R, Philip G (2011) Pattern-guided data anonymization and clustering. In: Proceedings of the 36th international symposium mathematical foundations of computer science (MFCS ’11), Springer, LNCS, vol 6907, pp 182–193
Campan A, Truta TM (2009) Data and structural k-anonymity in social networks. In: Proceedings of the 2nd ACM SIGKDD international workshop on privacy, security, and trust in KDD (PinKDD ’08), LNCS, vol 5456, Springer, pp 33–54
Chakaravarthy VT, Pandit V, Sabharwal Y (2010) On the complexity of the k-anonymization problem. CoRR abs/1004.4729
Dondi R, Mauri G, Zoppis I (2012) The ℓ-diversity problem: Tractability and approximability. Theor Comput Sci (available online)
Downey RG, Fellows MR (1999) Parameterized Complexity. Springer,
Dwork C (2011) A firm foundation for private data analysis. Commun ACM 54(1): 86–95
Evans PA, Wareham T, Chaytor R (2009) Fixed-parameter tractability of anonymizing data by suppressing entries. J Comb Optim 18(4): 362–375
Fard AM, Wang K (2010) An effective clustering approach to web query log anonymization. In: Katsikas SK, Samarati P (eds) Proceedings of the international conference on security and cryptography (SECRYPT ’10), SciTePress, pp 109–119
Fellows MR (2009) Towards fully multivariate algorithmics: some new results and directions in parameter ecology. In: Proceedings of the 20th international workshop on combinatorial algorithms (IWOCA ’09), Springer, LNCS, vol 5874, pp 2–10
Flum J, Grohe M (2006) Parameterized complexity theory. Springer, Berlin
Frank A, Asuncion A (2010) UCI machine learning repository. http://archive.ics.uci.edu/ml
Fredkin E (1960) Trie memory. Commun ACM 3(9): 490–499
Fung BCM, Wang K, Chen R, Yu PS (2010) Privacy-preserving data publishing: a survey of recent developments. ACM Comput Surv 42(4): 14–11453
Gedik B, Liu L (2008) Protecting location privacy with personalized k-anonymity: architecture and algorithms. IEEE Trans Mobile Comput 7(1): 1–18
Gionis A, Tassa T (2009) k-anonymization with minimal loss of information. IEEE Trans Knowl Data Eng 21(2): 206–219
Gkoulalas-Divanis A, Kalnis P, Verykios VS (2010) Providing k-anonymity in location based services. ACM SIGKDD Explor Newsl 12: 3–10
Gruteser M, Grunwald D (2003) Anonymous usage of location-based services through spatial and temporal cloaking. In: Proceedings of the 1st international conference on mobile systems, applications and services (MOBISYS ’03), ACM, MobiSys ’03, pp 31–42
Johnson DS (1987) The NP-completeness column: an ongoing guide. J Algorithms 8: 438–448
Kenig B, Tassa T (2012) A practical approximation algorithm for optimal k-anonymity. Data Mining Knowl Discov 25: 134–168
Komusiewicz C, Niedermeier R, Uhlmann J (2011) Deconstructing intractability—a multivariate complexity analysis of interval constrained coloring. J Discrete Algorithms 9: 137–151
Li N, Li T, Venkatasubramanian S (2007) t-closeness: privacy beyond k-anonymity and l-diversity. In: Proceedings of the 23rd international conference on data engineering (ICDE ’07), IEEE, pp 106–115
Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) ℓ-diversity: privacy beyond k-anonymity. ACM Trans Knowl Discov Data 1(1): Article No. 3. doi:10.1145/1217299.1217302
Meyerson A, Williams R (2004) On the complexity of optimal k-anonymity. In: Proceedings of the 23rd ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems (PODS ’04), ACM, pp 223–228
Monreale A, Andrienko G, Andrienko N, Giannotti F, Pedreschi D, Rinzivillo S, Wrobel S (2010) Movement data anonymity through generalization. Trans Data Privacy 3: 91–121
Navarro-Arribas G, Torra V, Erola A, Castellà-Roca J (2012) User k-anonymity for privacy preserving data mining of query logs. Inf Process Manag 48(3): 476–487
Niedermeier R (2006) Invitation to fixed-parameter algorithms. Oxford University Press, Oxford
Niedermeier R (2010) Reflections on multivariate algorithmics and problem parameterization. In: Proceedings of the 27th international symposium on theoretical aspects of computer science (STACS ’10), Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Leibniz international proceedings in informatics (LIPIcs), vol 5, pp 17–32
Orlin J (1988) A faster strongly polynomial minimum cost flow algorithm. In: Proceedings of the 20th annual ACM symposium on theory of computing (STOC ’88), ACM, pp 377–387
Park H, Shim K (2007) Approximate algorithms for k-anonymity. In: Proceedings of the international conference on management of data (SIGMOD ’07), ACM, SIGMOD ’07, pp 67–78
Samarati P (2001) Protecting respondents identities in microdata release. IEEE Trans Knowl Data Eng 13(6): 1010–1027
Samarati P, Sweeney L (1998) Generalizing data to provide anonymity when disclosing information. In: Proceedings of the ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems (PODS ’98), ACM, pp 188–188
Sweeney L (2000) Uniqueness of simple demographics in the U.S. population. Tech. rep., Carnegie Mellon University, School of Computer Science, Laboratory for International Data Privacy
Sweeney L (2002a) Achieving k-anonymity privacy protection using generalization and suppression. Int J Uncertain Fuzziness Knowl Based Syst 10(5): 571–588
Sweeney L (2002b) k-Anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5): 557–570
Truta TM, Vinay B (2006) Privacy protection: p-sensitive k-anonymity property. In: Proceedings of the 22nd international conference on data engineering workshops (ICDE ’06), IEEE Computer Society, p 94
Wong RCW, Li J, Fu AWC, Wang K (2006) (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In: Proceedings of the 12th international conference on knowledge discovery and data mining (KDD ’06), ACM, pp 754–759
Xiao X, Tao Y (2006) Anatomy: simple and effective privacy preservation. In: Proceedings of the 32nd international conference on very large data bases (VLDB ’06), ACM, pp 139–150
Xiao X, Yi K, Tao Y (2010) The hardness and approximation algorithms for l-diversity. In: Proceedings of the 13th international conference on extending database technology (EDBT’ 10), ACM, pp 135–146
Zhou B, Pei J (2011) The k-anonymity and l-diversity approaches for privacy preservation in social networks against neighborhood attacks. Knowl Inf Syst 28: 47–77
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: Jian Pei.
An extended abstract entitled “The Effect of Homogeneity on the Complexity of k-Anonymity” appeared in Proceedings of the 18th International Symposium on Fundamentals of Computation Theory (FCT ’11), volume 6914 of LNCS, pages 53-64, Springer 2011. Apart from the full proofs omitted in that version, the current article also contains new results on ℓ-Diversity, p-Sensitivity, and on the usage of domain generalization hierarchies.
Rights and permissions
About this article
Cite this article
Bredereck, R., Nichterlein, A., Niedermeier, R. et al. The effect of homogeneity on the computational complexity of combinatorial data anonymization. Data Min Knowl Disc 28, 65–91 (2014). https://doi.org/10.1007/s10618-012-0293-7
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-012-0293-7