The effect of homogeneity on the computational complexity of combinatorial data anonymization

Bredereck, Robert; Nichterlein, André; Niedermeier, Rolf; Philip, Geevarghese

doi:10.1007/s10618-012-0293-7

The effect of homogeneity on the computational complexity of combinatorial data anonymization

Published: 13 October 2012

Volume 28, pages 65–91, (2014)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Data Mining and Knowledge Discovery Aims and scope Submit manuscript

The effect of homogeneity on the computational complexity of combinatorial data anonymization

Download PDF

Robert Bredereck¹,
André Nichterlein¹,
Rolf Niedermeier¹ &
…
Geevarghese Philip²

449 Accesses
10 Citations
Explore all metrics

Abstract

A matrix M is said to be k-anonymous if for each row r in M there are at least k − 1 other rows in M which are identical to r. The NP-hard k-Anonymity problem asks, given an n × m-matrix M over a fixed alphabet and an integer s > 0, whether M can be made k-anonymous by suppressing (blanking out) at most s entries. Complementing previous work, we introduce two new “data-driven” parameterizations for k-Anonymity—the number t _in of different input rows and the number t _out of different output rows—both modeling aspects of data homogeneity. We show that k-Anonymity is fixed-parameter tractable for the parameter t _in, and that it is NP-hard even for t _out = 2 and alphabet size four. Notably, our fixed-parameter tractability result implies that k-Anonymity can be solved in linear time when t _in is a constant. Our computational hardness results also extend to the related privacy problems p-Sensitivity and ℓ-Diversity, while our fixed-parameter tractability results extend to p-Sensitivity and the usage of domain generalization hierarchies, where the entries are replaced by more general data instead of being completely suppressed.

Article PDF

A Refined Complexity Analysis of Degree Anonymization in Graphs

Pattern-Guided k-Anonymity

On the Complexity of t-Closeness Anonymization and Related Problems

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

References

Aggarwal G, Feder T, Kenthapadi K, Motwani R, Panigrahy R, Thomas D, Zhu A (2005) Anonymizing tables. In: Proceedings of the 10th international conference on database theory (ICDT ’05), Springer, LNCS, vol 3363, pp 246–258
Aggarwal G, Feder T, Kenthapadi K, Khuller S, Panigrahy R, Thomas D, Zhu A (2010) Achieving anonymity via clustering. ACM Trans Algorithms 6(3): 1–19
Article MathSciNet Google Scholar
Blocki J, Williams R (2010) Resolving the complexity of some data privacy problems. In: Proceedings of the 37th international colloquium on automata, languages and programming (ICALP ’10), Springer, LNCS, vol 6199, pp 393–404
Bonizzoni P, Della Vedova G, Dondi R (2011a) Anonymizing binary and small tables is hard to approximate. J Comb Optim 22(1): 97–119
Article MATH MathSciNet Google Scholar
Bonizzoni P, Della Vedova G, Dondi R, Pirola Y (2011b) Parameterized complexity of k-anonymity: hardness and tractability. J Comb Optim (available online)
Bredereck R, Nichterlein A, Niedermeier R, Philip G (2011) Pattern-guided data anonymization and clustering. In: Proceedings of the 36th international symposium mathematical foundations of computer science (MFCS ’11), Springer, LNCS, vol 6907, pp 182–193
Campan A, Truta TM (2009) Data and structural k-anonymity in social networks. In: Proceedings of the 2nd ACM SIGKDD international workshop on privacy, security, and trust in KDD (PinKDD ’08), LNCS, vol 5456, Springer, pp 33–54
Chakaravarthy VT, Pandit V, Sabharwal Y (2010) On the complexity of the k-anonymization problem. CoRR abs/1004.4729
Dondi R, Mauri G, Zoppis I (2012) The ℓ-diversity problem: Tractability and approximability. Theor Comput Sci (available online)
Downey RG, Fellows MR (1999) Parameterized Complexity. Springer,
Dwork C (2011) A firm foundation for private data analysis. Commun ACM 54(1): 86–95
Article Google Scholar
Evans PA, Wareham T, Chaytor R (2009) Fixed-parameter tractability of anonymizing data by suppressing entries. J Comb Optim 18(4): 362–375
Article MATH MathSciNet Google Scholar
Fard AM, Wang K (2010) An effective clustering approach to web query log anonymization. In: Katsikas SK, Samarati P (eds) Proceedings of the international conference on security and cryptography (SECRYPT ’10), SciTePress, pp 109–119
Fellows MR (2009) Towards fully multivariate algorithmics: some new results and directions in parameter ecology. In: Proceedings of the 20th international workshop on combinatorial algorithms (IWOCA ’09), Springer, LNCS, vol 5874, pp 2–10
Flum J, Grohe M (2006) Parameterized complexity theory. Springer, Berlin
Google Scholar
Frank A, Asuncion A (2010) UCI machine learning repository. http://archive.ics.uci.edu/ml
Fredkin E (1960) Trie memory. Commun ACM 3(9): 490–499
Article Google Scholar
Fung BCM, Wang K, Chen R, Yu PS (2010) Privacy-preserving data publishing: a survey of recent developments. ACM Comput Surv 42(4): 14–11453
Article Google Scholar
Gedik B, Liu L (2008) Protecting location privacy with personalized k-anonymity: architecture and algorithms. IEEE Trans Mobile Comput 7(1): 1–18
Article Google Scholar
Gionis A, Tassa T (2009) k-anonymization with minimal loss of information. IEEE Trans Knowl Data Eng 21(2): 206–219
Article Google Scholar
Gkoulalas-Divanis A, Kalnis P, Verykios VS (2010) Providing k-anonymity in location based services. ACM SIGKDD Explor Newsl 12: 3–10
Article Google Scholar
Gruteser M, Grunwald D (2003) Anonymous usage of location-based services through spatial and temporal cloaking. In: Proceedings of the 1st international conference on mobile systems, applications and services (MOBISYS ’03), ACM, MobiSys ’03, pp 31–42
Johnson DS (1987) The NP-completeness column: an ongoing guide. J Algorithms 8: 438–448
Article MATH MathSciNet Google Scholar
Kenig B, Tassa T (2012) A practical approximation algorithm for optimal k-anonymity. Data Mining Knowl Discov 25: 134–168
Article MATH MathSciNet Google Scholar
Komusiewicz C, Niedermeier R, Uhlmann J (2011) Deconstructing intractability—a multivariate complexity analysis of interval constrained coloring. J Discrete Algorithms 9: 137–151
Article MATH MathSciNet Google Scholar
Li N, Li T, Venkatasubramanian S (2007) t-closeness: privacy beyond k-anonymity and l-diversity. In: Proceedings of the 23rd international conference on data engineering (ICDE ’07), IEEE, pp 106–115
Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) ℓ-diversity: privacy beyond k-anonymity. ACM Trans Knowl Discov Data 1(1): Article No. 3. doi:10.1145/1217299.1217302
Meyerson A, Williams R (2004) On the complexity of optimal k-anonymity. In: Proceedings of the 23rd ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems (PODS ’04), ACM, pp 223–228
Monreale A, Andrienko G, Andrienko N, Giannotti F, Pedreschi D, Rinzivillo S, Wrobel S (2010) Movement data anonymity through generalization. Trans Data Privacy 3: 91–121
MathSciNet Google Scholar
Navarro-Arribas G, Torra V, Erola A, Castellà-Roca J (2012) User k-anonymity for privacy preserving data mining of query logs. Inf Process Manag 48(3): 476–487
Article Google Scholar
Niedermeier R (2006) Invitation to fixed-parameter algorithms. Oxford University Press, Oxford
Book MATH Google Scholar
Niedermeier R (2010) Reflections on multivariate algorithmics and problem parameterization. In: Proceedings of the 27th international symposium on theoretical aspects of computer science (STACS ’10), Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Leibniz international proceedings in informatics (LIPIcs), vol 5, pp 17–32
Orlin J (1988) A faster strongly polynomial minimum cost flow algorithm. In: Proceedings of the 20th annual ACM symposium on theory of computing (STOC ’88), ACM, pp 377–387
Park H, Shim K (2007) Approximate algorithms for k-anonymity. In: Proceedings of the international conference on management of data (SIGMOD ’07), ACM, SIGMOD ’07, pp 67–78
Samarati P (2001) Protecting respondents identities in microdata release. IEEE Trans Knowl Data Eng 13(6): 1010–1027
Article Google Scholar
Samarati P, Sweeney L (1998) Generalizing data to provide anonymity when disclosing information. In: Proceedings of the ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems (PODS ’98), ACM, pp 188–188
Sweeney L (2000) Uniqueness of simple demographics in the U.S. population. Tech. rep., Carnegie Mellon University, School of Computer Science, Laboratory for International Data Privacy
Sweeney L (2002a) Achieving k-anonymity privacy protection using generalization and suppression. Int J Uncertain Fuzziness Knowl Based Syst 10(5): 571–588
Article MATH MathSciNet Google Scholar
Sweeney L (2002b) k-Anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5): 557–570
Article MATH MathSciNet Google Scholar
Truta TM, Vinay B (2006) Privacy protection: p-sensitive k-anonymity property. In: Proceedings of the 22nd international conference on data engineering workshops (ICDE ’06), IEEE Computer Society, p 94
Wong RCW, Li J, Fu AWC, Wang K (2006) (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In: Proceedings of the 12th international conference on knowledge discovery and data mining (KDD ’06), ACM, pp 754–759
Xiao X, Tao Y (2006) Anatomy: simple and effective privacy preservation. In: Proceedings of the 32nd international conference on very large data bases (VLDB ’06), ACM, pp 139–150
Xiao X, Yi K, Tao Y (2010) The hardness and approximation algorithms for l-diversity. In: Proceedings of the 13th international conference on extending database technology (EDBT’ 10), ACM, pp 135–146
Zhou B, Pei J (2011) The k-anonymity and l-diversity approaches for privacy preservation in social networks against neighborhood attacks. Knowl Inf Syst 28: 47–77
Article Google Scholar

Download references

Author information

Authors and Affiliations

Institut für Softwaretechnik und Theoretische Informatik, TU Berlin, Berlin, Germany
Robert Bredereck, André Nichterlein & Rolf Niedermeier
Max-Planck-Institut für Informatik, Saarbrücken, Germany
Geevarghese Philip

Authors

Robert Bredereck
View author publications
You can also search for this author in PubMed Google Scholar
André Nichterlein
View author publications
You can also search for this author in PubMed Google Scholar
Rolf Niedermeier
View author publications
You can also search for this author in PubMed Google Scholar
Geevarghese Philip
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to André Nichterlein.

Additional information

Responsible editor: Jian Pei.

An extended abstract entitled “The Effect of Homogeneity on the Complexity of k-Anonymity” appeared in Proceedings of the 18th International Symposium on Fundamentals of Computation Theory (FCT ’11), volume 6914 of LNCS, pages 53-64, Springer 2011. Apart from the full proofs omitted in that version, the current article also contains new results on ℓ-Diversity, p-Sensitivity, and on the usage of domain generalization hierarchies.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Bredereck, R., Nichterlein, A., Niedermeier, R. et al. The effect of homogeneity on the computational complexity of combinatorial data anonymization. Data Min Knowl Disc 28, 65–91 (2014). https://doi.org/10.1007/s10618-012-0293-7

Download citation

Received: 18 April 2012
Accepted: 24 September 2012
Published: 13 October 2012
Issue Date: January 2014
DOI: https://doi.org/10.1007/s10618-012-0293-7

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

The effect of homogeneity on the computational complexity of combinatorial data anonymization

Abstract

Article PDF

Similar content being viewed by others

A Refined Complexity Analysis of Degree Anonymization in Graphs

Pattern-Guided k-Anonymity

On the Complexity of t-Closeness Anonymization and Related Problems

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

The effect of homogeneity on the computational complexity of combinatorial data anonymization

Abstract

Article PDF

Similar content being viewed by others

A Refined Complexity Analysis of Degree Anonymization in Graphs

Pattern-Guided k-Anonymity

On the Complexity of t-Closeness Anonymization and Related Problems

Explore related subjects

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation