Abstract
Assuming the existence of one-way functions, we show that there is no polynomial-time, differentially private algorithm \(\mathcal{A}\) that takes a database \(D \in (\{0,1\}^d)^n\) and outputs a “synthetic database” \(\widehat{D}\) all of whose two-way marginals are approximately equal to those of \(D\). (A two-way marginal is the fraction of database rows \(x \in \{0,1\}^d\) with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS ’07), who gave an algorithm running in time \(\mathrm{poly}(n, 2^d)\).
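To make the definition of a two-way marginal concrete, here is a minimal sketch in Python. It only computes marginals and compares two databases; it involves no privacy mechanism and is not the algorithm of Barak et al. or any construction from the paper. The function names and the toy databases `D` and `D_hat` are assumptions chosen for illustration.

```python
from itertools import combinations, product


def two_way_marginal(db, i, j, a, b):
    """Fraction of rows x in db with x[i] == a and x[j] == b."""
    return sum(1 for x in db if x[i] == a and x[j] == b) / len(db)


def all_two_way_marginals(db):
    """Map every (column pair, value pair) to its two-way marginal."""
    d = len(db[0])
    return {
        (i, j, a, b): two_way_marginal(db, i, j, a, b)
        for i, j in combinations(range(d), 2)
        for a, b in product((0, 1), repeat=2)
    }


def max_marginal_error(db, synthetic):
    """Largest absolute gap between corresponding two-way marginals."""
    real = all_two_way_marginals(db)
    synth = all_two_way_marginals(synthetic)
    return max(abs(real[key] - synth[key]) for key in real)


# Toy data (hypothetical): n = 4 rows, d = 3 binary columns.
D = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (1, 1, 1)]
# A hypothetical candidate synthetic database; it may have a different size.
D_hat = [(1, 1, 1), (1, 0, 1)]

print(two_way_marginal(D, 0, 1, 1, 1))   # rows with columns 0 and 1 both set
print(max_marginal_error(D, D_hat))      # how far D_hat is from D
```

There are \(4\binom{d}{2}\) two-way marginals, so checking a candidate \(\widehat{D}\) is cheap; the hardness result concerns producing such a \(\widehat{D}\) efficiently while preserving differential privacy.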
Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC ’09) with encodings based on probabilistically checkable proofs.
We also present both negative and positive results for generating “relaxed” synthetic data, where the fraction of rows in \(D\) satisfying a predicate \(c\) is estimated by applying \(c\) to each row of \(\widehat{D}\) and aggregating the results in some way.
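The evaluation step for “relaxed” synthetic data can be sketched as follows, under stated assumptions. The abstract does not say which aggregation functions are considered, so the aggregators below (a plain average and a majority vote) are hypothetical illustrations, as are all function names and the toy database `D_hat`. Averaging corresponds to ordinary synthetic data, where the estimate is simply the fraction of synthetic rows satisfying the predicate.

```python
def average(bits):
    """Standard aggregation: the fraction of rows satisfying the predicate."""
    return sum(bits) / len(bits)


def majority(bits):
    """A hypothetical alternative aggregator: 1.0 if most rows satisfy c."""
    return 1.0 if 2 * sum(bits) > len(bits) else 0.0


def estimate(synthetic, predicate, aggregate=average):
    """Apply the predicate to each synthetic row, then aggregate the bits."""
    bits = [1 if predicate(x) else 0 for x in synthetic]
    return aggregate(bits)


def both_set(i, j):
    """Predicate for one two-way marginal: columns i and j both equal 1."""
    return lambda x: x[i] == 1 and x[j] == 1


# Toy synthetic database (hypothetical): 4 rows, 3 binary columns.
D_hat = [(1, 1, 1), (1, 0, 1), (0, 1, 1), (1, 1, 0)]
c = both_set(0, 1)

print(estimate(D_hat, c))                      # average of the per-row bits
print(estimate(D_hat, c, aggregate=majority))  # majority vote over the same bits
```

Keeping the aggregator as a parameter mirrors the phrase “aggregating the results in some way”: the per-row evaluation is fixed, and only the final combining step varies.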
A full version of this paper appears on ECCC [28].
References
Adam, N.R., Wortmann, J.: Security-control methods for statistical databases: A comparative study. ACM Computing Surveys 21, 515–556 (1989)
Alekhnovich, M., Braverman, M., Feldman, V., Klivans, A.R., Pitassi, T.: The complexity of properly learning simple concept classes. J. Comput. Syst. Sci. 74, 16–34 (2008)
Barak, B., Chaudhuri, K., Dwork, C., Kale, S., McSherry, F., Talwar, K.: Privacy, accuracy, and consistency too: A holistic solution to contingency table release. In: Proceedings of the 26th Symposium on Principles of Database Systems, pp. 273–282 (2007)
Barak, B., Goldreich, O.: Universal arguments and their applications. SIAM J. Comput. 38, 1661–1694 (2008)
Blum, A., Dwork, C., McSherry, F., Nissim, K.: Practical privacy: The SuLQ framework. In: Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (June 2005)
Blum, A., Ligett, K., Roth, A.: A learning theory approach to non-interactive database privacy. In: Proceedings of the 40th ACM SIGACT Symposium on Theory of Computing (2008)
Dinur, I., Nissim, K.: Revealing information while preserving privacy. In: Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 202–210 (2003)
Duncan, G.: Confidentiality and statistical disclosure limitation. In: International Encyclopedia of the Social and Behavioral Sciences. Elsevier, Amsterdam (2001)
Dwork, C.: A firm foundation for private data analysis. Communications of the ACM (to appear)
Dwork, C.: Differential privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006, Part II. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006)
Dwork, C., Naor, M., Reingold, O., Rothblum, G., Vadhan, S.: When and how can privacy-preserving data release be done efficiently? In: Proceedings of the 2009 International ACM Symposium on Theory of Computing (STOC) (2009)
Dwork, C., Nissim, K.: Privacy-preserving datamining on vertically partitioned databases. In: Franklin, M. (ed.) CRYPTO 2004. LNCS, vol. 3152, pp. 528–544. Springer, Heidelberg (2004)
Dwork, C., Rothblum, G., Vadhan, S.P.: Boosting and differential privacy. In: Proceedings of FOCS 2010 (2010)
Evfimievski, A., Grandison, T.: Privacy Preserving Data Mining (a short survey). In: Encyclopedia of Database Technologies and Applications. Information Science Reference (2006)
Feldman, V.: Hardness of proper learning. In: The Encyclopedia of Algorithms. Springer, Heidelberg (2008)
Feldman, V.: Hardness of approximate two-level logic minimization and PAC learning with membership queries. Journal of Computer and System Sciences 75(1), 13–26 (2009), http://dx.doi.org/10.1016/j.jcss.2008.07.007
Goldreich, O.: Foundations of Cryptography, vol. 2. Cambridge University Press, Cambridge (2004)
Håstad, J.: Some optimal inapproximability results. J. ACM 48, 798–859 (2001)
Kearns, M.J., Valiant, L.G.: Cryptographic limitations on learning boolean formulae and finite automata. J. ACM 41, 67–95 (1994)
Kilian, J.: A note on efficient zero-knowledge proofs and arguments (extended abstract). In: STOC (1992)
Micali, S.: Computationally sound proofs. SIAM J. Comput. 30, 1253–1298 (2000)
Naor, M., Yung, M.: Universal one-way hash functions and their cryptographic applications. In: STOC, pp. 33–43 (1989)
Pitt, L., Valiant, L.G.: Computational limitations on learning from examples. J. ACM 35, 965–984 (1988)
Reiter, J.P., Drechsler, J.: Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality. IAB Discussion Paper, Institut für Arbeitsmarkt und Berufsforschung (IAB), Nürnberg, Institute for Employment Research, Nuremberg, Germany (2007), http://ideas.repec.org/p/iab/iabdpa/200720.html
Rompel, J.: One-way functions are necessary and sufficient for secure signatures. In: STOC, pp. 387–394 (1990)
Roth, A., Roughgarden, T.: Interactive privacy via the median mechanism. In: STOC 2010 (2010)
Ullman, J., Vadhan, S.P.: PCPs and the hardness of generating synthetic data. Electronic Colloquium on Computational Complexity (ECCC) 17, 17 (2010)
Valiant, L.G.: A theory of the learnable. Communications of the ACM 27(11), 1134–1142 (1984)
Copyright information
© 2011 International Association for Cryptologic Research
About this paper
Cite this paper
Ullman, J., Vadhan, S. (2011). PCPs and the Hardness of Generating Private Synthetic Data. In: Ishai, Y. (eds) Theory of Cryptography. TCC 2011. Lecture Notes in Computer Science, vol 6597. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19571-6_24
DOI: https://doi.org/10.1007/978-3-642-19571-6_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19570-9
Online ISBN: 978-3-642-19571-6
eBook Packages: Computer Science, Computer Science (R0)