Abstract
Additive randomization has been a primary tool for hiding sensitive private information. Previous work empirically showed that individual data values can be approximately reconstructed from the perturbed values, using spectral filtering techniques. This poses a serious threat of privacy breaches. In this paper we conduct a theoretical study on how the reconstruction error varies, for different types of additive noise. In particular, we first derive an upper bound for the reconstruction error using matrix perturbation theory. Attackers who use spectral filtering techniques to estimate the true data values may leverage this bound to determine how close their estimates are to the original data. We then derive a lower bound for the reconstruction error, which can help data owners decide how much noise should be added to satisfy a given threshold of the tolerated privacy breach.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Adam NR, Wortman JC (1989) Security-control methods for statistical databases. ACM Comput Surv 21(4): 515–556
Aggarwal C, Yu P (2004) A condensation approach to privacy preserving data mining. In: Proceedings of international conference on extending database technology, LNCS, vol 2992, pp 183–199
Agrawal D, Agrawal C (2001) On the design and quantification of privacy preserving data mining algorithms. In: Proceedings of the 20th symposium on principles of database systems, Santa Barbara, pp 247–255
Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: Proceedings of the ACM SIGMOD international conference on management of data, Dallas, pp 439–450
Asuncion A, Newman DJ (2007) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine. http://www.ics.uci.edu/~mlearn/MLRepository.html
Chen K, Liu L (2005) Privacy preserving data classification with rotation perturbation. In: Proceedings of the 5th IEEE international conference on data mining, Houston pp 589–592
Dobra A, Fienberg SE (2001) Bounds for cell entries in contingency tables induced by fixed marginal totals with applications to disclosure limitation. Stat J United Nations ECE 18, pp 363–371
Evfimievski A (2002) Randomization in privacy preserving data mining. SIGKDD Explor 4(2): 43–48
Evfimievski A, Gehrke J, Srikant R (2003) Limiting privacy breaches in privacy preserving data mining. In: Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART symposium on principles of database system, pp 211–222
Evfimievski A, Srikant R, Agrawal R, Gehrke J (2002) Privacy preserving mining of association rules. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining, Edmonton, pp 217–228
Guo S, Wu X (2006) On the use of spectral filtering for privacy preserving data mining. In: Proceedings of the 21st ACM symposium on applied computing, pp 622–626
Guo S, Wu X (2007) Deriving private information from arbitrarily projected data. In: Proceedings of the 11th Pacific-Asia conference on knowledge discovery and data mining, pp 84–95
Guo S, Wu X, Li Y (2006a) Deriving private information from perturbed data using iqr based approach. In: Proceedings of the 22nd international conference on data engineering workshops, pp 92–101
Guo S, Wu X, Li Y (2006b) On the lower bound of reconstruction error for spectral filtering based privacy preserving data mining. In: Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases, Berlin, pp 520–527
Huang Z, Du W, Chen B (2005) Deriving private information from randomized data. In: Proceedings of the ACM SIGMOD conference on management of data, Baltimore, pp 37–48
Kantarcioglu M, Jin J, Clifton C (2004) When do data mining results violate privacy? In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, pp 599–604
Kargupta H, Datta S, Wang Q, Sivakumar K (2003) On the privacy preserving properties of random data perturbation techniques. In: Proceedings of the 3rd international conference on data mining, pp 99–106
Kargupta H, Datta S, Wang Q, Sivakumar K (2005) Random-data perturbation techniques and privacy-preserving data mining. Knowl Inf Syst 7(4): 387–414
Liu K, Giannella C, Kargupta H (2006) An attacker’s view of distance preserving maps for privacy preserving data mining. In: Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases (PKDD’06), Berlin, pp 297–308
Liu K, Kargupta H, Ryan J (2006) Random projection based multiplicative data perturbation for privacy preserving distributed data mining. IEEE Trans Knowl Data Eng 18(1): 92–106
Stewart G, Sun J (1990) Matrix perturbation theory. Academic Press, London
Sung SY, Liu Y, Xiong H, Ng PA (2006) Privacy preservation for data cubes. Knowl Inf Syst 9(1): 38–61
Wang K, Fung BCM, Yu PS (2007) Handicapping attacker’s confidence: an alternative to k-anonymization. Knowl Inf Syst 11(3): 345–368
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Guo, S., Wu, X. & Li, Y. Determining error bounds for spectral filtering based reconstruction methods in privacy preserving data mining. Knowl Inf Syst 17, 217–240 (2008). https://doi.org/10.1007/s10115-008-0123-9
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-008-0123-9