Abstract
When microdata files are released for research, external users may attempt to breach confidentiality. For this reason, most National Statistical Institutes apply some form of disclosure risk assessment and data protection. Risk assessment first requires a measure of disclosure risk to be defined. In this paper we build on previous work by [BF98] to define a Bayesian hierarchical model for risk estimation, following a superpopulation approach similar to [BKP90] and [Rin03]. For each combination of values of the key variables, we derive the posterior distribution of the population frequency given the observed sample frequency. This posterior distribution yields summaries that can be used to estimate the risk of disclosure. One such summary is the mean of the reciprocal of the population frequency, known as the Benedetti-Franconi risk; we also investigate others, such as the mode. We apply our approach to an artificial sample drawn from the 1991 Italian Census data by means of a widely used sampling scheme. We report the results of this application and document the computational difficulties we encountered. The risk estimates we obtain are sensible, but they suggest possible improvements and modifications to our methodology, which we discuss together with potential alternative strategies.
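To make the idea of a posterior summary concrete, here is a minimal sketch that computes two summaries for a single key cell: the posterior mean of 1/F (the Benedetti-Franconi risk) and the posterior mode of F. It does not implement the hierarchical model of this paper. As a simplifying assumption, it takes the posterior of the population frequency F given the sample frequency f to be negative binomial, P(F = h | f) = C(h-1, f-1) p^f (1-p)^(h-f) for h >= f, with a known sampling fraction p. The function name `posterior_summaries` and its `tol`/`max_terms` parameters are hypothetical, chosen only for illustration.

```python
import math


def posterior_summaries(f: int, p: float, tol: float = 1e-12, max_terms: int = 10_000_000):
    """Posterior mean of 1/F and posterior mode of F for one key cell.

    ASSUMPTION (illustrative, not the paper's hierarchical model): the
    posterior of the population frequency F given sample frequency f is
        P(F = h | f) = C(h-1, f-1) * p**f * (1-p)**(h-f),  h = f, f+1, ...
    with a known sampling fraction p.
    """
    if f < 1:
        raise ValueError("f must be a positive sample frequency")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")

    log_p, log_q = math.log(p), math.log1p(-p)
    risk, mass = 0.0, 0.0          # running E[1/F | f] and total probability summed
    mode, best = f, -math.inf      # running argmax of the pmf
    h = f
    # Sum the series until the neglected tail probability is below tol.
    while mass < 1.0 - tol and h - f < max_terms:
        # log C(h-1, f-1) via lgamma, so large h does not overflow.
        log_pmf = (math.lgamma(h) - math.lgamma(f) - math.lgamma(h - f + 1)
                   + f * log_p + (h - f) * log_q)
        pmf = math.exp(log_pmf)
        risk += pmf / h
        mass += pmf
        if log_pmf > best:
            best, mode = log_pmf, h
        h += 1
    return risk, mode
```

For f = 1 the series has the closed form -p log(p) / (1 - p), which gives a quick check: with p = 0.1 the risk is about 0.256 and the mode is 1, because a sample unique is most likely also a population unique under this assumed posterior. Truncating the series, rather than using a special-function closed form for general f, keeps the sketch short at the cost of more terms when p is small.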
References
Abramowitz, M., Stegun, I.A.: Handbook of Mathematical Functions. Dover, New York (1965)
Benedetti, R., Franconi, L.: Statistical and technological solutions for controlled data dissemination. In: Pre-proceedings of New Techniques and Technologies for Statistics, Sorrento, June 4-6, vol. 1, pp. 225–232 (1998)
Bethlehem, J., Keller, W., Pannekoek, J.: Disclosure control of microdata. Journal of the American Statistical Association 85, 38–45 (1990)
Butler, R.W., Wood, A.T.A.: Laplace approximations for hypergeometric functions with matrix argument. Technical Report 00-05, School of Mathematical Sciences, University of Nottingham, UK (2000)
Carlson, M.: Assessing microdata disclosure risk using the Poisson-inverse Gaussian distribution. Statistics in Transition 5, 901–925 (2002)
Di Consiglio, L., Franconi, L., Seri, G.: Assessing individual risk of disclosure: an experiment. In: Proceedings of the Joint ECE/Eurostat Work Session on Statistical Data Confidentiality, Luxembourg, April 7-9 (2003)
Duncan, G.T., Lambert, D.: The risk of disclosure for microdata. Journal of Business and Economic Statistics 7, 207–217 (1989)
Deville, J.C., Särndal, C.E.: Calibration estimators in survey sampling. Journal of the American Statistical Association 87, 367–382 (1992)
Elamir, E.A.H., Skinner, C.J.: Modeling the re-identification risk per record in microdata. Technical report, Southampton Statistical Sciences Research Institute, University of Southampton, UK (2004)
Fienberg, S.E., Makov, U.E.: Confidentiality, uniqueness, and disclosure limitation for categorical data. Journal of Official Statistics 14, 385–397 (1998)
Ghosh, M., Rao, J.N.K.: Small area estimation: An appraisal (with comments). Statistical Science 9, 55–76 (1994)
Lambert, D.: Measures of disclosure risk and harm. Journal of Official Statistics 9, 313–331 (1993)
Polettini, S.: Some remarks on the individual risk methodology. In: Proceedings of the Joint ECE/Eurostat Work Session on Statistical Data Confidentiality, Luxembourg, April 7-9 (2003)
Rinott, Y.: On models for statistical disclosure risk estimation. In: Proceedings of the Joint ECE/Eurostat Work Session on Statistical Data Confidentiality, Luxembourg, April 7-9 (2003)
Skinner, C.J., Elliot, M.J.: A measure of disclosure risk for microdata. Journal of the Royal Statistical Society, Series B 64, 855–867 (2002)
Skinner, C.J., Holmes, D.J.: Estimating the re-identification risk per record in microdata. Journal of Official Statistics 14, 361–372 (1998)
Willenborg, L., de Waal, T.: Elements of Statistical Disclosure Control. Springer, New York (2001)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Polettini, S., Stander, J. (2004). A Bayesian Hierarchical Model Approach to Risk Estimation in Statistical Disclosure Limitation. In: Domingo-Ferrer, J., Torra, V. (eds) Privacy in Statistical Databases. PSD 2004. Lecture Notes in Computer Science, vol 3050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25955-8_19
DOI: https://doi.org/10.1007/978-3-540-25955-8_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22118-0
Online ISBN: 978-3-540-25955-8
eBook Packages: Springer Book Archive