Bayesian methods for dealing with missing data problems

Ma, Zhihua; Chen, Guanghui

doi:10.1016/j.jkss.2018.03.002

Review
Published: 13 April 2018

Volume 47, pages 297–313 (2018)

Journal of the Korean Statistical Society

Zhihua Ma¹ & Guanghui Chen¹

Abstract

Missing data, a common but challenging issue in most studies, may lead to biased and inefficient inferences if handled inappropriately. As a natural and powerful way of dealing with missing data, the Bayesian approach has received much attention in the literature. This paper reviews recent developments and applications of Bayesian methods for dealing with ignorable and non-ignorable missing data. We first introduce missing data mechanisms and the Bayesian framework for handling missing data, and then describe missing data models under ignorable and non-ignorable missingness, based on the literature. We then discuss important issues in Bayesian inference, including prior construction, posterior computation, model comparison and sensitivity analysis. Finally, we summarize several open issues that deserve further research.

Author information

Authors and Affiliations

Department of Statistics, School of Economics, Jinan University, Guangzhou, China
Zhihua Ma & Guanghui Chen

Corresponding author

Correspondence to Zhihua Ma.

About this article

Cite this article

Ma, Z., Chen, G. Bayesian methods for dealing with missing data problems. J. Korean Stat. Soc. 47, 297–313 (2018). https://doi.org/10.1016/j.jkss.2018.03.002

Received: 05 September 2017
Accepted: 09 March 2018
Published: 13 April 2018
Issue Date: September 2018
DOI: https://doi.org/10.1016/j.jkss.2018.03.002
