Abstract
Missing data, a common but challenging issue in most studies, may lead to biased and inefficient inferences if handled inappropriately. As a natural and powerful way of dealing with missing data, the Bayesian approach has received much attention in the literature. This paper reviews recent developments and applications of Bayesian methods for dealing with ignorable and non-ignorable missing data. We first introduce missing data mechanisms and the Bayesian framework for dealing with missing data, and then describe, based on the literature, missing data models for the ignorable and non-ignorable cases. After that, we discuss important issues in Bayesian inference, including prior construction, posterior computation, model comparison and sensitivity analysis. Finally, we summarize several open issues that deserve further research.
Cite this article
Ma, Z., Chen, G. Bayesian methods for dealing with missing data problems. J. Korean Stat. Soc. 47, 297–313 (2018). https://doi.org/10.1016/j.jkss.2018.03.002
DOI: https://doi.org/10.1016/j.jkss.2018.03.002