Abstract
This study investigated the performance of multiple imputation with the Expectation-Maximization (EM) algorithm and the Markov chain Monte Carlo (MCMC) method for imputing missing data. We compared imputation accuracy on real data under two extreme scenarios, and conducted both an empirical and a simulation study to examine the effects of the missing data rate and the number of items used for imputation. In the empirical study, the scenario was the item with the highest missing rate from the domain with the fewest items. In the simulation study, we selected the domain with the most items and imputed the item with the lowest missing rate. In the empirical study, there was no significant difference between the EM algorithm and the MCMC method for item imputation, and the number of items used for imputation had little impact. Compared with the actual observed values, the middle responses of 3 and 4 were over-imputed, and the extreme responses of 1, 2 and 5 were under-represented. Similar patterns occurred for domain imputation: there was no significant difference between the EM algorithm and the MCMC method, and the number of items used for imputation had little impact. In the simulation study, we chose the environmental domain to examine the effect of the following variables: imputation method (EM algorithm or MCMC method), missing data rate, and number of items used for imputation. Again, there was no significant difference between the EM algorithm and the MCMC method. Accuracy did not decrease significantly as the proportion of missing data increased. The number of items used for imputation contributed somewhat to imputation accuracy, but not as much as expected.
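To make the accuracy comparison concrete, the sketch below shows one way an EM-based imputation of a 5-point item could be scored against known values. It is not the procedure used in the study: it assumes a multivariate normal model, fills each missing value with its conditional mean (a single imputation, whereas multiple imputation would add random draws), and runs on synthetic data. The names `em_impute` and `likert_accuracy`, the sample size, and the missing rates are all hypothetical choices made for illustration.

```python
import numpy as np

def em_impute(X, n_iter=50, ridge=1e-6):
    """Fill NaNs with conditional means under a multivariate normal
    model whose mean and covariance are estimated by EM (assumed model)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    missing = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    filled = np.where(missing, mu, X)            # start from mean imputation
    sigma = np.cov(filled, rowvar=False) + ridge * np.eye(p)
    for _ in range(n_iter):
        cond_cov_sum = np.zeros((p, p))
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():                      # row with nothing observed
                filled[i] = mu
                cond_cov_sum += sigma
                continue
            # E-step: regress the missing items on the observed items
            coef = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)]).T
            filled[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
            cond_cov_sum[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ sigma[np.ix_(o, m)]
        # M-step: update mean and covariance from the completed data
        mu = filled.mean(axis=0)
        centered = filled - mu
        sigma = (centered.T @ centered + cond_cov_sum) / n
    return filled

def likert_accuracy(true_vals, imputed_vals):
    """Share of imputed values that match the true 1-5 response after rounding."""
    rounded = np.clip(np.rint(imputed_vals), 1, 5)
    return float(np.mean(rounded == true_vals))

# Synthetic 5-point items driven by one latent factor (illustrative only)
rng = np.random.default_rng(0)
n, p = 500, 8
latent = rng.normal(size=(n, 1))
complete = np.clip(np.rint(3 + 0.9 * latent + 0.6 * rng.normal(size=(n, p))), 1, 5)

for rate in (0.1, 0.3, 0.5):                     # hypothetical missing rates
    data = complete.copy()
    mask = rng.random(n) < rate                  # delete item 0 completely at random
    data[mask, 0] = np.nan
    imputed = em_impute(data)
    print(rate, likert_accuracy(complete[mask, 0], imputed[mask, 0]))
```

Because conditional means are pulled toward the center of the scale, rounding them tends to produce middle categories more often than extreme ones, which is consistent in spirit with the over-imputation of middle responses reported above.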
Cite this article
Lin, T.H. A comparison of multiple imputation with EM algorithm and MCMC method for quality of life missing data. Qual Quant 44, 277–287 (2010). https://doi.org/10.1007/s11135-008-9196-5
DOI: https://doi.org/10.1007/s11135-008-9196-5