Abstract
This paper proposes a method for replacing missing categorical data with “reasonable” imputations. The replacements are chosen to maximize the consistency of the completed data as measured by Guttman's squared correlation ratio. The text outlines a solution of the optimization problem, describes relationships with the relevant psychometric theory, and studies some properties of the method in detail. The main result is that the average correlation should be at least 0.50 before the method becomes practical. At that level, the technique gives reasonable results for up to 10–15% missing data.
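The idea of choosing replacements that maximize consistency can be illustrated with a small sketch. It is not the authors' algorithm: it assumes that consistency is measured by the largest nontrivial eigenvalue of a homogeneity analysis (multiple correspondence analysis) of the completed data, and it searches all candidate fill-ins by brute force, which is feasible only for toy data. The function names `consistency` and `impute_by_max_consistency` and the missing-value code `-1` are hypothetical.

```python
import itertools
import numpy as np

def consistency(data):
    """Largest nontrivial eigenvalue of a homogeneity analysis (MCA) of a
    complete, integer-coded data matrix (rows = objects, columns = variables).
    Assumption: this stands in for the maximized squared correlation ratio."""
    n, m = data.shape
    blocks = []
    for j in range(m):
        cats = np.unique(data[:, j])
        # One indicator (dummy) column per observed category of variable j.
        blocks.append((data[:, [j]] == cats[None, :]).astype(float))
    G = np.hstack(blocks)
    P = G / G.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals; centering removes the trivial dimension.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    singular_values = np.linalg.svd(S, compute_uv=False)
    return float(singular_values[0] ** 2)

def impute_by_max_consistency(data, missing=-1):
    """Brute-force search over all fill-ins of the missing cells, keeping the
    completed matrix with the highest consistency. Toy illustration only."""
    data = np.asarray(data)
    holes = np.argwhere(data == missing)
    # Candidate values for each hole: categories observed in that column.
    options = [np.unique(data[data[:, j] != missing, j]) for _, j in holes]
    best_score, best = -np.inf, None
    for combo in itertools.product(*options):
        filled = data.copy()
        for (i, j), value in zip(holes, combo):
            filled[i, j] = value
        score = consistency(filled)
        if score > best_score:
            best_score, best = score, filled
    return best, best_score

# Three binary items that agree perfectly, with one response missing.
toy = np.array([
    [0, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, -1],
    [0, 0, 0],
    [1, 1, 1],
])
completed, score = impute_by_max_consistency(toy)
print(completed[3, 2], round(score, 6))
```

In this toy case the search fills the missing cell with the category that keeps the three items in perfect agreement. The number of candidate fill-ins grows exponentially with the number of missing cells, so any practical use would need an iterative scheme rather than exhaustive enumeration.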
References
Dear, R. E. (1959). A principal component missing data method for multiple regression models (SP-86). Santa Monica, CA: System Development Corporation.
Fisher, W. D. (1958). On grouping for maximum homogeneity. Journal of the American Statistical Association, 53, 789–798.
Gifi, A. (1990). Nonlinear multivariate analysis. Chichester: Wiley.
Gleason, T. C., & Staelin, R. (1975). A proposal for handling missing data. Psychometrika, 40, 229–252.
Greenacre, M. J. (1984). Theory and applications of correspondence analysis. New York: Academic Press.
Guttman, L. (1941). The quantification of a class of attributes: A theory and method of scale construction. In P. Horst et al. (Eds.), The prediction of personal adjustment (pp. 319–348). New York: Social Science Research Council.
Hartigan, J. A. (1975). Clustering algorithms. New York: Wiley.
Hartley, H. O., & Hocking, R. R. (1971). The analysis of incomplete data. Biometrics, 27, 783–808.
Kalton, G., & Kasprzyk, D. (1982). Imputing for missing survey responses. Proceedings of the Section of Survey Research Methods, 1982 (pp. 22–23). Alexandria, VA: American Statistical Association.
Little, R. J. A., & Rubin, D. B. (1990). The analysis of social science data with missing values. In J. Fox & T. Scott Long (Eds.), Modern methods of data analysis (pp. 374–409). London: Sage.
Madow, W. G., Olkin, I., & Rubin, D. B. (Eds.). (1983). Incomplete data in sample surveys (Vols. 1–3). New York: Academic Press.
Meulman, J. (1982). Homogeneity analysis of incomplete data. Leiden: DSWO Press.
Milligan, G. W. (1980). An examination of the effect of six types of error perturbation of fifteen clustering algorithms. Psychometrika, 45, 325–342.
Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. Toronto: University of Toronto Press.
Nishisato, S., & Ahn, H. (in press). When not to analyze data: Decision making on missing responses in dual scaling. Annals of Operations Research.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Rubin, D. B. (1991). EM and beyond. Psychometrika, 56, 241–254.
Scheibler, D., & Schneider, W. (1985). Monte Carlo tests of the accuracy of cluster analysis algorithms. Multivariate Behavioral Research, 20, 283–304.
Späth, H. (1985). Cluster dissection and analysis. Chichester: Ellis Horwood.
Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82, 528–550.
van Buuren, S., & Heiser, W. J. (1989). Clustering n objects into k groups under optimal scaling of variables. Psychometrika, 54, 699–706.
van Buuren, S., & van Rijckevorsel, J. L. A. (1992). Data augmentation and optimal scaling. In R. Steyer, K. F. Wender, & K. F. Widaman (Eds.), Psychometric Methodology. Proceedings of the 7th European Meeting of the Psychometric Society in Trier (pp. 80–84). Stuttgart and New York: Gustav Fischer Verlag.
van der Heijden, P. G. M., & Escofier, B. (1989). Multiple correspondence analysis with missing data. Unpublished manuscript, University of Leiden, Department of Psychometrics and Research Methods.
van Rijckevorsel, J. L. A., & de Leeuw, J. (1992). Some results about the importance of knot selection in nonlinear multivariate analysis. Statistica Applicata: Italian Journal of Applied Statistics, 4.
Additional information
We thank Anneke Bloemhoff of NIPG-TNO for compiling the Dutch Life Style Survey data and making them available to us, and Chantal Houée and Thérèse Bardaine, IUT, Vannes, France, exchange students under the COMETT program of the EC, for computational assistance. We also thank Donald Rubin, the Editors, and several anonymous reviewers for constructive suggestions.
Cite this article
van Buuren, S., van Rijckevorsel, J.L.A. Imputation of missing categorical data by maximizing internal consistency. Psychometrika 57, 567–580 (1992). https://doi.org/10.1007/BF02294420
DOI: https://doi.org/10.1007/BF02294420