Abstract
Multiple imputation is one of the most highly recommended procedures for dealing with missing data. However, to date little attention has been paid to methods for combining the results of principal component analyses (PCA) applied to a multiply imputed data set. In this paper we propose Generalized Procrustes analysis for this purpose; its centroid solution can be used as a final estimate of the component loadings. Convex hulls based on the loadings of the imputed data sets can be used to represent the uncertainty due to the missing data. In two simulation studies, the performance of the Generalized Procrustes approach is evaluated and compared with that of other methods. More specifically, we study how these methods behave when order changes of components and sign reversals of component loadings occur, such as in the case of near-equal eigenvalues, or of data having almost as many counterindicative items as indicative items. The simulations show that the other proposed methods either may run into serious problems or are not able to adequately assess accuracy in the presence of missing data. However, when the above situations do not occur, all methods provide adequate estimates of the PCA loadings.
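To make the combination step concrete, the following is a minimal sketch of the idea, not the authors' implementation. It assumes that each imputed data set has already yielded a loading matrix of the same shape, and it restricts Generalized Procrustes analysis to orthogonal rotations and reflections (Gower's general formulation also allows translation and scaling). The function names `procrustes_rotation` and `gpa_centroid`, the example loadings in `base`, and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np


def procrustes_rotation(A, target):
    """Orthogonal Q (rotation or reflection) minimising ||A @ Q - target||_F."""
    U, _, Vt = np.linalg.svd(A.T @ target)
    return U @ Vt


def gpa_centroid(loadings, n_iter=100, tol=1e-10):
    """Rotate every loading matrix towards a common centroid.

    Returns the centroid (the combined loading estimate) and the list of
    rotated loading matrices, one per imputed data set.
    """
    centroid = loadings[0].copy()  # assumption: start from the first solution
    prev_loss = np.inf
    for _ in range(n_iter):
        # Align each solution with the current centroid, then update it.
        rotated = [L @ procrustes_rotation(L, centroid) for L in loadings]
        centroid = np.mean(rotated, axis=0)
        loss = sum(np.sum((R - centroid) ** 2) for R in rotated)
        if prev_loss - loss < tol:
            break
        prev_loss = loss
    return centroid, rotated


# Hypothetical loadings for 6 variables on 2 components.
base = np.array([[0.8, 0.1], [0.7, 0.2], [0.6, -0.1],
                 [0.1, 0.7], [0.2, 0.8], [-0.1, 0.6]])

# Sign reversals and an order change, as can occur across imputed data sets.
flips = [np.eye(2),
         np.diag([-1.0, 1.0]),
         np.array([[0.0, 1.0], [1.0, 0.0]]),
         np.diag([1.0, -1.0])]

rng = np.random.default_rng(0)
loadings = [base @ P + rng.normal(scale=0.02, size=base.shape) for P in flips]

centroid, rotated = gpa_centroid(loadings)
naive = np.mean(loadings, axis=0)  # element-wise average without alignment

print(np.round(centroid, 2))
print(np.round(naive, 2))
```

In this sketch the element-wise average `naive` is distorted by the sign reversals and the order change, whereas the centroid is computed only after the solutions have been aligned. The rows of the matrices in `rotated` give, for each variable, one point per imputed data set; these are the points around which a convex hull could be drawn to display the uncertainty due to the missing data.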
Cite this article
van Ginkel, J.R., Kroonenberg, P.M. Using Generalized Procrustes Analysis for Multiple Imputation in Principal Component Analysis. J Classif 31, 242–269 (2014). https://doi.org/10.1007/s00357-014-9154-y
DOI: https://doi.org/10.1007/s00357-014-9154-y