Abstract
Generalized canonical correlation analysis is a versatile technique for the joint analysis of several sets of data matrices. Its solution can be obtained from an eigenequation, and no distributional assumptions are required. In multiple-set data, some values are frequently missing. In this paper, two new methods for dealing with missing values in generalized canonical correlation analysis are introduced. The first approach, which does not require iterations, is a generalization of the Test Equating method available for principal component analysis. In the second approach, missing values are imputed in such a way that the generalized canonical correlation analysis objective function does not increase in subsequent steps. Convergence is achieved when the value of the objective function remains constant. By means of a simulation study, we assess the performance of the new methods and compare the results with those of two available methods: the missing-data passive method, introduced in Gifi’s homogeneity analysis framework, and the GENCOM algorithm developed by Green and Carroll. An application to World Bank data illustrates the proposed methods.
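To make the eigenequation concrete, the following is a minimal sketch of generalized canonical correlation analysis for complete data only. It assumes a Carroll-type formulation in which the group configuration consists of the leading eigenvectors of the sum of the orthogonal projectors onto the column spaces of the centered data sets; this is equivalent to minimizing the summed squared distance between the group configuration and its best linear approximation from each set. The function name `gcca`, the simulated data, and the use of NumPy are illustrative assumptions and are not taken from the paper. Neither of the proposed missing-value methods is implemented here.

```python
import numpy as np

def gcca(blocks, n_dims=2):
    """Complete-data GCCA sketch (Carroll-type formulation, assumed here).

    blocks: list of (n x p_k) arrays that share the same n rows.
    Returns the group configuration Y (n x n_dims) and its eigenvalues.
    """
    n = blocks[0].shape[0]
    total = np.zeros((n, n))
    for X in blocks:
        Xc = X - X.mean(axis=0)            # column-center each data set
        total += Xc @ np.linalg.pinv(Xc)   # orthogonal projector onto col(Xc)
    # Symmetric eigenproblem; eigh returns eigenvalues in ascending order.
    vals, vecs = np.linalg.eigh(total)
    order = np.argsort(vals)[::-1][:n_dims]
    return vecs[:, order], vals[order]

# Hypothetical example: three sets of four variables driven by two common factors.
rng = np.random.default_rng(0)
common = rng.normal(size=(50, 2))
blocks = [common @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(50, 4))
          for _ in range(3)]
Y, eigvals = gcca(blocks)
```

Each eigenvalue is bounded above by the number of data sets, so values close to that bound indicate dimensions that are nearly shared by all sets.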
References
Albers CJ, Gower JC (2010) A general approach to handling missing values in Procrustes analysis. Adv Data Anal Classif 4: 223–237
Bijmolt TH, Wedel M (1999) A comparison of multidimensional scaling methods for perceptual mapping. J Mark Res 36: 277–285
Borg I, Leutner D (1985) Measuring the similarity between MDS configurations. Multivar Behav Res 20: 325–334
Carroll JD (1968) Generalization of canonical correlation analysis to three or more sets of variables. In: Proceedings of the American Psychological Association, pp 227–228
Gifi A (1990) Nonlinear multivariate analysis. Wiley, Chichester
Green PE, Carroll JD (1988) A simple procedure for finding a composite of several multidimensional scaling solutions. J Acad Mark Sci 16: 25–35
Horst P (1961) Generalized canonical correlations and their applications to experimental data. J Clin Psychol 17: 331–347
Hotelling H (1936) Relations between two sets of variates. Biometrika 28: 321–377
Kettenring JR (1971) Canonical analysis of several sets of variables. Biometrika 58: 433–451
Little R, Rubin DB (1987) Statistical analysis with missing data. Wiley, New York
Magnus J, Neudecker H (1999) Matrix differential calculus with applications in statistics and econometrics. Wiley, Chichester
Meulman JJ (1982) Homogeneity analysis of incomplete data. DSWO Press, Leiden
Rubin DB (1987) Multiple imputation for nonresponse in surveys. Wiley, New York
Shibayama T (1988) Kessokuchi o fukumu testo sukoa no tahenryoukaiseki (multivariate analysis of test scores with missing data). Unpublished Doctoral Dissertation, Faculty of Education, University (in Japanese)
Shibayama T (1995) A linear composite method for test scores with missing values. Niigata daigaku kyouikugakubu kiyou (Memoirs of the Faculty of Education, Niigata University) 36: 445–455
Steenkamp J-BEM, Van Trijp HCM, Ten Berge JMF (1994) Perceptual mapping based on idiosyncratic sets of attributes. J Mark Res 31: 15–27
Takane Y (1995) Seiyakutsuki Shuseibunbunsekihou (Constrained principal component analysis). Asakurashoten, Tokyo
Takane Y, Oshima-Takane Y (2003) Relationships between two methods for dealing with missing data in principal component analysis. Behaviormetrika 30: 145–154
Ten Berge JMF, Kiers HAL, Commandeur JJF (1993) Orthogonal Procrustes rotation for matrices with missing values. British J Math Stat Psychol 46: 119–134
Van de Velden M, Bijmolt TH (2006) Generalized canonical correlation analysis of matrices with missing rows: a simulation study. Psychometrika 71: 323–331
Van der Burg E (1988) Nonlinear canonical correlation and some related techniques. DSWO Press, Leiden
Zanakis SH, Alvarez C, Li V (2007) Socio-economic determinants of HIV/AIDS pandemic and nations efficiencies. Eur J Oper Res 176: 1811–1838
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
van de Velden, M., Takane, Y. Generalized canonical correlation analysis with missing values. Comput Stat 27, 551–571 (2012). https://doi.org/10.1007/s00180-011-0276-y