1 Factor Analysis and Principal Component Analysis

Factor analysis (FA) and principal component analysis (PCA) are frequently used multivariate statistical methods for data reduction. In FA (Anderson, 2003; Lawley & Maxwell, 1971), the p-dimensional mean-centered vector of the observed variables \( \varvec{y}_{i} \), \( i = 1, \ldots ,n \), is linearly related to an m-dimensional vector of latent factors \( \varvec{f}_{i} \) via \( \varvec{y}_{i} = {\varvec{\Lambda}}\varvec{f}_{i} + \varvec{\varepsilon}_{i} \), where \( {\varvec{\Lambda}} = (\varvec{\lambda}_{1} , \ldots ,\varvec{\lambda}_{m} ) \) is a p × m matrix of factor loadings (with p > m), and \( \varvec{\varepsilon}_{i} \) is a p-dimensional vector of errors. For the orthogonal factor model, the following three assumptions are typically imposed: (i) \( \varvec{f}_{i} \sim N_{m} ({\varvec{0}},\varvec{I}_{m} ) \); (ii) \( \varvec{\varepsilon}_{i} \sim N_{p} ({\varvec{0}},{\varvec{\Psi}}) \), where \( {\varvec{\Psi}} \) is a diagonal matrix with positive elements on the diagonal; (iii) \( {{Cov}}(\varvec{f}_{i} ,\varvec{\varepsilon}_{i} ) = {\varvec{0}} \). Under these three assumptions, the covariance matrix of \( \varvec{y}_{i} \) is given by \( {\varvec{\Sigma}} = {\varvec{\Lambda}} {\varvec{\Lambda}}^{\prime } + {\varvec{\Psi}} \). If \( \varvec{y}_{i} \) is standardized, \( {\varvec{\Sigma}} \) is a correlation matrix.
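To make the setup concrete, the following minimal sketch (not part of the original development; the loading values, seed, and variable names are hypothetical, and numpy is assumed) simulates data from the orthogonal factor model and checks that the model-implied covariance \( {\varvec{\Lambda}} {\varvec{\Lambda}}^{\prime } + {\varvec{\Psi}} \) matches the sample covariance for large n.

```python
# Illustrative sketch: simulate y_i = Lambda f_i + eps_i and verify
# Sigma = Lambda Lambda' + Psi (all values hypothetical).
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 6, 2, 100_000

Lambda = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.6, 0.0],
                   [0.0, 0.8],
                   [0.0, 0.7],
                   [0.0, 0.6]])                         # p x m loading matrix
Psi = np.diag(1.0 - np.sum(Lambda**2, axis=1))          # diagonal unique variances

F = rng.standard_normal((n, m))                         # f_i ~ N_m(0, I_m)
E = rng.standard_normal((n, p)) * np.sqrt(np.diag(Psi)) # eps_i ~ N_p(0, Psi)
Y = F @ Lambda.T + E                                    # mean-centered observations

Sigma_model = Lambda @ Lambda.T + Psi                   # implied covariance
Sigma_sample = np.cov(Y, rowvar=False)                  # sample covariance
print(np.max(np.abs(Sigma_model - Sigma_sample)))       # small for large n
```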

Let \( {\varvec{\Lambda}}^{ + } = (\varvec{\lambda}_{1}^{ + } , \ldots ,\varvec{\lambda}_{m}^{ + } ) \) be the p × m matrix whose columns are the standardized eigenvectors corresponding to the m largest eigenvalues of \( {\varvec{\Sigma}} \); \( {\varvec{\Omega}} = diag(\varvec{\omega}) \) be the m × m diagonal matrix whose diagonal elements \( \varvec{\omega}= (\omega_{1} , \ldots ,\omega_{m} )^{\prime} \) are the m largest eigenvalues of \( {\varvec{\Sigma}} \); and \( {\varvec{\Omega}}^{1/2} \) be the m × m diagonal matrix whose diagonal elements are the square roots of those in \( {\varvec{\Omega}} \). Then the principal components (PCs) (cf. Anderson, 2003) with m elements are obtained as \( \varvec{f}_{i}^{*} = {\varvec{\Lambda}}^{ + \prime } \varvec{y}_{i} \). Clearly, the PCs are uncorrelated, with diagonal covariance matrix \( {\varvec{\Lambda}}^{ + \prime } {\varvec{\Sigma \Lambda }}^{ + } = {\varvec{\Omega}} \). When m is properly chosen, \( {\varvec{\Sigma}} \approx {\varvec{\Lambda}}^{ + } {\varvec{\Omega}} {\varvec{\Lambda}}^{{ + {\prime }}} = {\varvec{\Lambda}}^{\text{*}} {\varvec{\Lambda}}^{*\prime } \), where \( {\varvec{\Lambda}}^{*} = {\varvec{\Lambda}}^{ + } {\varvec{\Omega}}^{1/2} \) is the p × m matrix of PCA loadings.
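As an illustration of these definitions, the PCA loadings \( {\varvec{\Lambda}}^{*} = {\varvec{\Lambda}}^{ + } {\varvec{\Omega}}^{1/2} \) can be computed directly from the eigendecomposition of \( {\varvec{\Sigma}} \). The sketch below is ours (the function name pca_loadings and the compound-symmetry example matrix are hypothetical choices for illustration).

```python
# Sketch: PCA loadings from the eigendecomposition of Sigma.
import numpy as np

def pca_loadings(Sigma, m):
    """Return the p x m matrix of PCA loadings Lambda* = Lambda^+ Omega^{1/2}."""
    eigvals, eigvecs = np.linalg.eigh(Sigma)       # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:m]            # indices of the m largest eigenvalues
    Lambda_plus = eigvecs[:, idx]                  # standardized eigenvectors
    return Lambda_plus * np.sqrt(eigvals[idx])     # scale each column by sqrt(omega_k)

# Usage with a hypothetical one-factor correlation matrix (p = 5, rho = 0.5):
p, rho = 5, 0.5
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
Lambda_star = pca_loadings(Sigma, m=1)
print(Lambda_star.ravel())                         # close to sqrt(rho) * 1_p for large p
```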

2 Closeness Conditions Between Factor Analysis and Principal Component Analysis

It is well known that FA and PCA often yield approximately the same results, especially their estimated loading matrices \( {\varvec{\hat{\Lambda }}} \) and \( {\varvec{\hat{\Lambda }}}^{*} \), respectively (e.g., Velicer & Jackson, 1990). Conditions under which the two matrices are close to each other are therefore of substantial interest. At the population level, two such conditions, identified by Guttman (1956) and Schneeweiss (1997), are among the most well known.

2.1 Guttman Condition

Consider the factor analysis model \( {\varvec{\Sigma}} = {\varvec{\Lambda}} {\varvec{\Lambda}}^{\prime } + {\varvec{\Psi}}, \) where \( {\varvec{\Psi}} \) is a diagonal unique variance matrix, with \( ({\varvec{\Sigma}}^{ - 1} )_{jj} = \sigma^{jj} \) and \( ({\varvec{\Psi}})_{jj} = \psi_{jj} \), \( j = 1, \ldots ,p \), and let m be the number of common factors. Guttman (1956; see also Theorem 1 of Krijnen, 2006) showed that if \( m/p \to 0 \) as \( p \to \infty \), then \( \psi_{jj} \sigma^{jj} \to 1 \) for almost all j. Here, “for almost all j” means \( \lim_{p \to \infty } \# \{ j :\,\psi_{jj} \sigma^{jj} < 1\} /p = 0 \). That is, the number of indices j satisfying \( \psi_{jj} \sigma^{jj} < 1 \) becomes negligible relative to p as p goes to infinity.
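A quick numeric check of this behavior (a sketch with hypothetical one-factor loadings drawn uniformly from (0.5, 0.9); not taken from Guttman or Krijnen) shows \( \psi_{jj} \sigma^{jj} \) rising toward 1 from below as p grows.

```python
# Sketch: psi_jj * sigma^jj approaches 1 as p grows (one hypothetical factor).
import numpy as np

rng = np.random.default_rng(1)

def guttman_products(p):
    lam = rng.uniform(0.5, 0.9, size=p)          # one-factor loadings
    psi = 1.0 - lam**2                           # unique variances (standardized variables)
    Sigma = np.outer(lam, lam) + np.diag(psi)
    sigma_jj = np.diag(np.linalg.inv(Sigma))     # diagonal of Sigma^{-1}
    return psi * sigma_jj

for p in (10, 100, 1000):
    print(p, guttman_products(p).min())          # rises toward 1 from below
```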

2.2 Schneeweiss Condition

The closeness condition between the loading matrix from FA and that from PCA given by Schneeweiss and Mathes (1995) and Schneeweiss (1997) is \( ev_{m} ({\varvec{\Lambda}}^{\prime} {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}}) \to \infty \), where \( ev_{k} (\varvec{A}) \) is the \( k \)-th largest eigenvalue of a square matrix \( \varvec{A} \). Because \( {\varvec{\Lambda}}^{\prime} {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}} \) is an m × m matrix, \( ev_{m} ({\varvec{\Lambda}}^{\prime} {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}}) \) is its smallest eigenvalue.

Related to the Schneeweiss condition, Bentler (1976) parameterized the correlation structure of the factor model as \( {\varvec{\Psi}}^{ - 1/2} {\varvec{\Sigma}} {\varvec{\Psi}}^{ - 1/2} = {\varvec{\Psi}}^{ - 1/2} {\varvec{\Lambda}} {\varvec{\Lambda}}^{\prime } {\varvec{\Psi}}^{ - 1/2} + \varvec{I}_{p} \) and showed that, under this parameterization, a necessary condition for \( ev_{m} ({\varvec{\Lambda}}^{\prime } {\varvec{\Psi}}^{ - 1} {\varvec{\Lambda}}) = ev_{m} ({\varvec{\Psi}}^{ - 1/2} {\varvec{\Lambda}} {\varvec{\Lambda}}^{\prime } {\varvec{\Psi}}^{ - 1/2} ) \to \infty \) is that, as \( p \) increases, the sum of squared loadings on each factor goes to infinity (\( \varvec{\lambda}_{k}^{\prime }\varvec{\lambda}_{k} \to \infty \), \( k = 1, \ldots ,m \), as \( p \to \infty \)).

2.3 Relationship Between Guttman and Schneeweiss Conditions

The relationship between the Guttman and Schneeweiss conditions is summarized in Table 1. The Schneeweiss condition \( ( {ev_{m} ({\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}}) \to \infty } ) \) is sufficient for the Guttman condition (\( m/p \to 0 \) as \( p \to \infty \)) (Krijnen, 2006, Theorem 3). What we would like is for the converse (\( m/p \to 0 \) as \( p \to \infty \Rightarrow ev_{m} ({\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}}) \to \infty \)) to hold in practical applications, as discussed in the next section.

Table 1 Relationships among conditions and results

First, the condition of \( m/p \to 0 \) as \( p \to \infty \) is sufficient for \( \psi_{jj} \sigma^{jj} \to 1 \) for almost all \( j \) (Guttman, 1956; Krijnen, 2006, Theorem 1). Also, \( \psi_{jj} \sigma^{jj} \to 1 \) for all \( j \) implies \( ev_{m} ({\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}}) \to \infty \) (Krijnen, 2006, Theorem 4). Here, “\( \psi_{jj} \sigma^{jj} \to 1 \) for all \( j \)” is slightly stronger than “\( \psi_{jj} \sigma^{jj} \to 1 \) for almost all \( j \).” In practice, however, it seems reasonable to assume that the number of loadings on every factor increases proportionally with \( p \), as stated in Bentler (1976). Then the condition of \( m/p \to 0 \) as \( p \to \infty \) becomes equivalent to \( ev_{m} ({\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}}) \to \infty \); that is, the Guttman and Schneeweiss conditions become interchangeable.
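The proportional-growth assumption can be checked numerically. In the sketch below (a hypothetical loading pattern in which each variable loads 0.7 on exactly one of m = 3 factors), \( ev_{m} ({\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}}) \) grows roughly linearly in p, so the two conditions hold or fail together.

```python
# Sketch: when loadings per factor grow with p, ev_m(Lambda' Psi^{-1} Lambda) diverges.
import numpy as np

def smallest_eigenvalue(p, m=3, load=0.7):
    Lambda = np.zeros((p, m))
    Lambda[np.arange(p), np.arange(p) % m] = load          # each factor gets about p/m loadings
    Psi_inv = np.diag(1.0 / (1.0 - np.sum(Lambda**2, axis=1)))
    return np.linalg.eigvalsh(Lambda.T @ Psi_inv @ Lambda).min()

for p in (30, 300, 3000):
    print(p, smallest_eigenvalue(p))                       # grows roughly linearly in p
```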

3 Extended Guttman Condition

By far the most important consequence of the Schneeweiss condition is that, when \( ev_{m} ({\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}}) \to \infty \), the second term on the right-hand side of the Sherman-Morrison-Woodbury formula (see, e.g., Chap. 16 of Harville, 1997):

$$ {\varvec{\Sigma}}^{ - 1} = {\varvec{\Psi}}^{ - 1} - {\varvec{\Psi}}^{ - 1} {\varvec{\Lambda}}(\varvec{I}_{m} + {\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}})^{ - 1} {\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} $$
(1)

vanishes, so that

$$ {\varvec{\Psi}}^{ - 1} - {\varvec{\Sigma}}^{ - 1} \to {\varvec{0}}\quad {\text{as}}\quad ev_{m} ({\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}}) \to \infty $$
(2)

As we noted in the previous section, the condition of \( m/p \to 0 \) as \( p \to \infty \) can be equivalent to \( ev_{m} ({\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}}) \to \infty \) in practical applications. Therefore, we have \( {\varvec{\Psi}}^{ - 1} - {\varvec{\Sigma}}^{ - 1} \to {\varvec{0}} \) in high dimensions with large \( p \). We call \( {\varvec{\Psi}}^{ - 1} - {\varvec{\Sigma}}^{ - 1} \to {\varvec{0}} \) the extended Guttman condition. It extends the original Guttman condition in the sense that \( \psi_{jj} \sigma^{jj} \to 1 \) can be expressed as \( \psi_{jj}^{ - 1} - \sigma^{jj} \to 0 \), as long as \( \psi_{jj} \) is bounded away from zero and bounded above \( (0 < \psi_{\inf } \le \psi_{jj} \le \psi_{\sup } < \infty ) \).
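The following sketch (again with a hypothetical loading pattern; the helper name woodbury_gap is ours) computes the correction term of Eq. (1) directly. By Eq. (1) that term equals \( {\varvec{\Psi}}^{ - 1} - {\varvec{\Sigma}}^{ - 1} \), and its largest element shrinks as p grows.

```python
# Sketch: the Woodbury correction term, i.e., Psi^{-1} - Sigma^{-1}, vanishes as p grows.
import numpy as np

rng = np.random.default_rng(2)

def woodbury_gap(p, m=2):
    Lambda = np.zeros((p, m))
    Lambda[np.arange(p), np.arange(p) % m] = rng.uniform(0.5, 0.9, size=p)
    Psi_inv = np.diag(1.0 / (1.0 - np.sum(Lambda**2, axis=1)))
    middle = np.linalg.inv(np.eye(m) + Lambda.T @ Psi_inv @ Lambda)
    correction = Psi_inv @ Lambda @ middle @ Lambda.T @ Psi_inv   # second term of Eq. (1)
    return np.max(np.abs(correction))                             # = max |Psi^{-1} - Sigma^{-1}|

for p in (10, 100, 1000):
    print(p, woodbury_gap(p))    # decreases roughly like 1/p
```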

Note that there exists a similar identity for the FA model:

$$ {\varvec{\Psi}}^{ - 1} - {\varvec{\Psi}}^{ - 1} {\varvec{\Lambda}}({\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}})^{ - 1} {\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} = {\varvec{\Sigma}}^{ - 1} - {\varvec{\Sigma}}^{ - 1} {\varvec{\Lambda}}({\varvec{\Lambda}}^{\prime } {\varvec{\Sigma }}^{ - 1} {\varvec{\Lambda}})^{ - 1} {\varvec{\Lambda}}^{\prime } {\varvec{\Sigma }}^{ - 1} $$
(3)

(see, e.g., Hayashi & Bentler, 2001). Clearly, as \( ev_{m} ({\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}}) \to \infty \), not only the second term on the left-hand side of Eq. (3) but also the second term on its right-hand side vanishes.

As we have just seen, the extended Guttman condition is a direct consequence of the Schneeweiss condition. Note that \( {\varvec{\Psi}}^{ - 1} {\varvec{\Lambda}}(\varvec{I}_{m} + {\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}})^{ - 1} {\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} < {\varvec{\Psi}}^{ - 1} {\varvec{\Lambda}}({\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}})^{ - 1} {\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} \), and that \( \varvec{I}_{m} + {\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}} \) is only slightly larger than \( {\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}} \) when the latter is large, in the sense that \( ev_{m} (\varvec{I}_{m} + {\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}}) = ev_{m} ({\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}}) + 1 \). Consequently, the speed of convergence in \( {\varvec{\Psi}}^{ - 1} - {\varvec{\Sigma}}^{ - 1} \to {\varvec{0}} \) is approximately the reciprocal of the smallest eigenvalue of \( {\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}} \), that is, \( 1/ev_{m} ({\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}}) \).
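The claimed rate can be eyeballed numerically. Under a hypothetical one-factor model with equal loadings (a sketch, not part of the original argument), the product of the elementwise gap \( \max_{j,l} |({\varvec{\Psi}}^{ - 1} - {\varvec{\Sigma}}^{ - 1} )_{jl} | \) and \( ev_{m} ({\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}}) \) stays roughly constant across p, consistent with a \( 1/ev_{m} \) rate.

```python
# Sketch: gap * ev_m stays roughly constant, suggesting a 1/ev_m convergence rate.
import numpy as np

def gap_times_eigenvalue(p, load=0.7):
    lam = np.full(p, load)                        # one-factor loadings, all equal
    psi = 1.0 - lam**2
    Sigma = np.outer(lam, lam) + np.diag(psi)
    gap = np.max(np.abs(np.diag(1.0 / psi) - np.linalg.inv(Sigma)))
    ev = lam @ (lam / psi)                        # scalar Lambda' Psi^{-1} Lambda (m = 1)
    return gap * ev

for p in (10, 100, 1000):
    print(p, gap_times_eigenvalue(p))             # roughly constant across p
```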

4 Approximation of the Inverse of the Covariance Matrix

An important point to note here is that the original Guttman condition of \( \psi_{jj} \sigma^{jj} \to 1 \) (for almost all \( j \)) involves only the diagonal elements of \( {\varvec{\Psi}} \) (or \( {\varvec{\Psi}}^{ - 1} \)) and \( {\varvec{\Sigma}}^{ - 1} \), whereas \( {\varvec{\Psi}}^{ - 1} - {\varvec{\Sigma}}^{ - 1} \to {\varvec{0}} \) involves both the diagonal and the off-diagonal elements of the matrices. The extended condition justifies the interchangeability of \( {\varvec{\Sigma}}^{ - 1} \) and \( {\varvec{\Psi}}^{ - 1} \) as \( ev_{m} ({\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}}) \to \infty \) or, assuming that the number of loadings on every factor increases proportionally with \( p \), as \( m/p \to 0 \) with \( p \to \infty \). The important implication is that all the off-diagonal elements of \( {\varvec{\Sigma}}^{ - 1} \) approach zero in the limit. Thus, the result reflects the sparsity of the off-diagonal elements of the inverted covariance (correlation) matrix in high dimensions.

One obvious advantage of being able to approximate \( {\varvec{\Sigma}}^{ - 1} \) by \( {\varvec{\Psi}}^{ - 1} \) in high dimensions is that the matrix of unique variances \( {\varvec{\Psi}} \) is diagonal and thus can be inverted with only \( p \) operations. Note that, in general, inverting a \( p \)-dimensional square matrix requires on the order of \( O(p^{3} ) \) operations (see, e.g., Pourahmadi, 2013, p. 121).

Consequently, the single most important application of the extended Guttman condition is to approximate the inverse of the covariance matrix \( {\varvec{\Sigma}}^{ - 1} \) by \( {\varvec{\Psi}}^{ - 1} \) in high dimensions. This matters because \( {\varvec{\Sigma}}^{ - 1} \) appears in the quadratic form of the log likelihood function of the multivariate normal distribution. Even if \( {\varvec{\Sigma}} \) is positive definite so that \( {\varvec{\Sigma}}^{ - 1} \) exists in the population, the inverse \( \varvec{S}^{ - 1} \) of the sample covariance matrix \( \varvec{S} \) does not exist in high dimensions when \( p > n \). When \( \varvec{S}^{ - 1} \) does not exist, we cannot estimate \( {\varvec{\Psi}}^{ - 1} \) under the FA model using generalized least squares (GLS) or maximum likelihood (ML) without resorting to some regularization method(s). Thus, a natural choice is to employ the unweighted least squares (ULS) estimation method, which minimizes the fit function \( F_{ULS} (\varvec{S},{\varvec{\Sigma}}) = tr\{ (\varvec{S} - {\varvec{\Sigma}})^{2} \} \) and does not require computing \( \varvec{S}^{ - 1} \) or an estimate of \( {\varvec{\Sigma}}^{ - 1} \). Note that \( 1 - 1/s^{jj} \), a common initial value for the j-th communality, cannot be used because it requires the computation of \( \varvec{S}^{ - 1} \). Instead, we can use the value of 1 for the initial communality estimates; in this case, the initial solution is identical to PCA.
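A minimal sketch of one such ULS scheme is given below (an alternating algorithm written out for illustration, not code from this article): starting from unit communalities, so that the first iterate coincides with PCA, it alternates between the truncated eigendecomposition of \( \varvec{S} - {\varvec{\Psi}} \), which minimizes \( F_{ULS} \) over the loadings for fixed \( {\varvec{\Psi}} \), and the diagonal update \( {\varvec{\Psi}} = diag(\varvec{S} - {\varvec{\Lambda}} {\varvec{\Lambda}}^{\prime } ) \). No inverse of \( \varvec{S} \) is ever computed.

```python
# Sketch: ULS factor extraction by alternating eigendecomposition updates.
import numpy as np

def uls_fa(S, m, n_iter=200, tol=1e-8):
    """Unit initial communalities make the first iterate identical to PCA; S is never inverted."""
    p = S.shape[0]
    psi = np.zeros(p)                                # unit initial communalities (correlation matrix)
    for _ in range(n_iter):
        w, V = np.linalg.eigh(S - np.diag(psi))      # reduced covariance (correlation) matrix
        idx = np.argsort(w)[::-1][:m]                # m largest eigenvalues
        Lam = V[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))
        psi_new = np.diag(S) - np.sum(Lam**2, axis=1)    # Psi = diag(S - Lam Lam')
        if np.max(np.abs(psi_new - psi)) < tol:
            psi = psi_new
            break
        psi = psi_new
    return Lam, psi

# Usage with a hypothetical one-factor correlation matrix:
p, rho = 50, 0.5
S = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
Lam, psi = uls_fa(S, m=1)
print(Lam[:3].ravel(), psi[:3])   # loadings near sqrt(rho) up to sign; uniquenesses near 1 - rho
```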

Alternatively, when \( p \) is huge, we can employ the following “approximate” FA model with equal unique variances (e.g., Hayashi & Bentler, 2000), using standardized variables, that is, applying it to the correlation matrix:

$$ {\varvec{\Sigma}} \approx {\varvec{\Lambda}}^{*} {\varvec{\Lambda}}^{{*^{\prime}}} + k\varvec{I}_{p} , $$
(4)

with a positive constant \( k \). Note that this model is also called probabilistic PCA in statistics (Tipping & Bishop, 1999). Use of the FA model with equal unique variances seems reasonable, because the eigenvectors of \( ({\varvec{\Sigma}} - k\varvec{I}_{p} ) \) are the same as the eigenvectors of \( {\varvec{\Sigma}} \), and the eigenvalues of \( ({\varvec{\Sigma}} - k\varvec{I}_{p} ) \) are smaller than the eigenvalues of \( {\varvec{\Sigma}} \) by exactly the constant \( k \). Thus, the FA model with equal unique variances can be considered a variant of PCA, and the FA and PCA loading matrices approach the same limiting values as \( ev_{m} ({\varvec{\Lambda}}^{\prime } {\varvec{\Psi }}^{ - 1} {\varvec{\Lambda}}) \to \infty \); that is, they become essentially equivalent in high dimensions.

In Eq. (4), let \( {\varvec{\Psi}}^{\text{*}} = k\varvec{I}_{p} \), so that \( {\varvec{\Psi}}^{* - 1} = k^{ - 1} \varvec{I}_{p} \). Thus, we can use \( {\varvec{\Psi}}^{* - 1} \) as a quick approximation to \( {\varvec{\Psi}}^{ - 1} \). A natural estimator of \( k \) is the MLE for \( k \) given \( {\varvec{\Lambda}}^{*} \) (Tipping & Bishop, 1999):

$$ \hat{k} = \frac{1}{p - m}\sum\limits_{j = m + 1}^{p} {ev_{j} (\varvec{S})} . $$
(5)

However, a more practical method seems to be the following: Once \( {\varvec{\Psi}}^{\text{*}} = k\varvec{I}_{p} \) is estimated by \( {\hat{\varvec{\Psi }}}^{\text{*}} = \hat{k}\varvec{I}_{p} \), we can compute the loadings \( \hat{\Lambda }^{*} \) using the eigenvalues and eigenvectors of \( (\varvec{S} - \hat{k}\varvec{I}_{p} ) \) and then update the estimate of \( {\varvec{\Psi}}^{*} \) as \( {\hat{\varvec{\Psi }}}^{*} = diag(\varvec{S} - \hat{\Lambda }^{*} \hat{\Lambda }^{*\prime } ) \). Note that this updated \( {\hat{\varvec{\Psi }}}^{*} \) is no longer a constant times the identity matrix. Now, invoke the estimator version of the extended Guttman condition, \( {\hat{\varvec{\Psi }}}^{* - 1} - {\hat{\varvec{\Sigma }}}^{ - 1} \approx {\varvec{0}} \), to obtain the approximate estimator \( {\hat{\varvec{\Sigma }}}^{ - 1} \) of \( {\varvec{\Sigma}}^{ - 1} \).
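Putting the recipe together, the sketch below (the function name approx_precision and the use of numpy are our illustrative choices) estimates k by Eq. (5), forms loadings from the eigenpairs of \( \varvec{S} - \hat{k}\varvec{I}_{p} \), updates \( {\hat{\varvec{\Psi }}}^{*} \), and returns its inverse, a diagonal matrix, as the approximation to \( {\varvec{\Sigma}}^{ - 1} \).

```python
# Sketch: approximate Sigma^{-1} via the extended Guttman condition, without inverting S.
import numpy as np

def approx_precision(S, m):
    """Estimate k by Eq. (5), form loadings from (S - k I), return diag(1 / psi*)."""
    w, V = np.linalg.eigh(S)
    order = np.argsort(w)[::-1]                       # eigenvalues, largest first
    w, V = w[order], V[:, order]
    k_hat = w[m:].mean()                              # Eq. (5): mean of the p - m smallest eigenvalues
    Lam = V[:, :m] * np.sqrt(np.clip(w[:m] - k_hat, 0.0, None))   # loadings from (S - k I)
    psi_star = np.diag(S - Lam @ Lam.T)               # updated unique variances (no longer k I)
    return np.diag(1.0 / psi_star)                    # diagonal approximation to Sigma^{-1}

# Usage with a hypothetical one-factor correlation matrix:
p, rho = 200, 0.5
S = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
print(np.max(np.abs(approx_precision(S, m=1) - np.linalg.inv(S))))   # small for large p
```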

5 Illustration

The compound symmetry correlation structure is expressed as \( {\varvec{\Sigma}} = (1 - \rho )\varvec{I}_{p} + \rho {\varvec{1}}_{p} {\varvec{1}}_{p}^{\prime } \) with a common correlation \( \rho \), \( 0 < \rho < 1 \). Obviously, it is a one-factor model with the vector of factor loadings \( \varvec{\lambda}_{1} = \sqrt \rho {\varvec{1}}_{p} \) and the diagonal matrix of unique variances \( {\varvec{\Psi}} = (1 - \rho )\varvec{I}_{p} \). Because the first eigenvalue and the corresponding standardized eigenvector of \( {\varvec{\Sigma}} = (1 - \rho )\varvec{I}_{p} + \rho {\varvec{1}}_{p} {\varvec{1}}_{p}^{\prime} \) are \( \omega_{1} = 1 + (p - 1)\rho \) and \( \varvec{\lambda}_{1}^{ + } = (1/\sqrt p ){\varvec{1}}_{p} \), respectively, the first PC loading vector is

$$ \varvec{\lambda}_{1}^{*} =\varvec{\lambda}_{1}^{ + } \sqrt {\omega_{1} } = (1/\sqrt p )\sqrt {1 + (p - 1)\rho } \cdot {\varvec{1}}_{p} = \sqrt {1/p + (1 - 1/p)\rho } \cdot {\varvec{1}}_{p} , $$
(6)

which approaches the vector of factor loadings \( \varvec{\lambda}_{1} = \sqrt \rho {\varvec{1}}_{p} \) as \( m/p = 1/p \to 0 \) with \( p \to \infty \). The remaining p − 1 eigenvalues are \( \omega_{2} = \ldots = \omega_{p} = 1 - \rho \). Thus, the constant k in the FA model with equal unique variances is obviously \( k = 1 - \rho \). Note that the Schneeweiss condition also holds:

$$ \varvec{\lambda}_{1}^{\prime } {\varvec{\Psi}}^{ - 1}\varvec{\lambda}_{1} = (\sqrt \rho {\varvec{1}}_{p} )^{\prime}\{ (1/(1 - \rho ))\varvec{I}_{p} \} (\sqrt \rho {\varvec{1}}_{p} ) = p \cdot \rho /(1 - \rho ) \to \infty $$
(7)

with \( m/p = 1/p \to 0 \) as \( p \to \infty \). The inverse of the correlation matrix is:

$$ \begin{aligned} {\varvec{\Sigma}}^{ - 1} & = {\varvec{\Psi}}^{ - 1} - {\varvec{\Psi}}^{ - 1}\varvec{\lambda}_{1} (1 +\varvec{\lambda}_{1}^{\prime } {\varvec{\Psi}}^{ - 1}\varvec{\lambda}_{1} )^{ - 1}\varvec{\lambda}_{1}^{\prime } {\varvec{\Psi}}^{ - 1} \\ & = (\frac{1}{1 - \rho })\varvec{I}_{p} - (\frac{1}{1 - \rho })\varvec{I}_{p} \cdot (\sqrt \rho {\varvec{1}}_{p} ) \cdot (1 + \frac{\rho }{1 - \rho } \cdot p)^{ - 1} \cdot (\sqrt \rho {\varvec{1}}_{p}^{\prime } ) \cdot (\frac{1}{1 - \rho })\varvec{I}_{p} \\ & = (\frac{1}{1 - \rho })\varvec{I}_{p} - (\frac{\rho }{1 - \rho })(\frac{1}{(1 - \rho ) + \rho \cdot p})({\varvec{1}}_{p} {\varvec{1}}_{p}^{\prime } ) \to (\frac{1}{1 - \rho })\varvec{I}_{p} = {\varvec{\Psi}}^{ - 1} \\ \end{aligned} $$
(8)

with \( m/p = 1/p \to 0 \) as \( p \to \infty \).

For example, it is easy to show that if \( \rho = 0.5 \), then for p = 10, the diagonal elements of the inverse of the compound symmetry correlation structure are 2 − 1/5.5 = 1.818 and the off-diagonal elements are −1/5.5 = −0.182. At p = 100, the diagonal and the off-diagonal elements become 2 − 1/50.5 = 1.980 and −1/50.5 = −0.0198, respectively. Furthermore, at p = 1000, the diagonal and the off-diagonal elements become 2 − 1/500.5 = 1.998 and −1/500.5 = −0.001998. Again, we see the off-diagonal elements of \( {\varvec{\Sigma}}^{ - 1} \) approaching 0 as p increases. Also, the diagonal elements of \( {\varvec{\Sigma}}^{ - 1} \) approach 2, which is the value of the inverse of the unique variance \( 1/(1 - \rho ) = 2 \) in the FA model.
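These values are easy to reproduce with a short numeric check of the closed form in Eq. (8) (the code and rounding below are ours).

```python
# Sketch: numeric check of Eq. (8) for rho = 0.5.
import numpy as np

rho = 0.5
for p in (10, 100, 1000):
    Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
    Sigma_inv = np.linalg.inv(Sigma)
    print(p, round(Sigma_inv[0, 0], 4), round(Sigma_inv[0, 1], 4))
    # p=10: 1.8182, -0.1818 ;  p=100: 1.9802, -0.0198 ;  p=1000: 1.998, -0.002
```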

6 Discussion

We discussed the matrix version of the Guttman condition for closeness between FA and PCA. It can be considered an extended Guttman condition in the sense that the matrix version involves not only the diagonal elements but also the off-diagonal elements of the matrices \( {\varvec{\Sigma}}^{ - 1} \) and \( {\varvec{\Psi}}^{ - 1} \). Because \( {\varvec{\Psi}}^{ - 1} \) is a diagonal matrix, the extended Guttman condition implies that the off-diagonal elements of \( {\varvec{\Sigma}}^{ - 1} \) approach zero as the dimension increases. We showed how this phenomenon arises with the compound symmetry example in the Illustration section. We also discussed some implications of the extended Guttman condition, including the ease of inverting \( {\varvec{\Psi}} \) compared with inverting \( {\varvec{\Sigma}} \). Because the ULS estimation method does not involve any inversion of either the sample covariance matrix S or the estimated model-implied population covariance matrix \( {\hat{\varvec{\Sigma }}} \), ULS should be the estimation method of choice when the sample size n is smaller than the number of variables p. Furthermore, we proposed a simple method to approximate \( {\varvec{\Sigma}}^{ - 1} \) by \( {\varvec{\Psi}}^{ - 1} \) using the FA model with equal unique variances, or equivalently, the probabilistic PCA model.

Some other implications of the extended Guttman condition (especially with respect to algorithms) are as follows. First, suppose we add a \( \left( {p + 1} \right) \)th variable at the end of the already existing p variables. Then, while the values of \( \sigma^{jj} \), \( j = 1, \ldots ,p \), can change, the values of \( \psi_{jj}^{ - 1} \), \( j = 1, \ldots ,p \), remain unchanged. Thus, with the extended Guttman condition, only one additional element needs to be computed.

Another implication concerns the ridge estimator, which is among the methods for dealing with singularity of S or of the estimated covariance matrix by introducing a small bias term (see, e.g., Yuan & Chan, 2008, 2016). Warton (2008, Theorem 1) showed that the ridge estimator of the covariance (correlation) matrix \( {\hat{\varvec{\Sigma }}}_{\eta } = \eta {\hat{\varvec{\Sigma }}} + (1 - \eta )\varvec{I}_{p} \) (with the tuning parameter \( \eta \)) is the maximum penalized likelihood estimator with the penalty term proportional to \( - tr({\varvec{\Sigma}}^{ - 1} ) \). Unfortunately, as the dimension p increases (or the ratio \( p/n \) increases), it becomes more difficult to obtain the inverse of the covariance matrix. Therefore, in high dimensions, it is not practical to express the ridge estimator of the covariance matrix in the form of the maximum penalized likelihood with a penalty term involving \( - tr({\varvec{\Sigma}}^{ - 1} ) \). This naturally leads to employing an “approximate” maximum penalized likelihood whose penalty term is approximately proportional to \( - tr({\varvec{\Psi}}^{ - 1} ) \) in place of the penalty term proportional to \( - tr({\varvec{\Sigma}}^{ - 1} ) \), assuming the factor analysis model, when the dimension p is large.

We are aware that, except perhaps for the approximation of the inverse of the covariance matrix, the majority of the implications discussed in this article may be of limited practical utility. For example, because the original Guttman condition, the Schneeweiss condition, and the extended Guttman condition are all conditions for closeness between FA and PCA, we can simply employ PCA as an approximation to FA when the conditions hold. Also, we did not discuss regularized FA with L1 regularization here, which is in itself a very interesting topic. Yet, we think the implications discussed are still of theoretical interest and should continue to be studied. The compound symmetry example used in the Illustration is probably only an approximation to the real world. Extensive simulations will be needed to develop empirical guidelines for how best to apply the theoretical results in practice.