Abstract
Factor analysis is a method for modeling observed variables, and their covariance structure, in terms of a smaller number of underlying unobservable factors. The factors are regarded as broad concepts or ideas that may describe an observed phenomenon. Factor analysis may be considered a generalization of the principal component method, since here also we replace a large number of variables by a smaller number of unknown factors. However, the aim of principal component analysis is to explain the variance, while factor analysis explains the covariance among the variables. Hence, factor analysis is a way to understand how the patterns of relationship among several variables are caused by a smaller number of latent variables, according to their common aspects. These hidden variables are called factors.
A significant part of Chattopadhyay and Chattopadhyay (2014), Statistical Methods for Astronomical Data Analysis, Springer Series in Astrostatistics, Springer Science+Business Media New York, is reproduced in this chapter.
10.1 Factor Analysis
Factor analysis (Chattopadhyay and Chattopadhyay 2014) is a statistical method used to study the dimensionality of a set of variables. In factor analysis, latent variables represent unobserved constructs and are referred to as factors or dimensions. Factor analysis attempts to identify underlying variables or factors that explain the pattern of correlations within a set of observed variables. Factor analysis is often used in data reduction to identify a small number of factors that explain most of the variance that is observed in a much larger number of manifest variables.
Suppose the observable random vector X with p components has mean vector \(\mu \) and covariance matrix \(\Sigma \). In the factor model, we assume that X is linearly dependent upon a few unobservable random variables \(F_1, F_2, \ldots , F_m\), called common factors, and p additional sources of variation \(\epsilon _1, \epsilon _2, \ldots , \epsilon _p\), called the errors (or specific factors).
Then the factor model is
\[ X - \mu = LF + \epsilon , \qquad \text{i.e., } \; X_i - \mu _i = l_{i1} F_1 + l_{i2} F_2 + \cdots + l_{im} F_m + \epsilon _i , \quad i = 1, \ldots , p. \qquad (10.1.1) \]
The coefficients \(l_{ij}\) are called the loadings of the ith variable on the jth factor, so the matrix L is the matrix of factor loadings. Here \(\epsilon _i\) is associated only with the ith response \(X_i\). The p deviations \(X_1 - \mu _1, \ldots , X_p - \mu _p\) are expressed in terms of \(p + m\) random variables \(F_1, F_2, \ldots , F_m, \epsilon _1, \ldots , \epsilon _p\), which are unobservable (whereas in multivariate regression the independent variables can be observed).
With some additional assumptions on the random vectors F and \(\epsilon \), the model (10.1.1) implies certain covariance relationships, which can be checked.
We assume that
\[ E(F) = 0, \quad \mathrm{Cov}(F) = E(FF') = I_m , \]
\[ E(\epsilon ) = 0, \quad \mathrm{Cov}(\epsilon ) = E(\epsilon \epsilon ') = \psi = \mathrm{diag}(\psi _1, \ldots , \psi _p), \]
and that F and \(\epsilon \) are independent, so that \(\mathrm{Cov}(\epsilon , F) = E(\epsilon F') = 0\).
The model \(X - \mu = LF + \epsilon \) is linear in the common factors. If the p responses of X are related to the underlying m factors in a nonlinear form, e.g., \(X_1 - \mu _1 = F_1 F_3 + \epsilon _1\), then the covariance structure \(LL' + \psi \) may not be adequate. The assumption of linearity is inherent here.
These assumptions and the relation (10.1.1) constitute the orthogonal factor model.
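As a quick numerical illustration, the covariance structure implied by these assumptions can be checked by simulation. The following is a sketch in Python with NumPy; the loadings and specific variances are arbitrary values chosen for illustration, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 5, 2, 200_000

# Arbitrary loading matrix L (p x m) and specific variances psi
L = rng.uniform(-1, 1, size=(p, m))
psi = rng.uniform(0.2, 0.5, size=p)

# Orthogonal factor model: X - mu = L F + eps, with
# E(F) = 0, Cov(F) = I_m, E(eps) = 0, Cov(eps) = diag(psi), Cov(F, eps) = 0
F = rng.standard_normal((n, m))
eps = rng.standard_normal((n, p)) * np.sqrt(psi)
X = F @ L.T + eps                       # mu taken as 0 for simplicity

# The implied covariance structure is Sigma = L L' + psi
Sigma_model = L @ L.T + np.diag(psi)
Sigma_sample = np.cov(X, rowvar=False)
print(np.max(np.abs(Sigma_sample - Sigma_model)))  # small for large n
```

For large n the sample covariance matrix of the simulated X reproduces \(LL' + \psi \) entrywise.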
The orthogonal factor model implies a covariance structure for X:
\[ \Sigma = \mathrm{Cov}(X) = LL' + \psi . \qquad (10.1.3) \]
Now \(\Sigma = LL' + \psi \) implies
\[ \delta _{ii} = \mathrm{Var}(X_i) = {l_{i1}}^2 + \cdots + {l_{im}}^2 + \psi _i , \qquad \delta _{ik} = \mathrm{Cov}(X_i, X_k) = l_{i1} l_{k1} + \cdots + l_{im} l_{km} .\]
Let the ith communality be \( {h_i}^2 = {l_{i1}}^2 + \cdots + {l_{im}}^2 \).
Then \(\delta _{ii} = {h_i}^2 + \psi _i \, \, \, (i = 1 \ldots p)\)
Thus \({h_i}^2\) is the sum of squares of the loadings of the ith variable on the m common factors.
Given a random sample of observations \(x_1, x_2, \ldots , x_n\), each of dimension \(p \times 1\), the basic problem is to decide whether \(\Sigma \) can be expressed in the form (10.1.3) for a reasonably small value of m, and to estimate the elements of L and \(\psi \).
Here the estimation procedure is not so easy. From the sample data we have estimates of the \(\frac{p(p + 1)}{2}\) distinct elements of the upper triangle of \(\Sigma \), but on the RHS of (10.1.3) we have \(pm + p\) parameters: pm for L and p for \(\psi \). The solution will be indeterminate unless \(\frac{p(p + 1)}{2} - p(m + 1) \ge 0\), i.e., \(p > 2m\). Even if this condition is satisfied, L is not unique.
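The counting argument above is simple arithmetic, and a small sketch makes the condition explicit (the helper name `identifiable` is ours, not from the text):

```python
def identifiable(p, m):
    """Counting condition for the orthogonal factor model: the number of
    distinct elements of Sigma, p(p+1)/2, must be at least the number of
    free parameters pm + p, i.e. p(p+1)/2 - p(m+1) >= 0, i.e. p > 2m."""
    return p * (p + 1) // 2 - p * (m + 1) >= 0

print(identifiable(6, 2))  # True:  21 - 18 >= 0, and 6 > 4
print(identifiable(4, 2))  # False: 10 - 12 < 0, and 4 = 2m
```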
Proof
Let T be any \(m \times m\) orthogonal matrix, so that \(TT' = T'T = I\).
Then (10.1.1) can be written as
\[ X - \mu = LF + \epsilon = LTT'F + \epsilon = L^{*} F^{*} + \epsilon , \qquad \text{where } L^{*} = LT \text{ and } F^{*} = T'F. \]
It is impossible to distinguish between the loadings L and \(L^{*}\) on the basis of the observations on X. The vectors F and \(F^{*} = T'F\) have the same statistical properties, and even though the loadings L and \(L^{*}\) are different, they both generate the same covariance matrix \(\Sigma \), i.e.,
\[ \Sigma = LL' + \psi = LTT'L' + \psi = L^{*} {L^{*}}' + \psi . \]
The above problem of uniqueness is generally resolved by choosing an orthogonal rotation T such that the final loading matrix L satisfies the condition that \(L' \psi ^{- 1} L\) is diagonal with positive diagonal elements. This restriction requires L to be of full rank m. With a valid \(\psi \), viz. one with all positive diagonal elements, it can be shown that the above restriction yields a unique L.
\(\square \)
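The rotation argument in the proof is easy to verify numerically. The following sketch uses an arbitrary L and \(\psi \) and a random orthogonal T (obtained from a QR decomposition) to confirm that L and \(L^{*} = LT\) generate the same \(\Sigma \):

```python
import numpy as np

rng = np.random.default_rng(1)
p, m = 6, 3

L = rng.uniform(-1, 1, size=(p, m))      # arbitrary loading matrix
psi = np.diag(rng.uniform(0.1, 0.4, p))  # arbitrary specific variances

# Any orthogonal T works; here one is drawn via a QR decomposition
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
L_star = L @ Q                           # rotated loadings L* = LT

Sigma = L @ L.T + psi
Sigma_star = L_star @ L_star.T + psi
print(np.allclose(Sigma, Sigma_star))    # True: same covariance matrix
```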
10.1.1 Method of Estimation
Given n observation vectors \(x_1, \ldots , x_n\) on p generally correlated variables, factor analysis seeks to verify whether the factor model (10.1.1) with a small number of factors adequately represents the data.
The sample covariance matrix is an estimator of the unknown covariance matrix \(\Sigma \). If \(\Sigma \) appears to deviate significantly from a diagonal matrix, then a factor model can be used, and the initial problem is one of estimating the factor loadings \(l_{ij}\) and the specific variances \(\psi _i\).
Principal Component Method
Let \(\Sigma \) have eigenvalue–eigenvector pairs \((\lambda _i, e_i)\) with \(\lambda _1 \ge \lambda _2 \ge \cdots \ge \lambda _p \ge 0\). Then, by the spectral decomposition,
\[ \Sigma = \lambda _1 e_1 {e_1}' + \lambda _2 e_2 {e_2}' + \cdots + \lambda _p e_p {e_p}' = LL' \qquad (10.1.6) \]
[in (10.1.6), \(m = p\) and the jth column of L is \(\sqrt{\lambda _j}\, e_j\)].
Apart from the scale factor \(\sqrt{\lambda _j}\), the factor loadings on the jth factor are the coefficients of the jth principal component.
The approximate representation assumes that the specific factors \(\epsilon \) are of minor importance and can be ignored in factoring \(\Sigma \). If specific factors are included in the model, their variances may be taken to be the diagonal elements of \(\Sigma - LL'\).
Allowing for specific factors, the approximation becomes
\[ \Sigma \approx \lambda _1 e_1 {e_1}' + \cdots + \lambda _m e_m {e_m}' + \psi = LL' + \psi , \]
where \(m \le p\)
(we assume that last \(p - m\) eigenvalues are small)
and \(\psi _i = \delta _{ii} - \sum _{j = 1}^{m} {l_{ij}}^2\) for \(i = 1, \ldots , p\).
For the principal component solution, the estimated factor loadings for a given factor do not change as the number of factors is increased. If \(m = 1\),
\[ \widehat{L} = \left[ \sqrt{\widehat{\lambda }_1}\, \widehat{e}_1 \right] ; \]
if \(m = 2\),
\[ \widehat{L} = \left[ \sqrt{\widehat{\lambda }_1}\, \widehat{e}_1 \;\;\; \sqrt{\widehat{\lambda }_2}\, \widehat{e}_2 \right] , \]
where \((\widehat{\lambda _1}, \widehat{e_1})\) and \((\widehat{\lambda _2}, \widehat{e_2})\) are the first two eigenvalue–eigenvector pairs for S (or R).
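The principal component solution described above can be sketched as follows (illustrative NumPy code; the helper name `pc_factor_solution` and the synthetic covariance matrix are ours):

```python
import numpy as np

def pc_factor_solution(S, m):
    """Principal component solution of the factor model: the columns of
    L_hat are sqrt(lambda_j) * e_j for the m largest eigenvalues of S,
    and psi_hat holds the diagonal of S - L_hat L_hat'."""
    lam, E = np.linalg.eigh(S)            # eigenvalues in ascending order
    lam, E = lam[::-1], E[:, ::-1]        # re-sort in descending order
    L_hat = E[:, :m] * np.sqrt(lam[:m])   # scale each eigenvector column
    psi_hat = np.diag(S - L_hat @ L_hat.T)
    return L_hat, psi_hat

# Small synthetic positive definite matrix standing in for S
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
S = A @ A.T
L_hat, psi_hat = pc_factor_solution(S, m=2)

# By construction the residual matrix has zero diagonal
residual = S - (L_hat @ L_hat.T + np.diag(psi_hat))
print(np.allclose(np.diag(residual), 0))  # True
```

The size of the off-diagonal entries of `residual` is then what guides the choice of m, as discussed next.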
By the definition of \(\widehat{\psi }_i\), the diagonal elements of S are equal to the diagonal elements of \(\widehat{L} {\widehat{L}}' + \widehat{\psi }\). How should m be determined?
The choice of m can be based on the estimated eigenvalues.
Consider the residual matrix \(S - (\widehat{L} {\widehat{L}}' + \widehat{\psi })\). By construction, its diagonal elements are zero, and if the off-diagonal elements are also small, we may take that particular value of m to be appropriate.
Analytically, we choose that m for which the sum of squared entries of the residual matrix is small; this sum is bounded above by \({\widehat{\lambda }_{m + 1}}^{2} + \cdots + {\widehat{\lambda }_p}^{2}\), so small neglected eigenvalues guarantee a small residual.
Ideally, the contributions of the first few factors to the sample variances of the variables should be large. The contribution to the sample variance \(s_{ii}\) from the first common factor is \({\widehat{l}_{i1}}^2\). The contribution to the total sample variance \(s_{11} + \cdots + s_{pp} = \mathrm{tr}(S)\) from the first common factor is
\[ {\widehat{l}_{11}}^2 + {\widehat{l}_{21}}^2 + \cdots + {\widehat{l}_{p1}}^2 = \left( \sqrt{\widehat{\lambda }_1}\, \widehat{e}_1 \right)' \left( \sqrt{\widehat{\lambda }_1}\, \widehat{e}_1 \right) = \widehat{\lambda }_1 , \]
since the eigenvector \(\widehat{e}_1\) has unit length. In general,
\[ \left( \begin{array}{c} \text{proportion of total sample variance} \\ \text{due to the } j\text{th factor} \end{array} \right) = \begin{cases} \dfrac{\widehat{\lambda }_j}{s_{11} + \cdots + s_{pp}} & \text{for a factor analysis of } S, \\[2ex] \dfrac{\widehat{\lambda }_j}{p} & \text{for a factor analysis of } R. \end{cases} \qquad (10.1.9) \]
Criterion (10.1.9) is frequently used as a heuristic device for determining the appropriate number of common factors. The value of m is gradually increased until a suitable proportion of the total sample variance has been explained.
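Criterion (10.1.9) can be sketched in a few lines (the helper name and the small correlation matrix below are illustrative, not from the text):

```python
import numpy as np

def proportion_explained(S, use_correlation=False):
    """Proportion of total sample variance attributable to each factor
    under the principal component solution: lambda_j / tr(S) for a
    covariance matrix S, or lambda_j / p for a correlation matrix R."""
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]   # descending eigenvalues
    total = S.shape[0] if use_correlation else np.trace(S)
    return lam / total

# Illustrative 3 x 3 correlation matrix
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
props = proportion_explained(R, use_correlation=True)
print(np.cumsum(props))  # increase m until the cumulative share is adequate
```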
Other Rules Used in Packages
Number of eigenvalues of R greater than one (when R is used).
Number of eigenvalues of S that are positive (when S is used).
10.1.2 Factor Rotation
If \(\widehat{L}\) is the \(p \times m\) matrix of estimated factor loadings obtained by any method, then
\[ \widehat{L}^{*} = \widehat{L} T, \qquad \text{where } TT' = T'T = I, \]
is a \(p \times m\) matrix of rotated loadings.
Moreover, the estimated covariance (or correlation) matrix remains unchanged, since
\[ \widehat{L} {\widehat{L}}' + \widehat{\psi } = \widehat{L} TT' {\widehat{L}}' + \widehat{\psi } = \widehat{L}^{*} {\widehat{L}^{*}}' + \widehat{\psi } . \]
The above equation indicates that the residual matrix \(S_n - \widehat{L} {\widehat{L}}' - \widehat{\psi } = S_n - \widehat{L}^{*} {\widehat{L}^{*}}' - \widehat{\psi }\) remains unchanged. Moreover, the specific variances \(\widehat{\psi }_i\) and hence the communalities \({\widehat{h}_i}^2\) are unaltered. Hence, mathematically, it is immaterial whether \(\widehat{L}\) or \(\widehat{L}^{*}\) is obtained.
Since the original loadings may not be readily interpretable, it is usual practice to rotate them until a ‘simple structure’ is achieved.
Ideally, we should like to see a pattern of loadings in which each variable loads highly on a single factor and has small to moderate loadings on the remaining factors.
The problem is to find an orthogonal rotation which corresponds to a ‘simple structure.’
This can be achieved if, after rotation, the orthogonality of the factors still holds, which is the case when we perform an orthogonal rotation. Among these, (1) varimax rotation, (2) quartimax rotation, and (3) equamax rotation are important.
An oblique rotation does not preserve the orthogonality of the factors after rotation. There are several oblique algorithms, such as oblimax and quartimin.
10.1.3 Varimax Rotation
Orthogonal Transformation on L
\(L^{*}\) is the matrix of orthogonally rotated loadings; let \(d_j = \sum _{i = 1}^{p} {l^{*}_{ij}}^2, \;\; j = 1, \ldots , m\).
Then the following expression is maximized:
\[ V = \frac{1}{p} \sum _{j = 1}^{m} \left[ \sum _{i = 1}^{p} {l^{*}_{ij}}^4 - \frac{{d_j}^2}{p} \right] . \]
Such a procedure tries to give either large (in absolute value) or near-zero values in the columns of \(L^{*}\). Hence, the procedure tries to produce factors with either a strong association with the responses or no association at all.
The communality
\[ {h_i}^2 = \sum _{j = 1}^{m} {l^{*}_{ij}}^2 = \sum _{j = 1}^{m} {l_{ij}}^2 \]
remains constant under rotation.
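A minimal raw-varimax iteration maximizing the criterion above can be sketched with the classical SVD-based scheme (a sketch only: most packages additionally apply Kaiser normalization, which is omitted here, and the loadings below are arbitrary illustrative values). It also confirms that the communalities are unchanged by the rotation:

```python
import numpy as np

def varimax(L, n_iter=100, tol=1e-8):
    """Raw varimax: find an orthogonal T maximizing
    V = (1/p) * sum_j [ sum_i l*_ij^4 - d_j^2 / p ]
    via the classical SVD-based fixed-point iteration."""
    p, m = L.shape
    T = np.eye(m)
    v_old = 0.0
    for _ in range(n_iter):
        Ls = L @ T
        # Target matrix for the orthogonal Procrustes step
        B = L.T @ (Ls**3 - Ls @ np.diag((Ls**2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(B)
        T = U @ Vt                 # nearest orthogonal matrix to B
        v_new = s.sum()
        if v_new - v_old < tol:
            break
        v_old = v_new
    return L @ T, T

L = np.array([[0.8, 0.3],
              [0.7, 0.4],
              [0.2, 0.9],
              [0.3, 0.8]])
L_rot, T = varimax(L)
# Communalities (row sums of squared loadings) are invariant under rotation
print(np.allclose((L_rot**2).sum(axis=1), (L**2).sum(axis=1)))  # True
```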
10.2 Quartimax Rotation
The factor pattern is simplified by forcing the variables to correlate highly with one main factor (the so-called g-factor of IQ studies) and very little with the remaining factors. Here all variables are primarily associated with a single factor.
Results obtained from factor analysis are usually difficult to interpret: many variables may show significant coefficient magnitudes on several of the retained factors, especially on the first factor (coefficients greater than \(|0.60|\) are often considered large, and coefficients around \(|0.35|\) moderate).
For good interpretation, factor rotation is necessary. The objective of the rotation is to achieve the most ‘simple structure’ through the manipulation of the factor pattern matrix.
The simplest structure can be explained in terms of five principles of factor rotation.
1. Each variable should have at least one zero (small) loading.
2. Each factor should have a set of linearly independent variables whose factor loadings are zero (small).
3. For every pair of factors, there should be several variables whose loadings are zero (small) for one factor but not for the other.
4. For every pair of factors, a large proportion of variables should have zero (small) loadings on both factors whenever more than about four factors are extracted.
5. For every pair of factors, there should only be a small number of variables with nonzero loadings on both.
In orthogonal rotation,
(1) factors are perfectly uncorrelated with one another;
(2) fewer parameters are to be estimated.
10.3 Promax Rotation
Factors are allowed to be correlated with one another.
Step I. Rotate the factors orthogonally.
Step II. Get a target matrix by raising the factor coefficients to an exponent (3 or 4). The coefficients become smaller in absolute value, but the separation between large and small coefficients increases.
Step III. Rotate the original matrix to a best-fit position with the target matrix.
Here the moderate coefficients approach zero \([0.3 \times 0.3 = 0.09]\) much more quickly than the large coefficients \((\ge 0.6)\).
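The three promax steps can be sketched as follows. This is a rough illustration only: the helper name, the column normalization, and the input loadings (assumed already varimax-rotated) are our assumptions, and real packages differ in the normalization details.

```python
import numpy as np

def promax(L_varimax, power=4):
    """Sketch of the promax steps on varimax-rotated loadings."""
    # Step II: target matrix -- raise loadings to a power, keeping signs;
    # moderate coefficients shrink toward zero much faster than large ones
    # (e.g. 0.3**4 = 0.0081 vs 0.6**4 = 0.1296)
    P = np.sign(L_varimax) * np.abs(L_varimax) ** power
    # Step III: least-squares (oblique) transformation of L to the target P
    Q, *_ = np.linalg.lstsq(L_varimax, P, rcond=None)
    # Rescale columns so the implied factors have unit variance
    d = np.sqrt(np.diag(np.linalg.inv(Q.T @ Q)))
    Q = Q * d
    return L_varimax @ Q

L = np.array([[0.8, 0.3],
              [0.7, 0.4],
              [0.2, 0.9],
              [0.3, 0.8]])
print(np.round(promax(L), 3))
```

Because the transformation is oblique, the resulting factors are correlated and the communalities are no longer preserved, unlike in orthogonal rotation.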
Example 10.1
Consider the data set related to the relative consumption of certain food items in European and Scandinavian countries considered in the chapter on principal component analysis.
If we do factor analysis with varimax rotation, then the output is as follows:
Rotated Factor Loadings and Communalities: Varimax Rotation

| Variable | Factor1 | Factor2 | Factor3 | Factor4 | Communality |
|---|---|---|---|---|---|
| Coffee | 0.336 | 0.807 | 0.018 | −0.095 | 0.774 |
| Tea | −0.233 | 0.752 | 0.330 | 0.370 | 0.866 |
| Biscuits | 0.502 | 0.124 | 0.712 | −0.177 | 0.806 |
| Powder | 0.317 | 0.856 | 0.047 | −0.230 | 0.889 |
| Potatoes | 0.595 | 0.047 | 0.060 | 0.485 | 0.595 |
| Frozen fish | 0.118 | −0.100 | 0.050 | 0.918 | 0.869 |
| Apples | 0.832 | 0.284 | 0.251 | 0.097 | 0.846 |
| Oranges | 0.903 | 0.148 | 0.004 | 0.036 | 0.839 |
| Butter | −0.004 | 0.089 | 0.900 | 0.172 | 0.847 |
| Variance | 2.3961 | 2.0886 | 1.4969 | 1.3480 | 7.3296 |
| % Var | 0.266 | 0.232 | 0.166 | 0.150 | 0.814 |
Factor Score Coefficients
| Variable | Factor1 | Factor2 | Factor3 | Factor4 |
|---|---|---|---|---|
| Coffee | 0.038 | 0.408 | −0.144 | −0.040 |
| Tea | −0.311 | 0.456 | 0.119 | 0.319 |
| Biscuits | 0.165 | −0.141 | 0.506 | −0.252 |
| Powder | 0.026 | 0.426 | −0.109 | −0.144 |
| Potatoes | 0.253 | −0.047 | −0.089 | 0.331 |
| Frozen fish | 0.006 | −0.019 | −0.072 | 0.692 |
| Apples | 0.339 | −0.008 | 0.045 | 0.006 |
| Oranges | 0.431 | −0.064 | −0.129 | −0.026 |
| Butter | −0.132 | −0.080 | 0.674 | 0.029 |
We see that, according to the percentages of variation, about 80% of the variation is explained by the first four factors (as in the case of PCA). But here the advantage is that, unlike PCA, we can physically explain the factors. According to the rotated factor loadings, the first factor is composed of ‘apples, oranges, and potatoes’; similarly, the other three factors are composed of ‘coffee, tea, and powder soup,’ ‘butter and biscuits,’ and ‘potatoes and frozen fish,’ respectively.
Except for potatoes there is no overlap, and the groups are well defined and may correspond to types of customers preferring ‘fruits,’ ‘hot drinks,’ ‘snacks,’ and ‘proteins, vitamins, and minerals.’
The most significant difference between PCA and factor analysis concerns the assumption of an underlying causal structure. Factor analysis assumes that the covariation among the observed variables is due to the presence of one or more latent variables, known as factors, that exert a causal influence on these observed variables; exploratory factor analysis helps the researcher identify the number and nature of these latent factors. Principal component analysis, by contrast, makes no assumption about an underlying causal relation. It is simply a dimension-reduction technique.
References and Suggested Readings
Albazzas, H., & Wang, X. Z. (2004). Industrial & Engineering Chemistry Research, 43(21), 6731.
Babu, J., et al. (2009). The Astrophysical Journal, 700, 1768.
Chattopadhyay, A. K., & Chattopadhyay, T. (2014). Statistical methods for astronomical data analysis. Springer series in astrostatistics. New York: Springer.
Chattopadhyay, A. K., Chattopadhyay, T., Davoust, E., Mondal, S., & Sharina, M. (2009). The Astrophysical Journal, 705, 1533.
Chattopadhyay, A. K., Mondal, S., & Chattopadhyay, T. (2013). Computational Statistics & Data Analysis, 57, 17.
Comon, P. (1994). Signal Processing, 36, 287.
Dickens, R. J. (1972). Monthly Notices of the Royal Astronomical Society, 157, 281.
Fusi Pecci, F., et al. (1993). Astronomical Journal, 105, 1145.
Gabriel, K. R. (1971). Biometrika, 58, 453.
Hastie, T., & Tibshirani, R. (2003). In S. Becker, & K. Obermayer (Eds.). Independent component analysis through product density estimation in advances in neural information processing system (Vol. 15, pp. 649–656). Cambridge, MA: MIT Press.
Hyvarinen, A., & Oja, E. (2000). Neural Networks, 13(4–5), 411.
© 2018 Springer Nature Singapore Pte Ltd.
Cite this chapter
Mukherjee, S. P., Sinha, B. K., Chattopadhyay, A. (2018). Factor Analysis. In: Statistical Methods in Social Science Research. Springer, Singapore. https://doi.org/10.1007/978-981-13-2146-7_10
DOI: https://doi.org/10.1007/978-981-13-2146-7_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2145-0
Online ISBN: 978-981-13-2146-7
eBook Packages: Mathematics and Statistics (R0)