
10.1 Factor Analysis

Factor analysis (Chattopadhyay and Chattopadhyay 2014) is a statistical method used to study the dimensionality of a set of variables. In factor analysis, latent variables represent unobserved constructs and are referred to as factors or dimensions. Factor analysis attempts to identify underlying variables or factors that explain the pattern of correlations within a set of observed variables. Factor analysis is often used in data reduction to identify a small number of factors that explain most of the variance that is observed in a much larger number of manifest variables.

Suppose the observable random vector X with p components has mean vector \(\mu \) and covariance matrix \(\Sigma \). In the factor model, we assume that X is linearly dependent upon a few unobservable random variables \(F_1, F_2, \ldots , F_m\), called common factors, and p additional sources of variation \(\epsilon _1, \epsilon _2, \ldots , \epsilon _p\), called the errors (or specific factors).

Then the factor model is

$$\begin{aligned} {\displaystyle \mathop X^{p \times 1}} = {\displaystyle \mathop \mu ^{p \times 1}} + {\displaystyle \mathop L^{p \times m}} {\displaystyle \mathop F^{m\times 1}} + {\displaystyle \mathop \epsilon ^{p \times 1}} \end{aligned}$$
(10.1.1)
$$ \begin{array}{c} X_1 - \mu _1 = l_{11} F_1 + l_{12} F_2 + \cdots + l_{1m} F_m + \epsilon _1\\ X_2 - \mu _2 = l_{21} F_1 + l_{22} F_2 + \cdots + l_{2m} F_m + \epsilon _2\\ \vdots \\ X_p - \mu _p = l_{p1} F_1 + l_{p2} F_2 + \cdots + l_{pm} F_m + \epsilon _p \end{array} $$

The coefficients \(l_{ij}\) are called the loadings of the ith variable on the jth factor, so the matrix L is the matrix of factor loadings. Here \(\epsilon _i\) is associated only with the ith response \(X_i\). The p deviations \(X_1 - \mu _1, \ldots , X_p - \mu _p\) are expressed in terms of \(p + m\) random variables \(F_1, F_2, \ldots , F_m, \epsilon _1, \ldots , \epsilon _p\), all of which are unobservable (whereas in multivariate regression the independent variables can be observed).

With some additional assumptions on the random vectors F and \(\epsilon \), the model (10.1.1) implies certain covariance relationships which can be checked.

We assume that

$$ E(F) = 0^{m \times 1}, \, \, \, cov(F) = E(FF') = I ^{m \times m} $$
$$ E(\epsilon ) = 0^{p \times 1}, \, \, \, cov(\epsilon ) = E(\epsilon \epsilon ') = \psi = \left( \begin{array}{cccc} \psi _1 &{} 0 &{} \ldots &{} 0 \\ 0 &{} \psi _2 &{} \ldots &{} 0 \\ \vdots &{} \vdots &{} &{} \vdots \\ 0 &{} 0 &{} \ldots &{} \psi _p \end{array}\right) $$
$$\begin{aligned} \text { and } cov(\epsilon , F) = E(\epsilon F') = 0^{p \times m} \end{aligned}$$
(10.1.2)

The model \(X - \mu = LF + \epsilon \) is linear in the common factors. If the p responses of X are related to the underlying m factors in a nonlinear form, for instance \(X_1 - \mu _1 = F_1 F_3 + \epsilon _1\), then the covariance structure \(LL' + \psi \) may not be adequate. The assumption of linearity is inherent here.

These assumptions and the relation (10.1.1) constitute the orthogonal factor model.

The orthogonal factor model implies a covariance structure for X.

$$\begin{aligned} \text { Here } (X - \mu ) (X - \mu )'= & {} (LF + \epsilon ) (LF + \epsilon )' \\= & {} (LF + \epsilon ) ((LF)' + \epsilon ') \\= & {} LF(LF)' + \epsilon (LF)' + LF \epsilon ' + \epsilon \epsilon ' \\= & {} LFF'L' + \epsilon F'L' + LF \epsilon ' + \epsilon \epsilon ' \end{aligned}$$
$$\begin{aligned} \Sigma= & {} \text { covariance matrix of } X \\= & {} E(X - \mu )(X - \mu )' \\= & {} LE(FF')L' + E(\epsilon F') L' + LE(F \epsilon ') + E(\epsilon \epsilon ') \\= & {} LIL' + \psi = LL' + \psi \end{aligned}$$
$$ \text { Again } (X - \mu )F' = (LF + \epsilon ) F' = LFF' + \epsilon F' $$
$$ \text { or, } \text { cov}(X, F) = E(X - \mu )F' = E(LF + \epsilon )F' = LE(FF') + E(\epsilon F') = L $$

Now \(\Sigma = LL' + \psi \) implies

$$\begin{aligned} \left. \begin{array}{c} \text {var}(X_i) = {l_{i1}}^2 + \cdots + {l_{im}}^2 + \psi _i \\ \text {cov}(X_i, X_k) = l_{i1} l_{k1} + \cdots + l_{im} l_{km} \end{array}\right\} \end{aligned}$$
(10.1.3)
$$ \text {cov}(X, F) = L \Rightarrow \text {cov}(X_i, F_j) = l_{ij} $$
$$ \Rightarrow V(X_i) = \sigma _{ii} = {l_{i1}}^2 + \cdots + {l_{im}}^2 + \psi _i $$

Let ith communality \( = {h_i}^2 = {l_{i1}}^2 + \cdots + {l_{im}}^2\)

Then \(\sigma _{ii} = {h_i}^2 + \psi _i \, \, \, (i = 1 \ldots p)\)

\({h_i}^2\) \(=\) sum of squares of loadings of ith variable on the m common factors.
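To see these relations concretely, the following minimal NumPy sketch builds \(\Sigma = LL' + \psi \) from an illustrative loading matrix and set of specific variances (the numerical values are chosen here purely for demonstration and are not taken from the text) and checks the variance and covariance identities in (10.1.3).

```python
import numpy as np

# Illustrative (hypothetical) loading matrix L with p = 4 variables, m = 2 factors,
# and specific variances psi; the numbers are chosen only for demonstration.
L = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.7],
              [0.2, 0.6]])
psi = np.diag([0.18, 0.32, 0.46, 0.60])

# Covariance structure implied by the orthogonal factor model: Sigma = LL' + psi
Sigma = L @ L.T + psi

# Communalities h_i^2 = sum_j l_ij^2, so that var(X_i) = h_i^2 + psi_i
communalities = np.sum(L**2, axis=1)
print(np.allclose(np.diag(Sigma), communalities + np.diag(psi)))   # True

# Off-diagonal element: cov(X_1, X_2) = l_11*l_21 + l_12*l_22
print(np.isclose(Sigma[0, 1], L[0] @ L[1]))                        # True
```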

Given a random sample of observations \({\displaystyle \mathop {x_1}^{p \times 1}}, x_2, \ldots , {\displaystyle \mathop {x_n}^{p \times 1}}\), the basic problem is to decide whether \(\Sigma \) can be expressed in the form (10.1.3) for a reasonably small value of m, and to estimate the elements of L and \(\psi \).

Here the estimation procedure is not so easy. Primarily, we have from the sample data estimates of the \(\frac{p(p + 1)}{2}\) distinct elements of the upper triangle of \(\Sigma \), but on the RHS of (10.1.3) we have \(pm + p\) parameters, pm for L and p for \(\psi \). The solution will be indeterminate unless \(\frac{p(p + 1)}{2} - p(m + 1) \ge 0\), i.e., \(p > 2m\). Even if this condition is satisfied, L is not unique.

Proof

Let \({\displaystyle \mathop T^{m \times m}}\) be any orthogonal matrix so that \(TT' = T'T = I\).

Then (10.1.1) can be written as

$$\begin{aligned} X - \mu = LF + \epsilon = LTT'F + \epsilon = L^{*} F^{*} + \epsilon \end{aligned}$$
(10.1.4)
$$ \text { where } L^{*} = LT \text { and } F^{*} = T'F $$
$$ \text { Since } E(F^{*}) = T'E(F) = 0 $$
$$ \text { and } \text { cov}(F^{*}) = T' \text {Cov}(F) T = T'T = I $$

It is impossible to distinguish between the loadings L and \(L^{*}\) on the basis of observations on X: the vectors F and \(F^{*} = T'F\) have the same statistical properties, and even though the loadings L and \(L^{*}\) are different, they both generate the same covariance matrix \(\Sigma ,\) i.e.,

$$\begin{aligned} \Sigma = LL' + \psi = LT T' L' + \psi = L^{*} {L^{*}}' + \psi \end{aligned}$$
(10.1.5)

The above problem of uniqueness is generally resolved by choosing an orthogonal rotation T such that the final loading matrix L satisfies the condition that \(L' \psi ^{- 1} L\) is diagonal with positive diagonal elements. This restriction requires L to be of full rank m. With a valid \(\psi \), viz. one with all positive diagonal elements, it can be shown that the above restriction yields a unique L.

\(\square \)
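The non-uniqueness can also be checked numerically. The sketch below (with hypothetical L and \(\psi \), chosen only for illustration) rotates the loadings by a random orthogonal matrix T and verifies (10.1.5).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loadings and specific variances (p = 4, m = 2), for illustration only
L = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.7],
              [0.2, 0.6]])
psi = np.diag([0.18, 0.32, 0.46, 0.60])

# Any orthogonal T (here the Q factor of a QR decomposition of a random matrix)
T, _ = np.linalg.qr(rng.standard_normal((2, 2)))
L_star = L @ T   # rotated loadings L* = LT

# L and L* differ, yet both generate the same covariance matrix, as in (10.1.5)
print(np.allclose(L @ L.T + psi, L_star @ L_star.T + psi))   # True
```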

10.1.1 Method of Estimation

Given n observation vectors \(x_1, \ldots , x_n\) on p generally correlated variables, factor analysis seeks to verify whether the factor model (10.1.1) with a small number of factors adequately represents the data.

The sample covariance matrix is an estimator of the unknown covariance matrix \(\Sigma \). If \(\Sigma \) appears to deviate significantly from a diagonal matrix, then a factor model can be used, and the initial problem is one of estimating the factor loadings \(l_{ij}\) and the specific variances \(\psi _i\).

Principal Component Method

Let \(\Sigma \) have eigenvalue–eigenvector pairs \((\lambda _i, e_i)\) with \(\lambda _1 \ge \lambda _2 \ge \cdots \ge \lambda _p \ge 0\). Then by the spectral decomposition

$$\begin{aligned} \Sigma= & {} \lambda _1 e_1 {e_1}' + \lambda _2 e_2 {e_2}' + \cdots + \lambda _p e_p {e_p}' \nonumber \\= & {} \left( \sqrt{\lambda _1}\, e_1 \cdots \sqrt{\lambda _p}\, e_p\right) \left( \begin{array}{c} \sqrt{\lambda _1}\, {e_1}' \\ \vdots \\ \sqrt{\lambda _p}\, {e_p}' \end{array}\right) \\= & {} {\displaystyle \mathop L^{p \times p}} \, \, {\displaystyle \mathop {L'}^{p \times p}} + 0^{p \times p} \nonumber \end{aligned}$$
(10.1.6)

[in (10.1.6), \(m = p\) and the jth column of \(L\) is \(\sqrt{\lambda _j}\, e_j\)].

Apart from the scale factor \(\sqrt{\lambda _j}\), the factor loadings on the jth factor are the coefficients of the jth principal component.

The approximate representation assumes that the specific factors \(\in \) are of minor importance and can be ignored in factoring \(\Sigma \). If specific factors are included in the model, their variances may be taken to be the diagonal elements of \(\Sigma - LL'\).

Allowing for specific factors, the approximation becomes

$$\begin{aligned} \Sigma\approx & {} LL' + \psi \nonumber \\= & {} (\sqrt{\lambda _1} e_1 \, \sqrt{\lambda _2} e_2 \cdots \sqrt{\lambda _m} e_m) \left( \begin{array}{c} \sqrt{\lambda _1} {e_1}' \\ \sqrt{\lambda _2} {e_2}' \\ \vdots \\ \sqrt{\lambda _m} {e_m}' \end{array}\right) + \left( \begin{array}{cccc} \psi _1 &{}\,\,\, 0 &{}\,\,\, \ldots &{}\,\,\, 0 \\ 0 &{}\,\,\, \psi _2 &{}\,\,\, \ldots &{}\,\,\, 0 \\ \vdots &{}\,\,\, \vdots &{}\,\,\, \vdots &{}\,\,\, \vdots \\ 0 &{}\,\,\, 0 &{}\,\,\, \ldots &{}\,\,\, \psi _p \end{array}\right) \end{aligned}$$
(10.1.7)

where \(m \le p\).

(we assume that last \(p - m\) eigenvalues are small)

and \(\psi _i = \sigma _{ii} - {\displaystyle \mathop \sum ^m_{j = 1}} {l_{ij}}^2\) for \(i = 1 \ldots p\).

For the principal component solution, the estimated factor loadings for a given factor do not change as the number of factors is increased. If \(m = 1\)

$$ L = \left( \sqrt{\widehat{\lambda _1}} \widehat{e_1}\right) $$

if \(m = 2\)

$$ L = \left( \sqrt{\widehat{\lambda _1}} \widehat{e_1} \sqrt{\widehat{\lambda _2}} \widehat{e_2}\right) $$

where \((\widehat{\lambda _1}, \widehat{e_1})\) and \((\widehat{\lambda _2}, \widehat{e_2})\) are the first two eigenvalue–eigenvector pairs for S (or R).
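A minimal sketch of the principal component solution follows, assuming a small hypothetical covariance (or correlation) matrix S that is not a data set from the text: the estimated loadings are the scaled eigenvectors \(\sqrt{\widehat{\lambda }_j}\, \widehat{e}_j\), and the specific variances are the diagonal elements of \(S - \widehat{L}{\widehat{L}}'\).

```python
import numpy as np

def pc_factor_solution(S, m):
    """Principal component solution of the factor model from a sample
    covariance (or correlation) matrix S, keeping m factors."""
    eigval, eigvec = np.linalg.eigh(S)              # eigenvalues in ascending order
    idx = np.argsort(eigval)[::-1][:m]              # indices of the m largest
    lam, e = eigval[idx], eigvec[:, idx]
    L_hat = e * np.sqrt(lam)                        # jth column = sqrt(lambda_j) * e_j
    psi_hat = np.diag(S) - np.sum(L_hat**2, axis=1) # specific variances
    return L_hat, psi_hat

# Hypothetical 3 x 3 correlation matrix, for illustration only
S = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
L_hat, psi_hat = pc_factor_solution(S, m=1)

# By construction, the diagonal of L_hat L_hat' + psi_hat reproduces diag(S)
print(np.allclose(np.diag(L_hat @ L_hat.T) + psi_hat, np.diag(S)))   # True
```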

By the definition of \(\widehat{\psi _i}\), the diagonal elements of S are equal to the diagonal elements of \(\widehat{L} {\widehat{L}}' + \widehat{\psi }\). How do we determine m?

The choice of m can be based on the estimated eigenvalues.

Consider the residual matrix \(S - (\widehat{L} {\widehat{L}}' + \widehat{\psi })\).

Here the diagonal elements are zero, and if the off-diagonal elements are also small, we may take that particular value of m to be appropriate.

Analytically, we choose that m for which

$$\begin{aligned} \text { Sum of squared entries of } (S - (\widehat{L} {\widehat{L}}' + \widehat{\psi })) \le {\widehat{\lambda }_{m + 1}}^2 + \cdots + \widehat{\lambda }^2_p \end{aligned}$$
(10.1.8)

Ideally, the contribution of the first few factors to the sample variances of the variables should be large. The contribution to the sample variance \(s_{ii}\) from the first common factor is \({\widehat{l}_{i1}}^{\,2}\). The contribution to the total sample variance \(s_{11} + \cdots + s_{pp} = \text {tr}(S)\) from the first common factor is

$$ \widehat{l}^2_{11} + \widehat{l}^2_{21} + \cdots + \widehat{l}^2_{p1} = (\sqrt{\widehat{\lambda }_1} \widehat{e}_1)' (\sqrt{\widehat{\lambda }_1} \widehat{e}_1) = \widehat{\lambda }_1 $$

since the eigenvector \(\widehat{e}_1\) has unit length.

In general,

$$\begin{aligned} \left( \begin{array}{c} \text { Proportion of total }\\ \text { sample variance due }\\ \text { to the } j\text {th factor } \end{array}\right) = \left\{ \begin{array}{c} \frac{\widehat{\lambda }_j}{s_{11} + \cdots + s_{pp}} \text { for a factor analysis of } S \\ \\ \frac{\widehat{\lambda }_j}{p} \text { for a factor analysis of } R \end{array}\right. \end{aligned}$$
(10.1.9)

Criterion (10.1.9) is frequently used as a heuristic device for determining the appropriate number of common factors. The value of m is gradually increased until a suitable proportion of the total sample variance has been explained.

Other Rules Used in Packages

Number of eigenvalues of R greater than one (when R is used).

Number of eigenvalues of S that are positive (when S is used).
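The following sketch implements these heuristics for a correlation matrix R: the proportion-of-variance criterion (10.1.9) and the eigenvalue-greater-than-one rule. The matrix R and the 80% threshold used here are illustrative assumptions, not values from the text.

```python
import numpy as np

def choose_m(R, threshold=0.80):
    """Two heuristics for the number of factors from a correlation matrix R:
    the proportion-of-variance criterion (10.1.9) and the
    'eigenvalues of R greater than one' rule."""
    eigval = np.sort(np.linalg.eigvalsh(R))[::-1]
    p = R.shape[0]
    cum_prop = np.cumsum(eigval / p)                    # cumulative proportion for R
    m_prop = int(np.searchsorted(cum_prop, threshold)) + 1
    m_eigen = int(np.sum(eigval > 1.0))
    return m_prop, m_eigen

# Hypothetical correlation matrix, for illustration only
R = np.array([[1.0, 0.7, 0.2],
              [0.7, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
print(choose_m(R))   # (m by proportion of variance, m by eigenvalue rule)
```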

10.1.2 Factor Rotation

If \(\widehat{L}\) is the \(p \times m\) matrix of estimated factor loadings obtained by any method, then

$$ \widehat{L}^{*} = \widehat{L} T, \text { where } TT' = T'T = I, $$

is a \(p \times m\) matrix of rotated loadings.

Moreover, the estimated covariance (or correlation) matrix remains unchanged since

$$ \widehat{L} {\widehat{L}}' + \widehat{\psi } = \widehat{L} TT' {\widehat{L}}' + \widehat{\psi } = \widehat{L}^{*} {\widehat{L}^{*'}} + \widehat{\psi } $$

The above equation indicates that the residual matrix \(S_n - \widehat{L} \widehat{L}' - \widehat{\psi } = S_n - \widehat{L}^{*} {\widehat{L}^{*'}} - \widehat{\psi }\) remains unchanged. Moreover, the specific variances \(\widehat{\psi }_i\) and hence the communalities \({\widehat{h}_i}^2\) are unaltered. Hence, mathematically it is immaterial whether \(\widehat{L}\) or \(\widehat{L}^{*}\) is obtained.

Since the original loadings may not be readily interpretable, it is usual practice to rotate them until a ‘simple structure’ is achieved.

Ideally, we should like to see a pattern of loadings in which each variable loads highly on a single factor and has small to moderate loadings on the remaining factors.

The problem is to find an orthogonal rotation which corresponds to a ‘simple structure.’

This can be achieved in such a way that the orthogonality of the factors still holds after rotation, which is the case if we perform an orthogonal rotation. Among such rotations, (1) varimax rotation, (2) quartimax rotation, and (3) equamax rotation are important.

Oblique rotation does not ensure the orthogonality of the factors after rotation. There are several such algorithms, like oblimax and promax.

10.1.3 Varimax Rotation

Orthogonal Transformation on L

$$ L^{*} = LT \, \, \, TT' = I $$

\(L^{*}\) is the matrix of orthogonally rotated loadings and let \(d_j = {\displaystyle \mathop \sum ^p_{i = 1}} {l^{*}}^2_{ij} \, \, \, j = 1 \ldots m\)

Then the following expression is maximized

$$ \sum ^m_{j = 1} \left\{ \sum ^p_{i = 1} {l^{*}}^4_{ij} - {d_j}^2 / p\right\} $$

Such a procedure tries to give either large (in absolute value) or near-zero values in the columns of \(L^{*}\). Hence, the procedure tries to produce factors with either a strong association with the responses or no association at all.

The communality

\({h_i}^2 = {\displaystyle \mathop \sum ^m_{j = 1}} {l^{*}}^2_{ij} = {\displaystyle \mathop \sum ^m_{j = 1}} {l_{ij}}^2\) remains constant under rotation.
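As an illustration, the sketch below implements one commonly used varimax iteration (an SVD-based update of the rotation matrix T applied to the raw loadings); it is a minimal version and not necessarily identical to the algorithm used in any particular package, and the loading matrix is hypothetical.

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-6):
    """Varimax rotation of a p x m loading matrix L (raw, unnormalized form).
    Returns the rotated loadings L* = LT and the orthogonal matrix T."""
    p, m = L.shape
    T = np.eye(m)
    obj_old = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        # Update direction for the varimax criterion, projected back onto the
        # set of orthogonal matrices through an SVD.
        B = L.T @ (Lr**3 - Lr @ np.diag(np.sum(Lr**2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(B)
        T = U @ Vt
        obj = np.sum(s)
        if obj_old != 0.0 and obj < obj_old * (1.0 + tol):
            break
        obj_old = obj
    return L @ T, T

# Hypothetical unrotated loadings (p = 4, m = 2), for illustration only
L = np.array([[0.7,  0.5],
              [0.6,  0.5],
              [0.6, -0.5],
              [0.5, -0.6]])
L_star, T = varimax(L)

# Communalities h_i^2 are unchanged by the rotation
print(np.allclose(np.sum(L**2, axis=1), np.sum(L_star**2, axis=1)))   # True
```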

10.2 Quartimax Rotation

The factor pattern is simplified by forcing the variables to correlate highly with one main factor (the so-called G-factor of IQ studies) and very little with the remaining factors. Here all variables are primarily associated with a single factor.

Results obtained from factor analysis are usually difficult to interpret. Many variables show significant coefficient magnitudes on several of the retained factors (coefficients greater than 0.60 in absolute value are often considered large, and coefficients of about 0.35 in absolute value are often considered moderate), especially on the first factor.

For good interpretation, factor rotation is necessary. The objective of the rotation is to achieve the most ‘simple structure’ through the manipulation of the factor pattern matrix.

Simple structure can be described in terms of five principles of factor rotation.

  1. Each variable should have at least one zero (small) loading.

  2. Each factor should have a set of linearly independent variables whose factor loadings are zero (small).

  3. For every pair of factors, there should be several variables whose loadings are zero (small) for one factor but not the other.

  4. For every pair of factors, a large proportion of variables should have zero (small) loadings on both factors whenever more than about four factors are extracted.

  5. For every pair of factors, there should only be a small number of variables with nonzero loadings on both.

In orthogonal rotation,

  (1) Factors are perfectly uncorrelated with one another.

  (2) Fewer parameters are to be estimated.

10.3 Promax Rotation

Factors are allowed to be correlated with one another.

  • Step I. Rotate the factors orthogonally.

  • Step II. Get a target matrix by raising the factor coefficients to a power (3 or 4). The coefficients become smaller in absolute value, but the contrast between large and small coefficients increases.

  • Step III. Rotate the original matrix to a best-fit position with the target matrix.

Here many moderate coefficients approach zero \([0.3 \times 0.3 = 0.09]\) more quickly than the large coefficients \((\ge 0.6)\).
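The sketch below illustrates Steps II and III under the assumption that Step I (the varimax rotation) has already produced the loadings V. The full promax procedure also normalizes the loadings and the transformation matrix, which is omitted here, and the loadings shown are hypothetical.

```python
import numpy as np

def promax_from_varimax(V, power=4):
    """Steps II and III of a bare-bones promax sketch, starting from loadings V
    that have already been orthogonally (varimax) rotated in Step I.
    The full promax procedure also normalizes the loadings and the
    transformation matrix; that refinement is omitted here."""
    # Step II: target matrix -- raise each coefficient to a power, keeping its sign
    P = np.sign(V) * np.abs(V)**power
    # Step III: least-squares (oblique Procrustes) fit of V to the target P
    Q = np.linalg.solve(V.T @ V, V.T @ P)
    return V @ Q, Q

# Hypothetical varimax-rotated loadings (p = 4, m = 2), for illustration only
V = np.array([[ 0.85, 0.10],
              [ 0.78, 0.05],
              [ 0.08, 0.80],
              [-0.05, 0.74]])
L_oblique, Q = promax_from_varimax(V)
print(np.round(L_oblique, 3))
```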

Example 10.1

Consider the data set related to the relative consumption of certain food items in European and Scandinavian countries considered in the chapter on principal component analysis.

If we do factor analysis with varimax rotation, then the output is as follows:

Rotated Factor Loadings and Communalities Varimax Rotation

Variable       Factor1   Factor2   Factor3   Factor4   Communality
Coffee          0.336     0.807     0.018    −0.095       0.774
Tea            −0.233     0.752     0.330     0.370       0.866
Biscuits        0.502     0.124     0.712    −0.177       0.806
Powder          0.317     0.856     0.047    −0.230       0.889
Potatoes        0.595     0.047     0.060     0.485       0.595
Frozen fish     0.118    −0.100     0.050     0.918       0.869
Apples          0.832     0.284     0.251     0.097       0.846
Oranges         0.903     0.148     0.004     0.036       0.839
Butter         −0.004     0.089     0.900     0.172       0.847
Variance        2.3961    2.0886    1.4969    1.3480      7.3296
% Var           0.266     0.232     0.166     0.150       0.814

Factor Score Coefficients

Variable       Factor1   Factor2   Factor3   Factor4
Coffee          0.038     0.408    −0.144    −0.040
Tea            −0.311     0.456     0.119     0.319
Biscuits        0.165    −0.141     0.506    −0.252
Powder          0.026     0.426    −0.109    −0.144
Potatoes        0.253    −0.047    −0.089     0.331
Frozen fish     0.006    −0.019    −0.072     0.692
Apples          0.339    −0.008     0.045     0.006
Oranges         0.431    −0.064    −0.129    −0.026
Butter         −0.132    −0.080     0.674     0.029
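As an illustration of how the factor score coefficients above could be used, the sketch below standardizes a data matrix and multiplies it by the coefficient matrix to obtain factor scores. The data matrix X is a random stand-in, since the original food-consumption data are not reproduced in this section.

```python
import numpy as np

# Factor score coefficients from the table above; rows follow the variable order
# coffee, tea, biscuits, powder soup, potatoes, frozen fish, apples, oranges, butter.
B = np.array([
    [ 0.038,  0.408, -0.144, -0.040],
    [-0.311,  0.456,  0.119,  0.319],
    [ 0.165, -0.141,  0.506, -0.252],
    [ 0.026,  0.426, -0.109, -0.144],
    [ 0.253, -0.047, -0.089,  0.331],
    [ 0.006, -0.019, -0.072,  0.692],
    [ 0.339, -0.008,  0.045,  0.006],
    [ 0.431, -0.064, -0.129, -0.026],
    [-0.132, -0.080,  0.674,  0.029],
])

# X would be the n x 9 data matrix of the food-consumption variables; a random
# stand-in is used here because the original data are not reproduced in this section.
rng = np.random.default_rng(1)
X = rng.standard_normal((25, 9))

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each variable
scores = Z @ B                                     # n x 4 matrix of factor scores
print(scores.shape)   # (25, 4)
```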

We see that, according to the percentage of variation, about 80% of the variation is explained by the first four factors (as in the case of PCA). But here the advantage is that, unlike PCA, we can physically interpret the factors. According to the rotated factor loadings, we can say that the first factor is composed of ‘apples, oranges, and potatoes’; similarly, the other three factors are composed of ‘coffee, tea, and powder soup,’ ‘butter and biscuits,’ and ‘potatoes and frozen fish,’ respectively.

Except for potatoes, there is no overlapping, and the groups are well defined and may correspond to types of customers preferring ‘fruits,’ ‘hot drinks,’ ‘snacks,’ and ‘proteins, vitamins, and minerals.’

The most significant difference between PCA and factor analysis concerns the assumption of an underlying causal structure. Factor analysis assumes that the covariation among the observed variables is due to the presence of one or more latent variables, known as factors, that exert a causal influence on these observed variables. Factor analysis is used when there exist some latent factors which impose a causal influence on the observed variables under consideration. Exploratory factor analysis helps the researcher identify the number and nature of these latent factors. Principal component analysis, on the other hand, makes no assumption about an underlying causal relation; it is simply a dimension reduction technique.