
As mentioned in the last chapter, a measurement model of the type illustrated in Fig. 4.1 is assumed in confirmatory factor analysis.

Fig. 4.1

A graphical representation of multiple measures with a confirmatory factor structure

The objective of a confirmatory analysis is to test if the data fit the measurement model.

4.1 Confirmatory Factor Analysis: A Strong Measurement Model

The graphical representation of the model shown in Fig. 4.1 can be expressed by the system of equations:

$$ \left\{ {\begin{array}{*{20}c} {X_1 = \lambda _{11} F_1 +\varepsilon _1 } \\ {X_2 = \lambda _{21} F_1 +\varepsilon _2 } \\ {X_3 = \lambda _{31} F_1 +\varepsilon _3 } \\ {X_4 = \lambda _{42} F_2 +\varepsilon _4 } \\ {X_5 = \lambda _{52} F_2 +\varepsilon _5 } \\ \end{array}} \right. $$
((4.1))

Let

$$ \mathop {\bf{x}}\limits_{5 \times 1} = \left[ \begin{array}{l} X_1 \\ X_2 \\ X_3 \\ X_4 \\ X_5 \\ \end{array} \right];\;\mathop {\bf{F}}\limits_{2 \times 1} = \left[ \begin{array}{l} F_1 \\ F_2 \\ \end{array} \right];\;\mathop {\boldsymbol \Lambda}\limits_{5 \times 2} = \left[ {\begin{array}{*{20}c} {\lambda _{11} } & 0 \\ {\lambda _{21} } & 0 \\ {\lambda _{31} } & 0 \\ 0 & {\lambda _{42} } \\ 0 & {\lambda _{52} } \\ \end{array}} \right];\;\mathop {\bf{e}}\limits_{5 \times 1} = \left[ \begin{array}{l} \varepsilon _1 \\ \varepsilon _2 \\ \varepsilon _3 \\ \varepsilon _4 \\ \varepsilon _5 \\ \end{array} \right] $$

Equation (4.1) can be expressed in matrix notation as

$$ \mathop {\bf{x}\vbox to 12pt{}}\limits_{5 \times 1} = \mathop {\boldsymbol{\Lambda}\vbox to 12pt{}}\limits_{5 \times 2} \mathop {\bf{F}\vbox to 12pt{}}\limits_{2 \times 1} + \mathop {\bf{e}\vbox to 12pt{}}\limits_{5 \times 1} $$
((4.2))

with

$$ E[{\bf{e}}] = 0 $$
((4.3))
$$ E[{\bf{ee}}^\prime ] = {\bf{D}} = {{\rm diag}}\{ \delta _{ii} \} $$
((4.4))
$$ E\left[ {{\bf{FF}}^\prime } \right] = {{\boldsymbol \Phi}} $$
((4.5))

If the factors are assumed independent:

$$ E\left[ {{\bf{FF'}}} \right] = {\bf{I}} $$
((4.6))

While the expressions above refer to the specific model with five indicators, the matrix notation is general: it is identical if we now consider a measurement model with q indicators and a factor matrix containing n unobserved factors.

$$ \mathop {\bf{x}\vbox to 12pt{}}\limits_{q \times 1} = \mathop {\boldsymbol{\Lambda}\vbox to 12pt{}}\limits_{\vbox to 5pt{}q \times n} \mathop {\bf{F}\vbox to 12pt{}}\limits_{n \times 1} + \mathop {\bf{e}\vbox to 12pt{}}\limits_{q \times 1} $$
((4.7))

The theoretical covariance matrix of x is given below; because the factors and the error terms are assumed to be uncorrelated, the cross-product terms vanish in expectation:

$$ E\left[ {{\bf{xx}}^\prime } \right] = E\left[ {\left( {{\boldsymbol{\Lambda {\rm{F}}}} + {\bf{e}}} \right)\left( {{\boldsymbol{\Lambda {\rm{F}}}} + {\bf{e}}} \right)^\prime } \right] = E\left[ {{\boldsymbol{\Lambda {\rm{FF}}}}^\prime {\boldsymbol{\Lambda }}^\prime + {\bf{ee}}^\prime } \right]$$
((4.8))
$$ = {\boldsymbol{\Lambda }}E\left[ {{\boldsymbol{\rm{FF}}}^\prime } \right]{\boldsymbol{\Lambda }}^\prime + E\left[ {{\bf{ee}}^\prime } \right]$$
((4.9))
$$ \boldsymbol{\Sigma} = {\boldsymbol{\Lambda \Phi \Lambda }}^\prime + {\bf{D}} $$
((4.10))

Therefore Equation (4.10) expresses how the covariance matrix is structured, given the measurement model specification in Equation (4.7). The structure is simplified in case of the independence of the factors:

$$ \boldsymbol{{\Sigma}} = {\boldsymbol{\Lambda \Lambda }}^\prime + {\bf{D}} $$
((4.11))
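As a numerical sketch of Equation (4.11), with made-up loadings and error variances (not values from the chapter), the implied covariance matrix can be computed directly:

```python
import numpy as np

# Hypothetical loadings for the five-indicator, two-factor model of Fig. 4.1:
# X1-X3 load on F1 only, X4-X5 on F2 only (the zeros impose that structure).
Lambda = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.9],
    [0.0, 0.5],
])
# Diagonal matrix D of measurement-error variances (the delta_ii's)
D = np.diag([0.36, 0.51, 0.64, 0.19, 0.75])

# Equation (4.11): Sigma = Lambda Lambda' + D (independent factors, Phi = I)
Sigma = Lambda @ Lambda.T + D

print(Sigma[0, 1])  # covariance of X1 and X2: lambda_11 * lambda_21 = 0.56
print(Sigma[0, 3])  # X1 and X4 load on different factors, so 0.0
```

Note how the structure forces all covariances between indicators of different factors to be zero when the factors are independent.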

The notation used above was chosen to stay close to that of the previous chapter in order to facilitate the comparison, especially between exploratory factor analysis and confirmatory factor analysis. However, we now introduce the notation found in LISREL because the software refers to specific variable names. In particular, Equation (4.7) is rewritten with ξ for the vector of factors and δ for the vector of measurement errors:

$$ \mathop {\bf{x}}\limits_{q \times 1} = \mathop {\boldsymbol{\Lambda}_x }\limits_{q \times n} \mathop {\boldsymbol{\xi }}\limits_{n \times 1} + \mathop {{\boldsymbol \delta}}\limits_{q \times 1} $$
((4.12))

with

$$ E\left[ {{{\boldsymbol \xi \boldsymbol \xi }}^\prime } \right] = {{\boldsymbol \Phi}}$$
((4.13))

and

$$ E\left[ {{{\boldsymbol \delta \boldsymbol \delta }}^\prime } \right] = {{\boldsymbol \theta }}_\delta .$$
((4.14))

The methodology for estimating these parameters is presented next.

4.2 Estimation

If S is the covariance matrix estimated from the sample, we need to find the values of the lambdas (the elements of Λ) and of the deltas (the elements of D) that reproduce a covariance matrix as similar as possible to the observed one. The estimation therefore consists in finding the parameters of the model that replicate as closely as possible the observed covariance matrix according to the structure in Equation (4.10). For maximum likelihood estimation, the discrepancy between the matrices S and Σ is measured by the following expression, which is minimized:

$$F = {{\rm Ln}}\left| {\boldsymbol \Sigma} \right| + {{\rm tr}}\left( {{\bf{S \boldsymbol \Sigma }}^{ - 1} } \right) - {{\rm Ln}}\left| {\bf S} \right| - \left( q \right)$$
((4.15))

This expression follows directly from the maximization of the likelihood function. Indeed, based on the multivariate normal distribution of the data matrix \(\mathop {{\bf X}^d }\limits_{N \times q} \), which has been mean-centered, the sampling distribution is

$$f({\bf X}) = \prod\limits_{i = 1}^N {\left( {2\pi } \right)} ^{ - \frac{q}{2}} \left| {\boldsymbol \Sigma} \right|^{ - \frac{1}{2}} \exp \left\{ { - \frac{1}{2}{\bf{x}}_i^{d^\prime } {\boldsymbol \Sigma}^{ - 1} {\bf{x}}_i^d } \right\}$$
((4.16))

which is also the likelihood

$$\ell = \ell ({{\rm parameters}}\,{{\rm of}}\;{\boldsymbol \Sigma} |{\bf X}) = \prod\limits_{i = 1}^N {\left( {2\pi } \right)} ^{ - \frac{q}{2}} \left| {\boldsymbol \Sigma} \right|^{ - \frac{1}{2}} \exp \left\{ { - \frac{1}{2}{\bf x}_i^{d^\prime} {\boldsymbol \Sigma}^{ - 1} {\bf{x}}_i^d } \right\}$$
((4.17))

or

$$\begin{array}{c}\displaystyle{\bf{L}} = {{\rm Ln}}\ell = \sum\limits_{i = 1}^N {\left[ { - \frac{q}{2}{{\rm Ln}}\left( {2\pi } \right) - \frac{1}{2}{{\rm Ln}}\left| {\boldsymbol \Sigma} \right| - \frac{1}{2}{\bf{x}}_i^{d^\prime } {\boldsymbol \Sigma}^{ - 1} {\bf{x}}_i^d } \right]} \\[4pt] \displaystyle= - \frac{{Nq}}{2}{{\rm Ln}}\left( {2\pi } \right) - \frac{N}{2}{{\rm Ln}}\left| {\boldsymbol \Sigma} \right| - \frac{1}{2}\sum\limits_{i = 1}^N {\left( {{\bf{x}}_i^{d^\prime } {\boldsymbol \Sigma}^{ - 1} {\bf{x}}_i^d } \right)} \\[4pt]\displaystyle= - \frac{N}{2}\left[ {q{{\rm Ln}}\left( {2\pi } \right) + {{\rm Ln}}\left| {\boldsymbol \Sigma} \right| + \frac{1}{N}\sum\limits_{i = 1}^N {\left( {{\bf{x}}_i^{d^\prime } {\boldsymbol \Sigma}^{ - 1} {\bf{x}}_i^d } \right)} } \right] \\[4pt]\displaystyle= - \frac{N}{2}\left[ {q{{\rm Ln}}\left( {2\pi } \right) + {{\rm Ln}}\left| {\boldsymbol \Sigma} \right| + \frac{1}{N}{{\rm tr}}\left( {{\bf X}^d {\boldsymbol \Sigma}^{ - 1} {\bf X}^{d^\prime } } \right)} \right] \\[4pt]\displaystyle= - \frac{N}{2}\left[ {q{{\rm Ln}}\left( {2\pi } \right) + {{\rm Ln}}\left| {\boldsymbol \Sigma} \right| + \frac{1}{N}{{\rm tr}}\left( {{\bf X}^{d^\prime } {\bf X}^d {\boldsymbol \Sigma}^{ - 1} } \right)} \right] \\\end{array}$$
((4.18))
$${\bf{L}} = - \frac{N}{2}\left[ {q{{\rm Ln}}\left( {2\pi } \right) + {{\rm Ln}}\left| {\boldsymbol \Sigma} \right| + {{\rm tr}}\left( {{\bf{S\boldsymbol \Sigma }}^{ - 1} } \right)} \right]$$
((4.19))

Therefore, since constant terms do not affect the function to maximize, maximizing the likelihood function corresponds to minimizing the expression in Equation (4.15); note that the last two terms, –Ln|S| and –q, are constants that do not depend on the parameters.

The expression F is minimized by searching over the values of each of the parameters. If the observed variables x follow a multivariate normal distribution, the parameter estimates that minimize Equation (4.15) are the maximum likelihood estimates.
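The fit function of Equation (4.15) is easy to evaluate numerically. The sketch below (with an arbitrary two-variable covariance matrix, not data from the chapter) shows that F is zero when the model reproduces the observed covariances exactly:

```python
import numpy as np

def ml_fit_function(S, Sigma):
    """Maximum likelihood discrepancy of Equation (4.15):
    F = Ln|Sigma| + tr(S Sigma^-1) - Ln|S| - q."""
    q = S.shape[0]
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_S = np.linalg.slogdet(S)
    return logdet_Sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_S - q

# When Sigma = S, tr(S Sigma^-1) = tr(I) = q and the two log-determinants
# cancel, so F = 0; any mismatch makes F strictly positive.
S = np.array([[1.0, 0.5],
              [0.5, 1.0]])
print(ml_fit_function(S, S))  # zero up to rounding error
```

The fact that F ≥ 0, with equality only at a perfect reproduction of S, is what makes it usable as a discrepancy measure.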

There are ½q(q + 1) distinct elements that constitute the data: half of the symmetric covariance matrix, to which one adds back half of the diagonal in order to count the variances of the variables themselves (i.e., q × q/2 + q/2). The number of degrees of freedom is then the number of distinct data points as defined above minus the number of parameters in the model to estimate.

In the example shown in Fig. 4.1, ten parameters must be estimated:

$$ 5\;\lambda _{ij} {{\rm 's}} + 5\;\delta _{ii} {{\rm 's.}} $$

These correspond to each of the arrows in the figure, i.e., the factor loadings and the variances of the measurement errors. There would be 11 parameters to estimate if the two factors were correlated.

4.2.1 Model Fit

The measure of the fit of the model to the data corresponds to the criterion that was minimized, i.e., a measure of the extent to which the model, given the best possible values of the parameters, can lead to a covariance matrix of the observed variables that is sufficiently similar to the actually observed covariance matrix. We first present and discuss the basic chi-square test of the fit of the model. We then introduce a number of measures of fit that are typically reported and those which alleviate the problems inherent to the chi-square test. We finally discuss how modification indices can be used as diagnostics for model improvement.

4.2.1.1 Chi-Square Tests

Based on large-sample distribution theory, \(v = (N - 1)\hat F\) (where N is the sample size used to generate the covariance matrix of the observed variables and \(\hat F\) is the minimum value of the expression F as defined by Equation 4.15) is distributed as a chi-squared with the number of degrees of freedom corresponding to the number of data points minus the number of estimated parameters, as computed in the example above. If the value of v is significantly greater than zero, the model is rejected; this means that the theoretical model is unable to generate data with a covariance matrix close enough to the one obtained from the actual data.
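A small numerical sketch of this test (the sample size and the minimized fit value below are hypothetical, chosen only for illustration):

```python
# Degrees of freedom: distinct data points q(q+1)/2 minus estimated parameters T.
# For the model of Fig. 4.1: q = 5 indicators and T = 10 parameters
# (5 loadings + 5 error variances), so df = 15 - 10 = 5.
q, T = 5, 10
df = q * (q + 1) // 2 - T

# v = (N - 1) * F_hat is asymptotically chi-squared with df degrees of freedom.
N = 200          # hypothetical sample size
F_hat = 0.04     # hypothetical minimized value of Equation (4.15)
v = (N - 1) * F_hat

# v is about 7.96, below the 5% critical value of 11.07 for 5 df,
# so this hypothetical model would not be rejected.
print(df, v)
```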

This follows from the normal distribution assumption of the data. As discussed above, the likelihood function at its maximum value (L) can be compared with \({\bf L}_0\), the likelihood of the full or saturated model with zero degrees of freedom. Such a saturated model reproduces the covariance matrix perfectly, so that Σ = S and \({\rm tr}({\bf S}{\boldsymbol \Sigma}^{-1}) = {\rm tr}({\bf I}) = q\). Consequently

$$ {\bf{L}}_0 = - \frac{N}{2}\left[ {q{{\rm Ln}}\left( {2\pi } \right) + {{\rm Ln}}\left| {\bf S} \right| + q} \right]$$
((4.20))

The likelihood ratio test is

$$ - 2\left[ {{\bf{L}} - {\bf{L}}_0 } \right] \sim \chi _{df = [q(q + 1)/2] - T}^2$$
((4.21))

where T is the number of parameters estimated.

Equation (4.21) results in the expression:

$$ N\left[ {{{\rm Ln}}\left| {\boldsymbol \Sigma} \right| + {{\rm tr}}\left( {{\bf{S \boldsymbol \Sigma }}^{ - 1} } \right) - {{\rm Ln}}\left| {\bf S} \right| - \left( q \right)} \right]$$
((4.22))

which is distributed as a chi-squared with [q(q + 1)/2] – T degrees of freedom.

It should be noted that the comparison of any nested models is possible. Indeed, testing a restriction on a subset of the parameters implies comparing two measures of fit v, each distributed as a chi-squared. Consequently, the difference between the value \(v_r\) of a restricted model and the value \(v_u\) of the unrestricted model follows a chi-square distribution with degrees of freedom equal to the number of restrictions.
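The chi-square difference test can be sketched with hypothetical fit values (the numbers below are illustrative, not from the chapter):

```python
# Fit statistics v = (N - 1) * F_hat for two nested models (hypothetical):
v_r, df_r = 31.4, 9    # restricted model (e.g., a parameter fixed)
v_u, df_u = 24.1, 8    # unrestricted model (the parameter freely estimated)

delta_v = v_r - v_u    # chi-squared with df_r - df_u degrees of freedom
delta_df = df_r - df_u

# With 1 degree of freedom, the 5% critical value is 3.84; a difference
# of about 7.3 would therefore lead to rejecting the restriction.
print(delta_v, delta_df)
```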

One problem with the expression v in Equation (4.22) is that it contains the sample size N: as the sample size increases, it becomes less likely that one will fail to reject the model. This corresponds to the statistical power of a test designed to reject a null hypothesis that a parameter equals zero, but it is an issue in this context because the hypothesis for which the researcher seeks support is the null hypothesis that there is no difference between the observed covariance matrix and the matrix that can be generated by the model. Failure to reject the hypothesis, and therefore “accepting” the model, can thus be due to a lack of power of the test: a small enough sample size can contribute to finding fitting models based on chi-square tests, and, conversely, it is more difficult to find fitting models when the sample size is large. This is why several other measures of fit, discussed below, have been developed.

4.2.1.2 Other Goodness-of-Fit Measures

The LISREL output gives a direct measure (GFI) of the fit between the theoretical and observed covariance matrices following from the fit criterion of Equation (4.15), and it is defined as

$$ {{\rm GFI}} = 1 - \frac{{{{\rm tr}}\left[ {\left( {\hat{\boldsymbol\Sigma}^{ - 1} {\bf S} - {\bf{I}}} \right)^2 } \right]}}{{{{\rm tr}}\left[ {\left( {\hat{\boldsymbol\Sigma}^{ - 1} {\bf S}} \right)^2 } \right]}}$$
((4.23))

From this equation, it is clear that if the estimated and the observed variances are identical, the numerator of the expression subtracted from 1 is 0 and, therefore, GFI = 1. To correct for the fact that the GFI is affected by the number of indicators, an adjusted goodness-of-fit index (AGFI) is also proposed. This measure of fit corrects the GFI for the degrees of freedom, just like an adjusted R-squared would in a regression context:

$$ {{\rm AGFI}} = 1 - \left[ {\frac{{\left( q \right)(q + 1)}}{{\left( q \right)(q + 1) - 2T}}} \right]\left[ {1 - {{\rm GFI}}} \right]$$
((4.24))

where T is the number of estimated parameters.

As the number of estimated parameters increases, holding everything else constant, the adjusted GFI decreases.
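Equations (4.23) and (4.24) translate directly into code. The sketch below (with an arbitrary two-variable example, not data from the chapter) verifies that a perfect reproduction of S yields GFI = AGFI = 1:

```python
import numpy as np

def gfi(S, Sigma_hat):
    """Goodness-of-fit index of Equation (4.23)."""
    q = S.shape[0]
    M = np.linalg.inv(Sigma_hat) @ S
    num = np.trace((M - np.eye(q)) @ (M - np.eye(q)))
    den = np.trace(M @ M)
    return 1 - num / den

def agfi(S, Sigma_hat, T):
    """Adjusted GFI of Equation (4.24); T is the number of estimated parameters."""
    q = S.shape[0]
    correction = (q * (q + 1)) / (q * (q + 1) - 2 * T)
    return 1 - correction * (1 - gfi(S, Sigma_hat))

# Perfect fit: Sigma_hat = S, so Sigma_hat^-1 S = I and GFI = AGFI = 1.
S = np.array([[1.0, 0.4],
              [0.4, 1.0]])
print(gfi(S, S), agfi(S, S, T=1))
```

Any discrepancy between S and the estimated Σ̂ pushes both indices below 1, and the AGFI penalty grows with the number of estimated parameters T.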

A threshold value of 0.9 (for either the GFI or the AGFI) has become a norm for the acceptability of the model fit (Bagozzi and Yi 1988, Baumgartner and Homburg 1996, Kuester, Homburg and Robertson 1999).

Another index that is often found to assess model fit is the root mean square error of approximation (RMSEA). It is defined as a function of the minimum fit function corrected by the degrees of freedom and the sample size:

$$ {{\rm RMSEA}} = \sqrt {\frac{{\hat F_0 }}{d}} $$
((4.25))

where

$$\hat F_0 = {{\rm Max}}\left\{ {\left( {\hat F - \left[ {d/\left( {N - 1} \right)} \right]} \right),0} \right\}$$
((4.26))
$$ d = \left[ {q\left( {q + 1} \right)/2} \right] - T$$
((4.27))

A value of RMSEA smaller than 0.08 is considered to reflect reasonable errors of approximation, and a value below 0.05 indicates a close fit.
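Equations (4.25), (4.26), and (4.27) can be combined into a single function. The values below are hypothetical; the sketch illustrates how the same minimized fit value is judged differently as the sample size changes:

```python
import math

def rmsea(F_hat, q, T, N):
    """RMSEA of Equations (4.25)-(4.27)."""
    d = q * (q + 1) // 2 - T               # degrees of freedom, Equation (4.27)
    F0 = max(F_hat - d / (N - 1), 0.0)     # Equation (4.26)
    return math.sqrt(F0 / d)               # Equation (4.25)

# Hypothetical values: the same F_hat looks worse as N grows, because
# less of the discrepancy can be attributed to sampling error.
print(rmsea(F_hat=0.04, q=5, T=10, N=200))   # about 0.055 (close fit)
print(rmsea(F_hat=0.04, q=5, T=10, N=1000))  # about 0.084 (marginal)
```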

4.2.1.3 Modification Indices

The solution obtained for the parameter estimates uses the derivatives of the objective function with respect to each parameter. This means that, for a given solution, it is possible to know in which direction a parameter should change in order to improve the fit, and how steeply. Accordingly, the modification indices indicate the expected gain in fit that would be obtained if a particular constrained coefficient were freely estimated (holding all other parameters fixed at their estimated values). Although not a substitute for theory, modification indices can be useful in analyzing structural relationships and, in particular, in refining the correlational assumptions about the random terms and in modeling control factors.

4.2.2 Test of Significance of Model Parameters

Because of the maximum likelihood properties of the estimates which follow from the normal distribution assumption of the variables, the significance of each parameter can be tested using the standard t statistics formed by the ratio of the parameter estimate and its standard deviation.

4.3 Summary Procedure for Scale Construction

Scale construction involves several steps. The process brings together the methods discussed in the previous chapter (Chapter 3) and those presented in this one. These include the following statistical analyses, which provide a guide for scale construction: exploratory factor analysis (EFA), confirmatory factor analysis (CFA), and the reliability coefficient alpha. Confirmatory factor analysis can also be used to assess the discriminant and convergent validity of a scale. We now review these steps in turn.

4.3.1 Exploratory Factor Analysis

Exploratory factor analysis can be performed separately for each hypothesized factor. This demonstrates the unidimensionality of each factor. One global factor analysis can also be performed in order to assess the degree of independence between the factors.

4.3.2 Confirmatory Factor Analysis

Confirmatory factor analysis can be used to assess the overall fit of the entire measurement model and to obtain the final estimates of the measurement model parameters. Although sometimes performed on the same sample as the exploratory factor analysis, when it is possible to collect more data, it is preferable to perform the confirmatory factor analysis on a new sample.

4.3.3 Reliability Coefficient α

In cases where composite scales are developed, this measure is useful to assess the reliability of the scales. Reliabilities of less than 0.7 for academic research and 0.9 for market research are typically not sufficient to warrant further analyses using these composite scales.

In addition, scale construction involves determining that the new scale developed is different (i.e., reflects and measures a construct which is different) from measures of other related constructs. This is a test of the scale’s discriminant validity. It also involves a test of convergent validity, i.e., that this new measure relates to other constructs it is supposed to be related to, while remaining different.

4.3.4 Discriminant Validity

A construct must be different from other constructs (discriminant validity) while possibly remaining conceptually related to them (convergent validity). The discriminant validity of the constructs is ascertained by comparing a measurement model where the correlation between the constructs is freely estimated with one where the correlation is constrained to be one (thereby assuming a single-factor structure). The discriminant validity of the constructs is examined one pair at a time. This procedure, proposed by Bagozzi, Yi and Phillips (1991), indicates that, if the model where the correlation is not constrained to 1 improves the fit significantly, the two constructs are distinct from each other, although they may still be significantly correlated.

4.3.5 Convergent Validity

The convergent validity of the constructs is assessed by comparing a measurement model where the correlation between the two constructs is estimated with a model where the correlation is constrained to be equal to zero. This test can also be performed simply to test the independence of the constructs. For example, if several constructs are used to explain other dependent variables, it is desirable that the explanatory factors be uncorrelated in order to identify their separate effects. In such a case, the researcher would hope to fail to reject the null hypothesis that the correlations are zero. In the context of assessing convergent validity, the researcher instead wants to verify that the construct being measured (likely a newly developed construct) is related to other constructs it is supposed to be related to according to the literature and theory. Here, the researcher hopes to reject the null hypothesis that the correlation is zero. In comparing the restricted and the unrestricted models, a significant improvement in fit due to removing the restriction of independence indicates that the two constructs are related, which confirms convergent validity. Combining the two tests (that the correlation is different from one and different from zero) demonstrates that the two constructs are different (discriminant validity) although related, with a correlation significantly different from zero (convergent validity).

4.4 Second-Order Confirmatory Factor Analysis

In the second-order factor model, there are two levels of constructs. At the first level, constructs are measured through observable variables. These constructs are not independent and, in fact, their correlation is hypothesized to follow from the fact that these unobserved constructs are themselves reflective of common second-order unobserved constructs of a higher conceptual level. This can be represented as in Fig. 4.2.

Fig. 4.2

Graphical representation of a second-order factor analytic model

The relationships displayed in Fig. 4.2 can be expressed algebraically by the following equations:

$$\mathop {\bf{y}}\limits_{p \times 1} = \mathop {{\boldsymbol \Lambda}}\limits_{p \times m} \mathop {{\boldsymbol \eta }}\limits_{m \times 1} + \mathop {{\boldsymbol \varepsilon }}\limits_{p \times 1} $$
((4.28))

and

$$\mathop {{\boldsymbol \eta }}\limits_{m \times 1} = \mathop {{\boldsymbol \Gamma }}\limits_{m \times n} \mathop {{\boldsymbol \xi }}\limits_{n \times 1} + \mathop {{\boldsymbol \zeta }}\limits_{m \times 1}$$
((4.29))

The first equation (4.28) expresses the first-order factor analytic model. The unobserved constructs η are the first-order factors; they are measured by the reflective items represented by the variables y. The second equation (4.29) shows that the constructs η are derived from the second-order factors ξ. The factor loadings corresponding to, respectively, the first-order and second-order factor models are the elements of matrices Λ and Γ. Finally, the errors in measurement are represented by the vectors ε and ζ.

In addition to the structure expressed by these two equations, we use the following notation of the covariances:

$$ E\left[ {{{\boldsymbol \xi \boldsymbol \xi '}}} \right] = \mathop {{\boldsymbol \Phi}}\limits_{n \times n}$$
((4.30))
$$ E\left[ {{{\boldsymbol \zeta \boldsymbol \zeta '}}} \right] = \mathop {{\boldsymbol \Psi} }\limits_{m \times m}$$
((4.31))

and

$$ E\left[ {{{\boldsymbol \varepsilon \boldsymbol \varepsilon '}}} \right] = \mathop {{{\boldsymbol \Theta }}_\varepsilon }\limits_{p \times p}$$
((4.32))

Furthermore, we assume that the ζ’s are uncorrelated with the ξ’s and, similarly, that the ε’s are uncorrelated with the η’s.

If the second-order factor model described by the equations above is correct, the covariance matrix of the observed variables y must have a particular structure. This structure is obtained as

$$ E\left[ {{\bf{yy}}^\prime } \right] = E\left[ {\left( {{{\boldsymbol \Lambda \boldsymbol \eta + \boldsymbol \varepsilon }}} \right)\left( {{{\boldsymbol \Lambda \boldsymbol \eta + \boldsymbol \varepsilon }}} \right)^\prime } \right]$$
((4.33))

Developing, and using the fact that η and ε are uncorrelated:

$$ E\left[ {{\bf{yy}}^\prime } \right] = {{\boldsymbol \Lambda}}E\left[ {{{\boldsymbol \eta \boldsymbol \eta }}^\prime } \right]{{\boldsymbol \Lambda}}^\prime {\bf{ + }}E\left[ {{{\boldsymbol \varepsilon \boldsymbol \varepsilon }}^\prime } \right]$$
((4.34))

Replacing η by its value expressed in Equation (4.29):

$$ E\left[ {{\bf{yy}}^\prime } \right] = {{\boldsymbol \Lambda}}E\left[ {\left( {\boldsymbol \Gamma} {\boldsymbol \xi} + {\boldsymbol \zeta} \right)\left( {{{\boldsymbol \Gamma \boldsymbol \xi + \boldsymbol \zeta }}} \right)^\prime } \right]{{\boldsymbol \Lambda}}^\prime {\bf{ + }}E\left[ {{{\boldsymbol \varepsilon \boldsymbol \varepsilon }}^\prime } \right]$$
((4.35))
$$ E\left[ {{\bf{yy}}^\prime } \right] = {{\boldsymbol \Lambda}}\left( {{{\boldsymbol \Gamma} }E\left[ {{{\boldsymbol \xi \boldsymbol \xi }}^\prime } \right]{{\boldsymbol \Gamma} }^\prime + E\left[ {{{\boldsymbol \zeta \boldsymbol \zeta }}^\prime } \right]} \right){{\boldsymbol \Lambda}}^\prime {\bf{ + }}E\left[ {{{\boldsymbol \varepsilon \boldsymbol \varepsilon }}^\prime } \right]$$
((4.36))
$$ E\left[ {{\bf{yy}}^\prime } \right] = {\boldsymbol \Sigma} = {{\boldsymbol \Lambda}}\left( {{{\boldsymbol \Gamma \boldsymbol \Phi \boldsymbol \Gamma }}^\prime {\bf{ + {\boldsymbol{\Psi}} }}} \right){{\boldsymbol \Lambda}}^\prime + {{\boldsymbol \Theta }}_\varepsilon$$
((4.37))

where the elements on the right-hand side of Equation (4.37) are model parameters to be estimated, such that their values combined in that structure reproduce as closely as possible the observed covariance matrix S calculated from the sample data.

The estimation procedure follows the same principle as described above for the simple confirmatory factor analytic model. The number of parameters is, however, different.

How many parameters need to be estimated?

We typically define the covariance matrices Φ, Ψ, and Θε to be diagonal. These therefore contribute n + m + p parameters to be estimated, to which one must add the factor loading parameters contained in the matrices Γ and Λ. Taking the example in Fig. 4.2, n = 2, m = 5, and p = 11. One of the factor loadings for each first-order factor must be set to 1 to define the units of measurement of these factors. Consequently, Λ contains 11 – 5 = 6 parameters to be estimated and Γ contains five. That gives a total of 2 + 5 + 11 + 6 + 5 = 29 parameters to estimate. Given that the sample covariance matrix (an 11 × 11 matrix) contains (11 × 12)/2 = 66 distinct data points, the degrees of freedom are 66 – 29 = 37.
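The counting argument above can be checked mechanically for the dimensions of Fig. 4.2:

```python
# Parameter and degrees-of-freedom count for the second-order model of
# Fig. 4.2: n = 2 second-order factors, m = 5 first-order factors,
# p = 11 observed items, with Phi, Psi, and Theta_eps diagonal.
n, m, p = 2, 5, 11

variances = n + m + p        # diagonal elements of Phi, Psi, and Theta_eps
loadings_Lambda = p - m      # one loading per first-order factor fixed to 1
loadings_Gamma = m           # second-order factor loadings
T = variances + loadings_Lambda + loadings_Gamma

data_points = p * (p + 1) // 2   # distinct elements of the 11 x 11 matrix
df = data_points - T
print(T, data_points, df)        # 29 66 37
```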

The same measures of fit as described above for confirmatory factor analysis are used to assess the appropriateness of the structure imposed on the data.

4.5 Multi-group Confirmatory Factor Analysis

Multi-group confirmatory factor analysis is appropriate for testing the homogeneity of measurement models across samples. It is particularly useful in the context of cross-national research, where measurement instruments may vary due to cultural differences. This corresponds to the notion of measurement invariance. From that point of view, the model described by Equation (4.12) must be expanded along two dimensions: (1) several sets of parameters must be estimated simultaneously, one for each of the groups, and (2) differences in the means of the unobserved constructs must be recognized between groups, while they are ignored (assumed to be zero) in regular confirmatory factor analysis. These expansions are represented in Equations (4.38), (4.39) and (4.40). Equation (4.40) is identical to Equation (4.14) of the simple confirmatory factor analytic model.

The means of the factors are represented by the vector κ in Equation (4.39), which contains n rows for the mean of each of the n factors. The vector τ x in Equation (4.38) contains q rows for the scalar constant term of each of the q items:

$$ \mathop {\bf{x}}\limits_{q \times 1} = \mathop {{{\boldsymbol \tau }}_x }\limits_{q \times 1} + \mathop {{{\boldsymbol \Lambda}}_x }\limits_{q \times n} \mathop {{\boldsymbol \xi }}\limits_{n \times 1} + \mathop {{\boldsymbol \delta}}\limits_{q \times 1} $$
((4.38))
$$ E\left[ {{\boldsymbol \xi }} \right] = \mathop {{\boldsymbol \kappa }}\limits_{n \times 1}$$
((4.39))
$$ E\left[ {{{\boldsymbol \delta \boldsymbol \delta }}^\prime } \right] = \mathop {{{\boldsymbol \theta }}_\delta }\limits_{q \times q}$$
((4.40))

Therefore, the means of the observed measures x are:

$$ \mathop {{{\boldsymbol \mu }}_x }\limits_{q \times 1} = E\left[ {\bf{x}} \right] = \mathop {{{\boldsymbol \tau} }_x }\limits_{q \times 1} + \mathop {{{\boldsymbol \Lambda}}_x }\limits_{q \times n} E\left[ {\mathop {{\boldsymbol \xi }}\limits_{n \times 1} } \right] = \mathop {{{\boldsymbol \tau} }_x }\limits_{q \times 1} + \mathop {{{\boldsymbol \Lambda}}_x }\limits_{q \times n} \mathop {{\boldsymbol \kappa }}\limits_{n \times 1}$$
((4.41))
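A tiny numerical sketch of the mean structure in Equation (4.41), using hypothetical intercepts, loadings, and factor mean (none of these values come from the chapter):

```python
import numpy as np

# mu_x = tau_x + Lambda_x kappa, for a hypothetical two-item, one-factor case.
tau_x = np.array([1.0, 1.2])          # item intercepts (tau_x)
Lambda_x = np.array([[1.0], [0.8]])   # first loading fixed to 1 for identification
kappa = np.array([0.5])               # factor mean (kappa)

mu_x = tau_x + Lambda_x @ kappa
print(mu_x)  # [1.5 1.6]
```

A shift in the factor mean κ moves every item mean in proportion to its loading, which is why comparing factor means across groups requires invariant loadings and intercepts.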

Such a model with a mean structure such as in Equation (4.41) imposed can be estimated if we recognize that the log likelihood function specified in Equation (4.19) now contains not only the parameters that determine the covariance matrix Σ but also the expected values of the x variables, so that

$$ {\bf S} = \frac{1}{N}\left( {{\bf X} - {{\boldsymbol \mu }}_x } \right)\left( {{\bf X} - {{\boldsymbol \mu }}_x } \right)^\prime $$
((4.42))

Consequently, the objective function or the log likelihood function when modeling the means in addition to the covariance structure is

$$ {\bf{L}} = - \frac{N}{2}\left[ {q{{\rm Ln}}\left( {2\pi } \right) + {{\rm Ln}}\left| {\boldsymbol \Sigma} \right| + \frac{1}{N}{{\rm tr}}\left\{ {\left( {{\bf X} - {{\boldsymbol \mu }}_x } \right)\left( {{\bf X} - {{\boldsymbol \mu }}_x } \right)^\prime {\boldsymbol \Sigma}^{ - 1} } \right\}} \right]$$
((4.43))

We now add a notation to reflect that the model applies to group g with g = 1, …, G:

$$ \forall g = 1, \ldots G:\quad \mathop {{\bf{x}}^{(g)} }\limits_{q \times 1} = \mathop {{{\boldsymbol \tau} }_x^{(g)} }\limits_{q \times 1} + \mathop {{{\boldsymbol \Lambda}}_x^{(g)} }\limits_{q \times n} \mathop {{{\boldsymbol \xi }}^{(g)} }\limits_{n \times 1} + \mathop {{{\boldsymbol \delta}}^{(g)} }\limits_{q \times 1} $$
((4.44))

and

$$ E\left[ {{{\boldsymbol \xi }}^{(g)} } \right] = {{\boldsymbol \kappa }}^{(g)}$$
((4.45))

For identification, it is required that one of the groups serves as a reference with the means of its factors centered at zero (the same requirement as for a single group confirmatory factor analysis). Usually group 1 serves as that reference, although, in principle, it can be any group:

$$ {{\boldsymbol \kappa }}^{(1)} = 0 $$
((4.46))

It is also necessary to fix one factor loading for each factor in Λ x to define the measurement unit of the unobserved constructs.

The estimation is again based on the maximum likelihood. The log likelihood is the sum of the log likelihoods for all the groups so that we now search for the values of the parameters which maximize:

$$\begin{array}{l} \displaystyle {\bf{L}} = - \frac{1}{2}\sum\limits_{g = 1}^G {N^{(g)} } \Bigg[ q^{(g)} {{\rm Ln}}\left( {2\pi } \right) + {{\rm Ln}}\left| {{\boldsymbol \Sigma}^{(g)} } \right| \\[6pt] \qquad\displaystyle + \frac{1}{{N^{(g)} }}{{\rm tr}}\left\{ {\left( {{\bf X}^{(g)} - {{\boldsymbol \mu }}_x^{(g)} } \right)\left( {{\bf X}^{(g)} - {{\boldsymbol \mu} }_x^{(g)} } \right)^\prime {\boldsymbol \Sigma}^{(g)^{ - 1} } } \right\} \Bigg] \end{array}$$
((4.47))

It is then possible to impose equality constraints on the parameters to be estimated by defining them as invariant across groups. Different types of invariance can be imposed and tested.

Metric invariance concerns the constraint of equality of factor loadings across groups:

$${{\boldsymbol \Lambda}}_x^{(g)} = {{\boldsymbol \Lambda}}_x^{(g^\prime )} = {{\boldsymbol \Lambda}}_x $$
((4.48))

Scalar invariance restricts the scalar constants to be identical across groups:

$${{\boldsymbol \tau} }_x^{(g)} = {{\boldsymbol \tau} }_x^{(g^\prime )} = {{\boldsymbol \tau} }_x $$
((4.49))

In order to illustrate the types of restrictions that need to be imposed, let us consider the example of two groups, depicted in Fig. 4.3.

Fig. 4.3

Graphical representation of two-group confirmatory factor analysis

For the first item of the first group, the measurement model is

$$ {\bf{x}}_1^{(1)} = {\boldsymbol \tau}_1 + {\boldsymbol \xi}_1^{(1)} + {\boldsymbol{\delta}} _1^{(1)} $$
((4.50))

with

$$ \kappa _1^{(1)} = 0 $$
((4.51))

This means that the latent construct \({\boldsymbol \xi} _1^{(1)} \) is measured in the units of \({\bf{x}}_1^{(1)} \).

For identification, constraining τ1 to be equal across groups is equivalent to estimating it in one group and fixing it to that value in the other groups. For the first item of the second group, the measurement model is

$$ {\bf{x}}_1^{(2)} = {\boldsymbol \tau}_1 + {\boldsymbol \xi}_1^{(2)} + {\boldsymbol{\delta}} _1^{(2)} $$
((4.52))

Even though the mean of \({\boldsymbol{\xi}} _1^{(2)} \) can be different from that of \({\boldsymbol{\xi}} _1^{(1)} \), the measurement unit is fixed to be the units of \({\bf{x}}_1^{(1)} \).

For the model to have different factor means κ that are meaningful, the following conditions must be met:

  1. Metric invariance, i.e., the same factor loadings Λ x across groups.

  2. Scalar invariance, i.e., the same constant for the scale of each item τ x across groups.

These issues are particularly relevant in cross-cultural research where measurement instruments must be comparable across cultures/countries and especially when the factor means are of interest to the research.
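To see why both conditions are needed, consider a small numeric sketch (all values made up): the model-implied mean of an item is τ + λκ, so a shift in the intercept τ can produce exactly the same observed item mean as a shift in the factor mean κ; only with τ and λ held invariant can an observed mean difference be attributed to κ.

```python
def implied_item_mean(tau, lam, kappa):
    # E[x] = tau + lambda * kappa under the model x = tau + lambda*xi + delta
    return tau + lam * kappa

# Reference group: intercept 2.0, loading 0.8, factor mean fixed at 0
ref    = implied_item_mean(2.0, 0.8, 0.0)

# Second group, case A: same intercept, higher factor mean (kappa = 0.5)
case_a = implied_item_mean(2.0, 0.8, 0.5)

# Second group, case B: shifted intercept, identical factor mean
case_b = implied_item_mean(2.4, 0.8, 0.0)

# case_a and case_b produce the same observed item mean, yet only case A
# reflects a true difference in the latent construct -- hence the need for
# scalar invariance before comparing factor means.
```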

4.6 Application Examples Using LISREL

We now present examples of confirmatory factor analysis using LISREL8 for Windows (or AMOS). These examples include the test of a single factor analytic structure and the estimation of a factor analytic structure with two correlated factors.

4.6.1 Example of Confirmatory Factor Analysis

The following example in Fig. 4.4 shows the input file for LISREL8 for Windows:

Fig. 4.4

LISREL input example for confirmatory factor analytic model (examp4-1.spl)

An exclamation mark indicates that what follows is a comment and is not part of the LISREL8 commands. Therefore, the first real input line in Fig. 4.4 starts with DA, which stands for data. On that line, NI indicates the number of input (observed) variables (6 in this example), and MA indicates the type of matrix to be analyzed: KM for a correlation matrix or CM for a covariance matrix.

The second line of the input is used to specify how to read the data. RA indicates that the raw data will be read (from which the correlation matrix will be automatically computed) and FI=filename indicates the name of the file containing that data, where filename is the Windows file name including the full path.

The third line, with LA, indicates that next come the labels of the indicator (input) variables. These are shown as Q5, Q7, etc., on the following line.

The next line specifies the model, as indicated by the code MO at the beginning of that line. NX indicates the number of indicators corresponding to the exogenous constructs (here, there are six). NK stands for the number of ksi constructs (we have a single factor in this example). PH=ST indicates that the covariance matrix phi is standardized, i.e., the factor variances on the diagonal are fixed to 1, so that phi is a correlation matrix (with a single factor, phi reduces to the scalar 1). The covariance matrix of the measurement model error terms, theta delta, is specified as a symmetric matrix (TD=SY). A diagonal matrix (TD=DI) would have specified a simpler model in which all error covariances are zero; however, this example illustrates how some of these parameters can be estimated.

LK, on the next line, stands for the label of the ksi constructs, although there is only one of them in this example. That label “FactorOne” follows on the next line.

The following line, starting with FR, lists the parameters to be estimated, where LX stands for lambda x and TD for theta delta. Each is followed by the row and column of the corresponding matrix, as defined in the model specification in Equations (4.2) and (4.4).

The line “Path Diagram” indicates that a graphical representation of the model is requested.

The last line of the input file describes the output (OU) requested. SE means standard errors, TV their t-values and MI the modification indices.

The LISREL8 output of such a model is given in Fig. 4.5.

The output shown in Fig. 4.5 first lists the instruction commands of the corresponding input file, described earlier, and then prints the observed covariance matrix (in this case a correlation matrix) to be modeled.

The “Parameter Specifications” section lists the parameters to be estimated and their number, with the detail of all the matrices containing these parameters. A value of zero indicates that the corresponding parameter is fixed and is not to be estimated; unless specified otherwise, the default value of these fixed parameters is zero.

The number of iterations needed to obtain convergence is shown, followed by the parameter estimates. Below each parameter estimate, its standard error is shown in parentheses, with the t-value below it.

Then follow the goodness-of-fit statistics, among which those described earlier can be found. The example run in Fig. 4.5 shows that the single-factor model represents the observed correlation matrix well, since the chi-squared is not statistically significant and the GFI is high, with a value of 0.98.

The modification indices are reasonably small, which indicates that freeing additional parameters would not lead to a large gain in fit.

The diagram of such a confirmatory factor analytic model is shown in Fig. 4.6.

4.6.2 Example of Model to Test Discriminant Validity Between Two Constructs

The following example is typical of an analysis where the goal is to assess the validity of a construct. Figure 4.7 shows the input file to estimate a two-factor model (such analyses are usually performed two factors at a time, because modeling all the factors at once typically results in a problem too large to obtain a satisfactory fit). The commands are identical to those described earlier, except that now two constructs, “FactorOne” and “FactorTwo”, are specified.

Fig. 4.5

LISREL8 for Windows output example for confirmatory factor analytic model (examp4-1.out)

Fig. 4.6

Path diagram of confirmatory factor analytic model (examp4-1.pth)

Fig. 4.7

LISREL8 for Windows input for model with two factors (examp4-2.spl)

The LISREL8 output corresponding to this two-factor confirmatory factor structure is shown in Fig. 4.8. The description of this output is similar to the one described above involving a single factor. The major difference is the estimate of the correlation between the two factors, which is shown to be –0.56 in this particular example. The diagram representing that factor analytic structure is shown in the next figure (Fig. 4.9).

Fig. 4.8

LISREL8 for Windows output for model with two factors (examp4-2.out)

Fig. 4.9

LISREL8 for Windows path diagram for model with two factors (examp4-2.pth)

Figure 4.10 shows the input file for a factor analytic structure where a single factor is assumed to be reflected by all the items.

Fig. 4.10

LISREL8 for Windows input for model with single factor (examp4-3.spl)

Figure 4.11 is the output for this single-factor structure.

Fig. 4.11

LISREL8 for Windows output of model with single factor (examp4-3.out)

The resulting chi-squared (χ2 = 126.75 in Fig. 4.11) can be compared with the chi-squared resulting from the model with a correlation between the two factors (χ2 = 54.78 in Fig. 4.8). The χ2 difference (126.75 – 54.78 = 71.97) has one degree of freedom; its significance indicates that there are indeed two different constructs (factors), i.e., it demonstrates the discriminant validity of the constructs.

4.6.3 Example of Model to Assess the Convergent Validity of a Construct

Next, in order to assess convergent validity, one needs to compare the fit of a model with zero correlation between the factors with that of a model where the factors are correlated (as in Fig. 4.8). The input file for a model with independent factors (zero correlation) is shown in Fig. 4.12.

Fig. 4.12

LISREL8 for Windows input for model with two independent factors (examp4-4.spl)

The output file for such a model with independent factors (zero correlation) is shown in Fig. 4.13.

Fig. 4.13

LISREL8 for Windows output of model with two independent factors (examp4-4.out)

The independent-factor model has a chi-squared of 84.34 (Fig. 4.13), which, when compared with the chi-squared of the model estimating a correlation between the two constructs (54.78 in Fig. 4.8), gives a chi-squared difference of 29.56. This difference is significant (with one degree of freedom at the 0.05 level), indicating that the constructs are not independent, i.e., showing the convergent validity of the two constructs.
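The two nested-model comparisons described in this section can be sketched as a simple chi-squared difference test; the fit values come from the outputs cited in the text, and 3.84 is the standard chi-squared critical value for one degree of freedom at α = 0.05.

```python
def chi_square_difference(chi2_restricted, chi2_full, critical_value):
    """Nested-model chi-squared difference test; a significant difference
    favors the less restricted model."""
    diff = chi2_restricted - chi2_full
    return diff, diff > critical_value

CRIT_1DF_05 = 3.84  # chi-squared critical value, 1 df, alpha = 0.05

# Discriminant validity: single factor (126.75) vs. two correlated factors (54.78)
disc_diff, disc_sig = chi_square_difference(126.75, 54.78, CRIT_1DF_05)

# Convergent validity: independent factors (84.34) vs. correlated factors (54.78)
conv_diff, conv_sig = chi_square_difference(84.34, 54.78, CRIT_1DF_05)
```

Both differences exceed the critical value, supporting discriminant validity (two distinct factors) and convergent validity (the factors are not independent), respectively.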

Instead of fixing the variances of the unobserved constructs to unity, one could have fixed one lambda per construct to one and estimated the construct variances; the result would have been the same. This is illustrated with the input needed to run this model with AMOS (although it can be done easily with LISREL8 following the principles described above, this example uses AMOS to introduce its commands).

The input of the corresponding two-factor confirmatory factor model with AMOS is shown in Fig. 4.14.

Fig. 4.14

AMOS input example for confirmatory factor analytic model (examp4-5.ami)

In AMOS, as shown in Fig. 4.14, each equation of the measurement model can be represented with a variable on the left-hand side of an equation and a linear combination of other variables on the right-hand side. These equations correspond to the measurement model as specified by Equation (4.2). Inserting “(1)” before a variable on the right-hand side indicates that the corresponding coefficient is fixed to that value and will not be estimated. The program automatically recognizes which variables are observed and which are unobserved.

Correlations are indicated by “variable1 <> variable2”, where variable1 and variable2 are the labels of observed variables or of hypothetical constructs. The output provides information similar to that available in LISREL8.

4.6.4 Example of Second-Order Factor Model

Next, we present an example of second-order factor analysis using the same data as in the previous examples. Since the two factors are correlated, we can test a model where these two factors reflect a single higher-order construct. Figure 4.15 shows the LISREL input file.

Fig. 4.15

Input for second-order factor analysis using LISREL 8 (examp4-6.spl)

For the most part, the input file contains instructions similar to those described for the input files of regular confirmatory factor analysis. It should be noted that the sample size is included on the data line (“NO=145”). The differences are in the model statement, where NX has been replaced by NY, the number of indicator variables for the η’s. NE corresponds to the number of first-order factors (the η’s). NK is set to one in this example because only one second-order factor is assumed. GA indicates that the elements of the Γ matrix are fixed by default, although we will specify which elements to estimate in the “FREE” line below. The covariance matrix of the second-order factors is set to be diagonal (“PH=DI”), although in our example this matrix is simply a scalar. The labels for the first-order factors are the same as in the earlier example of regular confirmatory factor analysis, except that they now correspond to the η’s, which is why they are introduced by “LE” (Label Etas). The label for the second-order factor is “new,” which follows the “LK” (Label Ksis) line.

One of the factor loadings for each first-order factor is fixed to one in order to set the unit of each factor to the units of that item. Finally, the parameters to be estimated are freed: the elements of the factor loading matrix Λ and of the matrix Γ.
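As a side note, the implied loading of an item on the second-order factor is the product of its first-order loading λ and the second-order loading γ, since x = λη + ε and η = γξ + ζ imply x = λγξ + λζ + ε. A toy sketch with made-up values:

```python
def second_order_item_loading(lam_first, gamma):
    # x = lambda * eta + epsilon and eta = gamma * xi + zeta
    # imply x = (lambda * gamma) * xi + lambda * zeta + epsilon
    return lam_first * gamma

# Hypothetical estimates: an item loading 0.8 on its first-order factor,
# which in turn loads 0.5 on the second-order factor
loading = second_order_item_loading(0.8, 0.5)
```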

The output corresponding to this second-order factor analysis is shown in Fig. 4.16.

Fig. 4.16

LISREL output for second-order factor analytic model (examp4-6.out)

The graphical representation of the results is shown in Fig. 4.17.

Fig. 4.17

Second-order factor analytic model (examp4-6.pth)

The results of this second-order factor analysis indicate a poor fit of the model, with a highly significant chi-squared. Nevertheless, the parameter estimates for the second-order factor loadings on the first-order factors correspond to what would be expected from the correlation pattern between these two constructs (a positive loading on FactorOne and a negative loading on FactorTwo).

4.6.5 Example of Multi-group Factor Analysis

The example we will use to illustrate the analysis of factors across groups concerns the subjective wellbeing of men in three different countries (USA, Austria, and Australia). There are five items to measure subjective wellbeing. Figure 4.18 lists the input for doing this analysis in LISREL.

Fig. 4.18

Unconstrained CFA for subjective wellbeing of men in three countries (examp4-7.ls8)

The first line indicates the label for the first group (country, in this case).

The second line indicates that the data contain five indicators (“NI=5”), that there will be three groups (“NG=3”), the number of observations in the first group (“NO=226”), and that the covariance matrix will be analyzed (“MA=CM”).

We indicate that the data file contains raw data (rather than correlations or covariances) by specifying on the third line “RA=” followed by the full name of the file, including the directory path.

The model line (which starts with “MO”) indicates that there will be five x-indicators (“NX=5”, the observed items), one factor ξ (“NK=1”), and that Tau is to be estimated (“TX=FR”) but Kappa is fixed (“KA=FI”). Θδ is specified as symmetric because we will estimate some of the covariance terms that appeared to be non-zero.

We label the factor “SWB” for Subjective Wellbeing, following the line LK (Label Ksi). The Lambda matrix is then specified with five rows of 1’s, and the first loading is fixed to the value 1 (the line “FI LX 1 1” fixes the parameter and the line “VA 1 LX 1 1” sets it to the value 1). The diagonal elements of the measurement error covariance matrix are then freed so that these elements can be estimated (as well as one of the covariances).

Then the output line “OU MI” requests that the modification indices be included in the output.

Similar information is then entered in turn for the other two groups, except that some of the parameters do not need to be repeated.

The path diagram is requested through the instruction “PD”.

For this unconstrained analysis, the confirmatory factor analysis is conducted separately for each country. The chi-square for the three countries is the sum of the chi-squares of the three group analyses.

Because no constraints are imposed, the construct means cannot be estimated and each mean (in each country) is zero. Figure 4.19 gives the values of the estimated parameters on a graphical representation of the model.

Fig. 4.19

Unconstrained estimates (examp4-7.pth)

It is clear from Fig. 4.19 that the estimated loading parameters are country specific.

In metric invariance, the factor loadings are constrained to be the same across groups. The scalar values Tau can, however, vary across groups, which makes it impossible to assess different means for the construct across groups. Figure 4.20 lists the input to run such a partially constrained model.

Fig. 4.20

LISREL input for metric invariance model of subjective wellbeing for three countries (examp4-8.ls8)

The input in Fig. 4.20 is identical to the unconstrained estimation, except for the statement concerning the factor loadings in the second and third group. Indeed, for these two countries, the statement “LX=IN” indicates that these parameters must be constrained to be invariant, i.e., equal across groups. Figure 4.21 provides the output for this problem.

Fig. 4.21

Output for metric invariance (examp4-8.pth)

Although the error variances vary across countries, the factor loadings are identical, i.e., invariant. As indicated above, the means of the unobserved factors are still zero for each group.

In the scalar invariance model, the factor loadings are invariant across groups, as in metric invariance; in addition, the scalars Tau are also invariant. This is indicated in Fig. 4.22 with “TX=IN” for the last two groups, constraining Tau to be invariant, i.e., equal across groups.

Fig. 4.22

LISREL input for scalar invariance model (examp4-9.spl)

The means are then shown in Fig. 4.23.

Fig. 4.23

Factor means with scalar invariance model (examp4-9.pth)

It can be seen from Fig. 4.23 that the means of the SWB factor in the USA and Austria are almost the same (zero for the USA and close to zero for Austria, although slightly below, as indicated by the negative sign before the 0.00). However, the mean of SWB in Australia is –0.58, indicating an inferior perception of wellbeing in that country relative to the USA and Austria.

The full outputs are not listed here, as they provide the same information as in the case of single-group confirmatory factor analysis. The chi-squareds of these models can be compared because they are nested, constrained models. The difference in chi-squareds, with the corresponding difference in degrees of freedom across models, is also chi-squared distributed and serves to test the extent of the loss in fit due to imposing the constraints. Insignificant chi-squared differences when imposing metric invariance first and scalar invariance next lead to the conclusion that comparisons across groups are appropriate.

The outputs of the three models under different constraints are not included beyond their graphical representations. The basic statistics needed are (1) the number of data points to be reproduced, (2) the number of parameters to be estimated, and (3) the chi-square values for each of the models.

First, we calculate the number of data points available. For each country, there is a 5×5 covariance matrix, which provides 15 different data points, i.e., 45 for the three countries. In addition, there are five means for the five items for each country, i.e., 15 means. The total number of data points is, therefore, 45+15 = 60.
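This count follows from a general formula: each group's p × p covariance matrix contributes p(p + 1)/2 unique entries, plus p observed item means. A minimal sketch:

```python
def data_points(p, n_groups):
    # p(p+1)/2 unique variances/covariances per group, plus p item means
    return n_groups * (p * (p + 1) // 2 + p)

total = data_points(5, 3)  # five items, three countries
```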

Next, we calculate the number of parameters to be estimated for each model. Table 4.1 provides the details.

Table 4.1 Number of parameters and degrees of freedom of each model

In the unconstrained model, there are four lambdas to be estimated for each country (one loading must be fixed to unity to define the unit of measurement); this is indicated in the corresponding cell of the table by “4+4+4”. In both the metric and scalar invariance models, there are only four lambdas to be estimated, since these lambdas are constrained to be equal across groups. There are five error variances for each country, plus one estimated error covariance in the USA and in Austria, which explains the “6+6+5”; no covariance is estimated for the third country, Australia. (Although Figs. 4.19 and 4.21 show two estimated covariances Θδ per country, one covariance in the USA and in Austria is close to zero, so only one covariance for each of these two countries, and none for Australia, is estimated in the models for which the chi-squareds are shown in Table 4.1.)

When subtracting the number of parameters from the number of data points (i.e., 60), one obtains the degrees of freedom for each model.

Given the nested structure of these three models, it is possible to compare the extent to which imposing additional constraints makes the fit worse. When comparing the unrestricted model to the metric invariance model (same loadings across groups), the chi-squared goes from 14.79 to 25.26, a difference of 10.47, which is chi-squared distributed with 8 degrees of freedom (21 – 13). The critical chi-squared with 8 degrees of freedom at α = 0.05 is 15.51; consequently, the difference is not significant. This supports the restriction of metric invariance.

Similarly, we can further evaluate the impact of the restriction of scalar invariance by comparing the chi-squared of the metric invariance model with that of the scalar invariance model. The chi-squared increases from 25.26 to 40.00 when imposing the constraint that the tau’s are the same, even though we can now estimate the means of the unobserved construct relative to one of the countries (the USA), which serves as reference. The difference (40.00 – 25.26) = 14.74 is still not significant with 8 degrees of freedom (29 – 21) at α = 0.05. We therefore conclude that scalar invariance holds, which allows us to interpret the means estimated under the scalar invariance model. These means are shown in Fig. 4.23, as indicated above.
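The degrees-of-freedom accounting and the two invariance tests can be sketched as follows; the parameter counts (one factor variance per country, and two free factor means under scalar invariance) are reconstructed from the surrounding description rather than taken verbatim from Table 4.1.

```python
# Free-parameter counts per model, reconstructed from the text:
# loadings + error (co)variances + factor variances + taus + kappas
params = {
    "unconstrained": 12 + 17 + 3 + 15 + 0,
    "metric":         4 + 17 + 3 + 15 + 0,
    "scalar":         4 + 17 + 3 +  5 + 2,
}
DATA_POINTS = 60  # 15 covariance terms plus 5 means, for each of 3 countries
df = {name: DATA_POINTS - k for name, k in params.items()}

chi2 = {"unconstrained": 14.79, "metric": 25.26, "scalar": 40.00}
CRIT_8DF_05 = 15.51  # chi-squared critical value, 8 df, alpha = 0.05

metric_test = chi2["metric"] - chi2["unconstrained"]  # tested on 8 df
scalar_test = chi2["scalar"] - chi2["metric"]         # tested on 8 df
# Neither difference exceeds 15.51, so both invariance restrictions hold.
```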

4.7 Assignment

Using the SURVEY data, estimate the parameters of a measurement model corresponding to a confirmatory factor analysis of two or three constructs. Include an analysis of convergent and discriminant validity.

Considering a categorical variable that distinguishes between respondents, define several groups of respondents (e.g., respondents of different ages). Then, perform a multi-group analysis to test the invariance of the measurement model of your choice.

The SAS file listed in Fig. 4.24 shows an example of how to create a new data file to use with LISREL, containing a subset of the data with only the items relevant for your analysis.

Fig. 4.24

SAS code example to create a new data file containing a subset of the full survey data to use with LISREL

4.8 Bibliography

4.8.1 Basic Technical Readings

  • Bagozzi, Richard P. and Youjae Yi (1988), “On the Evaluation of Structural Equation Models,” Journal of the Academy of Marketing Science, 16, (Spring), 74–94.

  • Bagozzi, Richard P., Youjae Yi and Lynn W. Phillips (1991), “Assessing Construct Validity in Organizational Research,” Administrative Science Quarterly, 36, 421–458.

  • Bollen, Kenneth and Richard Lennox (1991), “Conventional Wisdom on Measurement: A Structural Equation Perspective,” Psychological Bulletin, 110, 2, 305–314.

  • Cortina, Jose M. (1993), “What is Coefficient Alpha? An Examination of Theory and Applications,” Journal of Applied Psychology, 78, 1, 98–104.

  • Diamantopoulos, Adamantios and Heidi M. Winklhofer (2001), “Index Construction with Formative Indicators: An Alternative to Scale Development,” Journal of Marketing Research, 38, 2 (May), 269–277.

  • Gerbing, David W. and James C. Anderson (1988), “An Updated Paradigm for Scale Development Incorporating Unidimensionality and Its Assessment,” Journal of Marketing Research, 25, 2, 186–192.

  • Green, Paul E. (1978), Mathematical Tools for Applied Multivariate Analysis, New York: Academic Press, [Chapter 5 and Chapter 6, Section 6.4].

  • Lord, Frederick M. and Melvin R. Novick (1968), Statistical Theories of Mental Test Scores, Reading, MA: Addison-Wesley Publishing Company, Inc., [Chapter 4].

  • Nunnally, Jum C. and Ira H. Bernstein (1994), Psychometric Theory, Third Edition, New York: McGraw-Hill.

4.8.2 Application Readings

  • Aaker, Jennifer L. (1997), “Dimensions of Brand Personality”, Journal of Marketing Research, 34, 3 (August), 347–356.

  • Anderson, Erin (1985), “The Salesperson as Outside Agent or Employee: A Transaction Cost Analysis,” Marketing Science, 4 (Summer), 234–254.

  • Anderson, Ronald D. and Jack Engledow (1977), “A Factor Analytic Comparison of U.S. and German Information Seekers,” Journal of Consumer Research, 3, 4, 185–196.

  • Baumgartner, Hans and Christian Homburg (1996), “Applications of Structural Equation Modeling in Marketing and Consumer Research: A Review,” International Journal of Research in Marketing, 13, (April), 139–161.

  • Blackman, A. W. (1973), “An Innovation Index Based on Factor Analysis,” Technological Forecasting and Social Change, 4, 301–316.

  • Churchill, Gilbert A., Jr. (1979), “A Paradigm for Developing Better Measures of Marketing Constructs”, Journal of Marketing Research, 16 (February), 64–73.

  • Deshpande, Rohit (1982), “The Organizational Context of Market Research Use”, Journal of Marketing, 46, 4 (Fall), 91–101.

  • Finn, Adam and Ujwal Kayandé (1997), “Reliability Assessment and Optimization of Marketing Measurement”, Journal of Marketing Research, 34, 2 (May), 262–275.

  • Gilbert, Faye W. and William E. Warren (1995), “Psychographic Constructs and Demographic Segments,” Psychology & Marketing, 12, 3 (May), 223–237.

  • Green, Stephen G., Mark B. Gavin and Lynda Aiman-Smith (1995), “Assessing a Multidimensional Measure of Radical Technological Innovation”, IEEE Transactions on Engineering Management, 42, 3, 203–214.

  • Kuester, Sabine, Christian Homburg, and Thomas S. Robertson (1999), “Retaliatory Behavior to New Product Entry,” Journal of Marketing, 63, 4 (October), 90–106.

  • Murtha, Thomas P., Stefanie Ann Lenway and Richard P. Bagozzi (1998), “Global Mind-Sets and Cognitive Shift in a Complex Multinational Corporation,” Strategic Management Journal, 19, 97–114.

  • Paulssen, Marcel and Richard P. Bagozzi (2006), “Goal Hierarchies as Antecedents of Market Structure,” Psychology & Marketing, 23, 8, 689–709.

  • Perreault, William D., Jr. and Laurence E. Leigh (1989), “Reliability of Nominal Data Based on Qualitative Judgments”, Journal of Marketing Research, 26 (May), 135–148.

  • Zaichkowsky, Judith Lynne (1985), “Measuring the Involvement Construct,” Journal of Consumer Research, 12 (December), 341–352.