Spearman (1904) marks the birth of factor analysis. He hypothesized a single, general intelligence (“G”) factor that influenced performance on many different tests, along with a “specific factor” unique to each test, and he proposed factor analysis to examine these ideas. His “two-factor” (common and specific factors) perspective characterized numerous early applications of factor analysis. But even with an emphasis on a common factor that permeated all observed variables, there was recognition that “group factors,” secondary to the common factor, were present in subsets of similar tests (Burt, 1917; Carey, 1916). These group factors stimulated interest in multiple factors rather than factor analysis with only a single common factor. Once the door to multiple factor analysis was opened, the question became how many factors there were. This in turn led toward exploratory factor analysis (EFA), where the number of factors was not specified in advance but was “discovered” from the data in combination with factor analysis algorithms.

Though the procedures for EFA are diverse, several characteristics are common. First, they assume no correlated errors (or unique factors) among the observed variables (or indicators).Footnote 1 That is, once the factors are controlled, there are no other sources of covariance among the observed variables that measure the factors. Second, when two or more factors are extracted, EFA allows all indicators to load on all factors, apart from the scaling (or reference) indicators when these set the metric of the factors. Third, EFA requires rules to determine when enough factors have been extracted to stop the process (e.g., the number of factors equals the number of eigenvalues greater than 1). EFA seeks to use far fewer factors than the number of observed variables. After all, one of EFA’s purposes is data reduction, so the goal is to summarize numerous observed variables with relatively few factors. Fourth, after the extraction of initial factors, most applications of EFA require factor rotation to maximize the high loadings and to minimize the low loadings. Factor rotation algorithms seek solutions that enhance the substantive interpretability of the rotated solution so that researchers can interpret the meaning of the factors. Fifth, as part of the rotation, EFA requires analysts to choose between allowing all factors to correlate or constraining them to be uncorrelated, where the former is an oblique solution and the latter an orthogonal solution.

A variety of techniques for factor analysis emerged in empirical analyses. These ranged from Spearman’s (1904) early use of correlations, partial correlations, and tetrads to Thurstone’s (1947) centroid method and Lawley and Maxwell’s (1963) maximum likelihood estimator. Excellent texts on EFA (e.g., Gorsuch, 1983; Harman, 1976; Mulaik, 2009) are available, and our purpose is not to provide a tutorial on such techniques. But we note that there are common problems shared by most EFA procedures. These include: (1) ambiguity about the number of factors (e.g., Preacher et al. 2013), (2) the factor rotation problem or underidentification (e.g., Darton 1980), (3) how to find simple structure solutions (e.g., Thurstone 1947, Ch. 14), (4) ad hoc rules of when factor loadings should be set to zero (e.g., Nunnally 1978, p. 421), (5) assuming that all errors (“unique factors”) are uncorrelated even if there are good reasons to allow correlated errors (Bollen 1989, p. 232), (6) nonconvergence of iterative procedures, (7) computational burdens of iterative procedures when researchers apply factor analysis to “Big Data” (e.g., Urban and Bauer 2021), and (8) limited diagnostics for each individual indicator (measure or observed variable).Footnote 2

Our purpose is to propose a new approach to exploratory factor analysis that attends to these issues by using a model implied instrumental variable (MIIV) framework. The MIIV method for confirmatory factor analysis (CFA) and latent variable structural equation models (SEMs) with prespecified models is covered in other articles (e.g., Bollen 1996; Bollen et al. 2021). However, while instrumental variable approaches exist, a MIIV approach to EFA does not. We created this method not because a MIIV approach completely solves all the problems of EFA described in the last paragraph, but because it can help with them and provides a fresh angle on these problems. For instance, the MIIV estimator is noniterative, so there is no issue of an EFA that does not converge. The MIIV approach also offers diagnostic statistics that apply to individual observed variables (indicators) and their fit in the model. One valuable aspect of these diagnostics is that they take a predictable form when the model has too few factors and hence can aid in determining the number of factors. Further, there is no rotation problem because the algorithm only applies to identified models, and the rotation problem is a consequence of underidentification. Yet another advantage of the MIIV approach is that it has greater robustness to model misspecification than has been established for other factor analysis methods. For instance, correctly specified factor loadings of an indicator on a factor in a CFA are relatively uninfluenced by mistakenly including the indicators of a different factor on that factor, as would occur when too few factors are extracted (e.g., Bollen, 2020). A further advantage is that we can start with some prior hypotheses about the data even if they do not lead to a complete model specification. Correlated errors (“unique factors”) are an example: we might suspect that the errors of the same indicator correlate over time. With our MIIV approach, we can include these in our starting model even if we do not know the number of factors. Our paper is meant as a demonstration of the potential of this novel approach to EFA.

The next section of the paper reviews the literature on instrumental variables in factor analysis and how our MIIV approach differs from past efforts. The factor analysis model and assumptions are covered in the subsequent section. This is followed by a section that describes the MIIV approach and algorithm for EFA. We then present test examples to see how our algorithm performs in several simulation and empirical illustrations. Our conclusions follow, in which we describe what we see as remaining problems with our MIIV approach in the hope of stimulating fresh research. Appendix 1 describes alternative MIIV EFA algorithms that we tried prior to the algorithm described in the main text. Appendix 2 examines the sensitivity of MIIV-EFA to cross-loadings.

1 Instrumental Variables in Factor Analysis

Before discussing how instrumental variables are utilized for EFA, we first provide some foundational information. Since estimation with instrumental variables has been covered extensively elsewhere (e.g., Angrist & Krueger, 2001; Bollen, 2012; Bowden & Turkington, 1990; Didelez et al., 2010; Sovey & Green, 2011), we provide only a broad overview here. Instrumental variables are helpful in situations where, at the population or data-generating level, the error for an equation is correlated with one or more covariates in the same equation. When the error correlates with a covariate, ordinary least squares (OLS) regression yields biased and inconsistent estimates of the regression coefficients. Estimators that use instrumental variables can provide consistent estimates of the coefficients. But to do so, we need to find such instruments.

Instrumental variables are variables that correlate with the endogenous variables on the right-hand side of the equation but do not correlate with the composite error. Model implied instrumental variables (MIIVs) are observed variables (indicators) that are already part of the model and that meet the conditions to be instruments, that is, they correlate with the right-hand side endogenous variables and are uncorrelated with the composite errors according to the specified model. With these variables identified, analysts can use instrumental variable estimators [e.g., two-stage least squares (2SLS)] to estimate the coefficients in a factor model as well as other latent variable models (Bollen, 1996). Estimation with instrumental variables has benefits, such as being “distribution-free” and having greater robustness to structural misspecification than full information estimators such as maximum likelihood (Bollen et al., 2007, 2018; Bollen, 2020).
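To make the 2SLS logic concrete, below is a minimal sketch in R of instrumental variable estimation for a single equation; the variable names and parameter values are hypothetical and are not part of any model discussed in this paper.

```r
# Minimal two-stage least squares (2SLS) sketch: y depends on an endogenous
# regressor x whose variation shares a component (u) with the error of y.
set.seed(1)
n  <- 5000
z1 <- rnorm(n); z2 <- rnorm(n)            # instruments: related to x, not to y's error
u  <- rnorm(n)                            # shared disturbance creating endogeneity
x  <- 0.8 * z1 + 0.5 * z2 + u + rnorm(n)
y  <- 1 + 0.6 * x - u + rnorm(n)          # error of y (-u + noise) correlates with x

coef(lm(y ~ x))                           # OLS: inconsistent estimate of the 0.6 slope
x_hat <- fitted(lm(x ~ z1 + z2))          # stage 1: project x onto the instruments
coef(lm(y ~ x_hat))                       # stage 2: approximately recovers 0.6
# Note: standard errors from the naive stage-2 regression need correction;
# dedicated IV routines handle this automatically.
```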

The overwhelming majority of instrumental variable applications are in regression or simultaneous equation models without latent variables. However, instruments apply to factor analysis as well. We are not the first to propose an instrumental variable methodology for EFA. The idea appears to have originated in Psychometrika with Madansky (1964), though even he cites antecedents to his work. Nearly two decades later, Hägglund (1982) built on Madansky’s (1964) paper and discussed three factor analysis by instrumental variables (FABIN) estimators for EFA that he called FABIN 1, FABIN 2, and FABIN 3, also published in Psychometrika. In yet another Psychometrika paper, Jennrich (1987) proposed improved algorithms to calculate the FABIN 2 and FABIN 3 estimators and added a generalized least squares (GLS) estimator of the factor covariances and unique factor variances that is conditional on the factor loading estimates. Cudeck (1991) described the importance of choosing the best scaling (or reference) indicator for each factor and discussed a method to do so by Du (1986).

Though all these studies used instrumental variables, they differ to varying degrees. To help explain how these instrumental variable methods and models differ, we begin with a basic equation for a general EFA model,

$$\begin{aligned} {\textbf {Z}} = \varvec{\alpha } + \Lambda {\textbf { L}} + \varvec{\varepsilon } \end{aligned}$$
(1)

where Z is the vector of the \(N_{Z}\) indicators of the factors, with \(N_{Z}\) the number of indicators in the factor analysis. We use the terms indicators, measures, and observed variables interchangeably to refer to Z. The next symbol, \(\varvec{\alpha } \), is a vector of measurement intercepts or constants that are analogous to the constants in regression equations. The \(\Lambda \) is the matrix of factor loadings, and L is the vector of factors or latent variables. The \(\varvec{\varepsilon } \) is the vector of unique factors, sometimes called the equation “errors.” All authors assume that \(\textrm{E}(\varvec{\varepsilon }) = \textbf{0}\) and \(\textrm{COV}({\textbf {L}}, \varvec{\varepsilon }) = \textbf{0}\), where E(.) is the expected value and COV(.,.) is the covariance of the variables within the parentheses. As we explain below, the model in Eq. (1) as specified is more general than earlier instrumental variable approaches to EFA in that past instrumental variable methods impose more restrictive assumptions than we do.

Table 1 Comparison of instrumental variable approaches to exploratory factor analysis.

Table 1 summarizes the properties of the model and estimator for each of the instrumental variable approaches to EFA. The studies differ in their assumptions about the covariances of the unique factors, the inclusion of intercepts, and whether variables are deviated from their means. The last column includes the model implied instrumental variable (MIIV) method that we propose, as a way of contrasting our approach with earlier work. The first column of Table 1 gives the properties of the EFA method as defined by each publication, and the citations to the publications label the remaining columns, with our new approach in the last column. The first property asks whether the estimator is noniterative (yes or no).Footnote 3 All these instrumental variable estimators are noniterative, and this is seen as a major advantage in terms of computational speed. As researchers begin to analyze large data sets with factor analysis, this computational speed takes on greater importance. The noniterative feature also avoids the nonconvergence that sometimes occurs for iterative estimators like maximum likelihood.

A second property is whether the model for EFA permits estimating the intercepts (\(\varvec{\alpha }\)) of the measurement equations. Intercepts are useful in examining whether there are constant differences between indicators (measures) with similar ranges of values for a given value of the latent variable. Large differences in intercepts across indicators with similar ranges can point to problematic indicators. All the earlier studies omitted intercepts from their models, though we include them in ours. Relatedly, all studies besides ours assume that the means of the indicators are zero [\(\textrm{E}\)(Z) \(=\) 0], as is true for variables deviated from their means, and this is shown in Table 1. Another common property of these methods is that they assume that the factors are correlated (“oblique”). Cudeck (1991) assumes uncorrelated common factors only for convenience in parts of the presentation, but for all others \(\Sigma _{\textrm{LL}}\) permits nonzero off-diagonal elements.

For the covariance matrix of the unique factors, all papers except Toyoda (1997) and our MIIV approach assume that the unique factors are uncorrelated (\(\Sigma _{\mathrm {\varepsilon \varepsilon }}\) diagonal). A complication of not allowing correlated errors is that there are frequent situations where researchers know that such correlations are likely. For instance, longitudinal studies frequently measure the same factor with the same indicators at each wave of data. The errors of a measure at one time point are often associated with the errors of the same measure at the next time point; hence, correlated errors are hypothesized. Or sometimes the questions leading to the measures have similar wordings, and this leads to correlated errors even after taking account of the common factors underlying the indicators. Taking account of such correlated errors as part of the EFA procedure can lead to better fitting models that are easier to interpret. It is for this reason that we propose an EFA method that allows them.

Once researchers estimate the factor loadings (\(\hat{\varvec{\Lambda } })\), it is valuable to have an estimate of their standard errors for significance testing and forming confidence intervals. Madansky (1964) did not include standard errors in his study, but Hägglund (1982) and Toyoda (1997) did. Although Jennrich (1987) and Cudeck (1991) focused on the FABIN estimators from Hägglund (1982), they did not include their standard errors as part of the discussion. Our MIIV approach includes standard errors of \(\varvec{\hat{\alpha }}\) and \(\varvec{\hat{\Lambda }}\). These are valuable in assessing whether measurement intercepts and factor loadings significantly differ from zero or other hypothesized values.

Tests are available that can help identify misspecifications in the model. These tests rely on having more instrumental variables than are needed to estimate the coefficients. As background, the coefficients in the equations for the indicators in EFA are estimated using instrumental variables. Sometimes there are more than the minimum number of instrumental variables needed to estimate them; that is, some indicator equations are overidentified. For instance, consider an indicator that depends on one latent variable. This implies that the scaling indicator of the latent variable is the only variable other than the composite error on the right-hand side of the equation. Suppose that there are three instrumental variables for that equation. The equation is then overidentified because there are two more instrumental variables than needed to estimate its coefficients. In these cases, a chi-square test for the equation is available that can help detect possible misspecifications in a model (Bollen 1996, pp. 117–118). This test is also referred to as an overidentification test.

Kirby and Bollen (2009) compared a variety of overidentification tests. Though they proposed these overidentification tests in the context of CFA and latent variable SEM, these same overidentification diagnostics apply to EFA. In fact, these are an important part of our approach to building new models through EFA. None of the other instrumental variable EFA approaches discussed overidentification tests, whereas the use of such tests is central to our MIIV-EFA approach.

Among other things, these overidentification tests are essential to our procedures for detecting the number of factors underlying a set of variables. Indeed, one reason for indicators failing the overidentification tests is that an insufficient number of factors are extracted from the data and these indicators load on additional factors. In the next section, we will discuss how we use these tests to add factors in the algorithm that we propose. None of the other EFA instrumental variable papers proposed a procedure with which to determine the number of factors.

Finally, there are situations where equality constraints on parameters are deemed necessary. Toyoda (1997) proposed how instrumental variable methods could include such constraints. We use the MIIV-2SLS from Bollen (1996) and Bollen (2019) for our EFA estimator. Nestler (2014) provides a way to modify this estimator to include equality constraints. The remaining studies do not discuss equality constraints.

In sum, we propose an instrumental variable estimator for EFA that allows intercepts for the measurement equations, correlated common factors, correlated errors, standard errors of factor loadings and measurement intercepts, overidentification tests of equations, and a procedure for determining the number of factors. We also permit simpler structures by removing nonsignificant cross-loadings in a model. Some earlier methods allow some of these features, but none besides ours allows all of them, and some features are unique to our approach.

The next section gives more details on the MIIV approach to EFA.

2 MIIV-EFA Approach

In broad terms, the MIIV-EFA approach has a number of steps: (1) choose a starting model, (2) transform latent variables to observed variables (L2O transformation), (3) select scaling indicator(s), (4) determine whether to add another factor (latent variable), (5) check for potential cross-loading variables, (6) estimate and check the new factor model, (7) repeat (3) to (6) until at most one indicator equation has a significant overidentification test or no overidentified equations remain, and (8) assess model modifications against substantive knowledge. Except for steps (1) and (8), all these steps are automated so that the analyst does not need to implement them manually. We describe the steps here to better explain how our algorithm proceeds. Below we give more details on each step.

(1) Choose a starting model

The first step is to determine where to have the algorithm start. For a complete EFA with no information provided by the user, the algorithm starts with a default model of one factor with uncorrelated errors.Footnote 4 Analysts have the option of prespecifying specific correlated errors. We allow the option of correlated errors at the outset because analysts sometimes have hypotheses or prior knowledge to incorporate. For instance, suppose that the wording of two indicators of the same factor is similar enough that the analyst expects the errors of these two indicators to be correlated even after controlling for their common dependence on a factor. In this situation, the analyst would begin by including the correlated errors, and the algorithm would commence from there.

This example illustrates a situation where the hypotheses about the model are partially formulated, but the rest of the analysis is exploratory. We illustrate this condition later with an empirical example. In many other analyses, the default model of one factor without correlated errors will be the starting point.

(2) Transform latent variables to observed variables (L2O transformation)

Once the initial factor model is in place, the MIIV-EFA procedure replaces latent variables (factors) with observed variables. Equation (1), \({\textbf {Z}} = \varvec{\alpha } + \varvec{\Lambda }{\textbf {L}} + \varvec{\varepsilon }\), accommodates all factor analysis models. Because the right-hand side includes the latent factors L, we cannot directly regress the indicators on the factors. The procedure we use eliminates the latent factor L, so that the resulting equation contains no latent variables. To do so, we use the procedure from Bollen’s (1996) original paper on the MIIV approach to latent variable SEM. This is the latent to observed (L2O) variables transformation.Footnote 5

The MIIV approach has a scaling (reference or anchor) indicator for each factor such that its intercept is zero, its factor loading is one, and it only loads on the factor it scales. In this step, we choose scaling indicator(s) based on our initial impressions of which would be best. In the next step, we have a search process for finding the best scaling indicator for each factor. Partitioning the full set of indicators into the scaling and nonscaling indicators, we rewrite Eq. (1) as

$$\begin{aligned} \left[ {\begin{array}{*{20}c} {\textbf {Z}}_{\textrm{s}}\\ {\textbf {Z}}_{\textrm{ns}}\\ \end{array} } \right] =\left[ {\begin{array}{*{20}c} \textbf{0}\\ \varvec{\alpha }_{\textrm{ns}}\\ \end{array} } \right] + \left[ {\begin{array}{*{20}c} {\textbf {I}}_{\textrm{s}}\\ \varvec{\Lambda }_{\textrm{ns}}\\ \end{array} } \right] {\textbf {L}} + \left[ {\begin{array}{*{20}c} \varvec{\varepsilon }_{\textrm{s}}\\ \varvec{\varepsilon }_{\textrm{ns}}\\ \end{array} } \right] \end{aligned}$$
(2)

where Z\(_{\textrm{s}}\) and Z\(_{\textrm{ns}}\) are the vectors of scaling (s) and nonscaling (ns) indicators, the scaling indicators’ intercepts are set to zero, and the nonscaling indicators’ intercepts are in \(\varvec{\alpha }_{\textrm{ns}}\); I\(_{\textrm{s}}\) is an identity matrix that represents the factor loadings of one for the scaling indicators, \(\varvec{\Lambda } _{\textrm{ns}}\) is the matrix of factor loadings for the nonscaling indicators, and \(\varvec{\varepsilon }_{\textrm{s}}\) and \(\varvec{\varepsilon }_{\textrm{ns}}\) are the unique factors of the scaling and nonscaling indicators, respectively.

We write the equation for the scaling indicators as Z\(_{\textrm{s}} =\) L \(+\) \(\varvec{\varepsilon } _{\textrm{s}}\) and we solve this for the latent variables to find that L \(=\) Z\(_{\textrm{s}}\) - \(\varvec{\varepsilon } _{\textrm{s}}\). Substituting in for L, we write the equation for the nonscaling indicators as

$$\begin{aligned} {\textbf {Z}}_{\textrm{ns}}&= \varvec{\alpha }_{\textrm{ns}} + \varvec{\Lambda }_{\textrm{ns}}{\textbf {L}} + \varvec{\varepsilon }_{\textrm{ns}} \nonumber \\&= \varvec{\alpha }_{\textrm{ns}} + \varvec{\Lambda }_{\textrm{ns}} ({\textbf {Z}}_{\textrm{s}} - \varvec{\varepsilon }_{\textrm{s}}) + \varvec{\varepsilon }_{\textrm{ns}}\nonumber \\&= \varvec{\alpha }_{\textrm{ns}} + \varvec{\Lambda }_{\textrm{ns}}{\textbf {Z}}_{\textrm{s}} - \varvec{\Lambda }_{\textrm{ns}} \varvec{\varepsilon }_{\textrm{s}} + \varvec{\varepsilon }_{\textrm{ns}}. \end{aligned}$$
(3)

The last line of Eq. (3) is the L2O variable transformation from the MIIV approach where the latent variables from the original model are substituted out while maintaining identical intercepts and factor loadings to those in the original model in (1) (Bollen, 1996, 2019). The last line of Eq. (3) eliminates the latent factors and has only observed variables on the left-hand side and observed variables and errors on the right-hand side. This completes the second step.
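As a concrete illustration of the L2O transformation, the R sketch below simulates a one-factor model with three indicators, uses \(Z_{1}\) as the scaling indicator, and estimates the \(Z_{2}\) equation by 2SLS with \(Z_{3}\) as the model implied instrument. The parameter values are hypothetical and the code is only a sketch of the logic, not part of our software.

```r
# L2O sketch: one factor, three indicators, Z1 is the scaling indicator.
# Transformed equation: Z2 = alpha2 + lambda21*Z1 + (eps2 - lambda21*eps1),
# so Z1 correlates with the composite error and Z3 serves as the MIIV.
set.seed(2)
n  <- 10000
L  <- rnorm(n)
Z1 <- L + rnorm(n)                 # scaling indicator: intercept 0, loading 1
Z2 <- 0.5 + 0.7 * L + rnorm(n)
Z3 <- 0.2 + 0.8 * L + rnorm(n)

coef(lm(Z2 ~ Z1))                  # OLS is inconsistent for alpha2 and lambda21
Z1_hat <- fitted(lm(Z1 ~ Z3))      # stage 1: Z3 is the model-implied instrument
coef(lm(Z2 ~ Z1_hat))              # stage 2: approximately recovers 0.5 and 0.7
```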

(3) Select scaling indicator

In the default situation where neither the number of factors nor any correlated errors are hypothesized, researchers begin with a one-factor model. More specifically, they have all indicators load on a single factor. A key part of the algorithm is to identify the best scaling indicator for this factor. For this, we estimate a series of one-factor models in which each indicator in turn is the scaling indicator. Because there are \(N_{Z}\) indicators, this results in \(N_{Z}\) one-factor models.

The \(Z_{\textrm{1}}\) indicator is first, and we begin by estimating the one factor model using it as the scaling indicator. In this single-factor model, Eq. (3) becomes

$$\begin{aligned} {\textbf {Z}}_{\textrm{ns}} = \varvec{\alpha }_{\textrm{ns}} + \varvec{\Lambda }_{\textrm{ns}} Z_{\textrm{1}} +\varvec{\Lambda } _{\textrm{ns}} \varvec{\varepsilon }_{\textrm{1}} + \varvec{\varepsilon } _{\textrm{ns }} \end{aligned}$$
(4)

where Z\(_{\textrm{ns}} = \left[ Z_{2}\; Z_{3}\cdots Z_{N_{Z}} \right]'\), \(\varvec{\alpha }_{\textrm{ns}} = \left[ \alpha _{2}\; \alpha _{3}\cdots \alpha _{N_{Z}} \right]'\), \(\varvec{\Lambda }_{\textrm{ns}}= \left[ \mathrm {\Lambda }_{21}\; \mathrm {\Lambda }_{31}\cdots \mathrm {\Lambda }_{N_{Z}1} \right]'\), and \(\varvec{\varepsilon }_{\textrm{ns}}= \left[ \varepsilon _{2}\; \varepsilon _{3}\cdots \varepsilon _{N_{Z}} \right]'\); these column vectors have dimensions \((N_{Z}-1)\times 1\). Next, we form a second model where Z\(_{\textrm{2}}\) is the scaling indicator so that

$$\begin{aligned} {\textbf {Z}}_{\textrm{ns}} =\varvec{\alpha }_{\textrm{ns}} + \varvec{\Lambda }_{\textrm{ns}} Z_{\textrm{2}} + \varvec{\Lambda } _{\textrm{ns}} \varvec{\varepsilon }_{\textrm{2}} + \varvec{\varepsilon } _{\textrm{ns }} \end{aligned}$$
(5)

where Z\(_{\textrm{ns}} = \left[ Z_{1}\; Z_{3}\cdots Z_{N_{Z}} \right]'\), \(\varvec{\alpha }_{\textrm{ns}} = \left[ \alpha _{1}\; \alpha _{3}\cdots \alpha _{N_{Z}} \right]'\), \(\varvec{\Lambda }_{\textrm{ns}}= \left[ \mathrm {\Lambda }_{11}\; \mathrm {\Lambda }_{31}\cdots \mathrm {\Lambda }_{N_{Z}1} \right]'\), and \(\varvec{\varepsilon }_{\textrm{ns}}= \left[ \varepsilon _{1}\; \varepsilon _{3}\cdots \varepsilon _{N_{Z}} \right]'\). We continue rotating through all \(N_{Z}\) indicators in the model in this way.

The MIIVefa R package that we introduce gives the user multiple methods for selecting the scaling indicator. The default method chooses as the scaling indicator the variable that produces the fewest statistically significant Sargan (1958) overidentification tests across the other indicators combined with the fewest nonsignificant factor loadings. In the factor analysis setting, significant Sargan tests indicate that variables may belong to or load on a different factor or that the model has omitted correlations between their errors and the errors of other indicators (more on this below). A nonsignificant factor loading suggests that an indicator does not have strong relations with the other indicators of the same factor, making it a poor scaling indicator. We refer to indicators with either significant Sargan test statistics or nonsignificant factor loadings as problematic variables. If two or more potential scaling indicators are tied on the smallest number of problematic variables, we choose the one with the highest \(R^{2}\) when regressed on all other indicators.

The highest \(R^{2}\) is straightforward in meaning and calculation. The overidentification test for each equation requires more discussion. Kirby and Bollen (2009) compared a variety of equation overidentification tests and found that the Sargan (1958) test had the best performance across different sample sizes, so we use it. The Sargan test statistic (\(T_{Sargan}\)) asymptotically follows a chi-square distribution with degrees of freedom (df) equal to (\(N_{Z} - 3\)) in the one-factor model with uncorrelated errors. For \(N_{Z} > 3\), the equation is overidentified and testable.Footnote 6 The null hypothesis of the Sargan test is that all MIIVs satisfy the overidentification constraints. If the model is correctly specified, then this is true. Provided that at least one of the MIIVs is uncorrelated with the composite error of the equation, another interpretation of \(T_{Sargan}\) is that it tests whether all MIIVs are uncorrelated with the equation’s composite error versus at least one MIIV being correlated with it. A rejection implies that at least one MIIV violates the condition of being uncorrelated with the equation error and suggests model misspecification.

Matrix expressions for \(T_{Sargan}\) are available in several sources (e.g., Kirby, 2009; Bollen, 2021). We explain a simple and popular way to calculate \(T_{Sargan}\) using the first nonscaling indicator, Z\(_{\textrm{2}}\). Below is the Z\(_{\textrm{2}}\) equation with the estimates of its intercept, factor loading, and error in the one-factor model,

$$\begin{aligned} Z_{2}=\hat{\alpha }_{2}+\hat{\Lambda }_{21}Z_{1}+\hat{\varepsilon }_{2}. \end{aligned}$$
(6)

The estimated residual is \(\hat{\varepsilon }_{2}=Z_{2}-\hat{\alpha }_{2}-\hat{\Lambda }_{21}Z_{1}\). Using OLS, regress \(\hat{\varepsilon }_{2}\) on the MIIVs for this equation. Take the \(R_{\hat{\varepsilon }_{2}}^{2}\) from this regression and multiply it by the sample size; this gives an estimate of \(T_{Sargan}\) \((= {NR}_{\hat{\varepsilon }_{2}}^{2})\) that asymptotically follows a chi-square distribution with df \(=\) (\(N_{Z} - 3\)) in the one-factor model. The Sargan test statistic is calculated in the same way for each overidentified equation.
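A minimal sketch of the \(NR^{2}\) computation just described is given below. It assumes the 2SLS residuals for one indicator equation and a matrix of that equation’s MIIVs are already available; the object and function names are ours, not part of any package.

```r
# Sargan statistic T = N * R^2 from regressing the 2SLS residuals on the MIIVs.
# resid_2sls: vector of 2SLS residuals for one indicator equation.
# miiv_mat:   N x k matrix holding that equation's MIIVs.
# n_endog:    number of right-hand-side endogenous variables (1 in the one-factor case).
sargan_test <- function(resid_2sls, miiv_mat, n_endog = 1) {
  r2  <- summary(lm(resid_2sls ~ miiv_mat))$r.squared
  t_s <- length(resid_2sls) * r2        # asymptotically chi-square under the null
  df  <- ncol(miiv_mat) - n_endog       # excess instruments give the degrees of freedom
  c(statistic = t_s, df = df, p.value = pchisq(t_s, df, lower.tail = FALSE))
}
```

In the one-factor model, the MIIVs for the Z\(_{\textrm{2}}\) equation are the remaining \(N_{Z}-2\) indicators, so the df above reduces to \(N_{Z}-3\), matching the text.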

In brief, choose as the scaling indicator the variable that results in the fewest problematic variables. If some indicators are tied, then choose the one with the highest \(R^{2}\) as the scaling indicator.
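This selection rule can be sketched in R as follows, assuming the per-candidate counts of problematic variables have already been computed (e.g., from Sargan tests and loading tests such as those above). The function and object names are illustrative and are not the MIIVefa implementation.

```r
# Choose the scaling indicator: fewest problematic variables, with ties broken by
# the highest R^2 of the candidate regressed on all other indicators.
# problem_counts: named integer vector, one count per candidate scaling indicator.
# Z:              data matrix whose columns are the indicators (names match).
select_scaling <- function(problem_counts, Z) {
  best <- names(problem_counts)[problem_counts == min(problem_counts)]
  if (length(best) == 1) return(best)
  r2 <- sapply(best, function(v)
    summary(lm(Z[, v] ~ Z[, setdiff(colnames(Z), v)]))$r.squared)
  names(which.max(r2))
}
```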

(4) Determine whether another factor should be added to the model.

Count the number of significant overidentification tests and nonsignificant factor loadings (i.e., problematic variables). If two or more problematic variables exist, then create an additional factor that has these variables loading on it and go to step (3) to select the scaling indicator of that factor. If none or only one indicator has a statistically significant overidentification test, then the next step depends on whether there is one factor or more. If there is only one factor and no more than one problematic variable, then stop and report this model as the final one. If there are two or more problematic variables, then create a new factor and load these problematic variables exclusively on this new factor. If the model still has more than one problematic variable after the creation of the new latent variable (and the selection of its scaling indicator in step 3), then go to step (5).

(5) Check for potential cross-loading variables

When one or more problematic variables remain after a new factor is created, cross-load all problematic variables (PV1) on all existing factors and check for new problematic variables (PV2) after cross-loading. Remove all unnecessary cross-loadings except those of PV2. If PV2 is one or fewer, there is no need to create a new factor. If PV2 is more than one, create a new factor, load the PV2 variables exclusively on this new factor, and move on to step (6).

(6) Estimate and check the new factor model.

Re-estimate the model and overidentification tests.

(7) Repeat (3) to (6).

Continue adding factors and removing nonsignificant factor loadings until there is at most one problematic variable or no overidentified equations remain.

(8) Assess model modifications against substantive theory.

As with most data-driven methods, the results should be treated with caution. Researchers need to assess the modifications, such as new factors, cross-loadings, or correlated errors, against the substantive knowledge in the field of application. There is no purely mechanical way to do this. Rather, researchers need to assess whether the created factors, the indicators that load on them, or the added cross-loadings are consistent with our understanding of the field. If not, researchers can set aside the implausible modifications in favor of alternatives that are more defensible.

Comments on MIIV-EFA Approach

We would like to emphasize several characteristics of the MIIV-EFA approach. As the above procedure describes, we use instrumental variable estimators (Bollen, 1996) to estimate the measurement intercepts and factor loadings. As we explained above, the instruments are variables that correlate with the right-hand side endogenous variables but do not correlate with the composite error. The MIIVs are observed variables (indicators) that are already part of the model and that meet the conditions to be instruments; that is, they correlate with the right-hand side variables and are uncorrelated with the composite error according to the specified model. An algorithm finds the MIIVs for each equation in the model (Bollen, 1996; Bollen and Bauer, 2004; Fisher et al., 2021).

Two-stage least squares (2SLS) and the generalized method of moments (GMM) are two instrumental variable estimators proposed for this MIIV approach (Bollen, 1996; Bollen et al., 2014). We use MIIV-2SLS because of the availability of single-equation overidentification test statistics (Kirby and Bollen, 2009) that we use when assessing different numbers of factors, correlated errors, or cross-loadings, though MIIV-GMM is another estimation option. These overidentification test statistics are helpful in finding misspecifications in the model because, if the model were correct, then all MIIVs would meet the conditions of instruments. The Sargan (1958) overidentification test statistic is a diagnostic that helps to detect when we have too few factors, omitted cross-loadings, or omitted correlated unique factors.

When we have hypotheses such as the number of factors or specific correlated errors, our selection of MIIVs is modified according to these hypotheses. In all cases, we use the algorithm programmed in the MIIVsem R package (Fisher et al., 2021) for the selection of MIIVs.
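For readers who want to see which instruments a given specification implies, the sketch below uses what we understand to be the MIIVsem interface (Fisher et al., 2021) with lavaan-style model syntax; the indicator names are hypothetical, and the exact behavior should be checked against the package documentation.

```r
# List the model-implied instrumental variables for each indicator equation of a
# hypothesized two-factor model with one prespecified correlated error.
library(MIIVsem)
model <- '
  f1 =~ z1 + z2 + z3 + z4
  f2 =~ z5 + z6 + z7 + z8
  z3 ~~ z4        # prespecified correlated error
'
miivs(model)      # prints, for each equation, the MIIVs implied by this model
# miive(model, data = mydata) would then estimate the equations by MIIV-2SLS
# ('mydata' is a hypothetical data frame containing z1-z8).
```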

3 Illustrations

In this section, we illustrate the MIIV-EFA algorithm. We use both simulation and empirical examples. The simulation examples have the advantage that we know the true data generating model (DGM), while for the empirical examples we do not. For those variables that load on more than one factor, we distinguish between the largest or primary loading and the secondary or cross-loading. The performance of the MIIV-EFA algorithm in the simulations is evaluated by answering three questions: (1) how many times the true number of factors is recovered, (2) how many times the primary loading structure is recovered correctly, and (3) how many times each individual path is recovered correctly, including both primary and secondary loadings. We begin with eight simulation examples.

3.1 Simulated Data

We simulated data with one, two, three, and four latent factors that contained zero, one, or two cross-loadings, for a total of eight simulation conditions. The number of observed variables in the simulations ranged from 4 to 20 (see Table 2). Each simulation example was simulated at sample sizes of 60, 100, 500, and 1000, which resulted in a total of 32 individual simulated datasets. Each simulation has 500 replications. All latent factors are N(0,1) with a covariance of .5, and the errors are N(0,1). All scaling indicators for each factor in all simulations have factor loadings of 1, the remaining primary loadings range from .6 to .8, and the secondary or cross-loadings range from .3 to .4. With these parameters, the scaling indicators have reliabilities of about .6. We use the R package mnormt for multivariate normal data generation and provide the code in the online supplementary material. Table 2 summarizes the detailed conditions for each simulation regarding the number of latent factors, the number of variables, and the presence and loadings of cross-loading variables.
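The data generation can be sketched as follows for a two-factor, eight-indicator condition. The particular loading pattern shown is illustrative (the values fall in the ranges described above but are not the exact values of any one condition); mnormt is the generator we used for the multivariate normal factors.

```r
# Illustrative data generation for a two-factor, eight-indicator condition:
# factors are N(0,1) with covariance .5, errors are N(0,1), scaling loadings are 1,
# and the remaining primary loadings fall in the .6-.8 range described above.
library(mnormt)
set.seed(3)
n      <- 500
Phi    <- matrix(c(1, .5, .5, 1), 2, 2)            # factor covariance matrix
L      <- rmnorm(n, mean = c(0, 0), varcov = Phi)  # latent factors
Lambda <- rbind(c(1, 0), c(.7, 0), c(.8, 0), c(.6, 0),   # indicators of factor 1
                c(0, 1), c(0, .7), c(0, .8), c(0, .6))   # indicators of factor 2
E      <- matrix(rnorm(n * 8), n, 8)               # unique factors ("errors")
Z      <- L %*% t(Lambda) + E
colnames(Z) <- paste0("Z", 1:8)
```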

Simulations with the same number of factors have identical parameters except for the cross-loading parameters. For example, simulation 7 differs from simulation 6 only in that it adds a cross-loading of Z\(_{\textrm{15}}\) on factor 1. We use the MIIVefa R package to run the algorithm on all eight simulation conditions (32 individual simulated datasets). In addition, we use significance levels of .05, .01, and .005 for the Sargan tests and loading coefficients when running MIIVefa, in order to evaluate the potential influence of different significance levels on the performance of the algorithm. This resulted in 48,000 (8 simulation conditions * 500 repetitions * 4 sample sizes * 3 significance levels) MIIVefa outputs for us to evaluate. We begin by examining the recovery of the correct dimensionality.

Table 2 Simulation conditions.

The number of factors is counted as correct if it matches the number of factors in the true data generating model (DGM). The results of dimensionality recovery by simulation condition, sample size, and significance level are presented in Fig. 1. The x-axes represent sample sizes, and the y-axes represent the number of times the correct number of factors was recovered. Because each simulation condition by sample size has 500 repetitions, a perfect (100%) recovery would be 500, shown as the maximal y-axis value in Fig. 1.

Fig. 1 Recovery of correct number of factors using MIIVefa.

Perhaps most striking in Fig. 1 is the excellent recovery of the number of factors at sample sizes of 100 or more. In nearly all replications, the correct number of factors is recovered regardless of the significance level. It is only at a sample size of 60 that differences are noticeable. Here the 0.05 significance level has the best performance compared to the 0.01 and 0.005 levels, and indeed, the differences are notable for some of the models. To the degree that these results generalize, the significance level has only a minor impact when the sample is 100 or more, but for smaller samples the 0.05 significance level would be best for recovering the true number of factors. It is interesting to note that some authors recommend a sample of at least 100 for EFA (Cattell, 2012; Kyriazos, 2018), and our results suggest that at these sample sizes there will be excellent recovery of the number of factors.

As the name primary loading recovery suggests, Fig. 2 shows the number of times that all the main or primary loadings of variables in the true DGM are recovered with our algorithm. For example, in the case of simulation 3, if for one repetition all main loadings of Z\(_{\textrm{1}}\) to Z\(_{\textrm{7}}\) are recovered but not for Z\(_{\textrm{8}}\), it would be an incorrect primary loading recovery because of Z\(_{\textrm{8}}\). Therefore, the y-axis has a maximum number of 500, the number of repetitions for each simulation condition, as the primary loading recovery measure happens at the repetition level.

Compared to Fig. 1, the overall recovery decreased in Fig. 2, especially at smaller sample sizes, which is expected because primary loading recovery is a stricter measure than recovering the number of factors. Like the patterns in Fig. 1, primary loading recovery increased when the sample size increased and when the true DGMs were less complex. In addition, a significance level of 0.05 appears to outperform 0.01 and 0.005, especially at smaller sample sizes. However, this relation flipped at larger sample sizes for more complex true DGMs, as shown in the plots for simulations 7 and 8, where the dotted line (p\(=\)0.05) became the lowest, although all recovery rates are high at the large sample sizes.

The main difference between Figs. 1 and 2 appears at N\(=\)100. While the recovery of the number of factors became almost perfect after increasing the sample size from 60 to 100, this was not the case for primary loading recovery when the true DGM was more complex (as shown in the plots for simulations 5–8) and when the significance level was either 0.01 or 0.005. However, at a significance level of 0.05, the primary loading recovery was still almost perfect for the first six simulations and was above 95% for the more complex DGMs (simulations 7 and 8).

Fig. 2 Primary loadings recovery.

Note that although it is termed primary loading recovery, it is nevertheless a very strict measure: failing to recover even one main loading in the true DGM is counted as incorrect. For example, if Z\(_{\textrm{8}}\) were recovered only on the secondary factor but not on the first factor (the primary loading), the recovery would be counted as incorrect. This failure to capture the loadings of cross-loading variables was the main reason for the decrease in recovery rates from Fig. 1 to Fig. 2, which we explain in more detail in the sensitivity and specificity section.

Figure 3 contains the most stringent standard for evaluating our EFA procedure. It looks at the recovery of both the primary and secondary loadings of the DGM. Note that for true DGMs without cross-loadings, the primary loading recovery is identical to the primary and cross-loading recovery, which means that the plots for simulations 1, 2, 4, and 6 are identical in Figs. 2 and 3.

Fig. 3 Primary and cross-loadings recovery.

The general pattern of the influence of sample size and significance level on the primary and cross-loading recovery shown in Fig. 3 resembles that of Figs. 1 and 2. However, compared to Fig. 2, the decrease in recovery is more noticeable at N \(=\) 100. Even at a significance level of 0.05, the exact data structure recovery was at most 70% (simulation 3) and decreased to under 20% when the true DGM became more complex. However, when the sample size was 500 or more, MIIVefa recovered the exact data structure nearly 100% of the time, even for complex DGMs such as simulation 8, which has 20 variables with multiple cross-loadings.

Better recovery at significance levels of 0.01 and 0.005 (around 100%) compared to 0.05 (around 95%) at larger sample sizes also becomes more noticeable in Fig. 3; this pattern was only evident in Fig. 2 for simulation 8 and is not present in Fig. 1. The three figures show that for different recovery purposes (e.g., number of factors recovery versus primary and secondary loadings recovery) and for different sample sizes, a different significance level could be applied to reach the best recovery. For example, for a large sample size with a potentially complex data structure, MIIVefa at a significance level of 0.01 or 0.005 would be expected to have better primary and cross-loading recovery than at a significance level of 0.05; similarly, for a very small sample size, MIIVefa at a significance level of 0.05 would be expected to have better recovery of the number of factors. In addition, a significance level of 0.05 would be a good default, as it had the most consistent results across sample sizes and data structures for all three recovery measures, as shown in Figs. 1, 2, and 3.

Fig. 4 Sensitivity (“true positives”) plots for primary and cross-loadings in simulations.

3.1.1 Sensitivity and Specificity

We also calculated sensitivity (“true positives”) and specificity (“true negatives”) for all 8 conditions across all 4 sample sizes. Unlike the previous recovery measures, which concern the overall model structure for each repetition, we calculated sensitivity and specificity for each simulation condition across all 500 repetitions. The overall sensitivity results are summarized in Fig. 4. Note that because each simulation condition has a different number of true positive paths in its DGM, the y-axis is now a percentage with a maximum of 100% instead of the raw count out of 500 repetitions shown in Figs. 1, 2, and 3. Sensitivity was more influenced by the significance level at smaller sample sizes, with a significance level of 0.05 having the highest sensitivity, resembling the pattern in the previous figures. Unlike in Fig. 3, where the primary and secondary loading recovery rates are still very low at a sample size of 100, sensitivity rates are above 85% even for the most complex simulation 8 at all three significance levels. Once the sample size reached 500, MIIVefa had almost perfect 100% sensitivity across all 8 simulations.
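These two measures are simple functions of the true and recovered loading patterns. The sketch below assumes binary pattern matrices (1 = loading present) for the true DGM and the recovered model with rows and columns already matched, which glosses over the column-matching step needed when the recovered factor order differs.

```r
# Sensitivity ("true positives") and specificity ("true negatives") for a
# recovered loading pattern, given 0/1 pattern matrices with matched columns.
recovery_rates <- function(true_pat, est_pat) {
  tp <- sum(true_pat == 1 & est_pat == 1)
  fn <- sum(true_pat == 1 & est_pat == 0)
  tn <- sum(true_pat == 0 & est_pat == 0)
  fp <- sum(true_pat == 0 & est_pat == 1)
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}
```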

Note that the previously discussed primary and cross-loading recovery rates are stricter measures than sensitivity. For example, a repetition in a given simulation must have 100% sensitivity, that is, all true positives in the true DGM must be recovered correctly, for it to count as a correct primary and cross-loading recovery. Therefore, sensitivity is always higher than or equal to the primary and cross-loading recovery rate. The rather large discrepancy between the sensitivity measures and the primary and cross-loading recovery rates, especially at small sample sizes such as 60 and 100 when cross-loadings are present in the true DGMs (as shown in Figs. 3 and 4), largely stems from the failure to recover all loadings of the cross-loading variables. Please refer to Appendix 2 and Fig. 8 for more details on sensitivity for just the cross-loading variables.

In addition to the sensitivity measures, we also calculated specificity for all eight simulations, as summarized in Fig. 5. Note that because simulation 1 is a simple four-variable, one-factor model, there are no true negatives in its DGM, and specificity cannot be calculated for it. Again, like the pattern in the previous figures for sensitivity and data structure recovery, specificity is lowest and most influenced by the significance level at a sample size of 60. However, once the sample size increased to 100, MIIVefa’s specificity reached almost 100% for all significance levels. Furthermore, even at a sample size as small as 60, MIIVefa had specificity above 95% across all 8 simulations at a significance level of 0.05. This means that MIIVefa performed very well in terms of not recovering path relations that are absent from the true DGMs, even for very small sample sizes.

Fig. 5 Specificity (“true negative”) plots for all simulations.

3.1.2 Extra Simulations

To explore the performance of MIIVefa under additional conditions, we created two simulation examples based on our simulation 2, which is a simple two-factor, eight-variable condition as introduced earlier. The first modification changes the distribution of the observed variables’ error terms from normal to log-normal to explore how MIIVefa performs under nonnormality. The first graph in Fig. 6 repeats, for easy comparison, the plot of the recovery of the correct primary loadings under simulation 2 with normally distributed errors, as shown in Fig. 1. The second plot in Fig. 6 is the new simulation 2 with errors from a log-normal distribution. In other words, it shows how many times X\(_{1}\) to X\(_4\) and X\(_5\) to X\(_8\) were correctly recovered on two separate factors. The most obvious difference is at the smallest sample sizes of 60 and 100. Under normal errors, the recovery of primary loadings is excellent (more than 80%) across all sample sizes. In contrast, under log-normal errors the recovery of primary loadings is also excellent for sample sizes of 100 or more but declines greatly at N\(=\)60. Interestingly, the performance under nonnormality is close to that under normality even at N\(=\)100 if the cutoff p-value is 0.05, whereas lower p-values lead to less accuracy.
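The only change needed for this nonnormal condition is in the error generation. A centered log-normal draw, as sketched below reusing n, L, and Lambda from the data-generation sketch above, is one way to do it; the exact transformation used in our simulation code is in the online supplementary material.

```r
# Replace the normal unique factors with (centered) log-normal ones to induce
# skewed, nonnormal errors; exp(0.5) is the mean of a standard log-normal draw.
E_lognormal <- matrix(rlnorm(n * 8) - exp(0.5), n, 8)
Z_nonnormal <- L %*% t(Lambda) + E_lognormal
```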

Fig. 6 Primary loadings recovery with log-normal errors.

Our second modification includes a pair of correlated errors between X\(_{\textrm{3}}\) and X\(_{\textrm{4}}\) to explore the performance of MIIVefa when correlated errors are mistakenly omitted from the model search procedure. Again, the code to simulate and analyze these secondary simulations is in the online OSF supplementary material.

One complication that emerges with correlated errors is deciding what counts as a correct model recovery. It is well known that a CFA model with a single correlated error pair has a chi-square and degrees of freedom equivalent to a model with the exact same structure except that the correlated error pair is replaced by a new factor that is uncorrelated with the other factors and has both loadings constrained to one and pointing toward the two indicators that had the correlated errors. So, if an algorithm chooses a model that replaces the correlated errors with such a new factor, it seems inaccurate to count this against the algorithm as a wrong model. After all, both models have identical chi-square fits and degrees of freedom.Footnote 7 Relatedly, Bollen (1987) found that a one-factor model with a correlated error is equivalent to a two-factor model where the indicators with correlated errors load exclusively on a second factor and the other indicators load exclusively on the first factor, which suggests another possible model with the same or similar fit.
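This equivalence can be checked directly in lavaan. Using the Holzinger and Swineford data distributed with that package, the two specifications below (a correlated error versus an extra factor with both loadings fixed to one and uncorrelated with the other factors) should return the same chi-square and degrees of freedom; the particular indicators chosen here are arbitrary and serve only to illustrate the point.

```r
# Two equivalent specifications: a correlated error between x1 and x2 versus a
# new factor with unit loadings on x1 and x2 that is uncorrelated with the others.
library(lavaan)
data("HolzingerSwineford1939")
m_corr <- ' visual  =~ x1 + x2 + x3
            textual =~ x4 + x5 + x6
            x1 ~~ x2 '
m_fact <- ' visual  =~ x1 + x2 + x3
            textual =~ x4 + x5 + x6
            d =~ 1*x1 + 1*x2
            d ~~ 0*visual + 0*textual '
fit_corr <- cfa(m_corr, data = HolzingerSwineford1939)
fit_fact <- cfa(m_fact, data = HolzingerSwineford1939)
fitMeasures(fit_corr, c("chisq", "df"))
fitMeasures(fit_fact, c("chisq", "df"))   # same fit, same df
```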

For these reasons, when we count the recovery of the model we separately count the number of times that: (1) a two-factor model is correctly recovered with primary loadings, (2) a three-factor model is correctly recovered with primary loadings, and (3) either a two- or three-factor model is correctly recovered with primary loadings. Though this complicates the reporting of the results, it appears to be the most complete way to gauge whether the correct number of factors is recovered. Figure 7 reports the results. The leftmost plot shows the number of times that the primary loadings of the two-factor model are recovered, and this is most common at the smaller sample sizes (60, 100). As the center graph shows, three factors become increasingly common as the sample size grows. Finally, the two- or three-factor solutions are the most common solutions across all sample sizes, with some variation across the choice of p-values. This exploration reveals the complications that correlated errors create in defining the number of factors because of the presence of equivalent models, while at the same time it shows the ability of our MIIV-EFA algorithm to recover the equivalent models.

Fig. 7 Recovery of two-factor, three-factor, or two- or three-factor primary loadings with simulation 2 and correlated errors.

4 Empirical Examples

4.1 Empirical Example #1 Holzinger and Swineford

The first example comes from Holzinger and Swineford (1939), and it consists of mental ability test scores of seventh- and eighth-grade children from two different schools. The original dataset contains a total of 26 tests, but a subset of 9 tests is more common in the literature (e.g., Jöreskog, 1969), and the raw data for this example are available as part of the lavaan R package (Rosseel, 2012).Footnote 8 These 9 variables in this empirical dataset (\(N=\)301) are scores on tests of: visual perception, cubes, lozenges, paragraph comprehension, sentence completion, word meaning, speeded addition, speeded counting of dots, and speeded discrimination of straight and curved capitals.
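The analysis can be reproduced along the following lines. The data are taken from lavaan (variables x1 to x9), and the call to miivefa reflects our understanding of the MIIVefa package interface; the function arguments shown are assumptions and should be checked against the package documentation.

```r
# Holzinger and Swineford (1939) nine-test subset, as distributed with lavaan.
library(lavaan)     # used here for the data only
data("HolzingerSwineford1939")
hs9 <- HolzingerSwineford1939[, paste0("x", 1:9)]   # x1-x9: the nine test scores

# Exploratory MIIV-EFA on the nine tests at the 0.01 significance level.
# (Call shown as we understand the MIIVefa interface; see the package help.)
library(MIIVefa)
fit <- miivefa(data = hs9, sigLevel = 0.01)
fit   # should reproduce the three-factor solution with one cross-loading (Table 3)
```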

Table 3 MIIVefa recovered model factor loadings for Holzinger data.

Researchers often use this subset of the Holzinger data to test a hypothesized three-factor model of mental abilities, with visual, textual, and speed factors and no cross-loadings, using CFA. Our MIIVefa algorithm recovered an almost identical model, except that it also recovered one cross-loading: the speeded discrimination of straight and curved capitals on the hypothesized visual factor. The factor loadings are presented in Table 3. We use a significance level of 0.01 for these empirical data (\(N=\)301) because our simulation results suggested that a smaller significance level performed slightly better at recovering primary and cross-loadings at larger sample sizes. Note that, as previously mentioned, MIIVefa removes nonsignificant factor loadings before printing the final recovered model, so the output differs from a traditional EFA output, which contains factor loadings of all variables on all factors. The recovered cross-loading is an interesting finding and is empirically reasonable. The speeded discrimination test on straight and curved capitals involves the mental abilities of both differentiating between types of capitals visually and doing so quickly, and therefore it naturally loads on both the visual and speed factors.

4.2 Empirical Example #2 Access to Family Planning Facilities in Tanzania

The second empirical illustration uses a subset of variables from the 1993 Tanzania Accessibility Survey (Mroz et al., 1999). We use measures of subjective assessments of accessibility to family planning clinics in rural communities in Tanzania. Six informants, three females (arbitrarily numbered 1, 2, 3) and three males (arbitrarily numbered 4, 5, 6), were asked to rate both the accessibility (access1 to access6) and how easy it is to get to the clinics (easy1 to easy6). Informant data were collected for 220 rural communities and the rural community is the unit of analysis.

Because the same informant assessed both accessibility and ease of getting to the clinics, it is reasonable to assume that measurements from the same informant have correlated errors (e.g., access1 and easy1, which are rated by the same female informant). We ran the algorithm on the data with and without the inclusion of correlated errors between variables rated by the same informant.

Table 4 MIIVefa recovered model factor loadings for family planning data.

When we excluded the correlated errors from the model search, MIIVefa arrived at a seven-factor model: each of the first five factors contained the two assessments by the same informant, while for one informant the accessibility and ease assessments were further divided into two separate factors. Table 4 reveals that the MIIVefa solution was simplified by including the correlated errors, which resulted in a two-factor model with all male-rated assessments on the first factor and all female-rated assessments on the second factor. The results demonstrate that the perceived accessibility of the same clinic differs between male and female informants.

4.3 Empirical Example #3 Fisher Negative Affect Data

The third empirical example comes from Fisher and Boswell (2016). The original data contain intensive repeated measures on 26 mood items for 40 individuals who had generalized anxiety disorder (GAD) and/or major depressive disorder (MDD), collected four times a day for about 30 days. We examined a subset of 8 variables associated with negative affect: irritable, restless, worried, guilty, anhedonia, hopeless, down, and concentrate, for all 40 individuals (sample size: mean\(=\)113.55, min\(=\)90, max\(=\)151; days: mean\(=\)32.5, min\(=\)28, max\(=\)54). We used a significance level of 0.05 instead of the 0.01 used in the previous two empirical examples because of the small sample sizes, as our simulation results suggested that MIIVefa with a significance level of 0.05 performed better at smaller sample sizes.

Having intensive repeated measures allows researchers to analyze intraindividual differences and to arrive at individual-level models that are person specific. Here we present three of the most representative models recovered by MIIVefa (individuals A, B, and C). The factor loadings are summarized in Table 5. Individual A had a clean one-factor model, meaning that all eight variables load solely on one negative affect factor. Individual B had a clean two-factor model, with irritable, restless, worried, and concentrate on one factor and the rest (anhedonia, hopeless, down, and guilty) on the other. The two factors with their corresponding indicators can be interpreted as separate anxiety and depression factors. This suggests that while some people internalize these two constructs as one concept, others might experience depression and anxiety very differently. Individual C had a two-factor model similar to individual B’s, but with anhedonia cross-loading on both factors, which means that although individual C differentiates anxiety and depression similarly to individual B, they have a symptom that could be an indicator of both. This further demonstrates the interindividual variability that has been found in many psychological processes (Molenaar and Campbell, 2009). For this empirical example specifically, it means that some people are more likely to have comorbid GAD and MDD than others (for example, individual A more than individuals B and C) and potentially should receive different treatments.

Table 5 MIIVefa recovered model factor loadings for Fisher’s negative affect for individuals A, B, and C.

5 Conclusions

Analysts have been developing factor analysis methods for well over a century since Spearman (1904) first proposed the technique. This has resulted in many advances, ranging from new procedures for determining the number of factors to novel methods of rotating factor solutions. The possibility of using instrumental variables in exploratory factor analysis has been known since Madansky (1964), but the attention given to instruments in factor analysis has been extremely limited. We proposed an original instrumental variable approach to factor analysis that makes use of MIIVs (Bollen, 1996), overidentification tests (Sargan, 1958), and ways to automate the determination of the number of factors and the presence of cross-loadings. This MIIV-EFA method has many valuable aspects. It uses the MIIV-2SLS estimator, which is asymptotically distribution free, so normality or the absence of excess kurtosis of the variables need not be assumed for significance tests of measurement intercepts and factor loadings. It is noniterative, so nonconvergence is never an issue. There is no rotation problem because the factor analysis models constructed are identified. In addition, the procedure allows partial input from users, who can specify correlated errors when they are theoretically predicted. Furthermore, MIIV-2SLS has an equation-specific overidentification test statistic, which provides a local measure of fit that is absent in other approaches to EFA. Finally, we have an R package, MIIVefa, to implement MIIV-EFA. Beyond presenting the algorithm, we illustrated it with simulations with known DGMs and three empirical examples with unknown DGMs. We found that MIIV-EFA frequently recovered the DGMs for the simulated data and produced plausible models for the empirical examples.

Asparouhov and Muthén (2009) proposed an exploratory structural equation modeling (ESEM) approach that combines features of CFA and SEM with exploratory factor analysis (EFA). Specifically, they permit an EFA structure to be embedded in a SEM where the EFA part gives rotated factors with cross-loadings and correlated errors that might improve model fit. For a recent overview of ESEM, see van Zyl and ten Klooster (2022). The MIIV-EFA approach differs from ESEM in several ways. First, MIIV-EFA does not use rotation methods, so there is no need to choose between different methods of rotation. Second, MIIV-EFA estimates single equations rather than the full system of equations. Single-equation estimators like MIIV-2SLS appear more robust to misspecification of models (Bollen et al., 2007), and this should carry over to MIIV-EFA. Relatedly, the MIIV-2SLS estimator is noniterative and does not have the convergence issues that are present with systemwide iterative estimators. In addition, MIIV-EFA has equation-specific overidentification tests that can help pinpoint problematic equations. Finally, MIIV-EFA has built-in procedures to choose the number of factors.

Though the results for MIIV-EFA are encouraging, we recognize that limitations remain. One limitation is multiple hypothesis testing with the Sargan overidentification tests and the tests of factor loadings. An overidentification test applies to each overidentified equation, and a significance test evaluates the factor loadings for each model that the algorithm creates. If we knew in advance how many tests were to be done, we could apply corrections such as that of Holm (1979) or false discovery rates. But we generally do not know how many tests are implemented until the algorithm ends its search. Our package has the option to control the Type I error rate and we could adjust it, but to what value? At the same time, our simulations suggested that higher Type I error rates without adjustment often worked better at smaller sample sizes.

Another challenge to EFA in general, and to MIIV-EFA as well, is the issue of correlated errors. Most EFA methods assume away correlated errors. MIIV-EFA is less restrictive in that it allows users to specify particular correlated errors that arise with repeated and/or across-group measures or other mechanisms that are hypothesized in advance. However, we do not automate the process of including correlated errors in our algorithm. We experimented with such an option but found that our algorithms were less successful. Part of the challenge is that a pair of correlated errors fits the data the same as a model that replaces the correlated errors with a factor that directly affects the two indicators.Footnote 9 Determining whether new factors or correlated errors are appropriate is complicated and requires more study. Our extra simulations illustrated some of the issues.

We also did not discuss how factor score estimates are possible with a model that emerges from MIIV-EFA. The MIIV-EFA procedure provides estimates of the factor loadings and the measurement intercepts. Factor score estimates such as the regression method also need estimates of the covariance matrices of the factors and of the unique factors (“errors”). Bollen (1996, pp. 116–117) recommends that the intercepts and factor loadings be entered as fixed values and that the remaining variance and covariance parameters be estimated using maximum likelihood or another systemwide estimator that is widely available in SEM software. These would provide consistent estimates of the variances and covariances of the factors and unique factors. Researchers could then use the regression method or other factor score methods that form weighted functions of the observed variables to provide predicted values of all factors in the model. Furthermore, this same approach could give Bollen and Arminger (1991) residuals for individual cases.
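As an illustration of this two-step strategy, the lavaan sketch below fixes the loadings of a two-factor model at values standing in for MIIV-2SLS estimates (the numeric values are placeholders, and the mean structure is omitted for simplicity), estimates the remaining variances and covariances by maximum likelihood, and then computes regression-method factor scores.

```r
# Step 1 (assumed done): MIIV-2SLS gives the factor loading estimates.
# Step 2: fix those values in lavaan, estimate variances/covariances by ML,
# then compute factor scores.  Loading values below are placeholders only.
library(lavaan)
data("HolzingerSwineford1939")
model <- ' visual  =~ 1*x1 + 0.55*x2 + 0.73*x3
           textual =~ 1*x4 + 1.10*x5 + 0.93*x6 '
fit <- cfa(model, data = HolzingerSwineford1939)
scores <- lavPredict(fit, method = "regression")   # regression-method factor scores
head(scores)
```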

Some might wonder how to handle binary or ordinal indicators in factor analysis. Though we did not develop the algorithm for categorical variables, it is a straightforward extension. Bollen and Maydeu-Olivares (2007), Fisher and Bollen (2020), and Jin et al. (2021) developed a polychoric correlation approach to analyzing mixed types of variables along with asymptotic standard errors and significance tests. Jin and Cao (2018) created an overidentification test analogous to the Sargan (1958) test, but for categorical indicators. These ingredients mean that a MIIV-EFA for categorical and continuous measures should be available.

Finally, we reported the MIIV-EFA algorithm with the best performance in recovering DGMs that we examined. In Appendix 1, we report two other algorithms that we tried but did not work quite as well. We would be surprised if there were no other algorithms that could be devised and that might work even better than what we have proposed. Our goal was to offer a new approach to EFA that uses MIIVs and to demonstrate a “proof of concept.” Our hope is that even the promising algorithm that we developed can be improved upon.