1 Introduction

Structural equation modeling (SEM) has become a quasi-standard tool for analyzing complex inter-relationships between observed and latent variables (Kaplan 2002). Two conceptually different approaches to SEM have been proposed: factor-based and composite-based SEM (Jöreskog and Wold 1982; Rigdon et al. 2017). Factor-based SEM, which is strongly influenced by the psychometric or psychological measurement tradition, approximates unobservable conceptual variables by common factors under the assumption that each latent variable exists as an entity independent of the observed variables and serves as the sole source of the associations among them. By contrast, composite-based SEM—influenced by traditional multivariate statistical techniques such as principal component analysis and canonical correlation analysis (e.g., Horst 1936, 1961; Hotelling 1933, 1936; Pearson 1901; Spearman 1913)—represents a latent variable by a weighted composite (or component) of observed variables, assuming that it is a deterministic aggregation of these variables.

Partial least squares path modeling (PLSPM; Wold 1966, 1973, 1982; Lohmöller 1989) and generalized structured component analysis (GSCA; Hwang and Takane 2004, 2014) are full-fledged approaches to composite-based SEM and are comparable in scope and capability, although many multivariate methods can also be considered to fall into the domain of composite-based SEM (Hwang and Takane 2014, Chapter 2). PLSPM in particular has seen widespread adoption during the last decade, especially in the social sciences (e.g., Ali et al. 2018; Hair et al. 2012; Ringle et al. 2019), but also in other fields of scientific inquiry such as agricultural science, engineering, environmental science, and medicine (e.g., Avkiran 2018; Sarstedt 2019; Willaby et al. 2015).

Reflecting on the increasing prominence of the method, Khan et al. (2019) recently presented a social network analysis of methodological PLSPM research. Specifically, using 84 methodological studies published in 39 journals by 145 authors from 106 universities as input, their results show that the PLSPM knowledge network is rather fragmented, with authors working in partly isolated silos. An additional burst detection analysis indicates that method comparisons and extensions, for example, to estimate common factor model data (e.g., Dijkstra and Henseler 2015) or to leverage PLSPM’s predictive capabilities (e.g., Shmueli et al. 2016), feature prominently in recent research. While Khan et al. (2019) outline PLSPM’s domain knowledge infrastructure and identify prominent topics via simple word counts in the studies’ titles and abstracts, their study offers no insights into the semantic relationships among key topics covered in prior research. However, recognizing these relationships is important for understanding the domain’s current state of research and identifying future research opportunities. Furthermore, Khan et al.’s (2019) analysis focuses on PLSPM and does not investigate methodological research on GSCA, which has attracted considerable attention among users of composite-based SEM and methodologists alike (e.g., Hwang et al. 2017; Jung et al. 2018; Suk and Hwang 2016).

Addressing these concerns, this study sets out to identify dominant topics that characterize the joint PLSPM and GSCA research domain. For this purpose, we apply a two-stage approach to uncover the structure in text corpora by identifying links between dominant topics via the co-occurrence of words within their textual contexts (Smith and Humphreys 2006). This approach differs from co-citation analysis, which investigates the subject similarity among central articles in a research stream by counting the number of joint citations (White and Griffith 1981). It also extends the social network and burst detection analyses employed by Khan et al. (2019), which focus on analyzing relationships among authors, institutions, and countries in the form of co-authorships and simple word pairs to identify salient topics in the field. Instead, our analysis identifies semantic patterns from lexical co-occurrence information extracted from methodological publications on PLSPM and GSCA.

In what follows, we first discuss similarities and differences between PLSPM and GSCA in terms of model specification, parameter estimation, and results evaluation. Describing and contrasting the foundations of PLSPM and GSCA establishes the grounds for a unified view on composite-based modeling, which is important to pave the way for future method developments and application practices. We then introduce the method and the data used in our concept analysis of methodological research on PLSPM and GSCA. The next section presents the dominant topics derived from the analysis and relates them to recent research on related topics. In doing so, we differentiate between two periods, 1979–2013 and 2014–2017, to disclose trending and fading topics.

2 PLSPM and GSCA: similarities and differences

While PLSPM and GSCA share the same objective of analyzing complex inter-relationships between observed and latent variables, both methods differ in the way they achieve this aim. In the following, we highlight several similarities and differences, focusing on aspects related to model specification, estimation, and results evaluation.

2.1 Model specification

Model specification in PLSPM and GSCA involves two sub-models—the measurement (or outer) models and the structural (or inner) model. The measurement model is used to specify the relationships between indicators and latent variables, whereas the structural model expresses the relationships between latent variables.

Let z and γ denote vectors of all indicators and latent variables, respectively. PLSPM and GSCA consider the following measurement model:

$$\mathbf{z} = \mathbf{C}\boldsymbol{\gamma} + \boldsymbol{\varepsilon},$$
(1)

where C is a matrix of loadings relating the indicators (z) to the latent variables (γ), and ε is the disturbance term of the indicators. A loading is the zero-order correlation between a latent variable and an indicator.

PLSPM additionally considers another measurement model, where a latent variable (γ) is modeled as a linear function of its associated indicators (z):

$$\boldsymbol{\gamma} = \mathbf{H}\mathbf{z} + \boldsymbol{\theta},$$
(2)

where H is a matrix of indicator weights derived from a regression of each latent variable on the indicators of its measurement model, and θ is the disturbance term of the latent variables.

The measurement model specification in Eq. (1) is typically associated with the term “reflective measurement model”, where the indicators are viewed as imperfect reflections of the underlying construct (MacKenzie et al. 2011). However, this terminology can be misleading in composite-based SEM, because it is typically used in the context of common factor models to indicate that a latent variable “causes” the indicators to covary. In other words, when controlling for the impact of the latent variable, the indicator correlations are zero, also known as the axiom of local independence (Lazarsfeld 1959). Similarly, in PLSPM, researchers typically associate the measurement model specification in Eq. (2) with the term “formative measurement model”, where the indicators combine to form the construct (MacKenzie et al. 2011). However, the terms “reflective” and “formative” refer to the theoretical specification of a construct, which is different from how PLSPM and GSCA statistically estimate the models. Regardless of whether one specifies the measurement model according to Eq. (1) or Eq. (2), composite-based SEM methods such as PLSPM and GSCA always compute weighted composites of indicators to represent the latent variables in the statistical model (Rigdon et al. 2017; Sarstedt et al. 2016).

In PLSPM and GSCA, the structural model of the relationships between the latent variables (γ) can be generally expressed as:

$$\boldsymbol{\gamma} = \mathbf{B}\boldsymbol{\gamma} + \boldsymbol{\zeta},$$
(3)

where B is a matrix of path coefficients and ζ is the disturbance term of the dependent latent variables. In addition to the measurement and structural models, GSCA has another sub-model, called the weighted relation model, which explicitly defines latent variables (γ) as weighted composites of indicators (z), as follows:

$$\boldsymbol{\gamma} = \mathbf{W}\mathbf{z},$$
(4)

where W is a matrix of (component) weights assigned to indicators. A key difference in model specification is whether the aforementioned sub-models are combined into a single formulation for specifying an entire structural equation model. GSCA integrates its sub-models into a unified formulation (i.e., a single equation), as follows:

$$\begin{aligned} \begin{bmatrix} \mathbf{z} \\ \boldsymbol{\gamma} \end{bmatrix} &= \begin{bmatrix} \mathbf{C} \\ \mathbf{B} \end{bmatrix} \boldsymbol{\gamma} + \begin{bmatrix} \boldsymbol{\varepsilon} \\ \boldsymbol{\zeta} \end{bmatrix} \\ \begin{bmatrix} \mathbf{z} \\ \mathbf{Wz} \end{bmatrix} &= \begin{bmatrix} \mathbf{C} \\ \mathbf{B} \end{bmatrix} \mathbf{Wz} + \begin{bmatrix} \boldsymbol{\varepsilon} \\ \boldsymbol{\zeta} \end{bmatrix} \\ \begin{bmatrix} \mathbf{I} \\ \mathbf{W} \end{bmatrix} \mathbf{z} &= \begin{bmatrix} \mathbf{C} \\ \mathbf{B} \end{bmatrix} \mathbf{Wz} + \begin{bmatrix} \boldsymbol{\varepsilon} \\ \boldsymbol{\zeta} \end{bmatrix} \\ \mathbf{Vz} &= \mathbf{AWz} + \mathbf{e}, \end{aligned}$$
(5)

where I is an identity matrix, $\mathbf{V} = \begin{bmatrix} \mathbf{I} \\ \mathbf{W} \end{bmatrix}$, and $\mathbf{A} = \begin{bmatrix} \mathbf{C} \\ \mathbf{B} \end{bmatrix}$. This is called the GSCA model. Note that this model can also be expressed as:

$$\begin{aligned} \begin{bmatrix} \mathbf{I} \\ \mathbf{W} \end{bmatrix} \mathbf{z} &= \begin{bmatrix} \mathbf{0} & \mathbf{C} \\ \mathbf{0} & \mathbf{B} \end{bmatrix} \begin{bmatrix} \mathbf{I} \\ \mathbf{W} \end{bmatrix} \mathbf{z} + \begin{bmatrix} \boldsymbol{\varepsilon} \\ \boldsymbol{\zeta} \end{bmatrix} \\ \mathbf{u} &= \mathbf{T}\mathbf{u} + \mathbf{e}, \end{aligned}$$
(6)

where $\mathbf{u} = \begin{bmatrix} \mathbf{I} \\ \mathbf{W} \end{bmatrix}\mathbf{z}$ and $\mathbf{T} = \begin{bmatrix} \mathbf{0} & \mathbf{C} \\ \mathbf{0} & \mathbf{B} \end{bmatrix}$. This model is essentially of the same form as the reticular action model (RAM; McArdle and McDonald 1984), which is mathematically the most compact among several formulations for factor-based SEM, including the LISREL (Jöreskog 1970, 1973) and the Bentler–Weeks (Bentler and Weeks 1980) models. The difference between GSCA and RAM is that GSCA defines latent variables as composites, that is, γ = Wz in Eq. (4), whereas RAM defines them as (common) factors. On the other hand, PLSPM does not combine its sub-models into a single equation as GSCA does in Eq. (5). This dissimilarity in the number of equations needed for the entire model specification in turn leads to differences in the set-up of the parameter estimation algorithms used by the two approaches.
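To make the stacked notation concrete, the following numpy sketch builds $\mathbf{V} = [\mathbf{I}; \mathbf{W}]$ and $\mathbf{A} = [\mathbf{C}; \mathbf{B}]$ for a small hypothetical model (4 indicators, 2 composites; all dimensions and numerical values are invented for illustration) and evaluates the residuals of Eq. (5):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 4 indicators, 2 latent variables,
# each latent variable a composite of 2 indicators.
p, q = 4, 2                        # number of indicators, latent variables
Z = rng.standard_normal((p, 50))   # columns are observations z_i

W = np.array([[0.6, 0.6, 0.0, 0.0],    # component weights (q x p), Eq. (4)
              [0.0, 0.0, 0.7, 0.7]])
C = np.array([[0.8, 0.0],              # loadings (p x q), Eq. (1)
              [0.7, 0.0],
              [0.0, 0.9],
              [0.0, 0.6]])
B = np.array([[0.0, 0.0],              # path coefficients (q x q), Eq. (3)
              [0.5, 0.0]])             # gamma_2 regressed on gamma_1

# Stacked matrices of the GSCA model, Eq. (5): V z = A W z + e
V = np.vstack([np.eye(p), W])          # (p+q) x p
A = np.vstack([C, B])                  # (p+q) x q

E = V @ Z - A @ (W @ Z)                # residuals e_i for all observations
phi = np.sum(E ** 2)                   # least squares criterion of Eq. (7)
```

The single equation thus covers both the measurement relations (top block of V and A) and the structural relations (bottom block) at once, which is what the estimation algorithm below exploits.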

2.2 Model estimation

Model estimation in PLSPM—as implemented in most software programs—draws on Lohmöller’s (1989) extension of Wold’s (1982) original PLSPM algorithm, which belongs to the family of (alternating) least squares algorithms (Mateos-Aparicio 2011). PLSPM estimates model parameters such that the model’s residual variances are minimized (Jöreskog and Wold 1982). To achieve this aim, PLSPM carries out two computational stages sequentially. The first stage returns the latent variable scores as weighted sums of their associated indicators, using either correlation weights (Mode A) or regression weights (Mode B) per measurement model. Correlation weights are technically equivalent to zero-order correlations between a latent variable and each of its assigned indicators. Regression weights result from regressing a latent variable on its associated indicators. In this first stage, an iterative four-step algorithm is used to estimate the weights per measurement model. After convergence (i.e., when the sum of the absolute changes in the weights between iterations becomes very small; e.g., < 0.0000001), the weights are used to compute the latent variable scores as linear combinations of their indicators. The second stage uses the latent variable scores as input in a series of ordinary least squares regressions to estimate the final outer loadings (Eq. 1), outer weights (Eq. 2), structural model path coefficients (Eq. 3), and R2 values of the dependent latent variables. The second stage is, thus, non-iterative and simply based on the latent variable scores obtained from the first stage. Consequently, the first stage is the most crucial in PLSPM (Hanafi 2007).
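The two stages described above can be sketched for a simple two-block model as follows. This is an illustrative toy implementation only (Mode A weights, centroid inner scheme, invented data), not a substitute for a full PLSPM program:

```python
import numpy as np

def standardize(x):
    return (x - x.mean(axis=0)) / x.std(axis=0)

def plspm_two_blocks(X1, X2, tol=1e-7, max_iter=500):
    """Sketch of PLSPM for a two-block model: Stage 1 iterates Mode A
    (correlation) weights with a centroid inner scheme; Stage 2 runs a
    single OLS regression on the resulting composite scores."""
    X1, X2 = standardize(X1), standardize(X2)
    n = X1.shape[0]
    w1, w2 = np.ones(X1.shape[1]), np.ones(X2.shape[1])
    for _ in range(max_iter):
        # Step 1: outer estimation -- composites from current weights
        g1, g2 = standardize(X1 @ w1), standardize(X2 @ w2)
        # Step 2: inner estimation (centroid scheme) -- a proxy for each
        # latent variable built from its adjacent composite
        s = np.sign(np.corrcoef(g1, g2)[0, 1])
        t1, t2 = s * g2, s * g1
        # Step 3: Mode A update -- correlations of indicators with proxy
        w1_new, w2_new = X1.T @ t1 / n, X2.T @ t2 / n
        w1_new /= np.linalg.norm(w1_new)
        w2_new /= np.linalg.norm(w2_new)
        # Step 4: convergence check on the change in the weights
        done = np.abs(np.r_[w1_new - w1, w2_new - w2]).sum() < tol
        w1, w2 = w1_new, w2_new
        if done:
            break
    # Stage 2: OLS regression of g2 on g1 yields the path coefficient
    g1, g2 = standardize(X1 @ w1), standardize(X2 @ w2)
    b = (g1 @ g2) / (g1 @ g1)
    return w1, w2, b

# Invented toy data with a positive structural relationship
rng = np.random.default_rng(1)
eta = rng.standard_normal(300)
X1 = np.column_stack([eta + rng.standard_normal(300) for _ in range(3)])
X2 = np.column_stack([0.6 * eta + rng.standard_normal(300) for _ in range(3)])
w1, w2, b = plspm_two_blocks(X1, X2)
```

With more than two blocks, the inner proxy of each latent variable aggregates all adjacent composites, and each block may use Mode A or Mode B; the overall alternation between outer and inner estimation remains the same.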

GSCA aims to minimize a single optimization criterion for the entire model. Let zi denote a vector of indicators measured on a single observation of a sample of N observations (i = 1, …, N). To estimate the weights (W) as well as the path coefficients and loadings (A) in Eq. (5), GSCA seeks to minimize the sum of all squared residuals (ei) over the N observations. This is equivalent to minimizing the following least squares criterion:

$$\varphi = \sum_{i = 1}^{N} \mathbf{e}_i'\mathbf{e}_i = \sum_{i = 1}^{N} (\mathbf{V}\mathbf{z}_i - \mathbf{A}\mathbf{W}\mathbf{z}_i)'(\mathbf{V}\mathbf{z}_i - \mathbf{A}\mathbf{W}\mathbf{z}_i),$$
(7)

with respect to W and A.

For this purpose, GSCA estimates all parameters in one stage, using an iterative two-step algorithm known as alternating least squares (ALS; De Leeuw et al. 1976). The ALS algorithm divides the entire set of parameters into two subsets—W and A. The algorithm begins by assigning arbitrary initial values to W and A, and subsequently carries out two steps per iteration. The first step obtains the least squares estimates of W by minimizing Eq. (7) with respect to W only, temporarily treating A as fixed. The second step obtains the least squares estimates of A by minimizing the same criterion with respect to A only, while holding W constant. The two steps are repeated until convergence; for example, until the change in the criterion value between iterations becomes smaller than a pre-determined threshold (e.g., 0.0001). We refer to Hwang and Takane (2014, Appendix 2.1) for a detailed description of the ALS algorithm.
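A simplified illustration of this alternating scheme is sketched below. The A-step is the exact least squares update given W; for brevity, the W-step here uses a small gradient step on the criterion of Eq. (7) rather than the exact per-parameter least squares update of Hwang and Takane's algorithm, so this is only a rough sketch of the idea on invented data:

```python
import numpy as np

def gsca_als_sketch(Z, W0, mask, n_iter=100, lr=1e-3):
    """Illustrative ALS-style minimization of the criterion in Eq. (7).
    mask encodes which indicators belong to which composite. NOT the
    actual GSCA algorithm: the W-step is a gradient step, not the exact
    least squares update of Hwang and Takane (2014, Appendix 2.1)."""
    p = Z.shape[0]
    W = W0.copy()
    for _ in range(n_iter):
        G = W @ Z                               # composite scores
        V = np.vstack([np.eye(p), W])
        # Step 1: exact least squares update of A = [C; B], W held fixed
        A = (V @ Z @ G.T) @ np.linalg.pinv(G @ G.T)
        # Step 2: gradient step on W, A held fixed
        E = V @ Z - A @ G                       # stacked residuals e_i
        grad = 2 * E[p:, :] @ Z.T - 2 * A.T @ E @ Z.T
        W = (W - lr * grad) * mask              # keep the zero pattern
        W /= (W @ Z).std(axis=1, keepdims=True) # unit-variance composites
    phi = np.sum((np.vstack([np.eye(p), W]) @ Z - A @ (W @ Z)) ** 2)
    return W, A, phi

# Invented toy data: 4 indicators, 2 composites of 2 indicators each
rng = np.random.default_rng(0)
Z = rng.standard_normal((4, 100))
mask = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
W, A, phi = gsca_als_sketch(Z, mask / 2, mask)
```

The key point the sketch conveys is structural: both parameter subsets are updated against the same global criterion, which is what makes GSCA's estimation monotone and its convergence behavior easy to reason about.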

A main difference in parameter estimation is that GSCA optimizes a single criterion to estimate all parameters concurrently and utilizes all information available from the entire system of equations. In contrast, PLSPM does not involve a single criterion, but rather splits its parameters into two sets and estimates each set iteratively, using a subset of equations at a time that takes the results of the other subset as input. For these reasons, GSCA is a full-information method, whereas PLSPM is a limited-information method (Tenenhaus 2008). In general, full-information methods are known to be more efficient under correct model specification (Antonakis et al. 2010; Fomby et al. 2012, Chapter 22), whereas limited-information methods tend to be robust to model misspecification (Bollen et al. 2007; Gerbing and Hamilton 1994). In addition, the estimation procedure of GSCA via ALS appears technically more straightforward and easier to understand than that of PLSPM, which has been criticized for its complexity (e.g., McDonald 1996; Tenenhaus 2008). When using the PLSPM algorithm, for instance, researchers must choose between correlation weights (Mode A) and regression weights (Mode B) per measurement model. However, which choice best supports certain model estimation objectives remains a subject of ongoing research (Dijkstra 2017). Initial results in this direction by Becker et al. (2013a) substantiate that correlation weights (Mode A) produce higher out-of-sample predictive power under a broad range of conditions and better parameter accuracy when sample sizes are small and the model’s effect sizes are moderate to strong. Most importantly, however, the absence of a single optimization criterion in PLSPM makes it difficult to impose certain constraints (e.g., equality constraints) on parameters or to fix specific parameters (Tenenhaus 2008). In contrast, GSCA allows researchers to impose parameter constraints in its estimation procedure (e.g., Hwang and Takane 2014, Chapter 3).

PLSPM and GSCA have recently been extended to address the most common criticism of composite-based SEM that it has no formal way of modeling errors in indicators (e.g., Bentler and Huang 2014; Takane and Hwang 2018), although extracting a weighted composite from a set of indicators contributes to accounting for measurement error (Gleason and Staelin 1973; Henseler et al. 2014; Rigdon 2012). For example, recent research has brought forward consistent partial least squares (PLSc; Dijkstra 2010; Dijkstra and Henseler 2015). The method follows a composite modeling logic but mimics a common factor model (Sarstedt et al. 2016). To do so, the method first computes the model parameters using the standard PLSPM algorithm and correlation weights to obtain the results for the outer loadings, outer weights, path coefficients, and R2 values. Then, PLSc corrects these estimates for attenuation by using the constructs’ reliability coefficients ρA (Dijkstra and Henseler 2015):

$$\rho_{\text{A}} = (h'h)^2 \cdot \frac{h'\left(S - \operatorname{diag}(S)\right)h}{h'\left(hh' - \operatorname{diag}(hh')\right)h},$$
(8)

where $h$ represents the estimated outer weights vector of the latent variable and S is the empirical covariance matrix of the latent variable’s indicators. PLSc also employs ρA to compute adjusted outer loadings $\hat{c}$ as follows:

$$\hat{c} = h \cdot \frac{\sqrt{\rho_{\text{A}}}}{h'h}.$$
(9)

In the structural model, PLSc adjusts the PLSPM correlations $\operatorname{corr}(\gamma_i, \gamma_j)$ between all pairs of latent variables $\gamma_i$ and $\gamma_j$ as follows:

$$\operatorname{corr}(\tilde{\gamma}_i, \tilde{\gamma}_j) = \frac{\operatorname{corr}(\gamma_i, \gamma_j)}{\sqrt{\rho_{\text{A}}(\gamma_i) \cdot \rho_{\text{A}}(\gamma_j)}}.$$
(10)

Then, PLSc uses the matrix of disattenuated correlations $\operatorname{corr}(\tilde{\gamma}_i, \tilde{\gamma}_j)$ to estimate the adjusted path coefficients and the R2 value of each dependent latent variable by means of ordinary least squares regressions.
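The corrections in Eqs. (8)–(10) are simple enough to sketch directly. The toy input below (two tau-equivalent indicators with equal weights) is invented purely for illustration:

```python
import numpy as np

def off_diag(M):
    """Zero out the diagonal of a square matrix."""
    return M - np.diag(np.diag(M))

def rho_A(h, S):
    """Reliability coefficient rho_A of Eq. (8); h is the outer weight
    vector, S the empirical indicator covariance matrix."""
    num = h @ off_diag(S) @ h
    den = h @ off_diag(np.outer(h, h)) @ h
    return (h @ h) ** 2 * num / den

def adjusted_loadings(h, rho):
    """Adjusted loadings of Eq. (9)."""
    return h * np.sqrt(rho) / (h @ h)

def disattenuated_corr(r_ij, rho_i, rho_j):
    """Corrected latent variable correlation of Eq. (10)."""
    return r_ij / np.sqrt(rho_i * rho_j)

# Invented example: two indicators with population loading 0.8 each
h = np.array([0.5, 0.5])
S = np.array([[1.0, 0.64],
              [0.64, 1.0]])
rho = rho_A(h, S)                       # -> 0.64 for this toy input
c_hat = adjusted_loadings(h, rho)       # -> array([0.8, 0.8])
```

In this toy case, the adjusted loadings recover the population loading of 0.8 that generated the off-diagonal covariance of 0.64, which is exactly the disattenuation the method is designed to achieve.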

To mimic common factor model results in a GSCA context, Hwang et al. (2017) proposed GSCAM that includes both common and unique parts of each indicator, under the assumption that the unique parts are equivalent to measurement errors, as postulated in common factor analysis or factor-based SEM. Like GSCA, GSCAM estimates all parameters simultaneously, taking into account measurement errors.

Both PLSc and GSCAM provide parameter estimates comparable to those of factor-based SEM. Thus, researchers may use these extensions when factor-based SEM does not converge or converges to improper solutions, for example, because of small sample sizes or complex model specifications. Nonetheless, a main difference between PLSc and GSCAM is that GSCAM does not require the basic design in model specification and parameter estimation. The basic design, which requires each indicator to be assigned to exactly one latent variable, can often be restrictive in practice, leading to the exclusion of multidimensional latent variables that have been well studied in the literature (Asparouhov and Muthén 2009). For instance, multitrait–multimethod models (Campbell and Fiske 1959) and latent growth curve models (Meredith and Tisak 1990; Duncan et al. 2013) include multidimensional latent variables. In this regard, GSCAM seems to be more flexible than PLSc. Irrespective of this, rather than mimicking factor-based SEM results, researchers should generally revert to the much more widely recognized and validated factor-based SEM when estimating factor models (Hair et al. 2017a).

2.3 Results evaluation

Differences in model estimation entail distinct model evaluation criteria to be used in PLSPM and GSCA, which are well documented in the extant literature (e.g., Hair et al. 2017b; Hwang and Takane 2014; see Table 1).

Table 1 Results assessment

For PLSPM, the first step in results evaluation involves examining the measurement models. For reflective measurement models, researchers need to assess the indicator and construct reliabilities, convergent validity, and discriminant validity. Formative measurement models need to be assessed with regard to convergent validity, multicollinearity, and the significance and relevance of the indicator weights (Sarstedt et al. 2017). The second step in PLSPM-based results evaluation considers the structural model. This step focuses on the significance and relevance of the path coefficients and the model’s explanatory power (i.e., the R2) as well as its predictive power (e.g., using PLSpredict; Shmueli et al. 2016, 2019). Researchers have also proposed various criteria for assessing a PLS path model’s goodness-of-fit (e.g., the GFI and SRMR; Lohmöller 1989; Henseler et al. 2014). However, recent research calls the appropriateness of these metrics and their proposed thresholds into question, because the metrics do not align with the functional principles of the PLSPM algorithm (Hair et al. 2019b). Instead, researchers should focus on predictive model evaluation, which conforms to PLSPM’s causal-predictive nature (Jöreskog and Wold 1982). Researchers should also consider different model configurations and engage in (predictive) model comparisons (Sharma et al. 2019a, b).

The assessment of GSCA results (see Hwang and Takane 2014) can be carried out based on global fit criteria such as FIT and AFIT. These criteria represent a form of average R2 value across both the indicators in the measurement model and the dependent latent variables in the structural model. Moreover, the GFI and SRMR are global fit measures that consider the difference between the sample covariance matrix and the model-implied covariance matrix. Similar to FIT and AFIT, the indices FITM and FITS indicate how much of the variance of the indicators and the latent variables, respectively, is on average accounted for by the measurement and the structural model. GSCA also supports the assessment of each measurement model’s composite reliability (Ryoo and Hwang 2017), as well as the predictability of the entire model or sub-models via cross-validation (Cho et al. 2019). Furthermore, many of the aforementioned local fit criteria that PLSPM adopts can also be used for GSCA (Hwang and Takane 2014, Chapter 2).
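One common formulation of the FIT index, sketched here in the notation of Eq. (5) (see Hwang and Takane 2014 for the exact definition and the complexity-adjusted AFIT), is the proportion of the total variance of all indicators and composites accounted for by the model:

```python
import numpy as np

def gsca_fit(Z, W, A):
    """Sketch of the FIT index: 1 minus the ratio of residual sum of
    squares to the total sum of squares of V z, using Eq. (5) notation."""
    V = np.vstack([np.eye(Z.shape[0]), W])
    E = V @ Z - A @ (W @ Z)
    return 1.0 - np.sum(E ** 2) / np.sum((V @ Z) ** 2)

# Invented toy data; A is the least squares solution given W
rng = np.random.default_rng(3)
Z = rng.standard_normal((4, 60))
W = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
V = np.vstack([np.eye(4), W])
G = W @ Z
A = (V @ Z @ G.T) @ np.linalg.pinv(G @ G.T)
fit = gsca_fit(Z, W, A)
```

With a least squares A, FIT lies between 0 and 1, with higher values indicating that more of the total variance is explained by the model.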

Since PLSPM and GSCA are non-parametric, both methods rely on resampling methods, such as bootstrapping (Efron 1979, 1982), to obtain the parameters’ standard errors (e.g., Chin 2001; Hwang and Takane 2004). These allow for computing test statistics or confidence intervals, which facilitate testing the significance of path coefficients and other model parameters of interest (most notably indicator weights).
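A minimal sketch of the bootstrap procedure for a single statistic might look as follows; the "path coefficient" here is simply a correlation between two columns of invented data, standing in for a composite-to-composite path:

```python
import numpy as np

def bootstrap_se(X, stat, n_boot=500, seed=0):
    """Nonparametric bootstrap standard error of a statistic computed
    on the rows (observations) of X: resample rows with replacement,
    recompute the statistic, and take the standard deviation."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    draws = [stat(X[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    return np.std(draws, ddof=1)

# Invented example: standard error of a correlation (a stand-in for a
# path coefficient between two composites)
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 2))
X[:, 1] += 0.5 * X[:, 0]
path = lambda D: np.corrcoef(D[:, 0], D[:, 1])[0, 1]
se = bootstrap_se(X, path)
```

The bootstrap distribution of the statistic can also be used directly to form percentile confidence intervals instead of relying on a normal approximation with the estimated standard error.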

Even though the criteria differ, results evaluation in both PLSPM and GSCA puts strong emphasis on the explained variance of the model’s dependent constructs. In addition, while GSCA emphasizes goodness-of-fit testing, researchers using PLSPM have recently put greater emphasis on prediction-oriented model assessment (Shmueli et al. 2019). We expect that these views will converge in the future in order to exploit composite-based SEM’s causal-predictive capabilities.

3 Concept analysis

3.1 Methodology

To identify dominant topics that characterize the joint PLSPM and GSCA research domain, we apply a combination of semantic and relational extraction from text, referred to as Leximancer (Smith and Humphreys 2006). The approach has been used in various fields including communication studies (Chevalier et al. 2018), marketing (e.g., Babin and Sarstedt 2019; Fritze et al. 2018; Wilden et al. 2017), and different areas of life sciences (e.g., Day et al. 2018; Kilgour et al. 2019; Rigo et al. 2018) to identify dominant themes in research streams.

In its first stage (semantic extraction), the method uses word frequencies and co-occurrences to produce a ranked list of lexical terms. This list seeds a bootstrapping algorithm, which extracts a set of classifiers from the text by iteratively extending the seed word definitions. The resulting weighted term classifiers are referred to as concepts and represent words that carry related meanings (Smith and Humphreys 2006). In its second stage (relational extraction), the method uses the concepts identified in the previous stage to classify text segments (typically every two or three sentences; Leximancer 2018). Specifically, using the relative concept co-occurrence frequency as input, the method generates a two-dimensional concept map based on a variant of the spring-force model for the many-body problem (Chalmers and Chitson 1992). The connectedness of each concept in the underlying network is used to group the concepts into higher-level themes, which aid interpretation of the network’s structure. Themes typically consist of several highly connected concepts, which can be used to characterize the corresponding region of the network—as visualized in the concept map. When themes comprise only a single concept, they usually appear as isolates in the border region of the concept map.
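The relational-extraction idea of counting concept co-occurrences within text segments can be illustrated in a few lines of code. This is a minimal sketch of the general principle only, not Leximancer's actual algorithm, and the segment texts and concept list are invented:

```python
from collections import Counter
from itertools import combinations

def concept_cooccurrence(segments, concepts):
    """Count how often pairs of concepts appear in the same text
    segment. A co-occurrence matrix like this is the input to the
    map-layout stage; real systems use weighted classifiers rather
    than plain substring matching."""
    counts = Counter()
    for seg in segments:
        present = sorted(c for c in concepts if c in seg.lower())
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts

# Invented example "segments" (in practice, every two or three sentences)
segments = [
    "The latent variable model fits the data well.",
    "Model fit indices and the latent structure were assessed.",
    "Bootstrap resampling yields standard errors.",
]
counts = concept_cooccurrence(segments, {"latent", "model", "fit"})
```

Concept pairs with high counts relative to chance end up close together in the two-dimensional map, which is what allows highly connected concepts to be grouped into themes.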

3.2 Data

For the analysis, we first included all 84 articles used in Khan et al.’s (2019) network analysis of the PLSPM research domain. In the next step, we used the Web of Science (WoS) to retrieve additional articles that deal with GSCA and other composite-based SEM methods. Specifically, we entered the following search query into the WoS search engine to find publications across all the databases: “composite-based SEM” OR “composite-based structural equation modeling” OR “GSCA” OR “GESCA” OR “generalized structure component analysis”. To remain consistent with Khan et al.’s (2019) list, we considered all publications from 1965 to early 2017. This second search initially retrieved 24 articles, which three professors proficient in SEM then independently classified. Relevant articles identified in the second search primarily deal with generalized canonical correlation analysis (e.g., Tenenhaus and Tenenhaus 2014; Tenenhaus et al. 2015), GSCA (Hwang and Takane 2004), and its various extensions (e.g., Hwang et al. 2007, 2010).

As a result, our analysis considers 108 papers published between 1979 and 2017. Most of these articles were published in Journal of Business Research (10 articles, 9.26%), Long Range Planning (8 articles, 7.41%), Psychometrika (7 articles, 6.48%), Computational Statistics & Data Analysis, and Industrial Management & Data Systems (both 6 articles, 5.56%), showing that composite-based SEM methods have a strong standing in both prominent applied journals and renowned statistics journals. A detailed breakdown of the publication years shows that the field experienced a sharp increase in publications in 2014 and later. Specifically, 64 of the 108 articles (59.26%) stem from this time period. Therefore, we performed (1) an analysis of all papers, and (2) separate analyses of the time periods 1979–2013 and 2014–2017. We first performed several training runs on the overall data, separated by time periods, to identify generic concepts that do not offer any insights into the research domain. As a result of these training runs, we excluded concepts such as “article”, “number”, “paper”, “table”, and “use” from the subsequent analyses. In addition, we removed all name-like concepts such as “Chin” and “Wold”.

4 Results

Table 2 shows the themes (i.e., groups of concepts) extracted from the analysis of all papers and by time periods, including each theme’s number of hits per analysis. The two most dominant themes are latent and analysis. Also, from 1979 to 2013, the theme model played a particularly important role, while from 2014 to 2017, effects and value became relevant themes.

Table 2 Themes

Table 3 presents the breakdown of the themes and corresponding concepts, showing all themes with more than one concept per theme (i.e., other than the theme itself) in any of the analyses, sorted by overall hits. Figure 1 shows the concept map resulting from the analyses, showing all the derived concepts and their groupings into themes. Finally, Fig. 2 displays the concept map from 1979 to 2013, while Fig. 3 shows the concept map from 2014 to 2017. The dots in each of the maps represent the concepts, while the circles represent the themes. The size of each circle indicates the number of concepts belonging to each theme, thereby also defining the boundaries to neighboring themes. The themes are heat-mapped to indicate importance. That is, a theme comprising many concepts that are mentioned frequently within the textual data is considered important and appears in red in the map. The second most important theme appears in orange, and so on according to the color wheel. Similarly, the size of a concept’s dot reflects its connectivity in the concept map. The larger the dot, the more often the concept is coded in the text along with the other concepts in the map. In addition, distances between the dots indicate how closely the concepts are related. For example, concepts that often appear together in the same text element tend to settle near one another in the map space (e.g., empirical and theory in the effects theme).

Table 3 Major themes and concepts
Fig. 1
figure 1

Concept map (all papers)

Fig. 2 Concept map (1979–2013)

Fig. 3 Concept map (2014–2017)

The analysis yields the dominant theme latent, which comprises a multitude of concepts related to measurement models (Table 3; Fig. 1). Contrasting the two time periods shows that earlier research put greater emphasis on the distinction between reflective and formative measurement models (Table 3; Figs. 2, 3). In fact, recent research in psychometrics has seen considerable debate regarding the nature and applicability of formative measurement (e.g., Aguirre-Urreta et al. 2016; Bentler 2016; Howell and Breivik 2016), which has also impacted the way methodological research on composite-based SEM uses these concepts and related terminologies. For example, Rigdon (2016, p. 601) notes that “the terms ‘formative’ and ‘reflective’ only obscure the statistical reality” and that researchers should rather distinguish between common factor proxies and composite proxies, and between regression weighted composites and correlation weighted composites (Ryoo and Hwang 2017). Similarly, Henseler et al. (2016a, p. 6) avoid the distinction between reflective and formative measurement, instead noting that “the specification of the measurement model entails decisions for composite or factor models”. Sarstedt et al. (2016), however, argue that the distinction is still relevant in the context of measurement conceptualization, which needs to be distinguished from how SEM methods treat the measures statistically.

Another prominent theme is analysis, which strongly relates to method comparisons, spanning both time periods (Table 3; Fig. 1). Research in the field has a long-standing tradition of comparing composite-based with factor-based SEM methods using simulated data. The vast majority of these studies used factor model populations as the benchmark against which the parameter estimates from composite-based SEM methods were evaluated (e.g., Goodhue et al. 2012; Rönkkö and Evermann 2013). Researchers have long warned that such comparisons are akin to “comparing apples with oranges” (Marcoulides et al. 2012, p. 725), and only more recent studies have evaluated composite-based SEM methods on the grounds of correctly specified population (i.e., composite) models (Hair et al. 2017c; Sarstedt et al. 2016). This strand of research is also reflected in the emergence of the theme performance in more recent studies.

The third most prominent theme labeled effects shows a divergent development over time. Earlier research related to this theme had a stronger focus on the analysis of interaction effects as evidenced in several prominent publications on this topic (e.g., Chin et al. 2003; Henseler and Chin 2010; Henseler et al. 2012). More recent research, however, focuses on other model specification types such as mediating effects (e.g., Nitzl et al. 2016) and hierarchical component models (e.g., Cheah et al. 2019; Ciavolino et al. 2015; Sarstedt et al. 2019a).

The analysis with regard to the theme model shows that the assessment of unobserved heterogeneity was a dominant concept in earlier research (Table 3; Fig. 2). In this time period, research has brought forward several latent class approaches for capturing unobserved heterogeneity, including FIMIX-PLS (e.g., Hair et al. 2016; Sarstedt et al. 2019b), fuzzy clusterwise GSCA (Hwang et al. 2007), and PLS-POS (Becker et al. 2013b). More recently proposed latent class procedures in PLSPM—PLS-GAS (Ringle et al. 2014) and PLS-IRRS (Schlittgen et al. 2016)—have not reinforced the heterogeneity concept in current research (Table 3; Fig. 3), despite ongoing research in related fields such as multigroup comparisons (Henseler et al. 2016b). A potential reason for this surprising finding could be that new guidelines for PLSPM analyses (Henseler et al. 2016a; Henseler 2017) neglect the concept of unobserved heterogeneity, despite its obvious importance to ensure the validity of results (Becker et al. 2013b; Hair et al. 2019b; Jedidi et al. 1997).

The analysis also produced the theme value, which, in recent research, relates to the concepts of customer and satisfaction. A more detailed analysis shows that many of the studies in the field use customer or job satisfaction data to illustrate methodological extensions such as quantile composite-based SEM (Davino and Vinzi 2016), GSCA with uniqueness terms (Hwang et al. 2017), or cross-validation in PLSPM (Reguera-Alvarado et al. 2016).

Several of the themes identified in the analysis only appear as isolates in either of the time periods (Table 3). For example, the theme bootstrapping relates to the studies by Kock (2016), Rönkkö et al. (2015), and Streukens and Leroi-Werelds (2016) on statistical inference in PLSPM. The theme test refers to goodness-of-fit testing in PLSPM, which has experienced renewed interest in recent research. For example, while Henseler et al. (2016a) and Henseler (2017) call for the routine use of model fit measures such as SRMR, Sarstedt et al. (2016) comment critically on the applicability of measures grounded in a comparison of empirical and model-implied correlation matrices in a PLSPM context (also see Hair et al. 2019a). Finally, the theme validity appears in the overall analysis, pointing to researchers’ ongoing interest in related concepts such as discriminant validity (Henseler et al. 2015), which extends to very recent research (Franke and Sarstedt 2019).
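The resampling logic underlying the bootstrapping theme can be sketched in a few lines. The following is a minimal percentile-bootstrap illustration for a single path coefficient; the toy data are invented, and the equally weighted composites stand in for the iteratively estimated weights that PLSPM and GSCA would actually use.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data: two blocks of three indicators each
n = 200
eta1 = rng.normal(size=n)
eta2 = 0.5 * eta1 + rng.normal(scale=0.8, size=n)
X = np.column_stack([eta1 + rng.normal(scale=0.6, size=n) for _ in range(3)])
Y = np.column_stack([eta2 + rng.normal(scale=0.6, size=n) for _ in range(3)])

def path_coefficient(X, Y):
    """Correlation between equally weighted composites (real PLSPM/GSCA
    weights are estimated iteratively; equal weights keep the sketch short)."""
    return np.corrcoef(X.mean(axis=1), Y.mean(axis=1))[0, 1]

estimate = path_coefficient(X, Y)

# Percentile bootstrap: resample observations with replacement,
# re-estimate the coefficient in each resample
B = 2000
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    boot[b] = path_coefficient(X[idx], Y[idx])

lower, upper = np.quantile(boot, [0.025, 0.975])
print(f"estimate={estimate:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

The debates cited above concern exactly which variant of this generic scheme (e.g., how resamples are signed and aligned, and which interval type is reported) yields trustworthy inference.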

5 Discussion

Fostered by recent methodological advances and the availability of easy-to-use software programs, composite-based SEM methods—particularly PLSPM and GSCA—have gained massive traction in recent years (Hair et al. 2019a; Ringle 2019). Methodological research has continually advanced PLSPM and GSCA to accommodate a broad range of data and model constellations. Recent examples include methods for assessing a model’s predictive power (Shmueli et al. 2016, 2019), addressing endogeneity (Hult et al. 2018; Sarstedt et al. 2019a), and model comparisons (Sharma et al. 2019a, b). At the same time, conceptual considerations have given composite-based SEM methods significant tailwind. Specifically, researchers have started questioning the reflex-like application of factor-based methods, calling for a broader scope, which also considers composites as an integral element of measurement. For example, Bentler and Huang (2014, p. 138) note that “composite variable models are probably far more prevalent overall than latent variable models”. Similarly, Grace and Bollen (2008, p. 210) note that “composites have, we believe, great potential to facilitate our ability to create models that are empirically meaningful and also of theoretical relevance”. Rhemtulla et al. (2019) have recently echoed these observations, noting that latent variable models have been over-applied in psychiatry, clinical psychology, and various other fields of life sciences (also see Henseler et al. 2014). These observations certainly pave the way for composite-based SEM methods, which we expect to play an increasingly important role in all fields of science.

Composite-based SEM methods’ prominence is fostered by increasing doubts about long-held beliefs regarding the factors that differentiate composite- from factor-based SEM methods. For example, researchers have relativized PLSPM’s small sample size capabilities by identifying concrete situations in which the method performs well vis-à-vis other SEM methods when limited data are available (e.g., Goodhue et al. 2012; Hair et al. 2017c, 2019b). Furthermore, many researchers have criticized composite-based SEM methods for their alleged inability to reduce measurement error (Rönkkö and Evermann 2013), a criticism that has since been debunked (Henseler et al. 2014). More importantly, however, Rigdon et al. (2019) show that factor-based SEM methods induce a significant degree of measurement uncertainty, which is a “parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand” (JCGM/WG1 2008, Sect. 2.2.3). Uncertainty quantifies a researcher’s lack of knowledge about the value of the measurand and directly blurs the relationship between latent variables and the concepts they seek to represent. By acknowledging uncertainty as an integral part of any measurement, “researchers would derive substantial benefit from a full accounting, and a fresh debate, relating to the compromises involved in using either common factors or weighted composites as stand-ins or proxies for conceptual variables” (Rigdon et al. 2019, p. 440). We expect that this perspective will change the nature of the debate regarding the relative merits of factor- vs. composite-based SEM methods in the long run (also see Rigdon et al. 2017).

Our concept mapping of composite-based SEM research illustrates the field’s maturation. The results suggest that researchers have become aware of the conceptual differences between composite and factor models and their implications for the methods’ performance. Specifically, researchers have started evaluating composite-based methods under (composite) models that are consistent with what the methods assume, finding support for their consistency (Hair et al. 2017c). Sarstedt et al. (2016) investigated the robustness of covariance structure analysis and PLSPM when incorrectly applied to composite and factor models, respectively. These authors found that, in this situation, PLSPM yields more accurate parameter estimates on average than covariance structure analysis, indicating that PLSPM is more robust when applied to models whose latent variables are incompatible with its composite conception (i.e., common factors). We expect that future studies will build on this research and further examine the methods’ performance under (in)consistent model specifications.

Furthermore, our analysis documents an increased interest in model evaluation metrics, such as those for assessing discriminant validity (Henseler et al. 2015) and internal consistency reliability (Dijkstra and Henseler 2015). Researchers have also developed methods for assessing a model’s out-of-sample predictive power (Shmueli et al. 2016, 2019; Cho et al. 2019) and measures for comparing different models in this respect (Sharma et al. 2019b). We expect this development to prevail, as composite-based methods do not follow a strict confirmatory perspective, like factor-based SEM, but adhere to a causal-predictive paradigm. As Hair et al. (2019a, p. 3) note in the context of business research applications, PLSPM “overcomes the apparent dichotomy between explanation—as typically emphasized in academic research—and prediction, which is the basis for developing managerial implications”.

Finally, we would like to envision what we believe are the most pressing challenges in composite-based SEM research. With regard to PLSPM, future research should extend the method’s modeling capabilities to permit relating a manifest variable to multiple constructs simultaneously. GSCA already supports this modeling option, which paves the way for running an explorative composite analysis. In the structural model, modeling advances in PLSPM may support non-recursive models (i.e., circular path relationships), bidirectional relationships, and constrained paths. Also, PLSPM method developments should follow calls to facilitate longitudinal and panel data analyses (Richter et al. 2016). Current treatments of longitudinal data (Roemer 2016) are rather ad hoc and do not truly take time-variant effects into account. Similarly, multilevel modeling in PLSPM is a concern for future methodological developments. The assessment of common methods variance is another issue that researchers must often address in their PLSPM analyses. Even though prior research has tackled this question (Chin et al. 2013), PLSPM lacks methodological support and a straightforward procedure for assessing and treating common methods variance. The same call holds for GSCA. Finally, in the social sciences, researchers usually focus only on confirmatory tests and results evaluations but often neglect the relevance of their models’ predictive power (Shmueli and Koppius 2011). Future research also needs to complement recent efforts to establish predictive model assessment criteria (Shmueli et al. 2016; Sharma et al. 2019b), for example, by developing a test for predictive model comparisons. Similar needs hold for GSCA.
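The predictive assessment called for above rests on a simple holdout principle: estimate on training folds, predict on held-out observations, and compare against a naive benchmark. The following sketch illustrates that principle with invented data; the equally weighted composite and the single endogenous indicator are simplifications, not the full procedure of Shmueli et al. (2016).

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical data: an exogenous indicator block predicts an endogenous indicator
n = 300
X = rng.normal(size=(n, 3))                      # exogenous indicators
score = X.mean(axis=1)                           # equally weighted composite (for brevity)
y = score + rng.normal(scale=0.6, size=n)        # endogenous indicator

def holdout_rmse(X, y, k=10):
    """k-fold out-of-sample RMSE of a linear prediction through the
    composite score; a simplified stand-in for a full predictive assessment."""
    folds = np.array_split(rng.permutation(n), k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(np.arange(n), fold)
        s_train, s_test = X[train].mean(axis=1), X[fold].mean(axis=1)
        slope, intercept = np.polyfit(s_train, y[train], 1)  # OLS on training fold
        errors.append(y[fold] - (intercept + slope * s_test))
    e = np.concatenate(errors)
    return np.sqrt(np.mean(e**2))

rmse_model = holdout_rmse(X, y)
rmse_naive = np.sqrt(np.mean((y - y.mean()) ** 2))  # mean-value benchmark
print(rmse_model, rmse_naive)
```

A test for predictive model comparisons, as envisioned in the text, would essentially formalize whether differences between such out-of-sample error statistics are statistically meaningful.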

In GSCA, it would likewise be fruitful to extend the modeling capabilities and to consider alternative optimization criteria for the algorithm. For example, GSCA should follow recent developments in PLSPM (Hult et al. 2018) and identify means for dealing with endogeneity, which generally refers to situations in which independent variables are correlated with residual terms in either the measurement or structural model. One may consider replacing the current ordinary least squares estimator with an instrumental variable estimator in each step of the ALS algorithm. In practice, auxiliary covariates (e.g., gender, age, ethnicity) often lead to heterogeneous subgroups of observations. GSCA can be extended to consider such covariate-dependent heterogeneity, that is, to examine whether the relationships among indicators and latent variables vary across subgroups differentiated by covariates. This extension may be combined with recursive partitioning (e.g., Strobl et al. 2009) to capture such heterogeneity. In addition, it would be desirable to develop an integrated approach to GSCA and GSCAM in order to simultaneously accommodate the two statistical representations of latent variables—common factors and composites. Such an integrated framework could contribute to bridging the two SEM approaches. The same holds for PLSPM. Lastly, future endeavors are needed to provide a combined view of PLSPM and GSCA, for example, by developing a unified model formulation and/or estimation procedure for both composite-based SEM approaches.
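The suggested replacement of ordinary least squares with an instrumental variable estimator can be pictured on a single structural regression with an endogenous regressor. The data-generating values and the instrument z below are hypothetical; the point is only that the two-stage least squares step recovers the structural coefficient where the plain OLS step (as used inside ALS) is biased.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: the regressor x is endogenous because it is
# correlated with the structural residual u; z is a valid instrument.
n = 5000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)  # endogenous regressor
y = 1.0 * x + u                             # true structural coefficient = 1.0

# OLS (what a plain ALS step would use) is biased upward here
b_ols = (x @ y) / (x @ x)

# Two-stage least squares: project x on z, then regress y on the projection
# (no intercepts, since all variables are generated with zero mean)
x_hat = z * ((z @ x) / (z @ z))             # first stage
b_iv = (x_hat @ y) / (x_hat @ x_hat)        # second stage

print(b_ols, b_iv)
```

Embedding this IV step into each update of the ALS algorithm, as the text proposes, would amount to replacing every such least squares regression with its two-stage counterpart wherever a valid instrument is available.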