Introduction

Structural equation modeling (SEM) has become a quasi-standard in marketing research (e.g., Babin et al. 2008; Bagozzi 1994; Hulland 1999), as it allows authors to test complete theories and concepts (Rigdon 1998). Researchers especially appreciate SEM’s ability to assess latent variables at the observation level (outer or measurement model) and test relationships between latent variables on the theoretical level (inner or structural model) (Bollen 1989). When applying SEM, researchers must consider two types of methods: covariance-based techniques (CB-SEM; Jöreskog 1978, 1993) and variance-based partial least squares (PLS-SEM; Lohmöller 1989; Wold 1982, 1985). Although both methods share the same roots (Jöreskog and Wold 1982), previous marketing research has focused primarily on CB-SEM (e.g., Bagozzi 1994; Baumgartner and Homburg 1996; Steenkamp and Baumgartner 2000).

Recently, PLS-SEM application has expanded in marketing research and practice with the recognition that PLS-SEM’s distinctive methodological features make it a possible alternative to the more popular CB-SEM approaches (Henseler et al. 2009). A variety of PLS-SEM enhancements have been developed in recent years, including (1) confirmatory tetrad analysis for PLS-SEM to empirically test a construct’s measurement mode (Gudergan et al. 2008); (2) impact-performance matrix analysis (Slack 1994; Völckner et al. 2010); (3) response-based segmentation techniques, such as finite mixture partial least squares (FIMIX-PLS; Hahn et al. 2002; Sarstedt et al. 2011a); (4) guidelines for analyzing moderating effects (Henseler and Chin 2010; Henseler and Fassott 2010); (5) non-linear effects (Rigdon et al. 2010); and (6) hierarchical component models (Lohmöller 1989; Wetzels et al. 2009). These enhancements expand PLS-SEM’s general usefulness as a research tool in marketing and the social sciences.

Most methodological fields have established regular critical reflections to ensure rigorous research and publication practices, and consequently acceptance, in their domain. While reviews of CB-SEM usage have a long tradition across virtually all business disciplines (e.g., Babin et al. 2008; Baumgartner and Homburg 1996; Brannick 1995; Garver and Mentzer 1999; Medsker et al. 1994; Shah and Goldstein 2006; Shook et al. 2004; Steenkamp and van Trijp 1991), relatively little attention has been paid to assessing PLS-SEM use. Hulland (1999) was the first to review PLS-SEM use with an in-depth analysis of four strategic management studies. His review revealed flaws in the PLS-SEM method’s application, raising questions about its appropriate use in general, and indicating the need for further examination in a more comprehensive assessment. More recently, Henseler et al. (2009) and Reinartz et al. (2009) assessed PLS-SEM use in (international) marketing research but focused only on the reasons for choosing this method.

PLS-SEM requires several choices that, if not made correctly, can lead to improper findings, interpretations, and conclusions. In light of its increasingly widespread application in marketing research and practice, a critical review of PLS-SEM’s use seems timely and warranted. Our objectives with this review are threefold: (1) to investigate published PLS-SEM articles in terms of relevant criteria, such as sample size, number of indicators used, and measures reported; (2) to provide an overview of the interdependencies in researchers’ choices, identify potential problem areas, and discuss their implications; and (3) to provide guidance on preventing common pitfalls in using PLS-SEM.

Our review shows that PLS-SEM’s methodological properties are widely misunderstood, at times leading to the technique’s misapplication, even in top tier journals. Furthermore, researchers often do not apply criteria available for model assessment, sometimes even misapplying the measures. Our guidelines on applying PLS-SEM appropriately have important implications, therefore, for correct application and knowledgeable assessment of PLS-SEM–related research studies.

Not “CB-SEM versus PLS-SEM” but “CB-SEM and PLS-SEM”

Wold (1975) originally developed PLS-SEM under the name NIPALS (nonlinear iterative partial least squares), and Lohmöller (1989) extended it. PLS-SEM was developed as an alternative to CB-SEM that would emphasize prediction while simultaneously relaxing the demands on data and specification of relationships (e.g., Dijkstra 2010; Jöreskog and Wold 1982). The methodological concepts underlying both approaches have been compared in several publications, including those by Barclay et al. (1995), Chin and Newsted (1999), Fornell and Bookstein (1982), Gefen et al. (2011), Hair et al. (2011), Jöreskog and Wold (1982), and Lohmöller (1989).

CB-SEM estimates model parameters so that the discrepancy between the estimated and sample covariance matrices is minimized. In contrast, PLS-SEM maximizes the explained variance of the endogenous latent variables by estimating partial model relationships in an iterative sequence of ordinary least squares (OLS) regressions. An important characteristic of PLS-SEM is that it estimates latent variable scores as exact linear combinations of their associated manifest variables (Fornell and Bookstein 1982) and treats them as perfect substitutes for the manifest variables. The scores thus capture the variance that is useful for explaining the endogenous latent variable(s). Estimating models via a series of OLS regressions implies that PLS-SEM relaxes the assumption of multivariate normality needed for maximum likelihood–based SEM estimations (Fornell and Bookstein 1982; Hwang et al. 2010; Lohmöller 1989; Wold 1982; for a discussion, see Dijkstra 2010). In this context, Lohmöller (1989, p. 64) notes that “it is not the concepts nor the models nor the estimation techniques which are ‘soft,’ only the distributional assumptions.” Furthermore, since PLS-SEM is based on a series of OLS regressions, it has minimum demands regarding sample size and generally achieves high levels of statistical power (Reinartz et al. 2009). Conversely, CB-SEM imposes greater demands on the number of observations; small sample sizes often lead to biased test statistics (e.g., Hu and Bentler 1995), inadmissible solutions (e.g., Heywood cases), and identification problems, especially in complex model set-ups (e.g., Chin and Newsted 1999). Thus, PLS-SEM is suitable for applications where strong assumptions cannot be fully met and is often referred to as a distribution-free “soft modeling approach.”
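
To make the estimation logic concrete, the following sketch outlines the basic iterative PLS-SEM algorithm in the spirit of Lohmöller (1989). It is a minimal illustration rather than production software: it assumes standardized indicator data, uses Mode A (correlation-based) outer weight estimation for every block and the centroid inner weighting scheme, and the function and variable names are our own.

```python
import numpy as np

def standardize(v):
    """Standardize a vector or the columns of a matrix to zero mean and unit variance."""
    return (v - v.mean(0)) / v.std(0)

def pls_sem_scores(X_blocks, inner_paths, max_iter=300, tol=1e-5):
    """Minimal PLS-SEM sketch (Mode A outer estimation, centroid inner scheme).

    X_blocks    : list of (n x k_j) indicator matrices, one block per latent variable
    inner_paths : (J x J) 0/1 matrix; inner_paths[i, j] = 1 if LV i predicts LV j
    Returns standardized latent variable scores and the final outer weights.
    """
    X = [standardize(x.astype(float)) for x in X_blocks]
    w = [np.ones(x.shape[1]) for x in X]            # equal initial outer weights
    linked = (inner_paths + inner_paths.T) > 0      # LVs adjacent in the inner model

    for _ in range(max_iter):
        # Outer approximation: latent variable scores as weighted sums of indicators
        Y = np.column_stack([standardize(x @ wj) for x, wj in zip(X, w)])

        # Inner approximation: centroid scheme uses signs of correlations between linked LVs
        e = np.sign(np.corrcoef(Y, rowvar=False)) * linked
        Z = [standardize(Y @ e[:, j]) for j in range(Y.shape[1])]

        # Outer weight update (Mode A): covariance of each indicator with its inner proxy
        w_new = [x.T @ zj / len(zj) for x, zj in zip(X, Z)]

        # Stop when the outer weights change by less than the stop criterion
        if max(np.abs(wn - wo).max() for wn, wo in zip(w_new, w)) < tol:
            w = w_new
            break
        w = w_new

    scores = np.column_stack([standardize(x @ wj) for x, wj in zip(X, w)])
    return scores, w
```

After convergence, the inner model path coefficients follow from OLS regressions among the resulting latent variable scores, and their significance is assessed with resampling procedures, as discussed later in this paper.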

Consideration of formative and reflective outer model modes is an important issue for SEM (e.g., Diamantopoulos and Winklhofer 2001; Jarvis et al. 2003). While CB-SEM is applicable for formative outer model specifications only under certain conditions (e.g., Bollen and Davies 2009; Diamantopoulos and Riefler 2011), PLS-SEM can almost unrestrictedly handle both reflective and formative measures (e.g., Chin 1998). Furthermore, PLS-SEM is not constrained by identification concerns, even if models become complex, a situation that typically restricts CB-SEM usage (Hair et al. 2011).

These advantages must be considered, however, in light of several disadvantages. For example, the absence of a global optimization criterion implies a lack of measures for overall model fit. This issue limits PLS-SEM’s usefulness for theory testing and for comparing alternative model structures. As PLS-SEM also does not impose any distributional assumptions, researchers cannot rely on the classic inferential framework and thus have to revert to prediction-oriented, non-parametric evaluation criteria as well as resampling procedures to evaluate the partial model structures’ adequacy (e.g., Chin 2010). A further concern is that PLS-SEM parameter estimates are not optimal regarding bias and consistency (Reinartz et al. 2009)—a characteristic frequently referred to as PLS-SEM bias. This bias is more severe in very complex models, since least squares estimators cannot control for the chained effects of errors carried over from one part of the model to another. Only when the number of observations and the number of indicators per latent variable increase to infinity do the latent variable scores (and therefore the parameter estimates) approach the true values. Thus, estimates are asymptotically correct in the qualified sense of consistency at large (Jöreskog and Wold 1982; Lohmöller 1989; Wold 1982).

In light of their distinct statistical concepts, most researchers consider the two approaches to SEM as complementary, a fact that has already been stressed by the two originators of PLS-SEM and CB-SEM, Jöreskog and Wold (1982). In general, PLS-SEM’s weaknesses are CB-SEM’s strengths, and vice versa (e.g., Jöreskog and Wold 1982; Sosik et al. 2009). Thus, neither of the SEM methods is generally superior to the other. Instead, researchers need to apply the SEM technique that best suits their research objective, data characteristics, and model set-up (e.g., Fornell and Bookstein 1982; Gefen et al. 2011; Reinartz et al. 2009).

Review of PLS-SEM research

Our review of PLS-SEM applications in marketing consists of studies published in the top 30 marketing journals identified in Hult et al.’s (2009) journal ranking. This ranking is similar to those by Baumgartner and Pieters (2003), Hult et al. (1997), and Theoharakis and Hirst (2002). All studies published in the 30-year period from 1981 to 2010 were searched for empirical PLS-SEM applications in the field of marketing. We conducted a full text search in the Thomson Reuters Web of Knowledge, ProQuest ABI/INFORM Global, and EBSCO Business Source Premier databases, using the keywords “partial least squares” and “PLS.” We also looked into the online versions of the journals. The search across multiple databases using the same keywords allowed us to verify that we had captured all PLS-SEM articles in the targeted marketing journals.

Since Hult et al.’s (2009) ranking includes interdisciplinary journals that cover several functional business areas (e.g., Management Science, Journal of Business Research, Journal of International Business Studies), articles were screened to identify those in marketing. Papers that drew on PLS-SEM simulation studies and empirical PLS-SEM applications to illustrate methodological enhancements were not considered. Similarly, studies applying path analysis, conventional score-based latent variable regression, and PLS regression were excluded from the sample. Ultimately, a total of 24 journals with relevant articles remained (Table 1). The paper selection process included an initial coding by a senior PhD student. Thereafter, two professors proficient in the technique coded each article independently. The coding agreement on the relevant articles was 92%, which compares well with Shook et al.’s (2004) study. To resolve coding inconsistencies, opinions of other experts were obtained.

Table 1 PLS-SEM studies in the top 30 marketing journals

This search resulted in 204 articles (Table 1) with 311 PLS-SEM estimations, since some articles analyze alternative models and/or use different datasets (e.g., collected in different years and/or countries). The European Journal of Marketing (30 articles, 14.71%), Industrial Marketing Management (23 articles, 11.27%), and Journal of Marketing (17 articles, 8.33%) published the highest numbers of PLS-SEM studies. To assess the growth trend in PLS-SEM use, we conducted a time-series analysis with the number of studies applying this method as the dependent variable. Model estimation using a linear term results in a significant model (F = 25.01, p ≤ 0.01) in which the time effect is significant (t = 5.00, p ≤ 0.01). Next, we used both linear and quadratic time effects. The regression model is significant (F = 35.13, p ≤ 0.01) and indicates that the linear and quadratic time effects jointly explain 72.22% of the variance in the number of PLS-SEM applications. The additionally considered quadratic effect is significant (t = 4.94, p ≤ 0.01), indicating that the use of PLS-SEM in marketing has accelerated over time. Clearly, PLS-SEM has gained popularity over the past decades—most notably, 51 studies appeared in 2010 alone. In contrast, Baumgartner and Homburg’s (1996) analysis of early CB-SEM diffusion provided no indication of its accelerated use prior to their analysis.
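
For readers who wish to replicate this type of growth analysis, the following sketch shows how such a trend regression can be set up. The yearly counts are hypothetical placeholders rather than the counts underlying our review, and the use of the statsmodels package is our own choice.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical yearly counts of PLS-SEM studies, 1981-2010 (placeholder data only)
years = np.arange(1981, 2011)
counts = np.array([0, 1, 0, 1, 1, 2, 1, 2, 2, 3, 2, 3, 4, 3, 5, 4, 6, 5, 7, 6,
                   8, 9, 10, 9, 12, 14, 16, 19, 24, 30])

t = years - years.min() + 1                          # linear time effect
X = sm.add_constant(np.column_stack([t, t ** 2]))    # intercept, linear, and quadratic terms
fit = sm.OLS(counts, X).fit()
print(fit.rsquared, fit.fvalue, fit.tvalues)         # R-squared, F-test, t-values
```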

Critical issues in PLS-SEM applications

Each article was evaluated according to a wide range of criteria that allow PLS-SEM’s critical issues and common misapplications to be identified. The following review focuses on six key issues in PLS-SEM: (1) reasons for using PLS-SEM; (2) data characteristics; (3) model characteristics; (4) outer model evaluation; (5) inner model evaluation; and (6) reporting. Where possible, we also indicate best practices as guidelines for future applications and suggest avenues for further research. In addition to the PLS-SEM application analysis, we also contrast two time periods to assess whether usage has changed over time. In light of Chin’s (1998), Chin and Newsted’s (1999), as well as Hulland’s (1999) influential PLS-SEM articles, published in the late 1990s, we differentiate between studies published before 2000 (39 studies with 69 models) and those published in 2000 and beyond (165 studies with 242 models).

Reasons for using PLS-SEM

Since predominantly covariance-based SEM techniques have been used to estimate models in marketing, PLS-SEM use often requires a more detailed explanation of the rationale for selecting this method (Chin 2010). The most frequently cited reasons relate to data characteristics, such as the analysis of non-normal data (102 studies, 50.00%), small sample sizes (94 studies, 46.08%), and the formative measurement of latent variables (67 studies, 32.84%). Furthermore, researchers stated that applying PLS-SEM was more consistent with their study objective. For example, in 57 studies (27.94%), authors indicated their primary research objective was to explain the variance of the endogenous constructs. This is closely related to the rationale of using the method for exploratory research and theory development, which 35 studies (17.16%) mentioned. The latter two reasons comply with PLS-SEM’s original purpose of prediction in research contexts with rich data and weak theory (Wold 1985). Nevertheless, 25 studies (12.25%) incorrectly justified the use of PLS-SEM by citing its appropriateness for testing well-established complex theories. Further justifications for its use relate to the ability to cope with highly complex models (27 studies, 13.24%) and categorical variables (26 studies, 12.75%). In studies published prior to 2000, the authors mentioned non-normal data, prediction orientation (both p ≤ 0.01), and the use of categorical variables (p ≤ 0.05) significantly more often than did authors of recent studies, suggesting changes over time. Conversely, the formative indicator argument was significantly (p ≤ 0.01) more prevalent in studies published in 2000 and beyond.

Several of these characteristics have been extensively discussed in the methodological literature on PLS-SEM. The sample size argument in particular has been the subject of much debate (e.g., Goodhue et al. 2006; Marcoulides and Saunders 2006). While many discussions on this topic are anecdotal in nature, few studies have systematically evaluated PLS-SEM’s performance when the sample size is small (e.g., Chin and Newsted 1999; Hui and Wold 1982). More recently, Reinartz et al. (2009) showed that PLS-SEM achieves high levels of statistical power—in comparison to its covariance-based counterpart—even if the sample size is relatively small (i.e., 100 observations). Similarly, Boomsma and Hoogland’s (2001) study underscores CB-SEM’s need for relatively large sample sizes to achieve robust parameter estimates. PLS-SEM is therefore generally more favorable with smaller sample sizes and more complex models. However, as noted by Marcoulides and Saunders (2006), as well as Sosik et al. (2009), PLS-SEM is not a silver bullet for use with samples of any size, or a panacea for dealing with empirical research challenges. All statistical techniques require consideration of the sample size in the context of the model and data characteristics, and PLS-SEM is no exception.

Theoretical discussions (Beebe et al. 1998) as well as simulation studies (Cassel et al. 1999) indicate that the PLS-SEM algorithm transforms non-normal data in accordance with the central limit theorem (see also Dijkstra 2010). These studies show that PLS-SEM results are robust when data are highly skewed, even when formative measures are used (Ringle et al. 2009). In contrast, maximum likelihood–based CB-SEM requires normally distributed indicator variables. Since most empirical data do not meet this requirement, several studies have investigated CB-SEM with non-normal data and reported contradictory results (e.g., Babakus et al. 1987; Reinartz et al. 2009). Given the multitude of alternative estimation procedures for CB-SEM, such as weighted least squares and unweighted least squares, it is questionable whether the choice of PLS-SEM over CB-SEM can be justified solely by distribution considerations.

CB-SEM can accommodate formative indicators, but to ensure model identification, researchers must follow rules that require specific constraints on the model (Bollen and Davies 2009; Diamantopoulos and Riefler 2011). These constraints often contradict theoretical considerations, and the question arises whether model design should guide theory or vice versa. In contrast, similar problems do not arise in PLS-SEM, which only requires the constructs to be structurally linked. As a result, PLS-SEM provides more flexibility when formative measures are involved.

Data characteristics

Sample size is a basic PLS-SEM application issue. In line with the frequently noted argument that PLS-SEM works particularly well with small sample sizes, the average sample size in our review (5% trimmed mean = 211.29) is clearly lower than that reported by Shah and Goldstein (2006) in their review of CB-SEM studies (mean = 246.4). The same holds for Baumgartner and Homburg’s (1996) review; these authors reported a median sample size of 180 (median in this study = 159.00). It is interesting to note that several models exhibited very large sample sizes considered atypical in PLS-SEM (Johnson et al. 2006, n = 2,990; Sirohi et al. 1998, n = 16,096; Xu et al. 2010, n = 2,333). Conversely, 76 of 311 models (24.44%) have fewer than 100 observations, with Lee (1994) having the smallest sample size (n = 18). An assessment over time shows that the sample size was higher (albeit not significantly) in recent models (5% trimmed mean = 229.37) than in earlier models (5% trimmed mean = 175.22).

As a popular rule of thumb for robust PLS-SEM estimations, Barclay et al. (1995) suggest using a minimum sample size of ten times the maximum number of paths aiming at any construct in the outer model (i.e., the number of formative indicators per construct) and inner model (i.e., the number of path relationships directed at a particular construct). Although this rule of thumb does not take into account effect size, reliability, the number of indicators, and other factors known to affect power and can thus be misleading, it nevertheless provides a rough estimate of minimum sample size requirements. While most models meet this rule of thumb, 28 of the estimated models (9.00%) do not; on average they are 45.18% below the recommended sample size. Moreover, 82.61% of the models published before 2000 meet this rule of thumb, but the percentage increased significantly (p ≤ 0.01) in more recent models (93.39%). Thus, researchers seem to be more aware of sample size issues in PLS-SEM.
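
Expressed formally (in our own notation), this heuristic sets the minimum sample size as

$$ n_{\min} = 10 \times \max\Bigl(\max_{j} f_j,\; \max_{j} p_j\Bigr), $$

where f_j is the number of formative indicators of construct j and p_j is the number of inner model path relationships directed at construct j. For example, if the largest formative block comprises four indicators and the most complex endogenous construct receives five structural paths, the heuristic suggests at least 10 × 5 = 50 observations; a power analysis for the corresponding partial regression may well indicate a different, and often larger, requirement.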

The use of holdout samples to evaluate the results’ robustness is another area of concern (Hair et al. 2010). Only 13 of 204 studies (6.37%) included a holdout sample analysis. While this may be due to data availability issues, PLS-SEM’s distribution-free character, which relies on resampling techniques such as bootstrapping for significance testing (Henseler et al. 2009), might also be a reason for not using a holdout sample. In light of the PLS-SEM bias, however, substantiating parameter estimates’ robustness through holdout samples is of even greater importance in PLS-SEM than in CB-SEM.

Although prior research has provided evidence of PLS-SEM’s robustness in situations in which data are extremely non-normal (e.g., Cassel et al. 1999; Reinartz et al. 2009), researchers should nevertheless consider the data distribution. Highly skewed data inflate bootstrap standard errors (Chernick 2008) and thus reduce statistical power, which is especially problematic given PLS-SEM’s tendency to underestimate inner model relationships (Wold 1982). Despite this concern, only 19 studies (9.31%) report the extent to which the data are non-normal, and no significant differences were evident over time.

Several researchers stress that PLS-SEM generally works with nominal, ordinal, interval, and ratio scaled variables (e.g., Fornell and Bookstein 1982; Haenlein and Kaplan 2004; Reinartz et al. 2009). It is therefore not surprising that researchers routinely use categorical (14 studies, 6.86%) or even binary variables (43 studies, 21.08%). However, this practice should be considered with caution. For example, researchers may decide to use a binary single indicator to measure an endogenous construct to indicate a choice situation. In this set-up, however, the latent construct becomes its measure (Fuchs and Diamantopoulos 2009), which proves problematic for approximations in the PLS-SEM algorithm since path coefficients are estimated by OLS regressions. Specifically, OLS requires the endogenous latent variable scores to be continuous, a property that cannot be met in such a set-up. Likewise, using binary indicators in reflective models violates this OLS assumption, because reflective indicators are regressed on the latent variable scores when estimating outer weights. Correspondingly, Jakobowicz and Derquenne (2007, p. 3668) point out that “when working with continuous data […], PLS does not face any problems, but when working with nominal or binary data it is not possible to suppose there is any underlying continuous distribution.” In a similar vein, Lohmöller (1989) argues that standard procedures for applying linear equations cannot be used for categorical variables. Based on Lohmöller’s (1989) early efforts to include categorical variables, Jakobowicz and Derquenne (2007) developed a modified version of the PLS-SEM algorithm based on generalized linear models that is, however, restricted to reflective measures. The standard PLS-SEM algorithm’s application does not account for these extensions and thus often violates fundamental OLS principles when used on categorical variables. As a consequence, researchers should not use categorical variables in endogenous constructs and should carefully interpret the meaning of categorical variables in exogenous constructs. Alternatively, categorical variables can be used to split the data set for PLS multigroup comparisons (Sarstedt et al. 2011b).

Model characteristics

Table 2 provides an overview of model characteristics in PLS-SEM studies. The average number of latent variables in PLS path models is 7.94, which is considerably higher than the 4.70 reported in Shah and Goldstein’s (2006) review of CB-SEM studies. In addition, the average number of latent variables is significantly (p ≤ 0.01) higher in models published in 2000 and beyond (8.43) than in earlier models (6.29). The average number of inner model relationships (10.56) has also increased (but not significantly) over time (9.10 before 2000 and 10.99 thereafter). Overall, model complexity in PLS-SEM studies has clearly increased.

Table 2 Descriptive statistics for model characteristics

Our review examines three types of models (i.e., focused, unfocused, and balanced). Focused models have a small number of endogenous latent variables that are explained by a rather large number of exogenous latent variables (i.e., the number of exogenous latent variables is at least twice as high as the number of endogenous latent variables). A total of 109 focused models (35.05%) were used. An unfocused model was defined as having many endogenous latent variables and mediating effects, and a comparatively smaller number of exogenous latent variables (i.e., the number of endogenous latent variables is at least twice as high as the number of exogenous latent variables). A total of 85 unfocused models (27.33%) were employed. The remaining 117 models (37.62%) were balanced models, falling between the focused and unfocused model types. Focused models were significantly (p ≤ 0.01) more prevalent recently, while unfocused (p ≤ 0.05) and balanced (p ≤ 0.01) models appeared significantly more often in studies published before 2000.

Focused and balanced models meet PLS-SEM’s prediction goal, while CB-SEM may be more suitable for explaining unfocused models. Only 11 of the 57 applications that explicitly stated they used PLS-SEM for prediction purposes also examined a focused model. In contrast, 23 of 57 models supposedly designed for prediction actually examined an unfocused model. Thus, there appears to be a lack of awareness of the relationship between PLS-SEM’s prediction goal and the type of model examined.

Regarding the outer models, PLS path models typically have been assumed to be composed either solely of reflectively measured latent variables (131 models, 42.12%) or a combination of reflectively and formatively measured latent variables (123 models, 39.55%). Far fewer PLS path models were assumed to be based exclusively on formative measures (20 models, 6.43%). Surprisingly, 37 models (11.90%) do not offer a description of the constructs’ measurement modes, despite the extensive debate on measurement specification (e.g., Bollen 2011; Diamantopoulos and Siguaw 2006; Diamantopoulos and Winklhofer 2001; Jarvis et al. 2003). Interestingly, neither the proportion of models incorporating reflective and formative measures nor those lacking a description of the measurement mode changed significantly over time (Table 2).

The average number of indicators is 3.99 for reflective constructs, which is significantly (p ≤ 0.01) higher in recent models (2.90 before 2000; 4.25 thereafter). The higher number of indicators for formative constructs (4.62) is logical since the construct should be represented by the entire population of indicators (Diamantopoulos et al. 2008). As with reflective constructs, the number has increased significantly (p ≤ 0.01) recently (3.63 before 2000; 4.98 thereafter). The total number of indicators used is large, and much higher than in CB-SEM models. This is primarily a result of the larger number of constructs, however, and not a larger average number of indicators per construct. Our review identified an average of 29.55 indicators per PLS path model, with a significant (p ≤ 0.01) increase in recent models (32.51) compared to earlier models (19.58). Furthermore, the number of indicators is much higher than the 16.30 indicators per CB-SEM model reported by Shah and Goldstein (2006). Similarly, Baumgartner and Homburg (1996) reported a median value of 12 indicators per CB-SEM analysis, which is also considerably lower than in our review (median = 24).

Researchers have argued that if a construct’s scope is narrow, unidimensional, and unambiguous for the respondents, using single-item measures is the best approach (e.g., Nunnally 1967; Sackett and Larson 1990), an argument which Bergkvist and Rossiter (2007, 2009) have recently empirically supported. Unlike with CB-SEM, where the inclusion of single items generally leads to model underidentification (e.g., Fuchs and Diamantopoulos 2009), PLS-SEM is not restricted in this respect. Thus, it is not surprising that 144 models (46.30%) included single-item measures. Although using single-item measures can also prove beneficial from a practical perspective, as they generally increase response rates (e.g., Fuchs and Diamantopoulos 2009; Sarstedt and Wilczynski 2009), one has to keep in mind that the utilization of single-item measures is contrary to PLS-SEM’s concept of consistency at large. For example, Reinartz et al. (2009) showed that only with reasonable outer model quality (in terms of indicators per construct and loadings) does PLS-SEM yield acceptable parameter estimates when the sample size is restricted. While the conflict between psychometric properties and consistency at large has not yet been addressed in research, the use of single-item measures should be considered with caution when using PLS-SEM.

Outer model evaluation

Outer model assessment involves examining individual indicator reliabilities, the reliabilities for each construct’s composite of measures (i.e., internal consistency reliability), as well as the measures’ convergent and discriminant validities. When evaluating how well constructs are measured by their indicator variables, individually or jointly, researchers need to distinguish between reflective and formative measurement perspectives (e.g., Diamantopoulos et al. 2008). While criteria such as Cronbach’s alpha and composite reliability are commonly applied to evaluate reflective measures, an internal consistency perspective is inappropriate for assessing formative ones (e.g., Diamantopoulos and Winklhofer 2001). As Diamantopoulos (2006, p. 11) points out, when “formative measurement is involved, reliability becomes an irrelevant criterion for assessing measurement quality.” Similarly, formative measures’ convergent and discriminant validities cannot be assessed by empirical means (e.g., Hair et al. 2011). Our review examines whether and how authors evaluate the suitability of constructs, and whether the concepts typically related to reflective outer models’ assessment are also applied in formative settings.

While the decision on outer model set-up should be based primarily on theoretical grounds (e.g., Diamantopoulos and Winklhofer 2001; Jarvis et al. 2003), Gudergan et al. (2008) propose a confirmatory tetrad analysis technique for PLS-SEM (CTA-PLS) that allows researchers to empirically test constructs’ measurement modes (Bollen and Ting 2000). Since CTA-PLS was introduced only recently, it has presumably not yet been applied in the reviewed studies. Future research should, however, routinely employ this technique as a standard means for model assessment (Coltman et al. 2008).

Reflective outer models

Assessment of reflective outer models involves determining indicator reliability (squared standardized outer loadings), internal consistency reliability (composite reliability), convergent validity (average variance extracted, AVE), and discriminant validity (Fornell-Larcker criterion, cross-loadings) as described by, for example, Henseler et al. (2009) and Hair et al. (2011). The findings of the marketing studies reviewed are shown in Table 3 (Panel A).
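
For reference, these criteria can be written in a common notation (our own summary) in terms of the standardized outer loadings ℓ_i of a reflective construct measured with M indicators:

$$ \rho_c = \frac{\left(\sum_{i=1}^{M} \ell_i\right)^2}{\left(\sum_{i=1}^{M} \ell_i\right)^2 + \sum_{i=1}^{M}\left(1 - \ell_i^2\right)}, \qquad \mathrm{AVE} = \frac{1}{M}\sum_{i=1}^{M} \ell_i^2, $$

where 1 − ℓ_i² is the error variance of a standardized indicator and ℓ_i² is its indicator reliability. Commonly cited rules of thumb call for a composite reliability of at least 0.70, an AVE of at least 0.50, and standardized loadings of roughly 0.70 or higher (i.e., indicator reliabilities of about 0.50).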

Table 3 Evaluation of outer models

Overall, 254 of 311 models (81.67%) included reflectively measured constructs. Many, but surprisingly not all, models commented on reliability. Specifically, 157 of 254 models (61.81%) reported outer loadings and thus indirectly specified the indicator reliability, with only 19 models explicitly addressing this criterion. Support for indicator reliability was significantly (p ≤ 0.01) more prevalent in early models than in more recent ones.

Internal consistency reliability was reported for 177 models (69.69%). Prior assessments of reporting practices in the CB-SEM context (e.g., Shah and Goldstein 2006; Shook et al. 2004) revealed that Cronbach’s alpha is the most common measure of internal consistency reliability. However, Cronbach’s alpha is limited by the assumption that all indicators are equally reliable (tau-equivalence), and efforts to maximize it can seriously compromise reliability (Raykov 2007). In contrast, composite reliability does not assume tau-equivalence, making it more suitable for PLS-SEM, which prioritizes indicators according to their individual reliability. The majority of models report composite reliability, either exclusively (73 models; 28.74%) or in conjunction with Cronbach’s alpha (69 models; 27.17%). A total of 35 models (13.78%) reported only Cronbach’s alpha. Application of composite reliability, individually or jointly with Cronbach’s alpha, was significantly (p ≤ 0.01) more prevalent recently (Table 3).

Convergent validity was examined in 153 of 254 models (60.24%). Authors primarily relied on the AVE (146 models), while in the remaining seven models, they incorrectly interpreted composite reliability or the significance of the loadings as indicative of convergent validity. Moreover, a total of 154 models (60.63%) provided evidence of discriminant validity, with most (111 models) solely comparing the constructs’ AVEs with the inter-construct correlations (Fornell and Larcker 1981), a practice that was significantly (p ≤ 0.01) more prevalent recently (Table 3, Panel A). Alternatively, authors examined only cross-loadings (12 models), a criterion which can generally be considered more liberal in terms of discriminant validity. In the 31 remaining models both criteria were reported.
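
Stated formally, the Fornell-Larcker criterion requires each construct to share more variance with its own indicators than with any other construct in the model:

$$ \mathrm{AVE}_j > \max_{k \neq j} r_{jk}^2 \quad \text{for every construct } j, $$

where r_jk denotes the correlation between latent variables j and k; equivalently, the square root of each construct’s AVE must exceed its correlations with all other latent variables. The cross-loadings criterion instead requires each indicator to load more highly on the construct it is intended to measure than on any other construct.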

Formative outer models

Overall, 143 of 311 models (45.98%) contained at least one formatively measured construct. Table 3 (Panel B) shows the results of this analysis. A total of 33 models (23.08%) with formatively measured constructs inappropriately evaluated the corresponding measures using reflective outer model assessments. This mistake was significantly (p ≤ 0.01) more prevalent in earlier studies, in which 21 of 38 models (55.26%) used reflective criteria to evaluate formative measures, compared to 12 of 105 models (11.43%) in more recent studies.

Several statistical criteria have been suggested to evaluate the quality of formative measures. The primary statistic for assessing formative indicators is their weight, which was reported in 33 of 143 models (23.08%). Evaluation of indicator weights should also include examining their significance by means of resampling procedures. While most studies applied resampling procedures such as bootstrapping or jackknifing, these were primarily used to evaluate inner model parameter estimates rather than the significance of the formative indicators’ weights. Only 25 models (17.48%) reported t-values or corresponding p-values. In fact, most researchers did not comment on this important issue.

Multicollinearity between indicators is an important issue in assessing formative measures because of the potential for unstable indicator weights (Cenfetelli and Bassellier 2009). Since formative indicator weights are frequently smaller than reflective indicators’ loadings, this can lead to misinterpretations of the indicator relevance for the construct domain (Diamantopoulos and Winklhofer 2001). Only 22 of 143 models (15.38%) using formative measures—all of which appeared in 2000 and beyond—assessed multicollinearity, relying primarily on the tolerance and variance inflation factor (VIF).
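
Both diagnostics derive from auxiliary regressions of each formative indicator on the remaining indicators of the same block:

$$ \mathrm{Tolerance}_j = 1 - R_j^2, \qquad \mathrm{VIF}_j = \frac{1}{1 - R_j^2}, $$

where R_j² is the coefficient of determination obtained when indicator j is regressed on its fellow indicators. A frequently used rule of thumb regards VIF values above 5 (equivalently, tolerance values below 0.20) as signaling critical collinearity levels (e.g., Hair et al. 2011).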

Overall, our findings regarding outer model assessments give rise to concern. First, given that reliability and validity assessments play a vital role in outer model assessment, the proportion of studies that do not report reliability and validity measures is disconcerting. If measures lack reliability and validity, inner model estimates may be substantially biased, leading researchers to overlook relationships that could be significant. Indeed, since PLS-SEM applications often serve as a basis for theory development, promising research avenues might have been overlooked. Further, despite the broad discussion on the inappropriateness of internal consistency-based measures for evaluating blocks of formative indicators (e.g., Diamantopoulos 2006; Diamantopoulos and Winklhofer 2001; Diamantopoulos et al. 2008), it is distressing to see that authors still apply a cookbook-like recipe used with reflective measures to assess formative ones. The increased multicollinearity assessment in more recent studies can potentially be traced back to Diamantopoulos and Winklhofer’s (2001) article, which raised considerable awareness of this issue among marketing scholars.

With the emerging interest in the use of formative measurement in the marketing discipline, PLS-SEM is likely to be more widely applied. But PLS-SEM researchers need to pay closer attention to the validation of formative measures by taking into account standard (e.g., significance, multicollinearity) as well as recently proposed evaluation steps (e.g., absolute and relative indicator contributions, suppressor effects) (Bollen 2011; Cenfetelli and Bassellier 2009; MacKenzie et al. 2011).

Inner model evaluation

If the outer model evaluation provides evidence of reliability and validity, it is appropriate to examine inner model estimates. The classic CB-SEM fit measures, which rest on minimizing the discrepancy between the sample and model-implied covariance matrices, are not applicable in PLS-SEM, primarily because of the method’s variance-based, distribution-free character. Thus, when using PLS-SEM, researchers must focus their evaluation on variance-based, non-parametric evaluation criteria to assess the inner model’s quality (e.g., Chin 1998, 2010; Henseler et al. 2009). Table 4 provides an overview of the results regarding inner model evaluation.

Table 4 Evaluation of inner models

The primary criterion for inner model assessment is the coefficient of determination (R²), which represents the amount of explained variance of each endogenous latent variable. In our review, 275 models (88.42%) reported R² values to assess the quality of their findings. Only 16 models (5.14%) considered a particular exogenous latent variable’s relative impact on an endogenous latent variable by means of changes in the R² values, based on the effect size f² (Cohen 1988). Sample re-use techniques proposed by Stone (1974) and Geisser (1974) can be used to assess the model’s predictive validity by means of the cross-validated redundancy measure Q². This technique is a synthesis of cross-validation and function fitting, and Wold (1982, p. 30) argues that it fits PLS-SEM “like hand in glove.” Nevertheless, only 51 models (16.40%) reported this criterion. Like f², the Q² can assess an individual construct’s predictive relevance for the model by omitting selected inner model relationships and computing changes in the criterion’s estimates (q²). None of the models reported this statistic, although research has stressed its importance for inner model evaluation (e.g., Chin 1998; Henseler et al. 2009).
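
For reference, these criteria can be expressed as follows (following common presentations, e.g., Chin 1998; Henseler et al. 2009):

$$ f^2 = \frac{R^2_{\text{included}} - R^2_{\text{excluded}}}{1 - R^2_{\text{included}}}, \qquad Q^2 = 1 - \frac{\sum_{D} E_D}{\sum_{D} O_D}, \qquad q^2 = \frac{Q^2_{\text{included}} - Q^2_{\text{excluded}}}{1 - Q^2_{\text{included}}}, $$

where E_D is the sum of squared prediction errors and O_D the sum of squared errors obtained when predicting the omitted data points by their means, accumulated over the blindfolding runs with omission distance D. Values of 0.02, 0.15, and 0.35 for f² (and q²) are commonly interpreted as small, medium, and large effects (Cohen 1988), and Q² values above zero indicate that the model has predictive relevance for the respective endogenous construct.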

Tenenhaus et al. (2004) proposed a global criterion for goodness-of-fit (i.e., the GoF index). This criterion is defined by the geometric mean of the average communality and the model’s average R² value. Because it simply combines descriptive quantities rather than measuring the discrepancy between observed and model-implied data, the GoF does not represent a true global fit measure (even though its name suggests this), and threshold values for an acceptable “goodness-of-fit” can hardly be derived because acceptable R² values depend on the research context (Hair et al. 2011) and the construct’s role in the model (e.g., key target construct versus mediating construct). Moreover, the GoF is not universally applicable for PLS-SEM as it is based on reflective outer models’ communalities. Thus, the proposed GoF is conceptually inappropriate whenever outer models are formative, or when single indicator constructs are involved. Despite these concerns, 16 models (5.14%) reported this relatively new measure, five of which included single items or formatively measured constructs.
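
Formally, the index is defined as (Tenenhaus et al. 2004)

$$ \mathrm{GoF} = \sqrt{\overline{\text{communality}} \times \overline{R^2}}, $$

that is, the geometric mean of the outer models’ average communality and the endogenous latent variables’ average R². For standardized reflective indicators, a construct’s communality equals its AVE, whereas formative blocks and single-item constructs have no analogous communality interpretation, which underlies the conceptual concerns noted above.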

Standardized path coefficients provide evidence of the inner model’s quality, and their significance should be assessed using resampling procedures. A total of 298 models (95.82%) reported path coefficients, and 287 (92.28%) commented on their significance by, for example, providing t-value statistics and/or corresponding p-values. But none of the articles reported resampling-based confidence intervals (Henseler et al. 2009).
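
A minimal sketch of such a resampling-based significance test is given below. The routine `estimate_paths` stands in for a full PLS-SEM estimation (for instance, the algorithm sketch shown earlier, followed by OLS regressions among the latent variable scores) and, like the percentile confidence bounds, is an illustrative assumption rather than a description of any particular software package.

```python
import numpy as np

def bootstrap_paths(data, estimate_paths, n_boot=5000, seed=1):
    """Case-resampling bootstrap for PLS-SEM path coefficients.

    data           : (n x p) array of raw indicator data (cases in rows)
    estimate_paths : callable mapping a data array to a 1-D array of path coefficients
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]

    # Re-estimate the model on n_boot resampled datasets of the same size
    boot = np.array([estimate_paths(data[rng.integers(0, n, size=n)])
                     for _ in range(n_boot)])

    original = estimate_paths(data)
    se = boot.std(axis=0, ddof=1)                              # bootstrap standard errors
    t_values = original / se                                   # empirical t-values
    lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)    # 95% percentile intervals
    return original, se, t_values, lower, upper
```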

PLS-SEM applications are usually based on the assumption that the data stem from a single population. In many real-world applications, however, this assumption of homogeneity is unrealistic, as different population parameters are likely to occur for different subpopulations, such as segments of consumers, firms, industries, or countries. A total of 47 studies (23.04%) accounted for observed heterogeneity by considering categorical moderating variables and comparing corresponding group-specific path coefficient estimates, for example, by using multigroup comparison techniques (Sarstedt et al. 2011b). Other studies (15; 7.35%) evaluated interaction effects by modeling (continuous) moderator variables that potentially affect the strength or direction of specific path relationships (e.g., Henseler and Chin 2010). However, while the consideration of observed heterogeneity generally proves valuable from a theory perspective, heterogeneity is often unobservable and cannot be attributed to any predetermined variable(s). The impact of unobserved heterogeneity on SEM results can be considerable, and if not carefully taken into account, may entail misleading interpretations (e.g., Jedidi et al. 1997). As a consequence, PLS-SEM analyses require the use of complementary techniques for response-based segmentation that allow testing for and dealing with unobserved heterogeneity. Finite mixture partial least squares (FIMIX-PLS; Hahn et al. 2002; Sarstedt et al. 2011a) is currently regarded as the primary approach in the field (e.g., Rigdon et al. 2010). Based on a mixture regression concept, FIMIX-PLS simultaneously estimates the inner model parameters and ascertains the data structure’s heterogeneity by calculating the probability of the observations’ segment membership so that they fit into a predetermined number of segments. However, in contrast to conventional mixture regressions, models in FIMIX-PLS can comprise a multitude of interrelated endogenous latent variables. In light of the approach’s performance in prior studies (e.g., Ringle et al. 2010a, b; Sarstedt and Ringle 2010) and its availability through the software application SmartPLS (Ringle et al. 2005), Hair et al. (2011) suggest that researchers should routinely use the technique to evaluate whether the results are distorted by unobserved heterogeneity. While the issue of unobserved heterogeneity is important in many marketing studies using PLS-SEM, none of the reviewed studies carried out this type of analysis.
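
In the notation commonly used for FIMIX-PLS (e.g., Hahn et al. 2002; Sarstedt et al. 2011a), and reproduced here only for illustration, each observation’s endogenous latent variable scores are assumed to follow a finite mixture of segment-specific conditional densities:

$$ f(\eta_i \mid \xi_i) = \sum_{k=1}^{K} \rho_k \, f_k\!\left(\eta_i \mid \xi_i, B_k, \Gamma_k, \Psi_k\right), \qquad \sum_{k=1}^{K} \rho_k = 1, \; \rho_k > 0, $$

where K is the predetermined number of segments, ρ_k the relative size of segment k, and B_k, Γ_k, and Ψ_k the segment-specific inner model path coefficient and error (co)variance matrices. The mixing proportions, the segment-specific parameters, and each observation’s segment membership probabilities are estimated with an expectation-maximization algorithm.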

Overall, our review shows that even though researchers have made significantly (p ≤ 0.05) greater use of model evaluation criteria (i.e., R², f², Q², and GoF) in recent years (Table 4), they apply few of the criteria available for inner model assessment. We urge researchers to use a greater number of measures to assess the inner model’s quality. Gathering more evidence for or against the model’s quality is particularly important for PLS-SEM since it does not allow an assessment of overall model fit as CB-SEM does. Moreover, such an analysis should always be supplemented by checking for heterogeneity.

Reporting

Reporting is a crucial issue in empirical marketing studies, and articles should provide readers with sufficient information to enable them to replicate the results and to fully assess the study’s quality, regardless of which method is applied (Stewart 2009). In general, PLS-SEM studies should provide information on (1) the population and sample structure, (2) the distribution of the data, (3) the conceptual model, including a description of the inner and outer models, as well as the measurement modes, and (4) the statistical results to corroborate the subsequent interpretation and conclusions (Chin 2010). In addition, researchers should report specific technicalities related to the software and computational options used as well as the parameter settings of ancillary analysis procedures.

It is crucial to know which software application was used as different programs exhibit different default settings. Despite this fact and license agreement requirements, only 100 studies (49.02%) indicated which software was used for model estimation. Of those providing this information, 64 studies used PLS Graph (Chin 2003), 24 used SmartPLS (Ringle et al. 2005), and 12 used LVPLS (Lohmöller 1987).

In addition, the initial values for the outer model relationships, parameter settings, computational options, and the resulting (maximum) number of iterations need to be reported, which none of the studies did. The selection of initial values for outer weights may impose changes in the outer models and/or the inner model estimates (e.g., Henseler et al. 2009). Specific parameter settings (e.g., the stop criterion) and computational options (e.g., the weighting scheme for determining inner model proxies) can also entail different model estimation outcomes and are sometimes inadequate in certain model configurations. For example, the centroid weighting scheme ensures the PLS-SEM algorithm’s convergence (Henseler 2010), but it should not be used for estimating higher order component models (Henseler et al. 2009). Finally, although the PLS-SEM algorithm usually converges for reasonably small stop criterion settings (e.g., 10^-5; Wold 1982), it may not converge in some extreme data constellations (Henseler 2010). Moreover, convergence is usually not reached if the stop criterion is set extremely low (e.g., 10^-20). To assess whether the PLS-SEM algorithm converged before reaching the pre-specified maximum number of iterations (i.e., whether the actual number of iterations is smaller than that maximum), and thus delivered the optimum model estimates, one needs to know the (maximum) number of iterations.

While all PLS-SEM evaluations should rely on resampling procedures, only 135 studies (66.18%) explicitly mention the use of bootstrapping or jackknifing in model evaluation. Reporting of resampling procedures is significantly (p ≤ 0.01) more prevalent in recent studies (20 before 2000; 115 thereafter), but it should be even more widespread. Moreover, only 66 of 135 studies, all of which appeared more recently, reported the concrete parameter settings. Reporting of this detailed information is important since misspecifications in, for example, the bootstrapping parameter settings (e.g., with regard to the number of bootstrap samples) can lead to biased standard error estimates (e.g., Chernick 2008).

Overall, none of the reviewed studies provided sufficient information to replicate and validate the analytical findings, with only 10 studies (4.90%) reporting the empirical covariance/correlation matrix for the indicator variables. While readers usually gain a basic understanding of the analysis, overall transparency leaves much to be desired.

Impact of journal quality on PLS-SEM use

A final analysis addresses the question of whether a comparison of top tier journals with other leading journals reveals significant differences in PLS-SEM use. We therefore compare PLS-SEM use in the top five journals according to Hult et al.’s (2009) ranking (i.e., Journal of Marketing, Journal of Marketing Research, Journal of Consumer Research, Marketing Science, and Journal of the Academy of Marketing Science; 41 studies with 61 models) with that in other leading journals (i.e., journals in positions 6 to 30 of Hult et al.’s (2009) ranking; 163 studies with 250 models) to determine the impact of journal quality on application and reporting practices. Surprisingly, our assessment reveals few significant differences between the journal tiers.

Meeting the ten times rule of thumb for the minimum sample size is significantly (p ≤ 0.10) more prevalent in models published in other leading journals (231 models; 92.40%) than in the top tier journals (52 models; 85.25%). This finding is likely due to differences in model characteristics (Table 2). Specifically, PLS path models in the top tier journals include a significantly (p ≤ 0.10) larger number of latent variables and inner model relationships. Likewise, models in top tier journals are significantly (p ≤ 0.01) more often unfocused (i.e., contain a relatively high number of endogenous latent variables relative to exogenous latent variables) and more often incorporate both reflective and formative measures.

In terms of outer model evaluation (Table 3), articles in other leading journals use Cronbach’s alpha significantly (p ≤ 0.10) more frequently for reliability assessment and more often apply reflective criteria to evaluate formative measures (p ≤ 0.05), both of which are inappropriate. Models in top tier journals significantly (p ≤ 0.01) more often report formative indicator weights and the corresponding significance levels. Most other aspects of outer and inner model assessment (Tables 3 and 4) yield no significant differences, indicating that, in general, reporting practices are equally prevalent among the top tier and other leading journals.

Conclusion

Our review substantiates that PLS-SEM has become a more widely used method in marketing research. But PLS-SEM’s methodological properties are widely misunderstood, which at times leads to misapplications of the technique, even in top-tier marketing journals. For instance, our findings suggest that researchers rely on a standard set of reasons from previous publications to substantiate the use of PLS-SEM rather than carefully pinpointing specific reasons for its use. Likewise, researchers do not fully capitalize on the criteria available for model assessment and sometimes even misapply measures. A potential reason for many researchers’, reviewers’, and editors’ unfamiliarity with the principles of PLS-SEM might be that textbooks on multivariate data analysis do not discuss PLS-SEM at all (e.g., Churchill and Iacobucci 2010; Malhotra 2010), or only superficially (e.g., Hair et al. 2010). In fact, there is still no introductory textbook on PLS-SEM, and the topic is seldom found in research methodology class syllabi. Journal editors and reviewers should more strongly emphasize making all information available, including the data used, to allow the replication of statistical analyses (e.g., in online appendices). Progressing toward the highest possible level of transparency will substantially improve the way in which research is conducted and its quality, thus allowing accelerated development paths.

Several points should be considered when applying PLS-SEM, some of which, if not handled properly, can seriously compromise the analysis’s interpretation and value. Many of these issues, such as the performance of different weighting schemes for algorithm convergence (Henseler 2010) and the limitations of conventional statistical tests in multigroup comparisons (e.g., Rigdon et al. 2010; Sarstedt et al. 2011b), have been reported in the literature. Researchers should consider PLS-SEM’s methodological foundations and complementary analysis techniques more carefully. Similarly, discussions in related fields, such as the recent debate on the validation of formative measures in management information systems research (e.g., Bollen 2011; Cenfetelli and Bassellier 2009; MacKenzie et al. 2011), can provide important guidance for PLS-SEM applications.

Based on the results of our review, we present guidelines for applying PLS-SEM which provide researchers, editors, and reviewers with recommendations, rules of thumb, and corresponding references (Table 5).

Table 5 Guidelines for applying PLS-SEM

While offering many beneficial properties, PLS-SEM’s “soft assumptions” should not be taken as carte blanche to disregard standard psychometric assessment techniques. The quality of studies employing PLS-SEM hopefully will be enhanced by following our recommendations, so that the method’s value in research and practice can be clarified.