1 Introduction

Over recent years, partial least squares structural equation modeling (PLS-SEM) has become more popular across disciplines such as accounting [1], management information systems [2], marketing and strategic management [3], operations management [4], supply chain management [5], tourism [6], as well as health care [7–12]. Nevertheless, articles in health care often fail to detail the advantages and disadvantages of PLS-SEM and do not undertake robustness testing. This article presents an open discussion of PLS-SEM and reports the steps to be followed in a thorough analysis. The discussion also includes a conceptual comparison of PLS-SEM with the more traditional covariance-based structural equation modeling (CB-SEM) to enable researchers to make appropriate choices. Following PLS-SEM analysis, robustness testing is undertaken using generalized structured component analysis (GSCA). The PLS-SEM analysis is illustrated in the context of residential aged care networks using data from Avkiran and McCrystal [13].

By definition, latent constructs (e.g. customer service quality, effectiveness of management, wellness of staff) cannot be directly observed but can be observed indirectly through a number of indicators. PLS-SEM has established itself as an appropriate method when working with composite models of prediction in exploratory research. PLS-SEM is robust with skewed data [14, 15] and it is particularly relevant with secondary data frequently found in business databases where distributional constraints are unlikely to be satisfied. PLS-SEM is a non-parametric, multivariate approach based on iterative OLS regression [16, 17]. The main objective of PLS-SEM is to maximize the explained variance of endogenous latent constructs where the assumption of multivariate normality is relaxed. Hair, Ringle and Sarstedt [18] and Hair et al. [3, 19, 20] provide an introduction to PLS-SEM, whereas Monecke and Leisch [21] deliver a step-by-step explanation of the mathematics behind its algorithm. “PLS is primarily intended for causal-predictive analysis in situations of high complexity but low theoretical information.” ([22], p.270).

PLS-SEM, as a predictive method, has a broad range of applications to managerial challenges, in particular, where there is human interaction. For example, the illustrative example used in this article explains overall quality of care (a latent construct) by observing other latent constructs such as low-level care (e.g. in hostels) and high-level care (e.g. in nursing homes) provided as part of a residential aged care network. PLS-SEM allows for critical exploratory research to lay the groundwork for follow-up studies using methods with stricter assumptions.

Section 2 of this article discusses PLS-SEM versus CB-SEM and offers practical guidelines for making a choice between the two methods. Section 3 describes the illustrative setting of a residential aged care network and proposes two hypotheses. Section 4 outlines the steps to be followed in a thorough PLS-SEM analysis and reports results, including robustness testing. Section 5 offers some concluding remarks.

2 PLS-SEM versus CB-SEM and practical guidelines

PLS-SEM modeling comprises three main components: (a) the structural or inner model, (b) the measurement or outer models, and (c) the weighting scheme. A group of manifest variables (indicators) associated with a latent construct is known as a block, and each indicator is associated with only one construct. Recursive models are needed, where there are no circular relationships or loops and the model is a predictive chain [19, 23]. PLS-SEM is robust with skewed data because it transforms non-normal data according to the central limit theorem [19, 24]. The top three reasons for choosing PLS-SEM are non-normal data, small sample size and presence of formative indicators (see Table 1 in [20]).

Table 1 PLS-SEM (reduced model) versus GSCA (robustness test)

Nevertheless, PLS-SEM has been criticized for giving biased parameter estimates because it does not explicitly model measurement error [25], despite employing bootstrapping and blindfolding to estimate standard errors for parameter estimates. Sohn, Han and Jeon [26] restate this potential shortcoming as PLS-SEM parameter estimates that are based on limited information not being as efficient as those based on full information estimates found in CB-SEM. However, Chin [27] sees this as a major shortcoming of CB-SEM because of the assumption of the specified model as being true. That is, as a full information approach, any model misspecification in CB-SEM can impact estimates throughout the analysis, and unlike PLS-SEM, the overall model fit does not differentiate between proximity of constructs. Attempts so far to develop goodness-of-fit indices for PLS-SEM have not been successful [28].

Measurement error structures can be modeled via a factor analytic approach in CB-SEM but it comes at the cost of covariances among the observed variables conforming to overlapping proportionality constraints, i.e. measurement errors are assumed to be uncorrelated [29]. CB-SEM assumes homogeneity in the observed population [30]. Such constraints are unlikely to hold unless latent constructs are based on highly developed theory and the measurement instrument is refined through multiple stages. Therefore, secondary data frequently found in business databases are not likely to satisfy such constraints. Under such circumstances, CB-SEM that relies on common factors would not be appropriate, and PLS-SEM that relies on weighted composites would be used because of its less restrictive assumptions. In addition, use of formative indicators in CB-SEM is problematic because of identification problems and thus, the reduced ability of CB-SEM to reliably capture measurement error [31]. Chin [27] discusses the advantages and disadvantages of CB-SEM and PLS-SEM.

Itemizing practical guidelines for the choice between CB-SEM and PLS-SEM could help those researchers new to SEM. According to Hair et al. [19, see Table 1.6], PLS-SEM can be employed when

  1. Research is exploratory, i.e. an environment of underdeveloped theory.

  2. Sample size is small: The rule of thumb commonly applied in PLS-SEM requires the sample to be at least ‘ten times the maximum number of indicators associated with an outer model (construct)’ [32]. This rule should be considered as the bare minimum and researchers are advised to consult Cohen [33] for power tables to identify a more project-specific adequate sample size. Hair et al. [19] provide one such example in Exhibit 1.7 on page 21 and also suggest use of the G*Power program (available from http://www.gpower.hhu.de/en.html). Sample should also be compared to the population, i.e. a small sample from a large population will not give reliable results.

  3. Data are non-normal: Reinartz, Haenlein and Henseler [34] warn against placing too much emphasis on this consideration. The authors’ results (from Monte Carlo simulations) indicate CB-SEM can be robust to violations of normality.

  4. The research goal is predicting key target constructs.

  5. Structural model is complex and there are many constructs.

  6. Modeling is recursive, i.e. no circular relationships.

The researcher should lean towards CB-SEM when the following conditions are satisfied:

  1. Main objective is theory confirmation or comparing alternative theories.

  2. Structural model has non-recursive relationships where circular relationships are allowed.

  3. Global goodness-of-fit is required.

  4. Error terms require additional specification such as measurement of covariation.

In summary, selecting between PLS-SEM and CB-SEM will depend on whether the underlying theory is exploratory or advanced, the types of latent constructs used, nature of the data and sample size. It has also been noted that estimates from PLS-SEM and CB-SEM converge as sample size grows, as long as assumptions about distributions hold and the model is correct [35]. Those interested in further critique/rebuttal of PLS-SEM are encouraged to read Henseler et al. [15]. Similarly, a highly readable introduction to CB-SEM can be found in Lei and Wu [36], and Reinartz, Haenlein and Henseler [34] provide an empirical comparison of the performance of CB-SEM and PLS-SEM.

The next section describes the illustrative setting of residential aged care networks published in Health Care Management Science and develops hypotheses.

3 The illustrative setting: residential aged care networks

I borrow some of the concepts outlined in Avkiran and McCrystal’s [13] study of organizational productivity to illustrate PLS-SEM. Assuming a residential aged care network consists of low-level care (e.g. hostel) and high-level care (e.g. nursing home), I examine the contribution of these to the overall quality of care. For example, registered nurses and their average length of service (i.e. experience in years capturing quality of care) and other caregivers form some of the inputs that define low-level and high-level care; the fourth formative indicator is the average resident classification score (ten-point scale) capturing the level (intensity) of care needed. Reflections of the overall quality of care are average length of stay (longer is desirable), and three undesirable indicators (reciprocals are taken in order to reverse the causality), namely, annual number of hospitalizations, average severity of hospitalizations and mortality rate (see [13], p.115). As a result of the above theoretical discussion, two main hypotheses emerge:

  • H1: Overall quality of care in a residential aged care network is significantly explained by the low-level care provided in associated hostel(s).

  • H2: Overall quality of care in a residential aged care network is significantly explained by the high-level care provided in associated nursing home(s).

PLS-SEM analysis is used with the overall objective of rejecting null hypotheses regarding path relationships between constructs and accepting the two alternative hypotheses outlined above.

Figure 1 charts the PLS-SEM modeling undertaken in this article. Circles represent the latent variables or constructs that comprise the structural model; left-hand rectangles depict the formative indicators (composite indicators) theorized as underlying sources of the two exogenous latent constructs; right-hand rectangles depict the reflective indicators theorized as the consequences of the endogenous or target latent construct.

Fig. 1 Charting a predictive PLS-SEM model in residential aged care networks. Legend: RN-FTE, registered nurses full-time equivalent; RN-ALS, registered nurses average length of service; OC-FTE, other caregivers full-time equivalent; ARCS, average resident classification score; ALOS, average length of stay; ANH, annual number of hospitalizations; ASH, average severity of hospitalizations; MR, mortality rate

In PLS-SEM, formative indicators represent sources that form associated exogenous latent constructs. The overlap among formative indicators is minimized because they are considered to be complementary. The exogenous latent constructs illustrated in Fig. 1 become the dependent variables in multiple regression where the associated formative indicators are the independent variables. Reflective indicators are consequences or manifestations of the underlying target latent construct, i.e. causality is from the construct to the indicator. Because of substantial overlap among reflective indicators they are treated as interchangeable. The endogenous latent construct becomes the independent variable in single regression runs where the reflective indicators individually become the dependent variable in each run.

The following section details the PLS-SEM analysis step-by-step and outlines various statistical criteria to be interpreted in the context of accepted guidelines in literature.

4 Evaluating the PLS-SEM measurement and structural models

I use the software SmartPLS 3 [37] for conducting the PLS-SEM analysis. I detail the procedure to be followed and encourage the reader to refer to Table 3 in [18], Table 5 in [3] and Hair et al. [19] for further notes on the outlined procedure. Other useful references are Tenenhaus et al. [23] who offer a step-by-step mathematical exposition of PLS-SEM, and Chin [27] who focuses on reporting in the second half of his chapter. The sample of residential aged care networks borrowed from Avkiran and McCrystal’s [13] simulation is N = 100 – a sample size that passes the minimum sample size guidelines outlined in section 2 under practical guidelines.
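
The sample size check mentioned above can be sketched in a few lines of Python. This is only an illustration of the ten-times rule from section 2, which is a bare minimum and no substitute for a power analysis. The function name and the dictionary labels are hypothetical, and the block sizes assume four indicators per construct as listed in the legend of Fig. 1.

```python
def min_sample_size_ten_times(indicators_per_construct):
    """Ten-times rule of thumb: 10 x the largest number of indicators
    attached to any single outer model (construct)."""
    return 10 * max(indicators_per_construct.values())


# Assumed block sizes (four indicators per construct, per the Fig. 1 legend).
# The keys are illustrative labels, not SmartPLS identifiers.
blocks = {"low_level_care": 4, "high_level_care": 4, "overall_quality": 4}

required = min_sample_size_ten_times(blocks)
n = 100  # sample size reported in the text
passes = n >= required

print(f"minimum by rule of thumb: {required}, N = {n}, passes: {passes}")
```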

In the next sub-section, I begin by outlining the important statistical criteria for the reflective measurement model, and then I move to interpreting the formative and structural models.

4.1 Reflective measurement model

  • Internal consistency: According to Hair et al. [3, 20], composite reliability is a better measure of internal consistency because it avoids underestimation often seen with Cronbach’s alpha and accommodates differences in indicator reliabilities expected by PLS-SEM. A composite reliability of 0.6 is acceptable in exploratory research [3] but values above 0.95 indicate redundancy [19]. Composite reliability is only relevant for the reflective measurement model. In the current analysis, composite reliability is low at 0.398.

  • Indicator reliability: Outer loadings greater than 0.7 are desirable [18]. The square of a standardized outer loading represents communality, that is, how much of the variation in the indicator is explained by the endogenous construct, and 1 minus communality reveals the measurement error variance. However, Hair et al. [3] state that in exploratory research, outer loadings as low as 0.4 are acceptable. If a loading is less than 0.4, the reflective indicator can be deleted (at the very least, all remaining loadings should be statistically significant). Figure 2 shows that outer loadings for two of the reflective indicators, ALOS and MR, are very low and can be considered for deletion.

  • Convergent validity: Average variance extracted (AVE) greater than 0.5 is preferred; this ratio implies that more than 50% of the variance of the reflective indicators has been accounted for by the latent variable. AVE is only relevant for the reflective measurement model. When examining reflective indicator loadings, it is desirable to see higher loadings in a narrow range, indicating all items are explaining the underlying latent construct, i.e. convergent validity [27]. AVE is low at 0.273.

  • Discriminant validity: Fornell-Larcker criterion states that the square root of AVE must be greater than the correlation of the reflective construct with all other constructs; this criterion is not applicable to formative measurement models and single-item constructs. The square root of AVE is 0.522 and is greater than the construct correlations.
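
The reflective criteria above can be made concrete with a short Python sketch. It assumes standardized outer loadings and uses the usual textbook formulas for composite reliability and AVE; it is not a reproduction of SmartPLS output. The function names, the example loadings and the example construct correlations are hypothetical. Only the AVE of 0.273 is taken from the analysis above.

```python
import math


def composite_reliability(loadings):
    """Composite reliability from standardized outer loadings:
    (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error_variance = sum(1 - l ** 2 for l in loadings)
    return total ** 2 / (total ** 2 + error_variance)


def ave(loadings):
    """Average variance extracted: the mean of the squared loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def fornell_larcker_ok(ave_value, correlations):
    """True if sqrt(AVE) exceeds every correlation with other constructs."""
    return math.sqrt(ave_value) > max(abs(r) for r in correlations)


# Hypothetical loadings: two strong and two very weak reflective indicators.
example_loadings = [0.8, 0.7, 0.2, 0.1]
print(round(composite_reliability(example_loadings), 3))
print(round(ave(example_loadings), 3))

# AVE reported in the text; its square root is the Fornell-Larcker benchmark.
print(round(math.sqrt(0.273), 3))  # 0.522

# Hypothetical correlations with the two exogenous constructs.
print(fornell_larcker_ok(0.273, [0.30, -0.40]))
```

With weak loadings in the block, both statistics fall below their thresholds, which is the pattern that motivates considering low-loading indicators for deletion.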

Fig. 2 PLS-SEM analysis of overall quality of care in residential aged care networks (see legend of Fig. 1)

Interpretation of the formative measurement model follows.

4.2 Formative measurement model

Under the formative measurement model, it is assumed that the exogenous construct (latent variable) is defined by the formative (composite) indicators that could be multidimensional. It is important that the researcher establishes theoretical content validity before attempting empirical analysis to ensure that the major dimensions of the construct have been covered by the indicators.

  • Convergent validity: Convergent validity is the degree to which an indicator is positively correlated with other indicators of the same construct. As a result, it is necessary to test whether a formative construct is highly correlated with a reflective measure of the same construct. Higher path coefficients linking the exogenous and endogenous constructs are preferred, implying adequate coverage by the formative indicators [27]. A substantial coefficient of determination is also a good indication of convergent validity. Path coefficients are shown in Fig. 2 where the low-level care unit appears to make a small contribution to the overall quality of care in the residential aged care network.

  • Collinearity among indicators: When collinearity exists, standard errors and thus variances are inflated. A variance inflation factor (VIF) is calculated for each of the explanatory variables in OLS regression, and VIF must be less than 5 [18], i.e. VIF represents the factor by which variance is inflated. Statistically, VIF is the reciprocal of tolerance, \( 1-R_i^2 \), where the latter is defined as the proportion of variance of a formative indicator not explained by the others in the same block. A VIF of 1 means there is no correlation between the predictor variable examined and the rest of the predictors, and therefore, the variance is not inflated. If VIF is higher than 5, the researcher should consider removing indicators, or combining the collinear indicators into a new composite indicator. The VIF is an acceptable 1.027.

  • Significance and relevance of outer weights: ‘Weight’ is an indicator’s relative contribution; ‘loading’ is an indicator’s absolute contribution. To assess significance, one can run bootstrapping with 5000 sub-samples (the recommended minimum according to Hair, Ringle and Sarstedt [18]) in order to check whether outer weights are significantly different from zero. Indicators with significant outer weights are kept; otherwise, an indicator can still be kept if its outer loading, that is, its absolute contribution, is greater than 0.5. Insignificant formative indicators based on p-values (i.e. higher than 5%) with outer loadings less than 0.5 can be removed from the model for being irrelevant. ARCS(LLC), RN-ALS(HLC), ARCS(HLC), ALOS and MR indicators are indicated as non-significant.
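
The relationship between VIF and tolerance described above can be checked numerically. The following is a minimal sketch with hypothetical function names. It takes the auxiliary R-squared (from regressing one formative indicator on the others in its block) as given rather than estimating it from data.

```python
def vif(r_squared):
    """Variance inflation factor: the reciprocal of tolerance (1 - R_i^2)."""
    tolerance = 1.0 - r_squared
    if tolerance <= 0.0:
        raise ValueError("R-squared must be below 1 for VIF to be defined")
    return 1.0 / tolerance


def implied_r_squared(vif_value):
    """Share of an indicator's variance explained by the other indicators
    in the same block, recovered from a reported VIF."""
    return 1.0 - 1.0 / vif_value


print(vif(0.0))                            # 1.0, no inflation
print(round(vif(0.8), 3))                  # the threshold of 5 corresponds to R-squared = 0.8
print(round(implied_r_squared(1.027), 3))  # VIF reported in the text
```

Read this way, a VIF of 1.027 implies that less than 3% of the indicator's variance is shared with the other indicators in its block.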

Interpretation of the structural model is next.

4.3 Structural model

If the outer models, that is, measurement models are not reliable, little confidence can be held in the structural (inner) model. Analysis of the structural model is an attempt to find evidence supporting the theoretical model, i.e. the theorized relationships between exogenous constructs and the endogenous construct.

  • Predictive accuracy, coefficient of determination (\(R^2\)): This statistic indicates the extent to which the exogenous construct(s) explain the endogenous construct. According to Hair, Ringle and Sarstedt [18] and Hair et al. [19], in the marketing discipline \(R^2\) values of 0.25, 0.50 and 0.75 are regarded as weak, moderate and substantial, respectively. However, unless the adjusted \(R^2\) is used (for a formal definition, see Hair et al. [19], p.176), this coefficient can be upward-biased in complex models where more paths are pointing towards the endogenous construct. More importantly, the coefficient of determination needs to be judged in the context of a research project’s discipline. Adjusted \(R^2\) equals a healthy 0.593.

  • Predictive relevance (\(Q^2\)): This statistic is obtained by the sample re-use technique called ‘Blindfolding’, where the omission distance is set between 5 and 10 such that the number of observations divided by the omission distance is not an integer [3]. For example, if you select an omission distance of 7, then every seventh data point is omitted and parameters are estimated with the remaining data points. Estimated parameters help predict the omitted data points, and the difference between the actual omitted data points and predicted data points becomes the input to the calculation of \(Q^2\). Blindfolding is applied only to endogenous constructs with reflective indicators. If \(Q^2\) is larger than zero, it is indicative of the path model’s predictive relevance in the context of the endogenous construct and the corresponding reflective indicators. While \(Q^2\) is small at 0.112, it is larger than zero.

  • Assessing the relative impact of predictive relevance (\(q^2\)): Following from the above analysis of predictive relevance, \(q^2\) effect size can be calculated by excluding the exogenous constructs one at a time (Hair et al. [19], p.183). According to Hair, Ringle and Sarstedt [38] and Hair et al. [19], effect size of 0.02 is considered small, 0.15 is moderate and 0.35 is large. Excluding the low-level care and the high-level care constructs one at a time results in \(q^2\) values of −0.0045 and 0.1644, respectively. Clearly, the effect size of the low-level care unit is very small.

  • Assessing the effect sizes (\(f^2\)): This statistic measures the importance of the exogenous construct(s) in explaining the endogenous construct and it re-calculates \(R^2\) by omitting one exogenous construct at a time. Again, effect size of 0.02 is small, 0.15 is moderate and 0.35 is large. \(f^2\) equals 1.353 for the high-level care exogenous construct and a rather low 0.038 for the low-level care exogenous construct.

  • Significance of path coefficients: Bootstrapping is needed, following which p-values for the path coefficients are checked. For the high-level care construct the p-value is 0.000 and for the low-level care construct the p-value is insignificant at 0.424.
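
Two of the structural-model checks above lend themselves to a brief Python sketch: choosing a blindfolding omission distance, and computing and labeling effect sizes. The effect-size function uses the commonly cited form (included minus excluded, divided by one minus included), which applies to \(R^2\) for \(f^2\) and to \(Q^2\) for \(q^2\). All function names are hypothetical, and the label for values under 0.02 is an assumption added for illustration. The effect sizes passed to the labeling function are the ones reported above.

```python
def valid_omission_distances(n_observations, low=5, high=10):
    """Omission distances between low and high (inclusive) for which
    n_observations / distance is not an integer."""
    return [d for d in range(low, high + 1) if n_observations % d != 0]


def effect_size(included, excluded):
    """f^2 (from R^2 values) or q^2 (from Q^2 values): the change in the
    statistic when one exogenous construct is omitted, scaled by the
    unexplained share of the full model."""
    return (included - excluded) / (1.0 - included)


def effect_label(value):
    """Thresholds quoted in the text: 0.02 small, 0.15 moderate, 0.35 large.
    The label for values below 0.02 is an illustrative assumption."""
    if value >= 0.35:
        return "large"
    if value >= 0.15:
        return "moderate"
    if value >= 0.02:
        return "small"
    return "below small"


# N = 100 in the illustrative sample.
print(valid_omission_distances(100))  # [6, 7, 8, 9]

# Effect sizes reported in the text.
for name, value in [("f2 high-level care", 1.353),
                    ("f2 low-level care", 0.038),
                    ("q2 high-level care", 0.1644),
                    ("q2 low-level care", -0.0045)]:
    print(name, effect_label(value))
```

For a sample of 100, omission distances of 5 and 10 divide the sample evenly and are therefore excluded, which leaves 6 to 9 as admissible choices.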

The above analysis indicates removal of indicators ARCS(LLC), RN-ALS(HLC), ARCS(HLC), ALOS and MR to improve the model parameters. Following deletion, while the path coefficient of the low-level care construct remains insignificant, other parameters improve significantly (see Fig. 3). For example, composite reliability rises from a low 0.398 to 0.665 (in the acceptable range); AVE (convergent validity) moves from a low 0.273 to a healthy 0.543; VIF (collinearity) is equally acceptable at 1.025; adjusted \(R^2\) (predictive accuracy) is similar at 0.588; and \(Q^2\) (predictive relevance) more than doubles from 0.112 to 0.280.

Fig. 3 PLS-SEM analysis of overall quality of care in residential aged care networks based on the reduced model (see legend of Fig. 1)

Next, I describe robustness testing to confirm the main findings of the PLS-SEM analysis.

4.4 Robustness testing

Generalized structured component analysis (GSCA) was introduced by Hwang and Takane [39, 40] as an alternative to PLS-SEM. I apply GSCA as a robustness test because it belongs to the same family of methods. Both PLS-SEM and GSCA are variance-based methods appropriate for predictive modeling and they substitute components for factors.

GSCA maximizes the average or the sum of explained variances of linear composites, where latent variables are determined as weighted components or composites of observed variables. GSCA follows a global least squares optimization criterion, which, in turn, is minimized to generate the model parameter estimates. GSCA is not scale-invariant and it standardizes data. GSCA is supposed to retain the advantages of PLS-SEM such as fewer restrictions on distributional assumptions (i.e. multivariate normality of observed variables is not required for parameter estimation), unique component score estimates, and avoidance of improper solutions with small samples [39, 41].

Regarding model specification, GSCA has one equation while PLS-SEM has two equations, and GSCA uses a global optimization function in parameter estimation with least squares (see Tables 1 in [41, 42]). As Marcoulides, Chin and Saunders ([43], p.174) clearly point out, “…comparison of PLS to other methods cannot and should not be applied indiscriminately.” I re-state that CB-SEM is not a feasible or meaningful alternative to PLS-SEM under the conditions of the current study, where the sample size is relatively small, formative indicators are present and the theorized model is exploratory.

I use the web-based GSCA software GeSCA (http://www.sem-gesca.org/) for robustness testing of the reduced model. As can be seen in Table 1, the main PLS-SEM results are confirmed by GSCA. For example, the AVE values are close to each other; outer loadings are of similar magnitude across the two reflective indicators; the same path coefficient is statistically significant in the structural model; and the coefficients of determination are very close to each other.

In summary, based on PLS-SEM and GSCA, the H1 hypothesis (i.e. the contribution of low-level care) outlined in section 3 is rejected, while the H2 hypothesis (i.e. the contribution of high-level care) is accepted. In terms of PLS-SEM analysis, rejection of the H1 hypothesis emerges from multiple tests. For example, in section 4.2 under convergent validity, it is pointed out that the low-level care unit makes a small contribution to the overall quality of care (the path coefficient in Fig. 2 is only 0.126). Similarly in section 4.3, \(q^2\) and \(f^2\) are very low at −0.0045 and 0.038 respectively, and following bootstrapping this path coefficient emerges as insignificant at a p-value of 0.424.

The main findings based on the illustrative example of residential aged care networks follow in the final section of the paper, with potential examples of PLS-SEM applications in health care, and a further discussion of PLS-SEM vs CB-SEM.

5 Concluding remarks

The reduced model of PLS-SEM shows that close to 59% of the variation in overall quality of care in residential aged care networks can be explained by a parsimonious model of five formative indicators (where high-level care plays a greater role), and two reflective indicators. The illustration outlines the steps involved in a thorough analysis and highlights the solutions to problems encountered during analysis. Furthermore, the article provides key references from outside the health care literature that are often overlooked by health care articles. Some of the potential applications of PLS-SEM in health care include wellness of staff and patients, effectiveness of management, customer service quality, impact of health information technology on nurses, psychological ownership, achievement of multiple strategic goals at hospitals, patient engagement and safety, job satisfaction of physicians, and so on. PLS-SEM can become an indispensable tool for managers, policy makers and regulators in the health care sector.

Differing viewpoints exist in literature regarding PLS-SEM and CB-SEM. In summary, selecting between PLS-SEM and CB-SEM is dependent on data characteristics, sample size, the types and numbers of latent constructs modelled (i.e. reflective versus formative and complexity of the model), and the nature of the underlying theory in terms of exploratory versus advanced. In the presence of multiple considerations, PLS-SEM and CB-SEM should be treated as complementary rather than competing methods.

A critical examination of PLS-SEM versus CB-SEM can be found in Rigdon [44]. The author disassembles some of the myths perpetuated by each method’s followers. Rigdon maintains that regardless of the method employed, if the sample size is small, the best approach is to collect more data. The author further points out that both SEM methods form proxies (rather than conceptual variables) out of data. Such proxies need to be well-founded representations of conceptual variables to generate valid findings. Statistical methods’ performance is lower when there is misspecification. Rigdon also maintains that absence of a reliable measurement of error in PLS-SEM is not a valid protest to its application because neither PLS-SEM nor CB-SEM can remove the impact of measurement error on findings.

By examining the use of SEM, Richter et al. [45] conclude that PLS-SEM is not fully utilized in the theorizing process. The authors underline that exploring is the first step in theory building where hypotheses are developed. They also highlight that PLS-SEM can be used for prediction and exploration in complex models where assumptions on data are relaxed. Some of the findings by Richter et al. [45] indicate poor compliance with basic PLS-SEM guidelines such as (a) using a holdout sample, (b) fully reporting the distribution of data, (c) substantiating measurement mode, (d) detailing contribution of indicators, (e) identifying collinearity, and (f) reporting effect sizes. The authors acknowledge that PLS-SEM is helpful in identifying relationships between constructs and explaining them. Richter et al. [45] recommend that a study’s purpose and theoretical basis should be the main selection criteria between PLS-SEM and CB-SEM. In other words, if the primary aim is theory development, then PLS-SEM is better suited and other issues such as sample size, distributional assumptions etc. should be of secondary concern.