
1 Introduction

The intensive alteration of the natural environment by human beings has posed challenges to natural and socio-economic systems (Munda and Saisana 2011). Global warming has caused large economic losses to the agricultural sector in several countries, including China, over the past decade (Chen et al. 2016). Air pollution has severe negative health effects, especially for vulnerable groups such as the elderly, infants and children (He et al. 2016). Some pollution reduction mandates issued by central and local governments have also triggered severe unintended consequences. For example, owing to the "polluting thy neighbor" effect, the most downstream county of a province in China has had up to 20% more water-polluting activity than its counterparts since 2001 (Cai et al. 2016). Undoubtedly, problems resulting from climate change, environmental pollution, the depletion of natural resources and other factors have been threatening the development of our society (Tilman et al. 2002). Growing concern has been voiced in the scientific community and in policy circles about how human beings should interact with nature and how they are responsible to future generations in a sustainable way (Baumgärtner and Quaas 2010). Indeed, many efforts and initiatives towards sustainable development are under way in our society. However, whether these activities are adequate for pursuing sustainable development remains questionable (Sala et al. 2015). To provide a scientific basis for combating climate change and avoiding unanticipated consequences, the status of sustainability should be evaluated in a solid and reliable manner, to assess whether the goal of development that "meets the needs of the present without compromising the ability of future generations to meet their own needs" is gradually being realized (WCED 1987).

Theoretically, sustainability is a multi-dimensional (e.g. economic, social and ecological) concept encompassing internal relationships between its dimensions, which makes sustainability assessment difficult (Mayer 2008). In addition, issues such as the multiple interpretations of the concept, the determination of boundaries and measurability raise further concerns about the reliability of sustainability assessment (Hák et al. 2012). Different methods have been introduced into sustainability assessment, e.g. indicators, product-based assessment and integrated assessment (Ness et al. 2007). Owing to desirable properties such as simplicity, quantifiability and the timely identification of trends, the indicator approach has attracted considerable attention in the environmental and ecological economics literature (Díaz-Balteiro and Romero 2004). At the end of the last century, the United Nations called for the development of indicators of sustainable development to provide an analytical foundation for policy analysis and decision making at different levels (UNCED 1992). Since then, various indicators have been developed, e.g. the Ecological Footprint (Wackernagel and Rees 1998), the Environmental Sustainability Index (Esty et al. 2005), the Human Development Index (UNDP 2014), the Environmental Policy Stringency Index (Botta and Kozluk 2014), the World Energy Trilemma Index (WEC 2016), the Oxford Sustainability Index (OCC 2016) and the Environmental Performance Index (Hsu et al. 2016). According to Zhou and Ang (2008), existing sustainability indicators may broadly be grouped into non-composite and composite indicators. Non-composite indicators usually take the form of a set of indicators or an integrated indicator. The composite indicator approach aggregates various indicators into a single real-valued score that represents an entity's sustainability performance. As Nardo et al. (2008) argued, composite indicators reduce the visible size of a set of indicators and are easier to interpret than a set of individual indicators. Hence, composite indicators have recently gained much popularity in sustainability assessment. Hereafter, for convenience, we refer to composite indicators for sustainability assessment as composite sustainability indicators (CSI).

The reliability of a CSI depends heavily on the methods used to construct it. Over the past decades, scholars have developed alternative methods for constructing CSI; see, for example, van den Bergh and Veen-Groot (2001), Cherchye and Kuosmanen (2004), Díaz-Balteiro and Romero (2004), Munda (2005), Despotis (2005a, b), Zhou et al. (2007) and Zanella et al. (2015). In parallel, Ebert and Welsch (2004) showed how to construct a meaningful environmental index from a social choice perspective. Zhou et al. (2006a) proposed an information loss criterion for comparing different aggregation functions. More recently, Pollesch and Dale (2015, 2016) investigated the application of aggregation theory and normalization methods to sustainability assessment. Zhou et al. (2017) further examined the meaningfulness of composite environmental indices and showed that a cardinally meaningful composite indicator can be constructed by a nonparametric frontier approach. Several scholars have also reviewed past CSI studies with an emphasis on their theoretical and conceptual developments, e.g. Parris and Kates (2003), Ness et al. (2007) and Mori and Christodoulou (2012). As pointed out by Mayer (2008), identifying the bias introduced by method choice plays a significant role in improving the usefulness of CSI for supporting policy making. Böhringer and Jochem (2007) highlight the significance of scientifically sound methods for normalization, weighting and aggregation in building meaningful CSI. The purpose of this chapter is to provide a systematic literature review of methodological developments in constructing CSI. We expect such a review to provide not only a sketch of the mainstream methods with their strengths and weaknesses but also useful insights into the choice of an appropriate method for constructing CSI in various application scenarios.

The rest of this chapter is organized as follows. Section 2 describes the generic procedure for constructing CSI. Sections 3 and 4 review the most popular or promising methods for constructing CSI, namely MADM methods and BOD methods, respectively. We then discuss the influential factors and principles in method choice at the normalization and aggregation stages. The last section concludes this study with a discussion of potential future research topics.

2 Generic Procedure of Constructing CSI

The construction of CSI starts from the determination of a set of indicators for the entities whose sustainability performance is to be evaluated. This information may be represented by the performance matrix X in Eq. (1), which covers m entities and n indicators.

$$\begin{aligned} X & = \left[ {\begin{array}{*{20}c} {x_{11} } & \cdots & {x_{1n} } \\ \vdots & \ddots & \vdots \\ {x_{m1} } & \cdots & {x_{mn} } \\ \end{array} } \right]\quad \left( {m,n \ge 2} \right) \\ W & = \left[ {\begin{array}{*{20}c} {w_{1} } & \cdots & {w_{n} } \\ \end{array} } \right] \\ \end{aligned}$$
(1)

In Eq. (1), \(x_{ij}\) refers to the performance value of entity i with respect to indicator j, and W is a weight vector in which \(w_{j}\) denotes the weight assigned to indicator j. The indicators for sustainability performance evaluation are usually measured in different units. To aggregate individual indicators into a CSI, some aggregation methods require each indicator to be made dimensionless by a transformation function, i.e. \(V = v\left( X \right)\). We denote the performance matrix after normalization by

$$V = \left[ {\begin{array}{*{20}c} {v_{11} } & \cdots & {v_{1n} } \\ \vdots & \ddots & \vdots \\ {v_{m1} } & \cdots & {v_{mn} } \\ \end{array} } \right]\quad \left( {m,n \ge 2} \right)$$
(2)

Once the weight vector W is determined, different aggregation techniques may be used to combine individual indicators into a real-valued CSI. In general, the aggregation techniques aim to:

  • Seek a function \(r_{i} = f_{1} \left( {X\;({\text{or}}\;V),W} \right)\) that provides sustainability performance rankings of entities, and/or

  • Seek a function \(u_{i} = f_{2} \left( {X\;({\text{or}}\;V),W} \right)\) that provides a sustainability performance index for each entity.

As described above, constructing a CSI from a set of pre-defined indicators mainly involves normalizing the indicators, assigning indicator weights and searching for an appropriate aggregation function. In the literature, various methods have been used for these three steps, and they might broadly be classified into two categories. One is based on multi-attribute decision making (MADM), and the other on benefit of the doubt (BOD), a data envelopment analysis (DEA)-like approach (see Fig. 1). Although BOD methods in the broad sense can be regarded as MADM methods, the two differ in several respects. For example, in MADM all entities share a common set of weights, whereas BOD methods usually seek, for each entity, the set of weights most favorable to that entity. Besides, CSI based on BOD methods may not require normalization.

Fig. 1 Classification of the methods for constructing CSI

3 MADM Methods

MADM is a well-established methodology that aims to make choices under multiple conflicting criteria more explicit, rational and efficient (Yoon and Hwang 1995). MADM methods can be involved at every stage of constructing CSI. In what follows, we describe these stages with a focus on the methods used, starting with normalization and followed by aggregation, through which the importance of indicator weighting is highlighted and then discussed.

3.1 Normalization Methods

The underlying indicators for assessing sustainability are generally measured in different units and have distinct ranges or scales (Mayer 2008). Normalization is therefore often performed to make the underlying indicators comparable (Nardo et al. 2008). The normalization methods commonly used in constructing CSI fall into three categories, namely standard deviation from the mean (i.e. z-score), distance to a reference, and distance from the best and worst performers (i.e. re-scaling). Table 1 describes the three normalization methods.

Table 1 The commonly used normalization methods

The z-score measures how far the value of a sustainability indicator for an entity lies from the mean of that indicator across all entities, in units of standard deviation. Because every normalized indicator has zero mean, the z-score avoids introducing aggregation distortions stemming from differences in indicator means (Freudenberg 2003). A positive (negative) value gives the number of standard deviations by which the entity lies above (below) the mean. The z-score thus transforms the original variables onto a common scale. These desirable characteristics make it a frequent choice for normalization; see, for example, Floridi et al. (2011).
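A minimal Python sketch of z-score normalization for a single indicator, assuming the indicator's values across the m entities are given as a list. The function name `z_score` is hypothetical, and the choice of the population (rather than sample) standard deviation is an assumption.

```python
from statistics import mean, pstdev


def z_score(column):
    """Normalize one indicator (its values across all entities) to z-scores.

    Uses the population standard deviation; the sample standard deviation
    (statistics.stdev) is an equally common choice.
    """
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        raise ValueError("indicator has zero variance; z-score is undefined")
    return [(x - mu) / sigma for x in column]


# Hypothetical values of one indicator for four entities.
print(z_score([2.0, 4.0, 4.0, 6.0]))
```

Each normalized column has mean zero and unit standard deviation, which is what removes differences in indicator means and spreads before aggregation.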

Distance to a reference normalizes the underlying indicators by measuring the distance of an entity to a reference point. When the sustainability indicators are measured on a ratio scale and this method is used for normalization, the CSI obtained by simple additive weighting aggregation is meaningful (Ebert and Welsch 2004). In practice, the reference point must first be determined; it could be the leader among the entities (Zhou et al. 2006a) or an external benchmark (Nardo et al. 2008). A popular practice is to use a base year as the reference so that the sustainability performance of entities can be monitored dynamically. Examples of such studies can be found in Kang (2002), Kang et al. (2002), Krajnc and Glavič (2005) and Cherchye et al. (2007a). In this case, meaningful comparisons over time are also possible when panel data are involved (Cherchye et al. 2007a).
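A sketch of distance-to-a-reference normalization, assuming the ratio form \(v = x / x_{\text{ref}}\) (a difference-based form, \((x - x_{\text{ref}})/x_{\text{ref}}\), is also used in practice). Here the base year serves as the reference, as in the monitoring applications just mentioned; the function name and the data are hypothetical.

```python
def distance_to_reference(values, reference):
    """Normalize indicator values as ratios to a reference value.

    The reference might be the leading entity, an external target, or the
    entity's own base-year value; a result of 1 means "equal to the reference".
    """
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return [x / reference for x in values]


# Hypothetical panel: one indicator for one entity over four years,
# normalized with the first (base) year as the reference.
series = [50.0, 55.0, 60.0, 45.0]
print(distance_to_reference(series, series[0]))  # [1.0, 1.1, 1.2, 0.9]
```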

The re-scaling method transforms the original indicators into the dimensionless range [0, 1] by using the global maximum and minimum of each indicator. One well-known example is the Human Development Index. Other examples using the re-scaling method can be found in Neumayer (2001), Díaz-Balteiro and Romero (2004), Hajkowicz (2006), Gómez-Limón and Riesgo (2009), and Gómez-Limón and Sanchez-Fernandez (2010).
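The re-scaling (min-max) method can be sketched as below. The `higher_is_better` flag, which reverses the scale for indicators where smaller values are preferable (e.g. emissions), is an assumption added for illustration, as is the function name.

```python
def min_max(column, higher_is_better=True):
    """Re-scale one indicator to [0, 1] using its minimum and maximum.

    For "cost" indicators the scale is reversed so that 1 always marks
    the best performer and 0 the worst.
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        raise ValueError("indicator is constant; re-scaling is undefined")
    if higher_is_better:
        return [(x - lo) / (hi - lo) for x in column]
    return [(hi - x) / (hi - lo) for x in column]


print(min_max([10.0, 20.0, 30.0]))                          # [0.0, 0.5, 1.0]
print(min_max([10.0, 20.0, 30.0], higher_is_better=False))  # [1.0, 0.5, 0.0]
```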

3.2 Aggregation Methods

The aggregation process involves the search for an appropriate function that incorporates multiple indicators into a single composite indicator. Many aggregation functions are available in the literature, and most of them can be represented by the following weighted power mean:

$$CI_{i} = \left\{ \begin{aligned} & \left[ {\mathop \sum \limits_{j = 1}^{n} w_{j} \left( {v_{ij} } \right)^{\beta } } \right]^{{\frac{1}{\beta }}} \quad {\text{for}}\quad \beta \ne 0 \\ & \mathop \prod \limits_{j = 1}^{n} v_{ij}^{{w_{j} }} \quad {\text{for}}\quad \beta = 0 \\ \end{aligned} \right.$$
(3)

Assigning different values to the parameter \(\upbeta\) yields different forms of the aggregation function. Note that the value of \(\upbeta\) affects the trade-offs between indicators: the smaller \(\upbeta\), the less a poor value on one indicator can be compensated by good values on others. More discussion can be found in Decancq and Lugo (2013). Table 2 shows several aggregation functions that are often used in constructing CSI.
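To see how \(\upbeta\) governs compensation, the following sketch evaluates Eq. (3) for one entity under several values of \(\upbeta\). It assumes strictly positive normalized values and weights summing to one; the function name `power_mean` is hypothetical. As \(\upbeta\) decreases, the index is pulled towards the entity's worst indicator value, approaching \(\min_{j} v_{ij}\) as \(\upbeta \to -\infty\).

```python
import math


def power_mean(v, w, beta):
    """Weighted power mean of Eq. (3) for one entity.

    Assumes strictly positive normalized values v and weights w summing to 1.
    beta = 1 gives SAW and beta = 0 gives the weighted product.
    """
    if len(v) != len(w):
        raise ValueError("v and w must have the same length")
    if any(x <= 0 for x in v):
        raise ValueError("values must be strictly positive")
    if beta == 0:
        return math.prod(x ** wj for x, wj in zip(v, w))
    return sum(wj * x ** beta for x, wj in zip(v, w)) ** (1.0 / beta)


v, w = [0.2, 0.8], [0.5, 0.5]  # one weak and one strong indicator
for beta in (1, 0, -1, -50):
    print(beta, round(power_mean(v, w, beta), 4))
```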

Table 2 Several popular aggregation methods

3.2.1 Simple Additive Weighting

When \(\upbeta = 1\), Eq. (3) reduces to simple additive weighting (hereafter SAW). In the context of constructing CSI, SAW is perhaps the most commonly used aggregation function, e.g. Kang (2002), Kang et al. (2002), Krajnc and Glavič (2005), Esty et al. (2005), Hajkowicz (2006), Singh et al. (2009), Murillo et al. (2015), and the Global Warming Potential (IPCC 2001). SAW is easy to understand and makes the relative contribution of each indicator to the CSI visible. Since SAW assumes preferential independence among indicators, which may not hold in practice, statistical techniques such as principal component analysis (PCA) and factor analysis (FA) are often applied before aggregation (Grupp and Schubert 2010). Additionally, SAW allows full substitutability between indicators, so that the weights act as trade-off ratios; this is inconsistent with their interpretation as importance coefficients in many earlier studies (Munda and Nardo 2009). From a practical point of view, this characteristic of SAW is undesirable since it violates the spirit of sustainable development (Ayres et al. 1998).
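The full compensability of SAW can be illustrated with a small hypothetical example: an entity that is average on every indicator and an entity that is best on one indicator but worst on another receive the same score. The sketch, whose function name is hypothetical, also returns the per-indicator contributions \(w_{j} v_{ij}\), which is what makes SAW easy to interpret.

```python
def saw(v, w):
    """Simple additive weighting: CI = sum_j w_j * v_ij, plus each indicator's contribution."""
    contributions = [wj * x for x, wj in zip(v, w)]
    return sum(contributions), contributions


w = [1 / 3, 1 / 3, 1 / 3]
balanced = [0.5, 0.5, 0.5]   # average on every indicator
lopsided = [1.0, 0.5, 0.0]   # best on one indicator, worst on another
print(saw(balanced, w)[0], saw(lopsided, w)[0])  # both approximately 0.5
```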

3.2.2 Weighted Product Method

Setting \(\upbeta = 0\) in Eq. (3) gives the weighted product (WP) method. Although WP is not widely applied in constructing CSI, it has attracted much attention owing to several desirable characteristics, e.g. its semi-compensatory property (Nardo et al. 2008), its meaningfulness for ratio-scale indicators (Ebert and Welsch 2004; Böhringer and Jochem 2007), and its smaller information loss (Zhou and Ang 2009). In practice, the United Nations Development Programme has replaced SAW with WP in constructing the HDI, a change that may have been influenced by these earlier studies, as discussed by Tofallis (2013).

Owing to its exponential form, WP requires all ratings to be greater than one (Yoon and Hwang 1995). Unlike SAW, WP does not make the relative contribution of each individual indicator to the CSI visible. Furthermore, the results usually have no numerical upper bound. The first requirement can be met by multiplying the ratings by \(10^{l}\) for a suitable l. For the lack of an upper bound, Yoon and Hwang (1995) suggested comparing each entity with the ideal entity as follows:

$$R_{i} = \frac{{\mathop \prod \nolimits_{j = 1}^{n} v_{ij}^{{w_{j} }} }}{{\mathop \prod \nolimits_{j = 1}^{n} \left( {v_{j}^{*} } \right)^{{w_{j} }} }}\quad \left( {i = 1, \ldots ,m} \right)$$
(4)

where \(v_{j}^{*}\) is the best value of the jth indicator. It is clear that \(0 \le R_{i} \le 1\), with larger values indicating more sustainable entities and \(R_{i} = 1\) corresponding to the ideal entity.
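A sketch of WP aggregation and of the relative score \(R_{i}\) of Eq. (4), assuming strictly positive values for which larger is better, so that the ideal entity takes the column-wise maximum of each indicator. The function names and data are hypothetical.

```python
import math


def weighted_product(v, w):
    """Weighted product (geometric) aggregation: prod_j v_j ** w_j."""
    return math.prod(x ** wj for x, wj in zip(v, w))


def relative_to_ideal(matrix, w):
    """Eq. (4): each entity's WP score divided by the WP score of the ideal entity."""
    n = len(w)
    ideal = [max(row[j] for row in matrix) for j in range(n)]  # best value per indicator
    ideal_score = weighted_product(ideal, w)
    return [weighted_product(row, w) / ideal_score for row in matrix]


# Hypothetical data: three entities, two indicators, equal weights.
X = [[8.0, 8.0], [2.0, 8.0], [4.0, 4.0]]
print(relative_to_ideal(X, [0.5, 0.5]))  # approximately [1.0, 0.5, 0.5]
```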

3.2.3 Weighted Displaced Ideal

The weighted displaced ideal (WDI) method is based on ideal solution theory, which calculates the distance between the normalized values of each entity and those of the "ideal" entity (Zeleny and Cochrane 1981). This concept has been further generalized by Díaz-Balteiro and Romero (2004) to provide solutions of "total compensability" and of "total non-compensability" among the sustainability indicators, as well as a compromise set of solutions between these two extreme cases. In the limit \(\upbeta \to - \infty\), Eq. (3) tends to \(\min_{j} v_{ij}\) (for positive weights), a form in which substitutability between indicators is prohibited; its weighted counterpart \(CI_{i} = \min_{j} w_{j} v_{ij}\) represents this non-compensatory extreme. To reach a balanced evaluation, Díaz-Balteiro and Romero (2004) introduced a parameter \(\uplambda\) representing the degree of substitutability between indicators. When \(\uplambda = 0\), indicators are assumed to be non-compensatory. When \(\uplambda = 1\), the WDI method reduces to SAW, which assumes full compensability. For \(0 < \uplambda < 1\), partial compensation between indicators is allowed. This aggregation function has also attracted increasing attention, e.g. Zhou et al. (2006a), Zhou and Ang (2009), Gómez-Limón and Riesgo (2009), Blancas et al. (2010), Gómez-Limón and Sanchez-Fernandez (2010), and Pollesch and Dale (2015).
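A minimal sketch of this compromise, assuming the form \(CI_{i} = (1 - \uplambda)\min_{j} w_{j} v_{ij} + \uplambda \sum_{j} w_{j} v_{ij}\), which reproduces the two extremes described above (\(\uplambda = 0\) non-compensatory, \(\uplambda = 1\) SAW). Treat this exact functional form as an assumption rather than a transcription of Díaz-Balteiro and Romero (2004); the function name is hypothetical.

```python
def wdi_compromise(v, w, lam):
    """Compromise between the non-compensatory and fully compensatory extremes.

    lam = 0 -> min_j w_j v_ij   (no substitutability between indicators)
    lam = 1 -> sum_j w_j v_ij   (SAW, full substitutability)
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    weighted = [wj * x for x, wj in zip(v, w)]
    return (1.0 - lam) * min(weighted) + lam * sum(weighted)


v, w = [0.9, 0.1], [0.5, 0.5]  # hypothetical unbalanced entity
for lam in (0.0, 0.5, 1.0):
    print(lam, round(wdi_compromise(v, w, lam), 3))
```

Raising \(\uplambda\) lets the strong first indicator increasingly offset the weak second one.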

3.2.4 Social Multi-criteria Evaluation Method

Social multi-criteria evaluation (SMCE), introduced by Munda (2005), is a non-compensatory technique that ranks entities through a Condorcet-type aggregation procedure. SMCE aims to improve the quality of composite indicators by overcoming two technical weaknesses, concerning the independence between indicators and the meaningfulness of weights (Munda and Nardo 2003). Once the weights are determined, SMCE obtains the overall sustainability ranking in two steps. First, an outranking matrix is built by pairwise comparison of the entities. Each element \(e_{ik}\) (\(i \ne k\)) of the matrix is the sum of the weights of the indicators on which entity i performs better than entity k, plus half of the weight of each indicator on which the two entities are indifferent. This can be expressed as

$$e_{ik} = \sum\limits_{j = 1}^{n} {\left[ {w_{j} \left( {P_{ik} } \right) + \frac{1}{2}w_{j} \left( {I_{ik} } \right)} \right]} \quad \left( {i \ne k} \right)$$
(5)

where \(w_{j} \left( {P_{ik} } \right)\) equals \(w_{j}\) if entity i is preferred to entity k with respect to indicator j and 0 otherwise, and \(w_{j} \left( {I_{ik} } \right)\) equals \(w_{j}\) if the two entities are indifferent with respect to indicator j and 0 otherwise. In this step, \(m\left( {m - 1} \right)\) pairwise comparisons are required. The second step sums up the relevant scores to obtain a complete pre-order of the entities. For instance, to rank three entities E1, E2 and E3, all possible permutations are considered, i.e. E1E2E3, E1E3E2, E2E1E3, E2E3E1, E3E1E2 and E3E2E1. The score of each permutation is the sum of \(e_{ik}\) over all pairs in which entity i is placed ahead of entity k, and the permutation with the highest score determines the ranking of the entities.
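The two SMCE steps can be sketched in Python as follows. This is a minimal illustration, assuming that larger values are better and treating exact ties as indifference (a full SMCE application may use indifference thresholds instead); the function names are hypothetical. Enumerating all m! permutations, as done here, is practical only for a handful of entities.

```python
from itertools import permutations


def outranking_matrix(X, w):
    """Step 1, Eq. (5): e[i][k] = weight of indicators where i beats k, plus half of ties."""
    m, n = len(X), len(w)
    e = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for k in range(m):
            if i == k:
                continue
            for j in range(n):
                if X[i][j] > X[k][j]:
                    e[i][k] += w[j]
                elif X[i][j] == X[k][j]:
                    e[i][k] += 0.5 * w[j]
    return e


def smce_ranking(X, w):
    """Step 2: score every permutation and keep the best one (first found on ties)."""
    e = outranking_matrix(X, w)
    best_order, best_score = None, float("-inf")
    for order in permutations(range(len(X))):
        # Sum e[i][k] over all pairs where entity i is ranked ahead of entity k.
        score = sum(e[order[a]][order[b]]
                    for a in range(len(order)) for b in range(a + 1, len(order)))
        if score > best_score:
            best_order, best_score = order, score
    return best_order, best_score


# Hypothetical data: three entities, two indicators, weights 0.6 and 0.4.
print(smce_ranking([[3, 1], [1, 3], [2, 2]], [0.6, 0.4])[0])  # (0, 2, 1)
```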

Compared with the aggregation methods described above, SMCE requires a large amount of computation, especially in the second step, since the number of permutations grows factorially with the number of entities. The information contained in the final result, a ranking rather than a score, may also be limited. Nevertheless, SMCE provides a novel framework for assessing sustainability. When CSI are derived from SMCE, the only subjective element is the determination of indicator weights. This property reduces the uncertainty in constructing CSI and eases the burden of sensitivity analysis. Besides, SMCE is a fully non-compensatory aggregation method, which may better reflect the concept of strong sustainability.

3.3 Weighting Methods

From the previous section, it is clear that indicator weights and aggregation are closely related. Existing weighting methods can be partitioned into three categories: exogenous (or normative) methods, endogenous (or data-driven) methods and hybrid methods. The main difference between them lies in how much the value judgements of decision makers or experts are involved in determining the weights. Exogenous methods depend mainly on the value judgements of decision makers or experts, and the weights are typically determined through participatory methods. Endogenous methods, on the other hand, rely mainly on the data distribution, letting the data "speak". Hybrid methods attempt to balance the exogenous and endogenous approaches.

3.3.1 Exogenous Methods

Equal weighting, arbitrary weighting and the analytic hierarchy process (AHP) are three frequently used exogenous methods. See, for example, the Ecological Footprint, Hope et al. (1992) and Murillo et al. (2015) for equal weighting, and Kang (2002), Kang et al. (2002) and Krajnc and Glavič (2005) for AHP. Equal weights are usually applied when there is no comprehensive understanding of the entities being evaluated. With improved data collection techniques and extensive research on sustainability, equal weights have gradually been abandoned in constructing CSI. AHP and other methods such as the budget allocation process and conjoint analysis depend heavily on a thorough understanding of how each entity works. The challenge in applying these exogenous methods is the choice of appropriate experts (Decancq and Lugo 2013). Once this problem is properly handled, the reliability of exogenous methods increases significantly. It should be noted that exogenous methods are ex ante approaches, with weights fixed independently of the data, which makes performance comparisons across time and space feasible.
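AHP derives weights from a pairwise comparison matrix A, where \(a_{jk}\) states how much more important indicator j is judged to be than indicator k. The sketch below, assuming a positive reciprocal matrix, approximates the principal eigenvector of A by power iteration and reports Saaty's consistency index \((\lambda_{\max} - n)/(n - 1)\). The function name and the example judgements are hypothetical.

```python
def ahp_weights(A, iterations=100):
    """Principal-eigenvector weights of a pairwise comparison matrix A.

    Returns the normalized weights and the consistency index
    (lambda_max - n) / (n - 1); values near 0 indicate consistent judgements.
    """
    n = len(A)
    if n < 2:
        raise ValueError("need at least two indicators")
    v = [1.0 / n] * n
    for _ in range(iterations):  # power iteration, renormalized to sum to 1
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(Av)
        v = [x / total for x in Av]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(Av[i] / v[i] for i in range(n)) / n
    return v, (lambda_max - n) / (n - 1)


# Hypothetical judgements on Saaty's 1-9 scale for three indicators.
A = [[1, 3, 5],
     [1 / 3, 1, 2],
     [1 / 5, 1 / 2, 1]]
weights, consistency_index = ahp_weights(A)
print([round(x, 3) for x in weights], round(consistency_index, 4))
```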

3.3.2 Endogenous Methods

Statistical weighting and BOD methods are the two major families of endogenous methods for determining the weights in constructing CSI. Statistical weighting methods, e.g. PCA, FA and regression analysis (RA), are based on statistical properties of the data. PCA is a multivariate statistical technique that summarizes the data through a small number of uncorrelated linear combinations of the indicators (principal components). FA assumes that the observed indicators depend on a smaller number of unobserved factors. Although the two methods rest on distinct assumptions, the distinction is usually ignored in practice. Once the principal components are extracted, the factor loading matrix and the eigenvalues of the associated principal components can be calculated. The indicator weights are then given by the ratios of the squared factor loadings to the corresponding eigenvalues. Technical details can be found in Gómez-Limón and Riesgo (2009). Despite their statistical soundness, the weights estimated by PCA or FA do not carry the original meaning of weights, i.e. importance, since these methods measure the overlapping information between two or more correlated indicators (Shen et al. 2013). The RA approach determines weights by multiple regression or linear programming and assumes that each individual indicator depends on an unobserved variable plus an error term. Thus, the RA approach is also referred to as the unobserved components model or the observed derived weight method (Nardo et al. 2008).
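The PCA-based weighting step can be sketched with NumPy as follows. This is a simplified variant assumed for illustration: components with eigenvalues above one are retained, the squared-loading-to-eigenvalue ratios of each retained component are averaged using the components' shares of explained variance, and the result is rescaled to sum to one. Published procedures differ in how the component-level shares are combined, so the function name and this combination rule are hypothetical.

```python
import numpy as np


def pca_weights(X, eigen_threshold=1.0):
    """Indicator weights from PCA on the correlation matrix (simplified sketch)."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)        # correlation matrix of the indicators
    eigvals, eigvecs = np.linalg.eigh(R)    # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > eigen_threshold        # Kaiser-type retention rule (assumption)
    keep[0] = True                          # always retain the first component
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    shares = loadings ** 2 / eigvals[keep]  # squared loading / eigenvalue; columns sum to 1
    explained = eigvals[keep] / eigvals[keep].sum()
    w = shares @ explained
    return w / w.sum()


# Hypothetical data: 200 entities, two correlated indicators and two independent ones.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)),
               base + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 2))])
print(np.round(pca_weights(X), 3))
```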

3.3.3 Hybrid Methods

Besides assessing sustainability, a valuable function of a CSI is to allow comparison among all entities, so that decision makers can detect gaps and then take actions to improve sustainability performance. From this perspective, endogenous methods somewhat fail to serve this function, because their weights depend on the data at hand and, for BOD, differ across entities. Exogenous methods do not suffer from this issue since they do not rely on the data distribution. However, they depend on value judgements that may vary across expert panels. As mentioned by Decancq and Lugo (2013), expert groups might be underrepresented or simply uninformed, resulting in a skewed weighting scheme.

Hybrid methods have been proposed to combine exogenous and endogenous methods. Decancq and Lugo (2013) listed two hybrid methods, namely stated preference weights and hedonic weights. Instead of imposing weights through an expert panel, stated preference weights are based directly on individuals' opinions. Hedonic weights also rely on individuals' self-reported preferences; after the preference matrix is obtained, the weights can be estimated by linear regression. Another example of hybrid weighting is the BOD model with weight restrictions determined by experts.
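A sketch of hedonic weighting, assuming a hypothetical survey in which each respondent's self-reported score (e.g. life satisfaction) is regressed on the indicators by ordinary least squares, and the positive slope coefficients are rescaled to sum to one. The function name and data are hypothetical.

```python
import numpy as np


def hedonic_weights(X, reported_score):
    """Weights from an OLS regression of a self-reported score on the indicators."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(reported_score, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])   # intercept + indicators
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    slopes = coef[1:]
    if np.any(slopes <= 0):
        raise ValueError("non-positive coefficients cannot be read as weights")
    return slopes / slopes.sum()


# Hypothetical respondents whose scores depend twice as much on indicator 1.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] + 1.0 * X[:, 1]
print(np.round(hedonic_weights(X, y), 3))  # approximately [0.667 0.333]
```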

4 BOD Methods

Two problems in constructing CSI with MADM models are the information loss caused by normalization (Zanella et al. 2015) and the subjectivity in determining the weights. The fixed weights of MADM models have also been criticized on the grounds that different cultural and social settings may value the individual indicators in different ways (Cherchye et al. 2008).

Alternatively, as suggested by Lovell et al. (1995) and Lovell (1995), linear programming models can be used to construct a 'best practice' frontier for the entities. The linear programming approach to constructing CSI is usually referred to as benefit of the doubt (BOD). BOD is rooted in DEA, which was originally proposed for evaluating the relative efficiency of a homogeneous set of entities that use multiple inputs to produce multiple outputs. In DEA, the weights of inputs and outputs are determined endogenously from the raw data without price information. Methodologically, BOD-based CSI borrow this idea from DEA for the purposes of weighting and aggregation.

4.1 Basic BOD Model

For each entity, the basic BOD model seeks its most favorable weights (Cherchye et al. 2007b). It can be formulated as follows:

$$\begin{array}{*{20}l} {CI_{i} } \hfill & { = \max \sum\limits_{j = 1}^{n} {w_{j} x_{ij} } } \hfill \\ {s.t.} \hfill & {\sum\limits_{j = 1}^{n} {w_{j} x_{kj} } \le 1,\quad k = 1, \ldots ,m} \hfill \\ {} \hfill & {w_{j} \ge 0,\quad j = 1, \ldots ,n} \hfill \\ \end{array}$$
(6)

Model (6) is equivalent to the input-oriented DEA model under constant returns to scale with a single dummy input equal to one for all evaluated entities. Solving the model m times, once for each entity, yields the optimal aggregated performance values of all entities. Unlike MADM models, BOD assigns weights a posteriori, and the weights of individual indicators may differ between entities. Model (6) has several desirable properties, such as being normalization-free and invariant to ratio-scale transformations (Athanassoglou 2015). Being normalization-free avoids the information loss caused by data transformation, and invariance allows practitioners to aggregate individual indicators into a meaningful composite indicator (Ebert and Welsch 2004). In essence, model (6) measures how far the evaluated entity is from the best practice entity under its most favorable weights (Zhou et al. 2007).
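Model (6) is a small linear program per entity. The sketch below solves it with SciPy's `linprog` (HiGHS solver), assuming nonnegative indicators for which larger is better and each indicator being positive for at least one entity (otherwise the program is unbounded). Because `linprog` minimizes, the objective is negated; the function name `bod_scores` and the data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def bod_scores(X):
    """Solve model (6) once per entity; return scores and optimal weights."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    scores, weights = np.empty(m), np.empty((m, n))
    for i in range(m):
        res = linprog(
            c=-X[i],                    # maximize sum_j w_j x_ij
            A_ub=X, b_ub=np.ones(m),    # sum_j w_j x_kj <= 1 for every entity k
            bounds=[(0, None)] * n,     # w_j >= 0
            method="highs",
        )
        if res.status != 0:
            raise RuntimeError(res.message)
        scores[i], weights[i] = -res.fun, res.x
    return scores, weights


# Hypothetical data: three entities, two indicators.
scores, weights = bod_scores([[1, 2], [2, 1], [1, 1]])
print(scores)  # approximately [1, 1, 0.667]
```

The first two entities both reach the frontier (score 1), each by placing its weight on the indicator it excels in, which anticipates the weight-restriction issue discussed next.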

Model (6) has been used in many application contexts. The earliest application may date back to Mahlberg and Obersteiner (2001), who introduced model (6) to reassess the HDI. Despotis (2005a, b) used an extension of model (6) to re-evaluate the HDI. Cherchye and his collaborators applied the model in several contexts, including sustainable development (Cherchye and Kuosmanen 2004), the internal market (Cherchye et al. 2007a) and technology achievement (Cherchye et al. 2008).

4.2 Weight Restriction in Basic BOD Model

The BOD model brings a new perspective to constructing CSI, but it also has some shortcomings. For example, model (6) only requires the weights to be nonnegative, so all the weight may be assigned to a single indicator. This may be undesirable, since all the selected indicators are theoretically important and should therefore be taken into account (Zhou et al. 2007). Such extreme weights could also open up debate on the credibility and acceptability of the CSI. To overcome these problems, it is appropriate to restrict the weights in certain ways. A straightforward way is to introduce a non-Archimedean infinitesimal \(\upvarepsilon\) as a lower bound on the weights, e.g. Despotis (2005a, b) and Kao (2010). Even with this modification, however, an entity may still be judged to perform well when it excels on only one indicator and performs poorly on the remaining ones (Mahlberg and Obersteiner 2001). Hence, further restrictions on weights are usually imposed in practice.

Broadly speaking, weight restrictions can be classified into two categories, i.e. direct and indirect restrictions (Allen et al. 1997). Direct restrictions on weights can be formulated as Eqs. (7) and (8), which Thompson et al. (1990) termed "Type I Assurance Regions" and "Type II Assurance Regions", respectively. The Greek letters in Eqs. (7) and (8) are specified by decision makers to reflect their preferences regarding the relative importance of indicators, and \(w^{\prime}\) may be a combination of weights. Direct restrictions on weights are used in Mahlberg and Obersteiner (2001), Cherchye and Kuosmanen (2004), and Cherchye et al. (2007b). Indirect restrictions on weights can be formulated as Eq. (9), originally proposed by Wong and Beasley (1990), where \(\phi_{j}\) and \(\varphi_{j}\) also reflect the preferences of decision makers. Rather than restricting the actual weights, Eq. (9) places lower and upper bounds on the relative contribution of each indicator to the entity's aggregate performance value. This restriction has been adopted in many previous studies, e.g. Zhou et al. (2007, 2010), Cherchye et al. (2008), Zanella et al. (2015) and Athanassoglou (2015).

$$\alpha_{j} \le \frac{{w_{j} }}{{w_{j + 1} }} \le \beta_{j}$$
(7)
$$\lambda w^{{\prime }} \le \kappa w_{j} \le \gamma w^{{\prime }}$$
(8)
$$\phi_{j} \le \frac{{w_{j} x_{ij} }}{{\mathop \sum \nolimits_{l = 1}^{n} w_{l} x_{il} }} \le \varphi_{j}$$
(9)
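The indirect restriction of Eq. (9) is a ratio, but multiplying through by the (positive) denominator gives the linear constraints \(\phi_{j} \sum_{l} w_{l} x_{il} - w_{j} x_{ij} \le 0\) and \(w_{j} x_{ij} - \varphi_{j} \sum_{l} w_{l} x_{il} \le 0\). The sketch below adds these pie-share constraints to model (6) using SciPy's `linprog`; the function name and the bounds in the example are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def bod_with_share_bounds(X, lower, upper):
    """Model (6) plus pie-share bounds lower_j <= w_j x_ij / sum_l w_l x_il <= upper_j."""
    X = np.asarray(X, dtype=float)
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    m, n = X.shape
    scores = np.empty(m)
    for i in range(m):
        xi = X[i]
        own = np.diag(xi)                      # row j picks out w_j x_ij
        A_low = np.outer(lower, xi) - own      # lower_j * sum_l w_l x_il - w_j x_ij <= 0
        A_up = own - np.outer(upper, xi)       # w_j x_ij - upper_j * sum_l w_l x_il <= 0
        A_ub = np.vstack([X, A_low, A_up])
        b_ub = np.concatenate([np.ones(m), np.zeros(2 * n)])
        res = linprog(-xi, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * n, method="highs")
        if res.status != 0:
            raise RuntimeError(res.message)
        scores[i] = -res.fun
    return scores


X = [[1, 2], [2, 1], [1, 1]]
print(bod_with_share_bounds(X, lower=[0.4, 0.4], upper=[0.6, 0.6]))
# approximately [0.909, 0.909, 0.667]; without the bounds the first two entities score 1
```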

The above restrictions can not only overcome the aforementioned problems but also introduce value judgements, incorporating prior views or information into the assessment of entity performance. Such prior information can be incorporated by determining the bounds with MADM methods such as AHP and the budget allocation process (BAP), or with social surveys, e.g. Cherchye et al. (2008). Direct restrictions usually incorporate information on marginal rates of substitution between indicators, which are sensitive to the units of measurement (Allen et al. 1997). Consequently, it is often difficult to specify meaningful substitution rates in real-life applications (Zanella et al. 2015). In contrast, indirect restrictions have the desirable property of ratio-scale invariance (Cherchye et al. 2008; Zhou et al. 2007), which is particularly compelling when constructing an environmental performance index (Ebert and Welsch 2004). Furthermore, as Cherchye et al. (2008) discussed, Eq. (9) can be expressed as pie-share constraints, which are pure numbers and easy for decision makers to grasp. Nevertheless, the meaning of Eq. (9) is less straightforward than it appears, since the implied restrictions on weights are entity-specific. Hence, Wong and Beasley (1990) suggested several modifications. One of them, replacing \(x_{ij}\) in Eq. (9) with \(\bar{x}_{j} = \frac{1}{m}\sum\nolimits_{i = 1}^{m} x_{ij}\), the level of the jth indicator for the "average" entity, has also been applied in constructing composite indicators, e.g. Zanella et al. (2015).

4.3 Extensions of Basic BOD Model

Owing to these attractive properties, the BOD model has been extended to address various problems in constructing CSI, e.g. the hierarchy problem, compensability and comparability.

The basic BOD model usually treats all indicators as belonging to a single level and thus ignores the hierarchical structure of the indicator framework. This simplification may be unrealistic, because multi-layer indicator frameworks are constructed precisely to evaluate increasingly complex sustainability performance in a more comprehensive way. According to Becker (2005), most frameworks are hierarchical, extending from broad categories of data and information down to detailed measures. To overcome this limitation, Shen et al. (2013) adapted the basic BOD model to hierarchical indicator systems by specifying weights for each category in each layer. A more straightforward alternative for multi-layer frameworks is to use the basic BOD model first to determine the “best practice” performance of the indicators in a given layer, and then to aggregate the higher-layer indicators with MADM methods; see, for example, Kao et al. (2008).

In addition, because its objective function is linear, the BOD model also faces the compensability issue discussed earlier. Munda and Nardo (2009) argued that composite indicators must be non-compensatory if weights are to be theoretically consistent with their meaning as importance coefficients. To relax full compensability, Zhou et al. (2010) combined the WP aggregation method with the basic BOD model, obtaining a multiplicative optimization approach whose semi-compensatory character yields a compromise solution. Pakkar (2014) proposed a similar model for constructing the Technology Achievement Index. Fusco (2015) introduced directional penalties that strengthen the non-compensatory character of the basic BOD model and take into account the preference structure among indicators. In general, these methods share a similar perspective: they impose heavier penalties on indicators with worse performance.
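The difference between full and partial compensation can be seen with fixed weights. The sketch below is only an illustration under stated assumptions: hypothetical normalised indicator values in (0, 1], hypothetical equal weights summing to one, and no optimisation step, so it does not reproduce the models of Zhou et al. (2010) or Fusco (2015). It simply shows why a multiplicative (WP) form penalises a weak indicator that a linear form lets other indicators offset.

```python
import math


def weighted_sum(values, weights):
    """Linear aggregation: sum_j w_j x_j (fully compensatory)."""
    return sum(w * x for w, x in zip(weights, values))


def weighted_product(values, weights):
    """Weighted product (WP) aggregation: prod_j x_j ** w_j (needs x_j > 0)."""
    return math.prod(x ** w for w, x in zip(weights, values))


weights = [0.5, 0.5]       # hypothetical equal weights
balanced = [0.5, 0.5]      # hypothetical normalised indicator values
unbalanced = [0.95, 0.05]  # excellent on one indicator, very poor on the other

ws_bal, ws_unbal = weighted_sum(balanced, weights), weighted_sum(unbalanced, weights)
wp_bal, wp_unbal = weighted_product(balanced, weights), weighted_product(unbalanced, weights)
print(ws_bal, ws_unbal)  # both 0.5 up to rounding: the poor indicator is fully offset
print(wp_bal, wp_unbal)  # about 0.5 vs 0.218: the poor indicator drags the score down
```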

Being built on the conventional DEA technique, the basic BOD model distinguishes efficient from inefficient entities, in DEA terminology, but is not suitable for ranking entities (Kao 2010), since all efficient entities receive the same maximal score. The main strength of basic DEA models lies in identifying inefficient entities. Hence, many studies have been devoted to improving the comparability of the basic BOD model in the context of composite indicators. For example, taking the opposite perspective to Model (6), Zhou et al. (2007) proposed a model that seeks the “worst” set of weights for each entity, and used an adjusting parameter to combine the “best practice” and “worst practice” scores into a composite indicator. Several studies have adopted this model to construct composite indicators in various contexts, e.g. Domínguez-Serrano and Blancas (2011), Rogge (2012), and Blancard and Hoarau (2013). Athanassoglou (2015) further improved the worst-case variant of the basic BOD model for constructing composite indicators.
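The sketch below illustrates both the tie problem and the best/worst combination for the two-indicator case. Assumptions: all indicator values are strictly positive; the best-practice problem is max w·x_i subject to w·x_k ≤ 1 for all k, and the worst-practice problem is min w·x_i subject to w·x_k ≥ 1 for all k; the final index min-max normalises each score and mixes them with the adjusting parameter `lam`. We believe this matches the structure of Zhou et al. (2007), but the formulas here are a reconstruction, and the helper names (`lp_2d`, `best_and_worst`, `combined_index`) and data are hypothetical. The tiny vertex-enumeration solver only works for two indicators; a real application would use a general LP solver.

```python
from itertools import combinations


def lp_2d(objective, rows, rhs, sense, maximize):
    """Optimise objective . w over w >= 0 with rows[k] . w (<= or >=) rhs[k], two indicators.

    Vertex enumeration: the optimum of a bounded two-variable LP lies at the
    intersection of two constraint lines (the axes w1 = 0 and w2 = 0 included).
    """
    lines = list(zip(rows, rhs)) + [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0)]
    best = None
    for (a, ra), (c, rc) in combinations(lines, 2):
        det = a[0] * c[1] - a[1] * c[0]
        if abs(det) < 1e-12:
            continue  # parallel lines have no unique intersection
        w = ((ra * c[1] - a[1] * rc) / det, (a[0] * rc - ra * c[0]) / det)  # Cramer's rule
        lhs = [r[0] * w[0] + r[1] * w[1] for r in rows]
        feasible = min(w) >= -1e-9 and all(
            v <= t + 1e-9 if sense == "<=" else v >= t - 1e-9 for v, t in zip(lhs, rhs))
        if feasible:
            value = objective[0] * w[0] + objective[1] * w[1]
            if best is None or (value > best if maximize else value < best):
                best = value
    return best


def best_and_worst(data):
    """Best-practice (max) and worst-practice (min) BOD scores; all values must be > 0."""
    rhs = [1.0] * len(data)
    best = [lp_2d(x, data, rhs, "<=", maximize=True) for x in data]
    worst = [lp_2d(x, data, rhs, ">=", maximize=False) for x in data]
    return best, worst


def combined_index(best, worst, lam=0.5):
    """Min-max normalise both scores, then mix them with the adjusting parameter lam."""
    g_lo, g_hi, b_lo, b_hi = min(best), max(best), min(worst), max(worst)
    return [lam * (g - g_lo) / (g_hi - g_lo) + (1 - lam) * (b - b_lo) / (b_hi - b_lo)
            for g, b in zip(best, worst)]


data = [(4.0, 1.0), (1.0, 4.0), (2.0, 2.0), (3.0, 3.0)]  # hypothetical entities E1..E4
best, worst = best_and_worst(data)
ci = combined_index(best, worst)
print([round(v, 3) for v in best])   # [1.0, 1.0, 0.667, 1.0]: three-way tie at the top
print([round(v, 3) for v in worst])  # [1.0, 1.0, 1.0, 1.5]
print([round(v, 3) for v in ci])     # [0.5, 0.5, 0.0, 1.0]: E4 now ranks first alone
```

In this toy case the best-practice scores alone cannot separate E1, E2 and E4, while the worst-practice scores single out E4 as furthest from the worst frontier, so the combined index breaks the tie.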

The concept of common weights has also been applied to enhance the comparability of the basic BOD model. Its basic idea is that all entities are scored against the same benchmark, i.e. with the same set of weights. Despotis (2005a, b) first introduced this concept in a two-stage approach: the basic BOD model first determines the most favorable weights for each entity, and a goal programming model then seeks common weights that discriminate among entities with the same performance score. Dong et al. (2015) used a similar two-stage method to measure farm sustainability. Kao et al. (2008) proposed a similar two-stage model for evaluating national competitiveness. Kao (2010) combined the concept of common weights with the Malmquist productivity index. Building on Zhou et al. (2007), Hatefi and Torabi (2010) proposed a common-weights MCDA-DEA approach in which the common weights are calculated in a single step. Tofallis (2013) also used a two-stage model to seek a common set of weights to apply to all entities. More recently, Hatefi and Torabi (2016) further analyzed how to improve the composite indicators of inefficient entities on the basis of a slack analysis framework.
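The two-stage common-weight idea can be sketched as follows. In the literature the second stage is a goal-programming LP; to stay dependency-free, this sketch replaces it with a crude grid search over two-indicator weight vectors and a minimax criterion (the smallest possible largest gap between each entity's BOD score and its common-weight score). The data, the BOD scores passed in, and the helper names are hypothetical. Scaling by the best entity's weighted sum keeps the common weights feasible for every entity's BOD constraints, so no common-weight score can exceed the corresponding BOD score.

```python
def common_weight_scores(data, w):
    """Score all entities with one weight vector, scaled so the best entity scores 1."""
    raw = [sum(wj * xj for wj, xj in zip(w, x)) for x in data]
    top = max(raw)
    return [r / top for r in raw]


def minimax_common_weights(data, bod_scores, steps=100):
    """Grid-search w = (t, 1 - t) minimising the largest gap between BOD and common scores."""
    best_w, best_gap = None, float("inf")
    for i in range(steps + 1):
        w = (i / steps, 1 - i / steps)
        gap = max(g - s for g, s in zip(bod_scores, common_weight_scores(data, w)))
        if gap < best_gap:
            best_w, best_gap = w, gap
    return best_w, best_gap


data = [(4.0, 1.0), (1.0, 4.0), (2.0, 2.0), (3.0, 3.0)]  # hypothetical entities E1..E4
bod = [1.0, 1.0, 2 / 3, 1.0]  # unrestricted BOD scores for these data: E1, E2, E4 tie at 1
w, gap = minimax_common_weights(data, bod)
scores = common_weight_scores(data, w)
print(w, round(gap, 3))               # (0.5, 0.5) 0.167
print([round(s, 3) for s in scores])  # [0.833, 0.833, 0.667, 1.0]: E4 is now the unique leader
```

E1 and E2 remain tied here only because the toy data are symmetric; the point is that every entity is now judged with the same weights.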

BOD methods provide a flexible and systematic way of constructing CSI. In recent years, conventional DEA models have also been used to build composite indicators, e.g. environmental performance indices (EPIs). Constructing an EPI with conventional DEA models typically begins with specifying an environmental production technology (Zhou and Ang 2008). An EPI can then be derived from different types of DEA models with different properties. See, for example, Zaim et al. (2001), Zhou et al. (2006b), Zhou and Ang (2008), Blancard and Hoarau (2013), and Wang (2015). More recently, Zhou et al. (2017) reviewed previous studies and showed that the range-adjusted DEA model can generate a cardinally meaningful composite index.

5 The CSI Robustness and Beyond

5.1 Selection Principle

So far, we have examined three methodological aspects pertinent to CSI construction. Two additional issues are often raised: comparability and meaningfulness. Comparability problems arise mainly from the incommensurability of indicators’ measurement units. Martinez-Alier et al. (1998) showed theoretically that incommensurability implies weak comparability rather than incomparability, which means that multi-indicator evaluation methods (e.g. MADM and BOD) have good potential in sustainability assessment. Although this argument provides a theoretical foundation for comparability, sustainability assessment still faces differences and ambiguities caused by measurement units, which may render a CSI meaningless.

Ebert and Welsch (2004) were the first to discuss how to construct a meaningful environmental index; their analysis has since been used as a criterion for judging whether an environmental or sustainability index is meaningful, e.g. by Böhringer and Jochem (2007) and Singh et al. (2009). A CSI is meaningful if the ordering of entities it produces does not change under admissible rescalings of the underlying indicators. Ebert and Welsch (2004) classified scales into four categories according to their comparability (measurability): interval-scale non-comparability, interval-scale full comparability, ratio-scale non-comparability, and ratio-scale full comparability. If interval-scaled indicators are fully comparable, the arithmetic mean, which is continuous, strongly monotone and separable, generates a meaningful index. If ratio-scaled indicators are non-comparable, the geometric mean is recommended. Table 3 provides a summary of the different cases. Note that a meaningful CSI cannot be constructed when indicators are measured on different scales (Böhringer and Jochem 2007; Ebert and Welsch 2004). More recently, Pollesch and Dale (2015) investigated aggregation functions for indicators on six different scales with a view to constructing meaningful CSIs. Zhou et al. (2017) generalized the meaningfulness concept of Ebert and Welsch (2004) and showed how to construct a cardinally meaningful index.
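A small numerical check of the ratio-scale non-comparability case discussed above, where each indicator may be rescaled by its own positive factor (e.g. a change of units). With hypothetical data and equal weights, the arithmetic-mean ranking flips after rescaling one indicator, whereas the geometric-mean ranking cannot, because rescaling indicator j by c multiplies every entity's weighted geometric mean by the same factor \(c^{w_j}\).

```python
import math


def arithmetic_mean(values, weights):
    """Weighted arithmetic mean: sum_j w_j x_j."""
    return sum(w * x for w, x in zip(weights, values))


def geometric_mean(values, weights):
    """Weighted geometric mean: prod_j x_j ** w_j (weights sum to 1)."""
    return math.prod(x ** w for w, x in zip(weights, values))


def ranking(data, aggregate, weights):
    """Entity names ordered from highest to lowest aggregate score."""
    scores = {name: aggregate(x, weights) for name, x in data.items()}
    return sorted(scores, key=scores.get, reverse=True)


weights = [0.5, 0.5]
data = {"A": [1.0, 100.0], "B": [2.0, 60.0]}  # hypothetical ratio-scale indicators
# Admissible ratio-scale change for indicator 1 only, e.g. a new unit: x -> 100 x.
rescaled = {name: [x[0] * 100, x[1]] for name, x in data.items()}

print(ranking(data, arithmetic_mean, weights), ranking(rescaled, arithmetic_mean, weights))
# ['A', 'B'] ['B', 'A']  -> the arithmetic-mean ranking flips
print(ranking(data, geometric_mean, weights), ranking(rescaled, geometric_mean, weights))
# ['B', 'A'] ['B', 'A']  -> the geometric-mean ranking is unchanged
```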

Table 3 Aggregation rules for indicators by Ebert and Welsch via Böhringer and Jochem (2007)

Besides measurement scales, several other factors affect the choice of aggregation function in constructing CSI, e.g. interactions between indicators, the type of weights, and the underlying notion of sustainability. When indicators interact, preliminary treatment should first be carried out to remove those interactions. However, as Mayer (2008) stated, without a clear understanding of the interactions between indicators and of how they influence the results, it is hard for decision makers to formulate policies aimed at increasing economic equity, improving the environment and enhancing the prospects for long-term sustainability. Aggregation methods that explicitly account for interactions, e.g. the Choquet integral with respect to a fuzzy measure, may therefore be a good choice. The type of weights also affects the choice of aggregation function. Whatever weighting method is used, weights fall into two categories: ordinal and cardinal. Compensatory aggregation methods usually cannot handle ordinal weights well, in which case a non-compensatory approach may be more appropriate. Finally, the assumed notion of sustainability determines, in principle, the choice of aggregation algorithm (Munda 2005). Two economic paradigms of sustainability are usually distinguished: weak sustainability and strong sustainability (Dietz and Neumayer 2007; Neumayer 2013). Under weak sustainability, natural capital is considered substitutable by other forms of capital, so compensatory aggregation algorithms may be suitable. Under strong sustainability, natural capital is regarded as non-substitutable, so non-compensatory or semi-compensatory aggregation schemes may be more appropriate.
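To illustrate the non-compensatory option, the sketch below follows the pairwise, Condorcet-type logic associated with Munda and Nardo (2009): for each pair of entities it sums the weights of the indicators on which one entity beats the other, with ties split evenly. As we understand it, Munda and Nardo then rank with a Condorcet-Kemeny rule; for brevity this sketch substitutes a simpler Copeland-style count of pairwise wins. Weights are treated purely as importance coefficients, and the data and helper names are hypothetical.

```python
from itertools import combinations


def outranking(a, b, weights):
    """Weight of indicators on which a beats b, plus half the weight of ties."""
    return sum(w if x > y else (w / 2 if x == y else 0.0) for w, x, y in zip(weights, a, b))


def copeland_ranking(data, weights):
    """Rank entities by the number of pairwise majority wins (Copeland-style count)."""
    wins = {name: 0 for name in data}
    for p, q in combinations(data, 2):
        e_pq, e_qp = outranking(data[p], data[q], weights), outranking(data[q], data[p], weights)
        if e_pq > e_qp:
            wins[p] += 1
        elif e_qp > e_pq:
            wins[q] += 1
    return sorted(wins, key=wins.get, reverse=True)


weights = [1 / 3, 1 / 3, 1 / 3]  # hypothetical importance coefficients
data = {"A": [100.0, 5.0, 5.0], "B": [10.0, 6.0, 6.0], "C": [20.0, 4.0, 4.0]}
print(copeland_ranking(data, weights))  # ['B', 'A', 'C']

# A's huge lead on indicator 1 cannot buy back its losses on indicators 2 and 3:
data["A"][0] = 1000.0
print(copeland_ranking(data, weights))  # ['B', 'A', 'C'] again
```

Only the direction of each pairwise comparison enters the computation, never its magnitude, which is what makes the scheme non-compensatory.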

In general, we may summarize the procedure for selecting an appropriate approach to constructing CSI as follows. First and most importantly, the economic paradigm (weak or strong sustainability) should be clearly defined, since it determines whether a compensatory or non-compensatory aggregation scheme is appropriate. The indicator framework can then be established in line with the chosen paradigm. Given the indicator framework, practitioners can check the indicators’ measurement scales and assign a weight to each indicator. If all indicators share the same measurement scale, the procedure can continue; otherwise, indicators on different scales should be replaced by proxy indicators on the common scale. Two further factors should also be considered: interactions between indicators and the hierarchical structure of the framework. If practitioners decide to model the interactions between indicators, the way weights are assigned may change, which in turn directly influences the choice of aggregation scheme.

5.2 Uncertainty and Sensitivity Analysis

It must be acknowledged that each method has its own merits. However, as Booysen (2002) discussed, no element of the methods used to construct composite indicators escapes criticism. Disagreements arise from many sources, and a major one is the robustness of CSI. Because many different combinations of methods can be used to construct a CSI, very different results may be obtained from the same data.

Two approaches are used to increase the robustness of CSI. One is to ensure the transparency of the whole construction process. This requires a clear statement of the models, including their key mathematical and descriptive properties. The way such models are used and integrated in the decision process should also be explained clearly. The other approach is to assess uncertainties by sensitivity analysis. Sensitivity analysis can explain why entities with similar sustainability performance receive distinct rankings, and can also be used to analyze globally how the CSI varies when different aspects of its construction vary over a reasonable range of possibilities (Saisana et al. 2005; Munda and Saisana 2011). For instance, Zhou et al. (2010) compared the rankings of entities obtained under a large set of randomly chosen weighting schemes. Munda and Saisana (2011) analyzed the stability of sustainability rankings under different aggregation rules while keeping the indicator weights unchanged.
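A minimal sketch of the random-weights idea mentioned above: draw many weight vectors uniformly from the weight simplex, rescore the entities, and report each entity's best and worst rank. The additive aggregation, the number of draws, the seed and the data are all assumptions for illustration; they are not the settings of Zhou et al. (2010).

```python
import random


def rank_positions(scores):
    """Rank of each entity, with 1 for the highest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


def random_simplex_weights(n, rng):
    """Uniform draw from the weight simplex via normalised exponential variates."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def rank_ranges(data, trials=1000, seed=42):
    """(best rank, worst rank) of each entity over many random weighting schemes."""
    rng = random.Random(seed)
    seen = [[] for _ in data]
    for _ in range(trials):
        w = random_simplex_weights(len(data[0]), rng)
        scores = [sum(wj * xj for wj, xj in zip(w, x)) for x in data]
        for i, r in enumerate(rank_positions(scores)):
            seen[i].append(r)
    return [(min(r), max(r)) for r in seen]


# Hypothetical normalised indicators: rows are entities, columns are indicators.
data = [[0.9, 0.95, 0.8], [0.5, 0.6, 0.4], [0.2, 0.9, 0.3], [0.6, 0.1, 0.5]]
ranges = rank_ranges(data)
print(ranges)  # entity 0 beats the others on every indicator, so its range is (1, 1)
```

A wide rank range flags an entity whose position depends heavily on the weighting scheme, which is exactly the kind of instability sensitivity analysis is meant to expose.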

Ensuring transparency and conducting sensitivity analysis are forms of a posteriori uncertainty analysis. Correspondingly, there are also a priori uncertainty analysis methods, e.g. the Shannon-Spearman measure (SSM) developed by Zhou et al. (2006a, b) and Zhou and Ang (2009). SSM is based on the concept of the information lost when underlying indicators are aggregated into a composite index. Intuitively, methods with a smaller SSM, i.e. less information loss, may be regarded as better; a method with zero SSM retains the full information of the underlying indicators and is thus regarded as ideal. In this sense, SSM offers another approach to evaluating the uncertainty of CSI.

5.3 Beyond Rankings

Beyond providing index values and rankings, a CSI can, as emphasized by Nardo et al. (2008) and Grupp and Schubert (2010), serve as a means of initiating discussion and facilitating communication between different stakeholders. An influential CSI can draw policy makers’ attention to the importance of sustainable development. Its intuitive construction also gives the public an opportunity to join the debate rather than being excluded from it outright. Besides, a CSI may help to stimulate policy competition over best practice in sustainable development policies and become a useful monitoring tool for avoiding unintended consequences of unsuitable policies. The process of constructing a CSI also makes further analysis possible, addressing questions such as: Where do an entity’s strengths lie? Which aspects of the entity should be improved? What is the actual contribution of a given indicator to the CSI? The information behind a CSI can be visualized with spider diagrams (radar charts), which represent strengths and weaknesses easily and intuitively. Correlation analysis between the underlying indicators and the CSI values can illustrate the contribution of each indicator and thus help identify priorities for improvement.
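A minimal sketch of the correlation analysis described above, assuming an additive CSI with hypothetical equal weights and hypothetical normalised indicator values for three entities. Pearson's correlation is written out by hand to keep the example dependency-free. A higher correlation suggests the indicator drives more of the CSI's variation across entities; this is a descriptive diagnostic, not a causal decomposition.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical normalised indicators for three entities.
indicators = {"ind1": [0.1, 0.5, 0.9], "ind2": [0.5, 0.4, 0.6]}
weights = {"ind1": 0.5, "ind2": 0.5}  # hypothetical equal weights
n_entities = 3

csi = [sum(weights[k] * indicators[k][i] for k in indicators) for i in range(n_entities)]
corr = {k: pearson(v, csi) for k, v in indicators.items()}
print({k: round(c, 3) for k, c in corr.items()})  # {'ind1': 0.982, 'ind2': 0.655}
```

Here ind1 varies much more across entities, so it dominates the ranking even though both indicators carry the same nominal weight.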

6 Conclusions

This chapter provides a state-of-the-art review of CSI construction with a focus on methodological developments. We first introduce the general structure of CSI construction. We then classify the methods for constructing CSI into two groups, MADM and BOD. For MADM, we discuss methods for normalization, weighting and aggregation, together with their pros and cons. We find that z-score normalization, hybrid weighting methods and compensatory or semi-compensatory aggregation functions are the most commonly used in applications. Non-compensatory aggregation schemes have received increasing attention in recent studies. For BOD, we describe the basic BOD model, weight restrictions and other extensions. A recent trend is for analysts to incorporate various MADM methods into BOD to construct CSI. Finally, we investigate principles for selecting appropriate aggregation methods in constructing CSI, and discuss uncertainty and sensitivity analysis as means of building robust CSIs.

CSI has become a popular tool for monitoring sustainability performance and for providing valuable information to support policy analysis and decision making. However, various challenges remain, e.g. conceptual issues surrounding sustainability, dimensional diversity and data availability. The widely accepted definition of sustainability refers to impacts on future generations, which implies that it is important to incorporate temporal and geographical factors. When taking geographical factors into account, practitioners may also need to consider entities’ differing cultures and development patterns. There is also growing concern about how to construct a meaningful CSI from both theoretical and methodological perspectives, as existing CSIs seldom satisfy the axiomatic requirements of the meaningfulness definition. Further efforts are therefore required to improve the meaningfulness and robustness of existing CSIs.