Introduction

Policy makers within industry are paying increased attention to implementing the sustainable development concept in business activities, owing to fierce competition in the global market and strict environmental regulations. Sustainable development indicators are recognized as useful tools for the assessment and anticipation of production performance and trends, being able to provide early-warning information to prevent economic, societal, and environmental damage, and to support decision-making (Singh et al. 2009). The indicators can be divided into two groups: content indicators, which describe the state of the system, and performance indicators, which measure the behavior of the system (Sikdar 2003). The composite indicator (CI) can be defined as an aggregation of different indicators according to a well-developed and pre-determined methodology (Gasparatos et al. 2008). CIs can be divided into several categories depending on the methods selected during their formulation (Niemeijer 2002): (1) data-driven, when data availability is the central issue concerning the development of the CIs and high-quality data must be provided, (2) theory-driven, when the best possible indicators for CI construction are selected from a theoretical point of view, and data availability is only one of the many aspects considered, (3) policy-driven, when the indicators are selected specifically for monitoring a certain policy.

Various methodologies exist for the construction of CIs. Nardo et al. (2008) described a framework for the construction of a CI, which includes the selection of relevant indicators and data, imputation of missing data, normalization of the selected indicators, weighting, and aggregation. Even though all steps are important for the quality of the final CI, the weighting and aggregation steps seem to have the greatest impact. Zhou et al. (2007) proposed a mathematical-programming approach for the construction of CIs using multiple criteria decision analysis (MCDA) for aggregation of the selected indicators and two sets of weights (the most and least favorable) that are generated from the data themselves. This approach does not require prior knowledge of the weights of the selected indicators. Cherchye (2007) used data envelopment analysis (DEA) in the construction of CIs with the aim of neutralizing the recurring sources of criticism about CIs. The application of this method makes it possible to skip the normalization stage, as it is invariant to measurement units, and the weights are generated by a ‘flexible benefit of the doubt’ method. Sikdar (2009) simplified the comparison of aggregated metrics by expressing them as the geometric mean of the ratios of the individual metrics for pairwise comparison. The aggregate metric generated by this method is also sensitive to the individual metrics and weighting factors. Hatefi and Torabi (2010) proposed a common weight MCDA–DEA approach for constructing CIs. Zhou et al. (2010) proposed a multiplicative optimization approach for constructing CIs, using the weighted product (WP) method. In their approach the weights are generated by solving a series of multiplicative DEA-type models that can be transformed into equivalent linear programs. The developed model enables the incorporation of additional relevant information about the weights.

Sustainability assessments have been performed in several manufacturing sectors, such as the steel industry (Singh et al. 2007), chemical industry (Beloff and Tanzil 2006), traditional beet-sugar plants (Krajnc et al. 2007), dry-cleaning industry (von Bahr et al. 2003), and breweries (Tokos et al. 2011). There have also been studies that focused on benchmarking the energy efficiency of an industry (Phylipsen et al. 2002; Mateos-Espejel et al. 2011; Frangopoulos and Keramioti 2010), the transport sector (Henning et al. 2011), sludge plants (Abusam et al. 2004), water management (De Carvalho et al. 2009), and ski areas (Geneletti 2008).

CIs remain controversial despite their increasing usage. It is frequently argued that CIs are too subjective, due to the assumptions when estimating any measurement error in data, the selection of the relevant indicators, choice of normalization scheme, weights and aggregation systems (Singh et al. 2009). It is vital to find the best combination for a normalization–weighting–aggregation scheme regarding the construction of a CI, which will effectively measure any changes in a company’s performance, and assist it toward a more sustainable development.

By using sensitivity analysis, it can be determined how the composite sustainability index depends on the information fed into it. Chan et al. (1997) applied and compared several variance-based methods (correlation ratios or importance measures, Sobol’ indices, and Fourier amplitude sensitivity test (FAST) indices) in order to measure how much a model depends on its input. They concluded that the correlation-ratio measurement is model-independent, but can only evaluate the first-order sensitivity indices of the input parameters. The FAST and Sobol’ methods are completely automated and are able to compute the total-effect indices that allow for a qualitative ranking of the input parameters with regard to their influence on the output. Saltelli (2002) introduced a new strategy for the computation of full sets of first-order and total-effect sensitivity indices for a model’s output. The computation of sensitivity indices is based on decomposing the variance of the target function in a quantitative manner. Cherchye et al. (2006) applied uncertainty and sensitivity analyses to assess the robustness of the final results and to evaluate the influence of each individual uncertainty source on the output variance when DEA was used for the construction of a CI. Saisana and Saltelli (2008), using information collected from budget allocation processes (BAPs) and analytical hierarchy processes (AHPs), analyzed the relative importance of expert opinion on the selected indicators included in a CI.

This article aims to identify the best integration scheme for the construction of a composite sustainability index, via a sensitivity analysis of its application in an industrial case study. The remainder of this article is organized as follows: the steps for the construction of a composite sustainability index and the methodologies considered, together with the sensitivity analysis used, are given in the second section. The industrial case study is presented in the third section, followed by the results and conclusions in the fourth section.

Construction of a composite sustainability index, and sensitivity analysis

By applying sensitivity analysis to the obtained results, the objective is to identify the more suitable methods for constructing a composite sustainability index. The evaluated methodology gradually aggregates the selected environmental, societal, and economic indicators into a composite sustainability index, Fig. 1. The relevant indicators covering different aspects of sustainability are selected during the pre-modeling stage, and their positive or negative impacts on the sustainability development are judged. These steps are followed by data collection, including indicators, benchmarks, and weights. The modeling phase involves normalization that transforms the indicators into dimensionless forms. Weights are assigned to the individual indicators and sustainability sub-indices before the aggregation process, thus measuring their importance for sustainable development within the company. By applying a step-by-step procedure, the indicators are first grouped into sustainability sub-indices, which are then combined into a composite sustainability index. At the end of the methodological procedure, the final result provides a set of sub-indices for different sustainability dimensions in regard to the company’s performance, thus providing a degree of sustainability implementation.

Fig. 1 Scheme for calculation of composite sustainability index

Indicator selection, judgment on the indicator’s impact, and data collection

According to the described methodology, the first step of the integrated sustainability performance assessment is the selection of a suitable set of indicators. In our approach, the indicators were selected based on the Global Reporting Initiative (GRI) guidelines. These guidelines provide a set of core and additional indicators. The core indicators are those that are identified as interesting to most stakeholders and assumed to be material, unless deemed otherwise, on the basis of the GRI reporting principles. However, the final selection should always be performed in close cooperation with the industry. The additional indicators are optional; for example, some additional indicators can be included if the company is interested in evaluating its influence on sustainable development. Likewise, some core indicators can be excluded if they are not being measured or are seen as unimportant. The environmental indicators are selected from three aspects: (1) Material, (2) Energy, and (3) Emissions, Effluents, Waste. The societal indicators are chosen from the following three groups: (1) Employment, (2) Occupational Health and Safety, and (3) Diversity and Equal Opportunities. Eight economic indicators are presented from the aspect of economic performance.

In the next step, the indicators are divided into groups according to their influence on sustainable development. These groups consist of those indicators whose increasing values have a positive impact on sustainable development (\( I^{ + } \)), e.g., paper recycling, and those whose increasing values have a negative impact (\( I^{ - } \)), e.g., freshwater consumption. Depending on which group the indicator is assigned to, the normalization equation will be different for every normalization method applied.

Data collection is the most time-consuming process, as a large number of values need to be collected from the company regarding their environmental, societal, and economic performances. In addition, the benchmarks need to be defined, and are later used during the normalization method (distance to reference). The ‘benefit of the doubt’ approach is used for determining the weights. The benchmark values were determined for each of the selected indicators, based on the values from the best available techniques (BAT), the measurements and standards within the company, the local legal regulations, GRI reports for specific production sectors, and other relevant documents. The strictest limit within the range is selected for those cases where the benchmark values are given within a range. Typical types of technology, the raw materials, the production process, and the types of energy used during production, are taken into account when determining the benchmarks. Expert opinion regarding the importance of individual indicators and sustainability sub-indices also needs to be collected for the calculation of weights within BAP.

Normalization

Normalization is necessary for integrating the selected indicators into a composite sustainability index, as they are usually expressed in different units. The normalization methods included are: minimum–maximum, distance to a reference, and the percentage of annual differences over consecutive years.

Minimum–Maximum

According to this method, each indicator with a positive impact on sustainable development, \( I_{i,j,t}^{ + } \), is transformed into a normalized form by the equation:

$$ I_{{N_{i,j,t} }}^{ + } = \frac{{I_{i,j,t}^{ + } - I_{i,j}^{{ + ,{\text{MIN}}}} }}{{I_{i,j}^{{ + ,{\text{ MAX}}}} - I_{i,j}^{{ + ,{\text{MIN}}}} }}\quad \forall i \in I\quad I_{{_{i,j} }}^{{ + ,{\text{ MAX}}}} = \mathop {\max }\limits_{t \in T} I_{i,j,t}^{ + } \quad \wedge \quad I_{{_{i,j} }}^{{ + ,{\text{ MIN}}}} = \mathop {\min }\limits_{t \in T} I_{i,j,t}^{ + } $$
(1)

While the indicator with a negative impact on sustainable development, \( I_{i,j,\,t}^{ - } \), is normalized by the equation:

$$ I_{{N_{i,j,t} }}^{ - } = 1 - \frac{{I_{i,j,t}^{ - } - I_{i,j}^{{ - ,{\text{MIN}}}} }}{{I_{i,j}^{{ - ,{\text{ MAX}}}} - I_{i,j}^{{ - ,{\text{MIN}}}} }}\quad \forall i \in I\quad I_{{_{i,j} }}^{{ - ,{\text{ MAX}}}} = \mathop {\max }\limits_{t \in T} I_{i,j,t}^{ - } \quad \wedge \quad I_{{_{i,j} }}^{{ - ,{\text{ MIN}}}} = \mathop {\min }\limits_{t \in T} I_{i,j,t}^{ - } $$
(2)

where \( I_{i,j,t}^{ + } \) and \( I_{i,j,t}^{ - } \) are the values for indicator i from the group of indicator j in year t with positive and negative impacts on sustainable development, respectively, while \( I_{{N_{i,j,t} }}^{ + } \) and \( I_{{N_{i,j,t} }}^{ - } \) are their normalized indicators, respectively. The highest value for indicator i with positive impact on sustainable development from the group of indicator j for the analyzed time period is denoted as \( I_{{_{i,j} }}^{{ + ,{\text{ MAX}}}} , \) while for indicator i with negative impact on sustainable development, as \( I_{{_{i,j} }}^{{ - ,{\text{ MAX}}}} \). Otherwise, the lowest value for indicator i with positive and negative impact are denoted as \( I_{{_{i,j} }}^{{ + ,{\text{ MIN}}}} , \) and \( I_{{_{i,j} }}^{{ - ,{\text{ MIN}}}} \), respectively. In this way the normalized indicator with positive impact on sustainable development will have a value between 0, for \( I_{i,j,t}^{ + } = I_{i,j}^{{ + ,{\text{MIN}}}} , \) and 1, for \( I_{i,j,t}^{ + } = I_{i,j}^{{ + ,{\text{MAX}}}} \). In the case of an indicator with negative impact on sustainable development, the highest value will be achieved when \( I_{i,j,t}^{ - } = I_{i,j}^{{ - ,{\text{MIN}}}} \), according to the indicator’s nature. It should be noted, that this transformation is time-dependent, which implies an adjustment of the analyzed time period, if new data is available. This adjustment may change the minimum and maximum values of some indicators, and then affect the normalized values. The composite sustainability index for the existing data must be re-calculated to maintain comparability between the existing and the new data.
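Eqs. 1 and 2 can be illustrated with a minimal Python sketch. The function name `min_max_normalize` and the sample values are hypothetical and are not taken from the case study; the sketch only shows the arithmetic and the time dependence noted above.

```python
def min_max_normalize(values, positive_impact=True):
    """Min-max normalize one indicator's yearly values (Eq. 1 or Eq. 2).

    values: one value per year of the analyzed period (hypothetical data).
    positive_impact: True applies Eq. 1, False applies Eq. 2.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Eqs. 1 and 2 divide by (MAX - MIN), so a constant series is undefined.
        raise ValueError("indicator is constant over the analyzed period")
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if positive_impact else [1.0 - s for s in scaled]


# Three hypothetical years of an indicator with positive impact.
print(min_max_normalize([10.0, 20.0, 15.0]))
# The same values treated as an indicator with negative impact.
print(min_max_normalize([10.0, 20.0, 15.0], positive_impact=False))
# Adding a fourth year with a new maximum changes the earlier normalized values.
print(min_max_normalize([10.0, 20.0, 15.0, 30.0]))
```

The last call shows why the index for existing years has to be re-calculated when the time period is extended: the second year drops from 1.0 to 0.5 once a higher maximum enters the series.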

Distance to a reference

When applying this method, the normalized value is calculated as the ratio between the indicator and an external benchmark. The external benchmark can be defined by the BAT values, measurements and standards for a specific production sector, local legal regulations, GRI reports, and any other relevant documents. The normalized indicators are described by Eqs. 3 and 4:

$$ I_{{N_{i,j,t} }}^{ + } = \frac{{I_{i,j,t}^{ + } }}{{I_{i,j}^{\text{Benchmark}} }} $$
(3)
$$ I_{{N_{i,j,t} }}^{ - } = \frac{{I_{i,j}^{\text{Benchmark}} }}{{I_{i,j,t}^{ - } }} $$
(4)

where \( I_{i,j}^{\text{Benchmark}} \) is the benchmark for indicator i from the group of indicators j. Using the denominator \( I_{i,j}^{\text{Benchmark}} , \) the equation takes into account the evolution of indices over time, according to possible future benchmark updates. In this case, the normalized value can have a value higher than 1, indicating that the company has above-average performance.
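A minimal sketch of Eqs. 3 and 4 follows. The function name `distance_to_reference` and the numbers are illustrative assumptions, not values from the case study.

```python
def distance_to_reference(value, benchmark, positive_impact=True):
    """Normalize one indicator value against an external benchmark.

    positive_impact=True applies Eq. 3 (value / benchmark);
    positive_impact=False applies Eq. 4 (benchmark / value).
    """
    if benchmark == 0:
        raise ValueError("benchmark must be non-zero")
    if positive_impact:
        return value / benchmark
    if value == 0:
        raise ValueError("Eq. 4 is undefined for a zero indicator value")
    return benchmark / value


# Hypothetical positive-impact indicator below its benchmark.
print(distance_to_reference(45.0, 50.0))
# Hypothetical negative-impact indicator (e.g., a consumption figure) that is
# lower than its benchmark, so the normalized value exceeds 1.
print(distance_to_reference(4.0, 5.0, positive_impact=False))
```

Unlike the minimum–maximum method, the result for a given year does not change when further years are added, as long as the benchmark itself is not updated.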

Percentage of annual differences over consecutive years

The indicators are transformed by Eqs. 5 and 6:

$$ I_{{N_{i,j,t} }}^{ + } = \frac{{I_{i,j,t}^{ + } - I_{i,j,t - 1}^{ + } }}{{I_{i,j,t - 1}^{ + } }} $$
(5)
$$ I_{{N_{i,j,t} }}^{ - } = \frac{{I_{i,j,t - 1}^{ - } - I_{i,j,t}^{ - } }}{{I_{i,j,t - 1}^{ - } }} $$
(6)

The disadvantage of this method is that the indicators for \( t = t_{0} \) cannot be normalized with the given equations, and are therefore lost from the analysis.
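The following sketch applies Eqs. 5 and 6 to a short series and shows that the first year has no normalized value. The function name `annual_difference` and the data are hypothetical.

```python
def annual_difference(values, positive_impact=True):
    """Relative change between consecutive years (Eq. 5 or Eq. 6).

    Returns one value per year starting from the second year; the first
    year has no predecessor and therefore no normalized value.
    """
    normalized = []
    for previous, current in zip(values, values[1:]):
        if previous == 0:
            raise ValueError("previous-year value must be non-zero")
        change = (current - previous) / previous
        # Eq. 6 is Eq. 5 with the sign reversed: a decrease counts as progress.
        normalized.append(change if positive_impact else -change)
    return normalized


series = [100.0, 110.0, 99.0]  # three hypothetical years
print(annual_difference(series))                         # two values, not three
print(annual_difference(series, positive_impact=False))
```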

Weighting methods

The relative importance of indicators is a source of disagreement, as the decision makers of companies have different views and are interested in different indicators. The weights of indicators can be obtained from statistical models, such as factor analysis, DEA, and unobserved component models (UCM), or from participatory methods, such as BAPs, AHPs, and conjoint analysis (CA). The weighting methods applied are presented below.

Equal weightings (EWs)

Most of the CIs are constructed by EW, which means that all indicators are assigned the same weight. This essentially implies that all indicators have the same importance, but it could also disguise the absence of statistical or empirical bases for determining the weights. In any case, EW does not mean no weights, but implicitly implies that the weights are equal. Moreover, if variables are grouped into dimensions and these are further aggregated into the composite, then applying EW to the variables may imply an unequal weighting of the dimensions (the dimensions grouping the larger number of variables will have higher weight). This could result in an imbalanced structure within the composite index.
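The imbalance described above can be made concrete with a small sketch. It assumes, purely for illustration, a flat structure in which every variable receives the same weight across all dimensions; the group sizes are the indicator counts reported later for the case study, and the function name is hypothetical.

```python
def implied_dimension_weights(indicator_counts):
    """Weight each dimension implicitly receives when every individual
    variable is given the same weight in a flat (ungrouped) structure."""
    total = sum(indicator_counts.values())
    return {name: count / total for name, count in indicator_counts.items()}


# Indicator counts per dimension as reported for the case study.
counts = {"environmental": 69, "societal": 22, "economic": 8}
for name, weight in implied_dimension_weights(counts).items():
    print(f"{name}: {weight:.3f}")
```

Under this assumption the environmental dimension would carry roughly 70% of the total weight simply because it groups the most variables.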

‘Benefit of the doubt’ approach (BOD)

Using this method, the composite sustainability index in year t, \( I_{{{\text{SUST}}_{t} }} \), is defined as the ratio between the actual performance of the company and the external benchmark:

$$ I_{{{\text{SUST}}_{t} }} = \frac{{\sum\nolimits_{j} {I_{{S_{j,t} }} } \cdot w_{j} }}{{\sum\nolimits_{j} {I_{{S_{j} }}^{\text{Benchmark}} } \cdot w_{j} }} $$
(7)

where \( I_{{S_{j,t} }} \) is the sustainability sub-index for the group of indicators j in year t, \( I_{{S_{j,t} }}^{\text{Benchmark}} \) is its benchmark, and \( w_{j} \) is the weight of the group of sustainability indicators (sub-index) j, reflecting the importance given to the environmental, societal, and economic performance of the company (group of environmental indicators: j = 1; group of societal indicators: j = 2; group of economic indicators: j = 3). The sustainability sub-indices, \( I_{{S_{j,t} }} , \) are calculated by the equation:

$$ \begin{aligned} & I_{{S_{j,t} }} = \sum\limits_{i} {I_{{N_{i,j,t} }}^{ + } \cdot w_{i,j} } + \sum\limits_{i} {I_{{N_{i,j,t} }}^{ - } \cdot w_{i,j} } \\ & \sum\limits_{i} {w_{i,j} = 1,} \\ & w_{i,j} \ge 0 \\ \end{aligned} $$
(8)

While \( I_{{S_{j,t} }}^{\text{Benchmark}} \) is given by:

$$ \begin{aligned} & I_{{S_{j,t} }}^{\text{Benchmark}} = \sum\limits_{i} {I_{i,j}^{\text{Benchmark}} \cdot w_{i,j} } \\ & \sum\limits_{i} {w_{i,j} = 1,} \\ & w_{i,j} \ge 0 \\ \end{aligned} $$
(9)

where \( w_{i,j} \) is the weight of indicator i from the group of indicators j, and reflects the importance of this indicator during the sustainability assessment of the company. With the aim of determining the weights for individual indicators, a linear programming problem can be defined, which ensures the highest value for the CI:

$$ \begin{aligned} & I_{{S_{j,t} }}^{*} = \mathop {\max }\limits_{{w_{i,j} }} \left( {\sum\limits_{i} {I_{{N_{i,j,t} }}^{ + } \cdot w_{i,j} } + \sum\limits_{i} {I_{{N_{i,j,t} }}^{ - } \cdot w_{i,j} } } \right) \\ & {\text{s.t.}}\quad \sum\limits_{i} {I_{i,j}^{\text{Benchmark}} \cdot w_{i,j} \le 1} , \\ & w_{i,j} \ge 0 \\ \end{aligned} $$
(10)

where \( I_{{S_{j,t} }}^{*} \) is the highest value of the sustainability sub-index for the group of indicators j in year t. The weight of the group of sustainability indicators, \( w_{j} \), can be determined in a similar way by:

$$ \begin{aligned} & I_{{_{{{\text{SUST}}_{t} }} }}^{*} = \mathop {\max }\limits_{{w_{j} }} \left( {\sum\limits_{j} {I_{{S_{j,t} }} } \cdot w_{j} } \right) \\ & {\text{s.t.}}\quad \sum\limits_{j} {I_{{S_{j} }}^{\text{Benchmark}} \cdot w_{j} \le 1,} \\ & w_{j} \ge 0 \\ \end{aligned} $$
(11)

where \( I_{{_{{{\text{SUST}}_{t} }} }}^{*} \) is the highest value for the composite sustainability index in year t.
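The linear program of Eq. 10 has a single inequality constraint besides non-negativity, so for strictly positive benchmarks its optimum can be read off directly without a solver: all weight goes to the indicator with the largest ratio of normalized value to benchmark. The sketch below relies on that observation; it is not the solution procedure used in the study, and the function name and data are hypothetical.

```python
def bod_weights(normalized, benchmarks):
    """Solve  max sum(n_i * w_i)  s.t.  sum(b_i * w_i) <= 1,  w_i >= 0.

    With one constraint and positive benchmarks b_i, the feasible region is
    a simplex whose vertices are the origin and the points w_i = 1 / b_i,
    so the optimum lies at the vertex with the largest ratio n_i / b_i.
    """
    if any(b <= 0 for b in benchmarks):
        raise ValueError("this sketch assumes strictly positive benchmarks")
    ratios = [n / b for n, b in zip(normalized, benchmarks)]
    best = max(range(len(ratios)), key=ratios.__getitem__)
    weights = [0.0] * len(ratios)
    if ratios[best] > 0:
        weights[best] = 1.0 / benchmarks[best]
    value = sum(n * w for n, w in zip(normalized, weights))
    return weights, value


# Hypothetical normalized indicators of one group, with benchmarks equal to 1.
weights, value = bod_weights([0.8, 1.2, 0.5], [1.0, 1.0, 1.0])
print(weights, value)
```

The sketch also makes visible that, without further restrictions on the weights, this formulation assigns all weight to a single indicator in each year.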

Budget allocation process (BAP)

The BAP determines the indicator weights based on expert opinion. In order to establish a proper weighting system, it is essential to bring together experts representing a wide-spectrum of knowledge and experience, e.g., experts from different production sectors and management within the company. They estimate a preference factor for each indicator from the aspect of sustainability by following a scale from 1 (unimportant) to 5 (highly important), based on their experience and subjective judgment of the selected indicators’ relative importance. The weights are then calculated as average preference factors. The main advantages of BAP are its transparent and relatively straightforward nature, and its short duration. The disadvantage of this method is that the weights could reflect specific local conditions, e.g., environmental problems, or the need for political intervention regarding some indicators, e.g., local legal regulations regarding CO2 emission, so the weighting may not be transferable from one region to another.
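A minimal sketch of the BAP calculation is given below. Averaging the preference factors follows the description above; rescaling the averages so that they sum to 1 is an assumption made here for consistency with the constraint in Eq. 8. The function name and the scores are hypothetical.

```python
def bap_weights(scores_by_expert):
    """Turn expert preference factors (scale 1 to 5) into indicator weights.

    scores_by_expert: one list per expert, one score per indicator.
    The average score per indicator is rescaled so that the weights sum to 1
    (the rescaling is an assumption of this sketch).
    """
    n_experts = len(scores_by_expert)
    n_indicators = len(scores_by_expert[0])
    averages = [
        sum(row[i] for row in scores_by_expert) / n_experts
        for i in range(n_indicators)
    ]
    total = sum(averages)
    return [a / total for a in averages]


# Three hypothetical experts scoring three indicators.
scores = [
    [5, 3, 2],
    [4, 3, 1],
    [3, 3, 3],
]
print(bap_weights(scores))  # averages 4, 3, 2 rescaled to sum to 1
```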

Aggregation methods

When applying a step-by-step procedure the selected indicators are grouped into sustainability sub-indices. The \( I_{{S_{j,t} }} \), for each group of sustainability indicators j, are then combined into a composite sustainability index, \( I_{{{\text{SUST}}_{t} }} \). The aggregation methods used during this analysis were: linear, geometric, and non-compensatory multi-criteria approaches (NCMCs).

Linear aggregation (LIN)

As Eq. 8 illustrated, the sustainability sub-indices for the group of indicator j, \( I_{{S_{j,t} }} \) were calculated as a summation of weighted and normalized individual indicators. Based on this, when applying LIN, the composite sustainability index is calculated as:

$$ \begin{aligned} & I_{{_{{{\text{SUST}}_{t} }} }} = \sum\limits_{j} {I_{{S_{j,t} }} } \cdot w_{j} \\ & \sum\limits_{j} {w_{j} = 1,} \\ & w_{j} \ge 0 \\ \end{aligned} $$
(12)

LIN is widely used because of its simplicity and transparency, and because it is easy to understand. An undesirable feature of this method is its compensability: the poor performance of some indicators can be compensated for by sufficiently high values of other indicators, so that the composite sustainability index does not fully reflect the information carried by its individual indicators.
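The compensability of LIN can be seen in a short sketch of Eq. 12. The function name and the sub-index values are hypothetical.

```python
def linear_aggregate(values, weights):
    """Weighted sum of sub-indices (Eq. 12); weights must be non-negative
    and sum to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(v * w for v, w in zip(values, weights))


equal = [1 / 3, 1 / 3, 1 / 3]
balanced = [0.5, 0.5, 0.5]    # hypothetical sub-indices, even performance
unbalanced = [0.1, 0.5, 0.9]  # one very poor and one very good sub-index
print(linear_aggregate(balanced, equal))
print(linear_aggregate(unbalanced, equal))
```

Both profiles yield a composite value of about 0.5, so the poor first sub-index of the second profile is fully offset by its strong third sub-index.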

Geometric aggregation (GME)

The shortcomings of the LIN method can be partially overcome by using GME, where the sustainability sub-index for the group of indicators j, \( I_{{S_{j,t} }} , \) is calculated as the product of the normalized individual indicators, each raised to the power of its weight:

$$ I_{{S_{j,t} }} = \prod\limits_{i} {\left( {I_{{N_{i,j,t} }}^{ + } } \right)^{{w_{i,j} }} \cdot \prod\limits_{i} {\left( {I_{{N_{i,j,t} }}^{ - } } \right)^{{w_{i,j} }} } } $$
(13)

and the composite sustainability index is calculated by the equation:

$$ I_{{_{{{\text{SUST}}_{t} }} }} = \prod\limits_{j} {\left( {I_{{S_{j,t} }} } \right)^{{w_{j} }} } $$
(14)

Use of GME is recommended when non-comparable and strictly positive sub-indicators are expressed in different ratio scales. It should be noted that compensability also exists in the case of GME. With LIN the compensability is constant, while with GME it is partial, i.e., compensability is lower when the CI contains indicators with low values. In other words, GME is able to alleviate but not to eliminate the compensability. Moreover, because the method is multiplicative, problems may occur when zero is present in the input data.
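A short sketch of Eq. 14 illustrates both properties: partial compensability and the effect of a zero input. The function name and the values are hypothetical.

```python
import math


def geometric_aggregate(values, weights):
    """Weighted product of sub-indices (Eq. 14): prod(value ** weight)."""
    if any(v < 0 for v in values):
        raise ValueError("geometric aggregation needs non-negative inputs")
    return math.prod(v ** w for v, w in zip(values, weights))


equal = [1 / 3, 1 / 3, 1 / 3]
print(geometric_aggregate([0.5, 0.5, 0.5], equal))  # even profile, about 0.5
print(geometric_aggregate([0.1, 0.5, 0.9], equal))  # uneven profile, lower
print(geometric_aggregate([0.0, 0.5, 0.9], equal))  # a single zero gives 0.0
```

The uneven profile, whose arithmetic mean is also 0.5, is penalized by the geometric form, and a single zero sub-index drives the whole composite to zero regardless of the other values.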

Non-compensatory multi-criteria approach (NCMC)

In the cases of LIN and GME, the substitution rates among the indicators are equal to the weights of the indicators. As a consequence, the weights in these aggregation schemes necessarily have the meaning of substitution rates and do not indicate the importance of the associated indicator. When using the NCMC during the construction of a composite sustainability index, the weights are interpreted as importance coefficients. The NCMC approach can be divided into two steps: (1) pairwise comparison of the years across the whole set of selected indicators, (2) ranking of the years based on the achieved sustainability performance, which gives information about the company’s progress towards sustainable production. These steps are first applied for the ranking of the sustainability sub-indices (environmental, societal, and economic), and then repeated at the second level of aggregation using the results from the previous ranking, in order to grade the composite sustainability index for all t ∈ T.

As an illustrative example, three environmental indicators (freshwater, heat, and electricity consumption), two societal indicators (total number of employees and the number of injuries), and two economic indicators (profit and operating costs) were selected. In addition, a set of weights, considered as importance coefficients and satisfying the condition \( \sum\nolimits_{i} {w_{i,j} = 1} \) for every sustainability sub-index group j, was assumed for the individual indicators. The data necessary for constructing an outranking impact matrix for the sustainability sub-indices are given in the impact matrix, Table 1. The score for each year t within the environmental sub-index is the sum of the weights of those individual indicators that showed better performance in year t. Table 2 gives the outranking impact matrix for the environmental sub-index.
For the year 2003, in comparison with 2004, the score is equal to 0, because during the year 2003 the company had higher water, heat, and electricity consumption than in 2004. Accordingly, the score for 2004 in comparison with 2003 is equal to 1. It should be noted that the selected environmental indicators have a negative impact on sustainable development; therefore, a lower value is preferable. The final ranking is the permutation with the highest score. If two or more permutations have the same score, the permutation that places the earliest year first is ranked higher, as a penalty for the following years, when no improvement was achieved in the field of sustainable development. The outranking impact matrices for the societal and economic sub-indices are shown in Tables 3 and 4, respectively. It should be stressed that, in the case of the societal indicators, the total number of employees had a positive impact, while the number of injuries had a negative impact on sustainable development. In the same manner, among the selected economic indicators, profit had a positive impact while the operating cost had a negative impact on sustainable development. The ranks determined for the sustainability sub-indices are joined into a new impact matrix on the second level of aggregation, Table 5, and equal weights were assigned to the sub-indices. According to the outranking impact matrix for the composite sustainability index (Table 6), the company had the best sustainability performance in the year 2005. In order to compare the results of NCMC with the results of LIN and GME, the rankings of the composite sustainability index were divided by the highest possible permutation score, thus placing the results within a range between 0 and 1. The highest possible permutation score for the illustrative example is 4.
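The first level of the procedure described above can be sketched as follows. The indicator values and weights are hypothetical and do not reproduce Table 1; ties between years on an indicator contribute no weight to either year, and ties between permutations are broken in favor of the ordering that lists earlier years first, both of which are assumptions of this sketch.

```python
from itertools import permutations


def outranking_matrix(data, weights, lower_is_better):
    """e[(a, b)] = sum of the weights of indicators on which year a
    performs better than year b."""
    years = sorted(next(iter(data.values())))
    e = {}
    for a in years:
        for b in years:
            if a == b:
                continue
            score = 0.0
            for name, series in data.items():
                if lower_is_better[name]:
                    better = series[a] < series[b]
                else:
                    better = series[a] > series[b]
                if better:
                    score += weights[name]
            e[(a, b)] = score
    return years, e


def best_ranking(years, e):
    """Return the permutation of years with the highest total score, where
    the score sums e[(a, b)] over all pairs with a ranked ahead of b."""
    best, best_score = None, float("-inf")
    # permutations() of a sorted list is generated in lexicographic order,
    # so with a strict '>' the earliest-year-first ordering wins any tie.
    for perm in permutations(years):
        score = sum(
            e[(perm[i], perm[k])]
            for i in range(len(perm))
            for k in range(i + 1, len(perm))
        )
        if score > best_score:
            best, best_score = perm, score
    return best, best_score


# Hypothetical environmental indicators; lower consumption is preferable.
data = {
    "freshwater": {2003: 6.0, 2004: 5.5, 2005: 5.0},
    "heat": {2003: 120.0, 2004: 110.0, 2005: 115.0},
    "electricity": {2003: 12.0, 2004: 11.0, 2005: 10.0},
}
weights = {"freshwater": 0.4, "heat": 0.3, "electricity": 0.3}
lower_is_better = {"freshwater": True, "heat": True, "electricity": True}

years, e = outranking_matrix(data, weights, lower_is_better)
ranking, score = best_ranking(years, e)
print(ranking, round(score, 2))
```

Enumerating all permutations grows factorially with the number of years, which is unproblematic for a period of a few years but would need a different search strategy for long time series.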

Table 1 Impact matrix for the illustrative example—first level of aggregation
Table 2 Outranking impact matrix for environmental sub-indices
Table 3 Outranking impact matrix for societal sub-indices
Table 4 Outranking impact matrix for economic sub-indices
Table 5 Impact matrix—second level of aggregation
Table 6 Outranking impact matrix for composite sustainability index

Sensitivity analysis

The development of a CI involves stages where subjective judgments have to be made: the selection of a suitable set of indicators, the choice of the normalization method, the choice of the aggregation method, the weights of the indicators, etc. By using sensitivity analysis, it can be determined how the variation in the output is connected, qualitatively or quantitatively, to the different sources of variation within the assumptions, and how the composite sustainability index depends on the information fed into it. It was noted earlier that CIs may be considered as models. When several layers of uncertainty are present simultaneously, a CI could become a non-linear, possibly non-additive model. For the sensitivity analysis of non-linear models, variance-based techniques should be used, as they are model-free and robust. Our analysis mainly focused on three uncertainties: different normalization methods, different weighting schemes, and different aggregation methods. The evaluation procedure for the composite sustainability index model is given in Fig. 2. This procedure involves three normalization methods, three weighting schemes, and three aggregation methods, giving a total of 19 combinations for the calculation of a composite sustainability index. In variance-based sensitivity analysis, it is assumed that the true value, \( \tilde{X}_{i} \), of an input quantity \( X_{i} \) is known. A conditional variance \( V\left[ {E\left( {Y\left| {\tilde{X}_{i} } \right.} \right)} \right] \) is estimated by holding the specific input quantity fixed at its true value. Unfortunately, in general, the true values of the input quantities are unknown. Therefore, in order to obtain global sensitivity measurements, the expected value \( E\left( {Y\left| {X_{i} } \right.} \right) \) has to be evaluated over the whole variation interval of the input quantity \( X_{i} \).
Variance-based sensitivity indices are estimated as ratios between the conditional variance and the unconditional variance, V(Y), for the output quantity Y:

$$ S_{i} = \frac{{V\left[ {E\left( {Y\left| {X_{i} } \right.} \right)} \right]}}{V\left( Y \right)} $$
(15)

where \( S_{i} \) is the first-order sensitivity index. This index indicates the relative importance of an individual input quantity \( X_{i} \) in driving the output uncertainty. For additive models with non-correlated inputs, it holds that \( \sum\nolimits_{i} {S_{i} = 1} \). This leads to an easy quantitative interpretation of the sensitivity index, because each \( S_{i} \) directly measures the portion of the output variance V(Y) attributable to \( X_{i} \). However, in the case of non-additive models, it is necessary to take into consideration the interactions among the input quantities within the model, i.e., the higher-order effects. The higher-order terms are estimated by holding more than one input quantity fixed, e.g., the input quantities \( X_{i} \) and \( X_{j} \):

$$ S_{i,j} = \frac{{V\left[ {E\left( {Y\left| {X_{i} ,X_{j} } \right.} \right)} \right]}}{V\left( Y \right)} - S_{i} - S_{j} $$
(16)

where \( S_{i,j} \) is the second-order sensitivity index, or two-way interaction, for the input quantities \( X_{i} \) and \( X_{j} \), and \( V\left[ {E\left( {Y\left| {X_{i} ,X_{j} } \right.} \right)} \right] \) is the conditional variance of the expected value for the output quantity Y when the input quantities \( X_{i} \) and \( X_{j} \) are fixed. Computing all higher-order terms is computationally expensive. However, the total effect, \( S_{Ti} \), which includes all higher-order terms involving the input quantity \( X_{i} \) (Eq. 17), can be calculated within one computational step, like the first-order sensitivity index:

$$ S_{Ti} = \frac{{E\left[ {V\left( {Y\left| {X_{ - i} } \right.} \right)} \right]}}{V\left( Y \right)} $$
(17)

where \( E\left[ {V\left( {Y\left| {X_{ - i} } \right.} \right)} \right] \) is the expected amount of residual variance if \( X_{i} \), and only \( X_{i} \), were left free to vary over its uncertainty range, all the other variables being fixed at their true values. While the first-order sensitivity indices are quantitative sensitivity measurements for additive models, the total sensitivity indices are quantitative measurements for all kinds of models, independent of their model characteristics. Comparison between \( S_{i} \) and \( S_{Ti} \) may lead to a conclusion regarding the additivity of a model with non-correlated input, as for additive models \( S_{Ti} = S_{i} \) and for non-additive models \( S_{Ti} > S_{i} \). Additionally, the difference \( S_{Ti} - S_{i} \) is a measure of how much \( X_{i} \) is involved in any interaction with other input variables.
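When every input takes a small number of discrete levels, as with a choice among methods, Eqs. 15 and 17 can be evaluated exactly on a full factorial grid. The sketch below assumes equally likely levels and a complete grid; the function name and the two toy models are hypothetical and are not the study's index model.

```python
from itertools import product
from statistics import mean, pvariance


def sensitivity_indices(levels, model):
    """First-order (Eq. 15) and total-effect (Eq. 17) indices on a full
    factorial grid with equally likely levels.

    levels: one list of levels per input factor.
    model:  function taking one level per factor and returning a number.
    """
    outputs = {x: model(*x) for x in product(*levels)}
    v_total = pvariance(list(outputs.values()))
    first, total = [], []
    for i in range(len(levels)):
        by_level = {}  # outputs grouped by the level of factor i
        by_rest = {}   # outputs grouped by the levels of all other factors
        for x, y in outputs.items():
            by_level.setdefault(x[i], []).append(y)
            by_rest.setdefault(x[:i] + x[i + 1:], []).append(y)
        # V[E(Y | X_i)]: variance of the conditional means.
        first.append(pvariance([mean(g) for g in by_level.values()]) / v_total)
        # E[V(Y | X_-i)]: mean variance left when only X_i varies.
        total.append(mean([pvariance(g) for g in by_rest.values()]) / v_total)
    return first, total


# Additive toy model: first-order and total indices coincide.
print(sensitivity_indices([[0, 1], [0, 1]], lambda a, b: a + 2 * b))
# Non-additive toy model: total indices exceed first-order indices.
print(sensitivity_indices([[0, 1], [0, 1]], lambda a, b: a * b))
```

For the additive model the first-order indices sum to 1 and equal the total indices, while for the product model each total index is larger than the corresponding first-order index, reflecting the interaction between the two inputs.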

Fig. 2 Evaluation procedure for the composite sustainability index model

Case study

The selection of the best combination from the methodologies described in the “Construction of a composite sustainability index, and sensitivity analysis” section was carried out as a case study of a brewery. The sustainability indicators were chosen according to the GRI guidelines in cooperation with the company, and grouped under three sections covering the economic, environmental, and social dimensions of sustainability. In total, 69 environmental, 22 social, and 8 economic performance indicators were selected, with data provided for the time period 2003–2007. The selected environmental, societal, and economic indicators are the same as in Tokos et al. (2011). The weights of individual indicators and groups of indicators determined by EW, BAP, and BOD are given in Table 7 for the environmental group of indicators, in Table 8 for the societal group, and in Table 9 for the economic group. The weights for the BAP method were determined based on the opinion of 11 experts from different production sectors within the case-studied brewery. The defined benchmarks for the selected indicators and more details regarding the analyzed case study can be found in Tokos et al. (2011). The sensitivity analysis was performed over the results of 19 combinations of methods for constructing a composite sustainability index.

Table 7 Weights of individual environmental indicators and sub-index
Table 8 Weights of individual societal indicators and sub-index
Table 9 Weights of individual economic indicators and sub-index

Results and discussion

The results of the uncertainty analysis of the composite sustainability index for the different construction combinations are given in Fig. 3. It can be concluded that:

  • the composite sustainability index value varied from 0.0 to 0.9, and the ranking order of the years also changed, indicating that different combinations of the selected methodologies can lead to very different results and that a comparison of different methods is a necessity,

  • whenever the GME method was involved in the construction frame, the composite sustainability index was zero if there was a zero value in the preceding data layer.

Fig. 3 Uncertainty analysis of different method combinations

A detailed analysis of the data layer showed that the zero values had two sources: (1) the original data collected from the company (for example, the brewery did not have a probationer in 2007), and (2) the normalization of individual indicators using the minimum–maximum method. With this method, the value of a normalized individual indicator lies between 0 and 1. The normalized value is 0 for an indicator with a positive impact on sustainable development when \( I_{i,j,t}^{ + } = I_{i,j}^{{ + ,{\text{MIN}}}} \), and for an indicator with a negative impact on sustainable development when \( I_{i,j,t}^{ - } = I_{i,j}^{{ - ,{\text{MAX}}}} \). Based on these observations, it can be concluded that the GME method should not be used if there are zero values in the original data, or in combination with the minimum–maximum normalization method.
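
The zero-value effect described above can be reproduced with a minimal Python sketch. The indicator values, the equal weights, and the function names are hypothetical, and the normalization formulas are the common minimum–maximum forms, assumed here rather than taken from the study. The sketch only shows why a single normalized zero forces a weighted geometric mean to zero while a weighted linear sum stays positive.

```python
import math

def min_max(values, positive=True):
    """Scale one indicator's yearly values to [0, 1] (assumed min-max form)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant indicator: min-max is undefined")
    if positive:
        # More is better: the yearly minimum maps to 0.
        return [(v - lo) / (hi - lo) for v in values]
    # Less is better: the yearly maximum maps to 0.
    return [(hi - v) / (hi - lo) for v in values]

def linear_agg(xs, ws):
    """Weighted linear sum of normalized indicators."""
    return sum(w * x for w, x in zip(ws, xs))

def geometric_agg(xs, ws):
    """Weighted geometric mean; any zero factor makes the product zero."""
    return math.prod(x ** w for w, x in zip(ws, xs))

years = [2003, 2004, 2005, 2006, 2007]
pos_raw = [4.0, 5.0, 3.0, 6.0, 5.5]     # hypothetical, positive impact
neg_raw = [10.0, 9.0, 8.0, 7.0, 12.0]   # hypothetical, negative impact
weights = [0.5, 0.5]                    # hypothetical equal weights

pos_norm = min_max(pos_raw, positive=True)
neg_norm = min_max(neg_raw, positive=False)
linear = [linear_agg(pair, weights) for pair in zip(pos_norm, neg_norm)]
geometric = [geometric_agg(pair, weights) for pair in zip(pos_norm, neg_norm)]

for year, lin, gme in zip(years, linear, geometric):
    print(year, round(lin, 3), round(gme, 3))
```

In this toy data set the positive indicator reaches its minimum in 2005 and the negative indicator reaches its maximum in 2007, so the geometric aggregate is zero in both years regardless of how the other indicator performs.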

Sensitivity analysis helps to identify which steps are the key drivers of the final result, i.e., the value of the composite sustainability index. To obtain the first-order sensitivity index, one method was fixed within a specific step of the construction frame and its impact on the final result was examined. The first-order sensitivity indices for the sustainability sub-indices and the composite sustainability index are given in Fig. 4. The results show that, when constructing the sustainability sub-indices, different construction steps account for the largest share of the output variance (i.e., have the greatest impact on the result). For the economic sub-index in 2003, the normalization step accounted for more than 60% of the output uncertainty, whereas for the environmental sub-index in the same year it contributed only 26% of the output variance. The aggregation method had the greatest impact on the environmental sub-index value for 2003. In general, the composite sustainability index was mainly affected by the normalization step (the first layer of data treatment), followed by the aggregation and weighting steps.

Fig. 4 The first-order sensitivity index for sustainability sub-indices and composite sustainability index

The interactions among the construction steps can be measured by the total sensitivity index. In this analysis, all methods were fixed except for the one within the construction step under study. The results are given in Fig. 5. The indices add up to more than 1 because of the interactions that appear to exist among the identified influential factors. As \( S_{Ti} > S_i \), it can be concluded that all of the construction combinations were non-additive. The difference \( S_{Ti} - S_i \) shows that the aggregation step interacted most with the other two construction steps, followed by the normalization and weighting steps (Table 10).

Fig. 5 The total sensitivity index of composite index

Table 10 First and total sensitivity index for the case study
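
The two indices can be sketched in Python for a toy construction frame. Everything below is a hypothetical illustration rather than the procedure or data of the case study: three invented indicators, invented benchmarks, two methods per step (with an arbitrary "expert" weight vector standing in for elicited weights), and the assumption that all method combinations are equally likely. The first-order index is computed from the variance of the conditional means when one step is fixed, and the total index from the mean of the conditional variances when all other steps are fixed.

```python
import math
from itertools import product
from statistics import mean, pvariance

# Hypothetical data: three "more is better" indicators over five years.
series = [
    [4.0, 5.0, 3.5, 6.0, 5.5],
    [20.0, 22.0, 25.0, 24.0, 27.0],
    [0.8, 0.7, 0.9, 1.1, 1.0],
]
benchmarks = [5.0, 25.0, 1.0]   # hypothetical reference values
YEAR = 4                        # position of the year being scored

# One dictionary of alternative methods per construction step.
normalizers = {
    "min-max": lambda s, ref: (s[YEAR] - min(s)) / (max(s) - min(s)),
    "distance": lambda s, ref: s[YEAR] / ref,
}
weightings = {
    "EW": [1 / 3, 1 / 3, 1 / 3],
    "expert": [0.5, 0.3, 0.2],  # hypothetical stand-in for elicited weights
}
aggregators = {
    "LIN": lambda xs, ws: sum(w * x for w, x in zip(ws, xs)),
    "GME": lambda xs, ws: math.prod(x ** w for w, x in zip(ws, xs)),
}
levels = [list(normalizers), list(weightings), list(aggregators)]

# Full factorial design: index value Y for every combination of methods.
Y = {}
for n, w, a in product(*levels):
    xs = [normalizers[n](s, ref) for s, ref in zip(series, benchmarks)]
    Y[(n, w, a)] = aggregators[a](xs, weightings[w])

total_var = pvariance(list(Y.values()))

def first_order(i):
    """S_i = V(E[Y | X_i]) / V(Y): fix step i, average over the other steps."""
    cond_means = [mean([y for k, y in Y.items() if k[i] == lev])
                  for lev in levels[i]]
    return pvariance(cond_means) / total_var

def total_order(i):
    """S_Ti = E[V(Y | X_-i)] / V(Y): fix all other steps, vary only step i."""
    others = [lv for j, lv in enumerate(levels) if j != i]
    cond_vars = []
    for fixed in product(*others):
        ys = [y for k, y in Y.items() if k[:i] + k[i + 1:] == fixed]
        cond_vars.append(pvariance(ys))
    return mean(cond_vars) / total_var

steps = ["normalization", "weighting", "aggregation"]
S = [first_order(i) for i in range(3)]
ST = [total_order(i) for i in range(3)]
for name, s, st in zip(steps, S, ST):
    print(f"{name:14s} S_i={s:.3f}  S_Ti={st:.3f}  S_Ti-S_i={st - s:.3f}")
```

Because the design is a balanced full factorial, each total index is at least as large as the corresponding first-order index, and the gap between them is the share of variance attributable to interactions of that step with the others. The numerical values depend entirely on the invented data and carry no meaning for the brewery case study.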

A comparison between the two normalization methods, distance to a reference and minimum–maximum, is shown in Fig. 6, which suggests that the distance to a reference method had more influence on the output variance. The percentage of annual differences over consecutive years method was excluded because of its limited compatibility and the loss of information for the first year. Figure 7 compares the weighting methods EW, BAP, and BOD. According to the results, the ‘benefit of the doubt’ approach (BOD) made the largest contribution to the output variance V(Y). Among the aggregation methods, GME was excluded from the investigation because of its inability to handle data containing zeros. A comparison of the remaining aggregation methods is given in Fig. 8, which shows that LIN had the greatest effect on the final result.

Fig. 6 Comparison of first-order sensitivity index among different normalization methods

Fig. 7 Comparison of first-order sensitivity index among different weighting methods

Fig. 8 Comparison of first-order sensitivity index among different aggregation methods

Conclusions

This study investigated different combinations of methods for calculating a composite sustainability index and compared them in order to recommend the best way of constructing a composite index. Sustainable development indicators are recognized as useful tools for the assessment and anticipation of production performance and trends. Despite their increasing use, CIs remain controversial because they depend on the method used for normalizing the original data, and because experts and stakeholders disagree on the specific weighting scheme and on the way sub-indicators are aggregated. It is therefore important to compare different combinations of normalization–weighting–aggregation methods and to recommend the most input-sensitive construction scheme for CIs. The methodology applied in this article gradually aggregated sustainable development indicators into sustainability sub-indices and, finally, into a composite sustainability index. The normalization methods included in this analysis were minimum–maximum, distance to a reference, and the percentage of annual differences over consecutive years. EW, the ‘benefit of the doubt’ approach (BOD), and BAP were used for determining the weights of the individual indicators and sustainability sub-indices. The linear, geometric, and NCMC methods were used for aggregation. Sensitivity analysis was performed over the results of 19 combinations of methods for constructing a composite sustainability index. According to the results, the normalization step had the largest influence on the composite sustainability index, followed by the aggregation and weighting steps. As the minimum–maximum normalization method unavoidably produces zero values, which cause information loss when combined with the GME method, this combination cannot be recommended for CI construction. Because of its limited compatibility and the loss of information for the first year, the percentage of annual differences over consecutive years is also not recommended. Among the normalization methods, the ‘distance to a reference’ method had the greatest impact on the value of the composite sustainability index. The ‘benefit of the doubt’ approach (BOD) in the weighting step and the LIN method in the aggregation step were identified as the methods with the greatest influence on the result. Based on the analysis conducted and the experience gained, the recommended construction frame is the ‘distance to a reference’ normalization method in combination with the ‘benefit of the doubt’ approach and LIN.