1 Introduction

Composite indicators are becoming increasingly influential tools of public policy. Usually taking the form of a weighted arithmetic average of normalized indicators, these indices condense complex multidimensional information into a single number. As such, they are easy to compute and to interpret. Furthermore, they allow for the computation of rankings to assess the comparative standing of different entities (countries, regions, governmental policies under consideration). This conceptual simplicity facilitates communication with the press and public, and thus aids in generating awareness regarding the issue that the composite indicator is meant to address.

Despite their increasing popularity, composite indicators are often strongly criticized by official statisticians and economists, including those interested in the measurement of sustainability (Bohringer and Jochem 2007; Ravallion 2012b). These critiques come in different varieties, both conceptual and methodological. On the conceptual side, it is argued that the underlying issues that composite indicators address are often ill-defined and open to excessive interpretation. Ravallion (2012b) cites Newsweek magazine’s “best country rankings” as an intuitive example of this kind of definitional haziness. Statisticians further complain that the process of constructing a composite index discards useful statistical information by reducing multidimensional data to an aggregate measure. While we agree that these are important issues, we do not dwell on them as the focus of this paper is primarily methodological. Here, critics argue that integral modeling assumptions behind the construction of composite indices such as the choice of normalization procedure, aggregation function, and weighting scheme, fail to be grounded in economic theory or a coherent analytic framework (Ravallion 2012b). What is more, the ad-hoc nature of these choices may lead to unintended theoretical consequences such as unacceptable tradeoffs (Ravallion 2012a) and problematic measurement-theoretic implications (Ebert and Welsch 2004). Finally, the indices themselves may be very sensitive to changes in these subjective choices so that any insights or rankings that are generated can be highly non-robust.

One way of addressing the dependence of composite indices on arbitrary assumptions on normalization and weighting (while maintaining the linearity of the aggregation function) is via the nonparametric framework of data envelopment analysis (DEA). First developed by Charnes et al. (1978) in the field of production economics, DEA was primarily conceived as a methodology for measuring the relative efficiency of different decision-making units. Since then, DEA has been the subject of extensive research in both economics and operations research (Cooper et al. 2007). Its application in composite index construction, known as the “benefit of the doubt” (BOD) method, was proposed by Cherchye et al. (2007b). For each entity (country, region, policy) to be assessed, the BOD method searches for its “most favorable” set of weights, defined as the maximizers of the ratio of its score to that of the highest-performing member of the group. Thus, weights are determined endogenously and may differ between entities. Furthermore, it is important to note that DEA takes as input non-normalized data and its scores and rankings are invariant to ratio-scale transformations (i.e., multiplicative changes in units). DEA-like methods are being increasingly used to build composite indices for a variety of applications ranging from market structure and technology (Cherchye et al. 2007a, 2008), to gender issues (Dominguez-Serrano and Blancas 2011), to development and environmental policy and assessment (Fare et al. 2004; Zaim 2004; Zhou et al. 2007, 2010; Zhang et al. 2013; Jin et al. 2014).

In a recent paper Zhou et al. (2007) [hereafter, ZAP] extended the DEA framework of Cherchye et al. (2007b) to account for worst-case analysis. In particular, they propose a model with which to compute an entity’s “least favorable weights” and corresponding worst-case relative performance. The yardstick of performance becomes the ratio of an entity’s score to that of the worst member in the group. They then go on to propose a hybrid DEA methodology in which convex combinations of their normalized best- and worst-case DEA scores are considered.

Using ZAP’s work as a springboard we argue that, while interesting in its own right, the worst-case measure that they adopt may not capture, in a theoretical as well as practical sense, the notion of worst-case relative performance. We propose an alternative measure that is, in a strict mathematical sense, the worst-case analogue of the BOD model of Cherchye et al. (2007b). While the mathematical structure of this measure differs significantly from that of the BOD method, we show how it can nonetheless be tractably computed, even under general convex restrictions on the weights. We then compare the two methodologies using data from ZAP’s Sustainable Energy Index case study, demonstrating that they occasionally lead to notably different results.

1.1 Paper Outline

The structure of the paper is as follows. Section 2 sets up the formal model and relevant DEA framework. It goes on to discuss ZAP’s approach to modeling worst-case relative performance and to suggest, by means of a stylized example, how it may result in undesirable conclusions. Section 3 introduces and analyzes an alternative optimization problem that is the strict worst-case analogue of traditional DEA for composite indices. Section 4 applies the proposed procedure to the case study of the original ZAP paper, showing how the two methodologies can lead to divergent results. Section 7 provides conclusions. All mathematical proofs, tables, and figures are collected in the “Appendix”.

2 Model Description

Suppose we are given a set \({\mathcal {A}}=\{a_1, a_2,\ldots ,a_A\}\) of A agents and a set \({\mathcal {I}}=\{i_1, i_2,\ldots ,i_I\}\) of I indicators. Moreover, let \(x_{ai}\) denote agent a’s value for indicator i. All indicator values \(x_{ai}\) for \(a \in {\mathcal {A}}\) and \(i \in {\mathcal {I}}\) are assumed to be positive. Indicators are weighted with a non-negative column vector of weights \(\varvec{w} \in \mathfrak {R}^I_+\), where \(w_i\) denotes the weight assigned to indicator i. Consistent with classical DEA, an agent a’s score under weights \(\varvec{w}\) is given by the corresponding weighted sum of the non-normalized indicators: \(\sum _{i=1}^I w_i x_{ai}\).

Now, let us introduce the main concept behind the use of DEA-like methods in composite indicators. Consider an individual agent \(a_j \in {\mathcal {A}}\) and suppose that weights \(\varvec{w}\) are chosen. The relative standing of this agent among her peers, given the chosen weights \(\varvec{w}\), is captured via the ratio of her performance to that of the highest-performing agent of group \({\mathcal {A}}\). Denoting it by a function \(f_{a_j}(\varvec{w})\), it equals:

$$\begin{aligned} f_{a_j}(\varvec{w})\equiv \frac{\sum _{i=1}^I w_i x_{a_ji}}{\max _{a\in \mathcal {A}}\sum _{i=1}^I w_i x_{ai}}. \end{aligned}$$
(1)

Equation (1) ranges between 0 and 1; the higher it is, the closer agent \(a_j\) is to the top performer. If it equals 1, then for this choice of \(\varvec{w}\), agent \(a_j\) has the top score. The DEA approach to the construction of composite indicators uses exactly this measure of relative standing as its measuring stick of performance. In particular, it searches for the set of weights that maximize the function \(f_a(\varvec{w})\), for each agent \(a\in {\mathcal {A}}\). Applied to agent \(a_j\), it solves the following optimization problem:

$$\begin{aligned} f^*_{a_j}\equiv \max _{\varvec{w} \ge {\bf 0}} f_{a_j}(\varvec{w}) = \max _{\varvec{w} \ge {\bf 0}} \frac{\sum _{i=1}^I w_i x_{a_j i}}{\max _{a\in \mathcal {A}}\sum _{i=1}^I w_i x_{ai}} \end{aligned}$$
(2)

Optimization problem (2) determines the weights, subject to a non-negativity constraint, resulting in the best-case relative performance of agent \(a_j\). These are known as the “most favorable weights” for agent \(a_j\).
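To make the measure of Eq. (1) concrete, the following minimal sketch evaluates \(f_{a_j}(\varvec{w})\) for a fixed weight vector. It does not solve problem (2); it only shows how the objective responds to different weights. The data and the function names `score` and `relative_standing` are illustrative assumptions, not part of the original study.

```python
def score(x_row, w):
    """Weighted sum of the non-normalized indicators for one agent."""
    return sum(wi * xi for wi, xi in zip(w, x_row))


def relative_standing(x, j, w):
    """f_{a_j}(w) of Eq. (1): agent j's score divided by the top score."""
    top = max(score(row, w) for row in x)
    return score(x[j], w) / top


# Made-up data: 3 agents (rows) by 2 indicators (columns), all positive.
x = [[4.0, 1.0],
     [2.0, 2.0],
     [1.0, 3.0]]

equal_weights = relative_standing(x, 1, [0.5, 0.5])  # agent 2, equal weights
first_only = relative_standing(x, 1, [1.0, 0.0])     # agent 2, all weight on indicator 1
top_agent = relative_standing(x, 0, [1.0, 0.0])      # agent 1 leads on indicator 1
```

Under these made-up numbers the same agent's relative standing moves with the weights, which is why problem (2) lets each agent choose the weights that suit it best.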

From a mathematical standpoint, the tractability of problem (2) is crucially dependent on the fact that it may be reduced to the following, equivalent linear-fractional program:

$$\begin{aligned} f^*_{a_j}= & {} \max _{{\varvec{w}} \ge {{\bf 0}}, z} \quad \frac{\sum _{i=1}^I w_i x_{a_ji}}{z} \nonumber \\& \text {s.t.} \quad \sum _{i=1}^I w_i x_{ai} \le z, \; \; a \in {\mathcal {A}}, \end{aligned}$$
(3)

which, in turn, can be shown to be equivalent [see Section 4.5.2 in Boyd and Vandenberghe (2004)] to the linear program

$$\begin{aligned} f^*_{a_j}= & {} \max _{\varvec{w}\ge {\bf 0}} \quad \quad \sum _{i=1}^I w_i x_{a_ji} \nonumber \\& \text {s.t.} \quad \quad \sum _{i=1}^I w_i x_{ai} \le 1, \; \; \text {for\,all} \; a \in {\mathcal {A}}. \end{aligned}$$
(4)

Linear program (4) is the familiar “benefit of the doubt” method for composite indicators discussed in Cherchye et al. (2007b) and applied in many contexts since (Zhou et al. 2007; Cherchye et al. 2007a, 2008; Hatefi and Torabi 2010; Dominguez-Serrano and Blancas 2011; Rogge 2012).

Importantly, additional linear constraints may be imposed on the weights in optimization problem (2) at no conceptual or computational cost. Particularly compelling weight restrictions come in the form of so-called “pie shares” (see Cherchye et al. 2007a, b), which set lower and upper bounds on the contribution of any single indicator to the agent’s total score. To wit, given a set of numbers \(L_i, U_i\) for all \(i\in {\mathcal {I}}\) the corresponding pie-share constraints to be appended to problem (2), and ultimately also to its linear equivalent (4), are given by

$$\begin{aligned} L_i \le \frac{w_i x_{a_j i}}{\sum _{k=1}^I w_k x_{a_jk}} \le U_i, \; \; \text {for\, all} \; i \in {\mathcal {I}}. \end{aligned}$$
(5)

The above constraints hold theoretical as well as practical appeal. Theoretically, their imposition does not compromise the very desirable property of ratio-scale invariance of DEA, also known as “units invariance” (Cherchye et al. 2007b; Cooper et al. 2007). That is, DEA scores and their resulting rankings remain unchanged under incomparable (i.e., non-identical across indicators) ratio-scale transformations of the original indicators. For example, if an index is composed of three indicators \(i_1, i_2, i_3\) and we multiply indicator \(i_1\) by 2, \(i_2\) by 8 and \(i_3\) by 0.1, this will have no effect on the corresponding DEA scores and rankings. This property is particularly compelling in the case of environmental indices (Ebert and Welsch 2004). Meanwhile, on a practical level pie shares are pure numbers whose meaning is easy to grasp and on whose values experts can usually come to an agreement (Cherchye et al. 2007a, b).
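The units-invariance argument can be checked numerically. The sketch below computes the pie shares of Eq. (5) for one agent, rescales the three indicators by the factors 2, 8 and 0.1 used in the example, and divides each weight by the same factor. Because every product \(w_i x_{a_ji}\) is unchanged, the pie shares (and hence any score built from those products) are unchanged as well. The indicator values, the weights, and the function names are made up for illustration; the bounds 0.1 and 0.5 are the uniform pie-share bounds used later in the case study.

```python
def pie_shares(x_row, w):
    """Contribution of each indicator to the agent's total score, as in Eq. (5)."""
    total = sum(wi * xi for wi, xi in zip(w, x_row))
    return [wi * xi / total for wi, xi in zip(w, x_row)]


def satisfies_pie_shares(x_row, w, lower, upper):
    """True if every pie share lies within its [L_i, U_i] bounds."""
    shares = pie_shares(x_row, w)
    return all(lo <= s <= up for lo, s, up in zip(lower, shares, upper))


# Made-up indicator values for one agent and a candidate weight vector.
x_row = [3.0, 5.0, 20.0]
w = [0.2, 0.1, 0.01]

# Rescale each indicator, then divide the matching weight by the same factor.
factors = [2.0, 8.0, 0.1]
x_scaled = [c * xi for c, xi in zip(factors, x_row)]
w_scaled = [wi / c for wi, c in zip(w, factors)]

shares = pie_shares(x_row, w)
shares_scaled = pie_shares(x_scaled, w_scaled)
feasible = satisfies_pie_shares(x_row, w, [0.1] * 3, [0.5] * 3)
```

Since any feasible weight vector for the original data maps to a feasible one for the rescaled data with identical products, the optimal DEA score cannot change under such a transformation.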

2.1 The Worst-Case Model of ZAP

ZAP take as a starting point the above standard DEA model and extend it to account for worst-case relative performance. Considering again an agent \(a_j \in {\mathcal {A}}\), they draw on previous work by Zhu (2004) and Takamura and Tone (2003) and (implicitly) define this agent’s “least favorable weights” as the solution of the following optimization problem:

$$\begin{aligned} g^{ZAP}_{a_j}\equiv \min _{\varvec{w} \ge {\bf 0}} \frac{\sum _{i=1}^I w_i x_{a_ji}}{ \min _{a\in \mathcal {A}}\sum _{i=1}^I w_i x_{ai}}. \end{aligned}$$
(6)

That is, they define the worst-case DEA weights to be such that they minimize the ratio of an agent’s performance to that of the worst-case performer in the group. Optimization problem (6) retains the nice properties of problem (2) in that it too can be reduced to a linear-fractional program, and ultimately to the following linear program (which, in turn, is the formulation that appears in ZAP’s work):

$$\begin{aligned} g^{ZAP}_{a_j}= & {} \min _{\varvec{w}\ge {\bf 0}} \quad \quad \sum _{i=1}^I w_i x_{a_ji} \nonumber \\& \text {s.t.} \quad \quad \sum _{i=1}^I w_i x_{ai} \ge 1, \; \; a \in {\mathcal {A}}. \end{aligned}$$
(7)

In a formal sense, problem (6), and thus also its linear equivalent (7), does not correspond to worst-case DEA for composite indicators. This is because it abandons the measure of relative performance \(f_{a_j}(\varvec{w})\) of Eq. (1), which constitutes the objective function of problem (2), in favor of an alternative measure, namely

$$\begin{aligned} g_{a_j}(\varvec{w})\equiv \frac{\sum _{i=1}^I w_i x_{a_ji}}{\min _{a\in \mathcal {A}}\sum _{i=1}^I w_i x_{ai}}. \end{aligned}$$
(8)

The ratio \(g_{a_j}(\varvec{w})\) is no smaller than 1 and unbounded above; the smaller it is, the closer agent \(a_j\) is to the bottom performer. If it equals 1, then for this choice of \(\varvec{w}\), agent \(a_j\) has the worst score. Thus, worst-case DEA as defined by ZAP searches for the set of weights \(\varvec{w}\) that minimize the ratio \(g_{a_j}(\varvec{w})\).

While problem (6) is interesting in its own right, and the underlying optimization problem has identical structure to the standard DEA context (and is thus readily solvable using similar techniques), it is not the worst-case analogue of standard DEA. Moreover, it may sometimes fail to capture the essence of worst-case relative performance. The following, highly stylized, example illustrates this fact.

Example 1

[counter-intuitive implications of ZAP’s model]. Consider the setting described in Table 1 summarizing an instance of the problem for \({\mathcal {A}}=\{a_1, a_2\}\) and \({\mathcal {I}}=\{i_1, i_2\}\).

Table 1 \(x_{ai}\) values for Example 1

The standard best-case DEA model of Eq. (2) results in identical scores for \(a_1\) and \(a_2\), since \(f^*_{a_1}=f^*_{a_2}=1\). Let us now consider worst-case performance. According to ZAP’s model of Eq. (6), agents \(a_1\) and \(a_2\) are equal as they both get the absolute minimum score of 1. This is because there exist weight vectors that equalize their performance (e.g., \(\varvec{w}=(1/2,1/2)'\)), thus implying that they are simultaneously the worst performers of the two-member group \({\mathcal {A}}\). By definition of problem (6), this means that they both get the worst possible score, i.e., \(g^{ZAP}_{a_1}=g^{ZAP}_{a_2}=1\), and so ZAP’s methodology cannot discriminate between them. This result does not, arguably, accord with intuition. Indeed, we would expect agent \(a_2\)’s balanced performance across indicators, in combination with \(a_1\)’s extremely unbalanced one, to be recognized and rewarded. Furthermore, note that the exact numbers here are not important. Similar results would obtain if we make \(x_{a_1i_1}\ge 0\) as large and \(x_{a_1i_2}\ge 0\) as small as we like, and set \(x_{a_1i_1}+x_{a_1i_2}=x_{a_2i_1}+x_{a_2i_2}\) and \(x_{a_2i_1}=x_{a_2i_2}\). \(\square\)
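The mechanics of Example 1 can be reproduced in a few lines. The indicator values below are an assumed stand-in for Table 1, which is not reproduced in the text: they are chosen to match the description of the example (equal row totals, \(a_1\) extremely unbalanced, \(a_2\) perfectly balanced) and the worst-case scores quoted in Sect. 3. The function names are hypothetical.

```python
def score(x_row, w):
    """Weighted sum of indicators for one agent."""
    return sum(wi * xi for wi, xi in zip(w, x_row))


def f(x, j, w):
    """Eq. (1): agent j's score relative to the best performer."""
    return score(x[j], w) / max(score(row, w) for row in x)


def g(x, j, w):
    """Eq. (8): agent j's score relative to the worst performer."""
    return score(x[j], w) / min(score(row, w) for row in x)


# Assumed values standing in for Table 1 (rows: a_1, a_2; columns: i_1, i_2).
x = [[9999.0, 1.0],
     [5000.0, 5000.0]]

# Equal weights give both agents the same total, so each ties for last place.
w_equal = [0.5, 0.5]
g_a1 = g(x, 0, w_equal)
g_a2 = g(x, 1, w_equal)

# Each agent also reaches the top under its own most favorable weights.
f_a1 = f(x, 0, [1.0, 0.0])
f_a2 = f(x, 1, [0.0, 1.0])
```

Because \(g_{a_j}(\varvec{w})\) can never fall below 1 and \(f_{a_j}(\varvec{w})\) can never exceed 1, attaining the value 1 at some weight vector is enough to pin down the optimal values of problems (6) and (2) for both agents.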

Finally, in order to construct a DEA measure combining best- and worst-case performance, ZAP normalize the results of (2) and (6) via max-min rescaling. This is necessary because the scales of the two measures clearly differ; one ranges from 0 to 1, the other from 1 to \(+\infty\). This normalization introduces an undesirable source of subjectivity, which arguably goes against the normalization-free essence of DEA. In any event, given an agent \(a_j\) and \(\lambda \in [0,1]\), ZAP propose to consider the following family of convex combinations of normalized best- and worst-case DEA scores:

$$\begin{aligned} CI^{ZAP}_{a_j}(\lambda ) = \lambda \frac{f^*_{a_j}-\min _{a \in {\mathcal {A}}} f^*_{a}}{\max _{a \in {\mathcal {A}}}f^*_{a}-\min _{a \in {\mathcal {A}}} f^*_{a}} +(1 -\lambda ) \frac{g^{ZAP}_{a_j}-\min _{a \in {\mathcal {A}}} g^{ZAP}_{a}}{\max _{a \in {\mathcal {A}}}g^{ZAP}_{a}-\min _{a \in {\mathcal {A}}} g^{ZAP}_{a}}. \end{aligned}$$
(9)
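As a minimal sketch of Eq. (9), the function below rescales the best-case and worst-case scores by max-min normalization and returns their convex combination for a given \(\lambda\). The score vectors are made-up numbers for three agents, and the names `min_max` and `ci_zap` are hypothetical. Raising an error when all scores coincide is an assumption of this sketch, since Eq. (9) is undefined in that degenerate case.

```python
def min_max(scores):
    """Max-min rescaling of a list of scores to the interval [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption of this sketch: treat the 0/0 case as an error.
        raise ValueError("max-min rescaling is undefined when all scores are equal")
    return [(s - lo) / (hi - lo) for s in scores]


def ci_zap(best, worst, lam):
    """Eq. (9): convex combination of normalized best- and worst-case scores."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    best_n = min_max(best)
    worst_n = min_max(worst)
    return [lam * b + (1.0 - lam) * w for b, w in zip(best_n, worst_n)]


# Made-up scores for three agents.
f_star = [1.0, 0.75, 0.5]   # best-case scores, each in (0, 1]
g_zap = [3.0, 1.0, 2.0]     # ZAP worst-case scores, each at least 1

ci_half = ci_zap(f_star, g_zap, 0.5)
ci_best_only = ci_zap(f_star, g_zap, 1.0)
```

Setting \(\lambda = 1\) recovers the normalized best-case ranking and \(\lambda = 0\) the normalized worst-case ranking, with intermediate values trading off the two.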

We close this section by noting that, while relatively recent, ZAP’s model has already been quite influential in the literature. Indeed, a number of studies have adopted ZAP’s approach to worst-case DEA for the construction of composite indices (see Hatefi and Torabi 2010; Dominguez-Serrano and Blancas 2011; Rogge 2012, among others).

3 An Alternative Approach to Worst-Case DEA

An alternative way of modeling worst-case relative performance is to maintain the structure of optimization problem (2) (i.e., its objective function and constraints) but make it a minimization as opposed to a maximization. This would involve solving the following optimization problem:

$$\begin{aligned} g_{a_j}^*\equiv \min _{\varvec{w} \ge {\mathbf 0}} \frac{\sum _{i=1}^I w_i x_{a_j i}}{\max _{a\in \mathcal {A}}\sum _{i=1}^I w_i x_{ai}}. \end{aligned}$$
(10)

Problem (10) is the strict worst-case analogue of problem (2). Not surprisingly, when applied to the data of Example 1 it clearly points to \(a_2\)’s far superior worst-case relative performance, since we have \(g_{a_2}^*=5000/9999\) vs. \(g_{a_1}^*=1/5000\).

Analytically, problem (10) is not as straightforward as (2) or (6). This is because we cannot apply the trick of Eq. (3) to reduce it to an equivalent linear-fractional program. Nonetheless, it is possible to argue from first principles that it too admits a simple and tractable solution.

For expository reasons, before going into the statement and proofs of the following results, we generalize Eq. (10) to incorporate arbitrary constraints on the weights. Letting \(W_j \subseteq {\mathfrak R}_+^{I}\) denote an arbitrary subset of the non-negative orthant, define the optimization problem:

$$\begin{aligned} g_{a_j}^*(W_j)\equiv \min _{\varvec{w} \in W_j} \frac{\sum _{i=1}^I w_i x_{a_j i}}{\max _{a\in \mathcal {A}}\sum _{i=1}^I w_i x_{ai}}. \end{aligned}$$
(11)

The pie-share bounds of Eq. (5) correspond to sets \(W_j\) that are polyhedra, i.e., they can be expressed as systems of linear inequalities [see Chapter 2 in Bertsimas and Tsitsiklis (1997)]. This is because the constraints of Eq. (5) are equivalent to the system of linear inequalities

$$\begin{aligned} \left\{ w_i x_{a_j i} - L_i \sum _{k=1}^I w_k x_{a_jk}\ge 0, \; \; w_i x_{a_j i} - U_i \sum _{k=1}^I w_k x_{a_jk}\le 0 \right\} , \; \; \text {for\,all} \; i \in {\mathcal {I}}. \end{aligned}$$

We are now ready to state the paper’s first theorem.

Theorem 1

Consider optimization problem (11) with m linear constraints on the weights given by \(W_j=\{\varvec{w} \in {\mathfrak R}^I: \; \varvec{w}\ge {\bf 0}, \; \varvec{G}^j \cdot \varvec{w} \le \varvec{h}^j\}\), where \(\varvec{G}^j \in \mathfrak {R}^{m \times I}\) and \(\varvec{h}^j \in \mathfrak {R}^m\). We have

$$\begin{aligned} g_{a_j}^*(W_j)=\min _{a\in {\mathcal {A}}}\left\{ \min \limits _{\begin{array}{c} \varvec{w} \ge 0, \; y \ge 0 \\ \varvec{G}^j \varvec{w}-\varvec{h}^j y \le 0 \\ \sum _{i=1}^I w_i x_{ai}=1 \end{array}} \sum _{i=1}^I w_i x_{a_ji}\right\} . \end{aligned}$$
(12)

Proof

See “Appendix”. \(\square\)

Theorem 1 establishes that problem (11) is highly tractable for arbitrary polyhedral restrictions on weights. Indeed, its solution simply amounts to solving A linear programs, namely the inner minimizations of Eq. (12) for each \(a \in {\mathcal {A}}\), and picking the smallest of their optimal values. In particular, this means that the pie-share weight restrictions of Eq. (5) can be easily accommodated in problem (10).

Corollary 1 establishes an easy consequence of Theorem 1. In particular, when there are no constraints on the weights, problem (11) can be trivially solved by simply enumerating the ratios \(\frac{x_{a_ji}}{x_{ai}}\) for all \(i \in {\mathcal {I}}\) and \(a \in {\mathcal {A}}\) and picking the minimum value.

Corollary 1

Consider the setting of Theorem 1 with no constraints on the weights except for non-negativity (i.e., \(W_j = {\mathfrak R}_+^I\)). In this case Eq. (12) can be simplified to:

$$\begin{aligned} g_{a_j}^*(\mathfrak {R}^I_+)\equiv g_{a_j}^*=\min _{a \in {\mathcal {A}}, \; i \in {\mathcal {I}}} \; \frac{x_{a_ji}}{x_{ai}}. \end{aligned}$$
(13)

Let \(\mathcal {I}^*\) denote the set of indicators that attain the minimum in Expression (13). Any vector \(\varvec{w^*_{a_j}}\ge \mathbf {0}\) such that \(\sum _{i \in {\mathcal {I}}^*} w^*_{a_ji^*}>0\) for \(i^* \in \mathcal {I}^*\) and \(w^*_{a_ji}=0\) otherwise, is an optimal solution of problem (10).
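The enumeration in Eq. (13) takes only a few lines of code. The sketch below returns the minimum ratio together with the agent and indicator that attain it, using exact rational arithmetic. The data are the same assumed stand-in for Table 1 (not reproduced in the text), chosen to match the description of Example 1; the function name `worst_case_score` is hypothetical.

```python
from fractions import Fraction


def worst_case_score(x, j):
    """Eq. (13): smallest ratio x[j][i] / x[a][i] over all agents a and indicators i.

    Returns a tuple (ratio, a, i) for the first pair attaining the minimum.
    """
    best = None
    for a, row in enumerate(x):
        for i, (x_ji, x_ai) in enumerate(zip(x[j], row)):
            ratio = Fraction(x_ji) / Fraction(x_ai)
            if best is None or ratio < best[0]:
                best = (ratio, a, i)
    return best


# Assumed values standing in for Table 1 (rows: a_1, a_2; columns: i_1, i_2).
x = [[9999, 1],
     [5000, 5000]]

g_a1, _, i_a1 = worst_case_score(x, 0)
g_a2, _, i_a2 = worst_case_score(x, 1)
```

With these assumed values the function reproduces the scores quoted for Example 1: agent \(a_1\) is penalized through its weak second indicator, while agent \(a_2\) loses at most about half of the top score on any indicator.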

Finally, it is worth noting that the positive result of Theorem 1 extends to the case of arbitrary convex constraints, which has not been previously mentioned in the DEA literature.

Theorem 2

Consider optimization problem (11) for an arbitrary set \(W_j\). We have

$$\begin{aligned} g_{a_j}^*(W_j)=\min _{a\in {\mathcal {A}}}\left\{ \min _{\varvec{w} \in W_j} \frac{\sum _{i=1}^I w_i x_{a_ji}}{\sum _{i=1}^I w_i x_{ai}}\right\} . \end{aligned}$$
(14)

If \(W_j\) can be written as \(W_j=\{\varvec{w} \in \mathfrak {R}^I_+: \; c_k(\varvec{w}) \le 0, \; k=1,2,\ldots ,K\}\), where \(c_k(\cdot )\) for \(k=1,2,\ldots ,K\) are convex functions, then the inner minimizations of Eq. (14) are concave fractional programs that can be efficiently solved with standard methods.

Proof

See “Appendix”. \(\square\)
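The interchange of minimizations in Eq. (14) rests on a simple pointwise identity. The following is only a sketch of that step, under the model's assumption that all indicator values are positive, and is not a substitute for the proof in the “Appendix”. For any nonzero \(\varvec{w} \in W_j\) every denominator is positive, so dividing a non-negative numerator by the largest denominator yields the smallest of the ratios:

$$\begin{aligned} \frac{\sum _{i=1}^I w_i x_{a_ji}}{\max _{a\in {\mathcal {A}}}\sum _{i=1}^I w_i x_{ai}} = \min _{a\in {\mathcal {A}}} \frac{\sum _{i=1}^I w_i x_{a_ji}}{\sum _{i=1}^I w_i x_{ai}}. \end{aligned}$$

Minimizing both sides over \(\varvec{w} \in W_j\) and exchanging the order of the two minimizations on the right-hand side gives the form stated in Eq. (14).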

In conclusion, the analytic results of this section establish that problem (11), in addition to being the (generalized) worst-case equivalent of (2), is highly tractable.

4 Numerical Case Study

In this section, we apply the framework developed in Sect. 3 to the original case study of ZAP. In their paper, Zhou and his co-authors applied their DEA methodology to the construction of a sustainable energy index (SEI) for the eighteen Asia-Pacific Economic Cooperation (APEC) economies in 2002. In what follows, we offer a bare-bones description of ZAP’s SEI, omitting details on how the index was developed. This is because our primary objective is to briefly compare the results obtained under the two different DEA methodologies.

The three building blocks of ZAP’s SEI are an energy efficiency indicator (EEI), a renewable energy indicator (REI), and a climate change indicator (CCI). The EEI is the reciprocal of the energy-to-GDP ratio, the REI is the percentage of renewable energy in total final energy consumption, and the CCI is the reciprocal of the CO\(_2\) emissions-to-GDP ratio. More information on the rationale and data sources of the SEI can be found in Sect. 4 of ZAP. Table 2 summarizes data on the EEIs, REIs, and CCIs of the 18 APEC countries.

5 Application of DEA Methodologies

We begin by considering the simplest possible DEA setting in which there are no weight restrictions. Table 3 summarizes the results of the various DEA models for this case.

The second column of Table 3 collects the results of best-case DEA scores as defined by Eq. (2) (which are of course identical to those cited in ZAP) along with the ranks they imply. The third column collects the results of ZAP’s worst-case DEA scores as per Eq. (6), while the fourth column summarizes the worst-case DEA scores proposed in this paper as per Eq. (10). The fifth column lists the average of the normalized values of Columns 2 and 3, i.e., the values of Eq. (9) for \(\lambda =1/2\); the sixth column does the same for Columns 2 and 4, i.e., using the worst-case DEA scores of this paper.

Examining Table 3 we see that the choice of model (6) versus model (10) results in numerous rank changes (indicated in red). Some of them can be quite dramatic, such as those involving Russia, which is last according to model (6) and 12th according to model (10). Indeed, to elucidate the differences between the two methodologies it is instructive to focus on Russia and contrast its performance to that of Korea. Under model (6) Russia and Korea are considered equal as there exist weight vectors that result in their having the minimum score in group \({\mathcal {A}}\) of APEC countries. Denoting by \(\varvec{w_{a_{17}}^{ZAP}}\) and \(\varvec{w_{a_{18}}^{ZAP}}\) the optimal solutions of (6) for Korea and Russia respectively, we have

$$\begin{aligned} \varvec{w_{a_{17}}^{ZAP}}=(0.01,1.52,0.03)', \; \varvec{w_{a_{18}}^{ZAP}}=(0.22,0.01,0.49)'. \end{aligned}$$

Conversely, under model (10) we see that Korea has a far inferior worst-case performance to Russia. The optimal weights provide insight as to why. Denoting by \(\varvec{w_{a_{17}}^{*}}\) and \(\varvec{w_{a_{18}}^{*}}\) the optimal solutions of (10) for Korea and Russia respectively, we have

$$\begin{aligned} \varvec{w_{a_{17}}^{*}}=(0,K_1,0), \; \varvec{w_{a_{18}}^{*}}=(0,0,K_2), \; \text {for\, any} \; K_1, K_2 >0. \end{aligned}$$

Hence, we see that for Korea (Russia), worst-case weights correspond to those assigning positive weight exclusively to indicator REI (CCI). The result now follows since Korea’s performance of 0.6 in REI (where New Zealand has the maximum value of 56.9) is, in relative terms, worse than that of Russia which has a CCI value of 0.652 (where Papua has the maximum value of 5.039).
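The comparison can be made explicit with the figures quoted above. In the spirit of Eq. (13), each country's worst-case score is the ratio of its own value to the group maximum on its binding indicator. The sketch below uses only the four numbers given in the text and assumes, as the text indicates, that these are indeed the binding ratios; the variable names are illustrative.

```python
# Values quoted in the text: each country's value on its binding indicator
# and the group maximum on that same indicator.
korea_value, korea_max = 0.6, 56.9
russia_value, russia_max = 0.652, 5.039

korea_ratio = korea_value / korea_max      # about 0.0105
russia_ratio = russia_value / russia_max   # about 0.129

# Russia's binding ratio is roughly an order of magnitude larger than Korea's.
gap = russia_ratio / korea_ratio
```

On these figures Korea retains roughly 1% of the top value on its weakest indicator, against roughly 13% for Russia, which is why model (10) separates two countries that model (6) treats as equal.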

Qualitatively similar implications persist even when we impose the uniform pie-share bounds \((L_i,U_i)=(L,U)=(0.1,0.5)\) for all \(i\in \{1,2,3\}\), albeit to a weaker degree. Table 4 summarizes the corresponding results.

6 Combining Best- and Worst-Case DEA Scores

Examining the fifth and sixth columns of Tables 3 and 4, we see that the rankings implied by the two methodologies converge significantly when we consider the averages of the respective normalized DEA scores, as per Eqs. (9) and (15). Taking this analysis further, Figure 1 follows the example of ZAP’s Figures 2 and 3 and presents box plots of country ranks when \(\lambda\) is allowed to assume all values in \(\{0, 0.1, 0.2,\ldots ,0.9,1\}\), for both cases of unconstrained and constrained weights. As expected, we observe greater variability in country ranks for the worst-case DEA model of Eq. (15) compared to that of (9). The effect is stronger when weights are unrestricted, but persists even upon setting the aforementioned pie shares.

In conclusion, this brief empirical exercise suggests that indices combining best- and worst-case DEA scores are quite sensitive to how one chooses to model worst-case relative performance.

7 Conclusion

This note has revisited the concept of worst-case performance in a nonparametric DEA framework, first introduced in the composite-indicator literature by Zhou et al. (2007) [ZAP]. We argue that, while interesting and valid in its own right, the worst-case measure adopted by ZAP does not capture, in a formal sense, the notion of worst-case DEA performance. By means of a stylized example, we showed that this theoretical inconsistency may at times lead to undesirable implications. We analyze the strict worst-case analogue of standard DEA and show how it can be tractably computed, even under general convex restrictions on the weights. Furthermore, the resulting worst-case DEA scores can be combined with their best-case analogues without requiring prior normalization. The two methodologies are compared using ZAP’s Sustainable Energy Index case study, demonstrating that they occasionally lead to divergent results. Future work could incorporate the model presented herein in uncertainty and sensitivity analyses of composite indicators (Cherchye et al. 2008).