1 Introduction

In the DEA literature, the super-efficiency procedure has been proposed by Andersen and Petersen (1993, AP hereafter) to rank efficient decision making units (DMUs), and by Banker and Gifford (1988) and Banker et al. (1989) to identify outliers. Under the super-efficiency DEA formulation, the scores of efficient DMUs can differ from one another and need not equal one. The AP procedure has therefore been suggested as a way to discriminate among efficient units. Nevertheless, despite its popularity in DEA applications, there is little evidence on whether the super-efficiency model is effective in differentiating efficient units. Banker and Chang (2006) find in their simulation study that the AP procedure does not rank efficient units satisfactorily, but they do not elaborate on why it is ineffective. This paper attempts to shed light on this problem by examining the performance of super-efficiency procedures in different “regions” of the production set, i.e. for DMUs with different levels of input/output values. We find that the unsatisfactory results originate mainly from the “left corner” of the production set, that is, from units with relatively small input and output values.

In the case of outlier detection, Banker and Gifford (1988) suggest using super-efficiency scores to screen out observations whose scores exceed a pre-specified screen level (see Banker et al. 1989, pp. 279–280).Footnote 1 After these outliers are removed, a conventional DEA model (e.g. the BCC model) is estimated with the remaining observations. We refer to this approach as the BG procedure hereafter. Banker and Chang (2006) show that the BG procedure is effective in identifying outliers in the presence of noisy data, a finding consistent with the results of our extensive simulation experiments. We further investigate how different noise levels affect outlier detection with the BG procedure, and find that the BG procedure is generally more effective when the noise level is high.

It is important to note that the simulation study in Banker and Chang (2006) considers only a Cobb–Douglas production function and a super-efficiency procedure for the BCC model. It remains to be seen whether their findings hold across different production functions, DEA formulations and returns-to-scale settings. Our experiments extend Banker and Chang (2006) in three dimensions: (1) we incorporate a second production function, a polynomial that displays different characteristics (Banker et al. 2004, Banker and Chang 1996); (2) we examine the performance of super-efficiency procedures under both the BCC and CCR formulations; and (3) we consider different returns-to-scale assumptions (CRS and NIRS). Our simulation results show that the findings in Banker and Chang (2006) are robust to different DEA formulations, production functions and returns-to-scale assumptions.

The remainder of this paper is organized as follows. We first describe the super-efficiency model in DEA in Sect. 2. Section 3 details the data generating process and the simulation setup for testing the performance of the AP procedure on ranking efficient units. Section 4 describes the data generating process and simulation results to evaluate the performance of the super-efficiency model in outlier identification. Section 5 concludes with a summary of our principal results.

2 Super-efficiency models

We consider three specific super-efficiency models: super BCC, super CCR, and super NIRS models.

2.1 Super BCC

We refer to the approach of applying the super-efficiency procedure to the BCC formulation as the super BCC model. Let \(\hbox {Y}_\mathrm{j} \ge 0\) and \(\hbox {X}_\mathrm{j} \ge 0,\, \hbox {j} = 1,{\ldots },\hbox {N}\), be the output and input vectors for N observations, with at least one element of each vector being strictly positive. The output-oriented super-efficiency measure \(\hat{{\uppsi }}_\mathrm{k}^\mathrm{SE}\) for an observation \((\hbox {X}_\mathrm{k},\, \hbox {Y}_\mathrm{k})\), \(\hbox {k}\in \{1,{\ldots },\hbox {N}\}\), is the reciprocal of the super-inefficiency measure \(\hat{{\theta }}_{k}^{SI}\) obtained by solving the following linear program:

$$\begin{aligned} \hat{{\theta }}_k^{SI} =\hbox {Max}\,\theta _k \end{aligned}$$
(1)

subject to

$$\begin{aligned}&\mathop {\mathop {\sum }\limits _{\mathrm{j}=1}}\limits _{\mathrm{j}\ne \mathrm{k}}^\mathrm{N} {{\uplambda }_\mathrm{j} \hbox {Y}_\mathrm{j}} -{\uptheta }_\mathrm{k} \hbox {Y}_\mathrm{k} \ge 0\end{aligned}$$
(1a)
$$\begin{aligned}&\mathop {\mathop {\sum }\limits _{\mathrm{j}=1}}\limits _{\mathrm{j}\ne \mathrm{k}}^\mathrm{N} {{\uplambda }_\mathrm{j} \hbox {X}_\mathrm{j} } \,\le \hbox {X}_\mathrm{k}\end{aligned}$$
(1b)
$$\begin{aligned}&\mathop {\mathop {\sum }\limits _{\mathrm{j}=1}}\limits _{\mathrm{j}\ne \mathrm{k}}^\mathrm{N} {{\uplambda }_\mathrm{j}} =1\end{aligned}$$
(1c)
$$\begin{aligned}&{\uptheta }_\mathrm{k} ,{\uplambda }_\mathrm{j} \,\ge 0 \end{aligned}$$
(1d)

Note that, unlike the conventional BCC model, the above super-efficiency model excludes the observation k under evaluation from the reference set in constraints (1a), (1b) and (1c). Thus the reference observation \((\mathop {\sum \limits _{\mathrm{j}=1}^\mathrm{N}}\limits _{{{\mathrm{j}\ne \mathrm{k}}}} {{\uplambda }_\mathrm{j} \hbox {X}_\mathrm{j} } ,\,{\mathop {\sum \limits _{\mathrm{j}=1}^\mathrm{N}}\limits _{\mathrm{j}\ne \mathrm{k}}} {{\uplambda }_\mathrm{j} \hbox {Y}_\mathrm{j}})\) in the evaluation of the super-efficiency of observation k is constructed only from observations other than k itself. However, this does not guarantee that a convex combination of the remaining observations can be constructed to envelop observation k. Banker and Gifford (1988) proved that while a feasible solution to the super-efficiency model always exists under the CCR specification, there may be no feasible solution under the BCC specification for certain extreme observations.Footnote 2
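For intuition, program (1) can be solved without an LP solver when there is a single input and a single output, as with the polynomial technology in Sect. 3. The weights satisfy \(\lambda_j \ge 0\), \(\sum_j \lambda_j = 1\) and \(\sum_j \lambda_j x_j \le x_k\), so every vertex of this feasible set has at most two positive weights: either one observation with \(x_j \le x_k\), or two observations straddling \(x_k\) with the input constraint binding. The minimal sketch below enumerates those vertices; the function name `super_bcc_output` is our own, strictly positive outputs are assumed, and the function returns `None` exactly when no convex combination of the other observations uses no more input than observation k, i.e. when (1) is infeasible.

```python
from typing import Optional, Sequence


def super_bcc_output(x: Sequence[float], y: Sequence[float], k: int) -> Optional[float]:
    """Output-oriented super-BCC score psi_k = 1 / theta_k (one input, one output).

    Solves program (1) by enumerating the vertices of the feasible weight set,
    with observation k excluded. Returns None if (1) is infeasible.
    """
    others = [j for j in range(len(x)) if j != k]
    best: Optional[float] = None  # largest attainable sum_j lambda_j * y_j
    # Vertices with one positive weight: observation j alone, if x_j <= x_k.
    for j in others:
        if x[j] <= x[k]:
            best = y[j] if best is None else max(best, y[j])
    # Vertices with two positive weights: i and j straddle x_k, input constraint binds.
    for i in others:
        for j in others:
            if x[i] < x[k] < x[j]:
                w = (x[j] - x[k]) / (x[j] - x[i])  # weight on i: w*x_i + (1-w)*x_j = x_k
                candidate = w * y[i] + (1 - w) * y[j]
                best = candidate if best is None else max(best, candidate)
    if best is None:
        return None  # every other observation uses more input than k
    theta = best / y[k]  # super-inefficiency theta_k^SI
    return 1.0 / theta
```

For example, with x = (1, 2, 3, 4) and y = (1, 3, 4, 4.5), the unit with the smallest input has no feasible super-BCC program, while unit 1 scores 1.2: interpolating its neighbours (1, 1) and (3, 4) at x = 2 yields 2.5 units of output, and 3/2.5 = 1.2.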

To avoid the computational problem associated with infeasible programs for the BCC super-efficiency model, we solve the following modified model:

$$\begin{aligned} \hbox {Max}\,{\upeta }_\mathrm{k} -2{\uplambda }_\mathrm{k} \end{aligned}$$
(2)

subject to

$$\begin{aligned}&\sum \limits _{\mathrm{j}=1}^\mathrm{N} {{\uplambda }_\mathrm{j}} \hbox {Y}_\mathrm{j} -{\upeta }_\mathrm{k} \hbox {Y}_\mathrm{k} \,\ge 0\end{aligned}$$
(2a)
$$\begin{aligned}&\sum _{\mathrm{j}=1}^\mathrm{N} {{\uplambda }_\mathrm{j} \hbox {X}_\mathrm{j} } \,\le \hbox {X}_\mathrm{k}\end{aligned}$$
(2b)
$$\begin{aligned}&\sum \limits _{\mathrm{j}=1}^\mathrm{N} {{\uplambda }_\mathrm{j}}=1\end{aligned}$$
(2c)
$$\begin{aligned}&{\upeta } _\mathrm{k} ,{\uplambda }_\mathrm{j} \,\ge 0 \end{aligned}$$
(2d)

Because of the large negative weight on \({\uplambda }_\mathrm{k}\) in the objective function in (2), observation k will not serve as a reference point for its own evaluation (i.e. \({\uplambda }_\mathrm{k}^*=0\)) unless the corresponding problem in (1) is infeasible. Therefore, the super-efficiency of observation k is \(\hat{{\uppsi }}_\mathrm{k}^\mathrm{SE} =1/{\upeta }_\mathrm{k}^*\) if \({\uplambda }_\mathrm{k}^{*} =0\) in an optimal solution to (2), and it is marked as infeasible if \({\uplambda }_\mathrm{k}^*=1\).Footnote 3 Observe that an observation k would be rated as inefficient by the conventional BCC model (which allows an observation to be in its own reference set, rather than excluding it as in program (1)) if and only if the super-efficiency estimate \(\hat{{\uppsi }}_\mathrm{k}^\mathrm{SE} <1\), and that it would be rated as efficient by the conventional BCC model if and only if \(\hat{{\uppsi }}_\mathrm{k}^\mathrm{SE} \ge 1\) (Banker and Gifford 1988).
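The selection rule in (2) can be mimicked in a one-input, one-output setting. In the sketch below (function and parameter names are ours), observation k is allowed back into the reference set but its weight is penalized through `penalty`; if the optimum still relies on k, the unit is flagged as infeasible, otherwise the score is \(1/\eta_k^*\). In this one-dimensional case any penalty above one suffices, because the self-reference vertex scores 1 − penalty whereas every other vertex scores \(\eta \ge 0\), so the coefficient 2 in (2) is already large enough here; we make no claim about the multi-dimensional case.

```python
from typing import Optional, Sequence


def super_bcc_via_penalty(x: Sequence[float], y: Sequence[float], k: int,
                          penalty: float = 1e6) -> Optional[float]:
    """Sketch of program (2) for one input and one output.

    Maximizes eta_k - penalty * lambda_k over the vertices of the feasible set
    (observation k included). Returns 1/eta* if lambda_k* = 0, or None
    (marked infeasible) if the optimum uses observation k itself.
    """
    candidates = []  # tuples (objective value, lambda_k, eta)
    # One positive weight: observation j alone with x_j <= x_k (j = k always qualifies).
    for j in range(len(x)):
        if x[j] <= x[k]:
            lam_k = 1.0 if j == k else 0.0
            eta = y[j] / y[k]
            candidates.append((eta - penalty * lam_k, lam_k, eta))
    # Two positive weights: i and j straddle x_k strictly, so neither is k itself.
    for i in range(len(x)):
        for j in range(len(x)):
            if x[i] < x[k] < x[j]:
                w = (x[j] - x[k]) / (x[j] - x[i])
                eta = (w * y[i] + (1 - w) * y[j]) / y[k]
                candidates.append((eta, 0.0, eta))
    _, lam_k, eta = max(candidates)
    if lam_k == 1.0:
        return None  # k is its own reference: program (1) is infeasible
    return 1.0 / eta
```

Because the self-reference vertex always exists, the modified program never fails to solve; infeasibility of (1) shows up only as \(\lambda_k^* = 1\).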

2.2 Super CCR and super NIRS

As in the Super BCC model above, the Super CCR model excludes unit k from the reference set when evaluating its super-efficiency. However, in the Super CCR model, Eq. (2c) is omitted because of the constant returns to scale assumption of the CCR formulation.

The Super NIRS model assumes non-increasing returns to scale, that is, \(\sum \limits _{j=1}^N {\lambda _j } \le 1\). The only difference between the Super NIRS and Super BCC formulations is that the former replaces Eq. (2c) with \(\sum \limits _{j=1}^{N} {\lambda _{j}} \le 1\).
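Under a single-input, single-output simplification, the three super-efficiency variants differ only in the reference technology. The sketch below (names ours) assumes positive inputs and outputs and the standard NIRS constraint \(\sum_j \lambda_j \le 1\), which is equivalent to adding the origin as an extra reference point to the convex hull. Under CRS the program reduces to scaling the best output/input ratio among the other units, so it is always feasible, in line with the Banker and Gifford (1988) result quoted above.

```python
from typing import List, Optional, Sequence, Tuple


def _max_convex_output(ref: List[Tuple[float, float]], xk: float) -> Optional[float]:
    """Largest output of a convex combination of ref points that uses input <= xk."""
    best: Optional[float] = None
    for xj, yj in ref:                      # single-point vertices
        if xj <= xk:
            best = yj if best is None else max(best, yj)
    for xi, yi in ref:                      # two-point vertices, input constraint binding
        for xj, yj in ref:
            if xi < xk < xj:
                w = (xj - xk) / (xj - xi)
                cand = w * yi + (1 - w) * yj
                best = cand if best is None else max(best, cand)
    return best


def super_score(x: Sequence[float], y: Sequence[float], k: int, rts: str) -> Optional[float]:
    """Output-oriented super-efficiency psi_k for rts in {'vrs', 'crs', 'nirs'}."""
    ref = [(x[j], y[j]) for j in range(len(x)) if j != k]
    if rts == "crs":
        # Rays through the origin: put all weight on the best output/input ratio.
        best = x[k] * max(yj / xj for xj, yj in ref)
    elif rts == "nirs":
        # sum(lambda) <= 1: the origin joins the reference set as a convex vertex.
        best = _max_convex_output(ref + [(0.0, 0.0)], x[k])
    elif rts == "vrs":
        best = _max_convex_output(ref, x[k])
    else:
        raise ValueError(f"unknown returns-to-scale setting: {rts}")
    return None if best is None else y[k] / best
```

On the data x = (1, 2, 3, 4), y = (1, 3, 4, 4.5), the smallest unit is infeasible under VRS but receives a finite score of 2/3 under both NIRS and CRS, illustrating why infeasibility is specific to the BCC specification.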

3 Ranking efficient units in different “regions”

As mentioned earlier, Andersen and Petersen (AP) suggested using the super-efficiency model to rank efficient units. This section evaluates the performance of the AP procedure in ranking efficient units and, more importantly, sheds light on when the AP procedure is effective or ineffective. We conduct the 500 simulation experiments described below.

3.1 Data generating process

We considered three factors: sample size, production technology, and inefficiency distribution in generating the data for the simulation experiments reported in this section.

3.1.1 Sample size

For each experiment, we considered a sample of size N, where N can take any integer value between 51 and 150 with equal probability.

3.1.2 Production technology

We considered two different production technologies. The first consists of a single output and two inputs specified in terms of its efficient production function \(\hbox {z} =\hbox {f}(\hbox {x}_{1},\, \hbox {x}_{2})\), where z represents the maximum output that can be produced from the levels \(\hbox {x}_{1}\) and \(\hbox {x}_{2}\) of the two inputs. Specifically, we used the following “shifted”Footnote 4 Cobb–Douglas production function:

$$\begin{aligned} \hbox {z}=(\hbox {x}_1 -{\upalpha }_1 )^{{\upbeta }_1 }(\hbox {x}_2 -{\upalpha }_2 )^{{\upbeta }_2 } \end{aligned}$$
(3)

where \({\upalpha }_{1}= {\upalpha }_{2}= 5\), the inputs \(\hbox {x}_{1}\) and \(\hbox {x}_{2}\) are generated randomly from independent uniform probability distributions over the interval [10, 20], and the coefficients \({\upbeta }_{1}\) and \({\upbeta }_{2}\) are generated randomly from independent uniform probability distributions over the interval [0.4, 0.5]. Since the sum of \({\upbeta }_{1}\) and \({\upbeta }_{2}\) is less than one, the production function in (3) satisfies the BCC model’s maintained assumption of a concave production function, while the shifts \({\upalpha }_{1},\, {\upalpha }_{2}> 0\) allow both increasing and decreasing returns to scale to prevail. The function shows increasing returns over the interval [10, A] and decreasing returns over the interval [A, 20], where \(\hbox {A}=5/(1-{\beta }_{i})\), for the input \(x_{i}\) in a section of the production function obtained by fixing the level of the other input \(x_{3-i}\), \(i=1,2\).

The second production technology we consider here follows Banker and Natarajan (2004). It is a polynomial function in the form of

$$\begin{aligned} y = \beta _{0}+ \beta _{1} x + \beta _{2} x^{2} + \beta _{3} x^{3} \end{aligned}$$
(4)

The input variable x is generated from the uniform distribution over the interval [1, 4]. The coefficients \(\beta _{0},\, \beta _{1},\, \beta _{2},\beta _{3}\) determine the properties of the production technology; we use \(\beta _{0}= -37,\, \beta _{1} = 48,\, \beta _{2}= -12,\beta _{3}= 1\). These choices guarantee that the production function is monotonically increasing and concave on \(x\in [1,4]\), since its first derivative \(48 - 24x + 3x^{2} = 3(x-4)^{2}\) is non-negative and its second derivative \(6(x-4)\) is non-positive on that interval.
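A quick numerical check of these properties, together with the mean frontier output for \(x \sim U[1,4]\), is sketched below (function names are ours). The exact mean, 20.25, agrees with the value E(z) = 20.2 used for the polynomial noise scale in Sect. 4.

```python
def f_poly(x: float) -> float:
    """Polynomial frontier of Eq. (4) with (beta0, ..., beta3) = (-37, 48, -12, 1)."""
    return -37 + 48 * x - 12 * x ** 2 + x ** 3


def f_poly_prime(x: float) -> float:
    """First derivative, equal to 3 (x - 4)^2 >= 0."""
    return 48 - 24 * x + 3 * x ** 2


def f_poly_second(x: float) -> float:
    """Second derivative, equal to 6 (x - 4) <= 0 for x <= 4."""
    return -24 + 6 * x


def mean_frontier_output(a: float = 1.0, b: float = 4.0) -> float:
    """Exact E[f(x)] for x ~ U[a, b], using the antiderivative of f."""
    def antiderivative(t: float) -> float:
        return -37 * t + 24 * t ** 2 - 4 * t ** 3 + t ** 4 / 4
    return (antiderivative(b) - antiderivative(a)) / (b - a)


# Check monotonicity and concavity on a grid over [1, 4].
grid = [1 + 3 * i / 300 for i in range(301)]
assert all(f_poly_prime(t) >= 0 and f_poly_second(t) <= 0 for t in grid)
```

The frontier runs from f(1) = 0 to f(4) = 27 and flattens out as x approaches 4, where both derivatives vanish.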

3.1.3 Inefficiency distribution

We generated the logarithm of the inefficiency, \(\hbox {u}_\mathrm{k}=\hbox {ln}\uptheta _\mathrm{k}\), for each observation \(\hbox {k}\in \{1,{\ldots },\hbox {N}\}\) from a half-normal distribution \(\left| {\hbox {N}(0,\,{\upsigma }_\mathrm{u}^2)} \right| \), where the parameter \({\upsigma }_\mathrm{u}^2\) itself is drawn from a uniform distribution on the interval [0, 0.1989]. The range of values for \({\upsigma }_\mathrm{u}^2 \) is chosen such that the mean efficiency \(\hbox {E}(\uppsi )=\hbox {E}(\hbox {e}^\mathrm{-u})\), approximated by \(\exp (-\hbox {E}(\hbox {u}))=\exp (-{\upsigma }_\mathrm{u} \sqrt{2/\pi })\), is between 0.7 and 1.0.
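Since \(\hbox{E}(u) = \sigma_u\sqrt{2/\pi}\) for a half-normal variable, the expression \(\exp(-\sigma_u\sqrt{2/\pi})\) equals \(\exp(-\hbox{E}(u))\), which by Jensen's inequality is a lower bound on \(\hbox{E}(e^{-u})\); the exact half-normal mean is \(2e^{\sigma_u^2/2}(1-\Phi(\sigma_u))\). The sketch below (function names ours) computes both, using only the standard `math` module.

```python
import math


def approx_mean_efficiency(sigma_u2: float) -> float:
    """exp(-E[u]) for u ~ |N(0, sigma_u^2)|, the expression used in the text."""
    return math.exp(-math.sqrt(sigma_u2) * math.sqrt(2 / math.pi))


def exact_mean_efficiency(sigma_u2: float) -> float:
    """E[exp(-u)] = 2 exp(sigma^2 / 2) (1 - Phi(sigma)) for u ~ |N(0, sigma^2)|."""
    s = math.sqrt(sigma_u2)
    upper_tail = 0.5 * math.erfc(s / math.sqrt(2))  # 1 - Phi(s)
    return 2 * math.exp(sigma_u2 / 2) * upper_tail
```

At the upper end of the range, \(\sigma_u^2 = 0.1989\), the approximation gives about 0.70 while the exact mean is about 0.72, so the lower end of the stated 0.7 to 1.0 range holds for the approximation rather than exactly.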

3.1.4 Simulated observations for Cobb–Douglas production function

For each experiment, we first randomly generated a value for N between 51 and 150, values for \({\upbeta }_{1}\) and \({\upbeta }_{2}\) between 0.4 and 0.5, and a value for \({\upsigma }_\mathrm{u}^2\) between 0 and 0.1989. Next, we simulated N observations of the two inputs \(\hbox {x}_{1}\) and \(\hbox {x}_{2}\) between 10 and 20. These values \((\hbox {x}_\mathrm{1k},\, \hbox {x}_\mathrm{2k})\), k = 1, ..., N, were then substituted into the efficient production function specified in Eq. (3) to obtain the corresponding values \(\hbox {z}_\mathrm{k} = \hbox {f}(\hbox {x}_\mathrm{1k},\, \hbox {x}_\mathrm{2k})\) for the efficient output quantity. Then, we randomly generated the logarithm of “true” inefficiency values \(\hbox {u}_\mathrm{k}=\hbox {ln}{\uptheta }_\mathrm{k}\) for each observation \(\hbox {k}\in \{1, {\ldots }, \hbox {N}\}\) from the half-normal distribution \(\left| {\hbox {N}(0,\,{\upsigma }_\mathrm{u}^2)}\right| \). Finally, we obtained the values for “observed” output quantities \(\hbox {y}_\mathrm{k}\) and the “true” efficiency values \({\uppsi }_\mathrm{k}\) as:

$$\begin{aligned}&\hbox {y}_\mathrm{k}= \hbox {f}(\hbox {x}_\mathrm{k}) / \hbox {exp}(\hbox {u}_\mathrm{k})\end{aligned}$$
(5)
$$\begin{aligned}&{\uppsi }_\mathrm{k} =1/\hbox {exp}(\hbox {u}_\mathrm{k}). \end{aligned}$$
(6)

Thus, each observation k comprises its “observed” output and input values \((\hbox {y}_\mathrm{k};\, \hbox {x}_{1\mathrm k},\, \hbox {x}_{2\mathrm k})\), and each sample consists of N such observations.
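One experiment of the Cobb–Douglas data generating process can be sketched with Python's standard `random` module. The function name and the tuple layout are our own choices; the seed passed in by the caller is arbitrary.

```python
import math
import random
from typing import List, Tuple


def simulate_cobb_douglas(rng: random.Random) -> List[Tuple[float, float, float, float]]:
    """One experiment of Sect. 3.1.4: returns (y_k, x_1k, x_2k, psi_k) tuples."""
    n = rng.randint(51, 150)                   # sample size, both ends inclusive
    b1, b2 = rng.uniform(0.4, 0.5), rng.uniform(0.4, 0.5)
    sigma_u = math.sqrt(rng.uniform(0.0, 0.1989))
    sample = []
    for _ in range(n):
        x1, x2 = rng.uniform(10, 20), rng.uniform(10, 20)
        z = (x1 - 5) ** b1 * (x2 - 5) ** b2    # Eq. (3) with alpha_1 = alpha_2 = 5
        u = abs(rng.gauss(0.0, sigma_u))       # half-normal log-inefficiency u_k
        y = z / math.exp(u)                    # Eq. (5), observed output
        psi = 1.0 / math.exp(u)                # Eq. (6), true efficiency
        sample.append((y, x1, x2, psi))
    return sample
```

Because \(x_i - 5 \le 15\) and \(\beta_1 + \beta_2 \le 1\), every simulated output is bounded above by 15.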

3.1.5 Simulated observations for the polynomial production function

Similarly, for each experiment we first randomly generated a value for N between 51 and 150 and a value for \({\upsigma }_\mathrm{u}^2\) between 0 and 0.1989. Next, we simulated N observations of the input x between 1 and 4. These values \(\hbox {x}_\mathrm{k},\, \hbox {k}= 1,\, {\ldots },\hbox {N}\), were then substituted into the efficient production function specified in Eq. (4) to obtain the corresponding values \(\hbox {z}_\mathrm{k} = \hbox {f}(\hbox {x}_\mathrm{k})\) for the efficient output quantity. Then, we randomly generated the “true” (additive) inefficiency values \(\hbox {u}_\mathrm{k}={\uptheta }_\mathrm{k}\) for each observation \(\hbox {k}\in \{1, {\ldots },\hbox {N}\}\) from the half-normal distribution \(\left| {\hbox {N}(0,\,{\upsigma }_\mathrm{u}^2)} \right| \). Finally, we obtained the values for the “observed” output quantities \(\hbox {y}_\mathrm{k}\) as:

$$\begin{aligned} \hbox {y}_\mathrm{k}= \hbox {f}(\hbox {x}_\mathrm{k}) - \hbox {u}_\mathrm{k} \end{aligned}$$
(7)

Thus, each observation k comprises its “observed” output and input values \((\hbox {y}_\mathrm{k};\, \hbox {x}_\mathrm{k})\), and each sample consists of N such observations.
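The polynomial experiment can be sketched the same way (function names ours). Note that f(1) = 0, so observed outputs close to x = 1 can be zero or slightly negative under the additive inefficiency term.

```python
import math
import random
from typing import List, Tuple


def polynomial_frontier(x: float) -> float:
    """Efficient output of Eq. (4) with (beta0, ..., beta3) = (-37, 48, -12, 1)."""
    return -37 + 48 * x - 12 * x ** 2 + x ** 3


def simulate_polynomial(rng: random.Random) -> List[Tuple[float, float]]:
    """One experiment of Sect. 3.1.5: returns (y_k, x_k) pairs."""
    n = rng.randint(51, 150)
    sigma_u = math.sqrt(rng.uniform(0.0, 0.1989))
    sample = []
    for _ in range(n):
        x = rng.uniform(1, 4)
        u = abs(rng.gauss(0.0, sigma_u))                 # additive half-normal inefficiency
        sample.append((polynomial_frontier(x) - u, x))   # Eq. (7): y_k = f(x_k) - u_k
    return sample
```

Unlike the Cobb–Douglas case, inefficiency here enters additively, so the same u_k costs the same absolute output at every input level.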

3.2 Regions

We hypothesize that efficiency rankings derived from super-efficiency procedures are more sensitive to those units on the corners than those in the middle. To test this hypothesis, we separate the DMUs into three regions based on the values of inputs/outputs: 15 % to the left as region 1, 70 % in the middle as region 2 and 15 % to the right as region 3.

Specifically, we divide the DMUs generated from the Cobb–Douglas production function (see Eq. 5) based on the value of \(\hbox {y}_\mathrm{k}\). The 15 % of DMUs with the smallest \(\hbox {y}_\mathrm{k}\) are assigned to region 1, the 15 % with the largest \(\hbox {y}_\mathrm{k}\) are assigned to region 3, and the remaining middle 70 % are assigned to region 2. This roughly corresponds to \(\hbox {y}_\mathrm{k} \le 5\), \(5 <\hbox {y}_\mathrm{k} \le 8\) and \(8<\hbox {y}_\mathrm{k}\) for regions 1, 2 and 3, respectively.

For the polynomial function, we separate the three regions according to \(\hbox {y}_\mathrm{k}\) specified in Eq. (7) in a similar manner: the 15 % of DMUs with the smallest \(\hbox {y}_\mathrm{k}\) are assigned to region 1, the 15 % with the largest \(\hbox {y}_\mathrm{k}\) are assigned to region 3, and the remaining middle 70 % are assigned to region 2. This roughly corresponds to \(\hbox {y}_\mathrm{k} \le 11.2\), \(11.2 <\hbox {y}_\mathrm{k} \le 25.2\) and \(25.2 <\hbox {y}_\mathrm{k}\) for regions 1, 2 and 3, respectively.
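The 15/70/15 split by observed output can be sketched as follows. The function name is ours, and because the text does not say how 15 % of N is rounded, the use of Python's `round` is an assumption; ties in y are broken by sort order.

```python
from typing import List, Sequence


def assign_regions(y: Sequence[float], tail: float = 0.15) -> List[int]:
    """Label each DMU 1 (smallest y), 2 (middle) or 3 (largest y)."""
    n = len(y)
    m = round(tail * n)                          # units per corner region (rounding assumed)
    order = sorted(range(n), key=lambda k: y[k])  # indices from smallest to largest y
    regions = [2] * n
    for k in order[:m]:
        regions[k] = 1
    for k in order[n - m:]:
        regions[k] = 3
    return regions
```

For N = 100 this yields 15, 70 and 15 units; for N = 51 each corner region holds 8 units.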

3.3 Simulation results

In each experiment, we solve six linear programs for each of the k = 1, ..., N observations in the sample, to estimate efficiencies under the six models considered in this paper: BCC, CCR, NIRS, Super-BCC, Super-CCR and Super-NIRS.

Table 1 Means of Pearson correlation coefficients between the true and estimated efficiencies

In Table 1, we report the means of the Pearson correlations between the true efficiency value \({\uppsi }_\mathrm{k}\) and the estimated efficiency \(\hat{{\psi }}_k\). Panels A and B of Table 1 show that the AP procedure’s performance in ranking efficient units is not at all satisfactory, consistent with the findings of Banker and Chang (2006). One claimed advantage of the AP procedure is that it can distinguish among efficient observations; our results in Panel A do not support this claim. Panel A shows that the correlations computed only for efficient points are very low, and many are even negative, across all six models considered. Panel B shows that the mean correlations based on all observations for the AP procedures (super-BCC, super-CCR and super-NIRS) are all lower than those of their counterparts (BCC, CCR and NIRS) for both production functions (Cobb–Douglas and polynomial). In addition, while CCR and super-CCR seem appropriate for the Cobb–Douglas production function, they are not satisfactory for the polynomial function. This may be because the concavity of the polynomial function violates the constant returns to scale assumption of CCR.
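The correlation restricted to efficient points can be computed as sketched below (helper names ours). The sketch identifies efficiency under the conventional model by a super-efficiency score of at least one, as stated in Sect. 2.1, and skips units whose super-efficiency program is infeasible, a choice the text does not specify.

```python
import math
from typing import Optional, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_among_efficient(true_psi: Sequence[float],
                                super_psi: Sequence[Optional[float]]) -> Optional[float]:
    """Correlation over units with super-efficiency >= 1 (efficient under the
    conventional model); infeasible scores (None) are skipped."""
    pairs = [(t, s) for t, s in zip(true_psi, super_psi) if s is not None and s >= 1.0]
    if len(pairs) < 2:
        return None  # correlation undefined with fewer than two points
    t_vals, s_vals = zip(*pairs)
    return pearson(t_vals, s_vals)
```

Averaging these per-experiment values over the 500 experiments gives means of the kind reported in Panel A.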

Table 2 Means of Pearson correlation coefficients between the true and estimated efficiencies when data are separated into regions

What might explain these findings? We hypothesize that the performance of the AP procedure may vary across regions of observations. Table 2 summarizes the comparative results across the three regions of DMUs.Footnote 5 Panel A shows that the AP procedure does not rank efficient units well in any of the three regions, with region 1 being the worst, followed by region 2 and then region 3. The results in Panels B and C of Table 2 are based on all observations. These two panels show that both the AP procedure (Panel B) and the conventional DEA models (Panel C) perform worst in region 1, although the problem is more pronounced for the AP procedure. In general, the results for regions 2 and 3 are more satisfactory for both the AP procedure and the conventional DEA models. Panel C results are consistently better than those of Panel B across regions, which further corroborates that the AP procedure does not improve the rankings of DMUs over the conventional DEA formulations (BCC, CCR and NIRS).

4 Outlier identification

The AP ranking procedure does not consider the potential impact of outliers on efficiency estimation. Outliers are a few extreme observations often caused by errors in measuring either the inputs or outputs. Since extreme observations determine the production frontier in DEA models, the estimation of the frontier may be sensitive to measurement errors in the sample data. If an observation has been contaminated with noise that increases the observed output value or decreases the observed input values such that it gets rated as efficient, then it may also enter the reference set of other observations and distort their estimated efficiency scores. Such outliers may be influential in the estimation results obtained using a conventional DEA model. It is desirable, therefore, to consider a procedure that allows us to identify and remove such outliers.

Banker and Gifford’s (1988) procedure for identifying outliers generalizes Timmer’s (1971) procedure. Timmer suggests discarding a certain percentage of efficient observations from the sample and re-estimating the production frontier using the remaining observations; in other words, a certain proportion of efficient observations are classified as outliers and eliminated before the efficiency of the remaining observations is re-estimated. BG’s procedure differs from Timmer’s in that they suggest using a screen based on the super-efficiency score to identify the observations that are more likely to be contaminated with noise. Rather than throwing out an arbitrary set of efficient observations, BG suggest eliminating only those observations with super-efficiency scores higher than a pre-selected screen. If an efficient observation is an outlier that has been contaminated with noise, it is more likely to have an output (or input) level much greater (smaller) than that of other observations with similar input (output) levels, and is therefore more likely to have a super-efficiency score much greater than one. This is the motivation underlying the BG procedure for outlier identification (see Banker et al. 1989, pp. 279–280).

4.1 Data generating process

To evaluate the BG procedure for outlier identification, we extended the data generating process described earlier for the simulation experiments in the previous section. We considered two additional factors: probability that data are contaminated and the distribution of the random noise for such contaminated observations.

4.1.1 Probability of contaminated observations

The probability of an observation being contaminated with random noise was specified to be \(\uprho \). We generated \(\uprho \) randomly from a uniform probability distribution over the interval [0, 0.1]. In other words, the probability that an observation is contaminated with random noise ranges from 0 to 10 %.

4.1.2 Random noise distribution

Noise distribution for the Cobb–Douglas function Conditional on an observation being contaminated, we specified a two-sided random noise distribution. We generated the logarithm of the random noise \(\hbox {v}_\mathrm{k}\) for each observation k = 1, ..., N from a normal distribution \(\hbox {N}(0,\, {\upsigma }_\mathrm{v}^{2})\), where \({\upsigma }_\mathrm{v} = \hbox {d}\,\hbox {E}(\hbox {z})\). For each experiment, the parameter d was generated randomly from a uniform probability distribution over the interval [0, 1], and \(\hbox {E(z)}=\frac{(15^{1+{\upbeta }_1 }-5^{1+{\upbeta } _1 })\,(15^{1+\beta _2 }-5^{1+\beta _2 })}{100(1+\upbeta _{1} )(1+{\upbeta }_2 )}\) is the mean of the efficient output quantity.
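Because \(x_1\) and \(x_2\) are independent and \(x_i - 5\) is uniform on [5, 15], E(z) factorizes into two one-dimensional expectations, each equal to \((15^{1+\beta}-5^{1+\beta})/(10(1+\beta))\); multiplying them gives the closed form above. The sketch below (names ours) cross-checks the closed form with a midpoint-rule integral.

```python
def expected_shifted_power(beta: float) -> float:
    """E[(x - 5)^beta] for x ~ U[10, 20]."""
    return (15 ** (1 + beta) - 5 ** (1 + beta)) / (10 * (1 + beta))


def expected_cd_output(beta1: float, beta2: float) -> float:
    """E(z) for Eq. (3); independence of the inputs lets the expectation factorize."""
    return expected_shifted_power(beta1) * expected_shifted_power(beta2)


def midpoint_mean(beta: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of E[(x - 5)^beta] over x in [10, 20]."""
    h = 10 / steps
    return sum((10 + (i + 0.5) * h - 5) ** beta for i in range(steps)) / steps
```

The noise scale for an experiment is then d times this mean, for the drawn d.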

Although we consider a two-sided random noise distribution, only those outliers that lie above the frontier are likely to affect the efficiency estimation of other observations by entering their reference set. Outliers with negative errors are likely to lie inside the frontier and have no impact on the efficiency estimation of other observations.

Noise distribution for the polynomial function We generated the random noise \(\hbox {v}_\mathrm{k}\) for each observation \(\hbox {k}=1,{\ldots },\hbox {N}\) from a half-normal distribution \({\vert }\hbox {N}(0, {\upsigma }_\mathrm{v}^{2}){\vert }\), where \({\upsigma }_\mathrm{v} = (\hbox {d}/2) \,\hbox {E(z)}\), the parameter d was generated randomly from a uniform probability distribution over the interval [0, 1] for each experiment, and E(z) = 20.2 is the mean of the efficient output quantity.

4.1.3 Simulated observations

Simulated observations for the Cobb–Douglas function For each experiment, we first generated values of N, \({\upbeta }_{1}, \, {\upbeta }_{2}\) and \({\upsigma }_\mathrm{u}^2\) as described in the previous section. In addition, we simulated values of \(\uprho \), the probability of being contaminated with random noise, from a uniform distribution over [0, 0.1], and d from a uniform distribution over [0, 1] to specify the parameter of the random noise distribution as \({\upsigma }_\mathrm{v} =\hbox {d}\,\hbox {E(z)}\). Next, for each observation k in the sample for an experiment, we randomly generated the values of the input quantities \(\hbox {x}_\mathrm{1k}\) and \(\hbox {x}_\mathrm{2k}\), and inefficiency \({\uptheta }_\mathrm{k}= \hbox {exp}(\hbox {u}_\mathrm{k})\), as described in the previous section. Further, for each observation k, we also generated an index variable, q, from a uniform distribution over the interval [0, 1] and random noise \(\hbox {v}_\mathrm{k}\) from the normal distribution \(\hbox {N}(0,{\upsigma }_\mathrm{v}^2 )\). Finally, we obtained the values for the observed output quantities \(\hbox {y}_\mathrm{k}\) as:

$$\begin{aligned} \begin{array}{lll} \hbox {either}&{} \hbox {y}_\mathrm{k}=\hbox {f}(\hbox {x}_\mathrm{1k},\, \hbox {x}_\mathrm{2k}) \cdot \hbox {exp}(\hbox {v}_\mathrm{k}) / \hbox {exp}(\hbox {u}_\mathrm{k})&{} \quad \hbox {if}\, \hbox {q} \le \uprho \\ \hbox {or}&{} \hbox {y}_\mathrm{k}=\hbox {f}(\hbox {x}_\mathrm{1k},\, \hbox {x}_\mathrm{2k})/ \hbox {exp} (\hbox {u}_\mathrm{k})&{}\quad \hbox {if}\, \hbox {q}> \uprho . \end{array} \end{aligned}$$
(8)
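The contamination mechanism in (8) for a single observation can be sketched as follows (function name ours). It draws \(v_k\) from the two-sided normal specified for the Cobb–Douglas case, and the caller is assumed to supply the efficient output z, the log-inefficiency u and the noise scale \(\sigma_v = d\,\hbox{E}(z)\).

```python
import math
import random
from typing import Tuple


def observed_output_cd(z: float, u: float, rho: float, sigma_v: float,
                       rng: random.Random) -> Tuple[float, bool]:
    """Eq. (8) for one observation: returns (y_k, whether it was contaminated).

    z is the efficient output f(x_1k, x_2k), u the log-inefficiency u_k,
    rho the contamination probability and sigma_v the noise scale.
    """
    q = rng.random()                      # index variable q ~ U[0, 1)
    if q <= rho:
        v = rng.gauss(0.0, sigma_v)       # two-sided log-noise v_k ~ N(0, sigma_v^2)
        return z * math.exp(v) / math.exp(u), True
    return z / math.exp(u), False
```

Returning the contamination flag alongside y_k makes it easy to check afterwards which observations the BG screen actually removed.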

Simulated observations for the polynomial function For each experiment, we first generated values of N and \({\upsigma }_\mathrm{u}^2\) as described in the previous section, and values of \(\uprho \) and d as described above. Next, for each observation k in the sample for an experiment, we randomly generated the value of the input quantity \(\hbox {x}_\mathrm{k}\) and inefficiency \({\uptheta }_\mathrm{k}= \hbox {u}_\mathrm{k}\) as described in the previous section. Further, for each observation k, we also generated an index variable, q, from a uniform distribution over the interval [0, 1] and random noise \(\hbox {v}_\mathrm{k}\) from the half-normal distribution \(\left| {\hbox {N}(0,{\upsigma }_\mathrm{v}^2 )} \right| \). Since DEA does not allow non-positive values, we added a constant, 10, to \(\hbox {y}_\mathrm{k}\) to ensure that all values are positive. Finally, we obtained the values for the observed output quantities \(\hbox {y}_\mathrm{k}\) as:

$$\begin{aligned} \begin{array}{ll@{\quad }l} \hbox {either} &{}\hbox {y}_\mathrm{k}=\hbox {f}(\hbox {x}_\mathrm{k}) - \hbox {u}_\mathrm{k} - \hbox {v}_\mathrm{k}+ 10&{} \hbox {if}\ \hbox {q} \le \uprho \\ \hbox {or}&{} \hbox {y}_\mathrm{k}=\hbox {f}(\hbox {x}_\mathrm{k}) - \hbox {u}_\mathrm{k} + 10&{} \hbox {if}\ \hbox {q} > \uprho . \end{array} \end{aligned}$$
(9)
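A corresponding sketch for the polynomial case follows (function name ours). Following the statement that the constant 10 is added to \(y_k\) to keep all values positive, the sketch applies it to every observed output; that choice, and the `shift` parameter name, are our assumptions. The half-normal noise is subtracted as written in (9).

```python
import random
from typing import Tuple


def observed_output_poly(fx: float, u: float, rho: float, sigma_v: float,
                         rng: random.Random, shift: float = 10.0) -> Tuple[float, bool]:
    """Eq. (9) sketch: returns (y_k, whether it was contaminated).

    fx is the efficient output f(x_k), u the additive inefficiency u_k,
    rho the contamination probability and sigma_v = (d / 2) * E(z).
    """
    y = fx - u                                   # frontier output less inefficiency
    contaminated = rng.random() <= rho           # index variable q compared with rho
    if contaminated:
        y -= abs(rng.gauss(0.0, sigma_v))        # half-normal noise v_k, subtracted
    return y + shift, contaminated               # assumption: shift applied to every y_k
```

Shifting every observation by the same constant keeps the relative positions of all units intact, which is why the assumption does not favour contaminated or clean observations.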

4.2 Screens for outlier identification

To evaluate the performance of the BG procedure, we considered the screen level 1.0 for outlier identification as suggested by Banker and Gifford (1988). The screen level of 1.0 implies the elimination of all observations rated as super-efficient in the BG super-efficiency model. In the first stage, we identify and eliminate outliers using the pre-selected screen level and then in the second stage re-estimate the BCC, CCR and NIRS models with the remaining observations. We refer to the second-stage efficiency estimates as the BG-SE estimates.
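The two-stage BG procedure can be sketched for one input, one output and the BCC model as follows. All names are ours, and because the text does not say how units with an infeasible super-BCC program are treated, retaining them is an assumption. Stage 2 returns BCC efficiencies only for the retained units, keyed by their original indices.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def _max_convex_output(ref: List[Tuple[float, float]], xk: float) -> Optional[float]:
    """Largest output of a convex combination of ref points that uses input <= xk."""
    best: Optional[float] = None
    for xj, yj in ref:                      # single-point vertices
        if xj <= xk:
            best = yj if best is None else max(best, yj)
    for xi, yi in ref:                      # two-point vertices, input constraint binding
        for xj, yj in ref:
            if xi < xk < xj:
                w = (xj - xk) / (xj - xi)
                cand = w * yi + (1 - w) * yj
                best = cand if best is None else max(best, cand)
    return best


def bcc_score(x: Sequence[float], y: Sequence[float], k: int,
              exclude_self: bool) -> Optional[float]:
    """Output-oriented BCC score of unit k (super-BCC if exclude_self is True)."""
    ref = [(x[j], y[j]) for j in range(len(x)) if not (exclude_self and j == k)]
    best = _max_convex_output(ref, x[k])
    return None if best is None else y[k] / best


def bg_two_stage(x: Sequence[float], y: Sequence[float],
                 screen: float = 1.0) -> Dict[int, Optional[float]]:
    """Stage 1: drop units whose super-BCC score exceeds the screen.
    Stage 2: re-estimate the conventional BCC model on the retained units."""
    keep = []
    for k in range(len(x)):
        s = bcc_score(x, y, k, exclude_self=True)
        # Assumption: units with an infeasible super-BCC program are retained.
        if s is None or s <= screen:
            keep.append(k)
    xs = [x[k] for k in keep]
    ys = [y[k] for k in keep]
    return {k: bcc_score(xs, ys, i, exclude_self=False) for i, k in enumerate(keep)}
```

For example, with x = (1, 2, 3, 4, 5, 4.5) and y = (1, 2, 6, 3.5, 4, 3), unit 2 has a super-BCC score of about 2.18 and is screened out; re-estimation then raises unit 5's BCC score from 0.5 (when the outlier sets its reference point) to 0.8.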

4.3 Simulation results: average performance

Table 3 Correlation coefficients between the true efficiencies and BG estimated efficiencies based on super-efficiency model when data are contaminated, by Rho and screen levels

Table 3 reports the mean correlations between the true efficiency values and the BG efficiency estimates. For comparison, we also report the mean correlations between the true efficiency scores and the efficiency estimates from the BCC, CCR and NIRS models. The main finding is that the BG procedure is effective in identifying outliers for both production functions and all three DEA models. On average, the results in Table 3 indicate that the BG procedures using screens outperform all of the initial BCC, CCR and NIRS estimators. For example, with a screen of 1.0, the BG super-efficiency procedure improves over the BCC model by about 17 % (= (0.635\(-\)0.543)/0.543) for the Cobb–Douglas function, averaged across noise levels. Interestingly, the BG procedures improve considerably on the conventional DEA models even for very small probabilities of contamination. Figure 1 plots the BG procedure results in Table 3, with the Cobb–Douglas results in the left chart and the polynomial results in the right one. As the noise level increases, the performance of the BG procedures deteriorates for all three DEA models we consider. In addition, the BG procedure for the CCR model does not perform well for the polynomial function; as mentioned earlier, this may be because the constant returns to scale assumption of CCR is violated under the polynomial function.

Fig. 1

Pearson correlation comparison as a function of noise rate rho under Cobb–Douglas (left) and Polynomial (right) production functions

Table 4 Means of correlation coefficients between the true efficiencies and BG estimated efficiencies when data are not contaminated

To further assess how well the BG procedure performs in efficiency estimation when the data are not contaminated with random noise, we report in Table 4 the correlations between the true efficiency values and the BG estimates, as well as those between the true efficiency values and the BCC/CCR/NIRS estimates. In general, the results in Table 4 confirm the intuition that the BG estimation procedure may not be as effective in this case. Out of the six scenarios in Table 4, only two (Super-BCC and Super-NIRS for the polynomial function) outperform their counterparts; the remaining four perform worse. This implies that removing observations with the BG procedure is likely to worsen, rather than improve, estimation performance when there is no noise contamination. Under such circumstances, eliminating observations identified as outliers by the BG procedure carries a cost.

Table 5 Correlation coefficients between the true efficiencies and BG estimated efficiencies based on super-efficiency model when data are contaminated, by Rho, Screen levels and regions

To shed light on the performance of the BG procedures in different regions, Table 5 details the correlations of the different DEA procedures by region, noise level and screen level. There are three main findings. First, the performance of all procedures deteriorates in all regions as the noise level increases. Second, for the Cobb–Douglas function, the BG procedure shows clear improvement over the counterpart DEA model in each of the three regions. For example, in region 2, the BG procedure applied to the BCC model (referred to as super-BCC in Table 5) outperforms the BCC model by about 22 % (= (0.84\(-\)0.69)/0.69). However, for the polynomial function, the BG procedure results do not improve as markedly over their counterparts when the DMUs are separated into three regions. Third, and most interestingly, as the noise level increases for the Cobb–Douglas function, only the efficiencies of DMUs in the middle (region 2) are estimated well; those in the two corner regions (regions 1 and 3) are not estimated satisfactorily. This finding is consistent across all six procedures (BCC, CCR, NIRS, Super-BCC, Super-CCR and Super-NIRS). We believe this is because positive noise in the middle region tends to be muted by the greater likelihood of finding spanning observations when constructing a convex combination of reference points, which is less likely in the corners.

5 Conclusion

In this paper, we have conducted simulation experiments to evaluate the performance of the Banker and Gifford (1988) super-efficiency model when it is used for ranking efficient units and when it is used for outlier identification. We find that Andersen and Petersen’s (1993) procedure using the super-efficiency model for ranking efficient observations does not perform satisfactorily. Most importantly, we document that this poor performance is particularly acute for observations with smaller input and output values, and that performance is worse for the super-CCR model. In contrast, the evidence supports the use of the super-efficiency-based procedure of Banker and Gifford (1988) and Banker et al. (1989) to identify outliers. The performance of the BG outlier screening procedure also deteriorates with the level of noise, and rather precipitously when the noise level rho exceeds 0.05.

Our study has the following important implications. From an academic perspective, there is an urgent need for future research to propose alternative methods, or to extend the AP procedure, to improve the ranking of efficient units. From a practical standpoint, given that the BG procedure outperforms other conventional DEA procedures only when the data are contaminated with a high level of random noise, it is essential for decision makers to evaluate the extent of their data contamination before using the BG procedure to remove outliers. However, a caveat is in order: the results obtained from this study may not apply directly to the multiple-output, multiple-input case without additional Monte Carlo simulation experiments.