A central piece of folk wisdom in efficiency analysis is that Data Envelopment Analysis (DEA) models are not amenable to differential arguments. This belief arises because DEA technologies are conservative approximations derived as appropriate hulls of observed input and output data. Those “hulling” operations inevitably introduce “kinks” into the resulting frontiers of the enveloped data. These kinks, in turn, are typically inherited by function representations of the DEA technology.

This lack of smoothness has limited the adoption of DEA and closely related methods by economists and others not specializing in efficiency analysis. In and of itself, however, it is not especially important, because it typically occurs only on sets of measure zero. Problems arise for efficiency analysis because those sets of measure zero frequently correspond to “extreme” efficient points for the conservative approximation to the technology. These are precisely the points on which efficiency analysts frequently focus. And, as is well known, “kinks” in primal (quantity) space map into “flats” in dual (price) space (McFadden 1978). Thus, the most familiar manifestation of this lack of differentiability is the lack of unique multipliers (shadow prices) for “extreme” efficient units. This multiplicity of multipliers for efficient units has led some to suggest that specific criteria should be developed for selecting appropriate multipliers from among the many contained in the dual “flat” corresponding to the primal “kink” associated with “extreme” efficient units (Färe and Korhonen 2004; Cooper et al. 2007).

This paper demonstrates that the desired set of criteria is straightforward. First, however, it shows how generalized differential arguments can be applied to DEA representations of technologies. Because one cannot always think in terms of the usual notion of a derivative, a different differential concept must be used. That concept is the directional derivative. The convexity properties of DEA technologies ensure that directional derivatives almost always exist (even at the traditionally nondifferentiable points associated with primal “kinks”). And, once properly understood, directional derivatives can be used in much the same fashion, with almost exactly the same intuition, as more familiar differential arguments. As we show below, these derivatives also offer a unique and economically compelling solution to the shadow pricing problem associated with DEA technologies.

In what follows we first introduce our approach by way of a simple example that captures its essence. Then we introduce a general representation of the technology and the differential concept that we use. We show how that differential concept always yields computable and unique shadow prices (subject to an obvious normalization). Then we move on to the special case of DEA technologies and show how our concepts resolve some of the common problems that arise from lack of differentiability there.

1 An example

Let x denote a single input and y denote a single output. Assume that there exists a single observation (x, y) = (1, 1). The DEA technology that satisfies nonincreasing returns to scale and which is consistent with this observation is

$$ T=\left\{ \left(x,y\right) : z\cdot 1\geq y,\; z\cdot 1\leq x,\; 0\leq z\leq 1\right\}, $$

where z is the intensity variable. This technology is illustrated in Fig. 1.

Fig. 1 Nondifferentiability and multiple shadow prices

The graph of this technology is smooth everywhere except at the points (0, 0) and (1, 1). Unfortunately, (1, 1) is often the most interesting point because it is “extreme” efficient. But at (1, 1), the graph of T has infinitely many supporting hyperplanes with slopes in the range [0, 1]. From a dual perspective, this is perhaps seen most clearly by recognizing that the constraints dual to \(\left(z\cdot 1\geq y,\, z\cdot 1\leq x\right)\) in the Charnes et al. (1978) formulation of the DEA problem require that for all \(\left(w,p\right)\in {\mathbb{R}}_{+}^{2}\)

$$ w-p\geq 0. $$

Each of these hyperplanes defines a potentially legitimate shadow price for the technology. Therein lies the problem usually encountered in DEA analyses.

Notice, however, that for appropriate movements, the relevant shadow prices associated with T are clear cut. For example, starting at (1, 1), what would a rational decisionmaker be willing to pay for an extra unit of x if T were the true technology? The answer is zero because such changes will not bring about any output growth. Conversely, if one asked how much a rational decisionmaker facing this T must be compensated for a one unit decrease in the input to keep him indifferent to being at (1, 1), the answer would be 1. In what follows, we show how to appropriately generalize these simplified concepts of willingness to pay and willingness to accept in terms of directional derivatives to make differential analysis of DEA technologies both easy and economically meaningful.
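Before formalizing these ideas, it may help to see the two numbers computed mechanically. The following is a minimal numerical sketch (assuming Python with scipy, which plays no role in the analysis itself): it evaluates an output-directional distance function for T by linear programming, so that shadow values of x are denominated in output units, and recovers the willingness to pay (0) and the willingness to accept (1) at (1, 1) as one-sided difference quotients. The function name `d_output` and the step `lam` are illustrative choices, not part of the original formulation.

```python
# Minimal sketch: shadow buying and selling prices of x at the kink (1, 1).
# D(x, y) = max{ b : y + b <= z, z <= x, 0 <= z <= 1 } is the output-directional
# distance function for T, so shadow values of x are in units of output.
from scipy.optimize import linprog

def d_output(x, y):
    # Variables (b, z); maximize b, i.e., minimize -b.
    res = linprog(c=[-1.0, 0.0],
                  A_ub=[[1.0, -1.0]], b_ub=[-y],              # y + b <= z
                  bounds=[(None, None), (0.0, min(x, 1.0))])  # 0 <= z <= min(x, 1)
    return -res.fun

lam = 1e-4  # one-sided step; exact here because the frontier is piecewise linear
wtp = (d_output(1.0 + lam, 1.0) - d_output(1.0, 1.0)) / lam   # buying price of x
wta = -(d_output(1.0 - lam, 1.0) - d_output(1.0, 1.0)) / lam  # selling price of x
print(wtp, wta)  # -> 0.0 1.0 (up to solver tolerance)
```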

2 Basics

We concentrate on input-based representations of the technology, while noting that our arguments extend directly to both output-based and graph-based representations of the technology with virtually no changes in the argument.Footnote 1 The technology is represented by a continuous input correspondence, \(V:{\mathbb{R}}_{+}^{M}\rightarrow 2^{{{\mathbb{R}}_{+}^{N}}},\) that maps points in output space, denoted by \(y\in {\mathbb{R}}_{+}^{M},\) into subsets of the input space, \({\mathbb{R}}_{+}^{N},\) and is defined as

$$ V\left(y\right)=\left\{x:x\; \hbox{can produce}\;y \right\}. $$

We assume that the image of the correspondence, V (y), is convex for all \(y\in{\mathbb{R}}_{+}^{M}\) and exhibits free disposability of inputs so that \(V\left(y\right)+{\mathbb{R}} _{+}^{N}\subset V\left(y\right)\) for all \(y\in{\mathbb{R}}_{+}^{M}.\) Footnote 2

In what follows, we rely heavily on directional derivatives. Because these are closely related geometrically to the concept of a directional distance function, we use directional distance functions as our function representations of the technology. The directional input distance function for \(g\in {\mathbb{R}}_{+}^{N}\backslash 0^{N}\) (results are symmetric for directional output distance functions) is defined by

$$ D\left(x, y, g\right)=\hbox{max}\left\{\beta :x-\beta g\in V\left(y\right) \right\} $$

if there exists β such that x−β g ∈ V(y) and −∞ otherwise. Given our assumptions on V(y), D(x, y, g) is nondecreasing and concave in x and satisfies the translation property (Chambers et al. 1996)

$$ D\left(x+\lambda g,y,g\right)=D\left(x,y,g\right)+\lambda, \quad \lambda \in {\mathbb{R}}, $$
(1)

and the representation property

$$ D\left(x,y,g\right) \geq 0\Leftrightarrow x\in V\left(y\right). $$
(2)

Because D(x, y, g) is concave in x (Rockafellar 1970, Theorem 23.1), its (one-sided) directional derivative

$$ D^{\prime}\left(x,y,g;x^{0}\right) =\lim_{\lambda \rightarrow 0^{+}}\left\{\frac{D\left(x+\lambda x^{0},y,g\right) -D\left(x,y,g\right)}{\lambda}\right\}, $$
(3)

is a superlinear (positively linearly homogeneous and concave) function of \(x^{0}\) with \(D^{\prime}\left(x,y,g;0\right)=0\) and

$$ -D^{\prime}\left(x,y,g;-x^{0}\right)\geq D^{\prime}\left(x, y, g;x^{0}\right). $$

By (1) and (3):

$$ \begin{aligned} D^{\prime}\left(x,y,g;g\right)&=\lim_{\lambda \rightarrow 0^{+}}\left\{ \frac{D\left(x+\lambda g,y,g\right) -D\left(x,y,g\right)}{\lambda}\right\}\\ &=\lim_{\lambda \rightarrow 0^{+}}\left\{\frac{D\left(x,y,g\right) +\lambda -D\left(x,y,g\right)}{\lambda}\right\}\\ &=1. \end{aligned} $$
(4)

Moreover,

$$ \begin{aligned} D^{\prime}\left(x+\beta g,y,g;x^{0}\right) &=\lim_{\lambda \rightarrow 0^{+}}\left\{\frac{D\left(x+\beta g+ \lambda x^{0},y,g\right) -D\left(x+\beta g,y,g\right)}{\lambda}\right\}\\ &=\lim_{\lambda \rightarrow 0^{+}}\left\{\frac{D\left(x+\lambda x^{0},y,g\right)+\beta -D\left(x,y,g\right) -\beta }{\lambda}\right\}\\ &=D^{\prime}\left(x,y,g;x^{0}\right), \end{aligned} $$

for all \(\beta \in {\mathbb{R}}.\) Thus, directional derivatives of directional distance functions are translation invariant in the direction defining the directional distance function.

The superdifferential of D in x, which we denote as ∂D(x, y, g), is defined as

$$ \partial D\left(x,y,g\right) =\left\{ v\in {\mathbb{R}}^{N}:D\left(x,y,g\right)+v^{\prime}\left(x^{0}-x\right) \geq D\left(x^{0},y,g\right) \hbox{for all }x^{0}\in {\mathbb{R}}^{N}\right\}. $$

By basic results (Rockafellar 1970, Theorems 23.3 and 23.4),

$$ \partial D\left(x,y,g\right) =\left\{ v:v^{\prime}x^{0}\geq D^{\prime}\left(x,y,g;x^{0}\right) \text{ for all }x^{0}\right\}, $$
(5)

or equivalently,

$$ D^{\prime}\left(x,y,g;x^{0}\right) =\inf \left\{v^{\prime}x^{0}:v\in \partial D\left(x,y,g\right) \right\}. $$
(6)

When \(D\left(x,y,g\right)\) is differentiable in x, \(\partial D\left(x,y,g\right)\) is the singleton set \(\left\{\nabla D\left(x,y,g\right)\right\}\), where \(\nabla D\left(x,y,g\right)\) denotes the gradient of \(D\left(x,y,g\right)\) in x. Conversely, when \(\partial D\left(x,y,g\right)\) is a singleton set, \(D\left(x,y,g\right)\) is differentiable in x (Rockafellar 1970). Therefore, when \(D\left(x,y,g\right)\) is differentiable in x, \(D^{\prime}\left(x,y,g;x^{0}\right)\) is the inner product of the gradient and \(x^{0}\), or more compactly

$$ D^{\prime}\left(x,y,g;x^{0}\right)=\nabla D\left(x,y,g\right)^{\prime}x^{0}. $$

When D is differentiable everywhere except on a set of Lebesgue measure zero, \(\partial D\left(x,y,g\right)\) is computable using \(\nabla D\left(x,y,g\right)\). Clarke (1983, Theorem 2.5.1) shows that if \(\Gamma \subset{\mathbb{R}}^{Q}\) is a set of (Lebesgue) measure zero such that \(D\left(x,y,g\right)\) is differentiable on its complement, \(\Gamma^{C}\subset{\mathbb{R}}^{Q},\) then

$$ \partial D\left(x,y,g\right) =co\left\{\lim_{i\rightarrow \infty}\nabla D\left(x_{i},y,g\right) :x_{i}\in \Gamma^{C},\,x_{i}\rightarrow x,\ \hbox{and }\nabla D\left(x_{i},y,g\right) \hbox{ converges}\right\}, $$
(7)

where co{} denotes the convex hull of the indicated set.
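A concrete instance may help fix ideas. For the concave, piecewise linear function \(f\left(x\right)=\min\left\{x_{1},x_{2}\right\}\) (one can check that the DEA distance function of the example in Sect. 3 takes exactly this form in x, net of y), the limiting gradients at any point with \(x_{1}=x_{2}\) are (1, 0) and (0, 1); (7) then gives the superdifferential as their convex hull, and (6) reduces the directional derivative to a minimum over those two vertices. The sketch below (assuming numpy; the helper `ddir` is an illustrative finite-difference stand-in for (3)) verifies the agreement numerically.

```python
# Toy check of (6)-(7) for f(x) = min{x1, x2}, concave with a kink on x1 == x2.
import numpy as np

def f(x):
    return np.min(x)

def ddir(x, d, lam=1e-7):
    # One-sided directional derivative f'(x; d) as in (3); exact here because
    # f is piecewise linear.
    return (f(x + lam * np.asarray(d, dtype=float)) - f(x)) / lam

grads = np.array([[1.0, 0.0], [0.0, 1.0]])  # limiting gradients; (7) hulls them
x = np.array([1.0, 1.0])                    # a kink point
for d in ([1, 0], [0, 1], [1, 1], [-1, 0]):
    # (6): f'(x; d) = inf{ v'd : v in co{grads} }, attained at a vertex.
    print(d, round(ddir(x, d), 6), min(grads @ d))
```

Note that the printed values already illustrate the superadditivity gap discussed below: \(f^{\prime}\left(x;\left(1,1\right)\right)=1\) exceeds \(f^{\prime}\left(x;\left(1,0\right)\right)+f^{\prime}\left(x;\left(0,1\right)\right)=0\).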

Using these results establishes:

Lemma 1

(Chambers et al. 2004; Chambers and Quiggin 2007) If \(v\in \partial D\left(x,y,g\right)\), then

$$ v^{\prime}g=1, $$

and \(v\in \partial D\left(x+\beta g,y,g\right)\) for all \(\beta \in{\mathbb{R}}.\)

Proof

By (4),

$$ D^{\prime}\left(x,y,g;g\right)=1, $$

while a symmetric argument establishes:

$$ D^{\prime}\left(x,y,g;-g\right) =-1. $$

Therefore, if \(v\in \partial D\left(x,y,g\right)\), applying (5) with \(x^{0}=g\) and \(x^{0}=-g\) gives

$$ 1\leq v^{\prime}g\leq 1, $$

which gives the first part. The second part now follows from the fact that \(D^{\prime}\left(x+\beta g,y,g;x^{0}\right) =D^{\prime}\left(x,y,g;x^{0}\right)\) for all \(\beta \in {\mathbb{R}}.\) \(\hfill\square\)

Lemma 1 establishes two facts with important economic implications. First, the inner product of any element of \(\partial D\left(x,y,g\right)\) and g must equal one. As we shall see below, economically this reflects the fact that \(\partial D\left(x,y,g\right)\) contains the shadow prices of the inputs normalized by the shadow value of the numeraire bundle, g. Second, superdifferentials of directional distance functions are invariant to translations of the input vector in the direction of g.

To see precisely how Lemma 1 relates to shadow pricing results, note first that given input prices \(w\in {\mathbb{R}}_{++}^{N},\) the cost function associated with V(y) is defined as

$$ c\left(w,y\right) =\hbox{min}\,\left\{w^{\prime}x:x\in V\left(y\right) \right\} $$

if V(y) is nonempty and ∞ otherwise.Footnote 3 So long as there exists an x such that x−βg ∈ V(y) for some β, Chambers et al. (1996) have shown that by the representation property (2)

$$ \begin{aligned} c\left(w,y\right)&=\hbox{min}_{x}\,\left\{w^{\prime}\left(x-D\left(x,y,g\right)g\right)\right\}\\ &=\hbox{min}_{x}\,\left\{w^{\prime}x-D\left(x,y,g\right)w^{\prime}g\right\}. \end{aligned} $$
(8)
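As a quick sanity check of (8), take the concrete correspondence \(V\left(y\right)=\left\{x\in{\mathbb{R}}_{+}^{2}:x\geq \left(y,y\right)\right\}\) with g = (1, 1), for which \(D\left(x,y,g\right)=\min\left\{x_{1},x_{2}\right\}-y\) and \(c\left(w,y\right)=\left(w_{1}+w_{2}\right)y\). The brute-force sketch below (assuming numpy; the grid search is an illustrative stand-in for the minimization over x) confirms that the minimand in (8) attains the cost function.

```python
# Sanity check of (8) for V(y) = {x : x >= (y, y)}, g = (1, 1), where
# D(x, y, g) = min(x1, x2) - y and c(w, y) = (w1 + w2) * y.
import numpy as np

w, y = np.array([1.0, 2.0]), 1.0
wg = w.sum()                              # w'g with g = (1, 1)
grid = np.linspace(0.0, 3.0, 301)
rhs = min(w[0] * a + w[1] * b - (min(a, b) - y) * wg
          for a in grid for b in grid)    # min_x { w'x - D(x, y, g) w'g }
print(rhs, wg * y)                        # both 3.0, as (8) requires
```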

Take any solution to (8) and denote it by x*. In what follows, we term it efficient or cost-efficient. Now consider the directional derivative of (8) in an arbitrary direction, \(x^{0}\), away from x*:

$$ \lim_{\lambda \rightarrow 0^{+}}\left\{\frac{w^{\prime}\left(x^{\ast}+\lambda x^{0}\right)-D\left(x^{\ast}+\lambda x^{0},y,g\right)w^{\prime }g-w^{\prime}\left(x^{\ast}-D\left(x^{\ast},y,g\right) g\right)}{\lambda}\right\} =w^{\prime}x^{0}-D^{\prime}\left(x^{\ast},y,g;x^{0}\right) w^{\prime}g. $$
(9)

If x* is optimal, this expression must be nonnegative in all possible directions, whence

$$ \frac{w^{\prime}x^{0}}{w^{\prime}g}\ge D^{\prime}\left(x^{\ast},y,g;x^{0}\right), $$
(10)

for all \(x^{0}\). Using (5) thus establishes that \(\frac{w}{w^{\prime}g}\in \partial D\left(x^{\ast},y,g\right).\) When attention is restricted to efficient points, ∂D(x*, y, g) must contain all the viable normalized shadow or virtual prices for D at x*. Naturally, when weighted by the elements of the numeraire vector, these virtual prices sum to one by Lemma 1. (Importantly, as we show below, this constraint on shadow prices is always inherited by the Charnes et al. (1978) DEA formulation of D(x, y, g).)

There are several things to note. First, by taking \(x^{0}=g\) while applying (4) to (9), one obtains

$$ w^{\prime}g-D^{\prime}\left(x^{\ast},y,g;g\right) w^{\prime} g=w^{\prime}g-w^{\prime}g=0. $$

Hence, translations of x* in the direction of g yield no change in the objective function for (8). Thus, as demonstrated by Chambers (2001), if x* is a solution to (8) then so is any translation of it in the direction of g. This solution indeterminacy is resolved by setting D(x*, y, g) = 0 to ensure that x* is on the frontier of V(y).

Second, letting \(e_{i}\) denote the ith element of the usual orthonormal basis, it follows from (10) that for an efficient point x*

$$ \begin{aligned} \frac{w_{i}}{w^{\prime}g}&\geq D^{\prime}\left(x^{\ast},y,g;e_{i}\right)\\ &=\inf \left\{v^{\prime}e_{i}:v\in \partial D\left(x^{\ast},y,g\right) \right\}\\ &=\inf\left\{ v_{i}:v\in \partial D\left(x^{\ast},y,g\right) \right\}. \end{aligned} $$

Thus, any normalized price at which x* is efficient is an upper bound for \(D^{\prime}\left(x^{\ast},y,g;e_{i}\right)\). Hence, \(D^{\prime}\left(x^{\ast},y,g;e_{i}\right)\) measures what an efficient decisionmaker would be willing to pay for one extra unit of \(x_{i}\). For that reason, we refer to \(D^{\prime}\left(x^{\ast},y,g;e_{i}\right)\) as the willingness to pay for a unit of \(x_{i}\) (denominated in units of the numeraire commodity bundle g). Symmetrically, now consider a movement in the direction of \(-e_{i}\). Intuitively, this can be associated with the sale of one unit of \(x_{i}\). We have:

$$ \frac{-w_{i}}{w^{\prime}g}\geq D^{\prime}\left(x^{\ast},y,g;-e_{i}\right) , $$

whence

$$ -D^{\prime}\left(x^{\ast},y,g;-e_{i}\right) \geq \frac{w_{i}}{w^{\prime}g}, $$

and noting that

$$ \begin{aligned} -D^{\prime}\left(x^{\ast},y,g;-e_{i}\right)&=\sup \left\{ v^{\prime}e_{i}:v\in \partial D\left(x^{\ast },y,g\right) \right\}\\ &=\sup \left\{v_{i}:v\in \partial D\left(x^{\ast},y,g\right) \right\}, \end{aligned} $$

then establishes that it is appropriate to refer to \(-D^{\prime}\left(x^{\ast},y,g;-e_{i}\right)\) as the willingness to accept for a unit of \(x_{i}\).

By the properties of the directional derivative:

$$ -D^{\prime}\left(x^{\ast},y,g;-e_{i}\right) \geq D^{\prime}\left(x^{\ast},y,g;e_{i}\right). $$

The divergence between the willingness to accept and the willingness to pay emerges from the fact, illustrated in our initial example, that a unit operating at a kink point on an efficient frontier values acquiring extra units of an input differently from selling units of that input. Moreover, as a simple buy-low, sell-high intuition would suggest, the selling price in such instances should be larger than the buying price.

Even for such units, however, although there are potentially infinitely many shadow prices for \(x_{i}\), there are only two economically relevant shadow prices: the shadow buying price and the shadow selling price. These prices can diverge because of the nonsmoothness of the technology. But even though they diverge, each is unique. Units facing a smooth technology, on the other hand, are willing, at the margin, to buy and sell inputs at the same unique shadow price.

More generally, we shall refer to \(-D^{\prime}\left(x^{\ast},y,g;-x^{0}\right)\) as the willingness to accept for \(x^{0}\) at x* and \(D^{\prime}\left(x^{\ast},y,g;x^{0}\right)\) as the willingness to pay. \(D^{\prime}\left(x^{\ast},y,g;x^{0}\right)\) is superlinear (positively linearly homogeneous and superadditive) in \(x^{0}\). Positive linear homogeneity implies that a renormalization of the units of \(x^{0}\) leads to an equivalent renormalization of the willingness to pay. Superadditivity implies that if \(x^{0}=x^{1}+x^{2}\), the gap between the willingness to accept and the willingness to pay for \(x^{0}\) is no larger than the sum of the respective gaps for \(x^{1}\) and \(x^{2}\). Superadditivity also has important implications for the shadow pricing of inputs. It implies

$$ D^{\prime}\left(x^{\ast},y,g;x^{0}\right) \geq D^{\prime}\left(x^{\ast},y,g;x^{1}\right)+D^{\prime}\left(x^{\ast},y,g;x^{2}\right). $$

More generally, because \(x^{0}=\sum_{n=1}^{N}x_{n}^{0}e_{n},\) superadditivity requires that

$$ \begin{aligned} D^{\prime}\left(x^{\ast},y,g;x^{0}\right)&=\hbox{min}\,\left\{v^{\prime}x^{0}:v\in \partial D\left(x^{\ast },y,g\right) \right\}\\ &\geq \sum_{n=1}^{N}D^{\prime}\left(x^{\ast },y,g;x_{n}^{0}e_{n}\right)\\ &=\sum_{n=1}^{N}x_{n}^{0}D^{\prime}\left(x^{\ast },y,g;e_{n}\right). \end{aligned} $$

In evaluating the value of a small move in the direction of \(x^{0}>0^{N}\), it is improper to take each \(x_{n}^{0}\), multiply it by \(D^{\prime}\left(x^{\ast},y,g;e_{n}\right)\), and then sum. This only provides a lower bound for the true willingness to pay. Depending upon their desired use, the multipliers selected to weight inputs can differ. Thus, while the selection criterion offered by the directional derivative always yields a unique willingness to pay for any move, the actual input weights chosen may change as different moves are considered. One particularly important manifestation emerges when \(x^{0}=x^{\ast}\). Then the rule for multiplier selection is

$$ \begin{aligned} D^{\prime}\left(x^{\ast},y,g;x^{\ast}\right)&=\hbox{min}\,\left\{ v^{\prime}x^{\ast}:v\in \partial D\left(x^{\ast},y,g\right)\right\}\\ &\geq \sum_{n=1}^{N}D^{\prime}\left(x^{\ast },y,g;x_{n}^{\ast}e_{n}\right)\\ &=\sum_{n=1}^{N}x_{n}^{\ast}D^{\prime}\left(x^{\ast},y,g;e_{n}\right). \end{aligned} $$

Thus, the appropriate multipliers belong to \(\arg \min\,\left\{v^{\prime}x^{\ast}:v\in \partial D\left(x^{\ast},y,g\right)\right\}\). Similarly,

$$ \begin{aligned} -D^{\prime}\left(x^{\ast},y,g;-x^{\ast}\right)&=\hbox{max}\,\left\{ v^{\prime}x^{\ast}:v\in \partial D\left(x^{\ast},y,g\right)\right\}\\ &\leq -\sum_{n=1}^{N}D^{\prime}\left(x^{\ast},y,g;-x_{n}^{\ast}e_{n}\right)\\ &=-\sum_{n=1}^{N}x_{n}^{\ast}D^{\prime}\left(x^{\ast},y,g;-e_{n}\right). \end{aligned} $$

However, if the technology is smooth at x, \(\partial D\left(x,y,g\right)=\left\{\nabla D\left(x,y,g\right)\right\}\), and inputs have “unique” weights regardless of the direction of the move. Expression (6) then implies that for any decomposition \(x^{0}=x^{1}+x^{2}\)

$$ D^{\prime}\left(x^{\ast},y,g;x^{0}\right)=D^{\prime}\left(x^{\ast},y,g;x^{1}\right)+D^{\prime}\left(x^{\ast},y,g;x^{2}\right). $$

Before applying these results, it is important to emphasize that the interpretation of \(D^{\prime}\left(x^{\ast},y,g;x^{0}\right)\) as willingness to pay or of \(-D^{\prime}\left(x^{\ast},y,g;-x^{0}\right)\) as willingness to accept is only sensible under the presumption that x* is efficient. That, however, does not mean that \(D^{\prime}\left(x^{\ast},y,g;x^{0}\right)\) is irrelevant if x* is not efficient. Instead, its interpretation needs to be modified. If x* is not efficient, then just as in the usual calculus, \(D^{\prime}\left(x^{\ast},y,g;x^{0}\right)\) measures how the directional distance function will change as a result of a small move in the direction of \(x^{0}\). But it is not then appropriate to identify \(D^{\prime}\left(x^{\ast},y,g;x^{0}\right)\) with a willingness to pay. As a practical matter, however, when x* is not DEA “extreme” efficient, \(D^{\prime}\left(x^{\ast},y,g;x^{0}\right)\) will be linear in \(x^{0}\) (and not just superlinear) because \(\partial D\left(x^{\ast},y,g\right)\) is typically a singleton for inefficient units. And while that singleton set is often referred to as containing a vector of shadow prices, it is important to distinguish the programming notion of a shadow price from the economic notion of a shadow price. The economic notion always presumes efficiency, while the programming notion does not. Hence, those elements of \(\partial D\left(x^{\ast},y,g\right)\) are more properly thought of simply as gradients (presuming uniqueness) or supergradients when x* is not efficient.

3 Applying the calculus to DEA models

Suppose that one is given a data set \(\left(x^{k},y^{k}\right)\) for k = 1, 2, ..., K, where K is the number of observations. Then the constant returns to scale, free-disposal hull of the data is:

$$ T\left(K\right) =\left\{\left(x,y\right):x\geq \sum_{k}\mu _{k}x^{k},y\leq \sum_{k}\mu_{k}y^{k},\mu_{k}\geq 0,k=1,\,\ldots,\,K\right\}. $$

The corresponding directional distance function for a particular (x, y) is given by

$$ D^{T\left(K\right)}\left(x,y,g\right) =\sup_{\beta ,\mu}\left\{\beta :x-\beta g\geq \sum_{k}\mu_{k}x^{k},y\leq \sum_{k}\mu _{k}y^{k},\mu _{k}\geq 0,k=1,\,\ldots\,,K\right\}. $$
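For reference, this is an ordinary linear program. A minimal computational sketch (assuming Python with numpy and scipy; the function name `d_dea` is illustrative, and the data anticipate the example later in this section) is:

```python
# Primal LP for D^{T(K)}(x, y, g) over variables (beta, mu_1, ..., mu_K).
import numpy as np
from scipy.optimize import linprog

def d_dea(x, y, g, X, Y):
    K = len(X)
    c = np.concatenate([[-1.0], np.zeros(K)])         # maximize beta
    A_in = np.hstack([g.reshape(-1, 1), X.T])         # beta*g + sum_k mu_k x^k <= x
    A_out = np.hstack([np.zeros((len(y), 1)), -Y.T])  # y <= sum_k mu_k y^k
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.concatenate([x, -y]),
                  bounds=[(None, None)] + [(0, None)] * K)
    return -res.fun

X = np.array([[1.0, 1.0], [3.0, 2.0]])  # x^k as rows (example data used below)
Y = np.array([[1.0], [2.0]])            # y^k as rows
x, y, g = np.array([1.0, 1.0]), np.array([1.0]), np.array([1.0, 1.0])
print(d_dea(x, y, g, X, Y))             # -> 0.0: the first unit is efficient
print(d_dea(x + 0.5 * g, y, g, X, Y))   # -> 0.5: the translation property (1)
```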

To clarify the linkage between our concept of willingness to pay and DEA representations of the technology, rewrite this directional distance function in its Charnes et al. (1978) dual formulation as

$$ D^{T\left(K\right)}\left(x,y,g\right) =\min\limits_{w\in {\mathbb{R}}_{+}^{N},p\in{\mathbb{R}}_{+}^{M}}\left\{ w^{\prime}x-p^{\prime}y:w^{\prime}g\geq 1;w^{\prime}x^{k}-p^{\prime }y^{k}\geq 0,\,k=1,2,\,\ldots\,,K\right\}, $$
(11)

where, as usual, \(\left(w, p\right)\in{\mathbb{R}} _{+}^{N+M}\) denote the multipliers (shadow prices) of x and y, respectively. In this version of the optimization problem, Lemma 1 is directly reflected in the multiplier normalization constraint, \(w^{\prime}g\geq 1\). Denote the solution set to (11) by:

$$ \left\{ \left(w^{\ast},p^{\ast}\right) \right\} =\hbox{arg min}\,\left\{ w^{\prime}x-p^{\prime}y:w^{\prime}g\geq 1;w^{\prime}x^{k}-p^{\prime }y^{k}\geq 0,k=1,2,\,\ldots\,,K\right\}. $$

By our earlier arguments and the normalization requirement (\(w^{\prime}g\geq 1\)),

$$ w_{i}^{\ast}=\frac{w_{i}^{\ast}}{w^{\ast \prime}g}\geq D^{T\left(K\right)^{\prime}}\left(x,y,g;e_{i}\right) =\inf \left\{ v_{i}:v\in \partial D^{T\left(K\right) }\left(x,y,g\right) \right\} . $$
(12)

Thus, the best DEA estimate of the willingness to pay for a small unit increase in \(x_{i}\) is

$$ \hat{w}_{i}=\hbox{min}\,\left\{w_{i}:\left(w,p\right) \in \left\{\left(w^{\ast},p^{\ast}\right)\right\}\right\}. $$

Similarly, the best estimate of the willingness to accept a small unit decrease in \(x_{i}\) is given by \(\max\,\left\{w_{i}:\left(w,p\right)\in \left\{\left(w^{\ast},p^{\ast}\right)\right\}\right\}\), while the corresponding willingness to pay for and to accept a marginal perturbation of x in the direction \(x^{0}\) (and \(-x^{0}\)) are given, respectively, by \(\min\,\left\{w^{\prime}x^{0}:\left(w,p\right)\in \left\{\left(w^{\ast},p^{\ast}\right)\right\}\right\}\) and \(\max\,\left\{w^{\prime}x^{0}:\left(w,p\right)\in \left\{\left(w^{\ast},p^{\ast}\right)\right\}\right\}\).

We illustrate with a simple example. There are two observations with two inputs and one output, given by \(x_{1}=x_{2}=y=1\) and \(x_{1}=3,\,x_{2}=2,\,y=2\). By (11), for g = (1, 1)

$$ D^{T\left(K\right)}\left(x,y,g\right)= \hbox{min}\,\left\{ w_{1}x_{1}+w_{2}x_{2}-py:w_{1}+w_{2}\geq 1;w_{1}+w_{2}-p\geq 0;3w_{1}+2w_{2}-2p\geq 0\right\}. $$

Thus, \(D^{T\left(K\right)}\left(1,1,1,\left(1,1\right)\right)=0\), and at this point

$$ \left\{\left(w^{\ast},p^{\ast}\right) \right\} =\left\{\left(w_{1},w_{2},p\right)\in {\mathbb{R}}_{+}^{3}:w_{1}+w_{2}=1;p=1\right\}. $$

The willingness to pay for a small increase in \(x_{i}\) is given by

$$ \hbox{min} \left\{w_{i}\in {\mathbb{R}}_{+}:w_{1}+w_{2}=1\right\}=0, $$

while the willingness to accept a small decrease in \(x_{i}\) is given by

$$ \hbox{max} \left\{w_{i}\in {\mathbb{R}}_{+}:w_{1}+w_{2}=1\right\}=1. $$

The willingness to pay for a small perturbation in the direction \(x^{0}>0\) is

$$ \hbox{min}\left\{w_{1}x_{1}^{0}+w_{2}x_{2}^{0}:w_{1}+w_{2}=1\right\} =\hbox{min}\left\{x_{1}^{0},x_{2}^{0}\right\}, $$

while the willingness to accept a small movement in the direction \(-x^{0}\) is

$$ \hbox{max}\left\{ w_{1}x_{1}^{0}+w_{2}x_{2}^{0}:w_{1}+w_{2}=1\right\}= \hbox{max}\left\{ x_{1}^{0},x_{2}^{0}\right\}. $$
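All of these quantities can be computed mechanically as a two-stage linear program: first solve (11), then minimize (or maximize) the multiplier expression of interest over the optimal face. A minimal sketch, assuming scipy (the function name `multipliers` is illustrative) and imposing the normalization \(w^{\prime}g=1\) from Lemma 1 as an equality to pin down the scale of the multipliers:

```python
# Two-stage sketch of the willingness-to-pay/accept bounds based on (11).
# Stage 1 computes D^{T(K)}(x, y, g); stage 2 re-optimizes a chosen multiplier
# over the optimal face {(w, p) : w'x - p'y = D}, with w'g = 1 (Lemma 1).
import numpy as np
from scipy.optimize import linprog

X = np.array([[1.0, 1.0], [3.0, 2.0]])   # x^k as rows
Y = np.array([[1.0], [2.0]])             # y^k as rows

def multipliers(x, y, g, cost=None, face=None):
    obj = np.concatenate([x, -y])        # coefficients of w'x - p'y in (w, p)
    A_ub = [np.concatenate([-xk, yk]) for xk, yk in zip(X, Y)]   # w'x^k - p'y^k >= 0
    A_eq, b_eq = [np.concatenate([g, np.zeros(len(y))])], [1.0]  # w'g = 1
    if face is not None:                 # stay on the stage-1 optimal face
        A_eq.append(obj)
        b_eq.append(face)
    return linprog(obj if cost is None else cost, A_ub=A_ub,
                   b_ub=np.zeros(len(X)), A_eq=A_eq, b_eq=b_eq)

x, y, g = np.array([1.0, 1.0]), np.array([1.0]), np.array([1.0, 1.0])
D = multipliers(x, y, g).fun                       # D^{T(K)}(x, y, g) = 0
e1 = np.array([1.0, 0.0, 0.0])                     # selects w_1
wtp = multipliers(x, y, g, cost=e1, face=D).fun    # min w_1 -> 0.0
wta = -multipliers(x, y, g, cost=-e1, face=D).fun  # max w_1 -> 1.0
print(D, wtp, wta)
```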

To illustrate the superlinearity of willingness to pay, notice that

$$ \begin{aligned} \hbox{min}\left\{ x_{1}^{0},x_{2}^{0}\right\}&=\hbox{min}\left\{ w_{1}x_{1}^{0}+w_{2}x_{2}^{0}:w_{1}+w_{2}=1\right\} \\ &>\hbox{min}\left\{ w_{1}x_{1}^{0}:w_{1}+w_{2}=1\right\} +\hbox{min}\left\{ w_{2}x_{2}^{0}:w_{1}+w_{2}=1\right\} \\ &=0. \end{aligned} $$

To illustrate the computation of \(D^{\prime}\left(x^{\ast},y,g;x^{0}\right)\) for an inefficient unit, consider the point (3, 2, 2). Under constant returns to scale, this unit is clearly inefficient. Taking the directional vector to be g = (1, 0), one obtains:

$$ \begin{aligned} D^{T\left(K\right) }\left(3,2,2,\left(1,0\right) \right)&=\hbox{min}\,\left\{ 3w_{1}+2w_{2}-2p:w_{1}\geq 1;w_{1}+w_{2}-p\geq 0;3w_{1}+2w_{2}-2p\geq 0\right\} \\ &=1, \end{aligned} $$

with

$$ \left\{\left(w^{\ast},p^{\ast}\right) \right\}=\left\{\left(1,0,1\right)\right\}. $$

Because (3, 2, 2) is not on the efficient frontier, it is inappropriate to interpret (w*, p*) in this instance in terms of willingness to pay or willingness to accept, but \(D^{T\left(K\right)^{\prime}}\left(3,2,2,\left(1,0\right);x^{0}\right)\) still exists and is given uniquely by \(x_{1}^{0}\). Now, however, it simply measures how \(D^{T\left(K\right)}\left(3,2,2,\left(1,0\right)\right)\) changes as we move in the direction \(x^{0}\).

Taking the directional vector to be (1, 1) or (0, 1) instead gives \(D^{T\left(K\right)}\left(3,2,2,g\right)=0\) with (w*, p*) = (0, 1, 1), whence \(D^{T\left(K\right)^{\prime}}\left(3,2,2,g;x^{0}\right)=x_{2}^{0}.\)
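These computations can be replicated with the `multipliers` sketch above (the same hedges apply):

```python
# Replicating the inefficient-point computations for (3, 2, 2).
xi, yi = np.array([3.0, 2.0]), np.array([2.0])
print(multipliers(xi, yi, np.array([1.0, 0.0])).fun)  # -> 1.0 for g = (1, 0)
print(multipliers(xi, yi, np.array([0.0, 1.0])).fun)  # -> 0.0 for g = (0, 1)
print(multipliers(xi, yi, np.array([1.0, 1.0])).fun)  # -> 0.0 for g = (1, 1)
```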

4 Conclusion

DEA models, as formulated by Charnes et al. (1978), are not amenable to differential arguments for “extreme” efficient units. The reason, of course, is that in the primal input–output space, these points coincide with “kinks” in the approximating technology. Consequently, function representations of the approximating technology are not differentiable there in the usual sense. Dually, this nondifferentiability is manifested by the existence of alternate optima to the Charnes et al. (1978) DEA problem, which in turn naturally raises the question of which weights to use in evaluating the relative importance of inputs and outputs. This problem is more than an idle theoretical curiosity because, as Cooper et al. (2007) show, the presence of alternate optima yields different evaluations of the relative importance of inputs and outputs for efficient units depending upon the software used to analyze the data.

This paper shows how a “calculus” can be applied to DEA and, in particular, how this “calculus” resolves the weight choice problem uniquely. While not frequently used within the DEA literature, the “calculus” is based on well-known results in the convex analysis literature (Rockafellar 1970; Clarke 1983) for directional derivatives and their associated superdifferentials. Here, we have shown by example how these results can be applied to resolve the shadow price (weight choice) problem. But it is also clear that under suitable convexity assumptions on the graph of the technology, parallel methods can be used to calculate the elasticity of scale. In fact, we close this paper by noting that the elasticity of scale, as usually defined, is the ratio of a directional derivative to an observed output level. Thus, when properly extended, our “calculus” easily covers that case as well.