1 Introduction

Consider the stochastic frontier model

$$ y_{i} = \alpha + x_{i}^{'} \beta + v_{i} - u_{i} = \alpha + x_{i}^{'} \beta + \varepsilon_{i} , $$
(1)

where \( x_{i} \) is “fixed” (independent of \( v_{i} \) and \( u_{i} \)); \( v_{i} \) is distributed as \( N\left( {0,\sigma_{v}^{2} } \right) \); \( u_{i} \) is distributed as \( N^{ + } \left( {0,\sigma_{u}^{2} } \right) \), i.e., “half normal”; and \( v_{i} \) and \( u_{i} \) are independent. We will sometimes use the standard notation that \( \sigma^{2} = \sigma_{v}^{2} + \sigma_{u}^{2} \) and \( \lambda = \sigma_{u} /\sigma_{v} \). This is the model of Aigner et al. (1977) and Meeusen and van den Broeck (1977), and we will call it the standard SFA model.

The distribution of \( \varepsilon_{i} \) is “skew normal,” and the model can be estimated by MLE using the skew normal density. An alternative method of estimation is corrected OLS (COLS), which was suggested by Aigner, Lovell and Schmidt and further analyzed in Olson et al. (1980) and Waldman (1982), and which will be described below.

Under the assumptions made above, the third moment of \( \varepsilon_{i} \) is negative. However, its sample equivalent, the third moment of the OLS residuals, can be positive. This is the so-called wrong skew case. In the wrong skew case, two problems arise. The first, noted in the original Aigner, Lovell and Schmidt article, is that the COLS estimate does not exist. The second and more subtle problem, or set of problems, was noted by Waldman. The likelihood always (regardless of wrong or right skew) has a stationary point at the parameter values that reflect no inefficiency (\( \lambda = 0 \) and other parameters = the OLS estimates). At that stationary point, the Hessian is always singular. In the wrong skew case, this point is a local maximum of the likelihood function, and empirically in this case, it is also the global maximum. Thus, with positive probability the data will indicate no inefficiency. This complicates the process of inference; see Simar and Wilson (2009).

The question this paper addresses is whether similar problems occur in models in which the distribution of technical inefficiency depends on observable “environmental” variables. Specifically, we will consider a form of the RSCFG model of Reifschneider and Stevenson (1991), Caudill and Ford (1993) and Caudill et al. (1995). Here, we obtain results that are similar to but different from those for the standard SFA model. There is no COLS estimator, but the model can be estimated by nonlinear least squares, and the least squares criterion function always has a stationary point at the parameter values that reflect no inefficiency. The model can also be estimated by MLE, and the likelihood always has a stationary point and the Hessian is singular at the parameter values that reflect no inefficiency. In general, these stationary points are neither a local minimum nor a local maximum of the relevant criterion function (sum of squares or likelihood). None of these statements have any connection to the skew of the residuals.

This paper does not aim to give advice about how to proceed if, in a particular data set, the stationary point is the global minimum of the NLLS criterion or the global maximum of the log-likelihood. Nor do we attempt to construct models in which a skew of the residuals of either sign is not wrong. Some papers that do these things will be discussed in the final section of the paper. We simply ask, for a given criterion function, whether there is always a stationary point and whether we can say that it is or is not a local maximum or minimum. These are rather specific questions, but as we will see, they are not trivial to answer.

2 More detail on the wrong skew problem in the standard SFA model

We will now be a little more precise than in the previous section about the nature of the wrong skew problem in the standard SFA model. We do this so that it is clear what results we might hope or expect to generalize to the RSCFG model.

The model is as given in Eq. (1). It is well known that \( E\left( u \right) \equiv \mu = \sqrt {\frac{2}{\pi }} \sigma_{u} \) so that \( E\left( \varepsilon \right) = - \mu \). Also \( {\text{var}}\left( u \right) = \frac{\pi - 2}{\pi }\sigma_{u}^{2} \), \( E\left( {u^{2} } \right) = \sigma_{u}^{2} \) and \( \mu_{3}^{'} \equiv E\left( {\varepsilon + \mu } \right)^{3} = - E\left( {u - \mu } \right)^{3} = \sqrt {\frac{2}{\pi }} \left( {\frac{\pi - 4}{\pi }} \right)\sigma_{u}^{3} \). Note that \( \mu_{3}^{'} \le 0 \). OLS implicitly estimates \( \left( {\alpha - \mu } \right) \) and \( \beta \), and the OLS residuals \( e_{i} \) correspondingly are “estimates” of \( v_{i} - \left( {u_{i} - \mu } \right) \). So

$$ \hat{\sigma }^{2} \equiv \frac{1}{N}\mathop \sum \limits_{i} e_{i}^{2} \to_{p} \sigma^{2} = \sigma_{v}^{2} + \sigma_{u}^{2} \,{\text{and}}\,\hat{\mu }_{3}^{'} \equiv \frac{1}{N}\mathop \sum \limits_{i} e_{i}^{3} \to_{p} \mu_{3}^{'} = \sqrt {\frac{2}{\pi }} \left( {\frac{\pi - 4}{\pi }} \right)\sigma_{u}^{3} . $$
(2)

Therefore, we obtain consistent estimates of \( \sigma_{u}^{2} \) and \( \sigma_{v}^{2} \) as

$$ \hat{\sigma }_{u}^{2} = \left[ {\sqrt {\frac{\pi }{2}} \left( {\frac{\pi }{\pi - 4}} \right)\hat{\mu }_{3}^{'} } \right]^{2/3} ,\quad \hat{\sigma }_{v}^{2} = \hat{\sigma }^{2} - \hat{\sigma }_{u}^{2} . $$
(3)

If \( \hat{\alpha },\hat{\beta } \) are the least squares estimates, the COLS estimates of \( \alpha \) and \( \beta \) are \( \tilde{\alpha } = \hat{\alpha } + \sqrt {\frac{2}{\pi }} \hat{\sigma }_{u} \) and \( \tilde{\beta } = \hat{\beta } \). However, our interest in this paper is just in the estimates of \( \sigma_{u}^{2} \) and \( \sigma_{v}^{2} \) (as given in Eq. (3)) themselves. In the wrong skew case that \( \hat{\mu }_{3}^{'} > 0 \), the bracketed quantity in Eq. (3) is negative, so the implied \( \hat{\sigma }_{u} \) is negative and \( \hat{\sigma }_{u}^{2} \) is not well defined. So the COLS method fails.

With respect to the MLE, the situation is more complicated. Waldman (1982) showed that the point \( \hat{\alpha },\hat{\beta } = \) OLS, \( \hat{\sigma }_{u}^{2} = 0 \), \( \hat{\sigma }_{v}^{2} = \hat{\sigma }^{2} \) is always a stationary point of the likelihood. He also showed that the information matrix is singular at this point. Finally, Waldman showed that the stationary point given above is a local maximum of the likelihood when \( \hat{\mu }_{3}^{'} > 0 \). It is generally thought that it is also the global maximum.

The wrong skew problem occurs most frequently when the sample size is small and the population value of \( \lambda = \sigma_{u} /\sigma_{v} \) is small. Many people might find the frequency with which it occurs to be surprising. For example, Simar and Wilson (2009, Table 1, p. 71) report simulations in which the probability of a wrong skew is 0.301 when \( n \) = 100 and \( \lambda \) = 1; it is 0.320 when \( n \) = 500 and \( \lambda \) = 0.5; and it is 0.386 when \( n \) = 10,000 and \( \lambda \) = 0.1.
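These frequencies are easy to explore by simulation. The sketch below uses an intercept-only design of our own choosing, so its output is only broadly comparable to the figures reported by Simar and Wilson:

```python
import numpy as np

def wrong_skew_prob(n, lam, reps=2000, seed=42):
    """Monte Carlo frequency of wrong (positive) skew of the OLS
    residuals in an intercept-only frontier with sigma_v = 1 and
    sigma_u = lam (so lambda = sigma_u / sigma_v = lam)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        eps = rng.normal(0.0, 1.0, n) - np.abs(rng.normal(0.0, lam, n))
        e = eps - eps.mean()          # OLS residuals, intercept only
        hits += np.mean(e ** 3) > 0.0
    return hits / reps
```

For example, `wrong_skew_prob(100, 1.0)` is roughly 0.3 in this design, in line with the order of magnitude in the table cited above.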

3 The RSCFG model

We will now consider the case that the distribution of technical inefficiency (\( u_{i} \)) depends on some observable “environmental variables” \( z_{i} \) that may or may not affect the level of the frontier but that do affect the level of technical inefficiency. A possible example of such a variable in an agricultural setting would be ownership of the farm (private vs. state-owned).

The most commonly assumed case is that the distribution of \( u_{i} \) is truncated normal. In standard notation, \( u_{i} \) is distributed as \( N^{ + } \left( {\mu_{i} ,\sigma_{i}^{2} } \right) \). When \( \mu_{i} = 0 \) and \( \sigma_{i}^{2} \) is constant (does not depend on i), we have the standard stochastic frontier model of the previous section. When \( \mu_{i} \) and \( \sigma_{i}^{2} \) are constant, we have the truncated normal model of Stevenson (1980). However, here we are interested in models in which \( \mu_{i} \) and/or \( \sigma_{i}^{2} \) depend on environmental variables \( z_{i} \). For example, in the RSCFG model of Reifschneider and Stevenson (1991), Caudill and Ford (1993) and Caudill et al. (1995), \( \mu_{i} = 0 \) and \( \sigma_{i}^{2} \) is a function of \( z_{i} \) and some parameters. In the KGMHLBC model of Kumbhakar et al. (1991), Huang and Liu (1994) and Battese and Coelli (1995), \( \sigma_{i}^{2} \) is constant (does not depend on i) and \( \mu_{i} \) is a function of \( z_{i} \) and parameters. In the model of Wang (2002), both \( \mu_{i} \) and \( \sigma_{i}^{2} \) depend on \( z_{i} \) and parameters. In the model of Alvarez et al. (2006), there is a “scaling function” \( g\left( {z_{i} ,\theta } \right) \) such that \( \mu_{i} = \mu \cdot g\left( {z_{i} ,\theta } \right) \) and \( \sigma_{i} = \sigma \cdot g\left( {z_{i} ,\theta } \right). \) A related model is the model of Amsler et al. (2015) in which the post-truncation mean and variance of \( u_{i} \) are parameterized.

In this paper, we will consider the specific case of the RSCFG model with \( \sigma_{i} = \sigma_{u} \exp \left( {z_{i}^{'} \delta } \right) \). We treat \( x_{i} \) and \( z_{i} \) as “fixed” (independent of \( v_{i} \) and \( u_{i} \)) so that the model is:

$$ y_{i} = \alpha + x_{i}^{'} \beta + v_{i} - u_{i} ,\;v_{i} \sim N\left( {0,\sigma_{v}^{2} } \right),\;u_{i} \sim N^{ + } \left( {0,\sigma_{u}^{2} { \exp }\left( {2z_{i}^{'} \delta } \right)} \right). $$
(4)

This is a straightforward extension of the standard stochastic frontier model because \( u_{i} \) is still half normal. We will use the notation \( d_{z} \) = dimension(\( z_{i} \)) and \( d_{x} \) = dimension(\( x_{i} \)).

4 Nonlinear least squares estimation of the RSCFG model

Given the RSCFG model with \( \sigma_{i} = \sigma_{u} \exp \left( {z_{i}^{'} \delta } \right) \), we have \( E\left( {y_{i} |x_{i} ,z_{i} } \right) = \alpha + x_{i}^{'} \beta - \sqrt {\frac{2}{\pi }} \sigma_{u} \exp \left( {z_{i}^{'} \delta } \right) \). This suggests a nonlinear least squares (NLLS) estimator that minimizes (with respect to \( \alpha ,\beta ,\sigma_{u} \) and \( \delta \)) the criterion function

$$ {\text{SSE}} = \mathop \sum \limits_{i} \left[ {y_{i} - \alpha - x_{i}^{'} \beta + \sqrt {\frac{2}{\pi }} \sigma_{u} { \exp }\left( {z_{i}^{'} \delta } \right)} \right]^{2} . $$
(5)

We will denote the NLLS estimates as \( \tilde{\alpha },\tilde{\beta },\tilde{\sigma }_{u} \) and \( \tilde{\delta } \).

We can note a few points about identification of the model based on this criterion function. First, obviously \( \delta \) is not identified when \( \sigma_{u} = 0 \). Second, \( \alpha \) and \( \sigma_{u} \) are not separately identified when \( \delta = 0 \). Third, this criterion function does not lead directly to an estimate of \( \sigma_{v}^{2} \), but we can derive an estimate based on the NLLS estimates. Note that if \( u_{i} \sim N^{ + } \left( {0,\sigma_{i}^{2} } \right) \), then \( E\left( {u_{i}^{2} } \right) = \sigma_{i}^{2} \) and so \( E\left( {\varepsilon_{i}^{2} } \right) = \sigma_{v}^{2} + \sigma_{i}^{2} = \sigma_{v}^{2} + \sigma_{u}^{2} \exp \left( {2z_{i}^{'} \delta } \right) \). If \( \tilde{\varepsilon }_{i} = y_{i} - \tilde{\alpha } - x_{i}^{'} \tilde{\beta } \), this leads to the estimate

$$ \tilde{\sigma }_{v}^{2} = \frac{1}{n}\mathop \sum \limits_{i} \tilde{\varepsilon }_{i}^{2} - \tilde{\sigma }_{u}^{2} \frac{1}{n}\mathop \sum \limits_{i} { \exp }\left( {2z_{i}^{'} \tilde{\delta }} \right). $$
(6)

This is similar in spirit to the estimator given in (3) for the standard SFA model.
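Equations (5) and (6) are straightforward to transcribe into code (a sketch; the function names are ours). One useful sanity check: at \( \sigma_{u} = 0 \) and the OLS values of \( \left( {\alpha ,\beta } \right) \), the criterion collapses to the OLS sum of squared residuals, whatever the value of \( \delta \).

```python
import numpy as np

K = np.sqrt(2.0 / np.pi)

def sse(y, X, Z, alpha, beta, sigma_u, delta):
    """The NLLS criterion of Eq. (5)."""
    r = y - alpha - X @ beta + K * sigma_u * np.exp(Z @ delta)
    return np.sum(r ** 2)

def sigma_v2_estimate(y, X, Z, alpha, beta, sigma_u, delta):
    """The estimate of sigma_v^2 in Eq. (6), given NLLS estimates."""
    eps = y - alpha - X @ beta
    return np.mean(eps ** 2) - sigma_u ** 2 * np.mean(np.exp(2.0 * Z @ delta))
```

With \( \sigma_{u} = 0 \), `sigma_v2_estimate` reduces to the mean squared residual, matching Eq. (6).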

Let \( \psi = \left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} \alpha \\ \beta \\ \end{array} } \\ {\sigma_{u} } \\ \delta \\ \end{array} } \right] \), the parameters that we seek to estimate by NLLS. Let \( \psi^{*} = \left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} {\hat{\alpha }} \\ {\hat{\beta }} \\ \end{array} } \\ 0 \\ 0 \\ \end{array} } \right] \), where \( \hat{\alpha },\hat{\beta } \) = OLS of \( y \) on intercept and \( x \). Obviously, \( \psi^{*} \) is a set of parameters that indicate no inefficiency. Note that we carefully said “a set of parameters that indicate no inefficiency” rather than “the set of parameters that indicate no inefficiency” because when \( \sigma_{u} = 0 \) we have no inefficiency regardless of the value of \( \delta \).

Result 1

The criterion SSE given in Eq. (5) has a stationary point at \( {\psi}^{{*}} . \)

Proof

The derivatives of the NLLS criterion function with respect to the parameters in \( \psi \) are:

$$ \nabla_{\alpha } {\text{SSE}} = - 2\mathop \sum \limits_{i} \left[ {y_{i} - \alpha - x_{i}^{'} \beta + \sqrt {\frac{2}{\pi }} \sigma_{u} { \exp }\left( {z_{i}^{'} \delta } \right)} \right], $$
(7A)
$$ \nabla_{\beta } {\text{SSE}} = - 2\mathop \sum \limits_{i} \left\{ {\left[ {y_{i} - \alpha - x_{i}^{'} \beta + \sqrt {\frac{2}{\pi }} \sigma_{u} { \exp }\left( {z_{i}^{'} \delta } \right)} \right]x_{i} } \right\}, $$
(7B)
$$ \nabla_{{\sigma_{u} }} {\text{SSE}} = 2\sqrt {\frac{2}{\pi }} \mathop \sum \limits_{i} \left\{ {\left[ {y_{i} - \alpha - x_{i}^{'} \beta + \sqrt {\frac{2}{\pi }} \sigma_{u} \exp \left( {z_{i}^{'} \delta } \right)} \right]\exp \left( {z_{i}^{'} \delta } \right)} \right\}, $$
(7C)
$$ \nabla_{\delta } {\text{SSE}} = 2\sqrt {\frac{2}{\pi }} \sigma_{u} \mathop \sum \limits_{i} \left\{ {\left[ {y_{i} - \alpha - x_{i}^{'} \beta + \sqrt {\frac{2}{\pi }} \sigma_{u} { \exp }\left( {z_{i}^{'} \delta } \right)} \right] {\text{exp}}\left( {z_{i}^{'} \delta } \right)z_{i} } \right\}. $$
(7D)

These derivatives are all zero at \( \psi^{*} \). (The derivatives in (7A), (7B) and (7D) equal zero when \( \alpha = \hat{\alpha } \), \( \beta = \hat{\beta } \) and \( \sigma_{u} = 0 \) regardless of the value of \( \delta \), but \( \delta = 0 \) is required for the derivative in (7C) to equal zero.) So this point is a stationary point of the NLLS criterion function.
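Result 1 can be confirmed numerically by evaluating the derivatives (7A)–(7D) at \( \psi^{*} \) for simulated data (a sketch; the data and seeds are illustrative):

```python
import numpy as np

K = np.sqrt(2.0 / np.pi)

def sse_gradient(y, X, Z, alpha, beta, sigma_u, delta):
    """Gradient of the NLLS criterion (5): Eqs. (7A)-(7D), stacked as
    (alpha, beta, sigma_u, delta)."""
    g = np.exp(Z @ delta)
    r = y - alpha - X @ beta + K * sigma_u * g   # the bracketed term
    return np.concatenate([
        [-2.0 * np.sum(r)],                  # (7A)
        -2.0 * X.T @ r,                      # (7B)
        [2.0 * K * np.sum(r * g)],           # (7C)
        2.0 * K * sigma_u * Z.T @ (r * g),   # (7D)
    ])
```

At \( \psi^{*} \) the whole gradient vanishes; with \( \sigma_{u} = 0 \) but \( \delta \ne 0 \), only the (7C) component is nonzero, exactly as noted in the proof.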

Next we ask whether there is any readily interpretable condition (analogous to “wrong skew” in the standard SFA model) such that this stationary point is a local minimum of the NLLS criterion function. The Hessian (second derivative) matrix \( H \) is messy and is given in Appendix 1. The Hessian evaluated at the stationary point \( \psi^{*} \) is simpler and is given by:

$$ \frac{1}{2}H\left( {\psi^{*} } \right) = \left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} n & {n\bar{x}^{'} } \\ {n\bar{x}} & {\mathop \sum \limits_{i} x_{i} x_{i}^{'} } \\ \end{array} } & {\begin{array}{*{20}c} { - kn} & 0 \\ { - kn\bar{x}} & 0 \\ \end{array} } \\ {\begin{array}{*{20}c} { - kn} & { - kn\bar{x}^{'} } \\ 0 & 0 \\ \end{array} } & {\begin{array}{*{20}c} {k^{2} n} & { 0} \\ { 0} & { 0} \\ \end{array} } \\ \end{array} } \right]. $$
(8)

In this expression, \( k = \sqrt {2/\pi } \). The four block rows (and columns) correspond to \( \alpha ,\beta ,\sigma_{u} \) and \( \delta \), and they are of dimension 1, \( d_{x} \), 1 and \( d_{z} \), respectively. The matrix is singular, with rank equal to \( d_{x} + 1 \), so there are \( d_{z} + 1 \) eigenvalues equal to zero.
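The rank claim can be checked directly by assembling the matrix in Eq. (8) and counting its zero eigenvalues (a numerical sketch; the function returns \( \frac{1}{2}H\left( {\psi^{*} } \right) \) exactly as displayed, which has the same rank and definiteness as \( H\left( {\psi^{*} } \right) \)):

```python
import numpy as np

def nlls_half_hessian(X, Z):
    """(1/2) * Hessian of SSE at psi*, per Eq. (8); depends only on the
    moments of X and the dimensions d_x and d_z."""
    n, dx = X.shape
    dz = Z.shape[1]
    k = np.sqrt(2.0 / np.pi)
    xbar = X.mean(axis=0)
    p = 1 + dx + 1 + dz                     # blocks: alpha, beta, sigma_u, delta
    H = np.zeros((p, p))
    H[0, 0] = n
    H[0, 1:1 + dx] = H[1:1 + dx, 0] = n * xbar
    H[1:1 + dx, 1:1 + dx] = X.T @ X
    H[0, 1 + dx] = H[1 + dx, 0] = -k * n
    H[1:1 + dx, 1 + dx] = H[1 + dx, 1:1 + dx] = -k * n * xbar
    H[1 + dx, 1 + dx] = k ** 2 * n          # delta block is identically zero
    return H
```

For any simulated \( X \) and \( Z \), the matrix has rank \( d_{x} + 1 \), hence \( d_{z} + 1 \) zero eigenvalues, and is positive semi-definite.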

The Hessian \( H\left( {\psi^{*} } \right) \) is positive semi-definite, which is a necessary condition for \( \psi^{*} \) to be a local minimum. However, because the Hessian is singular, this is not a sufficient condition. As in Waldman (1982), we need to examine the values of the criterion function in the directions that correspond to the eigenvectors associated with the zero eigenvalues. To elaborate on this point, consider the Taylor series expansion of \( {\text{SSE}} \) around the point \( \psi^{*} \):

$$\begin{aligned} {\text{SSE}}\left( \psi \right) - {\text{SSE}}\left( {\psi^{*} } \right) &= \left( {\nabla_{\psi } {\text{SSE}}\left( {\psi^{*} } \right)} \right)^{'} \left( {\psi - \psi^{*} } \right) + \raise.5ex\hbox{$\scriptstyle 1$}\kern-.1em/ \kern-.15em\lower.25ex\hbox{$\scriptstyle 2$} \left( {\psi - \psi^{*} } \right)^{'} H\left( {\psi^{*} } \right)\left( {\psi - \psi^{*} } \right)\\ &\quad + {\text{higher-order}}\;{\text{terms}}.\end{aligned} $$
(9)

The first term on the right-hand side of (9) equals zero because \( \nabla_{\psi } {\text{SSE}}\left( {\psi^{*} } \right) = 0 \), so we consider the second term, \( \left( {\psi - \psi^{*} } \right)^{'} H\left( {\psi^{*} } \right)\left( {\psi - \psi^{*} } \right) \). A necessary condition for \( \psi^{*} \) to be a local minimum is that \( H\left( {\psi^{*} } \right) \) is positive semi-definite, since, if there exists a vector \( g \) such that \( g^{\prime}H\left( {\psi^{*} } \right)g < 0 \), then for small enough scalar \( \tau > 0 \), \( \psi = \psi^{*} + \tau g \) will lead to a smaller \( {\text{SSE}} \) than \( \psi^{*} \). However, for vectors that are linear combinations of the eigenvectors corresponding to the zero eigenvalues (i.e., vectors in the null space of \( H\left( {\psi^{*} } \right) \)), the second-order term above is zero, and we need to investigate the behavior of the criterion function in those directions, since the higher-order terms could be of either sign.

The eigenvectors that correspond to the zero eigenvalues, and which span the null space of \( H\left( {\psi^{*} } \right) \), are as follows:

$$ \left[ {\begin{array}{*{20}c} 1 \\ 0 \\ {\begin{array}{*{20}c} {1/k} \\ 0 \\ \end{array} } \\ \end{array} } \right],\left[ {\begin{array}{*{20}c} 0 \\ 0 \\ {\begin{array}{*{20}c} 0 \\ {\iota_{1} } \\ \end{array} } \\ \end{array} } \right],\left[ {\begin{array}{*{20}c} 0 \\ 0 \\ {\begin{array}{*{20}c} 0 \\ {\iota_{2} } \\ \end{array} } \\ \end{array} } \right], \ldots ,\left[ {\begin{array}{*{20}c} 0 \\ 0 \\ {\begin{array}{*{20}c} 0 \\ {\iota_{{d_{z} }} } \\ \end{array} } \\ \end{array} } \right]. $$
(10)

Here \( \frac{1}{k} = \sqrt {\frac{\pi }{2}} \); \( \iota_{j} \) is a vector of zeroes except for a one in position \( j \), and the dimensions of the four blocks in the vectors in (10) are 1, \( d_{x} \), 1 and \( d_{z} \). Therefore, a vector in the null space of \( H\left( {\psi^{*} } \right) \) will be of the form \( w = \left[ {\begin{array}{*{20}c} 1 \\ 0 \\ {\begin{array}{*{20}c} {1/k} \\ \delta \\ \end{array} } \\ \end{array} } \right] \) where in this expression \( \delta \) is arbitrary (not necessarily the true parameter value).

Now we consider what happens to the least squares criterion function for small moves in this direction. That is, we consider a parameter value \( \psi^{o} = \psi^{*} + \tau w \) for small \( \tau > 0 \). (We require \( \tau > 0 \) because \( \sigma_{u} \ge 0 \).) Then, we calculate the following:

$$ {\text{SSE}}\left( {\psi^{*} } \right) = {\text{usual}}\,{\text{ least}}\,{\text{squares}}\,{\text{SSE}} = \mathop \sum \limits_{i} e_{i}^{2} ,e_{i} = {\text{OLS}}\,{\text{residuals,}} $$
(11A)
$$ {\text{SSE}}\left( {\psi^{o} } \right) = \mathop \sum \limits_{i} \left[ {e_{i} + \tau \left( {\exp \left( {\tau z_{i}^{'} \delta } \right) - 1} \right)} \right]^{2} , $$
(11B)
$$ \Delta \equiv {\text{SSE}}\left( {\psi^{o} } \right) - {\text{SSE}}\left( {\psi^{*} } \right) = \tau^{2} \mathop \sum \limits_{i} \left[ {\exp \left( {\tau z_{i}^{'} \delta } \right) - 1} \right]^{2} + 2\tau \mathop \sum \limits_{i} e_{i} \exp \left( {\tau z_{i}^{'} \delta } \right). $$
(11C)

For local movements (small \( \tau \)), the term of order \( \tau \) will dominate the term of order \( \tau^{2} \), and so a necessary and sufficient condition for \( \psi^{*} \) to be a local minimum of \( {\text{SSE}} \) is:

$$ 2\tau \mathop \sum \limits_{i} e_{i} { \exp }\left( {\tau z_{i}^{'} \delta } \right) \ge 0\,{\text{for}}\,{\text{all}}\,\delta . $$
(12)

This is a strong requirement that intuitively should not be expected to hold. And it does not, as the following result shows.

Result 2

If \( \mathop \sum \limits_{i} z_{i}^{'} e_{i} \ne 0, \) the point \( \psi^{*} \) is neither a local minimum nor a local maximum of \( {\text{SSE}} . \)

Proof

We have \( \Delta \) = \( 2\tau \mathop \sum \limits_{i} e_{i} \exp \left( {\tau z_{i}^{'} \delta } \right) + \tau^{2} \mathop \sum \limits_{i} \left[ {\exp \left( {\tau z_{i}^{'} \delta } \right) - 1} \right]^{2} \) as given in Eq. (11C). (For small \( \tau \), the first term will be the one that matters, which the calculation we are about to do will verify.) We make use of the Taylor series expansion \( \exp \left( x \right) = 1 + x + \raise.5ex\hbox{$\scriptstyle 1$}\kern-.1em/ \kern-.15em\lower.25ex\hbox{$\scriptstyle 2$} x^{2} \) + h.o.t. (higher-order terms), which yields

$$ \Delta = 2\tau \mathop \sum \limits_{i} e_{i} [1 + \tau z_{i}^{'} \delta + \raise.5ex\hbox{$\scriptstyle 1$}\kern-.1em/ \kern-.15em\lower.25ex\hbox{$\scriptstyle 2$} \tau^{2} \left( {z_{i}^{'} \delta } \right)^{2} ] + \tau^{2} \mathop \sum \limits_{i} \left[ {\tau z_{i}^{'} \delta + \raise.5ex\hbox{$\scriptstyle 1$}\kern-.1em/ \kern-.15em\lower.25ex\hbox{$\scriptstyle 2$} \tau^{2} \left( {z_{i}^{'} \delta } \right)^{2} } \right]^{2} + {\text{ h}}.{\text{o}}.{\text{t}}. $$
(13)

Since \( \mathop \sum \nolimits_{i} e_{i} = 0 \), when \( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} \ne 0 \) the dominant term is \( 2\tau^{2} \mathop \sum \nolimits_{i} \left( {z_{i}^{'} e_{i} } \right)\delta \).

This term can be made of either sign by appropriate choice of \( \delta \). For example, suppose that the first nonzero element of \( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} \) is in position “j” and that it is positive. Then, if we pick \( \delta \) to be equal to zero except for a value of one in position “j,” this term will be positive, and if instead we put a value of minus one in position “j,” the term will be negative. (Reverse the signs when the first nonzero element of \( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} \) is negative.) So \( \Delta \) can be of either sign, and the condition in Eq. (12) cannot hold.

If \( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} = 0 \), the dominant term in the expression for \( \Delta \) will be \( \tau^{3} \mathop \sum \nolimits_{i} e_{i} \left( {z_{i}^{'} \delta } \right)^{2} \), and we cannot find anything useful to say about the sign of that term. However, we note that \( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} = \) 0 is an event of probability zero unless \( z_{i} \) is made up of linear combinations of \( x_{i} \). This is formally possible but very unlikely in empirical practice, so this case is not of particular practical importance.
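The sign argument in the proof can be illustrated numerically: for small \( \tau \), \( \Delta \) of Eq. (11C) is positive when \( \delta \) is aligned with \( \mathop \sum \nolimits_{i} z_{i} e_{i} \) and negative when it is anti-aligned (a sketch; the data and the value of \( \tau \) are illustrative):

```python
import numpy as np

def delta_sse(e, Z, tau, delta):
    """Delta of Eq. (11C): SSE(psi* + tau*w) - SSE(psi*), where
    w = (1, 0, 1/k, delta)' and e are the OLS residuals."""
    g = np.exp(tau * (Z @ delta))
    return tau ** 2 * np.sum((g - 1.0) ** 2) + 2.0 * tau * np.sum(e * g)

# Illustrative data: intercept-only frontier, so the OLS residuals are
# the demeaned composed errors.
rng = np.random.default_rng(3)
n = 200
Z = rng.normal(size=(n, 2))
eps = rng.normal(size=n) - np.abs(rng.normal(size=n))
e = eps - eps.mean()
s = Z.T @ e                       # sum_i z_i e_i (nonzero almost surely)
d = s / np.linalg.norm(s)         # delta aligned with sum_i z_i e_i
```

With \( \tau = 10^{-3} \), the dominant term \( 2\tau^{2} \left( {\mathop \sum \nolimits_{i} z_{i}^{'} e_{i} } \right)\delta \) makes \( \Delta \) positive in direction `d` and negative in direction `-d`, so \( \psi^{*} \) is neither a local minimum nor a local maximum.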

5 MLE of the RSCFG model

We now consider the MLE of the RSCFG model. The model is as given in Eq. (4).

We define the following notation:

$$ \begin{aligned} \sigma_{i} & = \sigma_{u} { \exp }\left( {z_{i}^{'} \delta } \right),\,\lambda_{i} = \sigma_{i} /\sigma_{v} ,\,\omega_{i}^{2} = \sigma_{i}^{2} + \sigma_{v}^{2} ,\,c_{i} = - \varepsilon_{i} \lambda_{i} /\omega_{i} , \\ \varphi_{i} & = \varphi \left( {c_{i} } \right),\,\varPhi_{i} = \varPhi \left( {c_{i} } \right), \\ \end{aligned} $$
(14)

where \( \varphi \) is the standard normal density and \( \varPhi \) is the standard normal cdf. Then, the log-likelihood is given by

$$ \ln L = {\text{constant}}\, - \frac{1}{2}\mathop \sum \limits_{i} \ln \omega_{i}^{2} - \frac{1}{2}\mathop \sum \limits_{i} \frac{{\varepsilon_{i}^{2} }}{{\omega_{i}^{2} }} + \mathop \sum \limits_{i} \ln \varPhi_{i} , $$
(15)

where \( \varepsilon_{i} = y_{i} - \alpha - x_{i}^{'} \beta \).

Let \( \theta = \left[ {\begin{array}{*{20}c} \alpha \\ \beta \\ {\begin{array}{*{20}c} {\sigma_{u} } \\ \delta \\ {\sigma_{v}^{2} } \\ \end{array} } \\ \end{array} } \right] \), the vector of the parameters we wish to estimate. This differs from \( \psi \) of the previous section because it includes \( \sigma_{v}^{2} \). Let \( \theta^{*} = \left[ {\begin{array}{*{20}c} {\hat{\alpha }} \\ {\hat{\beta }} \\ {\begin{array}{*{20}c} 0 \\ 0 \\ {\hat{\sigma }_{v}^{2} } \\ \end{array} } \\ \end{array} } \right] \), our potential stationary point, where \( \hat{\alpha } \) and \( \hat{\beta } \) are the OLS estimates and \( \hat{\sigma }_{v}^{2} = \frac{1}{n}\mathop \sum \nolimits_{i} e_{i}^{2} \), where the \( e_{i} \) are the OLS residuals. Then, we have the following result.

Result 3

The log-likelihood given in Eq. (15) has a stationary point at \({\theta}^{{*}} . \)

Proof

The derivatives of the log-likelihood with respect to the parameters in \( \theta \) are:

$$ \nabla_{\alpha } \ln L = \mathop \sum \limits_{i} \left[ {\frac{{\varepsilon_{i} }}{{\omega_{i}^{2} }} + \frac{{\varphi_{i} }}{{\varPhi_{i} }}\frac{{\lambda_{i} }}{{\omega_{i} }}} \right], $$
(16A)
$$ \nabla_{\beta } \ln L = \mathop \sum \limits_{i} \left[ {\frac{{\varepsilon_{i} }}{{\omega_{i}^{2} }} + \frac{{\varphi_{i} }}{{\varPhi_{i} }}\frac{{\lambda_{i} }}{{\omega_{i} }}} \right]x_{i} , $$
(16B)
$$ \nabla_{{\sigma_{u} }} \ln L = \mathop \sum \limits_{i} \left[ { - \frac{{\sigma_{u} }}{{\omega_{i}^{2} }}\exp \left( {2z_{i}^{'} \delta } \right) + \frac{{\sigma_{u} }}{{\omega_{i}^{4} }}{ \exp }\left( {2z_{i}^{'} \delta } \right)\varepsilon_{i}^{2} - \frac{{\varphi_{i} }}{{\varPhi_{i} }}\frac{1}{{\sigma_{v} }}\frac{{\varepsilon_{i} }}{{\omega_{i} }}{ \exp }\left( {z_{i}^{'} \delta } \right) + \sigma_{u} \frac{{\varphi_{i} }}{{\varPhi_{i} }}\frac{{\lambda_{i} \varepsilon_{i} }}{{\omega_{i}^{3} }}{ \exp }\left( {2z_{i}^{'} \delta } \right)} \right], $$
(16C)
$$ \nabla_{\delta } \ln L = \mathop \sum \limits_{i} z_{i} \left[ { - \frac{{\varphi_{i} }}{{\varPhi_{i} }}\frac{{\varepsilon_{i} \lambda_{i} }}{{\omega_{i} }} - \frac{{\sigma_{u}^{2} }}{{\omega_{i}^{2} }}\exp \left( {2z_{i}^{'} \delta } \right) + \sigma_{u}^{2} \frac{{\varepsilon_{i}^{2} }}{{\omega_{i}^{4} }}\exp \left( {2z_{i}^{'} \delta } \right) + \sigma_{u}^{2} \frac{{\varphi_{i} }}{{\varPhi_{i} }}\frac{{\lambda_{i} \varepsilon_{i} }}{{\omega_{i}^{3} }}{ \exp }\left( {2z_{i}^{'} \delta } \right)} \right], $$
(16D)
$$ \nabla_{{\sigma_{v}^{2} }} \ln L = \frac{1}{2}\mathop \sum \limits_{i} \left[ {\frac{{\varphi_{i} }}{{\varPhi_{i} }}\frac{1}{{\sigma_{v}^{2} }}\frac{{\lambda_{i} \varepsilon_{i} }}{{\omega_{i} }} - \frac{1}{{\omega_{i}^{2} }} + \frac{{\varepsilon_{i}^{2} }}{{\omega_{i}^{4} }} + \frac{{\varphi_{i} }}{{\varPhi_{i} }}\frac{{\lambda_{i} \varepsilon_{i} }}{{\omega_{i}^{3} }}} \right]. $$
(16E)

At \( \theta^{*} \), the following simplifications occur: \( \sigma_{u} = 0 \), \( \lambda_{i} = 0,c_{i} = 0,\omega_{i}^{2} = \sigma_{v}^{2} \), \( \frac{{\varphi_{i} }}{{\varPhi_{i} }} = \sqrt {\frac{2}{\pi }} \) and \( { \exp }\left( {z_{i}^{'} \delta } \right) \) = 1. Then, we have \( \nabla_{\alpha } \ln L = \frac{1}{{\sigma_{v}^{2} }}\mathop \sum \nolimits_{i} \varepsilon_{i} \), \( \nabla_{\beta } \ln L = \frac{1}{{\sigma_{v}^{2} }}\mathop \sum \nolimits_{i} x_{i} \varepsilon_{i} \), \( \nabla_{{\sigma_{u} }} \ln L = - \sqrt {\frac{2}{\pi }} \frac{1}{{\sigma_{v}^{2} }}\mathop \sum \nolimits_{i} \varepsilon_{i} \), \( \nabla_{\delta } \ln L = 0 \) and \( \nabla_{{\sigma_{v}^{2} }} \ln L = \frac{1}{2}\mathop \sum \nolimits_{i} \left[ { - \frac{1}{{\sigma_{v}^{2} }} + \frac{1}{{\sigma_{v}^{4} }}\varepsilon_{i}^{2} } \right] \). All of these expressions equal zero when evaluated at the OLS values \( e_{i} \) and \( \hat{\sigma }_{v}^{2} \).
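Result 3 can also be confirmed numerically. The sketch below codes the log-likelihood of Eqs. (14)–(15), taking the constant to be \( n\ln 2 - \frac{n}{2}\ln 2\pi \) (the usual normalization), and checks the gradient at \( \theta^{*} \) by central finite differences:

```python
import math
import numpy as np

def log_phi_cdf(x):
    """Elementwise log of the standard normal cdf (math.erf avoids a
    SciPy dependency; adequate away from the extreme tails)."""
    return np.log(0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in x])))

def loglik(theta, y, X, Z):
    """Log-likelihood of Eqs. (14)-(15), theta stacked as
    (alpha, beta, sigma_u, delta, sigma_v^2)."""
    n, dx = X.shape
    dz = Z.shape[1]
    alpha, beta = theta[0], theta[1:1 + dx]
    sigma_u, delta = theta[1 + dx], theta[2 + dx:2 + dx + dz]
    sv2 = theta[-1]
    eps = y - alpha - X @ beta
    sig_i = sigma_u * np.exp(Z @ delta)                 # sigma_i
    om2 = sig_i ** 2 + sv2                              # omega_i^2
    c = -eps * (sig_i / np.sqrt(sv2)) / np.sqrt(om2)    # c_i = -eps_i*lambda_i/omega_i
    return (n * np.log(2.0) - 0.5 * n * np.log(2.0 * np.pi)
            - 0.5 * np.sum(np.log(om2))
            - 0.5 * np.sum(eps ** 2 / om2)
            + np.sum(log_phi_cdf(c)))

def num_grad(f, t, h=1e-6):
    """Central finite-difference gradient."""
    g = np.zeros_like(t)
    for j in range(t.size):
        tp, tm = t.copy(), t.copy()
        tp[j] += h
        tm[j] -= h
        g[j] = (f(tp) - f(tm)) / (2.0 * h)
    return g
```

On simulated data, the finite-difference gradient at \( \theta^{*} \) is zero to rounding error; the value of the log-likelihood at \( \theta^{*} \) also matches Eq. (20) below once the constant is made explicit.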

Next we ask whether we can identify cases such that this may be a local maximum of the likelihood. We use a Taylor series expansion (similar to Eq. (9)) of \( \ln L \) around the point \( \theta^{*} \):

$$\begin{aligned} \ln L\left( \theta \right) - \ln L\left( {\theta^{*} } \right) &= (\nabla_{\theta } \ln L\left( {\theta^{*} } \right))^{'} (\theta - \theta^{*} ) + \raise.5ex\hbox{$\scriptstyle 1$}\kern-.1em/ \kern-.15em\lower.25ex\hbox{$\scriptstyle 2$} \left( {\theta - \theta^{*} } \right)^{'} H\left( {\theta^{*} } \right)\left( {\theta - \theta^{*} } \right)\\ &\quad + {\text{ higher-order}}\,{\text{terms}} .\end{aligned}$$
(17)

Here \( H\left( {\theta^{*} } \right) \) is the Hessian (second derivative matrix) evaluated at \( \theta^{*} \). The first term on the right-hand side of (17) equals zero because \( \nabla_{\theta } \ln L\left( {\theta^{*} } \right) = 0 \), so we need to consider the second term, \( \left( {\theta - \theta^{*} } \right)^{'} H\left( {\theta^{*} } \right)\left( {\theta - \theta^{*} } \right) \). A necessary condition for \( \theta^{*} \) to be a local maximum of \( \ln L \) is that \( H\left( {\theta^{*} } \right) \) is negative semi-definite, since, if there exists a vector \( \nu \) such that \( \nu^{\prime}H\left( {\theta^{*} } \right)\nu > \) 0, then for small enough scalar \( \tau > 0 \), \( \theta = \theta^{*} + \tau \nu \) will lead to a higher likelihood value than \( \theta^{*} \).

The Hessian is complicated and is given in a supplemental file available from the authors. However, at \( \theta^{*} \) the simplifications listed above hold, and we obtain the following expression for the Hessian evaluated at \( \theta^{*} \).

$$ H = - \frac{1}{{\sigma_{v}^{2} }}\left[ {\begin{array}{*{20}c} G & 0 \\ 0 & {\frac{n}{{2\sigma_{v}^{2} }}} \\ \end{array} } \right], $$
(18A)
$$ G = \left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} {A_{11} } & { A_{12} } \\ {A_{21} } & { A_{22} } \\ \end{array} } & {\begin{array}{*{20}c} { - kA_{11} } & 0 \\ { - kA_{21} } & 0 \\ \end{array} } \\ {\begin{array}{*{20}c} { - kA_{11} } & { - kA_{12} } \\ 0 & 0 \\ \end{array} } & {\begin{array}{*{20}c} { k^{2} A_{11} } & \xi \\ { \xi^{\prime}} & 0 \\ \end{array} } \\ \end{array} } \right]. $$
(18B)

Here, \( k= \sqrt {\frac{2}{\pi }} \); \( \xi = k\mathop \sum \limits_{i} z_{i}^{'} e_{i} \) (a row vector); and

$$ \left[ {\begin{array}{*{20}c} {A_{11} } & {A_{12} } \\ {A_{21} } & {A_{22} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} n & {n\bar{x}^{'} } \\ {n\bar{x}} & {\mathop \sum \limits_{i} x_{i} x_{i}^{'} } \\ \end{array} } \right] $$
(18C)

which is positive definite.

Suppose first that \( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} \ne 0 \) (\( \xi \ne 0) \), which will be the case with probability one unless \( z_{i} \) is made up of linear combinations of \( x_{i} \). When \( \xi \ne 0 \), \( G \) and \( H \) are nonsingular if \( d_{z} \) = 1 (so that \( \xi \) is a scalar), and they are singular if \( d_{z} \ge \) 2. Regardless of the dimension of \( \xi \), \( H \) will be negative semi-definite if and only if \( G \) is positive semi-definite. We now show that this is not the case.

Result 4

If \( \varvec{\xi}\ne 0 \), then \( \varvec{G} \) is neither positive semi-definite nor negative semi-definite.

Proof

(i) Let \( \eta_{1} = \left[ {k,0,1,\xi } \right]' \). Then, \( \eta_{1}^{'} G\eta_{1} = 2\xi \xi^{\prime} > 0 \) so \( G \) cannot be negative semi-definite. (ii) Now let \( \eta_{2} = \left[ {k,0,1, - \xi } \right]' \). Then, \( \eta_{2}^{'} G\eta_{2} = - 2\xi \xi^{\prime} < 0 \) so \( G \) cannot be positive semi-definite.

This shows that when \( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} \ne 0 \), \( \theta^{*} \) cannot be a local maximum of the likelihood. (The necessary condition for \( \theta^{*} \) to be a local maximum, namely that \( H \) should be negative semi-definite, fails when \( G \) is not positive semi-definite.) It also cannot be a local minimum.
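Result 4 is likewise easy to verify numerically: assemble \( G \) from Eqs. (18B)–(18C) and evaluate the two quadratic forms used in the proof (a sketch with illustrative simulated data):

```python
import numpy as np

def G_matrix(X, Z, e):
    """The matrix G of Eqs. (18B)-(18C), with k = sqrt(2/pi) and
    xi = k * sum_i e_i z_i' (a row vector, stored as a 1-d array)."""
    n, dx = X.shape
    dz = Z.shape[1]
    k = np.sqrt(2.0 / np.pi)
    xbar = X.mean(axis=0)
    xi = k * (Z.T @ e)
    p = 1 + dx + 1 + dz                     # blocks: alpha, beta, sigma_u, delta
    G = np.zeros((p, p))
    G[0, 0] = n
    G[0, 1:1 + dx] = G[1:1 + dx, 0] = n * xbar
    G[1:1 + dx, 1:1 + dx] = X.T @ X
    G[0, 1 + dx] = G[1 + dx, 0] = -k * n
    G[1:1 + dx, 1 + dx] = G[1 + dx, 1:1 + dx] = -k * n * xbar
    G[1 + dx, 1 + dx] = k ** 2 * n
    G[1 + dx, 2 + dx:] = G[2 + dx:, 1 + dx] = xi
    return G, xi, k
```

The quadratic forms \( \eta_{1}^{'} G\eta_{1} = 2\xi \xi^{\prime} \) and \( \eta_{2}^{'} G\eta_{2} = - 2\xi \xi^{\prime} \) follow purely from the structure of \( G \), so the check below holds for any data with \( \xi \ne 0 \).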

Finally, we consider the case that \( \xi = 0 \) (\( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} = 0) \), which essentially means that \( z_{i} \) is made up of linear combinations of \( x_{i} \). This case is very similar to the case considered in Sect. 4. The eigenvectors that correspond to the zero eigenvalues, and which span the null space of \( H\left( {\theta^{*} } \right) \), are as follows:

$$ \left[ {\begin{array}{*{20}c} 1 \\ 0 \\ {\begin{array}{*{20}c} {1/k} \\ {\begin{array}{*{20}c} 0 \\ 0 \\ \end{array} } \\ \end{array} } \\ \end{array} } \right],\left[ {\begin{array}{*{20}c} 0 \\ 0 \\ {\begin{array}{*{20}c} 0 \\ {\begin{array}{*{20}c} {\iota_{1} } \\ 0 \\ \end{array} } \\ \end{array} } \\ \end{array} } \right],\left[ {\begin{array}{*{20}c} 0 \\ 0 \\ {\begin{array}{*{20}c} 0 \\ {\begin{array}{*{20}c} {\iota_{2} } \\ 0 \\ \end{array} } \\ \end{array} } \\ \end{array} } \right], \ldots ,\left[ {\begin{array}{*{20}c} 0 \\ 0 \\ {\begin{array}{*{20}c} 0 \\ {\begin{array}{*{20}c} {\iota_{{d_{z} }} } \\ 0 \\ \end{array} } \\ \end{array} } \\ \end{array} } \right]. $$
(19)

So a vector in the null space of \( H\left( {\psi^{*} } \right) \) will be of the form \( w = \left[ {1,0,1/k,\delta^{'} ,0} \right]' \) for arbitrary \( \delta \). Now we consider what happens to the log-likelihood function for small movements in this direction. That is, we consider a parameter value \( \theta^{o} = \theta^{*} + \tau w \) for small \( \tau > 0 \). It is easy to calculate

$$ \ln L\left( {\theta^{*} } \right) = {\text{constant}}\, - \frac{n}{2} - n\ln 2 - \frac{n}{2}\ln \hat{\sigma }_{v}^{2} . $$
(20)

However, \( \ln L(\theta^{*} + \tau w) \) yields an explicit but impenetrable expression, which we give in Appendix 2. We do not see any way to say anything meaningful in this case. This is arguably not troublesome, because (as noted above) \( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} = 0 \) is a zero-probability event unless the elements of \( z_{i} \) are linear combinations of \( x_{i} \).

6 Practical implications

It is certainly possible for the residuals (OLS or NLLS or MLE) to have a positive (wrong) skew. The main practical implication of our results is that this does not mean that there is a problem. There is always a stationary point (\( \psi^{*} \), above) of the NLLS criterion that would indicate zero inefficiency, and similarly there is always a stationary point (\( \theta^{*} \)) of the log-likelihood that would indicate zero inefficiency. However, in general these points are not a local minimum of the sum of squares or a local maximum of the log-likelihood. This does not appear to have anything to do with the skew of the residuals.

We will illustrate these issues with a small simulation. The DGP will be as simple as possible: \( y_{i} = \alpha + v_{i} - u_{i} \), \( i = 1, \ldots ,n \), where \( v_{i} \) is \( N\left( {0,\sigma_{v}^{2} } \right) \) and (conditional on \( z_{i} \)) \( u_{i} \) is half normal with pre-truncation variance \( \sigma_{u}^{2} \cdot \exp \left( {2z_{i} \delta } \right) \). Here, \( z_{i} \) is \( N(0,1) \), \( \alpha = 0 \), \( \sigma_{v}^{2} = 1 \) and \( n = 100 \). The number of replications is 1000.

We use three different choices for \( \delta \) and \( \sigma_{u}^{2} \). DGP1 has \( \delta = 0 \) and \( \sigma_{u}^{2} = 1 \). This is the same DGP as for the standard SFA model with \( \lambda = 1 \). DGP2 has \( \delta = 0.5 \) and \( \sigma_{u}^{2} = 0.6065 \) (\( \sigma_{u} = 0.7788 \)), and DGP3 has \( \delta = 1 \) and \( \sigma_{u}^{2} = 0.1353 \) (\( \sigma_{u} = 0.3678 \)). Using the properties of the lognormal distribution, we have \( E\left( {u^{2} } \right) = \sigma_{u}^{2} e^{{2\delta^{2} }} \), and our values of \( \delta \) and \( \sigma_{u}^{2} \) are picked such that \( \delta \) varies but \( E\left( {u^{2} } \right) = 1 \) for each of the DGPs.
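As a concreteness check, the DGP and the moment condition \( E\left( {u^{2} } \right) = \sigma_{u}^{2} e^{{2\delta^{2} }} \) can be sketched in a few lines of Python. The function and variable names below are our own illustration, not code from any published implementation:

```python
import numpy as np

def draw_sample(n, alpha, sigma_v2, sigma_u2, delta, rng):
    """Draw one sample from y_i = alpha + v_i - u_i, where, conditional on
    z_i ~ N(0,1), u_i is half normal with pre-truncation variance
    sigma_u^2 * exp(2 * z_i * delta)."""
    z = rng.standard_normal(n)
    v = rng.normal(0.0, np.sqrt(sigma_v2), size=n)
    u = np.abs(rng.standard_normal(n)) * np.sqrt(sigma_u2) * np.exp(z * delta)
    return z, alpha + v - u, u

# The three DGPs: delta varies, but sigma_u^2 is chosen so that
# E(u^2) = sigma_u^2 * exp(2 * delta^2) = 1, using the lognormal
# moment E[exp(2 * delta * z)] = exp(2 * delta^2) for z ~ N(0,1).
for name, delta, sigma_u2 in [("DGP1", 0.0, 1.0),
                              ("DGP2", 0.5, 0.6065),
                              ("DGP3", 1.0, 0.1353)]:
    print(name, sigma_u2 * np.exp(2 * delta ** 2))
```

Each printed value is 1 to within rounding of the stated \( \sigma_{u}^{2} \) values, confirming that the three DGPs hold \( E\left( {u^{2} } \right) \) fixed while \( \delta \) varies.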

For each replication, we estimate the model in three ways. (i) OLS, which estimates \( \alpha \) and \( \sigma_{\varepsilon }^{2} \equiv {\text{var}}\left( {\varepsilon_{i} } \right) = \sigma_{v}^{2} + \frac{\pi - 2}{\pi }\sigma_{u}^{2} e^{{\delta^{2} }} \); (ii) NLLS, which estimates \( \alpha ,\delta \) and \( \sigma_{u}^{2} \); and (iii) MLE, which estimates \( \alpha ,\delta ,\sigma_{u}^{2} \) and \( \sigma_{v}^{2} \). We then calculate the mean, bias, variance and MSE of each of the estimates. We count the number of times that each of the various residuals has the wrong skew. Finally, we count how often the NLLS estimator equals \( \psi^{*} \) and how often the MLE equals \( \theta^{*} \). We need to be explicit about how this is defined. For example, the event “MLE is equal to \( \theta^{*} \)” is the complement of the event “MLE is not equal to \( \theta^{*} , \)” and this latter event occurs if and only if the likelihood evaluated at the MLE is larger than the likelihood evaluated at \( \theta^{*} \). That is, we compare likelihood values, rather than comparing the MLE parameter estimates to \( \theta^{*} \), and similarly for NLLS. Inequality is easier to verify or disprove numerically than equality.
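The comparison of criterion values described above can be implemented as a simple tolerance check. The helpers below are a hypothetical sketch (the names and the tolerance are ours), assuming the criterion has already been evaluated at the estimator and at the stationary point:

```python
def mle_equals_theta_star(loglik_mle, loglik_theta_star, tol=1e-8):
    """Declare the MLE 'equal to theta*' unless the maximized log-likelihood
    exceeds the log-likelihood at the stationary point by more than tol.
    Comparing criterion values avoids numerically fragile equality tests
    on the parameter vectors themselves."""
    return loglik_mle <= loglik_theta_star + tol

def nlls_equals_psi_star(ssr_nlls, ssr_psi_star, tol=1e-8):
    """Same idea for NLLS: the estimator differs from psi* only if it
    attains a strictly smaller sum of squared residuals."""
    return ssr_nlls >= ssr_psi_star - tol
```

In the simulations, the event "MLE is not equal to \( \theta^{*} \)" corresponds to `mle_equals_theta_star` returning `False`, and similarly for NLLS.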

The results strongly support the supposition that the residuals can have a wrong skew, but that this does not mean that there is a wrong skew problem. The proportions of replications in which the OLS residuals have a wrong skew (positive third moment) are 0.304, 0.104 and 0.064 for DGP1, DGP2 and DGP3, respectively. (As an aside, the value of 0.304 is very close to the value of 0.301 reported by Simar and Wilson (2009), Table 1, for the same parameter values.) The proportions of replications with a wrong skew for the NLLS residuals are 0.019, 0.001 and 0.001, while for the MLE residuals they are 0.072, 0.003 and 0.002. So a wrong skew can occur. However, it was never the case, for any of the 1000 replications for any of the three DGPs, that the NLLS estimator was equal to \( \psi^{*} \) or that the MLE was equal to \( \theta^{*} \). This is obviously a striking difference between the RSCFG model and the standard SFA model, where a wrong skew of the OLS residuals implies that the MLE is degenerate at a value that implies no inefficiency.
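The wrong-skew counts reported above rest on nothing more than the sign of the third central moment of the residuals; a minimal check (our own sketch) is:

```python
import numpy as np

def has_wrong_skew(residuals):
    """True if the residuals have a positive third central moment,
    i.e., the 'wrong' skew for a production frontier, where the
    composed error v - u is negatively skewed in the population."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()
    return float(np.mean(e ** 3)) > 0.0
```

Applying this flag to the OLS, NLLS, and MLE residuals of each replication yields the proportions quoted in the text.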

Table 1 DGP1: \( \delta = 0 \), \( \sigma_{u}^{2} = 1 \)

Although parameter estimation is not the focus of this paper, some evidence on the performance of the various estimators was generated in the course of the simulations. In Tables 1, 2 and 3 (for DGP1, DGP2 and DGP3, respectively), we give the mean, bias, variance and MSE of the OLS, NLLS and MLE estimators.

Table 2 DGP2: \( \delta = 0.5 \), \( \sigma_{u}^{2} = 0.6065 \)
Table 3 DGP3: \( \delta = 1 \), \( \sigma_{u}^{2} = 0.1353 \)

Consider first the results in Table 1. In DGP1, where \( \delta = 0 \) and therefore exp(\( z_{i}^{'} \delta \)) is constant, the OLS estimate of \( \alpha \) is the sample mean, and it correctly estimates the population mean, which equals \( \alpha - \sqrt {\frac{2}{\pi }} \sigma_{u} = - 0.7979 \). NLLS fails to give any coherent results because \( \alpha \) and \( \sigma_{u} \) are not separately identified by the first moment (conditional on \( z \)) of the data. There is no apparent problem with MLE, for which the composed error distributional assumption yields identification. So the RSCFG model gives sensible MLE results even in this case, where the RSCFG specification is not needed.
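The DGP1 population mean can be verified in one line, using only the half-normal mean formula \( E(u) = \sqrt{2/\pi }\,\sigma_{u} \):

```python
import math

# For delta = 0, E(u) = sqrt(2/pi) * sigma_u, so with alpha = 0 and
# sigma_u = 1 the population mean of y is -sqrt(2/pi) = -0.7979 (4 d.p.).
pop_mean = 0.0 - math.sqrt(2.0 / math.pi) * 1.0
print(round(pop_mean, 4))  # -0.7979
```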

Similar comments apply to the results for the other two DGPs (Tables 2 and 3). The only surprise (to us) is how poorly the NLLS estimator performs. With \( \delta \ne 0 \), we have identification from the first conditional moment but still have a great deal of difficulty sorting out \( \alpha \) from \( \sigma_{u} \). Again, MLE is much better, so the NLLS estimator is not recommended for practical use in this model.

7 Concluding remarks

In the standard normal/half-normal SFA model, the “wrong skew” problem refers to a set of problems that occur in the case that the OLS residuals are positively skewed. In this case, Waldman (1982) showed that two things happen. First, the COLS estimator does not exist. Second, there is always a stationary point of the likelihood at a parameter value that corresponds to no inefficiency, and in the wrong skew case, this is a local maximum of the likelihood.

In this paper, we investigated the extent to which these results generalize to related but different models. Specifically, we considered the RSCFG model in which the (pre-truncation) variance of the half-normal error depends on explanatory variables. There is no equivalent of the COLS estimator for this model, but we considered NLLS as well as MLE. We found that there is always a stationary point of the criterion function (for both NLLS and MLE) at a parameter value that corresponds to no inefficiency, thus generalizing one of Waldman’s results. For NLLS, we show that this parameter value is generally neither a local minimum nor a local maximum of the sum of squares. Similarly, for MLE, this parameter value is in general neither a local minimum nor a local maximum of the likelihood. Some limited simulations indicate that these stationary points are not a global minimum (for NLLS) or a global maximum (for MLE). So some of Waldman’s results generalize to the RSCFG model and some do not.

If nothing else, this paper shows the inherent limitations of proceeding on a case-by-case basis. There are some other related results in models similar to the normal/half-normal stochastic frontier model. For example, there is no analogue to Waldman’s results in the normal/exponential stochastic frontier model. However, Rho and Schmidt (2015) show that essentially the same results as Waldman’s hold in the zero inefficiency stochastic frontier model of Kumbhakar et al. (2013). Horrace and Wright (forthcoming) show that results similar to Waldman’s hold for a “delta sequence” family of distributions of inefficiency, for which the inefficiency distribution converges to a Dirac delta function located at the origin as the variance of inefficiency goes to zero. This is a considerable generalization because this family includes many (but not all) commonly assumed distributions. They indicate that their results extend to models with determinants of inefficiency, like the RSCFG model, but they do not give details. It does not appear that our Results 1 or 2 (which have to do with NLLS, not MLE) or Result 4 (the stationary point of the log-likelihood is neither a local min nor a local max) follow from their results. Some more general results would be very useful to understand why results like Waldman’s hold in one specific model and not in other closely related models.

An alternative direction of research is to find alternatives to the normal/half-normal distributional assumption that allow both positive and negative skewness, so that the skew cannot be “wrong.” Papers that do this include Carree (2002), Griffin and Steel (2008), Almanidis and Sickles (2011), Almanidis et al. (2014), Bonanno et al. (2017) and Hafner et al. (2018). A somewhat similar model is the Laplace model of Horrace and Parmeter (2018), for which the population skewness is negative but a wrong skew of the OLS or LAD residuals does not cause difficulties in inference. None of these models include environmental variables, but they could be modified to do so.

Our paper does not address the important practical question of what to do if in an empirical setting one encounters a stationary point that is the global minimum (for NLLS) or maximum (for MLE). Simar and Wilson (2009) provide guidance for the normal/half-normal model without determinants of inefficiency. Extending their results to other models like the RSCFG model is a reasonable task but one that is beyond the scope of the present paper.