1 Introduction

Consider the stochastic frontier model

$$ y_{i} = \alpha + x_{i}^{'} \beta + v_{i} - u_{i} = \alpha + x_{i}^{'} \beta + \varepsilon_{i} , $$
(1)

where \( x_{i} \) is “fixed” (independent of \( v_{i} \) and \( u_{i} \)); \( v_{i} \) is distributed as \( N\left( {0,\sigma_{v}^{2} } \right) \); \( u_{i} \) is distributed as \( N^{ + } \left( {0,\sigma_{u}^{2} } \right) \), i.e., “half normal”; and \( v_{i} \) and \( u_{i} \) are independent. We will sometimes use the standard notation that \( \sigma^{2} = \sigma_{v}^{2} + \sigma_{u}^{2} \) and \( \lambda = \sigma_{u} /\sigma_{v} \). This is the model of Aigner et al. (1977) and Meeusen and van den Broeck (1977), and we will call it the standard SFA model.

The distribution of \( \varepsilon_{i} \) is “skew normal,” and the model can be estimated by MLE using the skew normal density. An alternative method of estimation is corrected OLS (COLS), which was suggested by Aigner, Lovell and Schmidt and further analyzed in Olson et al. (1980) and Waldman (1982), and which will be described below.

Under the assumptions made above, the third moment of \( \varepsilon_{i} \) is negative. However, its sample equivalent, the third moment of the OLS residuals, can be positive. This is the so-called wrong skew case. In the wrong skew case, two problems arise. The first, noted in the original Aigner, Lovell and Schmidt article, is that the COLS estimate does not exist. The second and more subtle problem, or set of problems, was noted by Waldman. The likelihood always (regardless of wrong or right skew) has a stationary point at the parameter values that reflect no inefficiency (\( \lambda = 0 \) and other parameters = the OLS estimates). At that stationary point, the Hessian is always singular. In the wrong skew case, this point is a local maximum of the likelihood function, and empirically in this case, it is also the global maximum. Thus, with positive probability the data will indicate no inefficiency. This complicates the process of inference; see Simar and Wilson (2009).

The question this paper addresses is whether similar problems occur in models in which the distribution of technical inefficiency depends on observable “environmental” variables. Specifically, we will consider a form of the RSCFG model of Reifschneider and Stevenson (1991), Caudill and Ford (1993) and Caudill et al. (1995). Here, we obtain results that are similar to but different from those for the standard SFA model. There is no COLS estimator, but the model can be estimated by nonlinear least squares, and the least squares criterion function always has a stationary point at the parameter values that reflect no inefficiency. The model can also be estimated by MLE, and the likelihood always has a stationary point and the Hessian is singular at the parameter values that reflect no inefficiency. In general, these stationary points are neither a local minimum nor a local maximum of the relevant criterion function (sum of squares or likelihood). None of these statements have any connection to the skew of the residuals.

This paper does not aim to give advice about how to proceed if, in a particular data set, the stationary point is the global minimum of the NLLS criterion or the global maximum of the log-likelihood. Nor do we attempt to construct models in which a skew of the residuals of either sign is not wrong. Some papers that do these things will be discussed in the final section of the paper. We simply ask, for a given criterion function, whether there is always a stationary point and whether we can say that it is or is not a local maximum or minimum. These are rather specific questions, but as we will see, they are not trivial to answer.

2 More detail on the wrong skew problem in the standard SFA model

We will now be a little more precise than in the previous section about the nature of the wrong skew problem in the standard SFA model. We do this so that it is clear what results we might hope or expect to generalize to the RSCFG model.

The model is as given in Eq. (1). It is well known that \( E\left( u \right) \equiv \mu = \sqrt {\frac{2}{\pi }} \sigma_{u} \) so that \( E\left( \varepsilon \right) = - \mu \). Also \( {\text{var}}\left( u \right) = \frac{\pi - 2}{\pi }\sigma_{u}^{2} \), \( E\left( {u^{2} } \right) = \sigma_{u}^{2} \) and \( \mu_{3}^{'} \equiv E\left( {\varepsilon + \mu } \right)^{3} = - E\left( {u - \mu } \right)^{3} = \sqrt {\frac{2}{\pi }} \left( {\frac{\pi - 4}{\pi }} \right)\sigma_{u}^{3} \). Note that \( \mu_{3}^{'} \le 0 \). OLS implicitly estimates \( \left( {\alpha - \mu } \right) \) and \( \beta \), and the OLS residuals \( e_{i} \) correspondingly are “estimates” of \( v_{i} - \left( {u_{i} - \mu } \right) \). So

$$ \hat{\sigma }^{2} \equiv \frac{1}{N}\mathop \sum \limits_{i} e_{i}^{2} \to_{p} \sigma^{2} = \sigma_{v}^{2} + \sigma_{u}^{2} \,{\text{and}}\,\hat{\mu }_{3}^{'} \equiv \frac{1}{N}\mathop \sum \limits_{i} e_{i}^{3} \to_{p} \mu_{3}^{'} = \sqrt {\frac{2}{\pi }} \left( {\frac{\pi - 4}{\pi }} \right)\sigma_{u}^{3} . $$
(2)

Therefore, we obtain consistent estimates of \( \sigma_{u}^{2} \) and \( \sigma_{v}^{2} \) as

$$ \hat{\sigma }_{u}^{2} = \left[ {\sqrt {\frac{\pi }{2}} \left( {\frac{\pi }{\pi - 4}} \right)\hat{\mu }_{3}^{'} } \right]^{2/3} ,\quad \hat{\sigma }_{v}^{2} = \hat{\sigma }^{2} - \hat{\sigma }_{u}^{2} . $$
(3)

If \( \hat{\alpha },\hat{\beta } \) are the least squares estimates, the COLS estimates of \( \alpha \) and \( \beta \) are \( \tilde{\alpha } = \hat{\alpha } + \sqrt {\frac{2}{\pi }} \hat{\sigma }_{u} \) and \( \tilde{\beta } = \hat{\beta } \). However, our interest in this paper is just in the estimates of \( \sigma_{u}^{2} \) and \( \sigma_{v}^{2} \) (as given in Eq. (3)) themselves. In the wrong skew case that \( \hat{\mu }_{3}^{'} > 0 \), the bracketed quantity in Eq. (3) is negative, so the implied \( \hat{\sigma }_{u} \) is negative and \( \hat{\sigma }_{u}^{2} \) is not well defined. So the COLS method fails.

With respect to the MLE, the situation is more complicated. Waldman (1982) showed that the point \( \hat{\alpha },\hat{\beta } = \) OLS, \( \hat{\sigma }_{u}^{2} = 0 \), \( \hat{\sigma }_{v}^{2} = \hat{\sigma }^{2} \) is always a stationary point of the likelihood. He also showed that the information matrix is singular at this point. Finally, Waldman showed that the stationary point given above is a local maximum of the likelihood when \( \hat{\mu }_{3}^{'} > 0 \). It is generally thought that it is also the global maximum.

The wrong skew problem occurs most frequently when the sample size is small and the population value of \( \lambda = \sigma_{u} /\sigma_{v} \) is small. Many people might find the frequency with which it occurs to be surprising. For example, Simar and Wilson (2009, Table 1, p. 71) report simulations in which the probability of a wrong skew is 0.301 when \( n \) = 100 and \( \lambda \) = 1; it is 0.320 when \( n \) = 500 and \( \lambda \) = 0.5; and it is 0.386 when \( n \) = 10,000 and \( \lambda \) = 0.1.
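These frequencies are easy to explore by simulation. The sketch below uses an intercept-only design of our own choosing, so its output is only broadly comparable to the figures reported by Simar and Wilson:

```python
import numpy as np

def wrong_skew_prob(n, lam, reps=2000, seed=42):
    """Monte Carlo frequency of wrong (positive) skew of the OLS
    residuals in an intercept-only frontier with sigma_v = 1 and
    sigma_u = lam (so lambda = sigma_u / sigma_v = lam)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        eps = rng.normal(0.0, 1.0, n) - np.abs(rng.normal(0.0, lam, n))
        e = eps - eps.mean()          # OLS residuals, intercept only
        hits += np.mean(e ** 3) > 0.0
    return hits / reps
```

For example, `wrong_skew_prob(100, 1.0)` is roughly 0.3 in this design, in line with the order of magnitude in the table cited above.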

3 The RSCFG model

We will now consider the case that the distribution of technical inefficiency (\( u_{i} \)) depends on some observable “environmental variables” \( z_{i} \) that may or may not affect the level of the frontier but that do affect the level of technical inefficiency. A possible example of such a variable in an agricultural setting would be ownership of the farm (private vs. state-owned).

The most commonly assumed case is that the distribution of \( u_{i} \) is truncated normal. In standard notation, \( u_{i} \) is distributed as \( N^{ + } \left( {\mu_{i} ,\sigma_{i}^{2} } \right) \). When \( \mu_{i} = 0 \) and \( \sigma_{i}^{2} \) is constant (does not depend on i), we have the standard stochastic frontier model of the previous section. When \( \mu_{i} \) and \( \sigma_{i}^{2} \) are constant, we have the truncated normal model of Stevenson (1980). However, here we are interested in models in which \( \mu_{i} \) and/or \( \sigma_{i}^{2} \) depend on environmental variables \( z_{i} \). For example, in the RSCFG model of Reifschneider and Stevenson (1991), Caudill and Ford (1993) and Caudill et al. (1995), \( \mu_{i} = 0 \) and \( \sigma_{i}^{2} \) is a function of \( z_{i} \) and some parameters. In the KGMHLBC model of Kumbhakar et al. (1991), Huang and Liu (1994) and Battese and Coelli (1995), \( \sigma_{i}^{2} \) is constant (does not depend on i) and \( \mu_{i} \) is a function of \( z_{i} \) and parameters. In the model of Wang (2002), both \( \mu_{i} \) and \( \sigma_{i}^{2} \) depend on \( z_{i} \) and parameters. In the model of Alvarez et al. (2006), there is a “scaling function” \( g\left( {z_{i} ,\theta } \right) \) such that \( \mu_{i} = \mu \cdot g\left( {z_{i} ,\theta } \right) \) and \( \sigma_{i} = \sigma \cdot g\left( {z_{i} ,\theta } \right). \) A related model is the model of Amsler et al. (2015) in which the post-truncation mean and variance of \( u_{i} \) are parameterized.

In this paper, we will consider the specific case of the RSCFG model with \( \sigma_{i} = \sigma_{u} \exp \left( {z_{i}^{'} \delta } \right) \). We treat \( x_{i} \) and \( z_{i} \) as “fixed” (independent of \( v_{i} \) and \( u_{i} \)) so that the model is:

$$ y_{i} = \alpha + x_{i}^{'} \beta + v_{i} - u_{i} ,\;v_{i} \sim N\left( {0,\sigma_{v}^{2} } \right),\;u_{i} \sim N^{ + } \left( {0,\sigma_{u}^{2} { \exp }\left( {2z_{i}^{'} \delta } \right)} \right). $$
(4)

This is a straightforward extension of the standard stochastic frontier model because \( u_{i} \) is still half normal. We will use the notation \( d_{z} \) = dimension(\( z_{i} \)) and \( d_{x} \) = dimension(\( x_{i} \)).

4 Nonlinear least squares estimation of the RSCFG model

Given the RSCFG model with \( \sigma_{i} = \sigma_{u} \exp \left( {z_{i}^{'} \delta } \right) \), we have \( E\left( {y_{i} |x_{i} ,z_{i} } \right) = \alpha + x_{i}^{'} \beta - \sqrt {\frac{2}{\pi }} \sigma_{u} \exp \left( {z_{i}^{'} \delta } \right) \). This suggests a nonlinear least squares (NLLS) estimator that minimizes (with respect to \( \alpha ,\beta ,\sigma_{u} \) and \( \delta \)) the criterion function

$$ {\text{SSE}} = \mathop \sum \limits_{i} \left[ {y_{i} - \alpha - x_{i}^{'} \beta + \sqrt {\frac{2}{\pi }} \sigma_{u} { \exp }\left( {z_{i}^{'} \delta } \right)} \right]^{2} . $$
(5)

We will denote the NLLS estimates as \( \tilde{\alpha },\tilde{\beta },\tilde{\sigma }_{u} \) and \( \tilde{\delta } \).

We can note a few points about identification of the model based on this criterion function. First, obviously \( \delta \) is not identified when \( \sigma_{u} = 0 \). Second, \( \alpha \) and \( \sigma_{u} \) are not separately identified when \( \delta = 0 \). Third, this criterion function does not lead directly to an estimate of \( \sigma_{v}^{2} \), but we can derive an estimate based on the NLLS estimates. Note that if \( u_{i} \sim N^{ + } \left( {0,\sigma_{i}^{2} } \right) \), then \( E\left( {u_{i}^{2} } \right) = \sigma_{i}^{2} \) and so \( E\left( {\varepsilon_{i}^{2} } \right) = \sigma_{v}^{2} + \sigma_{i}^{2} = \sigma_{v}^{2} + \sigma_{u}^{2} \exp \left( {2z_{i}^{'} \delta } \right) \). If \( \tilde{\varepsilon }_{i} = y_{i} - \tilde{\alpha } - x_{i}^{'} \tilde{\beta } \), this leads to the estimate

$$ \tilde{\sigma }_{v}^{2} = \frac{1}{n}\mathop \sum \limits_{i} \tilde{\varepsilon }_{i}^{2} - \tilde{\sigma }_{u}^{2} \frac{1}{n}\mathop \sum \limits_{i} { \exp }\left( {2z_{i}^{'} \tilde{\delta }} \right). $$
(6)

This is similar in spirit to the estimator given in (3) for the standard SFA model.
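Equations (5) and (6) are straightforward to transcribe into code (a sketch; the function names are ours). One useful sanity check: at \( \sigma_{u} = 0 \) and the OLS values of \( \left( {\alpha ,\beta } \right) \), the criterion collapses to the OLS sum of squared residuals, whatever the value of \( \delta \).

```python
import numpy as np

K = np.sqrt(2.0 / np.pi)

def sse(y, X, Z, alpha, beta, sigma_u, delta):
    """The NLLS criterion of Eq. (5)."""
    r = y - alpha - X @ beta + K * sigma_u * np.exp(Z @ delta)
    return np.sum(r ** 2)

def sigma_v2_estimate(y, X, Z, alpha, beta, sigma_u, delta):
    """The estimate of sigma_v^2 in Eq. (6), given NLLS estimates."""
    eps = y - alpha - X @ beta
    return np.mean(eps ** 2) - sigma_u ** 2 * np.mean(np.exp(2.0 * Z @ delta))
```

With \( \sigma_{u} = 0 \), `sigma_v2_estimate` reduces to the mean squared residual, matching Eq. (6).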

Let \( \psi = \left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} \alpha \\ \beta \\ \end{array} } \\ {\sigma_{u} } \\ \delta \\ \end{array} } \right] \), the parameters that we seek to estimate by NLLS. Let \( \psi^{*} = \left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} {\hat{\alpha }} \\ {\hat{\beta }} \\ \end{array} } \\ 0 \\ 0 \\ \end{array} } \right] \), where \( \hat{\alpha },\hat{\beta } \) = OLS of \( y \) on intercept and \( x \). Obviously, \( \psi^{*} \) is a set of parameters that indicate no inefficiency. Note that we carefully said “a set of parameters that indicate no inefficiency” rather than “the set of parameters that indicate no inefficiency” because when \( \sigma_{u} = 0 \) we have no inefficiency regardless of the value of \( \delta \).

Result 1

The criterion SSE given in Eq. (5) has a stationary point at \( {\psi}^{{*}} . \)

Proof

The derivatives of the NLLS criterion function with respect to the parameters in \( \psi \) are:

$$ \nabla_{\alpha } {\text{SSE}} = - 2\mathop \sum \limits_{i} \left[ {y_{i} - \alpha - x_{i}^{'} \beta + \sqrt {\frac{2}{\pi }} \sigma_{u} { \exp }\left( {z_{i}^{'} \delta } \right)} \right], $$
(7A)
$$ \nabla_{\beta } {\text{SSE}} = - 2\mathop \sum \limits_{i} \left\{ {\left[ {y_{i} - \alpha - x_{i}^{'} \beta + \sqrt {\frac{2}{\pi }} \sigma_{u} { \exp }\left( {z_{i}^{'} \delta } \right)} \right]x_{i} } \right\}, $$
(7B)
$$ \nabla_{{\sigma_{u} }} {\text{SSE}} = 2\sqrt {\frac{2}{\pi }} \mathop \sum \limits_{i} \left\{ {\left[ {y_{i} - \alpha - x_{i}^{'} \beta + \sqrt {\frac{2}{\pi }} \sigma_{u} \exp \left( {z_{i}^{'} \delta } \right)} \right]\exp \left( {z_{i}^{'} \delta } \right)} \right\}, $$
(7C)
$$ \nabla_{\delta } {\text{SSE}} = 2\sqrt {\frac{2}{\pi }} \sigma_{u} \mathop \sum \limits_{i} \left\{ {\left[ {y_{i} - \alpha - x_{i}^{'} \beta + \sqrt {\frac{2}{\pi }} \sigma_{u} { \exp }\left( {z_{i}^{'} \delta } \right)} \right] {\text{exp}}\left( {z_{i}^{'} \delta } \right)z_{i} } \right\}. $$
(7D)

These derivatives are all zero at \( \psi^{*} \). (The derivatives in (7A), (7B) and (7D) equal zero when \( \alpha = \hat{\alpha } \), \( \beta = \hat{\beta } \) and \( \sigma_{u} = 0 \) regardless of the value of \( \delta \), but \( \delta = 0 \) is required for the derivative in (7C) to equal zero.) So this point is a stationary point of the NLLS criterion function.
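Result 1 can be confirmed numerically by evaluating the derivatives (7A)–(7D) at \( \psi^{*} \) for simulated data (a sketch; the data and seeds are illustrative):

```python
import numpy as np

K = np.sqrt(2.0 / np.pi)

def sse_gradient(y, X, Z, alpha, beta, sigma_u, delta):
    """Gradient of the NLLS criterion (5): Eqs. (7A)-(7D), stacked as
    (alpha, beta, sigma_u, delta)."""
    g = np.exp(Z @ delta)
    r = y - alpha - X @ beta + K * sigma_u * g   # the bracketed term
    return np.concatenate([
        [-2.0 * np.sum(r)],                  # (7A)
        -2.0 * X.T @ r,                      # (7B)
        [2.0 * K * np.sum(r * g)],           # (7C)
        2.0 * K * sigma_u * Z.T @ (r * g),   # (7D)
    ])
```

At \( \psi^{*} \) the whole gradient vanishes; with \( \sigma_{u} = 0 \) but \( \delta \ne 0 \), only the (7C) component is nonzero, exactly as noted in the proof.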

Next we ask whether there is any readily interpretable condition (analogous to “wrong skew” in the standard SFA model) such that this stationary point is a local minimum of the NLLS criterion function. The Hessian (second derivative) matrix \( H \) is messy and is given in Appendix 1. The Hessian evaluated at the stationary point \( \psi^{*} \) is simpler and is given by:

$$ \frac{1}{2}H\left( {\psi^{*} } \right) = \left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} n & {n\bar{x}^{'} } \\ {n\bar{x}} & {\mathop \sum \limits_{i} x_{i} x_{i}^{'} } \\ \end{array} } & {\begin{array}{*{20}c} { - kn} & 0 \\ { - kn\bar{x}} & 0 \\ \end{array} } \\ {\begin{array}{*{20}c} { - kn} & { - kn\bar{x}^{'} } \\ 0 & 0 \\ \end{array} } & {\begin{array}{*{20}c} {k^{2} n} & { 0} \\ { 0} & { 0} \\ \end{array} } \\ \end{array} } \right]. $$
(8)

In this expression, \( k = \sqrt {2/\pi } \). The four block rows (and columns) correspond to \( \alpha ,\beta ,\sigma_{u} \) and \( \delta \), and they are of dimension 1, \( d_{x} \), 1 and \( d_{z} \), respectively. The matrix is singular, with rank equal to \( d_{x} + 1 \), so there are \( d_{z} + 1 \) eigenvalues equal to zero.
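The rank claim can be checked directly by assembling the matrix in Eq. (8) and counting its zero eigenvalues (a numerical sketch; the function returns \( \frac{1}{2}H\left( {\psi^{*} } \right) \) exactly as displayed, which has the same rank and definiteness as \( H\left( {\psi^{*} } \right) \)):

```python
import numpy as np

def nlls_half_hessian(X, Z):
    """(1/2) * Hessian of SSE at psi*, per Eq. (8); depends only on the
    moments of X and the dimensions d_x and d_z."""
    n, dx = X.shape
    dz = Z.shape[1]
    k = np.sqrt(2.0 / np.pi)
    xbar = X.mean(axis=0)
    p = 1 + dx + 1 + dz                     # blocks: alpha, beta, sigma_u, delta
    H = np.zeros((p, p))
    H[0, 0] = n
    H[0, 1:1 + dx] = H[1:1 + dx, 0] = n * xbar
    H[1:1 + dx, 1:1 + dx] = X.T @ X
    H[0, 1 + dx] = H[1 + dx, 0] = -k * n
    H[1:1 + dx, 1 + dx] = H[1 + dx, 1:1 + dx] = -k * n * xbar
    H[1 + dx, 1 + dx] = k ** 2 * n          # delta block is identically zero
    return H
```

For any simulated \( X \) and \( Z \), the matrix has rank \( d_{x} + 1 \), hence \( d_{z} + 1 \) zero eigenvalues, and is positive semi-definite.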

The Hessian \( H\left( {\psi^{*} } \right) \) is positive semi-definite, which is a necessary condition for \( \psi^{*} \) to be a local minimum. However, because the Hessian is singular, this is not a sufficient condition. As in Waldman (1982), we need to examine the values of the criterion function in the directions that correspond to the eigenvectors associated with the zero eigenvalues. To elaborate on this point, consider the Taylor series expansion of \( {\text{SSE}} \) around the point \( \psi^{*} \):

$$\begin{aligned} {\text{SSE}}\left( \psi \right) - {\text{SSE}}\left( {\psi^{*} } \right) &= \left( {\nabla_{\psi } {\text{SSE}}\left( {\psi^{*} } \right)} \right)^{'} \left( {\psi - \psi^{*} } \right) + \raise.5ex\hbox{$\scriptstyle 1$}\kern-.1em/ \kern-.15em\lower.25ex\hbox{$\scriptstyle 2$} \left( {\psi - \psi^{*} } \right)^{'} H\left( {\psi^{*} } \right)\left( {\psi - \psi^{*} } \right)\\ &\quad + {\text{higher-order}}\;{\text{terms}}.\end{aligned} $$
(9)

The first term on the right-hand side of (9) equals zero because \( \nabla_{\psi } {\text{SSE}}\left( {\psi^{*} } \right) = 0 \), so we consider the second term, \( \left( {\psi - \psi^{*} } \right)^{'} H\left( {\psi^{*} } \right)\left( {\psi - \psi^{*} } \right) \). A necessary condition for \( \psi^{*} \) to be a local minimum is that \( H\left( {\psi^{*} } \right) \) is positive semi-definite, since, if there exists a vector \( g \) such that \( g^{\prime}H\left( {\psi^{*} } \right)g < 0 \), then for small enough scalar \( \tau > 0 \), \( \psi = \psi^{*} + \tau g \) will lead to a smaller \( {\text{SSE}} \) than \( \psi^{*} \). However, for vectors that are linear combinations of the eigenvectors corresponding to the zero eigenvalues (i.e., vectors in the null space of \( H\left( {\psi^{*} } \right) \)), the second-order term above is zero, and we need to investigate the behavior of the criterion function in those directions, since the higher-order terms could be of either sign.

The eigenvectors that correspond to the zero eigenvalues, and which span the null space of \( H\left( {\psi^{*} } \right) \), are as follows:

$$ \left[ {\begin{array}{*{20}c} 1 \\ 0 \\ {\begin{array}{*{20}c} {1/k} \\ 0 \\ \end{array} } \\ \end{array} } \right],\left[ {\begin{array}{*{20}c} 0 \\ 0 \\ {\begin{array}{*{20}c} 0 \\ {\iota_{1} } \\ \end{array} } \\ \end{array} } \right],\left[ {\begin{array}{*{20}c} 0 \\ 0 \\ {\begin{array}{*{20}c} 0 \\ {\iota_{2} } \\ \end{array} } \\ \end{array} } \right], \ldots ,\left[ {\begin{array}{*{20}c} 0 \\ 0 \\ {\begin{array}{*{20}c} 0 \\ {\iota_{{d_{z} }} } \\ \end{array} } \\ \end{array} } \right]. $$
(10)

Here \( \frac{1}{k} = \sqrt {\frac{\pi }{2}} \); \( \iota_{j} \) is a vector of zeroes except for a one in position \( j \), and the dimensions of the four blocks in the vectors in (10) are 1, \( d_{x} \), 1 and \( d_{z} \). Therefore, a vector in the null space of \( H\left( {\psi^{*} } \right) \) will be of the form \( w = \left[ {\begin{array}{*{20}c} 1 \\ 0 \\ {\begin{array}{*{20}c} {1/k} \\ \delta \\ \end{array} } \\ \end{array} } \right] \) where in this expression \( \delta \) is arbitrary (not necessarily the true parameter value).

Now we consider what happens to the least squares criterion function for small moves in this direction. That is, we consider a parameter value \( \psi^{o} = \psi^{*} + \tau w \) for small \( \tau > 0 \). (We require \( \tau > 0 \) because \( \sigma_{u} \ge 0 \).) Then, we calculate the following:

$$ {\text{SSE}}\left( {\psi^{*} } \right) = {\text{usual}}\,{\text{ least}}\,{\text{squares}}\,{\text{SSE}} = \mathop \sum \limits_{i} e_{i}^{2} ,e_{i} = {\text{OLS}}\,{\text{residuals,}} $$
(11A)
$$ {\text{SSE}}\left( {\psi^{o} } \right) = \mathop \sum \limits_{i} \left[ {e_{i} + \tau \left( {\exp \left( {\tau z_{i}^{'} \delta } \right) - 1} \right)} \right]^{2} , $$
(11B)
$$ \Delta \equiv {\text{SSE}}\left( {\psi^{o} } \right) - {\text{SSE}}\left( {\psi^{*} } \right) = \tau^{2} \mathop \sum \limits_{i} \left[ {\exp \left( {\tau z_{i}^{'} \delta } \right) - 1} \right]^{2} + 2\tau \mathop \sum \limits_{i} e_{i} \exp \left( {\tau z_{i}^{'} \delta } \right). $$
(11C)

For local movements (small \( \tau \)), the term of order \( \tau \) will dominate the term of order \( \tau^{2} \), and so a necessary and sufficient condition for \( \psi^{*} \) to be a local minimum of \( {\text{SSE}} \) is:

$$ 2\tau \mathop \sum \limits_{i} e_{i} { \exp }\left( {\tau z_{i}^{'} \delta } \right) \ge 0\,{\text{for}}\,{\text{all}}\,\delta . $$
(12)

This is a strong requirement that intuitively should not be expected to hold. And it does not, as the following result shows.

Result 2

If \( \mathop \sum \limits_{i} z_{i}^{'} e_{i} \ne 0, \) the point \( \psi^{*} \) is neither a local minimum nor a local maximum of \( {\text{SSE}} . \)

Proof

We have \( \Delta \) = \( 2\tau \mathop \sum \limits_{i} e_{i} \exp \left( {\tau z_{i}^{'} \delta } \right) + \tau^{2} \mathop \sum \limits_{i} \left[ {\exp \left( {\tau z_{i}^{'} \delta } \right) - 1} \right]^{2} \) as given in Eq. (11C). (For small \( \tau \), the first term will be the one that matters, which the calculation we are about to do will verify.) We make use of the Taylor series expansion \( \exp \left( x \right) = 1 + x + \raise.5ex\hbox{$\scriptstyle 1$}\kern-.1em/ \kern-.15em\lower.25ex\hbox{$\scriptstyle 2$} x^{2} \) + h.o.t. (higher-order terms), which yields

$$ \Delta = 2\tau \mathop \sum \limits_{i} e_{i} [1 + \tau z_{i}^{'} \delta + \raise.5ex\hbox{$\scriptstyle 1$}\kern-.1em/ \kern-.15em\lower.25ex\hbox{$\scriptstyle 2$} \tau^{2} \left( {z_{i}^{'} \delta } \right)^{2} ] + \tau^{2} \mathop \sum \limits_{i} \left[ {\tau z_{i}^{'} \delta + \raise.5ex\hbox{$\scriptstyle 1$}\kern-.1em/ \kern-.15em\lower.25ex\hbox{$\scriptstyle 2$} \tau^{2} \left( {z_{i}^{'} \delta } \right)^{2} } \right]^{2} + {\text{ h}}.{\text{o}}.{\text{t}}. $$
(13)

Since \( \mathop \sum \nolimits_{i} e_{i} = 0 \), when \( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} \ne 0 \) the dominant term is \( 2\tau^{2} \mathop \sum \nolimits_{i} \left( {z_{i}^{'} e_{i} } \right)\delta \).

This term can be made of either sign by appropriate choice of \( \delta \). For example, suppose that the first nonzero element of \( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} \) is in position “j” and that it is positive. Then, if we pick \( \delta \) to be equal to zero except for a value of one in position “j,” this term will be positive, and if instead we put a value of minus one in position “j,” the term will be negative. (Reverse the signs when the first nonzero element of \( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} \) is negative.) So \( \Delta \) can be of either sign, and the condition in Eq. (12) cannot hold.

If \( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} = 0 \), the dominant term in the expression for \( \Delta \) will be \( \tau^{3} \mathop \sum \nolimits_{i} e_{i} \left( {z_{i}^{'} \delta } \right)^{2} \), and we cannot find anything useful to say about the sign of that term. However, we note that \( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} = \) 0 is an event of probability zero unless \( z_{i} \) is made up of linear combinations of \( x_{i} \). This is formally possible but very unlikely in empirical practice, so this case is not of particular practical importance.
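The sign argument in the proof can be illustrated numerically: for small \( \tau \), \( \Delta \) of Eq. (11C) is positive when \( \delta \) is aligned with \( \mathop \sum \nolimits_{i} z_{i} e_{i} \) and negative when it is anti-aligned (a sketch; the data and the value of \( \tau \) are illustrative):

```python
import numpy as np

def delta_sse(e, Z, tau, delta):
    """Delta of Eq. (11C): SSE(psi* + tau*w) - SSE(psi*), where
    w = (1, 0, 1/k, delta)' and e are the OLS residuals."""
    g = np.exp(tau * (Z @ delta))
    return tau ** 2 * np.sum((g - 1.0) ** 2) + 2.0 * tau * np.sum(e * g)

# Illustrative data: intercept-only frontier, so the OLS residuals are
# the demeaned composed errors.
rng = np.random.default_rng(3)
n = 200
Z = rng.normal(size=(n, 2))
eps = rng.normal(size=n) - np.abs(rng.normal(size=n))
e = eps - eps.mean()
s = Z.T @ e                       # sum_i z_i e_i (nonzero almost surely)
d = s / np.linalg.norm(s)         # delta aligned with sum_i z_i e_i
```

With \( \tau = 10^{-3} \), the dominant term \( 2\tau^{2} \left( {\mathop \sum \nolimits_{i} z_{i}^{'} e_{i} } \right)\delta \) makes \( \Delta \) positive in direction `d` and negative in direction `-d`, so \( \psi^{*} \) is neither a local minimum nor a local maximum.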

5 MLE of the RSCFG model

We now consider the MLE of the RSCFG model. The model is as given in Eq. (4).

We define the following notation:

$$ \begin{aligned} \sigma_{i} & = \sigma_{u} { \exp }\left( {z_{i}^{'} \delta } \right),\,\lambda_{i} = \sigma_{i} /\sigma_{v} ,\,\omega_{i}^{2} = \sigma_{i}^{2} + \sigma_{v}^{2} ,\,c_{i} = - \varepsilon_{i} \lambda_{i} /\omega_{i} , \\ \varphi_{i} & = \varphi \left( {c_{i} } \right),\,\varPhi_{i} = \varPhi \left( {c_{i} } \right), \\ \end{aligned} $$
(14)

where \( \varphi \) is the standard normal density and \( \varPhi \) is the standard normal cdf. Then, the log-likelihood is given by

$$ \ln L = {\text{constant}}\, - \frac{1}{2}\mathop \sum \limits_{i} \ln \omega_{i}^{2} - \frac{1}{2}\mathop \sum \limits_{i} \frac{{\varepsilon_{i}^{2} }}{{\omega_{i}^{2} }} + \mathop \sum \limits_{i} \ln \varPhi_{i} , $$
(15)

where \( \varepsilon_{i} = y_{i} - \alpha - x_{i}^{'} \beta \).

Let \( \theta = \left[ {\begin{array}{*{20}c} \alpha \\ \beta \\ {\begin{array}{*{20}c} {\sigma_{u} } \\ \delta \\ {\sigma_{v}^{2} } \\ \end{array} } \\ \end{array} } \right] \), the vector of the parameters we wish to estimate. This differs from \( \psi \) of the previous section because it includes \( \sigma_{v}^{2} \). Let \( \theta^{*} = \left[ {\begin{array}{*{20}c} {\hat{\alpha }} \\ {\hat{\beta }} \\ {\begin{array}{*{20}c} 0 \\ 0 \\ {\hat{\sigma }_{v}^{2} } \\ \end{array} } \\ \end{array} } \right] \), our potential stationary point, where \( \hat{\alpha } \) and \( \hat{\beta } \) are the OLS estimates and \( \hat{\sigma }_{v}^{2} = \frac{1}{n}\mathop \sum \nolimits_{i} e_{i}^{2} \), where the \( e_{i} \) are the OLS residuals. Then, we have the following result.

Result 3

The log-likelihood given in Eq. (15) has a stationary point at \({\theta}^{{*}} . \)

Proof

The derivatives of the log-likelihood with respect to the parameters in \( \theta \) are:

$$ \nabla_{\alpha } \ln L = \mathop \sum \limits_{i} \left[ {\frac{{\varepsilon_{i} }}{{\omega_{i}^{2} }} + \frac{{\varphi_{i} }}{{\varPhi_{i} }}\frac{{\lambda_{i} }}{{\omega_{i} }}} \right], $$
(16A)
$$ \nabla_{\beta } \ln L = \mathop \sum \limits_{i} \left[ {\frac{{\varepsilon_{i} }}{{\omega_{i}^{2} }} + \frac{{\varphi_{i} }}{{\varPhi_{i} }}\frac{{\lambda_{i} }}{{\omega_{i} }}} \right]x_{i} , $$
(16B)
$$ \nabla_{{\sigma_{u} }} \ln L = \mathop \sum \limits_{i} \left[ { - \frac{{\sigma_{u} }}{{\omega_{i}^{2} }}\exp \left( {2z_{i}^{'} \delta } \right) + \frac{{\sigma_{u} }}{{\omega_{i}^{4} }}{ \exp }\left( {2z_{i}^{'} \delta } \right)\varepsilon_{i}^{2} - \frac{{\varphi_{i} }}{{\varPhi_{i} }}\frac{1}{{\sigma_{v} }}\frac{{\varepsilon_{i} }}{{\omega_{i} }}{ \exp }\left( {z_{i}^{'} \delta } \right) + \sigma_{u} \frac{{\varphi_{i} }}{{\varPhi_{i} }}\frac{{\lambda_{i} \varepsilon_{i} }}{{\omega_{i}^{3} }}{ \exp }\left( {2z_{i}^{'} \delta } \right)} \right], $$
(16C)
$$ \nabla_{\delta } \ln L = \mathop \sum \limits_{i} z_{i} \left[ { - \frac{{\varphi_{i} }}{{\varPhi_{i} }}\frac{{\varepsilon_{i} \lambda_{i} }}{{\omega_{i} }} - \frac{{\sigma_{u}^{2} }}{{\omega_{i}^{2} }}\exp \left( {2z_{i}^{'} \delta } \right) + \sigma_{u}^{2} \frac{{\varepsilon_{i}^{2} }}{{\omega_{i}^{4} }}\exp \left( {2z_{i}^{'} \delta } \right) + \sigma_{u}^{2} \frac{{\varphi_{i} }}{{\varPhi_{i} }}\frac{{\lambda_{i} \varepsilon_{i} }}{{\omega_{i}^{3} }}{ \exp }\left( {2z_{i}^{'} \delta } \right)} \right], $$
(16D)
$$ \nabla_{{\sigma_{v}^{2} }} \ln L = \frac{1}{2}\mathop \sum \limits_{i} \left[ {\frac{{\varphi_{i} }}{{\varPhi_{i} }}\frac{1}{{\sigma_{v}^{2} }}\frac{{\lambda_{i} \varepsilon_{i} }}{{\omega_{i} }} - \frac{1}{{\omega_{i}^{2} }} + \frac{{\varepsilon_{i}^{2} }}{{\omega_{i}^{4} }} + \frac{{\varphi_{i} }}{{\varPhi_{i} }}\frac{{\lambda_{i} \varepsilon_{i} }}{{\omega_{i}^{3} }}} \right]. $$
(16E)

At \( \theta^{*} \), the following simplifications occur: \( \sigma_{u} = 0 \), \( \lambda_{i} = 0,c_{i} = 0,\omega_{i}^{2} = \sigma_{v}^{2} \), \( \frac{{\varphi_{i} }}{{\varPhi_{i} }} = \sqrt {\frac{2}{\pi }} \) and \( { \exp }\left( {z_{i}^{'} \delta } \right) \) = 1. Then, we have \( \nabla_{\alpha } \ln L = \frac{1}{{\sigma_{v}^{2} }}\mathop \sum \nolimits_{i} \varepsilon_{i} \), \( \nabla_{\beta } \ln L = \frac{1}{{\sigma_{v}^{2} }}\mathop \sum \nolimits_{i} x_{i} \varepsilon_{i} \), \( \nabla_{{\sigma_{u} }} \ln L = - \sqrt {\frac{2}{\pi }} \frac{1}{{\sigma_{v}^{2} }}\mathop \sum \nolimits_{i} \varepsilon_{i} \), \( \nabla_{\delta } \ln L = 0 \) and \( \nabla_{{\sigma_{v}^{2} }} \ln L = \frac{1}{2}\mathop \sum \nolimits_{i} \left[ { - \frac{1}{{\sigma_{v}^{2} }} + \frac{1}{{\sigma_{v}^{4} }}\varepsilon_{i}^{2} } \right] \). All of these expressions equal zero when evaluated at the OLS values \( e_{i} \) and \( \hat{\sigma }_{v}^{2} \).
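Result 3 can also be confirmed numerically. The sketch below codes the log-likelihood of Eqs. (14)–(15), taking the constant to be \( n\ln 2 - \frac{n}{2}\ln 2\pi \) (the usual normalization), and checks the gradient at \( \theta^{*} \) by central finite differences:

```python
import math
import numpy as np

def log_phi_cdf(x):
    """Elementwise log of the standard normal cdf (math.erf avoids a
    SciPy dependency; adequate away from the extreme tails)."""
    return np.log(0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in x])))

def loglik(theta, y, X, Z):
    """Log-likelihood of Eqs. (14)-(15), theta stacked as
    (alpha, beta, sigma_u, delta, sigma_v^2)."""
    n, dx = X.shape
    dz = Z.shape[1]
    alpha, beta = theta[0], theta[1:1 + dx]
    sigma_u, delta = theta[1 + dx], theta[2 + dx:2 + dx + dz]
    sv2 = theta[-1]
    eps = y - alpha - X @ beta
    sig_i = sigma_u * np.exp(Z @ delta)                 # sigma_i
    om2 = sig_i ** 2 + sv2                              # omega_i^2
    c = -eps * (sig_i / np.sqrt(sv2)) / np.sqrt(om2)    # c_i = -eps_i*lambda_i/omega_i
    return (n * np.log(2.0) - 0.5 * n * np.log(2.0 * np.pi)
            - 0.5 * np.sum(np.log(om2))
            - 0.5 * np.sum(eps ** 2 / om2)
            + np.sum(log_phi_cdf(c)))

def num_grad(f, t, h=1e-6):
    """Central finite-difference gradient."""
    g = np.zeros_like(t)
    for j in range(t.size):
        tp, tm = t.copy(), t.copy()
        tp[j] += h
        tm[j] -= h
        g[j] = (f(tp) - f(tm)) / (2.0 * h)
    return g
```

On simulated data, the finite-difference gradient at \( \theta^{*} \) is zero to rounding error; the value of the log-likelihood at \( \theta^{*} \) also matches Eq. (20) below once the constant is made explicit.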

Next we ask whether we can identify cases such that this may be a local maximum of the likelihood. We use a Taylor series expansion (similar to Eq. (9)) of \( \ln L \) around the point \( \theta^{*} \):

$$\begin{aligned} \ln L\left( \theta \right) - \ln L\left( {\theta^{*} } \right) &= (\nabla_{\theta } \ln L\left( {\theta^{*} } \right))^{'} (\theta - \theta^{*} ) + \raise.5ex\hbox{$\scriptstyle 1$}\kern-.1em/ \kern-.15em\lower.25ex\hbox{$\scriptstyle 2$} \left( {\theta - \theta^{*} } \right)^{'} H\left( {\theta^{*} } \right)\left( {\theta - \theta^{*} } \right)\\ &\quad + {\text{ higher-order}}\,{\text{terms}} .\end{aligned}$$
(17)

Here \( H\left( {\theta^{*} } \right) \) is the Hessian (second derivative matrix) evaluated at \( \theta^{*} \). The first term on the right-hand side of (17) equals zero because \( \nabla_{\theta } \ln L\left( {\theta^{*} } \right) = 0 \), so we need to consider the second term, \( \left( {\theta - \theta^{*} } \right)^{'} H\left( {\theta^{*} } \right)\left( {\theta - \theta^{*} } \right) \). A necessary condition for \( \theta^{*} \) to be a local maximum of \( \ln L \) is that \( H\left( {\theta^{*} } \right) \) is negative semi-definite, since, if there exists a vector \( \nu \) such that \( \nu^{\prime}H\left( {\theta^{*} } \right)\nu > \) 0, then for small enough scalar \( \tau > 0 \), \( \theta = \theta^{*} + \tau \nu \) will lead to a higher likelihood value than \( \theta^{*} \).

The Hessian is complicated and is given in a supplemental file available from the authors. However, at \( \theta^{*} \) the simplifications listed above hold, and we obtain the following expression for the Hessian evaluated at \( \theta^{*} \).

$$ H = - \frac{1}{{\sigma_{v}^{2} }}\left[ {\begin{array}{*{20}c} G & 0 \\ 0 & {\frac{n}{{2\sigma_{v}^{2} }}} \\ \end{array} } \right], $$
(18A)
$$ G = \left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} {A_{11} } & { A_{12} } \\ {A_{21} } & { A_{22} } \\ \end{array} } & {\begin{array}{*{20}c} { - kA_{11} } & 0 \\ { - kA_{21} } & 0 \\ \end{array} } \\ {\begin{array}{*{20}c} { - kA_{11} } & { - kA_{12} } \\ 0 & 0 \\ \end{array} } & {\begin{array}{*{20}c} { k^{2} A_{11} } & \xi \\ { \xi^{\prime}} & 0 \\ \end{array} } \\ \end{array} } \right]. $$
(18B)

Here, \( k= \sqrt {\frac{2}{\pi }} \); \( \xi = k\mathop \sum \limits_{i} z_{i}^{'} e_{i} \) (a row vector); and

$$ \left[ {\begin{array}{*{20}c} {A_{11} } & {A_{12} } \\ {A_{21} } & {A_{22} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} n & {n\bar{x}^{'} } \\ {n\bar{x}} & {\mathop \sum \limits_{i} x_{i} x_{i}^{'} } \\ \end{array} } \right] $$
(18C)

which is positive definite.

Suppose first that \( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} \ne 0 \) (\( \xi \ne 0) \), which will be the case with probability one unless \( z_{i} \) is made up of linear combinations of \( x_{i} \). When \( \xi \ne 0 \), \( G \) and \( H \) are nonsingular if \( d_{z} \) = 1 (so that \( \xi \) is a scalar), and they are singular if \( d_{z} \ge \) 2. Regardless of the dimension of \( \xi \), \( H \) will be negative semi-definite if and only if \( G \) is positive semi-definite. We now show that this is not the case.

Result 4

If \( \varvec{\xi}\ne 0 \), then \( \varvec{G} \) is neither positive semi-definite nor negative semi-definite.

Proof

(i) Let \( \eta_{1} = \left[ {k,0,1,\xi } \right]' \). Then, \( \eta_{1}^{'} G\eta_{1} = 2\xi \xi^{\prime} > 0 \) so \( G \) cannot be negative semi-definite. (ii) Now let \( \eta_{2} = \left[ {k,0,1, - \xi } \right]' \). Then, \( \eta_{2}^{'} G\eta_{2} = - 2\xi \xi^{\prime} < 0 \) so \( G \) cannot be positive semi-definite.

This shows that when \( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} \ne 0 \), \( \theta^{*} \) cannot be a local maximum of the likelihood. (The necessary condition for \( \theta^{*} \) to be a local maximum, namely that \( H \) should be negative semi-definite, fails when \( G \) is not positive semi-definite.) It also cannot be a local minimum.
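Result 4 is likewise easy to verify numerically: assemble \( G \) from Eqs. (18B)–(18C) and evaluate the two quadratic forms used in the proof (a sketch with illustrative simulated data):

```python
import numpy as np

def G_matrix(X, Z, e):
    """The matrix G of Eqs. (18B)-(18C), with k = sqrt(2/pi) and
    xi = k * sum_i e_i z_i' (a row vector, stored as a 1-d array)."""
    n, dx = X.shape
    dz = Z.shape[1]
    k = np.sqrt(2.0 / np.pi)
    xbar = X.mean(axis=0)
    xi = k * (Z.T @ e)
    p = 1 + dx + 1 + dz                     # blocks: alpha, beta, sigma_u, delta
    G = np.zeros((p, p))
    G[0, 0] = n
    G[0, 1:1 + dx] = G[1:1 + dx, 0] = n * xbar
    G[1:1 + dx, 1:1 + dx] = X.T @ X
    G[0, 1 + dx] = G[1 + dx, 0] = -k * n
    G[1:1 + dx, 1 + dx] = G[1 + dx, 1:1 + dx] = -k * n * xbar
    G[1 + dx, 1 + dx] = k ** 2 * n
    G[1 + dx, 2 + dx:] = G[2 + dx:, 1 + dx] = xi
    return G, xi, k
```

The quadratic forms \( \eta_{1}^{'} G\eta_{1} = 2\xi \xi^{\prime} \) and \( \eta_{2}^{'} G\eta_{2} = - 2\xi \xi^{\prime} \) follow purely from the structure of \( G \), so the check below holds for any data with \( \xi \ne 0 \).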

Finally, we consider the case that \( \xi = 0 \) (\( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} = 0) \), which essentially means that \( z_{i} \) is made up of linear combinations of \( x_{i} \). This case is very similar to the case considered in Sect. 4. The eigenvectors that correspond to the zero eigenvalues, and which span the null space of \( H\left( {\theta^{*} } \right) \), are as follows:

$$ \left[ {\begin{array}{*{20}c} 1 \\ 0 \\ {\begin{array}{*{20}c} {1/k} \\ {\begin{array}{*{20}c} 0 \\ 0 \\ \end{array} } \\ \end{array} } \\ \end{array} } \right],\left[ {\begin{array}{*{20}c} 0 \\ 0 \\ {\begin{array}{*{20}c} 0 \\ {\begin{array}{*{20}c} {\iota_{1} } \\ 0 \\ \end{array} } \\ \end{array} } \\ \end{array} } \right],\left[ {\begin{array}{*{20}c} 0 \\ 0 \\ {\begin{array}{*{20}c} 0 \\ {\begin{array}{*{20}c} {\iota_{2} } \\ 0 \\ \end{array} } \\ \end{array} } \\ \end{array} } \right], \ldots ,\left[ {\begin{array}{*{20}c} 0 \\ 0 \\ {\begin{array}{*{20}c} 0 \\ {\begin{array}{*{20}c} {\iota_{{d_{z} }} } \\ 0 \\ \end{array} } \\ \end{array} } \\ \end{array} } \right]. $$
(19)

So a vector in the null space of \( H\left( {\psi^{*} } \right) \) will be of the form \( w = \left[ {1,0,1/k,\delta^{'} ,0} \right]' \) for arbitrary \( \delta \). Now we consider what happens to the log-likelihood function for small movements in this direction. That is, we consider a parameter value \( \theta^{o} = \theta^{*} + \tau w \) for small \( \tau > 0 \). It is easy to calculate

$$ \ln L\left( {\theta^{*} } \right) = {\text{constant}}\, - \frac{n}{2} - n\ln 2 - \frac{n}{2}\ln \hat{\sigma }_{v}^{2} . $$
(20)

However, \( \ln L(\theta^{*} + \tau w) \) yields an explicit but impenetrable expression, which we give in Appendix 2. We do not see any way to say anything meaningful in this case. This is arguably not troublesome, because (as noted above) \( \mathop \sum \nolimits_{i} z_{i}^{'} e_{i} = 0 \) is a zero-probability event unless the elements of \( z_{i} \) are linear combinations of \( x_{i} \).

6 Practical implications

It is certainly possible for the residuals (OLS or NLLS or MLE) to have a positive (wrong) skew. The main practical implication of our results is that this does not mean that there is a problem. There is always a stationary point (\( \psi^{*} \), above) of the NLLS criterion that would indicate zero inefficiency, and similarly there is always a stationary point (\( \theta^{*} \)) of the log-likelihood that would indicate zero inefficiency. However, in general these points are not a local minimum of the sum of squares or a local maximum of the log-likelihood. This does not appear to have anything to do with the skew of the residuals.

We will illustrate these issues with a small simulation. The DGP will be as simple as possible: \( y_{i} = \alpha + v_{i} - u_{i} \), \( i = 1, \ldots ,n \), where \( v_{i} \) is \( N\left( {0,\sigma_{v}^{2} } \right) \) and (conditional on \( z_{i} \)) \( u_{i} \) is half normal with pre-truncation variance \( \sigma_{u}^{2} \cdot \exp \left( {2z_{i} \delta } \right) \). Here, \( z_{i} \) is \( N(0,1) \), \( \alpha = 0 \), \( \sigma_{v}^{2} = 1 \) and \( n = 100 \). The number of replications is 1000.

We use three different choices for \( \delta \) and \( \sigma_{u}^{2} \). DGP1 has \( \delta = 0 \) and \( \sigma_{u}^{2} = 1 \). This is the same DGP as for the standard SFA model with \( \lambda = 1 \). DGP2 has \( \delta = 0.5 \) and \( \sigma_{u}^{2} = 0.6065 \) (\( \sigma_{u} = 0.7788 \)), and DGP3 has \( \delta = 1 \) and \( \sigma_{u}^{2} = 0.1353 \) (\( \sigma_{u} = 0.3678 \)). Using the properties of the lognormal distribution, we have \( E\left( {u^{2} } \right) = \sigma_{u}^{2} e^{{2\delta^{2} }} \), and our values of \( \delta \) and \( \sigma_{u}^{2} \) are picked such that \( \delta \) varies but \( E\left( {u^{2} } \right) = 1 \) for each of the DGPs.
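As a concreteness check, the DGP and the moment condition \( E\left( {u^{2} } \right) = \sigma_{u}^{2} e^{{2\delta^{2} }} \) can be sketched in a few lines of Python. The function and variable names below are our own illustration, not code from any published implementation:

```python
import numpy as np

def draw_sample(n, alpha, sigma_v2, sigma_u2, delta, rng):
    """Draw one sample from y_i = alpha + v_i - u_i, where, conditional on
    z_i ~ N(0,1), u_i is half normal with pre-truncation variance
    sigma_u^2 * exp(2 * z_i * delta)."""
    z = rng.standard_normal(n)
    v = rng.normal(0.0, np.sqrt(sigma_v2), size=n)
    u = np.abs(rng.standard_normal(n)) * np.sqrt(sigma_u2) * np.exp(z * delta)
    return z, alpha + v - u, u

# The three DGPs: delta varies, but sigma_u^2 is chosen so that
# E(u^2) = sigma_u^2 * exp(2 * delta^2) = 1, using the lognormal
# moment E[exp(2 * delta * z)] = exp(2 * delta^2) for z ~ N(0,1).
for name, delta, sigma_u2 in [("DGP1", 0.0, 1.0),
                              ("DGP2", 0.5, 0.6065),
                              ("DGP3", 1.0, 0.1353)]:
    print(name, sigma_u2 * np.exp(2 * delta ** 2))
```

Each printed value is 1 to within rounding of the stated \( \sigma_{u}^{2} \) values, confirming that the three DGPs hold \( E\left( {u^{2} } \right) \) fixed while \( \delta \) varies.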

For each replication, we estimate the model in three ways. (i) OLS, which estimates \( \alpha \) and \( \sigma_{\varepsilon }^{2} \equiv {\text{var}}\left( {\varepsilon_{i} } \right) = \sigma_{v}^{2} + \frac{\pi - 2}{\pi }\sigma_{u}^{2} e^{{\delta^{2} }} \); (ii) NLLS, which estimates \( \alpha ,\delta \) and \( \sigma_{u}^{2} \); and (iii) MLE, which estimates \( \alpha ,\delta ,\sigma_{u}^{2} \) and \( \sigma_{v}^{2} \). We then calculate the mean, bias, variance and MSE of each of the estimates. We count the number of times that each of the various residuals has the wrong skew. Finally, we count how often the NLLS estimator equals \( \psi^{*} \) and how often the MLE equals \( \theta^{*} \). We need to be explicit about how this is defined. For example, the event “MLE is equal to \( \theta^{*} \)” is the complement of the event “MLE is not equal to \( \theta^{*} , \)” and this latter event occurs if and only if the likelihood evaluated at the MLE is larger than the likelihood evaluated at \( \theta^{*} \). That is, we compare likelihood values, rather than comparing the MLE parameter estimates to \( \theta^{*} \), and similarly for NLLS. Inequality is easier to verify or disprove numerically than equality.
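The comparison of criterion values described above can be implemented as a simple tolerance check. The helpers below are a hypothetical sketch (the names and the tolerance are ours), assuming the criterion has already been evaluated at the estimator and at the stationary point:

```python
def mle_equals_theta_star(loglik_mle, loglik_theta_star, tol=1e-8):
    """Declare the MLE 'equal to theta*' unless the maximized log-likelihood
    exceeds the log-likelihood at the stationary point by more than tol.
    Comparing criterion values avoids numerically fragile equality tests
    on the parameter vectors themselves."""
    return loglik_mle <= loglik_theta_star + tol

def nlls_equals_psi_star(ssr_nlls, ssr_psi_star, tol=1e-8):
    """Same idea for NLLS: the estimator differs from psi* only if it
    attains a strictly smaller sum of squared residuals."""
    return ssr_nlls >= ssr_psi_star - tol
```

In the simulations, the event "MLE is not equal to \( \theta^{*} \)" corresponds to `mle_equals_theta_star` returning `False`, and similarly for NLLS.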

The results strongly support the supposition that the residuals can have a wrong skew, but that this does not mean that there is a wrong skew problem. The proportions of replications in which the OLS residuals have a wrong skew (positive third moment) are 0.304, 0.104 and 0.064 for DGP1, DGP2 and DGP3, respectively. (As an aside, the value of 0.304 is very close to the value of 0.301 reported by Simar and Wilson (2009), Table 1, for the same parameter values.) The proportions of replications with a wrong skew for the NLLS residuals are 0.019, 0.001 and 0.001, while for the MLE residuals they are 0.072, 0.003 and 0.002. So a wrong skew can occur. However, it was never the case, for any of the 1000 replications for any of the three DGPs, that the NLLS estimator was equal to \( \psi^{*} \) or that the MLE was equal to \( \theta^{*} \). This is obviously a striking difference between the RSCFG model and the standard SFA model, where a wrong skew of the OLS residuals implies that the MLE is degenerate at a value that implies no inefficiency.
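The wrong-skew counts reported above rest on nothing more than the sign of the third central moment of the residuals; a minimal check (our own sketch) is:

```python
import numpy as np

def has_wrong_skew(residuals):
    """True if the residuals have a positive third central moment,
    i.e., the 'wrong' skew for a production frontier, where the
    composed error v - u is negatively skewed in the population."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()
    return float(np.mean(e ** 3)) > 0.0
```

Applying this flag to the OLS, NLLS, and MLE residuals of each replication yields the proportions quoted in the text.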

Table 1 DGP1: \( \delta = 0 \), \( \sigma_{u}^{2} = 1 \)

Although parameter estimation is not the focus of this paper, some evidence on the performance of the various estimators was generated in the course of the simulations. In Tables 1, 2 and 3 (for DGP1, DGP2 and DGP3, respectively), we give the mean, bias, variance and MSE of the OLS, NLLS and MLE estimators.

Table 2 DGP2: \( \delta = 0.5 \), \( \sigma_{u}^{2} = 0.6065 \)
Table 3 DGP3: \( \delta = 1 \), \( \sigma_{u}^{2} = 0.1353 \)

Consider first the results in Table 1. In DGP1, where \( \delta = 0 \) and therefore exp(\( z_{i}^{'} \delta \)) is constant, the OLS estimate of \( \alpha \) is the sample mean, and it correctly estimates the population mean, which equals \( \alpha - \sqrt {\frac{2}{\pi }} \sigma_{u} = - 0.7979 \). NLLS fails to give any coherent results because \( \alpha \) and \( \sigma_{u} \) are not separately identified by the first moment (conditional on \( z \)) of the data. There is no apparent problem with MLE, for which the composed error distributional assumption yields identification. So the RSCFG model gives sensible MLE results even in this case, where the RSCFG specification is not needed.
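The DGP1 population mean can be verified in one line, using only the half-normal mean formula \( E(u) = \sqrt{2/\pi }\,\sigma_{u} \):

```python
import math

# For delta = 0, E(u) = sqrt(2/pi) * sigma_u, so with alpha = 0 and
# sigma_u = 1 the population mean of y is -sqrt(2/pi) = -0.7979 (4 d.p.).
pop_mean = 0.0 - math.sqrt(2.0 / math.pi) * 1.0
print(round(pop_mean, 4))  # -0.7979
```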

Similar comments apply to the results for the other two DGPs (Tables 2 and 3). The only surprise (to us) is how poorly the NLLS estimator performs. With \( \delta \ne 0 \), we have identification from the first conditional moment but still have a great deal of difficulty sorting out \( \alpha \) from \( \sigma_{u} \). Again, MLE is much better, so the NLLS estimator is not recommended for practical use in this model.

7 Concluding remarks

In the standard normal/half-normal SFA model, the “wrong skew” problem refers to a set of problems that occur in the case that the OLS residuals are positively skewed. In this case, Waldman (1982) showed that two things happen. First, the COLS estimator does not exist. Second, there is always a stationary point of the likelihood at a parameter value that corresponds to no inefficiency, and in the wrong skew case, this is a local maximum of the likelihood.

In this paper, we investigated the extent to which these results generalize to related but different models. Specifically, we considered the RSCFG model in which the (pre-truncation) variance of the half-normal error depends on explanatory variables. There is no equivalent of the COLS estimator for this model, but we considered NLLS as well as MLE. We found that there is always a stationary point of the criterion function (for both NLLS and MLE) at a parameter value that corresponds to no inefficiency, thus generalizing one of Waldman’s results. For NLLS, we show that this parameter value is generally neither a local minimum nor a local maximum of the sum of squares. Similarly, for MLE, this parameter value is in general neither a local minimum nor a local maximum of the likelihood. Some limited simulations indicate that these stationary points are not a global minimum (for NLLS) or a global maximum (for MLE). So some of Waldman’s results generalize to the RSCFG model and some do not.

If nothing else, this paper shows the inherent limitations of proceeding on a case-by-case basis. There are some other related results in models similar to the normal/half-normal stochastic frontier model. For example, there is no analogue to Waldman’s results in the normal/exponential stochastic frontier model. However, Rho and Schmidt (2015) show that essentially the same results as Waldman’s hold in the zero inefficiency stochastic frontier model of Kumbhakar et al. (2013). Horrace and Wright (forthcoming) show that results similar to Waldman’s hold for a “delta sequence” family of distributions of inefficiency, for which the inefficiency distribution converges to a Dirac delta function located at the origin as the variance of inefficiency goes to zero. This is a considerable generalization because this family includes many (but not all) commonly assumed distributions. They indicate that their results extend to models with determinants of inefficiency, like the RSCFG model, but they do not give details. It does not appear that our Results 1 or 2 (which have to do with NLLS, not MLE) or Result 4 (the stationary point of the log-likelihood is neither a local min nor a local max) follow from their results. Some more general results would be very useful to understand why results like Waldman’s hold in one specific model and not in other closely related models.

An alternative direction of research is to find alternatives to the normal/half-normal distributional assumption that allow both positive and negative skewness, so that the skew cannot be “wrong.” Papers that do this include Carree (2002), Griffin and Steel (2008), Almanidis and Sickles (2011), Almanidis et al. (2014), Bonanno et al. (2017) and Hafner et al. (2018). A somewhat similar model is the Laplace model of Horrace and Parmeter (2018), for which the population skewness is negative but a wrong skew of the OLS or LAD residuals does not cause difficulties in inference. None of these models include environmental variables, but they could be modified to do so.

Our paper does not address the important practical question of what to do if in an empirical setting one encounters a stationary point that is the global minimum (for NLLS) or maximum (for MLE). Simar and Wilson (2009) provide guidance for the normal/half-normal model without determinants of inefficiency. Extending their results to other models like the RSCFG model is a reasonable task but one that is beyond the scope of the present paper.