1 Introduction

In parametric stochastic frontier models for cross-sectional data, the error term is composed as the sum of a two-sided noise component and a one-sided inefficiency component. The canonical model of Aigner et al. (1977) assumes that the noise distribution is zero-mean normal and the inefficiency distribution is half-normal. Other common distributional assumptions on the inefficiency term include the exponential (Meeusen and van den Broeck 1977; Aigner et al. 1977), the truncated normal (Stevenson 1980), and the gamma (Stevenson 1980; Greene 1980). For surveys, see Greene (2007) and Kumbhakar and Lovell (2000). The beauty of the canonical model is that it allows for continuous inefficiency, does not require estimation of a support bound (in what amounts to a deconvolution problem), and has only two unknown distributional parameters, making maximum likelihood estimation simple relative to other specifications, except (perhaps) for the normal-exponential model. Consequently, the normal-half normal specification of the model has found widespread use.

In the normal-half normal production function specification, the skewness of the composed error is negative, and parameters can be estimated by maximum likelihood estimation (MLE) or corrected ordinary least squares (COLS).Footnote 1 Waldman (1982) shows that when the skewness of the ordinary least squares (OLS) residuals is positive, OLS is a local maximum of the likelihood function, and estimated inefficiency is zero in the sample.Footnote 2 Horrace and Wright (2020) generalize the Waldman result to the case where the inefficiency distribution is from a scalable parametric class, like the exponential distribution. This “wrong skewness” phenomenon is widely documented in the literature and is often regarded as an estimation failure.Footnote 3 When it occurs, researchers are advised to either obtain a new sample (which is rarely feasible) or respecify the model. While there are many ways one can respecify a model, a reasonable approach would be to choose an inefficiency distribution that accommodates residuals of either positive or negative skew. For example, Li (1996), Carree (2002), Almanidis and Sickles (2011), Almanidis et al. (2014), and Hafner et al. (2019) develop models with new distributional assumptions on the inefficiency component that allow for skew in either direction.

An alternative “solution” to the problem that does not require respecification is that of Simar and Wilson (2010), who argue that “wrong skewness” is not an estimation or misspecification failure, but a finite sample problem that most likely occurs when the ratio of the inefficiency variance to the noise variance (the inefficiency variance ratio) is small. That is, wrong skewness may not indicate that the normal-half normal model is wrong or that inefficiency does not exist in the population. They propose a bootstrap method (called “bagging”) to construct confidence intervals for model parameters and expected inefficiency which have higher coverage than traditional intervals, regardless of residual skewness direction. Then, the sample under study can still be used to infer the model parameters. While bagging can be applied to any parametric form of the model, their technique is specifically intended to salvage the canonical model when the residual skew has the wrong sign. Such is the spirit of this research.

We take Simar and Wilson’s (2010) view that wrong skewness may be a consequence of a small inefficiency variance ratio in finite samples, even when the canonical model is properly specified.Footnote 4 However, instead of the bagging approach of Simar and Wilson (2010), this paper provides a finite sample adjustment to existing estimators in the presence of wrong skewness. That is, we impose a negative residual skewness constraint in the MLE (or COLS) algorithm. A natural candidate for this constraint is the upper bound of the population skew, which is a monotonic function of the positive lower bound of the inefficiency variance ratio in the half-normal model. However, the constraint is nonlinear in the parameters of interest, complicating computation of the optimum. Therefore, a linearization approximation of the constraint is proposed. Additionally, a model selection approach is proposed to determine the lower bound of the inefficiency variance ratio used in the constraint. Monte Carlo experiments suggest that our correction becomes more reliable as the true inefficiency variance ratio increases. A shortcoming of the approach is that in finite samples the linear approximation may not be accurate enough to guarantee a negative residual skew, but it will always give us nonzero estimates for firm-level inefficiencies. The possible failure of correct residual skewness using the linearized constraint illustrates a trade-off between computational complexity and accuracy. Using the original nonlinear constraint avoids this issue, but the computational simplicity of our approach, as shown in our Monte Carlo experiments and empirical example, would be lost.

The proposed finite sample adjustment provides a nonzero point estimate for technical (or cost) inefficiency that may be used in applied research and that serves as a complement to the bagging intervals of Simar and Wilson (2010). That is, if the wrong skew exists in the canonical model, then one may use Simar and Wilson’s bagging technique to construct confidence intervals on efficiency, while the proposed methodology provides nonzero point estimates without having to respecify the model. Since wrong skewness can occur fairly regularly (even when efficiency may exist in the population under study), the finite sample adjustment is particularly attractive in cases where the half-normal inefficiency assumption is maintained. It is noteworthy that the proposed adjustment is only needed in finite samples, for as the sample size increases wrong skewness is less likely to be an issue when the ratio of inefficiency variance to noise variance is sizable.

The rest of this paper is organized as follows. The next section discusses the wrong skewness issue in the literature. In Sect. 3, we propose a finite sample correction approach. To simplify computation of the proposed constrained estimation, a linearized version of the constraint is used, so that constrained MLE (or COLS) can be easily implemented in most software packages (like STATA). The constrained estimators are discussed in Sect. 4. In Sect. 5, Monte Carlo experiments are conducted to study the properties of constrained COLS. An empirical example is used to illustrate the proposed approach in Sect. 6, and all the point estimates of inefficiency lie within the confidence intervals of Simar and Wilson (2010). The last section concludes.

2 Wrong skewness issue

A stochastic production frontier (SPF) model for a cross-sectional sample of size N is:

$$\begin{aligned} y_{i}=x_{i}^{\prime }\beta +\varepsilon _{i},\ \ \ i=1,\ldots ,N, \end{aligned}$$
(1)

with composed error \(\varepsilon _{i}=v_{i}-u_{i}.\) The disturbance \(v_{i}\) is assumed \(iidN(0,\sigma _{v}^{2})\). Inefficiency of firm i is characterized by \(u_{i}\ge 0\). In the SPF literature, \(u_{i}\) is usually assumed to be iid half-normal, \(|N(0,\sigma _{u}^{2})|\) (Aigner et al. 1977; Wang and Schmidt 2009), and independent of \(v_{i}\), with variance \(Var(u_{i})=\frac{\pi -2}{\pi }\sigma _{u}^{2}\). The first component of the \(p\times 1\) vector \(x_{i}\) is 1, so the intercept term is contained in the \(p\times 1\) slope parameter vector \(\beta \). As in Aigner et al. (1977) and Simar and Wilson (2010), let \(\sigma ^{2}=\sigma _{u}^{2}+\sigma _{v}^{2}\) and \(\lambda =\sigma _{u}/\sigma _{v}\). The parameters to be estimated are \(\theta =(\beta ,\lambda ,\sigma ^{2})\).

There are two primary estimators suggested in the literature: the maximum likelihood estimator and corrected least squares (Aigner et al. 1977; Olson et al. 1980). Under the normal-half normal specification, the MLE of \((\beta ,\lambda ,\sigma ^{2})\) is the set of parameter values maximizing the likelihood function:

$$\begin{aligned} \ln L(\beta ,\lambda ,\sigma ^{2}|(y_{i},x_{i}),i&=1,...,N) \nonumber \\&=\frac{N}{2}\ln \left( \frac{2}{\pi }\right) -\frac{N}{2}\ln \sigma ^{2}\nonumber \\&\quad +\sum \limits _{i=1}^{N}\ln \left[ 1-\Phi \left( \frac{\lambda }{\sqrt{ \sigma ^{2}}}(y_{i}-x_{i}^{\prime }\beta )\right) \right] \nonumber \\&\quad -\frac{1}{2\sigma ^{2}}\sum \limits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta )^{2}, \end{aligned}$$
(2)

where \(\Phi \left( \cdot \right) \) is the standard normal cumulative distribution function. The COLS estimate of \(\beta \) is simply the least squares slope estimate in the regression of \(y_{i}\) on \(x_{i}\). However, the mean of \(\varepsilon _{i}=v_{i}-u_{i}\) is negative due to the term \( -u_{i}\), so the COLS estimate needs to be adjusted by adding the bias, \( \sqrt{2\sigma _{u}^{2}/\pi }\), back into the intercept estimator. The bias can be consistently estimated using the variance estimates:

$$\begin{aligned} {\hat{\sigma }}_{u}^{2}=\left[ \sqrt{\frac{\pi }{2}}\left( \frac{\pi }{\pi -4} \right) {\hat{\mu }}_{3}^{\prime }\right] ^{2/3},\quad {\hat{\sigma }} _{v}^{2}={\hat{\mu }}_{2}^{\prime }-\frac{\pi -2}{\pi }{\hat{\sigma }}_{u}^{2}, \end{aligned}$$
(3)

where \({\hat{\mu }}_{2}^{\prime }\) and \({\hat{\mu }}_{3}^{\prime }\) are the second and third central moments of the least squares residuals.
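For concreteness, the COLS variance calculation in (3) takes only a few lines. The following Python snippet is our own illustration (function and variable names are ours, not from any package), assuming `e` is the vector of OLS residuals from a regression that includes an intercept:

```python
import numpy as np

def cols_variances(e):
    """COLS variance estimates from OLS residual moments, Eq. (3)."""
    mu2 = np.mean(e**2)     # second central moment (residuals have mean zero)
    mu3 = np.mean(e**3)     # third central moment
    if mu3 > 0:
        raise ValueError("wrong skewness: the COLS estimator in (3) does not exist")
    # mu3 < 0 and pi/(pi - 4) < 0, so the bracketed term in (3) is positive
    sigma_u2 = (np.sqrt(np.pi / 2) * (np.pi / (np.pi - 4)) * mu3) ** (2 / 3)
    sigma_v2 = mu2 - (np.pi - 2) / np.pi * sigma_u2
    # the COLS intercept adds the bias sqrt(2 * sigma_u2 / pi) to the OLS intercept
    return sigma_u2, sigma_v2
```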

Both MLE and COLS are consistent. The Monte Carlo experiments in Olson et al. (1980) show that there is little difference between MLE and COLS for the slope coefficients in finite samples. For the intercept and variance parameters, however, MLE and COLS differ. In addition to MLE and COLS, Olson et al. (1980) also consider a third consistent estimator, the two-step Newton–Raphson estimator, which has different finite sample properties than MLE and COLS.

Waldman (1982) discovers an important property of MLE: for the likelihood function (2) above, the point \((b,0,s^{2})\) is a stationary point, where b and \(s^{2}\) are the OLS estimates of \(\beta \) and \(\sigma ^{2}\). Intuitively, when \(\lambda =0\), the term \(u_{i}\) disappears, so the likelihood function of the SPF model (2) reduces to that of a linear model with \(u_{i}=0\). A salient result in Waldman (1982) is that when the skewness of the OLS residuals is positive, i.e., \({\hat{\mu }}_{3}^{\prime }>0\), then \((b,0,s^{2})\) is a local maximum in the parameter space of the likelihood function.Footnote 5 This is the so-called wrong skewness issue in the literature, because \(\mu _{3}^{\prime }<0\) in the normal-half normal model. Olson et al. (1980) refer to this phenomenon as “Type I failure” since the COLS estimator defined in (3) does not exist when \(\hat{\mu }_{3}^{\prime }>0\).

The Monte Carlo studies in Simar and Wilson (2010) show that the wrong skewness issue is not rare, even when the inefficiency variance ratio is considerably large. For example, the frequency of wrong skewness could be 30% for a sample of size 100 when \(\lambda =\sigma _{u}/\sigma _{v}=1\). Wrong skewness casts doubt on the specification of the SPF model (Greene 2007). Moreover, it invalidates the calculation of standard errors of parameter estimates (Simar and Wilson 2010).

Greene (2007) considers OLS residual skewness a useful diagnostic tool for the normal-half normal model. Wrong skewness suggests there is little evidence of inefficiency in the sample, implying that firms in the sample are “super efficient.” Thus, \(\lambda \) and \(\sigma _{u}^{2}\) are assumed to be zero, and the stochastic frontier model reduces to a production function without the inefficiency term.Footnote 6 Another interpretation of the wrong skewness issue is that the normal-half normal model is not the correct specification. Other specifications may well reveal the presence of inefficiency and reconcile the distribution of one-sided inefficiency with the data. The binomial distribution considered by Carree (2002) and doubly truncated normal distribution proposed by Almanidis and Sickles (2011) and Almanidis et al. (2014) could have either negative or positive skewness. They argue that models with ambiguous skewness may be more appropriate in applied research.

Simar and Wilson (2010) argue that wrong skewness is a finite sample problem, even when the model is correctly specified.Footnote 7 They show that a bootstrap aggregating method provides useful information about inefficiency and the model parameters, regardless of whether residuals are skewed in the desired direction. We also consider wrong skewness to be a consequence of estimation in finite samples when the inefficiency variance ratio \( Var(u_{i})/Var(\varepsilon _{i})\) is small.Footnote 8 Since the OLS residuals of a production function regression with \(u_{i}=0\) display skewness in either direction with probability 50%, a sample drawn from an SPF model with small inefficiency variance ratio could generate positively skewed residuals with high probability.Footnote 9
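The last point is easy to check numerically. A minimal simulation (our own illustration, not from the paper) with \(u_{i}=0\) shows that the OLS residual skewness is positive in roughly half of the draws:

```python
import numpy as np

rng = np.random.default_rng(0)
N, reps, wrong = 100, 5000, 0
for _ in range(reps):
    x = rng.normal(size=N)
    y = 1.0 + 0.8 * x + rng.normal(size=N)            # no inefficiency: u_i = 0
    X = np.column_stack([np.ones(N), x])
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]  # OLS residuals
    wrong += np.mean(e**3) > 0                        # positive ("wrong") skew
print(wrong / reps)                                   # approximately 0.5
```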

3 Finite sample correction

As illustrated by Simar and Wilson (2010), wrong skewness may occur even when the inefficiency variance ratio is sizable, so simply setting \(\sigma _{u}^{2}=0\) when the skewness is positive could be a mistake. Instead of the improved interval estimates proposed by Simar and Wilson (2010), this paper proposes a finite sample adjustment to existing estimators in the presence of wrong skewness.

For MLE, a non-positive residual skewness constraint is imposed:

$$\begin{aligned}&\max \ln L(\beta ,\lambda ,\sigma ^{2}|(y_{i},x_{i}),i=1,...,N)\nonumber \\&\quad s.t.\,\frac{1}{N}\sum _{i=1}^{N}\left[ \frac{y_{i}-{\bar{y}} -x_{i}^{\prime }\beta +{\bar{x}}^{\prime }\beta }{\sqrt{\frac{1}{N} \sum \nolimits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta -{\bar{y}}+{\bar{x}}^{\prime }\beta )^{2}}}\right] ^{3}\le 0, \end{aligned}$$
(4)

where \({\bar{y}}=\frac{1}{N}\sum \nolimits _{i=1}^{N}y_{i}\) and \({\bar{x}}=\frac{1 }{N}\sum \nolimits _{i=1}^{N}x_{i}\). Unfortunately, when implementing maximum likelihood estimation with the inequality constraint defined by (4), there is a practical issue. As pointed out by Waldman (1982), in the case of positive skewness of residuals, OLS \((b,0,s^{2})\) is a local maximum and the unconstrained MLE is equal to \((b,0,s^{2})\). Since OLS is a local maximum in the parameter space of unconstrained MLE, the constraint (4) is always binding at the maximum, leading to zero skewness of the constrained MLE residuals.Footnote 10

If we regard the sign of residual skewness as an important indicator of model specification, the constrained MLE above seems unsatisfactory. We therefore propose a (negative) upper bound of skewness instead of zero in (4). This is relevant for empirical modeling. As in the empirical example below, when there is evidence of technical inefficiency in the data (Greene 2007, p. 202), its variance cannot be too small relative to that of the composed error \(\varepsilon _{i}\). Denote the inefficiency variance ratio by

$$\begin{aligned} k=Var(u_{i})/Var(\varepsilon _{i}). \end{aligned}$$

That is, a lower bound on the inefficiency variance ratio is implicitly imposed, \(k\ge k_{0}\). From this perspective, to impose a positive value of \(k_{0}\) is to obtain a nonzero estimated inefficiency.

To develop the relationship between the upper bound of skewness and the lower bound of the inefficiency variance ratio, consider the second and third moments of \(\varepsilon _{i}\). Under the normal-half normal specification, Olson et al. (1980) show that

$$\begin{aligned} Var(\varepsilon _{i})=\sigma _{v}^{2}+\frac{\pi -2}{\pi }\sigma _{u}^{2} \, \end{aligned}$$
(5)

and

$$\begin{aligned} E[\varepsilon _{i}-E(\varepsilon _{i})]^{3}=\sigma _{u}^{3}\sqrt{2/\pi } [(\pi -4)/\pi ]. \end{aligned}$$
(6)

Using \(\gamma _{1}(\varepsilon _{i})\) to denote the skewness of \(\varepsilon _{i}\), we have

$$\begin{aligned} \gamma _{1}(\varepsilon _{i})=E\left[ \frac{\varepsilon _{i}-E(\varepsilon _{i})}{\sqrt{Var(\varepsilon _{i})}}\right] ^{3}=-\left[ \frac{Var(u_{i})}{ Var(\varepsilon _{i})}\right] ^{3/2}\gamma _{1}(u_{i})=-k^{3/2}\gamma _{1}(u_{i}), \end{aligned}$$

where \(\gamma _{1}(u_{i})=\frac{4-\pi }{\pi -2}\sqrt{\frac{2}{\pi -2 }}\simeq 0.9953\), a constant for a half-normal distribution of \( u_{i}\). Denote \(g(k)=\gamma _{1}(\varepsilon _{i})=-k^{3/2}\gamma _{1}(u_{i})\). Since \(\gamma _{1}(u_{i})>0\), \(g(k)<0\) (e.g., \( g(0.1)\approx -0.0315\), \(g(0.2)\approx -0.0890\) and \(g(0.3)\approx -0.1635\) ) and \(g^{\prime }(k)=-\frac{3}{2}k^{1/2}\gamma _{1}(u_{i})<0\). An important property of g(k) is that it is a monotonically decreasing function of k. This implies that any upper bound, say \(g_{0}\), of the population skewness, \( g(k)=\gamma _{1}(\varepsilon _{i})\le g_{0}\), is equivalent to a lower bound, denoted by \(k_{0}\), of the inefficiency variance ratio, \(k\ge k_{0}\) , i.e., \(g_{0}=g(k_{0})<0\).
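Tabulating \(g(k)\) makes the monotonicity concrete; a short sketch (our code, our naming):

```python
import numpy as np

GAMMA1_U = (4 - np.pi) / (np.pi - 2) * np.sqrt(2 / (np.pi - 2))   # ~0.9953

def g(k):
    """Population skewness of the composed error at inefficiency variance ratio k."""
    return -k**1.5 * GAMMA1_U

for k in (0.1, 0.2, 0.3):
    print(f"g({k}) = {g(k):.4f}")    # -0.0315, -0.0890, -0.1635
```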

We impose this upper bound on the sample skewness, by replacing 0 in the constraint (4) with the negative upper bound of the population skewness, \(g(k_{0})\). Consequently, a modified constraint

$$\begin{aligned} \frac{1}{N}\sum _{i=1}^{N}\left[ \frac{y_{i}-{\bar{y}}-x_{i}^{\prime }\beta + {\bar{x}}^{\prime }\beta }{\sqrt{\frac{1}{N}\sum \nolimits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta -{\bar{y}}+{\bar{x}}^{\prime }\beta )^{2}}}\right] ^{3}\le g(k_{0}) \end{aligned}$$

is used in the constrained MLE in the event of wrong skewness of the OLS residuals.

Based on Waldman’s (1982) argument, the constraint above will also be binding at a maximum in the neighborhood of OLS. The constraint becomes

$$\begin{aligned} \frac{1}{N}\sum _{i=1}^{N}\left[ \frac{y_{i}-{\bar{y}}-x_{i}^{\prime }\beta + {\bar{x}}^{\prime }\beta }{\sqrt{\frac{1}{N}\sum \nolimits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta -{\bar{y}}+{\bar{x}}^{\prime }\beta )^{2}}}\right] ^{3}=g(k_{0}). \end{aligned}$$
(7)

This finite sample adjustment gives a constrained estimator of parameter vector \((\beta ,\lambda ,\sigma ^{2})\).

The constrained COLS slope coefficients can be similarly defined. We use constraint (7), but replace the likelihood (2) with the sum of squared residuals as the objective function of a minimization problem. Since COLS reduces to OLS in the presence of wrong skewness and OLS is a local maximum of the likelihood, as a finite sample adjustment to OLS, the constrained COLS slope coefficients are expected to be close to their constrained MLE counterparts.

3.1 Linearizing the constraint

The nonlinearity of the constraint (7) in \(\beta \) creates computational difficulties in calculating the constrained MLE. To simplify computation, a linearized version of the constraint (7) is considered. Given that OLS is a local maximum of the likelihood in the presence of wrong skewness, empiricists normally start by estimating OLS with \(u_{i}=0\). This is the first step in LIMDEP (Greene 1995) and FRONTIER (Coelli 1996). If the skewness of the OLS residuals is positive, then OLS is the optimum and the point of departure for our linearization.

Since the primary concern is skewness correction, we impose the additional restriction that the MLE residual variance \(\frac{1}{N}\sum \nolimits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta -{\bar{y}}+{\bar{x}}^{\prime }\beta )^{2}\) is equal to that of OLS residuals, \({\hat{\mu }}_{2}^{\prime }\), which is a consistent estimator of the error variance. Thus, the linearized constraint becomes:

$$\begin{aligned} \frac{1}{N}\sum \limits _{i=1}^{N}[y_{i}-{\bar{y}}-(x_{i}-{\bar{x}})^{\prime }\beta ]^{3}=g(k_{0})\cdot ({\hat{\mu }}_{2}^{\prime })^{3/2}. \end{aligned}$$

Denote \(f(\beta )=\frac{1}{N}\sum \nolimits _{i=1}^{N}[y_{i}-{\bar{y}}-(x_{i}-{\bar{x}})^{\prime }\beta ]^{3}\). The first-order Taylor expansion of \(f(\beta )\) at the OLS estimate \({\hat{\beta }}_\mathrm{OLS}\) is:

$$\begin{aligned} f(\beta )\approx f({\hat{\beta }}_\mathrm{OLS})+\left[ \frac{\partial f(\beta )}{ \partial \beta }_{|{\hat{\beta }}_\mathrm{OLS}}\right] ^{\prime }\cdot (\beta -\hat{ \beta }_\mathrm{OLS}), \end{aligned}$$

where \(\frac{\partial f(\beta )}{\partial \beta }_{|{\hat{\beta }}_\mathrm{OLS}}\) is the derivative of \(f(\beta )\) with respect to \(\beta \) evaluated at \(\hat{ \beta }_\mathrm{OLS}\). \(f({\hat{\beta }}_\mathrm{OLS})\) is the third central moment of OLS residuals, i.e., \({\hat{\mu }}_{3}^{\prime }\). Now,

$$\begin{aligned} \frac{\partial f(\beta )}{\partial \beta }=-\frac{3}{N}\sum \limits _{i=1}^{N}[y_{i}-{\bar{y}}-(x_{i}-{\bar{x}})^{\prime }\beta ]^{2}(x_{i}- {\bar{x}}), \end{aligned}$$

and

$$\begin{aligned} \frac{\partial f(\beta )}{\partial \beta }_{|{\hat{\beta }}_\mathrm{OLS}}=-\frac{3}{N} \sum _{i=1}^{N}e_{i}^{2}(x_{i}-{\bar{x}}), \end{aligned}$$

where \(e_{i}\) denotes the OLS residual \( y_{i}-x_{i}^{\prime }{\hat{\beta }}_\mathrm{OLS}\). Its sample mean is equal to zero since a constant term is included in the regression. Hence, an approximation of the constraint (7) is

$$\begin{aligned} {\hat{\mu }}_{3}^{\prime }-\frac{3}{N}\sum _{i=1}^{N}e_{i}^{2}(x_{i}-{\bar{x}} )^{\prime }(\beta -{\hat{\beta }}_\mathrm{OLS})=g(k_{0})\cdot ({\hat{\mu }}_{2}^{\prime })^{3/2}, \end{aligned}$$
(8)

or

$$\begin{aligned} \left[ \frac{1}{N}\sum _{i=1}^{N}e_{i}^{2}(x_{i}-{\bar{x}})\right] ^{\prime }(\beta - {\hat{\beta }}_\mathrm{OLS})=\frac{{\hat{\mu }}_{3}^{\prime }}{3}-\frac{g(k_{0})}{3}(\hat{ \mu }_{2}^{\prime })^{3/2}. \end{aligned}$$
(9)

Letting the \(N\times 1\) vector \({\tilde{e}}\) be the squared OLS residual vector \((e_{1}^{2},...,e_{N}^{2})^{\prime }\), the constraint above can be written in matrix form as

$$\begin{aligned} \frac{1}{N}{\tilde{e}}^{\prime }M_{0}X(\beta -{\hat{\beta }}_\mathrm{OLS})=\frac{\hat{\mu }_{3}^{\prime }}{3}-\frac{g(k_{0})}{3}({\hat{\mu }}_{2}^{\prime })^{3/2}, \end{aligned}$$

where \(M_{0}=I-\frac{1}{N}\iota \iota ^{\prime }\) and \(\iota =(1,...,1)^{\prime }\). Thus, the linear constraint above can be written as

$$\begin{aligned} R\beta =q(k_{0}) \end{aligned}$$
(10)

with \(R=\frac{1}{N}{\tilde{e}}^{\prime }M_{0}X\) and \(q(k_{0})=R{\hat{\beta }} _\mathrm{OLS}+\frac{{\hat{\mu }}_{3}^{\prime }}{3}+\frac{\Pi }{3}k_{0}^{3/2}({\hat{\mu }} _{2}^{\prime })^{3/2}\), where \(\Pi \equiv \gamma _{1}(u_{i})\simeq 0.9953\), so that \(q(k_{0})\) depends on the value of \(k_{0}\).Footnote 11
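Both \(R\) and \(q(k_{0})\) are simple functions of OLS output. A minimal sketch (our naming conventions; `Pi` stands for \(\gamma _{1}(u_{i})\)):

```python
import numpy as np

def linear_constraint(X, y, k0, Pi=0.9953):
    """Build R and q(k0) of Eq. (10); X includes a column of ones."""
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_ols                         # OLS residuals (mean zero)
    mu2, mu3 = np.mean(e**2), np.mean(e**3)      # second and third central moments
    R = (e**2) @ (X - X.mean(axis=0)) / len(y)   # (1/N) e~' M0 X
    q = R @ beta_ols + mu3 / 3 + (Pi / 3) * k0**1.5 * mu2**1.5
    return R, q, beta_ols, mu2, mu3
```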

Therefore, the proposed finite sample correction for MLE of \((\beta ,\lambda ,\sigma ^{2})\), i.e., the constrained MLE, is defined as the solution to maximizing the likelihood (2) subject to the linear constraint (10). The corresponding estimators of \(\sigma _{u}^{2}\) and \(\sigma _{v}^{2}\) can be obtained by using the relationship \(\sigma ^{2}=\sigma _{u}^{2}+\sigma _{v}^{2}\) and \(\lambda =\sigma _{u}/\sigma _{v}\).

Similarly, the constrained COLS of \(\beta \) is defined to minimize the sum of squared residuals subject to (10). As in the unconstrained estimation, the constrained estimators of \(\sigma _{u}^{2}\) and \(\sigma _{v}^{2}\) can be obtained by formula (3).

If \(k_{0}=0\), then \(g(k_{0})=0\) and the constraint above becomes \(R(\beta - {\hat{\beta }}_\mathrm{OLS})={\hat{\mu }}_{3}^{\prime }/3\). This implies that the constrained and unconstrained estimators would be similar, since \({\hat{\mu }} _{3}^{\prime }\) is usually very small in the presence of wrong skewness. In the extreme case of \({\hat{\mu }}_{3}^{\prime }=0\), the constrained estimator reduces to OLS, which is a local maximum of the likelihood.

Using the linearized constraint (10), the estimates, standard errors, and confidence intervals of the constrained MLE and constrained COLS can be easily obtained using Stata or other existing software.Footnote 12
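As an alternative to Stata, the constrained MLE can also be sketched with an off-the-shelf optimizer. The SciPy-based snippet below is our own illustration, not the authors' code; it reuses `linear_constraint` from above and maximizes the likelihood (2) subject to (10):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def constrained_mle(X, y, k0):
    """Maximize the log-likelihood (2) subject to the linear constraint (10)."""
    N, p = X.shape
    R, q, beta_ols, mu2, _ = linear_constraint(X, y, k0)

    def negloglik(theta):
        beta, lam, sig2 = theta[:p], theta[p], theta[p + 1]
        eps = y - X @ beta
        # log[1 - Phi(z)] computed stably as norm.logcdf(-z)
        ll = (N / 2) * np.log(2 / np.pi) - (N / 2) * np.log(sig2) \
             + np.sum(norm.logcdf(-lam * eps / np.sqrt(sig2))) \
             - np.sum(eps**2) / (2 * sig2)
        return -ll

    cons = [{"type": "eq", "fun": lambda th: R @ th[:p] - q}]   # R beta = q(k0)
    bounds = [(None, None)] * p + [(0.0, None), (1e-8, None)]   # lambda, sigma^2 >= 0
    x0 = np.concatenate([beta_ols, [0.1, mu2]])                 # start near OLS
    res = minimize(negloglik, x0, method="SLSQP", bounds=bounds, constraints=cons)
    return res.x[:p], res.x[p], res.x[p + 1]                    # beta, lambda, sigma^2
```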

However, since (10) does not guarantee a negative residual skewness in finite samples, there is a possibility that wrong skewness could still occur after our correction. The Monte Carlo experiments below show that this may only be a concern when the underlying inefficiency variance ratio is very small. However, as stressed above, by setting \( k_{0}>0\), a nonzero estimated inefficiency may be obtained even in the presence of wrong residual skewness.

3.2 Choosing the value of \(k_{0}\)

The idea of the proposed constrained estimators is to adjust the slope coefficients to obtain a correct sign of residual skewness using the constraint (10), which is a function of \(k_{0}\). When the chosen value of \(k_{0}\) is small, the resulting adjustment is slight, and the constrained MLE (or constrained COLS) will be close to its unconstrained counterpart.

Choosing a specific value of \(k_{0}\) is an empirical issue. On the one hand, when there is a priori evidence of inefficiency, the inefficiency variance ratio cannot be too small. On the other hand, as illustrated by the Monte Carlo study in Simar and Wilson (2010), wrong skewness is less likely to occur as the inefficiency variance ratio increases.Footnote 13 In the spirit of this trade-off, we develop a model selection criterion to choose \(k_{0}\). The idea is to incorporate a penalty function, so that as \(k_{0}\) increases the penalty decreases. Hence, the fit of the model and the effect of the constraint on the optimum can be balanced.

For constrained MLE, we propose a Bayesian information criterion (BIC) via the likelihood to choose the value of \(k_{0}\):

$$\begin{aligned} BIC(k_{0})=-2l_{r}(k_{0})-k_{0}\ln N, \end{aligned}$$

where \(l_{r}(k_{0})\) is the log-likelihood evaluated at the constrained MLE of \((\beta ,\lambda ,\sigma ^{2})\), which depends on \(k_{0}\). Since OLS \( (b,0,s^{2})\) is a local maximum of the log-likelihood function in the presence of positive skewness with a restriction on \(k_{0}\), the value of \( l_{r}(k_{0})\) decreases with \(k_{0}\) in the neighborhood of \((b,0,s^{2})\).Footnote 14 Unlike the usual BIC, here we use a negative sign in front of the penalty term \(k_{0}\ln N\), so that \(-2l_{r}(k_{0})\) and \(-k_{0}\ln N\) move in opposite directions with \(k_{0}\). An optimal value of \(k_{0}\) is chosen to minimize \(BIC(k_{0})\):

$$\begin{aligned} {\tilde{k}}_{0}=\arg \min _{k_{0}\in [0,1)}BIC(k_{0}). \end{aligned}$$

Similarly, for the constrained COLS, a criterion based on sum of squared residuals is proposed to select the value of \(k_{0}\):

$$\begin{aligned} C(k_{0})=\frac{1}{N}SSR_{r}(k_{0})-k_{0}{\hat{\sigma }}_{\varepsilon }^{2}\frac{ \ln N}{N}, \end{aligned}$$

where \(SSR_{r}(k_{0})\) is the sum of squared residuals of OLS with the constraint (10). \(C(k_{0})\) is a Mallows’ \(C_{p}\)-type criterion, similar to the expression proposed by Bai and Ng (2002) to choose the number of factors in approximate factor models, except that the penalty term takes a negative sign. By applying the properties of the usual restricted least squares, it can be shown that \(SSR_{r}(k_{0})\) increases with \(k_{0}\) (see “Appendix”). Hence, the effect of increasing \(k_{0}\) on the model fit can be balanced by the penalty term, and an appropriate value of \(k_{0}\) is chosen to minimize \(C(k_{0})\):

$$\begin{aligned} {\hat{k}}_{0}=\arg \min _{k_{0}\in [0,1)}C(k_{0}). \end{aligned}$$

The estimated error variance \({\hat{\sigma }}_{\varepsilon }^{2}\) provides an appropriate scaling to the penalty term. Here, we use \({\hat{\sigma }} _{\varepsilon }^{2}=\frac{1}{N}SSR\), where SSR is the sum of squared residuals of OLS without constraint.

In practice, to find the value of \({\tilde{k}}_{0}\) (or \({\hat{k}}_{0}\)), a grid search can be applied to \(BIC(k_{0})\) (or \(C(k_{0})\)), starting from a small positive value, e.g., 0.05.
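Such a grid search is only a few lines; the helper below (our own sketch) accepts either criterion as a callable:

```python
import numpy as np

def choose_k0(criterion, lo=0.05, hi=0.90, step=0.01):
    """Return the grid value of k0 minimizing criterion(k0), e.g., BIC or C."""
    grid = np.arange(lo, hi + step, step)
    return grid[int(np.argmin([criterion(k0) for k0 in grid]))]
```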

Since the measures of model fit in the constrained MLE and COLS (i.e., the objective functions of the penalized maximum likelihood and the penalized least squares) are different, \({\tilde{k}}_{0}\) is not necessarily equal to \({\hat{k}}_{0}\). However, in the neighborhood of OLS \((b,0,s^{2})\) with a small value of \(\lambda \), when the partial derivatives of the term \(\sum \nolimits _{i=1}^{N}\ln \left[ 1-\Phi \left( \frac{\lambda }{\sqrt{\sigma ^{2}}}(y_{i}-x_{i}^{\prime }\beta )\right) \right] \) in \(l(\beta ,\lambda ,\sigma ^{2})\) are small in the first-order conditions, \({\tilde{k}}_{0}\) should be close to \({\hat{k}}_{0}\).

It is worthwhile to note that \(k_{0}\) is not a model parameter here and is selected by the proposed selection criteria only for finite sample correction. Thus, choosing \(k_{0}\) is inherently different from model selection in the literature, such as choosing the number of model parameters, where consistency is a primary requirement for the penalty term. Therefore, we could use different penalty terms in \(BIC(k_{0})\) or \(C(k_{0})\) above as long as a unique value of \(k_{0}\) can be chosen. The Monte Carlo experiments and empirical example below suggest that the proposed selection criteria work well.Footnote 15

4 Constrained estimators

With the proposed finite sample adjustment, the sample can still be used to construct a point estimate for inferring population parameters in the presence of wrong skewness. This is similar in spirit to Simar and Wilson (2010), who still rely on the MLE estimation results, but provide more accurate interval estimates using improved inference (bagging) methods.

As previously mentioned, any negative constraint on sample skewness is binding in the presence of wrong skewness. This result implies that the estimated \(\lambda \) (or k) is implicitly determined by the constraint (10). Consequently, it is biased when the selected value of \(k_{0}\), the lower bound of k, is not equal to the true value of k. Inconsistency of the proposed constrained estimators might be a concern. However, this concern may be overstated. Under the true specification, as the sample size increases, wrong skewness is less likely to appear. Thus, the proposed finite sample adjustment becomes unnecessary, and asymptotics are less of a concern. In addition, given its finite sample nature, the proposed method should be regarded as an adjustment to existing estimators, rather than as a new estimator.Footnote 16

In the next subsection, properties of constrained estimators are studied. Since the constrained COLS is essentially restricted least squares, which has an analytical solution, we mainly focus on it.

4.1 Constrained COLS

The proposed constrained COLS, denoted by \({\hat{\beta }}_{r}\), is a two-step estimator. In the first step, for a given \(k_{0}\), the constrained COLS \( {\hat{\beta }}_{r}(k_{0})\) is defined as the solution of

$$\begin{aligned} \min _{\beta }SSR(\beta )= & {} \min _{\beta }(Y-X\beta )^{\prime }(Y-X\beta ) \\ s.t.\,R\beta= & {} q(k_{0}). \end{aligned}$$

In the second step, \(k_{0}\) is selected such that \({\hat{k}}_{0}=\arg \min _{k_{0}}C(k_{0})\), where \(C(k_{0})=\frac{1}{N}(Y-X{\hat{\beta }} _{r}(k_{0}))^{\prime }(Y-X{\hat{\beta }}_{r}(k_{0}))-k_{0}{\hat{\sigma }} _{\varepsilon }^{2}\frac{\ln N}{N}\). The proposed constrained COLS is defined as \({\hat{\beta }}_{r}={\hat{\beta }}_{r}({\hat{k}}_{0})\).

This two-step estimator is equivalent to a one-step penalized least squares with the linear constraint:

$$\begin{aligned}&\min _{\beta ,k_{0}}\frac{1}{N}(Y-X\beta )^{\prime }(Y-X\beta )-k_{0}\hat{ \sigma }_{\varepsilon }^{2}\frac{\ln N}{N} \\&\quad s.t.\,R\beta =q(k_{0}). \end{aligned}$$

This equivalence comes from the fact that in the objective function \(k_{0}\) only appears in the penalty term \(-k_{0}{\hat{\sigma }}_{\varepsilon }^{2}\frac{ \ln N}{N}\). Thus, \(\beta \) can be concentrated out for a given \(k_{0}\).

For a given \(k_{0}\), \({\hat{\beta }}_{r}(k_{0})\) is a restricted least squares estimator. By Amemiya (1985) or Greene (2012),

$$\begin{aligned} {\hat{\beta }}_{r}(k_{0})={\hat{\beta }}_\mathrm{OLS}-(X^{\prime }X)^{-1}R^{\prime }[R(X^{\prime }X)^{-1}R^{\prime }]^{-1}[R{\hat{\beta }}_\mathrm{OLS}-q(k_{0})], \end{aligned}$$

and

$$\begin{aligned} SSR_{r}(k_{0})=SSR+[R{\hat{\beta }}_\mathrm{OLS}-q(k_{0})]^{\prime }[R(X^{\prime }X)^{-1}R^{\prime }]^{-1}[R{\hat{\beta }}_\mathrm{OLS}-q(k_{0})]. \end{aligned}$$

Thus, the criterion is

$$\begin{aligned} C(k_{0})= & {} \frac{1}{N}SSR+\frac{1}{N}[R{\hat{\beta }}_\mathrm{OLS}-q(k_{0})]^{\prime }[R(X^{\prime }X)^{-1}R^{\prime }]^{-1}[R{\hat{\beta }}_\mathrm{OLS}-q(k_{0})]\\&-k_{0} {\hat{\sigma }}_{\varepsilon }^{2}\frac{\ln N}{N}. \end{aligned}$$
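Because everything is in closed form, evaluating \(C(k_{0})\) over a grid is cheap. A sketch (ours), reusing `linear_constraint` and `choose_k0` from above:

```python
import numpy as np

def constrained_cols(X, y, k0):
    """Restricted least squares beta_r(k0) and criterion C(k0)."""
    R, q, beta_ols, _, _ = linear_constraint(X, y, k0)
    N = len(y)
    XtX_inv = np.linalg.inv(X.T @ X)
    denom = R @ XtX_inv @ R                  # scalar: R (X'X)^{-1} R'
    d = R @ beta_ols - q                     # constraint violation at OLS
    beta_r = beta_ols - (XtX_inv @ R) * d / denom
    ssr = np.sum((y - X @ beta_ols)**2)      # unrestricted SSR
    ssr_r = ssr + d**2 / denom               # restricted SSR
    C = ssr_r / N - k0 * (ssr / N) * np.log(N) / N
    return beta_r, C

# usage: k0_hat = choose_k0(lambda k0: constrained_cols(X, y, k0)[1])
```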

Minimizing \(C(k_{0})\) defines \({\hat{k}}_{0}\). The following proposition establishes the existence and uniqueness of \({\hat{k}}_{0}\).

Proposition 1

In the presence of positive skewness of OLS residuals, i.e., \({\hat{\mu }} _{3}^{\prime }>0\), (i) \(\frac{dSSR_{r}(k_{0})}{dk_{0}}>0\); (ii) for a relatively large sample size N, there exists a solution for \({\hat{k}}_{0}\) such that \({\hat{k}}_{0}\) minimizes \(C(k_{0})\); (iii) \(\frac{d^{2}C(k_{0})}{ dk_{0}^{2}}>0\), implying that \({\hat{k}}_{0}\) is the unique solution.

The proof in “Appendix” shows that a relatively large N guarantees the existence of \({\hat{k}}_{0}\). Since \(\frac{\ln N}{N}\rightarrow 0\) as \(N\rightarrow \infty \), the penalty term \(-k_{0}{\hat{\sigma }}_{\varepsilon }^{2}\frac{\ln N}{N}\) in \(C(k_{0})\) can be ignored asymptotically, compared with the first term \(\frac{1}{N}SSR_{r}(k_{0})\), which converges to a nonzero constant. This implies that \({\hat{k}}_{0}\rightarrow 0\) as \(N\rightarrow \infty \). Hence, when N is large, the proposed constrained COLS approaches the OLS with constraint \( R(\beta -{\hat{\beta }}_\mathrm{OLS})={\hat{\mu }}_{3}^{\prime }/3\), which is very close to OLS in the presence of wrong skewness.

This property also implies that in a sample with a large number of firms, the selected \({\hat{k}}_{0}\) could be 0. In this case, to obtain nonzero inefficiency estimates, a small positive value, say, 0.05, is suggested.Footnote 17

For a given sample, the difference between OLS and the constrained COLS

$$\begin{aligned} {\hat{\beta }}_\mathrm{OLS}-{\hat{\beta }}_{r}=(X^{\prime }X)^{-1}R^{\prime }[R(X^{\prime }X)^{-1}R^{\prime }]^{-1}[R{\hat{\beta }}_\mathrm{OLS}-q({\hat{k}}_{0})] \end{aligned}$$

depends on \({\hat{k}}_{0}\), and

$$\begin{aligned} \frac{d[{\hat{\beta }}_\mathrm{OLS}-{\hat{\beta }}_{r}]}{d{\hat{k}}_{0}}=-(X^{\prime }X)^{-1}R^{\prime }[R(X^{\prime }X)^{-1}R^{\prime }]^{-1}\frac{\Pi }{2}\hat{k }_{0}^{1/2}({\hat{\mu }}_{2}^{\prime })^{3/2} \end{aligned}$$

implying that the magnitude of this difference increases with the chosen value \({\hat{k}}_{0}\).

4.2 Constrained MLE

For a given \(k_{0}\), the constrained MLE \(({\hat{\beta }}_{CMLE}(k_{0}),\hat{ \lambda }_{CMLE}(k_{0}),{\hat{\sigma }}_{CMLE}^{2}(k_{0}))\) depends on \(k_{0}\). Minimizing \(BIC(k_{0})\) determines the value of \(k_{0}\), i.e., \({\tilde{k}} _{0}=\arg \min _{k_{0}\in [0,1)}BIC(k_{0})\). Similar to the constrained COLS, \(({\hat{\beta }}_{CMLE},{\hat{\lambda }}_{CMLE},{\hat{\sigma }} _{CMLE}^{2})\) is defined as \(({\hat{\beta }}_{CMLE}({\tilde{k}}_{0}),\hat{\lambda }_{CMLE}({\tilde{k}}_{0}),{\hat{\sigma }}_{CMLE}^{2}({\tilde{k}}_{0}))\). It can also be written as a penalized maximum likelihood estimator with a constraint,

$$\begin{aligned}&\min _{\beta ,\lambda ,\sigma ^{2},k_{0}}-2l(\beta ,\lambda ,\sigma ^{2})-k_{0}\ln N \\&\quad s.t.\,R\beta =q(k_{0}), \end{aligned}$$

where

$$\begin{aligned} l(\beta ,\lambda ,\sigma ^{2})= & {} \frac{N}{2}\ln (\frac{2}{\pi })-\frac{N }{2}\ln \sigma ^{2}+\sum \limits _{i=1}^{N}\ln \left[ 1-\Phi \left( \frac{ \lambda }{\sqrt{\sigma ^{2}}}(y_{i}-x_{i}^{\prime }\beta )\right) \right] \\&- \frac{1}{2\sigma ^{2}}\sum \limits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta )^{2} \end{aligned}$$

defined in (2).

Since there is no analytical solution to the constrained optimization problem above, it is difficult to derive the properties of constrained MLE.

However, dividing by N gives \(\frac{1}{N}BIC(k_{0})=-\frac{2}{N}l_{r}(k_{0})-k_{0}\frac{\ln N}{N}\). Compared with \(-\frac{2}{N}l_{r}(k_{0})\), which does not converge to zero, the penalty term \(-k_{0}\frac{\ln N}{N}\) can be ignored asymptotically as \(N\rightarrow \infty \), implying that \({\tilde{k}}_{0}\) tends to 0 as \(N\rightarrow \infty \). Since \({\tilde{k}}_{0}\) is small when N is large, the proposed constrained MLE is expected to be close to the unconstrained MLE. Since the MLE of the slope parameters is very close to OLS, the constrained MLE and constrained COLS are expected to be close. As with the constrained COLS, the selected \({\tilde{k}}_{0}\) could be 0 in a sample with a large N. In this case, we also impose a lower bound of, say, 0.05, to obtain nonzero estimated inefficiency.

We now consider the difference between constrained MLE and OLS by examining the first-order conditions of (2). Aigner et al. (1977) show that:

$$\begin{aligned} \frac{\partial \ln L}{\partial \sigma ^{2}}= & {} -\frac{N}{2\sigma ^{2}}+\frac{ 1}{2\sigma ^{4}}\sum \limits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta )^{2}\nonumber \\&+\frac{ \lambda }{2\sigma ^{3}}\sum \limits _{i=1}^{N}\frac{\phi (\cdot )}{1-\Phi (\cdot )}(y_{i}-x_{i}^{\prime }\beta )=0; \end{aligned}$$
(11)
$$\begin{aligned} \frac{\partial \ln L}{\partial \lambda }= & {} -\frac{1}{\sigma } \sum \limits _{i=1}^{N}\frac{\phi (\cdot )}{1-\Phi (\cdot )} (y_{i}-x_{i}^{\prime }\beta )=0; \end{aligned}$$
(12)
$$\begin{aligned} \frac{\partial \ln L}{\partial \beta }= & {} \frac{1}{\sigma ^{2}} \sum \limits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta )x_{i}+\frac{\lambda }{ \sigma }\sum \limits _{i=1}^{N}\frac{\phi (\cdot )}{1-\Phi (\cdot )}x_{i}=0, \end{aligned}$$
(13)

where \(\phi (\cdot )\) is the standard normal density function. \(\phi (\cdot ) \) and \(\Phi (\cdot )\) are evaluated at \(\frac{\lambda }{\sigma } (y_{i}-x_{i}^{\prime }\beta )=\frac{\lambda }{\sigma }\varepsilon _{i}\). Waldman (1982) shows that in the presence of wrong skewness \(\lambda =0\) and OLS is a local maximum of the log-likelihood.

For our constrained MLE, the constraint (7) or (9) involves the value of \(k_{0}\), not \(\lambda \) directly. Since \(\lambda \) is a monotonically increasing function of k, \(k\ge k_{0}\) implies

$$\begin{aligned} \lambda \ge \sqrt{\frac{\pi }{\pi -2}\frac{1}{(1/k_{0}-1)}}. \end{aligned}$$
(14)
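For illustration (our arithmetic), \(k_{0}=0.15\) implies \(\lambda \ge \sqrt{\frac{\pi }{\pi -2}\times \frac{1}{1/0.15-1}}=\sqrt{2.7519\times 0.1765}\approx 0.697\).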

To show how restricting \(\lambda \) affects the estimation result and how the constrained MLE of \(\beta \) is different from the OLS, consider equation (13).Footnote 18 Taking the first-order Taylor expansion at \(\lambda =0\) gives

$$\begin{aligned} \frac{\phi \left( \frac{\lambda }{\sigma }\varepsilon _{i}\right) }{1-\Phi \left( \frac{\lambda }{\sigma }\varepsilon _{i}\right) }\approx \sqrt{\frac{2}{\pi }}+\frac{2}{\pi } \frac{\lambda }{\sigma }\varepsilon _{i}. \end{aligned}$$

Thus, (13) becomes

$$\begin{aligned} 0= & {} \sum \limits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta )x_{i}+\lambda \sigma \sum \limits _{i=1}^{N}\frac{\phi (\frac{\lambda }{\sigma }\varepsilon _{i})}{ 1-\Phi \left( \frac{\lambda }{\sigma }\varepsilon _{i}\right) }x_{i}\approx \sum \limits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta )x_{i}\\&+\lambda \sigma \sum \limits _{i=1}^{N}\left( \sqrt{\frac{2}{\pi }}+\frac{2}{\pi }\frac{\lambda }{ \sigma }\varepsilon _{i}\right) x_{i} \\= & {} \left( 1+\frac{2}{\pi }\lambda ^{2}\right) \sum \limits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta )x_{i}+\sqrt{\frac{2}{\pi }}\lambda \sigma \sum \limits _{i=1}^{N}x_{i}. \end{aligned}$$

That is,

$$\begin{aligned} \sum \limits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta )x_{i}+\frac{\sqrt{\frac{2}{ \pi }}\lambda }{\left( 1+\frac{2}{\pi }\lambda ^{2}\right) }\sqrt{\sigma ^{2}} \sum \limits _{i=1}^{N}x_{i}=0. \end{aligned}$$
(15)

In matrix form, the equation (15) above can be written as

$$\begin{aligned} X^{\prime }y-X^{\prime }X\beta +\varphi (\lambda )\sqrt{\sigma ^{2}} X^{\prime }\iota =0 \end{aligned}$$
(16)

where \(\varphi (\lambda )=\sqrt{\frac{2}{\pi }}\lambda /(1+\frac{2}{\pi } \lambda ^{2})\) and \(\iota \) is the \(N\times 1\) vector of ones. Equivalently,

$$\begin{aligned} {\hat{\beta }}_{CMLE}\simeq (X^{\prime }X)^{-1}X^{\prime }y+\varphi (\lambda ) \sqrt{\sigma ^{2}}(X^{\prime }X)^{-1}X^{\prime }\iota . \end{aligned}$$
(17)

In the presence of wrong skewness, OLS (i.e., \(\lambda =\varphi =0\)) is a local maximum of the log-likelihood. Under the constraint (14), the estimator of \(\beta \) is adjusted by the second term in equation (17).Footnote 19

Given that \(\varphi (\lambda )\) is monotonically increasing in \(\lambda \) on the range [0, \(\sqrt{\pi /2}\approx 1.2533]\), the difference between the constrained MLE and the OLS of \(\beta \) is positively related to the value of \(\lambda \).Footnote 20 The larger the imposed \(\lambda \) (or \(k_{0}\)), the bigger the difference between the OLS and the constrained MLE. Furthermore, in a given sample this difference depends not only on \(\varphi (\lambda )\), but also on the sample values of the regressors and on \(\sigma ^{2}\), jointly determined by the first-order equations. We conjecture that constraint (10) with a small value of \(k_{0}\) slightly adjusts the estimators of \(\beta \) and \(\sigma _{v}^{2}\), but has a much larger effect on the estimated \(\sigma _{u}^{2}\) and \(\lambda \). This point is confirmed in the Monte Carlo experiments and empirical example below.
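A quick numerical check (ours) confirms this behavior; note that \(\varphi \) attains its maximum value of exactly 0.5 at \(\lambda =\sqrt{\pi /2}\), since \(\varphi (\sqrt{\pi /2})=\sqrt{2/\pi }\sqrt{\pi /2}/(1+1)=1/2\):

```python
import numpy as np

def varphi(lam):
    """Adjustment coefficient in Eq. (17)."""
    return np.sqrt(2 / np.pi) * lam / (1 + (2 / np.pi) * lam**2)

for lam in (0.25, 0.5, 1.0, np.sqrt(np.pi / 2)):
    print(f"varphi({lam:.4f}) = {varphi(lam):.4f}")
# increasing on [0, sqrt(pi/2)]: 0.1918, 0.3442, 0.4875, 0.5000
```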

5 Monte Carlo experiments

In this section, Monte Carlo experiments are conducted to study how the proposed constraints affect the estimates and how the chosen value of \(k_{0}\), the imposed lower bound of k, is affected by the sample size. Since respecification is always a solution to wrong skewness, we also study our approach under model misspecification. We consider two forms of misspecification: when the inefficiency distribution is binomial (Carree 2002) and when it is exponential. The binomial case is particularly relevant since it is a specification that allows for residual skewness in either direction. We compare our approach under misspecification to the correctly specified estimators in terms of the root mean squared error (RMSE) of the estimated inefficiency variance and average efficiency, and the proposed method’s performance is surprisingly good (given the misspecification) when the sample size is small. Due to its computational convenience, our focus is on the constrained COLS estimator.

We consider a specification

$$\begin{aligned} y_{i}=\beta _{0}+\beta _{1}x_{1i}+\beta _{2}x_{2i}+\varepsilon _{i},\, \varepsilon _{i}=-u_{i}+v_{i},\ \ i=1,\ldots ,N, \end{aligned}$$

where \(\beta _{0}=1,\) \(\beta _{1}=0.8,\) \(\beta _{2}=0.2\), \(x_{1i}\sim \log (|N(4,100)|)\), \(x_{2i}\sim \log (|N(2,60)|)\), \(v_{i}\sim N(0,\sigma _{v}^{2})\) and \(u_{i}\sim |N(0,\sigma _{u}^{2})|\). \(k=Var(u_{i})/Var(\varepsilon _{i})\) is the inefficiency variance ratio.Footnote 21 \(\sigma _{u}^{2}=\frac{\pi }{\pi -2}Var(u_{i})=\frac{\pi }{\pi -2}kVar(\varepsilon _{i})\) and \(\sigma _{v}^{2}=(1-k)Var(\varepsilon _{i})\). We set \(Var(\varepsilon _{i})=\sigma _{v}^{2}+Var(u_{i})=0.026\), so that the variances of \(x_{1i}\) and \(\varepsilon _{i}\) are comparable to those in the empirical example below.

Since the focus is the proposed correction for samples with wrong residual skewness, we report the frequency of wrong skewness in the replications and drop the samples with correct skewness. The number of replications is 4,000 before dropping the samples with correct skewness. We conduct experiments with \(k=0.1\), 0.2, 0.3, 0.5, 0.7 and \(N=50\), 100, 200. For our binomial misspecification experiments, we generate \(u_{i}\sim B(10,p)\) with \(Var(u)=p(1-p)\). For exponential misspecification, we generate \(u_{i}\sim Exp(\sigma _{u})\) with \(Var(u)=\sigma _{u}^{2}\). The same inefficiency variance ratio and sample size combinations are considered in both cases.
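One replication of the baseline design can be sketched as follows (our code; we read \(N(4,100)\) as mean 4 and variance 100, and details such as seeds are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_sample(N, k, var_eps=0.026):
    """One draw from the normal-half normal design of Sect. 5."""
    sigma_u2 = np.pi / (np.pi - 2) * k * var_eps    # since Var(u) = k * Var(eps)
    sigma_v2 = (1 - k) * var_eps
    x1 = np.log(np.abs(rng.normal(4, np.sqrt(100), N)))
    x2 = np.log(np.abs(rng.normal(2, np.sqrt(60), N)))
    u = np.abs(rng.normal(0, np.sqrt(sigma_u2), N))
    v = rng.normal(0, np.sqrt(sigma_v2), N)
    y = 1.0 + 0.8 * x1 + 0.2 * x2 + v - u
    return np.column_stack([np.ones(N), x1, x2]), y
```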

Table 1 Monte Carlo results: constrained COLS

Table 1 reports the simulation results. Column (2) gives the average value of \({\hat{k}}_{0}\). To obtain \({\hat{k}}_{0}\) for each sample, a grid search is conducted to minimize \(C(k_{0})\) on the interval [0.05, 0.9]. As expected, the average value of \({\hat{k}}_{0}\) decreases with N. Column (3) shows that there is still a possibility of wrong skewness after the proposed finite sample correction. The frequency depends on the inefficiency variance ratio and sample size, varying from 16.3% to \(39.9\%\). For example, for \(k=0.5\) and \(N=100\), our finite sample correction approach fails with probability \(28.4\%\). This failure is a cost of the linearization approximation (8). When \(k_{0}\) is small, \(g(k_{0})({\hat{\mu }}_{2}^{\prime })^{3/2}\) could be a small negative value close to zero. Consequently, due to approximation error, the linearized constraint does not guarantee a negative third moment of residuals or negative skewness. However, as k increases, the failure frequency is greatly reduced, e.g., to \(16.3\%\) for \(k=0.7\), \(N=200\).

For the parameter estimators, columns (4)–(7) indicate that with the correction of \({\hat{\sigma }}_{u}\sqrt{2/\pi }\), the constrained COLS of \(\beta _{0}\) is less biased than OLS, but has a much larger root mean squared error (RMSE). As k and N increase, however, the RMSE of constrained COLS becomes comparable to that of OLS. (Bias and RMSE of OLS of \(\beta _{0}\) (and \(\beta _{1}\)) are included in columns (6), (8) (and (10), (12)) for comparison.) In addition, compared with OLS, the constrained COLS of \(\beta _{1}\) is slightly upward biased with a larger RMSE, and the bias and RMSE decrease with k and N.

In the presence of wrong skewness, \(\sigma _{u}^{2}\) is typically assumed to be zero. Using our correction, column (12) shows that \(\sigma _{u}^{2}\) tends to be overestimated for small values of k and underestimated for large values of k. Compared with \(\sigma _{u}^{2}\), \(\sigma _{v}^{2}\) can be estimated more accurately in terms of bias, as indicated in column (14).

Columns (14) and (15) report the bias and RMSE of the mean technical efficiency \(E[\exp (-u_{i})]=2\exp (\sigma _{u}^{2}/2)[1-\Phi (\sigma _{u})]\). In the presence of wrong skewness, traditional practice suggests that the estimated \(\sigma _{u}^{2}\) is 0, implying that the estimated mean technical efficiency is 1. This practice obviously overestimates the true mean technical efficiency. Column (14) shows that the mean technical efficiency estimator using the proposed correction can be nearly unbiased with a reasonable RMSE for a sizable value of k (0.2 under the current design). It is downward biased for smaller values of k and upward biased for \(k>0.2\).

Table 2 Monte Carlo results: comparison of constrained COLS and Carree (2002)

Under binomial misspecification, we focus on the performance of the estimate of \(\sigma _{u}^{2}\) and the mean technical efficiency score \(E(e^{-u})\). The results are reported in Table 2. Columns (1)–(3) are very similar to those in Table 1, except that the frequency of wrong skewness after applying the proposed finite sample correction is higher. This makes sense given that the model is misspecified in the current setting. Columns (4)–(7) report the coefficient estimates for the proposed constrained COLS, and columns (12)–(15) those for Carree’s (2002) approach, which is OLS estimated with a binomial inefficiency. Columns (8)–(11) report the bias and RMSE of the variance estimates and mean technical efficiency estimates for the proposed constrained COLS, and columns (16)–(19) the corresponding results for the Carree (2002) approach. We find that the proposed method works well (i.e., smaller bias and RMSE) when the sample size or k is small.Footnote 22 As the sample size increases, the precision of the binomial estimation improves quickly and becomes similar to that of the proposed method.

Table 3 Monte Carlo results: constrained COLS with Exponential inefficiency

We also conduct misspecified simulations, assuming that inefficiency follows an exponential distribution, while applying our proposed finite sample correction in Sect. 4 based on a half-normal distribution. The results are reported in Table 3. In general, the performance of the proposed estimator is similar to that in Table 1 (the normal-half normal design), except for that of the mean technical efficiency, which depends heavily on the distribution of the data generating process. The bias of the variance estimator is small, indicating a degree of robustness of our approach.

6 Empirical example: the US airline industry

Table 4 Estimates of Airlines Example: constrained MLE

In this section, an airlines example is used to illustrate our approach. This is an unbalanced panel data set with 256 observations. See Greene (2007) for detailed information on this data set. In this example, the dependent variable is the logarithm of output, and the independent variables include the logarithms of fuel, materials, equipment, labor, and property. Here, the unbalanced panel is treated as a cross section of 256 firms to ensure that the wrong skewness issue arises.Footnote 23 Column (2) of Table 4 presents the OLS estimates along with standard errors (column 3). Except for the constant term, the slope coefficients are consistent with Table 2.11 in Greene (2007). The OLS residual skewness (0.0167) is in the wrong direction for the estimated normal-half normal model. Thus, the estimates of \(\lambda \) and \(\sigma _{u}^{2}\) are set to zero and firms are considered to be “super efficient.” However, Greene (2007, footnote 84) does suggest that there is evidence of technical inefficiency in the data. The second root of the likelihood with positive \(\lambda \) is reported in the second section of Table 4. This MLE yields a small positive residual skewness of 0.0093.Footnote 24 Usually, in the presence of “wrong” skewness, researchers are advised to obtain a new sample or respecify the model. For the purpose of comparison, we also report the \(95\%\) confidence (prediction) intervals for MLE obtained using the bagging procedure of Simar and Wilson (2010) in column (4) and the corresponding standard errors in column (5).

Instead, we use the constrained MLE (and constrained COLS), a finite sample adjustment to the existing MLE (and COLS). The optimal value of \(k_{0}\) can be chosen by \(BIC(k_{0})\) (and \(C(k_{0})\) for the constrained COLS) proposed above. For purposes of illustration, we present constrained MLE results for \(k_{0}=0.05\), 0.1, 0.15, and 0.2 in columns (8)–(15) of Table 4 and compare the values of \(BIC(k_{0})\), showing that \({\tilde{k}}_{0}=0.15\) achieves the minimum of \(BIC(k_{0})\). Thus, the constrained MLE of \(\lambda \) and \(\sigma _{u}\) are positive, 0.6829 and 0.1015, respectively. We also report the mean technical efficiency, i.e., \(E(e^{-u})\), for comparison. Furthermore, consistent with the negative population skewness of the composed error, the skewness of the constrained MLE residuals \((-\,0.0599)\) has the desired sign.
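For reference (our arithmetic), plugging \({\hat{\sigma }}_{u}=0.1015\) into the mean efficiency formula \(2\exp ({\hat{\sigma }}_{u}^{2}/2)[1-\Phi ({\hat{\sigma }}_{u})]\) from Sect. 5 gives approximately 0.924.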

Since the constraint only slightly adjusts the coefficients of the constrained MLE, as expected, the rest of the coefficients are very close to the unconstrained MLE and OLS. For example, the constrained estimated coefficient of the variable Log fuel is 0.3907 (column (12)), while its unconstrained counterpart is 0.3836 (column (4)) and the OLS coefficient is 0.3828 (column (2)). All estimates obtained from the constrained MLE stay within the bagging confidence intervals. In addition, the bagging standard errors are closer to those from the constrained MLE in column (12) than to those from the unconstrained MLE in column (5) or OLS in column (3). These results provide a degree of comfort regarding the validity of our proposed method.

Consistent with the analysis in Sect. 4.2, the difference between the constrained MLE slope coefficients and their OLS (and unconstrained MLE) counterparts is positively related to the magnitude of \(k_{0}\). The larger the value of \(k_{0}\), the larger the difference. However, this difference is relatively small. For example, the constrained estimated coefficient of the variable Log fuel using \(k_{0}=0.2\) is 0.3939 (in column (14) of Table 4), compared with the OLS 0.3828 and the unconstrained MLE 0.3836 (in columns (2) and (4) of Table 4). This is also the case for \(\sigma _{v}\) and \(\sigma ^{2}\). In stark contrast to this small difference in slope coefficients, the residual skewness and estimated k change significantly, since they are implicitly determined by the chosen value of \(k_{0}\) in the constraint. Another important point observed in Table 4 is that the value of the likelihood decreases with \(k_{0}\).Footnote 25

Table 5 Estimates of Airlines Example: constrained COLS

The results of the constrained COLS are reported in columns (8)–(15) of Table 5 and are very close to their constrained MLE counterparts for given values of \(k_{0}=0.05\), 0.1, 0.15, and 0.2.Footnote 26 However, for the constrained COLS, the optimal value of \(k_{0}\) is 0.1, obtained by applying the Mallows’ \(C_{p}\)-type criterion \(C(k_{0})\) proposed above. (Table 5 reports \(N\times C(k_{0})\) instead of \(C(k_{0})\).) This is slightly different from \({\tilde{k}}_{0}=0.15\) obtained by minimizing \(BIC(k_{0})\) in the constrained MLE. Therefore, the constrained COLS of \(\sigma _{u}\) is 0.0853 and the skewness is \(-\,0.0325\) in column (10).

It is worth mentioning that the value of criterion C(0.15) is nearly equal to C(0.1) in this empirical example, implying that \(BIC(k_{0})\) for the constrained MLE and \(C(k_{0})\) for the constrained COLS result in similar optimal values of \(k_{0}\). Again, all the estimates obtained from constrained COLS stay within the bagging confidence intervals by Simar and Wilson (2010) and the bagging standard errors are very close to those from constrained COLS.

Since the proposed finite sample adjustment restricts the inefficiency variance ratio, it indirectly affects the estimated \(\sigma _{u}^{2}\). In this example, it is \(0.1015^{2}\) for the constrained MLE. Consequently, the mean technical efficiency estimate, \(2\exp ({\hat{\sigma }}_{u}^{2}/2)[1-\Phi ({\hat{\sigma }}_{u})]\), depends on the chosen value of \(k_{0}\). However, efficiency rankings appear to be preserved under different choices of \(k_{0}\). For the unconstrained MLE, the least efficient firm is the 79th, with technical efficiency 0.8958. If we impose \(k_{0}=0.05\), 0.1, 0.15, 0.2 in the constraint, the technical efficiency becomes 0.8583, 0.8308, 0.8015, 0.7722, respectively, and it remains the lowest among the 256 firms. The most efficient firm is the 250th, with technical efficiency 0.9696, 0.9669, 0.9655, 0.9644, 0.9636 for the unconstrained MLE and the constrained MLE with \(k_{0}=0.05\), 0.1, 0.15, 0.2, respectively. This is also the case for the median firm.

Finally, we also calculate the bagging prediction intervals for the efficiency scores of the least efficient firm (i.e., the 79th firm) and the most efficient firm (i.e., the 250th firm): [0.5366, 0.9987] and [0.8865, 0.9988], respectively. All of the above estimates fall within these prediction intervals.

7 Conclusions

This paper studies the wrong skewness issue in parametric stochastic frontier models. Following Simar and Wilson (2010), we consider wrong skewness to be a consequence of estimation in finite samples when the inefficiency variance ratio is small. (Another possibility is that the model is misspecified.) In finite samples, the data may fail to be informative enough to detect the existence of the inefficiency term in stochastic frontier models, even though the population inefficiency variance ratio could be fairly large. Thus, the resulting residuals could display skewness in either direction with probability as high as 50%.

As an alternative to the usual “solutions” to the wrong skew problem, we propose a feasible finite sample adjustment to existing estimates. When there is evidence of inefficiency, it is reasonable to impose a lower bound on the inefficiency variance ratio in the normal-half normal model, equivalent to a negative upper bound on the residual skewness. Thus, we propose to use this negative bound on residual skewness as a constraint in the MLE and COLS in the event of wrong skewness.

The idea of the proposed constrained estimators is to slightly adjust the slope coefficients in finite samples. They provide a point estimate that yields a negative residual skewness, though a correct sign of residual skewness is not always guaranteed. Since the constraint is based on \(k_{0}\), the choice of \(k_{0}\) affects estimation results. A model selection approach is proposed to select \(k_{0}\). Monte Carlo experiments show that the bias of the constrained estimates is less of a concern when the sample size is large and the inefficiency variance ratio increases. The proposed method is comparable with existing methods in the literature, such as the binomial specification of Carree (2002). The empirical example in this paper also shows that the value of \(k_{0}\) has little effect on the estimated slope coefficients and \(\sigma _{v}\), \(\sigma ^{2}\), while the residual skewness and estimated k are implicitly determined by the value of \(k_{0}\). In this sense, the proposed method can be regarded as a finite sample adjustment to existing estimators, rather than a new estimator. When the sample size is large, wrong skewness is less likely to occur, so such an adjustment becomes unnecessary.