1 Introduction

In parametric stochastic frontier models for cross-sectional data, the error term is composed as the sum of a two-sided noise component and a one-sided inefficiency component. The canonical model of Aigner et al. (1977) assumes that the noise distribution is zero-mean normal and the inefficiency distribution is half-normal. Other common distributional assumptions on the inefficiency term include the exponential (Meeusen and van den Broeck 1977; Aigner et al. 1977), the truncated normal (Stevenson 1980), and the gamma (Stevenson 1980; Greene 1980). For surveys, see Greene (2007) and Kumbhakar and Lovell (2000). The beauty of the canonical model is that it allows for continuous inefficiency, does not require estimation of a support bound (in what amounts to a deconvolution problem), and has only two unknown distributional parameters, making maximum likelihood estimation simple relative to other specifications, except (perhaps) for the normal-exponential model. Consequently, the normal-half normal specification of the model has found widespread use.

In the normal-half normal production function specification, the skewness of the composed error is negative, and parameters can be estimated by maximum likelihood estimation (MLE) or corrected ordinary least squares (COLS).Footnote 1 Waldman (1982) shows that when the skewness of the ordinary least squares (OLS) residuals is positive, OLS is a local maximum of the likelihood function, and estimated inefficiency is zero in the sample.Footnote 2 Horrace and Wright (2020) generalize the Waldman result to the case where the inefficiency distribution is from a scalable parametric class, like the exponential distribution. This “wrong skewness” phenomenon is widely documented in the literature and is often regarded as an estimation failure.Footnote 3 When it occurs, researchers are advised to either obtain a new sample (which is rarely feasible) or respecify the model. While there are many ways one can respecify a model, a reasonable approach would be to choose an inefficiency distribution that accommodates residuals of either positive or negative skew. For example, Li (1996), Carree (2002), Almanidis and Sickles (2011), Almanidis et al. (2014), and Hafner et al. (2019) develop models with new distributional assumptions on the inefficiency component that allow for skew in either direction.

An alternative “solution” to the problem that does not require respecification is that of Simar and Wilson (2010), who argue that “wrong skewness” is not an estimation or misspecification failure, but a finite sample problem that most likely occurs when the ratio of the inefficiency variance to the noise variance (the inefficiency variance ratio) is small. That is, wrong skewness may not indicate that the normal-half normal model is wrong or that inefficiency does not exist in the population. They propose a bootstrap method (called “bagging”) to construct confidence intervals for model parameters and expected inefficiency which have higher coverage than traditional intervals, regardless of residual skewness direction. Then, the sample under study can still be used to infer the model parameters. While bagging can be applied to any parametric form of the model, their technique is specifically intended to salvage the canonical model when the residual skew has the wrong sign. Such is the spirit of this research.

We take Simar and Wilson’s (2010) view that wrong skewness may be a consequence of a small inefficiency variance ratio in finite samples, even when the canonical model is properly specified.Footnote 4 However, instead of the bagging approach of Simar and Wilson (2010), this paper provides a finite sample adjustment to existing estimators in the presence of wrong skewness. That is, we impose a negative residual skewness constraint in the MLE (or COLS) algorithm. A natural candidate for this constraint is the upper bound of the population skew, which is a monotonic function of the positive lower bound of the inefficiency variance ratio in the half-normal model. However, the constraint is nonlinear in the parameters of interest, complicating computation of the optimum. Therefore, a linearization approximation of the constraint is proposed. Additionally, a model selection approach is proposed to determine the lower bound of the inefficiency variance ratio used in the constraint. Monte Carlo experiments suggest that our correction becomes more reliable as the true inefficiency variance ratio increases. A shortcoming of the approach is that in finite samples the linear approximation may not be accurate enough to guarantee a negative residual skew, but it will always give us nonzero estimates for firm-level inefficiencies. The possible failure of correct residual skewness using the linearized constraint illustrates a trade-off between computational complexity and accuracy. Using the original nonlinear constraint avoids this issue, but the computational simplicity of our approach, as shown in our Monte Carlo experiments and empirical example, would be lost.

The proposed finite sample adjustment provides a nonzero point estimate for technical (or cost) inefficiency that may be used in applied research and that serves as a complement to the bagging intervals of Simar and Wilson (2010). That is, if the wrong skew exists in the canonical model, then one may use Simar and Wilson’s bagging technique to construct confidence intervals on efficiency, while the proposed methodology provides nonzero point estimates without having to respecify the model. Since wrong skewness can occur fairly regularly (even when efficiency may exist in the population under study), the finite sample adjustment is particularly attractive in cases where the half-normal inefficiency assumption is maintained. It is noteworthy that the proposed adjustment is only needed in finite samples, for as the sample size increases wrong skewness is less likely to be an issue when the ratio of inefficiency variance to noise variance is sizable.

The rest of this paper is organized as follows. The next section discusses the wrong skewness issue in the literature. In Sect. 3, we propose a finite sample correction approach. To simplify computation of the proposed constrained estimation, a linearized version of the constraint is used, so that constrained MLE (or COLS) can be easily implemented in most software packages (like STATA). The constrained estimators are discussed in Sect. 4. In Sect. 5, Monte Carlo experiments are conducted to study the properties of constrained COLS. An empirical example is used to illustrate the proposed approach in Sect. 6, and all the point estimates of inefficiency lie within the confidence intervals of Simar and Wilson (2010). The last section concludes.

2 Wrong skewness issue

A stochastic production frontier (SPF) model for a cross-sectional sample of size N is:

$$\begin{aligned} y_{i}=x_{i}^{\prime }\beta +\varepsilon _{i},\ \ \ i=1,\ldots ,N, \end{aligned}$$
(1)

with composed error \(\varepsilon _{i}=v_{i}-u_{i}.\) The disturbance \(v_{i}\) is assumed \(iidN(0,\sigma _{v}^{2})\). Inefficiency of firm i is characterized by \(u_{i}\ge 0\). In the SPF literature, \(u_{i}\) is usually assumed to be iid half-normal, \(|N(0,\sigma _{u}^{2})|\) (Aigner et al. 1977; Wang and Schmidt 2009), and independent of \(v_{i}\), with variance \(Var(u_{i})=\frac{\pi -2}{\pi }\sigma _{u}^{2}\). The first component of the \(p\times 1\) vector \(x_{i}\) is 1, so the intercept term is contained in the \(p\times 1\) slope parameter vector \(\beta \). As in Aigner et al. (1977) and Simar and Wilson (2010), let \(\sigma ^{2}=\sigma _{u}^{2}+\sigma _{v}^{2}\) and \(\lambda =\sigma _{u}/\sigma _{v}\). The parameters to be estimated are \(\theta =(\beta ,\lambda ,\sigma ^{2})\).

There are two primary estimators suggested in the literature: the maximum likelihood estimator and corrected least squares (Aigner et al. 1977; Olson et al. 1980). Under the normal-half normal specification, the MLE of \((\beta ,\lambda ,\sigma ^{2})\) is the set of parameter values maximizing the likelihood function:

$$\begin{aligned} \ln L(\beta ,\lambda ,\sigma ^{2}|(y_{i},x_{i}),i&=1,...,N) \nonumber \\&=\frac{N}{2}\ln \left( \frac{2}{\pi }\right) -\frac{N}{2}\ln \sigma ^{2}\nonumber \\&\quad +\sum \limits _{i=1}^{N}\ln \left[ 1-\Phi \left( \frac{\lambda }{\sqrt{ \sigma ^{2}}}(y_{i}-x_{i}^{\prime }\beta )\right) \right] \nonumber \\&\quad -\frac{1}{2\sigma ^{2}}\sum \limits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta )^{2}, \end{aligned}$$
(2)

where \(\Phi \left( \cdot \right) \) is the standard normal cumulative distribution function. The COLS estimate of \(\beta \) is simply the least squares slope estimate in the regression of \(y_{i}\) on \(x_{i}\). However, the mean of \(\varepsilon _{i}=v_{i}-u_{i}\) is negative due to the term \( -u_{i}\), so the COLS estimate needs to be adjusted by adding the bias, \( \sqrt{2\sigma _{u}^{2}/\pi }\), back into the intercept estimator. The bias can be consistently estimated using the variance estimates:

$$\begin{aligned} {\hat{\sigma }}_{u}^{2}=\left[ \sqrt{\frac{\pi }{2}}\left( \frac{\pi }{\pi -4} \right) {\hat{\mu }}_{3}^{\prime }\right] ^{2/3},\quad {\hat{\sigma }} _{v}^{2}={\hat{\mu }}_{2}^{\prime }-\frac{\pi -2}{\pi }{\hat{\sigma }}_{u}^{2}, \end{aligned}$$
(3)

where \({\hat{\mu }}_{2}^{\prime }\) and \({\hat{\mu }}_{3}^{\prime }\) are the second and third central moments of the least squares residuals.
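For concreteness, the COLS variance calculation in (3) takes only a few lines. The following Python snippet is our own illustration (function and variable names are ours, not from any package), assuming `e` is the vector of OLS residuals from a regression that includes an intercept:

```python
import numpy as np

def cols_variances(e):
    """COLS variance estimates from OLS residual moments, Eq. (3)."""
    mu2 = np.mean(e**2)     # second central moment (residuals have mean zero)
    mu3 = np.mean(e**3)     # third central moment
    if mu3 > 0:
        raise ValueError("wrong skewness: the COLS estimator in (3) does not exist")
    # mu3 < 0 and pi/(pi - 4) < 0, so the bracketed term in (3) is positive
    sigma_u2 = (np.sqrt(np.pi / 2) * (np.pi / (np.pi - 4)) * mu3) ** (2 / 3)
    sigma_v2 = mu2 - (np.pi - 2) / np.pi * sigma_u2
    # the COLS intercept adds the bias sqrt(2 * sigma_u2 / pi) to the OLS intercept
    return sigma_u2, sigma_v2
```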

Both MLE and COLS are consistent. The Monte Carlo experiments in Olson et al. (1980) show that there is little difference between MLE and COLS for the slope coefficients in finite samples. For the intercept and variance parameters, however, MLE and COLS differ. In addition to MLE and COLS, Olson et al. (1980) also consider a third consistent estimator, the two-step Newton–Raphson estimator, which has different finite sample properties than MLE and COLS.

Waldman (1982) discovers an important property of MLE: for the likelihood function (2) above, the point \((b,0,s^{2})\) is a stationary point, where b and \(s^{2}\) are the OLS estimates of \(\beta \) and \(\sigma ^{2}\). Intuitively, when \(\lambda =0\), the term \(u_{i}\) disappears, so the likelihood function of the SPF model (2) reduces to that of a linear model with \(u_{i}=0\). A salient result in Waldman (1982) is that when the skewness of the OLS residuals is positive, i.e., \({\hat{\mu }}_{3}^{\prime }>0\), then \((b,0,s^{2})\) is a local maximum in the parameter space of the likelihood function.Footnote 5 This is the so-called wrong skewness issue in the literature, because \(\mu _{3}^{\prime }<0\) in the normal-half normal model. Olson et al. (1980) refer to this phenomenon as “Type I failure” since the COLS estimator defined in (3) does not exist when \(\hat{\mu }_{3}^{\prime }>0\).

The Monte Carlo studies in Simar and Wilson (2010) show that the wrong skewness issue is not rare, even when the inefficiency variance ratio is considerably large. For example, the frequency of wrong skewness could be 30% for a sample of size 100 when \(\lambda =\sigma _{u}/\sigma _{v}=1\). Wrong skewness casts doubt on the specification of the SPF model (Greene 2007). Moreover, it invalidates the calculation of standard errors of parameter estimates (Simar and Wilson 2010).

Greene (2007) considers OLS residual skewness a useful diagnostic tool for the normal-half normal model. Wrong skewness suggests there is little evidence of inefficiency in the sample, implying that firms in the sample are “super efficient.” Thus, \(\lambda \) and \(\sigma _{u}^{2}\) are assumed to be zero, and the stochastic frontier model reduces to a production function without the inefficiency term.Footnote 6 Another interpretation of the wrong skewness issue is that the normal-half normal model is not the correct specification. Other specifications may well reveal the presence of inefficiency and reconcile the distribution of one-sided inefficiency with the data. The binomial distribution considered by Carree (2002) and doubly truncated normal distribution proposed by Almanidis and Sickles (2011) and Almanidis et al. (2014) could have either negative or positive skewness. They argue that models with ambiguous skewness may be more appropriate in applied research.

Simar and Wilson (2010) argue that wrong skewness is a finite sample problem, even when the model is correctly specified.Footnote 7 They show that a bootstrap aggregating method provides useful information about inefficiency and the model parameters, regardless of whether residuals are skewed in the desired direction. We also consider wrong skewness to be a consequence of estimation in finite samples when the inefficiency variance ratio \( Var(u_{i})/Var(\varepsilon _{i})\) is small.Footnote 8 Since the OLS residuals of a production function regression with \(u_{i}=0\) display skewness in either direction with probability 50%, a sample drawn from an SPF model with small inefficiency variance ratio could generate positively skewed residuals with high probability.Footnote 9
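The last point is easy to check numerically. A minimal simulation (our own illustration, not from the paper) with \(u_{i}=0\) shows that the OLS residual skewness is positive in roughly half of the draws:

```python
import numpy as np

rng = np.random.default_rng(0)
N, reps, wrong = 100, 5000, 0
for _ in range(reps):
    x = rng.normal(size=N)
    y = 1.0 + 0.8 * x + rng.normal(size=N)            # no inefficiency: u_i = 0
    X = np.column_stack([np.ones(N), x])
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]  # OLS residuals
    wrong += np.mean(e**3) > 0                        # positive ("wrong") skew
print(wrong / reps)                                   # approximately 0.5
```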

3 Finite sample correction

As illustrated by Simar and Wilson (2010), wrong skewness may occur even when the inefficiency variance ratio is sizable, so simply setting \(\sigma _{u}^{2}=0\) when the skewness is positive could be a mistake. Instead of the improved interval estimates proposed by Simar and Wilson (2010), this paper proposes a finite sample adjustment to existing estimators in the presence of wrong skewness.

For MLE, a non-positive residual skewness constraint is imposed:

$$\begin{aligned}&\max \ln L(\beta ,\lambda ,\sigma ^{2}|(y_{i},x_{i}),i=1,...,N)\nonumber \\&\quad s.t.\,\frac{1}{N}\sum _{i=1}^{N}\left[ \frac{y_{i}-{\bar{y}} -x_{i}^{\prime }\beta +{\bar{x}}^{\prime }\beta }{\sqrt{\frac{1}{N} \sum \nolimits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta -{\bar{y}}+{\bar{x}}^{\prime }\beta )^{2}}}\right] ^{3}\le 0, \end{aligned}$$
(4)

where \({\bar{y}}=\frac{1}{N}\sum \nolimits _{i=1}^{N}y_{i}\) and \({\bar{x}}=\frac{1 }{N}\sum \nolimits _{i=1}^{N}x_{i}\). Unfortunately, when implementing maximum likelihood estimation with the inequality constraint defined by (4), there is a practical issue. As pointed out by Waldman (1982), in the case of positive skewness of residuals, OLS \((b,0,s^{2})\) is a local maximum and the unconstrained MLE is equal to \((b,0,s^{2})\). Since OLS is a local maximum in the parameter space of unconstrained MLE, the constraint (4) is always binding at the maximum, leading to zero skewness of the constrained MLE residuals.Footnote 10

If we regard the sign of residual skewness as an important indicator of model specification, the constrained MLE above seems unsatisfactory. We therefore propose a (negative) upper bound of skewness instead of zero in (4). This is relevant for empirical modeling. As in the empirical example below, when there is evidence of technical inefficiency in the data (Greene 2007, p. 202), its variance cannot be too small relative to that of the composed error \(\varepsilon _{i}\). Denote the inefficiency variance ratio by

$$\begin{aligned} k=Var(u_{i})/Var(\varepsilon _{i}). \end{aligned}$$

That is, a lower bound on the inefficiency variance ratio is implicitly imposed, \(k\ge k_{0}\). From this perspective, to impose a positive value of \(k_{0}\) is to obtain a nonzero estimated inefficiency.

To develop the relationship between the upper bound of skewness and the lower bound of the inefficiency variance ratio, consider the second and third moments of \(\varepsilon _{i}\). Under the normal-half normal specification, Olson et al. (1980) show that

$$\begin{aligned} Var(\varepsilon _{i})=\sigma _{v}^{2}+\frac{\pi -2}{\pi }\sigma _{u}^{2} \, \end{aligned}$$
(5)

and

$$\begin{aligned} E[\varepsilon _{i}-E(\varepsilon _{i})]^{3}=\sigma _{u}^{3}\sqrt{2/\pi } [(\pi -4)/\pi ]. \end{aligned}$$
(6)

Using \(\gamma _{1}(\varepsilon _{i})\) to denote the skewness of \(\varepsilon _{i}\), we have

$$\begin{aligned} \gamma _{1}(\varepsilon _{i})=E\left[ \frac{\varepsilon _{i}-E(\varepsilon _{i})}{\sqrt{Var(\varepsilon _{i})}}\right] ^{3}=-\left[ \frac{Var(u_{i})}{ Var(\varepsilon _{i})}\right] ^{3/2}\gamma _{1}(u_{i})=-k^{3/2}\gamma _{1}(u_{i}), \end{aligned}$$

where \(\gamma _{1}(u_{i})=\frac{4-\pi }{\pi -2}\sqrt{\frac{2}{\pi -2 }}\simeq 0.9953\), a constant for a half-normal distribution of \( u_{i}\). Denote \(g(k)=\gamma _{1}(\varepsilon _{i})=-k^{3/2}\gamma _{1}(u_{i})\). Since \(\gamma _{1}(u_{i})>0\), \(g(k)<0\) (e.g., \( g(0.1)\approx -0.0315\), \(g(0.2)\approx -0.0890\) and \(g(0.3)\approx -0.1635\) ) and \(g^{\prime }(k)=-\frac{3}{2}k^{1/2}\gamma _{1}(u_{i})<0\). An important property of g(k) is that it is a monotonically decreasing function of k. This implies that any upper bound, say \(g_{0}\), of the population skewness, \( g(k)=\gamma _{1}(\varepsilon _{i})\le g_{0}\), is equivalent to a lower bound, denoted by \(k_{0}\), of the inefficiency variance ratio, \(k\ge k_{0}\) , i.e., \(g_{0}=g(k_{0})<0\).
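Tabulating \(g(k)\) makes the monotonicity concrete; a short sketch (our code, our naming):

```python
import numpy as np

GAMMA1_U = (4 - np.pi) / (np.pi - 2) * np.sqrt(2 / (np.pi - 2))   # ~0.9953

def g(k):
    """Population skewness of the composed error at inefficiency variance ratio k."""
    return -k**1.5 * GAMMA1_U

for k in (0.1, 0.2, 0.3):
    print(f"g({k}) = {g(k):.4f}")    # -0.0315, -0.0890, -0.1635
```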

We impose this upper bound on the sample skewness, by replacing 0 in the constraint (4) with the negative upper bound of the population skewness, \(g(k_{0})\). Consequently, a modified constraint

$$\begin{aligned} \frac{1}{N}\sum _{i=1}^{N}\left[ \frac{y_{i}-{\bar{y}}-x_{i}^{\prime }\beta + {\bar{x}}^{\prime }\beta }{\sqrt{\frac{1}{N}\sum \nolimits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta -{\bar{y}}+{\bar{x}}^{\prime }\beta )^{2}}}\right] ^{3}\le g(k_{0}) \end{aligned}$$

is used in the constrained MLE in the event of wrong skewness of the OLS residuals.

Based on Waldman’s (1982) argument, the constraint above will also be binding at a maximum in the neighborhood of OLS. The constraint becomes

$$\begin{aligned} \frac{1}{N}\sum _{i=1}^{N}\left[ \frac{y_{i}-{\bar{y}}-x_{i}^{\prime }\beta + {\bar{x}}^{\prime }\beta }{\sqrt{\frac{1}{N}\sum \nolimits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta -{\bar{y}}+{\bar{x}}^{\prime }\beta )^{2}}}\right] ^{3}=g(k_{0}). \end{aligned}$$
(7)

This finite sample adjustment gives a constrained estimator of parameter vector \((\beta ,\lambda ,\sigma ^{2})\).

The constrained COLS slope coefficients can be similarly defined. We use constraint (7), but replace the likelihood (2) with the sum of squared residuals as the objective function of a minimization problem. Since COLS reduces to OLS in the presence of wrong skewness and OLS is a local maximum of the likelihood, as a finite sample adjustment to OLS, the constrained COLS slope coefficients are expected to be close to their constrained MLE counterparts.

3.1 Linearizing the constraint

The nonlinearity of the constraint (7) in \(\beta \) creates computational difficulties in calculating the constrained MLE. To simplify computation, a linearized version of the constraint (7) is considered. Given that OLS is a local maximum of the likelihood in the presence of wrong skewness, empiricists normally start by estimating OLS with \(u_{i}=0\). This is the first step in LIMDEP (Greene 1995) and FRONTIER (Coelli 1996). If the skewness of the OLS residuals is positive, then OLS is the optimum and the point of departure for our linearization.

Since the primary concern is skewness correction, we impose the additional restriction that the MLE residual variance \(\frac{1}{N}\sum \nolimits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta -{\bar{y}}+{\bar{x}}^{\prime }\beta )^{2}\) is equal to that of OLS residuals, \({\hat{\mu }}_{2}^{\prime }\), which is a consistent estimator of the error variance. Thus, the linearized constraint becomes:

$$\begin{aligned} \frac{1}{N}\sum \limits _{i=1}^{N}[y_{i}-{\bar{y}}-(x_{i}-{\bar{x}})^{\prime }\beta ]^{3}=g(k_{0})\cdot ({\hat{\mu }}_{2}^{\prime })^{3/2}. \end{aligned}$$

Denote \(f(\beta )=\frac{1}{N}\sum \nolimits _{i=1}^{N}[y_{i}-{\bar{y}}-(x_{i}-{\bar{x}})^{\prime }\beta ]^{3}\). The first-order Taylor expansion of \(f(\beta )\) at the OLS estimate \({\hat{\beta }}_\mathrm{OLS}\) is:

$$\begin{aligned} f(\beta )\approx f({\hat{\beta }}_\mathrm{OLS})+\left[ \frac{\partial f(\beta )}{ \partial \beta }_{|{\hat{\beta }}_\mathrm{OLS}}\right] ^{\prime }\cdot (\beta -\hat{ \beta }_\mathrm{OLS}), \end{aligned}$$

where \(\frac{\partial f(\beta )}{\partial \beta }_{|{\hat{\beta }}_\mathrm{OLS}}\) is the derivative of \(f(\beta )\) with respect to \(\beta \) evaluated at \(\hat{ \beta }_\mathrm{OLS}\). \(f({\hat{\beta }}_\mathrm{OLS})\) is the third central moment of OLS residuals, i.e., \({\hat{\mu }}_{3}^{\prime }\). Now,

$$\begin{aligned} \frac{\partial f(\beta )}{\partial \beta }=-\frac{3}{N}\sum \limits _{i=1}^{N}[y_{i}-{\bar{y}}-(x_{i}-{\bar{x}})^{\prime }\beta ]^{2}(x_{i}- {\bar{x}}), \end{aligned}$$

and

$$\begin{aligned} \frac{\partial f(\beta )}{\partial \beta }_{|{\hat{\beta }}_\mathrm{OLS}}=-\frac{3}{N} \sum _{i=1}^{N}e_{i}^{2}(x_{i}-{\bar{x}}), \end{aligned}$$

where \(e_{i}\) denotes the OLS residual \( y_{i}-x_{i}^{\prime }{\hat{\beta }}_\mathrm{OLS}\). Its sample mean is equal to zero since a constant term is included in the regression. Hence, an approximation of the constraint (7) is

$$\begin{aligned} {\hat{\mu }}_{3}^{\prime }-\frac{3}{N}\sum _{i=1}^{N}e_{i}^{2}(x_{i}-{\bar{x}} )^{\prime }(\beta -{\hat{\beta }}_\mathrm{OLS})=g(k_{0})\cdot ({\hat{\mu }}_{2}^{\prime })^{3/2}, \end{aligned}$$
(8)

or

$$\begin{aligned} \left[ \frac{1}{N}\sum _{i=1}^{N}e_{i}^{2}(x_{i}-{\bar{x}})\right] ^{\prime }(\beta - {\hat{\beta }}_\mathrm{OLS})=\frac{{\hat{\mu }}_{3}^{\prime }}{3}-\frac{g(k_{0})}{3}(\hat{ \mu }_{2}^{\prime })^{3/2}. \end{aligned}$$
(9)

Letting the \(N\times 1\) vector \({\tilde{e}}\) be the squared OLS residual vector \((e_{1}^{2},...,e_{N}^{2})^{\prime }\), the constraint above can be written in matrix form as

$$\begin{aligned} \frac{1}{N}{\tilde{e}}^{\prime }M_{0}X(\beta -{\hat{\beta }}_\mathrm{OLS})=\frac{\hat{\mu }_{3}^{\prime }}{3}-\frac{g(k_{0})}{3}({\hat{\mu }}_{2}^{\prime })^{3/2}, \end{aligned}$$

where \(M_{0}=I-\frac{1}{N}\iota \iota ^{\prime }\) and \(\iota =(1,...,1)^{\prime }\). Thus, the linear constraint above can be written as

$$\begin{aligned} R\beta =q(k_{0}) \end{aligned}$$
(10)

with \(R=\frac{1}{N}{\tilde{e}}^{\prime }M_{0}X\) and \(q(k_{0})=R{\hat{\beta }} _\mathrm{OLS}+\frac{{\hat{\mu }}_{3}^{\prime }}{3}+\frac{\Pi }{3}k_{0}^{3/2}({\hat{\mu }} _{2}^{\prime })^{3/2}\), where \(\Pi \equiv \gamma _{1}(u_{i})\simeq 0.9953\), so that \(q(k_{0})\) depends on the value of \(k_{0}\).Footnote 11
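Both \(R\) and \(q(k_{0})\) are simple functions of OLS output. A minimal sketch (our naming conventions; `Pi` stands for \(\gamma _{1}(u_{i})\)):

```python
import numpy as np

def linear_constraint(X, y, k0, Pi=0.9953):
    """Build R and q(k0) of Eq. (10); X includes a column of ones."""
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_ols                         # OLS residuals (mean zero)
    mu2, mu3 = np.mean(e**2), np.mean(e**3)      # second and third central moments
    R = (e**2) @ (X - X.mean(axis=0)) / len(y)   # (1/N) e~' M0 X
    q = R @ beta_ols + mu3 / 3 + (Pi / 3) * k0**1.5 * mu2**1.5
    return R, q, beta_ols, mu2, mu3
```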

Therefore, the proposed finite sample correction for MLE of \((\beta ,\lambda ,\sigma ^{2})\), i.e., the constrained MLE, is defined as the solution to maximizing the likelihood (2) subject to the linear constraint (10). The corresponding estimators of \(\sigma _{u}^{2}\) and \(\sigma _{v}^{2}\) can be obtained by using the relationship \(\sigma ^{2}=\sigma _{u}^{2}+\sigma _{v}^{2}\) and \(\lambda =\sigma _{u}/\sigma _{v}\).

Similarly, the constrained COLS of \(\beta \) is defined to minimize the sum of squared residuals subject to (10). As in the unconstrained estimation, the constrained estimators of \(\sigma _{u}^{2}\) and \(\sigma _{v}^{2}\) can be obtained by formula (3).

If \(k_{0}=0\), then \(g(k_{0})=0\) and the constraint above becomes \(R(\beta - {\hat{\beta }}_\mathrm{OLS})={\hat{\mu }}_{3}^{\prime }/3\). This implies that the constrained and unconstrained estimators would be similar, since \({\hat{\mu }} _{3}^{\prime }\) is usually very small in the presence of wrong skewness. In the extreme case of \({\hat{\mu }}_{3}^{\prime }=0\), the constrained estimator reduces to OLS, which is a local maximum of the likelihood.

Using the linearized constraint (10), the estimates, standard errors, and confidence intervals of the constrained MLE and constrained COLS can be easily obtained using Stata or other existing software.Footnote 12
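As an alternative to Stata, the constrained MLE can also be sketched with an off-the-shelf optimizer. The SciPy-based snippet below is our own illustration, not the authors' code; it reuses `linear_constraint` from above and maximizes the likelihood (2) subject to (10):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def constrained_mle(X, y, k0):
    """Maximize the log-likelihood (2) subject to the linear constraint (10)."""
    N, p = X.shape
    R, q, beta_ols, mu2, _ = linear_constraint(X, y, k0)

    def negloglik(theta):
        beta, lam, sig2 = theta[:p], theta[p], theta[p + 1]
        eps = y - X @ beta
        # log[1 - Phi(z)] computed stably as norm.logcdf(-z)
        ll = (N / 2) * np.log(2 / np.pi) - (N / 2) * np.log(sig2) \
             + np.sum(norm.logcdf(-lam * eps / np.sqrt(sig2))) \
             - np.sum(eps**2) / (2 * sig2)
        return -ll

    cons = [{"type": "eq", "fun": lambda th: R @ th[:p] - q}]   # R beta = q(k0)
    bounds = [(None, None)] * p + [(0.0, None), (1e-8, None)]   # lambda, sigma^2 >= 0
    x0 = np.concatenate([beta_ols, [0.1, mu2]])                 # start near OLS
    res = minimize(negloglik, x0, method="SLSQP", bounds=bounds, constraints=cons)
    return res.x[:p], res.x[p], res.x[p + 1]                    # beta, lambda, sigma^2
```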

However, since (10) does not guarantee a negative residual skewness in finite samples, there is a possibility that wrong skewness could still occur after our correction. The Monte Carlo experiments below show that this may only be a concern when the underlying inefficiency variance ratio is very small. However, as stressed above, by setting \( k_{0}>0\), a nonzero estimated inefficiency may be obtained even in the presence of wrong residual skewness.

3.2 Choosing the value of \(k_{0}\)

The idea of the proposed constrained estimators is to adjust the slope coefficients to obtain a correct sign of residual skewness using the constraint (10), which is a function of \(k_{0}\). When the chosen value of \(k_{0}\) is small, the resulting adjustment is slight, and the constrained MLE (or constrained COLS) will be close to its unconstrained counterpart.

Choosing a specific value of \(k_{0}\) is an empirical issue. On the one hand, when there is a priori evidence of inefficiency, the inefficiency variance ratio cannot be too small. On the other hand, as illustrated by the Monte Carlo study in Simar and Wilson (2010), wrong skewness is less likely to occur as the inefficiency variance ratio increases.Footnote 13 In the spirit of this trade-off, we develop a model selection criterion to choose \(k_{0}\). The idea is to incorporate a penalty function, so that as \(k_{0}\) increases the penalty decreases. Hence, the fit of the model and the effect of the constraint on the optimum can be balanced.

For constrained MLE, we propose a Bayesian information criterion (BIC) via the likelihood to choose the value of \(k_{0}\):

$$\begin{aligned} BIC(k_{0})=-2l_{r}(k_{0})-k_{0}\ln N, \end{aligned}$$

where \(l_{r}(k_{0})\) is the log-likelihood evaluated at the constrained MLE of \((\beta ,\lambda ,\sigma ^{2})\), which depends on \(k_{0}\). Since OLS \( (b,0,s^{2})\) is a local maximum of the log-likelihood function in the presence of positive skewness with a restriction on \(k_{0}\), the value of \( l_{r}(k_{0})\) decreases with \(k_{0}\) in the neighborhood of \((b,0,s^{2})\).Footnote 14 Unlike the usual BIC, here we use a negative sign in front of the penalty term \(k_{0}\ln N\), so that \(-2l_{r}(k_{0})\) and \(-k_{0}\ln N\) move in opposite directions with \(k_{0}\). An optimal value of \(k_{0}\) is chosen to minimize \(BIC(k_{0})\):

$$\begin{aligned} {\tilde{k}}_{0}=\arg \min _{k_{0}\in [0,1)}BIC(k_{0}). \end{aligned}$$

Similarly, for the constrained COLS, a criterion based on sum of squared residuals is proposed to select the value of \(k_{0}\):

$$\begin{aligned} C(k_{0})=\frac{1}{N}SSR_{r}(k_{0})-k_{0}{\hat{\sigma }}_{\varepsilon }^{2}\frac{ \ln N}{N}, \end{aligned}$$

where \(SSR_{r}(k_{0})\) is the sum of squared residuals of OLS with the constraint (10). \(C(k_{0})\) is a Mallows’ \(C_{p}\)-type criterion, similar to the expression proposed by Bai and Ng (2002) to choose the number of factors in approximate factor models, except that the penalty term takes a negative sign. By applying the properties of the usual restricted least squares, it can be shown that \(SSR_{r}(k_{0})\) increases with \(k_{0}\) (see “Appendix”). Hence, the effect of increasing \(k_{0}\) on the model fit can be balanced by the penalty term, and an appropriate value of \(k_{0}\) is chosen to minimize \(C(k_{0})\):

$$\begin{aligned} {\hat{k}}_{0}=\arg \min _{k_{0}\in [0,1)}C(k_{0}). \end{aligned}$$

The estimated error variance \({\hat{\sigma }}_{\varepsilon }^{2}\) provides an appropriate scaling to the penalty term. Here, we use \({\hat{\sigma }} _{\varepsilon }^{2}=\frac{1}{N}SSR\), where SSR is the sum of squared residuals of OLS without constraint.

In practice, to find the value of \({\tilde{k}}_{0}\) (or \({\hat{k}}_{0}\)), a grid search can be applied to \(BIC(k_{0})\) (or \(C(k_{0})\)), starting from a small positive value, e.g., 0.05.
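Such a grid search is only a few lines; the helper below (our own sketch) accepts either criterion as a callable:

```python
import numpy as np

def choose_k0(criterion, lo=0.05, hi=0.90, step=0.01):
    """Return the grid value of k0 minimizing criterion(k0), e.g., BIC or C."""
    grid = np.arange(lo, hi + step, step)
    return grid[int(np.argmin([criterion(k0) for k0 in grid]))]
```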

Since the measures of model fit in the constrained MLE and COLS (i.e., the objective functions of the penalized maximum likelihood and the penalized least squares) are different, \({\tilde{k}}_{0}\) is not necessarily equal to \({\hat{k}}_{0}\). However, in the neighborhood of OLS \((b,0,s^{2})\) with a small value of \(\lambda \), when the partial derivatives of the term \(\sum \nolimits _{i=1}^{N}\ln \left[ 1-\Phi \left( \frac{\lambda }{\sqrt{\sigma ^{2}}}(y_{i}-x_{i}^{\prime }\beta )\right) \right] \) in \(l(\beta ,\lambda ,\sigma ^{2})\) are small in the first-order conditions, \({\tilde{k}}_{0}\) should be close to \({\hat{k}}_{0}\).

It is worthwhile to note that \(k_{0}\) is not a model parameter here and is selected by the proposed selection criteria only for finite sample correction. Thus, choosing \(k_{0}\) is inherently different from model selection in the literature, such as choosing the number of model parameters, where consistency is a primary requirement for the penalty term. Therefore, we could use different penalty terms in \(BIC(k_{0})\) or \(C(k_{0})\) above as long as a unique value of \(k_{0}\) can be chosen. The Monte Carlo experiments and empirical example below suggest that the proposed selection criteria work well.Footnote 15

4 Constrained estimators

With the proposed finite sample adjustment, the sample can still be used to construct a point estimate for inferring population parameters in the presence of wrong skewness. This is similar in spirit to Simar and Wilson (2010), who still rely on the MLE estimation results, but provide more accurate interval estimates using improved inference (bagging) methods.

As previously mentioned, any negative constraint on sample skewness is binding in the presence of wrong skewness. This result implies that the estimated \(\lambda \) (or k) is implicitly determined by the constraint (10). Consequently, it is biased when the selected value of \(k_{0}\), the lower bound of k, is not equal to the true value of k. Inconsistency of the proposed constrained estimators might be a concern. However, this concern may be overstated. Under the true specification, as the sample size increases, wrong skewness is less likely to appear. Thus, the proposed finite sample adjustment becomes unnecessary, and asymptotics are less of a concern. In addition, given its finite sample nature, the proposed method should be regarded as an adjustment to existing estimators, rather than as a new estimator.Footnote 16

In the next subsection, properties of constrained estimators are studied. Since the constrained COLS is essentially restricted least squares, which has an analytical solution, we mainly focus on it.

4.1 Constrained COLS

The proposed constrained COLS, denoted by \({\hat{\beta }}_{r}\), is a two-step estimator. In the first step, for a given \(k_{0}\), the constrained COLS \( {\hat{\beta }}_{r}(k_{0})\) is defined as the solution of

$$\begin{aligned} \min _{\beta }SSR(\beta )= & {} \min _{\beta }(Y-X\beta )^{\prime }(Y-X\beta ) \\ s.t.\,R\beta= & {} q(k_{0}). \end{aligned}$$

In the second step, \(k_{0}\) is selected such that \({\hat{k}}_{0}=\arg \min _{k_{0}}C(k_{0})\), where \(C(k_{0})=\frac{1}{N}(Y-X{\hat{\beta }} _{r}(k_{0}))^{\prime }(Y-X{\hat{\beta }}_{r}(k_{0}))-k_{0}{\hat{\sigma }} _{\varepsilon }^{2}\frac{\ln N}{N}\). The proposed constrained COLS is defined as \({\hat{\beta }}_{r}={\hat{\beta }}_{r}({\hat{k}}_{0})\).

This two-step estimator is equivalent to a one-step penalized least squares with the linear constraint:

$$\begin{aligned}&\min _{\beta ,k_{0}}\frac{1}{N}(Y-X\beta )^{\prime }(Y-X\beta )-k_{0}\hat{ \sigma }_{\varepsilon }^{2}\frac{\ln N}{N} \\&\quad s.t.\,R\beta =q(k_{0}). \end{aligned}$$

This equivalence comes from the fact that in the objective function \(k_{0}\) only appears in the penalty term \(-k_{0}{\hat{\sigma }}_{\varepsilon }^{2}\frac{ \ln N}{N}\). Thus, \(\beta \) can be concentrated out for a given \(k_{0}\).

For a given \(k_{0}\), \({\hat{\beta }}_{r}(k_{0})\) is a restricted least squares estimator. By Amemiya (1985) or Greene (2012),

$$\begin{aligned} {\hat{\beta }}_{r}(k_{0})={\hat{\beta }}_\mathrm{OLS}-(X^{\prime }X)^{-1}R^{\prime }[R(X^{\prime }X)^{-1}R^{\prime }]^{-1}[R{\hat{\beta }}_\mathrm{OLS}-q(k_{0})], \end{aligned}$$

and

$$\begin{aligned} SSR_{r}(k_{0})=SSR+[R{\hat{\beta }}_\mathrm{OLS}-q(k_{0})]^{\prime }[R(X^{\prime }X)^{-1}R^{\prime }]^{-1}[R{\hat{\beta }}_\mathrm{OLS}-q(k_{0})]. \end{aligned}$$

Thus, the criterion is

$$\begin{aligned} C(k_{0})= & {} \frac{1}{N}SSR+\frac{1}{N}[R{\hat{\beta }}_\mathrm{OLS}-q(k_{0})]^{\prime }[R(X^{\prime }X)^{-1}R^{\prime }]^{-1}[R{\hat{\beta }}_\mathrm{OLS}-q(k_{0})]\\&-k_{0} {\hat{\sigma }}_{\varepsilon }^{2}\frac{\ln N}{N}. \end{aligned}$$
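Because everything is in closed form, evaluating \(C(k_{0})\) over a grid is cheap. A sketch (ours), reusing `linear_constraint` and `choose_k0` from above:

```python
import numpy as np

def constrained_cols(X, y, k0):
    """Restricted least squares beta_r(k0) and criterion C(k0)."""
    R, q, beta_ols, _, _ = linear_constraint(X, y, k0)
    N = len(y)
    XtX_inv = np.linalg.inv(X.T @ X)
    denom = R @ XtX_inv @ R                  # scalar: R (X'X)^{-1} R'
    d = R @ beta_ols - q                     # constraint violation at OLS
    beta_r = beta_ols - (XtX_inv @ R) * d / denom
    ssr = np.sum((y - X @ beta_ols)**2)      # unrestricted SSR
    ssr_r = ssr + d**2 / denom               # restricted SSR
    C = ssr_r / N - k0 * (ssr / N) * np.log(N) / N
    return beta_r, C

# usage: k0_hat = choose_k0(lambda k0: constrained_cols(X, y, k0)[1])
```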

Minimizing \(C(k_{0})\) defines \({\hat{k}}_{0}\). The following proposition establishes the existence and uniqueness of \({\hat{k}}_{0}\).

Proposition 1

In the presence of positive skewness of OLS residuals, i.e., \({\hat{\mu }} _{3}^{\prime }>0\), (i) \(\frac{dSSR_{r}(k_{0})}{dk_{0}}>0\); (ii) for a relatively large sample size N, there exists a solution for \({\hat{k}}_{0}\) such that \({\hat{k}}_{0}\) minimizes \(C(k_{0})\); (iii) \(\frac{d^{2}C(k_{0})}{ dk_{0}^{2}}>0\), implying that \({\hat{k}}_{0}\) is the unique solution.

The proof in “Appendix” shows that a relatively large N guarantees the existence of \({\hat{k}}_{0}\). Since \(\frac{\ln N}{N}\rightarrow 0\) as \(N\rightarrow \infty \), the penalty term \(-k_{0}{\hat{\sigma }}_{\varepsilon }^{2}\frac{\ln N}{N}\) in \(C(k_{0})\) can be ignored asymptotically, compared with the first term \(\frac{1}{N}SSR_{r}(k_{0})\), which converges to a nonzero constant. This implies that \({\hat{k}}_{0}\rightarrow 0\) as \(N\rightarrow \infty \). Hence, when N is large, the proposed constrained COLS approaches the OLS with constraint \( R(\beta -{\hat{\beta }}_\mathrm{OLS})={\hat{\mu }}_{3}^{\prime }/3\), which is very close to OLS in the presence of wrong skewness.

This property also implies that in a sample with a large number of firms, the selected \({\hat{k}}_{0}\) could be 0. In this case, to obtain nonzero inefficiency estimates, a small positive value, say, 0.05, is suggested.Footnote 17

For a given sample, the difference between OLS and the constrained COLS

$$\begin{aligned} {\hat{\beta }}_\mathrm{OLS}-{\hat{\beta }}_{r}=(X^{\prime }X)^{-1}R^{\prime }[R(X^{\prime }X)^{-1}R^{\prime }]^{-1}[R{\hat{\beta }}_\mathrm{OLS}-q({\hat{k}}_{0})] \end{aligned}$$

depends on \({\hat{k}}_{0}\), and

$$\begin{aligned} \frac{d[{\hat{\beta }}_\mathrm{OLS}-{\hat{\beta }}_{r}]}{d{\hat{k}}_{0}}=-(X^{\prime }X)^{-1}R^{\prime }[R(X^{\prime }X)^{-1}R^{\prime }]^{-1}\frac{\Pi }{2}\hat{k }_{0}^{1/2}({\hat{\mu }}_{2}^{\prime })^{3/2} \end{aligned}$$

implying that the magnitude of this difference increases with the chosen value \({\hat{k}}_{0}\).

4.2 Constrained MLE

For a given \(k_{0}\), the constrained MLE \(({\hat{\beta }}_{CMLE}(k_{0}),\hat{ \lambda }_{CMLE}(k_{0}),{\hat{\sigma }}_{CMLE}^{2}(k_{0}))\) depends on \(k_{0}\). Minimizing \(BIC(k_{0})\) determines the value of \(k_{0}\), i.e., \({\tilde{k}} _{0}=\arg \min _{k_{0}\in [0,1)}BIC(k_{0})\). Similar to the constrained COLS, \(({\hat{\beta }}_{CMLE},{\hat{\lambda }}_{CMLE},{\hat{\sigma }} _{CMLE}^{2})\) is defined as \(({\hat{\beta }}_{CMLE}({\tilde{k}}_{0}),\hat{\lambda }_{CMLE}({\tilde{k}}_{0}),{\hat{\sigma }}_{CMLE}^{2}({\tilde{k}}_{0}))\). It can also be written as a penalized maximum likelihood estimator with a constraint,

$$\begin{aligned}&\min _{\beta ,\lambda ,\sigma ^{2},k_{0}}-2l(\beta ,\lambda ,\sigma ^{2})-k_{0}\ln N \\&\quad s.t.\,R\beta =q(k_{0}), \end{aligned}$$

where

$$\begin{aligned} l(\beta ,\lambda ,\sigma ^{2})= & {} \frac{N}{2}\ln (\frac{2}{\pi })-\frac{N }{2}\ln \sigma ^{2}+\sum \limits _{i=1}^{N}\ln \left[ 1-\Phi \left( \frac{ \lambda }{\sqrt{\sigma ^{2}}}(y_{i}-x_{i}^{\prime }\beta )\right) \right] \\&- \frac{1}{2\sigma ^{2}}\sum \limits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta )^{2} \end{aligned}$$

defined in (2).

Since there is no analytical solution to the constrained optimization problem above, it is difficult to derive the properties of constrained MLE.

However, dividing by N gives \(\frac{1}{N}BIC(k_{0})=-\frac{2}{N}l_{r}(k_{0})-k_{0}\frac{\ln N}{N}\). Compared with \(-\frac{2}{N}l_{r}(k_{0})\), which does not converge to zero, the penalty term \(-k_{0}\frac{\ln N}{N}\) can be ignored asymptotically as \(N\rightarrow \infty \), implying that \({\tilde{k}}_{0}\) tends to 0 as \(N\rightarrow \infty \). Since \({\tilde{k}}_{0}\) is small when N is large, the proposed constrained MLE is expected to be close to the unconstrained MLE. Since the MLE of the slope parameters is very close to OLS, the constrained MLE and constrained COLS are expected to be close. As with the constrained COLS, the selected \({\tilde{k}}_{0}\) could be 0 in a sample with a large N. In this case, we also impose a lower bound of, say, 0.05, to obtain nonzero estimated inefficiency.

We now consider the difference between constrained MLE and OLS by examining the first-order conditions of (2). Aigner et al. (1977) show that:

$$\begin{aligned} \frac{\partial \ln L}{\partial \sigma ^{2}}= & {} -\frac{N}{2\sigma ^{2}}+\frac{ 1}{2\sigma ^{4}}\sum \limits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta )^{2}\nonumber \\&+\frac{ \lambda }{2\sigma ^{3}}\sum \limits _{i=1}^{N}\frac{\phi (\cdot )}{1-\Phi (\cdot )}(y_{i}-x_{i}^{\prime }\beta )=0; \end{aligned}$$
(11)
$$\begin{aligned} \frac{\partial \ln L}{\partial \lambda }= & {} -\frac{1}{\sigma } \sum \limits _{i=1}^{N}\frac{\phi (\cdot )}{1-\Phi (\cdot )} (y_{i}-x_{i}^{\prime }\beta )=0; \end{aligned}$$
(12)
$$\begin{aligned} \frac{\partial \ln L}{\partial \beta }= & {} \frac{1}{\sigma ^{2}} \sum \limits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta )x_{i}+\frac{\lambda }{ \sigma }\sum \limits _{i=1}^{N}\frac{\phi (\cdot )}{1-\Phi (\cdot )}x_{i}=0, \end{aligned}$$
(13)

where \(\phi (\cdot )\) is the standard normal density function. \(\phi (\cdot ) \) and \(\Phi (\cdot )\) are evaluated at \(\frac{\lambda }{\sigma } (y_{i}-x_{i}^{\prime }\beta )=\frac{\lambda }{\sigma }\varepsilon _{i}\). Waldman (1982) shows that in the presence of wrong skewness \(\lambda =0\) and OLS is a local maximum of the log-likelihood.

For our constrained MLE, the constraint (7) or (9) involves the value of \(k_{0}\), not \(\lambda \) directly. Since \(\lambda \) is a monotonically increasing function of k, \(k\ge k_{0}\) implies

$$\begin{aligned} \lambda \ge \sqrt{\frac{\pi }{\pi -2}\frac{1}{(1/k_{0}-1)}}. \end{aligned}$$
(14)
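For illustration (our arithmetic), \(k_{0}=0.15\) implies \(\lambda \ge \sqrt{\frac{\pi }{\pi -2}\times \frac{1}{1/0.15-1}}=\sqrt{2.7519\times 0.1765}\approx 0.697\).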

To show how restricting \(\lambda \) affects the estimation result and how the constrained MLE of \(\beta \) is different from the OLS, consider equation (13).Footnote 18 Taking the first-order Taylor expansion at \(\lambda =0\) gives

$$\begin{aligned} \frac{\phi \left( \frac{\lambda }{\sigma }\varepsilon _{i}\right) }{1-\Phi \left( \frac{\lambda }{\sigma }\varepsilon _{i}\right) }\approx \sqrt{\frac{2}{\pi }}+\frac{2}{\pi } \frac{\lambda }{\sigma }\varepsilon _{i}. \end{aligned}$$

Thus, (13) becomes

$$\begin{aligned} 0= & {} \sum \limits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta )x_{i}+\lambda \sigma \sum \limits _{i=1}^{N}\frac{\phi (\frac{\lambda }{\sigma }\varepsilon _{i})}{ 1-\Phi \left( \frac{\lambda }{\sigma }\varepsilon _{i}\right) }x_{i}\approx \sum \limits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta )x_{i}\\&+\lambda \sigma \sum \limits _{i=1}^{N}\left( \sqrt{\frac{2}{\pi }}+\frac{2}{\pi }\frac{\lambda }{ \sigma }\varepsilon _{i}\right) x_{i} \\= & {} \left( 1+\frac{2}{\pi }\lambda ^{2}\right) \sum \limits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta )x_{i}+\sqrt{\frac{2}{\pi }}\lambda \sigma \sum \limits _{i=1}^{N}x_{i}. \end{aligned}$$

That is,

$$\begin{aligned} \sum \limits _{i=1}^{N}(y_{i}-x_{i}^{\prime }\beta )x_{i}+\frac{\sqrt{\frac{2}{ \pi }}\lambda }{\left( 1+\frac{2}{\pi }\lambda ^{2}\right) }\sqrt{\sigma ^{2}} \sum \limits _{i=1}^{N}x_{i}=0. \end{aligned}$$
(15)

In matrix form, the equation (15) above can be written as

$$\begin{aligned} X^{\prime }y-X^{\prime }X\beta +\varphi (\lambda )\sqrt{\sigma ^{2}} X^{\prime }\iota =0 \end{aligned}$$
(16)

where \(\varphi (\lambda )=\sqrt{\frac{2}{\pi }}\lambda /(1+\frac{2}{\pi } \lambda ^{2})\) and \(\iota \) is the \(N\times 1\) vector of ones. Equivalently,

$$\begin{aligned} {\hat{\beta }}_{CMLE}\simeq (X^{\prime }X)^{-1}X^{\prime }y+\varphi (\lambda ) \sqrt{\sigma ^{2}}(X^{\prime }X)^{-1}X^{\prime }\iota . \end{aligned}$$
(17)

In the presence of wrong skewness, OLS (i.e., \(\lambda =\varphi =0\)) is a local maximum of the log-likelihood. Under the constraint (14), the estimator of \(\beta \) is adjusted by the second term in equation (17).Footnote 19

Given that \(\varphi (\lambda )\) is monotonically increasing in \(\lambda \) on the range [0, \(\sqrt{\pi /2}\approx 1.2533]\), the difference between the constrained MLE and the OLS of \(\beta \) is positively related to the value of \(\lambda \).Footnote 20 The larger the imposed \(\lambda \) (or \(k_{0}\)), the bigger the difference between the OLS and the constrained MLE. Furthermore, in a given sample this difference depends not only on \(\varphi (\lambda )\), but also on the sample values of the regressors and on \(\sigma ^{2}\), jointly determined by the first-order equations. We conjecture that constraint (10) with a small value of \(k_{0}\) slightly adjusts the estimators of \(\beta \) and \(\sigma _{v}^{2}\), but has a much larger effect on the estimated \(\sigma _{u}^{2}\) and \(\lambda \). This point is confirmed in the Monte Carlo experiments and empirical example below.
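A quick numerical check (ours) confirms this behavior; note that \(\varphi \) attains its maximum value of exactly 0.5 at \(\lambda =\sqrt{\pi /2}\), since \(\varphi (\sqrt{\pi /2})=\sqrt{2/\pi }\sqrt{\pi /2}/(1+1)=1/2\):

```python
import numpy as np

def varphi(lam):
    """Adjustment coefficient in Eq. (17)."""
    return np.sqrt(2 / np.pi) * lam / (1 + (2 / np.pi) * lam**2)

for lam in (0.25, 0.5, 1.0, np.sqrt(np.pi / 2)):
    print(f"varphi({lam:.4f}) = {varphi(lam):.4f}")
# increasing on [0, sqrt(pi/2)]: 0.1918, 0.3442, 0.4875, 0.5000
```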

5 Monte Carlo experiments

In this section, Monte Carlo experiments are conducted to study how the proposed constraints affect the estimates and how the chosen value of \(k_{0}\), the imposed lower bound of k, is affected by the sample size. Since respecification is always a solution to wrong skewness, we also study our approach under model misspecification. We consider two forms of misspecification: when the inefficiency distribution is binomial (Carree 2002) and when it is exponential. The binomial case is particularly relevant since it is a specification that allows for residual skewness in either direction. We compare our approach under misspecification to the correctly specified estimators in terms of the root mean squared error (RMSE) of the estimated inefficiency variance and average efficiency, and the proposed method’s performance is surprisingly good (given the misspecification) when the sample size is small. Due to its computational convenience, our focus is on the constrained COLS estimator.

We consider a specification

$$\begin{aligned} y_{i}=\beta _{0}+\beta _{1}x_{1i}+\beta _{2}x_{2i}+\varepsilon _{i},\, \varepsilon _{i}=-u_{i}+v_{i},\ \ i=1,\ldots ,N, \end{aligned}$$

where \(\beta _{0}=1,\) \(\beta _{1}=0.8,\) \(\beta _{2}=0.2\), \(x_{1i}\sim \log (|N(4,100)|)\), \(x_{2i}\sim \log (|N(2,60)|)\), \(v_{i}\sim N(0,\sigma _{v}^{2})\) and \(u_{i}\sim |N(0,\sigma _{u}^{2})|\). \(k=Var(u_{i})/Var(\varepsilon _{i})\) is the inefficiency variance ratio.Footnote 21 \(\sigma _{u}^{2}=\frac{\pi }{\pi -2}Var(u_{i})=\frac{\pi }{\pi -2}kVar(\varepsilon _{i})\) and \(\sigma _{v}^{2}=(1-k)Var(\varepsilon _{i})\). We set \(Var(\varepsilon _{i})=\sigma _{v}^{2}+Var(u_{i})=0.026\), so that the variances of \(x_{1i}\) and \(\varepsilon _{i}\) are comparable to those in the empirical example below.

Since the focus is the proposed correction for samples with wrong residual skewness, we report the frequency of wrong skewness in the replications and drop the samples with correct skewness. The number of replications is 4,000 before dropping the samples with correct skewness. We conduct experiments with \(k=0.1\), 0.2, 0.3, 0.5, 0.7 and \(N=50\), 100, 200. For our binomial misspecification experiments, we generate \(u_{i}\sim B(10,p)\) with \(Var(u)=p(1-p)\). For exponential misspecification, we generate \(u_{i}\sim Exp(\sigma _{u})\) with \(Var(u)=\sigma _{u}^{2}\). The same inefficiency variance ratio and sample size combinations are considered in both cases.
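One replication of the baseline design can be sketched as follows (our code; we read \(N(4,100)\) as mean 4 and variance 100, and details such as seeds are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_sample(N, k, var_eps=0.026):
    """One draw from the normal-half normal design of Sect. 5."""
    sigma_u2 = np.pi / (np.pi - 2) * k * var_eps    # since Var(u) = k * Var(eps)
    sigma_v2 = (1 - k) * var_eps
    x1 = np.log(np.abs(rng.normal(4, np.sqrt(100), N)))
    x2 = np.log(np.abs(rng.normal(2, np.sqrt(60), N)))
    u = np.abs(rng.normal(0, np.sqrt(sigma_u2), N))
    v = rng.normal(0, np.sqrt(sigma_v2), N)
    y = 1.0 + 0.8 * x1 + 0.2 * x2 + v - u
    return np.column_stack([np.ones(N), x1, x2]), y
```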

Table 1 Monte Carlo results: constrained COLS

Table 1 reports the simulation results. Column (2) gives the average value of \({\hat{k}}_{0}\). To obtain \({\hat{k}}_{0}\) for each sample, a grid search is conducted to minimize \(C(k_{0})\) on the interval [0.05, 0.9]. As expected, the average value of \({\hat{k}}_{0}\) decreases with N. Column (3) shows that there is still a possibility of wrong skewness after the proposed finite sample correction. The frequency depends on the inefficiency variance ratio and sample size, varying from 16.3% to \(39.9\%\). For example, for \(k=0.5\) and \(N=100\), our finite sample correction approach fails with probability \(28.4\%\). This failure is a cost of the linearization approximation (8). When \(k_{0}\) is small, \(g(k_{0})({\hat{\mu }}_{2}^{\prime })^{3/2}\) could be a small negative value close to zero. Consequently, due to approximation error, the linearized constraint does not guarantee a negative third moment of residuals or negative skewness. However, as k increases, the failure frequency is greatly reduced, e.g., to \(16.3\%\) for \(k=0.7\), \(N=200\).

For the parameter estimators, columns (4)–(7) indicate that with the correction of \({\hat{\sigma }}_{u}\sqrt{2/\pi }\), the constrained COLS of \(\beta _{0}\) is less biased than OLS, but has a much larger root mean squared error (RMSE). As k and N increase, however, the RMSE of constrained COLS becomes comparable to that of OLS. (Bias and RMSE of OLS of \(\beta _{0}\) (and \(\beta _{1}\)) are included in columns (6), (8) (and (10), (12)) for comparison.) In addition, compared with OLS, the constrained COLS of \(\beta _{1}\) is slightly upward biased with a larger RMSE, and the bias and RMSE decrease with k and N.

In the presence of wrong skewness, \(\sigma _{u}^{2}\) is typically assumed to be zero. Using our correction, column (12) shows that \(\sigma _{u}^{2}\) tends to be overestimated for small values of k and underestimated for large values of k. Compared with \(\sigma _{u}^{2}\), \(\sigma _{v}^{2}\) can be estimated more accurately in terms of bias, as indicated in column (14).

Columns (14) and (15) report the bias and RMSE of the mean technical efficiency \(E[\exp (-u_{i})]=2\exp (\sigma _{u}^{2}/2)[1-\Phi (\sigma _{u})]\). In the presence of wrong skewness, traditional practice suggests that the estimated \(\sigma _{u}^{2}\) is 0, implying that the estimated mean technical efficiency is 1. This practice obviously overestimates the true mean technical efficiency. Column (14) shows that the mean technical efficiency estimator using the proposed correction can be nearly unbiased with a reasonable RMSE for a sizable value of k (0.2 under the current design). It is downward biased for smaller values of k and upward biased for \(k>0.2\).

Table 2 Monte Carlo results: comparison of constrained COLS and Carree (2002)

Under binomial misspecification, we focus on the performance of the estimate of \(\sigma _{u}^{2}\) and the mean technical efficiency score \(E(e^{-u})\). The results are reported in Table 2. Columns (1)–(3) are very similar to those in Table 1, except that the frequency of wrong skewness after applying the proposed finite sample correction is higher. This makes sense given that the model is misspecified in the current setting. Columns (4)–(7) report the coefficient estimates for the proposed constrained COLS, and columns (12)–(15) those for Carree’s (2002) approach, which is OLS estimated with a binomial inefficiency. Columns (8)–(11) report the bias and RMSE of the variance estimates and mean technical efficiency estimates for the proposed constrained COLS, and columns (16)–(19) the corresponding results for the Carree (2002) approach. We find that the proposed method works well (i.e., smaller bias and RMSE) when the sample size or k is small.Footnote 22 As the sample size increases, the precision of the binomial estimation improves quickly and becomes similar to that of the proposed method.

Table 3 Monte Carlo results: constrained COLS with Exponential inefficiency

We also conduct misspecified simulations, assuming that inefficiency follows an exponential distribution, while applying our proposed finite sample correction in Sect. 4 based on a half-normal distribution. The results are reported in Table 3. In general, the performance of the proposed estimator is similar to that in Table 1 (the normal-half normal design), except for that of the mean technical efficiency, which depends heavily on the distribution of the data generating process. The bias of the variance estimator is small, indicating a degree of robustness of our approach.

6 Empirical example: the US airline industry

Table 4 Estimates of Airlines Example: constrained MLE

In this section, an airlines example is used to illustrate our approach. This is an unbalanced panel data set with 256 observations. See Greene (2007) for detailed information on this data set. In this example, the dependent variable is the logarithm of output, and the independent variables include the logarithms of fuel, materials, equipment, labor, and property. Here, the unbalanced panel is treated as a cross section of 256 firms to ensure that the wrong skewness issue arises.Footnote 23 Column (2) of Table 4 presents the OLS estimates along with standard errors (column 3). Except for the constant term, the slope coefficients are consistent with Table 2.11 in Greene (2007). The OLS residual skewness (0.0167) is in the wrong direction for the estimated normal-half normal model. Thus, the estimates of \(\lambda \) and \(\sigma _{u}^{2}\) are set to zero and firms are considered to be “super efficient.” However, Greene (2007, footnote 84) does suggest that there is evidence of technical inefficiency in the data. The second root of the likelihood with positive \(\lambda \) is reported in the second section of Table 4. This MLE yields a small positive residual skewness of 0.0093.Footnote 24 Usually, in the presence of “wrong” skewness, researchers are advised to obtain a new sample or respecify the model. For the purpose of comparison, we also report the \(95\%\) confidence (prediction) intervals for MLE obtained using the bagging procedure of Simar and Wilson (2010) in column (4) and the corresponding standard errors in column (5).

Instead, we use the constrained MLE (and constrained COLS), a finite sample adjustment to the existing MLE (and COLS). The optimal value of \(k_{0}\) can be chosen by \(BIC(k_{0})\) (and \(C(k_{0})\) for the constrained COLS) proposed above. For purposes of illustration, we present constrained MLE results for \(k_{0}=0.05\), 0.1, 0.15, and 0.2 in columns (8)–(15) of Table 4 and compare the values of \(BIC(k_{0})\), showing that \({\tilde{k}}_{0}=0.15\) achieves the minimum of \(BIC(k_{0})\). Thus, the constrained MLE of \(\lambda \) and \(\sigma _{u}\) are positive, 0.6829 and 0.1015, respectively. We also report the mean technical efficiency, i.e., \(E(e^{-u})\), for comparison. Furthermore, consistent with the negative population skewness of the composed error, the skewness of the constrained MLE residuals \((-\,0.0599)\) has the desired sign.
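For reference (our arithmetic), plugging \({\hat{\sigma }}_{u}=0.1015\) into the mean efficiency formula \(2\exp ({\hat{\sigma }}_{u}^{2}/2)[1-\Phi ({\hat{\sigma }}_{u})]\) from Sect. 5 gives approximately 0.924.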

Since the constraint only slightly adjusts the coefficients of the constrained MLE, as expected, the rest of the coefficients are very close to the unconstrained MLE and OLS. For example, the constrained estimated coefficient of the variable Log fuel is 0.3907 (column (12)), while its unconstrained counterpart is 0.3836 (column (4)) and the OLS coefficient is 0.3828 (column (2)). All estimates obtained from the constrained MLE stay within the bagging confidence intervals. In addition, the bagging standard errors are closer to those from the constrained MLE in column (12) than to those from the unconstrained MLE in column (5) or OLS in column (3). These results provide a degree of comfort regarding the validity of our proposed method.

Consistent with the analysis in Sect. 4.2, the difference between the constrained MLE slope coefficients and their OLS (and unconstrained MLE) counterparts is positively related to the magnitude of \(k_{0}\). The larger the value of \(k_{0}\), the larger the difference. However, this difference is relatively small. For example, the constrained estimated coefficient of the variable Log fuel using \(k_{0}=0.2\) is 0.3939 (in column (14) of Table 4), compared with the OLS 0.3828 and the unconstrained MLE 0.3836 (in columns (2) and (4) of Table 4). This is also the case for \(\sigma _{v}\) and \(\sigma ^{2}\). In stark contrast to this small difference in slope coefficients, the residual skewness and estimated k change significantly, since they are implicitly determined by the chosen value of \(k_{0}\) in the constraint. Another important point observed in Table 4 is that the value of the likelihood decreases with \(k_{0}\).Footnote 25

Table 5 Estimates of Airlines Example: constrained COLS

The results of the constrained COLS are reported in columns (8)–(15) of Table 5 and are very close to their constrained MLE counterparts for given values of \(k_{0}=0.05\), 0.1, 0.15, and 0.2.Footnote 26 However, for the constrained COLS, the optimal value of \(k_{0}\) is 0.1, obtained by applying the Mallows’ \(C_{p}\)-type criterion \(C(k_{0})\) proposed above. (Table 5 reports \(N\times C(k_{0})\) instead of \(C(k_{0})\).) This is slightly different from \({\tilde{k}}_{0}=0.15\) obtained by minimizing \(BIC(k_{0})\) in the constrained MLE. Therefore, the constrained COLS of \(\sigma _{u}\) is 0.0853 and the skewness is \(-\,0.0325\) in column (10).

It is worth mentioning that the value of criterion C(0.15) is nearly equal to C(0.1) in this empirical example, implying that \(BIC(k_{0})\) for the constrained MLE and \(C(k_{0})\) for the constrained COLS result in similar optimal values of \(k_{0}\). Again, all the estimates obtained from constrained COLS stay within the bagging confidence intervals by Simar and Wilson (2010) and the bagging standard errors are very close to those from constrained COLS.

Since the proposed finite sample adjustment restricts the inefficiency variance ratio, it indirectly affects the estimated \(\sigma _{u}^{2}\). In this example, it is \(0.1015^{2}\) for the constrained MLE. Consequently, the mean technical efficiency estimate, \(2\exp ({\hat{\sigma }}_{u}^{2}/2)[1-\Phi ({\hat{\sigma }}_{u})]\), depends on the chosen value of \(k_{0}\). However, efficiency rankings appear to be preserved under different choices of \(k_{0}\). For the unconstrained MLE, the least efficient firm is the 79th, with technical efficiency 0.8958. If we impose \(k_{0}=0.05\), 0.1, 0.15, 0.2 in the constraint, the technical efficiency becomes 0.8583, 0.8308, 0.8015, 0.7722, respectively, and it remains the lowest among the 256 firms. The most efficient firm is the 250th, with technical efficiency 0.9696, 0.9669, 0.9655, 0.9644, 0.9636 for the unconstrained MLE and the constrained MLE with \(k_{0}=0.05\), 0.1, 0.15, 0.2, respectively. This is also the case for the median firm.

Finally, we also calculate the bagging prediction intervals for the efficiency scores of the least efficient firm (i.e., the 79th firm) and the most efficient firm (i.e., the 250th firm): [0.5366, 0.9987] and [0.8865, 0.9988], respectively. All of the above estimates fall within these prediction intervals.

7 Conclusions

This paper studies the wrong skewness issue in parametric stochastic frontier models. Following Simar and Wilson (2010), we consider wrong skewness to be a consequence of estimation in finite samples when the inefficiency variance ratio is small. (Another possibility is that the model is misspecified.) In finite samples, the data may fail to be informative enough to detect the existence of the inefficiency term in stochastic frontier models, even though the population inefficiency variance ratio could be fairly large. Thus, the resulting residuals could display skewness in either direction with probability as high as 50%.

As an alternative to the usual “solutions” to the wrong skew problem, we propose a feasible finite sample adjustment to existing estimates. When there is evidence of inefficiency, it is reasonable to impose a lower bound on the inefficiency variance ratio in the normal-half normal model, equivalent to a negative upper bound on the residual skewness. Thus, we propose to use this negative bound on residual skewness as a constraint in the MLE and COLS in the event of wrong skewness.

The idea of the proposed constrained estimators is to slightly adjust the slope coefficients in finite samples. They provide a point estimate that yields a negative residual skewness, though a correct sign of residual skewness is not always guaranteed. Since the constraint is based on \(k_{0}\), the choice of \(k_{0}\) affects estimation results. A model selection approach is proposed to select \(k_{0}\). Monte Carlo experiments show that the bias of the constrained estimates is less of a concern when the sample size is large and the inefficiency variance ratio increases. The proposed method is comparable with existing methods in the literature, such as the binomial specification of Carree (2002). The empirical example in this paper also shows that the value of \(k_{0}\) has little effect on the estimated slope coefficients and \(\sigma _{v}\), \(\sigma ^{2}\), while the residual skewness and estimated k are implicitly determined by the value of \(k_{0}\). In this sense, the proposed method can be regarded as a finite sample adjustment to existing estimators, rather than a new estimator. When the sample size is large, wrong skewness is less likely to occur, so such an adjustment becomes unnecessary.