Evaluating the CDF of the distribution of the stochastic frontier composed error

Amsler, Christine; Schmidt, Peter; Tsay, Wen-Jen

doi:10.1007/s11123-019-00554-9

Published: 08 August 2019

Volume 52, pages 29–35, (2019)

Journal of Productivity Analysis

Abstract

In the stochastic frontier model, the composed error is the sum (or difference) of a normal and a half normal random variable. Often the composed error is linked to other errors using a copula, and evaluation of the copula requires evaluation of the cdf of the composed error. There is no analytical expression for this cdf, though there are several approximations. We propose a computationally efficient simulation based method of evaluation and use it to evaluate the accuracy of these approximations. We also derive the exact cdf of the composed error for the special case that the stochastic frontier relative variance parameter λ equals one, and we use this expression to investigate the accuracy of our evaluations and the existing approximations.

1 Introduction

This paper deals with the evaluation of the cumulative distribution function (cdf) of the stochastic frontier model’s composed error. Since the normal/half-normal composed error has a skew-normal distribution, we can also say that the paper deals with the evaluation of the cdf of the skew-normal distribution.

Evaluation of the skew-normal cdf may be important in a number of contexts, including at least the following two. (1) In many multi-equation models, or in a panel data setting, the composed error in a stochastic frontier production or cost function is linked to other errors using a copula. Some examples of this approach include Amsler et al. (2014, 2016, 2017), Carta and Steel (2012), Das (2015), Genius et al. (2012), Huang et al. (2017), Huang et al. (2018), Lai and Huang (2013), Shi and Zhang (2011), Sriboonchitta et al. (2017) and Tran and Tsionas (2015). The evaluation of the likelihood of such a model involves the calculation of the copula density, which in turn requires the calculation of the cdf of each of the marginal distributions of the various errors in the model. Therefore, if one of the errors is a stochastic frontier composed error, evaluating the likelihood requires calculation of the skew-normal cdf. (2) We may want to test the distributional assumptions of the stochastic frontier model by testing whether the composed error has a skew-normal distribution, as suggested by Wang et al. (2011). Their preferred test is a bootstrapped version of the Kolmogorov-Smirnov test, and its calculation requires the calculation of the skew-normal cdf.

There is no known closed form solution for the skew-normal cdf. It can be calculated by simulation, and there are some available approximations, such as Ashour and Abdul-Hameed (2010) and Tsay et al. (2013). In this paper we provide a simulation-based method which is computationally efficient relative to the simple empirical cdf. We use it to evaluate the accuracy of the existing approximations.

The paper has five main contributions. First, it proposes the new simulation-based method of evaluating the skew-normal cdf. Second, it uses this method to evaluate the accuracy of existing approximations, notably that of Tsay et al. (2013). Third, we create a tabulation of the cdf, part of which is given in this paper and most of which is in a supplemental file, which can be used in estimation. Interpolation in such a file is faster than evaluating an approximation, which in turn is faster than simulation-based or quadrature methods.

A fourth objective is to extend the range of values of the composed error for which we can calculate a cdf value that is not zero and is not one. This is important because some commonly used copulas are undefined if the marginal cdf has a value of zero or one. For example, if the composed error ε has cdf F, and if Φ is the standard normal cdf, the Gaussian (normal) copula contains the term Φ⁻¹(F(ε)), which equals minus infinity when F = 0 and equals plus infinity when F = 1. This could cause the calculation of the copula density to break down. We are able to calculate non-zero values of F(ε) and 1 − F(ε) for a much wider range of ε than in previous papers.

Finally, we derive a closed-form solution for the skew-normal cdf in the special case that the relative variance parameter in the stochastic frontier model (λ) equals one. Therefore, in that special case, we have a much-needed exact standard for assessing the accuracy of both simulation-based and approximate methods of evaluating the cdf.

2 Theory

We start with some notation and basics. The composed error is ε = v + u, where $v\sim N\left( {0,\sigma _v^2} \right)$, $u\sim N^ + \left( {0,\sigma _u^2} \right)$, and v and u are independent. Standard notation is $\sigma ^2 = \sigma _u^2 + \sigma _v^2$ and λ = σ_u/σ_v. Then ε has the skew-normal density $sn_{\lambda ,\sigma }\left( \varepsilon \right) = \left( {\frac{2}{\sigma }} \right)\varphi \left( {\frac{\varepsilon }{\sigma }} \right)\Phi \left( {\frac{{\lambda \varepsilon }}{\sigma }} \right)$, where φ is the standard normal pdf and Φ is the standard normal cdf. We want to calculate and tabulate the skew-normal cdf P_λ,σ(Q) = P(ε ≤ Q), for as large a range of values of Q as we can (i.e. where the calculations are numerically possible).

The above discussion is for the case of v + u, which would be natural in a cost frontier, and follows the discussion in Tsay et al. (2013). In the case of a production frontier, as in the original papers of Aigner et al. (1977) and Meeusen and van den Broeck (1977), we would want to consider $\varepsilon_\ast = v - u$ instead of ε = v + u. But this does not require a separate tabulation, because the distribution of $\varepsilon_\ast$ is the same as the distribution of −ε. Explicitly, if $P_{\lambda,\sigma}^\ast\left( Q \right) = P\left( \varepsilon_\ast \le Q \right)$, then $P_{\lambda,\sigma}^\ast\left( Q \right) = 1 - P_{\lambda,\sigma}\left( -Q \right)$, and we can get values of $P^\ast$ from a tabulation of P.

It would appear that we would require a three-dimensional tabulation, giving probabilities over values of the two parameters λ and σ, plus values of Q. But in fact we only need a two-dimensional tabulation, over values of λ and Q. Specifically, we can pick σ = 1 and just tabulate P_λ,1(Q). For other values of σ, we use the fact that P_λ,σ(Q) = P_λ,1(Q/σ). To see why this equality holds, start with $P_{\lambda,\sigma}\left( Q \right) = \int_{-\infty}^{Q} \left( \frac{2}{\sigma} \right)\varphi\left( \frac{\varepsilon}{\sigma} \right)\Phi\left( \frac{\lambda\varepsilon}{\sigma} \right)d\varepsilon$ and make the substitutions $z = \frac{\varepsilon}{\sigma}$ and dε = σdz, noting that the upper limit of integration ε = Q becomes $z = \frac{Q}{\sigma}$ while the lower limit remains −∞. Thus we have $\int_{-\infty}^{Q} \left( \frac{2}{\sigma} \right)\varphi\left( \frac{\varepsilon}{\sigma} \right)\Phi\left( \frac{\lambda\varepsilon}{\sigma} \right)d\varepsilon = \int_{-\infty}^{Q/\sigma} 2\varphi\left( z \right)\Phi\left( \lambda z \right)dz = P_{\lambda,1}\left( Q/\sigma \right)$.

There is no closed-form expression for the cdf of the skew-normal distribution. The required integral is widely regarded as intractable. (See the Appendix for some explanation of this point.) The cdf can be calculated (or estimated) by numerical integration, or by simulation. Numerical integration (quadrature) is of questionable accuracy, especially in the extreme tails. We calculate cdf values that are sometimes extremely small, such as 4.08e−115 for λ = 1 and Q = −16, and we cannot expect quadrature to yield an accurate evaluation of a probability that small. As we will see, our simulation algorithm does evaluate this cdf value accurately.

The most obvious path to evaluation by simulation is the empirical cdf, that is, F(Q) is estimated by the fraction of draws from the distribution of ε that are less than or equal to Q. This works reasonably well in the middle of the distribution, but in the tails it requires an unreasonably large number of draws. For example, in Tsay et al. (2013), Table 1, p. 262, for λ = 1.5, σ² = 1.444, they report F(−3) = 0.0000006, or 6/10,000,000. That is, they used 10,000,000 replications and got six draws that were less than or equal to −3. In the calculations we report below, we have probability values in the tails that are very small, e.g. 3.87e−31 for λ = 1 and Q = −8, and so we would need a number of replications on the order of 10³¹ or larger to hope to estimate this probability. That is obviously not feasible.
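The limitation of the empirical cdf can be illustrated with a minimal Python sketch. The function names, the seed, and the choice of 200,000 draws are illustrative assumptions and are not taken from the paper. It uses λ = 1 and σ = 1, for which F(0) = 0.25 and F(−4) is roughly 1.0e−9.

```python
import math
import random


def draw_composed_errors(lam, n_draws, seed=1):
    """Draw eps = v + u with sigma = 1 (hypothetical helper, not from the paper)."""
    rng = random.Random(seed)
    sigma_v = 1.0 / math.sqrt(1.0 + lam ** 2)
    sigma_u = lam / math.sqrt(1.0 + lam ** 2)
    return [rng.gauss(0.0, sigma_v) + abs(rng.gauss(0.0, sigma_u))
            for _ in range(n_draws)]


def empirical_cdf(sample, q):
    """Fraction of draws that are less than or equal to q."""
    return sum(1 for eps in sample if eps <= q) / len(sample)


sample = draw_composed_errors(lam=1.0, n_draws=200_000)
centre = empirical_cdf(sample, 0.0)   # close to the true value 0.25
tail = empirical_cdf(sample, -4.0)    # true value is about 1.0e-9
print(centre, tail)
```

With this many draws the estimate in the middle of the distribution is close to 0.25, whereas the tail estimate can only be a multiple of 1/200,000 and will almost always be exactly zero.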

Table 1 Values of our evaluation of F(Q), for Q ≤ 0, in bold, R = 10,000,000

Similarly, Wang et al. (2011) calculated and reported the quantiles of $\varepsilon _ \ast = v - u$ based on a sample of 10,000,000 draws, for various values of λ. In their supplemental tables (available on request from the authors), they consider a very large set of values of λ, and they give the empirical quantiles 0.01, 0.02, …, 0.99. (They also give the quantiles zero and one, but these are just the minimum and maximum values in the sample, whereas the population distribution of $\varepsilon _ \ast$ does not have a finite minimum or maximum value.) The information in the quantile values is in principle the same as in the cdf values, and they could have considered quantiles smaller than 1% or bigger than 99%, but for exactly the same reasons as given in the previous paragraph they could not have calculated meaningful quantiles very far into the tail without using far more than 10,000,000 draws.

As an alternative, we will propose a method that is very similar to a method often used in the literature on simulated MLE. See, e.g., Greene (2010). A probability is the expectation of an indicator function, and by the law of iterated expectations P(ε ≤ Q) = P(v + u ≤ Q) = P(v ≤ Q − u) = E_uP(v ≤ Q − u|u) = E_uΦ[(Q − u)/σ_v]. (The last equality follows from the independence of v and u.) We calculate this by averaging Φ[(Q − u)/σ_v] over a large number of draws from the distribution of u.

To be very explicit, our procedure is as follows. (1) Set σ = 1 and pick a value of λ. Calculate the implied values of σ_u and σ_v. With σ² = 1, these are $\sigma _v^2 = 1/\left( {1 + \lambda ^2} \right)$ and $\sigma _u^2 = \lambda ^2/\left( {1 + \lambda ^2} \right)$. (2) Pick a value of Q. (3) Now, for replication r = 1, …, R, where R is a very large number, take a draw from N(0,1), take its absolute value, and multiply by σ_u to get u_r. This generates a draw from $N^ + \left( {0,\sigma _u^2} \right)$ because the absolute value of a N(0,1) random variable is distributed as N⁺(0,1), and multiplying by σ_u converts N⁺(0,1) into $N^ + \left( {0,\sigma _u^2} \right)$. (4) Calculate Φ[(Q − u_r)/σ_v]. (5) Average this over the R replications.

This is preferable to an evaluation of the empirical cdf because it avoids the randomness from drawing v, and because reliable methods exist for evaluating the normal cdf in the extreme tails, such as 20 standard deviations from zero.
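The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: the function names are hypothetical, the normal cdf is computed from `math.erfc`, the draws come from Python's `random` module, and the number of draws is reduced to 200,000 to keep the run short.

```python
import math
import random


def norm_cdf(z):
    """Standard normal cdf via erfc, which stays accurate far into the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def simulated_cdf(q, lam, n_draws=200_000, seed=1):
    """Estimate P(v + u <= q) for sigma = 1 by averaging Phi((q - u_r) / sigma_v)."""
    rng = random.Random(seed)
    sigma_v = 1.0 / math.sqrt(1.0 + lam ** 2)      # step (1)
    sigma_u = lam / math.sqrt(1.0 + lam ** 2)      # step (1)
    total = 0.0
    for _ in range(n_draws):
        u_r = sigma_u * abs(rng.gauss(0.0, 1.0))   # step (3): draw from N+(0, sigma_u^2)
        total += norm_cdf((q - u_r) / sigma_v)     # step (4)
    return total / n_draws                         # step (5)


estimate = simulated_cdf(q=-8.0, lam=1.0)
exact = norm_cdf(-8.0) ** 2   # closed form, available only for lambda = 1
print(estimate, exact)
```

For λ = 1 and Q = −8 the exact value is about 3.87e−31, far beyond the reach of an empirical cdf, yet every term in the average is a well-defined small positive number.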

Finally, although the skew-normal cdf is analytically intractable, we were able to derive an exact expression for the cdf for the special case of λ = 1 (σ_u = σ_v). This is given in the following result, which we prove in the Appendix.

3 Result

Suppose that λ = 1 (σ_u = σ_v). Then $P_{1,\sigma }\left( Q \right) = \Phi ^2\left( {\frac{Q}{{\sqrt 2 \sigma _u}}} \right)$. When σ² = 1, this simplifies to P_1,1(Q) = Φ²(Q).

This result is useful because, apart from the trivial case of λ = 0 (the normal distribution), it provides the only exact standard for assessing the accuracy of both simulation-based and approximate methods of evaluating the skew-normal cdf.
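As a minimal sketch, the exact result can be coded in a few lines of Python. The function names are hypothetical, and `sigma` denotes σ = √2 σ_u, so that Q/(√2 σ_u) = Q/σ. The production-frontier version uses the relation P*(Q) = 1 − P(−Q) from Section 2, rewritten as s(2 − s) with s = Φ(Q/σ) so that it does not lose precision when s is tiny.

```python
import math


def norm_cdf(z):
    """Standard normal cdf via erfc, accurate in the extreme lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def exact_cdf_cost(q, sigma=1.0):
    """P(v + u <= q) when lambda = 1: Phi(q / sigma) squared."""
    return norm_cdf(q / sigma) ** 2


def exact_cdf_production(q, sigma=1.0):
    """P(v - u <= q) when lambda = 1: 1 - Phi(-q / sigma)^2 = s * (2 - s)."""
    s = norm_cdf(q / sigma)
    return s * (2.0 - s)


print(exact_cdf_cost(-8.0))    # about 3.87e-31
print(exact_cdf_cost(-16.0))   # about 4.08e-115
print(exact_cdf_production(0.0))
```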

4 Some tabulations and comparisons

Table 1 gives values of the cdf F(Q) = P(ε ≤ Q) for non-positive values of Q, with −16 ≤ Q ≤ 0. Table 2 gives values of 1 − F(Q) for positive values of Q, with 1 ≤ Q ≤ 20. The reason that we show values of 1 − F(Q) for positive values is that otherwise, for the larger values of Q, F(Q) would round to one unless a very large number of decimal places were preserved, and if they were preserved the number of digits “9” would fill a whole line. For example, for λ = 1 and Q = 12, our value of F(Q) is 1 − 3.64006e−39, and to display that in decimal form would require 38 digits “9” between the decimal point and 635994. Of course, it does not matter whether we report that 1 − F(Q) = 3.64006e−39 or F(Q) = 1 − 3.64006e−39.

Table 2 Values of our evaluation of 1 − F(Q), for Q > 0, in bold, R = 10,000,000

For each (Q,λ) “cell,” the top number, in bold, is our evaluation of F(Q) or 1 − F(Q) and the number underneath it is the approximation of Tsay et al. (2013).

The first thing to note is that we are able to calculate a value for F(Q), both for our method and for the approximation of Tsay et al., for a much larger range of Q than has previously been done. The numerical issues involved will be discussed in the next Section. For now, we simply note that Tsay et al., Table 1, reported results for Q from −3.0 to 3.0, and their algorithm would not calculate probabilities smaller than about 1.0e−16. Ashour and Abdul-Hameed (2010) tabulated results for Q in the range from zero to four. Wang et al. (2011) tabulated quantiles, not cdf values, but the smallest quantile they considered was 0.01 and the largest was 0.99.

To ask how close our cdf values are to the Tsay et al. approximation, we have to ask what we mean by close. For example, for Q = −1, λ = 1, the cdf values of 0.02516 and 0.02514 are close in both absolute and relative terms, whereas for Q = −12, λ = 1, the values of 3.153e−66 and 1.994e−61 are close in absolute terms but not in relative terms. Both Tsay et al. and Ashour and Abdul-Hameed comment on closeness in absolute terms, but it is not clear why this is relevant. Indeed, the relevant notion of closeness logically depends on the copula. For example, if we are using the normal copula, what is relevant is the value of Φ⁻¹(F(ε)). For Q = −12, λ = 1, the value of Φ⁻¹(F(ε)) is −16.923 for our calculation, −16.259 for the Tsay et al. approximation, and minus infinity for the Ashour and Abdul-Hameed approximation, which equals zero for all Q < −3. As another example, for Q = 16, λ = 1, the value of Φ⁻¹(F(ε)) is 18.134 for our calculation and 15.712 for the Tsay et al. approximation. Of course, these numbers would be different for a different copula, and ultimately the bias in estimation caused by a miscalculated cdf will depend both on the copula and the model that uses the copula.

Having said that, the values of our calculation of F(Q) and the Tsay et al. approximation are quite close in both absolute and relative terms for non-extreme values of Q, say −4 ≤ Q ≤ 3. For more extreme values of Q, they are close in absolute but not always in relative terms.

The Ashour and Abdul-Hameed approximation, which sets F(Q) = 0 for Q < −3, is in a sense infinitely bad in relative terms for Q in that range, and we will drop it from further consideration, even though it appears to be accurate in the non-extreme part of the range of Q.

5 Numerical issues and accuracy checks

5.1 Numerical issues

Our calculations were done in MATLAB.

For the non-positive values of Q, we calculated the normal cdf (so that we can calculate Φ[(Q − u_r)/σ_v]) using the MATLAB command normcdf. This gave results that matched those in Marsaglia (2004) for −16.6 ≤ z ≤ −0.1 where z is generic notation for the normal cdf argument. The MATLAB results also matched the results from the online Casio Keisan normal cdf calculator.

For positive values, some care needed to be taken to keep the cdf from rounding to one. For example, for z = 12, normcdf returns “1”. However, the MATLAB command normcdf, ‘upper’ returns the upper tail probability 1 − Φ(12) = 1.77648e−33. The key is to average the values of 1 − Φ[(Q−u_r)/σ_v] and then subtract this average from one so that the small deviations from one are preserved. If you subtract the individual deviations from one separately for each replication and then average, you will just get one.

A check of the accuracy of the routine normcdf, ‘upper’ is whether, for positive z, its value of 1 − Φ(z) equals the value of Φ(−z) returned by normcdf. It did, even for extreme values of z. For example, normcdf evaluated at z = −20 gives 2.753624e−89, and normcdf, ‘upper’ evaluated at z = 20 gives the same upper tail probability, 2.753624e−89, so that Φ(20) = 1 − 2.753624e−89.
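The upper-tail calculation can be sketched in Python under similar assumptions. Here `norm_sf` is a hypothetical stand-in for the MATLAB ‘upper’ option, built on `math.erfc`, and the seed and the 200,000 draws are illustrative choices. Because `norm_cdf` and `norm_sf` both reduce to the same `erfc` call in this sketch, the symmetry check 1 − Φ(z) = Φ(−z) holds by construction here and is not an independent test.

```python
import math
import random


def norm_cdf(z):
    """Standard normal cdf; rounds to 1.0 for large positive z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def norm_sf(z):
    """Upper-tail probability 1 - Phi(z), computed directly so it keeps its precision."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def simulated_upper_tail(q, lam, n_draws=200_000, seed=1):
    """Estimate 1 - F(q) for sigma = 1 by averaging the upper-tail values themselves."""
    rng = random.Random(seed)
    sigma_v = 1.0 / math.sqrt(1.0 + lam ** 2)
    sigma_u = lam / math.sqrt(1.0 + lam ** 2)
    total = 0.0
    for _ in range(n_draws):
        u_r = sigma_u * abs(rng.gauss(0.0, 1.0))
        total += norm_sf((q - u_r) / sigma_v)
    return total / n_draws


naive = 1.0 - norm_cdf(12.0)   # Phi(12) has already rounded to 1.0, so this is 0.0
direct = norm_sf(12.0)         # about 1.776e-33
upper_q2 = simulated_upper_tail(2.0, 1.0)   # exact value for lambda = 1 is 0.0449826
print(naive, direct, upper_q2)
```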

Similar considerations apply to the calculation of the Tsay et al. approximation. For Q < 0, the approximate cdf as given in equation (12) of their text and in the last equation of their Appendix is of the form 2GH, where G and H are our shorthand for the terms in the last equation in the Appendix. The term G is easily calculated, but H involves the “error function” ${\mathrm{erf}}\left( z \right) = \frac{2}{{\sqrt \pi }}{\int}_0^z {\exp \left( { - t^2} \right)dt}$. Specifically, H = 1 − erf(z) where z is a linear function of Q, with a negative coefficient on Q. For Q < 0 but large in magnitude, z will be a large positive number and erf(z) will round to one and 1 − erf(z) will round to zero. The solution is to use not the MATLAB command erf, but rather to use the command erfc to calculate erfc(z) = 1 − erf(z). For example, when λ = 1 and Q = −10, z = 9.7847, and we could calculate erf(9.7847) = 1, 1 − erf(9.7847) = 0, H = 0 and approximate cdf = 0, which is not an accurate or useful result. We need to calculate H = erfc(9.7847) = 1.5117e−43, which yields an approximate cdf of 6.7566e−43.

Conversely, for large positive Q, we have an extra term, which we will call J, as given in the last term of the second to last equation of their Appendix. The approximate cdf is equal to $2\left( {GH^ \ast + J} \right)$, where H^* is like H except for a sign change in one term. G is the same as before and H^* is almost zero and it won’t matter numerically if it rounds to zero or not. The value of the approximate cdf is determined by the term 2J where J is 0.5 minus a very small number, and it is essential not to let J round to 0.5. So for example for Q = 20, 2J = erf(14.1421) and if you calculate erf(14.1421) you will get one “exactly.” Instead you need to calculate erf(14.1421) as 1 − erfc(14.1421) = 1 − 5.51281e−89 and this will lead to the approximate cdf equal to 1 − 5.51282e−89, a meaningful result.
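The rounding behaviour described for erf and erfc can be reproduced in any double-precision environment. The following minimal Python sketch uses `math.erf` and `math.erfc` as stand-ins for the MATLAB commands and evaluates them at the two arguments quoted in the text. The G, H^* and J terms of the Tsay et al. formula are not reproduced here.

```python
import math

z_neg_q = 9.7847    # argument quoted in the text for lambda = 1, Q = -10
z_pos_q = 14.1421   # argument quoted in the text for Q = 20

# Route 1: form 1 - erf(z). erf(z) has already rounded to 1.0, so the tail is lost.
lost_small = 1.0 - math.erf(z_neg_q)
lost_large = 1.0 - math.erf(z_pos_q)

# Route 2: call erfc(z) directly, which keeps the tiny tail value.
kept_small = math.erfc(z_neg_q)   # about 1.51e-43
kept_large = math.erfc(z_pos_q)   # about 5.51e-89

print(lost_small, kept_small)
print(lost_large, kept_large)
```

The practical consequence is to carry the small quantity (here the erfc value) through the calculation and only report it as "one minus a small number" at the end, since 1 − 5.51e−89 is not representable as a double different from one.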

Finally, we consider the evaluation of the exact cdf P_1,1(Q) = Φ²(Q) for the case of λ = 1. For negative Q, there is no numerical issue. We just calculate Φ(Q) and take the square. For large positive Q, however, we need to take care to keep Φ(Q) from rounding to one. For example, for Q = 8, we use the command normcdf, ‘upper’ to obtain 1 − Φ(8) = 6.22096e−16, which we translate into Φ(8) = 1 − 6.22096e−16 and Φ²(8) = 1 − 2 × 6.22096e−16 + 6.22096²e−32 = 1 − 1.24419e−15.

5.2 Accuracy of the calculations

The Tsay et al. approximation and the exact result for λ = 1 are closed-form expressions, apart from the need to evaluate the normal cdf and error function. So there is little question of numerical accuracy for these results. However, our simulated F(Q) is not a closed form expression and it could be inaccurate for any number of reasons, most notably the inherent randomness of the simulation and the quality of the random number generator.

The first thing we investigate is how sensitive the results are to the choice of R, the number of replications in the simulation. Table 3 gives results for some values of Q, for R ranging from 1,000,000 to 100,000,000, for the case of λ = 1, so that we have the exact result to compare to. For Q ≤ 4, the results do not depend very much on the number of replications, and R = 1,000,000 is sufficient to give reasonably accurate results. Things begin to be less clear for Q = 6, and for Q ≥ 8 the results depend more strongly on R, and they do not converge unambiguously to the exact result even for R = 100,000,000.

Table 3 Simulated F(Q), λ = 1, for various values of R

Table 4 gives the comparison between the simulated F(Q) and the exact F(Q), for λ = 1 and for additional values of Q. The results are the same as described in the previous paragraph. The simulated cdf is accurate for Q ≤ 4 and becomes less accurate thereafter.

Table 4 Exact versus simulated F(Q), λ = 1

Of course, this may just reflect too strict a meaning of the word accurate. The simulated cdf is quite close to the exact cdf in the absolute sense, for all of the values of Q that we consider. The inaccuracy that we have identified is in the relative sense.

There are multiple possible explanations for numerical inaccuracy in a simulation, but in this case it is easy to suspect the random number generator. We used the MATLAB command rng(s,'twister'), where “twister” denotes the Mersenne Twister algorithm and s is the seed. We picked s = 1. A sample of 10,000,000 pseudo-random normal deviates from this generator passed standard tests on the first four moments. However, a more focused test of the random number generator is to check whether other random number generators give different results, and, if so, whether they more closely match the exact results for λ = 1.

We considered two additional pseudo-random number generators. One is the MATLAB command rng('default'), which is the same as rng(0,'twister'). The other creates pseudo-random deviates z such that the values of Φ(z) uniformly fill the space [0,1]. Explicitly, for j = 1, …, R, we choose $z_j = \Phi ^{ - 1}\left( {\left( {j - 1/2} \right)/R} \right)$. We could call this uniform spacing. This is somewhat similar to a van der Corput sequence, which is a one-dimensional Halton sequence, but it is simpler because we have no need to have uncorrelated draws.
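The uniform spacing scheme can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, `statistics.NormalDist().inv_cdf` plays the role of Φ⁻¹, and R is set to 100,000 rather than the 10,000,000 used for the tables.

```python
import math
from statistics import NormalDist


def norm_sf(z):
    """Upper-tail normal probability 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def upper_tail_uniform_spacing(q, lam, n_points=100_000):
    """Estimate 1 - F(q) for sigma = 1 with z_j = Phi^{-1}((j - 1/2) / R)."""
    std_normal = NormalDist()
    sigma_v = 1.0 / math.sqrt(1.0 + lam ** 2)
    sigma_u = lam / math.sqrt(1.0 + lam ** 2)
    total = 0.0
    for j in range(1, n_points + 1):
        z_j = std_normal.inv_cdf((j - 0.5) / n_points)   # deterministic "draw"
        u_j = sigma_u * abs(z_j)                         # half-normal value
        total += norm_sf((q - u_j) / sigma_v)
    return total / n_points


value = upper_tail_uniform_spacing(2.0, 1.0)
print(value)   # compare with the exact value 0.0449826 for lambda = 1
```

Because the points are deterministic, repeated runs give identical results, which separates the effect of the point set from ordinary simulation noise.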

For values of Q where our results are numerically stable (Q ≤ 4) the choice of random number generator makes very little difference. For example, for Q = 2 and λ = 1, the three random number generators listed above yield 1 − F(Q) as 0.0450189, 0.0449673 and 0.0449827, respectively. These are all quite close to each other and to the exact value of 0.0449826. However, for larger values of Q the random number generator matters more. For example, for Q = 8 and λ = 1, we obtain 2.58074e−16, 3.69353e−16 and 4.86878e−16. So the choice of random number generator matters for the larger values of Q. However, none of these numbers is particularly close (in relative terms) to the exact value of 1.24419e−15. So the problem could be random number generation, but it is not due to the specific random number generator we used, and the Mersenne Twister is considered to be the state-of-the-art random number generation algorithm. There is essentially an infinity of possible random number generators, and it is just not clear how likely it is that we could find one that would solve our inaccuracy problem, if in fact the problem does lie in random number generation.

6 Tabulations

We have created a set of supplemental tables, available on request, that give our calculation of F(Q) as a function of Q and λ. They cover the range −16 ≤ Q ≤ 10 and 0 ≤ λ ≤ 8 and were calculated for R = 10,000,000. We trust these calculations to be accurate for Q ≤ 4. For Q between 4 and 10, they are less accurate, but we have included these numbers because it is not clear what a better alternative would be. An evaluation that is exactly equal to one is not a good alternative. For the case of λ = 1, for which we have an exact result, the Tsay et al. approximation is quite accurate for large values of Q. We conjecture that this may be so for other values of λ as well, but we have no evidence to support this conjecture.

7 Concluding remarks

In our view, the main contribution of the paper is to have extended the range of the argument over which we can get numerically stable and believable cdf values. This range is not as wide as we would like, but it is considerably wider than in previous papers.

The other substantial contribution of the paper is the derivation of a closed-form expression for the exact cdf, for the special case of λ = 1. This allows us to check the accuracy of the cdf values that we have calculated and tabulated, at least for one special case.

References

Aigner DJ, Lovell CAK, Schmidt P (1977) Formulation and estimation of stochastic frontier production function models. J Econ 6:21–37
Amsler C, Prokhorov A, Schmidt P (2014) Using copulas to model time dependence in stochastic frontier models. Econom Rev 33:497–522
Amsler C, Prokhorov A, Schmidt P (2016) Endogeneity in stochastic frontier models. J Econ 190:280–288
Amsler C, Prokhorov A, Schmidt P (2017) Endogenous environmental variables in stochastic frontier models. J Econ 199:131–140
Ashour SK, Abdul-Hameed MA (2010) Approximate skew normal distribution. J Adv Res 1:1–11
Carta A, Steel MFJ (2012) Modelling multi-output stochastic frontiers using copulas. Comput Stat Data Anal 56:3757–3773
Das A (2015) Copula-based stochastic frontier model with autocorrelated inefficiency. Cent Eur J Econ Model Econ 7:111–126
Genius M, Stefanou S, Tzouvelekas V (2012) Measuring productivity growth under factor non-substitution: an application to US steam-electric power generation utilities. Eur J Oper Res 220:844–852
Greene WH (2010) A stochastic frontier model with correction for sample selection. J Product Anal 34:15–24
Huang T-H, Chiang D-L, Chao S-W (2017) A new approach to jointly estimating the Lerner index and cost efficiency for multi-output banks under a stochastic meta-frontier framework. Q Rev Econ Financ 65:212–226
Huang T-H, Liu N-H, Kumbhakar SC (2018) Joint estimation of the Lerner index and cost efficiency using copula methods. Empir Econ 54:799–822
Lai H-P, Huang CJ (2013) Maximum likelihood estimation of seemingly unrelated stochastic frontier regressions. J Product Anal 40:1–14
Marsaglia G (2004) Evaluating the normal distribution. J Stat Softw 11:1–11
Meeusen W, van den Broeck J (1977) Efficiency estimation from Cobb-Douglas production functions with composed error. Int Econ Rev 18:435–444
Owen DB (1980) A table of normal integrals. Commun Stat: Simul Comput 9:389–419
Shi P, Zhang W (2011) A copula regression model for estimating firm efficiency in the insurance industry. J Appl Stat 38:2271–2287
Sriboonchitta S, Liu J, Wiboonpongse A, Denoeux T (2017) A double-copula stochastic frontier model with dependent error components and correction for sample selection. Int J Approx Reason 80:174–184
Tran KC, Tsionas EG (2015) Endogeneity in stochastic frontier models: copula approach without external instruments. Econ Lett 133:85–88
Tsay W-J, Huang CJ, Fu T-T, Ho L-L (2013) A simple closed form approximation for the cumulative distribution function of the composite error of stochastic frontier models. J Product Anal 39:259–269
Wang WS, Amsler C, Schmidt P (2011) Goodness of fit tests in stochastic frontier models. J Product Anal 35:95–118

Author information

Authors and Affiliations

Michigan State University, East Lansing, MI, USA
Christine Amsler & Peter Schmidt
Academia Sinica, Taipei, Taiwan
Wen-Jen Tsay

Corresponding author

Correspondence to Peter Schmidt.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental Table 1

Supplemental Table 2

Appendix

We wish to evaluate

$$P \equiv P_{\lambda ,\sigma }\left( Q \right) = 2{\int}_0^\infty {{\it{\Phi }}\left( {\frac{{Q - u}}{{\sigma _v}}} \right)} \frac{1}{{\sigma _u}}\varphi \left( {\frac{u}{{\sigma _u}}} \right)du.$$

Suppose that $\sigma ^2 = \sigma _u^2 + \sigma _v^2 = 1$. Now make the substitutions $z = \frac{u}{{\sigma _u}}$ and du = σ_udz, and define $a = \frac{Q}{{\sigma _v}}$ and $\lambda = \frac{{\sigma _u}}{{\sigma _v}}$. This yields

$$P = 2{\int}_0^\infty {{\it{\Phi }}\left( {a - \lambda z} \right)\varphi \left( z \right)dz.}$$

According to Owen (1980), equation 10,010.6, p. 403,

$${\int}_0^\infty {\Phi \left( {a + bz} \right)\varphi \left( z \right)dz} = \frac{1}{2}\Phi \left( {\frac{a}{{\sqrt {1 + b^2} }}} \right) + T\left( {\frac{a}{{\sqrt {1 + b^2} }},\,b} \right),$$

where (Owen, p. 391)

$$T\left( {h,b} \right) = {\int}_0^b {\frac{{\varphi \left( h \right)\varphi \left( {hx} \right)}}{{1 + x^2}}dx} .$$

In our case b = −λ and $\sqrt {1 + \lambda ^2} = 1/\sigma _v$ so $\frac{a}{{\sqrt {1 + b^2} }} = Q$. Therefore

$$P = \Phi \left( Q \right) + 2T\left( {Q, - \lambda } \right).$$

According to equation 2.6, p. 414 of Owen, T(Q,−λ) = −T(Q,λ) and therefore

$$P = \Phi \left( Q \right) - 2T\left( {Q,\lambda } \right)$$

There is no closed form expression for the integral that defines T(Q,λ), so all that we have done so far is to exchange one intractable integral for another. However, there is an exception, which is the case that λ = 1. Equation 2.3, p. 414, of Owen says that

$$T\left( {Q,1} \right) = \frac{1}{2}\Phi \left( Q \right)\left[ {1 - \Phi \left( Q \right)} \right].$$

Therefore when λ = 1 we have

$$P = {\it{\Phi }}\left( Q \right) - 2\left( {\frac{1}{2}} \right){\it{\Phi }}\left( Q \right)\left[ {1 - {\it{\Phi }}\left( Q \right)} \right] = {\it{\Phi }}^2\left( Q \right).$$
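The identities used in this derivation can be checked numerically for moderate Q. The Python sketch below is an illustration only. The function names are hypothetical, and Owen's T function is approximated by a composite Simpson rule, which is a choice made here and not part of the paper. The expression Φ(Q) − 2T(Q,λ) suffers from cancellation in the far lower tail, so this check is not a substitute for the simulation method of Section 2.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def owen_t(h, b, n_intervals=2000):
    """Composite Simpson approximation of T(h, b) = int_0^b phi(h) phi(h x) / (1 + x^2) dx."""
    if n_intervals % 2:
        n_intervals += 1               # Simpson's rule needs an even number of intervals
    step = b / n_intervals             # negative when b < 0, which yields T(h, b) = -T(h, -b)

    def integrand(x):
        return norm_pdf(h) * norm_pdf(h * x) / (1.0 + x * x)

    total = integrand(0.0) + integrand(b)
    for i in range(1, n_intervals):
        total += (4.0 if i % 2 else 2.0) * integrand(i * step)
    return total * step / 3.0


def skew_normal_cdf(q, lam):
    """P = Phi(Q) - 2 T(Q, lambda), valid for sigma = 1."""
    return norm_cdf(q) - 2.0 * owen_t(q, lam)


q = 0.5
print(owen_t(q, 1.0), 0.5 * norm_cdf(q) * (1.0 - norm_cdf(q)))   # the two should agree
print(skew_normal_cdf(q, 1.0), norm_cdf(q) ** 2)                 # the two should agree
```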

About this article

Cite this article

Amsler, C., Schmidt, P. & Tsay, WJ. Evaluating the CDF of the distribution of the stochastic frontier composed error. J Prod Anal 52, 29–35 (2019). https://doi.org/10.1007/s11123-019-00554-9

Published: 08 August 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s11123-019-00554-9

JEL classification

C13
