1 Introduction

This paper deals with the evaluation of the cumulative distribution function (cdf) of the stochastic frontier model’s composed error. Since the normal/half-normal composed error has a skew-normal distribution, we can also say that the paper deals with the evaluation of the cdf of the skew-normal distribution.

Evaluation of the skew-normal cdf may be important in a number of contexts, including at least the following two. (1) In many multi-equation models, or in a panel data setting, the composed error in a stochastic frontier production or cost function is linked to other errors using a copula. Some examples of this approach include Amsler et al. (2014, 2016, 2017), Carta and Steel (2012), Das (2015), Genius et al. (2012), Huang et al. (2017), Huang et al. (2018), Lai and Huang (2013), Shi and Zhang (2011), Sriboonchitta et al. (2017) and Tran and Tsionas (2015). The evaluation of the likelihood of such a model involves the calculation of the copula density, which in turn requires the calculation of the cdf of each of the marginal distributions of the various errors in the model. Therefore, if one of the errors is a stochastic frontier composed error, evaluating the likelihood requires calculation of the skew-normal cdf. (2) We may want to test the distributional assumptions of the stochastic frontier model by testing whether the composed error has a skew-normal distribution, as suggested by Wang et al. (2011). Their preferred test is a bootstrapped version of the Kolmogorov-Smirnov test, and its calculation requires the calculation of the skew-normal cdf.

There is no known closed form solution for the skew-normal cdf. It can be calculated by simulation, and there are some available approximations, such as Ashour and Abdul-Hameed (2010) and Tsay et al. (2013). In this paper we provide a simulation-based method which is computationally efficient relative to the simple empirical cdf. We use it to evaluate the accuracy of the existing approximations.

The paper has five main contributions. First, it proposes the new simulation-based method of evaluating the skew-normal cdf. Second, it uses this method to evaluate the accuracy of existing approximations, notably that of Tsay et al. (2013). Third, we create a tabulation of the cdf, part of which is given in this paper and most of which is in a supplemental file, which can be used in estimation. Interpolation in such a file is faster than evaluating an approximation, which in turn is faster than simulation-based or quadrature methods.

A fourth contribution is to extend the range of values of the composed error for which we can calculate a cdf value that is not zero and is not one. This is important because some commonly used copulas are undefined if the marginal cdf has a value of zero or one. For example, if the composed error ε has cdf F, and if Φ is the standard normal cdf, the Gaussian (normal) copula contains the term \(\Phi ^{ - 1}\left( {F\left( \varepsilon \right)} \right)\), which equals minus infinity when F = 0 and equals plus infinity when F = 1. This could cause the calculation of the copula density to break down. We are able to calculate non-zero values of F(ε) and 1 − F(ε) for a much wider range of ε than in previous papers.
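As an illustration of this breakdown, the following is a minimal Python sketch (standard library only; it is not the code used for the calculations reported in this paper, and the helper name `copula_score` is hypothetical). It relies on `statistics.NormalDist().inv_cdf`, which rejects arguments that are exactly zero or one.

```python
from statistics import NormalDist, StatisticsError

def copula_score(F):
    """Return Phi^{-1}(F), or None if F has rounded to exactly 0 or 1."""
    try:
        return NormalDist().inv_cdf(F)
    except StatisticsError:
        # inv_cdf requires 0 < F < 1; the limiting value would be -inf or +inf.
        return None

tiny = 1e-20
F_rounded = 1.0 - tiny                     # stored as exactly 1.0 in double precision
score_rounded = copula_score(F_rounded)    # None: the copula term is undefined
score_tail = copula_score(tiny)            # finite, because the small value itself is kept
```

The same upper-tail probability is usable when it is carried as 1 − F rather than as F, which is the reason for keeping the two tails separate.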

Finally, we derive a closed-form solution for the skew-normal cdf in the special case that the relative variance parameter in the stochastic frontier model (λ) equals one. Therefore, in that special case, we have a much-needed exact standard for assessing the accuracy of both simulation-based and approximate methods of evaluating the cdf.

2 Theory

We start with some notation and basics. The composed error is ε = v + u, where \(v\sim N\left( {0,\sigma _v^2} \right)\), \(u\sim N^ + \left( {0,\sigma _u^2} \right)\), and v and u are independent. Standard notation is \(\sigma ^2 = \sigma _u^2 + \sigma _v^2\) and λ = σu/σv. Then ε has the skew-normal density \(sn_{\lambda ,\sigma }\left( \varepsilon \right) = \left( {\frac{2}{\sigma }} \right)\varphi \left( {\frac{\varepsilon }{\sigma }} \right)\Phi \left( {\frac{{\lambda \varepsilon }}{\sigma }} \right)\), where φ is the standard normal pdf and Φ is the standard normal cdf. We want to calculate and tabulate the skew-normal cdf Pλ,σ(Q) = P(ε ≤ Q), for as large a range of values of Q as we can (i.e. where the calculations are numerically possible).

The above discussion is for the case of v + u, which would be natural in a cost frontier, and follows the discussion in Tsay et al. (2013). In the case of a production frontier, as in the original papers of Aigner et al. (1977) and Meeusen and van den Broeck (1977), we would want to consider \(\varepsilon _ \ast = v - u\) instead of ε = v + u. But this does not require a separate tabulation, because the distribution of \(\varepsilon _ \ast\) is the same as the distribution of (−ε). Explicitly, if \(P_{\lambda ,\sigma }^ \ast \left( Q \right) = P\left( {\varepsilon _ \ast \le Q} \right)\), then \(P_{\left( {\lambda ,\sigma } \right)}^ \ast \left( Q \right) = 1 - P_{\left( {\lambda ,\sigma } \right)}\left( { - Q} \right)\) and we can get values of P* from a tabulation of P.

It would appear that we would require a three-dimensional tabulation, giving probabilities over values of the two parameters λ and σ, plus values of Q. But in fact we only need a two-dimensional tabulation, over values of λ and Q. Specifically, we can pick σ = 1 and just tabulate Pλ,1(Q). For other values of σ, we use the fact that Pλ,σ(Q) = Pλ,1(Q/σ). To see why this equality holds, start with \(P_{\lambda ,\sigma }\left( Q \right) = {\int}_{ - \infty }^Q {\left( {\frac{2}{\sigma }} \right)\varphi \left( {\frac{\varepsilon }{\sigma }} \right)\Phi \left( {\frac{{\lambda \varepsilon }}{\sigma }} \right)d\varepsilon }\) and make the substitutions \(z = \frac{\varepsilon }{\sigma }\) and \(d\varepsilon = \sigma dz\), and note that the upper limit of integration ε = Q becomes \(z = \frac{Q}{\sigma }\), while the lower limit remains −∞. Thus we have \({\int}_{ - \infty }^Q {\left( {\frac{2}{\sigma }} \right)\varphi \left( {\frac{\varepsilon }{\sigma }} \right)\Phi \left( {\frac{{\lambda \varepsilon }}{\sigma }} \right)d\varepsilon } = {\int}_{ - \infty }^{Q/\sigma } {2\varphi \left( z \right)\Phi \left( {\lambda z} \right)dz} = P_{\lambda ,1}\left( {Q/\sigma } \right)\).
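The scaling identity can be checked numerically in the middle of the distribution. The sketch below is illustrative only: the function names, the midpoint rule, and the lower truncation point of −12σ are assumptions made for this example, not the method used in this paper. It integrates the skew-normal density for a given σ and compares the result with the σ = 1 integral evaluated at Q/σ.

```python
import math

def sn_pdf(eps, lam, sigma):
    """Skew-normal density (2/sigma) * phi(eps/sigma) * Phi(lam*eps/sigma)."""
    z = eps / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * math.erfc(-lam * z / math.sqrt(2.0))
    return (2.0 / sigma) * phi * Phi

def sn_cdf_quad(Q, lam, sigma, n=20000, lower_sd=12.0):
    """Midpoint-rule approximation to P(eps <= Q).

    The lower limit -infinity is truncated at -lower_sd * sigma (an assumption
    of this sketch); the mass below that point is negligible for lam >= 0.
    """
    a = -lower_sd * sigma
    h = (Q - a) / n
    return h * sum(sn_pdf(a + (i + 0.5) * h, lam, sigma) for i in range(n))

lhs = sn_cdf_quad(1.0, 1.5, 2.0)   # P_{lambda, sigma}(Q) with sigma = 2, Q = 1
rhs = sn_cdf_quad(0.5, 1.5, 1.0)   # P_{lambda, 1}(Q / sigma)
scaling_gap = abs(lhs - rhs)       # should be at rounding-error level
```

A check of this kind is only meaningful for moderate Q; it says nothing about accuracy in the extreme tails.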

There is no closed-form expression for the cdf of the skew-normal distribution. The required integral is widely regarded as intractable. (See the Appendix for some explanation of this point.) The cdf can be calculated (or estimated) by numerical integration, or by simulation. Numerical integration (quadrature) is of questionable accuracy, especially in the extreme tails. We calculate cdf values that are sometimes extremely small, like 4.08e−115 for λ = 1, Q = −16. We cannot expect quadrature to yield an accurate evaluation of a probability that small, whereas, as we will see, our simulation algorithm evaluates this cdf value accurately.

The most obvious path to evaluation by simulation is the empirical cdf, that is, F(Q) is estimated by the fraction of draws from the distribution of ε that are less than or equal to Q. This works reasonably well in the middle of the distribution, but in the tails it requires an unreasonably large number of draws. For example, in Tsay et al. (2013), Table 1, p. 262, for λ = 1.5, \(\sigma ^2 = 1.444\), they report F(−3) = 0.0000006, or 6/10,000,000. That is, they used 10,000,000 replications and got six draws that were less than or equal to −3. In the calculations we report below, we have probability values in the tails that are very small, e.g. 3.87e−31 for λ = 1 and Q = −8, and so we would need a number of replications on the order of \(10^{31}\) or larger to hope to estimate this probability. That is obviously not feasible.
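The limitation of the empirical cdf can be seen in a small experiment. The following Python sketch is illustrative only (the function name, the seed, and the modest default R = 20,000 are assumptions chosen to keep the example fast); it draws ε = v + u with σ = 1 and counts the fraction of draws at or below Q.

```python
import math
import random

def empirical_sn_cdf(Q, lam, R=20000, seed=1):
    """Fraction of R simulated draws of eps = v + u (sigma = 1) that are <= Q."""
    sigma_v = math.sqrt(1.0 / (1.0 + lam ** 2))
    sigma_u = lam * sigma_v
    rng = random.Random(seed)
    hits = 0
    for _ in range(R):
        v = sigma_v * rng.gauss(0.0, 1.0)
        u = sigma_u * abs(rng.gauss(0.0, 1.0))   # half-normal draw
        if v + u <= Q:
            hits += 1
    return hits / R

middle = empirical_sn_cdf(0.0, 1.0)   # informative: many draws fall below Q = 0
tail = empirical_sn_cdf(-8.0, 1.0)    # no draws this far out at any feasible R
```

In the tail the estimate is simply zero, which is exactly the value that a copula-based likelihood cannot use.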

Table 1 Values of our evaluation of F(Q), for Q ≤ 0, in bold, R = 10,000,000

Similarly, Wang et al. (2011) calculated and reported the quantiles of \(\varepsilon _ \ast = v - u\) based on a sample of 10,000,000 draws, for various values of λ. In their supplemental tables (available on request from the authors), they consider a very large set of values of λ, and they give the empirical quantiles 0.01, 0.02, …, 0.99. (They also give the quantiles zero and one, but these are just the minimum and maximum values in the sample, whereas the population distribution of \(\varepsilon _ \ast\) does not have a finite minimum or maximum value.) The information in the quantile values is in principle the same as in the cdf values, and they could have considered quantiles smaller than 1% or bigger than 99%, but for exactly the same reasons as given in the previous paragraph they could not have calculated meaningful quantiles very far into the tail without using far more than 10,000,000 draws.

As an alternative, we propose a method that is very similar to one often used in the literature on simulated MLE; see, e.g., Greene (2010). A probability is the expectation of an indicator function, and by the law of iterated expectations \(P\left( {\varepsilon \le Q} \right) = P\left( {v + u \le Q} \right) = P\left( {v \le Q - u} \right) = E_uP\left( {v \le Q - u|u} \right) = E_u\Phi \left[ {\left( {Q - u} \right)/\sigma _v} \right]\). (The last equality follows from the independence of v and u.) We calculate this by averaging \(\Phi \left[ {\left( {Q - u} \right)/\sigma _v} \right]\) over a large number of draws from the distribution of u.

To be very explicit, our procedure is as follows. (1) Set σ = 1 and pick a value of λ. Calculate the implied values of σu and σv. With \(\sigma ^2 = 1\), these are \(\sigma _v^2 = 1/\left( {1 + \lambda ^2} \right)\) and \(\sigma _u^2 = \lambda ^2/\left( {1 + \lambda ^2} \right)\). (2) Pick a value of Q. (3) Now, for replication r = 1, …, R, where R is a very large number, take a draw from N(0,1), take its absolute value, and multiply by σu to get \(u_r\). This generates a draw from \(N^ + \left( {0,\sigma _u^2} \right)\) because the absolute value of a N(0,1) random variable is distributed as \(N^ + \left( {0,1} \right)\), and multiplying by σu converts \(N^ + \left( {0,1} \right)\) into \(N^ + \left( {0,\sigma _u^2} \right)\). (4) Calculate \(\Phi \left[ {\left( {Q - u_r} \right)/\sigma _v} \right]\). (5) Average this over the R replications.
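The five steps above can be sketched in a few lines. This is a minimal Python illustration, not the MATLAB code used for the reported results; the function names, the seed, and the default R = 100,000 are assumptions of the sketch, and the normal cdf is computed from the complementary error function so that small lower-tail values are not lost.

```python
import math
import random

def norm_cdf(z):
    """Standard normal cdf via erfc, which stays accurate for large negative z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def simulated_sn_cdf(Q, lam, R=100000, seed=1):
    """Estimate P(eps <= Q) for sigma = 1 by averaging Phi[(Q - u_r)/sigma_v]."""
    # Step (1): sigma = 1 implies these values of sigma_v and sigma_u.
    sigma_v = math.sqrt(1.0 / (1.0 + lam ** 2))
    sigma_u = lam * sigma_v
    rng = random.Random(seed)
    total = 0.0
    for _ in range(R):
        # Step (3): |N(0,1)| scaled by sigma_u is a draw from N+(0, sigma_u^2).
        u_r = sigma_u * abs(rng.gauss(0.0, 1.0))
        # Step (4): conditional probability that v <= Q - u_r.
        total += norm_cdf((Q - u_r) / sigma_v)
    # Step (5): average over the R replications.
    return total / R

estimate = simulated_sn_cdf(-1.0, 1.0)
```

Only u is simulated; v is integrated out analytically through the normal cdf, so every replication contributes a non-zero term even when Q is far in the lower tail.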

This is preferable to an evaluation of the empirical cdf because it avoids the randomness from drawing v, and because reliable methods exist for evaluating the normal cdf in the extreme tails, such as 20 standard deviations from zero.

Finally, although the skew-normal cdf is analytically intractable, we were able to derive an exact expression for the cdf for the special case of λ = 1 (σu = σv). This is given in the following result, which we prove in the Appendix.

3 Result

Suppose that λ = 1 (σu = σv). Then \(P_{1,\sigma }\left( Q \right) = \Phi ^2\left( {\frac{Q}{{\sqrt 2 \sigma _u}}} \right)\). When \(\sigma ^2 = 1\), this simplifies to \(P_{1,1}\left( Q \right) = \Phi ^2\left( Q \right)\).
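The result is straightforward to evaluate for negative Q. The sketch below is a minimal Python illustration (the function names are hypothetical); it reproduces, to the digits quoted earlier in the text, the tail values 3.87e−31 at Q = −8 and 4.08e−115 at Q = −16 for σ = 1.

```python
import math

def norm_cdf(z):
    """Standard normal cdf via erfc, accurate far into the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def exact_sn_cdf_lambda1(Q, sigma=1.0):
    """Exact P(eps <= Q) when lambda = 1, so that sigma_u = sigma_v = sigma/sqrt(2)."""
    sigma_u = sigma / math.sqrt(2.0)
    return norm_cdf(Q / (math.sqrt(2.0) * sigma_u)) ** 2

p_minus_8 = exact_sn_cdf_lambda1(-8.0)     # about 3.87e-31
p_minus_16 = exact_sn_cdf_lambda1(-16.0)   # about 4.08e-115
```

For large positive Q the same formula needs the upper-tail form of the normal cdf, because Φ(Q) itself rounds to one.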

This result is useful because, apart from the trivial case of λ = 0 (the normal distribution), it provides the only exact standard for assessing the accuracy of both simulation-based and approximate methods of evaluating the skew-normal cdf.

4 Some tabulations and comparisons

Table 1 gives values of the cdf F(Q) = P(ε ≤ Q) for non-positive values of Q, with −16 ≤ Q ≤ 0. Table 2 gives values of 1 − F(Q) for positive values of Q, with 1 ≤ Q ≤ 20. The reason that we show values of 1 − F(Q) for positive values is that otherwise, for the larger values of Q, F(Q) would round to one unless a very large number of decimal places were preserved, and if they were preserved the digits “9” would fill a whole line. For example, for λ = 1 and Q = 12, our value of F(Q) is 1 − 3.64006e−39, and to display that in decimal form would require 38 digits “9” between the decimal point and 635994. Of course, it does not matter whether we report that 1 − F(Q) = 3.64006e−39 or F(Q) = 1 − 3.64006e−39.

Table 2 Values of our evaluation of 1 − F(Q), for Q > 0, in bold, R = 10,000,000

For each (Q,λ) “cell,” the top number, in bold, is our evaluation of F(Q) or 1 − F(Q) and the number underneath it is the approximation of Tsay et al. (2013).

The first thing to note is that we are able to calculate a value for F(Q), both for our method and for the approximation of Tsay et al., for a much larger range of Q than has previously been done. The numerical issues involved will be discussed in the next Section. For now, we simply note that Tsay et al., Table 1, reported results for Q from −3.0 to 3.0, and their algorithm would not calculate probabilities smaller than about 1.0e−16. Ashour and Abdul-Hameed (2010) tabulated results for Q in the range from zero to four. Wang et al. (2011) tabulated quantiles, not cdf values, but the smallest quantile they considered was 0.01 and the largest was 0.99.

To ask how close our cdf values are to the Tsay et al. approximation, we have to ask what we mean by close. For example, for Q = −1, λ = 1, the cdf values of 0.02516 and 0.02514 are close in both absolute and relative terms, whereas for Q = −12, λ = 1, the values of 3.153e−66 and 1.994e−61 are close in absolute terms but not in relative terms. Both Tsay et al. and Ashour and Abdul-Hameed comment on closeness in absolute terms, but it is not clear why this is relevant. Indeed, the relevant notion of closeness logically depends on the copula. For example, if we are using the normal copula, what is relevant is the value of \(\Phi ^{ - 1}\left( {F\left( \varepsilon \right)} \right)\). For Q = −12, λ = 1, the value of \(\Phi ^{ - 1}\left( {F\left( \varepsilon \right)} \right)\) is −16.923 for our calculation, −16.259 for the Tsay et al. approximation, and minus infinity for the Ashour and Abdul-Hameed approximation, which equals zero for all Q < −3. As another example, for Q = 16, λ = 1, the value of \(\Phi ^{ - 1}\left( {F\left( \varepsilon \right)} \right)\) is 18.134 for our calculation and 15.712 for the Tsay et al. approximation. Of course, these numbers would be different for a different copula, and ultimately the bias in estimation caused by a miscalculated cdf will depend both on the copula and the model that uses the copula.

Having said that, the values of our calculation of F(Q) and the Tsay et al. approximation are quite close in both absolute and relative terms for non-extreme values of Q, say −4 ≤ Q ≤ 3. For more extreme values of Q, they are close in absolute but not always in relative terms.

The Ashour and Abdul-Hameed approximation, which sets F(Q) = 0 for Q < −3, is in a sense infinitely bad in relative terms for Q in that range, and we will drop it from further consideration, even though it appears to be accurate in the non-extreme part of the range of Q.

5 Numerical issues and accuracy checks

5.1 Numerical issues

Our calculations were done in MATLAB.

For the non-positive values of Q, we calculated the normal cdf (so that we can calculate \(\Phi \left[ {\left( {Q - u_r} \right)/\sigma _v} \right]\)) using the MATLAB command normcdf. This gave results that matched those in Marsaglia (2004) for −16.6 ≤ z ≤ −0.1, where z is generic notation for the normal cdf argument. The MATLAB results also matched the results from the online Casio Keisan normal cdf calculator.

For positive values, some care needed to be taken to keep the cdf from rounding to one. For example, for z = 12, normcdf returns “1”. However, the MATLAB command normcdf with the ‘upper’ option returns the upper tail probability 1 − Φ(12) = 1.77648e−33. The key is to average the values of \(1 - \Phi \left[ {\left( {Q - u_r} \right)/\sigma _v} \right]\) and then subtract this average from one, so that the small deviations from one are preserved. If you instead subtract the individual deviations from one separately for each replication and then average, you will just get one.
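The difference between the two orders of operation can be demonstrated directly. The following Python sketch is illustrative only (the function names, seed, and R = 100,000 are assumptions; `norm_sf` plays the role that the ‘upper’ option of normcdf plays in MATLAB). It accumulates both Φ and 1 − Φ over the same draws of u.

```python
import math
import random

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def norm_sf(z):
    """Upper tail 1 - Phi(z), computed without forming Phi(z) first."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def upper_tail_two_ways(Q, lam, R=100000, seed=1):
    """Return (1 - mean of Phi, mean of 1 - Phi) over the same draws of u."""
    sigma_v = math.sqrt(1.0 / (1.0 + lam ** 2))
    sigma_u = lam * sigma_v
    rng = random.Random(seed)
    sum_cdf = 0.0
    sum_sf = 0.0
    for _ in range(R):
        u_r = sigma_u * abs(rng.gauss(0.0, 1.0))
        z = (Q - u_r) / sigma_v
        sum_cdf += norm_cdf(z)   # each term has already rounded to 1.0
        sum_sf += norm_sf(z)     # each term is a tiny but non-zero number
    return 1.0 - sum_cdf / R, sum_sf / R

naive, stable = upper_tail_two_ways(12.0, 1.0)
```

Here `naive` is exactly zero, while `stable` is a small positive number that can be reported as 1 − F(Q). How close that number is to the true tail probability is a separate question.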

A check of the accuracy of normcdf with the ‘upper’ option is whether, for positive z, the value it returns for 1 − Φ(z) equals Φ(−z). It did, even for extreme values of z. For example, normcdf evaluated at z = −20 gives 2.753624e−89, and normcdf with the ‘upper’ option evaluated at z = 20 gives 1 − Φ(20) = 2.753624e−89, so that Φ(20) = 1 − 2.753624e−89.

Similar considerations apply to the calculation of the Tsay et al. approximation. For Q < 0, the approximate cdf as given in equation (12) of their text and in the last equation of their Appendix is of the form 2GH, where G and H are our shorthand for the terms in the last equation in the Appendix. The term G is easily calculated, but H involves the “error function” \({\mathrm{erf}}\left( z \right) = \frac{2}{{\sqrt \pi }}{\int}_0^z {\exp \left( { - t^2} \right)dt}\). Specifically, H = 1 − erf(z) where z is a linear function of Q, with a negative coefficient on Q. For Q < 0 but large in magnitude, z will be a large positive number, erf(z) will round to one, and 1 − erf(z) will round to zero. The solution is to use not the MATLAB command erf, but rather the command erfc, which calculates erfc(z) = 1 − erf(z) directly. For example, when λ = 1 and Q = −10, z = 9.7847, and we could calculate erf(9.7847) = 1, 1 − erf(9.7847) = 0, H = 0 and approximate cdf = 0, which is not an accurate or useful result. We need to calculate H = erfc(9.7847) = 1.5117e−43, which yields an approximate cdf of 6.7566e−43.

Conversely, for large positive Q, we have an extra term, which we will call J, as given in the last term of the second to last equation of the Appendix. The approximate cdf is equal to \(2\left( {GH^ \ast + J} \right)\), where H* is like H except for a sign change in one term. G is the same as before, and H* is almost zero, so it will not matter numerically whether it rounds to zero or not. The value of the approximate cdf is determined by the term 2J, where J is 0.5 minus a very small number, and it is essential not to let J round to 0.5. So, for example, for Q = 20, 2J = erf(14.1421), and if you calculate erf(14.1421) you will get one “exactly.” Instead you need to calculate erf(14.1421) as 1 − erfc(14.1421) = 1 − 5.51281e−89, and this will lead to the approximate cdf equal to 1 − 5.51282e−89, a meaningful result.
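Both rounding problems described for the error function can be reproduced with the arguments quoted in the text. This is a minimal Python sketch using the standard-library functions `math.erf` and `math.erfc`; the variable names are hypothetical, and the terms G and H* of the approximation are not computed here.

```python
import math

# Lower tail: lambda = 1, Q = -10 gives z = 9.7847 in the text.
z_lower = 9.7847
H_naive = 1.0 - math.erf(z_lower)   # erf rounds to 1.0, so this is exactly 0.0
H_stable = math.erfc(z_lower)       # about 1.51e-43

# Upper tail: Q = 20 gives 2J = erf(14.1421) in the text.
z_upper = 14.1421
two_J_naive = math.erf(z_upper)        # rounds to exactly 1.0
one_minus_two_J = math.erfc(z_upper)   # the deviation from one, about 5.51e-89
```

In the upper tail the deviation has to be carried as a separate number, since 1 − 5.51e−89 is not representable in double precision.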

Finally, we consider the evaluation of the exact cdf \(P_{1,1}\left( Q \right) = \Phi ^2\left( Q \right)\) for the case of λ = 1. For negative Q, there is no numerical issue. We just calculate Φ(Q) and take the square. For large positive Q, however, we need to take care to keep Φ(Q) from rounding to one. For example, for Q = 8, we use normcdf with the ‘upper’ option to obtain 1 − Φ(8) = 6.22096e−16, which we translate into Φ(8) = 1 − 6.22096e−16 and \(\Phi ^2\left( 8 \right) = 1 - 2 \times 6.22096 \times 10^{ - 16} + \left( {6.22096 \times 10^{ - 16}} \right)^2 = 1 - 1.24419 \times 10^{ - 15}\).
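The expansion above amounts to writing 1 − Φ²(Q) = p(2 − p) with p = 1 − Φ(Q). A minimal Python sketch of that calculation follows; the function names are hypothetical, and `norm_sf` stands in for the ‘upper’ option of normcdf.

```python
import math

def norm_sf(z):
    """Upper tail 1 - Phi(z) via erfc, accurate for large positive z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def exact_upper_tail_lambda1(Q):
    """1 - Phi(Q)^2 for lambda = 1, sigma = 1, computed from p = 1 - Phi(Q)."""
    p = norm_sf(Q)
    # 1 - (1 - p)^2 = 2p - p^2 = p * (2 - p), with no subtraction from one.
    return p * (2.0 - p)

tail_8 = exact_upper_tail_lambda1(8.0)     # about 1.24419e-15
tail_12 = exact_upper_tail_lambda1(12.0)   # non-zero, unlike 1 - Phi(12)^2 formed directly
naive_12 = 1.0 - (1.0 - norm_sf(12.0)) ** 2   # Phi(12) has rounded to 1.0, so this is 0.0
```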

5.2 Accuracy of the calculations

The Tsay et al. approximation and the exact result for λ = 1 are closed-form expressions, apart from the need to evaluate the normal cdf and error function. So there is little question of numerical accuracy for these results. However, our simulated F(Q) is not a closed form expression and it could be inaccurate for any number of reasons, most notably the inherent randomness of the simulation and the quality of the random number generator.

The first thing we investigate is how sensitive the results are to the choice of R, the number of replications in the simulation. Table 3 gives results for some values of Q, for R ranging from 1,000,000 to 100,000,000, for the case of λ = 1, so that we have the exact result to compare to. For Q ≤ 4, the results do not depend very much on the number of replications, and R = 1,000,000 is sufficient to give reasonably accurate results. Things begin to be less clear for Q = 6, and for Q ≥ 8 the results depend more strongly on R, and they do not converge unambiguously to the exact result even for R = 100,000,000.

Table 3 Simulated F(Q), λ = 1, for various values of R

Table 4 gives the comparison between the simulated F(Q) and the exact F(Q), for λ = 1 and for a larger set of values of Q. The results are the same as described in the previous paragraph. The simulated cdf is accurate for Q ≤ 4 and becomes less accurate thereafter.

Table 4 Exact versus simulated F(Q), λ = 1

Of course, this may just reflect too strict a meaning of the word accurate. The simulated cdf is quite close to the exact cdf in the absolute sense, for all of the values of Q that we consider. The inaccuracy that we have identified is in the relative sense.

There are multiple possible explanations for numerical inaccuracy in a simulation, but in this case it is easy to suspect the random number generator. We used the MATLAB command rng(s,'twister'), where “twister” denotes the Mersenne Twister algorithm and s is the seed. We picked s = 1. A sample of 10,000,000 pseudo-random normal deviates from this generator passed standard tests on the first four moments. However, a more focused test of the random number generator is to check whether other random number generators give different results, and, if so, whether they more closely match the exact results for λ = 1.

We considered two additional pseudo-random number generators. One is the MATLAB command rng('default'), which is the same as rng(0,'twister'). The other creates pseudo-random deviates z such that the values of Φ(z) uniformly fill the interval [0,1]. Explicitly, for j = 1, …, R, we choose \(z_j = \Phi ^{ - 1}\left( {\left( {j - 1/2} \right)/R} \right)\). We could call this uniform spacing. This is somewhat similar to a van der Corput sequence, which is a one-dimensional Halton sequence, but it is simpler because we have no need to have uncorrelated draws.
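The uniform spacing scheme replaces the random draws with a deterministic grid. The sketch below is a minimal Python illustration (the function names and the small default R = 20,000 are assumptions; the inverse normal cdf comes from the standard-library `statistics.NormalDist`). It evaluates 1 − F(Q) for σ = 1 by averaging the upper-tail probabilities over the grid.

```python
import math
from statistics import NormalDist

def norm_sf(z):
    """Upper tail 1 - Phi(z) via erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def uniform_spacing_upper_tail(Q, lam, R=20000):
    """Estimate 1 - F(Q) using z_j = Phi^{-1}((j - 1/2)/R), j = 1..R."""
    sigma_v = math.sqrt(1.0 / (1.0 + lam ** 2))
    sigma_u = lam * sigma_v
    inv_cdf = NormalDist().inv_cdf
    total = 0.0
    for j in range(1, R + 1):
        z_j = inv_cdf((j - 0.5) / R)   # deterministic "draw" from N(0,1)
        u_j = sigma_u * abs(z_j)       # corresponding half-normal value
        total += norm_sf((Q - u_j) / sigma_v)
    return total / R

value = uniform_spacing_upper_tail(2.0, 1.0)
```

For Q = 2 and λ = 1 this can be compared with the exact value 1 − Φ²(2) = 0.0449826.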

For values of Q where our results are numerically stable (Q ≤ 4) the choice of random number generator makes very little difference. For example, for Q = 2 and λ = 1, the three random number generators listed above yield 1 − F(Q) as 0.0450189, 0.0449673 and 0.0449827, respectively. These are all quite close to each other and to the exact value of 0.0449826. However, for larger values of Q the random number generator matters more. For example, for Q = 8 and λ = 1, we obtain 2.58074e−16, 3.69353e−16 and 4.86878e−16. None of these numbers is particularly close (in relative terms) to the exact value of 1.24419e−15. So the problem could be random number generation, but it is not due to the specific random number generator we used, and the Mersenne Twister is considered to be the state-of-the-art random number generation algorithm. There is essentially an infinity of possible random number generators, and it is just not clear how likely it is that we could find one that would solve our inaccuracy problem, if in fact the problem does lie in random number generation.

6 Tabulations

We have created a set of supplemental tables, available on request, that give our calculation of F(Q) as a function of Q and λ. They cover the range −16 ≤ Q ≤ 10 and 0 ≤ λ ≤ 8 and were calculated for R = 10,000,000. We trust these calculations to be accurate for Q ≤ 4. For Q between 4 and 10, they are less accurate, but we have included these numbers because it is not clear what a better alternative would be. An evaluation that is exactly equal to one is not a good alternative. For the case of λ = 1, for which we have an exact result, the Tsay et al. approximation is quite accurate for large values of Q. We conjecture that this may be so for other values of λ as well, but we have no evidence to support this conjecture.
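As an illustration of how a tabulation of this kind might be used in estimation, the following Python sketch interpolates linearly in Q for a single value of λ and applies the scaling Pλ,σ(Q) = Pλ,1(Q/σ). Everything about the table here is hypothetical: the layout of the supplemental tables is not described in this paper, so the grid below is a small stand-in for λ = 1 built from the exact formula, and the names `Q_GRID`, `F_GRID` and `lookup_cdf` are assumptions of the sketch.

```python
import bisect
import math

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Hypothetical one-column table for lambda = 1, sigma = 1, on a grid of step 0.5.
Q_GRID = [-4.0 + 0.5 * i for i in range(17)]     # -4.0, -3.5, ..., 4.0
F_GRID = [norm_cdf(q) ** 2 for q in Q_GRID]      # stand-in for tabulated values

def lookup_cdf(Q, sigma=1.0):
    """Linear interpolation of the tabulated cdf at Q / sigma."""
    x = Q / sigma
    if not Q_GRID[0] <= x <= Q_GRID[-1]:
        raise ValueError("Q / sigma lies outside the tabulated range")
    # Index of the grid interval [Q_GRID[i], Q_GRID[i + 1]] containing x.
    i = min(bisect.bisect_right(Q_GRID, x) - 1, len(Q_GRID) - 2)
    w = (x - Q_GRID[i]) / (Q_GRID[i + 1] - Q_GRID[i])
    return (1.0 - w) * F_GRID[i] + w * F_GRID[i + 1]

at_grid_point = lookup_cdf(2.0, sigma=2.0)   # uses the entry at Q / sigma = 1.0
between = lookup_cdf(0.25)                   # between the entries at 0.0 and 0.5
```

Linear interpolation in F is the simplest choice; far in the tails, where F changes by orders of magnitude between grid points, some other interpolation scheme may be preferable.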

7 Concluding remarks

In our view, the main contribution of the paper is to have extended the range of the argument over which we can get numerically stable and believable cdf values. This range is not as wide as we would like, but it is considerably wider than in previous papers.

The other substantial contribution of the paper is the derivation of a closed-form expression for the exact cdf, for the special case of λ = 1. This allows us to check the accuracy of the cdf values that we have calculated and tabulated, at least for one special case.