3.1 Introduction

The parametric approach to estimating stochastic production frontiers was introduced by Aigner et al. (1977), Meeusen and van den Broeck (1977), and Battese and Corra (1977). These approaches specified a parametric production function and a two-component error term. One component, reflecting the influence of many unaccountable factors on production as well as measurement error, is considered “noise” and is usually assumed to be normally distributed. The other component describes inefficiency and is assumed to have a one-sided distribution, of which the conventional candidates include the half normal (Aigner et al. 1977), truncated normal (Stevenson 1980), exponential (Meeusen and van den Broeck 1977) and gamma (Greene 1980a, b, 1990; Stevenson 1980). This stochastic frontier production function has become an iconic modeling paradigm in econometric research, in rate-making decisions in regulated industries across the world, in evaluating outcomes of market reforms in transition economies, and in establishing performance benchmarks for local, state, and federal governmental activities.

In this paper we propose a new class of parametric stochastic frontier models with a more flexible specification of the inefficiency term, which we view as an improvement on the basic iconic stochastic frontier production model. Instead of allowing unbounded support for the distribution of the productive (cost) inefficiency term in the right (left) tail, we introduce an unobservable upper bound to inefficiencies, or equivalently a lower bound to efficiencies, which we call the inefficiency bound. The introduction of the inefficiency bound makes the parametric stochastic frontier model more appealing for empirical studies in at least two respects. First, it is plausible to allow only bounded support in many applications of stochastic frontier models wherein the extremely inefficient firms in a competitive industry or market are eliminated by competition. Bounded inefficiency makes sense in this setting since the extremely inefficient firms will be forced to close, and thus the individual production units constitute a truncated sample.Footnote 1 This is consistent with the arguments of Alchian (1950) and Stigler (1958) wherein firms are at any point in time not in a static long run equilibrium, but rather are tending to that situation as they are buffeted by demand and cost shocks. As a consequence, even if we correctly specify a family of distributions for the inefficiency term, the stochastic frontier model may still be misspecified. This particular setting is one in which the inefficiency bound is informative as an indicator of competitive pressures and/or the extent of supervisory oversight by direct management or by corporate boards. In settings in which firms can successfully differentiate their product, which is the typical market structure and not the exception, or where there are market concentrations that may reflect collusive behavior or conditions for a natural monopoly and regulatory oversight, incentives to fully exploit market power or to instead make satisficing decisions are both possible outcomes. Much more likely is that it is not one or the other but some middle ground between the two extremes that would be found empirically.Footnote 2

A second justification for our introduction of the inefficiency bound into the classical stochastic production frontier model is that our model points to an explanation for the finding of positive skewness in many applied studies using the traditional stochastic frontier, and thus to the potential of our bounded inefficiency model to explain these positive (“wrong”) skewness findings.Footnote 3 Researchers have often found positive instead of negative skewness in many samples examined in applied work, which may point to the stochastic frontier being incorrectly specified. However, we conjecture that the distribution of the inefficiency term may itself be negatively skewed, which may happen if there is an additional truncation on the right tail of the distribution. One such specification in which this is a natural consequence is when the distribution of the inefficiency term is doubly truncated normal, that is, a normal distribution truncated at a point on the right tail as well as at zero. As normal distributions are symmetric, the doubly truncated normal distribution may exhibit negative skewness if the truncation on the right is closer to the mode than that on the left. We also consider the truncated half normal distribution, which is a special case of the former, and the truncated exponential distribution. Although these two distributions are always positively skewed, the fact that there is a truncation on the right tail makes the skewness very hard to identify empirically. That is to say, when the true distribution of the one-sided inefficiency error is bounded (truncated), the extent to which skewness is present in any finite sample may be substantially reduced, often to the extent that negative sample skewness for the composite error is not statistically significant. Thus the finding of positive skewness may speak to the weak identifiability of skewness properties in a bounded frontier model.

In addition to proposing new parametric forms for the classical stochastic production frontier model, we also show that our models are identifiable, and in which cases the identification is local or global. Initial consistent estimates are obtained by the method of moments, based on explicit analytic expressions which we derive, and can be used either in a two-step method of scoring or as starting values in solving the normal equations for the relevant sample likelihood, based on the parametric density functions whose expressions we also provide. As the regularity conditions for the maximum likelihood estimation method are satisfied, we employ it in order to obtain consistent and asymptotically efficient estimates of the model parameters, including that of the inefficiency bound. We conduct Monte Carlo experiments to study the finite sample behavior of our estimators. We also extend the model to the panel data setting and allow for a time-varying inefficiency bound. By allowing the inefficiency bound to be time-varying, we contribute another time-varying technical efficiency model to the efficiency literature. Our model differs from those most commonly used in the literature, e.g., Cornwell et al. (1990), Kumbhakar (1990), Battese and Coelli (1992), and Lee and Schmidt (1993), in that, while previous time-varying efficiency models are time-varying in the mean or intercept of individual effects, our model is time-varying in the lower support of the distribution of individual effects.

The outline of this paper is as follows. In Sect. 3.2 we present the new models and derive analytic formulas for density functions and expressions that allow us to evaluate inefficiencies. Section 3.3 deals with the positive skewness issue inherent in the traditional stochastic frontier model. Section 3.4 discusses the identification of the new models and the methods of estimation. Section 3.5 extends the new models to panel data settings and specifies the time-varying bound. Section 3.6 presents Monte Carlo results on the finite sample performance of the bounded inefficiency model vis-a-vis classical stochastic frontier estimators. In Sect. 3.7 we give an illustrative study of the efficiency of the US banking industry in 1984–2009. Section 3.8 concludes.

3.2 The Model

We consider the following Cobb-Douglas production model,

$$\displaystyle{ y_{i} =\alpha _{0} +\sum _{ k=1}^{K}\alpha _{ k}x_{i,k} +\varepsilon _{i} }$$
(3.1)

where

$$\displaystyle{ \varepsilon _{i} = v_{i} - u_{i}. }$$
(3.2)

For every production unit i, \(y_i\) is the log output, \(x_{i,k}\) the k-th log input, \(v_i\) the noise component, and \(u_i\) the (nonnegative) inefficiency component. We maintain the usual assumption that \(v_i\) is iid \(N(0,\sigma_v^2)\), \(u_i\) is iid, and \(v_i\) and \(u_i\) are independent from each other and from the regressors. Clearly we can consider other more flexible functional forms for production (or cost) that are linear or linear in logarithms, such as the generalized Leontief or the transcendental logarithmic, or ones that are nonlinear. The only necessary assumption is that the error process \(\varepsilon_i\) is additively separable from the functional forms we employ in the stochastic production (cost) frontier.

As described in the introduction, our model differs from the traditional stochastic frontier model in that \(u_i\) has bounded support. In addition to the lower bound, which is zero and which defines the frontier, we specify an upper bound to the distribution of \(u_i\) (in the case of the cost frontier, \(\varepsilon_i = v_i + u_i\)). In particular, we assume that \(u_i\) is distributed as doubly truncated normal, the density of which is given by

$$\displaystyle{ f(u) = \frac{\frac{1} {\sigma _{u}}\phi (\frac{u-\mu } {\sigma _{u}} )} {\Phi (\frac{B-\mu } {\sigma _{u}} ) - \Phi (\frac{-\mu } {\sigma _{u}} )}\mathbf{1}_{[0,B]}(u),\ \sigma _{u} > 0,B > 0 }$$
(3.3)

where \(\Phi(\cdot)\) and \(\phi(\cdot)\) are the cdf and pdf of the standard normal distribution, respectively, and \(\mathbf{1}_{[0,B]}(\cdot)\) is an indicator function. It is the distribution obtained by truncating \(N(\mu,\sigma_u^2)\) at zero and at B > 0. The parameter B is the upper bound of the distribution of \(u_i\), and we may call it the inefficiency bound. The inefficiency bound may be a useful index of the competitiveness of a market or an industry.Footnote 4 In the banking industry, which we examine in Sect. 3.7, the inefficiency bound may also represent factors that influence the financial health of the industry. It may be natural to extend this specification and treat the bound as a function of individual specific covariates \(z_i\), such as \(\exp(\delta z_i)\), which would allow identification of bank-specific measures of financial health.
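For concreteness, the density in (3.3) is straightforward to evaluate numerically. The following is a minimal sketch in Python, assuming the (μ, \(\sigma_u\), B) parametrization above; the function name is ours and not part of any established package.

```python
# Sketch: doubly truncated normal density (3.3), i.e., N(mu, sigma_u^2)
# truncated to [0, B]. Function name is illustrative only.
import numpy as np
from scipy.stats import norm

def f_doubly_truncated(u, mu, sigma_u, B):
    u = np.asarray(u, dtype=float)
    mass = norm.cdf((B - mu) / sigma_u) - norm.cdf(-mu / sigma_u)
    dens = norm.pdf((u - mu) / sigma_u) / (sigma_u * mass)
    return np.where((u >= 0) & (u <= B), dens, 0.0)
```

As a quick check, the density integrates to one on [0, B] for any admissible parameter values.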

Using the usual nomenclature of stochastic frontier models, we may call the model described above the normal-doubly truncated normal model, or simply, the doubly truncated normal model. The doubly truncated normal model is rather flexible. It nests the truncated normal (B → ∞), half normal (μ = 0 and B → ∞), and truncated half normal (μ = 0) models. One desirable feature of our model is that the doubly truncated normal distribution may be positively or negatively skewed, depending on the truncation parameter B. This feature provides us with an alternative explanation for the positive skewness problem prevalent in empirical stochastic frontier studies. This will be made clearer later in the paper. Another desirable feature of our model is that, like the truncated normal model, it can describe the scenario in which only a few firms in the sector are efficient, a phenomenon that is described in the business press as “few stars, most dogs”, while in the truncated half normal model and the truncated exponential model (in which the distribution of \(u_i\) is truncated exponential), most firms are implicitly assumed to be relatively efficient.Footnote 5

In Table 3.1 we provide detailed properties of our model. In particular, we present the density function of the error term \(\varepsilon_i\), which is necessary for maximum likelihood estimation; the analytic form of \(E[u_i\vert\varepsilon_i]\), which is the best predictor of the inefficiency term \(u_i\) under our assumptions; and the conditional distribution of \(u_i\) given \(\varepsilon_i\), which is useful for making inferences on \(u_i\). The results for the truncated half normal model, a special case of the doubly truncated normal model (μ = 0), are also presented. Finally, we also provide results for the truncated exponential model, in which the inefficiency term \(u_i\) is distributed according to the following density function,

$$\displaystyle{ f(u) = \frac{1} {\sigma _{u}(1 - {e}^{-B/\sigma _{u}})}{e}^{-\frac{u} {\sigma _{u}} }\mathbf{1}_{[0,B]}(u),\sigma _{u} > 0,B > 0 }$$
(3.4)

The truncated exponential distribution can be further generalized to the truncated gamma distribution, which shares the nice property with the doubly truncated normal distribution that it may be positively or negatively skewed.
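A corresponding sketch for the truncated exponential density in (3.4), under the same hedged conventions as before:

```python
# Sketch: exponential density with scale sigma_u truncated to [0, B], as in (3.4).
import numpy as np

def f_truncated_exponential(u, sigma_u, B):
    u = np.asarray(u, dtype=float)
    dens = np.exp(-u / sigma_u) / (sigma_u * (1.0 - np.exp(-B / sigma_u)))
    return np.where((u >= 0) & (u <= B), dens, 0.0)
```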

Table 3.1 Key results. \(f(\varepsilon)\) is the density of \(\varepsilon = v - u\), \(\mathbb{E}(u\vert\varepsilon)\) is the conditional mean of u given \(\varepsilon\), and \(f(u\vert\varepsilon)\) is the conditional density of u given \(\varepsilon\). \(\phi(\cdot)\) and \(\Phi(\cdot)\) are the pdf and cdf of the standard normal distribution, respectively, and \(\mathbf{1}_{[0,B]}(\cdot)\) is an indicator function

For the doubly truncated normal model and the truncated half normal model, the analytic forms of our results use the so-called γ-parametrization, which specifies

$$\displaystyle{ \sigma = \sqrt{\sigma _{u }^{2 } +\sigma _{ v }^{2}},\ \ \ \gamma =\sigma _{ u}^{2}{/\sigma }^{2}. }$$
(3.5)

By definition γ ∈ [0, 1], a compact support, which is desirable for the numerical procedure of maximum likelihood estimation. Another parametrization initially employed by Aigner et al. (1977) is the λ-parametrization

$$\displaystyle{ \sigma = \sqrt{\sigma _{u }^{2 } +\sigma _{ v }^{2}},\ \ \ \lambda =\sigma _{u}/\sigma _{v}. }$$
(3.6)

We may check that when B → ∞, the density function of \(\varepsilon_i\) in the doubly truncated normal model reduces to that of the truncated normal model introduced by Stevenson (1980). Furthermore, if μ = 0, it reduces to the likelihood function for the half normal model introduced by Aigner et al. (1977). Similarly, the truncated exponential model reduces to the exponential model introduced by Meeusen and van den Broeck (1977).

3.3 The Skewness Issue

A common and important methodological problem encountered when dealing with empirical implementation of the stochastic frontier model is that the residuals may be skewed in the wrong direction. In particular, the ordinary least squares (OLS) residuals may show positive skewness even though the composed error term \(v - u\) should display negative skewness, in keeping with u's positive skewness. This problem has important consequences for the interpretation of the skewness of the error term as a measure of technological inefficiency. It may imply that a nonrepresentative random sample has been drawn from an inefficiency distribution possessing the correct population skewness (see Carree 2002; Greene 2007; Simar and Wilson 2010; Almanidis and Sickles 2011;Footnote 6 Feng et al. 2012). This is considered a finite sample “artifact” and the usual suggestion in the literature, and by programs implementing stochastic frontier models, is to treat all firms in the sample as fully efficient and proceed with straightforward OLS, based on the results of Olson et al. (1980) and Waldman (1982). As this would suggest setting the variance of the inefficiency term to zero, it would have problematic impacts on estimation and on inference. Simar and Wilson (2010) suggest a bagging method to overcome the inferential problems when a half-normal distribution for inefficiencies is specified. However, a finding of positive skewness in a sample may also indicate that inefficiencies are in fact drawn from a distribution which has positive skewness.Footnote 7 Carree (2002) considers one-sided distributions of inefficiencies \(u_i\) that can have negative or positive skewness. However, Carree (2002) uses the binomial distribution, which is a discrete distribution wherein continuous inefficiencies fall into discrete “inefficiency categories” and which implicitly assumes that only a very small fraction of the firms attain a level of productivity close to the frontier, especially when \(u_i\) is negatively skewed.Footnote 8

Our model addresses the positive skewness problem in the spirit of Carree (2002), but with a more appealing distributional specification of the inefficiency term. For the doubly truncated normal model, let \(\xi_1 = \frac{-\mu}{\sigma_u}\), \(\xi_2 = \frac{B-\mu}{\sigma_u}\), and \(\eta_k \equiv \frac{\xi_1^k\phi(\xi_1)-\xi_2^k\phi(\xi_2)}{\Phi(\xi_2)-\Phi(\xi_1)}\), k = 0, 1, …, 4. Note that \(\eta_0\) is the inverse Mill's ratio, which equals \(\sqrt{2/\pi}\) in the half normal model, and that \(\xi_1\) and \(\xi_2\) are the lower and upper truncation points of the standard normal density, respectively. The skewness of the doubly truncated normal distribution is given by

$$\displaystyle{ S_{u} = \frac{2\eta _{0}^{3} -\eta _{0}(3\eta _{1} + 1) +\eta _{2}} {{\left(1 -\eta _{0}^{2} +\eta _{1}\right)}^{3/2}}. }$$
(3.7)

It can be checked that when B > 2μ, \(S_u\) is positive and when B < 2μ, \(S_u\) is negative. Since B > 0 by definition, it is obvious that only when μ > 0 is it possible for \(u_i\) to be negatively skewed. The larger μ is, the larger the range of values B may take such that \(u_i\) is negatively skewed. Consider the limiting case where a normal distribution with μ → ∞ is truncated at zero and at B > 0. An infinitely large μ means that there is effectively no truncation on the left at all, and any finite truncation on the right gives rise to negative skewness. Finally, for both the truncated half normal model (μ = 0) and the truncated exponential model, the skewness of \(u_i\) is always positive.
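The sign rule is easy to verify numerically. A minimal sketch implementing (3.7) with the \(\eta_k\) notation above (function names are ours):

```python
# Sketch: skewness (3.7) of the doubly truncated normal inefficiency term.
import numpy as np
from scipy.stats import norm

def skewness_u(mu, sigma_u, B):
    xi1, xi2 = -mu / sigma_u, (B - mu) / sigma_u
    mass = norm.cdf(xi2) - norm.cdf(xi1)
    eta = lambda k: (xi1**k * norm.pdf(xi1) - xi2**k * norm.pdf(xi2)) / mass
    e0, e1, e2 = eta(0), eta(1), eta(2)
    return (2 * e0**3 - e0 * (3 * e1 + 1) + e2) / (1 - e0**2 + e1) ** 1.5

print(skewness_u(0.3, 1.0, 0.5))  # B < 2*mu: negative skewness
print(skewness_u(0.1, 1.0, 0.5))  # B > 2*mu: positive skewness
```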

Consequently, in the doubly truncated normal model the sign of the skewness of the residual is ambiguous: it depends on an unobservable relationship between the truncation parameter B and μ. We argue that this ambiguity could in theory explain the prevalence of the positive skewness problem in applied stochastic frontier research. When the underlying data generating process for \(u_i\) is based on the doubly truncated normal distribution, increasing the sample size does not solve the positive skewness problem. The OLS residual \(\varepsilon\) may be positively skewed even when the sample size goes to infinity. Hence the positive skewness problem may also be a large sample problem.Footnote 9

Based on the above discussion, it is clear that the doubly truncated normal model generalizes the stochastic frontier model in a way that allows for positive as well as negative skewness of the residual. In addition, although the truncated half normal and the truncated exponential models have negative (correct) skewness in large samples, the existence of the inefficiency bound reduces the identifiability of negative skewness in finite samples, often to the extent that positive skewness appears. This implies that finding positive skewness does not necessarily mean that the stochastic frontier model is inapplicable. It may be due to a finite sample “artifact” (Simar and Wilson 2010), or it may be that we are studying a market or an industry in which firms cannot fall below some minimal level of efficiency and remain in the market or industry. In the latter case, the traditional unbounded support for the inefficiency term would be misspecified and should be substituted with the model of bounded inefficiency.

3.4 Identification and Estimation

3.4.1 Identification

We utilize the set or partial identification concepts that have been revisited recently (see, for example, Tamer 2010) and that were enunciated early in the production setting by Marschak and Andrews (1944) (see also the critique by Nerlove (1965)). That this has been the relatively recent interest of many econometricians speaks to a cycle of classical econometric study that has defined the production frontier portion of Peter Schmidt's research that our paper develops. We can put it into historical perspective by looking at the intellectual development of the production function by Paul Samuelson (see his 1979 review of his professor Paul Douglas), his student Lawrence Klein, whose classic Textbook of Econometrics (1953) sold at the unheard-of price of $6.00 and still provides insights today for those interested in production econometrics, his student Arthur Goldberger (see, for example, “The Interpretation and Estimation of Cobb-Douglas Functions”, 1968), his student Jan Kmenta (see, for example, Zellner et al. 1966), and his student Peter Schmidt, whose work on the stochastic frontier production function with Dennis Aigner and C. A. Knox Lovell (1977) is regarded as the seminal research contribution to the field of productive efficiency econometrics. In turn, each of these scholars arguably can be viewed as the most successful student of their respective professor. Our contribution is leveraged by these seminal contributions as well as by the selective constraints that economic theory has imposed on them, which we try to address in our stochastic frontier model with bounded inefficiency.

Identification using first and second order moments is a well-accepted methodology. Our models are not identified by such moments alone and require higher order moments. The use of higher order moments to identify and estimate econometric models is well-known and has proven quite important in parametric econometric modeling (see, for example, Cragg 1997; Dagenais and Dagenais 1997). Identification strategies that utilize the properties of the underlying joint distribution function for the exponential class, requiring the identification of distributions defined by third and fourth order moments, have been the mainstay of recent work in nonparametric identification (Newey and Powell 2003; Matzkin 2012). Alternative approaches have also been introduced to utilize other types of information, such as heteroskedastic covariance restrictions, to obtain point and set identification for parametric and semiparametric models (Lewbel 2012). We explore the sensitivity of the use of such higher order moment restrictions in our Monte Carlo experiments.

Identification of our model may be done in two parts. The first part is concerned with the parameters describing the technology, and the second part identifies the distributional parameters using the information contained in the distribution of the residual. For models without an intercept term the identification conditions for the first part are well known and are satisfied in most cases. The structural parameters can be consistently obtained by applying straightforward OLS. However, for models containing an intercept term there is a need to bias-correct it using the distributional parameters, since \(E[\varepsilon ] = -E[u]\neq 0\) (see Afriat 1972; Richmond 1974). Therefore, the identification of the second part, which is based on the method of moments, requires a closer examination. Table 3.2 lists the population (central) moments of \(\varepsilon_i\) for the doubly truncated normal model and the truncated exponential model. The moments of the truncated half normal model can be obtained by setting μ = 0 in the doubly truncated normal model. These results are essential for the discussion of identification and the method of moments estimation.

Table 3.2 Central moments of \(\varepsilon\)

To examine the identification of the second part we note that under the assumption of independence of the noise and inefficiency term the following equality holds

$$\displaystyle\begin{array}{rcl} & & E\left[{(\varepsilon -E(\varepsilon ))}^{4}\right] - 3{\left(E\left[{(\varepsilon -E(\varepsilon ))}^{2}\right]\right)}^{2} \\ & & \quad =\psi _{4} - 3\psi _{2}^{2} = E\left[{(u - E(u))}^{4}\right] - 3{\left(E\left[{(u - E(u))}^{2}\right]\right)}^{2}{}\end{array}$$
(3.8)

This is a measure of excess kurtosis and for the truncated half-normal model is derived as

$$\displaystyle{ \psi _{4} - 3\psi _{2}^{2} =\sigma _{ u}^{4}({-\xi }^{3}\tilde{\eta }_{ 0} + 3\xi \tilde{\eta }_{0} - {4\xi }^{2}\tilde{\eta }_{ 0}^{2} - 4\tilde{\eta }_{ 0}^{2} - {3\xi }^{2}\tilde{\eta }_{ 0}^{2} - 12\xi \tilde{\eta }_{ 0}^{3}) }$$
(3.9)

where \(\tilde{\eta }_{0} = \frac{{(2\pi )}^{-1/2}-\xi \phi (\xi )} {\Phi (\xi )-\frac{1} {2} }\). Notice that for the normal distribution \(\tilde{\eta }_{0} = 0\) and thus the excess kurtosis is also zero.

After multiplying (3.9) by \(\psi _{3}^{-4/3}\) we eliminate \(\sigma_u\), and the resulting function, which we denote by g, has only one argument, ξ:

$$\displaystyle{ g(\xi ) = \frac{{-\xi }^{3}\tilde{\eta }_{0} + 3\xi \tilde{\eta }_{0} - {4\xi }^{2}\tilde{\eta }_{0}^{2} - 4\tilde{\eta }_{0}^{2} - {3\xi }^{2}\tilde{\eta }_{0}^{2} - 12\xi \tilde{\eta }_{0}^{3}} {{\left(2\tilde{\eta }_{0}^{3} - 3\xi \tilde{\eta }_{0}^{2} -\tilde{\eta }_{0} {+\xi }^{2}\tilde{\eta }_{0}\right)}^{-4/3}} }$$
(3.10)

The weak law of large numbers implies that

$$\displaystyle{ \text{plim} \frac{1} {n}\mathop{\sum }\nolimits _{i}\hat{\varepsilon }_{i}^{k} = m_{ k} =\psi _{k} }$$
(3.11)

The first order moment is zero by definition and thus is not useful for identification purposes. By employing the Slutsky theorem we can specify the following function G

$$\displaystyle\begin{array}{rcl} g(\xi )& =& \frac{m_{4} - 3m_{2}^{2}} {m_{3}^{4/3}}{}\\ &\Longrightarrow& {}\\ G(\xi )& =& g(\xi ) -\frac{m_{4} - 3m_{2}^{2}} {m_{3}^{4/3}} {}\\ \end{array}$$

Similarly, we can derive the function G for the normal-truncated exponential model with function g expressed by

$$\displaystyle{ g(\xi ) = \frac{36{e}^{2\xi } - 24{e}^{\xi } - 24{e}^{3\xi } + 6{e}^{4\xi } {-\xi }^{4}{e}^{\xi } - {4\xi }^{4}{e}^{2\xi } {-\xi }^{4}{e}^{3\xi } + 6} {(6{e}^{2\xi } - 4{e}^{\xi } - 4{e}^{3\xi } + {e}^{4\xi } + 1){(-\frac{2{e}^{3\xi }-\left({\xi }^{3}+6\right){e}^{2\xi }+\left(6{-\xi }^{3}\right){e}^{\xi }-2} {{e}^{3\xi }-3{e}^{2\xi }+3{e}^{\xi }-1} )}^{4/3}} }$$
(3.12)

Both the truncated half normal model and the truncated exponential model are globally identified. To see this, we can examine the monotonicity of the function G with respect to the parameter ξ, which allows us to express this parameter (implicitly) as a function of sample moments and data. This condition provides the necessary and sufficient condition for global identification à la Rothenberg (1971). For the truncated half normal model, G is monotonically decreasing, and for the truncated exponential model, G is monotonically increasing. Hence, in both cases, G is invertible and ξ can be identified. The identification of the other parameters then follows from the third order moment of the least squares residuals. Note, however, that for large values of ξ (e.g., ξ > 5 for the normal-truncated half-normal model and ξ > 20 for the normal-truncated exponential model), the curve g(ξ) is nearly flat and gives poor identification. ξ can be large for two reasons: either \(\sigma_u\) goes to zero or the bound parameter is large. In the first case the distribution of the inefficiency process approaches the Dirac delta distribution, which makes it very hard for the distributional parameters to be identified. This limiting case is discussed in Wang and Schmidt (2008). In the second case the distribution of the inefficiency term becomes unbounded, as in the standard stochastic frontier models, for which it is straightforward to show that the model is globally identified (see Aigner et al. 1977; Olson et al. 1980).
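To make the argument concrete, the following sketch codes g(ξ) for the truncated half normal model exactly as printed in (3.10) and inverts G numerically; the bracketing interval and the real cube-root convention \(|x|^{4/3}\) for the fractional power are our assumptions.

```python
# Sketch: moment-based identification of xi in the truncated half normal model.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def eta0_tilde(xi):
    return ((2 * np.pi) ** -0.5 - xi * norm.pdf(xi)) / (norm.cdf(xi) - 0.5)

def g(xi):
    e = eta0_tilde(xi)
    num = (-xi**3 * e + 3 * xi * e - 4 * xi**2 * e**2 - 4 * e**2
           - 3 * xi**2 * e**2 - 12 * xi * e**3)
    den = 2 * e**3 - 3 * xi * e**2 - e + xi**2 * e
    # dividing by den**(-4/3); the absolute value keeps the power real
    return num * np.abs(den) ** (4.0 / 3.0)

def solve_xi(sample_stat, lo=0.1, hi=5.0):
    # G(xi) = g(xi) - sample_stat is monotone, so a sign-changing bracket
    # on [lo, hi] pins down xi uniquely
    return brentq(lambda xi: g(xi) - sample_stat, lo, hi)
```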

It is not clear, however, that the doubly truncated normal model is globally identifiable. Nevertheless, local identification can be verified. We may examine \(\psi _{3}^{-4/3}(\psi _{4} - 3\psi _{2}^{2})\) and \(\psi _{3}^{-5/3}(\psi _{5} - 10\psi _{2}\psi _{3})\), both of which are functions of \(\xi_1\) and \(\xi_2\) only, and we denote them by \(g_{1}(\xi _{1},\xi _{2})\) and \(g_{2}(\xi _{1},\xi _{2})\), respectively. Letting \(\hat{g}_{1}\) and \(\hat{g}_{2}\) be the sample versions of \(g_1\) and \(g_2\), respectively, we have the following system of identification equations,

$$\displaystyle\begin{array}{rcl} G_{1}(\xi _{1},\xi _{2})\ \equiv g_{1}(\xi _{1},\xi _{2}) -\hat{ g}_{1}& =& 0 {}\\ G_{2}(\xi _{1},\xi _{2})\ \equiv g_{2}(\xi _{1},\xi _{2}) -\hat{ g}_{2}& =& 0. {}\\ \end{array}$$

By the implicit function theorem (or Rothenberg 1971), the identification of ξ 1 and ξ 2 depends on the rank of the matrix

$$\displaystyle{ H = \left(\begin{array}{cc} \frac{\partial g_{1}} {\partial \xi _{1}} & \frac{\partial g_{1}} {\partial \xi _{2}} \\ \frac{\partial g_{2}} {\partial \xi _{1}} & \frac{\partial g_{2}} {\partial \xi _{2}} \end{array} \right). }$$

If H is of full rank, then \(\xi_1\) and \(\xi_2\) can be written as functions of \(\hat{g}_1\) and \(\hat{g}_2\); the identification of the model then follows. The analytic form of H is very complicated, but we may examine the invertibility of H by numerically evaluating \(g_1\) and \(g_2\) and inferring the sign of each element of H. It can be verified that the determinant of H is nonzero in neighborhoods within \(I_1\), \(I_2\), and \(I_4\), which are defined as follows:

  (i) \(I_1 \equiv \{(\mu,B)\,\vert\,\mu \leq 0,\ B > 0\}\)

  (ii) \(I_2 \equiv \{(\mu,B)\,\vert\,\mu > 0,\ B \in (0,2\mu)\}\)

  (iii) \(I_3 \equiv \{(\mu,B)\,\vert\,B = 2\mu > 0\}\)

  (iv) \(I_4 \equiv \{(\mu,B)\,\vert\,\mu > 0,\ B > 2\mu\}\).

The line \(I_3 \equiv \{(\mu,B)\,\vert\,B = 2\mu > 0\}\) corresponds to the case where B = 2μ and \(\psi_3 = 0\); hence the functions \(g_1\) and \(g_2\) are not continuous and the implicit function theorem is not applicable. Nonetheless, simulation results in the next section show that when the true values of B and μ satisfy B = 2μ, both B and μ are consistently estimated. This may indicate that the restricted (B = 2μ) model is nested in the unrestricted model and that the model is locally identifiable on \(I_{2}\cup I_{3}\cup I_{4}\).

We may treat the doubly truncated normal model as a collection of different sub-models corresponding to the different domains of the parameters. Treated separately, each of the sub-models is globally identified. In maximum likelihood estimation, the separate treatment is easily achieved by constrained optimization on each parameter subset. For example, on the line \(\{(\mu,B)\,\vert\,\mu = 0,\ B > 0\} \subset I_1\), the doubly truncated normal model reduces to the truncated half normal model. As another useful example, the line \(I_2\) corresponds to a sub-model that has positive skewness even asymptotically.

3.4.2 Method of Moments Estimation

The method of moments (Olson et al. 1980) may be employed to estimate our model or to obtain initial values for maximum likelihood estimation. In the first step of this approach, OLS is used to obtain consistent estimates of the parameters describing the technology, apart from the intercept. In the second step, using the distributional assumptions on the residual, the moment condition equations are solved to obtain estimates of the parameters describing the distribution of the residual.

More specifically, we may rewrite the production frontier model in (3.1) and (3.2) as

$$\displaystyle{ y_{i} = \left(\alpha _{0} - \mathbb{E}u_{i}\right) +\sum _{ k=1}^{K}\alpha _{ k}x_{i,k} +\varepsilon _{ i}^{{\ast}}, }$$

where \(\varepsilon _{i}^{{\ast}} =\varepsilon _{i} + \mathbb{E}u_{i}\) has zero mean and constant variance \(\sigma _{\varepsilon }^{2}\). Hence OLS yields consistent estimates of \(\varepsilon_i^{\ast}\) and \(\alpha_k\), k = 1, …, K. Equating the sample moments of the estimated residuals \(\hat{\varepsilon}_i^{\ast}\) to the population moments, one can solve for the parameters associated with the distribution of \(\varepsilon_i^{\ast}\).
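A two-step sketch of this procedure, in the spirit of Olson et al. (1980); here population_moments is a hypothetical placeholder standing in for the analytic expressions of Table 3.2:

```python
# Sketch: two-step method of moments. Step 1: OLS. Step 2: match the second
# to fourth sample moments of the residuals to their population counterparts.
import numpy as np
from scipy.optimize import fsolve

def mom_estimate(y, X, population_moments, theta0):
    Z = np.column_stack([np.ones(len(y)), X])       # intercept plus regressors
    beta = np.linalg.lstsq(Z, y, rcond=None)[0]     # step 1: OLS
    e = y - Z @ beta                                 # estimated residuals
    m = np.array([np.mean((e - e.mean()) ** k) for k in (2, 3, 4)])
    # step 2: solve psi_k(theta) = m_k for the distributional parameters
    theta = fsolve(lambda th: np.array(population_moments(th)) - m, theta0)
    return beta, theta
```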

3.4.3 Maximum Likelihood Estimation

For more efficient estimation, we may use maximum likelihood estimation (MLE). Note that in the presence of the noise term \(v_i\), the support of the residual is unbounded and does not depend on the parameters; none of the other standard regularity conditions is in question either. In the remainder of this section we provide the log-likelihood functions for the bounded inefficiency model for the three parametric distributions we have considered. Note that in practice we may also need the gradients of the log-likelihood function. The gradients are complicated in form but straightforward to derive. These are provided in the appendix.

In addition to the γ-parametrization discussed earlier, we re-parametrize the bound parameter with another parameter, \(\tilde{B} =\exp (-B)\). Unlike the bound, \(\tilde{B}\) takes values in the compact unit interval, which facilitates the numerical procedure of maximum likelihood estimation as well as the establishment of the asymptotic normality of this parameter. When \(\tilde{B}\) lies in the interior of the parameter space, the MLE is asymptotically normal (see Rao 1973; Davidson and MacKinnon 1993, among others).

The log-likelihood function for the doubly truncated normal model with γ-parameterization is given by

$$\displaystyle\begin{array}{rcl} \ln L& =& -n\ln \left[\Phi (\frac{-\ln \tilde{B}-\mu } {\sigma _{u}(\sigma,\gamma )} ) - \Phi ( \frac{-\mu } {\sigma _{u}(\sigma,\gamma )})\right] \\ & & -n\ln \sigma -\frac{n} {2} \ln (2\pi ) -\sum _{i=1}^{n}\frac{{(\varepsilon _{i}+\mu )}^{2}} {{2\sigma }^{2}} \\ & & +\sum _{i=1}^{n}\ln \Bigg\{\Phi \left(\frac{(-\ln \tilde{B} +\varepsilon _{i})\sqrt{\gamma /(1-\gamma )} - (\ln \tilde{B}+\mu )\sqrt{(1-\gamma )/\gamma }} {\sigma } \right)\ \\ & & -\Phi \left(\frac{\varepsilon _{i}\sqrt{\gamma /(1-\gamma )} -\mu \sqrt{(1-\gamma )/\gamma }} {\sigma } \right)\Bigg\},\ \ \ \ \ \ \ \ \ \ \ {}\end{array}$$
(3.13)

where \(\varepsilon _{i} = y_{i} - x_{i}\alpha\), \(x_i = (1, x_{i,k})\), \(\alpha = {(\alpha _{0},\alpha _{k})}^{{\prime}}\), and

$$\displaystyle{ \sigma _{u}(\sigma,\gamma ) =\sigma \sqrt{\gamma }. }$$
(3.14)

This can be expressed in terms of the λ-parametrization as in Aigner et al. (1977) by substituting γ in (3.13) with

$$\displaystyle{ \gamma (\lambda ) = \frac{{\lambda }^{2}} {1 {+\lambda }^{2}}. }$$
(3.15)
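As an illustration, the log-likelihood (3.13) can be coded and maximized directly. The sketch below transcribes (3.13) in the (μ, σ, γ, \(\tilde{B}\)) parametrization; the optimizer settings and starting values are our assumptions, not part of the model.

```python
# Sketch: negative log-likelihood (3.13) for the doubly truncated normal model.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def negloglik(theta, y, Z):
    *alpha, mu, sigma, gamma, Btilde = theta
    eps = y - Z @ np.array(alpha)
    n = len(eps)
    B = -np.log(Btilde)                        # Btilde = exp(-B)
    su = sigma * np.sqrt(gamma)                # sigma_u in (3.14)
    r = np.sqrt(gamma / (1 - gamma))           # sqrt(gamma / (1 - gamma))
    rinv = np.sqrt((1 - gamma) / gamma)
    ll = (-n * np.log(norm.cdf((B - mu) / su) - norm.cdf(-mu / su))
          - n * np.log(sigma) - n / 2 * np.log(2 * np.pi)
          - np.sum((eps + mu) ** 2) / (2 * sigma ** 2)
          + np.sum(np.log(norm.cdf(((B + eps) * r + (B - mu) * rinv) / sigma)
                          - norm.cdf((eps * r - mu * rinv) / sigma))))
    return -ll

# usage sketch: box constraints keep gamma and Btilde inside the unit interval
# res = minimize(negloglik, theta0, args=(y, Z), method="L-BFGS-B",
#                bounds=[(None, None)] * (Z.shape[1] + 1)
#                       + [(1e-6, None), (1e-6, 1 - 1e-6), (1e-6, 1 - 1e-6)])
```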

The log-likelihood function for the truncated half normal model is

$$\displaystyle\begin{array}{rcl} \ln L& =& -n\ln \left(\Phi \left( \frac{-\ln \tilde{B}} {\sigma _{u}(\sigma,\gamma )}\right) -\frac{1} {2}\right) - n\ln \sigma -\frac{n} {2} \ln (2\pi ) \\ & & -\sum _{i=1}^{n} \frac{\varepsilon _{i}^{2}} {{2\sigma }^{2}} +\sum _{ i=1}^{n}\ln \left\{\Phi \left(\frac{(-\ln \tilde{B} +\varepsilon _{i})\sqrt{\gamma /(1-\gamma )} -\ln \tilde{ B}\sqrt{(1-\gamma )/\gamma }} {\sigma } \right)\right. \\ & & \left.-\Phi \left(\frac{\varepsilon _{i}\sqrt{\gamma /(1-\gamma )}} {\sigma } \right)\right\}, {}\end{array}$$
(3.16)

Again, substituting γ in (3.16) with γ(λ) in (3.15), we obtain the log-likelihood under the λ-parametrization.

Finally, the log-likelihood function for the truncated exponential model with γ-parametrization is given by

$$\displaystyle\begin{array}{rcl} \ln L& =& -\frac{n} {2} \ln \gamma - n\ln \sigma - n\ln \left(1 - {e}^{\frac{\ln \tilde{B}{\gamma }^{-1/2}} {\sigma }}\right) + \frac{n} {2} \frac{1-\gamma } {\gamma } + \frac{{\gamma }^{-1/2}} {\sigma } \sum _{i=1}^{n}\varepsilon _{ i} \\ & & +\sum _{i=1}^{n}\ln \left[\Phi \left(\frac{(-\ln \tilde{B} +\varepsilon _{i}){(1-\gamma )}^{-1/2}} {\sigma } + \sqrt{\frac{1-\gamma } {\gamma }} \right)\right. \\ & & \left.-\Phi \left(\frac{\varepsilon _{i}{(1-\gamma )}^{-1/2}} {\sigma } + \sqrt{\frac{1-\gamma } {\gamma }} \right)\right], \end{array}$$
(3.17)

where \(\varepsilon _{i} = y_{i} - x_{i}\alpha\).

After estimating the model, we can estimate the composed error term \(\varepsilon _{i}\):

$$\displaystyle{ \hat{\varepsilon }_{i} = y_{i} -\hat{\alpha }_{0} -\sum x_{i,k}\hat{\alpha }_{k},i = 1,\cdots \,,n. }$$
(3.18)

From this we can estimate the inefficiency term u i using the formula for \(E(u_{i}\vert \varepsilon _{i})\) in Table 3.1.

One reasonable question is whether or not one can test for the absence or presence of the bound (\(H_{0}:\tilde{ B} = 0\) vs. \(H_{1}:\tilde{ B} > 0\)); under the null, the proper specification would be the standard SF model, which assumes no bound and is a special case of our more general bounded SF model. The test procedure is slightly complicated but still feasible. The first complication arises from the fact that \(\tilde{B}\) lies on the boundary of the parameter space under the null. Second, it is obvious from the log-likelihood functions provided above that the bound is not identified in this case, and it can be shown that any finite order derivative of the log-likelihood function with respect to \(\tilde{B}\) is zero. Thus the conventional Wald and Lagrange Multiplier (LM) statistics are not defined, and the Likelihood Ratio (LR) statistic has a nonstandard asymptotic distribution that would strictly dominate the \(\chi_{(1)}^2\) distribution. Lee (1993) derives the asymptotic distribution of such an estimate as a mixture of \(\chi^2\) distributions under the null that its value is zero, focusing in particular on the SF model under the assumption of half-normally distributed inefficiencies. There λ is globally identified, which can also be seen using the method-of-moments estimator provided in Aigner et al. (1977). Lee (1993) provides a useful one-to-one reparametrization which transforms the singular information matrix into a nonsingular one. However, since the bound in our model is not identified in this situation, there is no such re-parametrization and hence this procedure cannot be used. An alternative is to apply the bootstrap procedure proposed by Hansen (1996, 1999) to construct asymptotically equivalent p-values with which to make an inference. To implement the test we treat the \(\hat{\varepsilon}_i\) (i = 1, …, n) as a sample from which the bootstrap samples \(\hat{\varepsilon}_i^{(m)}\) (i = 1, …, n; m = 1, …, M) are drawn with replacement. Using each bootstrap sample we estimate the model under the null and under the alternative of bounded inefficiency and construct the corresponding LR statistic. We repeat this procedure M times and calculate the percentage of times the bootstrap LR exceeds the actual one. This provides us with the bootstrap estimate of the asymptotic p-value of the LR statistic under the null.
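A sketch of this bootstrap, where fit_null and fit_alt are hypothetical routines returning the maximized log-likelihood under the null (no bound) and under the alternative, respectively:

```python
# Sketch: Hansen (1996, 1999)-style bootstrap p-value for the LR statistic.
import numpy as np

def bootstrap_lr_pvalue(eps_hat, fit_null, fit_alt, M=399, seed=0):
    rng = np.random.default_rng(seed)
    lr_actual = 2 * (fit_alt(eps_hat) - fit_null(eps_hat))
    exceed = 0
    for _ in range(M):
        bs = rng.choice(eps_hat, size=len(eps_hat), replace=True)
        exceed += 2 * (fit_alt(bs) - fit_null(bs)) >= lr_actual
    return exceed / M   # bootstrap estimate of the asymptotic p-value
```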

3.5 Panel Data

In the same spirit as Schmidt and Sickles (1984) and Cornwell et al. (1990), we may specify a panel data model of bounded inefficiencies:

$$\displaystyle{ y_{it} =\alpha _{0} +\sum _{ k=1}^{K}\alpha _{ k}x_{it,k} +\varepsilon _{it} }$$
(3.19)

where

$$\displaystyle{ \varepsilon _{it} = v_{it} - u_{it}. }$$
(3.20)

We assume that the inefficiency components \(u_{it}\) are positive, independent of the regressors, and independently drawn from a time-varying distribution with upper bound \(B_t\). We may set \(B_t\) to be time-invariant. However, it is certainly more plausible to assume otherwise, as the market or industry may well become more or less forgiving as time goes by, especially in settings in which market reforms are being introduced or firms are adjusting to a phased transition from regulation to deregulation.

Note that since \(u_{it}\) is time-varying, the above panel data model is in effect a time-varying technical efficiency model. Our model differs from the existing literature in that, while previous time-varying efficiency models, notably Cornwell et al. (1990), Kumbhakar (1990), Battese and Coelli (1992), and Lee and Schmidt (1993), are time-varying in the mean or intercept of individual effects, our model is time-varying in the upper support of the distribution of the inefficiency term \(u_{it}\).

The assumption that \(u_{it}\) is independent over time simplifies estimation and analysis considerably. In particular, the covariance matrix of \(\varepsilon _{i} \equiv {(\varepsilon _{i1},\ldots,\varepsilon _{iT})}^{{\prime}}\) is diagonal. This enables us to treat the panel model as a collection of cross-section models in chronological order. We may certainly impose more structure on the sample path of the upper bound of \(u_{it}\), \(B_t\), without incurring heavy costs in terms of analytic difficulty. For example, we may impose smoothness conditions on \(B_t\). Indeed, this is empirically plausible, since changes in competitive market conditions may come gradually. It is also technically desirable, since imposing smoothness conditions gives us more degrees of freedom in estimation, and hence better estimators of the model parameters. A natural way of doing this is to let \(B_t\) be a sum of weighted polynomials,

$$\displaystyle{ B_{t} =\sum _{ i=0}^{K}b_{ i}{(t/T)}^{i},\ \ t = 1,\ldots,T, }$$
(3.21)

where the \(b_i\) are constants. We may also use trigonometric series, splines, and other bases in the modeling of \(B_t\); a sketch of the polynomial case follows. For an extensive survey of efforts to generalize such heterogeneities in efficiencies see Sickles et al. (2013).
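A sketch of the polynomial bound path in (3.21); in practice the coefficients would be estimated jointly with the other model parameters:

```python
# Sketch: B_t = sum_i b_i (t/T)^i for t = 1, ..., T, as in (3.21).
import numpy as np

def bound_path(b, T):
    t = np.arange(1, T + 1) / T
    return sum(b_i * t**i for i, b_i in enumerate(b))

# example: a fifth-degree polynomial bound over 103 quarters (coefficients
# are illustrative, not estimates)
B_t = bound_path([0.5, -0.2, 0.1, 0.0, 0.05, -0.01], 103)
```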

3.6 Simulations

To examine the finite sample performance of the MLE of the doubly truncated normal model,Footnote 10 we run a series of Monte Carlo experiments in the standard cross-sectional setting. The data generating process is (3.1) and (3.2) with one regressor x and no constant term, and is based on the data generating process utilized in study 2 of Aigner, Lovell, and Schmidt. We maintain the assumption that \(v_i\) is iid \(N(0,\sigma_v^2)\), \(u_i\) is iid, and \(v_i\) and \(u_i\) are independent from each other and from the regressors. The number of repetitions is 1,000. Throughout we keep the technology coefficient α on the single regressor set at 0.6 and examine performance in terms of bias and mean absolute error as we vary each of the distributional parameters (σ, γ, μ, and B). As the SF benchmark we estimate the singly truncated normal model (Stevenson 1980) on the simulated data. We report average estimates and mean absolute errors (MAE) in Tables 3.3–3.6. Each of these sets of experiments selectively changes the distributional parameters; a sketch of the data generating process is given below. We draw the following conclusions from these experiments.Footnote 11
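A sketch of this data generating process (the uniform draw for the regressor is our assumption; the doubly truncated normal draws use simple rejection sampling):

```python
# Sketch: cross-sectional DGP for the Monte Carlo experiments.
import numpy as np

def simulate(n, alpha=0.6, mu=0.0, sigma=np.sqrt(2), gamma=0.95, B=0.5, seed=0):
    rng = np.random.default_rng(seed)
    su, sv = sigma * np.sqrt(gamma), sigma * np.sqrt(1 - gamma)
    x = rng.uniform(0.0, 1.0, n)          # regressor draw: an assumption
    v = rng.normal(0.0, sv, n)            # noise component
    u = np.empty(n)
    for i in range(n):                    # rejection sampling on [0, B]
        draw = rng.normal(mu, su)
        while not (0.0 <= draw <= B):
            draw = rng.normal(mu, su)
        u[i] = draw
    return x, alpha * x + v - u           # (regressor, log output)
```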

Table 3.3 Monte Carlo simulation of the doubly truncated normal model: on the dimension of σ. γ = 0.95, μ = 0, B = 0.5. The number of repetitions is 1,000. Mean absolute errors (MAE) are given in parentheses
Table 3.4 Monte Carlo simulation of the doubly truncated normal model: on the dimension of γ. \(\sigma = \sqrt{2}\), μ = 0, B = 0.5. The number of repetitions is 1,000. Mean absolute errors (MAE) are given in parentheses
Table 3.5 Monte Carlo simulation of the doubly truncated normal model: on the dimension of B. \(\sigma = \sqrt{2}\), γ = 0.95, μ = 0. The number of repetitions is 1,000. Mean absolute errors (MAE) are given in parentheses
Table 3.6 Monte Carlo simulation of the doubly truncated normal model: on the dimension of μ. \(\sigma = \sqrt{2}\), γ = 0.95, B = 0.5. The number of repetitions is 1,000. Mean absolute errors (MAE) are given in parentheses

First, all parameters in the doubly truncated normal model appear to be well estimated, with biases and MAE's that fall as sample sizes rise. The biases are generally small, and the MAE's of almost all estimates decrease at the \(\sqrt{N}\) rate as N increases, except that of \(\hat{\mu}\) in a couple of particular cases. More specifically, when σ is small (i.e., the variation in the composite error is small), \(\hat{\mu}\) does not converge at the optimal rate as N increases (see Table 3.3). The same happens when B is large (see Table 3.5). This observation is connected with the well-known difficulty of identifying μ in the singly truncated model (B → ∞) in finite samples. As is well known, the technological parameter α in the singly truncated normal model is consistently estimated. However, estimates of the distributional parameters in the singly truncated model are not well-defined in this setting, and thus we do not calculate the corresponding MAE's.

Second, Table 3.3 shows that as σ becomes smaller, the MAE of \(\hat{\alpha}\) is monotonically decreasing, while the MAE's of \(\hat{\sigma}\), \(\hat{\gamma}\), and \(\hat{\mu}\) are monotonically increasing. To reconcile the apparent divergence, note that the composite error \(\varepsilon\) is noise for the technological parameters, but signal for the distributional parameters. The effect of σ on the MAE of \(\hat{B}\) is ambiguous: the MAE first decreases and then increases as σ becomes smaller.

Third, if we mistakenly estimate a singly truncated model on a DGP with double truncation, we tend to underestimate the average technical efficiency (ATE). This is understandable since the singly truncated model may treat some extreme (negative) measurement errors as inefficiencies. Within the doubly truncated model, it is also clear that as B becomes larger, the ATE decreases (See Table 3.5). However, our simulation results show that the efficiency ranking would not be affected if we estimate a misspecified model.

Finally, as expected, MLE correctly estimates the doubly truncated normal model when the composite error has positive population skewness. This is evident in Table 3.6, where the third case (μ = 0.3, B = 0.5) corresponds to negative (positive) skewness in u (\(\varepsilon\)). In all cases, the double truncation in the DGP of u makes finite-sample positive skewness more probable, resulting in many zero \(\hat{\gamma}\)'s (super-efficiency) from the misspecified (singly truncated) model. Hence the average \(\hat{\gamma}\)'s in the misspecified model are generally much lower than the true value.

3.7 An Empirical Illustration to Analyze US Banking Industry Dynamics

3.7.1 Empirical Model and Data

We now apply the bounded inefficiency (BIE) model to an analysis of the US banking industry, which underwent a series of deregulatory reforms in the early 1980s and 1990s, and experienced an adverse economic environment in the last few turbulent years of the 2000s.Footnote 12 Our analysis covers the lengthy period between 1984 and 2009, and our illustration aims to use the panel variant of our BIE model to capture efficiency trends in the US banking sector during these years, as well as how the inefficiency bound changed as the market became more or less competitive vis-a-vis inefficient firms.

Following Adams et al. (1999) and Kneip et al. (2012), we specify a multi-output/multi-input stochastic output distance frontier model asFootnote 13

$$\displaystyle{ Y _{it} = Y _{it}^{{\ast}{\prime}}\gamma + X_{ it}^{{\prime}}\beta + v_{ it} - u_{it}, }$$
(3.22)

where \(Y_{it}\) is the log of real estate loans; \(X_{it}\) is the negative of the log of inputs, which include demand deposits (dd), time and savings deposits (dep), labor (lab), capital (cap), and purchased funds (purf).Footnote 14 \(Y_{it}^{\ast}\) includes the log of commercial and industrial loans/real estate loans (ciln) and installment loans/real estate loans (inln). In order to account for the riskiness and heterogeneity of the banks we include the log of the ratio of equity to total assets (eqrt), which usually measures the risk of insolvency of a bank in the banking literature.Footnote 15 The lower the ratio, the riskier a bank is considered. We assume the \(v_{it}\) are iid across i and t, and for each t, \(u_{it}\) has an upper bound \(B_t\). We can then treat this model as a generic panel data bounded inefficiency model as discussed in Sect. 3.5. Once the individual effects \(u_{it}\) are estimated, technical efficiency for a particular firm at time t is calculated as \(TE =\exp (u_{it} -\max _{1\leq j\leq N}u_{jt})\).
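A small sketch of the relative-efficiency normalization printed above, with u an (N, T) array of estimated individual effects; the array layout is our assumption.

```python
# Sketch: TE_it = exp(u_it - max_j u_jt), normalizing against the best
# firm in each period t.
import numpy as np

def technical_efficiency(u):
    return np.exp(u - u.max(axis=0, keepdims=True))
```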

The output distance function is known as a Young index (the ratio of the geometric mean of the outputs to the geometric mean of the inputs), described in Balk (2008), which leads to the Cobb-Douglas specification of the distance function introduced by Klein (1953). Although this functional form has been criticized for its separability and curvature properties, it remains a reasonable and parsimonious first-order local approximation to the true function (Coelli 2000), and we use it in our limited empirical illustration of the bounded stochastic frontier model. We use the parsimonious Cobb-Douglas model as well to allow comparisons with the results from our Monte Carlo simulations, which, due to the need to estimate highly nonlinear models, have been somewhat limited by computational and time constraints to a relatively simple linear-in-logs specification.Footnote 16 Translog distance function estimates, which one may view as more general, have their own attendant problems due to multicollinearity in the second order terms of the four-output/five-input technology. This typically is addressed by utilizing additional restrictions, such as those imposed by cost minimization or profit maximization, in order to empirically identify the translog parameters.Footnote 17 We do not use these side conditions to empirically identify the parameters because we use a stochastic frontier model that admits technical inefficiency but does not attempt to trace this inefficiency to its logical implication in the first order conditions of cost minimization or profit maximization (the so-called “Greene problem”, Kutlu 2013). Utilization of side conditions to address errors in the optimization of allocations is beyond the scope of this paper. That said, our translog estimates have provided qualitatively similar results, which are available on request.

We use US commercial banking data from the first quarter of 1984 through the third quarter of 2009. There are several ways in which data can be merged or deleted depending on whether or not banks continued as independent entities during the sample period we consider in our illustration of the insights gained by the bounded inefficiency model. One approach is to express the data for a bank on a pro-forma basis that goes back in time to account for mergers. For example, if a bank in 2008 is the result of a merger in 2008, then the pre-2008 data is merged on a pro-forma basis wherein the non-surviving bank's data is viewed as part of the surviving bank in earlier time periods. The Federal Reserve uses this approach in estimating risk measurement models, such as the Charge-off at Risk Model (Frye and Pelz 2008), which is the basis of risk dashboards used for centralized bank supervision. This sample design reflects methodologies used by banks in calibrating credit risk models, such as those used for Basel III and for the Comprehensive Capital Analysis and Review (CCAR).Footnote 18 An alternative to the retroactive merging in of legacy banks is to utilize an unbalanced design wherein banks simply attrit from the sample when their ownership changes. Although at first blush this would seem to address the problem of selection in cases when weaker banks get taken over, there are also many cases of mergers of equals (e.g., the JP Morgan and Bank One merger). Roughly 84 % of the banks in our sample ceased their operation due to reasons other than failure, such as merger or voluntary liquidation, or remained inactive, or were no longer regulated by the Federal Reserve. Almanidis and Sickles (2012) have proposed a general model that combines the mixture hazard model with the canonical stochastic frontier model to investigate the main determinants of the probability of, and time to, failure of a panel of US commercial banks during the financial distress that began in August of 2007. In their analysis they focused on bank failures, not on ownership changes or on changes in regulatory oversight that were not due to liquidation under financial distress. Unlike the standard hazard model, which would assume that all banks in the sample eventually experience the event (failure), the mixture hazard model distinguishes between healthy (long-term survivor) and at-risk banks. Almanidis and Sickles did not find that selection on banks per se impacted their estimates in any significant way. Moreover, their formal mixture hazard framework is far removed from the basic modeling issues addressed in this paper, namely the introduction of a different stochastic frontier paradigm that acknowledges a lower bound to inefficient firm operating practices. In order to maintain comparability between our results and those from many other studies using stochastic frontier analysis, and to find some middle ground between the pro-forma merging algorithm practiced by the Federal Reserve, the deletion of firms from the sample that attrit, and the potential misspecification due to the many potential ways (unobserved in our sample) in which such attrition may have occurred, we utilize a balanced panel and study only firms that have remained in business during our sample period.

The data is a balanced panel of 4,193 commercial banks and was compiled from the Consolidated Reports of Condition and Income (Call Report) and the FDIC Summary of Deposits. The data set includes 431,879 observations for 103 quarterly periods. This is a fairly long panel, and thus the assumption of time-invariant inefficiencies does not seem tenable. For this reason we compare the estimates from our BIE model to the estimates from other time-varying effects models such as the CSSW (the within variant of Cornwell et al. (1990)) and BC (Battese and Coelli 1992) models, along with the baseline fixed effects estimator (FIX) of Schmidt and Sickles (1984). Descriptive statistics for the bank-level variables are given in Table 3.7, where all nominal values are converted to reflect year-2000 values.

Table 3.7 Descriptive statistics for bank-specific variables

3.7.2 Results

Table 3.8 compares the parameter estimates of the bounded inefficiency (BIE) model with those of FIX, CSSW, and BC.Footnote 19 The structural parameters are statistically significant at the 1 % level and have the expected sign for all four models. The adjusted Bera and Premaratne (2001) skewness test statistic is calculated to be 990.26, leading to rejection of the null hypothesis of symmetry at any conventional significance level. The asymmetry of the least squares residuals is also verified by the quantile-quantile plot in Fig. 3.1. The technology parameters from the BIE model are somewhat different from those obtained from the other models. The negative value of the coefficient of eqrt implies that riskier firms tend to produce more loans, especially real estate loans, which are considered high risk. The positive sign of the estimate of the time trend shows technological progress on average. There is a slight difference between the distributional parameters of the BIE and BC models, which are also statistically significant at any conventional significance level. We also tested (not reported here) the other distributional specifications for BIE discussed above. The distributional parameters obtained from the normal-truncated half-normal model did not differ very much from those reported in the table, but those obtained from the normal-truncated exponential model did. However, this is not specific to bounded inefficiency models. Similar differences have been documented for unbounded SF models as well.

Table 3.8 Comparisons of various estimators. Estimates and standard errors (in parentheses) for the parameters of each of the competing models (FIX, CSSW, BC, BIE)
Fig. 3.1 Quantile-quantile plot

We also estimate the time-varying inefficiency bound, B, using two approaches. First we estimate the bound for the panel data model without imposing any restriction on its sample path. In the second approach we specify the bound as a sum of weighted time polynomials. We choose to fit a fifth-degree polynomial, the coefficients of which are estimated by MLE along with the rest of the parameters of the model.Footnote 20 Both approaches are illustrated in Fig. 3.2 with their respective 95 % confidence intervals. It can be seen that the inefficiency bound had a decreasing trend up to the year 2005, when the financial crisis (informally) began, and is then increasing for the remaining periods through the third quarter of 2009. One interpretation of this trend is that the deregulations of the 1980s and 1990s increased competitive pressures and forced many inefficient banks to exit the industry, reducing the upper limit of inefficiency that banks could sustain and still remain in their particular niche market in the larger banking industry. The new upward trend can be attributed to the adverse economic environment and an increase in the proportion of banks that are characterized as “too big to fail.”

Fig. 3.2 Estimated and smoothed inefficiency bound

Fig. 3.3 Averaged efficiencies from each estimator

Of course, for time-varying efficiency models such as CSSW, BC, and BIE, average efficiencies change over time.Footnote 21 These are illustrated in Fig. 3.3 along with their 95 % confidence bounds. The BIE averaged efficiencies (panel 4) are significantly higher than those obtained from the fixed effects time-invariant model. However, the differences are small compared to the BC and CSSW models. These small differences are not unexpected, however, since the existence of the inefficiency bound implies that the mean of the conditional distribution of inefficiencies is also bounded from above, resulting in higher average efficiencies. Failing to take the bound into account could possibly yield underestimated mean and individual efficiency scores (see Table 3.1). We smooth the BIE averaged efficiencies by fitting a ninth-degree polynomial in time in order to capture their trend and to be able to compare them with the other two time-varying averaged efficiency estimates. These are represented by the curve labeled BIEsmooth. It can be seen that the efficiency trend for the BIE model is in close agreement with the CSSW model and better reflects the deregulatory reforms and consolidation of the US commercial banking industry. It increases initially and then falls soon after the savings and loan (S&L) crisis of the early 1990s began. It continues to decrease and reaches its minimum in 1993, a year before Congress passed the Riegle-Neal Act, which allowed commercial banks to merge with and acquire banks across state lines. This spurred a new era of interstate banking and branching, which, along with the Gramm-Leach-Bliley Act that granted broad-based securities and insurance powers to commercial banks, substantially decreased the number of banks operating in the US from 10,453 in 1994 to 8,315 by the end of the millennium. After 1994 the banking industry witnessed a rapid increase in the averaged efficiencies of its institutions, due in part to the disappearance of inefficient banks previously sheltered from competitive pressure and due to the expansion of large banks that both financially and geographically diversified their products. The increasing trend continues until the new recessionary period of 2001 and then steadily falls thereafter until the rapid decline illustrating the effects of the 2007–2009 crisis. The CSSW model is able to show the weakness of the banking industry as early as 2005. This weakness is also illustrated by the estimated inefficiency bound from the BIE model. On the other hand, the BC model shows a slight, statistically non-significant, upward efficiency trend for all these periods (η = 0.0066).

In sum, Figs. 3.2 and 3.3 display an interesting pair of findings: on the one hand, an upward trend is observed for the average efficiency of the industry, presumably benefiting from the deregulations of the 1980s and 1990s; on the other hand, the industry appears to be more “tolerant” of less efficient banks in the last decade. Possibly, these banks have a characteristic that we have not properly controlled for, and we are currently examining this issue. Given the recent experiences in the credit markets, due in part to the poor oversight lending authorities exercised in their mortgage and other lending activities, our results may also be indicative of a backsliding in the toleration of inefficiency that could have contributed to the problems the financial services industry faces today.

3.8 Conclusions

In this paper we have introduced a series of parametric stochastic frontier models that place an upper (lower) bound on inefficiency (efficiency). The model parameters, including the inefficiency bound, can be estimated by maximum likelihood. The models are easily applicable in both cross-section and panel data settings. In the panel data setting, we allow the inefficiency bound to vary over time, hence contributing another time-varying efficiency model to the literature. We have examined the finite sample performance of the maximum likelihood estimator in the cross-sectional setting. We also have shown how the positive skewness problem inherent in the traditional stochastic frontier model can be avoided when the bound is taken into account. An empirical illustration of the new model focusing on the US banking industry revealed intuitively appealing trends in efficiency scores.