1 Introduction

One of the main objectives of productivity analysis is to estimate a representation of technology, using econometrics or data envelopment analysis (DEA), among other methods. It is well known that DEA assumes no functional form for the frontier, and linear programming allows one to impose linear constraints on each observation. Econometric methods, in contrast, usually require a functional form, and information of interest (e.g., marginal cost/product) is calculated after the parameters are estimated. Researchers should then check whether the estimated information satisfies the theoretical properties of the technology. With a large micro-level data set, however, it may well be the case that such properties are violated for a subset of observations. To address this issue, Terrell (1996), Rambaldi and Doran (1997), Ryan and Wales (2000), and Henningsen and Henning (2009), among others, proposed methods of imposing various constraints on parametric functions. Imposing such constraints on a nonparametric function, however, remains a challenge.

In this paper, we provide solutions to the two above-mentioned problems of econometric methods, i.e., inflexible functional forms and theoretical constraints that are difficult to impose. To address the first problem, we propose estimating a technology without a parametric functional form via nonparametric kernel methods. The functional form assumption is thus relaxed and the technology is estimated in a fully flexible manner. The price one has to pay for such flexibility, however, is that many observations may violate the properties of the technology. It is therefore desirable to impose constraints on the nonparametric regression function such that these properties are satisfied for each individual observation. To do this, we apply the constraint weighted bootstrapping (CWB) approach first introduced by Hall and Huang (2001) and further studied by Du et al. (2013) and Parmeter et al. (2013).

To explore the kernel and CWB methods, we first need to specify a technology as our main focus. In this paper, we choose an input distance function (IDF). Although the functional form of an IDF is generally unknown (Färe et al. 1994), many past and more recent studies have estimated parametric distance functions, specifying a translog functional form and using ordinary least squares (OLS) to estimate the unknown parameters (Lovell et al. 1994; Grosskopf et al. 1997; Ray 2003; Cuesta et al. 2009, among others). Färe et al. (1985, 1994) and Coelli and Perelman (1999), among others, used DEA to estimate the distance function without specifying any functional form. This paper fills the gap by estimating the IDF using nonparametric econometric methods without any functional form assumption. Furthermore, monotonicity constraints based on the properties of the IDF are imposed via CWB. Our methodology extends to other representations of technology in a straightforward manner. As a by-product of the econometric estimation, the first and second order analytical derivatives of the nonparametric IDF are derived, from which the elasticities measuring input substitutability/complementarity can be calculated.

As an empirical example, we apply the proposed methodology to a Norwegian forestry data set from Lien et al. (2007), compiled by Statistics Norway. Both the unconstrained and constrained nonparametric IDFs, as well as the implied elasticities, are estimated and the results are compared. We find that without imposing constraints, 18.25 % of the observations violate one of the theoretical properties of the IDF. The Kolmogorov–Smirnov test shows that the gradients and the elasticities calculated from the constrained IDF differ significantly from those calculated from the unconstrained counterpart. Finally, we report density plots of the estimated Antonelli, Morishima, and symmetric elasticities of complementarity.

The rest of the paper is organized as follows. Section 2 describes the constrained nonparametric estimation methodology. Section 3 applies the methodology to a real data set. Section 4 discusses limitations and possible extensions of the current method, and Sect. 5 concludes the paper.

2 Methodology

2.1 Nonparametric estimation of a distance function via kernel methods

The distance function representation of a production technology, proposed by Shephard (1953, 1970), does not require any aggregation, prices, or behavioral assumptions. Following Färe and Primont (1995), we first define the production technology of the firm using the input set, L(Y), which represents the set of all input vectors, \({X\in \mathbb{R}^{K}_{+}}\), that can produce the vector of Q outputs, \({Y\in \mathbb{R}^{Q}_{+}}\). That is:

$$ L(Y)=\{X\in {\mathbb{R}}^{K}_{+}: X\,\hbox{can produce}\,Y\}. $$
(1)

We assume that the technology satisfies the standard axiom of strong disposability. The IDF is then defined on the input set, L(Y), as:

$$ D(X,Y)=\max \{\rho: (X/\rho)\in L(Y)\}, $$
(2)

where ρ is the scalar distance by which the input vector can be deflated. D(X, Y) satisfies the following properties: (1) it is non-increasing in each output level; (2) it is non-decreasing in each input level; (3) it is homogeneous of degree 1 in X. It is based on an input-saving approach and gives the maximum amount by which an input vector can be radially contracted while still being able to produce the same output vector. The IDF, D(X, Y), takes a value greater than or equal to one if the input vector, X, is an element of the feasible input set, L(Y). That is, D(X, Y) ≥ 1 if \(X\in L(Y)\). The distance function takes a value of unity if X is located on the inner boundary of the input set.

To empirically estimate the distance function, we first define:

$$ D \equiv A\cdot D(X,Y), $$
(3)

where A is the productivity parameter, and X and Y are the input and output vectors, respectively. Using property (3) of the IDF, viz., homogeneity of degree 1 in X, we can write (3) as:

$$ D/X_1 = A \cdot D(\tilde{X}, Y), $$
(4)

where \(X_1\) is the numeraire input, and \(\tilde{X}\) is the vector of input ratios, with elements \(\tilde{X}_{k}=X_{k}/X_{1}, \forall k=2,{\ldots}, K\). Taking the natural logarithm of both sides gives:

$$ \ln D - \ln X_1 = \ln A + \ln D(\tilde{X}, Y). $$
(5)

Setting D = 1 gives:

$$ \begin{aligned} -\ln X_1 &= \ln D(\tilde X, Y) + \ln A\\ &= \ln D(\exp(\ln\tilde{X}),\exp (\ln Y)) + \ln A\\ &\equiv m(\ln\tilde{X},\ln Y) + v, \end{aligned} $$
(6)

where \(v=\ln A\) is the noise term, interpreted as the natural logarithm of the productivity parameter. Using more general notation, (6) can be written as:

$$ {\mathcal{Y}}=m(z)+v, $$
(7)

where \(\mathcal{Y}=-\ln X_{1}\), \(m(\cdot)\) is the unknown smooth distance function, z is the vector of continuous variables (i.e., \(\ln \tilde{X}_{k}, \forall k=2,{\ldots}, K;\,\ln Y_{q}, \forall q=1,{\ldots},Q\)), and v is the random error uncorrelated with any element of z.
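To make the transformation in (6)-(7) concrete, the following is a minimal data-construction sketch in R, assuming a hypothetical data frame df with output column Y and input columns X1 (the numeraire), X2, and X3; all names are illustrative:

```r
# Build the response and regressors of (7) from hypothetical columns
Ydep <- -log(df$X1)                        # response: -ln X_1
z <- data.frame(lx2 = log(df$X2 / df$X1),  # ln(X_2/X_1)
                lx3 = log(df$X3 / df$X1),  # ln(X_3/X_1)
                ly  = log(df$Y))           # ln Y
```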

To estimate the unknown function, one can use the local-constant least-squares estimator of m(z) (see Li and Racine 2006 for more details), given by:

$$ \hat{m}(z)={\frac{\sum_{i=1}^{n} K({\frac{z_i-z}{h}}) {\mathcal{Y}}_i}{\sum_{i=1}^{n} K({\frac{z_i-z}{h}})}}, $$
(8)

where \(K(\cdot)\) is a (scalar) Gaussian product kernel weighting function for the continuous variables (see “Appendix 2” for an explicit expression); n denotes the sample size; and h is a vector of bandwidths, with one element for each variable in the z vector.

Estimation of the bandwidths, h, is typically the most salient factor when performing nonparametric estimation. Although many selection methods exist, we utilize the data-driven least-squares cross validation (LSCV) method. Specifically, the bandwidths are chosen to minimize

$$ n^{-1} \sum_{i=1}^{n} [{\mathcal{Y}}_i - \hat{m}_{-i}(z_{i})]^{2}, $$
(9)

where \(\hat{m}_{-i}(\cdot) = {\frac{\sum_{j \neq i}^{n} K({\frac{z_j-z_i}{h}}) \mathcal{Y}_j}{\sum_{j \neq i}^{n} K({\frac{z_j-z_i}{h}})}}\) is the leave-one-out local-constant kernel estimator of \(m(\cdot)\). We use the npregbw function from the np package (Hayfield and Racine 2008) in R (R Development Core Team 2011) to estimate the bandwidth vector. This bandwidth vector is then plugged into (8) to estimate the IDF.
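In code, the bandwidth-selection and estimation steps can be sketched as follows, using the response Ydep and regressor frame z constructed above; this is a sketch of the workflow, not the author's exact code:

```r
library(np)
# LSCV bandwidths for the local-constant estimator in (8)
bw <- npregbw(xdat = z, ydat = Ydep, regtype = "lc", bwmethod = "cv.ls")
# Plug the bandwidths into (8); gradients = TRUE also returns the
# derivative estimates needed for (11)-(13)
fit <- npreg(bws = bw, gradients = TRUE)
```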

To calculate the derivatives of the distance function with respect to each input and output, (6) can be re-written as:

$$ \ln D = \ln X_{1} + m(\ln \tilde{X}_{2}, {\ldots}, \ln \tilde{X}_{K}, \ln Y_1, {\ldots}, \ln Y_Q) + v. $$
(10)

For example, the first partial derivatives of interest, which must be checked to verify the monotonicity properties of the IDF, are:

$$ {\frac{\partial \ln D}{\partial \ln {X}_1}} = 1-\sum_{k=2}^K {\frac{\partial m}{\partial \ln \tilde{X_{k}}}}, $$
(11)
$$ {\frac{\partial \ln D}{\partial \ln {X}_k}} = {\frac{\partial \ln D}{\partial \ln \tilde{X}_k}} = {\frac{\partial m}{\partial \ln \tilde{X_{k}}}}, \quad \forall k=2,{\ldots},K, $$
(12)

and

$$ {\frac{\partial \ln D}{\partial \ln {Y}_q}} = {\frac{\partial m}{\partial \ln Y_q}}, \quad \forall q=1,{\ldots},Q, $$
(13)

where the first partial derivative of \(m(\cdot)\) with respect to a particular argument, say, \(z_l \in \{\ln \tilde{X}_{2}, {\ldots}, \ln \tilde{X}_{K}, \ln Y_1, {\ldots}, \ln Y_Q\}\), is

$$ {\frac{\partial m}{\partial z_l}} = \sum_{i=1}^n {\frac{\left({\frac{z_{li}-z_l}{h_l^2}}\right)K(\cdot)\sum_i K(\cdot) - K(\cdot)\sum_i \left[\left({\frac{z_{li}-z_l}{h_l^2}}\right) K(\cdot)\right]}{(\sum_i K(\cdot))^2}} \cdot (-\ln X_{1i}). $$
(14)

See “Appendix 2” for detailed derivations of the first and second partial derivatives of \(m(\cdot)\) with respect to \(z_l\), and of the cross partial derivatives with respect to \(z_l\) and \(z_k\), ∀l ≠ k.
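For readers who prefer to evaluate (14) directly, the following sketch implements the Gaussian product kernel and the quotient rule behind it, assuming Z is the n × (K + Q − 1) regressor matrix, y the response vector, z0 the evaluation point, h the bandwidth vector, and l the index of the argument of interest (all names illustrative):

```r
# Gaussian product kernel weights K((z_i - z0)/h) for all observations i
kw <- function(Z, z0, h) {
  apply(dnorm(sweep(sweep(Z, 2, z0), 2, h, "/")), 1, prod)
}

# First partial derivative of the local-constant estimator, eq. (14)
grad_lc <- function(Z, y, z0, h, l) {
  K  <- kw(Z, z0, h)                    # kernel weights K_i
  dK <- K * (Z[, l] - z0[l]) / h[l]^2   # dK_i/dz_l for a Gaussian kernel
  S  <- sum(K)
  (sum(dK * y) * S - sum(K * y) * sum(dK)) / S^2
}
```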

2.2 Imposition of regularity constraints

Recall that the IDF has the following theoretical properties:

$$ {\frac{\partial \ln D}{\partial \ln X_k}} \geq 0, \quad \forall k=1,{\ldots},K $$
(15)

and

$$ {\frac{\partial \ln D}{\partial \ln Y_q}} \leq 0, \quad \forall q=1,{\ldots},Q. $$
(16)

In empirical estimation, however, violations of these properties are likely to occur for some individual observations. Most empirical researchers check these regularity conditions at the mean of the data instead of at every data point, and report results evaluated at the mean. This practice defeats the purpose of using micro data: results may not be of much use for policy analysis if the theoretical restrictions are violated for many individual producers. Instead of ignoring results that violate rationality, we use a statistical method that imposes these economic constraints, and then calculate the gradients of the IDF and the elasticities from estimates for which all these constraints are jointly satisfied.
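Checking the conditions at every data point is straightforward once the gradients are available. A minimal sketch, assuming the npreg fit from Sect. 2.1 in the three-input, one-output case, with gradient columns ordered as \(\ln\tilde X_2, \ln\tilde X_3, \ln Y\) (the column ordering is an assumption):

```r
# Share of observations violating the monotonicity properties
g <- gradients(fit)            # n x 3 gradient matrix of m(.)
mean(1 - g[, 1] - g[, 2] < 0)  # violations of (15) for the numeraire, via (11)
mean(g[, 1] < 0)               # violations of (15), k = 2
mean(g[, 2] < 0)               # violations of (15), k = 3
mean(g[, 3] > 0)               # violations of (16)
```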

To impose such observation-specific inequality constraints, we follow the constraint weighted bootstrapping (CWB) method first proposed by Hall and Huang (2001) and further studied by Du et al. (2013) and Parmeter et al. (2013), whose idea is to transform the response variable by assigning observation-specific weights such that certain constraints in the model are satisfied. To illustrate this methodology, let \({\{\mathcal{Y}_{i},z_{i}\}^{n}_{i=1}}\) denote sample pairs of response and explanatory variables, where \(\mathcal{Y}_{i}\) is a scalar, \(z_{i}\) is of dimension (K + Q − 1), and n denotes the sample size. The goal is to estimate the conditional mean model \(\mathcal{Y}=m(z)+v\), subject to constraints on the first order gradients \(m_l(z) = \partial m(z)/\partial z_l\), where \(z_l\) is the l-th element of the vector z, or on a linear combination of any of the first order gradients.

We can express the local-constant estimator as:

$$ \hat{m}(z)=n\cdot\sum^{n}_{i=1}A_{i}(z)n^{-1}{\mathcal{Y}}_{i}, $$
(17)

where \(A_{i}(z)={\frac{K({\frac{z_i-z}{h}})}{\sum_{i=1}^{n} K({\frac{z_i-z}{h}})}}\), and \(K(\cdot)\) and h are defined as in (8). The first order gradient of the local-constant estimator, \(\hat m_l(z)\), can be expressed as:

$$ \hat{m}_l(z)=n\cdot\sum^{n}_{i=1}A_{i,l}(z)n^{-1}{\mathcal{Y}}_{i}, $$
(18)

where \(A_{i,l}(z)={\frac{\partial A_i(z)}{\partial z_l}}\) (see “Appendix 2” for an explicit expression of the derivative of \(A_i(z)\) with respect to \(z_l\)). Therefore, a particular linear combination of these first order gradients is:

$$ 1-\sum_{l=1}^{K-1} \hat{m}_l(z) = 1 - \sum_{l=1}^{K-1} \left( n\cdot\sum^{n}_{i=1}A_{i,l}(z)n^{-1}{\mathcal{Y}}_{i} \right), $$
(19)

which is used to impose constraints in the form of (11).
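Because the constraints below are evaluated at every sample point, it is convenient to precompute, for each argument l, the n × n matrix whose (j, i) entry is \(A_{i,l}(z_j)\). A sketch using the kernel helper kw() from Sect. 2.1 (names illustrative):

```r
# n x n matrix M with M[j, i] = A_{i,l}(z_j): the derivative of the
# local-constant weight A_i(z) with respect to z_l, evaluated at z_j
Ail <- function(Z, h, l) {
  n <- nrow(Z)
  M <- matrix(0, n, n)
  for (j in 1:n) {
    K  <- kw(Z, Z[j, ], h)                 # kernel weights at z_j
    dK <- K * (Z[, l] - Z[j, l]) / h[l]^2  # dK_i/dz_l
    S  <- sum(K)
    M[j, ] <- (dK * S - K * sum(dK)) / S^2 # quotient rule on A_i(z)
  }
  M
}
```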

To impose the monotonicity constraints, re-write (18) and (19) as:

$$ \hat{m}_l(z|p)=n\cdot\sum^{n}_{i=1}A_{i,l}(z)p_i{\mathcal{Y}}_{i}, \; \forall l=1,{\ldots},K+Q-1, \hbox{ and} $$
(20)
$$ 1-\sum_{l=1}^{K-1} \hat{m}_l(z|p)= 1- \sum_{l=1}^{K-1} \left( n\cdot\sum^{n}_{i=1}A_{i,l}(z)p_i{\mathcal{Y}}_{i} \right), $$
(21)

where \(p_i\) is the weight for the ith observation of the response \(\mathcal{Y}\), and \(\sum_{i=1}^{n} p_i = 1\). If \(p_i = 1/n\) (i.e., uniform weights), then the constrained estimator reduces to the unconstrained estimator.

The goal is to transform the response as little as possible through the weights such that the constraints are satisfied. The following is the weight selection criterion proposed by Du et al. (2013) and Parmeter et al. (2013):

$$ \begin{aligned} &p^{*} = \mathop{\hbox{argmin}}_{p}\, {\mathcal{D}}(p)=(p_{u}-p)^{\prime}(p_{u}-p)\\ &\quad\hbox{s.t.}\;{\bf l}(z) \leq \hat{m}_l(z \mid p) \leq {\bf u}(z), \quad \forall l=1,{\ldots}, K+Q-1, \hbox{ and}\\ &\quad\quad {\bf l}(z) \leq 1 - \sum_{l=1}^{K-1} \hat{m}_l(z \mid p) \leq {\bf u}(z), \end{aligned} $$
(22)

where \(p^*\) is the vector of optimal weights for the response observations, \(\mathcal{D}(p)\) is an \(L_2\) metric, and \(p_u\) is a vector of uniform weights (i.e., 1/n), which can also be viewed as the initial search point; \({\bf l}(z)\) and \({\bf u}(z)\) represent observation-specific lower and upper bounds for \(\hat{m}_l(z \mid p)\), respectively. Setting \({\bf l}(z) = 0\) and \({\bf u}(z)=+\infty\) imposes monotonically increasing constraints; setting \({\bf u}(z) = 0\) and \({\bf l}(z)=-\infty\) imposes monotonically decreasing constraints. The optimization problem (22) is a standard quadratic programming problem that can be solved numerically using the quadprog package (Turlach and Weingessel 2011) in R. The constrained estimator \(\hat m(z \mid p^*) = n \sum_{i=1}^n A_i(z) p_i^* \mathcal{Y}_{i}\) can then be calculated using the optimal weight for each observation, \(p_i^*\). Following Du et al. (2013) and Parmeter et al. (2013), the same bandwidth vector estimated from the unconstrained IDF is used to estimate the constrained IDF, as the same \(A_i(z)\) appears in both the unconstrained and the constrained estimators. The code for estimating the constrained model is available from the author upon request.
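The following sketch shows how (22) maps into solve.QP from quadprog, which minimizes \((1/2)b'Db - d'b\) subject to \(A'b \geq b_0\), with the first meq constraints treated as equalities. Here Al is a list of the \(A_{i,l}(z_j)\) matrices built above, and input_idx and output_idx index the gradients constrained to be non-negative and non-positive; this is a sketch under those naming assumptions, not the author's actual code:

```r
library(quadprog)

cwb_weights <- function(Al, y, input_idx, output_idx) {
  n  <- length(y)
  pu <- rep(1 / n, n)              # uniform weights p_u
  Dmat <- 2 * diag(n)              # (p_u - p)'(p_u - p) = p'p - 2 p_u'p + const
  dvec <- 2 * pu
  Gl <- lapply(Al, function(A) n * sweep(A, 2, y, "*"))    # G %*% p = m_l(z|p)
  C  <- rbind(matrix(1, 1, n),                             # sum(p) = 1
              do.call(rbind, Gl[input_idx]),               # m_l(z|p) >= 0
              do.call(rbind, lapply(Gl[output_idx], `-`)), # m_l(z|p) <= 0
              -Reduce(`+`, Gl[input_idx]))                 # 1 - sum_l m_l >= 0
  b0 <- c(1,
          rep(0, n * (length(input_idx) + length(output_idx))),
          rep(-1, n))
  sol <- solve.QP(Dmat, dvec, Amat = t(C), bvec = b0, meq = 1)
  sol$solution                     # optimal weights p*
}
```

For the three-input, one-output IDF above, input_idx = c(1, 2) (the two input ratios) and output_idx = 3; the constrained fit then follows by plugging the returned weights into (20).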

2.3 Elasticities from the distance function

After the first, second, and cross partial derivatives of the IDF satisfying theoretical restrictions are estimated, three types of elasticities measuring input substitutability/complementarity can be computed without any additional information (Stern 2011), viz., the Antonelli elasticity of complementarity (AEC), the Morishima elasticity of complementarity (MEC), and the Symmetric elasticity of complementarity (SEC). The formulas are provided here for convenience:

$$ AEC_{kl} = {\frac{D \cdot D_{kl}}{D_k \cdot D_l}}, $$
(23)
$$ MEC_{kl} = {\frac{D_{kl}\cdot X_{l}}{D_{k}}}-{\frac{D_{ll}\cdot X_{l}}{D_{l}}}, $$
(24)

and

$$ SEC_{kl} = \frac{{\frac{-D_{kk}}{D_k D_k}}+2{\frac{D_{kl}}{D_k D_l}}-{\frac{D_{ll}}{D_l D_l}}}{\frac{1}{D_k X_k}+\frac{1}{D_l X_l}}, $$
(25)

\(\forall k,l=1,{\ldots},K\), where \(D_k\) (\(D_{kk}\)) and \(D_l\) (\(D_{ll}\)) are the first (second) partial derivatives of the IDF with respect to the kth and lth input, respectively, and \(D_{kl}\) is the cross partial derivative with respect to the kth and lth inputs. Both the AEC and the SEC are symmetric: they give the same elasticity estimate regardless of which input causes the change. The MEC is not symmetric, for the reasons given in Blackorby and Russell (1981). If the MEC > 0, the two inputs are complements in the Morishima sense; a similar interpretation applies to the AEC and the SEC. For comparison purposes, in the application section the elasticities are calculated from the IDF both with and without the theoretical restrictions.
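Given the estimated distance value and its derivatives at an observation, (23)-(25) are direct to compute. A sketch, where D is the distance value, g the gradient vector (g[k] = \(D_k\)), H the Hessian (H[k, l] = \(D_{kl}\)), and X the input vector (all illustrative names):

```r
# Antonelli, Morishima, and symmetric elasticities for inputs k and l
elasticities <- function(D, g, H, X, k, l) {
  aec <- D * H[k, l] / (g[k] * g[l])                    # eq. (23)
  mec <- H[k, l] * X[l] / g[k] - H[l, l] * X[l] / g[l]  # eq. (24)
  sec <- (-H[k, k] / g[k]^2 + 2 * H[k, l] / (g[k] * g[l])
          - H[l, l] / g[l]^2) /
         (1 / (g[k] * X[k]) + 1 / (g[l] * X[l]))        # eq. (25)
  c(AEC = aec, MEC = mec, SEC = sec)
}
```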

3 Application

As an empirical illustration of the proposed methodology, we use a cross-sectional data set of 3,249 active forest owners (i.e., owners who harvest trees) for the year 2003, compiled by Statistics Norway. According to Statistics Norway, the value added in Norwegian forestry was estimated at Norwegian Krone (NOK) 5.4 billion in 2011. Timber sales are the largest component of Norwegian forestry. From 2001 to 2011, 36 % of forest properties sold timber. In addition, forest owners can earn income from selling hunting and fishing rights, leasing out sites, and renting out cabins. The Ministry of Agriculture and Food of Norway (2007) reported that approximately 88 % of the forest area is privately owned, and the majority of forest holdings are farm and family forests. During the last 80 years, the timber stock has increased because annual timber growth has been considerably faster than the annual harvest.

Since this data set was used in Lien et al. (2007), where a detailed description of the sampling method is available, only a brief description is given here. The output variable (Y) is annual timber sales from the forest, measured in cubic meters. The labor input variable (\(X_1\)) is the sum of hours worked by contractors and hours worked by the owner, his family, or hired labor. The land input variable (\(X_2\)) measures the forest area to be cut, in hectares. The capital input variable (\(X_3\)) is the amount of timber stock that can be cut without affecting future harvesting. Table 1 presents summary statistics for the sample.

Table 1 Summary statistics of the variables

The estimation results are given in Tables 2 and 3 and Figs. 1, 2, 3, and 4. Table 2 reports the estimated bandwidth vector and the percentages of violations of the monotonicity properties of the IDF. Table 3 presents the Kolmogorov–Smirnov test results for equality of distributions between the information estimated from the unconstrained and constrained models. Figure 1 reports the histogram of observation-specific weights used to impose the monotonicity constraints. Figures 2, 3, and 4 plot the kernel densities of the gradient and elasticity estimates under the unconstrained and constrained models.

Table 2 Bandwidths and percentages of violations
Table 3 Testing for equality of distributions from the unconstrained and constrained models
Fig. 1 Observation-specific weights

Fig. 2 Kernel density plots of the gradients of the IDF: unconstrained versus constrained

It can be seen from Table 2 that the bandwidth estimate for each regressor is small enough (i.e., less than twice the standard deviation of the corresponding regressor) to indicate nonlinearity of the regression function, and hence the appropriateness of the nonparametric approach. Using these bandwidth estimates, although almost no violations of economic theory occur for the gradients of \(\ln X_1\) and \(\ln Y\), there are violations for 18.25 and 7.08 % of the observations for the gradients of \(\ln {X}_2\) and \(\ln {X}_3\), respectively. This suggests that imposing the economic constraints (15) and (16) is not a trivial matter.

The constraint weighted bootstrapping (CWB) method is then used to impose these constraints. Figure 1 plots the distribution of \(p_i^*\), the optimal weight for each observation suggested by CWB. It can be seen that most observations share similar weights, and these optimal weights are quite close to the uniform weights, 1/n = 1/3,249 ≈ \(3 \times 10^{-4}\). After the response variable is transformed by these weights, the gradients and the elasticities can be estimated under the constraints.

Figure 2 shows the kernel density estimates of the unconstrained and constrained distributions of the four gradients, i.e., \(\partial \ln D/\partial \ln X_k, \forall k=1,2,3\), and \(\partial \ln D/\partial \ln Y\). For each observation, a non-negativity constraint is imposed on the first three gradients and a non-positivity constraint on the last. A vertical line is drawn at zero and the shaded area highlights where the violations occur. The densities of the constrained gradients are plotted using the Silverman reflection method for boundary correction, so that the estimated densities integrate to one. There are masses near zero under the constrained model for the gradients with many violations. Although very few violations are observed for the gradient of \(\ln X_1\), its unconstrained and constrained distributions are not close to each other. This is because the gradient of the first input is essentially a linear combination of the gradients of the other two inputs, whose non-trivial violations affect the constrained gradient of the first input.
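The boundary correction mentioned above is simple to reproduce: reflect the constrained estimates around zero, estimate the density of the pooled sample, and keep twice the half on the admissible side. A minimal sketch in base R for a gradient bounded below at zero:

```r
# Silverman reflection estimator on [0, Inf)
reflect_density <- function(x) {
  d <- density(c(x, -x))                  # pool the data with its mirror image
  keep <- d$x >= 0
  list(x = d$x[keep], y = 2 * d$y[keep])  # double the mass so it integrates to one
}
```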

We also plot the kernel density estimates of the elasticities under the unconstrained and constrained models in Figs. 3 and 4. In most cases, the constrained elasticities have smaller variation than their unconstrained counterparts, which suggests that estimating an IDF satisfying its theoretical properties may improve the efficiency of the elasticity estimates. A vertical line is drawn at zero for a better view of the percentage of observations whose inputs are substitutes/complements. For example, if the MEC between two inputs is greater (less) than zero, then these inputs are Morishima complements (substitutes). In both figures, however, the elasticity estimates from the unconstrained and constrained models appear quite close to each other. To test whether the distributions in fact differ, Table 3 reports the p values from the Kolmogorov–Smirnov test for equality of distributions of the gradients and elasticities estimated from the unconstrained and constrained models. The null of equality of distributions is rejected at the 5 % level in all cases except the AEC estimates between labor and capital.
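The test behind Table 3 is the standard two-sample Kolmogorov–Smirnov test, available in base R; for one gradient, with illustrative vector names:

```r
# Two-sample KS test: unconstrained versus constrained estimates
ks.test(grad_unconstrained, grad_constrained)$p.value  # small p: distributions differ
```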

Fig. 3 Kernel density plots of the nonparametric MEC: unconstrained versus constrained

Fig. 4 Kernel density plots of the nonparametric AEC and SEC: unconstrained versus constrained

4 Discussion

4.1 Choice of numeraire input

When the IDF is estimated with a parametric functional form, e.g., Cobb–Douglas or translog, the choice of the normalizing input does not affect the results. However, this is generally not the case for nonparametric estimation of the IDF. The empirical example chooses the labor input as the numeraire when estimating the IDF, which assumes that the labor input is endogenous. To see how the results change when other inputs (i.e., land or capital) are used for normalization, we report the unconstrained and constrained gradient estimates in Figs. 5 and 6 in “Appendix 3”, which use the land and capital inputs as the numeraire, respectively. The elasticity plots are omitted to save space. Although it is recommended that researchers choose an input that is endogenous a priori, the question of how to find the most appropriate numeraire is left for future research.

Fig. 5 Kernel density plots of the gradients of the IDF: unconstrained versus constrained. Land input is chosen as the numeraire input

Fig. 6 Kernel density plots of the gradients of the IDF: unconstrained versus constrained. Capital input is chosen as the numeraire input

4.2 Extensions to other representations of technology

The estimation procedure in Sect. 2 can be easily extended to other specifications of technology. We provide two examples here: a production and a cost function.

The production function can be written as \(y=B\cdot F(X)\), where y is a scalar output, B is the productivity parameter, \(F(\cdot)\) is an unknown function, and X is an input vector. Applying a log transformation similar to (6) gives an estimable production function: \(\ln y=f(\ln X) + u\), where \(u=\ln B\) is the error term. The unknown function \(f(\cdot)\) can be estimated nonparametrically via kernel methods, and desirable constraints can be imposed on it, e.g., \(\partial \ln y / \partial \ln X \geq 0\) as a non-negative marginal product constraint.

The cost function can be written as C = λC(W, Y), where C is the total cost, λ is the productivity parameter, \(C(\cdot)\) is an unknown function, and W and Y are the input price and output vectors, respectively. Applying the log transformation and imposing homogeneity of degree one in input prices gives an estimable cost function: \(\ln \tilde C=c(\ln \tilde W, \ln Y) + u\), where \(\tilde{C}\) is the total cost divided by a numeraire input price, \(\tilde{W}\) is the input price vector divided by the numeraire input price, and \(u=\ln \lambda\) is the error term. The cost function \(c(\cdot)\) can be estimated using kernel methods, and the necessary regularity constraints can be imposed such that (1) cost shares lie between zero and one: \(0 \leq \partial \ln \tilde C/ \partial \ln \tilde W \leq 1\), and (2) marginal cost is non-negative: \(\partial \ln \tilde C / \partial \ln Y \geq 0\).
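The same estimation pipeline then applies after the corresponding transformation. A brief sketch, assuming hypothetical columns C (total cost), W1, W2, W3 (input prices), and Y (output), with W1 as the numeraire price:

```r
# Normalized cost and price ratios for the estimable cost function
lC <- log(df$C / df$W1)                     # ln(C/W_1): homogeneity imposed
zc <- data.frame(lw2 = log(df$W2 / df$W1),  # ln(W_2/W_1)
                 lw3 = log(df$W3 / df$W1),  # ln(W_3/W_1)
                 ly  = log(df$Y))           # ln Y
# estimate c(.) by kernel regression of lC on zc, then impose the share
# and marginal-cost constraints via the CWB step of Sect. 2.2
```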

After the regression functions are estimated under the constraints, different elasticities can then be calculated subject to the specification of choice and data availability. See Stern (2011) for a classification scheme of different definitions of elasticities based on primal and dual representations of technology.

4.3 Possibly endogenous regressors

It is well known that estimation of production/cost/distance functions may be subject to endogeneity problems that cause estimation results to be biased and inconsistent. Unfortunately, nonparametric instrumental variable (IV) estimation is still a relatively young field; see Su and Ullah (2008) for a three-step estimation procedure for nonparametric simultaneous equations models via kernel methods, and Newey and Powell (2003) for IV estimation via series approximation, among others. It is unclear whether the CWB procedure can be seamlessly applied to nonparametric structural models; this is left for future research.

5 Conclusion

This paper uses econometric methods to estimate an input distance function (IDF) without functional form assumptions, and imposes economic properties of the IDF on the estimated regression function via constraint weighted bootstrapping (CWB). As a by-product, the first, second, and cross partial analytical derivatives of the estimated IDF are derived, and thus various elasticities can be computed. Applying the proposed method to a cross-section of Norwegian forest owners, we find that imposing the constraints eliminates the problem of economic violations in empirical work, and therefore policy implications may be more reliable. The proposed method can be extended to other representations of technology in a straightforward manner, and this opens the door for further empirical work to estimate models subject to economic theory. As a future research topic, more work should be done on the unification of CWB and the nonparametric structural modeling approach.