1 Introduction

One of the main objectives of productivity analysis is to estimate a representation of technology, using econometrics or data envelopment analysis (DEA), among other methods. It is well known that DEA assumes no functional form for the frontier, and linear programming allows one to impose linear constraints on each observation. Econometric methods, in contrast, usually require a functional form, and information of interest (e.g., marginal cost/product) is calculated after the parameters are estimated. Researchers should then check whether the estimated information satisfies the theoretical properties of the technology. With a large micro-level data set, however, it may well be the case that such properties are violated for a subset of observations. To address this issue, Terrell (1996), Rambaldi and Doran (1997), Ryan and Wales (2000), and Henningsen and Henning (2009), among others, proposed methods of imposing various constraints on parametric functions. Imposing such constraints on a nonparametric function, however, remains a challenge.

In this paper, we provide solutions to the two above-mentioned problems of econometric methods, i.e., inflexible functional forms and theoretical constraints that are difficult to impose. To address the first problem, we propose estimating a technology without a parametric functional form via nonparametric kernel methods. The functional form assumption is thus relaxed and the technology is estimated in a fully flexible manner. The price one has to pay for such flexibility, however, is that many observations may violate the properties of the technology. It is therefore desirable to impose constraints on the nonparametric regression function such that these properties are satisfied for each individual observation. To do this, we apply the constraint weighted bootstrapping (CWB) approach first introduced by Hall and Huang (2001) and further studied by Du et al. (2013) and Parmeter et al. (2013).

To explore the kernel and CWB methods, we first need to specify a technology as our main focus. In this paper, we choose an input distance function (IDF). Although the functional form of an IDF is generally unknown (Färe et al. 1994), many past and more recent studies have estimated parametric distance functions, specifying a translog functional form and using ordinary least squares (OLS) to estimate the unknown parameters (Lovell et al. 1994; Grosskopf et al. 1997; Ray 2003; Cuesta et al. 2009, among others). Färe et al. (1985, 1994) and Coelli and Perelman (1999), among others, used DEA to estimate the distance function without specifying any functional form. This paper fills the gap by estimating the IDF using nonparametric econometric methods without any functional form assumption. Furthermore, monotonicity constraints based on the properties of the IDF are imposed via CWB. Our methodology extends to other representations of technology in a straightforward manner. As a by-product of the econometric estimation, the first and second order analytical derivatives of the nonparametric IDF are derived, from which the elasticities measuring input substitutability/complementarity can be calculated.

As an empirical example, we apply the proposed methodology to a Norwegian forestry data set from Lien et al. (2007), compiled by Statistics Norway. Both the unconstrained and constrained nonparametric IDFs, as well as the implied elasticities, are estimated and the results are compared. We find that without imposing constraints, 18.25 % of the observations violate one of the theoretical properties of the IDF. The Kolmogorov–Smirnov test shows that the gradients and the elasticities calculated from the constrained IDF differ significantly from those calculated from the unconstrained counterpart. Finally, we report density plots of the estimated Antonelli, Morishima, and symmetric elasticities of complementarity.

The rest of the paper is organized as follows. Section 2 describes the constrained nonparametric estimation methodology. Section 3 applies the methodology to a real data set. Section 4 discusses limitations and possible extensions of the current method, and Sect. 5 concludes the paper.

2 Methodology

2.1 Nonparametric estimation of a distance function via kernel methods

The distance function representation of a production technology, proposed by Shephard (1953, 1970), does not require any aggregation, prices, or behavioral assumptions. Following Färe and Primont (1995), we first define the production technology of the firm using the input set, L(Y), which represents the set of all input vectors, \({X\in \mathbb{R}^{K}_{+}}\), that can produce the vector of Q outputs, \({Y\in \mathbb{R}^{Q}_{+}}\). That is:

$$ L(Y)=\{X\in {\mathbb{R}}^{K}_{+}: X\,\hbox{can produce}\,Y\}. $$
(1)

We assume that the technology satisfies the standard axiom of strong disposability. The IDF is then defined on the input set, L(Y), as:

$$ D(X,Y)=\max \{\rho: (X/\rho)\in L(Y)\}, $$
(2)

where ρ is the scalar distance by which the input vector can be deflated. D(X, Y) satisfies the following properties: (1) it is non-increasing in each output level; (2) it is non-decreasing in each input level; (3) it is homogeneous of degree 1 in X. It is based on an input-saving approach and gives the maximum amount by which an input vector can be radially contracted while still being able to produce the same output vector. The IDF, D(X, Y), takes a value greater than or equal to one if the input vector, X, is an element of the feasible input set, L(Y). That is, D(X, Y) ≥ 1 if \(X\in L(Y)\). The distance function takes a value of unity if X is located on the inner boundary of the input set.

To empirically estimate the distance function, we first define:

$$ D \equiv A\cdot D(X,Y), $$
(3)

where A is the productivity parameter, and X and Y are the input and output vectors, respectively. Using property (3) of the IDF, viz., homogeneity of degree 1 in X, we can write (3) as:

$$ D/X_1 = A \cdot D(\tilde{X}, Y), $$
(4)

where \(X_1\) is the numeraire input, and \(\tilde{X}\) is the vector of input ratios, with elements \(\tilde{X}_{k}=X_{k}/X_{1}, \forall k=2,{\ldots}, K\). Taking the natural logarithm of both sides gives:

$$ \ln D - \ln X_1 = \ln A + \ln D(\tilde{X}, Y). $$
(5)

Setting D = 1 gives:

$$ \begin{aligned} -\ln X_1 &= \ln D(\tilde X, Y) + \ln A\\ &= \ln D(\exp(\ln\tilde{X}),\exp (\ln Y)) + \ln A\\ &\equiv m(\ln\tilde{X},\ln Y) + v, \end{aligned} $$
(6)

where \(v=\ln A\) is the noise term, interpreted as the natural logarithm of the productivity parameter. Using more general notation, (6) can be written as:

$$ {\mathcal{Y}}=m(z)+v, $$
(7)

where \(\mathcal{Y}=-\ln X_{1}\), \(m(\cdot)\) is the unknown smooth distance function, z is the vector of continuous variables (i.e., \(\ln \tilde{X}_{k}, \forall k=2,{\ldots}, K;\,\ln Y_{q}, \forall q=1,{\ldots},Q\)), and v is the random error uncorrelated with any element of z.
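To make the transformation in (6)-(7) concrete, the following is a minimal data-construction sketch in R, assuming a hypothetical data frame df with output column Y and input columns X1 (the numeraire), X2, and X3; all names are illustrative:

```r
# Build the response and regressors of (7) from hypothetical columns
Ydep <- -log(df$X1)                        # response: -ln X_1
z <- data.frame(lx2 = log(df$X2 / df$X1),  # ln(X_2/X_1)
                lx3 = log(df$X3 / df$X1),  # ln(X_3/X_1)
                ly  = log(df$Y))           # ln Y
```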

To estimate the unknown function, one can use the local-constant least-squares estimator of m(z) (see Li and Racine 2006 for more details), given by:

$$ \hat{m}(z)={\frac{\sum_{i=1}^{n} K({\frac{z_i-z}{h}}) {\mathcal{Y}}_i}{\sum_{i=1}^{n} K({\frac{z_i-z}{h}})}}, $$
(8)

where \(K(\cdot)\) is a (scalar) Gaussian product kernel weighting function for the continuous variables (see “Appendix 2” for an explicit expression); n denotes the sample size; and h is a vector of bandwidths, with one element for each variable in the z vector.

Estimation of the bandwidths, h, is typically the most salient factor when performing nonparametric estimation. Although many selection methods exist, we utilize the data-driven least-squares cross validation (LSCV) method. Specifically, the bandwidths are chosen to minimize

$$ n^{-1} \sum_{i=1}^{n} [{\mathcal{Y}}_i - \hat{m}_{-i}(z_{i})]^{2}, $$
(9)

where \(\hat{m}_{-i}(\cdot) = {\frac{\sum_{j \neq i}^{n} K({\frac{z_j-z_i}{h}}) \mathcal{Y}_j}{\sum_{j \neq i}^{n} K({\frac{z_j-z_i}{h}})}}\) is the leave-one-out local-constant kernel estimator of \(m(\cdot)\). We use the npregbw function from the np package (Hayfield and Racine 2008) in R (R Development Core Team 2011) to estimate the bandwidth vector. This bandwidth vector is then plugged into (8) to estimate the IDF.
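In code, the bandwidth-selection and estimation steps can be sketched as follows, using the response Ydep and regressor frame z constructed above; this is a sketch of the workflow, not the author's exact code:

```r
library(np)
# LSCV bandwidths for the local-constant estimator in (8)
bw <- npregbw(xdat = z, ydat = Ydep, regtype = "lc", bwmethod = "cv.ls")
# Plug the bandwidths into (8); gradients = TRUE also returns the
# derivative estimates needed for (11)-(13)
fit <- npreg(bws = bw, gradients = TRUE)
```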

To calculate the derivatives of the distance function with respect to each input and output, (6) can be re-written as:

$$ \ln D = \ln X_{1} + m(\ln \tilde{X}_{2}, {\ldots}, \ln \tilde{X}_{K}, \ln Y_1, {\ldots}, \ln Y_Q) + v. $$
(10)

For example, the first partial derivatives of interest, which must be checked to verify the monotonicity properties of the IDF, are:

$$ {\frac{\partial \ln D}{\partial \ln {X}_1}} = 1-\sum_{k=2}^K {\frac{\partial m}{\partial \ln \tilde{X_{k}}}}, $$
(11)
$$ {\frac{\partial \ln D}{\partial \ln {X}_k}} = {\frac{\partial \ln D}{\partial \ln \tilde{X}_k}} = {\frac{\partial m}{\partial \ln \tilde{X_{k}}}}, \quad \forall k=2,{\ldots},K, $$
(12)

and

$$ {\frac{\partial \ln D}{\partial \ln {Y}_q}} = {\frac{\partial m}{\partial \ln Y_q}}, \quad \forall q=1,{\ldots},Q, $$
(13)

where the first partial derivative of \(m(\cdot)\) with respect to a particular argument, say, \(z_l \in \{\ln \tilde{X}_{2}, {\ldots}, \ln \tilde{X}_{K}, \ln Y_1, {\ldots}, \ln Y_Q\}\), is

$$ {\frac{\partial m}{\partial z_l}} = \sum_{i=1}^n {\frac{\left({\frac{z_{li}-z_l}{h_l^2}}\right)K(\cdot)\sum_i K(\cdot) - K(\cdot)\sum_i \left[\left({\frac{z_{li}-z_l}{h_l^2}}\right) K(\cdot)\right]}{(\sum_i K(\cdot))^2}} \cdot (-\ln X_{1i}). $$
(14)

See “Appendix 2” for detailed derivations of the first and second partial derivatives of \(m(\cdot)\) with respect to \(z_l\), and of the cross partial derivatives with respect to \(z_l\) and \(z_k\), ∀l ≠ k.
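For readers who prefer to evaluate (14) directly, the following sketch implements the Gaussian product kernel and the quotient rule behind it, assuming Z is the n × (K + Q − 1) regressor matrix, y the response vector, z0 the evaluation point, h the bandwidth vector, and l the index of the argument of interest (all names illustrative):

```r
# Gaussian product kernel weights K((z_i - z0)/h) for all observations i
kw <- function(Z, z0, h) {
  apply(dnorm(sweep(sweep(Z, 2, z0), 2, h, "/")), 1, prod)
}

# First partial derivative of the local-constant estimator, eq. (14)
grad_lc <- function(Z, y, z0, h, l) {
  K  <- kw(Z, z0, h)                    # kernel weights K_i
  dK <- K * (Z[, l] - z0[l]) / h[l]^2   # dK_i/dz_l for a Gaussian kernel
  S  <- sum(K)
  (sum(dK * y) * S - sum(K * y) * sum(dK)) / S^2
}
```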

2.2 Imposition of regularity constraints

Recall that the IDF has the following theoretical properties:

$$ {\frac{\partial \ln D}{\partial \ln X_k}} \geq 0, \quad \forall k=1,{\ldots},K $$
(15)

and

$$ {\frac{\partial \ln D}{\partial \ln Y_q}} \leq 0, \quad \forall q=1,{\ldots},Q. $$
(16)

In empirical estimation, however, violations of these properties are likely to occur for some individual observations. Most empirical researchers check these regularity conditions at the mean of the data instead of at every data point, and report results evaluated at the mean. This practice defeats the purpose of using micro data: results may not be of much use for policy analysis if the theoretical restrictions are violated for many individual producers. Instead of ignoring results that violate rationality, we use a statistical method that imposes these economic constraints, and then calculate the gradients of the IDF and the elasticities from estimates for which all these constraints are jointly satisfied.
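Checking the conditions at every data point is straightforward once the gradients are available. A minimal sketch, assuming the npreg fit from Sect. 2.1 in the three-input, one-output case, with gradient columns ordered as \(\ln\tilde X_2, \ln\tilde X_3, \ln Y\) (the column ordering is an assumption):

```r
# Share of observations violating the monotonicity properties
g <- gradients(fit)            # n x 3 gradient matrix of m(.)
mean(1 - g[, 1] - g[, 2] < 0)  # violations of (15) for the numeraire, via (11)
mean(g[, 1] < 0)               # violations of (15), k = 2
mean(g[, 2] < 0)               # violations of (15), k = 3
mean(g[, 3] > 0)               # violations of (16)
```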

To impose such observation-specific inequality constraints, we follow the constraint weighted bootstrapping (CWB) method first proposed by Hall and Huang (2001) and further studied by Du et al. (2013) and Parmeter et al. (2013), whose idea is to transform the response variable by assigning observation-specific weights such that certain constraints in the model are satisfied. To illustrate this methodology, let \({\{\mathcal{Y}_{i},z_{i}\}^{n}_{i=1}}\) denote sample pairs of response and explanatory variables, where \(\mathcal{Y}_{i}\) is a scalar, \(z_{i}\) is of dimension (K + Q − 1), and n denotes the sample size. The goal is to estimate the conditional mean model \(\mathcal{Y}=m(z)+v\), subject to constraints on the first order gradients \(m_l(z) = \partial m(z)/\partial z_l\), where \(z_l\) is the l-th element of the vector z, or on a linear combination of any of the first order gradients.

We can express the local-constant estimator as:

$$ \hat{m}(z)=n\cdot\sum^{n}_{i=1}A_{i}(z)n^{-1}{\mathcal{Y}}_{i}, $$
(17)

where \(A_{i}(z)={\frac{K({\frac{z_i-z}{h}})}{\sum_{i=1}^{n} K({\frac{z_i-z}{h}})}}\), and \(K(\cdot)\) and h are defined as in (8). The first order gradient of the local-constant estimator, \(\hat m_l(z)\), can be expressed as:

$$ \hat{m}_l(z)=n\cdot\sum^{n}_{i=1}A_{i,l}(z)n^{-1}{\mathcal{Y}}_{i}, $$
(18)

where \(A_{i,l}(z)={\frac{\partial A_i(z)}{\partial z_l}}\) (see “Appendix 2” for an explicit expression of the derivative of \(A_i(z)\) with respect to \(z_l\)). Therefore, a particular linear combination of these first order gradients is:

$$ 1-\sum_{l=1}^{K-1} \hat{m}_l(z) = 1 - \sum_{l=1}^{K-1} \left( n\cdot\sum^{n}_{i=1}A_{i,l}(z)n^{-1}{\mathcal{Y}}_{i} \right), $$
(19)

which is used to impose constraints in the form of (11).
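Because the constraints below are evaluated at every sample point, it is convenient to precompute, for each argument l, the n × n matrix whose (j, i) entry is \(A_{i,l}(z_j)\). A sketch using the kernel helper kw() from Sect. 2.1 (names illustrative):

```r
# n x n matrix M with M[j, i] = A_{i,l}(z_j): the derivative of the
# local-constant weight A_i(z) with respect to z_l, evaluated at z_j
Ail <- function(Z, h, l) {
  n <- nrow(Z)
  M <- matrix(0, n, n)
  for (j in 1:n) {
    K  <- kw(Z, Z[j, ], h)                 # kernel weights at z_j
    dK <- K * (Z[, l] - Z[j, l]) / h[l]^2  # dK_i/dz_l
    S  <- sum(K)
    M[j, ] <- (dK * S - K * sum(dK)) / S^2 # quotient rule on A_i(z)
  }
  M
}
```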

To impose the monotonicity constraints, re-write (18) and (19) as:

$$ \hat{m}_l(z|p)=n\cdot\sum^{n}_{i=1}A_{i,l}(z)p_i{\mathcal{Y}}_{i}, \; \forall l=1,{\ldots},K+Q-1, \hbox{ and} $$
(20)
$$ 1-\sum_{l=1}^{K-1} \hat{m}_l(z|p)= 1- \sum_{l=1}^{K-1} \left( n\cdot\sum^{n}_{i=1}A_{i,l}(z)p_i{\mathcal{Y}}_{i} \right), $$
(21)

where \(p_i\) is the weight for the ith observation of the response \(\mathcal{Y}\), and \(\sum_{i=1}^{n} p_i = 1\). If \(p_i = 1/n\) (i.e., uniform weights), then the constrained estimator reduces to the unconstrained estimator.

The goal is to transform the response as little as possible through the weights such that the constraints are satisfied. The following is the weight selection criterion proposed by Du et al. (2013) and Parmeter et al. (2013):

$$ \begin{aligned} &p^{*} = \mathop{\hbox{argmin}}_{p}\, {\mathcal{D}}(p)=(p_{u}-p)^{\prime}(p_{u}-p)\\ &\quad\hbox{s.t.}\;{\bf l}(z) \leq \hat{m}_l(z \mid p) \leq {\bf u}(z), \quad \forall l=1,{\ldots}, K+Q-1, \hbox{ and}\\ &\quad\quad {\bf l}(z) \leq 1 - \sum_{l=1}^{K-1} \hat{m}_l(z \mid p) \leq {\bf u}(z), \end{aligned} $$
(22)

where \(p^*\) is the vector of optimal weights for the response observations, \(\mathcal{D}(p)\) is an \(L_2\) metric, and \(p_u\) is a vector of uniform weights (i.e., 1/n), which can also be viewed as the initial search point; \({\bf l}(z)\) and \({\bf u}(z)\) represent observation-specific lower and upper bounds for \(\hat{m}_l(z \mid p)\), respectively. Setting \({\bf l}(z) = 0\) and \({\bf u}(z)=+\infty\) imposes monotonically increasing constraints; setting \({\bf u}(z) = 0\) and \({\bf l}(z)=-\infty\) imposes monotonically decreasing constraints. The optimization problem (22) is a standard quadratic programming problem that can be solved numerically using the quadprog package (Turlach and Weingessel 2011) in R. The constrained estimator \(\hat m(z \mid p^*) = n \sum_{i=1}^n A_i(z) p_i^* \mathcal{Y}_{i}\) can then be calculated using the optimal weight for each observation, \(p_i^*\). Following Du et al. (2013) and Parmeter et al. (2013), the same bandwidth vector estimated from the unconstrained IDF is used to estimate the constrained IDF, as the same \(A_i(z)\) appears in both the unconstrained and the constrained estimators. The code for estimating the constrained model is available from the author upon request.
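The following sketch shows how (22) maps into solve.QP from quadprog, which minimizes \((1/2)b'Db - d'b\) subject to \(A'b \geq b_0\), with the first meq constraints treated as equalities. Here Al is a list of the \(A_{i,l}(z_j)\) matrices built above, and input_idx and output_idx index the gradients constrained to be non-negative and non-positive; this is a sketch under those naming assumptions, not the author's actual code:

```r
library(quadprog)

cwb_weights <- function(Al, y, input_idx, output_idx) {
  n  <- length(y)
  pu <- rep(1 / n, n)              # uniform weights p_u
  Dmat <- 2 * diag(n)              # (p_u - p)'(p_u - p) = p'p - 2 p_u'p + const
  dvec <- 2 * pu
  Gl <- lapply(Al, function(A) n * sweep(A, 2, y, "*"))    # G %*% p = m_l(z|p)
  C  <- rbind(matrix(1, 1, n),                             # sum(p) = 1
              do.call(rbind, Gl[input_idx]),               # m_l(z|p) >= 0
              do.call(rbind, lapply(Gl[output_idx], `-`)), # m_l(z|p) <= 0
              -Reduce(`+`, Gl[input_idx]))                 # 1 - sum_l m_l >= 0
  b0 <- c(1,
          rep(0, n * (length(input_idx) + length(output_idx))),
          rep(-1, n))
  sol <- solve.QP(Dmat, dvec, Amat = t(C), bvec = b0, meq = 1)
  sol$solution                     # optimal weights p*
}
```

For the three-input, one-output IDF above, input_idx = c(1, 2) (the two input ratios) and output_idx = 3; the constrained fit then follows by plugging the returned weights into (20).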

2.3 Elasticities from the distance function

After the first, second, and cross partial derivatives of the IDF satisfying theoretical restrictions are estimated, three types of elasticities measuring input substitutability/complementarity can be computed without any additional information (Stern 2011), viz., the Antonelli elasticity of complementarity (AEC), the Morishima elasticity of complementarity (MEC), and the Symmetric elasticity of complementarity (SEC). The formulas are provided here for convenience:

$$ AEC_{kl} = {\frac{D \cdot D_{kl}}{D_k \cdot D_l}}, $$
(23)
$$ MEC_{kl} = {\frac{D_{kl}\cdot X_{l}}{D_{k}}}-{\frac{D_{ll}\cdot X_{l}}{D_{l}}}, $$
(24)

and

$$ SEC_{kl} = \frac{{\frac{-D_{kk}}{D_k D_k}}+2{\frac{D_{kl}}{D_k D_l}}-{\frac{D_{ll}}{D_l D_l}}}{\frac{1}{D_k X_k}+\frac{1}{D_l X_l}}, $$
(25)

\(\forall k,l=1,{\ldots},K\), where \(D_k\) (\(D_{kk}\)) and \(D_l\) (\(D_{ll}\)) are the first (second) partial derivatives of the IDF with respect to the kth and lth input, respectively, and \(D_{kl}\) is the cross partial derivative with respect to the kth and lth inputs. Both the AEC and the SEC are symmetric: they give the same elasticity estimate regardless of which input causes the change. The MEC is not symmetric, for the reasons given in Blackorby and Russell (1981). If the MEC > 0, the two inputs are complements in the Morishima sense; a similar interpretation applies to the AEC and the SEC. For comparison purposes, in the application section the elasticities are calculated from the IDF both with and without the theoretical restrictions.
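Given the estimated distance value and its derivatives at an observation, (23)-(25) are direct to compute. A sketch, where D is the distance value, g the gradient vector (g[k] = \(D_k\)), H the Hessian (H[k, l] = \(D_{kl}\)), and X the input vector (all illustrative names):

```r
# Antonelli, Morishima, and symmetric elasticities for inputs k and l
elasticities <- function(D, g, H, X, k, l) {
  aec <- D * H[k, l] / (g[k] * g[l])                    # eq. (23)
  mec <- H[k, l] * X[l] / g[k] - H[l, l] * X[l] / g[l]  # eq. (24)
  sec <- (-H[k, k] / g[k]^2 + 2 * H[k, l] / (g[k] * g[l])
          - H[l, l] / g[l]^2) /
         (1 / (g[k] * X[k]) + 1 / (g[l] * X[l]))        # eq. (25)
  c(AEC = aec, MEC = mec, SEC = sec)
}
```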

3 Application

As an empirical illustration of the proposed methodology, we use a cross-sectional data set of 3,249 active forest owners (i.e., owners who harvest trees) for the year 2003, compiled by Statistics Norway. According to Statistics Norway, the value added in Norwegian forestry was estimated at Norwegian Krone (NOK) 5.4 billion in 2011. Timber sales are the largest component of Norwegian forestry. From 2001 to 2011, 36 % of forest properties sold timber. In addition, forest owners can earn income from selling hunting and fishing rights, leasing out sites, and renting out cabins. The Ministry of Agriculture and Food of Norway (2007) reported that approximately 88 % of the forest area is privately owned, and the majority of forest holdings are farm and family forests. During the last 80 years, the timber stock has increased because annual timber growth has been considerably faster than the annual harvest.

Since this data set was used in Lien et al. (2007), where a detailed description of the sampling method is available, only a brief description is given here. The output variable (Y) is annual timber sales from the forest, measured in cubic meters. The labor input variable (\(X_1\)) is the sum of hours worked by contractors and hours worked by the owner, his family, or hired labor. The land input variable (\(X_2\)) measures the forest area to be cut, in hectares. The capital input variable (\(X_3\)) is the amount of timber stock that can be cut without affecting future harvesting. Table 1 presents summary statistics for the sample.

Table 1 Summary statistics of the variables

The estimation results are given in Tables 2 and 3 and Figs. 1, 2, 3, and 4. Table 2 reports the estimated bandwidth vector and the percentages of violations of the monotonicity properties of the IDF. Table 3 presents the Kolmogorov–Smirnov test results for equality of distributions between the information estimated from the unconstrained and constrained models. Figure 1 reports the histogram of observation-specific weights used to impose the monotonicity constraints. Figures 2, 3, and 4 plot the kernel densities of the gradient and elasticity estimates under the unconstrained and constrained models.

Table 2 Bandwidths and percentages of violations
Table 3 Testing for equality of distributions from the unconstrained and constrained models
Fig. 1 Observation-specific weights

Fig. 2 Kernel density plots of the gradients of the IDF: unconstrained versus constrained

It can be seen from Table 2 that the bandwidth estimate for each regressor is small enough (i.e., less than twice the standard deviation of the corresponding regressor) to indicate nonlinearity of the regression function, and hence the appropriateness of the nonparametric approach. Using these bandwidth estimates, although almost no violations of economic theory occur for the gradients of \(\ln X_1\) and \(\ln Y\), there are violations for 18.25 and 7.08 % of the observations for the gradients of \(\ln {X}_2\) and \(\ln {X}_3\), respectively. This suggests that imposing the economic constraints (15) and (16) is not a trivial matter.

The constraint weighted bootstrapping (CWB) method is then used to impose these constraints. Figure 1 plots the distribution of \(p_i^*\), the optimal weight for each observation suggested by CWB. It can be seen that most observations share similar weights, and these optimal weights are quite close to the uniform weights, 1/n = 1/3,249 ≈ \(3 \times 10^{-4}\). After the response variable is transformed by these weights, the gradients and the elasticities can be estimated under the constraints.

Figure 2 shows the kernel density estimates of the unconstrained and constrained distributions of the four gradients, i.e., \(\partial \ln D/\partial \ln X_k, \forall k=1,2,3\), and \(\partial \ln D/\partial \ln Y\). For each observation, a non-negativity constraint is imposed on the first three gradients and a non-positivity constraint on the last. A vertical line is drawn at zero and the shaded area highlights where the violations occur. The densities of the constrained gradients are plotted using the Silverman reflection method for boundary correction, so that the estimated densities integrate to one. There are masses near zero under the constrained model for the gradients with many violations. Although very few violations are observed for the gradient of \(\ln X_1\), its unconstrained and constrained distributions are not close to each other. This is because the gradient of the first input is essentially a linear combination of the gradients of the other two inputs, whose non-trivial violations affect the constrained gradient of the first input.
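The boundary correction mentioned above is simple to reproduce: reflect the constrained estimates around zero, estimate the density of the pooled sample, and keep twice the half on the admissible side. A minimal sketch in base R for a gradient bounded below at zero:

```r
# Silverman reflection estimator on [0, Inf)
reflect_density <- function(x) {
  d <- density(c(x, -x))                  # pool the data with its mirror image
  keep <- d$x >= 0
  list(x = d$x[keep], y = 2 * d$y[keep])  # double the mass so it integrates to one
}
```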

We also plot the kernel density estimates of the elasticities under the unconstrained and constrained models in Figs. 3 and 4. In most cases, the constrained elasticities have smaller variation than their unconstrained counterparts, which suggests that estimating an IDF satisfying its theoretical properties may improve the efficiency of the elasticity estimates. A vertical line is drawn at zero for a better view of the percentage of observations whose inputs are substitutes/complements. For example, if the MEC between two inputs is greater (less) than zero, then these inputs are Morishima complements (substitutes). In both figures, however, the elasticity estimates from the unconstrained and constrained models appear quite close to each other. To test whether the distributions in fact differ, Table 3 reports the p values from the Kolmogorov–Smirnov test for equality of distributions of the gradients and elasticities estimated from the unconstrained and constrained models. The null of equality of distributions is rejected at the 5 % level in all cases except the AEC estimates between labor and capital.
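The test behind Table 3 is the standard two-sample Kolmogorov–Smirnov test, available in base R; for one gradient, with illustrative vector names:

```r
# Two-sample KS test: unconstrained versus constrained estimates
ks.test(grad_unconstrained, grad_constrained)$p.value  # small p: distributions differ
```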

Fig. 3 Kernel density plots of the nonparametric MEC: unconstrained versus constrained

Fig. 4 Kernel density plots of the nonparametric AEC and SEC: unconstrained versus constrained

4 Discussion

4.1 Choice of numeraire input

When the IDF is estimated with a parametric functional form, e.g., Cobb–Douglas or translog, the choice of the normalizing input does not affect the results. However, this is generally not the case for nonparametric estimation of the IDF. The empirical example chooses the labor input as the numeraire when estimating the IDF, which assumes that the labor input is endogenous. To see how the results change when other inputs (i.e., land or capital) are used for normalization, we report the unconstrained and constrained gradient estimates in Figs. 5 and 6 in “Appendix 3”, which use the land and capital inputs as the numeraire, respectively. The elasticity plots are omitted to save space. Although it is recommended that researchers choose an input that is endogenous a priori, the question of how to find the most appropriate numeraire is left for future research.

Fig. 5 Kernel density plots of the gradients of the IDF: unconstrained versus constrained. Land input is chosen as the numeraire input

Fig. 6 Kernel density plots of the gradients of the IDF: unconstrained versus constrained. Capital input is chosen as the numeraire input

4.2 Extensions to other representations of technology

The estimation procedure in Sect. 2 can be easily extended to other specifications of technology. We provide two examples here: a production and a cost function.

The production function can be written as \(y=B\cdot F(X)\), where y is a scalar output, B is the productivity parameter, \(F(\cdot)\) is an unknown function, and X is an input vector. Applying a log transformation similar to (6) gives an estimable production function: \(\ln y=f(\ln X) + u\), where \(u=\ln B\) is the error term. The unknown function \(f(\cdot)\) can be estimated nonparametrically via kernel methods, and desirable constraints can be imposed on it, e.g., \(\partial \ln y / \partial \ln X \geq 0\) as a non-negative marginal product constraint.

The cost function can be written as C = λC(W, Y), where C is the total cost, λ is the productivity parameter, \(C(\cdot)\) is an unknown function, and W and Y are the input price and output vectors, respectively. Applying the log transformation and imposing homogeneity of degree one in input prices gives an estimable cost function: \(\ln \tilde C=c(\ln \tilde W, \ln Y) + u\), where \(\tilde{C}\) is the total cost divided by a numeraire input price, \(\tilde{W}\) is the input price vector divided by the numeraire input price, and \(u=\ln \lambda\) is the error term. The cost function \(c(\cdot)\) can be estimated using kernel methods, and the necessary regularity constraints can be imposed such that (1) cost shares lie between zero and one: \(0 \leq \partial \ln \tilde C/ \partial \ln \tilde W \leq 1\), and (2) marginal cost is non-negative: \(\partial \ln \tilde C / \partial \ln Y \geq 0\).
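The same estimation pipeline then applies after the corresponding transformation. A brief sketch, assuming hypothetical columns C (total cost), W1, W2, W3 (input prices), and Y (output), with W1 as the numeraire price:

```r
# Normalized cost and price ratios for the estimable cost function
lC <- log(df$C / df$W1)                     # ln(C/W_1): homogeneity imposed
zc <- data.frame(lw2 = log(df$W2 / df$W1),  # ln(W_2/W_1)
                 lw3 = log(df$W3 / df$W1),  # ln(W_3/W_1)
                 ly  = log(df$Y))           # ln Y
# estimate c(.) by kernel regression of lC on zc, then impose the share
# and marginal-cost constraints via the CWB step of Sect. 2.2
```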

After the regression functions are estimated under the constraints, different elasticities can then be calculated subject to the specification of choice and data availability. See Stern (2011) for a classification scheme of different definitions of elasticities based on primal and dual representations of technology.

4.3 Possibly endogenous regressors

It is well known that estimation of production/cost/distance functions may be subject to endogeneity problems that cause estimation results to be biased and inconsistent. Unfortunately, nonparametric instrumental variable (IV) estimation is still a relatively young field; see Su and Ullah (2008) for a three-step estimation procedure for nonparametric simultaneous equations models via kernel methods, and Newey and Powell (2003) for IV estimation via series approximation, among others. It is unclear whether the CWB procedure can be seamlessly applied to nonparametric structural models; this is left for future research.

5 Conclusion

This paper uses econometric methods to estimate an input distance function (IDF) without functional form assumptions, and imposes economic properties of the IDF on the estimated regression function via constraint weighted bootstrapping (CWB). As a by-product, the first, second, and cross partial analytical derivatives of the estimated IDF are derived, and thus various elasticities can be computed. Applying the proposed method to a cross-section of Norwegian forest owners, we find that imposing the constraints eliminates the problem of economic violations in empirical work, and therefore policy implications may be more reliable. The proposed method can be extended to other representations of technology in a straightforward manner, and this opens the door for further empirical work to estimate models subject to economic theory. As a future research topic, more work should be done on the unification of CWB and the nonparametric structural modeling approach.