1 Introduction

Since the pioneering work of Aigner et al. (1977), stochastic frontier (SF) analysis has been widely used in productivity and efficiency studies to describe and estimate models of the production frontier. The empirical model typically assumes that a decision-making unit (DMU) employs a single production process or technology to produce an output using multiple inputs. However, if an organization (a DMU) operates multiple production divisions (sub-DMUs), with each division supported by its own set of resource inputs, these sub-DMUs may be subject to the same random shocks as the parent DMU. Given that these divisions share some commonly observed or unobserved characteristics of the parent DMU, the divisions’ technical efficiencies may well be correlated. A system of stochastic frontier regressions on the sub-DMUs is then a more appropriate model for investigating a DMU’s operation and performance. Since the system estimation takes into account the mutual dependence among the composite errors, the estimator is more efficient than that from regression-by-regression estimation.

Previously, Lai and Huang (2013) discussed the estimation of a system of stochastic frontier models for cross-section data. When sub-DMUs are observed over time, some unobserved heterogeneity may exist, and a model that captures the panel characteristics can provide more efficient estimation as well as better prediction of the inefficiency. The unobserved heterogeneity in firm-level panel data can be incorporated into the model in several ways. For instance, one can introduce the heterogeneity through fixed effects, random effects, or heterogeneous variance in the symmetric/one-sided random component. In the model considered in this paper, we assume the firm heterogeneity comes from the random effects as well as the heterogeneous variance of the inefficiency. The random-effects assumption can be easily extended to fixed effects under Mundlak’s (1978) assumption.

There are several applications of the joint estimation of a system of equations. For instance, Huang et al. (2018b) consider a model with two equations, one for cost efficiency and one for market power, where each equation has a composite error. Similarly, Genius et al. (2012) use a system of input demands with composite errors. Other empirical examples include Huang et al. (2017a, 2017b), and Huang et al. (2018a). See also Amsler and Schmidt (2021) for a systematic review. The common feature of these empirical studies is that they all use the cross-sectional approach even though some of them employ panel data. Accordingly, the main objective of this paper is to extend the model of Lai and Huang (2013) so that it can capture the unobserved panel characteristics of the multiple SF regression models.

The plan of the paper is as follows. In Sect. 2, we introduce a system of panel stochastic frontier regressions, where the composite errors are correlated. Section 3 discusses how to use the simulated maximum likelihood approach to estimate the model. We then discuss how to estimate the inefficiency using the simulated approach in Sect. 4. We examine the finite sample performance of the proposed estimator by Monte Carlo simulation in Sect. 5. An empirical study of the hotel industry in Taiwan is given in Sect. 6, and a summary conclusion is given in Sect. 7.

2 The seemingly unrelated SF panel model

Consider the following production frontiers of J sub-DMUs:

$$\begin{aligned} y_{it}^{1}&=\beta _{0}^{1}+x_{it}^{1\prime }\beta _{1}+\alpha _{i}^{1} +v_{it}^{1}-u_{it}^{1},\nonumber \\ y_{it}^{2}&=\beta _{0}^{2}+x_{it}^{2\prime }\beta _{2}+\alpha _{i}^{2} +v_{it}^{2}-u_{it}^{2},\nonumber \\&\vdots \nonumber \\ y_{it}^{J}&=\beta _{0}^{J}+x_{it}^{J\prime }\beta _{J}+\alpha _{i}^{J} +v_{it}^{J}-u_{it}^{J}, \end{aligned}$$
(1)

where \(j=1,\ldots ,J\), \(i=1,\ldots ,N\), and \(t=1,\ldots ,T\). \(y_{it}^{j}\) and \(x_{it}^{j}\) are the log output and the log inputs of the jth sub-DMU. In order to implement the maximum likelihood approach to estimate the model, we make the following assumptions about the random components:

  1. [A1]:

    \(\alpha _{i}^{j} \sim N(0,\sigma _{\alpha j}^{2})\) is the firm-specific random effect. For a fixed i, \(\alpha _{i}^{j}\) and \(\alpha _{i}^{j^{\prime }}\) are independent of each other for \(j\ne j^{\prime }\).

  2. [A2]:

    \(v_{it}^{j}\sim N(0,\sigma _{vj}^{2})\) is a two-sided symmetric random noise. For fixed i and j, \(v_{it}^{j}\) and \(v_{is}^{j}\) are independent across time for \(t\ne s\). For fixed i and t, \(v_{it}^{j}\) and \(v_{it}^{j^{\prime }}\) are correlated for \(j\ne j^{\prime }\).

  3. [A3]:

    \(u_{it}^{j}\sim N^{+}(0,\sigma _{uj,it}^{2})\) is a one-sided random component that captures the inefficiency. Its standard deviation \(\sigma _{uj,it}\) is parametrized as \(\sigma _{uj,it}=\exp \left( \delta _{j}^{\prime }w_{it}^{j}\right) \), where \(w_{it}^{j}\) is the vector of exogenous determinants of the inefficiency \(u_{it}^{j}\). For fixed i and j, \(u_{it}^{j}\) and \(u_{is}^{j}\) are independent across time for \(t\ne s\). For fixed i and t, \(u_{it}^{j}\) and \(u_{it}^{j^{\prime }}\) are correlated for \(j\ne j^{\prime }\).

  4. [A4]:

    The three random components \(\alpha _{i}^{j},v_{it}^{j}\) and \(u_{it}^{j}\) are mutually independent and uncorrelated with \(x_{it}\) for fixed i, t and j.

Although we assume in [A1] that the random effects across divisions are independent, we will show in Sect. 3.2 that the assumption can be extended to allow correlated random effects. Assumption [A2] implies that the \(v_{it}^{j}\)’s of different divisions at the same time are correlated with each other. The same applies to the \(u_{it}^{j}\)’s by assumption [A3]. Now, let us define \(e_{it}^{j}=\alpha _{i}^{j}+v_{it} ^{j}-u_{it}^{j}\) as the composite error of the jth equation; then it follows from [A1]–[A3] that \(e_{it}^{j}\) and \(e_{it}^{j^{\prime }}\) will be correlated. For simplicity, we also define \(\varepsilon _{it}^{j}=v_{it}^{j}-u_{it}^{j}\), the composite error containing all time-variant random components, so that \(e_{it}^{j}=\alpha _{i}^{j}+\varepsilon _{it}^{j}\). There are two main distinctions between the model given in (1) and the SF model discussed in Lai and Huang (2013). One is that we extend the cross-section model to a panel model with random firm effects, and the other is that we allow for heteroscedasticity in the distribution of the inefficiency.
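To fix ideas, the data-generating process in (1) can be simulated for a two-division system. The following is a minimal sketch assuming numpy; all parameter values are hypothetical, and the inefficiencies are drawn independently across divisions for brevity (assumption [A3] permits, but does not require, correlated draws).

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 5                      # firms and periods (hypothetical sizes)
beta0, beta1, rho = 1.0, 0.5, 0.5  # hypothetical frontier parameters and noise correlation

# one regressor and one inefficiency determinant per division (hypothetical)
x1, x2 = rng.normal(size=(N, T)), rng.normal(size=(N, T))
w1, w2 = rng.normal(size=(N, T)), rng.normal(size=(N, T))

# [A1]: independent firm-specific random effects
alpha1 = rng.normal(0.0, 0.3, size=(N, 1))
alpha2 = rng.normal(0.0, 0.3, size=(N, 1))

# [A2]: symmetric noises, correlated across divisions at the same t
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=(N, T))
v1, v2 = 0.4 * z[..., 0], 0.4 * z[..., 1]

# [A3]: half-normal inefficiency with sigma_u = exp(delta * w)
u1 = np.abs(rng.normal(size=(N, T))) * np.exp(0.2 * w1)
u2 = np.abs(rng.normal(size=(N, T))) * np.exp(0.2 * w2)

# model (1): y = beta0 + x'beta + alpha + v - u
y1 = beta0 + beta1 * x1 + alpha1 + v1 - u1
y2 = beta0 + beta1 * x2 + alpha2 + v2 - u2
```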

3 Copulas and the simulated likelihood function

To motivate the simulated likelihood approach, we first discuss the model without random firm effects and then discuss the model with random effects.

3.1 The model without random effects

Consider the special case of the system of equations (1), where \(\alpha _{i}^{j}=0\) for all i and j. Let \(\theta _{j}=(\beta _{0}^{j} ,\beta _{j}^{\prime },\sigma _{vj}^{2},\delta _{j}^{\prime })^{\prime }\) be a vector of parameters in the jth SF regression and \(\varepsilon _{it}^{j}=v_{it} ^{j}-u_{it}^{j}\) denote the composite error of the jth SF regression. In this model, the only source of heterogeneity is the heteroscedastic variance of \(u_{it}^{j}\) and the correlation between equations comes from \(v_{it}^{j}\) and \(v_{it}^{j^{\prime }}\) and/or \(u_{it}^{j}\) and \(u_{it}^{j^{\prime }}\).

Let \(F_{\varepsilon ^{j}}(\varepsilon _{it}^{j};\theta _{j})\) and \(f_{\varepsilon ^{j}}(\varepsilon _{it}^{j})\) denote the cumulative distribution function (cdf) and probability density function (pdf) of \(\varepsilon _{it}^{j}\), respectively. When the distribution of \(u_{it}^{j}\) is half normal, i.e., \(N^{+}(0,\sigma _{uj,it}^{2})\), the pdf of the associated composite error \(\varepsilon _{it}^{j}\) is

$$\begin{aligned} f_{\varepsilon ^{j}}(\varepsilon _{it}^{j})=\frac{2}{\sigma _{j}}\phi \left( \frac{\varepsilon _{it}^{j}}{\sigma _{j}}\right) \Phi \left( -\frac{\lambda _{j}}{\sigma _{j}}\varepsilon _{it}^{j}\right) , \end{aligned}$$
(2)

where \(\phi (\cdot )\) represents the standard normal density function, \(\sigma _{j}=\sqrt{\sigma _{vj}^{2}+\sigma _{uj,it}^{2}}\) and \(\lambda _{j} =\sigma _{uj,it}/\sigma _{vj}\); both vary over i and t through \(\sigma _{uj,it}\), but we suppress the extra subscripts for brevity. It can be shown that \(\varepsilon _{it}^{j}\) follows a closed skew normal distribution, i.e.,

$$\begin{aligned} \varepsilon _{it}^{j}\sim \mathrm{CSN}_{1,1}\left( 0,\sigma _{j}^{2},-\frac{\lambda _{j}^{2}}{1+\lambda _{j}^{2}},0,\frac{\lambda _{j}^{2}\sigma _{j}^{2}}{\left( 1+\lambda _{j}^{2}\right) ^{2}}\right) , \end{aligned}$$
(3)

which has the cdf

$$\begin{aligned} F_{\varepsilon ^{j}}(\varepsilon _{it}^{j})=2\cdot \Phi _{2}\left( \left( \begin{array}[c]{c} \varepsilon _{it}^{j}\\ 0 \end{array} \right) ;\left( \begin{array}[c]{c} 0\\ 0 \end{array} \right) ,\left( \begin{array}[c]{cc} \sigma _{j}^{2} &{} \frac{\lambda _{j}^{2}\sigma _{j}^{2}}{1+\lambda _{j}^{2}}\\[5mm] \frac{\lambda _{j}^{2}\sigma _{j}^{2}}{1+\lambda _{j}^{2}} &{} \frac{\lambda _{j}^{2}\sigma _{j}^{2}}{1+\lambda _{j}^{2}} \end{array} \right) \right) , \end{aligned}$$
(4)

where \(\Phi _2\left( \cdot ;\mu ,\Sigma \right) \) denotes the cdf of the bivariate normal distribution with mean \(\mu \) and covariance matrix \(\Sigma \).
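The marginal pdf (2) and cdf (4) can be evaluated directly. The following is a sketch assuming numpy and scipy; the function names `sf_pdf` and `sf_cdf` are our own, and \(\sigma _{uj,it}\) is passed as a fixed scalar for simplicity.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def sf_pdf(eps, sigma_v, sigma_u):
    """Density of eps = v - u with half-normal u, Eq. (2)."""
    sigma = np.sqrt(sigma_v**2 + sigma_u**2)
    lam = sigma_u / sigma_v
    return (2.0 / sigma) * norm.pdf(eps / sigma) * norm.cdf(-lam * eps / sigma)

def sf_cdf(eps, sigma_v, sigma_u):
    """CDF of eps via the closed skew normal representation, Eq. (4)."""
    sigma2 = sigma_v**2 + sigma_u**2
    lam2 = (sigma_u / sigma_v) ** 2
    a = lam2 * sigma2 / (1.0 + lam2)          # off-diagonal term of Eq. (4)
    cov = np.array([[sigma2, a], [a, a]])
    return 2.0 * multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf(np.array([eps, 0.0]))
```

A quick consistency check is that, for \(\sigma_{v}=\sigma_{u}\), \(F_{\varepsilon}(0)=P(v<u)=3/4\).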

For simplicity, we let \(\varepsilon _{it}=\left( \varepsilon _{it} ^{1},\ldots ,\varepsilon _{it}^{J}\right) ^{\prime }\) be a \(J\times 1\) vector and \(\varepsilon _{i.}=\left( \varepsilon _{i1}^{\prime },\ldots ,\varepsilon _{iT}^{\prime }\right) ^{\prime }\) be a \(JT\times 1\) vector. By Sklar’s (1959) theorem (see also Schweizer and Sklar 1983), the joint cdf of \(\varepsilon _{it}\) can be represented as a copula function of its one-dimensional margins. More specifically,

$$\begin{aligned} F_{\varepsilon }\left( \varepsilon _{it}^{1},\ldots ,\varepsilon _{it}^{J}\right) =C\left( F_{\varepsilon ^{1}}\left( \varepsilon _{it}^{1}\right) ,\ldots ,F_{\varepsilon ^{J}}\left( \varepsilon _{it}^{J}\right) ;R\right) , \end{aligned}$$
(5)

where R denotes the dependence parameters of the copula function. The dependence among the marginal distributions is captured by the copula. The corresponding copula density is

$$\begin{aligned} c\left( F_{\varepsilon ^{1}}\left( \varepsilon _{it}^{1}\right) ,\ldots ,F_{\varepsilon ^{J}}\left( \varepsilon _{it}^{J}\right) ;R\right) =\frac{\partial ^{J}C\left( F_{\varepsilon ^{1}}\left( \varepsilon _{it} ^{1}\right) ,\ldots ,F_{\varepsilon ^{J}}\left( \varepsilon _{it}^{J}\right) ;R\right) }{\partial F_{\varepsilon ^{1}}\left( \varepsilon _{it}^{1}\right) \ldots \partial F_{\varepsilon ^{J}}\left( \varepsilon _{it}^{J}\right) }. \end{aligned}$$

Therefore, it follows from (5) that the joint pdf of \(\varepsilon _{it}\) can be represented as

$$\begin{aligned} f_{\varepsilon }\left( \varepsilon _{it}^{1},\ldots ,\varepsilon _{it}^{J}\right) =c\left( F_{\varepsilon ^{1}}\left( \varepsilon _{it}^{1}\right) ,\ldots ,F_{\varepsilon ^{J}}\left( \varepsilon _{it}^{J}\right) ;R\right) \cdot \prod \nolimits _{j=1}^{J}f_{\varepsilon ^{j}}\left( \varepsilon _{it} ^{j}\right) . \end{aligned}$$
(6)

One special copula is the independent (or product) copula, which is defined as

$$\begin{aligned} C\left( \zeta _{it}^{1},\ldots ,\zeta _{it}^{J}\right) =\prod \nolimits _{j=1} ^{J}\zeta _{it}^{j}, \end{aligned}$$

where \(\zeta _{it}^{j}=F_{\varepsilon ^{j}}\left( \varepsilon _{it}^{j}\right) \). The corresponding copula density is

$$\begin{aligned} c\left( \zeta _{it}^{1},\ldots ,\zeta _{it}^{J}\right) =1. \end{aligned}$$

Therefore, Eq. (6) suggests that \(f_{\varepsilon }\left( \varepsilon _{it}^{1},\ldots ,\varepsilon _{it}^{J}\right) =\prod \nolimits _{j=1} ^{J}f_{\varepsilon ^{j}}\left( \varepsilon _{it}^{j}\right) \) when the independent copula is used, which implies that the \(\varepsilon _{it}^{j}\)’s are independent of each other.

Moreover, if the Gaussian copula is assumed, then

$$\begin{aligned} C\left( F_{\varepsilon ^{1}}\left( \varepsilon _{it}^{1}\right) ,\ldots ,F_{\varepsilon ^{J}}\left( \varepsilon _{it}^{J}\right) ;R\right) =\Phi _{R}\left( \Phi ^{-1}\left( F_{\varepsilon ^{1}}\left( \varepsilon _{it}^{1}\right) \right) ,\ldots ,\Phi ^{-1}\left( F_{\varepsilon ^{J}}\left( \varepsilon _{it}^{J}\right) \right) \right) , \end{aligned}$$

where \(\Phi _{R}(\cdot )\) denotes the cdf of the standardized multivariate normal distribution with correlation matrix R. The corresponding copula density is

$$\begin{aligned} c\left( \varsigma _{it}^{1},\ldots ,\varsigma _{it}^{J}\right) =\frac{1}{\left| R\right| ^{1/2}}\exp \left( -\frac{1}{2}\varsigma _{it}^{\prime }\left( R^{-1}-I\right) \varsigma _{it}\right) , \end{aligned}$$
(7)

where \(\varsigma _{it}=(\varsigma _{it}^{1},\ldots ,\varsigma _{it}^{J})^{\prime }\) and \(\varsigma _{it}^{j}=\Phi ^{-1}\left( F_{\varepsilon ^{j}}\left( \varepsilon _{it}^{j}\right) \right) \). Therefore, the joint pdf of \(\varepsilon _{it}^{1},\ldots ,\varepsilon _{it}^{J}\) is

$$\begin{aligned} f_{\varepsilon }\left( \varepsilon _{it}^{1},\ldots ,\varepsilon _{it}^{J}\right)= & {} f_{\varepsilon }\left( \varepsilon _{it}\right) =c\left( \Phi ^{-1}\left( F_{\varepsilon ^{1}}\left( \varepsilon _{it}^{1}\right) \right) ,\ldots ,\Phi ^{-1}\left( F_{\varepsilon ^{J}}\left( \varepsilon _{it}^{J}\right) \right) ;R\right) \nonumber \\&\times \,{\displaystyle \prod \nolimits _{j=1}^{J}} f_{\varepsilon ^{j}}\left( \varepsilon _{it}^{j}\right) . \end{aligned}$$
(8)
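The Gaussian copula density (7) and the joint pdf (8) are simple to compute once the marginal cdf and pdf values are in hand. The following is a sketch assuming numpy and scipy; the function names are our own, and the marginal pdf/cdf values are passed in as precomputed numbers.

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_density(zeta, R):
    """Eq. (7): Gaussian copula density at marginal cdf values zeta (each in (0,1))."""
    z = norm.ppf(np.asarray(zeta))            # varsigma^j = Phi^{-1}(F(eps^j))
    R = np.asarray(R, dtype=float)
    quad = z @ (np.linalg.inv(R) - np.eye(len(z))) @ z
    return np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(R))

def joint_pdf(marg_pdfs, marg_cdfs, R):
    """Eq. (8): copula density times the product of the marginal densities."""
    return gaussian_copula_density(marg_cdfs, R) * np.prod(marg_pdfs)
```

With R equal to the identity matrix the copula density collapses to one, reproducing the independent copula above.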

The log-likelihood function is

$$\begin{aligned} \ln L(\theta )=\sum _{i=1}^{N}\sum _{t=1}^{T}\ln f_{\varepsilon }\left( \varepsilon _{it}^{1},\ldots ,\varepsilon _{it}^{J}\right) . \end{aligned}$$
(9)

Let \(\theta =\left( \theta _{1}^{\prime },\ldots ,\theta _{J}^{\prime }\right) ^{\prime }\) and \(\Theta \) be the parameter space, then the maximum likelihood (ML) estimator of \(\theta \) is defined as

$$\begin{aligned} {\widehat{\theta }}_{\text {ML}}=\arg \underset{\theta \in \Theta }{\text {Max}}\ln L(\theta ). \end{aligned}$$
(10)

Since there are no random firm effects, this model is a pooled system of regressions. Empirically, the true copula is in general unknown, so we suggest using the sandwich formula to compute the standard errors of the ML estimator.

Note that the current model has a setting very similar to that of Amsler et al. (2014, APS hereafter). The main difference is that the APS model has a single equation, i.e., \(J=1\), which is a special case of our model. Moreover, they use the copula function to model time dependence rather than the correlation among the J divisions of firm i at time t. Although in principle one can always estimate the model in (1) equation by equation, the resulting estimators of the model parameters are consistent but inefficient because they ignore the correlation across equations. Empirically, it is not clear how the random components, such as \(v_{it}^{j}\) and \(v_{it}^{j^{\prime }}\) and/or \(u_{it}^{j}\) and \(u_{it}^{j^{\prime }}\), are correlated. Within the copula framework, we can focus on modeling the marginal distribution of each single equation, while the dependence between the marginal distributions of the composite errors is captured by a copula function. Compared with other existing approaches, such as directly specifying the joint distribution of the composite errors, using a copula is relatively easy from a practical point of view.

3.2 The model with random effects

In this section, we discuss estimation of the system of equations (1) with random firm effects. Let \(\theta _{j}=(\beta _{0}^{j},\beta _{j}^{\prime },\sigma _{\alpha j}^{2},\sigma _{vj}^{2},\delta _{j}^{\prime })^{\prime }\) denote the vector of parameters in the jth division. In the system, the outputs \(y_{i1}^{j},y_{i2}^{j},\ldots ,y_{iT}^{j}\) within the same division are correlated due to their common component \(\alpha _{i}^{j}\). Under assumption [A1], the \(\alpha _{i}^{j}\)’s are independent of each other.

To take into account the correlation between the J divisions, let \(\alpha _{i}=\left( \alpha _{i}^{1},\ldots ,\alpha _{i}^{J}\right) ^{\prime }\) be the \(J\times 1\) vector of random effects and let \(\varepsilon _{it}=\left( \varepsilon _{it}^{1},\ldots ,\varepsilon _{it}^{J}\right) ^{\prime }\) be defined as before. Recall that \(e_{it}^{j}=\alpha _{i}^{j}+v_{it}^{j}-u_{it}^{j}\), so we denote \(e_{it}=\left( e_{it}^{1},\ldots ,e_{it}^{J}\right) ^{\prime }\) as a \(J\times 1\) vector and \(e_{i.}=\left( e_{i1}^{\prime },\ldots ,e_{iT}^{\prime }\right) ^{\prime }\) as a \(JT\times 1\) vector. The joint pdf of \(e_{it}\) in the system SF regression can be evaluated using the simulated approach. It is worth mentioning that \(e_{i1}^{j},\ldots ,e_{iT}^{j}\) are correlated with each other due to the common component \(\alpha _{i}^{j}\), but they are conditionally independent if \(\alpha _{i}^{j}\) is known. In other words,

$$\begin{aligned} f_{e}\left( e_{it}\right) =\int f_{e|\alpha }(e_{it}^{1},\ldots ,e_{it} ^{J}|\alpha _{i})f_{\alpha }(\alpha _{i})\mathrm{d}\alpha _{i}, \end{aligned}$$
(11)

where \(f_{e|\alpha }(e_{it}^{1},\ldots ,e_{it}^{J}|\alpha _{i})=f_{\varepsilon }(\varepsilon _{it}^{1},\ldots ,\varepsilon _{it}^{J})\).

The above result suggests that \(f_{e}\left( e_{it}\right) \) can be evaluated by the simulated joint pdf

$$\begin{aligned} f_{e}^{s}\left( e_{it}\right) =\frac{1}{M} {\displaystyle \sum \nolimits _{m=1}^{M}} f_{\varepsilon }(\varepsilon _{it(m)}), \end{aligned}$$
(12)

where \(\varepsilon _{it(m)}=\left( \varepsilon _{it(m)}^{1},\ldots ,\varepsilon _{it(m)}^{J}\right) ^{\prime }\) is a \(J\times 1\) vector, \(\varepsilon _{it(m)}^{j}=e_{it}^{j}-\alpha _{i(m)}^{j}\) and \(\alpha _{i(m)}^{j}\) denotes the mth draw from the distribution of \(\alpha _{i}^{j}\). The superscript s of \(f_{e}^{s}\left( e_{it}\right) \) denotes the simulated density. Therefore, it follows from Eqs. (8) and (12) that the simulated joint pdf \(f_{e}^{s}\left( e_{it}\right) \) can be evaluated by

$$\begin{aligned} f_{e}^{s}\left( e_{it}\right) =\frac{1}{M} {\displaystyle \sum \nolimits _{m=1}^{M}} \left[ c\left( F_{\varepsilon ^{1}}\left( \varepsilon _{it(m)}^{1}\right) ,\ldots ,F_{\varepsilon ^{J}}\left( \varepsilon _{it(m)}^{J}\right) ;R\right) \times {\displaystyle \prod \nolimits _{j=1}^{J}} f_{\varepsilon ^{j}}\left( \varepsilon _{it(m)}^{j}\right) \right] . \end{aligned}$$
(13)

For the special case when \(J=1\), Eq. (11) reduces to the single-equation case

$$\begin{aligned} f_{e^{j}}(e_{it}^{j})=\int f_{e^{j}|\alpha ^{j}}(e_{it}^{j}|\alpha _{i} ^{j})f_{\alpha ^{j}}(\alpha _{i}^{j})\mathrm{d}\alpha _{i}^{j}, \end{aligned}$$
(14)

which can be evaluated by the simulated density

$$\begin{aligned} f_{e^{j}}^{s}(e_{it}^{j})&=\frac{1}{M} {\displaystyle \sum \nolimits _{m=1}^{M}} f_{\varepsilon ^{j}}(e_{it}^{j}-\alpha _{i(m)}^{j}) \end{aligned}$$
(15)
$$\begin{aligned}&=\frac{1}{M} {\displaystyle \sum \nolimits _{m=1}^{M}} f_{\varepsilon ^{j}}(\varepsilon _{it(m)}^{j}), \end{aligned}$$
(16)

where \(\alpha _{i(m)}^{j}\) denotes the mth Halton draw from the distribution of \(\alpha _{i}^{j}\), \(N(0,\sigma _{\alpha j}^{2})\).
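The single-equation simulated density (15)–(16) can be sketched as follows, assuming numpy and scipy’s quasi–Monte Carlo module for the Halton draws. The function names are our own, and a scrambled Halton sequence is used so that no draw falls exactly on 0 or 1.

```python
import numpy as np
from scipy.stats import norm, qmc

def sf_density(eps, sigma_v, sigma_u):
    """Marginal density of eps = v - u, Eq. (2)."""
    sigma = np.sqrt(sigma_v**2 + sigma_u**2)
    lam = sigma_u / sigma_v
    return (2.0 / sigma) * norm.pdf(eps / sigma) * norm.cdf(-lam * eps / sigma)

def simulated_marginal_density(e, sigma_alpha, sigma_v, sigma_u, M=500, seed=0):
    """Eqs. (15)-(16): average f_eps(e - alpha_m) over Halton draws of alpha."""
    u01 = qmc.Halton(d=1, scramble=True, seed=seed).random(M).ravel()
    alpha = sigma_alpha * norm.ppf(u01)       # alpha_{i(m)} ~ N(0, sigma_alpha^2)
    return sf_density(e - alpha, sigma_v, sigma_u).mean()
```

As \(\sigma_{\alpha j}\to 0\) the simulated density collapses to the closed-form marginal density (2), which gives a simple check of the code.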

Now, let us consider the \(JT\times 1\) random vector \(e_{i.}=\left( e_{i1}^{\prime },\ldots ,e_{iT}^{\prime }\right) ^{\prime }\). The joint pdf of \(e_{i.}\) conditional on \(\alpha _{i}\) is

$$\begin{aligned} f_{e|\alpha }(e_{i.}|\alpha _{i})&=f_{e|\alpha }\left( e_{i1},\ldots ,e_{iT} |\alpha _{i}\right) \\&=f_{\varepsilon }\left( \varepsilon _{i.}\right) \\&= {\displaystyle \prod \nolimits _{t=1}^{T}} f_{\varepsilon }(\varepsilon _{it})\\&= {\displaystyle \prod \nolimits _{t=1}^{T}} f_{\varepsilon }\left( \varepsilon _{it}^{1},\ldots ,\varepsilon _{it}^{J}\right) . \end{aligned}$$

The third equality holds because \(e_{i1},\ldots ,e_{iT}\) are conditionally independent given \(\alpha _{i}\). It follows from (13) that the joint pdf of \(e_{i.}\) can be approximated by the simulated density

$$\begin{aligned} f_{e}^{s}(e_{i.})&=\frac{1}{M} {\displaystyle \sum \nolimits _{m=1}^{M}} \left[ {\displaystyle \prod \nolimits _{t=1}^{T}} f_{\varepsilon }(\varepsilon _{it(m)}^{1},\ldots ,\varepsilon _{it(m)}^{J})\right] \nonumber \\&=\frac{1}{M} {\displaystyle \sum \nolimits _{m=1}^{M}} {\displaystyle \prod \nolimits _{t=1}^{T}} \left[ c\left( F_{\varepsilon ^{1}}\left( \varepsilon _{it(m)}^{1}\right) ,\ldots ,F_{\varepsilon ^{J}}\left( \varepsilon _{it(m)}^{J}\right) ;R\right) \times {\displaystyle \prod \nolimits _{j=1}^{J}} f_{\varepsilon ^{j}}\left( \varepsilon _{it(m)}^{j}\right) \right] . \end{aligned}$$
(17)

The logarithm of the simulated likelihood function of the SF system is

$$\begin{aligned}&\ln L^{s}\left( \theta _{1},\ldots ,\theta _{J},R\right) \nonumber \\&\quad = {\displaystyle \sum \nolimits _{i=1}^{N}} \ln f_{e}^{s}(e_{i.})\nonumber \\&\quad = {\displaystyle \sum \nolimits _{i=1}^{N}} \ln \left\{ \frac{1}{M} {\displaystyle \sum \nolimits _{m=1}^{M}} {\displaystyle \prod \nolimits _{t=1}^{T}} \left[ \begin{array} [c]{c} c\left( F_{\varepsilon ^{1}}\left( \varepsilon _{it(m)}^{1}\right) ,\ldots ,F_{\varepsilon ^{J}}\left( \varepsilon _{it(m)}^{J}\right) ;R\right) \\ \times {\displaystyle \prod \nolimits _{j=1}^{J}} f_{\varepsilon ^{j}}\left( \varepsilon _{it(m)}^{j}\right) \end{array} \right] \right\} . \end{aligned}$$
(18)
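The simulated log-likelihood (18) can be coded directly for a two-division system with a Gaussian copula. The following is a sketch assuming numpy and scipy; the function names (`sim_loglik`, `_marg_pdf_cdf`) and the packing of the parameters are our own choices, the residuals \(e_{it}^{j}\) are taken as given, and for brevity the inefficiency variances are held constant rather than parametrized as \(\exp (\delta _{j}^{\prime }w_{it}^{j})\).

```python
import numpy as np
from scipy.stats import norm, multivariate_normal, qmc

def _marg_pdf_cdf(eps, sv, su):
    """Marginal pdf (2) and cdf (4) of eps = v - u, vectorized over draws."""
    s2 = sv**2 + su**2
    sigma, lam = np.sqrt(s2), su / sv
    pdf = (2.0 / sigma) * norm.pdf(eps / sigma) * norm.cdf(-lam * eps / sigma)
    a = lam**2 * s2 / (1.0 + lam**2)
    mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[s2, a], [a, a]])
    cdf = 2.0 * mvn.cdf(np.column_stack([eps, np.zeros_like(eps)]))
    return pdf, cdf

def sim_loglik(e1, e2, theta, M=100, seed=0):
    """Eq. (18) for J = 2 with a Gaussian copula.
    theta = (sa1, sa2, sv1, sv2, su1, su2, rho); e1, e2 are (N, T) residuals."""
    sa1, sa2, sv1, sv2, su1, su2, rho = theta
    N, T = e1.shape
    draws = qmc.Halton(d=2, scramble=True, seed=seed).random(M)
    a1, a2 = sa1 * norm.ppf(draws[:, 0]), sa2 * norm.ppf(draws[:, 1])
    ll = 0.0
    for i in range(N):
        pm = np.ones(M)                      # running product over t, one entry per draw m
        for t in range(T):
            f1, F1 = _marg_pdf_cdf(e1[i, t] - a1, sv1, su1)
            f2, F2 = _marg_pdf_cdf(e2[i, t] - a2, sv2, su2)
            z1 = norm.ppf(np.clip(F1, 1e-12, 1 - 1e-12))
            z2 = norm.ppf(np.clip(F2, 1e-12, 1 - 1e-12))
            # bivariate Gaussian copula density, Eq. (7)
            quad = (rho**2 * (z1**2 + z2**2) - 2 * rho * z1 * z2) / (1 - rho**2)
            c = np.exp(-0.5 * quad) / np.sqrt(1 - rho**2)
            pm *= c * f1 * f2
        ll += np.log(pm.mean())
    return ll
```

Maximizing `sim_loglik` over `theta` (e.g., with a general-purpose optimizer) yields the MSL estimator in (19); with `rho = 0` the copula density is one and the criterion reduces to the sum of the equation-by-equation simulated log-likelihoods.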

Let \(\theta =\left( \theta _{1}^{\prime },\ldots ,\theta _{J}^{\prime },R\right) ^{\prime }\) be the set of all parameters and let \(\Theta \) denote the parameter space, then the maximum simulated likelihood (MSL) estimator of \(\theta \) is defined as

$$\begin{aligned} {\widehat{\theta }}_{\text {MSL}}=\text {arg }\underset{\theta \in \Theta }{\text {Max}}\,\ln L^{s}\left( \theta \right) . \end{aligned}$$
(19)

Similar to the ML estimator in Sect. 3.1, we suggest using the sandwich formula to compute the standard errors of the MSL estimator.

Note that in the above discussion we assumed the random effects \(\alpha _{i}^{j}\)’s are independent of each other. In other words, in the MSL estimation we assume \(f_{\alpha }(\alpha _{i})= {\displaystyle \prod \nolimits _{j=1}^{J}} f_{\alpha ^{j}}(\alpha _{i}^{j})\) in Eq. (11) and draw \(\alpha _{i}^{j}\) for each j and i independently. We may generalize the above discussion by extending assumption [A1] to allow correlated random effects. We now focus on the marginal distribution of \(e_{it}^{j}\), \(f_{e^{j}}(e_{it}^{j})\). Recall that it can be represented as \(f_{e^{j}}(e_{it}^{j})=\int f_{e^{j}|\alpha ^{j}}(e_{it}^{j}|\alpha _{i}^{j})f_{\alpha ^{j}}(\alpha _{i}^{j})\mathrm{d}\alpha _{i}^{j}\) whether or not \(\alpha _{i}^{j}\) and \(\alpha _{i}^{j^{\prime }}\) are correlated with each other. Therefore, the marginal pdf \(f_{e^{j}}(e_{it}^{j})\) can be evaluated by the simulated pdf \(f_{e^{j}}^{s}\left( e_{it}^{j}\right) =\frac{1}{M} {\displaystyle \sum \nolimits _{m=1}^{M}} f_{\varepsilon ^{j}}(e_{it}^{j}-\alpha _{i(m)}^{j})=\frac{1}{M} {\displaystyle \sum \nolimits _{m=1}^{M}} f_{\varepsilon ^{j}}\left( \varepsilon _{it(m)}^{j}\right) \).

According to Sklar’s theorem, the joint pdf of \(e_{it}=\left( e_{it} ^{1},\ldots ,e_{it}^{J}\right) ^{\prime }\) can be represented as

$$\begin{aligned} f_{e}\left( e_{it}\right) =c\left( F_{e^{1}}\left( e_{it}^{1}\right) ,\ldots ,F_{e^{J}}\left( e_{it}^{J}\right) ;R\right) \times {\displaystyle \prod \nolimits _{j=1}^{J}} f_{e^{j}}\left( e_{it}^{j}\right) , \end{aligned}$$

where the dependence parameter R captures the correlation between the \(e_{it}^{j}\)’s. Even when \(\alpha _{i}^{j}\) and \(\alpha _{i}^{j^{\prime }}\) are correlated with each other, the following result holds for the conditional pdf of \(e_{it}^{j}\):

$$\begin{aligned} f_{e^{j}|\alpha }\left( e_{it}^{j}|\alpha _{i}^{1},\ldots ,\alpha _{i}^{J}\right) =f_{e^{j}}\left( e_{it}^{j}|\alpha _{i}^{j}\right) , \end{aligned}$$

which suggests that the information about \(\alpha _{i}^{j^{\prime }}\), where \(j^{\prime }\ne j\), is redundant for predicting \(e_{it}^{j}\) once we know \(\alpha _{i}^{j}\).

Given the information set \(\alpha _{i}=\left( \alpha _{i}^{1},\ldots ,\alpha _{i}^{J}\right) \), the conditional joint pdf \(f_{e|\alpha }\left( e_{it}|\alpha _{i}\right) \) can be represented as

$$\begin{aligned} f_{e|\alpha }\left( e_{it}|\alpha _{i}\right)&=c\left( F_{e^{1}}\left( e_{it}^{1}|\alpha _{i}\right) ,\ldots ,F_{e^{J}}\left( e_{it}^{J}|\alpha _{i}\right) ;R\right) \times {\displaystyle \prod \nolimits _{j=1}^{J}} f_{e^{j}}\left( e_{it}^{j}|\alpha _{i}\right) \nonumber \\&=c\left( F_{e^{1}}\left( e_{it}^{1}|\alpha _{i}^{1}\right) ,\ldots ,F_{e^{J} }\left( e_{it}^{J}|\alpha _{i}^{J}\right) ;R\right) \times {\displaystyle \prod \nolimits _{j=1}^{J}} f_{e^{j}}\left( e_{it}^{j}|\alpha _{i}^{j}\right) . \end{aligned}$$
(20)

Since \(e_{i1},\ldots ,e_{iT}\) are conditionally independent given \(\alpha _{i}\),

$$\begin{aligned} f_{e}(e_{i.})&=\int f_{e|\alpha }(e_{i.}|\alpha _{i})f_{\alpha }(\alpha _{i})\mathrm{d}\alpha _{i}\nonumber \\&=\int {\displaystyle \prod \nolimits _{t=1}^{T}} f_{e|\alpha }\left( e_{it}|\alpha _{i}\right) f_{\alpha }(\alpha _{i} )\mathrm{d}\alpha _{i}\nonumber \\&=\int {\displaystyle \prod \nolimits _{t=1}^{T}} \left[ c\left( F_{e^{1}}\left( e_{it}^{1}|\alpha _{i}^{1}\right) ,\ldots ,F_{e^{J}}\left( e_{it}^{J}|\alpha _{i}^{J}\right) ;R\right) \right. \nonumber \\&\quad \left. \times \, {\displaystyle \prod \nolimits _{j=1}^{J}} f_{e^{j}}\left( e_{it}^{j}|\alpha _{i}^{j}\right) \right] f_{\alpha } (\alpha _{i})\mathrm{d}\alpha _{i}, \end{aligned}$$
(21)

where the third equality is due to (20). Thus, (21) can be evaluated by the simulated density

$$\begin{aligned} f_{e}^{s}(e_{i.})= & {} \frac{1}{M} {\displaystyle \sum \nolimits _{m=1}^{M}} \left\{ {\displaystyle \prod \nolimits _{t=1}^{T}} \left[ c\left( F_{e^{1}}\left( e_{it}^{1}|\alpha _{i(m)}^{1}\right) ,\ldots ,F_{e^{J}}\left( e_{it}^{J}|\alpha _{i(m)}^{J}\right) ;R\right) \right. \right. \\&\left. \left. \times \, {\displaystyle \prod \nolimits _{j=1}^{J}} f_{e^{j}}\left( e_{it}^{j}|\alpha _{i(m)}^{j}\right) \right] \right\} . \end{aligned}$$

Although \(\alpha _{i(m)}^{j}\)’s are drawn independently, their dependence will be captured by the copula. The logarithm of the simulated likelihood function of the whole sample is

$$\begin{aligned} \ln L^{s}\left( \theta _{1},\ldots ,\theta _{J},R\right) = {\displaystyle \sum \nolimits _{i=1}^{N}} \ln f_{e}^{s}(e_{i.}), \end{aligned}$$
(22)

where \(f_{e}^{s}(e_{i.})\) is the simulated density given above, which approximates (21). Note that this simulated joint pdf and the simulated joint pdf in (17) are equivalent. In other words, the model is estimated in the same way whether the random effects \(\alpha _{i}\)’s are correlated or not. It is worth mentioning, however, that when the random effects are correlated, the copula captures not only the dependence between \(\varepsilon _{it}^{j}\) and \(\varepsilon _{it}^{j^{\prime }}\), but also the dependence between \(\alpha _{i}^{j}\) and \(\alpha _{i}^{j^{\prime }}\). Therefore, assumption [A1] can be relaxed by allowing possible correlation between the random effects, i.e., correlated \(\alpha _{i}^{j}\) and \(\alpha _{i}^{j^{\prime }}\) for \(j\ne j^{\prime }\). The MSL estimator from (22) remains consistent.

4 Prediction of the inefficiency and technical efficiency

The prediction of the inefficiency and technical efficiency (TE) can follow the simulated approach proposed by Lai and Kumbhakar (2018). For a given division j, let \(g(\cdot ):{\mathbb {R}}^{+}\rightarrow {\mathbb {R}}\) be a continuous function of \(u_{it}^{j}\), and suppose we are interested in the conditional expectation \({\mathbb {E}}\left( g(u_{it}^{j})|e_{it}^{j}\right) \). For instance, if \(g(u_{it}^{j})=u_{it}^{j}\), then \({\mathbb {E}}\left( g(u_{it} ^{j})|e_{it}^{j}\right) ={\mathbb {E}}\left( u_{it}^{j}|e_{it}^{j}\right) ;\) if \(g(u_{it}^{j})=e^{-u_{it}^{j}}\), then \({\mathbb {E}}\left( g(u_{it} ^{j})|e_{it}^{j}\right) ={\mathbb {E}}\left( e^{-u_{it}^{j}}|e_{it}^{j}\right) \).

The conditional expectation of the inefficiency given the composite error \(e_{it}^{j}\) is defined as

$$\begin{aligned} {\mathbb {E}}\left( u_{it}^{j}|e_{it}^{j}\right) =\int _{0}^{\infty }u_{it} ^{j}f_{u^{j}|e^{j}}\left( u_{it}^{j}|e_{it}^{j}\right) \mathrm{d}u_{it}^{j}, \end{aligned}$$

where

$$\begin{aligned} f_{u^{j}|e^{j}}\left( u_{it}^{j}|e_{it}^{j}\right)&=\frac{\int _{-\infty }^{\infty }f_{u^{j},e^{j},\alpha ^{j}}\left( u_{it}^{j},e_{it}^{j},\alpha _{i}^{j}\right) \mathrm{d}\alpha _{i}^{j}}{f_{e^{j}}\left( e_{it}^{j}\right) }\\&=\frac{\int _{-\infty }^{\infty }f_{u^{j}|e^{j},\alpha ^{j}}\left( u_{it} ^{j}|e_{it}^{j},\alpha _{i}^{j}\right) f_{e^{j}|\alpha ^{j}}\left( e_{it} ^{j}|\alpha _{i}^{j}\right) f_{\alpha ^{j}}\left( \alpha _{i}^{j}\right) \mathrm{d}\alpha _{i}^{j}}{\int _{-\infty }^{\infty }f_{e^{j}|\alpha ^{j}}\left( e_{it} ^{j}|\alpha _{i}^{j}\right) f_{\alpha ^{j}}\left( \alpha _{i}^{j}\right) \mathrm{d}\alpha _{i}^{j}}. \end{aligned}$$

Therefore, it follows that

$$\begin{aligned} {\mathbb {E}}\left( u_{it}^{j}|e_{it}^{j}\right)&=\int _{0}^{\infty } u_{it}^{j}\int _{-\infty }^{\infty }f_{u^{j}|e^{j},\alpha ^{j}}\left( u_{it} ^{j}|e_{it}^{j},\alpha _{i}^{j}\right) \frac{f_{e^{j}|\alpha ^{j}}\left( e_{it}^{j}|\alpha _{i}^{j}\right) f_{\alpha ^{j}}\left( \alpha _{i}^{j}\right) }{\int _{-\infty }^{\infty }f_{e^{j}|\alpha ^{j}}\left( e_{it}^{j}|\alpha _{i} ^{j}\right) f_{\alpha ^{j}}\left( \alpha _{i}^{j}\right) \mathrm{d}\alpha _{i}^{j} }\mathrm{d}\alpha _{i}^{j}\mathrm{d}u_{it}^{j} \end{aligned}$$
(23a)
$$\begin{aligned}&=\int _{-\infty }^{\infty }\left( \int _{0}^{\infty }u_{it}^{j}f_{u^{j} |e^{j},\alpha ^{j}}\left( u_{it}^{j}|e_{it}^{j},\alpha _{i}^{j}\right) \mathrm{d}u_{it}^{j}\right) \frac{f_{e^{j}|\alpha ^{j}}\left( e_{it}^{j}|\alpha _{i}^{j}\right) f_{\alpha ^{j}}\left( \alpha _{i}^{j}\right) }{\int _{-\infty }^{\infty }f_{e^{j}|\alpha ^{j}}\left( e_{it}^{j}|\alpha _{i}^{j}\right) f_{\alpha ^{j}}\left( \alpha _{i}^{j}\right) \mathrm{d}\alpha _{i}^{j}}\mathrm{d}\alpha _{i}^{j}. \end{aligned}$$
(23b)

Equation (23b) is related to the law of iterative expectation

$$\begin{aligned} {\mathbb {E}}\left( u_{it}^{j}|e_{it}^{j}\right) ={\mathbb {E}}\left[ \left. {\mathbb {E}}\left( u_{it}^{j}|e_{it}^{j},\alpha _{i}^{j}\right) \right| e_{it}^{j}\right] . \end{aligned}$$
(24)

Given \(f\left( u_{it}^{j}|e_{it}^{j},\alpha _{i}^{j}\right) =f\left( u_{it}^{j}|\varepsilon _{it}^{j}\right) \) and the well-known result

$$\begin{aligned} {\mathbb {E}}\left( u_{it}^{j}|\varepsilon _{it}^{j}\right) ={\widetilde{\mu }}_{it}^{j}+{\widetilde{\sigma }}_{it}^{j}\left[ \frac{{\phi \left( -{\widetilde{\mu }}_{it}^{j}/{\widetilde{\sigma }}_{it}^{j}\right) }}{{1-\Phi \left( -{\widetilde{\mu }}_{it}^{j}/{\widetilde{\sigma }}_{it}^{j}\right) }}\right] , \end{aligned}$$
(25)

where \(\varepsilon _{it}^{j}=v_{it}^{j}-u_{it}^{j}\), \(\widetilde{\mu }_{it}^{j}=-\varepsilon _{it}^{j}\sigma _{uj,it}^{2}/\left( \sigma _{uj,it}^{2}+\sigma _{vj}^{2}\right) \) and \({\widetilde{\sigma }} _{it}^{j2}=\sigma _{uj,it}^{2}\sigma _{vj}^{2}/\left( \sigma _{uj,it}^{2} +\sigma _{vj}^{2}\right) \), by assumptions [A2]–[A4] one may combine the results in Eqs. (24) and (25) to obtain the simulated estimator \({\mathbb {E}}^{s}\left( u_{it}^{j}|e_{it}^{j}\right) \) of \({\mathbb {E}}\left( u_{it}^{j}|e_{it}^{j}\right) \), which is defined as

$$\begin{aligned} {\mathbb {E}}^{s}\left( u_{it}^{j}|e_{it}^{j}\right) = {\displaystyle \sum \nolimits _{m=1}^{M}} \left\{ {\widetilde{\mu }}_{it(m)}^{j}+{\widetilde{\sigma }}_{it}^{j}\left[ \frac{{\phi \left( -{\widetilde{\mu }}_{it(m)}^{j}/{\widetilde{\sigma }}_{it} ^{j}\right) }}{{1-\Phi \left( -{\widetilde{\mu }}_{it(m)}^{j}/\widetilde{\sigma }_{it}^{j}\right) }}\right] \right\} W_{it(m)}^{j}, \end{aligned}$$
(26)

where \({\widetilde{\mu }}_{it(m)}^{j}=- \varepsilon _{it(m)}^{j} \sigma _{uj,it}^{2}/\left( \sigma _{uj,it}^{2}+\sigma _{vj}^{2}\right) \), \(\varepsilon _{it(m)}^{j}=e_{it}^{j}-\alpha _{i(m)}^{j}\) and the weight \(W_{it(m)}^{j}\) is defined as

$$\begin{aligned} W_{it(m)}^{j}=\frac{f_{\varepsilon ^{j}}(e_{it}^{j}-\alpha _{i(m)}^{j})}{\sum _{m=1}^{M}f_{\varepsilon ^{j}}(e_{it}^{j}-\alpha _{i(m)}^{j})} =\frac{f_{\varepsilon ^{j}}(\varepsilon _{it(m)}^{j})}{\sum _{m=1}^{M} f_{\varepsilon ^{j}}(\varepsilon _{it(m)}^{j})}. \end{aligned}$$
(27)

Note that the weights \(W_{it(m)}^{j}\) for \(m=1,\ldots , M\) sum to one, so the estimator (26) is a weighted average that serves as the simulated estimator of the inefficiency \({\mathbb {E}}\left[ u_{it}^{j} |e_{it}^{j}\right] \).
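To fix ideas, the following Python sketch (not the authors' code) evaluates the simulated estimator (26)–(27) for a single observation, combining the conditional mean (25) with weights built from the normal/half-normal marginal density of \(\varepsilon _{it}^{j}\). Pseudo-random normal draws stand in for the Halton draws used in the paper, and all parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def sf_density(eps, sigma_u, sigma_v):
    # Marginal density of eps = v - u with v ~ N(0, s_v^2), u ~ N+(0, s_u^2)
    sigma = np.sqrt(sigma_u**2 + sigma_v**2)
    lam = sigma_u / sigma_v
    return 2.0 / sigma * norm.pdf(eps / sigma) * norm.cdf(-lam * eps / sigma)

def simulated_inefficiency(e_it, sigma_u, sigma_v, sigma_alpha, M=300, seed=0):
    """Simulated estimator of E(u_it | e_it) in the spirit of Eq. (26)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(0.0, sigma_alpha, size=M)      # draws of alpha_i^(m)
    eps_m = e_it - alpha                              # eps_it(m) = e_it - alpha_(m)
    dens = sf_density(eps_m, sigma_u, sigma_v)
    w = dens / dens.sum()                             # weights of Eq. (27), sum to one
    s2u, s2v = sigma_u**2, sigma_v**2
    mu_t = -eps_m * s2u / (s2u + s2v)                 # mu-tilde_it(m)
    sig_t = np.sqrt(s2u * s2v / (s2u + s2v))          # sigma-tilde_it
    z = -mu_t / sig_t
    cond_mean = mu_t + sig_t * norm.pdf(z) / (1.0 - norm.cdf(z))  # Eq. (25)
    return float(np.dot(cond_mean, w))

u_hat = simulated_inefficiency(e_it=-0.3, sigma_u=0.5, sigma_v=0.15, sigma_alpha=0.1)
```

As expected, a more negative composite residual \(e_{it}^{j}\) yields a larger predicted inefficiency, since \({\widetilde{\mu }}_{it(m)}^{j}\) increases as \(\varepsilon _{it(m)}^{j}\) decreases.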

Moreover, given the result

$$\begin{aligned} {\mathbb {E}}\left( e^{-u_{it}^{j}}|\varepsilon _{it}^{j}\right) =\frac{1-\Phi \left( {\widetilde{\sigma }}_{it}^{j}-{\widetilde{\mu }}_{it}^{j}/{\widetilde{\sigma }}_{it}^{j}\right) }{1-\Phi \left( -{\widetilde{\mu }}_{it}^{j} /{\widetilde{\sigma }}_{it}^{j}\right) }\exp \left( -{\widetilde{\mu }}_{it} ^{j}+\frac{1}{2}{\widetilde{\sigma }}_{it}^{j2}\right) , \end{aligned}$$
(28)

and

$$\begin{aligned} {\mathbb {E}}\left( e^{-u_{it}^{j}}|e_{it}^{j}\right) ={\mathbb {E}}\left[ \left. {\mathbb {E}}\left( e^{-u_{it}^{j}}|e_{it}^{j},\alpha _{i}^{j}\right) \right| e_{it}^{j}\right] , \end{aligned}$$
(29)

one can obtain the simulated estimator of TE,

$$\begin{aligned} {\mathbb {E}}^{s}\left( e^{-u_{it}^{j}}|e_{it}^{j}\right) = {\displaystyle \sum \nolimits _{m=1}^{M}} \left[ \frac{1-\Phi \left( {\widetilde{\sigma }}_{it}^{j}-\widetilde{\mu }_{it(m)}^{j}/{\widetilde{\sigma }}_{it}^{j}\right) }{1-\Phi \left( -{\widetilde{\mu }}_{it(m)}^{j}/{\widetilde{\sigma }}_{it}^{j}\right) }\exp \left( -{\widetilde{\mu }}_{it(m)}^{j}+\frac{1}{2}{\widetilde{\sigma }}_{it}^{j2}\right) \right] W_{it(m)}^{j}, \end{aligned}$$
(30)

by the same approach used in (26).

In a manner similar to (26) and (30), we can also compute the conditional expectation of the marginal effect of \(w_{it,k}^{j}\) on \({\mathbb {E}}\left( u_{it}^{j}|e_{it}^{j}\right) \). If \(w_{it,k}^{j}\) is a continuous variable, by the law of iterated expectations and the formula

$$\begin{aligned} \frac{\partial {\mathbb {E}}\left( u_{it}^{j}|\varepsilon _{it}^{j}\right) }{\partial w_{it,k}^{j}}=\delta _{j,k}\sigma _{uj,it}\left[ \begin{array} [c]{c} \left( 1+\left( \frac{{\widetilde{\mu }}_{it}^{j}}{{\widetilde{\sigma }}_{it}^{j} }\right) ^{2}\right) \left( \frac{\phi \left( -{\widetilde{\mu }}_{it} ^{j}/{\widetilde{\sigma }}_{it}^{j}\right) }{1-\Phi \left( -{\widetilde{\mu }} _{it}^{j}/{\widetilde{\sigma }}_{it}^{j}\right) }\right) \\ +\left( \frac{{\widetilde{\mu }}_{it}^{j}}{{\widetilde{\sigma }}_{it}^{j}}\right) \left( \frac{\phi \left( -{\widetilde{\mu }}_{it}^{j}/{\widetilde{\sigma }} _{it}^{j}\right) }{1-\Phi \left( -{\widetilde{\mu }}_{it}^{j}/\widetilde{\sigma }_{it}^{j}\right) }\right) ^{2} \end{array} \right] \end{aligned}$$
(31)

one can obtain the simulated estimator

$$\begin{aligned} \frac{\partial {\mathbb {E}}^{s}\left( u_{it}^{j}|e_{it}^{j}\right) }{\partial w_{it,k}^{j}}=\sum _{m=1}^{M}\delta _{j,k}\sigma _{uj,it}\left[ \begin{array} [c]{c} \left( 1+\left( \frac{{\widetilde{\mu }}_{it(m)}^{j}}{{\widetilde{\sigma }} _{it}^{j}}\right) ^{2}\right) \left( \frac{\phi \left( -{\widetilde{\mu }}_{it(m)}^{j}/{\widetilde{\sigma }}_{it}^{j}\right) }{1-\Phi \left( -{\widetilde{\mu }}_{it(m)}^{j}/{\widetilde{\sigma }}_{it}^{j}\right) }\right) \\ +\left( \frac{{\widetilde{\mu }}_{it(m)}^{j}}{{\widetilde{\sigma }}_{it}^{j} }\right) \left( \frac{\phi \left( -{\widetilde{\mu }}_{it(m)}^{j} /{\widetilde{\sigma }}_{it}^{j}\right) }{1-\Phi \left( -{\widetilde{\mu }} _{it(m)}^{j}/{\widetilde{\sigma }}_{it}^{j}\right) }\right) ^{2} \end{array} \right] W_{it(m)}^{j}. \end{aligned}$$
(32)

Moreover, if \(w_{it,k}^{j}\) is a dummy variable, the conditional expectation of the marginal effect of \(w_{it,k}^{j}\) on \({\mathbb {E}}\left( u_{it} ^{j}|e_{it}^{j}\right) \) is

$$\begin{aligned} \frac{\Delta {\mathbb {E}}\left( u_{it}^{j}|e_{it}^{j}\right) }{\Delta w_{it,k}^{j}}={\mathbb {E}}\left( u_{it}^{j}|e_{it}^{j},w_{it,k}^{j}=1\right) -{\mathbb {E}}\left( u_{it}^{j}|e_{it}^{j},w_{it,k}^{j}=0\right) , \end{aligned}$$
(33)

where the estimators of \({\mathbb {E}}\left( u_{it}^{j}|e_{it}^{j},w_{it,k} ^{j}=1\right) \) and \({\mathbb {E}}\left( u_{it}^{j}|e_{it}^{j},w_{it,k} ^{j}=0\right) \) can be obtained from Eqs. (26) and (27) by substituting \(w_{it,k}^{j}=1\) and \(w_{it,k}^{j}=0\), respectively. Therefore, the inefficiency, the TE, and the marginal effect of \(w_{it,k}^{j}\) on the inefficiency can be estimated using (26), (30), (32) and (33), with the parameters replaced by the MSL estimates obtained from (19).

5 Simulation

In this section, we conduct a Monte Carlo simulation to investigate the finite sample performance of the proposed simulated estimator. We consider the following data generating process (DGP) for a system of two equations:

$$\begin{aligned} y_{it}^{1}&=\beta _{0}^{1}+x_{it}^{1}\beta _{1}^{1}+\alpha _{i} ^{1}+\varepsilon _{it}^{1},\\ y_{it}^{2}&=\beta _{0}^{2}+x_{it}^{2}\beta _{1}^{2}+\alpha _{i} ^{2}+\varepsilon _{it}^{2}, \end{aligned}$$

where \(\varepsilon _{it}^{j}=v_{it}^{j}-u_{it}^{j}\), \(v_{it}^{j} \sim i.i.d.~N(0,\sigma _{vj}^{2})\), and \(u_{it}^{j}\sim N^{+}(0,\sigma _{uj,it}^{2})\) for \(j=1,2\). \(\alpha _{i}^{1}\) and \(\alpha _{i}^{2}\) are generated from a bivariate normal distribution with zero means, correlation coefficient \(\varphi \), and variances \(\sigma _{\alpha j}^{2}\), \(j=1,2\). The heteroscedasticity of \(u_{it}^{j}\) is specified as \(\sigma _{uj,it}=\exp (\delta _{0}^{j}+\delta _{1}^{j}w_{it}^{j})\). The true parameters are set as

$$\begin{aligned} \beta _{0}^{1}=0.75\text {, }\beta _{1}^{1}=0.75\text {, }\sigma _{\alpha 1}=0.1\text {, }\sigma _{v1}=0.15\text {, }\delta _{0}^{1}=-0.5\text {, }\delta _{1}^{1}=0.1\text {, } \end{aligned}$$

and

$$\begin{aligned} \beta _{0}^{2}=1.5\text {, }\beta _{1}^{2}=0.5\text {, }\sigma _{\alpha 2}=0.15\text {, }\sigma _{v2}=0.1\text {, }\delta _{0}^{2}=-0.75\text {, } \delta _{1}^{2}=0.5\text {.} \end{aligned}$$

The exogenous variable \(x_{it}^{1}\) is drawn from Uniform[5, 10], \(w_{it}^{1}\) from \(N(-1,0.5^{2})\), \(x_{it}^{2}\) from Uniform[2, 5], and \(w_{it}^{2}\) from \(N(0,0.4^{2})\). Given the above marginal distributions of \(v_{it}^{j}\) and \(u_{it}^{j}\), we generate the composite errors \(\varepsilon _{it}^{1}\) and \(\varepsilon _{it}^{2}\) from the Gaussian copula with copula parameter \(\rho \).Footnote 4 Under this setting for the \(\delta \)'s, the sample means of \(\sigma _{u1,it}\) and \(\sigma _{u2,it}\) are about 0.549 and 0.472, and their standard deviations are about 0.029 and 0.028, respectively.
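Composite errors with the required marginals can be drawn from a Gaussian copula by first generating correlated uniforms and then pushing them through the inverse of each marginal cdf. The Python sketch below illustrates one replication; it is a simplified stand-in for the paper's DGP, inverting the normal/half-normal cdf numerically and holding \(\sigma _{uj,it}\) at its sample mean rather than letting it vary with \(w_{it}^{j}\).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

def sf_quantile(p, sigma_u, sigma_v):
    """Numerically invert the cdf of eps = v - u, with v ~ N(0, s_v^2)
    and u ~ N+(0, s_u^2) (normal/half-normal composite error)."""
    grid = np.linspace(-4.0, 2.0, 6001)
    sigma = np.sqrt(sigma_u**2 + sigma_v**2)
    lam = sigma_u / sigma_v
    pdf = 2.0 / sigma * norm.pdf(grid / sigma) * norm.cdf(-lam * grid / sigma)
    cdf = np.cumsum(pdf) * (grid[1] - grid[0])
    return np.interp(p, cdf / cdf[-1], grid)

# Step 1: correlated uniforms from the Gaussian copula with parameter rho
rho, n = 0.75, 100 * 5                 # e.g. N = 100, T = 5
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
p1, p2 = norm.cdf(z[:, 0]), norm.cdf(z[:, 1])

# Step 2: map the uniforms through the inverse marginal cdfs
# (sigma_u fixed at the sample means 0.549 and 0.472 reported above)
eps1 = sf_quantile(p1, sigma_u=0.549, sigma_v=0.15)
eps2 = sf_quantile(p2, sigma_u=0.472, sigma_v=0.10)
```

The resulting draws have negative means (since \({\mathbb {E}}(\varepsilon _{it}^{j})=-{\mathbb {E}}(u_{it}^{j})<0\)) and a rank correlation close to the value implied by the Gaussian copula parameter.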

Below, we consider three experiments, labeled Experiment I, Experiment II and Experiment III. Experiment I investigates the finite sample performance of the MSL estimator discussed in Sect. 3.2 and compares the MSL estimators with and without taking the dependence between divisions into account in estimation. Experiment II investigates the consequences of a misspecified copula. Experiment III examines the finite sample performance of the estimator when the random effects are correlated. In all experiments, the total number of replications is 500 and 300 Halton draws are used.

In Experiment I, we investigate the sampling patterns of our estimator by considering the following different combinations of N, T and \(\rho \):

$$\begin{aligned} N=\{100,200\},\quad T=\{5,10\}\text { and }\rho =\{0.25,0.75\}. \end{aligned}$$

In the first experiment, we let the random effects be uncorrelated, i.e., we set \(\varphi =0\). In addition to estimating the model under the Gaussian copula specification, we also estimate the model using an independent copula, which ignores the correlation between the two equations, in order to compare the finite sample performances of these estimators. We summarize the biases and root mean squared errors (RMSE) in Tables 1 and 2, respectively. The results of independent and Gaussian copulas are given in Panels A and B of these tables.

Table 1 Biases of the Monte Carlo experiments (DGP is Gaussian copula and \(\alpha _{i}^{\prime }\)s are independent)
Table 2 RMSEs of the Monte Carlo experiments (DGP is Gaussian copula and \(\alpha _{i}^{\prime }\)s are independent)

From Tables 1 and 2, we find that the magnitudes of the biases and RMSEs are small for both copulas considered in these experiments. We do not see any clear pattern that the biases under the Gaussian copula are smaller than those under the independent copula, but most of the biases in Panels A and B decrease as the sample size increases. Table 2 further reveals that the RMSEs decrease when we increase either N or T under both the independent and the Gaussian copula specifications, and the RMSEs of the independent copula exceed those of the Gaussian copula as N or T increases. In summary, the results in Tables 1 and 2 provide evidence that the MSL estimators are consistent under both the independent and the Gaussian copula specifications, and that the MSL estimator with the Gaussian copula is more efficient than that with the independent copula.

In Experiment II, we reestimate the above model using the Clayton, Gumbel, FGM (Farlie–Gumbel–Morgenstern) and AMH (Ali–Mikhail–Haq)Footnote 5 copulas for the sample size \(N=100\) and \(T=5\) in order to investigate how the MSL estimator performs under a misspecified copula. The Clayton and Gumbel copulas are Archimedean copulas, so they can be represented in the form

$$\begin{aligned} C_\mathrm{Arch}(\zeta _{1},\zeta _{2})=\psi ^{-1}\left[ \psi (\zeta _{1})+\psi (\zeta _{2})\right] , \end{aligned}$$

where \(\psi :[0,1]\rightarrow {\mathbb {R}}^{+}\) is called the Archimedean generator. \(\psi (\cdot )\) is a continuous, strictly decreasing, and convex function and satisfies \(\psi (0)=\infty \) and \(\psi (1)=0\). Moreover, the two arguments \(\zeta _{j,it}\) are defined as \(\zeta _{j,it}=F_{\varepsilon ^{j} }\left( \varepsilon _{it}^{j}\right) \) for \(j=1,2\) in our model. The corresponding density of the Archimedean copula is

$$\begin{aligned} c_\mathrm{Arch}(\zeta _{1},\zeta _{2})=\frac{-\psi ^{\prime \prime }\left( C_\mathrm{Arch} (\zeta _{1},\zeta _{2})\right) \psi ^{\prime }(\zeta _{1})\psi ^{\prime }(\zeta _{2})}{\left[ \psi ^{\prime }\left( C_\mathrm{Arch}(\zeta _{1},\zeta _{2})\right) \right] ^{3}}. \end{aligned}$$
  (i)

    If \(\psi (\zeta )=\zeta ^{-a}-1\), we obtain the Clayton copula, which has the form

    $$\begin{aligned} C_\mathrm{Clay}(\zeta _{1},\zeta _{2})=\left( \zeta _{1}^{-a}+\zeta _{2}^{-a}-1\right) ^{-1/a}, \end{aligned}$$

    where \(0<a<\infty \) controls the strength of dependence. As \(a\rightarrow 0\), the copula approaches independence, and as \(a\rightarrow \infty \), it approaches perfect dependence.

  (ii)

    If \(\psi (\zeta )=(-\ln \zeta )^{a}\), we obtain the Gumbel copula, which has the form

    $$\begin{aligned} C_\mathrm{Gum}(\zeta _{1},\zeta _{2})=\exp \left\{ -\left[ \left( -\ln \zeta _{1}\right) ^{a}+\left( -\ln \zeta _{2}\right) ^{a}\right] ^{1/a}\right\} , \end{aligned}$$

    where \(a\ge 1\) controls the strength of dependence. When \(a=1\), there is no dependence, and as \(a\rightarrow \infty \), the copula approaches perfect dependence.

  (iii)

    The third copula we considered is the FGM copula, which is defined as

    $$\begin{aligned} C_\mathrm{FGM}(\zeta _{1},\zeta _{2})=\zeta _{1}\zeta _{2}\left[ 1+\kappa _{F} (1-\zeta _{1})(1-\zeta _{2})\right] , \end{aligned}$$

    where \(-1\le \kappa _{F}\le 1\) is the copula parameter and \(\zeta _{1}\) and \(\zeta _{2}\) are defined as before. The corresponding copula density is

    $$\begin{aligned} c_\mathrm{FGM}(\zeta _{1},\zeta _{2})=1+\kappa _{F}(1-2\zeta _{1})(1-2\zeta _{2}). \end{aligned}$$

    The Spearman’s \(\rho \) of the FGM copula is \(\kappa _{F}/3\), so it ranges between \(-1/3\) and 1/3.

  (iv)

    The last copula is the AMH copula, whose copula function is defined as

    $$\begin{aligned} C_\mathrm{AMH}(\zeta _{1},\zeta _{2})=\frac{\zeta _{1}\zeta _{2}}{1-\kappa _{A} (1-\zeta _{1})(1-\zeta _{2})}, \end{aligned}$$

    where \(-1\le \kappa _{A}\le 1\) is the copula parameter and \(\zeta _{1}\) and \(\zeta _{2}\) are defined as before. The AMH copula density is

    $$\begin{aligned} c_\mathrm{AMH}(\zeta _{1},\zeta _{2})=\frac{1+\kappa _{A}\left[ \zeta _{1}\zeta _{2}+\zeta _{1}+\zeta _{2}-2+\kappa _{A}(1-\zeta _{1})(1-\zeta _{2})\right] }{\left[ 1-\kappa _{A}(1-\zeta _{1})(1-\zeta _{2})\right] ^{3}}. \end{aligned}$$

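As a quick check on the formulas above, the following Python sketch builds the Clayton and Gumbel copulas from their Archimedean generators and compares them with the closed forms given in the text, and it evaluates the FGM density. It is an illustration, not part of the estimation code, and the evaluation points are arbitrary.

```python
import numpy as np

# Archimedean construction: C(z1, z2) = psi^{-1}(psi(z1) + psi(z2))
def clayton_psi(z, a):    return z**(-a) - 1.0
def clayton_inv(t, a):    return (1.0 + t)**(-1.0 / a)
def gumbel_psi(z, a):     return (-np.log(z))**a
def gumbel_inv(t, a):     return np.exp(-t**(1.0 / a))

def archimedean(z1, z2, psi, inv, a):
    return inv(psi(z1, a) + psi(z2, a), a)

# Closed forms as given in the text
def clayton(z1, z2, a):
    return (z1**(-a) + z2**(-a) - 1.0)**(-1.0 / a)

def gumbel(z1, z2, a):
    return np.exp(-((-np.log(z1))**a + (-np.log(z2))**a)**(1.0 / a))

def fgm_density(z1, z2, k):
    return 1.0 + k * (1.0 - 2.0 * z1) * (1.0 - 2.0 * z2)

z1, z2 = 0.3, 0.8
c_clay = archimedean(z1, z2, clayton_psi, clayton_inv, a=2.0)  # equals clayton(z1, z2, 2.0)
c_gum = archimedean(z1, z2, gumbel_psi, gumbel_inv, a=1.5)     # equals gumbel(z1, z2, 1.5)
```

Setting \(a=1\) in the Gumbel copula recovers the independence copula \(\zeta _{1}\zeta _{2}\), consistent with the statement that \(a=1\) implies no dependence.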
In the DGP, the true copula is a Gaussian copula and we consider \(\rho =0.25\) and 0.75. We summarize the biases and RMSEs of our estimates in Table 3. Since the dependence parameters of the Clayton, Gumbel, FGM and AMH copulas are not directly comparable with the Gaussian copula parameter \(\rho \), we do not report the biases and RMSEs of the dependence parameters. We find that the bias of \(\beta _{0}^{1}\) is relatively large when the Gumbel copula is used and \(\rho =0.25\). However, the biases of the other parameter estimates are quite close to those of the Gaussian copula in Tables 1 and 2. Most of the estimates have biases slightly larger than those from the Gaussian copula, but they are also of small magnitude. Moreover, there is no clear evidence that the estimators under the misspecified copulas have larger biases than that under the independent copula. As for the RMSEs, some RMSEs from the misspecified copulas are slightly larger than those from the Gaussian copula, but there is no clear pattern. On the other hand, it is worth mentioning that the RMSEs of the independent copula are in general larger than those of the other copulas, which suggests that using a copula to capture the dependence between the divisions improves the estimation efficiency even if the copula is misspecified. From this experiment, we conclude that if a misspecified copula is used, the ML estimator may have a slightly larger bias and RMSE than that from the correctly specified copula, but the problem may not be as serious as one might expect.

Table 3 Bias and RMSE when different copulas are used (\(N=100\), \(T=5\), DGP is Gaussian copula and \(\alpha _{i}^{\prime }\)s are independent)
Table 4 Bias when \(\alpha _{i}^{\prime }\)s are correlated (DGP is Gaussian copula)

In Experiment III, we allow correlation between the random effects; i.e., \(\alpha _{i}^{1}\) and \(\alpha _{i}^{2}\) are correlated with correlation coefficient \(\varphi \). Similar to Tables 1 and 2, we consider the independent and Gaussian copulas for \((N,T)=\{(100,5),(200,10)\}\) and \(\rho =\{0.25,0.75\}\). To investigate how the degree of correlation between the random effects affects the performance of the MSL estimator, we set \(\varphi =\{0.25,0.75\}\) and compare the differences. We summarize the simulation results in Tables 4 and 5. It is worth mentioning that the same model specification is used in Experiments I and III; their DGPs differ only in the value of \(\varphi \), which is zero in Experiment I. Similar to what we observed in Table 1, the biases from the independent copula are quite close to those of the Gaussian copula. This is because both the independent and Gaussian copulas give consistent estimators whether or not the \(\alpha _{i}^{j}\)'s are correlated. However, it is clear that the estimator using the Gaussian copula is more efficient than the estimator using the independent copula in terms of RMSE. Comparing the biases in Tables 1 and 4, we find that the biases of \(\rho \) in Table 4 are slightly larger than those in Table 1 because the correlation between the random effects has been picked up by the copula parameter in the model. Moreover, we also find that the estimator using the independent copula loses efficiency as the degree of dependence between the \(\varepsilon _{it}^{j}\)'s (or \(\alpha _{i}^{j}\)'s) increases.

Table 5 RMSE when \(\alpha _{i}^{\prime }\)s are correlated (DGP is Gaussian copula)
Table 6 Empirical results

In summary, we find that using either the independent or the Gaussian copula gives a consistent estimator of the model parameters. Misspecifying the copula may cause estimation bias, but the problem may not be too serious. Even though we do not estimate the correlation coefficient \(\varphi \) between the random effects in the model, the MSL estimator is still consistent: the correlation is automatically captured by the copula parameter, which maintains the estimation efficiency.

6 The empirical application

This paper applies the proposed method to the production frontier of Taiwan’s international hotels; each hotel has two divisions, accommodation and restaurant, i.e., \(J=2\) for the system in (1). The data are derived from the annual report of the Taiwan Tourism Bureau at the Ministry of Transportation and Communications. Our sample is an unbalanced panel containing 725 observations on 61 international grand hotels during 2001–2013. The minimum observed time span is 6 years, while the maximum is 13 years.

For the accommodation division, the output is measured by total revenue (\(y_{1}\)), while the inputs include the total number of workers (\(x_{11}\)), the total number of rooms (\(x_{12}\)), and other expenses (\(x_{13}\)), which include utilities, materials, maintenance fees and so on. The output and inputs are allocable within the accommodation division. The output of the restaurant division is also measured by total revenue (\(y_{2}\)), while the corresponding inputs are the total number of workers (\(x_{21}\)), the floor area of the restaurant (\(x_{22}\)) and other expenses (\(x_{23}\)), including utilities, materials and so on. Again, the output and inputs are attributable and accountable within the restaurant division. All revenues and other expenses are measured in New Taiwan dollars (NT$), and logarithms are applied to the outputs and inputs. The exogenous determinants of the inefficiencies of the two divisions include the scale of the hotel (\(w_{1}\)) and an area dummy variable (\(w_{2}\)). The scale variable \(w_{1}\) ranges from 1 to 5.Footnote 6 The dummy variable \(w_{2}\) equals one if the hotel is located in a scenic area and zero otherwise. Since the two divisions of a hotel share certain common characteristics, such as the same DMU, brand and location, we expect the two outputs \(y_{1}\) and \(y_{2}\), as well as the composite errors, to be correlated with each other.

The empirical model is specified as

$$\begin{aligned} y_{it}^{j}=\beta _{0}^{j}+\beta _{1}^{j}x_{1,it}^{j}+\beta _{2}^{j}x_{2,it} ^{j}+\beta _{3}^{j}x_{3,it}^{j}+\alpha _{i}^{j}+\varepsilon _{it}^{j},\text { }j=1,2. \end{aligned}$$
(34)

To obtain the likelihood function of the empirical model, we differentiate Eq. (8) with respect to \(\varepsilon _{it}^{1}\) and \(\varepsilon _{it}^{2}\) and obtain the joint pdf

$$\begin{aligned} f_{\varepsilon }\left( \varepsilon _{it}^{1},\varepsilon _{it}^{2}\right) =c\left( F_{\varepsilon ^{1}}\left( \varepsilon _{it}^{1}\right) ,F_{\varepsilon ^{2}}\left( \varepsilon _{it} ^{2}\right) \right) f_{\varepsilon ^{1}}\left( \varepsilon _{it}^{1}\right) f_{\varepsilon ^{2}}\left( \varepsilon _{it}^{2}\right) , \end{aligned}$$
(35)

where \(c(F_{\varepsilon ^{1}}(\varepsilon _{it}^{1}),F_{\varepsilon ^{2} }(\varepsilon _{it}^{2}))=\partial ^{2}C(F_{\varepsilon ^{1}}(\varepsilon _{it}^{1}),F_{\varepsilon ^{2}}(\varepsilon _{it}^{2}))/ \partial F_{\varepsilon ^{1}}(\varepsilon _{it}^{1})\partial F_{\varepsilon ^{2} }(\varepsilon _{it}^{2})\) is the copula density.

Table 7 Predicted Inefficiencies, TEs and marginal effects

For the sake of comparison, we consider three different model specifications: Model (I) is the separate estimation, where each division is estimated by a single-equation stochastic frontier model with random effects. Model (II) uses the joint estimation, where the joint pdf of (35) and an independent copula are used. For the independent copula, we have \(c(F_{\varepsilon ^{1}}(\varepsilon _{it}^{1}),F_{\varepsilon ^{2} }(\varepsilon _{it}^{2}))=1\) and, therefore,

$$\begin{aligned} f_{\varepsilon }(\varepsilon _{it}^{1},\varepsilon _{it}^{2})=f_{\varepsilon ^{1} }\left( \varepsilon _{it}^{1}\right) f_{\varepsilon ^{2}}\left( \varepsilon _{it}^{2}\right) . \end{aligned}$$
(36)

The independent copula implies that the two divisions are uncorrelated, so the correlation between the two divisions is ignored in the empirical model. It is worth mentioning that the simulated likelihood functions of Models (I) and (II) are different since their simulated likelihood functions are evaluated in different ways. In Model (I), the log-likelihood function of the system is the sum of the logarithms of the simulated likelihood functions of each single equation. For Model (II), it is the logarithm of the simulated product of the marginal pdfs. We may expect that both of them give consistent estimators of the parameters as long as the marginal probability model is correctly specified. Model (III) uses the joint pdf in (35) and the Gaussian copula, where the correlation between the two divisions is captured by the copula parameter \(\rho \). Under the Gaussian copula assumption, the joint pdf of \(\varepsilon _{it}^{1}\) and \(\varepsilon _{it}^{2}\) is

$$\begin{aligned} f_{\varepsilon }\left( \varepsilon _{it}^{1},\varepsilon _{it}^{2}\right) =c\left( F_{\varepsilon ^{1}}\left( \varepsilon _{it}^{1}\right) ,F_{\varepsilon ^{2}}\left( \varepsilon _{it} ^{2}\right) \right) f_{\varepsilon ^{1}}\left( \varepsilon _{it}^{1}\right) f_{\varepsilon ^{2}}\left( \varepsilon _{it}^{2}\right) , \end{aligned}$$
(37)

where

$$\begin{aligned} c\left( \varsigma _{it}^{1},\varsigma _{it}^{2}\right) =\frac{1}{\sqrt{1-\rho ^{2}}}\exp \left( \frac{\left( \varsigma _{it}^{1}\right) ^{2}+\left( \varsigma _{it}^{2}\right) ^{2}}{2}+\frac{2\rho \varsigma _{it}^{1} \varsigma _{it}^{2}-\left( \varsigma _{it}^{1}\right) ^{2}-\left( \varsigma _{it}^{2}\right) ^{2}}{2\left( 1-\rho ^{2}\right) }\right) , \end{aligned}$$
(38)

\(\varsigma _{it}^{j}=\Phi ^{-1}\left( r_{it}^{j}\right) =\Phi ^{-1}\left( F_{\varepsilon ^{j}}\left( \varepsilon _{it}^{j}\right) \right) \) and \(r_{it}^{j}=F_{\varepsilon ^{j}}(\varepsilon _{it}^{j})\) for \(j=1,2\). Under the Gaussian copula, the linear correlation between \(F_{\varepsilon ^{1}}\) and \(F_{\varepsilon ^{2}}\) is

$$\begin{aligned} \gamma _{\text {Spearman}}=\frac{6}{\pi }\arcsin \frac{\rho }{2}, \end{aligned}$$
(39)

which measures the correlation of the two divisions in terms of the cdfs of \(\varepsilon _{it}^{1}\) and \(\varepsilon _{it}^{2}\) and is also called the Spearman’s rank correlation coefficient of \(\varepsilon _{it}^{1}\) and \(\varepsilon _{it}^{2}\).
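Equation (39) is straightforward to evaluate. The short Python sketch below converts a Gaussian copula parameter into the implied Spearman rank correlation, using as input the estimate \({\widehat{\rho }}=0.8431\) reported below for the empirical model.

```python
import math

def spearman_from_gaussian_rho(rho):
    """Spearman's rank correlation implied by a Gaussian copula, Eq. (39)."""
    return (6.0 / math.pi) * math.asin(rho / 2.0)

# The estimated copula parameter rho-hat = 0.8431 implies a Spearman
# rank correlation of about 0.8311, matching the value reported below.
gamma = spearman_from_gaussian_rho(0.8431)
```

Note that the mapping sends \(\rho =0\) to 0 and \(\rho =1\) to 1, so the rank correlation inherits the sign and ordering of the copula parameter.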

We summarize the simulated ML estimates of Models (I), (II) and (III) in Table 6. As expected, the estimated parameters of Models (I) and (II) are quite close to each other, since they share the same probability model but their likelihood functions are evaluated in different ways. The estimates of Model (III) are slightly different from those of Models (I) and (II). Model (III) allows for correlation between the two divisions, and thus its MSL estimator is more efficient. For the accommodation division, the input elasticities of \(x_{11}\) and \(x_{12}\) from the separate model and the independent copula model are smaller than those from the Gaussian copula model, while the elasticity of \(x_{13}\) from the Gaussian copula model is larger than in the other two specifications. A similar pattern can also be found in the restaurant division. Moreover, almost all standard errors of Model (III) are much smaller than those obtained from Models (I) and (II). The coefficients of the exogenous determinants of inefficiencies have the same signs in all three models. Given the estimated Gaussian copula parameter \({\widehat{\rho }}=0.8431\), the corresponding Spearman’s rank coefficient \({\widehat{\gamma }}_{\text {Spearman}}\) is 0.8311, which suggests that \(F_{\varepsilon ^{1}}(\varepsilon _{it}^{1})\) and \(F_{\varepsilon ^{2}}(\varepsilon _{it}^{2})\), and hence the two divisions, are highly correlated. Both Models (I) and (II) thus give less efficient estimators.

Once the parameters are estimated, we may predict the inefficiencies, TEs and the marginal effects of the exogenous determinants (\(w_{it}\)) on the inefficiency using (26), (30), (32) and (33). Their sample statistics are summarized in Panels A, B and C of Table 7, respectively. The predicted inefficiencies of the two divisions from Models (I) and (II) are lower than those from the Gaussian copula model, and therefore the TEs are larger in Models (I) and (II) than in Model (III). The predicted marginal effects from the three models consistently suggest that increasing the scale of a hotel reduces the inefficiencies of the accommodation and restaurant divisions and is thus helpful in improving technical efficiency. Moreover, our results show that hotels located in scenic areas are less efficient than those located in cities.

7 Conclusion

In this paper, we used a copula-based simulated maximum likelihood approach to estimate multiple panel stochastic frontier regressions with correlated composite errors. The innovation of the proposed model is that it accommodates the unobserved heterogeneity of the panel data. Compared with separate estimation, the joint estimation of the multiple SF regressions is more efficient since the joint approach takes the correlation among the composite errors into consideration. Therefore, this paper provides a better prediction of the inefficiency under the panel data framework. Although the system considered in the current paper focuses on production frontiers, it is straightforward to modify it into a cost frontier system or a system consisting of a combination of production and cost frontiers.