Abstract
For the partially linear varying-coefficient model whose parametric covariates are measured with additive errors, an estimator of the error variance is defined based on the residuals of the model. At the same time, we construct a jackknife estimator as well as a jackknife empirical likelihood statistic of the error variance. Under the assumption that the response variables and their associated covariates form a stationary \(\alpha \)-mixing sequence, we prove that the proposed estimators are asymptotically normal and that the jackknife empirical likelihood statistic has an asymptotic \(\chi ^2\) distribution. Numerical simulations are carried out to assess the performance of the proposed method.
1 Introduction
Consider the following partially linear varying-coefficient errors-in-variables (EV) model
\[
Y_i=X_i^\tau \beta +W_i^\tau a(T_i)+\epsilon _i,\qquad \xi _i=X_i+e_i,\qquad i=1,2,\ldots ,n, \tag{1.1}
\]
where \(Y_i\) are the scalar response variables and \((X_i^\tau , W_i^\tau , T_i)\) are covariates, \(a(\cdot )=(a_1(\cdot ),\cdots ,a_q(\cdot ))^\tau \) is a q-dimensional vector of unknown coefficient functions, \(\beta =(\beta _1,\cdots ,\beta _p)^\tau \) is a p-dimensional vector of unknown parameters, and \(\epsilon _i\) are random errors. Because of the curse of dimensionality, we assume that \(T_i\) is univariate. The measurement errors \(e_i\) are independent and identically distributed (i.i.d.) with mean zero and covariance matrix \(\Sigma _e\), and are independent of \((Y_i, X_i, W_i, T_i)\). In order to identify the model, \(\Sigma _e\) is assumed to be known. When \(\Sigma _e\) is unknown, one can employ the approaches proposed by Liang et al. (1999) to estimate it.
When \(X_i\) are observed exactly, the model (1.1) boils down to the partially linear varying-coefficient model, which has been studied by many authors. For example, Fan and Huang (2005) proposed a profile least squares method to estimate the unknown parameter and studied the asymptotic normality of the estimator. Moreover, based on the estimator, they introduced the profile empirical likelihood ratio test and showed that the test statistic is asymptotically \(\chi ^2\)-distributed under the null hypothesis. In addition, Ahmad et al. (2005), You and Zhou (2006), Huang and Zhang (2009), Wang et al. (2011) and Bravo (2014) extensively explored partially linear varying-coefficient models; see Zhou et al. (2010), Wei et al. (2012) and Singh et al. (2014) for similar research related to EV models.
For the model (1.1), You and Chen (2006) studied the case where the covariates were observed with measurement errors and proposed estimators for the parametric and nonparametric components, respectively. When the covariates in the nonparametric part are measured with errors, Feng and Xue (2014) investigated the profile least squares estimators and conducted a linear hypothesis test for the parametric part.
It is worth pointing out that the works mentioned above all assume that the variables or errors are independent. However, the independence assumption is inadequate in some applications, especially in economics and financial analysis, where the data often exhibit some degree of dependence. Therefore, dependent data have drawn considerable interest from statisticians. One such case is serially correlated errors, such as AR(1) errors, MA(\(\infty \)) errors, negatively associated errors, martingale difference errors, etc. See, for example, the work of You et al. (2005), Liang et al. (2006), Liang and Jing (2009), You and Chen (2007), Fan et al. (2013), Fan et al. (2013) and Miao et al. (2013).
As we know, the empirical likelihood (EL) introduced by Owen (1988, 1990) is an effective method for constructing confidence regions which enjoys numerous advantages over normal approximation-based methods and the bootstrap [see Hall (1992), Hall and La Scala (1990), Zi et al. (2012)]. The EL related to model (1.1) or the partially linear varying-coefficient model has been studied by several authors, for example, You and Zhou (2006), Huang and Zhang (2009), Wang et al. (2011), and Fan et al. (2012) for the partially time-varying coefficient (in this case \(T_i=i/n\)) errors-in-variables model. The EL in these papers is based on linear functionals of the parametric or nonparametric parts of the models. However, when nonlinear functionals are involved, such as U-statistics and the variance of a random sample, an application of the EL method becomes computationally difficult and the Wilks theorem does not hold in general, i.e., the asymptotic distribution of the EL ratio is not a chi-squared distribution. Fortunately, in the study of the EL for one- and two-sample U-statistics, Jing et al. (2009) proposed a new approach called jackknife empirical likelihood (JEL), which can handle situations where nonlinear statistics are involved. Another attractive feature of the JEL is that it is simple to use. Thanks to these advantages, the JEL method has been widely applied in recent years. See, for example, Gong et al. (2010), Peng (2012), Peng et al. (2012) and Feng and Peng (2012).
In the sequel, we assume that \(\{(X_i, W_i, T_i, \epsilon _i),i\ge 1\}\) from the model (1.1) is a sequence of stationary \(\alpha \)-mixing random variables with \(E(\epsilon _i|X_i,W_i,T_i)=0 ~a.s.\) and \(E(\epsilon _i^2|X_i,W_i,T_i)=\sigma ^2 ~a.s.\) Recall that a sequence \(\{\zeta _k, k\ge 1\}\) is said to be \(\alpha \)-mixing if the \(\alpha \)-mixing coefficient
\[
\alpha (n)=\sup _{k\ge 1}\sup \big \{|P(AB)-P(A)P(B)|: A\in \mathcal{F}^k_1,~B\in \mathcal{F}^\infty _{k+n}\big \}
\]
converges to zero as \(n\rightarrow \infty \), where \(\mathcal{F}^m_l=\sigma \{\zeta _l, \zeta _{l+1},\ldots ,\zeta _m\}\) denotes the \(\sigma \)-algebra generated by \(\zeta _l, \zeta _{l+1},\ldots ,\zeta _m\) with \(l\le m\). Among the most frequently used mixing conditions, \(\alpha \)-mixing is the weakest, and many time series exhibit the \(\alpha \)-mixing property. For a more detailed and general review, we refer to Doukhan (1994) and Lin and Lu (1996).
In this paper, we focus on estimating the error variance \(\sigma ^2\), and investigate the asymptotic normality of the estimator for the error variance. It is well known that the error of a regression model impacts its performance, and studying the error variance could help researchers to improve the accuracy of the model. So it is necessary to investigate the large sample properties of estimators of the error variance. Up to now, only a few researchers have discussed the asymptotic normality of estimators for the error variance. Among them, we refer to You and Chen (2006), Liang and Jing (2009), Zhang and Liang (2012) and Fan et al. (2013), Fan et al. (2013). At the same time, we construct a jackknife estimator as well as a JEL statistic of \(\sigma ^2\), and prove that they are asymptotically normal and asymptotically \(\chi ^2\)-distributed, respectively. Based on the JEL statistic of \(\sigma ^2\), we can construct its confidence interval, which plays a crucial role in quantifying estimation uncertainty. Studying the error variance provides a more comprehensive understanding of statistical models, and hence statistical inference can be improved. These results are new, even for independent data.
We organize the paper as follows. In Sect. 2, we present the methodologies and show how to construct the estimators. The main results are listed in Sect. 3. Section 4 presents a simulation study to verify the approach and demonstrate the advantages of the jackknife method. Proofs of the main results are given in Sect. 5. Some preliminary lemmas, which are used in the proofs of the main results, are collected in the Appendix.
2 Estimators
2.1 Profile least squares estimation
The local linear regression technique is applied to estimate the coefficient functions \(\{a_j(\cdot ),j=1,2,\cdots ,q\}\) in (1.1). For t in a small neighborhood of \(t_0\), one can approximate a(t) locally by a linear function \(a_j(t)\approx a_j(t_0)+a'_j(t_0)(t-t_0)\equiv a^*_j+b^*_j(t-t_0)\), \(j=1,2,\cdots ,q,\) where \(a'_j(t)=\partial a_j(t)/\partial t\). This leads to the following weighted local least-squares problem if \(\beta \) is known: find \((a^*,b^*)\) so as to minimize
where \(a^*=(a^*_1,a^*_2,\cdots ,a^*_q)^\tau \), \(b^*=(b^*_1,b^*_2,\cdots ,b^*_q)^\tau \), \(K_h(\cdot )=K(\cdot /h)/h\), \(K(\cdot )\) is a kernel function and \(0<h:=h_n\rightarrow 0\) is a bandwidth.
For the sake of descriptive convenience, we denote \(\mathbf Y =(Y_1,Y_2,\cdots ,Y_n)^\tau , \mathbf X =(X_1,X_2,\cdots ,X_n)^\tau , \mathbf W =(W_1,W_2,\cdots ,W_n)^\tau , \omega _t=diag(K_h(T_1-t),K_h(T_2-t),\cdots ,K_h(T_n-t))\), and
Then the minimizer in (2.1) is found to be \( \Big (\begin{array}{c} \hat{a}^* \\ h\hat{b}^* \end{array}\Big ) =\{D_t^\tau \omega _tD_t\}^{-1}D_t^\tau \omega _t(\mathbf Y -\mathbf X \beta ). \) Therefore, when \(\beta \) is known, we obtain the estimator of \(a(t)\) by
Let \(S_i=\Big (W_i^\tau ~~ 0\Big )\{D_{T_i}^\tau \omega _{T_i}D_{T_i}\}^{-1}D_{T_i}^\tau \omega _{T_i}\), \(\tilde{Y}_i=Y_i-S_i\mathbf Y \) and \(\tilde{X}_i^\tau =X_i^\tau -S_i\mathbf X \). Substituting (2.2) into the original varying-coefficient model and applying the least squares method, one can obtain the estimator of the parametric component \(\beta \), \( \tilde{\beta }=(\sum _{i=1}^n\tilde{X}_i\tilde{X}_i^\tau )^{-1}\sum _{i=1}^n\tilde{X}_i\tilde{Y}_i. \) However, since \(X_i\) cannot be observed directly and we observe \(\xi _i=X_i+e_i\) instead, we can write (2.1) as
Similarly, one can obtain the following modified profile least squares estimator of \(\beta \)
and the estimators of \(a(\cdot )\) and \(\sigma ^2\), respectively
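To make the construction concrete, the partialling-out step and the bias-corrected estimators can be sketched in a few lines of Python. This is a minimal sketch, not the implementation used in the paper: the exact form of the corrected normal equations (subtracting \(n\Sigma _e\) to offset the measurement-error attenuation, in the spirit of You and Chen (2006)) is assumed, and the function names `partial_out` and `profile_ls` are our own.

```python
import numpy as np

def epa(u):
    # Epanechnikov kernel; any symmetric, compactly supported kernel works
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1)

def partial_out(T, W, Z, h):
    """Return Z_i - S_i Z, where S_i = (W_i^t, 0){D^t w D}^{-1} D^t w
    is the local linear smoother row evaluated at t = T_i."""
    n, q = W.shape
    Z = np.asarray(Z, dtype=float).reshape(n, -1)
    out = np.empty_like(Z)
    for i in range(n):
        k = epa((T - T[i]) / h) / h                   # K_h(T_j - T_i)
        D = np.hstack([W, W * (T - T[i])[:, None]])   # local linear design
        A = D.T @ (k[:, None] * D)
        row = np.hstack([W[i], np.zeros(q)]) @ np.linalg.solve(A, D.T * k)
        out[i] = Z[i] - row @ Z
    return out

def profile_ls(Y, xi, W, T, Sigma_e, h):
    """Bias-corrected profile least squares for beta and sigma^2 (sketch)."""
    n = len(Y)
    Yt = partial_out(T, W, Y, h).ravel()              # tilde{Y}_i
    xit = partial_out(T, W, xi, h)                    # tilde{xi}_i
    beta = np.linalg.solve(xit.T @ xit - n * Sigma_e, xit.T @ Yt)
    sigma2 = np.mean((Yt - xit @ beta) ** 2) - beta @ Sigma_e @ beta
    return beta, sigma2
```

On simulated data with one mismeasured linear covariate and a single varying coefficient, `profile_ls` recovers \(\beta \) and \(\sigma ^2\) up to sampling error.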
2.2 Jackknife method
Since the estimators we have constructed are based on the samples \((\tilde{\xi }_i, \tilde{Y}_i)_{i=1}^n\), these are regarded as pseudo-observations. Let \(\hat{\beta }_{n,-i}\) be the estimator of \(\beta \) when the ith observation is deleted,
Therefore the ith Jackknife pseudo sample is \( J_i=n\hat{\beta }_n-(n-1)\hat{\beta }_{n,-i}. \) Hence, we have the Jackknife estimator of \(\beta \)
From \( \hat{\sigma }_n^2=\frac{1}{n}\sum _{i=1}^n(\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)^2-\hat{\beta }_n^\tau \Sigma _e\hat{\beta }_n, \) similarly, let \(\hat{\sigma }^2_{n,-i}\) be the estimator of \(\sigma ^2\) when the ith observation is deleted, \( \hat{\sigma }_{n,-i}^2=\frac{1}{n-1}\sum _{j\ne i}^n(\tilde{Y}_j-\tilde{\xi }_j^\tau \hat{\beta }_{n,-i})^2 -\hat{\beta }_{n,-i}^\tau \Sigma _e\hat{\beta }_{n,-i}. \) Then we have the ith Jackknife pseudo sample \( \sigma _{J_i}^2=n\hat{\sigma }_n^2-(n-1)\hat{\sigma }_{n,-i}^2, \) and the Jackknife estimator of \(\sigma ^2\)
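The pseudo-sample construction is estimator-agnostic and takes only a few lines. Below is a generic sketch (the paper applies it with \(\hat{\sigma }_{n,-i}^2\) recomputed from \(\hat{\beta }_{n,-i}\); the helper name `jackknife` is ours):

```python
import numpy as np

def jackknife(estimator, data):
    """Pseudo-values J_i = n*est(all) - (n-1)*est(all but i), and their mean,
    which is the jackknife estimate."""
    n = len(data)
    full = estimator(data)
    pseudo = np.array([n * full - (n - 1) * estimator(np.delete(data, i, axis=0))
                       for i in range(n)])
    return pseudo, pseudo.mean()

# Sanity check: for the sample mean, the pseudo-values are the observations
x = np.array([1.0, 2.0, 4.0, 9.0])
pseudo, est = jackknife(lambda d: d.mean(), x)
```

For the sample mean the jackknife estimate coincides with the original estimator; for a nonlinear statistic such as \(\hat{\sigma }_n^2\) the averaging of pseudo-values reduces the leading bias term.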
Based on the Jackknife pseudo sample, one constructs the Jackknife empirical likelihood of \(\sigma ^2\)
The solution to the above maximization is \( \hat{p}_i=\frac{1}{n[1+\lambda (\sigma _{J_i}^2-\sigma ^2)]},~~i=1,2,\ldots ,n, \) where \(\lambda \) satisfies \( \frac{1}{n}\sum _{i=1}^n\frac{\sigma _{J_i}^2-\sigma ^2}{1+\lambda (\sigma _{J_i}^2-\sigma ^2)}=0. \) Therefore, we have the log empirical likelihood ratio function of \(\sigma ^2\)
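Computing \(l(\sigma ^2)\) reduces to one-dimensional root finding for \(\lambda \) on the interval where all weights \(1+\lambda (\sigma _{J_i}^2-\sigma ^2)\) stay positive. A bisection sketch (our own implementation, not the authors'):

```python
import numpy as np

def jel_logratio(pseudo, sigma2):
    """l(sigma^2) = 2*sum(log(1 + lambda*(J_i - sigma^2))), where lambda
    solves (1/n)*sum((J_i - sigma^2)/(1 + lambda*(J_i - sigma^2))) = 0."""
    d = pseudo - sigma2
    if d.min() >= 0 or d.max() <= 0:
        return np.inf                     # sigma^2 outside the convex hull
    lo, hi = -1.0 / d.max(), -1.0 / d.min()
    lo, hi = lo + 1e-9 * (hi - lo), hi - 1e-9 * (hi - lo)
    for _ in range(200):                  # the score is decreasing in lambda
        mid = 0.5 * (lo + hi)
        if np.mean(d / (1.0 + mid * d)) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * d))
```

At \(\sigma ^2\) equal to the mean of the pseudo-values the ratio is zero, and it grows as \(\sigma ^2\) moves away; this is what the \(\chi _1^2\) calibration of Theorem 3.2 exploits.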
3 Main results
In order to formulate the main results, we need to impose the following basic assumptions.
-
(A1)
The random variable T has bounded support \(\Omega \), and its density function \(f(\cdot )\) is Lipschitz continuous and bounded away from 0 on its support.
-
(A2)
The \(q\times q\) matrix \(E(\mathbf W \mathbf W ^\tau |T)\) is nonsingular for each \(T \in \Omega \). \(E(\mathbf X \mathbf X ^\tau |T)\), \(E(\mathbf W \mathbf W ^\tau |T)\) and \(E(\mathbf X \mathbf W ^\tau |T)\) are all Lipschitz continuous. Set \(\Gamma (T_i)=E(W_iW_i^\tau |T_i)\), \(\Phi (T_i)=E(X_iW_i^\tau |T_i)\), \(i=1,2,\cdots ,n\); the second-order derivatives of the functions \(\Gamma (\cdot )\) and \(\Phi (\cdot )\) are bounded for each \(T\in \Omega \). The \(p\times p\) matrix \(EX_1X_1^\tau -E[\Phi (T_1)\Gamma ^{-1}(T_1)\Phi ^\tau (T_1)]\) is positive definite.
-
(A3)
There is a \(\delta >4\) such that \(E(\Vert X_1\Vert ^{2\delta }|T_1)<\infty ~ a.s.\), \(E(\Vert W_1\Vert ^{2\delta }|T_1)<\infty ~ a.s.\), \(E\Vert \xi _1\Vert ^{2\delta }<\infty ~ a.s.\), \(E[|\epsilon _1|^{2\delta }|X_1,W_1]<\infty ~ a.s.\)
-
(A4)
\(\{a_j(\cdot ),j=1,2,\cdots ,q\}\) have continuous second derivatives in \(T\in \Omega \).
-
(A5)
The function \(K(\cdot )\) is a symmetric probability density function with bounded compact support which is Lipschitz continuous as well, and the bandwidth h satisfies \(nh^8\rightarrow 0\) and \(nh^2/(\log n)^2\rightarrow \infty \).
-
(A6)
The \(\alpha \)-mixing coefficient \(\alpha (n)\) satisfies that \(\alpha (n)=O(n^{-\lambda })\) for some \(\lambda >\max \{\frac{7\delta +4}{\delta -4},\frac{9\delta +4}{\delta +4}\}\) with the same \(\delta \) as in (A3).
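As a sanity check on the bandwidth requirements in (A5), the two rates are compatible: any \(h\asymp n^{-\gamma }\) with \(1/8<\gamma <1/2\) satisfies both, since

```latex
h = c\,n^{-\gamma},\quad \tfrac18<\gamma <\tfrac12:\qquad
nh^8 = c^8\,n^{1-8\gamma}\longrightarrow 0,\qquad
\frac{nh^2}{(\log n)^2} = \frac{c^2\,n^{1-2\gamma}}{(\log n)^2}\longrightarrow \infty .
```

In particular, the usual nonparametric rate \(h\asymp n^{-1/5}\) is admissible.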
Remark 3.1
-
(a)
Assumptions (A1)–(A6) are quite mild and commonly used in the literature. In particular, (A1)–(A2) and (A4)–(A5) are employed in Fan and Huang (2005) and Feng and Xue (2014).
-
(b)
Assumption (A3) implies \(E\Vert X_1\Vert ^{2\delta }<\infty \) and \(E\Vert W_1\Vert ^{2\delta }<\infty \).
-
(c)
Assumption (A6) requires only a polynomial decay rate of the mixing coefficients. In fact, when the \(\alpha \)-mixing coefficient decays exponentially, i.e., \(\alpha (n)=O(\rho ^n)\) with \(0<\rho <1\), one can easily verify that (A6) is satisfied.
Theorem 3.1
-
(i)
Suppose assumptions (A1)–(A6) are satisfied, then \( \sqrt{n}(\hat{\sigma }_n^2-\sigma ^2)\mathop {\rightarrow }\limits ^\mathcal{{D}} N(0,\Pi ), \) where \(\Pi =\lim _{n\rightarrow \infty }Var\{\frac{1}{\sqrt{n}}\sum _{i=1}^n(\epsilon _i-e_i^\tau \beta )^2\}\). Further, \(\hat{\Pi }\) is a plug-in estimator of \(\Pi \), where \( \hat{\Pi }=\frac{1}{n}\{\sum _{i=1}^n [(\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)^2-\hat{\beta }_n^\tau \Sigma _e\hat{\beta }_n-\hat{\sigma }_n^2]\}^2. \)
-
(ii)
Suppose assumptions (A1)–(A6) are satisfied, then \( \sqrt{n}(\hat{\sigma }^2_J-\sigma ^2)=\sqrt{n}(\hat{\sigma }^2_n-\sigma ^2)+o_p(1). \) Furthermore, with (i) we have \(\sqrt{n}(\hat{\sigma }_J^2-\sigma ^2)\mathop {\rightarrow }\limits ^\mathcal{{D}} N(0,\Pi ).\)
Theorem 3.2
Suppose assumptions (A1)–(A6) are satisfied, then \( \frac{\Sigma _4}{\Pi }l(\sigma ^2)\mathop {\rightarrow }\limits ^\mathcal{{D}}\chi _1^2, \) where \(\Sigma _4=E(\epsilon _1-e_1^\tau \beta )^4-(\sigma ^2+\beta ^\tau \Sigma _e\beta )^2>0\). Moreover, \(\hat{\Sigma }_4\) is a plug-in estimator of \(\Sigma _4\), where \( \hat{\Sigma }_4=\frac{1}{n}\sum _{i=1}^n\{ (\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)^4-(\hat{\beta }_n^\tau \Sigma _e\hat{\beta }_n+\hat{\sigma }_n^2)^2\}. \)
Remark 3.2
-
(a)
Under the conditions of Theorem 3.2, if \(\{\epsilon _i\}\) is a sequence of independent random variables, then one can verify that \(\Pi =\Sigma _4\) and \(l(\sigma ^2)\mathop {\rightarrow }\limits ^\mathcal{{D}}\chi _1^2\). In this case, the jackknife empirical likelihood method does not require estimating the asymptotic variance \(\Sigma _4\) of the jackknife pseudo-samples. However, when \(\{\epsilon _i\}\) is a sequence of dependent random variables, we cannot ignore the covariance between \((\epsilon _i-e_i^\tau \beta )^2\) and \((\epsilon _j-e_j^\tau \beta )^2\) for \(i\ne j\), which leads to \(\Pi \ne \Sigma _4\). Thus, to construct an approximate confidence interval for \(\sigma ^2\), we need to estimate both \(\Pi \) and \(\Sigma _4\).
-
(b)
From Theorem 3.2, it is easy to construct an approximate confidence region with level \(1-\tau \) for \(\sigma ^2\) as \(I(\tau )=\{\sigma ^2: \frac{\hat{\Sigma }_4}{\hat{\Pi }}l(\sigma ^2)\le c_\tau \}\), where \(c_\tau \) is chosen to satisfy \(P(\chi _1^2\le c_\tau )=1-\tau \).
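In practice, \(I(\tau )\) can be obtained by scanning a grid of candidate values for \(\sigma ^2\). A minimal sketch (the helper `jel_interval` is ours; the inner bisection re-solves the score equation of Sect. 2.2, `ratio` stands for the plug-in \(\hat{\Sigma }_4/\hat{\Pi }\), and the 95% \(\chi _1^2\) quantile is hard-coded for \(\tau =0.05\)):

```python
import numpy as np

CHI2_1_Q95 = 3.841458820694124  # 95% quantile of the chi-squared with 1 df

def jel_interval(pseudo, ratio, grid):
    """Keep grid points sigma^2 with ratio * l(sigma^2) <= c_tau (tau=0.05)."""
    def logratio(s2):
        d = pseudo - s2
        if d.min() >= 0 or d.max() <= 0:
            return np.inf                 # outside the convex hull
        lo, hi = -1.0 / d.max(), -1.0 / d.min()
        lo, hi = lo + 1e-9 * (hi - lo), hi - 1e-9 * (hi - lo)
        for _ in range(100):              # bisect the decreasing score
            mid = 0.5 * (lo + hi)
            if np.mean(d / (1.0 + mid * d)) > 0:
                lo = mid
            else:
                hi = mid
        lam = 0.5 * (lo + hi)
        return 2.0 * np.sum(np.log1p(lam * d))
    keep = [s2 for s2 in grid if ratio * logratio(s2) <= CHI2_1_Q95]
    return (min(keep), max(keep)) if keep else None
```

The resulting interval always lies inside the convex hull of the pseudo-values, a known feature of empirical-likelihood regions.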
4 Simulation
In this section, we conduct a numerical simulation to investigate the finite sample behavior of the profile least squares estimator \(\hat{\sigma }_n^2\) and the jackknife estimator \(\hat{\sigma }_J^2\) in terms of sample mean, bias and mean squared error (MSE). Besides, we study the performance of the proposed jackknife empirical likelihood method for constructing confidence intervals for \(\sigma ^2\) and compare it with the normal approximation method in terms of coverage probability and average interval length.
Consider the following partially linear varying-coefficient EV model:
where \(\beta _1=1\), \(\beta _2=2\), \(a_1(T)=\sin (6\pi T)\), \(a_2(T)=\sin (2\pi T)\). The measurement error \(e_i \sim N(0,\Sigma _e)\), where \(\Sigma _e=0.3^2I_2\) and \(I_2\) is the \(2\times 2\) identity matrix. \(X_i, W_i, T_i, \epsilon _i\) are generated from AR(1) models as follows:
-
\(X_{i,j}=\rho X_{i,j-1}+u_{i,j}\), \(i=1,2\), where the \(u_{i,j}\) are i.i.d. N(0, 1),
-
\(W_{i,j}=\rho ^2 W_{i,j-1}+w_{i,j}\), \(i=1,2\), where the \(w_{i,j}\) are i.i.d. N(0, 1),
-
\(T_j=\sqrt{\rho } T_{j-1}+t_j\), where the \(t_j\) are i.i.d. \(N(0,0.1^2)\),
-
\(\epsilon _j=\rho \epsilon _{j-1}+\eta _j\), where the \(\eta _j\) are i.i.d. N(0, 0.5).
It is easy to verify that \(\{X_i, W_i, T_i, \epsilon _i\}\) is a sequence of stationary \(\alpha \)-mixing random variables for \(0<\rho <1\) (see Doukhan (1994)). When \(\rho =0\), \(\{(X_i, W_i, T_i, \epsilon _i),~~i=1,2,\ldots ,n\}\) are i.i.d. random variables. In order to investigate the influence of dependence on the estimators, we take \(\rho \)=0, 0.2, 0.5, 0.8, respectively. In fact, since the data are generated from an AR(1) model, the true value is \(\sigma ^2=0.5/(1-\rho ^2)\), so \(\sigma ^2\) changes as the coefficient \(\rho \) changes.
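For reproducibility, the design above can be coded directly. A sketch under the stated design (the burn-in length and the helper names `ar1` and `generate` are our own choices; an innovation variance of 0.5 means a standard deviation of \(\sqrt{0.5}\)):

```python
import numpy as np

def ar1(n, rho, sd, rng, burn=200):
    """Stationary AR(1): z_j = rho*z_{j-1} + N(0, sd^2); burn-in discarded."""
    z = np.zeros(n + burn)
    for j in range(1, n + burn):
        z[j] = rho * z[j - 1] + rng.normal(0.0, sd)
    return z[burn:]

def generate(n, rho, rng):
    """One sample from the simulation design of this section (sketch)."""
    X = np.column_stack([ar1(n, rho, 1.0, rng) for _ in range(2)])
    W = np.column_stack([ar1(n, rho**2, 1.0, rng) for _ in range(2)])
    T = ar1(n, np.sqrt(rho), 0.1, rng)
    eps = ar1(n, rho, np.sqrt(0.5), rng)          # innovation variance 0.5
    beta = np.array([1.0, 2.0])
    a = np.column_stack([np.sin(6 * np.pi * T), np.sin(2 * np.pi * T)])
    Y = X @ beta + np.sum(W * a, axis=1) + eps
    xi = X + rng.normal(0.0, 0.3, size=X.shape)   # Sigma_e = 0.3^2 I_2
    return Y, xi, W, T, eps
```

With \(\rho =0.5\), the sample variance of \(\epsilon \) should be close to the theoretical value \(0.5/(1-0.25)\approx 0.667\).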
The following simulation is based on 1000 replications. For the proposed estimators, we employ the quartic (biweight) kernel \(K(u)=\frac{15}{16}(1-u^2)^2I(|u|\le 1)\), and the bandwidth h is selected by minimizing the MSE in a grid search.
Taking sample sizes \(n=50\), 100, 200, 500, we calculate the bias and MSE of \(\hat{\sigma }_n^2\) and \(\hat{\sigma }_J^2\), respectively, to evaluate the two estimators' performance. According to Table 1, the jackknife estimator generally performs better than the profile least squares estimator: both Bias(\(\hat{\sigma }_J^2\)) and MSE(\(\hat{\sigma }_J^2\)) are smaller than those of \(\hat{\sigma }_n^2\). Both estimators become more accurate as n increases, and the gap between \(MSE(\hat{\sigma }_n^2)\) and \(MSE(\hat{\sigma }_J^2)\) narrows as n increases. In other words, the jackknife estimator can significantly improve the estimation accuracy when the sample size is small. In addition, as the dependence of the observations increases (i.e., \(\rho \) increases), which leads to a larger \(\sigma ^2\), the accuracy of estimation decreases slightly under relatively strong dependence. Specifically, the MSE of both estimators becomes larger as \(\sigma ^2\) rises.
Coverage probabilities and average interval lengths are reported in Table 2. In terms of coverage probabilities, the jackknife empirical likelihood method is markedly more accurate than the normal approximation method (NAM) in all scenarios: the coverage probabilities for the JEL are closer to the nominal level. In most cases, the average interval lengths based on the JEL are also smaller than those of the NAM. More precisely, as n increases, the coverage probabilities of both methods approach the nominal level and the confidence intervals of both methods become narrower. When \(\rho =0\), i.e., in the independent case, the JEL performs much better than the NAM, with higher coverage probabilities and shorter confidence intervals. As the dependence increases, the coverage probabilities decrease slightly, owing to the fact that stronger dependence leads to a larger variance \(\sigma ^2\).
5 Proofs of main results
Throughout this paper, let C, \(C_1\), \(C_2\) denote finite positive constants, whose values may change in different scenarios. Let \(\mu _i=\int u^iK(u)du\), and \(c_n=\{\log (n)/(nh)\}^{1/2}+h^2\). From (A5), one can easily verify that \(c_n=o(n^{-1/4})\). Set \(\epsilon =(\epsilon _1,\epsilon _2,\cdots ,\epsilon _n)^\tau ,~~\mathbf 1 _n=(1,1,\cdots ,1)^\tau \).
Proof of Theorem 3.1
(i) From Lemma 6.3, it follows that \( \frac{1}{\sqrt{n}}\sum _{i=1}^n[(\epsilon _i-e_i^\tau \beta )^2-(\sigma ^2+\beta ^\tau \Sigma _e\beta )] \mathop {\rightarrow }\limits ^\mathcal{{D}} N(0,\Pi ), \) where \(\Pi =\lim _{n\rightarrow \infty }Var\{\frac{1}{\sqrt{n}}\sum _{i=1}^n(\epsilon _i-e_i^\tau \beta )^2\}\). Therefore, to prove Theorem 3.1 (i), it is sufficient to show that
From \( \hat{\sigma }_n^2=\frac{1}{n}\sum _{i=1}^n(\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)^2 -\hat{\beta }_n^\tau \Sigma _e\hat{\beta }_n, \) one can write
First, we prove that
From the definition of \(\tilde{Y}_i\) and (1.1), one can write
Note that from the proof of Lemma 3 in Owen (1990) and (A3), we have \(\max _{1\le i\le n}\Vert X_i\Vert =o(n^{1/2\delta })~~a.s.\) and \(\max _{1\le i\le n}\Vert W_i\Vert =o(n^{1/2\delta })~~a.s.\)
Furthermore, from Lemma 6.6 and (A2), we have
Lemma 6.9 (i) gives \(\Vert \hat{\beta }_n-\beta \Vert =O_p(n^{-1/2})\), therefore
From (A1)–(A4), one can easily obtain that \(P\big (\frac{1}{n}\sum _{i=1}^n(W_i^\tau a(T_i))^2>\eta \big ) \le \frac{E[a^\tau (T_1)\Gamma (T_1)a(T_1)]}{\eta }<\frac{C}{\eta }, \) which implies \(\frac{1}{n}\sum _{i=1}^n(W_i^\tau a(T_i))^2=O_p(1)\). Together with (6.9) and (A5) we have
Note that \(\frac{1}{n}\sum _{i=1}^nW_iW_i^\tau =O_p(1)\). Therefore, together with (6.14), we have
From (6.9), (A3) and (A4), we have \( \max _{1\le i\le n}|\tilde{M}_i|=\max _{1\le i\le n}|W_i^\tau a(T_i)|O_p(c_n) =O_p(n^{1/{2\delta }})O_p(c_n). \) Similar to the proof of (5.6), one can obtain that
As to \(A_{15}\), by (6.6), (6.14), Lemma 6.10, (A1), (A2) and (A5), we have
From (A1), (A2), (A4), it is easy to verify that \(|\frac{1}{n}a^\tau (T_i)W_iW_i^\tau \mathbf 1 |=O_p(1)\). Therefore, with Lemma 6.10, (6.9) and (6.14), we have
From Lemma 6.10 and (6.14), it is directly derived that
Hence, with (5.4)–(5.7) and (5.8)–(5.12), we finish the proof of (5.2). Write
Note that \(\frac{1}{n}\sum _{i=1}^ne_ie_i^\tau -\Sigma _e=o_p(1)\) from the strong law of large numbers for i.i.d. random variables, and \(\Vert \hat{\beta }_n-\beta \Vert =O_p(n^{-1/2})\). Then
Hence, by (5.13)–(5.16), we complete the proof of (5.3). Write
Applying Lemma 6.3, we have \(\Vert \frac{1}{n}\sum _{i=1}^nW_ie_i^\tau \Vert =O_p(n^{-1/2})\). Then by (6.14), we have
Similarly, by (6.6) and (6.9), one can obtain that
Combining (5.17)–(5.20), we prove (5.4). As a result, (5.1) can be written as
This completes the proof of Theorem 3.1 (i).
(ii) To prove \(\sqrt{n}(\hat{\sigma }_J^2-\sigma ^2)=\sqrt{n}(\hat{\sigma }_n^2-\sigma ^2)+o_p(1)\), it is sufficient to prove that \(\hat{\sigma }_J^2=\hat{\sigma }_n^2+o_p(n^{-1/2}).\) According to the definition, we have \(\hat{\sigma }_J^2=\hat{\sigma }_n^2+\frac{n-1}{n}\sum _{i=1}^n(\hat{\sigma }_n^2-\hat{\sigma }_{n,-i}^2)\). Therefore, to obtain the desired result, we only need to prove
Note that \(\sum _{i=1}^n[\tilde{\xi }_i(\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)+\Sigma _e\hat{\beta }_n]=0\), with Lemma 6.4 we have
Therefore, to prove (5.21), it is sufficient to prove \(B_k=o_p(n^{-1/2}),~~k=1,2,3,4.\)
From Lemmas 6.7 and 6.11, we have
Similarly, one can easily check that
Using Lemmas 6.11 and 6.5, we have
Therefore, one can obtain that
Hence, combining (5.22)–(5.25), we finish the proof of (5.21). \(\square \)
Proof of Theorem 3.2
Define \(g(\lambda )=\frac{1}{n}\sum _{i=1}^n\frac{\sigma _{J_i}^2-\sigma ^2}{1+\lambda (\sigma _{J_i}^2-\sigma ^2)}\). It is easy to check that
where \(S_{\sigma ^2}=\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2-\sigma ^2)^2\), \(R_n=\max _{1\le i\le n}|\sigma _{J_i}^2-\sigma ^2|\). Next we prove
Write
Hence, to prove (5.26) we only need to prove \(\max _{1\le i\le n}|b_{ki}|=o_p(n^{-1/2})\) for \(k=1,2,3,4,5.\)
Apparently, we have
From (A3), we have
which, together with \(\Vert \hat{\beta }_n-\beta \Vert =O_p(n^{-1/2})\) given by Lemma 6.9 (i), implies \(\frac{4}{n}\sum _{i=1}^n(\epsilon _i-e_i^\tau \beta )^3\tilde{\xi }_i^\tau (\beta -\hat{\beta }_n)=o_p(1)\). Similarly, \(\frac{4}{n}\sum _{i=1}^n(\epsilon _i-e_i^\tau \beta )(\tilde{\xi }_i^\tau (\beta -\hat{\beta }_n))^3=o_p(1)\), \(\frac{6}{n}\sum _{i=1}^n(\epsilon _i-e_i^\tau \beta )^2(\tilde{\xi }_i^\tau (\beta -\hat{\beta }_n))^2=o_p(1)\) and \(\frac{1}{n}\sum _{i=1}^n(\tilde{\xi }_i^\tau (\beta -\hat{\beta }_n))^4=o_p(1)\). Therefore, from Lemma 6.5, we have
From (5.28), one can derive that
By the same approaches used in (5.22)-(5.25), one can easily check
Hence, together with (5.29) and (5.30), we have proved (5.26).
According to Theorem 3.1, one can write \( S_{\sigma ^2}=\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2)^2-(\sigma ^2)^2+o_p(1), \)
Therefore, to prove (5.27), we first need to investigate the convergence of \(\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2)^2\).
From (5.21), we have \(\frac{2(n-1)}{n}\hat{\sigma }_n^2\sum _{i=1}^n(\hat{\sigma }_n^2-\hat{\sigma }_{n,-i}^2)=o_p(n^{-1/2})\). Using the same techniques in proving (5.26), one can get \( \frac{(n-1)^2}{n}\sum _{i=1}^n(\hat{\sigma }_n^2-\hat{\sigma }_{n,-i}^2)^2 =\frac{(n-1)^2}{n}\sum _{i=1}^nb_{1i}^2+o_p(1). \) Together with (5.28), we have
which proves (5.27).
Applying Theorem 3.1, we have \(|\frac{1}{n}\sum _{i=1}^n\sigma _{J_i}^2-\sigma ^2|=O_p(n^{-1/2})\). Together with (5.27), we have \(\frac{|\lambda |}{1+|\lambda |R_n}=O_p(n^{-1/2})\). From (5.26), it follows that \(|\lambda |=O_p(n^{-1/2})\). Let \(\gamma _i=\lambda (\sigma _{J_i}^2-\sigma ^2)\), then still by (5.26), \(\max _{1\le i\le n}|\gamma _i|=|\lambda |R_n=o_p(1)\). Note that
By (5.26) and (5.27), it is easy to derive that \(\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2-\sigma ^2)\frac{\gamma _i^2}{1+\gamma _i} =\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2-\sigma ^2)^2\lambda ^2(\sigma _{J_i}^2-\sigma ^2)\frac{1}{1+\gamma _i} =o_p(n^{-1/2})\). Therefore
Denote \(\lambda = S_{\sigma ^2}^{-1}\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2-\sigma ^2)+\phi _n\), where \(|\phi _n|=o_p(n^{-1/2})\). Let \(\eta _i=\sum _{k=3}^\infty \frac{(-1)^{k-1}}{k}\gamma _i^k\); then \(\eta _i=O(\gamma _i^3)\), which implies \( |\sum _{i=1}^n\eta _i|\le C\sum _{i=1}^n|\gamma _i|^3 =C\sum _{i=1}^n|\lambda ^2(\sigma _{J_i}^2-\sigma ^2)^2\gamma _i| \le Cn\lambda ^2S_{\sigma ^2}\max _{1\le i\le n}|\gamma _i|=o_p(1)\). Hence
Finally, together with Theorem 3.1, we finish the proof of Theorem 3.2. \(\square \)
References
Ahmad I, Leehalanon S, Li Q (2005) Efficient estimation of a semiparametric partially linear varying coefficient model. Ann Stat 33:258–283
Bravo F (2014) Varying coefficients partially linear models with randomly censored data. Ann Inst Stat Math 66:383–412
Doukhan P (1994) Mixing: properties and examples. Springer, New York
Fan GL, Xu HX, Liang HY (2012) Empirical likelihood inference for partially time-varying coefficient errors-in-variables models. Electron J Stat 6:1040–1058
Fan GL, Liang HY, Wang JF (2013) Statistical inference for partially linear time-varying coefficient errors-in-variables models. J Stat Plann Inference 143:505–519
Fan GL, Liang HY, Wang JF (2013) Empirical likelihood for heteroscedastic partially linear errors-in-variables model with \(\alpha \)-mixing errors. Stat Pap 54:85–112
Fan J, Huang T (2005) Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli 11:1031–1057
Feng H, Peng L (2012) Jackknife empirical likelihood tests for distribution functions. J Stat Plan Inference 142:1571–1585
Feng S, Xue L (2014) Bias-corrected statistical inference for partially linear varying coefficient errors-in-variables models with restricted condition. Ann Inst Stat Math 66:121–140
Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore
Gong Y, Peng L, Qi Y (2010) Smoothed jackknife empirical likelihood method for ROC curve. J Multivar Anal 101:1520–1531
Hall P (1992) The bootstrap and Edgeworth expansion. Springer, New York
Hall P, La Scala B (1990) Methodology and algorithms of empirical likelihood. Int Stat Rev 58:109–127
Hong S, Cheng P (1994) The convergence rate of estimation for parameter in a semiparametric model. Chin J Appl Probab Stat 10:62–71
Huang Z, Zhang R (2009) Empirical likelihood for nonparametric parts in semiparametric varying-coefficient partially linear models. Stat Probab Lett 79:1798–1808
Jing BY, Yuan J, Zhou W (2009) Jackknife empirical likelihood. J Am Stat Assoc 104:1224–1232
Liang H, Härdle W, Carroll RJ (1999) Estimation in a semiparametric partially linear errors-in-variables model. Ann Stat 27:1519–1535
Liang HY, Jing BY (2009) Asymptotic normality in partially linear models based on dependent errors. J Stat Plan Inference 139:1357–1371
Liang HY, Mammitzsch V, Steinebach J (2006) On a semiparametric regression model whose errors form a linear process with negatively associated innovations. Statistics 40:207–226
Liebscher E (2001) Estimation of the density and the regression function under mixing conditions. Stat Decis 19:9–26
Lin Z, Lu C (1996) Limit theory for mixing dependent random variables. Science Press, New York
Miao Y, Zhao F, Wang K, Chen Y (2013) Asymptotic normality and strong consistency of LS estimators in the EV regression model with NA errors. Stat Pap 54:193–206
Miller RG (1974) An unbalanced jackknife. Ann Stat 2:880–891
Owen AB (1988) Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75:237–249
Owen AB (1990) Empirical likelihood ratio confidence regions. Ann Stat 8:90–120
Peng L (2012) Approximate jackknife empirical likelihood method for estimating equations. Can J Stat 40:110–123
Peng L, Qi Y, Van Keilegom I (2012) Jackknife empirical likelihood method for copulas. Test 21:74–92
Shao QM (1993) Complete convergence for \(\alpha \)-mixing sequences. Stat Probab Lett 16:279–287
Singh S, Jain K, Sharma S (2014) Replicated measurement error model under exact linear restrictions. Stat Pap 55:253–274
Wang X, Li G, Lin L (2011) Empirical likelihood inference for semiparametric varying-coefficient partially linear EV models. Metrika 73:171–185
Wei C, Luo Y, Wu X (2012) Empirical likelihood for partially linear additive errors-in-variables models. Stat Pap 53:485–496
Yang SC (2007) Maximal moment inequality for partial sums of strong mixing sequences and application. Acta Math Sin Engl Ser 23:1013–1024
You J, Chen G (2006) Estimation of a semiparametric varying-coefficient partially linear errors-in-variables model. J Multivar Anal 97:324–341
You J, Chen G (2007) Semiparametric generalized least squares estimation in partially linear regression models with correlated errors. J Stat Plan Inference 137:117–132
You J, Zhou X, Chen G (2005) Jackknifing in partially linear regression models with serially correlated errors. J Multivar Anal 92:386–404
You J, Zhou Y (2006) Empirical likelihood for semiparametric varying-coefficient partially linear regression models. Stat Probab Lett 76:412–422
Zhang JJ, Liang HY (2012) Asymptotic normality of estimators in heteroscedastic semiparametric model with strong mixing errors. Commun Stat 41:2172–2201
Zhou H, You J, Zhou B (2010) Statistical inference for fixed-effects partially linear regression models with errors in variables. Stat Pap 51:629–650
Zi X, Zou C, Liu Y (2012) Two-sample empirical likelihood method for difference between coefficients in linear regression model. Stat Pap 53:83–93
Acknowledgments
The authors would like to thank the anonymous referees for their valuable comments and suggestions, which led to the improvement of the paper. This research was supported by the National Natural Science Foundation of China (11271286) and the Specialized Research Fund for the Doctoral Program of Higher Education (20120072110007).
Appendix
In this section, we give some preliminary lemmas, which are used in Sect. 5. Let \(\{X_i, i\ge 1\}\) be a stationary sequence of \(\alpha \)-mixing random variables with mixing coefficients \(\{\alpha (k)\}\).
Lemma 6.1
(Liebscher (2001), Proposition 5.1) Assume that \(EX_i=0\) and \(|X_i|\le S<\infty \) a.s. \((i=1,2,\cdots ,n)\). Then for n, \(m\in \mathbb {N}\), \(0<m\le n/2\) and \(\epsilon >0\), \( P(|\sum _{i=1}^nX_i|>\epsilon )\le 4\exp \{-\frac{\epsilon ^2}{16}(nm^{-1}D_m+\frac{1}{3}\epsilon Sm)^{-1}\}+32\frac{S}{\epsilon }n\alpha (m), \) where \(D_m=\max _{1\le j\le 2m}Var(\sum _{i=1}^jX_i)\).
Lemma 6.2
(Yang (2007), Theorem 2.2)
(i)
Let \(r>2,~\delta >0,~EX_i=0\) and \(E|X_i|^{r+\delta }<\infty \). Suppose that \(\lambda >r(r+\delta )/(2\delta )\) and \(\alpha (n)=O(n^{-\lambda })\). Then for any \(\epsilon >0\), there exists a positive constant \(C:=C(\epsilon ,r,\delta ,\lambda )\) such that \(E\max _{1\le m\le n}|\sum _{i=1}^mX_i|^r\le C\{n^\epsilon \sum _{i=1}^nE|X_i|^r+(\sum _{i=1}^n\Vert X_i\Vert _{r+\delta }^2)^{r/2}\}.\)
(ii)
If \(EX_i=0\) and \(E|X_i|^{2+\delta }<\infty \) for some \(\delta >0\), then \(E(\sum _{i=1}^nX_i)^2\le \{1+16\sum _{l=1}^n\alpha ^{\frac{\delta }{2+\delta }}(l)\}\sum _{i=1}^n\Vert X_i\Vert _{2+\delta }^2\).
Lemma 6.3
(Lin and Lu (1996), Theorem 3.2.1) Suppose that \(EX_1=0\), \(E|X_1|^{2+\delta }<\infty \) for some \(\delta >0\) and \(\sum _{n=1}^{\infty }\alpha ^{\delta /(2+\delta )}(n)<\infty \). Then \(\sigma ^2:=EX_1^2+2\sum _{j=2}^\infty EX_1X_j<\infty \) and, if \(\sigma \ne 0\), \( \frac{S_n}{\sigma \sqrt{n}}\mathop {\rightarrow }\limits ^\mathcal{{D}}N(0,1), \) where \(S_n=\sum _{i=1}^nX_i\).
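The long-run variance \(\sigma ^2\) in Lemma 6.3 aggregates all serial covariances of the mixing sequence. As a purely illustrative sanity check (not taken from the paper), consider a stationary AR(1) process \(X_t=\rho X_{t-1}+\varepsilon _t\) with unit innovation variance, which is \(\alpha \)-mixing with geometrically decaying coefficients; the series \(EX_1^2+2\sum _{j=2}^\infty EX_1X_j\) then equals the closed form \(1/(1-\rho )^2\):

```python
# Illustrative check of the long-run variance in Lemma 6.3 for an AR(1)
# process X_t = rho*X_{t-1} + eps_t with Var(eps_t) = 1 (alpha-mixing
# with geometric rate).  Autocovariances: E X_1 X_{1+k} = rho^k/(1-rho^2).
rho = 0.5

def gamma(k: int) -> float:
    """Autocovariance at lag k of the stationary AR(1) process."""
    return rho ** k / (1.0 - rho ** 2)

# sigma^2 = E X_1^2 + 2 * sum_{j>=2} E X_1 X_j, truncated at lag 200
sigma2_series = gamma(0) + 2.0 * sum(gamma(k) for k in range(1, 200))
sigma2_closed = 1.0 / (1.0 - rho) ** 2  # known closed form for AR(1)
print(sigma2_series, sigma2_closed)     # both equal 4.0 up to truncation
```

The geometric tail beyond lag 200 is negligible here, so the truncated series and the closed form agree to machine precision.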
Lemma 6.4
(Miller (1974), Lemma 2.1) For a nonsingular matrix A, and vectors U and V, we have \((A+UV^\tau )^{-1}=A^{-1}-\frac{(A^{-1}U)(V^\tau A^{-1})}{1+V^\tau A^{-1}U}\).
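Lemma 6.4 is the Sherman–Morrison formula for a rank-one update. A minimal numerical sketch, with arbitrary illustrative values, confirms the identity:

```python
import numpy as np

# Sherman-Morrison (Lemma 6.4):
#   (A + U V^T)^{-1} = A^{-1} - (A^{-1} U)(V^T A^{-1}) / (1 + V^T A^{-1} U),
# checked on arbitrary illustrative values.
rng = np.random.default_rng(0)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))  # nonsingular matrix
U = rng.standard_normal((3, 1))
V = rng.standard_normal((3, 1))

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + U @ V.T)
rhs = Ainv - (Ainv @ U) @ (V.T @ Ainv) / (1.0 + (V.T @ Ainv @ U).item())
print(np.allclose(lhs, rhs))  # True
```

This is the device used below to relate \(\hat{\beta }_n\) to the leave-one-out estimators \(\hat{\beta }_{n,-i}\), since deleting observation \(i\) is a rank-one perturbation of \(A\).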
Lemma 6.5
(Shao (1993), Corollary 1) Let \(EX_i=0\) and \(\sup _i E|X_i|^r<\infty \) for some \(r>1\). Suppose that \(\alpha (n)=O(\log ^{-\psi }n)\) for some \(\psi >r/(r-1)\). Then \(n^{-1}\sum _{i=1}^n X_i=o(1)~~a.s.\)
Lemma 6.6
Suppose (A1)–(A3), (A5) and (A6) are satisfied, then
Proof
We only prove (6.1) here, because (6.2) can be proved similarly. Write
Here, we only give the proof of
We divide \(\Omega \) into subintervals \(\{\Delta _l\}\) (\(l=1,2,\cdots ,l_n\)) of length \(r_n=h\sqrt{\frac{\log n}{nh}}\), with the center of \(\Delta _l\) at \(t_l\); the total number of subintervals then satisfies \(l_n=O(r_n^{-1})\). Then
Therefore, to prove (6.4), it is sufficient to show that \(I_k=O_p(c_n),~~k=1,2,3\).
Using the Lipschitz continuity of \(K(\cdot )\), we have \( |K_h(T_i-t)-K_h(T_i-t_l)|\le \frac{C_1}{h^2}|t-t_l|I(|T_i-t_l|\le C_2h)\le \frac{C_1 r_n}{h^2}I(|T_i-t_l|\le C_2h). \) Therefore, the \((k_1,k_2)\) component in \(I_1\), \(1\le k_1\le k_2\le p\), can be written as
For \(I_{11}\), applying Lemmas 6.1 and 6.2 we have
where \(D_m=\max _{1\le j\le 2m}E(h\sum _{i=1}^j[|W_{ik_1}W_{ik_2}|I(|T_i-t_l|\le C_2h) -E|W_{ik_1}W_{ik_2}|I(|T_i-t_l|\le C_2h)])^2 n^{-2/\delta } \le \frac{C_2mh}{n^{2/\delta }}\). Taking \(m=[\frac{n^{1-1/\delta }h}{C_0\sqrt{nh\log n}}]\), we have
On the other hand, we have \(E|W_{ik_1}W_{ik_2}|I(|T_i-t_l|\le C_2h)=O(h).\) Therefore \(I_{12}=O(\sqrt{\frac{\log n}{nh}}). \) Together with (6.5), one can derive \(I_1=O_p(c_n).\) One can rewrite \(I_2\) as
By the same technique used in proving (6.5), we have \(I_{21}=O_p\left( \sqrt{\frac{\log n}{nh}}\right) \), \(I_{22}=O_p\left( \sqrt{\frac{\log n}{nh}}\right) .\) Using Taylor’s expansion, we have \(I_{23}=O(h^2). \) From (A1), we have
Thus, (6.4) is proved, which completes the proof of this lemma. \(\square \)
Lemma 6.7
Suppose (A1)–(A3), (A5) and (A6) are satisfied, then \( \frac{1}{n}\sum _{i=1}^n\tilde{\xi }_i\tilde{\xi }_i^\tau \mathop {\rightarrow }\limits ^\mathrm{P} \Sigma _e+EX_1X_1^\tau -E[\Phi ^\tau (T_1)\Gamma ^{-1}(T_1)\Phi (T_1)]. \)
Proof
From the definition \(\tilde{\xi }_i^\tau =\xi _i^\tau -S_i{\varvec{\xi }}\) and (1.1), we have
where \(S_i=(W_i^\tau ,~0)(D_{T_i}^\tau \omega _{T_i}D_{T_i})^{-1}D_{T_i}^\tau \omega _{T_i}\). By (6.1) and (6.2) in Lemma 6.6, we have
Similarly, using the approaches above and those in the proof of (6.1) and (6.2), we have
From (6.6) and using Lemma 6.5, it follows that
Similarly \( \frac{1}{n}\sum _{i=1}^n(e_i^\tau -S_i\mathbf e )^\tau (X_i^\tau -S_i\mathbf X ) =\frac{1}{n}\sum _{i=1}^ne_i (X_i^\tau -W_i^\tau \Gamma ^{-1}(T_i)\Phi (T_i))\{1+O_p(c_n)\} \mathop {\rightarrow }\limits ^\mathrm{P} 0. \) According to (6.7), we have \( \frac{1}{n}\sum _{i=1}^n(e_i^\tau -S_i\mathbf e )^\tau (e_i^\tau -S_i\mathbf e ) =\frac{1}{n}\sum _{i=1}^ne_ie_i^\tau \mathop {\rightarrow }\limits ^\mathrm{a.s.} \Sigma _e. \) Thus the conclusion is proved. \(\square \)
Lemma 6.8
Suppose (A1)–(A6) are satisfied, then \(\sum _{i=1}^n\tilde{\xi _i}\tilde{M}_i=o_p(\sqrt{n}),\) where \(\tilde{M}_i=M_i-S_iM\) and \(M_i=W_i^\tau a(T_i)\).
Proof
According to the definition, we have
Note that \( D_t^\tau \omega _tM =\Big (\begin{array}{c} \sum _{i=1}^nW_iW^\tau _i a(T_i) K_h(T_i-t) \\ \sum _{i=1}^nW_iW^\tau _i a(T_i) \frac{T_i-t}{h}K_h(T_i-t) \end{array}\Big ). \) Using techniques similar to those in the proof of Lemma 6.6, one can easily check that \( D_t^\tau \omega _tM=n\Gamma (t)f(t) a(t)\otimes \Big (\begin{array}{c} 1 \\ 0 \end{array} \Big )\{1+O_p(c_n)\}. \) Therefore \(S_iM=W_i^\tau a(T_i)\{1+O_p(c_n)\}\); furthermore,
Then, from (6.6) and the law of large numbers for stationary \(\alpha \)-mixing sequences, one can obtain
Similarly to (6.7), we have \( \frac{1}{n}\sum _{i=1}^n(e_i^\tau -S_i\mathbf e )^\tau (M_i^\tau -S_iM) \mathop {\rightarrow }\limits ^\mathrm{P} 0, \) which, together with (6.8) and (6.10), yields \( \sum _{i=1}^n\tilde{\xi }_i\tilde{M}_i=O_p(nc_n^2)=o_p(\sqrt{n}). \) \(\square \)
Lemma 6.9
(i)
Suppose (A1)–(A6) are satisfied, then
$$\begin{aligned} \sqrt{n}(\hat{\beta }_n-\beta )\mathop {\rightarrow }\limits ^\mathcal{{D}} N(0,\Sigma _1^{-1}\Sigma _2\Sigma _1^{-1}), \end{aligned}$$where \(\Sigma _1=E(X_1X_1^\tau )-E[\Phi ^\tau (T_1)\Gamma ^{-1}(T_1)\Phi (T_1)]\), \(\Phi (T_1)=E(W_1X_1^\tau |T_1)\), \(\Gamma (T_1)=E(W_1W_1^\tau |T_1)\) and \(\Sigma _2=\lim _{n\rightarrow \infty }Var\{\frac{1}{\sqrt{n}}\sum _{i=1}^n[\xi _i-\Phi ^\tau (T_i)\Gamma ^{-1}(T_i)W_i][\epsilon _i-e_i^\tau \beta ]\}\). Further, \(\hat{\Sigma }_1^{-1}\hat{\Sigma }_2\hat{\Sigma }_1^{-1}\) is a consistent estimator of \(\Sigma _1^{-1}\Sigma _2\Sigma _1^{-1}\), where \(\hat{\Sigma }_1=\frac{1}{n}\sum _{i=1}^n\tilde{\xi }_i\tilde{\xi }_i^\tau -\Sigma _e\) and \(\hat{\Sigma }_2=\frac{1}{n}\Big \{\sum _{i=1}^n [\tilde{\xi }_i(\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)]+\Sigma _e\hat{\beta }_n\Big \}^{\otimes 2}\), where \(C^{\otimes 2}\) denotes \(CC^\tau \).
(ii)
Suppose (A1)–(A6) are satisfied, then \(\sqrt{n}(\hat{\beta }_J-\beta )=\sqrt{n}(\hat{\beta }_n-\beta )+o_p(1)\).
Proof
(i) Let \(\sum _{i=1}^n\tilde{\xi }_i\tilde{\xi }^\tau _i-n\Sigma _e=A\); then \(\hat{\beta }_n=A^{-1}\sum _{i=1}^n\tilde{\xi }_i\tilde{Y}_i\). Write
From Lemma 6.7, we have \(A^{-1}=O(\frac{1}{n})\). According to the definition and (1.1), we write
Similar to the proof of (6.2) in Lemma 6.6, one can easily check that \( D_t^\tau \omega _t\epsilon =n\mathbf 1 _{2q}\otimes \Big (\begin{array}{c} 1 \\ 0 \end{array} \Big )O_p\Big (\sqrt{\frac{\log n}{nh}}\Big ). \) Together with (6.1), (A1) and (A2), we have
Therefore
Combining (6.11)–(6.15) and Lemma 6.8, we have
Let \(\eta _i=\Sigma _e\beta +[\xi _i-\Phi ^\tau (T_i)\Gamma ^{-1}(T_i)W_i][\epsilon _i-e_i^\tau \beta ]\). Obviously, \(\{\eta _i,i\ge 1\}\) is an \(\alpha \)-mixing sequence with \(E\eta _i=0\) and \(E|\eta _i|^\delta <\infty \) for \(\delta >4\). Applying Lemma 6.3, one can complete the proof of (i).
(ii) To prove \(\sqrt{n}(\hat{\beta }_J-\beta )=\sqrt{n}(\hat{\beta }_n-\beta )+o_p(1)\), it is sufficient to prove \(\hat{\beta }_J=\hat{\beta }_n+o_p(\frac{1}{\sqrt{n}})\).
Note that \(\hat{\beta }_J=\hat{\beta }_n+\frac{n-1}{n}\sum _{i=1}^n(\hat{\beta }_n-\hat{\beta }_{n,-i}).\) Therefore, we only need to prove that
From the definition,
Using the fact [see Theorem 11.2.3 in Golub and Van Loan (1996)] that \( (A+B)^{-1}=A^{-1}-A^{-1}BA^{-1}-A^{-1}B\sum _{k=1}^{\infty }C^kA^{-1}, \) where A is a nonsingular matrix and \(C=-A^{-1}B\), we write
where \(D=A^{-1}B\sum _{k=1}^{\infty }C^kA^{-1}, A=[\sum _{j\ne i}\tilde{\xi }_j\tilde{\xi }_j^\tau -n\Sigma _e], B=\Sigma _e, C=-A^{-1}B\).
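The expansion quoted above can be checked numerically on illustrative matrices, truncating the geometric series at a finite order (valid whenever the spectral radius of \(C=-A^{-1}B\) is below one, as is the case here since \(B\) is a small perturbation of \(A\)):

```python
import numpy as np

# Truncated-series check of
#   (A+B)^{-1} = A^{-1} - A^{-1} B A^{-1} - A^{-1} B (sum_{k>=1} C^k) A^{-1}
# with C = -A^{-1} B.  The matrices are illustrative; B is small enough
# that the geometric series converges and its tail is negligible.
A = np.array([[4.0, 1.0],
              [0.0, 3.0]])
B = np.array([[0.2, 0.0],
              [0.1, 0.1]])

Ainv = np.linalg.inv(A)
C = -Ainv @ B
series = sum(np.linalg.matrix_power(C, k) for k in range(1, 60))
expansion = Ainv - Ainv @ B @ Ainv - Ainv @ B @ series @ Ainv
print(np.allclose(expansion, np.linalg.inv(A + B)))  # True
```

In the proof, \(B=\Sigma _e\) is fixed while \(A\) grows like \(n\), so \(C=O(n^{-1})\) and the remainder term \(D\) is of order \(n^{-3}\), which is what makes the final bound below work.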
Applying Lemma 6.4, we write
Let \(A=[\sum _{j=1}^n\tilde{\xi }_j\tilde{\xi }_j^\tau -n\Sigma _e]\), as in the proof of part (i). Then, combining (6.17), (6.18) and the definitions of \(\hat{\beta }_n\) and \(\hat{\beta }_{n,-i}\), and noting that \(\sum _{i=1}^n[\tilde{\xi }_i(\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)+\Sigma _e\hat{\beta }_n]=0\), we write
where \(v_i=\tilde{\xi }_i^\tau A^{-1}\tilde{\xi }_i\), \(r_i=\tilde{\xi }_i^\tau A^{-1}\Sigma _e A^{-1}\tilde{\xi }_i\). By Lemma 6.7 and (A3), we have \(v_i=O_p(n^{-1})\) and \(r_i=O_p(n^{-2})\). Therefore, to prove (6.16), it is sufficient to prove that
First, we deal with \(I_1\). Since
to prove the desired result, one needs only to show that
In fact, since \(\max _{1\le i\le n}|v_i|=o(n^{-3/4})~a.s.\) (following the proof of Lemma 3 in Owen (1990)) and by Lemma 6.11, it follows that \( \frac{1}{\sqrt{n}}I_1=o_p(1). \) Similarly, \(\frac{1}{\sqrt{n}}I_2=o_p(1)\) and \(\frac{1}{\sqrt{n}}I_3=o_p(1).\)
Meanwhile, \( \Vert \frac{1}{\sqrt{n}}I_{n4}\Vert =\frac{n}{\sqrt{n}}O_p(\frac{1}{n})\rightarrow 0. \) Similarly, we have
Recalling the definitions of A, B, C, D and Lemma 6.7, we have \(A^{-1}=O(\frac{1}{n})\), \(C=O(\frac{1}{n})\) and
Therefore, by (A3), one can easily obtain that \(\sqrt{n}D\sum _{i=1}^n\sum _{j\ne i}^n\tilde{\xi }_j\tilde{Y}_j=\sqrt{n}O\Big (\frac{1}{n^3}\Big )n^2O_p(1)\rightarrow 0.\) \(\square \)
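The jackknife recipe used in part (ii), \(\hat{\beta }_J=\hat{\beta }_n+\frac{n-1}{n}\sum _{i=1}^n(\hat{\beta }_n-\hat{\beta }_{n,-i})\), is the classical bias correction built from leave-one-out estimates. As a quick illustration on made-up data (not from the paper), applying the same recipe to the biased variance estimator with divisor \(n\) reproduces the unbiased divisor-\((n-1)\) estimator exactly:

```python
import numpy as np

# Jackknife bias correction
#   theta_J = theta_n + ((n-1)/n) * sum_i (theta_n - theta_{-i}),
# applied to the biased (divisor-n) variance estimator; the result
# matches the unbiased (divisor n-1) estimator.  Data are illustrative.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(x)

def theta(sample: np.ndarray) -> float:
    """Biased variance estimator (divisor n)."""
    return float(np.mean((sample - sample.mean()) ** 2))

theta_n = theta(x)
leave_one_out = np.array([theta(np.delete(x, i)) for i in range(n)])
theta_J = theta_n + (n - 1) / n * np.sum(theta_n - leave_one_out)
print(theta_J, x.var(ddof=1))  # both 32/7
```

Part (ii) of Lemma 6.9 shows that for \(\hat{\beta }_n\) this correction is asymptotically negligible at the \(\sqrt{n}\) scale, so \(\hat{\beta }_J\) inherits the same limiting normal distribution.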
Lemma 6.10
Suppose (A3) and (A6) are satisfied, then \(\frac{1}{n}\sum _{i=1}^n\epsilon _iW_{ik}\!=\!o(n^{-1/4})~~a.s. \) for \(1\le k\le p\).
Proof
Following the proof of Lemma 2 in Hong and Cheng (1994) under the independent case, using Lemmas 6.1 and 6.2, it is not difficult to prove this lemma. \(\square \)
Lemma 6.11
Suppose (A1)–(A3), (A5) and (A6) are satisfied, then \( \frac{1}{n}\sum _{i=1}^n[\tilde{\xi }_i(\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)+\Sigma _e\hat{\beta }_n] [\tilde{\xi }_i(\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)+\Sigma _e\hat{\beta }_n]^\tau \mathop {\rightarrow }\limits ^\mathrm{P} \Sigma _3\) and \(\max _{1\le i\le n}\Vert \hat{\beta }_n-\hat{\beta }_{n,-i}\Vert =O_p(n^{-1})\), where \(\Sigma _3=(\Sigma _1+\Sigma _e)(\sigma ^2+\beta ^\tau \Sigma _e\beta )-\Sigma _e\beta \beta ^\tau \Sigma _e\).
Proof
(i) Write
First, we evaluate the cross terms. By Lemmas 6.9 and 6.5, (A2) and (A3), we have
Similarly \(\Vert \frac{1}{n}\sum _{i=1}^n\tilde{\xi }_i(\tilde{Y}_i-\tilde{\xi }_i^\tau \beta )\tilde{\xi }_i^\tau \tilde{\xi }_i^\tau (\hat{\beta }_n-\beta )\Vert \mathop {\rightarrow }\limits ^\mathrm{P}0.\) Note that \(\sum _{i=1}^n[\tilde{\xi }_i(\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)+\Sigma _e\hat{\beta }_n]=0\), with Lemma 6.7 we have
Therefore, one can write (6.20) as
On applying Lemma 6.5 and (6.6) we have
With \(\max _{1\le i\le n}\Vert \tilde{\xi }_i\Vert =o(n^{1/{2\delta }})\), \(\Vert \hat{\beta }_n-\beta \Vert =O_p(n^{-1/2})\), and Lemma 6.7, one can derive that \(H_2\rightarrow 0\) and \(H_3\rightarrow \Sigma _e\beta \beta ^\tau \Sigma _e.\) Hence, the first conclusion is verified.
Similar to the derivation of (6.19), one can write
where \(v_i=\tilde{\xi }_i^\tau A^{-1}\tilde{\xi }_i\), \(r_i=\tilde{\xi }_i^\tau A^{-1}\Sigma _e A^{-1}\tilde{\xi }_i\). Then, it is sufficient to show that
For \(a_{1i}\), since \(E[\tilde{\xi }_i(\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)+\Sigma _e\hat{\beta }_n]=0\), \(E\Vert \tilde{\xi }_i(\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)+\Sigma _e\hat{\beta }_n\Vert ^\delta <\infty \) and \(\max _{1\le i\le n}|v_i|=o(n^{-3/4})~~a.s.\), we have \(\max _{1\le i\le n}\Vert \tilde{\xi }_i(\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)+\Sigma _e\hat{\beta }_n\Vert =O_p(1)\). Therefore, \( \max _{1\le i\le n}\Vert a_{1i}\Vert =O_p(n^{-1}). \) It is easy to see that
which implies \(\frac{n^2\max _{1\le i\le n}\Vert a_{3i}\Vert }{\sqrt{n}}\rightarrow 0\). Then \(\max _{1\le i\le n}\Vert a_{3i}\Vert =o_p(n^{-3/2}). \) Similarly, \(\max _{1\le i\le n}\Vert a_{6i}\Vert =o_p(n^{-3/2}).\)
From \(\max _{1\le i\le n}|v_i|\!=\!o(1)~a.s.\), \(\max _{1\le i\le n}|r_i|\!=\!o(n^{-1})~a.s.\) and \(\max _{1\le i\le n}\Vert \tilde{\xi }_i\Vert =o(n^{1/{2\delta }})~~a.s.\), it is easy to show that \(\max _{1\le i\le n}\Vert a_{2i}\Vert =o(n^{-1})\), \(\max _{1\le i\le n}\Vert a_{4i}\Vert =O(n^{-2})\), \(\max _{1\le i\le n}\Vert a_{5i}\Vert =o(n^{-1})\), \(\max _{1\le i\le n}\Vert a_{7i}\Vert =o(n^{-2})\), \(\max _{1\le i\le n}\Vert a_{8i}\Vert =o(n^{-1})\).
Then the proof of the second conclusion is completed. \(\square \)
Liu, AA., Liang, HY. Jackknife empirical likelihood of error variance in partially linear varying-coefficient errors-in-variables models. Stat Papers 58, 95–122 (2017). https://doi.org/10.1007/s00362-015-0689-8
Keywords
- Asymptotic normality
- Error variance
- Jackknife empirical likelihood
- Varying-coefficient errors-in-variables model
- \(\alpha \)-Mixing