1 Introduction

Consider the following partially linear varying-coefficient errors-in-variables (EV) model

$$\begin{aligned} \left\{ \begin{array}{l} Y_i=X_i^\tau \beta +W_i^\tau a(T_i)+\epsilon _i, \\ \xi _i=X_i+e_i, \end{array} \right. i=1,2,\ldots ,n, \end{aligned}$$
(1.1)

where \(Y_i\) are scalar response variables, \((X_i^\tau , W_i^\tau , T_i)\) are covariates, \(a(\cdot )=(a_1(\cdot ),\cdots ,a_q(\cdot ))^\tau \) is a q-dimensional vector of unknown coefficient functions, \(\beta =(\beta _1,\cdots ,\beta _p)^\tau \) is a p-dimensional vector of unknown parameters, and \(\epsilon _i\) are random errors. To avoid the curse of dimensionality, we assume that \(T_i\) is univariate. The measurement errors \(e_i\) are independent and identically distributed (i.i.d.) with mean zero and covariance matrix \(\Sigma _e\), and are independent of \((Y_i, X_i, W_i, T_i)\). In order to identify the model, \(\Sigma _e\) is assumed to be known. When \(\Sigma _e\) is unknown, one can estimate it by the approach proposed by Liang et al. (1999).

When \(X_i\) are observed exactly, the model (1.1) boils down to the partially linear varying-coefficient model, which has been studied by many authors. For example, Fan and Huang (2005) proposed a profile least squares method to estimate the unknown parameters and established the asymptotic normality of the estimator. Based on this estimator, they introduced a profile empirical likelihood ratio test and showed that the test statistic is asymptotically \(\chi ^2\) distributed under the null hypothesis. In addition, Ahmad et al. (2005), You and Zhou (2006), Huang and Zhang (2009), Wang et al. (2011) and Bravo (2014) extensively explored partially linear varying-coefficient models; see Zhou et al. (2010), Wei et al. (2012) and Singh et al. (2014) for related research on EV models.

For model (1.1), You and Chen (2006) studied the case where the covariates are observed with measurement errors and proposed estimators for the parametric and nonparametric components, respectively. When the covariates in the nonparametric part are measured with errors, Feng and Xue (2014) investigated profile least squares estimators and conducted a linear hypothesis test for the parametric part.

It is worth pointing out that the works mentioned above all assume that the variables or errors are independent. However, the independence assumption is inadequate in some applications, especially in economics and financial analysis, where the data often exhibit some degree of dependence. Therefore, dependent data have drawn considerable interest from statisticians. One important case is serially correlated errors, such as AR(1) errors, MA(\(\infty \)) errors, negatively associated errors and martingale difference errors. See, for example, You et al. (2005), Liang et al. (2006), Liang and Jing (2009), You and Chen (2007), Fan et al. (2013), Fan et al. (2013) and Miao et al. (2013).

As is well known, the empirical likelihood (EL) introduced by Owen (1988, 1990) is an effective method for constructing confidence regions and enjoys numerous nice properties over normal approximation-based methods and the bootstrap [see Hall (1992), Hall and La Scala (1990), Zi et al. (2012)]. The EL for model (1.1) or the partially linear varying-coefficient model has been studied by several authors, for example, You and Zhou (2006), Huang and Zhang (2009), Wang et al. (2011), and Fan et al. (2012) for the partially time-varying coefficient (in this case \(T_i=i/n\)) errors-in-variables model. The EL in these papers is based on linear functionals of the parametric or nonparametric parts of the models. However, when nonlinear functionals are involved, such as U-statistics or the variance of a random sample, an application of the EL method becomes computationally difficult and the Wilks theorem does not hold in general, i.e., the asymptotic distribution of the EL ratio is not a chi-squared distribution. Fortunately, in the study of the EL for one- and two-sample U-statistics, Jing et al. (2009) proposed a new approach called jackknife empirical likelihood (JEL), which can handle situations where nonlinear statistics are involved. Another attractive feature of the JEL is that it is simple to use. Thanks to these advantages, the JEL method has been widely applied in recent years; see, for example, Gong et al. (2010), Peng (2012), Peng et al. (2012) and Feng and Peng (2012).

In the sequel, we assume that \(\{(X_i, W_i, T_i, \epsilon _i),i\ge 1\}\) in model (1.1) is a sequence of stationary \(\alpha \)-mixing random variables with \(E(\epsilon _i|X_i,W_i,T_i)=0 ~a.s.\) and \(E(\epsilon _i^2|X_i,W_i,T_i)=\sigma ^2 ~a.s.\) Recall that a sequence \(\{\zeta _k, k\ge 1\}\) is said to be \(\alpha \)-mixing if the \(\alpha \)-mixing coefficient

$$\begin{aligned} \alpha (n):\mathop {=}\limits ^\mathrm{def}\sup _{k\ge 1}\sup \{|P(AB)-P(A)P(B)|: A\in \mathcal{F}^\infty _{n+k}, B\in \mathcal{F}^k_1\} \end{aligned}$$

converges to zero as \(n\rightarrow \infty \), where \(\mathcal{F}^m_l=\sigma \{\zeta _l, \zeta _{l+1},\cdots ,\zeta _m\}\) denotes the \(\sigma \)-algebra generated by \(\zeta _l, \zeta _{l+1},\ldots ,\zeta _m\) with \(l\le m\). Among the most frequently used mixing conditions, \(\alpha \)-mixing is the weakest, and many time series exhibit the \(\alpha \)-mixing property. For a more detailed and general review, we refer to Doukhan (1994) and Lin and Lu (1996).

In this paper, we focus on estimating the error variance \(\sigma ^2\) and investigate the asymptotic normality of its estimator. It is well known that the error of a regression model affects its performance, so studying the error variance can help researchers improve the accuracy of the model, and it is therefore necessary to investigate large sample properties of estimators of the error variance. Up to now, only a few researchers have discussed the asymptotic normality of estimators of the error variance; among them, we refer to You and Chen (2006), Liang and Jing (2009), Zhang and Liang (2012) and Fan et al. (2013), Fan et al. (2013). At the same time, we construct a jackknife estimator of \(\sigma ^2\) as well as a JEL statistic, and prove that they are asymptotically normal and asymptotically \(\chi ^2\) distributed, respectively. Based on the JEL statistic of \(\sigma ^2\), we can construct a confidence interval for it, which plays a crucial role in quantifying estimation uncertainty. The study of the error variance thus provides a more comprehensive understanding of the model and improves the resulting statistical inference. These results are new, even for independent data.

We organize the paper as follows. In Sect. 2, we describe the methodologies and show how to build the estimators. The main results are listed in Sect. 3. Section 4 presents a simulation study to verify the idea and demonstrate the advantages of the jackknife method. Proofs of the main results are given in Sect. 5. Some preliminary lemmas, which are used in the proofs of the main results, are collected in the Appendix.

2 Estimators

2.1 Profile least squares estimation

The local linear regression technique is applied to estimate the coefficient functions \(\{a_j(\cdot ),j=1,2,\cdots ,q\}\) in (1.1). For t in a small neighborhood of \(t_0\), one can approximate \(a_j(t)\) locally by a linear function \(a_j(t)\approx a_j(t_0)+a'_j(t_0)(t-t_0)\equiv a^*_j+b^*_j(t-t_0)\), \(j=1,2,\cdots ,q,\) where \(a'_j(t)=\partial a_j(t)/\partial t\). If \(\beta \) is known, this leads to the following weighted local least-squares problem: find \((a^*,b^*)\) so as to minimize

$$\begin{aligned} \sum _{i=1}^n\Big [Y_i-X_i^\tau \beta -\Big (W_i^\tau ,~~\frac{T_i-t}{h}W_i^\tau \Big )\Big ( \begin{array}{c} a^*\\ hb^* \end{array} \Big )\Big ]^2 K_h(T_i-t), \end{aligned}$$
(2.1)

where \(a^*=(a^*_1,a^*_2,\cdots ,a^*_q)^\tau \), \(b^*=(b^*_1,b^*_2,\cdots ,b^*_q)^\tau \), \(K_h(\cdot )=K(\cdot /h)/h\), \(K(\cdot )\) is a kernel function and \(0<h:=h_n\rightarrow 0\) is a bandwidth.

For the sake of descriptive convenience, we denote \(\mathbf Y =(Y_1,Y_2,\cdots ,Y_n)^\tau , \mathbf X =(X_1,X_2,\cdots ,X_n)^\tau , \mathbf W =(W_1,W_2,\cdots ,W_n)^\tau , \omega _t=diag(K_h(T_1-t),K_h(T_2-t),\cdots ,K_h(T_n-t))\), and

$$\begin{aligned} M=\left( \begin{array}{c} W_1^\tau a(T_1)\\ \vdots \\ W_n^\tau a(T_n) \end{array}\right) ,~~ D_t=\left( \begin{array}{cc} W_1^\tau &{}\quad \frac{T_1-t}{h}W_1^\tau \\ \vdots &{}\quad \vdots \\ W_n^\tau &{}\quad \frac{T_n-t}{h}W_n^\tau \end{array}\right) . \end{aligned}$$

Then the minimizer of (2.1) is found to be \( \Big (\begin{array}{c} \hat{a}^* \\ h\hat{b}^* \end{array}\Big ) =\{D_t^\tau \omega _tD_t\}^{-1}D_t^\tau \omega _t(\mathbf Y -\mathbf X \beta ). \) Therefore, when \(\beta \) is known, we obtain the estimator of \(a(t)\) by

$$\begin{aligned} \tilde{a}(t,\beta )=\Big (I_q,~~0_q\Big )\{D_t^\tau \omega _tD_t\}^{-1}D_t^\tau \omega _t(\mathbf Y -\mathbf X \beta ). \end{aligned}$$
(2.2)
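
To make this step concrete, the following minimal sketch (Python with numpy, assumed available) computes \(\tilde{a}(t,\beta )\) in (2.2) for a given \(\beta \); the function name, variable names and the default Epanechnikov kernel are illustrative choices, not taken from the paper.

```python
import numpy as np

def local_linear_a(t, beta, Y, X, W, T, h,
                   kernel=lambda u: 0.75 * (1 - u**2) * (np.abs(u) <= 1)):
    # Minimal sketch of (2.2), assuming beta is known; illustrative code only.
    n, q = W.shape
    u = (T - t) / h
    w = kernel(u) / h                      # K_h(T_i - t)
    D = np.hstack([W, u[:, None] * W])     # rows of D_t: (W_i^T, (T_i - t)/h * W_i^T)
    r = Y - X @ beta                       # partial residuals Y - X beta
    A = D.T @ (w[:, None] * D)             # D_t^T w_t D_t
    b = D.T @ (w * r)                      # D_t^T w_t (Y - X beta)
    sol = np.linalg.solve(A, b)
    return sol[:q]                         # first q entries give a~(t, beta)
```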

Let \(S_i=\Big (W_i^\tau ~~ 0\Big )\{D_{T_i}^\tau \omega _{T_i}D_{T_i}\}^{-1}D_{T_i}^\tau \omega _{T_i}\), \(\tilde{Y}_i=Y_i-S_i\mathbf Y \) and \(\tilde{X}_i^\tau =X_i^\tau -S_i\mathbf X \). Substituting (2.2) into the original varying-coefficient model and applying the least squares method, one can obtain the estimator of the parametric component \(\beta \), \( \tilde{\beta }=(\sum _{i=1}^n\tilde{X}_i\tilde{X}_i^\tau )^{-1}\sum _{i=1}^n\tilde{X}_i\tilde{Y}_i. \) However, since \(X_i\) cannot be observed directly and we observe \(\xi _i=X_i+e_i\) instead, we modify (2.1) as

$$\begin{aligned} \sum _{i=1}^n\Big [Y_i-\xi _i^\tau \beta -\Big (W_i^\tau ,~~\frac{T_i-t}{h}W_i^\tau \Big )\Big ( \begin{array}{c} a^*\\ hb^* \end{array} \Big )\Big ]^2 K_h(T_i-t)-n\beta ^\tau \Sigma _e\beta . \end{aligned}$$

Similarly, setting \(\tilde{\xi }_i^\tau =\xi _i^\tau -S_i\mathbf \xi \) with \(\mathbf \xi =(\xi _1,\xi _2,\cdots ,\xi _n)^\tau \), one can obtain the following modified profile least squares estimator of \(\beta \)

$$\begin{aligned} \hat{\beta }_n=\Big (\sum _{i=1}^n\tilde{\xi }_i\tilde{\xi }_i^\tau -n\Sigma _e\Big )^{-1}\sum _{i=1}^n\tilde{\xi }_i\tilde{Y}_i, \end{aligned}$$

and the estimators of \(a(\cdot )\) and \(\sigma ^2\), respectively

$$\begin{aligned} \hat{a}_n(t)=\Big (I_q,~~0_q\Big )\{D_t^\tau \omega _tD_t\}^{-1}D_t^\tau \omega _t(\mathbf Y -\mathbf \xi \hat{\beta }_n), \\ \hat{\sigma }_n^2=\frac{1}{n}\sum _{i=1}^n[Y_i-\xi _i^\tau \hat{\beta }_n-W_i^\tau \hat{a}_n(T_i)]^2-\hat{\beta }_n^\tau \Sigma _e\hat{\beta }_n. \end{aligned}$$
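
As a computational companion to the formulas above, here is a minimal sketch (Python with numpy) of the smoother rows \(S_i\), the pseudo observations \(\tilde{Y}_i\), \(\tilde{\xi }_i\), and the bias-corrected estimators \(\hat{\beta }_n\) and \(\hat{\sigma }_n^2\). It uses a naive \(O(n^2)\) loop and illustrative names; it is a sketch under the stated model, not the authors' implementation.

```python
import numpy as np

def profile_ev_estimates(Y, xi, W, T, Sigma_e, h,
                         kernel=lambda u: 0.75 * (1 - u**2) * (np.abs(u) <= 1)):
    # Minimal sketch of the modified profile least squares estimators; illustrative only.
    n, q = W.shape
    S = np.zeros((n, n))                   # rows: S_i = (W_i^T, 0){D^T w D}^{-1} D^T w
    for i in range(n):
        u = (T - T[i]) / h
        w = kernel(u) / h
        D = np.hstack([W, u[:, None] * W])
        left = np.concatenate([W[i], np.zeros(q)])
        S[i] = left @ np.linalg.solve(D.T @ (w[:, None] * D), (w[:, None] * D).T)
    Y_t = Y - S @ Y                        # pseudo responses Y~_i
    xi_t = xi - S @ xi                     # pseudo covariates xi~_i (rowwise)
    # attenuation-corrected normal equations for beta_hat_n
    beta_hat = np.linalg.solve(xi_t.T @ xi_t - n * Sigma_e, xi_t.T @ Y_t)
    resid = Y_t - xi_t @ beta_hat
    sigma2_hat = np.mean(resid**2) - beta_hat @ Sigma_e @ beta_hat
    return beta_hat, sigma2_hat, Y_t, xi_t
```

Here \(\hat{\sigma }_n^2\) is computed from the pseudo observations, using the identity \(W_i^\tau \hat{a}_n(T_i)=S_i(\mathbf Y -\mathbf \xi \hat{\beta }_n)\).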

2.2 Jackknife method

Since the estimators constructed above are based on the samples \((\tilde{\xi }_i, \tilde{Y}_i)_{i=1}^n\), we regard \((\tilde{\xi }_i, \tilde{Y}_i)\) as pseudo observations. Let \(\hat{\beta }_{n,-i}\) be the estimator of \(\beta \) when the ith pseudo observation is deleted,

$$\begin{aligned} \hat{\beta }_{n,-i}=\Big [\sum _{j\ne i}^n\tilde{\xi }_j\tilde{\xi }_j^\tau -(n-1)\Sigma _e\Big ]^{-1}\sum _{j\ne i}^n\tilde{\xi }_j\tilde{Y}_j. \end{aligned}$$

Therefore the ith Jackknife pseudo sample is \( J_i=n\hat{\beta }_n-(n-1)\hat{\beta }_{n,-i}. \) Hence, we have the Jackknife estimator of \(\beta \)

$$\begin{aligned} \hat{\beta }_J=\frac{1}{n}\sum _{i=1}^nJ_i=n\hat{\beta }_n-\frac{(n-1)}{n}\sum _{i=1}^n\hat{\beta }_{n,-i}. \end{aligned}$$

Note that \(\hat{\sigma }_n^2\) can be rewritten as \( \hat{\sigma }_n^2=\frac{1}{n}\sum _{i=1}^n(\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)^2-\hat{\beta }_n^\tau \Sigma _e\hat{\beta }_n. \) Similarly, let \(\hat{\sigma }^2_{n,-i}\) be the estimator of \(\sigma ^2\) when the ith pseudo observation is deleted, \( \hat{\sigma }_{n,-i}^2=\frac{1}{n-1}\sum _{j\ne i}^n(\tilde{Y}_j-\tilde{\xi }_j^\tau \hat{\beta }_{n,-i})^2 -\hat{\beta }_{n,-i}^\tau \Sigma _e\hat{\beta }_{n,-i}. \) Then the ith Jackknife pseudo sample is \( \sigma _{J_i}^2=n\hat{\sigma }_n^2-(n-1)\hat{\sigma }_{n,-i}^2, \) and the Jackknife estimator of \(\sigma ^2\) is

$$\begin{aligned} \hat{\sigma }_J^2=\frac{1}{n}\sum _{i=1}^n\sigma _{J_i}^2=n\hat{\sigma }_n^2-\frac{n-1}{n}\sum _{i=1}^n\hat{\sigma }_{n,-i}^2. \end{aligned}$$
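
Since the delete-one estimators above are defined directly on the pseudo observations, the pseudo samples \(\sigma _{J_i}^2\) can be computed with rank-one downdates of the normal equations. The following is a minimal sketch (Python with numpy); the names are illustrative, and Y_t, xi_t stand for \(\tilde{Y}_i\) and \(\tilde{\xi }_i\), e.g. as returned by the sketch in Sect. 2.1.

```python
import numpy as np

def jackknife_pseudo_sigma2(Y_t, xi_t, Sigma_e):
    # Minimal sketch of sigma^2_{J,i} and sigma_hat_J^2 from Sect. 2.2; illustrative only.
    n = len(Y_t)
    G = xi_t.T @ xi_t
    g = xi_t.T @ Y_t
    beta_full = np.linalg.solve(G - n * Sigma_e, g)
    sigma2_full = np.mean((Y_t - xi_t @ beta_full)**2) - beta_full @ Sigma_e @ beta_full
    pseudo = np.empty(n)
    for i in range(n):
        # delete the ith pseudo observation: rank-one downdate of G and g
        beta_i = np.linalg.solve(G - np.outer(xi_t[i], xi_t[i]) - (n - 1) * Sigma_e,
                                 g - xi_t[i] * Y_t[i])
        keep = np.arange(n) != i
        resid_i = Y_t[keep] - xi_t[keep] @ beta_i
        sigma2_i = np.mean(resid_i**2) - beta_i @ Sigma_e @ beta_i
        pseudo[i] = n * sigma2_full - (n - 1) * sigma2_i      # sigma^2_{J,i}
    return pseudo, pseudo.mean()                              # pseudo samples, sigma_hat_J^2
```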

Based on the Jackknife pseudo samples, one can construct the Jackknife empirical likelihood of \(\sigma ^2\)

$$\begin{aligned} L(\sigma ^2)\!:=\!\sup \Big \{\prod _{i=1}^n np_i: p_1>0, p_2>0,\ldots ,p_n>0,\sum _{i=1}^np_i=1,\sum _{i=1}^np_i\sigma _{J_i}^2=\sigma ^2\Big \}. \end{aligned}$$

The solution to the above maximization is \( \hat{p}_i=\frac{1}{n[1+\lambda (\sigma _{J_i}^2-\sigma ^2)]},~~i=1,2,\ldots ,n, \) where \(\lambda \) satisfies \( \frac{1}{n}\sum _{i=1}^n\frac{\sigma _{J_i}^2-\sigma ^2}{1+\lambda (\sigma _{J_i}^2-\sigma ^2)}=0. \) Therefore, we have the log empirical likelihood ratio function of \(\sigma ^2\)

$$\begin{aligned} l(\sigma ^2)=2\sum _{i=1}^n\log [1+\lambda (\sigma _{J_i}^2-\sigma ^2)]. \end{aligned}$$
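
Numerically, \(\lambda \) is the root of the monotone function \(g(\lambda )=\frac{1}{n}\sum _{i=1}^n\frac{\sigma _{J_i}^2-\sigma ^2}{1+\lambda (\sigma _{J_i}^2-\sigma ^2)}\) on the interval where all weights \(\hat{p}_i\) remain positive, so a bracketing root finder suffices. A minimal sketch (Python with numpy and scipy, assumed available; names are illustrative) is:

```python
import numpy as np
from scipy.optimize import brentq

def jel_log_ratio(pseudo, sigma2):
    # Minimal sketch of l(sigma^2); 'pseudo' holds the pseudo samples sigma^2_{J,i}.
    z = pseudo - sigma2
    if np.all(z > 0) or np.all(z < 0):
        return np.inf                      # sigma^2 lies outside the convex hull
    def g(lam):                            # (1/n) sum z_i / (1 + lam z_i)
        return np.mean(z / (1.0 + lam * z))
    lo = (-1.0 + 1e-10) / z.max()          # keep 1 + lam z_i > 0 for all i
    hi = (-1.0 + 1e-10) / z.min()
    lam = brentq(g, lo, hi)                # g is strictly decreasing on (lo, hi)
    return 2.0 * np.sum(np.log1p(lam * z))
```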

3 Main results

In order to formulate the main results, we need to impose the following basic assumptions.

  1. (A1)

    The random variable T has bounded support \(\Omega \), and its density function \(f(\cdot )\) is Lipschitz continuous and bounded away from 0 on its support.

  2. (A2)

    The \(q\times q\) matrix \(E(\mathbf W \mathbf W ^\tau |T)\) is nonsingular for each \(T \in \Omega \). \(E(\mathbf X \mathbf X ^\tau |T)\), \(E(\mathbf W \mathbf W ^\tau |T)\) and \(E(\mathbf X \mathbf W ^\tau |T)\) are all Lipschitz continuous. Set \(\Gamma (T_i)=E(W_iW_i^\tau |T_i)\) and \(\Phi (T_i)=E(X_iW_i^\tau |T_i)\), \(i=1,2,\cdots ,n\); the second-order derivatives of \(\Gamma (\cdot )\) and \(\Phi (\cdot )\) are bounded on \(\Omega \). The \(p\times p\) matrix \(E(X_1X_1^\tau )-E[\Phi (T_1)\Gamma ^{-1}(T_1)\Phi ^\tau (T_1)]\) is positive definite.

  3. (A3)

    There is a \(\delta >4\) such that \(E(\Vert X_1\Vert ^{2\delta }|T_1)<\infty ~ a.s.\), \(E(\Vert W_1\Vert ^{2\delta }|T_1)<\infty ~ a.s.\), \(E\Vert \xi _1\Vert ^{2\delta }<\infty \) and \(E[|\epsilon _1|^{2\delta }|X_1,W_1]<\infty ~ a.s.\)

  4. (A4)

    \(\{a_j(\cdot ),j=1,2,\cdots ,q\}\) have continuous second derivatives in \(T\in \Omega \).

  5. (A5)

    The function \(K(\cdot )\) is a symmetric probability density function with compact support and is Lipschitz continuous. The bandwidth h satisfies \(nh^8\rightarrow 0\) and \(nh^2/(\log n)^2\rightarrow \infty \).

  6. (A6)

    The \(\alpha \)-mixing coefficient \(\alpha (n)\) satisfies \(\alpha (n)=O(n^{-\lambda })\) for some \(\lambda >\max \{\frac{7\delta +4}{\delta -4},\frac{9\delta +4}{\delta +4}\}\), with the same \(\delta \) as in (A3).

Remark 3.1

  1. (a)

    Assumptions (A1)–(A6) are quite mild and commonly used in the literature. In particular, (A1)–(A2) and (A4)–(A5) are employed in Fan and Huang (2005) and Feng and Xue (2014).

  2. (b)

    Assumption (A3) implies \(E\Vert X_1\Vert ^{2\delta }<\infty \) and \(E\Vert W_1\Vert ^{2\delta }<\infty \).

  3. (c)

    Assumption (A6) allows a relatively slow mixing rate. In fact, when the \(\alpha \)-mixing coefficient decays exponentially, i.e., \(\alpha (n)=O(\rho ^n)\) with \(0<\rho <1\), one can easily verify that (A6) is satisfied.

Theorem 3.1

  1. (i)

    Suppose assumptions (A1)–(A6) are satisfied, then \( \sqrt{n}(\hat{\sigma }_n^2-\sigma ^2)\mathop {\rightarrow }\limits ^\mathcal{{D}} N(0,\Pi ), \) where \(\Pi =\lim _{n\rightarrow \infty }Var\{\frac{1}{\sqrt{n}}\sum _{i=1}^n(\epsilon _i-e_i^\tau \beta )^2\}\). Further, \(\hat{\Pi }\) is a plug-in estimator of \(\Pi \), where \( \hat{\Pi }=\frac{1}{n}\{\sum _{i=1}^n [(\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)^2-\hat{\beta }_n^\tau \Sigma _e\hat{\beta }_n-\hat{\sigma }_n^2]\}^2. \)

  2. (ii)

    Suppose assumptions (A1)–(A6) are satisfied, then \( \sqrt{n}(\hat{\sigma }^2_J-\sigma ^2)=\sqrt{n}(\hat{\sigma }^2_n-\sigma ^2)+o_p(1). \) Furthermore, with (i) we have \(\sqrt{n}(\hat{\sigma }_J^2-\sigma ^2)\mathop {\rightarrow }\limits ^\mathcal{{D}} N(0,\Pi ).\)

Theorem 3.2

Suppose assumptions (A1)–(A6) are satisfied, then \( \frac{\Sigma _4}{\Pi }l(\sigma ^2)\mathop {\rightarrow }\limits ^\mathcal{{D}}\chi _1^2, \) where \(\Sigma _4=E(\epsilon _1-e_1^\tau \beta )^4-(\sigma ^2+\beta ^\tau \Sigma _e\beta )^2>0\). Moreover, \(\hat{\Sigma }_4\) is a plug-in estimator of \(\Sigma _4\), where \( \hat{\Sigma }_4=\frac{1}{n}\sum _{i=1}^n\{ (\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)^4-(\hat{\beta }_n^\tau \Sigma _e\hat{\beta }_n+\hat{\sigma }_n^2)^2\}. \)
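
For reference, a minimal sketch of the plug-in estimator \(\hat{\Sigma }_4\) (Python with numpy; Y_t and xi_t denote the pseudo observations, and the other names are illustrative) is:

```python
import numpy as np

def sigma4_plugin(Y_t, xi_t, beta_hat, sigma2_hat, Sigma_e):
    # Minimal sketch of Sigma_4_hat in Theorem 3.2; illustrative names only.
    resid = Y_t - xi_t @ beta_hat                      # Y~_i - xi~_i^T beta_hat_n
    c = beta_hat @ Sigma_e @ beta_hat + sigma2_hat     # beta^T Sigma_e beta + sigma^2 (plug-in)
    return np.mean(resid**4 - c**2)
```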

Remark 3.2

  1. (a)

    Under the conditions of Theorem 3.2, if \(\{\epsilon _i\}\) is a sequence of independent random variables, then one can verify that \(\Pi =\Sigma _4\) and \(l(\sigma ^2)\mathop {\rightarrow }\limits ^\mathcal{{D}}\chi _1^2\). In this case, the jackknife empirical likelihood method does not require estimating the asymptotic variance \(\Sigma _4\) of the jackknife pseudo samples. However, when \(\{\epsilon _i\}\) is a sequence of dependent random variables, we cannot ignore the covariance between \((\epsilon _i-e_i^\tau \beta )^2\) and \((\epsilon _j-e_j^\tau \beta )^2\) for \(i\ne j\), which leads to \(\Pi \ne \Sigma _4\). Thus, to construct an approximate confidence interval for \(\sigma ^2\), we need to estimate both \(\Pi \) and \(\Sigma _4\).

  2. (b)

    From Theorem 3.2, it is easy to construct an approximate confidence region of level \(1-\tau \) for \(\sigma ^2\) as \(I(\tau )=\{\sigma ^2: \frac{\hat{\Sigma }_4}{\hat{\Pi }}l(\sigma ^2)\le c_\tau \}\), where \(c_\tau \) is chosen to satisfy \(P(\chi _1^2\le c_\tau )=1-\tau \); a computational sketch is given below.
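
A minimal sketch of this inversion (Python with numpy and scipy; it takes the plug-in values \(\hat{\Sigma }_4\) and \(\hat{\Pi }\) as inputs, reuses the hypothetical jel_log_ratio function sketched at the end of Sect. 2.2, and the grid choice is our own illustrative device) is:

```python
import numpy as np
from scipy.stats import chi2

def jel_confidence_interval(pseudo, Sigma4_hat, Pi_hat, tau=0.05, grid_size=400):
    # Minimal sketch of I(tau) in Remark 3.2(b): keep grid points sigma^2 whose
    # calibrated JEL ratio stays below the chi^2_1 quantile c_tau.
    c_tau = chi2.ppf(1 - tau, df=1)
    grid = np.linspace(pseudo.min(), pseudo.max(), grid_size)
    accepted = [s for s in grid
                if (Sigma4_hat / Pi_hat) * jel_log_ratio(pseudo, s) <= c_tau]
    return (min(accepted), max(accepted)) if accepted else None
```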

4 Simulation

In this section, we conduct a numerical simulation to investigate the finite sample behavior of the profile least squares estimator \(\hat{\sigma }_n^2\) and the jackknife estimator \(\hat{\sigma }_J^2\) in terms of sample means, bias and mean square error (MSE). Besides, we study the performance of the proposed jackknife empirical likelihood method for constructing confidence intervals for \(\sigma ^2\) and compare it with the normal approximation method in terms of coverage probability and average interval length.

Consider the following partially linear varying-coefficient EV model:

$$\begin{aligned} \left\{ \begin{aligned} Y_i&=X_{1i}\beta _1+X_{2i}\beta _2+W_{1i}a_1(T_i)+W_{2i}a_2(T_i)+\epsilon _i,\\ \xi _i&=X_i+e_i,\\ \end{aligned} \right. ~~i=1,2,\ldots ,n, \end{aligned}$$

where \(\beta _1=1\), \(\beta _2=2\), \(a_1(T)=\sin (6\pi T)\) and \(a_2(T)=\sin (2\pi T)\). The measurement error \(e_i \sim N(0,\Sigma _e)\), where \(\Sigma _e=0.3^2I_2\) and \(I_2\) is the \(2\times 2\) identity matrix. The covariates and errors \(X_i, W_i, T_i, \epsilon _i\) are generated from AR(1) models as follows:

  • \(X_{i,j}=\rho X_{i,j-1}+u_{i,j}\), \(i=1,2\), where the \(u_{i,j}\) are i.i.d. \(N(0, 1)\);

  • \(W_{i,j}=\rho ^2 W_{i,j-1}+w_{i,j}\), \(i=1,2\), where the \(w_{i,j}\) are i.i.d. \(N(0, 1)\);

  • \(T_j=\sqrt{\rho } T_{j-1}+t_j\), where the \(t_j\) are i.i.d. \(N(0,0.1^2)\);

  • \(\epsilon _j=\rho \epsilon _{j-1}+\eta _j\), where the \(\eta _j\) are i.i.d. \(N(0, 0.5)\).

It is easy to verify that \(\{X_i, W_i, T_i, \epsilon _i\}\) is a sequence of stationary \(\alpha \)-mixing random variables for \(0<\rho <1\) (see Doukhan 1994), while for \(\rho =0\), \(\{(X_i, W_i, T_i, \epsilon _i),~~i=1,2,\ldots ,n\}\) are i.i.d. random variables. In order to investigate the influence of dependence on the estimators, we take \(\rho =0\), 0.2, 0.5, 0.8, respectively. In fact, since the errors are generated from an AR(1) model, the true value of \(\sigma ^2\) is \(0.5/(1-\rho ^2)\), so \(\sigma ^2\) changes as the coefficient \(\rho \) changes.
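
A minimal sketch of this data-generating scheme (Python with numpy; the burn-in length, seeding and function name are our own illustrative choices, not taken from the paper) is:

```python
import numpy as np

def generate_data(n, rho, burn=200, seed=None):
    # Minimal sketch of the AR(1) designs in Sect. 4; illustrative code only.
    rng = np.random.default_rng(seed)
    m = n + burn
    X, W = np.zeros((m, 2)), np.zeros((m, 2))
    T, eps = np.zeros(m), np.zeros(m)
    for j in range(1, m):
        X[j] = rho * X[j - 1] + rng.standard_normal(2)            # u_{i,j} i.i.d. N(0,1)
        W[j] = rho**2 * W[j - 1] + rng.standard_normal(2)         # w_{i,j} i.i.d. N(0,1)
        T[j] = np.sqrt(rho) * T[j - 1] + rng.normal(0, 0.1)       # t_j i.i.d. N(0, 0.1^2)
        eps[j] = rho * eps[j - 1] + rng.normal(0, np.sqrt(0.5))   # eta_j i.i.d. N(0, 0.5)
    X, W, T, eps = X[burn:], W[burn:], T[burn:], eps[burn:]
    beta = np.array([1.0, 2.0])
    Y = X @ beta + W[:, 0] * np.sin(6 * np.pi * T) + W[:, 1] * np.sin(2 * np.pi * T) + eps
    xi = X + rng.normal(0, 0.3, size=(n, 2))                      # Sigma_e = 0.3^2 I_2
    return Y, xi, W, T
```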

The following simulation is based on 1000 replications. For the proposed estimators, we employ the quartic kernel function \(K(u)=15/16(1-u^2)^2I(|u|\le 1)\), and the bandwidth h is selected by minimizing the MSE over a grid search.

Table 1 Sample means, biases and mean square errors of the estimators \(\hat{\sigma }_n^2\) and \(\hat{\sigma }_J^2\)

Taking sample sizes \(n=50\), 100, 200, 500, we calculate the bias and MSE of \(\hat{\sigma }_n^2\) and \(\hat{\sigma }_J^2\), respectively, to evaluate the two estimators' performance. According to Table 1, the jackknife estimator generally performs better than the profile least squares estimator: both Bias(\(\hat{\sigma }_J^2\)) and MSE(\(\hat{\sigma }_J^2\)) are smaller than those of \(\hat{\sigma }_n^2\). Besides, both estimators become more accurate as n increases, and the gap between \(MSE(\hat{\sigma }_n^2)\) and \(MSE(\hat{\sigma }_J^2)\) narrows as n increases. In other words, the jackknife estimator can significantly improve the estimation accuracy when the sample size is small. In addition, as the dependence among observations increases (i.e., \(\rho \) increases), \(\sigma ^2\) becomes larger and the estimation accuracy decreases slightly under relatively strong dependence; specifically, the MSEs of both estimators become larger as \(\sigma ^2\) rises.

Table 2 Coverage probabilities of the jackknife empirical likelihood method (\(CP_J\)) and the normal approximation method based on \(\hat{\sigma }_n^2\) (\(CP_N\)) at confidence levels 0.90 and 0.95, respectively, together with the corresponding average interval lengths \(AIL_J\) and \(AIL_N\)

Coverage probabilities and average interval lengths are reported in Table 2, showing that the jackknife empirical likelihood method is much more accurate than the normal approximation method in all scenarios in terms of coverage probabilities, since the coverage probabilities of the JEL are closer to the nominal level than those of the normal approximation method (NAM). In most cases, the average interval lengths based on the JEL are shorter than those based on the NAM. More precisely, as n increases, the coverage probabilities of both the JEL method and the NAM move closer to the nominal level, and the confidence intervals of both methods become narrower. When \(\rho =0\), i.e., in the independent case, the JEL performs much better than the NAM, with higher coverage probabilities and shorter confidence intervals. When the dependence increases, the coverage probabilities drop slightly, since stronger dependence leads to a larger variance \(\sigma ^2\).

5 Proofs of main results

Throughout this paper, let C, \(C_1\), \(C_2\) denote finite positive constants whose values may change from line to line. Let \(\mu _i=\int u^iK(u)du\) and \(c_n=\{\log (n)/(nh)\}^{1/2}+h^2\). From (A5), one can easily verify that \(c_n=o(n^{-1/4})\). Set \(\epsilon =(\epsilon _1,\epsilon _2,\cdots ,\epsilon _n)^\tau \) and \(\mathbf 1 _n=(1,1,\cdots ,1)^\tau \).

Proof of Theorem 3.1

(i) From Lemma 6.3, it follows that \( \frac{1}{\sqrt{n}}\sum _{i=1}^n[(\epsilon _i-e_i^\tau \beta )^2-(\sigma ^2+\beta ^\tau \Sigma _e\beta )] \mathop {\rightarrow }\limits ^\mathcal{{D}} N(0,\Pi ), \) where \(\Pi =\lim _{n\rightarrow \infty }Var\{\frac{1}{\sqrt{n}}\sum _{i=1}^n(\epsilon _i-e_i^\tau \beta )^2\}\). Therefore, to prove Theorem 3.1 (i), it is sufficient to show that

$$\begin{aligned} \hat{\sigma }_n^2-\sigma ^2&=\frac{1}{n}\sum _{i=1}^n[(\epsilon _i-e_i^\tau \beta )^2-(\sigma ^2+\beta ^\tau \Sigma _e\beta )]+o_p(n^{-1/2}). \end{aligned}$$

From \( \hat{\sigma }_n^2=\frac{1}{n}\sum _{i=1}^n(\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)^2 -\hat{\beta }_n^\tau \Sigma _e\hat{\beta }_n, \) one can write

$$\begin{aligned} \hat{\sigma }_n^2-\sigma ^2&=\Big [\frac{1}{n}\sum _{i=1}^n(\tilde{Y}_i-\tilde{X}_i^\tau \hat{\beta }_n)^2-\sigma ^2\Big ]+\Big [\frac{1}{n}\sum _{i=1}^n\hat{\beta }_n^\tau \tilde{e}_i\tilde{e}_i^\tau \hat{\beta }_n-\hat{\beta }_n^\tau \Sigma _e\hat{\beta }_n\Big ] \nonumber \\&\qquad -\,\Big [\frac{2}{n}\sum _{i=1}^n(\tilde{Y}_i-\tilde{X}_i^\tau \hat{\beta }_n)\hat{\beta }_n^\tau \tilde{e}_i\Big ] \nonumber \\&:=A_1+A_2-A_3. \end{aligned}$$
(5.1)

First, we prove that

$$\begin{aligned} A_1=\frac{1}{n}\sum _{i=1}^n(\epsilon _i^2-\sigma ^2)+o_p\Big (\frac{1}{\sqrt{n}}\Big ), \end{aligned}$$
(5.2)
$$\begin{aligned} A_2=\frac{1}{n}\sum _{i=1}^n\beta ^\tau (e_ie_i^\tau -\Sigma _e)\beta +o_p\Big (\frac{1}{\sqrt{n}}\Big ), \end{aligned}$$
(5.3)
$$\begin{aligned} A_3=\frac{2}{n}\sum _{i=1}^n\epsilon _ie_i^\tau \beta +o_p\Big (\frac{1}{\sqrt{n}}\Big ). \end{aligned}$$
(5.4)

From the definition of \(\tilde{Y}_i\) and (1.1), one can write

$$\begin{aligned} A_1&=\frac{1}{n}\sum _{i=1}^n(\epsilon _i^2-\sigma ^2)+\frac{1}{n}\sum _{i=1}^n[\tilde{X}_i^\tau (\beta -\hat{\beta }_n)]^2+\frac{1}{n}\sum _{i=1}^n\tilde{M}_i^2+\frac{1}{n}\sum _{i=1}^n(S_i\epsilon )^2 \nonumber \\&\quad \ +\frac{2}{n}\sum _{i=1}^n[\tilde{X}_i^\tau (\beta -\hat{\beta }_n)]\tilde{M}_i+\frac{2}{n}\sum _{i=1}^n[\tilde{X}_i^\tau (\beta -\hat{\beta }_n)]\tilde{\epsilon }_i+\frac{2}{n}\sum _{i=1}^n\tilde{M}_i\tilde{\epsilon }_i\!-\!\frac{2}{n}\sum _{i=1}^n\epsilon _iS_i\epsilon \nonumber \\&:=\frac{1}{n}\sum _{i=1}^n(\epsilon _i^2-\sigma ^2)+\sum _{j=1}^7A_{1j}. \end{aligned}$$
(5.5)

Note that from the proof of Lemma 3 in Owen (1990) and (A3), we have \(\max _{1\le i\le n}\Vert X_i\Vert =o(n^{1/{2\delta }})~~a.s.\) and \(\max _{1\le i\le n}\Vert W_i\Vert =o(n^{1/{2\delta }})~~a.s.\)

Furthermore, from Lemma 6.6 and (A2), we have

$$\begin{aligned} \max _{1\le i\le n}\Vert \tilde{X}_i\Vert&\le \max _{1\le i\le n}\Vert X_i\Vert +\max _{1\le i\le n}\Vert W_i^\tau \Gamma ^{-1}(T_i)\Phi (T_i)\Vert \{1+O_p(c_n)\} \\&\le O_p(n^{1/{2\delta }})+C\max _{1\le i\le n}\Vert W_i^\tau \Vert \{1+O_p(c_n)\}=O_p(n^{1/{2\delta }}). \end{aligned}$$

Lemma 6.9 (i) gives \(\Vert \hat{\beta }_n-\beta \Vert =O_p(n^{-1/2})\), therefore

$$\begin{aligned} A_{11}=\frac{1}{n}\sum _{i=1}^n[\tilde{X}_i^\tau (\beta -\hat{\beta }_n)]^2\le \max _{1\le i\le n}\Vert \tilde{X}_i\Vert ^2\Vert \beta -\hat{\beta }_n\Vert ^2=O_p(n^{1/\delta -1})=o_p(n^{-1/2}). \end{aligned}$$
(5.6)

From (A1)–(A4), one can easily obtain that \(P\big (\frac{1}{n}\sum _{i=1}^n(W_i^\tau a(T_i))^2>\eta \big ) \le \frac{E[a^\tau (T_1)\Gamma (T_1)a(T_1)]}{\eta }<\frac{C}{\eta }, \) which implies \(\frac{1}{n}\sum _{i=1}^n(W_i^\tau a(T_i))^2=O_p(1)\). Together with (6.9) and (A5) we have

$$\begin{aligned} A_{12}=\frac{1}{n}\sum _{i=1}^n(W_i^\tau a(T_i))^2O_p(c_n^2)= O_p(c_n^2)=o_p(n^{-1/2}). \end{aligned}$$
(5.7)

Note that \(\frac{1}{n}\sum _{i=1}^nW_iW_i^\tau =O_p(1)\). Therefore, together with (6.14), we have

$$\begin{aligned} A_{13}=\frac{1}{n}\sum _{i=1}^n(S_i\epsilon )^2= \frac{1}{n}\sum _{i=1}^nW_i^\tau W_i O_p\Big (\frac{\log n}{nh}\Big ) =O_p\Big (\frac{\log n}{nh}\Big )=o_p(n^{-1/2}). \end{aligned}$$
(5.8)

From (6.9), (A3) and (A4), we have \( \max _{1\le i\le n}|\tilde{M}_i|=\max _{1\le i\le n}|W_i^\tau a(T_i)|O_p(c_n) =O_p(n^{1/{2\delta }})O_p(c_n). \) Similar to the proof of (5.6), one can obtain that

$$\begin{aligned} |A_{14}|\le 2(\max _{1\le i\le n}\Vert \tilde{X}_i^\tau \Vert \Vert \beta -\hat{\beta }_n\Vert \max _{1\le i\le n}|\tilde{M}_i|) =O_p(n^{1/\delta -1/2}c_n)=o_p(n^{-1/2}). \end{aligned}$$
(5.9)

As to \(A_{15}\), by (6.6), (6.14), Lemma 6.10, (A1), (A2) and (A5), we have

$$\begin{aligned} |A_{15}|&=\Bigg |\frac{2}{n}\sum _{i=1}^n\tilde{X}_i^\tau (\beta -\hat{\beta }_n)\epsilon _i-\frac{2}{n}\sum _{i=1}^n\tilde{X}_i^\tau (\beta -\hat{\beta }_n)W_i^\tau O_p\Bigg (\sqrt{\frac{\log n}{nh}}\Bigg )\Bigg | \nonumber \\&\le \Bigg \Vert \frac{2}{n}\sum _{i=1}^nX_i^\tau \epsilon _i\Bigg \Vert \Bigg \Vert \beta -\hat{\beta }_n\Bigg \Vert [1+O_p(c_n)] \nonumber \\&\ \quad +\Bigg \Vert \frac{2}{n}\sum _{i=1}^nW_i^\tau \Gamma ^{-1}(T_i)\Phi (T_i)\epsilon _i\Bigg \Vert \Bigg \Vert \beta -\hat{\beta }_n\Bigg \Vert [1+O_p(c_n)] \nonumber \\&\quad \ +\max _{1\le i\le n}\Vert \tilde{X}_i^\tau \Vert \max _{1\le i\le n}\Vert W_i^\tau \Vert \Vert \beta -\hat{\beta }_n\Vert O_p\Bigg (\sqrt{\frac{\log n}{nh}}\Bigg ) \nonumber \\&=o(n^{-1/4})O_p(n^{-1/2})+O_p(n^{1/\delta })O_p(n^{-1/2})O_p\Bigg (\sqrt{\frac{\log n}{nh}}\Bigg ) =o_p(n^{-1/2}). \end{aligned}$$
(5.10)

From (A1), (A2) and (A4), it is easy to verify that \(|\frac{1}{n}\sum _{i=1}^na^\tau (T_i)W_iW_i^\tau \mathbf 1 |=O_p(1)\). Therefore, with Lemma 6.10, (6.9) and (6.14), we have

$$\begin{aligned} |A_{16}|&=\Bigg |\frac{2}{n}\sum _{i=1}^na^\tau (T_i)W_i\tilde{\epsilon }_i\Bigg |O_p(c_n)\le \Bigg |\frac{2}{n}\sum _{i=1}^na^\tau (T_i)W_i\epsilon _i\Bigg |O_p(c_n) \nonumber \\&\qquad +\,\Bigg |\frac{2}{n}\sum _{i=1}^na^\tau (T_i)W_iW_i^\tau \mathbf 1 \Bigg |O_p(c_n)O_p\Bigg (\sqrt{\frac{\log n}{nh}}\Bigg ) =o_p(n^{-1/2}). \end{aligned}$$
(5.11)

From Lemma 6.10 and (6.14), it is directly derived that

$$\begin{aligned} |A_{17}|=\Bigg |\frac{2}{n}\sum _{i=1}^n\epsilon _iW_i^\tau \mathbf 1 \Bigg |O_p\Bigg (\sqrt{\frac{\log n}{nh}}\Bigg ) =o(n^{-1/4})O_p\Bigg (\sqrt{\frac{\log n}{nh}}\Bigg )=o_p(n^{-1/2}). \end{aligned}$$
(5.12)

Hence, with (5.5)–(5.12), we finish the proof of (5.2). Write

$$\begin{aligned} A_2&=\frac{1}{n}\sum _{i=1}^n\beta ^\tau (e_ie_i^\tau -\Sigma _e)\beta +\frac{1}{n}\sum _{i=1}^n(\hat{\beta }_n-\beta )^\tau (e_ie_i^\tau -\Sigma _e)(\hat{\beta }_n-\beta ) \nonumber \\&\quad \ +\frac{1}{n}\sum _{i=1}^n(\hat{\beta }_n-\beta )^\tau (e_ie_i^\tau -\Sigma _e)\beta +\frac{1}{n}\sum _{i=1}^n\beta ^\tau (e_ie_i^\tau -\Sigma _e)(\hat{\beta }_n-\beta ) \nonumber \\&:=\frac{1}{n}\sum _{i=1}^n\beta ^\tau (e_ie_i^\tau -\Sigma _e)\beta +A_{21}+A_{22}+A_{23}. \end{aligned}$$
(5.13)

Note that \(\frac{1}{n}\sum _{i=1}^ne_ie_i^\tau -\Sigma _e=o_p(1)\) by the strong law of large numbers for i.i.d. random variables, and \(\Vert \hat{\beta }_n-\beta \Vert =O_p(n^{-1/2})\). Then

$$\begin{aligned} |A_{21}|&=\Bigg |(\hat{\beta }_n-\beta )^\tau \Bigg [\frac{1}{n}\sum _{i=1}^ne_ie_i^\tau -\Sigma _e\Bigg ](\hat{\beta }_n-\beta )\Bigg |=o_p(n^{-1})=o_p(n^{-1/2}), \end{aligned}$$
(5.14)
$$\begin{aligned} |A_{22}|&=\Bigg |(\hat{\beta }_n-\beta )^\tau \Bigg [\frac{1}{n}\sum _{i=1}^ne_ie_i^\tau -\Sigma _e\Bigg ]\beta \Bigg | =o_p(n^{-1/2}), \end{aligned}$$
(5.15)
$$\begin{aligned} |A_{23}|&=\Bigg |\beta ^\tau \Bigg [\frac{1}{n}\sum _{i=1}^ne_ie_i^\tau -\Sigma _e\Bigg ](\hat{\beta }_n-\beta )\Bigg | =o_p(n^{-1/2}). \end{aligned}$$
(5.16)

Hence, by (5.13)–(5.16), we complete the proof of (5.3). Write

$$\begin{aligned} A_3&=\frac{2}{n}\sum _{i=1}^n\epsilon _ie_i^\tau \beta -\frac{2}{n}\sum _{i=1}^nS_i\epsilon e_i^\tau \beta +\frac{2}{n}\sum _{i=1}^n[\tilde{X}_i^\tau (\beta -\hat{\beta }_n)+\tilde{M}_i]\beta ^\tau e_i \nonumber \\&\quad \ +\frac{2}{n}\sum _{i=1}^n[\tilde{X}_i^\tau (\beta -\hat{\beta }_n)+\tilde{M}_i+\tilde{\epsilon }_i](\hat{\beta }_n-\beta )^\tau e_i \nonumber \\&:=\frac{2}{n}\sum _{i=1}^n\epsilon _ie_i^\tau \beta +A_{31}+A_{32}+A_{33}. \end{aligned}$$
(5.17)

Applying Lemma 6.3, we have \(\Vert \frac{1}{n}\sum _{i=1}^nW_ie_i^\tau \Vert =O_p(n^{-1/2})\). Then by (6.14), we have

$$\begin{aligned} |A_{31}|=\Bigg |\frac{2}{n}\sum _{i=1}^n\mathbf 1 ^\tau W_ie_i^\tau \beta \Bigg |O_p\Bigg (\sqrt{\frac{\log n}{nh}}\Bigg ) =O_p(n^{-1/2})O_p\Bigg (\sqrt{\frac{\log n}{nh}}\Bigg )=o_p(n^{-1/2}). \end{aligned}$$
(5.18)

Similarly, by (6.6) and (6.9), one can obtain that

$$\begin{aligned} |A_{32}|&=\Bigg |(\beta -\hat{\beta }_n)^\tau \Bigg [\frac{2}{n}\sum _{i=1}^n\tilde{X}_ie_i^\tau \Bigg ]\beta +\Bigg [\frac{2}{n}\sum _{i=1}^n\tilde{M}_ie_i^\tau \Bigg ]\beta \Bigg | \nonumber \\&\le \,\Bigg |(\beta -\hat{\beta }_n)^\tau \Bigg [\frac{2}{n}\sum _{i=1}^nX_ie_i^\tau \Bigg ]\beta \Bigg |[1+O_p(c_n)] \nonumber \\&\quad +\,\Bigg |(\beta -\hat{\beta }_n)^\tau \Bigg [\frac{2}{n}\sum _{i=1}^n\Phi (T_i)\Gamma ^{-1}(T_i)W_ie_i^\tau \Bigg ]\beta \Bigg |[1+O_p(c_n)] \nonumber \\&\quad +\,\Bigg |\Bigg [\frac{2}{n}\sum _{i=1}^na^\tau (T_i)W_ie_i^\tau \Bigg ]\beta \Bigg |O_p(c_n)=o_p(n^{-1/2}), \end{aligned}$$
(5.19)
$$\begin{aligned} |A_{33}|&=\Bigg |(\beta -\hat{\beta }_n)^\tau \Bigg [\frac{2}{n}\sum _{i=1}^n\tilde{X}_ie_i^\tau \Bigg ](\beta -\hat{\beta }_n)\Bigg |+\Bigg |\Bigg [\frac{2}{n}\sum _{i=1}^n\tilde{M}_ie_i^\tau \Bigg ](\hat{\beta }_n-\beta )\Bigg | \nonumber \\&\qquad +\,\Bigg |\Bigg [\frac{2}{n}\sum _{i=1}^n\tilde{\epsilon }_ie_i^\tau \Bigg ](\hat{\beta }_n-\beta )\Bigg | \nonumber \\&\le \,O_p(n^{-1})O_p(n^{-1/2})+O_p(n^{-1})O_p(c_n)+\Bigg |\frac{2}{n}\sum _{i=1}^n\epsilon _ie_i^\tau (\hat{\beta }_n-\beta )\Bigg | \nonumber \\&\qquad +\,\Bigg |\frac{2}{n}\sum _{i=1}^nS_i\epsilon e_i^\tau (\hat{\beta }_n-\beta )\Bigg | \nonumber \\&=\,O_p(n^{-3/2})+O_p(n^{-1}c_n)\!+\!O_p(n^{-1})\!+\!O_p(n^{-1})O_p\Bigg (\!\sqrt{\frac{\log n}{nh}}\Bigg ) \!=\!o_p(n^{-1/2}). \end{aligned}$$
(5.20)

Combining (5.17)–(5.20), we prove (5.4). As a result, (5.1) can be written as

$$\begin{aligned} \hat{\sigma }_n^2-\sigma ^2=\frac{1}{n}\sum _{i=1}^n[(\epsilon _i-e_i^\tau \beta )^2-(\sigma ^2+\beta ^\tau \Sigma _e\beta )]+o_p(n^{-1/2}). \end{aligned}$$

This completes the proof of Theorem 3.1 (i).

(ii) To prove \(\sqrt{n}(\hat{\sigma }_J^2-\sigma ^2)=\sqrt{n}(\hat{\sigma }_n^2-\sigma ^2)+o_p(1)\), it is sufficient to prove that \(\hat{\sigma }_J^2=\hat{\sigma }_n^2+o_p(n^{-1/2}).\) According to the definition, we have \(\hat{\sigma }_J^2=\hat{\sigma }_n^2+\frac{n-1}{n}\sum _{i=1}^n(\hat{\sigma }_n^2-\hat{\sigma }_{n,-i}^2)\). Therefore, to obtain the desired result, we only need to prove

$$\begin{aligned} \sqrt{n}\sum _{i=1}^n(\hat{\sigma }_n^2-\hat{\sigma }_{n,-i}^2)=o_p(1). \end{aligned}$$
(5.21)

Note that \(\sum _{i=1}^n[\tilde{\xi }_i(\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)+\Sigma _e\hat{\beta }_n]=0\). Together with Lemma 6.4, we have

$$\begin{aligned} \sum _{i=1}^n(\hat{\sigma }_n^2-\hat{\sigma }_{n,-i}^2)&=\frac{1}{n-1}\sum _{i=1}^n(\hat{\beta }_n-\hat{\beta }_{n,-i})^\tau \tilde{\xi }_i\tilde{\xi }_i^\tau (\hat{\beta }_n-\hat{\beta }_{n,-i}) \\&\quad +\,\frac{2}{n-1}\sum _{i=1}^n[\tilde{\xi }_i^\tau (\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)+\hat{\beta }_n^\tau \Sigma _e](\hat{\beta }_n-\hat{\beta }_{n,-i}) \\&\quad +\,\sum _{i=1}^n(\hat{\beta }_n-\hat{\beta }_{n,-i})^\tau \Sigma _e(\hat{\beta }_n-\hat{\beta }_{n,-i}) \\&\quad -\,\frac{1}{n-1}\sum _{i=1}^n(\hat{\beta }_n-\hat{\beta }_{n,-i})^\tau \sum _{j=1}^n\tilde{\xi }_j\tilde{\xi }_j^\tau (\hat{\beta }_n-\hat{\beta }_{n,-i}):=\sum _{k=1}^4B_k. \end{aligned}$$

Therefore, to prove (5.21), it is sufficient to prove \(B_k=o_p(n^{-1/2}),~~k=1,2,3,4.\)

From Lemmas 6.7 and 6.11, we have

$$\begin{aligned} B_1=\frac{1}{n-1}\sum _{i=1}^n[(\hat{\beta }_n-\hat{\beta }_{n,-i})^\tau \tilde{\xi }_i\tilde{\xi }_i^\tau (\hat{\beta }_n-\hat{\beta }_{n,-i})] =O_p(n^{-2}). \end{aligned}$$
(5.22)

Similarly, one can easily check that

$$\begin{aligned} B_3&=\sum _{i=1}^n(\hat{\beta }_n-\hat{\beta }_{n,-i})^\tau \Sigma _e(\hat{\beta }_n-\hat{\beta }_{n,-i})=O_p(n^{-1}), \end{aligned}$$
(5.23)
$$\begin{aligned} B_4&=\sum _{i=1}^n(\hat{\beta }_n-\hat{\beta }_{n,-i})^\tau \frac{1}{n-1}\sum _{j=1}^n \tilde{\xi }_j\tilde{\xi }_j^\tau (\hat{\beta }_n-\hat{\beta }_{n,-i})=O_p(n^{-1}). \end{aligned}$$
(5.24)

Using Lemmas 6.11 and 6.5, we have

$$\begin{aligned} B_2^2&=\frac{4}{(n-1)^2}\Big \{\sum _{i=1}^n[\tilde{\xi }_i^\tau (\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)+\hat{\beta }_n^\tau \Sigma _e](\hat{\beta }_n-\hat{\beta }_{n,-i})\Big \}^2 \\&\le \frac{4n}{(n-1)^2}\sum _{i=1}^n(\tilde{\xi }_i^\tau (\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)+\hat{\beta }_n^\tau \Sigma _e)^2O_p(n^{-2})=O_p(n^{-2}). \end{aligned}$$

Therefore, one can obtain that

$$\begin{aligned} |B_2|=o_p(n^{-1/2}). \end{aligned}$$
(5.25)

Hence, combining (5.22)–(5.25), we finish the proof of (5.21). \(\square \)

Proof of Theorem 3.2

Define \(g(\lambda )=\frac{1}{n}\sum _{i=1}^n\frac{\sigma _{J_i}^2-\sigma ^2}{1+\lambda (\sigma _{J_i}^2-\sigma ^2)}\). It is easy to check that

$$\begin{aligned} 0=|g(\lambda )|&=\Big |\frac{1}{n}\sum _{i=1}^n(\sigma ^2_{J_i}-\sigma ^2)-\frac{\lambda }{n}\sum _{i=1}^n\frac{(\sigma ^2_{J_i}-\sigma ^2)^2}{1+\lambda (\sigma ^2_{J_i}-\sigma ^2)}\Big |\ge \frac{|\lambda | S_{\sigma ^2}}{1+|\lambda |R_n} \nonumber \\&\quad -\,\Big |\frac{1}{n}\sum _{i=1}^n\sigma _{J_i}^2-\sigma ^2\Big |, \end{aligned}$$

where \(S_{\sigma ^2}=\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2-\sigma ^2)^2\), \(R_n=\max _{1\le i\le n}|\sigma _{J_i}^2-\sigma ^2|\). Next we prove

$$\begin{aligned}&R_n=\max _{1\le i\le n}|\sigma _{J_i}^2-\sigma ^2|=o_p(\sqrt{n}), \end{aligned}$$
(5.26)
$$\begin{aligned}&S_{\sigma ^2}=\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2-\sigma ^2)^2\mathop {\rightarrow }\limits ^\mathrm{P} \Sigma _4. \end{aligned}$$
(5.27)

Write

$$\begin{aligned} \hat{\sigma }_n^2-\hat{\sigma }_{n,-i}^2&=\frac{1}{n-1}[(\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)^2-\hat{\sigma }_n^2-\hat{\beta }_n^\tau \Sigma _e\hat{\beta }_n] \\&\quad +\frac{1}{n-1}(\hat{\beta }_n-\hat{\beta }_{n,-i})^\tau \tilde{\xi }_i\tilde{\xi }_i^\tau (\hat{\beta }_n-\hat{\beta }_{n,-i}) \\&\quad +\frac{2}{n-1}[\tilde{\xi }_i^\tau (\tilde{Y}_i-\tilde{\xi }_i^\tau \hat{\beta }_n)+\hat{\beta }_n^\tau \Sigma _e](\hat{\beta }_n-\hat{\beta }_{n,-i}) \\&\quad +(\hat{\beta }_n-\hat{\beta }_{n,-i})^\tau \Sigma _e(\hat{\beta }_n-\hat{\beta }_{n,-i}) \\&\quad -\frac{1}{n-1}(\hat{\beta }_n-\hat{\beta }_{n,-i})^\tau \sum _{j=1}^n\tilde{\xi }_j\tilde{\xi }_j^\tau (\hat{\beta }_n-\hat{\beta }_{n,-i}):=\sum _{k=1}^5b_{ki}. \end{aligned}$$

Hence, to prove (5.26) we only need to prove \(\max _{1\le i\le n}|b_{ki}|=o_p(n^{-1/2})\) for \(k=1,2,3,4,5.\)

Apparently, we have

$$\begin{aligned} \frac{(n-1)^2}{n}\sum _{i=1}^nb_{1i}^2&=\frac{1}{n}\sum _{i=1}^n\left[ (\epsilon _i-e_i^\tau \beta )^4+(\tilde{\xi }_i^\tau (\beta -\hat{\beta }_n))^4+4(\epsilon _i-e_i^\tau \beta )^3\tilde{\xi }_i^\tau (\beta -\hat{\beta }_n)\right. \\&\left. \quad +\,4(\epsilon _i-e_i^\tau \beta )(\tilde{\xi }_i^\tau (\beta -\hat{\beta }_n))^3+6(\epsilon _i-e_i^\tau \beta )^2(\tilde{\xi }_i^\tau (\beta -\hat{\beta }_n))^2\right] \\&\quad -\,(\hat{\sigma }_n^2+\hat{\beta }_n^\tau \Sigma _e\hat{\beta }_n)^2. \end{aligned}$$

From (A3), we have

$$\begin{aligned} P\Big (n^{-3/2}\Big |\sum _{i=1}^n(\epsilon _i-e_i^\tau \beta )^3\tilde{\xi }_i\Big |>\eta \Big ) \le \frac{1}{\eta }n^{-3/2}\sum _{i=1}^nE|(\epsilon _i-e_i^\tau \beta )^3\tilde{\xi }_i|\rightarrow 0, \end{aligned}$$

which implies \(\frac{4}{n}\sum _{i=1}^n(\epsilon _i-e_i^\tau \beta )^3\tilde{\xi }_i^\tau (\beta -\hat{\beta }_n)=o_p(1)\) from \(\Vert \hat{\beta }_n-\beta \Vert =O_p(n^{-1/2})\) given by Lemma 6.9 (i). Similarly, \(\frac{4}{n}\sum _{i=1}^n(\epsilon _i-e_i^\tau \beta )(\tilde{\xi }_i^\tau (\beta -\hat{\beta }_n))^3=o_p(1)\), \(\frac{6}{n}\sum _{i=1}^n(\epsilon _i-e_i^\tau \beta )^2(\tilde{\xi }_i^\tau (\beta -\hat{\beta }_n))^2=o_p(1)\) and \(\frac{1}{n}\sum _{i=1}^n(\tilde{\xi }_i^\tau (\beta -\hat{\beta }_n))^4=o_p(1)\). Therefore, from Lemma 6.5, we have

$$\begin{aligned} \frac{(n-1)^2}{n}\sum _{i=1}^nb_{1i}^2 \mathop {\rightarrow }\limits ^\mathrm{P} E(\epsilon _1-e_1^\tau \beta )^4-(\sigma ^2+\beta ^\tau \Sigma _e\beta )^2=\Sigma _4. \end{aligned}$$
(5.28)

From (5.28), one can derive that

$$\begin{aligned} \max _{1\le i\le n}|b_{1i}|=o_p(n^{-1/2}). \end{aligned}$$
(5.29)

By the same arguments as used in (5.22)–(5.25), one can easily check

$$\begin{aligned} \max _{1\le i\le n}|b_{ki}|=O_p(n^{-1}),~~k=2,3,4,5. \end{aligned}$$
(5.30)

Hence, together with (5.29) and (5.30), we have proved (5.26).

According to Theorem 3.1, one can write \( S_{\sigma ^2}=\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2)^2-(\sigma ^2)^2+o_p(1), \)

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2)^2 =(\hat{\sigma }_n^2)^2 +\frac{2(n-1)}{n}\hat{\sigma }_n^2\sum _{i=1}^n(\hat{\sigma }_n^2-\hat{\sigma }_{n,-i}^2) +\frac{(n-1)^2}{n}\sum _{i=1}^n(\hat{\sigma }_n^2-\hat{\sigma }_{n,-i}^2)^2. \end{aligned}$$

Therefore, to prove (5.27), we first need to investigate the convergence of \(\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2)^2\).

From (5.21), we have \(\frac{2(n-1)}{n}\hat{\sigma }_n^2\sum _{i=1}^n(\hat{\sigma }_n^2-\hat{\sigma }_{n,-i}^2)=o_p(n^{-1/2})\). Using the same techniques as in the proof of (5.26), one can get \( \frac{(n-1)^2}{n}\sum _{i=1}^n(\hat{\sigma }_n^2-\hat{\sigma }_{n,-i}^2)^2 =\frac{(n-1)^2}{n}\sum _{i=1}^nb_{1i}^2+o_p(1). \) Together with (5.28), we have

$$\begin{aligned} S_{\sigma ^2}=\frac{(n-1)^2}{n}\sum _{i=1}^nb_{1i}^2+o_p(1) \mathop {\rightarrow }\limits ^\mathrm{P} \Sigma _4, \end{aligned}$$

which proves (5.27).

Applying Theorem 3.1, we have \(|\frac{1}{n}\sum _{i=1}^n\sigma _{J_i}^2-\sigma ^2|=O_p(n^{-1/2})\). Together with (5.27), we have \(\frac{|\lambda |}{1+|\lambda |R_n}=O_p(n^{-1/2})\). From (5.26), it follows that \(|\lambda |=O_p(n^{-1/2})\). Let \(\gamma _i=\lambda (\sigma _{J_i}^2-\sigma ^2)\), then still by (5.26), \(\max _{1\le i\le n}|\gamma _i|=|\lambda |R_n=o_p(1)\). Note that

$$\begin{aligned} 0=g(\lambda )&=\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2-\sigma ^2)\frac{1}{1+\gamma _i} =\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2-\sigma ^2)(1-\gamma _i+\frac{\gamma _i^2}{1+\gamma _i}) \\&=\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2-\sigma ^2)-\lambda S_{\sigma ^2} +\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2-\sigma ^2)\frac{\gamma _i^2}{1+\gamma _i}. \end{aligned}$$

By (5.26) and (5.27), it is easy to derive that \(\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2-\sigma ^2)\frac{\gamma _i^2}{1+\gamma _i} =\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2-\sigma ^2)^2\lambda ^2(\sigma _{J_i}^2-\sigma ^2)\frac{1}{1+\gamma _i} =o_p(n^{-1/2})\). Therefore

$$\begin{aligned} \lambda S_{\sigma ^2}=\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2-\sigma ^2)+o_p(n^{-1/2}). \end{aligned}$$

Denote \(\lambda = S_{\sigma ^2}^{-1}\frac{1}{n}\sum _{i=1}^n(\sigma _{J_i}^2-\sigma ^2)+\phi _n\), where \(|\phi _n|=o_p(n^{-1/2})\). Let \(\eta _i=\sum _{k=3}^\infty \frac{(-1)^{k-1}}{k}\gamma _i^k\), then \(\eta _i=O(\gamma _i^3)\), which implies \( |\sum _{i=1}^n\eta _i|\le C\sum _{i=1}^n|\gamma _i|^3 =C\sum _{i=1}^n|\lambda ^2(\sigma _{J_i}^2-\sigma ^2)^2\gamma _i| \le Cn\lambda ^2S_{\sigma ^2}\max _{1\le i\le n}|\gamma _i|=o_p(1)\). Hence

$$\begin{aligned} l(\sigma ^2)&=2\sum _{i=1}^n\gamma _i-\sum _{i=1}^n\gamma _i^2+2\sum _{i=1}^n\eta _i= 2\lambda n(\hat{\sigma }_J^2-\sigma ^2)-n\lambda ^2S_{\sigma ^2}+2\sum _{i=1}^n\eta _i \\&=2n(\hat{\sigma }_J^2-\sigma ^2)[S_{\sigma ^2}^{-1}(\hat{\sigma }_J^2-\sigma ^2)+\phi _n] -nS_{\sigma ^2}[S_{\sigma ^2}^{-1}(\hat{\sigma }_J^2-\sigma ^2)+\phi _n]^2+2\sum _{i=1}^n\eta _i \\&=nS_{\sigma ^2}^{-1}(\hat{\sigma }_J^2-\sigma ^2)^2-nS_{\sigma ^2}\phi _n^2+2\sum _{i=1}^n\eta _i \\&=nS_{\sigma ^2}^{-1}(\hat{\sigma }_J^2-\sigma ^2)^2+o_p(1). \end{aligned}$$

Finally, together with Theorem 3.1, we finish the proof of Theorem 3.2. \(\square \)