Penalized empirical likelihood for partially linear errors-in-variables panel data models with fixed effects

He, Bang-Qiang; Hong, Xing-Jian; Fan, Guo-Liang

doi:10.1007/s00362-018-1049-2

Penalized empirical likelihood for partially linear errors-in-variables panel data models with fixed effects

Regular Article
Published: 28 September 2018

Volume 61, pages 2351–2381, (2020)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Statistical Papers Aims and scope Submit manuscript

Penalized empirical likelihood for partially linear errors-in-variables panel data models with fixed effects

Download PDF

421 Accesses
4 Citations
Explore all metrics

Abstract

For the partially linear errors-in-variables panel data models with fixed effects, we, in this paper, study asymptotic distributions of a corrected empirical log-likelihood ratio and maximum empirical likelihood estimator of the regression parameter. In addition, we propose penalized empirical likelihood (PEL) and variable selection procedure for the parameter with diverging numbers of parameters. By using an appropriate penalty function, we show that PEL estimators have the oracle property. Also, the PEL ratio for the vector of regression coefficients is defined and its limiting distribution is asymptotically chi-square under the null hypothesis. Moreover, empirical log-likelihood ratio for the nonparametric part is also investigated. Monte Carlo simulations are conducted to illustrate the finite sample performance of the proposed estimators.

An empirical likelihood check with varying coefficient fixed effect model with panel data

Article 29 June 2021

Empirical likelihood for semi-varying coefficient models for panel data with fixed effects

Article 05 February 2016

Estimation for the multi-way error components model with ill-conditioned panel data

Article 16 June 2016

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

The analysis of panel data is the subject of one of the most active and innovative bodies of literature in econometrics. Panel data sets have various advantages over that of pure time-series or cross-sectional data sets, among which the most important one is perhaps that the panel data provide researchers a flexible way to model both heterogeneity among cross-sectional units and possible structural changes over time. Arellano (2003), Baltagi (2005) and Hsiao (2003) provided excellent overviews of statistical inference and econometric analysis of parametric panel data models. However, a misspecified parametric panel data model may result in misleading inference. Therefore, econometricians and statisticians have developed some flexible nonparametric and semi-parametric panel data models. For example, Su and Ullah (2007) proposed a class of two-step estimators for nonparametric panel data with random effects. Cai and Li (2008) studied dynamic nonparametric panel data models. Henderson et al. (2008) considered nonparametric panel data model with fixed effects. Rodriguez-Poo and Soberon (2014) considered varying coefficient fixed effects panel data models, established direct semiparametric estimations. Chen et al. (2013) studied partially linear single-index panel data models with fixed effects, proposed a dummy variable method to remove fixed effects and established a semi-parametric minimum average variance estimation procedure. Baltagi and Li (2002) discussed partially linear panel data models with fixed effects, developed the series estimation procedure and the profile likelihood estimation technique. Hu (2014) proposed the profile likelihood procedure to estimate semi-varying coefficient model for panel data with fixed effects. The partially linear panel data models with fixed effects are widely used in econometric analysis; see, e.g., Henderson et al. (2008), Horowitz and Lee (2004), Hu (2017) and Li et al. (2011).

In this paper, we consider the following partially linear panel data models with fixed effects (e.g. Su and Ullah 2006):

$$\begin{aligned} Y_{it}=X^\tau _{it}\beta +g(Z_{it})+\mu _{i}+\varepsilon _{it},~~i=1,\ldots ,n,~~t=1,\ldots ,T, \end{aligned}$$

(1.1)

where $Y_{it}$ is the response, $(X_{it},Z_{it})\in R^p\times R$ are strictly exogenous variables, ${\beta }=(\beta _1,\ldots ,\beta _p)^\tau $ is a vector of p-dimensional unknown parameters, and the superscript $\tau $ denotes the transpose of a vector or matrix. $g(Z_{it})$ is a unknown functions and $\mu _{i}$ is the unobserved individual effects, $\varepsilon _{it}$ is the random model error. Here, we assume $\varepsilon _{it}$ to be i.i.d. with zero mean and finite variance $\sigma ^2>0$. We allow $\mu _i$ to be correlated with $X_{it}$, and $Z_{it}$ with an unknown correlation structure. Hence, model (1.1) is a fixed effects model.

It is well known that in many fields, such as engineering, economics, biology, biomedical sciences and epidemiology, observations are measured with error. For example, urinary sodium chloride level (Wang et al. 1996) and serum cholesterol level (Carroll et al. 1995) are often subjects to measurement errors. Simply ignoring measurement errors (errors-in-variables), known as the naive method, would result in biased estimators. Handing the measurement errors in covariates is generally a challenge for statistical analysis. For the past two decades, errors-in-variables can be handled by means of corrected score function (Nakamura 1990), corrected likelihood method (Hanfelt and Liang 1997), the instrumental variables estimation approach (Schennach 2007) and so on.

Specifically, we consider the following partially linear errors-in-variables panel data models with fixed effects

$$\begin{aligned} \left\{ \begin{aligned} Y_{it}&= X_{it}^{\tau }\beta +g(Z_{it})+\mu _{i}+\varepsilon _{it}, \\ W_{it}&=X_{it}+\nu _{it}, \end{aligned} \right. \quad i=1,\ldots ,n;~ t=1, \ldots ,T. \end{aligned}$$

(1.2)

where the covariate variables $X_{it}$ are measured with additive error and are not directly observable. Instead, $X_{it}$ are observed $W_{it}=X_{it}+\nu _{it}$, where the measurement errors $\nu _{it}$ are independent and identically distributed, independent of $(X_{it},Z_{it},\varepsilon _{it})$, with mean zero and covariance matrix $\Sigma _{\nu }$. We will assume that $\Sigma _{\nu }$ is known, as in the papers of Zhu and Cui (2003) and You and Chen (2006) and other. When $\Sigma _{\nu }$ is unknown, we can estimate it by repeatedly measuring $W_{it}$; see Liang et al. (1999) and Fan et al. (2013) for details.

It is well-known that high-dimensional data analysis arises frequently in many contemporary statistical studies. The emergence of high-dimensional data, such as the gene expression values in microarray, brings challenges to many traditional statistical methods and theory. One important aspect of the high-dimensional data under the regression setting is that the number of covariates is diverging. When dimensionality diverges, variable selection through regularization has proven to be effective. As argued in Hastie et al. (2009) and Fan and Lv (2008), penalized likelihood can properly adjust the bias-variance trade-off so that the performance improvement can be achieved; Various powerful penalization methods have been developed for variable selection. Fan and Li (2001) proposed a unified approach via nonconcave penalized least squares to automatically and simultaneously select variables. Li and Liang (2008) developed the nonconcave penalized quasilikelihood method for variable selection in semiparametric regression model. Recently, a new and efficient variable selection approach, PEL introduced for the first time by Tang and Leng (2010), was applied to analyze mean vector in multivariate analysis and regression coefficients in linear models with diverging number of parameters. As demonstrated in Tang and Leng (2010), the PEL has merits in both efficiency and adaptivity stemming from a nonparametric likelihood method. Also, the PEL method possesses the same merit of the empirical likelihood (EL) which only uses the data to determine the shape and orientation of confidence regions and without estimating the complex covariance. As far as we know, there are a few papers related to the PEL approach, such as Ren and Zhang (2011) proposed the PEL approach for variable selection in moment restriction models; Leng and Tang (2012) applied the PEL approach to parametric estimation and variable selection for general estimating equations; Wang and Xiang (2017) studied PEL inference for sparse additive hazards regression with a diverging number of covariates.

It is worth pointing out that there is no result available in the literature when the number of covariates is diverging in partially linear errors-in-variables panel data models with fixed effects. In this paper, our aim is to extend the results in Fan et al. (2016) for high-dimensional partially linear varying coefficient model with measurement errors to partially linear error-in-variables panel data models with fixed effects. Our contribution can be summarized as follows. Following the estimation procedure proposed by Fan et al. (2005), we first adapt a local linear dummy variable approach to remove the unknown fixed effects. Moreover, we utilize the EL method to construct confidence regions of unknown parameter and establish asymptotic normality of maximum empirical likelihood (MEL) estimator of the parameter. At last, for building sparse models, we propose an estimating equation-based PEL, a unified framework for variable selection in optimally combining estimating equations. More specifically, this method has the oracle property. Moreover, PEL ratio statistic shows the Wilks’ phenomenon, facilitating hypothesis testing and constructing confidence regions.

The layout of the remainder of this paper is as follows. In Sect. 2, we construct corrected-attenuation EL ratio and test statistic as well as define the MEL and PEL estimators of the parameter and give their asymptotic properties. Moreover, empirical log-likelihood ratio for the nonparametric part is also investigated. Finally, we briefly introduce computational algorithm. The simulated example is provided in Sect. 3. Section 4 summarizes some conclusions and discusses future research. Assumption conditions and the proofs of the asymptotic results are given in Appendix.

2 Methodology and main results

2.1 Modified empirical likelihood

We give vector and matrix notations in the following. Let $Y=(Y_1^{\tau },\ldots ,Y_n^{\tau })^{\tau }$, $X=(X_1^{\tau },\ldots , X_n^{\tau })^{\tau }$, $Z=(Z_1^{\tau }, \ldots ,Z_n^{\tau })^{\tau }$, $\mu _0=(\mu _1^{\tau },\ldots ,\mu _n^{\tau })^{\tau }$ and $\varepsilon =(\varepsilon _1^{\tau },\ldots ,\varepsilon _n^{\tau })^{\tau }$ be $nT\times 1$ vectors, where $Y_i=(Y_{i1},\ldots ,Y_{iT})^{\tau }$, $X_i=(X_{i1},\ldots ,X_{iT})^{\tau }$, $Z_i=(Z_{i1},\ldots ,Z_{iT})^{\tau }$, $\varepsilon _i=(\varepsilon _{i1},\ldots ,\varepsilon _{iT})^{\tau }$, and $D_0=I_{n}\otimes i_T $ with $\otimes $ the Kronecker product, $I_n$ denotes the $n\times n$ identity matrix, and $i_n$ denotes the $n\times 1$ vector of ones. We rewrite model (1.1) in a matrix format which yields

$$\begin{aligned} Y=X\beta +g(Z)+D_0\mu _0+\varepsilon . \end{aligned}$$

(2.1)

For the identification purpose, we impose the restriction $\sum ^n_{i=1}\mu _i=0$. Letting $D=[-i_{n-1}\quad I_{n-1}]\otimes i_T $ and $\mu =(\mu _2,\ldots ,\mu _n)^\tau $, model (2.1) can then be rewritten as

$$\begin{aligned} Y=X\beta +g(Z)+D\mu +\varepsilon . \end{aligned}$$

(2.2)

Let $G_{it}(z,h)=(1, (Z_{it}-z)/h)^{\tau }$, $K_h(z)=K(\cdot /h)/h$ with a kernel function $K(\cdot )$ and a bandwidth h. The diagonal matrices

$$\begin{aligned} K_h(Z_i,z)= & {} \left[ \begin{array}{ccc} K_h(Z_{i1},z) &{}\quad \cdots &{}\quad 0\\ \vdots &{}\quad \ddots &{}\quad \vdots \\ 0&{}\quad \cdots &{}\quad K_h(Z_{iT},z)\\ \end{array}\right] ,~~\\ W_h(u)= & {} \left[ \begin{array}{ccc} K_h(Z_{1},z) &{}\quad \cdots &{}\quad 0\\ \vdots &{}\quad \ddots &{}\quad \vdots \\ 0&{}\quad \cdots &{}\quad K_h(Z_{n},z)\\ \end{array}\right] , \end{aligned}$$

and $\zeta =(\mu ^\tau ,\beta ^\tau )^\tau $. Given $\zeta $, we can estimate the functions $g(\cdot )=(g(z),\{hg'(z)\}^{\tau })^{\tau }$ by

$$\begin{aligned} \sum _{i=1}^n\sum _{t=1}^T\bigg \{\Big (Y_{it}-X^\tau _{it}\beta -\mu _{i}\Big )-\Big [g(z)+hg'(z)(Z_{it}-z)\Big ]\bigg \}^2K_{h}(Z_{it}-z). \end{aligned}$$

(2.3)

Let $G(z,h)=[G_1^{\tau }(z,h),\ldots , G_n^{\tau }(z,H)]^{\tau }$, $G_i(z,h)=(G_{i1}(z,h),\ldots , G_{iT}(z,h))^{\tau },$ $g'(z)=\partial g(z)/\partial z$ and $z=z_{it}$ is in a neighborhood of $Z_{it}$. Then the solution of problem (2.3) is given by

$$\begin{aligned} g(\cdot )= & {} [G^\tau (z,h) W_H(u)G(z,h)]^{-1}G^\tau (z,h) W_H(u)(Y-X\beta -D\mu )\\= & {} S(z,H)(Y-X\beta -D\mu ). \end{aligned}$$

In particular, the estimator for g(z) is given by

$$\begin{aligned} \hat{g}(z)=s(z,h)(Y-X\beta -D\mu ). \end{aligned}$$

(2.4)

where $s(z,h)=(1~~0)S(z,h)$.

Now we consider a way of removing the unknown fixed effects motivated by a least squares dummy variable model in parametric panel data analysis, for which we solve the following optimization problem:

$$\begin{aligned} \hat{\zeta }=\mathrm{arg}\min _{\zeta }\big [Y-X\beta -D\mu -S(Y-X\beta -D\mu )]^\tau [Y-X\beta -D\mu -S(Y-X\beta -D\mu )], \end{aligned}$$

(2.5)

where the smoothing matrix S is

$$\begin{aligned} S= & {} \left( \begin{array}{ccc} (1~~0)[G^\tau (Z_{11},h)W_{11}(z,h)G({Z_{11},h})]^{-1}G^\tau (Z_{11},h)W_{11}(z,h)\\ \vdots \\ (1~~0)[G^\tau (Z_{1T},h)W_{1T}(z,h)G(Z_{1T},h)]^{-1}G^\tau (Z_{11},h)W_{1T}(z,h)\\ \vdots \\ (1~~0)[G^\tau (Z_{nT},h)W_{nT}(z,h)G(Z_{nT},h)]^{-1}G^\tau (Z_{nT},h)W_{nT}(z,h)\\ \end{array}\right) \\= & {} \left( \begin{array}{ccc} S_{11} \\ \vdots \\ S_{1T} \\ \vdots \\ S_{nT} \end{array}\right) . \end{aligned}$$

Supposing that $\widetilde{X}=(I_{nT}-S)X$, $\widetilde{Y}=(I_{nT}-S)Y$, $\widetilde{D}=(I_{nT}-S)D$, we have $\widetilde{\mu }=(\widetilde{D}^{\tau }\widetilde{D})^{-1}\widetilde{D}^{\tau }(\widetilde{Y}-\widetilde{X}\beta )$. Let $H=I_{nT}-\widetilde{D}(\widetilde{D}^{\tau }\widetilde{D})^{-1}\widetilde{D}^{\tau },$ we can obtain $H\widetilde{D}\mu =0$. Hence, the fixed effects term $D\mu $ is eliminated in (2.3). Let $e_{it}$ be the $nT\times 1$ vector with its $\{(i-1)T+t\}$th element being 1 and others 0. We state the approximate residuals as the following:

$$\begin{aligned} \Lambda _i(\beta )=\sum _{t=1}^T\widetilde{X}_{it}^{\tau }H(\widetilde{Y}_{it}-\widetilde{X}_{it}\beta ), ~~~i=1,\ldots ,n. \end{aligned}$$

(2.6)

However, $X_{it}$’s can not be observed in our case and we just have $W_{it}$. If we ignore the measurement error and replace $X_{it}$ with $W_{it}$ in (2.4) directly, (2.4) can be used to show that the resulting estimate is inconsistent. It is well known that in linear regression or partially linear regression, inconsistency caused by the measurement error can be overcome by applying the so-called “correction for attenuation”, see Liang et al. (1999) and Hu et al. (2009) for more details.

$$\begin{aligned} \Gamma _i(\beta )=\sum _{t=1}^T\widetilde{W}_{it}^{\tau }H(\widetilde{Y}_{it}-\widetilde{W}_{it}\beta )-(T-1)\Sigma _{\nu }\beta , ~~~i=1,\ldots ,n. \end{aligned}$$

(2.7)

Note that $E(\Gamma _i(\beta ))=0$, if $\beta $ is the true parameter. Therefore, similar to Owen (1990), we define a corrected-attenuation empirical likelihood (CAEL) ratio of $\beta $ as.

$$\begin{aligned} R_{n}(\beta )=-\max \left\{ \sum _{i=1}^n\ln (np_{i})| p_{i}\ge 0,\sum _{i=1}^np_{i}=1,\sum _{i=1}^np_{i} \Gamma _i(\beta )=0\right\} . \end{aligned}$$

(2.8)

With the assumption that 0 is inside the convex hull of the point $(\Gamma _1(\beta ),\ldots ,\Gamma _n(\beta ))$, a unique value for $R_{n}(\beta )$ exists. By the Lagrange multiplier method, one can obtain that

$$\begin{aligned} R_{n}(\beta )=\sum _{i=1}^n\ln \{1+\gamma ^\tau \Gamma _i(\beta )\}, \end{aligned}$$

(2.9)

where $\gamma $ is determined by

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\frac{\Gamma _i(\beta )}{1+\gamma ^\tau \Gamma _i(\beta )}=0. \end{aligned}$$

(2.10)

Theorem 2.1

Suppose that the conditions of (B1)–(B7) in the Appendix hold. Further assume that $E(\varepsilon ^3|X,Z)=0$ almost surely or $k\ge 8$. If $\beta _0$ is the true value of the parameter vector and ${\mathop {\rightarrow }\limits ^{d}}$ stands for convergence in distribution, $p^{3+2/(k-2)}/n\rightarrow 0$ as $n\rightarrow \infty $, then $(2p)^{-1/2}(2R_{n}(\beta _0)-p){\mathop {\rightarrow }\limits ^{d}}N(0,1)$.

Define $\hat{\beta }_{ME}=\arg \min _{\beta }R_{n}(\beta )$, which is the MEL estimator of the parameter $\beta $.

Theorem 2.2

Under the conditions of Theorem 2.1, we have

$$\begin{aligned} \sqrt{n}A_n\Omega ^{-1/2}(\widehat{\beta }_{ME}-\beta _0){\mathop {\rightarrow }\limits ^{d}}N(0,\Delta ). \end{aligned}$$

$A_n$ represents a projection of the diverging dimensional vector to a fixed dimension s, and $A_n$ is a $s\times p$ matrix such that $A_nA_n^{\tau }\rightarrow \Delta $,$\Delta $ is a $s\times s$ nonnegative symmetric matrix with fixed s, and $\Omega =\Sigma _0^{-1}\Sigma _1\Sigma _0^{-1}$, $\Sigma _1=(T-1)\big \{E(\varepsilon _{11}-\nu _{11}\beta _0)^2\Sigma _2 +\sigma ^2\Sigma _{\nu }+ E[(\nu _{11}\nu _{11}^{\tau }-\Sigma _{\nu })\beta _0]^2\big \}$, $\Sigma _2=E\{[{X_{11}}-E({X_{11}}|Z_{11})]^{\tau }[{X_{11}}-E({X_{11}}|Z_{11})]\}$ and $\Sigma _0=(T-1)\Sigma _2$.

Further, $\widehat{\Sigma }_2^{-1}\widehat{\Sigma }_1\widehat{\Sigma }_2^{-1}$ is a consistent estimator of $\Sigma _2^{-1}\Sigma _1\Sigma _2^{-1}$ where $\widehat{\Sigma }_2=\frac{1}{n}\widetilde{W}^{\tau }H\widetilde{W}-(T-1)\Sigma _{\nu }$ and $\widehat{\Sigma }_1=\{\frac{1}{n}\sum _{i=1}^n\sum _{t=1}^T[\widetilde{W}_{it}^{\tau }H(\widetilde{Y}_{it}-\widetilde{W}_{it}\widehat{\beta })]+(T-1)(\Sigma _{\nu }\widehat{\beta })\}^{\oplus 2}$ and $\mathbf {A}^{\oplus 2}$ means $\mathbf {A}\mathbf {A}^{\tau }$. By Theorem 2.2, we obtain that

$$\begin{aligned} (\widehat{\beta }_{ME}-\beta _0)^{\tau }n[\widehat{\Sigma }_2^{-1}\widehat{\Sigma }_1\widehat{\Sigma }_2^{-1}]^{-1}(\widehat{\beta }_{ME}-\beta _0){\mathop {\rightarrow }\limits ^{d}}\chi _s^2. \end{aligned}$$

2.2 Penalized empirical likelihood for variable selection

We use the PEL by combining the profile likelihood method and the smoothly clipped absolute deviation (SCAD) penalized approach. The SCAD penalty is defined in terms of its first derivative.

We define the penalized empirical likelihood (PEL) as follows,

$$\begin{aligned} \mathcal {L}_{n}(\beta )= R_{n}(\beta )+n\sum _{j=1}^pp_{\lambda }(|\beta _j|), \end{aligned}$$

(2.11)

where $p_{\lambda }(\cdot )$ is a penalty function with tuning parameter $\lambda $. See Fan and Li (2001) for example of this function. In this paper, we use the smoothly clipped absolute deviation penalty, whose first derivative satisfies

$$\begin{aligned} p_{\lambda }'(\theta )=\theta \Big \{I(\theta \le \lambda )+\frac{(a\lambda -\theta )_{+}}{(a-1)\lambda }I(\theta>\lambda )\Big \},~~\theta>0,~\lambda >0, \end{aligned}$$

(2.12)

for some $a>2$ and $p_{\lambda }'(0)=0$. Following Fan and Li (2001), we set $a=3.7$ in our work.

Maximizing the PEL function (2.8) is equivalent to minimizing

$$\begin{aligned} \mathcal {L}_{n}(\beta )= \sum _{i=1}^n\ln \{1+\gamma ^\tau \Gamma _i(\beta )\}+n\sum _{j=1}^pp_{\lambda }(|\beta _j|), \end{aligned}$$

(2.13)

Let $\mathcal {B}=\{j:\beta _{0j}\}$ be the set of nonzero components of the true parameter vector $\beta _0$ and its cardinality $|\mathcal {B}|=d$ where d is allowed grow as $n\rightarrow \infty $. Without loss of generality, one can partition the parameter vector as $\beta =(\beta _1^{\tau },\beta _2^{\tau })^{\tau }$ where $\beta _1\in \mathbb {R}^d$ and $\beta _2\in \mathbb {R}^{p-d}$. Hence, the true parameter $\beta _0=(\beta _{10}^{\tau },0^{\tau })^{\tau }$ and we write $\widehat{\beta }=(\widehat{\beta }_1^{\tau },\widehat{\beta }_2^{\tau })^{\tau }$ called PEL estimator which is the minimizer of (2.13). The matrix $\Omega $ can be decomposed as a block matrix according to the arrangement of $\beta _0$ as $\Omega =\left( \begin{array}{cc} \Omega _{11}&{} \Omega _{12}\\ \Omega _{21} &{} \Omega _{22} \end{array}\right) .$

Theorem 2.3

Suppose that Assumptions (B1)–(B9) hold. If $p^5/n\rightarrow 0$, then with probability tending to 1, the PEL estimator $\widehat{\beta }$ satisfies

(a)
(Sparity): $\widehat{\beta }_2=0$;
(b)
(Asymptotic normality): $n^{1/2}W_n\Omega _p^{-1/2}(\widehat{\beta }_1-\beta _{10}){\mathop {\rightarrow }\limits ^{d}}N(0,G)$, where $W_n\in R^{q \times d}$ such that $ W_n W_n^{\tau }\rightarrow G$ for $G\in R^{q\times q}$ with fixed q and $\Omega _p=\Omega _{11}-\Omega _{12}\Omega _{22}^{-1}\Omega _{21}$.

A remarkable advantage of PEL lies in testing hypotheses and constructing confidence regions for $\beta $. To understand this more clearly, we consider the problem of testing linear hypothesis:

$$\begin{aligned} H_0:L_n\beta _{10}=0~~~vs~~~H_1:L_n\beta _{10}\ne 0 \end{aligned}$$

where $L_n$ is $q\times s$ matrix such that $L_nL_n^{\tau }=I_q$ for a fixed and finite q. A nonparametric profile likelihood ratio statistic is constructed as

$$\begin{aligned} \widetilde{\mathcal {L}}_{n}(\hat{\beta })=-\left\{ \mathcal {L}_{n}(\hat{\beta })-\mathop {\min }\limits _{\beta ,L_n\beta _{10}=0}\mathcal {L}_{n}(\beta )\right\} . \end{aligned}$$

We summarize the property of the test statistic in the following theorem.

Theorem 2.4

Under the conditions of Theorem 2.3. Then under the null hypothesis $H_0$, we have

$$\begin{aligned} 2\widetilde{\mathcal {L}}_{n}(\hat{\beta }){\mathop {\rightarrow }\limits ^{d}}\chi _q^2, ~~as~ n\rightarrow \infty . \end{aligned}$$

As a consequence of the theorem, confidence regions for the parameter $\beta $ can be constructed. More precisely, for any $ 0\le \alpha <1$, let $c_{\alpha }$ be such that $P(\chi _q^2>c_{\alpha })\le 1-\alpha $. Then $\ell _{PEL}(\alpha )=\{\beta \in R^p:\widetilde{\mathcal {L}}_{n}({\beta })\le c_{\alpha }\}$ constitutes a confidence region for $\beta $ with asymptotic coverage $\alpha $ because the event that $\beta $ belongs to $\ell _{PEL}(\alpha )$ is equivalent to the event that $\widetilde{\mathcal {L}}_{n}({\beta })\le c_{\alpha }$.

2.3 Empirical likelihood for the nonparametric part

For model (2.2), we solve the the following optimization problem:

$$\begin{aligned} \hat{\mu }=\mathrm{arg}\min _{\mu }\big [Y-X\beta -g(Z)-D\mu ]^\tau [Y-X\beta -g(Z)-D\mu ], \end{aligned}$$

we have $\mu =({D}^{\tau }{D})^{-1}{D}^{\tau }[Y-X\beta -g(Z)]$. For given $\beta $ and z, an auxiliary random vector for nonparametric part can be stated as

$$\begin{aligned} \Xi _{i}\{g(z)\}=\sum _{t=1}^T(I_{it}-Q)[Y_{it}-X_{it}\beta -g(z)]K_h(Z_{it}-z),~~~i=1,\ldots ,n. \end{aligned}$$

where $Q=D({D}^{\tau }{D})^{-1}{D}^{\tau }$. Note that $E[\Xi _{i}\{g(z)\}]=0$ if g(z) is the true parameter. Thus we can define an empirical log-likelihood ratio statistic for g(z) by using the methodology in Owen (1988). We introduce an adjusted auxiliary random vector for g(z) as follows.

$$\begin{aligned}&\hat{\Xi }_{i}\{g(z)\}=\sum _{t=1}^T(I_{it}-Q)[Y_{it}-X_{it}\hat{\beta }-g(z)-\{\hat{g}(Z_{it})-\hat{g}(z)\}]K_h(Z_{it}-z),\\&\quad i=1,\ldots ,n. \end{aligned}$$

By the adjustment in $\hat{\Xi }_{i}\{g(z)\}$, we not only correct the bias, but also avoid undersmoothing the function g(z), as proved in the Appendix. The adjusted empirical log-likelihood ratio for g(z) can be defined as

$$\begin{aligned} \mathcal {Q}_{n}(g(z))=-\max \left\{ \sum _{i=1}^n\ln (n\tilde{p}_{i})| \tilde{p}_{i}\ge 0,\sum _{i=1}^n\tilde{p}_{i}=1,\sum _{i=1}^n\tilde{p}_{i} \hat{\Xi }_{i}\{g(z)\}=0\right\} . \end{aligned}$$

By the Lagrange multiplier method, one can obtain that

$$\begin{aligned} \mathcal {Q}_{n}(g(z))=\sum _{i=1}^n\ln \{1+\phi ^\tau \hat{\Xi }_{i}\{g(z)\}\}, \end{aligned}$$

where $\phi $ is determined by

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\frac{\hat{\Xi }_{i}\{g(z)\}}{1+\phi ^\tau \hat{\Xi }_{i}\{g(z)\}}=0. \end{aligned}$$

Theorem 2.5

Suppose that the conditions of (B1)–(B9) in the Appendix hold. For a given $z\in \mathcal {Z}$, if g(z) is the true value of the parameter, then

$$\begin{aligned} 2\mathcal {Q}_{n}(g(z)){\mathop {\rightarrow }\limits ^{d}}\chi _1^2. \end{aligned}$$

2.4 Computational algorithm

This section employs the local quadratic approximation algorithm to obtain the minimizer of PEL ratio defined by (2.13). Specifically, for each $j=1,\ldots , p, [p_{\lambda }(|\beta _j|)]'$ can be locally approximated by the quadratic function defined as $[p_{\lambda }(|\beta _j|)]'=p'_{\lambda }(|\beta _j|)sgn(\beta _j)\approx \{p'_{\lambda }(|\beta _{j0}|/|\beta _{j0}|)\}\beta _j$ at an initial value $\beta _{j0}$ of $\beta _j$ is not close to 0; otherwise, we set $\hat{\beta }_j=0$. In other words, in a neighborhood of a given nonzero $\beta _{j0}$, we assume that $p_{\lambda }(|\beta _{j}|)\approx p_{\lambda }(|\beta _{j0}|)+\frac{1}{2}\{p'_{\lambda }(|\beta _{j0}|/|\beta _{j0}|\}(\beta _j^2-\beta _{j0}^2)$. We then make use of algorithm (see Owen, 2001) to obtain the minimum through nonlinear optimization. The procedure is repeated until convergence.

We apply the following Bayesian information criterion (BIC) to select the tuning parameter $\lambda $, which is defined by

$$\begin{aligned} BIC(\lambda )=-2\mathcal {L}_{n}(\beta _{\lambda })+\ln (n)df_{\lambda }, \end{aligned}$$

where $df_{\lambda }$ is the number of nonzero estimated parameters. Then the optimal tuning parameter is the minimizer of the BIC.

3 Simulation studies

In this section, we carry out some simulation to study the finite sample performance of our proposed method. Throughout this section, we choose the Epanechnikov kernel $K(u)=\frac{3}{4}(1-u^2)I\{|u|\le 1\}$ and use the “leave-one-subject-out” cross-validation bandwidth method to select the optimal handwidth $h_{opt}$.

Firstly, we consider the following partially linear errors-in-variables panel data models with fixed effects:

$$\begin{aligned} \left\{ \begin{aligned} Y_{it}&= X_{it}^{\tau }\beta +g(Z_{it})+\mu _{i}+\varepsilon _{it}, \\ W_{it}&=X_{it}+\nu _{it} \end{aligned} \right. \quad i=1,\ldots ,n;~ t=1, \ldots ,T. \end{aligned}$$

(3.1)

where $\beta =(3,1.5,0,0,2,0,\ldots ,0)^{\tau }$, $g(Z_{it})=\cos (2\pi Z_{it})$, $Z_{it}{\sim }U(0,1)$, $\mu _i=\frac{1}{2}\bar{Z_i}+w_i$ and $w_i{\sim }N(0,0.1^2)$ for $i=1,2,\ldots ,n$, and $ \bar{Z_i}=\frac{1}{T}\sum _{t=1}^{T}Z_{it}.$ The measurement error $\nu _{it}{\sim }N(0,\Sigma _{\nu })$ where we take $\Sigma _{\nu }=0.2^2I_{10}$ and $0.4^2I_{10}$ to represent different levels of measurement error. The covariate $X_{it}$ is a p-dimensional normal distribution random vector with mean zero and covariance matrix cov$(X_{it},X_{jt})=0.5^{|i-j|}$.

Table 1 Comparison of coverage probability for MEL, PEL, NMEL and NPEL

Full size table

In our simulations, we take p as the integer part of $10(6n)^{1/5.1}-20$ and the sample sizes $(n,T)=(50,4), (50,6)$ and (100, 6), respectively. In order to show the performance of the proposed methods, we compare MEL and PEL estimators with the native maximum empirical likelihood (NMEL) and native penalized empirical likelihood (NPEL) estimators that the neglecting the measurement errors with a direct replacement of X by W in our proposed estimators. In each case the number of simulated realizations is 500.

Seen from Table 1, when the nominal level is 0.9 and 0.95, shows the coverage probability of confidence region for the whole $\beta $ constructed by MEL and PEL method, respectively. From the results, we can see that the PEL confidence region has slightly higher coverage probability than the NEL confidence region, and the coverage probability tends to the nominal level as the sample size increases.

From Table 2, we can see the average model errors (ME) and the standard deviations (SD) of the $\beta _1$ that is nonzero components of $\beta $. based on PEL and MEL estimators decreases as the sample size increases and the PEL estimator gives the smallest ME and SD among the estimators based on PEL, MEL, NPEL and NMEL methods for all settings. The ME is defined as $ME(\hat{\beta _1})=(\hat{\beta _1}-\beta _1)^{\tau }E(X^{\tau }X)(\hat{\beta _1}-\beta _1).$

Table 3 summaries the variable selection results, where important variable have large effects. The column labeled “C” gives average number of correct zeros and column labeled “I” gives the average number of incorrect zeros. From Table 3, it can be seen that variable selection method based on the PEL select all three true predictors and the average number of correct zeros are close to $p-5$ in all settings. Further the smaller measurement errors lead to better performance. It can also be seen that the PEL approach perform better than the NPEL method for all settings. These findings imply that the model selection result based on the PEL approach effectively reduces model complexity and the selected model is very close to the true model in terms of nonzero coefficients.

Table 2 ME and SD of $\beta _1$ for MEL, PEL, NPEL and NMEL estimators

Full size table

Table 3 Simulation results for variable selection selection based on the PEL and NPEL methods

Full size table

From Fig. 1, We see that the method based on the EL performs slighter better than the NA method since the EL method gives shorter confidence intervals than the NA method which is shown in Theorem 4 in Xue and Zhu (2008). Besides, interestingly, seen from Fig. 1, $\Sigma _v=0.2$ gives shorter confidence intervals and narrower confidence bands than $\Sigma _v=0.4$ for g(z). This shows the empirical likelihood ratio generally works well.

4 Conclusion remarks

The partially linear panel data models with fixed effects has received a lot of attention. But there have been few studies about partially linear errors-in-variables panel data models with fixed effects. We apply empirical likelihood both for parameter and nonparametric parts. Moreover, we propose PEL and variable selection procedure for the parameter with diverging numbers of parameters. By using an appropriate penalty function, we show that PEL estimators has the oracle property. Also, we introduce the PEL ratio statistic to test a linear hypothesis of the parameter and prove it follows an asymptotically chi-square distribution under the null hypothesis. We conduct simulation studies to demonstrate the finite sample performance of our proposed method. Still, more work is needed to extend the method to more complex settings, including errors-in-function, cross-sectional dependence and spatial panel data model. The results presented in this paper provide the foundation for additional work in these directions.

References

Arellano M (2003) Panel data econometrics. Oxford University Press, New York
MATH Google Scholar
Baltagi BH (2005) Econometric analysis of panel data, 2nd edn. Wiley, New York
MATH Google Scholar
Baltagi BH, Li D (2002) Series estimation of partially linear panel data models with fixed effects. Ann Econ Fin 3(1):103–116
Google Scholar
Cai Z, Li Q (2008) Nonparametric estimation and varying coefficient dynamic panel data models. Econom Theory 24:1321–1342
MathSciNet MATH Google Scholar
Carroll RJ, Ruppert D, Stefanski LA (1995) Measurement error in nonlinear models. Chapman & Hall, London
MATH Google Scholar
Chen J, Gao J, Li D (2013) Estimation in partially linear single-index panel data models with fixed effects. J Bus Econ Stat 31(3):315–330
MathSciNet Google Scholar
Fan J, Huang T (2005) Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli 11(6):1031–1057
MathSciNet MATH Google Scholar
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
MathSciNet MATH Google Scholar
Fan J, Lv J (2008) Sure independence screening for ultrahigh dimensional feature space. J R Stat Soc Ser B 70(5):849–911
MathSciNet MATH Google Scholar
Fan J, Peng H, Huang T (2005) Semilinear high-dimensional model for normalization of microarrays data: a theoretical analysis and partial consistency. J Am Stat Assoc 100:781–796
MathSciNet MATH Google Scholar
Fan GL, Liang HY, Wang JF (2013) Statistical inference for partially time-varying coefficient errors-in-variables models. J Stat Plan Inference 143:505–519
MathSciNet MATH Google Scholar
Fan GL, Liang HY, Shen Y (2016) Penalized empirical likelihood for high-dimensional partially linear varying coefficient model with measurement errors. J Multivar Anal 147:183–201
MathSciNet MATH Google Scholar
Hall P, Heyde CC (1980) Martingale limit theory and its application. Academic Press, New York
MATH Google Scholar
Hanfelt JJ, Liang KY (1997) Approximate likelihood for generalized linear errors-in-variables models. J R Stat Soc Ser B 59(3):627–637
MathSciNet MATH Google Scholar
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York
MATH Google Scholar
He BQ, Hong XJ, Fan GL (2017) Block empirical likelihood for partially linear panel data models with fixed effects. Stat Probab Lett 123(4):128–138
MathSciNet MATH Google Scholar
Henderson DJ, Carroll RJ, Li Q (2008) Nonparametric estimation and testing of fixed effects panel data models. J Econom 144:257–275
MathSciNet MATH Google Scholar
Horowitz JL, Lee S (2004) Semiparametric estimation of a panel data proportional hazards model with fixed effects. J Econom 119:155–198
MathSciNet MATH Google Scholar
Hsiao C (2003) Analysis of panel data, 2nd edn. Cambridge University Press, New York
MATH Google Scholar
Hu XM (2014) Estimation in a semi-varying coefficient model for panel data with fixed effects. J Syst Sci Complex 27:594–604
MathSciNet MATH Google Scholar
Hu XM (2017) Semi-parametric inference for semi-varying coefficient panel data model with individual effects. J Multivar Anal 154:262–281
MathSciNet MATH Google Scholar
Hu XM, Wang Z, Zhao Z (2009) Empirical likelihood for semiparametric varying-coefficient partially linear errors-in-variables models. Stat Probab Lett 79(8):1044–1052
MathSciNet MATH Google Scholar
Leng C, Tang CY (2012) Penalized empirical likelihood and growing dimensional general estimating equations. Biometrika 99(3):703–716
MathSciNet MATH Google Scholar
Li R, Liang H (2008) Variable selection in semiparametric regression modeling. Ann Stat 36(1):261–286
MathSciNet MATH Google Scholar
Li DG, Chen J, Gao JT (2011) Non-parametric time-varying coefficient panel data models with fixed effects. Econometrics 14:387–408
MathSciNet MATH Google Scholar
Li G, Lin L, Zhu L (2012) Empirical likelihood for a varying coefficient partially linear model with diverging number of parameters. J Multiva Anal 105:85–111
MathSciNet MATH Google Scholar
Liang H, Härdle W, Carroll RJ (1999) Estimation in a semiparametric partially linear errors-in-variables model. Ann Stat 27:1519–1535
MathSciNet MATH Google Scholar
Nakamura T (1990) Corrected score function for errors-in-variables model: methodoology and application to generalized linear models. Biometrika 77(1):127–137
MathSciNet MATH Google Scholar
Owen AB (1988) Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75:237–249
MathSciNet MATH Google Scholar
Owen AB (1990) Empirical likelihood ratio confidence regions. Ann Stat 18:90–120
MathSciNet MATH Google Scholar
Ren Y, Zhang X (2011) Variable selection using penalized empirical likelihood. Sci China Math 54(9):1829–1845
MathSciNet MATH Google Scholar
Rodriguez-Poo JM, Soberon A (2014) Direct semi-parametric estimation of fixed effects panel data varying coefficient models. Econom J 17(1):107–138
MathSciNet Google Scholar
Schennach SM (2007) Instrumental variable estimation of nonlinear errors-in-variables models. Econometrics 75:201–239
MathSciNet MATH Google Scholar
Su LJ, Ullah A (2006) Profile likelihood estimation of partially linear panel data models with models with fixed effects. Econom Lett 92:75–81
MathSciNet MATH Google Scholar
Su L, Ullah A (2007) More efficient estimation of nonparametric panel data models with random effects. Econom Lett 96(3):375–380
MathSciNet MATH Google Scholar
Tang CY, Leng C (2010) Penalized high-dimensional empirical likelihood. Biometrika 97(4):905–920
MathSciNet MATH Google Scholar
Wang S, Xiang L (2017) Penalized empirical likelihood inference for sparse additive hazards regression with a diverging number of covariates. Stat Comput 27(5):1347–1364
MathSciNet MATH Google Scholar
Wang N, Carroll RJ, Liang KY (1996) Quasi-likelihood and variance functions in measurement error models with replicates. Biometrics 52:423–432
Google Scholar
Xue LG, Zhu LX (2008) Empirical likelihood-based inference in a partially linear model for longitudinal data. Sci China Ser A 51(1):115–130
MathSciNet MATH Google Scholar
You JH, Chen GM (2006) Estimation in a statistical inference in a semiparametric varying-coefficient partially linear errors-in-variables model. J Multivar Anal 97:324–341
MATH Google Scholar
Zhou HB, You JH, Zhou B (2010) Statistical inference for fixed-effects partially linear regression models with errors in variables. Stat Papers 51:629–650
MathSciNet MATH Google Scholar
Zhu LX, Cui HJ (2003) A semiparametric regression model with errors in variables. Scand J Stat 30:429–442
MATH Google Scholar

Download references

Acknowledgements

The authors are grateful to two anonymous referees for providing detailed lists of comments and suggestions which greatly improved the presentation of the paper. This research is supported by the National Social Science Fund of China (18BTJ034).

Author information

Authors and Affiliations

School of Mathematics and Physics, Anhui Polytechnic University, Wuhu, 241000, China
Bang-Qiang He & Guo-Liang Fan
School of Data Sciences, Zhejiang University of Finance & Economics, Hangzhou, 310018, China
Xing-Jian Hong

Authors

Bang-Qiang He
View author publications
You can also search for this author in PubMed Google Scholar
Xing-Jian Hong
View author publications
You can also search for this author in PubMed Google Scholar
Guo-Liang Fan
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Xing-Jian Hong.

Appendix: Proofs of the main results

We use Frobenius norm of a matrix A, defined as $||A||=\{tr(A^{\tau }A)\}^{1/2}$. Before we give the details of the proofs, we present some regularity conditions.

(B1)
The random vector $Z_{it}$ has a continuous density function $f(\cdot )$ with a bounded support $\mathcal {Z}$. $ 0<inf_{z\in \mathcal {Z}}f(\cdot )\le sup_{z\in \mathcal {Z}}f(\cdot )<\infty $.
(B2)
The functions $E(X_{it}|Z_{it}=z)$ and $g(\cdot )$ have two bounded and continuous derivatives on $\mathcal {Z}$.
(B3)
The kernel K(v) is a symmetric probability density function with a continuous derivative on its compact support $\mathcal {Z}$.
(B4)
$(\mu _i,W_{it}, Z_{it}, \varepsilon _{it}),~ i=1,\ldots ,n,~t=1,\ldots ,T$ are i.i.d. $E(\varepsilon |W,Z,\mu )=0$ almost surely. Furthermore, for some integer $k\ge 4,$ $E(||W\varepsilon ||^k)\le \infty ,$ $E(||W||^k)\le \infty ,$ $E(|\varepsilon |^k)\le \infty .$
(B5)
$E|\breve{X}_{it}|^{2+\delta }<\infty $, $\Sigma =E[\breve{X}_{it}\breve{X}_{it}^{\tau }]$ is non-singular, where $\breve{X}_{it}=X_{it}-E(X_{it}|Z_{it}).$
(B6)
The bandwidth h satisfies $h\rightarrow 0$, $Nh^8\rightarrow 0$ and $Nh^2/(logN)^2\rightarrow \infty $ as $n\rightarrow \infty $.
(B7)
$\Sigma _1$ and $\Sigma _2$ are positive definite matrices with all eigenvalues being uniformly bounded away from zero and infinity.
(B8)
Let $\varpi _1=\sum _{t=1}^T\frac{T-1}{T}(X_{it}-E(X_{it}|U_{it}))(\varepsilon _{it}-\nu _{it}\beta _0)$, $\varpi _2=\sum _{t=1}^T\frac{T-1}{T}\nu _{it}\varepsilon _{it}$, $\varpi _3=\sum _{t=1}^T\frac{T-1}{T}(\nu _{it}\nu _{it}^{\tau }-\Sigma _{\nu })\beta _0$ and $\varpi _{sj}, j=1,\ldots ,p$ be the j-th component of $\varpi _{s}.$ For k of condition (B4), there is a positive constant c such that as $n\rightarrow \infty $, $E(||\varpi _{s}/\sqrt{p}||^k)\le c, s=1,2,3$.
(B9)
The $p_{\lambda }(\cdot )$ satisfy $\mathop {max}\limits _{j\in \mathcal {B}}p_{\lambda }'(|\beta _{j0}|)=o((np)^{-1/2})$ and $ \mathop {max}\limits _{j\in \mathcal {B}}p_{\lambda }''(|\beta _{j0}|)=o(p^{-1/2}) $.

Note that the obove conditions are assumed to hold uniformly in $z\in \mathcal {Z}$. Conditions (B1)–(B9) while look a bit lengthy, are actually quite mild and can be easily satisfied. (B1)–(B2) are standard in the literature on local linear/polynomial estimation. B5 implies $E(\varepsilon _{it}|X_i,Z_i,\mu _i)=E(\varepsilon _{it}|X_{it},Z_{it},\mu _{it})=0.$ (B1)–(B5) can be founded in Su and Ullah 2006. (B6) and (B7) have been used in Zhou et al. (2010).

For the convenience and simplicity, let $\vartheta _k=\int z^kK(z)dz$, $c_N=\{log(1/h)/(Nh)\}^{1/2}+h^2$ and $\widetilde{M}_{\widetilde{D}}=\widetilde{D}(\widetilde{D}^{\tau }\widetilde{D})^{-1}\widetilde{D}^{\tau }$

$$\begin{aligned} a_n= & {} \max _{1\le j\le p}\{p_{\lambda }'(\beta _{j0})|,\beta _{j0}\ne 0\},~~~~b_n=\max _{1\le j\le p}\{p_{\lambda }''(\beta _{j0})|,\beta _{j0}\ne 0\},\\ B_n= & {} \{\beta : ||\beta -\beta _0||\ge d_n\},~~~d_n=n^{-1/3-\delta }+a_n,~~~0<\delta <1/6 \end{aligned}$$

Lemma A.1

Suppose that Assumptions (B1)–(B6) hold. Then

$$\begin{aligned}&\displaystyle G^{\tau }(z,h)W_h(z)G(z,h)=Nf(z)\times \begin{pmatrix}1&{}\quad 0\\ 0&{}\quad v_2\end{pmatrix}\{1+O_p(c_N)\},\\&\displaystyle G^{\tau }(z,h)W_h(z)X=Nf(z)E(X|Z)\times (1,0)^{\tau }\{1+O_p(c_N)\}, \end{aligned}$$

Proof

Note that

$$\begin{aligned}&G^{\tau }(z,h)W_h(z)G(z,h)\\&\quad =\left( \begin{array}{cc} \sum _{i=1}^n\sum _{t=1}^TK_h(Z_{it}-z)&{}\quad \sum _{i=1}^n\sum _{t=1}^T(\frac{Z_{it}-z}{h})K_h(Z_{it}-z)\\ \sum _{i=1}^n\sum _{t=1}^T(\frac{Z_{it}-z}{h})K_h(Z_{it}-z) &{}\quad \sum _{i=1}^n\sum _{t=1}^T(\frac{Z_{it}-z}{h})^2K_h(Z_{it}-z) \end{array}\right) . \end{aligned}$$

Each element of the above matrix is in the form of a kernel regression. Similar to the proof of Lemma A.2 in Fan and Huang (2005), we can derive the desired result. $\square $

Lemma A.2

Suppose that Assumptions (B1)–(B6) hold, we have

$$\begin{aligned} E|g(Z_{it})-\sum _{k=1}^n\sum _{l=1}^TS_{kl}g(Z_{kl})|^2=O(h^4). \end{aligned}$$

(A.1)

Proof

Similar to the proof of Lemma 5.1 in He et al. (2017). $\square $

Lemma A.3

Suppose that Assumptions (B1)–(B6) hold, we have

$$\begin{aligned} \frac{1}{N}\widetilde{{W}}^{\tau }H\widetilde{{W}}{\mathop {\rightarrow }\limits ^{d}}\frac{T-1}{T}(\Sigma _2+\Sigma _{\nu }), \end{aligned}$$

where $\Sigma _2=E\{[{X_{11}}-E({X_{11}}|Z_{11})]^{\tau }[{X_{11}}-E({X_{11}}|Z_{11})]\}.$

Proof

By Lemma A.1, we can obtain

$$\begin{aligned}{}[I_q~~0^\tau _q]^{-1}(G^{\tau }(z,h)W_h(z)G(z,h))^{-1}G^{\tau }(z,h)W_h(z)X=E(X|Z)+O_p(c_N). \end{aligned}$$

Then we have

$$\begin{aligned} \widetilde{{X}}= & {} [{X}_{11}-E(X_{11}|Z_{11}),\ldots ,{X}_{1T}-E(X_{1T}|Z_{1T}),\ldots ,{X}_{nT}-E(X_{nT}|Z_{nT})]^{\tau }\\&+\,O_p(c_N), \end{aligned}$$

and

$$\begin{aligned} \widetilde{{W}}=\widetilde{{X}}+\nu +O_p(c_N)\triangleq A+O_p(c_N). \end{aligned}$$

By the law of large numbers, we have

$$\begin{aligned} \frac{1}{N}\widetilde{{W}}^{\tau }\widetilde{{W}}= & {} \frac{1}{N}\sum _{i=1}^n\sum _{t=1}^T\big \{[{X}_{it}-E(X_{it}|Z_{it})]^{\tau }[{X}_{it}-E(X_{it}|Z_{it})]+\nu _{it}^{\tau }\nu _{it}\big \}\nonumber \\&+\,O_p(c_N) {\mathop {\rightarrow }\limits ^{p}}\Sigma _2+\Sigma _{\nu }. \end{aligned}$$

(A.2)

Hence, to prove the lemma, we consider the limit of $N^{-1}\widetilde{{W}}^{\tau }\widetilde{M}_{\widetilde{D}}\widetilde{{W}}$. It is easy to show that $N^{-1}\widetilde{{W}}^{\tau }\widetilde{M}_{\widetilde{D}}\widetilde{{W}}=N^{-1}A^{\tau }\widetilde{M}_{\widetilde{D}}A+O_p(c_N)$. Let $(\widetilde{M}_{\widetilde{D}})_{e_{kl}e_{it}}\triangleq m_{e_{kl}e_{it}}$ and $(A)_{it}\triangleq a_{it}=\widetilde{{W}}_{it}$, where $e_{kl}=(k-1)T+l$. Then

$$\begin{aligned} \frac{1}{N}A^{\tau }\widetilde{M}_{\widetilde{D}}A&=\frac{1}{N}\sum _{i=1}^n\sum _{t=1}^T\sum _{k=1}^n\sum _{l=1}^T a_{kl}m_{e_{kl}e_{it}}a_{it}=\frac{1}{N}\sum _{i=1}^n\sum _{t=1}^Ta_{it}m_{e_{it}e_{it}}a_{it}\\&\quad +\,\frac{1}{N}\sum _{e_{kl}\ne e_{it}}\sum _{k=1}^n\sum _{l=1}^Ta_{kl}m_{e_{kl}e_{it}}a_{it} \triangleq I_1+I_2. \end{aligned}$$

For the term $I_2$, we have

$$\begin{aligned} EI_2^2=\frac{1}{N^2}E\left[ \sum _{e_{kl}\ne e_{it}}\sum _{k=1}^n\sum _{l=1}^T\sum _{e_{rs}\ne e_{uv}}\sum _{r=1}^n\sum _{s=1}^Ta_{kl}m_{e_{kl}e_{it}}a_{it}a_{rs}m_{e_{rs}e_{uv}}a_{uv}\right] . \end{aligned}$$

Note that $(X_{11},Z_{11}),\ldots ,(X_{nT},Z_{nT})$ are i.i.d. and $E(a_{it}|Z_{it})=0$, when $e_{kl}\ne e_{rs}$ and $e_{it}\ne e_{uv}$, we have

$$\begin{aligned}&E(a_{kl}m_{e_{kl}e_{it}}a_{it}a_{rs}m_{e_{rs}e_{uv}}a_{uv})\\&\quad =E[m_{e_{kl}e_{it}}m_{e_{rs}e_{uv}}E(a_{kl}a_{it}a_{rs}a_{uv}|Z_{kl},Z_{it},Z_{rs},Z_{uv})]\\&\quad =E[m_{e_{kl}e_{it}}m_{e_{rs}e_{uv}}E(a_{it}a_{rs}a_{uv}|Z_{it},Z_{rs},Z_{uv})E(a_{kl}|Z_{kl})]=0. \end{aligned}$$

Using the same argument and $m_{e_{kl}e_{it}}=m_{e_{it}e_{kl}}$, we have

$$\begin{aligned} EI_2^2=\frac{1}{N^2}\sum _{e_{kl}\ne e_{it}}\sum _{k=1}^n\sum _{l=1}^TE(a_{kl}m_{e_{kl}e_{it}}a_{it})^2+\frac{1}{N^2}\sum _{e_{kl}\ne e_{it}}\sum _{k=1}^n\sum _{l=1}^TE(m_{e_{kl}e_{it}}^2a_{kl}a_{it}a_{it}a_{kl}). \end{aligned}$$

By Conditions (B3), we obtain

$$\begin{aligned} EI_2^2\le \frac{2c}{N^2}\sum _{e_{kl}\ne e_{it}}E(m_{e_{kl}e_{it}})^2 \le \frac{1}{N^2}tr(\widetilde{M}^2)\le \frac{2c}{N}, \end{aligned}$$

where c is a constant. Hence

$$\begin{aligned} EI_2=o_p(1). \end{aligned}$$

(A.3)

Note that $I_1$ can be decomposed as

$$\begin{aligned} I_1=\frac{1}{N}\sum _{i=1}^n\sum _{t=1}^Tm_{e_{it}e_{it}} [a_{it}a_{it}-E(a_{it}a_{it})]+\frac{1}{N}\sum _{i=1}^n\sum _{t=1}^Tm_{e_{it}e_{it}}E(a_{it}a_{it})\triangleq \Pi _1+\Pi _2. \end{aligned}$$

By the definition of S, it is easy to show that

$$\begin{aligned} S=(S_{11},\ldots ,S_{1T},S_{21},\ldots ,S_{nT})^{\tau }[I+diag\{O_p(c_n)\}], \end{aligned}$$

where

$$\begin{aligned} S_{it}=\left( \frac{K_h(Z_{11}-Z_{it})}{Nf(Z_{it})},\ldots ,\frac{K_h(Z_{1T}-Z_{it})}{Nf(Z_{it})},\ldots ,\frac{K_h(Z_{nT}-Z_{it})}{Nf(Z_{it})}\right) ^{\tau }. \end{aligned}$$

Let $D_1$ is the first column vector of D, thus we have

$$\begin{aligned} D_1^{\tau }(I-S^{\tau })(I-S)D_1= & {} \left\{ T-2\sum _{t=1}^T \left[ \sum _{l=1}^T \frac{K_h(Z_{1l}-Z_{1t})}{Nf(Z_{1t})}\right] \right. \\&\left. +\sum _{i=1}^n\sum _{t=1}^T\left[ \sum _{l=1}^T \frac{K_h(Z_{1l}-Z_{it})}{Nf(Z_{it})}\right] ^2\right\} \{1+O_p(c_N)\}. \end{aligned}$$

Because

$$\begin{aligned} \sum _{t=1}^T \left[ \sum _{l=1}^T \frac{K_h(Z_{1l}-Z_{1t})}{Nf(Z_{1t})}\right] \{1+O_p(c_N)\}=O_p\left( \frac{1}{Nh}\right) , \end{aligned}$$

and

$$\begin{aligned} \sum _{i=1}^n\sum _{t=1}^T\left[ \sum _{l=1}^T \frac{K_h(Z_{1l}-Z_{it})}{Nf(Z_{it})}\right] ^2\{1+O_p(c_N)\}=O_p\left( \frac{1}{Nh}\right) , \end{aligned}$$

we have

$$\begin{aligned} D_1^{\tau }(I-S^{\tau })(I-S)D_1 =T\left[ 1+O_p\left( \frac{1}{Nh}\right) \right] . \end{aligned}$$

Consider the projection matrix, for $i=1,\ldots ,T$, we obtain

$$\begin{aligned} (\widetilde{M}_{{\widetilde{D}}_1})_{ii}= & {} (I-S)D_1[D_1^{\tau }(I-S^{\tau })(I-S)D_1]^{-1}D_1^{\tau }(I-S^{\tau })\\= & {} \frac{1}{T}\left[ 1+O_p\left( \frac{1}{Nh}\right) \right] \left[ 1-\sum _{l=1}^T \frac{K_h(Z_{1l}-Z_{it})}{Nf(Z_{it})}\{1+O_p(c_N)\}\right] ^2\\= & {} \frac{1}{T}+O_p\left( \frac{1}{Nh}\right) . \end{aligned}$$

Because ${\widetilde{D}}_1$ is the first column vector of ${\widetilde{D}}$. It is easy to show that $\widetilde{M}_{\widetilde{D}}\widetilde{M}_{{\widetilde{D}}_1}=\widetilde{M}_{{\widetilde{D}}_1}\widetilde{M}_{\widetilde{D}}=\widetilde{M}_{{\widetilde{D}}_1}$. Hence, $\widetilde{M}_{\widetilde{D}}-\widetilde{M}_{{\widetilde{D}}_1}$ is also a projection matrix. Thus $\widetilde{M}_{\widetilde{D}}-\widetilde{M}_{\widetilde{D}_1}=(\widetilde{M}_{\widetilde{D}}-\widetilde{M}_{{\widetilde{D}}_1})^2\ge 0 $. We obtain $(\widetilde{M}_{\widetilde{D}})_{ii}\ge (\widetilde{M}_{{\widetilde{D}}_1})_{ii}=\frac{1}{T}+O_p(\frac{1}{Nh}),~i=1,\ldots ,T$. By a similar argument, we can show that $(\widetilde{M}_{\widetilde{D}})_{ii}\ge \frac{1}{T}+O_p(\frac{1}{Nh}),i=T+1,\ldots ,N$. Thus, we have

$$\begin{aligned} 1\ge m_{e_{it}e_{it}}\ge \frac{1}{T}+O_p\left( \frac{1}{Nh}\right) , \end{aligned}$$

then, it is easy to show that

$$\begin{aligned} tr(\widetilde{M}_{\widetilde{D}})=\sum _{i=1}^{n}\sum _{t=1}^{T} m_{e_{it}e_{it}}\ge \frac{N}{T}+O_p\left( \frac{1}{Nh}\right) . \end{aligned}$$

Hence, we have

$$\begin{aligned} \Pi _2=\frac{1}{N}\sum _{i=1}^n\sum _{t=1}^Tm_{e_{it}e_{it}}E(a_{it}a_{it})=\frac{1}{T}(\Sigma +\Sigma _{\eta })+O_p\left( \frac{1}{Nh}\right) . \end{aligned}$$

(A.4)

By (A.2), it is easy to show that

$$\begin{aligned} \frac{1}{N}\sum _{i=1}^n\sum _{t=1}^T(m_{e_{it}e_{it}}-T^{-1})=O_p\left( \frac{1}{Nh^2}\right) , \end{aligned}$$

By the law of large numbers, $\Pi _1$ is bounded as

$$\begin{aligned} \Pi _1&=\frac{1}{N}\sum _{i=1}^n\sum _{t=1}^T(m_{e_{it}e_{it}}-T^{-1})(a_{it}a_{it}-E(a_{it}a_{it}))\nonumber \\&\quad +\frac{1}{NT}\sum _{i=1}^n\sum _{t=1}^T(a_{it}a_{it}-E(a_{it}a_{it}))\nonumber \\&\le \frac{1}{N}\left[ \sum _{i=1}^n\sum _{t=1}^T(m_{e_{it}e_{it}}-T^{-1})^2\right] ^{1/2}+o_p(1)=o_p(1). \end{aligned}$$

(A.5)

By (A.4) and (A.5), we have

$$\begin{aligned} I_1=\frac{1}{T}(\Sigma _2+\Sigma _{\eta })+o_p(1). \end{aligned}$$

(A.6)

By (A.2), (A.3) and (A.6), the lemma holds. $\square $

Lemma A.4

Under the conditions of Theorem 2.1, if $\beta $ is the true value of the parameter, we have

$$\begin{aligned} \sum _{i=1}^n\sum _{t=1}^T \nu _{it}H\widetilde{g}_{it}= & {} o_p(N^{1/2}), \end{aligned}$$

(A.7)

$$\begin{aligned} \sum _{i=1}^n\sum _{t=1}^T \varepsilon _{it}H\widetilde{g}_{it}= & {} o_p(N^{1/2}), \end{aligned}$$

(A.8)

Proof

Since the proof of (A.8) is similar of (A.7), we prove only (A.7)here. Let $\zeta _N=N^{1/2}/log(N)$,

$$\begin{aligned} P\left( |\sum _{i=1}^n\sum _{t=1}^T \nu _{it}H\widetilde{g}_{it}|>\zeta _N\right)\le & {} P\left( |\sum _{i=1}^n\sum _{t=1}^T \nu _{it}H\widetilde{g}_{it}|>\zeta _N, max_{i,t}|\widetilde{g}_{it}|\le ch^4\right) \nonumber \\&+P(max_{i,t}|\widetilde{g}_{it}|\ge ch^4) \end{aligned}$$

(A.9)

The second term is $o_p(1)$ by Lemma A.2. For the first term, let $R_{it}$ be the event that $|\widetilde{g}_{it}|\le ch^4$. Then

$$\begin{aligned}&P\left( |\sum _{i=1}^n\sum _{t=1}^T \nu _{it}H\widetilde{g}_{it}|>\zeta _N, \{I(R_{it})=1,\forall i,t\}\right) \\&\quad \le \zeta _N^{-2}\sum _{i=1}^n\sum _{t=1}^T E[\nu _{it}H\widetilde{g}_{it}\{I(R_{it})=1\}]^2\\&\qquad +\,\,\zeta _N^{-2}\sum _{i\ne k}^n\sum _{t\ne s}^TE[\nu _{it}H\widetilde{g}_{it}\nu _{ks}\Omega \widetilde{g}_{ks}\{I(R_{ks})=1\}]. \end{aligned}$$

Since $\widetilde{g}_{it}\{I(R_{it})=1\}\le ch^4$ is independent of $\nu _{it}$, the first term is $O\{N\zeta _N^{-2}c^2h^8\}=o(1)$. The second term is easily seen to equal zero. $\square $

Lemma A.5

Under the conditions of Theorem 2.1, if $\beta _0$ is the true value of the parameter, we have

$$\begin{aligned}&\displaystyle \mathop {max}\limits _{1\le i\le n}||\Gamma _i(\beta _0)||=o_p(\sqrt{n/p}), \end{aligned}$$

(A.10)

$$\begin{aligned}&\displaystyle \frac{\{N^{-1/2}\sum _{i=1}^n\Gamma _i^{\tau }(\beta _0)\}\Sigma _1^{-1}\{N^{-1/2}\sum _{i=1}^n\Gamma _i(\beta _0)\}-p}{\sqrt{2p}}{\mathop {\rightarrow }\limits ^{d}}N(0,1). \end{aligned}$$

(A.11)

Proof

From the definition of $\Gamma _{i}(\beta )$ by (2.7), and a simple calculation, yields

$$\begin{aligned} \frac{1}{\sqrt{N}}\sum _{i=1}^n\Gamma _i(\beta _0)= & {} \frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\big \{ \widetilde{X}_{it}H\widetilde{g}_{it}-\widetilde{X}_{it}H \widetilde{\nu }_{it}\beta _0+\widetilde{X}_{it}H\widetilde{\varepsilon }_{it}\\&+\,\widetilde{\nu }_{it}H\widetilde{g}_{it}-\widetilde{\nu }_{it}H \widetilde{\nu }_{it}^{\tau }\beta _0+\widetilde{\nu }_{it}H\widetilde{\varepsilon }_{it}+\Sigma _{\nu }\beta _0\big \}. \end{aligned}$$

By Lemma A.1, we have $S\varepsilon =O_p(c_N)$. Similar to the proof of Lemma A.3 and under Assumption (B7), we have $\frac{1}{\sqrt{N}}\widetilde{{X}}^{\tau }HS\varepsilon =O(\sqrt{N}c_N^2)=o_p(1)$. Therefore

$$\begin{aligned} \frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\widetilde{X}_{it}H\widetilde{\varepsilon }_{it}=\frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T \frac{T-1}{T}(X_{it}-E(X_{it}|Z_{it}))\varepsilon _{it} \end{aligned}$$

(A.12)

Similar to the proofs of (A.12), we can derive that

$$\begin{aligned} \displaystyle \frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\widetilde{X}_{it}H \widetilde{\nu }_{it}= & {} \frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\frac{T-1}{T}(X_{it}-E(X_{it}|U_{it}))\nu _{it},\\ \displaystyle \frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\widetilde{\nu }_{it}H \widetilde{\nu }_{it}^{\tau }= & {} \frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\frac{T-1}{T}\nu _{it}\nu _{it},\\ \frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\widetilde{\nu }_{it}H\widetilde{\varepsilon }_{it}= & {} \frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\frac{T-1}{T}\nu _{it}\varepsilon _{it}, \end{aligned}$$

which combining with Lemma A.4, it is easy to obtain

$$\begin{aligned}&\frac{1}{\sqrt{N}}\sum _{i=1}^n\Gamma _i(\beta _0)\nonumber \\&\quad =\frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\frac{T-1}{T}\big \{[(X_{it}-E(X_{it}|U_{it}))(\varepsilon _{it}-\nu _{it}\beta _0)]+\nu _{it}\varepsilon _{it}\nonumber \\&\qquad +(\Sigma _{\nu }-\nu _{it} \nu _{it}^{\tau })\beta _0]\big \}+o_p(1)\nonumber \\&\quad =\frac{1}{\sqrt{N}}\sum _{i=1}^nG_{i}(\beta _0)+o_p(1). \end{aligned}$$

(A.13)

Therefore, we have

$$\begin{aligned} \Gamma _i(\beta _0)&=\frac{T-1}{T}\sum _{t=1}^T\big \{[(X_{it}-E(X_{it}|U_{it}))(\varepsilon _{it}-\nu _{it}\beta _0)]+\nu _{it}\varepsilon _{it}\nonumber \\&\quad +(\Sigma _{\nu }-\nu _{it} \nu _{it}^{\tau })\beta _0]\big \}+o_p(1)\nonumber \\&=\sum _{t=1}^T\left\{ \frac{T-1}{T}(X_{it}-E(X_{it}|U_{it}))(\varepsilon _{it}-\nu _{it}\beta _0)+\frac{T-1}{T}\nu _{it}\varepsilon _{it}\right. \nonumber \\&\quad \left. +\,\frac{T-1}{T}(\nu _{it}\nu _{it}^{\tau }-\Sigma _{\nu })\beta _0\right\} +o_p(1)=\varpi _1+\varpi _2+\varpi _3+o_p(1). \end{aligned}$$

(A.14)

Let $\varpi _s^{*}=\mathop {max}\limits _{1\le i\le n}||\varpi _{si}||,~s=1,2,3$, and $\{\varpi _{si}, i=1,\ldots , n \}$ is a sequence of independent random variables with common distribution. for any $\varepsilon \ge 0$, then

$$\begin{aligned} P\{\varpi _1^{*}\ge (p)^{1/2}n^{1/k}\epsilon \}\le & {} \sum _{i=1}^nP\{||\varpi _{1i}||\ge (p)^{1/2}n^{1/k}\epsilon \}\\\le & {} \frac{1}{np^{k/2}\epsilon ^k}\sum _{i=1}^nE||\varpi _{1i}||^k\\\le & {} \frac{1}{\epsilon ^k}E||\varpi _{11}/p^{1/2}||^k. \end{aligned}$$

From conditions (B4) and (B7) and Cauchy-Schwarz inequality yields that $\varpi _1^{*}=o_p(\sqrt{p}n^{1/k}).$ By the condition $p=o(n^{(k-2)/(2k)}$ in Theorem 2.1, it is easy to check that $\varpi _1^{*}=o_p(\sqrt{n/p}n^{2-k/(2k)}p)=o_p(\sqrt{n/p})$. Similar to the proof, we obtain $\varpi _2^{*}=o_p(\sqrt{n/p})$ and $\varpi _3^{*}=o_p(\sqrt{n/p})$. Then $\mathop {max}\nolimits _{1\le i\le n}||\Gamma _i(\beta _0)||=o_p(\sqrt{n/p}).$

By applying the martingale central limit theorem as give in Hall and Heyde (1980) and (A.13), it is easy to obtain (A.11). The proof of Lemma A.5 is thus completed. $\square $

Lemma A.6

Under the conditions of Theorem 2.1. Denote$D_n=\{\beta :||\beta -\beta _0||\le ca_n\}$ Then $||\gamma (\beta )||=O_p(a_n)$, for $\beta \in D_n$.

Proof

For $\beta \in D_n$, let$\gamma (\beta )=\rho \theta $, where$\rho \ge 0, \theta \in R^p$, and $||\theta ||=1$. Set

$$\begin{aligned} J(\beta )=\frac{1}{n}\sum _{i=1}^n\Gamma _i(\beta )\Gamma _i^{\tau }(\beta ).~\bar{\Gamma }(\beta )=\frac{1}{n}\sum _{i=1}^n\Gamma _i(\beta ), ~\Gamma ^{*}(\beta )=\mathop {max}\limits _{1\le i\le n}||\Gamma _i(\beta )||. \end{aligned}$$

From (2.7), we can obtain

$$\begin{aligned} 0= & {} \frac{1}{n}\sum _{i=1}^n\frac{\Gamma _i(\beta )}{1+\gamma ^\tau \Gamma _i(\beta )}= \frac{1}{n}\sum _{i=1}^n\theta ^{\tau }\Gamma _i(\beta )-\rho \frac{1}{n}\sum _{i=1}^n\frac{(\theta ^{\tau }\Gamma _i(\beta ))^2}{1+\rho \theta ^{\tau }\Gamma _i(\beta )}\\\le & {} \theta ^{\tau }\bar{\Gamma }(\beta )-\frac{\rho }{1+\rho \Gamma ^{*}}\theta ^{\tau }J(\beta )\theta . \end{aligned}$$

Then

$$\begin{aligned} \rho \left[ \theta ^{\tau }J(\beta )\theta -\mathop {max}\limits _{1\le i\le n}||\Gamma _i(\beta )||n^{-1}\left| \sum _{i=1}^n\theta ^{\tau }\Gamma _i(\beta )\right| \right] \le |\theta ^{\tau }\bar{\Gamma }(\beta )|. \end{aligned}$$

Observe that

$$\begin{aligned} \Gamma ^{*}(\beta )\le \Gamma ^{*}(\beta _0)+|\mathop {max}\limits _{1\le i\le n}\left\| \frac{1}{N}\sum _{i=1}^n\sum _{t=1}^T\widetilde{W}_{it}H\widetilde{W}_{it}(\beta -\beta _0)\right\| \end{aligned}$$

Let $\mathcal {X}_{it}=\widetilde{W}_{it}H\widetilde{W}_{it}$ According to Condition (B.7) and Minkowski inequality, we have

$$\begin{aligned} Var(||\mathcal {X}_{it}||^{r/2})\le & {} E(||\mathcal {X}_{it}||^r)=E\left[ \left( \sum _{j=1}^p\mathcal {X}_{itj}^2\right) ^{r/2}\right] \\\le & {} \left\{ \sum _{j=1}^pE[\mathcal {X}_{itj}^r]^{2/r}\right\} ^{r/2}=O(p^{r/2}). \end{aligned}$$

Then we obtain that

$$\begin{aligned}&\mathop {max}\limits _{1\le i\le n}||\mathcal {X}_{it}||\\&\quad \le \Big [\{Var(||\mathcal {X}_{it}||^{r/2})\}^{1/2}\mathop {max}\limits _{1\le i\le n}\left\{ \frac{||\mathcal {X}_{it}||^{r/2})-E||\mathcal {X}_{it}||^{r/2})}{\{Var(||\mathcal {X}_{it}||^{r/2})\}^{1/2}}\right. +E||\mathcal {X}_{it}||^{r/2}\Big ]^{r/2}\\&\quad \le o_p(\sqrt{p}n^{1/r})+O_p(\sqrt{p})=o_p(\sqrt{p}n^{1/r}). \end{aligned}$$

which combining with (A.11)

$$\begin{aligned} \Gamma ^{*}(\beta )=o_p(n/p) \end{aligned}$$

(A.15)

For $\bar{\Gamma }(\beta )$, it is easy to see that

$$\begin{aligned} \bar{\Gamma }(\beta )=\bar{\Gamma }(\beta _0)+\frac{1}{N}\sum _{i=1}^n\sum _{t=1}^T\widetilde{W}_{it}H\widetilde{W}_{it}(\beta -\beta _0) \end{aligned}$$

Similar to the proofs of (A.10) in Fan et al. (2016), we obtain

$$\begin{aligned} \theta ^{\tau }\bar{\Gamma }(\beta )=O_p(a_n) \end{aligned}$$

(A.16)

Therefore, it follows from (A.15) and (A.16), we have $\mathop {max}\nolimits _{1\le i\le n}||\Gamma _i(\beta )||=n^{-1}|\sum _{i=1}^n$.

From (2.7), similar to the proof (A.11) in Fan et al. (2016) and Lemma B.4 in Li et al. (2012), we can derive $tr[(J(\beta _0)-\Sigma _1)^2]=O_p(p^2(c_n^4+1/n))$ which means that all the eigenvalues of $J(\beta _0)$ converge to those of $\Sigma _1$ at the rate of $O_p(p^2(c_n^4+1/n))$. Therefore, by Lemma A.2, (2.7), (A.16), together with Condition (B7), we have

$$\begin{aligned} J(\beta )&=\frac{1}{n}\sum _{i=1}^n\sum _{t=1}^T\big \{\widetilde{W}_{it}H(\widetilde{Y}_{it}-\widetilde{W}_{it}\beta _0)-(T-1)\Sigma _{\nu }\beta _0 -\widetilde{W}_{it}H\widetilde{W}_{it}^{\tau }(\beta -\beta _0)\nonumber \\&\quad +(T-1)\Sigma _{\nu }(\beta -\beta _0)\big \}^{\oplus 2}\nonumber \\&=J(\beta _0)+\frac{1}{n}\sum _{i=1}^n\sum _{t=1}^T\big \{\widetilde{W}_{it}H\widetilde{W}_{it}^{\tau }(\beta -\beta _0)+(T-1)\Sigma _{\nu }(\beta -\beta _0)\big \}^{\oplus 2} \nonumber \\&\quad -\frac{2}{n}\sum _{i=1}^n\sum _{t=1}^T\big \{\widetilde{W}_{it}H(\widetilde{Y}_{it}-\widetilde{W}_{it}\beta _0)-(T-1)\Sigma _{\nu }\beta _0\big \}\big \{\widetilde{W}_{it}H\widetilde{W}_{it}^{\tau }(\beta -\beta _0)\nonumber \\&\quad +(T-1)\Sigma _{\nu }(\beta -\beta _0)\big \} \nonumber \\&= \Sigma _1+O_p(p^2(c_n^4+1/n)). \end{aligned}$$

(A.17)

we can obtain $\theta ^{\tau }J(\beta )\theta =\theta ^{\tau }\Sigma _1\theta {\mathop {\rightarrow }\limits ^{P}}c.$ Therefore, we obtain $\rho \le c|\theta ^{\tau }\bar{\Gamma }(\beta )|=O_p(a_n),$ then $||\gamma (\beta )||=O_p(a_n)$. $\square $

Lemma A.7

Under the conditions of Theorem 2.1. as $n\rightarrow \infty $, with probability tending to 1, $R_{n}(\beta )$ has a minimum in $D_n$.

Proof

For $\beta \in D_n$,

$$\begin{aligned} H_{1n}(\beta ,\gamma )=\frac{1}{n}\sum _{i=1}^n\frac{\Gamma _i(\beta )}{1+\gamma ^\tau \Gamma _i(\beta )}=0 \end{aligned}$$

According to Lemma A.6, we have $\gamma ^\tau \Gamma _i(\beta )=o_p(1)$. Apply Taylor expansion to $H_{1n}(\beta ,\gamma )$, we obtain $ \bar{\Gamma }(\beta )-J(\beta )\gamma +\delta _n=0,$ where $\bar{\Gamma }(\beta )=\frac{1}{n}\sum _{i=1}^n\Gamma _i(\beta )$, $\delta _n=\frac{1}{n}\sum _{i=1}^n\Gamma _i(\beta )(\gamma ^\tau \Gamma _i(\beta ))^2/[1+\zeta _i]^3$ for some $|\zeta _i|\le |\gamma ^\tau \Gamma _i(\beta )|$. We have $\gamma =J(\beta )^{-1}\bar{\Gamma }(\beta )+J(\beta )^{-1}\delta _n$. Substituting $\gamma $ into (2.6), it is easy to see that

$$\begin{aligned} 2R_{n}(\beta )=n\bar{\Gamma }(\beta )^{\tau }J(\beta )^{-1}\bar{\Gamma }(\beta )-n\delta _n^{\tau }J(\beta )^{-1}\delta _n+\frac{2}{3}\sum _{i=1}^n(\gamma ^\tau \Gamma _i(\beta ))^3(1+\zeta _i)^{-4} \end{aligned}$$

(A.18)

For $\beta \in \partial D_n$, where $\partial D_n$ denotes the boundary of $D_n$, we write $\beta =\beta _0+ca_n\phi $ where $\phi $ is a unit vector, we have a decomposition as $2R_{n}(\beta )=\Pi _0+\Pi _1+\Pi _2$, where $\Pi _0=n\bar{\Gamma }(\beta _0)^{\tau }\Sigma _1^{-1}\bar{\Gamma }(\beta _0)$, $\Pi _1=n(\bar{\Gamma }(\beta )-\bar{\Gamma }(\beta _0))^{\tau }J(\beta )^{-1}(\bar{\Gamma }(\beta )-\bar{\Gamma }(\beta _0))$, $\Pi _2=n[\bar{\Gamma }(\beta _0)^{\tau }(J(\beta )^{-1}-\Sigma _1^{-1})\bar{\Gamma }(\beta _0)+2\bar{\Gamma }(\beta _0)^{\tau }J(\beta )^{-1}(\bar{\Gamma }(\beta )-\bar{\Gamma }(\beta _0)]-n\delta _n^{\tau }J(\beta )^{-1}\delta _n+\frac{2}{3}\sum _{i=1}^n(\gamma ^\tau \Gamma _i(\beta ))^3(1+\zeta _i)^{-4}$ As $n\rightarrow \infty $, we see that

$$\begin{aligned} \Pi _1= & {} n\left\{ \left[ \frac{1}{n}\widetilde{{W}}^{\tau }H\widetilde{{W}}-(T-1) \Sigma _{\nu }\right] (\beta -\beta _0)\right\} ^{\tau }J(\beta )^{-1}\left\{ \left[ \frac{1}{n}\widetilde{{W}}^{\tau }H\widetilde{{W}}\right. \right. \\&\left. \left. -\,(T-1)\Sigma _{\nu }\right] (\beta -\beta _0)\right\} \\= & {} c^2na_n^2\phi ^{\tau }\Sigma _0\Sigma _1^{-1}\Sigma _0\phi \{1+o_p(1)\}=O_p(na_n^2), \end{aligned}$$

$\Pi _2/\Pi _1{\mathop {\rightarrow }\limits ^{P}}0$ and $2R_{n}(\beta _0)-\Pi _0=o_p(1)$. This implies that for any c given, as $n\rightarrow \infty $, $Pr\{2[R_{n}(\beta )- R_{n}(\beta _0)]\ge c\}\rightarrow 1$. In addition, note that for n large,

$$\begin{aligned} \mathcal {L}_{n}(\beta )- \mathcal {L}_{n}(\beta _0)= & {} R_{n}(\beta )- R_{n}(\beta _0)+n\sum _{j=1}^p\{p_{\lambda }(|\beta _j|)-p_{\lambda }(|\beta _{j0}|)\\\ge & {} R_{n}(\beta )- R_{n}(\beta _0)+n\sum _{j\in \mathcal {B}}\{p_{\lambda }(|\beta _j|)-p_{\lambda }(|\beta _{j0}|)\ge R_{n}(\beta )- R_{n}(\beta _0), \end{aligned}$$

where the last inequality holds due to Conditions (B9) and the unbiased property of the SCAD penalty so that $j\in \mathcal {B}$, $p_{\lambda }(|\beta _j|)=p_{\lambda }(|\beta _{j0}|)$ when n is large. Hence, $Pr\{\mathcal {L}_{n}(\beta )\ge \mathcal {L}_{n}(\beta _0)\}\rightarrow 1$ for $\beta \in \partial D_n$, which establishes Lemma A.7. $\square $

Proof of Theorem 2.1

Let $U_i=\gamma ^\tau \Gamma _i(\beta _0)$. Apply Taylor expansion to (2.10), we have

$$\begin{aligned} 0= \frac{1}{n}\sum _{i=1}^n\Gamma _i(\beta _0)\left( 1-U_i+\frac{U_i^2}{1+U_i}\right) =\bar{\Gamma }(\beta _0)-J(\beta _0)\gamma +\delta _n, \end{aligned}$$

(A.19)

where $\delta _n=\frac{1}{n}\sum _{i=1}^n\Gamma _i(\beta _0)U_i^2-\frac{1}{n}\sum _{i=1}^n\Gamma _i(\beta _0)\frac{U_i^3}{1+U_i}$.

From (A.11) and Lemma A.6, we have

$$\begin{aligned} \mathop {max}\limits _{1\le i\le n}|U_i|\le ||\gamma (\beta )||\mathop {max}\limits _{1\le i\le n}||\Gamma _i(\beta _0)||=O_p(p/n^{1/2-1/r}). \end{aligned}$$

Similar to the proof of (A.19) in Li et al. (2012), we can get $||\delta _n ||=o_p(p^{5/2}n^{-1}(n^{-1/2}+c_n^2))+o_p(p^2n^{-1}c_n).$ From (A.19), we obtain that $\gamma =J(\beta _0)^{-1}\bar{\Gamma }(\beta _0)+J(\beta _0)^{-1}\delta _n$. Taylor expansion implies $\ln (1+U_i)=U_i-U_i^2/2+U_i^3/3(1+\varsigma _i)^4,$ for some $\varsigma _i$ such that $|\varsigma _i|\le |U_i|$. Therefore, combining (A.16) and some elementary calculation, we have

$$\begin{aligned} 2R_{n}(\beta _0)&=2\sum _{i=1}^n\ln \{1+U_i\}=n\bar{\Gamma }^{\tau }(\beta _0)J(\beta _0)^{-1}\bar{\Gamma }(\beta _0)-n\delta _n^{\tau }J(\beta _0)^{-1}\delta _n\nonumber \\&\quad +\frac{2}{3}\mathcal {R}_n\{1+o_p(1)\}\nonumber \\&=n\bar{\Gamma }^{\tau }(\beta _0)\Sigma _1^{-1}\bar{\Gamma }(\beta _0)+n\bar{\Gamma }^{\tau }(\beta _0)(J(\beta _0)^{-1}-\Sigma _1^{-1})\bar{\Gamma }(\beta _0)-n\delta _n^{\tau }J(\beta _0)^{-1}\delta _n^{\tau }\nonumber \\&\quad +\frac{2}{3}\mathcal {R}_n\{1+o_p(1)\} \end{aligned}$$

(A.20)

where $\mathcal {R}_n=\sum _{i=1}^n[\gamma ^\tau \Gamma _i(\beta _0)]^3$, By using the proving method of (A.22) and Lemma B.6 in Li et al. (2012), we can easily derive $n\delta _n^{\tau }J(\beta _0)^{-1}\delta _n=o_p(\sqrt{p})$ and $n\bar{\Gamma }^{\tau }(\beta _0)(J(\beta _0)^{-1}-\Sigma _1^{-1})\bar{\Gamma }(\beta _0)=o_p(\sqrt{p})$. The proof of Theorem 2.1 is concluded from the above results together with (A.11). $\square $

Proof of Theorem 2.2

Let $H_{1n}(\beta ,\gamma )=\frac{1}{n}\sum _{i=1}^n\frac{\Gamma _i(\beta )}{1+\gamma ^\tau \Gamma _i(\beta )}$ and $H_{2n}(\beta ,\gamma )=\frac{1}{n}\sum _{i=1}^n\frac{\Gamma _i(\beta )}{1+\gamma ^\tau \Gamma _i(\beta )}(\frac{\partial \Gamma _i(\beta )}{\partial \beta }^{\tau })^{\tau }\gamma $. Note that $\hat{\beta }$ and $\hat{\gamma }$ satisfy $H_{1n}(\hat{\beta },\hat{\gamma })=0$ and $H_{2n}(\hat{\beta },\hat{\gamma })=0$ Let $\varphi =(\beta ^{\tau },\gamma ^{\tau })^{\tau }$, $\varphi _0=(\beta _0^{\tau },0)^{\tau }$ and $\hat{\varphi }_0=(\hat{\beta }_0^{\tau },\hat{\gamma }^{\tau }_0)^{\tau }$. Then by

$$\begin{aligned} 0=H_{jn}(\hat{\beta },\hat{\gamma })=H_{jn}(\beta _0^{\tau },0)+\frac{\partial \Gamma _i(\beta ,0)}{\partial \beta }(\hat{\beta }-\beta _0)+\frac{\partial \Gamma _i(\beta ,0)}{\partial \gamma }(\hat{\gamma }-0)+\delta _{jn}, \end{aligned}$$

where $\delta _{jn}$ with $\delta _{jn}=\frac{1}{2}(\hat{\varphi }_0-\varphi _0)^{\tau }H_{jn}^{\prime \prime }(\varphi )(\hat{\varphi }_0-\varphi _0)$ for $j=1,2$. Here $H_{jn}^{\prime \prime }(\varphi )$ denotes the Hessian matric of $H_{jn}(\varphi )$. Then

$$\begin{aligned} \left( \begin{array}{cc} \hat{\gamma }\\ \hat{\beta }-\beta _0\\ \end{array}\right) = \left[ \begin{array}{cc} \frac{\partial H_{1n}(\beta ,\gamma )}{\partial \gamma } &{}\frac{\partial H_{1n}(\beta ,\gamma )}{\partial \beta }\\ \frac{\partial H_{2n}(\beta ,\gamma )}{\partial \gamma } &{}\frac{\partial H_{2n}(\beta ,\gamma )}{\partial \beta }\\ \end{array}\right] ^{-1}_{(\beta _0,0)}\left( \begin{array}{cc} H_{1n}(\beta _0,0)+\delta _{1n}\\ H_{2n}(\beta _0,0)+\delta _{2n}\\ \end{array}\right) , \end{aligned}$$

from Lemma A.3, we have

$$\begin{aligned} n^{-1}\sum _{i=1}^n\frac{\partial \Gamma _i(\beta ,0)}{\partial \beta }=\frac{1}{n}\widetilde{{W}}^{\tau }H\widetilde{{W}}-(T-1)\Sigma _{\nu }{\mathop {\rightarrow }\limits ^{d}}(T-1)\Sigma _2\{1+o_p(1)\}=\Sigma _0\{1+o_p(1)\}. \end{aligned}$$

(A.21)

Note that $||\hat{\gamma }(\beta )||=O_p(a_n)$ by Lemma A.6 and $||\hat{\beta }-\beta _0||=O_p(a_n)$ by Lemma A.7. Then using the Cauchy-Schwarz inequality, we find

$$\begin{aligned} ||\delta _{1n}||^2=\sum _{k=1}^p\delta _{1n,k}\le c\sum _{k=1}^p||\hat{\varphi }_0-\varphi _0||^4||H_{jn}^{\prime \prime }(\varphi )||^2=O_p(a_n^4p^3). \end{aligned}$$

(A.22)

and Condition (B9) yields that

$$\begin{aligned} ||\sqrt{N}\Sigma _0^{-1}\delta _{1n}||^2\le N\Sigma _0^{-2}||\delta _{1n}||=O_p(Na_n^4p^3)=o_p(1). \end{aligned}$$

and combining with $H_{1n}(\beta _0,0)=n^{-1}\sum _{i=1}^n\Gamma _i(\beta _0)$ , we have

$$\begin{aligned} \sqrt{n}(\widehat{\beta }-\beta _0 )= & {} \Sigma _0^{-1}\cdot \frac{1}{n}\sum _{i=1}^n\Gamma _i(\beta _0)+o_p(1)\\= & {} \Sigma _0^{-1}\cdot \frac{1}{n}\sum _{i=1}^n\sum _{t=1}^T\frac{T-1}{T}\big \{[(X_{it}-E(X_{it}|U_{it}))(\varepsilon _{it}-\nu _{it}\beta _0)]+\nu _{it}\varepsilon _{it}\\&+\,(\Sigma _{\nu }-\nu ^{\tau }\nu _{11})\beta _0]\big \} +o_p(1)=\Sigma _0^{-1}\cdot \frac{1}{n}\sum _{i=1}^n\Gamma _i(\beta _0)+o_p(1). \end{aligned}$$

Note that

$$\begin{aligned} Cov(\Gamma _i(\beta _0))= & {} (T-1)\big \{E[(X_{11}-E(X_{11}|U_{11}))(\varepsilon _{11}-\nu _{11}\beta _0)]^2 + E[\nu _{11}^{\tau }\varepsilon _{11}]^2\\&+\,E[(\Sigma _{\nu }-\nu ^{\tau }\nu _{11})\beta _0]^2\big \}. \end{aligned}$$

Therefore,

$$\begin{aligned} \mathop {lim}\limits _{n\rightarrow \infty }Cov(\Gamma _i(\beta _0))= & {} (T-1)\big \{E(\varepsilon _{11}-\nu _{11}\beta _0)^2\Sigma +\sigma ^2\Sigma _{\nu }\\&+ \,E[(\nu _{11}\nu _{11}^{\tau }-\Sigma _{\nu })\beta _0]^2\big \}= \Sigma _1. \end{aligned}$$

Invoking the Slutsky theorem and the central limit theorem, we can prove Theorem 2.2. $\square $

Proof of Theorem 2.3

From the Lemma A.7, we note that the minimizer of $\mathcal {L}_{n}(\beta )$ is in $\mathcal {D}_n$. Considering $\beta \in \mathcal {D}_n$, we have that for each of its components

$$\begin{aligned} \frac{1}{n}\frac{\partial \mathcal {L}_{n}(\beta )}{\partial \beta _i}= \frac{1}{n}\sum _{i=1}^n\frac{\gamma \left[ \sum _{t=1}^T\widetilde{W}_{it}H\widetilde{W}_{it}+(T-1)\Sigma _{\nu }\right] }{1+\gamma ^\tau \Gamma _i(\beta )}+p_{\lambda }'(|\beta _j|)sign(\beta _j)=I_j+II_j, \end{aligned}$$

(A.23)

First, $\mathop {max}\nolimits _{j\in \mathcal {B}}|I_j|\le \gamma \Sigma _j(1+o_p(1))=O_p(a_n)$, because $\gamma ^\tau \Gamma _i(\beta )=o_p(1)$,where $\Sigma _j$ denotes the jth column of $\Sigma $. as $\tau (n/p)^{1/2}\rightarrow \infty $. $Pr(\mathop {max}\nolimits _{j\in \mathcal {B}}|I_j|>\tau /2)\rightarrow 0$. it can be seen that $p_{\lambda }'(|\beta _j|)sign(\beta _j)$ dominates the sign of $\frac{\partial \mathcal {L}_{n}(\beta )}{\partial \beta _i}$ asymptotically for all $j\notin \mathcal {B}$, as $n\rightarrow \infty $, for any $j\notin \mathcal {B}$, with probability tending to 1,

$$\begin{aligned} \frac{\partial \mathcal {L}_{n}(\beta )}{\partial \beta _i}>0,~\beta _j\in (0,ca_n); \frac{\partial \mathcal {L}_{n}(\beta )}{\partial \beta _i}<0, ~\beta _j\in (0,-ca_n) \end{aligned}$$

which implies that $\hat{\beta }_j=0$ for all $j\notin \mathcal {B}^c$, with probability tending to 1. Thus part (a) of Theorem 2.3 follows.

Next, we establish part (b), Let $\Psi _1$ and $\Psi _2$ be matrices such that $\Psi _1\beta =\beta _1$ and $\Psi _2\beta =\beta _2$. As we have shown that as $n\rightarrow \infty $, $Pr(\hat{\beta }_2=0)\rightarrow 1$, thus by the Lagange multiplier method, finding the minimizer of $\mathcal {L}_{n}(\beta )$ is asymptotic equivalent to solve the minimization of the following objective function

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\log \{1+\gamma ^\tau \Gamma _i(\beta )\}+n\sum _{j=1}^pp_{\lambda }(|\beta _j|)+v^{\tau }\Psi _2\beta , \end{aligned}$$

(A.24)

where v is $p-s$ dimensional column vector of an other Lagrange multiplier. Define

$$\begin{aligned}&\displaystyle \tilde{Q}_{1n}(\beta ,\gamma ,v)=\frac{1}{n}\sum _{i=1}^n\frac{\Gamma _i(\beta )}{1+\gamma ^\tau \Gamma _i(\beta )},\\&\displaystyle \tilde{Q}_{2n}(\beta ,\gamma ,v)=\frac{1}{n}\sum _{i=1}^n\frac{\gamma }{1+\gamma ^\tau \Gamma _i(\beta )}+b(\beta )+\Psi _2^{\tau }v, \end{aligned}$$

and $\tilde{Q}_{3n}(\beta ,\gamma ,v)=\Psi _2\beta $. where

$$\begin{aligned} b(\beta )=\{p_{\lambda }'(|\beta _1|)sign(\beta _1),p_{\lambda }'(|\beta _2|)sign(\beta _2),\ldots ,p_{\lambda }'(|\beta _p|)sign(\beta _p),0^{\tau }\}^{\tau }. \end{aligned}$$

The minimizer $(\beta ,\gamma ,v)$ of (A.24) satisfies $\tilde{Q}_{in}(\beta ,\gamma ,v)=0, (i=1,2,3)$. Since $||\gamma ||=O_p(a_n)$ for $\beta \in \mathcal {B}$, we can obtain that $||v||=O_p(a_n)$ from $\tilde{Q}_{2n}(\beta ,\gamma ,v)=0$, In order to expand $\tilde{Q}_{in}(\beta ,\gamma ,v)(i=1,2,3)$ around the value $(\beta _0,0,0)$, we first give the following partial derivatives,

$$\begin{aligned}&\displaystyle \frac{\partial \tilde{Q}_{1n}(\beta _0,0,0)}{\partial \gamma }=-J(\beta _0), ~ \frac{\partial \tilde{Q}_{1n}(\beta _0,0,0)}{\partial \beta }=\Sigma (\beta _0),~\frac{\partial \tilde{Q}_{1n}(\beta _0,0,0)}{\partial v}=0,\\&\displaystyle \frac{\partial \tilde{Q}_{2n}(\beta _0,0,0)}{\partial \gamma }=\Sigma (\beta _0),~\frac{\partial \tilde{Q}_{2n}(\beta _0,0,0)}{\partial \beta }=b'(\beta ),~\frac{\partial \tilde{Q}_{2n}(\beta _0,0,0)}{\partial v} =\Psi _2^{\tau },\\&\displaystyle \frac{\partial \tilde{Q}_{3n}(\beta _0,0,0)}{\partial \gamma }=0,~\frac{\partial \tilde{Q}_{3n}(\beta _0,0,0)}{\partial \beta }=\Psi _2,~\frac{\partial \tilde{Q}_{3n}(\beta _0,0,0)}{\partial v} =0. \end{aligned}$$

Then by Taylor expansion, we immediately derive that

$$\begin{aligned} \left( \begin{array}{ccc} \tilde{Q}_{1n}(\beta _0,0,0)\\ 0\\ 0\\ \end{array}\right) = \left( \begin{array}{ccc} -\Sigma _1 &{}\Sigma _0 &{}0\\ \Sigma _0^{\tau }&{}0&{}\Psi _2^{\tau }\\ 0&{}\Psi _2&{}0\\ \end{array}\right) \left( \begin{array}{ccc}\tilde{\gamma }\\ \tilde{\beta }\\ \tilde{v}\\ \end{array}\right) +R_n, \end{aligned}$$

(A.25)

where $\Sigma (\beta _0)=n^{-1}\sum _{i=1}^n\partial \Gamma _i(\beta _0)/\partial \beta $, $R_n=\sum _{l=1}^5R_n^{(l)}$, $R_n^{(1)}=(R_{1n}^{\tau 1},R_{2n}^{\tau 1},0)^{\tau }$, $R_{jn}^{\tau 1}\in R^p$ and the kth component of $R_{jn}^{\tau 1}$ for $j=1,2$ is given by

$$\begin{aligned} R_{jn,k}^{(1)}=\frac{1}{2}(\hat{\vartheta }-\vartheta _0)^{\tau }\frac{\partial ^2\tilde{Q}_{jn,k}(\tilde{\vartheta })}{\partial \vartheta \partial \vartheta ^{\tau }}(\hat{\vartheta }-\vartheta _0), \end{aligned}$$

$\vartheta =(\beta ,\gamma )^{\tau }, \tilde{\vartheta }=(\tilde{\beta },\tilde{\gamma })^{\tau }$ satisfying $||\tilde{\vartheta }-\vartheta _0||\le ||\hat{\vartheta }-\vartheta _0||$. $R_n^{(2)}=(0,b^{\tau }(\beta _0),0)^{\tau }$, $R_n^{(3)}=[0,\{b'(\tilde{\vartheta })(\hat{\vartheta }-\vartheta _0)\},0]^{\tau }$, $R_n^{(4)}=[\{(J(\beta _0)-\Sigma _1))\hat{\gamma }\}^{\tau }+(\Sigma (\beta _0)-\Sigma _0)(\hat{\beta }-\beta )\}^{\tau },0,0]^{\tau }$ and $R_n^{(5)}=[0,\{(\Sigma (\beta _0)-\Sigma _0)\hat{\gamma }\}^{\tau },0]^{\tau }$. Similar to the proof of (A.22), we can get $R_n^{(1)}=o_p(n^{-1/2})$. Given Condition (B8) and (B9), we see that $R_n^{(2)}=o_p(n^{-1/2})$ and $R_n^{(3)}=o_p(n^{-1/2})$. By (A.21) and (A.17) which together with Lemma A.6 yields that $R_n^{(4)}=o_p(n^{-1/2})$ and $R_n^{(5)}=o_p(n^{-1/2})$. Hence, we can get $R_n^{(k)}=o_p(n^{-1/2}),k=1,\ldots ,5$.

Define $K_{11}=-\Sigma _1, K_{12}=[\Sigma _0,~~0]$ and $K_{21}=K_{12}^{\tau }$,

$$\begin{aligned} K_{22}=\left( \begin{array}{ccc}0&{}\quad \Psi _2^{\tau }\\ \Psi _2&{}\quad 0\\ \end{array}\right) ,~K=\left( \begin{array}{ccc}K_{11}&{}\quad K_{12}\\ K_{21}&{}\quad K_{22}\\ \end{array}\right) \end{aligned}$$

and let $\kappa =(\beta ^{\tau },v^{\tau })^{\tau }$. Then by inverted (A.25), we find

$$\begin{aligned} \left( \begin{array}{ccc} \hat{\gamma }\\ \hat{\kappa }-\kappa _0\\ \end{array}\right) =K^{-1}\Big \{ \left( \begin{array}{ccc} -\tilde{Q}_{1n}(\beta _0,0,0)\\ 0\\ \end{array}\right) +R_n\Big \}, \end{aligned}$$

(A.26)

As matrix K is partitioned into four blocks, it can be inverted blockwise as follows

$$\begin{aligned} K^{-1}=\left[ \begin{array}{ccc} K_{11}^{-1}+K_{11}^{-1}K_{12}A^{-1}K_{21}K_{11}^{-1}&{}\quad -K_{11}^{-1}K_{12}A^{-1}\\ -A^{-1}K_{21}K_{11}^{-1}&{}\quad A^{-1}\\ \end{array}\right] , \end{aligned}$$

where $A=K_{22}-K_{21}K_{11}^{-1}K_{12}=\left[ \begin{array}{ccc} \Omega ^{-1} &{}\quad \Psi _2^{\tau }\\ \Psi _2&{}\quad 0\\ \end{array}\right] $ and $\Sigma $ is defined in Theorem 2.2. Thus, we get

$$\begin{aligned} \hat{\kappa }-\kappa _0=A^{-1}K_{21}K_{11}^{-1}\tilde{Q}_{1n}(\beta _0,0,0)+o_p(n^{-1/2}). \end{aligned}$$

Matric A can also be inverted blockwise by using the analytic inversion formula,ie.,

$$\begin{aligned} A^{-1}=\left[ \begin{array}{ccc} \Omega -\Omega \Psi _2^{\tau }(\Psi _2\Omega \Psi _2^{\tau })^{-1}\Psi _2\Omega &{}\quad \Omega \Psi _2^{\tau }(\Psi _2\Omega \Psi _2^{\tau })^{-1}\\ (\Psi _2\Omega \Psi _2^{\tau })^{-1}\Psi _2\Omega &{}\quad -(\Psi _2\Omega \Psi _2^{\tau })^{-1}\\ \end{array}\right] , \end{aligned}$$

Further, we have

$$\begin{aligned} \hat{\beta }-\beta _0=[\Omega -\Omega \Psi _2^{\tau }(\Psi _2\Omega \Psi _2^{\tau })^{-1}\Psi _2\Omega ](\Sigma _0\Sigma _1\bar{\Gamma }(\beta _0)+o_p(n^{-1/2})). \end{aligned}$$

It follows by an expansion of $\hat{\beta }_1$ that

$$\begin{aligned} \hat{\beta }_1-\beta _0=[\Psi _1\Omega -\Psi _1\Omega \Psi _2^{\tau }(\Psi _2\Omega \Psi _2^{\tau })^{-1}\Psi _2\Omega ](\Sigma _0\Sigma _1\bar{\Gamma }(\beta _0)+o_p(n^{-1/2})). \end{aligned}$$

(A.27)

Then similar to the proof of Theorem 2.3 in Fan et al. (2016), we have $n^{1/2}W_n\Omega _p^{-1/2}(\widehat{\beta }_1-\beta _{10}){\mathop {\rightarrow }\limits ^{d}}N(0,G)$, which completes the proof of Theorem 2.3. $\square $

Proof of Theorem 2.4

Let $\hat{\beta }$ be the minimizer (2.13) and $U_i=\hat{\gamma }^{\tau }\Gamma _i(\beta )$. Taylor expansion gives

$$\begin{aligned} \mathcal {L}_{n}(\beta )=\sum _{i=1}^nU_i-\sum _{i=1}^nU_i^2/2+\sum _{i=1}^nU_i^3/3(1+\xi _i)^4+o_p(1), \end{aligned}$$

where $|\xi _i|\rightarrow |U_i|$ and $o_p(1)$ is due to the penalty function. From (A.26), we have $\hat{\gamma }=[\Sigma _1^{-1}+\Sigma _1^{-1}\Sigma _0\{\Omega -\Omega \Psi _2^{\tau }(\Psi _2\Omega \Psi _2^{\tau })^{-1}\Psi _2\Omega \}\Sigma _0^{\tau }\Sigma _1^{-1}][\bar{\Gamma }(\beta _0)+o_p(n^{-1/2})].$

Similar to Tang and Leng (2010), Substituting the expansion of $\hat{\gamma }$ and $\hat{\beta }$ given by (A.24) into $U_i$, we show that

$$\begin{aligned} 2\mathcal {L}_{n}(\hat{\beta })= n\bar{\Gamma }(\beta _0)^{\tau }\Psi _2^{\tau }(\Psi _2\Omega ^{-1}\Psi _2^{\tau })^{-1}\Psi _2\bar{\Gamma }(\beta _0)+o_p(1) \end{aligned}$$

(A.28)

Under the null hypothesis, because $L_nL_n^{\tau }=I_q$, there exists $\tilde{\Psi }_2$ such that $\tilde{\Psi }_2\beta =0$ and $\tilde{\Psi }_2\tilde{\Psi }_2^{\tau }=I_{p-d+q}.$ Now by repeating the proof of Theorem 2.3, we establish that under the null hypothesis, the estimation of $\beta $ can be obtained by minimizing (A.27), where $\Psi _2$ is replaced by $\tilde{\Psi }_2$, we can easily obtain that

$$\begin{aligned} 2\mathcal {L}_{n}(\tilde{\beta })= n\bar{\Gamma }(\beta _0)^{\tau }\tilde{\Psi }_2^{\tau }(\tilde{\Psi }_2\Omega ^{-1}\tilde{\Psi }_2^{\tau })^{-1}\tilde{\Psi }_2\bar{\Gamma }(\beta _0)+o_p(1). \end{aligned}$$

Combining Eqs. (A.28), we have

$$\begin{aligned} \mathcal {L}_{n}= n\bar{\Gamma }(\beta _0)^{\tau }\Omega ^{-1/2}(P_1-P_2)\Omega ^{-1/2}\bar{\Gamma }(\beta _0)+o_p(1). \end{aligned}$$

where

$$\begin{aligned} P_1=\Omega ^{-1/2}\Psi _2^{\tau }(\Psi _2\Omega ^{-1}\Psi _2^{\tau })^{-1}\Psi _2\Omega ^{-1/2}, \end{aligned}$$

and

$$\begin{aligned} P_2=\Omega ^{-1/2}\tilde{\Psi }_2^{\tau }(\tilde{\Psi }_2\Omega ^{-1}\tilde{\Psi }_2^{\tau })^{-1}\tilde{\Psi }_2\Omega ^{-1/2}, \end{aligned}$$

are two idempotent matrices. As the rank of $P_1-P_2$ is q, $P_1-P_2$ can be written as $\Upsilon ^{\tau }\Upsilon $, where $\Upsilon $ is $q\times p$ matrix such that $\Upsilon ^{\tau }\Upsilon =I_q$, further, we see that

$$\begin{aligned} \sqrt{n}\Upsilon \Omega ^{-1/2}\bar{\Gamma }(\beta _0){\mathop {\rightarrow }\limits ^{d}}N(0,I_q). \end{aligned}$$

Then

$$\begin{aligned} n\bar{\Gamma }(\beta _0)^{\tau }\Omega ^{-1/2}(P_1-P_2)\Omega ^{-1/2}\bar{\Gamma }(\beta _0){\mathop {\rightarrow }\limits ^{d}}\chi _q^2. \end{aligned}$$

and the proof of Theorem 2.4 is finished. $\square $

Lemma A.8

Under the conditions of Theorem 2.5. For a given z, if g(z) is the true value of the parameter, then

$$\begin{aligned}&\displaystyle \frac{1}{\sqrt{Nh}}\sum _{i=1}^n\hat{\Xi }_{i}\{g(z)\}-b(z){\mathop {\rightarrow }\limits ^{d}}N(0,R). \end{aligned}$$

(A.29)

$$\begin{aligned}&\displaystyle \frac{1}{Nh}\sum _{i=1}^n\hat{\Xi }_{i}\{g(z)\}\hat{\Xi }_{i}^{\tau }\{g(z)\}{\mathop {\rightarrow }\limits ^{P}}R . \end{aligned}$$

(A.30)

$$\begin{aligned}&\displaystyle \mathop {max}\limits _{1\le i\le n}||\hat{\Xi }_{i}\{g(z)\}||=o_p(\sqrt{Nh}), \phi =O_p(N^{-1/2}). \end{aligned}$$

(A.31)

where $b(z)=\left( \frac{N}{h}\right) ^{1/2}\frac{T-1}{T}E[g(Z_{it})-g(z)]f(z)\int K(z)dz$ and $R=\sigma ^2f(z)\int K^2(z)dz$.

Proof

Observe that

$$\begin{aligned} \frac{1}{\sqrt{Nh}}\sum _{i=1}^n\hat{\Xi }_{i}\{g(z)\}-b(z)=S_1(z)+S_2(z)+S_3(z) \end{aligned}$$

where

$$\begin{aligned}&\displaystyle S_1(z)=\frac{1}{\sqrt{Nh}}\sum _{i=1}^n\sum _{t=1}^T(I_{it}-Q)K_h(Z_{it}-z)\varepsilon _{it},\\&\displaystyle S_2(z)=\frac{1}{\sqrt{Nh}}\sum _{i=1}^n\sum _{t=1}^T(I_{it}-Q)\{[g(Z_{it})-g(z)]K_h(Z_{it}-z)-(h/N)^{1/2}b(z)\},\\&\displaystyle S_3(z)=\frac{1}{\sqrt{Nh}}\sum _{i=1}^n\sum _{t=1}^T(I_{it}-Q)X_{it}^{\tau }(\beta -\hat{\beta })K_h(Z_{it}-z). \end{aligned}$$

It is not difficult to prove $E[S_1(z)]=0$ and $Var[S_1(z)]=R+o(1)$. $S_1(z)$ satisfies the conditions of the Cramer–Wold theorem and the Lindeberg condition. Therefore, we get

$$\begin{aligned} S_1(z){\mathop {\rightarrow }\limits ^{d}}N(0,R). \end{aligned}$$

(A.32)

We can also prove that

$$\begin{aligned} S_2(z)=o_p(1). \end{aligned}$$

(A.33)

Theorems 2.2 and condition (B8) imply that $\beta -\hat{\beta }=O_p(N^{-1/2})$. Therefore, we get $S_3(z)=O_p(h^{1/2})$. This together with (A.32) and (A.33) proves (A.29).

Analogously to the proof of (A.30). We can verify (A.30) easily. As to (A.31), we find

$$\begin{aligned} \mathop {max}\limits _{1\le i\le n}||\hat{\Xi }_{i}\{g(z)\}||\le & {} \mathop {max}\limits _{1\le i\le n}||(I_{it}-Q)K_h(Z_{it}-z)\varepsilon _{it}||\\&+\mathop {max}\limits _{1\le i\le n}||(I_{it}-Q)[g(Z_{it})-g(z)]K_h(Z_{it}-z)||\\&+\mathop {max}\limits _{1\le i\le n}||(I_{it}-Q)X_{it}^{\tau }(\beta -\hat{\beta })K_h(Z_{it}-z)||=J_1+J_2+J_3 \end{aligned}$$

From Markov inequality and conditions (B3) and (B4), one can obtain

$$\begin{aligned} P(J_1\ge \sqrt{Nh})\le (Nh)^{-s}\sum _{i=1}^nE[(I_{it}-Q)\varepsilon _{it}K_h(Z_{it}-z)]^{2s}\le C(Nh)^{1-s}\rightarrow 0 \end{aligned}$$

which implies that $J_1=o_p(\sqrt{Nh})$. Using some arguments similar to those used in the proof of Lemma A.6, we can prove $J_2=o_p(\sqrt{Nh})$ and $J_3=o_p(\sqrt{Nh})$. Therefore we obtain that $\mathop {max}\nolimits _{1\le i\le n}||\hat{\Xi }_{i}\{g(z)\}||=o_p(\sqrt{Nh}).$

Applying (A.30) and the proof in Owen (1990), one can derive that $\phi =O_p(N^{-1/2})$, which completes the proof of Lemma A.8. $\square $

Proof of Theorem 2.5

Invoking some arguments similar to those used in the proof of can be proved Theorems 2.4, we can proof

$$\begin{aligned} 2\mathcal {Q}_{n}(g(z))=\left[ \frac{1}{Nh}\sum _{i=1}^n\hat{\Xi }_{i}^2\{g(z)\}\right] ^{-1}\Big \{\frac{1}{\sqrt{Nh}}\sum _{i=1}^n\hat{\Xi }_{i}\{g(z)\}-b(z)\Big \}^2 \end{aligned}$$

From Lemma A.8, we can prove that $2\mathcal {Q}_{n}(g(z)){\mathop {\rightarrow }\limits ^{d}}\chi _1^2.$ $\square $

Rights and permissions

Reprints and permissions

About this article

Cite this article

He, BQ., Hong, XJ. & Fan, GL. Penalized empirical likelihood for partially linear errors-in-variables panel data models with fixed effects. Stat Papers 61, 2351–2381 (2020). https://doi.org/10.1007/s00362-018-1049-2

Download citation

Received: 20 August 2017
Revised: 10 September 2018
Published: 28 September 2018
Issue Date: December 2020
DOI: https://doi.org/10.1007/s00362-018-1049-2

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Penalized empirical likelihood for partially linear errors-in-variables panel data models with fixed effects

Abstract

Similar content being viewed by others

An empirical likelihood check with varying coefficient fixed effect model with panel data

Empirical likelihood for semi-varying coefficient models for panel data with fixed effects

Estimation for the multi-way error components model with ill-conditioned panel data

1 Introduction

2 Methodology and main results

2.1 Modified empirical likelihood

Theorem 2.1

Theorem 2.2

2.2 Penalized empirical likelihood for variable selection

Theorem 2.3

Theorem 2.4

2.3 Empirical likelihood for the nonparametric part

Theorem 2.5

2.4 Computational algorithm

3 Simulation studies

4 Conclusion remarks

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Appendix: Proofs of the main results

Appendix: Proofs of the main results

Lemma A.1

Proof

Lemma A.2

Proof

Lemma A.3

Proof

Lemma A.4

Proof

Lemma A.5

Proof

Lemma A.6

Proof

Lemma A.7

Proof

Proof of Theorem 2.1

Proof of Theorem 2.2

Proof of Theorem 2.3

Proof of Theorem 2.4

Lemma A.8

Proof

Proof of Theorem 2.5

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation