Empirical likelihood for heteroscedastic partially linear single-index models with growing dimensional data

Fang, Jianglin; Liu, Wanrong; Lu, Xuewen

doi:10.1007/s00184-018-0642-7

Published: 02 February 2018

Volume 81, pages 255–281, (2018)

Metrika

Abstract

In this paper, we propose a new approach to empirical likelihood inference for the parameters in heteroscedastic partially linear single-index models. In the growing dimensional setting, we prove that estimators based on the semiparametric efficient score are asymptotically consistent, and that the limit distribution of the empirical log-likelihood ratio statistic for the parameters $(\beta ^{\top },\theta ^{\top })^{\top }$ is a normal distribution. Furthermore, we show that the empirical log-likelihood ratio based on a subvector of $\beta $ is asymptotically a chi-square random variable, which can be used to construct a confidence interval or region for the subvector of $\beta $. The proposed method can naturally be applied to pure single-index models and partially linear models with high-dimensional data. The performance of the proposed method is illustrated via a real data application and numerical simulations.

1 Introduction

Consider the partially linear single-index model

$$\begin{aligned} Y_{i}=X_{i}^{\top }\beta +g\left( Z_{i}^{\top }\theta \right) +\varepsilon _{i},\quad \left( i=1,\ldots ,n\right) , \end{aligned}$$

(1)

where $X_{i}\in {\mathbb {R}}^{p}$ and $Z_{i}\in {\mathbb {R}}^{r}$ are random variables with dimensions p and r respectively, $Y_{i}$ is a response variable, $\beta $ and $\theta $ are unknown parameters, $g(\cdot )$ is an unknown function, $\varepsilon _{i}$’s are independent random errors with $E(\varepsilon _{i}|X_{i},Z_{i})=0$ and $Var(\varepsilon _{i}|X_{i},Z_{i})=v(X_{i},Z_{i})>0$, v(X, Z) is a function of (X, Z) representing possible heteroscedasticity. For identifiability, we assume that the first component of $\theta $ is 1, and we use $Z_{-1}$ to denote the last $r-1$ components of Z. An interesting problem is statistical inference about $(\beta ^{\top },\theta ^{\top })^{\top }$ in partially linear single-index models, whereas all the other unknown components of model (1), such as the unknown function $g(\cdot )$ and the unspecified function v(X, Z), are termed nuisance parameters. Model (1) was proposed by Carroll et al. (1997), and is a natural extension of the partially linear model and the single-index model (See Engle et al. 1986; Xia et al. 1999). The parametric component $X_{i}^{\top }\beta $ provides a simple summary of covariate effects which are of the main scientific interest. The index $Z_{i}^{\top }\theta $ enables us to simplify the treatment of the multiple auxiliary variables, and the smooth baseline component $g(\cdot )$ enriches model flexibility. Since the partially linear single-index model was introduced, it has been broadly and deeply studied by many authors in various disciplines. For example, Yu and Ruppert (2002) proposed a penalized spline estimation procedure; Xia et al. (2002) integrated the dimension reduction idea and minimum average variance estimation; Xia and Härdle (2006) showed semiparametric estimation of partially linear single-index models; Lai and Wang (2014) studied semiparametric efficient estimation with responses missing at random. 
Although some of these authors, such as Yu and Ruppert (2002), did not explicitly assume equal variances for $\varepsilon $, they did not account for the heteroscedasticity of model (1). Therefore, these estimators are not efficient when heteroscedasticity is present.

In practice, many variables are introduced to reduce possible modeling biases. In the early days, the number of parameters p was typically in the range 10–500, and the sample size n in the range 100–10,000. Motivated by problems in X-ray crystallography, Huber (1973) noted that in a variable selection context the number of parameters is often large and should be modeled as $p_{n}$, which tends to $\infty $. Afterwards, with the development of technology and huge investment in various forms of data gathering, Donoho (2000) demonstrated, using web term-document data, gene expression data and consumer financial history data, that large sample sizes with high dimensions are important characteristics of modern data. He also found that even in a classical setting such as the Framingham heart study, the sample size is as large as $n=25{,}000$ and the dimension is $p=100$, which can be modeled as $p=O(n^{1/3})$ or $p=O(n^{1/2})$. Recently, high-dimensional data have become increasingly common in many areas, such as financial and statistical applications, hyperspectral imagery, internet portals, high-throughput genomic data analysis and other areas of computational biology; see, e.g., Bai and Saranadasa (1996), Ledoit and Wolf (2002), Hjort et al. (2009), Zhang et al. (2012) and Ma and Zhu (2013). Zhang et al. (2012) integrated the dimension reduction idea and variable selection in partially linear single-index models with high-dimensional covariates, and Ma and Zhu (2013) proposed efficient estimators for heteroscedastic partially linear single-index models allowing high-dimensional covariates. Although these works on model (1) allowed high-dimensional covariates, the dimension of the covariates is fixed and cannot tend to infinity as the sample size $n\rightarrow \infty $.

The method of empirical likelihood introduced in Owen (1988, 1990, 1991, 2001) may be useful for making inference about model (1). To the best of our knowledge, there is little literature applying the empirical likelihood method to this model, although it has been successfully applied to various other models; see, e.g., linear models (Owen 1991), generalized linear models (Kolaczyk 1994), partially linear models (Shi and Lau 2000), heteroscedastic partially linear models (Lu 2009), single-index models (Xue and Zhu 2006), partially linear single-index models (Zhu and Xue 2006), general estimating equations (Qin and Lawless 1994), quantile estimation (Chen and Hall 1993), errors-in-covariables models (Wang and Rao 2002), Cox regression models (Qin and Jing 2001) and additive risk models (Lu and Qi 2004). Empirical likelihood has also been applied to some high-dimensional problems, and its asymptotic behavior when n and p both tend to infinity has been carefully studied. Hjort et al. (2009) derived the limit distribution of the EL ratio statistic based on p-dimensional estimating equations when $p\rightarrow \infty $ with n at the rate $p=o\left( n^{1/3}\right) $; Chen et al. (2009) relaxed the rate restriction in Hjort et al. (2009) and established a nondegenerate limit distribution of the EL ratio statistic, allowing $p=o\left( n^{1/2}\right) $ under suitable regularity conditions.

Zhu and Xue (2006) investigated empirical likelihood confidence regions in a partially linear single-index model. In their paper, the empirical likelihood was constructed from the components of a semiparametrically inefficient estimating equation. We think that it may be more informative to use the semiparametric efficient score to construct the empirical likelihood. Furthermore, Zhu and Xue (2006) assumed that the dimensions p and r of $\beta $ and $\theta $ are fixed. Their results may not be valid when $p\rightarrow \infty $ and $r\rightarrow \infty $ as $n\rightarrow \infty $. Motivated by the empirical likelihood method for high-dimensional data in Hjort et al. (2009), in this paper we propose a new approach to empirical likelihood inference about $(\beta ^{\top },\theta ^{\top })^{\top }$ for heteroscedastic partially linear single-index models with high-dimensional data based on the semiparametric efficient score, where $p\rightarrow \infty $ and $r\rightarrow \infty $ as $n\rightarrow \infty $. We will show that the limit distribution of the empirical log-likelihood ratio for $(\beta ^{\top },\theta ^{\top })^{\top }$ is a normal distribution. Furthermore, we will show that the empirical log-likelihood ratio based on a k-dimensional ($k<p$) subvector of $\beta $ is asymptotically a standard chi-square random variable, which can be used to construct confidence intervals or regions for the k-dimensional subvector of $\beta $.

The remainder of this paper is organized as follows. In Sect. 2, we introduce the empirical likelihood method for the inference of heteroscedastic partially linear single-index models and present our main results. The pure single-index model and the partially linear model, as the special examples, are discussed in Sect. 3. In Sect. 4, we report the results from simulation studies, and a real data example is presented in Sect. 5. Finally, the technical proofs of the main results are given in the “Appendix”.

2 Methodology and main results

Firstly, we introduce the efficient estimation method of Ma and Zhu (2013) for the parameter $(\beta ^{\top },\theta ^{\top })^{\top }$ in the heteroscedastic partially linear single-index model (1), and propose the empirical likelihood method for $(\beta ^{\top },\theta ^{\top })^{\top }$. To estimate $(\beta ^{\top },\theta ^{\top })^{\top }$, Ma and Zhu (2013) reviewed the following class of weighted estimating equations:

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum _{i=1}^{n}w\left( X_{i},Z_{i}\right) \left\{ Y_{i}-X_{i}^{\top }\beta -f\left( Z_{i}^{\top }\theta \right) \right\} \left[ a\left( X_{i},Z_{i}\right) -{\tilde{E}}\left\{ a\left( X,Z\right) |Z_{i}^{\top }\theta \right\} \right] =0, \end{aligned}$$

(2)

where $f(Z_{i}^{\top }\theta )$ is a given function of $Z_{i}^{\top }\theta $ which may or may not equal $g(Z_{i}^{\top }\theta )$, ${\tilde{E}}\{\cdot |Z_{i}^{\top }\theta \}$ denotes a function of $Z_{i}^{\top }\theta $ that may or may not be the true $E\{\cdot |Z_{i}^{\top }\theta \}$, $w_{i}=w(X_{i},Z_{i})=\mathrm{var}(\varepsilon _{i}|X_{i},Z_{i})^{-1}$ and $a(\cdot ,\cdot )\in {\mathbb {R}}^{p+r-1}$ is an arbitrary function of X and Z. They pointed out that a consistent estimator $({\hat{\beta }}^{\top },{\hat{\theta }}^{\top })^{\top }$ can be obtained from (2) if $w_{i}$ and $f(\cdot )$ are consistently estimated or completely known. However, the double robustness and efficiency properties of $({\hat{\beta }}^{\top },{\hat{\theta }}^{\top })^{\top }$ are then lost. In order to develop estimators for $(\beta ^{\top },\theta ^{\top })^{\top }$ with the double robustness and efficiency properties, Ma and Zhu (2013) proposed an improved class of weighted estimating equations:

$$\begin{aligned} \left\{ \begin{array}{ll} \frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}{\tilde{\varepsilon }}_{i}{\hat{w}}\left( X_{i},Z_{i}\right) \left[ X_{i}-\frac{{\hat{E}}\left\{ {\hat{w}}\left( X,Z\right) X|Z_{i}^{\top }\theta \right\} }{{\hat{E}} \left\{ {\hat{w}}\left( X,Z\right) |Z_{i}^{\top }\theta \right\} }\right] =0,\\ \frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}{\tilde{\varepsilon }}_{i}{\hat{w}}\left( X_{i},Z_{i}\right) {\hat{g}}'\left( Z_{i}^{\top }\theta \right) \left[ Z_{-1,i}-\frac{{\hat{E}}\left\{ {\hat{w}}\left( X,Z\right) Z_{-1}|Z_{i}^{\top }\theta \right\} }{{\hat{E}}\left\{ {\hat{w}}\left( X,Z\right) |Z_{i}^{\top }\theta \right\} }\right] =0, \end{array} \right. \end{aligned}$$

(3)

where ${\tilde{\varepsilon }}_{i}=Y_{i}-X_{i}^{\top }\beta -{\hat{g}}(Z_{i}^{\top }\theta )$, $w_{i}=w(X_{i},Z_{i})=E(\varepsilon _{i}^{2}|X_{i},Z_{i})^{-1}$, and ${\hat{g}}(Z_{i}^{\top }\theta )$, ${\hat{g}}'(Z_{i}^{\top }\theta )$, ${\hat{E}}\{{\hat{w}}(X,Z)|Z_{i}^{\top }\theta \}$, ${\hat{E}}\{{\hat{w}}(X,Z)X|Z_{i}^{\top }\theta \}$ and ${\hat{E}}\{{\hat{w}}(X,Z)Z_{-1}|Z_{i}^{\top }\theta \}$ are nonparametric kernel estimators. We assume that $\eta _{i}=\eta (X_{i},Z_{i})$ is a low-dimensional variable such that $\mathrm{var}(\varepsilon _{i}|X_{i},Z_{i})=\mathrm{var}(\varepsilon _{i}|\eta _{i})$ and $\eta $ has a known form. For example, $\eta $ can be $Z^{\top }\theta $ or $X^{\top }\beta $, which means that the error variance depends on the covariates through $Z^{\top }\theta $ or $X^{\top }\beta $ only. It can also be a combination of these two or have any other known form. Let $K_{h}(\cdot )=h^{-1}K(\cdot /h)$, where $K(\cdot )$ is a kernel function and $h\rightarrow 0$ is a bandwidth. For bandwidths $h_{1}$, $h_{2}$ and $h_{3}$, we set

$$\begin{aligned}&{\hat{g}}\left( Z_{i}^{\top }\theta \right) =\sum _{j\ne i}K_{h_{1}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) \left( Y_{j}-X_{j}^{\top }\beta \right) \Bigg /\sum _{j\ne i}K_{h_{1}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) ,\\&{\hat{g}}'\left( Z_{i}^{\top }\theta \right) =h_{1}^{-1}\left\{ \sum _{j\ne i}K_{h_{1}}'\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) \left( Y_{j}-X_{j}^{\top }\beta \right) \sum _{j\ne i}K_{h_{1}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) \right. \\&\qquad \qquad \qquad \left. -\sum _{j\ne i}K_{h_{1}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) \left( Y_{j}-X_{j}^{\top }\beta \right) \sum _{j\ne i}K_{h_{1}}'\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) \right\} \Bigg /\left\{ \sum _{j\ne i}K_{h_{1}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) \right\} ^{2},\\&{\hat{w}}\left( X_{i},Z_{i}\right) =\sum _{j\ne i}K_{h_{2}}\left( \eta _{i}-\eta _{j}\right) \Bigg /\sum _{j\ne i}K_{h_{2}}\left( \eta _{i}-\eta _{j}\right) e_{j}^{2},\\&{\hat{E}}\left\{ {\hat{w}}\left( X,Z\right) |Z_{i}^{\top }\theta \right\} =\sum _{j\ne i}K_{h_{3}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) {\hat{w}}\left( X_{j},Z_{j}\right) \Bigg /\sum _{j\ne i}K_{h_{3}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) ,\\&{\hat{E}}\left\{ {\hat{w}}\left( X,Z\right) X|Z_{i}^{\top }\theta \right\} =\sum _{j\ne i}K_{h_{3}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) {\hat{w}}\left( X_{j},Z_{j}\right) X_{j}\Bigg /\sum _{j\ne i}K_{h_{3}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) ,\\&{\hat{E}}\left\{ {\hat{w}}\left( X,Z\right) Z_{-1}|Z_{i}^{\top }\theta \right\} =\sum _{j\ne i}K_{h_{3}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) {\hat{w}}\left( X_{j},Z_{j}\right) Z_{-1,j}\Bigg /\sum _{j\ne i}K_{h_{3}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) . \end{aligned}$$

The algorithm for computing the estimators of $(\beta ^{\top },\theta ^{\top })^{\top }$ can be found in Ma and Zhu (2013).
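The leave-one-out kernel estimators ${\hat{g}}$, ${\hat{g}}'$ and ${\hat{w}}$ displayed above can be sketched numerically as follows. This is only an illustrative sketch, not the algorithm of Ma and Zhu (2013): the Gaussian kernel, the function names and the interpretation of $e_{j}$ as residuals from some preliminary fit are assumptions made here for the example.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def gaussian_kernel_deriv(u):
    """Derivative K'(u) of the Gaussian kernel."""
    return -u * gaussian_kernel(u)

def loo_g_and_gprime(index, resid, h):
    """Leave-one-out kernel estimates of g and g' at each index value.

    index : (n,) values of Z_i^T theta
    resid : (n,) partial residuals Y_i - X_i^T beta
    h     : bandwidth h_1 > 0 (must be large enough that every point
            has neighbours, otherwise the denominators vanish)
    """
    diff = (index[:, None] - index[None, :]) / h
    K = gaussian_kernel(diff) / h            # K_h(u_i - u_j)
    dK = gaussian_kernel_deriv(diff) / h     # K'_h(u_i - u_j)
    np.fill_diagonal(K, 0.0)                 # drop the j = i terms
    np.fill_diagonal(dK, 0.0)
    s0 = K.sum(axis=1)                       # sum_j K_h
    s1 = K @ resid                           # sum_j K_h * resid_j
    d0 = dK.sum(axis=1)                      # sum_j K'_h
    d1 = dK @ resid                          # sum_j K'_h * resid_j
    g_hat = s1 / s0
    g_prime_hat = (d1 * s0 - s1 * d0) / (h * s0**2)
    return g_hat, g_prime_hat

def loo_weight(eta, e, h):
    """Leave-one-out kernel estimate of w = 1 / var(eps | eta).

    e : (n,) residuals e_j, assumed to come from a preliminary fit.
    """
    K = gaussian_kernel((eta[:, None] - eta[None, :]) / h) / h
    np.fill_diagonal(K, 0.0)
    return K.sum(axis=1) / (K @ e**2)
```

The conditional expectations ${\hat{E}}\{\cdot |Z_{i}^{\top }\theta \}$ follow the same pattern as `g_hat`, with the partial residuals replaced by ${\hat{w}}$, ${\hat{w}}X$ or ${\hat{w}}Z_{-1}$ and the bandwidth $h_{3}$.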

Ma and Zhu (2013) proved that, under some mild conditions, the semiparametric efficient score is

$$\begin{aligned} S_{eff}=w\varepsilon \left( X^{\top }-\frac{E(wX^{\top }|Z^{\top }\theta )}{E(w|Z^{\top }\theta )},g'(Z^{\top }\theta )\left\{ Z_{-1}^{\top }-\frac{E\left( wZ_{-1}^{\top }|Z^{\top }\theta \right) }{E(w|Z^{\top }\theta )}\right\} \right) ^{\top }, \end{aligned}$$

(4)

and the estimators $({\hat{\beta }}^{\top },{\hat{\theta }}^{\top })^{\top }$ obtained by solving (3) are doubly robust and efficient. Although Ma and Zhu (2013) allowed high-dimensional covariates, the dimension of the covariates is fixed and cannot tend to infinity as the sample size $n\rightarrow \infty $.

We first extend the fixed-dimensional results in Ma and Zhu (2013) to cases with diverging dimensionality, i.e., p, $r\rightarrow \infty $ as $n\rightarrow \infty $. Let $B_{n}\in {\mathbb {R}}^{(p+r-1)\times q}$ with fixed q. $B_{n}^{\top }S_{eff}$ represents a projection of the diverging dimensional vector $S_{eff}$ to a fixed dimension q.

Definition 1

Assume that $S_{eff}$ is a diverging dimensional semiparametric score, and $B_{n}^{\top }S_{eff}$ represents a projection of the diverging dimensional vector $S_{eff}$ to a fixed dimension q. We say that the diverging dimensional semiparametric score is a semiparametric efficient score if the fixed dimensional projection $B_{n}^{\top }S_{eff}$ of $S_{eff}$ is a semiparametric efficient score.

Similar to Proposition 1 of Ma and Zhu (2013), we can prove that, for any fixed q, the projection $B_{n}^{\top }S_{eff}$ of the diverging dimensional semiparametric score $S_{eff}$ is a semiparametric efficient score. Therefore, according to Definition 1, we say that the diverging dimensional semiparametric score $S_{eff}$ is a semiparametric efficient score.

The following theorem shows the theoretical properties of high-dimensional estimator based on the semiparametric score.

Theorem 2.1

Let $(\beta _{0}^{\top }, \theta _{0}^{\top })^{\top }$ be the true value of parameter vector $(\beta ^{\top },\theta ^{\top })^{\top }$, under Assumptions 1–9, then

$$\begin{aligned} \sqrt{n}AV^{1/2}\left\{ \left( {\hat{\beta }}^{\top },{\hat{\theta }}^{\top }\right) ^{\top }-\left( \beta _{0}^{\top },\theta _{0}^{\top }\right) ^{\top }\right\} {\mathop {\rightarrow }\limits ^{L}} N\left( 0,G\right) , \end{aligned}$$

where ${\mathop {\rightarrow }\limits ^{L}}$ stands for convergence in distribution, $A_{1}\in R^{q_{1}\times p}$ and $A_{2}\in R^{q_{2}\times r}$ are matrices with fixed $q_{1}$ and $q_{2}$, $AA^{\top }\rightarrow G$ for a $(q_{1}+q_{2})\times (q_{1}+q_{2})$ matrix G, and

$$\begin{aligned} A= & {} \left( { \begin{array}{*{10}c} A_{1}&{}0\\ 0&{}A_{2}\\ \end{array}}\right) , \qquad V=\left( { \begin{array}{*{10}c} V_{11}&{}V_{12}\\ V_{21}&{}V_{22}\\ \end{array}}\right) ,\\ V_{11}= & {} E\left\{ wXX^{\top }-\frac{E(wX|Z^{\top }\theta )E(wX^{\top }|Z^{\top }\theta )}{E(w|Z^{\top }\theta )}\right\} ,\\ V_{12}= & {} E\left[ g'(Z^{\top }\theta )\left\{ wXZ_{-1}^{\top }-\frac{E(wX|Z^{\top }\theta )E(wZ_{-1}^{\top }|Z^{\top }\theta )}{E(w|Z^{\top }\theta )}\right\} \right] ,\\ V_{21}= & {} E\left[ g'(Z^{\top }\theta )\left\{ wZ_{-1}X^{\top }-\frac{E(wZ_{-1}|Z^{\top }\theta )E(wX^{\top }|Z^{\top }\theta )}{E(w|Z^{\top }\theta )}\right\} \right] ,\\ V_{22}= & {} E\left[ g'(Z^{\top }\theta )^{2}\left\{ wZ_{-1}Z_{-1}^{\top }-\frac{E(wZ_{-1}|Z^{\top }\theta )E(wZ_{-1}^{\top }|Z^{\top }\theta )}{E(w|Z^{\top }\theta )}\right\} \right] . \end{aligned}$$

Theorem 2.1 extends the fixed-dimensional results in Ma and Zhu (2013) to cases with diverging dimensionality. Furthermore, in Theorem 2.1, A represents a projection of the diverging dimensional vector to a fixed dimension $q_{1}+q_{2}$, and the limiting distribution of the projected vector of $\{({\hat{\beta }}^{\top },{\hat{\theta }}^{\top })^{\top }-(\beta _{0}^{\top },\theta _{0}^{\top })^{\top }\}$ is a multivariate normal distribution. This theorem provides the consistency and normality of the projected vector of the estimator $({\hat{\beta }}^{\top },{\hat{\theta }}^{\top })^{\top }$ for heteroscedastic partially linear single-index models.

Next, we introduce an auxiliary random vector by using the semiparametric efficient score. Let

$$\begin{aligned} \xi _{i}(\beta ,\theta )=w_{i}\varepsilon _{i}\left( X_{i}^{\top }-\frac{E\left( wX^{\top }|Z_{i}^{\top }\theta \right) }{E(w|Z_{i}^{\top }\theta )}, g'\left( Z_{i}^{\top }\theta \right) \left\{ Z_{-1,i}^{\top }-\frac{E\left( wZ_{-1}^{\top }|Z_{i}^{\top }\theta \right) }{E(w|Z_{i}^{\top }\theta )}\right\} \right) ^{\top }. \end{aligned}$$

Note that $E\{\xi _{i}(\beta ,\theta )\}=0$ for $i=1,\ldots , n$, if $(\beta ^{\top },\theta ^{\top })^{\top }$ is the true parameter. Based on this fact, we apply the empirical likelihood method of Owen (1988, 1990) to make inference about $(\beta ^{\top },\theta ^{\top })^{\top }$. Let $\pi =(\pi _{1},\cdots , \pi _{n})$ be a probability vector satisfying $\sum _{i=1}^{n}\pi _{i}=1$ and $\pi _{i}\ge 0$ for $i=1,\ldots , n$. The traditional empirical likelihood function for $(\beta ^{\top },\theta ^{\top })^{\top }$ is defined as follows:

$$\begin{aligned} L(\beta ,\theta )=\sup \left\{ \prod _{i=1}^{n}(n\pi _{i}):\sum _{i=1}^{n}\pi _{i}=1, \pi _{i}\ge 0, \sum _{i=1}^{n}\pi _{i}\xi _{i}(\beta ,\theta )=0\right\} . \end{aligned}$$

(5)

Because (5) contains unknown functions $w(X_{i},Z_{i})$, $g(Z_{i}^{\top }\theta )$, $E\{w(X,Z)|Z_{i}^{\top }\theta \}$, $g'(Z_{i}^{\top }\theta )$, $E\{w(X,Z)X|Z_{i}^{\top }\theta \}$ and $E\{w(X,Z)Z_{-1}|Z_{i}^{\top }\theta \}$, it cannot be used directly to make inference on $(\beta ^{\top },\theta ^{\top })^{\top }$. To solve this problem, a natural method is to replace those by their estimators ${\hat{w}}(X_{i},Z_{i})$, ${\hat{g}}(Z_{i}^{\top }\theta )$, ${\hat{g}}'(Z_{i}^{\top }\theta )$, ${\hat{E}}\{{\hat{w}}(X,Z)|Z_{i}^{\top }\theta \}$, ${\hat{E}}\{{\hat{w}}(X,Z)X|Z_{i}^{\top }\theta \}$ and ${\hat{E}}\{{\hat{w}}(X,Z)Z_{-1}|Z_{i}^{\top }\theta \}$ given above. Define an estimated empirical likelihood function for $(\beta ^{\top },\theta ^{\top })^{\top }$ as

$$\begin{aligned} {\tilde{L}}(\beta ,\theta )=\sup \left\{ \prod _{i=1}^{n}(n\pi _{i}):\sum _{i=1}^{n}\pi _{i}=1, \pi _{i}\ge 0, \sum _{i=1}^{n}\pi _{i}{\hat{\xi }}_{i}(\beta ,\theta )=0\right\} , \end{aligned}$$

(6)

where

$$\begin{aligned} {\hat{\xi }}_{i}\left( \beta ,\theta \right) ={\hat{w}}_{i}{\tilde{\varepsilon }}_{i}\left( X_{i}^{\top }-\frac{{\hat{E}}\left( {\hat{w}}X^{\top }|Z_{i}^{\top }\theta \right) }{{\hat{E}}\left( {\hat{w}} |Z_{i}^{\top }\theta \right) },{\hat{g}}'\left( Z_{i}^{\top }\theta \right) \left\{ Z_{-1,i}^{\top }-\frac{{\hat{E}}\left( {\hat{w}}Z_{-1}^{\top }|Z_{i}^{\top }\theta \right) }{{\hat{E}}\left( {\hat{w}}|Z_{i}^{\top } \theta \right) }\right\} \right) ^{\top }, \end{aligned}$$

and ${\tilde{\varepsilon }}_{i}=Y_{i}-X_{i}^{\top }\beta -{\hat{g}}(Z_{i}^{\top }\theta )$. The estimated empirical log-likelihood ratio is

$$\begin{aligned} {\tilde{l}}(\beta ,\theta )=-\,2\log \{{\tilde{L}}(\beta ,\theta )\}. \end{aligned}$$

(7)

According to Tsao (2004), when the dimension $p+r$ is moderately large but fixed, the distribution of the empirical likelihood ratio ${\tilde{l}}(\beta ,\theta )$ has an atom at infinity for fixed sample size n; that is, the probability of ${\tilde{l}}(\beta ,\theta )=\infty $ is nonzero. Based on Tsao (2004), when the dimension $p+r$ and the sample size n increase at the same rate such that $p/n\ge 0.5$, the probability of ${\tilde{l}}(\beta ,\theta )=\infty $ converges to 1, since the probability of $(\beta ^{\top },\theta ^{\top })^{\top }$ being contained in the convex hull of the sample converges to 0. These results reveal the effect of the dimension $p+r$ on the empirical likelihood from another perspective. In this paper, we analyze the empirical likelihood for heteroscedastic partially linear single-index models with high-dimensional data, in which p and r increase at a mild rate relative to n so that the empirical likelihood ratio ${\tilde{l}}(\beta ,\theta )$ is well defined.

By the Lagrange multiplier method, the optimal $\{\pi _{i}\}_{i=1}^{n}$ in (6) are

$$\begin{aligned} \pi _{i}=\frac{1}{n}\frac{1}{1+\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta )}, \end{aligned}$$

where $\lambda $ satisfies

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}\frac{{\hat{\xi }}_{i}(\beta ,\theta )}{1+\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta )}=0. \end{aligned}$$

(8)

Therefore, since $n\pi _{i}=\{1+\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta )\}^{-1}$, the estimated empirical log-likelihood ratio function for $(\beta ^{\top },\theta ^{\top })^{\top }$ defined in (7) is given by

$$\begin{aligned} {\tilde{l}}(\beta ,\theta )=-\,2\log \{{\tilde{L}}(\beta ,\theta )\}=2\sum _{i=1}^{n}\log \left\{ 1+\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta )\right\} . \end{aligned}$$

(9)
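For a given matrix of auxiliary vectors, the multiplier $\lambda $ in (8) and the ratio in (9) can be computed numerically. The sketch below uses a damped Newton iteration, a common choice for this kind of problem but not one prescribed by the paper; the function names, tolerances and step-halving rule are assumptions made for illustration. If the origin is not inside the convex hull of the ${\hat{\xi }}_{i}$, the iteration does not converge and the returned value is merely large, reflecting the atom at infinity discussed above.

```python
import numpy as np

def _objective(lam, xi):
    """sum_i log(1 + lam^T xi_i), or -inf if some implied weight is not positive."""
    denom = 1.0 + xi @ lam
    if np.any(denom <= 0.0):
        return -np.inf
    return np.sum(np.log(denom))

def el_log_ratio(xi, max_iter=100, tol=1e-10):
    """Return (l, lam) with lam solving (8) and l = 2 * sum log(1 + lam^T xi_i).

    xi : (n, d) array whose rows are the auxiliary vectors xi_hat_i.
    """
    n, d = xi.shape
    lam = np.zeros(d)
    for _ in range(max_iter):
        denom = 1.0 + xi @ lam
        scaled = xi / denom[:, None]
        grad = scaled.sum(axis=0)            # n times the left-hand side of (8)
        if np.linalg.norm(grad) < tol * n:
            break
        hess = scaled.T @ scaled             # minus the Jacobian of grad
        step = np.linalg.solve(hess, grad)   # Newton direction
        current = _objective(lam, xi)
        t = 1.0
        # Halve the step until the weights stay positive and the
        # concave objective does not decrease (up to rounding slack).
        while _objective(lam + t * step, xi) < current - 1e-10 and t > 1e-12:
            t *= 0.5
        lam = lam + t * step
    return 2.0 * _objective(lam, xi), lam
```

The implied weights are then `1.0 / (n * (1.0 + xi @ lam))`, which sum to one and satisfy the moment constraint in (6) once (8) holds.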

The following theorem gives the asymptotic distribution of ${\tilde{l}}(\beta ,\theta )$.

Theorem 2.2

Let $(\beta _{0}^{\top }, \theta _{0}^{\top })^{\top }$ be the true value of the parameter vector $(\beta ^{\top },\theta ^{\top })^{\top }$. Under Assumptions 1–9, the centered and scaled statistic based on ${\tilde{l}}(\beta _{0},\theta _{0})$ is asymptotically standard normal, i.e.,

$$\begin{aligned} \{2(p+r-1)\}^{-1/2}\left\{ {\tilde{l}}(\beta _{0},\theta _{0})-(p+r-1)\right\} {\mathop {\rightarrow }\limits ^{L}} N(0,1). \end{aligned}$$

Therefore, Theorem 2.2 can be used to test the hypothesis

$$\begin{aligned} H_{0}: \left( \beta ^{\top },\theta ^{\top }\right) ^{\top }=\left( \beta _{0}^{\top }, \theta _{0}^{\top }\right) ^{\top }~~~vs~~~H_{1}: \left( \beta ^{\top },\theta ^{\top }\right) ^{\top }\ne \left( \beta _{0}^{\top }, \theta _{0}^{\top }\right) ^{\top }, \end{aligned}$$

and on the other hand, it also can be used to construct confidence regions for $(\beta ^{\top },\theta ^{\top })^{\top }$. Let

$$\begin{aligned} I_{\alpha }(\beta ,\theta )=\left\{ (\beta ^{\top },\theta ^{\top })^{\top }:\left| {\tilde{l}}(\beta ,\theta )-(p+r-1)\right| \le z_{\alpha /2}\sqrt{2(p+r-1)}\right\} , \end{aligned}$$

where $z_{\alpha /2}$ is the upper $\alpha /2$-quantile of the standard normal distribution. Then, by Theorem 2.2, $I_{\alpha }(\beta ,\theta )$ gives an approximate confidence region for $(\beta ^{\top },\theta ^{\top })^{\top }$ with asymptotically correct coverage probability $1-\alpha $, i.e.,

$$\begin{aligned} P((\beta ^{\top },\theta ^{\top })^{\top }\in I_{\alpha }(\beta ,\theta ))=1-\alpha +o(1). \end{aligned}$$
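Checking whether a candidate parameter value belongs to $I_{\alpha }(\beta ,\theta )$ reduces to comparing the centered ratio with a normal quantile. A minimal sketch, assuming the value of ${\tilde{l}}(\beta ,\theta )$ has already been computed (the function name and arguments are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

def in_normal_calibrated_region(l_value, p, r, alpha=0.05):
    """True if |l - (p + r - 1)| <= z_{alpha/2} * sqrt(2 (p + r - 1))."""
    m = p + r - 1                                   # dimension of the score
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)     # upper alpha/2 quantile
    return abs(l_value - m) <= z * sqrt(2.0 * m)
```

For example, with $p=10$, $r=6$ and $\alpha =0.05$ the half-width is $z_{0.025}\sqrt{30}\approx 10.7$, so ratio values between roughly 4.3 and 25.7 are retained.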

Remark 1

Theorem 2.2 gives the asymptotic distribution of ${\tilde{l}}(\beta ,\theta )$ with diverging dimensionality, i.e., p, $r\rightarrow \infty $ as $n\rightarrow \infty $. When p and r are fixed and do not diverge with n, ${\tilde{l}}(\beta _{0},\theta _{0})$ is asymptotically a standard chi-square random variable, and it is easy to see from the proof of Theorem 2.2 in “Appendix” that

$$\begin{aligned} {\tilde{l}}(\beta _{0},\theta _{0}){\mathop {\rightarrow }\limits ^{L}}\chi _{p+r-1}^{2}, \end{aligned}$$

where $\chi _{p+r-1}^{2}$ denotes the chi-square distribution with $p+r-1$ degrees of freedom.
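The centering and scaling in Theorem 2.2 can be related to this fixed-dimensional limit by a short heuristic calculation, which is a sketch and not part of the proof. Write $m=p+r-1$ and let $W_{m}$ (a symbol introduced here only for illustration) denote a $\chi _{m}^{2}$ random variable. Since $W_{m}$ is a sum of m independent squared standard normal variables, each with mean 1 and variance 2, the central limit theorem gives

$$\begin{aligned} E(W_{m})=m,\qquad Var(W_{m})=2m,\qquad (2m)^{-1/2}(W_{m}-m){\mathop {\rightarrow }\limits ^{L}} N(0,1)\quad \text {as } m\rightarrow \infty , \end{aligned}$$

which matches the normalization $\{2(p+r-1)\}^{-1/2}\{{\tilde{l}}(\beta _{0},\theta _{0})-(p+r-1)\}$ used in Theorem 2.2.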

In practice, confidence regions for the entire parameter vector are rarely sought, because as soon as the dimension $p>3$, it is hard to visualize or represent the confidence region. Usually one is interested only in a one- or two-dimensional subvector of the parameter. Assume $(\beta ^{\top },\theta ^{\top })^{\top }=(\beta ^{(1)\top },\beta ^{(2)\top },\theta ^{\top })^{\top }$, where $\beta ^{(1)}$ is the k-dimensional subvector (k is fixed and does not diverge with n) for which the confidence interval or region is to be constructed. The true parameter vector can be partitioned as $(\beta _{0}^{\top },\theta _{0}^{\top })^{\top }=(\beta _{0}^{(1)\top }, \beta _{0}^{(2)\top },\theta _{0}^{\top })^{\top }$, and $X_{i}$ and $Z_{i}$ can be similarly partitioned.

We can obtain, with fixed $\beta ^{(1)}$, estimators for the rest of the parameters by solving (3), denoted by ${\hat{\beta }}^{(2)}$ and ${\hat{\theta }}$. Define

$$\begin{aligned} \hat{{\tilde{\xi }}}_{i}(\beta ^{(1)})= & {} {\hat{w}}_{i}\left\{ Y_{i}-X_{i}^{(1)\top }\beta ^{(1)}-X_{i}^{(2)\top }{\hat{\beta }}^{(2)}\right. \\&\left. -\,{\hat{g}}\left( Z_{i}^{\top }{\hat{\theta }}\right) \right\} \left( X_{i}^{(1)}-\frac{{\hat{E}}\left( {\hat{w}}X^{(1)}|Z_{i}^{\top }{\hat{\theta }}\right) }{{\hat{E}}\left( {\hat{w}}|Z_{i}^{\top }{\hat{\theta }}\right) }\right) \end{aligned}$$

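With $\lambda ^{(1)}$ denoting the Lagrange multiplier that solves the analogue of (8) with $\hat{{\tilde{\xi }}}_{i}(\beta ^{(1)})$ in place of ${\hat{\xi }}_{i}(\beta ,\theta )$, the estimated empirical log-likelihood ratio for $\beta ^{(1)}$ is given by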

$$\begin{aligned} {\tilde{l}}(\beta ^{(1)})=2\sum _{i=1}^{n}\log \left\{ 1+\lambda ^{(1)\top }\hat{{\tilde{\xi }}}_{i}(\beta ^{(1)})\right\} . \end{aligned}$$

(10)

Theorem 2.3

Under the same assumptions as in Theorem 2.2,

$$\begin{aligned} {\tilde{l}}\left( \beta _{0}^{(1)}\right) {\mathop {\rightarrow }\limits ^{L}} \chi _{k}^{2}. \end{aligned}$$

Based on Theorem 2.3, a confidence interval or region for $\beta ^{(1)}$ can be easily constructed. For any given $0<\alpha <1$, there exists $c_{\alpha }$ such that $P(\chi _{k}^{2}>c_{\alpha })=\alpha $, then

$$\begin{aligned} I_{\alpha }(\beta ^{(1)})=\left\{ \beta ^{(1)}\in R^{k}:{\tilde{l}}(\beta ^{(1)})\le c_{\alpha }\right\} \end{aligned}$$

is the confidence interval or region of the $\beta ^{(1)}$ with asymptotically correct coverage probability $1-\alpha $.
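For the one- and two-dimensional subvectors that are of most practical interest, the critical value $c_{\alpha }$ has a closed form, so the region $I_{\alpha }(\beta ^{(1)})$ can be checked without a chi-square table. The sketch below covers only $k=1$ and $k=2$; the function names are hypothetical, and a general chi-square quantile routine would be needed for larger k.

```python
from math import log
from statistics import NormalDist

def chi2_upper_quantile(alpha, k):
    """c_alpha with P(chi2_k > c_alpha) = alpha, for k = 1 or k = 2 only."""
    if k == 1:
        # chi2_1 is the square of a standard normal variable.
        return NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    if k == 2:
        # chi2_2 is exponential with mean 2: P(chi2_2 > c) = exp(-c / 2).
        return -2.0 * log(alpha)
    raise ValueError("closed form implemented only for k = 1 or k = 2")

def in_subvector_region(l_value, k, alpha=0.05):
    """True if the ratio value does not exceed the chi-square critical value."""
    return l_value <= chi2_upper_quantile(alpha, k)
```

A confidence interval for a scalar $\beta ^{(1)}$ can then be traced out by evaluating ${\tilde{l}}(\beta ^{(1)})$ on a grid of candidate values and keeping those for which `in_subvector_region` returns true.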

Remark 2

Based on Tsao (2004), in the high-dimensional setting, the dimension has a non-ignorable effect on the coverage probabilities of empirical likelihood confidence regions. Therefore, the coverage probabilities of empirical likelihood confidence regions may fall below the nominal confidence level. We hope to consider how to improve the coverage probabilities of empirical likelihood confidence regions in high-dimensional situations in future work.

3 Two special cases: partially linear models and single-index models

In this section, we present empirical likelihood inference for $\beta $ or $\theta $ in two special cases of model (1). First we consider the heteroscedastic partially linear model with a diverging number of parameters, in which the single-index component in model (1) does not contain parameters, so that the model can be written as

$$\begin{aligned} Y_{i}=X_{i}^{\top }\beta +g(Z_{i})+\varepsilon _{i},~~~(i=1,\ldots ,n). \end{aligned}$$

(11)

The weighted estimating equations for model (11) become

$$\begin{aligned} n^{-1/2}\sum \limits _{i=1}^{n}{\hat{w}}(X_{i},Z_{i})\left( Y_{i}{-}X_{i}^{\top }\beta {-}{\hat{g}}(Z_{i})\right) \left[ X_{i}{-}\frac{{\hat{E}} \{{\hat{w}}(X,Z)X|Z_{i}\}}{{\hat{E}}\{{\hat{w}}(X,Z)|Z_{i}\}}\right] =0. \end{aligned}$$

(12)

The estimator ${\hat{\beta }}$ can be obtained by solving semiparametric efficient score equations (12). Rewrite the auxiliary random vector

$$\begin{aligned} {\hat{\xi }}_{i}(\beta )={\hat{w}}_{i}\left\{ Y_{i}-X_{i}^{\top }\beta -{\hat{g}}(Z_{i})\right\} \left\{ X_{i}-\frac{{\hat{E}}({\hat{w}}X|Z_{i})}{{\hat{E}} ({\hat{w}}|Z_{i})}\right\} . \end{aligned}$$

Therefore, the estimated empirical log-likelihood ratio function for $\beta $ is

$$\begin{aligned} {\tilde{l}}_{1}(\beta )=2\sum _{i=1}^{n}\log \left\{ 1+\lambda ^{\top }{\hat{\xi }}_{i}(\beta )\right\} . \end{aligned}$$

(13)
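The auxiliary vectors ${\hat{\xi }}_{i}(\beta )$ for model (11) can be assembled from kernel-smoothed quantities. The following sketch assumes a scalar covariate $Z_{i}$, a Gaussian kernel, a single bandwidth for all smoothers and a weight vector ${\hat{w}}$ supplied by the user; these simplifications and the function names are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def nw_loo_weights(z, h):
    """Row-normalised leave-one-out Gaussian kernel weights for scalar z."""
    K = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)
    np.fill_diagonal(K, 0.0)                  # leave observation i out
    return K / K.sum(axis=1, keepdims=True)

def xi_hat_plm(Y, X, z, beta, w_hat, h):
    """Rows are xi_hat_i(beta) for model (11) with a scalar covariate z.

    Y : (n,) responses, X : (n, p) linear covariates, z : (n,) covariate,
    beta : (p,) candidate value, w_hat : (n,) estimated weights.
    """
    W = nw_loo_weights(z, h)
    g_hat = W @ (Y - X @ beta)                # g_hat(Z_i)
    E_w = W @ w_hat                           # E_hat(w_hat | Z_i)
    E_wX = W @ (w_hat[:, None] * X)           # E_hat(w_hat X | Z_i)
    resid = Y - X @ beta - g_hat
    return (w_hat * resid)[:, None] * (X - E_wX / E_w[:, None])
```

The resulting $n\times p$ array is the input needed to solve for $\lambda $ and evaluate ${\tilde{l}}_{1}(\beta )$ in (13).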

Assume $\beta =(\beta ^{(1)\top },\beta ^{(2)\top })^{\top }$, where $\beta ^{(1)}$ is a k-dimensional (k is fixed and does not diverge with n) component of $\beta $. The true parameter vector can be partitioned as $\beta _{0}=(\beta _{0}^{(1)\top }, \beta _{0}^{(2)\top })^{\top }$, and $X_{i}$ can be similarly partitioned. We can obtain, with fixed $\beta ^{(1)}$, estimators for the rest of the parameters by solving (12), denoted by ${\hat{\beta }}^{(2)}$. Define

$$\begin{aligned} \tilde{{\tilde{\xi }}}_{i}(\beta ^{(1)})={\hat{w}}_{i}\left\{ Y_{i}-X_{i}^{(1)\top }\beta ^{(1)}-X_{i}^{(2)\top }{\hat{\beta }}^{(2)}-{\hat{g}}(Z_{i})\right\} \left\{ X_{i}^{(1)}-\frac{{\hat{E}}({\hat{w}}X^{(1)}|Z_{i})}{{\hat{E}}({\hat{w}}|Z_{i})}\right\} . \end{aligned}$$

The estimated empirical log-likelihood ratio for $\beta ^{(1)}$ is given by

$$\begin{aligned} \tilde{{\tilde{l}}}(\beta ^{(1)})=2\sum _{i=1}^{n}\log \left\{ 1+\lambda ^{(1)\top }\tilde{{\tilde{\xi }}}_{i}(\beta ^{(1)})\right\} . \end{aligned}$$

(14)

In the high-dimensional setting, model (11) has the following theoretical properties.

Theorem 3.1

Let $\beta _{0}$ be the true value of parameter vector $\beta $, under Assumptions 1–9, then

$$\begin{aligned}&(1)\quad \sqrt{n}A_{n}V_{1}^{-1/2}({\hat{\beta }}-\beta _{0}){\mathop {\rightarrow }\limits ^{L}} N(0,G),\\&(2)\quad (2p)^{-1/2}\left\{ {\tilde{l}}_{1}(\beta _{0})-p\right\} {\mathop {\rightarrow }\limits ^{L}} N(0,1),\\&(3)\quad \tilde{{\tilde{l}}}(\beta _{0}^{(1)}) {\mathop {\rightarrow }\limits ^{L}} \chi _{k}^{2}, \end{aligned}$$

where $A_{n}\in R^{q\times p}$ is a $q\times p $ matrix such that $A_{n}A_{n}^{\top }\rightarrow G$ and G is a $q\times q$ matrix with fixed q, and

$$\begin{aligned} V_{1}=\left[ E\left\{ wXX^{\top }-\frac{E(wX|Z)E(wX|Z)^{\top }}{E(w|Z)}\right\} \right] ^{-1}. \end{aligned}$$

Theorem 3.1(1) extends the fixed-dimensional results in Ma et al. (2006) to cases with diverging dimensionality. Theorem 3.1(2) can be used to construct empirical likelihood confidence regions for $\beta $. A $(1-\alpha )100\%$ confidence region for $\beta $ is given by

$$\begin{aligned} I_{\alpha }(\beta )=\left\{ \beta :\left| {\tilde{l}}_{1}(\beta )-p\right| \le z_{\alpha /2}\sqrt{2p}\right\} . \end{aligned}$$
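
As a minimal sketch of how membership in $I_{\alpha }(\beta )$ could be checked numerically, the Python function below takes a value of ${\tilde{l}}_{1}(\beta )$ that has been computed elsewhere and applies the normal calibration of Theorem 3.1(2). The function name and arguments are illustrative assumptions, and $z_{\alpha /2}$ is taken to be the upper $\alpha /2$ quantile of the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist


def in_normal_el_region(log_el_ratio, p, alpha=0.05):
    """Return True if |l - p| <= z_{alpha/2} * sqrt(2p).

    log_el_ratio: estimated empirical log-likelihood ratio at the
                  candidate parameter value (computed elsewhere).
    p:            dimension of the parameter vector.
    alpha:        one minus the nominal confidence level.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 normal quantile
    return abs(log_el_ratio - p) <= z * sqrt(2.0 * p)


# Hypothetical ratio values, for illustration only.
inside = in_normal_el_region(24.3, p=20)   # |24.3 - 20| is well below 1.96 * sqrt(40)
outside = in_normal_el_region(35.0, p=20)  # |35 - 20| exceeds 1.96 * sqrt(40)
```

Because the region is two-sided, a candidate value is rejected when the ratio is unusually small as well as when it is unusually large.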

When p is fixed and does not diverge with n, ${\tilde{l}}_{1}(\beta _{0})$ has an asymptotic chi-square distribution, which corresponds to the empirical likelihood method considered by Lu (2009) for heteroscedastic partially linear models. Theorem 3.1(3) can be used to construct a confidence interval or region for a subvector of the parameter $\beta $. Based on Theorem 3.1(3), for $0<\alpha <1$, the confidence interval or region for $\beta ^{(1)}$ is given by

$$\begin{aligned} {\tilde{I}}_{\alpha }(\beta ^{(1)})=\left\{ \beta ^{(1)}|\tilde{{\tilde{l}}}(\beta ^{(1)})\le \chi _{k}^{2}(\alpha )\right\} . \end{aligned}$$
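
The chi-square calibration can be sketched in the same way. The code below assumes that $\chi _{k}^{2}(\alpha )$ denotes the upper $\alpha $ quantile of the $\chi _{k}^{2}$ distribution, and it covers only $k=1$ and $k=2$, where closed forms are available from the standard library; for general k a numerical routine such as `scipy.stats.chi2.ppf` would be needed. The function names are hypothetical.

```python
from math import log
from statistics import NormalDist


def chi2_upper_quantile(k, alpha):
    """Upper-alpha quantile of the chi-square distribution for k = 1 or 2."""
    if k == 1:
        # A chi-square(1) variable is the square of a standard normal variable.
        return NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    if k == 2:
        # A chi-square(2) variable is exponential with mean 2: P(X > x) = exp(-x/2).
        return -2.0 * log(alpha)
    raise ValueError("closed form implemented only for k = 1 or k = 2")


def in_chi2_el_region(log_el_ratio, k, alpha=0.05):
    """Return True if the profile log-likelihood ratio is below the quantile."""
    return log_el_ratio <= chi2_upper_quantile(k, alpha)


q1 = chi2_upper_quantile(1, 0.05)  # cutoff for an interval for one coefficient
q2 = chi2_upper_quantile(2, 0.05)  # cutoff for a region for two coefficients
```

An interval for a single coefficient is then the set of values $\beta ^{(1)}$ whose profile ratio stays below `q1`, which in practice is found by evaluating the ratio on a grid or by root finding.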

For high-dimensional data, we now consider the heteroscedastic pure single-index model

$$\begin{aligned} Y_{i}=g\left( Z_{i}^{\top }\theta \right) +\varepsilon _{i},~~~(i=1,\ldots ,n). \end{aligned}$$

(15)

That is, model (1) has no linear component. The weighted estimating equations can be written as

$$\begin{aligned} n^{-1/2}\sum \limits _{i=1}^{n}{\hat{w}}_{i}\left\{ Y_{i}{-}{\hat{g}}\left( Z_{i}^{\top }\theta \right) \right\} {\hat{g}}'\left( Z_{i}^{\top }{\tilde{\theta }}\right) \left[ Z_{-1,i}{-}\frac{{\hat{E}}\left\{ {\hat{w}}Z_{-1}|Z_{i}^{\top }{\tilde{\theta }}\right\} }{{\hat{E}}\left\{ {\hat{w}}|Z_{i}^{\top }{\tilde{\theta }}\right\} }\right] =0, \end{aligned}$$

(16)

where ${\tilde{\theta }}$ is the solution of the following equations

$$\begin{aligned} n^{-1/2}\sum _{i=1}^{n}{\hat{w}}_{i}\left\{ Y_{i}-f\left( Z_{i}^{\top }\theta \right) \right\} \left\{ a(Z_{i})-{\tilde{E}}\left\{ a(Z)|Z_{i}^{\top }\theta \right\} \right\} =0. \end{aligned}$$

The estimator ${\hat{\theta }}$ can be obtained by solving the semiparametric efficient score equations (16). Similarly, redefine the auxiliary random vector

$$\begin{aligned} {\hat{\xi }}_{i}\left( \theta \right) ={\hat{w}}_{i}\left\{ Y_{i}-{\hat{g}}\left( Z_{i}^{\top }\theta \right) \right\} {\hat{g}}'\left( Z_{i}^{\top }\theta \right) \left\{ Z_{-1,i}-\frac{{\hat{E}}\left( {\hat{w}}Z_{-1}|Z_{i}^{\top }\theta \right) }{{\hat{E}}\left( {\hat{w}}|Z_{i}^{\top }\theta \right) }\right\} , \end{aligned}$$

and the estimated empirical log-likelihood ratio is

$$\begin{aligned} {\tilde{l}}_{2}(\theta )=2\sum _{i=1}^{n}\log \left\{ 1+\lambda ^{\top }{\hat{\xi }}_{i}(\theta )\right\} . \end{aligned}$$

Theorem 3.2

Under Assumptions 1–9, if $\theta _{0}$ is the true parameter value, then

$$\begin{aligned}&(1) \quad \sqrt{n}A_{n}V_{2}^{-1/2}({\hat{\theta }}_{-1}-\theta _{0,-1}){\mathop {\rightarrow }\limits ^{L}} N(0,G), \\&(2) \quad \{2(r-1)\}^{-1/2}\left\{ {\tilde{l}}_{2}(\theta _{0})-(r-1)\right\} {\mathop {\rightarrow }\limits ^{L}} N(0,1), \end{aligned}$$

where $A_{n}\in R^{q\times (r-1)}$ is a $q\times (r-1)$ matrix such that $A_{n}A_{n}^{\top }\rightarrow G$ and G is a $q\times q$ matrix with fixed q, and

$$\begin{aligned} V_{2}=E\left[ g'(Z^{\top }\theta )^{2}\left\{ wZ_{-1}Z_{-1}^{\top }-\frac{E(wZ_{-1}|Z^{\top }\theta )E(wZ_{-1}^{\top }|Z^{\top }\theta )}{E(w|Z^{\top }\theta )}\right\} \right] . \end{aligned}$$

The theoretical properties of ${\hat{\theta }}$ are given by Theorem 3.2(1). When r is fixed, ${\tilde{l}}_{2}(\theta _{0}){\mathop {\rightarrow }\limits ^{L}} \chi _{r-1}^{2}$. According to Theorem 3.2(2), a $(1-\alpha )100\%$ empirical likelihood confidence region for $\theta $ is

$$\begin{aligned} I_{\alpha }(\theta )=\left\{ \theta :\left| {\tilde{l}}_{2}(\theta )-(r-1)\right| \le z_{\alpha /2}\sqrt{2(r-1)}\right\} . \end{aligned}$$

4 Simulation studies

First, we describe how to approach the optimization problem posed by the empirical likelihood method. Owing to nonconvexity, computing the empirical likelihood ratio is nontrivial. We can use the following algorithm to obtain the estimated empirical likelihood ratio defined by (7).

Step 1 Obtain ${\hat{g}}(Z_{i}^{\top }\theta )$, ${\hat{g}}'(Z_{i}^{\top }\theta )$, ${\hat{w}}(X_{i},Z_{i})$, ${\hat{E}}\{{\hat{w}}(X,Z)|Z_{i}^{\top }\theta \}$, ${\hat{E}}\{{\hat{w}}(X,Z)X|Z_{i}^{\top }\theta \}$ and ${\hat{E}}\{{\hat{w}}(X,Z)Z_{-1}|Z_{i}^{\top }\theta \}$ described above by using the fixed values of $(\beta ^{\top },\theta ^{\top })^{\top }$.
Step 2 Obtain the auxiliary random vector ${\hat{\xi }}_{i}(\beta ,\theta )$.
Step 3 Obtain ${\hat{\lambda }}$ by using Newton's method to minimize (9) over $\lambda $.
Step 4 Obtain ${\tilde{l}}(\beta ,\theta )$ defined by (9) based on ${\hat{\xi }}_{i}(\beta ,\theta )$ and ${\hat{\lambda }}$.
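
Steps 3 and 4 can be sketched in Python with NumPy as follows. This is a minimal illustration rather than the authors' implementation. It assumes the usual empirical likelihood dual problem, in which ${\hat{\lambda }}$ is the root of $\sum _{i}{\hat{\xi }}_{i}/(1+\lambda ^{\top }{\hat{\xi }}_{i})=0$, equivalently the maximizer of the concave function $\sum _{i}\log (1+\lambda ^{\top }{\hat{\xi }}_{i})$, and it assumes that Steps 1 and 2 have already produced the auxiliary vectors. The helper name `el_log_ratio` and the synthetic input are hypothetical.

```python
import numpy as np


def el_log_ratio(xi, max_iter=50, tol=1e-8):
    """Return (2 * sum_i log(1 + lam' xi_i), lam) for an (n, d) array xi.

    Row i of xi is the auxiliary vector for observation i, evaluated at
    fixed parameter values. Damped Newton iterations are used for lam.
    """
    xi = np.asarray(xi, dtype=float)
    lam = np.zeros(xi.shape[1])

    def objective(l):
        arg = 1.0 + xi @ l
        if np.any(arg <= 0.0):
            return -np.inf  # outside the domain of the logarithm
        return np.sum(np.log(arg))

    for _ in range(max_iter):
        scaled = xi / (1.0 + xi @ lam)[:, None]
        grad = scaled.sum(axis=0)            # score with respect to lam
        if np.linalg.norm(grad) < tol:
            break
        neg_hess = scaled.T @ scaled         # minus the Hessian, positive definite
        step = np.linalg.solve(neg_hess, grad)
        # Step halving keeps every 1 + lam' xi_i positive and the
        # objective non-decreasing.
        t, current = 1.0, objective(lam)
        while objective(lam + t * step) < current and t > 1e-12:
            t /= 2.0
        lam = lam + t * step

    return 2.0 * objective(lam), lam


# Synthetic stand-in for the auxiliary vectors (not data from the paper).
rng = np.random.default_rng(0)
xi = rng.normal(size=(200, 3))
ratio, lam = el_log_ratio(xi)
```

The step-halving safeguard is a design choice of this sketch: a plain Newton step can leave the region where all $1+\lambda ^{\top }{\hat{\xi }}_{i}$ are positive, in which case the logarithm is undefined.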

Next, we present some Monte Carlo experiments to compare the finite sample performance of EL with the normal approximation based on the doubly robust and efficient (DRE) method of Ma and Zhu (2013). Throughout these simulations, we use the Epanechnikov kernel $K(t)=3/4(1-t^{2})_{+}$ in all the nonparametric regression procedures and use cross validation to choose the optimal bandwidth $h_{opt}$ satisfying Assumption 6. We consider the heteroscedastic partially linear single-index model (1). In our simulations, $(X_{1},\ldots , X_{p}, Z_{1},\ldots , Z_{r})^{\top }$ is generated from a multivariate normal distribution with mean 0 and covariance matrix $(\sigma _{ij})_{(p+r)\times (p+r)}$, where $\sigma _{ij}=0.5^{|i-j|}$, and Y is generated from a normal distribution with mean $X^{\top }\beta +\exp (Z^{\top }\theta )$ and variance $|Z^{\top }\theta |$. Furthermore, $\beta =(1,0.5,-\,1,1,0.5,-\,1, \ldots )^{\top }$ and $\theta =(1, 0.25, -\,0.25, 0.25, 0,\ldots ,0)^{\top }$. We use the dimensions $p=10, 20, 30$, with $r=10$ in each case.
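
The data-generating design described above might be coded along the following lines. This is a sketch under two assumptions: the pattern $(1,0.5,-1)$ in $\beta $ is read as repeating until length p is reached, and the function names `simulate` and `epanechnikov` are illustrative. Bandwidth selection and the estimation steps are not included.

```python
import numpy as np


def epanechnikov(t):
    """Epanechnikov kernel K(t) = 0.75 * (1 - t^2)_+."""
    t = np.asarray(t, dtype=float)
    return 0.75 * np.clip(1.0 - t ** 2, 0.0, None)


def simulate(n, p, r=10, seed=None):
    """Draw one sample (x, z, y) from the simulation design."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p + r)
    sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])  # sigma_ij = 0.5^|i-j|
    xz = rng.multivariate_normal(np.zeros(p + r), sigma, size=n)
    x, z = xz[:, :p], xz[:, p:]
    beta = np.resize([1.0, 0.5, -1.0], p)  # assumed repeating pattern
    theta = np.zeros(r)
    theta[:4] = [1.0, 0.25, -0.25, 0.25]
    index = z @ theta
    mean = x @ beta + np.exp(index)
    sd = np.sqrt(np.abs(index))            # conditional variance is |Z' theta|
    y = mean + sd * rng.normal(size=n)
    return x, z, y, beta, theta


x, z, y, beta, theta = simulate(n=50, p=10, seed=1)
```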

In the first simulation, we consider the confidence interval for $\beta _{1}$ constructed by empirical likelihood; results for the other parameters are similar and are not presented here. For comparison, we also construct the confidence interval for $\beta _{1}$ by using the doubly robust and efficient method of Ma and Zhu (2013). In each case we repeat the simulation 1000 times, and the nominal confidence level $1-\alpha $ is taken to be 0.95. The results are presented in Table 1. The Time column in Table 1 reports the average computing cost of a single simulation run. For example, a value of 2.6 s means that the total computing cost of repeating the simulation 1000 times is $2.6\times 1000=2600$ s. The same convention is used in Tables 2 and 3.
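
The summary statistics reported for each setting can be computed as in the short sketch below, which also gives the binomial Monte Carlo standard error of an estimated coverage. The function names and the toy intervals are hypothetical and are not taken from Table 1.

```python
from math import sqrt


def coverage_and_length(intervals, true_value):
    """Empirical coverage probability (CP) and average length (AL).

    intervals: list of (lower, upper) pairs, one per simulation replicate.
    """
    n_rep = len(intervals)
    covered = sum(lo <= true_value <= hi for lo, hi in intervals)
    cp = covered / n_rep
    al = sum(hi - lo for lo, hi in intervals) / n_rep
    return cp, al


def mc_standard_error(cp, n_rep):
    """Binomial Monte Carlo standard error of an estimated coverage."""
    return sqrt(cp * (1.0 - cp) / n_rep)


# Toy intervals for a coefficient whose true value is 1.
toy = [(0.8, 1.2), (0.9, 1.3), (1.05, 1.4), (0.7, 1.1)]
cp, al = coverage_and_length(toy, true_value=1.0)
se = mc_standard_error(0.95, 1000)
```

With 1000 replicates and a true coverage near 0.95, the Monte Carlo standard error is roughly 0.007, which gives a sense of scale when comparing coverage probabilities across methods.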

From Table 1, we have the following results:

(1)
We can see that the coverage probabilities (CP) for $\beta _{1}$ based on EL and on the doubly robust and efficient (DRE) method increase as the sample size n increases, and the coverage probabilities are close to the nominal level, especially with moderate sample sizes. However, the coverage probabilities based on the empirical likelihood cannot reach the nominal level when the dimension p grows with the sample size n. According to Tsao (2004), the dimension has a non-ignorable effect on the coverage probabilities of empirical likelihood confidence regions. Some methods, such as Bartlett adjustment, may be used to improve the coverage probabilities of empirical likelihood confidence regions/intervals in the high-dimensional situation. How to do so will be investigated in future work.
(2)
The confidence intervals based on the empirical likelihood method consistently have better coverage probabilities than the intervals based on DRE, and the average length (AL) of the intervals based on the empirical likelihood is slightly shorter than that of the intervals based on DRE.
(3)
According to Table 1, the average computing cost of the empirical likelihood method is lower than that of the DRE method. However, computing cost depends heavily on computer configuration, and high-performance computers can reduce it. The results in Table 1 were obtained on an ordinary computer.

Table 1 The coverage probability (CP) and average length (AL) for $\beta _{1}$


In the second simulation, we further consider confidence regions for $(\beta _{1},\beta _{2})$ constructed by EL and DRE, and report the coverage probability of the constructed regions. The results are presented in Table 2. Finally, we consider confidence regions for $(\beta ^{\top },\theta ^{\top })^{\top }$ constructed by EL and, for comparison, by DRE, again reporting the coverage of the constructed regions. In each case we repeat the simulation 1000 times. The results are presented in Table 3. Tables 2 and 3 lead to conclusions similar to those above: EL consistently achieves slightly higher coverage probabilities than DRE.

Table 2 Comparison of coverage probability for $(\beta _{1},\beta _{2})$ between EL method and DRE method


Table 3 Comparison of coverage probability for $(\beta ^{\top },\theta ^{\top })^{\top }$ between EL method and DRE method


5 Real data application

We further illustrate the proposed method by applying the heteroscedastic partially linear single-index model to data from AIDS Clinical Trials Group Protocol 175 (ACTG175), which have been analyzed by Hammer et al. (1996), Davidian et al. (2005) and Lai and Wang (2014). CD4 is a co-receptor that assists the T cell receptor (TCR) in communicating with an antigen-presenting cell, and many HIV clinical trials focus on comparing treatment effects on CD4 count after a specified period. The ACTG175 study includes four antiretroviral regimens [zidovudine (ZDV) monotherapy, ZDV+didanosine (ddI), ZDV+zalcitabine, ddI] and randomized 2139 patients to these regimens in equal proportions. The findings of ACTG175 indicate that antiretroviral therapy could reduce the risk in people with intermediate stage HIV disease and no symptoms.

Table 4 95% confidence intervals based on EL and DRE, and estimators for $(\beta ^{\top },\theta ^{\top })^{\top }$


Here, we build a heteroscedastic partially linear single-index model for the response of subjects receiving ZDV monotherapy. There are 532 patients with CD4 cell counts between 200 and 500 per cubic millimeter. The response Y is the CD4 cell count at $96\pm 5$ weeks (CD496). Because some CD496 values are missing in the ACTG175 data, we consider a subset of 321 subjects whose CD496 is observed. The predictor variables are age, wtkg, hemo, homo, drugs, karnof, race, gender, str2, symptom, CD40, CD420, CD80 and CD820. We assign the discrete variables (hemo, homo, drugs, karnof, race, gender, str2, symptom) to the linear part and the standardized continuous variables (age, wtkg, CD80, CD820, CD40, CD420) to the single-index part. Furthermore, we standardize the response variable CD496, and build the following heteroscedastic partially linear single-index model

$$\begin{aligned} Y=X^{\top }\beta +g\left( Z^{\top }\theta \right) +\varepsilon , \end{aligned}$$

where $X^{\top }=(\text{hemo},\text{homo},\text{drugs},\text{karnof},\text{race},\text{gender},\text{str2},\text{symptom})$, $Z^{\top }=(\text{age},\text{wtkg},\text{CD80},\text{CD820},\text{CD40},\text{CD420})$, $Y=\text{CD496}$ and $g(\cdot )$ is an unknown function. The plot of estimated errors in Lai and Wang (2014) indicates that $\varepsilon $ is heteroscedastic. Using the method proposed in Sect. 2, we obtain the estimators and confidence intervals for the parameters; the results are summarized in Table 4. Table 4 shows that the confidence intervals based on the empirical likelihood method differ somewhat from those based on the doubly robust and efficient method. The doubly robust and efficient method gives wider intervals and imposes symmetry on them. Lai and Wang (2014) analyzed the ACTG175 data by using a variable selection method. They found that the effects of CD40 and CD420 were significant, and the corresponding variables were included in the final fitted model. From Table 4, the empirical likelihood confidence intervals for CD40 and CD420 do not cover zero, whereas the corresponding intervals based on the doubly robust and efficient method do cover zero, which would imply that the effects of CD40 and CD420 are insignificant. In addition, according to the empirical likelihood method, the coefficients of antiretroviral history (str2), symptomatic indicator (symptom) and weight (wtkg) are significantly negative and the coefficient of age is significantly positive. These factors play an important role for the four different antiretroviral regimens. We think that these findings corroborate the results in Lai and Wang (2014) very well.

References

Bai Z, Saranadasa H (1996) Effect of high dimension: by an example of a two sample problem. Stat Sin 6:311–329
Carroll R, Fan J, Gijbels I, Wand M (1997) Generalized partially linear single-index models. J Am Stat Assoc 92:477–489
Chen S, Hall P (1993) Smoothed empirical likelihood confidence intervals for quantiles. Ann Stat 21:1166–1181
Chen S, Peng L, Qin Y (2009) Effects of data dimension on empirical likelihood. Biometrika 96:712–722
Davidian M, Tsiatis A, Leon S (2005) Semiparametric estimation of treatment effect in a pretest-posttest study with missing data. Stat Sci 20:261–301
Donoho D (2000) High-dimensional data analysis: the curses and blessings of dimensionality. Aide-memoire of a lecture at AMS conference on math challenges of the 21st century
Engle R, Granger C, Rice J, Weiss A (1986) Semiparametric estimates of the relation between weather and electricity sales. J Am Stat Assoc 81:310–320
Hall P, Heyde C (1980) Martingale limit theory and its application. Academic Press, New York
Hammer S, Katzenstein D, Hughes M et al (1996) For the AIDS clinical trials group study 175 study team: a trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. New Engl J Med 20:1081–1089
Hjort N, McKeague I, Van Keilegom I (2009) Extending the scope of empirical likelihood. Ann Stat 37:1079–1111
Huber P (1973) Robust regression: asymptotics, conjectures and Monte Carlo. Ann Stat 1:799–821
Kolaczyk E (1994) Empirical likelihood for generalized linear models. Stat Sin 4:199–218
Lai P, Wang Q (2014) Semiparametric efficient estimation for partially linear single-index models with responses missing at random. J Multivar Anal 128:33–50
Ledoit O, Wolf M (2002) Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Ann Stat 30:1081–1102
Li G, Wang Q (2003) Empirical likelihood regression analysis for right censored data. Stat Sin 13:51–68
Lu X (2009) Empirical likelihood for heteroscedastic partially linear models. J Multivar Anal 100:387–395
Lu X, Qi Y (2004) Empirical likelihood for the additive risk model. Probab Math Stat 24:419–431
Ma Y, Zhu L (2013) Doubly robust and efficient estimators for heteroscedastic partially linear single-index models allowing high dimensional covariates. J R Stat Soc Ser B 75:305–322
Ma Y, Chiou J, Wang N (2006) Efficient semiparametric estimator for heteroscedastic partially linear models. Biometrika 93:75–84
Owen A (1988) Empirical likelihood ratio confidence intervals for a single function. Biometrika 75:237–249
Owen A (1990) Empirical likelihood ratio confidence regions. Ann Stat 18:90–120
Owen A (1991) Empirical likelihood for linear models. Ann Stat 19:1725–1747
Owen A (2001) Empirical likelihood. Chapman and Hall, London
Qin J, Lawless J (1994) Empirical likelihood and general estimating equations. Ann Stat 22:300–325
Qin G, Jing B (2001) Empirical likelihood for Cox regression model under random censorship. Commun Stat Simul Comput 30:79–90
Shi J, Lau T (2000) Empirical likelihood for partially linear models. J Multivar Anal 72:132–148
Tsao M (2004) Bounds on coverage probabilities of the empirical likelihood ratio confidence regions. Ann Stat 32:1215–1221
Wang Q, Rao J (2002) Empirical likelihood-based inference in linear errors-in-covariables models with validation data. Biometrika 89:345–358
Xia Y, Härdle W (2006) Semi-parametric estimation of partially linear single-index models. J Multivar Anal 97:1162–1184
Xia Y, Tong H, Li W (1999) On extended partially linear single-index models. Biometrika 86:831–842
Xia Y, Tong H, Li W, Zhu L (2002) An adaptive estimation of dimension reduction space. J R Stat Soc Ser B 64:363–410
Xue L, Zhu L (2006) Empirical likelihood for single-index models. J Multivar Anal 97:1295–1312
Yu Y, Ruppert D (2002) Penalized spline estimation for partially linear single-index models. J Am Stat Assoc 97:1042–1054
Zhang J, Wang T, Zhu L, Liang H (2012) A dimension reduction based approach for estimation and variable selection in partially linear single-index models with high-dimensional covariates. Electron J Stat 6:2235–2273
Zhu L, Xue L (2006) Empirical likelihood confidence regions in a partially linear single-index model. J R Stat Soc Ser B 68:549–570


Acknowledgements

We are grateful to the editor, the associate editor and the referees for their insightful comments and suggestions which led to an improved presentation of the article. Fang’s research is supported by Scientific Research Fund of Hunan Provincial Education Department (17C0392). Liu and Lu’s research is supported by Open Fund of Innovation Platform in Hunan Province Colleges and Universities (13k030), and the Construct Program of the Key Discipline in Hunan Province. Lu’s work is partially supported by Discovery Grants (RG/PIN261567-2013) from National Science and Engineering Council (NSERC) of Canada.

Author information

Authors and Affiliations

College of Science, Hunan Institute of Engineering, Xiangtan, 411104, Hunan, China
Jianglin Fang
College of Mathematics and Computer Science, Hunan Normal University, Changsha, 410081, China
Wanrong Liu
Department of Mathematics and Statistics, University of Calgary, Calgary, AB, T2N 1N4, Canada
Xuewen Lu

Corresponding author

Correspondence to Jianglin Fang.

Appendix

To prove the main theorems, we need to give the following set of conditions.

Assumption 1

Let $\mathrm{Var}(X_{i})=\Sigma _{xi}$ and $\mathrm{Var}(Z_{i})=\Sigma _{zi}$. The eigenvalues of $\Sigma _{xi}$ and $\Sigma _{zi}$ satisfy $C_{1}\le \gamma _{1}(\Sigma _{xi})\le \cdots \le \gamma _{p}(\Sigma _{xi})\le C_{2}$ and $C_{1}\le \gamma _{1}(\Sigma _{zi})\le \cdots \le \gamma _{r}(\Sigma _{zi})\le C_{2}$ for some constants $0<C_{1}<C_{2}$ and $i=1,\ldots ,n$. There is a constant $\delta >0$ such that $E(\varepsilon ^{4+\delta }|X,Z)<\infty $.

Assumption 2

There are $v(\cdot )$ and $\eta =\eta (X,Z)$ such that $E(\varepsilon ^{2}|X,Z)=v(\eta )$ and $0<C_{1}<v(\cdot )<C_{2}<\infty $ for some constants $C_{1}<C_{2}$, and the eigenvalues of $\mathrm{Var}(X_{i}|\eta (X_{i}, Z_{i}))$ are bounded away from zero and infinity.

Assumption 3

There exists $v_{1}(X,Z)$ such that

$$\begin{aligned}&\left| \frac{\partial ^{2}E(X|Z^{\top }\theta )}{\partial \theta _{i}\partial \theta _{j}}\right| , \left| \frac{\partial ^{2}E(Z|Z^{\top }\theta )}{\partial \theta _{i}\partial \theta _{j}}\right| , \left| \frac{\partial ^{2}E(w|Z^{\top }\theta )}{\partial \theta _{i}\partial \theta _{j}}\right| , \left| \frac{\partial ^{2}E(wZ|Z^{\top }\theta )}{\partial \theta _{i}\partial \theta _{j}}\right| ,\\&\left| \frac{\partial ^{2}E(wX|Z^{\top }\theta )}{\partial \theta _{i}\partial \theta _{j}}\right|<v_{1}(X,Z),Ev_{1}^{2}<\infty , (i,j=2,\ldots , r). \end{aligned}$$

Further there exists $v_{2}(X,Z)$ such that

$$\begin{aligned} \left| \frac{\partial ^{3}\eta (X,Z)}{\partial \gamma _{i}\partial \gamma _{j}\partial \gamma _{l}}\right|<v_{2}(X,Z), Ev_{2}^{2}<\infty , (i,j,l=1,\ldots , p+r), \end{aligned}$$

where $(X^{\top },Z^{\top })^{\top }=(\gamma _{1},\ldots ,\gamma _{p+r})^{\top }$. Further there exists $v_{3}(X,Z)$ such that

$$\begin{aligned} \left| \frac{\partial ^{4}g(Z^{\top }\theta )}{\partial \theta _{i}\partial \theta _{j}\partial \theta _{k}\partial \theta _{l}}\right| , \left| \frac{\partial ^{4}v(\eta )}{\partial \eta _{i_{1}}\partial \eta _{j_{1}}\partial \eta _{k_{1}}\partial \eta _{l_{1}}}\right|<v_{3}(X,Z), Ev_{3}^{2}<\infty , \end{aligned}$$

where the dimension of $\eta $ is $p_{1}$, and $i,j,k,l=2,\ldots ,r$, $i_{1},j_{1},k_{1},l_{1}=1,\ldots , p_{1}$.

Assumption 4

Assume that the random variables $\eta $ and $Z^{\top }\theta $ have densities $f_{\eta }(\eta )$ and $f_{Z^{\top }\theta }(Z^{\top }\theta )$, satisfying $0<\inf f_{\eta }(\eta )\le \sup f_{\eta }(\eta )<\infty $ and $0<\inf f_{Z^{\top }\theta }(Z^{\top }\theta )\le \sup f_{Z^{\top }\theta }(Z^{\top }\theta )<\infty $. Further there exists $v_{4}(X,Z)$ such that

$$\begin{aligned}&\left| \frac{\partial ^{2}f_{Z^{\top }\theta }(Z^{\top }\theta )}{\partial \theta _{i}\partial \theta _{j}}\right| , \left| \frac{\partial ^{2}f_{\eta }(\eta )}{\partial \eta _{k}\partial \eta _{l}}\right| \\&\quad<v_{4}(X,Z), Ev_{4}^{2}<\infty , (i,j=2,\ldots , r;~ k,l=1,\ldots , p_{1}). \end{aligned}$$

Assumption 5

The kernel function $K_{h}(\cdot )$ is symmetric and its derivative is continuous with compact support contained in $[-1,1]$.

Assumption 6

The bandwidths $h_{i}$ satisfy $\log ^{2}(n)/(nh_{i})\rightarrow 0$ for $i=1,2,3$. In addition, $nh_{1}^{4}\rightarrow \infty $, $nh_{1}^{8}\rightarrow 0$, $h_{1}^{4}\log ^{2}(n)/h_{i}\rightarrow 0$ and $\log ^{4}(n)/(nh_{1}h_{i})\rightarrow 0$ for $i=1,2,3$, $h_{2}=O(n^{-1/5})$ and $h_{3}=O(n^{-1/5})$.
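
As a quick check that these rate conditions are compatible, consider the illustrative choice $h_{1}=h_{2}=h_{3}=n^{-1/5}$, which is assumed here only for the sake of the example. Then, for $i=1,2,3$,

$$\begin{aligned} nh_{1}^{4}&=n^{1/5}\rightarrow \infty ,\qquad nh_{1}^{8}=n^{-3/5}\rightarrow 0,\qquad \frac{\log ^{2}(n)}{nh_{i}}=\frac{\log ^{2}(n)}{n^{4/5}}\rightarrow 0,\\ \frac{h_{1}^{4}\log ^{2}(n)}{h_{i}}&=n^{-3/5}\log ^{2}(n)\rightarrow 0,\qquad \frac{\log ^{4}(n)}{nh_{1}h_{i}}=\frac{\log ^{4}(n)}{n^{3/5}}\rightarrow 0, \end{aligned}$$

so every condition holds. More generally, if $h_{1}=n^{-a}$, the two conditions $nh_{1}^{4}\rightarrow \infty $ and $nh_{1}^{8}\rightarrow 0$ together require $1/8<a<1/4$.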

Assumption 7

$p,r\rightarrow \infty $, $pn^{-1/5}\rightarrow 0$, $rn^{-1/5}\rightarrow 0$, as $n\rightarrow \infty $.

Assumption 8

$E\Vert X\Vert ^{4}<\infty $, $E\Vert Z\Vert ^{4}<\infty $, $E\Vert \varepsilon X\Vert ^{4}<\infty $, $E\Vert \varepsilon Z\Vert ^{4}<\infty $ and $E|\varepsilon |^{4}<\infty $.

Assumption 9

Let

$$\begin{aligned} \xi _{n}(\beta ,\theta )=w\varepsilon \left[ X^{\top }-\frac{E(wX^{\top }|Z^{\top }\theta )}{E(w|Z^{\top }\theta )}, g'(Z^{\top }\theta )\left\{ Z^{\top }-\frac{E(wZ^{\top }|Z^{\top }\theta )}{E(w|Z^{\top }\theta )}\right\} \right] ^{\top }, \end{aligned}$$

and $\xi _{nl}(\beta ,\theta )$ be the l-th component of $\xi _{n}(\beta ,\theta )$, $l=1,\ldots ,p, p+2,\ldots ,p+r$. As $n\rightarrow \infty $, there is a positive constant C such that, $E(\Vert \xi _{n}(\beta ,\theta )/\sqrt{p}\Vert ^{4})<C$, $E(\Vert XX^{\top }\Vert ^{4})<C$, $E(\Vert XZ^{\top }\Vert ^{4})<\infty $ and $E(\Vert ZX^{\top }\Vert ^{4})<C$.

Assumptions 1–6 ensure that the functions $g(Z_{i}^{\top }\theta )$, $g'(Z_{i}^{\top }\theta )$, $w(X_{i},Z_{i})$, $E\{{\hat{w}}(X,Z)|Z_{i}^{\top }\theta \}$, $E\{{\hat{w}}(X,Z)X|Z_{i}^{\top }\theta \}$ and $E\{{\hat{w}}(X,Z)Z_{-1}| Z_{i}^{\top }\theta \}$ are estimated with sufficient precision, so that the nonparametric estimation does not affect the asymptotic behavior of the estimated empirical likelihood ratio; that is, the estimated empirical likelihood ratio ${\tilde{L}}(\beta ,\theta )$ has the same asymptotic distribution as the ordinary empirical likelihood ratio $L(\beta ,\theta )$. Furthermore, Assumptions 1–6 ensure the existence of the estimator $({\hat{\beta }}^{\top },{\hat{\theta }}^{\top })^{\top }$ for the parameters $(\beta ^{\top },\theta ^{\top })^{\top }$. Assumption 7 is a technical condition, and Assumption 8 ensures that there exists an asymptotic variance for the estimator of the growing parameters $(\beta ^{\top },\theta ^{\top })^{\top }$. Assumption 9 controls the tail probability behavior of the estimating equation. Because establishing asymptotic results for the empirical likelihood approach with diverging covariate dimension is very challenging, these conditions are not the weakest possible and the bounds in the stochastic analysis are conservative. As in Ma and Zhu (2013), these strong conditions facilitate the technical derivations.

Let ${\tilde{l}}(\lambda ,\beta ,\theta )=n^{-1}\sum _{i=1}^{n}\log \left\{ 1+\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta )\right\} $, $\bar{{\hat{\xi }}}(\beta ,\theta )=n^{-1}\sum _{i=1}^{n}{\hat{\xi }}_{i}(\beta ,\theta )$ and $a_{n}=O_{p}\{(p/n)^{1/2}\}$, and let C denote a generic positive constant that may differ from one use to another throughout the “Appendix”. In addition, we use the Frobenius norm of a matrix A, defined as $\Vert A\Vert =\{\mathrm {tr}(A^{\top }A)\}^{\frac{1}{2}}$, where $\mathrm {tr}(A)$ denotes the trace of the matrix A.

Proof of Theorem 2.1

Proof

We first expand

$$\begin{aligned} 0= & {} \frac{1}{\sqrt{n}}A_{2}\sum \limits _{i=1}^{n}\left\{ Y_{i}-X_{i}^{\top }{\hat{\beta }}-{\hat{g}}\left( Z_{i}^{\top }{\hat{\theta }}\right) \right\} {\hat{w}}_{i}{\hat{g}}' \left( Z_{i}^{\top }{\hat{\theta }}\right) \left\{ Z_{i}-\frac{{\hat{E}}\left( {\hat{w}}Z|Z_{i}^{\top }{\hat{\theta }}\right) }{{\hat{E}}\left( {\hat{w}}|Z_{i}^{\top }{\hat{\theta }}\right) }\right\} \nonumber \\= & {} \frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}{\hat{w}}_{i}{\hat{g}}'\left( Z_{i}^{\top }{\hat{\theta }}\right) A_{2}\left\{ Z_{i}-\frac{E\left( wZ|Z_{i}^{\top }\theta _{0}\right) }{E\left( w|Z_{i}^{\top } \theta _{0}\right) }\right\} X_{i}^{\top }\left( \beta _{0}-{\hat{\beta }}\right) \nonumber \\&+\,\frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}{\hat{w}}_{i}{\hat{g}}'\left( Z_{i}^{\top }{\hat{\theta }}\right) A_{2}\left\{ \frac{E\left( wZ|Z_{i}^{\top }\theta _{0}\right) }{E\left( w|Z_{i}^{\top }\theta _{0} \right) }-\frac{{\hat{E}}\left( {\hat{w}}Z|Z_{i}^{\top }{\hat{\theta }}\right) }{{\hat{E}}\left( {\hat{w}}|Z_{i}^{\top }{\hat{\theta }}\right) }\right\} X_{i}^{\top }\left( \beta _{0}-{\hat{\beta }}\right) \nonumber \\&+\,\frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}\left\{ g\left( Z_{i}^{\top }\theta _{0}\right) -{\hat{g}}\left( Z_{i}^{\top }\theta _{0}\right) \right\} {\hat{w}}_{i}{\hat{g}}'\left( Z_{i}^{\top }{\hat{\theta }}\right) A_{2}\left\{ Z_{i}-\frac{E\left( wZ|Z_{i}^{\top }\theta _{0}\right) }{E\left( w|Z_{i}^{\top }\theta _{0}\right) }\right\} \nonumber \\&+\,\frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}\left\{ g\left( Z_{i}^{\top }\theta _{0}\right) -{\hat{g}}\left( Z_{i}^{\top }\theta _{0}\right) \right\} {\hat{w}}_{i}{\hat{g}}'\left( Z_{i}^{\top }{\hat{\theta }}\right) A_{2}\left\{ \frac{E\left( wZ|Z_{i}^{\top }\theta _{0}\right) }{E\left( w|Z_{i}^{\top }\theta _{0}\right) }\right. \nonumber \\&\left. -\,\frac{{\hat{E}}\left( {\hat{w}}Z|Z_{i}^{\top }{\hat{\theta }}\right) }{{\hat{E}}\left( {\hat{w}}|Z_{i}^{\top }{\hat{\theta }}\right) }\right\} \nonumber \\&+\,\frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}\left\{ {\hat{g}}\left( Z_{i}^{\top }\theta _{0}\right) -{\hat{g}}\left( Z_{i}^{\top }{\hat{\theta }}\right) \right\} {\hat{w}}_{i}{\hat{g}}'\left( Z_{i}^{\top }{\hat{\theta }}\right) A_{2}\left\{ Z_{i}-\frac{{\hat{E}}\left( {\hat{w}}Z|Z_{i}^{\top }{\hat{\theta }}\right) }{{\hat{E}}\left( {\hat{w}}|Z_{i}^{\top }{\hat{\theta }}\right) }\right\} \nonumber \\&+\,\frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}\varepsilon _{i}{\hat{w}}_{i}{\hat{g}}'\left( Z_{i}^{\top }{\hat{\theta }}\right) A_{2}\left\{ Z_{i}-\frac{{\hat{E}}\left( {\hat{w}}Z|Z_{i}^{\top } {\hat{\theta }}\right) }{{\hat{E}}\left( {\hat{w}}|Z_{i}^{\top }{\hat{\theta }}\right) }\right\} . \end{aligned}$$

(17)

Similar to the proof of Proposition 2 in Ma and Zhu (2013), we can obtain from the second equation in (3) that

$$\begin{aligned}&A_{2}E\left[ wg'\left( Z^{\top }\theta _{0}\right) \left\{ Z-\frac{E\left( wZ|Z^{\top }\theta _{0}\right) }{E\left( w|Z^{\top }\theta _{0}\right) }\right\} X^{\top }\right] \sqrt{n}\left( {\hat{\beta }}-\beta _{0}\right) \nonumber \\&\qquad +A_{2}E\left[ w\{g'\left( Z^{\top }\theta _{0}\right) \}^{2}\left\{ Z-\frac{E\left( wZ|Z^{\top }\theta _{0}\right) }{E\left( w|Z^{\top }\theta _{0}\right) }\right\} Z^{\top }\right] \sqrt{n}\left( {\hat{\theta }}- \theta _{0}\right) \nonumber \\&\quad =\frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}\varepsilon _{i}w_{i}g'\left( Z_{i}^{\top }\theta _{0}\right) A_{2}\left\{ Z_{i}-\frac{E\left( wZ|Z_{i}^{\top }\theta _{0}\right) }{E\left( w|Z_{i}^{\top } \theta _{0}\right) }\right\} +o_{p}\left( 1\right) . \end{aligned}$$

(18)

Similarly, from the first equation in (3), we have that

$$\begin{aligned}&A_{1}E\left[ w\left\{ X-\frac{E\left( wX|Z^{\top }\theta _{0}\right) }{E\left( w|Z^{\top }\theta _{0}\right) }\right\} X^{\top }\right] \sqrt{n}\left( {\hat{\beta }}-\beta _{0}\right) \nonumber \\&\qquad +A_{1}E\left[ wg'\left( Z^{\top }\theta _{0}\right) \left\{ X-\frac{E\left( wX|Z^{\top }\theta _{0}\right) }{E\left( w|Z^{\top }\theta _{0}\right) }\right\} Z^{\top }\right] \sqrt{n}\left( {\hat{\theta }}-\theta _{0}\right) \nonumber \\&\quad =\frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}\varepsilon _{i}w_{i}A_{1}\left\{ X_{i}-\frac{E\left( wX|Z_{i}^{\top }\theta _{0}\right) }{E\left( w|Z_{i}^{\top }\theta _{0}\right) }\right\} +o_{p}\left( 1\right) . \end{aligned}$$

(19)

Combining (18) and (19) implies that

$$\begin{aligned} \sqrt{n}AV^{1/2}\left( { \begin{array}{*{10}c} {\hat{\beta }}-\beta _{0}\\ {\hat{\theta }}-\theta _{0}\\ \end{array}} \right) {=}AV^{-1/2}\left( { \begin{array}{*{10}c} \frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}\varepsilon _{i}w_{i}\left\{ X_{i}-\frac{E\left( wX|Z_{i}^{\top }\theta _{0}\right) }{E\left( w|Z_{i}^{\top }\theta _{0}\right) }\right\} \\ \frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}\varepsilon _{i}w_{i}g'\left( Z_{i}^{\top }\theta _{0}\right) \left\{ Z_{i}-\frac{E\left( wZ|Z_{i}^{\top }\theta _{0}\right) }{E\left( w|Z_{i}^{\top }\theta _{0} \right) }\right\} \\ \end{array}}\right) {+}o_{p}(1). \end{aligned}$$

Applying the Lindeberg–Feller central limit theorem, we can establish

$$\begin{aligned} \sqrt{n}AV^{1/2}\left\{ \left( {\hat{\beta }}^{\top },{\hat{\theta }}^{\top }\right) ^{\top }-\left( \beta _{0}^{\top },\theta _{0}^{\top }\right) ^{\top }\right\} \rightarrow N(0,G) \end{aligned}$$

in distribution, and the proof of Theorem 2.1 is completed. $\square $

Next, we present the following lemmas before proving Theorem 2.2.

Lemma 5.1

Under the assumptions of Theorem 2.2, $\max _{1\le i \le n}\Vert {\hat{\xi }}_{i}(\beta ,\theta )\Vert =o_{p}(n^{1/4}\sqrt{p})$ and $\max _{1\le i \le n}|\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta )|=o_{p}(1)$ for all $\lambda =O_{p}(a_{n})$.

Proof

From Assumptions 8 and 9, for any $\epsilon >0$,

$$\begin{aligned} P\left\{ \max _{1\le i \le n}\Vert \xi _{i}\left( \beta ,\theta \right) \Vert \ge n^{1/4}\sqrt{p}\epsilon \right\}\le & {} \sum _{i=1}^{n}P\left\{ \Vert \xi _{i}\left( \beta ,\theta \right) \Vert \ge n^{1/4}\sqrt{p}\epsilon \right\} \nonumber \\\le & {} \frac{1}{np^{2}\epsilon ^{4}}\sum _{i=1}^{n}E\Vert \xi _{i}\left( \beta ,\theta \right) \Vert ^{4}\nonumber \\= & {} \frac{1}{\epsilon ^{4}}E\Vert \xi _{1}\left( \beta ,\theta \right) /\sqrt{p}\Vert ^{4}. \end{aligned}$$

(20)

By the Cauchy–Schwarz inequality, $\Vert \xi _{1}(\beta ,\theta )/\sqrt{p}\Vert ^{4}\le 1/p\sum _{l=1}^{p+r}|\xi _{1l}(\beta ,\theta )|^{4}$, where $\xi _{1l}(\beta ,\theta )$ is the $l$th component of $\xi _{1}(\beta ,\theta )$. According to (20), we have

$$\begin{aligned} \max _{1\le i \le n}\Vert \xi _{i}(\beta ,\theta )\Vert =o_{p}\left( n^{1/4}\sqrt{p}\right) . \end{aligned}$$

Similar to the proof of (17) and (18) above, it is easy to check that

$$\begin{aligned} \Vert {\hat{\xi }}_{i}(\beta ,\theta )\Vert =\Vert \xi _{i}(\beta ,\theta )\Vert +O_{p}(p). \end{aligned}$$

Then, by Assumption 7, we have

$$\begin{aligned} \Vert {\hat{\xi }}_{i}\left( \beta ,\theta \right) \Vert =o_{p}\left( n^{1/4}\sqrt{p}\right) +O_{p}\left( p\right) =o_{p}\left( n^{1/4}\sqrt{p}\right) , \end{aligned}$$

and for all $\lambda =O_{p}(a_{n})$,

$$\begin{aligned} \max _{1\le i \le n}|\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta )|=o_{p}(1). \end{aligned}$$

The proof of Lemma 5.1 is completed. $\square $
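As a rough numerical illustration of the first claim of Lemma 5.1, the sketch below draws independent standard Gaussian vectors as stand-ins for $\xi _{i}(\beta ,\theta )$ and reports $\max _{1\le i\le n}\Vert \xi _{i}\Vert /(n^{1/4}\sqrt{p})$ for increasing $n$. The Gaussian design, the fixed dimension $p=10$ and the function name `max_norm_ratio` are assumptions made purely for illustration; the quantities in the paper are score-type terms with $p$ allowed to grow.

```python
import numpy as np

def max_norm_ratio(n, p, rng):
    """Return max_i ||xi_i|| / (n^{1/4} sqrt(p)) for n Gaussian vectors in R^p."""
    xi = rng.standard_normal((n, p))        # hypothetical stand-in for xi_i
    max_norm = np.linalg.norm(xi, axis=1).max()
    return max_norm / (n ** 0.25 * np.sqrt(p))

rng = np.random.default_rng(0)
p = 10
sample_sizes = [100, 1_000, 10_000, 100_000]
ratios = [max_norm_ratio(n, p, rng) for n in sample_sizes]
for n, ratio in zip(sample_sizes, ratios):
    print(f"n={n:>6d}  ratio={ratio:.3f}")
```

Under these assumptions the ratio should shrink as $n$ grows, which is what the $o_{p}(n^{1/4}\sqrt{p})$ rate describes.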

Lemma 5.2

Under the assumptions of Theorem 2.2, $\Vert S_{n}-V\Vert =O_{p}(p/\sqrt{n})$, where $S_{n}=1/n\sum _{i=1}^{n}{\hat{\xi }}_{i}(\beta ,\theta ){\hat{\xi }}_{i}(\beta ,\theta )^{\top }$.

Proof

Similar to the proof of Lemma 5.4 in Chen et al. (2009), we have $tr\{(S_{n}-V)^{\otimes 2}\}=O_{p}(p^{2}/n)$. Therefore, by the definition of Frobenius norm, $\Vert S_{n}-V\Vert =\{tr[(S_{n}-V)^{\top }(S_{n}-V)]\}^{1/2}=O_{p}(p/\sqrt{n})$. $\square $
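The rate in Lemma 5.2 can be checked numerically in a simplified setting. The following sketch assumes i.i.d. standard Gaussian vectors with $V=I_{p}$ in place of the paper's ${\hat{\xi }}_{i}(\beta ,\theta )$ (an illustrative assumption, as is the helper name `frobenius_gap`) and rescales the Frobenius norm $\Vert S_{n}-V\Vert $ by $p/\sqrt{n}$.

```python
import numpy as np

def frobenius_gap(n, p, rng):
    """Return ||S_n - V||_F for n standard Gaussian vectors in R^p, where V = I_p."""
    xi = rng.standard_normal((n, p))        # hypothetical stand-in for xi_hat_i
    s_n = xi.T @ xi / n                     # S_n = (1/n) sum_i xi_i xi_i^T
    return np.linalg.norm(s_n - np.eye(p), ord="fro")

rng = np.random.default_rng(1)
settings = [(500, 5), (2_000, 10), (8_000, 20)]
scaled = [frobenius_gap(n, p, rng) / (p / np.sqrt(n)) for n, p in settings]
for (n, p), value in zip(settings, scaled):
    print(f"n={n:>5d}, p={p:>2d}: ||S_n - V||_F / (p / sqrt(n)) = {value:.3f}")
```

In this Gaussian case $E\Vert S_{n}-I_{p}\Vert ^{2}=p(p+1)/n$, so the rescaled values are expected to stay of order one as $n$ and $p$ grow together.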

Lemma 5.3

Under the assumptions of Theorem 2.2, $\Vert \lambda \Vert =O_{p}(a_{n})$, where $\lambda $ is the root of (8).

Proof

According to (8), $\lambda \in {\mathbb {R}}^{p+r}$ satisfies

$$\begin{aligned} 0=\frac{1}{n}\sum _{i=1}^{n}\frac{{\hat{\xi }}_{i}(\beta ,\theta )}{1+\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta )}=:\psi (\lambda ). \end{aligned}$$

Let $\lambda =\rho \alpha $, where $\rho \ge 0$, $\alpha \in {\mathbb {R}}^{p+r}$ and $\Vert \alpha \Vert =1$. Substituting $1/(1+\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta ))=1-\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta )/(1+\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta ))$ into $\alpha ^{\top }\psi (\lambda )=0$, we have

$$\begin{aligned} |\alpha ^{\top }\bar{{\hat{\xi }}}_{i}(\beta ,\theta )|\ge \frac{\rho }{1+\rho \max \limits _{1\le i \le n}\Vert \xi _{i}(\beta ,\theta )\Vert }\alpha ^{\top }S_{n}\alpha , \end{aligned}$$

where $S_{n}=\frac{1}{n}\sum \limits _{i=1}^{n}{\hat{\xi }}_{i}(\beta ,\theta ){\hat{\xi }}_{i}(\beta ,\theta )^{\top }$. Because of

$$\begin{aligned} 0< 1+\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta )\le 1+\rho \max \limits _{1\le i \le n}\Vert \xi _{i}(\beta ,\theta )\Vert , \end{aligned}$$

we have

$$\begin{aligned} \rho \left[ \alpha ^{\top }S_{n}\alpha -\left| \alpha ^{\top }\bar{{\hat{\xi }}}_{i}(\beta ,\theta )\right| \max \limits _{1\le i \le n}\Vert \xi _{i}(\beta ,\theta )\Vert \right] \le \left| \alpha ^{\top }\bar{{\hat{\xi }}}_{i}(\beta ,\theta )\right| . \end{aligned}$$

(21)

Because $|\alpha ^{\top }\bar{{\hat{\xi }}}_{i}(\beta ,\theta )|\le \Vert \bar{{\hat{\xi }}}_{i}(\beta ,\theta )\Vert =O_{p}(\sqrt{p/n})$, it follows from Lemma 5.1 that

$$\begin{aligned} \max \limits _{1\le i \le n}\Vert \xi _{i}(\beta ,\theta )\Vert \left| \alpha ^{\top }\bar{{\hat{\xi }}}_{i}(\beta ,\theta )\right| =o_{p}(1). \end{aligned}$$

(22)

By combining (21) and (22), we have

$$\begin{aligned} |\rho [\alpha ^{\top }S_{n}\alpha +o_{p}(1)]|=O_{p}(\sqrt{p/n}). \end{aligned}$$

According to Lemma 5.2, for a constant $C_{1}>0$, $P(\alpha ^{\top }S_{n}\alpha \ge \frac{1}{2}C_{1})\rightarrow 1$ as $n\rightarrow \infty $. Hence, $\rho =O_{p}(\sqrt{p/n})$, that is $\Vert \lambda \Vert =\rho =O_{p}(\sqrt{p/n})$, and the proof of Lemma 5.3 is completed. $\square $
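For completeness, the passage from $\alpha ^{\top }\psi (\lambda )=0$ to the lower bound on the sample mean used in the proof of Lemma 5.3 can be spelled out as follows. This is a sketch along the lines of the standard argument in Owen (1990); here the sample mean $n^{-1}\sum _{i=1}^{n}{\hat{\xi }}_{i}(\beta ,\theta )$ is written as $\bar{{\hat{\xi }}}$ without a subscript, the arguments $(\beta ,\theta )$ are suppressed, and the maximum is taken over the estimated ${\hat{\xi }}_{i}$.

$$\begin{aligned} 0=\alpha ^{\top }\psi (\rho \alpha )&=\alpha ^{\top }\bar{{\hat{\xi }}}-\rho \,\frac{1}{n}\sum _{i=1}^{n}\frac{\left( \alpha ^{\top }{\hat{\xi }}_{i}\right) ^{2}}{1+\rho \alpha ^{\top }{\hat{\xi }}_{i}},\\ \alpha ^{\top }\bar{{\hat{\xi }}}&=\rho \,\frac{1}{n}\sum _{i=1}^{n}\frac{\left( \alpha ^{\top }{\hat{\xi }}_{i}\right) ^{2}}{1+\rho \alpha ^{\top }{\hat{\xi }}_{i}}\ge \frac{\rho \,\alpha ^{\top }S_{n}\alpha }{1+\rho \max \limits _{1\le i\le n}\Vert {\hat{\xi }}_{i}\Vert }. \end{aligned}$$

The inequality uses $0<1+\rho \alpha ^{\top }{\hat{\xi }}_{i}\le 1+\rho \max _{1\le i\le n}\Vert {\hat{\xi }}_{i}\Vert $, which holds because $|\alpha ^{\top }{\hat{\xi }}_{i}|\le \Vert {\hat{\xi }}_{i}\Vert $ for a unit vector $\alpha $. Multiplying through by the denominator and collecting the terms in $\rho $ gives (21).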

Lemma 5.4

Under the assumptions of Theorem 2.2, as $n\rightarrow \infty $,

$$\begin{aligned}&\left\{ 2\left( p+r-1\right) \right\} ^{-1/2}\left\{ \left( \frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right) ^{\top }V^{-1}\left( \frac{1}{\sqrt{n}}\sum _{i=1}^{n} {\hat{\xi }}_{i}\left( \beta ,\theta \right) \right) \right. \\&\quad \left. -\left( p+r-1\right) \right\} {\mathop {\rightarrow }\limits ^{L}} N\left( 0,1\right) \!. \end{aligned}$$

Proof

The proof entails applying the martingale central limit theorem as given in Hall and Heyde (1980), and is omitted. $\square $
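The centering and scaling in Lemma 5.4 mirror the first two moments of a chi-square variable: a $\chi _{q}^{2}$ variable has mean $q$ and variance $2q$, so $(\chi _{q}^{2}-q)/\sqrt{2q}$ has mean 0 and variance 1 and is close to $N(0,1)$ for large $q$. A minimal simulation sketch follows, assuming i.i.d. standard Gaussian vectors with $V=I_{q}$ in place of the paper's ${\hat{\xi }}_{i}(\beta ,\theta )$ and writing $q$ for the dimension; both choices, and the function name, are illustrative assumptions rather than the paper's setting.

```python
import numpy as np

def standardized_quadratic_forms(n, q, n_rep, rng):
    """Simulate (Q - q) / sqrt(2q), where Q = ||n^{-1/2} sum_i xi_i||^2 and V = I_q."""
    stats = np.empty(n_rep)
    for b in range(n_rep):
        xi = rng.standard_normal((n, q))        # hypothetical stand-in for xi_hat_i
        root_n_mean = xi.sum(axis=0) / np.sqrt(n)
        stats[b] = (root_n_mean @ root_n_mean - q) / np.sqrt(2 * q)
    return stats

rng = np.random.default_rng(2)
stats = standardized_quadratic_forms(n=200, q=50, n_rep=2_000, rng=rng)
print(f"mean = {stats.mean():.3f}, variance = {stats.var():.3f}")
```

With Gaussian inputs the quadratic form is exactly $\chi _{q}^{2}$, so the printed mean and variance should be close to 0 and 1.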

Lemma 5.5

Under the assumptions of Theorem 2.2,

$$\begin{aligned} \left\{ \frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right\} ^{\top }\left( S_{n}^{-1}-V^{-1}\right) \left\{ \frac{1}{\sqrt{n}}\sum _{i=1}^{n} {\hat{\xi }}_{i}\left( \beta ,\theta \right) \right\} =o_{p}\left( \sqrt{p}\right) . \end{aligned}$$

Proof

Let $D_{n}=V^{-1/2}S_{n}V^{-1/2}-I_{p+r}$, where $I_{p+r}$ is the $(p+r)$-dimensional identity matrix. Then

$$\begin{aligned} S_{n}^{-1}-V^{-1}= & {} V^{-1/2}\left( V^{1/2}S_{n}^{-1}V^{1/2}-I_{p+r}\right) V^{-1/2}\\= & {} V^{-1/2}\left\{ -D_{n}+D_{n}^{2}+D_{n}^{2}\left( V^{1/2}S_{n}^{-1}V^{1/2}-I_{p+r}\right) \right\} V^{-1/2}. \end{aligned}$$

It is easy to check that

$$\begin{aligned} tr\left\{ \left( S_{n}-V\right) ^{2}\right\}= & {} tr\left[ \left\{ V^{1/2}\left( V^{-1/2}S_{n}V^{-1/2}-I_{p+r}\right) V^{1/2}\right\} ^{2}\right] \\= & {} tr\left( D_{n}VD_{n}V\right) \ge \gamma _{1}^{2}\left( V\right) tr\left( D_{n}^{2}\right) , \end{aligned}$$

where $\gamma _{1}(V)$ is the smallest eigenvalue of V. Similar to the proof of Lemma 5.4 in Chen et al. (2009), we have

$$\begin{aligned} tr\left( D_{n}^{2}\right) \le tr\left\{ \left( S_{n}-V\right) ^{2}\right\} =O_{p}\left( p^{2}/n\right) . \end{aligned}$$

Then

$$\begin{aligned} tr\left( S_{n}^{-1}-V^{-1}\right) ^{2}\le & {} 2tr\left\{ V^{-2}\left( -D_{n}+D_{n}^{2}\right) ^{2}\right\} +2tr\left\{ D_{n}^{4}\left( S_{n}^{-1}-V^{-1}\right) ^{2}\right\} \\\le & {} 2tr\left\{ V^{-2}\left( -D_{n}+D_{n}^{2}\right) ^{2}\right\} \\&+2\left\{ tr\left( D_{n}^{2}\right) \right\} ^{2}tr\left\{ \left( S_{n}^{-1}-V^{-1}\right) ^{2}\right\} \\= & {} 2tr\left\{ V^{-2}\left( -D_{n}+D_{n}^{2}\right) ^{2}\right\} +o_{p}\left( tr\left\{ \left( S_{n}^{-1}-V^{-1}\right) ^{2}\right\} \right) \\= & {} o_{p}\left( p^{2}/n\right) . \end{aligned}$$

Because $\Vert \frac{1}{n}\sum _{i=1}^{n}{\hat{\xi }}_{i}(\beta ,\theta )\Vert =O_{p}(\sqrt{p/n})$, we can obtain

$$\begin{aligned}&\left\{ \frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right\} ^{\top }\left( S_{n}^{-1}-V^{-1}\right) \left\{ \frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right\} \\&\quad \le n\Vert \frac{1}{n}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) \Vert ^{2}\sqrt{tr\left( S_{n}^{-1}-V^{-1}\right) ^{2}}=o_{p}\left( \sqrt{p}\right) . \end{aligned}$$

$\square $
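The expansion of $S_{n}^{-1}-V^{-1}$ at the start of the proof of Lemma 5.5 rests on a short algebraic identity, sketched here. Write $B_{n}=V^{1/2}S_{n}^{-1}V^{1/2}$ (a shorthand introduced only for this remark, assuming $S_{n}$ is invertible). The definition of $D_{n}$ gives $(I_{p+r}+D_{n})B_{n}=I_{p+r}$, that is, $B_{n}=I_{p+r}-D_{n}B_{n}$, and therefore

$$\begin{aligned} B_{n}-I_{p+r}&=-D_{n}B_{n}=-D_{n}\left( I_{p+r}-D_{n}B_{n}\right) =-D_{n}+D_{n}^{2}B_{n}\\&=-D_{n}+D_{n}^{2}+D_{n}^{2}\left( B_{n}-I_{p+r}\right) . \end{aligned}$$

Pre- and post-multiplying by $V^{-1/2}$ recovers the stated decomposition of $S_{n}^{-1}-V^{-1}$.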

Proof of Theorem 2.2

Proof

Put $W_{i}=\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta ), i=1,\ldots ,n$. By expanding Eq. (8), we obtain

$$\begin{aligned} 0=\sum _{i=1}^{n}\frac{{\hat{\xi }}_{i}\left( \beta ,\theta \right) }{1+\lambda ^{\top }{\hat{\xi }}_{i}\left( \beta ,\theta \right) } =\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) -\sum _{i=1}^{n}\left\{ {\hat{\xi }}_{i}\left( \beta ,\theta \right) {\hat{\xi }}_{i}\left( \beta ,\theta \right) ^{\top }\right\} \lambda +R_{n},\qquad \end{aligned}$$

(23)

where $R_{n}=\sum _{i=1}^{n}\frac{{\hat{\xi }}_{i}(\beta ,\theta )(\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta ))^{2}}{(1+\vartheta _{i})^{3}}$ and $|\vartheta _{i}|\le |\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta )|$. By Lemma 5.1, we have $\max _{1\le i\le n}|\vartheta _{i}|=o_{p}(1)$. Hence $R_{n}=R_{n1}\{1+o_{p}(1)\}$, where

$$\begin{aligned} R_{n1}=\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) \left( \lambda ^{\top }{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right) ^{2}. \end{aligned}$$

Applying Lemmas 5.1 and 5.3, we obtain

$$\begin{aligned} \Vert n^{-1}R_{n}\Vert \le C\Vert \lambda \Vert ^{2}\max _{1\le i \le n}\Vert {\hat{\xi }}_{i}(\beta ,\theta )\Vert n^{-1}\sum _{i=1}^{n}\Vert {\hat{\xi }}_{i}(\beta ,\theta )\Vert ^{2}=o_{p}(a_{n}). \end{aligned}$$

(24)

By (23), we have

$$\begin{aligned} \lambda =\left\{ \sum _{i=1}^{n}{\hat{\xi }}_{i}(\beta ,\theta ){\hat{\xi }}_{i}(\beta ,\theta )^{\top }\right\} ^{-1}\sum _{i=1}^{n}{\hat{\xi }}_{i}(\beta ,\theta )+ \left\{ \sum _{i=1}^{n}{\hat{\xi }}_{i}(\beta ,\theta ){\hat{\xi }}_{i}(\beta ,\theta )^{\top }\right\} ^{-1}R_{n}. \end{aligned}$$

Applying Taylor’s expansion, for some $\zeta _{i}$ such that $|\zeta _{i}|\le |\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta )|$, we obtain

$$\begin{aligned} \log \left( 1+\lambda ^{\top }{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right) =\lambda ^{\top }{\hat{\xi }}_{i}\left( \beta ,\theta \right) -\frac{\left\{ \lambda ^{\top }{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right\} ^{2}}{2}+ \frac{\left\{ \lambda ^{\top }{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right\} ^{3}}{3\left( 1+\zeta _{i}\right) ^{4}}. \end{aligned}$$

Therefore,

$$\begin{aligned} {\tilde{l}}\left( \beta ,\theta \right)= & {} \left( \frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right) ^{\top }\left\{ \frac{1}{n}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) {\hat{\xi }}_{i}\left( \beta ,\theta \right) ^{\top }\right\} ^{-1}\left( \frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right) \nonumber \\&-\frac{1}{n}R_{n}^{\top }\left\{ \frac{1}{n}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) {\hat{\xi }}_{i}\left( \beta ,\theta \right) ^{\top }\right\} ^{-1}R_{n}+\sum _{i=1}^{n}\frac{2\left\{ \lambda ^{\top } {\hat{\xi }}_{i}\left( \beta ,\theta \right) \right\} ^{3}}{3\left( 1+\zeta _{i}\right) ^{4}}\nonumber \\= & {} \left( \frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right) ^{\top }V^{-1}\left( \frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right) \nonumber \\&+\left( \frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right) ^{\top }\left[ \left\{ \frac{1}{n}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) {\hat{\xi }}_{i}\left( \beta ,\theta \right) ^{\top }\right\} ^{-1} -V^{-1}\right] \nonumber \\&\times \left( \frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right) -\frac{1}{n}R_{n}^{\top }\left\{ \frac{1}{n}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) {\hat{\xi }}_{i}\left( \beta ,\theta \right) ^{\top }\right\} ^{-1}R_{n}\nonumber \\&+\frac{2}{3}\sum _{i=1}^{n} \left\{ \lambda ^{\top }{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right\} ^{3}\left\{ 1+o_{p}\left( 1\right) \right\} . \end{aligned}$$

(25)

By Lemma 5.5, we have

$$\begin{aligned}&\left( \frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right) ^{\top }\left[ \left\{ \frac{1}{n}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) {\hat{\xi }}_{i}\left( \beta ,\theta \right) ^{\top }\right\} ^{-1}-V^{-1}\right] \nonumber \\&\times \left( \frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right) =o_{p}\left( \sqrt{p}\right) . \end{aligned}$$

(26)

By Lemmas 5.1–5.3 and (24), we can obtain

$$\begin{aligned} \frac{1}{n}R_{n}^{\top }\left\{ \frac{1}{n}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) {\hat{\xi }}_{i}\left( \beta ,\theta \right) ^{\top }\right\} ^{-1}R_{n}=o_{p}\left( 1\right) , \end{aligned}$$

(27)

and

$$\begin{aligned} \frac{2}{3}\sum _{i=1}^{n} \left\{ \lambda ^{\top }{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right\} ^{3}\left\{ 1+o_{p}\left( 1\right) \right\} =o_{p}\left( \sqrt{p}\right) . \end{aligned}$$

(28)

It follows from (25)–(28) that

$$\begin{aligned} {\tilde{l}}\left( \beta ,\theta \right) =\left( \frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\hat{\xi }}_{i}\left( \beta ,\theta \right) \right) ^{\top }V^{-1}\left( \frac{1}{\sqrt{n}}\sum _{i=1}^{n} {\hat{\xi }}_{i}\left( \beta ,\theta \right) \right) +o_{p}\left( \sqrt{p}\right) . \end{aligned}$$

Hence the theorem follows from Lemmas 5.4 and 5.5, and the proof of Theorem 2.2 is completed. $\square $
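The approximation at the heart of this proof, namely that the empirical log-likelihood ratio $2\sum _{i}\log (1+\lambda ^{\top }{\hat{\xi }}_{i})$ is close to the quadratic form in the leading term of (25), can be illustrated numerically. The sketch below is not the authors' implementation: it assumes i.i.d. standard Gaussian vectors in place of ${\hat{\xi }}_{i}(\beta ,\theta )$, writes $q$ for the dimension, and solves the Lagrange multiplier equation with a damped Newton iteration; the function name `el_lambda` and all tuning constants are hypothetical.

```python
import numpy as np

def el_lambda(xi, n_iter=50, tol=1e-12):
    """Solve sum_i xi_i / (1 + lam^T xi_i) = 0 for lam by damped Newton steps."""
    n, q = xi.shape
    lam = np.zeros(q)
    for _ in range(n_iter):
        denom = 1.0 + xi @ lam                         # 1 + lam^T xi_i, kept positive
        psi = (xi / denom[:, None]).mean(axis=0)       # psi(lam) as in the proof of Lemma 5.3
        if np.linalg.norm(psi) < tol:
            break
        jac = -(xi / denom[:, None] ** 2).T @ xi / n   # Jacobian of psi
        step = np.linalg.solve(jac, -psi)
        t = 1.0
        while np.any(1.0 + xi @ (lam + t * step) <= 0.0):
            t /= 2.0                                   # halve the step until all 1 + lam^T xi_i > 0
        lam = lam + t * step
    return lam

rng = np.random.default_rng(3)
n, q = 2_000, 5
xi = rng.standard_normal((n, q))                       # hypothetical stand-in for xi_hat_i

lam = el_lambda(xi)
el_ratio = 2.0 * np.log1p(xi @ lam).sum()              # 2 * sum_i log(1 + lam^T xi_i)

xi_bar = xi.mean(axis=0)
s_n = xi.T @ xi / n
quad_form = n * xi_bar @ np.linalg.solve(s_n, xi_bar)  # leading term of (25)

print(f"||lambda|| = {np.linalg.norm(lam):.4f}, sqrt(q/n) = {np.sqrt(q / n):.4f}")
print(f"EL ratio = {el_ratio:.4f}, quadratic form = {quad_form:.4f}")
```

In this simplified setting $\Vert \lambda \Vert $ should be of the same order as $\sqrt{q/n}$, in line with Lemma 5.3, and the two statistics should differ only by a small remainder.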

Proof of Theorem 2.3

Proof

We first prove that $\max _{1\le i \le n}\Vert \hat{{\tilde{\xi }}}_{i}(\beta ^{(1)})\Vert =o_{p}(n^{1/2})$. It can be shown that

$$\begin{aligned} \hat{{\tilde{\xi }}}_{i}\left( \beta \right)= & {} {\hat{w}}_{i}\left\{ Y_{i}-X_{i}^{\left( 1\right) \top }\beta ^{\left( 1\right) }-X_{i}^{\left( 2\right) \top }{\hat{\beta }}^{\left( 2\right) }-{\hat{g}}\left( Z_{i}^{\top }{\hat{\theta }}\right) \right\} \left\{ X_{i}^{\left( 1\right) }-\frac{{\hat{E}}\left( {\hat{w}}X^{\left( 1\right) }|Z_{i}^{\top }{\hat{\theta }}\right) }{{\hat{E}}\left( {\hat{w}}|Z_{i}^{\top }{\hat{\theta }}\right) }\right\} \nonumber \\= & {} \left\{ w_{i}\left( 1+o_{p}\left( 1\right) \right) \right\} \left\{ \varepsilon _{i}+X_{i}^{\top }\left( \beta -{\hat{\beta }}\right) +X_{i}^{\left( 1\right) \top }\left( {\hat{\beta }}^{\left( 1\right) }-\beta ^{\left( 1\right) }\right) +\left( g\left( Z_{i}^{\top }{\hat{\theta }}\right) \right. \right. \nonumber \\&\left. \left. -{\hat{g}}\left( Z_{i}^{\top }{\hat{\theta }}\right) \right) \right\} \nonumber \\&\times \left\{ \left( X_{i}^{\left( 1\right) }-\frac{E\left( wX^{\left( 1\right) }|Z_{i}^{\top }{\hat{\theta }}\right) }{E\left( w|Z_{i}^{\top }{\hat{\theta }}\right) }\right) +\left( \frac{E\left( wX^{\left( 1\right) } |Z_{i}^{\top }{\hat{\theta }}\right) }{E\left( w|Z_{i}^{\top }{\hat{\theta }}\right) }-\frac{{\hat{E}}\left( {\hat{w}}X^{\left( 1\right) }|Z_{i}^{\top }{\hat{\theta }}\right) }{{\hat{E}}\left( {\hat{w}}|Z_{i}^{\top } {\hat{\theta }}\right) }\right) \right\} \nonumber \\= & {} M_{i1}+M_{i2}+M_{i3}+M_{i4}+M_{i5}+M_{i6}+M_{i7}+M_{i8}, \end{aligned}$$

(29)

where

$$\begin{aligned} M_{i1}= & {} \{w_{i}\left( 1+o_{p}\left( 1\right) \right) \}\varepsilon _{i}\left\{ X_{i}^{\left( 1\right) }-\frac{E\left( wX^{\left( 1\right) }|Z_{i}^{\top }{\hat{\theta }}\right) }{E\left( w|Z_{i}^{\top }{\hat{\theta }}\right) } \right\} ,\\ M_{i2}= & {} \{w_{i}\left( 1+o_{p}\left( 1\right) \right) \}\varepsilon _{i}\left\{ \frac{E\left( wX^{\left( 1\right) }|Z_{i}^{\top }{\hat{\theta }}\right) }{E\left( w|Z_{i}^{\top }{\hat{\theta }}\right) }-\frac{{\hat{E}} \left( {\hat{w}}X^{\left( 1\right) }|Z_{i}^{\top }{\hat{\theta }}\right) }{{\hat{E}}\left( {\hat{w}}|Z_{i}^{\top }{\hat{\theta }}\right) }\right\} ,\\ M_{i3}= & {} \{w_{i}\left( 1+o_{p}\left( 1\right) \right) \}X_{i}^{\top }\left( \beta -{\hat{\beta }}\right) \left\{ X_{i}^{\left( 1\right) }-\frac{E\left( wX^{\left( 1\right) }|Z_{i}^{\top }{\hat{\theta }}\right) }{E\left( w|Z_{i}^{\top } {\hat{\theta }}\right) }\right\} ,\\ M_{i4}= & {} \{w_{i}\left( 1+o_{p}\left( 1\right) \right) \}X_{i}^{\top }\left( \beta -{\hat{\beta }}\right) \left\{ \frac{E\left( wX^{\left( 1\right) }|Z_{i}^{\top }{\hat{\theta }}\right) }{E\left( w|Z_{i}^{\top }{\hat{\theta }}\right) } -\frac{{\hat{E}}\left( {\hat{w}}X^{\left( 1\right) }|Z_{i}^{\top }{\hat{\theta }}\right) }{{\hat{E}}\left( {\hat{w}}|Z_{i}^{\top }{\hat{\theta }}\right) }\right\} ,\\ M_{i5}= & {} \{w_{i}\left( 1+o_{p}\left( 1\right) \right) \}X_{i}^{\left( 1\right) \top }\left( {\hat{\beta }}^{\left( 1\right) }-\beta ^{\left( 1\right) }\right) \left\{ X_{i}^{\left( 1\right) }-\frac{E\left( wX^{\left( 1\right) }|Z_{i}^{\top }{\hat{\theta }}\right) }{E\left( w|Z_{i}^{\top }{\hat{\theta }}\right) }\right\} ,\\ M_{i6}= & {} \{w_{i}\left( 1+o_{p}\left( 1\right) \right) \}X_{i}^{\left( 1\right) \top }\left( {\hat{\beta }}^{\left( 1\right) }-\beta ^{\left( 1\right) }\right) \left\{ \frac{E\left( wX^{\left( 1\right) }|Z_{i}^{\top }{\hat{\theta }}\right) }{E\left( w|Z_{i}^{\top }{\hat{\theta }}\right) }-\frac{{\hat{E}}\left( {\hat{w}}X^{\left( 1\right) }|Z_{i}^{\top }{\hat{\theta }}\right) }{{\hat{E}}\left( {\hat{w}}|Z_{i}^{\top }{\hat{\theta }}\right) }\right\} ,\\ M_{i7}= & {} \{w_{i}\left( 1+o_{p}\left( 1\right) \right) \}\left( g\left( Z_{i}^{\top }{\hat{\theta }}\right) -{\hat{g}}\left( Z_{i}^{\top }{\hat{\theta }}\right) \right) \left\{ X_{i}^{\left( 1\right) }-\frac{E\left( wX^{\left( 1\right) }|Z_{i}^{\top } {\hat{\theta }}\right) }{E\left( w|Z_{i}^{\top }{\hat{\theta }}\right) }\right\} ,\\ M_{i8}= & {} \{w_{i}\left( 1+o_{p}\left( 1\right) \right) \}\left( g\left( Z_{i}^{\top }{\hat{\theta }}\right) -{\hat{g}}\left( Z_{i}^{\top }{\hat{\theta }}\right) \right) \left\{ \frac{E\left( wX^{\left( 1\right) }|Z_{i}^{\top }{\hat{\theta }}\right) }{E\left( w| Z_{i}^{\top }{\hat{\theta }}\right) }-\frac{{\hat{E}}\left( {\hat{w}}X^{\left( 1\right) }|Z_{i}^{\top }{\hat{\theta }}\right) }{{\hat{E}}\left( {\hat{w}}|Z_{i}^{\top }{\hat{\theta }}\right) }\right\} . \end{aligned}$$

By (29), we can obtain that

$$\begin{aligned} \max _{1\le i \le n}\Vert \hat{{\tilde{\xi }}}_{i}(\beta ^{(1)})\Vert\le & {} \max _{1\le i \le n}\Vert M_{i1}\Vert +\max _{1\le i \le n}\Vert M_{i2}\Vert +\max _{1\le i \le n}\Vert M_{i3}\Vert +\max _{1\le i \le n}\Vert M_{i4}\Vert \\+ & {} \max _{1\le i \le n}\Vert M_{i5}\Vert +\max _{1\le i \le n}\Vert M_{i6}\Vert +\max _{1\le i \le n}\Vert M_{i7}\Vert +\max _{1\le i \le n}\Vert M_{i8}\Vert . \end{aligned}$$

Similar to the proof of Proposition 2 in Ma and Zhu (2013), it is easy to show that

$$\begin{aligned} \max _{1\le i \le n}\Vert M_{il}\Vert =o_{p}(n^{1/2}),\quad l=1,\ldots ,8. \end{aligned}$$

Therefore, we have $\max _{1\le i \le n}\Vert \hat{{\tilde{\xi }}}_{i}(\beta ^{(1)})\Vert =o_{p}(n^{1/2})$. In addition, from the proof of Theorem 3.1 in Li and Wang (2003), as $n\rightarrow \infty $, we can also show that

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum _{i=1}^{n}\hat{{\tilde{\xi }}}_{i}\left( \beta ^{\left( 1\right) }\right) {\mathop {\longrightarrow }\limits ^{L}} N\left( 0,V_{1}\left( \beta ^{\left( 1\right) }\right) \right) , \end{aligned}$$

(30)

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}\hat{{\tilde{\xi }}}_{i}\left( \beta ^{\left( 1\right) }\right) \hat{{\tilde{\xi }}}_{i}^{\top }\left( \beta ^{\left( 1\right) }\right) {\mathop {\longrightarrow }\limits ^{p}} V_{1}\left( \beta ^{\left( 1\right) }\right) , \end{aligned}$$

(31)

where

$$\begin{aligned} V_{1}\left( \beta ^{\left( 1\right) }\right) =E\left\{ wX^{\left( 1\right) }X^{\left( 1\right) \top }-\frac{E\left( wX^{\left( 1\right) }|Z^{\top }{\hat{\theta }}\right) E\left( wX^{\left( 1\right) \top }|Z^{\top }{\hat{\theta }}\right) }{E\left( w|Z^{\top }{\hat{\theta }}\right) }\right\} \end{aligned}$$

and ${\mathop {\rightarrow }\limits ^{p}}$ stands for convergence in probability. By $\max _{1\le i \le n}\Vert \hat{{\tilde{\xi }}}_{i}(\beta ^{(1)})\Vert =o_{p}(n^{1/2})$ and a Taylor expansion of (10), we obtain

$$\begin{aligned} {\tilde{l}}\left( \beta ^{\left( 1\right) }\right) =2\sum _{i=1}^{n}\lambda ^{\left( 1\right) \top }\hat{{\tilde{\xi }}}_{i}\left( \beta ^{\left( 1\right) }\right) -\sum _{i=1}^{n}\left\{ \lambda ^{\left( 1\right) \top }\hat{{\tilde{\xi }}}_{i} \left( \beta ^{\left( 1\right) }\right) \right\} ^{2}+o_{p}\left( 1\right) . \end{aligned}$$

(32)

Similar to the proof of Theorem 17 in Owen (1990), we have

$$\begin{aligned}&\sum _{i=1}^{n}\left\{ \lambda ^{\left( 1\right) \top }\hat{{\tilde{\xi }}}_{i}\left( \beta ^{\left( 1\right) }\right) \right\} ^{2}=\sum _{i=1}^{n}\lambda ^{\left( 1\right) \top }\hat{{\tilde{\xi }}}_{i}\left( \beta ^{\left( 1\right) }\right) +o_{p}\left( 1\right) , \end{aligned}$$

(33)

$$\begin{aligned}&\lambda ^{\left( 1\right) }=\left\{ \sum _{i=1}^{n}\hat{{\tilde{\xi }}}_{i}\left( \beta ^{\left( 1\right) }\right) \hat{{\tilde{\xi }}}_{i}\left( \beta ^{\left( 1\right) }\right) ^{\top }\right\} ^{-1}\sum _{i=1}^{n}\hat{{\tilde{\xi }}}_{i} \left( \beta ^{\left( 1\right) }\right) +o_{p}\left( n^{-1/2}\right) . \end{aligned}$$

(34)

Combining (32)–(34) implies that

$$\begin{aligned}&{\tilde{l}}\left( \beta ^{\left( 1\right) }\right) \\&\quad =\left\{ \frac{1}{\sqrt{n}}\sum _{i=1}^{n}\hat{{\tilde{\xi }}}_{i}\left( \beta ^{\left( 1\right) }\right) \right\} ^{\top }\left\{ \frac{1}{n}\sum _{i=1}^{n}\hat{{\tilde{\xi }}}_{i} \left( \beta ^{\left( 1\right) }\right) \hat{{\tilde{\xi }}}_{i}^{\top }\left( \beta ^{\left( 1\right) }\right) \right\} ^{-1}\left\{ \frac{1}{\sqrt{n}}\sum _{i=1}^{n}\hat{{\tilde{\xi }}}_{i}\left( \beta ^{\left( 1\right) }\right) \right\} \\&\qquad +o_{p}\left( 1\right) . \end{aligned}$$

Therefore, together with (30) and (31), we can show that ${\tilde{l}}(\beta ^{(1)}){\mathop {\rightarrow }\limits ^{L}} \chi _{k}^{2}$, and the proof is completed. $\square $

The partially linear model and the single-index model are both special cases of the partially linear single-index model. Theorems 3.1 and 3.2 can be proved by the same arguments as in the proofs of Theorems 2.1–2.3; hence their proofs are omitted.

About this article

Cite this article

Fang, J., Liu, W. & Lu, X. Empirical likelihood for heteroscedastic partially linear single-index models with growing dimensional data. Metrika 81, 255–281 (2018). https://doi.org/10.1007/s00184-018-0642-7

Received: 03 March 2017
Published: 02 February 2018
Issue Date: April 2018
DOI: https://doi.org/10.1007/s00184-018-0642-7
