1 Introduction

Consider the partially linear single-index model

$$\begin{aligned} Y_{i}=X_{i}^{\top }\beta +g\left( Z_{i}^{\top }\theta \right) +\varepsilon _{i},\quad \left( i=1,\ldots ,n\right) , \end{aligned}$$
(1)

where \(X_{i}\in {\mathbb {R}}^{p}\) and \(Z_{i}\in {\mathbb {R}}^{r}\) are random vectors of dimensions p and r respectively, \(Y_{i}\) is a response variable, \(\beta \) and \(\theta \) are unknown parameters, \(g(\cdot )\) is an unknown function, the \(\varepsilon _{i}\) are independent random errors with \(E(\varepsilon _{i}|X_{i},Z_{i})=0\) and \(Var(\varepsilon _{i}|X_{i},Z_{i})=v(X_{i},Z_{i})>0\), and \(v(X,Z)\) is a function of \((X,Z)\) representing possible heteroscedasticity. For identifiability, we assume that the first component of \(\theta \) is 1, and we use \(Z_{-1}\) to denote the last \(r-1\) components of Z. An interesting problem is statistical inference about \((\beta ^{\top },\theta ^{\top })^{\top }\) in partially linear single-index models, whereas all the other unknown components of model (1), such as the unknown function \(g(\cdot )\) and the unspecified function \(v(X,Z)\), are termed nuisance parameters. Model (1) was proposed by Carroll et al. (1997), and is a natural extension of the partially linear model and the single-index model (see Engle et al. 1986; Xia et al. 1999). The parametric component \(X_{i}^{\top }\beta \) provides a simple summary of covariate effects which are of the main scientific interest. The index \(Z_{i}^{\top }\theta \) enables us to simplify the treatment of the multiple auxiliary variables, and the smooth baseline component \(g(\cdot )\) enriches model flexibility. Since the partially linear single-index model was introduced, it has been studied extensively by many authors in various disciplines. For example, Yu and Ruppert (2002) proposed a penalized spline estimation procedure; Xia et al. (2002) integrated the dimension reduction idea and minimum average variance estimation; Xia and Härdle (2006) studied semiparametric estimation of partially linear single-index models; Lai and Wang (2014) studied semiparametric efficient estimation with responses missing at random.
Although some of these authors, such as Yu and Ruppert (2002), did not explicitly assume equal variance for \(\varepsilon \), none of them accounted for the heteroscedasticity of model (1). Therefore, these estimators are not efficient when heteroscedasticity is present.

In practice, many variables are introduced to reduce possible modeling biases. In early applications, the number of parameters p was typically in the range 10–500, and the sample size n in the range 100–10,000. Motivated by problems in X-ray crystallography, Huber (1973) noted that in a variable selection context the number of parameters is often large and should be modeled as \(p_{n}\), which tends to \(\infty \). Later, with the development of technology and huge investment in various forms of data gathering, Donoho (2000) demonstrated, using web term-document data, gene expression data and consumer financial history data, that large sample sizes with high dimensions are important characteristics of modern data. He also found that even in a classical setting such as the Framingham heart study, the sample size is as large as \(n=25{,}000\) and the dimension is \(p=100\), which can be modeled as \(p=O(n^{1/3})\) or \(p=O(n^{1/2})\). Recently, high-dimensional data have become increasingly common in many areas, such as financial and statistical applications, hyperspectral imagery, internet portals, high-throughput genomic data analysis and other areas of computational biology; see, e.g., Bai and Saranadasa (1996), Ledoit and Wolf (2002), Hjort et al. (2009), Zhang et al. (2012) and Ma and Zhu (2013). Zhang et al. (2012) integrated the dimension reduction idea and variable selection in partially linear single-index models with high-dimensional covariates, and Ma and Zhu (2013) proposed efficient estimators for heteroscedastic partially linear single-index models allowing high-dimensional covariates. Although both papers allowed high-dimensional covariates in model (1), the dimension of the covariates is fixed and cannot tend to infinity as the sample size \(n\rightarrow \infty \).

The method of empirical likelihood introduced in Owen (1988, 1990, 1991, 2001) might be useful for making inference for model (1). To the best of our knowledge, there is little literature on empirical likelihood for this model, although the method has been successfully applied to various models; see, e.g., linear models (Owen 1991), generalized linear models (Kolaczyk 1994), partially linear models (Shi and Lau 2000), heteroscedastic partially linear models (Lu 2009), single-index models (Xue and Zhu 2006), partially linear single-index models (Zhu and Xue 2006), general estimating equations (Qin and Lawless 1994), quantile estimation (Chen and Hall 1993), errors-in-covariables models (Wang and Rao 2002), Cox regression models (Qin and Jing 2001) and additive risk models (Lu and Qi 2004). Empirical likelihood has also been applied to some high-dimensional problems, and its asymptotic behavior when n and p both tend to infinity has been carefully studied. Hjort et al. (2009) derived the limit distribution of the empirical likelihood (EL) ratio statistic based on p-dimensional estimating equations when \(p\rightarrow \infty \) with n at the rate \(p=o\left( n^{1/3}\right) \); Chen et al. (2009) relaxed the rate restriction in Hjort et al. (2009) and established a nondegenerate limit distribution of the EL ratio statistic, allowing \(p=o\left( n^{1/2}\right) \) under suitable regularity conditions.

Zhu and Xue (2006) investigated empirical likelihood confidence regions in a partially linear single-index model. In their paper, the empirical likelihood was constructed from the components of a semiparametric inefficient estimating equation. We think that it might be more informative to use the semiparametric efficient score to construct the empirical likelihood. Furthermore, Zhu and Xue (2006) assumed that the dimensions p and r of \(\beta \) and \(\theta \) are fixed. Their results may not be valid when \(p\rightarrow \infty \) and \(r\rightarrow \infty \) as \(n\rightarrow \infty \). Motivated by the empirical likelihood method for high-dimensional data in Hjort et al. (2009), in this paper we propose a new approach to empirical likelihood inference about \((\beta ^{\top },\theta ^{\top })^{\top }\) for heteroscedastic partially linear single-index models with high-dimensional data based on the semiparametric efficient score, where \(p\rightarrow \infty \) and \(r\rightarrow \infty \) as \(n\rightarrow \infty \). We will show that the limit distribution of the empirical log-likelihood ratio for \((\beta ^{\top },\theta ^{\top })^{\top }\), after suitable centering and scaling, is a normal distribution. Furthermore, we will show that the empirical log-likelihood ratio based on a k-dimensional (\(k<p\)) subvector of \(\beta \) is asymptotically a standard chi-square random variable, which can be used to construct confidence intervals or regions for the k-dimensional subvector of \(\beta \).

The remainder of this paper is organized as follows. In Sect. 2, we introduce the empirical likelihood method for the inference of heteroscedastic partially linear single-index models and present our main results. The partially linear model and the pure single-index model are discussed as special cases in Sect. 3. In Sect. 4, we report the results from simulation studies, and a real data example is presented in Sect. 5. Finally, the technical proofs of the main results are given in the “Appendix”.

2 Methodology and main results

Firstly, we introduce the efficient estimation method of Ma and Zhu (2013) for the parameter \((\beta ^{\top },\theta ^{\top })^{\top }\) in the heteroscedastic partially linear single-index model (1), and propose the empirical likelihood method for \((\beta ^{\top },\theta ^{\top })^{\top }\). To estimate \((\beta ^{\top },\theta ^{\top })^{\top }\), Ma and Zhu (2013) reviewed the following class of weighted estimating equations:

$$\begin{aligned}&\frac{1}{\sqrt{n}}\sum _{i=1}^{n}w\left( X_{i},Z_{i}\right) \left\{ Y_{i}-X_{i}^{\top }\beta -f\left( Z_{i}^{\top }\theta \right) \right\} \left[ a\left( X_{i},Z_{i}\right) \right. \nonumber \\&\qquad \left. -\,{\tilde{E}}\left\{ a\left( X,Z\right) |Z_{i}^{\top }\theta \right\} \right] =0, \end{aligned}$$
(2)

where \(f(Z_{i}^{\top }\theta )\) is a given function of \(Z_{i}^{\top }\theta \) which may or may not equal \(g(Z_{i}^{\top }\theta )\), \({\tilde{E}}\{\cdot |Z_{i}^{\top }\theta \}\) denotes a function of \(Z_{i}^{\top }\theta \) that may or may not be the true \(E\{\cdot |Z_{i}^{\top }\theta \}\), \(w_{i}=w(X_{i},Z_{i})=\mathrm{var}(\varepsilon _{i}|X_{i},Z_{i})^{-1}\) and \(a(\cdot ,\cdot )\in {\mathbb {R}}^{p+r-1}\) is an arbitrary function of X and Z. They pointed out that a consistent estimator \(({\hat{\beta }}^{\top },{\hat{\theta }}^{\top })^{\top }\) can be obtained from (2) if \(w_{i}\) and \(f(\cdot )\) are consistently estimated or completely known. However, the double robustness and efficiency properties of \(({\hat{\beta }}^{\top },{\hat{\theta }}^{\top })^{\top }\) are then lost. In order to develop estimators for \((\beta ^{\top },\theta ^{\top })^{\top }\) that are both doubly robust and efficient, Ma and Zhu (2013) proposed an improved class of weighted estimating equations:

$$\begin{aligned} \left\{ \begin{array}{ll} \frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}{\tilde{\varepsilon }}_{i}{\hat{w}}\left( X_{i},Z_{i}\right) \left[ X_{i}-\frac{{\hat{E}}\left\{ {\hat{w}}\left( X,Z\right) X|Z_{i}^{\top }\theta \right\} }{{\hat{E}} \left\{ {\hat{w}}\left( X,Z\right) |Z_{i}^{\top }\theta \right\} }\right] =0,\\ \frac{1}{\sqrt{n}}\sum \limits _{i=1}^{n}{\tilde{\varepsilon }}_{i}{\hat{w}}\left( X_{i},Z_{i}\right) {\hat{g}}'\left( Z_{i}^{\top }\theta \right) \left[ Z_{-1,i}-\frac{{\hat{E}}\left\{ {\hat{w}}\left( X,Z\right) Z_{-1}|Z_{i}^{\top }\theta \right\} }{{\hat{E}}\left\{ {\hat{w}}\left( X,Z\right) |Z_{i}^{\top }\theta \right\} }\right] =0, \end{array} \right. \end{aligned}$$
(3)

where \({\tilde{\varepsilon }}_{i}=Y_{i}-X_{i}^{\top }\beta -{\hat{g}}(Z_{i}^{\top }\theta )\), \(w_{i}=w(X_{i},Z_{i})=E(\varepsilon _{i}^{2}|X_{i},Z_{i})^{-1}\), and \({\hat{g}}(Z_{i}^{\top }\theta )\), \({\hat{g}}'(Z_{i}^{\top }\theta )\), \({\hat{E}}\{{\hat{w}}(X,Z)|Z_{i}^{\top }\theta \}\), \({\hat{E}}\{{\hat{w}}(X,Z)X|Z_{i}^{\top }\theta \}\) and \({\hat{E}}\{{\hat{w}}(X,Z)Z_{-1}|Z_{i}^{\top }\theta \}\) are nonparametric kernel estimators. We assume that \(\eta _{i}=\eta (X_{i},Z_{i})\) is a low dimensional variable such that \(\mathrm{var}(\varepsilon _{i}|X_{i},Z_{i})=\mathrm{var}(\varepsilon _{i}|\eta _{i})\) and \(\eta \) has a known form. For example, \(\eta \) can be \(Z^{\top }\theta \) or \(X^{\top }\beta \), which means that the error variance depends on the covariates through \(Z^{\top }\theta \) or \(X^{\top }\beta \) only. It can also be a combination of these two or have any other known form. Let \(K_{h}(\cdot )=h^{-1}K(\cdot /h)\), where \(K(\cdot )\) is a kernel function and h is a bandwidth with \(h\rightarrow 0\). For bandwidths \(h_{1}\), \(h_{2}\) and \(h_{3}\), we set

$$\begin{aligned}&{\hat{g}}\left( Z_{i}^{\top }\theta \right) =\sum _{j\ne i}K_{h_{1}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) \left( Y_{j}-X_{j}^{\top }\beta \right) /\sum _{j\ne i}K_{h_{1}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) ,\\&{\hat{g}}'\left( Z_{i}^{\top }\theta \right) =h_{1}^{-1}\left\{ \sum _{j\ne i}K_{h_{1}}'\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) \left( Y_{j}-X_{j}^{\top }\beta \right) \sum _{j\ne i}K_{h_{1}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) \right. \\&\quad \qquad \qquad \qquad -\sum _{j\ne i}K_{h_{1}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) \left( Y_{j}-X_{j}^{\top }\beta \right) \\&\quad \qquad \qquad \qquad \left. \times \sum _{j\ne i}K_{h_{1}}'\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) \right\} \Bigg /\left\{ \sum _{j\ne i}K_{h_{1}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) \right\} ^{2},\\&{\hat{w}}\left( X_{i},Z_{i}\right) =\sum _{j\ne i}K_{h_{2}}\left( \eta _{i}-\eta _{j}\right) /\sum _{j\ne i}K_{h_{2}}\left( \eta _{i}-\eta _{j}\right) e_{j}^{2},\\&{\hat{E}}\left\{ {\hat{w}}\left( X,Z\right) |Z_{i}^{\top }\theta \right\} \\&\quad =\sum _{j\ne i}K_{h_{3}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) {\hat{w}}\left( X_{j},Z_{j}\right) /\sum _{j\ne i}K_{h_{3}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) ,\\&{\hat{E}}\left\{ {\hat{w}}\left( X,Z\right) X|Z_{i}^{\top }\theta \right\} \\&\quad =\sum _{j\ne i}K_{h_{3}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) {\hat{w}}\left( X_{j},Z_{j}\right) X_{j}/\sum _{j\ne i}K_{h_{3}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) ,\\&{\hat{E}}\left\{ {\hat{w}}\left( X,Z\right) Z_{-1}|Z_{i}^{\top }\theta \right\} \\&\quad =\sum _{j\ne i}K_{h_{3}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) {\hat{w}}\left( X_{j},Z_{j}\right) Z_{-1,j}/\sum _{j\ne i}K_{h_{3}}\left( Z_{i}^{\top }\theta -Z_{j}^{\top }\theta \right) . \end{aligned}$$

The algorithm for computing the estimators of \((\beta ^{\top },\theta ^{\top })^{\top }\) can be found in Ma and Zhu (2013).
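The leave-one-out kernel estimators of \(g\) and \(g'\) displayed above can be sketched numerically as follows. This is only an illustration under stated assumptions, not the algorithm of Ma and Zhu (2013): the function name `loo_nw_with_derivative` is hypothetical, a Gaussian kernel is assumed, and the derivative is obtained by applying the quotient rule to the kernel ratio, with the derivative of \(K_{h}\) taken with respect to its argument.

```python
import numpy as np


def loo_nw_with_derivative(u, y, h):
    """Leave-one-out kernel estimates of g(u_i) and g'(u_i).

    u : (n,) index values, e.g. Z @ theta (assumed input)
    y : (n,) partial residuals, e.g. Y - X @ beta (assumed input)
    h : bandwidth, playing the role of h_1
    """
    t = (u[:, None] - u[None, :]) / h
    # K_h(u_i - u_j) for a Gaussian kernel (an assumption of this sketch)
    K = np.exp(-0.5 * t**2) / (np.sqrt(2.0 * np.pi) * h)
    # Derivative of K_h(u_i - u_j) with respect to u_i
    dK = -t * K / h
    # Exclude j = i from every sum
    np.fill_diagonal(K, 0.0)
    np.fill_diagonal(dK, 0.0)
    s0, s1 = K.sum(axis=1), K @ y
    d0, d1 = dK.sum(axis=1), dK @ y
    g_hat = s1 / s0
    # Quotient rule applied to s1 / s0
    dg_hat = (d1 * s0 - s1 * d0) / s0**2
    return g_hat, dg_hat
```

The same weighted-average pattern, with \({\hat{w}}(X_{j},Z_{j})\), \({\hat{w}}(X_{j},Z_{j})X_{j}\) or \({\hat{w}}(X_{j},Z_{j})Z_{-1,j}\) in place of the partial residuals and bandwidth \(h_{3}\), would give the conditional expectation estimators.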

Ma and Zhu (2013) proved that, under some mild conditions, the semiparametric efficient score is

$$\begin{aligned} S_{eff}=w\varepsilon \left( X^{\top }-\frac{E(wX^{\top }|Z^{\top }\theta )}{E(w|Z^{\top }\theta )},g'(Z^{\top }\theta )\left\{ Z_{-1}^{\top }-\frac{E\left( wZ_{-1}^{\top }|Z^{\top }\theta \right) }{E(w|Z^{\top }\theta )}\right\} \right) ^{\top }, \end{aligned}$$
(4)

and the estimators \(({\hat{\beta }}^{\top },{\hat{\theta }}^{\top })^{\top }\) obtained by solving (3) are doubly robust and efficient. Although Ma and Zhu (2013) allowed high-dimensional covariates, the dimension of the covariates is fixed and cannot tend to infinity as the sample size \(n\rightarrow \infty \).

We first extend the fixed-dimensional results in Ma and Zhu (2013) to cases with diverging dimensionality, i.e., p, \(r\rightarrow \infty \) as \(n\rightarrow \infty \). Let \(B_{n}\in {\mathbb {R}}^{(p+r-1)\times q}\) with fixed q. \(B_{n}^{\top }S_{eff}\) represents a projection of the diverging dimensional vector \(S_{eff}\) to a fixed dimension q.

Definition 1

Assume that \(S_{eff}\) is a diverging dimensional semiparametric score, and \(B_{n}^{\top }S_{eff}\) represents a projection of the diverging dimensional vector \(S_{eff}\) to a fixed dimension q. We say that the diverging dimensional semiparametric score is a semiparametric efficient score if the fixed dimensional projection \(B_{n}^{\top }S_{eff}\) of \(S_{eff}\) is a semiparametric efficient score.

Similar to Proposition 1 of Ma and Zhu (2013), we can prove that, for any fixed q, the projection \(B_{n}^{\top }S_{eff}\) of the diverging dimensional semiparametric score \(S_{eff}\) is a semiparametric efficient score. Therefore, according to Definition 1, we say that the diverging dimensional semiparametric score \(S_{eff}\) is a semiparametric efficient score.

The following theorem gives the theoretical properties of the high-dimensional estimator based on the semiparametric efficient score.

Theorem 2.1

Let \((\beta _{0}^{\top }, \theta _{0}^{\top })^{\top }\) be the true value of the parameter vector \((\beta ^{\top },\theta ^{\top })^{\top }\). Under Assumptions 1–9,

$$\begin{aligned} \sqrt{n}AV^{1/2}\left\{ \left( {\hat{\beta }}^{\top },{\hat{\theta }}^{\top }\right) ^{\top }-\left( \beta _{0}^{\top },\theta _{0}^{\top }\right) ^{\top }\right\} {\mathop {\rightarrow }\limits ^{L}} N\left( 0,G\right) , \end{aligned}$$

where \({\mathop {\rightarrow }\limits ^{L}}\) stands for convergence in distribution, \(A_{1}\in {\mathbb {R}}^{q_{1}\times p}\) and \(A_{2}\in {\mathbb {R}}^{q_{2}\times r}\) are matrices, \(AA^{\top }\rightarrow G\) and G is a \((q_{1}+q_{2})\times (q_{1}+q_{2})\) matrix with fixed \(q_{1}\) and \(q_{2}\),

$$\begin{aligned} A= & {} \left( { \begin{array}{*{10}c} A_{1}&{}0\\ 0&{}A_{2}\\ \end{array}}\right) , \qquad V=\left( { \begin{array}{*{10}c} V_{11}&{}V_{12}\\ V_{21}&{}V_{22}\\ \end{array}}\right) ,\\ V_{11}= & {} E\left\{ wXX^{\top }-\frac{E(wX|Z^{\top }\theta )E(wX^{\top }|Z^{\top }\theta )}{E(w|Z^{\top }\theta )}\right\} ,\\ V_{12}= & {} E\left[ g'(Z^{\top }\theta )\left\{ wXZ_{-1}^{\top }-\frac{E(wX|Z^{\top }\theta )E(wZ_{-1}^{\top }|Z^{\top }\theta )}{E(w|Z^{\top }\theta )}\right\} \right] ,\\ V_{21}= & {} E\left[ g'(Z^{\top }\theta )\left\{ wZ_{-1}X^{\top }-\frac{E(wZ_{-1}|Z^{\top }\theta )E(wX^{\top }|Z^{\top }\theta )}{E(w|Z^{\top }\theta )}\right\} \right] ,\\ V_{22}= & {} E\left[ g'(Z^{\top }\theta )^{2}\left\{ wZ_{-1}Z_{-1}^{\top }-\frac{E(wZ_{-1}|Z^{\top }\theta )E(wZ_{-1}^{\top }|Z^{\top }\theta )}{E(w|Z^{\top }\theta )}\right\} \right] . \end{aligned}$$

Theorem 2.1 extends the fixed-dimensional results in Ma and Zhu (2013) to cases with diverging dimensionality. Furthermore, in Theorem 2.1, A represents a projection of the diverging dimensional vector to a fixed dimension \(q_{1}+q_{2}\), and the limiting distribution of the projected vector of \(\{({\hat{\beta }}^{\top },{\hat{\theta }}^{\top })^{\top }-(\beta _{0}^{\top },\theta _{0}^{\top })^{\top }\}\) is a multivariate normal distribution. This theorem provides the consistency and normality of the projected vector of the estimator \(({\hat{\beta }}^{\top },{\hat{\theta }}^{\top })^{\top }\) for heteroscedastic partially linear single-index models.

Next, we introduce an auxiliary random vector by using the semiparametric efficient score. Let

$$\begin{aligned} \xi _{i}(\beta ,\theta )=w_{i}\varepsilon _{i}\left( X_{i}^{\top }-\frac{E\left( wX^{\top }|Z_{i}^{\top }\theta \right) }{E(w|Z_{i}^{\top }\theta )}, g'\left( Z_{i}^{\top }\theta \right) \left\{ Z_{-1,i}^{\top }-\frac{E\left( wZ_{-1}^{\top }|Z_{i}^{\top }\theta \right) }{E(w|Z_{i}^{\top }\theta )}\right\} \right) ^{\top }. \end{aligned}$$

Note that \(E\{\xi _{i}(\beta ,\theta )\}=0\) for \(i=1,\ldots , n\), if \((\beta ^{\top },\theta ^{\top })^{\top }\) is the true parameter. Based on this fact, we apply the empirical likelihood method of Owen (1988, 1990) to make inference about \((\beta ^{\top },\theta ^{\top })^{\top }\). Let \(\pi =(\pi _{1},\ldots , \pi _{n})\) be a probability vector, satisfying \(\sum _{i=1}^{n}\pi _{i}=1\) and \(\pi _{i}\ge 0\) for \(i=1,\ldots , n\). The traditional empirical likelihood function for \((\beta ^{\top },\theta ^{\top })^{\top }\) is defined as follows:

$$\begin{aligned} L(\beta ,\theta )=\sup \left\{ \prod _{i=1}^{n}(n\pi _{i}):\sum _{i=1}^{n}\pi _{i}=1, \pi _{i}\ge 0, \sum _{i=1}^{n}\pi _{i}\xi _{i}(\beta ,\theta )=0\right\} . \end{aligned}$$
(5)

Because (5) contains the unknown functions \(w(X_{i},Z_{i})\), \(g(Z_{i}^{\top }\theta )\), \(g'(Z_{i}^{\top }\theta )\), \(E\{w(X,Z)|Z_{i}^{\top }\theta \}\), \(E\{w(X,Z)X|Z_{i}^{\top }\theta \}\) and \(E\{w(X,Z)Z_{-1}|Z_{i}^{\top }\theta \}\), it cannot be used directly to make inference on \((\beta ^{\top },\theta ^{\top })^{\top }\). To solve this problem, a natural method is to replace them by their respective estimators \({\hat{w}}(X_{i},Z_{i})\), \({\hat{g}}(Z_{i}^{\top }\theta )\), \({\hat{g}}'(Z_{i}^{\top }\theta )\), \({\hat{E}}\{{\hat{w}}(X,Z)|Z_{i}^{\top }\theta \}\), \({\hat{E}}\{{\hat{w}}(X,Z)X|Z_{i}^{\top }\theta \}\) and \({\hat{E}}\{{\hat{w}}(X,Z)Z_{-1}|Z_{i}^{\top }\theta \}\) given above. Define an estimated empirical likelihood function for \((\beta ^{\top },\theta ^{\top })^{\top }\) as

$$\begin{aligned} {\tilde{L}}(\beta ,\theta )=\sup \left\{ \prod _{i=1}^{n}(n\pi _{i}):\sum _{i=1}^{n}\pi _{i}=1, \pi _{i}\ge 0, \sum _{i=1}^{n}\pi _{i}{\hat{\xi }}_{i}(\beta ,\theta )=0\right\} , \end{aligned}$$
(6)

where

$$\begin{aligned} {\hat{\xi }}_{i}\left( \beta ,\theta \right) ={\hat{w}}_{i}{\tilde{\varepsilon }}_{i}\left( X_{i}^{\top }-\frac{{\hat{E}}\left( {\hat{w}}X^{\top }|Z_{i}^{\top }\theta \right) }{{\hat{E}}\left( {\hat{w}} |Z_{i}^{\top }\theta \right) },{\hat{g}}'\left( Z_{i}^{\top }\theta \right) \left\{ Z_{-1,i}^{\top }-\frac{{\hat{E}}\left( {\hat{w}}Z_{-1}^{\top }|Z_{i}^{\top }\theta \right) }{{\hat{E}}\left( {\hat{w}}|Z_{i}^{\top } \theta \right) }\right\} \right) ^{\top }, \end{aligned}$$

and \({\tilde{\varepsilon }}_{i}=Y_{i}-X_{i}^{\top }\beta -{\hat{g}}(Z_{i}^{\top }\theta )\). The estimated empirical log-likelihood ratio is

$$\begin{aligned} {\tilde{l}}(\beta ,\theta )=-\,2\log \{{\tilde{L}}(\beta ,\theta )\}. \end{aligned}$$
(7)

According to Tsao (2004), when the dimension \(p+r\) is moderately large but fixed, the distribution of the empirical likelihood ratio \({\tilde{l}}(\beta ,\theta )\) has an atom at infinity for fixed sample size n; that is, the probability of \({\tilde{l}}(\beta ,\theta )=\infty \) is nonzero. Tsao (2004) also showed that, if the dimension \(p+r\) and the sample size n increase at the same rate such that \(p/n\ge 0.5\), the probability of \({\tilde{l}}(\beta ,\theta )=\infty \) converges to 1, since the probability of \((\beta ^{\top },\theta ^{\top })^{\top }\) being contained in the convex hull of the sample converges to 0. These results reveal the effect of the dimension \(p+r\) on the empirical likelihood from another perspective. In this paper, we analyze the empirical likelihood for heteroscedastic partially linear single-index models with high-dimensional data, in which p and r increase with n at a mild rate so that the empirical likelihood ratio \({\tilde{l}}(\beta ,\theta )\) is well defined.

By using the Lagrange multiplier method, \(\{\pi _{i}\}_{i=1}^{n}\) in (6) are

$$\begin{aligned} \pi _{i}=\frac{1}{n}\frac{1}{1+\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta )}, \end{aligned}$$

where the Lagrange multiplier \(\lambda \) satisfies

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}\frac{{\hat{\xi }}_{i}(\beta ,\theta )}{1+\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta )}=0. \end{aligned}$$
(8)

Therefore, the estimated empirical log-likelihood ratio function for \((\beta ^{\top },\theta ^{\top })^{\top }\) defined in (7) is given by

$$\begin{aligned} {\tilde{l}}(\beta ,\theta )=-\,2\log \{{\tilde{L}}(\beta ,\theta )\}=-\,2\sum _{i=1}^{n}\log (n\pi _{i})=2\sum _{i=1}^{n}\log \left\{ 1+\lambda ^{\top }{\hat{\xi }}_{i}(\beta ,\theta )\right\} . \end{aligned}$$
(9)
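The computation in (8) and (9) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `el_log_ratio` is hypothetical, the rows of the input array stand for the vectors \({\hat{\xi }}_{i}(\beta ,\theta )\) evaluated at a candidate parameter value, and \(\lambda \) is found by a damped Newton iteration on the concave function \(\lambda \mapsto \sum _{i}\log (1+\lambda ^{\top }{\hat{\xi }}_{i})\), whose stationarity condition is exactly (8).

```python
import numpy as np


def el_log_ratio(xi, max_iter=50, tol=1e-10):
    """Return (2 * sum_i log(1 + lam' xi_i), lam) for an (n, d) array xi.

    Assumes the zero vector lies inside the convex hull of the rows of xi;
    otherwise the log-likelihood ratio is infinite and no solution exists.
    """
    n, d = xi.shape
    lam = np.zeros(d)

    def objective(l):
        return np.log1p(xi @ l).sum()

    for _ in range(max_iter):
        denom = 1.0 + xi @ lam
        scaled = xi / denom[:, None]
        grad = scaled.sum(axis=0)  # n times the left-hand side of (8)
        if np.linalg.norm(grad) < tol:
            break
        # Newton ascent direction: (negative Hessian)^{-1} gradient
        step = np.linalg.solve(scaled.T @ scaled, grad)
        t = 1.0
        while t > 1e-10:
            cand = lam + t * step
            # Keep every implied weight pi_i positive and do not lose ground
            if np.all(1.0 + xi @ cand > 0) and objective(cand) >= objective(lam) - 1e-12:
                lam = cand
                break
            t *= 0.5
    return 2.0 * objective(lam), lam
```

The step halving keeps every \(1+\lambda ^{\top }{\hat{\xi }}_{i}\) positive, which corresponds to the constraint \(\pi _{i}\ge 0\) in (6).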

The following theorem gives the asymptotic distribution of \({\tilde{l}}(\beta ,\theta )\).

Theorem 2.2

Let \((\beta _{0}^{\top }, \theta _{0}^{\top })^{\top }\) be the true value of the parameter vector \((\beta ^{\top },\theta ^{\top })^{\top }\). Under Assumptions 1–9, \({\tilde{l}}(\beta _{0},\theta _{0})\), after centering and scaling, has an asymptotic standard normal distribution, i.e.,

$$\begin{aligned} \{2(p+r-1)\}^{-1/2}\left\{ {\tilde{l}}(\beta _{0},\theta _{0})-(p+r-1)\right\} {\mathop {\rightarrow }\limits ^{L}} N(0,1). \end{aligned}$$

Therefore, Theorem 2.2 can be used to test the hypothesis

$$\begin{aligned} H_{0}: \left( \beta ^{\top },\theta ^{\top }\right) ^{\top }=\left( \beta _{0}^{\top }, \theta _{0}^{\top }\right) ^{\top }~~~vs~~~H_{1}: \left( \beta ^{\top },\theta ^{\top }\right) ^{\top }\ne \left( \beta _{0}^{\top }, \theta _{0}^{\top }\right) ^{\top }, \end{aligned}$$

and it can also be used to construct confidence regions for \((\beta ^{\top },\theta ^{\top })^{\top }\). Let

$$\begin{aligned} I_{\alpha }(\beta ,\theta )=\left\{ (\beta ^{\top },\theta ^{\top })^{\top }:\left| {\tilde{l}}(\beta ,\theta )-(p+r-1)\right| \le z_{\alpha /2}\sqrt{2(p+r-1)}\right\} , \end{aligned}$$

where \(z_{\alpha /2}\) is the upper \(\alpha /2\)-quantile of the standard normal distribution. Then, by Theorem 2.2, \(I_{\alpha }(\beta ,\theta )\) gives an approximate confidence region for \((\beta ^{\top },\theta ^{\top })^{\top }\) with asymptotically correct coverage probability \(1-\alpha \), i.e.,

$$\begin{aligned} P((\beta ^{\top },\theta ^{\top })^{\top }\in I_{\alpha }(\beta ,\theta ))=1-\alpha +o(1). \end{aligned}$$
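The membership check that defines \(I_{\alpha }(\beta ,\theta )\) amounts to a two-sided normal test, which can be sketched as follows. The function name `el_normal_test` and its arguments are hypothetical; the value `l_tilde` is assumed to have been computed already from (9) at the candidate parameter value.

```python
from math import sqrt
from statistics import NormalDist


def el_normal_test(l_tilde, p, r, alpha=0.05):
    """Standardize the log-likelihood ratio as in Theorem 2.2.

    Returns the standardized statistic and True when the candidate
    parameter value belongs to the region I_alpha (H0 is not rejected).
    """
    d = p + r - 1                            # centering constant
    stat = (l_tilde - d) / sqrt(2.0 * d)     # approximately N(0, 1) under H0
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 quantile
    return stat, abs(stat) <= z
```

For instance, with \(p=20\) and \(r=10\), a value \({\tilde{l}}=29\) gives a standardized statistic of 0, so the candidate value is retained at any level.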

Remark 1

Theorem 2.2 gives the asymptotic distribution of \({\tilde{l}}(\beta ,\theta )\) with diverging dimensionality, i.e., p, \(r\rightarrow \infty \) as \(n\rightarrow \infty \). When p and r are fixed and do not diverge with n, \({\tilde{l}}(\beta _{0},\theta _{0})\) is asymptotically a standard chi-square random variable, and it is easy to see from the proof of Theorem 2.2 in “Appendix” that

$$\begin{aligned} {\tilde{l}}(\beta _{0},\theta _{0}){\mathop {\rightarrow }\limits ^{L}}\chi _{p+r-1}^{2}, \end{aligned}$$

where \(\chi _{k}^{2}\) denotes the chi-square distribution with k degrees of freedom.
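As a heuristic link between the fixed-dimensional chi-square limit and the normal limit in Theorem 2.2 (a sketch only, not a substitute for the proof in the “Appendix”), recall the classical normal approximation to a chi-square variable with many degrees of freedom. The symbols \(W_{d}\) and \(U_{j}\) are introduced here for illustration only. If \(W_{d}=\sum _{j=1}^{d}U_{j}^{2}\) with independent standard normal \(U_{1},\ldots ,U_{d}\), so that \(W_{d}\) has a chi-square distribution with d degrees of freedom, then

$$\begin{aligned} E(W_{d})=d,\qquad Var(W_{d})=2d,\qquad \frac{W_{d}-d}{\sqrt{2d}}{\mathop {\rightarrow }\limits ^{L}} N(0,1)\quad \text {as } d\rightarrow \infty , \end{aligned}$$

by the central limit theorem. Taking \(d=p+r-1\) gives the centering and scaling used in Theorem 2.2.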

In practice, confidence regions for the entire parameter vector are rarely sought, because as soon as the dimension \(p>3\), it is hard to visualize or represent the confidence region. Usually one is interested only in a one- or two-dimensional subvector of the parameter. Assume \((\beta ^{\top },\theta ^{\top })^{\top }=(\beta ^{(1)\top },\beta ^{(2)\top },\theta ^{\top })^{\top }\), where \(\beta ^{(1)}\) is the k-dimensional subvector (k is fixed and does not diverge with n) for which the confidence interval or region is to be constructed. The true parameter vector can be partitioned as \((\beta _{0}^{\top },\theta _{0}^{\top })^{\top }=(\beta _{0}^{(1)\top }, \beta _{0}^{(2)\top },\theta _{0}^{\top })^{\top }\), and \(X_{i}\) can be similarly partitioned.

We can obtain, with fixed \(\beta ^{(1)}\), estimators for the rest of the parameters by solving (3), denoted by \({\hat{\beta }}^{(2)}\) and \({\hat{\theta }}\). Define

$$\begin{aligned} \hat{{\tilde{\xi }}}_{i}(\beta ^{(1)})= & {} {\hat{w}}_{i}\left\{ Y_{i}-X_{i}^{(1)\top }\beta ^{(1)}-X_{i}^{(2)\top }{\hat{\beta }}^{(2)}\right. \\&\left. -\,{\hat{g}}\left( Z_{i}^{\top }{\hat{\theta }}\right) \right\} \left( X_{i}^{(1)}-\frac{{\hat{E}}\left( {\hat{w}}X^{(1)}|Z_{i}^{\top }{\hat{\theta }}\right) }{{\hat{E}}\left( {\hat{w}}|Z_{i}^{\top }{\hat{\theta }}\right) }\right) \end{aligned}$$

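With \(\lambda ^{(1)}\) denoting the Lagrange multiplier that solves the analogue of (8) in which \({\hat{\xi }}_{i}(\beta ,\theta )\) is replaced by \(\hat{{\tilde{\xi }}}_{i}(\beta ^{(1)})\), the estimated empirical log-likelihood ratio for \(\beta ^{(1)}\) is given by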

$$\begin{aligned} {\tilde{l}}(\beta ^{(1)})=2\sum _{i=1}^{n}\log \left\{ 1+\lambda ^{(1)\top }\hat{{\tilde{\xi }}}_{i}(\beta ^{(1)})\right\} . \end{aligned}$$
(10)

Theorem 2.3

Under the same assumptions as in Theorem 2.2,

$$\begin{aligned} {\tilde{l}}\left( \beta _{0}^{(1)}\right) {\mathop {\rightarrow }\limits ^{L}} \chi _{k}^{2}. \end{aligned}$$

Based on Theorem 2.3, a confidence interval or region for \(\beta ^{(1)}\) can be easily constructed. For any given \(0<\alpha <1\), let \(c_{\alpha }\) satisfy \(P(\chi _{k}^{2}>c_{\alpha })=\alpha \). Then

$$\begin{aligned} I_{\alpha }(\beta ^{(1)})=\left\{ \beta ^{(1)}\in R^{k}:{\tilde{l}}(\beta ^{(1)})\le c_{\alpha }\right\} \end{aligned}$$

is a confidence interval or region for \(\beta ^{(1)}\) with asymptotically correct coverage probability \(1-\alpha \).
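For the one- and two-dimensional subvectors mentioned above, the critical value \(c_{\alpha }\) has a closed form, and a confidence interval can be approximated by scanning a grid. The sketch below is illustrative only: the names `chi2_critical` and `grid_confidence_interval` are hypothetical, `ratio_fn` stands for any user-supplied function that returns \({\tilde{l}}(\beta ^{(1)})\) at a scalar candidate value, and the scan assumes that the accepted values form a single interval.

```python
from math import log
from statistics import NormalDist


def chi2_critical(k, alpha):
    """Upper-alpha critical value c_alpha of the chi-square law, k = 1 or 2."""
    if k == 1:
        # chi-square(1) is the square of a standard normal variable
        return NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    if k == 2:
        # chi-square(2) is exponential with mean 2: P(W > c) = exp(-c / 2)
        return -2.0 * log(alpha)
    raise ValueError("closed form implemented only for k = 1 or 2")


def grid_confidence_interval(ratio_fn, grid, alpha=0.05):
    """Scan a grid of scalar candidates and keep those with ratio <= c_alpha."""
    c = chi2_critical(1, alpha)
    inside = [b for b in grid if ratio_fn(b) <= c]
    return (min(inside), max(inside)) if inside else None
```

For larger k, a numerical chi-square quantile routine would be needed in place of the closed forms.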

Remark 2

Based on Tsao (2004), in the high-dimensional setting, the dimension has a non-negligible effect on the coverage probabilities of empirical likelihood confidence regions. Therefore, the coverage probabilities of empirical likelihood confidence regions may be less than the nominal confidence level. We hope to consider how to improve the coverage probabilities of empirical likelihood confidence regions in high-dimensional situations in future work.

3 Two special cases: partially linear models and single-index models

In this section, we present the empirical likelihood inference for \(\beta \) or \(\theta \) in two special cases of model (1). First we consider the heteroscedastic partially linear model with a diverging number of parameters, in which the single-index component in model (1) does not contain parameters, so that the model can be written as

$$\begin{aligned} Y_{i}=X_{i}^{\top }\beta +g(Z_{i})+\varepsilon _{i},~~~(i=1,\ldots ,n). \end{aligned}$$
(11)

The weighted estimating equations for model (11) become

$$\begin{aligned} n^{-1/2}\sum \limits _{i=1}^{n}{\hat{w}}(X_{i},Z_{i})\left( Y_{i}{-}X_{i}^{\top }\beta {-}{\hat{g}}(Z_{i})\right) \left[ X_{i}{-}\frac{{\hat{E}} \{{\hat{w}}(X,Z)X|Z_{i}\}}{{\hat{E}}\{{\hat{w}}(X,Z)|Z_{i}\}}\right] =0. \end{aligned}$$
(12)

The estimator \({\hat{\beta }}\) can be obtained by solving the semiparametric efficient score equation (12). Rewrite the auxiliary random vector as

$$\begin{aligned} {\hat{\xi }}_{i}(\beta )={\hat{w}}_{i}\left\{ Y_{i}-X_{i}^{\top }\beta -{\hat{g}}(Z_{i})\right\} \left\{ X_{i}-\frac{{\hat{E}}({\hat{w}}X|Z_{i})}{{\hat{E}} ({\hat{w}}|Z_{i})}\right\} . \end{aligned}$$

Therefore, the estimated empirical log-likelihood ratio function for \(\beta \) is

$$\begin{aligned} {\tilde{l}}_{1}(\beta )=2\sum _{i=1}^{n}\log \left\{ 1+\lambda ^{\top }{\hat{\xi }}_{i}(\beta )\right\} . \end{aligned}$$
(13)
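The auxiliary vectors \({\hat{\xi }}_{i}(\beta )\) that enter (13) can be assembled as in the following sketch. It is an illustration under stated assumptions rather than the authors' code: the names `loo_smooth` and `xi_hat_plm` are hypothetical, Z is taken to be scalar, a Gaussian kernel with a single bandwidth is used for every smoother, and the weights \({\hat{w}}_{i}\) are supplied by the user (for example, all ones under homoscedasticity).

```python
import numpy as np


def loo_smooth(z, target, h):
    """Leave-one-out kernel average of target given scalar z."""
    u = (z[:, None] - z[None, :]) / h
    K = np.exp(-0.5 * u**2)   # Gaussian kernel; constants cancel in the ratio
    np.fill_diagonal(K, 0.0)  # drop j = i
    W = K / K.sum(axis=1, keepdims=True)
    return W @ target         # works for (n,) and (n, p) targets


def xi_hat_plm(Y, X, z, beta, w_hat, h):
    """Rows are the vectors xi_hat_i(beta) for the partially linear model (11)."""
    g_hat = loo_smooth(z, Y - X @ beta, h)       # estimate of g(Z_i)
    Ew = loo_smooth(z, w_hat, h)                 # estimate of E(w | Z_i)
    EwX = loo_smooth(z, w_hat[:, None] * X, h)   # estimate of E(w X | Z_i)
    resid = Y - X @ beta - g_hat
    return (w_hat * resid)[:, None] * (X - EwX / Ew[:, None])
```

The resulting \(n\times p\) array can then be passed to any routine that solves the Lagrange multiplier equation in order to evaluate (13).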

Assume \(\beta =(\beta ^{(1)\top },\beta ^{(2)\top })^{\top }\), where \(\beta ^{(1)}\) is a k-dimensional (k is fixed and does not diverge with n) component of \(\beta \). The true parameter vector can be partitioned as \(\beta _{0}=(\beta _{0}^{(1)\top }, \beta _{0}^{(2)\top })^{\top }\), and \(X_{i}\) can be similarly partitioned. We can obtain, with fixed \(\beta ^{(1)}\), estimators for the rest of the parameters by solving (12), denoted by \({\hat{\beta }}^{(2)}\). Define

$$\begin{aligned} \tilde{{\tilde{\xi }}}_{i}(\beta ^{(1)})={\hat{w}}_{i}\left\{ Y_{i}-X_{i}^{(1)\top }\beta ^{(1)}-X_{i}^{(2)\top }{\hat{\beta }}^{(2)}-{\hat{g}}(Z_{i})\right\} \left\{ X_{i}^{(1)}-\frac{{\hat{E}}({\hat{w}}X^{(1)}|Z_{i})}{{\hat{E}}({\hat{w}}|Z_{i})}\right\} . \end{aligned}$$

The estimated empirical log-likelihood ratio for \(\beta ^{(1)}\) is given by

$$\begin{aligned} \tilde{{\tilde{l}}}(\beta ^{(1)})=2\sum _{i=1}^{n}\log \left\{ 1+\lambda ^{(1)\top }\tilde{{\tilde{\xi }}}_{i}(\beta ^{(1)})\right\} . \end{aligned}$$
(14)

In the high-dimensional setting, model (11) has the following theoretical properties.

Theorem 3.1

Let \(\beta _{0}\) be the true value of the parameter vector \(\beta \). Under Assumptions 1–9,

$$\begin{aligned}&(1)\quad \sqrt{n}A_{n}V_{1}^{-1/2}({\hat{\beta }}-\beta _{0}){\mathop {\rightarrow }\limits ^{L}} N(0,G),\\&(2)\quad (2p)^{-1/2}\left\{ {\tilde{l}}_{1}(\beta _{0})-p\right\} {\mathop {\rightarrow }\limits ^{L}} N(0,1),\\&(3)\quad \tilde{{\tilde{l}}}(\beta _{0}^{(1)}) {\mathop {\rightarrow }\limits ^{L}} \chi _{k}^{2}, \end{aligned}$$

where \(A_{n}\in R^{q\times p}\) is a \(q\times p \) matrix such that \(A_{n}A_{n}^{\top }\rightarrow G\) and G is a \(q\times q\) matrix with fixed q, and

$$\begin{aligned} V_{1}=\left[ E\left\{ wXX^{\top }-\frac{E(wX|Z)E(wX|Z)^{\top }}{E(w|Z)}\right\} \right] ^{-1}. \end{aligned}$$

Theorem 3.1(1) extends the fixed-dimensional results in Ma et al. (2006) to cases with diverging dimensionality. Theorem 3.1(2) can be used to construct empirical likelihood confidence regions for \(\beta \). A \((1-\alpha )100\%\) confidence region for \(\beta \) is given by

$$\begin{aligned} I_{\alpha }(\beta )=\left\{ \beta :\left| {\tilde{l}}_{1}(\beta )-p\right| \le z_{\alpha /2}\sqrt{2p}\right\} . \end{aligned}$$

When p is fixed and does not diverge with n, \({\tilde{l}}_{1}(\beta _{0})\) has an asymptotic chi-square distribution, which corresponds to the empirical likelihood method considered by Lu (2009) for heteroscedastic partially linear models. Theorem 3.1(3) can be used to construct a confidence interval or region for a subvector of the parameter \(\beta \). Based on Theorem 3.1(3), for \(0<\alpha <1\) and with \(\chi _{k}^{2}(\alpha )\) denoting the upper \(\alpha \)-quantile of the \(\chi _{k}^{2}\) distribution, a confidence interval or region for \(\beta ^{(1)}\) is given by

$$\begin{aligned} {\tilde{I}}_{\alpha }(\beta ^{(1)})=\left\{ \beta ^{(1)}|\tilde{{\tilde{l}}}(\beta ^{(1)})\le \chi _{k}^{2}(\alpha )\right\} . \end{aligned}$$

For high-dimensional data, we now consider the heteroscedastic pure single-index model

$$\begin{aligned} Y_{i}=g\left( Z_{i}^{\top }\theta \right) +\varepsilon _{i},~~~(i=1,\ldots ,n). \end{aligned}$$
(15)

It means that there is no linear component in model (1). The weighted estimating equations can be written as

$$\begin{aligned} n^{-1/2}\sum \limits _{i=1}^{n}{\hat{w}}\left\{ Y_{i}{-}{\hat{g}}\left( Z_{i}^{\top }\theta \right) \right\} {\hat{g}}'\left( Z_{i}^{\top }{\tilde{\theta }}\right) \left[ Z_{-1,i}{-}\frac{{\hat{E}}\left\{ {\hat{w}}Z_{-1}|Z_{i}^{\top }{\tilde{\theta }}\right\} }{{\hat{E}}\left\{ {\hat{w}}|Z_{i}^{\top }{\tilde{\theta }}\right\} }\right] =0, \end{aligned}$$
(16)

where \({\tilde{\theta }}\) is the solution of the following equations

$$\begin{aligned} n^{-1/2}\sum _{i=1}^{n}{\hat{w}}\left\{ Y_{i}-f\left( Z_{i}^{\top }\theta \right) \right\} \left[ a(Z_{i})-{\tilde{E}}\left\{ a(Z)|Z_{i}^{\top }\theta \right\} \right] =0. \end{aligned}$$

The estimator \({\hat{\theta }}\) can be obtained by solving the semiparametric efficient score equations (16). Similarly, we redefine the auxiliary random vector

$$\begin{aligned} {\hat{\xi }}_{i}\left( \theta \right) ={\hat{w}}_{i}\left\{ Y_{i}-{\hat{g}}\left( Z_{i}^{\top }\theta \right) \right\} {\hat{g}}'\left( Z_{i}^{\top }\theta \right) \left\{ Z_{-1,i}-\frac{{\hat{E}}\left( {\hat{w}}Z_{-1}|Z_{i}^{\top }\theta \right) }{{\hat{E}}\left( {\hat{w}}|Z_{i}^{\top }\theta \right) }\right\} , \end{aligned}$$

and the estimated empirical log-likelihood ratio is

$$\begin{aligned} {\tilde{l}}_{2}(\theta )=2\sum _{i=1}^{n}\log \left\{ 1+\lambda ^{\top }{\hat{\xi }}_{i}(\theta )\right\} . \end{aligned}$$

Theorem 3.2

Under Assumptions 1–9, if \(\theta _{0}\) is the true parameter value, then

$$\begin{aligned}&(1) \quad \sqrt{n}A_{n}V_{2}^{-1/2}({\hat{\theta }}_{-1}-\theta _{0,-1}){\mathop {\rightarrow }\limits ^{L}} N(0,G), \\&(2) \quad \{2(r-1)\}^{-1/2}\left\{ {\tilde{l}}_{2}(\theta _{0})-(r-1)\right\} {\mathop {\rightarrow }\limits ^{L}} N(0,1), \end{aligned}$$

where \(A_{n}\in R^{q\times (r-1)}\) is a \(q\times (r-1)\) matrix such that \(A_{n}A_{n}^{\top }\rightarrow G\) and G is a \(q\times q\) matrix with fixed q, and

$$\begin{aligned} V_{2}=E\left[ g'(Z^{\top }\theta )^{2}\left\{ wZ_{-1}Z_{-1}^{\top }-\frac{E(wZ_{-1}|Z^{\top }\theta )E(wZ_{-1}^{\top }|Z^{\top }\theta )}{E(w|Z^{\top }\theta )}\right\} \right] . \end{aligned}$$

The theoretical properties of \({\hat{\theta }}\) are given by Theorem 3.2(1). When r is fixed, \({\tilde{l}}_{2}(\theta _{0}){\mathop {\rightarrow }\limits ^{L}} \chi _{r-1}^{2}\). According to Theorem 3.2(2), a \((1-\alpha )100\%\) empirical likelihood confidence region for \(\theta \) is

$$\begin{aligned} I_{\alpha }(\theta )=\left\{ \theta :\left| {\tilde{l}}_{2}(\theta )-(r-1)\right| \le z_{\alpha /2}\sqrt{2(r-1)}\right\} . \end{aligned}$$
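The centering and scaling in this region mirror the first two moments of the fixed-dimension chi-square limit. As a short worked check (a standard fact about chi-square variables, stated here for illustration rather than derived in this paper), let \(W_{r-1}\sim \chi _{r-1}^{2}\). Since \(W_{r-1}\) is a sum of \(r-1\) independent \(\chi _{1}^{2}\) variables, the central limit theorem gives

$$\begin{aligned} E(W_{r-1})=r-1,\quad Var(W_{r-1})=2(r-1),\quad \{2(r-1)\}^{-1/2}\left\{ W_{r-1}-(r-1)\right\} {\mathop {\rightarrow }\limits ^{L}} N(0,1)\quad (r\rightarrow \infty ). \end{aligned}$$

The normal calibration in Theorem 3.2(2) is therefore consistent with the chi-square limit that holds when r is fixed.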

4 Simulation studies

Firstly, we describe how to approach the optimization problems posed by the empirical likelihood method. Due to the nonconvexity, computing the empirical likelihood ratio is nontrivial. We can use the following algorithm to obtain the estimated empirical likelihood ratio defined by (7).

  • Step 1 Obtain \({\hat{g}}(Z_{i}^{\top }\theta )\), \({\hat{g}}'(Z_{i}^{\top }\theta )\), \({\hat{w}}(X_{i},Z_{i})\), \({\hat{E}}\{{\hat{w}}(X,Z)|Z_{i}^{\top }\theta \}\), \({\hat{E}}\{{\hat{w}}(X,Z)X|Z_{i}^{\top }\theta \}\) and \({\hat{E}}\{{\hat{w}}(X,Z)Z_{-1}|Z_{i}^{\top }\theta \}\) described above by using the fixed values of \((\beta ^{\top },\theta ^{\top })^{\top }\).

  • Step 2 Obtain the auxiliary random vector \({\hat{\xi }}_{i}(\beta ,\theta )\).

  • Step 3 Obtain \({\hat{\lambda }}\) by using Newton's method to minimize (9) over \(\lambda \).

  • Step 4 Obtain \({\tilde{l}}(\beta ,\theta )\) defined by (9) based on \({\hat{\xi }}_{i}(\beta ,\theta )\) and \({\hat{\lambda }}\).
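Steps 3 and 4 can be sketched as follows, assuming the auxiliary vectors from Steps 1 and 2 are already stacked in an array `xi` with one row per observation. The sketch uses the standard empirical likelihood formulation, in which \(\lambda \) solves \(\sum _{i}{\hat{\xi }}_{i}/(1+\lambda ^{\top }{\hat{\xi }}_{i})=0\); that this matches the criterion referred to in Step 3 is an assumption. The function name `el_log_ratio` and the step-halving safeguard are illustrative choices, not part of the paper's algorithm.

```python
import numpy as np


def el_log_ratio(xi, max_iter=100, tol=1e-10):
    """Return (2 * sum_i log(1 + lam' xi_i), lam) for auxiliary vectors xi.

    xi has shape (n, d): row i is the auxiliary vector of observation i,
    evaluated at a fixed parameter value (Steps 1 and 2).
    """
    xi = np.asarray(xi, dtype=float)
    lam = np.zeros(xi.shape[1])
    for _ in range(max_iter):
        denom = 1.0 + xi @ lam              # 1 + lam' xi_i, kept positive below
        scaled = xi / denom[:, None]
        grad = scaled.sum(axis=0)           # gradient of sum_i log(1 + lam' xi_i)
        if np.linalg.norm(grad) < tol:
            break
        # Newton direction for the concave objective:
        # solve (-Hessian) step = grad, where -Hessian = scaled' scaled.
        step = np.linalg.solve(scaled.T @ scaled, grad)
        objective = np.log(denom).sum()
        t = 1.0
        for _ in range(60):                 # step halving (assumed safeguard)
            trial = 1.0 + xi @ (lam + t * step)
            if np.all(trial > 0) and np.log(trial).sum() >= objective - 1e-12:
                break
            t /= 2.0
        else:
            t = 0.0                         # no acceptable step was found
        lam = lam + t * step
        if t == 0.0:
            break
    return 2.0 * np.log(1.0 + xi @ lam).sum(), lam


if __name__ == "__main__":
    # Toy one-dimensional input; the exact solution is lam = 0.25.
    ratio, lam = el_log_ratio(np.array([[-1.0], [2.0]]))
    print(ratio, lam)
```

The sketch does not handle the case where zero lies outside the convex hull of the auxiliary vectors or where `scaled.T @ scaled` is singular (for example, when the dimension exceeds n); both would need separate treatment in practice.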

Next, we present some Monte Carlo experiments to compare the finite sample performance of EL with the normal approximation based on the doubly robust and efficient method (DRE) in Ma and Zhu (2013). Throughout these simulations, we use the Epanechnikov kernel \(K(t)=3/4(1-t^{2})_{+}\) in all the nonparametric regression procedures and use cross validation to choose the optimal bandwidth \(h_{opt}\) satisfying Assumption 6. We consider the heteroscedastic partially linear single-index model (1). In our simulations, \((X_{1},\ldots , X_{p}, Z_{1},\ldots , Z_{r})^{\top }\) is generated from a multivariate normal distribution with mean 0 and covariance matrix \((\sigma _{ij})_{(p+r)\times (p+r)}\), where \(\sigma _{ij}=0.5^{|i-j|}\), and Y is generated from a normal distribution with mean \(X^{\top }\beta +\exp (Z^{\top }\theta )\) and variance \(|Z^{\top }\theta |\). Furthermore, \(\beta =(1,0.5,-\,1,1,0.5,-\,1, \ldots )^{\top }\) and \(\theta =(1, 0.25, -\,0.25, 0.25, 0,\ldots ,0)^{\top }\). We use the dimensions \(p=10, 20, 30\) with \(r=10\) in each case.
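The data-generating design described above can be sketched as follows. This is an illustrative reading of the design, not the authors' code: the function names `epanechnikov` and `simulate` are hypothetical, and the entries of \(\beta \) beyond those listed are assumed to continue the repeating pattern \((1, 0.5, -1)\).

```python
import numpy as np


def epanechnikov(t):
    """Epanechnikov kernel K(t) = 3/4 * (1 - t^2)_+."""
    t = np.asarray(t, dtype=float)
    return 0.75 * np.clip(1.0 - t**2, 0.0, None)


def simulate(n, p, r=10, seed=0):
    """Draw one sample (X, Z, Y) from the simulation design; requires r >= 4."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p + r)
    sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])  # sigma_ij = 0.5^{|i-j|}
    xz = rng.multivariate_normal(np.zeros(p + r), sigma, size=n)
    x, z = xz[:, :p], xz[:, p:]
    # Assumption: beta repeats the pattern (1, 0.5, -1) up to length p.
    beta = np.resize(np.array([1.0, 0.5, -1.0]), p)
    theta = np.zeros(r)
    theta[:4] = [1.0, 0.25, -0.25, 0.25]
    index = z @ theta
    # Heteroscedastic normal errors with conditional variance |Z' theta|.
    eps = np.sqrt(np.abs(index)) * rng.standard_normal(n)
    y = x @ beta + np.exp(index) + eps
    return x, z, y, beta, theta


if __name__ == "__main__":
    x, z, y, beta, theta = simulate(n=100, p=10)
    print(x.shape, z.shape, y.shape)
```

Bandwidth selection by cross validation is not shown; it would wrap whichever kernel smoother is used to estimate \(g(\cdot )\) and the conditional expectations.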

In the first simulation, we consider the confidence interval for \(\beta _{1}\) constructed by empirical likelihood; results for the other parameters are similar and are not presented here. For comparison, we also construct the confidence interval for \(\beta _{1}\) by using the doubly robust and efficient method in Ma and Zhu (2013). In each case we repeat the simulation 1000 times, and the nominal confidence level \(1-\alpha \) is taken to be 0.95. The results are presented in Table 1. In addition, the time column in Table 1 reports the computing cost of the simulation. For example, a value of 2.6 s in Table 1 is the average computing cost of a single replication, so the total computing cost of repeating the simulation 1000 times is \(2.6\times 1000=2600\) s. The same applies to Tables 2 and 3.
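For concreteness, the two summaries reported for each setting, coverage probability (CP) and average length (AL), can be computed from the replicated intervals as in the sketch below. The function name `coverage_and_length` and the toy numbers are hypothetical and are not results from the paper.

```python
import numpy as np


def coverage_and_length(lower, upper, true_value):
    """Return (CP, AL) over Monte Carlo replications.

    lower, upper: interval endpoints, one entry per replication.
    true_value:   the true parameter value used to generate the data.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    covered = (lower <= true_value) & (true_value <= upper)
    return covered.mean(), (upper - lower).mean()


if __name__ == "__main__":
    # Four made-up intervals for a parameter whose true value is 1.0.
    cp, al = coverage_and_length([0.8, 1.1, 0.9, 0.7], [1.2, 1.4, 1.3, 1.1], 1.0)
    print(cp, al)
```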

From Table 1, we have the following results:

  (1) The coverage probabilities (CP) for \(\beta _{1}\) based on both EL and the doubly robust and efficient (DRE) method increase as the sample size n increases, and they are close to the nominal level, especially with moderate sample sizes. However, the coverage probabilities based on empirical likelihood fall short of the nominal level when the dimension p grows with the sample size n. According to Tsao (2004), the dimension has a non-negligible effect on the coverage probabilities of empirical likelihood confidence regions. Methods such as the Bartlett adjustment may improve the coverage probabilities of empirical likelihood confidence regions or intervals in high-dimensional settings; how to do so will be investigated in future work.

  (2) The confidence intervals based on the empirical likelihood method consistently have better coverage probabilities than the intervals based on DRE, and the average length (AL) of the empirical likelihood intervals is slightly shorter than that of the DRE intervals.

  (3) According to Table 1, the average computing cost of the empirical likelihood method is lower than that of the DRE method. However, computing cost depends heavily on the computer configuration, and high-performance computers can reduce it. The results in Table 1 were obtained on an ordinary computer.

Table 1 The coverage probability (CP) and average length (AL) for \(\beta _{1}\)

In the second simulation, we further consider confidence regions for \((\beta _{1},\beta _{2})\) constructed by EL and DRE and report the coverage probabilities of the constructed regions. The results are presented in Table 2. Finally, we consider confidence regions for \((\beta ^{\top },\theta ^{\top })^{\top }\) constructed by EL and report their coverage probabilities. For comparison, we also construct confidence regions for \((\beta ^{\top },\theta ^{\top })^{\top }\) by using DRE. In each case we repeat the simulation 1000 times. The results are presented in Table 3. Tables 2 and 3 convey a message similar to that of Table 1: EL consistently achieves slightly higher coverage probabilities than DRE.

Table 2 Comparison of coverage probability for \((\beta _{1},\beta _{2})\) between EL method and DRE method
Table 3 Comparison of coverage probability for \((\beta ^{\top },\theta ^{\top })^{\top }\) between EL method and DRE method

5 Real data application

We further illustrate our proposed method by applying the heteroscedastic partially linear single-index model to data from AIDS Clinical Trials Group Protocol 175 (ACTG175), which have been analyzed by Hammer et al. (1996), Davidian et al. (2005) and Lai and Wang (2014). CD4 is a co-receptor that assists the T cell receptor (TCR) in communicating with an antigen-presenting cell, and many HIV clinical trials focus on comparing treatment effects on CD4 count after a specified period. The ACTG175 trial includes four antiretroviral regimens [zidovudine (ZDV) monotherapy, ZDV+didanosine (ddI), ZDV+zalcitabine, ddI], to which 2139 patients were randomized in equal proportions. The findings of ACTG175 indicate that antiretroviral therapy could reduce the risk in people with intermediate-stage HIV disease and no symptoms.

Table 4 95% confidence intervals based on EL and DRE, and estimators for \((\beta ^{\top },\theta ^{\top })^{\top }\)

In this paper, we build a heteroscedastic partially linear single-index model for the response of subjects on ZDV monotherapy. There are 532 patients with CD4 cell counts between 200 and 500 per cubic millimeter. The response Y is the CD4 cell count at \(96\pm 5\) weeks (CD496). Because some CD496 values are missing in the ACTG175 data, we consider a subset of 321 subjects with observed CD496. The predictor variables are age, wtkg, hemo, homo, drugs, karnof, race, gender, str2, symptom, CD40, CD420, CD80 and CD820. We assign the discrete variables (hemo, homo, drugs, karnof, race, gender, str2, symptom) to the linear part and the standardized continuous variables (age, wtkg, CD80, CD820, CD40, CD420) to the single-index part. Furthermore, we standardize the response variable CD496, and build the following heteroscedastic partially linear single-index model

$$\begin{aligned} Y=X^{\top }\beta +g\left( Z^{\top }\theta \right) +\varepsilon , \end{aligned}$$

where \(X^{\top }=(\text{hemo, homo, drugs, karnof, race, gender, str2, symptom})\), \(Z^{\top }=(\text{age, wtkg, CD80, CD820, CD40, CD420})\), \(Y=\text{CD496}\) and \(g(\cdot )\) is an unknown function. The plot of estimated errors in Lai and Wang (2014) indicates that \(\varepsilon \) is heteroscedastic. Using the method proposed in Sect. 2, we obtain the estimators and confidence intervals for the parameters, and the results are summarized in Table 4. Table 4 shows that the confidence intervals based on the empirical likelihood method are somewhat different from those of the doubly robust and efficient method. The doubly robust and efficient method gives wider intervals and imposes symmetry on them. Lai and Wang (2014) analyzed the ACTG175 data by using a variable selection method. They found that the effects of CD40 and CD420 were significant, and the corresponding variables were included in the final fitted model. From Table 4, we find that the empirical likelihood confidence intervals for CD40 and CD420 do not cover zero. However, the confidence intervals for CD40 and CD420 based on the doubly robust and efficient method do cover zero, which would mean that the effects of CD40 and CD420 are insignificant. In addition, according to the empirical likelihood method, the coefficients of antiretroviral history (str2), symptomatic indicator (symptom) and weight (wtkg) are significantly negative, and the coefficient of age is significantly positive. These factors play an important role for four different antiretroviral regimens. We think that these findings corroborate the results in Lai and Wang (2014) very well.