1 Introduction

The least-squares approach for fitting a linear regression model provides a statistical technique for investigating the relationship between variables. Its simple structure and ease of interpretation have made it an attractive method for practitioners. Another important approach for fitting a linear model is the quantile regression method originated by Koenker and Bassett (1978). A collection of conditional quantiles can characterize the entire conditional distribution and capture the rich underlying relationship between the quantiles of the response variable and the covariates. A major difficulty of quantile regression, however, is that obtaining the asymptotic covariance matrix of the estimators requires an estimate of the error density, which is often cumbersome to obtain. This motivated Zhou et al. (2011) to bring together these two well-known techniques and develop a coherent estimation framework that can be applied to a myriad of situations. Their simulation studies and real data analysis show that the least-squares estimator combined with auxiliary quantile information not only leads to a more efficient estimator, but also results in a relatively simple calculation of the estimator's standard error that does not require any density estimation. Based on the same idea, Liu and Ishfaq (2011) and Liu et al. (2011) considered the estimation of the distribution function when auxiliary quantile information is available, with complete data and missing data, respectively. The aim of this paper is to extend this idea to the censored linear regression model.

The censored linear regression model, also referred to as the accelerated failure time (AFT) model, specifies that the logarithm of the failure time \(T_i\) is related to a \(p\times 1\) covariate vector \(Z_i\) in the following way:

$$\begin{aligned} \log T_i=Z_i^{\tau }\beta _0+\epsilon _i, \quad i=1,\ldots ,n, \end{aligned}$$
(1)

where \(\beta _0\) is a \(p\times 1\) vector of unknown regression parameters and \(\epsilon _i\), \(i=1,\ldots ,n\), are independent and identically distributed with an unspecified common distribution function \(F_{\epsilon }\), but zero mean and finite variance. This model, as an alternative to the popular Cox model, has been studied extensively in the literature; see, for instance, Buckley and James (1979), Koul et al. (1981), Lai and Ying (1991), Ritov (1990) and Wei (1992), among others. When the data are completely observed, Zhou et al. (2011) suggested estimating \(\beta _0\) in (1) based on the following estimating function:

$$\begin{aligned} \psi (T,Z,\beta )=\left( \begin{array}{c} \psi _{(1)}(T,Z,\beta )\\ \psi _{(2)}(T,Z,\beta ) \end{array}\right) =\left( \begin{array}{c} Z(\log T-Z^{\tau }\beta )\\ Z(\frac{1}{2} - I(\log T-Z^{\tau }\beta \le 0)) \end{array}\right) . \end{aligned}$$
(2)

The first part of (2) is based on the normal equation of least squares, and the second part of (2) is based on the auxiliary quantile information, using the assumption that the errors are symmetric or that the median of the errors is zero. In general, if information corresponding to the \(\zeta \)-th quantile is known, then the 1/2 in (2) may be replaced by \(\zeta \). In survival analysis, the survival time T is usually right censored by another variable C. The observed data are \((X_i,\varDelta _i,Z_i)\), \(i=1,2,\ldots ,n\), where \(X_i=\min (T_i,C_i)\), \(\varDelta _i=I(T_i\le C_i)\), and \(I(\cdot )\) is the indicator function. Assume that \(T_i\) and \(C_i\) have absolutely continuous survival functions S(t) and K(t), respectively, and that \(C_i\) is independent of \(T_i\) and \(Z_i\). In this paper, we will extend the results of Zhou et al. (2011) to the right-censored data case.

The rest of this paper is arranged as follows. A smoothing technique is introduced in Sect. 2. In Sect. 3, we propose an IPW estimating equation method to construct asymptotically unbiased estimating equations with right censored data. In Sect. 4, we propose two estimators based on the EL and GMM methods, respectively, and show that the two proposed estimators are asymptotically normally distributed. Section 5 reports some simulation results and a real data example. A discussion is given in Sect. 6, and the proofs of the theorems are contained in the Appendix.

2 A smoothing technique for non-smooth EEs

Note that based on (2), we can construct 2p estimating equations (EEs), but we only have p unknown parameters. This is the so-called over-determined case. Obviously, ordinary estimating methods are infeasible here. The common procedures equipped to handle the over-determined case are the generalized method of moments (GMM) of Hansen (1982) and the empirical likelihood (EL) method developed by Qin and Lawless (1994). However, the functions obtained from the quantile information are non-differentiable in \(\beta \), since they involve indicator functions.

The lack of smoothness of the objective equations can be handled by replacing them with smooth approximations, since smoothness of the objective function is required for Taylor expansions; see, for example, Chen and Hall (1993), Heller (2007) and Song et al. (2007). Similar to the smoothing technique developed in Zhou et al. (2011), the smoothed estimating equations proposed in this paper are also kernel based. Without loss of generality, we illustrate the idea by assuming \(p=1\). Here, we use a high-order kernel to smooth the estimating equations associated with quantiles, as in the existing literature; for more details, please refer to Zhou and Jing (2003). Consider a smooth kernel function \(l(\xi )\); then \(L(t)=\int ^{t}_{-\infty }l(u)\mathrm{d}u\) is also a smooth function. Let \(b_n\) be a bandwidth satisfying \(b_n\rightarrow 0\) and \(nb_n\rightarrow \infty \) as \(n\rightarrow \infty \). In practice, \(l(\xi )\) can be any smooth function; for example, \(l(\xi )\) may be an r-th order kernel function such that

$$\begin{aligned} \int \!{u}^jl(u)\mathrm{d}u=\left\{ \begin{array}{ll} 1, &{} \quad \text{ if }\ j=0,\\ 0, &{} \quad \text{ if }\ 1\le j\le r-1,\\ c_r\ne 0, &{} \quad \text{ if }\ j=r, \end{array}\right. \end{aligned}$$

for some integer \(r\ge 2\). The smoothed version of the second part of (2) comes out to be

$$\begin{aligned} \phi _{(2)}(T,Z,\beta )= & {} Z\left\{ \frac{1}{2} -\int I(\log T -Z^{\tau } \beta + b_n\xi \le 0)\mathrm{d}L(\xi )\right\} \nonumber \\= & {} Z \left\{ \frac{1}{2}-L\left( \frac{Z^{\tau } \beta -\log T}{b_n}\right) \right\} . \end{aligned}$$
(3)
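
As an illustration (ours, not part of the original development), the following minimal Python sketch evaluates the smoothed function \(\phi _{(2)}\) in (3), assuming the Gaussian kernel, for which L is the standard normal distribution function; the function name and arguments are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def phi2_smoothed(logT, Z, beta, b_n, zeta=0.5):
    """Smoothed quantile estimating function (3). With a Gaussian kernel l,
    L is the standard normal CDF; zeta = 1/2 encodes the median
    (symmetric-error) information."""
    u = (Z @ beta - logT) / b_n                 # argument of L in (3)
    return Z * (zeta - norm.cdf(u))[:, None]    # row i is Z_i{zeta - L(.)}
```

Averaging the rows over the sample gives the quantile block of the smoothed estimating equations (5) below.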

From Zhou et al. (2011), we know that

$$\begin{aligned} E\phi _{(2)}(T,Z,\beta )=E\psi _{(2)}(T,Z,\beta )+O(b_n^{r}). \end{aligned}$$
(4)

3 Inverse probability-weighted EEs with censored data

With complete data, we can obtain estimators of the regression parameters along the lines of Zhou et al. (2011), based on the smoothed estimating equations

$$\begin{aligned} \sum _{i=1}^n\phi (X_i,Z_i,\beta )=\sum _{i=1}^n\left( \begin{array}{c} \psi _{(1)}(X_i,Z_i,\beta )\\ \phi _{(2)}(X_i,Z_i,\beta ) \end{array}\right) =0. \end{aligned}$$
(5)

However, in the presence of censoring, (5) no longer yields asymptotically unbiased estimating equations. Hence, we consider the following modified estimating equations, called inverse probability-weighted (IPW) estimating equations:

$$\begin{aligned} \sum _{i=1}^n\frac{\varDelta _i}{K(X_i)}\phi (X_i,Z_i,\beta )=0. \end{aligned}$$
(6)

The above idea of weighting the complete observations by their inverse probabilities originated with Horvitz and Thompson (1952) in the context of sample surveys. The adaptation of this idea to the setting of censored survival data was initially considered by Koul et al. (1981), and later by Robins and Rotnitzky (1992) and Lin and Ying (1993). Zhao and Tsiatis (1997) applied this idea to the problem of quality-adjusted survival time. Recently, Bang and Tsiatis (2000), Lin (2000) and Bang and Tsiatis (2002) used this method to estimate medical costs. We find that (6) is an asymptotically unbiased estimating equation, which is a consequence of the following equality (using \(E(\varDelta |T,Z)=P(C\ge T\mid T,Z)=K(T)\), by the independence of C and \((T,Z)\)):

$$\begin{aligned} E\left\{ \frac{\varDelta }{K(X)}\phi (X,Z,\beta )\right\}= & {} E\left\{ E(\varDelta |T,Z)\frac{1}{K(T)}\phi (T,Z,\beta )\right\} \\= & {} E\{\phi (T,Z,\beta )\}\approx 0. \end{aligned}$$

In practice, the survival function \(K(\cdot )\) is unknown. Here, we propose to estimate \(K(\cdot )\) by the Kaplan–Meier estimator (Kaplan and Meier 1958) with the roles of censoring time \(C_i\) and survival time \(T_i\) reversed. That is,

$$\begin{aligned} \widehat{K}(t)=\varPi _{u\le t}\left[ 1-\frac{\mathrm{d}N^c(u)}{Y(u)}\right] , \end{aligned}$$

where \(N^c(u)=\sum _{i=1}^n I(X_i\le u, \varDelta _i=0)\), \(Y(u)=\sum _{i=1}^n I(X_i\ge u)\). The simple weighted complete-case EEs come out to be:

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\frac{\varDelta _i}{\widehat{K}(X_i)}\phi (X_i,Z_i,\beta )=0. \end{aligned}$$
(7)
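
For concreteness, here is a sketch of (7) in the same vein: \(\widehat{K}\) is computed by the product-limit formula above with the censoring indicators reversed, and the smoothed quantile part reuses the phi2_smoothed sketch after (3). It assumes continuous data without ties; the helper names are ours.

```python
import numpy as np

def km_censoring_survival(X, delta):
    """Kaplan-Meier estimator K_hat of the censoring survival function,
    evaluated at each X_i (roles of T_i and C_i reversed; no ties assumed)."""
    n = len(X)
    order = np.argsort(X)
    K_at_X = np.empty(n)
    surv = 1.0
    for rank, i in enumerate(order):
        K_at_X[i] = surv                    # no censoring jump at X_i when delta_i = 1
        if delta[i] == 0:                   # dN^c(u) jumps only at censored points
            surv *= 1.0 - 1.0 / (n - rank)  # Y(X_i) = n - rank subjects at risk
    return K_at_X                           # censored points get weight 0 anyway

def ipw_smoothed_ee(X, Z, delta, beta, b_n):
    """Left-hand side of (7): IPW-weighted least-squares and smoothed
    quantile estimating functions, averaged over the sample."""
    w = delta / km_censoring_survival(X, delta)       # Delta_i / K_hat(X_i)
    resid = np.log(X) - Z @ beta
    ls_part = Z * (w * resid)[:, None]                # psi_(1) terms
    q_part = w[:, None] * phi2_smoothed(np.log(X), Z, beta, b_n)
    return np.hstack([ls_part, q_part]).mean(axis=0)  # length-2p vector
```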

In the next section, we derive the estimators of the regression parameters based on the IPW estimating equations (7), using the empirical likelihood and GMM methods.

4 Inference based on IPW EEs

4.1 Empirical likelihood

In this section, we construct an estimated empirical likelihood to make statistical inference on \(\beta \). For convenience, denote \(\varDelta _i/\widehat{K}(X_i)=V_{ni}\). Let \(p=(p_1,\ldots ,p_n)\), \(p_i\ge 0\) for all \(1\le i\le n\) with \(\sum _{i=1}^n p_i=1\). Define \(F_p\) to be the distribution function which assigns probability \(p_i\) to the point \(V_{ni}\phi (X_i,Z_i,\beta )\). The empirical likelihood is

$$\begin{aligned} L(\beta )=\varPi _{i=1}^np_i. \end{aligned}$$
(8)

We maximize (8) subject to the restrictions

$$\begin{aligned} p_i\ge 0, \quad \sum _{i=1}^n p_i=1, \quad \sum _{i=1}^nV_{ni}\phi (X_i,Z_i,\beta )p_i=0. \end{aligned}$$
(9)

For any given \(\beta \), the set \(\varOmega _{\beta }=\{\lambda :1+\lambda ^\tau V_{ni}\phi (X_i,Z_i,\beta )\ge 1/n,\ 1\le i\le n\}\) is convex and closed; it is also bounded if 0 lies in the interior of the convex hull of the points \(V_{ni}\phi (X_i,Z_i,\beta )\). By the Lagrange multiplier method, we have

$$\begin{aligned} p_i=\frac{1}{n}\frac{1}{1+\lambda ^{\tau } V_{ni}\phi (X_i,Z_i,\beta )}, \end{aligned}$$

where \(\lambda \) is the solution to

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n \frac{V_{ni}\phi (X_i,Z_i,\beta )}{1+\lambda ^{\tau } V_{ni}\phi (X_i,Z_i,\beta ) }=0. \end{aligned}$$
(10)

Note that \(\varPi _{i=1}^n p_i\) subject to \(\sum _{i=1}^n p_i=1\) and \(p_i\ge 0\), \(1\le i\le n\), attains its maximum value \(n^{-n}\) at \(p_i=n^{-1}\). Hence, we define the profile empirical likelihood ratio by

$$\begin{aligned} R(\beta )=\varPi _{i=1}^n(np_i)=\varPi _{i=1}^n\frac{1}{1+\lambda ^{\tau } V_{ni} \phi (X_i,Z_i,\beta )}. \end{aligned}$$

The log empirical likelihood ratio multiplied by \(-2\) is then given by

$$\begin{aligned} \mathcal{R}(\beta )=-2\log R(\beta )=2\sum _{i=1}^n \log \{1+\lambda ^{\tau } V_{ni}\phi (X_i,Z_i,\beta )\}. \end{aligned}$$
(11)
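
Computationally, evaluating (11) for a fixed \(\beta \) amounts to solving (10) for \(\lambda \). Below is a minimal Newton-type sketch of this inner step (tolerances and iteration caps are arbitrary choices of ours).

```python
import numpy as np

def el_log_ratio(W, iters=50, tol=1e-10):
    """Evaluate (11) for fixed beta, where W is the n x q matrix whose rows
    are W_ni = V_ni * phi(X_i, Z_i, beta); lambda solves (10) by a damped
    Newton iteration kept inside the feasible region."""
    n, q = W.shape
    lam = np.zeros(q)
    for _ in range(iters):
        denom = 1.0 + W @ lam
        grad = (W / denom[:, None]).mean(axis=0)         # left-hand side of (10)
        if np.linalg.norm(grad) < tol:
            break
        hess = -(W / denom[:, None] ** 2).T @ W / n      # Jacobian of (10) in lambda
        step = np.linalg.solve(hess, grad)
        while np.any(1.0 + W @ (lam - step) < 1.0 / n):  # keep 1 + lam'W_ni >= 1/n
            step *= 0.5
        lam -= step
    return 2.0 * np.sum(np.log1p(W @ lam))               # R(beta) in (11)
```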

Let \(\widehat{\beta }_e\) be the maximum empirical likelihood estimator (MELE) obtained by minimizing \(\mathcal{R}(\beta )\). Then we have:

Theorem 1

Let Assumptions 1–6 in the Appendix be satisfied. Then,

$$\begin{aligned} \sqrt{n}(\widehat{\beta }_e-\beta _0)\mathop {\rightarrow }\limits ^{\fancyscript{D}}N(0,V), \end{aligned}$$

where

$$\begin{aligned}&V=\varSigma _1A^{\tau }B^{-1}\varSigma B^{-1}A\varSigma _1,\\&\varSigma _1(\beta _0)=\left\{ A(\beta _0)^{\tau }B(\beta _0)^{-1}A(\beta _0)\right\} ^{-1},\\&A(\beta _0)=\frac{\partial E\psi (T,Z,\beta )}{\partial {\beta }}\Big |_{\beta _0},\\&B(\beta _0)=E\frac{\psi (T,Z,\beta )\psi (T,Z,\beta )^\tau }{K(T)}\Big |_{\beta _0}, \end{aligned}$$

and \(\varSigma \) is given in Lemma  1.

To use Theorem 1 to construct confidence intervals for the parameter \(\beta \), we have to estimate \(A(\beta )\) and \(B(\beta )\). Based on the results of Lemma 4, \(A(\beta )\) and \(B(\beta )\) can be estimated consistently by

$$\begin{aligned} A_n(\beta )=\frac{1}{n}\sum _{i=1}^n \frac{ \nabla _\beta \phi (X_i,Z_i,\beta )\varDelta _i}{\widehat{K}(X_i)}\quad \text{ and }\quad B_n(\beta )=\frac{1}{n}\sum _{i=1}^n\left\{ \frac{ \phi (X_i,Z_i,\beta )\varDelta _i}{\widehat{K}(X_i)}\right\} ^{\otimes 2}, \end{aligned}$$

respectively. Under some mild regularity conditions, it can be shown that

$$\begin{aligned} \mathcal{R}(\beta )\!=\! \left[ \frac{1}{\sqrt{n}}\sum _{i=1}^n W_{ni}(\beta )\!\right] ^{\tau } \left[ \frac{1}{n}\sum _{i=1}^nW_{ni}(\beta )W_{ni}(\beta )^{\tau }\!\right] ^{-1} \left[ \!\frac{1}{\sqrt{n}}\sum _{i=1}^n W_{ni}(\beta )\!\right] \!+\!o_p(1),\nonumber \\ \end{aligned}$$
(12)

where \(W_{ni}(\beta )=V_{ni}\phi (X_i,Z_i,\beta )\). It can be readily shown that \(\mathcal{R}(\beta _0)\) converges in distribution to a weighted sum of Chi-square distributions, as stated in the following theorem.

Theorem 2

Let Assumptions 1–6 in the Appendix be satisfied. Then,

$$\begin{aligned} \mathcal{R}(\beta _0)\mathop {\rightarrow }\limits ^{\fancyscript{D}} \omega _1\chi _{1,1}^2+\cdots +\omega _q\chi _{1,q}^2, \end{aligned}$$
(13)

where the weights \(\omega _j,1\le j\le q,\) are the eigenvalues of \(B(\beta _0)^{-1}\varSigma (\beta _0),\) and \(\chi _{1,j}^2\) for \(1\le j\le q\) are independently distributed Chi-square variables with 1 degree of freedom.

Remark 1

We can give a modification of \(\mathcal{R}(\beta _0)\). Let

$$\begin{aligned}&\eta _1(\beta _0)=\left[ \frac{1}{\sqrt{n}}\sum _{i=1}^n W_{ni}(\beta _0)\right] ^{\tau } \left[ \widehat{\varSigma }(\beta _0)\right] ^{-1} \left[ \frac{1}{\sqrt{n}}\sum _{i=1}^n W_{ni}(\beta _0)\right] \\&\eta _2(\beta _0)=\left[ \frac{1}{\sqrt{n}}\sum _{i=1}^n W_{ni}(\beta _0)\right] ^{\tau } \left[ \widehat{B}(\beta _0)\right] ^{-1} \left[ \frac{1}{\sqrt{n}}\sum _{i=1}^n W_{ni}(\beta _0)\right] \end{aligned}$$

and let

$$\begin{aligned} \widehat{\mathcal{R}}(\beta _0)=\widehat{\xi }(\beta _0)\mathcal{R}(\beta _0), \end{aligned}$$

where \(\widehat{\xi }(\beta _0)= \eta _1(\beta _0)/\eta _2(\beta _0)\) and \(\widehat{B}(\beta _0)=\frac{1}{n}\sum _{i=1}^{n}W_{ni}(\beta _0)W_{ni}(\beta _0)^{\tau }\). It can be shown that \(\widehat{\mathcal{R}}(\beta _0)\) has a limiting Chi-square distribution with q degrees of freedom.
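
A sketch of this adjustment (our illustration; Sigma_hat stands for a plug-in estimate of \(\varSigma (\beta _0)\) from Lemma 1):

```python
import numpy as np

def adjusted_el_statistic(W, Sigma_hat, R_beta0):
    """Remark 1: rescale R(beta_0) by xi_hat = eta_1/eta_2 so the limit is
    chi-square with q degrees of freedom. W holds the rows W_ni(beta_0),
    Sigma_hat estimates Sigma(beta_0), and R_beta0 is the statistic (11)."""
    n = W.shape[0]
    s = W.sum(axis=0) / np.sqrt(n)        # n^{-1/2} sum_i W_ni(beta_0)
    B_hat = W.T @ W / n                   # B_hat(beta_0) of Remark 1
    eta1 = s @ np.linalg.solve(Sigma_hat, s)
    eta2 = s @ np.linalg.solve(B_hat, s)
    return (eta1 / eta2) * R_beta0        # xi_hat(beta_0) * R(beta_0)
```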

Corollary 1

Let \(\beta ^{\tau } = (\beta _1^{\tau }, \beta _2^{\tau }),\) where \(\beta _1\) is a \(q_1 \times 1\) vector and \(\beta _2\) is a \(q_2 \times 1\) vector. For \(H_0: \beta _1 = \beta _{1,0},\) the profile empirical likelihood test statistic is

$$\begin{aligned} \mathcal{R}_2 = \mathcal{R}(\beta _{1,0}, \tilde{\beta }_{2,0}) - \mathcal{R}(\widehat{\beta }_{1}, \widehat{\beta }_{2}), \end{aligned}$$

where \(\tilde{\beta }_{2,0}\) minimizes \(\mathcal{R}(\beta _{1,0}, \beta _2)\) with respect to \(\beta _2\), \(\beta _{1,0}\) is the true value of \(\beta _1\), and \(\widehat{\beta }_{e} = (\widehat{\beta }_1, \widehat{\beta }_2)\).

Under \(H_0,\)

$$\begin{aligned} \mathcal{R}_2 \rightarrow \rho _1 \chi _{1,1}^{2} +\cdots + \rho _{q_1} \chi _{1,q_1}^{2}, \end{aligned}$$

where the weights \(\rho _j\), \(1 \le j \le q_1\), are the eigenvalues of \(B_{2}(\beta _{1,0})\varSigma (\beta _{1,0})\), \(\chi ^{2}_{1,j}\) for \(1 \le j \le q_1\) are independently distributed Chi-square variables with 1 degree of freedom, and \(B_2\) is a non-negative definite matrix given in the Appendix.
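
A hypothetical sketch of the profile statistic \(\mathcal{R}_2\), reusing the km_censoring_survival, phi2_smoothed and el_log_ratio sketches above; the profiling over \(\beta _2\) is done with a generic derivative-free optimizer.

```python
import numpy as np
from scipy.optimize import minimize

def profile_el_statistic(X, Z, delta, b_n, beta1_0, beta_hat):
    """Corollary 1: R_2 = R(beta1_0, beta2_tilde) - R(beta_hat_e), profiling
    the EL ratio over beta_2 with beta_1 fixed at its hypothesized value."""
    q1 = len(beta1_0)
    w = delta / km_censoring_survival(X, delta)

    def R(beta):                           # the statistic (11) at this beta
        W = w[:, None] * np.hstack(
            [Z * (np.log(X) - Z @ beta)[:, None],
             phi2_smoothed(np.log(X), Z, beta, b_n)])
        return el_log_ratio(W)

    prof = minimize(lambda b2: R(np.concatenate([beta1_0, b2])),
                    beta_hat[q1:], method="Nelder-Mead")
    return prof.fun - R(beta_hat)
```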

4.2 Generalized method of moments

The GMM approach chooses parameter values such that

$$\begin{aligned} M_{W}(\beta )=\left[ \frac{1}{n}\sum _{i=1}^n W_{ni}(\beta )\right] ^{\tau } W \left[ \frac{1}{n}\sum _{i=1}^n W_{ni}(\beta )\right] \end{aligned}$$
(14)

is minimized for some positive semi-definite symmetric weight matrix W. In practice, the unknown W is typically replaced by a consistent estimator \(\widehat{W}\). The resultant GMM estimator is then

$$\begin{aligned} \widehat{\beta }_{g}=\hbox {argmin}_{\beta }M_{\widehat{W}}(\beta ). \end{aligned}$$

Theorem 3

Let Assumptions 1–6 in the Appendix be satisfied. Then,

$$\begin{aligned} \sqrt{n}(\widehat{\beta }_{g}-\beta _0)\mathop {\longrightarrow }\limits ^{\fancyscript{D}}N(0,\varSigma _g), \end{aligned}$$
(15)

where \(\varSigma _g=\varSigma _2A^{\tau }W\varSigma WA\varSigma _2\) and \(\varSigma _2=\{A(\beta _0)^{\tau } WA(\beta _0)\}^{-1}.\)

The choice of W that leads to the most asymptotically efficient GMM estimator is \(W=\varSigma ^{-1}\), the inverse of the asymptotic covariance matrix \(\varSigma \) given in Lemma 1; in this case, the “sandwich” covariance \(\varSigma _g\) reduces to \(\varSigma _2=\{A(\beta _0)^{\tau }\varSigma ^{-1}A(\beta _0)\}^{-1}\). If we set \(W=B^{-1}\), where B is defined in Sect. 4.1, it can be shown that the asymptotic covariance of the resultant GMM estimator coincides with the asymptotic covariance of the EL estimator.
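
The following two-step sketch makes this concrete: a first pass with W = I produces a consistent estimate, which is then used to form \(\widehat{W}=B_n^{-1}\) for the second pass. It reuses the helpers sketched in Sects. 2 and 3; the optimizer and starting value are arbitrary choices of ours.

```python
import numpy as np
from scipy.optimize import minimize

def gmm_estimate(X, Z, delta, b_n, beta_init):
    """Two-step GMM: minimize (14) with W = I, then re-weight with
    W_hat = B_n(beta)^{-1} as discussed above."""
    w = delta / km_censoring_survival(X, delta)

    def moments(beta):                    # per-subject terms of (7), n x 2p
        resid = np.log(X) - Z @ beta
        return np.hstack([Z * (w * resid)[:, None],
                          w[:, None] * phi2_smoothed(np.log(X), Z, beta, b_n)])

    def objective(beta, W):
        m = moments(beta).mean(axis=0)
        return m @ W @ m                  # the quadratic form (14)

    q = 2 * Z.shape[1]
    step1 = minimize(objective, beta_init, args=(np.eye(q),), method="Nelder-Mead")
    M = moments(step1.x)
    B_n = M.T @ M / len(X)                # B_n(beta) from Sect. 4.1
    step2 = minimize(objective, step1.x, args=(np.linalg.inv(B_n),),
                     method="Nelder-Mead")
    return step2.x
```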

5 Numerical studies

5.1 Simulations

In this section, we carry out simulation studies to evaluate the finite-sample performance of the GMM and EL procedures developed in this paper. The data are generated from the following censored linear regression model, which is similar to the model given in Zhou et al. (2011):

$$\begin{aligned} T=\beta _{1}Z_1+\beta _{2}Z_2+\varepsilon , \end{aligned}$$

where \(\beta _1=1\), \(\beta _2=1\), \(Z_1\) follows a Bernoulli distribution with success probability 0.5, \(Z_2\sim U[ 1, 3 ]\), \(\varepsilon \) is generated from the symmetric distribution \(\sqrt{2}/4 N(0, 1)+t_{3}/2\), and the censoring variable \(C\sim U[ 0, 8.3 ]\) (for heavy censoring) or U[ 0, 25 ] (for light censoring), where C is independent of \(Z_1\), \(Z_2\) and \(\varepsilon \). With \(X=\min ( T, C )\) and \(\varDelta =I( T\le C )\), the corresponding unbiased estimating functions are

$$\begin{aligned} \psi (X,Z,\beta ) \!=\!\left( \begin{array}{c} \psi _1(X,Z,\beta ) \\ \psi _2(X,Z,\beta ) \\ \psi _3(X,Z,\beta ) \\ \psi _4(X,Z,\beta ) \end{array} \right) \!= \frac{\varDelta }{K(X)} \left( \begin{array}{c} Z_1(X-Z_1\beta _1-Z_2\beta _2) \\ Z_2(X-Z_1\beta _1-Z_2\beta _2) \\ Z_1[1/2-I(X-Z_1\beta _1-Z_2\beta _2 \le 0)] \\ Z_2[1/2-I(X-Z_1\beta _1-Z_2\beta _2 \le 0)] \end{array} \right) \nonumber \\ \end{aligned}$$
(16)

with \(E\psi _1(\cdot )=0\) and \(E\psi _2(\cdot )=0\) representing conditions from least squares, and \(E\psi _3(\cdot )=0\) and \(E\psi _4(\cdot )=0\) arising from median regression. We use the second-order Gaussian kernel \(l(u)=\exp (-u^{2}/2)/(2\pi )^{1/2}\) to smooth \(\psi _3(\cdot )\) and \(\psi _4(\cdot )\). Four estimators are examined: the Koul et al. (1981) estimator, and the least-squares (LS), GMM and EL estimators.
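
A sketch of the data-generating step (our code; we read the error law as the sum of a scaled normal and a scaled \(t_3\) variate, and note that in this design the response is T itself, as in (16), so the earlier sketches would use X in place of \(\log X\)):

```python
import numpy as np

rng = np.random.default_rng(0)   # seed is arbitrary

def simulate(n=200, heavy=False):
    """One sample from the simulation model of Sect. 5.1."""
    Z1 = rng.binomial(1, 0.5, n).astype(float)
    Z2 = rng.uniform(1.0, 3.0, n)
    eps = np.sqrt(2.0) / 4.0 * rng.standard_normal(n) + rng.standard_t(3, n) / 2.0
    T = Z1 + Z2 + eps                                 # beta_1 = beta_2 = 1
    C = rng.uniform(0.0, 8.3 if heavy else 25.0, n)   # ~30% vs ~10% censoring
    X = np.minimum(T, C)
    delta = (T <= C).astype(float)
    return X, np.column_stack([Z1, Z2]), delta
```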

The estimator proposed by Koul et al. (1981) is

$$\begin{aligned} \widehat{\beta }_{K}=\left( \sum _{i=1}^{n}Z_iZ_{i}^{\tau }\right) ^{-1} \sum _{i=1}^{n}Z_iX_{i\widehat{K}}, \end{aligned}$$
(17)

where \(X_{i\widehat{K}}=\frac{\varDelta _iX_i}{\widehat{K}(X_i)}, i=1,\ldots ,n\), and \(\widehat{K}(\cdot )\) is the Kaplan–Meier estimator of the censoring survival function, as in Sect. 3. This estimator generalizes the ordinary least-squares estimator to the censored linear regression model. The LS estimator uses information only from \(\psi _1(\cdot )=0\) and \(\psi _2(\cdot )=0\), while the GMM and EL estimators use information from all four EEs. Since we use the second-order kernel l(u), Assumption 1 implies \(b=o(n^{-1/4})\). Such a bandwidth is of smaller order of magnitude than the \(n^{-1/5}\) rate that is usually appropriate for minimizing the error of a curve estimator. Chen and Hall (1993) suggested choices of b between \(n^{-1/2}\) and \(n^{-3/4}\), which generally provide quite good coverage accuracy. In our simulation study, we selected b through a rule of thumb proposed by Cui et al. (2002) (see also Fan and Yao 2003; Sepanski et al. 1994; Zhou et al. 2008; Zhou and Liang 2009, etc.), setting \(b=c\times \sigma \{\frac{\varDelta }{\widehat{K}(X)}(X-Z^{\tau }\beta )\}n^{-1/3}\), where \(\sigma \{X\}\) is the standard deviation of X and c is a suitable constant. We replace \(\beta \) with its LS estimator and set c to 1.5, 2, 2.5, 3, 3.5. Tables 1 and 2 report the simulation results with light censoring (about 10 %) and heavy censoring (about 30 %), respectively. Each experiment is based on 1000 replicated samples with sample size \(n=200\). The comparisons are in terms of the magnitude of bias in the estimators (BIAS), the standard error of the estimators (SE), the standard deviation (SD), the coverage probability (COV) at the nominal confidence level 95 % and the length of the confidence interval (LEN) at the same confidence level. The coverage probabilities of the Koul et al., LS and GMM estimators were constructed using the asymptotic normal distribution, while the coverage probability of the EL estimator was constructed by the empirical likelihood method.
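
The rule of thumb just described can be coded directly (a sketch reusing km_censoring_survival from Sect. 3; the default c is one of the values examined):

```python
import numpy as np

def rule_of_thumb_bandwidth(X, Z, delta, beta_ls, c=2.0):
    """b = c * sigma{ (Delta / K_hat(X)) (X - Z'beta) } * n^(-1/3), with beta
    replaced by its LS estimate, following the Cui et al. (2002) rule."""
    n = len(X)
    w = delta / km_censoring_survival(X, delta)
    resid = w * (X - Z @ beta_ls)
    return c * np.std(resid, ddof=1) * n ** (-1.0 / 3.0)
```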

Table 1 Simulation results with 10 % censoring

From Tables 1 and 2, it can be seen that all four estimators have very small biases, which suggests that they are asymptotically unbiased and consistent. Meanwhile, the SD approximates the SE of the estimators well, and the coverage probability is close to the nominal confidence level 95 %. The choice of bandwidth has little influence on the results, and the proposed GMM and EL estimators perform better than the Koul et al. and LS estimators, with smaller SE and SD and shorter LEN, since the former make use of more information.

Table 2 Simulation results with 30 % censoring

In addition, comparing the GMM and EL methods, the GMM estimator generally seems to have smaller SE and SD, while the EL estimator has shorter LEN, especially as the censoring rate increases. Besides, the results in Table 2 with the heavy censoring rate are very similar to those in Table 1, which implies that the EL and GMM estimators still perform well when the censoring is heavier. The main interest of this paper is the reduction of SD and SE achieved by the proposed censored GMM and EL methods, which is indeed illustrated by Tables 1 and 2.

Finally, we compare the performance of the proposed GMM and EL estimators with the Gehan and Logrank type rank regression estimators and the Buckley–James (B–J) type estimator. The smoothing technique is not needed for the Gehan, Logrank and B–J estimators. Results are shown in Table 3 below. Zhou (2005) mainly derived a test and a confidence interval based on the rank estimators (Gehan and Logrank type estimators) of the regression coefficient in the accelerated failure time model. Comparing the proposed GMM and EL methods across Tables 1, 2 and 3, we can see that the biases of the Gehan and Logrank estimators are noticeably larger, especially when the censoring rate increases. Again, the GMM and EL methods generally perform well, with smaller SD and SE and shorter confidence intervals than the Gehan, Logrank and Buckley–James type estimators. As expected, taking the auxiliary quantile information into account improves the efficiency of the estimators of the parameters in the AFT model, which explains why the proposed estimators outperform the existing methods.

Table 3 Simulation results for other estimators in AFT model

5.2 A real data example

We illustrate the proposed estimation method with the Stanford Heart Transplant data. These data contain the survival times of 184 heart-transplant patients together with their ages at the time of first transplant and their T5 mismatch scores; details can be found in Miller and Halpern (1982). Of these 184 patients, 27 did not have T5 scores, and of the remaining 157 patients, the survival times of 55 were censored. The cutoff date for the data was in February 1980, and it is reasonable to believe that the censoring was dictated by administrative decisions. Thus, we can estimate the survival function of C by the Kaplan–Meier estimator.

In Miller and Halpern's paper, the T5 mismatch score was nonsignificant, so only age was considered in their further analysis. Moreover, 5 of the 157 patients' survival times (T) were less than 10 days, so they were deleted to make \(\log _{10}{T}>0\). In this paper, we likewise use the same dataset of 152 patients, consider only the age covariate, and adopt the same model as in Miller and Halpern (1982), so as to compare our proposed estimators with theirs. The model is

$$\begin{aligned} \log _{10}{T}=\alpha + \beta _1 \hbox {Age} + \beta _2 \hbox {Age}^{2}+ \epsilon . \end{aligned}$$

Three different values of c are chosen for the bandwidth parameter, resulting in three different smoothing bandwidths. The results of the analysis are given in Table 4.

Table 4 Stanford Heart Transplant data

The Buckley–James estimator was given in Miller and Halpern (1982), and the Gehan and Logrank type rank regression estimators were obtained with the R code provided in Zhou (2005). As shown in Table 4, the proposed GMM and EL estimators indeed have smaller SD than the other estimators. Compared with the Koul et al. and LS estimators, the GMM and EL estimators look more stable, and their estimates are similar to the Buckley–James, Gehan and Logrank estimators. Moreover, different choices of the smoothing bandwidth have little effect on the results, especially for the \(\hbox {Age}^{2}\) covariate.

6 Discussion

In this article, we proposed a method to estimate the parameters of interest in the AFT model by combining quantile information with the censored least-squares normal equations in the estimating equations. The proposed method is based on the EL and GMM methods, and both resulting estimators have smaller standard error and standard deviation than other estimators such as the Koul et al. and LS estimators, as illustrated in the simulation studies. Their asymptotic properties were studied under some regularity conditions. However, some problems need further study. For example, both referees asked whether a practical guideline could be used to choose the bandwidth in the kernel smoothing procedure. To our knowledge, there is no standard method so far, especially for right-censored data. In this paper, the bandwidth was chosen according to the rule of thumb \(b=c\times \sigma \{\frac{\varDelta }{\widehat{K}(X)}(X-Z^{\tau }\beta )\}n^{-1/3}\), where \(\sigma \{X\}\) is the standard deviation of X. Actually, different values of c over a wide range of possible choices have little effect on the results, as can be seen from Tables 1 and 2 of the paper. Besides, the choice of the optimal bandwidth may vary for different datasets, which is a difficult but interesting question and deserves further study.

7 Appendix

In this section, we will present the proofs of Theorems 1–3. First, we need some assumptions and notation. Let \(\Vert \cdot \Vert \) denote the Euclidean norm, \(a^{\otimes 2}=aa^{\tau }\), and let \(O_p(\cdot )\) denote boundedness in probability. Assume that \(\beta \in \varTheta \), where \(\varTheta \) is a compact set.

Assumptions:

1. The selected bandwidth b satisfies the conditions: \(b\rightarrow 0\), \(nb\rightarrow \infty \) and \(nb^{2r}\rightarrow 0\).

2. L(x) is an r-th order kernel distribution function such that \(\int |x|^r\mathrm{d}L(x)<\infty \).

3. \(\tau _s\le \tau _k\), where \(\tau _{s}=\sup \{x:S(x)>0\}\), \(\tau _{k}=\sup \{x:K(x)>0\}\), and

    $$\begin{aligned} \int _0^{\tau _{s}}\frac{\psi (u,z,\beta )^{\otimes 2}}{K(u)}\mathrm{d}F_z(u)<\infty . \end{aligned}$$
4. \(Q(\beta )=E_z\psi (T,Z,\beta )\) is r times continuously differentiable in a neighborhood of \(\beta _0\); the rank of \(\partial {Q(\beta )}/{\partial {\beta }}\) is identical to the dimension of the parameter \(\beta \); and \(\parallel \partial {Q(\beta )}/{\partial {\beta }}\parallel \) and \(\parallel \psi (u,z,\beta )\parallel ^3/K^2(u)\) are bounded by some integrable function G(u) in some neighborhood of \(\beta _0\).

5. The matrix \(B(\beta _0)\) is positive definite.

6. \(Q(\beta )\) satisfies the Lipschitz condition in some neighborhood of \(\beta _0\); that is, \(\parallel E_z\{\psi (T,Z,\beta )-\psi (T,Z,\beta _0)\}^{\otimes 2}\parallel =O(\parallel \beta -\beta _0\parallel )\) in some neighborhood of \(\beta _0\).

Assumptions 1 and 2 are common in nonparametric studies, Assumption 3 is often seen in studies of censored survival data, and Assumptions 4–5 are used in empirical likelihood (Qin and Lawless 1994). Assumption 6 can easily be satisfied in many settings. Note that we only need to smooth the second part of (2). So, in the proofs of Lemmas 3 and 4, \(\phi =\phi _{(2)}\), \(\psi =\psi _{(2)}\).

Lemma 1

Suppose that Assumption 3 is satisfied. Then,

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum _{i=1}^n\frac{\varDelta _i}{\widehat{K}(X_i)}\psi (X_i,Z_i,\beta _0)\mathop {\rightarrow }\limits ^{\fancyscript{D}}N(\mathbf{0},\varSigma (\beta _0)), \end{aligned}$$
(18)

where \(\varSigma (\beta _0)=(\sigma _{lk}(\beta _0))_{l,k=1,\ldots ,q}\) is the covariance matrix with

$$\begin{aligned} \sigma _{lk}(\beta _0)= & {} E[\psi _l(T,Z,\beta _0)\psi _k(T,Z,\beta _0)]\nonumber \\&+E\left[ \int _0^{\tau ^*} H_l(T,Z,\beta _0,u)H_k(T,Z,\beta _0,u)I(T\ge u) \frac{\lambda ^c(u)}{K(u)}\,\mathrm{d}u\right] ,\qquad \end{aligned}$$
(19)

where

$$\begin{aligned}&H_k(T,Z,\beta _0,u)=\psi _k(T,Z,\beta _0)-G_k(\beta _0,u),\\&G_k(\beta _0,u)=\frac{1}{S(u)}E[\psi _k(T,Z,\beta _0)I(T\ge u)], \end{aligned}$$

\(\psi _k(T,Z,\beta )\) is the kth element of \(\psi (T,Z,\beta )\), and \(\lambda ^c(\cdot )\) denotes the hazard function of the censoring variable.

Proof

The proof may be constructed along the lines of Bang and Tsiatis (2000). \(\square \)

Lemma 2

Suppose that Assumptions 1–3 and 6 are satisfied. Then,

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum _{i=1}^n \frac{\varDelta _i}{\widehat{K}(X_i)}\phi (X_i,Z_i,\beta _0) \mathop {\longrightarrow }\limits ^{\fancyscript{D}}N(0,\varSigma (\beta _0)), \end{aligned}$$

where \(\varSigma (\beta _0)\) is given in Lemma  1.

Proof

We only need to show that

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum _{i=1}^n \frac{\varDelta _i}{\widehat{K}(X_i)}\phi (X_i,Z_i,\beta _0) =\frac{1}{\sqrt{n}}\sum _{i=1}^n \frac{\varDelta _i}{\widehat{K}(X_i)}\psi (X_i,Z_i,\beta _0)+o_{p}(1). \end{aligned}$$

In fact,

$$\begin{aligned}&\frac{1}{\sqrt{n}}\sum _{i=1}^n \frac{ \phi (X_i,Z_i,\beta _0)\varDelta _i}{\widehat{K}(X_i)} =\frac{1}{\sqrt{n}}\sum _{i=1}^n\frac{\psi (X_i,Z_i,\beta _0)\varDelta _i}{\widehat{K}(X_i)}\\&\qquad +\frac{1}{\sqrt{n}}\sum _{i=1}^n\frac{\left[ \phi (X_i,Z_i,\beta _0)-\psi (X_i,Z_i,\beta _0)\right] \varDelta _i}{\widehat{K}(X_i)}\\&\quad \triangleq \frac{1}{\sqrt{n}}\sum _{i=1}^n\frac{\psi (X_i,Z_i,\beta _0)\varDelta _i}{\widehat{K}(X_i)}+J_{1}. \end{aligned}$$

Next, we will prove that \(J_{1}=o_{p}(1)\). Similar to the argument of Bang and Tsiatis (2000),

$$\begin{aligned} J_{1}= & {} \frac{1}{\sqrt{n}}\sum _{i=1}^n\left[ \phi (T_i,Z_i,\beta _0)-\psi (T_i,Z_i,\beta _0)\right] \\&-\frac{1}{\sqrt{n}}\sum _{i=1}^n\int _0^{\tau ^*} [\tilde{\phi }(T_i,Z_i,\beta _0)-\tilde{G}(\beta _0,u)] \frac{\mathrm{d}\mathcal {M}_i^c(u)}{K(u)}+o_{p}(1)\\\triangleq & {} I_{1}+I_{2}, \end{aligned}$$

where \(\tilde{\phi }(T_i,Z_i,\beta _0)=\phi (X_i,Z_i,\beta _0)-\psi (X_i,Z_i,\beta _0)\), the definition of \(\tilde{G}(\beta _0,u)\) is similar to that of \(G_k(\beta _0,u)\) in Lemma 1, and \(\mathcal {M}_i^c(u)\) is a martingale (more details can be found in Bang and Tsiatis 2000, p. 332). It can be shown that

$$\begin{aligned} EI_{1}= & {} \frac{1}{\sqrt{n}}\sum _{i=1}^nE\left[ \phi (T_i,Z_i,\beta _0)-\psi (T_i,Z_i,\beta _0)\right] \nonumber \\= & {} \sqrt{n}E\left[ \phi (T,Z,\beta _0)-\psi (T,Z,\beta _0)\right] \nonumber \\= & {} O_{p}\left( (nb^{2r})^{\frac{1}{2}}\right) =o_{p}(1). \end{aligned}$$
(20)

In addition, note that for any constant vector \(\alpha \),

$$\begin{aligned} \hbox {Var}[\alpha ^{\tau }I_{1}]\le & {} E\left\{ \alpha ^{\tau }\left[ \phi (T,Z,\beta _0)-\psi (T,Z,\beta _0)\right] \right\} ^{2}\\= & {} E\left\{ \int [\alpha ^{\tau }\psi (T,Z^{\tau }\beta _0-b\xi ) -\alpha ^{\tau }\psi (T,Z,\beta _0)]\mathrm{d}L(\xi ) \right\} ^{2}\\\le & {} \int E \left\{ \alpha ^{\tau }\psi (T,Z^{\tau }\beta _0-b\xi ) -\alpha ^{\tau }\psi (T,Z,\beta _0)\right\} ^{ 2}\mathrm{d}L(\xi ). \end{aligned}$$

From the assumptions on \(F_z(\cdot )\) and \(L(\cdot )\), we have

$$\begin{aligned} \hbox {Var}[\alpha ^{\tau }I_{1}]=O_{p}(b). \end{aligned}$$
(21)

By (20) and (21) we can get \(I_{1}=o_{p}(1)\). Now, we consider \(I_{2}\).

By the martingale property, we have \(EI_{2}=0\), and the kth diagonal element of the variance of \(I_{2}\) is given by

$$\begin{aligned} E(I_{2})_{kk}^2= & {} E\int _0^{\tau }[\tilde{\phi }_k(T,Z,\beta _0)-\tilde{G}_{k}(\beta _0,u)]^{2} I(T\ge u)\frac{\lambda ^c(u)}{K(u)} \mathrm{d}u\\\le & {} \int _0^{\tau }[E\tilde{\phi }^2_k(T,Z,\beta _0)+\tilde{G}^{2}_{k}(\beta _0,u)S(u)] \frac{\lambda ^c(u)}{K(u)} \mathrm{d}u. \end{aligned}$$

By the results established for \(I_1\), we know that

$$\begin{aligned} \begin{aligned} E\tilde{\phi }^2_k(T,Z,\beta _0)&=o_{p}(1),\\ \tilde{G}^{2}_{k}(\beta _0,u)S(u)&=1/S(u)\left\{ E\tilde{\phi }_k(T,Z,\beta _0)I(T\ge u)\right\} ^{2}\\&\le 1/S(u)E\left\{ \tilde{\phi }_k(T,Z,\beta _0)I(T\ge u)\right\} ^{2}. \end{aligned} \end{aligned}$$
(22)

Using (22), we have

$$\begin{aligned} \tilde{G}^{2}_{k}(\beta _0,u)S(u)=o_{p}(1). \end{aligned}$$
(23)

By (22) and (23) we have that \(I_{2}=o_{p}(1)\). Combining this with \(I_{1}=o_{p}(1)\), we complete the proof of Lemma 2. \(\square \)

Lemma 3

Suppose that Assumptions 1–3 and 6 are satisfied. Then,

$$\begin{aligned} E\left\{ \frac{\phi (T,Z,\beta )^{\otimes 2}}{K(T)} \right\} =E\left\{ \frac{\psi (T,Z,\beta )^{\otimes 2}}{K(T)}\right\} +o(1). \end{aligned}$$

Proof

Note that \(\{\frac{1}{2}-I(A)\}^{2}=\frac{1}{4}\) for any event A, since \(I(A)^{2}=I(A)\). Hence,

$$\begin{aligned} E\left\{ \frac{\psi (T,Z,\beta )^{\otimes 2}}{K(T)}\right\}= & {} E \frac{\left\{ Z\left( \frac{1}{2}-I(\log T-Z^{\tau }\beta \le 0)\right) \right\} ^{\otimes 2}}{K(T)}=\frac{1}{4}E\frac{Z^{\otimes 2}}{K(T)},\\ E\left\{ \frac{\phi (T,Z,\beta )^{\otimes 2}}{K(T)}\right\}= & {} E\left\{ \frac{1}{K(T)}\left\{ \int _{-\infty }^{+\infty }\psi (T,Z,\beta -bw)\mathrm{d}L(w)\right\} ^{\otimes 2}\right\} \\= & {} \int \int E\frac{Z^{\otimes 2}}{K(T)}\left( \frac{1}{2}-I(\log T-Z^{\tau }\beta +bu\le 0)\right) \\&\times \left( \frac{1}{2}-I(\log T-Z^{\tau }\beta +bv\le 0)\right) \mathrm{d}L(u)\mathrm{d}L(v)\\= & {} \frac{1}{4}E\frac{Z^{\otimes 2}}{K(T)}+o(1). \end{aligned}$$

\(\square \)

Lemma 4

Suppose that Assumptions 1–3 and 6 are satisfied. Then,

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\left\{ \frac{ \phi (X_i,Z_i,\beta )\varDelta _i}{\widehat{K}(X_i)}\right\} ^{\otimes 2}= & {} E\left\{ \frac{\psi (T,Z,\beta )^{\otimes 2}}{K(T)}\right\} +o_p(1),\nonumber \\ \frac{1}{n}\sum _{i=1}^n \frac{\nabla _\beta \phi (X_i,Z_i,\beta )\varDelta _i}{\widehat{K}(X_i)}= & {} \nabla _\beta E\psi (T,Z,\beta )+o_p(1). \end{aligned}$$
(24)

Proof

First, we will prove that

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\left\{ \frac{ \phi (X_i,Z_i,\beta )\varDelta _i}{\widehat{K}(X_i)}\right\} ^{\otimes 2}=O_p(1). \end{aligned}$$
(25)

Note that

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\left\{ \frac{ \phi (X_i,Z_i,\beta )\varDelta _i}{\widehat{K}(X_i)}\right\} ^{\otimes 2}\le & {} \frac{2}{n}\sum _{i=1}^n\left\{ \frac{\phi (X_i,Z_i,\beta )\varDelta _i}{K(X_i)}\right\} ^{\otimes 2}\nonumber \\&+\frac{2}{n}\sum _{i=1}^n\left\{ \frac{K(X_i)-\widehat{K}(X_i)}{\widehat{K}(X_i)K(X_i)}\right\} ^2 \phi (X_i,Z_i,\beta )^{\otimes 2}\varDelta _i^2\nonumber \\\triangleq & {} J_1+J_2. \end{aligned}$$
(26)

By the law of large numbers, we have

$$\begin{aligned} J_1=2E\left\{ \frac{\phi (T,Z,\beta )^{\otimes 2}}{K(T)}\right\} +o_p(1). \end{aligned}$$
(27)

In addition,

$$\begin{aligned} J_2\le \sup _{0\le x\le X_{(n)}}\biggl |\frac{K(x)-\widehat{K}(x)}{\widehat{K}(x)}\biggr |^2 \frac{2}{n}\sum _{i=1}^n\left\{ \frac{ \phi (X_i,Z_i,\beta )\varDelta _i}{K(X_i)}\right\} ^{\otimes 2}. \end{aligned}$$

Using the fact from Zhou (1991) that

$$\begin{aligned} \sup _{0\le x\le X_{(n)}}\biggl |\frac{K(x)-\widehat{K}(x)}{\widehat{K}(x)}\biggr |=O_p(1), \end{aligned}$$

we have

$$\begin{aligned} J_{2}=O_{p}(J_{1}). \end{aligned}$$
(28)

By (26), (27), (28) and Lemma 3, we get (25).

Similarly to (27), we can get

$$\begin{aligned}&\frac{1}{n}\sum _{i=1}^n\left\{ \frac{\phi (X_i,Z_i,\beta )\varDelta _i}{\widehat{K}(X_i)}\right\} ^{\otimes 2} =\frac{1}{n}\sum _{i=1}^n\left\{ \frac{\phi (X_i,Z_i,\beta )\varDelta _i}{K(X_i)}\right\} ^{\otimes 2}\nonumber \\&\qquad +\frac{1}{n}\sum _{i=1}^n\left\{ \frac{K(X_i)-\widehat{K}(X_i)}{\widehat{K}(X_i)K(X_i)}\right\} ^2 \phi (X_i,Z_i,\beta )^{\otimes 2}\varDelta _i^2\nonumber \\&\qquad +\frac{2}{n}\sum _{i=1}^n\frac{K(X_i)-\widehat{K}(X_i)}{K^2(X_i)\widehat{K}(X_i)} \phi (X_i,Z_i,\beta )^{\otimes 2}\varDelta _i^2\nonumber \\&\quad \triangleq I_1+I_2+I_3. \end{aligned}$$
(29)

By the law of large numbers and Lemma 3, we have

$$\begin{aligned} I_1=E\left\{ \frac{\psi (T,Z,\beta )^{\otimes 2}}{K(T)}\right\} +o_p(1). \end{aligned}$$
(30)

As for \(I_2\), notice that

$$\begin{aligned} I_2\le \sup _{0\le x\le X_{(n)}}\biggl |\frac{K(x)-\widehat{K}(x)}{K(x)}\biggr |^2 \frac{2}{n}\sum _{i=1}^n\left\{ \frac{ \phi (X_i,Z_i,\beta )\varDelta _i}{\widehat{K}(X_i)}\right\} ^{\otimes 2}. \end{aligned}$$

Using the fact from Gill (1980, p. 37) that

$$\begin{aligned} \sup _{0\le x\le X_{(n)}}\biggl |\frac{K(x)-\widehat{K}(x)}{K(x)}\biggr |=o_p(1), \end{aligned}$$

and (25) we get \(I_{2}=o_{p}(1)\).

Now, we consider the third part \(I_{3}\)

$$\begin{aligned} I_3\le & {} \sup _{0\le x\le X_{(n)}}\biggl |\frac{K(x)-\widehat{K}(x)}{K(x)}\biggr | \frac{2}{n}\sum _{i=1}^n\frac{\{\phi (X_i,Z_i,\beta )\varDelta _i\}^{\otimes 2}}{\widehat{K}(X_i)K(X_i)}, \end{aligned}$$

and

$$\begin{aligned}&\frac{2}{n}\sum _{i=1}^n\frac{\{ \phi (X_i,Z_i,\beta )\varDelta _i\}^{\otimes 2}}{\widehat{K}(X_i)K(X_i)} \\&\quad =\frac{2}{n}\sum _{i=1}^n\left\{ \frac{ \phi (X_i,Z_i,\beta )\varDelta _i}{ K(X_i)}\right\} ^{\otimes 2} +\frac{2}{n}\sum _{i=1}^n\frac{\{K(X_i)-\widehat{K}(X_i)\} \phi (X_i,Z_i,\beta )^{\otimes 2}\varDelta _i^2}{\widehat{K}(X_i)K^2(X_i-)} \\&\quad \le \frac{2}{n}\sum _{i=1}^n\left\{ \frac{ \phi (X_i,Z_i,\beta )\varDelta _i}{ K(X_i)}\right\} ^{\otimes 2} + \sup _{0\le x\le X_{(n)}}\biggl |\frac{K(x)- \widehat{K}(x)}{\widehat{K}(x)}\biggr |\frac{2}{n}\sum _{i=1}^n \left\{ \frac{ \phi (X_i,Z_i,\beta )\varDelta _i}{ K(X_i)}\right\} ^{\otimes 2}. \end{aligned}$$

Using (21), similarly to the proof of \(J_2\), we have that \(I_3=o_p(1)\). So, we complete the proof of the first result of Lemma 4.

Now, we consider the second result of Lemma 4. Note that

$$\begin{aligned}&\frac{1}{n}\sum _{i=1}^n\left\{ \nabla _{\beta }\phi (X_i,Z_i,\beta )\frac{\varDelta _i}{\widehat{K}(X_i)}\right\} =\frac{1}{n}\sum _{i=1}^n\left\{ \frac{\nabla _{\beta }\phi (X_i,Z_i,\beta )\varDelta _i}{K(X_i)}\right\} \nonumber \\&\qquad +\frac{1}{n}\sum _{i=1}^n\left\{ \frac{K(X_i)-\widehat{K}(X_i)}{\widehat{K}(X_i)K(X_i)}\right\} \nabla _{\beta }\phi (X_i,Z_i,\beta )\varDelta _i \triangleq K_1+K_2, \end{aligned}$$
(31)

we have

$$\begin{aligned}&E\left\{ \frac{\nabla _{\beta }\phi (X_i,Z_i,\beta )\varDelta _i}{K(X_i)}\right\} =E[\nabla _{\beta }\phi (T,Z,\beta )]\\&\quad =\frac{\partial }{\partial {\beta }}\int _{-\infty }^{+\infty } \rho (Z^{\tau }\beta -bw)\mathrm{d}L(w) =\nabla _{\beta }\rho (\beta )+o(1), \end{aligned}$$

where \(\rho (\beta )\equiv E\psi (T,Z^{\tau }\beta )\). So, by the law of large numbers,

$$\begin{aligned} K_1=\nabla _{\beta }\rho (\beta )+o_p(1). \end{aligned}$$
(32)

Using the fact from Gill (1980, p. 37) again,

$$\begin{aligned} |K_2| \le \sup _{0\le x\le X_{(n)}}\biggl |\frac{K(x)-\widehat{K}(x)}{K(x)}\biggr | \frac{1}{n}\sum _{i=1}^n\left| \frac{ \nabla _{\beta }\phi (X_i,Z_i,\beta )\varDelta _i}{\widehat{K}(X_i)}\right| =o_p(1). \end{aligned}$$
(33)

Combining (32) and (33), we complete the proof of the second result of Lemma 4.   \(\square \)

Lemma 5

Suppose that Assumptions 1–3 and 6 are satisfied. Then, for any \(\beta \) in \(\{\beta : ||\beta -\beta _0||\le cn^{-\varrho }\}\), where \(1/3<\varrho <1/2\) and c is some constant, we have

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^nV_{ni}\phi (X_i,Z_i,\beta ) =\frac{1}{n}\sum _{i=1}^nV_{ni}\phi (X_i,Z_i,\beta _0)+O_p(n^{-\varrho }). \end{aligned}$$
(34)

Proof

The result follows directly from a Taylor expansion. \(\square \)

Lemma 6

Suppose that Assumptions 1–6 are satisfied. Then \(\lambda (\beta )=O_p(n^{-\varrho })\) uniformly on \(\{\beta : ||\beta -\beta _0||\le cn^{-\varrho }\}\), where \(1/3<\varrho <1/2\) and c is some constant, and

$$\begin{aligned} \lambda (\beta )= \left[ \frac{1}{n}\sum _{i=1}^nW_{ni}(\beta )W_{ni}(\beta )^{\tau }\right] ^{-1} \left[ \frac{1}{n}\sum _{i=1}^n W_{ni}(\beta )\right] +o_p(n^{-\varrho }), \end{aligned}$$
(35)

uniformly on \(\{\beta : ||\beta -\beta _0||\le cn^{-\varrho }\},\) where \(\lambda (\beta )\) satisfies (10).

Proof

By assumptions and the proof of Lemma 3 in Owen (1990), \(\max _{1\le i\le n}|V^{(0)}_{ni}\phi (T_i,Z_i,\beta )|=o_p(n^{1/3})\), where \(V^{(0)}_{ni}=\varDelta _i/K(X_i)\), so we have:

$$\begin{aligned} \max _{1\le i\le n}| W_{ni}(\beta )|\le & {} \max _{1\le i\le n}| V_{ni}^{(0)}\phi (T_i,Z_i,\beta )|\!+\! \max _{1\le i\le n}\biggl |\frac{\varDelta _i(\widehat{K}(T_i)\!-\!K(T_i)) \phi (T_i,Z_i,\beta )}{\widehat{K}(T_i)K(T_i)}\biggr |\\\le & {} o_p(n^{1/3})+\sup _{0\le x\le X_{(n)}}\biggl |\frac{\widehat{K}(x)-K(x)}{\widehat{K}(x)}\biggr | \max _{1\le i\le n}| V_{ni}^{(0)}\phi (T_i,Z_i,\beta )|, \end{aligned}$$

where \(X_{(n)}\) is the largest order statistic. Using the following equality (Zhou 1991),

$$\begin{aligned} \sup _{0\le x\le X_{(n)}}\biggl |\frac{K(x)-\widehat{K}(x)}{\widehat{K}(x)}\biggr |=O_p(1), \end{aligned}$$

we have

$$\begin{aligned} \max _{1\le i\le n}|W_{ni}(\beta )|=o_p(n^{1/3}). \end{aligned}$$
(36)

Using Lemma 5 and arguing as in the proof of Lemma 3 in Owen (1990), we have \(\lambda (\beta )=O_p(n^{-\varrho })\). Using Eq. (10),

$$\begin{aligned} 0=\frac{1}{n}\sum _{i=1}^nW_{ni}(\beta )\left( 1-Y_i+\frac{Y_i^2}{1+Y_i}\right) , \end{aligned}$$

where \(Y_i=\lambda (\beta )^{\tau }W_{ni}(\beta )\), and

$$\begin{aligned} \max _{1\le i\le n}|Y_{i}|\le \Vert \lambda (\beta )\Vert \max _{1\le i\le n}|W_{ni}(\beta )|=o_p(1). \end{aligned}$$

So we have

$$\begin{aligned} \lambda (\beta )= \left[ \frac{1}{n}\sum _{i=1}^nW_{ni}(\beta )W_{ni}(\beta )^{\tau }\right] ^{-1} \left[ \frac{1}{n}\sum _{i=1}^n W_{ni}(\beta )\right] +\beta _n, \end{aligned}$$

where

$$\begin{aligned} \Vert \beta _n\Vert \le \frac{1}{n}\sum _{i=1}^n\parallel W_{ni}(\beta )\parallel ^{3}\parallel \lambda (\beta )\parallel ^{2} \mid 1+Y_i\mid ^{-1}=o_{p}(n^{-\varrho }). \end{aligned}$$

Thus, we complete the proof of Lemma 6. \(\square \)

Lemma 7

Suppose that Assumptions 1–6 hold. Then, as \(n\rightarrow \infty \), with probability 1, \(\mathcal{R}(\beta )\) attains its minimum value at some point \(\widehat{\beta }_{e}\) in the interior of the ball \(\Vert \beta -\beta _0\Vert \le cn^{-\varrho }\), with \(\widehat{\beta }_{e}\) and \(\widehat{\lambda }=\lambda (\widehat{\beta }_e)\) satisfying

$$\begin{aligned} Q_{1n}(\widehat{\beta }_e,\widehat{\lambda })=0,\quad Q_{2n}(\widehat{\beta }_e,\widehat{\lambda })=0, \end{aligned}$$

where

$$\begin{aligned}&Q_{1n}( {\beta }, {\lambda }) =\frac{1}{n}\sum _{i=1}^n \frac{V_{ni}\phi (T_i,Z_i,\beta )}{1+\lambda ^{\tau }V_{ni}\phi (T_i,Z_i,\beta )},\\&Q_{2n}( {\beta }, {\lambda }) =\frac{1}{n}\sum _{i=1}^n\frac{\lambda ^{\tau }V_{ni}\nabla _{\beta }\phi (T_i,Z_i,\beta )}{1+\lambda ^{\tau }V_{ni}\phi (T_i,Z_i,\beta )}. \end{aligned}$$

Proof

Similar to the proof of Lemma 1 in Qin and Lawless (1994). \(\square \)

Proof of Theorem 1

Given Lemmas 1–7, the proof of Theorem 1 can be constructed along the lines of Theorem 1 of Qin and Lawless (1994). Here, we only give a sketch of the proof. It is easy to show that

$$\begin{aligned} \left( \begin{array}{c} \widehat{\lambda } \\ \widehat{\beta }_e-\beta _0 \end{array} \right) =S_n^{-1}\left( \begin{array}{c} -Q_{1n}(\beta _0,0)+o_p(\delta _n) \\ o_p(\delta _n) \end{array} \right) , \end{aligned}$$
(37)

where \(\delta _n=\Vert \widehat{\beta }_e-\beta _0\Vert +\Vert \widehat{\lambda }\Vert \) and

$$\begin{aligned} S_n=\left( \begin{array}{c@{\quad }c} \nabla _{\lambda } Q_{1n} &{} \nabla _{\beta }Q_{1n} \\ \nabla _{\lambda } Q_{2n} &{} 0 \end{array} \right) _{(\beta _0,0)} \rightarrow \left( \begin{array}{c@{\quad }c} -B &{} A \\ A^{\tau } &{} 0 \end{array} \right) . \end{aligned}$$

From Lemma 2, we have \(Q_{1n}(\beta _0,0)=\frac{1}{n}\sum _{i=1}^nV_{ni}\phi (T_i,Z_i,\beta _0) =O_p(n^{-1/2})\). So, we know that \(\delta _n=O_p(n^{-1/2})\). It then follows easily that

$$\begin{aligned} \sqrt{n}(\widehat{\beta }_e-\beta _0)=S_{22.1}^{-1}A^{\tau }B^{-1} \sqrt{n}Q_{1n}(\beta _0,0) +o_p(1)\mathop {\longrightarrow }\limits ^{\fancyscript{D}} N(0,V). \end{aligned}$$
(38)

\(\square \)

Proof of Theorem 2

The log empirical likelihood ratio multiplied by \(-2\) is given by

$$\begin{aligned} \mathcal{R}(\beta )= & {} 2\sum _{i=1}^n\log \{1+\lambda (\beta )^{\tau }W_{ni}(\beta )\}\nonumber \\= & {} 2\sum _{i=1}^n\left\{ \lambda (\beta )^{\tau }W_{ni}(\beta )- \frac{1}{2}(\lambda (\beta )^{\tau }W_{ni}(\beta ))^{2}\right\} +\gamma _{n}\nonumber \\= & {} \left[ \frac{1}{\sqrt{n}}\sum _{i=1}^n W_{ni}(\beta )\right] ^{\tau } \left[ \frac{1}{n}\sum _{i=1}^nW_{ni}(\beta )W_{ni}(\beta )^{\tau }\right] ^{-1} \left[ \frac{1}{\sqrt{n}}\sum _{i=1}^n W_{ni}(\beta )\right] +\gamma _{n},\nonumber \\ \end{aligned}$$
(39)

where

$$\begin{aligned} |\gamma _{n}|\le \Vert \lambda (\beta )\Vert ^3\sum _{i=1}^n\Vert W_{ni}(\beta )\Vert ^3 =o_p(1). \end{aligned}$$

From Lemma 2, we have

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum _{i=1}^nW_{ni}(\beta _0) \mathop {\rightarrow }\limits ^{\mathcal{D}} N(\mathbf{0},\varSigma (\beta _0)). \end{aligned}$$
(40)

In addition, by virtue of Lemma 4,

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^nW_{ni}(\beta _0)W_{ni}(\beta _0)^{\tau } \mathop {\rightarrow }\limits ^{{p}}B. \end{aligned}$$
(41)

So, using (39)–(41), we complete the proof of Theorem 2. \(\square \)

Proof of Corollary 1

$$\begin{aligned} \mathcal{R}_2= & {} \mathcal{R}(\beta _{1,0}, \tilde{\beta }_{2,0}) - \mathcal{R}(\widehat{\beta }_{1}, \widehat{\beta }_{2})\nonumber \\= & {} \left[ \varSigma (\beta _0)^{-1/2}\sqrt{n}Q_{1n}(\beta _0, 0)\right] ^{\tau }\varSigma (\beta _0)^{1/2}B^{-1/2}\nonumber \\&\times \left\{ \left( E\frac{\partial W_{ni}}{\partial \beta }\right) \left[ \left( E\frac{\partial W_{ni}}{\partial \beta }\right) ^{\tau }(EW_{ni}W_{ni}^{\tau })^{-1}\left( E\frac{\partial W_{ni}}{\partial \beta }\right) \right] ^{-1}\left( E\frac{\partial W_{ni}}{\partial \beta }\right) ^{\tau }\right. \nonumber \\&\left. -\left( E\frac{\partial W_{ni}}{\partial \beta _2}\right) \left[ \left( E\frac{\partial W_{ni}}{\partial \beta _2}\right) ^{\tau }(EW_{ni}W_{ni}^{\tau })^{-1}\left( E\frac{\partial W_{ni}}{\partial \beta _2}\right) \right] ^{-1}\left( E\frac{\partial W_{ni}}{\partial \beta _2}\right) ^{\tau }\right\} \nonumber \\&\times B^{-1/2}\varSigma (\beta _0)^{1/2}\left[ \varSigma (\beta _0)^{-1/2}\sqrt{n}Q_{1n}(\beta _0, 0)\right] + o_{p}(1). \end{aligned}$$
(42)

Denote

$$\begin{aligned} B_2= & {} B^{-1/2}\left\{ \left( E\frac{\partial W_{ni}}{\partial \beta }\right) \left[ \left( E\frac{\partial W_{ni}}{\partial \beta }\right) ^{\tau } (EW_{ni}W_{ni}^{\tau })^{-1}\left( E\frac{\partial W_{ni}}{\partial \beta }\right) \right] ^{-1}\left( E\frac{\partial W_{ni}}{\partial \beta }\right) ^{\tau }\right. \\&\left. -\left( E\frac{\partial W_{ni}}{\partial \beta _2}\right) \left[ \!\left( \!E\frac{\partial W_{ni}}{\partial \beta _2}\!\right) ^{\tau }(EW_{ni}W_{ni}^{\tau })^{-1}\left( \!E\frac{\partial W_{ni}}{\partial \beta _2}\!\right) \!\right] ^{-1}\left( \!E\frac{\partial W_{ni}}{\partial \beta _2}\!\right) ^{\tau }\!\right\} B^{-1/2}. \end{aligned}$$

Similarly to Corollary 5 in Qin and Lawless (1994), \(B_2\) is a non-negative definite matrix, and Corollary 1 then follows easily from Lemma 3 in Qin and Jing (2001). \(\square \)

Proof of Theorem 3

It is easy to show that \(\widehat{\beta }_g\) is a consistent estimator of \(\beta _0\) (see, for example, Newey and McFadden 1994, chapter 36, p 2132). By the assumptions, the first-order condition

$$\begin{aligned} 2A_n(\widehat{\beta }_g)^{\tau }\widehat{W} \frac{1}{n}\sum _{i=1}^nW_{ni}(\widehat{\beta }_g)=0, \end{aligned}$$

is satisfied with probability approaching one. Expanding \(W_{ni}(\widehat{\beta }_g)\) around \(\beta _0\) and multiplying through by \(\sqrt{n}\), we have

$$\begin{aligned} \sqrt{n}(\widehat{\beta }_g-\beta _0)=-\left[ A_n(\widehat{\beta }_g)^{\tau }\widehat{W}A_n(\bar{\beta })\right] ^{-1} A_n(\widehat{\beta }_g)^{\tau }\widehat{W}\sqrt{n}\frac{1}{n}\sum _{i=1}^nW_{ni}(\beta _0), \end{aligned}$$

where \(\bar{\beta }\) lies between \(\widehat{\beta }_g\) and \(\beta _0\). By Assumption 4 and Lemma 4,

$$\begin{aligned} A_n(\widehat{\beta }_g)\rightarrow ^{p}A,\quad A_n(\bar{\beta })\rightarrow ^{p}A. \end{aligned}$$

Thus, we have \(\left[ A_n(\widehat{\beta }_g)^{\tau }\widehat{W}A_n(\bar{\beta })\right] ^{-1}\! A_n(\widehat{\beta }_g)^{\tau }\widehat{W}\rightarrow ^{p}(A^{\tau }WA)^{-1}A^{\tau }W\). The conclusion then follows by Slutsky's theorem. \(\square \)