1 Introduction

Truncated data are commonly seen in studies of biomedicine, epidemiology, astronomy and econometrics. Such data occur when the variables of interest can be observed only if their values satisfy certain criteria. Left truncation refers to the situation in which samples are available only when the variable of interest exceeds some threshold value. An example of such data is the lifetime data collected from the Channing house retirement community in Palo Alto, California (Hyde 1980). The data record residents’ lifetimes (age at death) and gender, subject to the criterion that a resident had to live long enough to enter the retirement center. Thus, the entry age serves as the left-truncation variable. The data also include right censoring due to subjects’ withdrawal or the end-of-study effect.

Most methods for analyzing left-truncated and right-censored data have been developed under the assumption that the lifetime variable is independent of both the truncation and censoring variables. In contrast to independent censorship, which is not a testable condition, there exist several nonparametric tests for verifying the quasi-independence assumption on the truncation variable (Tsai 1990; Jones and Crowley 1992; Chen et al. 1996; Martin and Betensky 2005; Emura and Wang 2010; Uña-Álvarez 2012; Rodríguez Girondo and de Uña-Álvarez 2012). These articles reveal an interesting phenomenon: the null hypothesis of quasi-independence is often rejected in common real-world examples. This further implies that the sampling selection criterion itself contains information about the variable of interest.

In this article, we propose a semiparametric accelerated failure time (AFT) model that incorporates both covariates and the truncation variable as regressors. As an important special case, the model includes the AFT model of Lai and Ying (1991), in which the lifetime variable is conditionally independent of the truncation variable given covariates. A major goal of the proposed model is to allow dependent truncation, thereby utilizing the truncation information in statistical modeling.

The article is organized as follows. In Sect. 2, we present the proposed model and inference methods. In Sect. 3, we derive asymptotic properties of the proposed estimator and suggest methods for variance estimation. Simulations and data analysis are provided in Sects. 4 and 5, respectively. Concluding remarks are given in Sect. 6. Technical proofs are provided in the appendix.

2 Proposed methodology

Let \(Y^*\) be the logarithm of the lifetime. Similarly, let T and C be the logarithms of the left-truncation time and right-censoring time, respectively. Also, let \({\mathbf{X}}\) be a p-dimensional covariate vector without the intercept term. To avoid the trivial case, we assume \(\text{ Pr }(T\le Y^*)>0\). We also impose the usual assumption that left truncation precedes right censoring with probability one, i.e., \(\text{ Pr }(T\le C)=1\). This assumption includes Type I censoring of the form \(C=T+\tau \) for some \(\tau >0\), which often occurs in left-truncated data (Uña-Álvarez 2010).

In left-truncated and right-censored data, observed variables include \((T,\,Y,\,\Delta ,\,{\mathbf{X}})\) subject to \(T\le Y^*\), where \(Y=Y^*\wedge C\), \(\Delta =I(Y^*\le C)\), \(a \,\wedge \, b=\min (a,b)\) and I(.) is the indicator function. Left-truncated and right-censored data consist of random replications of \((T,\,Y,\,\Delta ,\,{\mathbf{X}})\), denoted as \((T_i ,\,Y_i ,\,\Delta _i,\,{\mathbf{X}}_i )\), subject to \(T_i \le Y_i \), for \(i=1,\ldots ,n\). Under this sampling scheme, the cumulative distribution function of \((T,\,Y,\,\Delta ,\,{\mathbf{X}})\) is

$$\begin{aligned} F(t,\,y,\,\delta ,\,{\mathbf{x}})=\Pr (T\le t,\,Y^*\wedge C\le y,\,I(Y^*\le C)\le \delta ,\,{\mathbf{X}}\le {\mathbf{x}}\vert T\le \,Y^*). \end{aligned}$$

Our objective is to utilize the information of both \({\mathbf{X}}\) and T, if dependent truncation exists, in modeling the behavior of \(Y^*\).

2.1 Preliminary

Our proposal was motivated by the work of Lai and Ying (1991) who considered the following AFT model:

$$\begin{aligned} Y^*={{\varvec{\beta }'}}_0 {\mathbf{X}}+\varepsilon , \end{aligned}$$
(1)

where \(\varepsilon \) is the error term with an unspecified density \(f_\varepsilon (\cdot )\). The p-dimensional parameter \({\varvec{\beta }}_0 \) describes the covariate effect on the log-lifetime \(Y^*\). Traditionally, it is assumed that \(\varepsilon \) is independent of \((T,\,C,\,{\mathbf{X}})\) (Lai and Ying 1991). It follows that \(Y^*\) and \((T,\,C)\) are conditionally independent given \({\mathbf{X}}\). This implies that, after adjusting for the effect of \({\mathbf{X}}\), the truncation T contains no information about the lifetime \(Y^*\). Accordingly, Lai and Ying (1991) proposed a rank-based estimation approach based on left-truncated and right-censored data \((T_i,\,Y_i,\,\Delta _i,\,{\mathbf{X}}_i )\), subject to \(T_i \le Y_i \), for \(i=1,\ldots ,n\).

We revisit the AFT regression procedure of Lai and Ying (1991). Note that the residual lifetime \(e_i^Y ({\varvec{\beta }})=Y_i -{{\varvec{\beta }'X}}_i \) is left truncated by \(e_i^T ({\varvec{\beta }})=T_i -{{\varvec{\beta }'X}}_i \). In terms of the residual scale, \(R_i ({\varvec{\beta }})=\sum \nolimits _j {I\{e_j^T ({\varvec{\beta }})\le e_i^Y ({\varvec{\beta }})\le e_j^Y ({\varvec{\beta }})\}} \) represents the number at risk for subject i. Lai and Ying (1991) proposed the log-rank type estimating function

$$\begin{aligned} {\mathbf{U}}_n ({\varvec{\beta }})=\sum \limits _{i=1}^n {\Delta _i \phi _i ({\varvec{\beta }})\left[ {{\mathbf{X}}_i -\frac{1}{R_i ({\varvec{\beta }})}\sum \limits _j {{\mathbf{X}}_j I\{e_j^T ({\varvec{\beta }})\le e_i^Y ({\varvec{\beta }})\le e_j^Y ({\varvec{\beta }})\}} } \right] }, \end{aligned}$$

where \(\phi _i ({\varvec{\beta }})\) is a weight function. With the Gehan weight \(\phi _i ({\varvec{\beta }})=R_i ({\varvec{\beta }})\), it is easy to see that \({\mathbf{U}}_n ({\varvec{\beta }})\) is equal to

$$\begin{aligned} {\mathbf{U}}_n^G ({\varvec{\beta }})=\sum \limits _{i=1}^n {\sum \limits _{j=1}^n {\Delta _i ({\mathbf{X}}_i -{\mathbf{X}}_j )} I\{e_j^T ({\varvec{\beta }})\le e_i^Y ({\varvec{\beta }})\le e_j^Y ({\varvec{\beta }})\}}. \end{aligned}$$

Under the special case of no truncation (i.e., \(e_j^T ({\varvec{\beta }})=-\infty )\), the form of \({\mathbf{U}}_n^G ({\varvec{\beta }})\) reduces to the usual Gehan-type estimating function (Section 7.4.3 of Kalbfleisch and Prentice 2002) which is monotone in each component of \({\varvec{\beta }}\). This useful property has been applied by Jin et al. (2003) to solve the estimating equation using a linear programming method and construct a resampling-based variance estimator.

To facilitate further discussions, we derive an alternative expression for \({\mathbf{U}}_n^G ({\varvec{\beta }})\). Define \(\mathrm{sgn}(x)\) as -1, 0, or 1 if \(x<0\), \(x=0\), or \(x>0\), respectively. After some calculations provided in Appendix A1, we get

$$\begin{aligned} {\mathbf{U}}_n^G ({\varvec{\beta }})=-\sum \limits _{i<j} {({\mathbf{X}}_i -{\mathbf{X}}_j){\text{ sgn }}\{e_i^Y ({\varvec{\beta }})-e_j^Y ({\varvec{\beta }})\} I\{{\mathop {e}\limits ^{\smile }}_{ij}^T ({\varvec{\beta }})\le \tilde{e}_{ij}^Y ({\varvec{\beta }})\}} Q_{ij} ({\varvec{\beta }}), \end{aligned}$$
(2)

where \({\mathop {e}\limits ^{\smile }}_{ij}^T ({\varvec{\beta }})=e_i^T ({\varvec{\beta }})\vee e_j^T ({\varvec{\beta }})\), \(a\vee b\equiv \max (a,b)\), \(\tilde{e}_{ij}^Y ({\varvec{\beta }})=e_i^Y ({\varvec{\beta }})\wedge e_j^Y ({\varvec{\beta }})\), and

$$\begin{aligned} Q_{ij} ({\varvec{\beta }})= & {} \Delta _i \Delta _j +\Delta _i (1-\Delta _j \,)I\{e_i^Y ({\varvec{\beta }})<e_j^Y ({\varvec{\beta }})\}\\&+\,(1-\Delta _i)\Delta _j I\left\{ e_i^Y ({\varvec{\beta }})>e_j^Y ({\varvec{\beta }})\right\} . \end{aligned}$$

Here, \(Q_{ij} ({\varvec{\beta }})\) indicates whether a pair \((i,\,j)\) is orderable and \(I\{{\mathop {e}\limits ^{\smile }}_{ij}^T ({\varvec{\beta }})\le \tilde{e}_{ij}^Y ({\varvec{\beta }})\}\) indicates whether the pair is comparable (Martin and Betensky 2005). The new expression in Eq. (2) is a U-statistic which will be used in subsequent discussions.
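The equivalence between the double-sum form of \({\mathbf{U}}_n^G ({\varvec{\beta }})\) and the pairwise expression in Eq. (2) can also be checked numerically. The following Python sketch simulates a toy left-truncated, right-censored dataset and evaluates both forms; the data-generating setup and all names are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy left-truncated, right-censored sample (illustrative setup):
# Y* = beta0*X + error, observed only when T <= Y*, censored at C = T + tau.
N = 400
X0 = rng.binomial(1, 0.5, N).astype(float)
T0 = rng.normal(-1.0, 1.0, N)
Ystar = 1.0 * X0 + rng.normal(0.0, 1.0, N)
C0 = T0 + 2.5
keep = T0 <= Ystar                      # truncation: keep only T <= Y*
X, T, C = X0[keep], T0[keep], C0[keep]
Y = np.minimum(Ystar[keep], C)
Delta = (Ystar[keep] <= C).astype(float)

def U_double_sum(beta):
    """Gehan form: sum_i sum_j Delta_i (X_i - X_j) I{e_j^T <= e_i^Y <= e_j^Y}."""
    eY, eT = Y - beta * X, T - beta * X
    comp = (eT[None, :] <= eY[:, None]) & (eY[:, None] <= eY[None, :])
    return np.sum(Delta[:, None] * (X[:, None] - X[None, :]) * comp)

def U_pairwise(beta):
    """Eq. (2): -sum_{i<j} (X_i - X_j) sgn(e_i^Y - e_j^Y)
       I{max(e_i^T, e_j^T) <= min(e_i^Y, e_j^Y)} Q_ij."""
    eY, eT = Y - beta * X, T - beta * X
    i, j = np.triu_indices(len(Y), k=1)
    comparable = np.maximum(eT[i], eT[j]) <= np.minimum(eY[i], eY[j])
    Q = (Delta[i] * Delta[j]
         + Delta[i] * (1 - Delta[j]) * (eY[i] < eY[j])
         + (1 - Delta[i]) * Delta[j] * (eY[i] > eY[j]))
    return -np.sum((X[i] - X[j]) * np.sign(eY[i] - eY[j]) * comparable * Q)

for b in (0.0, 0.5, 1.0):
    assert np.isclose(U_double_sum(b), U_pairwise(b))
```

The two forms agree exactly (ties among the residuals occur with probability zero under a continuous error distribution), which mirrors the derivation in Appendix A1.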

2.2 Proposed model

We consider relaxing the independent truncation assumption by utilizing the information of T in the following regression model:

$$\begin{aligned} Y^*={{\varvec{\beta }'}}_0 {\mathbf{X}}+\gamma _0 T+\varepsilon , \end{aligned}$$
(3)

where \(\varepsilon \) is an error term with an unspecified density function \(f_\varepsilon (\cdot )\). We assume that \(\varepsilon \) and \((T,\,C,\,{\mathbf{X}})\) are independent. Note that the parameter \(\gamma _0 \) explicitly measures the dependency of \(Y^*\) on T, conditional on the covariate \({\mathbf{X}}\), such that

$$\begin{aligned} \gamma _0 =\frac{{\text{ Cov }}(Y^*,\,T\,\vert \,{\mathbf{X}})}{{\text{ Var }}(T\vert \,{\mathbf{X}})}. \end{aligned}$$

If \(\gamma _0 =0\) is known a priori, the model (3) reduces to the model (1). Hence, the new model generalizes Lai and Ying’s model by treating \(\gamma _0 \) as an unknown population parameter.

One can interpret the regression parameter \({\varvec{\beta }}_0 \) as the effect of the covariate \({\mathbf{X}}\) on the lifetime \(Y^*\), after adjusting for the value of the truncation time T. For the Channing house example, the covariate “gender” is coded as \(X=1\) for male and \(X=0\) for female, and the truncation time T is the potential entry age. Then, the parameter \({\varvec{\beta }}_0 \) in the model (3) represents the difference in log-lifetime between male and female residents of the Channing house with the same potential entry age.

Without truncation, we would have \(\text{ Pr }(T\le Y^*)=1\), which happens when all possible values of T fall below the lower boundary of \(Y^*\). In the Channing house example, where T denotes the logarithm of the age at entering the study, “no truncation” corresponds to \(T=-\infty \), meaning that all subjects in the target population would be recruited into the study from birth. However, the real Channing house data are subject to a truncation mechanism that yields a biased sample from the target population by imposing the additional constraint \(T\le Y^*\) with \(\text{ Pr }(T\le Y^*)<1\). We will demonstrate that, with appropriate estimation schemes, such sampling bias can be corrected even if the truncation mechanism is informative.

To further illustrate the proposed model (3), we generalize the model of Martin and Betensky (2005) to a regression setting in which the lifetime and truncation time, after the logarithm transformation, jointly follow a bivariate normal distribution

$$\begin{aligned} \left[ {{\begin{array}{c} {Y^*}\\ T\\ \end{array} }} \right] \sim N\left( \,\left[ {{\begin{array}{c} {{{\varvec{\beta }'}}_0 {\mathbf{X}}}\\ {\mu _L }\\ \end{array} }} \right] ,\, \left[ {{\begin{array}{cc} {\sigma _{Y^*}^2 }&{}\quad {\rho \sigma _{Y^*} \sigma _T} \\ {\rho \sigma _{Y^*} \sigma _T}&{}\quad {\sigma _T^2}\\ \end{array} }} \right] \,\right) . \end{aligned}$$

It follows that

$$\begin{aligned} Y^*\vert {\mathbf{X}},T\sim N \left( {\,{{\varvec{\beta }'}}_0 {\mathbf{X}}+\rho \frac{\sigma _{Y^*}}{\sigma _T}(T-\mu _L ),\,\sigma _{Y^*}^2 (1-\rho ^2)\,}\right) . \end{aligned}$$

This implies that the model (3) holds with \(\gamma _0 =\rho \sigma _{Y^*} /\sigma _T \) and an independent error

$$\begin{aligned} \varepsilon \sim N \left( {\,-\rho \frac{\sigma _{Y^*}}{\sigma _T}\mu _L ,\,\sigma _{Y^*}^2 (1-\rho ^2)\,}\right) . \end{aligned}$$

The parameter \({\varvec{\beta }}_0 \) is interpreted as the population effect of \({\mathbf{X}}\) on \(Y^*\) since \(Y^*\vert {\mathbf{X}}\sim N(\,{{\varvec{\beta }'}}_0 {\mathbf{X}},\,\sigma _{Y^*}^2 )\) holds true. Nevertheless, it is wrong to fit the model (1) under the framework of Lai and Ying (1991) since the error \(Y^*-\,{{\varvec{\beta }'}}_0 {\mathbf{X}}\) is clearly dependent on T. As we will see in our simulations, such a naïve method yields systematic bias for estimating the population parameter \({\varvec{\beta }}_0 \).
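The relations above can be verified numerically. The following Python sketch draws from the bivariate normal model with illustrative parameter values (our own choices, not from the paper's data analysis) and checks both that \(\gamma _0 =\rho \sigma _{Y^*}/\sigma _T \) equals the conditional regression slope and that the implied error is uncorrelated with T:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative parameters (our own choices)
beta0, mu_L = 0.8, 0.0
sigma_Y, sigma_T, rho = 1.5, 1.0, 0.6
gamma0 = rho * sigma_Y / sigma_T        # implied coefficient, here 0.9

# Draw (Y*, T) given X from the bivariate normal model (no truncation yet)
n = 200_000
X = rng.binomial(1, 0.5, n).astype(float)
Z1, Z2 = rng.standard_normal(n), rng.standard_normal(n)
T = mu_L + sigma_T * Z1
Ystar = beta0 * X + sigma_Y * (rho * Z1 + np.sqrt(1 - rho**2) * Z2)

# gamma0 = Cov(Y*, T | X) / Var(T | X): estimate it within the X = 0 group
m = X == 0
cov = np.cov(Ystar[m], T[m])
gamma_hat = cov[0, 1] / cov[1, 1]
print(gamma_hat)                        # close to gamma0 = 0.9

# The residual eps = Y* - beta0*X - gamma0*T should be uncorrelated with T
eps = Ystar - beta0 * X - gamma0 * T
print(np.corrcoef(eps, T)[0, 1])        # close to 0
```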

2.3 Estimation of regression coefficients

We need a set of estimating functions for the joint estimation of \({\varvec{\beta }}=(\beta _1 ,\ldots ,\beta _p {)}'\) and \(\gamma \). Define \(\varepsilon _i^Y ({\varvec{\beta }},\gamma )=Y_i -{{\varvec{\beta }'X}}_i -\gamma T_i \) and \(\varepsilon _i^T ({\varvec{\beta }},\,\gamma )=T_i -{{\varvec{\beta }'X}}_i -\gamma T_i \). Note that \(\varepsilon _i^Y ({\varvec{\beta }},\gamma )\) is left-truncated by \(\varepsilon _i^T ({\varvec{\beta }},\,\gamma )\) satisfying \(\varepsilon _i^T ({\varvec{\beta }},\,\gamma )\le \varepsilon _i^Y ({\varvec{\beta }},\,\gamma )\). In light of Eq. (2), log-rank statistics with the Gehan weight can be modified as

$$\begin{aligned} {\mathbf{S}}_n^\mathrm{Logrank} ({\varvec{\beta }},\,\gamma )= & {} -\sum \limits _{i<j} ({\mathbf{X}}_i -{\mathbf{X}}_j )\mathrm{sgn}\{\varepsilon _i^Y ({\varvec{\beta }},\,\gamma )-\varepsilon _j^Y ({\varvec{\beta }},\,\gamma )\}\\&\times \, I\{{\mathop {\varepsilon }\limits ^{\smile }}_{ij}^T ({\varvec{\beta }},\,\gamma )\le \tilde{\varepsilon }_{ij}^Y ({\varvec{\beta }},\,\gamma )\}O_{ij} ({\varvec{\beta }},\,\gamma ), \end{aligned}$$

where \({\mathop {\varepsilon }\limits ^{\smile }}_{ij}^T ({\varvec{\beta }},\,\gamma )=\varepsilon _i^T ({\varvec{\beta }},\,\gamma )\vee \varepsilon _j^T ({\varvec{\beta }},\,\gamma )\), \(\tilde{\varepsilon }_{ij}^Y ({\varvec{\beta }},\,\gamma )=\varepsilon _i^Y ({\varvec{\beta }},\,\gamma )\wedge \varepsilon _j^Y ({\varvec{\beta }},\,\gamma )\), and

$$\begin{aligned} O_{ij} ({\varvec{\beta }},\,\gamma )= & {} \Delta _i \Delta _j +\Delta _i (1-\Delta _j )I\{\varepsilon _i^Y ({\varvec{\beta }},\,\gamma )<\varepsilon _j^Y ({\varvec{\beta }},\,\gamma )\}\\&+\,\,(1-\Delta _i )\Delta _j I\{\varepsilon _i^Y ({\varvec{\beta }},\,\gamma )>\varepsilon _j^Y ({\varvec{\beta }},\,\gamma )\}. \end{aligned}$$

Note that \(O_{ij} ({\varvec{\beta }},\,\gamma )=1\) if a pair \((i,\,j)\) is orderable and \(O_{ij} ({\varvec{\beta }},\,\gamma )=0\) otherwise.

The second estimating function utilizes the assumption that \(\varepsilon =Y^*-{{\varvec{\beta }'}}_0 {\mathbf{X}}-\gamma _0 T\) and \(T-{{\varvec{\beta }'}}_0 {\mathbf{X}}-\gamma _0 T\) are independent. Motivated by the quasi-independence test based on the conditional Kendall’s tau statistics (Martin and Betensky 2005), we set

$$\begin{aligned}&S_n^\mathrm{Kendall} ({\varvec{\beta }},\,\gamma ) \\&\quad =\sum \limits _{i<j} \mathrm{sgn}[\{\varepsilon _i^T ({\varvec{\beta }},\,\gamma )-\varepsilon _j^T ({\varvec{\beta }},\,\gamma )\}\{\varepsilon _i^Y ({\varvec{\beta }},\,\gamma )-\varepsilon _j^Y ({\varvec{\beta }},\,\gamma )\}]\\&\qquad \qquad \times I\{{\mathop {\varepsilon }\limits ^{\smile }}_{ij}^T ({\varvec{\beta }},\,\gamma )\le \tilde{\varepsilon }_{ij}^Y ({\varvec{\beta }},\,\gamma )\}O_{ij} ({\varvec{\beta }},\,\gamma ). \\ \end{aligned}$$

The estimator \(({{\hat{\varvec{\beta }}}},\,\hat{\gamma })\) jointly solves \({\mathbf{S}}_n^\mathrm{Logrank} ({\varvec{\beta }},\,\gamma )={\mathbf{0}}\) and \(S_n^\mathrm{Kendall} ({\varvec{\beta }},\,\gamma )=0\).

It is important to note that both \({\mathbf{S}}_n^\mathrm{Logrank} ({\varvec{\beta }},\,\gamma )\) and \(S_n^\mathrm{Kendall} ({\varvec{\beta }},\,\gamma )\) are non-monotonic step functions in each component of \(({\varvec{\beta }},\,\gamma )\) [see the functional property of \({\mathbf{S}}_n^\mathrm{Logrank} ({\varvec{\beta }},\,\gamma )\) described in Appendix A2]. Accordingly, Newton-type algorithms, and even the linear programming method developed by Jin et al. (2003), are not directly applicable.

We suggest the Nelder–Mead simplex algorithm for minimizing non-differentiable functions (Nelder and Mead 1965). Define an objective function

$$\begin{aligned} \{\vert \vert {\mathbf{S}}_n^\mathrm{Logrank} ({\varvec{\beta }},\,\gamma )\vert \vert _2 +\vert S_n^\mathrm{Kendall} ({\varvec{\beta }},\,\gamma )\vert ^2\}/n^2, \end{aligned}$$

where \(\vert \vert \cdot \vert \vert _2 \) is the squared Euclidean (\(L_2 )\) norm, defined as \(\vert \vert {\mathbf{a}}\vert \vert _2 \equiv {\mathbf{{a}'a}}\) for a vector \({\mathbf{a}}\). Then, solving the estimating equations corresponds to minimizing the objective function.

In our numerical studies, we use the R optim routine to implement the Nelder–Mead algorithm. To reach the correct solution, one needs to supply a reliable initial value. One possible suggestion is the initial value \(({{\hat{\varvec{\beta }}}}^{LY},\,0)\), where \({{\hat{\varvec{\beta }}}}^{LY}\) solves \({\mathbf{S}}_n^\mathrm{Logrank} ({\varvec{\beta }},\,0)={\mathbf{0}}\) and is the estimator of Lai and Ying (1991) with the Gehan weight. Here, \({{\hat{\varvec{\beta }}}}^{LY}\) is found by minimizing \(\vert \vert {\mathbf{S}}_n^\mathrm{Logrank} ({\varvec{\beta }},\,0)\vert \vert _2 /n^2\) with the initial value \({\varvec{\beta }}={\mathbf{0}}\). Another possible initial value is \(({\mathbf{0}},\,0)\). In our simulations (Sect. 4), we use \(({{\hat{\varvec{\beta }}}}^{LY},\,0)\) since it reached the solution in all replications. In real data analysis, we suggest trying both initial values.
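As an illustration of the whole fitting procedure, the following Python sketch builds the two estimating functions for a scalar covariate and minimizes the objective with a Nelder–Mead routine. We use scipy here, whereas the implementation described in the text uses the R optim routine; the simulated data and all tuning choices are illustrative assumptions of ours:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)

# Toy data from model (3): Y* = beta0*X + gamma0*T + eps (illustrative).
beta0, gamma0 = 1.0, 0.5
N = 400
X0 = rng.binomial(1, 0.5, N).astype(float)
T0 = rng.normal(-1.0, 1.0, N)
Ystar = beta0 * X0 + gamma0 * T0 + rng.normal(0.5, 1.0, N)
C0 = T0 + 3.0                            # type I censoring, Pr(T <= C) = 1
keep = T0 <= Ystar                       # left truncation
X, T, C = X0[keep], T0[keep], C0[keep]
Y = np.minimum(Ystar[keep], C)
Delta = (Ystar[keep] <= C).astype(float)
n = len(Y)
i, j = np.triu_indices(n, k=1)           # all pairs i < j

def estimating_functions(beta, gamma):
    """S_n^Logrank and S_n^Kendall for a scalar covariate."""
    eY = Y - beta * X - gamma * T
    eT = T - beta * X - gamma * T
    comparable = np.maximum(eT[i], eT[j]) <= np.minimum(eY[i], eY[j])
    O = (Delta[i] * Delta[j]
         + Delta[i] * (1 - Delta[j]) * (eY[i] < eY[j])
         + (1 - Delta[i]) * Delta[j] * (eY[i] > eY[j]))
    w = comparable * O
    sgnY = np.sign(eY[i] - eY[j])
    S_logrank = -np.sum((X[i] - X[j]) * sgnY * w)
    S_kendall = np.sum(np.sign(eT[i] - eT[j]) * sgnY * w)
    return S_logrank, S_kendall

def objective(par):
    """{||S_logrank||_2 + |S_kendall|^2} / n^2 with ||a||_2 = a'a."""
    s1, s2 = estimating_functions(par[0], par[1])
    return (s1**2 + s2**2) / n**2

x0 = np.array([0.0, 0.0])                # the simpler initial value (0, 0)
res = minimize(objective, x0, method="Nelder-Mead",
               options={"xatol": 1e-4, "fatol": 1e-8, "maxiter": 500})
beta_hat, gamma_hat = res.x
print(beta_hat, gamma_hat, res.fun)
```

In practice one would start from \(({{\hat{\varvec{\beta }}}}^{LY},\,0)\) as recommended above; the zero initial value is used here only to keep the sketch short, and both starting points should be tried on real data.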

2.4 Estimation of survival function

We consider estimation of the survival function \(S_\varepsilon (t)=\Pr (\varepsilon >t)\). Notice that \(\{(\varepsilon _i^T ({{\hat{\varvec{\beta }}}},\,\hat{\gamma }),\,\varepsilon _i^Y ({{\hat{\varvec{\beta }}}},\,\hat{\gamma }),\,\Delta _i);\,i=1,\ldots ,n\}\) form approximately homogeneous (identically distributed) samples of left-truncated and right-censored data subject to \(\varepsilon _i^T ({{\hat{\varvec{\beta }}}},\,\hat{\gamma })\le \varepsilon _i^Y ({{\hat{\varvec{\beta }}}},\,\hat{\gamma })\). Therefore, we can estimate \(S_\varepsilon (t)\) by the product-limit estimator (Tsai et al. 1987)

$$\begin{aligned} \hat{S}_\varepsilon (\,t;\,{{\hat{\varvec{\beta }}}},\,\hat{\gamma }\,)=\prod \limits _{u\le t} {\left\{ {1-\frac{\sum \nolimits _j {I(\,\varepsilon _j^Y ({{\hat{\varvec{\beta }}}},\,\hat{\gamma })=u,\,\Delta _j =1\,)} }{\sum \nolimits _j {I(\,\varepsilon _j^T ({{\hat{\varvec{\beta }}}},\,\hat{\gamma })\le u\le \varepsilon _j^Y ({{\hat{\varvec{\beta }}}},\,\hat{\gamma })\,)} }} \right\} }. \end{aligned}$$
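A direct implementation of this product-limit estimator is straightforward; the following minimal Python sketch (function and variable names are our own) takes the residual truncation times, residual observed times, and censoring indicators:

```python
import numpy as np

def product_limit(eT, eY, Delta):
    """Product-limit estimator for left-truncated, right-censored residuals:
    S-hat(t) = prod_{u <= t} {1 - d(u)/R(u)} over observed event times u,
    where d(u) counts events at u and R(u) counts subjects at risk at u."""
    eT, eY, Delta = map(np.asarray, (eT, eY, Delta))
    times = np.unique(eY[Delta == 1])            # distinct event times
    S, surv = 1.0, []
    for u in times:
        d = np.sum((eY == u) & (Delta == 1))     # events at u
        R = np.sum((eT <= u) & (u <= eY))        # at risk: eT <= u <= eY
        S *= 1.0 - d / R
        surv.append(S)
    return times, np.array(surv)

# Sanity check: with no truncation (eT = -inf) and no censoring, the
# estimator reduces to the empirical survival function.
times, S = product_limit(np.full(4, -np.inf), [1.0, 2.0, 3.0, 4.0], np.ones(4))
print(S)   # [0.75, 0.5, 0.25, 0.0]
```

In the paper's setting, `eT` and `eY` would be the fitted residuals \(\varepsilon _j^T ({{\hat{\varvec{\beta }}}},\,\hat{\gamma })\) and \(\varepsilon _j^Y ({{\hat{\varvec{\beta }}}},\,\hat{\gamma })\).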

Remark

Since \(\{(\varepsilon _i^T ({{\hat{\varvec{\beta }}}},\,\hat{\gamma }),\,\varepsilon _i^Y ({{\hat{\varvec{\beta }}}},\,\hat{\gamma }),\,\Delta _i )\}\) and \(\{(\varepsilon _j^T ({{\hat{\varvec{\beta }}}},\,\hat{\gamma }),\,\varepsilon _j^Y ({{\hat{\varvec{\beta }}}},\,\hat{\gamma }),\,\Delta _j )\}\) are dependent for \(i\ne j\), the asymptotic properties of \(\hat{S}_\varepsilon (\,t;\,{{\hat{\varvec{\beta }}}},\,\hat{\gamma }\,)\) do not directly follow from those of the product-limit estimator. As for the consistency of \(\hat{S}_\varepsilon (\,t;\,{{\hat{\varvec{\beta }}}},\,\hat{\gamma }\,)\), this dependency will be asymptotically negligible (Theorem 3). However, the dependency makes it difficult to derive the asymptotic normality of \(\hat{S}_\varepsilon (\,t;\,{{\hat{\varvec{\beta }}}},\,\hat{\gamma }\,)\), and also influences the asymptotic variance.

3 Asymptotic analysis

3.1 Asymptotic theory

Let \(\Theta \subset R^{p+1}\) be the parameter space for \(({\varvec{\beta }},\,\gamma )\), and \(({\varvec{\beta }}_0 ,\,\gamma _0 )\in \Theta \) be the true parameter value. Asymptotic properties of \(({{\hat{\varvec{\beta }}}},\,\hat{\gamma })\) are constructed by expressing the estimating functions as a functional of the empirical distribution function,

$$\begin{aligned} F_n (t,\,y,\,\delta ,\,{\mathbf{x}})=\frac{1}{n}\sum \limits _{i=1}^n {I(T_i \le t,\,Y_i \le y,\,\Delta _i \le \delta ,\,{\mathbf{X}}_i \le {\mathbf{x}})}, \end{aligned}$$

and then applying the asymptotic theory for empirical processes (e.g., Van Der Vaart and Wellner 1996). By straightforward calculations given in Appendix B.1,

$$\begin{aligned} \Phi (F_n;\,{\varvec{\beta }},\,\gamma )\equiv & {} \frac{1}{n^2}\left[ {{\begin{array}{l} {{\mathbf{S}}_n^\mathrm{Logrank} ({\varvec{\beta }},\,\gamma )}\\ {S_n^\mathrm{Kendall} ({\varvec{\beta }},\,\gamma )}\\ \end{array} }} \right] \nonumber \\= & {} \int \int \,{\mathbf{h}}\{(t_1,\,y_1,\,\delta _1,\,{\mathbf{x}}_1 ),\,(t_2 ,\,y_2 ,\,\delta _2,\,{\mathbf{x}}_2 );{\varvec{\beta }},\,\gamma \}dF_n (t_1 ,\,y_1 ,\,\delta _1,\,{\mathbf{x}}_1)\nonumber \\&\times \, dF_n (t_2 ,\,y_2,\,\delta _2,\,{\mathbf{x}}_2 ), \end{aligned}$$
(4)

where \({\mathbf{h}}\) is a deterministic function given by

$$\begin{aligned}&{\mathbf{h}}\{(t_1,\,y_1,\,\delta _1,\,{\mathbf{x}}_1 ),\,(t_2 ,\,y_2,\,\delta _2,\,{\mathbf{x}}_2 );{\varvec{\beta }},\,\gamma \} \\&\quad \equiv \frac{1}{2}\left[ {{\begin{array}{c} {-({\mathbf{x}}_1 -{\mathbf{x}}_2 )\mathrm{sgn}\{y_1 -{{\varvec{\beta }'\mathbf{x}}}_1 -\gamma t_1 -(y_2 -{{\varvec{\beta }'\mathbf{x}}}_2 -\gamma t_2 )\}}\\ {\mathrm{sgn}\{(t_1 -{{\varvec{\beta }'\mathbf{x}}}_1 -\gamma t_1 -(t_2 -{{\varvec{\beta }'\mathbf{x}}}_2 -\gamma t_2 ))(y_1 -{{\varvec{\beta }'\mathbf{x}}}_1 -\gamma t_1 -(y_2 -{{\varvec{\beta }'\mathbf{x}}}_2 -\gamma t_2 ))\}}\\ \end{array} }} \right] \\&\quad \times \, I\{(t_1 -{{\varvec{\beta }'\mathbf{x}}}_1 -\gamma t_1 )\vee (t_2 -{{\varvec{\beta }'\mathbf{x}}}_2 -\gamma t_2 )\le (y_1 -{{\varvec{\beta }'\mathbf{x}}}_1 -\gamma t_1 )\wedge (y_2 -{{\varvec{\beta }'\mathbf{x}}}_2 -\gamma t_2 )\} \\&\quad \times \, \{\delta _1 \delta _2 +\delta _1 (1-\delta _2 )I(y_1 -{{\varvec{\beta }'\mathbf{x}}}_1 -\gamma t_1 <y_2 -{{\varvec{\beta }'\mathbf{x}}}_2 -\gamma t_2 )\\&\qquad +(1-\delta _1 ) \delta _2 I(y_1 -{{\varvec{\beta }'\mathbf{x}}}_1 -\gamma t_1 >y_2 -{{\varvec{\beta }'\mathbf{x}}}_2 -\gamma t_2 )\}. \end{aligned}$$

Since \(\Phi (F_n;\,{\varvec{\beta }},\,\gamma )={\mathbf{0}}\) holds at \(({\varvec{\beta }},\,\gamma )=({{\hat{\varvec{\beta }}}},\,\hat{\gamma })\), one can regard \(({{\hat{\varvec{\beta }}}},\,\hat{\gamma })\) as a Z-estimator based on the function \(\Phi (F_n;\,\cdot ):\Theta \mapsto R^{p+1}.\) As \(\Phi (F_n;\,{\varvec{\beta }},\,\gamma )\) is not differentiable with respect to \(({\varvec{\beta }},\,\gamma )\), one cannot apply the classical technique based on a Taylor expansion of \(\Phi (F_n;\,{\varvec{\beta }},\,\gamma )\) around \(({\varvec{\beta }}_0,\,\gamma _0 )\). Instead, we apply the empirical process theory for Z-estimators that accommodates non-differentiable functions (Sec. 3.3 of Van Der Vaart and Wellner 1996). The corresponding deterministic function is \(\Phi (F;\,\cdot ):\Theta \mapsto R^{p+1}\), in which \(F_n \) is replaced by the true distribution F.

We impose the following conditions:

Assumption 1

There exists some constant \(M>0\) such that \(\vert \vert {\mathbf{X}}\vert \vert _2 \le M\) with probability one, where \(\vert \vert \cdot \vert \vert _2 \) is the Euclidean (\(L_2 )\) norm.

Assumption 2

For every \(\varepsilon >0\), \(\inf _{\vert \vert ({\varvec{\beta }},\,\gamma )-({\varvec{\beta }}_0 ,\,\gamma _0 )\vert \vert _2 \ge \varepsilon } \vert \vert \Phi (F;{\varvec{\beta }},\,\gamma )\vert \vert _2 >0=\vert \vert \Phi (F;{\varvec{\beta }}_0,\,\gamma _0 )\vert \vert _2 \).

Assumption 1 is a requirement for the function \({\mathbf{h}}\) to be bounded, which is sufficient for the functional \(\Phi (\cdot \,;{\varvec{\beta }}_0,\,\gamma _0 )\) to be Hadamard differentiable. The same assumption is imposed by Lai and Ying (1991). Assumption 2 is often called “identifiability”, ensuring that the true value \(({\varvec{\beta }}_0,\,\gamma _0 )\in \Theta \) is a well-separated point from any \(({\varvec{\beta }},\,\gamma )\ne ({\varvec{\beta }}_0 ,\,\gamma _0 )\). This is the standard assumption for guaranteeing the consistency of M- or Z-estimators (Sec. 3.3 of Van Der Vaart and Wellner 1996; Sec. 5.2 of Van Der Vaart 1998).

Theorem 1

Under Assumptions 1 and 2, the estimator \(({{\hat{\varvec{\beta }}}},\,\hat{\gamma })\) converges in probability to the true parameter value \(({\varvec{\beta }}_0 ,\,\gamma _0 )\).

The proof of Theorem 1 is given in Appendix B.2.

Under Assumption 1, the functional \(\Phi (\cdot ;{\varvec{\beta }}_0,\,\gamma _0 )\) is Hadamard differentiable and hence the functional delta method applies (Sec. 20.2 of Van Der Vaart 1998). Then, we obtain the expression

$$\begin{aligned}&n^{1/2}\{\Phi (F_n ;{\varvec{\beta }}_0 ,\,\gamma _0 )-\Phi (F;{\varvec{\beta }}_0 ,\,\gamma _0 )\}\nonumber \\&\quad =n^{-1/2}\sum \limits _{j=1}^n {\phi _F \{(T_j ,\,Y_j ,\,\Delta _j ,\,{\mathbf{X}}_j );\,{\varvec{\beta }}_0 ,\,\gamma _0 \}} +o_P (1) \end{aligned}$$
(5)

where

$$\begin{aligned}&\phi _F \{(T_j ,\,Y_j ,\,\Delta _j ,\,{\mathbf{X}}_j );\,{\varvec{\beta }},\,\gamma \} \nonumber \\&\quad =2\int {\,{\mathbf{h}}\{(t_1 ,\,y_1 ,\,\delta _1 ,\,{\mathbf{x}}_1 ),\,(T_j ,\,Y_j ,\,\Delta _j ,\,{\mathbf{X}}_j );{\varvec{\beta }},\,\gamma \}dF(t_1 ,\,y_1 ,\,\delta _1 ,\,{\mathbf{x}}_1 )} \nonumber \\&\qquad -2\int \int \,{\mathbf{h}}\{(t_1 ,\,y_1 ,\,\delta _1 ,\,{\mathbf{x}}_1 ),\,(t_2 ,\,y_2 ,\,\delta _2 ,\,{\mathbf{x}}_2 );{\varvec{\beta }},\,\gamma \}\nonumber \\&\qquad \times \, dF(t_1 ,\,y_1 ,\,\delta _1 ,\,{\mathbf{x}}_1 )dF(t_2 ,\,y_2 ,\,\delta _2 ,\,{\mathbf{x}}_2 ). \end{aligned}$$
(6)

We will establish the asymptotic normality under the additional conditions:

Assumption 3

The derivatives \(\partial \Phi (F;{\varvec{\beta }},\,\gamma )/\partial ({\varvec{\beta }},\,\gamma )\) exist and are continuous. Also, the matrix \({\mathbf{A}}_0 \equiv \partial \Phi (F;{\varvec{\beta }},\,\gamma )/\partial ({\varvec{\beta }},\,\gamma )\vert _{{\varvec{\beta }}_0 ,\gamma _0 } \) is non-singular.

Assumption 4

A class of functions

$$\begin{aligned} \mathfrak {I}=\{\phi _F (\,\cdot \,;\,{\varvec{\beta }},\,\gamma \,)-\phi _F (\,\cdot \,;\,{\varvec{\beta }}_0 ,\,\gamma _0 \,):\,\vert \vert {\varvec{\beta }}-{\varvec{\beta }}_0 ,\,\gamma -\gamma _0 \vert \vert _2 <\delta \} \end{aligned}$$

is F-Donsker for some \(\delta >0\).

Assumption 3 ensures the invertibility of the matrix \({\mathbf{A}}_0 \) for the Z-estimator, which is essential not only for the asymptotic normality proof, but also for variance estimation (Sect. 3.2). In Assumption 4, whether the class of functions is Donsker depends on the “size” of \(\mathfrak {I}\), which may shrink as \(\delta \) approaches zero [see Section 19.2 of Van Der Vaart 1998]. As seen from Appendix B.3, Assumption 4 is a sufficient condition to verify the stochastic condition

$$\begin{aligned}&n^{1/2}\{\Phi (F_n ;{{\hat{\varvec{\beta }}}},\,\hat{\gamma })-\Phi (F;{{\hat{\varvec{\beta }}}},\,\hat{\gamma })\}-n^{1/2}\{\Phi (F_n ;{\varvec{\beta }}_0 ,\,\gamma _0 )-\Phi (F;{\varvec{\beta }}_0 ,\,\gamma _0 )\,\}\nonumber \\&\quad =o_P (1+n^{1/2}\vert \vert {{\hat{\varvec{\beta }}}}-{\varvec{\beta }}_0 ,\,\hat{\gamma }-\gamma _0 \vert \vert ). \end{aligned}$$

Owing to the non-differentiability of \(\Phi (F_n;\,{\varvec{\beta }},\,\gamma )\) with respect to \(({\varvec{\beta }},\,\gamma )\), the above stochastic condition is needed to regulate the asymptotic behavior of \(({{\hat{\varvec{\beta }}}},\,\hat{\gamma })\).

Theorem 2

Under Assumptions 1, 2, 3, 4,

$$\begin{aligned} n^{1/2}(\,{{\hat{\varvec{\beta }}}}-{\varvec{\beta }}_0 ,\,\hat{\gamma }-\gamma _0 \,{)}'={\mathbf{A}}_0^{-1} n^{-1/2}\sum \limits _{j=1}^n {\phi _F \{(T_j ,\,Y_j ,\,\Delta _j ,\,{\mathbf{X}}_j );\,{\varvec{\beta }}_0 ,\,\gamma _0 \}} \,+\, o_P (1), \end{aligned}$$
(7)

where \(\phi _F \) is the mean zero random vector defined in Eq. (6) and \({\mathbf{A}}_0 \) is defined in Assumption 3. Accordingly, \(n^{1/2}(\,{{\hat{\varvec{\beta }}}}-{\varvec{\beta }}_0 ,\,\hat{\gamma }-\gamma _0 \,)\) converges weakly to a multivariate normal distribution with mean zero and covariance matrix \({\mathbf{A}}_0 ^{-1}{\mathbf{B}}_0 {\mathbf{A}}_0 ^{-1}\), where \({\mathbf{B}}_0 \equiv E[\phi _F \{(T_j ,\,Y_j ,\,\Delta _j ,\,{\mathbf{X}}_j );\,{\varvec{\beta }}_0 ,\,\gamma _0 \}\phi _F \{(T_j ,\,Y_j ,\,\Delta _j ,\,{\mathbf{X}}_j );\,{\varvec{\beta }}_0 ,\,\gamma _0 {\}}']\).

The proof of Theorem 2 is given in Appendix B.3.

To prove the uniform consistency of the product-limit estimator \(\hat{S}_\varepsilon (t;\,{{\hat{\varvec{\beta }}}},\,\hat{\gamma })\), we need an additional assumption. Let the interval \([a,\,b]\), where \(a<b\), be the support of \(\varepsilon \), where \(a=\inf \{u;\,S_\varepsilon (u)<1\}\) and \(b=\sup \{u;\,S_\varepsilon (u)>0\}\). Also, let

$$\begin{aligned} \pi _\varepsilon (\,s,\,{\varvec{\beta }},\,\gamma \,)= & {} E[\,I(\,T-{{\varvec{\beta }'X}}-\gamma T\le s\,)\\&\times S_\varepsilon \{s+({\varvec{\beta }}-{\varvec{\beta }}_0 {)}'{\mathbf{X}}+\,\,(\gamma -\gamma _0 )T\}S_C \{s+{{\varvec{\beta }'X}}+\gamma T\}\!], \end{aligned}$$

where \(S_C \) is the survival function for C, and \(E[\,\cdot \,]\) is taken over the joint distribution of \((T,\,{\mathbf{X}})\vert T\le Y\). It is not difficult to see that \(\sum \nolimits _j {I\{\varepsilon _j^T ({\varvec{\beta }},\,\gamma )\le s\le \varepsilon _j^Y ({\varvec{\beta }},\,\gamma )\}} /n\) converges in probability to \(\pi _\varepsilon (s,\,{\varvec{\beta }},\,\gamma )\). To establish the uniform consistency, it is natural to assume that \(\pi _\varepsilon (s,\,{\varvec{\beta }},\,\gamma )\) is positive for \(s\in [a,\,b]\) and \(({\varvec{\beta }},\,\gamma )\) in a neighborhood of \(({\varvec{\beta }}_0 ,\,\gamma _0 )\).

Assumption 5

For some \(\delta >0\), \(\inf _{\vert \vert ({\varvec{\beta }},\,\gamma )-({\varvec{\beta }}_0 ,\,\gamma _0 )\vert \vert \le \delta } \inf _{s\in [a,b]} \pi _\varepsilon (s,\,{\varvec{\beta }},\,\gamma )>0\).

Theorem 3

Under Assumptions 1, 2 and 5, the product-limit estimator \(\hat{S}_\varepsilon (t;\,{{\hat{\varvec{\beta }}}},\,\hat{\gamma })\) converges in probability to \(S_\varepsilon (t)\), uniformly over \(t\in [a,\,b].\)

The proof of Theorem 3 is given in Appendix B.4.

3.2 Variance estimation

Even for the usual AFT model in (1) without truncation, resampling-based methods are often suggested for variance estimation (Jin et al. 2003). However, based on Theorem 2, we can directly estimate \({\mathbf{A}}_0 ^{-1}{\mathbf{B}}_0 {\mathbf{A}}_0 ^{-1}\) by \({\hat{\mathbf {A}}}_0 ^{-1}{\hat{\mathbf {B}}}_0 {\hat{\mathbf {A}}}_0 ^{-1}\) as defined below. We propose to estimate \({\mathbf{B}}_0 \) by

$$\begin{aligned} {\hat{\mathbf {B}}}_0 =\sum \limits _{j=1}^n {\phi _{F_n } (T_j ,\,Y_j ,\,\Delta _j ,\,{\mathbf{X}}_j ;{{\hat{\varvec{\beta }}}},\,\hat{\gamma }\,)\phi _{F_n } (T_j ,\,Y_j ,\,\Delta _j ,\,{\mathbf{X}}_j ;{{\hat{\varvec{\beta }}}},\,\hat{\gamma })'/n} , \end{aligned}$$

where \(\phi _{F_n } \) is obtained by replacing the true distribution \(F\) with the empirical distribution \(F_n \) in Eq. (6).

Estimation of the \((p+1)\times (p+1)\) matrix \({\mathbf{A}}_0 =\partial \Phi (F;{\varvec{\beta }},\,\gamma )/\partial ({\varvec{\beta }},\,\gamma )\vert _{{\varvec{\beta }}_0 ,\gamma _0 } \) is more challenging since \(\Phi (F_n ;{\varvec{\beta }},\,\gamma )\) is a step function in each component of \(({\varvec{\beta }},\gamma )\) and hence not differentiable. We propose to smooth out the jumps by a kernel density function K satisfying \(K(u)=K(-u)\), \(\int {K(u)du} =1\) and \(\int {s^2K(s)ds} \equiv \mu _2 (K)<\infty \). The resulting kernel estimator is \({\hat{\mathbf {A}}}_0 =[{\hat{\mathbf {A}}}_0^{(1)} ({{\hat{\varvec{\beta }}}},\,\hat{\gamma };b_1 ),\ldots ,{\hat{\mathbf {A}}}_0^{(p+1)} ({{\hat{\varvec{\beta }}}},\,\hat{\gamma };b_{p+1} )]\), where

$$\begin{aligned}&{\hat{\mathbf {A}}}_0^{(k)} ({\varvec{\beta }},\,\gamma ;b_k)\nonumber \\&\quad =\int \nolimits _{u\ne 0} {\frac{1}{u}\Phi (F_n ;\,\beta _1 ,\ldots ,\beta _k +u,\ldots ,\beta _p ,\,\gamma )\frac{1}{b_k }K\left( {\frac{u}{b_k }}\right) du} ,\quad k=1,\ldots ,p, \end{aligned}$$
$$\begin{aligned} {\hat{\mathbf {A}}}_0^{(p+1)} ({\varvec{\beta }},\,\gamma ;b_{p+1} )=\int \nolimits _{u\ne 0} {\frac{1}{u}\Phi (\,F_n ;\,{\varvec{\beta }},\,\gamma +u\,)\frac{1}{b_{p+1} }K\left( {\frac{u}{b_{p+1} }}\right) du}, \end{aligned}$$

where \(b_k >0\) is the bandwidth. In particular, we use the standard normal density for K.
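To illustrate the computation, the kernel-smoothed derivative displayed above can be approximated by numerical quadrature over u. The sketch below is illustrative Python, not the paper's implementation (which is in the R package depend.truncation); the names `smoothed_partial` and `phi` are ours, with `phi` standing for one coordinate of \(\Phi (F_n ;\cdot )\):

```python
import numpy as np

def smoothed_partial(phi, theta, k, b, half_width=5.0, m=2000):
    """Kernel-smoothed partial derivative of a (possibly step-valued)
    scalar estimating function `phi` in its k-th argument:
        int_{u != 0} (1/u) phi(theta + u e_k) K(u/b)/b du,
    with K the standard normal density and bandwidth b > 0."""
    # symmetric grid over [-half_width*b, half_width*b] excluding u = 0
    up = np.linspace(half_width * b / m, half_width * b, m)
    u = np.concatenate([-up[::-1], up])
    kern = np.exp(-0.5 * (u / b) ** 2) / (np.sqrt(2.0 * np.pi) * b)
    vals = np.empty_like(u)
    for i, ui in enumerate(u):
        shifted = np.array(theta, dtype=float)
        shifted[k] += ui
        vals[i] = phi(shifted) / ui
    # trapezoid rule; the odd 1/u singularity cancels on the symmetric grid
    f = vals * kern
    return np.sum((f[1:] + f[:-1]) * np.diff(u)) / 2.0
```

For a smooth function the rule recovers the ordinary derivative; its value lies in remaining well defined when `phi` is a step function, as \(\Phi (F_n ;\cdot )\) is here.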

Using the Taylor expansion under certain smoothness assumptions on \(\Phi (F;{\varvec{\beta }},\,\gamma )\), the asymptotic mean square error is calculated as

$$\begin{aligned} \mathrm{MSE}\{{\hat{\mathbf {A}}}_0^{(k)} ({{\hat{\varvec{\beta }}}},\,\hat{\gamma };b_k )\}= & {} Var\{{\hat{\mathbf {A}}}_0^{(k)} ({{\hat{\varvec{\beta }}}},\,\hat{\gamma };b_k )\}+Bias[{\hat{\mathbf {A}}}_0^{(k)} ({{\hat{\varvec{\beta }}}},\,\hat{\gamma };b_k )]^2 \\\approx & {} \frac{1}{nb_k }E\left[ {\frac{\partial }{\partial \beta _k }\phi _F \{(T_j ,\,Y_j ,\,\delta _j ,\,{\mathbf{X}}_j );\,{\varvec{\beta }}_0 ,\,\gamma _0 \}} \right] ^2 \\&+\,\frac{b_k^4 }{36}\left[ {\frac{\partial ^3}{\partial \beta _k^3 }\Phi (\,F;\,{\varvec{\beta }}_0 ,\,\gamma _0 \,)} \right] ^2\mu _2 (K)^2. \end{aligned}$$

The bandwidth that minimizes the preceding expression becomes

$$\begin{aligned}&b_k^\mathrm{opt}\\&\quad =\left( {9E\left[ {\frac{\partial }{\partial \beta _k }\phi _F \{(T_j ,\,Y_j,\,\delta _j ,\,{\mathbf{X}}_j );\,{\varvec{\beta }}_0 ,\,\gamma _0 \}} \right] ^2\left[ {\frac{\partial ^3}{\partial \beta _k^3 }\Phi (F;\,{\varvec{\beta }}_0 ,\,\gamma _0 \,)} \right] ^{-2}\mu _2 (K)^{-2}}\right) ^{1/5}\frac{1}{n^{1/5} }, \end{aligned}$$

for \(k=1,\ldots ,p\). Thus, the optimal convergence rate is \(b_k^\mathrm{opt} =\mathrm{O}(n^{-1/5})\).

Unfortunately, estimation of the unknown part of \(b_k^\mathrm{opt} \) is even more difficult than the variance estimation. In addition, commonly used cross-validation schemes are computationally prohibitive. We instead use the idea of Silverman’s reference bandwidth under the normal kernel (Sheather 2004) and set

$$\begin{aligned} \hat{b}_k =0.5\min (\,S_k ,\,\text{ IQR }_k /1.34\,)n^{-1/5}, \end{aligned}$$

where \(S_k^2 \) is the sample variance and \(\text{ IQR }_k \) is the sample inter-quartile range for the jumps of \({\mathbf{S}}_n^\mathrm{Logrank} ({\varvec{\beta }},\,\gamma )\) with respect to \(\beta _k \) (Appendix C). Although this choice may be somewhat ad hoc, we prefer this approach owing to its computational simplicity and reasonable numerical performance. The bandwidth estimator \(\hat{b}_{p+1} \) is obtained in a similar fashion (Appendix C).
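The reference-bandwidth rule above is simple to compute once the jump locations are available (Appendix C). A minimal sketch in illustrative Python, with the function name ours and the jump values assumed supplied:

```python
import numpy as np

def reference_bandwidth(jumps, n):
    """Silverman-type reference bandwidth for the kernel-smoothed
    derivative: b = 0.5 * min(S, IQR/1.34) * n^(-1/5), where S and IQR
    are the sample SD and inter-quartile range of the jump locations."""
    jumps = np.asarray(jumps, dtype=float)
    s = jumps.std(ddof=1)                       # sample standard deviation
    q75, q25 = np.percentile(jumps, [75, 25])
    iqr = q75 - q25                             # inter-quartile range
    return 0.5 * min(s, iqr / 1.34) * n ** (-0.2)
```

The `min(S, IQR/1.34)` guard mirrors Silverman's rule: the IQR-based scale protects against a few outlying jump locations inflating S.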

One can construct the standard error and confidence interval for \(({{\hat{\varvec{\beta }}}},\,\hat{\gamma })\) using the proposed variance estimator. For instance, the standard error \(se(\hat{\beta }_k )\) of \(\hat{\beta }_k \) is the square root of the kth diagonal element of \({\hat{\mathbf {A}}}_0 ^{-1}{\hat{\mathbf {B}}}_0 {\hat{\mathbf {A}}}_0 ^{-1}\) divided by \(n^{1/2} \). On the basis of the asymptotic normality (Theorem 2), a \((1-\alpha )\times 100\,\% \) confidence interval becomes

$$\begin{aligned}{}[\hat{\beta }_k -z_{\alpha /2} se(\hat{\beta }_k ),\,\,\hat{\beta }_k +z_{\alpha /2} se(\hat{\beta }_k )], \end{aligned}$$

where \(z_{\alpha /2} \) is the upper \((\alpha /2)\times 100\,\% \) point of \(N(0,\,1)\). The standard error and confidence interval for \(\hat{\gamma }\) are derived similarly. If the confidence interval does not cover \(\gamma =0\), one can reject the independence assumption between T and \(Y^*\) given \({\mathbf{X}}\).
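The standard-error and Wald-interval computation from the sandwich variance can be sketched as follows (illustrative Python; the function name is ours, and the paper's implementation is in R):

```python
import numpy as np

def sandwich_ci(theta_hat, A_hat, B_hat, n, z=1.959964):
    """Standard errors and 95% Wald confidence intervals from the
    sandwich variance A0^{-1} B0 A0^{-1} of Theorem 2, where z is the
    upper 2.5% point of N(0, 1).  theta_hat stacks (beta_hat, gamma_hat)."""
    Ainv = np.linalg.inv(A_hat)
    V = Ainv @ B_hat @ Ainv            # estimated asymptotic variance matrix
    se = np.sqrt(np.diag(V) / n)       # se_k = sqrt(V_kk) / sqrt(n)
    theta_hat = np.asarray(theta_hat, dtype=float)
    return se, theta_hat - z * se, theta_hat + z * se
```

The last component of the returned interval corresponds to \(\gamma \); an interval excluding zero rejects quasi-independence as described above.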

Estimation of the standard error for \(\hat{S}_\varepsilon (t;\,{{\hat{\varvec{\beta }}}},\,\hat{\gamma })\) is more challenging. Note that the Greenwood-type variance formula produces serious under-estimation since \(\{(\varepsilon _i^T ({{\hat{\varvec{\beta }}}},\,\hat{\gamma }),\,\varepsilon _i^Y ({{\hat{\varvec{\beta }}}},\,\hat{\gamma }),\,\Delta _i )\}\) and \(\{(\varepsilon _j^T ({{\hat{\varvec{\beta }}}},\,\hat{\gamma }),\,\varepsilon _j^Y ({{\hat{\varvec{\beta }}}},\,\hat{\gamma }),\,\Delta _j )\}\) are not independent for \(i\ne j\). A better alternative is the bootstrap. For \(b=1,\ldots ,B\), where B is usually 500 or 1000, a resample \(\{(T_i^{(b)} ,\,Y_i^{(b)} ,\,\Delta _i^{(b)} ,\,{\mathbf{X}}_i^{(b)} ); (i=1,\ldots ,n)\}\) is drawn with replacement from the observed data. Then, for each b, we calculate the estimate \(\hat{S}_\varepsilon ^{(b)} (t;\,{{\hat{\varvec{\beta }}}}^{(b)},\,\hat{\gamma }^{(b)})\) based on the resample. The sample standard deviation of the B estimates can be used as the standard error for \(\hat{S}_\varepsilon (t;\,{{\hat{\varvec{\beta }}}},\,\hat{\gamma })\). Some simulations, not reported here, show that the bootstrap performs reasonably well.
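The bootstrap scheme described above can be sketched as follows (illustrative Python; `estimator` stands for any scalar summary recomputed on each resample, such as \(\hat{S}_\varepsilon ^{(b)} \) at a fixed t, which we leave abstract here):

```python
import numpy as np

def bootstrap_se(data, estimator, B=500, seed=0):
    """Nonparametric bootstrap standard error: draw B resamples of the
    rows of `data` with replacement, re-apply `estimator` to each, and
    return the sample standard deviation of the B replicates."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    stats = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # resample indices with replacement
        stats[b] = estimator(data[idx])
    return stats.std(ddof=1)
```

For the application above, each row of `data` would hold \((T_i ,\,Y_i ,\,\Delta _i ,\,{\mathbf{X}}_i )\) and `estimator` would refit the model on the resample before evaluating \(\hat{S}_\varepsilon \).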

4 Simulations

We conduct simulations to examine the performance of the proposed method and also to compare the proposed method with the method of Lai and Ying (1991).

4.1 Simulation design

We generate data under a bivariate normal model

$$\begin{aligned} \left[ {{\begin{array}{c} {Y^*}\\ T\\ \end{array} }} \right] \sim N \left( {\,\left[ {{\begin{array}{c} {\beta _0 X}\\ {-1}\\ \end{array} }} \right] ,\,\left[ {{\begin{array}{cc} 1&{}\quad \rho \\ \rho &{}\quad 1\\ \end{array} }} \right] \,}\right) . \end{aligned}$$
(8)

Here, the covariate X follows the uniform distribution on [0, 1]. The censoring time C follows the conditional distribution \(C^*\vert C^*>T\), where \(C^*\sim N(1,\,1)\). Then, left-truncated and right-censored data \((T_i ,\,Y_i ,\,\Delta _i ,\,X_i )\) subject to \(T_i \le Y_i \), for \(i=1,\ldots ,n\), are generated.

Note that model (8) leads to

$$\begin{aligned} Y^*=\beta _0 X+\gamma _0 T+\varepsilon , \end{aligned}$$

where \(\varepsilon \sim N(\gamma _0 ,\,1-\gamma _0 ^2)\) and \(\gamma _0 =\rho \). The corresponding survival function for \(\varepsilon \) is

$$\begin{aligned} S_\varepsilon (t)=1-\Phi \left\{ {\frac{t-\gamma _0 }{(1-\gamma _0^2 )^{1/2}}} \right\} , \end{aligned}$$
(9)

and the truncation probability is

$$\begin{aligned} \text{ Pr }(T\le Y^*)=\int \nolimits _0^1 {\Phi \left[ {\frac{1+\beta _0 u}{\{2(1-\gamma _0 )\}^{1/2}}} \right] du}, \end{aligned}$$

where \(\Phi \) is the cumulative distribution function for \(N(0,\,1)\). We examine six parameter configurations, namely \((\beta _0 ,\gamma _0 )=(0,\,-0.3)\), \((0,\,0)\), \((0,\,0.3)\), \((1,\,-0.3)\), \((1,\,0)\), and \((1,\,0.3)\), which yield the truncation probabilities \(\Pr (T\le Y^*)=0.73\), 0.76, 0.80, 0.82, 0.85, and 0.89, respectively. The corresponding censoring probabilities are \(\text{ Pr }(C<Y^*\vert T\le Y^*)=0.28\), 0.27, 0.24, 0.40, 0.38, and 0.36, respectively. Two sample sizes are considered: \(n=150\) and \(n=300\).
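The data-generating scheme of this subsection can be sketched as follows (illustrative Python; the function name is ours, and the simulation itself was run in R):

```python
import numpy as np

def generate_ltrc(n, beta0, gamma0, seed=0):
    """Generate left-truncated right-censored data from model (8):
    (Y*, T) bivariate normal with means (beta0*X, -1), unit variances
    and correlation rho = gamma0; X ~ Uniform(0, 1); the censoring time
    follows C* | C* > T with C* ~ N(1, 1).  Pairs are kept only when
    T <= Y* (left truncation)."""
    rng = np.random.default_rng(seed)
    rows = []
    while len(rows) < n:
        x = rng.uniform()
        t = rng.normal(-1.0, 1.0)
        # Y* | T = t is N(beta0*x + gamma0*(t + 1), 1 - gamma0**2)
        ystar = rng.normal(beta0 * x + gamma0 * (t + 1.0),
                           np.sqrt(1.0 - gamma0 ** 2))
        if t > ystar:
            continue                       # truncated out of the sample
        c = rng.normal(1.0, 1.0)
        while c <= t:                      # C is C* conditioned on C* > T
            c = rng.normal(1.0, 1.0)
        rows.append((t, min(ystar, c), float(ystar <= c), x))
    return np.array(rows)                  # columns: T, Y, Delta, X
```

Under \((\beta _0 ,\gamma _0 )=(1,\,0.3)\), the empirical censoring proportion of the generated samples should match the value 0.36 reported above up to Monte Carlo error.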

For each simulated data set, we compute \((\hat{\beta },\,\hat{\gamma })\) using the Nelder–Mead algorithm implemented in the R routine optim (Sect. 2.3). The variance estimator \({\hat{\mathbf {A}}}_0 ^{-1}{\hat{\mathbf {B}}}_0 {\hat{\mathbf {A}}}_0 ^{-1}\) for \((\hat{\beta },\,\hat{\gamma })\) and the 95 % confidence intervals are then computed using the methods of Sect. 3.2. We have checked that the routine successfully locates the correct solution and that the matrix \({\hat{\mathbf {A}}}_0 \) is invertible in all 500 replications (not shown).
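The root-finding step can be sketched by minimizing the squared norm of the estimating function with a derivative-free simplex search, mirroring the use of optim in R (illustrative Python via scipy; `S` stands for the user-supplied estimating function, left abstract here):

```python
import numpy as np
from scipy.optimize import minimize

def solve_estimating_equation(S, theta_init):
    """Solve S(theta) = 0, as in Sect. 2.3, by minimizing ||S(theta)||^2
    with the derivative-free Nelder-Mead simplex method; S is a step
    function of theta, so gradient-based solvers are unsuitable."""
    obj = lambda theta: float(np.sum(np.asarray(S(theta)) ** 2))
    res = minimize(obj, theta_init, method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-12})
    return res.x
```

In practice one would try several initial values, as in the data analysis of Sect. 5, to guard against local minima of the step-valued objective.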

4.2 Simulation results

Table 1 presents the results for estimating \((\beta _0 ,\gamma _0 )\). In all the cases, the proposed estimator provides almost unbiased results. The standard deviation of the estimates decreases as the sample size increases from \(n=150\) to \(n=300\).

Table 1 Simulation results for the proposed estimator \((\hat{\beta },\,\hat{\gamma })\)

Table 1 also compares the standard deviations with the standard errors. For \(n=150\), the standard error \(\mathrm{se}(\hat{\beta })\) tends to overestimate the standard deviation of \(\hat{\beta }\). This upward bias is due to a few outlying variance estimates in the 500 replications and hence does not affect the coverage probability much. The upward bias vanishes as the sample size increases from \(n=150\) to \(n=300\). Similarly, the standard error \(\mathrm{se}(\hat{\gamma })\) is sometimes biased for the standard deviation of \(\hat{\gamma }\) with \(n=150\), but it is nearly unbiased with \(n=300\). These results imply that the kernel approach to variance estimation works well for large samples. Accordingly, the empirical coverage probabilities for the 95 % confidence intervals are all close to the nominal level.

Table 2 compares the performance between the proposed estimator and the estimator of Lai and Ying (1991) with the Gehan weight. The latter is obtained by solving \({\mathbf{U}}_n^G ({\varvec{\beta }})=0\), or equivalently solving \({\mathbf{S}}_n^\mathrm{Logrank} (\beta ,\,\gamma )=0\) under \(\gamma =0\) (Sect. 2.3). As expected from our discussions in Sect. 2.2, Lai and Ying’s estimator produces systematic bias for the cases of \(\gamma _0 \ne 0\) (Table 2). The biases do not vanish as the sample size increases from \(n=150\) to \(n=300\), and the direction of bias is determined by the sign of \(\gamma _0 \). If \(\gamma _0 =0\), both the proposed method and Lai and Ying’s method are nearly unbiased, but the standard deviations of Lai and Ying’s estimator are slightly smaller. However, the loss of efficiency in the proposed estimator is quite modest.

Table 2 Simulation results for comparing the proposed estimator with the estimator of Lai and Ying (1991)
Table 3 Simulation results for the proposed estimator \(\hat{S}_\varepsilon (t;\,\hat{\beta },\,\hat{\gamma })\)

Table 3 examines the performance of \(\hat{S}_\varepsilon (t;\,\hat{\beta },\,\hat{\gamma })\) at selected values of t satisfying \(S_\varepsilon (t)=0.25, 0.50,\, {\text{ and }}\, 0.75\), which can be obtained from Eq. (9). The results show that the estimator \(\hat{S}_\varepsilon (t;\,\hat{\beta },\,\hat{\gamma })\) is nearly unbiased in all the cases. The standard deviation decreases as the sample size n increases from 150 to 300.

5 Data analysis

To illustrate the proposed methodology, we analyze the data collected from the Channing house retirement center available in Appendix 1 of Hyde (1980). The data consist of 462 subjects (97 men and 365 women) whose lifetime is left truncated by the entry age. Among them, 286 subjects are right censored, yielding the censoring proportion 0.62. The covariate “gender” is coded as \(X=1\) for male and \(X=0\) for female. We also include the truncation time (entry age) as a regressor. Our aim is to study the effects of gender and entry age jointly on the lifetime.

We fit the model \(Y^*=\beta _0 X+\gamma _0 T+\varepsilon \) and obtain \(\hat{\beta }=-0.030\) (\(\mathrm{se} = 0.018\)) and \(\hat{\gamma }=0.26\) (\(\mathrm{se} = 0.12\)). Here, the point estimates are obtained by the proposed algorithm (Sect. 2.3) with the initial values \((\beta =0,\,\gamma =0)\); the same estimates are obtained with the alternative initial values \((\hat{\beta }^{LY}=-0.036,\,\gamma =0)\). The 95 % confidence interval for \(\gamma \) is \((0.048,\,0.468)\), which indicates that the entry age and lifetime are positively associated (\(p\) value \(=\) 0.016). This result agrees with Tsai's quasi-independence test, for which the estimated conditional Kendall's tau is 0.088 (\(p\) value \(=\) 0.076) (see Emura and Wang 2010). The fitted value \(\hat{\beta }=-0.030\) suggests that women tend to live longer than men, but the gender difference is not statistically significant since the corresponding 95 % confidence interval (\(-\)0.066, 0.007) covers zero. This observation agrees with previously reported results from Cox proportional hazards regression under the independent left-truncation assumption [see Example 9.4 on p. 313 of Klein and Moeschberger (2003)].

6 Concluding remarks

Existing regression models and methods developed for left-truncated and right-censored data focus on the covariate effect on lifetime under the assumption that lifetime is independent of truncation time. However, empirical evidence has shown that truncation may reveal useful information on the lifetime of interest (Tsai 1990; Jones and Crowley 1992; Chen et al. 1996; Martin and Betensky 2005; Emura and Wang 2010; Rodríguez Girondo and de Uña-Álvarez 2012). In this article, we explicitly include the truncation variable, along with other covariates, in the proposed semiparametric AFT regression model. We implement computing routines for the proposed method in the R package depend.truncation, which is freely available through the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/.

Jones and Crowley (1992) also included the truncation variable as a covariate in their models. However, their paper focused only on testing the assumption of quasi-independence and did not estimate the models.

Another approach to dependent truncation is based on copula models, in which the dependence between truncation time and lifetime can be measured and estimated (Chaieb et al. 2006; Beaudoin and Lakhal-Chaieb 2008; Emura et al. 2011; Emura and Wang 2012; Emura and Murotani 2015). In the absence of covariates, these papers proposed semiparametric inference methods assuming a parametric copula with unspecified marginal distributions for the truncation time and lifetime. The copula-based models could be extended to include covariates along the lines of Braekers and Veraverbeke (2005), Chen (2010), and Emura and Chen (2014), as studied in the competing risks setting. However, one difficulty comes from the joint estimation of the two marginal distribution functions, which are infinite dimensional. Compared with the copula approach, the proposed methodology is much easier to implement since the proposed estimating functions involve only the regression parameters and avoid handling infinite-dimensional parameters. This implies that the estimating functions carry enough information to perform regression analysis without estimating nuisance distributions.

7 Appendix A: Properties of the estimating functions

7.1 Appendix A1: Proof of Equation (2)

Without loss of generality, we assume that \(e_i^Y ({\varvec{\beta }})\ne e_j^Y ({\varvec{\beta }})\) holds for \(i\ne j\). By straightforward calculations,

$$\begin{aligned} {\mathbf{U}}_n^G ({\varvec{\beta }})= & {} \sum \limits _{i=1}^n {\sum \limits _{j=1}^n {\Delta _i ({\mathbf{X}}_i -{\mathbf{X}}_j )I\{e_j^T ({\varvec{\beta }})\le e_i^Y ({\varvec{\beta }})\le e_j^Y ({\varvec{\beta }})\}} } \\= & {} \sum \limits _{i=1}^n {\sum \limits _{j:e_i^Y ({\varvec{\beta }})<e_j^Y ({\varvec{\beta }})} {\Delta _i ({\mathbf{X}}_i -{\mathbf{X}}_j )I\{e_j^T ({\varvec{\beta }})\le e_i^Y ({\varvec{\beta }})\}} } . \end{aligned}$$

If a pair \((i,\,j)\) satisfies \(e_i^Y ({\varvec{\beta }})<e_j^Y ({\varvec{\beta }})\), then \(e_i^Y ({\varvec{\beta }})=\tilde{e}_{ij}^Y ({\varvec{\beta }})=e_i^Y ({\varvec{\beta }})\wedge e_j^Y ({\varvec{\beta }})\). Thus,

$$\begin{aligned} {\mathbf{U}}_n^G ({\varvec{\beta }})=\sum \limits _{i=1}^n {\sum \limits _{j:e_i^Y ({\varvec{\beta }})<e_j^Y ({\varvec{\beta }})} {\Delta _i ({\mathbf{X}}_i -{\mathbf{X}}_j )I\{e_j^T ({\varvec{\beta }})\le \tilde{e}_{ij}^Y ({\varvec{\beta }})\}} } . \end{aligned}$$

Since \(\{e_j^T ({\varvec{\beta }})\le \tilde{e}_{ij}^Y ({\varvec{\beta }})\}\) holds if and only if \(\{{\mathop {e}\limits ^{\smile }}_{ij}^T ({\varvec{\beta }})\le \tilde{e}_{ij}^Y ({\varvec{\beta }})\}\),

$$\begin{aligned} {\mathbf{U}}_n^G ({\varvec{\beta }})=\sum \limits _{i=1}^n {\sum \limits _{j:e_i^Y ({\varvec{\beta }})<e_j^Y ({\varvec{\beta }})} {\Delta _i ({\mathbf{X}}_i -{\mathbf{X}}_j )I\{{\mathop {e}\limits ^{\smile }}_{ij}^T ({\varvec{\beta }})\le \tilde{e}_{ij}^Y ({\varvec{\beta }})\}} } . \end{aligned}$$

As long as a pair \((i,\,j)\) satisfies \(e_i^Y ({\varvec{\beta }})<e_j^Y ({\varvec{\beta }})\), we obtain the identity

$$\begin{aligned} \Delta _i= & {} \Delta _i \Delta _j +\Delta _i (1-\Delta _j )I\{e_i^Y ({\varvec{\beta }})<e_j^Y ({\varvec{\beta }})\}+(1-\Delta _i )\Delta _j I\{e_i^Y ({\varvec{\beta }})>e_j^Y ({\varvec{\beta }})\}\\= & {} Q_{ij} ({\varvec{\beta }}). \end{aligned}$$

This leads to

$$\begin{aligned} {\mathbf{U}}_n^G ({\varvec{\beta }})= & {} \sum \limits _{i=1}^n {\sum \limits _{j:e_i^Y ({\varvec{\beta }})<e_j^Y ({\varvec{\beta }})} {({\mathbf{X}}_i -{\mathbf{X}}_j )} }I\{{\mathop {e}\limits ^{\smile }}_{ij}^T ({\varvec{\beta }})\le \tilde{e}_{ij}^Y ({\varvec{\beta }})\} Q_{ij} ({\varvec{\beta }}) \\= & {} -\sum \limits _{i<j} {({\mathbf{X}}_i -{\mathbf{X}}_j )\mathrm{sgn}\{e_i^Y ({\varvec{\beta }})-e_j^Y ({\varvec{\beta }})\}I\{{\mathop {e}\limits ^{\smile }}_{ij}^T ({\varvec{\beta }})\le \tilde{e}_{ij}^Y ({\varvec{\beta }})\}} Q_{ij} ({\varvec{\beta }}). \\ \end{aligned}$$

\(\square \)

7.2 Appendix A2: The functional behavior of \({\mathbf{S}}_n^\mathrm{Logrank} ({\varvec{\beta }},\,\gamma )\)

We consider the case that \({\mathbf{X}}=X\) is one dimensional, and hence \({\varvec{\beta }}=\beta \). For a given \(\gamma \), \({\mathbf{S}}_n^\mathrm{Logrank} ({\varvec{\beta }},\,\gamma )\) is the sum of

$$\begin{aligned} L_{ij} (\beta )=(X_i -X_j )\mathrm{sgn}\{\varepsilon _i^Y (\beta ,\,\gamma )-\varepsilon _j^Y (\beta ,\,\gamma )\}I\{{\mathop {\varepsilon }\limits ^{\smile }}_{ij}^T (\beta ,\,\gamma )\le \tilde{\varepsilon }_{ij}^Y (\beta ,\,\gamma )\}O_{ij} (\beta ,\,\gamma ), \end{aligned}$$

for \(i<j\). Suppose \(O_{ij} ({\varvec{\beta }},\,\gamma )=1\); otherwise \(L_{ij} (\beta )=0\) contributes nothing to \({\mathbf{S}}_n^\mathrm{Logrank} ({\varvec{\beta }},\,\gamma )\). Note that the range of \(\beta \) satisfying the condition \({\mathop {\varepsilon }\limits ^{\smile }}_{ij}^T (\beta ,\gamma )\le \tilde{\varepsilon }_{ij}^Y (\beta ,\gamma )\) is always a closed interval. The function \(\mathrm{sgn}\{\varepsilon _i^Y (\beta ,\gamma )-\varepsilon _j^Y (\beta ,\gamma )\}\) is a step function that crosses zero at \(\beta =\{Y_i -Y_j -\gamma (T_i -T_j )\}/(X_i -X_j )\). Therefore, \(L_{ij} (\beta )\) is a non-monotone step function with at most three distinct jump points.

8 Appendix B: Proofs for asymptotic analysis

8.1 Appendix B.1: Proof of Equation (4)

By definition,

$$\begin{aligned}&2n^{-2}{\mathbf{S}}_n^\mathrm{Logrank} ({\varvec{\beta }},\,\gamma ) \\&\quad =-n^{-2}\sum \limits _{i,j} {({\mathbf{X}}_i -{\mathbf{X}}_j )\mathrm{sgn}\{\varepsilon _i^Y ({\varvec{\beta }},\,\gamma )-\varepsilon _j^Y ({\varvec{\beta }},\,\gamma )\}I\{{\mathop {\varepsilon }\limits ^{\smile }}_{ij}^T ({\varvec{\beta }},\,\gamma )\le \tilde{\varepsilon }_{ij}^Y ({\varvec{\beta }},\,\gamma )\}O_{ij} ({\varvec{\beta }},\,\gamma )} \\&\quad =-\int \!\!\!\int {\left[ {({\mathbf{x}}_1 -{\mathbf{x}}_2 )\mathrm{sgn}\{y_1 -{{\varvec{\beta }'\mathbf{x}}}_1 -\gamma t_1 -(y_2 -{{\varvec{\beta }'\mathbf{x}}}_2 -\gamma t_2 )\}} \right. } \\&\qquad \times \, I\{(t_1 -{{\varvec{\beta }'\mathbf{x}}}_1 -\gamma t_1 )\vee (t_2 -{{\varvec{\beta }'\mathbf{x}}}_2 -\gamma t_2 )\le (y_1 -{{\varvec{\beta }'\mathbf{x}}}_1 -\gamma t_1 )\wedge (y_2 -{{\varvec{\beta }'\mathbf{x}}}_2 -\gamma t_2 )\} \\&\qquad \left. \times \, \{\delta _1 \delta _2 +\delta _1 (1-\delta _2 )I(y_1 -{{\varvec{\beta }'\mathbf{x}}}_1 -\gamma t_1 <y_2 -{{\varvec{\beta }'\mathbf{x}}}_2 -\gamma t_2 )\right. \\&\left. \qquad +\,(1-\delta _1 )\delta _2 I(y_1 -{{\varvec{\beta }'\mathbf{x}}}_1 -\gamma t_1 >y_2 -{{\varvec{\beta }'\mathbf{x}}}_2 -\gamma t_2 )\} \right] \\&\qquad \times \,dF_n (t_1 ,\,y_1 ,\,\delta _1 ,\,{\mathbf{x}}_1 )dF_n (t_2 ,\,y_2 ,\,\delta _2 ,\,{\mathbf{x}}_2 ). \end{aligned}$$

Also, using similar calculations for \(2n^{-2}S_n^{Kendall} ({\varvec{\beta }},\gamma )\), we obtain Equation (4).

8.2 Appendix B.2: Proof for Theorem 1

The consistency proof follows from the general theory of Z-estimators. To apply Theorem 5.9 of Van Der Vaart (1998), one needs to verify the following conditions:

(i) \(\sup _{({\varvec{\beta }},\,\gamma )\in \Theta } \vert \vert \Phi (F_n ;{\varvec{\beta }},\,\gamma )-\Phi (F;{\varvec{\beta }},\,\gamma )\vert \vert \) converges in probability to zero;

(ii) for every \(\varepsilon >0\), \(\inf _{\vert \vert ({\varvec{\beta }},\,\gamma )-({\varvec{\beta }}_0 ,\,\gamma _0 )\vert \vert _2 \ge \varepsilon } \vert \vert \Phi (F;{\varvec{\beta }},\,\gamma )\vert \vert _2 >0=\vert \vert \Phi (F;{\varvec{\beta }}_0 ,\,\gamma _0 )\vert \vert _2 \).

To prove (i), let \(D\{(-\infty ,\,\infty )^{p+3}\}\) be the collection of all right-continuous and bounded functions on \((-\infty ,\,\infty )^{p+3}\) with left limits, so that \(F,\,F_n \in D\{(-\infty ,\,\infty )^{p+3}\}\), equipped with the norm \(\sup _{x\in (-\infty ,\,\infty )^{p+3}} \vert f(x)\vert \) for \(f\in D\{(-\infty ,\,\infty )^{p+3}\}\). Also, let \(l^\infty (\Theta )\) be the collection of all bounded functions on \(\Theta \) with the norm \(\sup _{\theta \in \Theta } \vert \vert H(\theta )\vert \vert \) for \(H\in l^\infty (\Theta )\). It is well known from the Glivenko–Cantelli theorem that \(F_n \) converges in probability to F in the normed space \(D\{(-\infty ,\,\infty )^{p+3}\}\). Also, it can be shown that the map \(D\{(-\infty ,\,\infty )^{p+3}\}\mapsto l^\infty (\Theta )\), defined as

$$\begin{aligned}&F\mapsto \Phi (F;{\varvec{\beta }},\,\gamma )\\&\quad =\int \!\!\!\int {{\mathbf{h}}\{(t_1 ,\,y_1 ,\,\delta _1 ,\,{\mathbf{x}}_1 ),\,(t_2 ,\,y_2 ,\,\delta _2 ,\,{\mathbf{x}}_2 );{\varvec{\beta }},\,\gamma \}dF(t_1 ,\,y_1 ,\,\delta _1 ,\,{\mathbf{x}}_1 )dF(t_2 ,\,y_2 ,\,\delta _2 ,\,{\mathbf{x}}_2 )}, \end{aligned}$$

is continuous. By the continuous mapping theorem,

$$\begin{aligned}&\mathop {\sup }\limits _{({\varvec{\beta }},\,\gamma )\in \Theta } \left| {\int \!\!\!\int {{\mathbf{h}}\{(t_1 ,\,y_1 ,\,\delta _1 ,\,{\mathbf{x}}_1 ),\,(t_2 ,\,y_2 ,\,\delta _2 ,\,{\mathbf{x}}_2 );{\varvec{\beta }},\,\gamma \}dF_n (t_1 ,\,y_1 ,\,\delta _1 ,\,{\mathbf{x}}_1 )dF_n (t_2 ,\,y_2 ,\,\delta _2 ,\,{\mathbf{x}}_2 )} } \right. \\&\qquad -\left. {\int \!\!\!\int {{\mathbf{h}}\{(t_1 ,\,y_1 ,\,\delta _1 ,\,{\mathbf{x}}_1 ),\,(t_2 ,\,y_2 ,\,\delta _2 ,\,{\mathbf{x}}_2 );{\varvec{\beta }},\,\gamma \}dF(t_1 ,\,y_1 ,\,\delta _1 ,\,{\mathbf{x}}_1 )dF(t_2 ,\,y_2 ,\,\delta _2 ,\,{\mathbf{x}}_2 )} } \right| \end{aligned}$$

converges in probability to zero, which implies (i). The validity of (ii) is due to Assumption 2. \(\square \)

8.3 Appendix B.3: Proof for asymptotic normality

Note that the random vector \(\phi _F \{(T_j ,\,Y_j ,\,\Delta _j ,\,{\mathbf{X}}_j );\,{\varvec{\beta }},\,\gamma \}\) in Eq. (6) is bounded since \({\mathbf{h}}\) is bounded under Assumption 1. Hence, the dominated convergence theorem shows

$$\begin{aligned} E[\phi _F \{(T,\,Y,\,\Delta ,\,{\mathbf{X}});\,{\varvec{\beta }},\,\gamma \}-\phi _F \{(T,\,Y,\,\Delta ,\,{\mathbf{X}});\,{\varvec{\beta }}_0 ,\,\gamma _0 \}]^2\rightarrow 0, \end{aligned}$$

as \(({\varvec{\beta }},\,\gamma )\rightarrow ({\varvec{\beta }}_0 ,\,\gamma _0 )\). By Lemma 3.3.5 of Van Der Vaart and Wellner 1996 and under Assumption 4, we have

$$\begin{aligned}&n^{1/2}\{\Phi (F_n ;{{\hat{\varvec{\beta }}}},\,\hat{\gamma })-\Phi (F;{{\hat{\varvec{\beta }}}},\,\hat{\gamma })\}-n^{1/2}\{\Phi (F_n ;{\varvec{\beta }}_0 ,\,\gamma _0 )-\Phi (F;{\varvec{\beta }}_0 ,\,\gamma _0 )\}\,\, \\&\quad =o_P (1+n^{1/2}\vert \vert ({{\hat{\varvec{\beta }}}}-{\varvec{\beta }}_0 ,\,\hat{\gamma }-\gamma _0 )\vert \vert ). \end{aligned}$$

Since \(\Phi (F_n ;{{\hat{\varvec{\beta }}}},\,\hat{\gamma })=\Phi (F;{\varvec{\beta }}_0 ,\,\gamma _0 )={\mathbf{0}}\), under Assumptions 3 and 4,

$$\begin{aligned} {\mathbf{0}}= & {} n^{1/2}\Phi (F_n ;{{\hat{\varvec{\beta }}}},\,\hat{\gamma })-n^{1/2}\Phi (F;{\varvec{\beta }}_0 ,\,\gamma _0 ) \\= & {} n^{1/2}\{\Phi (F_n ;{{\hat{\varvec{\beta }}}},\,\hat{\gamma })-\Phi (F;{{\hat{\varvec{\beta }}}},\,\hat{\gamma })\} \\&+\,n^{1/2}\{\Phi (F;{{\hat{\varvec{\beta }}}},\,\hat{\gamma })-\Phi (F;{\varvec{\beta }}_0 ,\,\gamma _0 )\} \\= & {} n^{1/2}\{\Phi (F_n ;{{\hat{\varvec{\beta }}}},\,\hat{\gamma })-\Phi (F;{{\hat{\varvec{\beta }}}},\,\hat{\gamma })\}-n^{1/2}\{\Phi (F_n ;{\varvec{\beta }}_0 ,\,\gamma _0 )-\Phi (F;{\varvec{\beta }}_0 ,\,\gamma _0 )\} \\&+\,n^{1/2}\{\Phi (F_n ;{\varvec{\beta }}_0 ,\,\gamma _0 )-\Phi (F;{\varvec{\beta }}_0 ,\,\gamma _0 )\} \\&+\,n^{1/2}\{\Phi (F;{{\hat{\varvec{\beta }}}},\,\hat{\gamma })-\Phi (F;{\varvec{\beta }}_0 ,\,\gamma _0 )\} \\= & {} o_P (1+n^{1/2}\vert \vert ({{\hat{\varvec{\beta }}}}-{\varvec{\beta }}_0 ,\,\hat{\gamma }-\gamma _0 )\vert \vert ) \\&+\,n^{-1/2}\sum \limits _{j=1}^n {\phi _F \{(T_j ,\,Y_j ,\,\Delta _j ,\,{\mathbf{X}}_j );\,{\varvec{\beta }}_0 ,\,\gamma _0 \}} \\&+\,{\mathbf{A}}_0 n^{1/2}({{\hat{\varvec{\beta }}}}-{\varvec{\beta }}_0 ,\,\hat{\gamma }-\gamma _0 {)}', \end{aligned}$$

where the last equation uses Eq. (5) and the differentiability of \(\Phi (F;{\varvec{\beta }},\,\gamma )\) at \(({\varvec{\beta }}_0 ,\,\gamma _0 )\). The preceding formula implies that \(({{\hat{\varvec{\beta }}}},\,\hat{\gamma })\) is \(\sqrt{n} \)-consistent for \(({\varvec{\beta }}_0 ,\,\gamma _0 )\) and, therefore, Eq. (7) holds [see p.311 of Van Der Vaart and Wellner (1996)]. \(\square \)

8.4 Appendix B.4: Consistency of the product-limit estimator

We show that \(\sup _{t\in [a,b]} \vert \hat{S}_\varepsilon (t;\,{{\hat{\varvec{\beta }}}},\,\hat{\gamma })-S_\varepsilon (t)\vert \) converges in probability to zero. It is useful to write \(S_\varepsilon (t)=\prod \nolimits _{s\le t} {\{1-d\Lambda _\varepsilon (s)\}} \), where \(\Lambda _\varepsilon (t)=-\log S_\varepsilon (t)\) is the cumulative hazard function. Similarly, \(\hat{S}_\varepsilon (t;\,{\varvec{\beta }},\,\gamma )=\prod \nolimits _{s\le t} {\{1-d\hat{\Lambda }(s;{\varvec{\beta }},\,\gamma )\}} \), where

$$\begin{aligned} d\hat{\Lambda }(s;{\varvec{\beta }},\,\gamma )=R(s;{\varvec{\beta }},\,\gamma )^{-1}\sum \nolimits _j {I(\varepsilon _j^Y ({\varvec{\beta }},\,\gamma )=s,\,\Delta _j =1)} \end{aligned}$$

and \(R(s;{\varvec{\beta }},\,\gamma )=\sum \nolimits _j {I\{\varepsilon _j^T ({\varvec{\beta }},\,\gamma )\le s\le \varepsilon _j^Y ({\varvec{\beta }},\,\gamma )\}} \). Since the product integration is a continuous map, it suffices to show that \(\sup _{t\in [a,b]} \vert \hat{\Lambda }_\varepsilon (t;\,{{\hat{\varvec{\beta }}}},\,\hat{\gamma })-\Lambda _\varepsilon (t)\vert \) converges in probability to zero. Let \(\Lambda _\varepsilon ^*(t)=\int _{-\infty }^t {J(s)d\Lambda _\varepsilon (s)} \), where \(J(s)=I\{R(s;{\varvec{\beta }}_0 ,\,\gamma _0 )>0\}\), and

$$\begin{aligned}&\Lambda _\varepsilon (t,{\varvec{\beta }},\,\gamma )\\&\quad =\int \limits _{-\infty }^t \frac{E[I(T-{{\varvec{\beta }'\mathbf{X}}}-\gamma T\le s)f_\varepsilon \{s+({\varvec{\beta }}-{\varvec{\beta }}_0 {)}'{\mathbf{X}}+(\gamma -\gamma _0 )T\}S_C (s+{{\varvec{\beta }'\mathbf{X}}}+\gamma T)]}{E[I(T-{{\varvec{\beta }'\mathbf{X}}}-\gamma T\le s)S_\varepsilon \{s+({\varvec{\beta }}-{\varvec{\beta }}_0 {)}'{\mathbf{X}}+(\gamma -\gamma _0 )T\}S_C (s+{{\varvec{\beta }'\mathbf{X}}}+\gamma T)]}ds. \end{aligned}$$

It is easy to see that

$$\begin{aligned} \Lambda _\varepsilon (t,{\varvec{\beta }}_0 ,\gamma _0 )=\int \nolimits _{-\infty }^t {f_\varepsilon (s)S_\varepsilon (s)^{-1}ds} =\Lambda _\varepsilon (t). \end{aligned}$$

Also, let \(\Lambda _\varepsilon ^*(t,{\varvec{\beta }},\,\gamma )=\int _{-\infty }^t {J(s)d\Lambda _\varepsilon (s,{\varvec{\beta }},\,\gamma )} \). Now, for \(\delta \) from Assumption 5, we have

$$\begin{aligned}&\vert \hat{\Lambda }_\varepsilon (t;\,{{\hat{\varvec{\beta }}}},\,\hat{\gamma })-\Lambda _\varepsilon (t)\vert \\&\quad \le \sup _{\vert \vert ({\varvec{\beta }},\,\gamma )-({\varvec{\beta }}_0 ,\,\gamma _0 )\vert \vert \le \delta } \vert \hat{\Lambda }_\varepsilon (t;\,{\varvec{\beta }},\,\gamma )-\Lambda _\varepsilon ^*(t;\,{\varvec{\beta }},\,\gamma )\vert +\vert \Lambda _\varepsilon ^*(t;\,{{\hat{\varvec{\beta }}}},\,\hat{\gamma })-\Lambda _\varepsilon ^*(t)\vert \\&\qquad +\,\vert \Lambda _\varepsilon ^*(t)-\Lambda _\varepsilon (t)\vert . \\ \end{aligned}$$

Under Assumption 5, and by the Glivenko–Cantelli theorem, it follows that

$$\begin{aligned} \sup _{t\in [a,b]} \sup _{\vert \vert ({\varvec{\beta }},\,\gamma )-({\varvec{\beta }}_0 ,\,\gamma _0 )\vert \vert \le \delta } \vert \hat{\Lambda }_\varepsilon (t;\,{\varvec{\beta }},\,\gamma )-\Lambda _\varepsilon ^*(t;\,{\varvec{\beta }},\,\gamma )\vert \end{aligned}$$

converges in probability to zero. Since \(\Lambda _\varepsilon (t,{\varvec{\beta }},\,\gamma )\) is continuous at \(({\varvec{\beta }}_0 ,\,\gamma _0 )\), the continuous mapping theorem applied to Theorem 1 shows that \(\sup _{t\in [a,b]} \vert \Lambda _\varepsilon ^*(t;\,{{\hat{\varvec{\beta }}}},\,\hat{\gamma })-\Lambda _\varepsilon ^*(t)\vert \) converges in probability to zero. Note that \(n^{-1}R(t;{\varvec{\beta }}_0 ,\,\gamma _0 )\) converges in probability to \(\pi _\varepsilon (t,{\varvec{\beta }}_0 ,\,\gamma _0 )>0\), uniformly over \(t\in [a,b]\). Hence, \(I\{R(t;{\varvec{\beta }}_0 ,\,\gamma _0 )=0\}\) converges in probability to zero as well. Finally,

$$\begin{aligned} \vert \Lambda _\varepsilon ^*(t)-\Lambda _\varepsilon (t)\vert =\left| {\int \nolimits _{-\infty }^t {I\{R(s;{\varvec{\beta }}_0 ,\,\gamma _0 )=0\}d\Lambda _\varepsilon (s)} } \right| , \end{aligned}$$

converges in probability to zero, uniformly over \(t\in [a,\,b]\). \(\square \)
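The product-limit estimator analyzed above, with the left-truncation-adjusted risk set \(R(s;{\varvec{\beta }},\,\gamma )\), can be sketched as follows (illustrative Python on the residual scale; the function name is ours, and inputs are the fitted residuals and censoring indicators):

```python
import numpy as np

def product_limit(eps_T, eps_Y, delta, t):
    """Product-limit estimate of S_eps(t) = prod_{s <= t} {1 - dLambda(s)},
    where dLambda(s) = (#failures at s) / R(s) and the risk set
    R(s) = #{j : eps_T_j <= s <= eps_Y_j} adjusts for left truncation."""
    s_hat = 1.0
    for s in np.unique(eps_Y[(delta == 1) & (eps_Y <= t)]):
        risk = np.sum((eps_T <= s) & (s <= eps_Y))   # R(s)
        d = np.sum((eps_Y == s) & (delta == 1))      # failures at s
        s_hat *= 1.0 - d / risk
    return s_hat
```

With no truncation (all `eps_T` at \(-\infty \)) and no censoring this reduces to the ordinary Kaplan–Meier estimator.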

9 Appendix C: Bandwidth calculation

The jumps of \({\mathbf{S}}_n^\mathrm{Logrank} ({\varvec{\beta }},\,\gamma )\) with respect to \(\beta _k \) given the other components satisfy

$$\begin{aligned} Y_i -{\hat{\varvec{\beta }}'}_{(-k)} {\mathbf{X}}_{i(-k)} -\beta _k X_{i,k} -\hat{\gamma }T_i =Y_j -{\hat{\varvec{\beta }}'}_{(-k)} {\mathbf{X}}_{j(-k)} -\beta _k X_{j,k} -\hat{\gamma }T_j, \end{aligned}$$

where \(X_{i,k} \) is the kth component of \({\mathbf{X}}_i \), and \({\mathbf{X}}_{i(-k)} \) is the \((p-1)\)-dimensional vector of the remaining components. Accordingly, the jumps are identified at

$$\begin{aligned} \frac{Y_i -Y_j -{\hat{\varvec{\beta }}'}_{(-k)} ({\mathbf{X}}_{i(-k)} -{\mathbf{X}}_{j(-k)} )-\hat{\gamma }(T_i -T_j )}{X_{i,k} -X_{j,k} }\approx \frac{\varepsilon _i^Y ({{\hat{\varvec{\beta }}}},\,\hat{\gamma })-\varepsilon _j^Y ({{\hat{\varvec{\beta }}}},\,\hat{\gamma })}{X_{i,k} -X_{j,k} }{\equiv }\beta _{(i,j),k}^\#. \end{aligned}$$

Hence, we define \(S_k^2 \) as the sample variance of all \(\beta _{(i,j),k}^\# \), with \(i<j\), satisfying \(I\{{\mathop {\varepsilon }\limits ^{\smile }}_{ij}^T ({{\hat{\varvec{\beta }}}},\,\hat{\gamma })\le \tilde{\varepsilon }_{ij}^Y ({{\hat{\varvec{\beta }}}},\,\hat{\gamma })\}O_{ij} ({{\hat{\varvec{\beta }}}},\,\hat{\gamma })=1\) and \(X_{i,k} \ne X_{j,k} \). The \(\text{ IQR }_k \) is defined similarly.

The bandwidth \(b_{p+1} =0.5\min (\,S_{p+1} ,\,\text{ IQR }_{p+1} /1.34\,)n^{-1/5} \) is also obtained, where \(S_{p+1}^2 \) and \(\text{ IQR }_{p+1} \) are calculated with \(\gamma _{(i,j)}^\# \equiv \{\varepsilon _i^Y ({{\hat{\varvec{\beta }}}},\,\hat{\gamma })-\varepsilon _j^Y ({{\hat{\varvec{\beta }}}},\,\hat{\gamma })\}/(\,T_i -T_j )\).
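The pairwise jump locations \(\beta _{(i,j),k}^\# \) can be collected as follows (illustrative Python for a single covariate column; for simplicity this sketch omits the comparability condition \(I\{{\mathop {\varepsilon }\limits ^{\smile }}_{ij}^T \le \tilde{\varepsilon }_{ij}^Y \}O_{ij} =1\) that the definition above requires):

```python
import numpy as np

def jump_locations(eps_Y, x_k):
    """Candidate jump locations beta#_{(i,j),k} = (eps_Y_i - eps_Y_j) /
    (X_{i,k} - X_{j,k}) over pairs i < j with X_{i,k} != X_{j,k}; their
    sample SD and IQR feed the rule b_k = 0.5*min(S_k, IQR_k/1.34)*n^(-1/5)."""
    out = []
    n = len(eps_Y)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x_k[i] - x_k[j]
            if dx != 0.0:                 # ties in X_{.,k} yield no jump
                out.append((eps_Y[i] - eps_Y[j]) / dx)
    return np.array(out)
```

The analogous quantities \(\gamma _{(i,j)}^\# \) for \(\hat{b}_{p+1} \) are obtained by replacing the covariate column with the truncation times \(T_i \).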