1 Introduction

Time to event is the time from a given origin to the occurrence of the event of interest. Time to event data occur in many applied fields. Examples include duration analysis (e.g. time to first job after graduation), reliability analysis (e.g. lifetime of a mechanical component) and survival analysis (e.g. time from onset to death). Other applied fields include economics, insurance, demography, biology, public health, epidemiology and veterinary medicine.

Within the context of survival analysis, ‘survival data’ is standard terminology for time to event data. The outcome of the event can be ‘good’ (e.g. time to pain relief, time to recovery, time to cure) or ‘bad’ (e.g. time to first relapse, time to death, time from diagnosis to onset).

Time to event data can be univariate (e.g. time from onset of virus infection to cure) or multivariate (e.g. the lifetimes of monozygotic male twins; time to blindness in the left and right eye of diabetic retinopathy patients; time to tumor in a litter-matched tumorigenesis experiment in rats (one treated and two control rats per litter)).

The multivariate examples we mentioned are so-called parallel multivariate data: parallel data sets follow several items/subjects/animals simultaneously. Other types of multivariate data structures (such as longitudinal data and repeated measures data) are discussed in Hougaard (2000). In this survey we focus on parallel multivariate data.

In this paper we focus on univariate and (parallel) bivariate survival data. Given a survival time \(T \ge 0\) or a bivariate survival vector \((T_1,T_2)\) with \(T_1 \ge 0\) and \(T_2 \ge 0\), primary interest is often in estimating the survival distribution \(S(t) = P(T > t)\) (univariate setting) or \(S(t_1,t_2) = P(T_1> t_1, T_2 > t_2)\) (bivariate setting). Survival data are typically subject to right censoring; for such data we review, in Sect. 2 of the paper, nonparametric estimation of \(S(t) = P(T > t)\). In Sect. 3 we review nonparametric estimation of \(S(t_1,t_2) = P(T_1> t_1, T_2 > t_2)\). After a general introduction (Sect. 3.3), we discuss univariate censoring and one-component censoring. We explain how the censoring scheme determines the definition of the nonparametric estimator, and for each estimator we study the asymptotic normality and give an explicit analytic expression for the asymptotic variance (contributing to an open question in Hougaard (2000), p. 457, where he writes ‘generally, expressions for the variance are not available’).

The advantage of nonparametric estimation when compared to (semi)parametric estimation is that no underlying model assumptions are made (i.e. the inference is completely data driven). Also note that even if (semi)parametric models are used to estimate the survival distribution, nonparametric estimators remain instrumental for goodness-of-fit purposes.

2 Nonparametric estimation of the univariate survival function

2.1 The right random censoring model

In survival analysis, the main object of interest is a nonnegative random variable T, called survival time (or lifetime, failure time, event time,...).

A typical feature is that T is not always observed. Instead of T one sometimes observes some other nonnegative random variable C, called censoring time. In the right random censorship model the observable variables are

$$\begin{aligned} Y = T \wedge C\ \ \ \ \text{ and }\ \ \ \ \delta = I(T \le C) \end{aligned}$$

where \(a \wedge b = \min (a,b)\) and where I is the indicator function, defined for every event A as \(I(A) = 1\) if and only if A holds and zero otherwise.

The right random censorship model assumes that

T and C are independent,

the independent censoring assumption. Let \(T_1, \ldots , T_n {\mathop {\sim }\limits ^{i.i.d.}} T\) be independent and identically distributed (i.i.d.) random variables with distribution function F and survival function \(S= 1-F\) and \(C_1, \ldots , C_n {\mathop {\sim }\limits ^{i.i.d.}} C\) with distribution function G. The observations in the model are \((Y_i,\delta _i)\), with \(Y_i = T_i \wedge C_i\) and \(\delta _i = I(T_i \le C_i)\), \(i = 1,\ldots , n\). We clearly have \(Y_1,\ldots , Y_n {\mathop {\sim }\limits ^{i.i.d.}} Y\) with distribution function H and due to the independence assumption

$$\begin{aligned} 1 - H(t) = (1-F(t)) (1-G(t)). \end{aligned}$$
(1)

The estimation should therefore be based on \((Y_i, \delta _i)\), \(i = 1,\ldots , n\). For the uncensored observations we define the following subdistribution function

$$\begin{aligned} H^1(t)= & {} P(Y \le t, \delta = 1) = P(T \le t, T \le C) \nonumber \\= & {} \int \limits _0^t \int \limits _{y-}^{\infty } G(dx) F(dy) = \int \limits _0^t (1-G(y-)) F(dy). \end{aligned}$$
(2)

We also introduce the following notation. For any distribution function L, the upper endpoint of support is denoted by \(\tau _L\), i.e. \(\tau _L = \inf \{t: L(t) = 1\}\). From (1) it follows that \(\tau _H = \tau _F \wedge \tau _G\).
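To fix ideas, the right random censorship model is straightforward to simulate. The following minimal Python sketch (our own illustration; the exponential choices for F and G are assumptions made here, not part of the text) generates observations \((Y_i, \delta _i)\) and checks relation (1) empirically.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
T = rng.exponential(scale=2.0, size=n)      # survival times, 1 - F(t) = exp(-t/2)
C = rng.exponential(scale=3.0, size=n)      # censoring times, 1 - G(t) = exp(-t/3), independent of T

Y = np.minimum(T, C)                        # observed time Y = T ^ C
delta = (T <= C).astype(int)                # censoring indicator delta = I(T <= C)

# empirical check of (1): 1 - H(t) = (1 - F(t)) (1 - G(t))
t = 1.5
print(np.mean(Y > t))                       # empirical 1 - H(t)
print(np.exp(-t / 2.0) * np.exp(-t / 3.0))  # (1 - F(t)) (1 - G(t)) = exp(-1.25) ~ 0.2865
```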

2.2 Identifiability

An important preliminary question is of course the identifiability of the survival function, i.e. the possibility of obtaining the survival function of T from the observations on \(Y = T \wedge C\) and \(\delta = I(T \le C)\).

Theorem 1 below shows that the independent censoring assumption on T and C is sufficient for identifiability of the survival function of T. We refer to Tsiatis (1975) for the weaker assumption that knowledge of the copula function of T and C is also sufficient. See also Ebrahimi et al. (2003).

Theorem 1

Assume that T and C are independent with continuous distribution functions F and G. Then, for \(t < \tau _H\),

$$\begin{aligned} S(t) = \exp \left\{ - \int \limits _0^t \frac{H^1(dy)}{1-H(y)}\right\} . \end{aligned}$$

Proof

From (2) and the continuity of G we obtain

$$\begin{aligned} H^1(t) = \int \limits _0^t (1 - G(y)) F(dy). \end{aligned}$$

Together with (1) this gives

$$\begin{aligned} \frac{H^1(dy)}{1-H(y)} = \frac{F(dy)}{1-F(y)} \end{aligned}$$

or, for \(t < \tau _H\), using that \(\int \nolimits _0^t \frac{F(dy)}{1-F(y)} = -\ln (1-F(t))\) for continuous F,

$$\begin{aligned} 1 - F(t) = \exp \left( -\int \limits _0^t \frac{H^1(dy)}{1-H(y)}\right) . \end{aligned}$$

2.3 The Kaplan–Meier estimator

The classical nonparametric estimator of the survival function under right censoring is the estimator of Kaplan and Meier (1958), also called the product-limit estimator (see Supplementary Material) or the nonparametric maximum likelihood estimator (see Johansen 1978). For values of t in the range of the data, it is defined as

$$\begin{aligned} \widehat{S}(t) = 1 - \widehat{F}(t) = \prod \limits _{\begin{array}{c} i=1\\ Y_i \le t, \delta _i = 1 \end{array}}^{n} \left( 1- \frac{d_i}{n_i}\right) \end{aligned}$$
(3)

with \(d_i\) the number of events and \(n_i\) the number of subjects/objects at risk at time \(Y_i\), \(i = 1,\ldots , n\).

For continuous survival distributions, ties occur with probability zero, i.e. only one event can happen at a time. We then have \(d_i = 0\) if \(\delta _i = 0\) and \(d_i = 1\) if \(\delta _i = 1\). This is the situation that we consider in the sequel: we assume that T and C have continuous distribution functions.

Hence the proposed estimator is a step function with jumps at the event times, i.e. the \(Y_i\) having \(\delta _i = 1\), \(i = 1,\ldots , n\).

The Kaplan–Meier estimator in (3) can then be rewritten as

$$\begin{aligned} \begin{array}{ll} \widehat{S}(t) &{}= \displaystyle \prod \limits _{\begin{array}{c} i=1\\ Y_i \le t, \delta _i = 1 \end{array}}^{n}\left( 1- \frac{1}{n_i}\right) \\ &{}= \displaystyle \prod \limits _{\begin{array}{c} i=1\\ Y_{(i)} \le t \end{array}}^{n} \left( 1- \frac{1}{n- i+1}\right) ^{\delta _{(i)}} I(t < Y_{(n)}) \end{array} \end{aligned}$$
(4)

with \(Y_{(1)} \le \ldots \le Y_{(n)}\) the ordered \(Y_i\)’s and \(\delta _{(1)},\ldots , \delta _{(n)}\) the corresponding indicators.
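For illustration, a minimal Python sketch of the product form (4) could look as follows (the helper name km_survival is ours; distinct observation times are assumed, in line with the continuity assumption above).

```python
import numpy as np

def km_survival(y, delta, t):
    """Kaplan-Meier estimate S_hat(t) via the product form (4);
    assumes distinct observation times (continuous case, no ties)."""
    order = np.argsort(y)
    y, delta = y[order], delta[order]
    n = len(y)
    at_risk = n - np.arange(n)                      # n - i + 1 for the i-th order statistic
    factors = np.where((y <= t) & (delta == 1),
                       1.0 - 1.0 / at_risk, 1.0)    # (1 - 1/(n-i+1))^{delta_(i)}
    return float(np.prod(factors))

# small illustration with censored (delta = 0) and uncensored (delta = 1) times
y = np.array([2.0, 3.5, 1.0, 4.2, 2.7])
d = np.array([1, 0, 1, 1, 0])
print(km_survival(y, d, 3.0))   # 0.8 * 0.75 = 0.6
```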

The Kaplan–Meier estimator can also be represented as a sum:

$$\begin{aligned} \widehat{S}(t) = 1 - \widehat{F}(t) = \sum \limits _ {i=1}^n \Delta _i I(Y_{(i)} > t)\end{aligned}$$

where \(\Delta _i\) is the jump at \(Y_{(i)}\). We have

$$\begin{aligned} \Delta _i= & {} \widehat{S}(Y_{(i)}-) - \widehat{S}(Y_{(i)}) = \prod \limits _{j=1}^{i-1} \left( \frac{n-j}{n-j+1}\right) ^{\delta _{(j)}} - \prod \limits _{j=1}^i \left( \frac{n-j}{n-j+1}\right) ^{\delta _{(j)}}\\= & {} \frac{\delta _{(i)}}{n-i+1} \prod \limits _{j=1}^{i-1} \left( \frac{n-j}{n-j+1}\right) ^{\delta _{(j)}} = \frac{\delta _{(i)}}{n} \prod \limits _{j=1}^{i-1} \left( \frac{n-j+1}{n-j}\right) ^{1-\delta _{(j)}}\\= & {} \frac{\delta _{(i)}}{n} \frac{1}{1- \widehat{G}(Y_{(i)}-)} \end{aligned}$$

with \(\widehat{G}\) the Kaplan–Meier estimator for G, i.e. the Kaplan–Meier estimator based on the sample \((Y_i, 1-\delta _i)\), \(i=1,\ldots , n\). Hence,

$$\begin{aligned} \widehat{S}(t)= & {} \frac{1}{n} \sum \limits _{i=1}^n \frac{\delta _{(i)}}{1- \widehat{G} (Y_{(i)}-)} I(Y_{(i)}> t)\\= & {} \frac{1}{n} \sum \limits _{i=1}^n \frac{\delta _i}{1-\widehat{G} (Y_i - )} I(Y_i > t)\\\equiv & {} \widehat{S}_{RR}(t), \end{aligned}$$

which is the estimator of Robins and Rotnitzky (1992), also called the ‘inverse-probability-of-censoring weighted average’. See also Satten and Datta (2001).

Remark 1

The Robins-Rotnitzky estimator stems from the identifying equation approach (see Supplementary Material).
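The identity \(\widehat{S} = \widehat{S}_{RR}\) derived above can be checked numerically. A hedged sketch (helper name ours): the weights \(1-\widehat{G}(Y_{(i)}-)\) are the left limits of the Kaplan–Meier estimator based on \((Y_i, 1-\delta _i)\).

```python
import numpy as np

def rr_survival(y, delta, t):
    """Robins-Rotnitzky IPW form: (1/n) sum_i delta_(i) I(Y_(i) > t) / (1 - Ghat(Y_(i)-)),
    with Ghat the Kaplan-Meier estimator based on (Y_i, 1 - delta_i)."""
    order = np.argsort(y)
    y, delta = y[order], delta[order]
    n = len(y)
    at_risk = n - np.arange(n)
    g_surv = np.cumprod(1.0 - (1 - delta) / at_risk)   # 1 - Ghat(Y_(i))
    g_left = np.concatenate(([1.0], g_surv[:-1]))      # left limits 1 - Ghat(Y_(i)-)
    return float(np.mean(delta * (y > t) / g_left))

y = np.array([2.0, 3.5, 1.0, 4.2, 2.7])
d = np.array([1, 0, 1, 1, 0])
print(rr_survival(y, d, 3.0))   # 0.6, identical to the Kaplan-Meier value on the same data
```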

2.4 The Lin-Ying estimator

In this section we present an estimator which is new in the univariate case. It was proposed in the bivariate case by Lin and Ying (1993), but their simple idea can also be used in the univariate situation.

If T and C are independent, then for \(t < \tau _H\), (1) implies

$$\begin{aligned} S(t) = \frac{1-H(t)}{1-G(t)} \end{aligned}$$

and a simple nonparametric estimator for S(t) is, for \(t < Y_{(n)}\),

$$\begin{aligned} \widehat{S}_{LY}(t) = \frac{1-H_n(t)}{1-\widehat{G}(t)} = \frac{1}{n} \frac{1}{1-\widehat{G}(t)} \sum \limits _{i=1}^n I(Y_i >t) \end{aligned}$$

where \(H_n(t) = n^{-1} \sum \nolimits _{i=1}^n I(Y_i \le t)\) is the empirical distribution function of \(Y_1,\ldots , Y_n\) and \(\widehat{G}(t)\) is the Kaplan–Meier estimator for G(t). A nice feature of this estimator is that it jumps at every observation \(Y_i\), \(i = 1,\ldots ,n\). A drawback is that \(\widehat{S}_{LY}\), as an estimator of the monotone function S, is not guaranteed to be monotone.
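A minimal sketch of \(\widehat{S}_{LY}\) (helper name ours; evaluating the step function \(\widehat{G}\) at t uses the usual right-continuous convention):

```python
import numpy as np

def ly_survival(y, delta, t):
    """Lin-Ying estimate: empirical P(Y > t) divided by 1 - Ghat(t); valid for t < max(y)."""
    order = np.argsort(y)
    ys, ds = y[order], delta[order]
    n = len(ys)
    at_risk = n - np.arange(n)
    g_surv = np.cumprod(1.0 - (1 - ds) / at_risk)   # 1 - Ghat at each ordered y
    k = np.searchsorted(ys, t, side="right")        # number of observations <= t
    one_minus_G = 1.0 if k == 0 else g_surv[k - 1]
    return float(np.mean(y > t) / one_minus_G)
```

Evaluating ly_survival over a grid of t values makes the possible non-monotonicity visible.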

Remark 2

To compare \(\widehat{S}_{RR}(t)\) and \(\widehat{S}_{LY}(t)\), note that the Lin-Ying estimator has the factor \((1-\widehat{G}(t))^{-1}\) in front of the summation, whereas the Robins-Rotnitzky estimator has the weights \(\frac{\delta _{i}}{1-\widehat{G}(Y_{i}-)}\), \(i=1,\ldots , n\), inside the sum.

2.5 Asymptotic behaviour of the Kaplan–Meier estimator

The asymptotic properties of the Kaplan–Meier estimator have been studied in great detail in several papers. In this survey we restrict attention to uniform strong consistency and asymptotic normality.

The oldest Glivenko-Cantelli type result is proved in Földes and Rejtő (1981).

Theorem 2

(Földes and Rejtő 1981)

Assume that T and C are independent and that F and G are continuous.

Then, for any \(t_0 < \tau _H = \inf \{t: 1-H(t) = 0 \}\),

$$\begin{aligned} \sup \limits _{0 \le t \le t_0} \mid \widehat{S}(t) - S(t) \mid \ = O(n^{-1/2} ( \log \log n)^{1/2})\ \text{ a.s. } \end{aligned}$$

Related important papers are Stute and Wang (1993b) and Gill (1994).

Theorem 3

(Lo and Singh 1986; Major and Rejtő 1988)

Assume that T and C are independent and that F and G are continuous.

Then, for any \(t < \tau _H\),

$$\begin{aligned} \widehat{S}(t) = S(t) - \frac{1}{n} \sum \limits _{i=1}^n \xi (t; Y_i, \delta _i) + R_n(t)\ \ \text{ a.s. } \end{aligned}$$

with, for any \(t_0 < \tau _H\),

$$\begin{aligned} \sup \limits _{0 \le t \le t_0} \mid R_n(t)\mid \ = O(n^{-1} \log n)\ \ \text{ a.s. } \end{aligned}$$

The i.i.d. random variables \(\xi (t; Y_i, \delta _i)\) in this representation are given by

$$\begin{aligned} \xi (t;Y_i,\delta _i)= & {} S(t) \left\{ - \int \limits _0^{Y_i \wedge t} \frac{H^1(dy)}{(1-H(y))^2} + \frac{I(Y_i \le t, \delta _i = 1)}{1-H(Y_i)}\right\} \\= & {} S(t) \left\{ \int \limits _0^t \frac{I(Y_i \le y) - H(y)}{(1-H(y))^2} H^1(dy) + \frac{I(Y_i \le t, \delta _i = 1) - H^1(t)}{1-H(t)}\right. \\{} & {} \left. - \int \limits _0^t \frac{I(Y_i \le y, \delta _i = 1) - H^1(y)}{(1-H(y))^2} H(dy)\right\} . \end{aligned}$$

Moreover, we have

$$\begin{aligned} \begin{array}{llll} \displaystyle E\xi (t; Y, \delta ) = 0\\ \displaystyle \text{ Cov } (\xi (t;Y, \delta ), \xi (t'; Y, \delta )) = S(t) S(t') \displaystyle \int \nolimits _0^{t \wedge t'}\frac{H^1(dy)}{(1-H(y))^2}. \end{array} \end{aligned}$$
(5)

The calculation of this covariance expression is long and tedious and can be found in Breslow and Crowley (1974), Appendix, p. 450–452.

Corollary 1

Assume the conditions of Theorem 2. Then, for any \(t < \tau _H\),

$$\begin{aligned} n^{1/2} (\widehat{S}(t) - S(t)) {\mathop {\rightarrow }\limits ^{d}}N \left( 0; S^2(t) \int \limits _0^t \frac{H^1(dy)}{(1-H(y))^2}\right) . \end{aligned}$$
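In practice the asymptotic variance is estimated by plugging in empirical counterparts. For distinct observation times this yields the familiar Greenwood-type estimate \(\widehat{\text{Var}}(\widehat{S}(t)) = \widehat{S}^2(t) \sum \nolimits _{Y_{(i)} \le t,\, \delta _{(i)} = 1} [(n-i)(n-i+1)]^{-1}\), which is first-order equivalent to \(\sigma ^2(t)/n\) with \(\sigma ^2(t)\) as in Corollary 1. A hedged sketch (helper name ours; valid for t below the largest observation):

```python
import numpy as np

def km_greenwood_var(y, delta, t):
    """Greenwood-type plug-in for Var(S_hat(t)); first-order equivalent to sigma^2(t)/n
    with sigma^2(t) as in Corollary 1. Assumes distinct times and t < max(y)."""
    order = np.argsort(y)
    y, delta = y[order], delta[order]
    n = len(y)
    at_risk = n - np.arange(n)                               # n_i = n - i + 1
    s = np.cumprod(np.where(delta == 1, 1.0 - 1.0 / at_risk, 1.0))
    k = np.searchsorted(y, t, side="right")
    s_t = 1.0 if k == 0 else s[k - 1]                        # S_hat(t)
    incr = delta[:k] / (at_risk[:k] * (at_risk[:k] - 1.0))   # 1/((n-i+1)(n-i)) at event times
    return float(s_t ** 2 * incr.sum())
```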

Remark 3

The Kaplan–Meier estimator has been extended to the regression case, where next to the observations of \((Y, \delta )\) also another variable X, called covariate, is observed. The pioneering paper on nonparametric estimation of the conditional survival function \(S(t\mid x) = P(T > t \mid X = x)\) is Beran (1981). He studied the conditional Kaplan–Meier estimator of \(S(t \mid x)\) defined by

$$\begin{aligned} \widehat{S}(t \mid x)= & {} \prod \limits _{\begin{array}{c} i=1\\ Y_i \le t, \delta _i = 1 \end{array}}^n \left( 1- \frac{w_{ni}(x,h_n)}{\sum \nolimits _{j=1}^n w_{nj} (x,h_n) I(Y_j \ge Y_i)}\right) \\= & {} \prod \limits _{\begin{array}{c} i=1\\ Y_{(i)} \le t \end{array}}^n \left( 1 - \frac{w_{n(i)} (x,h_n)}{1- \sum \nolimits _{j=1}^{i-1} w_{n(j)} (x, h_n)}\right) ^{\delta _{(i)}} \end{aligned}$$

where \(Y_{(1)} \le Y_{(2)} \le \ldots \le Y_{(n)}\) are the ordered \(Y_j\)’s, and \(\delta _{(j)}\) and \(w_{n(j)}(x,h_n)\) are the censoring indicator and weight corresponding to that ordering.

The \(w_{ni} (x,h_n)\) are smoothing weights depending on the covariate value x, on a given probability density function (called kernel) and on a nonnegative sequence \(\{h_n\}\) tending to 0 as \(n \rightarrow \infty \) (the bandwidth sequence).

Note that \(w_{ni}(x,h_n) = n^{-1}\) gives the classical Kaplan–Meier estimator as in (4). Properties of the Beran estimator, such as the generalization of the representation of Lo and Singh (Theorem 3), can be found in González-Manteiga and Cadarso Suarez (1994) and Van Keilegom and Veraverbeke (1997).
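A minimal sketch of the Beran estimator with Nadaraya–Watson weights (a common choice for the \(w_{ni}\); the Gaussian kernel and the helper name are our assumptions):

```python
import numpy as np

def beran_survival(x_obs, y, delta, x, t, h):
    """Beran estimate of S(t | x) with Nadaraya-Watson weights
    w_ni(x, h) = K((x - X_i)/h) / sum_j K((x - X_j)/h), Gaussian kernel K."""
    order = np.argsort(y)
    y, delta, x_obs = y[order], delta[order], x_obs[order]
    w = np.exp(-0.5 * ((x - x_obs) / h) ** 2)           # kernel weights at covariate value x
    w = w / w.sum()
    cum_w = np.concatenate(([0.0], np.cumsum(w)[:-1]))  # sum_{j < i} w_(j)
    factors = np.where((y <= t) & (delta == 1),
                       1.0 - w / (1.0 - cum_w), 1.0)
    return float(np.prod(factors))
```

Taking h very large makes the weights approximately \(n^{-1}\), recovering the classical Kaplan–Meier estimator, in line with the remark above.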

Remark 4

The Kaplan–Meier estimator has also been extended to the case of dependent censoring, that is, T and C are allowed to be dependent. Then, instead of assuming that \(P(T \le t, C \le c) = F(t) G(c)\), we assume that, for a given copula \(\mathcal {C}\), \(P(T \le t, C \le c) = \mathcal {C} (F(t), G(c))\) (see Sklar’s theorem in Nelsen 2006).

For this more general setting identifiability is discussed in Tsiatis (1975). Further important references include Zheng and Klein (1995) and Rivest and Wells (2001). The regression case for dependent T and C is considered in Braekers and Veraverbeke (2005).

2.6 Asymptotic behaviour of the Lin-Ying estimator

From Sect. 2.4 we have, for \(t < \tau _H\),

$$\begin{aligned} \widehat{S}_{LY}(t) - S(t)= & {} \dfrac{1-H_n(t)}{1-\widehat{G}(t)} - \dfrac{1-H(t)}{1-G(t)}\nonumber \\= & {} \dfrac{1}{(1-\widehat{G}(t))(1-G(t))}\!\left\{ -(1\!-\!G(t))(H_n(t)-H(t))\right. \nonumber \\{} & {} \left. \!+\,(1\!-\!H(t))(\widehat{G}(t)\!-\!G(t))\!\right\} . \end{aligned}$$
(6)

Theorem 4

Assume that T and C are independent and that F and G are continuous.

Then, for any \(t_0 < \tau _H\),

$$\begin{aligned} \sup \limits _{0 \le t \le t_0} \mid \widehat{S}_{LY} (t) - S(t) \mid = O(n^{-1/2}(\log \log n)^{1/2})\ \ \text{ a.s. } \end{aligned}$$

Proof

If \(t_0 < \tau _H\), it follows from (6) and the uniform consistency of \(\widehat{G}\) that there exist positive constants \(K_1\) and \(K_2\) such that

$$\begin{aligned} \sup \limits _{0 \le t \le t_0} \mid \widehat{S}_{LY}(t) - S(t)\mid \ \le K_1 \sup \limits _{0 \le t\le t_0} \mid H_n(t) - H(t)\mid + K_2 \sup \limits _{0 \le t \le t_0} \mid \widehat{G}(t) - G(t)\mid . \end{aligned}$$

Apply the law of the iterated logarithm to the first term and the corresponding result for the Kaplan–Meier estimator (see Theorem 2) to the second term.

Theorem 5

Assume that T and C are independent and that F and G are continuous.

Then, for any \(t < \tau _H\),

$$\begin{aligned} \widehat{S}_{LY}(t) = S(t) - \frac{1}{n} \sum \limits _{i=1}^n \psi _{LY} (t; Y_i, \delta _i) + \widetilde{R}_n(t)\ \ \text{ a.s. } \end{aligned}$$

with, for any \(t_0 < \tau _H\),

$$\begin{aligned} \sup \limits _{0 \le t \le t_0} \mid \widetilde{R}_n(t) \mid = O(n^{-1} \log n)\ \ \text{ a.s. } \end{aligned}$$

The i.i.d. random variables \(\psi _{LY}(t;Y_i, \delta _i)\) are given by

$$\begin{aligned} \begin{array}{l} \displaystyle \psi _{LY} (t; Y_i, \delta _i) = \frac{1}{1-G(t)} \{I(Y_i \le t) - H(t)\}\\ \displaystyle - \frac{1-F(t)}{1-G(t)} \left\{ (1-G(t))\left[ \int \limits _0^t \frac{I(Y_i \le y)-H(y)}{(1-H(y))^2} H^0(dy)\right. \right. \\ \left. \left. + \displaystyle \frac{I(Y_i \le t, \delta _i = 0) - H^0(t)}{1-H(t)} - \int \limits _0^t \frac{I(Y_i \le y, \delta _i = 0) - H^0(y)}{(1-H(y))^2} H(dy)\right] \right\} \end{array} \end{aligned}$$

where \(H^0(t) = P(Y \le t, \delta =0)\) is the subdistribution function of the censored observations.

Proof

From (6) and the consistency of \(\widehat{G}\) (using a Slutsky argument) it follows by linearization that \(\widehat{S}_{LY}(t) - S(t)\) has the same asymptotic distribution as

$$\begin{aligned} -\frac{1}{1-G(t)} \{H_n(t)-H(t)\} + \frac{1-F(t)}{1-G(t)} \left\{ \widehat{G}(t) - G(t)\right\} . \end{aligned}$$

Plugging in the asymptotic representation of Theorem 3 for \(\widehat{G} - G\) gives the desired result. This asymptotic representation for \(\widehat{G} - G\) is obtained from Theorem 3 by interchanging the roles of F and G, with \(1-\delta _i\) in the role of \(\delta _i\).

Corollary 2

Assume the conditions of Theorem 5. Then, for any fixed \(t < \tau _H\),

$$\begin{aligned} n^{1/2} (\widehat{S}_{LY}(t) - S(t)) {\mathop {\rightarrow }\limits ^{d}} N(0; \text{ Var }(\psi _{LY}(t; Y, \delta ))). \end{aligned}$$

A long but rather straightforward calculation gives that

$$\begin{aligned} \text{ Var }(\psi _{LY}(t;Y,\delta )) = S^2(t) \int \limits _0^t \frac{H^1(dy)}{(1-H(y))^2} \end{aligned}$$

and

$$\begin{aligned} \text{ Cov }(\psi _{LY}(t;Y,\delta ), \psi _{LY}(t';Y,\delta )) = S(t)S(t')\int \limits _0^{t\wedge t'} \frac{H^1(dy)}{(1-H(y))^2} \end{aligned}$$

which is exactly the same as for the Kaplan–Meier estimator.

Proof

From the expression for \(\psi _{LY}(t;Y,\delta )\) in Theorem 5:

$$\begin{aligned} \begin{array}{l} \displaystyle \text{ Var }(\psi _{LY}(t;Y,\delta )) = \frac{1}{(1-G(t))^2} H(t) (1-H(t))\\ \displaystyle + (1-F(t))^2 \int \limits _0^t \frac{H^0(dy)}{(1-H(y))^2}\\ \displaystyle - 2 \frac{1-F(t)}{1-G(t)} E \left\{ I(Y \le t)\left[ \int \limits _0^t \frac{I(Y \le y)-H(y)}{(1-H(y))^2} H^0(dy)\right. \right. \\ \displaystyle + \left. \left. \frac{I(Y \le t, \delta = 0) - H^0(t)}{1-H(t)} - \int \limits _0^t \frac{I(Y \le y, \delta = 0) - H^0(y)}{(1-H(y))^2} H(dy) \right] \right\} \end{array} \end{aligned}$$

where we used the fact that \(E[\psi _{LY} (t;Y, \delta )] = 0\) in the covariance term.

Write the expectation above as \(E\{(1) + (2) - (3)\}\).

$$\begin{aligned} \begin{array}{llll} \displaystyle E{(1)} = (1-H(t)) \displaystyle \int \limits _0^t \frac{H(y)}{(1-H(y))^2} H^0 (dy)\\ E{(2)} = H^0(t)\\ \displaystyle E {(3)} = (1-H(t)) \displaystyle \int \limits _0^t \frac{H^0(y)}{(1-H(y))^2} H(dy). \end{array} \end{aligned}$$

Now using \(H^0(dy) = (1-F(y)) G(dy)\),

  • \(\displaystyle \int \limits _0^t \frac{H(y)}{(1-H(y))^2} H^0(dy)=\int \limits _0^t \frac{1-(1-F(y))(1-G(y))}{(1-F(y))(1-G(y))^2} G(dy) = \int \limits _0^t \frac{1}{1-H(y)} \frac{G(dy)}{1-G(y)} + \ln (1-G(t)) = \int \limits _0^t \frac{H^0(dy)}{(1-H(y))^2} + \ln (1-G(t))\).

  • \(\displaystyle \int \limits _0^t \frac{H^0(y)}{(1-H(y))^2} H(dy)=\int \limits _0^t H^0(y) d\left( \frac{1}{1-H(y)}\right) = \frac{H^0(t)}{1-H(t)} - \int \limits _0^t \frac{1}{1-H(y)} H^0(dy) = \frac{H^0(t)}{1-H(t)} + \ln (1-G(t))\).

Hence, \(\displaystyle E\{(1) + (2) - (3)\} = (1-H(t)) \int \nolimits _0^t \frac{H^0(dy)}{(1-H(y))^2}\), so that

$$\begin{aligned} \text{ Var }(\psi _{LY}(t;Y,\delta )) = \frac{1}{(1-G(t))^2} H(t) (1-H(t)) - (1-F(t))^2 \int \limits _0^t \frac{H^0(dy)}{(1-H(y))^2}. \end{aligned}$$

Use \(H(y) = H^0(y) + H^1(y)\) to obtain

$$\begin{aligned} \begin{array} {l} \displaystyle \text{ Var } (\psi _{LY}(t;Y,\delta )) = (1-F(t))^2 \int \limits _0^t \frac{H^1(dy)}{(1-H(y))^2}\\ \displaystyle + \frac{1}{(1-G(t))^2} H(t) (1-H(t)) - (1-F(t))^2 \int \limits _0^t \frac{H(dy)}{(1-H(y))^2}. \end{array} \end{aligned}$$

Since \(\int \limits _0^t \frac{H(dy)}{(1-H(y))^2} = \frac{H(t)}{1-H(t)}\), we have

$$\begin{aligned} \text{ Var }(\psi _{LY}(t;Y,\delta )) = (1-F(t))^2 \int \limits _0^t \frac{H^1(dy)}{(1-H(y))^2}. \end{aligned}$$

Remark 5

We are grateful to one of the referees for insisting on a further (very long) calculation of the asymptotic covariance, in line with the calculations for the asymptotic variance. Given that the asymptotic covariances of the Lin-Ying estimator and the Kaplan–Meier estimator coincide, the Lin-Ying process in t is first-order asymptotically equivalent to the Kaplan–Meier process in t.

3 Nonparametric estimation of the bivariate survival function

3.1 The bivariate right random censoring model

In the bivariate setting we have a vector \((T_1, T_2)\) of nonnegative random variables, subject to right random censoring by a vector \((C_1, C_2)\) of nonnegative censoring variables. The observable variables are \((Y_1, Y_2)\) and \((\delta _1,\delta _2)\) with, for \(j = 1,2,\)

$$\begin{aligned} Y_j = T_j \wedge C_j \ \ \ \text{ and }\ \ \ \delta _j = I(T_j \le C_j). \end{aligned}$$

The observations in the model are \((Y_{1i}, Y_{2i}, \delta _{1i}, \delta _{2i})\) with, for \(i=1,\ldots , n\) and \(j=1,2\), \(Y_{ji}=T_{ji} \wedge C_{ji}\) and \(\delta _{ji} = I(T_{ji} \le C_{ji})\) and \((Y_{1i}, Y_{2i}, \delta _{1i}, \delta _{2i})\) are i.i.d. as \((Y_1, Y_2, \delta _1,\delta _2)\). Note that \((T_{1i}, T_{2i})\), \(i=1,\ldots , n\), is an i.i.d. sequence with joint distribution function \(F(t_1,t_2)\) and joint survival function \(S(t_1, t_2)\), and \((C_{1i}, C_{2i})\), \(i = 1,\ldots , n\), is an i.i.d. sequence with joint distribution function \(G(t_1, t_2)\) and joint survival function \(S_G(t_1, t_2)\).

3.2 Identifiability

For the bivariate censoring model there is a result analogous to Theorem 1. It is due to Langberg and Shaked (1982) and shows that the survival function of \((T_1, T_2)\) is identifiable under the assumption of independence of the vectors \((T_1, T_2)\) and \((C_1, C_2)\). We refer to Pruitt (1993) for a discussion on possible other sufficient conditions (depending on the type of data).

Theorem 6

(Langberg and Shaked 1982)

Assume that \((T_1, T_2)\) and \((C_1, C_2)\) are independent and that the marginal distributions \(F_1, F_2, G_1, G_2\) of \(T_1, T_2, C_1, C_2\) are continuous. Then \(S(t_1,t_2)\) is identifiable on the set \(\Omega \cup \widetilde{\Omega }\), where

$$\begin{aligned}{} & {} \Omega = \{(t_1, t_2): t_2< \tau _{H_2}, t_1< \tau _{H_1}(t_2)\}\\{} & {} \widetilde{\Omega } = \{(t_1, t_2): t_1< \tau _{H_1}, t_2 < \tau _{H_2}(t_1)\} \end{aligned}$$

with \(\tau _{H_1}\), \(\tau _{H_2}\), \(\tau _{H_1}(t_2)\), \(\tau _{H_2}(t_1)\) the right endpoints of support of \(H_1(t) = P(Y_1 \le t)\), \(H_2(t) = P(Y_2 \le t)\), \(P(Y_1 \le v \mid Y_2 > t_2)\), \(P(Y_2 \le v \mid Y_1 > t_1)\).

For all \((t_1,t_2) \in \Omega \) we have

$$\begin{aligned} S(t_1, t_2) = \exp \left( \!\!-\!\!\int \limits _0^{t_2} \!\!\frac{dP (Y_2 \le u, \delta _2 = 1)}{1 - H_2(u)}\!\right) \!\!\exp \left( \!\!-\!\!\int \limits _0^{t_1} \!\!\frac{dP(Y_1 \le v, \delta _1 = 1 \mid Y_2> t_2)}{P(Y_1> v \mid Y_2 > t_2)}\!\!\right) \nonumber \\ \end{aligned}$$
(7)

and for all \((t_1, t_2) \in \widetilde{\Omega }\) we have

$$\begin{aligned} S(t_1, t_2) = \exp \left( -\int \limits _0^{t_1} \frac{dP (Y_1 \le u, \delta _1 = 1)}{1 - H_1(u)}\right) \exp \left( -\int \limits _0^{t_2} \frac{dP(Y_2 \le v, \delta _2 = 1 \mid Y_1> t_1)}{P(Y_2> v \mid Y_1 > t_1)}\right) . \end{aligned}$$

Proof

We have, using independence of \(T_2\) and \(C_2\),

$$\begin{aligned} \begin{array}{ll} S(t_1, t_2) &{}= P(T_2> t_2) P(T_1> t_1 \mid T_2> t_2)\\ &{}= P(T_2> t_2) P(T_1> t_1 \mid Y_2 > t_2). \end{array} \end{aligned}$$
(8)

From the assumptions and Theorem 1, we have that the first factor in (8) is equal to the first factor in (7).

From the assumptions it also follows that \(T_1 \mid Y_2 > t_2\) and \(C_1 \mid Y_2 > t_2\) are independent and have continuous distributions. Apply again Theorem 1, to see that the second factor in (8) is equal to the second factor in (7).

The second expression for \(S(t_1,t_2)\) for \((t_1,t_2) \in \widetilde{\Omega }\) follows similarly, starting from

$$\begin{aligned} S(t_1,t_2) = P(T_1> t_1) P(T_2> t_2 \mid Y_1 > t_1). \end{aligned}$$

3.3 Bivariate extensions of the Kaplan–Meier estimator

Under the assumption of independence of the vector \((T_1, T_2)\) and \((C_1, C_2)\), several nonparametric estimators for the survival function \(S(t_1, t_2)\) have been proposed in the literature.

To obtain a nonparametric estimator of the survival function, Dabrowska (1988) used a two-dimensional product-limit approximation (see also Pruitt 1991), Prentice and Cai (1992) used an approximation based on Peano series, and van der Laan (1996) took an approach based on nonparametric maximum likelihood ideas. A look at the proposed solutions shows that bivariate censoring complicates nonparametric inference and makes it a hard problem. All proposals have one or more drawbacks, such as lack of monotonicity, non-uniqueness, slow rate of convergence, or no analytic variance expression. See also Gill (1992), and see Prentice and Zhao (2018) for an excellent recent review with focus on these approaches. These complicated estimators will not be discussed in this survey.

Our approach to study nonparametric estimation of \(S(t_1,t_2)\) given bivariate right censored time to event data follows the inverse probability weighting (IPW) idea of Robins and Rotnitzky (see also Burke 1988; Satten and Datta 2001; Lopez 2012). After a general starting point, we consider specific bivariate censoring schemes and for these we work out the asymptotic distribution theory of the proposed nonparametric estimators in detail.

Also the simpler nonparametric estimators we propose share some of the drawbacks mentioned above. For example, IPW estimators have been criticized for not using all the information contained in the data, and the Lin-Ying type estimators do not define a true distribution and are not necessarily monotone. Our estimators, however, show remarkably good behaviour in concrete applied situations (see the simulations in Geerdens et al. 2016; Abrams et al. 2021, 2023). Their performance can also depend on the time region where they are used (Geerdens et al. 2016). It is clear that the finite sample quality always needs to be checked by detailed simulations (see e.g. Prentice and Zhao 2018).

As in, for example, Burke (1988), we introduce the following subdistribution function

$$\begin{aligned} H^{11}(t_1,t_2) = P(Y_1 \le t_1, Y_2 \le t_2, \delta _1 = 1,\delta _2 = 1). \end{aligned}$$

We have, under independence of \((T_1, T_2)\) and \((C_1, C_2)\),

$$\begin{aligned} H^{11}(t_1, t_2)= & {} P(T_1 \le t_1, T_2 \le t_2, T_1 \le C_1, T_2 \le C_2)\\= & {} \int \limits _0^{t_1}\int \limits _0^{t_2} P(C_1 \ge y_1, C_2 \ge y_2) F(dy_1, dy_2)\\= & {} \int \limits _0^{t_1}\int \limits _0^{t_2} S_G(y_1 -, y_2-) F(dy_1, dy_2). \end{aligned}$$

Hence

$$\begin{aligned} F(dy_1, dy_2)= \frac{1}{S_G(y_1-, y_2-)} H^{11}(dy_1, dy_2) \end{aligned}$$

or

$$\begin{aligned} S(t_1, t_2) = \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{1}{S_G(y_1-, y_2-)} H^{11} (dy_1, dy_2). \end{aligned}$$
(9)

In the Supplementary Material we show how (9) can be obtained from the identifying equation idea. This will also be demonstrated for (10) (Sect. 3.4) and (15) (Sect. 3.6).

An estimator for \(S(t_1, t_2)\) is obtained by plugging in appropriate estimators \(\widehat{H}^{11}\) for \(H^{11}\) and \(\widehat{S}_G\) for \(S_G\). For \(\widehat{H}^{11}\) we can take

$$\begin{aligned} \widehat{H}^{11} (t_1, t_2) = \frac{1}{n} \sum \limits _{i=1}^n \delta _{1i} \delta _{2i} I(Y_{1i} \le t_1, Y_{2i} \le t_2) \end{aligned}$$

which gives

$$\begin{aligned} \widehat{S}(t_1, t_2) = \frac{1}{n} \sum \limits _{i=1}^n \frac{\delta _{1i} \delta _{2i}}{\widehat{S}_G(Y_{1i}-, Y_{2i}-)} I(Y_{1i}> t_1, Y_{2i} > t_2). \end{aligned}$$

Since, in general, \(C_1\) and \(C_2\) are not independent, there exists a (survival) copula \(\mathcal {C}\) such that \(S_G(t_1, t_2) = \mathcal {C}(S_{G_1} (t_1), S_{G_2}(t_2))\), with \(S_{G_j}(t_j) = 1 - G_j(t_j)\) and \(G_j(t_j)\) the marginal distribution functions corresponding to \(G(t_1,t_2)\), \(j = 1,2\).

The presence of \(\mathcal {C}\) complicates the situation. If \(\mathcal {C}\) is known, \(S_G\) can be estimated as

$$\begin{aligned} \widehat{S}_G(t_1,t_2) = \mathcal {C} (\widehat{S}_{G_1}(t_1), \widehat{S}_{G_2}(t_2)) \end{aligned}$$

where \(\widehat{S}_{G_j}(t_j) = 1 - \widehat{G}_j(t_j)\) with \(\widehat{G}_j (t_j)\) the Kaplan–Meier estimator of \(G_j(t_j)\), \(j = 1,2.\)

The study of \(\widehat{S}_G(t_1, t_2)\) in the general setting, although possible, is challenging and an explicit expression for the asymptotic variance is hard to obtain (see p. 457 in Hougaard 2000). However, for specific censoring schemes, explicit estimators for \(\widehat{S}_G(t_1,t_2)\)—and hence for \(\widehat{S}(t_1, t_2)\)—can be given and the asymptotic normality of \(\widehat{S}(t_1, t_2)\) can be obtained with an explicit analytic expression for the asymptotic variance.

In the sequel we study in detail univariate censoring (Sects. 3.4 and 3.5) and one-component censoring (Sects. 3.6–3.8).

3.4 Estimation of the bivariate survival function under univariate censoring

In this situation \((T_1, T_2)\) is subject to right censoring by a single censoring variable C with univariate distribution function \(G(c) = P(C \le c)\).

We assume that \((T_1, T_2)\) and C are independent.

Denote \(Y_1 = T_1 \wedge C\), \(Y_2 = T_2 \wedge C\), \(\delta _1 = I(T_1 \le C)\), \(\delta _2 = I(T_2 \le C)\). Also, with \( a \vee b = \max (a,b)\),

$$\begin{aligned} H^{11}(t_1,t_2)= & {} P(Y_1 \le t_1, Y_2 \le t_2, \delta _1 = 1, \delta _2 = 1)\\= & {} P(T_1 \le t_1, T_2 \le t_2, T_1 \vee T_2 \le C)\\= & {} \int \limits _0^{t_1} \int \limits _0^{t_2} [1-G((y_1 \vee y_2)-)] \ F(dy_1, dy_2). \end{aligned}$$

As in Sect. 3.3 we obtain, for continuous G,

$$\begin{aligned} \begin{array}{ll} S(t_1, t_2) &{}= \displaystyle \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{1}{1-G((y_1 \vee y_2)-)} H^{11}(dy_1, dy_2)\\ &{}= \displaystyle \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{1}{1-G(y_1 \vee y_2)} H^{11}(dy_1, dy_2). \end{array} \end{aligned}$$
(10)

As empirical version for \(H^{11}(t_1,t_2)\) we again use

$$\begin{aligned} \widehat{H}^{11} (t_1, t_2) = \frac{1}{n} \sum \limits _{i=1}^n \delta _{1i} \delta _{2i} I(Y_{1i} \le t_1, Y_{2i} \le t_2). \end{aligned}$$

To estimate G, we note that C is observed if \(T_1> C\) or \(T_2 > C\), i.e. if \(T_1 \vee T_2 > C\). Therefore G can be estimated by a Kaplan–Meier estimator \(\widehat{G}\), calculated from \(\{(C_i \wedge (T_{1i} \vee T_{2i}), I(C_i \le T_{1i} \vee T_{2i}))\} = \{(Y_{1i} \vee Y_{2i}, \delta _i^{\max })\}\) with \(\delta _i^{\max } = 1 - \delta _{1i} \delta _{2i}\).

The estimator for \(S(t_1, t_2)\) is

$$\begin{aligned} \widehat{S}(t_1,t_2)= & {} \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{1}{1-\widehat{G}((y_1 \vee y_2)-)} \widehat{H}^{11} (dy_1, dy_2)\\= & {} \frac{1}{n} \sum \limits _{i=1}^n \frac{\delta _{1i} \delta _{2i}}{1-\widehat{G}((Y_{1i} \vee Y_{2i})-)} I(Y_{1i}> t_1, Y_{2i} > t_2). \end{aligned}$$
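A hedged Python sketch of this estimator (helper name ours): the weights are the left limits of the Kaplan–Meier estimator \(\widehat{G}\) built from \((Y_{1i} \vee Y_{2i}, \delta _i^{\max })\).

```python
import numpy as np

def biv_ipw_survival(y1, y2, d1, d2, t1, t2):
    """IPW estimate of S(t1, t2) under univariate censoring:
    (1/n) sum_i d1_i d2_i I(Y1_i > t1, Y2_i > t2) / (1 - Ghat((Y1_i v Y2_i)-))."""
    ymax = np.maximum(y1, y2)
    dmax = 1 - d1 * d2                                  # delta_i^max: 1 iff C_i is observed
    order = np.argsort(ymax)
    n = len(ymax)
    at_risk = n - np.arange(n)
    g_surv = np.cumprod(1.0 - dmax[order] / at_risk)    # 1 - Ghat at the ordered Y1 v Y2
    g_left = np.concatenate(([1.0], g_surv[:-1]))       # left limits 1 - Ghat(.-)
    w = np.empty(n)
    w[order] = g_left                                   # back to the original ordering
    return float(np.mean(d1 * d2 * (y1 > t1) * (y2 > t2) / w))
```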

In the next theorem, the supports of the underlying distributions are important. Let \(\tau _{F_1}\), \(\tau _{F_2}\), \(\tau _G\), \(\tau _{H_1}\), \(\tau _{H_2}\) be the right endpoints of support of the distribution functions \(F_1\), \(F_2\), G, \(H_1\), \(H_2\) of \(T_1\), \(T_2\), C, \(Y_1\), \(Y_2\). We impose the condition

$$\begin{aligned} \tau _G > \tau _{F_1} \vee \tau _{F_2}. \end{aligned}$$
(11)

This will imply that \(1 - G(y_1 \vee y_2) > 0\) for \(y_1 < \tau _{F_1}\) and \(y_2 < \tau _{F_2}\).

Another consequence of (11) is that \(\tau _{H_1} = \tau _{F_1}\) and \(\tau _{H_2} = \tau _{F_2}\). Indeed, since \(Y_1 = T_1 \wedge C\) and \(Y_2 = T_2 \wedge C\), we have that \(\tau _{H_1} = \tau _{F_1} \wedge \tau _G = \tau _{F_1}\) and \(\tau _{H_2} = \tau _{F_2} \wedge \tau _G = \tau _{F_2}\).

Also, note that \(P(Y_1> \tau _{H_1}, Y_2 > \tau _{H_2}) = 0\).

Theorem 7

Assume that \((T_1, T_2)\) and C are independent and that the distribution functions \(F_1\), \(F_2\) and G are continuous. Also assume condition (11). Then, for \(t_1 < \tau _{F_1}\), \(t_2 < \tau _{F_2}\) with \(S(t_1, t_2) > 0\), we have the following asymptotic representation:

$$\begin{aligned}{} & {} \widehat{S}(t_1, t_2) - S(t_1, t_2)\\{} & {} \quad = \displaystyle \frac{1}{n} \sum \limits _{i=1}^n \frac{\delta _{1i} \delta _{2i}}{1-G(Y_{1i} \vee Y_{2i})} I(Y_{1i}> t_1, Y_{2i} > t_2) - S(t_1, t_2)\\{} & {} \qquad + \displaystyle \frac{1}{n} \sum \limits _{i=1}^n \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{\xi (y_1 \vee y_2; Y_{1i} \vee Y_{2i}, \delta _i^{\max })}{(1-G(y_1 \vee y_2))^2} H^{11} (dy_1, dy_2)\ + o_P(n^{-1/2}) \end{aligned}$$

where, with \(\delta ^{\max } = 1-\delta _1\delta _2\),

$$\begin{aligned} \begin{array}{l} \xi (t; Y_1 \vee Y_2, \delta ^{\max })\\ = (1-G(t))\displaystyle \left\{ \int \limits _0^t \frac{I (Y_1 \vee Y_2 \le y) - \widetilde{H}(y)}{(1-\widetilde{H}(y))^2} \widetilde{H}^1 (dy)\right. \\ \left. + \displaystyle \frac{I(Y_1 \vee Y_2 \le t, \delta ^{\max } = 1) - \widetilde{H}^1(t)}{1-\widetilde{H}(t)}\right. \\ \left. -\displaystyle \int \limits _0^t \frac{I(Y_1 \vee Y_2 \le y, \delta ^{\max } = 1)-\widetilde{H}^1(y)}{(1-\widetilde{H}(y))^2} \widetilde{H}(dy)\right\} \end{array} \end{aligned}$$

and

$$\begin{aligned} \begin{array}{ll} \widetilde{H}(t) &{}= P(Y_1 \vee Y_2 \le t)\\ \widetilde{H}^1(t) &{}= P(Y_1 \vee Y_2 \le t, \delta ^{\max } = 1)\\ &{} = P(Y_1 \vee Y_2 \le t, \delta _1 = 0\ \text{ or }\ \delta _2 = 0). \end{array} \end{aligned}$$
(12)

Proof

$$\begin{aligned}{} & {} \widehat{S}(t_1, t_2) - S(t_1, t_2) \nonumber \\{} & {} \quad = \displaystyle \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{1}{1-\widehat{G} ((y_1 \vee y_2)-)} \widehat{H}^{11} (dy_1, dy_2) - \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{1}{1-G(y_1 \vee y_2)}H^{11} (dy_1, dy_2) \nonumber \\{} & {} \quad = \displaystyle \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \left\{ \frac{1}{1-\widehat{G} ((y_1 \vee y_2)-)} - \frac{1}{1-\widehat{G}(y_1 \vee y_2)}\right\} (\widehat{H}^{11} (dy_1, dy_2) -H^{11} (dy_1, dy_2)) \nonumber \\{} & {} \quad \ \ \ + \displaystyle \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \left\{ \frac{1}{1-\widehat{G} (y_1 \vee y_2)} - \frac{1}{1-G(y_1 \vee y_2)}\right\} \widehat{H}^{11} (dy_1, dy_2) \nonumber \\{} & {} \quad \ \ \ +\displaystyle \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \left\{ \frac{1}{1-\widehat{G} ((y_1 \vee y_2)-)} - \frac{1}{1-\widehat{G}(y_1 \vee y_2)}\right\} H^{11} (dy_1, dy_2) \nonumber \\{} & {} \quad \ \ \ +\displaystyle \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{1}{1-G (y_1 \vee y_2)} \widehat{H}^{11} (dy_1, dy_2) - S(t_1, t_2) \nonumber \\{} & {} \quad \equiv (a) + (b) + (c) + (d). \end{aligned}$$
(13)

We have

$$\begin{aligned} \mid (c)\mid\le & {} \int \limits _{t_1}^{\tau _{F_1}} \int \limits _{t_2}^{\tau _{F_2}} \frac{\mid \widehat{G}((y_1 \vee y_2)-) - \widehat{G}(y_1 \vee y_2)\mid }{(1-\widehat{G}(y_1 \vee y_2))^2} H^{11} (dy_1, dy_2)\\\le & {} \frac{1}{(1-\widehat{G}(\tau _{F_1} \vee \tau _{F_2}))^2} \sup \limits _{s \le \tau _{F_1} \vee \tau _{F_2}} \mid \widehat{G}(s-) - \widehat{G}(s)\mid \int \limits _{t_1}^{\tau _{F_1}} \int \limits _{t_2}^{\tau _{F_2}} H^{11} (dy_1, dy_2)\\= & {} O_p (n^{-1}) \end{aligned}$$

since \(\widehat{G} (\tau _{F_1} \vee \tau _{F_2})\) is a consistent estimator for \(G(\tau _{F_1} \vee \tau _{F_2})\) and since the jumps of the Kaplan–Meier estimator are uniformly \(O_P(n^{-1})\) (see Sect. 2.3).

Similarly \(\mid (a)\mid = O_P(n^{-1})\).

For (b) we replace the expression \((\widehat{G}-G)/[(1-\widehat{G})(1-G)]\) in the integrand by \((\widehat{G}-G)/(1-G)^2\) and use the consistency result for \(\widehat{G}\) in Theorem 2. It then follows that

$$\begin{aligned} \begin{array}{llll} \widehat{S}(t_1,t_2)- S(t_1,t_2)\\ = \displaystyle \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{1}{1-G(y_1 \vee y_2)} \widehat{H}^{11} (dy_1, dy_2) - S(t_1, t_2)\\ \ \ \ + \displaystyle \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{\widehat{G} (y_1 \vee y_2)- G(y_1 \vee y_2)}{(1-G(y_1 \vee y_2))^2} \widehat{H}^{11} (dy_1, dy_2)\\ \ \ \ + O_P(n^{-1}\log \log n). \end{array} \end{aligned}$$

In the second term we plug in the asymptotic representation of Lo and Singh (1986) (see Theorem 3). This gives that the second term becomes

$$\begin{aligned} \begin{array}{l} \displaystyle \frac{1}{n} \sum \limits _{i=1}^n \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{\xi (y_1 \vee y_2; Y_{1i} \vee Y_{2i}, \delta _i^{\max })}{(1-G(y_1 \vee y_2))^2} \widehat{H}^{11} (dy_1,dy_2) + O(n^{-1} \log n)\ \ \text{ a.s. }\\ = \displaystyle \frac{1}{n^2} \sum \limits _{i=1}^n \sum \limits _{j=1}^n \frac{\xi (Y_{1j} \vee Y_{2j}; Y_{1i} \vee Y_{2i}, \delta _i^{\max })}{(1-G(Y_{1j} \vee Y_{2j}))^2} I(Y_{1j}> t_1, Y_{2j} > t_2, \delta _{1j} = 1, \delta _{2j} = 1)\\ + O_P(n^{-1}\log n). \end{array} \end{aligned}$$

The double sum term in the above expression is a V-statistic with kernel

$$\begin{aligned}{} & {} h((y_{1i}, y_{2i}, \delta _{1i}, \delta _{2i}), (y_{1j}, y_{2j}, \delta _{1j}, \delta _{2j}))\\ \\{} & {} \quad = \frac{\xi (y_{1j} \vee y_{2j}; y_{1i} \vee y_{2i}, \delta _i^{\max })}{(1-G(y_{1j} \vee y_{2j}))^2} I(y_{1j}> t_1, y_{2j} > t_2, \delta _{1j} = 1, \delta _{2j} = 1). \end{aligned}$$

We have

$$\begin{aligned}{} & {} E[h((y_1, y_2, \delta _1,\delta _2), (Y_{1j}, Y_{2j}, \delta _{1j}, \delta _{2j}))]\\ \\{} & {} \quad = \displaystyle \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{\xi (y_1' \vee y_2'; y_{1} \vee y_{2}, \delta ^{\max })}{(1-G(y_1' \vee y_2'))^2} H^{11} (dy_1', dy_2') \end{aligned}$$

and

$$\begin{aligned}{} & {} E[h((Y_{1i}, Y_{2i}, \delta _{1i}, \delta _{2i}), (y_1, y_2, \delta _1, \delta _2))]\\ \\{} & {} \quad = \displaystyle \frac{E[\xi (y_1 \vee y_2; Y_{1i} \vee Y_{2i}, \delta _i^{\max })]}{(1-G(y_1 \vee y_2))^2} I(y_1> t_1, y_2 > t_2, \delta _1 = 1, \delta _2 = 1)\\{} & {} \quad = 0. \end{aligned}$$

Hence the Hajek projection of the V-statistic is

$$\begin{aligned} \frac{1}{n} \sum \limits _{i=1}^n \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{\xi (y_1 \vee y_2; Y_{1i} \vee Y_{2i}, \delta _i^{\max })}{(1-G(y_1 \vee y_2))^2} H^{11} (dy_1, dy_2) \end{aligned}$$

and the remainder term is \(o_p(n^{-1/2})\).

This follows from the asymptotic theory for the V-statistic and the corresponding U-statistic (Serfling 1980). The required moment conditions are satisfied since the kernel h is bounded. Indeed, \(\xi \) is bounded and \(1/(1-G(y_1 \vee y_2)) \le \) \(1/(1-G(\tau _{F_1} \vee \tau _{F_2}))\) since \(\tau _G > \tau _{F_1} \vee \tau _{F_2}\). Note that symmetry of the kernel is not required for this type of result.

This proves the theorem.

Corollary 3

Assume the conditions of Theorem 7. Then, for any \(t_1 < \tau _{F_1}\), \(t_2 < \tau _{F_2}\) with \(S(t_1, t_2) > 0\), we have

$$\begin{aligned} n^{1/2} (\widehat{S}(t_1, t_2) - S(t_1,t_2)) {\mathop {\rightarrow }\limits ^{d}} N (0; \sigma ^2(t_1,t_2)) \end{aligned}$$

where

$$\begin{aligned} \sigma ^2(t_1,t_2)= & {} \displaystyle \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{1}{1-G(y_1 \vee y_2)} F(dy_1, dy_2) - S^2(t_1,t_2)\nonumber \\{} & {} + \displaystyle \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \int \limits _{t_1} ^\infty \int \limits _{t_2}^\infty \left( \int \limits _0^{(y_1 \vee y_2) \wedge (y'_1 \vee y'_2)}\frac{\widetilde{H}^1(dy)}{(1-\widetilde{H}(y))^2}\right) F(dy_1, dy_2) F(dy'_1, dy'_2)\nonumber \\{} & {} - 2S^2(t_1, t_2) \displaystyle \int \limits _0^{t_1 \vee t_2} \frac{\widetilde{H}^1(dy)}{(1-\widetilde{H}(y))^2}. \end{aligned}$$
(14)

Proof

This follows from the asymptotic representation in Theorem 7 which is of the form

$$\begin{aligned} \widehat{S}(t_1,t_2) - S(t_1, t_2) = \frac{1}{n} \sum \limits _{i=1}^n A_i + \frac{1}{n} \sum \limits _{i=1}^n B_i + o_P(n^{-1/2}). \end{aligned}$$

In the Supplementary Material we show that

$$\begin{aligned} \begin{array}{l} Var(A_i) = \displaystyle \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{1}{1-G(y_1 \vee y_2)} F(dy_1, dy_2) - S^2(t_1,t_2)\\ Var(B_i) = \displaystyle \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \left( \int \limits _0^{(y_1 \vee y_2) \wedge (y'_1 \vee y'_2)}\frac{\widetilde{H}^1(dy)}{(1-\widetilde{H}(y))^2}\right) F(dy_1, dy_2) F(dy'_1, dy'_2)\\ \displaystyle \text{ Cov } (A_i, B_i) = - S^2(t_1,t_2) \int \limits _0^{t_1 \vee t_2} \frac{\widetilde{H}^1(dy)}{(1-\widetilde{H}(y))^2}. \end{array} \end{aligned}$$

Remark 6

In case of no censoring

$$\begin{aligned} \sigma ^2(t_1,t_2) = S(t_1,t_2)(1-S(t_1,t_2)). \end{aligned}$$

Indeed, in this case \(\delta _1 = \delta _2 \equiv 1\), \(\widetilde{H}^1 \equiv 0\) and \(G \equiv 0\).

3.5 Estimator of Lin-Ying for the bivariate survival function under univariate censoring

For bivariate survival data subject to univariate censoring an alternative estimator has been proposed by Lin and Ying (1993). It is based on the following simple idea. Given the assumed independence of \((T_1, T_2)\) and C we have

$$\begin{aligned} P(Y_1> t_1, Y_2> t_2)= & {} P(T_1> t_1, T_2> t_2, C> t_1, C> t_2)\\= & {} S(t_1, t_2) P(C > t_1 \vee t_2). \end{aligned}$$

This leads, for \(t_1 \vee t_2 < (Y_1 \vee Y_2)_{(n)}\), to the following estimator

$$\begin{aligned} \widehat{S}_{LY}(t_1,t_2) = \displaystyle \frac{\frac{1}{n} \sum \nolimits _{i=1}^n I(Y_{1i}> t_1, Y_{2i} > t_2)}{1-\widehat{G}(t_1 \vee t_2)}, \end{aligned}$$

where \(\widehat{G}\) is the Kaplan–Meier estimator for G given in Sect. 3.4. Note that in the absence of censoring, \(\widehat{S}_{LY}\) reduces to the usual bivariate empirical survival function.
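A matching sketch (helper name ours), with \(\widehat{G}\) computed exactly as in Sect. 3.4:

```python
import numpy as np

def biv_ly_survival(y1, y2, d1, d2, t1, t2):
    """Lin-Ying estimate: empirical P(Y1 > t1, Y2 > t2) divided by 1 - Ghat(t1 v t2)."""
    ymax = np.maximum(y1, y2)
    dmax = 1 - d1 * d2
    order = np.argsort(ymax)
    n = len(ymax)
    at_risk = n - np.arange(n)
    g_surv = np.cumprod(1.0 - dmax[order] / at_risk)    # 1 - Ghat at the ordered Y1 v Y2
    k = np.searchsorted(ymax[order], max(t1, t2), side="right")
    one_minus_G = 1.0 if k == 0 else g_surv[k - 1]
    return float(np.mean((y1 > t1) & (y2 > t2)) / one_minus_G)
```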

Theorem 8

Assume that \((T_1, T_2)\) and C are independent and that the distribution functions \(F_1\), \(F_2\) and G are continuous. Assume condition (11), i.e. \(\tau _G > \tau _{F_1} \vee \tau _{F_2}\). Then, for \(t_1 < \tau _{F_1}\), \(t_2 < \tau _{F_2}\) with \(S(t_1, t_2) > 0\), we have the following asymptotic representation

$$\begin{aligned} \widehat{S}_{LY} (t_1,t_2) - S(t_1,t_2) = \displaystyle \frac{1}{n} \sum \nolimits _{i=1}^n \psi _{LY} (t_1, t_2, Y_{1i}, Y_{2i}, \delta _{1i }, \delta _{2i}) + o_P(n^{-1/2}) \end{aligned}$$

where

$$\begin{aligned}{} & {} \psi _{LY} (t_1, t_2, Y_1, Y_2, \delta _1,\delta _2)\\{} & {} = \displaystyle \frac{1}{1-G(t_1 \vee t_2)} \{I(Y_1> t_1, Y_2> t_2) - P(Y_1> t_1, Y_2 > t_2)\}\\{} & {} + \displaystyle \frac{S(t_1, t_2)}{1-G(t_1 \vee t_2)} \xi (t_1 \vee t_2; Y_1 \vee Y_2, \delta ^{\max }) \end{aligned}$$

where \(\xi (t, Y_1 \vee Y_2,\delta ^{\max })\) is defined in (12).

Proof

As in the proof of Theorem 5, we note that by linearization and by consistency of \(\widehat{G}\), \(\widehat{S}_{LY}(t_1,t_2) - S(t_1,t_2)\) has the same asymptotic distribution as

$$\begin{aligned} \begin{array}{l} \displaystyle \frac{1}{1-G(t_1 \vee t_2)} \left\{ \frac{1}{n} \sum \limits _{i=1}^n I(Y_{1i}> t_1, Y_{2i}> t_2) - P(Y_1> t_1, Y_2 > t_2)\right\} \\ +\displaystyle \frac{S(t_1,t_2)}{1-G(t_1 \vee t_2)} \left\{ \widehat{G}(t_1 \vee t_2) - G(t_1 \vee t_2)\right\} . \end{array} \end{aligned}$$

Then plug in the asymptotic representation for \(\widehat{G}(t_1 \vee t_2) - G(t_1 \vee t_2)\).

Corollary 4

Assume the conditions of Theorem 8. Then, for any \(t_1 < \tau _{F_1}\), \(t_2 <\tau _{F_2}\) with \(S(t_1,t_2) > 0\), we have

$$\begin{aligned} n^{1/2} (\widehat{S}_{LY} (t_1, t_2) - S(t_1, t_2)) {\mathop {\rightarrow }\limits ^{d}}N (0; \sigma ^2_{LY} (t_1, t_2)) \end{aligned}$$

where

$$\begin{aligned} \sigma ^2_{LY} (t_1,t_2)= & {} \displaystyle \frac{1}{(1-G(t_1 \vee t_2))^2} P(Y_1> t_1, Y_2> t_2)(1-P(Y_1> t_1, Y_2> t_2))\\{} & {} - \displaystyle S^2(t_1,t_2) \int \limits _0^{t_1 \vee t_2} \frac{\widetilde{H}^1(dy)}{(1-\widetilde{H}(y))^2}\\= & {} \displaystyle \frac{1}{(1-G(t_1 \vee t_2))^2} P(Y_1> t_1, Y_2> t_2)(1-P(Y_1> t_1, Y_2 > t_2))\\{} & {} - S^2(t_1,t_2)\displaystyle \int \limits _0^{t_1\vee t_2} \frac{G(dy)}{(1-G(y))^2 (1-F(y,y))} \end{aligned}$$

with \(\widetilde{H}^1\) and \(\widetilde{H}\) as defined in (12) of Sect. 3.4.

Proof

$$\begin{aligned} \sigma ^2_{LY}(t_1,t_2)= & {} \frac{1}{(1-G(t_1 \vee t_2))^2} P(Y_1> t_1, Y_2> t_2) (1-P(Y_1> t_1, Y_2> t_2))\\{} & {} + S^2 (t_1,t_2) \int \limits _0^{t_1 \vee t_2} \frac{\widetilde{H}^1(dy)}{(1-\widetilde{H}(y))^2}\\{} & {} + 2 \frac{S(t_1,t_2)}{(1-G(t_1 \vee t_2))^2} E\{I(Y_1> t_1, Y_2 > t_2) \xi (t_1 \vee t_2; Y_1 \vee Y_2, \delta ^{\max })\}. \end{aligned}$$

The expectation above is equal to

$$\begin{aligned}{} & {} -P(Y_1> t_1, Y_2> t_2) (1-G(t_1 \vee t_2)) \left\{ \int \limits _0^{t_1 \vee t_2} \frac{\widetilde{H}(y)}{(1-\widetilde{H}(y))^2} \widetilde{H}^1(dy)\right. \\{} & {} \qquad \left. + \frac{\widetilde{H}^1(t_1 \vee t_2)}{1-\widetilde{H}(t_1 \vee t_2)} - \int \limits _0^{t_1 \vee t_2} \frac{\widetilde{H}^1(y)}{(1-\widetilde{H}(y))^2} \widetilde{H}(dy)\right\} \\{} & {} \quad =- P(Y_1> t_1, Y_2 > t_2) (1-G(t_1 \vee t_2)) \int \limits _0^{t_1 \vee t_2} \frac{\widetilde{H}^1(dy)}{(1-\widetilde{H}(y))^2} \end{aligned}$$

using the calculation in the proof of Corollary 3.

Hence,

$$\begin{aligned} \sigma ^2_{LY} (t_1,t_2)= & {} \frac{1}{(1-G(t_1 \vee t_2))^2} P(Y_1> t_1, Y_2> t_2)(1-P(Y_1> t_1, Y_2 > t_2))\\{} & {} - S^2(t_1, t_2) \int \limits _0^{t_1 \vee t_2} \frac{\widetilde{H}^1(dy)}{(1-\widetilde{H}(y))^2}. \end{aligned}$$

This can be rewritten by using the expressions: \(1-\widetilde{H}(y) = (1-G(y))(1-F(y,y))\), \(\widetilde{H}^1(dy) = (1-F(y,y)) G(dy)\) and \(P(Y_1> t_1, Y_2 > t_2) = S(t_1,t_2)(1-G(t_1 \vee t_2))\).

Remark 7

In case of no censoring

$$\begin{aligned} \sigma ^2_{LY} (t_1,t_2) = S(t_1,t_2)(1-S(t_1,t_2)). \end{aligned}$$

Remark 8

Wang and Wells (1997) use a different estimator for the denominator in the Lin and Ying (1993) estimator. Since \(G(t_1 \vee t_2) = G(t_1) \vee G(t_2)\), they estimate \(1-G(t_1 \vee t_2)\) by \(1-(\widehat{G}(t_1) \vee \widehat{G}(t_2))\):

$$\begin{aligned} \widehat{S}_{WW} (t_1,t_2) = \frac{\frac{1}{n} \sum \nolimits _{i=1}^n I(Y_{1i}> t_1, Y_{2i} > t_2)}{1-(\widehat{G}(t_1) \vee \widehat{G}(t_2))}. \end{aligned}$$

Similar calculations as before give for the asymptotic variance:

$$\begin{aligned} \sigma ^2_{WW}(t_1,t_2)= & {} \frac{1}{(1-G(t_1\vee t_2))^2} P(Y_1> t_1, Y_2> t_2)(1-P(Y_1> t_1, Y_2> t_2))\\{} & {} - S^2 (t_1, t_2) \times \left\{ \begin{array}{l} \displaystyle \int \limits _0^{t_1} \frac{G(dy)}{(1-G(y))^2(1-F_1(y))} \ \text{ if } \ t_1 > t_2\\ \\ \displaystyle \int \limits _0^{t} \frac{G(dy)}{(1-G(y))^2(1-F(y,y))} \ \text{ if } \ t_1 = t_2 = t\\ \\ \displaystyle \int \limits _0^{t_2} \frac{G(dy)}{(1-G(y))^2(1-F_2(y))} \ \text{ if }\ t_1 < t_2\end{array}\right. . \end{aligned}$$

Since \(F(y,y) \le F_1(y)\) and \(F(y,y) \le F_2(y)\) we have that \(\sigma ^2_{WW} (t_1, t_2) \le \sigma ^2_{LY} (t_1,t_2)\) (see also (3.5b) and (3.6b) in Wang and Wells (1997)). Also note that Remark 7 is valid for \(\sigma ^2_{WW} (t_1,t_2)\) since, in case of no censoring, \(G \equiv 0\).

3.6 One-component censoring: survival function estimator of Stute

A simplification of the general bivariate setting of Sect. 3.1 is the situation where the component \(T_1\) is fully observed and the component \(T_2\) is subject to right censoring by C. Compare to a regression-like context where the response \(T_2\) is censored and the covariate \(T_1\) is fully observed. So in this model we observe a random sample \((T_{1i}, Y_{2i}, \delta _{2i})\), \(i = 1,\ldots , n\), from \((T_1, Y_2, \delta _2)\) where \(Y_2 = T_2 \wedge C\) and \(\delta _2 = I(T_2 \le C)\).

In this section we discuss the estimator \(\widehat{S}_S(t_1,t_2)\) for the survival function \(S(t_1,t_2)\) introduced by Stute (1993a). Stute (1995, 1996) studied the more general context of Kaplan–Meier integrals, i.e. estimation of \(\int \varphi (t_1,t_2) F(dt_1, dt_2)\) by \(\int \varphi (t_1,t_2) \widehat{F}(dt_1,dt_2)\) for a given function \(\varphi \) and with \(\widehat{F}\) an appropriate estimator for F. The condition of independence between \((T_1,T_2)\) and C is now replaced by the following pair of assumptions:

  (i) \(T_2\) and C are independent,

  (ii) \(P(T_2 \le C \mid T_1, T_2) = P(T_2 \le C \mid T_2)\).

Note that independence of \((T_1, T_2)\) and C implies (i) and (ii) and that the present weaker assumptions allow for dependence between \(T_1\) and C. For a discussion on (ii) we refer to Stute (1996), p. 462, and to Pruitt (1993).

For simplicity we also assume that the distribution functions of \(T_1\), \(T_2\) and C are continuous.

Conditions (i) and (ii) are sufficient for identifiability of the survival function of \((T_1, T_2)\). Indeed, denote

$$\begin{aligned} \widetilde{H}^{11} (t_1, t_2) = P(T_1 \le t_1, Y_2 \le t_2, \delta _2 = 1). \end{aligned}$$

Then

$$\begin{aligned} \widetilde{H}^{11}(t_1,t_2)= & {} E[I(T_1 \le t_1, T_2 \le t_2, T_2 \le C)]\\= & {} E[E[I(T_1 \le t_1) I(T_2 \le t_2) I(T_2 \le C)\mid T_1,T_2]]\\= & {} E[I(T_1 \le t_1, T_2 \le t_2) E[I(T_2 \le C) \mid T_2]]\\= & {} E[I(T_1 \le t_1, T_2 \le t_2)(1-G(T_2-))]\\= & {} \int \limits _0^{t_1} \int \limits _0^{t_2} (1-G(y_2-)) F(dy_1,dy_2). \end{aligned}$$

Hence,

$$\begin{aligned} F(dy_1,dy_2) = \frac{1}{1-G(y_2-)} \widetilde{H}^{11} (dy_1, dy_2) \end{aligned}$$

or, since G is continuous,

$$\begin{aligned} S(t_1,t_2) = \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{1}{1-G(y_2)} \widetilde{H}^{11} (dy_1, dy_2). \end{aligned}$$
(15)

The corresponding estimator for \(S(t_1,t_2)\) is

$$\begin{aligned} \widehat{S}_S(t_1,t_2) = \frac{1}{n} \sum \limits _{i=1}^n \frac{\delta _{2i}}{1-\widehat{G}(Y_{2i}-)} I(T_{1i}> t_1, Y_{2i} > t_2). \end{aligned}$$
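A minimal sketch (helper name ours); here \(\widehat{G}\) is the Kaplan–Meier estimator based on \((Y_{2i}, 1-\delta _{2i})\) and its left limits supply the weights:

```python
import numpy as np

def stute_survival(t1_obs, y2, d2, t1, t2):
    """Stute-type IPW estimate under one-component censoring:
    (1/n) sum_i d2_i I(T1_i > t1, Y2_i > t2) / (1 - Ghat(Y2_i -))."""
    order = np.argsort(y2)
    n = len(y2)
    at_risk = n - np.arange(n)
    g_surv = np.cumprod(1.0 - (1 - d2[order]) / at_risk)  # 1 - Ghat at the ordered Y2
    g_left = np.concatenate(([1.0], g_surv[:-1]))          # left limits 1 - Ghat(Y2 -)
    w = np.empty(n)
    w[order] = g_left
    return float(np.mean(d2 * (t1_obs > t1) * (y2 > t2) / w))
```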

Considering the results of Stute (1996) for the particular choice \(\varphi (x,w) = I_{]t_1,\infty [ \times ]t_2,\infty [} (x,w)\) and calculating the quantities \(\gamma _0, \gamma _1^\varphi , \gamma _2^\varphi \) in Stute (1996), p. 464 (see the Supplementary Material for details), we obtain the asymptotic representation in Theorem 9 below.

The following integrability assumptions are also required (see (1.3) and (1.4) in Stute 1996).

  (iii) \(\int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{1}{1-G(w)} F(dx,dw) < \infty \)

  (iv) \(\int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \left( \int \limits _0^w \frac{H_2^0(dy)}{(1-H_2(y))^2}\right) ^{1/2} F(dx,dw) < \infty \)

where \(H_2^0(t) = P(Y_2 \le t, \delta _2 = 0)\), \(H_2(t) = P(Y_2 \le t)\).

Theorem 9

Assume conditions (i)–(iv).

Assume that \(T_1\), \(T_2\), C have continuous distributions.

Then,

$$\begin{aligned} \widehat{S}_S(t_1,t_2) - S(t_1,t_2) = \frac{1}{n} \sum \limits _{i=1}^n \psi _S(t_1,t_2, T_{1i}, Y_{2i}, \delta _{2i}) + o_p(n^{-1/2}) \end{aligned}$$

where

$$\begin{aligned} \psi _S(t_1,t_2,T_1,Y_2,\delta _2)= & {} \frac{1}{1-G(Y_2)} I(T_1> t_1, Y_2 > t_2, \delta _2 = 1) - S(t_1, t_2)\\{} & {} + \frac{1}{1-H_2(Y_2)} \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{I(Y_2 \le w, \delta _2 = 0)}{1-G(w)} \widetilde{H}^{11} (dx,dw)\\{} & {} - \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \int \limits _0^{Y_2 \wedge w} \frac{H_2^0(dv)}{(1-H_2(v))^2} \frac{1}{1-G(w)} \widetilde{H}^{11} (dx,dw). \end{aligned}$$

Corollary 5

Assume the conditions of Theorem 9. Then,

$$\begin{aligned} n^{1/2} (\widehat{S}_S(t_1,t_2) - S(t_1,t_2)) {\mathop {\rightarrow }\limits ^{d}} N (0;\sigma ^2_S(t_1,t_2)) \end{aligned}$$

where

$$\begin{aligned} \sigma _S^2(t_1,t_2)= & {} \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{1}{1-G(w)} F(dx,dw) - S^2 (t_1,t_2)\\{} & {} - \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \left( \int \limits _0^{w \wedge w'} \frac{H_2^0(dy)}{(1-H_2(y))^2}\right) F(dx,dw) F(dx', dw'). \end{aligned}$$

Proof

For the calculation of the asymptotic variance, it is useful to note that \(\psi _S\) can also be written as

$$\begin{aligned} \psi _S(t_1,t_2,T_1, Y_2, \delta _2)= & {} \frac{1}{1-G(Y_2)} I(T_1> t_1, Y_2 > t_2, \delta _2 =1) - S(t_1,t_2)\nonumber \\{} & {} + \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{1}{(1-G(w))^2} \xi _G(w; Y_2, \delta _2) \widetilde{H}^{11} (dx,dw) \end{aligned}$$
(16)

where \(\xi _G\) is the expression in the asymptotic representation for \(\widehat{G}(w) - G(w)\), see Theorem 3.

The variance of the first two terms in (16) is equal to

$$\begin{aligned}{} & {} \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{1}{(1-G(w))^2} \widetilde{H}^{11} (dx,dw) - S^2(t_1, t_2)\\{} & {} \quad = \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{1}{1-G(w)} F(dx,dw) - S^2(t_1, t_2). \end{aligned}$$

The variance of the third term in (16) equals

$$\begin{aligned}{} & {} \displaystyle \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty E[\xi _G(w;Y_2,\delta )\xi _G(w';Y_2,\delta )] \frac{1}{(1-G(w))^2(1-G(w'))^2} \\{} & {} \qquad \widetilde{H}^{11} (dx,dw)\widetilde{H}^{11}(dx',dw')\\{} & {} \quad = \displaystyle \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \left( \int \limits _0^{w \wedge w'} \frac{H_2^0(dy)}{(1-H_2(y))^2}\right) F(dx, dw) F(dx', dw') \end{aligned}$$

using the covariance formula (5).

Finally the covariance is equal to

$$\begin{aligned}{} & {} E\left\{ \frac{I(T_1> t_1, Y_2> t_2, \delta _2 =1)}{1-G(Y_2)} \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \frac{1}{(1-G(w))^2} \xi _G(w; Y_2, \delta _2) \widetilde{H}^{11} (dx,dw)\right\} \\{} & {} \quad = -\int \limits _{t_1}^\infty \int \limits _{t_2}^\infty E\left\{ \frac{I(T_1> t_1, Y_2 > t_2, \delta _2 =1)}{1-G(Y_2)} \int \limits _0^{Y_2\wedge w} \frac{H_2^0(dv)}{(1-H_2(v))^2}\right\} \frac{1}{(1-G(w))} \widetilde{H}^{11} (dx,dw)\\{} & {} \quad = -\int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \left( \int \limits _0^{w\wedge w'} \frac{H_2^0(dv)}{(1-H_2(v))^2}\right) \frac{1}{1-G(w)} \frac{1}{1-G(w')} \widetilde{H}^{11} (dx,dw) \widetilde{H}^{11} (dx', dw')\\{} & {} \quad = -\int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \int \limits _{t_1}^\infty \int \limits _{t_2}^\infty \left( \int \limits _0^{w\wedge w'}\frac{H_2^0(dv)}{(1-H_2(v))^2}\right) F(dx, dw) F(dx',dw'). \end{aligned}$$

Collecting the terms gives the desired result.

Remark 9

In case of no censoring

$$\begin{aligned} \sigma _S^2(t_1,t_2) = S(t_1, t_2) (1-S(t_1,t_2)). \end{aligned}$$

3.7 One-component censoring: survival function estimator of Lin-Ying

The idea which led to the estimator of Lin and Ying (1993) discussed in Sect. 3.5 can also be used in the case of one-component censoring. It leads to a new estimator for the bivariate survival function.

If \((T_1, T_2)\) and C are independent and if \(T_1\), \(T_2\) and C have continuous distributions, then

$$\begin{aligned} P(T_1> t_1, Y_2> t_2)= & {} P(T_1> t_1, T_2> t_2, C > t_2)\\= & {} S(t_1, t_2) (1-G(t_2)) \end{aligned}$$

or

$$\begin{aligned} S(t_1,t_2) = \frac{P(T_1> t_1, Y_2 > t_2)}{1-G(t_2)}. \end{aligned}$$

A simple estimator is given by

$$\begin{aligned} \widetilde{S}_{LY}(t_1,t_2) = \frac{\frac{1}{n} \sum _{i=1}^n I(T_{1i}> t_1, Y_{2i} > t_2)}{1-\widehat{G}(t_2)} \end{aligned}$$

with \(\widehat{G}\) the Kaplan–Meier estimator of G.
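A matching sketch of \(\widetilde{S}_{LY}\) (helper name ours):

```python
import numpy as np

def one_comp_ly_survival(t1_obs, y2, d2, t1, t2):
    """Lin-Ying-type estimate: empirical P(T1 > t1, Y2 > t2) divided by 1 - Ghat(t2)."""
    order = np.argsort(y2)
    n = len(y2)
    at_risk = n - np.arange(n)
    g_surv = np.cumprod(1.0 - (1 - d2[order]) / at_risk)  # 1 - Ghat at the ordered Y2
    k = np.searchsorted(y2[order], t2, side="right")
    one_minus_G = 1.0 if k == 0 else g_surv[k - 1]
    return float(np.mean((t1_obs > t1) & (y2 > t2)) / one_minus_G)
```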

Remark 10

Given the way we write \(S(t_1,t_2)\) it is natural to assume that \((T_1,T_2)\) and C are independent. Note that this condition implies conditions (i) and (ii) in Sect. 3.6.

Theorem 10

Assume that \((T_1, T_2)\) and C are independent and that the distribution functions of \(T_1\), \(T_2\) and C are continuous. Then, for \(t_2 < \tau _G\),

$$\begin{aligned} \widetilde{S}_{LY} (t_1,t_2) - S(t_1, t_2) = \frac{1}{n} \sum \limits _{i=1}^n \widetilde{\psi }_{LY} (t_1,t_2,T_{1i}, Y_{2i}, \delta _{2i}) + o_P(n^{-1/2}) \end{aligned}$$

where

$$\begin{aligned} \widetilde{\psi }_{LY} (t_1, t_2, T_1, Y_2, \delta _2) &= \frac{1}{1-G(t_2)} \{I(T_1> t_1, Y_2> t_2) - P(T_1> t_1, Y_2 > t_2)\}\\ &\quad + S(t_1,t_2) \left\{ \int \limits _0^{t_2} \frac{I(Y_2 \le y)-H_2(y)}{(1-H_2(y))^2} H_2^0 (dy)\right. \\ &\qquad + \frac{I(Y_2 \le t_2, \delta _2 = 0) - H_2^0(t_2)}{1-H_2(t_2)}\\ &\qquad \left. - \int \limits _0^{t_2} \frac{I(Y_2 \le y, \delta _2 = 0) - H_2^0(y)}{(1-H_2(y))^2} H_2(dy)\right\} \end{aligned}$$

with \(H_2(t) = P(Y_2 \le t)\) and \(H_2^0(t) = P(Y_2 \le t, \delta _2=0)\).

Proof

As in the proof of Theorem 8, it follows by linearization of \(\widetilde{S}_{LY} - S\) that \(\widetilde{S}_{LY} (t_1, t_2) - S(t_1,t_2)\) has the same asymptotic distribution as

$$\begin{aligned} &\frac{1}{1-G(t_2)} \frac{1}{n} \sum \limits _{i=1}^n \left\{ I(T_{1i}> t_1, Y_{2i}> t_2) - P(T_1> t_1, Y_2 > t_2)\right\} \\ &\quad + S(t_1,t_2) \frac{1}{1-G(t_2)} [\widehat{G}(t_2) - G(t_2)]. \end{aligned}$$

Now replace \(\widehat{G}(t_2) - G(t_2)\) by its asymptotic representation.

Corollary 6

Assume the conditions of Theorem 10. Then, for \(t_2 < \tau _G\),

$$\begin{aligned} n^{1/2} (\widetilde{S}_{LY}(t_1,t_2) - S(t_1,t_2)) {\mathop {\rightarrow }\limits ^{d}} N (0; \widetilde{\sigma }_{LY}^2 (t_1,t_2)) \end{aligned}$$

where

$$\begin{aligned} \widetilde{\sigma }_{LY}^2 (t_1,t_2) &= \frac{1}{(1-G(t_2))^2} P(T_1> t_1, Y_2> t_2)(1-P(T_1> t_1, Y_2 > t_2))\\ &\quad - S^2(t_1,t_2) \int \limits _0^{t_2} \frac{H_2^0(dy)}{(1-H_2(y))^2}. \end{aligned}$$

Proof

$$\begin{aligned} \widetilde{\sigma }_{LY}^2 (t_1,t_2) &= \frac{1}{(1-G(t_2))^2} P(T_1> t_1, Y_2> t_2) (1-P(T_1> t_1, Y_2> t_2))\\ &\quad + S^2(t_1,t_2) \int \limits _0^{t_2}\frac{H_2^0(dy)}{(1-H_2(y))^2}\\ &\quad + 2 \frac{S(t_1,t_2)}{1-G(t_2)} E\left\{ I(T_1> t_1, Y_2 > t_2)\left[ \int \limits _0^{t_2} \frac{I(Y_2 \le y) - H_2(y)}{(1-H_2(y))^2} H_2^0(dy)\right. \right. \\ &\qquad + \left. \left. \frac{I(Y_2 \le t_2, \delta _2 = 0) - H_2^0(t_2)}{1-H_2(t_2)} - \int \limits _0^{t_2} \frac{I(Y_2 \le y, \delta _2 = 0) - H_2^0(y)}{(1-H_2(y))^2} H_2(dy)\right] \right\} . \end{aligned}$$

The last term equals

$$\begin{aligned} &- 2 \frac{S(t_1,t_2)}{1-G(t_2)} P(T_1> t_1, Y_2> t_2) \left\{ \int \limits _0^{t_2}\frac{H_2(y)}{(1-H_2(y))^2} H_2^0(dy)\right. \\ &\qquad \left. + \frac{H_2^0(t_2)}{1-H_2(t_2)} - \int \limits _0^{t_2} \frac{H_2^0(y)}{(1-H_2(y))^2} H_2(dy)\right\} \\ &\quad = - 2 \frac{S(t_1, t_2)}{1-G(t_2)} P(T_1> t_1, Y_2 > t_2) \int \limits _0^{t_2} \frac{H_2^0(dy)}{(1-H_2(y))^2}\\ &\quad = - 2 S^2(t_1,t_2) \int \limits _0^{t_2} \frac{H_2^0(dy)}{(1-H_2(y))^2} \end{aligned}$$

where we used a calculation similar to the one in the proof of Theorem 7.

Remark 11

In case of no censoring

$$\begin{aligned} \widetilde{\sigma }_{LY}^2(t_1,t_2) = S(t_1, t_2) (1-S(t_1,t_2)). \end{aligned}$$

Remark 12

For \(t_1 = 0\), we have \(S(0,t_2) = P(T_2 > t_2)\) and \(\widetilde{\sigma }_{LY}^2(0,t_2) = (P(T_2 > t_2))^2 \int \nolimits _0^{t_2} \frac{H_2^1(dy)}{(1-H_2(y))^2}\), where \(H_2^1(t) = P(Y_2 \le t, \delta _2 = 1)\). This is the asymptotic variance of the Kaplan–Meier estimator for the survival function of \(T_2\).
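For practical use (e.g. pointwise confidence intervals) the asymptotic variance in Corollary 6 can be estimated by replacing \(G\), \(H_2\), \(H_2^0\) and \(P(T_1>t_1, Y_2>t_2)\) by their empirical counterparts. A sketch, continuing the Python example above (using the left limit of the empirical \(H_2\) in the integrand is one of several asymptotically equivalent choices):

```python
def sigma2_LY(t1, t2, T1, Y2, delta2):
    """Plug-in estimate of the asymptotic variance in Corollary 6;
    reuses km_censoring (and numpy as np) from the sketch above."""
    n = len(Y2)
    G_hat = km_censoring(Y2, delta2)
    p_hat = np.mean((T1 > t1) & (Y2 > t2))      # P(T1 > t1, Y2 > t2)
    S_hat = p_hat / (1.0 - G_hat(t2))           # S_LY(t1, t2)
    # integral_0^{t2} H_2^0(dy) / (1 - H_2(y-))^2, a sum of jumps 1/n
    # over the censored observations with Y_2i <= t2
    integral = sum(1.0 / (n * (1.0 - np.mean(Y2 < y)) ** 2)
                   for y, d in zip(Y2, delta2) if d == 0 and y <= t2)
    return p_hat * (1.0 - p_hat) / (1.0 - G_hat(t2)) ** 2 - S_hat ** 2 * integral
```

An asymptotic 95% confidence interval for \(S(t_1,t_2)\) is then \(\widetilde{S}_{LY}(t_1,t_2) \pm 1.96\, \widehat{\sigma }_{LY}/\sqrt{n}\).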

3.8 One-component censoring: survival function estimator of Akritas

We again consider the bivariate model where \(T_1\) is fully observed and \(T_2\) is subject to censoring by C with distribution function G. Also, \(Y_2 = T_2 \wedge C\), \(\delta _2 = I(T_2 \le C)\) and the observations are \((T_{1i}, Y_{2i}, \delta _{2i})\), \(i = 1,\ldots , n\).

The following estimator for \(S(t_1, t_2)\) has been proposed by Akritas (1994) and further studied in Akritas and Van Keilegom (2003).

It is assumed that, given \(T_1\), the variables \(T_2\) and C are independent. The starting point is the following relation

$$\begin{aligned} S(t_1,t_2) = \int \limits _{t_1}^\infty S(t_2 \mid t) F_1(dt) \end{aligned}$$
(17)

where \(S(t_2\mid t) = P(T_2 > t_2 \mid T_1 = t)\).

The estimator is obtained by plugging in estimators \(\widehat{S}_n(t_2 \mid t)\) for \(S(t_2\mid t)\) and \(F_{1n}(t)\) for \(F_1(t)\), where \(F_{1n} (t) = n^{-1} \sum \nolimits _{i=1}^n I(T_{1i} \le t)\). This gives

$$\begin{aligned} \widehat{S}_A(t_1,t_2) = \frac{1}{n} \sum \nolimits _{i=1}^n \widehat{S}_n (t_2 \mid T_{1i}) I(T_{1i} > t_1). \end{aligned}$$

For \(\widehat{S}_n(t_2\mid t)\) we use the Beran (1981) estimator (see Remark 3):

$$\begin{aligned} \widehat{S}_n(t_2 \mid t) = \prod \limits _{\begin{array}{c} i=1\\ Y_{2i} \le t_2,\delta _{2i}=1 \end{array}}^n\left( 1- \frac{w_{ni} (t, h_n)}{\sum _{j=1}^n w_{nj} (t,h_n) I(Y_{2j} \ge Y_{2i})}\right) . \end{aligned}$$

The weights \(w_{ni}(t,h_n)\) are Nadaraya-Watson weights with

$$\begin{aligned} w_{ni} (t,h_n) = K\left( \frac{t - T_{1i}}{h_n}\right) / \sum \nolimits _{j=1}^n K\left( \frac{t-T_{1j}}{h_n}\right) , \end{aligned}$$

where K is a known probability density function and \(\{h_n\}\) is a sequence of nonnegative constants, tending to 0 as \(n \rightarrow \infty \). It has been shown (see Van Keilegom and Veraverbeke 1997) that \(\widehat{S}_n (t_2 \mid T_{1i}) - S(t_2\mid T_{1i})\) has the same asymptotic distribution as \(- \sum \nolimits _{j=1}^n w_{nj} (T_{1i}, h_n) \xi _A (t_2; Y_{2j},\delta _{2j}, T_{1i})\) where

$$\begin{aligned} \xi _A(t_2; Y_2, \delta _2, t) = S(t_2 \mid t) \left( -\int \limits _0^{Y_2 \wedge t_2}\frac{H_2^1 (ds\mid t)}{(1-H_2(s\mid t))^2} + \frac{I(Y_2 \le t_2, \delta _2 =1)}{1-H_2(Y_2 \mid t)}\right) . \end{aligned}$$

Here \(H_2(y \mid t) = P(Y_2 \le y \mid T_1 = t)\) and \(H_2^1 (y\mid t) = P(Y_2 \le y, \delta _2 = 1 \mid T_1 = t)\).
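For concreteness, here is a Python sketch of the Beran estimator and of the resulting Akritas estimator (the Epanechnikov kernel and the bandwidth argument h are illustrative choices, and beran_surv, akritas_surv are our names):

```python
import numpy as np

def beran_surv(t2, t, T1, Y2, delta2, h):
    """Beran estimator S_n(t2 | t): a conditional Kaplan-Meier product
    in which observation i carries Nadaraya-Watson weight w_ni(t, h)."""
    K = lambda u: 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1)  # Epanechnikov
    w = K((t - T1) / h)
    if w.sum() == 0:                      # no T_1j within bandwidth of t
        return np.nan
    w = w / w.sum()
    surv = 1.0
    for i in np.argsort(Y2):              # uncensored Y_2i <= t2, in order
        if Y2[i] > t2:
            break
        if delta2[i] == 1:
            at_risk = np.sum(w * (Y2 >= Y2[i]))   # weighted at-risk mass
            if at_risk > 0:
                surv *= 1.0 - w[i] / at_risk
    return surv

def akritas_surv(t1, t2, T1, Y2, delta2, h):
    """S_A(t1, t2) = n^{-1} sum_i S_n(t2 | T_1i) I(T_1i > t1)."""
    n = len(T1)
    return sum(beran_surv(t2, T1[i], Y2, delta2, h)
               for i in range(n) if T1[i] > t1) / n
```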

Due to the censoring of \(T_2\), it will only be possible to estimate \(S(t_1,t_2)\) on a certain domain for \((t_1,t_2)\). Indeed, the estimator for \(S(t_1,t_2)\) is obtained from relation (17) by plugging in the empirical distribution function for \(F_1(t)\) and the conditional Kaplan–Meier estimator for \(S(t_2\mid t)\). To achieve uniformity of the remainder term in the asymptotic representation, we have to stay strictly away from the right endpoint of the support of \(F_1\), as well as from the right endpoint of the support of \(P(Y_2 \le y \mid T_1 = t)\) for all \(t \ge t_1\) (the range of integration in (17)).

Hence, in order to define the domain of our estimator, we introduce the following notation (as in Akritas 1994; Akritas and Van Keilegom 2003):

  • \(\tau _1\) = any number strictly less than \(\inf \{t: F_1(t) = 1\}\)

  • \(\tau _2(t)\) = any number strictly less than \(\inf \{y: H_2(y\mid t) = 1\}\)

Therefore we use the following domain

$$\begin{aligned} \Omega _A = \{(t_1,t_2): t_1 \le \tau _1, t_2 \le \inf \limits _{t \ge t_1} \tau _2(t)\}. \end{aligned}$$

We will also need the following assumptions (see Akritas and Van Keilegom 2003):

  • (A1) \(\frac{\log n}{n h_n} \rightarrow 0\), \(nh_n^4 \rightarrow 0\); K is a probability density with support \([-1,1]\), K is twice continuously differentiable, \(\int uK(u)du = 0\);

  • (A2) \(F_1(t_1)\) is three times continuously differentiable w.r.t. \(t_1\); \(H_2(t_2 \mid t_1)\) and \(H_2^1(t_2\mid t_1)\) are twice continuously differentiable w.r.t. \(t_1\) and \(t_2\) and for \((t_1, t_2) \in \Omega _A\), all derivatives are uniformly bounded.

Theorem 11

(Akritas 1994; Akritas and Van Keilegom 2003)

Assume that \(T_2\) and C are independent, given \(T_1\). Assume (A1) and (A2). Then, for \((t_1,t_2) \in \Omega _A\), we have the following representation

$$\begin{aligned} \widehat{S}_A(t_1, t_2) - S(t_1, t_2) = \frac{1}{n} \sum \limits _{i=1}^n \psi _A(t_2; Y_{2i}, \delta _{2i}, T_{1i}) + r_n(t_1,t_2) \end{aligned}$$

where

$$\begin{aligned} \psi _A(t_2;Y_2, \delta _2, T_1) = \{S(t_2 \mid T_1)I(T_1> t_1) - S(t_1, t_2)\} - \xi _A(t_2; Y_2, \delta _2, T_1) I(T_1 > t_1) \end{aligned}$$

and

$$\begin{aligned} \sup \limits _{(t_1,t_2) \in \Omega _A} \mid r_n (t_1, t_2) \mid \ = o_p(n^{-1/2}). \end{aligned}$$

Remark 13

The crucial component of the Akritas estimator is the Beran estimator \(\widehat{S}_n(t_2\mid t)\). It is therefore natural to assume that \(T_2\) and C are conditionally independent given \(T_1\). Note that this assumption neither implies nor is implied by the independence conditions in Sects. 3.6 and 3.7.

Remark 14

Theorem 11 is a special case of a result in Van Keilegom (2004), where also \(T_1\) is allowed to be censored.

Corollary 7

Assume the conditions of Theorem 11. Then,

$$\begin{aligned} n^{1/2} (\widehat{S}_A(t_1,t_2) - S(t_1,t_2)) {\mathop {\rightarrow }\limits ^{d}}N (0; \sigma _A^2 (t_1, t_2)) \end{aligned}$$

where

$$\begin{aligned} \sigma _A^2(t_1,t_2) &= E[S^2 (t_2\mid T_1) I(T_1> t_1)] - S^2 (t_1,t_2)\\ &\quad + E\left[ S^2(t_2 \mid T_1)\left( \int \limits _0^{t_2} \frac{H_2^1(ds\mid T_1)}{(1-H_2(s\mid T_1))^2}\right) I(T_1 > t_1)\right] \\ &= \int \limits _{t_1}^\infty S^2(t_2 \mid t) F_1 (dt) - S^2(t_1, t_2)\\ &\quad +\int \limits _{t_1}^\infty S^2(t_2\mid t) \left( \int \limits _0^{t_2} \frac{H_2^1 (ds\mid t)}{(1-H_2(s\mid t))^2}\right) F_1(dt). \end{aligned}$$

Remark 15

In case of no censoring \(\int \limits _0^{t_2} \frac{H_2^1(ds\mid t)}{(1-H_2(s\mid t))^2} = \frac{F(t_2\mid t)}{1-F(t_2\mid t)}\) and hence \(\sigma _A^2(t_1,t_2) = S(t_1,t_2) (1-S(t_1,t_2))\).

4 Applications

Representations are a particularly useful tool to study asymptotic properties of complicated statistical estimators. In this section we demonstrate, for right censored data, how the i.i.d. representations for nonparametric univariate and bivariate survival function estimators have been used as building blocks in the derivation of asymptotic properties of more complicated estimators. Given the large number of possible applications, we limit ourselves to four concrete examples that have recently been discussed in the statistical literature: nonparametric conditional residual quantile estimation, nonparametric copula estimation, cure models (in survival analysis and banking) and goodness-of-fit in regression models.

4.1 Conditional residual quantiles

The Lo and Singh (1986) representation (Theorem 3 in this paper) has been used to obtain i.i.d. representations for quantiles of the Kaplan–Meier estimator \(\widehat{S}(t)\) (Gijbels and Veraverbeke 1988) and also for quantiles of the conditional Kaplan–Meier estimator \(\widehat{S}(t\mid x)\) in Remark 3 (Van Keilegom and Veraverbeke 1998).

More recent work is the study of conditional residual quantiles. For a lifetime \(T_1\) and some other variable \(T_2\), containing extra information on \(T_1\), conditional residual lifetime distributions are defined as \(P(T_1 - t_1 \le y \mid T_1 > t_1, T_2 \le t_2)\) or \(P(T_1-t_1 \le y \mid T_1> t_1, T_2 > t_2)\) or \(P(T_1 - t_1 \le y \mid T_1 > t_1, t_{21} < T_2 \le t_{22})\).

Abrams et al. (2021, 2023) studied asymptotic representations for nonparametric estimators of the quantiles of these distributions. The proposed estimators use the one-component Akritas-Van Keilegom estimator of Sect. 3.8 and the univariate censoring estimators of Sects. 3.4 and 3.5.

The i.i.d. representations in Sect. 3 are key ingredients to study the asymptotic properties of the conditional residual quantile estimators.

4.2 Copulas

Survival copulas can be written as \(\mathcal {C}(u_1,u_2) = S(S_1^{-1}(u_1), S_2^{-1}(u_2))\) with S the joint survival function and \(S_1\) and \(S_2\) the marginal survival functions. Using nonparametric estimators \(S_n\), \(S_{1n}\) and \(S_{2n}\) for S, \(S_1\) and \(S_2\), a nonparametric estimator \(\mathcal {C}_n(u_1,u_2)\) for \(\mathcal {C}(u_1, u_2)\) is given by

$$\begin{aligned} \mathcal {C}_n(u_1,u_2) = S_n(S_{1n}^{-1} (u_1), S_{2n}^{-1}(u_2)). \end{aligned}$$

Using the nonparametric estimators \(S_n\) of \(S\) discussed in Sect. 3 and nonparametric Kaplan–Meier based estimators for the marginal quantiles, we obtain estimators for copula functions, whose asymptotic behavior can be studied via the representations given in Sect. 3. See Geerdens et al. (2016) for details; that paper also contains a comparison with an alternative estimator of Gribkova and Lopez (2015).
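A sketch of this plug-in construction (the function names are ours; S_n stands for any of the bivariate estimators of Sect. 3, and the marginal inverses are generalized inverses of, e.g., Kaplan–Meier estimates of the margins):

```python
import numpy as np

def generalized_inverse(grid, S_vals):
    """Generalized inverse u -> inf{t in grid : S(t) <= u} of a
    nonincreasing step function tabulated as S_vals on grid."""
    def inv(u):
        hits = np.nonzero(S_vals <= u)[0]
        return grid[hits[0]] if hits.size else grid[-1]
    return inv

def survival_copula(u1, u2, S_n, S1_inv, S2_inv):
    """C_n(u1, u2) = S_n(S_1n^{-1}(u1), S_2n^{-1}(u2))."""
    return S_n(S1_inv(u1), S2_inv(u2))
```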

4.3 Cure models

There are many contexts (e.g. cancer research) in which some subjects in the study never experience the event of interest (e.g. death caused by the cancer). Such subjects are called ’cured’.

Several models have been introduced and studied to modify classical survival analysis in the presence of a cured fraction. An up-to-date review paper is Amico and Van Keilegom (2018). In Geerdens et al. (2020) a goodness-of-fit test for a parametric survival function with cure fraction is discussed for the mixture cure model \(S(t) = 1 - \phi + \phi S_1(t)\), with \(1-\phi \) the cure fraction and \(S_1(t)\) the survival function of the uncured subjects (the susceptibles). With \(\widehat{S}_1(t)\) the Maller and Zhou (1996) estimator for \(S_1(t)\) and \(\widehat{\theta }\) the maximum likelihood estimator for \(\theta \), the Cramér-von Mises distance

$$\begin{aligned} \Lambda _n = \sum \limits _{i=1}^n (\widehat{S}_1(Y_i) - S_{1, \widehat{\theta }} (Y_i))^2 \end{aligned}$$
(18)

with \(Y_i = T_i \wedge C_i\), is used to test

$$\begin{aligned} H_0: S_1 \in \{S_{1,\theta }: \theta \in \Theta \}\ \text{ versus }\ H_a: S_1 \notin \{S_{1,\theta }: \theta \in \Theta \} \end{aligned}$$

where \(\Theta \) is the parameter space of the parameter \(\theta \) in the assumed parametric form \(S_{1,\theta }(t)\) of the survival function \(S_1(t)\).
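A sketch of the computation of the test statistic (with an exponential latency \(S_{1,\theta }(t) = e^{-\theta t}\) as an illustrative parametric family; the construction of the Maller-Zhou estimator and the maximum likelihood fit are not shown, and in practice critical values would be obtained by resampling):

```python
import numpy as np

def cvm_distance(Y, S1_hat, S1_param, theta_hat):
    """Cramer-von Mises distance (18): the sum over the observed Y_i of
    the squared difference between the nonparametric latency estimator
    S1_hat and the fitted parametric latency S1_param(., theta_hat)."""
    return sum((S1_hat(y) - S1_param(y, theta_hat)) ** 2 for y in Y)

# illustrative parametric family: exponential latency
S1_exp = lambda t, theta: np.exp(-theta * t)
```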

An example of an application of censoring and cure models outside clinical research, in the domain of finance and banking, appeared in the recent PhD thesis of Peláez-Suárez (2022). She uses the conditional cure model

$$\begin{aligned} S(t \mid x) = 1 - \phi (x) + \phi (x) S_1(t\mid x) \end{aligned}$$

with T the time to default (the inability to repay the debt incurred by granting a credit) and X a credit score variable. To estimate the default probability

$$\begin{aligned} P(T \le t + b \mid T > t, X = x) = 1 - \frac{S(t+b\mid x)}{S(t\mid x)} \end{aligned}$$
(19)

she uses a nonparametric cure model estimator of the conditional survival function \(S(\cdot \mid x)\). The latter estimator, in terms of Beran-type estimators (Beran 1981) for the incidence \(\phi (x)\) and the latency \(S_1(t\mid x)\), is studied in López-Cheda et al. (2017a, b).
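Given any estimator of the conditional survival function, the default probability (19) is a simple plug-in ratio; a minimal sketch (S_cond stands for the cure-model estimator of the conditional survival function, whose construction is not shown):

```python
def default_probability(t, b, x, S_cond):
    """Estimated default probability (19):
    P(T <= t + b | T > t, X = x) = 1 - S(t + b | x) / S(t | x),
    where S_cond(t, x) estimates S(t | x)."""
    return 1.0 - S_cond(t + b, x) / S_cond(t, x)
```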

To study the asymptotic properties of the goodness-of-fit statistic \(\Lambda _n\) in (18) or of the estimated default probability (19), i.i.d. representations are again essential.

4.4 Goodness-of-fit in regression models

There is also a huge literature on regression models with censored data in which the response T is subject to random right censoring. We mention two recent papers, González-Manteiga et al. (2020) and Conde-Amboage et al. (2021), and the references therein. Examples are the mean regression model \(T = m(X) + \varepsilon \), where m(X) is the conditional mean of T given X, and the quantile regression model \(T = g_\tau (X) + \varepsilon \), where \(g_\tau (X)\) is the conditional \(\tau \)-quantile function of T given X \((0< \tau < 1)\). There exist many goodness-of-fit procedures to test the hypothesis that \(m(\cdot )\) or \(g_\tau (\cdot )\) belongs to some class of parametric functions. As for cure models, the goodness-of-fit statistics are based on a comparison of a model-based parametric estimator and a nonparametric estimator for \(m(\cdot )\), resp. \(g_\tau (\cdot )\), and, again, i.i.d. representations are crucial to study the asymptotic properties of these statistics.

The above examples clearly show the need for i.i.d. representations when studying asymptotics in more complicated censoring models. Indeed, the study of asymptotic properties of nonparametric estimators of the univariate or bivariate survival function for data subject to left truncation and right censoring, or to interval censoring, will also rely on i.i.d. representations. Moreover, such representations are, and will remain, essential for the study of more complex data schemes, e.g. censored data in competing risks models and models dealing with dependent censoring.