1 Introduction

Winfried Stute is one of the pioneers and key contributors to empirical process theory. His interest in empirical processes dates back to his student days with a Diploma thesis on this topic under the guidance of Peter Gaenssler, which was later published in Z. Wahrscheinlichkeitstheorie und verw. Gebiete in 1976, a premier journal in probability. This was a feat for a Diploma (comparable to M.Sc.) student. Winfried continued to work on problems in empirical processes for the next ten years and gained international acclaim for this work. His 1982 paper (Stute 1982) on the oscillation behavior of empirical processes remains a classic and became a foundation for research in density estimation, nonparametric regression, and beyond. His odyssey into the terrain of survival analysis was not accidental; it was a consequence of his interest in expanding the horizon of empirical processes from the i.i.d. setting to incomplete data. Here we define survival analysis in the narrow sense that it involves incomplete data, such as randomly right censored, truncated, or doubly censored data. With this narrow interpretation, Winfried’s first publications in survival analysis appeared in Diehl and Stute (1988) and Dikta et al. (1989). They involved density and hazard function estimation as well as sequential confidence bands for distribution functions, all for randomly right censored data. Applying empirical process theory to censored data is a natural path that many theoreticians have taken, and thereafter his interest in survival analysis intensified. From 1992 to 1997 he published more than 20 papers in survival analysis, and he continued to plow the field until his retirement in 2012 and beyond. In fact, his most recent papers, including Sen and Stute (2014) and Azarang et al. (2013), are on this topic. To date he has produced nearly 40 papers in survival analysis, which accounts for roughly one-third of his publications.
It is fitting and a pleasure for me to comment on his contributions in survival analysis, especially as I have worked with him on several projects in this area. Instead of an exhaustive review of his work in survival analysis, I will focus on the projects that I co-authored or am most familiar with, complemented by a few anecdotes.

We began to collaborate in the Fall of 1990 when I spent a sabbatical leave at the University of Giessen, where Winfried was a faculty member in the Department of Mathematics until his retirement. We were looking for a topic of common interest and survival analysis was the obvious choice, as he had just entered the field and I was in the midst of several projects on incomplete data. It took little time for us to settle on a topic, the strong law of large numbers (SLLN) for censored data, which seemed of great interest given that the SLLN is one of the most fundamental theoretical concepts in statistics. There were many results on the strong consistency of the Kaplan-Meier (hereafter abbreviated as K-M) estimator at that time but little was known for the general setting that involves the K-M integral defined as \(\int \phi (x) \ d\hat{F}_n(x)\), where \(\hat{F}_n\) is the K-M estimator (defined in (1.5) of section  “Strong Law of Large Numbers: Random Right Censoring”) of the true lifetime distribution function F and \(\phi \) is a Borel measurable function on the real line such that \(\int |\phi (x) | \ dF(x) < \infty \). The open problem we addressed was under what conditions \(\int \phi (x) \ d\hat{F}_n(x)\) would converge to \(\int \phi (x) \ dF(x)\) with probability one; the answer was provided in Stute and Wang (1993b).

This was a most memorable experience for me, especially as we were able to solve the problem with the minimum requirement that \(\phi \) is F-integrable, which is the same condition that is needed for the classical SLLN to hold for i.i.d. data. The proof is fairly elaborate and involves several cases where both the lifetime and censoring distributions could be continuous, discrete, or neither, as long as they do not have common jumps (see Sect. 1.2 for details). This collaboration led to three subsequent joint papers (Gürler et al. 1993; Stute and Wang 1993a, 1994) and a series of papers by Winfried and his other collaborators (Stute 1993a, 1994a, b, c, d, 1995a) within the next two years.

Shortly after solving the SLLN for censored survival data, Winfried tackled the next most fundamental result, the central limit theorem (CLT) for the K-M integral (Stute 1995b), to be discussed further in Sect. 1.3. Besides randomly right censored data, Winfried also made landmark contributions to truncated data, another type of incomplete data that is challenging for two reasons: the sample is biased, and there are technical difficulties at both the left and right tails of the lifetime distribution F, in contrast to the right censoring case where the left tail poses no difficulties. For truncated data the counterpart of the K-M estimator is the Lynden-Bell estimator (Lynden-Bell 1971) \(\hat{F}_n^T\) defined in (1.27), which will be further discussed in Sect. 1.2. In Stute (1993a) Winfried established an i.i.d. representation for the Lynden-Bell estimator, which facilitated further asymptotic analysis for truncated data. This work sparked further research interest in truncated data; however, a fundamental result, a CLT for the Lynden-Bell integral \(\int \phi (x) \ d \hat{F}_n^T\), had proved elusive for a long time and remained an open problem. By 2000 we both had drifted away from survival analysis but were acutely aware of the need to fill this major theoretical gap. Finally, in the fall of 2005 (or around that period) I returned to Giessen to work on this project with Winfried and the results were published in Stute and Wang (2008). This was my last paper with Winfried although we both maintained interest in survival analysis and continued to dip into the field occasionally.

In the remainder of this paper, we discuss Winfried’s four papers with focus on the SLLN and CLT for survival data.

2 Strong Law of Large Numbers: Random Right Censoring

Let \(X_1, \ldots , X_n\) be a sequence of i.i.d. random variables from a distribution function F, and let \(F_n\) be their empirical distribution function. The classical SLLN implies that, with probability one, \(\int \phi (x) \ dF_n (x) = \frac{1}{n} \sum _{i=1}^n \phi (X_i) \rightarrow E(\phi (X_1) )=\int \phi (x) \ dF(x), \) as long as \(E( |\phi (X_1) | ) < \infty \). Here the empirical distribution is a discrete probability measure that assigns equal point mass 1/n to each observation \(X_i\), hence the classical SLLN and CLT hold automatically for \(\int \phi (x) \ dF_n (x)\). When F is an event-time or lifetime distribution, a longitudinal follow-up study is needed to track the event time \(X_i\), and, as in many studies, patients/subjects may be lost during the follow-up period or the study has to end before the event, which could be death. Therefore the event time cannot be observed for all patients. This triggers right censoring, for which Winfried has made major contributions.

In the setting of random right censoring, \(X_i\) are no longer observed directly as they are subject to potential censoring by an independent variable \(Y_i\). Instead, one can only observe \(Z_i =\) min\( (X_i, Y_i)\) along with the censoring indicator, \(\delta _i= 1_{\{X_i \le Y_i \}}.\) Unless otherwise mentioned, we make the standard assumption that \(Y_1, \ldots , Y_n\) is a sequence of i.i.d. censoring variables with distribution function G that is independent of the sequence \(X_i\). The counterpart of the empirical distribution in the presence of right censoring is the Kaplan-Meier estimator \(\hat{F}_n\), which has been defined in several different but equivalent ways. We will use the form that has the most intuitive appeal for the purpose we want to serve.

One of the most intuitive ways to understand the principle of estimation for incomplete or biasedly sampled data is to first identify what parametric or nonparametric components could be estimated empirically from the observed data and then relate these components to the main target. In the random right censoring setting the main target is the lifetime distribution F, or equivalently its cumulative hazard function, which is defined as

$$\begin{aligned} \Lambda (x) = \int _0^x \frac{dF(t)}{1-F(t-)}, \end{aligned}$$
(1.1)

where the notation \(F(t-)\), for any distribution F, stands for \(F(t-) = \lim _{y \uparrow t} F(y)\), the limit of F(y) as y approaches t from below.

It is obvious that \(Z_i\) can always be observed, so its empirical distribution function, \(H_n (x)= \frac{1}{n} \sum _{i=1}^n 1_{\{Z_i \le x\}}\), is the natural estimate of \(H(x)= \Pr (Z_1 \le x)\). Likewise, \(H_1(x) = \Pr (Z_1 \le x, \delta _1 =1),\) a subdistribution of the distribution of the \(Z_i\), can be estimated empirically by \(H_{1n}(x)= \frac{1}{n} \sum _{i=1}^n 1_{\{Z_i \le x, \ \delta _i=1\}}\). It is not difficult to show that

$$\begin{aligned} \Lambda (x) = \int _0^x \frac{dH_1(t)}{1- H(t-)}. \end{aligned}$$
(1.2)

So \(\Lambda \) can be estimated by replacing H and \(H_1\) in (1.2) with their respective empirical estimates, \(H_n\) and \(H_{1n}\).

To include the case with tied observations, let \(Z_{(1)}< Z_{(2)}<\ldots < Z_{(K)}\) denote the K distinct and ordered observed lifetimes among \(\{Z_1, \ldots , Z_n\}\), i.e. \(Z_{(i)}= Z_j\) for some j for which \(\delta _j=1\). Then the resulting estimate of \(\Lambda \) is

$$\begin{aligned} \hat{\Lambda }_n(x)= \int _0^x \frac{dH_{1n}(t)}{1- H_n(t-)} = \sum _{i=1}^K \frac{d_i}{n_i} \ 1_{\{Z_{(i)} \le x \}}, \end{aligned}$$
(1.3)

where \(d_i = \sum _{j=1}^n 1_{\{Z_j=Z_{(i)}, \delta _j=1\}}\) is the number of deaths observed at time \(Z_{(i)}\) and \(n_i= n[1- H_n(Z_{(i)} -)] = \sum _{j=1}^n 1_{\{Z_j \ge Z_{(i)} \}}\) is the number of subjects still at risk at \(Z_{(i)}\).
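The plug-in construction above is straightforward to compute. The following is a minimal Python sketch (the function name and interface are my own, not from the paper) that builds \(d_i\), \(n_i\) and the Nelson-Aalen estimate (1.3) from observed pairs \((Z_i, \delta _i)\), ties allowed:

```python
import numpy as np

def nelson_aalen(z, delta):
    """Nelson-Aalen estimate of the cumulative hazard from censored data.

    z     : observed times Z_i = min(X_i, Y_i)
    delta : censoring indicators, 1 if the event was observed
    Returns the distinct event times Z_(i), the death counts d_i,
    the at-risk counts n_i, and the cumulative hazard at each Z_(i).
    """
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=int)
    event_times = np.unique(z[delta == 1])                  # Z_(1) < ... < Z_(K)
    d = np.array([np.sum((z == t) & (delta == 1)) for t in event_times])
    n_risk = np.array([np.sum(z >= t) for t in event_times])
    return event_times, d, n_risk, np.cumsum(d / n_risk)

# four subjects, with a tie at time 2 between one death and one censoring
t, d, n_risk, lam = nelson_aalen([1, 2, 2, 3], [1, 0, 1, 1])
print(t, d, n_risk, lam)    # d_i = [1 1 1], n_i = [4 3 1]
```

In this toy example the censored subject at time 2 still counts as "at risk" at time 2, so \(n_2 = 3\), exactly as in the definition of \(n_i\) above.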

The estimator \(\hat{\Lambda }_n (x)\) has an intuitive interpretation as the cumulative risk up to time x and is referred to in the literature as the Nelson-Aalen estimator. It is also the cumulative hazard function of the Kaplan-Meier estimate if one adopts a general result that provides a one-to-one correspondence between a cumulative distribution function (F) and its cumulative hazard function (\(\Lambda \)) for any random variable, be it continuous, discrete, or neither. Specifically, for any F let \(F\{x\}= F(x)- F(x-) \) denote the point mass at x and similarly for \(\Lambda \), then \(F\{x\}=\Lambda \{x\} =0,\) except for \(x \in A_F\), where \(A_F\) is the set of atoms of F, i.e. \(A_F\) is the set of all x at which F(x) is discontinuous. Decomposing \(\Lambda \) into \(\Lambda = \Lambda _c + \Lambda _d\), where \(\Lambda _c\) is a continuous function and \(\Lambda _d\) is a step function with jumps at \(A_F\) and jump sizes \(\Lambda \{x\}\), the following relation holds for any distribution function F:

$$\begin{aligned} 1-F(x)= e^{-\Lambda _c(x)} \prod _{a_j \in A_F, a_j \le x} [1-\Lambda \{a_j\}]. \end{aligned}$$
(1.4)

Since the Nelson-Aalen estimator in (1.3) is a step function with atoms in \(A_H= \{Z_{(1)}, \ldots , Z_{(K)}\}\), following (1.4) its corresponding survival function is:

$$\begin{aligned} 1- \hat{F}_n (x) = \prod _{i=1}^K \left[ 1 - \frac{d_i}{n_i}\right] ^{1_{\{Z_{(i)} \le x\}}}, \end{aligned}$$
(1.5)

which is the Kaplan-Meier estimator.

It follows from (1.5) that \(\hat{F}_n\) is a discrete distribution function with atoms at \(Z_{(i)}\) and jump sizes

$$\begin{aligned} w_{i}= \frac{d_i}{n_i} \prod _{j=1}^{i-1} \left[ 1- \frac{d_j}{n_j}\right] . \end{aligned}$$
(1.6)

With this interpretation it is easy to see that the Kaplan-Meier estimate \(\hat{F}_n\) collapses to the empirical distribution \(F_n\) in the absence of censoring, i.e. when all \(\delta _i=1\), and the SLLN implies \(\int \phi (x) \ dF_n (x) = \frac{1}{n} \sum _{i=1}^n \phi (X_i) \rightarrow E(\phi (X_1) )=\int \phi (x) \ dF(x)\) as long as \(\int |\phi (x)| \ dF(x) < \infty . \) When censoring is present, the Kaplan-Meier integral, defined as

$$\begin{aligned} S_n= \int \phi (x) \ d \hat{F}_n (x) = \sum _{i=1}^K w_{i} \ \phi (Z_{(i)}), \end{aligned}$$
(1.7)

now has random weights \(w_{i}\) (1.6) at \(Z_{(i)}\), and this poses technical challenges for the SLLN in Theorem 1. Moreover, \(S_n\) cannot converge to \(\int \phi (x) \ dF(x)\) if \(\tau _H= \inf \{x: H(x)=1\}< \tau _F= \inf \{x: F(x)=1\}.\)
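For concreteness, here is a small Python sketch (function names are mine) of the jump sizes (1.6) and the K-M integral (1.7); in the absence of censoring the weights reduce to 1/n and \(S_n\) is just the sample mean of \(\phi \):

```python
import numpy as np

def km_weights(z, delta):
    """Jump sizes w_i of the Kaplan-Meier estimator, formula (1.6),
    at the distinct event times Z_(1) < ... < Z_(K)."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=int)
    t = np.unique(z[delta == 1])
    d = np.array([np.sum((z == s) & (delta == 1)) for s in t])
    n_risk = np.array([np.sum(z >= s) for s in t])
    # Kaplan-Meier survival just before Z_(i): prod_{j<i} (1 - d_j/n_j)
    surv_before = np.concatenate(([1.0], np.cumprod(1 - d / n_risk)[:-1]))
    return t, (d / n_risk) * surv_before

def km_integral(z, delta, phi):
    """Kaplan-Meier integral S_n = sum_i w_i phi(Z_(i)), formula (1.7)."""
    t, w = km_weights(z, delta)
    return np.sum(w * phi(t))

# no censoring: S_n collapses to the sample mean of phi(X_i)
z = np.array([0.5, 1.2, 2.0, 3.1])
print(km_integral(z, np.ones(4, int), lambda u: u))      # ≈ 1.7 = z.mean()
```

With a censored observation, say \(Z = (1, 2, 3)\) and \(\delta = (1, 0, 1)\), the weights become \((1/3, 2/3)\) at the event times 1 and 3: the mass of the censored point is redistributed to the right, which is exactly why the weights are random.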

The correct limit for \(S_n\) turns out to be

$$\begin{aligned} S= & {} \int _{x<\tau _H} \phi (x) \ dF(x) + 1_{\{\tau _H \in A_H\}} \ \phi (\tau _H) \ F\{\tau _H\} \nonumber \\= & {} \int _{x \le \tau _H} \phi (x) \ d\tilde{F}(x), \end{aligned}$$
(1.8)

where \(A_H\) is the set of atoms of H and

$$\begin{aligned} \tilde{F}(x)= & {} F(x), \text{ if } \ x<\tau _H, \nonumber \\= & {} F(\tau _H -) + 1_{\{\tau _H \in A_H\}} \ F\{\tau _H\}, \ \ \ \ \text{ if } \ x\ge \tau _H. \end{aligned}$$
(1.9)

We now state the SLLN for censored data in Stute and Wang (1993b), hereafter abbreviated as SW93, a paper with more than 300 citations on Google Scholar as of August 2017.

Theorem 1

(Stute and Wang 1993b)

Assume that \(\int |\phi | \ dF < \infty \), then

$$S_n= \int \phi (x) \ d\hat{F}_n (x) \rightarrow S = \int \phi (x) \ d \tilde{F} (x),$$

with probability one and in the mean.

Remark 1

Obviously, the limit on the r.h.s. is \(\int _{\{x \le \tau _H\}} \phi (x) \ dF(x)\) unless F is discontinuous at \(\tau _H\) and \(G(\tau _H-) =1\), in which case the limit is \(\int _{\{x < \tau _H \}} \phi (x) \ dF(x)\). Also, the limit equals \(\int \phi (x) \ dF(x)\) if \(\tau _H = \tau _F\) and F is either continuous at \(\tau _H\) or F has a jump at \(\tau _H\) but \(G (\tau _H -) <1.\)

Remark 2

The original SLLN in SW93 involved an extra condition that F and G have no jumps in common, but this condition can be removed using a new time scale. This extension was briefly mentioned in a review paper (Stute 1995b), which is highly recommended reading for anyone interested in studying Winfried’s striking results on K-M integrals. With this extension, the only assumption for the SLLN to hold under the random censoring scheme is exactly the same as for its empirical counterpart with no censoring. That is, censoring does not cost any theoretical compromise for the SLLN, but this is not the case for the central limit theorem, which will be explored in Sect. 1.3.

Remark 3

Applications of Theorem 1 are plentiful. For example, the choice \(\phi (x) = 1_{(-\infty , t]}(x)\) leads to the strong consistency of the K-M estimator, and the choice \(\phi (x) = x^k\) leads to the convergence of the K-M moment estimators, among other results. Needless to say, it is also useful for establishing the SLLN for U-statistics (Stute and Wang 1993a) and M-estimators (Wang 1995). The results in Stute (1976) can be further used to derive the strong uniform consistency of the K-M estimator. We refer the readers to the corollaries on page 1595 there and the additional discussions in SW93.
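As a quick numerical illustration of Theorem 1 with \(\phi (x)=x\) (the K-M mean), the following Python simulation (my own construction: exponential lifetimes with mean 1 and independent exponential censoring, so that \(\tau _H = \tau _F = \infty \) and the limit S is the full mean \(\int x \ dF(x) = 1\)) shows \(S_n\) settling down to 1 as n grows:

```python
import numpy as np

rng = np.random.default_rng(42)

def km_integral(z, delta, phi):
    """Kaplan-Meier integral for untied data, via the sorted form:
    weight delta_(i)/(n-i+1) * prod_{j<i} ((n-j)/(n-j+1))^delta_(j),
    which coincides with the product-limit jumps (1.6)."""
    order = np.argsort(z)
    zs = np.asarray(z, dtype=float)[order]
    ds = np.asarray(delta, dtype=float)[order]
    n = len(zs)
    i = np.arange(1, n + 1)
    factor = ((n - i) / (n - i + 1.0)) ** ds
    prod_before = np.concatenate(([1.0], np.cumprod(factor)[:-1]))
    w = ds / (n - i + 1.0) * prod_before
    return np.sum(w * phi(zs))

# lifetimes X ~ Exp(mean 1), censoring Y ~ Exp(mean 2): about 1/3 censored
for n in (100, 1000, 20000):
    x = rng.exponential(1.0, n)
    y = rng.exponential(2.0, n)
    s_n = km_integral(np.minimum(x, y), x <= y, lambda u: u)
    print(n, s_n)        # drifts toward E[X] = 1
```

The naive average of the observed \(Z_i\) would converge to \(E[\min (X, Y)] = 2/3\) instead, so the K-M weighting is doing real work here.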

Last but not least, we mention three additional applications of the SLLN that are provided in Stute (1993b, 1994a) and Stute and Wang (1994). An extension to the multivariate case was studied in Stute (1993b) in the presence of a p-dimensional covariate when these covariates are not subject to censoring. The SLLN for the multivariate joint distribution of the censored response and its covariates is presented there with a very neat application to the linear censored regression model and a proposal for a new and simple estimator for the slope regression parameter. This estimator was shown to perform favorably against its competitors, such as the Buckley-James estimator (Buckley and James 1979). In Stute (1994a) an explicit expression for the bias of a Kaplan-Meier integral was established, while Stute and Wang (1994) provides an explicit formula for the jackknife estimate of a Kaplan-Meier integral.

2.1 Key Ideas of the SLLN

The most studied case in the literature is \(\phi (x)= 1_{(-\infty , t]}\), which amounts to showing \(\hat{F}_n (t) \rightarrow F(t)\) almost surely (a.s.). Because of the nice empirical expression (1.3), most approaches in the literature by 1990 proceeded in two steps, first showing that \(\hat{\Lambda }_n (t) \rightarrow \Lambda (t)\) a.s. and then showing that \(\log (1- \hat{F}_n (t) ) + \hat{\Lambda }_n (t) \rightarrow 0\) a.s. This completes the proof for continuous F, since \(\log (1-F(t)) = - \Lambda (t)\). One drawback of such a two-step approach is that the convergence can only be established for t such that \(F(t)<1\), or along a sequence \(t_n\) such that \(F(t_n) \rightarrow 1\) slowly, since \(\Lambda (t)\) needs to stay away from \(\infty \) at a certain rate. It turns out that this problem on the right tail can be avoided if one bypasses the cumulative hazard function and works instead with the distribution function directly. This is the approach taken in SW93.

Since the idea is to stick to the target (1.7), the goal is to explore what kind of structure it possesses. There are three classical techniques to prove SLLNs when there is no censoring: (i) Kolmogorov’s original proof, (ii) the ergodic theorem for strictly stationary and ergodic sequences, and (iii) the reverse-time martingale approach (Neveu 1975) for a proper sequence of \(\sigma \)-fields so that the martingale convergence theorem can be applied. When we first looked into this problem in 1990, we knew immediately that the first two approaches could not be extended to censored data easily, and the reverse-time martingale structure does not hold for \(S_n\) since \(E(S_n)\) varies with n. But Winfried had a hunch that it might still work if we could show that \(S_n\), endowed with a proper sequence of decreasing \(\sigma \)-fields, is a reverse-time supermartingale for positive \(\phi \) functions (we knew that \(S_n\) could not be a reverse-time submartingale because the K-M estimator is biased downward). Although it was not hard to construct the proper \(\sigma \)-fields (cf. Step 1 of Sect. 1.2.2) needed for this martingale structure, it was not easy to pin down the supermartingale structure.

This went on for some time and we still had no clue about the martingale structure of \(S_n\). After another uneventful day (wish I remembered the date!) I left the institute frustrated. After a soothing German dinner that night, my spirits were lifted and I decided to give it a final shot to see if we should continue to invest our time in this problem. My plan was simple: just check the simple cases of \(n=1, 2\) and 3, and the truth would be revealed. I settled down to do the calculation: the case of \(n=1\) was trivial, and YES, \(S_n\) is a reverse-time supermartingale when \(n=2\). When it was revealed that \(S_n\) is also a reverse-time supermartingale for \(n=3\), I thought that I had hit the jackpot—it had to be (ok, just might be) true for general n. I learned this naive strategy for doing research on difficult problems from the late Jack Kiefer, who taught me that if something is true for \(n=1\) to 3, it is probably true for all n!

A side note about my two mentors in research, Professors Jack Kiefer and Lucien Le Cam, two geniuses who approached open problems from opposite ends. Both were extremely kind and generous. Kiefer was my thesis advisor until his sudden death at the age of 57, less than a year before my graduation, and Le Cam graciously took me under his wing after Kiefer’s death and remained a mentor and friend until his death in 2000. I am forever indebted to their inspiration and guidance. As mentioned, Kiefer taught me how to approach a problem from the simplest scenario, e.g., try the one-dimensional case first, then general Euclidean space, before launching into the infinite-dimensional abstract space. Le Cam favored the opposite, top-down, approach, as he could see things high up in the abstract space that few others could, so he typically approached a problem in its most general and abstract setting. I was extremely fortunate to witness their differences and took advantage of both approaches. Often, I would start to work on a problem at the ground level and work my way up as Kiefer had taught me to do, but once I reached the higher ground, I would look for powerful tools that could simplify the proofs or expand the results with less stringent assumptions. These two opposite approaches are effective in their own individual ways, but together they form a powerful team.

Going back to the story of the paper SW93 on the SLLN, the next morning I arrived early at the institute, eager to share the discovery from the night before. I ran straight to Winfried’s office to announce the big news—the supermartingale structure must be true for \(S_n\) because it holds for \(n=1, 2\) and 3. He did not laugh at the obviously flawed and naive statement and instead immediately realized that we had work to do. We went downstairs to a classroom which had huge blackboards and began to explore our options one by one. After we understood what was going on, and believed by then that the statement must be true, we still could not come up with a one-shot proof that \(S_n\) is a reverse-time supermartingale. So we took the last resort—to prove it by mathematical induction! I don’t know about Winfried, but I never would have thought that one day I would use induction to prove the theorem of my life. It is perhaps not the most elegant way to prove the key result, Lemma 2.2 in SW93, and I still wonder whether there is a more direct way to prove this lemma. But even with the induction, the proof of Lemma 2.2 is non-trivial and involves elegant use of order statistics, concomitants and ranks.

We made significant progress over the next few days, but there were still multiple hurdles ahead. It must have taken more than a week before we had a proof for the special case when H is a continuous distribution function. I was exhilarated and ready to call it quits once we had the proof for continuous H, as that was already far better than any existing result. But Winfried was not content; he was keen to get rid of the continuity assumption. So upon his insistence we eliminated this assumption and showed that the SLLN holds as long as F and G do not have common jumps. This was the version published in our 1993 paper, but Winfried later learned of a trick to get rid of this assumption and included that extension in Stute (1995b). In the end he fulfilled his dream to show that the SLLN for the K-M integral holds under no assumptions other than the trivial one that \(\int |\phi | \ dF < \infty \). Overall, this project involved the most unusual route towards a proof that I have ever encountered in my career. To this day, Theorem 1.1 in SW93 remains my favorite theorem of all time.

2.2 Outline of the Proof

Step 1. The first step is to identify the \(\sigma \)-fields for the reverse-time martingale. Towards this goal, it is easier to consider an alternative form of the K-M estimator, which breaks tied observations so that any lifetime precedes a tied censoring time, while the ordering within tied lifetimes or tied censoring times can be arbitrary. With this rule, the K-M estimator in (1.5) is equivalent to the one in formula (1.2) of SW93 and the associated K-M integral (1.7) can be expressed as

$$\begin{aligned} S_n= & {} \sum _{i=1}^n W_{in} \ \phi (Z_{i:n}),\, \text{ where } Z_{i:n} \ \text{ is } \text{ the } i \text{ th } \text{ order } \text{ statistic } \text{ among } \ \{Z_1, \ldots , Z_n\}, \\ \nonumber W_{in}= & {} \frac{\delta _{[i:n]}}{n-i+1} \prod _{j=1}^{i-1} \left[ \frac{n-j}{n-j+1} \right] ^{\delta _{[j:n]}} \text{ with } \delta _{[i:n]} =\delta _j, \ \text{ if } Z_j=Z_{i:n}. \end{aligned}$$
(1.10)

The \(\delta _{[i:n]}\) above are often called the concomitants of the \(Z_{i:n}\).
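The claimed equivalence between the sorted-data weights (1.10) and the product-limit jumps (1.6) is easy to check numerically for untied data. The sketch below (function names are mine) computes both on simulated censored data and confirms that the \(W_{in}\) at uncensored positions match the \(w_i\), while censored positions receive weight zero:

```python
import numpy as np

rng = np.random.default_rng(1)

def weights_ordered(z, delta):
    """W_in from (1.10), attached to the order statistics Z_{i:n}."""
    order = np.argsort(z)
    zs, ds = z[order], delta[order].astype(float)
    n = len(z)
    i = np.arange(1, n + 1)
    factor = ((n - i) / (n - i + 1.0)) ** ds        # concomitant exponents
    prod_before = np.concatenate(([1.0], np.cumprod(factor)[:-1]))
    return zs, ds / (n - i + 1.0) * prod_before

def weights_product_limit(z, delta):
    """Jump sizes (1.6) at the event times; d_i = 1 since the data are untied."""
    t = np.sort(z[delta == 1])
    n_risk = np.array([np.sum(z >= s) for s in t])
    surv_before = np.concatenate(([1.0], np.cumprod(1 - 1.0 / n_risk)[:-1]))
    return t, surv_before / n_risk

x = rng.exponential(1.0, 200)
y = rng.exponential(1.5, 200)
z, delta = np.minimum(x, y), (x <= y)
zs, w_all = weights_ordered(z, delta)
t, w_event = weights_product_limit(z, delta)
print(np.allclose(w_all[delta[np.argsort(z)]], w_event))   # True
```

The agreement is exact up to rounding because, for untied data, the at-risk count at the i-th order statistic is \(n-i+1\), so \((n-j)/(n-j+1) = 1 - 1/n_j\) term by term.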

With this notation, define \(\mathcal {F}_n\) to be the \(\sigma \)-field generated by \(\{ Z_{i:n}, \delta _{[i:n]}, 1 \le i \le n, Z_{n+1}, \delta _{n+1}, \ldots \}. \) Then \(S_n\) is adapted to \(\mathcal {F}_n\) with \(\mathcal {F}_n \downarrow \mathcal {F}_{\infty }= \cap _{n \ge 1 } \mathcal {F}_{n}\), and \(\mathcal {F}_{\infty }\) is trivial by the Hewitt-Savage zero-one law.

Step 2. Next, we show that, for every \(\phi \ge 0\) and continuous H, \(E(S_n | \ \mathcal {F}_{n+1}) \le S_{n+1}\). Hence \(\{S_n, \mathcal {F}_n\}_{n \ge 1}\) is a reverse-time supermartingale. The proof uses mathematical induction and is included in Lemma 2.2 of SW93. This is the key step towards the final SLLN.

Step 3. Proposition 5-3-11 in Neveu (1975) then implies that, for every \(\phi \ge 0\), \(S_n\) converges a.s. and in the mean to some random variable \(S_{\infty }\), which must be a constant S by the Hewitt-Savage zero-one law. Hence \(S_n \rightarrow S\) a.s. and \(E |S_n - S | \rightarrow 0\).

This result can be extended to general \(\phi = \phi ^+ - \phi ^-\) by decomposing \(\phi \) into its positive (\(\phi ^+\)) and negative (\(\phi ^-\)) parts.

Step 4. It now remains to identify the constant S and this was achieved in Lemma 2.7 of SW93 for continuous H, which implies that

$$\begin{aligned} S= \int \phi (x) \ m(x) \ \gamma _0(x) \ dH(x), \end{aligned}$$
(1.11)

where \(m(x)=P (\delta =1 \ | \ Z=x)\) and \(\gamma _0 (x) = \exp \left\{ \int _0^{x-} \frac{1-m(y)}{1-H(y)} \ dH(y) \right\} .\)

Under independence of the lifetime X and the censoring variable Y, the limit S then takes the form in (1.8).

Step 5. To show the result for a general H, we first look at the case where F and G have no common jumps, hence there are no tied observations between the censored and uncensored observations. Under this assumption, apply a quantile transformation, \(H^{-1}(U_i)\), to a specially constructed sequence \(U_i\) of uniform [0, 1] random variables as in Lemma 2.8 in SW93, so that \(Z_i=H^{-1}(U_i)\). Then

$$\begin{aligned} S_n= \sum _{i=1}^n W_{in} \ \phi (H^{-1}(U_{i:n})). \end{aligned}$$
(1.12)

The SLLN now follows by applying the continuous-H result with \(\phi \) replaced by \(\phi \circ H^{-1}\). This is what was obtained in SW93, where the only assumption needed for the SLLN of K-M integrals is that F and G have no common jump points. It turns out that this restriction can be removed because the K-M estimator treats an uncensored observation as if it slightly precedes a censored one when there is a tie between them. A trick in Gill (1980) to shift the time scale of G slightly to the right of those common jump points then implies that F and the transformed G on this new time scale no longer have common jumps, and hence the SLLN holds. This trick was discussed in detail on page 437 of Stute (1995a), where he dealt with the CLT for the K-M integral. In conclusion, the SLLN for a K-M integral holds under the minimal assumption \(\int | \phi (x) | \ dF(x) < \infty \), with no restriction on F and G, just like the classical SLLN when there is no censoring.

3 Central Limit Theorem: Random Right Censoring

Once the limit S in (1.8) of \(S_n= \int \phi (x) \ d\hat{F}_n(x)\) has been identified, it facilitates exploring the limiting distribution of \(S_n - S \), i.e. the CLT for a K-M integral. The special case \(\phi (x) = 1_{(-\infty , t]}(x)\) for \(t < \tau _H\) was studied, for instance, by Breslow and Crowley (1974), Lo and Singh (1986) and Major and Rejto (1988). The unrestricted case for all t was established in Gill (1983) and Ying (1989) by using the martingale central limit theorem, a powerful tool for the asymptotic theory of censored data that was popular in the 1980s. However, some technical assumptions were still needed to control the censoring effect in the right tail of the lifetime distributions. Under these assumptions, the case of a \(\phi \)-function that is of bounded variation on an interval [0, T] with \(T<\tau _H\) can be handled without much difficulty by invoking integration by parts. But this is a restrictive class of functions; specifically, it excludes the estimation of the K-M mean, which corresponds to \(\phi (x)=x\). Susarla and Van Ryzin (1980) were able to extend the K-M mean estimate to an interval \([0, M_n]\) with \(M_n \rightarrow \infty \) at a suitable rate, but the result on \([0, \infty )\) remained unresolved.

Subsequently, Schick et al. (1988) established the CLT for \(\phi \)-functions that are nonnegative, nonincreasing and continuous. Under some regularity conditions on F, Yang (1994) extended the results to general functions \(\phi \) that satisfy

$$\begin{aligned} \int \frac{\phi ^2}{1-G} \ dF < \infty . \end{aligned}$$
(1.13)

Other than the restrictions on F, the result of Yang (1994) is optimal, as assumption (1.13) is needed to ensure that the limiting variance is finite. The other restrictions in Yang (1994) were removed by Winfried in his 1995 paper (Stute 1995a), where he established the CLT for K-M integrals for any F and G under minimal conditions. This paper had 173 citations on Google Scholar as of August 2017.

How did Winfried do it? There are several ways to derive the CLT, and those familiar with Winfried’s technical style probably know his affinity for deriving everything from scratch using basic tools. Thus, instead of employing the martingale CLT as was done in Gill (1983), he took the classical approach of expanding \(\sqrt{n} \ (S_n -S) \) as a sum of i.i.d. random variables plus a small and negligible remainder term. While this i.i.d. representation approach had been explored by many before him, the key to success rests upon the conditions that are invoked to handle the remainder term. Through a clever expression of \(S_n\) as a U-statistic of degree three plus a negligible remainder term, he realized that the Hajek projection for U-statistics would provide the right platform for the desired i.i.d. decomposition. What remained was hard analysis and the tenacity to get things right.

In professional life, Winfried is a minimalist. Any extra condition imposed for the sake of convenience would be an eyesore to him. In my experience working with him, I witnessed repeatedly his persistence in getting rid of anything that is not elegant. This working style has served him well and contributed to his ability to produce the most elegant results time and again.

To state the CLT, we first define several quantities:

\(\tilde{H}^0(z)= P(Z\le z, \delta =0) = \int _{-\infty }^z (1-F(y) ) \ dG(y),\)

\(\tilde{H}^1(z)= P(Z\le z, \delta =1) = \int _{-\infty }^z (1-G(y-)) \ dF(y), \)

\(\gamma _0(x) = \exp \{ \int _{- \infty }^{x-} \ \frac{d\tilde{H}^0(y) }{1-H(y) } \},\)

\(\gamma _1(x) = \frac{1}{1-H(x)} \ \int 1_{\{x<w\}}\ \phi (w)\ \gamma _0(w) \ d\tilde{H}^1(w),\)

and

\(\gamma _2 (x) = \int \int \frac{1_{v<x, v<w} \ \phi (w) \ \gamma _0(w) }{[1-H(v)]^2} \ d\tilde{H}^0 (v) \ d \tilde{H}^1(w). \)

The following two assumptions are needed for the CLT in Theorem 2:

$$\begin{aligned} \int \phi ^2(x) \ \gamma _0^2(x) \ d \tilde{H}^1(x) < \infty \end{aligned}$$
(1.14)

and

$$\begin{aligned} \int |\phi (x)| \ C^{1/2}(x) \ d\tilde{F}(x) < \infty , \end{aligned}$$
(1.15)

where \(C(x)= \int _{-\infty }^{x-} \ \frac{dG(y)}{[1-H(y)] \ [1-G(y)]}\) and \(\tilde{F}\) is defined in (1.9).

Theorem 2

(Corollary 1.2 of Stute, 1995a) Under assumptions (1.14) and (1.15), \(\sqrt{n} (S_n -S) = \sqrt{n} \int \phi (x) \ d(\hat{F}_n - F) (x) \rightarrow N (0, \sigma ^2) \) in distribution, where \(\sigma ^2\)= Var \([\phi (Z) \ \gamma _0(Z) \ \delta + \gamma _1(Z) \ (1-\delta ) - \gamma _2(Z) ] \).

Remark 4

For continuous F the asymptotic variance becomes

$$\begin{aligned} \sigma ^2= & {} \int _{- \infty }^{\tau _H} \frac{\phi ^2 (x)}{1-G(x)} \ dF(x) - \left[ \int _{- \infty } ^{\tau _H} \ \phi (x)\ dF(x) \right] ^2 \nonumber \\&- \int \left[ \int _x^{\tau _H} \ \phi (y) \ dF(y) \right] ^2 \frac{1-F(x)}{[1-H(x)]^2} \ dG(x), \end{aligned}$$
(1.16)

which further simplifies to \(\sigma ^2= \int \phi ^2 \ dF - [\int \phi \ dF]^2\) when there is no censoring, as G then vanishes identically.
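Formula (1.16) can be evaluated in closed form for simple parametric models, which gives a way to sanity-check Theorem 2 by simulation. In the sketch below (my own construction, not from the paper) \(X \sim \) Exp(1), \(Y \sim \) Exp(rate 1/2) and \(\phi = 1_{(-\infty , \ln 2]}\), so \(S = F(\ln 2) = 1/2\); my evaluation of (1.16) for this model gives \(\sigma ^2 = (2^{3/2}-1)/6 \approx 0.3047\) (it agrees with the Greenwood-type variance of \(\hat{F}_n(\ln 2)\)), and the Monte Carlo variance of \(\sqrt{n} \, S_n\) should be close to it:

```python
import numpy as np

rng = np.random.default_rng(7)

def km_integral(z, delta, phi):
    """K-M integrals for untied data, vectorized over replicates (rows)."""
    order = np.argsort(z, axis=1)
    zs = np.take_along_axis(z, order, axis=1)
    ds = np.take_along_axis(delta.astype(float), order, axis=1)
    n = z.shape[1]
    i = np.arange(1, n + 1)
    factor = ((n - i) / (n - i + 1.0)) ** ds
    ones = np.ones((z.shape[0], 1))
    prod_before = np.concatenate([ones, np.cumprod(factor, axis=1)[:, :-1]], axis=1)
    w = ds / (n - i + 1.0) * prod_before            # K-M jump sizes per replicate
    return np.sum(w * phi(zs), axis=1)

R, n, t = 3000, 500, np.log(2.0)
x = rng.exponential(1.0, (R, n))                    # lifetimes, F = Exp(1)
y = rng.exponential(2.0, (R, n))                    # censoring, G = Exp(rate 1/2)
s_n = km_integral(np.minimum(x, y), x <= y, lambda u: (u <= t).astype(float))

sigma2 = (2**1.5 - 1) / 6                           # my evaluation of (1.16)
print(np.mean(s_n), n * np.var(s_n))                # ≈ 0.5 and ≈ 0.3047
```

Without censoring, the variance of \(\hat{F}_n(\ln 2)\) would be \(F(1-F) = 1/4\); the inflation to roughly 0.305 is the price of censoring that condition (1.15) and the third term of (1.16) account for.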

Remark 5

Condition (1.14) is equivalent to condition (1.13) when F is a continuous function. Both are properly modified “second moment” conditions in the CLT for censored data. Condition (1.15), on the other hand, is used to control the bias of the K-M integral so the \(\sqrt{n}\) rate can be achieved. This is the price paid by the K-M estimator and is needed for the CLT of general K-M integrals. Examples provided in Stute (1995a) imply that Theorem 2 may not hold if condition (1.15) is not satisfied.
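To make the statistic in Theorem 2 concrete, the K-M integral \(S_n = \sum _i W_{in} \, \phi (Z_{i:n})\) can be computed directly from the product-limit weights for untied data. The following is a minimal numerical sketch; the exponential distributions and the choice \(\phi (x)=x\) (so that \(S_n\) estimates a mean) are arbitrary illustrative assumptions, not from the paper:

```python
import numpy as np

def km_weights(z, delta):
    """Product-limit (Kaplan-Meier) jumps for untied data, sorted by z:
    W_i = delta_i/(n-i+1) * prod_{j<i} ((n-j)/(n-j+1))^delta_j  (1-based ranks)."""
    order = np.argsort(z)
    z, delta = z[order], delta[order]
    n = len(z)
    w = np.zeros(n)
    prod = 1.0
    for i in range(n):                      # 0-based index; rank = i + 1
        w[i] = delta[i] / (n - i) * prod    # n - (i+1) + 1 = n - i
        prod *= ((n - i - 1) / (n - i)) ** delta[i]
    return z, delta, w

rng = np.random.default_rng(42)
n = 500
x = rng.exponential(1.0, n)                 # lifetimes, X ~ F
c = rng.exponential(2.0, n)                 # censoring times, C ~ G
z = np.minimum(x, c)
delta = (x <= c).astype(int)                # 1 = uncensored

zs, ds, w = km_weights(z, delta)
s_n = np.sum(w * zs)                        # K-M integral with phi(x) = x

# Sanity check: with no censoring every weight is 1/n and S_n is the sample mean.
_, _, w0 = km_weights(x, np.ones(n, dtype=int))
```

With no censoring the weights collapse to the empirical measure, which is exactly the classical CLT setting that Theorem 2 generalizes.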

Remark 6

As with the SLLN, applications of the CLT are plentiful. Remark 3 above listed a few such applications. In particular, the CLT was extended to the case when covariates are present in Stute (1996a). Another application is provided in Stute (1996b), where Winfried established an explicit expression for the variance of the jackknife estimator of the K-M integral and investigated the convergence of this variance estimator. Surprisingly, the variance of the jackknife estimator converges to the variance of the K-M integral only when \(\phi (x) \rightarrow 0\) as \(x \rightarrow \tau _H\). As this is quite restrictive, and in view of Winfried’s low tolerance for wrinkles, he proposed a modified variance estimate, \(\widehat{\text{ var }}^*_{JK}\), that satisfies \(n \ \widehat{\text{ var }}^*_{JK} \rightarrow \sigma ^2.\)

3.1 Outline of the Proof

The proof of the CLT is based on an i.i.d. representation of the K-M integral (cf. Theorem 1.1 of Stute (1995a)), which leads to

$$\begin{aligned} \int \phi \ d (\hat{F}_n - \tilde{F}) = \frac{1}{n} \sum _{i=1}^n U_i + R_n, \end{aligned}$$
(1.17)

where the \(U_i\) are i.i.d. with mean zero and variance \(\sigma ^2\) and \(R_n= o_P (n^{-1/2}).\)

The derivation of (1.17) proceeds in the following steps.

Step 1. First assume that H is continuous. Then (1.10) and Lemma 2.1 in Stute (1995a) imply that \(\int \phi \ d \hat{F}_n\) can be expressed as

$$\begin{aligned} \sum _{i=1}^n W_{in} \ \phi (Z_{i:n}) \!=\! \int \phi (x) \exp \left\{ n \int _{-\infty }^{x-} \ln \left[ 1+ \frac{1}{n(1-H_n(y))} \right] d \tilde{H}_n^0 (y) \right\} \ d \tilde{H}_n^1(x). \end{aligned}$$
(1.18)

Step 2. Replacing the logarithm \(\ln (1+x)\) by x and neglecting the error terms, the exponential factor in (1.18) becomes \(\exp \left\{ \int _{-\infty }^{x-} \frac{d \tilde{H}_n^0 (y)}{1-H_n (y)} \right\} . \) Integrating this term w.r.t. \(\tilde{H}_n^1\) and further expanding the exponential leads to a U-statistic of order 3.
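Before any linearization, the representation (1.18) is an exact identity (Lemma 2.1). For untied data this can be checked numerically: the weight that (1.18) assigns to an uncensored \(Z_{i:n}\) coincides with the product-limit jump \(W_{in}\). A hedged sketch with simulated data, using the standard empirical value \(1-H_n(Z_{j:n})=(n-j)/n\) at the rank-j observation:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 40
x = rng.exponential(1.0, n)                  # lifetimes
c = rng.exponential(1.5, n)                  # censoring times
z = np.minimum(x, c)
delta = (x <= c).astype(int)
order = np.argsort(z)
z, delta = z[order], delta[order]            # continuous data: no ties a.s.

# Product-limit jumps W_{i:n} (untied data)
w_km = np.zeros(n)
prod = 1.0
for i in range(n):
    w_km[i] = delta[i] / (n - i) * prod
    prod *= ((n - i - 1) / (n - i)) ** delta[i]

# Weights implied by the exponential representation (1.18): mass
# (1/n) * exp{ sum_{j<i, delta_j=0} ln[1 + 1/(n(1 - H_n(Z_{j:n})))] }
# at each uncensored Z_{i:n}, with 1 - H_n(Z_{j:n}) = (n - j)/n at rank j.
w_exp = np.zeros(n)
s = 0.0
for i in range(n):
    w_exp[i] = delta[i] / n * np.exp(s)
    if delta[i] == 0 and i < n - 1:          # guard: 1 - H_n = 0 at the maximum
        s += np.log1p(1.0 / (n - i - 1))
```

The two weight vectors agree exactly, which is precisely why the proof may start from (1.18) and only then linearize the logarithm.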

Step 3. The Hájek projection of this U-statistic leads to the desired i.i.d. expansion in (1.17). Details are provided in Lemmas 2.2–2.7 of Stute (1995a) under the additional assumption that

$$\begin{aligned} \phi (x)=0 \ \text{ for } \text{ all } \ x>T \ \text{ and } \text{ some } T< \tau _H, \end{aligned}$$
(1.19)

so that all terms appearing in the denominators of the proof are bounded away from zero and cause no problems.
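The projection device of Step 3 can be illustrated on a toy U-statistic unrelated to censoring (all choices below are illustrative assumptions): for the Gini mean-difference kernel \(h(x,y)=|x-y|\) with U(0,1) data, \(\theta = E|X-Y| = 1/3\) and the projection kernel \(g_1(x)=E|x-Y|=x^2-x+1/2\) are available in closed form, and the Hájek projection reproduces the U-statistic up to \(o_P(n^{-1/2})\):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
z = rng.uniform(0.0, 1.0, n)

# Order-2 U-statistic with kernel h(x, y) = |x - y| (Gini mean difference)
diff = np.abs(z[:, None] - z[None, :])
u_n = diff[np.triu_indices(n, k=1)].mean()   # average over all pairs i < j

theta = 1.0 / 3.0                            # E h(X, Y) for U(0,1) data
g1 = z**2 - z + 0.5                          # g1(x) = E h(x, Y)
u_proj = theta + 2.0 * np.mean(g1 - theta)   # Hajek projection of u_n

# The remainder u_n - u_proj is O_P(1/n), negligible at the sqrt(n) scale:
gap = np.sqrt(n) * abs(u_n - u_proj)
```

The same mechanism, applied to the order-3 U-statistic of Step 2, yields the i.i.d. representation (1.17).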

Step 4. Under the two assumptions (1.14) and (1.15), and for continuous F, the denominators in the proof can be controlled without assumption (1.19). The proof is thus extended to the case without assumption (1.19) but with the assumption that H is continuous.

Step 5. Finally, the assumption of continuous H can be removed just as in the case for the SLLN, as discussed in Step 5 of Sect. 1.2.2.

4 Random Truncation

While I was visiting Giessen in 1990, Winfried and I worked on another type of incomplete data, randomly truncated data (Gürler et al. 1993), which often occur in astronomy (Woodroofe 1985) or in studies with delayed entry of patients. Let \((X_i, Y_i), \, i=1, \ldots , N,\) be a sequence of i.i.d. random vectors with \(X_i \sim F\) independent of \(Y_i \sim G\). Random truncation occurs when the pair \((X_i, Y_i) \) can be observed only when \(X_i \ge Y_i\). That is, neither \(X_i\) nor \(Y_i\) is observed when \(X_i < Y_i\), but both are observed when \(X_i \ge Y_i\). This sampling structure is quite different from that for censored data, where one, and only one (the minimum), of the lifetime and the censoring variable is observed. Consequently, the observed sample size, \(n= \sum _{i=1}^N 1_{\{X_i\ge Y_i \}},\) is a random quantity, while the latent sample size N is unknown. We denote the observed data by \((X^*, Y^*)\) to distinguish them from the original \((X, Y)\). Luckily, the \((X_i^*, Y_i^*)\) are still i.i.d. with joint distribution

$$\begin{aligned} H^* (x, y) = P (X\le x, Y\le y | Y\le X) = \frac{1}{\alpha } \int _{-\infty }^x G(y \wedge z) \ dF(z); \end{aligned}$$
(1.20)

and marginal distributions

$$\begin{aligned} F^*(x)= & {} H^*(x, \infty )= \frac{1}{\alpha } \int _{-\infty }^x \ G(z) \ dF(z), \end{aligned}$$
(1.21)
$$\begin{aligned} G^*(y)= & {} H^*(\infty , y) = \frac{1}{\alpha } \int _{-\infty }^{\infty } \ G(y \wedge z) \ dF(z), \end{aligned}$$
(1.22)

where \(\alpha = P(Y\le X)\) and \(y \wedge z\) denotes the minimum of y and z.
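For a quick sanity check of (1.20)–(1.22), one can take \(F = G = U(0,1)\), for which \(\alpha = 1/2\), \(F^*(x) = x^2\), and \(G^*(y) = 2y - y^2\) in closed form. A minimal simulation sketch (the uniform model is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000                       # latent sample size
x = rng.uniform(0.0, 1.0, N)      # X ~ F = U(0,1)
y = rng.uniform(0.0, 1.0, N)      # Y ~ G = U(0,1), independent of X
keep = x >= y                     # a pair is observed only when X >= Y
xs, ys = x[keep], y[keep]
n = keep.sum()                    # random observed sample size

alpha_hat = n / N                 # estimates alpha = P(Y <= X) = 1/2 here

# Empirical marginals of the observed data versus (1.21) and (1.22):
f_star_hat = (xs <= 0.5).mean()   # F*(0.5) = 0.5^2      = 0.25
g_star_hat = (ys <= 0.5).mean()   # G*(0.5) = 2(0.5)-0.25 = 0.75
```

The observed \(X^*\) are stochastically larger than X (CDF \(x^2\) versus x), which is exactly the length bias that the Lynden-Bell estimator below must undo.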

Note here that \(F^*, G^*\), and \(H^*\) can all easily be estimated empirically, but the goal is to estimate F and G. We say that left truncation occurs when the primary interest is F, in which case G is called the truncation distribution. Likewise, the data are right truncated when G is the primary interest and F is the right truncation distribution. For illustration purposes we focus on the left truncation case, for which F is the primary target, with cumulative hazard function \(\Lambda _F\) defined as in (1.1).

Let \(a_F= \inf \{x: F(x) >0\}\) be the left support point of F, and \(b_F= \sup \{x: F(x) <1\} \) be its right support point. It is not surprising that F can be estimated only when

$$\begin{aligned}&(i) \ a_G < a_F, \ \text{ or } \ \\ \nonumber&(ii) \ a_G = a_F \ \text{ and } \ F\{a_F\}=1. \end{aligned}$$
(1.23)

Case (i) is much easier to deal with than case (ii). Throughout this section we make assumption (1.23) and write

$$\begin{aligned} C(z)= & {} P (Y^* \le z \le X^*) = G^*(z) -F^*(z-) \\ \nonumber= & {} \frac{1}{\alpha } \ G(z) \ [1-F(z-)], \ \text{ for } \ a_F \le z < \infty . \end{aligned}$$
(1.24)

It can be easily shown that \(\Lambda _F(x)= \int _{-\infty }^x \frac{dF^*(z)}{C(z)}\): by (1.21) and (1.24), \(dF^*(z) = \alpha ^{-1} G(z) \, dF(z)\) and \(C(z)= \alpha ^{-1} G(z) [1-F(z-)]\), so that \(dF^*(z)/C(z) = dF(z)/[1-F(z-)] = d\Lambda _F(z)\). Hence the empirical estimate of \(\Lambda _F (x)\) is

$$\begin{aligned} \hat{\Lambda }_{n}^T(x)= \int _{-\infty }^x \frac{d F_n^*(z)}{C_n(z)} = \sum _{\text{ distinct } \ X_k^* \le x} \frac{F^*_n \{X_k^*\}}{C_n(X_k^*)}, \end{aligned}$$
(1.25)

where \(F_n^*\) is the empirical estimate based on \(\{X_1^*, \ldots , X_n^* \}\) and

$$\begin{aligned} C_n (z)= \frac{1}{n} \sum _{i=1}^n 1_{\{Y_i^* \le z \le X_i^* \}} \end{aligned}$$
(1.26)

is the empirical estimate of C. The superscript T in (1.25) reminds us that this is for truncated data.

Based on (1.4) the distribution function \(\hat{F}_n^T\) that corresponds to \(\hat{\Lambda }_{n}^T\) is

$$\begin{aligned} 1- \hat{F}_n^T(x) = \prod _{k: \ X_k^* \le x} \left[ 1- \frac{F_n^* \{X_k^*\}}{C_n(X_k^*)}\right] , \end{aligned}$$
(1.27)

which, when there are no tied observations among \(X_i^*\), becomes

$$\begin{aligned} 1- \hat{F}_n^T(x) = \prod _{i: \ X_i^* \le x} \left[ 1 - \frac{1}{n \ C_n(X_i^*)}\right] . \end{aligned}$$
(1.28)

This is the original Lynden-Bell estimate (Lynden-Bell 1971), which was shown to be the nonparametric maximum likelihood estimator of F by Woodroofe (1985) and Wang et al. (1986). One undesirable feature of the estimator \(\hat{F}_n^T (x)\) is that it may jump to 1 before x reaches the largest order statistic. To see this, consider the simpler case with no tied observations, so that (1.28) holds. Under this scenario \(\hat{F}_n^T (x)=1\) as soon as \(nC_n(X_j^*)=1\) for some \(X_j^* \le x\), and all observations larger than \(X_j^*\) have no influence on the estimation of F. This is a soft spot of the Lynden-Bell estimator which triggers technical difficulties, as we will elaborate later. Luckily, the probability that this happens is small, so at the end of the day the Lynden-Bell estimator still enjoys nice properties. However, in order to establish the CLT for Lynden-Bell integrals, a revised estimator, asymptotically equivalent to the Lynden-Bell estimate, was constructed in Stute and Wang (2008) to facilitate the proof.

To understand when the above undesirable feature might occur, observe that \(n \ C_n(X_j^*)=1\) if \(X_j^*\) is not covered by any other interval \([Y_i^*, X_i^*]\), \(i \ne j\) (note that \(n \ C_n(X_j^*) \ge 1\) because \([Y_j^*, X_j^*]\) always covers \(X_j^*\)). This phenomenon occurs when there are gaps in the union of the intervals \([Y_i^*, X_i^*], 1 \le i \le n\); these gaps, which are intervals not covered by any \([Y_i^*, X_i^*]\), are referred to as the “holes” for truncated data (Strzalkowska-Kominiak and Stute 2010) or “empty inner risk sets” (Keiding and Gill 1990). On those holes \(C_n (x)\) may be zero, so the probability of holes needs to be small and to tend to zero sufficiently fast as the sample size tends to infinity. Sharp probability bounds were developed in Strzalkowska-Kominiak and Stute (2010), and they have ramifications for the estimation of \(\alpha \), a topic of practical interest further studied in He and Yang (1998a). Below we focus on two of the fundamental results that Winfried established (Stute 1993a; Stute and Wang 2008).
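The “hole” phenomenon is easy to reproduce numerically. The sketch below computes \(nC_n\), the empirical hazard (1.25), and the Lynden-Bell estimator (1.28) on a tiny invented data set (not from any paper) in which \(X^*=2\) is covered by no interval other than its own, so the estimator exhausts its mass there:

```python
import numpy as np

# Untied left-truncated data, invented so that X* = 2 creates a "hole":
# it is covered only by its own interval [0, 2], hence n*C_n(2) = 1.
y_star = np.array([0.0, 0.0, 3.0, 3.5])
x_star = np.array([1.0, 2.0, 4.0, 5.0])

order = np.argsort(x_star)
xs = x_star[order]

# n*C_n(x) = #{i : Y_i* <= x <= X_i*}, cf. (1.26)
nC = np.array([np.sum((y_star <= v) & (v <= x_star)) for v in xs])

Lam_hat = np.cumsum(1.0 / nC)                 # empirical hazard (1.25)
F_hat = 1.0 - np.cumprod(1.0 - 1.0 / nC)      # Lynden-Bell estimate (1.28)

# F_hat jumps to one already at x = 2; the observations 4 and 5
# receive zero weight even though they carry information about F.
```

This is exactly the pathology that motivates the modified estimator of Stute and Wang (2008) discussed below.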

4.1 I.I.D. Representation for Truncated Data

So far, the left truncation setting resembles the random censoring case: replace \(H_1\) and H in (1.2), respectively, by \(F^*\) and C, and replace the empirical estimates \(H_{1n}\) and \(H_n\) in (1.3), respectively, by \(F_n^*\) and \(C_n\). However, the two settings differ distinctly in the handling of the theory, because \(C_n\), unlike \(H_n\), is not a monotone function, and because of the problems created by the “holes” in truncated data when \(n C_n(X_j^*)=1\) for some j.

Winfried’s first solo act for truncated data was Stute (1993a), which appeared around the same time as SW93. However, instead of tackling the Lynden-Bell integrals, \(\int \phi (x) d \hat{F}_n^T (x)\), he focused on the Lynden-Bell estimator itself and on providing an i.i.d. representation for it. Along this path, he realized that he needed stronger results on processes of U-statistics, which he developed alongside Stute (1993a) and which subsequently appeared in Stute (1994e). Below we summarize the main result of Stute (1993a), which improved the results in Chao and Lo (1988) and had 109 citations on Google Scholar as of August 2017.

Theorem 3

(Theorems 1 and 2 of Stute 1993a) Assume \(a_G \le a_F\) and \(\int _{a_F}^{\infty } G^{-2} (x) \ dF(x) < \infty .\) Then, uniformly in \(a_F \le x \le b < b_F \), we have

$$\begin{aligned} (i)&\ \hat{\Lambda }_n^T (x) -\Lambda _F (x)= L_n(x) + R_n(x), \ \text{ and } \\ (ii)&\ \hat{F}^T_n (x) - F(x) = (1-F(x)) \ L_n (x) + R_n^0 (x), \\ \text{ where } \ L_n(x)= & {} \int _{a_F}^x \frac{1}{C(z)} \ d (F_n^* -F^*)(z) - \int _{a_F}^x \frac{C_n (z) -C(z)}{C^2(z)} \ dF^*(z), \\ \sup _{a_F \le x \le b} |R_n (x)|= & {} o (n^{-1} (\ln n)^{\delta }) \ \text{ with } \text{ probability } \text{ one, } \text{ for } \text{ any } \ \delta > 1.5, \\ \sup _{a_F \le x \le b}|R_n^0 (x)|= & {} O(n^{-1} (\ln n)^3) \ \text{ with } \text{ probability } \text{ one. }\end{aligned}$$

Remark 7

It is clear from the theorem that \(L_n\) is a sum of i.i.d. processes, which then leads to the CLT and the LIL (law of the iterated logarithm) for the Lynden-Bell estimator.

Remark 8

The order of the remainder terms (apart from the logarithmic factors) in Theorem 3 is \(O(n^{-1})\), which is much sharper than the order \(o(n^{-1/2}) \) in standard i.i.d. representations. Winfried stressed the need for such a higher-order remainder term, e.g. in density and quantile estimation.

Remark 9

A version of the SLLN for truncated data was later studied in He and Yang (1998b), but an optimal solution under random truncation remains elusive at this time. Maybe Winfried will fill this gap when he has more time on his hands (he is still carrying a full teaching load at Giessen).

4.2 CLT for Truncated Data

For right censored data, only the right tail poses technical challenges, but for left truncated data both the left and the right tails present challenges. This can be seen from the function C and its estimator \(C_n\): both approach zero in the left and right tails, and \(C_n\) in particular appears in the denominator. Additional challenges are due to the aforementioned “holes” in the data and to the fact that neither C nor \(C_n\) is monotone. Consequently, the proof of the CLT for right censored data in Stute (1995a) does not apply directly to truncated data. To circumvent the tail problem we want to prevent \(\hat{F}_n^T\) from reaching its full mass (one) prematurely, which means that we need to construct a modified estimator \(\tilde{F}_n^T\) that avoids this problem but satisfies \(\int \phi \ d \hat{F}_n^T - \int \phi \ d \tilde{F}_n^T = o_P(n^{-1/2}).\) This is the key idea in Stute and Wang (2008), and it will be discussed further after we present the CLT for the Lynden-Bell integral.

The following assumptions are needed for the CLT,

$$\begin{aligned} (i) \int \frac{dF}{G} < \infty , \end{aligned}$$
(1.29)
$$\begin{aligned} (ii) \int \frac{\phi ^2}{G} \ dF < \infty . \end{aligned}$$
(1.30)

Theorem 4

(Theorem 1.1 and Corollary 1.1 of Stute and Wang, 2008)

Under assumptions (1.23), (1.29), and (1.30) we have

$$\begin{aligned}&\int \phi d\hat{F}_n^T - \int \phi dF \!=\! \int \frac{\psi (y)}{C(y)} d (F_n^* - F^*) (y) - \int \frac{C_n (y) -C (y)}{C^2(y)} \psi (y) \ dF^*(y) + o_P(n^{-1/2}), \\ \nonumber&\text{ where } \\ \nonumber&\psi (y)= \phi (y) \ [1-F(y)] - \int _{[y<x]} \ \frac{\phi (x) \ [1-F(x)]}{C(x)} \ dF^*(x) = \int _{[y<x]}\ [\phi (y)-\phi (x)] \ dF(x). \end{aligned}$$
(1.31)

Hence

$$\begin{aligned} \sqrt{n} \int \phi \ d (\hat{F}_n^T - F)&\rightarrow N(0, \sigma ^2) \ \text{ in } \text{ distribution, } \\ \nonumber \text{ with } \quad&\sigma ^2 = \text{ Var } \left\{ \frac{\psi (X)}{C(X)} - \int _Y^X \frac{\psi (y)}{C^2(y)} \ dF^*(y) \right\} . \end{aligned}$$
(1.32)

Remark 10

Assumption (1.23), as mentioned before, ensures that F can be properly estimated under the left truncation setting, and assumption (1.29) further ensures that there is enough information in the left tail so that F can be estimated at the \(\sqrt{n}\) rate. Both assumptions are standard for truncated data and were already stated in Woodroofe (1985). Assumption (1.30) is needed to ensure that the leading terms in the i.i.d. representation (1.31) have finite second moments, so that the asymptotic normality in (1.32) holds. Thus, the assumptions of Theorem 4 are mild, and weaker than previously existing assumptions for truncated data.

Since \(G\le 1\), assumption (1.30) implies \(\int \phi ^2 dF < \infty ,\) which is the second moment assumption for standard CLTs when there is no truncation. Conversely, assumption (1.30) is implied by assumption (1.29) when \(\int \phi ^2 dF < \infty \) and \(\phi \) is locally bounded in a neighborhood of \(a_G\). Both assumptions (1.29) and (1.30) are satisfied when \(a_G < a_F\) and \(\int \phi ^2 dF < \infty \).

Remark 11

Theorem 4 actually has broader implications for classes of \(\phi \) functions if one traces its proof carefully. For instance, if we take the class of all indicators \(\phi _x=1_{(- \infty , x]} \), then it can be shown that the i.i.d. representation in (1.31) holds uniformly in x under conditions (1.23) and (1.29) (here condition (1.30) is implied by (1.29)), because the remainder term can be bounded uniformly.

4.3 Outline of the Proof

Step 1. The proof begins with the case of continuous F, where the Lynden-Bell estimator takes the special form in (1.28). As mentioned, \(\hat{F}_n^T\) has the undesirable property that if “holes” exist, then \(\hat{F}_n^T\) jumps to one as soon as \(n C_n(X_j^*)=1\) for some \(X_j^*\). When this happens before the largest order statistic, the exponential representation in (1.18) for the K-M integral cannot hold for the Lynden-Bell integral, so the method of proof for Theorem 2 is not applicable. To circumvent this problem, a modified estimator was proposed in Stute and Wang (2008), constructed by changing the weights of the Lynden-Bell estimator from

$$\begin{aligned} \hat{F}_n^T \{X_{i:n}^*\} = \frac{1}{nC_n(X_{i:n}^*)} \prod _{j=1}^{i-1} \left[ 1- \frac{ 1}{nC_n (X_{j:n}^*) } \right] \end{aligned}$$
(1.33)

to

$$\begin{aligned} \tilde{F}_n^T \{X_{i:n}^*\} = \frac{1}{nC_n(X_{i:n}^*)+1} \prod _{j=1}^{i-1} \left[ 1- \frac{ 1}{nC_n (X_{j:n}^*) +1 } \right] , \end{aligned}$$
(1.34)

where \(X_{i:n}^*\) is the ith order statistic of \(\{X_1^*, \ldots , X_n^*\}. \)

This small modification in the denominators now avoids the problem of “holes,” so all observed data \(X_i^*\) receive positive weights and hence are properly accounted for.
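The effect of the modification can be seen by computing both sets of jumps on a tiny invented data set containing a hole (untied data; the helper below is a sketch of the displayed weight formulas, not code from the paper):

```python
import numpy as np

def lb_jumps(x_star, y_star, modified=False):
    """Jumps of the Lynden-Bell estimator, or of the modified estimator of
    Stute and Wang (2008) when modified=True, for untied data."""
    order = np.argsort(x_star)
    xs = x_star[order]
    n = len(xs)
    nC = np.array([np.sum((y_star <= v) & (v <= x_star)) for v in xs],
                  dtype=float)
    d = nC + 1.0 if modified else nC          # "+1" removes the holes problem
    w = np.zeros(n)
    prod = 1.0                                # running survival factor
    for i in range(n):
        w[i] = prod / d[i]
        prod *= 1.0 - 1.0 / d[i]
    return xs, w

# Hole at X* = 1: it is covered by no other interval, so n*C_n(1) = 1.
y_star = np.array([0.0, 2.0, 2.5])
x_star = np.array([1.0, 3.0, 4.0])

_, w_orig = lb_jumps(x_star, y_star)                # mass exhausted at X* = 1
_, w_mod = lb_jumps(x_star, y_star, modified=True)  # every jump positive
```

The original weights put all mass on the first observation and zero on the rest, while the modified weights keep every observation in play, which is what makes the exponential-representation proof strategy viable.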

Step 2. A proof similar to that of the CLT for censored data can then be applied to \(\int \phi \ d \tilde{F}_n^T\), although extra care is still needed, as the truncation case is challenging in both the left and the right tails, whereas the censored case faces challenges only in the right tail. For instance, a new bound for the ratio \(C/C_n\) is needed; it is established in Lemma 3.1 of Stute and Wang (2008). In addition, many more bounds need to be established for quantities that involve \(C_n\) in the denominator. Since \(C_n\) is not monotone, many of the nice properties that are readily available for its censoring counterpart \(H_n\) are not afforded to \(C_n\).

After eight lemmas and three corollaries, Theorem 4 was established for continuous F.

Step 3. The extension to an arbitrary F is less treacherous and follows a path similar to the analogous extension for censored data, as described in Step 5 of Sect. 1.2.2, by invoking a quantile transformation.

5 Conclusion

It has been 40 years since Winfried’s first publication and since he obtained his Ph.D. degree (both events occurred in the same year, 1976). During these 40 years, he has had a very productive career with many landmark papers. It appears from his CV that the four years from 1993 to 1996 were Winfried’s most productive period, during which he had a total of 27 publications, 20 of which were either in survival analysis or inspired by his interest in survival analysis. In this review, we focused on four of his papers and some of their applications in survival analysis as examples of the scope and impact of his research. Hopefully, the review gives the reader a sense of the transformative nature of his contributions to the theory of survival analysis. It is also my hope that, after a brief tranquil period following his retirement, Winfried gets fired up again to crack another code, perhaps for doubly or interval censored data this time. There are still lots of interesting open problems in the theory of incomplete data—the world of incomplete data is not complete yet. I look forward to another opportunity to hack the incomplete field together. Meanwhile, I wish him the very best for his 70th birthday—and many more years to look forward to!