Abstract
A nonparametric latency estimator for mixture cure models is studied in this paper. An i.i.d. representation is obtained, the asymptotic mean squared error of the latency estimator is found, and its asymptotic normality is proven. A bootstrap bandwidth selection method is introduced and its efficiency is evaluated in a simulation study. The proposed methods are applied to a dataset of colorectal cancer patients in the University Hospital of A Coruña (CHUAC).
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
In the last two decades there has been a remarkable progress in cancer treatments, which led to longer patient survival and improved their quality of life. Consequently, a spate of statistical research to develop cure models arose. These models are a useful tool to analyze and describe cancer survival data, since they express and predict the prognosis of a patient considering, as a novelty, the real possibility that the subject may never experience the event of interest. Importantly, cure models should not be used in an indiscriminate way (see Farewell 1986). They generally require long-term follow-up and large sample sizes, as well as empirical and biological evidence of a nonsusceptible subpopulation. More specifically, they are used to estimate the probability of cure, also known as incidence, and the survival function of the uncured population denoted by latency.
Cure models can be split into two major types: the mixture and the nonmixture models. Mixture cure models were proposed by Boag (1949). They consider the survival function as a mixture of two groups of subjects: the susceptible group and the cured group. An important benefit of the mixture cure model is that it allows covariates to have different influence on patients who will experience the event of interest (e.g., death by the cancer under study) and on those who will not.
In the literature, the covariate effect is generally expressed parametrically or semiparametrically (see, among others, Farewell 1982; Goldman 1984; Kuk and Chen 1992; Maller and Zhou 1996; Sposto et al. 1992; Chappell et al. 1995; Taylor 1995; Peng and Dear 2000; Sy and Taylor 2000; Peng 2003; Yu and Peng 2008). Recently Louzada and Cobre (2012) considered recurrent event data in the presence of a cure fraction. Very few papers exist that use a nonparametric view to deal with the problem (see Maller and Zhou 1992; Laska and Meisner 1992; Wang et al. 2012). In the discussion by Van Keilegom to the paper González-Manteiga and Crujeiras (2013), the problem of goodness-of-fit tests for regression models with cured data is briefly considered. A completely nonparametric approach to the mixture cure model was firstly addressed by Xu and Peng (2014), proposing a nonparametric incidence estimator which works with continuous covariates, and proving its consistency and asymptotic normality. This nonparametric incidence estimator was studied later by López-Cheda et al. (2017), who obtained an i.i.d. representation, the asymptotically optimal bandwidth and proposed a bootstrap bandwidth selector. Regarding the latency function, a nonparametric estimator was proposed by López-Cheda et al. (2017), but no further properties were studied. The present paper contributes to this lacuna studying the asymptotic properties of that nonparametric latency estimator and proposing a bootstrap bandwidth selector. This enables the mixture cure model with covariates to be addressed in a completely nonparametric way.
The rest of the paper is organized as follows. Section 2 introduces the notation and presents the nonparametric mixture cure model and the nonparametric latency estimation. The asymptotic results for this estimator as well as the required assumptions are also introduced in Sect. 2. An i.i.d. representation is presented, an asymptotic expression for the mean squared error is found and the asymptotic normality is established for the nonparametric latency estimator. The problem of choosing the smoothing parameter is addressed in Sect. 3, where a bootstrap bandwidth selector is presented. The practical performance of this bootstrap bandwidth selector is assessed by a simulation study in Sect. 4. The application of these methods to a colorectal cancer data set is considered in Sect. 5. A final Appendix contains the proofs of the theoretical results stated in Sect. 2.
2 Main results
2.1 Notation and nonparametric estimators
To distinguish between cured and uncured subjects we use a binary indicator: \(\nu \). If the subject belongs to the susceptible group we set \(\nu =0\). This means that the individual will experience the event of interest if followed during enough time. If the subject is cured we set \(\nu =1\). In such a case the event will never be experienced by that subject. The probability of being cured and the survival function in the group of uncured patients may depend on a vector of covariates, \(\mathbf {X}\), measured on the subject. Let us consider \(p(\mathbf {x}) = P(\nu =0 | \mathbf {X}=\mathbf {x})\), the conditional probability of not being cured, and let Y be the time to the event of interest. If \(\nu =1\), we set \(Y =\infty \).
We define \(F(t|\mathbf {x})=P(Y\le t |\mathbf {X}=\mathbf {x})\), the conditional distribution function of Y. When the cure probability is positive, then the corresponding survival function, \(S(t|\mathbf {x})\), is improper. In other terms, \(\lim _{t \rightarrow \infty } S(t|\mathbf {x}) = 1-p(\mathbf {x}) >0\).
Using the conditional survival function for susceptible subjects, \(S_0(t|\mathbf {x})=P(Y > t | \mathbf {X}=\mathbf {x}, \nu =0)\), the mixture cure model can be written as:
The function \(1 - p(\mathbf {x})\) is called the incidence and \(S_0(t|\mathbf {x})\) is the latency.
Random right censoring is assumed. The censoring time is denoted by C and G denotes its distribution function (\(\bar{G}\) is its survival function). The variable C is assumed to be independent of Y given the covariates \(\mathbf {X}\). The observed time is defined as \(T=\min \{Y,C \}\) and \(\delta =1 \{Y \le C\}\) is the uncensoring indicator. We denote by H the distribution function of T. It is clear that \(\delta =0\) for all the cured patients, and also for uncured patients with censored lifetime (\(T=C\)). From now on we restrict ourselves to the case where \(\mathbf {X}\) is a univariate continuous covariate X with density function m(x). As a consequence of the previous definitions and assumptions, the sample is denoted by \(\{(X_i, T_i, \delta _i), i=1, \dots , n \}\), which collects i.i.d. observations of the random vector \((X, T, \delta )\). Whenever is needed \((X_{(i)},T_{(i)},\delta _{(i)})\) will denote the observation corresponding to the ith order statistic with respect to the sample \((T_1, T_2, \ldots , T_n)\), where \(X_{(i)}\) and \(\delta _{(i)}\) are the concomitants of the X and \(\delta \)-samples.
The conditional distribution, survival and subdistribution functions are denoted by \(G(t|x)=P\left( C \le t|X=x \right) \), \(\bar{G}(t|x)= 1-G(t|x)\), \(H(t|x)=P\left( T \le t | \right. \) \(\left. X=x \right) \), \(H^1(t|x)=P\left( T \le t, \delta =1|X=x \right) \) and \(H_{c,1}(t)=P\left( T < t|\delta = \right. \) \(\left. 1 \right) \).
We will consider the nonparametric approach in mixture cure models by López-Cheda et al. (2017). It departs from the generalized Kaplan–Meier estimator by Beran (1981) to estimate the conditional survival function:
where \(B_{h(i)}(x) = K_h (x - X_{(i)}) / \sum _{j=1}^n K_h(x - X_{(j)}) \) are the Nadaraya–Watson (NW) weights and \(K_h(\cdot )=\frac{1}{h}K \left( \frac{ \cdot }{h} \right) \) the rescaled kernel with bandwidth \(h>0\). We denote by \(\hat{F}_h(t|x) = 1 - \hat{S}_h(t|x)\) the Beran estimator of F(t|x).
Departing from the Beran estimator, Xu and Peng (2014) introduced a kernel type estimator for the incidence function:
where \(T_{\max }^1=\max \nolimits _{i:\delta _i=1}(T_i)\) is the largest uncensored failure time. These authors proved the consistency and asymptotic normality of \(\hat{p}_h(x)\). López-Cheda et al. (2017) obtained an i.i.d. representation and the asymptotically optimal bandwidth, proposed a bootstrap bandwidth selector for \(\hat{p}_h(x)\), and introduced the following nonparametric latency estimator:
with \(\hat{S}_{h}(t|x)\), in (2), the Beran estimator of S(t|x) and \(1 - \hat{p}_h(x)\) the estimator by Xu and Peng (2014) in (3). They also addressed identifiability of model (1). Note that the optimal bandwidth for \(\hat{S}_{0,h}(t|x)\) is not necessarily the optimal bandwidth for \(\hat{p}_h(x)\). A more general function than (4) using different bandwidths for the incidence and for the improper survival function:
could be considered as an estimator of the latency. However, it does not yield necessarily a proper survival function since its limit as t tends to infinity needs not to be zero. In fact, it is not even guaranteed to be non-negative. On the other hand, as it will be shown in Sect. 4.1, the optimal values for \(h_1\) and \(h_2\) in (5) are nearly equal. As a consequence, in this work only the asymptotic properties of the nonparametric latency estimator in (4), that depends on one unique bandwidth h, will be studied. Similar theoretical results, not included in this paper, are easily extended to the estimator in (5).
Let us define: \(\tau _{S_{0}}(x) = \sup \left\{ t:S_{0}(t|x)> 0\right\} \). Since S(t|x) is an improper survival function and \(1-H(t|x)=S(t|x) \bar{G}(t|x)\), then \(\tau _{H}(x)=\tau _{G}(x)\), where \(\tau _{H}(x) = \sup \left\{ t:H(t|x) < 1\right\} \) and \(\tau _{G}(x) = \sup \left\{ t:G(t|x) < 1\right\} \).
Let \(\tau _{0}=\sup _{x \in D} \tau _{S_{0}}(x) \), where D is the support of X. As in Xu and Peng (2014), we consider
The rationale of this condition has been discussed by López-Cheda et al. (2017), Xu and Peng (2014) and Maller and Zhou (1992, 1996). Note that if the censoring variable takes values always below a time \(\tau _G < \tau _0\), the largest uncensored observation may occur at a time not larger than \(\tau _G\) and therefore always before \(\tau _0\). Laska and Meisner (1992) stated that, for a large sample size, the nonparametric incidence estimator in (3) is an estimator of \(1-p(x)+p(x)S_0(\tau _G)\), which is strictly larger than \(1-p(x)\). Specifically, as it is mentioned in Maller and Zhou (1992), consistent estimates of the incidence are possible if and only if there is zero probability of a susceptible individual surviving longer than the largest possible censoring time. That is, condition (6) guarantees that censored subjects beyond the largest observable failure time are cured, since the support of the censoring variable, C, is not contained in the support of Y, the time to occurrence of the event. Therefore, the nonparametric estimator does not overestimate the true cure rate. A nonparametric test for this condition on the censoring mechanism was proposed by Maller and Zhou (1992) in an unconditional setting, and by López-Cheda et al. (2017) with covariates.
2.2 Theoretical results
The following assumptions are needed to prove the asymptotic results in this section.
-
(A1)
X, Y and C are absolutely continuous random variables.
-
(A2)
Condition (6) holds.
-
(A3)
-
(a)
Let \(I=[x_{1},x_{2}]\) be an interval contained in the support of m, and \(I_{\delta }=[x_{1}-\delta ,x_{2}+\delta ]\) for some \(\delta >0\) such that \(0<\gamma =\inf [m\left( x\right) :x\in I_{\delta }]<\sup [m\left( x\right) :x\in I_{\delta }]=\varGamma <\infty \) and \(0<\delta \varGamma <1\). Then for all \(x\in I_{\delta }\) the random variables Y and C are conditionally independent given \(X=x\).
-
(b)
There exist \(a,b\in \mathbb {R}\), with \(a<b\) satisfying \(1 - H(t|x) \ge \theta > 0\) for \((t,x) \in [a,b] \times I_{\delta }\).
-
(a)
-
(A4)
The first derivative of the function m(x) exists and is continuous in \(x\in I_{\delta }\) and the first derivatives with respect to x of the functions H(t|x) and \(H^{1}(t|x)\) exist and are continuous and bounded in \(\left( t,x\right) \in [0,\infty )\times I_{\delta }\).
-
(A5)
The second derivative of the function m(x) exists and is continuous in \(x\in I_{\delta }\) and the second derivatives with respect to x of the functions H(t|x) and \(H^{1}(t|x) \) exist and are continuous and bounded in \(\left( t,x\right) \in [0,\infty )\times I_{\delta }\).
-
(A6)
The first derivatives with respect to t of the functions G(t|x), H(t|x), \(H^{1}(t|x)\) and \(S_0(t|x)\) exist and are continuous in \(\left( t,x\right) \in [a,b] \times D\).
-
(A7)
The second derivatives with respect to t of the functions H(t|x) and \(H^{1}(t|x) \) exist and are continuous in \(\left( t,x\right) \in [a,b]\times D\).
-
(A8)
The second partial derivatives with respect to t and x of the functions H(t|x) and \(H^1(t|x)\) exist and are continuous and bounded for \((t,x)\in [0,\infty ) \times D\).
-
(A9)
The first and second derivatives of the distribution and subdistribution functions H(t) and \(H_{c,1}(t)\) are bounded away from zero in [a, b]. Moreover, \(H^{\prime }_{c,1}(\tau _0) > 0\).
-
(A10)
The functions H(t|x), \(S_{0}(t|x)\) and G(t|x) have bounded second-order derivatives with respect to x for any given value of t.
-
(A11)
The kernel function, K, is a symmetric density vanishing outside \(\left( -1,1\right) \) and the total variation of K is less than some \(\lambda<\) \(\infty \).
-
(A12)
The density function of T, \(f_T\), is bounded away from 0 in \([0,\infty )\).
The proof of Theorem 1 is based on Theorem 2 in Iglesias-Pérez and González-Manteiga (1999), where the assumptions (A1), (A3)–(A9) and (A11)–(A12) are required. Assumptions (A2) and (A10) ensure that Theorem 2 in Iglesias-Pérez and González-Manteiga (1999), stated for a fixed t such that \(1-H(t|x) \ge \theta > 0 \in [a,b]\times I_\delta \), can be applied to the random value \(t=T_{\max }^1\). Assumptions (A4)–(A8) and (A10) are regularity conditions for the functions involved in the proofs and in the asymptotic results.
In Theorem 1 we obtain an i.i.d. representation for \(\hat{S}_{0,h}(t|x)\) in (4).
Theorem 1
Suppose that conditions (A1)–(A12) hold, together with \(\frac{\ln n}{nh} \rightarrow 0\) and \(h=O\left( \left( \frac{\ln n}{n}\right) ^{{1}/{5}} \right) \), then we have an i.i.d. representation for the nonparametric latency estimator for any \(t \in [a,b]\):
where
and
From Theorem 1, important properties of the nonparametric latency estimator can be obtained: the first one is the asymptotic expression of the mean squared error (MSE) given in Theorem 2, and the second one is the asymptotic normality, shown in Theorem 3. But first some notation will be introduced. Let us define
and
with \(\xi \) in (7). The asymptotic bias and variance of the latency estimator will be expressed in terms of the following functions:
where
and \(\varPhi ^{\prime }\) and \(\varPhi ^{\prime \prime }\) are the derivatives of \(\varPhi (y,t,x)\) with respect to y. Furthermore,
respectively, where
Note that, except for some constants, \(B_1(t,x)\) in (10) and \(B_2(t,x)\) in (11) are the dominant terms of the asymptotic bias of the estimators \(\hat{S}_h\) and \(1-\hat{p}_h\) in (2) and (3), respectively. Similarly, the terms \(V_1(t,x)\) in (12) and \(V_2(t,x)\) in (13) are the dominant terms of the corresponding asymptotic variances of \(\hat{S}_h\) and \(1-\hat{p}_h\). Finally, \(V_3(t,x)\) in (14) accounts for the covariance of both estimators.
Theorem 2
Under assumptions (A1)–(A10), if \(\frac{\ln n}{nh} \rightarrow 0\) and \(h=O\left( \left( \frac{\ln n}{n}\right) ^{{1}/{5}} \right) \), then the asymptotic mean squared error of the latency estimator is
where \(d_{K}=\int v^{2}K(v)\mathrm{d}v\), \(c_{K}=\int K^{2}(v)\mathrm{d}v\),
with \(t \in [a,b]\), \(B_{1}\), \(B_{2}\), \(V_{1}\), \(V_{2}\) and \(V_{3}\) in (10)–(14).
Theorem 3
Under assumptions (A1)–(A10), if \(h \rightarrow 0\) and \(\frac{(\ln n)^3}{nh} \rightarrow 0\), it follows that, for any \(t \in [a,b],\)
(a) If \(nh^{5} \rightarrow 0\), then
(b) If \(n h^5 \rightarrow C^5 > 0\), then
3 Bandwidth selection
From Theorem 2, the asymptotic mean integrated squared error of the latency estimator is:
where B(t, x) and V(t, x) are defined in (16) and (17). The bandwidth which minimizes the asymptotic mean integrated squared error is
which depends on plenty of unknown functions that are very hard to estimate. Consequently, we propose to select the bandwidth using the bootstrap method.
3.1 Bootstrap bandwidth selector
The bootstrap bandwidth selector is the minimizer of the bootstrap version of the mean integrated squared error (MISE), that can be approximated, using Monte Carlo, by:
where w is an appropriate weight function, \(\hat{S}_{0,h}^{*(j)}(t|x)\) is the kernel estimator of \(S_0(t|x)\) in (4) using bandwidth h and based on the jth bootstrap resample, and \(\hat{S}_{0,g}(t|x)\) is the same estimator computed with the original sample and pilot bandwidth g.
We consider an unconditional censoring bootstrap resampling, assuming that \(G(t|x)=G(t),\;\forall x,t\):
-
1.
For \(i= 1, 2, \ldots , n\), generate \(C_i^*\) from the product-limit estimator \(\hat{G}^{KM}\).
-
2.
For \(i= 1, 2, \ldots , n\), fix the bootstrap covariates \(X_i^*=X_i\) and generate \(Y_i^*\) from \(\hat{S}_{0,g}(\cdot |X_i^*)\) with probability \( \hat{p}_g(X_i^*)\), and \(Y_i^* = \infty \) otherwise.
-
3.
Finally, define \(T^*_i=\min \{Y^*_i, C^*_i \}\) and \(\delta ^*_i = 1\{Y^*_i \le C^*_i\}\) for \(i= 1, 2, \ldots , n\).
-
4.
Repeat Steps 1–3 above B times to generate bootstrap resamples of the form \(\big \{ ( X_1^{(b)}, T_1^{*(b)}, \delta _1^{*(b)}), \dots , ( X_n^{(b)}, T_n^{*(b)}, \delta _n^{*(b)} ) \big \}\), \(b=1, \dots , B\).
-
5.
For the bth bootstrap resample \((b = 1, 2 \ldots ,B)\), compute \(\hat{S}_{0,h}^{*(b)}(t|x)\) with bandwidth \(h_l \in \{h_1,\ldots ,h_L\}\) .
-
6.
With the original sample and pilot bandwidth g, compute \(\hat{S}_{0,g}(t|x)\).
-
7.
For each bandwidth \(h_l\) in \(\{h_1, \dots , h_L\}\), compute the Monte Carlo approximation of \(\mathrm{MISE}_{x,g}^*(h_l)\) as in (18).
-
8.
Find \(h^*_x=\mathop {{{\mathrm{arg\,min}}}}\nolimits _{h_l \in \{h_1,\ldots ,h_L\}}\mathrm{MISE}^{*}_{x,g}(h_l)\).
4 Simulation study
Good practical behavior of the nonparametric latency estimator has been preliminarily reported by López-Cheda et al. (2017). The purpose of this simulation study is to assess the performance of the bootstrap bandwidth selector for the nonparametric latency estimator. We will work with the same two models considered by López-Cheda et al. (2017). For both models, the censoring times are generated according to an exponential distribution with mean 10 / 3 and the covariate X has a \(U(-20,20)\) distribution.
Model 1 The probability of not being cured is a logistic function and the latency is close to fulfill the proportional hazards model, truncated to guarantee condition (6):
with \(\beta _0 = 0.476\) and \(\beta _1 = 0.358\), \(\tau _0 = 4.605\) and \(\lambda \left( x\right) =\exp \left( (x+20) /40 \right) \). A percentage of 54% of the patients are censored and 47% are cured.
Model 2 The probability of not being cured is
with \(\beta _0 = 0.0476\), \(\beta _1 = - 0.2558 \), \(\beta _2 = - 0.0027\) and \(\beta _3 = 0.0020\), and \(S_0 (t|x) = \frac{1}{2} \left( \exp (-\alpha (x) t^5 ) + \exp (-100 t^5) \right) \) with \(\alpha (x) = \frac{1}{5} \exp ((x+20)/40)\). In this case, the percentages of cure and censoring are slightly higher than for Model 1: around 62% of the individuals are censored and 53% are cured.
In order to approximate the bootstrap version of the \(MISE_x\) of the nonparametric latency estimator, \(m=1000\) trials and \(B=200\) bootstrap resamples of sizes \(n=50\), \(n=100\) and \(n=200\) were drawn and used the Epanechnikov kernel. We considered a grid of 35 bandwidths (from 5 to 100) equispaced on a logarithmic scale. Note that, although the covariate \(X\in U(-20,20)\), we only work with \(x \in [-10,20]\). The reason is that \(p(x)\simeq 0\) for \(-20\le x\le -10\). This implies that almost all the subjects are cured, and therefore the estimation of the survival function of the uncured population can not be obtained. Similarly as for the nonparametric incidence estimator (see López-Cheda et al. 2017), the effect of the choice of the pilot bandwidth, g, on the bootstrap bandwidth, \(h^*_x\), is very weak. In this simulation study, we considered the same naive pilot bandwidth selector, \(g = C(X_{[n]} - X_{[1]}) \cdot n^{-1/9}\), as in López-Cheda et al. (2017), with \(C=0.75\), and where \(X_{[n]}\) (\(X_{[1]}\)) is the maximum (minimum) value of the observed values of the covariate X.
In Fig. 1 the density of the bootstrap bandwidths, \(h^*_x\), is compared with the optimal \(h_{\mathrm{MISE},x}\) bandwidth. The MISE values obtained considering these bandwidths are also shown. It is noteworthy that \(\mathrm{MISE}(\hat{S}_{0,h}(\cdot |x))\), and consequently \(\mathrm{MISE}^*_{x,g}(h)\), is almost constant in a very wide interval around its minimizer. This feature implies that very different bandwidths could yield very similar good estimates in terms of MISE. We can appreciate how the bootstrap bandwidth might be larger (smaller) than \(h_\mathrm{MISE}\) in Model 1 (Model 2), for most of the covariate values, reflected in a very little difference in terms of MISE between the estimates with the optimal and the bootstrap bandwidths.
4.1 Results when using two bandwidths to estimate \(S_0\)
We will present some results for the latency estimator in (5), that is, if two different bandwidths are considered: \(h_1\) for the incidence and \(h_2\) for the improper survival function S. Note that, for the sake of brevity, we only work with Model 1 and sample size \(n=100\), considering \(m=1000\) samples. Figure 2 (left) shows the MISE, approximated by Monte Carlo, of the nonparametric latency estimator \(\hat{S}_{0,h_1,h_2}(t|x)\) in (5) as a function of \((h_1,h_2)\) for the covariate value \(x=5\) (the MISE for other values of x is similar, not shown). We can see that the minimum MISE (dark-grey color) is reached around the diagonal, that is, when \(h_1=h_2\). Figure 2 (right) provides the optimal bandwidths \((h_1,h_2)\) as a function of x. Note that for most of the covariate values both optimal bandwidths are very similar, being even equal for the values of x larger than 5.
Therefore, as pointed out in Sect. 2, little efficiency is lost when considering one only bandwidth \(h_1 = h_2\) to estimate \(S_0\), while this guarantees that the resulting estimator is a proper survival function.
5 Application to colorectal cancer data
The proposed method was applied to the dataset used in López-Cheda et al. (2017), composed of 414 colorectal cancer patients from CHUAC (Complejo Hospitalario Universitario de A Coruña), Spain. The variable of interest is the follow-up time, in months, since the diagnosis until death. Two covariates are considered: the stage (from 1 to 4) and the age (from 23 to 103). The percentage of censoring varies from 30% to almost 71%, depending on the stage. In Table 1 we show a summary of the data set.
Due to the small sample sizes in each stage, the results are presented in two groups: Stages 1–2 and Stages 3–4. Note that \(B=200\) bootstrap resamples are drawn. Similar to the simulation study in Sect. 4, we considered a grid of 35 bandwidths from \(h_1=5\) to \(h_{35}=100\) equispaced on a logarithmic scale.
The latency estimation computed with the bootstrap bandwidth, \(\hat{S}_{0,h^*}(t|x)\), for different ages (35, 50 and 80) is shown in Fig. 3. We can observe that for Stages 1–2 the covariate age does not seem to be determining for the latency estimation, since all the estimated latency functions are very similar for the whole grid of ages. On the contrary, for Stages 3–4 the latency estimation varies considerably depending on the age. For example, the probability that the follow-up time since the diagnosis until death is larger than 4.5 years (54 months) is around 0.2 for patients with ages 35 and 50, whereas for 80- year-old patients, that probability is larger than 0.4.
References
Arcones MA (1997) The law of the iterated logarithm for a triangular array of empirical processes. Electron J Probab 2:1–39
Beran R (1981) Nonparametric regression with randomly censored survival data. Technical Report, University of California, Berkeley
Billingsley P (1968) Convergence of probability measures. Wiley, New York
Boag JW (1949) Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J R Stat Soc B Met 11:15–53
Chappell R, Nondahl DM, Fowler JF (1995) Modeling dose and local control in radiotherapy. J Am Stat Assoc 90:829–838
Farewell VT (1982) The use of mixture models for the analysis of survival data with long-term survivors. Biometrics 38:1041–1046
Farewell VT (1986) Mixture models in survival analysis: are they worth the risk? Can J Stat 14:257–262
Goldman AI (1984) Survivorship analysis when cure is a possibility: a Monte Carlo study. Stat Med 3:153–163
González-Manteiga W, Crujeiras RM (2013) An updated review of Goodness-of-Fit tests for regression models (with discussions and rejoinder). TEST 22:361–447
Iglesias-Pérez MC, González-Manteiga W (1999) Strong representation of a generalized product-limit estimator for truncated and censored data with some applications. J Nonparametr Stat 10:213–244
Kuk AYC, Chen CH (1992) A mixture model combining logistic regression with proportional hazards regression. Biometrika 79:531–541
Laska EM, Meisner MJ (1992) Nonparametric estimation and testing in a cure model. Biometrics 48:1223–1234
López-Cheda A, Cao R, Jácome MA, Van Keilegom I (2017) Nonparametric incidence estimation and bootstrap bandwidth selection in mixture cure models. Comput Stat Data Anal 105:144–165
Louzada F, Cobre J (2012) A multiple time scale survival model with a cure fraction. TEST 21:355–368
Maller RA, Zhou S (1992) Estimating the proportion of immunes in a censored sample. Biometrika 79:731–739
Maller RA, Zhou S (1996) Survival analysis with long-term survivors. Wiley, Chichester
Peng Y (2003) Fitting semiparametric cure models. Comput Stat Data Anal 41:481–490
Peng Y, Dear KB (2000) A nonparametric mixture model for cure rate estimation. Biometrics 56:237–243
Sposto R, Sather HN, Baker SA (1992) A comparison of tests of the difference in the proportion of patients who are cured. Biometrics 48:87–99
Sy JP, Taylor JMG (2000) Estimation in a Cox proportional hazards cure model. Biometrics 56:227–236
Taylor JMG (1995) Semi-parametric estimation in failure time mixture models. Biometrics 51:899–907
Wang L, Du P, Lian H (2012) Two-component mixture cure rate model with spline estimated nonparametric components. Biometrics 68:726–735
Xu J, Peng Y (2014) Nonparametric cure rate estimation with covariates. Can J Stat 42:1–17
Yu B, Peng Y (2008) Mixture cure models for multivariate survival data. Comput Stat Data Anal 52:1524–1532
Acknowledgements
The first author’s research was sponsored by the Spanish FPU (Formación de Profesorado Universitario) Grant from MECD (Ministerio de Educación, Cultura y Deporte) with reference FPU13/01371. All the authors acknowledge partial support by the MINECO (Ministerio de Economía y Competitividad) grant MTM2014-52876-R (EU ERDF support included), the MICINN (Ministerio de Ciencia e Innovación) Grant MTM2011-22392 (EU ERDF support included) and Xunta de Galicia GRC Grant CN2012/130. The authors are grateful to Dr. Sonia Pértega and Dr. Salvador Pita, at the University Hospital of A Coruña, for providing the colorectal cancer data set.
Author information
Authors and Affiliations
Corresponding author
Appendix
Appendix
Proof of Theorem 1
The nonparametric estimator of \(S_0(t|x)\) in (4) can be decomposed as follows:
where the dominant terms of the i.i.d. representation of \(\hat{S}_{0,h}(t|x)\) derive from
and the remaining terms
will be proved to be negligible.
The i.i.d. representation of the term \(A_{11}\) in (20) follows, under assumptions (A1)–(A7), (A11) and (A12), from that of \(\hat{S}_h(t|x)\) in Theorem 2 of Iglesias-Pérez and González-Manteiga (1999):
Under assumptions (A1)–(A12), the dominant terms of the i.i.d. representation of \(A_{21}\) in (20) come from the i.i.d. representation of \(\hat{p}_h(x)\) in Theorem 3 of López-Cheda et al. (2017):
We continue by proving the negligibility of \(A_{12}\) in (21). Under assumptions (A3a), (A4), (A5) and (A11), we apply Lemma 5 in Iglesias-Pérez and González-Manteiga (1999) to obtain
and, similarly from Theorem 3.3 in Arcones (1997) and the Strong Law of Large Numbers (SLLN),
It is straightforward to check that if the bandwidth satisfies \(h\rightarrow 0\), \(\frac{\ln n}{nh}\rightarrow 0\) and \(\frac{nh^{5}}{\ln n}=O(1)\), with the convergence \(\hat{p}_{h}(x)\rightarrow p(x)\text { a.s.}\) proved in Lemma 7 of López-Cheda et al. (2017), it directly follows that
With respect to \(A_{22}\) in (21), if \(h\rightarrow 0\), \(\frac{\ln n}{nh}\rightarrow 0\) and \(\frac{nh^{5}}{\ln n}=O(1)\), using the almost sure consistency of \(\hat{p}_{h}(x)\), it follows from (24) that
The proof of the theorem follows from the decomposition (19) and the results (22), (23), (25) and (26). \(\square \)
Proof of Theorem 2
From Theorem 1, the latency estimator can be decomposed as
where
with \(\tilde{B}_{h,i}(x)\) in (8) and \(\xi \) in (7). Then, the AMSE of \(\hat{S}_{0,h}(t|x)\) is
We start with the first term of AMSE\((\hat{S}_{0, h} (t|x) )\). Note that
where
and
Let us consider \(\varPhi _{1}(y,t,x)\) defined in (9). From a change of variable and a Taylor expansion, then the first term in (30) is
For the second term in (30), applying a change of variable, a Taylor expansion, and taking into account the symmetry of K, it follows that
where \(\varPhi (y,t,x)=E\left[ \xi (T,\delta ,t,x)|X=y\right] \) and, as will be proved in Lemma 4, \(\varPhi (x,t,x)=0\) for all \(t\ge 0\).
From (29), (30), (31) and (32), then
Continuing with the second term in the right-hand side of (28):
Using a Taylor expansion, and \(\varPhi (x,t,x)=0 \; \forall t\ge 0\), then
So the first term of \(\mathrm{AMSE}(\hat{S}_{0,h}(t|x))\) in (27) is
Following the same ideas as those for \(C_{1}\), we obtain for \(C_{2}\) that
We continue studying the third term of AMSE\((\hat{S}_{0,h}(t|x))\) in (27):
where
Using a Taylor expansion and \(\varPhi (x,t,x)=0\) for all \(t\ge 0\), the terms \(\alpha \) and \(\beta \) are
For the term \(\gamma \), it follows that
where \(\varPhi _{2}(y,t,x)=E\left[ \xi (T,\delta ,t,x)\xi (T,\delta ,\infty ,x)|X=y\right] .\) From (35), (36) and (37), the third term of \(AMSE( \hat{S}_{0,h}(t|x))\) in (27) is:
Compiling (33), (34) and (38), the \(\mathrm{AMSE}(\hat{S} _{0,h}(t|x))\) in (27) is
Since, from (40) and (41), in Lemmas 5 and 6 it is proven that
and considering (10)–(14), the AMSE of \(\hat{S}_{0,h}(t|x)\) is, finally, that in (15).
This completes the proof. \(\square \)
Lemma 4
The term \(\varPhi \left( y,t,x\right) \) in (8) has the following expression:
and consequently, \(\varPhi \left( x,t,x\right) =0\) for any \(t\ge 0\).
Proof of Lemma 4
Let us recall \(\varPhi \left( y,t,x\right) =E\left[ \xi (T,\delta ,t,x)|X=y\right] \), then
We start with \(A^{\prime }\):
where \(q\left( t,y\right) =E\left( \delta |T=t,X=y\right) \) and \(H_{1}\left( t|y\right) =P\left( T\le t,\delta =1|X=y\right) \).
We continue with \(A^{\prime \prime }\):
Then,
and therefore, \(\varPhi \left( x,t,x\right) =0\) for any \(t\ge 0\). \(\square \)
Lemma 5
The term \(\varPhi _1(y,t,x)\) in (9) verifies, for any \(t \in [a,b]\),
Proof of Lemma 5
Note that \(\varPhi _{1}\left( y,t,x\right) =E\left[ \xi ^{2}(T,\delta ,t,x)|X=y\right] \), with \(\xi \) in (7). Then,
The first term in the decomposition of \(\varPhi _{1}\left( y,t,x\right) \) is
The second term is
Integrating in the supports \(\left\{ (v,w) \in \left[ 0,t\right] \times \left[ 0,t\right] /v\le w\right\} \) and \(\left\{ \left( v,w \right) \in \right. \) \(\left[ 0,t\right] \times \left. \left[ 0,t\right] /w < v\right\} \), the term B is
Finally, the third term in the decomposition of \(\varPhi _{1}\left( y,t,x\right) \) is
Note that, for \(y=x\), we have that \(B=2C\). This completes the proof. \(\square \)
Lemma 6
The expression for the term \(\varPhi _{2}(x,t,x)\), for any \(t \in [a,b]\), is the following:
Proof of Lemma 6
Recall \(\varPhi _{2}(y,t,x)=E\left[ \xi \left( T,\delta ,t,x\right) \xi (T,\delta ,\infty ,x)|X=y\right] \) with \(\xi \) in (7). Then:
Straightforward calculations yield:
Integrating in the supports \(\left\{ \left( u,v \right) \in \left[ 0,\infty \right) \times \left[ 0,t \right] /v\le u\right\} \) and \(\left\{ \left( u,v \right) \in \right. \) \( \left[ 0,\infty \right) \left. \times \left[ 0,t \right] /u< v\right\} =\left\{ \left( u, v \right) \in \left[ 0, t \right] \times \left[ 0, t \right] /u < v \right\} \), the term D is
When \(y=x\), then \(D=C+B\), which concludes the proof. \(\square \)
Proof of Theorem 3
Under assumptions (A1)–(A10) and using Theorem 1, \(\sqrt{nh}\left( \hat{S}_{0,h}(t|x)-S_{0}(t|x)\right) \) has the same limit distribution as
where
The deterministic part b(t, x) comes from \(III+IV\). Recall the function \(\varPhi (y,t,x)\) in (39), since \(\varPhi (x,t,x)=0\), then
Therefore,
If \(nh^{5}\rightarrow 0\), then \(III+IV=o\left( 1\right) \) and \(b\left( t,x\right) =0\). On the other hand, if \(nh^{5}\rightarrow C^{5}\) then
As for the asymptotic distribution of \(I+II\), it is immediate to prove that:
where
are n independent variables with mean 0. To prove the asymptotic normality of \(I+II\), it is only necessary to show that \(\sigma _{i,n}^{2}\left( x,t\right) =Var\left( \gamma _{i,n}(x,t)+\varGamma _{i,n}(x,t)\right) \) \( <\infty \), \(\sigma _{n}^{2}\left( x,t\right) =\sum _{i=1}^{n}\sigma _{i,n}^{2}\left( x,t\right) \) is positive and that the Lindeberg’s condition is satisfied, so Lindeberg’s theorem for triangular arrays (Theorem 7.2 in Billingsley (1968), p. 42) can be applied to obtain
and consequently,
We will start proving that the variance
is finite. Note that
Let us define \(\varPhi _{1}(y,t,x)=E\left[ \xi ^{2}(T,\delta ,t,x)|X=y\right] \), using (42), then the first term in (43) is
In a similar way, the second term in (43) is
Finally, for the third term in (43),
Let us consider \(\varPhi _{2}(y,t,x)=E\left[ \xi (T,\delta ,t,x)\xi (T,\delta ,\infty ,x)|X=y\right] \). Applying Taylor expansions, the third term in (43) is
The results (44), (45) and (46), together with (40) and (41), lead to
where \(V_{1}\left( t,x\right) \), \(V_{2}\left( t,x\right) \) and \(V_{3}\left( t,x\right) \) are defined in (12), (13) and (14), respectively. As a consequence, \(\sigma _{i,n}^{2}\left( x,t\right) <\infty \). The finiteness of the variance \(\sigma _{n}^{2}\left( x,t\right) \) is also proved, since
We continue studying Lindeberg’s condition:
Let us define the indicator function \(I_{i,n}( x,t)=1 \left\{ \left( \gamma _{i,n}(x,t){+}\varGamma _{i,n}(x,t)\right) ^{2}\!>\! \epsilon ^{2}\sigma _{n}^{2}\right. \left. ( x,t) \right\} \). Then (47) can be expressed as
with
Since \(\frac{1}{nh}\rightarrow 0\), and the functions K and \(\xi \) are bounded, one has:
Since \(\eta _{n}(x,t)\) is bounded, then the previous condition implies that \( \exists n_{0}\in \mathbb {N}/n\ge n_{0}\Rightarrow E(\eta _{n}(x,t))=0\), and then \(\lim _{n\rightarrow \infty }\frac{1}{\sigma _{n}^{2}}E(\eta _{n}(x,t))=0.\) Therefore, Lindeberg’s condition is proved. All these previous arguments lead to the proof of Theorem 3. \(\square \)
Rights and permissions
About this article
Cite this article
López-Cheda, A., Jácome, M.A. & Cao, R. Nonparametric latency estimation for mixture cure models. TEST 26, 353–376 (2017). https://doi.org/10.1007/s11749-016-0515-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11749-016-0515-1