Abstract
Recurrent event data are common in survival and reliability studies, where a subject experiences the same type of event repeatedly. There are situations, in which the event of interest can be observed only if they belong to a window of observational range, leading to double censoring of recurrent event times. In this paper, we study recurrent event data subject to double censoring. We propose a proportional mean model for the analysis of doubly censored recurrent event data based on the mean function of the underlying recurrent event process. The estimators of the regression parameters and the baseline mean function are derived and their asymptotic properties are studied. A Monte Carlo simulation study is conducted to assess the finite sample behavior of the proposed estimators. Finally, the procedures are illustrated using two real-life data sets, one from a bladder cancer study and the other from a study on chronic granulomatous disease.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
Reliability and survival studies often involve events that can occur more than once per subject, known as recurrent events. These types of outcomes are prevalent in various fields, such as biology, medicine, reliability engineering, and social sciences. Medical examples of recurrent events include tumor recurrences, sequences of asthmatic attacks, and multiple infection episodes. Recurrent events can also occur in other contexts, such as the recurrence of economic recessions, traffic accidents, or repeated breakdowns of devices like mobile phones or computers (Du & Lv, 2022).
Regression analysis methods for recurrent events can be classified into two categories: conditional methods and marginal techniques. Conditional methods involve modeling the intensity or hazard rate function (Prentice et al., 1981; Andersen & Gill, 1982), while marginal techniques focus on mean functions or rate functions (Pepe & Cai, 1993; Lawless & Nadeau, 1995; Lin et al., 2000). Marginal methods have the advantage of providing an easier interpretation of the mean number of recurrent events compared to the intensity of the event. Frailty models have been proposed for recurrent event data by Zeng and Lin (2007) and Sankaran and Anisha (2011), while Kelly and Lim (2000) provide a comparison of various approaches for recurrent event analysis. A comprehensive review of recurrent event analysis is presented by Cook and Lawless (2007). Su and Lin (2021) discussed a semiparametric rate model for recurrent event data with multiple event types, that incorporates cyclic or periodic components.
In many survival studies, the lifetimes of patients can be accurately observed only if it occurs within a window of observational time. The remaining lifetimes are only known to be either left censored or right censored. The data so obtained are referred to as doubly censored data. The analysis of doubly censored data has been studied by several researchers. Pioneering works include Turnbull (1974), Chang (1990), and Gu and Zhang (1993). Regression analysis for doubly censored data has been discussed by Cai and Cheng (2004) and semiparametric transformation models for doubly censored data were studied by Shen (2011). Ji et al. (2012) proposed a quantile regression method for doubly censored data. Proportional hazard models for the analysis of doubly censored data with multiple modes of failure have been proposed by Sankaran and Sreedevi (2016), while Shen and Chen (2018) and Shen (2022) introduced Aalen’s linear model and the Cox–Aalen model for doubly censored data. Recently, Li et al. (2020) presented a regression analysis method for multivariate doubly censored data using flexible semiparametric transformation frailty models, and Choi et al. (2021) proposed the Buckley–James method for an accelerated failure time model under double random censoring. Additionally, Choi and Huang (2021) discussed nonparametric maximum-likelihood estimation of semiparametric transformation models for double censored data. A joint model for the birth distribution and lifetime distribution in the context of heterogeneous survival data under double truncation is studied by Dörre (2021).
While recurrent event times are often subject to right censoring, there are situations where the data may also have other types of incomplete information. For instance, recurrent event data with general interval censoring has been studied by Chen et al. (2005) and Shen and Cook (2015). If the event of interest can occur more than once per subject, and these event recurrences can be subjected to both left censoring and right censoring, we obtain doubly censored recurrent event data. In such scenarios, the recurrence of an event can only be observed within an observational window defined by left and right censoring random variables, denoted as U and V, respectively.
Even though recurrent event data exposed to different types of censoring have been studied, recurrent event data subject to double censoring has not been explored yet. This motivates us to propose a semiparametric regression model for the analysis of doubly censored recurrent event data, based on marginal mean function of the cumulative number of recurrent events. The mean function characterizes the subject’s recurrence experience, and our proposed semiparametric regression model incorporates multiplicative covariate effects on the marginal mean function under double censoring. The effect of covariates on censoring random variables is also modeled using multiplicative intensity models.
The paper is organized as follows. In Sect. 2, we introduce the model and develop inference procedures to estimate regression parameters. The estimating equation approach is employed in inference problems. Asymptotic properties of the proposed estimators are discussed in detail. Estimation of the baseline cumulative mean function is also studied. A Monte Carlo simulation study is carried out in Sect. 3 to assess the finite sample behavior of the proposed estimators. The model is then applied to real-life data sets in Sect. 4. Finally, major conclusions of the study are presented in Sect. 5.
2 Model and inference procedures
Consider an event history study of some recurrent events that involve n independent subjects. Let \(N_i(t)\) be the cumulative number of events that have occurred up to time t, for \(0 \le t \le t^*\), \(i=1,2, \dots , n\) where \(t^*\) denotes the length of the study. Let \(T_{ij}\) be the recurrence times of the \(i^{th}\) subject for \(j=1,2, \dots , m_i\); \(i=1,2, \dots , n\). Assume that \(T_{ij}\)’s are doubly censored by the random variables \(U_i\) and \(V_i\); specifically left censored by \(U_i\) and right censored by \(V_i\). This means that we can observe the exact event recurrence time \(T_{ij}\) only when it falls within the observational window \([U_i, V_i]\), where \(U_i < V_i\), for \(i=1,2, \dots , n\). If \(T_{ij} < U_i\), the recurrence time will be left-censored, and if \(T_{ij} > V_i\), the recurrence time will be right-censored, where \(U_i\) and \(V_i\) are always observed. Denote \(X_{ij} = \text {min}\{\text {max}(T_{ij},U_i),V_i)\}\) for \(j=1,2, \dots , m_i\); \(i=1,2, \dots , n\). Let \({{\varvec{Z}}}_i\) be a \(p \times 1\) vector of covariates observed for each subject i, and we assume independence between \(T_i\) and \((U_i,V_i)\) given \({{\varvec{Z}}}_i\). Let \(E(N_i(t) \vert {{\varvec{Z}}}_i) = \mu (t \vert {{\varvec{Z}}}_i)\) be the mean function of \(N_i(t)\) conditional on \({{\varvec{Z}}}_i\). We consider the proportional mean model to specify the covariate effect on recurrence times, given by
where \(\mu _0(t)\) is the unspecified baseline mean function and \(\varvec{\beta }\) is the \(p \times 1\) vector of regression parameters. Furthermore, the covariate \({{\varvec{Z}}}_i\) may also have an effect on the censoring random variables \(U_i\) and \(V_i\).
To model the covariate effect on censoring random variables, we assume that the hazard rate function \(\lambda _1(t)\) of \(U_i\) satisfies the proportional hazards (PH) model
and due to the strict order restriction between the censoring variables \(U_i\) and \(V_i\), the hazard rate function \(\lambda _2(t)\) of \(V_i\) is assumed to satisfy the proportional hazards (PH) model (as discussed in Wang et al. (2010))
where \(\lambda _{10} (t)\) and \(\lambda _{20} (t)\) are completely unspecified baseline hazard functions, and \(\varvec{\gamma }\) and \(\varvec{\tau }\) are p-dimensional vector of regression parameters.
Define \(\delta _{1ij} = I(T_{ij} \le U_i)\), \(\delta _{2ij} =I(U_i < T_{ij} \le V_i)\) and \(\delta _{3ij} = I(T_{ij} > V_i)\), where I(A) is the indicator function of set A.
Denote
be the number of left-censored recurrence times and right-censored recurrence times, respectively, for \(i^{th}\) individual. Then
Also denote \(N_i^*(t) = \sum \nolimits _{j=1}^{m_i} I(X_{ij} \le t)\) and \(N_i(t) = \sum \nolimits _{j=1}^{m_i} I(T_{ij} \le t)\). Note that
Let \(W(t \vert {{\varvec{Z}}}_i) = P(U < t \le V\vert {{\varvec{Z}}}_i) = \bar{H}(t\vert {{\varvec{Z}}}_i) - \bar{G}(t\vert {{\varvec{Z}}}_i)\), where \(\bar{G}(t\vert {{\varvec{Z}}}_i)=P(U > t\vert {{\varvec{Z}}}_i)\) and \(\bar{H}(t\vert {{\varvec{Z}}}_i)=P(V \ge t\vert U_i, {{\varvec{Z}}}_i)\). Now, using the probability statement in (2.4), the consistent estimators of \(\mu _0(t)\) and \(\varvec{\beta }\) are obtained by solving the following two estimating equations:
and
where \(y_a\) and \(y_b\) are prespecified constants, such that \(P(X_{ij} \le y_a)\) and \(P(X_{ij} \ge y_b)\) are positive. Here, k(t) is a data-dependent weight function, and in practice, we use \(\hat{k}(t) = \frac{1}{N} \sum _{i=1}^{n} \sum _{j=1}^{m_i} I(X_{ij} \le t)\), where \(N = \sum _{i=1}^{n} m_i\). Thus, \(\hat{k}(t)\) is an increasing weight function that uniformly converges to the deterministic function k(t) within the interval \(t \in [y_a,y_b]\).
The quantity \(W(t \vert {{\varvec{Z}}}_i)\) in (2.5) and (2.6) is unknown. To estimate \(W(t \vert {{\varvec{Z}}})\), we define \(\Lambda _{10}(t) = \int _{0}^{t} \lambda _{10} (s) ds \) and \(\Lambda _{20}(t) = \int _{0}^{t} \lambda _{20} (s) ds \) as the cumulative hazard rate functions of left censoring random variables U and right censoring random variables V, respectively. Then, the survival functions \(\bar{G}(t \vert {{\varvec{Z}}})\) and \(\bar{H}(t \vert {{\varvec{Z}}})\) are given by
and
To estimate \(\varvec{\gamma }\), we consider the equation
To estimate \(\varvec{\tau }\), we consider the equation
Let \(\hat{\varvec{\gamma }}\) and \(\hat{\varvec{\tau }}\) be the estimators of \(\varvec{\gamma }\) and \(\varvec{\tau }\) given by \(U_1(\varvec{\gamma }) =0\) and \(U_2(\varvec{\tau }) = 0\). The estimators of \(\Lambda _{10}(t)\) and \(\Lambda _{20}(t)\) are Breslow type estimators. That is, for \(\Lambda _{10}(t) \), the estimator is
For \(\Lambda _{20}(t) \), the estimator is given by
Then, the survival functions \(\bar{G}(t \vert {{\varvec{Z}}})\) and \(\bar{H}(t \vert {{\varvec{Z}}})\) can be estimated as \(\bar{G}_n(t\vert {{\varvec{Z}}})\) and \(\bar{H}_n(t\vert {{\varvec{Z}}})\), where
and
Now, by substituting \(W(t\vert {{\varvec{Z}}})\) with \(\bar{W}_n(t\vert {{\varvec{Z}}}) = \bar{H}_n(t\vert {{\varvec{Z}}}) - \bar{G}_n(t\vert {{\varvec{Z}}})\) in estimating equations (2.5) and (2.6), one can estimate \(\mu _0(t)\) and \(\varvec{\beta }\).
We can estimate the baseline mean function \(\mu _0(t)\) as
Remark 2.1
When covariates do not have effect on the censoring times U and V, a simple estimate for \(W(t\vert {{\varvec{Z}}})\) is proposed as \(W_n^{*}(t) = \frac{1}{n} \sum \nolimits _{i=1}^{n} I(U_i < t \le V_i)\). This can be substituted in (2.5) and (2.6) and the estimators of \(\mu _0(t)\) and \(\varvec{\beta }\) can be obtained by solving those equations. We can note that, when \(m_i=1\) for all i, this situation reduces to the model studied in Cai and Cheng (2004).
2.1 Asymptotic properties
We now discuss the asymptotic properties of the estimators of regression parameters. We prove the consistency and asymptotic normality of the estimators of \(\varvec{\beta }, \varvec{\gamma }\) and \(\varvec{\tau }\). To establish the asymptotic properties, we define the martingale
where \(Y_i(t)\) represents the at-risk process, and we assume the following regularity conditions, similar to those given in Liu et al. (2010) and Du and Lv (2022):
- C1::
-
\(\{N_i^{*}(.),N_i^{(1)}(.), N_i^{(2)}(.), Y_i(.),{{\varvec{Z}}}_i, i=1,2,\ldots ,n\}\) are independent and identically distributed, where \(Y_i(t)\) is the at-risk process.
- C2::
-
\(N_i^{*}(.),N_i^{(1)}(.)\) and \(N_i^{(2)}(.)\) are bounded by constants.
- C3::
-
\({{\varvec{Z}}}_i,i=1,2,\ldots ,n\) are of bounded total variation on \([0,t^{*}]\).
- C4::
-
The matrices B, P and A defined in Appendix are non-singular.
The consistency and asymptotic normality of \(\hat{\varvec{\gamma }}\) and \(\hat{\varvec{\tau }}\), the regression coefficients for the left and right censoring times, respectively, are detailed in the Appendix.
Theorem 2.1
Under the regularity conditions C1–C4, the estimator \(\hat{\varvec{\beta }}\) is strongly consistent and \(\sqrt{n} (\hat{\varvec{\beta }} - \varvec{\beta })\) can be approximated by a \(p-\)variate normal distribution with mean vector 0 and variance–covariance matrix given by \(\Sigma \), where
where A and \(\Omega _i\) are specified in Appendix in (A.7) and (A.9), respectively.
Theorem 2.2
Under the regularity conditions C1–C4, \(\sqrt{n}(\hat{\mu }_0(t)-\mu _0(t))\) converges weakly to a zero-mean Gaussian process with covariance function at (s, t) given by
where
with
A consistent estimator of \(\Gamma (s,t)\) is given by
where
with \(\hat{M}_i(t)\) and \(\hat{D}(t,\hat{\varvec{\beta }})\) are obtained from (2.12) and (2.13), respectively, by replacing all the unknown quantities with their corresponding sample estimators. The terms \(\bar{{{\varvec{Z}}}}(t)\) and \(\hat{A}^{-1}\) are defined in (A.4) and (A.10), respectively.
The proofs of the above theorems are given in Appendix. Note that the variance–covariance matrix of the estimator of \(\varvec{\beta }\) does not yield to a closed form. In practice, one can use bootstrap re-sampling technique to find the standard error of \(\varvec{\hat{\beta }}\). The confidence intervals of \(\varvec{\hat{\beta }}\) can be constructed using the asymptotic normality of the estimator \(\varvec{\hat{\beta }}\).
3 Simulation study
A Monte Carlo simulation study is carried out to assess the finite sample behavior of the proposed estimators. We consider two covariates \(Z_1\) and \(Z_2\), where \(Z_1\) is generated from a Bernoulli distribution with a probability of success 0.5 and \(Z_2\) is generated from a standard normal distribution. Based on these two covariates, under the proportional hazard assumption, the left and right censoring times U and V are generated separately, and then, the recurrence times are simulated accordingly. The data generation for (U, V) is independent of the recurrence times, given \((Z_1,Z_2)\).
Following (2.2) and (2.3), we consider three different combinations for modeling the left and right censoring time. The left and right censoring times are assumed to follow the following three models:
-
(i)
A Gompertz proportional hazard model with baseline hazard function
$$\begin{aligned} \lambda _{10}(t) = \theta _1 \exp \{\alpha _1 t \} \text { and } \lambda _{20}(t) = \theta _2 \exp \{\alpha _2 t \}, \end{aligned}$$where \(\theta _1=0.2\), \(\alpha _1=0.1\), \(\theta _2=0.1\), and \(\alpha _2=0.1\). We assume the regression coefficients \(\varvec{\gamma }=\varvec{\tau }=(1,2)^{\text {T}}\).
-
(ii)
A Weibull proportional hazard model with with baseline hazard function
$$\begin{aligned} \lambda _{10}(t) = \theta _1 \alpha _1 t^{\alpha _1 - 1} \text { and } \lambda _{20}(t) = \theta _2 \alpha _2 t^{\alpha _2 - 1}, \end{aligned}$$where \(\theta _1=0.2\), \(\alpha _1=2\), \(\theta _2=0.3\), and \(\alpha _2=2.5\). Here, \(\varvec{\gamma }=\varvec{\tau }=(1.5,1)^{\text {T}}\).
-
(iii)
A baseline power model with baseline hazard function
$$\begin{aligned} \lambda _{10}(t) = t^{\alpha _1} \text { and } \lambda _{20}(t) = t^{\alpha _2}, \end{aligned}$$where \(\alpha _1=1\), and \(\alpha _2=2\). In this case, \(\varvec{\gamma }=\varvec{\tau }=(1,2)^{\text {T}}\).
In all the cases, the left censoring times, U, take values in [0, 5]. To ensure a sufficient follow-up time for every subject, right censoring times, V, are generated between \(U+5\) and \(U+20\). The recurrence times follow a Gompertz proportional hazards model:
with \(\theta =0.1\) and \(\alpha =0.1\) and \(\lambda (t \vert {{\varvec{Z}}})\) as the corresponding hazard rate function in presence of covariates. We generate 1000 data samples of sizes n = 50, 100, and 200 each with different choices of \(\beta _1\) and \(\beta _2\). In our simulation, we do not fix the censoring rates but rather consider random censoring proportions for different parameter combinations. Various combinations of parameters result in different proportions for left and right censoring times and varying levels of the average number of recurrences per subject. For example, here is a summary of the censoring proportions observed for some of the combinations in our study:
For Model (i) with a Gompertz proportional hazards model for left and right censoring times, with \(\beta _1=1.3\) and \(\beta _2=2\), the average number of events per subject is 21.14. The left censoring rate is found to be 8.91% of the total events, while the right censoring rate is 35.75% of the total events. The double censored data have an average of 11.48 events per subject. Under the same setting but with \(\beta _1=0.2\) and \(\beta _2=0.4\), the average number of events per subject reduces to 4.18. In this case, the left censoring rate is 10.47% and the right censoring rate is 25.49%. The double censored data have an average of 2.17 events per subject.
Similarly, for Model (ii) with \(\beta _1=1.5\) and \(\beta _2=1.9\), the average number of events per subject is 28.03. The left censoring rate is 10.56% and the right censoring rate is 8.51%. For the same model but with \(\beta _1=0.2\) and \(\beta _2=0.4\), the average number of events per subject is 2.55. In this case, the left censoring rate is 9.94% and the right censoring rate is 27.69%.
For Model (iii) with \(\beta _1=-0.5\) and \(\beta _2=-1.0\), the average number of events per subject is 1.83. The left censoring rate is 11.19% and the right censoring rate is 31.60%. When considering \(\beta _1=1.3\) and \(\beta _2=2.0\), the average number of events per subject is 24.66. The left censoring rate is 12.43% and the right censoring rate is 6.03%.
The absolute bias, mean squared error (MSE), and \(95\%\) coverage probability (CP) of the regression parameters for all the above models are calculated. Since, for all choices of \(\beta _1\) and \(\beta _2\), the results are similar, we present the results for randomly chosen four combinations of \(\beta _1\) and \(\beta _2\), for each model in Tables 1–3.
Tables 1–3 demonstrate that both absolute bias and MSE decrease as the sample size increases, indicating the consistency of the proposed estimation procedure. Furthermore, the coverage probability stabilizes near the chosen significance level. We also note that the estimated baseline mean functions for different combinations are of a similar nature. Therefore, we present only the estimated baseline mean functions for case (i) with \(\beta _1=1.5\) and \(\beta _2=1.9\), for both the choices of \(\bar{W}_n(t \mid {{\varvec{Z}}})\) and \(W_n^{*}(t \mid {{\varvec{Z}}})\) in Fig. 1. In Fig. 1, the blue line represents the estimated values of \(\mu _0(t)\) using the weight function \(\bar{W}_n(t \mid {{\varvec{Z}}})\), green line represents the estimated values of \(\mu _0(t)\) using the weight function \(W_n^{*}(t \mid {{\varvec{Z}}})\), and the red line represents the true values. We can observe that toward the end of the follow-up times, the estimator slightly overestimates the baseline mean function, for both choices of weight functions.
4 Data analysis
The inference procedure discussed in Sect. 2 is illustrated using two real-life data sets.
4.1 Multiple tumor recurrence data of bladder cancer patients
We apply the proposed methods to the multiple tumor recurrence data for patients with bladder cancer given in Wei et al. (1989). These data were obtained in a randomized clinical trial conducted by the Veterans Administration Co-operative Urological Research Group (VACURG) in which the patients with superficial bladder tumors were randomly assigned to one of three treatments: thiotepa, pyridoxine, and placebo. This data set is recently analyzed by Du and Lv (2022). The data are available in the R package survival (Therneau, 2023). In this analysis, we consider the patients treated with either placebo or thiotepa and having at least 1 month of follow-up, which results in the analysis of 47 patients assigned to placebo and 38 patients assigned to thiotepa. The right censoring times (V) are taken as the maximum follow-up time for each patient. Since the data are not doubly censored as such, we have chosen the left censoring times (U) randomly by observing the data. For patients with follow-up time greater than 8 months, U is generated randomly as a time less than or equal to 8 months. For all other patients, a time less than or equal to its right censoring time is randomly assigned as its left censoring time U. In the new data, 14 out of 112 events were lost due to left censoring.
The main interest of this analysis is to study the influence of thiotepa on recurrence frequency and the dependence of such recurrence rates on the size of initial tumors. Therefore, the covariates considered are the treatment group (0—thiotepa; 1—placebo) (\(Z_1\)) and the number of initial tumors (\(Z_2\)) for each of the patients. In these data, we have the left censoring times U and the right censoring times V are independent of the covariate vector \({{\varvec{Z}}}=(Z_1,Z_2)\).
The estimates of the model parameters, along with their corresponding standard errors (SE), are provided in Table 4. We also obtain estimates using the Prentice, Williams, and Peterson total time (PWP-TT) model (Prentice et al., 1981), which only considers right censoring and was fitted to the original dataset. From Table 4, it is evident that the estimates obtained from our proposed model align with those from the PWP-TT model. This agreement demonstrates the significant reduction in tumor recurrences for cancer patients with thiotepa treatment. The estimate of \(\mu _0(t)\) is depicted in Fig. 2, showing an increase in the baseline mean function after 46 months.
4.2 Study of gamma interferon in chronic granulomatous disease
We consider the Chronic Granulomatous Disease (CGD) data described in Fleming and Harrington (1991) for illustration. The dataset is accessible within the R package survival (Therneau, 2023). CGD refers to a set of inherited rare immune system disorders characterized by recurrent pyogenic infections that typically appear early in life and may lead to death in childhood. The efficacy of gamma interferon in reducing the rate of serious infections in CGD patients was investigated in a double blinded clinical trial in which patients were randomly assigned to receive either gamma interferon or a placebo. Out of 128 patients enrolled, 63 were randomized into the gamma interferon group and the rest to the placebo group. Among the placebo group, 30 patients experienced at least one infection, while 14 patients in the gamma interferon group encountered at least one infection.
We randomly chose left censoring times (U) by observing the data, because the data are not doubly censored as such. For patients having follow-up periods greater than 200 days, the left censoring time is randomly selected to be less than or equal to 200 days. All other patient’s left censoring times are chosen randomly to be less than or equal to their right censoring time. In the modified data, 20 out of 76 events were lost due to left censoring. Several covariates were recorded, but we focus on two of them, treatment group \(Z_1\), (\(Z_1=0\) for gamma interferon and \(Z_1=1\) for placebo) and the patient’s age \(Z_2\). We can see that the left and right censoring times are independent of the covariates for this data.
The estimates of the model parameters along with their standard errors (SE) are presented in Table 5. We also calculate the estimates of regression parameters using the PWP-TT model on the original data set, and the results are shown in Table 4. It is evident that the estimates obtained from our proposed model are consistent with those from the PWP-TT model. This shows that in people with CGD, gamma interferon decreases infection recurrence. Additionally, as patients get older, the rate of severe infections declines. The estimate of \(\mu _0(t)\) is plotted in Fig. 3.
5 Conclusion
Recurrent event data subject to double censoring are often encountered in clinical and other lifetime studies. In this article, we proposed a semiparametric regression model for recurrent event data, which specifies the multiplicative covariate effect on the marginal mean function under double censoring. The estimating equation approach is employed for the estimation of regression parameters. Asymptotic properties of the proposed estimates are discussed in detail. The results of the Monte Carlo simulation study make sure that the procedures followed are efficient. The proposed method was demonstrated on two real data sets: the multiple tumor recurrence data for bladder cancer patients, and a study of gamma interferon in chronic granulomatous disease.
Several open problems remain to be addressed in the analysis of recurrent event data subject to double censoring. In this article, we considered time-independent covariates. However, recurrent event data often involve time-dependent covariates as well. Miloslavsky et al. (2004) proposed an inverse probability of censoring weighted estimator for the regression parameters in the Andersen–Gill model for recurrent event data in the presence of time-dependent covariates and dependent censoring. Analysis of multivariate recurrent data in the presence of time-dependent covariates was studied by Sun et al. (2009) and Zhao et al. (2012). Regression analysis of recurrent event data with time-dependent covariates and informative censoring was studied by Huang et al. (2010) which allows correlations between censoring times and recurrent event process via frailty. Therefore, it is of interest to develop methods for the regression analysis of doubly censored recurrent events in the presence of time-dependent covariates, and work in this direction will be studied separately.
In the analysis of recurrent event data, two common time scales are often employed: the total time and the time since the last event (gap time). Recurrent event data subject to double censoring can be modeled using the gap times between recurrent events. A semiparametric transformation model can be developed for such data type, which include proportional hazards and proportional odds models as special cases. Further, one can develop inference procedures for doubly censored recurrent event data with multiple modes of recurrence. The modeling and analysis of doubly censored recurrent event data using frailty models is not yet discussed in literature. These problems are under investigation and will be reported in separate studies.
Data availability
The source of the data set used for illustration purpose is mentioned in the manuscript.
References
Andersen, P. K., & Gill, R. D. (1982). Cox’s regression model for counting processes: a large sample study. The Annals of Statistics, 10(4), 1100–1120.
Cai, T., & Cheng, S. (2004). Semiparametric regression analysis for doubly censored data. Biometrika, 91(2), 277–290.
Chang, M. N. (1990). Weak convergence of a self-consistent estimator of the survival function with doubly censored data. The Annals of Statistics, 18(1), 391–404.
Chen, B. E., Cook, R. J., Lawless, J. F., & Zhan, M. (2005). Statistical methods for multivariate interval-censored recurrent events. Statistics in Medicine, 24(5), 671–691.
Choi, S., & Huang, X. (2021). Efficient inferences for linear transformation models with doubly censored data. Communications in Statistics-Theory and Methods, 50(9), 2188–2200.
Choi, T., Kim, A. K., & Choi, S. (2021). Semiparametric least-squares regression with doubly-censored data. Computational Statistics & Data Analysis, 164, 107306.
Cook, R. J., & Lawless, J. F. (2007). The Statistical Analysis of Recurrent Events. New York: Springer.
Dörre, A. (2021). Semiparametric likelihood inference for heterogeneous survival data under double truncation based on a Poisson birth process. Japanese Journal of Statistics and Data Science, 4(2), 1203–1226.
Du, Y., & Lv, Y. (2022). A flexible additive-multiplicative transformation mean model for recurrent event data. Communications in Statistics-Theory and Methods, 51(2), 328–339.
Fleming, T. R., & Harrington, D. P. (1991). Counting Processes and Survival Analysis. New Jersey: John Wiley & Sons.
Gu, M., & Zhang, C.-H. (1993). Asymptotic properties of self-consistent estimators based on doubly censored data. The Annals of Statistics, 21(2), 611–624.
Huang, C.-Y., Qin, J., & Wang, M.-C. (2010). Semiparametric analysis for recurrent event data with time-dependent covariates and informative censoring. Biometrics, 66(1), 39–49.
Ji, S., Peng, L., Cheng, Y., & Lai, H. (2012). Quantile regression for doubly censored data. Biometrics, 68(1), 101–112.
Kelly, P. J., & Lim, L. L. Y. (2000). Survival analysis for recurrent event data: an application to childhood infectious diseases. Statistics in Medicine, 19(1), 13–33.
Lawless, J. F. (2011). Statistical Models and Methods for Lifetime Data. New Jersey: John Wiley & Sons.
Lawless, J. F., & Nadeau, C. (1995). Some simple robust methods for the analysis of recurrent events. Technometrics, 37(2), 158–168.
Li, S., Hu, T., Tong, T., & Sun, J. (2020). Semiparametric regression analysis of multivariate doubly censored data. Statistical Modelling, 20(5), 502–526.
Lin, D. Y., Wei, L.-J., Yang, I., & Ying, Z. (2000). Semiparametric regression for the mean and rate functions of recurrent events. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(4), 711–730.
Liu, Y., Wu, Y., Cai, J., & Zhou, H. (2010). Additive-multiplicative rates model for recurrent events. Lifetime Data Analysis, 16(3), 353–373.
Miloslavsky, M., Keleş, S., Laan, M. J., & Butler, S. (2004). Recurrent events analysis in the presence of time-dependent covariates and dependent censoring. Journal of the Royal Statistical Society Series B: Statistical Methodology, 66(1), 239–257.
Pepe, M. S., & Cai, J. (1993). Some graphical displays and marginal regression analyses for recurrent failure times and time dependent covariates. Journal of the American Statistical Association, 88(423), 811–820.
Prentice, R. L., Williams, B. J., & Peterson, A. V. (1981). On the regression analysis of multivariate failure time data. Biometrika, 68(2), 373–379.
Sankaran, P. G., & Anisha, P. (2011). Shared frailty model for recurrent event data with multiple causes. Journal of Applied Statistics, 38(12), 2859–2868.
Sankaran, P. G., & Sreedevi, E. P. (2016). A proportional hazards model for the analysis of doubly censored competing risks data. Communications in Statistics-Theory and Methods, 45(10), 2975–2987.
Shen, P.-S. (2011). Semiparametric analysis of transformation models with doubly censored data. Journal of Applied Statistics, 38(4), 675–682.
Shen, P.-S. (2022). The Cox-Aalen model for doubly censored data. Communications in Statistics-Theory and Methods, 51(23), 8075–8092.
Shen, P.-S., & Chen, C.-M. (2018). Aalen’s linear model for doubly censored data. Statistics, 52(6), 1328–1343.
Shen, H., & Cook, R. J. (2015). Analysis of interval-censored recurrent event processes subject to resolution. Biometrical Journal, 57(5), 725–742.
Su, C.-L., & Lin, F.-C. (2021). Analysis of cyclic recurrent event data with multiple event types. Japanese Journal of Statistics and Data Science, 4(2), 895–915.
Sun, J., & Wei, L. (2000). Regression analysis of panel count data with covariate-dependent observation and censoring times. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(2), 293–302.
Sun, L., Zhu, L., & Sun, J. (2009). Regression analysis of multivariate recurrent event data with time-varying covariate effects. Journal of Multivariate Aalysis, 100(10), 2214–2223.
Therneau, T. M. (2023). A Package for Survival Analysis in R. R package version 3.5-7.
Turnbull, B. W. (1974). Nonparametric estimation of a survivorship function with doubly censored data. Journal of the American Statistical Association, 69(345), 169–173.
Wang, L., Sun, J., & Tong, X. (2010). Regression analysis of case II interval-censored failure time data with the additive hazards model. Statistica Sinica, 20(4), 1709.
Wei, L. J., Lin, D. Y., & Weissfeld, L. (1989). Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. Journal of the American Statistical Association, 84(408), 1065–1073.
Zeng, D., & Lin, D. (2007). Semiparametric transformation models with random effects for recurrent events. Journal of the American Statistical Association, 102(477), 167–180.
Zhao, X., Liu, L., Liu, Y., & Xu, W. (2012). Analysis of multivariate recurrent event data with time-dependent covariates and informative censoring. Biometrical Journal, 54(5), 585–599.
Acknowledgements
The authors would like to express their gratitude to the associate editor and the unknown referees for their valuable comments and constructive suggestions, which have greatly contributed to the improvement of this research paper.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that there is no conflict of financial or non-financial interests that are directly or indirectly related to this research work.
Ethical statement
The submitted work is original and has not been published elsewhere.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix
A: Proofs of results in Sect. 2.1
We make use of the results in Cai and Cheng (2004), for doubly censored data in a non-recurrent setup to prove the theorems stated in Sect. 2.1. Sun and Wei (2000) proved the asymptotic results of the estimators of regression parameters of panel count data with covariate-dependent observation and censoring times, in a similar way. Furthermore, the consistency and asymptotic normality of \(\hat{\varvec{\gamma }}\) and \(\hat{\varvec{\tau }}\), which are the regression coefficients in a proportional hazard model, have already been established in the literature and are stated below.
Theorem A.1
Under the regularity conditions C1–C4, the estimator \(\hat{\varvec{\gamma }}\) is consistent. Further, \(\sqrt{n} (\hat{\varvec{\gamma }} - \varvec{\gamma })\) can be approximated by a \(p-\)variate normal distribution with mean vector 0 and variance–covariance matrix given by
where
and
with \(\bar{N}_1(t) = \sum \limits _{i=1}^{n} N_i^{(1)}(t)\).
Theorem A.2
Under the regularity conditions C1–C4, the estimator \(\hat{\varvec{\tau }}\) is consistent. Also, \(\sqrt{n} (\hat{\varvec{\tau }} - \varvec{\tau })\) can be approximated by a \(p-\)variate normal distribution with mean vector 0 and variance–covariance matrix given by
where
and
with \(\bar{N}_2(t) = \sum \limits _{i=1}^{n} N_i^{(2)}(t)\) (see Lawless and Nadeau (1995)).
Proof of Theorem 2.1
Proof
Substituting (2.5) in (2.6) , followed by simple algebraic manipulation, yields
where
Now, using the idea discussed in Du and Lv (2022), \(U(\varvec{\beta })\) can be written as
Using the Taylor expansion of \(U(\varvec{\beta })\) around \(\varvec{\beta }_0\), where \(\varvec{\beta }_0\) is the true value of \(\varvec{\beta }\), yields
where
with \(\bar{{{\varvec{z}}}}(t)\) as the limit of \(\bar{{{\varvec{Z}}}}(t)\) (Du & Lv, 2022). Further, note that, \(\sup _{t} \vert \hat{\Lambda }_l(t\vert {{\varvec{Z}}}) - \Lambda _l(t\vert {{\varvec{Z}}}) \vert \rightarrow 0\) for \(l=1,2\), almost surely (see Lawless (2011)). Then, it is easy to follow that \(\sup _{t} \vert \bar{G}_n(t\vert {{\varvec{Z}}}) - \bar{G}(t\vert {{\varvec{Z}}}) \vert \rightarrow 0\) and \(\sup _{t} \vert \bar{H}_n(t\vert {{\varvec{Z}}}) - \bar{H}(t\vert {{\varvec{Z}}}) \vert \rightarrow 0\) almost surely. Thus, \(\sup _{t} \vert \bar{W}_n(t\vert {{\varvec{Z}}}) - \bar{W}(t\vert {{\varvec{Z}}}) \vert \rightarrow 0\) almost surely. Along with this, using the arguments similar to that of Lin et al. (2000), it follows that \(\sqrt{n} (\hat{\varvec{\beta }} - \varvec{\beta })\) can be approximated by a \(p-\)variate normal distribution with mean vector 0 and variance–covariance matrix given by \(\Sigma \), where
with
Now, a consistent estimator of \(\Sigma \) is given by \(\frac{1}{n} \hat{A}^{-1} \left( \sum \nolimits _{i=1}^{n} \hat{\Omega _i} \hat{\Omega }_i ^{\text {T}} \right) \hat{A}^{-1}\), where \(\hat{\Omega }_i\) is obtained by replacing all the theoretical quantities in \({\Omega _i}\) with their estimates and
Similar results are derived in Cai and Cheng (2004) for double censoring data in a non-recurrent scenario. \(\square \)
Proof of Theorem 2.2
Proof
We first write
Using Taylor series expansion of \(\hat{\mu }_0(t)\) in \(\varvec{\beta }_0\), the first term on the right side of A.11, equals to
By the uniform strong law of large numbers and Lemma 1 of Lin et al. (2000), we obtain
almost surely, where \(D(t,\varvec{\beta }_0)\) is given in Theorem 2.2. Also, we have
almost surely. From (A.6)
Therefore, by (A.11)–(A.13), we get
In addition, for \(0 \le t \le \tau ^{'}\)
Hence, Lemma 1 of Lin et al. (2000) implies that uniformly in t
Finally, it follows from (A.11)–(A.16) that:
uniformly in t, where \(\psi _i(t)\) is defined in Theorem 2.2. Hence, for each t, \(\sqrt{n}\{\hat{\mu }_0(t ; \hat{\varvec{\beta }})-\mu _0(t)\}\) follows an asymptotic behavior similar to a sum of independent and identically distributed mean-zero variables. By employing the multivariate central limit theorem, this expression converges, in finite-dimensional distribution, to a zero-mean Gaussian process. Utilizing modern empirical process theory, as described in Lin et al. (2000), we can ascertain that \(\psi _i(t)\) is a tight function. Consequently, \(\sqrt{n}\{\hat{\mu }_0(t ; \hat{\varvec{\beta }})-\mu _0(t)\}\) is also a tight function and weakly converges to a mean-zero Gaussian process, with a covariance function \(\Gamma (s,t)=E({\Psi }_1(s){\Psi }_1(t))\) at (s, t) . \(\square \)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sankaran, P.G., Hari, S. & Sreedevi, E.P. Semiparametric regression analysis of doubly censored recurrent event data. Jpn J Stat Data Sci 7, 183–202 (2024). https://doi.org/10.1007/s42081-023-00234-x
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s42081-023-00234-x