1 Introduction

Reliability and survival studies often involve events that can occur more than once per subject, known as recurrent events. These types of outcomes are prevalent in various fields, such as biology, medicine, reliability engineering, and social sciences. Medical examples of recurrent events include tumor recurrences, sequences of asthmatic attacks, and multiple infection episodes. Recurrent events can also occur in other contexts, such as the recurrence of economic recessions, traffic accidents, or repeated breakdowns of devices like mobile phones or computers (Du & Lv, 2022).

Regression analysis methods for recurrent events can be classified into two categories: conditional methods and marginal techniques. Conditional methods involve modeling the intensity or hazard rate function (Prentice et al., 1981; Andersen & Gill, 1982), while marginal techniques focus on mean functions or rate functions (Pepe & Cai, 1993; Lawless & Nadeau, 1995; Lin et al., 2000). Marginal methods have the advantage of providing an easier interpretation of the mean number of recurrent events compared to the intensity of the event. Frailty models have been proposed for recurrent event data by Zeng and Lin (2007) and Sankaran and Anisha (2011), while Kelly and Lim (2000) provide a comparison of various approaches for recurrent event analysis. A comprehensive review of recurrent event analysis is presented by Cook and Lawless (2007). Su and Lin (2021) discussed a semiparametric rate model for recurrent event data with multiple event types, that incorporates cyclic or periodic components.

In many survival studies, the lifetimes of patients can be accurately observed only if it occurs within a window of observational time. The remaining lifetimes are only known to be either left censored or right censored. The data so obtained are referred to as doubly censored data. The analysis of doubly censored data has been studied by several researchers. Pioneering works include Turnbull (1974), Chang (1990), and Gu and Zhang (1993). Regression analysis for doubly censored data has been discussed by Cai and Cheng (2004) and semiparametric transformation models for doubly censored data were studied by Shen (2011). Ji et al. (2012) proposed a quantile regression method for doubly censored data. Proportional hazard models for the analysis of doubly censored data with multiple modes of failure have been proposed by Sankaran and Sreedevi (2016), while Shen and Chen (2018) and Shen (2022) introduced Aalen’s linear model and the Cox–Aalen model for doubly censored data. Recently, Li et al. (2020) presented a regression analysis method for multivariate doubly censored data using flexible semiparametric transformation frailty models, and Choi et al. (2021) proposed the Buckley–James method for an accelerated failure time model under double random censoring. Additionally, Choi and Huang (2021) discussed nonparametric maximum-likelihood estimation of semiparametric transformation models for double censored data. A joint model for the birth distribution and lifetime distribution in the context of heterogeneous survival data under double truncation is studied by Dörre (2021).

While recurrent event times are often subject to right censoring, there are situations where the data may also have other types of incomplete information. For instance, recurrent event data with general interval censoring has been studied by Chen et al. (2005) and Shen and Cook (2015). If the event of interest can occur more than once per subject, and these event recurrences can be subjected to both left censoring and right censoring, we obtain doubly censored recurrent event data. In such scenarios, the recurrence of an event can only be observed within an observational window defined by left and right censoring random variables, denoted as U and V, respectively.

Even though recurrent event data exposed to different types of censoring have been studied, recurrent event data subject to double censoring has not been explored yet. This motivates us to propose a semiparametric regression model for the analysis of doubly censored recurrent event data, based on marginal mean function of the cumulative number of recurrent events. The mean function characterizes the subject’s recurrence experience, and our proposed semiparametric regression model incorporates multiplicative covariate effects on the marginal mean function under double censoring. The effect of covariates on censoring random variables is also modeled using multiplicative intensity models.

The paper is organized as follows. In Sect. 2, we introduce the model and develop inference procedures to estimate regression parameters. The estimating equation approach is employed in inference problems. Asymptotic properties of the proposed estimators are discussed in detail. Estimation of the baseline cumulative mean function is also studied. A Monte Carlo simulation study is carried out in Sect. 3 to assess the finite sample behavior of the proposed estimators. The model is then applied to real-life data sets in Sect. 4. Finally, major conclusions of the study are presented in Sect. 5.

2 Model and inference procedures

Consider an event history study of some recurrent events that involve n independent subjects. Let \(N_i(t)\) be the cumulative number of events that have occurred up to time t, for \(0 \le t \le t^*\), \(i=1,2, \dots , n\) where \(t^*\) denotes the length of the study. Let \(T_{ij}\) be the recurrence times of the \(i^{th}\) subject for \(j=1,2, \dots , m_i\); \(i=1,2, \dots , n\). Assume that \(T_{ij}\)’s are doubly censored by the random variables \(U_i\) and \(V_i\); specifically left censored by \(U_i\) and right censored by \(V_i\). This means that we can observe the exact event recurrence time \(T_{ij}\) only when it falls within the observational window \([U_i, V_i]\), where \(U_i < V_i\), for \(i=1,2, \dots , n\). If \(T_{ij} < U_i\), the recurrence time will be left-censored, and if \(T_{ij} > V_i\), the recurrence time will be right-censored, where \(U_i\) and \(V_i\) are always observed. Denote \(X_{ij} = \text {min}\{\text {max}(T_{ij},U_i),V_i)\}\) for \(j=1,2, \dots , m_i\); \(i=1,2, \dots , n\). Let \({{\varvec{Z}}}_i\) be a \(p \times 1\) vector of covariates observed for each subject i, and we assume independence between \(T_i\) and \((U_i,V_i)\) given \({{\varvec{Z}}}_i\). Let \(E(N_i(t) \vert {{\varvec{Z}}}_i) = \mu (t \vert {{\varvec{Z}}}_i)\) be the mean function of \(N_i(t)\) conditional on \({{\varvec{Z}}}_i\). We consider the proportional mean model to specify the covariate effect on recurrence times, given by

$$\begin{aligned} \mu (t \vert {{\varvec{Z}}}_i) = \mu _0(t) \exp (\varvec{\beta }^{\text {T}}{{\varvec{Z}}}_i), \end{aligned}$$
(2.1)

where \(\mu _0(t)\) is the unspecified baseline mean function and \(\varvec{\beta }\) is the \(p \times 1\) vector of regression parameters. Furthermore, the covariate \({{\varvec{Z}}}_i\) may also have an effect on the censoring random variables \(U_i\) and \(V_i\).

To model the covariate effect on censoring random variables, we assume that the hazard rate function \(\lambda _1(t)\) of \(U_i\) satisfies the proportional hazards (PH) model

$$\begin{aligned} \lambda _1(t \vert {{\varvec{Z}}}_i) = \lambda _{10}(t) \exp (\varvec{\gamma }^{\text {T}}{{\varvec{Z}}}_i), \end{aligned}$$
(2.2)

and due to the strict order restriction between the censoring variables \(U_i\) and \(V_i\), the hazard rate function \(\lambda _2(t)\) of \(V_i\) is assumed to satisfy the proportional hazards (PH) model (as discussed in Wang et al. (2010))

$$\begin{aligned} \lambda _2(t \vert U_i, {{\varvec{Z}}}_i) = I(t > U_i) \lambda _{20}(t) \exp (\varvec{\tau }^{\text {T}}{{\varvec{Z}}}_i), \end{aligned}$$
(2.3)

where \(\lambda _{10} (t)\) and \(\lambda _{20} (t)\) are completely unspecified baseline hazard functions, and \(\varvec{\gamma }\) and \(\varvec{\tau }\) are p-dimensional vector of regression parameters.

Define \(\delta _{1ij} = I(T_{ij} \le U_i)\), \(\delta _{2ij} =I(U_i < T_{ij} \le V_i)\) and \(\delta _{3ij} = I(T_{ij} > V_i)\), where I(A) is the indicator function of set A.

Denote

$$\begin{aligned} N_i^{(1)}(t) = \sum \limits _{j=1}^{m_i} I(X_{ij} \le t, \delta _{1ij}=1) \text { and } N_i^{(2)}(t) = \sum \limits _{j=1}^{m_i} I(X_{ij} \le t, \delta _{3ij}=1). \end{aligned}$$

be the number of left-censored recurrence times and right-censored recurrence times, respectively, for \(i^{th}\) individual. Then

$$\begin{aligned} E(dN_i^{(1)}(t) \vert {{\varvec{Z}}}_i) = \lambda _1(t \vert {{\varvec{Z}}}_i) \text { and } E(dN_i^{(2)}(t) \vert U_i, {{\varvec{Z}}}_i) = \lambda _2(t \vert U_i, {{\varvec{Z}}}_i). \end{aligned}$$

Also denote \(N_i^*(t) = \sum \nolimits _{j=1}^{m_i} I(X_{ij} \le t)\) and \(N_i(t) = \sum \nolimits _{j=1}^{m_i} I(T_{ij} \le t)\). Note that

$$\begin{aligned} P(X_{ij} \le t \le V_i \vert {{\varvec{Z}}}_i) = P(U_i < t \le V_i \vert {{\varvec{Z}}}_i) \cdot P(T_{ij} \le t \vert {{\varvec{Z}}}_i). \end{aligned}$$
(2.4)

Let \(W(t \vert {{\varvec{Z}}}_i) = P(U < t \le V\vert {{\varvec{Z}}}_i) = \bar{H}(t\vert {{\varvec{Z}}}_i) - \bar{G}(t\vert {{\varvec{Z}}}_i)\), where \(\bar{G}(t\vert {{\varvec{Z}}}_i)=P(U > t\vert {{\varvec{Z}}}_i)\) and \(\bar{H}(t\vert {{\varvec{Z}}}_i)=P(V \ge t\vert U_i, {{\varvec{Z}}}_i)\). Now, using the probability statement in (2.4), the consistent estimators of \(\mu _0(t)\) and \(\varvec{\beta }\) are obtained by solving the following two estimating equations:

$$\begin{aligned} \sum \limits _{i=1}^{n} [N_i^*(t) - W(t\vert {{\varvec{Z}}}_i) \mu _0(t) \exp (\varvec{\beta }^{\text {T}} {{\varvec{Z}}}_i)] = 0 \end{aligned}$$
(2.5)

and

$$\begin{aligned} \sum \limits _{i=1}^{n} \int _{y_a}^{y_b} {{\varvec{Z}}}_i [N_i^*(t) - W(t\vert {{\varvec{Z}}}_i) \mu _0(t) \exp (\varvec{\beta }^{\text {T}} {{\varvec{Z}}}_i)] d{k}(t) = 0, \end{aligned}$$
(2.6)

where \(y_a\) and \(y_b\) are prespecified constants, such that \(P(X_{ij} \le y_a)\) and \(P(X_{ij} \ge y_b)\) are positive. Here, k(t) is a data-dependent weight function, and in practice, we use \(\hat{k}(t) = \frac{1}{N} \sum _{i=1}^{n} \sum _{j=1}^{m_i} I(X_{ij} \le t)\), where \(N = \sum _{i=1}^{n} m_i\). Thus, \(\hat{k}(t)\) is an increasing weight function that uniformly converges to the deterministic function k(t) within the interval \(t \in [y_a,y_b]\).

The quantity \(W(t \vert {{\varvec{Z}}}_i)\) in (2.5) and (2.6) is unknown. To estimate \(W(t \vert {{\varvec{Z}}})\), we define \(\Lambda _{10}(t) = \int _{0}^{t} \lambda _{10} (s) ds \) and \(\Lambda _{20}(t) = \int _{0}^{t} \lambda _{20} (s) ds \) as the cumulative hazard rate functions of left censoring random variables U and right censoring random variables V, respectively. Then, the survival functions \(\bar{G}(t \vert {{\varvec{Z}}})\) and \(\bar{H}(t \vert {{\varvec{Z}}})\) are given by

$$\begin{aligned} \bar{G}(t\vert {{\varvec{Z}}}) = \text {exp}(-{\Lambda }_{10}(t)\text {exp}({\varvec{\gamma }}^{\text {T}} {{\varvec{Z}}})) \end{aligned}$$

and

$$\begin{aligned} \bar{H}(t\vert {{\varvec{Z}}}) = \text {exp}(-{\Lambda }_{20}(t)\text {exp}({\varvec{\tau }}^{\text {T}} {{\varvec{Z}}})). \end{aligned}$$

To estimate \(\varvec{\gamma }\), we consider the equation

$$\begin{aligned} 0 = U_1(\varvec{\gamma }) = \sum \limits _{i=1}^{n} \int _{y_a}^{y_b} \left[ {{\varvec{Z}}}_i - \frac{\sum \nolimits _{i=1}^{n} I(U_i \ge t) \exp (\varvec{\gamma }^{\text {T}} {{\varvec{Z}}}_i) {{\varvec{Z}}}_i}{\sum \nolimits _{i=1}^{n} I(U_i \ge t) \exp (\varvec{\gamma }^{\text {T}} {{\varvec{Z}}}_i)} \right] dN_i^{(1)}(t).\nonumber \\ \end{aligned}$$
(2.7)

To estimate \(\varvec{\tau }\), we consider the equation

$$\begin{aligned} 0 = U_2(\varvec{\tau }) = \sum \limits _{i=1}^{n} \int _{y_a}^{y_b} \left[ {{\varvec{Z}}}_i - \frac{\sum \nolimits _{i=1}^{n} I(V_i \ge t> U_i) \exp (\varvec{\tau }^{\text {T}} {{\varvec{Z}}}_i) {{\varvec{Z}}}_i}{\sum \nolimits _{i=1}^{n} I(V_i \ge t > U_i) \exp (\varvec{\tau }^{\text {T}} {{\varvec{Z}}}_i)} \right] dN_i^{(2)}(t).\nonumber \\ \end{aligned}$$
(2.8)

Let \(\hat{\varvec{\gamma }}\) and \(\hat{\varvec{\tau }}\) be the estimators of \(\varvec{\gamma }\) and \(\varvec{\tau }\) given by \(U_1(\varvec{\gamma }) =0\) and \(U_2(\varvec{\tau }) = 0\). The estimators of \(\Lambda _{10}(t)\) and \(\Lambda _{20}(t)\) are Breslow type estimators. That is, for \(\Lambda _{10}(t) \), the estimator is

$$\begin{aligned} \hat{\Lambda }_{10}(t) = \sum \limits _{i=1}^{n} \int _{0}^{t} \frac{ I(U_i \ge s) dN_i^{(1)}(s)}{\sum \nolimits _{i=1}^{n} I(U_i \ge s) \exp (\hat{\varvec{\gamma }}^{\text{ T }} {{\varvec{Z}}}_i)}. \end{aligned}$$
(2.9)

For \(\Lambda _{20}(t) \), the estimator is given by

$$\begin{aligned} \hat{\Lambda }_{20}(t) = \sum \limits _{i=1}^{n} \int _{0}^{t} \frac{ I(V_i \ge s > U_i) dN_i^{(2)}(s)}{\sum \nolimits _{i=1}^{n} I(V_i \ge s) \exp (\hat{\varvec{\tau }}^{\text{ T }} {{\varvec{Z}}}_i)}. \end{aligned}$$
(2.10)

Then, the survival functions \(\bar{G}(t \vert {{\varvec{Z}}})\) and \(\bar{H}(t \vert {{\varvec{Z}}})\) can be estimated as \(\bar{G}_n(t\vert {{\varvec{Z}}})\) and \(\bar{H}_n(t\vert {{\varvec{Z}}})\), where

$$\begin{aligned} \bar{G}_n(t\vert {{\varvec{Z}}}) = \text {exp}(-\hat{\Lambda }_{10}(t)\text {exp}(\hat{\varvec{\gamma }}^{\text {T}} {{\varvec{Z}}})) \end{aligned}$$

and

$$\begin{aligned} \bar{H}_n(t\vert {{\varvec{Z}}}) = \text {exp}(-\hat{\Lambda }_{20}(t)\text {exp}(\hat{\varvec{\tau }}^{\text {T}} {{\varvec{Z}}})). \end{aligned}$$

Now, by substituting \(W(t\vert {{\varvec{Z}}})\) with \(\bar{W}_n(t\vert {{\varvec{Z}}}) = \bar{H}_n(t\vert {{\varvec{Z}}}) - \bar{G}_n(t\vert {{\varvec{Z}}})\) in estimating equations (2.5) and (2.6), one can estimate \(\mu _0(t)\) and \(\varvec{\beta }\).

We can estimate the baseline mean function \(\mu _0(t)\) as

$$\begin{aligned} \hat{\mu }_0(t)=\frac{\sum \nolimits _{i=1}^{n}N_i^*(t)}{\sum \nolimits _{i=1}^{n}\bar{W}_n(t\vert {{\varvec{Z}}}_i)\text {exp}(\varvec{\hat{\beta }}^{\text {T}}{{\varvec{Z}}}_i)}. \end{aligned}$$
(2.11)

Remark 2.1

When covariates do not have effect on the censoring times U and V, a simple estimate for \(W(t\vert {{\varvec{Z}}})\) is proposed as \(W_n^{*}(t) = \frac{1}{n} \sum \nolimits _{i=1}^{n} I(U_i < t \le V_i)\). This can be substituted in (2.5) and (2.6) and the estimators of \(\mu _0(t)\) and \(\varvec{\beta }\) can be obtained by solving those equations. We can note that, when \(m_i=1\) for all i, this situation reduces to the model studied in Cai and Cheng (2004).

2.1 Asymptotic properties

We now discuss the asymptotic properties of the estimators of regression parameters. We prove the consistency and asymptotic normality of the estimators of \(\varvec{\beta }, \varvec{\gamma }\) and \(\varvec{\tau }\). To establish the asymptotic properties, we define the martingale

$$\begin{aligned} {M}_i(t)=Y_i(t) [ N_i^{*}(t)-{\mu }_0(t){W}(t\vert {{\varvec{Z}}}_i)\exp ({\varvec{\beta }}^{\text {T}}{{\varvec{Z}}}_i)], \quad i=1,2,\dots ,n, \end{aligned}$$
(2.12)

where \(Y_i(t)\) represents the at-risk process, and we assume the following regularity conditions, similar to those given in Liu et al. (2010) and Du and Lv (2022):

C1::

\(\{N_i^{*}(.),N_i^{(1)}(.), N_i^{(2)}(.), Y_i(.),{{\varvec{Z}}}_i, i=1,2,\ldots ,n\}\) are independent and identically distributed, where \(Y_i(t)\) is the at-risk process.

C2::

\(N_i^{*}(.),N_i^{(1)}(.)\) and \(N_i^{(2)}(.)\) are bounded by constants.

C3::

\({{\varvec{Z}}}_i,i=1,2,\ldots ,n\) are of bounded total variation on \([0,t^{*}]\).

C4::

The matrices BP and A defined in Appendix are non-singular.

The consistency and asymptotic normality of \(\hat{\varvec{\gamma }}\) and \(\hat{\varvec{\tau }}\), the regression coefficients for the left and right censoring times, respectively, are detailed in the Appendix.

Theorem 2.1

Under the regularity conditions C1–C4, the estimator \(\hat{\varvec{\beta }}\) is strongly consistent and \(\sqrt{n} (\hat{\varvec{\beta }} - \varvec{\beta })\) can be approximated by a \(p-\)variate normal distribution with mean vector 0 and variance–covariance matrix given by \(\Sigma \), where

$$\begin{aligned} \Sigma = \frac{1}{n} A^{-1} \left( \sum \nolimits _{i=1}^{n} \Omega _i \Omega _i^{\text {T}} \right) A^{-1}, \end{aligned}$$

where A and \(\Omega _i\) are specified in Appendix in (A.7) and (A.9), respectively.

Theorem 2.2

Under the regularity conditions C1–C4, \(\sqrt{n}(\hat{\mu }_0(t)-\mu _0(t))\) converges weakly to a zero-mean Gaussian process with covariance function at (st) given by

$$\begin{aligned} \Gamma (s,t)=E({\Psi }_1(s){\Psi }_1(t)), \end{aligned}$$

where

$$\begin{aligned} {\Psi }_1(t)= & {} \frac{{M}_1(t)}{n^{-1}\sum \nolimits _{i=1}^{n}Y_i(t) {W}(t\vert {{\varvec{Z}}}_i) \exp ({\varvec{\beta }}^{\text{ T }}{{\varvec{Z}}}_i)}\\ {}{} & {} -{D}(t,{\varvec{\beta }}) {A}^{-1}\int _{y_a}^{y_b}({{\varvec{Z}}}_1-\bar{{{\varvec{Z}}}}(t)){M}_1(t)d{k}(t), \end{aligned}$$

with

$$\begin{aligned} D(t,\varvec{\beta })=\frac{\sum \nolimits _{i=1}^{n}Y_i(t) W(t\vert {{\varvec{Z}}}_i) {{\varvec{Z}}}_i \mu _0(t)\exp (\varvec{\beta }^\text {T} {{\varvec{Z}}}_i) }{\sum \nolimits _{i=1}^{n}Y_i(t) W(t\vert {{\varvec{Z}}}_i)\exp (\varvec{\beta }^{\text {T}}{{\varvec{Z}}}_i)}. \end{aligned}$$
(2.13)

A consistent estimator of \(\Gamma (s,t)\) is given by

$$\begin{aligned} \hat{\Gamma }(s,t) = \frac{1}{n} \sum \limits _{i=1}^n \hat{\Psi }_i(s)\hat{\Psi }_i(t), \end{aligned}$$

where

$$\begin{aligned} \hat{\Psi }_i(t)=\frac{\hat{M}_i(t)}{n^{-1}\sum \limits _{i=1}^{n}Y_i(t)\bar{W}_n(t\vert {{\varvec{Z}}}_i) \exp (\hat{\varvec{\beta }}^{\text{ T }}{{\varvec{Z}}}_i)}-\hat{D}(t,\hat{\varvec{\beta }}) \hat{A}^{-1}\int _{y_a}^{y_b}\!\!\!({{\varvec{Z}}}_i-\bar{{{\varvec{Z}}}}(t))\hat{M}_i(t)d\hat{k}(t), \end{aligned}$$

with \(\hat{M}_i(t)\) and \(\hat{D}(t,\hat{\varvec{\beta }})\) are obtained from (2.12) and (2.13), respectively, by replacing all the unknown quantities with their corresponding sample estimators. The terms \(\bar{{{\varvec{Z}}}}(t)\) and \(\hat{A}^{-1}\) are defined in (A.4) and (A.10), respectively.

The proofs of the above theorems are given in Appendix. Note that the variance–covariance matrix of the estimator of \(\varvec{\beta }\) does not yield to a closed form. In practice, one can use bootstrap re-sampling technique to find the standard error of \(\varvec{\hat{\beta }}\). The confidence intervals of \(\varvec{\hat{\beta }}\) can be constructed using the asymptotic normality of the estimator \(\varvec{\hat{\beta }}\).

3 Simulation study

A Monte Carlo simulation study is carried out to assess the finite sample behavior of the proposed estimators. We consider two covariates \(Z_1\) and \(Z_2\), where \(Z_1\) is generated from a Bernoulli distribution with a probability of success 0.5 and \(Z_2\) is generated from a standard normal distribution. Based on these two covariates, under the proportional hazard assumption, the left and right censoring times U and V are generated separately, and then, the recurrence times are simulated accordingly. The data generation for (UV) is independent of the recurrence times, given \((Z_1,Z_2)\).

Following (2.2) and (2.3), we consider three different combinations for modeling the left and right censoring time. The left and right censoring times are assumed to follow the following three models:

  1. (i)

    A Gompertz proportional hazard model with baseline hazard function

    $$\begin{aligned} \lambda _{10}(t) = \theta _1 \exp \{\alpha _1 t \} \text { and } \lambda _{20}(t) = \theta _2 \exp \{\alpha _2 t \}, \end{aligned}$$

    where \(\theta _1=0.2\), \(\alpha _1=0.1\), \(\theta _2=0.1\), and \(\alpha _2=0.1\). We assume the regression coefficients \(\varvec{\gamma }=\varvec{\tau }=(1,2)^{\text {T}}\).

  2. (ii)

    A Weibull proportional hazard model with with baseline hazard function

    $$\begin{aligned} \lambda _{10}(t) = \theta _1 \alpha _1 t^{\alpha _1 - 1} \text { and } \lambda _{20}(t) = \theta _2 \alpha _2 t^{\alpha _2 - 1}, \end{aligned}$$

    where \(\theta _1=0.2\), \(\alpha _1=2\), \(\theta _2=0.3\), and \(\alpha _2=2.5\). Here, \(\varvec{\gamma }=\varvec{\tau }=(1.5,1)^{\text {T}}\).

  3. (iii)

    A baseline power model with baseline hazard function

    $$\begin{aligned} \lambda _{10}(t) = t^{\alpha _1} \text { and } \lambda _{20}(t) = t^{\alpha _2}, \end{aligned}$$

    where \(\alpha _1=1\), and \(\alpha _2=2\). In this case, \(\varvec{\gamma }=\varvec{\tau }=(1,2)^{\text {T}}\).

In all the cases, the left censoring times, U, take values in [0, 5]. To ensure a sufficient follow-up time for every subject, right censoring times, V, are generated between \(U+5\) and \(U+20\). The recurrence times follow a Gompertz proportional hazards model:

$$\begin{aligned} \lambda (t \vert {{\varvec{Z}}})=\theta \exp \{\alpha t \} \cdot \exp \{ \beta _1 Z_1+\beta _2 Z_2\}, \end{aligned}$$

with \(\theta =0.1\) and \(\alpha =0.1\) and \(\lambda (t \vert {{\varvec{Z}}})\) as the corresponding hazard rate function in presence of covariates. We generate 1000 data samples of sizes n = 50, 100, and 200 each with different choices of \(\beta _1\) and \(\beta _2\). In our simulation, we do not fix the censoring rates but rather consider random censoring proportions for different parameter combinations. Various combinations of parameters result in different proportions for left and right censoring times and varying levels of the average number of recurrences per subject. For example, here is a summary of the censoring proportions observed for some of the combinations in our study:

For Model (i) with a Gompertz proportional hazards model for left and right censoring times, with \(\beta _1=1.3\) and \(\beta _2=2\), the average number of events per subject is 21.14. The left censoring rate is found to be 8.91% of the total events, while the right censoring rate is 35.75% of the total events. The double censored data have an average of 11.48 events per subject. Under the same setting but with \(\beta _1=0.2\) and \(\beta _2=0.4\), the average number of events per subject reduces to 4.18. In this case, the left censoring rate is 10.47% and the right censoring rate is 25.49%. The double censored data have an average of 2.17 events per subject.

Similarly, for Model (ii) with \(\beta _1=1.5\) and \(\beta _2=1.9\), the average number of events per subject is 28.03. The left censoring rate is 10.56% and the right censoring rate is 8.51%. For the same model but with \(\beta _1=0.2\) and \(\beta _2=0.4\), the average number of events per subject is 2.55. In this case, the left censoring rate is 9.94% and the right censoring rate is 27.69%.

For Model (iii) with \(\beta _1=-0.5\) and \(\beta _2=-1.0\), the average number of events per subject is 1.83. The left censoring rate is 11.19% and the right censoring rate is 31.60%. When considering \(\beta _1=1.3\) and \(\beta _2=2.0\), the average number of events per subject is 24.66. The left censoring rate is 12.43% and the right censoring rate is 6.03%.

The absolute bias, mean squared error (MSE), and \(95\%\) coverage probability (CP) of the regression parameters for all the above models are calculated. Since, for all choices of \(\beta _1\) and \(\beta _2\), the results are similar, we present the results for randomly chosen four combinations of \(\beta _1\) and \(\beta _2\), for each model in Tables 13.

Table 1 Absolute bias, MSE, and CP of \(\varvec{\hat{\beta }}\) for case (i)
Table 2 Absolute bias, MSE, and CP of \(\varvec{\hat{\beta }}\) for case (ii)
Table 3 Absolute bias, MSE, and CP of \(\varvec{\hat{\beta }}\) for case (iii)
Fig. 1
figure 1

Estimate of \(\mu _0(t)\) for case (i) with \(\beta _1=1.5\) and \(\beta _2=1.9\) for sample sizes 50, 100, and 200

Tables 13 demonstrate that both absolute bias and MSE decrease as the sample size increases, indicating the consistency of the proposed estimation procedure. Furthermore, the coverage probability stabilizes near the chosen significance level. We also note that the estimated baseline mean functions for different combinations are of a similar nature. Therefore, we present only the estimated baseline mean functions for case (i) with \(\beta _1=1.5\) and \(\beta _2=1.9\), for both the choices of \(\bar{W}_n(t \mid {{\varvec{Z}}})\) and \(W_n^{*}(t \mid {{\varvec{Z}}})\) in Fig. 1. In Fig. 1, the blue line represents the estimated values of \(\mu _0(t)\) using the weight function \(\bar{W}_n(t \mid {{\varvec{Z}}})\), green line represents the estimated values of \(\mu _0(t)\) using the weight function \(W_n^{*}(t \mid {{\varvec{Z}}})\), and the red line represents the true values. We can observe that toward the end of the follow-up times, the estimator slightly overestimates the baseline mean function, for both choices of weight functions.

4 Data analysis

The inference procedure discussed in Sect. 2 is illustrated using two real-life data sets.

4.1 Multiple tumor recurrence data of bladder cancer patients

We apply the proposed methods to the multiple tumor recurrence data for patients with bladder cancer given in Wei et al. (1989). These data were obtained in a randomized clinical trial conducted by the Veterans Administration Co-operative Urological Research Group (VACURG) in which the patients with superficial bladder tumors were randomly assigned to one of three treatments: thiotepa, pyridoxine, and placebo. This data set is recently analyzed by Du and Lv (2022). The data are available in the R package survival (Therneau, 2023). In this analysis, we consider the patients treated with either placebo or thiotepa and having at least 1 month of follow-up, which results in the analysis of 47 patients assigned to placebo and 38 patients assigned to thiotepa. The right censoring times (V) are taken as the maximum follow-up time for each patient. Since the data are not doubly censored as such, we have chosen the left censoring times (U) randomly by observing the data. For patients with follow-up time greater than 8 months, U is generated randomly as a time less than or equal to 8 months. For all other patients, a time less than or equal to its right censoring time is randomly assigned as its left censoring time U. In the new data, 14 out of 112 events were lost due to left censoring.

The main interest of this analysis is to study the influence of thiotepa on recurrence frequency and the dependence of such recurrence rates on the size of initial tumors. Therefore, the covariates considered are the treatment group (0—thiotepa; 1—placebo) (\(Z_1\)) and the number of initial tumors (\(Z_2\)) for each of the patients. In these data, we have the left censoring times U and the right censoring times V are independent of the covariate vector \({{\varvec{Z}}}=(Z_1,Z_2)\).

The estimates of the model parameters, along with their corresponding standard errors (SE), are provided in Table 4. We also obtain estimates using the Prentice, Williams, and Peterson total time (PWP-TT) model (Prentice et al., 1981), which only considers right censoring and was fitted to the original dataset. From Table 4, it is evident that the estimates obtained from our proposed model align with those from the PWP-TT model. This agreement demonstrates the significant reduction in tumor recurrences for cancer patients with thiotepa treatment. The estimate of \(\mu _0(t)\) is depicted in Fig. 2, showing an increase in the baseline mean function after 46 months.

Table 4 Estimates and SE of regression coefficients for the tumor recurrence data
Fig. 2
figure 2

Estimate of \(\mu _0(t)\) for the tumor recurrence data

4.2 Study of gamma interferon in chronic granulomatous disease

We consider the Chronic Granulomatous Disease (CGD) data described in Fleming and Harrington (1991) for illustration. The dataset is accessible within the R package survival (Therneau, 2023). CGD refers to a set of inherited rare immune system disorders characterized by recurrent pyogenic infections that typically appear early in life and may lead to death in childhood. The efficacy of gamma interferon in reducing the rate of serious infections in CGD patients was investigated in a double blinded clinical trial in which patients were randomly assigned to receive either gamma interferon or a placebo. Out of 128 patients enrolled, 63 were randomized into the gamma interferon group and the rest to the placebo group. Among the placebo group, 30 patients experienced at least one infection, while 14 patients in the gamma interferon group encountered at least one infection.

We randomly chose left censoring times (U) by observing the data, because the data are not doubly censored as such. For patients having follow-up periods greater than 200 days, the left censoring time is randomly selected to be less than or equal to 200 days. All other patient’s left censoring times are chosen randomly to be less than or equal to their right censoring time. In the modified data, 20 out of 76 events were lost due to left censoring. Several covariates were recorded, but we focus on two of them, treatment group \(Z_1\), (\(Z_1=0\) for gamma interferon and \(Z_1=1\) for placebo) and the patient’s age \(Z_2\). We can see that the left and right censoring times are independent of the covariates for this data.

The estimates of the model parameters along with their standard errors (SE) are presented in Table 5. We also calculate the estimates of regression parameters using the PWP-TT model on the original data set, and the results are shown in Table 4. It is evident that the estimates obtained from our proposed model are consistent with those from the PWP-TT model. This shows that in people with CGD, gamma interferon decreases infection recurrence. Additionally, as patients get older, the rate of severe infections declines. The estimate of \(\mu _0(t)\) is plotted in Fig. 3.

Table 5 Estimates and SE of regression coefficients for the CGD data
Fig. 3
figure 3

Estimate of \(\mu _0(t)\) for the CGD data

5 Conclusion

Recurrent event data subject to double censoring are often encountered in clinical and other lifetime studies. In this article, we proposed a semiparametric regression model for recurrent event data, which specifies the multiplicative covariate effect on the marginal mean function under double censoring. The estimating equation approach is employed for the estimation of regression parameters. Asymptotic properties of the proposed estimates are discussed in detail. The results of the Monte Carlo simulation study make sure that the procedures followed are efficient. The proposed method was demonstrated on two real data sets: the multiple tumor recurrence data for bladder cancer patients, and a study of gamma interferon in chronic granulomatous disease.

Several open problems remain to be addressed in the analysis of recurrent event data subject to double censoring. In this article, we considered time-independent covariates. However, recurrent event data often involve time-dependent covariates as well. Miloslavsky et al. (2004) proposed an inverse probability of censoring weighted estimator for the regression parameters in the Andersen–Gill model for recurrent event data in the presence of time-dependent covariates and dependent censoring. Analysis of multivariate recurrent data in the presence of time-dependent covariates was studied by Sun et al. (2009) and Zhao et al. (2012). Regression analysis of recurrent event data with time-dependent covariates and informative censoring was studied by Huang et al. (2010) which allows correlations between censoring times and recurrent event process via frailty. Therefore, it is of interest to develop methods for the regression analysis of doubly censored recurrent events in the presence of time-dependent covariates, and work in this direction will be studied separately.

In the analysis of recurrent event data, two common time scales are often employed: the total time and the time since the last event (gap time). Recurrent event data subject to double censoring can be modeled using the gap times between recurrent events. A semiparametric transformation model can be developed for such data type, which include proportional hazards and proportional odds models as special cases. Further, one can develop inference procedures for doubly censored recurrent event data with multiple modes of recurrence. The modeling and analysis of doubly censored recurrent event data using frailty models is not yet discussed in literature. These problems are under investigation and will be reported in separate studies.