1 Introduction

In many studies, the event of interest can be experienced more than once per subject. Such outcomes are termed recurrent events and are commonly encountered in longitudinal follow-up studies. Examples of recurrent events include, among others, bladder tumor recurrence times among patients in a randomized treatment trial (Byar 1980), times to the development of mammary tumors for rats in a carcinogenicity experiment (Gail et al. 1980), infection occurrence times among leukemia patients receiving bone marrow transplants (Prentice et al. 1981), times to valve seat replacement on diesel engines in a service fleet (Lawless and Nadeau 1995), and times to inpatient hospital admissions among intravenous drug users (Wang et al. 2001).

In practice, data collection has a fixed termination date, beyond which occurrence times are censored. Recurrent events occur as naturally ordered multivariate failure time data. As a result, recurrent event data are often analyzed using methods of multivariate survival analysis (e.g., Prentice et al. 1981; Andersen and Gill 1982; Wei et al. 1989). However, applying these methods to recurrent event data requires care, since they are designed for a broader class of data structures amenable to multivariate survival analysis. Correspondingly, the analysis of recurrent event data continues to be the subject of much methodological research, with interest focused on assessing the effects of covariates on features of the recurrent event process.

Let \(N_{i}^{*}(t)=\int _{0}^{t}dN_{i}^{*}(u)\) be the number of recurrent events in (0, t] for subject i, where \(dN_{i}^{*}(t)=N_{i}^{*}(t+dt)-N_{i}^{*}(t)\) denotes the number of events in the small time interval \((t,t+dt].\) Assume that subject i is observed over the period \([0,C_{i}]\) and is observed to experience events at times \(T_{i1},\ldots ,T_{i,m_{i}},\) where \(C_{i}\) denotes the follow-up or censoring time and \(m_{i}\) denotes the total number of recurrent events for subject i. For subject i, let \(\tilde{T}_{ij}=T_{ij}-T_{i,j-1}\) denote the time elapsed from the \((j-1)\)th occurrence of the event to the jth occurrence, with \(T_{i,0}\equiv 0\) and \(j=1,\ldots ,m_{i}.\) \(N_{i}^{*}(t)\) is thus a subject-specific counting process, since (1) \(N_{i}^{*}(t)\ge 0\), (2) \(N_{i}^{*}(t)\) is integer-valued, (3) for \(s<t,\) \(N_{i}^{*}(s)\le N_{i}^{*}(t)\), and (4) for \(s<t,\) the number of events in \((s,t]\) is given by \(N_{i}^{*}(t)-N_{i}^{*}(s)\) (e.g., Ross 1989).
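As a concrete illustration, the counting process \(N_{i}^{*}(t)\) and the gap times \(\tilde{T}_{ij}\) can be computed directly from a subject's event times. A minimal sketch with made-up values:

```python
import bisect

def count_events(event_times, t):
    """N*(t): number of events in (0, t], assuming event_times is sorted."""
    return bisect.bisect_right(event_times, t)

def gap_times(event_times):
    """Gap times T~_ij = T_ij - T_{i,j-1}, with T_{i,0} = 0."""
    prev, gaps = 0.0, []
    for t in event_times:
        gaps.append(t - prev)
        prev = t
    return gaps

# Hypothetical subject with three observed events.
times = [1.2, 3.5, 7.0]
print(count_events(times, 5.0))  # 2 events in (0, 5]
print(gap_times(times))          # gaps 1.2, 2.3, 3.5
```

By construction `count_events` is nonnegative, integer-valued, and nondecreasing in t, matching properties (1)-(4) above.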

Recurrent event data are usually analyzed in the context of counting or point process models (Andersen et al. 1993). Following classical survival analysis, these methods are based on modeling the intensity and hazard functions. Because the mean number of events is more easily interpreted than the hazard, some authors propose modeling the mean and rate functions instead (Pepe and Cai 1993; Lawless and Nadeau 1995; Lin et al. 2000). We now define the intensity, mean and rate functions formally.

Let \(\mathcal {N}_{i}^{*}(t)=\{N_{i}^{*}(s);0\le s<t\}\) denote the event history of the ith subject up to time \(t^{-}.\) If

$$\begin{aligned} E\{dN_{i}^{*}(t)|\mathcal {N}_{i}^{*}(t)\}=\lambda _{i}(t|\mathcal {N}_{i}^{*}(t))dt \end{aligned}$$

then \(\lambda _{i}(t|\mathcal {N}_{i}^{*}(t))\) is called the intensity function of \(N_{i}^{*}(t)\). The probability distribution of \(\{dN_{i}^{*}(t);t\ge 0\}\) can be determined completely in terms of \(\lambda _{i}(t|\mathcal {N}_{i}^{*}(t))\) as discussed by Andersen et al. (1993). For Poisson processes, the intensity function is non-stochastic; that is \(\lambda _{i}(t|\mathcal {N}_{i}^{*}(t))=\lambda _{i}(t),\) and for renewal processes, \(\lambda _{i}(t|\mathcal {N}_{i}^{*}(t))=h_{i}(t-T_{i,N_{i}^{*}(t^{-})})\) where \(h_{i}(\cdot )\) is the hazard function for the inter-event times (which are iid) of the ith subject (Chiang 1968). The recurrent event process can be modeled as a Poisson process when event counts are of interest. The mean function of \(N_{i}^{*}(t)\) is defined by \(\mu _{i}(t)=E[N_{i}^{*}(t)].\) If \(E[dN_{i}^{*}(t)]=r_{i}(t)dt,\) then \(r_{i}(t)\) is called the rate function of \(N_{i}^{*}(t).\) When one wants to assess the effect of covariates on the process, analysis of the mean and rate functions is suggested, particularly since assumptions on the intra-subject dependence structure are avoided.
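As an illustrative aside, a Poisson process with a given (non-stochastic) rate function can be simulated by thinning a dominating homogeneous process, and the empirical mean count then approximates the mean function \(\mu (t)=\int _{0}^{t}r(u)du\). The rate function below is an arbitrary example, not one from this paper:

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_nhpp(rate, rate_max, tau, rng):
    """Event times of a nonhomogeneous Poisson process on (0, tau],
    simulated by thinning a homogeneous process with rate rate_max."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate_max)   # next candidate event
        if t > tau:
            return np.array(times)
        if rng.uniform() < rate(t) / rate_max:  # accept with prob r(t)/rate_max
            times.append(t)

# Illustrative rate r(t) = 1 + t on (0, 2], so mu(t) = t + t^2 / 2.
tau = 2.0
counts = [simulate_nhpp(lambda t: 1.0 + t, 3.0, tau, rng).size
          for _ in range(10_000)]
print(np.mean(counts))   # ~ mu(tau) = 4.0, up to Monte Carlo error
```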

For recurrent event data, there are various models proposed in the survival analysis literature. These include conditional intensity models (Prentice et al. 1981; Andersen and Gill 1982; Chang and Wang 1999; Zeng and Lin 2006), marginal intensity models (Wei et al. 1989; Lee et al. 1992), the frailty model approach (Nielsen et al. 1992; Murphy 1994, 1995; Zeng and Lin 2007), and marginal means and rates models (Pepe and Cai 1993; Lawless and Nadeau 1995; Lin et al. 2000, 2001; Ghosh 2004; Schaubel et al. 2006; Sun and Su 2008; Liu et al. 2010; Sun et al. 2011, 2012). Cook and Lawless (2007) provided a broad review of the existing literature for the analysis of this type of data.

Often, however, there may exist a dependent terminal event that stops both the recurrent events and the follow-up. This terminal event is usually correlated with the recurrent events of interest, and this correlation should be accounted for in the analysis. Indeed, the direction of this correlation can be counterintuitive: frequent hospital visits for treatment can be positively correlated with a high death rate (e.g., for aggressive, invasive, or risky therapies) or negatively correlated with it (e.g., for physical therapy or dialysis). Mazroui et al. (2010) emphasized the importance of ascertaining the nature of the dependence between the terminal event and the recurrent event process in the medical context, noting that some therapies may be beneficial in slowing the rate of disease recurrence but not in prolonging survival. Not surprisingly, modeling the dependence between the terminal event and the recurrent events (as opposed to simply studying the recurrent event process) has gained importance over the past few years.

The existing methods for handling recurrent event data in the presence of a terminal event generally fall into two approaches: frailty methods and marginal methods. Frailty models use random effects to account for the correlation between the recurrent and terminal events (Mazroui et al. 2010; Wang et al. 2001; Huang and Wang 2004; Liu et al. 2004; Ye et al. 2007; Zeng and Lin 2009). Marginal methods focus on the marginal rates of the recurrent and terminal events, leaving the correlation between the recurrent and terminal events unspecified (Cook and Lawless 1997; Ghosh and Lin 2000, 2002; Miloslavsky et al. 2004; Pan and Schaubel 2009; Zeng and Cai 2010).

We note that Huang and Wang (2004) used only the proportional hazards model for the terminal event. It is well known that the proportional hazards model may not fit failure time data well; when this is the case, one alternative is the additive hazards model. The latter describes a different aspect of the relationship between the survival time and covariates and in many situations can be more plausible than the proportional hazards model (Lin and Ying 1994). For example, in public health studies, the risk difference described by the additive hazards model is used more often than the risk ratio described by the proportional hazards model; this is discussed further in Breslow and Day (1987). Dunson and Herring (2005) proposed a Bayesian model selection and averaging procedure that can be used to choose between proportional and additive models.

In this paper, we propose a joint model with the recurrent event process and the terminal event linked through a common subject-specific latent variable, with the proportional intensity model used for modeling the recurrent event process and the additive hazards model used for modeling the terminal event time. This model is flexible in that no parametric assumptions on the distributions of censoring times and latent variables are made. In Sect. 2, we describe the proposed models. An estimating procedure and asymptotic properties for the parameter estimators are established in Sect. 2.1. Section 2.2 reports some results from simulation studies conducted for evaluating the finite sample performance of the proposed method. In Sect. 3, the methodology is applied to hospitalization data for heart failure patients from the clinical data repository at University of Virginia Health System. All technical proofs are included in the Appendix.

2 Model specification

Let \(N^{*}(t)\) denote the number of recurrent events over the time interval (0, t], and let X be a p-dimensional vector of covariates. A nonnegative latent variable v is associated with the recurrent event process, satisfying \(E(v|X)=1\). Let D be the terminal event time (e.g., death) and C be the follow-up or censoring time. Write \(Y=C\wedge D\) and \(\delta =I(D\le C)\), where \(a\wedge b=\min (a,b)\) and \(I(\cdot )\) is the indicator function. The deterministic time \(\tau >0\) signifies the end of the observation period, so that \(C\le \tau \) with probability 1. Due to censoring, \(N^{*}(\cdot )\) is not fully observed, and the number of observed events is denoted by \(N(t)=N^{*}(t\wedge Y)\). We denote by m the total number of observed recurrent events; that is, \(m=N(Y)\). The random object \(\left( N^{*}(\cdot ),X,C,D,Y,v,\delta ,m\right) \) represents a subject chosen at random, with subscripts used to distinguish subjects. Thus, for a random sample of n subjects, \(\left( N_{i}^{*}(\cdot ),X_{i},C_{i},D_{i},Y_{i},v_{i},\delta _{i},m_{i}\right) ,i=1,2,\ldots ,n\) are the associated independent and identically distributed random objects. Note that for this sequence, only \(\left( N_{i}(\cdot ),Y_{i},\delta _{i},X_{i},m_{i}\right) ,\ i=1,\dots ,n\), is observable.
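The observed-data quantities can be read off mechanically from the latent ones. A toy sketch for one hypothetical subject (all values made up):

```python
# Latent quantities for one illustrative subject.
D, C = 3.2, 2.5                      # terminal event time and censoring time
full_event_times = [0.7, 1.4, 2.8]   # event times of the underlying process N*

# Observed data (N(.), Y, delta, m) as defined in the text.
Y = min(C, D)                        # Y = C ^ D
delta = int(D <= C)                  # terminal event observed?
observed_times = [t for t in full_event_times if t <= Y]  # events of N(t) = N*(t ^ Y)
m = len(observed_times)              # m = N(Y)
print(Y, delta, observed_times, m)   # 2.5 0 [0.7, 1.4] 2
```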

For the recurrent event process, we assume that conditional on \(X_{i}\) and a latent variable \(v_{i}\), \(N_{i}^{*}(\cdot )\) is a Poisson process with intensity function

$$\begin{aligned} E(dN_{i}^{*}(t)|v_{i},X_{i})=d\Lambda _{i}(t)=v_{i}\exp \{\gamma _{0}'X_{i}\}d\Lambda _{0}(t) \end{aligned}$$
(1)

where \(\Lambda _{0}(t)\) is an unspecified deterministic baseline cumulative intensity function and \(\gamma _{0}\) is the regression parameter. Note that the intensity function \(E(dN_{i}^{*}(t)|v_{i},X_{i})\) in model (1) is not conditional on the event history. This is valid here because, conditional on \(v_{i}\) and \(X_{i}\), the process is Poisson and hence memoryless, so the intensity function coincides with the rate function of the recurrent event process. The latent variable \(v_{i}\) is treated as a nuisance parameter and no parametric assumptions are imposed on it. Since any other positive value of the mean of \(v_{i}\) could be absorbed into \(\Lambda _{0}(t)\), for identifiability of model (1) we assume that \(v_{i}\) is nonnegative with conditional mean 1 given \(X_{i}\). Thus, \(v_{i}\) plays the role of a frailty in the recurrent event process, with the intensity of recurrent events increasing in \(v_{i}\).

We specify the additive hazards model for the terminal event time \(D_{i}\) as

$$\begin{aligned} \alpha _{i}(t)=\alpha _{0}(t)+\eta _{0}'X_{i}+\theta _{0}v_{i} \end{aligned}$$
(2)

for \(t\ge 0\), where \(\alpha _{0}(\cdot )\) is an unspecified deterministic baseline hazard function, \(\eta _{0}\) is a p-dimensional vector of unknown regression parameters, and \(\theta _{0}\) is an unknown parameter. Note that (2) defines the conditional (on \(v_{i}\) and \(X_{i}\)) hazard function when the quantity on the right is non negative. This hazard function is 0 otherwise.

It is assumed that conditional on \(X_{i}\) and \(v_{i}\), \(D_{i}\), \(C_{i}\) and \(N_{i}^{*}(\cdot )\) are mutually independent. Thus, (1) and (2) link the recurrent event process with the terminal event for a given individual i through the random effect \(v_{i}\). The parameter \(\theta _{0}\) conveys information about the correlation (conditional on \(X_{i}\)) between the conditional hazard function for the terminal event time and the conditional intensity function of the recurrent event process for a given individual. Specifically, if we denote (with t and \(X_{i}\) held fixed) \(f_{i}(v_{i})=v_{i}\exp \{\gamma _{0}'X_{i}\}d\Lambda _{0}(t)/dt\) and \(g_{i}(v_{i})=\max \{0,\alpha _{0}(t)+\eta _{0}'X_{i}+\theta _{0}v_{i}\}\), then \(f_{i}\) and \(g_{i}\) are both non-decreasing in \(v_{i}\) when \(\theta _{0}\ge 0\), while when \(\theta _{0}\le 0\), \(f_{i}\) is non-decreasing in \(v_{i}\) and \(g_{i}\) is non-increasing in \(v_{i}\). Therefore, the conditional (on \(X_{i}\)) covariance of \(f_{i}(v_{i})\) and \(g_{i}(v_{i})\) is non-negative if \(\theta _{0}\ge 0\), and is non-positive if \(\theta _{0}\le 0\). It follows that the conditional recurrent event intensity and the conditional hazard function for the terminal event time for individual i are non-negatively correlated when \(\theta _{0}\ge 0\), and non-positively correlated when \(\theta _{0}\le 0\). Thus, testing the null hypothesis \(\theta _{0}=0\) against the alternatives \(\theta _{0}>0\) or \(\theta _{0}<0\) provides evidence on how the recurrent event process influences the risk of the terminal event: if the null hypothesis is rejected in favor of \(\theta _{0}>0\), then higher intensities of recurrent events are associated with an increased risk of the terminal event, while if it is rejected in favor of \(\theta _{0}<0\), then higher intensities of recurrent events are associated with a lower risk of the terminal event.
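The covariance claim above is easy to check numerically: fixing t and \(X_{i}\), draw many values of \(v_{i}\) and compare the sample covariance of \(f_{i}(v_{i})\) and \(g_{i}(v_{i})\) under \(\theta _{0}>0\) and \(\theta _{0}<0\). All numeric values below (the unit-mean frailty distribution, the covariate effects, \(\alpha _{0}(t)\), \(d\Lambda _{0}(t)/dt\)) are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.gamma(shape=1.0, scale=1.0, size=200_000)  # unit-mean frailty draws

# Arbitrary fixed values of t and the covariate effects (placeholders only).
gamma_x = 0.3     # gamma_0' X_i
eta_x = -0.2      # eta_0' X_i
alpha0_t = 1.0    # alpha_0(t)
dLam0 = 0.5       # dLambda_0(t)/dt

f = v * np.exp(gamma_x) * dLam0                        # f_i(v): recurrent intensity
for theta in (0.5, -0.5):
    g = np.maximum(0.0, alpha0_t + eta_x + theta * v)  # g_i(v): terminal hazard
    print(theta, np.cov(f, g)[0, 1])  # positive for theta > 0, negative for theta < 0
```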

2.1 Estimation of regression parameters

The model we have outlined is semiparametric, since it involves both finite- and infinite-dimensional unknowns. Since we make no assumptions about the form of the distribution of the latent variable or the follow-up/censoring times, we use estimating-equation procedures to form our parameter estimators. Estimators formed in this way can be semiparametrically efficient under certain regularity conditions; we do not attempt such a discussion here, opting instead for our minimal assumptions and the tractability of the estimating-equation technique.

Note that we use the same informative censoring model that was studied by Wang et al. (2001), who focused on the estimation of the parameters in model (1). They treated the nonparametric distributions of the censoring time and latent variable as nuisance parameters, and formulated consistent, asymptotically normal estimators for the regression parameters \(\gamma _{0}\). Thus, our focus is mainly on estimation results for model (2), with distributional results being asymptotic. Our purpose is to demonstrate the importance of taking into account the association between the recurrent event process and the terminal event when drawing inferences on the covariate effects on the terminal event. In Sect. 3, we examine this issue using marginal p-values (based on asymptotic distributions) computed from hospitalization data for heart failure patients from the clinical data repository at the University of Virginia Health System. For this purpose, we find it sufficient to derive the joint asymptotic distribution of the parameter estimators in model (2) only, rather than to solve the more complicated problem of deriving the joint distribution of all the parameter estimators in both models (1) and (2). As will be seen in this section, we use estimating equations for all the parameters in both models, but solve the equations and develop the asymptotic distributions only for the parameters of model (2). For the marginal asymptotic distribution needed for model (1), we use the results from Wang et al. (2001) in our analysis in Sect. 3 and in our simulation results. Analyses requiring simultaneous inferences on the parameters in both models (1) and (2) would require the joint asymptotic distribution of the parameter estimators for both models; for now, we leave this as an open question for future research.

Let \(N^{D}(t)=I(Y\le t,\delta =1)\) and \(\Delta (t)=I(t\le Y)\), and recall that \(\tau \) is the end time of the study. Estimating equations involving the parameters of interest can be written down easily based on the following population equations, where \(Q(t),t\ge 0\), is a (possibly data-dependent) weight function:

$$\begin{aligned}&E\left[ dN^{D}(t)-\left\{ \eta 'X+\theta v+\alpha _{0}(t)\right\} \Delta (t)dt\right] =0\\&\quad E\left[ \int _{0}^{\tau }Q(t)X\Big \{ dN^{D}(t)-\{\eta 'X+\theta v+\alpha _{0}(t)\}\Delta (t)dt\Big \}\right] =0\\&\quad E\left[ \int _{0}^{\tau }Q(t)v\Big \{ dN^{D}(t)-\{\eta 'X+\theta v+\alpha _{0}(t)\}\Delta (t)dt\Big \}\right] =0 \end{aligned}$$

Note that if v could be observed, then the parameters in model (2) for the terminal event time could be estimated using the generalized estimating equation approach (Liang and Zeger 1986) as follows. Adding subject-specific subscripts, and noting that \(\left( N_{i}^{D}(\cdot ),\Delta _{i},X_{i},v_{i}\right) ,i=1,2,\ldots ,n\) are independent and identically distributed, the estimating equations would be:

$$\begin{aligned}&\sum _{i=1}^{n}dN_{i}^{D}(t)-\{\eta 'X_{i}+\theta v_{i}+\alpha _{0}(t)\}\Delta _{i}(t)dt=0,\\&\quad \sum _{i=1}^{n}\int _{0}^{\tau }Q(t)X_{i}\Big \{ dN_{i}^{D}(t)-\{\eta 'X_{i}+\theta v_{i}+\alpha _{0}(t)\}\Delta _{i}(t)dt\Big \}=0,\\&\quad \sum _{i=1}^{n}\int _{0}^{\tau }Q(t)v_{i}\Big \{ dN_{i}^{D}(t)-\{\eta 'X_{i}+\theta v_{i}+\alpha _{0}(t)\}\Delta _{i}(t)dt\Big \}=0. \end{aligned}$$

However, in practice \(v_{i}\) cannot be observed. To address this, define \(\Lambda _{i}^{*}(t)=\Lambda _{0}(t)e^{\gamma _{0}'X_{i}}\) and \(V_{i}^{*}=m_{i}\Lambda _{i}^{*}(Y_{i})^{-1}\) for subject i. Note that \(E(V_{i}^{*}|X_{i},Y_{i},\delta _{i},v_{i})=v_{i}\) and

$$\begin{aligned} E(V_{i}^{*}(m_{i}-1)\Lambda _{i}^{*}(Y_{i})^{-1}|X_{i},Y_{i},\delta _{i},v_{i})=v_{i}^{2}. \end{aligned}$$

Thus, for given \(\gamma _{0}\) and \(\Lambda _{0}(t)\), we can estimate \(\alpha _{0}(t)\), \(\eta _{0}\) and \(\theta _{0}\) using the following three unbiased estimating equations:

$$\begin{aligned}&\sum _{i=1}^{n}dN_{i}^{D}(t)-\{\eta 'X_{i}+\theta V_{i}^{*}+\alpha _{0}(t)\}\Delta _{i}(t)dt=0, \end{aligned}$$
(3)
$$\begin{aligned}&\quad \sum _{i=1}^{n}\int _{0}^{\tau }Q(t)X_{i}\Big \{ dN_{i}^{D}(t)-\{\eta 'X_{i}+\theta V_{i}^{*}+\alpha _{0}(t)\}\Delta _{i}(t)dt\Big \}=0, \end{aligned}$$
(4)
$$\begin{aligned}&\quad \sum _{i=1}^{n}\int _{0}^{\tau }Q(t)V_{i}^{*}\Big \{ dN_{i}^{D}(t)-\{\eta 'X_{i}+\theta (m_{i}-1)\Lambda _{i}^{*}(Y_{i})^{-1}+\alpha _{0}(t)\}\Delta _{i}(t)dt\Big \}=0.\nonumber \\ \end{aligned}$$
(5)
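The moment identities underlying the unbiasedness of (3)-(5) can be verified by simulation: conditional on \((v_{i},X_{i},Y_{i})\), \(m_{i}\) is Poisson with mean \(v_{i}\Lambda _{i}^{*}(Y_{i})\), so \(V_{i}^{*}=m_{i}\Lambda _{i}^{*}(Y_{i})^{-1}\) has mean \(v_{i}\) and \(V_{i}^{*}(m_{i}-1)\Lambda _{i}^{*}(Y_{i})^{-1}\) has mean \(v_{i}^{2}\). A sketch with arbitrary fixed values:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
v_true = 1.7      # a fixed latent-variable value (illustrative)
Lam_star = 2.0    # Lambda_i^*(Y_i), treated as known here

# Conditional on (v_i, X_i, Y_i), m_i is Poisson with mean v_i * Lambda_i^*(Y_i).
m = rng.poisson(v_true * Lam_star, size=n)

V_star = m / Lam_star
print(V_star.mean())                         # ~ v_true
print((V_star * (m - 1) / Lam_star).mean())  # ~ v_true ** 2
```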

Of course, \(\gamma _{0}\) and \(\Lambda _{0}(t)\) are unknown, but they can be consistently estimated by fitting model (1). Following Wang et al. (2001), given \(v_{i}\) and \(X_{i}\), the recurrent event process is a nonhomogeneous Poisson process under model (1). Since \(m_{i}\) denotes the total number of recurrent events for subject i, it follows that given (\(v_{i}\), \(X_{i},\) \(Y_{i}\)), \(m_{i}\) has a Poisson distribution with mean \(v_{i}\Lambda _{0}(Y_{i})e^{\gamma _{0}'X_{i}}\). Define \(F(t)={\Lambda _{0}(t)/\Lambda _{0}(\tau )}\). Then F(t) can be estimated by

$$\begin{aligned} \widehat{F}(t)=\prod _{t<s\le \tau }\left( 1-\frac{\sum _{i=1}^{n}dN_{i}(s)}{\sum _{i=1}^{n}\Delta _{i}(s)N_{i}(s)}\right) . \end{aligned}$$
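A direct (if inefficient) implementation of the product-limit estimator of Wang et al. (2001): the product runs over the distinct observed recurrent event times in \((t,\tau ]\), with the numerator counting events at s and the denominator counting pairs (i, j) with \(T_{ij}\le s\le Y_{i}\). A sketch on made-up data:

```python
import numpy as np

def F_hat(t, event_times, Y, tau):
    """Product-limit estimator of F(t) = Lambda_0(t) / Lambda_0(tau).
    event_times[i] lists the recurrent event times T_ij of subject i;
    Y[i] is that subject's follow-up time."""
    all_times = np.concatenate([np.asarray(ts, dtype=float) for ts in event_times])
    prod = 1.0
    for s in np.unique(all_times):
        if t < s <= tau:
            d = np.sum(all_times == s)                        # events at s
            R = sum(np.sum((np.asarray(ts) <= s) & (s <= y))  # "risk set" at s
                    for ts, y in zip(event_times, Y))
            prod *= 1.0 - d / R
    return prod

# Made-up data: three subjects with their event times and follow-up times.
events = [[0.5, 1.5], [0.4, 1.0], [0.8]]
Y = [2.0, 1.2, 1.6]
print([round(F_hat(t, events, Y, 2.0), 3) for t in (0.5, 1.0, 2.0)])
# -> [0.333, 0.667, 1.0]
```

Note that \(\widehat{F}\) is nondecreasing in t and equals 1 at \(\tau \), as the empty product.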

Let \(X_{i}^{*}=(1,X_{i}')'\), \(\alpha _{1}=\log \Lambda _{0}(\tau )\) and \(\alpha =(\alpha _{1},\gamma _{0}')'\). Using the generalized estimating equation approach, \(\alpha \) can be estimated by solving

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}X_{i}^{*}(m_{i}\widehat{F}^{-1}(Y_{i})-\exp \{\alpha 'X_{i}^{*}\})=0. \end{aligned}$$
(6)

Let \(\widehat{\alpha }=(\widehat{\alpha }_{1},\widehat{\gamma }')'\) denote the solution to the foregoing estimating equation. Then \(\Lambda _{0}(t)\) can be estimated by \(\widehat{\Lambda }_{0}(t)=e^{\widehat{\alpha }_{1}}\widehat{F}(t)\). Define \(\widehat{\Lambda }_{i}(Y_{i})=\widehat{\Lambda }_{0}(Y_{i})e^{\widehat{\gamma }'X_{i}}\) and \(\widehat{V}_{i}=m_{i}\widehat{\Lambda }_{i}(Y_{i})^{-1}.\)
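Because the estimating equation (6) has the same form as a Poisson-regression score with "responses" \(m_{i}\widehat{F}^{-1}(Y_{i})\), Newton's method solves it directly. A sketch on synthetic data, taking \(\widehat{F}\) as the true F for simplicity (\(\Lambda _{0}(t)=t\), \(\tau =1\), \(v_{i}\equiv 1\), all purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
X = rng.binomial(1, 0.5, size=n).astype(float)
Xstar = np.column_stack([np.ones(n), X])   # X_i^* = (1, X_i')'
Y = rng.uniform(0.5, 1.0, size=n)

# Synthetic data with Lambda_0(t) = t and tau = 1, so F(t) = t and
# alpha_1 = log Lambda_0(tau) = 0; the frailty v_i is set to 1.
gamma0 = 0.5
m = rng.poisson(Y * np.exp(gamma0 * X))
w = m / Y                                  # m_i / F_hat(Y_i)

# Newton's method for (1/n) sum_i X_i^* (w_i - exp(alpha' X_i^*)) = 0.
alpha = np.zeros(2)
for _ in range(25):
    mu = np.exp(Xstar @ alpha)
    score = Xstar.T @ (w - mu) / n
    jac = -(Xstar * mu[:, None]).T @ Xstar / n   # Jacobian of the score
    alpha = alpha - np.linalg.solve(jac, score)
print(alpha)   # roughly (0.0, 0.5)
```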

Let \(\mathcal {A}_{0}(t)=\int _{0}^{t}\alpha _{0}(s)ds\). Then for given \(\eta \) and \(\theta ,\) by replacing \(V_{i}^{*}\) with \(\widehat{V}_{i}\) in the estimating equation (3), a reasonable estimator for \(\mathcal {A}_{0}(t)\) is the solution to

$$\begin{aligned} \sum _{i=1}^{n}dN_{i}^{D}(t)-\{\eta 'X_{i}+\theta \widehat{V}_{i}+\alpha _{0}(t)\}\Delta _{i}(t)dt=0. \end{aligned}$$

The solution to the above estimating equation is

$$\begin{aligned} \widehat{\mathcal {A}}_{0}(t;\eta ,\theta )=\int _{0}^{t}\frac{\sum _{i=1}^{n}\big [dN_{i}^{D}(s)-\{\eta 'X_{i}+\theta \widehat{V}_{i}\}\Delta _{i}(s)ds\big ]}{\sum _{i=1}^{n}\Delta _{i}(s)}. \end{aligned}$$

Replacing \(\mathcal {A}_{0}(t)\) with \(\widehat{\mathcal {A}}_{0}(t;\eta ,\theta )\) in (4) and (5), and replacing \(V_{i}^{*}\) and \(\Lambda _{i}^{*}(Y_{i})\) with \(\widehat{V}_{i}\) and \(\widehat{\Lambda }_{i}(Y_{i})\), respectively, we obtain the following two estimating functions for \(\eta \) and \(\theta \):

$$\begin{aligned} U_{1}(\eta ,\theta )= & {} \sum _{i=1}^{n}\int _{0}^{\tau }Q(t)\{X_{i}-\bar{X}(t)\}\Big \{ dN_{i}^{D}(t)-\{\eta 'X_{i}+\theta \widehat{V}_{i}\}\Delta _{i}(t)dt\Big \},\\ U_{2}(\eta ,\theta )= & {} \sum _{i=1}^{n}\int _{0}^{\tau }Q(t)\Big [\{\widehat{V}_{i}-\bar{V}(t)\}\big \{ dN_{i}^{D}(t)-\eta 'X_{i}\Delta _{i}(t)dt\big \}\\&-\,\theta \{\widehat{\Omega }_{i}-\widehat{V}_{i}\bar{V}(t)\}\Delta _{i}(t)dt\Big ], \end{aligned}$$

where

$$\begin{aligned} \widehat{\Omega }_{i}=m_{i}(m_{i}-1)\widehat{\Lambda }_{i}(Y_{i})^{-2}, \end{aligned}$$

and

$$\begin{aligned}&\bar{X}(t)=\frac{\sum _{i=1}^{n}\Delta _{i}(t)X_{i}}{\sum _{i=1}^{n}\Delta _{i}(t)},\\&\bar{V}(t)=\frac{\sum _{i=1}^{n}\Delta _{i}(t)\widehat{V}_{i}}{\sum _{i=1}^{n}\Delta _{i}(t)}. \end{aligned}$$

Let \(\widehat{\eta }\) and \(\widehat{\theta }\) denote the solutions to \(U_{1}(\eta ,\theta )=0\) and \(U_{2}(\eta ,\theta )=0.\) Then \(\widehat{\eta }\) and \(\widehat{\theta }\) have the explicit form

$$\begin{aligned} \left( \begin{array}{c} \widehat{\eta }\\ \widehat{\theta } \end{array}\right) =\widehat{A}^{-1}\left\{ \frac{1}{n}\sum _{i=1}^{n}\int _{0}^{\tau }Q(t)\left( \begin{array}{c} X_{i}-\bar{X}(t)\\ \widehat{V}_{i}-\bar{V}(t) \end{array}\right) dN_{i}^{D}(t)\right\} , \end{aligned}$$

where

$$\begin{aligned} \widehat{A}=\left( \begin{array}{cc} \widehat{A}_{11} &{} \quad \widehat{A}_{12}\\ \widehat{A}_{12}' &{} \quad \widehat{A}_{22} \end{array}\right) , \end{aligned}$$

and

$$\begin{aligned}&\widehat{A}_{11}=\frac{1}{n}\sum _{i=1}^{n}\int _{0}^{\tau }Q(t)\big \{ X_{i}-\bar{X}(t)\big \}\big \{ X_{i}-\bar{X}(t)\big \}'\Delta _{i}(t)dt,\\&\widehat{A}_{12}=\frac{1}{n}\sum _{i=1}^{n}\int _{0}^{\tau }Q(t)\big \{ X_{i}-\bar{X}(t)\big \}\{\widehat{V}_{i}-\bar{V}(t)\big \}\Delta _{i}(t)dt,\\&\widehat{A}_{22}=\frac{1}{n}\sum _{i=1}^{n}\int _{0}^{\tau }Q(t)\big \{\widehat{\Omega }_{i}-\widehat{V}_{i}\bar{V}(t)\big \}\Delta _{i}(t)dt. \end{aligned}$$
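With \(Q\equiv 1\), the closed form above involves only finite sums and integrals of step functions; the sketch below approximates the integrals on a fine time grid. The inputs \(\widehat{V}_{i}\) and \(\widehat{\Omega }_{i}\) are assumed precomputed as described, and the data passed in are synthetic placeholders meant only to exercise the code:

```python
import numpy as np

def eta_theta_hat(X, Y, delta, V_hat, Omega_hat, tau, n_grid=1000):
    """Grid approximation of the closed-form (eta_hat, theta_hat), Q(t) == 1."""
    n, p = X.shape
    A = np.zeros((p + 1, p + 1))
    b = np.zeros(p + 1)

    # dN_i^D(t) puts a point mass at Y_i when delta_i = 1.
    for i in np.flatnonzero(delta):
        at_risk = Y >= Y[i]
        b[:p] += X[i] - X[at_risk].mean(axis=0)
        b[p] += V_hat[i] - V_hat[at_risk].mean()

    # Riemann sums over t for A_11, A_12 and A_22.
    dt = tau / n_grid
    for t in np.linspace(0.0, tau, n_grid, endpoint=False):
        at_risk = Y >= t
        if not at_risk.any():
            break
        Xc = X[at_risk] - X[at_risk].mean(axis=0)
        Vbar = V_hat[at_risk].mean()
        Vc = V_hat[at_risk] - Vbar
        A[:p, :p] += Xc.T @ Xc * dt
        A[:p, p] += Xc.T @ Vc * dt
        A[p, :p] += Xc.T @ Vc * dt   # A_12'
        A[p, p] += (Omega_hat[at_risk] - V_hat[at_risk] * Vbar).sum() * dt
    return np.linalg.solve(A / n, b / n)

# Synthetic placeholder inputs, only to show the call signature.
rng = np.random.default_rng(5)
n = 200
X = rng.standard_normal((n, 2))
Y = rng.uniform(0.2, 2.0, size=n)
delta = rng.binomial(1, 0.5, size=n)
V_hat = rng.gamma(2.0, 0.5, size=n)
Omega_hat = V_hat ** 2
est = eta_theta_hat(X, Y, delta, V_hat, Omega_hat, tau=2.0)
print(est.shape)   # (3,): (eta_hat', theta_hat)
```

In practice one would replace the grid sums with exact sums over the ordered \(Y_{i}\), since the integrands are step functions.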

Using the law of large numbers and the consistency of \(\widehat{\gamma }\) and \(\widehat{\Lambda }_{0}(t)\) (Wang et al. 2001), one can show that \(\widehat{\eta }\) and \(\widehat{\theta }\) are consistent. To establish the asymptotic normality of \(\widehat{\eta }\) and \(\widehat{\theta }\), let \(T_{ij}\) denote the occurrence time of the jth event of the ith subject, and let \(P_{1n}(x,y,m)\) and \(P_{2n}(x,y,m,\delta )\) denote the joint probability measures of \((X_{i},Y_{i},m_{i})\) and \((X_{i},Y_{i},m_{i},\delta _{i})\), respectively. Here we assume that the random vectors \((X_{i},Y_{i},m_{i})\) and \((X_{i},Y_{i},m_{i},\delta _{i})\) are defined on the same sample space for all subjects i, and are independent and identically distributed, so the probability measures \(P_{1n}\) and \(P_{2n}\) do not depend on i. Define

$$\begin{aligned}&\widehat{H}(t)=\frac{1}{n}\sum _{i=1}^{n}\sum _{j=1}^{m_{i}}I(T_{ij}\le t),\\&\widehat{R}(t)=\frac{1}{n}\sum _{i=1}^{n}\sum _{j=1}^{m_{i}}I(T_{ij}\le t\le Y_{i}),\\&\widehat{\kappa }_{i}(t)=\sum _{j=1}^{m_{i}}\Big \{\int _{t}^{\tau }\frac{I(T_{ij}\le u\le Y_{i})d{\widehat{H}}(u)}{{\widehat{R}}^{2}(u)}-\frac{I(t<T_{ij}\le \tau )}{{\widehat{R}}(T_{ij})}\Big \},\\&\widehat{e}_{i}=X_{i}^{*}\Big [\frac{m_{i}}{\widehat{F}(Y_{i})}-\exp \{\widehat{\alpha }'X_{i}^{*}\}\Big ]-\int \frac{x^{*}m\widehat{\kappa }_{i}(y)dP_{1n}(x^{*},y,m)}{{\widehat{F}}(y)}, \end{aligned}$$

and

$$\begin{aligned} \widehat{D}_{1}=\frac{1}{n}\sum _{i=1}^{n}\exp \{\widehat{\alpha }'X_{i}^{*}\}X_{i}^{*\otimes 2}, \end{aligned}$$

where \(v^{\otimes 2}=vv'\) for a column vector v. Furthermore, let \(\widehat{\phi }_{1i}\) denote the vector \(\widehat{D}_{1}^{-1}\widehat{e}_{i}\) without the first entry and \(\widehat{\phi }_{2i}\) denote the first entry of \(\widehat{D}_{1}^{-1}\widehat{e}_{i}\). Set \(\widehat{\varphi }_{i}(t)=\widehat{\kappa }_{i}(t)+\widehat{\phi }_{2i}\), \(\widehat{b}_{i}(c,w)=\widehat{\varphi }_{i}(c)+\widehat{\phi }_{1i}'w\) and \(\widehat{\xi }_{i}=(\widehat{\xi }_{1i}',\widehat{\xi }_{2i})'\), where

$$\begin{aligned} \widehat{\xi }_{1i}=&\int _{0}^{\tau }Q(t)\{X_{i}-\bar{X}(t)\}d\widehat{M}_{i}^{D}(t)\\&+\widehat{\theta }\int _{0}^{\tau }Q(t)\int \{x-\bar{X}(t)\}\frac{mI(y\ge t)}{\widehat{\Lambda }_{0}(y)e^{\widehat{\gamma }'x}}\widehat{b}_{i}(y,x)dP_{1n}(x,y,m)dt, \end{aligned}$$

and

$$\begin{aligned} \widehat{\xi }_{2i}=&\int _{0}^{\tau }Q(t)\Big [\{\widehat{V}_{i}-\bar{V}(t)\}\big \{ dN_{i}^{D}(t)-\Delta _{i}(t)\big (\widehat{\eta }'X_{i}dt+d\widehat{\mathcal {A}}_{0}(t)\big )\big \}\\&-\widehat{\theta }\{\widehat{\Omega }_{i}-\widehat{V}_{i}\bar{V}(t)\}\Delta _{i}(t)dt\Big ]-\int Q(y)\frac{m\delta }{\widehat{\Lambda }_{0}(y)e^{\widehat{\gamma }'x}}\widehat{b}_{i}(y,x)dP_{2n}(x,y,m,\delta )\\&+\int _{0}^{\tau }Q(t)\int \{\widehat{\eta }'x-\widehat{\theta }\bar{V}(t)\}\frac{mI(y\ge t)}{\widehat{\Lambda }_{0}(y)e^{\widehat{\gamma }'x}}\widehat{b}_{i}(y,x)dP_{1n}(x,y,m)dt\\&+\int _{0}^{\tau }Q(t)\int \frac{mI(y\ge t)}{\widehat{\Lambda }_{0}(y)e^{\widehat{\gamma }'x}}\widehat{b}_{i}(y,x)dP_{1n}(x,y,m)d\widehat{\mathcal {A}}_{0}(t)\\&+\widehat{\theta }\int _{0}^{\tau }Q(t)\int \frac{2m(m-1)I(y\ge t)}{\{\widehat{\Lambda }_{0}(y)e^{\widehat{\gamma }'x}\}^{2}}\widehat{b}_{i}(y,x)dP_{1n}(x,y,m)dt, \end{aligned}$$

where \(d\widehat{M}_{i}^{D}(t)=dN_{i}^{D}(t)-\Delta _{i}(t)\{\widehat{\eta }'X_{i}+\widehat{\theta }\widehat{V}_{i}\}dt-\Delta _{i}(t)d\widehat{\mathcal {A}}_{0}(t).\) The asymptotic joint normality of \(\widehat{\eta }\) and \(\widehat{\theta }\) is established in the following theorem with the proof given in the Appendix.

Theorem 2.1

Under the regularity conditions (R1)-(R4) stated in the Appendix, \(n^{1/2}(\widehat{\eta }-\eta _{0})\) and \(n^{1/2}(\widehat{\theta }-\theta _{0})\) have asymptotic joint normal distribution with mean zero and covariance matrix that can be consistently estimated by \(\widehat{A}^{-1}\widehat{\Sigma }\widehat{A}^{-1},\) where \(\widehat{\Sigma }=n^{-1}\sum _{i=1}^{n}\widehat{\xi }_{i}^{\otimes 2}.\)

We note finally that the estimates \(\widehat{\mathcal {A}}_{0}(t;\widehat{\eta },\widehat{\theta })\) and \(\widehat{\Lambda }_{0}(t)\) may not be nondecreasing functions. To correct for this, we could replace them by

$$\begin{aligned} \widehat{\mathcal {A}}_{0}^{\star }(t;\widehat{\eta },\widehat{\theta })= & {} \max _{s\le t}\widehat{\mathcal {A}}_{0}(s;\widehat{\eta },\widehat{\theta })\\ \widehat{\Lambda }_{0}^{*}(t)= & {} \max _{s\le t}\widehat{\Lambda }_{0}(s) \end{aligned}$$

as suggested by Lin and Ying (1994). They noted, however, that under regularity conditions, \(\widehat{\mathcal {A}}_{0}^{\star }(\cdot )-\widehat{\mathcal {A}}_{0}(\cdot )=o_{p}(n^{-\frac{1}{2}})\) and \(\widehat{\Lambda }_{0}^{*}(\cdot )-\widehat{\Lambda }_{0}(\cdot )=o_{p}(n^{-\frac{1}{2}})\), so that the asymptotic distribution of \(\left( n^{1/2}(\widehat{\eta }-\eta _{0}),n^{1/2}(\widehat{\theta }-\theta _{0})\right) \) is unaffected.
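The running-maximum correction is a one-liner in practice. A sketch with made-up grid values:

```python
import numpy as np

# Estimated cumulative function on a time grid (made-up values, not monotone).
A_hat = np.array([0.0, 0.3, 0.25, 0.6, 0.55, 0.9])
A_star = np.maximum.accumulate(A_hat)   # running maximum: max over s <= t
print(A_star)   # values 0, 0.3, 0.3, 0.6, 0.6, 0.9 -- now nondecreasing
```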

2.2 Simulation studies

In this section, we report simulation studies examining the finite sample performance of the estimators proposed in Sect. 2.1. For each subject i, we generate the covariate vector \(X_{i}=(X_{i1},X_{i2})'\), where \(X_{i1}\) is generated from a Bernoulli distribution with success probability 0.5 and \(X_{i2}\) from a uniform distribution on (0,1). The latent variable \(v_{i}\) is generated from a gamma distribution with mean 1 and variance 1 if \(X_{i1}=1\), and from a uniform distribution on (0.5, 1.5) otherwise. It is easily verified that \(v_{i}\) is nonnegative and \(E(v_{i}|X_{i})=1\). Given \(X_{i}\) and \(v_{i}\), the terminal event time \(D_{i}\) is assumed to follow the additive hazards model:

$$\begin{aligned} \alpha _{i}(t)=1+\eta _{0}'X_{i}+\theta _{0}v_{i}, \end{aligned}$$

where \(\eta _{0}=(-0.5,0.5)'\) or \((0.5,-0.5)'\), and \(\theta _{0}=-0.5\) or 0.5. The censoring time \(C_{i}\) is taken as \(\min (E+0.5,\tau )\), where E is exponentially distributed with mean 1 and the end time \(\tau \) is taken to be \(\tau =2.5\), which yields a censoring percentage ranging between 22 and \(39\,\%\).

For the recurrent event process, given \(X_{i}\), \(v_{i}\) and \(Y_{i}=D_{i}\wedge C_{i}\), the recurrent event times are generated from a Poisson process with the intensity function:

$$\begin{aligned} \lambda _{i}(t)=v_{i}\exp \{\gamma _{0}'X_{i}\}\lambda _{0}(t), \end{aligned}$$

where \(\gamma _{0}=(-0.5,0.5)'\) or \((0.5,-0.5)'\), and \(\lambda _{0}(t)=2.5(1+t)\). The average number of recurrent events per subject ranges from 2.31 to 4.23 across the model parameter settings. For each simulation study, we take the weight function to be \(Q(\cdot )\equiv 1,\) and sample sizes are chosen to be \(n=200\) and \(n=300.\) To help assess the accuracy of our simulation estimates, we apply a stratified sampling scheme: for each n, we simulate 4000 replications in \(m=20\) groups (strata) of 200 replications each, compute the estimator for each replication, and average it over the 200 replications within each stratum. By the Central Limit Theorem, these 20 stratum averages are iid and approximately normally distributed, so a confidence interval for the true value of each quantity can be computed using the t-distribution with \(m-1=19\) degrees of freedom and the standard errors reported in parentheses next to each simulation estimate. This allows one to assess how many significant digits of the simulation estimates are likely to be accurate.

Tables 1, 2, and 3 present the simulation results for the estimation of \(\gamma _{0}\), \(\eta _{0}\) and \(\theta _{0}\), respectively. The tables include the bias (BIAS), equal to the estimated value minus the true value, the model-based standard deviation estimates (SE), the non-model-based standard deviation estimates (ESE), and the empirical coverage probabilities (CP) of the nominal 95 % normal-approximation confidence intervals based on Theorem 2.1. As mentioned above, the stratified sampling allows us to calculate the simulation standard errors of all the estimates; these standard errors are reported in parentheses after each estimate in the tables.

To make these definitions precise, take the estimation of \(\eta \) as an example. Let \(\widehat{\eta }_{i}\) be the estimate of \(\eta \) from the ith simulation run, \(i=1,2,\ldots ,4000\), and let \(\widehat{\sigma }_{i}\) be the estimated standard deviation of \(\widehat{\eta }_{i}\) according to Theorem 2.1. Define \(\bar{\eta }_{j}\) and \(\bar{\sigma }_{j}\) to be the means of \(\widehat{\eta }_{i}\) and \(\widehat{\sigma }_{i}\), respectively, within the jth stratum, \(j=1,2,\ldots ,20\), and let \(\tilde{\sigma }_{j}\) denote the sample standard deviation of \(\widehat{\eta }_{i}\) within the jth stratum. SE and ESE are the averages of \(\{\bar{\sigma }_{j}\}\) and \(\{\tilde{\sigma }_{j}\}\), respectively, over all j; the sample standard deviations of \(\{\bar{\eta }_{j}\}\), \(\{\bar{\sigma }_{j}\}\) and \(\{\tilde{\sigma }_{j}\}\) over all j, divided by \(\sqrt{m}\), give the simulation standard errors of the three estimates.
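The stratified computation of simulation standard errors and t-based confidence intervals can be sketched as follows, with synthetic normal draws standing in for the 4000 simulation estimates (the t critical value 2.093 for 19 degrees of freedom is hardcoded to keep the sketch dependency-free):

```python
import numpy as np

rng = np.random.default_rng(3)
m_strata, per_stratum = 20, 200
true_value = 0.5

# Stand-in for the 4000 simulation estimates (noisy draws around the truth).
estimates = true_value + 0.1 * rng.standard_normal((m_strata, per_stratum))

stratum_means = estimates.mean(axis=1)   # 20 iid, approximately normal averages
overall = stratum_means.mean()
sim_se = stratum_means.std(ddof=1) / np.sqrt(m_strata)

t_crit = 2.093                           # t quantile, 0.975, 19 df
ci = (overall - t_crit * sim_se, overall + t_crit * sim_se)
print(overall, sim_se, ci)
```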

Tables 1, 2, and 3 show that our proposed method performs reasonably well. Specifically, the proposed estimators are usually close to the true parameter values, though they appear slightly biased. We followed up by repeating the simulations over a larger range of values of n and verified (statistically) that the bias is \(O(n^{-1})\) and therefore negligible with respect to the asymptotic normality result of Theorem 2.1; this makes our estimation method at least comparable to the Monte Carlo EM algorithm proposed by Liu et al. (2004). There is also good agreement between the model-based and non-model-based standard deviations. As expected, the performance of the proposed estimators improves when the sample size increases from 200 to 300. Finally, the confidence intervals based on Theorem 2.1 all have reasonable estimated coverage probabilities.

Table 1 Simulation results for estimation of \(\gamma _{0}\)
Table 2 Simulation results for estimation of \(\eta _{0}\)
Table 3 Simulation results for estimation of \(\theta _{0}\)

3 An application

We now apply the proposed method to hospitalization data for heart failure patients from the clinical data repository at the University of Virginia Health System. The study followed 1475 heart failure patients, of whom about \(20\,\%\) died before the censoring time. Three baseline covariates were measured for each patient: race, gender and age (ranging from 60 to 90). For individual i, let \(X_{i1}\) be a binary indicator of race (white = 1, nonwhite = 0), \(X_{i2}\) a binary indicator of gender (male = 1, female = 0) and \(X_{i3}\) the rescaled age (centered at 72 and divided by 10). The dataset also contains longitudinal medical cost information; however, we disregard this part of the data because our interest here is in analyzing the hospitalizations as recurrent event data.

Table 4 shows the results of fitting model (1) to the recurrent hospitalizations using the procedure proposed in Sect. 2.1. Table 5 shows the results of fitting model (2) to the terminal event with \(Q(\cdot )\equiv 1\), compared with those obtained from the method of Lin and Ying (1994), which does not take the recurrent events into account.

Table 4 Application results for recurrent events
Table 5 Application results for terminal events
Fig. 1

The residual plot of the failure time process versus age for the subjects in the four groups

In the tables, Est denotes the parameter estimate and SE its standard error estimate. The results from our model suggest that white patients tend to be at lower risk of hospitalization, as do younger patients. Age also has a significant effect on the hazard rate of failure, whereas the effect of race is less significant. The estimate of the parameter \(\theta _{0}\) indicates that, not surprisingly in this application, the correlation between the death rate and the rate of hospitalization is positive and significant, with a p-value of \(4.08\times 10^{-15}\). Comparing our results to those from the additive hazards model alone, we see that the covariate effects, especially that of gender, diminish after adjusting for the frailty term, which further strengthens our belief that the association between the recurrent event process and the terminal event process should be taken into account.

To check the adequacy of the model for the data, following Lin et al. (2000) and Zeng and Cai (2010), we examine the cumulative sum of the residuals for each subject,

$$\begin{aligned} \widehat{M}_{i}^{D}=\int _{0}^{Y_{i}}\Big [dN_{i}^{D}(t)-\{\widehat{\eta }'X_{i}+\widehat{\theta }\widehat{V}_{i}\}dt-d\widehat{\mathcal {A}}_{0}(t;\widehat{\eta },\widehat{\theta })\Big ], \end{aligned}$$

which has approximate mean zero and should be approximately independent of \(X_{i}\) under model (2). Thus, a simple graphical procedure for assessing the adequacy of the assumed model is to plot the residuals against the covariates \(X_{i}\). In Fig. 1, we divide the patients into four groups based on race and gender (Male and White, Male and Nonwhite, Female and White, Female and Nonwhite), and plot the cumulative residuals \(\widehat{M}_{i}^{D}\) versus age for each group. The residuals exhibit no trend, which indicates that our model fits the data well.

More formally, we check the goodness-of-fit of the proposed model by performing randomness tests on the distribution of the residuals normalized by their median absolute deviation (MAD), where the MAD of the residuals is defined as

$$\begin{aligned} \mathrm{MAD}=\mathop {\mathrm{median}}_{i}\big (|\widehat{M}_{i}^{D}-\mathop {\mathrm{median}}_{j}(\widehat{M}_{j}^{D})|\big ). \end{aligned}$$
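In code, the MAD normalization can be sketched with a generic helper (not tied to the fitted model):

```python
import numpy as np


def mad_normalize(residuals):
    """Divide residuals by their median absolute deviation (MAD)."""
    r = np.asarray(residuals, dtype=float)
    mad = np.median(np.abs(r - np.median(r)))
    return r / mad


# For this sample the median is 0 and the MAD is 1, so the
# normalized residuals coincide with the originals.
normalized = mad_normalize([-2.0, -1.0, 0.0, 1.0, 4.0])
```

Dividing by the MAD rather than the sample standard deviation makes the normalization robust to the few large residuals typical of survival data.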

First we define the measure of discrepancy between two cumulative distribution functions (CDFs) to be \(D(\mathcal {F}_{1},\mathcal {F}_{2})=\sup _{-\infty<t<\infty }|\mathcal {F}_{1}(t)-\mathcal {F}_{2}(t)|\). Let \(\mathcal {G}\) be the empirical CDF of the MAD-normalized residuals from the fit to the hospitalization data. We then generate \(n=20\) independent sets of residuals by simulating from the model with the parameters estimated from the hospitalization data, which yields n independent and identically distributed empirical-CDF estimates of the null CDF of the normalized residuals. Let \(\mathcal {F}_{n}\) be the average of these n empirical CDFs, and let \(D(\mathcal {G},\mathcal {F}_{n})\) be the discrepancy between the real-data fit and the simulated null distribution.
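The discrepancy measure and the averaged null CDF can be sketched as follows; the supremum is approximated on a fine grid, and standard normal draws stand in for residuals simulated from the fitted model:

```python
import numpy as np


def ecdf(sample):
    """Return a vectorized empirical CDF for a 1-D sample."""
    x = np.sort(np.asarray(sample, dtype=float))
    return lambda t: np.searchsorted(x, t, side="right") / x.size


def discrepancy(F1, F2, grid):
    """sup_t |F1(t) - F2(t)|, approximated over the grid points."""
    return float(np.max(np.abs(F1(grid) - F2(grid))))


rng = np.random.default_rng(2)

# Stand-in for the MAD-normalized residuals from the real-data fit ...
G = ecdf(rng.normal(size=100))
# ... and for the n = 20 independent simulated-null residual sets.
null_cdfs = [ecdf(rng.normal(size=100)) for _ in range(20)]


def F_n(t):
    """Average of the n null empirical CDFs."""
    return np.mean([F(t) for F in null_cdfs], axis=0)


grid = np.linspace(-5.0, 5.0, 2001)
D_obs = discrepancy(G, F_n, grid)
```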

To estimate the p-value associated with \(D(\mathcal {G},\mathcal {F}_{n})\), we use a bootstrap simulation of \(N=1000\) replications to generate N independent and identically distributed observations \(\{D_{1},D_{2},\ldots ,D_{N}\}\). For each replication i (\(i=1,2,\ldots ,N\)), we generate \(n=20\) new independent runs of the simulation using the real-data fitted parameters, which again gives n estimates of the null CDF of the normalized residuals; let \(\mathcal {H}_{n}\) be their average. We then generate one additional empirical CDF \(\mathcal {T}\) using the real-data fitted parameters and set \(D_{i}=D(\mathcal {T},\mathcal {H}_{n})\). The bootstrap estimate of the p-value associated with \(D(\mathcal {G},\mathcal {F}_{n})\) is the fraction of cases, \(i=1,2,\ldots ,N\), in which \(D_{i}\ge D(\mathcal {G},\mathcal {F}_{n})\).
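The full bootstrap p-value computation might then be sketched as follows, again with standard normal draws standing in for residuals simulated from the fitted model (so the observed statistic is itself a null draw here):

```python
import numpy as np

rng = np.random.default_rng(3)
grid = np.linspace(-5.0, 5.0, 2001)
n, N, size = 20, 1000, 100  # CDFs per average, bootstrap reps, residuals per set


def ecdf_on_grid(sample):
    """Empirical CDF of `sample` evaluated on the fixed grid."""
    x = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(x, grid, side="right") / x.size


def simulate_residuals():
    # Placeholder: in the real analysis these come from simulating the
    # fitted model and recomputing the MAD-normalized residuals.
    return rng.normal(size=size)


# Observed statistic D(G, F_n).
G = ecdf_on_grid(simulate_residuals())
F_n = np.mean([ecdf_on_grid(simulate_residuals()) for _ in range(n)], axis=0)
D_obs = np.max(np.abs(G - F_n))

# Null reference distribution D_1, ..., D_N.
D_null = np.empty(N)
for i in range(N):
    H_n = np.mean([ecdf_on_grid(simulate_residuals()) for _ in range(n)], axis=0)
    T = ecdf_on_grid(simulate_residuals())
    D_null[i] = np.max(np.abs(T - H_n))

# Fraction of null draws at least as extreme as the observed statistic.
p_value = np.mean(D_null >= D_obs)
```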

The algorithm yields the statistic \(D(\mathcal {G},\mathcal {F}_{n})=0.021\) with a p-value of 0.430 based on the 1000 realizations of \(D(\mathcal {T},\mathcal {H}_{n})\). This indicates no significant evidence against the goodness-of-fit of our model.

4 Conclusions

Frailty models are commonly used for jointly modeling repeated measures and survival time data (Henderson et al. 2000; Lin et al. 2002). Here, we have proposed a joint model of the recurrent event process and the terminal event through a common subject-specific latent variable, in which a proportional intensity model is used for the recurrent event process and an additive hazards model for the terminal event time. The latent variable (frailty) is assumed to act as a multiplicative factor in the intensity function and as an additive factor in the hazard function, and hence induces correlation between the recurrent event process and the terminal event time. A distinctive feature of our model is that the frailty distribution is treated as a nuisance parameter, with no parametric assumptions imposed on it. Our estimation procedures are easy to implement, and the simulation results show that the proposed methods work well.

Our analysis is applicable in settings where recurrent event data are available for a large number of processes, each exhibiting a relatively small number of recurrent events. Such processes arise frequently in medical studies, where information is often available on many individuals, each of whom may experience transient clinical events repeatedly over a period of observation. Our proposed model is most useful when follow-up can end not only with censoring but also with a terminal event such as death. Our work is therefore directly relevant to applications in the biomedical fields: by performing inference on the estimated parameters, we can analyze the effects of a wide range of possible determinants (such as age, gender, race, environmental condition, genetic information, or medical treatment) on the rates of the repeated clinical events and the risk of death, and also infer the correlation between the recurrent event rate and the risk of the terminal event. Note that the terminal event need not be death. For example, a major injury that forces the retirement of a race horse can be treated as a terminal event, with the injuries during its career forming the recurrent event process. All such data sets can be analyzed with our model.

In applying our model to the heart failure data, the frailty-term coefficient in (2) allows us to determine that the recurrent event process (the rate of hospitalization) and the terminal event process (the death rate) are significantly correlated (positively, in this case). To our knowledge, ours is the first model of this type to allow explicit inference on this correlation, which makes it particularly valuable in biomedical applications.