1 Introduction

Standard survival data analysis involves modeling time to the occurrence of an event, where the event is assumed to occur once for a given subject over time (Lawless 2003). However, more complicated situations may arise in practice. We consider two such commonly encountered problems in survival analysis: (a) recurrent event data analysis (Cook and Lawless 2007), where an individual may experience an event multiple times over follow-up (e.g., recurrent heart attacks of coronary patients); and (b) joint modeling for single-event survival data (Wulfsohn and Tsiatis 1997), where the event time distribution depends on a longitudinally measured internal covariate (e.g., CD4 cell levels may be recorded longitudinally along with an observation of the onset of AIDS).

In many clinical and biomedical studies, the survival event may not necessarily be fatal. Thus, an individual can experience the event repeatedly over time. Such processes are called recurrent event processes, and the data generated by such processes are called recurrent event data (for details, see Cook and Lawless 2007). Over the last few decades, there has been a considerable amount of discussion on methods for recurrent event data analysis based on extensions of the Cox proportional hazards (PH) model. A few key ones include methods based on the counting process approach (Andersen and Gill 1982), methods based on gap times between events (Prentice et al. 1981), methods for analyzing multivariate failure time data (Wei et al. 1989), and methods for PH frailty models (Therneau and Grambsch 2000). Conventional accelerated failure time (AFT) models can also be extended to formulate the effects of covariates on the mean function of the counting process for recurrent events. For example, semi-parametric methods for AFT models were considered by Cheng (2004), Ghosh (2004) and Lin et al. (1998). A number of authors also considered generalizations of the AFT model for recurrent event data analysis. For example, a general class of accelerated means regression models was considered by Sun and Su (2008), a class of additive-accelerated means regression models was proposed by Li et al. (2010), an AFT modelling strategy that shares the same spirit as quantile regression was considered by Huang and Peng (2009) and Zhao et al. (2016), and AFT-type marginal models were discussed by Xu et al. (2017).

The other problem considered in this paper is joint modeling, where a longitudinal response is observed along with an observation of the time to the occurrence of an event. A typical goal in such studies is to investigate the effect of the longitudinal response (internal covariate for the event process) on the time to occurrence of the event. As discussed by Tsiatis and Davidian (2004), some common issues encountered in such studies include (a) measurement error in the longitudinal response, and (b) missing information over time (longitudinal response measurements are usually collected only intermittently over time, leading to missing observations). Failure to account for these issues leads to biased estimates of the regression parameters (Rizopoulos 2012). The modern approach to analyzing these types of data involves two separate models: a model that takes into account measurement error and missing data in the internal time-dependent covariate to construct the complete longitudinal history for each individual (longitudinal model), and another model that uses the longitudinal history of each individual to quantify the association between the internal covariate and the time to the occurrence of the event (time-to-event model). The motivating idea behind the joint modeling techniques is to link the longitudinal model with the time-to-event process through shared random effects (Henderson et al. 2000; Rizopoulos 2012; Tsiatis and Davidian 2004; Wulfsohn and Tsiatis 1997). There has been a considerable amount of discussion on methods for joint modeling over the last few years. For example, the EM or the Monte Carlo EM algorithm was developed to estimate parameters in the joint models (Tseng et al. 2005; Wu et al. 2010; Zhang and Wu 2019); the Bayesian method was also developed for estimation (Rizopoulos et al. 2014); the dependence of longitudinal measurements on the survival process through rescaling the time index was considered by Cai et al. (2017); the use of some latent features of the longitudinal component in the survival submodel, instead of the whole longitudinal component as a covariate, was discussed by Dong et al. (2019); joint modeling for a nonlinear mixed-effects model with missing data was considered by Wu et al. (2010); and extensions to the survival submodels were considered by Elashoff et al. (2008) for competing risks problems, Chi and Ibrahim (2006) to model cure fractions, and Chen et al. (2004) and Tseng et al. (2005) to go beyond PH models.

The most popular approach for recurrent event data analysis and joint modeling is to assume a PH model to describe the event process, where the covariates act multiplicatively on the hazard function (Kalbfleisch and Prentice 2002; Lawless 2003). For example, the semi-parametric Cox PH model (Cox 1972) is widely used for recurrent event data analysis, whereas parametric PH models are commonly considered to describe the time-to-event process for joint modeling. Note that the use of the semi-parametric Cox PH model in joint modeling usually leads to an underestimation of the standard errors of the parameter estimates (Hsieh et al. 2006; Rizopoulos 2012), and therefore most methods for joint modeling are based on parametric response distributions (Hwang and Pennell 2014). The key assumption in the PH model is the proportionality assumption concerning the effects of the covariates. Although there are well-known techniques to test the proportionality assumption for the Cox PH model (e.g., Schoenfeld 1980), checking this assumption for joint modeling has received less attention in the literature. An alternative framework involves considering an AFT model, where the covariates act multiplicatively on time (i.e., the effect of covariates is to accelerate or decelerate event time). Since an AFT model is formulated to allow the entire covariate process to influence the outcome, it is also considered an excellent alternative to the PH model for time-varying covariates (Zhang and Wu 2019). The AFT models are also in general less restrictive (Cook and Lawless 2007). The general framework of recurrent event models based on the AFT family of distributions has been discussed in Cook and Lawless (2007), though these models are not in common use, perhaps because of the lack of implementation and computing resources. For joint modeling, Tseng et al. (2005) described the maximum likelihood approach under the AFT assumption, and highlighted some challenges of modeling the baseline hazard function in a likelihood setting. To circumvent this problem, they considered a piecewise constant baseline hazard function.

The Weibull, log-logistic and log-normal distributions are the most commonly used AFT models for time-to-event data analysis. The log-logistic and log-normal distributions are useful to model unimodal hazard shapes, whereas the Weibull model is widely used to characterize monotone increasing and decreasing hazard functions (Lawless 2003). The generalized gamma (GG) distribution (Stacy 1962) is a more flexible AFT model, which includes the Weibull, gamma and log-normal distributions as special cases. It accommodates all four standard shapes of the hazard function (increasing, decreasing, unimodal and bathtub shaped), and can be used to model data with wide ranging characteristics. Mudholkar and Srivastava (1993, 1995) proposed the exponentiated Weibull (EW) distribution that also accommodates all four standard shapes of the hazard function. It includes the Weibull, exponentiated exponential and Burr type X distributions as special cases. Cox and Matheson (2014) showed that the EW could be a convenient alternative to the GG distribution, and suggested further investigation to gain a deeper understanding of the EW distribution as a lifetime model. For completeness of description, a summary of these distributions is presented in Table 1.

Table 1 Survival functions and hazard functions for some common AFT models for a non-negative random variable T (the inverse-scale parameter is denoted by \(\rho \), and the shape parameters are denoted by \(\upkappa \) and \(\gamma \))

As far as computational resources are concerned, the eha package (Broström 2018) in R has implemented only the basic AFT models for recurrent event data analysis (Weibull, log-logistic and log-normal), whereas the main focus of the JM package (Rizopoulos 2010) for joint modeling is on the PH models (only one AFT model has been implemented, which is the Weibull AFT). As described above, there are richer AFT models (e.g., generalized gamma, exponentiated Weibull) that can describe data with wide ranging characteristics but have received far less attention in modeling recurrent event data and joint analysis of time-to-event and longitudinal data. Note that the maximum likelihood method is convenient and easy to implement for recurrent event data analysis. On the other hand, fitting joint models is a computationally intensive task, as it requires approximating multiple integrals that do not have an analytic solution except in very special cases. In particular, numerical methods such as Gaussian quadratures can be computationally very intensive and may fail to converge (Zhang and Wu 2019). Hence, it may be advantageous to apply a Bayesian approach for joint modeling and the maximum likelihood method for recurrent event analysis. In the Bayesian paradigm, asymptotic approximations are not necessary, model assessment is more straightforward, and computational implementation is typically much easier (Gould et al. 2015).

In this article, we develop methods for recurrent event data analysis and joint modeling by considering the AFT models of Table 1 to describe the event process. In particular, we consider two very flexible AFT models (generalized gamma and exponentiated Weibull), which, to our knowledge, have never been discussed extensively for recurrent event data analysis and joint modeling. Note that many authors have shown the benefits of undertaking a flexible parametric analysis while gaining the advantages of a parametric approach (e.g., the flexible modeling of the hazard function and of time-dependent effects, and estimation of the survival probabilities). For joint modeling, AFT models are biologically meaningful and allow the entire covariate history to influence subject-specific risk (Tseng et al. 2005). In light of our discussion above, we consider the maximum likelihood method for recurrent event analysis and the Bayesian approach for joint modeling. We also develop computational algorithms for statistical inference, implemented in the R statistical software (R Core Team 2020). Finally, an R package has been developed to fit these models, which includes several utility functions that can extract useful information from fitted models. This work can be seen as a complement to the existing work on recurrent event data analysis and joint modeling.

In Sect. 2, we present the modeling framework, maximum likelihood method for statistical inference, model diagnostics and computational techniques for recurrent event data analysis. Joint modeling techniques in the Bayesian framework are described in Sect. 3. A Monte Carlo simulation study is presented in Sect. 4, investigating the relative performance of the models for recurrent event data analysis and joint modeling. In Sect. 5, the practical relevance of these models is demonstrated with applications to two real datasets. We conclude in Sect. 6 by summarizing our findings.

2 Recurrent event modeling

Let \(\{\mathbf {z}(t),t\ge 0\}\) be a left-continuous \(p\times 1\) covariate process which includes only external covariates, with \(\mathbf {z}(t)=(z_1(t),z_2(t),\ldots ,z_p(t))'\). The intensity function of the AFT model can be expressed as

$$\begin{aligned} \lambda (t)=\lambda _0(g(t))\exp \{-\mathbf {z}'(t)\varvec{\beta }\}, \end{aligned}$$
(1)

where \(\varvec{\beta }\) is the vector of regression coefficients of the same length as \(\mathbf {z}(t)\), \(\lambda _0(\cdot )\) is the baseline intensity function (corresponding to an individual for whom \(\mathbf {z}(t)=\mathbf {0}\) for all \(t\ge 0\)), and \(g(t)=\int _0^t \exp \{-\mathbf {z}'(u)\varvec{\beta }\}du\) (Cook and Lawless 2007). Note that g(t) can be considered as a transformed time scale defined by the covariate process, and hence (1) is commonly known as a time transform model. The corresponding mean function is

$$\begin{aligned} \Lambda (t)=\Lambda _0(g(t)), \end{aligned}$$
(2)

where \(\Lambda _0(t)=\int _0^t \lambda _0(u)du\). We assume that \(\lambda _0(t)\) takes the form of a Weibull, log-logistic, log-normal, EW or GG intensity function (see Table 1).
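To make the time-transform idea concrete, the following sketch evaluates g(t) in (1) directly for a subject whose covariate path is piecewise constant; the function name, arguments and example values are ours for illustration and are not part of any package.

```r
# Minimal sketch (our own helper): evaluate g(t) = int_0^t exp(-z'(u) beta) du
# from (1) for a piecewise-constant covariate path.
g_transform <- function(t, breaks, Z, beta) {
  # breaks: left endpoints of the covariate intervals, starting at 0
  # Z: one row of covariate values per interval
  ends <- c(breaks[-1], Inf)
  dur <- pmin(pmax(t - breaks, 0), ends - breaks)   # time spent in each interval before t
  sum(dur * exp(-as.vector(Z %*% beta)))
}

# Example: one binary covariate switching from 0 to 1 at time 2, beta = 0.5:
# g(3) = 2 * exp(0) + 1 * exp(-0.5) = 2.61
g_transform(t = 3, breaks = c(0, 2), Z = matrix(c(0, 1), ncol = 1), beta = 0.5)
```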

2.1 Maximum likelihood method

Suppose m subjects are each under observation from time \(t_0 = 0\) to a censoring or stopping time. For the ith subject (\(i=1,2,\ldots ,m\)), the time-dependent covariates are assumed to be constant between assessment times \(0=t_{i0}<t_{i1}<\ldots <t_{in_i}=t_i\). Let \(z_{ijk}\) be the value of the kth covariate (\(k=1,2,\ldots ,p\)) for the ith subject in the jth interval \((t_{i,j-1},t_{ij}]\), \(j=1,2,\ldots ,n_i\), with \(\mathbf {z}_{ij}=(z_{ij1},z_{ij2},\ldots ,z_{ijp})'\). Using (1), the intensity function for subject i at time \(t_{ij}\) can be written as

$$\begin{aligned} \lambda _i(t_{ij})=\lambda _0(t_{ij}\exp (-\mathbf {z}'_{ij}\varvec{\beta }))\exp (-\mathbf {z}'_{ij}\varvec{\beta }). \end{aligned}$$
(3)

Let \(d_{ij}\) be the event status (1 = failure, 0 = censored) for the ith subject in the jth time interval \((t_{i,j-1},t_{ij}]\). Letting \(\varvec{\theta }\) be the full vector of parameters, the likelihood function based on data from m independent individuals is

$$\begin{aligned} L(\varvec{\theta })=\prod _{i=1}^m L_i(\varvec{\theta }), \end{aligned}$$

where

$$\begin{aligned} L_i(\varvec{\theta })&=\prod _{j=1}^{n_i} \bigg [\lambda _i(t_{ij})\exp \bigg \{-\int _{t_{i,j-1}}^{t_{ij}}\lambda _i(s)ds\bigg \}\bigg ]^{d_{ij}} \times \bigg [\exp \bigg \{-\int _{t_{i,j-1}}^{t_{ij}}\lambda _i(s)ds\bigg \}\bigg ]^{1-d_{ij}}&\nonumber \\&=\prod _{j=1}^{n_i} \big [\lambda _0(t_{ij}\exp (-\mathbf {z}'_{ij}\varvec{\beta }))\exp (-\mathbf {z}'_{ij}\varvec{\beta })\big ]^{d_{ij}}\times \exp \bigg \{-\int _{t_{i,j-1}e^{-\mathbf {z}'_{ij}\varvec{\beta }}}^{t_{ij}e^{-\mathbf {z}'_{ij}\varvec{\beta }}}\lambda _0(u)du\bigg \}&\nonumber \\&=\prod _{j=1}^{n_i} \big [\lambda _0(g_i(t_{ij}))\exp (-\mathbf {z}'_{ij}\varvec{\beta })\big ]^{d_{ij}}\times \exp \{\Lambda _0(g_i(t_{i,j-1}))-\Lambda _0(g_i(t_{ij}))\},&\end{aligned}$$
(4)

with \(g_i(t_{ij})=t_{ij}\exp {(-\mathbf {z}'_{ij}\varvec{\beta })}\). As an example, the log-likelihood function using the EW hazard function can be written as

$$\begin{aligned} \ell (\varvec{\theta })&=\sum _{i=1}^{m}\sum _{j=1}^{n_i} d_{ij}\Big \{\log {\upkappa }+\log \gamma +\upkappa \log \rho +(\upkappa -1)\log (g_i(t_{ij}))&\nonumber \\&\quad +(\gamma -1)\log (1-e^{-(\rho g_i(t_{ij}))^\upkappa })-(\rho g_i(t_{ij}))^\upkappa&\nonumber \\&\quad -\log [1-(1-e^{-(\rho g_i(t_{ij}))^\upkappa })^\gamma ]-\mathbf {z}'_{ij}\varvec{\beta }\Big \}&\nonumber \\&\quad -\sum _{i=1}^{m}\sum _{j=1}^{n_i} \log \bigg \{\frac{1-(1-e^{-(\rho g_i(t_{i,j-1}))^\upkappa })^\gamma }{1-(1-e^{-(\rho g_i(t_{ij}))^\upkappa })^\gamma }\bigg \},&\end{aligned}$$
(5)

where \(\rho >0\) is the inverse-scale parameter and \(\upkappa ,\gamma >0\) are the shape parameters of the EW distribution, and \(\varvec{\theta }=(\rho ,\upkappa ,\gamma ,\varvec{\beta }')'\). Other models where \(\lambda _0(t)\) takes the form of a Weibull, log-normal, log-logistic or a GG distribution hazard function can similarly be handled.
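As an illustration of (5), a minimal R implementation of the EW log-likelihood for data held in counting-process form (start, stop, status and a covariate matrix) could look as follows; the helper names and the log-scale parameterization are our own choices, not those of any existing package.

```r
# Minimal sketch of the EW log-likelihood (5); data in counting-process form.
# theta = (log rho, log kappa, log gamma, beta), so the optimizer works on an
# unconstrained parameter space.
ew_Lambda0 <- function(t, rho, kap, gam) -log(1 - (1 - exp(-(rho * t)^kap))^gam)
ew_loglam0 <- function(t, rho, kap, gam) {
  u <- (rho * t)^kap
  log(kap) + log(gam) + kap * log(rho) + (kap - 1) * log(t) +
    (gam - 1) * log(1 - exp(-u)) - u - log(1 - (1 - exp(-u))^gam)
}
ew_loglik <- function(theta, start, stop, status, Z) {
  rho <- exp(theta[1]); kap <- exp(theta[2]); gam <- exp(theta[3])
  beta <- theta[-(1:3)]
  eta <- as.vector(Z %*% beta)          # z'_ij beta for each data line
  g1 <- start * exp(-eta)               # g_i(t_{i,j-1})
  g2 <- stop  * exp(-eta)               # g_i(t_ij)
  sum(status * (ew_loglam0(g2, rho, kap, gam) - eta)) +
    sum(ew_Lambda0(g1, rho, kap, gam) - ew_Lambda0(g2, rho, kap, gam))
}
```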

Tests and interval estimates for the model parameters are based on the approximate normality of the maximum likelihood estimators (Cook and Lawless 2007): \((\varvec{\hat{\theta }}-\varvec{\theta }) \sim N(\mathbf {0},I^{-1}(\varvec{\hat{\theta }}))\), where \(I(\varvec{\hat{\theta }})\) is the observed information matrix evaluated at \(\varvec{\hat{\theta }}\). An alternative is to use the likelihood ratio method, which is particularly useful to test hypotheses concerning nested models. For example, the Weibull model is nested within the EW and GG models, and hence the likelihood ratio test can be employed to check the adequacy of the Weibull fit with respect to the EW or GG fit. Note that the Akaike information criterion (AIC) (Akaike 1974) is also useful for comparisons between a number of possible models.

The counting process format is useful for numerical computations, involving multiple lines of data for each subject. This approach divides the total follow-up time for each subject into smaller time intervals to allow for recurrent events. In particular, two time points are specified in each line for a subject: \(t_{i,j-1}\) and \(t_{ij}\), representing the start time and stop time for subject i in interval j, respectively. Note that start and stop contain the left truncation times and event/censoring times, respectively, whereas status (the censoring indicator) indicates which times are event times. General optimization software can be employed to maximize \(\ell (\varvec{\theta })\) to obtain the maximum likelihood estimate \(\varvec{\hat{\theta }}\). The simplest computational approach is to maximize \(\ell (\varvec{\theta })\) using an optimizer that does not require expressions for derivatives, and to estimate the asymptotic covariance matrix of \(\varvec{\hat{\theta }}\) using numerical differentiation. We have written an R package, JMR, that fits recurrent event models using the Weibull, log-logistic, log-normal, EW and GG distributions (the package is available on GitHub, https://github.com/sa4khan/JMR). It also has several useful functions, implementing the likelihood ratio test to check the adequacy of the Weibull fit with respect to the EW or GG fit, the computation of AIC, and residual plots (see below).
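A minimal sketch of this computational route, assuming a data frame dat in counting-process form with columns start, stop, status, z1 and z2 (hypothetical names) and the ew_loglik helper sketched above:

```r
# Minimal sketch: maximize the EW log-likelihood by minimizing its negative
# with a quasi-Newton optimizer; the covariance matrix of the estimates comes
# from the numerically differentiated Hessian (observed information).
Z <- as.matrix(dat[, c("z1", "z2")])
nll <- function(theta, ...) -ew_loglik(theta, ...)
init <- c(0, 0, 0, 0, 0)        # log(rho), log(kappa), log(gamma), beta1, beta2
fit <- optim(init, nll, start = dat$start, stop = dat$stop, status = dat$status,
             Z = Z, method = "BFGS", hessian = TRUE)
theta_hat <- fit$par
cov_hat <- solve(fit$hessian)   # asymptotic covariance matrix of theta_hat
se_hat <- sqrt(diag(cov_hat))

# Likelihood ratio test of the Weibull fit (gamma = 1) against the EW fit,
# given a minimized Weibull negative log-likelihood weibull_nll_min:
# lrt <- 2 * (weibull_nll_min - fit$value)
# pchisq(lrt, df = 1, lower.tail = FALSE)
```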

2.2 Generalized residuals

If all covariates are external, then the random variable

$$\begin{aligned} E_{ij}=\int _{T_{i,j-1}}^{T_{ij}}\lambda _i(u)du, ~~j=1,2,\ldots ,n_i \end{aligned}$$
(6)

has a standard exponential distribution with survivor function \(\exp (-u), u > 0\), where \(T_{i,j-1}\) and \(T_{ij}\) are the start and stop times for subject i in interval j. Generalized residuals \(\hat{E}_{ij}\) are obtained by replacing \(T_{ij}\) with the observed time \(t_{ij}\) and \(\lambda _i(u)\) with the maximum likelihood estimate \(\hat{\lambda }_i(u)\) in (6). For sufficiently large samples, the \(\hat{E}_{ij}\) should be similar to independent standard exponential random variables if the specifications \(\lambda _i(u)\) are correct (Cook and Lawless 2007). Thus, a plot of \(-\log {S_{\text {KM}}(\hat{E}_{ij})}\) versus \(\hat{E}_{ij}\) should be roughly a straight line with unit slope when the model is adequate, where \(S_{\text {KM}}(\cdot )\) denotes the Kaplan-Meier estimate (Kaplan and Meier 1958) of the survivor function.
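For the EW fit, a minimal residual computation along the lines of (6), reusing the illustrative objects from the sketches above (Z, theta_hat, ew_Lambda0 and the data frame dat are our own names), might look as follows; survfit from the survival package supplies the Kaplan-Meier estimate.

```r
# Minimal sketch: generalized residuals (6) for the fitted EW model and the
# corresponding exponential probability plot with a unit-slope reference line.
library(survival)
rho_hat <- exp(theta_hat[1]); kap_hat <- exp(theta_hat[2]); gam_hat <- exp(theta_hat[3])
eta_hat <- as.vector(Z %*% theta_hat[-(1:3)])
E_hat <- ew_Lambda0(dat$stop  * exp(-eta_hat), rho_hat, kap_hat, gam_hat) -
         ew_Lambda0(dat$start * exp(-eta_hat), rho_hat, kap_hat, gam_hat)
km <- summary(survfit(Surv(E_hat, dat$status) ~ 1))   # KM estimate of the residuals
plot(km$time, -log(km$surv),
     xlab = "Generalized residual", ylab = "-log(Kaplan-Meier estimate)")
abline(0, 1, lty = 2)   # unit slope expected under a correctly specified model
```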

3 Joint modeling

Henderson et al. (2000) proposed a flexible joint model, linking the linear random-effects model for longitudinal data with the Cox PH model for survival data via a latent bivariate process. The maximum likelihood method was considered for statistical inference. Considering AFT models to describe the event process, we propose a fully Bayesian version of the joint modeling approach of Henderson et al. (2000), including its implementation via Markov Chain Monte Carlo (MCMC) methods.

Let there be m subjects with lifetimes denoted by \(T_1,T_2,\ldots ,T_m\). Assuming that the data are subject to right censoring, we observe \(t_i=\min (T_i,C_i)\), where \(C_i>0\) corresponds to a potential censoring time for subject i. Letting \(\delta _i=\text {I}(T_i\le C_i)\), which equals 1 if \(T_i\le C_i\) and 0 otherwise, the observed data for individual i consist of \(\{t_i,\delta _i,\mathbf {z}_i\}\), \(i=1,2,\ldots ,m\), where \(t_i\) is a lifetime or censoring time according to whether \(\delta _i=1\) or 0, respectively, and \(\mathbf {z}_i\) is a \(p\times 1\) vector of baseline covariates. Also, assume that the ith subject provides a set of longitudinal quantitative measurements \(\{y_{ij}, j =1,2,\ldots ,n_i\}\) at times \(\{s_{ij},j =1,2,\ldots ,n_i\}\). As proposed by Henderson et al. (2000), we jointly model the longitudinal and event data via a latent zero-mean bivariate Gaussian process \(\{U_i,V_i\}\), which is assumed to be independent across different subjects. This latent process is assumed to drive a pair of linked sub-models as described below.

  • Longitudinal Model. We model the longitudinal response \(y_{ij}\) at time \(s_{ij}\) by the relationship

    $$\begin{aligned} y_{ij}=\mu _i(s_{ij})+U_i(s_{ij})+\epsilon _{ij}, \end{aligned}$$
    (7)

    where \(\mu _i(s_{ij})\) is the mean response, \(U_i(s_{ij})\) incorporates subject-specific random effects, and \(\epsilon _{ij} \sim N(0, \sigma ^2)\) is a sequence of mutually independent measurement errors. We assume that the mean response at time s is characterized by a linear model \(\mu _i(s)=\mathbf {x}'_i(s)\varvec{\alpha }\), where \(\mathbf {x}_i(s)\) is a vector of covariates (possibly time-dependent) and \(\varvec{\alpha }\) is the corresponding vector of regression coefficients (fixed effects). For \(U_i(s)\), we assume a linear random effects model \(U_i(s)=\mathbf {w}'_i(s)\mathbf {b}_i\), where \(\mathbf {w}_i(s)\) is the design vector for the random effects \(\mathbf {b}_i \sim N(\mathbf {0},\Sigma _b)\). In our formulation, flexible representations of \(\mathbf {x}_i(s)\) and \(\mathbf {w}_i(s)\) may be considered using a vector of functions of time, expressed in terms of polynomials or splines. Such representations are particularly useful in applications where the longitudinal profiles exhibit nonlinearity over time.

  • Time-to-Event Model. We consider an AFT model to describe the event intensity process at time t (Cox and Oakes 1984), expressed as

    $$\begin{aligned} \lambda _i(t)=\lambda _0(g_i(t))\exp (-\mathbf {z}'_i\varvec{\beta }-V_i(t)), \end{aligned}$$
    (8)

    where \(g_i(t)=\int _0^{t} \exp (-\mathbf {z}'_i\varvec{\beta }-V_i(u))du\), \(\lambda _0(\cdot )\) is the baseline intensity function specified by the Weibull, log-logistic, log-normal, EW or GG distribution (see Table 1), \(\mathbf {z}_i\) is a vector of baseline covariates with a corresponding vector of regression coefficients \(\varvec{\beta }\) (no intercept term is included), and \(V_i(t)\) is specified in a similar way to \(U_i(t)\). In our implementation, dependence between the longitudinal and time-to-event sub-models is captured through \(V_i(t) = \phi U_i(t)\) so that \(\phi \) is the measure of association induced by the fitted longitudinal values. Note that \(\mathbf {z}_i\) may or may not have elements in common with \(\mathbf {x}_i\) of the longitudinal model. The model (8) implies that an individual ages on an accelerated schedule, \(g_i(t)\), under a baseline survival function \(S_0(g_i(t))\), which is biologically meaningful as it allows the entire covariate history to influence subject-specific risk (Tseng et al. 2005).

Bayesian inference is carried out by MCMC, where we sample from the posterior distribution by the Hamiltonian Monte Carlo algorithm (Gelman et al. 2013). The posterior distribution of the model parameters is derived under the assumptions that given the random effects, (a) the longitudinal process is independent of the time-to-event process, and (b) longitudinal responses of each subject are mutually independent. Let \(\varvec{\zeta }\) be the parameter vector of the chosen time-to-event distribution (\(\varvec{\zeta }=(\rho ,\upkappa )'\) for Weibull, log-logistic and log-normal, and \(\varvec{\zeta }=(\rho ,\upkappa ,\gamma )'\) for EW and GG), \(\varvec{\theta }_t=(\varvec{\zeta }',\varvec{\beta }',\phi )'\) the parameter vector of the time-to-event process, \(\varvec{\theta }_l=(\varvec{\alpha }',\sigma ^2)'\) the parameter vector for the longitudinal process, \(\varvec{\theta }_b\) the unique parameters of the random-effects covariance matrix \(\Sigma _b\), and \(\varvec{\theta }=(\varvec{\theta }'_t,\varvec{\theta }'_l,\varvec{\theta }'_b)'\) the full parameter vector. Letting \(\mathbf {y}_i=(y_{i1},y_{i2},\ldots ,y_{in_i})'\), \(\mathbf {b}=(\mathbf {b}'_1,\mathbf {b}'_2,\ldots ,\mathbf {b}'_m)'\), and denoting a density function by \(p(\cdot )\), the posterior density under the above assumptions takes the form

$$\begin{aligned} p(\varvec{\theta },\mathbf {b})\propto p(\varvec{\theta })\prod _{i=1}^{m} p(\mathbf {y}_{i}|\mathbf {b}_i,\varvec{\theta }_l)\,p(t_i,\delta _i|\mathbf {b}_i,\varvec{\theta }_t)\, p(\mathbf {b}_i|\varvec{\theta }_b), \end{aligned}$$
(9)

where

$$\begin{aligned}&p(\mathbf {y}_{i}|\mathbf {b}_i,\varvec{\theta }_l)=\prod _{j=1}^{n_i}p(y_{ij}|\mathbf {b}_i,\varvec{\theta }_l),&\nonumber \\&p(y_{ij}|\mathbf {b}_i,\varvec{\theta }_l)=\frac{1}{\sigma \sqrt{2\pi }}\exp \Big \{-\frac{1}{2\sigma ^2}(y_{ij}-\mathbf {x}'_i(s_{ij})\varvec{\alpha }- \mathbf {w}'_i(s_{ij})\mathbf {b}_i)^2\Big \},&\end{aligned}$$
(10)
$$\begin{aligned}&p(t_i,\delta _i|\mathbf {b}_i,\varvec{\theta }_t)=\{\lambda _0(g_i(t_i))\exp (-\mathbf {z}'_i\varvec{\beta }-\phi \mathbf {w}'_i(t_i)\mathbf {b}_i)\}^{\delta _i}\times S_0(g_i(t_i)),&\end{aligned}$$
(11)
$$\begin{aligned}&p(\mathbf {b}_i|\varvec{\theta }_b)=\frac{1}{\sqrt{|\Sigma _b|(2\pi )^{n_b}}}\exp \Big (-\frac{1}{2}\mathbf {b}'_i \Sigma _b^{-1}\mathbf {b}_i\Big ),&\end{aligned}$$
(12)

\(p(\varvec{\theta })\) represents the prior specifications for \(\varvec{\theta }\), \(\lambda _0(\cdot )\) and \(S_0(\cdot )\) are, respectively, the baseline intensity function and baseline survivor function of an AFT model (see Table 1), \(g_i(t_i)=\int _0^{t_i} \exp (-\mathbf {z}'_i\varvec{\beta }-\phi \mathbf {w}'_i(u)\mathbf {b}_i)du\), and \(n_b\) is the dimensionality of the random-effects vector \(\mathbf {b}_i\). For the EW and GG time-to-event models, our choices of distributions for the relevant quantities allow us to rewrite the joint model as

$$\begin{aligned} \left. \begin{array}{c} [y_{ij}|\mathbf {b}_i,\varvec{\theta }_l] \sim N(\mathbf {x}'_i(s_{ij})\varvec{\alpha }+\mathbf {w}'_i(s_{ij})\mathbf {b}_i,\sigma ^2),\\ [1.25ex] \left[ 0|\mathbf {b}_i,\varvec{\theta }_t\right] \sim \text {Poisson}(-\log p(t_i,\delta _i|\mathbf {b}_i,\varvec{\theta }_t)),\\ \left[ \mathbf {b}_i|\varvec{\theta }_b\right] \sim N(\mathbf {0},\Sigma _b), \left[ \Sigma ^{-1}_b|\nu ,\varPsi \right] \sim \text {Wishart}(\varPsi ,\nu ),\\ \left[ \varvec{\alpha }|\varvec{\mu }_\alpha ,\Sigma _\alpha \right] \sim N(\varvec{\mu }_\alpha ,\Sigma _\alpha ), \left[ \varvec{\beta }|\varvec{\mu }_\beta ,\Sigma _\beta \right] \sim N(\varvec{\mu }_\beta ,\Sigma _\beta ),\\ \left[ \phi |a_0,a_1 \right] \sim N(a_0,a_1), \left[ \sigma ^{-2}|b_0,b_1\right] \sim \text {Gamma}(b_0,b_1),\\ \left[ \rho ^{-2}|c_0,c_1\right] \sim \text {Gamma}(c_0,c_1), \left[ \upkappa ^{2}|d_0,d_1\right] \sim \text {Gamma}(d_0,d_1),\\ \left[ \gamma ^{2}|e_0,e_1\right] \sim \text {Gamma}(e_0,e_1), \end{array}\right\} \end{aligned}$$
(13)

where the hyperparameters \(\varPsi \), \(\nu \), \(\varvec{\mu }_\alpha \), \(\Sigma _\alpha \), \(\varvec{\mu }_\beta \), \(\Sigma _\beta \), \(a_0\), \(a_1\), \(b_0\), \(b_1\), \(c_0\), \(c_1\), \(d_0\), \(d_1\), \(e_0\) and \(e_1\) are all assumed known. Some remarks are necessary for the hierarchical formulation (13).

  1. For the Weibull, log-logistic and log-normal time-to-event models, the last equation for \(\gamma ^2\) is redundant.

  2. A Gauss-Kronrod or Gauss-Legendre quadrature rule (Press et al. 2007) can be used to evaluate the integral \(g_i(t_i)\) in (11); a minimal quadrature sketch is given after this list.

  3. The so-called “zeros-trick” (Spiegelhalter et al. 2003) is used to specify the distribution of \((t_i,\delta _i)\) (Eq. 11), as it is not of a standard form. The idea behind this technique is that the contribution of a Poisson(\(\xi \)) observation of zero is \(\exp (-\xi )\); if we set \(\xi _i=-\log p(t_i,\delta _i|\mathbf {b}_i,\varvec{\theta }_t)\), \(i=1,2,\ldots ,m\), with observed data a vector of 0’s, then we get the correct contributions (see the second equation of (13)).

  4. For the gamma and Wishart distributions in (13), we consider the parameterizations implemented in the STAN software (Stan 2020). For example, \(b_0\) and \(b_1\) are, respectively, the shape and inverse-scale parameters of the \(\text {Gamma}(b_0,b_1)\) distribution, and \(\varPsi \) and \(\nu \) are, respectively, the scale matrix and degrees of freedom of the Wishart distribution \(\text {Wishart}(\varPsi ,\nu )\).

  5. Our R package JMR can fit the joint models as formulated above. The MCMC algorithm is implemented in RStan (Stan 2020) using fairly vague, minimally informative priors. A 5-point Gauss-Legendre quadrature rule is used to evaluate the integral \(g_i(t_i)\). Note that JMR has options to replace the default values of the hyperparameters in (13), and the default specification of 5-point Gauss-Legendre quadrature with a 7, 15, 21, 31, or 41-point Gauss-Kronrod quadrature. In addition, this package can be used for dynamic predictions, residual analysis and model comparison.
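As an illustration of remark 2, the following sketch evaluates \(g_i(t)\) with a 5-point Gauss-Legendre rule for the linear specification \(V_i(u)=\phi (b_{i0}+b_{i1}u)\); the nodes and weights are the standard ones on \([-1,1]\), and the function name and inputs are ours.

```r
# Minimal sketch: 5-point Gauss-Legendre evaluation of
# g_i(t) = int_0^t exp(-z_i' beta - phi * (b0 + b1 * u)) du.
gl_nodes   <- c(-0.9061798459, -0.5384693101, 0, 0.5384693101, 0.9061798459)
gl_weights <- c( 0.2369268851,  0.4786286705, 0.5688888889, 0.4786286705, 0.2369268851)

g_quad <- function(t, zbeta, phi, b0, b1) {
  u <- 0.5 * t * (gl_nodes + 1)                     # map nodes from [-1, 1] to [0, t]
  0.5 * t * sum(gl_weights * exp(-zbeta - phi * (b0 + b1 * u)))
}

# Example: with zbeta = 0, phi = -0.25, b = (0.2, 0.05) and t = 2,
# the exact value of the integral is about 2.13.
g_quad(2, zbeta = 0, phi = -0.25, b0 = 0.2, b1 = 0.05)
```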

Henceforth, the joint models with Weibull, log-logistic, log-normal, EW and GG time-to-event processes will be interchangeably referred to as Models W, LL, LN, EW and GG, respectively.

3.1 Dynamic predictions

Based on a fitted joint model, we are interested in making predictions for a new subject i from the same population. With \(y_i(s)=\{y_i(u),0\le u\le s\}\), \(\mathbf {x}_i(s)=\{\mathbf {x}_i(u),0\le u\le s\}\) and \(\mathbf {z}_i\) known, the conditional survival probability \(P(T_i\ge t|T_i\ge s,y_i(s),\mathbf {x}_i(s),\mathbf {z}_i)\) for \(t\ge s\) is of particular interest in medical research. Note that the longitudinal measurements are typically recorded up to a specified time, implying that the subject is event-free up to that time point. Therefore, it is more relevant to focus on conditional subject-specific predictions. For estimation, we first consider a Monte Carlo approach similar to the algorithm proposed by Rizopoulos (2012) to construct a subject’s profile given survival up to time s. The algorithm is described as follows.

  1. Randomly take a realization from the posterior simulations of the joint model parameters. Denote it by \(\varvec{\theta }^{(1)}\).

  2. Given \(\varvec{\theta }^{(1)}\), the history of the longitudinal response, covariate information, and survival up to time s, simulate a realization of the random-effects vector, say \(\mathbf {b}^{(1)}_i\), from its posterior distribution. For this step Rizopoulos (2012) used a Metropolis-Hastings algorithm with independent proposals from a properly centered and scaled multivariate t distribution (implemented in R). We consider here a different approach: draw a random sample from the posterior distribution of the joint model (13) with \(m = 1\), event time = s and \(\delta =0\), given \(\varvec{\theta }\) (specified by \(\varvec{\theta }^{(1)}\)), \(y_i(s)\), \(\mathbf {x}_i(s)\) and \(\mathbf {z}_i\). This is equivalent to drawing \(\mathbf {b}_i\) from its posterior distribution given \(\varvec{\theta }^{(1)}\), and is relatively straightforward to implement in STAN (i.e., after sufficient adaptation and burn-in iterations, draw only a single sample from the posterior distribution).

  3. Compute \(P(T_i\ge t|T_i\ge s,y_i(s),\mathbf {x}_i(s),\mathbf {z}_i)\) using \(\frac{S(t|\mathbf {z}_i,\mathbf {b}^{(1)}_i,\varvec{\theta }^{(1)})}{ S(s|\mathbf {z}_i,\mathbf {b}^{(1)}_i,\varvec{\theta }^{(1)})}\).

  4. Repeat steps 1-3 M times.

We can now derive posterior summaries of the conditional probabilities (e.g., mean, median, and 95% credible interval) from the MCMC sample of size M as constructed using the above algorithm.
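For a single draw \((\varvec{\theta }^{(1)},\mathbf {b}^{(1)}_i)\), step 3 is just a ratio of survivor functions. The sketch below does this for a Weibull time-to-event submodel with \(V_i(u)=\phi (b_{i0}+b_{i1}u)\), reusing the g_quad helper sketched in Sect. 3; all names and input values are ours.

```r
# Minimal sketch of step 3: conditional survival probability for one
# posterior draw under a Weibull AFT time-to-event submodel.
S0_weibull <- function(t, rho, kap) exp(-(rho * t)^kap)

cond_surv <- function(t, s, zbeta, phi, b0, b1, rho, kap) {
  S0_weibull(g_quad(t, zbeta, phi, b0, b1), rho, kap) /
    S0_weibull(g_quad(s, zbeta, phi, b0, b1), rho, kap)
}

# P(T_i >= t | T_i >= s) for one draw; repeating steps 1-3 over M draws gives
# the Monte Carlo sample from which the posterior summaries are computed.
cond_surv(t = 15, s = 12, zbeta = 0.3, phi = -0.25, b0 = 0.2, b1 = 0.05,
          rho = 0.1, kap = 1.5)
```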

3.2 Hazard-based residuals

Hazard-based residuals, \(\hat{R}_{i}\), are defined as the estimated cumulative hazard function evaluated at each observed event time \(t_i\), where the cumulative hazard function at \(t_i\) is given by

$$\begin{aligned} R_i=-\log S_0(g_i(t_i))=-\log S_0\Big (\int _0^{t_i} \exp (-\mathbf {z}'_i\varvec{\beta }-\phi \mathbf {w}'_i(u)\mathbf {b}_i)du\Big ). \end{aligned}$$
(14)

For sufficiently large samples, the \(\hat{R}_{i}\) should be similar to independent standard exponential random variables if the model is correct (Rizopoulos 2012). Thus, a plot of \(-\log {S_{\text {KM}}(\hat{R}_{i})}\) versus \(\hat{R}_{i}\) should be roughly a straight line with unit slope when the model is adequate, where \(S_{\text {KM}}(\cdot )\) denotes the Kaplan-Meier estimate of the survivor function. Note that \(\varvec{\theta }\) and \(\mathbf {b}_i\) each has its own posterior distribution, inducing a posterior distribution for \(R_i\) at each observed \(t_i\). Therefore, we regard the MCMC sample mean of \(R_i\) as an estimate of the cumulative hazard function at \(t_i\), denoted by \(\hat{R}_{i}\). The Kaplan-Meier estimate of the survivor function is then obtained based on the \(\hat{R}_{i}\).

3.3 Model comparison

Bayesian models can be compared in several ways. Methods based on an estimate of prediction accuracy from a fitted Bayesian model are particularly popular. These include the deviance information criterion (DIC) (Spiegelhalter et al. 2002) and the widely applicable information criterion (WAIC) (Watanabe 2010). Note that the predictive accuracy is estimated using the expected log predictive density function.

For WAIC, the measure of predictive accuracy is defined as the expected log pointwise predictive density (elppd), estimated using

$$\begin{aligned} \widehat{\text {elppd}}_\text {WAIC}=\sum _{i=1}^{m}\log \bigg (\frac{1}{K} \sum _{k=1}^{K}p\big (t_i,\delta _i,\mathbf {y}_i|\varvec{\theta }^{(k)},\mathbf {b}_i^{(k)}\big )\bigg )-p_\text {WAIC}, \end{aligned}$$
(15)

where \(\{\varvec{\theta }^{(k)},\mathbf {b}_i^{(k)}; k=1,2,\ldots ,K\}\) denotes the usual posterior simulations, and \(p_\text {WAIC}\) is the effective number of parameters, computed from the posterior simulations as follows:

$$\begin{aligned} p_\text {WAIC}=\sum _{i=1}^{m}var \Big (\log p\big (t_i,\delta _i,\mathbf {y}_i|\varvec{\theta }^{(k)},\mathbf {b}_i^{(k)}\big )\Big ), \end{aligned}$$
(16)

where \(var(a_{k})=\frac{1}{K-1}\sum _{k=1}^{K}(a_k-\bar{a})^2\). WAIC is then defined in terms of the deviance:

$$\begin{aligned} \text {WAIC}=-2\times \widehat{\text {elppd}}_\text {WAIC}. \end{aligned}$$
(17)

Note that WAIC is based on pointwise predictive density, and therefore it has an explicit connection to the cross-validation technique. For this reason, although computationally expensive, WAIC is the preferred choice for Bayesian model comparison (see Gelman et al. 2014 for a comprehensive review of predictive information criteria from a Bayesian perspective, including WAIC and DIC).
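Given a \(K\times m\) matrix of pointwise log densities \(\log p(t_i,\delta _i,\mathbf {y}_i|\varvec{\theta }^{(k)},\mathbf {b}_i^{(k)})\) extracted from the posterior simulations (the matrix name log_lik below is ours), (15)-(17) can be computed directly:

```r
# Minimal sketch: WAIC from a K x m matrix 'log_lik' of pointwise log
# densities, one row per posterior draw and one column per subject.
waic_from_loglik <- function(log_lik) {
  # log of the posterior-mean density per subject (log-sum-exp for stability)
  lpd <- apply(log_lik, 2, function(x) max(x) + log(mean(exp(x - max(x)))))
  p_waic <- apply(log_lik, 2, var)   # effective number of parameters, Eq. (16)
  elppd  <- sum(lpd) - sum(p_waic)   # estimated elppd, Eq. (15)
  c(WAIC = -2 * elppd, p_waic = sum(p_waic))   # Eq. (17)
}
```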

4 Simulations

4.1 Recurrent event analysis

We considered two scenarios: (1) the Weibull is the true survival model with \(\rho =0.25\) and \(\upkappa =2\) (increasing hazard), and (2) the log-logistic is the true survival model with \(\rho =0.25\) and \(\upkappa =2\) (unimodal hazard). Two baseline covariates were considered in all simulations: one continuous covariate (\(z_1\)) generated from the standard normal distribution, and one binary covariate (\(z_2\)) generated from the Bernoulli(0.5) distribution. Regression parameter values were chosen to be \(\beta _1=0.5\) and \(\beta _2=-0.5\) corresponding to \(z_1\) and \(z_2\), respectively. Let \(\mathbf {z}_i=(z_{i1},z_{i2})'\) be the covariate vector for individual i, and \(\varvec{\beta }=(0.5,-0.5)'\). To generate recurrent event data for individual i, we set

$$\begin{aligned} E_{ij}=\int _{t_{i,j-1}}^{t_{i,j-1}+Q_{ij}}\lambda _0(u\exp (-\mathbf {z}'_{i}\varvec{\beta }))\exp (-\mathbf {z}'_{i}\varvec{\beta })\,du, \end{aligned}$$
(18)

which has a standard exponential distribution (Cook and Lawless 2007). Thus, given \(t_{i,j-1}\), we generated \(q_{ij}\) (a realization of \(Q_{ij}\)) by drawing a random number from the standard exponential distribution and solving (18) for \(Q_{ij}\). By repeating this for \(j = 1, 2, \ldots \), we generated successive event times \(t_{ij} = t_{i,j-1}+q_{ij}\). The time origin for each individual was set to \(t_{i0}=0\), and the maximum follow-up time was generated from the uniform(3, 5) distribution. For each scenario, we generated 500 data sets (each with \(m = 100\)), and then fitted the Weibull, log-logistic, log-normal, EW and GG recurrent event models to each of these simulated data sets. We considered bias and mean squared error (MSE) to evaluate the performance of a fit.
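For the Weibull baseline of scenario 1, \(\Lambda _0(t)=(\rho t)^{\upkappa }\), so (18) can be inverted in closed form. A minimal sketch of this data-generating step (function and variable names ours):

```r
# Minimal sketch: generate one subject's recurrent event times under
# scenario 1 (Weibull baseline, rho = 0.25, kappa = 2) by solving (18).
sim_subject <- function(z, beta = c(0.5, -0.5), rho = 0.25, kap = 2, tau) {
  a <- exp(-sum(z * beta))                     # exp(-z' beta)
  times <- numeric(0); t_prev <- 0
  repeat {
    e <- rexp(1)                               # E_ij ~ Exp(1)
    # Lambda0((t_prev + q) * a) = e + Lambda0(t_prev * a), solved for t_prev + q
    t_new <- ((e + (rho * t_prev * a)^kap)^(1 / kap)) / (rho * a)
    if (t_new > tau) break                     # tau: maximum follow-up time
    times <- c(times, t_new); t_prev <- t_new
  }
  times
}

set.seed(1)
sim_subject(z = c(rnorm(1), rbinom(1, 1, 0.5)), tau = runif(1, 3, 5))
```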

Table 2 Simulation results for recurrent event data analyses (W, LL, LN, EW and GG stand for Weibull, log-logistic, log-normal, exponentiated Weibull and generalized gamma, respectively), with one continuous covariate with regression coefficient \(\beta _1=0.5\), one binary covariate with regression coefficient \(\beta _2=-0.5\), and \(m=100\): table entries are average and standard deviation of 500 estimates of the parameters; also bias and MSE

Numerical results are summarized in Table 2. As expected, we see superior performance of the true model (Weibull for increasing hazard, and log-logistic for unimodal hazard) in terms of bias and MSE. Limitations of the Weibull, log-logistic and log-normal models are also evident in our simulation results: we see relatively poor performance of the log-logistic and log-normal models when the hazard shape is increasing, and relatively poor performance of the Weibull model when the hazard shape is unimodal. These results are expected because, theoretically, the log-logistic and log-normal models are useful to describe only unimodal hazard functions, whereas the Weibull model can describe only monotone increasing and decreasing hazard shapes. However, the EW and GG models are more flexible as highlighted in Sect. 1; it is evident from our simulation study that the performance of the EW and GG models is quite comparable to that of the true model in terms of bias and MSE. Thus, when the shape of the hazard function is unknown, we recommend EW or GG-based estimators for recurrent event data analysis.

4.2 Joint modeling

For joint modeling, we considered a similar setup for the time-to-event process: (a) two simulation scenarios as described in Sect. 4.1, (b) two baseline covariates \(z_1\) and \(z_2\) with regression coefficients \(\beta _1=0.5\) and \(\beta _2=-0.5\), respectively, and (c) \(m=100\). In addition, we considered one internal covariate with regression coefficient \(\phi =-0.25\). The longitudinal model was taken to be

$$\begin{aligned} y_{ij}=\alpha _0+\alpha _1x_{i1}+\alpha _2 s_{ij}+b_{i0}+b_{i1}s_{ij}+\epsilon _{ij}, \end{aligned}$$
(19)

with \(\alpha _0=10\), \(\alpha _1=1\), \(\alpha _2=-1\), \(x_1\) a fixed time-independent covariate generated from the Bernoulli(0.5) distribution, \(s_{ij}=0,1,2,3,4\) (time points at which longitudinal measurements are planned to be taken), \(\sigma ^2=0.5\) (i.e., \(\epsilon _{ij}\sim N(0,0.5)\)), \(\sigma _{11}=\text {sd}(b_{i0})=0.5\), \(\sigma _{22}=\text {sd}(b_{i1})=0.1\), and \(r_{12}=\text {corr}(b_{i0},b_{i1})=-0.2\). The random-effects covariance matrix, \(\Sigma _b\), was derived using \(\sigma _{11}\), \(\sigma _{22}\) and \(r_{12}\). For individual i, the following algorithm was then used to simulate data.

  1. Generate \(\mathbf {b}_i=(b_{i0},b_{i1})'\) from \(N(\mathbf {0},\Sigma _b)\).

  2. Take

    $$\begin{aligned} S_0(g_i(T_i))&=S_0\Big (\int _0^{T_i} \exp (-\mathbf {z}'_i\varvec{\beta }-\phi \mathbf {w}'_i(u)\mathbf {b}_i)du\Big )\\&=S_0\Big (\int _0^{T_i} \exp [-(0.5z_{i1}-0.5z_{i2})+0.25(b_{i0}+b_{i1}u)]du\Big ), \end{aligned}$$

    and solve the equation \(S_0(g_i(T_i))=G\) for \(T_i\), where G is a random number from the uniform(0, 1) distribution (a minimal sketch of this inversion is given below).

  3. Generate a censoring time \(C_i\) from the exponential distribution with rate parameter \(\rho ^*\).

  4. Take the observed time \(t_i=\min (T_i,C_i)\).

  5. For \(s_{ij}<t_i\), obtain repeated measurements \(y_{ij}\) using (19), where \(\epsilon _{ij}\sim N(0,0.5)\).

Note that the rate parameter of the exponential distribution, \(\rho ^*\), was chosen so that the average fraction of censored data is around \(30\%\).
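A minimal sketch of step 2 of the algorithm for scenario 1 (Weibull baseline), solving \(S_0(g_i(T_i))=G\) numerically with integrate and uniroot (function and variable names ours):

```r
# Minimal sketch of step 2: draw an event time T_i for scenario 1
# (Weibull baseline, rho = 0.25, kappa = 2) by inverting S_0(g_i(T_i)) = G.
sim_event_time <- function(z, b, beta = c(0.5, -0.5), phi = -0.25,
                           rho = 0.25, kap = 2, t_max = 100) {
  g <- function(t) integrate(function(u) exp(-sum(z * beta) - phi * (b[1] + b[2] * u)),
                             lower = 0, upper = t)$value
  S <- function(t) exp(-(rho * g(t))^kap)       # S_0(g_i(t)) for the Weibull baseline
  G <- runif(1)
  if (S(t_max) > G) return(Inf)                 # event beyond the follow-up horizon
  uniroot(function(t) S(t) - G, lower = 1e-8, upper = t_max)$root
}

set.seed(2)
b <- MASS::mvrnorm(1, mu = c(0, 0),
                   Sigma = matrix(c(0.25, -0.01, -0.01, 0.01), 2, 2))
sim_event_time(z = c(rnorm(1), rbinom(1, 1, 0.5)), b = b)
```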

For each simulation, 500 data sets were generated. Posterior summaries were averaged over the 500 sets for each parameter, and the bias and MSE were calculated.

Table 3 Simulation results for joint modeling of time-to-event and longitudinal data (W, LL, LN, EW and GG stand for Weibull, log-logistic, log-normal, exponentiated Weibull and generalized gamma, respectively), with Weibull the true survival model (increasing hazard), and \(m=100\): table entries are average and standard deviation of 500 posterior means of the parameters; also bias and MSE
Table 4 Simulation results for joint modeling of time-to-event and longitudinal data (W, LL, LN, EW and GG stand for Weibull, log-logistic, log-normal, exponentiated Weibull and generalized gamma, respectively), with log-logistic the true survival model (unimodal hazard), and \(m=100\): table entries are average and standard deviation of 500 posterior means of the parameters; also bias and MSE

Numerical results are summarized in Tables 3 and 4 for scenarios 1 and 2, respectively. We see that all the models perform very well to describe the longitudinal process (MSEs are almost identical for each of the parameters of the longitudinal process), though we see some differences in the estimation of the survival model parameters. Overall, we see superior performance of the true model (Weibull for scenario 1/increasing hazard, and log-logistic for scenario 2/unimodal hazard) in terms of MSE. For increasing hazard (Table 3), the performance of the EW and GG models is comparable to that of the Weibull model, whereas the performance of the log-logistic and log-normal models is close but less accurate in terms of MSE. On the other hand, for unimodal hazard (Table 4), we see superior performance of the log-logistic and log-normal models. For this scenario, the EW and GG models also perform reasonably well in terms of MSE. All these results are expected and desirable, suggesting that the proposed methodology works very well for joint modeling of longitudinal and time-to-event data.

5 Examples

5.1 Recurrent event analysis of rehospitalization data

González et al. (2005) described a study where 403 patients diagnosed with colorectal cancer between January 1996 and December 1998 were followed up until 2002. The study took place in the Hospital de Bellvitge, Barcelona, Spain. The data are available in the frailtypack package (Rondeau et al. 2019) in R, and contain rehospitalization times after surgery in patients diagnosed with colorectal cancer (colorectal cancer patients may have several readmissions after discharge). For recurrent event analysis, we consider an AFT model of the form

$$\begin{aligned} \lambda (t)\,=&\,\lambda _0\big (t\exp {[-(\beta _1 \text {sex} + \beta _2 \text {chemo} + \beta _3 \text {charl}_1 + \beta _4 \text {charl}_2)]}\big )\\&\exp {[-(\beta _1 \text {sex} + \beta _2 \text {chemo} + \beta _3 \text {charl}_1 + \beta _4 \text {charl}_2)]}, \end{aligned}$$

where sex = I(male), chemo = I(chemotherapy), charl\(_1\) = I(1 \(\le \) Charlson’s index \(\le \) 2), and charl\(_2\) = I(Charlson’s index \(\ge \) 3). Note that Charlson’s index (a higher score indicates a greater burden of comorbidity) is taken as a time-dependent covariate.

First, we fit the Andersen–Gill (AG) model to the rehospitalization data (Table 5). Note that the Andersen–Gill approach uses the Cox PH model, which can be fitted easily using the coxph function of the survival package (Therneau and Grambsch 2000) in R. Tests of the PH assumption for the covariates, carried out using the R function cox.zph, give significant evidence of nonproportionality in the effect of Charlson’s index (p-value = 0.043). One approach to overcome this problem is to fit the Andersen–Gill model by taking into account an interaction between Charlson’s index and time. Alternatively, we may consider an AFT model to analyze the rehospitalization data as described below.

Table 5 Recurrent event analyses—Andersen–Gill (AG), Weibull AFT (W), log-logistic AFT (LL), log-normal AFT (LN), exponentiated Weibull AFT (EW) and generalized gamma AFT (GG) fits for rehospitalization data
Fig. 1 Recurrent event analyses of rehospitalization data: residual plots along with 95% pointwise confidence intervals (dashed lines) for the Weibull AFT, log-logistic AFT, log-normal AFT, exponentiated Weibull AFT and generalized gamma AFT fits

The fits of the Weibull, log-logistic, log-normal, EW and GG AFT models are summarized in Table 5 (the use of our R package to fit these models is illustrated in supp. material Section S1). Our analyses indicate clear superiority of the EW and GG over the Weibull, log-logistic and log-normal fits in describing these data. In particular,

  • the smallest AIC value is obtained for the EW and GG fits (AIC = 6972, 6975, 6974, 6964 and 6964 for the Weibull, log-logistic, log-normal, EW and GG models, respectively);

  • the likelihood ratio statistic to test the goodness of fit of Weibull as a submodel of the EW is \(\chi ^2=9.40\) on 1 df, with p-value < 0.01 (significant evidence against the Weibull model relative to the EW in describing the rehospitalization data);

  • the likelihood ratio statistic to test the goodness of fit of Weibull as a submodel of the GG is \(\chi ^2=9.78\) on 1 df, with p-value < 0.01 (significant evidence against the Weibull model relative to the GG);

  • the residual plots (Fig. 1) indicate clear superiority of the EW and GG over the Weibull, log-logistic and log-normal fits.

Our analyses based on the EW and GG fits reveal that (a) each of the covariates under study is highly significant, (b) the time to rehospitalization of a male is accelerated by a factor of about \(e^{-(-0.82)}=2.3\) relative to a female (i.e., males with colorectal cancer are at higher risk of rehospitalization after surgery), (c) the time to rehospitalization of a colorectal cancer patient who received chemotherapy is decelerated by a factor of about \(e^{-0.49}=0.61\) relative to a patient who did not receive chemotherapy (i.e., there is a negative association between I(chemotherapy) and rehospitalization time), (d) the time to rehospitalization of a patient with Charlson’s index between 1 and 2 is accelerated by a factor of about 2.1 relative to a patient with no comorbidities (i.e., Charlson’s index \(=\) 0), and (e) the time to rehospitalization of a patient with Charlson’s index \(\ge \) 3 is accelerated by a factor of about 3.4 relative to a patient with no comorbidities.

5.2 Joint modeling of AIDS data

Rizopoulos (2012) described a study involving 467 human immunodeficiency virus (HIV) infected patients who had failed or were intolerant to zidovudine therapy (AZT). The main objective was to compare two antiretroviral drugs to prevent the progression of HIV infections: didanosine (ddI) and zalcitabine (ddC). Patients were randomly assigned to receive either ddI or ddC and followed until death or the end of the study, resulting in 188 complete and 279 censored observations. It was also of interest to quantify the association between CD4 cell counts (internal time-dependent covariate) and time to death. The data are available in the JM package (Rizopoulos 2010) in R. Letting \(y=\sqrt{\text {CD4}}\), we consider a joint model to analyze these data, with

$$\begin{aligned}&y_{ij}=\alpha _0+\alpha _1 s_{ij}+\alpha _2 (s_{ij}\times \text {drug}_i)+\alpha _3\text {sex}_i+\alpha _4\text {prevOI}_i+\alpha _5\text {AZT}_i&\\&\qquad \qquad +\,b_{i0}+b_{i1}s_{ij}+\epsilon _{ij},&\\&\lambda _i(t_i)=\lambda _0\Big (\int _0^{t_i} \exp [-\mathbf {z}'_i\varvec{\beta }-\phi (b_{i0}+b_{i1}u)]du\Big ) \exp [-\mathbf {z}'_i\varvec{\beta }-\phi (b_{i0}+b_{i1}t_i)],&\\&\mathbf {z}_i'\varvec{\beta }=\beta _1\text {drug}_i+\beta _2\text {sex}_i+\beta _3\text {prevOI}_i+\beta _4\text {AZT}_i,&\end{aligned}$$

where \(s_{ij}\) is the jth measurement occasion for individual i, drug = I(ddI), sex = I(male), prevOI = I(previous opportunistic infection), and AZT = I(AZT failure). To fit a joint model, we consider two chains to approximate the posterior distributions, each with 2,500 MCMC iterations after adaptation, burn-in and thinning.

We fit five joint models to the AIDS data with the time-to-event process described by an AFT model: W, LL, LN, EW and GG (the use of our R package is illustrated in supp. material Section S1). For the sake of comparison, we also fit the Weibull PH (WPH) joint model, where the time-to-event process is described by the PH model with the baseline hazard specified by the Weibull distribution. Some posterior characteristics of the parameters are given in Tables 6 and 7. We see that the Gelman-Rubin diagnostic (Gelman and Rubin 1992) values (\(R_c\)) are all very close to 1, indicating convergence of MCMC output. In addition, the trace plots of the parameters show no pattern (i.e., the MCMC chains appear to converge well), and the density plots appear smooth with no indication of multimodality (as an example, the trace and density plots of Model W parameters are presented in supp. material Section S2).

Table 6 Joint modeling of AIDS data using Weibull PH (WPH), Weibull AFT (W), log-logistic AFT (LL), log-normal AFT (LN), exponentiated Weibull AFT (EW) and generalized gamma AFT (GG) processes—results for the longitudinal submodel parameters: table entries are posterior means, posterior medians, standard deviations, 95% credible intervals (2.5% and 97.5% quantiles), and Gelman–Rubin statistic values (\(R_c\))
Table 7 Joint Modeling of AIDS data using Weibull PH (WPH), Weibull AFT (W), log-logistic AFT (LL), log-normal AFT (LN), exponentiated Weibull AFT (EW) and generalized gamma AFT (GG) processes - results for the time-to-event submodel parameters: table entries are posterior means, posterior medians, standard deviations, 95% credible intervals (2.5% and 97.5% quantiles), and Gelman–Rubin statistic values (\(R_c\))

The WAIC for Models WPH, W, LL, LN, EW and GG are 7526, 7527, 7550, 7562, 7533 and 7530, respectively, suggesting that the WPH and W fits are superior to the fits of the other four models. The results show almost identical goodness of fit for the WPH and W models. We also see slightly more support for the WPH and W over the GG and EW fits (perhaps no significant difference). On the other hand, the LL and LN fits are comparable to each other, but receive far less support than the other four models. These results are also evident in the residual plots of Fig. 2.

Fig. 2 Joint modeling of AIDS data: residual plots for the time-to-event submodels. A random sample of size 200 is drawn from the posterior simulations, and the Cox–Snell residuals are computed for each sample. The estimate of the posterior expectation is obtained as the MCMC sample mean of the residuals at each observed time point. A plot of residual vs. estimated cumulative hazard is then produced (solid circles). Uncertainty in the plot can be assessed through the 200 sets of residuals obtained from the posterior simulations (grey lines)

Based on the 95% credible intervals, time (s) and PrevOI are statistically significant for the longitudinal process (Table 6), whereas prevOI and CD4 cell counts (the internal time-dependent covariate) are significant for the time-to-event process (Table 7). Note that these results are consistent with those of Guo and Carlin (2004), who used a PH model to describe the event process in analyzing these data.

With the Weibull AFT fit, the estimate of \(\phi \) (coefficient for CD4) is 0.159 with 95% credible interval (0.117, 0.205), suggesting a positive association between CD4 cell counts and event time (a high CD4 count leads to better health status). This is also supported by the Weibull PH fit: the estimate of \(\phi \) is \(-0.248\) with 95% credible interval \((-0.319,-0.182)\); and the estimate \(e^{-\phi }\) is 1.28, suggesting that a unit decrease in the marker corresponds to a 1.28-fold increase in the risk of death. To illustrate, we consider dynamic predictions for two patients with different CD4 cell levels, adjusted for the other factors (i.e., same drug, sex, prevOI, AZT and censoring status; data for these two patients are given in supp. material Section S3). One patient (id# 404) has lower CD4 cell levels, and the other patient has relatively higher CD4 cell levels (id# 409). Note that \(s=12\) is the last observed time at which a longitudinal measurement is available (i.e., both these patients have survived up to time \(s=12\)). With this information, we estimate the conditional survival probabilities \(P(T>t|T>12)\) for \(t\ge 12\) with the Weibull AFT fit. The survival curves for these two patients are given in Fig. 3 (numerical results along with 95% pointwise credible intervals are given in supp. material Section S3). The left panels of Fig. 3a, b show the fitted longitudinal profiles (patient 404 has lower CD4 cell levels), whereas the right panels display the survival curves (predictions are displayed by the solid lines, and the pointwise Bayesian intervals are displayed by the dashed lines). We also present Fig. 3c, where a comparison of the survival probabilities of these two patients is displayed (the survival curves of Fig. 3a and b are superimposed on the same plot). We see higher survival rates for patient 409, suggesting a positive association between CD4 cell counts and event time (a negative association between CD4 cell counts and the risk of death). We obtained similar predictions using the Weibull PH fit (results are not shown here).

Fig. 3 Dynamic predictions (using the Weibull AFT joint model) for two AIDS patients with different CD4 cell levels, adjusted for the other factors (i.e., same drug, sex, prevOI, AZT and censoring status). Panels (a) and (b) show the survival curves (right panel), with the predictions displayed by the solid line and the pointwise Bayesian intervals displayed by the dashed lines; the corresponding fitted longitudinal profiles are shown in the left panel. Panel (c) compares the survival probabilities of the two patients (the survival curves of (a) and (b) superimposed on the same plot). Data for these two patients are given in supp. material Section S3; one patient (id# 404) has lower CD4 cell levels, and the other (id# 409) has relatively higher CD4 cell levels

6 Conclusion

The key assumption in the PH model is the proportionality assumption concerning the effects of covariates. The assumption of PH is strong, and it is important that it be checked. The AFT model is widely viewed as an alternative to the PH model, and it turns out to be particularly useful when the PH assumption fails. Although the AFT model is a well-known alternative to the PH model in standard survival data analysis, it has seldom been utilized in recurrent event data analysis and joint modeling. In this article, we have demonstrated applications of AFT models in recurrent event data analysis and joint modeling. In addition to the commonly used AFT models (Weibull, log-logistic, log-normal), we have also introduced two flexible models (EW and GG) that can describe data with wide ranging characteristics. Indeed, the flexibility of these two models is evident in our simulations and real data analyses. This work will serve as a complement to the existing work on recurrent event data analysis and joint modeling. Note that we assumed the typical setup of joint modeling, where there is only one internal time-dependent covariate and a set of baseline (time-independent) covariates. It would be interesting to allow time-varying covariates under the setup of joint modeling for recurrent event data. Extensions of our joint modeling framework may thus be desirable.