1 Introduction

Longitudinal data are frequently encountered in a wide variety of fields, such as medical follow-up studies and observational investigations. In many applications, observation times are irregular and may be correlated with the longitudinal process. Various methods for analyzing longitudinal data with informative observation times have been developed [1, 5, 9, 13, 14, 15, 16, 20, 22]. For example, Lin et al. [9] considered a marginal regression model and proposed a class of inverse intensity-of-visit process-weighted estimators. Sun et al. [16] suggested a joint model for the longitudinal and observation processes via a shared latent variable. Liang et al. [5] proposed a class of joint models via two latent variables. All the above methods primarily analyze longitudinal data with informative observation times in the absence of a dependent terminal event.

In many situations, a dependent terminal event such as death, which precludes further follow-up, may exist. Moreover, Liu et al. [11] indicated that regarding the dependent terminal event as noninformative censoring may yield biased estimates in modeling the hospital visits and the longitudinal medical costs. It is also common that the longitudinal process is correlated with both the observation times and a terminal event. For example, Liu et al. [10] and Sun et al. [17], in analyzing data from a medical cost study, found that patients visiting the hospital more often tended to pay more for each visit, and these patients also had a higher mortality rate. That is, the longitudinal medical costs could be correlated with both hospital visits and death. Thus, there is clearly a need to develop suitable models for analyzing longitudinal data that account for both informative observation times and a dependent terminal event simultaneously.

However, research on the joint analysis of longitudinal data with informative observation times and a dependent terminal event is limited. Liu et al. [10] suggested a joint random effects model, where the random effects are assumed to be normally distributed. Sun et al. [17] proposed a joint modeling approach via two latent variables, where the dependence structure between the two latent variables is left unspecified. Moreover, in the analysis of bladder cancer data from a study conducted by the Veterans Administration Cooperative Urological Research Group, Liang et al. [5] argued that the treatment assignment has a random effect on the tumor recurrence process. However, the existing methods for analyzing longitudinal data with informative observation times and a dependent terminal event do not consider the case where some covariates have random effects. This motivates the present study.

In this article, we propose a new joint model for analysis of longitudinal data with informative observation times and a dependent terminal event in the spirit of the works of Liang et al. [5], which allows for inference about the random effects of covariates. Specifically, a semiparametric mixed effects model is specified for the longitudinal process, a proportional rate frailty model is used for the observation process, and a proportional hazards frailty model is used for the terminal event. The association among the three related processes is modeled via two latent variables. The proposed joint model generalizes the approach of Liang et al. [5] by taking the terminal event into account. In addition, unlike Liang et al. [5], our model does not assume that the observation process is a nonhomogeneous Poisson process. Thus, it is more comprehensive and flexible.

The rest of the article is organized as follows. Section 2 describes the joint model of the longitudinal process, the observation times, and the terminal event. In Sect. 3, estimating equation approaches are proposed for regression parameters of interest, and the asymptotic properties of the proposed estimators are established. Some simulation results for evaluating the proposed methods are reported in Sect. 4. An application to a medical cost study of chronic heart failure patients from the University of Virginia Health System is provided in Sect. 5, and some concluding remarks are given in Sect. 6. All proofs are relegated to the Appendix.

2 Model Specification

Consider a longitudinal study involving n independent subjects. For the ith subject, \(i=1,\ldots ,n\), denote \(Y_i(t)\) as the longitudinal process of interest at time t. Let \(X_i(t)\) be the \(p\times 1\) vector of external covariates as described in Kalbfleisch and Prentice [2]. Also, let \(C_i\) be the censoring time and \(D_i\) be the terminal event time such as death. In addition, define \(T_i=\min (C_i, D_i)\) and \(\Delta _i(t)=I(T_i\ge t),\) where \(I(\cdot )\) is the indicator function. Let \(\tilde{N}_i^R(t)\) be the counting process denoting the number of the observation times in the time interval [0, t] and the observation process \(N_i^R(t)=\tilde{N}_i^R(t\wedge T_i)\), where \(a\wedge b\) is the minimum of a and b. Note that the longitudinal process \(Y_i(t)\) is observed only at the jump points of \(N_i^R(t)\). Let \(\tilde{N}_i^D(t)\) denote the terminal process before or at time t,  and \(N_i^D(t)=\tilde{N}_i^D(t\wedge T_i)\). Also let \(\upsilon _i\) be a nonnegative unobserved frailty. In what follows, we assume that given \(X_i(t),\) the censoring time \(C_i\) is independent of \(\{Y_i(\cdot ),\ \tilde{N}_i^R(\cdot ),\ \tilde{N}_i^{D}(\cdot ),\ D_i,\ \upsilon _i\}.\)

For the longitudinal process, following Liang et al. [5], we consider the following semiparametric mixed effects model for \(Y_i(t)\):

$$\begin{aligned} Y_i(t)=\mu _0(t)+\gamma _0^TX_i(t)+u_i^TZ_i(t)+\epsilon _i(t), \end{aligned}$$
(1)

where \(\mu _0(t)\) is an unspecified smooth function, \(\gamma _0\) is a vector of unknown regression parameters, \(Z_i(t)\) is a q-dimensional subvector of \((1,X_i(t)^T)^T\), \(u_i\) is a q-dimensional vector of subject-specific random effects, and \(\epsilon _i(t)\) is a zero-mean measurement error process that is independent of \(u_i.\) For identifiability of model (1), the random effects \(u_i\) are assumed to have zero mean.

Following Ye et al. [19], we consider a (partial) marginal rate of the observation times given \(X_i(t)\), \(D_i=s\) and \(\upsilon _i\), which is defined as

$$\begin{aligned} \mathrm{{d}}\Lambda _R(t|\upsilon _i, X_i(t))=P\{\mathrm{{d}}\tilde{N}_i^R(t)=1|\upsilon _i, X_i(t), D_i=s \}, \ \ s \ge t. \end{aligned}$$

For the analysis, we specify the observation process model as

$$\begin{aligned} \mathrm{{d}}\Lambda _R(t|\upsilon _i, X_i(t)) = \upsilon _i\exp \{\beta _0^T X_i(t)\}\mathrm{{d}}\Lambda _0^R(t), \end{aligned}$$
(2)

where \(\beta _0\) is a p-dimensional vector of unknown regression parameters, and \(\Lambda _0^R(t)\) is the cumulative baseline rate function. Note that \(\mathrm{{d}}\Lambda _R(t|\upsilon _i, X_i(t))\) may depend on \(X_i(t)\) and the frailty \(\upsilon _i\), but does not depend on the terminal event time \(D_i=s \ge t.\) This implies that given covariates \(X_i(t),\) \(\upsilon _i\) accounts for the correlation between the observation times and the terminal event. Also, it follows that \(\mathrm{{d}}\Lambda _R(t|\upsilon _i, X_i(t))=P\{\mathrm{{d}}\tilde{N}_i^R(t)=1|\upsilon _i, X_i(t), D_i \ge t\}\) (e.g., [3]), which indicates that given \(X_i(t)\) and \(\upsilon _i\), \(\mathrm{{d}}\Lambda _R(t|\upsilon _i, X_i(t))\) specifies the marginal rate of the observation times among those subjects surviving to time t. In addition, \(P\{\mathrm{{d}}\tilde{N}_i^R(t)=1|\upsilon _i, X_i(t), D_i=s\}=0,\ s < t\), that is, the occurrence of additional observation times is precluded by the terminal event.

For the terminal event, we consider the following proportional hazards frailty model:

$$\begin{aligned} \mathrm{{d}}\Lambda _D(t|\upsilon _i, X_i(t)) = \upsilon _i\exp \{\alpha _0^T X_i(t)\}\mathrm{{d}}\Lambda _0^D(t), \end{aligned}$$
(3)

where \(\alpha _0\) is a p-dimensional vector of unknown regression parameters, and \(\Lambda _0^D(t)\) is the cumulative baseline hazard function.

For models (1), (2), and (3), we assume that the association between the random effects \(u_i\) and \(\upsilon _i\) is formulated as \(E(u_i|\upsilon _i)=\eta _0(\upsilon _i-1),\) where \(\eta _0\) is a q-dimensional parameter. As in Ye et al. [19], we assume that given \(X_i(t),\) the frailty \(\upsilon _i\) has a gamma distribution with mean 1 and variance \(\theta \), where \(E(\upsilon _i|X_i(t))=1\) is fixed for identifiability of models (1), (2), and (3).

Remark 1

In the medical cost study of chronic heart failure patients in Sect. 5, a higher mortality rate is associated with a higher frequency of hospital visits. Thus, we assume that the observation process and the terminal event have a positive association through a common frailty variable \(\upsilon _i\) in the same fashion. However, the proposed method can be extended to allow for a negative association between the observation process and the terminal event. More details can be found in the third paragraph of Sect. 6.

Remark 2

Note that

$$\begin{aligned} E\big \{Y_i(t)-\mu _0(t)-\gamma _0^TX_i(t)|\upsilon _i, X_i(t) \big \}= E\big \{u_i^T |\upsilon _i, X_i(t)\big \} Z_i(t). \end{aligned}$$

We followed the suggestion of Liang et al. [5] to assume the linear relationship between \(u_i\) and \(\upsilon _i\) for computational simplicity. In fact, the proposed method can be extended to the case that \(E\big \{u_i|\upsilon _i, X_i(t)\big \}=g(\upsilon _i;\eta ),\) where \(g(\upsilon _i;\eta )\) is a q-dimensional vector with each component being a polynomial in \(\upsilon _i.\) A further discussion can be found in the second paragraph of Sect. 6.

Remark 3

Although model (1), which allows for inference about the random effects of covariates, was considered by Liang et al. [5] and others (e.g., [16, 22]), and models (2) and (3), which allow the common frailty to directly capture the association between the observation and terminal event processes, were investigated by Kalbfleisch et al. [3], a joint model that simultaneously accommodates all of the aforementioned features has not been considered in the literature. The proposed model fills this gap and provides a flexible framework that allows a subject-specific observation process with continuous missing patterns as well as various types of associations between the longitudinal process and the observation times in the presence of a dependent terminal event.

3 Estimation Procedure

To handle the unobserved frailty \(\upsilon _i\) we proceed with the estimation procedure by analyzing the average rates obtained by taking the conditional expectation of models (2) and (3) with respect to \(\upsilon _i\) given \(D_i \ge t\) and \(X_i(t)\) ([3]). Note that

$$\begin{aligned} E[\upsilon _i|D_i \ge t, X_i(t)] =\frac{1}{1+\theta \int _0^t\exp \{\alpha ^TX_i(u)\}\mathrm{{d}}\Lambda _0^D(u)}. \end{aligned}$$

Then we have the following marginal rates:

$$\begin{aligned} \mathrm{{d}}\Lambda _R(t|X_i(t)) =\psi _i(t)^{-1}\exp \{\beta _0^T X_i(t)\}\mathrm{{d}}\Lambda _0^R(t), \end{aligned}$$
(4)

and

$$\begin{aligned} \mathrm{{d}}\Lambda _D(t|X_i(t)) =\psi _i(t)^{-1}\exp \{\alpha _0^T X_i(t)\}\mathrm{{d}}\Lambda _0^D(t), \end{aligned}$$
(5)

where

$$\begin{aligned} \psi _i(t)\equiv \psi _i(t; \alpha , \theta , \Lambda _0^D) =1+\theta \int _0^t\exp \{\alpha ^TX_i(u)\}\mathrm{{d}}\Lambda _0^D(u). \end{aligned}$$
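As a brief sketch of why the conditional mean of the frailty takes this form, assume that the covariates are external and that \(D_i\) is continuous, and write \(H_i(t)=\int _0^t\exp \{\alpha ^TX_i(u)\}\mathrm{{d}}\Lambda _0^D(u)\) (a shorthand introduced here for illustration only). A gamma frailty with mean 1 and variance \(\theta \) has Laplace transform \(\mathcal {L}(s)=E[e^{-s\upsilon _i}]=(1+\theta s)^{-1/\theta }\), and under model (3), \(P\{D_i\ge t|\upsilon _i, X_i\}=\exp \{-\upsilon _iH_i(t)\}\). Hence

$$\begin{aligned} E[\upsilon _i|D_i \ge t, X_i(t)]&=\frac{E\big [\upsilon _i e^{-\upsilon _iH_i(t)}\big ]}{E\big [e^{-\upsilon _iH_i(t)}\big ]} =-\frac{\mathcal {L}'(H_i(t))}{\mathcal {L}(H_i(t))}\\&=\frac{\{1+\theta H_i(t)\}^{-1/\theta -1}}{\{1+\theta H_i(t)\}^{-1/\theta }} =\frac{1}{1+\theta H_i(t)}=\psi _i(t)^{-1}. \end{aligned}$$

Multiplying models (2) and (3) by this conditional mean in place of \(\upsilon _i\) gives the factor \(\psi _i(t)^{-1}\) in the marginal rates (4) and (5).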

For the unobserved random effects \(u_i\), using \(E(u_i|\upsilon _i)=\eta _0(\upsilon _i-1)\) and taking the conditional expectation of \(u_i\) given \(D_i \ge t\) and \(X_i(t)\), we obtain

$$\begin{aligned} E[u_i |D_i \ge t, X_i(t)]=\eta _0 \big [(1+\theta )/\psi _i(t)-1\big ]. \end{aligned}$$
(6)

Let \(\mathcal {A}_0(t)=\int _0^t\mu _0(s)\mathrm{{d}}\Lambda _0^R(s),\) \(\xi =(\alpha ^T,\beta ^T,\theta ,\gamma ^T,\eta ^T)^T,\) and \(\xi _0=(\alpha _0^T, \beta _0^T, \theta _0, \gamma _0^T, \eta _0^T)^T\) be the true value of \(\xi .\) Define

$$\begin{aligned} \mathrm{{d}}M_i(t;\xi ,\mathcal {A})&=Y_i(t)\psi _i(t)\mathrm{{d}}N_i^R(t) -\Delta _i(t)\left\{ \gamma ^TX_i(t)+\eta ^T B_i(t)\right\} \exp \{\beta ^TX_i(t)\}\mathrm{{d}}\Lambda _0^R(t)\nonumber \\&\quad -\Delta _i(t)\exp \{\beta ^TX_i(t)\}\mathrm{{d}}\mathcal {A}(t), \end{aligned}$$
(7)

where

$$\begin{aligned} B_i(t)\equiv B_i(t; \alpha , \theta , \Lambda _0^D)=\big [(1+\theta )/\psi _i(t)-1\big ]Z_i(t). \end{aligned}$$
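One way to see where the factor in \(B_i(t)\) comes from is sketched below. It relies on assumptions that are not spelled out above: the covariates are external, \(D_i\) is continuous, and, given \(\upsilon _i\), the observation process is independent of \(u_i\) and \(\epsilon _i(t)\). Write \(H_i(t)=\int _0^t\exp \{\alpha ^TX_i(u)\}\mathrm{{d}}\Lambda _0^D(u)\) (a shorthand used only here), so that \(\psi _i(t)=1+\theta H_i(t)\), and let \(\mathcal {L}(s)=(1+\theta s)^{-1/\theta }\) be the Laplace transform of the gamma frailty. Then

$$\begin{aligned} E[\upsilon _i^2|D_i \ge t, X_i(t)]&=\frac{\mathcal {L}''(H_i(t))}{\mathcal {L}(H_i(t))}=\frac{1+\theta }{\psi _i(t)^2},\\ \psi _i(t)E[(\upsilon _i-1)\upsilon _i|D_i \ge t, X_i(t)]&=\psi _i(t)\left\{ \frac{1+\theta }{\psi _i(t)^2}-\frac{1}{\psi _i(t)}\right\} =\frac{1+\theta }{\psi _i(t)}-1. \end{aligned}$$

Because the rate of \(\tilde{N}_i^R(t)\) is proportional to \(\upsilon _i\) and \(E(u_i|\upsilon _i)=\eta _0(\upsilon _i-1)\), the contribution of \(u_i^TZ_i(t)\) to the mean of \(Y_i(t)\psi _i(t)\mathrm{{d}}N_i^R(t)\) among survivors is \(\eta _0^TB_i(t)\exp \{\beta _0^TX_i(t)\}\mathrm{{d}}\Lambda _0^R(t)\), which is the term subtracted in (7).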

In view of (4) and (6), we have that \(E\{M_i(t; \xi _0, \mathcal {A}_0)|X_i(t), \Delta _i(t)\}=0\) under models (1) and (2). Thus, for given \(\xi ,\) \(\mathrm{{d}}\Lambda _0^R(t),\) \(\psi _i(t)\), and \(B_i(t)\), a reasonable estimator of \(\mathcal {A}_0(t)\) is the solution to the following estimating equation:

$$\begin{aligned}&\sum _{i=1}^n\int _0^t\Big [Y_i(u)\psi _i(u)\mathrm{{d}}N_i^R(u) -\Delta _i(u)\big \{\gamma ^TX_i(u)+\eta ^T B_i(u)\big \}\exp \{\beta ^TX_i(u)\}\mathrm{{d}}\Lambda _0^R(u)\\&\quad -\Delta _i(u)\exp \{\beta ^TX_i(u)\}\mathrm{{d}}\mathcal {A}(u)\Big ]=0, \quad 0\le t\le \tau , \end{aligned}$$

where \(\tau \) is a constant such that \(P\{T_i\ge \tau \} > 0.\) Denote this estimator by \(\hat{\mathcal {A}}_0(t;\alpha ,\beta ,\theta ,\Lambda _0^D).\) Let \(\hat{\alpha },\) \(\hat{\beta },\) \(\hat{\theta },\) \(\hat{\Lambda }_0^R(t),\) \(\hat{\psi }_i(t)\), and \(\hat{B}_i(t)\) be the estimates of \(\alpha ,\) \(\beta ,\) \(\theta ,\) \(\Lambda _0^R(t),\) \(\psi _i(t)\), and \(B_i(t)\), respectively, which will be discussed later. In view of (7), applying the generalized estimating equation approach [4] and replacing \(\mathcal {A}_0(t)\) with the above estimator, we specify the following estimating function for \(\gamma \) and \(\eta :\)

$$\begin{aligned}&\sum _{i=1}^n\int _0^\tau \begin{pmatrix}{\begin{matrix}X_i(t)-\bar{X}(t,\hat{\beta })\\ \\ \hat{B}_i(t)-\bar{\hat{B}}(t) \end{matrix}}\end{pmatrix} \Big [Y_i(t)\hat{\psi }_i(t)\mathrm{{d}}N_i^R(t)\nonumber \\&\quad -\Delta _i(t)\big \{\gamma ^TX_i(t)+\eta ^T\hat{B}_i(t)\big \} \exp \{\hat{\beta }^TX_i(t)\}\mathrm{{d}}\hat{\Lambda }_0^R(t)\Big ]=0, \end{aligned}$$
(8)

where

$$\begin{aligned} \bar{X}(t; \beta )=\frac{\sum _{j=1}^n\Delta _j(t)X_j(t)\exp \{\beta ^TX_j(t)\}}{\sum _{j=1}^n\Delta _j(t)\exp \{\beta ^TX_j(t)\}}, \end{aligned}$$

and

$$\begin{aligned} \bar{\hat{B}}(t)=\frac{\sum _{j=1}^n\Delta _j(t)\hat{B}_j(t)\exp \{\hat{\beta }^TX_j(t)\}}{\sum _{j=1}^n\Delta _j(t)\exp \{\hat{\beta }^TX_j(t)\}}. \end{aligned}$$

Denote the solution to (8) as \((\hat{\gamma }^T,\hat{\eta }^T)^T\), which has an explicit form:

$$\begin{aligned} \begin{pmatrix} \hat{\gamma } \\ \\ \hat{\eta } \end{pmatrix}&=\left[ \sum _{i=1}^n\int _0^\tau \begin{pmatrix}{\begin{matrix}X_i(t)-\bar{X}(t,\hat{\beta })\\ \\ \hat{B}_i(t)-\bar{\hat{B}}(t)\end{matrix}}\end{pmatrix} ^{\otimes 2}\Delta _i(t)\exp \{\hat{\beta }^TX_i(t)\}\mathrm{{d}}\hat{\Lambda }_0^R(t)\right] ^{-1}\nonumber \\&\quad \times \left[ \sum _{i=1}^n \int _0^\tau \begin{pmatrix}{\begin{matrix}X_i(t)-\bar{X}(t,\hat{\beta })\\ \\ \hat{B}_i(t)-\bar{\hat{B}}(t) \end{matrix}}\end{pmatrix}Y_i(t)\hat{\psi }_i(t)\mathrm{{d}}N_i^R(t)\right] , \end{aligned}$$
(9)

where \(a^{\otimes 2}=aa^T\) for any vector a.

Now we consider the estimators \(\hat{\alpha },\) \(\hat{\beta },\) \(\hat{\theta },\) \(\hat{\Lambda }_0^R(t)\), and \(\hat{\Lambda }_0^D(t).\) Under models (2) and (3), \(\alpha ,\) \(\beta ,\) \(\theta ,\) \(\Lambda _0^R(t)\), and \(\Lambda _0^D(t)\) can be consistently estimated using a similar method to Kalbfleisch et al. [3]. Specifically, define

$$\begin{aligned} \mathrm{{d}}M_i^R(t)=\psi _i(t)\mathrm{{d}}N_i^R(t)-\Delta _i(t) \exp \{\beta ^T X_i(t)\}\mathrm{{d}}\Lambda _0^R(t), \end{aligned}$$

and

$$\begin{aligned} \mathrm{{d}}M_i^D(t)=\psi _i(t)\mathrm{{d}}N_i^D(t)-\Delta _i(t)\exp \{\alpha ^TX_i(t)\}\mathrm{{d}}\Lambda _0^D(t). \end{aligned}$$

It then follows from (4) and (5) that \(M_i^R(t)\) and \(M_i^D(t)\) are zero-mean stochastic processes. Thus, for given \(\psi _i(t)\), we can use the following estimating equations to estimate \(\alpha ,\) \(\beta ,\) \(\Lambda _0^R(t)\) and \(\Lambda _0^D(t)\):

$$\begin{aligned}&\sum _{i=1}^n \int _0^\tau \big \{X_i(t)-\bar{X}(t;\alpha )\big \}\psi _i(t)\mathrm{{d}}N_i^D(t)=0,\\&\sum _{i=1}^n \int _0^\tau \big \{X_i(t)-\bar{X}(t;\beta )\big \}\psi _i(t)\mathrm{{d}}N_i^R(t)=0,\\&\sum _{i=1}^n \Big [\psi _i(t)\mathrm{{d}}N_i^D(t)-\Delta _i(t)\exp \{\alpha ^TX_i(t)\}\mathrm{{d}}\Lambda _0^D(t)\Big ]=0, \ \ \ 0 \le t \le \tau ,\\&\sum _{i=1}^n \Big [\psi _i(t)\mathrm{{d}}N_i^R(t)-\Delta _i(t)\exp \{\beta ^TX_i(t)\}\mathrm{{d}}\Lambda _0^R(t)\Big ]=0, \ \ \ 0 \le t \le \tau . \end{aligned}$$
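For given \(\psi _i(t)\) and \(\alpha \), the third equation above has a closed-form solution: \(\Lambda _0^D\) jumps only at the observed death times, with jump size \(\sum _i\psi _i(t)\mathrm{{d}}N_i^D(t)/\sum _j\Delta _j(t)\exp \{\alpha ^TX_j(t)\}\). The following Python sketch illustrates this under simplifying assumptions that are ours, not the authors': covariates are time-independent, and \(\psi _i\) evaluated at each subject's own follow-up time is supplied as an input. The function name `baseline_cumhaz` and its arguments are hypothetical.

```python
import numpy as np

def baseline_cumhaz(T, died, X, alpha, psi_at_T):
    """Solve sum_i [psi_i dN_i^D - Delta_i exp(alpha'X_i) dLambda_0^D] = 0
    for given psi, assuming time-independent covariates (illustrative only).

    T         : observed follow-up times T_i = min(C_i, D_i)
    died      : True where the terminal event was observed
    X         : (n, p) covariate matrix
    alpha     : (p,) regression coefficients
    psi_at_T  : psi_i evaluated at T_i (taken as given here)
    """
    T = np.asarray(T, dtype=float)
    died = np.asarray(died, dtype=bool)
    psi_at_T = np.asarray(psi_at_T, dtype=float)
    risk = np.exp(np.asarray(X, dtype=float) @ np.asarray(alpha, dtype=float))

    death_times = np.unique(T[died])          # sorted distinct death times
    jumps = np.empty_like(death_times)
    for k, t in enumerate(death_times):
        at_risk = T >= t                      # Delta_j(t) = I(T_j >= t)
        dying = died & (T == t)               # subjects with dN_i^D(t) = 1
        jumps[k] = psi_at_T[dying].sum() / risk[at_risk].sum()
    return death_times, np.cumsum(jumps)
```

With \(\theta =0\), so that \(\psi _i(t)\equiv 1\), this reduces to the usual Breslow estimator; the equation for \(\Lambda _0^R\) can be handled in the same way with the observation times in place of the death times.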

However, the weight function \(\psi _i(t)\) also involves the unknown parameter \(\theta ,\) which must be estimated. For this, define

$$\begin{aligned} \omega _{1i}(t)&= E[\tilde{N}_i^R(t)| X_i(t), D_i=t],\\ \omega _{2i}(t)&= E[\tilde{N}_i^R(t)| X_i(t), D_i>t]. \end{aligned}$$

Under the assumed models, we have

$$\begin{aligned} \omega _{1i}(t)&= (\theta +1)\psi _i(t)^{-1}\int _0^t\exp \{\beta ^T X_i(u)\}\mathrm{{d}}\Lambda _0^R(u),\\ \omega _{2i}(t)&= \psi _i(t)^{-1}\int _0^t\exp \{\beta ^T X_i(u)\}\mathrm{{d}}\Lambda _0^R(u). \end{aligned}$$

Thus,

$$\begin{aligned} \theta +1=\frac{\omega _{1i}(t)}{\omega _{2i}(t)}. \end{aligned}$$
(10)

In view of (10), as discussed in Kalbfleisch et al. [3], we specify the following estimating equation for \(\theta \):

$$\begin{aligned} \sum _{i=1}^n\int _0^\tau \big \{N_i^R(t)-(\theta +1)\omega _{2i}(t)Q(t)\big \}\mathrm{{d}}N_i^D(t)=0, \end{aligned}$$

where

$$\begin{aligned} Q(t)=\frac{\sum _{j=1}^n\omega _{2j}(t)^{-1}\Delta _j^*(t)N_j^R(t)}{\sum _{j=1}^n\Delta _j^*(t)}, \end{aligned}$$

with \(\Delta _j^*(t)=\Delta _j(t)\{1-N_j^D(t)\}\) being an indicator that subject j is at risk at t and dies after t.

Let \(\rho =(\alpha ^T, \beta ^T, \theta , \Lambda _0^D, \Lambda _0^R)^T.\) We can estimate \(\rho \) using the solutions to the equations \(U(\rho )=(U_1^T, U_2^T, U_3,U_4, U_5)^T=0,\) where

$$\begin{aligned} U_1&=\sum _{i=1}^n\int _0^\tau \left\{ X_i(t)-\bar{X}(t;\alpha )\right\} \psi _i(t) \mathrm{{d}}N_i^D(t),\\ U_2&=\sum _{i=1}^n\int _0^\tau \left\{ X_i(t)-\bar{X}(t;\beta )\right\} \psi _i(t)\mathrm{{d}}N_i^R(t),\\ U_3&=\sum _{i=1}^n\int _0^\tau \big \{N_i^R(t)-(\theta +1)Q(t)\omega _{2i}(t)\big \}\mathrm{{d}}N_i^D(t),\\ U_4&=\sum _{i=1}^n \Big [\psi _i(t)\mathrm{{d}}N_i^D(t)-\Delta _i(t)\exp \{\alpha ^TX_i(t)\}\mathrm{{d}}\Lambda _0^D(t)\Big ], \ \ \ 0 \le t \le \tau ,\\ U_5&=\sum _{i=1}^n \Big [\psi _i(t)\mathrm{{d}}N_i^R(t)-\Delta _i(t)\exp \{\beta ^TX_i(t)\}\mathrm{{d}}\Lambda _0^R(t)\Big ], \ \ \ \ 0 \le t \le \tau . \end{aligned}$$

Let \(\hat{\rho }\) denote the solutions to \(U(\rho )=0.\) Note that the first terms of the estimating functions \(U_4\) and \(U_5\) represent two pure jump processes with jumps at observed event times. Thus, the solutions to \(U_4=0\) and \(U_5=0\) must be piecewise constant functions with jumps only at the observed terminal event times and the observation times (across all subjects), respectively, which yield the Aalen-Breslow-type estimators \(\hat{\Lambda }_0^D(t)\) and \(\hat{\Lambda }_0^R(t)\) [6]. Since estimation of each parameter depends on a subset of the other parameters, the solutions to the above estimating equations can be obtained through a recursive procedure (e.g., [3]). Thus, the estimators \(\hat{\psi }_i(t)\) and \(\hat{B}_i(t)\) can be obtained by replacing \(\alpha ,\theta \) and \(\Lambda _0^D(t)\) with \(\hat{\alpha },\hat{\theta },\hat{\Lambda }_0^D(t)\) in \(\psi _i(t;\alpha ,\theta ,\Lambda _0^D)\) and \(B_i(t;\alpha ,\theta ,\Lambda _0^D),\) respectively. To summarize, we propose the following two-step estimation procedure:

Step 1 First obtain estimator \(\hat{\rho }\) by solving the equations \(U(\rho )=0\). Then calculate the estimators \(\hat{\psi }_i(t)\) and \(\hat{B}_i(t)\) for \(1\le i\le n.\)

Step 2 Plug \(\hat{\beta },\) \(\hat{\Lambda }_0^R(t),\) \(\hat{\psi }_i(t)\) and \(\hat{B}_i(t)\) into Eq. (9) to obtain the estimators \(\hat{\gamma }\) and \(\hat{\eta }.\)

We declare convergence when the absolute differences between consecutive iterates of the parameter estimates are all less than \(10^{-3}.\) The algorithm in Step 1 converges in most cases, but nonconvergence occurs occasionally depending on the setup. In the simulation studies, the percentage of nonconvergence is about \(2\%\) under different setups with sample size \(n=600.\)
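The stopping rule can be sketched as a generic fixed-point loop. In the Python sketch below, `update` is a hypothetical placeholder for one pass of the recursive procedure in Step 1 (it is not the authors' code), and `max_iter` is an assumed safeguard that flags nonconvergence rather than looping indefinitely.

```python
import numpy as np

def iterate_until_stable(update, rho0, tol=1e-3, max_iter=200):
    """Repeat rho <- update(rho) until successive iterates differ by < tol.

    Returns (rho, n_iter, converged). `update` is a user-supplied function
    mapping the current parameter vector to the next one (hypothetical).
    """
    rho = np.asarray(rho0, dtype=float)
    for n_iter in range(1, max_iter + 1):
        new_rho = np.asarray(update(rho), dtype=float)
        # Largest absolute change across all components of the estimate
        if np.max(np.abs(new_rho - rho)) < tol:
            return new_rho, n_iter, True
        rho = new_rho
    return rho, max_iter, False   # report nonconvergence to the caller
```

Returning a convergence flag makes it straightforward to record the proportion of nonconverged replications in a simulation study.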

As discussed in Ye et al. [19] and Kalbfleisch et al. [3], under the regularity conditions (C1)-(C4) stated in the Appendix, \(\hat{\rho }\) exists and is unique and consistent. Then using the uniform strong law of large numbers, one can show that \(\hat{\gamma }\) and \(\hat{\eta }\) are consistent. Let \(\hat{\xi }=(\hat{\alpha }^T, \hat{\beta }^T, \hat{\theta }, \hat{\gamma }^T, \hat{\eta }^T)^T.\) Thus, \(\hat{\xi }\) is consistent for \(\xi _0.\) The asymptotic distribution of \(\hat{\xi }\) is stated in the following theorem, with the proof given in the Appendix.

Theorem 1

Under the regularity conditions (C1)-(C4) stated in the Appendix, \(n^{1/2}(\hat{\xi }-\xi _0)\) converges in distribution to a normal random vector with mean zero and covariance matrix \(\Gamma ^{-1}\Sigma (\Gamma ^T)^{-1}\), where \(\Sigma \) and \(\Gamma \) are given in the Appendix.

The asymptotic covariance matrix can be consistently estimated by the usual plug-in method. However, \(\Sigma \) has a complicated analytic form, and it may be unstable to estimate \(\Sigma \) when the plug-in method is used with a moderate sample size. Here, we propose to use the bootstrap method to estimate the covariance matrix of \(\hat{\xi }\). In the following simulation studies with sample size \(n=600\), we find that the covariance estimation is fairly accurate when 100 bootstrap samples are used.
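A minimal Python sketch of such a bootstrap is given below. It assumes a nonparametric bootstrap that resamples subjects with replacement, which is a common choice but is our assumption here since the resampling scheme is not detailed above. For simplicity each row of `data` stands for one subject, and `estimator` is a hypothetical function that would rerun the two-step procedure on a resampled data set and return \(\hat{\xi }\).

```python
import numpy as np

def bootstrap_cov(data, estimator, n_boot=100, seed=0):
    """Estimate the covariance matrix of estimator(data) by resampling
    subjects (rows of `data`) with replacement. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # one bootstrap sample of subjects
        reps.append(np.atleast_1d(estimator(data[idx])))
    reps = np.vstack(reps)                        # shape (n_boot, dim of estimate)
    # Sample covariance of the bootstrap replicates
    return np.atleast_2d(np.cov(reps, rowvar=False))
```

The square roots of the diagonal entries then serve as standard error estimates. With real longitudinal data, each resampled unit would be a subject's entire record (covariates, follow-up time, and all observations), not a single row.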

Remark 4

Note that the estimator \(\hat{\theta }\) may not always be nonnegative. In this case, we propose to impose a nonnegativity constraint and estimate \(\theta \) by \(\hat{\theta }^*=\hat{\theta }I(\hat{\theta }\ge 0).\) Based on arguments similar to those in Lin and Ying [7] and Zeng, Chen, and Ibrahim [21], we can show that \(\hat{\theta }^*\) possesses the same asymptotic normality as \(\hat{\theta }\) does if \(\theta _0 >0\) and that \(n^{1/2}\hat{\theta }^*\) converges in distribution to \({\varpi } G I(G \ge 0)\) if \(\theta _0=0\), where G denotes a standard normal random variable, and \(\varpi ^2\) is the asymptotic variance of \(n^{1/2}\hat{\theta }.\)

Finally, for comparison purposes, we summarize the differences between our proposed model and the model of Liu et al. [10] (denoted by LHO) as follows:

(1) For the longitudinal process, LHO’s method assumed a random effects model, which neither included an unspecified baseline function of time nor considered the case where some covariates have random effects. Instead, we propose a semiparametric mixed effects model, which includes an unspecified baseline function of time (intercept) and allows for inference about the random effects of time-varying covariates.

(2) For the observation process, LHO’s method considered a proportional intensity frailty model and assumed that, conditional on the random effects, the observation process is independent of the terminal event time. We instead consider a proportional rate frailty model that specifies the marginal rate of the observation process given survival, and we allow the observation process to depend on the terminal event even conditional on the random effects.

(3) For the terminal event, both methods use the proportional hazards frailty model. LHO’s method assumed that the terminal event depends on the longitudinal and the observation processes through two independent random effects, respectively. The proposed method, however, formulates the associations among the three related processes through two dependent random effects, and the dependence structure is assumed to have a linear or polynomial form.

(4) For the estimation procedures, LHO’s method conducted maximum likelihood estimation on the basis of the assumption that the two random effects are independently normally distributed. Thus, their estimation results are expected to be sensitive to departures from this assumption. Unlike their procedure, we use an estimating equation approach for parameter estimation while assuming that the frailty \(\upsilon _i\) follows a gamma distribution. As demonstrated by the simulation studies in Sect. 4, the proposed method is robust to misspecification of the frailty distribution.

4 Simulation

We conducted simulation studies to examine the finite sample properties of the proposed estimators. In the study, two covariates \(X_i=(X_{i1},X_{i2})^T\) were considered, where \(X_{i1}\) follows a Bernoulli distribution with success probability 0.5, and \(X_{i2}\) follows a uniform distribution on (0, 1). Set \(Z_i=X_{i1}\). The frailty \(\upsilon _i\) was generated from a gamma distribution with unit mean and variance \(\theta _0=0,\) 0.5 or 1. The censoring time was generated from a uniform distribution on (c, 5), where c is chosen to yield about \(30\%\) censoring for the terminal event. Given the frailty \(\upsilon _i\) and the covariates \(X_i\), the terminal event time \(D_i\) was generated from model (3) with \(\Lambda _0^D(t)=0.5t\) and \(\alpha _0=(0.3, 0.5)^T.\) The observation times were generated from a Poisson process with the intensity function \(\lambda _R=1.5\upsilon _i\exp \{-0.3X_{i1}+0.7X_{i2}\}.\) The average number of observations per subject was about 2 under the preceding settings. For given \(\upsilon _i,\) \(u_i=\eta _0(\upsilon _i-1)+e_i,\) where \(e_i\) follows a normal distribution with mean zero and variance 0.5, and \(\eta _0=-1,\) 0, or 1. The longitudinal response \(Y_i(t)\) was generated from model (1) with \(\mu _0(t)=2t + 1\) and \(\gamma _0=(1, -0.5)^T,\) where the measurement errors \(\epsilon _i(t)\) were generated independently from a standard normal distribution for all t. The results presented below are based on 500 replications with sample size \(n=600.\) The asymptotic covariance was estimated using the bootstrap method with 100 bootstrap samples, which were found to be adequate.
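The data-generating design just described can be sketched in Python as follows. This is an illustrative reading of the setup, not the authors' code. The lower censoring limit `c` is left as a user-supplied argument whose default of 1.0 is an arbitrary placeholder; it has not been tuned to give the \(30\%\) censoring rate. The function name `simulate_dataset` and the returned structure are hypothetical.

```python
import numpy as np

def simulate_dataset(n=600, theta0=0.5, eta0=1.0, c=1.0, seed=0):
    """Generate one data set under the time-independent covariate design."""
    rng = np.random.default_rng(seed)
    gamma0 = np.array([1.0, -0.5])
    alpha0 = np.array([0.3, 0.5])
    beta0 = np.array([-0.3, 0.7])

    X = np.column_stack([rng.binomial(1, 0.5, n), rng.uniform(0.0, 1.0, n)])
    # Gamma frailty with mean 1 and variance theta0: shape 1/theta0, scale theta0.
    # For theta0 = 0 the frailty is degenerate at 1.
    v = rng.gamma(1.0 / theta0, theta0, n) if theta0 > 0 else np.ones(n)

    # Model (3) with Lambda_0^D(t) = 0.5 t: D_i is exponential with
    # rate 0.5 * v_i * exp(alpha0' X_i).
    D = rng.exponential(1.0 / (0.5 * v * np.exp(X @ alpha0)))
    C = rng.uniform(c, 5.0, n)              # censoring time on (c, 5)
    T = np.minimum(C, D)
    died = D <= C

    # Random effect u_i = eta0 (v_i - 1) + e_i with Var(e_i) = 0.5; Z_i = X_i1.
    u = eta0 * (v - 1.0) + rng.normal(0.0, np.sqrt(0.5), n)

    rate = 1.5 * v * np.exp(X @ beta0)      # homogeneous Poisson visit rate
    subjects = []
    for i in range(n):
        m = rng.poisson(rate[i] * T[i])     # number of visits on [0, T_i]
        t = np.sort(rng.uniform(0.0, T[i], m))
        # Model (1) with mu_0(t) = 2t + 1 and standard normal errors
        y = 2.0 * t + 1.0 + X[i] @ gamma0 + u[i] * X[i, 0] + rng.normal(size=m)
        subjects.append({"times": t, "y": y})
    return X, T, died, subjects
```

In practice `c` would be adjusted, for example by trial runs, until the observed proportion of censored subjects (`1 - died.mean()`) is close to the target.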

Table 1 Simulation results for estimation of \(\gamma _0\), \(\theta _0\), and \(\eta _0\) when covariates are time-independent

The simulation results for estimation of \(\gamma _0\), \(\theta _0\), and \(\eta _0\) are summarized in Table 1, which includes the bias (Bias) given by the sample means of the estimates minus the true values, the sample standard errors (SE), the sample mean of the standard error estimates (SEE), and the \(95\%\) empirical coverage probabilities (CP) based on the normal approximation. Table 1 shows that our proposed method performed well for the situations considered here. Specifically, the proposed estimators were practically unbiased, and the bootstrap standard error estimators were very accurate. The \(95\%\) empirical coverage probabilities were reasonable. Note that when \(\theta _0=0\), \(\eta _0\) is unidentifiable. However, based on our simulation results, the estimators of \(\gamma _0\) and \(\theta _0\) still performed well. In addition, the estimates of \(\Lambda _0^D(t)\) and \(\Lambda _0^R(t)\) are provided in Fig. 1, which indicates that the estimators are accurate.
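The four summary measures can be computed from the replicate-level output as in the following Python sketch. The function name `summarize` and its inputs are hypothetical: `estimates` and `std_errs` hold, for one scalar parameter, the point estimate and the bootstrap standard error from each replication.

```python
import numpy as np

def summarize(estimates, std_errs, truth, z=1.96):
    """Bias, SE, SEE and 95% CP for one parameter across replications."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errs, dtype=float)
    return {
        "Bias": est.mean() - truth,        # mean estimate minus true value
        "SE": est.std(ddof=1),             # sample standard deviation of estimates
        "SEE": se.mean(),                  # mean of the standard error estimates
        # share of normal-approximation intervals est +/- z*se covering the truth
        "CP": np.mean(np.abs(est - truth) <= z * se),
    }
```

Close agreement between SE and SEE indicates that the standard error estimator is tracking the actual sampling variability.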

Fig. 1 The first row is for the estimates of \(\Lambda _0^D(t)\) and the second row is for the estimates of \(\Lambda _0^R(t).\) The dashed lines are the proposed estimators, and the solid lines are the true functions

For comparison, we also considered the method of Liang et al. [5] (denoted by LLY), who studied models (1) and (2) without the terminal event. Under the same setup as above, the comparison results are also reported in Table 1. The results indicate that LLY’s method may lead to biased estimates when the corresponding independence conditions are violated (i.e., \(\theta _0 \ne 0\)). Figure 2 illustrates how the biases of LLY’s method may arise. Notably, the SEs of the proposed method are greater than those of LLY’s method. A major reason is that the proposed method involves the estimators of the parameters \(\alpha _0\) and \(\Lambda _0^D(t)\) in model (3), and these estimators introduce additional uncertainty to the proposed estimation procedure.

Fig. 2 Bias curves of LLY’s method for the estimates of \(\gamma _1\) and \(\gamma _2\). The dashed lines are for \(\gamma _1,\) and the solid lines are for \(\gamma _2\)

We also conducted simulation studies to examine the performance of the proposed estimators when the gamma distribution was misspecified. We considered two scenarios for the frailty \(\upsilon _i\): (i) \(\upsilon _i\) followed a log-normal distribution with unit mean and variance 0.5; (ii) \(\upsilon _i\) was generated as one-tenth of a Poisson variable with mean 10. The other setups were the same as in Table 1 with \(\gamma _0=(1, -0.5)^T,\) \(\eta _0=-1,\) 0, or 1. The results are given in Table 2. The proposed estimators still performed reasonably well for the two scenarios considered, and the proposed method was robust to misspecification of the frailty distribution.

Table 2 Sensitivity analysis for the misspecification of the frailty

Furthermore, we conducted simulation studies for the setting with time-varying covariates. In the study, we took two time-dependent covariates as \(X_{i1}(t)=\tilde{X}_{i1}t\) and \(X_{i2}(t)=\tilde{X}_{i2}t,\) where \(\tilde{X}_{i1}\) and \(\tilde{X}_{i2}\) were independently generated from a uniform distribution on (0, 1). Set \(X_i(t)=(X_{i1}(t), X_{i2}(t))^T\) and \(Z_i(t)=X_{i1}(t).\) The censoring time was generated from a uniform distribution on (c, 2),  where c is chosen to yield about \(30\%\) censoring for the terminal event. The other setups were the same as in Table 1, except that \(\Lambda _0^D(t)=t\) and \(\Lambda _0^R(t)=5t.\) The results are summarized in Table 3 with \(n=600.\) It can be seen that the proposed method still performed satisfactorily in this case.

Table 3 Simulation results for estimation of \(\gamma _0\), \(\theta _0\), and \(\eta _0\) when covariates are time-dependent

5 An Application

In this section, we applied the proposed methods to the medical cost data of chronic heart failure patients that have been analyzed by Liu et al. [10], Sun et al. [17], and others. These data were from the clinical data repository at the University of Virginia Health System, which included a total of 1475 patients aged 60–89 years who were first diagnosed with heart failure and treated in 2004. The follow-up ended with each patient’s last hospital admission up to July 31, 2006, or death date, which was obtained from the Death Certificate Data at the Virginia Department of Vital Statistics. During follow-up, 297 patients (20%) died and the others were censored. For each patient, three baseline covariates were measured: race, age, and gender. Preliminary studies implied that patients visiting the hospital more often tended to pay more for each visit, and these patients also had a higher mortality rate. That is, the medical cost (longitudinal process) may be strongly correlated with the hospital visits (observation times) and death (terminal event). To show further how the longitudinal and observation time profiles are associated with the terminal event, we plotted the scatter diagrams of the frequency of hospital visits and the log sum of medical cost (until the observed survival time \(T_i\)) versus the observed survival time in Figs. 3 and 4, respectively. These plots show that the medical costs and hospital visits are positively correlated with death. Given that gender had been shown to have no effect on the medical cost and the hospital visits [10, 17], here we focused on the effects of race and age on the actual monetary expense of the hospital with informative observation times and a dependent terminal event.

Fig. 3 Scatter plot of the frequency of hospital visits versus the observed survival time

Fig. 4 Scatter plot of the log sum of medical cost versus the observed survival time

As in Liu et al. [10] and Sun et al. [17], we defined \(Y_i(t)\) as the log-transformed cost. For covariates, let \(X_{i1}\) be a binary indicator of race (white \(=1,\) nonwhite \(=0\)), and let \(X_{i2}\) denote the age group, taking values 0, 1, and 2 for 60–69, 70–79, and 80–89 years, respectively. Let \(\tau \) be the longest follow-up time. The asymptotic variance was estimated by the bootstrap method with 100 bootstrap samples. We chose \(Z_i=(X_{i1}, X_{i2})^T\) in model (1), because race and age are significantly related to the hospital visits. The analysis results are summarized in Table 4. For the hospital visit process, both age and race were significantly related to the hospital visits, which is in line with the result obtained by Liu et al. [10] (denoted by LHO). In particular, older patients were more likely to visit the hospital and had lower medical costs, whereas white patients visited the hospital at a lower rate and tended to have lower medical costs at each visit. For the cost process, we found that age had a significant effect on the medical cost for each visit, but race did not seem to be directly related to the medical cost. Although the results of the LHO method implied that age was only marginally significant at the 2% level, their estimator for the race effect was significantly different from ours, and the direction of the race effect was reversed. One possible reason is misspecification of the assumption in the LHO method that the two random effects are independent and normally distributed. The estimate \(\hat{\theta }=0.3634\) (p value \(<0.0001\)) indicates that there was a significantly positive association between the hospital visits and death. That is, patients who tended to visit the hospital more frequently had a higher mortality rate. In addition, in view of (10), this estimate suggests that a patient who is known to die at time t is expected to have more than 1.3 times as many hospital visits as a patient with identical covariates who has not died by time t. Moreover, based on the estimate \(\hat{\eta },\) older patients who visited the hospital more often tended to pay more for each visit, but white patients visiting the hospital more often tended to pay less for each visit. These results are basically consistent with those obtained by the LHO method.
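As a quick arithmetic check of the factor quoted above, and assuming that (10) expresses this ratio as \(1+\theta \) (the quantity that Sect. 6 replaces with \(1-\theta \) under a negative association), the plug-in value is

$$\begin{aligned} 1+\hat{\theta }=1+0.3634=1.3634, \end{aligned}$$

which agrees with the statement that such a patient is expected to have more than 1.3 times as many hospital visits.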

Table 4 Analysis results for the medical cost data of heart failure patients

For comparison, we also analyzed the data with the LLY method, regarding the terminal event as an independent censoring time. The comparison results are provided in the second half of Table 4. For the cost process, the effects of race and age estimated by the LLY method are substantially smaller, and the direction of the race effect is even reversed. For the hospital visit process, the LLY method also produced smaller effects of race and age. Moreover, the association parameters estimated using the LLY method are significantly different from ours. This is because the LLY method ignores the dependent terminal event and thus yields biased estimates.

6 Discussion

In this article, we proposed a joint modeling approach, based on two latent variables, for analyzing longitudinal data with informative observation times and a dependent terminal event. The joint model is more comprehensive and flexible in that it does not assume that the observation process is a nonhomogeneous Poisson process and it allows some covariates to have random effects. An estimating equation approach was developed for parameter estimation, which yielded consistent and asymptotically normal estimators. The simulation results showed that the proposed estimation approach performed well and that the method was robust to misspecification of the frailty distribution in the situations considered.

Here we have assumed that the covariate histories \(\{X_i(t): 0\le t\le T_i\}\) are observed, and hence \(\bar{X}(t; \beta )\) is well defined. In practice, however, the covariate histories are typically measured discretely and are often available only at the observation times. Thus, some smoothing procedure is needed to interpolate the covariates and approximate \(\bar{X}(t; \beta )\). As discussed in Lin and Ying [8], using the singleton nearest-neighbor method, we may approximate \(\bar{X}(t; \beta )\) by

$$\begin{aligned} \bar{X}^*(t; \beta )=\frac{\sum _{j=1}^n\Delta _j(t)X_j^*(t)\exp \{\beta ^TX_j^*(t)\}}{\sum _{j=1}^n\Delta _j(t)\exp \{\beta ^TX_j^*(t)\}}, \end{aligned}$$

where \(X_j^*(t)\) is the measurement of \(X_j(\cdot )\) at the time point nearest to t. Other choices of \(\bar{X}^*(t; \beta )\) would be the nearest two-neighbor or two-left-right-neighbor average [8]. We may also approximate \(\bar{X}(t; \beta )\) by other smoothing methods such as a linear smoother or a nonparametric smoother [18]. A similar approximation can be used for \(\bar{\hat{B}}(t).\) It would be worthwhile to further address this issue both theoretically and numerically.
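The singleton nearest-neighbor approximation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that \(\Delta _j(t)\) is the at-risk indicator \(I(T_j\ge t)\), and the data layout (one dictionary per subject with keys times, values, and end) and the function names are hypothetical.

```python
import numpy as np

def nearest_measurement(obs_times, obs_values, t):
    """Return X_j^*(t): the covariate vector recorded at the observation time closest to t."""
    idx = int(np.argmin(np.abs(np.asarray(obs_times, dtype=float) - t)))
    return np.asarray(obs_values, dtype=float)[idx]

def xbar_star(t, beta, subjects):
    """Weighted covariate average over subjects at risk at t, with weights exp(beta^T X_j^*(t)).

    subjects: list of dicts with
      'times'  : observation times of subject j,
      'values' : covariate vectors measured at those times (m_j x p),
      'end'    : follow-up time T_j (assumed: Delta_j(t) = 1 if end >= t, else 0).
    """
    beta = np.asarray(beta, dtype=float)
    numerator = np.zeros_like(beta)
    denominator = 0.0
    for s in subjects:
        if s["end"] < t:                  # Delta_j(t) = 0: subject no longer at risk
            continue
        x = nearest_measurement(s["times"], s["values"], t)
        w = float(np.exp(beta @ x))
        numerator += w * x
        denominator += w
    if denominator == 0.0:
        raise ValueError("no subject is at risk at time t")
    return numerator / denominator

# Toy data: two subjects, one covariate each (illustrative values only).
subjects = [
    {"times": [0.1, 0.9], "values": [[1.0], [3.0]], "end": 1.0},
    {"times": [0.2, 0.7], "values": [[2.0], [4.0]], "end": 0.5},
]
print(xbar_star(0.4, [0.0], subjects))
print(xbar_star(0.8, [0.0], subjects))
```

With \(\beta =0\) the weights are all equal, so the result reduces to the plain average of the nearest measurements among subjects still at risk; the two-neighbor variants would replace the nearest-measurement helper with an average of the two closest measurements.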

In the joint models, we have assumed that the relationship between the latent variables has a linear form. In fact, as long as \(E(u_i|\upsilon _i,X_i(t))\) is a polynomial in \(\upsilon _i\), the estimation procedure can be directly extended to this case. This extension is useful because any continuous function can be approximated by polynomials. However, a high-order polynomial may lead to an unstable approximation. Thus, a simple linear form may be a good choice for small or moderate sample sizes. It would be desirable to develop a method for the case in which \(E\big \{u_i|\upsilon _i, X_i(t)\big \}\) is an unspecified function of \(\upsilon _i\), that is, the dependence structure between the random effects \(u_i\) and the frailty \(\upsilon _i\) is left unspecified [17]. Nevertheless, this extension is highly nontrivial and requires further investigation.

Note that models (2) and (3) allow a positive association between the observation process and the terminal event. Although these models fit the example discussed in Sect. 5 well, a negative association may exist in other applications. Based on the discussion of Kalbfleisch et al. [3], the proposed model can be generalized to allow for a negative association between the observation process and the terminal event. For this case, we specify \(\upsilon _i^{-1}\) as the frailty in model (2) and retain \(\upsilon _i\) as the frailty in models (1) and (3), where the frailty \(\upsilon _i\) is assumed to follow a gamma distribution with mean 1 and variance \(\theta < 1\). It can be checked that

$$\begin{aligned} \psi _i^*(t)=E\{\upsilon _i^{-1}|X_i(t), D_i \ge t\}=\frac{\psi _i(t)}{1-\theta }. \end{aligned}$$
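The identity above can be checked numerically. The sketch below rests on assumptions that go beyond what this section states: it takes \(\psi _i(t)=1+\theta A_i(t)\), where \(A_i(t)\) denotes the cumulative hazard of the terminal event without the frailty, so that \(P(D_i\ge t|\upsilon _i, X_i)=\exp \{-\upsilon _i A_i(t)\}\). The function names and the numerical values are illustrative only.

```python
import numpy as np
from math import lgamma

def trapezoid(y, x):
    """Composite trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def inverse_frailty_mean(theta, a, upper=80.0, grid_size=400_001):
    """E[1/v | D >= t] for v ~ Gamma(mean 1, variance theta), theta < 1.

    Assumes P(D >= t | v) = exp(-v * a), where a stands for A_i(t).
    """
    k = 1.0 / theta                        # gamma shape and rate are both 1/theta
    v = np.linspace(1e-9, upper, grid_size)
    log_prior = k * np.log(k) - lgamma(k) + (k - 1.0) * np.log(v) - k * v
    weight = np.exp(log_prior - a * v)     # prior density times survival probability
    return trapezoid(weight / v, v) / trapezoid(weight, v)

theta, a = 0.25, 2.0                       # illustrative values only
psi = 1.0 + theta * a                      # assumed form of psi_i(t)
numeric = inverse_frailty_mean(theta, a)
closed_form = psi / (1.0 - theta)          # right-hand side of the displayed identity
print(numeric, closed_form)
```

The condition \(\theta <1\) matters here: the gamma shape parameter \(1/\theta \) must exceed 1 for \(E(\upsilon _i^{-1})\) to be finite, which is why the denominator \(1-\theta \) appears.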

By replacing \(\psi _i(t)\) and \(1+\theta \) with \(\psi _i^*(t)^{-1}\) and \(1-\theta \), respectively, in \(U_2\), \(U_3\), and \(U_5,\) the same estimating equations can be constructed as in the previous sections. A more general approach is to generalize model (2) to

$$\begin{aligned} \mathrm{{d}}\Lambda _R(t|\upsilon _i)=\upsilon _i^{\sigma }\exp \{\beta _0^T X_i(t)\}\mathrm{{d}}\Lambda _0^R(t), \end{aligned}$$

where \(\sigma \) is an unknown parameter. Estimation of \(\sigma \) in this model is a challenging problem that requires substantial future effort. Finally, the proposed estimation procedure was developed on the basis of the generalized estimating equation approach. The efficiency of the resulting estimators is worthy of further investigation.