Joint Analysis of Longitudinal Data and Informative Observation Times with Time-Dependent Random Effects

Li, Yang; He, Xin; Wang, Haiying; Sun, Jianguo

doi:10.1007/978-3-319-42571-9_2

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

Longitudinal data arise in many fields, for example in medical follow-up studies that involve repeated measurements. For their analysis, most existing approaches assume that the observation or follow-up times are independent of the response process, either completely or given some covariates. In practice, this assumption may not hold. We present a joint analysis approach that allows for possible mutual correlations, which are characterized by time-dependent random effects. Estimating equations are developed for parameter estimation, and the resulting estimators are shown to be consistent and asymptotically normal.

1 Introduction

Longitudinal data arise in many fields, for example in medical follow-up studies that involve repeated measurements. In these situations, study subjects are generally observed only at discrete times. Therefore, the analysis of longitudinal data involves two processes: the response process, which is usually of primary interest but is not continuously observable, and the observation process, which is a nuisance process but gives rise to the discrete times at which the responses are observed.

An extensive literature exists on the analysis of longitudinal data. Sun and Kalbfleisch (1995) and Wellner and Zhang (2000) investigated nonparametric estimation of the mean function when the response process is a counting process. Cheng and Wei (2000), Sun and Wei (2000), Zhang (2002) and Wellner and Zhang (2007) developed semiparametric approaches for regression analysis under proportional means models. However, with respect to the observation process, most existing approaches assume that the observation times are independent of the underlying response process, either completely or given some covariates. For the analysis with a correlated observation process, there is limited work, and most of it assumes independent censoring or requires restrictive conditions such as the Poisson assumption or a specified correlation structure for the dependence (Huang et al. 2006; Sun et al. 2007; He et al. 2009; Zhao and Tong 2011; Kim et al. 2012; Li et al. 2013; Zhao et al. 2013; Zhou et al. 2013).

In many situations, however, the response process, the observation times and the censoring times may be mutually correlated. In addition, such correlations may be time-dependent. For instance, both the observation times and the longitudinal responses may depend on the stage of disease progression. Their correlation may change over time, and so may their correlations with the follow-up times. He et al. (2009) considered such correlations in shared frailty models. However, their method requires the assumptions that the underlying random effect is normally distributed and that the observation process is a nonhomogeneous Poisson process. In addition, all correlations among the three processes are assumed to be fixed over time. Zhao et al. (2013) proposed a robust estimation procedure that relaxes the Poisson assumption required in He et al. (2009). However, the follow-up times are assumed to be independent of the covariates, responses and observation times, and the possible correlations between responses and observation times are time-independent. Sun et al. (2012) presented a joint model with time-dependent correlations between the response process, the observation times and a terminal event, where the random effect associated with the terminal event is fixed over time and follows a specified distribution. In practice, however, such conditions may not hold or may be difficult to check when informative censoring is involved.

We consider regression analysis of longitudinal data when the underlying response process, the observation and censoring times are mutually correlated and none of the correlations is restricted by specified forms or distributions. A general estimation approach is proposed. The remainder of this chapter is organized as follows: In Sect. 2, we introduce the notation and present the model. Section 3 presents the estimation procedure and establishes the asymptotic properties of the resulting estimators. In Sect. 4, a simulation study is performed to evaluate the finite sample properties of the proposed estimators. Some concluding remarks are given in Sect. 5.

2 Notation and Models

Consider a longitudinal study in which the response process of interest is observed only at discrete sampling time points. For each subject $i$, $i = 1,\ldots,n$, let $N_i(t)$ be the observation process, which gives the cumulative number of observation times up to time $t$. In practice, one observes $\widetilde{N}_{i}(t) = N_{i}(t \wedge C_{i})$, where $a \wedge b = \min(a, b)$ and $C_i$ denotes the censoring or follow-up time. Let $Y_i(t)$ denote the response process, which gives the response of interest at time $t$ but is observed only at the $m_i$ discrete observation times $\{T_{i,1},\ldots,T_{i,m_{i}}\}$ at which $\widetilde{N}_{i}(t)$ jumps. Suppose that there exists a $p$-dimensional vector of covariates, denoted by $\mathbf{Z}_i$, which is assumed to be time-independent.
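As an illustration of this notation, the following minimal Python sketch evaluates $\widetilde{N}_{i}(t) = N_{i}(t \wedge C_{i})$ for a single subject. The function names `observed_times` and `n_tilde` and all numerical values are hypothetical and are not part of the original chapter.

```python
from bisect import bisect_right


def observed_times(times, c):
    """Keep the jump times of N_i that occur no later than the follow-up time c."""
    return sorted(t for t in times if t <= c)


def n_tilde(t, times, c):
    """Evaluate N~_i(t) = N_i(min(t, c)): the number of jump times <= min(t, c)."""
    ordered = sorted(times)
    return bisect_right(ordered, min(t, c))


times = [0.10, 0.35, 0.60, 0.90]  # hypothetical jump times of N_i
c = 0.70                          # hypothetical follow-up time C_i
print(observed_times(times, c))   # jump times of the observed process N~_i
print(n_tilde(0.5, times, c), n_tilde(1.0, times, c))
```

The last observation time at 0.90 is dropped because it falls after the follow-up time, so the observed process stops counting at $C_i$.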

In the following, we model the correlation between $Y_i(t)$, $N_i(t)$ and $C_i$ through an unobserved random vector $\mathbf{b}_i(t) = (b_{1i}(t), b_{2i}(t), b_{3i}(t))'$, which may be time-dependent. Define $\mathcal{B}_{it} =\{ \mathbf{b}_{i}(s),s \leq t\}$. It will be assumed that the $\mathbf{b}_i(t)$'s are independent and identically distributed, that $\mathcal{B}_{it}$ is independent of $\mathbf{Z}_i$, and that, given $\mathbf{Z}_i$ and $\mathcal{B}_{it}$, $C_i$, $N_i(t)$ and $Y_i(t)$ are mutually independent. Specifically, the mean function of $Y_i(t)$ is assumed to follow the proportional means model

$$\displaystyle{ E\{Y _{i}(t)\vert \mathbf{Z}_{i},\mathbf{b}_{i}(t)\} =\varLambda _{0}(t)\exp \{\beta '\mathbf{Z}_{i} + b_{1i}(t)\}, }$$

(1)

where $\varLambda_0(t)$ is an unknown baseline mean function and $\beta$ denotes a $p$-dimensional vector of regression coefficients. When $b_{1i}(t) = 0$, meaning that $Y_i(t)$ is independent of both $N_i(t)$ and $C_i$, model (1) has been considered extensively by Cheng and Wei (2000), Sun and Wei (2000), Zhang (2002) and Hu et al. (2003), among others. When $b_{1i}(t)$ is time-independent, model (1) is equivalent to model (3) considered in Zhao et al. (2013). In general, $b_{1i}(t)$ is unknown and may follow an arbitrary distribution.

The observation process $N_i(t)$ is assumed to follow the proportional rates model

$$\displaystyle{ E\{dN_{i}(t)\vert \mathbf{Z}_{i},\mathbf{b}_{i}(t)\} =\exp \{\gamma '\mathbf{Z}_{i} + b_{2i}(t)\}d\mu _{0}(t)\,, }$$

(2)

where $\gamma$ is a vector of unknown parameters and $d\mu_0(t)$ is an unknown baseline rate function. For the $C_i$'s, motivated by the additive hazards models that are commonly used in survival analysis (Lin and Ying 2001; Kalbfleisch and Prentice 2002; Zhang et al. 2005), we consider the additive hazards model. That is, the hazard $\lambda_i(t \vert \mathbf{Z}_i, \mathbf{b}_i(t))$ of $C_i$, defined as the instantaneous rate of observing $C_i$ at time $t$ given that $C_i$ is no smaller than $t$, is given by

$$\displaystyle{ \lambda _{i}(t\vert \mathbf{Z}_{i},\mathbf{b}_{i}(t)) =\lambda _{0}(t) +\xi '\mathbf{Z}_{i} + b_{3i}(t)\,. }$$

(3)

Here $\lambda_0(t)$ is an unknown baseline hazard function and $\xi$ denotes the effect of the covariates on the hazard function of the $C_i$'s. Instead of model (3), one may consider the proportional hazards model. However, as pointed out by Lin et al. (1998) and others, the additive model (3) can be more plausible than the proportional hazards model in many applications. Related applications and model-checking techniques for model (3) can be found in Yuen and Burke (1997), Kim and Lee (1998), Ghosh (2003) and Gandy and Jensen (2005), among others.
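To make the three model components concrete, the following Python sketch evaluates the right-hand sides of models (1)–(3) for a scalar covariate and given random-effect paths. It is only an illustration under stated assumptions: the function names are hypothetical, the baseline functions are borrowed from the simulation set-up ($\lambda_0 = 2$, $\mu_0(t) = 20t$, $\varLambda_0(t) = 5t$), and the parameter values and the constant random effect 0.2 are made up.

```python
import math


def mean_response(t, z, b1, beta, Lambda0):
    """Model (1): E{Y(t) | Z, b} = Lambda0(t) * exp(beta * z + b1(t))."""
    return Lambda0(t) * math.exp(beta * z + b1(t))


def observation_rate(t, z, b2, gamma, mu0_rate):
    """Model (2): rate of N(t) = exp(gamma * z + b2(t)) * d mu0(t) / dt."""
    return math.exp(gamma * z + b2(t)) * mu0_rate(t)


def censoring_hazard(t, z, b3, xi, lambda0):
    """Model (3): additive hazard lambda0(t) + xi * z + b3(t)."""
    return lambda0(t) + xi * z + b3(t)


# Hypothetical inputs: one shared, time-independent random effect.
b = lambda t: 0.2
t, z = 0.5, 1
print(mean_response(t, z, b, beta=0.5, Lambda0=lambda s: 5.0 * s))
print(observation_rate(t, z, b, gamma=0.5, mu0_rate=lambda s: 20.0))
print(censoring_hazard(t, z, b, xi=0.5, lambda0=lambda s: 2.0))
```

The covariate and the random effect act multiplicatively in (1) and (2) but additively in (3), so in practice the hazard in (3) has to stay nonnegative for the chosen values.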

Models (1)–(3) can be viewed as natural generalizations of some existing and commonly used models. In fact, when any of the $b_{ki}(t)$'s ($k = 1, 2, 3$) is zero or independent of the other $b_{ji}(t)$'s ($j = 1, 2, 3$ and $j \neq k$), the corresponding process is independent of the others. Therefore, the proposed joint model also covers the special cases in which either the observation or the censoring times are noninformative. In general, since the form and distribution of $\mathbf{b}_i(t)$ are arbitrary and completely unspecified, the joint model described above is quite flexible compared with many existing procedures.

Note that in models (1)–(3), for simplicity, we have assumed that the same set of covariates affects $Y_i(t)$, $N_i(t)$ and $C_i$. In practice, this may not be the case, but the estimation procedure proposed below still applies as long as one replaces $\mathbf{Z}_i$ by the appropriate covariates. As an alternative, one can define a single, larger covariate vector by combining all the different covariates. In the following, we focus on estimation of the regression parameters $\beta$ along with $\gamma$ and $\xi$. For this, the use of existing procedures that assume independence could give biased or even misleading results.

3 Estimation Procedure

In this section, we present an inference procedure for the estimation of $\beta$, which is usually of primary interest. First note that the counting process $\widetilde{N}_{i}(t) = N_{i}(t \wedge C_{i})$ jumps by one at time $t$ if and only if $C_i \geq t$ and $dN_i(t) = 1$. Also, we have

$$\displaystyle\begin{array}{rcl} & & E\{d\widetilde{N}_{i}(t)\vert \mathbf{Z}_{i}\} = E\{I(t \leq C_{i})dN_{i}(t)\vert \mathbf{Z}_{i}\} \\ & & \quad = E\bigg[E\{I(t \leq C_{i})dN_{i}(t)\vert \mathbf{Z}_{i},\mathcal{B}_{it}\}\bigg\vert \mathbf{Z}_{i}\bigg] \\ & & \quad = E\bigg[E\{I(t \leq C_{i})\vert \mathbf{Z}_{i},\mathcal{B}_{it}\}E\{dN_{i}(t)\vert \mathbf{Z}_{i},\mathcal{B}_{it}\}\bigg\vert \mathbf{Z}_{i}\bigg] \\ & & \quad = E\bigg[\exp \{ -\varLambda _{0}^{\ast}(t) - B_{i}(t) -\xi '\mathbf{Z}_{i}^{\ast}(t)\}\exp \{\gamma '\mathbf{Z}_{i} + b_{2i}(t)\}d\mu _{0}(t)\bigg\vert \mathbf{Z}_{i}\bigg] \\ & & \quad =\exp \{\gamma '\mathbf{Z}_{i} -\xi '\mathbf{Z}_{i}^{\ast}(t)\}d\varLambda _{1}^{\ast}(t), \end{array}$$

(4)

where

$$\displaystyle{\varLambda _{0}^{{\ast}}(t) =\int _{ 0}^{t}\lambda _{ 0}(s)ds,\;\;B_{i}(t) =\int _{ 0}^{t}b_{ 3i}(s)ds,\;\;\mathbf{Z}_{i}^{{\ast}}(t) =\int _{ 0}^{t}\mathbf{Z}_{ i}ds\;\;}$$

and

$$\displaystyle{d\varLambda _{1}^{\ast}(t) =\exp \{ -\varLambda _{0}^{\ast}(t)\}E[\exp \{b_{2i}(t) - B_{i}(t)\}]d\mu _{0}(t).}$$

Define

$$\displaystyle{dM_{i}^{{\ast}}(t;\eta ) = d\widetilde{N}_{ i}(t) - e^{\eta '\mathbf{X}_{i}(t)}d\varLambda _{ 1}^{{\ast}}(t)}$$

and $dM_i^{\ast}(t) = dM_i^{\ast}(t;\eta_0)$, where $\eta = (\gamma, \xi)'$, $\mathbf{X}_i(t) = (\mathbf{Z}_i, -\mathbf{Z}_i^{\ast}(t))'$ and $\eta_0$ denotes the true value of $\eta$. By (4), $M_i^{\ast}(t)$ is a mean-zero stochastic process. It follows that estimators of $\eta$ and $d\varLambda_1^{\ast}(t)$ can be obtained by solving the following two estimating equations

$$\displaystyle{ U_{\eta }(\eta ) =\sum _{ i=1}^{n}\int _{ 0}^{\tau }\bigg\{\mathbf{X}_{ i}(t) -\bar{ X}(t;\eta )\bigg\}d\widetilde{N}_{i}(t) = 0 }$$

(5)

and

$$\displaystyle{ \sum _{i=1}^{n}\bigg[d\widetilde{N}_{ i}(t) - e^{\eta '\mathbf{X}_{i}(t)}d\varLambda _{ 1}^{{\ast}}(t)\bigg] = 0. }$$

(6)

In the above, $\tau$ is the longest follow-up time, $\bar{X}(t;\eta ) = S^{(1)}(t;\eta )/S^{(0)}(t;\eta )$ and $S^{(k)}(t;\eta ) = n^{-1}\sum _{i=1}^{n}e^{\eta '\mathbf{X}_{i}(t)}\mathbf{X}_{i}(t)^{\otimes k}$, $k = 0, 1$, with $a^{\otimes 0} = 1$ and $a^{\otimes 1} = a$. We also write $\bar{x}(t) =\lim _{n\rightarrow \infty }\bar{X}(t;\eta _{0})$ and $s^{(k)}(t) =\lim _{n\rightarrow \infty }S^{(k)}(t;\eta _{0})$, $k = 0, 1$.
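For a fixed $\eta$, equation (6) can be solved pointwise in $t$, which gives $d\widehat{\varLambda }_{1}^{\ast}(t;\eta ) =\sum _{i=1}^{n}d\widetilde{N}_{i}(t)/\sum _{i=1}^{n}e^{\eta '\mathbf{X}_{i}(t)}$. The following Python sketch evaluates the left-hand side of (5) and this pointwise solution for a scalar covariate, so that $\mathbf{X}_i(t) = (Z_i, -Z_i t)'$. It is a minimal illustration, not the authors' implementation: the function names and the toy data are hypothetical, all observation times are assumed to lie in $[0,\tau]$, and the root-finding step for (5) is not shown.

```python
import math


def x_vec(z, t):
    """X_i(t) = (Z_i, -Z_i^*(t))' with Z_i^*(t) = Z_i * t for a scalar covariate."""
    return (z, -z * t)


def x_bar(t, eta, zs):
    """X-bar(t; eta) = S^(1)(t; eta) / S^(0)(t; eta), summed over all subjects."""
    s0, s1 = 0.0, [0.0, 0.0]
    for z in zs:
        x = x_vec(z, t)
        w = math.exp(eta[0] * x[0] + eta[1] * x[1])
        s0 += w
        s1[0] += w * x[0]
        s1[1] += w * x[1]
    return (s1[0] / s0, s1[1] / s0)


def u_eta(eta, zs, obs_times):
    """Left-hand side of (5): sum of X_i(t) - X-bar(t; eta) over observed jumps."""
    u = [0.0, 0.0]
    for z, times in zip(zs, obs_times):
        for t in times:
            x, xb = x_vec(z, t), x_bar(t, eta, zs)
            u[0] += x[0] - xb[0]
            u[1] += x[1] - xb[1]
    return tuple(u)


def d_lambda1(t, eta, zs, obs_times):
    """Pointwise solution of (6): jumps at t divided by sum_i exp(eta' X_i(t))."""
    jumps = sum(times.count(t) for times in obs_times)
    denom = sum(math.exp(eta[0] * z - eta[1] * z * t) for z in zs)
    return jumps / denom


zs = [0, 1, 0, 1]  # hypothetical covariates
obs_times = [[0.2, 0.5], [0.1, 0.3, 0.6], [0.4], [0.25, 0.7]]  # hypothetical times
print(u_eta((0.0, 0.0), zs, obs_times))
print(d_lambda1(0.4, (0.0, 0.0), zs, obs_times))
```

An estimate of $\eta$ would then be obtained by driving both components of `u_eta` to zero, for instance with a Newton-type iteration.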

To estimate β, consider

$$\displaystyle\begin{array}{rcl} & & E\{Y _{i}(t)d\widetilde{N}_{i}(t)\vert \mathbf{Z}_{i},\mathcal{B}_{it}\} \\ & & \quad = E\{I(t \leq C_{i})Y _{i}(t)dN_{i}(t)\vert \mathbf{Z}_{i},\mathcal{B}_{it}\} \\ & & \quad = E\{I(t \leq C_{i})\vert \mathbf{Z}_{i},\mathcal{B}_{it}\}E\{Y _{i}(t)\vert \mathbf{Z}_{i},\mathcal{B}_{it}\}E\{dN_{i}(t)\vert \mathbf{Z}_{i},\mathcal{B}_{it}\} \\ & & \quad =\exp \{ -\varLambda _{0}^{\ast}(t) - B_{i}(t) -\xi '\mathbf{Z}_{i}^{\ast}(t)\} \\ & & \qquad \times \varLambda _{0}(t)\exp \{\beta '\mathbf{Z}_{i} + b_{1i}(t)\}\exp \{\gamma '\mathbf{Z}_{i} + b_{2i}(t)\}d\mu _{0}(t) \\ & & \quad =\exp \{ (\beta +\gamma )'\mathbf{Z}_{i} -\xi '\mathbf{Z}_{i}^{\ast}(t)\} \\ & & \qquad \times \exp \{ -\varLambda _{0}^{\ast}(t) + b_{1i}(t) + b_{2i}(t) - B_{i}(t)\}\varLambda _{0}(t)d\mu _{0}(t), \end{array}$$

and therefore

$$\displaystyle{ E\{Y _{i}(t)d\widetilde{N}_{i}(t)\vert \mathbf{Z}_{i}\} =\exp \{\beta '\mathbf{Z}_{i} +\eta '\mathbf{X}_{i}(t)\}d\varLambda _{2}^{{\ast}}(t), }$$

(7)

where

$$\displaystyle{d\varLambda _{2}^{\ast}(t) =\exp \{ -\varLambda _{0}^{\ast}(t)\}\varLambda _{0}(t)E[\exp \{b_{1i}(t) + b_{2i}(t) - B_{i}(t)\}]d\mu _{0}(t).}$$

Define

$$\displaystyle{dM_{i}(t;\beta,\eta ) = Y _{i}(t)d\widetilde{N}_{i}(t) -\exp \{\beta '\mathbf{Z}_{i} +\eta '\mathbf{X}_{i}(t)\}d\varLambda _{2}^{{\ast}}(t)}$$

and $dM_i(t) = dM_i(t;\beta_0,\eta_0)$, where $\beta_0$ denotes the true value of $\beta$. Then, by (7), $M_i(t)$ is a mean-zero stochastic process. This naturally suggests the following estimating equations for $\beta$ and $d\varLambda_2^{\ast}(t)$:

$$\displaystyle{ U_{\beta }(\beta;\hat{\eta }) =\sum _{ i=1}^{n}\int _{ 0}^{\tau }W(t)\mathbf{Z}_{ i}\bigg[Y _{i}(t)d\widetilde{N}_{i}(t) - e^{\beta '\mathbf{Z}_{i}+\hat{\eta }'\mathbf{X}_{i}(t)}d\varLambda _{ 2}^{{\ast}}(t)\bigg] = 0, }$$

(8)

and

$$\displaystyle{ \sum _{i=1}^{n}\bigg[Y _{ i}(t)d\widetilde{N}_{i}(t) - e^{\beta '\mathbf{Z}_{i}+\hat{\eta }'\mathbf{X}_{i}(t)}d\varLambda _{ 2}^{{\ast}}(t)\bigg] = 0,\;\,0 \leq t \leq \tau, }$$

(9)

where $\hat{\eta }= (\hat{\gamma },\;\;\hat{\xi })'$ and $d\widehat{\varLambda }_{1}^{\ast}(t)$ are the estimators of $\eta$ and $d\varLambda_1^{\ast}(t)$, respectively, obtained from (5) and (6), and $W(t)$ is a possibly data-dependent weight function. We denote the estimators of $\beta$ and $d\varLambda_2^{\ast}(t)$ obtained from (8) and (9) by $\hat{\beta }$ and $d\widehat{\varLambda }_{2}^{\ast}(t)$, respectively.
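One way to solve (8) and (9) jointly is to profile out the baseline. For fixed $\beta$, (9) gives $d\varLambda_2^{\ast}(t) =\sum_{i}Y_i(t)d\widetilde{N}_i(t)/\sum_{i}e^{\beta'\mathbf{Z}_i+\hat{\eta}'\mathbf{X}_i(t)}$, and substituting this into (8) yields $\sum_{i}\int_0^{\tau}W(t)\{\mathbf{Z}_i - \widehat{E}_Z(t;\beta,\hat{\eta})\}Y_i(t)d\widetilde{N}_i(t) = 0$, where $\widehat{E}_Z$ is the weighted covariate mean $\sum_i \mathbf{Z}_i e^{\beta'\mathbf{Z}_i+\hat{\eta}'\mathbf{X}_i(t)}/\sum_i e^{\beta'\mathbf{Z}_i+\hat{\eta}'\mathbf{X}_i(t)}$. The Python sketch below solves this profiled equation for a scalar covariate with $W(t) = 1$. It is an illustration under assumptions, not the authors' code: the function names, the record layout `(subject index, time, response)` and the use of bisection are all choices made here, and bisection relies on nonnegative responses so that the profiled function is non-increasing in $\beta$.

```python
import math


def e_z(t, beta, eta, zs):
    """E-hat_Z(t; beta, eta): mean of Z weighted by exp(beta*Z + eta'X(t))."""
    num = den = 0.0
    for z in zs:
        w = math.exp(beta * z + eta[0] * z - eta[1] * z * t)
        num += w * z
        den += w
    return num / den


def u_beta(beta, eta, zs, records):
    """Profiled form of (8) with W(t) = 1; records holds (subject index, t, y)."""
    return sum((zs[i] - e_z(t, beta, eta, zs)) * y for i, t, y in records)


def solve_beta(eta, zs, records, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection for the root of u_beta, assuming y >= 0 and a bracketed root."""
    if u_beta(lo, eta, zs, records) < 0 or u_beta(hi, eta, zs, records) > 0:
        raise ValueError("root not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if u_beta(mid, eta, zs, records) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical toy data: two subjects, one observation each, eta-hat = (0, 0).
zs = [0, 1]
records = [(0, 0.5, 2.0), (1, 0.5, 6.0)]
print(solve_beta((0.0, 0.0), zs, records))
```

In this toy example the profiled equation reduces to $6(1-p) - 2p = 0$ with $p = e^{\beta}/(1+e^{\beta})$, so the root is $\beta = \log 3$.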

To establish the asymptotic properties of $\hat{\beta }$ and $\hat{\eta }$, define

$$\displaystyle\begin{array}{rcl} & \widehat{M}_{i}^{\ast}(t) =\widetilde{N}_{i}(t) -\int _{0}^{t}e^{\hat{\eta }'\mathbf{X}_{i}(s)}d\widehat{\varLambda }_{1}^{\ast}(s;\hat{\eta }), & \\ & \widehat{M}_{i}(t) =\int _{0}^{t}Y _{i}(s)d\widetilde{N}_{i}(s) -\int _{0}^{t}e^{\hat{\beta }'\mathbf{Z}_{i}+\hat{\eta }'\mathbf{X}_{i}(s)}d\widehat{\varLambda }_{2}^{\ast}(s;\hat{\beta },\hat{\eta }), & \\ & \widehat{E}_{Z}(t;\beta,\eta ) = \frac{\sum _{i=1}^{n}\mathbf{Z}_{i}e^{\beta '\mathbf{Z}_{i}+\eta '\mathbf{X}_{i}(t)}} {\sum _{i=1}^{n}e^{\beta '\mathbf{Z}_{i}+\eta '\mathbf{X}_{i}(t)}} \mbox{ and }e_{z}(t) =\lim _{n\rightarrow \infty }\widehat{E}_{Z}(t;\beta _{0},\eta _{0}).& \end{array}$$

The following theorem gives the consistency and asymptotic normality of $\hat{\beta }$ and $\hat{\eta }$.

Theorem 1.

Assume that the conditions (C1)–(C5) given in the Appendix hold. Then $\hat{\eta }$ and $\hat{\beta }$ are consistent estimators of $\eta_0$ and $\beta_0$, respectively. The distributions of $n^{1/2}(\hat{\eta }-\eta _{0})$ and $n^{1/2}(\hat{\beta }-\beta _{0})$ can be approximated asymptotically by normal distributions with mean zero and covariance matrices $\widehat{\varSigma }_{\eta } =\widehat{\varOmega }_{\eta }^{-1}\widehat{\varPsi }\widehat{\varOmega }_{\eta }^{-1}$ and $\widehat{\varSigma }_{\beta } =\widehat{A}_{\beta }^{-1}\widehat{\varSigma }\widehat{A}_{\beta }^{-1}$, respectively, where $a^{\otimes 2} = aa'$, $\widehat{\varPsi }= n^{-1}\sum _{i=1}^{n}\hat{u}_{i}^{\otimes 2}$, $\widehat{\varSigma }= n^{-1}\sum _{i=1}^{n}(\hat{v}_{1i} -\hat{v}_{2i})^{\otimes 2}$,

$$\displaystyle\begin{array}{rcl} \hat{u}_{i}& =& \int _{0}^{\tau }\Big(\mathbf{X}_{ i}(t) -\bar{ X}(t;\hat{\eta })\Big)d\widehat{M}_{i}^{{\ast}}(t)\,, {}\\ \hat{v}_{1i}& =& \int _{0}^{\tau }W(t)\Big(\mathbf{Z}_{ i} -\widehat{ E}_{Z}(t;\hat{\beta },\hat{\eta })\Big)d\widehat{M}_{i}(t)\,, {}\\ \hat{v}_{2i}& =& \int _{0}^{\tau }\widehat{A}_{\eta }\widehat{\varOmega }_{\eta }^{-1}\Big(\mathbf{X}_{ i}(t) -\bar{ X}(t;\hat{\eta })\Big)d\widehat{M}_{i}^{{\ast}}(t)\,, {}\\ \widehat{A}_{\beta }& =& n^{-1}\sum _{ i=1}^{n}\int _{ 0}^{\tau }W(t)e^{\hat{\beta }'\mathbf{Z}_{i}+\hat{\eta }'\mathbf{X}_{i}(t)}\Big(\mathbf{Z}_{ i} -\widehat{ E}_{Z}(t;\hat{\beta },\hat{\eta })\Big)^{\otimes 2}d\widehat{\varLambda }_{ 2}^{{\ast}}(t;\hat{\beta },\hat{\eta }), {}\\ \widehat{A}_{\eta }& =& n^{-1}\sum _{ i=1}^{n}\int _{ 0}^{\tau }W(t)e^{\hat{\beta }'\mathbf{Z}_{i}+\hat{\eta }'\mathbf{X}_{i}(t)}\Big(\mathbf{Z}_{ i} -\widehat{ E}_{Z}(t;\hat{\beta },\hat{\eta })\Big)X'_{i}(t)d\widehat{\varLambda }_{2}^{{\ast}}(t;\hat{\beta },\hat{\eta }) {}\\ \end{array}$$

and

$$\displaystyle{\widehat{\varOmega }_{\eta } = n^{-1}\sum _{ i=1}^{n}\int _{ 0}^{\tau }\{\mathbf{X}_{ i}(t) -\bar{ X}(t;\hat{\eta })\}^{\otimes 2}e^{\hat{\eta }'\mathbf{X}_{i}(t)}d\widehat{\varLambda }_{ 1}^{{\ast}}(t;\hat{\eta }).}$$
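The sandwich form in Theorem 1 can be illustrated in the scalar case, where $\widehat{\varSigma }_{\beta } =\widehat{\varSigma }/\widehat{A}_{\beta }^{2}$ and the variance of $\hat{\beta }$ is approximated by $\widehat{\varSigma }_{\beta }/n$. The Python sketch below assumes that $\widehat{A}_{\beta }$ and the per-subject terms $\hat{v}_{1i} -\hat{v}_{2i}$ have already been computed and are passed in as `a_hat` and `influence`; these names, the function names and the numbers are hypothetical.

```python
import math


def sandwich_variance(a_hat, influence):
    """Scalar A^{-1} Sigma A^{-1}, with Sigma the mean of squared influence terms."""
    n = len(influence)
    sigma_hat = sum(v * v for v in influence) / n
    return sigma_hat / (a_hat * a_hat)


def wald_interval(estimate, a_hat, influence, z_crit=1.96):
    """Approximate 95% interval, using Var(estimate) ~ sandwich variance / n."""
    n = len(influence)
    se = math.sqrt(sandwich_variance(a_hat, influence) / n)
    return estimate - z_crit * se, estimate + z_crit * se


influence = [1.0, -1.0, 1.0, -1.0]  # hypothetical values of v1_i - v2_i
print(sandwich_variance(2.0, influence))
print(wald_interval(0.5, 2.0, influence))
```

Intervals of this type are what the empirical coverage probabilities in the simulation study would be based on, assuming the usual normal approximation.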

4 A Simulation Study

In this section, we report results from a simulation study conducted to assess the finite-sample behavior of the estimation procedure proposed in the previous sections. For each subject $i$, the covariate $Z_i$ was assumed to be a Bernoulli random variable with success probability 0.5. Given $Z_i$ and the unobserved random effects $\mathbf{b}_i(t) = (b_{1i}(t), b_{2i}(t), b_{3i}(t))'$, the hazard function of the censoring time $C_i$ was assumed to have the form

$$\displaystyle{ \lambda _{i}(t\vert \mathbf{Z}_{i},\mathcal{B}_{it}) =\lambda _{0} +\xi \mathbf{Z}_{i} + b_{3i}(t), }$$

(10)

with the largest follow-up time $\tau = 1$. The number of observations $\widetilde{N}_{i}(t)$ was assumed to follow a Poisson process on $(0, C_i)$ with the mean function

$$\displaystyle{ E\{N_{i}(t)\vert \mathbf{Z}_{i},\mathcal{B}_{it}\} =\int _{ 0}^{t}\exp \{\gamma \mathbf{Z}_{ i} + b_{2i}(s)\}d\mu _{0}(s)\,. }$$

(11)

In practice, the exact value of $C_i$ may not be observable, and $d\widetilde{N}_{i}(t)$ is observed instead of $dN_i(t)$. We therefore considered $E\{\widetilde{N}_{i}(t)\vert \mathcal{B}_{it}\}$ for the observation process. From (10) and (11),

$$\displaystyle{E\{d\widetilde{N}_{i}(t)\vert \mathbf{Z}_{i},\mathcal{B}_{it}\} =\exp \{\gamma \mathbf{Z}_{i} -\xi \mathbf{Z}_{i}t\}d\varLambda _{1}^{{\ast}}(t),}$$

where $d\varLambda_1^{\ast}(t) =\exp \{ -\lambda_0 t + b_{2i}(t) - B_i(t)\}d\mu_0(t)$ and $B_i(t) =\int_0^t b_{3i}(s)ds$. Given $Z_i$ and $\mathcal{B}_{it}$, $\widetilde{N}_{i}(t)$ was assumed to follow a nonhomogeneous Poisson process, and the total number of observation times $m_i$ was generated with mean $E\{m_{i}\} = E\{\widetilde{N}_{i}(\tau )\vert Z_{i},\mathcal{B}_{i\tau }\}$. Then the observation times $\{T_{i,1},\ldots,T_{i,m_{i}}\}$ were taken as $m_i$ order statistics from the density function

$$\displaystyle{f_{\widetilde{N}}(t) = \frac{\exp \{\gamma \mathbf{Z}_{i} -\xi \mathbf{Z}_{i}t\}d\varLambda _{1}^{{\ast}}(t)} {\int _{0}^{\tau }\exp \{\gamma \mathbf{Z}_{i} -\xi \mathbf{Z}_{i}t\}d\varLambda _{1}^{{\ast}}(t)}.}$$
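In the time-independent case $b_{2i}(t) = b_{3i}(t) = b_i$ with $\mu_0(t) = 20t$, the density above is proportional to $\exp \{-(\lambda_0 +\xi Z_i + b_i)t\}$ on $[0,\tau]$, so the observation times can be drawn by inverting the distribution function. The following Python sketch shows this step only; the number of times `m` is taken as given, and the function name, the seed and the parameter values other than $\lambda_0 = 2$ and $\tau = 1$ are hypothetical.

```python
import math
import random


def sample_observation_times(m, a, tau, rng):
    """Draw m sorted times from the density proportional to exp(-a*t) on [0, tau]."""
    if a == 0:
        return sorted(rng.uniform(0.0, tau) for _ in range(m))
    scale = 1.0 - math.exp(-a * tau)
    # Inverse-CDF step: F(t) = (1 - exp(-a*t)) / (1 - exp(-a*tau)).
    return sorted(-math.log(1.0 - rng.random() * scale) / a for _ in range(m))


rng = random.Random(2016)                       # hypothetical seed
lambda0, xi, z, b, tau = 2.0, 0.5, 1, 0.2, 1.0  # xi, z and b are hypothetical
times = sample_observation_times(5, lambda0 + xi * z + b, tau, rng)
print(times)
```

For time-dependent random effects the density no longer has this closed form, and a numerical inversion or rejection step would be needed instead.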

The longitudinal response Y _i(t) was generated from a mixed Poisson process with the mean function

$$\displaystyle{ E\{Y _{i}(t)\vert \mathbf{Z}_{i},\mathcal{B}_{it}\} = Q_{i}\varLambda _{0}(t)\exp \{\beta \mathbf{Z}_{i} + b_{1i}(t)\}, }$$

(12)

where $Q_i$ was generated independently from a gamma distribution with mean 1 and variance 0.5. The results given below are based on sample sizes of 100 and 200 with 1000 replications and $W(t) = W_i = 1$.
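The response-generation step can be sketched in Python as follows. The sketch interprets the mixed Poisson process as a Poisson process with independent increments given $Q_i$, uses the sign convention of model (1) for the covariate effect, and assumes $\varLambda_0(t) = 5t$ with a time-independent $b_{1i}$. These are assumptions made for illustration; the function names, the seed and the parameter values are hypothetical.

```python
import math
import random


def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_responses(times, z, b1, beta, rng, slope=5.0):
    """Responses at sorted times, assuming Lambda0(t) = slope * t and constant b1."""
    q = rng.gammavariate(2.0, 0.5)  # gamma with mean 1 and variance 0.5
    level = q * math.exp(beta * z + b1)
    y, prev, out = 0, 0.0, []
    for t in times:
        y += poisson_draw(level * slope * (t - prev), rng)  # independent increment
        prev = t
        out.append(y)
    return out


rng = random.Random(2016)  # hypothetical seed
print(simulate_responses([0.2, 0.5, 0.9], z=1, b1=0.1, beta=0.5, rng=rng))
```

The gamma factor $Q_i$ is shared by all responses of a subject, which induces extra within-subject correlation beyond that carried by $b_{1i}$.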

Table 1 shows the estimation results on $\beta$ for the situation in which $b_{1i}$, $b_{2i}$ and $b_{3i}$ are time-independent. Here $\xi_0 = 0$ and $\gamma_0 = 0$ represent the cases in which the censoring times and the observation times, respectively, are independent of the covariates. For the random effects, we took $b_{1i} = b_{2i} = b_{3i} = b_i$, where the $b_i$'s were generated from the uniform distribution over $(-0.5, 0.5)$. It can be seen that the proposed estimates seem unbiased and that the estimated standard errors (SEE) are close to the sample standard errors (SSE). The empirical 95% coverage probabilities (CP) are also quite accurate. The same conclusions hold for the situation in which $b_{1i}$, $b_{2i}$ and $b_{3i}$ are time-dependent, for which the results are presented in Table 2. Here we took $b_{1i}(t) = b_i t^{1/3}$, $b_{2i}(t) = b_i t^{1/2}$ and $b_{3i}(t) = b_i$, with the $b_i$'s generated as for Table 1. We also considered other set-ups, such as different baseline functions and a continuous covariate $Z_i$, and obtained similar results.
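The summary measures reported in the tables can be computed from the replications along the following lines. This Python sketch assumes the usual definitions: SSE is the sample standard deviation of the point estimates, SEE is the average of the estimated standard errors, and CP is the proportion of normal-approximation 95% intervals that cover the true value. The function name and the toy inputs are hypothetical.

```python
import statistics


def summarize(estimates, std_errors, truth, z_crit=1.96):
    """Bias, SSE, SEE and empirical coverage over simulation replications."""
    bias = statistics.fmean(estimates) - truth
    sse = statistics.stdev(estimates)    # sample standard error of the estimates
    see = statistics.fmean(std_errors)   # mean of the estimated standard errors
    covered = [abs(est - truth) <= z_crit * se
               for est, se in zip(estimates, std_errors)]
    cp = sum(covered) / len(covered)
    return {"bias": bias, "SSE": sse, "SEE": see, "CP": cp}


# Hypothetical replications of (estimate, estimated standard error).
estimates = [0.9, 1.1, 1.0, 1.0]
std_errors = [0.1, 0.1, 0.1, 0.1]
print(summarize(estimates, std_errors, truth=1.0))
```

Agreement between SSE and SEE indicates that the variance estimator is tracking the actual sampling variability of the point estimator.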

Table 1 Estimation results with $\lambda_0 = 2$, $\mu_0(t) = 20t$, $\varLambda_0(t) = 5t$, $b_{1i} = b_{2i} = b_{3i}$

Table 2 Estimation results with $\lambda_0 = 2$, $\mu_0(t) = 20t$, $\varLambda_0(t) = 5t$, $b_{1i}(t) = b_i t^{1/3}$, $b_{2i}(t) = b_{i}\sqrt{t}$ and $b_{3i}(t) = b_i$

To further investigate the performance of the proposed estimator of $\beta$ in comparison with those proposed by He et al. (2009) and Sun et al. (2012), we carried out a simulation study and estimated $\beta$ using all three methods. Note that, unlike the proposed estimation procedure, the latter two methods require observing the exact time of a censoring or terminal event $C_i$. For this, we used the subjects' last observation times, as is commonly done in practice. With respect to the method given by Sun et al. (2012), we applied it by using $C_i$ as its original terminal event time $D_i$ and $\tau$ as its $C_i$. As mentioned earlier, both He et al. (2009) and Sun et al. (2012) considered distribution-based random effects for possible correlations. For the comparison, we focus on the performance of their procedures when the random effects follow various distributions besides those assumed. However, since both of them involve covariate effects in forms different from those in our proposed models, we fix $\beta_0 = 0$, $\gamma_0 = 0$ and $\xi_0 = 0$ in order to avoid unfair comparisons caused by the misspecification of covariate effects. The estimation results are given in Table 3 for three set-ups. In the first set-up, referred to as $M_1$, we considered the situation used for Table 1 except that $\mu_0(t) = 10t$ and $b_{1i} = -b_{2i} = b_{3i}$. In the second and third set-ups, called $M_2$ and $M_3$, we generated $b_{1i}(t)$, $b_{2i}(t)$ and $b_{3i}(t)$ from various distributions such that the assumptions required by either Sun et al. (2012) or He et al. (2009) are satisfied. For example, we took $\lambda_0(t) = 0$ and generated $b_{3i}(t)$ from an extreme-value distribution as assumed by Sun et al. (2012). We also generated $b_{1i}(t)$, $b_{2i}(t)$ and $b_{3i}(t)$ from the distributions assumed by He et al. (2009).

Table 3 Estimation results on $\beta$ based on the proposed procedure and the procedures given in Sun et al. (2012) and He et al. (2009) with $\beta_0 = \xi_0 = \gamma_0 = 0$

Note that in all set-ups considered above, our proposed models are correctly specified because no distributions are assumed for $b_{1i}(t)$, $b_{2i}(t)$ or $b_{3i}(t)$. In contrast, the models of He et al. (2009) and Sun et al. (2012) are each correctly specified in only one of the set-ups. On the other hand, since there are no covariate effects in any of the set-ups, we do not expect the point estimates of $\beta$ given by He et al. (2009) or Sun et al. (2012) to be much biased even if the imposed distributions are misspecified in the estimation. For their variance estimates, we expect SEE and SSE to agree for both methods, because the former applied bootstrap resampling and the latter did not involve any assumed distribution of the random effects in its variance estimation. Therefore, we compare only the bias and SSE. It can be seen that all estimation procedures gave comparably small bias, as expected. However, the proposed estimators appear to be more efficient in general. In comparison, the method given by He et al. (2009) is comparably efficient to the proposed estimators only under $M_3$, when all its distributional assumptions are satisfied. For the method given by Sun et al. (2012), it is worth noting that when $D_i$ is substituted by the last observation time $C_i$ from subject $i$, it gives a relatively large SSE, especially when the $C_i$'s vary considerably, regardless of whether the assumption about $b_{3i}(t)$ is satisfied (for $M_2$) or not (for $M_3$).

5 Concluding Remarks

We proposed a joint model for analyzing longitudinal data with informative censoring and observation times. The mutual correlations are characterized via a shared vector of time-dependent random effects. As mentioned earlier, several procedures have been developed in the literature for longitudinal data when either the censoring or the observation process is informative. However, when both are informative, few methods apply apart from those given in He et al. (2009) and Sun et al. (2012). In addition, all existing procedures assume time-independent or specifically distributed correlation structures. The proposed joint model is flexible in that the shared vector of random effects can be time-dependent and neither its structure nor its distribution is specified. For parameter estimation, the proposed procedure is simple and easy to implement.

There are several directions for future research. First, as mentioned above, one may want to consider models other than (1)–(3) and develop similar estimation procedures. A related problem is model selection, and one may want to develop techniques for choosing the optimal model among several candidate models (Tong et al. 2009; Wang et al. 2014). Second, the proposed method employs a weight function $W(t)$, and it would be desirable to develop procedures for selecting an optimal $W(t)$. As in most similar situations, this is a difficult problem because it requires specifying the covariance function of $Y_i(t)$ and $\widetilde{N}_{i}(t)$ (Sun et al. 2012). Third, we have focused on regression analysis of $Y_i(t)$ with time-independent covariates. In some applications the covariates are time-dependent, and it would be helpful to generalize the proposed method to that situation. Nonparametric estimation of $Y_i(t)$ or of the baseline functions may also be of interest. For those purposes, some constraints, for example $E\{\mathbf{b}_i(t)\} = 0$, should be imposed on $\mathbf{b}_i(t)$ for identifiability. Finally, when panel count data arise (Sun and Zhao 2013), generalizing existing nonparametric estimation procedures to cases with informative observation or censoring times is a challenging direction for future work.

References

Cheng, S. C., & Wei, L. J. (2000). Inferences for a semiparametric model with panel data. Biometrika, 87, 89–97.
Article MathSciNet MATH Google Scholar
Gandy, A., & Jensen, U. (2005). Checking a semiparametric additive hazards model. Lifetime Data Analysis, 11, 451–472.
Article MathSciNet MATH Google Scholar
Ghosh, D. (2003). Goodness-of-fit methods for additive-risk models in tumorignenicity experiments. Biometrics, 59, 721–726.
Article MathSciNet MATH Google Scholar
He, X., Tong, X., & Sun, J. (2009). Semiparametric analysis of panel count data with correlated observation and follow-up times. Lifetime Data Analysis, 15, 177–196.
Article MathSciNet MATH Google Scholar
Hu, X. J., Sun J., & Wei, L. J. (2003). Regression parameter estimation from panel counts. Scandinavian Journal of Statistics, 30, 25–43.
Article MathSciNet MATH Google Scholar
Huang, C. Y., Wang, M. C., & Zhang, Y. (2006). Analysing panel count data with informative observation times. Biometrika, 93, 763–775.
Article MathSciNet MATH Google Scholar
Kalbfleisch, J. D., & Prentice, R. L. (2002). The statistical analysis of failure time data. New York: Wiley.
Book MATH Google Scholar
Kim, J., & Lee, S.Y. (1998). Two-sample goodness-of-fit tests for additive risk models with censored observations. Biometrika, 85, 593–603.
Article MathSciNet MATH Google Scholar
Kim, S., Zeng, D., Chambless, L., & Li, Y. (2012). Joint models of longitudinal data and recurrent events with informative terminal event. Statistics in Biosciences, 4, 262–281.
Article Google Scholar
Li, N., Zhao, H., & Sun, J. (2013). Semiparametric transformation models for panel count data with correlated observation and follow-up times. Statistics in Medicine, 32(17), 3039–3054.
Article MathSciNet Google Scholar
Lin, D. Y., Oaks, D., & Ying, Z. (1998). Additive hazards regression with current status data. Biometrika, 85(2), 289–298.
Article MathSciNet MATH Google Scholar
Lin, D. Y., & Ying, Z. (2001). Semiparametric and Nonparametric Regression Analysis of Longitudinal Data (with discussion). Journal of the American Statistical Association, 96(453), 103–113.
Article MathSciNet MATH Google Scholar
Sun, J., & Kalbfleisch, J. D. (1995). Estimation of the mean function of point processes based on panel count data. Statistica Sinica, 5, 279–289.
MathSciNet MATH Google Scholar
Sun, J., Tong, X., & He, X. (2007). Regression analysis of panel count data with dependent observation times. Biometrics, 63, 1053–1059.
Article MathSciNet MATH Google Scholar
Sun, J., & Wei, L. J. (2000). Regression analysis of panel count data with covariate-dependent observation and censoring times. Journal of the Royal Statistical Society, Series B, 62, 293–302.
Article MathSciNet Google Scholar
Sun, J., & Zhao, X., (2013). The statistical analysis of panel count data. New York: Springer.
Book MATH Google Scholar
Sun, L., Song, X., Zhou, J., & Liu, L. (2012). Joint analysis of longitudinal data with informative observation times and a dependent terminal event. Journal of the American Statistical Association, 107(498), 688–700.
Article MathSciNet MATH Google Scholar
Tong, X., Sun, L., He, X., & Sun, J. (2009). Variable selection for panel count data via non-concave penalized estimating function. Scandinavian Journal of Statistics, 36, 620–635.
Article MathSciNet MATH Google Scholar
Wang, H., Li, Y., & Sun, J. (2014). Focused and model average estimation for regression analysis of panel count data. Scandinavian Journal of Statistics. doi:10.1002/sjos.12133.
MATH Google Scholar
Wellner, J. A., & Zhang, Y. (2000). Two estimators of the mean of a counting process with panel count data. Annals of Statistics, 28, 779–814.
Article MathSciNet MATH Google Scholar
Wellner, J. A., & Zhang, Y. (2007). Two likelihood-based semiparametric estimation methods for panel count data with covariates. Annals of Statistics, 35, 2106–2142.
Article MathSciNet MATH Google Scholar
Yuen, K.C., & Burke, M.D. (1997). A test of fit for a semiparametric additive risk model. Biometrika, 84, 631–639.
Article MathSciNet MATH Google Scholar
Zhang, Y. (2002). A semiparametric pseudolikelihood estimation method for panel count data. Biometrika, 89, 39–48.
Article MathSciNet MATH Google Scholar
Zhang, Z., Sun, J., & Sun, L. (2005). Statistical analysis of current status data with informative observation times. Statistics in Medicine, 24, 1399–1407.
Article MathSciNet Google Scholar
Zhao, X., & Tong, X. (2011). Semiparametric regression analysis of panel count data with informative observation times. Computational Statistics and Data Analysis, 55(1), 291–300.
Article MathSciNet MATH Google Scholar
Zhao, X., Tong, X., & Sun, J. (2013). Robust estimation for panel count data with informative observation times. Computational Statistics and Data Analysis, 57, 33–40.
Zhou, J., Zhao, X., & Sun, L. (2013). A new inference approach for joint models of longitudinal data with informative observation and censoring times. Statistica Sinica, 23, 571–593.

Author information

Authors and Affiliations

Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC, USA
Yang Li
Department of Epidemiology and Biostatistics, University of Maryland, College Park, MD, USA
Xin He
Department of Mathematics and Statistics, University of New Hampshire, Durham, NH, USA
Haiying Wang
Department of Statistics, University of Missouri, Columbia, MO, USA
Jianguo Sun

Corresponding author

Correspondence to Yang Li.

Editor information

Editors and Affiliations

Mailman School of Public Health Department of Biostatistics, Columbia University, New York, New York, USA
Zhezhen Jin
Division of Biostatistics, NYU School of Medicine, New York, New York, USA
Mengling Liu
Biostatistics Celgene Corporation, Summit, New Jersey, USA
Xiaolong Luo

Appendix

1.1 Proof of Theorem 1

To derive the asymptotic properties of the proposed estimators $\hat{\beta }$ and $\hat{\eta }$, we need the following regularity conditions:

(C1) $\{\widetilde{N}_{i}(\cdot ),Y _{i}(\cdot ),C_{i},\mathbf{Z}_{i}\}_{i=1}^{n}$ are independent and identically distributed.
(C2) There exists a $\tau > 0$ such that $P(C_{i} \geq \tau ) > 0$.
(C3) Both $\widetilde{N}_{i}(t)$ and $Y _{i}(t)$ ($0 \leq t \leq \tau$, $i = 1,\ldots,n$) are bounded.
(C4) $W(t)$ and $\mathbf{Z}_{i}$, $i = 1,\ldots,n$, have bounded variations, and $W(t)$ converges almost surely to a deterministic function $w(t)$ uniformly in $t \in [0,\tau ]$.
(C5) $A_{\beta } = E\{\int _{0}^{\tau }w(t)e^{\beta '_{0}\mathbf{Z}_{i}+\eta _{0}'\mathbf{X}_{i}(t)}[\mathbf{Z}_{i} - e_{z}(t)]^{\otimes 2}d\varLambda _{2}^{{\ast}}(t)\}$ and $\varOmega _{\eta } = E\Big[\int _{0}^{\tau }\big\{\mathbf{X}_{i}(t) -\bar{ x}(t)\big\}^{\otimes 2}e^{\eta _{0}'\mathbf{X}_{i}(t)}d\varLambda _{1}^{{\ast}}(t)\Big]$ are both positive definite.

Under condition (C2), we define

$$\displaystyle{U_{1}(\beta;\hat{\eta }) =\sum _{ i=1}^{n}\int _{ 0}^{\tau }W(t)\mathbf{Z}_{ i}\bigg[Y _{i}(t)d\widetilde{N}_{i}(t) - e^{\beta '\mathbf{Z}_{i}+\hat{\eta }'\mathbf{X}_{i}(t)}d\widehat{\varLambda }_{ 2}^{{\ast}}(t)\bigg],}$$

which is integrable under conditions (C3) and (C4). Also note that $d\widehat{\varLambda }_{2}^{{\ast}}(t)$ satisfies

$$\displaystyle{ \sum _{i=1}^{n}\bigg[Y _{ i}(t)d\widetilde{N}_{i}(t) - e^{\beta '\mathbf{Z}_{i}+\hat{\eta }'\mathbf{X}_{i}(t)}d\widehat{\varLambda }_{ 2}^{{\ast}}(t)\bigg] = 0,\;\,0 \leq t \leq \tau. }$$

(13)
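
As a small worked step, taking (13) exactly as displayed: since the increment $d\widehat{\varLambda }_{2}^{{\ast}}(t)$ does not depend on $i$, it can be factored out of the sum and solved for at each fixed $t$. Making the dependence on the parameters explicit, this gives

$$\displaystyle{d\widehat{\varLambda }_{2}^{{\ast}}(t;\beta,\hat{\eta }) = \frac{\sum _{i=1}^{n}Y _{i}(t)d\widetilde{N}_{i}(t)}{\sum _{i=1}^{n}e^{\beta '\mathbf{Z}_{i}+\hat{\eta }'\mathbf{X}_{i}(t)}},\;\,0 \leq t \leq \tau.}$$

The denominator is a sum of exponentials and hence strictly positive, so the ratio is always well defined. Substituting it back shows that $U_{1}(\beta;\hat{\eta })$ depends on the data and on $(\beta,\hat{\eta })$ only.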

Let

$$\displaystyle{\widehat{A}_{\beta }(\beta ) = -n^{-1}\partial U_{ 1}(\beta;\hat{\eta })/\partial \beta ',\quad \widehat{A}_{\eta }(\eta ) = -n^{-1}\partial U_{ 1}(\beta _{0};\eta )/\partial \eta ',}$$

and under (C1), let

$$\displaystyle{A_{\beta } =\lim _{n\rightarrow \infty }\widehat{A}_{\beta }(\beta _{0}),\;A_{\eta } =\lim _{n\rightarrow \infty }\widehat{A}_{\eta }(\eta _{0}).}$$

The consistency of $\hat{\beta }$ and $\hat{\eta }$ follows from two facts. First, $n^{-1}U_{1}(\beta _{0};\hat{\eta })$ and $n^{-1}U_{\eta }(\eta _{0})$ both tend to 0 in probability as $n \rightarrow \infty$. Second, under condition (C5), $\widehat{A}_{\beta }(\beta )$ and $-n^{-1}\partial U_{\eta }(\eta )/\partial \eta '$ converge uniformly to the positive definite matrices $A_{\beta }$ and $\varOmega _{\eta }$ over $\beta$ and $\eta$, respectively, in neighborhoods of the true values $\beta _{0}$ and $\eta _{0}$. The Taylor series expansions of $U_{1}(\hat{\beta };\hat{\eta })$ at $(\beta _{0};\hat{\eta })$ and $(\beta _{0};\eta _{0})$ then yield

$$\displaystyle{n^{1/2}(\hat{\beta }-\beta _{0}) = A_{\beta }^{-1}n^{-1/2}U_{1}(\beta _{0};\hat{\eta }) + o_{p}(1) = A_{\beta }^{-1}\Big\{n^{-1/2}U_{1}(\beta _{0};\eta _{0}) -A_{\eta }n^{1/2}(\hat{\eta }-\eta _{0})\Big\} + o_{p}(1).}$$

The proof of Theorem 1 is sketched as follows:

(1) First, differentiating $U_{1}(\beta;\hat{\eta })$ with respect to $\beta$ and using (13), we obtain
$$\displaystyle{\widehat{A}_{\beta }(\beta ) = n^{-1}\sum _{ i=1}^{n}\int _{ 0}^{\tau }W(t)\big\{\mathbf{Z}_{ i} -\widehat{ E}_{Z}(t;\beta,\hat{\eta })\big\}^{\otimes 2}e^{\beta '\mathbf{Z}_{i}+\hat{\eta }'\mathbf{X}_{i}(t)}d\widehat{\varLambda }_{ 2}^{{\ast}}(t;\beta,\hat{\eta }).}$$
(2) Solving (13) for $d\widehat{\varLambda }_{2}^{{\ast}}(t;\beta _{0},\eta _{0})$ and substituting the result into $U_{1}(\beta _{0};\eta _{0})$ yields
$$\displaystyle{U_{1}(\beta _{0};\eta _{0}) =\sum _{ i=1}^{n}\int _{ 0}^{\tau }w(t)\Big(\mathbf{Z}_{ i} - e_{z}(t)\Big)dM_{i}(t) + o_{p}(n^{1/2}),}$$
where $e_{z}(t) =\lim _{n\rightarrow \infty }\widehat{E}_{Z}(t;\beta _{0},\eta _{0})$, as defined earlier in Sect. 3, and $w(t)$ is the deterministic limit of $W(t)$ introduced in (C4).
(3) Differentiating $U_{1}(\beta _{0};\eta )$ and (13) with respect to $\eta$ yields
$$\displaystyle{\widehat{A}_{\eta }(\eta ) = n^{-1}\sum _{ i=1}^{n}\int _{ 0}^{\tau }W(t)\big[\mathbf{Z}_{ i} -\widehat{ E}_{Z}(t;\beta _{0},\eta )\big]e^{\beta _{0}'\mathbf{Z}_{i}+\eta '\mathbf{X}_{i}(t)}\mathbf{X}'_{ i}(t)d\widehat{\varLambda }_{2}^{{\ast}}(t;\beta _{ 0},\eta )\,.}$$
(4) By Eq. (5) and the asymptotic results in Lin et al. (2000) (A.5), one can show that
$$\displaystyle{n^{1/2}\{\hat{\eta } -\eta _{ 0}\} =\varOmega _{ \eta }^{-1}n^{-1/2}\sum _{ i=1}^{n}\bigg[\int _{ 0}^{\tau }\Big(\mathbf{X}_{ i}(t) -\frac{s^{(1)}(t)} {s^{(0)}(t)}\Big)dM_{i}^{{\ast}}(t)\bigg] + o_{ p}(1),}$$
where $\varOmega _{\eta } = E\Big[\int _{0}^{\tau }\big\{\mathbf{X}_{i}(t) -\bar{ x}(t)\big\}^{\otimes 2}e^{\eta _{0}'\mathbf{X}_{i}(t)}d\varLambda _{1}^{{\ast}}(t)\Big]$, which is invertible under (C5).
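
The identity in step (1) can be checked numerically on a discretized toy problem. The sketch below is illustrative only and is not the authors' implementation. The finite time grid, the array names (`Z`, `X`, `R`, `W`), the simulated data, and the explicit form of $\widehat{E}_{Z}$ as an exponentially weighted average of the $\mathbf{Z}_{i}$ are all assumptions made for this example; `R[i, k]` stands in for the increment $Y _{i}(t_{k})d\widetilde{N}_{i}(t_{k})$. It compares the closed form of $\widehat{A}_{\beta }(\beta )$ with a finite-difference derivative of the discretized $U_{1}$.

```python
import numpy as np


def risk_scores(beta, eta, Z, X):
    """exp(beta'Z_i + eta'X_i(t_k)) as an (n, K) array; Z is (n, p), X is (n, K, q)."""
    return np.exp((Z @ beta)[:, None] + X @ eta)


def baseline_increments(beta, eta, Z, X, R):
    """Discrete analogue of solving (13) for the baseline increment at each t_k."""
    e = risk_scores(beta, eta, Z, X)
    return R.sum(axis=0) / e.sum(axis=0)


def U1(beta, eta, Z, X, R, W):
    """Discretized estimating function U_1(beta; eta), a vector of length p."""
    e = risk_scores(beta, eta, Z, X)
    d_lam = baseline_increments(beta, eta, Z, X, R)
    resid = R - e * d_lam        # (n, K) residual increments
    return Z.T @ (resid @ W)     # weight over the time grid, then sum Z_i * (...)


def A_hat_beta(beta, eta, Z, X, R, W):
    """Closed form from step (1): weighted second moment of Z_i - E_Z(t_k)."""
    n = Z.shape[0]
    e = risk_scores(beta, eta, Z, X)
    d_lam = baseline_increments(beta, eta, Z, X, R)
    E_Z = (e.T @ Z) / e.sum(axis=0)[:, None]      # (K, p) weighted means (assumed form)
    centered = Z[:, None, :] - E_Z[None, :, :]    # (n, K, p)
    weights = e * (W * d_lam)                     # (n, K)
    return np.einsum("ik,ika,ikb->ab", weights, centered, centered) / n


def numerical_A_hat_beta(beta, eta, Z, X, R, W, h=1e-6):
    """-n^{-1} dU_1/dbeta' approximated by central finite differences."""
    n, p = Z.shape
    cols = [
        (U1(beta + h * d, eta, Z, X, R, W) - U1(beta - h * d, eta, Z, X, R, W)) / (2 * h)
        for d in np.eye(p)
    ]
    return -np.column_stack(cols) / n


def make_toy_data(seed=0, n=40, K=6, p=2, q=1):
    """Hypothetical simulated inputs; not the simulation design of Sect. 4."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, p))
    X = rng.normal(size=(n, K, q))
    R = rng.poisson(1.0, size=(n, K)).astype(float)
    W = np.ones(K)
    return Z, X, R, W


Z, X, R, W = make_toy_data()
beta, eta = np.array([0.3, -0.2]), np.array([0.1])
gap = np.abs(A_hat_beta(beta, eta, Z, X, R, W)
             - numerical_A_hat_beta(beta, eta, Z, X, R, W)).max()
print(f"max abs difference between closed form and finite differences: {gap:.2e}")
```

In this toy setting the increments and weights are nonnegative, so the closed form is a nonnegatively weighted sum of outer products and is therefore positive semidefinite; with real-valued responses the increments can be negative, which is why positive definiteness is imposed on the limit $A_{\beta }$ through (C5) rather than obtained automatically.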

Combining the results in steps (1)–(4), we have

$$\begin{aligned} U_{1}(\beta _{0};\hat{\eta }) &=\sum _{i=1}^{n}\bigg[\int _{0}^{\tau }w(t)\big\{\mathbf{Z}_{i} - e_{z}(t)\big\}dM_{i}(t)\bigg] \\ &\quad -A_{\eta }\varOmega _{\eta }^{-1}\sum _{i=1}^{n}\bigg[\int _{0}^{\tau }\big\{\mathbf{X}_{i}(t) -\bar{ x}(t)\big\}dM_{i}^{{\ast}}(t)\bigg] + o_{p}(n^{1/2}). \end{aligned}$$

Since $A_{\beta }$ is also invertible under (C5), it then follows from the multivariate central limit theorem that the conclusions hold.
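
To make the final step explicit, the expansion of $U_{1}(\beta _{0};\hat{\eta })$ can be substituted into $n^{1/2}(\hat{\beta }-\beta _{0}) = A_{\beta }^{-1}n^{-1/2}U_{1}(\beta _{0};\hat{\eta }) + o_{p}(1)$. The symbols $\xi _{i}$ and $\varSigma _{\xi }$ below are introduced here only for illustration and may differ from the notation used in the statement of Theorem 1; the sketch also assumes that $M_{i}(t)$ and $M_{i}^{{\ast}}(t)$ are the zero-mean processes defined earlier in the chapter.

$$\begin{aligned} n^{1/2}(\hat{\beta }-\beta _{0}) &= A_{\beta }^{-1}n^{-1/2}\sum _{i=1}^{n}\xi _{i} + o_{p}(1), \\ \xi _{i} &=\int _{0}^{\tau }w(t)\big\{\mathbf{Z}_{i} - e_{z}(t)\big\}dM_{i}(t) - A_{\eta }\varOmega _{\eta }^{-1}\int _{0}^{\tau }\big\{\mathbf{X}_{i}(t) -\bar{ x}(t)\big\}dM_{i}^{{\ast}}(t). \end{aligned}$$

Under (C1) the $\xi _{i}$ are independent and identically distributed, so the multivariate central limit theorem gives a mean-zero normal limit for $n^{-1/2}\sum _{i=1}^{n}\xi _{i}$ with covariance matrix $\varSigma _{\xi } = E\{\xi _{i}^{\otimes 2}\}$. Because $A_{\beta }$ is symmetric, the limiting covariance matrix of $n^{1/2}(\hat{\beta }-\beta _{0})$ then takes the sandwich form $A_{\beta }^{-1}\varSigma _{\xi }A_{\beta }^{-1}$.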

About this paper

Cite this paper

Li, Y., He, X., Wang, H., Sun, J. (2016). Joint Analysis of Longitudinal Data and Informative Observation Times with Time-Dependent Random Effects. In: Jin, Z., Liu, M., Luo, X. (eds) New Developments in Statistical Modeling, Inference and Application. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-42571-9_2

DOI: https://doi.org/10.1007/978-3-319-42571-9_2
Published: 09 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42570-2
Online ISBN: 978-3-319-42571-9
eBook Packages: Mathematics and Statistics (R0)
