
4.1 Introduction

The idea of an assumed copula was suggested by Zheng and Klein (1995) in their analysis of survival data subject to dependent censoring. They considered a bivariate distribution function of survival time and censoring time, where the form of the copula function is completely specified, including its parameter value. This strong assumption of the copula is imposed to make the model identifiable. Assuming the independence copula is equivalent to the assumption of independent censoring between survival time and censoring time.

Zheng and Klein (1995) view censoring as a competing risk of death and death as a competing risk of censoring. This is the setting of bivariate competing risks, where one observes the first-occurring event time and the type of the observed event (death or censoring, whichever comes first). With this view, survival data with dependent censoring are equivalent to bivariate competing risks data. In the context of competing risks, independence among event times is rarely assumed since many medical and engineering applications yield event times that are positively associated. Hence, statistical methods for analyzing bivariate competing risks data are applicable to survival data with dependent censoring.

Under an assumed copula, Zheng and Klein (1995) estimated the marginal survival function by the copula-graphic (CG) estimator. The survival function estimated by the CG estimator is analogous to the one estimated by the Kaplan–Meier estimator. The CG estimator reduces to the Kaplan–Meier estimator under the independence copula. In real applications, the CG estimator is calculated by assuming an Archimedean copula. Rivest and Wells (2001) obtained a simple expression of the CG estimator when the assumed copula belongs to the Archimedean family. Nowadays, the CG estimator is an indispensable tool for analyzing survival data with dependent censoring (Braekers and Veraverbeke 2005; Staplin 2012; de Uña-Álvarez and Veraverbeke 2013, 2017; Emura and Chen 2016; Emura and Michimae 2017; Moradian et al. 2017). Note, however, that the CG estimator cannot handle covariates. Likelihood-based approaches can naturally deal with covariates under an assumed copula.

Throughout this chapter, we review the copula-graphic estimator, parametric likelihood methods, and semi-parametric likelihood methods developed under an assumed copula.

4.2 The Copula-Graphic (CG) Estimator

Analysis of survival data often begins by drawing the Kaplan–Meier survival curve which graphically summarizes survival experience of patients in the data. However, under dependent censoring, the Kaplan–Meier estimator may give biased information about survival. A survival curve calculated from the CG estimator provides unbiased information about survival if the copula function between death time and censoring time is correctly specified. Below, we shall introduce the CG estimator under an Archimedean copula as derived in Rivest and Wells (2001).

Consider random variables, defined as

  • T: survival time

  • U: censoring time

Consider an Archimedean copula model

$$ \Pr (T > t,U > u) = \phi_{\theta }^{ - 1} [\phi_{\theta } \{ S_{T} (t)\} + \phi_{\theta } \{ S_{U} (u)\} ], $$
(4.1)

where \( \phi_{\theta } :[0,1] \mapsto [0,\infty ] \) is a generator function, which is continuous and strictly decreasing from \( \phi_{\theta } (0) = \infty \) to \( \phi_{\theta } (1) = 0 \) (Chap. 3); \( S_{T} (t) = \Pr (T > t) \) and \( S_{U} (u) = \Pr (U > u) \) are the marginal survival functions.

Let \( (t_{i} ,\delta_{i} ) \), \( i = 1, \ldots ,n \), be survival data without covariates, where \( t_{i} = \hbox{min} \{ T_{i} ,U_{i} \} \), \( \delta_{i} = {\mathbf{I}}(T_{i} \le U_{i} ) \), and \( {\mathbf{I}}( \cdot ) \) is the indicator function. Assume that all the observed times are distinct (\( t_{i} \ne t_{j} \) whenever \( i \ne j \)). Based on these data, one can estimate the marginal survival function \( S_{T} ( \cdot ) \) by the CG estimator.


The CG estimator is defined as

$$ \hat{S}_{T} (t) = \phi_{\theta }^{ - 1} \left[ {\sum\limits_{{t_{i} \le t,\delta_{i} = 1}} {\phi_{\theta } \left( {\frac{{n_{i} - 1}}{n}} \right) - \phi_{\theta } \left( {\frac{{n_{i} }}{n}} \right)} } \right],\quad 0 \le t \le \mathop {\hbox{max} }\limits_{i} (t_{i} ) $$

where \( n_{i} = \sum\nolimits_{\ell = 1}^{n} {{\mathbf{I}}(t_{\ell } \ge t_{i} )} \) is the number at risk at time \( t_{i} \); \( \hat{S}_{T} (t) = 1 \) if no death occurs up to time t; \( \hat{S}_{T} (t) \) is undefined for \( t > \mathop {\hbox{max} }\limits_{i} (t_{i} ) \).
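To make the computation concrete, the following minimal R sketch implements the displayed formula directly. The function name cg.estimate and its arguments are illustrative only; the generator \( \phi_{\theta } \) and its inverse are supplied by the user, and the observed times are assumed to be distinct, as above.

```r
# Copula-graphic estimator under an assumed Archimedean copula (Rivest and Wells 2001).
# t.obs: observed times t_i;  delta: censoring indicators (1 = death, 0 = censored);
# phi, phi.inv: the generator and its inverse;  t.eval: evaluation time points.
cg.estimate <- function(t.eval, t.obs, delta, phi, phi.inv) {
  n <- length(t.obs)
  n.risk <- sapply(t.obs, function(ti) sum(t.obs >= ti))   # n_i = number at risk at t_i
  sapply(t.eval, function(t) {
    in.sum <- (t.obs <= t) & (delta == 1)                  # deaths observed up to time t
    if (!any(in.sum)) return(1)                            # no death yet: S_T(t) = 1
    phi.inv(sum(phi((n.risk[in.sum] - 1) / n) - phi(n.risk[in.sum] / n)))
  })
}
```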

The derivation of the CG estimator: Assume that \( S_{T} (t) \) is a decreasing step function with jumps at death times. Thus, \( \delta_{i} = 1 \) implies \( S_{T} (t_{i} ) \ne S_{T} (t_{i} - dt) \) and \( S_{U} (t_{i} ) = S_{U} (t_{i} - dt) \). Setting \( t = u = t_{i} \) in Eq. (4.1), we have

$$ \phi_{\theta } \{ \Pr (T > t_{i} ,U > t_{i} )\} = \phi_{\theta } \{ S_{T} (t_{i} )\} + \phi_{\theta } \{ S_{U} (t_{i} )\} . $$

On the left-hand side of the preceding equation, we estimate \( \Pr (T > t_{i} ,U > t_{i} ) \) by \( (n_{i} - 1)/n \), where \( n_{i} - 1 = \sum\nolimits_{\ell = 1}^{n} {{\mathbf{I}}(t_{\ell } > t_{i} )} \) is the number of survivors at time \( t_{i} \). Accordingly,

$$ \phi_{\theta } \left( {\frac{{n_{i} - 1}}{n}} \right) = \phi_{\theta } \{ S_{T} (t_{i} )\} + \phi_{\theta } \{ S_{U} (t_{i} )\} . $$
(4.2)

Meanwhile, we set \( t = u = t_{i} - dt \) in Eq. (4.1) and then estimate \( \Pr (T > t_{i} - dt,U > t_{i} - dt) \) by \( n_{i} /n \). Then,

$$ \phi_{\theta } \left( {\frac{{n_{i} }}{n}} \right) = \phi_{\theta } \{ S_{T} (t_{i} - dt)\} + \phi_{\theta } \{ S_{U} (t_{i} )\} ,\quad \delta_{i} = 1. $$
(4.3)

Equations (4.2) and (4.3) result in the system of difference equations

$$ \phi_{\theta } \left( {\frac{{n_{i} - 1}}{n}} \right) - \phi_{\theta } \left( {\frac{{n_{i} }}{n}} \right) = \phi_{\theta } \{ S_{T} (t_{i} )\} - \phi_{\theta } \{ S_{T} (t_{i} - dt)\} ,\quad \delta_{i} = 1. $$

We impose the usual constraint that \( S_{T} (t_{i} - dt) = 1 \) when \( t_{i} \) is the smallest death time. Then, the solution to the difference equations is

$$ \begin{aligned} \phi_{\theta } \{ S_{T} (t)\} & = \sum\limits_{{t_{i} \le t,\delta_{i} = 1}} {[\phi_{\theta } \{ S_{T} (t_{i} )\} - \phi_{\theta } \{ S_{T} (t_{i} - dt)\} ]} \\ & = \sum\limits_{{t_{i} \le t,\delta_{i} = 1}} {\phi_{\theta } \left( {\frac{{n_{i} - 1}}{n}} \right) - \phi_{\theta } \left( {\frac{{n_{i} }}{n}} \right)} , \\ \end{aligned} $$

which is equivalent to the CG estimator. ■

Under the independence copula, given by \( \phi_{\theta } (t) = - \log (t) \), the CG estimator is equivalent to the Kaplan–Meier estimator. Under the Clayton copula, given by \( \phi_{\theta } (t) = (t^{ - \theta } - 1)/\theta \) for θ > 0, the CG estimator is written as

$$ \hat{S}_{T} (t) = \left[ {1 + \sum\limits_{{t_{i} \le t,\delta_{i} = 1}} {\left\{ {\left( {\frac{{n_{i} - 1}}{n}} \right)^{ - \theta } - \left( {\frac{{n_{i} }}{n}} \right)^{ - \theta } } \right\}} } \right]^{ - 1/\theta } . $$

This CG estimator can be computed by the compound.Cox R package (Emura et al. 2018).
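For illustration, the Clayton generator above can be plugged into the sketch given earlier in this section; the data vectors below are hypothetical toy values, and the compound.Cox package provides a ready-made implementation.

```r
theta   <- 2                                            # assumed copula parameter
phi     <- function(s) (s^(-theta) - 1) / theta         # Clayton generator
phi.inv <- function(x) (1 + theta * x)^(-1 / theta)     # inverse generator
t.obs   <- c(1.2, 2.5, 3.1, 4.0, 5.6)                   # toy observed times
delta   <- c(1, 0, 1, 1, 0)                             # toy censoring indicators
cg.estimate(sort(t.obs), t.obs, delta, phi, phi.inv)    # CG survival curve at the t_i
```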

The CG estimator provides a graphical summary of survival experience for patients in the same manner as the Kaplan–Meier estimator.


The survival curve is defined as the plot of \( \hat{S}_{T} (t) \) against t, starting with t = 0 and ending with \( t_{\hbox{max} } = \mathop {\hbox{max} }\limits_{i} (t_{i} ) \). The curve is a step function that jumps only at points where a death occurs. On the curve, censoring times are often indicated by the mark "+".

If \( t_{\hbox{max} } = \mathop {\hbox{max} }\limits_{i} (t_{i} ) \) corresponds to the time-to-death of a patient, then \( \hat{S}_{T} (t_{\hbox{max} } ) = \phi_{\theta }^{ - 1} (\infty ) = 0 \). This is because \( \phi_{\theta } \left( {\frac{{n_{i} - 1}}{n}} \right) = \phi_{\theta } (0) = \infty \) for some i in the definition of the CG estimator. If \( t_{\hbox{max} } = \mathop {\hbox{max} }\limits_{i} (t_{i} ) \) corresponds to the censoring time of a patient, then \( \hat{S}_{T} (t_{\hbox{max} } ) > 0 \).

Additional remarks: The CG estimator can be modified to accommodate a variety of censoring and truncation mechanisms. de Uña-Álvarez and Veraverbeke (2013) derived the CG estimator when survival time is subject to both dependent censoring and independent censoring. This estimator is convenient if the data provide the causes of censoring for all patients. For instance, censoring caused by dropout may be dependent while censoring caused by study termination is independent (see Chap. 14 of Collett (2015)). de Uña-Álvarez and Veraverbeke (2017) derived the CG estimator when survival time is subject to both dependent censoring and independent truncation. Chaieb et al. (2006) and Emura and Murotani (2015) derived the CG estimator when survival time is subject to independent censoring and dependent truncation.

4.3 Model and Likelihood

Throughout this chapter, we consider a bivariate survival function

$$ \Pr (T > t,U > u|{\mathbf{x}}) = C_{\theta } \{ S_{T} (t|{\mathbf{x}}),S_{U} (u|{\mathbf{x}})\} , $$

where \( C_{\theta } \) is a copula (Nelsen 2006) with parameter θ; \( S_{T} (t|{\mathbf{x}}) = \Pr (T > t|{\mathbf{x}}) \) and \( S_{U} (u|{\mathbf{x}}) = \Pr (U > u|{\mathbf{x}}) \) are the marginal survival functions. The covariates are defined as \( {\mathbf{x}} = ({\mathbf{x}}_{1} ,{\mathbf{x}}_{2} ) \) such that \( S_{T} (t|{\mathbf{x}}) = S_{T} (t|{\mathbf{x}}_{1} ) \) and \( S_{U} (u|{\mathbf{x}}) = S_{U} (u|{\mathbf{x}}_{2} ) \). For instance, if \( {\mathbf{x}}_{1} = ({\text{age}},{\text{gender}}) \) and \( {\mathbf{x}}_{2} = ({\text{gender}}) \), the model does not consider the effect of age on the censoring time.

Survival data consist of \( (t_{i} ,\delta_{i} ,{\mathbf{x}}_{i} ) \), \( i = 1, \ldots ,n \), where \( {\mathbf{x}}_{i} = (x_{i1} , \ldots ,x_{ip} )^{{\prime }} \) is a vector of covariates. The likelihood for the ith patient is expressed as

$$ L_{i} = \Pr (T = t_{i} ,U > t_{i} |{\mathbf{x}}_{i} )^{{\delta_{i} }} \Pr (T > t_{i} ,U = t_{i} |{\mathbf{x}}_{i} )^{{1 - \delta_{i} }} = f_{T}^{\# } (t_{i} |{\mathbf{x}}_{i} )^{{\delta_{i} }} f_{U}^{\# } (t_{i} |{\mathbf{x}}_{i} )^{{1 - \delta_{i} }} , $$

where

$$ \left. {f_{T}^{\# } (t_{i} |{\mathbf{x}}_{i} ) = - \frac{\partial }{\partial x}\Pr (T > x,U > t_{i} |{\mathbf{x}}_{i} )} \right|_{{x = t_{i} }} ,\quad \left. {f_{U}^{\# } (t_{i} |{\mathbf{x}}_{i} ) = - \frac{\partial }{\partial y}\Pr (T > t_{i} ,U > y|{\mathbf{x}}_{i} )} \right|_{{y = t_{i} }} , $$

are called the sub-density functions. Therefore, the log-likelihood is defined as

$$ \ell = \sum\limits_{i = 1}^{n} {[\delta_{i} \log f_{T}^{\# } (t_{i} |{\mathbf{x}}_{i} ) + (1 - \delta_{i} )\log f_{U}^{\# } (t_{i} |{\mathbf{x}}_{i} )]} . $$
(4.4)

An equivalent expression is

$$ \ell = \sum\limits_{i = 1}^{n} {[\delta_{i} \log h_{T}^{\# } (t_{i} |{\mathbf{x}}_{i} ) + (1 - \delta_{i} )\log h_{U}^{\# } (t_{i} |{\mathbf{x}}_{i} ) -\Phi (t_{i} ,t_{i} |{\mathbf{x}}_{i} )]} , $$
(4.5)

where

$$ h_{T}^{\# } (t_{i} |{\mathbf{x}}_{i} ) = \frac{{f_{T}^{\# } (t_{i} |{\mathbf{x}}_{i} )}}{{\Pr (T > t_{i} ,U > t_{i} |{\mathbf{x}}_{i} )}},\quad h_{U}^{\# } (t_{i} |{\mathbf{x}}_{i} ) = \frac{{f_{U}^{\# } (t_{i} |{\mathbf{x}}_{i} )}}{{\Pr (T > t_{i} ,U > t_{i} |{\mathbf{x}}_{i} )}}, $$

are the cause-specific hazard functions, and

$$ \Phi (t_{i} ,t_{i} |{\mathbf{x}}_{i} ) = - \log \,\Pr (T > t_{i} ,U > t_{i} |{\mathbf{x}}_{i} ) = - \log \,\Pr (\hbox{min} \{ T,U\} > t_{i} |{\mathbf{x}}_{i} ) $$

is the cumulative hazard function for \( \hbox{min} \{ \;T,U\;\} \).

With appropriate models for \( C_{\theta } \), \( S_{T} ( \cdot |{\mathbf{x}}) \), and \( S_{U} ( \cdot |{\mathbf{x}}) \), one can obtain the maximum likelihood estimator (MLE) by maximizing Eq. (4.4) or (4.5).
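As a concrete illustration of Eq. (4.4), the R sketch below evaluates the log-likelihood for a user-supplied copula \( C_{\theta } \) and marginal survival/density functions. Covariates are suppressed for brevity, the copula partial derivatives are approximated by finite differences (analytic derivatives should be preferred when available), and all names are illustrative rather than part of any package.

```r
# Log-likelihood of Eq. (4.4) for a generic copula C(u, v) and given margins.
# The sub-densities are f_T^#(t) = f_T(t) * dC/du and f_U^#(t) = f_U(t) * dC/dv,
# each evaluated at (u, v) = (S_T(t), S_U(t)).
loglik.copula <- function(t.obs, delta, C, S.T, f.T, S.U, f.U, eps = 1e-6) {
  u <- S.T(t.obs); v <- S.U(t.obs)
  C.u <- (C(u + eps, v) - C(u - eps, v)) / (2 * eps)   # finite-difference dC/du
  C.v <- (C(u, v + eps) - C(u, v - eps)) / (2 * eps)   # finite-difference dC/dv
  sum(delta * log(f.T(t.obs) * C.u) + (1 - delta) * log(f.U(t.obs) * C.v))
}
```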

4.4 Parametric Models

4.4.1 The Burr Model

Escarela and Carrière (2003) considered a copula model with the Burr distribution defined as

$$ S_{T} (t|{\mathbf{x}}_{1i} ) = \{ 1 + \gamma_{1} (\lambda_{1i} t)^{{\nu_{1} }} \}^{{ - 1/\gamma_{1} }} ,\quad t \ge 0;\quad S_{U} (\;u\;|{\mathbf{x}}_{2i} \;) = \{ \;1 + \gamma_{2} (\lambda_{2i} u)^{{\nu_{2} }} \;\}^{{ - 1/\gamma_{2} }} ,\quad u \ge 0, $$

where \( \nu_{j} > 0 \), \( \gamma_{j} > 0 \), and \( \lambda_{ji} = \exp (\beta_{j0} + {\varvec{\upbeta}}_{j}^{{\prime }} {\mathbf{x}}_{ji} ) \) for \( j = 1 \) and 2. The Burr distribution includes many distributions as special cases: \( \nu_{j} = 1 \) gives the Pareto distribution, \( \gamma_{j} = 1 \) gives the log-logistic distribution, and \( \gamma_{j} \to 0 \) gives the Weibull distribution. For the copula, Escarela and Carrière (2003) considered the Frank copula

$$ C_{\theta } (u,v) = - \frac{1}{\theta }\log \left\{ {1 + \frac{{(e^{ - \theta u} - 1)(e^{ - \theta v} - 1)}}{{e^{ - \theta } - 1}}} \right\},\quad \theta \ne 0. $$

Their motivation for using the Frank copula is that it accommodates both positive dependence \( (\theta > 0) \) and negative dependence \( (\theta < 0) \) between the two variables.

4.4.2 The Weibull Model

Likelihood-based analyses of Escarela and Carrière (2003) focused on the Weibull model

$$ S_{T} (t|{\mathbf{x}}_{1i} ) = \exp \{ - (\lambda_{1i} t)^{{\nu_{1} }} \} ,\quad t \ge 0;\quad S_{U} (u|{\mathbf{x}}_{2i} ) = \exp \{ - (\lambda_{2i} u)^{{\nu_{2} }} \} ,\quad u \ge 0. $$

With the Frank copula model, they maximize the log-likelihood of Eq. (4.4) with respect to \( (\beta_{10} ,{\varvec{\upbeta}}_{1} ,\nu_{1} ,\beta_{20} ,{\varvec{\upbeta}}_{2} ,\nu_{2} ) \) given the value θ. This leads to the profile likelihood

$$ \ell^{*} (\theta ) = \mathop {\hbox{max} }\limits_{{(\beta_{10} ,{\varvec{\upbeta}}_{1} ,\nu_{1} ,\beta_{20} ,{\varvec{\upbeta}}_{2} ,\nu_{2} )}} \ell (\beta_{10} ,{\varvec{\upbeta}}_{1} ,\nu_{1} ,\beta_{20} ,{\varvec{\upbeta}}_{2} ,\nu_{2} |\theta ). $$

The MLE of \( (\beta_{10} ,{\varvec{\upbeta}}_{1} ,\nu_{1} ,\beta_{20} ,{\varvec{\upbeta}}_{2} ,\nu_{2} ) \) is then obtained at the selected value \( \hat{\theta } = \arg \max_{\theta } \ell^{*} (\theta ) \).
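A minimal R sketch of this profile likelihood step is given below, reusing loglik.copula from the sketch in Sect. 4.3. Covariates are omitted to keep the sketch short, the marginal parameters are optimized on the log scale, the grid of θ values is arbitrary, and the data vectors t.obs and delta are the toy values from the earlier snippet.

```r
# Frank copula C_theta(u, v), theta != 0
frank <- function(u, v, theta)
  -log(1 + (exp(-theta * u) - 1) * (exp(-theta * v) - 1) / (exp(-theta) - 1)) / theta

# Negative log-likelihood of the Weibull margins for a fixed theta
neg.loglik <- function(par, theta, t.obs, delta) {
  l1 <- exp(par[1]); n1 <- exp(par[2]); l2 <- exp(par[3]); n2 <- exp(par[4])
  S.T <- function(t) exp(-(l1 * t)^n1)
  f.T <- function(t) n1 * l1 * (l1 * t)^(n1 - 1) * exp(-(l1 * t)^n1)
  S.U <- function(t) exp(-(l2 * t)^n2)
  f.U <- function(t) n2 * l2 * (l2 * t)^(n2 - 1) * exp(-(l2 * t)^n2)
  -loglik.copula(t.obs, delta, function(u, v) frank(u, v, theta), S.T, f.T, S.U, f.U)
}

# Profile log-likelihood l*(theta) and its maximizer over a grid (toy data as before)
profile.loglik <- function(theta, t.obs, delta)
  -optim(rep(0, 4), neg.loglik, theta = theta, t.obs = t.obs, delta = delta)$value
theta.grid <- c(-5, -2, 2, 5, 10)
prof <- sapply(theta.grid, profile.loglik, t.obs = t.obs, delta = delta)
theta.hat <- theta.grid[which.max(prof)]
```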

The data analysis of Escarela and Carrière (2003) revealed that the estimator \( \hat{\theta } \) had a wide confidence interval (CI) when no covariate entered the model. This phenomenon is related to the non-identifiability of the model. The CI of \( \hat{\theta } \) narrowed when many covariates entered the model. Heckman and Honoré (1989) showed that the non-identifiability is resolved by adding covariates to the marginal models. Unfortunately, no papers give the conditions (e.g., how many covariates or how large a sample) required for \( \hat{\theta } \) to estimate the true value of θ with reasonable precision.

In this context, we suggest regarding the approach of Escarela and Carrière (2003) as a two-stage procedure. The first stage selects (rather than estimates) θ via the profile likelihood. With the selected value \( \hat{\theta } \), the second stage estimates the remaining parameters \( (\beta_{10} ,{\varvec{\upbeta}}_{1} ,\nu_{1} ,\beta_{20} ,{\varvec{\upbeta}}_{2} ,\nu_{2} ) \) by the MLE. In keeping with the assumed-copula approach, the SEs of \( (\beta_{10} ,{\varvec{\upbeta}}_{1} ,\nu_{1} ,\beta_{20} ,{\varvec{\upbeta}}_{2} ,\nu_{2} ) \) need not account for the variation of \( \hat{\theta } \).

4.4.3 The Pareto Model

In the absence of covariates, Shih et al. (2018) considered the Pareto marginal models

$$ S_{T} (t) = (1 + \alpha_{1} t)^{{ - \gamma_{1} }} ,\quad t \ge 0;\quad S_{U} (u) = (1 + \alpha_{2} u)^{{ - \gamma_{2} }} ,\quad u \ge 0, $$

where \( \alpha_{j} > 0 \) and \( \gamma_{j} > 0 \) are re-parameterized from the Burr models. The marginal hazard functions are \( h_{T} (t) = \alpha_{1} \gamma_{1} /(1 + \alpha_{1} t) \) and \( h_{U} (u) = \alpha_{2} \gamma_{2} /(1 + \alpha_{2} u) \) and the marginal density functions are \( f_{T} (t) = h_{T} (t)S_{T} (t) \) and \( f_{U} (u) = h_{U} (u)S_{U} (u) \). Applying the Frank copula to Eq. (4.4), the log-likelihood can be written as

$$ \begin{aligned} \ell (\alpha_{1} ,\alpha_{2} ,\gamma_{1} ,\gamma_{2} |\theta ) & = \sum\limits_{i = 1}^{n} {\delta_{i} \{ \log f_{T} (t_{i} ) - \theta S_{T} (t_{i} ) + \log (e^{{ - \theta S_{U} (t_{i} )}} - 1) - \log (e^{ - \theta } - 1) + \theta S(t_{i} )\} } \\ & \quad + \sum\limits_{i = 1}^{n} {(1 - \delta_{i} )\{ \log f_{U} (t_{i} ) - \theta S_{U} (t_{i} ) + \log (e^{{ - \theta S_{T} (t_{i} )}} - 1) - \log (e^{ - \theta } - 1) + \theta S(t_{i} )\} } , \\ \end{aligned} $$

where \( S(t) = C_{\theta } \{ S_{T} (t),S_{U} (t)\} \). The MLE is obtained by maximizing the preceding equation.

They developed a Newton–Raphson algorithm to obtain the MLE of \( (\alpha_{1} ,\alpha_{2} ,\gamma_{1} ,\gamma_{2} ) \) for a given value of θ; hence, the model uses an assumed copula. The Bivariate.Pareto R package (Shih and Lee 2018) can be used to compute the MLE and the SEs of the parameters. Their Newton–Raphson algorithm employs a randomization scheme to reduce the sensitivity of the convergence results to the initial values, and is termed the randomized Newton–Raphson algorithm (Hu and Emura 2015). When θ is unknown, the profile likelihood estimate was suggested, namely \( \hat{\theta } = \arg \max_{\theta } \ell^{*} (\theta ) \), where \( \ell^{*} (\theta ) = \mathop {\hbox{max} }\limits_{{(\alpha_{1} ,\alpha_{2} ,\gamma_{1} ,\gamma_{2} )}} \ell (\alpha_{1} ,\alpha_{2} ,\gamma_{1} ,\gamma_{2} |\theta ) \). However, they reported that the profile likelihood occasionally has no peak and that \( \hat{\theta } \) has a large sampling variation. These problems are related to the non-identifiability of competing risks data (Tsiatis 1975).

Due to the difficulty of estimating θ, Shih et al. (2018) considered a restricted model \( S_{T} (t) = S_{U} (t) = (1 + \alpha t)^{ - \gamma } \). The model makes the strong assumption that the two marginal distributions are the same. Under the Frank copula, they developed the randomized Newton–Raphson algorithm to obtain the MLE of \( (\alpha ,\gamma ,\theta ) \). While the likelihood always has a peak under this restricted model, the sampling variation of \( \hat{\theta } \) remains large. Including covariates in the marginal Pareto models may improve the precision of \( \hat{\theta } \). Alternatively, a sensitivity analysis may be considered under a few selected values of θ.
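Under this restricted model, the joint maximization over \( (\alpha ,\gamma ,\theta ) \) can be sketched in a few lines of R, reusing loglik.copula and frank from the earlier sketches. This is a plain optim-based illustration, not the randomized Newton–Raphson algorithm of Shih et al. (2018); the data vectors t.obs and delta are again the toy values used earlier.

```r
# Restricted model S_T(t) = S_U(t) = (1 + a*t)^(-g) under the Frank copula;
# positivity of (a, g) is enforced via exp(), and theta is unrestricted (theta != 0).
neg.loglik.res <- function(par, t.obs, delta) {
  a <- exp(par[1]); g <- exp(par[2]); theta <- par[3]
  S <- function(t) (1 + a * t)^(-g)                     # common Pareto survival function
  f <- function(t) a * g * (1 + a * t)^(-g - 1)         # common Pareto density
  -loglik.copula(t.obs, delta, function(u, v) frank(u, v, theta), S, f, S, f)
}
fit <- optim(c(0, 0, 2), neg.loglik.res, t.obs = t.obs, delta = delta, hessian = TRUE)
# SEs (on the transformed scale) via sqrt(diag(solve(fit$hessian)))
```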

4.4.4 The Burr III Model

In the absence of covariates, Shih and Emura (2018) considered the Burr III marginal distributions

$$ S_{T} (t) = 1 - (1 + t^{ - \gamma } )^{ - \alpha } ,\quad t > 0;\quad S_{U} (u) = 1 - (1 + u^{ - \gamma } )^{ - \beta } ,\,\,u > 0, $$

where \( (\alpha ,\beta ,\gamma ) \) are positive parameters. They considered the generalized FGM copula with a copula parameter \( \theta \). In their model, the copula is imposed on the bivariate distribution function rather than the bivariate survival function. More details about this copula, such as the admissible range of \( \theta \) and expressions for Kendall's tau, can be found in Amini et al. (2011), Domma and Giordano (2013), and Shih and Emura (2016, 2018).

Shih and Emura (2018) used the randomized Newton–Raphson algorithm to obtain the MLE of \( (\alpha ,\beta ,\gamma ) \) given the value of θ. When the value of θ is unknown, they suggested first selecting θ by the profile likelihood estimate \( \hat{\theta } = \arg \max_{\theta } \ell^{*} (\theta ) \), where \( \ell^{*} (\theta ) = \mathop {\hbox{max} }\limits_{(\alpha ,\beta ,\gamma )} \ell (\alpha ,\beta ,\gamma |\theta ) \), and then making inference for \( (\alpha ,\beta ,\gamma ) \) given \( \hat{\theta } \). They also proposed a goodness-of-fit method to test the validity of the generalized FGM copula and the Burr III marginal models. The estimation and goodness-of-fit algorithms are implemented in the GFGM.copula R package (Shih 2018). Their method is developed for bivariate competing risks data, where dependent censoring is a competing risk of death, and death is a competing risk of dependent censoring.

4.4.5 The Piecewise Exponential Model

The piecewise exponential model has been considered to fit survival data with dependent censoring (Staplin et al. 2015; Emura and Michimae 2017). Let \( 0 = a_{0} < a_{1} < \cdots < a_{m} \) be a knot sequence, where m is the number of knots. Assume that the hazard function for T is a constant \( e^{{\theta_{j} }} \) on the interval \( (a_{j - 1} ,a_{j} ] \) for \( j = 1, \ldots ,m \), where \( {\varvec{\uptheta}} = (\theta_{1} , \ldots ,\theta_{m}) \) are parameters without restriction on their ranges. The survival function is

$$ S_{T} (t;{\varvec{\uptheta}}) = \exp \left\{ { - e^{{\theta_{j} }} (t - a_{j - 1} ) - \sum\limits_{k = 1}^{j - 1} {e^{{\theta_{k} }} (a_{k} - a_{k - 1} )} } \right\},\quad \quad t \in (a_{j - 1} ,a_{j} ], $$

where \( \sum\nolimits_{k = 1}^{0} {( \cdot ) \equiv 0} \). In a similar fashion, define the survival function \( S_{U} (u;{\varvec{\upgamma}}) \) for the censoring time U, where \( {\varvec{\upgamma}} = (\gamma_{1} , \ldots ,\gamma_{m} ) \).
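The displayed survival function is straightforward to code; the following R sketch follows the formula directly. The function name and arguments are illustrative, and t is assumed to lie in \( (0,a_{m} ] \).

```r
# Piecewise exponential survival function S_T(t; theta) with knots 0 = a_0 < a_1 < ... < a_m
S.pwe <- function(t, theta, knots) {            # knots = c(a_1, ..., a_m); theta = (theta_1, ..., theta_m)
  a <- c(0, knots)                              # prepend a_0 = 0
  j <- findInterval(t, a, left.open = TRUE)     # index j such that t lies in (a_{j-1}, a_j]
  rate <- exp(theta)                            # constant hazard e^{theta_j} on (a_{j-1}, a_j]
  cumhaz <- rate[j] * (t - a[j]) +
    sapply(j, function(jj) sum(rate[seq_len(jj - 1)] * diff(a)[seq_len(jj - 1)]))
  exp(-cumhaz)
}
S.pwe(c(0.5, 1.5, 2.5), theta = c(-1, 0, 0.5), knots = c(1, 2, 3))
```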

Emura and Michimae (2017) considered a copula model

$$ \Pr (T > t,U > u) = C_{\theta } \{ S_{T} (t;{\varvec{\uptheta}}),S_{U} (u;{\varvec{\upgamma}})\} ,\,\,{\varvec{\uptheta}} = (\theta_{1} , \ldots ,\theta_{m} ),{\varvec{\upgamma}} = (\gamma_{1} , \ldots ,\gamma_{m} ), $$

where \( S_{T} (t;{\varvec{\uptheta}}) \) and \( S_{U} (u;{\varvec{\upgamma}}) \) follow the piecewise exponential models. The Clayton copula and the Joe copula were chosen for their numerical studies. They developed inference procedures based on the likelihood in Eq. (4.4) for a given value of θ; hence, they applied an assumed copula. They did not use the profile likelihood for selecting θ since it may not work well when the marginal distributions contain many parameters. Instead, they suggested a sensitivity analysis to examine the results under a few different values of θ.

Staplin et al. (2015) originally proposed the piecewise exponential models for dependent censoring, but did not use copulas. Consequently, the sub-density functions in their likelihood function require some numerical integrations of the joint density of T and U.

4.5 Semi-parametric Models

4.5.1 The Transformation Model

Chen (2010) considered a semi-parametric transformation model defined as

$$ S_{T} (t|{\mathbf{x}}_{1i} ) = \exp [ - G_{1} \{\Lambda _{0} (t)e^{{{\varvec{\upbeta}}_{1}^{{\prime }} {\mathbf{x}}_{1i} }} \} ],\quad S_{U} (u|{\mathbf{x}}_{2i} ) = \exp [ - G_{2} \{\Gamma _{0} (u)e^{{{\varvec{\upbeta}}_{2}^{{\prime }} {\mathbf{x}}_{2i} }} \} ], $$

where \( {\varvec{\upbeta}}_{j} \) are regression coefficients, and \( G_{j} ( \cdot ) \) is a known and nonnegative increasing function such that \( G_{j} (0) = 0 \), \( G_{j} (\infty ) = \infty \), and \( g_{j} (t) \equiv dG_{j} (t)/dt > 0 \) for \( j = 1 \) and 2; \( \Lambda _{0} \) and \( \Gamma _{0} \) are unknown increasing functions. No distributional assumptions are imposed on \( \Lambda _{0} \) and \( \Gamma _{0} \). The linear transformation \( G_{j} (t) = t \) corresponds to the Cox model.

Under the semi-parametric transformation model, the cause-specific hazard functions are

$$ h_{T}^{\# } (t|{\mathbf{x}}_{i} ) = \lambda_{0} (t)e^{{{\varvec{\upbeta}}_{1}^{{\prime }} {\mathbf{x}}_{1i} }} \eta_{1i} (t;{\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,\Lambda _{0} ,\Gamma _{0} |\theta ),\quad h_{U}^{\# } (t|{\mathbf{x}}_{i} ) = \gamma_{0} (t)e^{{{\varvec{\upbeta}}_{2}^{{\prime }} {\mathbf{x}}_{2i} }} \eta_{2i} (t;{\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,\Lambda _{0} ,\Gamma _{0} |\theta ), $$

where \( \lambda_{0} (t) = d\Lambda _{0} (t)/dt \), \( \gamma_{0} (t) = d\Gamma _{0} (t)/dt \),

$$ \begin{aligned} & \eta_{1i} (t;{\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,\Lambda _{0} ,\Gamma _{0} |\theta ) = g_{1} \{\Lambda _{0} (t)e^{{{\varvec{\upbeta}}_{1}^{{\prime }} {\mathbf{x}}_{1i} }} \} S_{T} (t|{\mathbf{x}}_{1i} )D_{\theta ,1} [S_{T} (t|{\mathbf{x}}_{1i} ),S_{U} (t|{\mathbf{x}}_{2i} )], \\ & \eta_{2i} (t;{\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,\Lambda _{0} ,\Gamma _{0} |\theta ) = g_{2} \{\Gamma _{0} (t)e^{{{\varvec{\upbeta}}_{2}^{{\prime }} {\mathbf{x}}_{2i} }} \} S_{U} (t|{\mathbf{x}}_{2i} )D_{\theta ,2} [S_{T} (t|{\mathbf{x}}_{1i} ),S_{U} (t|{\mathbf{x}}_{2i} )], \\ \end{aligned} $$
$$ D_{\theta ,1} (u,v) = \frac{{\partial C_{\theta } (u,v)/\partial u}}{{C_{\theta } (u,v)}},\quad D_{\theta ,2} (u,v) = \frac{{\partial C_{\theta } (u,v)/\partial v}}{{C_{\theta } (u,v)}}. $$
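For example, under the Clayton copula \( C_{\theta } (u,v) = (u^{ - \theta } + v^{ - \theta } - 1)^{ - 1/\theta } \) with θ > 0 (obtained from the generator in Sect. 4.2), differentiating the copula gives the closed forms coded below; this is a short sketch and the function names are illustrative.

```r
# D_{theta,1}(u, v) and D_{theta,2}(u, v) under the Clayton copula
D.clayton.1 <- function(u, v, theta) u^(-theta - 1) / (u^(-theta) + v^(-theta) - 1)
D.clayton.2 <- function(u, v, theta) v^(-theta - 1) / (u^(-theta) + v^(-theta) - 1)
```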

Under the independence copula \( C_{\theta } (u,v) = uv \), the cause-specific hazard functions are equal to the marginal hazards:

$$ h_{T}^{\# } (t|{\mathbf{x}}_{i} ) = \lambda_{0} (t)e^{{{\varvec{\upbeta}}_{1}^{{\prime }} {\mathbf{x}}_{1i} }} g_{1} \{\Lambda _{0}(t)e^{{{\varvec{\upbeta}}_{1}^{{\prime }} {\mathbf{x}}_{1i} }}\}, \quad h_{U}^{\# } (t|{\mathbf{x}}_{i} ) = \gamma_{0} (t)e^{{{\varvec{\upbeta}}_{2}^{{\prime }} {\mathbf{x}}_{2i} }} g_{2} \{\Gamma _{0}(t)e^{{{\varvec{\upbeta}}_{2}^{{\prime }} {\mathbf{x}}_{2i} }}\}.$$

To obtain the MLE of \( ({\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,\Lambda _{0} ,\Gamma _{0} ) \), we treat \( \Lambda _{0} \) and \( \Gamma _{0} \) as increasing step functions with jump sizes \( d\Lambda _{0} (t_{i} ) =\Lambda _{0} (t_{i} ) -\Lambda _{0} (t_{i} - ) \) for \( \delta_{i} = 1 \) and \( d\Gamma _{0} (t_{i} ) =\Gamma _{0} (t_{i} ) -\Gamma _{0} (t_{i} - ) \) for \( \delta_{i} = 0 \). Putting the cause-specific hazard functions into Eq. (4.5) and replacing \( \lambda_{0} (t_{i} ) \) by \( d\Lambda _{0} (t_{i} ) \) and \( \gamma_{0} (t_{i} ) \) by \( d\Gamma _{0} (t_{i} ) \), we obtain the log-likelihood

$$ \begin{aligned} \ell ({\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,\Lambda _{0} ,\Gamma _{0} |\theta ) & = \sum\limits_{i} {\delta_{i} [{\varvec{\upbeta}}_{1}^{{\prime }} {\mathbf{x}}_{1i} + \log \eta_{1i} (t_{i} ;{\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,\Lambda _{0} ,\Gamma _{0} |\theta ) + \log d\Lambda _{0} (t_{i} )]} \\ & \quad + \sum\limits_{i} {(1 - \delta_{i} )[{\varvec{\upbeta}}_{2}^{{\prime }} {\mathbf{x}}_{2i} + \log \eta_{2i} (t_{i} ;{\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,\Lambda _{0} ,\Gamma _{0} |\theta ) + \log d\Gamma _{0} (t_{i} )]} \\ & \quad - \sum\limits_{i} {\Phi _{\theta } [S_{T} (t_{i} |{\mathbf{x}}_{1i} ),S_{U} (t_{i} |{\mathbf{x}}_{2i} )]} , \\ \end{aligned} $$

where \( \Phi _{\theta } (u,v) = - \log C_{\theta } (u,v) \). Since the marginal distributions have a number of parameters to be estimated, the profile likelihood may not properly identify a suitable value of \( \theta \). Chen (2010) suggested a sensitivity analysis to examine the result under a few different values of \( \theta \), possibly selected by prior knowledge and expert opinion.

The approach of Chen (2010) reduces to Cox’s partial likelihood approach (Cox 1972) under the independence copula and the linear transformation. Under these assumptions, the MLE \( ({\hat{\varvec{\upbeta}}}_{1} ,{\hat{\varvec{\upbeta}}}_{2} ,\hat{\Lambda }_{0} ,\hat{\Gamma }_{0} ) \) is obtained by maximizing two functions

$$ \begin{aligned} \ell_{1} ({\varvec{\upbeta}}_{1} ,\Lambda _{0} ) & = \sum\limits_{i} {\delta_{i} [{\varvec{\upbeta}}_{1}^{{\prime }} {\mathbf{x}}_{1i} + \log \,d\Lambda _{0} (t_{i} )]} + \sum\limits_{i} {\log S_{T} (t_{i} |{\mathbf{x}}_{1i} )} , \\ \ell_{2} ({\varvec{\upbeta}}_{2} ,\Gamma _{0} ) & = \sum\limits_{i} {(1 - \delta_{i} )[{\varvec{\upbeta}}_{2}^{{\prime }} {\mathbf{x}}_{2i} + \log d\Gamma _{0} (t_{i} )]} + \sum\limits_{i} {\log S_{U} (t_{i} |{\mathbf{x}}_{2i} )} , \\ \end{aligned} $$

since \( \ell ({\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,\Lambda _{0} ,\Gamma _{0} ) = \ell_{1} ({\varvec{\upbeta}}_{1} ,\Lambda _{0} ) + \ell_{2} ({\varvec{\upbeta}}_{2} ,\Gamma _{0} ) \). Then, the MLE \( ({\hat{\varvec{\upbeta}}}_{1} ,\hat{\Lambda }_{0} ) \) of \( ({\varvec{\upbeta}}_{1} ,\Lambda _{0} ) \) consists of the partial likelihood estimator \( {\hat{\varvec{\upbeta}}}_{1} \) and the Breslow estimator \( \hat{\Lambda }_{0} \) (Chap. 2).
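In practice, these two maximizations correspond to two separate Cox regressions, one treating death as the event and one treating censoring as the event. A hedged sketch with the survival package is given below; the data frame d and the covariates x1 and x2 are hypothetical.

```r
library(survival)
# Under the independence copula and the linear transformation, beta_1 and the Breslow
# estimator of Lambda_0 come from a Cox fit for death, while beta_2 and Gamma_0 come
# from a Cox fit that treats censoring as the event of interest.
fit.T <- coxph(Surv(time, status == 1) ~ x1 + x2, data = d)  # partial likelihood estimate of beta_1
fit.U <- coxph(Surv(time, status == 0) ~ x2, data = d)       # model for the censoring time
Lambda0.hat <- basehaz(fit.T, centered = FALSE)              # Breslow-type baseline cumulative hazard
```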

4.5.2 The Spline Model

Emura et al. (2017) considered a spline-based model defined as

$$ S_{T} (t|{\mathbf{x}}_{1i} ) = \exp \{ -\Lambda _{0} (t)e^{{{\varvec{\upbeta}}_{1}^{{\prime }} {\mathbf{x}}_{1i} }} \} ,\quad S_{U} (u|{\mathbf{x}}_{2i} ) = \exp \{ -\Gamma _{0} (u)e^{{{\varvec{\upbeta}}_{2}^{{\prime }} {\mathbf{x}}_{2i} }} \} , $$

where \( {\varvec{\upbeta}}_{j} \) are regression coefficients, and the baseline hazard functions are modeled by

$$ \frac{d}{dt}\Lambda _{0} (t) = \lambda_{0} (t) = \sum\nolimits_{\ell = 1}^{5} {g_{\ell } M_{\ell } (t)} = {\mathbf{g}}^{{\prime }} {\mathbf{M}}(t),\quad \frac{d}{dt}\Gamma _{0} (t) = \gamma_{0} (t) = \sum\nolimits_{\ell = 1}^{5} {h_{\ell } M_{\ell } (t)} = {\mathbf{h}}^{{\prime }} {\mathbf{M}}(t), $$

where \( {\mathbf{M}}(t) = (M_{1} (t), \ldots ,M_{5} (t))^{{\prime }} \) are the cubic M-spline basis functions (Ramsay 1988). Here, \( {\mathbf{g}}^{{\prime }} = (g_{1} , \ldots ,g_{5} ) \) and \( {\mathbf{h}}^{{\prime }} = (h_{1} , \ldots ,h_{5} ) \) are unknown positive parameters. These five-parameter approximations provide good flexibility for real applications (Ramsay 1988) and are one reasonable choice (Commenges and Jacqmin-Gadda 2015). Since the spline bases are easy to integrate, the baseline cumulative hazard functions are computed as \( \Lambda _{0} (t) = \sum\nolimits_{\ell = 1}^{5} {g_{\ell } I_{\ell } (t)} \) and \( \Gamma _{0} (t) = \sum\nolimits_{\ell = 1}^{5} {h_{\ell } I_{\ell } (t)} \), where \( I_{\ell} (t) \) is the integral of \( M_{\ell} (t) \), called the I-spline basis (Ramsay 1988).

The joint.Cox package (Emura 2018) offers the functions M.spline() for computing \( M_{\ell} (t) \) and I.spline() for \( I_{\ell} (t) \). To compute these spline bases, one needs to specify the range of t. The package uses the range \( t \in [\xi_{1} ,\xi_{3} ] \) for the equally spaced knots \( \xi_{1} < \xi_{2} < \xi_{3} \), where \( \xi_{2} = (\xi_{1} + \xi_{3} )/2 \). A possible choice is \( \xi_{1} = \min_{i} (t_{i} ) \) and \( \xi_{3} = \max_{i} (t_{i} ) \). The expressions of \( M_{\ell} (t) \) and \( I_{\ell} (t) \) are given in Appendix A. Figure 4.1 displays the M- and I-spline basis functions with the knots \( \xi_{1} = 1 \), \( \xi_{2} = 2 \), and \( \xi_{3} = 3 \).

Fig. 4.1 M-spline basis functions (left panel) and I-spline basis functions (right panel) with knots \( \xi_{1} = 1 \), \( \xi_{2} = 2 \), and \( \xi_{3} = 3 \)

Under the spline model, the cause-specific hazard functions are

$$ h_{T}^{\# } (t|{\mathbf{x}}_{i} ) = \lambda_{0} (t)e^{{{\varvec{\upbeta}}_{1}^{{\prime }} {\mathbf{x}}_{1i} }} \eta_{1i} (t;{\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,\Lambda _{0} ,\Gamma _{0} |\theta ),\quad h_{U}^{\# } (t|{\mathbf{x}}_{i} ) = \gamma_{0} (t)e^{{{\varvec{\upbeta}}_{2}^{{\prime }} {\mathbf{x}}_{2i} }} \eta_{2i} (t;{\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,\Lambda _{0} ,\Gamma _{0} |\theta ), $$

where

$$ \begin{aligned} & \eta_{1i} (t;{\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,\Lambda _{0} ,\Gamma _{0} |\theta ) = S_{T} (t|{\mathbf{x}}_{1i} )D_{\theta ,1} [S_{T} (t|{\mathbf{x}}_{1i} ),S_{U} (t|{\mathbf{x}}_{2i} )], \\ & \eta_{2i} (t;{\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,\Lambda _{0} ,\Gamma _{0} |\theta ) = S_{U} (t|{\mathbf{x}}_{2i} )D_{\theta ,2} [S_{T} (t|{\mathbf{x}}_{1i} ),S_{U} (t|{\mathbf{x}}_{2i} )]. \\ \end{aligned} $$

Putting these formulas into Eq. (4.5), we obtain the log-likelihood

$$ \begin{aligned} \ell ({\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,{\mathbf{g}},{\mathbf{h}}|\theta ) & = \sum\limits_{i} {\delta_{i} [{\varvec{\upbeta}}_{1}^{{\prime }} {\mathbf{x}}_{1i} + \log \eta_{1i} (t_{i} ;{\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,\Lambda _{0} ,\Gamma _{0} |\theta ) + \log \lambda_{0} (t_{i} )]} \\ & \quad + \sum\limits_{i} {(1 - \delta_{i} )[{\varvec{\upbeta}}_{2}^{{\prime }} {\mathbf{x}}_{2i} + \log \eta_{2i} (t_{i} ;{\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,\Lambda _{0} ,\Gamma _{0} |\theta ) + \log \gamma_{0} (t_{i} )]} \\ & \quad - \sum\limits_{i} {\Phi _{\theta } [S_{T} (t_{i} |{\mathbf{x}}_{1i} ),S_{U} (t_{i} |{\mathbf{x}}_{2i} )]} . \\ \end{aligned} $$

The estimator of \( ({\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,{\mathbf{g}},{\mathbf{h}}) \) is obtained by maximizing the penalized log-likelihood

$$ \ell ({\varvec{\upbeta}}_{1} ,{\varvec{\upbeta}}_{2} ,{\mathbf{g}},{\mathbf{h}}|\theta ) - \kappa_{1} \int {\ddot{\lambda }_{0} (t)^{2} dt} - \kappa_{2} \int {\ddot{\gamma }_{0} (t)^{2} dt} , $$

where \( \ddot{f}(t) = d^{2} f(t)/dt^{2} \), and \( (\kappa_{1} ,\kappa_{2} ) \) are given nonnegative values. The parameters \( (\kappa_{1} ,\kappa_{2} ) \) are called smoothing parameters; they control the degree of penalization of the roughness of the two baseline hazard functions. It is shown in Appendix A that

$$ \int\limits_{{\xi_{1} }}^{{\xi_{3} }} {\ddot{\lambda }_{0} (t)^{2} dt} = {\mathbf{g}}^{{\prime }}\Omega {\mathbf{g}},\quad \int\limits_{{\xi_{1} }}^{{\xi_{3} }} {\ddot{\gamma }_{0} (t)^{2} dt} = {\mathbf{h}}^{{\prime }}\Omega {\mathbf{h}},\quad\Omega = \frac{1}{{\Delta^{5} }}\left[ {\begin{array}{*{20}c} {192} & { - 132} & {24} & {12} & 0 \\ { - 132} & {96} & { - 24} & { - 12} & {12} \\ {24} & { - 24} & {24} & { - 24} & {24} \\ {12} & { - 12} & { - 24} & {96} & { - 132} \\ 0 & {12} & {24} & { - 132} & {192} \\ \end{array} } \right], $$

where \( \Delta = \xi_{2} - \xi_{1} = \xi_{3} - \xi_{2} \). A naïve approach is to set \( \kappa_{1} = \kappa_{2} = 0 \) as in Shih and Emura (2018).
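The penalty matrix is easy to transcribe; the short R sketch below constructs \( \Omega \) from the display above and evaluates the roughness penalty \( {\mathbf{g}}^{{\prime }}\Omega {\mathbf{g}} \) for a hypothetical coefficient vector.

```r
# Penalty matrix Omega for equally spaced knots with spacing Delta = xi2 - xi1 = xi3 - xi2
Delta <- 1                                     # e.g., knots (1, 2, 3)
Omega <- (1 / Delta^5) * matrix(c(
   192, -132,  24,  12,    0,
  -132,   96, -24, -12,   12,
    24,  -24,  24, -24,   24,
    12,  -12, -24,  96, -132,
     0,   12,  24, -132, 192), nrow = 5, byrow = TRUE)
g <- c(0.5, 1, 2, 1, 0.5)                      # hypothetical spline coefficients
penalty <- drop(t(g) %*% Omega %*% g)          # integral of the squared second derivative of lambda_0(t)
```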

A more sophisticated approach is to choose \( (\kappa_{1} ,\kappa_{2} ) \) by optimizing a likelihood cross-validation (LCV) criterion (O'Sullivan 1988). Under the independence copula, the penalized log-likelihood is written as the sum of two marginal penalized log-likelihoods,

$$ \left[ {\ell_{1} ({\varvec{\upbeta}}_{1} ,\Lambda _{0} ) - \kappa_{1} \int {\ddot{\lambda }_{0} (t)^{2} dt} } \right] + \left[ {\ell_{2} ({\varvec{\upbeta}}_{2} ,\Gamma _{0} ) - \kappa_{2} \int {\ddot{\gamma }_{0} (t)^{2} dt} } \right], $$

where

$$ \begin{aligned} \ell_{1} ({\varvec{\upbeta}}_{1} ,\Lambda _{0} ) & = \sum\limits_{i} {\delta_{i} [{\varvec{\upbeta}}_{1}^{{\prime }} {\mathbf{x}}_{1i} + \log \lambda_{0} (t_{i} )]} - \sum\limits_{i} {\Lambda _{0} (t_{i} )\exp ({\varvec{\upbeta}}_{1}^{{\prime }} {\mathbf{x}}_{1i} )} , \\ \ell_{2} ({\varvec{\upbeta}}_{2} ,\Gamma _{0} ) & = \sum\limits_{i} {(1 - \delta_{i} )[{\varvec{\upbeta}}_{2}^{{\prime }} {\mathbf{x}}_{2i} + \log \gamma_{0} (t_{i} )]} - \sum\limits_{i} {\Gamma _{0} (t_{i} )\exp ({\varvec{\upbeta}}_{2}^{{\prime }} {\mathbf{x}}_{2i} )} . \\ \end{aligned} $$

We suggest choosing \( \kappa_{1} \) and \( \kappa_{2} \) based on the two marginal LCVs defined as

$$ LCV_{1} = \hat{\ell }_{1} - {\text{tr}}\{ \hat{H}_{PL1}^{ - 1} \hat{H}_{1} \} ,\quad LCV_{2} = \hat{\ell }_{2} - {\text{tr}}\{ \hat{H}_{PL2}^{ - 1} \hat{H}_{2} \} , $$

where \( \hat{\ell }_{1} \) and \( \hat{\ell }_{2} \) are the log-likelihood values evaluated at their marginal penalized likelihood estimates, \( \hat{H}_{PL1} \) and \( \hat{H}_{PL2} \) are the converged Hessian matrices of the marginal penalized log-likelihoods, and \( \hat{H}_{1} \) and \( \hat{H}_{2} \) are the converged Hessian matrices of the marginal log-likelihoods such that

$$ \hat{H}_{1} = \hat{H}_{PL1} + 2\kappa_{1} \left[ {\begin{array}{*{20}c} {O_{{p_{1} \times p_{1} }} } & {O_{{p_{1} \times 5}} } \\ {O_{{5 \times p_{1} }} } &\Omega \\ \end{array} } \right],\quad \hat{H}_{2} = \hat{H}_{PL2} + 2\kappa_{2} \left[ {\begin{array}{*{20}c} {O_{{p_{2} \times p_{2} }} } & {O_{{p_{2} \times 5}} } \\ {O_{{5 \times p_{2} }} } &\Omega \\ \end{array} } \right], $$

where \( O \) is a zero matrix and \( p_{j} \) is the dimension of \( {\varvec{\upbeta}}_{j} \) for \( j = 1 \) and 2. The values of \( (\kappa_{1} ,\kappa_{2} ) \) are obtained by maximizing \( LCV_{1} \) over \( \kappa_{1} \) and \( LCV_{2} \) over \( \kappa_{2} \) separately. One may apply the R function splineCox.reg in the joint.Cox R package to find the optimal value of \( \kappa_{1} \) (or \( \kappa_{2} \)).
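The LCV computation itself is a short calculation once the marginal fits are available. The sketch below illustrates a grid search for \( \kappa_{1} \); here pen.fit() is a hypothetical helper (not part of any package) that returns, for a given \( \kappa_{1} \), the marginal penalized likelihood fit with the converged Hessian \( \hat{H}_{PL1} \), the log-likelihood value \( \hat{\ell }_{1} \), and the dimension \( p_{1} \) of \( {\varvec{\upbeta}}_{1} \); Omega is as constructed above.

```r
# LCV_1 as a function of kappa_1, using the relation H_1 = H_PL1 + 2 * kappa_1 * blockdiag(O, Omega)
lcv1 <- function(kappa1) {
  fit <- pen.fit(kappa1)                               # hypothetical penalized-likelihood fit
  pad <- matrix(0, fit$p1 + 5, fit$p1 + 5)
  pad[(fit$p1 + 1):(fit$p1 + 5), (fit$p1 + 1):(fit$p1 + 5)] <- Omega
  H1 <- fit$H.PL1 + 2 * kappa1 * pad                   # Hessian of the unpenalized marginal log-likelihood
  fit$loglik1 - sum(diag(solve(fit$H.PL1, H1)))        # LCV_1 = l1.hat - tr(H_PL1^{-1} H_1)
}
kappa.grid <- 10^(-2:4)                                # candidate smoothing parameters
kappa1.hat <- kappa.grid[which.max(sapply(kappa.grid, lcv1))]
```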