
21.1 Introduction

For clinical trials with time-to-event data, proportional hazards (Cox 1972) is often assumed when comparing two treatment arms, and a single value of the hazard ratio is used to describe the group difference. When the proportionality assumption may not hold, a natural approach to assessing the time dependency of the treatment effect is to analyze the hazard ratio function. For example, a conventional method is to give a hazard ratio estimate over each of a few time intervals by fitting a piece-wise proportional hazards model. Alternatively, a “defined” time-varying covariate can be used in a Cox regression model, resulting in a parametric form for the hazard ratio function (e.g., Kalbfleisch and Prentice 2002, Chap. 6). With these approaches, it may not be easy to pre-specify the partition of the time axis or the parametric form of the hazard ratio function. Also, although the hazard ratio provides a nice display of the temporal pattern of the treatment effect, it may not directly translate to the survival experience. It is possible for the hazard ratio to be less than 1 in a region where there is no improvement in the survival probability, or more than 1 in a region where the survival probability is not reduced. Similar phenomena also exist for the average of the hazard ratio. Thus, to assess the cumulative treatment effect, other measures can be used to supplement the hazard ratio.

Let \(F_T(t)\) and \(F_C(t)\) be the cumulative distribution functions of the two comparison groups, named treatment and control, respectively. The failure probability ratio

$$\begin{aligned} RR(t)=\frac{F_T(t)}{F_C(t)}\end{aligned}$$

is the process version of relative risk, a measure often used in epidemiology. It directly indicates whether the failure probability in the time interval \((0, t]\) is lower in the treatment group than in the control group, regardless of the possible up and down pattern of the hazard ratio within \((0,t]\). Let \(\Lambda_T(t)\) and \(\Lambda_C(t)\) be the cumulative hazard functions of the two comparison groups, respectively. The ratio of cumulative hazards

$$\begin{aligned}\textit{CHR}(t)=\frac{\Lambda_T(t)}{\Lambda_C(t)}\end{aligned}$$

also indicates the cumulative treatment effect, taking value \(<1\) if and only if \(F_T(t)<F_C(t).\) Unlike the failure probability ratio, a value 0.8 for the ratio of cumulative hazards does not translate to a \(20~\%\) reduction of the failure probability. However, there is a nice property that if one adopts a proportional hazards adjustment for baseline covariates, then the ratio of cumulative hazards remains the same while the failure probability ratio depends on those covariates.

Measures such as the failure probability ratio and the ratio of cumulative hazards provide useful supplementary information in addition to the hazard ratio, and non-parametric estimators are easily available via the Nelson–Aalen estimator for the cumulative hazard function (Nelson 1969; Aalen 1975) and the Kaplan–Meier estimator of the survival function (Kaplan and Meier 1958). Nevertheless, the non-parametric inference procedures are not used frequently, as the estimates are often not very smooth and the confidence intervals can be quite wide near the beginning of the data range. In this chapter, we consider semiparametric inference on the two ratios under a sufficiently flexible model. Assume that the failure times are absolutely continuous. The short-term and long-term hazards model proposed in Yang and Prentice (2005) postulates that

$$\lambda_T(t)= \frac{1}{e^{-{\beta}_2}+\Big(e^{-{\beta}_1}- e^{-{\beta}_2}\Big)S_C(t)}\lambda_C(t),\ t<\tau_0,$$
(21.1)

where \({\beta}_1,\ {\beta}_2\) are scalar parameters, \(S_C\) is the survivor function of the control group, \(\lambda_T(t)\) and \(\lambda_C(t)\) are the hazard functions of the two groups, respectively, and

$$ \tau_0=\sup \left\{x: \int_0^x {\lambda}_C(t)dt <\infty\right\}.$$
(21.2)

Under this model, \(\lim_{t\downarrow 0} \lambda_T(t)/\lambda_C(t)= e^{\beta_1}, \lim_{t\uparrow \tau_0} \lambda_T(t)/\lambda_C(t)=e^{\beta_2}.\) Thus, various patterns of the hazard ratio can be realized, including proportional hazards, no initial effect, disappearing effect, and crossing hazards. In particular, model (21.1) includes the proportional hazards model and the proportional odds model as special cases. There is no need to pre-specify a partition of the time axis or a parametric form of the hazard ratio function. For this model, Yang and Prentice (2005) proposed a pseudo-likelihood method for estimating the parameters, and Yang and Prentice (2011) studied inference procedures on the hazard ratio function and the average of the hazard ratio function. Extension of model (21.1) to the regression setting was also studied for current status data in Tong et al. (2007).
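As a quick numerical check of these limits, the following sketch evaluates the hazard ratio implied by (21.1) as a function of the control survival probability. The function name `hazard_ratio` and the parameter values are illustrative assumptions, not taken from the chapter's analyses.

```python
import math

def hazard_ratio(s_c, beta1, beta2):
    """Hazard ratio lambda_T(t)/lambda_C(t) under model (21.1),
    written as a function of the control survival probability S_C(t)."""
    return 1.0 / (math.exp(-beta2)
                  + (math.exp(-beta1) - math.exp(-beta2)) * s_c)

# Illustrative values only: short-term hazard ratio 1.2, long-term 0.8.
beta1, beta2 = math.log(1.2), math.log(0.8)
for s_c in (1.0, 0.75, 0.5, 0.25, 0.0):
    # S_C = 1 corresponds to t -> 0, S_C = 0 to t -> tau_0.
    print(f"S_C = {s_c:.2f}   HR = {hazard_ratio(s_c, beta1, beta2):.4f}")
```

Setting `beta1 == beta2` gives a constant hazard ratio (proportional hazards), while `beta2 == 0` gives the proportional odds special case.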

In the sections to follow, we first obtain the estimates and point-wise confidence intervals of the two ratios under model (21.1). Since the ratios are functions of time, simultaneous confidence intervals, or confidence bands, of the ratios are more appropriate than the point-wise confidence intervals. We will employ a resampling scheme to obtain the confidence bands of the ratios. Such semiparametric inference procedures are applicable in a wide range of applications due to the properties of model (21.1) mentioned before. They will be illustrated through applications to data from two clinical trials.

Some previous work is related to the problems considered here. Dong and Matthews (2012) developed an empirical likelihood estimator for the ratio of cumulative hazards with covariate adjustment. Schaubel and Wei (2011) considered several measures under dependent censoring and non-proportional hazards, and constructed point-wise confidence intervals. In earlier works, Dabrowska et al. (1989) introduced a relative change function defined in terms of cumulative hazards and found simultaneous bands for this function under the assumption of proportional hazards. Parzen et al. (1997) constructed nonparametric simultaneous confidence bands for the survival probability difference. Cheng et al. (1997) proposed point-wise and simultaneous confidence interval procedures for the survival probability under semiparametric transformation models. McKeague and Zhao (2002) proposed simultaneous confidence bands for ratios of survival functions via the empirical likelihood method.

This chapter is organized as follows. In Sect. 21.2 the short-term and long-term hazard ratio model and the parameter estimator are described, and point-wise confidence intervals are established for the failure probability ratio and the ratio of cumulative hazards. In Sect. 21.3, confidence bands are developed. Simulation results are presented in Sect. 21.4. Applications to data from two clinical trials are given in Sect. 21.5. Some discussion is given in Sect. 21.6.

21.2 The Estimators and Point-Wise Confidence Intervals

Denote the pooled lifetimes of the two groups by \(T_1, \cdots,T_n\), with \(T_1, \cdots,T_{n_1},\ n_1<n,\) constituting the control group. Let \(C_1, \cdots,C_n\) be the censoring variables, and \(Z_i=I(i> n_1), i=1, \cdots, n\), where \(I(\cdot)\) is the indicator function. The available data consist of the independent triplets \((X_i,\delta_i, Z_i)\), \(i=1,\dots,n,\) where \(X_i =\min (T_i,C_i)\) and \(\delta_i=I(T_i\leq C_i).\) We assume that \(T_i\) and \(C_i\) are independent given \(Z_i\). The censoring variables (the \(C_i\)'s) need not be identically distributed, and in particular the two groups may have different censoring patterns. For \(t<\tau_0\) with \(\tau_0\) defined in (21.2), let R(t) be the odds function \(1/S_C(t)-1\) of the control group. The model of Yang and Prentice (2005) can be expressed as

$$\begin{aligned} \lambda_{i}(t)= \frac{1}{e^{-{\beta}_1Z_i}+ e^{-{\beta}_2Z_i} R(t)} \frac{dR(t)}{dt},\ i= 1, \dots, n, t<\tau_0,\end{aligned}$$

where \(\lambda_{i}(t)\) is the hazard function for \(T_i\) given \(Z_i\).

Under model (21.1), RR(t) and \(\textit{CHR}(t)\) depend on the parameter \({\bf \beta}=\Big({\beta}_1,{\beta}_2\Big)^T\) and the baseline function R(t), where \(``^T"\) denotes transpose. Yang and Prentice (2005) studied a pseudo likelihood estimator \({\hat{\bf \beta}}\) of \({\bf \beta}\), which we describe below.

Let \(\tau<\tau_0\) be such that

$$\lim_n \sum^n_{i=1}I\Big(X_i\geq \tau\Big)>0,$$
(21.3)

with probability 1. For \(t\leq \tau\), define

$$\begin{aligned}\nonumber \hat{P}(t;{\bf b})&=\prod_{s\leq t}\left(1- \frac{\sum^n_{i=1}\delta_ie^{-b_2Z_i} I(X_i=s)} {\sum^n_{i=1}I(X_i\geq s)}\right),\\ \nonumber \hat{R}(t;{\bf b})&=\frac{1}{\hat{P}(t;{\bf b})} {\int^t_0} \frac{\hat{P}_{-}(s;{\bf b})}{\sum^n_{i=1}I\Big(X_i\geq s\Big)} d\left(\sum^n_{i=1}\delta_ie^{-b_1Z_i} I\Big(X_i\leq s\Big)\right),\end{aligned}$$

where \(\hat{P}_-(s;{\bf b})\) denotes the left continuous (in s) version of \(\hat{P}(s;{\bf b}).\) Let \(L({\beta}, R)\) be the likelihood function of \({\beta}\) under model (21.1) when the function R(t) is known, with the corresponding score vector \(S({\beta}, R)=\partial \ln L({\beta}, R)/\partial{\beta}.\) Define \(Q({\bf b})=S({\bf b}, R)|_{R(t)={\hat{R}}(t;{\bf b})}.\) Then the pseudo maximum likelihood estimator \({\hat{\bf \beta}}=({\hat{\beta}_1}, {\hat{\beta}_2})^T\) of \({\bf \beta}\) is the zero of \(Q({\bf b})\). Note that the use of \({\hat{R}}(t;{\bf b})\) results in the estimating function \(Q({\bf b})\) which does not involve the infinite dimensional nuisance parameter R(t), thus the finite dimensional parameter β can be estimated much more easily.
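The two functions \(\hat{P}(t;{\bf b})\) and \(\hat{R}(t;{\bf b})\) are step functions that jump only at observed event times, so they can be evaluated with cumulative products and sums. The sketch below is one possible way to do this in Python with NumPy; the function name `p_r_hat` and the toy data are hypothetical. It assumes every factor in the product stays positive, which can fail for strongly negative `b2` or when the last observation is an event.

```python
import numpy as np

def p_r_hat(x, delta, z, b1, b2):
    """Evaluate P-hat(t; b) and R-hat(t; b) at the distinct event times.

    x: observed times, delta: event indicators, z: group indicators
    (0 = control, 1 = treatment). Returns (times, P-hat, R-hat).
    """
    x, delta, z = (np.asarray(a, dtype=float) for a in (x, delta, z))
    times = np.unique(x[delta == 1])
    at_risk = (x[None, :] >= times[:, None]).sum(axis=1)
    is_event_here = (x[None, :] == times[:, None]) * delta[None, :]
    jump1 = (is_event_here * np.exp(-b1 * z)[None, :]).sum(axis=1)
    jump2 = (is_event_here * np.exp(-b2 * z)[None, :]).sum(axis=1)
    p = np.cumprod(1.0 - jump2 / at_risk)          # product-limit form
    p_minus = np.concatenate(([1.0], p[:-1]))      # left-continuous version
    r = np.cumsum(p_minus * jump1 / at_risk) / p   # integral becomes a sum
    return times, p, r

# Toy data (hypothetical). With b = (0, 0), P-hat is the pooled
# Kaplan-Meier estimator and R-hat equals 1 / P-hat - 1.
x = [1, 2, 3, 4, 5, 6]
delta = [1, 0, 1, 1, 0, 0]
z = [0, 0, 0, 1, 1, 1]
times, p, r = p_r_hat(x, delta, z, 0.0, 0.0)
print(times, p, r)
```

The check with \({\bf b}=(0,0)^T\) is a useful sanity test: in that case the weights \(e^{-b_1Z_i}\) and \(e^{-b_2Z_i}\) are all 1, and the formula for \(\hat{R}\) telescopes to the odds of failure based on the pooled Kaplan–Meier estimator.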

Once \({\hat{\bf \beta}}\) is obtained, R(t) can be estimated by \(\hat{R}\Big(t;{\hat{\bf \beta}}\Big)\). Thus under model (21.1), plugging-in the estimators \({\hat{\bf \beta}}\) and \(\hat{R}\Big(t;{\hat{\bf \beta}}\Big)\), we can estimate the failure probability ratio RR(t) and the ratio of cumulative hazards \(\textit{CHR}(t)\) by

$$\widehat{RR}(t)=\frac{1+\hat{R}\Big(t;{\hat{\bf \beta}}\Big)}{\hat{R}\Big(t;{\hat{\bf \beta}}\Big)}\left(1- \Big\{1+e^{-{\hat{\bf \beta}}_2+{\hat{\bf \beta}}_1}\hat{R}\Big(t;{\hat{\bf \beta}}\Big)\Big\}^{-e^{{\hat{\bf \beta}}_2}}\right),$$
(21.4)

and

$$\widehat{\textit{CHR}}(t)=\frac{e^{{\hat{\bf \beta}}_2}\ln \Big\{1+e^{-{\hat{\bf \beta}}_2+{\hat{\bf \beta}}_1}\hat{R}\Big(t;{\hat{\bf \beta}}\Big)\Big\}} {\ln\Big\{1+\hat{R}\Big(t;{\hat{\bf \beta}}\Big)\Big\}},$$
(21.5)

respectively. Note that under the model and with the pseudo likelihood estimator, the distributions of the two groups share a common baseline function R(t) which is estimated using pooled data. Thus the resulting estimators for RR(t) and \(\textit{CHR}(t)\) are expected to be smoother and more stable than the nonparametric estimators. In Appendix A, we show that, under certain regularity conditions, the two estimators in (21.4) and (21.5) are strongly consistent under model (21.1). To study the distributional properties of the estimators, let

$$\begin{aligned} U_n(t)=\sqrt{n}{(\widehat{RR}(t)-RR(t))},\ t\leq\tau,\end{aligned}$$
$$\begin{aligned} V_n(t)=\sqrt{n}{(\widehat{\textit{CHR}}(t)- \textit{CHR}(t))},\ t\leq\tau,\end{aligned}$$

and

$$\begin{aligned} \Omega=\Big\{-\frac{1}{n} \frac{\partial Q ({\bf \beta})}{\partial{\bf \beta}}\Big\}^{-1}.\end{aligned}$$

Let \(\hat{\Omega}\) be an estimator of Ω defined by replacing \({\bf \beta}\) with \({\hat{\bf \beta}}\) and R(t) with \({\hat{R}}\Big(t;{\hat{\bf \beta}}\Big)\).
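The plug-in estimators (21.4) and (21.5) are explicit functions of \({\hat{\bf \beta}}\) and \(\hat{R}\). The following sketch evaluates them for given values of the baseline odds function; the function name `rr_chr` and the input values are hypothetical, and R is assumed to be strictly positive.

```python
import numpy as np

def rr_chr(r, beta1, beta2):
    """Failure probability ratio (21.4) and ratio of cumulative
    hazards (21.5) for baseline odds values r > 0."""
    r = np.asarray(r, dtype=float)
    # Cumulative hazards implied by model (21.1).
    cum_haz_t = np.exp(beta2) * np.log1p(np.exp(beta1 - beta2) * r)
    cum_haz_c = np.log1p(r)
    # F_T = 1 - exp(-Lambda_T) and F_C = r / (1 + r).
    rr = (1.0 + r) / r * (-np.expm1(-cum_haz_t))
    chr_ = cum_haz_t / cum_haz_c
    return rr, chr_

r = np.array([0.2, 0.6, 1.4])          # hypothetical baseline odds values
rr, chr_ = rr_chr(r, np.log(1.2), np.log(0.8))
print(rr, chr_)
```

Two special cases give a simple check: with \(\beta_1=\beta_2=0\) both ratios equal 1, and with \(\beta_1=\beta_2=\beta\) the ratio of cumulative hazards is the constant \(e^{\beta}\).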

In Appendix B we show that, for \(t\leq\tau\), the processes \(U_n\) and \(V_n\) are asymptotically equivalent to, respectively,

$$\begin{aligned} \tilde{U}_n(t)&= \frac{A_{RR}^T(t) \Omega}{\sqrt{n}} \left(\sum_{i\leq n_1}\int_0^\tau \mu_1dM_i +\sum_{i>n_1}\int_0^\tau \mu_2 dM_i\right)\\&\quad+\,\frac{B_{RR}(t)}{\sqrt{n}} \left(\sum_{i\leq n_1}\int_0^t \nu_1dM_i +\sum_{i>n_1}\int_0^t \nu_2dM_i\right)\end{aligned}$$
(21.6)

and

$$\begin{aligned} \tilde{V}_n(t)&= \frac{A_{\textit{CHR}}^T(t) \Omega}{\sqrt{n}} \left(\sum_{i\leq n_1}\int_0^\tau \mu_1dM_i +\sum_{i>n_1}\int_0^\tau \mu_2 dM_i\right)\\&\quad+\,\frac{B_{\textit{CHR}}(t)}{\sqrt{n}} \left(\sum_{i\leq n_1}\int_0^t \nu_1dM_i +\sum_{i>n_1}\int_0^t \nu_2dM_i\right),\end{aligned}$$
(21.7)

where \(A_{RR},\ A_{\textit{CHR}},\ \mu_1, \mu_2\) are appropriately defined 2 × 1 vector functions and \(B_{RR},\ B_{\textit{CHR}},\ \nu_1, \nu_2\) are scalar functions given in Appendix B. It will then be shown that \(U_n\) and \(V_n\) converge weakly to some zero-mean Gaussian processes \(U^*\) and \(V^*\), respectively. With estimators \(\hat{B}_{RR}(t), \hat{A}_{RR}(t),\) …, given in Appendix B, it will be shown that the limiting covariance functions of \(U^*\) and \(V^*\) can be consistently estimated, respectively, by

$$\begin{aligned} \nonumber\hat{\sigma}_{RR}(s,t) & = \hat{A}_{RR}^T(s) \hat{\Omega} \left(\int^\tau_0 \frac{\hat{\mu}_1(w)\hat{\mu}_1^T(w)K_1(w)d{\hat{R}}\Big(w;{\hat{\bf \beta}}\Big)} {n\Big(1+{\hat{R}}\Big(w;{\hat{\bf \beta}}\Big)\Big)}\right.\nonumber\\&\quad+ \left. \int^\tau_0 \frac{\hat{\mu}_2(w)\hat{\mu}_2^T(w)K_2(w)d{\hat{R}} \Big(w;{\hat{\bf \beta}}\Big)}{n\Big(e^{-{\hat{\bf \beta}}_1}+e^{-{\hat{\bf \beta}}_2}{\hat{R}}\Big(w;{\hat{\bf \beta}}\Big)\Big)}\right) \hat{\Omega}^T \hat{A}_{RR}(t)\nonumber\\&\quad+ \hat{B}_{RR}(s)\hat{B}_{RR}(t) \left(\int^s_0 \frac{\hat{\nu}_1^2(w)K_1(w)d{\hat{R}} \Big(w;{\hat{\beta}}\Big)} {n\Big(1+{\hat{R}}\Big(w;{\hat{\beta}}\Big)\Big)}\right.\nonumber\\&\quad+ \left.\int^s_0 \frac{\hat{\nu}_2^2(w)K_2(w)d{\hat{R}} \Big(w;{\hat{\beta}}\Big)} {n\Big(e^{-{\hat{\beta}}_1}+e^{-{\hat{\beta}}_2}{\hat{R}} \Big(w;{\hat{\beta}}\Big)\Big)}\right)\nonumber\\&\quad+ \hat{B}_{RR}(t) \hat{A}_{RR}^T(s)\hat{\Omega} \left(\int^t_0 \frac{\hat{\mu}_1(w) \hat{\nu}_1(w) K_1(w)d{\hat{R}}\Big(w,{\hat{\beta}}\Big)}{n\Big(1+{\hat{R}} \Big(w;{\hat{\beta}}\Big)\Big)}\right.\nonumber\\&\quad+ \left.\int^t_0 \frac{\hat{\mu}_2(w) \hat{\nu}_2(w) K_2(w)d{\hat{R}} \Big(w,{\hat{\beta}}\Big)}{n\Big(e^{-{\hat{\beta}}_1}+e^{-{\hat{\beta}}_2}{\hat{R}} \Big(w;{\hat{\beta}}\Big)\Big)}\right)\nonumber\\&\quad+ \hat{B}_{RR}(s) \hat{A}_{RR}^T(t)\hat{\Omega} \left(\int^s_0 \frac{\hat{\mu}_1(w) \hat{\nu}_1(w) K_1(w)d{\hat{R}}\Big(w,{\hat{\beta}}\Big)}{n\Big(1+{\hat{R}} \Big(w;{\hat{\beta}}\Big)\Big)} \right.\\&\quad+ \left.\int^s_0 \frac{\hat{\mu}_2(w) \hat{\nu}_2(w) K_2(w)d{\hat{R}}\Big(w,{\hat{\beta}}\Big)}{n\Big(e^{-{\hat{\beta}}_1} +e^{-{\hat{\beta}}_2}{\hat{R}}\Big(w;{\hat{\beta}}\Big)\Big)}\right),\end{aligned}$$
(21.8)

and

$$\begin{aligned} \nonumber\hat{\sigma}_{\textit{CHR}}(s,t) &= \hat{A}_{\textit{CHR}}^T(s) \hat{\Omega} \left(\int^\tau_0 \frac{\hat{\mu}_1(w)\hat{\mu}_1^T(w)K_1(w)d{\hat{R}}\Big(w;{\hat{\bf \beta}}\Big)} {n\Big(1+{\hat{R}}\Big(w;{\hat{\bf \beta}}\Big)\Big)}\right.\nonumber\\&\quad+ \left.\int^\tau_0 \frac{\hat{\mu}_2(w)\hat{\mu}_2^T(w)K_2(w)d{\hat{R}}\Big({w;{\hat{\bf \beta}}}\Big)} {n\Big(e^{-{\hat{\bf \beta}}_1}+e^{-{\hat{\bf \beta}}_2}{\hat{R}}\Big(w;{\hat{\bf \beta}}\Big)\Big)}\right) \hat{\Omega}^T \hat{A}_{\textit{CHR}}(t)\nonumber\\&\quad+ \hat{B}_{\textit{CHR}}(s)\hat{B}_{\textit{CHR}}(t) \left(\int^s_0 \frac{\hat{\nu}_1^2(w)K_1(w)d{\hat{R}}\Big(w;{\hat{\bf \beta}}\Big)} {n\Big(1+{\hat{R}}\Big(w;{\hat{\bf \beta}}\Big)\Big)}\right.\nonumber\\&\quad+ \left.\int^s_0 \frac{\hat{\nu}_2^2(w)K_2(w)d{\hat{R}}\Big(w;{\hat{\bf \beta}}\Big)} {n\Big(e^{-{\hat{\bf \beta}}_1}+e^{-{\hat{\bf \beta}}_2}{\hat{R}}\Big(w;{\hat{\bf \beta}}\Big)\Big)}\right)\nonumber\\&\quad+ \hat{B}_{\textit{CHR}}(t) \hat{A}_{\textit{CHR}}^T(s)\hat{\Omega} \left(\int^t_0 \frac{\hat{\mu}_1(w) \hat{\nu}_1(w) K_1(w)d{\hat{R}}\Big(w,{\hat{\bf \beta}}\Big)}{n\Big(1+{\hat{R}}\Big(w;{\hat{\bf \beta}}\Big)\Big)}\right.\nonumber\\&\quad+ \left.\int^t_0 \frac{\hat{\mu}_2(w) \hat{\nu}_2(w) K_2(w)d{\hat{R}}\Big(w,{\hat{\bf \beta}}\Big)}{n\Big(e^{-{\hat{\bf \beta}}_1}+e^{-{\hat{\bf \beta}}_2}{\hat{R}}\Big(w;{\hat{\bf \beta}}\Big)\Big)}\right)\nonumber\\&\quad+ \hat{B}_{\textit{CHR}}(s) \hat{A}_{\textit{CHR}}^T(t)\hat{\Omega} \left(\int^s_0 \frac{\hat{\mu}_1(w) \hat{\nu}_1(w) K_1(w)d{\hat{R}}\Big(w,{\hat{\bf \beta}}\Big)}{n\Big(1+{\hat{R}}\Big(w;{\hat{\bf \beta}}\Big)\Big)}\right.\\&\quad+ \left.\int^s_0 \frac{\hat{\mu}_2(w) \hat{\nu}_2(w) K_2(w)d{\hat{R}}\Big(w,{\hat{\bf \beta}}\Big)}{n\Big(e^{-{\hat{\bf \beta}}_1}+e^{-{\hat{\bf \beta}}_2}{\hat{R}}\Big(w;{\hat{\bf \beta}}\Big)\Big)}\right).\end{aligned}$$
(21.9)

The estimators \(\hat{\Omega}, \hat{A}_{RR}(t), \hat{A}_{\textit{CHR}}(t)\) involve the derivative vector \(\partial{\hat{R}}\Big(t;{\bf \beta}\Big)/\partial{\bf \beta}\) and the derivative matrix in Ω. From various simulation studies, these derivatives can be approximated by numerical derivatives for easier calculation, and the results are fairly stable with respect to the choice of the jump size in the numerical derivatives.

For a fixed \(t_0\leq \tau\), confidence intervals for \(RR\Big(t_0\Big)\) can be obtained from the asymptotic normality of \(\widehat{RR}(t_0)\) and the estimated variance \(\hat{\sigma}_{RR}\Big(t_0,t_0\Big)\). For better small sample behavior and to ensure that the confidence intervals remain on the positive side of the axis as usual, we make a logarithm transformation resulting in the asymptotic \(100(1-\alpha)\%\) confidence interval

$$\widehat{RR}(t_0){\rm exp}\left(\mp z_{\alpha/2} \frac{\sqrt{\hat{\sigma}_{RR}(t_0,t_0)}}{\sqrt{n}\widehat{RR}(t_0)}\right),$$
(21.10)

where \(z_{\alpha/2}\) is the \(100(1-\alpha/2)\)th percentile of the standard normal distribution.

Similarly, for \(\textit{CHR}(t_0)\), an asymptotic \(100(1-\alpha)\%\) confidence interval is

$$\widehat{\textit{CHR}}(t_0){\rm exp}\left(\mp z_{\alpha/2} \frac{\sqrt{\hat{\sigma}_{\textit{CHR}}(t_0,t_0)}}{\sqrt{n}\widehat{\textit{CHR}}(t_0)}\right).$$
(21.11)
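Intervals (21.10) and (21.11) share the same form, so a single helper can serve both ratios. The sketch below is illustrative: the function name `log_transformed_ci` and the numerical inputs are hypothetical, and the variance estimate \(\hat{\sigma}(t_0,t_0)\) is assumed to have been computed already from (21.8) or (21.9).

```python
import math
from statistics import NormalDist

def log_transformed_ci(estimate, sigma_tt, n, alpha=0.05):
    """Log-transformed confidence interval of the form (21.10)/(21.11).

    estimate: RR-hat(t0) or CHR-hat(t0), must be positive
    sigma_tt: estimated variance sigma-hat(t0, t0) of the sqrt(n)-scaled process
    n:        total sample size
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * math.sqrt(sigma_tt) / (math.sqrt(n) * estimate)
    return estimate * math.exp(-half_width), estimate * math.exp(half_width)

# Hypothetical inputs, for illustration only.
lower, upper = log_transformed_ci(estimate=0.8, sigma_tt=0.5, n=400)
print(lower, upper)
```

By construction the limits are always positive and are symmetric around the point estimate on the log scale, so their product equals the squared estimate.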

21.3 Confidence Bands

For simultaneous inference on RR(t) over a time interval \(I=[a,b] \subset [0,\tau]\), let \(w_n(t)\) be a data-dependent function that converges in probability to a bounded function \(w^*(t)>0\), uniformly in t over I. Then it follows that \(U_n/w_n\) converges weakly to \(U^*/w^*\). Let \(c_{\alpha}\) be the upper \(\alpha\)th percentile of \(\sup_{t\in I}|U^*/w^* |\). Then an asymptotic \(100(1-\alpha)\%\) simultaneous confidence band for \(RR(t), t\in I,\) can be obtained as

$$ \widehat{RR}(t){\rm exp}\left(\mp c_{\alpha} \frac{w_n(t)}{\sqrt{n}\widehat{RR}(t)}\right).$$
(21.12)

The analytic form of \(c_{\alpha}\) is quite intractable. The bootstrap provides a well-established alternative approach, but it is very time-consuming; this point is discussed further with the applications to clinical trial data in Sect. 21.5. Here we use a normal resampling approximation similar to the approach used in Lin et al. (1993). This approach results in substantial savings in computing time, and has been used in many works, including Lin et al. (1994), Cheng et al. (1997), Tian et al. (2005), and Peng and Huang (2007).

For \(t \leq \tau\), let \(N_i(t)=\delta_i I(X_i\leq t), i=1, \cdots,n,\) and define the process

$$\begin{aligned} \nonumber\hat{U}_n(t)&= \frac{\hat{A}_{RR}^T(t) \hat{\Omega}}{\sqrt{n}} \left(\sum_{i\leq n_1}\int_0^\tau \hat{\mu}_1 d(\epsilon_iN_i) +\sum_{i>n_1}\int_0^\tau \hat{\mu}_2 d(\epsilon_iN_i)\right)\\& \quad+\,\frac{\hat{B}_{RR}(t)}{\sqrt{n}} \left(\sum_{i\leq n_1}\int_0^t \hat{\nu}_1d(\epsilon_iN_i) +\sum_{i>n_1}\int_0^t \hat{\nu}_2d(\epsilon_iN_i)\right)\\&= \nonumber\frac{\hat{A}_{RR}^T(t) \hat{\Omega}}{\sqrt{n}} \left(\sum_{i\leq n_1}\epsilon_i \delta_i \hat{\mu}_1(X_i)I(X_i\leq \tau) +\sum_{i>n_1}\epsilon_i \delta_i \hat{\mu}_2(X_i)I(X_i\leq \tau)\right)\\&\quad+\,\frac{\hat{B}_{RR}(t)}{\sqrt{n}} \left(\sum_{i\leq n_1}\epsilon_i \delta_i \hat{\nu}_1 (X_i)I(X_i\leq t) +\sum_{i>n_1}\epsilon_i \delta_i \hat{\nu}_2(X_i)I(X_i \leq t)\right),\end{aligned}$$
(21.13)

where \(\epsilon_i,\ i=1,\dots, n,\) are independent standard normal variables that are also independent of the data. Conditional on \((X_i, \delta_i, Z_i), i=1,\dots, n,\) \(\hat{U}_n\) is a sum of n independent variables at each time point. In Appendix B, it will be shown that \(\hat{U}_n\) given the data converges weakly to \(U^*\). It follows that \(\sup_{t\in I}|\hat{U}_n(t)/w_n(t)|\) given the data converges in distribution to \(\sup_{t\in I}|U^*(t)/w^*(t) |\). Therefore, \(c_{\alpha}\) can be estimated empirically from a large number of realizations of the conditional distribution of \(\sup_{t\in I}|\hat{U}_n(t)/w_n(t) |\) given the data.
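The second expression in (21.13) shows that, given the data, \(\hat{U}_n(t)\) is a linear combination of the \(\epsilon_i\) with fixed coefficients, so the resampling step reduces to a matrix product. The sketch below assumes these coefficients have already been computed on a grid of time points in I and stored in an array `h` (a hypothetical name), where `h[i, k]` is the term multiplying \(\epsilon_i/\sqrt{n}\) at the k-th grid point. The function name `critical_value` is also hypothetical, and the supremum over I is approximated by a maximum over the grid.

```python
import numpy as np

def critical_value(h, w, alpha=0.05, n_draws=1000, seed=0):
    """Approximate c_alpha by normal resampling.

    h: (n, K) array of per-subject coefficients of epsilon_i / sqrt(n)
       at K grid points (assumed precomputed from the fitted model)
    w: (K,) array of positive weights w_n(t_k)
    """
    h = np.asarray(h, dtype=float)
    w = np.asarray(w, dtype=float)
    n = h.shape[0]
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_draws, n))      # one row per realization
    u_hat = eps @ h / np.sqrt(n)                 # (n_draws, K) realizations
    sup_stat = np.max(np.abs(u_hat / w), axis=1) # sup over the grid
    return np.quantile(sup_stat, 1.0 - alpha)

# Degenerate check: one grid point and unit coefficients make the
# statistic |N(0, 1)|, so the 95% critical value should be near 1.96.
c = critical_value(np.ones((50, 1)), np.ones(1), n_draws=20000, seed=1)
print(c)
```

Because only the normal multipliers are redrawn, the model does not need to be refitted for each realization, which is the source of the computational savings over the bootstrap.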

Similar to the considerations in Yang and Prentice (2011) for inference on the hazard ratio, we look at several choices of the weight \(w_n\). For \(w_n(t)=\sqrt{\hat{\sigma}_{RR}(t,t)}\) we obtain the equal precision bands (Nair 1984), which differ from point-wise confidence intervals only in using \(c_\alpha\) instead of \(z_{\alpha/2}\). For \(w_n(t)=1+\hat{\sigma}_{RR}(t,t)\) we obtain the Hall–Wellner type bands recommended by Bie et al. (1987). The simplest case \(w_n(t)\equiv 1\) does not require the computation of \(\hat{\sigma}_{RR}(t, t)\), and hence is easier to implement.

To obtain simultaneous confidence bands for \(\textit{CHR}(t)\), let

$$\begin{aligned}\nonumber\hat{V}_n(t)&= \frac{\hat{A}_{\textit{CHR}}^T(t) \hat{\Omega}}{\sqrt{n}} \left(\sum_{i\leq n_1}\int_0^\tau \hat{\mu}_1 d(\epsilon_iN_i) +\sum_{i>n_1}\int_0^\tau \hat{\mu}_2 d(\epsilon_iN_i)\right)\\& \quad+\,\frac{\hat{B}_{\textit{CHR}}(t)}{\sqrt{n}} \left(\sum_{i\leq n_1}\int_0^t \hat{\nu}_1d(\epsilon_iN_i) +\sum_{i>n_1}\int_0^t \hat{\nu}_2d(\epsilon_iN_i)\right)\\&= \nonumber\frac{\hat{A}_{\textit{CHR}}^T(t) \hat{\Omega}}{\sqrt{n}} \left(\sum_{i\leq n_1}\epsilon_i \delta_i \hat{\mu}_1(X_i)I(X_i\leq \tau) +\sum_{i>n_1}\epsilon_i \delta_i \hat{\mu}_2(X_i)I(X_i\leq \tau)\right)\\&\quad+\,\frac{\hat{B}_{\textit{CHR}}(t)}{\sqrt{n}} \left(\sum_{i\leq n_1}\epsilon_i \delta_i \hat{\nu}_1 (X_i)I(X_i\leq t) +\sum_{i>n_1}\epsilon_i \delta_i \hat{\nu}_2(X_i)I(X_i \leq t)\right),\end{aligned}$$
(21.14)

where \(\epsilon_i,\ i=1,\dots, n,\) are independent standard normal variables that are also independent of the data. Let \(\tilde{w}_n(t)\) be a data-dependent function that converges in probability to a bounded function \(\tilde{w}^*(t)>0\), uniformly in t over I. Let \(\tilde{c}_\alpha\) be the upper \(\alpha\)th percentile of \(\sup_{t\in [a, b]}|V^*(t)/\tilde{w}^*(t)|\). Similar to the argument above for RR(t), an asymptotic \(100(1-\alpha)\%\) simultaneous confidence band for \(\textit{CHR}(t), t\in I,\) can be obtained as

$$\widehat{\textit{CHR}}(t){\rm exp}\left(\mp \tilde{c}_\alpha \frac{\tilde{w}_n(t)}{\sqrt{n}\widehat{\textit{CHR}}(t)}\right),$$
(21.15)

where \(\tilde{c}_{\alpha}\) can be approximated empirically from a large number of realizations of the conditional distribution of \(\sup_{t\in [a, b]}|\hat{V}_n(t)/\tilde{w}_n(t) |\) given the data. For \(\tilde{w}_n(t)=\sqrt{\hat{\sigma}_{\textit{CHR}}(t,t)}\), \(\tilde{w}_n(t)=1+\hat{\sigma}_{\textit{CHR}}(t,t)\), and \(\tilde{w}_n(t)\equiv 1\), respectively, we obtain the equal precision, Hall–Wellner type, and unweighted confidence bands for \(\textit{CHR}(t).\)

21.4 Simulation Studies

For stable moderate sample behavior, we restrict the range of the confidence bands for both RR(t) and \(\textit{CHR}(t)\). The range is between the 40th percentile of the uncensored data at the lower end and the 95th percentile of the uncensored data at the upper end. The lower end point of this range seems a little high compared to other situations such as the inference on the hazard ratio in Yang and Prentice (2011). This is to provide a range in which the nonparametric procedures and the proposed model-based procedures in (21.10–21.12) and (21.15) are to be compared. Toward the beginning of the data range, the nonparametric estimates can be very unstable and the confidence intervals can be quite wide, as will be illustrated in the data example to follow. Also, compared with the hazard ratio as a measure of the temporal pattern of the treatment effect, RR(t) and \(\textit{CHR}(t)\) measure the cumulative treatment effect. Thus in biomedical research, there is little interest in their behaviors near the beginning of the data range. In various applications to clinical trial data, the specified range for the confidence bands is not nearly as restrictive as it seems and contains a meaningful interval of the data range. In the estimating procedures, the function \(\hat{P}(t;{\bf b})\) is replaced by an asymptotically equivalent form

$$\begin{aligned} {\rm exp}\left(- {\int^t_0} \frac{1}{\sum^n_{i=1}I\Big(X_i\geq s\Big)}d\left\{\sum^n_{i=1}\delta_i e^{-b_2Z_i}I\Big(X_i\leq s\Big)\right\}\right).\end{aligned}$$

For simulation studies reported here and for the real data application in Sect. 21.5, τ was set to include all data in calculating \({\hat{\bf \beta}}\). All numerical computations were done in Matlab. Some representative results of simulation studies are given in Table 21.1, where lifetime variables were generated with R(t) chosen to yield the standard exponential distribution for the control group. The values of \({\beta}\) were \((\log(.9), \log(1.2))\) and \((\log(1.2), \log(.8))\), representing \(1/3\) increase or decrease over time from the initial hazard ratio, respectively. The censoring variables were independent and identically distributed with the log-normal distribution, where the normal distribution had mean c and standard deviation 0.5, with c chosen to achieve various censoring rates. The data were split into the treatment and control groups by a 1:1 ratio. The empirical coverage probabilities were obtained from 1000 repetitions, and for each repetition, the critical values \(c_{\alpha}\) and \(\tilde{c}_{\alpha}\) were calculated empirically from 1,000 realizations of relevant conditional distributions. For both RR(t) and \(\textit{CHR}(t)\), the equal precision bands, Hall–Wellner type bands and unweighted bands are denoted by EP, HW, and UW respectively.
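One way to generate data of the kind described above is by inverse transform sampling, using the survival function \(S_T(t)=\{1+e^{\beta_1-\beta_2}R(t)\}^{-e^{\beta_2}}\) implied by model (21.1) with \(R(t)=e^t-1\). The Python sketch below is an illustration under these assumptions and is not the Matlab code used for Table 21.1; the function name `simulate_trial` and the value of `c` are hypothetical.

```python
import numpy as np

def simulate_trial(n, beta1, beta2, c, seed=0):
    """Simulate (X_i, delta_i, Z_i) under model (21.1).

    Control lifetimes are standard exponential, so R(t) = exp(t) - 1.
    Censoring times are log-normal with log-scale mean c and sd 0.5.
    """
    rng = np.random.default_rng(seed)
    z = (np.arange(n) >= n // 2).astype(int)    # 1:1 split, controls first
    u = 1.0 - rng.uniform(size=n)               # uniform on (0, 1]
    b1, b2 = beta1 * z, beta2 * z               # zero for the control group
    # Solve S(t) = u for the baseline odds R(t), then for t itself.
    r = np.exp(b2 - b1) * (u ** (-np.exp(-b2)) - 1.0)
    t = np.log1p(r)
    cens = rng.lognormal(mean=c, sigma=0.5, size=n)
    x = np.minimum(t, cens)
    delta = (t <= cens).astype(int)
    return x, delta, z

x, delta, z = simulate_trial(200, np.log(0.9), np.log(1.2), c=0.5, seed=1)
print("observed censoring rate:", 1.0 - delta.mean())
```

In practice `c` would be tuned, for example by trial and error on a large simulated sample, until the observed censoring rate matches the target.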

Table 21.1 Empirical coverage probabilities of the three types of confidence bands HW, EP, and UW, for the failure probability ratio RR and the ratio of cumulative hazards \(\textit{CHR}\), under model (21.1), based on 1000 repetitions

Note that with 1,000 repetitions and \(1.96\sqrt{.95\cdot 0.05/1000}=0.0135\), we expect the empirical coverage probabilities to be mostly greater than 0.9365. In Table 21.1, for RR, the empirical coverage probabilities are greater than 0.9365 for all but one case with the smallest sample size n = 100 and at \(50~\%\) censoring. For CHR, the confidence bands are mostly conservative, with all empirical coverage probabilities greater than 0.95. One plausible explanation for this conservative phenomenon could be that the estimate for \(\textit{CHR}(t)\) is more directly related to the martingales associated with censored data, resulting in better approximations.
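The Monte Carlo margin quoted above can be reproduced directly. The short sketch below uses a hypothetical helper name, `coverage_margin`, and simply evaluates the normal-approximation formula from the text.

```python
import math

def coverage_margin(nominal=0.95, reps=1000, z=1.96):
    """Half-width of the normal-approximation interval for an empirical
    coverage probability based on `reps` independent repetitions."""
    return z * math.sqrt(nominal * (1.0 - nominal) / reps)

margin = coverage_margin()
threshold = 0.95 - margin
print(round(margin, 4), round(threshold, 4))  # 0.0135 0.9365
```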

Fig. 21.1

95 % point-wise confidence intervals and simultaneous confidence bands of the failure probability ratio for the WHI VTE data: Outside red solid lines—equal precision confidence band, magenta dash-dotted lines—Hall–Wellner confidence band, outside cyan dashed lines—unweighted confidence band, dotted lines—95 % point-wise confidence intervals, central black solid line—the estimated failure probability ratio under the model, central green dashed line—the estimated failure probability ratio using Kaplan–Meier estimators

21.5 Applications

For the Women’s Health Initiative (WHI) randomized controlled trial of combined (estrogen plus progestin) postmenopausal hormone therapy, an elevated coronary heart disease risk was reported, with an overall unfavorable balance of health benefits versus risks over an average 5.6-year study period (Writing Group 2002; Manson et al. 2003). After controlling for time from estrogen-plus-progestin initiation and confounding, hazard ratio estimates still indicate elevated risk of coronary heart disease and venous thromboembolism early on during the trial, under a piece-wise Cox model assuming a constant hazard ratio separately on 0–2 years, 2–5 years, and 5+ years (Prentice et al. 2005). Let us first illustrate the methods developed in the previous sections with the venous thromboembolism (VTE) data from the WHI clinical trial. Among the 16,608 postmenopausal women (\(n_1=8102\)), there were 167 and 76 events observed in the treatment and control groups, respectively, implying about \(98.5~\%\) censoring, primarily by the trial stopping time. Fitting model (21.1) to this data set, we get \({\hat{\bf \beta}}=(4.72, 0.014)^T\). Plots of the model-based survival curves and the Kaplan–Meier curves for the two groups show that the model is reasonable. For RR(t), the three 95 % simultaneous confidence bands (EP, HW, and UW) under model (21.1) are given in Fig. 21.1, together with the point estimates. The nonparametric point estimates are also included to compare with the model-based estimates. Furthermore, model-based 95 % point-wise confidence intervals are included as well, to indicate by how much the confidence intervals are widened in going from point-wise to simultaneous coverage. From Fig. 21.1, it can be seen that the Hall–Wellner type band and the equal precision band are almost the same a little after the 4th year. However, the Hall–Wellner type band is noticeably wider toward the beginning of the data range. The unweighted band maintains a roughly constant width through the data range considered, which is roughly as wide as the equal precision band at the beginning of the data range, but wider throughout the rest of the data range. Similar phenomena are often seen in additional applications not reported here. Based on various applications and simulation studies, we recommend that the equal precision band be used in making inference on RR(t) under model (21.1).

For \(\textit{CHR}(t)\), the 95 % point-wise confidence intervals and confidence bands under model (21.1) are given in Fig. 21.2. As in the case of RR(t), the equal precision band is preferred in making inference on \(\textit{CHR}(t)\) under model (21.1). From Figs. 21.1 and 21.2, there is evidence that from 2.5 to 7.5 years, the event probability is higher in the treatment group than in the control group.

Fig. 21.2

95 % point-wise confidence intervals and simultaneous confidence bands of the ratio of cumulative hazards for the WHI VTE data: Outside red solid lines—equal precision confidence band, magenta dash-dotted lines—Hall–Wellner confidence band, outside cyan dashed lines—unweighted confidence band, dotted lines—95 % point-wise confidence intervals, central black solid line—the estimated failure probability ratio under the model, central green dashed line—the estimated failure probability ratio using Kaplan–Meier estimators

For comparison, the 95 % point-wise confidence intervals and equal precision confidence band for the hazard ratio under model (21.1) are obtained using the methods of Yang and Prentice (2011) and given in Fig. 21.3. The results are in good agreement with the results under the piece-wise Cox model used in Prentice et al. (2005). In an interval near the beginning of the data range, there is a greater hazard of venous thromboembolism in the treatment group than in the control group. This interval is shorter than the intervals in Figs. 21.1 and 21.2 where the treatment group has a higher event probability than the control group.

Fig. 21.3

95 % point-wise confidence intervals and simultaneous confidence bands of the hazard ratio function for the WHI VTE data: Red solid lines—equal precision confidence band, blue dash-dotted lines—95 % point-wise confidence intervals, dotted line—the estimated hazard ratio function

Note that the simple bootstrap method for approximating \(c_\alpha\) and \(\tilde{c}_\alpha\), when \(w_n\equiv 1\) and \(\tilde{w}_n\equiv 1\) respectively, is already much more computationally intensive than the normal resampling approximation employed here. With \(w_n(t)=\sqrt{\hat{\sigma}_{RR}(t,t)}\) and \(\tilde{w}_n(t)=\sqrt{\hat{\sigma}_{\textit{CHR}}(t,t)}\), the bootstrap method would require one more level of bootstrap samples to obtain the estimated variance functions, thus further increasing the computational burden. In comparison, once \(\hat{\sigma}_{RR}(t,t)\) and \(\hat{\sigma}_{\textit{CHR}}(t,t)\) are obtained, the normal resampling approximation needs only a small additional computation and programming cost.

To see how the nonparametric procedures compare with the proposed model-based procedures, Fig. 21.4 presents 95 % point-wise confidence intervals, both model-based and nonparametric, together with the point estimates, of \(\textit{CHR}(t)\) for the VTE data from WHI. It can be seen that the nonparametric estimates and confidence intervals can be quite unstable near the beginning of the data range. As \(t \downarrow 0\), the hazard ratio at t and \(\textit{CHR}(t)\) should both approach the same limit, which is \(e^{\beta_1}\) under the model. From Fig. 21.4, the model-based estimator of \(\textit{CHR}(t)\) near t = 0 takes values around 5, which is comparable to results in the literature, while the nonparametric estimator of \(\textit{CHR}(t)\) near t = 0 takes much more extreme values. Also, the model-based estimates and confidence intervals are smoother throughout, and the confidence intervals are often narrower than their nonparametric counterparts. Similar phenomena are also present for RR(t) (omitted). This is a major reason that the nonparametric estimates for RR(t) and \(\textit{CHR}(t)\) are rarely used in biomedical studies.
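The common limit at the origin can be seen from L'Hôpital's rule. As a sketch, assume that hazard functions \(\lambda_T(t)\) and \(\lambda_C(t)\) exist near 0 (this notation is introduced here for illustration), that \(\lambda_C(t)>0\) there, and that the hazard ratio has a limit as \(t \downarrow 0\). Since \(\Lambda_T(0)=\Lambda_C(0)=0\),

$$\begin{aligned} \lim_{t\downarrow 0}\textit{CHR}(t)=\lim_{t\downarrow 0}\frac{\Lambda_T(t)}{\Lambda_C(t)}=\lim_{t\downarrow 0}\frac{\lambda_T(t)}{\lambda_C(t)}.\end{aligned}$$

Under the same assumptions, \(F_T(t)=1-e^{-\Lambda_T(t)}\) and \(F_C(t)=1-e^{-\Lambda_C(t)}\) give the same limit for RR(t).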

Fig. 21.4

Model-based and nonparametric 95 % point-wise confidence intervals of the ratio of cumulative hazards for the WHI VTE data: Outside red solid lines—model based 95 % point-wise confidence intervals, outside blue dashed lines—nonparametric 95 % point-wise confidence intervals, central magenta solid line—model based estimate of the ratio of cumulative hazards, central blue dashed line—nonparametric estimate of the ratio of cumulative hazards
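The instability of a nonparametric estimate near the start of follow-up is easy to reproduce with a generic estimator. The Python sketch below takes the ratio of Nelson–Aalen cumulative hazard estimates. That choice of estimator and the function names `nelson_aalen` and `chr_nonparametric` are assumptions made for illustration; they are not necessarily what underlies Fig. 21.4.

```python
import numpy as np


def nelson_aalen(times, events, grid):
    """Nelson-Aalen cumulative hazard estimate evaluated on `grid`.

    times  : observed follow-up times (event or censoring).
    events : 1 if the time is an event, 0 if censored.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    # Number still at risk just before each distinct event time.
    at_risk = np.array([(times >= t).sum() for t in event_times])
    # Number of events at each distinct event time.
    deaths = np.array([(times[events] == t).sum() for t in event_times])
    cumhaz = np.concatenate(([0.0], np.cumsum(deaths / at_risk)))
    # Right-continuous step function: count event times <= grid point.
    idx = np.searchsorted(event_times, grid, side="right")
    return cumhaz[idx]


def chr_nonparametric(t_trt, e_trt, t_ctl, e_ctl, grid):
    """Ratio of Nelson-Aalen estimates; NaN where the control estimate is 0."""
    num = nelson_aalen(t_trt, e_trt, grid)
    den = nelson_aalen(t_ctl, e_ctl, grid)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den > 0, num / den, np.nan)
```

Early in follow-up, both the numerator and the denominator are built from a handful of events, so a single event can move the ratio sharply, and the ratio is undefined before the first control-group event.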

Next, we look at an example with mild violation of the proportional hazards assumption. The Digitalis Investigation Group (DIG) trial (The Digitalis Investigation Group 1997) was a randomized, double-blind clinical trial on the effect of digoxin on mortality and hospitalization. In the main trial, patients with left ventricular ejection fraction of 0.45 or less were randomized to digoxin (3397 patients) or placebo (3403 patients) in addition to diuretics and angiotensin-converting-enzyme inhibitors. We look at the data on death attributed to worsening heart failure. For testing the proportional hazards assumption, the acceleration test statistic of Breslow et al. (1984) gives a p-value of 0.098. This indicates a mild violation of proportionality. For RR(t), the 95 % point-wise confidence intervals and confidence bands under model (21.1) are given in Fig. 21.5. Possibly because the violation of the proportionality assumption is only mild, the Hall–Wellner type band, the equal precision band and the unweighted band are almost the same over the entire data range considered. From Fig. 21.5, there is evidence that the treatment reduces the event probability over the range of 1.5 to 3 years.

Fig. 21.5

95 % point-wise confidence intervals and simultaneous confidence bands of the failure probability ratio for the DIG data. Outside red solid lines: equal precision confidence band; magenta dash-dotted lines: Hall–Wellner confidence band; outside cyan dashed lines: unweighted confidence band; dotted lines: 95 % point-wise confidence intervals; central black solid line: the estimated failure probability ratio under the model; central green dashed line: the estimated failure probability ratio using Kaplan–Meier estimators

For \(\textit{CHR}(t)\), the 95 % point-wise confidence intervals and confidence bands under model (21.1) are given in Fig. 21.6. Again, all three confidence bands are very close to each other. From Fig. 21.6, there is evidence of reduced event probability in the treatment group over the range of 1.3 to 3 years.

Fig. 21.6

95 % point-wise confidence intervals and simultaneous confidence bands of the ratio of cumulative hazards for the DIG data. Outside red solid lines: equal precision confidence band; magenta dash-dotted lines: Hall–Wellner confidence band; outside cyan dashed lines: unweighted confidence band; dotted lines: 95 % point-wise confidence intervals; central black solid line: the estimated ratio of cumulative hazards under the model; central green dashed line: the estimated ratio of cumulative hazards using Kaplan–Meier estimators

Again for comparison, the 95 % point-wise confidence intervals and equal precision confidence band for the hazard ratio under model (21.1), obtained following Yang and Prentice (2011), are given in Fig. 21.7. From Fig. 21.7, there is evidence that from 0 to 0.75 year the treatment group has a reduced hazard of death attributed to worsening heart failure. Note that this range is much narrower than the range where there is evidence of reduced event probability in the treatment group, as seen from Figs. 21.5 and 21.6.

Fig. 21.7

95 % point-wise confidence intervals and simultaneous confidence bands of the hazard ratio function for the DIG data: Red solid lines—equal precision confidence band, blue dash-dotted lines—95 % point-wise confidence intervals, dotted line—the estimated hazard ratio function

21.6 Discussion

We have studied the asymptotic properties of the estimators for the failure probability ratio and the ratio of cumulative hazards under a semiparametric model applicable to a sufficiently wide range of applications. Point-wise confidence intervals and confidence bands are developed for the two ratios. In simulation studies, the confidence bands perform well for moderate sample sizes. Among the confidence bands with different weights, the equal precision confidence band is recommended based on various simulation studies and clinical trial data applications. Inference procedures can be developed similarly for the odds ratio. The point-wise confidence intervals and confidence bands for the odds ratio are usually wider than the corresponding intervals and bands for the failure probability ratio and the ratio of cumulative hazards. Due to space limitations, those results are not presented here. When the censoring is heavy, there is very little difference among the confidence intervals and bands for the failure probability ratio, the ratio of cumulative hazards, and the odds ratio. The confidence intervals and bands presented here provide good visual tools for assessing the cumulative effect of the treatment. They can supplement the visual tools based on the hazard ratio, which focuses on the temporal pattern of the treatment effect. It is also of interest to extend the results here by adjusting for covariates via a regression analysis. These and other problems are worthy of further exploration.