Introduction

Various metrics for tumor growth inhibition (TGI) have been shown to predict overall survival (OS) in solid tumors and in multiple myeloma. These metrics include both continuous and categorical measures of change in tumor size at first or second post-baseline assessment, as well as model-based descriptors of TGI (e.g., growth rate or time to tumor (re)growth, reviewed in Ref. [1]). These measures offer the potential to be used as alternative endpoints early in clinical studies to support end-of-Phase II decisions and Phase III study design using model-based simulation approaches [1, 2].

In patients with metastatic renal cell carcinoma (mRCC), estimates of tumor growth rate were slower in patients treated with sunitinib compared with interferon-α (IFN-α) and growth rate correlated with OS [3]. Everolimus dose–response was assessed in the Phase III RECORD-1 trial by means of a longitudinal exposure response model [4]. In this same trial, growth of non-target lesions and appearance of new lesions were predictive of OS in a multivariate Cox model [5]. Maitland et al. [6] analyzed the longitudinal tumor size data from a Phase III trial of sorafenib and evaluated the operational characteristics of a drug effect parameter estimate in randomized Phase II trials between sorafenib and hypothetical comparators. However, all these efforts lack a quantitative framework to predict OS based on tumor growth inhibition [1] that could further be used to support the design and analysis of upcoming Phase II and Phase III studies.

The objectives of this work were (1) to leverage historical data and assess the quantitative link between TGI and OS based on data collected in a variety of Phase II and Phase III studies of temsirolimus, sunitinib or axitinib and (2) to identify TGI thresholds that are predictive of expected OS benefit which could be used as targets to support early decisions (at end of Phase II or at an interim point of a Phase III clinical trial).

Methods

Trials and data

The data came from patients with mRCC who received treatments with a variety of mechanisms of action, including a cytokine (IFN-α), an mTOR inhibitor (temsirolimus), or vascular endothelial growth factor receptor inhibitors (sunitinib, sorafenib or axitinib). The data were collected from ten Phase II or Phase III studies summarized in Table 1. Study protocols and results are described in the respective publications [7–18].

Table 1 Characteristics of the studies

These studies were conducted to support the development and registration of temsirolimus [7], sunitinib [8–13] and axitinib [14–18].

Patient characteristics

The following baseline patient characteristics were available in all studies and tested as prognostic factors: age, sex, race, Eastern Cooperative Oncology Group (ECOG) performance status (PS), tumor burden (sum of longest diameters, number of metastatic sites, presence of lung or liver metastases), disease history (time from diagnosis, prior nephrectomy, line of therapy), and hemoglobin, lactate dehydrogenase (LDH) and corrected calcium levels.

These characteristics include the well-described prognostic factors, e.g., those used in the Memorial Sloan–Kettering Cancer Center (MSKCC) prognostic score [19] (PS, LDH, calcium, hemoglobin and prior nephrectomy) or the three factors recently proposed by Bamias et al. [20] (PS, time from diagnosis and number of metastatic sites).

Tumor growth inhibition metrics

Patient-level TGI metrics were obtained using parameter estimates derived from longitudinal tumor size models. To be evaluable for TGI modeling, a patient needed to have a baseline and at least one post-baseline tumor size measurement (sum of longest diameters of target lesions according to RECIST version 1.0 [21]). Three alternative TGI models previously proposed by Stein (bi-exponential model) [22, 23], Wang (linear growth-exponential shrinkage model) [24] or Claret (simplified TGI (sTGI) model) [23] were fitted to the data from each study separately and from the pooled studies using the FOCE algorithm in PHOENIX NLME [25]. Model equations are given in Supplementary Materials. The models were compared using the Akaike information criterion (AIC) [26] and standard goodness-of-fit plots. These models were not subjected to any simulation-based assessment, as they were not meant to be used for simulation but to estimate the TGI metrics for use in the OS model.

Three TGI metrics were calculated based on individual post hoc parameter estimates. Two were calculated using the sTGI model [23]: early tumor shrinkage (ETS), the ratio of tumor size (Y) at week x (x = 8, 10 or 12) to baseline tumor size (Y0), and the time to tumor growth (TTG), which depends on the estimates of growth rate (KL), kill rate (KD) and resistance appearance rate (λ).

$${\text{week}}\,x\,{\text{ETS}}_{i} = \frac{{Y_{{{\rm{week}}x, i}}}}{{Y_{0i}}}$$
$${\text{TTG}}_{i} = \frac{{{\rm{log}}\left({{\rm{KD}}_{i}} \right) - {\rm{log}}\left( {{\rm{KL}}_{i}}\right)}}{{\lambda_{i}}}$$

The third metric was the growth rate, G, estimated using the biexponential model [22]. Note TTG is negative when KL > KD, i.e., in patients with progression at first post-baseline visit. TTG can unequivocally be calculated for each and every patient based on the model parameter estimates. It is not a time-varying covariate and can be obtained independently of patient survival. Some practical aspects of using TTG were recently discussed [27].
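As an illustration of the two sTGI-based metrics, the sketch below computes week 8 ETS and TTG for one patient. It assumes the commonly cited closed form of the simplified TGI model, Y(t) = Y0·exp(KL·t − (KD/λ)·(1 − exp(−λ·t))); the actual model equations are in the Supplementary Materials and may differ. The function names, the time unit (weeks) and the parameter values are hypothetical and are not taken from the study.

```python
import math

def stgi_tumor_size(t, y0, kl, kd, lam):
    """Tumor size at time t under the ASSUMED sTGI closed form (not confirmed by the text)."""
    return y0 * math.exp(kl * t - (kd / lam) * (1.0 - math.exp(-lam * t)))

def ets(week, y0, kl, kd, lam):
    """Early tumor shrinkage: ratio of tumor size at the given week to baseline."""
    return stgi_tumor_size(week, y0, kl, kd, lam) / y0

def ttg(kl, kd, lam):
    """Time to tumor growth: (log(KD) - log(KL)) / lambda, as in the TTG equation."""
    return (math.log(kd) - math.log(kl)) / lam

# Hypothetical individual post hoc estimates (rates per week); illustrative only.
params = dict(kl=0.005, kd=0.04, lam=0.05)
week8_ets = ets(8, y0=100.0, **params)
ttg_weeks = ttg(**params)

# A patient with KL > KD has a negative TTG (progression from the start).
ttg_progressor = ttg(kl=0.05, kd=0.04, lam=0.05)
```

Under this assumed form, TTG is the time at which the modeled tumor size reaches its minimum, which is why it can be computed from the parameter estimates alone.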

Survival model development

A covariate-free parametric survival regression model (using the survreg function in R version 2.15.0) was applied to the OS data in order to identify the probability density function that best described the observed survival times. The most suitable distribution was selected among normal, lognormal, Weibull, logistic, log-logistic, extreme and exponential according to the AIC values and goodness-of-fit plots.

Patient baseline characteristics and the TGI metrics listed above were tested to explain variability in OS. Covariate effects on OS were first screened using Kaplan–Meier (nonparametric) and Cox (semiparametric) regression analyses (using the survfit and coxph functions, respectively, in R version 2.15.0).

A “full” model was built by including all significant covariates from the Cox univariate analysis (p < 0.05 per the log-likelihood ratio test). A backward stepwise elimination was subsequently employed. At each elimination step, the relative influence of each remaining covariate on the model was re-evaluated by deleting it from the reduced model on an individual basis using a cutoff of p < 0.01. As the tumor size metrics were highly correlated, a separate backward stepwise elimination was conducted for each metric.
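The elimination loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each covariate contributes one degree of freedom to the likelihood ratio test, and it takes a hypothetical `loglik` callable that returns the maximized log-likelihood for a given covariate subset (in practice this would come from refitting the survival model). The covariate names and log-likelihood gains in the toy example are invented.

```python
import math

def lrt_pvalue_1df(loglik_full, loglik_reduced):
    """P-value of a 1-df likelihood ratio test (chi-square survival function via erfc)."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

def backward_eliminate(covariates, loglik, alpha=0.01):
    """Drop the least significant covariate until every remaining one has p < alpha."""
    kept = list(covariates)
    while kept:
        full = loglik(kept)
        pvals = {
            c: lrt_pvalue_1df(full, loglik([k for k in kept if k != c]))
            for c in kept
        }
        worst = max(pvals, key=pvals.get)
        if pvals[worst] < alpha:
            break  # all remaining covariates pass the cutoff
        kept.remove(worst)
    return kept

# Toy stand-in for model refitting: invented additive log-likelihood gains.
GAINS = {"ECOG": 40.0, "LDH": 25.0, "age": 1.0, "sex": 0.2}

def toy_loglik(subset):
    return -10000.0 + sum(GAINS[c] for c in subset)

selected = backward_eliminate(GAINS, toy_loglik)
```

Because the tumor size metrics were highly correlated, a loop of this kind would be run once per TGI metric, each time with that metric and the baseline covariates as candidates.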

The model performances were evaluated using a posterior predictive check (PPC). OS distributions and hazard ratios (HR) were simulated 1000 times for the patients, as in the original studies. Model parameters were sampled from the estimated mean values and the uncertainty in parameter estimates for each simulated study replication. Censoring was replicated by sampling patient study duration from a uniform distribution of 0–1500 days consistent with the observed censored OS data.
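The censoring step of the predictive check can be sketched as below, assuming lognormally distributed survival times and a uniform study duration of 0–1500 days. The sketch draws a single replicate with fixed parameters; it omits the sampling of model parameters from their uncertainty distribution that each replication would also require. The function name and the values of `mu` and `sigma` are hypothetical.

```python
import math
import random

def simulate_os(mu, sigma, n, max_follow_up=1500.0, rng=None):
    """Simulate n (time, event) pairs: lognormal OS with uniform administrative censoring."""
    rng = rng or random.Random(0)
    records = []
    for _ in range(n):
        t_event = rng.lognormvariate(mu, sigma)      # latent survival time, days
        t_censor = rng.uniform(0.0, max_follow_up)   # patient study duration, days
        observed = t_event <= t_censor               # True = death observed
        records.append((min(t_event, t_censor), observed))
    return records

# Hypothetical parameters: median survival of 600 days on the log scale.
replicate = simulate_os(mu=math.log(600.0), sigma=1.0, n=500, rng=random.Random(42))
event_fraction = sum(event for _, event in replicate) / len(replicate)
```

Repeating this over many replicates and comparing the simulated Kaplan–Meier curves with the observed one gives the prediction intervals used in the check.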

Simulations

Simulations were performed to assess the relationship between the expected effect size in OS, i.e., the HR of an investigational treatment versus standard of care (SOC) (assumed to be sunitinib), and the difference in the effect size in the TGI metric (e.g., week 8 ETS). Multiple replications (n = 1000) of virtual Phase III studies comparing an investigational treatment (i.e., a new agent or a combination) to SOC were simulated using the final ETS vs OS model. Patient characteristics were reflective of those observed in the sunitinib first-line study [9, 10]. The power of the Phase III studies (i.e., the probability of HR ≤ target, e.g., 0.80) was also calculated conditional on the difference in week 8 ETS. This setting mimics the calculation of expected HR that could be performed as soon as tumor size data become available to estimate the TGI metric, and could support end-of-Phase II decisions or an interim analysis of a Phase III study.
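Given the HRs from the replicated virtual studies at one value of the week 8 ETS difference, the summary statistics can be computed as in the sketch below. The replicate HRs here are placeholder draws from a log-normal distribution, not output of the OS model, and the helper names are hypothetical; the nearest-rank percentile is one of several possible definitions.

```python
import math
import random
import statistics

def probability_of_technical_success(hazard_ratios, target=0.80):
    """Fraction of simulated trials whose HR is at or below the target."""
    return sum(hr <= target for hr in hazard_ratios) / len(hazard_ratios)

def percentile(values, pct):
    """Nearest-rank percentile for pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Placeholder replicate HRs (1000 virtual studies); values are illustrative only.
rng = random.Random(1)
simulated_hrs = [math.exp(rng.gauss(math.log(0.75), 0.10)) for _ in range(1000)]

median_hr = statistics.median(simulated_hrs)
hr_p80 = percentile(simulated_hrs, 80)
pts = probability_of_technical_success(simulated_hrs, target=0.80)
```

Reading the 80th percentile of the simulated HRs against the target is equivalent to asking whether the probability of technical success reaches 80 % for that target.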

Results

Tumor size model and TGI metrics

Of the 2628 patients included in the ten clinical studies, 2552 (97.1 %) were evaluable for tumor size modeling. The best fit of the data was obtained with the sTGI model, followed by the Stein and Wang models, for each of the studies and on the pooled database (Akaike criterion of 64314, 65600 and 66990 for the three models, respectively). Parameter estimates as well as diagnostic plots for both the sTGI and bi-exponential models are given in the supplementary material (Supplementary Tables 1, 2; Figures 1, 2).

Overall survival model

In the univariate Cox analysis (Supplementary Table 3), TTG (implemented using log(TTG)) was the most significant TGI metric (score of 285) followed by log(G) (score of 229) and ETS (score of 136–140). Most baseline prognostic factors (11 of 14 tested) were also significant. A lognormal distribution most appropriately characterized the OS distribution, as previously observed for NSCLC [24]. Backward elimination of the covariates retained in the Cox univariate analysis led to the selection of one TGI metric and of identical baseline prognostic factors regardless of the TGI metric in the model: ECOG, hemoglobin, corrected calcium, LDH, number of metastases and time from diagnosis. The OS model with log(TTG) best described the data (log likelihood of −9921); however, ETS models also demonstrated good performance in the PPC. The model based on ETS at week 8 (log likelihood of −10017) was preferred and retained for further simulations, as week 8 ETS has the potential to constitute an early biomarker of treatment effect and offers the ability to pre-specify the time point in a prospectively planned interim data analysis. Parameter estimates of the OS models are given in Table 2 (week 8 ETS) and in Supplementary Table 4 (log(TTG)). The full covariance matrices of the parameter estimates for the two models are given in Supplementary Tables 5 and 6. The final model was based on 2491 patients, i.e., the subset of patients with all baseline prognostic factors documented among the 2552 patients with TGI data. Patients received six different treatments: sunitinib (n = 886), axitinib (n = 479), sorafenib (n = 313), temsirolimus (n = 182), IFN-α (n = 475) and temsirolimus + IFN-α (n = 156). All parameters were well estimated.
According to these models, a poor prognosis is associated with ECOG performance status >0, more metastatic sites, lung metastasis, low hemoglobin, high calcium, high LDH and shorter time from diagnosis, all of which are known prognostic factors linked to disease severity [19, 20]. Finally, the probability of survival increases in patients with more TGI (e.g., longer TTG, greater early tumor shrinkage, i.e., a lower week 8 ETS ratio), independently of the baseline prognostic factors. Depending on the goal, the two models may have distinct applications that will be discussed later.
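For orientation, a lognormal parametric survival regression of the kind fitted with survreg can be written in accelerated failure time form. The expression below is the generic textbook form, not the fitted model: the symbols β, σ and x are notation introduced here, the covariate vector x is assumed to contain the selected TGI metric and baseline prognostic factors, and any covariate transformations are not specified.

$$\log\left(T_{i}\right) = \beta_{0} + \sum_{j} \beta_{j} x_{ij} + \sigma \varepsilon_{i}, \quad \varepsilon_{i} \sim N\left(0, 1\right)$$
$$S_{i}\left(t\right) = 1 - \Phi\left(\frac{\log\left(t\right) - \beta_{0} - \sum_{j} \beta_{j} x_{ij}}{\sigma}\right)$$

Here T is the survival time of patient i, S is the corresponding survival function and Φ is the standard normal cumulative distribution function. In this form, a positive coefficient lengthens expected survival time and a negative coefficient shortens it.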

Table 2 Parameter estimates of the week 8 ETS OS model

The models were qualified by simulating survival distributions for each of the ten studies (not shown) as well as survival distributions stratified by tertile of predictors, as illustrated in Fig. 1 for week 8 ETS and supplementary Figure 3 for TTG. In addition, the models were qualified by simulating survival distributions for each of the six treatments investigated in the ten studies (sunitinib, axitinib, sorafenib, temsirolimus, interferon, and temsirolimus plus interferon) without stratification and after stratification by low and high levels of the predictors (separated at the median of 0.892 to mitigate the smaller number of patients; supplementary Figures 4, 5 for the ETS model). These assessments support treatment independence of the relationship between week 8 ETS and OS. Similar results were obtained with the TTG model (not shown). Finally, the models were qualified by simulating the HR of sunitinib versus IFN-α in study 1034 [10], as this study was the basis of the simulations presented next. Figure 2 illustrates the results for the week 8 ETS model, where the observed HR of 0.88 was within the 95 % prediction interval of 0.71 to 0.92 (median prediction of 0.82).

Fig. 1

Predictive check of the week 8 ETS OS model by tertiles of week 8 ETS. Solid lines: Kaplan–Meier plots by tertiles of week 8 ETS: dark gray, first tertile [0.123, 0.843); medium gray, second tertile [0.843, 0.928); light gray, third tertile [0.928, 2.36). Areas: 95 % prediction intervals by the model

Fig. 2

Predictive check of the sunitinib to IFN-α HR in the first-line sunitinib study [9] using the week 8 ETS OS model. Predictive distribution with 95 % prediction interval (between dashed lines) versus observed HR (solid line)

Simulations

The predicted distribution (95 % prediction interval) of the HR comparing an investigational treatment to sunitinib, taken as the SOC, in a 600-patient study (300 per arm) as a function of the difference in TGI metric (delta in week 8 ETS) is shown in Fig. 3. The greater the shrinkage at week 8 for the investigational treatment relative to sunitinib (delta in week 8 ETS), the lower the expected HR between the two treatments. A 20 % difference in week 8 ETS may result in an improved OS with a HR of 0.73 (median HR prediction, black thick line). A 300-patient per arm Phase III study would have an 80 % probability of technical success (PTS; 80th percentile HR prediction, black thin line, for a 20 % week 8 ETS difference) to show a HR < 0.80. The two dotted lines can be moved to read the expected PTS to show other HRs conditional on the week 8 ETS difference. Alternative scenarios could be simulated to explore alternative study designs, e.g., the Phase III study sample size or the PTS for alternative target product profiles (HR), conditional on the week 8 ETS difference observed in a Phase II study.

Fig. 3

Predictive distribution (95 % prediction interval) of the HR comparing an investigational treatment to sunitinib in a 600-patient study (300 per arm) as a function of the difference in tumor growth inhibition (delta in week 8 ETS). Gray envelope: 95 % prediction interval. Black lines: expected median (thick) and expected 80th percentile (thin). Vertical dotted line: 20 % week 8 ETS difference from sunitinib. The horizontal dotted line crosses the vertical dotted line and the y axis at HR 0.80, indicating that a 300-patient per arm Phase III study would have an 80 % probability of technical success to show a HR < 0.80

Discussion

None of the currently available treatments are consistently effective in advanced and/or metastatic RCC. There is now a range of effective targeted agents available; however, a number of patients may not respond, or may even progress rapidly, and as such there is still an unmet medical need [28] for this disease. The use of model-based drug development in oncology is gaining momentum [1, 29]. A few models have been developed to support the development of new investigational treatments in mRCC [3–6], but a modeling and simulation framework linking short-term TGI (available in Phase II or in interim Phase III data) and OS outcome is lacking. We leveraged historical data from ten Phase II and Phase III studies in 2552 patients with mRCC conducted as part of the development of temsirolimus, sunitinib or axitinib. Longitudinal models for tumor size data over time were used to estimate TGI metrics that captured either the short-term (week 8 ETS) or longer-term (TTG) effect of the various treatments [1]. Both short-term and longer-term TGI metrics (all based on target lesions) were predictive of overall survival benefit for both first- and second-line treatments. This is not consistent with Stein et al. [5], where non-target lesion progression at weeks 2–14 was the only significant TGI metric in a multivariate Cox model. In that analysis, appearance of new lesions was borderline significant, whereas progression due to target lesions was not. This discrepancy may be due to the small sample size in the Stein analysis, which was based on 246 patients from a single everolimus Phase III trial [5], while this analysis was based upon 2552 patients from 10 trials.

Multivariate models incorporated the known baseline prognostic factors in RCC [19, 20] (ECOG performance status, hemoglobin, corrected calcium and LDH levels) as well as markers of tumor burden (number of metastases, presence of lung metastases) and time from diagnosis. Line of therapy was not a significant predictor after adjusting for these prognostic factors. The TGI metrics captured drug effect, as shown by the ability of the models to accurately predict the sunitinib versus IFN-α HR [10]. In addition, model assessment in the six different treatments that were considered in those ten studies (sunitinib, axitinib, sorafenib, temsirolimus, IFN-α and temsirolimus + IFN-α), with a variety of mechanisms of action, suggested that the link between TGI and OS may be independent of treatment [1]. This is the first multivariate OS model qualified for simulations of expected OS and HR in mRCC. This model can be leveraged to simulate expected OS for new investigational treatments as well as the expected HR for new investigational treatments versus standard of care, by utilizing early longitudinal tumor size data. These models can therefore be used to support early clinical decisions (Phase II, interim Phase III). As an illustration, simulations were performed to define TGI metric targets that may be used to design Phase II studies. On a case-by-case basis, an early TGI metric (e.g., week 8 ETS) or a longer-term TGI metric (e.g., TTG [1], which requires more mature data but may better capture drug effects) may be used as a promising endpoint in Phase II studies [30, 31]. According to the simulations based on week 8 ETS, an investigational treatment inducing a 20 % week 8 shrinkage difference on top of the reference (here sunitinib) may result in an improved OS with a median HR of 0.73 versus sunitinib. A 300-patient per arm Phase III study would have 80 % PTS to show a HR ≤ 0.80. A Phase II study may then be prospectively designed to show achievement of such a TGI threshold.
The published OS model and resultant simulations can be leveraged to support Phase II design and predict expected OS and HR (based on early observed TGI in early clinical data obtained either in Phase II or Phase III studies), thereby informing important mRCC development decisions, e.g., Go/No Go and dose regimen selection.