Introduction

Recurrent glioblastoma is associated with a grim prognosis. Despite global research efforts, no treatments have been shown to confer substantial survival benefit in large-scale clinical trials, and progression within weeks to months of commencing any second-line treatment is almost inevitable. It is therefore crucial to ensure that treatment is not increasing morbidity by adversely affecting health-related quality of life (HRQL), and further, to establish whether any treatment could improve HRQL for individuals affected by this devastating cancer.

In the CABARET randomized phase 2 clinical trial, we compared bevacizumab monotherapy with bevacizumab plus carboplatin in adult patients with recurrent glioblastoma [1]. The primary outcome of the study, progression-free survival, showed no difference between arms [1]. A secondary endpoint was to compare HRQL outcomes between the two arms, which was particularly important given the potential for an additional cytotoxic drug to add to toxicity and consequent HRQL losses.

Here we present HRQL results for Part 1 of the CABARET study, in which patients were randomized to receive either bevacizumab or bevacizumab plus carboplatin. For the smaller Part 2 of the trial (a second randomization assessing continuation vs. cessation of bevacizumab after recurrence on Part 1), HRQL outcomes, already reported, showed no difference in time to deterioration in overall HRQL for patients who continued and those who ceased bevacizumab [2]. In this paper, we describe HRQL in the study population and assess whether adding carboplatin to bevacizumab resulted in either improvement in or detriment to HRQL. At the trial outset, we postulated that any PFS benefit of adding carboplatin may also translate to improvement in HRQL. Given the negative primary results of the study, we then wished to understand whether the addition of carboplatin resulted in worse HRQL outcomes due to additional treatment side effects, or whether HRQL changes were largely determined by disease status. The trial design also allowed us to gain understanding of the proportion of patients experiencing symptomatic benefit when exposed to bevacizumab treatment on either study arm.

Methods

Eligibility

The eligibility criteria for the CABARET clinical trial have been described in detail previously [1]. Briefly, consenting adults (over 18 years) with recurrent glioblastoma who had previously been treated with radiotherapy and temozolomide chemotherapy, with Eastern Cooperative Oncology Group (ECOG) performance status 2 or better, and adequate hematological and other organ function, were eligible to participate. Patients with more than one recurrence and no prior treatment other than radiotherapy or temozolomide were eligible.

Trial design and treatments

Patients in Part 1 of CABARET were randomly assigned to receive either intravenous bevacizumab monotherapy (5 mg/kg every 2 weeks), or intravenous bevacizumab at the same dose plus intravenous carboplatin (area under the curve (AUC) = 5, every 4 weeks). Randomization was stratified by treatment center, age, sex, and ECOG performance status. The efficacy endpoints of the trial and its conduct have been published [1]. Written consent was obtained from each participant. Patients were expected to complete planned HRQL assessments while on the study, and were considered off-study when treatment ceased owing to site-determined disease progression, unacceptable toxicities, or death.

HRQL assessment and scoring

We used the European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire Core 30 Version 3.0 (QLQ-C30) and Brain Cancer Module (QLQ-BN20); both are well-validated instruments and have been used in many clinical trials [3, 4]. The QLQ-C30, the core questionnaire of the EORTC’s modular HRQL suite, covers core domains of HRQL and symptoms common to all cancer. It is designed to be complemented by site-specific modules, such as the QLQ-BN20, designed specifically for brain cancer patients. Together, they cover a comprehensive set of HRQL issues pertinent to patients in the CABARET trial, and enable comparison with other trials.

The QLQ-C30 contains 30 items covering five aspects of functioning (physical, role, emotional, cognitive, social), eight symptoms (fatigue, pain, nausea/vomiting, dyspnea, insomnia, appetite loss, constipation, diarrhea), financial impact, and global health status/quality of life (global HRQL). The QLQ-BN20 covers future uncertainty, visual disorder, motor dysfunction, communication deficit, headache, seizures, drowsiness, hair loss, itching, difficulty with bladder control, and weakness of both legs. Patients respond by self-report, with most items rated on a 4-point scale, from 1 “not at all” to 4 “very much”, except for the two global health status/quality of life items, which are measured on a 7-point Likert scale (“very poor” through “excellent”). Patient responses are scored on HRQL measurement scales according to standard algorithms [5]. Some scales represent the average of several items, while others contain only a single item. All scales have a 0–100 range, but the direction differs, with a higher score representing better outcomes for functioning domains and global HRQL, while for symptoms, a higher score represents greater frequency and/or impact of the symptom. Improvement in function is represented by an increase in score, whereas improvement in symptoms is represented by a decrease in score.
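As an illustration of the scoring direction described above, the following is a minimal sketch of the linear transformation to a 0–100 scale. It assumes the usual approach of the EORTC scoring manual [5]: the raw score is the mean of the answered items, functioning scales are reversed so that a higher score is better, and symptom and global scales are not reversed. The function name and arguments are hypothetical, and the handling of missing items is deliberately omitted.

```python
from statistics import mean


def scale_score(item_responses, item_range, functioning=False):
    """Transform raw item responses for one scale to a 0-100 score.

    item_responses: answered items for the scale (e.g. values 1-4)
    item_range: maximum minus minimum possible response
                (3 for 1-4 items, 6 for the 1-7 global items)
    functioning: True for functioning scales, which are reversed so that
                 a higher score means better functioning (assumption
                 based on the standard scoring manual)
    """
    raw = mean(item_responses)
    standardized = (raw - 1) / item_range * 100
    return 100 - standardized if functioning else standardized


# Two symptom items each answered 3 on the 1-4 scale
symptom = scale_score([3, 3], item_range=3)
# Two functioning items each answered 1 ("not at all"): no impairment
functioning = scale_score([1, 1], item_range=3, functioning=True)
# Both global items answered 7 ("excellent") on the 1-7 scale
global_hrql = scale_score([7, 7], item_range=6)
```

In this sketch a symptom score near 67 indicates a substantial symptom burden, whereas a functioning or global score of 100 indicates the best possible outcome, which is why improvement appears as an increase for functioning and a decrease for symptoms.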

Six domains from the QLQ-BN20 were identified by clinician consensus to potentially represent baseline symptoms of progressive glioblastoma, specifically: cognitive functioning (2 items), communication deficit (3), drowsiness (1), headaches (1), motor dysfunction (3), and visual disorder (3). The number of patients with baseline deficits with potential for at least 10-point improvement in each domain was identified.

HRQL assessments occurred at baseline (before treatment), on day 1 or up to 3 days before each 4-week treatment cycle, and at the end of treatment. Wherever possible, the HRQL questionnaires were completed before the patient was reviewed and underwent treatment for that cycle in order to avoid potential bias from patients being aware of their disease status at the time of progression. Questionnaires were completed on day 1 of each cycle, as this was the day patients were in hospital. No further HRQL testing was conducted after 30 days beyond the date of trial closure (December 2014).

Participation in HRQL assessment was mandatory; the site research nurse administered the questionnaires at each relevant time point. Additional explanation or reading instructions and questions aloud for patients with visual or reading difficulty was allowed; however, only the patient could complete the questionnaire. Reasons for non-completion were documented wherever possible.

Statistical analysis

A prespecified HRQL statistical analysis plan was developed before the database was interrogated for the purpose of the HRQL analysis. All statistical analyses were conducted in SAS 9.3 (SAS Institute Inc., Cary, NC, USA), with no adjustment for multiple statistical comparisons.

HRQL assessment participation rate (‘Participation’) was calculated as the number of patients who were administered the questionnaires at designated time points divided by the total number of patients still on the study and expected to complete at each time point (the ‘number expected’ population) [6]. Additionally, for participating patients, we calculated the proportion of items completed of the total of 50 items in the QLQ-C30 and QLQ-BN20 (‘Completion’).
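The two rates defined above reduce to simple proportions. The sketch below shows one way they could be computed; the function names and the example counts are hypothetical and are not taken from the trial data.

```python
def participation_rate(n_administered, n_expected):
    """Percentage of expected patients who were administered the questionnaires."""
    if n_expected == 0:
        raise ValueError("no patients expected at this time point")
    return 100 * n_administered / n_expected


def completion_rate(items_answered, total_items=50):
    """Percentage of the 50 QLQ-C30 and QLQ-BN20 items answered by one patient."""
    return 100 * items_answered / total_items


# Hypothetical time point: 40 patients still on study and expected,
# 38 of whom were administered the questionnaires
participation = participation_rate(38, 40)
# Hypothetical participating patient who answered 48 of the 50 items
completion = completion_rate(48)
```

Note that the denominator for Participation is the number of patients expected at that time point, not the number randomized, so the rate is not diluted by patients who have already left the study.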

The sample size of 120 for the CABARET trial was determined by the primary endpoint of progression-free survival; analyses of HRQL and other secondary endpoints are exploratory in this phase 2 study. Eight domains were prespecified in the HRQL analysis plan, selected on the basis of clinical rationale and their representativeness of the effect of glioblastoma on function and symptoms: global HRQL, social functioning, role function, physical function, cognitive function, drowsiness, communication deficit, and motor dysfunction. Descriptive analyses are presented for baseline scores and the mean difference from baseline at each time point. Mean change scores were calculated as the difference between a patient’s baseline score and their average score on treatment. This was calculated for all patients remaining progression-free who had two or more HRQL results for each domain, and then combined as the mean of all participants’ mean change scores from baseline for each domain tested. The treatment effect on the change scores was assessed using a two-sample t test, as a difference in means between treatment arms. This was exploratory and uncorrected for multiple comparisons. We also assessed the proportions of patients who improved and who deteriorated from baseline [6]. A 10-point change was selected, as it is commonly considered to be the minimum clinically relevant change in a 0–100 HRQL scale [7]. For each of the eight selected domains, logistic regression was used to estimate odds ratios to compare trial arms for the effect of treatment on the proportions of patients with a ≥ 10 point improvement, and separately for a ≥ 10 point deterioration. The models included randomized treatment only and were not adjusted for any other risk factors.
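The change-score and odds-ratio calculations described above can be sketched as follows. This is an illustration under stated assumptions rather than the SAS code used in the trial: "two or more HRQL results" is taken to mean a baseline score plus at least one on-treatment score, and the odds ratio is computed as the cross-product ratio of a 2×2 table, which coincides with the estimate from a logistic regression whose only covariate is treatment arm. All function names and example values are hypothetical.

```python
from statistics import mean

MIN_RELEVANT_CHANGE = 10  # points on a 0-100 scale, as stated in the text


def mean_change(baseline, on_treatment_scores):
    """Average on-treatment score minus baseline for one patient and domain.

    Returns None when there is no on-treatment score (assumption: baseline
    plus at least one on-treatment result is required).
    """
    if not on_treatment_scores:
        return None
    return mean(on_treatment_scores) - baseline


def classify(change, higher_is_better=True):
    """Label a change as improved, deteriorated, or stable.

    For symptom scales, pass higher_is_better=False so that a decrease
    in score counts as improvement.
    """
    signed = change if higher_is_better else -change
    if signed >= MIN_RELEVANT_CHANGE:
        return "improved"
    if signed <= -MIN_RELEVANT_CHANGE:
        return "deteriorated"
    return "stable"


def odds_ratio(events_a, n_a, events_b, n_b):
    """Odds of the event in arm A relative to arm B from a 2x2 table."""
    odds_a = events_a / (n_a - events_a)
    odds_b = events_b / (n_b - events_b)
    return odds_a / odds_b


# Hypothetical patient: baseline 50, on-treatment scores 60 and 70
change = mean_change(50.0, [60.0, 70.0])
label_function = classify(change)                         # functioning scale
label_symptom = classify(change, higher_is_better=False)  # symptom scale

# Hypothetical counts: 12 of 50 improved in arm A, 10 of 50 in arm B
example_or = odds_ratio(12, 50, 10, 50)
```

The same 15-point increase is an improvement on a functioning scale and a deterioration on a symptom scale, which is why the direction of each scale has to be fixed before proportions are compared between arms.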

The Kaplan–Meier method was used to determine the median time to HRQL scale deterioration (≥10 points), progression or death, whichever came first, for the same eight domains. Proportional-hazards regression models were used to determine hazard ratios for the effect of treatment on this outcome. Time to deterioration in HRQL was measured as the time between randomization and the first recorded deterioration in that scale or item. Deterioration was defined as a worsening of ≥10 points on a 0–100 scale persisting for at least 4 weeks, or a single worsening of ≥10 points on a 0–100 scale where further measurements had not been obtained because of progression, death, or inability to complete the questionnaires due to clinical deterioration. Patients whose HRQL did not deteriorate and remained alive or were lost to follow-up were censored at the date of last contact.
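The composite endpoint defined above can be sketched in two steps: deriving each patient's event time, then estimating the Kaplan–Meier median. The code below is a simplified illustration, not the trial's analysis code. It assumes a scale on which higher scores are better, assessments roughly 4 weeks apart (so that "persisting for at least 4 weeks" means confirmation at the next assessment), and that a deterioration at the final available assessment was left unconfirmed because of progression, death, or clinical deterioration. Function names and patient data are hypothetical, and the proportional-hazards models are not shown.

```python
from fractions import Fraction


def time_to_event(baseline, assessments, progression_or_death=None,
                  last_contact=None, threshold=10):
    """Return (time_in_months, event_flag) for one patient.

    assessments: chronological list of (months_from_randomization, score)
    """
    deterioration = None
    for i, (t, score) in enumerate(assessments):
        if baseline - score >= threshold:
            later = assessments[i + 1:]
            # Confirmed at the next assessment, or no later assessment exists
            if not later or baseline - later[0][1] >= threshold:
                deterioration = t
                break
    candidates = [x for x in (deterioration, progression_or_death)
                  if x is not None]
    if candidates:
        return min(candidates), True   # whichever came first
    return last_contact, False         # censored at last contact


def km_median(times_events):
    """Earliest time at which the Kaplan-Meier estimate falls to 0.5 or below."""
    at_risk = len(times_events)
    survival = Fraction(1)  # exact arithmetic avoids rounding at exactly 0.5
    for t in sorted({time for time, _ in times_events}):
        events = sum(1 for u, e in times_events if u == t and e)
        leaving = sum(1 for u, _ in times_events if u == t)
        if events:
            survival *= Fraction(at_risk - events, at_risk)
            if survival <= Fraction(1, 2):
                return t
        at_risk -= leaving
    return None  # median not reached


# Four hypothetical patients
patients = [
    time_to_event(80, [(1, 75), (2, 65), (3, 60)], progression_or_death=3.5),
    time_to_event(70, [(1, 55), (2, 68)], progression_or_death=4),
    time_to_event(60, [(1, 62), (3, 45)]),
    time_to_event(90, [(1, 88), (2, 90)], last_contact=6),
]
median = km_median(patients)
```

In this example the second patient's single 15-point drop is not confirmed at the next assessment, so the event time is the date of progression rather than the date of the transient drop.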

Results

Between November 2010 and March 2012, 122 patients were enrolled in the CABARET trial from 18 sites across Australia, with 120 receiving at least one study treatment and two patients declining participation after randomization. At baseline, 117 of the 122 randomized patients (96%) participated in HRQL questionnaires, and 116 provided analyzable data. Baseline characteristics of these patients, which were similar in the two randomized arms, are shown in Supplementary Table 1.

Participation and completion

Participation and completion rates for scheduled assessments and items within questionnaires were over 90% at baseline and the majority of treatment cycles, and similar in the two arms of the study (Table 1). From cycle 9 (approximately week 36) onwards, reported participation and completion were, at almost all time points, 100% for the small number of patients who continued on treatment beyond that point. At the end-of-treatment time point, 113 of the 122 randomized patients were potentially able to complete HRQL questionnaires, after exclusion of two who withdrew before starting treatment, five who died without progression, and two who continued treatment without site-determined progression. Participation rates were lower at this time point: n = 72, 64%. Reasons for non-participation at the end of treatment were documented and are shown in Supplementary Table 2. Twenty-three (56% of the 41 eligible patients who did not fill out HRQL questionnaires at end of treatment) did not do so because they were too unwell at the time.

Table 1 Participation in testing for health-related quality of life (HRQL) by patients in the CABARET trial, Part 1

Baseline and change scores

Scores at baseline for all domains of the EORTC QLQ-C30 and QLQ-BN20 questionnaires are available in Supplementary Table 3. QLQ-C30 scores were similar to expected reference values for patients with brain cancer [5, 8]. Mean change scores for global QOL are depicted in Fig. 1, and for each domain in Supplementary Table 4. There was no evidence of a difference between the two arms for the overall mean of the mean changes from baseline for any domain of the QLQ-C30 or QLQ-BN20. In both arms of the trial, physical functioning and role functioning (QLQ-C30) and future uncertainty and headaches (QLQ-BN20) were associated with the largest negative change in score from baseline, the largest of which was 13.3 points (mean, for future uncertainty, combination arm) (Supplementary Table 4).

Fig. 1 Mean change from baseline in EORTC QLQ-C30 global quality-of-life score, by treatment arm. Red: bevacizumab monotherapy; blue: bevacizumab + carboplatin

Clinically significant changes

Results for proportions of patients with a ≥ 10 point change (improvement or deterioration) are shown in Table 2. There was no evidence of any difference in the odds of improvement or deterioration between treatment groups. Results from both arms combined are henceforth discussed, where appropriate, in this paper.

Table 2 Odds ratios for improvement or deterioration from baseline for eight preselected domains

All but 1 of 116 patients with analyzable data for these domains reported baseline scores with potential for at least 10-point improvement in at least one of six domains from the QLQ-BN20 prespecified to represent symptoms of progressive glioblastoma, specifically: cognitive functioning (24%), communication deficit (23%), drowsiness (15%), headaches (28%), motor dysfunction (25%), and visual disorder (39%). Most patients reported multiple symptoms with potential for improvement: 79 (68%) reported four or more. Almost half (53/115, 46%) reported an improvement of ≥10 points in at least one of these domains, with 19 (17%) improving in one domain, 12 (10%) in two domains, 15 (13%) in three domains, and 7 (6%) in four domains (Supplementary Figure 1). There was no discernible pattern to improvements, which varied in keeping with the unique pattern of symptoms experienced by individual patients. These results suggest that approximately half of all patients treated on either arm experienced a clinically relevant improvement during treatment in one or more domains that were likely to represent symptoms or signs of their disease.

Sustained improvements over baseline (≥10 points for at least two time points) were most commonly reported in motor dysfunction and cognitive, role, and social function in both arms, possibly reflecting a therapeutic effect of treatment in either study arm.

Time to deterioration

Overall, for Part 1 of CABARET, the median progression-free survival was 3.5 months for each arm as determined by central radiological review. Comparisons between arms for time to HRQL deterioration, disease progression, or death for the eight selected domains are shown in Table 3. The hazard ratios (HR) are for the combination arm relative to bevacizumab monotherapy. Deterioration in HRQL was seen earlier than the reported median progression-free survival (as defined by the RANO criteria) of 3.5 months, in all prespecified domains, with no evidence of differences between arms, although deterioration across most domains was somewhat slower for the combination arm. Post hoc analysis combining the two treatment arms showed that across the eight prespecified domains, approximately two-thirds of patients had a single or sustained (over 2 or more questionnaire time points) deterioration of HRQL that preceded radiological or clinical disease progression and cessation of treatment. In total, 43 of 122 patients (35%) had a sustained ≥10 point deterioration in the overall QOL score over two or more visits before disease progression; and another 34 (28%) a decrease at a single time point before progression, death, or inability to complete further assessments. Figure 2 shows the Kaplan–Meier survival curve for time to deterioration for global QOL, with no evidence of any difference in event rates between the two arms.

Table 3 Months to deterioration in score or disease progression or death

Fig. 2 Kaplan–Meier curve comparing deterioration in global QOL between arms

Discussion

In this randomized phase 2 study comparing bevacizumab monotherapy with bevacizumab plus carboplatin chemotherapy, we did not observe any clear differences between arms with respect to HRQL outcomes. While adding carboplatin to bevacizumab did not result in improved PFS, neither was it associated with worse HRQL in patients who received chemotherapy in addition to bevacizumab. While there was a > 10 point difference between arms at some time points for global QOL (Fig. 1), the wide confidence intervals reflect the small sample size at these times, and no statistically significant differences occurred in mean change scores overall for any domain. HRQL participation rates were excellent except at the end-of-treatment visit.

For all eight prespecified domains, the median time to deterioration was shorter than the median time to clinical or radiological progression, and for each of these domains more than 50% of patients experienced at least one deterioration in score before disease progression was documented. This could potentially signify an effect of treatment itself on the domains, but alternatively could represent subtle, subclinical disease progression before a formal radiological or clinical finding of progressive disease. Similar observations of deteriorating neurocognitive function prior to, and heralding, radiological progression have been reported [9, 10]. This may be particularly pertinent in the setting of bevacizumab treatment, where the antiangiogenic effects can reduce perfusion, thus obscuring contrast enhancement and clear evidence of radiological progression [11].

HRQL assessment provides valuable information in many clinical trials, especially where overall survival is poor and therefore an individual’s remaining life must be optimized for quality. In CABARET, we were able to achieve excellent HRQL compliance, with the exception of the end-of-treatment visit, which usually coincided with radiological or clinical progression (or both). This supports the feasibility of including HRQL assessment in clinical trials in advanced glioma after first-line therapy, and demonstrates that patients are able to complete a 50-item questionnaire in this setting. Our excellent completion rates may also reinforce the importance of staff training and a protocol requiring patients to complete HRQL questionnaires before receiving information on treatment response or discontinuation; these were issues that received attention in CABARET. Completion rates for patients on treatment were similar to or better than those from the BELOB study, a contemporary randomized trial in a similar recurrent-glioblastoma population [12]. In the BELOB HRQL study, compliance at the end of treatment dropped significantly. While missing data will continue to be an issue in such trials, especially at the time of disease progression, we postulate that a tight window in which to complete HRQL questionnaires and protocol-mandated conduct of HRQL testing with careful instructions for research nurses contributed to the excellent completion rates during treatment.

There are several possible approaches to the analysis of HRQL results. This has been a contentious issue recently, particularly when trials of bevacizumab in glioblastoma are considered. Two large-scale phase 3 studies in patients with newly diagnosed glioblastoma, both randomizing patients to standard treatment with or without bevacizumab, reported conflicting HRQL outcomes for patients receiving bevacizumab [13, 14, 15]. This has incited debate and discussion; differences in statistical methods and differences in data interpretation are among the purported reasons for such discrepancies [16]. The handling of missing data, in particular, remains an ongoing challenge in HRQL analysis, especially in populations with rapidly progressive disease such as glioblastoma. This requires careful consideration and a priori decisions when formulating a statistical analysis plan. The obvious advantage of including death or disease progression when calculating time to HRQL deterioration is that attrition (hence missing data), which is understandably common in this patient population, should not result in bias, given that all patients are taken into consideration in the time-to-event analysis and that death or deterioration with inability to complete HRQL assessment is appropriately characterized as deterioration in HRQL. This requires the assumption that patients with missing questionnaires were unlikely to be experiencing symptomatic or HRQL benefit at the time of the missing questionnaire, an assumption which medically is entirely appropriate. We acknowledge that, ideally, time to HRQL or clinical deterioration would be measured independently of radiological progression, so that any continued clinical and HRQL benefit or stability while a patient remains on treatment could be measured and documented. However, as in most oncology clinical trials, HRQL data were not collected after date of progression, and therefore cannot be reported for the CABARET trial.

Given that there are no clear standard second-line anticancer agents that can provide anything more than modest survival benefits in this setting, any drug or regimen selected should not result in detriment to HRQL. Ours was an important study in comparing a doublet therapy with bevacizumab monotherapy, our concern being that adding carboplatin chemotherapy to bevacizumab might have adversely affected HRQL outcomes, while not providing additional survival benefit. However, we found no evidence of any difference in HRQL outcomes between treatments. Both arms had similar proportions of improvements and deteriorations in HRQL parameters, and there was no clear additional burden of toxicity indicated by the HRQL outcome measures. This could represent the disease itself and its subclinical progression exerting the dominant effect on HRQL; that the measures used were not sensitive to carboplatin-related effects on HRQL; or that HRQL was assessed in the wrong time frame for identifying chemotherapy toxicities (given that the questionnaire recall period was ‘during the last week’ and questionnaires were administered before day 1 of each 4-week treatment cycle, we effectively assessed HRQL in the last week of each cycle). Nevertheless, based on our HRQL data, we did not find any clear benefit for bevacizumab monotherapy over combination treatment, or vice versa. This study was designed in 2009, and carboplatin would no longer be a chemotherapy agent of choice in the recurrent setting, with lomustine being far more likely to be chosen as the control arm.

The issue of whether or not bevacizumab may benefit or worsen HRQL is an important one in light of the more modest effects of bevacizumab on progression-free survival reported in recent studies, and the uncertainty in the recurrent disease setting as to the overall survival benefits of the drug [17]. As such, any beneficial or detrimental effect of bevacizumab on HRQL becomes a critical aspect of determining the merit of treatment in this setting. The decline in HRQL noted for the majority of patients during radiological progression-free time may reflect disease status and subclinical progression rather than effects of therapy itself. This is supported by HRQL data from the BELOB randomized phase 2 study comparing lomustine with or without bevacizumab in a similar patient population, which reported no negative effects of bevacizumab on HRQL [12]. Similarly, in Part 2 of CABARET, in which 48 of the original 122 patients were randomized to either continuing or ceasing bevacizumab after progression on Part 1, there were no clinically or statistically significant differences between the randomized arms in time to deterioration in overall HRQL. Although we were limited by a small sample size with low statistical power, we did not find benefits or detriments in relation to HRQL when comparing patients who continued or ceased bevacizumab in this small randomized cohort.

Although many patients had some decline in HRQL preceding disease progression, a substantial proportion had some improvements in HRQL domains while on study. Post hoc analysis of patients who (based on baseline scores) had the potential for HRQL domain improvement showed that in domains relevant to symptoms from progressive glioblastoma, close to 50% reported clinically relevant improvements in at least one of these domains while receiving treatment. We postulate that these may represent those patients deriving meaningful clinical benefit and symptomatic improvement as a result of therapy.

There are some limitations to this study. The overall sample size is small, being a phase 2 study. A substantial percentage of patients missed assessment at the end of treatment; this high attrition was generally because patients were too unwell to complete questionnaires, which is to be expected in patients with glioblastoma. In anticipation of this clinical scenario, we planned to analyze time to HRQL deterioration, allowing us to include all patients, including those who did not complete questionnaires because of progression or death. This method is appropriate in this context as it avoids bias due to attrition in questionnaire completion at the end of treatment and was particularly relevant in Part 2 of the study, in which most patients’ HRQL deterioration was attributed to being unable to complete serial questionnaires because of clinical deterioration, disease progression or death. An additional strength of the CABARET trial was that HRQL assessment was mandatory, and therefore overall, most patients who were still on the study at each time point completed questionnaires, minimizing the risk of bias from noncompletion and avoiding loss of power.

In summary, the results from Part 1 of the CABARET study do not show any evidence of a difference in HRQL for patients with recurrent glioblastoma receiving bevacizumab monotherapy or bevacizumab plus carboplatin chemotherapy. However, improvements in HRQL domains reflecting disease burden were seen in almost 50% of patients, potentially reflecting symptom control from treatment received. It is feasible to complete a study with high HRQL completion rates in patients with recurrent glioblastoma if it is a mandatory aspect of the trial, the protocol provides relevant detail about HRQL administration, and staff receive good training. We found using time to HRQL deterioration a robust and useful statistical endpoint in this patient group, and suggest that this is an appropriate clinical trial endpoint in scenarios where attrition in questionnaire completion is likely and anticipated. These results should help inform future studies, and we encourage similar HRQL assessment methods in future clinical trial settings.