Introduction

Progress in the treatment of cancer has been made recently with the introduction of immuno-oncology agents, especially immune checkpoint inhibitors (ICIs) [1]. Patient-reported outcomes (PROs), including health-related quality of life (HRQoL), are generally included as secondary endpoints in randomized clinical trials to assess the clinical benefit of the treatment [2].

However, the methodology of PRO assessment remains a challenge. The European Medicines Agency recommends a double-blinded trial design to avoid potential bias in PRO results [3]. While the absence of blinding may also affect classical clinician-reported data, including progression-free survival [4] and toxicity [5], the challenge with open-label trial design is particularly important for PROs [6]. The US Food and Drug Administration (FDA) also considers that open-label randomized clinical trials (RCTs) are rarely adequate to support labeling claims based on PRO instruments [7]. This is a key reason why sponsors do not use PRO results in FDA submissions for approval of a new drug [8]. The argument is that if patients know which treatment they receive, this knowledge can influence their responses to PRO questionnaires, with potential disappointment if they are randomized to the control arm or, conversely, potential satisfaction if they are randomized to the experimental arm [9]. Therefore, patients randomized to the control arm may underestimate their HRQoL while patients randomized to the experimental arm may overestimate their HRQoL. In addition, patients randomized to the control arm may be less likely to complete PRO questionnaires than those randomized to the experimental arm. The FDA thus explored the risk of bias associated with completion rates for cancer trials submitted to the agency between 2007 and 2017 [10]. They identified a slight difference in completion rates favoring the experimental arm, but no clear difference between open-label and blinded studies. There are many reasons why an open-label design may be necessary, such as a toxicity profile so different between the experimental treatment and chemotherapy that a double-blind protocol would not be adequate. Since a large proportion of RCTs are conducted as open-label trials, it is essential to explore the risk of bias to facilitate the consideration of PRO data from open-label studies.
Indeed, a substantial proportion of open-label RCTs conducted in immuno-oncology report positive results in terms of PRO scores, especially in comparison to chemotherapy [11,12,13,14,15]. To the best of our knowledge, no research has been conducted in the field of immuno-oncology regarding the risk of bias in PRO data in unblinded trials.

In this context, the objective of this study was to perform a systematic literature review to compare open-label and blinded trial designs in the assessment of PRO data in cancer immunotherapy trials in patients with advanced cancer. The primary objective was to characterize the impact of the design in terms of PRO completion rates and PRO scores at baseline and over time. An exploratory objective was to assess potential differences in risk of bias according to the type of study design.

Methods

Search methods for identification of studies

A systematic literature search was conducted using the PubMed/MEDLINE, Cochrane Library and Embase databases. Search strategies combined different terms to represent RCTs, immuno-oncology and PROs. The full search strategies for each database are listed in the Supplementary Online File (Tables S1, S2 and S3). This study was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [16].

Inclusion and exclusion criteria

All publications from January 1, 2009 to May 2, 2019 in the English language were eligible. The starting date of 2009 was chosen due to the online publication of the first RCT in immuno-oncology in 2009 [17]. Only original papers of RCTs, on patients with advanced cancer, with at least one arm treated with immunotherapy, investigating at least one immuno-oncology drug (pembrolizumab, atezolizumab, nivolumab, durvalumab, avelumab, ipilimumab, or tremelimumab) and reporting PRO results were included. Meta-analyses as well as subgroup analyses were not considered.

Selection of studies and data extraction

Each paper identified by the search algorithms was screened independently by two reviewers. First, titles and abstracts were screened and then the full paper. Two senior reviewers resolved discrepancies.

The following information was collected on the retained studies:

  • general information regarding the study, including phase of the trial, disease site(s) and stage(s), treatment type in experimental and control arms, type of trial (open-label or blinded), primary endpoint and PRO endpoint status (e.g., primary, secondary, or exploratory).

  • information regarding PRO assessment and analysis, including questionnaire(s) used, timing of assessment and definition of the minimal important difference.

  • reporting of the results: completion rate reported at baseline, at the first post-baseline assessment and over time, PRO level at baseline, any statistically and/or clinically significant results between treatment arms over time, and main dimensions significant over time between treatment arms pooling data from all questionnaires. Each completion rate was recalculated when possible as the number of patients with an available questionnaire divided by the number of patients expected to complete the questionnaire. If the number of patients still in the study and able to complete the questionnaire at a given measurement time was not available, the completion rate was not reported, except at baseline.

  • risk of bias, using the Cochrane risk of bias tool [18], assessed for: the random sequence generation (e.g., a low risk of bias if a random component was considered in the sequence generation process), the allocation concealment (e.g., a low risk of bias if the investigator and participant could not predict assignment), the attrition bias corresponding to incomplete outcome data (e.g., a low risk of bias if there were no missing outcome data, or if missing data were balanced in number with similar reasons across groups), and selective reporting bias (e.g., a low risk of bias if all pre-specified outcomes were reported in the publication). Attrition and selective reporting biases were assessed with respect to PRO data.
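The recalculation of completion rates described above can be illustrated with a minimal Python sketch. It is not the authors' code (the analyses were run in SAS); the function name `completion_rate` and the example counts are hypothetical and serve only to make the numerator and denominator explicit.

```python
from typing import Optional


def completion_rate(n_available: int, n_expected: Optional[int]) -> Optional[float]:
    """Completion rate in percent: available questionnaires / expected questionnaires.

    Returns None when the number of patients expected to complete the
    questionnaire is unknown, mirroring the rule that such rates are not reported.
    """
    if n_expected is None or n_expected <= 0:
        return None
    if n_available < 0 or n_available > n_expected:
        raise ValueError("n_available must lie between 0 and n_expected")
    return 100.0 * n_available / n_expected


# Hypothetical counts, for illustration only (not taken from any included trial).
rate_known = completion_rate(180, 200)     # 180 of 200 expected questionnaires
rate_unknown = completion_rate(150, None)  # denominator not reported
```

Under these assumed counts, `rate_known` is 90.0 and `rate_unknown` is `None`, so a time point without a usable denominator is simply left out of the summary.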

Statistical analysis

Categorical variables were described in terms of absolute and relative frequencies. Quantitative variables were described using the median with range. Absolute differences in completion rates between arms were calculated as the experimental arm(s) rate minus the control arm rate, and were summarized as the median with range. Analyses were conducted using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA).
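As an illustration of the descriptive summary above, the following Python sketch computes the absolute between-arm difference for each trial and summarizes the differences as a median with range. The analyses themselves were run in SAS; the helper names and the three pairs of rates below are hypothetical, and the sketch assumes a single experimental arm per trial.

```python
from statistics import median


def arm_difference(experimental_rate: float, control_rate: float) -> float:
    """Absolute difference in completion rate: experimental minus control."""
    return experimental_rate - control_rate


def median_with_range(values):
    """Return (median, minimum, maximum) of a non-empty list of numbers."""
    if not values:
        raise ValueError("at least one value is required")
    return median(values), min(values), max(values)


# Hypothetical (experimental %, control %) completion rates, one pair per trial.
trials = [(92.0, 90.0), (88.0, 89.0), (95.0, 90.0)]

differences = [arm_difference(exp, ctrl) for exp, ctrl in trials]
summary = median_with_range(differences)
```

With these assumed inputs the differences are 2.0, -1.0 and 5.0, so `summary` is `(2.0, -1.0, 5.0)`. A positive value favors the experimental arm and a negative value favors the control arm, which is the sign convention used in the Results.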

Results

Studies identified

A total of 8,284 references were identified through the three databases. Finally, 27 papers (0.3%) meeting the predefined inclusion criteria were retained, corresponding to 23 studies: 15 (65%) open-label and 8 (35%) blinded studies (Fig. 1) [11,12,13,15,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41].

Fig. 1

Flowchart of information through the different phases of the systematic review. *Includes the following types of publications: study protocol, reviews, interview, perspectives, commentaries, letters, erratum, meeting reports and case reports. RCT, randomized clinical trial; PRO, patient-reported outcomes

General characteristics of the studies

The majority of the 23 studies retained were phase III RCTs (75%) (Table 1). Most blinded studies involved melanoma patients (N = 7, 88%), while open-label studies focused on different disease sites. A single immunotherapy was generally administered in the experimental arm(s) (N = 18, 79%). The principal immuno-oncology drugs investigated, alone or as part of a combination therapy regimen, were nivolumab (N = 11, 48%) and ipilimumab (N = 7, 30%). In the control arm, patients generally received chemotherapy in open-label studies (N = 10, 67%), while a single immunotherapy (N = 3, 39%) or placebo (N = 2, 25%) was mainly administered in blinded studies. Overall survival was the main primary endpoint (N = 17, 74%). PRO data were either a secondary or an exploratory endpoint in open-label studies (N = 7, 47% for each) and a secondary endpoint in all blinded studies (N = 8, 100%). Finally, a higher proportion of open-label trials published PRO results in a dedicated paper compared to blinded trials (73% vs. 50%). General information on the included studies is summarized in the Supplementary Online File (Table S4).

Table 1 General characteristics of the studies

PRO assessment and analysis

Most trials used the European Organisation for Research and Treatment of Cancer (EORTC) questionnaires or the EQ-5D questionnaire (N = 16, 70% each) (Table 2). No study administered the first PRO questionnaire strictly before randomization. A majority of studies administered the first PRO questionnaire before treatment start (57%).

Table 2 Patient-reported outcomes (PROs) assessment and analysis

PRO completion rate

Most trials (N = 17, 74%) reported at least partial information regarding completion rate (80% of open-label trials and 63% of blinded trials).

When the overall baseline completion rate was reported (N = 16), it was high, with a median of 92% (range 67–99%), and slightly higher in open-label trials (N = 11, median 94%, range 78–99%) than in blinded trials (N = 5, median 90%, range 67–95%) (Table 3). The baseline completion rate by treatment arm was available for 16 trials (N = 12 open-label trials, N = 4 blinded trials). The median absolute difference observed between experimental and control arms was similar irrespective of the type of trial design, equal to 2% (range 0.1–5%) in both open-label (N = 12) and blinded trials (N = 4).

Table 3 Patient-reported outcome completion rate globally and by treatment arm according to the type of study

At the first post-baseline assessment, the overall completion rate remained high, with a median of 89% (N = 13, range 70–97%), and was higher in open-label trials (N = 9, median 90%, range 72–97%) than in blinded trials (N = 4, median 78%, range 70–92%). The median difference observed between treatment arms was 2% (range − 0.3–18%) in open-label trials (N = 11), favoring the experimental arm, versus − 4% (range − 5–0.6%) in blinded trials (N = 4), favoring the control arm.

During follow-up, the completion rate remained stable in open-label trials (N = 7, median 88%, range 71–96%) while it decreased in blinded trials (N = 4, median 71%, range 70–98%). Compliance remained slightly higher in the experimental arm for open-label studies while it was higher in the control arm for blinded studies, with a median absolute difference between treatment arms of 4% (N = 8) in open-label trials versus − 2% (N = 4) in blinded trials.

PRO level at baseline and over time

More than half of the trials (57%) discussed the comparability of PRO scores between treatment arms at baseline, with a greater proportion among open-label (73%) than blinded studies (25%) (Table 4). Baseline PRO scores were reported in the majority of trials (N = 16, 70%), and more often in open-label trials than in blinded trials (80% vs. 50%). Among these trials, no clinically significant difference was observed between treatment arms considering the minimal important difference defined by the authors.

Table 4 Patient-reported outcomes (PRO) level at baseline and over time

Over time, among the 16 comparative trials reporting any statistical test for PRO data, 14 trials (88%) found at least one statistically significant difference in PROs between treatment arms: 10 out of 11 (91%) open-label trials and 4 out of 5 (80%) blinded trials. The statistical methods used for the longitudinal analysis varied between trials: a time to PRO event model was the most widely used model in open-label trials (N = 11, 73%), while most blinded trials used mean change from baseline (N = 6, 75%, see Table 2). Regardless of the questionnaire and statistical method used, the global HRQoL dimension or global score was the most frequently significant dimension between treatment arms among trials reporting these types of dimensions/scores (in 9 out of 13 (69%) open-label studies and 3 out of 7 (43%) blinded studies) (data not shown).

Risk of bias assessment

A low risk of bias was observed regarding both the random sequence generation and the allocation concealment in the majority of open-label trials (87% and 73%, respectively) and of blinded trials (88% and 76%, respectively) (Fig. 2). A high risk of attrition bias was detected in a higher proportion of open-label trials than blinded trials (27% versus 13%), indicating that more open-label studies were affected, for example, by missing data in an unbalanced proportion between treatment arms. In contrast, a high risk of reporting bias was detected in a higher proportion of blinded trials than open-label trials (38% versus 7%), indicating that open-label trials more systematically reported all PRO results than blinded trials.

Fig. 2

Risk of bias according to the Cochrane risk of bias tool, by type of study

Discussion

This systematic literature review identified 23 studies published until May 2019 reporting PRO results from RCTs on immuno-oncology in patients with advanced cancer. Among them, 65% were open-label studies, a proportion commonly observed in oncology clinical trials [10].

Most studies reported information regarding the completion rate of PRO questionnaires, particularly open-label trials (80% vs. 63% in blinded trials). This percentage is higher than the 53% observed in a recent FDA publication on trials in malignant hematologic/oncologic conditions [10]. In our review, baseline PRO scores were also reported in the majority of trials, and more often in open-label trials than in blinded trials (80% vs. 50%). These two items are crucial to assess the quality of the study and are part of the Consolidated Standards of Reporting Trials PRO checklist [42]. The effort of open-label studies to provide this information can thus be emphasized. This could also be related to the higher proportion of open-label trials with PRO results published in a dedicated paper (73% vs. 50% for blinded trials), which allows more details to be reported.

To ensure the comparability of treatment arms at baseline, the EORTC recommends administering the first PRO questionnaire prior to randomization or at least before treatment start [43]. In our review, no study assessed HRQoL strictly before randomization. Given the reluctance to use PRO data from open-label studies, researchers should make an effort to systematically collect baseline PRO questionnaires before randomization. To assess the validity of the baseline assessment, a recommendation could be to collect the first PRO questionnaire prior to randomization, for example at screening, and a second one after randomization but prior to treatment start. Any observed differences in terms of compliance and PRO scores between these two assessments in an open-label study could be an indication of bias due to the design.

Since most patients included in open-label studies know the arm to which they have been randomized at the time of the baseline assessment, this knowledge could have influenced their response to PRO questionnaires, in terms of compliance and PRO level at baseline. However, our results showed that the baseline completion rate was similar between arms, irrespective of study design. Further, the differences in PRO scores observed between treatment arms at baseline were not clinically significant. Thus, no signal of bias at baseline has been identified in this review. In their review, the FDA found the same result, whether the first questionnaire was administered at screening (i.e., prior to randomization) or at baseline (i.e., before treatment start) [10].

In our review, a slight difference was observed in completion rates from the first post-baseline assessment onwards: the completion rate was slightly higher in the experimental arm in open-label studies while it was higher in the control arm in blinded trials. The same trend was observed in the FDA review exploring the completion rate at 6 months [10]. However, since these data were collected during treatment, the difference observed may not be due to the design alone, but also to the efficacy or toxicity of the treatment and/or differences in indications between studies. Systematic reporting of the reasons for non-completion could help to characterize the profile of missing data.

Another important result is that a high risk of attrition bias was observed more often in open-label studies than in blinded trials (27% vs. 13%). In addition to the study design, this may be explained by the high proportion of open-label trials comparing immunotherapy to chemotherapy, chemotherapy being potentially subject to greater attrition rates due to its side effect profile. The main reluctance of the FDA regarding PRO data from open-label trials concerns PRO domains not directly related to the treatment effect, including global HRQoL and emotional dimensions [44]. The underlying theory is that these dimensions could be more negatively impacted by a possible disappointment bias than other domains. Consequently, the FDA suggests focusing on treatment-related side effects, disease-related symptoms and physical function, all of which are concepts more proximal to the disease [44]. As already noted, our systematic review found no clinically significant difference between treatment arms at baseline, irrespective of the domain assessed. Pooling data from all questionnaires used, we found that the global HRQoL domain or global summary score was the PRO score most often significant between treatment arms over time. This analysis could not be taken further, mainly because of the heterogeneity in the questionnaires used and in the domains assessed and reported. Other recently published reviews of PRO results from clinical trials of immune checkpoint inhibitors highlighted that the questionnaires used may not be appropriate or at least may not capture all treatment-related side effects [45,46,47]. Immune-related adverse events, including rash and pruritus, are very specific and are not assessed with the classical QLQ-C30 questionnaire widely used in clinical trials. The use of questionnaires addressing the specific side effects of ICIs could potentially help to limit the risk of attrition bias.

Several potential biases have been identified in this review. First, the limited number of trials identified, and of blinded trials in particular, means that all results should be interpreted as exploratory only. Ongoing trials may further explore combinations of immuno-oncology drugs, making blinded studies easier to design. The homogeneity of the studies included in this review, limited to immunotherapy, facilitates the interpretation of the results but may not allow generalization to other treatments.

Another important factor to consider in the interpretation of the results is the difference in the profile of disease sites according to the type of study. Indeed, almost all blinded studies were on melanoma while open-label studies included a number of cancer sites. A recently published systematic review explored the impact of the RCT design on PRO data, focusing on prostate cancer and including all therapies and settings [48]. That review explored the quality of PRO reporting and concordance with clinical endpoints between open-label and blinded trials, and no difference was highlighted according to the design.

Another possible bias of our review is the high proportion of trials exploring nivolumab or ipilimumab. The results of this first review, if replicated at a later date, could be strengthened by the integration of other new molecules currently being explored in clinical trials. Since the data extraction, PRO data from other immuno-oncology trials have been published, including data from the PACIFIC open-label phase III trial exploring the added value of durvalumab to improve overall survival in previously treated unresectable non-small cell lung cancer patients [49]. Including such trials would increase the number of treatments explored and could support the observations of this first review.

Research should be pursued to guarantee the reliability of PRO data in open-label studies. Analyses should be repeated for other treatment strategies. A comparison of individual data from two closed RCTs should ideally be done to complement these results. Such an analysis would make it possible to extract the same information from both trials and to directly compare the results between trials. If an impact of the study design on compliance or PRO level is suspected, adequate methodology or dedicated statistical methods must be explored to allow the consideration of PRO results from open-label studies. This program of research must be pursued to allow the consideration of PRO results by health agencies in the evaluation of new treatment strategies.

In conclusion, this study provides crucial information regarding the alleged bias in open-label trials regarding PRO endpoints in the context of immuno-oncology RCTs in patients with advanced cancer. The main finding is that the open-label trial design had little or no impact on the two main domains explored: compliance at baseline, with a high and similar completion rate between arms irrespective of trial design, and PRO scores at baseline, with no meaningful differences observed between arms, again irrespective of trial design. This provides some confidence that the baseline assessment, even if it is done post-randomization, is not subject to potential bias.