Introduction

The assessment of health-related quality of life (HRQOL) in oncology randomised controlled trials (RCTs) through patient-reported outcome (PRO) instruments has increased steadily over the last two decades, aided in part by the development of a number of cancer-specific PRO instruments in the 1980s and 1990s [1–4]. However, despite their widespread use in oncology RCTs, concerns have been raised regarding the consistency and quality of reporting of HRQOL data. Recent reviews have suggested considerable variability in the reporting quality of HRQOL data in oncology RCTs: criteria such as the rationale for instrument selection, the presentation of results and the discussion of findings tend to be well reported, whereas the clinical significance of results and the documentation of missing data are frequently not addressed [5, 6]. There does, however, appear to be a trend towards more complete reporting of HRQOL in oncology RCTs over time [7].

The quality of reported HRQOL data is of particular importance because these outcomes are increasingly being used to inform clinical decision-making, as well as comparative effectiveness studies and health policy and reimbursement decisions [8].

Issues nonetheless remain with the quality and completeness of HRQOL reporting in oncology RCTs. Two of the most commonly used PRO instruments in oncology RCTs are the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire-Core 30 (EORTC QLQ-C30) and the Functional Assessment of Cancer Therapy-General (FACT-G) [1, 2]. It is not known whether, despite the frequent use of these instruments in clinical trials, problems persist in the reporting of the HRQOL data they generate. Any such shortcomings would have significant implications for clinical practice and for the interpretation of study outcomes.

The primary aim of this study was therefore to undertake a meta-review of published systematic reviews to synthesise published data and detail any methodological shortcomings associated with the reporting of HRQOL in RCTs in oncology utilising the EORTC QLQ-C30 and/or the FACT-G. The secondary aims were to determine whether there were any similarities/differences in trial reporting between the two instruments (EORTC QLQ-C30 and FACT-G), and to evaluate the quality of HRQOL reporting against a robustness checklist [9].

Methods

A comprehensive meta-review was undertaken of systematic reviews of PRO instruments used in oncology RCTs. This approach was adopted in order to maximise the synthesis of results from previously published studies and to collate evidence from RCTs across different cancer types.

Inclusion criteria for the review were defined as systematic reviews of RCTs involving adult (≥18 years) patients with a cancer diagnosis, irrespective of stage/grade or tumour type, including any anti-cancer treatment (surgery, chemotherapy, radiotherapy, biological therapy or any combination of these) and PROs. Exclusion criteria were non-English language systematic reviews and the absence of patient-reported outcomes (i.e. HRQOL reported by proxies or physicians). The Cochrane Library, PubMed, Ovid, PsycINFO and EMBASE databases were searched using the following broad search terms: (“neoplasm$” OR “cancer”) AND [(“quality of life” OR “HRQOL” OR “PRO$”) AND (“PRO measure$” OR “instrument$”)] AND (“clinical trial$” OR “randomised controlled trial$”) AND [(“review$”) OR (“literature review$”) OR (“systematic review$”)]. The publication type was restricted to full manuscripts (excluding textbooks, abstracts and unpublished manuscripts), and the search was limited to systematic reviews of oncology RCTs involving HRQOL data published after 2000. An overview of the search process is shown in Fig. 1. In summary, article titles and abstracts were screened against the search criteria; full articles were subsequently screened for inclusion in the study and duplicate references removed. The reference sections of each review publication were inspected to identify any potential overlap in the studies included.
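To illustrate the Boolean structure of this search, the following minimal sketch assembles an equivalent query string in Python. It is an illustration only, not the original search procedure: the grouping of the instrument terms reflects one reading of the bracketing above, the helper function name is invented, and wildcard syntax (“$” vs “*”) differs between database platforms.

```python
# Minimal sketch: assemble the Boolean search string described above.
# Term groups mirror the text; grouping and wildcard syntax are assumptions.

def or_group(terms):
    """Join terms into a parenthesised OR group of quoted phrases."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

condition  = or_group(["neoplasm$", "cancer"])
outcome    = or_group(["quality of life", "HRQOL", "PRO$"])
instrument = or_group(["PRO measure$", "instrument$"])
design     = or_group(["clinical trial$", "randomised controlled trial$"])
review     = or_group(["review$", "literature review$", "systematic review$"])

query = f"{condition} AND ({outcome} AND {instrument}) AND {design} AND {review}"
print(query)
```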

Fig. 1 Overview of the database search

For each included systematic review, the cancer type and patient numbers were recorded, along with the quality of HRQOL data reporting (from the EORTC QLQ-C30 and/or FACT-G; Footnote 1), which was evaluated using a standardised checklist [9]. This checklist consists of the fundamental and critical issues that a well-designed trial should address in order to produce reliable PRO data. It covers four key areas (conceptual, measurement, methodology and interpretation) comprising 11 criteria (see Table 2).

The number of criteria reported for each study was summed to provide an overall "robustness" score [9], categorised as probably robust for scores of 8–11 on the checklist, limited robustness for scores of 5–7, and very limited robustness for scores of 0–4; that is, only studies reporting at least 8 of the 11 criteria were considered probably robust. This method has been used previously to categorise the standard of reporting of HRQOL data in clinical trials [9, 10].
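The sketch below expresses this scoring rule in Python, assuming each trial is represented as a mapping from the 11 checklist criteria to a reported/not-reported flag; the criterion labels in the example are illustrative and do not reproduce the exact wording of the Efficace checklist.

```python
# Minimal sketch of the robustness scoring described above: sum the number of
# reported criteria (0-11) and map the total to a category using the cut-offs
# 8-11 (probably robust), 5-7 (limited), 0-4 (very limited).

def robustness_category(criteria_reported):
    """Return (score, category) for a dict of criterion -> bool (reported)."""
    score = sum(bool(v) for v in criteria_reported.values())
    if score >= 8:
        category = "probably robust"
    elif score >= 5:
        category = "limited robustness"
    else:
        category = "very limited robustness"
    return score, category

# Hypothetical trial reporting 9 of the 11 criteria (labels are illustrative).
example_trial = {
    "rationale for instrument selection": True,
    "a priori hypothesis": False,
    "psychometric properties": True,
    "cultural validity": True,
    "adequacy of domains": True,
    "instrument administration": True,
    "baseline compliance": True,
    "timing of assessments": True,
    "missing data documentation": False,
    "clinical significance": True,
    "presentation of results": True,
}
print(robustness_category(example_trial))  # -> (9, 'probably robust')
```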

Results

Quality of reporting

A total of eight systematic reviews [10–17] were identified in which HRQOL data had been recorded in RCTs (Fig. 1). Three of the reviews did not record all 11 criteria [10, 15, 16]. The reviews covered a variety of cancers including non-small cell lung and colorectal cancer, leukaemia, prostate cancer and multiple myeloma, as well as surgical oncology [10], representing a total of 101 trials. The FACT-G had been utilised in 21/101 trials (20.8 %), the EORTC QLQ-C30 in 78/101 (77.2 %), and 2 trials (~2 %) had employed both instruments. A total of 34,616 patients had completed the instruments (32 % FACT-G, 64 % EORTC QLQ-C30; Footnote 2). A summary of the trials is shown in Table 1.

Table 1 Systematic reviews and details

The proportions of trials reporting each of the 11 criteria are shown in Table 2 for the two instruments. Given that not all of the systematic reviews recorded every criterion, the proportions were also adjusted for the difference in the total number of trials (i.e. the denominator was adjusted); this had little impact on the results and these data are therefore not shown. The table shows considerable variation in how the criteria were reported: psychometric properties, adequacy of domains, timing of assessments and presentation of results were reported in more than two-thirds of trials. In contrast, clinical significance, the rationale for instrument selection and an a priori hypothesis were reported in fewer than 40 % of trials.

Table 2 Compliance with checklist
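As a simple illustration of the denominator adjustment described above, the sketch below compares a criterion's reporting rate calculated against all trials with the rate calculated against only the trials whose reviews recorded that criterion; the counts are hypothetical and are not taken from Table 2.

```python
# Hypothetical illustration of the denominator adjustment: values are invented
# and do not correspond to any row of Table 2.

trials_total = 101        # all trials across the eight reviews
trials_assessed = 95      # hypothetical: trials in reviews that recorded this criterion
criterion_reported = 40   # hypothetical: trials reporting the criterion

unadjusted = criterion_reported / trials_total     # denominator = all trials
adjusted = criterion_reported / trials_assessed    # denominator = trials actually assessed
print(f"unadjusted: {unadjusted:.1%}, adjusted: {adjusted:.1%}")
```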

Efficace et al. [9] have defined the following criteria as critical to the reporting of HRQOL data: psychometric properties, baseline compliance and missing data documentation. At least 50 % of trials across all eight reviews included these three criteria for both instruments, and for the EORTC QLQ-C30, at least two of these criteria were reported in over 70 % of trials.

Instrument comparison

There was a considerable degree of agreement between the two instruments, with 9 of the 11 criteria falling within a 10 % difference. However, two criteria demonstrated larger differences: cultural verification of instruments (26 % difference) and baseline compliance (19 % difference). In both instances, these criteria were reported more frequently for the EORTC QLQ-C30. It should be noted, however, that the 95 % confidence intervals were particularly wide for these two criteria, and this result should therefore be interpreted with caution.
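As a rough indication of why such intervals can be wide when one instrument contributes relatively few trials, the sketch below computes a 95 % confidence interval for a difference in reporting proportions, assuming a standard Wald-type interval for two independent proportions; the counts are hypothetical and are not taken from Table 2, and the original analysis may have used a different interval method.

```python
# Hypothetical Wald-type 95% CI for a difference in reporting proportions.
# Counts are invented for illustration only.

from math import sqrt

def diff_proportion_ci(x1, n1, x2, n2, z=1.96):
    """Return (p1 - p2) and its Wald 95% CI, given reported counts and totals."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, (diff - z * se, diff + z * se)

# e.g. a criterion reported in 55/78 EORTC QLQ-C30 trials vs 9/21 FACT-G trials
diff, (lower, upper) = diff_proportion_ci(55, 78, 9, 21)
print(f"difference = {diff:.1%}, 95% CI = ({lower:.1%}, {upper:.1%})")
```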

Robustness

Robustness could be assessed in the five systematic reviews which had included all 11 criteria [11–14, 17], amounting to 75 trials [EORTC QLQ-C30, 59/75 (78.7 %); FACT-G, 16/75 (21.3 %)]. Table 3 provides a summary of the robustness scores. A higher proportion of trials incorporating the FACT-G could be categorised as probably robust compared with EORTC QLQ-C30 trials; on the other hand, a higher proportion of FACT-G trials could also be categorised as having very limited methodological quality in terms of reporting HRQOL data.

Table 3 Robustness of trials

Discussion

This study reviewed and compared the reporting quality of HRQOL data captured through two common PRO instruments, the EORTC QLQ-C30 and FACT-G.

The results demonstrated that the majority of the reporting criteria for a robust design are not being met in trials employing these instruments, with 7 of the 11 criteria reported in fewer than 50 % of trials. Comparison of the two instruments revealed a considerable degree of agreement, with most criteria being reported similarly; however, there were important differences for two domains (baseline compliance and cultural validity), for which reporting of the EORTC QLQ-C30 data was more complete. In terms of the "critical" criteria [9], trials using the EORTC QLQ-C30 reported two of the three criteria in more than 70 % of cases, whereas FACT-G trials did so for only one of the three. The analysis of overall robustness revealed that slightly more of the trials utilising the FACT-G could be classified as probably robust; however, fewer EORTC QLQ-C30 trials than FACT-G trials were of very limited robustness.

These results are in line with previous studies [5, 6] and demonstrate that important criteria, such as a priori hypotheses for HRQOL and the clinical significance of results, are not being recorded in RCTs. These shortcomings in HRQOL reporting will have to be addressed if these data are to support or inform decision-making processes. Given the widespread use of both the EORTC QLQ-C30 and the FACT-G, these results have important implications for the interpretation of HRQOL data in clinical practice and of results from clinical trials.

There are a number of potential limitations to this meta-review. Robustness could not be assessed for all of the systematic reviews, which may have introduced a degree of bias into the results. Furthermore, the reviews were published by different groups, cover different time periods and may have differed in the criteria used to select RCTs. This is to some extent inevitable where decisions (such as the inclusion or exclusion of papers) rely on judgement. However, the reviews were selected on the basis of their use of the Efficace checklist [9], which should have introduced a degree of standardisation.

It is hoped that recent initiatives, such as the CONSORT statement [8] and the CONSORT PRO extension [18], which provide recommendations for the reporting of PROs and HRQOL in clinical trials, together with the International Society for Quality of Life Research's [19] adoption of minimum standards for PROs in comparative effectiveness research, will lead to greater standardisation, improved trial methodology and, consequently, greater value of the information provided by HRQOL data.