Introduction

Patient assessment of health-related quality of life (HRQOL) in cancer clinical trials has increased over the years. With this increasing use, there is also an increasing need to improve the ways in which HRQOL data are interpreted [1]. One approach in this regard is to focus on identifying the smallest difference or change in HRQOL scores over time that may be considered of clinical relevance [2]. Minimal clinically important differences (MCID) are of great utility to interpret changes in HRQOL scores because, although statistical tests provide information regarding the probability that an effect exists, they say nothing about the meaningfulness of the effect [3].

Determining the MCID in HRQOL scores over time in cancer studies can be an important tool for clinicians and researchers to assess the effectiveness of treatment interventions and also for sample size calculations in clinical trial design. However, several studies have indicated that MCID differ depending on various factors, including the cancer population [4], baseline values [5], methods and chosen anchors [6, 7]. This makes evaluating and interpreting changes in HRQOL outcomes a challenge for clinicians [8].

The European Organization for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 (EORTC QLQ-C30) is one of the most widely used patient-reported questionnaires for measuring HRQOL in cancer research. Unfortunately, research has shown that up to 62% of the studies [9] that have used the EORTC QLQ-C30 lack any reporting of clinical significance, even though guidelines for assessing clinical significance do exist. In their study involving patients with breast and small-cell lung cancer, Osoba et al. [10] labelled changes of 5–10 points in EORTC QLQ-C30 scores (range 0–100) as “small” differences, changes in scores of 10–20 as “moderate” differences and those above 20 as “large” differences. King [11] reported similar findings in a pooled set of different studies and cancer sites. Based on these two studies, mean differences of 10 points or more are now regarded as being clinically meaningful when interpreting EORTC QLQ-C30 scores.

To determine a MCID, investigators focus on two approaches: the anchor-based approach and the distribution-based approach [12]. Anchor-based methods link HRQOL measures to indicators that have clinical relevance or to patient-derived ratings of change in health. These anchors have been used previously to interpret EORTC QLQ-C30 scores. Distribution-based approaches use summary statistics calculated from HRQOL data; two commonly used statistics are the effect size and standard error of measurement (SEM).

In this study, we estimated MCID on the EORTC QLQ-C30 Global Health Status (GHS) scores for older patients with cancer with a geriatric risk profile to complement existing literature regarding MCID. As changes in HRQOL are strongly associated with the treatment received and its side effects, the study was sub-divided into a chemotherapy and surgery group.

As a secondary analysis, a responder analysis was carried out to demonstrate how baseline patient characteristics were associated with clinically significant HRQOL changes (deterioration vs. no deterioration) as defined by the MCID. Knowing which baseline patient characteristics are most likely to contribute to a patient’s HRQOL will allow health practitioners to make better-targeted treatment decisions.

Patients and methods

Data

Data are derived from a prospective, multicentre, observational cohort study, involving 9102 patients from 22 Belgian hospitals (8 academic and 14 non-academic) over a period of 4 years (2012–2015). Patients included were at least 70 years of age and had newly diagnosed or progressive invasive tumours (solid and hematologic malignancies) at inclusion. Patients with local basal or spinocellular carcinoma of the skin and treated with local therapy only were excluded.

The study focused mainly on patients who were scheduled to undergo therapy with significant toxicity and morbidity (i.e. surgery, chemotherapy or radiotherapy) since these therapies were expected to have a potential impact on a patient’s overall health and functioning. Given that the level of toxicity and morbidity is strongly related to the type of treatment [13], the MCID analysis was performed separately for those patients who underwent surgery and those who underwent chemotherapy. In the end, patients receiving radiotherapy alone were not included in the study because the numbers in this category were very limited. In addition, patients receiving any sort of combined treatment (e.g. surgery plus chemotherapy, radiotherapy or antihormonal therapy) were excluded from analysis because it would be impossible to disentangle the separate impact of each modality on the GHS.

At baseline, all patients underwent a geriatric screening with the Geriatric 8 (G8) screening tool. This tool contains eight questions with a total score ranging from 0 to 17. A higher score indicates a better health status. The tool was developed to separate older patients with cancer without a geriatric risk profile who should be able to receive standard treatment (G8 > 14) from those who should undergo a geriatric assessment (GA) to guide tailoring of geriatric interventions and/or treatment decision making (G8 ≤ 14). As the aim of the primary analysis was to investigate the adherence to GA-based recommendations and subsequent interventions, GA measures and HRQOL information (EORTC QLQ-C30 GHS scale) were only collected for those patients with a G8 score ≤ 14. Within the surgery group, follow-up took place after surgery. Within the chemotherapy group, the timing of follow-up depended on the chemotherapy schedule, and as a result the follow-up could have been during or after chemotherapy. The study was approved by the Ethics Committees of all hospitals involved.

The EORTC QLQ-C30 Global Health Status scale

In this study, we focused on the EORTC QLQ-C30 GHS scale, which consists of two questions, “How would you rate your overall health during the past week?” and “How would you rate your overall quality of life during the past week?” Patients answer the two questions by means of 7-point Likert scales and the two scores are combined to define the GHS. The GHS score is linearly transformed to a 0–100 score to facilitate statistical interpretation. A higher GHS score indicates a higher HRQOL. The EORTC GHS scale is one of the most frequently used QLQ-C30 scales and has been used as the primary endpoint in various trials [14,15,16].
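As an illustration of the scoring step, the following minimal sketch assumes the standard linear transformation described in the EORTC QLQ-C30 scoring manual: the raw score is the mean of the two items and the scaled score is (raw − 1) / 6 × 100, where 6 is the range of a 7-point item. The function name `ghs_score` and its parameter names are hypothetical and used only for this example.

```python
def ghs_score(overall_health: int, overall_qol: int) -> float:
    """Transform the two 7-point GHS items into a 0-100 score.

    Assumption: standard EORTC linear transformation, item range = 7 - 1 = 6.
    """
    for item in (overall_health, overall_qol):
        if not 1 <= item <= 7:
            raise ValueError("GHS items must lie between 1 and 7")
    raw = (overall_health + overall_qol) / 2   # raw score: mean of the two items
    return (raw - 1) / 6 * 100                 # linear rescaling to 0-100


print(ghs_score(5, 6))  # 75.0
```

Under this transformation, a one-point shift on a single item moves the GHS score by 100 / 12 ≈ 8.33 points, which is useful to keep in mind when reading MCID estimates of 5 to 11 points.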

Geriatric assessment as clinical anchor

The anchor-based approach to establish MCID requires an anchor that is clearly definable and interpretable by clinicians, has clinical relevance and is strongly correlated with a HRQOL measure, in our study the EORTC QLQ-C30 GHS scale. GA is commonly used by clinicians to obtain a better view on the global health and functional status and reserve capacities of older patients with cancer [17]. GA is a multidimensional, interdisciplinary patient evaluation that leads to the identification of the general health status including medical, cognitive, social, nutritional and psychological parameters and is used by clinicians to reveal geriatric problems. Previous studies have shown a correlation between GA measures, ECOG PS and the GHS scale [18,19,20]; GA measures and ECOG PS are therefore considered in this study as possible anchors that could guide interpretation of HRQOL scores.

Data from the ECOG Performance Scale (PS) and the following seven GA measures were collected at baseline and at follow-up and were further explored as possible anchors in our study: the Mini Mental State Examination (MMSE) [21], Katz’s Activities of Daily Living (ADL) [22], Lawton’s instrumental Activities of Daily Living (iADL) [23], Mini Nutritional Assessment-Short Form (MNA-SF) [24], the 15-item Geriatric Depression Scale (GDS15) [25], and Visual Analogue Scales (VAS) for Pain and Fatigue.

The ECOG Performance Scale ranges from 0 (fully active, able to carry on all pre-disease performance without restriction) to 4 (bedbound) and is known to be a good clinical anchor for the EORTC QLQ-C30 GHS scale. The Mini Mental State Examination is a 30-point questionnaire measuring cognitive impairment. A score greater than or equal to 24 points indicates normal cognition. Below this, scores can indicate severe (≤ 9 points), moderate (10–18 points), or mild (19–23 points) cognitive impairment. The Activities of Daily Living scale ranges from 6 (fully independent) to 24 (fully dependent), with dependency defined as a score > 6.

Lawton’s instrumental Activities of Daily Living summary score ranges from 0 (low function, dependent) to 8 (high function, independent) for women and 0 through 5 for men and is rated either fully independent (8 for women or 5 for men) or dependent (< 8 for women or < 5 for men). iADL differs from ADL as these activities require more complex thinking skills. The Mini Nutritional Assessment-Short Form (score 0–14) is a validated nutrition screening and assessment tool that can identify geriatric patients aged 65 and above who are malnourished or at risk of malnutrition (< 12) versus those patients who have good nutrition (≥ 12). Risk for depression is measured by the GDS15, whereby a score of 0–4 indicates no risk and a score above 4 indicates risk. The VAS for Fatigue is a self-reported measure consisting simply of a 10-cm line marked “no fatigue” at one end and “worst possible fatigue” at the other, which the patient marks to indicate the degree of fatigue s/he is experiencing. The following guideline has been recommended for labelling results [26]: no fatigue (0–0.4 cm), mild fatigue (0.5–4.4 cm), moderate fatigue (4.5–7.4 cm), and severe fatigue (7.5–10 cm). A similar interpretation can be followed for the VAS for Pain.

Statistical analysis

Linear and logistic regression models were used to test the statistical association between baseline mean GHS scores and the main socio-demographic, clinical and GA characteristics. The association was assessed with the least mean square difference (β), its standard deviation (SD) and the p-value of the Wald χ2 statistic. Missing values were treated as a separate category. The level of significance was set at p < 0.05.

The t test was used to assess whether the mean GHS change was statistically significant in each treatment group. The Chi-square test was used to assess whether the mean GHS change in the surgery group differed significantly from that seen in the chemotherapy group and whether patients who reported HRQOL information differed from those who did not report HRQOL for each chosen clinical anchor.

The polyserial correlation coefficient was used to assess the correlation between mean GHS and clinical anchor scores at both time points and for change over time. The polyserial correlation is a generalization of the biserial correlation and is used when one of the variables has been ordered in categories and the other variable has a normal distribution [27]. A correlation of at least 0.30 was considered acceptable for a clinical anchor [4]. For each retained clinical anchor, GHS MCID were calculated using data from those patients who provided information on both the anchor and GHS at baseline and follow-up. For each patient, we calculated the GHS change over time and assigned the patient to one of the three clinically meaningful categories (improvement, stable, deterioration) for the clinical anchors selected as reported in literature. We then obtained estimates of MCID by calculating the difference in mean HRQOL changes by adjacent clinical categories, that is, “improvement” versus “stable” and “stable” versus “deterioration”.
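The adjacent-category calculation can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function name `anchor_based_mcid` and the per-patient change scores are hypothetical, and the estimates are returned as absolute differences in mean GHS change.

```python
from statistics import mean


def anchor_based_mcid(changes_by_category: dict[str, list[float]]) -> dict[str, float]:
    """Difference in mean GHS change (T1 - T0) between adjacent anchor categories."""
    means = {cat: mean(values) for cat, values in changes_by_category.items()}
    return {
        # "improvement" versus "stable"
        "improvement": abs(means["improvement"] - means["stable"]),
        # "stable" versus "deterioration"
        "deterioration": abs(means["stable"] - means["deterioration"]),
    }


# Hypothetical GHS change scores, grouped by the patient's anchor category
example = {
    "improvement": [16.67, 8.33, 25.0],
    "stable": [8.33, 0.0, 16.67],
    "deterioration": [-8.33, 0.0, -16.67],
}
print(anchor_based_mcid(example))
```

Because each estimate is a difference between two group means, it depends on the mean change in the "stable" group as well as on the group that changed.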

Distribution-based methods were used to evaluate the anchor-based MCID. For comparison purposes, four distribution-based approaches were applied, set respectively at 0.2 SD, 0.3 SD, 0.5 SD and the SEM. To calculate the change score using the effect size, one multiplies the SD of the GHS baseline scores by 0.2, defined as a small effect size [28], or alternatively by 0.3 or 0.5 [29, 30]. The SEM, which measures the precision of the HRQOL instrument, is calculated by the formula \(\text{SEM} = \text{SD}\sqrt{1 - r_{\text{s}}}\), where SD is the standard deviation of HRQOL scores and \(r_{\text{s}}\) is the reliability coefficient. The \(r_{\text{s}}\) for the EORTC QLQ-C30 GHS has been defined as 0.85 in a previously published study [31]. Thresholds of 1 SEM have also been used to estimate MCID [32].
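The four distribution-based thresholds can be computed as in the sketch below. The baseline SD of 21.74 (surgery group) and the reliability coefficient of 0.85 are the values reported in this article; using the baseline SD in the SEM formula is an assumption of this example, and the function name `distribution_based_thresholds` is hypothetical.

```python
import math


def distribution_based_thresholds(sd: float, reliability: float) -> dict[str, float]:
    """Effect-size multiples of the SD and the standard error of measurement."""
    return {
        "0.2 SD": 0.2 * sd,
        "0.3 SD": 0.3 * sd,
        "0.5 SD": 0.5 * sd,
        # SEM = SD * sqrt(1 - r_s); assumption: SD is the baseline SD
        "SEM": sd * math.sqrt(1 - reliability),
    }


# Surgery group: baseline SD = 21.74, reliability r_s = 0.85
for name, value in distribution_based_thresholds(21.74, 0.85).items():
    print(f"{name}: {value:.2f}")
```

With a reliability of 0.85, the SEM equals roughly 0.39 SD, so it falls between the 0.3 SD and 0.5 SD thresholds.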

A final analysis included a responder analysis to determine baseline patient characteristics associated with HRQOL changes according to the reported MCID for deterioration using logistic regression. A dummy variable was created to categorize patients whose HRQOL deteriorated (coded as 1) relative to patients whose HRQOL did not deteriorate (stable or improved) over the course of their treatment (coded as 0) using the reported MCID from our analysis. The association was assessed with the odds ratio (OR), its confidence interval and the p-value (set at 5%) of the Wald χ2 statistic. OR measures the strength of association between an exposure value and an outcome, in our case HRQOL deterioration versus no HRQOL deterioration. An OR equal to 1 means that the exposure does not affect the odds of an outcome. An OR > 1 means that the exposure is associated with higher odds of outcome, while an OR < 1 means that the exposure is associated with lower odds of outcome [33].
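The coding of the responder variable and the reading of an odds ratio can be illustrated with the sketch below. It is a simplification under stated assumptions: the study used logistic regression, whereas this example computes an unadjusted odds ratio from a 2 × 2 table; the inclusive threshold (a GHS drop of at least the MCID), the function names and the counts are all hypothetical.

```python
def deteriorated(ghs_change: float, mcid: float) -> int:
    """Dummy variable: 1 if GHS dropped by at least the MCID, otherwise 0.

    Assumption: the threshold is inclusive (change <= -MCID).
    """
    return 1 if ghs_change <= -mcid else 0


def odds_ratio(exposed_events: int, exposed_total: int,
               unexposed_events: int, unexposed_total: int) -> float:
    """Unadjusted odds ratio of deterioration, exposed versus unexposed."""
    odds_exposed = exposed_events / (exposed_total - exposed_events)
    odds_unexposed = unexposed_events / (unexposed_total - unexposed_events)
    return odds_exposed / odds_unexposed


# Hypothetical GHS changes coded against an MCID of 9 points
print([deteriorated(change, 9) for change in (-16.67, -8.33, 0.0, 8.33)])  # [1, 0, 0, 0]

# Hypothetical counts: 20 of 100 exposed and 30 of 100 unexposed patients deteriorated
or_value = odds_ratio(20, 100, 30, 100)
print(f"OR = {or_value:.2f}, odds are {(1 - or_value) * 100:.0f}% lower in the exposed group")
```

An OR below 1 converts to a percentage reduction in odds as (1 − OR) × 100%, which is how such reductions are usually reported.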

Results

Patient population

In total, 9102 patients were initially enrolled in the study, of which 3195 were excluded because their G8 score was higher than 14 and as per protocol no follow-up data were collected for these patients. We also excluded (n = 3441) any patients who were expected to receive any sort of combined therapy or any other treatment different from surgery or chemotherapy. This left a total of 2466 patients included in the study, of which 1214 were scheduled to receive surgery alone and 1252 were scheduled to receive chemotherapy alone. However, HRQOL data were not obtained for 184 patients and 858 patients at baseline and at follow-up, respectively. As a result, the final analysis involved 1424 patients, of whom 741 had received surgery and 683 chemotherapy. Figure 1 shows the flowchart of patient inclusion. To check for possible selection bias due to missing HRQOL forms, we applied the Chi-square test, which showed no statistically significant differences between patients who reported HRQOL information and those who did not for each of the anchors (ECOG PS, p = 0.0718; GDS15, p = 0.097; VAS Fatigue, p = 0.469).

Fig. 1

CONSORT diagram showing the flow of patient inclusion

Patient characteristics

Table 1 shows the mean baseline GHS score for selected socio-demographic, clinical variables and geriatric assessment tools and the outcomes of the regression analyses assessing the association between baseline GHS score and socio-demographic, clinical and GA patient characteristics in each treatment group. The study population reported significantly better baseline GHS in both groups with increasing age. This association was mainly driven by a small group of the oldest patients who reported a high GHS score. In the surgery group, patients who relapsed or progressed had a significantly better baseline GHS than newly diagnosed patients. Patients diagnosed with hematologic cancer had a significantly better GHS than those with carcinoma, and patients with unknown primary cancer had a statistically significantly lower GHS than patients with breast cancer. However, these findings were based on a small sample of patients with hematologic or unknown primary cancer. Patients with stage II cancer had a worse baseline GHS than those with stage I cancer. In the chemotherapy group, patients with digestive cancer had a statistically significantly better GHS than patients who had breast cancer.

Table 1 Mean baseline GHS score and linear regression analysis to assess the association between baseline GHS and socio-demographic, clinical and GA patient characteristics in each treatment group

For most of the GA variables (except MMSE), patients who had a better score on their baseline geriatric assessment also reported a statistically significant (all p < 0.05) better GHS in each treatment group.

Correlation between possible anchors and HRQOL

Table 2 displays the mean and SD of the GHS score at baseline (T0), at follow-up at 3 months (T1) and for change over time (T1–T0) for the surgery and chemotherapy groups. Within both treatment groups, patients reported an improved GHS at follow-up (62.21 and 53.77 respectively). Patients who underwent surgery had a statistically significant change in GHS score (6.03; p < 0.001) over time, while patients who underwent chemotherapy did not (1.69; p = 0.083). This confirms our rationale to focus on these two treatment groups separately in our study. In addition, Table 2 reports the correlation coefficients between the mean GHS scores and potential clinical anchors at baseline and follow-up at 3 months. Only the variables ECOG PS, GDS15 and VAS Fatigue met the 0.30 threshold in each treatment group at both time points and were further analysed as clinical anchors.

Table 2 Mean (SD) GHS score by treatment at baseline (T0), follow-up at 3 months (T1) and over time (T1–T0) and correlation coefficients for GHS score with ECOG PS, ADL, iADL, VAS Pain, VAS Fatigue and GDS15 at baseline and follow-up

Calculation of MCID

Mean (SD) of GHS change scores in the three anchor-defined categories for ECOG PS, GDS15 and VAS Fatigue respectively and the difference in mean change scores between adjacent categories in each treatment group are presented in Table 3. Changes in ECOG PS were categorized into three groups: deterioration (PS worsened by one category), stable (PS stayed the same) and improvement (PS improved by one category). Changes in PS of two or more categories had to be excluded (98 in surgery and 117 in chemotherapy group respectively) from the analysis as they were considered too large for determining “minimal” clinical change. According to a previous study [34], any change of two points on the GDS15 score represents a clinical change. Therefore, we categorized the GDS15 into the following three groups: deterioration (GDS15 increased by 2 points), stable (GDS15 increased or decreased by max. 1 point) and improvement (GDS15 decreased by 2 points). Changes of more than two points in the GDS15 were excluded (241 in surgery and 222 in chemotherapy group respectively) from the analysis as they were considered too large for determining “minimal” clinical change. Vinker et al. [35] reported a minimum clinically important difference of 1.37 for a 10 cm VAS Pain, so we categorized VAS Fatigue into the following groups: deterioration (VAS Fatigue increased by 2 points), stable (VAS Fatigue decreased or increased by max. 1 point), and improvement (VAS Fatigue decreased by 2 points). Here too, changes of more than 2 points in the VAS scale were excluded (309 in surgery and 298 in chemotherapy group respectively) from the analysis as they were considered too large for determining “minimal” clinical change. As an example for the surgery group, in Table 3 the first difference (absolute value) in GHS mean change of adjacent categories (stable vs. improvement) for ECOG PS is the difference between 8.33 and 13.24 (= 4.91) and the second difference (stable vs. deterioration) is the difference between 8.33 and − 2.24 (= 10.57).
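The categorization rules described above can be summarized in a short sketch. It assumes that anchor scores are recorded as whole points and that a higher score means a worse state for all three anchors; the names `ANCHOR_STEP` and `categorize` are hypothetical.

```python
from typing import Optional

# Size of the change (in points) that defines improvement or deterioration,
# as described in the text: one category for ECOG PS, two points otherwise
ANCHOR_STEP = {"ECOG PS": 1, "GDS15": 2, "VAS Fatigue": 2}


def categorize(anchor: str, baseline: int, follow_up: int) -> Optional[str]:
    """Assign a patient to an anchor-defined category, or None if excluded."""
    step = ANCHOR_STEP[anchor]
    change = follow_up - baseline          # positive = worse for all three anchors
    if abs(change) > step:
        return None                        # too large for a "minimal" change
    if change == step:
        return "deterioration"
    if change == -step:
        return "improvement"
    return "stable"


print(categorize("ECOG PS", 1, 2))      # deterioration
print(categorize("GDS15", 5, 4))        # stable
print(categorize("VAS Fatigue", 2, 5))  # None (excluded)
```

Patients for whom the function returns `None` correspond to the exclusions reported above, so each anchor yields its MCID estimates from a different subset of patients.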

Table 3 Mean (SD) of GHS scores in the three anchor-defined categories (improvement; stable; deterioration) and the difference in mean change scores between adjacent categories when anchored to ECOG PS, GDS15 and VAS Fatigue in each treatment group

Figure 2 displays the anchor-based MCID estimates adjacent to the distribution-based estimates using SEM, 0.5 SD, 0.3 SD and 0.2 SD in each treatment group. A 0.2 SD of the variable GHS at baseline is defined as the smallest clinical effect. Anchor-based MCID values lower than the smallest clinical effect defined by 0.2 SD of the GHS scores at baseline are considered not clinically meaningful and are therefore of no further relevance. In our example, the smallest clinical effect within the surgery group is 4.35, obtained by multiplying the SD of the GHS baseline score (21.74) by 0.2, and in the chemotherapy group it is 4.51 [0.2 × SD at T0 (22.54)]. As a result, within the surgery group, MCID for improvement was set at 5 using ECOG PS and GDS15 as clinical anchors and MCID for deterioration was set at 9 using VAS Fatigue and 11 using ECOG PS as clinical anchors. Within the chemotherapy group, MCID for improvement was set at 8 using ECOG PS as clinical anchor and MCID for deterioration was set at 5 using VAS Fatigue and 7 using ECOG PS.
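The retention rule can be sketched as follows, using the two ECOG PS differences for the surgery group given earlier (4.91 and 10.57) and the baseline SD of 21.74. Rounding the retained estimate to the nearest whole point is an assumption inferred from the reported values, and the function name `retain_mcid` is hypothetical.

```python
from typing import Optional


def retain_mcid(anchor_estimate: float, baseline_sd: float,
                min_effect: float = 0.2) -> Optional[int]:
    """Keep an anchor-based estimate only if it reaches min_effect * baseline SD."""
    threshold = min_effect * baseline_sd
    if abs(anchor_estimate) < threshold:
        return None                       # below the smallest clinical effect
    return round(abs(anchor_estimate))    # assumption: reported as whole points


print(retain_mcid(4.91, 21.74))   # 5  (improvement, ECOG PS, surgery)
print(retain_mcid(10.57, 21.74))  # 11 (deterioration, ECOG PS, surgery)
print(retain_mcid(3.00, 21.74))   # None (hypothetical estimate below 4.35)
```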

Fig. 2

The anchor-based MCID estimates adjacent to the distribution-based estimates using SEM, 0.5 SD, 0.3 SD and 0.2 SD in each treatment group

Responder analysis

To assess the association between baseline patient characteristics and HRQOL deterioration, MCID for deterioration was set initially at 9 for the surgery group and 5 for the chemotherapy group as estimated by VAS Fatigue.

Applying the above MCID of 9 for deterioration to our surgery group, 23.48% (N = 174) of the patients reported a deterioration, whereas 76.52% (N = 567) reported an improved or stable HRQOL over time. Applying the above MCID of 5 for deterioration to our chemotherapy group, 36.90% (N = 252) of the patients reported a deterioration, whereas 63.10% (N = 431) reported an improved or stable HRQOL over time. When MCID for deterioration was set at 11 in the surgery group and 7 in the chemotherapy group using ECOG PS as clinical anchor, the exact same results were reported (Table 4).

Table 4 Logistic regression models to assess the association between HRQOL deterioration and baseline socio-demographic, clinical and GA patient characteristics

In the surgery group, the odds of experiencing a HRQOL deterioration during treatment were 37% [(1 − 0.63) × 100%] lower in patients who reported pain versus those who reported no pain at baseline and 33% lower in patients with good nutrition versus those patients with poor nutrition. This means that patients with pain or poor nutrition at baseline were likely to experience a HRQOL benefit (or at least stable) from their surgery. In the chemotherapy group, the odds of experiencing a HRQOL deterioration during treatment were 38% [(1 − 0.62) × 100%] lower in patients who reported fatigue versus those who reported no fatigue at baseline. When we performed a sensitivity analysis (see Table 1 Online-Only Supplementary Material), the results stayed the same.

Discussion

The aim of our analysis was to determine minimal clinically important differences in HRQOL as reported by patients undergoing surgery or chemotherapy. Quality of life was measured with the EORTC QLQ-C30 GHS scale. Using the anchor-based approach, MCID for improvement in the surgery group was set at 5 using ECOG PS and GDS15 as clinical anchors. MCID for deterioration was set at 11 or 9 using ECOG PS or VAS Fatigue respectively. Within the chemotherapy group, MCID for improvement was 8 using ECOG PS as clinical anchor and MCID for deterioration was 7 or 5 using ECOG PS or VAS Fatigue respectively. That MCID for improvement and deterioration differ was also reported in a large systematic review by Cocks et al. [36].

Although MCID might differ by patient population [5, 37], our reported MCID estimates are in line with the 5–10% range considered as a clinically significant HRQOL change among younger patients with cancer [5, 6, 38]. This indicates that older patients with cancer experience quality of life changes similar to those of their younger peers during their treatment. When quality of life is an important goal of the treatment, our findings provide a rationale not to discriminate between older and younger patients with cancer in the treatment decision making process.

Applying the above MCID estimates for deterioration to our dataset, our study found that patients who reported pain or risk for malnutrition (in the surgery group) or fatigue (in the chemotherapy group) at baseline had higher odds of experiencing a HRQOL benefit (or at least a stable HRQOL) than patients who reported no pain, good nutrition or no fatigue at baseline. Our results, using the MCID reported in this study, demonstrate that additional treatment benefits, such as less pain and better quality of life, are worth bearing in mind by clinicians when it comes to making treatment decisions.

Except for the symptoms mentioned above, no other GA variables at baseline were associated with a HRQOL change. Our study showed homogeneity in the elderly population regarding the impact of treatment on their HRQOL and suggests that no further disaggregation based on geriatric assessment is necessary in this population for treatment management.

Our findings constitute further evidence of the potential power of patient-reported outcomes, in particular quality of life measures, to improve or support clinical decision making. However, our study is not without limitations. One limitation is that some correlations between the clinical anchors used in our study and EORTC QLQ-C30 GHS scores were not that strong, with the possible result that some anchors did not perform well in defining the MCID and were thus excluded from interpretation. Nonetheless, excluding them based on the fact that the values should be at least equal to 0.2 SD is generally a well-accepted approach [28, 29, 39]. Another limitation in our study is that we did not account for a possible response shift (RS) [40]. A RS may occur when a patient adjusts to a changed health status. This altered meaning of HRQOL over time poses a challenge to clinicians in terms of how they interpret changes in HRQOL. Changes can be captured with a retrospective pre-test (then-test) design, whereby patients are asked at T1 to provide a new score for T0, or with more complex models [41,42,43]. Ousmen et al. [44] found in their study of primary breast cancer patients a moderate, but not significant, effect (0.25) of RS in case of deterioration and a small, but not significant, effect (0.08) of RS in case of improvement for the EORTC QLQ-C30 GHS scale. The authors also acknowledge the potential bias introduced by differential item functioning, whereby socio-demographic, cultural and other differences in the study population may impact the performance on the EORTC QLQ-C30 scale. Different MCID therefore might have been reported within subgroups. Unfortunately, our sample size was below the recommended size [45] to carry out this research, but further research on this topic and how it impacts MCID estimates is encouraged.

Our study had a substantial number of patients with missing HRQOL information at follow-up. Those patients with no follow-up had a statistically lower performance level (ECOG PS) in combination with worse pain and malnutrition (surgery group) or worse fatigue (chemotherapy group) than those patients who reported at follow-up. This might indicate that our results regarding HRQOL deterioration are mainly driven by older patients with a good functioning level combined with low symptom burden at baseline. Finally, our findings are derived from an observational study, where patients were allocated to the chemotherapy or surgery group based on clinical decisions and where treatment schedules were different. This selection bias hampers any comparison between the two groups regarding the prognostic value of patient baseline characteristics and HRQOL deterioration and might explain GHS differences over time between the two treatment groups.

With respect to this study’s strengths, our findings are based on the largest sample in literature at present for determining MCID. Moreover, our dataset is unique as it incorporates an older population with cancer, with a fair proportion of low ECOG PS scores, which is commonly excluded from this kind of study. This makes our results more generalizable to the global cancer population compared to MCID derived from clinical trials with predominantly higher scoring and younger patients and therefore adds new insights to the existing literature on MCID. To date, the availability of one single, universal MCID is still hampered by the variation in MCID estimates, and MCID should therefore be applied carefully, taking the characteristics of the population under study into consideration. The development of an evidence-based protocol as proposed by Musoro et al. [46] will facilitate the further use of MCID within specific cancer disease sites.

Our results should encourage clinicians to use these MCID to assess possible changes in patient management while patients are undergoing treatment or to aid in treatment decisions based on an initial screening of the patient before treatment starts. In addition, clinical researchers can use these MCID to determine the sample size when HRQOL, as measured by the EORTC GHS scale in particular, is considered to be an endpoint in a clinical study; clearly, the smaller the MCID, the bigger the sample size needed to detect these clinical changes.
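The relationship between MCID and sample size can be made concrete with a minimal sketch. It assumes a two-arm comparison of mean GHS scores with equal group sizes, a common SD, a two-sided α of 0.05 and 80% power, using the normal-approximation formula n = 2 (z₁₋α/₂ + z₁₋β)² SD² / MCID² per group. These design choices are illustrative assumptions, not part of the study, and the function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(mcid: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm needed to detect a mean difference equal to the MCID."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)            # desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / mcid) ** 2)


# Baseline SD of 21.74 (surgery group); MCID of 5 versus 11 points
for mcid in (5, 11):
    print(f"MCID = {mcid}: {n_per_group(mcid, 21.74)} patients per group")
```

Because the required sample size scales with 1 / MCID², roughly halving the MCID from 11 to 5 points increases the number of patients needed per group almost fivefold.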