Introduction

The incidence of primary brain tumors is low compared with other cancers, such as lung or breast cancer [1]. Despite multimodal treatment, including surgery, radiotherapy, and chemotherapy, patients with gliomas (the most common primary brain tumor) cannot be cured of their disease and have a poor prognosis [2–4]. For these patients, the quality of survival is arguably at least as important as the duration of survival [5]. The benefits of multimodal treatment strategies, in terms of prolonged survival or delay of progression, have to be carefully weighed against the side effects of the treatment, which may adversely influence the patient’s functioning and well-being during his/her remaining life span [6].

Measuring a brain tumor patient’s functioning and well-being goes far beyond assessing (progression-free) survival or tumor response to treatment on imaging, which are considered “hard,” traditional outcome measures. According to the World Health Organization International Classification of Functioning, Disability and Health (2001), disability involves dysfunction at three levels. Impairment is the most basic level (e.g., hemiparesis). At the next level, activity limitations reflect the consequences of the impairment in daily life (e.g., the patient with hemiparesis is unable to climb stairs), whereas participation restrictions reflect how the disability affects the patient’s well-being and social interactions (e.g., the patient who cannot climb stairs may be forced to move to another home or to a different floor).

A more integrated way to measure patients’ functioning and well-being is the assessment of a patient’s health-related quality of life (HRQOL). HRQOL is a multidimensional concept covering physical, psychological, and social domains, as well as symptoms induced by the disease and its treatment [7]. During the last few decades, HRQOL has become an important (secondary) outcome measure in clinical trials for brain tumor patients [8–17]. In trials that are palliative in character, maintaining HRQOL is even the most important outcome [18].

HRQOL Measurement Tools

By definition, HRQOL is a patient-reported outcome (PRO) and thus provides the patient’s perspective [19]. There is a wide range of PRO instruments, from one-dimensional measures (assessing one specific aspect of HRQOL, such as fatigue) to multidimensional HRQOL measures [5].

Currently, there is no single gold standard tool to measure HRQOL, and several valid measures of HRQOL in brain tumor patients are available. One HRQOL tool frequently used in cancer patients is the European Organisation for Research and Treatment of Cancer (EORTC) QLQ-C30 [20]. This questionnaire contains 30 items organized into five functional scales, three symptom scales, one global health status/quality of life scale, and several single items. The EORTC QLQ-C30 is often used in conjunction with the brain tumor-specific questionnaire, the EORTC QLQ-BN20 [21, 22]. This questionnaire, developed for and validated in brain tumor patients, consists of 20 items: four multi-item scales on future uncertainty, motor dysfunction, communication deficits, and visual disorders, plus seven single items (e.g., headaches, seizures, and drowsiness).
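To make the scoring of such scales concrete, the following Python sketch applies a linear transformation to a 0-100 range in the way we understand the EORTC QLQ-C30 scoring manual to describe it: the raw score is the mean of the scale's items, functional scales are reversed so that a higher score indicates better functioning, and symptom scales and the global health status/quality of life scale are not reversed. Only a subset of scales is shown; the item-to-scale mapping (assumed here for version 3.0), the function name `score_scale`, and the example responses are illustrative assumptions and should be checked against the official manual before any real use.

```python
from typing import Dict

# Hedged sketch of EORTC QLQ-C30 (assumed v3.0) scale scoring; verify against the official manual.
# Subset of scales: (item numbers, scale type, range of the item response scale).
# Most items are answered on 1-4 (range 3); the two global items on 1-7 (range 6).
SCALES = {
    "physical_functioning": ([1, 2, 3, 4, 5], "functional", 3),
    "role_functioning": ([6, 7], "functional", 3),
    "pain": ([9, 19], "symptom", 3),
    "fatigue": ([10, 12, 18], "symptom", 3),
    "global_health": ([29, 30], "global", 6),
}


def score_scale(responses: Dict[int, int], scale: str) -> float:
    """Linearly transform one scale's complete item responses to a 0-100 score."""
    items, kind, item_range = SCALES[scale]
    raw = sum(responses[i] for i in items) / len(items)  # raw score = mean of the items
    if kind == "functional":
        # Reversed so that a higher score means better functioning.
        return (1 - (raw - 1) / item_range) * 100
    # Symptom scales: higher = more symptoms; global health: higher = better HRQOL.
    return (raw - 1) / item_range * 100


# Hypothetical patient: answer 2 ("a little") on items 1-28 and 5 on the two global items.
patient = {item: 2 for item in range(1, 29)}
patient.update({29: 5, 30: 5})
for name in SCALES:
    print(name, round(score_scale(patient, name), 1))
```

For this hypothetical patient, the functional scales and the global health scale score 66.7 and the symptom scales 33.3. Missing items are not handled here: an unanswered item absent from `responses` raises a `KeyError`.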

Another frequently used cancer-specific HRQOL questionnaire is the Functional Assessment of Cancer Therapy-General (FACT-G), which can be combined with the disease-specific subscale FACT-Brain to provide additional questions for brain tumor patients [23]. The items in these questionnaires measure a range of HRQOL domains, including physical, functional, social, family, and emotional well-being, and the quality of the relationship with the physician. The FACT modules and the EORTC questionnaires differ in focus: the former focus more on psychosocial aspects, whereas the EORTC QLQ-C30 focuses more on symptoms.

A more recently developed PRO used in brain tumor research is the MD Anderson Symptom Inventory Brain Tumor Module (MDASI-BT) [24, 25]. This questionnaire was developed to score both symptom severity and symptom burden or interference with daily living. The MDASI-BT shows similarities with the EORTC QLQ-BN20, as both questionnaires focus on symptoms.

Use of HRQOL in Clinical Practice

As noted above, HRQOL is increasingly assessed in clinical trials for brain tumor patients, and the number of studies including such an assessment is growing rapidly. Results of these HRQOL studies are often used in clinical practice. First, the results may facilitate treatment decision-making by informing physicians about the value and impact of a specific treatment strategy [6] in terms of both survival (overall and progression-free) and quality of life. Second, routine HRQOL assessments in daily clinical practice provide the physician with additional information on the patient’s well-being and can thereby facilitate communication between physician and patient [26–28].

A sine qua non is that the data of such HRQOL assessments are collected, analyzed, and interpreted correctly. However, interpretation of the results may be hampered by methodological issues that are not always taken into account.

Methodological Limitations of HRQOL Measurements in Trials

Selection Bias

One potential source of error in HRQOL assessments is selection bias, a distortion of the results arising from the process by which patients are selected into the study or from an unrepresentative sample that remains for analysis. For example, HRQOL, as well as other outcomes reported in glioma trials, may not represent the entire glioma population because of the stringent inclusion criteria of clinical trials. Cheng et al. [29] reported that patients older than 70 years, with Karnofsky Performance Scale scores below 50 or even 70, or with cognitive impairments are often excluded from participation. These excluded patients are likely to have lower HRQOL scores, so the mean HRQOL scores of the glioma patients included in studies may well overestimate the actual situation of the larger patient population of interest. Moreover, patients with poor health status and/or tumor progression are more likely to drop out of clinical trials than patients with stable disease and/or better performance status [15]. As a result, patients with better health status and a favorable treatment response will be over-represented during follow-up and subsequent analysis, again leading to an overestimation of HRQOL [30].

In many studies, HRQOL is assessed up to, but not beyond, brain tumor progression [9, 13–15, 31, 32]. Important information on HRQOL is therefore lost, as HRQOL evaluated after disease progression may show deterioration on many HRQOL endpoints [31]. To evaluate the impact of late treatment effects and disease progression on HRQOL, future trials should include data collection after tumor progression [33, 34]. This is especially true for brain tumor patients in whom tumor-directed treatment is no longer possible and for whom maintaining HRQOL becomes the main objective. It should be noted, though, that post-progression treatment must be part of the study protocol in order to compare HRQOL outcomes between treatment arms.

Timing of the Assessments

Interpretation of HRQOL data might also be hampered by inappropriate timing of the HRQOL assessments. The optimal timing of these assessments depends on the purpose of the clinical trial. The main objective could be to determine the effect of chemotherapy-induced toxicity on HRQOL, measured at the peak of that toxicity, or to determine whether the treatment has a lasting negative effect on HRQOL, measured once the immediate toxicity has faded [35]. Several studies have examined how the timing of questionnaire administration affects the interpretation of HRQOL outcomes and found that modest changes in the timing of the assessment, or in the time “windows” for completing the questionnaires, could result in substantially different outcomes [36–38]. For example, in patients with brain tumors, HRQOL questionnaires are often administered just before the patient sees the physician to discuss the results of the magnetic resonance imaging scan. At that moment, patients are typically anxious about a possible relapse, which may negatively influence their HRQOL scores. Likewise, administering the questionnaires immediately after the patient has received the results of ancillary investigations might also influence HRQOL scores, because feelings of relief or depression may affect the way patients fill out the questionnaire. Hence, when the objective of the study is to determine the lasting effect of treatment on HRQOL (longitudinal analysis), the optimal moment to administer HRQOL questionnaires in brain tumor patients might be at the time of the magnetic resonance imaging scan or the chemotherapy cycle. However, when the objective is to determine the immediate effect of treatment (e.g., chemotherapy toxicity), HRQOL measurements should be performed shortly after treatment and not one or several weeks later, when the toxicity has faded.

Although the timing of HRQOL assessments is important for the interpretation of the results, in practice, there will inevitably be deviations from the planned data collection. Therefore, a specific time window for the completion of HRQOL questionnaires should be specified to allow for some variability in the timing of the assessments within a trial. Allowing these completion-time windows, with questionnaires completed a few days before to a few days after the planned date, might improve the validity of the results by allowing inclusion of a larger proportion of HRQOL questionnaires (and thus limiting the amount of missing data), thereby facilitating statistical modeling [39]. However, this time window should be carefully defined, taking into account the specific research question and the cyclic changes of the parameters to be studied [37].
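Assigning returned questionnaires to planned assessment points within such a completion-time window can be sketched as follows. The schedule, the symmetric window of plus or minus 7 days, and the rule of keeping the questionnaire closest to the target date are hypothetical choices for illustration, not recommendations taken from the cited studies [37, 39].

```python
from datetime import date
from typing import Dict, List, Optional


def assign_to_timepoints(
    planned: Dict[str, date], completed: List[date], window_days: int
) -> Dict[str, Optional[date]]:
    """Assign each planned assessment the completed questionnaire closest to its
    target date, provided it falls within +/- window_days; otherwise None (missing).

    Assumes the windows of consecutive time points do not overlap, so one
    questionnaire cannot be assigned to two time points.
    """
    assigned: Dict[str, Optional[date]] = {}
    for label, target in planned.items():
        candidates = [d for d in completed if abs((d - target).days) <= window_days]
        assigned[label] = (
            min(candidates, key=lambda d: abs((d - target).days)) if candidates else None
        )
    return assigned


# Hypothetical schedule and completion dates, with a +/- 7-day window.
planned = {
    "baseline": date(2024, 1, 8),
    "week 12": date(2024, 4, 1),
    "week 24": date(2024, 6, 24),
}
completed = [date(2024, 1, 5), date(2024, 4, 5), date(2024, 7, 15)]
result = assign_to_timepoints(planned, completed, window_days=7)
print(result)
```

Here the week-24 questionnaire, returned 21 days late, falls outside the window and is counted as missing; widening the window to 21 days would retain it, illustrating the trade-off between completeness and timing precision. For the baseline assessment, which should precede randomization, an asymmetric window extending only backwards from the planned date may be more appropriate, which would require separate lower and upper bounds.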

Few recommendations have been published to prevent inappropriately timed assessments from hampering the interpretation of HRQOL data. The exact timing of the HRQOL assessments, as well as the completion-time window, should be specified in advance, and adherence to this schedule should be monitored regularly. Furthermore, it is important to perform a baseline measurement before randomization, and thus before the start of treatment, so that the initial HRQOL assessment is not unintentionally influenced by early side effects of treatment [40].

Missing Data

A major source of bias in HRQOL data is missing data, which may seriously hamper the analysis of longitudinal HRQOL data from brain tumor patients. Missing data reduce the statistical power of a study and thus its ability to detect clinically meaningful differences. Data may be missing either as complete questionnaires or as responses within a questionnaire. A complete questionnaire may be missing when a patient misses a clinic visit at a certain time point or has dropped out of the study, whereas a response within a questionnaire is typically missing because the patient unintentionally skipped the item or chose not to answer it for personal reasons [41].
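Because a questionnaire cannot be expected from a patient who has died or left the study, compliance is often expressed relative to the number of questionnaires expected rather than the number of patients enrolled. The sketch below computes this proportion at a single time point; the status codes and example data are hypothetical, and whether patients whose follow-up stopped at progression count as "expected" depends on the trial protocol.

```python
from collections import Counter
from typing import Iterable

# Hypothetical status codes for each patient at one planned assessment time point.
COMPLETED = "completed"
MISSED = "missed"        # alive and on study, but no questionnaire received
DIED = "died"
OFF_STUDY = "off_study"  # e.g., follow-up stopped at progression per protocol


def compliance(statuses: Iterable[str]) -> float:
    """Questionnaires received divided by questionnaires expected.

    Patients who died or left the study are not expected to complete a
    questionnaire, so they are excluded from the denominator.
    """
    counts = Counter(statuses)
    expected = counts[COMPLETED] + counts[MISSED]
    if expected == 0:
        raise ValueError("no questionnaires were expected at this time point")
    return counts[COMPLETED] / expected


week_24 = [COMPLETED] * 6 + [MISSED] * 2 + [DIED] + [OFF_STUDY]
print(compliance(week_24))                         # 0.75
print(week_24.count(COMPLETED) / len(week_24))     # 0.6
```

In this example, compliance among patients still expected to respond is 75%, although questionnaires were returned by only 60% of the patients originally enrolled; reporting both figures makes the shrinking denominator visible.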

The most common reasons for dropout in brain tumor patients are progression of the disease or death [15]. As a result, patients with a shorter life expectancy are always overrepresented in the initial phase of a clinical trial, whereas patients with more favorable prognostic factors and a good treatment response will dominate later at follow-up [30].

Although missing data are a common problem in many research areas, the problem is rarely reported in published studies. Walker et al. [42] systematically assessed the problems involved in collecting HRQOL data from glioma patients and the reasons for missing data. The major source of missing data was administrative failure (72.2%), which could be divided into patient-related and researcher-related factors. Patient-related factors included intentional or unintentional non-attendance, poor patient motivation, misunderstanding of instructions, and incorrect completion of questionnaires. Researcher-related factors included questionnaires not being administered or being administered at the wrong time, insufficient explanation that might result in incorrectly completed questionnaires or missing answers, and reluctance to assess a patient who was deteriorating clinically. Other reasons for missing data were poor health status and patient refusal. Importantly, patients who had been compliant during follow-up lived significantly longer than non-compliant patients. Moreover, non-compliant patients were older, more anxious, and had worse performance status at baseline [42]. These results suggest that longitudinal data should be interpreted with caution. In particular, analyses based only on complete cases will be biased, because they will typically be based on a sample of patients that is younger, healthier, and has a better prognosis.

Rubin [43] first described three ‘missingness mechanisms’: Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR). Data are MCAR when missingness is completely random and does not depend on any observed or unobserved data, such as patient characteristics or HRQOL scores. When data are MAR, missingness depends only on observed data, such as previously observed responses or specific patient characteristics. Lastly, data are MNAR when missingness depends on the unobserved responses themselves, possibly in addition to previously observed responses [44]. Most statistical analyses assume that data are MCAR or MAR [45]. However, MCAR is probably not realistic in clinical trials, given that non-response occurs more frequently when patients have poorer health status. The nature of certain items could also result in non-response that is not MCAR, such as the item on sexuality in the FACT [41]. Moreover, patient and tumor characteristics can significantly influence the missingness mechanism, and the HRQOL scores at each time point can also significantly affect missingness at the next time point [15].
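The practical consequence of these mechanisms can be illustrated with a small simulation. The sketch below generates hypothetical baseline and follow-up scores, removes follow-up questionnaires under each mechanism, and compares two estimates of the mean follow-up score with the mean of the full simulated sample: a complete case analysis and a simple baseline-stratified estimator. The score distributions, dropout probabilities, and cut-off are invented for illustration, and the stratified estimator is only a minimal stand-in for MAR-based methods such as multiple imputation or mixed models, not a method from the cited studies.

```python
import random
import statistics
from typing import List


def simulate_scores(n: int = 20_000, seed: int = 1):
    """Hypothetical baseline and follow-up HRQOL scores (not clipped to 0-100)."""
    rng = random.Random(seed)
    baseline = [rng.gauss(70, 15) for _ in range(n)]
    # Follow-up declines by 10 points on average and is correlated with baseline.
    follow_up = [b - 10 + rng.gauss(0, 10) for b in baseline]
    return baseline, follow_up


def observed_mask(baseline, follow_up, mechanism: str, rng: random.Random) -> List[bool]:
    """True where the follow-up questionnaire is observed under the given mechanism."""
    mask = []
    for b, y in zip(baseline, follow_up):
        if mechanism == "MCAR":
            p_missing = 0.3                     # unrelated to any data
        elif mechanism == "MAR":
            p_missing = 0.6 if b < 60 else 0.1  # depends on the observed baseline only
        elif mechanism == "MNAR":
            p_missing = 0.6 if y < 50 else 0.1  # depends on the unobserved value itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        mask.append(rng.random() >= p_missing)
    return mask


def complete_case_mean(values, mask) -> float:
    return statistics.fmean(v for v, seen in zip(values, mask) if seen)


def stratified_mean(baseline, values, mask, cutoff: float = 60) -> float:
    """Observed means within baseline strata, weighted by each stratum's share of
    all patients; valid when missingness depends on baseline only via the stratum."""
    total = 0.0
    for low in (True, False):
        idx = [i for i, b in enumerate(baseline) if (b < cutoff) == low]
        observed = [values[i] for i in idx if mask[i]]
        total += len(idx) / len(baseline) * statistics.fmean(observed)
    return total


baseline, follow_up = simulate_scores()
true_mean = statistics.fmean(follow_up)
rng = random.Random(2)
results = {}
for mechanism in ("MCAR", "MAR", "MNAR"):
    mask = observed_mask(baseline, follow_up, mechanism, rng)
    results[mechanism] = (
        complete_case_mean(follow_up, mask),
        stratified_mean(baseline, follow_up, mask),
    )
    print(mechanism, round(true_mean, 1), [round(x, 1) for x in results[mechanism]])
```

Under MCAR, the complete case mean is approximately unbiased. Under MAR, it overestimates the mean follow-up score, because patients with low baseline scores drop out more often, but the bias disappears (apart from sampling error) once the observed baseline score is taken into account. Under MNAR, neither estimate removes the bias, because missingness depends on the very values that were not observed.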

When a small proportion of items within a submitted questionnaire is missing, several imputation methods (e.g., replacement with the subject mean, subscale mean, or item mean) yield HRQOL scores similar to those that would have been obtained from a complete dataset. This holds even for items that are MNAR. Even for datasets with a large proportion of missing items, imputation results in better estimates than a complete case analysis, which may produce clinically significant bias [41]. Missing (item) data can also be reduced by electronic data capture, which provides data accuracy similar to that of paper-based methods while reducing the amount of missing (item) data and the time required for data collection [46].
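Replacement with the subject (person) mean can be sketched as below. The rule that a scale score is computed only when at least half of its items have been answered is an assumption borrowed from the EORTC QLQ-C30 scoring convention; the function name and example values are hypothetical, and the subscale-mean and item-mean alternatives are not shown.

```python
from typing import Optional, Sequence


def scale_raw_score(items: Sequence[Optional[float]]) -> Optional[float]:
    """Raw (mean) scale score with person-mean imputation of missing items.

    Missing items (None) are replaced by the mean of the patient's own answered
    items on the same scale. Assumed rule: a score is computed only if at least
    half of the items were answered; otherwise the scale is missing (None).
    """
    answered = [x for x in items if x is not None]
    if not answered or 2 * len(answered) < len(items):
        return None
    person_mean = sum(answered) / len(answered)
    completed = [person_mean if x is None else x for x in items]
    return sum(completed) / len(completed)


print(scale_raw_score([2, None, 3, 3, 4]))  # 3.0
print(scale_raw_score([1, None]))           # 1.0
print(scale_raw_score([None, None, 2]))     # None
```

Imputing the person mean and then averaging gives exactly the mean of the answered items, which is why this form of imputation is often applied implicitly when a scale score is computed from the available items.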

Response Shift

Detection of changes in HRQOL outcomes is important in order to determine the effectiveness of health care interventions at the individual or group level [47]. Interpretation of these changes could, however, be hampered by a so-called response shift [48, 49]; patients may alter their internal standards (recalibration), values (reprioritization), or conceptualization (reconceptualization) of HRQOL when they experience changes in health [49, 50]. Building on the original model of Sprangers and Schwartz [51], a new theoretical model was proposed to describe the process of response shift [52]. The original model incorporated several components that would predict changes in perceived HRQOL over time; it described the interaction between catalysts (health state, intervention, or events that have an effect on HRQOL), antecedents (sociodemographics, personality, expectations, and spiritual identity), mechanisms (coping, social comparison, goal reordering, reframing expectations), and response shifts, and posited a feedback loop to explain how changes in HRQOL can be stabilized despite changes in health status [51]. Expanding the original model with a frame of reference, in this case HRQOL appraisal, made it possible to account for unexplained changes in HRQOL measurements [52].

Response shift might be the anticipated effect of interventions in palliative research: in situations where symptoms and functioning are unlikely to improve substantially, an increase in HRQOL is the most desired outcome [53]. In most research, however, response shift is a source of bias, as it may attenuate or inflate estimates of treatment effects if patients adapt to disease progression or treatment toxicity. This may be especially problematic when response shift affects the two treatment arms differently [54]. Therefore, it is important to understand to what extent changes in HRQOL over time represent true change and to what extent they reflect measurement error due to response shift [55].

Numerous methods are available to detect response shift and to measure its size and direction. These approaches can be categorized into retrospective ratings, vignettes, direct questioning, and individualized methods [53, 54]. One of the most common approaches is the then-test, in which pre-test data are also collected retrospectively, at the same time as the post-test data. The approach assumes that patients provide the retrospective pre-test (then-test) response from the same perspective as the post-test response, that is, that they rate both pre- and post-test HRQOL levels with the same internal standard [56]. Hence, comparing post-test scores with then-test scores might eliminate the bias introduced by response shift [54], and comparing the mean of the pre-test with the mean of the then-test would give an estimate of the size and direction of the response shift [57]. However, the method has been criticized, in particular because of possible recall bias [58]. This is especially relevant in brain tumor research, in which most patients experience a decline in cognitive function, which increases the risk of recall bias.
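The quantities compared in the then-test design can be written down directly. In the sketch below, a hypothetical group of three patients completes a pre-test, a post-test, and a then-test; the conventional change (post minus pre) equals the change assessed with a single internal standard (post minus then) plus the estimated response shift (then minus pre). The function name and scores are illustrative assumptions.

```python
from statistics import fmean
from typing import Dict, Sequence


def then_test(pre: Sequence[float], post: Sequence[float], then: Sequence[float]) -> Dict[str, float]:
    """Mean changes in a then-test design (one score per patient, same order in each list).

    `then` is the retrospective rating of the pre-treatment state, collected with `post`.
    """
    if not (len(pre) == len(post) == len(then)):
        raise ValueError("pre, post and then must have one score per patient")
    return {
        # Conventional change, potentially contaminated by response shift.
        "conventional_change": fmean(b - a for a, b in zip(pre, post)),
        # Change judged with a single internal standard (that of the post-test).
        "adjusted_change": fmean(b - t for t, b in zip(then, post)),
        # Estimated size and direction of the response shift.
        "response_shift": fmean(t - a for a, t in zip(pre, then)),
    }


# Hypothetical scores for three patients (higher = better HRQOL).
summary = then_test(pre=[70, 60, 80], post=[65, 55, 75], then=[60, 50, 75])
print({k: round(v, 2) for k, v in summary.items()})
# {'conventional_change': -5.0, 'adjusted_change': 3.33, 'response_shift': -8.33}
```

Here the conventional analysis suggests a 5-point decline, whereas the then-test comparison suggests a small improvement, with the difference attributed to response shift. Recall bias could produce a similar pattern, particularly in cognitively impaired patients.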

Differential Item Functioning

Many translations of HRQOL questionnaires have been validated, but international comparisons using these questionnaires might not be valid if patients with different cultural backgrounds respond differently to the items. Differential item functioning [59] occurs when patients from different groups (defined, for example, by language, culture, country, age, gender, or treatment) who have the same level of the underlying HRQOL domain respond differently to an item. Differences in response to several items of the EORTC QLQ-C30 exist and are likely to be caused by a lack of equivalence between the original English versions and the translations [60]. Also, different cultural groups value aspects of their HRQOL differently [61]. Although these differences are considered to be small [62, 63], they may be clinically important [64]. This should be considered particularly in observational and non-randomized clinical trials.
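Differential item functioning is frequently screened for with the Mantel-Haenszel procedure, which compares the odds of endorsing an item between a reference and a focal group (for example, two language versions) among patients matched on a scale score. The sketch below is our illustration of that general technique and not necessarily the method used in the cited studies; it assumes a dichotomized item response (e.g., "not at all" versus any problem), and the group labels, strata, and counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Record = Tuple[str, Hashable, bool]  # (group, matching stratum, item endorsed)


def mantel_haenszel_or(records: Iterable[Record]) -> float:
    """Mantel-Haenszel common odds ratio of endorsing an item (reference vs focal
    group), pooled over strata of the matching score. Values near 1 suggest no
    uniform DIF for this dichotomized item."""
    # Per stratum: [A, B, C, D] = ref yes, ref no, focal yes, focal no.
    tables: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, endorsed in records:
        if group not in ("ref", "focal"):
            raise ValueError(f"unknown group: {group}")
        offset = 0 if group == "ref" else 2
        tables[stratum][offset + (0 if endorsed else 1)] += 1
    numerator = denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined for these tables")
    return numerator / denominator


def make_records(group: str, stratum: int, yes: int, no: int) -> List[Record]:
    """Hypothetical helper that expands cell counts into individual records."""
    return [(group, stratum, True)] * yes + [(group, stratum, False)] * no


# Hypothetical data stratified by a matching score (0 = low, 1 = high).
no_dif = (make_records("ref", 0, 2, 8) + make_records("focal", 0, 2, 8)
          + make_records("ref", 1, 6, 4) + make_records("focal", 1, 6, 4))
with_dif = (make_records("ref", 0, 2, 8) + make_records("focal", 0, 5, 5)
            + make_records("ref", 1, 6, 4) + make_records("focal", 1, 9, 1))
print(round(mantel_haenszel_or(no_dif), 2))    # 1.0
print(round(mantel_haenszel_or(with_dif), 2))  # 0.21
```

A common odds ratio well below 1, as in the second data set, indicates that at the same overall level of the scale the focal group endorses the item more often than the reference group (a ratio well above 1 indicates the reverse). In practice, strata would cover the full range of matching scores, and a significance test and effect-size classification would be added.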

Cognitive Deterioration

Patients with primary brain tumors differ from other cancer patients in that they not only have cancer but also a progressive brain disease that causes both neurological and cognitive deterioration [65–68]. Apart from the tumor itself, tumor-related epilepsy, treatment (surgery, chemotherapy, and radiotherapy), and specific medication, such as corticosteroids and anti-epileptic drugs, may affect cognitive function [68]. Thus, cognitive dysfunction may hamper HRQOL assessments based on patient self-report, as some patients may not be able to provide valid information about their own level of functioning or symptom experience.

Additional Measures

Proxy Ratings

Although there is consensus that patients are the best source to rate their HRQOL [69], proxy ratings should be considered as a potentially appropriate alternative in brain tumor research because proxies might better judge the patients’ HRQOL in those situations where patients are cognitively impaired or have a very poor health status. As proxies, such as partners or relatives, are often involved in the care of the patient, they have a fairly good picture of the patients’ well-being. Proxy measures might therefore substitute or complement patient self-assessment, thereby decreasing the amount of missing data and improving the accuracy of HRQOL assessments, especially in those patients with cognitive impairments. In these cases, differences between patient and proxy ratings do not necessarily reflect inaccuracy [70].

If one intends to use proxy ratings, agreement between patient and proxy ratings of HRQOL is required. However, studies investigating the level of agreement between proxy and patient ratings have yielded somewhat inconsistent results. Several studies have shown moderate-to-good agreement between patient and proxy/physician ratings [71–74], while others have reported poor agreement [75, 76•, 77]. Moreover, patients and proxies often agree on symptom scales, but less so on psychosocial scales [78, 79]. Disagreement between patients and proxies is especially pronounced with increasing symptom severity or cognitive impairment [71, 72, 80]. Agreement between patients and proxies is typically highest at the high and low extremes of the patient’s HRQOL [73].
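Agreement between patient and proxy ratings is often quantified with an intraclass correlation coefficient (ICC). The sketch below computes two single-measure ICCs for patient-proxy pairs under a two-way model, using the McGraw and Wong formulas as we understand them: an absolute-agreement version, which penalizes systematic differences between raters, and a consistency version, which does not. The cited studies may have used other statistics, and the example ratings are hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def icc_two_way(patient: Sequence[float], proxy: Sequence[float]) -> Tuple[float, float]:
    """Single-measure ICCs for n patient-proxy pairs (k = 2 raters), two-way model.

    Returns (absolute agreement, consistency), i.e. ICC(A,1) and ICC(C,1).
    """
    n, k = len(patient), 2
    if n != len(proxy) or n < 2:
        raise ValueError("need at least two complete patient-proxy pairs")
    pairs = list(zip(patient, proxy))
    grand = fmean([x for pair in pairs for x in pair])
    # Sums of squares for patients (rows), raters (columns), and residual error.
    ss_rows = k * sum((fmean(pair) - grand) ** 2 for pair in pairs)
    ss_cols = n * sum((fmean(col) - grand) ** 2 for col in (patient, proxy))
    ss_total = sum((x - grand) ** 2 for pair in pairs for x in pair)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    agreement = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return agreement, consistency


# Hypothetical ratings: the proxy scores every patient exactly 10 points lower.
patient_scores = [50, 60, 70, 80]
proxy_scores = [40, 50, 60, 70]
agreement, consistency = icc_two_way(patient_scores, proxy_scores)
print(round(agreement, 2), round(consistency, 2))  # 0.77 1.0
```

Because the proxy in this example underestimates every patient by the same amount, consistency is perfect while absolute agreement drops to about 0.77; which version is appropriate depends on whether systematic proxy bias matters for the intended use.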

Proxies tend to underestimate patients’ HRQOL, while physicians tend to underestimate their patients’ pain [77, 81]. However, as this discrepancy is frequently non-differential (i.e., similar across treatment arms), proxy ratings could be used for between-treatment comparisons. Proxy characteristics, such as age, type of proxy (relative, spouse, or healthcare professional), depression, and caregiver burden, also contribute to differences between proxy and patient ratings [81, 82], as do different appreciations and expectations of HRQOL among different persons [83].
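Why a non-differential discrepancy need not bias between-arm comparisons can be shown with a simple additive model, which is our own simplification rather than a model from the cited studies. Suppose the proxy rating of patient i in treatment arm g equals the patient's own rating plus an arm-level bias δ_g and a zero-mean error ε_ig:

```latex
\begin{aligned}
Q^{\text{proxy}}_{ig} &= Q^{\text{patient}}_{ig} + \delta_g + \varepsilon_{ig},
  \qquad \mathbb{E}[\varepsilon_{ig}] = 0, \\
\mathbb{E}\big[\bar{Q}^{\text{proxy}}_{A} - \bar{Q}^{\text{proxy}}_{B}\big]
  &= \mathbb{E}\big[\bar{Q}^{\text{patient}}_{A} - \bar{Q}^{\text{patient}}_{B}\big]
  + (\delta_A - \delta_B).
\end{aligned}
```

If the discrepancy is non-differential (δ_A = δ_B), the bias term cancels and the expected proxy-based difference between arms equals the patient-based difference, even though each arm's mean HRQOL is underestimated. If proxy bias differs between arms, for example because one treatment causes more cognitive impairment and patient-proxy disagreement grows with impairment, the comparison is biased by the difference δ_A - δ_B.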

Theoretically, the point of view from which the proxy rates the patient’s HRQOL is important. If the desired viewpoint is not clearly described, unintended measurement error might be introduced into the data. It must therefore be specified whether the proxy is rating HRQOL from a proxy–patient perspective or from a proxy–proxy perspective [82]. The proxy–patient perspective represents the proxy’s assessment of the patient’s HRQOL from the patient’s viewpoint, whereas the proxy–proxy perspective represents the proxy’s assessment of the patient’s HRQOL from the proxy’s own viewpoint. However, Gundy et al. [83] have demonstrated that, in practice, proxy HRQOL scores are not significantly influenced by the perspective proxies are asked to use (proxy–patient versus proxy–proxy).

Instrumental Activities of Daily Living

To obtain a more complete picture of patients’ functioning and well-being, an additional measure specifically focused on everyday functioning could also be considered. Although HRQOL is often considered an all-embracing concept, with measurements providing information on physical, psychological, and social aspects, as well as symptoms induced by the disease and its treatment, these measurements do not provide objective information on the patient’s functioning in daily life. Neuropsychological test batteries are typically used to objectively assess brain tumor patients’ cognitive functioning [8, 68, 84], but these assessments are time-consuming, require trained staff, and do not necessarily reflect how patients function in everyday life. An additional measure intended specifically to assess everyday functioning might therefore fill this gap. Measures of instrumental activities of daily living (IADL) cover complex daily activities that are important for functioning independently in society [85]. Because IADL involve higher-order activities, they may be vulnerable to the early effects of cognitive decline. To gain more reliable information on the brain tumor patient’s functioning and well-being, completion of IADL measures by proxy respondents may be appropriate. Such a proxy-based IADL questionnaire has recently been validated in patients with early dementia [86]. Given its good psychometric properties, and the similarities in IADL problems between patients with early dementia and those with brain tumors, we expect that this or similar measures may also be useful in assessing the IADL of brain tumor patients.

Considerations and Conclusions

Despite its limitations, the use of HRQOL assessment as a (secondary) outcome in brain cancer research is increasing, and HRQOL data are therefore expected to have a growing effect on clinical decision-making and healthcare policy development [87]. Hence, it is important that HRQOL data are adequately reported in clinical trials. Several studies have described key methodological issues that need to be addressed when conducting and reporting HRQOL studies [5, 6, 88–90]. These include reporting on baseline compliance and missing data, providing a priori hypotheses and a rationale for using specific measurement tools, providing information on the psychometric properties of the selected tool and on the timing of the HRQOL assessments, and, lastly, addressing the clinical significance of the HRQOL outcomes. Appropriate statistical analysis, using sophisticated techniques, is also warranted [91]. A minimum standard checklist for evaluating HRQOL outcomes in cancer clinical trials has been developed [92] based on good practice in reporting HRQOL [69, 93, 94]. In addition, Osoba et al. [87] have recommended a simple four-step approach to reporting clinically meaningful changes in HRQOL data. Using both of these approaches can help to reduce bias in the interpretation of HRQOL data and raise reporting standards. Although the quality of HRQOL reporting has improved significantly [90], there is still room for improvement. Future studies should take into account the various methodological issues outlined in this article, limiting problems where possible and, in any case, reporting problems as they are encountered when writing up clinical trial results.

In conclusion, although HRQOL assessment is very valuable in brain tumor research, it is not the only outcome measure for determining the well-being of an individual patient. Methodological limitations such as incorrect timing, missing data, response shift, selection bias, and differential item functioning hamper the interpretation of HRQOL results. Moreover, as most brain tumor patients experience cognitive decline, self-reported HRQOL may not always be entirely valid or reliable. Proxy ratings represent a potential supplementary or even alternative source of information about the HRQOL of patients with progressive cognitive decline. Finally, additional measures, such as IADL, could complement HRQOL assessments, providing a more complete picture of the patient’s level of functioning in daily life.