Background

An essential goal of primary dementia care and psychosocial interventions for people living with dementia (PwD) is to improve health-related quality of life (HRQoL) [1,2,3,4]. However, the assessment of HRQoL among PwD is challenging due to the decline in cognitive capacity [5,6,7,8] and the limited perception of time, attention, judgment, and communication. These factors could affect the understanding and the completion of HRQoL questionnaires [6, 8, 9].

In terms of acceptability and validity, the preference-based health utility questionnaire EQ-5D performed comparably to other well-validated dementia-specific measures, e.g. the Quality of Life in Alzheimer's Disease (QoL-AD) [10, 11]. This was underlined by two systematic reviews, which concluded that the EQ-5D is a valid utility-based instrument for PwD and recommended its use to measure HRQoL in this patient population [12, 13]. However, as dementia severity progresses, collecting proxy ratings from family members and informal caregivers or from medical and care staff was found to be more feasible than collecting self-ratings [14].

Nevertheless, proxy ratings reflect an external perspective on patients' HRQoL and should, hence, be used with caution [1, 15, 16]. Bryan et al. [17] found that data provided by clinicians and medical care staff had a higher construct validity than proxy ratings by informal caregivers for the more observable dimensions of HRQoL, e.g. patients' mobility and self-care. Conversely, caregiver ratings had a higher construct validity for the less observable dimensions of HRQoL, e.g. depression and anxiety. Furthermore, ratings also differ within groups of proxy raters. For example, spouses rated HRQoL more positively than adult children [15]. These differences could significantly affect the conclusions drawn from HRQoL assessments.

Methodological drawbacks of the former version of the EQ-5D, the EQ-5D-3L [18], prompted the development of a new five-level version, the EQ-5D-5L. This new instrument expands the response scale of each dimension from three to five levels, aiming to improve discriminative ability, sensitivity, and responsiveness and to reduce ceiling effects [19, 20]. The psychometric properties of the three-level version compared to the five-level version have already been assessed in the general population and in several chronic diseases. These studies found a marginal superiority of the five-level version in terms of increased informativity and discriminative ability and decreased ceiling effects [21, 22].

However, no head-to-head comparison of the psychometric properties of the EQ-5D-3L and EQ-5D-5L in dementia has been published, and in particular none comparing the commonly used proxy ratings. It is unknown whether an expansion from three to five levels in each dimension improves the deviation of proxy ratings by informal caregivers or health care professionals. Thus, this study aimed to compare the psychometric properties of the EQ-5D-5L with those of the 3L classification system in cognitively impaired PwD, using proxy ratings by family caregivers and a care manager, both of whom had a close relationship with the patient.

Methods

Overview

The EQ-5D-3L, the EQ-5D-5L, the EQ-visual analogue scale (VAS) and the Quality of Life in Alzheimer's Disease (QoL-AD) were completed as proxy ratings by family caregivers and a care manager. Both versions of the EQ-5D were compared in terms of acceptability, agreement, ceiling effects, redistribution properties, inconsistency, informativity, and convergent and discriminative validity.

Study design and recruitment

This study used data collected from the ongoing interventional study DCM:IMPact (Dementia Care Management: Implementation into different Care Settings), an implementation study that builds on the DelpHi-trial (Dementia: Life- and person-centred help in Mecklenburg-Western Pomerania, Germany) [23]. The mixed-methods, multi-center implementation study DCM:IMPact was initiated to evaluate the effectiveness and efficiency of collaborative dementia care [23, 24]. An effective [25] and cost-effective [26] dementia care management intervention was implemented in various care settings (e.g. physician networks, nursing care centers) to determine which care setting would reveal the highest need and the lowest implementation barriers for such models of care and, thus, where the best effects could be achieved.

Health care professionals assessed patients' eligibility (70 years or older, living at home, screened positive for dementia or received a formal dementia ICD-10 diagnosis). If patients were eligible, the professionals provided written and oral information about the study and asked for patient and caregiver written informed consent (IC). This study was approved by the local ethics committee at the University Medicine Greifswald (BB 01/2019).

This analysis was based on preliminary data, including n = 77 patients, n = 52 caregivers and one dementia care manager, who had provided collaborative dementia care management for six months. Data were assessed at baseline and three and six months after the baseline assessment.

Data assessment

The EQ-5D-3L, the EQ-5D-5L, the EQ-VAS [18, 19, 27, 28], and the QoL-AD [29] were administered as proxy ratings via standardized computer-assisted face-to-face interviews. Caregivers completed the measures in interviews conducted at the caregivers' home by a specifically qualified nurse, the care manager. The care manager subsequently self-completed the EQ-5D-3L and 5L.

The informal caregivers and the care manager first completed the EQ-5D-3L with the EQ-VAS, followed by the EQ-5D-5L and the QoL-AD. For the caregivers, the "Interviewer Administered Proxy version 1" was used, in which the interviewer asked the caregiver (proxy) to rate the patient's health-related quality of life in their (the proxy's) opinion. For the care manager, we used the "Proxy version 1", in which the care manager (the proxy) was asked to rate the patient's health-related quality of life in their (the proxy's) opinion. The care manager interviewed the caregivers first and only then completed the EQ-5D-3L and 5L themself.

Health-related quality of life and clinical instruments

The EQ-5D is a generic HRQoL instrument containing three (no, some, and extreme problems) or five levels (no, slight, moderate, severe, and extreme problems) for the following five dimensions: mobility, self-care, pain/discomfort, usual activities, and anxiety/depression. The responses to the EQ-5D were converted to health utilities, a preference-based single index measure of HRQoL anchored at 0 for death and 1 for full health [18, 19, 27, 28]. The QoL-AD is a dementia-specific HRQoL instrument consisting of 13 items (e.g. physical health, living situation, family, mood, energy, cognition, relationships, and activities), each rated on a scale of 1–4 (poor, fair, good, or excellent). Results of the QoL-AD are presented as a sum score, ranging from 13 to 52. Higher scores indicate better quality of life [29].

The following sociodemographic and clinical data were assessed: cognitive impairment measured with the Mini-Mental State Examination (MMSE) [30], comorbidity assessed with the number of ICD-10 (International Statistical Classification of Diseases and Related Health Problems) diagnoses listed in the GP files [31] and the response to the Charlson Comorbidity Index [32], social functioning [33] and depression based on the Geriatric Depression Scale (GDS) [34], deficits in daily living activities based on the Bayer Activities of Daily Living (B-ADL) Scale [35], healthcare resource utilization, e.g. hospitalization, by application of the Resource Utilization in Dementia Questionnaire (RUD) [36], general mental and physical health (the dementia care manager subjectively categorized the patients' general health after completion of the intervention into one of the categories: very good, good, poor), and severity of pain assessed with the standardized assessment of older adults in primary care (STEP) [37].

Data analyses

The responses to the EQ-5D-3L and EQ-5D-5L were converted to health utilities with the European and German value sets [28, 38], respectively. The preference weights of the European value set were applied to generate a VAS-based weighted health status index for all 243 possible EQ-5D-3L health states, ranging from −0.074 to 1. The German value set is based on time trade-off and discrete choice experiments and provides values for all 3125 possible EQ-5D-5L health states, ranging from −0.661 to 1. Descriptive statistics were used to present sociodemographic and clinical data for the study population. Measurement properties of the EQ-5D-3L and EQ-5D-5L were assessed in terms of acceptability, ceiling effects, agreement, redistribution properties, inconsistency, informativity, discriminative ability, and convergent validity.
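The counts of 243 and 3125 possible health states follow directly from five dimensions with three or five response levels each. The short Python sketch below enumerates them for illustration only. The function name and the five-digit string notation (e.g. "11111" for no problems in any dimension) are assumptions made for this example, and no value set coefficients are reproduced here.

```python
from itertools import product

# The five EQ-5D dimensions as listed in the instrument description.
DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")

def health_states(levels):
    """Return every possible health state as a digit string, one digit per dimension."""
    digits = [str(level) for level in range(1, levels + 1)]
    return ["".join(state) for state in product(digits, repeat=len(DIMENSIONS))]

states_3l = health_states(3)  # 3 ** 5 profiles
states_5l = health_states(5)  # 5 ** 5 profiles
print(len(states_3l), len(states_5l))
```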

Missing values and floor/ceiling effects

The number of missing values, the score ranges (observed vs. possible range), and the floor (% with lowest possible score) and ceiling effects (% with highest possible score) were used to compare the acceptability of both instruments. Additionally, absolute and relative changes in the ceiling effect of EQ-5D-3L versus EQ-5D-5L were calculated.
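The ceiling effect and its absolute and relative change can be sketched as follows. This is a minimal illustration, assuming that the ceiling of the index corresponds to the profile "11111" and that missing ratings are stored as `None`. The function names and the example profiles are hypothetical and are not study data.

```python
def ceiling_share(profiles, best="11111"):
    """Percentage of completed profiles that report the best possible health state."""
    completed = [p for p in profiles if p is not None]
    return 100.0 * sum(p == best for p in completed) / len(completed)

def ceiling_change(ceiling_3l, ceiling_5l):
    """Absolute (percentage points) and relative (%) reduction from 3L to 5L."""
    absolute = ceiling_3l - ceiling_5l
    relative = 100.0 * absolute / ceiling_3l
    return absolute, relative

# Hypothetical paired ratings of the same eight patients.
profiles_3l = ["11111", "11111", "12211", "22222", "11111", "21232", "11111", "23322"]
profiles_5l = ["11111", "21111", "13311", "33333", "11111", "31353", "11211", "34433"]

c3 = ceiling_share(profiles_3l)
c5 = ceiling_share(profiles_5l)
absolute, relative = ceiling_change(c3, c5)
print(c3, c5, absolute, relative)
```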

Agreement

The agreement between both versions was assessed with intraclass correlations (ICC) and presented with Bland–Altman plots. The ICC represents the proportion of variance in both index scores that is attributable to differences between individuals rather than to differences between the EQ-5D-3L and 5L. The higher the ICC, the higher the agreement between the two versions. An ICC higher (lower) than 0.7 indicates an acceptable (poor) agreement.
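Both agreement statistics can be illustrated with a small Python sketch. The ICC form is not specified in the text, so the example assumes a two-way random-effects, absolute-agreement, single-measure ICC, often written ICC(2,1). The Bland–Altman part assumes the conventional bounds of the mean difference ± 1.96 standard deviations. The function names and utility values are hypothetical.

```python
from statistics import mean, stdev

def icc_2_1(x, y):
    """ICC(2,1) for two measurements per subject (assumed form, see lead-in)."""
    n, k = len(x), 2
    rows = list(zip(x, y))
    grand = mean(list(x) + list(y))
    row_means = [mean(r) for r in rows]
    col_means = [mean(x), mean(y)]
    # Sums of squares of the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

def bland_altman(x, y):
    """Mean difference (y - x) with lower and upper bounds at +/- 1.96 SD."""
    diffs = [b - a for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread

# Hypothetical 3L and 5L index values for six patients.
utilities_3l = [0.25, 0.50, 0.75, 0.50, 0.25, 1.00]
utilities_5l = [0.30, 0.55, 0.80, 0.45, 0.35, 1.00]
print(icc_2_1(utilities_3l, utilities_5l))
print(bland_altman(utilities_3l, utilities_5l))
```

With only two measurements per subject, a systematic shift between the versions lowers this ICC even when the ranking of patients is identical, which is why the Bland–Altman bias is reported alongside it.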

Redistribution properties and inconsistency

Inconsistency was assessed as suggested by previous studies [20, 39, 40], which defined a response within one EQ-5D domain as inconsistent when the answer in the three-level version is at least two levels away from the answer given in the five-level version (for example, 12111 in the 3L version and 14111 in the 5L version). The inconsistency size could thus range from 1 (two-level difference) to 3 (four-level difference). Redistribution properties were calculated as the percentage of consistent and inconsistent 3L–5L response pairs and the average size of inconsistency for each dimension, and were displayed as cross-tabulations of dimension scores.

Informativity

The informativity of both measures was assessed with Shannon indices (i.e. the Shannon–Weaver index (H') and Shannon's evenness index (J')), which are appropriate measures of the discriminatory ability of health state classifications when comparing the EQ-5D-3L and EQ-5D-5L. The higher the Shannon H' index, the more absolute information is captured by the measure. The Shannon evenness index (J') captures the relative informativity, i.e. the evenness of the response distribution, regardless of the number of categories [20, 41]. If a cognitively impaired patient would not complete the additional levels as part of the 5L, relative informativity would be decreased, i.e. an expression of a loss of potential informativity [20, 42]. Discriminative power (change in absolute and relative informativity) was estimated for each dimension and for the overall classification system. Positive (negative) values indicate a gain (loss) of absolute and potential informativity of the 5L compared to the 3L version.
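For one dimension, both indices can be computed from the observed response frequencies. The sketch below uses the standard definitions H' = −Σ p_i log2 p_i and J' = H' / H'max with H'max = log2(L), where L is the number of response levels. The base-2 logarithm, the function name, and the example responses are assumptions for illustration and are not study data.

```python
from collections import Counter
from math import log2

def shannon_indices(responses, n_levels):
    """Return (H', J') for the responses to one dimension with n_levels categories."""
    counts = Counter(responses)
    total = len(responses)
    # Unused levels contribute nothing to H' because p * log2(p) tends to 0.
    h = -sum((c / total) * log2(c / total) for c in counts.values())
    h_max = log2(n_levels)
    return h, h / h_max

# Hypothetical responses of the same ten proxies to one dimension.
mobility_3l = [1, 1, 2, 2, 2, 2, 2, 3, 2, 1]
mobility_5l = [1, 2, 2, 3, 3, 4, 3, 5, 2, 1]

h3, j3 = shannon_indices(mobility_3l, 3)
h5, j5 = shannon_indices(mobility_5l, 5)
# Positive differences indicate a gain of the 5L over the 3L.
print(h5 - h3, j5 - j3)
```

Because J' is normalized by the maximum attainable H', it only increases from the 3L to the 5L if the additional levels are actually used.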

Known groups validity

The discriminative ability, defined as the ability to distinguish between different health and disease stages, was assessed across stages of functional impairment (Bayer Activities of Daily Living Scale), cognitive impairment (Mini-Mental State Examination), and depression (Geriatric Depression Scale) as well as general physical and mental health status. Cut-off values used for this analysis were established and validated within the development of each measure. Trends across groups were assessed with the nonparametric Jonckheere trend test (three or more groups) or the Mann–Whitney test (2 groups).
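The idea behind the Jonckheere trend test can be sketched in a few lines of Python. This is an illustrative implementation of the Jonckheere–Terpstra statistic with a normal approximation and without a correction for ties, which is an assumption of this sketch and may differ from the statistical software used for the analysis. The function name and the utility values are hypothetical.

```python
from math import erfc, sqrt

def jonckheere_terpstra(groups):
    """Groups must be ordered along the hypothesised trend.

    Returns (J, z, two-sided p) using a normal approximation without tie correction.
    """
    j_stat = 0.0
    for i in range(len(groups) - 1):
        for later in groups[i + 1:]:
            for a in groups[i]:
                for b in later:
                    if a < b:
                        j_stat += 1.0   # pair follows the increasing order
                    elif a == b:
                        j_stat += 0.5   # ties count half
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    expected = (n * n - sum(s * s for s in sizes)) / 4
    variance = (n * n * (2 * n + 3) - sum(s * s * (2 * s + 3) for s in sizes)) / 72
    z = (j_stat - expected) / sqrt(variance)
    p_two_sided = erfc(abs(z) / sqrt(2))
    return j_stat, z, p_two_sided

# Hypothetical utilities ordered from poor to very good general health.
poor = [0.21, 0.35, 0.30, 0.42]
good = [0.45, 0.52, 0.40, 0.61]
very_good = [0.70, 0.66, 0.82, 0.75]
print(jonckheere_terpstra([poor, good, very_good]))
```

A large positive z indicates that utilities increase across the ordered groups, and a large negative z indicates a decreasing trend.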

Convergent validity

Convergent validity was assessed with Spearman's correlation coefficient between each version of the EQ-5D and the QoL-AD. Due to some overlap of dimensions (i.e. physical health, usual activities, self-care), we assumed there should be a moderate correlation between these measures. A correlation coefficient higher than 0.3 and 0.5 was considered a moderate and strong correlation, respectively [43]. Because higher EQ-5D dimension levels indicate more problems, there should be a positive (negative) correlation between EQ-5D dimensions and B-ADL and GDS (EQ-VAS and QoL-AD) scores. Conversely, because higher utilities indicate better health, there should be a negative (positive) correlation between the EQ-5D utility index and B-ADL and GDS (EQ-VAS and QoL-AD) scores.
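Spearman's coefficient is the Pearson correlation of the rank-transformed scores, which the following Python sketch makes explicit. The function names, the label "weak" for coefficients of 0.3 or lower, and the example scores are assumptions for illustration. Only the thresholds of 0.3 and 0.5 are taken from the text.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def strength(rho):
    """Classify |rho| with the thresholds of 0.3 (moderate) and 0.5 (strong)."""
    size = abs(rho)
    if size > 0.5:
        return "strong"
    if size > 0.3:
        return "moderate"
    return "weak"  # label assumed for this sketch

# Hypothetical 5L utilities and QoL-AD sum scores for six patients.
utility_5l = [0.35, 0.50, 0.62, 0.48, 0.80, 0.71]
qol_ad = [24, 30, 33, 27, 38, 36]
rho = spearman_rho(utility_5l, qol_ad)
print(rho, strength(rho))
```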

Results

Patients' characteristics and health utility

The sample characteristics are presented in Table 1.

Table 1 Patients’ characteristics (n = 77)

In total, 106 EQ-5D-3L and 5L proxy ratings by family caregivers (52 baseline, 15 three-month and 39 six-month follow-up assessments) and 133 EQ-5D-3L and 5L proxy ratings by the care manager (72 baseline, 20 three-month and 41 six-month follow-up assessments) were included in the analyses. The mean proxy-rating score of caregivers was 0.48 (SD 0.26) and 0.50 (SD 0.32) for the 3L and 5L, respectively. The mean health status stated by caregivers on the VAS was 50.0 (SD 19.7). The care manager reported higher mean utility scores of 0.52 (SD 0.22) and 0.61 (SD 0.25) for the EQ-5D-3L and 5L, respectively. The mean VAS value given by the care manager was 49.0 (SD 18.9), slightly lower than the caregiver rating. The density plots and histograms of both measures and proxy ratings are shown in Additional file 1: Figs. S1 and S3.

Missing values

Missing values were rare and occurred at a comparable frequency in both versions (3L vs 5L). The occurrence of missing values was also similar between proxy ratings (0.9% for caregiver ratings and 0.8% for care manager ratings), as shown in Table 2.

Table 2 Missing values and ceiling effects

Ceiling effects

Ceiling effects were smaller for the 5L than for the 3L in all domains. The relative ceiling effect of the index decreased by 56% and 75% in the caregiver proxy rating and the care manager proxy rating, respectively. Ceiling effects were highest for the dimensions "pain/discomfort" and "anxiety/depression". However, ceiling effects were higher (8.5% vs. 3.0%) and the ceiling effect reduction was lower (55.5% vs. 75.0%) in caregiver ratings than in care manager ratings. Ceiling effects are presented in Table 2.

Agreement

The agreement between both versions was good, but slightly higher in caregiver ratings (ICC = 0.885, CI 0.831–0.922; p < 0.001) than in care manager ratings (ICC = 0.840, CI 0.684–0.908; p < 0.001). The density plot, presented in Additional file 1: Fig. S1, shows that the EQ-5D-5L index scores were higher than the 3L index scores. The Bland–Altman plots, presented in Additional file 1: Fig. S2, showed a mean difference between the 5L and 3L index values (5L–3L) of 0.02 (SD 0.16) and 0.08 (SD 0.19) for the caregiver and care manager rating, respectively. 91% of the differences in the caregiver rating and 95% of the differences in the care manager rating were within the 95% confidence interval. Of the outlying differences, most fell below and fewer above the 95% confidence interval.

Redistribution properties and inconsistency

The overall inconsistency of the EQ-5D proxy ratings was very low but slightly higher in caregiver ratings (caregiver: 2.6%, average size 1.09; care manager: 2.0%, average size 1.06). At least one inconsistent pair occurred in 14 (13%) out of 106 caregiver proxy assessments and 12 (9%) out of 133 care manager proxy assessments. The highest inconsistency was found for the dimension "pain/discomfort", with an average inconsistency size of 1.17 and 1.09, respectively. The redistribution properties and level of inconsistency are demonstrated in Table 3.

Table 3 Redistribution properties from 3L to 5L responses and number of consistent and inconsistent response pairs: a cross-tabulation of dimension scores

Informativity

Absolute and relative informativity increased in the 5L compared to the 3L for both proxies, which demonstrated an average increase of the absolute (caregiver rating: +0.56, care manager rating: +0.51) and relative informativity (caregiver rating: +0.06, care manager rating: +0.10). The relative and absolute informativity increase was highest for the domain "mobility". Absolute and relative informativity are presented in Table 4.

Table 4 Inconsistency between the 3L and 5L and Shannon (H') and Shannon Evenness index (J')

Known groups validity

The EQ-5D-3L and EQ-5D-5L were equally able to discriminate between general physical and mental health stages, functional impairment, and patient hospitalizations. The five-level version of the EQ-5D better distinguishes between stages of depression and pain, demonstrating the superiority of the 5L over the 3L for caregiver ratings. Contrary to this, the EQ-5D-3L care manager rating better discriminates between stages of general physical and mental health, functional and cognitive impairment, and pain than the 5L. The discriminative ability is represented in Table 5.

Table 5 Discriminative ability/known-groups validity of the EQ-5D-3L and the EQ-5D-5L (proxy ratings given by caregivers and the care manager)

Convergent validity

Both proxy ratings demonstrated that the EQ-5D-5L had a better convergent validity with most of the measures, which revealed the superiority of the 5L version. However, the convergent validity was better for caregiver ratings than care manager ratings, demonstrated by larger correlation coefficients. The convergent validity of both measures is presented in Table 6.

Table 6 Convergent validity of the EQ-5D-3L and the EQ-5D-5L assessed using Spearman Correlation

Discussion

To the best of our knowledge, this is the first analysis comparing the psychometric properties of the EQ-5D-5L with those of the EQ-5D-3L in PwD using proxy ratings given by informal family caregivers and health professionals. Generally, the EQ-5D-5L yields higher index scores than the 3L. In both proxy ratings, the EQ-5D-5L improves psychometric properties by reducing ceiling effects and improving informativity and convergent validity. As demonstrated by the ICC, the agreement between the three- and five-level EQ-5D was excellent but slightly higher in the caregiver proxy rating than in the care manager rating. Also, caregiver proxy ratings demonstrated a better convergent validity than the care manager proxy ratings. Thus, the EQ-5D-5L shows its superiority over the 3L version as a proxy rating used in dementia, primarily when family caregivers assess patients' health status.

Both EQ-5D-5L proxy measures demonstrated similar feasibility, acceptability (missing values), informativity, and consistency. The agreement between both proxy-rating measures was excellent (ICC 0.885 for caregiver rating and 0.840 for care manager rating) and in line with the reported agreement in previous studies, revealing ICC higher than 0.85 [44, 45]. Missing values infrequently occurred, comparable to previously published studies [20, 44, 46].

Several validation studies found ceiling effects of both measures in different settings between 15 and 50%, with a decrease of up to 10% from the 3L to the 5L version [20, 44, 45, 47,48,49,50]. The ceiling effects reported by proxies in this analysis were lower (≤ 8.5%) than in these previous studies, underlining the excellent feasibility of both EQ-5D-5L proxy ratings in dementia. Even though the proportion of patients in "full health" was lower, the decreased absolute and relative ceiling effect of the EQ-5D-5L was in line with other studies' findings [20, 44,45,46,47,48,49,50].

In line with previously published studies, this analysis demonstrated a significant gain in absolute informativity and an improvement in relative informativity for all dimensions in the EQ-5D-5L [20, 44, 45, 51]. The underlying sample of older and, in most cases, multimorbid PwD could be the main reason for choosing all response levels in the 5L to describe patients' health by family caregivers or the care manager, which causes a higher variation of health states. This is in line with previously published studies [20, 52].

The extension from three to five levels could be more challenging, causing inconsistent valuations. However, informal caregiver and care manager proxy ratings revealed a very low inconsistency (< 3%), in line with previous studies (1–5%) [44, 45, 48, 53]. Thompson et al. underlined that inconsistency is higher in populations with multimorbidity (up to 10%) than in the general population (4%). Still, inconsistency appeared to be low in caregiver and care manager proxy ratings, as demonstrated in this analysis. This could mean that proxies, e.g. caregivers or staff, can reliably assess patients' health status.

A systematic review by Hounsome et al. [54] summarized different aspects of the performance of the EQ-5D in studies of dementia and revealed that different proxies, e.g. family carers and health care professionals, provide different ratings of patients' health. The authors further concluded that the mode of assessment and the selection of appropriate proxies are vital to ensure a high validity in this specific sample. Further studies report that both instruments, the EQ-5D-3L and 5L, demonstrate a good known-groups validity in dementia, with some evidence that the 5L discriminates better between different groups [44,45,46], which only the caregiver ratings in this analysis could confirm. For the care manager, the EQ-5D-3L distinguished better between stages of health, so that we could not demonstrate an overall superiority of the EQ-5D-5L over the EQ-5D-3L in the known-groups validity. Also, caregiver ratings discriminated better between health states than ratings of health professionals, demonstrating the superiority of caregiver ratings over ratings of health professionals.

Both proxy-rating instruments also performed well regarding the convergent validity, with a tendency towards a slightly better convergent performance of the EQ-5D-5L. However, this superiority remains uncertain and should be confirmed in future psychometric head-to-head analyses that compare both measures in dementia. Most previously published studies reported a slight to considerable superiority of the EQ-5D-5L concerning the convergent validity [44, 45]. However, the convergent validity of the caregiver rating was slightly higher than that of the care manager rating, which suggests that the EQ-5D-5L administered to caregivers outperforms the care manager proxy rating. A study by Bryan et al. [17] sought to identify whether the validity of the EQ-5D was higher for family caregivers or health care professionals. Their findings suggest that EQ-5D ratings by family caregivers had a higher validity for less observable dimensions, i.e. "usual activities" and "anxiety and depression". In contrast, the construct validity in health care professional ratings was higher for the more observable dimensions of the EQ-5D, i.e. "mobility" and "self-care". This could, however, not be confirmed by our results.

Strengths and limitations

This is the first study administering both the three- and five-level versions of the EQ-5D to multiple proxies, creating a sufficient basis for a comprehensive assessment of the psychometric performance of the EQ-5D in dementia. Furthermore, a strength of this analysis was the inclusion of several clinical measures that are critical in this indication, such as cognitive and functional impairment, general health, depression and pain, which were needed to assess the validity of the EQ-5D instruments thoroughly.

However, several limitations have to be acknowledged. First, the study was based on a sample of 106 caregiver assessments and 133 care manager proxy ratings, limiting the generalizability of the results. In particular, the fact that only n = 52 caregivers and only one health professional (care manager) assessed patients' health limits the robustness of the presented results. Secondly, the EQ-5D-3L was completed before the EQ-5D-5L by caregivers and the health professional (care manager). Thus, overuse of levels two and four in the 5L could be possible, as reported by Janssen et al. [39]. Furthermore, the mode of administration differed between caregiver proxy ratings (interview) and care manager ratings (self-completion). This could further limit the generalizability of the presented results.

Contrary to this, an initial completion of the 5L could have caused the overuse of level two in the 3L version. Future studies should consider randomization of the application process to reduce potential bias. Most importantly, the care manager completed the EQ-5D-3L and 5L after interviewing the caregiver, who stated the patient's health as a proxy. Therefore, this survey sequence might have influenced the care manager in assessing the patient's health status.

Finally, the comparison of the psychometric performance is affected by the different value sets used. While the European value set used for the EQ-5D-3L resulted in a range of utility values between −0.074 and 1, the German value set led to EQ-5D-5L utility values between −0.661 and 1. Thus, the EQ-5D-5L had a wider range, which could be an advantage in distinguishing between disease stages and levels of general health and in correlating with other measures. While the agreement between the measures (3L and 5L) was excellent and both measures performed equally in the known-groups validity, the 5L had a better convergent validity, which could also be due to the basic differences between the value sets used. Further research is needed to reveal the impact of the value sets on the psychometric properties found in head-to-head comparisons. Crosswalks that map responses of the EQ-5D-3L to the EQ-5D-5L could be helpful, as they are a prerequisite for using the same value set for both measures.

Conclusion

Our results provide some indications that the five-level version of the EQ-5D was slightly superior to the three-level version by improving informativity and convergent and discriminative validity and reducing ceiling effects. Our findings also indicate that family caregivers' ratings may be preferable for measuring HRQoL in PwD due to a better discriminatory ability and higher convergent validity. However, further research is needed to clarify and confirm the superiority of the five-level version of the EQ-5D using larger samples and also taking reliability and responsiveness measures into account.