Key Points for Decision Makers

Overall, the EQ-5D measures health-related quality of life in asthma validly and is sensitive to its changes.

However, its measurement characteristics are weaker than those of asthma-specific health-related quality-of-life instruments.

There is a need to improve its measurement characteristics in this population.

1 Introduction

The EQ-5D is a widely used instrument for measuring health-related quality of life (HRQoL). It comprises a health descriptive system and a visual analog scale (EQ VAS). The health descriptive system includes a five-item classifier that describes health status on the interview day in five domains: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. This is represented numerically using either a three-level “problem” rating scale (1: “no”, 2: “some/moderate”, 3: “extreme/unable to/confined to bed”), known as EQ-5D-3L, or a five-level scale (1: “no”, 2: “slight”, 3: “moderate”, 4: “severe”, 5: “extreme/extremely/unable to”), referred to as EQ-5D-5L. The latter is the newer version, which is more sensitive and responsive [1]. This descriptive profile of health can be applied to a country-specific utility set derived from standard elicitation techniques to generate an index score anchored by 0 (being dead) and 1.00 (full health), which can be used to yield quality-adjusted life-years for cost-utility analyses. Alternatively, the five item levels can be summed into a level sum score (LSS), which ranges from 5 to 25 for EQ-5D-5L [1]. The EQ VAS (VAS) is a numerical scale that ranges from 0 (worst imaginable health) to 100 (best imaginable health), allowing respondents to rate their overall health. The older version used in EQ-5D-3L entails marking a straight line from a box indicating “your health today,” while the later version adopted in 2018 for both EQ-5D-3L and EQ-5D-5L asks for both a direct “X” marking on the scale and a written score number in the designated box [2].
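As an illustration of the level sum score described above, the following minimal Python sketch sums the five dimension levels of a profile. The function name `level_sum_score` and the five-digit string encoding of the profile (for example, "11213") are assumptions made for this example and are not part of the EQ-5D specification or of our analysis code.

```python
def level_sum_score(profile: str, levels: int = 5) -> int:
    """Sum the five dimension levels of an EQ-5D profile such as '11213'.

    `profile` holds one digit per dimension (mobility, self-care, usual
    activities, pain/discomfort, anxiety/depression). `levels` is 5 for
    EQ-5D-5L and 3 for EQ-5D-3L. This encoding is an illustrative assumption.
    """
    if len(profile) != 5 or not profile.isdigit():
        raise ValueError("profile must be five digits, one per dimension")
    digits = [int(ch) for ch in profile]
    if any(d < 1 or d > levels for d in digits):
        raise ValueError(f"each level must lie between 1 and {levels}")
    return sum(digits)
```

With five levels per item, the score is 5 when no problems are reported on any dimension and 25 when the most severe level is reported on all five.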

EQ-5D-3L was approved by the National Institute for Health and Clinical Excellence for the economic evaluation of healthcare interventions. In addition, EQ-5D is increasingly used as a patient-reported outcome measure for assessing the effects of diseases and treatment in clinical and research settings [3, 4]. For EQ-5D to effectively guide patients, healthcare providers, policymakers, and researchers in healthcare decisions aimed at improving service provision and clinical outcomes, it must possess robust psychometric properties [5]. Important psychometric properties that enhance its utility as an HRQoL instrument are construct validity (CV), responsiveness, and test-retest reliability [6]. Construct validity refers to the degree to which EQ-5D accurately measures HRQoL, the construct it is designed to measure. Responsiveness is its ability to detect clinically relevant and important changes in HRQoL over time and/or in response to the administered interventions, while test-retest reliability assesses its consistency in producing similar results in repeated HRQoL measurements within the same individual when their condition remains stable across two timepoints [6].

Asthma is a highly prevalent chronic respiratory disease that causes symptoms such as breathlessness, wheezing, and coughing, leading to poorer HRQoL [7,8,9,10]. To date, two systematic reviews have examined the psychometric properties of EQ-5D in assessing the HRQoL impact of asthma. The first systematic review, by Pickard et al., concluded that EQ-5D is valid in asthma with respect to CV, reliability, and responsiveness, based on the aggregated findings of seven papers selected from two databases. However, this review, performed over 10 years ago, presented the results narratively without employing a standard methodology and did not quantify the magnitude of each psychometric property [11]. Additionally, the seven papers examined were on the older three-level version of EQ-5D and VAS that are known to possess poorer psychometric properties than the later five-level and VAS versions [12, 13]. Of the seven articles included in the review by Pickard et al., two were population-based studies, which included other chronic medical conditions in addition to asthma. Among the two clinical studies included, one focused on pediatric asthma, and the other on allergic rhinitis rather than asthma [11, 14,15,16,17]. The second review, which searched articles until 2020, included 17 articles but did not follow the current recommended standard for assessing studies on measurement properties of HRQoL instruments. The authors found varying CV from weak to strong but could not adequately assess responsiveness as only one study was available [18].

Since the review by Pickard et al., critique has emerged on the ability of EQ-5D with its recall period of “TODAY” to adequately capture the episodic symptoms and flares of asthma. Moreover, the content validity and acceptability of EQ-5D in asthma have also recently been questioned [19, 20]. Condition-specific HRQoL instruments developed specifically for asthma and/or obstructive airway diseases, such as the Asthma Quality of Life Questionnaire (AQLQ), Newcastle Asthma Symptoms Questionnaire (NASQ), Severe Asthma Questionnaire (SAQ), Asthma Control Test (ACT), Asthma Control Questionnaire (ACQ), Asthma Therapy Assessment Questionnaire (ATAQ), and St George’s Respiratory Questionnaire (SGRQ) have been shown in some studies to be more sensitive than EQ-5D-5L in assessing HRQoL impacts of the disease and its treatment [4, 21,22,23,24].

We aimed to provide an updated systematic review on the CV and responsiveness of EQ-5D measures in asthma and compare them with asthma-specific HRQoL instruments. This review may inform future research and aid the development of enhanced versions of EQ-5D for use in asthma.

2 Methods

2.1 Protocol and Procedure Overview

We registered the study protocol in the International Prospective Register of Systematic Reviews (PROSPERO) database (CRD42021262169). The review was reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analysis (PRISMA) statement. We also adhered to the COnsensus-based Standards for the Selection of Health Measurement Instruments (COSMIN), a set of consensus-based rules for evaluating studies on the measurement properties of scales, to guide the evaluation of the methodological quality of selected articles and measurement properties (CV and responsiveness) of the EQ-5D scale [25, 26]. In accordance with COSMIN guidelines, we defined the CV of EQ-5D as the degree to which results conform to an a priori expectation of measuring HRQoL in asthma. This was done by forming a pre-defined set of hypotheses on its correlations with asthma-specific HRQoL scales (convergent validity), and its score differences among subgroups with dissimilar asthmatic patient characteristics (known-group validity). Responsiveness, in turn, was defined as the ability of EQ-5D to detect changes in asthmatic patients’ HRQoL over time [27]. We used the bibliographic EndNote database, version X9.3.3 (Clarivate Analytics, Philadelphia, PA, USA) to import references and filter duplicate articles. Two sets of reviewers (AP and LA/SS) independently screened titles/abstracts, selected articles, and extracted data. Disagreements were resolved through iterative discussions between the reviewers and consultation with a third reviewer (LN).

2.2 Information Sources and Search Strategy

A scoping search was first performed in PROSPERO, COSMIN, Cochrane, and PubMed databases to confirm there were no similar reviews published or in progress using three principal concepts: (“asthma”) and (“EQ-5D”) and (“psychometric property” OR “measurement property” OR “validity” OR “responsiveness”). A search strategy developed with the aid of an academic library specialist was applied on 1 June, 2024 to six electronic databases (PubMed, Embase, Cochrane, Cumulative Index to Nursing and Allied Health Literature, PsycINFO, and Scopus) with no restrictions specified (Electronic Supplementary Material [ESM]). In addition, the team manually screened the references of selected articles and searched for gray literature in ProQuest Dissertations & Theses, GreySource (under the “Biological & Medical Sciences” classification scheme), and Google Scholar to maximize the search breadth and limit publication bias [28].

2.3 Selection of Articles

Articles were selected for a full-text review based on the following pre-set eligibility criteria: (a) validation papers evaluating psychometric properties of EQ-5D (EQ-5D-3L or EQ-5D-5L) in asthma; (b) clinical (also termed non-validation) papers that collected and reported EQ-5D outcome data of asthma subjects; (c) a sample with at least 80% asthma diagnosis if analyses were not performed separately for each of the included health conditions; (d) human subjects with asthma aged 12 years and above; and (e) original research involving observational and interventional investigations. We excluded mapping studies and studies that modified the HRQoL instrument or focused mainly on the HRQoL of caregivers, as well as conference proceedings, study protocols, trial registrations, reviews, editorials, personal opinions/commentaries, guidelines, book chapters, and articles with full text unavailable in English or Chinese.

2.4 Data Extraction

Data were extracted using a pre-designed and pilot-tested Excel sheet with the following variables: (1) first author; (2) publication year; (3) list of countries; (4) total sample size; (5) gender proportions; (6) survey language(s); (7) survey administration mode; (8) EQ-5D value set; (9) EQ VAS version; (10) types of EQ-5D measures (index, LSS, items, VAS); (11) measurement properties of EQ-5D collected; (12) concurrent HRQoL instruments collected; (13) type of intervention; (14) monitoring/treatment intervals; (15) number of studies (per COSMIN’s nomenclature, a single study refers to one hypothesis testing of the scale’s measurement property, so each included article can comprise two or more studies or tests); (16) types of measures (correlation coefficient, Cohen’s d, standardized effect size [SES], standardized response mean [SRM]) and their results; (17) measures of central tendency (means or medians), distributions (standard deviations, standard error, variance), and group sample sizes to manually calculate the Cohen’s effect sizes and SES (or SRM) if they were not presented in the primary articles [27, 29, 30].
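Where effect sizes had to be calculated manually from extracted summary statistics, the calculations follow the conventional definitions. The Python sketch below is illustrative only: the function names are hypothetical, and the formulas (pooled standard deviation for Cohen’s d, baseline standard deviation for SES, and standard deviation of change scores for SRM) are the standard textbook forms, assumed here rather than quoted from our analysis scripts.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Between-group Cohen's d using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def standardized_effect_size(mean_change, sd_baseline):
    """SES: mean change divided by the standard deviation at baseline."""
    return mean_change / sd_baseline


def standardized_response_mean(mean_change, sd_change):
    """SRM: mean change divided by the standard deviation of change scores."""
    return mean_change / sd_change
```

For instance, two groups of 50 patients with hypothetical mean index scores of 0.80 and 0.70 and a common standard deviation of 0.20 would give a Cohen’s d of 0.5.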

2.5 Hypotheses Generation

Based on existing knowledge drawn from the literature and iterative discussions between the two authors of this paper (AP and LN), a priori hypotheses for each EQ-5D measure were formulated to assess: (1) correlational relationships between EQ-5D and AQLQ and/or its preference-based index, Asthma Quality of Life-5 Dimensions (AQL-5D) using the correlation coefficient (convergent validity); (2) relational differences between EQ-5D and asthma-specific instruments using Cohen’s d minimally important difference (MID) thresholds among established clinical groups based on disease severity, control, and treatments (known-group validity), or using SES/SRM/MID thresholds among those whose health status has changed with interventions (responsiveness) [ESM]. The expected magnitudes and directions of relationships were pre-specified in all the hypotheses for the three measurement properties. The strengths of the effect sizes were pre-defined as follows: (1) correlation coefficient < 0.1: very weak; 0.1 to < 0.3: weak; 0.3 to < 0.5: moderate; ≥ 0.5: strong; (2) MID of EQ-5D index ≥ 0.03; MID of VAS ≥ 5; MID of EQ-5D item (% change in “no problem” response) ≥ 5% determined arbitrarily; and (3) Cohen’s d/SES/SRM: 0.2 to < 0.5: small; 0.5 to < 0.8: moderate; ≥ 0.8: large [6, 31,32,33,34].
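The pre-defined strength categories can be expressed as simple threshold checks. The following Python sketch is a hedged illustration: the function names and the dictionary `MID_THRESHOLDS` are hypothetical, the use of absolute values assumes that direction is judged separately against each hypothesis, and the label "below small" for effect sizes under 0.2 is our placeholder because no label was pre-defined for that range.

```python
# MID thresholds as stated in the text; the dictionary keys are hypothetical labels.
MID_THRESHOLDS = {"index": 0.03, "vas": 5.0, "item_percent": 5.0}


def correlation_strength(r: float) -> str:
    """Classify a correlation coefficient by its absolute magnitude."""
    a = abs(r)
    if a < 0.1:
        return "very weak"
    if a < 0.3:
        return "weak"
    if a < 0.5:
        return "moderate"
    return "strong"


def effect_size_strength(es: float) -> str:
    """Classify Cohen's d, SES, or SRM by absolute magnitude."""
    a = abs(es)
    if a < 0.2:
        return "below small"  # placeholder label, not defined in the text
    if a < 0.5:
        return "small"
    if a < 0.8:
        return "moderate"
    return "large"


def meets_mid(measure: str, difference: float) -> bool:
    """Check whether an absolute difference reaches the MID for a measure."""
    return abs(difference) >= MID_THRESHOLDS[measure]
```

Under these rules, a correlation of 0.52 is classified as strong and an index difference of 0.03 just meets the MID.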

2.6 Data Analysis

The magnitude and direction of the effect size for each study were compared against the corresponding pre-determined hypothesis threshold. A test was considered a “pass” if it met the a priori threshold. The numbers of passes were totaled to generate proportions of fulfilled (or satisfied) hypotheses for CV (combining convergent and known-group) and responsiveness. A proportion of 75% and above was rated “sufficient” for the specific measurement property per COSMIN guidelines [6]. We analyzed the hypothesis testing data of each EQ-5D measure for validation and clinical (non-validation) evidence, separately.
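The pass-counting rule can be sketched as follows. This Python example is illustrative only: the function name `rate_property` is hypothetical, and the fallback label "not sufficient" is a simplification, since the finer rating categories applied in this review are those set out in Fig. 1.

```python
def rate_property(results, threshold=0.75):
    """Return the proportion of satisfied hypotheses and a simplified rating.

    `results` is a sequence of booleans, one per hypothesis test (True = pass).
    Only the 75% cut-off comes from the text; the fallback label is a
    simplification of the rating scheme.
    """
    if not results:
        raise ValueError("at least one hypothesis test is required")
    proportion = sum(results) / len(results)
    rating = "sufficient" if proportion >= threshold else "not sufficient"
    return proportion, rating
```

For example, three passes out of four tests give a proportion of 0.75, which meets the cut-off.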

2.7 Data Synthesis and Statistical Analysis

All effect size estimates (correlations, Cohen’s d, and SRM) available for head-to-head comparisons between EQ-5D measures and AQLQ/AQL-5D were aggregated using the inverse variance method, the most widely used method in meta-analysis, which increases statistical power and yields more precise estimates of the target effect sizes. We applied random-effects modeling because heterogeneity was anticipated. Potential causes of heterogeneity determined a priori included different geographical locations in which the studies were conducted as well as different languages, modes of administration, VAS versions, and value sets used among the selected studies. The pooled estimates were expressed as overall mean weighted values with 95% confidence intervals and presented schematically using forest plots. I² statistics were used to quantify the degree of heterogeneity in the effect magnitudes across studies.
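To make the pooling procedure concrete, the sketch below implements inverse variance pooling under a random-effects model in Python. It is a minimal illustration under stated assumptions: the between-study variance is estimated with the DerSimonian-Laird method, which the text does not specify; the confidence interval uses a normal approximation; and the function name `pool_random_effects` and its return keys are hypothetical. The analyses in this review were run in R, not with this code.

```python
import math


def pool_random_effects(effects, std_errors):
    """Inverse variance random-effects pooling (DerSimonian-Laird, assumed).

    `effects` are study-level effect sizes and `std_errors` their standard
    errors. Returns the pooled estimate, a 95% CI, tau^2, and I^2 (%).
    """
    k = len(effects)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching inputs")

    # Fixed-effect weights and pooled estimate, needed for Cochran's Q.
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Between-study variance (truncated at zero) and I^2.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return {"pooled": pooled, "ci95": ci, "tau2": tau2, "i2": i2}
```

When the study estimates agree closely, tau² shrinks to zero and the result coincides with the fixed-effect estimate; larger disagreement widens the confidence interval and raises I².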

We further computed Cohen’s d and SRM ratios by dividing the aggregated estimate of the relevant EQ-5D measure by that of AQLQ or AQL-5D to quantify the relative efficiency of EQ-5D measures against the disease-specific measures. We performed the following sensitivity analyses: (1) using the alternative fixed-effect model and (2) repeating Cohen’s d and SRM ratios computations with aggregated estimates pooled from the highest and lowest values extracted from each article. The R Statistical Software (version 4.1.2; R Core Team 2021) was used to analyze all the data. A two-sided p-value of 0.05 or less indicated statistical significance.
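The ratio calculation itself is a single division, sketched below in Python. The function name `relative_efficiency` is hypothetical and the values in the usage note are invented for illustration; they are not results from this review.

```python
def relative_efficiency(pooled_eq5d: float, pooled_reference: float) -> float:
    """Ratio of the pooled EQ-5D estimate to the pooled reference estimate.

    A value below 1 indicates that the EQ-5D measure yields a smaller
    effect size than the disease-specific reference (AQLQ or AQL-5D).
    """
    if pooled_reference == 0:
        raise ValueError("reference estimate must be non-zero")
    return pooled_eq5d / pooled_reference
```

For example, a hypothetical pooled SRM of 0.3 for an EQ-5D measure against 0.6 for the reference instrument would give a ratio of 0.5.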

2.8 Quality of Study, Scale, and Overall Evidence Assessment

We adapted the risk of bias criteria from the COSMIN manual to evaluate the design standards of each study. Figure 1 outlines the details of the criteria used to assess the quality of each study as well as the overall quality of the EQ-5D scale and evidence. Evaluating the qualities of studies is important in interpreting and synthesizing the evidence, especially when there are discrepant results among the studies, as it allows better-quality studies to be weighed higher.

Fig. 1

Flow diagram showing studies’ measurement properties & scale assessments based on COSMIN recommendations. CV convergent validity, KG known-group validity, Mod moderate, ROB risk of bias

3 Results

We present the systematic process of searching and selecting the articles containing the relevant studies in a PRISMA flow diagram (Fig. 2). The six online databases collectively retrieved 1391 records of which 765 were duplicates. A total of 30 clinical and seven validation articles were selected following the removal of the duplicates and 589 articles that did not meet the selection criteria [4, 21, 23, 35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68]. We identified 493 studies (hypothesis tests) from the 37 selected articles to evaluate CV (n = 428) and responsiveness (n = 65) of the EQ-5D scale. A schematic breakdown of the types and numbers of the articles and studies is shown in Fig. 1.

Fig. 2

PRISMA flow diagram. PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses

Characteristics of the clinical and validation articles with their respective studies are summarized in the ESM. The vast majority of the studies were conducted in Western geographical regions (CV: 94.9%; responsiveness: 93.8%) and administered the EQ-5D in a single language, either English or a non-English language (English: 42.3% [CV]; 9.2% [responsiveness] vs non-English: 55.9% [CV]; 72.3% [responsiveness]). Multiple languages (including English) were used in seven CV studies and 12 responsiveness studies, and the scale was self-administered rather than interviewer administered in most of the studies (CV: 76.3%; responsiveness: 87.7%). The later EQ VAS version that was developed in 2018 was used in 17.4% and 50.8% of the CV and responsiveness studies, respectively.

Out of the 493 a priori hypotheses tested for assessing CV and responsiveness, 78.4% and 76.9% were satisfied, respectively. The overall proportions of satisfied hypotheses were higher based on data extracted from validation compared with clinical articles for both CV (83.5% vs 76.0%) and responsiveness (89.5% vs 71.7%). Table 1 shows the hypothesis testing results for CV and responsiveness in each EQ-5D measure (index, LSS, five items, and VAS) and the quality of the measures analyzed separately for validation articles, clinical articles, and all articles combined. In the overall analysis, sufficient CV was found for all EQ-5D measures, except for the EQ-5D item “mobility” where the evidence was inconsistent. In contrast, responsiveness was sufficient for only the VAS and two EQ-5D items (“usual activities,” “pain/discomfort”) while the evidence was inconsistent for the EQ-5D index and the remaining three EQ-5D items (“mobility,” “self-care,” “anxiety/depression”). The overall methodological quality was rated “very good” or “adequate” in 78.2% of the CV tests and 92.3% of responsiveness tests. All the CV and responsiveness tests in validation studies were rated “very good” or “adequate”, whereas in clinical studies, the proportions rated similarly were 88.9% and 89.1% for CV and responsiveness tests, respectively. The overall quality of evidence for CV was high in the EQ-5D index, LSS, and VAS but moderate in each of the five EQ-5D items. For responsiveness, the evidence quality was high for all five EQ-5D items and the VAS, but moderate for the EQ-5D index (Table 1). The quality of evidence for these measurement properties of each EQ-5D measure is summarized separately for validation and clinical studies in Table 1.

Table 1 Quality of EQ-5D measurement properties and evidence (no. of studies = 493)

We pooled the results of the studies quantitatively as the quality of evidence of most studies was assessed to be adequate to good. The pooled correlation coefficient between the EQ-5D index and AQLQ total (scores) was 0.52 (95% confidence interval 0.43–0.59), and between VAS and AQLQ total was 0.53 (95% confidence interval 0.34–0.69) (Fig. 3). We were not able to pool correlations between LSS and AQLQ total as no correlation data between these two scales were available for extraction in any of the selected articles. The pooled SRM and Cohen’s d estimates for pairwise comparisons between EQ-5D measures and AQLQ total (or AQL-5D index) are presented as forest plots in Figs. 3 and 4, respectively. The SRM ratios for the EQ-5D index and VAS compared to AQLQ total were 0.26 (n = 11) and 0.63 (n = 9), respectively. The Cohen’s d ratios of EQ-5D index, LSS, and VAS to AQLQ total were 0.56 (number of tests, n = 27), 1.16 (n = 16), and 0.75 (n = 37), respectively. The Cohen’s d ratio for the EQ-5D index compared to the AQL-5D index was 0.49 (n = 5). Results of the sensitivity analyses are presented in the ESM.

Fig. 3

Forest plots showing pooled correlation and standardized response mean (SRM) estimates for pairwise comparisons between EQ-5D measures (index, visual analog scale [VAS]) and Asthma Quality of Life Questionnaire (AQLQ) total. CI confidence interval, SE standard error

Fig. 4

Forest plots showing pooled Cohen’s d estimates for pairwise comparisons between EQ-5D measures and Asthma Quality of Life Questionnaire (AQLQ) total (or Asthma Quality of Life-5 Dimensions [AQL-5D] index). CI confidence interval, LSS level sum score, SE standard error, VAS visual analog scale

4 Discussion

The generic EQ-5D enables comparisons of HRQoL across populations. However, this assumes it demonstrates acceptable psychometric properties within the involved populations. This systematic review was performed to assess two key psychometric properties of EQ-5D, namely CV and responsiveness, in measuring the HRQoL among patients with asthma. We identified 493 studies, each testing a hypothesis on one of the two psychometric properties (CV and responsiveness) of an EQ-5D measure in asthma, i.e., index, LSS, VAS, or dimensional items. Overall, we found at least moderate-to-high evidence that most EQ-5D measures possess sufficient quality in CV and responsiveness among patients with asthma. However, exceptions include: (1) inconsistent quality in responsiveness for index, and “self-care” and “anxiety/depression” items; (2) insufficient quality in CV and responsiveness for the “mobility” item, and (3) an unknown quality in responsiveness for LSS as there was no relevant study available for assessment.

The narrative systematic review by Pickard et al., encompassing seven papers, found the responsiveness of the EQ-5D index to be modest (SRM 0.29 and 0.32) at best in asthma observed in two of the seven papers [11]. Both previously published systematic reviews concluded EQ-5D (index and VAS) to be predictive of asthma severity [11, 18]. In our review, only the “mobility” item failed the test for CV. This is not surprising given the younger age profile of patients with this condition compared with other chronic obstructive airway diseases such as chronic obstructive pulmonary disease. Additionally, asthma does not directly impact ambulation unlike orthopedic, rheumatologic, or neurologic conditions affecting the joints, musculoskeletal and/or nerve structures, which directly control ambulatory movement. Instead, ambulation may be limited by a high level of dyspnea especially during an acute severe flare of symptoms, during which patients would likely not have been able to participate in the survey. In contrast to CV, the EQ-5D index measure fell short of sufficient responsiveness, though by just a small margin. This could either be due to the smaller number of studies (n = 15) available for analyses as compared to those (n = 115) for CV or because the five dimensions (items) are not sufficient to capture changes in asthma-related health status. Out of the five items, only “pain/discomfort” and “usual activities” showed sufficient responsiveness. Intuitively, both items are deemed the most relevant given the bodily discomfort conferred by asthma symptoms such as breathlessness, wheezing, cough, and chest tightness, and their direct impact on daily routine activities. Unlike “mobility” and “self-care”, “pain/discomfort” and “usual activities” can be wide ranging and therefore more encompassing, allowing the assessment of a varying degree of effort.

The correlations of the EQ-5D index and VAS with AQLQ total only marginally crossed into the strong category in our analyses, supporting the need for further work to improve the psychometric properties of EQ-5D. Likewise, the Cohen’s d and SRM estimates of the EQ-5D index and VAS were comparatively lower than those of AQLQ total and/or AQL-5D index. The comparative performance of the EQ-5D index against the asthma-specific scales was dismal when contrasted with VAS, especially for SRM in assessing relative responsiveness. These trends persisted even in sensitivity analyses using the alternative fixed-effect model or restricting the analyses to the lowest and highest values extracted from each article. The only exception was LSS, which appeared to be on par with AQLQ total. The underlying cause for this discrepancy cannot be ascertained from this study and warrants further exploration. Regardless, it is worthwhile to note that unlike the other more popular EQ-5D measures, particularly the index score, the LSS is less often applied in practice, and currently does not play any active role in healthcare economic evaluations. However, the LSS could be useful for clinical use because of its simplicity, ease of interpretation, and comparability across countries.

While it is not surprising that EQ-5D is less sensitive and responsive than asthma-specific measures, adding “bolt-on” items to EQ-5D may boost its psychometric performance in patients with asthma. “Bolt-ons” are additional dimensions that are attached to EQ-5D, supplementing its five core items, and expanding its descriptive classifier. They can improve the content validity of EQ-5D, making it better suited for assessing specific disease conditions [69]. Recently, a breathing bolt-on has been developed, shown to enhance the CV of EQ-5D-5L in chronic obstructive pulmonary disease, a chronic respiratory airway disease of distinct pathophysiology from asthma, primarily affecting older adults [70]. Although “bolt-ons” may improve the performance of EQ-5D, their use may lessen the comparability of EQ-5D across different health conditions and may generate an overlap in measurement if the “pain/discomfort” dimension in EQ-5D sufficiently captures symptoms of the condition (e.g., breathlessness in asthma).

There are limitations in our review that need to be considered. One limitation was that we may have missed some articles during our searches. To overcome this, we also reviewed the references of included articles for relevance. It would have been ideal to evaluate the instrument by country, language, and mode of administration. However, this was not feasible because of the limited number of studies in each population subtype. We included only articles published in English and, as such, there may exist a selection (language) bias. In addition, we combined the analyses of studies using newer and older versions of the EQ-5D (levels and VAS). The newer 5-level EQ-5D and VAS versions are thought to possess better psychometric properties than the older versions; therefore, we might have underestimated the comparative performance of EQ-5D-5L with AQLQ and AQL-5D. We also did not analyze the differences between EQ-5D-3L and EQ-5D-5L as there were inadequate studies to compare the two in all the properties of interest and it was not the objective of our study. However, filtering out EQ-5D-3L and older VAS and repeating the analyses through sensitivity testing did not appear to change the results and conclusions of our review. The heterogeneity statistics and dissimilar estimate yields from the meta-analysis supported the presence of significant heterogeneity. This was inevitable given that the pooled studies originated from various regions (North America, Europe, the UK, Australia, and Asia); patients were recruited in both hospital and community settings with the EQ-5D being administered in multiple languages. Although this attested to the widespread adoption of EQ-5D as a generic preference-based measure and HRQoL measuring instrument, it contributed to substantial heterogeneity; we attempted to mitigate this statistically by specifying a random-effect model in the meta-analysis. However, it is noteworthy that despite the diverse geographical study sites, fewer than 10% of the studies were conducted in Asia. As such, the results should be extrapolated to the Asian context with caution. In addition, different preference (utility) weights were used, with some sites adopting non-native weights, which could result in inaccurate index values, as these values are influenced by the selection of the value set and valuation method. Last, it stands to reason that a longer condition (asthma)-specific measure with multiple dimensional items specifically designed for this population will outperform EQ-5D, a generic preference-based measure with only five items meant originally for economic evaluations. Nevertheless, our study provided further evidence to support the need for effort in improving the psychometric performance of EQ-5D in asthma.

Despite the limitations, there are several strengths in this review. By adhering to the COSMIN guidelines in assessing the quality of the psychometric studies, we were able to arrive at our conclusions using the best quality evidence. We employed the latest updated version of the COSMIN guidelines, which provided a clearer systematic structure and interpretation of criteria. It is recommended that researchers use COSMIN tools in carrying out psychometric studies in asthma to increase the internal validity of the results. Additionally, we included clinical (non-validation) studies of EQ-5D. These are experimental and application types of studies such as clinical trials and interventional longitudinal studies, and they are available in greater quantity in the literature than validation studies. Although they may not have explicitly examined the psychometric properties of EQ-5D, they allowed us to utilize more data to complement the limited data from validation studies.

5 Conclusions

In this systematic review, we found that EQ-5D measures generally exhibit validity and responsiveness in measuring HRQoL in asthma, although these properties were weaker than those of condition-specific measures. We advocate for further efforts to enhance its psychometric properties for use in asthma research, while acknowledging its advantages of brevity and wide adoption.