Introduction

Quality-adjusted life years (QALYs) are a commonly used outcome measure in economic evaluations of interventions to manage diabetes and its complications. The weights used to calculate QALYs (often referred to as utility scores) are based on preference-based measures of health-related quality of life (HRQoL). These can be determined directly from patients through techniques such as the ‘time trade-off’ (TTO) method or indirectly using generic instruments such as the EQ-5D. There are now a substantial number of studies reporting utility scores for people with diabetes and for common complications associated with the disease. Typically, these utilities are used in economic evaluations of interventions for the prevention or management of diabetes, which often involve either Markov modelling or discrete-time simulation to estimate QALYs with and without the intervention [1–3]. In this context, it is important to consider the appropriateness of the values assigned to particular health states, given the often considerable heterogeneity of utility scores reported in the literature.

Currently, most models use utility values from a single study and often use the same source of information for different types of complications. Meta-analysis allows us to systematically synthesise information on quality of life from many different studies. Not only does this facilitate use of summary scores when determining the impact complications can have on quality of life, it also provides a range of values that could be used when testing model sensitivity. Hence, meta-analysis is likely to become integral to diabetes and other health economic modelling in the future.

The main purpose of this study is to undertake a systematic review of preference-based measurements of QoL in patients with diabetes and with common diabetes-related complications (stroke, myocardial infarction, blindness, end-stage renal disease (ESRD), amputation and ulcers). Meta-analysis is employed to estimate summary measures of key utilities for people with diabetes. We also examine whether there are systematic differences between direct and indirect approaches to measuring utility, or between different generic QoL instruments. Using a diabetes simulation model, we examine the degree to which the use of different utility values may affect lifetime estimates of QALYs. Finally, we investigate the effect that different utility values may have on incremental QALYs gained by simulating a theoretical diabetes therapy.

Methods

Study selection

A literature search was carried out to identify potentially relevant studies reporting utility scores for diabetes and diabetes-related complications (either directly elicited or generated from generic QoL instruments). Databases used in the search included OVID, MEDLINE, EMBASE, CINAHL and PubMed, as well as Cochrane’s systematic reviews database. Other databases searched included the Health Technology Assessment website, Health Economic Evaluation Database (HEED), NHS Economic Evaluation Database, the TUFTS CEA register, the Digital Theses Database and Google Scholar. Bibliographies of review articles and of articles that reported utility values were also examined, as were articles that cited them; citing articles were identified through Google Scholar and PubMed.

Our review was confined to preference-based measures of HRQoL. These included the index scores from the EuroQol (EQ-5D) [4], Health Utilities Index 3 (HUI3) [5] and SF-6D [6] as well as directly elicited utility values from time trade-off (TTO) or standard gamble (SG) [7] exercises. Inclusion criteria were as follows: articles had to be published before the end of 2009, in English, in peer-reviewed journals. In addition, all study subjects were required to have type 1 or type 2 diabetes and had to be 18 years or older at the time their QoL was elicited. Search terms (detailed in an appendix of ESM) included both keywords relating to major diabetes-related complications and health state valuation keywords.

Abstracts and the full text of all articles were examined by two independent reviewers to determine whether they fulfilled the inclusion criteria and reported the relevant estimates. All discrepancies between reviewers were resolved.

Data extraction

Information on mean (standard error) of reported utility scores for diabetes patients and a number of selected health states was extracted from each study. These included a history of: (1) myocardial infarction; (2) stroke; (3) ulcer; (4) amputation; (5) diabetic retinopathy or blindness; (6) end-stage renal disease; (7) no complications. The following information was also extracted from each study for use in the meta-regression: (1) sample size of the study; (2) mean age of study participants; (3) proportion of males and (4) method of QoL elicitation. If data were not reported on any of these characteristics, studies were excluded from the meta-regression but included in the meta-analysis [8–13]. Studies that did not report a measure of variance around the estimated mean utility value were excluded from both the meta-analysis and meta-regression.

Meta-analysis and meta-regression

Meta-analysis was used to combine the results of multiple studies into a single overall value, often termed the effect size, for each health state. We performed random effects meta-analyses using the metan command in STATA [14]. Meta-analyses were conducted for the seven health states listed above as well as an overall analysis for the presence of diabetes (categorised as general diabetes). For the seven health states, we ignored potential within-study correlation in estimates because of the small numbers of studies and estimates (fewer than 10 studies and estimates in each case). However, for the meta-analysis of general diabetes patients, we conducted an additional analysis to account for potential within-study correlation using the methods described in Hedges et al. [15], implemented using STATA. Since this method requires specifying the correlation between estimates within studies, we conducted a sensitivity analysis by varying this correlation from 0 to 0.9 in increments of 0.1.
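
As a rough illustration of the pooling step, the sketch below implements a DerSimonian-Laird random effects estimate in Python. It is not the STATA code used in this study: the choice of estimator, the function name random_effects_pool and the input values are assumptions made purely for illustration.

```python
import math


def random_effects_pool(means, std_errors):
    """Pool study-level mean utilities with a DerSimonian-Laird random effects model.

    Illustrative only: the function name and the choice of estimator are assumptions.
    """
    # Inverse-variance (fixed effect) weights and pooled mean.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * m for wi, m in zip(w, means)) / sum_w

    # Cochran's Q and its degrees of freedom.
    q = sum(wi * (m - fixed) ** 2 for wi, m in zip(w, means))
    df = len(means) - 1

    # Method-of-moments estimate of the between-study variance (tau^2).
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * m for wi, m in zip(w_re, means)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))

    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled),
        "tau2": tau2,
        "q": q,
        "i2": i2,
    }


# Hypothetical study means and standard errors (not taken from the review).
example = random_effects_pool([0.70, 0.80, 0.76], [0.02, 0.03, 0.01])
print(example)
```

The sketch treats every estimate as independent, so it corresponds only to the analysis that ignores within-study correlation; the Hedges et al. approach is not reproduced here.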

For patients with diabetes, a random effects meta-regression model was used to examine heterogeneity of utility values across study characteristics. We conducted two analyses: the first used standard meta-regression methods in STATA (through the metareg command) and the second used the methods described in Hedges et al. [15]. Again, we conducted a sensitivity analysis by varying the assumed within-study correlation from 0 to 0.9 in increments of 0.1. Meta-regression allowed for pooling of utility scores while simultaneously accounting for variation in study methods [16]. Studies reporting separate QoL measures for different patient groups, e.g., trial-based studies that reported separate outcomes for treatment and control groups [17–23], or studies that compared different populations [24–32], were included as separate observations in the meta-analysis and the meta-regression (provided they reported study characteristics for the different population groups). In the meta-regression, the utility value was the dependent variable. Study characteristics hypothesised to be associated with the treatment effect [14] were as follows:

  • Number of participants in the study;

  • Average age of the sample respondents;

  • Proportion of males;

  • Method of QoL elicitation, using the following categories: (i) TTO and SG; (ii) SF-6D and HUI3 scores; with EQ-5D as the reference.
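
A minimal sketch of how such a meta-regression can be fitted is given below, using weighted least squares with weights equal to the inverse of each estimate's within-study variance plus a between-study variance. This is an assumption-laden illustration, not the metareg implementation: the between-study variance tau2 is supplied rather than estimated, sample size is omitted as a covariate for brevity, and the function names and data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def meta_regression(utilities, std_errors, covariates, tau2):
    """Weighted least squares fit; returns [intercept, coefficient per covariate].

    tau2 is an assumed between-study variance, supplied by the caller.
    """
    rows = [[1.0] + list(c) for c in covariates]
    weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    k = len(rows[0])
    # Normal equations: (X' W X) beta = X' W y.
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, utilities))
            for i in range(k)]
    return solve_linear(xtwx, xtwy)


# Hypothetical observations: (mean age, proportion male, TTO/SG, HUI3/SF-6D).
# EQ-5D is the reference category, so both dummies are 0 for EQ-5D estimates.
covariates = [
    (60, 0.50, 0, 0), (65, 0.55, 0, 0), (70, 0.45, 0, 0),
    (58, 0.60, 1, 0), (62, 0.50, 1, 0),
    (66, 0.52, 0, 1), (55, 0.48, 0, 1),
]
utilities = [0.78, 0.75, 0.71, 0.86, 0.83, 0.68, 0.74]
std_errors = [0.02, 0.03, 0.02, 0.04, 0.03, 0.02, 0.03]
print(meta_regression(utilities, std_errors, covariates, tau2=0.003))
```

Coding the elicitation method as two indicator variables means each coefficient is read as a difference in mean utility relative to EQ-5D studies.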

Simulations using reported utility estimates

Using an existing diabetes simulation model, the UKPDS Outcomes Model [1], we investigated the impact on lifetime QALYs of using different utility values as determined in the meta-analysis. The input population for the model simulations was a cohort of 10,000 identical patients (to reduce Monte Carlo error), aged 65, male, non-smoking and with mean clinical risk factors as determined from a recent large diabetes study [32]. To estimate the base case for all comparisons, patients were run through the simulations in annual cycles for a period of 35 years, and QALYs were determined using mean values of utility for each state, presented in Table 1. If a patient experienced multiple complications during the simulation, the utility score was set to the lowest value of all the health states they had previously experienced.
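
The rule for multiple complications can be sketched as follows. This is not the UKPDS Outcomes Model: it only illustrates how QALYs accumulate once a sequence of events is known. The function name, the event history, the absence of discounting and the convention that a complication lowers utility from the cycle in which it occurs are all assumptions; the three utility values are the summary scores quoted in the Discussion.

```python
def lifetime_qalys(events_by_year, utilities, baseline="no_complications"):
    """Sum annual utilities, carrying forward the lowest utility experienced so far.

    events_by_year holds one entry per annual cycle lived; each entry lists the
    complications first occurring in that cycle (empty if none).
    """
    current = utilities[baseline]
    total = 0.0
    for new_events in events_by_year:
        for event in new_events:
            # Utility never recovers: keep the minimum over all states so far.
            current = min(current, utilities[event])
        total += current
    return total


# Summary utilities quoted in the text; the event history is hypothetical.
utilities = {"no_complications": 0.81, "stroke": 0.59, "esrd": 0.48}
history = [[], [], ["stroke"], [], ["esrd"]]
print(round(lifetime_qalys(history, utilities), 2))  # 3.28
```

Swapping the mean utilities for the minimum or maximum reported values, while keeping the event history fixed, gives the kind of comparison described in the simulations that follow.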

Table 1 Reports the number of studies, observations, mean utility score, the range of utility values and the mean number of patients and range for the meta-analyses of the 8 health states

We then simulated the impact of using the minimum and maximum reported utility scores for each of the following states, both separately and combined: no complications, myocardial infarction, stroke, amputation, blindness and end-stage renal disease. We report the results as the difference in QALYs compared to the base model.

Secondly, we investigated the impact on calculated QALYs of using maximum and minimum reported utilities for particular patient groups with each complication. For these simulations, the input dataset was changed to represent a cohort of patients who had all experienced one of the complications. The effect of maximum and minimum reported utility scores on QALYs was determined by simulation, and the results were presented as differences in QALYs compared to the reference case, using mean utility scores.

Lastly, we investigate how the choice of utility values may affect the incremental QALYs gained as the result of an intervention. For this application, we simulate a theoretical diabetes therapy by reducing initial HbA1c by 1% of the mean level [32] and maintaining this level of HbA1c throughout the patients’ lifetime. Incremental QALYs for the intervention are estimated using three sets of utility scores: the baseline set, and the maximum and minimum utility score sets across all states.

Results

Figure 1 reports a flow chart of the literature search. Our initial search identified 9,492 studies, of which 9,191 were excluded after an examination of the abstract. The full text of 301 studies was retrieved for further review. Of the 301 studies, 126 were duplicates (studies which appeared more than once in our search results), 50 were eliminated as they reported HRQoL scores from another primary source, 26 were removed because they used measures that did not meet the inclusion criteria and a further 23 studies were eliminated because they did not report sufficient information about the utility scores of the outcomes of interest. Of the 76 studies left, 30 could not be included in the meta-analysis because they did not report a measure of variance around the reported utility or reported other statistics such as median utility values [33]. One further study [34] was omitted from our analysis because its standard deviation was implausibly high. The final meta-analysis for general diabetes used 45 studies and 66 observations, and the meta-regression was based on 40 studies and 59 observations.

Fig. 1

Flow chart of the literature search, results, reasons for excluded studies and number of included articles in the final meta-analysis

A full listing of the studies used in the meta-analysis of overall utility score for people with diabetes is reported in Table A in the appendix of ESM. In regard to elicitation methods, 33 studies used the EQ-5D; 8 each used the HUI3 and the SF-6D; 15 used TTO and 2 used SG. For the meta-regression, which involved only studies reporting patient and study characteristics, the number of respondents ranged from 22 to 7,348 with an average of 806. Weighted by sample size, 52.7% of respondents were men (range 25–99%) and the average age of patients was 62.6 years (range 37–77 years).

In regard to reporting utility values for studies of patients with a history of complications (see Table B in the appendix of ESM), the number of studies ranged from 4 for ESRD to 6 for blindness, with 5 studies for myocardial infarction, stroke, ulcer and amputation. The average number of respondents per study ranged from 128 (for myocardial infarction) to 284 for stroke and blindness. In addition, seven studies with an average of 578 respondents indicated that the patient had no history of complications.

Summary information for studies reporting utility values for the six diabetes complications and for no history of complications is presented in Table 1. There was considerable heterogeneity in the number of patients in each study and the utility values presented. Overall, the EQ-5D was the most frequent method of eliciting preference-based measures of HRQoL.

Figure 2 reports 66 utility scores for diabetic patients in 45 studies, ordered by date of publication [8–13], [17–32], [35–58]. The QoL scores ranged from 0.53 [24] to 0.88 [41], with an inter-quartile range of 0.71–0.9. The mean utility value from the random effects meta-analysis was 0.76 (95% confidence interval 0.75–0.78). There was considerable heterogeneity in the utility values (I² = 98.4%; Q = 4157.9, degrees of freedom = 65, P < 0.001; between-study estimate of variability = 0.003).
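
The heterogeneity statistic can be checked from the reported Q and degrees of freedom. The worked equation below assumes the usual definition of I² as the proportion of total variation in excess of that expected by chance; the source reports the values but does not state the formula.

```latex
\[
I^2 = 100\% \times \frac{Q - \mathrm{df}}{Q}
    = 100\% \times \frac{4157.9 - 65}{4157.9}
    \approx 98.4\%
\]
```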

Fig. 2

A forest plot of the overall meta-analysis of 45 studies (66 observations) reporting utility scores and confidence intervals of each study for general diabetes patients

We used a within-study correlation of 0.8 in our models, because the sensitivity analysis showed little difference across the range of values from 0.1 to 0.9. Accounting for within-study correlation, the overall utility value was 0.76 (95% confidence interval 0.74–0.79).

The results of the meta-regression to examine heterogeneity in the 41 studies reporting an overall measure of utility for people with diabetes are reported in Table 2. By way of interpretation, for every 10-year increase in the average age of participants, the average utility score declined by 0.06 points. For every 10% increase in the proportion of males, the average utility score increased by 0.01. In regard to method of elicitation, the mean utility for studies that used TTO/SG was 0.07 higher than for studies that used the EQ-5D, and studies that used other generic methods such as the HUI3 and SF-6D produced mean utility scores 0.08 lower than the EQ-5D. The estimate of between-study variability was 0.003.
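
To illustrate how these coefficients combine, consider two hypothetical study populations that are not drawn from the review: study A has a mean age of 70 years, is 50% male and uses the EQ-5D, while study B has a mean age of 60 years, is 60% male and uses TTO. Assuming the effects are linear and additive and that all other covariates (including sample size) are equal, the predicted difference in mean utility is:

```latex
\[
\Delta u_{B-A}
  = \underbrace{(-0.06) \times \frac{60 - 70}{10}}_{\text{age}}
  + \underbrace{0.01 \times \frac{60 - 50}{10}}_{\text{proportion male}}
  + \underbrace{0.07}_{\text{TTO vs EQ-5D}}
  = 0.06 + 0.01 + 0.07 = 0.14
\]
```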

Table 2 Results of the random effects meta-regression of factors influencing utility scores of the general diabetic population having accounted for within-study correlation

The estimates obtained when accounting for within-study correlation were similar to those from the standard meta-regression (see Table C in appendix of ESM). Without accounting for within-study correlation, the average utility score decreased by 0.05 points for every 10-year increase in the average age of participants; for every 10% increase in the proportion of males, average utility increased by 0.01; studies that used TTO/SG as their method of elicitation produced utility scores 0.07 higher than EQ-5D studies, and studies that used HUI/SF-6D had utility scores 0.08 points lower than EQ-5D studies.

The variation in estimated QALYs as a result of using maximum and minimum reported utility values is presented in Fig. 3. These values refer to a representative 65-year-old male diabetes patient who initially does not have any complications. Changing individual utility scores of selected complications resulted in a change in simulated outcomes of between −0.04 and 0.24 QALYs. However, changes to the no complications state produced larger changes of around ± 1 QALY. Setting the utility scores for all states to their maximum or all states to their minimum values produced changes of +1.06 and −1.11 QALYs respectively.

Fig. 3

Difference in simulated quality-adjusted life years using maximum and minimum utility scores from the literature. All differences are with respect to the base model using average utility scores from the meta-regression, which predicts a quality-adjusted life expectancy of 9.46 years

Table 3 shows the variation in outcomes derived from a simulation model when all patients start with a pre-specified complication. Differences in quality-adjusted life expectancy when using the maximum and minimum reported utility scores for the complication ranged from 0.78 QALYs (myocardial infarction) to 5.16 QALYs (stroke). Outcomes are subject to much greater variation here because all patients in the simulation are affected by the utility scores. Stroke (whose utility scores ranged from 0.31 to 0.79) produced the largest variation, with predicted QALYs ranging from 3.56 to 8.72. In comparison, predicted QALYs for diabetes patients with myocardial infarction (utility range 0.68–0.77) ranged from 6.49 to 7.27.

Table 3 Results of simulation model using maximum, minimum and mean utility scores for each diabetic complication reporting QALE (years) for each score and the difference in QALE from the base case

The predicted incremental undiscounted QALYs gained from a hypothetical intervention that reduced HbA1c (glycosylated haemoglobin) by 1 percentage point varied little across the mean, maximum and minimum reported utility values (mean of 0.31; range 0.30–0.32). This is reported in the appendix (Table C) of ESM.

Discussion

This systematic review has shown that there is a wide range of utility values for diabetic patients, both for overall HRQoL and for some major complications. The meta-analysis results suggested a high degree of heterogeneity in reported utility values, and the meta-regression results indicate that this is in part due to variations in average patient characteristics such as age, the proportion of males and the methods used for eliciting QoL values. In this meta-analysis (having accounted for within-study correlation), the overall average utility for people with diabetes was 0.76 (0.75, 0.77) and the average values for individual health states ranged from 0.81 for diabetic patients with no complications to 0.48 for patients with ESRD.

This study builds on some recent reviews of QoL for people with diabetes. Mills et al. [59] and Imayama [60] both conducted a systematic literature review of utility scores for patients with diabetes mellitus, whilst Cochran’s [61] meta-analysis was based upon the QoL outcomes for diabetes patients following self-management training. However, these prior studies involved many non-preference-based QoL scores and are therefore less relevant to health economists. Also, this is the first analysis we are aware of to specifically focus on summary scores for a range of diabetes complications in addition to overall QoL of people with diabetes.

The meta-regression results indicated that heterogeneity of utility scores between studies was partly due to age, proportion of males in the study and elicitation method. Because many studies did not report patient characteristics, some factors that have been shown to be associated with health utility in diabetes (type of therapy, duration of diabetes, obesity, comorbidities, etc.) could not be included. These characteristics can potentially affect utility scores, and studies need to report them consistently for their respective populations. The meta-regression results were similar to the EQ-5D population norms in the UK and in the USA [62, 63], in that lower utility values are associated with older populations and with populations containing a higher proportion of women. Similarly, the utility scores obtained from TTO/SG methods were greater than the scores obtained from HUI3/SF-6D [16] and larger than EQ-5D scores [64, 65].

There have been reported analyses of the utility scores from the general population associated with some of the events examined in this study. Tengs & Lin’s [16] QoL meta-analysis score for a moderate stroke was 0.68, whilst Luengo-Fernandez [65] found a moderate stroke score to be 0.63. In comparison, our study did not classify the severity of stroke but yielded a slightly lower score of 0.59. Liem et al.’s [66] QoL estimate (TTO studies only) for patients undertaking peritoneal dialysis was 0.50, compared with our ESRD estimate of 0.48. The summary scores reported here appear generally lower than those based on the general population, which raises the important question as to whether diabetes actually has an additional impact on QoL. This is an important area for further research.

In regard to the use of these values in simulation models, the range in reported utility values does generate differences in QALYs, but the simulated range is generally less than one year for people who initially do not have any diabetic complications. The variation is much greater for patients with a prior history of these complications. For example, using the mean summary scores obtained from the meta-analysis, a 65-year-old male patient with stroke could expect a quality-adjusted life expectancy of 6.71 QALYs. However, if the maximum reported value from the literature was used, the expected QALYs would increase by 2.01 years, and if the minimum reported value was used, it would decrease by 3.15 years. This amounts to approximately 50% variation in remaining lifetime QALYs. Despite the high variation in lifetime QALYs obtained by using different sets of utility values, the incremental QALYs gained as a result of a theoretical therapy to reduce HbA1c varied only slightly regardless of which utility scores were used. This suggests that the variation in utility values may have less of an impact on evaluative studies measuring incremental effects than on studies looking into the burden of disease.
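
The stroke figures quoted above can be checked by simple arithmetic. Expressing the changes relative to the mean-based estimate of 6.71 QALYs is an assumption about how the variation of roughly 50% was derived; the source does not state the calculation.

```latex
\[
6.71 + 2.01 = 8.72, \qquad 6.71 - 3.15 = 3.56
\]
\[
\frac{2.01}{6.71} \approx 30\%, \qquad \frac{3.15}{6.71} \approx 47\%
\]
```

Under this reading, the figure of approximately 50% corresponds most closely to the downward change when the minimum reported utility is used.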

A motivation for this study has been the lack of reference values of utilities for diabetes and complications of diabetes that are used in economic evaluations of diabetes prevention and management. Cost-utility analyses may take the form of spreadsheet calculations, Markov modelling or more complex computer simulation modelling, but currently there seems to be no clear procedure by which to determine the most appropriate utility values to use in an analysis. In this context, an advantage of undertaking a meta-analysis is that it provides both an average value and extreme values that could be used in a sensitivity analysis. Such an approach could improve the comparability of models, as well as eliminate the possibility that a particular utility value has been chosen to produce a desired outcome. Providing a range of values alongside the summary utility scores could help inform outcomes used in economic evaluations and policy analysis. This is particularly important in the field of diabetes as there is considerable heterogeneity in the utility values between studies. The greater use of meta-analysis of HRQoL outcomes would seem particularly important when they are used to help inform economic evaluations for reimbursement decisions on new diabetes therapies and technologies (such as by NICE [67] and PBAC [68]).

In conclusion, this study represents one of the first meta-analyses of preference-based outcomes for diabetes patients and diabetic complications. The literature search found a large range of utility values for diabetes patients and for diabetic complications, due in part to differences in average age, the proportion of males in the study and the elicitation method. These can produce substantial differences in QALY estimates for people with diabetes, particularly those experiencing major complications. However, our results from testing a hypothetical new diabetes therapy showed that the heterogeneity of utility values had a lesser impact on the incremental QALYs gained from a new diabetes therapy or intervention.