Introduction

With healthcare resources under pressure in many countries, decision-makers are increasingly looking to evidence from economic evaluations. Alcohol consumption is a major contributor to the global burden of disease and carries with it a substantial economic burden [1, 2]. Although brief interventions for high-risk drinkers can be effective [2], and cost-effective compared to no intervention [3], their ability to compete with other healthcare interventions may be limited by the choice of suitable primary end points to measure both effectiveness and cost-effectiveness.

Consumption end points allow for thorough evaluation of alcohol programmes [4] and are widely used. However, consumption itself is not a direct cost [4] and there is considerable variability in consumption measures used [5]. Consequently, standardised alcohol measures are recommended [6, 7]; however, these do not facilitate decisions between competing alcohol and non-alcohol programmes. For this reason, the National Institute for Health and Clinical Excellence for England and Wales recommends a generic health outcome measure, the quality-adjusted life year (QALY) [8]. Nevertheless, QALYs are rarely used in alcohol research [6, 9]. A recent systematic review of economic evaluations of alcohol treatments [6] identified only two primary studies that calculated QALYs [10, 11], both of which used a common generic health-related quality of life (HRQoL) instrument, the EQ-5D. However, in neither study did the tool seem sensitive to changes in alcohol consumption.

The EQ-5D assesses current health states across five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression), each with three levels: none, some or extreme problems [12]. A second section of the EQ-5D includes a visual analogue scale (VAS) on which current health from 0 (worst imaginable health state) to 100 (best) can be recorded. Questionnaire responses to the first section of the EQ-5D can be converted into a single summary index score using preference valuations of different combinations of the EQ-5D dimensions derived from population surveys. In the UK, preference values are most commonly taken from a 1993 survey (n = 3,337), where 45 of the possible 243 health states were valued, with multivariate methods used to estimate preference valuations for the remaining health states [13, 14]. The resulting index scores ranged from −0.59 (severe problems across all five dimensions) to 1.00 (no problems on any dimension) [13], and these scores can in turn be converted to QALYs by multiplying them by the length of time spent in that health state [15].
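The arithmetic behind these figures can be illustrated with a minimal sketch. It enumerates the 243 possible health states (three levels on each of five dimensions) and multiplies an index score by time spent in a state to give QALYs. The function names are hypothetical, and the sketch does not reproduce the UK preference valuations themselves, which would have to be taken from the published value set.

```python
from itertools import product

# The five EQ-5D dimensions, each answered on three levels.
DIMENSIONS = (
    "mobility",
    "self_care",
    "usual_activities",
    "pain_discomfort",
    "anxiety_depression",
)
LEVELS = (1, 2, 3)  # 1 = no problems, 2 = some problems, 3 = extreme problems


def all_health_states():
    """Enumerate every combination of levels across the five dimensions."""
    return list(product(LEVELS, repeat=len(DIMENSIONS)))


def qalys(index_score, years_in_state):
    """QALYs accrued: index score multiplied by time (in years) in that state."""
    return index_score * years_in_state
```

For instance, `len(all_health_states())` gives the 3^5 = 243 states mentioned above, and one year spent at an index score of 1.00 corresponds to one QALY.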

A number of studies have used the EQ-5D to value alcohol-related health, mainly among alcohol treatment populations, rather than general population samples. In cross-sectional studies, alcohol treatment populations have demonstrated poorer HRQoL compared to general population norms [16, 17]. Prospective cohort studies have followed up treatment populations for up to 18 months, yet none have demonstrated a significant change in EQ-5D, despite some showing substantial improvements in alcohol consumption and other outcomes [10, 18–21]. However, these studies may have been hampered by small sample sizes (n = 6–617).

Several studies using general population samples have valued alcohol-related health using the EQ-5D, with much larger sample sizes than the treatment studies (range 2,400–42,000), whilst adjusting for confounding factors such as age, gender and socio-economic status [22–25]. All were, however, cross-sectional in design. High levels of current or past alcohol consumption were associated with lower EQ-5D index scores [22, 24, 25], particularly relating to anxiety/depression problems [23, 24], though in one study, adjustment for confounding factors accounted for these differences [22].

Although a number of studies have valued alcohol-related health using the EQ-5D, these studies have several limitations. Most focused only on those in treatment, a group that represents a small proportion of harmful drinkers in the UK [26]. In addition, these treatment population studies were mainly characterised by small sample sizes. The few studies in general populations were larger and included controls for confounding factors, though they were not able to provide information about change in alcohol consumption or EQ-5D scores. The purpose of this exploratory study was twofold: first, to investigate the ability of the EQ-5D to discriminate between levels of alcohol risk in a large sample of hazardous and harmful drinkers, and second, to explore the relationship between transitions between alcohol risk levels and changes in EQ-5D index scores at 1, 3 and 12 months whilst controlling for confounding factors.

Methods

Sample

The data for this analysis were taken from an Internet-based randomised controlled trial for hazardous and harmful alcohol consumers. The Down Your Drink (DYD) trial was conducted entirely online, including recruitment, randomisation and delivery of the interventions (psychologically enhanced website vs. website with information only) [27]. Participants in the trial were therefore Internet browsers, who found the DYD website via Google, whilst searching for information or help in reducing their drinking between February 2007 and May 2009 [27, 28]. People were invited to take part in the trial if they scored five or more on the shortened version of the Alcohol Use Disorders Identification Test (AUDIT), the AUDIT-C, indicating hazardous or harmful alcohol consumption [28]. The AUDIT-C comprises the first three questions of the full AUDIT [29] and is a well-established brief screening tool [30]. The majority of participants were white British and UK residents; however, due to the online nature of the trial, 12 % of respondents were from approximately 70 other countries [27].

Ethics approval for the DYD trial was granted by the University College London Research Ethics Committee, and all participants provided online informed consent. The trial was conducted in three stages: the pilot study (n = 3,746 at baseline, follow-up at 1 and 3 months), the main trial (n = 2,652 at baseline, follow-up at 3 and 12 months) and the main trial extension (n = 1,537 at baseline, follow-up at 3 months only) [27]. The trial achieved rates of follow-up of 55 % at 1 month, between 41 and 48 % at 3 months and 32 % at 12 months [27]. For the purposes of this study, all three trial stages were combined. However, as the inclusion criterion of five or more on the AUDIT-C was introduced in the main trial, 156 individuals from the pilot study with a score of <5 were excluded, resulting in sample sizes of 7,779 at baseline, 1,974 at 1 month, 3,449 at 3 months and 854 at 12 months.

Measures

EQ-5D

A standard three-option version of the EQ-5D was administered to participants including both the EQ-5D dimensions and the VAS. EQ-5D dimension scores were converted to index scores using UK population values [13].

Levels of alcohol risk

Alcohol consumption was measured using an online instrument, TOT-AL, which measures total past week alcohol consumption [31]. Levels of drinking risk to health were calculated from TOT-AL scores (1 unit = 8 g of alcohol) and, for the purposes of international comparability and consistency, were based on the levels of risk used in the Global Burden of Disease study [32]: safe (mean past week <28 g/3.5 units of alcohol/day for men and <20 g/2.5 units for women), hazardous (28–79.99 g/3.5–10 units for men and 20–54.99 g/2.5–6.9 units for women) and harmful (≥80 g/10 units for men and ≥55 g/6.9 units for women). As our sample included only people who scored five or more on the AUDIT-C (indicative of hazardous or harmful drinking), we re-labelled these risk levels: low, medium and high risk. Thus, for example, low-risk drinkers comprised all those with a total weekly alcohol consumption of <196 g/24.5 units of alcohol for men and <140 g/17.5 units of alcohol for women. Transitions between these drinking risk levels were explored.
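The categorisation described above can be sketched as follows. This is an illustrative reading of the thresholds, not the trial's own code: the function and constant names are hypothetical, and the daily gram thresholds are multiplied by seven so that they can be compared directly with a weekly total.

```python
GRAMS_PER_UNIT = 8  # 1 unit = 8 g of alcohol, as stated in the text
DAYS_PER_WEEK = 7

# Daily thresholds in grams: (lower bound of medium risk, lower bound of high risk).
DAILY_THRESHOLDS_G = {
    "male": (28, 80),
    "female": (20, 55),
}


def risk_level(weekly_units, sex):
    """Classify total past-week consumption (in units) as low, medium or high risk."""
    weekly_grams = weekly_units * GRAMS_PER_UNIT
    medium_daily, high_daily = DAILY_THRESHOLDS_G[sex]
    if weekly_grams >= high_daily * DAYS_PER_WEEK:
        return "high"
    if weekly_grams >= medium_daily * DAYS_PER_WEEK:
        return "medium"
    return "low"
```

Under these assumptions a man reporting 24 units in the past week (192 g) falls in the low-risk group, whereas 24.5 units (196 g) reaches the medium-risk threshold.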

Analyses

The trial sample was treated for analytical purposes as one cohort (rather than conducting analyses within randomised groups). This method was appropriate for two reasons. Firstly, the focus of this study was on the ability of the EQ-5D to detect changes in alcohol consumption, regardless of why consumption changed. Secondly, there were substantial reductions in self-reported alcohol consumption at follow-up points in the trial, but there was no evidence of a difference based on trial arm [27]. Reductions in alcohol consumption have been frequently observed in control or untreated groups in alcohol studies and may be due to factors such as natural variability over time and reactivity to assessment [33–36].

In baseline comparisons with British population norms (see Figs. 1, 2), to make the graphs more comparable, weights by education were added to bring the proportion with a degree (high in the DYD population at 51.6 %) down to 12 %, the proportion holding a degree in 1993 when the general population data were collected [37].
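One simple way to obtain such weights is post-stratification on a single binary variable, where each group's weight is its target share divided by its sample share. The sketch below assumes this approach; the text does not specify how the weights were actually constructed, and the function names are hypothetical.

```python
def education_weights(sample_degree_share, target_degree_share):
    """Return (weight for degree holders, weight for others).

    Each weight is the target population share divided by the sample share,
    so that the weighted sample matches the target distribution.
    """
    w_degree = target_degree_share / sample_degree_share
    w_no_degree = (1 - target_degree_share) / (1 - sample_degree_share)
    return w_degree, w_no_degree


def weighted_prevalence(records, w_degree, w_no_degree):
    """Weighted proportion reporting a problem.

    `records` is an iterable of (has_degree, has_problem) boolean pairs.
    """
    total_weight = 0.0
    problem_weight = 0.0
    for has_degree, has_problem in records:
        weight = w_degree if has_degree else w_no_degree
        total_weight += weight
        if has_problem:
            problem_weight += weight
    return problem_weight / total_weight
```

With the shares quoted above, `education_weights(0.516, 0.12)` down-weights degree holders to roughly 0.23 each and up-weights everyone else to roughly 1.82.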

Changes in EQ-5D scores between baseline and follow-up points were calculated by subtracting baseline index scores from follow-up scores. Multiple linear regression models were estimated to explore the relationship between alcohol risk transitions and change in EQ-5D index scores, stratified by baseline alcohol risk (low, medium or high). Within each model, people who remained at the same level from baseline to follow-up were the reference category. For example, in a model examining change in EQ-5D score at 1 month for people who drank at low-risk levels at baseline, those who drank at low levels at 1 month were the reference group, to which those who had increased their drinking risk level were compared. A similar methodology has been used in a previous alcohol study using the SF-12 [38], as it allows for meaningful comparisons between groups. All models were adjusted for baseline EQ-5D score, age, gender and educational level. Coefficients represent change in EQ-5D score, with a negative value indicating a decrease in HRQoL from baseline to follow-up and a positive value indicating an increase in HRQoL. Model residuals were explored and outliers removed as a form of sensitivity analysis.
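The change-score and reference-category logic can be illustrated with a deliberately simplified, unadjusted sketch. It computes follow-up minus baseline scores and then contrasts a transition group with the group that stayed at the same risk level. In a regression containing only one transition indicator, the coefficient equals this difference in mean change; the models reported here additionally adjust for baseline EQ-5D score, age, gender and educational level, which this sketch does not attempt. All names and the example values are hypothetical.

```python
from statistics import mean


def change_scores(baseline, follow_up):
    """Change in index score: follow-up minus baseline, per participant."""
    return [f - b for b, f in zip(baseline, follow_up)]


def unadjusted_contrast(changes, follow_up_levels, reference_level, comparison_level):
    """Difference in mean change between a comparison group and the reference group.

    The reference group is those whose follow-up risk level equals their
    baseline level (the stratum being analysed).
    """
    reference = [c for c, lvl in zip(changes, follow_up_levels) if lvl == reference_level]
    comparison = [c for c, lvl in zip(changes, follow_up_levels) if lvl == comparison_level]
    return mean(comparison) - mean(reference)


# Hypothetical stratum: four people who were high-risk drinkers at baseline.
baseline_scores = [0.848, 1.0, 0.848, 0.760]
follow_up_scores = [1.0, 1.0, 0.848, 0.848]
levels_at_follow_up = ["medium", "high", "high", "medium"]

changes = change_scores(baseline_scores, follow_up_scores)
contrast = unadjusted_contrast(changes, levels_at_follow_up, "high", "medium")
```

A positive contrast indicates a larger improvement in HRQoL among those who moved to the comparison level than among those who stayed at the reference level.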

The focus on change in EQ-5D scores, rather than on predicting index scores, was intended to reduce the impact of ceiling effects [39]. Nevertheless, model diagnostics revealed that the change in EQ-5D data was not normally distributed (data not shown). Bootstrapping [40], a method traditionally used to account for the skewness that is observed in cost data [41], was therefore used. The linear regression model was fitted to 1,000 bootstrap data sets, and 95 % confidence intervals were computed using the bias-corrected and accelerated (BCa) method. All analyses were carried out using STATA 11.2 software.
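The resampling idea can be sketched with a percentile bootstrap, a simpler relative of the BCa interval used in the analysis. The sketch resamples observations with replacement, recomputes a statistic on each resample, and takes the 2.5th and 97.5th percentiles of the resulting values. It is shown for a generic statistic rather than a regression coefficient, and the function name, seed and defaults are assumptions for illustration only.

```python
import random
from statistics import mean


def percentile_bootstrap_ci(data, statistic=mean, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `statistic` of `data`.

    Note: this is the plain percentile method, not the bias-corrected and
    accelerated (BCa) method, which applies further adjustments.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Draw a resample of the same size as the data, with replacement.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    lower_index = int((alpha / 2) * n_boot)
    upper_index = int((1 - alpha / 2) * n_boot) - 1
    return estimates[lower_index], estimates[upper_index]
```

Because no distributional form is assumed for the statistic, the interval can be asymmetric around the point estimate, which is the property that makes bootstrapping attractive for skewed data.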

Results

Comparing the DYD sample with British population norms for EQ-5D

Compared to British population norms (n = 3,395 people from England, Scotland and Wales in 1993) [37], overall, DYD participants were less likely to report problems with mobility, self-care, usual activities and pain, with mobility problems particularly infrequent. However, anxiety/depression problems were highly prevalent in the DYD population, with over 57 % reporting a problem, compared to only 21 % of the general population.

At baseline, the DYD sample had a higher proportion of women than men (57 %), over one-third of participants were aged 30–39 (23 % 18–29 and 43 % 40–93, mean age 38.2, SD 10.8), and over half held degree-level qualifications (52 %). Comparisons of the prevalence of problems to the British norms were therefore stratified by age and sex, and the DYD results were also weighted by education (see Figs. 1, 2). The DYD population remained generally in better health compared to the reference population. However, despite this overall pattern, the youngest DYD participants (aged 20–29) experienced greater problems with most EQ-5D dimensions compared to the general population sample, particularly with usual activities. In addition, as in the non-stratified analyses, problems with anxiety/depression dominated the DYD sample, especially among the younger individuals.

Fig. 1 Proportion of males who reported any problem for the EQ-5D dimensions

Fig. 2 Proportion of females who reported any problem for the EQ-5D dimensions

EQ-5D scores according to level of drinking risk at baseline

Figure 3 shows the prevalence of problems on each of the EQ-5D dimensions by level of drinking risk at baseline (see Table 1 for frequencies and percentages). With the exception of self-care, for which problems were low (2 % or less) across all drinking levels, problems on the other EQ-5D dimensions increased significantly with increasing drinking. However, problems for most dimensions were only notably higher among the high-risk drinkers, with only a slight increase in problems for those drinking at medium-risk levels compared to low-risk drinkers. The exception was anxiety/depression, where over 40 % of individuals drinking at low-risk levels reported problems, increasing to over 50 % for medium-risk drinkers and to almost 70 % for people drinking at high-risk levels.

Fig. 3 Percentage of individuals with problems on EQ-5D dimensions by level of drinking risk at baseline. Test for trend p values: mobility (p < 0.001), self-care (p = 0.09), usual activities (p < 0.001), pain/discomfort (p < 0.001), anxiety/depression (p < 0.001)

Table 1 EQ-5D index, VAS and dimension scores by level of drinking risk at baseline

Table 1 summarises the EQ-5D results for the three drinking risk levels. The EQ-5D index scores exhibited large ceiling effects, particularly among the low-risk drinking group, of whom over half scored 1 (indicating full health). A score of 0.848 (moderate problems with anxiety/depression, no other problems) was also common in all three groups, accounting for between 50 and 60 % of the scores below 1. Mean index scores were similar for low- and medium-risk drinkers (0.88 and 0.87, respectively), but were markedly lower for people who reported drinking at high-risk levels at baseline (mean = 0.80). Similarly, mean VAS scores were 10 points lower for high-risk drinkers (61.4) compared to low-risk drinkers (71.8), but the average VAS score for medium-risk drinkers was only 2 points lower (69.7).

Change in level of drinking risk between baseline and follow-up

At all follow-up times, there were substantial reductions in self-reported alcohol consumption from baseline values (Table 2). Table 3 presents levels of drinking risk at baseline and at the three follow-up periods. For people drinking at low-risk levels at baseline, most (61–66 %) had not changed category of risk at each follow-up period. Similarly, for those drinking at medium-risk levels, around 60 % were still drinking at these levels at follow-up, but 28–36 % had reduced their drinking to low-risk levels. Among participants who were drinking at high-risk levels of alcohol at baseline, the majority had reduced their drinking to a lower-risk level at follow-up, with between 14 and 23 % subsequently within low-risk levels.

Table 2 TOT-AL past week alcohol consumption (units) at baseline and follow-up
Table 3 Level of drinking risk at baseline and follow-up

Change in EQ-5D score between baseline and follow-up

Mainly due to ceiling effects, many participants’ EQ-5D index scores did not change between baseline and follow-up. For example, between baseline and 3 months, 1,918 participants (over 54 %) did not change index score. Of these non-changers, 60 % reported full health at both time points, 28 % had a score of 0.848 at both (moderate problems with anxiety/depression, no other problems), and the final 12 % had other scores at both time points. Of those who improved in EQ-5D score between baseline and follow-up, an improvement from 0.848 to 1 was the most common, accounting for around 40 % of improvement at each of the 1-, 3- and 12-month follow-up periods (results not shown).

Change in EQ-5D scores according to change in alcohol consumption

Figure 4 shows the correlation of change in alcohol consumption and change in EQ-5D from baseline to 1, 3 and 12 months. For all time periods, there was a weak negative correlation between change in alcohol consumption and change in EQ-5D score, indicating that as alcohol consumption increased, EQ-5D score decreased (Pearson’s correlation r = −0.09, −0.11 and −0.12 for 1, 3 and 12 months, respectively, all p < 0.001). However, the association was not strong, and the scatterplots show the variability of results.
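For reference, Pearson's correlation coefficient is the covariance of two variables divided by the product of their standard deviations. A minimal sketch follows; the function name is hypothetical and the inputs would be the paired change scores for alcohol consumption and EQ-5D.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)
```

A value near zero, as reported here, means that change in consumption explains very little of the variation in change in EQ-5D score, even where the association is statistically significant in a large sample.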

Fig. 4 Correlation of change in EQ-5D score and change in TOT-AL score from baseline to 1, 3 and 12 months, showing a fitted regression line and its 95 % confidence interval

Table 4 shows the mean EQ-5D index scores at baseline and follow-up according to drinking risk level at baseline and follow-up, and results from linear regression models predicting change in EQ-5D index score adjusted for baseline EQ-5D score, age, gender and educational level. For most groups, index scores did not change significantly between baseline and follow-up. The exception was among high-risk drinkers at baseline: those who decreased their drinking at follow-up had small improvements in index score (from 0.04 to 0.06) compared with those who did not decrease their drinking; five of these six differences were statistically significant. Sensitivity analyses conducted after removing outliers changed results minimally (data not shown).

Table 4 Changes in EQ-5D from baseline to follow-up according to change in level of drinking risk: mean index scores, and change in index scores from linear regression models adjusted for baseline EQ-5D, age, gender and educational level

Discussion

In this study population of help-seeking hazardous and harmful drinkers, baseline EQ-5D index scores below one were driven primarily by problems with anxiety/depression. Over 90 % of the <1 scores in the sample included some problem with anxiety/depression, and over 50 % indicated moderate anxiety/depression and no other problems. These findings are consistent with previous research, where mental health problems have been shown to dominate QoL in studies using both the EQ-5D [16, 19, 23] and other QoL tools [38, 42–47].

EQ-5D results were compared across three levels of average weekly consumption, here indicative of low, medium and high risk. Medium-risk drinkers reported more problems only with anxiety compared to low-risk drinkers, whereas high-risk drinkers reported more problems with mobility, usual activities and pain, in addition to increased levels of anxiety. Consequently, overall EQ-5D index scores did not differentiate between those at medium and low risk, whereas high-risk drinkers had significantly lower scores. These results indicate that those at medium risk are not experiencing physical health problems as captured by the EQ-5D. In an American longitudinal study using the generic HRQoL measure the SF-12, participants who developed alcohol abuse according to DSM-IV criteria (but not dependence) at follow-up showed no significant decreases in physical functioning, leading the authors to suggest that abuse may be a behavioural, rather than a health problem per se [38]. Nevertheless, our results could also indicate that the generic EQ-5D is insensitive to smaller changes in HRQoL that might be expected as a result of hazardous drinking, as opposed to harmful drinking.

Despite striking improvements in average alcohol consumption at follow-up points compared to baseline values, EQ-5D index scores changed minimally at 1, 3 and 12 months. There was only a weak correlation between alcohol consumption and EQ-5D index scores, and many participants exhibited no change in EQ-5D score, despite a substantial change in alcohol consumption. In adjusted linear regression models, significant changes in EQ-5D scores were only observed for those who were high-risk drinkers at baseline: those who reduced their drinking risk had an improvement of between 0.04 and 0.06 compared to those who did not change. There is no established threshold for a minimally important difference (MID) for the EQ-5D; 0.03 has been suggested based on the smallest difference in moving from one level to another on any of the 5 dimensions [48]. A mean threshold of 0.074 (range −0.011 to 0.140) has been estimated across eight studies including eleven patient groups [49], and this value is now widely used [see for example, 50–52]. Although our results fall below this mean threshold, the wide MID range across the eight studies from which it is derived demonstrates that further empirical work is needed.

To our knowledge, no previous study has explored changes in EQ-5D associated with changes in alcohol consumption in hazardous or harmful drinking populations. Studies using other generic HRQoL measures in non-treatment populations have also found a lack of significant association between alcohol use and summary HRQoL scores [46, 53], or sensitivity to changes in alcohol consumption over time [38, 43].

There are a number of reasons why the EQ-5D may exhibit poor discriminative ability and sensitivity in a population of hazardous and harmful drinkers. The tool has been widely criticised for its lack of items and consequent ceiling effects and has been recently revised to include 5 options per dimension [54]. In our study, the EQ-5D demonstrated a considerable ceiling effect, with over half of the low-risk drinkers and over a quarter of high-risk alcohol users scoring 1.00, similar to a previous study of alcohol-dependent hospital inpatients, where over 27 % reported full health [16]. In addition, over one-third of the sample had moderate problems with anxiety, and no other problems. The dimensions of the EQ-5D may be too narrow to capture some effects of alcohol on quality of life due to the inclusion of redundant items, such as mobility [44, 55], and the exclusion of relevant items, such as sleep, social functioning [56] and societal impacts, e.g. crime [57]. Finally, the EQ-5D is designed to measure current health status, and therefore cannot capture risk to health, which is particularly relevant to both hazardous and harmful drinkers and other public health issues. Population-based studies have demonstrated that although some harm may be reduced immediately following changes in alcohol consumption, it can take many years before there is an observed improvement in health outcomes [58, 59].

Strengths and limitations

The main strength of our study was the large sample size, not only at baseline but at several follow-up points up to 12 months, which allowed for the examination of the behaviour of the EQ-5D with respect to transitions in alcohol risk among hazardous and harmful drinkers. This method of examining change has been previously restricted to treatment populations only, with small sample sizes. The stratified analyses allowed for meaningful comparisons between groups and also circumvented the need to deal with repeated measures (as some participants were followed up at more than one point in time). Our results control for age, gender and education, all of which are predictors of both alcohol consumption and HRQoL; nevertheless, there could be further residual confounding factors.

Limitations of our study are imposed by the web-based context and resulting younger and more educated population, which may restrict generalisability. Our sample comprised individuals with varying degrees of readiness to change their drinking who were browsing the Internet and decided to take part in a randomised controlled trial evaluating a behaviour change support intervention. Some may have already taken steps to reduce their drinking before entering the trial, and this should be borne in mind when interpreting study findings, particularly as this could be true for those who were already experiencing health problems due to their alcohol consumption. This reflects the complex relationship between alcohol and health; although increased alcohol consumption can impact on health, declining health may reduce drinking [60].

There were high levels of attrition, which are a common challenge in Internet-based trials, with follow-up rates for online studies as low as 11 % at 3 months [61]. The DYD trial achieved rates of follow-up of 55 % at 1 month, between 41 and 48 % at 3 months and 32 % at 12 months [27], and we investigated the nature of attrition in a dedicated randomised sub-study [62] and used incentives to reduce it [63]; nevertheless, these high levels of missing data can lead to biased estimates [64], which should be borne in mind when interpreting our findings. The hazardous and harmful drinking nature of our sample is also noteworthy in making comparisons with previously studied populations and makes a distinct contribution to the literature.

Alcohol consumption was self-reported, which can be prone to bias due to social desirability. However, the web-based setting allowed for anonymity and privacy, both of which can increase reports of sensitive information, and studies in treatment settings generally suggest that carefully collected self-reported alcohol data are reliable [65]. There have not yet, however, been online or brief intervention studies that have reached similar conclusions [66]. We tested our models by removing outliers, which changed the results minimally. Nevertheless, our findings on the insensitivity of the EQ-5D to changes in alcohol consumption are valid to the extent that the self-reported change in behaviour has actually occurred.

To allow for international comparability and consistency, the levels of risk employed here (low, medium and high) were based on those in the Global Burden of Disease study [32], with thresholds for overall weekly consumption converted to average daily consumption. Consequently, they do not take account of the risks involved in heavy episodic drinking, which partly explains the number of participants categorised as low-risk, despite being identified by the AUDIT-C as hazardous or harmful drinkers.

Our exploration of change in EQ-5D scores reduced the analytical issues that arise from ceiling effects [39]. However, it should be noted that ceiling effects may still have affected the results somewhat; for participants with an initial EQ-5D score of 1, their change in EQ-5D score has to be ≤0 and is more likely to be zero than for participants with lower initial EQ-5D scores.

Conclusions and implications

To our knowledge, this study is the first to explore the relationship between transitions in alcohol risk levels and changes in EQ-5D index scores (commonly used to calculate QALYs) among hazardous and harmful alcohol users. In our study, compared to low-risk drinking, medium-risk drinking was associated with significantly more anxiety/depression problems, whereas high-risk drinking was associated with significant detriments to physical HRQoL in addition to increased anxiety/depression. Nevertheless, when aggregated and combined with UK preference values, the three-option EQ-5D index scores were relatively insensitive to short-term (up to 12 months) changes in alcohol consumption, particularly among low- and medium-risk drinkers. A significant change in EQ-5D scores was only demonstrated for high-risk drinkers who reduced their drinking at follow-up, and these changes were small, falling below the widely used mean MID threshold of 0.074 for EQ-5D scores.

Considering the published research and our own findings, the three-level EQ-5D may not be an optimal primary end point for capturing clinical benefit and cost-effectiveness in randomised controlled trials of interventions among hazardous and harmful alcohol users with relatively short follow-up periods. Further testing of the sensitivity of the tool in these populations is needed. As trials are often not suitable for longer-term follow-up, greater epidemiological data on natural history of alcohol consumption are needed for modelling of the consequences (including QALYs) of evaluated interventions in the longer term. Particularly, longitudinal alcohol studies including the EQ-5D would help to estimate future potential QALY gains [67]. Finally, the bi-directional nature of the relationship between alcohol and health warrants consideration in any future research.