Introduction

When testing the construct validity of an instrument (often referred to as construct validation by extreme groups, or discriminative validity) it is common to estimate the difference in health-related quality of life (HRQL) between two different groups of individuals, in order to assess whether a measure can consistently discriminate between two patient groups, where one has a certain trait and the other does not [1]. Such an assessment constitutes an important component of the validation of HRQL measures as such empirical evidence is required to demonstrate that a measure is measuring what it is intended to measure [1, 2]. Thus, within this paper, we seek to assess whether different HRQL measures can discriminate between the HRQL of different groups of individuals. It is, however, important to acknowledge that such an assessment constitutes only one property contributing towards construct validity because evidence of construct validity, as Streiner and Norman [1] point out, can itself only be provided by a series of converging results, and not by a single study—no single experiment can unequivocally prove a construct.

In this paper we use three HRQL measures. The first two (the EQ-5Dindex [3] and SF-6D [4]) use questionnaires to elicit a health state description, which is in turn converted into a utility score (a scale where 0 is equivalent to death, and 1 is equivalent to full health), and the third (EQ VAS [3]) asks people to rate their current health state on a 0 to 100 scale (see the Methods section for further details). Marra et al. [5] have argued that there remains a gap in the literature regarding the assessment of utility measures with regard to criteria such as construct validity. Moreover, most papers which have estimated the difference in HRQL between different groups of individuals have done so without adjusting for differences between these groups [6–11], or only adjusted for differences in age and sex [12, 13]. Recently, however, two papers [14, 15] have extended this previous literature, by assessing whether the EQ-5Dindex can discriminate between different groups of individuals, after adjusting for differences in both sociodemographic factors and clinical conditions. Such adjustments were made because particular sociodemographic factors (e.g., age and gender) are associated with many clinical conditions. In this paper, we contribute further to the evidence on the construct validity of HRQL measures by similarly assessing whether the EQ-5Dindex, and in addition the SF-6D (derived from responses to the SF-36 questionnaire [16]) and EQ VAS, can discriminate between the HRQL of different groups of individuals with different sociodemographic factors and clinical conditions. It should also be noted that, in line with previous studies [13, 15], we consider that a measure can discriminate between the HRQL of different groups when groups are found to have a significantly different level of HRQL, where such comparisons are made both whilst adjusting for, and without adjusting for, other sociodemographic factors and clinical conditions.

We hypothesized that HRQL measures would be able to discriminate in accordance with differences previously observed in the literature, namely that HRQL would vary according to age [6–8, 10–15, 17], gender [6, 11–15, 18], smoking status [6, 12], ethnicity [6, 14, 15], body mass index (BMI) [18–20], education [7, 10–12, 14, 15], social class [6, 12], economic status [12], housing tenure [12], income [11, 14, 15], and the presence of health problems [6–8, 10, 12–15, 17]. With occupational skill level [21] acting as a proxy for five variables (education, social class, economic status, housing tenure, and income), we used cross-sectional data to assess whether the EQ-5Dindex, SF-6D and EQ VAS could discriminate between the HRQL of people with different sociodemographic factors (age, gender, ethnicity, smoking status, BMI, and occupational skill level), and people with and without different clinical conditions (back pain, hip pain, knee pain, heart disease, stroke, asthma, cancer, diabetes, rheumatoid arthritis, and osteoarthritis).

Minimally important difference

As outlined above, we initially considered that a measure could discriminate between different groups when there was a significant difference between the HRQL of those groups of individuals. Claxton [22] has, however, argued that decisions should be made on the basis of mean values, irrespective of whether such differences are statistically significant. Thus, we also focus on the size of the difference in HRQL between different groups of individuals, and in particular we consider whether these differences can be considered to constitute a minimally important difference (MID). Jaeschke et al. [23] defined an MID as “The smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s management” (quote taken from Walters and Brazier [24]). In an attempt to determine the MID, Walters and Brazier [24] reviewed eight longitudinal studies in 11 patient groups. They estimated that the MID for the EQ-5Dindex ranged between −0.011 and 0.140 (mean = 0.074), compared to 0.011–0.097 (mean = 0.041) for the SF-6D [24], where the differential valuation range covered by the EQ-5Dindex and SF-6D scores was offered as a potential reason for these different MID estimates. Other estimates of the MID, for the EQ-5Dindex, include 0.033 (based on an effect size of 0.2) [14] and 0.036 (the smallest change associated with a change in levels on the EQ-5Dindex) [25], and 0.03 for the SF-6D (the mean change in the SF-6D associated with a reported change in general health in nine reviewed studies) [26]. Within this paper we took a conservative approach and inferred that a difference in HRQL of >0.03 between different groups of individuals constituted an MID on the EQ-5Dindex and SF-6D. As we are unaware of any studies which have estimated the MID for the EQ VAS we similarly assumed, for the purposes of our analysis, that a difference of >3.0 on the EQ VAS constituted an MID.
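
The thresholds inferred above can be expressed as a simple rule. The following is a minimal sketch only: the dictionary `MID_THRESHOLDS` and the function `is_mid` are hypothetical names introduced for illustration, and comparing the absolute value of the difference is an assumption made here so that deficits (negative differences) are treated in the same way as gains.

```python
# Thresholds inferred in the text: >0.03 on the two utility measures
# and >3.0 on the EQ VAS. The names below are illustrative only.
MID_THRESHOLDS = {"EQ-5Dindex": 0.03, "SF-6D": 0.03, "EQ VAS": 3.0}


def is_mid(measure, difference):
    """Return True if the size of a mean difference exceeds the inferred MID.

    Assumption: the absolute value is used, so the sign of the
    difference (deficit or gain) does not affect the classification.
    """
    return abs(difference) > MID_THRESHOLDS[measure]
```

For example, `is_mid("EQ-5Dindex", -0.052)` is true under this rule, whereas a difference of exactly 0.03 is not classed as an MID because the inequality is strict.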

Methods

Participants and procedures

As part of the recruitment for a study designed to assess the cost-effectiveness of different lifestyle interventions for knee pain (LIKP) an ascertainment questionnaire was posted to all people who were registered in one United Kingdom (UK) general practice and aged ≥45 years, with the following exceptions—those who were deemed (by their general practitioner) to be unable to complete information requested in a questionnaire, or had had either a total knee replacement, an amputation of a lower limb, a permanent cardiac pacemaker, or an intra-articular steroid injection within the last 3 months. The ascertainment questionnaire requested information on sociodemographic factors (including age, gender, ethnicity, smoking status, height, weight, and current/previous job title), HRQL (according to the EQ-5D and SF-36 questionnaires), a person’s level of knee pain, and details of other clinical conditions. Additionally, respondents were asked to provide their name and address if they were willing to take further part in the LIKP study. People were asked to complete the ascertainment questionnaire, and post it to the LIKP study team at the University of Nottingham. The implication of such methods was that no checks were made on the self-reported data. Ethical approval for this study was granted by the Nottingham research ethics committee.

Data from all returned questionnaires were used in this study, even if not all sections of the questionnaire were completed; this also applied to questionnaires on which the person’s name and address were not reported. For the purposes of this paper data were extracted in order to enable the information on six sociodemographic factors to be calculated for each person: age, gender, ethnicity, smoking status, BMI (kg/m²) and occupational skill level. In addition, information about 10 common clinical conditions (whether they had back pain, hip pain, and/or knee pain on most days of the last month, and whether they had been diagnosed with heart disease, stroke, asthma, cancer, diabetes, rheumatoid arthritis, and osteoarthritis), and HRQL (according to the EQ-5D and SF-36 questionnaires) was also extracted. People were asked to report which ethnic group best described them. Due to the small number of non-white respondents we categorized each person as either white or non-white. Similarly, people were categorized according to whether they had, or had not, ever smoked regularly for a period of at least 3 months. BMI was categorized in accordance with World Health Organization (WHO) recommendations [27], and in line with US National Institutes of Health (NIH) clinical guidelines [28]: a BMI of <18.5 was considered underweight, of 18.5 to <25 as normal, of 25 to <30 as overweight, and of ≥30 as obese. The occupational skill level of each person was estimated by assigning the standard occupational classification (SOC2000) skill level [21] to their reported job title. The SOC2000 categorizes each job title into one of four skill levels, where 1 is the lowest and 4 the highest, dependent upon the length of time deemed necessary for a person to become fully competent in the performance of tasks associated with such a job title [21]. Respondents who reported they had never worked, or were a housewife, were assigned to the lowest occupational skill level.
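
As an illustration of the BMI categorization described above, the cut-offs can be sketched as follows. The function name `bmi_category` and its arguments are hypothetical and were not part of the study; only the cut-off values are taken from the text.

```python
def bmi_category(weight_kg, height_m):
    """Classify BMI (kg/m^2) using the cut-offs stated in the text."""
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"
```

Under this sketch a self-reported weight of 70 kg and height of 1.75 m gives a BMI of about 22.9, which falls in the normal (reference) category.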

Health-related quality of life (HRQL)

HRQL was assessed using the EQ-5D [3] and SF-36 questionnaires [16]. The EQ-5D questionnaire was developed by the EuroQol group, and has two components—five questions concerned with particular health dimensions and a visual analogue scale/thermometer. In the former respondents are asked to report the level of problems they have (no problems, some/moderate problems, and severe/extreme problems) with regard to mobility, self-care, usual activities, pain/discomfort, and anxiety/depression, the responses to which can then be mapped to one of 243 health states. This provides a measure of HRQL, as an individual utility score (referred to as the EQ-5Dindex) can be attached to each of these 243 health states. In the UK, based on preferences elicited from a survey of 3395 residents using the time trade-off (TTO) technique [29], regression analysis was used to estimate the utility scores for the EQ-5Dindex which ranged between −0.594 (33333—the lowest level on all five dimensions) and 1 (11111—the highest level on all five dimensions) [30]. Within the second (EQ VAS) component of the EQ-5D questionnaire people are asked to indicate how good or bad their health state is (on the day they complete the questionnaire), on a visual analogue scale where 0 corresponds to worst imaginable health state and 100 to best imaginable health state [3].
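
The figure of 243 health states follows from five dimensions with three levels each (3 to the power of 5). A minimal sketch of the enumeration is given below; the variable names are illustrative, and no utility values are attached because the tariff itself is not reproduced in this paper.

```python
from itertools import product

# The five EQ-5D dimensions, in the order used for the five-digit state code.
DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")
LEVELS = (1, 2, 3)  # 1 = no problems, 2 = some/moderate, 3 = severe/extreme

# Every combination of one level per dimension, written as a code such as "11111".
health_states = ["".join(str(level) for level in combo)
                 for combo in product(LEVELS, repeat=len(DIMENSIONS))]
```

The list runs from "11111" (no problems on any dimension) to "33333" (the most severe level on all five dimensions).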

Responses to 11 of the questions on the SF-36 questionnaire [16] can be used to estimate a score on the SF-6D [4], though it is not always necessary for all 11 questions to be fully completed in order for an SF-6D score to be calculated (see Gerard et al. [31] for further discussion). The SF-6D is composed of six dimensions (physical functioning, role limitations, social functioning, pain, mental health, and vitality), each of which has between four and six levels. Based on the preferences elicited from 611 UK residents, using the standard gamble (SG) technique [32], regression analysis was used to estimate utility scores for each of the SF-6D health states [4]. We estimated SF-6D scores using the consistent [33] version of the SF-6D algorithm (see Brazier et al. [34] for further details), upon which utility scores are estimated to range between 0.296 (645655) and 1 (111111).

By completing the EQ-5D and SF-36 questionnaires three measures of HRQL can thereby be estimated—the EQ-5Dindex, the EQ VAS, and SF-6D. Utility scores are provided by the EQ-5D and SF-6D, whereas the EQ VAS summarizes HRQL on a 0–100 scale.

Analysis

Descriptive statistics

In order to compare the HRQL of different groups of individuals continuous variables were converted into categorical variables, e.g., people were categorized into one of four age groups: (1) 45 to <55 years, (2) 55 to <65 years, (3) 65 to <75 years, and (4) ≥75 years. Descriptive statistics concerning the six sociodemographic factors (age, gender, ethnicity, smoking status, BMI, and occupational skill level), and 10 clinical conditions (back pain, hip pain, knee pain, heart disease, stroke, asthma, cancer, diabetes, rheumatoid arthritis, and osteoarthritis) were then computed for all respondents in each patient group. Based on the responses from all individuals, for each of the EQ-5Dindex, SF-6D, and EQ VAS the range of scores, the median, 25th and 75th percentiles, the mean score, and associated standard deviation, were also calculated. The latter two summary measures were also estimated for each group of individuals with different sociodemographic factors, and those with and without each of the 10 clinical conditions. We concentrated on the mean scores as these are used in analyses which seek to identify interventions which provide the maximum health gain that can be obtained from a given level of resource use [22, 35].
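
The conversion of age into the four groups, followed by a mean and standard deviation per group, can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical and the `records` are invented example values, not study data.

```python
from collections import defaultdict
from statistics import mean, stdev


def age_group(age):
    """Map age in years to one of the four age groups used in the text."""
    if age < 45:
        raise ValueError("respondents were aged 45 years or over")
    if age < 55:
        return "45 to <55"
    if age < 65:
        return "55 to <65"
    if age < 75:
        return "65 to <75"
    return ">=75"


def summarize_by_group(records):
    """Return {age group: (mean score, sample SD)} for (age, score) pairs."""
    grouped = defaultdict(list)
    for age, score in records:
        grouped[age_group(age)].append(score)
    return {group: (mean(scores), stdev(scores))
            for group, scores in grouped.items()}


# Hypothetical (age, utility score) pairs, for illustration only.
records = [(47, 1.00), (52, 0.80), (58, 0.85), (63, 0.75),
           (68, 0.76), (72, 0.70), (79, 0.69), (84, 0.59)]
summary = summarize_by_group(records)
```

Each group here contains two invented respondents, so `summary` holds one mean and one standard deviation for each of the four age groups.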

Discriminative ability

To assess whether each of the three HRQL measures could discriminate between groups of individuals with different sociodemographic factors, and those with and without clinical conditions, the t-test was conducted. We first assessed whether different groups had statistically significant (p < 0.05) different mean HRQL scores (we refer to these as unadjusted comparisons). The independent sample t-test was conducted both when there were only two groups of individuals (e.g., male versus female gender), and when comparing the mean HRQL scores for more than two different groups (e.g., underweight BMI, normal BMI, overweight, and obese), where in the latter case pairwise multiple comparisons were conducted between all other groups and the reference group (the group that was expected to have the highest HRQL, e.g., normal BMI). Use of the t-test requires the data to be approximately normally distributed. To test whether this was the case the Kolmogorov–Smirnov Z test was conducted for each of the three HRQL measures, where a significant p-value would lead one to reject the null hypothesis that the data were normally distributed. As it has been shown that responses to the EQ-5Dindex do not tend to be normally distributed [31], for the EQ-5Dindex and other measures that failed the normality test, the Mann–Whitney U test was also conducted, where again pairwise multiple comparisons were conducted between the HRQL of all other groups and the reference group. Both the t-test and Mann–Whitney U test were conducted as it has been suggested that further research should be carried out in order to assess the generalizability of the conclusion by Walters and Campbell [36] that nonparametric methods produce similar results to those of conventional statistical methods (i.e., the t-test and linear regression), and that the latter are thereby robust to the violation of assumptions that HRQL data are likely to cause (i.e., non-normality).
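
To make the two unadjusted comparisons concrete, the sketch below computes a t statistic and a Mann–Whitney U statistic for a reference group and a comparison group. It is not the software or exact procedure used in the study. The assumptions are: the t statistic uses a pooled variance, the U test p-value uses a normal approximation without a tie correction, and the two samples are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def pooled_t_statistic(a, b):
    """Independent-samples t statistic, assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p-value).

    U counts the pairs in which a value from `a` exceeds a value from `b`
    (ties count 0.5). The p-value uses the normal approximation and
    ignores the tie correction, so it is only a rough guide.
    """
    na, nb = len(a), len(b)
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    expected = na * nb / 2
    sd = sqrt(na * nb * (na + nb + 1) / 12)
    z = (u - expected) / sd
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p_value


# Hypothetical utility scores, for illustration only.
reference = [1.00, 1.00, 0.85, 0.80, 0.76]
comparison = [0.69, 0.62, 0.52, 0.30, 0.10]

t_stat = pooled_t_statistic(reference, comparison)
u_stat, u_p = mann_whitney_u(reference, comparison)
```

In this invented example every reference score exceeds every comparison score, so U takes its maximum value of 25 and both statistics point in the same direction, which is the kind of agreement between parametric and nonparametric methods that the text sets out to check.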

In order to assess whether the differences in HRQL between different groups of individuals were due to a particular sociodemographic factor, or clinical condition, rather than the effect of confounding factors, multiple linear regression analysis was also conducted. Three separate regression analyses were conducted where each of the EQ-5Dindex, SF-6D and EQ VAS acted as the dependent variable; in each regression analysis the six sociodemographic factors and 10 clinical conditions acted as explanatory variables. Again evidence of the ability to discriminate between different groups was considered to be provided when there was a significant difference between the HRQL of different groups of individuals, where this would be the case for regression coefficients which were significantly different from zero.

The constant term in each of the three regressions should be interpreted as the estimated HRQL of people in the reference group of each of the sociodemographic factors and clinical conditions, i.e., those who are aged 45 to <55 years, male, white, have never smoked regularly, have a normal BMI, the highest occupational skill level, and none of the 10 clinical conditions. Additionally, the use of alternative forms of each of the continuous explanatory variables was explored (e.g., age was included as a transformed continuous variable, rather than a categorical variable) in an attempt to identify the model which was best able to explain variation in the dependent variable, whilst (where possible) still trying to satisfy the standard assumptions of linear regression analysis (see Wooldridge [37] for further details of these). Where necessary, transformations of the dependent variables were also explored in an attempt to meet these standard assumptions.
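
The interpretation of the constant term can be illustrated with a small ordinary least squares example using dummy (0/1) variables. This is a sketch under stated assumptions, not the study's analysis: the solver `ols` is a hypothetical helper, only two explanatory variables are used, and the data are invented so that the group means are exactly additive.

```python
def ols(X, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [value / pivot for value in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Hypothetical data: columns are (constant, female, knee pain); reference
# group is male without knee pain. Scores are invented for illustration.
X = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 0],
     [1, 0, 1], [1, 0, 1], [1, 1, 1], [1, 1, 1]]
y = [0.90, 1.00, 0.90, 0.92, 0.70, 0.80, 0.70, 0.72]

constant, female_coef, knee_pain_coef = ols(X, y)
```

Here the constant (0.95) is the estimated score for the reference group, and each coefficient (−0.04 for female, −0.20 for knee pain) is the estimated deficit relative to that group after adjusting for the other variable.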

Minimally important difference

As was pointed out in the Introduction it can be argued that the size of the mean difference between the HRQL of different groups of individuals is more important than whether such differences are actually statistically significant. Thus, we also assessed whether the size of the mean differences in each of the regression analyses could be considered to constitute an MID, i.e., whether the mean differences were greater than our previously inferred MID of >0.03 for the EQ-5Dindex and SF-6D, and >3.0 for the EQ VAS. This, for instance, enables one to assess whether there is an MID between the HRQL of those with and without knee pain, after adjusting for factors that may differ between these groups of individuals.

Finally, in order to estimate how the differences in HRQL differ according to the two measures of utility (i.e., how the differences vary across different sociodemographic factors and clinical conditions) we conducted a further linear regression. As in the previous regression analyses the sociodemographic factors and clinical conditions acted as explanatory variables, but here the dependent variable was the difference between the EQ-5Dindex and SF-6D score (EQ-5Dindex minus SF-6D). Within this analysis a significant difference (i.e., an estimated regression coefficient significantly different from zero) would mean that the difference in HRQL between two particular groups was estimated to be significantly higher for either the EQ-5Dindex or the SF-6D.
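
The logic of using the difference between the two utility scores as the dependent variable can be shown with a single binary explanatory variable, where each regression coefficient reduces to a difference in group means. The sketch below uses invented paired scores and hypothetical names; it shows that the effect of a condition on (EQ-5Dindex minus SF-6D) equals its effect on the EQ-5Dindex minus its effect on the SF-6D.

```python
from statistics import mean

# Hypothetical (EQ-5Dindex, SF-6D) pairs per respondent, for illustration only.
without_condition = [(1.00, 0.86), (0.85, 0.80), (0.88, 0.83)]
with_condition = [(0.69, 0.70), (0.62, 0.66), (0.76, 0.71)]


def group_effect(with_group, without_group, index):
    """Difference in mean score (with minus without) for one measure."""
    return (mean(pair[index] for pair in with_group)
            - mean(pair[index] for pair in without_group))


def difference_effect(with_group, without_group):
    """Same comparison when the outcome is EQ-5Dindex minus SF-6D."""
    return (mean(eq - sf for eq, sf in with_group)
            - mean(eq - sf for eq, sf in without_group))


eq5d_effect = group_effect(with_condition, without_condition, 0)
sf6d_effect = group_effect(with_condition, without_condition, 1)
diff_effect = difference_effect(with_condition, without_condition)
```

In this example the deficit is 0.22 on the EQ-5Dindex and 0.14 on the SF-6D, so the effect on the difference is −0.08; a negative value therefore indicates a larger estimated loss on the EQ-5Dindex.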

Results

Participants

There were 6,765 people registered with the general practice on 1 July 2004; 3,122 were aged ≥45 years, and 2,770 were sent an ascertainment questionnaire. Of these, 1,865 (67.3% of those who were sent one) returned the ascertainment questionnaire to the LIKP study team at the University of Nottingham. However, not all of the 1,865 respondents answered all questions—for the three HRQL measures the level of missing data ranged between 6.9% for the EQ-5Dindex and 13.6% for the SF-6D (see Table 1), compared to between 4.3% (gender) and 7.2% (BMI) for the sociodemographic factors, and up to 6.1% (hip pain) for the clinical conditions (Table 2). All subsequently reported estimates, including percentage calculations, are based on those who answered the relevant question.

Table 1 Summary scores for each of the EQ-5Dindex, SF-6D, and EQ VAS
Table 2 Mean HRQL estimates, and standard deviations (SD), for the EQ-5Dindex, SF-6D, and EQ VAS for groups of individuals with different sociodemographic factors and those with and without clinical conditions

Analysis

Descriptive statistics

The mean age of respondents was 64.7 years (range 45–99 years), and the mean BMI was 26.0 kg/m² (range 14.3–51.6 kg/m²); 55.2% were female, 97.6% were of white ethnicity, 46.5% had smoked regularly for a period of at least 3 months, 32.5% were classified as having the highest occupational skill level, and 10.1% (2.6% of whom reported they had never worked or were a housewife) were classified as having the lowest occupational skill level (Table 2). With regard to the 10 clinical conditions 23.9% were classified as having knee pain, 18.5% back pain, 11.8% osteoarthritis, 10.9% heart disease, 10.8% hip pain, 6.7% asthma, 6.0% diabetes, 5.4% cancer, 5.1% rheumatoid arthritis, and 3.3% stroke (Table 2).

Summary scores for the three HRQL measures are shown in Table 1. It can be seen that the mean score was slightly higher on the EQ-5Dindex, compared to the SF-6D, whereas the median score was slightly higher on the SF-6D. This result can be explained by the fact that the EQ-5Dindex scores were more negatively skewed (see Table 1), and by the fact that 31.4% of respondents had a utility score of 1 according to the EQ-5Dindex, compared to 4.3% on the SF-6D. With regard to the EQ VAS the mean score was 75.54, and 1.2% reported a score of 100.

Discriminative ability

In Table 2 the mean HRQL scores, for each of the EQ-5Dindex, SF-6D and EQ VAS, are presented for groups of individuals with different sociodemographic factors, and those with and without the clinical conditions. Across all three HRQL measures, for each sociodemographic factor and each clinical condition it can be seen that, as hypothesized, those people in each of the reference groups (i.e., younger age, male, white ethnicity, never smoked regularly, normal BMI, highest occupational skill level, and no clinical conditions) had a higher mean HRQL. However, we judged discriminative ability to mean that at least one group of individuals had a significantly (p < 0.05) lower mean HRQL than those in the reference group. On each of the three HRQL measures, this criterion was met for groups who were older (compared to younger respondents), obese (compared to normal BMI), had a lower occupational skill level (compared to those with the highest occupational skill level), and for all groups who had a certain clinical condition (compared to those who did not), with the exception of cancer. The criterion was also met for females, compared to males (on the EQ-5Dindex and SF-6D), non-white ethnicity, compared to white ethnicity (EQ VAS), those who have smoked regularly, compared to those who have not (EQ-5Dindex and EQ VAS), underweight BMI, compared to normal BMI (SF-6D), and for cancer (EQ VAS).

The significance values for the Mann–Whitney U test are shown in Table 2. It can be seen that of the 66 pairwise comparisons there was only one comparison which was found to be significant (p < 0.05) according to the t-test, but not the Mann–Whitney U test (the t-test estimated that, according to the SF-6D, the HRQL of those aged ≥65 to <75 years was significantly different to those aged ≥45 to <55 years), and two comparisons which were found to be significant according to the Mann–Whitney U test, but not the t-test (the Mann–Whitney U test estimated that, according to the EQ-5Dindex, those who were aged ≥55 to <65 years had a significantly different HRQL to those who were aged ≥45 to <55 years, and that those of non-white ethnicity had a significantly different HRQL to those of white ethnicity).

Results of the linear regression analyses, conducted to assess whether each of the EQ-5Dindex, SF-6D and EQ VAS could discriminate between respondents with different sociodemographic factors and those with and without clinical conditions, after adjusting for other factors, are presented in Table 3. Overall the configuration of six sociodemographic factors and 10 clinical conditions was able to explain between 25.1% (EQ VAS) and 34.7% (EQ-5Dindex) of the variation in HRQL (Table 3). Additionally, it was estimated that people with the most favorable levels on each of the sociodemographic factors, and none of the clinical conditions, would have a mean score of 0.957 on the EQ-5Dindex, 0.859 on the SF-6D, and 84.86 on the EQ VAS. Relative to these scores, each of the three HRQL measures estimated that there was a significant loss in HRQL for groups who were older (e.g., on the EQ-5Dindex the average score for those who were aged ≥75 years was 0.052 lower than those aged ≥45 to <55 years), obese, had a lower occupational skill level, back pain, hip pain, knee pain, heart disease, rheumatoid arthritis or osteoarthritis. There was also a significant loss in HRQL for groups who were female (according to the EQ-5Dindex and SF-6D), smoked regularly (EQ-5Dindex), or had had a stroke (EQ-5Dindex and SF-6D), asthma (SF-6D and EQ VAS), cancer (EQ VAS), or diabetes (EQ-5Dindex and EQ VAS).

Table 3 Parameter estimates (95% confidence intervals) of the explanatory variables used to estimate variation in the EQ-5Dindex, SF-6D, and EQ VAS (* p < 0.05, † p < 0.01, ‡ p < 0.001)

These regression analyses did, however, show, particularly for the EQ-5Dindex, that some of the standard assumptions of linear regression analysis were not met. For example, we found that rather than the residuals having a normal distribution they had a negative skewness, and that the Breusch-Pagan test (F statistic) suggested heteroskedasticity to be present (see Table 3). That said, the values of the variance inflation factor (VIF) suggested that none of the explanatory variables were perfectly related to one another. Due to the negative skewness of the residuals we transformed the EQ-5Dindex score by adding a value of 0.59 to it (to ensure it was a positive value), and then raising it to the power of three: (EQ-5Dindex + 0.59)³. When this new variable was used as the dependent variable in the regression the adjusted R² improved, the new residuals had a lower level of skewness, and there was no evidence of heteroskedasticity (see square brackets in Table 3). Additionally, with one exception, the explanatory variables which were statistically significant within this regression analysis were the same explanatory variables as those which were statistically significant within the regression when the untransformed EQ-5Dindex variable acted as the dependent variable. The exception was the variable of occupational skill level, where when the dependent variable was (EQ-5Dindex + 0.59)³, those with the lowest skill level (1) were not estimated to have a significantly lower HRQL than those with the highest skill level (4) after adjusting for other factors (see Table 3). Conversely, those with a skill level of 2 were estimated to have a significantly lower HRQL than those with the highest skill level when the dependent variable was (EQ-5Dindex + 0.59)³, whereas this had not been the case when the untransformed EQ-5Dindex variable acted as the dependent variable. This robustness therefore increases the level of confidence that one can have in the aforementioned results where the EQ-5Dindex acted as the dependent variable.
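
The effect of the shift-and-cube transformation on skewness can be illustrated as follows. This is a sketch only: it applies the transformation to invented raw scores rather than to regression residuals, the function names are hypothetical, and skewness is computed as the standardized third central moment of the sample.

```python
from statistics import mean, pstdev


def skewness(values):
    """Standardized third central moment (population form)."""
    centre = mean(values)
    spread = pstdev(values)
    return mean(((v - centre) / spread) ** 3 for v in values)


def shift_and_cube(score, shift=0.59, power=3):
    """Transformation described in the text: (EQ-5Dindex + 0.59) cubed."""
    return (score + shift) ** power


# Hypothetical EQ-5Dindex scores with a ceiling at 1 and a long lower tail.
scores = [1.00, 1.00, 1.00, 0.85, 0.80, 0.76, 0.69, 0.52, -0.02]
transformed = [shift_and_cube(s) for s in scores]

skew_before = skewness(scores)
skew_after = skewness(transformed)
```

Because cubing stretches the upper end of the scale more than the lower end, the negative skew of these invented scores is reduced after the transformation, which is the direction of change reported above for the residuals.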

Minimally important difference

Within each of the regression analyses differences in HRQL which we consider to constitute an MID are shown by a tick (√) in Table 4. It can be seen that the deficits in HRQL associated with older age, obese BMI, lower occupational skill level, back pain, hip pain, knee pain, heart disease, stroke, asthma, rheumatoid arthritis, and osteoarthritis were deemed to be an MID on each of the three HRQL measures as the differences in HRQL (between these groups and the respective reference groups) were >0.03 on both the EQ-5Dindex and SF-6D, and >3.0 on the EQ VAS. The deficit in HRQL was also considered to be an MID for those of non-white ethnicity (according to the EQ VAS), underweight BMI (EQ-5Dindex and SF-6D), cancer (SF-6D and EQ VAS), and diabetes (EQ-5Dindex and EQ VAS). Thus, after adjusting for other factors, none of the three measures estimated that the deficit in HRQL associated with gender or smoking status constituted an MID.

Table 4 Parameter estimates which were considered to constitute a minimally important difference (MID) i.e. >0.03 for the EQ-5Dindex and SF-6D, and >3.0 for the EQ VAS

Finally, we used linear regression analysis to assess whether there was a significant difference between (1) the loss in HRQL, for a particular group, according to the EQ-5Dindex, and (2) the loss in HRQL for the same group according to the SF-6D. The results in Table 5 show that when the EQ-5Dindex score minus the SF-6D score acted as the dependent variable most of the explanatory variables had a negative coefficient. Such a negative coefficient indicates that the loss in HRQL was estimated to be higher according to the EQ-5Dindex than the SF-6D. Additionally, the significant values indicate that the loss in HRQL associated with certain age and occupational skill level groups, and back pain, hip pain, knee pain, heart disease, diabetes, and osteoarthritis was estimated to be significantly greater according to the EQ-5Dindex. That said, it should be noted that only 13.6% of the variation in the difference in HRQL between the two instruments can be explained by the six sociodemographic factors and 10 clinical conditions (Table 5).

Table 5 Parameter estimates of the explanatory variables used to estimate variation in the difference between the EQ-5Dindex and SF-6D score (* p < 0.05, † p < 0.01, ‡ p < 0.001)

Discussion

When undertaking both adjusted and unadjusted comparisons we have demonstrated that each of the EQ-5Dindex, SF-6D, and EQ VAS estimated that there was a significant difference in HRQL between most groups with different sociodemographic factors and those with and without clinical conditions (see Tables 2 and 3). Indeed, after adjusting for other factors, each of the three measures was able to discriminate between the HRQL of those groups who had different ages, BMI, occupational skill level and those with and without back pain, hip pain, knee pain, heart disease, rheumatoid arthritis, and osteoarthritis. However, not all of the three measures could discriminate between the HRQL of groups of individuals with different genders, ethnicity, smoking status, underweight BMI, or those with and without stroke, asthma, cancer, or diabetes. By comparing the size of the mean HRQL differences between groups to the inferred MID it can also be seen that most of the differences associated with the clinical conditions would be deemed important, but that, after adjusting for other factors, this would not be the case for many of the sociodemographic factors (see Table 4). Finally, there was also a tendency for the mean deficit in HRQL associated with different clinical conditions to be estimated to be higher according to the EQ-5Dindex than the SF-6D (see Table 5).

Comparisons with other studies

Our results are in agreement with other studies, which have shown that HRQL measures are generally able to discriminate between the HRQL of patient groups that differ in terms of sociodemographic factors, or the presence of clinical conditions [6–8, 10, 12–15, 17–20]. For example, for the EQ-5Dindex, our results (shown in brackets), taken from Table 3, are highly comparable to two recent studies [14, 15]. After adjusting for age, gender, ethnicity, and clinical conditions, Lubetkin et al. [15] estimated the mean difference in HRQL to be −0.060 for those aged ≥70 years, compared to those aged 18–39 years (≥75 compared to ≥45 to <55 years = −0.052), −0.023 for females, compared to males (−0.029), up to −0.032 for those of non-white ethnicity (−0.014), −0.047 for diabetes (−0.058), −0.040 for asthma (−0.032), −0.064 for heart disease (−0.052) and −0.063 for stroke (−0.074). Similarly, after adjusting for age, gender, ethnicity, income, and education, Sullivan et al. [14] estimated the mean difference to be −0.055 for congestive heart failure, −0.034 for acute myocardial infarction, −0.048 for other ill-defined heart disease (heart disease = −0.052), −0.0207 for asthma (−0.032), −0.035 for diabetes mellitus without complication (diabetes = −0.058), −0.0610 for osteoarthritis (−0.079), and −0.085 for rheumatoid arthritis and related disease (rheumatoid arthritis = −0.102).

Implications

Amongst other things, we have shown that there are small, but systematic, differences in the estimated deficit in HRQL, associated with certain sociodemographic factors and clinical conditions, as estimated by the EQ-5Dindex and SF-6D (see Tables 3 and 5). One implication of this is that there is the potential for the estimated improvement in HRQL, associated with an intervention, to vary according to the HRQL instrument with which it is measured, as has been shown previously [9, 38–41]. Thus, there is also the potential for the estimated cost-effectiveness of an intervention to vary according to the HRQL measure that is used, and, in light of this, others have argued that estimates of cost-effectiveness which are based on different HRQL measures should not be compared [42], or that they should only be compared with extreme caution [38].

By comparing the estimated deficit in HRQL between patient groups to the MID we have additionally shown that the deficit associated with all clinical conditions, with the exception of cancer (see next section for further discussion), can be considered important. Conversely, in line with the results of Lubetkin et al. [15], this was not the case for differences between different genders. That said, it has been pointed out that preference-based measures were primarily designed to inform resource allocation decisions, and that therefore changes in costs need to be considered, as well as changes in HRQL [43].

Limitations

There were two potential deficiencies in the wording of the ascertainment questionnaire which may have affected the reliability of the self-reported data, though we believe that only the second potential deficiency may have affected the results of this study. The first potential deficiency was that people were simply asked to tick the respective box if they had been diagnosed with heart disease, stroke, asthma, cancer, diabetes, rheumatoid arthritis or osteoarthritis. Thus, in all circumstances, we inferred a non-response to mean that they had not been diagnosed with the respective condition (i.e., the level of missing data was recorded as zero for these clinical conditions in Table 2). Such an approach does, however, leave open the possibility that some of these non-responses should have been categorized as missing (as the respondent did not consider the question). Thus, in a small number of cases, some respondents may have been categorized as not having the clinical condition, when in fact they had been diagnosed with a clinical condition but did not answer this particular question. The impact of this miscategorization would have been to understate the differences between groups of individuals, but as the HRQL measures were still generally able to discriminate between the different groups of individuals we do not believe this impinged upon the results of this study. Moreover, the impact of this oversight is minimized when we seek to identify an MID, as only those who completed data on all other clinical conditions and sociodemographic factors were included in the linear regression analyses. The second potential deficiency is that we asked people to report if they had been diagnosed by their doctor as having cancer. At least two respondents (we only began to monitor this part way through the data entry procedure) ticked the box to say they had been diagnosed with cancer, but in addition noted that it had now been ‘cured’. Thus, as well as the fact that there will have been a large variety of cancers for those included in the cancer category, some may also not have been affected by the condition when they completed the questionnaire. One potential implication of this is that, as those in the cancer and non-cancer categories may be more similar than intended, this may explain why neither the EQ-5Dindex nor SF-6D could discriminate between the HRQL of these patient groups.

Though the t-test and regression analysis are commonly used to analyze responses to the EQ-5Dindex [13–15, 44, 45] there may be some limitations with regard to the statistical analysis performed within this paper. These largely arise because the data for each of the HRQL measures were not normally distributed (a requirement for the t-test) and some of the standard assumptions of regression analysis were not met. That said, we demonstrated that the results of the t-test were robust as the qualitative interpretation of these results (in relation to the p < 0.05 cut-off) was virtually identical to those obtained using the Mann–Whitney U test. Similarly, when a transformation of the EQ-5Dindex was used as the dependent variable, instead of the EQ-5Dindex itself, more of the assumptions of regression analysis were met, whilst the qualitative interpretation of all but one explanatory variable remained unchanged. Thus our results support the conclusion, of Walters and Campbell [36], that the t-test and linear regression are robust to the violation of assumptions that HRQL data are likely to cause.

Further potential limitations include those that are common to cross-sectional subpopulation studies of this nature. These include the fact that causality is generally easier to infer in a longitudinal study [46], that the use of self-reported data may result in biased estimates of the prevalence of clinical conditions [47], and that participants in medical research tend to be of a higher social class than nonparticipants [48, 49]. Indeed, the latter of these may explain why non-white people may have been underrepresented in our study (see Table 2 for further details). Some protection against reporting spurious effects is, however, provided by the large sample size of our study, and by using regression analysis to control for a number of potentially confounding variables. Finally, it should be noted that, in this paper, we have only assessed the discriminative ability of the three HRQL measures with regard to specific sociodemographic factors and clinical conditions. Thus, as well as providing information on only one property which contributes towards construct validity, we are also unable to comment on the performance of these measures with regard to other important validity and reliability criteria (see [2, 50] for a discussion of these).

Conclusion

We have demonstrated that each of the EQ-5Dindex, SF-6D, and EQ VAS was able to discriminate between the HRQL of the majority of groups of individuals with different sociodemographic factors, and those with and without clinical conditions. However, for groups of different gender, ethnicity, smoking status, or BMI, or those with and without stroke, asthma, cancer, or diabetes, not all of the three HRQL measures were able to discriminate between the HRQL of these groups of individuals. Finally, it was shown that the mean deficit in HRQL associated with many clinical conditions was estimated to be higher according to the EQ-5Dindex than the SF-6D. This, in turn, implies that the benefits of alleviating such conditions, and thereby the cost-effectiveness of different interventions, will also tend to be more favorable according to the EQ-5Dindex compared to the SF-6D. Further research is needed to confirm whether this is the case, which conditions it applies to, and whether cost-effectiveness estimates which are based on different instruments need to be adjusted accordingly.