Introduction

The use of preference-based generic instruments to measure the health-related quality of life (HRQL) of the general population or of individuals with a specific disease has been increasing. These instruments, based on multi-attribute utility theory, generate utilities and are essentially generic HRQL instruments with predefined preference weights. The preference weights, or utility scores, for the different health states are derived through a valuation process using techniques such as the standard gamble (SG), the time trade-off (TTO), or the visual analogue scale (VAS). Some of the most widely used multi-attribute utility measures are the EuroQoL (EQ-5D) [1–3], the Health Utilities Index (HUI) [4–6], the Short-Form 6D (SF-6D) [7, 8], the Quality of Well-Being Scale (QWB) [9–11], and the Assessment of Quality of Life (AQoL) [12, 13]. Their ease of administration has contributed to their increasing use as a source of quality weights in economic evaluations and clinical trials.

However, utility results often differ between instruments: many researchers have found significant differences in the global utility scores obtained with different multi-attribute utility instruments [14–24]. The objective of the present study is to compare the SF-6D and the EQ-5D and to investigate the differences in agreement between them. The main goal is to understand the possible reasons for the divergences found and to analyze their implications.

Methods

Study sample

Patients with cataracts waiting for surgery at two hospitals in the Algarve, Portugal, were identified for this study between May and August 2005. Patients were approached during an outpatient visit and invited to participate. Informed consent was obtained from all participants, who completed self-administered questionnaires; those who had difficulty reading the questionnaires or who were illiterate were helped by a nurse. The questionnaires were always administered in the same predefined order: SF-6D, EQ-5D, and Catquest. We used the official Portuguese versions of these instruments. One month after surgery, when patients returned for a fourth follow-up visit, they were asked to complete the same questionnaires, again with a nurse's help if necessary. In this paper we present only the baseline SF-6D and EQ-5D results.

SF-6D

The SF-6D is a new single-index, preference-based summary measure of health derived from 11 items of the SF-36 by a team at the University of Sheffield [8]. These items are converted into a six-dimensional health state classification system, the SF-6D, in which each dimension has four to six levels, giving a total of 18,000 unique health states. The dimensions are physical functioning, role limitations, social functioning, pain, mental health, and vitality. Health states are assigned values derived from SG valuations of a sample of 249 SF-6D health states by a representative sample of the United Kingdom (UK) population [8]. The SF-6D score can be regarded as a continuous value on a 0.30–1.00 scale, where 1.00 indicates full health [8].
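The number of health states follows from multiplying the number of levels per dimension, and summing the levels gives the number of level categories used later in the SCA table. The level counts below (6, 4, 5, 6, 5, 5 for the SF-6D, and 3 for each dimension of the EQ-5D described next) are the standard ones as we understand them; they reproduce the 18,000 states and the 31 SF-6D levels cited in this paper. The dictionary and function names are illustrative only.

```python
from math import prod

# Number of response levels per SF-6D dimension (assumed standard SF-36-based version).
SF6D_LEVELS = {
    "physical functioning": 6,
    "role limitations": 4,
    "social functioning": 5,
    "pain": 6,
    "mental health": 5,
    "vitality": 5,
}

# Number of levels per dimension of the three-level EQ-5D.
EQ5D_LEVELS = {
    "mobility": 3,
    "self-care": 3,
    "usual activities": 3,
    "pain/discomfort": 3,
    "anxiety/depression": 3,
}

def n_states(levels):
    """Number of distinct health states: the product of the level counts."""
    return prod(levels.values())

def n_levels(levels):
    """Total number of dimension levels (rows or columns of a level-by-level table)."""
    return sum(levels.values())

print(n_states(SF6D_LEVELS), n_levels(SF6D_LEVELS))  # 18000 31
print(n_states(EQ5D_LEVELS), n_levels(EQ5D_LEVELS))  # 243 15
```

The two totals, 31 and 15, are the dimensions of the level-by-level table analyzed with SCA in the Results.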

EQ-5D

The EQ-5D was developed by the EuroQoL Group, a multidisciplinary group of researchers, as a standard generic instrument for describing and valuing quality of life that could be used to make cross-national comparisons of health states [25]. It is composed of two parts. The first is a descriptive system consisting of five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression), each with three levels, allowing for a total of 243 health states. These states have been valued by a representative sample of the UK general population using the TTO technique [2, 3], and models were estimated to predict single-index scores for all health states, known as the EQ-5D index. This index can take values below zero, corresponding to states worse than death. The second part is a thermometer-like VAS on which respondents rate their current health, as they perceive it, on a scale from 0 to 100, anchored by the worst and best imaginable health states [1]. The EQ-5D is self-administered and easy to apply, and its brevity has been considered an advantage.
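The EQ-5D index is an additive model. Starting from 1 (full health), a constant is subtracted for any state other than 11111, a decrement is subtracted for each dimension at level 2 or 3, and an extra term (N3) is subtracted if any dimension is at level 3. The sketch below uses the coefficients of the widely used UK TTO tariff as we recall them. They are given for illustration only and should be checked against the official EuroQoL value set before any real use. With these values the worst state, 33333, scores −0.594, which shows how the index can fall below zero.

```python
# Decrements of the UK TTO tariff for the three-level EQ-5D (assumed values,
# for illustration only; verify against the official value set before use).
CONSTANT = 0.081   # subtracted for any state other than 11111
N3 = 0.269         # subtracted if any dimension is at level 3
DECREMENTS = [
    # (level 2, level 3) decrement for each dimension, in EQ-5D order
    (0.069, 0.314),  # mobility
    (0.104, 0.214),  # self-care
    (0.036, 0.094),  # usual activities
    (0.123, 0.386),  # pain/discomfort
    (0.071, 0.236),  # anxiety/depression
]

def eq5d_index(state: str) -> float:
    """Return the EQ-5D index for a five-digit state such as '21132'."""
    levels = [int(ch) for ch in state]
    if len(levels) != 5 or any(lv not in (1, 2, 3) for lv in levels):
        raise ValueError(f"invalid EQ-5D state: {state!r}")
    if all(lv == 1 for lv in levels):
        return 1.0  # full health: no decrements apply
    value = 1.0 - CONSTANT
    for lv, (d2, d3) in zip(levels, DECREMENTS):
        if lv == 2:
            value -= d2
        elif lv == 3:
            value -= d3
    if 3 in levels:
        value -= N3
    return round(value, 3)

print(eq5d_index("11111"))  # 1.0
print(eq5d_index("33333"))  # -0.594
```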

Statistical analysis

Only subjects who fully completed the SF-6D, the EQ-5D, and the VAS were considered; missing responses were neither replaced nor imputed. Frequencies and descriptive statistics were computed to characterize the study sample. Utility measures were compared using descriptive statistics and Spearman correlation coefficients. Both parametric tests [t-tests and analysis of variance (ANOVA)] and nonparametric tests (Kruskal–Wallis tests) were used to look for significant differences in utilities among sociodemographic groups; differences were considered statistically significant if P-values were less than 0.10. Nonparametric tests were used because of the heterogeneity of variances observed in some cases and the non-normality of some dimensions. Simple correspondence analysis (SCA) was used to assess the agreement between the instruments’ descriptive systems and to look for similarities between the dimensions’ levels. Cluster analysis was used to classify SF-6D and EQ-5D levels into homogeneous groups. Analyses were performed with SPSS version 13 and SAS version 8.0.
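Because the Kruskal–Wallis test works on ranks rather than on raw utilities, it requires neither normality nor equal variances, which is the rationale given above. The following is a minimal pure-Python sketch of the tie-corrected H statistic; the helper names are our own. In practice a library routine such as `scipy.stats.kruskal`, which also returns the P-value from the chi-squared approximation with k − 1 degrees of freedom, would normally be used.

```python
from itertools import chain

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend over the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic and its degrees of freedom.

    Under the null hypothesis, H is approximately chi-squared with
    k - 1 degrees of freedom (k = number of groups).
    """
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n),
    # where t is the size of each set of tied values.
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in tie_sizes.values())
    h /= 1 - ties / (n ** 3 - n)        # undefined if every value is tied
    return h, len(groups) - 1

h, df = kruskal_wallis_h([1, 2, 3], [4, 5, 6])
print(round(h, 3), df)  # 3.857 1
```

Each group would hold the utility scores of one sociodemographic category (for example, one employment status).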

Agreement among utility measures

Correspondence analysis is a descriptive and exploratory technique for analyzing two-way and multiway tables containing some measure of correspondence between rows and columns. Similar in nature to factor analysis, it provides information about the structure of the categorical variables in the table by deriving coordinates for the categories of the variables involved; these coordinates can then be plotted as a map or scatterplot that allows visual examination of any pattern or structure in the data [26]. The scatterplot reflects the distances between the percentage profiles of the categories, so categories that are directly associated have close coordinates and are plotted near each other.
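Numerically, simple correspondence analysis can be computed from the singular value decomposition of the matrix of standardized residuals of a contingency table. The following is a minimal NumPy sketch of that textbook formulation. It is not the SPSS or SAS procedure used in the study, and the cross-tabulation is invented. The study's 31 × 15 hypercontingency table (assumed here to be the stacked cross-tabulations of SF-6D levels by EQ-5D levels) could be passed in the same way.

```python
import numpy as np

def correspondence_analysis(table):
    """Simple correspondence analysis (CA) of a two-way contingency table.

    Returns principal row and column coordinates, the share of total inertia
    explained by each axis, and each row/column category's contribution to
    each axis (the contributions on one axis sum to 1).
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                              # correspondence matrix
    r = P.sum(axis=1)                            # row masses
    c = P.sum(axis=0)                            # column masses
    E = np.outer(r, c)                           # expected proportions under independence
    S = (P - E) / np.sqrt(E)                     # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = min(N.shape) - 1                         # drop the trivial (zero) axis
    U, sv, V = U[:, :k], sv[:k], Vt[:k].T
    inertia = sv ** 2                            # principal inertias (eigenvalues)
    share = inertia / inertia.sum()              # proportion of total inertia per axis
    rows = U * sv / np.sqrt(r)[:, None]          # principal row coordinates
    cols = V * sv / np.sqrt(c)[:, None]          # principal column coordinates
    row_ctr = r[:, None] * rows ** 2 / inertia   # mass * coordinate^2 / inertia
    col_ctr = c[:, None] * cols ** 2 / inertia
    return rows, cols, share, row_ctr, col_ctr

# Hypothetical 3 x 3 cross-tabulation (rows: levels of one instrument,
# columns: levels of the other); the counts are invented for illustration.
table = [[30, 10, 5],
         [10, 25, 10],
         [5, 10, 30]]
rows, cols, share, row_ctr, col_ctr = correspondence_analysis(table)
print("inertia explained per axis:", np.round(share, 3))
# Categories contributing more than the average (1 / number of categories)
# to the first axis, analogous to the "above the mean" rule used for Fig. 2.
print("strong rows on axis 1:", np.where(row_ctr[:, 0] > 1 / len(rows))[0])
print("strong cols on axis 1:", np.where(col_ctr[:, 0] > 1 / len(cols))[0])
```

Plotting `rows` and `cols` on their first two columns gives a map of the kind shown in Fig. 2, and the total inertia equals the table's chi-squared statistic divided by the sample size.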

We used SCA with the purpose of assessing the agreement among the instruments’ descriptive systems and of investigating similarities between dimensions’ levels.

Clustering SF-6D and EQ-5D levels

To classify SF-6D and EQ-5D levels into homogeneous groups, we also carried out a hierarchical agglomerative cluster analysis and a partitional k-means clustering. In the first case, we applied hierarchical cluster analysis to the six dimensions identified through the SCA, using the Ward, furthest neighbor, and within-groups agglomeration methods, with the squared Euclidean distance as the distance measure. In the k-means cluster analysis, used to obtain a better classification, each point was assigned to a centroid using the furthest neighbor method. Using all these methods enabled us to validate the cluster analysis. The number of clusters was chosen on the basis of the fusion coefficients, the cut-off of the dendrogram, the elbow criterion, and R² measures.
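The partitional step and the R² criterion for choosing the number of clusters can be sketched in NumPy. Several assumptions apply. Levels are represented by their coordinates on the SCA axes. The algorithm is the standard Lloyd k-means, which assigns each point to its nearest centroid. The farthest-first seeding is our own choice, made for determinism. The sketch does not claim to reproduce the SPSS or SAS configuration or the "furthest neighbor" rule described above. Ward's hierarchical step is not shown; `scipy.cluster.hierarchy.linkage(X, method="ward")` is one library option for it.

```python
import numpy as np

def farthest_first(X, k):
    """Initial centroids: start from X[0], then repeatedly add the point whose
    distance to its nearest chosen centroid is largest (maximin seeding)."""
    chosen = [0]
    d2 = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d2.argmax())
        chosen.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[chosen].copy()

def nearest(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each point."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    X = np.asarray(X, dtype=float)
    centroids = farthest_first(X, k)
    for _ in range(n_iter):
        labels = nearest(X, centroids)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return nearest(X, centroids), centroids

def r_squared(X, labels):
    """Share of the total sum of squares explained by a partition: 1 - W / T."""
    X = np.asarray(X, dtype=float)
    total = ((X - X.mean(axis=0)) ** 2).sum()
    within = sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
                 for j in np.unique(labels))
    return 1.0 - within / total

# Invented 2-D coordinates for nine levels (e.g., positions on two SCA axes).
pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10],
                [20, 0], [21, 0], [20, 1]], dtype=float)
# Elbow criterion: R^2 rises steeply up to a suitable k and then levels off.
for k in range(1, 5):
    labels, _ = kmeans(pts, k)
    print(k, round(r_squared(pts, labels), 3))
```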

Results

Sample

Of the 360 participants who received the baseline questionnaire, 300 completed the VAS of the EQ-5D, and global utility scores could be generated for 352 using the SF-6D and EQ-5D scoring functions.

Most respondents were women (56.5%), married or living with someone else (60.5%), and had a low educational level (79.5%). Their ages ranged from 49 to 92 years, with a mean of 73 years (standard deviation, SD = 8.7 years). They were most frequently retired or manual workers and lived in urban areas (Table 1). Although income was generally low (71.0% earned less than €500), almost all respondents lived in their own houses (86.4%).

Table 1 Demographic characteristics of patients

Comparison between utility measures

The distributions of responses to the SF-6D and EQ-5D dimensions are shown in Table 2. In the SF-6D, the attributes with 15% or more of patients at the two lowest levels are role limitations, social functioning, pain, and mental health. There may be some potential for a floor effect in this measure, particularly for role limitations, because many patients placed themselves at the lowest level of this dimension, compared with their own responses to usual activities on the EQ-5D. Conversely, there is evidence of a ceiling effect in the EQ-5D: level 3 is very rarely used in three of its five dimensions. This suggests that an extreme problem on one EQ-5D dimension is much worse than the worst level of any SF-6D dimension [14].
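The tabulation behind Table 2 and the 15% rule can be sketched as follows. The helper names are hypothetical and the responses are invented. Levels are numbered so that level 1 means no problems and the highest level the most severe problems, as in the level labels used later (e.g., 6D11, 5D13).

```python
from collections import Counter

def level_distribution(responses, n_levels):
    """Percentage of responses at each level 1..n_levels (level 1 = no problems)."""
    counts = Counter(responses)
    return {lv: 100 * counts[lv] / len(responses) for lv in range(1, n_levels + 1)}

def share_in_worst_levels(responses, n_levels, k=2):
    """Percentage of responses in the k worst (highest-numbered) levels."""
    dist = level_distribution(responses, n_levels)
    return sum(dist[lv] for lv in range(n_levels - k + 1, n_levels + 1))

# Invented responses from ten patients, for illustration only.
role_limitations = [1, 1, 2, 4, 4, 3, 1, 4, 2, 4]   # SF-6D dimension, 4 levels
usual_activities = [1, 1, 2, 2, 1, 3, 1, 2, 1, 1]   # EQ-5D dimension, 3 levels

print(share_in_worst_levels(role_limitations, 4) >= 15)  # meets the 15% threshold?
print(level_distribution(usual_activities, 3)[3])        # percentage at EQ-5D level 3
```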

Table 2 Distributions of responses to SF-6D and EQ-5D dimensions (percentages)

Table 3 presents the Spearman correlation coefficients between the SF-6D and EQ-5D dimensions. As expected, all the similar dimensions had direct and high correlations (greater than 0.45): physical functioning and mobility, physical functioning and usual activities, role limitations and usual activities, social functioning and mobility, social functioning and usual activities, pain and pain/discomfort, and mental health and anxiety/depression. There were also high correlations between mental health and pain/discomfort, vitality and mobility, vitality and pain/discomfort, and vitality and anxiety/depression. By contrast, role limitations and self-care, physical functioning and self-care, and vitality and self-care had the lowest correlations (<0.30).
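A Spearman coefficient is the Pearson correlation of the two variables' ranks. With ordinal levels such as these, ties are frequent, so average ranks matter. The following pure-Python sketch uses invented responses and helper names of our own; a library routine such as `scipy.stats.spearmanr` would normally be used instead.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend over the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks.

    Undefined (division by zero) if either variable is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Invented responses of eight patients: SF-6D pain level (1-6) and
# EQ-5D pain/discomfort level (1-3), for illustration only.
sf6d_pain = [1, 2, 2, 3, 4, 4, 5, 6]
eq5d_pain = [1, 1, 2, 2, 2, 3, 3, 3]
print(round(spearman_rho(sf6d_pain, eq5d_pain), 3))
```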

Table 3 Spearman correlation coefficients between SF-6D and EQ-5D dimensions*

Utility scores for the three measures are reported in Table 4 and Fig. 1. In terms of the mean, the SF-6D and EQ-5D provide very similar estimates, whereas VAS values are systematically lower than both. Twenty-five percent of the sample had a health state utility of at least 0.85 on the EQ-5D, at least 0.77 on the SF-6D, and at least 0.69 on the VAS.
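"Twenty-five percent of the sample had at least x" is a statement about the third quartile. The sketch below computes it with the standard-library `statistics.quantiles`. It assumes that VAS ratings were divided by 100 so that they sit on the same 0–1 scale as the utilities; the 0.69 figure suggests this, but the text does not say so. The ratings are invented. Statistical packages differ in their quantile definitions, so small discrepancies with SPSS or SAS output are possible.

```python
from statistics import quantiles

def upper_quartile(scores):
    """Third quartile (inclusive method): about 25% of scores lie at or above it."""
    return quantiles(scores, n=4, method="inclusive")[2]

# Invented VAS ratings (0-100), rescaled to 0-1 to match the utility scale.
vas = [40, 55, 60, 65, 70, 75, 80, 90]
vas_unit = [v / 100 for v in vas]
print(upper_quartile(vas_unit))
```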

Table 4 Descriptive statistics of SF-6D, EQ-5D, and VAS utility scores
Fig. 1 Distribution of SF-6D, EQ-5D, and VAS utility scores

The range and variability of EQ-5D and VAS scores were greater than those of SF-6D scores, and the EQ-5D produced some negative values, denoting health states worse than death.

We also tested the sensitivity of the health utility measures and the VAS to major patient characteristics. All scores were statistically significantly lower in women (Table 5). As expected, patients aged less than 61 years reported slightly higher utility scores (except for the VAS), and these differences were significant in all age groups. Unlike the SF-6D, the EQ-5D seems to capture the general observation that older people tend to rate their health slightly higher than individuals in the immediately younger age group. Mean utility scores were statistically significantly lower in the lower educational level than in the other educational levels. Statistically significant differences were found between people living in urban and rural areas; for the SF-6D, people living in rural areas reported lower utility scores. People who were married or living with someone else reported higher health utilities than single, widowed, and divorced or separated people, and these differences were significant (except for the SF-6D).

Table 5 Relationship between patients’ characteristics and utility measures

Nonparametric tests showed that health utility values were significantly related to employment: retired people and housewives reported lower utility values than employed and unemployed people. We also found statistically significant differences between income levels: those who earned €2,000 or more had higher utility values than the others.

Agreement among utility measures

SCA was applied to a hypercontingency table formed by the six dimensions of the SF-6D and the five dimensions of the EQ-5D, a 31 × 15 matrix (31 SF-6D levels by 15 EQ-5D levels). When coordinates are derived to represent the levels of the instruments’ dimensions, the first three axes explain 91% of the total variance: 72.5% for the first axis, 13.4% for the second, and 5.2% for the third. Figure 2 shows the levels of both instruments that make strong contributions (above the mean) to the formation of the axes.

Fig. 2 SCA scatterplot on the first and second factorial axes (diamonds for SF-6D levels, filled triangles for EQ-5D levels). Levels are labeled 6D or 5D according to the instrument, followed by two indices: the first denotes the dimension and the second the level (for instance, 6D12 represents the second level of dimension 1 of the SF-6D).

Table 6 summarizes the strong associations between pairs of levels, identified through each cell's contribution to the chi-squared statistic (dimensions whose levels are highly correlated are marked in grey). The table also shows the direction of each association: direct associations are marked with (+) and inverse associations with (−).
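Identifying strong associations in this way rests on splitting the Pearson chi-squared statistic into cell terms (O − E)²/E, where the sign of O − E gives the direction of the association. The following sketch uses an invented table and helper names of our own. The "above-average share" cutoff is our assumption, chosen by analogy with the above-mean rule used for Fig. 2; the text does not state the cutoff used for Table 6.

```python
def chi2_cell_contributions(table):
    """Each cell's share of the Pearson chi-squared statistic, with the direction
    of the association: '+' if observed > expected, '-' if observed < expected.
    Undefined (division by zero) if the table shows perfect independence."""
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    cells, chi2 = [], 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n      # expected count under independence
            term = (obs - exp) ** 2 / exp
            chi2 += term
            sign = "+" if obs > exp else "-" if obs < exp else "0"
            cells.append(((i, j), term, sign))
    return [(ij, term / chi2, sign) for ij, term, sign in cells], chi2

def strong_cells(table):
    """Cells whose share exceeds the average share, 1 / (number of cells)."""
    shares, _ = chi2_cell_contributions(table)
    return [(ij, round(s, 3), sign) for ij, s, sign in shares if s > 1 / len(shares)]

# Invented cross-tabulation of one SF-6D dimension (rows) by one EQ-5D dimension (columns).
table = [[40, 8, 2],
         [15, 20, 5],
         [3, 12, 15]]
for cell in strong_cells(table):
    print(cell)
```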

Table 6 Strong associations between the levels of the highly correlated dimensions

Figure 2 and Table 6 show a strong inverse association between the levels “Your health does not limit you in vigorous activities” (6D11) and “Some problems walking about” (5D12), meaning that a person with no limitations in vigorous activities on the SF-6D will probably not report some problems in walking on the EQ-5D. Conversely, a person with little limitation in bathing and dressing (6D15) will probably report some problems in walking (5D12). Along the same lines, someone with no limitations in vigorous activities (6D11) will almost certainly report no problems with performing usual activities related to work, study, housework, family, or leisure (5D31), and will not report some problems with performing usual activities (5D32). Likewise, someone with little limitation in bathing and dressing (6D15) will tend to answer that he/she is unable to perform usual activities (5D33).

An individual without problems with work or other regular daily activities as a result of physical health or emotional problems (6D21) will tend to say that he/she has no problems with performing usual activities (5D31) and will not report some problems with performing usual activities (5D32). An individual who is limited in the kind of work or other activities as a result of physical health and who accomplishes less than he/she would like as a result of emotional problems (6D24) will not report having no problems with performing usual activities (5D31), but will most probably say that he/she has some problems with performing usual activities (5D32) or is unable to perform usual activities (5D33).

Someone whose social activities are not limited (6D31) will not report some problems walking about (5D12). Similarly, a person whose social activities are limited most of the time (6D34) will not report no problems walking about (5D11), but will report some problems walking about (5D12). A person whose social activities are not limited (6D31) will probably report no problems with performing usual activities (5D31) and, naturally, will not report some problems with performing usual activities (5D32). Anyone whose health limits his/her social activities most of the time (6D34) will tend to report either some problems with performing usual activities (5D32) or being unable to perform usual activities (5D33), and will almost certainly not report having no problems with performing usual activities (5D31).

In terms of pain, someone who has no pain (6D41) will report no pain or discomfort (5D41) or moderate pain or discomfort (5D42). People reporting pain that moderately interferes with their normal work (6D44) will report moderate pain or discomfort (5D42) and, as expected, will not report no pain or discomfort on the EQ-5D (5D41). Finally, people reporting pain that interferes quite a bit with their normal work (6D45) will report extreme pain or discomfort (5D43).

Regarding mental health, feeling tense or downhearted and low none of the time (6D51) is directly associated with having no pain or discomfort (5D41) and inversely associated with moderate pain or discomfort (5D42). Anyone who feels tense or downhearted and low most of the time (6D54) will report extreme pain or discomfort (5D43). Feeling tense or downhearted and low none of the time (6D51) is also associated with not being anxious or depressed (5D51) and, naturally, inversely associated with being moderately anxious or depressed (5D52). Neither someone who feels tense or downhearted and low some of the time (6D53) nor someone who feels this way most of the time (6D54) will report not being anxious or depressed (5D51). In fact, a person in the latter situation (6D54) will report being extremely anxious or depressed (5D53).

Another pattern can be found for vitality: an individual who has a lot of energy all of the time (6D61) or a little of the time (6D64) will not report problems walking about (5D11). Conversely, anyone who has a lot of energy none of the time (6D65) will report some problems walking about (5D12) and will not say that he/she has no problems walking about (5D11). Someone with a lot of energy all of the time (6D61) reports no pain or discomfort (5D41) and does not report moderate pain or discomfort (5D42). Having a lot of energy a little of the time (6D64) is reported by those who experience extreme pain or discomfort (5D43) and by those who are extremely anxious or depressed (5D53). Finally, a person who has a lot of energy all of the time (6D61) will not report being moderately (5D52) or extremely anxious or depressed (5D53).

A more detailed analysis of the third main axis reveals further associations, also shown in Table 6. There is a strong positive association between levels 6D32 and 5D31, meaning that a person whose health limits his/her social activities a little of the time (6D32) will probably report no problems in performing usual activities (5D31).

Feeling tense or downhearted and low a little of the time (6D52) is associated with moderate pain or discomfort (5D42) and with being moderately anxious or depressed (5D52). These individuals will not report having no pain or discomfort (5D41), nor will they report being extremely anxious or depressed (5D53). Someone who has a lot of energy most of the time (6D62) will report neither having no pain or discomfort (5D41) nor being extremely anxious or depressed (5D53); rather, such a person will report moderate pain or discomfort (5D42) and being moderately anxious or depressed (5D52).

Clustering SF-6D and EQ-5D levels

The Ward method, as well as the furthest neighbor and within-groups methods, pointed to five clusters of homogeneous levels. The k-means solution (Table 7) was similar but more consistent.

Table 7 Clusters of SF-6D and EQ-5D levels

Levels belonging to the same group are homogeneous, since they are associated with each other. The first group is mainly formed by levels denoting no problems in physical or mental health, and levels referring to some problems belong to the second group. The third cluster is mainly formed by levels related to a lot of problems in physical or mental health, while the fourth cluster includes the levels that define extreme health problems. Finally, the fifth cluster gathers the levels that define very extreme health problems.

Discussion

In the HRQL literature, there is general concern about differences in results between instruments, and several studies comparing different instruments have been published [10, 12, 14–24, 27, 29, 30]. This study compares the SF-6D and the EQ-5D and investigates the differences in agreement between them. It also attempts to understand the possible reasons for the divergences found and to explore their implications. Our purpose was not only to compare the instruments but also to apply different methodologies to understand the pattern of an individual's answers to the SF-6D and EQ-5D, i.e., how he/she would respond to a certain dimension of the EQ-5D, given a particular answer to a certain dimension of the SF-6D. Although the SF-6D and EQ-5D provide very similar estimates at the mean, the range and variability of the EQ-5D were greater than those of the SF-6D. The results showed evidence of a potential floor effect in the SF-6D and of a ceiling effect in the EQ-5D.

As expected, Spearman correlation coefficients revealed direct and high correlations between all the similar dimensions (physical functioning and mobility, physical functioning and usual activities, role limitations and usual activities, social functioning and mobility, social functioning and usual activities, pain and pain/discomfort, and mental health and anxiety/depression). There were also high correlations between mental health and pain/discomfort, vitality and mobility, vitality and pain/discomfort, and vitality and anxiety/depression, whereas correlations were low between role limitations and self-care, physical functioning and self-care, and vitality and self-care.

Nonparametric tests showed that health utility values were significantly related to sex, age, marital status, educational level, employment status, residence, and income. Lower utility levels were reported by women than by men; by patients aged 60 years or more than by those aged less than 60 years; by single, widowed, and divorced or separated people than by those who were married or living with someone else; by patients with low educational levels than by those with high educational levels; by retired people and housewives than by employed and unemployed individuals; by people living in rural areas than by those living in urban areas; and by those who earned less than €2,000 than by those who earned €2,000 or more.

SCA was used to assess the agreement between the instruments’ descriptive systems and to investigate similarities between the levels of their dimensions. This enabled us to identify the levels most strongly associated with each other and therefore to describe patterns in individuals’ answers. For instance, an individual who accomplishes less than he/she would like as a result of emotional problems on the SF-6D will not answer that he/she has no problems with performing usual activities on the EQ-5D, but will most probably report some problems with performing usual activities or being unable to perform them.

Cluster analysis was used to classify SF-6D and EQ-5D levels into homogeneous groups. The first group is mainly formed by levels denoting no problems in physical or mental health and the second by levels referring to some problems. Levels related to a lot of problems in physical or mental health belong to the third cluster, while the fourth cluster contains the levels that define extreme health problems. Finally, the fifth cluster gathers the levels that define very extreme health problems.

It should, however, be noted that the level “Your health limits you a lot in bathing and dressing” (6D16) does not appear in the fifth group, where it would be expected. This may be explained by incorrect answers to the SF-6D [the four individuals who answered “Confined to bed” (5D13) on the EQ-5D chose the level above in the SF-6D] or by the possibility that the last levels of mobility and self-care in the EQ-5D and of physical functioning in the SF-6D do not measure the same concepts. Does this mean that the SF-6D is unable to identify very extreme problems, or is the explanation simply that individuals misunderstood the question?

Both instruments showed consistency, namely in the agreement found in some dimensions and in some levels of each dimension. However, they seem to measure different concepts, at least to some extent: some levels of both instruments agreed, while others, contrary to expectations, disagreed. These findings generally support the results of Tsuchiya et al. [20] and Brazier et al. [14] regarding the major difference between the two instruments: differences in the descriptive systems account for at least part of the major differences in the range of the two instruments. Indeed, using cluster analysis we found some levels from both instruments that were supposed to measure the same concepts but that individuals answered differently. This means that apparently similar levels are in fact different and contribute to the differences found between the indices computed from the two descriptive systems. These differences between the instruments’ descriptive systems should be investigated further.

Conclusion

The aim of this paper was to compare EQ-5D and SF-6D and to investigate the differences in agreement between them. It was also our purpose to understand the possible reasons for divergences found and to explore their implications. To our knowledge, no study has yet compared EQ-5D and SF-6D in terms of their descriptive systems, analyzing the extent of agreement or disagreement of their dimensions’ levels.

A major strength of the research reported here is the novelty of the way the comparison was addressed and of the methodology used. Over the past years, other studies have compared the SF-6D and EQ-5D [14–24, 27, 30]. Whereas most of them present comparisons of dimensions, none compared the levels of those dimensions. Moreover, none reported the probable pattern of a particular individual's answers to a given level of an EQ-5D dimension, given a particular answer to a given level of an SF-6D dimension. Because the methods adopted in this paper have not been widely used to compare preference-based instruments, this originality comes with a drawback. Although the methods used are robust and applicable to this problem, the large number of levels in both instruments (31 in the SF-6D and 15 in the EQ-5D) leads to many possible relations between them, generating scatterplots that are not easy to analyze. Even though we analyzed only the strong associations between the instruments’ levels, identifying the patterns in individuals’ answers was still a real challenge. Nevertheless, we believe this research identified some areas of agreement, as well as disagreement, between the two instruments, and we hope it helps shed some light on the comparability of instruments, a topic of current interest in the HRQL literature. Another limitation of the research is that data were collected from a relatively small and unusual sample of respondents. Although the type of patients could condition the results, since this is an elderly population, this could also be seen as a strength of the study, because studies comparing the EQ-5D and SF-6D using data from older, visually impaired patients are uncommon.
Nevertheless, to address this issue we intend to apply this methodology to a larger sample from the general population. Patients suffering from other diseases should also be studied in future analyses to confirm the consistency of the findings reported here.

This study provided evidence that both instruments are consistent, although they seem to measure different concepts, at least to some extent. These findings are particularly important because these instruments are commonly used in HRQL studies and economic evaluations. Furthermore, they reinforce the importance of research on mapping between instruments [27, 31–33] and the need for more investigation in this field. Further research is needed to overcome the differences between the EQ-5D and SF-6D: revisions of one or both descriptive systems, or of their scoring algorithms, are necessary before the two instruments can be used interchangeably. Brazier et al. [14] suggest adding more intermediate levels to the EQ-5D or adding lower levels to the SF-6D dimensions, at least for physical functioning and role limitations. Our current research is centered on the latter suggestion, adding lower levels to two of the SF-6D dimensions, in order to correct its floor effect and to make the extreme levels of both instruments more similar.

Further studies should compare the performance of the SF-6D with that of other preference-based measures, such as the HUI, and compare the utility scores provided by the SF-6D, EQ-5D, and HUI with those obtained by elicitation techniques such as SG or TTO. There is already some literature on these matters and on mapping between instruments [14–21, 27–33], although it does not specifically compare SF-6D utility scores with utilities generated by SG or TTO.