Definition

Economic evaluation involving cost-effectiveness analysis (CEA) is increasingly being used to inform resource allocation in health care. Using a cost per quality-adjusted life year (QALY) analysis, CEA enables comparisons across a wide range of diseases and treatments on a common measurement scale. The “quality” part of the QALY is estimated by preference-based measures of health.

Several national guidelines and recommendations for the economic evaluation of drugs and other health technologies advise, for comparison purposes, choosing outcome parameters that are validated and used in the literature. Such outcome parameters include preference-based measures such as the EQ-5D, the SF-6D, the HUI, the QWB scale, or the AQoL. However, such preference-based measures are language- and culture-dependent. Therefore, there is an increasing need not only to perform language validations in different countries but also to derive country-specific value sets for preference-based measures. In recent years, valuation surveys have been performed in general population samples to derive preference weights for the most widely used preference-based measures.

Some methodological issues arise from the use of preference-based measures, such as the existence of differences in stated preferences across countries and the use of nontraditional elicitation techniques. These issues are the subject of ongoing research, and several papers have been published on them.

Description

Economic evaluation is increasingly used throughout the world to inform resource allocation in health care. It involves cost-effectiveness analysis (CEA) and cost-utility analysis (CUA), in which measures of health-related quality of life (HRQoL) have become increasingly important for evaluating the outcomes of treatments or health care programs (Drummond, Sculpher, Torrance, O’Brien, & Stoddart, 2005). These measures, often called preference-based measures of health, generate a single preference-based index that is usually used to estimate quality-adjusted life years (QALYs), a common measurement used to compare outcomes across health care programs. In fact, the preferred outcome measure for conducting a CUA is often the QALY, which is calculated by multiplying the number of life-years gained from an intervention by a standard weight that reflects the HRQoL during that time. This weight is obtained from preference-based measures of health such as the EuroQol (EQ-5D), the Short Form 6 Health Survey Instrument (SF-6D), the Health Utilities Index (HUI), the Quality of Well-Being (QWB) scale, or the Assessment of Quality of Life (AQoL). In order to produce a value associated with each health state, these measures combine a self-reported health state descriptive system with a set of preference weights (value set) to be applied to the self-reported data. The value sets are usually elicited from the general population in large survey studies using a choice-based method such as time trade-off (TTO) or standard gamble (SG) (Brazier, Ratcliffe, Salomon, & Tsuchiya, 2007).
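The QALY calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name qalys and all durations and weights are hypothetical, and discounting of future years is ignored.

```python
def qalys(periods):
    """Sum duration (in years) times utility weight over consecutive periods.

    `periods` is a list of (years, weight) pairs. In practice the weights
    would come from a preference-based measure and its value set; every
    number used below is hypothetical.
    """
    return sum(years * weight for years, weight in periods)


# Hypothetical profile with treatment: 2 years at 0.5, then 3 years at 0.8.
with_treatment = qalys([(2, 0.5), (3, 0.8)])

# Hypothetical profile without treatment: 4 years at 0.5.
without_treatment = qalys([(4, 0.5)])

# QALYs gained from the intervention under these assumptions.
qalys_gained = with_treatment - without_treatment
print(round(with_treatment, 2), round(without_treatment, 2), round(qalys_gained, 2))
```

Because the weight enters the calculation multiplicatively, any change in the value set used to score the same self-reported health states changes the QALY estimate directly.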

The EQ-5D is a self-reported generic preference-based measure of health. It comprises two components: a health descriptive system and a visual analogue scale (VAS). The health descriptive system comprises five dimensions, and each dimension can be rated at three levels (EQ-5D-3L), thus generating 243 different health states. The EQ-5D VAS is a vertical thermometer ranging from 0 (worst imaginable health state) to 100 (best imaginable health state) on which respondents assign a value to their health state at that moment in time. To derive the UK value set, a sample of 3,395 individuals, representative of the UK population, valued 42 health states through TTO. Econometric models were estimated to calculate unique utility scores for all health states defined by the EQ-5D. These values, which constitute the EQ-5D index, vary between −0.59 and 1.00 (see Brazier et al., 2007 and Szende, Oppe, & Devlin, 2007 for a more detailed overview of the instrument). Recently, a new version of the EQ-5D has been developed. This new version has been designated EQ-5D-5L since it has five levels in each of the five dimensions, generating 3,125 different health states (more information about the EQ-5D-5L can be found on the EuroQol group’s webpage: http://www.euroqol.org/).
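As a rough sketch of how a descriptive system and a value set combine, the Python snippet below enumerates the EQ-5D-3L health states and scores them with a simplified additive model. The dimension names are those of the instrument (they are not listed in this entry), and the decrements in HYPOTHETICAL_DECREMENTS are invented for illustration; they are not the published UK weights, and the published econometric models contain additional terms.

```python
from itertools import product

# EQ-5D dimension names (taken from the instrument, not from this entry).
DIMENSIONS = ["mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression"]


def count_states(levels_per_dimension, n_dimensions=len(DIMENSIONS)):
    """Number of health states when every dimension has the same number of levels."""
    return levels_per_dimension ** n_dimensions


# Every EQ-5D-3L state as a tuple of levels, e.g. (1, 1, 2, 3, 1).
states_3l = list(product(range(1, 4), repeat=len(DIMENSIONS)))

# HYPOTHETICAL decrements per dimension and level (level 1 = no problems).
HYPOTHETICAL_DECREMENTS = {dim: {1: 0.0, 2: 0.1, 3: 0.3} for dim in DIMENSIONS}


def index_value(state, decrements=HYPOTHETICAL_DECREMENTS):
    """Simplified additive model: full health (1.0) minus one decrement per dimension."""
    return 1.0 - sum(decrements[dim][level] for dim, level in zip(DIMENSIONS, state))


print(len(states_3l))      # number of EQ-5D-3L states
print(count_states(5))     # number of EQ-5D-5L states
print(round(index_value((1, 1, 2, 3, 1)), 2))
```

Under these made-up decrements the worst state scores below zero, which mirrors the fact that the index can take negative values for states valued as worse than dead.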

The SF-6D is another generic preference-based measure of health. It is derived from 11 items of the Short Form 36 Health Survey Instrument (SF-36) and comprises six dimensions of health, each one with four to six levels (see Brazier et al., 2002 for a more detailed description of the instrument). The SF-6D thus describes a total of 18,000 different health states. Of these, 249 health states were valued by a representative sample of the general UK population using the SG. Econometric models were estimated to predict single utility scores for all health states defined by the SF-6D. These health state values constitute the SF-6D index, which can be seen as a continuous value on a scale from 0.35 to 1.00. Another version of the SF-6D was developed based on the Short Form 12 Health Survey Instrument (SF-12).

The HUI currently has three different versions (HUI mark 1, 2, and 3), the latest version (HUI3) being the most widely used of the three. HUI2 has seven dimensions and defines 24,000 health states, and HUI3 has evolved from it. Changes involving both the dimensions and the number of levels were made to the descriptive system of HUI2 to reduce the degree of structural dependence and increase sensitivity. The new classification system of the HUI3 defines 972,000 health states. The valuation surveys for HUI2 and HUI3 were conducted in Canada. The valuation tasks included valuing states using VAS alone, and using VAS and SG together so that VAS values could be transformed into SG values. Published valuation functions for HUI2 and HUI3 were calculated using multi-attribute utility theory. Brazier et al. (2007) present a detailed overview of the two versions of the instrument.
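The number of health states a classification system defines is the product of the number of levels across its dimensions. The sketch below reproduces the totals quoted for the SF-6D, HUI2, and HUI3. The per-dimension level counts are not given in this entry; they are assumptions taken from the instruments' published classification systems and should be checked against Brazier et al. (2002, 2007).

```python
from math import prod

# ASSUMED number of levels per dimension for each instrument
# (not stated in this entry; order within each list is not meaningful).
LEVELS = {
    "SF-6D": [6, 4, 5, 6, 5, 5],        # six dimensions, four to six levels each
    "HUI2": [4, 5, 5, 4, 4, 5, 3],      # seven dimensions
    "HUI3": [6, 6, 5, 6, 6, 5, 6, 5],   # eight dimensions
}


def n_states(levels):
    """Total number of health states: the product of the level counts."""
    return prod(levels)


for name, levels in LEVELS.items():
    print(name, n_states(levels))
```

The size of these state spaces is the reason only a subset of states is valued directly and the remaining values are predicted by a model.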

The QWB scale is the oldest preference-based measure of health, and its basic structure and scoring weights have remained largely unchanged over the past decades. Its health descriptive system contains two components: the first is three multilevel dimensions relating to function that produce 46 functional levels; the second is a list of 27 symptom and problem complexes. This structure forms 945 health states. A subset of these health states was valued using a VAS by a sample of North American individuals. An overall health state score was calculated by a simple additive formula. Further details may be found in Brazier et al. (2007).

The AQoL currently has two versions. The AQoL1 has five dimensions, each one with a different number of items. The items have four levels each. The advantage of this instrument over the others is that it uses a number of different items within a dimension. The AQoL2 comprises six dimensions and 20 items, with more than four levels. A stratified sample of 363 Australians was used in a two-stage valuation procedure to generate the utility weights of the AQoL. In this procedure, item levels were valued using VAS values transformed into TTO values, which were then used to generate values for the corner states and multidimensional states (see Brazier et al., 2007 for more details).

Currently, various national guidelines and recommendations for the economic evaluation of drugs and other health technologies indicate that it is advisable, for comparison purposes, to choose outcome parameters that are validated and used in the literature. Among them, we may cite the following: National Institute for Health and Clinical Excellence (NICE) (UK), Haute Autorité de Santé (HAS) (France), Institut für Pharmaökonomische Forschung (IPF) (Austria), Health Information and Quality Authority (Ireland), Canadian Agency for Drugs and Technologies in Health (Canada), College voor zorgverzekeringen (CVZ) (The Netherlands), Ministry of Social Affairs and Health (Finland), Instituto Nacional da Farmácia e do Medicamento (INFARMED) (Portugal), and Statens legemiddelverk (Norway). Such outcome parameters include preference-based measures such as the EQ-5D, the SF-6D, the HUI, the QWB scale, or the AQoL. Although the EQ-5D has been widely considered the most appropriate choice of instrument, most of these agencies feel it is inappropriate to require the use of the EQ-5D to the exclusion of any other methods meeting its underlying criteria (the exception is NICE, which currently declares a preference for one of these preference-based measures of health as the generic instrument to be used in the measurement of HRQoL in adults). These instruments use preferences from the “informed” general public, which is the appropriate source to use for collective resource allocation purposes. Consequently, in CUA, the choice of the preference-based measure depends on its validation for the country population and on the availability of population preference values, elicited using techniques such as SG or TTO.

In addition, the ongoing discussion in the literature about the possibility of having international guidelines for economic evaluation has recently outlined the use of the QALY or a similar measure as the main economic outcome and the use of a preference-based (generic) measure as the source of health values/utilities (Drummond & Rutten, 2008).

However, some methodological issues follow the development of these preference-based measures, such as the need to have country-specific value sets and the use of nontraditional elicitation techniques. Early work using preference-based measures tended to use value sets from the country where the measure was first developed (e.g., the UK for the EQ-5D and SF-6D and Canada for the HUI) because national valuation surveys are expensive and involve complex and time-consuming tasks. Nowadays, there is a growing interest in deriving country-specific value sets for the most widely used preference-based measures.

The absence of a country-specific value set for the most widely used HRQoL instruments is therefore a major problem for the usefulness of CUA in health care policy, and one faced by several countries. From a health care policy point of view, the availability of a value set representing the preferences of the general population of the country would be a major strength. Moreover, there is evidence that stated preferences may differ across countries, and there is an increasing interest in studying cross-country variations in health state values. On the one hand, there is evidence of cultural differences in the perception of health and suffering. On the other hand, countries differ in the availability of services and health resources, which contributes to differences in the importance that individuals from different countries give to a certain dimension.

In recent years, valuation surveys have been performed in general population samples to derive preference weights for the most widely used preference-based measures. Given that the EQ-5D is the most popular preference-based instrument worldwide, it has the largest number of translations and also the largest number of country-specific value sets (by December 2012, there were preference weights for at least 19 countries). General information about this instrument can be found on the EuroQol group’s webpage. There is also a growing interest in having country-specific preference weights for the SF-6D. There are now specific value sets for the SF-6D for Portugal, Japan, Hong Kong, and Brazil, with preference weights for Singapore currently being determined. The HUI has also been valued in other countries, in particular HUI2 in the UK and HUI3 in France.

Ongoing research has reinforced the idea that stated preferences differ across countries. Several papers published in recent years compare national value sets for the same preference-based measure and discuss the consequence that different utility values, and hence different QALY estimates, are obtained.

Previous studies have reported differences in EQ-5D preference weights that might have important effects on estimates of incremental cost-effectiveness (e.g., Bernert et al., 2009; Huang et al., 2007; Johnson, Luo, Shaw, Kind, & Coons, 2005; Noyes, Dick, & Holloway, 2007). Others have reported HUI2 preference weights that differ substantially from the original Canadian values (Brazier et al., 2007).
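To see why differences in preference weights matter for incremental cost-effectiveness, consider the following Python sketch. It scores the same hypothetical health improvement with two invented value sets ("country A" and "country B") and computes the incremental cost-effectiveness ratio (incremental cost divided by QALYs gained). All names and numbers are assumptions made for illustration and do not come from the studies cited above.

```python
def qaly_gain(years, weight_before, weight_after):
    """QALYs gained when a health state improves for a given number of years."""
    return years * (weight_after - weight_before)


def icer(delta_cost, delta_qalys):
    """Incremental cost-effectiveness ratio: extra cost per QALY gained."""
    if delta_qalys == 0:
        raise ValueError("no QALY difference; the ratio is undefined")
    return delta_cost / delta_qalys


# HYPOTHETICAL values for the same two health states under two value sets.
VALUE_SETS = {
    "country A": {"before": 0.50, "after": 0.70},
    "country B": {"before": 0.60, "after": 0.70},
}
DELTA_COST = 10_000   # hypothetical incremental cost of the intervention
YEARS = 5             # hypothetical duration of the improvement

icers = {
    country: icer(DELTA_COST, qaly_gain(YEARS, w["before"], w["after"]))
    for country, w in VALUE_SETS.items()
}
for country, value in icers.items():
    print(country, round(value))
```

With these made-up numbers, the same clinical change costs roughly twice as much per QALY under the second value set, which could move an intervention across a decision threshold.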

Recently, Ferreira, Ferreira, Rowen, and Brazier (2011) addressed this issue of comparability of country-specific value sets by examining Portuguese (PT) and UK preference weights for the SF-6D obtained using different valuation methods. The purpose was to understand how the valuation technique relates to differences in health state values across populations. Comparisons of the PT and UK ordinal value sets showed a high level of agreement between them, suggesting that the rank relationship is robust for the PT and UK population samples used in the study. Nevertheless, the PT ordinal weights were found to be systematically lower than the UK weights for physical functioning and pain. A possible explanation is that UK and PT respondents give different weights to these dimensions when ranking the health states, which would suggest that physical functioning and pain are more important to the PT population than to the UK population. On the other hand, it could also be argued that the results are due to differences in the reference point used by different populations. Whereas some of these differences could be due to differences between the valuation studies, such as study design, interviewer effects, or year of study, the remaining differences may reflect cultural dissimilarities between countries. However, it is difficult to separate and isolate these sources. Moreover, comparisons between the SG value sets from both countries identified important differences between them, stressing the importance of using a country-specific value set for Portugal.
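The distinction between agreement in ranking and agreement in level can be illustrated with a short Python sketch. It computes a Spearman rank correlation and a mean difference for two invented value sets covering the same five health states. This is not the analysis performed by Ferreira et al. (2011); the function names and all values are hypothetical, and the rank function assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (lowest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation (no-ties formula)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def mean_difference(x, y):
    """Average of x minus y across health states."""
    return sum(a - b for a, b in zip(x, y)) / len(x)


# HYPOTHETICAL values for the same five health states in two countries.
set_a = [0.90, 0.75, 0.60, 0.40, 0.20]
set_b = [0.85, 0.65, 0.50, 0.25, 0.10]

print(spearman(set_a, set_b))                    # agreement in ordering
print(round(mean_difference(set_a, set_b), 2))   # systematic gap in level
```

In this example the two sets order the states identically yet one is systematically lower, so perfect rank agreement does not imply that utilities can be transferred between countries unchanged.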

Another issue that should be taken into account when comparing country-specific value sets is the valuation technique used. Traditionally, the main techniques used to value health states and elicit utility values are SG and TTO. However, in recent years, there has been a growing interest in using ordinal elicitation methods to derive utility values. The use of ordinal data (rank data) could be an alternative to cardinal methods, which are usually more expensive and time-consuming and involve complex tasks that are difficult for older individuals or individuals with low educational levels to carry out and fully comprehend. Recent research has been carried out on the estimation of rank-based value sets for the EQ-5D for the UK (Salomon, 2003) and for the SF-6D for the UK (McCabe et al., 2006) and for Portugal (Ferreira et al., 2011). Several papers have addressed this issue by comparing EQ-5D TTO-based value sets with EQ-5D VAS-based value sets (see Szende et al., 2007 for more details). Others have discussed the choice of the valuation method used to derive the value sets and have argued that the use of methods other than the SG, the gold standard for eliciting utilities, could contribute to the existing differences between the most commonly used preference-based measures, the EQ-5D and the SF-6D (Tsuchiya, Brazier, & Roberts, 2006). Few have addressed the issue of comparability between SG-based value sets and rank-based value sets (Ferreira et al., 2011; McCabe et al., 2006; Salomon, 2003) or have performed cross-country comparisons of ordinal preference-based value sets (Ferreira et al., 2011).

Recent research has advocated the use of cardinal methods such as the SG or the TTO to measure preferences and elicit utility values for country-specific value sets, so that the resulting health values can be used to estimate QALYs and enable a comparison of outcomes in CUA. However, there is still little research to suggest that ordinal and cardinal values can be treated as equivalent preferences, since rank tasks are not choice based. Given that respondents are not asked to trade between alternatives, there is still a need to investigate whether the data provided by rank tasks can, in fact, be used as utilities in the same way as those elicited from choice-based methods, and also whether the relationship between ordinal and cardinal tasks is affected by the inherent characteristics of the population. Differences were found between preference weights estimated using SG and ranking in samples from the UK and Portugal. Rank preference weights were found to be much more similar for the UK and PT populations than those estimated using SG. Nevertheless, these discrepancies should be further investigated, particularly across other countries and other cultures.

For the time being, transferring utilities from one country to another without an adjustment is not advisable.

Cross-References

SF-36