Introduction

There is a broad body of research on the societal monetary value of a quality-adjusted life year (QALY) [1, 2]. The motivation of these studies is, in most cases, to provide an estimation of the threshold value that would indicate the maximum cost per QALY for a technology to be considered cost effective. The underpinning idea is that health care funding decisions should be informed by the strengths of preferences of those members of society affected by these decisions [3]. Therefore, the cost-effectiveness threshold, upon which funding decisions are made, ought to reflect society’s value for health gains.

It has been argued, though, that the monetary value of a QALY (MVQALY) is not a relevant source of information when allocating resources in the presence of fixed health care budgets [4]. The reason is that adopting a new technology that imposes additional costs on the health care system often requires displacing existing services, resulting in health losses for individuals elsewhere. In such contexts, the threshold should represent the cost per QALY of displaced services (i.e. the opportunity cost), which allows the assessment of whether the health expected to be gained from the use of a new technology exceeds the health expected to be forgone elsewhere as other services are displaced.

Fixed budget constraints might characterise most, but not all, decision-making contexts, and information on the MVQALY would still be relevant for at least two reasons. First, when displacement in health care services is not required, for instance, if funding is made available by raising new tax revenue or by decreasing allocation to other public sectors, the opportunity cost of investing in health care would fall across other alternative uses of public spending. A threshold reflecting the strengths of preferences of the public across different alternatives of consumption would arguably be more relevant in these cases [1, 5]. Secondly, using the opportunity cost approach alone might perpetuate the belief that the threshold reflects the marginal benefits of health care. Decision makers would not be made aware of technologies whose benefits, measured according to society’s views, offset their costs but that might not be affordable under current budget constraints. Allowing for the identification and assessment of these circumstances could have implications for decisions on the size of the health care budget. Information on both the opportunity costs of health care funding decisions and society’s valuation of health gains is thus relevant to inform resource allocation decisions in health care [6].

In the literature on the societal value of health gains, the study of the maximum willingness to pay (WTP) has become the norm for the measurement of individuals’ monetary valuation of a QALY. A WTP for a QALY survey usually involves three steps: (1) to elicit on the QALY scale the utility associated with a health gain, (2) to elicit the WTP for that health gain, and (3) to combine these two estimates to arrive at a WTP for a QALY. Most of the empirical studies have consisted of asking individuals about their WTP for a small health gain, elicited in utility terms using standard gamble (SG) or time trade-off (TTO) techniques, and then aggregating this up to infer the WTP for a full QALY [7].

There are two main limitations with this general approach: the presence of non-traders and the linearity assumption in the valuation of health gains. These two issues are inherent to the traditional QALY model [8], but become even more problematic when combining information from the two dimensions involved in the WTP for a QALY framework, i.e. the substitution between quality of life and time/risk of death implied in the TTO/SG questionnaires, and the substitution between wealth and health implied in the WTP questionnaires. With regard to non-traders, the issue arises when individuals are willing to make a trade-off in one dimension but not in the other, which creates analytical difficulties. For instance, individuals who report being willing to pay a non-zero amount for an improvement in health for which they were not prepared to trade any time or risk of death yield an infinite MVQALY. On the linearity issue, the methods used so far implicitly assume that valuations of health gains are linear with respect to the size of the health gain. This assumption has been shown not to hold [9,10,11,12,13], which has led some to conclude that finding a single value of a QALY using these methods might not be possible [10]. While the source of this problem could be related to limitations of the WTP techniques in capturing the true value of health gains, it might also be the case that individuals experience diminishing marginal utilities for health gains and are thus unlikely to consider, say, a health gain twice as large as twice as valuable [14]. This has further led some authors to argue that it is inappropriate to express cost-effectiveness thresholds as maximum cost per QALY [15].

In this study, we aim to estimate the societal MVQALY in Spain, allowing for differences by types of health gains. In particular, we focus on the impact of using different magnitudes and dimensions of Quality of Life (QoL) improvements defined by all possible health states included in the EQ-5D-3L. This allows us to derive a range of societal monetary values for a QALY associated with all possible health profiles that relate to the most widely applied instrument used to describe health outputs in economic evaluations. Previous research has focused on exploring whether using different values for the duration and/or the severity of health states yields different results [9,10,11, 13]. To do so, these studies normally used two or more values for the duration of the health problem and/or two or more QoL profiles, selected a priori by the authors. In our study, we focus only on differences in QoL, but instead of using somewhat arbitrary and relatively few values to define different QoL improvements, we estimate the monetary value of a QALY associated with the full spectrum of the EQ-5D-3L instrument. This provides a comprehensive assessment of the impact of using different QoL improvements on the corresponding MVQALY. We set the duration of the problem as fixed and equal to 1 month. The choice of such a short duration was informed by the findings of some of these previous studies, which have emphasised that the size of the health problem shown to respondents ought to be small to partly address the issue that individuals reach their budget constraint when reporting WTP to avoid a health problem experienced for a long period [10].

To measure the monetary value of a QALY corresponding to every health state described by the EQ-5D-3L, we construct a value set for the EQ-5D-3L in terms of utilities and also in terms of WTP. By combining these two, we are able to compute the WTP for a QALY value corresponding to every possible EQ-5D-3L health state. This was accomplished by conducting a survey on a representative sample of the population in Spain and jointly modelling the responses about their health preferences and their WTP values. We use a discrete choice experiment (DCE) rather than TTO or SG techniques to elicit health gains in utility terms. DCEs are particularly well suited to measure individual strengths of preferences, and have been recently applied to estimate the utilities associated with EQ-5D health states in the literature [16,17,18,19]. In DCE tasks, respondents are typically asked to select the option they would choose between two or more alternatives described in terms of a set of characteristics (attributes). In our study, the task required individuals to indicate their preferred health state between two options described using EQ-5D-3L health profiles. This is an arguably simpler exercise than that required for TTO and SG techniques, which involve going through an iterative process of identifying the point of indifference between two options [16]. It also avoids the issues related to respondents considering the health state shown to be worse than death, which requires further adjustments to the traditional TTO and SG methods. This simplicity provides us with the opportunity of conducting an online survey in a large and representative sample of the population that included the valuation of sufficient health states to create a complete value set plus a WTP questionnaire. Furthermore, this method deals with the non-trader issue discussed above, as DCEs simply ask individuals to choose the health state they prefer between two possibilities, rather than asking respondents to trade time or a risk of death to have a better quality of life. There is a limitation with the use of DCE designs though. The outputs produced by DCE data are on an arbitrary scale, not anchored on the QALY utility scale of 0 (death) and 1 (perfect health). Therefore, as we discuss next, a rescaling approach was needed to produce values amenable to QALY calculations. We did so by including a single TTO task in the questionnaire that allowed the DCE values to be re-anchored on the QALY scale.

The methods we used are different to DCE designs that include cost as one of the attributes and estimate WTP values for the other attributes according to marginal rates of substitution. Such a design would have precluded us from estimating a utility value set for the EQ-5D health states, and thus from expressing our results in terms of the corresponding WTP values for a QALY. Instead, we conducted a DCE using EQ-5D-3L dimensions as the only attributes, followed by a contingent valuation exercise to elicit WTP values related to these health states.

The rest of the paper is structured as follows. “Methods” describes the sampling and recruitment strategies, the DCE task, the WTP task, and the TTO task included in the questionnaire, as well as the modelling techniques used to analyse the data. Descriptive statistics of the sample and regression model results are provided in “Results”, alongside the estimated ranges of the monetary value of a QALY. Finally, in “Discussion”, we summarise the main findings, strengths and limitations of this study and provide some words of caution when interpreting some of our results.

Methods

Sampling and recruitment

The data were collected via an online survey. Respondents were recruited from an existing commercial internet panel with over 227,000 users in Spain. Participants were selected by quotas based on the demographic characteristics of the Spanish general population according to age (six categories: 18–24; 25–34; 35–44; 45–54; 55–64 and 65 or more), gender, and region of residence [Spain is divided into 17 areas called Autonomous Communities (ACs)].

Individuals who clicked on the invitation to the survey link were shown a project information sheet and then asked whether they consented to take part in the study. Following this, the quota questions were posed, and potential respondents who belonged to a quota category that was already full were screened out of the study. Respondents entering the survey first completed the EQ-5D-3L for their own health and reported their self-assessed health status. Then, each respondent answered the DCE task, the WTP task, and the TTO task (see “Appendix 1: questionnaire (selected sections)”). We used the EQ-5D-3L rather than the EQ-5D-5L version to keep the tasks faced by respondents as simple as possible. At the end of the survey, respondents were asked about their socioeconomic characteristics (income and employment status) and were given the opportunity to make comments on the questionnaire and indicate whether they found difficulties in completing the survey. Ethical approval for this study was granted by the Clinical Research Ethics Committee from Hospital Universitario Nuestra Señora de la Candelaria in the Canary Islands.

The questionnaire was piloted in a group of 15 researchers, clinicians and health economists, and in a sample of 200 respondents recruited from the internet panel. This resulted in minor improvements to the wording of the questionnaire being made and a change in the design of the DCE task as we detail below.

DCE task

The DCE task consisted of pairs of scenarios based on the health states described by the five attributes of the EQ-5D-3L (mobility, self-care, usual activities, pain/discomfort and anxiety/depression), with each attribute taking one of three possible levels of severity (none, some or severe problems—except for mobility that is defined as “confined to bed” at the highest severity level). Respondents were asked to imagine themselves living in two possible health states for a month, and to select which one they preferred.

The number of potential combinations of the EQ-5D-3L is 243 (\(3^{5}\)). With two options to choose from in each choice scenario, this gives a possible 58,806 choices (243 × 242). To select the pairs of choices included in our choice set, we applied a fractional design by means of a Bayesian efficient approach with informed priors based on the coefficients and standard errors estimated in a previous study in Spain [20].
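The counts quoted above can be checked with a short enumeration. This is only an illustrative sketch in Python (the study itself programmed its design in R); the names `DIMENSIONS`, `LEVELS` and `all_health_states` are ours and do not come from the study.

```python
from itertools import product

# Five EQ-5D-3L dimensions, each with three levels (1 = none, 2 = some, 3 = severe).
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")
LEVELS = (1, 2, 3)


def all_health_states():
    """Return every EQ-5D-3L profile as a 5-digit string, e.g. '11111' or '22222'."""
    return ["".join(str(level) for level in profile)
            for profile in product(LEVELS, repeat=len(DIMENSIONS))]


states = all_health_states()
n_states = len(states)                       # 3 ** 5
n_ordered_pairs = n_states * (n_states - 1)  # state A against any different state B

print(n_states, n_ordered_pairs)  # prints: 243 58806
```

The figure of 58,806 counts ordered pairs, so the same two states shown as (A, B) and as (B, A) are counted separately.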

The final choice set was reduced to 80 scenarios, which were split into ten blocks (“Appendix 2: choice set”), so each respondent answered 8 DCE questions (the pilot included 10 DCE questions per individual, which pilot respondents found too long and tiring). The choice set had a series of restrictions: dominated scenarios (i.e. when one health state is better in each of the five EQ-5D-3L domains) were excluded and, after the pilot, we decided to only allow three out of the five EQ-5D-3L domains to vary between the two health states shown to respondents. The reason was that participants in the pilot found the task too difficult when every domain could take a different value. The choice set design was programmed in R and had a D-error of 0.206. We checked that the choice set comprised a broad variety of mild and severe health states. The order of appearance of the 8 DCE questions and, for each pair of health states, the state shown as state A (right on the screen) or state B (left on the screen) were randomly varied for each participant of the survey.
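The two restrictions on the choice set can be expressed as simple filters on candidate pairs. The sketch below is ours and rests on two assumptions that the text does not settle: dominance is read as one state being at least as good in every domain, and "three out of the five domains" is read as exactly three domains differing. It does not reproduce the Bayesian efficient design itself.

```python
def levels(state):
    """Convert a profile string such as '21321' into a tuple of integer levels."""
    return tuple(int(ch) for ch in state)


def is_dominated_pair(state_a, state_b):
    """True if one state is at least as good as the other in every domain.

    Lower levels mean fewer problems. Identical states are not treated as dominated.
    Reading dominance as weak dominance is an assumption of this sketch.
    """
    a, b = levels(state_a), levels(state_b)
    if a == b:
        return False
    a_never_worse = all(x <= y for x, y in zip(a, b))
    b_never_worse = all(y <= x for x, y in zip(a, b))
    return a_never_worse or b_never_worse


def n_differing(state_a, state_b):
    """Number of domains in which the two profiles take different levels."""
    return sum(x != y for x, y in zip(levels(state_a), levels(state_b)))


def is_admissible_pair(state_a, state_b, n_varying=3):
    """Apply both restrictions: exactly `n_varying` domains differ and no dominance."""
    return (n_differing(state_a, state_b) == n_varying
            and not is_dominated_pair(state_a, state_b))


print(is_admissible_pair("21211", "12111"))  # three domains differ, neither dominates
print(is_admissible_pair("11111", "22211"))  # first state dominates the second
print(is_admissible_pair("21111", "12111"))  # only two domains differ
```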

As noted above, a limitation of using the DCE approach is that the values produced by DCE designs are on an arbitrary scale that needs to be anchored onto the 0 (dead) to 1 (perfect health) scale. We considered including a time attribute as an additional domain in the DCE questionnaire, which has been done in previous studies to achieve this rescaling [16, 17, 21]. However, since our aim was to combine information on individuals’ valuation of health gains in terms of utility and in terms of WTP, we preferred to keep the duration of the health state constant, and short, in both parts of the questionnaire. Therefore, we used the duration of 1 month as part of the description of both health states rather than as a varying attribute. Other approaches to address rescaling include using death as a possible outcome in the choice set or as an opt-out choice. These have, though, been criticised for not conforming to the random utility theory underlying the DCE model, as a subset of respondents might consider all health states to be better than death [22]. An alternative solution is to use information on the TTO score of one health state from an external source (e.g. available EQ-5D-3L value set from the country of interest), or to undertake a TTO exercise within the same DCE sample to re-anchor their responses to the death-perfect health scale based on the preferences of the same individuals [23]. We chose the latter for this analysis and compared the results with those obtained when external TTO information was used.

WTP task

After the DCE task, respondents were asked how much they would be willing to pay to avoid spending 1 month of their life with the health problems described by the EQ-5D-3L health states they faced in the DCE. However, to reduce the number of WTP questions posed to each respondent, the sample was randomly divided into two groups, with one half of the sample being asked their WTP to avoid the states shown as state A and the other half about states B. By doing so, each respondent answered 8 WTP questions and data were obtained for all the states used in the DCE.

WTP questions were posed as an out-of-pocket one-off payment that respondents would have to make to buy a medication that would cure the described health problems that would otherwise affect them during a month. The possible WTP answers were presented as six ranges covering a wide spectrum of quantities to limit framing bias: 0€–100€, 100€–500€, 500€–2000€, 2000€–6000€, 6000€–10,000€, or more than 10,000€, followed by an option that asked respondents to indicate the exact amount they would pay within the selected range, or a numerical open field that allowed participants to enter a value greater than 10,000€ if they chose the last option.

TTO task

In this last task, respondents were asked to compare two possible hypothetical scenarios: in one scenario, they would live for 10 years with a described health problem and then die, and in the other they would live fewer years, starting at 5 years, in perfect health. Respondents had to choose their preferred scenario or indicate if they were indifferent. Depending on their answers, an iterative process was performed in which the time they would live without any health problem increased (decreased) by 6 months if the respondent chose the option with higher (lower) life expectancy. This process was carried out until the respondents indicated they were indifferent between the two options, or until their answers changed direction.
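The iterative procedure just described can be sketched as a small loop. This is an illustration under stated assumptions, not the survey software: the callback `choose`, the function name `run_tto`, the midpoint rule applied when answers change direction, and the handling of the 0.5 and 9.5 year limits are all ours. The score is taken as the years in perfect health at indifference divided by the 10-year horizon.

```python
def run_tto(choose, start=5.0, step=0.5, horizon=10.0, lowest=0.5, highest=9.5):
    """Return the TTO score (x / horizon) implied by a respondent's answers.

    `choose(x)` is a hypothetical callback returning "impaired" (prefers `horizon`
    years in the described state), "healthy" (prefers x years in perfect health)
    or "indifferent".
    """
    x = start
    previous_x, previous_answer = None, None
    while True:
        answer = choose(x)
        if answer == "indifferent":
            return x / horizon
        if previous_answer is not None and answer != previous_answer:
            # Answers changed direction: assume indifference lies midway between
            # the last two offers (an assumption of this sketch).
            return (x + previous_x) / 2.0 / horizon
        # Choosing the longer, impaired life raises the offer; otherwise lower it.
        next_x = x + step if answer == "impaired" else x - step
        if next_x < lowest or next_x > highest:
            # Offer range exhausted without indifference: report the boundary offer.
            return x / horizon
        previous_x, previous_answer, x = x, answer, next_x


def simulated_respondent(x, true_indifference=5.64):
    """Hypothetical respondent who is never exactly indifferent on the 6-month grid."""
    return "impaired" if x < true_indifference else "healthy"


print(run_tto(simulated_respondent))
```

A respondent who always prefers the impaired state ends at the 9.5-year boundary, which corresponds to someone unwilling to trade even 6 months.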

We used the health state corresponding to “22222” of the EQ-5D-3L algorithm, that is, the state with “some” problems in each of the five dimensions. This state allows us to assess a situation where all the dimensions of the EQ-5D-3L are affected. Furthermore, this state is thought to be not serious enough for individuals to consider it worse than death, which creates difficulties when eliciting TTO values, but not mild enough for a significant percentage of population to be unwilling to trade any time to improve their quality of life.

Data analysis

Data collected in the DCE task were analysed using a conditional logit model. The data were analysed following the random utility theory framework, which implies that the utility a person receives can be derived through a utility function with an explained component and a random component. The explained component is, in our case, a set of dummy variables corresponding to the levels for each EQ-5D-3L dimension. The model included ten coefficients corresponding to levels 2 and 3 of each dimension (MO = mobility, SC = self-care, UA = usual activities, PD = pain/discomfort, AD = anxiety/depression), using level 1 as the reference category. We estimated a main-effects model, i.e. without interactions. This model took the form:

$$\begin{aligned} u_{ij} & = \alpha_{1} {\text{MO}}2_{ij} + \alpha_{2} {\text{MO}}3_{ij} + \alpha_{3} {\text{SC}}2_{ij} + \alpha_{4} {\text{SC}}3_{ij} + \alpha_{5} {\text{UA}}2_{ij} + \alpha_{6} {\text{UA}}3_{ij} \\ & \quad + \alpha_{7} {\text{PD}}2_{ij} + \alpha_{8} {\text{PD}}3_{ij} + \alpha_{9} {\text{AD}}2_{ij} + \alpha_{10} {\text{AD}}3_{ij} + \varepsilon_{ij} \\ \end{aligned}$$
(1)

where \(u_{ij}\) is the estimated utility; i denotes the individual, j the DCE pair and \(\varepsilon_{ij}\) is the error term. We computed the mean TTO score of the 22222 health state obtained from our sample and rescaled the coefficients by dividing each \(\alpha\) coefficient by the value corresponding to the ratio \(\left( {22222_{\text{TTO}} - 1} \right)/\left( {22222_{\text{DCE}} - 1} \right)\) [16]. In addition, we used data from the TTO score of the 22222 health state from the current Spanish EQ-5D-3L value set [24]. The rescaled coefficients can be interpreted as the disutility associated with a change in each level of each dimension of the EQ-5D-3L. To account for the fact that the disutility would be experienced for 1 month, as defined by the questionnaire, the rescaled coefficients were divided by 12 when computing the associated monetary value of a QALY.
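The anchoring step can be illustrated numerically. The sketch below fixes the outcome of the rescaling rather than its wording: it assumes the intended condition is that the rescaled level-2 coefficients sum to the TTO disutility of state 22222 (its TTO score minus 1), and whether that is written as multiplying or dividing depends on which way round the ratio is expressed. The coefficient values are hypothetical and are not the study's estimates; the function names are ours.

```python
# Hypothetical conditional logit coefficients on the latent scale (NOT study estimates).
HYPOTHETICAL_DCE_COEFS = {
    "MO2": -0.30, "MO3": -1.60,
    "SC2": -0.25, "SC3": -1.20,
    "UA2": -0.20, "UA3": -1.00,
    "PD2": -0.35, "PD3": -1.70,
    "AD2": -0.40, "AD3": -1.80,
}


def anchor_coefficients(dce_coefs, tto_22222):
    """Rescale latent coefficients so that state 22222 has disutility tto_22222 - 1.

    Assumption of this sketch: the latent value of 22222 relative to full health
    is the sum of the five level-2 coefficients.
    """
    latent_22222 = sum(v for name, v in dce_coefs.items() if name.endswith("2"))
    scale = (tto_22222 - 1.0) / latent_22222
    return {name: v * scale for name, v in dce_coefs.items()}


def one_month_qaly_loss(anchored_coefs):
    """Convert annual disutilities into the QALY change from one month in that level."""
    return {name: v / 12.0 for name, v in anchored_coefs.items()}


anchored = anchor_coefficients(HYPOTHETICAL_DCE_COEFS, tto_22222=0.564)
monthly = one_month_qaly_loss(anchored)
print(round(sum(v for name, v in anchored.items() if name.endswith("2")), 3))
```

With the sample TTO score of 0.564 reported for state 22222, the anchored level-2 coefficients sum to about -0.436 regardless of the latent scale of the input.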

Data collected in the WTP task were analysed using a log-transformed multilevel random effects model. Results obtained using fixed effects models were very similar. This model took the form:

$$\begin{aligned} \log \,(d_{ij} ) & = \gamma_{1} {\text{MO}}2_{ij} + \gamma_{2} {\text{MO}}3_{ij} + \gamma_{3} {\text{SC}}2_{ij} + \gamma_{4} {\text{SC}}3_{ij} + \gamma_{5} {\text{UA}}2_{ij} + \gamma_{6} {\text{UA}}3_{ij} \\ & \quad + \gamma_{7} {\text{PD}}2_{ij} + \gamma_{8} {\text{PD}}3_{ij} + \gamma_{9} {\text{AD}}2_{ij} + \gamma_{10} {\text{AD}}3_{ij} + \mu_{i} + \varepsilon_{ij} \\ \end{aligned}$$
(2)

where \(\log (d_{ij})\) is the logarithmic transformation of the reported WTP; i denotes the individual, j the EQ-5D state, \(\mu_{i}\) is the individual effect and \(\varepsilon_{ij}\) is the error term. The \(\gamma\) coefficients of the dummy variables were then re-transformed using the formula \(\left( {e^{\gamma } - 1} \right) \times {\text{mean}}\left( {\text{WTP}} \right)\) to compute the average WTP to avoid spending a month with a health problem corresponding to a change in each level of each dimension of the EQ-5D-3L.
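The re-transformation is a one-line calculation, shown here as a minimal sketch. The function name and the input values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def retransformed_wtp_effect(gamma, mean_wtp):
    """Apply (e^gamma - 1) x mean(WTP) to a coefficient from the log-WTP model."""
    # math.expm1(gamma) computes e^gamma - 1 accurately, including for small gamma.
    return math.expm1(gamma) * mean_wtp


# Hypothetical inputs: a coefficient of log(2) doubles WTP, so the effect equals the mean.
print(retransformed_wtp_effect(math.log(2.0), 500.0))
```

A coefficient of zero gives an effect of zero, and the effect grows faster than linearly in \(\gamma\) because of the exponential.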

We combined the information of each utility and WTP coefficient into a ratio that yields the WTP for a full QALY (i.e. the MVQALY) relative to each possible health gain described by the EQ-5D. By focusing on the estimated coefficients across the sample, we are using the so-called aggregate approach, that is, we estimate the mean WTP and mean utility value across the full sample separately and we combine them into a ratio (ratio of means), as opposed to the disaggregated approach, which implies calculating this ratio for each individual and computing the mean across the sample (mean of ratios). The MVQALY corresponding to the move from level 2 or level 3 to level 1 in each domain was computed, and we also derived it for each of the possible moves from the 242 health states described by the EQ-5D-3L to perfect health.
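The difference between the two approaches is easy to see with a toy example. The numbers below are hypothetical, and the function names are ours; the sketch only contrasts the two ways of forming the ratio and is not the estimation used in the study.

```python
from statistics import mean


def qaly_loss(annual_disutility, months=1):
    """QALYs lost by spending `months` months with a given annual disutility."""
    return annual_disutility * months / 12.0


def ratio_of_means(wtp, losses):
    """Aggregate approach: mean WTP divided by mean QALY loss."""
    return mean(wtp) / mean(losses)


def mean_of_ratios(wtp, losses):
    """Disaggregated approach: mean of individual WTP-per-QALY ratios.

    Raises ZeroDivisionError if any individual has a zero QALY loss, which is
    the non-trader problem: a positive WTP with no utility loss has no finite ratio.
    """
    return mean([w / q for w, q in zip(wtp, losses)])


# Hypothetical respondents: WTP in euros to avoid one month in a state,
# and the annual disutility each attaches to that state.
wtp = [100.0, 200.0, 600.0]
losses = [qaly_loss(d) for d in (0.12, 0.24, 0.36)]

print(round(ratio_of_means(wtp, losses)))   # about 15,000 euros per QALY
print(round(mean_of_ratios(wtp, losses)))   # about 13,333 euros per QALY
```

The two summaries generally differ whenever WTP is not proportional to the utility loss across individuals, which is why the choice of approach matters for the reported value.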

Utility and WTP models were jointly estimated using bootstrapping techniques with 1000 replications, which allowed us to compute 95% confidence intervals around the WTP for a QALY estimate. We also estimated the models controlling for individual characteristics in terms of their self-assessed health, EQ-5D score, gender, age, and monthly income. We explored the existence of outliers, and we further analysed the data excluding respondents who: (1) indicated the same WTP for each one of the eight health states shown, and (2) took less than 10 min to complete the survey. The average time required to complete the survey was estimated at 20 min. Subgroup analyses were also performed to compare the results among the sample with very good/good health and those reporting fair/bad/very bad health status, as well as among individuals with reported income below and above the sample median.
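The idea of bootstrapping a ratio can be sketched as follows. This is a simplified stand-in: it resamples individuals and recomputes a ratio of simple means, whereas the study re-estimated the conditional logit and WTP models in each replication (in Stata). The percentile interval, the function name and the data are assumptions of this sketch.

```python
import random
from statistics import mean


def bootstrap_ratio_ci(wtp, losses, n_reps=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(wtp) / mean(losses).

    Individuals are resampled with replacement so that each person's WTP and
    QALY loss stay paired within a replication.
    """
    rng = random.Random(seed)
    n = len(wtp)
    ratios = []
    for _ in range(n_reps):
        idx = [rng.randrange(n) for _ in range(n)]
        ratios.append(mean([wtp[i] for i in idx]) / mean([losses[i] for i in idx]))
    ratios.sort()
    lower = ratios[round((alpha / 2.0) * n_reps)]
    upper = ratios[round((1.0 - alpha / 2.0) * n_reps) - 1]
    return lower, upper


# Hypothetical data: WTP in euros and one-month QALY losses for ten respondents.
wtp = [50.0, 120.0, 0.0, 300.0, 80.0, 150.0, 40.0, 500.0, 90.0, 60.0]
losses = [0.004, 0.010, 0.006, 0.015, 0.008, 0.012, 0.005, 0.020, 0.009, 0.007]

low, high = bootstrap_ratio_ci(wtp, losses)
print(round(low), round(high))
```

Resampling numerator and denominator together is what lets the interval reflect the correlation between WTP and utility responses.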

All analyses were undertaken using the software package Stata 14.0 (StataCorp USA).

Results

Descriptive statistics

Data were collected for 2003 individuals between December 2016 and January 2017. Demographic, self-assessed health and socioeconomic characteristics are presented in Table 1, as well as the degree of difficulty respondents found when answering the questionnaire.

Table 1 Sample characteristics

Regional and gender representativeness was achieved in our sample, while the percentage of people aged 65 and over was lower (10% in our sample) than in the actual Spanish population [25] (22%). This is commonly the case when using online survey methods [26]. Responses to the self-assessed health were similar to the values collected by the National Health Survey in Spain (NHSS) in 2017 [27] which included over 22,000 adults. In our sample, 69% (66% in NHSS) reported very good or good health, 24% reported fair (24% in NHSS) and 7% (10% in NHSS) reported bad or very bad health. Monthly household income from respondents was also similar to that reported at NHSS: 58% reported a monthly household income lower than 2000€ (62% in NHSS), 26% between 2000€ and 3000€ (23% in NHSS) and 16% higher than 3000€ (15% in NHSS). 58% of respondents found the survey not difficult at all, while only 2% said it was very difficult.

Regression models

Regression model results using the full sample are presented in Table 2. Four observations that reported a WTP higher than 1,000,000€ were considered outliers and were thus excluded from the analysis. The first two columns in Table 2 correspond to the results of the conditional logit model using data from the DCE. The coefficients were rescaled using the TTO score of the 22222 state obtained from our sample, estimated at 0.564. This score was similar to the value of 0.572 estimated in the current value set [24]. There were no differences in the WTP for a QALY estimates whether we used the TTO score estimated from our sample or the pre-existing tariff. Columns 3 and 4 report the coefficients related to the log-transformed WTP model and the corresponding re-transformed effects, respectively.

Table 2 Model results and WTP for a QALY estimate

All coefficients had the expected sign and were statistically significant. In addition, coefficients related to level 3 had a greater impact, in absolute terms, than those related to level 2, both in terms of utility and WTP, as expected.

The corresponding MVQALY values derived by combining information on disutility (i.e. rescaled disutility coefficient divided by 12) and WTP are presented in column 5. These values ranged from 9795€ for level 3 on the anxiety/depression domain to 25,503€ for level 2 on the mobility domain. In general, WTP for a QALY estimates were higher when they were associated with moderate health gains, i.e. a change from “some problems” to “none”, than when they corresponded to changes from “severe problems” to “none”. When focusing on moderate health problems (i.e. level 2), problems with mobility and self-care were associated with higher WTP for a QALY values, while when considering severe problems (i.e. level 3), health problems causing severe pain/discomfort had the highest estimated WTP for a QALY. The 95% confidence intervals indicate that there is a degree of overlap between domains/levels. The mean value across the 242 possible moves from an imperfect health state to perfect health was 14,104€, with values ranging from the aforementioned 9795€ to 25,503€.

Table 3 presents MVQALY estimates when we added controls for health status, gender, age and monthly household income, and when we excluded respondents who reported the same WTP in all eight health states and/or took less than 10 min to complete the survey. Adding control variables slightly reduced the variability of the estimates produced by each domain/level. Excluding participants who reported the same WTP value for each of the eight health states, or who took considerably less time than expected to fill in the questionnaire, resulted in higher WTP for a QALY estimates in both cases. This is particularly the case for the former, which implied excluding individuals who reported a WTP equal to 0€ in each WTP question. When we focus on the reduced sample (i.e. respondents who neither reported the same WTP throughout nor took less than 10 min to complete the survey) and add control variables, the estimates varied from 11,145€ to 29,838€, with a mean value across the 242 potential health gains of 15,987€. We observe the same pattern, in that (1) estimates related to moderate health problems (i.e. level 2 rather than 3) yield higher WTP for a QALY values (except for the pain/discomfort domain), and (2) the domains related to higher WTP values are mobility and self-care when the problems are moderate, and pain/discomfort when the problems are severe. Results from subgroup analyses indicated that the richest 50% had a higher associated WTP value for each level and dimension, except for moderate anxiety/depression, than the poorest 50%, while differences across the healthy/unhealthy subsamples did not show a clear pattern (Table 4).

Table 3 WTP for a QALY estimate—sensitivity analyses
Table 4 WTP for a QALY estimate—subgroup analyses

Discussion

In this study, we have explored the degree of variation in the societal MVQALY corresponding to different QoL improvements as described by all possible health states included in the EQ-5D-3L instrument. Our findings indicate that societal values for a QALY related to different EQ-5D-3L health gains vary approximately between 10,000€ and 30,000€. The MVQALY associated with larger improvements in QoL was found to be lower than that associated with moderate QoL gains, indicating that WTP is less than proportional to the size of the QoL improvement. Across the five EQ-5D domains, we find that individuals considered severe levels 4–7 times worse than moderate levels, but were only willing to pay between 2 and 3 times more to avoid severe rather than moderate health problems. As a result, the corresponding WTP for a QALY estimate is, in most cases, about twice as large when we focus on moderate health gains. In fact, the highest (lowest) WTP for a QALY value is associated with the change in health that individuals considered to cause the lowest (highest) disutility.

This apparent paradox is, however, a common finding in the literature [9, 10, 12, 28, 29], and the result of departures from the proportionality assumption and/or the lack of sensitivity of the WTP framework to elicit the true value of health gains. With regard to the latter, budgetary restrictions and insufficient adjustment bias have been hypothesised as potential reasons for the lack of proportionality in WTP responses [10]. We explored these possibilities within our data. If budget constraints were an important source of non-linearities, we would expect individuals with higher incomes to show more proportional values. This was partly the case; the WTP for a QALY value estimated among the richest 50% was, on average, 1.57 times larger for moderate than for severe problems, while it was, on average, 1.98 times larger among the poorest 50% (see Table 4). This suggests that budgetary constraints do play a role in the observed lack of linearity, but there is a degree of disproportionality that remains even among the richest individuals. We also explored the presence of anchoring and insufficient adjustment biases, which in our context relate to the tendency of individuals to decide the minimum/maximum price they would pay to avoid a health problem and then insufficiently adjust this price upward/downward according to the severity of the health state. We ranked the health states used in the questionnaire according to the disutility estimated by our analysis and created five equally sized groups (see Table 5). We computed the mean WTP reported for the health states included in each group, the mean disutility and the corresponding mean MVQALY. We observed that the largest gap in the associated WTP for a QALY value pertains to the difference between the group of least severe health states and the others, while the values were relatively proportional across the remaining four groups. This suggests that insufficient upward adjustment bias might be even more important than budgetary restrictions, as budget constraints should have a bigger impact on the group of more severe health states, which are associated with higher WTP values.
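The grouping exercise described above can be sketched as follows. The data are hypothetical and the function name is ours; the sketch only shows the mechanics of ranking states by disutility, splitting them into five groups and summarising each group, and it forms each group's MVQALY as mean WTP over mean QALY loss, which is one possible reading of "corresponding mean MVQALY".

```python
from statistics import mean


def summarise_by_severity(states, n_groups=5):
    """Rank states by QALY loss, split into `n_groups` groups and summarise each.

    `states` is a list of dicts with keys "qaly_loss" (positive, one month in the
    state) and "wtp" (mean WTP in euros to avoid that month).
    """
    ranked = sorted(states, key=lambda s: s["qaly_loss"])
    size = len(ranked) // n_groups
    summary = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any remainder when the split is not exact.
        end = (g + 1) * size if g < n_groups - 1 else len(ranked)
        group = ranked[start:end]
        mean_wtp = mean([s["wtp"] for s in group])
        mean_loss = mean([s["qaly_loss"] for s in group])
        summary.append({
            "group": g + 1,
            "mean_wtp": mean_wtp,
            "mean_qaly_loss": mean_loss,
            "mvqaly": mean_wtp / mean_loss,
        })
    return summary


# Hypothetical states in which WTP rises less than proportionally with severity.
hypothetical_states = [
    {"qaly_loss": 0.002 * (i + 1), "wtp": 100.0 + 40.0 * i} for i in range(10)
]

for row in summarise_by_severity(hypothetical_states):
    print(row["group"], round(row["mean_wtp"]), round(row["mvqaly"]))
```

In this constructed example the implied value per QALY falls from the mildest to the most severe group, which is the kind of pattern the comparison across groups is meant to reveal.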

Table 5 Mean values by level of severity

Based on these findings, caution is needed when drawing conclusions from the results of this analysis. If taken at face value, the finding that moderate problems yield higher monetary values of a QALY than severe problems could be interpreted as implying that a lower cost per QALY threshold should be applied to technologies that achieve larger improvements in health than to technologies that achieve moderate improvements in health. However, this interpretation effectively assumes that the only reason behind the lack of proportionality observed in the WTP responses is that it reflects genuine preferences characterised by diminishing marginal utilities for health gains. Our data suggest instead that the diminishing MVQALY is, at least partly, produced by the lack of sensitivity of WTP responses. We think this is the most plausible explanation since the insensitivity of WTP to the size of benefit has also been observed in other areas such as WTP for risk reductions or environmental benefits [30, 31]. Therefore, the use of different decision thresholds based on this information is unlikely to be justified.

This study has a number of limitations. Online surveys offer a series of advantages but also make it difficult to reach individuals aged over 65 years, and we observed responses that did not meet expected standards, for example in terms of the time taken to complete the survey. Methodologically, using the DCE approach we could only apply aggregate methods to combine responses on the utility and WTP values, which prevents us from computing individual-level WTP for a QALY values. DCE techniques also implied the need to rescale the model outputs. In our study, we did so by conducting a TTO exercise, so the anchoring values were based on the elicited preferences from the same sample. However, the inclusion of this exercise is subject to the aforementioned difficulties of TTO techniques. We aimed to limit these issues by only performing this exercise for one health state that was carefully chosen to attenuate the limitations that arise with TTO methods, which include individuals who are not willing to trade any time to avoid the health problem described or individuals who consider the state to be worse than death. Our data showed that 14% of the sample were not prepared to trade the minimum time that was allowed in the questionnaire (6 months) to avoid the problems described by the state 22222 (i.e. potential non-traders), and that about 5% of the sample were prepared to trade the maximum time that was shown (9.5 years) (i.e. potentially considering the state worse than death). A further advantage of the use of the health state 22222 to rescale the coefficients is that all the dimensions of the EQ-5D-3L are affected and thus the rescaling reflects the impact in each of the dimensions. For these reasons, the state 22222 might be considered the most suitable state to conduct the rescaling, but we acknowledge that the results are contingent on the state chosen. 
Using the estimated TTO value to conduct the rescaling also assumes that this mean score can be applied, on average, across the full sample. Finally, in this study we have focused on exploring the consequences of the lack of linearity when varying the size of the QoL gain, but we did not consider other sources of variation in health outcomes, such as differences in the duration of the health problems, or gains in life expectancy versus gains in quality of life. These have been the focus of previous research [9, 10, 11, 12, 13].

The monetary values of a QALY we estimate are similar to those reported in previous studies in Spain. Martin-Fernández et al. [32] estimated MVQALY ranging from 10,000€ to 28,000€ using a sample of 662 patients visiting health care facilities in Madrid. The EuroVaQ study, run in nine countries with a sample of 2000 individuals in Spain [26], estimated values in the range of 20,000€–40,000€ for this country. A previous study in Spain, which explored the sensitivity of the results to a series of variations in the design of the questionnaire, estimated values that varied between 5000€ and 124,000€ [10]. Our estimates are also in line with recent evidence that has provided an estimation of the cost-effectiveness threshold based on the opportunity cost approach in Spain [6]. The average opportunity cost of health care funding decisions in Spain was proxied by the mean cost per QALY in the Spanish NHS and was estimated at between 22,000€ and 25,000€. This suggests that the thresholds derived from the opportunity cost framework fall within the range we have constructed using the societal valuation of health gains approach in Spain.

This study offers an estimation of the range of monetary values of a QALY associated with the full spectrum of health gains defined by the EQ-5D-3L instrument. This was based on a large and representative sample of the population in Spain. The study sheds light on the implications that the lack of proportionality has for the estimation of the WTP for a QALY value and on the potential sources of the observed non-constant MVQALY. More work is needed before the observed variations can be taken to suggest that different decision thresholds according to the type and/or magnitude of the health gain might be appropriate.