Key Points for Decision Makers

The Better than Dead (BTD) method led to consistent weights for health attributes and duration.

A small number of maximal endurable time states was detected.

Decision makers might consider using time-dependent health-state values in their analyses.

1 Introduction

Health-related quality of life (HRQoL) is a commonly used effect measure in economic and health care evaluations [1]. By expressing HRQoL as a single value anchored on full health (=1) and being dead (=0), comparison of health effects across different diseases and interventions is possible [2]. These health values are often combined with survival data to compute quality-adjusted life-years (QALYs). A variety of instruments exist to measure these values, such as the visual analogue scale (VAS), standard gamble (SG), time trade-off (TTO) and discrete-choice experiment (DCE) [3, 4].

In TTO and SG instruments, states better than being dead are valued on a scale ranging from being dead to full health. For states worse than being dead, such a scale is not adequate, and a different task is used. A first challenge therefore arises: different tasks are used to elicit values for states judged to be better than being dead (positive values) and states worse than being dead (negative values) [5–7]. Transformations are used to place these values on a single value scale, but there is no agreed method for doing so [7–10]. A second challenge lies in the assumption that health-state values are independent of their durations. Studies have shown that in some cases, health states are valued differently as their durations increase. For instance, Sutherland et al. and Stalmeier et al. [11, 12] described a phenomenon in which a severe health state that is preferred to being dead for short durations is less likely to be preferred to being dead when it lasts longer than a certain duration, the ‘maximal endurable time’ (MET). Time dependency has implications for the way health-state values are used in, for example, Markov models, in which each state is assigned a utility value that does not depend on time.

Traditional valuation methods (VAS, TTO, SG) have not provided satisfactory solutions to the two challenges stated above [13]. Recent studies have partly overcome these challenges by using probabilistic choice models to analyse discrete choices between health states with different durations, which enables the estimation of negative values [14–17]. Variants of the TTO method, e.g. the lead-time TTO method, have been developed to avoid transformation problems [18]. These methods tackle the first challenge described above but have not been able to detect MET states.

A valid method for assessing health states worse than being dead has to be able to detect time-dependent states, because such states may be judged better than being dead at short durations but worse than being dead at longer durations. To this end, the Better than Dead (BTD) method was developed, in which preferences are elicited between health states with specified durations (scenarios) and being dead [12]. Comparisons with being dead have been part of the elicitation procedure of traditional methods. However, the insight that comparisons with being dead can be the sole basis for deriving health-state values is new. The BTD method enables a straightforward valuation of states better and worse than being dead in a single question, and allows quantitative measurement of time-dependent values [12].

Our objective was to show that the BTD method can be used to estimate values for a multi-attribute health utility index in a web-based study. First, the socio-demographic and experimental determinants of BTD preferences were investigated. Second, the reliability and consistency of the BTD method were tested. Last, values were estimated for the EQ-5D system.

2 Methods

An agency for market research (Survey Sampling Europe BV) recruited respondents between 18 and 45 years of age from the Dutch population. The upper limit of 45 years avoided presenting unrealistic health states (e.g. confronting a 60-year-old respondent with a health state lasting 40 years). The respondents were offered tokens worth €6.00 for participation.

Health states in the questionnaire were expressed in the EQ-5D three-level system [19]. This system uses five attributes (mobility, self-care, usual activities, pain/discomfort and anxiety/depression) to describe health states. Each attribute has three levels [no problems (1), some problems (2) and extreme problems (3)]. Thus the worst health state the EQ-5D can describe has levels of 33333 and the best health state has levels of 11111.

The web-based questionnaire consisted of three parts. Part 1 consisted of questions on socio-demographic characteristics, and the respondents described their own health status using the EQ-5D. Part 2 started with a 2-minute video that introduced how the EQ-5D system is used to describe health states and discussed an example choice between states 11111 and 11112. This was followed by a warm-up question comparing states 11113 and 31111 without a specified duration. An audio fragment pointed out the differences between these states. In the second warm-up question, health state 33312 was compared with death, described as being quick and painless. The notion of comparing states with death was introduced by audio fragments. It was explained that being dead should not be associated with a violent ending (some respondents in the pilot study had interpreted it that way). A final audio fragment introduced the notion of duration, and the notion that some people may dislike a bad state lasting too long. The third warm-up question was between (11123, 10 years) and being dead. The respondents were asked to make a forced choice between two scenarios: ‘living for a number of years in a certain health state followed by death’ or ‘dead’. Part 3 consisted of 108 of these so-called BTD questions (see Fig. 1). The respondents had no time limit for answering the questions, but they could not skip questions.

Fig. 1

Better than Dead (BTD) preference question: in the web-based questionnaire, respondents were asked to indicate whether they preferred state A or state B by clicking on the preferred option

A total of 50 health states, including 17 health states from Macran and Kind [20], were selected for the experiment using Bayesian optimal design techniques. These were divided into three blocks, each of 108 scenarios (18 health states, each combined with six durations of 1, 2, 5, 10, 20 and 40 years). Level balance in each block was ensured. Respondents were randomly assigned to these blocks. The side on which ‘dead’ was presented (left or right) was balanced within each block. The presentation order of the scenarios was randomized for each respondent separately. Response times were recorded for each BTD question separately. With 70 respondents per block, the maximum standard error of any proportion would be 0.06.
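The 0.06 figure follows from the binomial standard error \( \sqrt{p(1-p)/n} \), which is largest at p = 0.5. The Python sketch below reproduces that arithmetic; it assumes independent answers to a single scenario and ignores any clustering, and the helper names are hypothetical.

```python
import math

def proportion_se(p: float, n: int) -> float:
    """Binomial standard error of a proportion p estimated from n independent answers."""
    return math.sqrt(p * (1.0 - p) / n)

def min_n_for_se(target_se: float) -> int:
    """Smallest n whose worst-case (p = 0.5) standard error is at most target_se."""
    return math.ceil(0.25 / target_se ** 2)

# p * (1 - p) is largest at p = 0.5, so that proportion gives the maximum SE.
worst_case = proportion_se(0.5, 70)
print(f"maximum SE with 70 respondents per block: {worst_case:.3f}")  # prints ... 0.060
print(min_n_for_se(0.06))  # prints 70
```

Read in reverse, the second helper shows that 70 is the smallest block size for which no proportion has a standard error above 0.06.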

Respondents from the first block were asked to participate voluntarily in a second session to assess test–retest properties. Those who agreed were given 30 BTD questions combining ten states (11211, 11312, 12111, 23232, 32211, 13213, 33113, 13232, 22333, 21233) with three durations (2, 10 and 40 years). These states spanned the value range according to a previous Dutch valuation study [21]. Again, presentation order was randomized.

3 Analysis

The analysis covered the determinants, test–retest characteristics, consistency of BTD preferences and predicted values. Analyses were performed in R version 3.0.1 with lme4 and psych packages [22].

3.1 Determinants of BTD Preferences

The effects of self-description (that is, how respondents described their own health on the EQ-5D attributes) and of socio-demographic variables on BTD preferences were estimated using a binomial logistic random-effects (RE) model, with the 50 presented health states and the six durations as fixed effects. The model included a random effect for respondents to account for correlated valuations, as each respondent rated multiple scenarios. The effects of the presentation order of the questions, the presentation side of ‘dead’ and response time were estimated in a similar but separate model. For each respondent, the median of the response times to the 108 BTD questions was used as the measure of response speed.
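As an illustration of how the response-speed measure and the tertiles used later might be derived, the following Python sketch computes each respondent's median response time and splits respondents into tertiles. The function name, the data layout and the tie handling at the cut points (via `bisect_right`) are assumptions; the paper does not state how the tertiles were computed, and the original analyses were run in R.

```python
import statistics
from bisect import bisect_right

def response_speed_tertiles(times: dict[str, list[float]]) -> dict[str, int]:
    """Map each respondent to a response-speed tertile: 0 = fastest, 2 = slowest.

    times maps a respondent id to that respondent's response times (seconds),
    one per BTD question.
    """
    medians = {rid: statistics.median(ts) for rid, ts in times.items()}
    # Two cut points divide the respondent-level medians into three groups.
    cuts = statistics.quantiles(medians.values(), n=3)
    return {rid: bisect_right(cuts, m) for rid, m in medians.items()}

# Hypothetical response times for six respondents (medians 2, 3, 4, 5, 6 and 7 s).
example = {
    "r1": [2.1, 1.8, 2.0], "r2": [3.0, 3.4, 2.6], "r3": [4.0],
    "r4": [5.5, 4.5], "r5": [6.0, 6.2, 5.8], "r6": [7.0, 9.0, 7.0],
}
tertiles = response_speed_tertiles(example)
```

Using the median rather than the mean keeps a few very slow answers (e.g. a respondent pausing mid-questionnaire) from moving a respondent into a slower tertile.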

3.2 Test–Retest Properties

The test–retest properties of the BTD method were evaluated using tetrachoric correlations between the test and retest answers to matched BTD questions. Tetrachoric correlations are similar to Pearson correlations but apply to dichotomous choices, assuming that these choices are based on underlying normally distributed values [23]. The tetrachoric correlations were calculated as measures of overall agreement, aggregated over the 30 BTD questions asked in both test and retest across all respondents, and additionally within strata (tertiles) based on each respondent's median response time.
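The study used the psych package for the tetrachoric correlations. For intuition only, the Python sketch below swaps in a different, closed-form technique: the cosine-pi approximation \( r \approx \cos\left(\pi / (1 + \sqrt{ad/bc})\right) \), computed from the pooled 2 × 2 table of test and retest choices. It is not the estimator the authors used and can differ from a maximum-likelihood tetrachoric estimate; the counts in the usage lines are hypothetical.

```python
import math

def tetrachoric_cos_pi(a: int, b: int, c: int, d: int) -> float:
    """Cosine-pi approximation of the tetrachoric correlation of a 2 x 2 table.

    a: scenario preferred to dead at test and at retest
    b: preferred at test only
    c: preferred at retest only
    d: preferred at neither
    """
    if a * d == 0 and b * c == 0:
        raise ValueError("association undefined: both diagonals contain an empty cell")
    if b * c == 0:
        return 1.0   # odds ratio is infinite: cos(0) = 1
    if a * d == 0:
        return -1.0  # odds ratio is zero: cos(pi) = -1
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1.0 + math.sqrt(odds_ratio)))

# Hypothetical pooled counts, not data from the study.
r_agree = tetrachoric_cos_pi(40, 10, 10, 40)  # odds ratio 16 -> cos(pi / 5), about 0.81
r_none = tetrachoric_cos_pi(25, 25, 25, 25)   # odds ratio 1 -> about 0
```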

3.3 Comparison with Previous Work

The results were compared with previous work [12]. To assess the occurrence of MET states, MET states were defined by (5 years X) preferred to being dead but (20 years X) not preferred to being dead, as in previous work [12]. This time dependency of preferences is considered to be acceptable. The percentage of respondents with MET preferences was determined. To assess the consistency of the BTD method, two tests were performed: (1) if a state lasting 20 years was preferred to being dead, shorter (5- or 10-year) durations of that state should also be preferred to being dead; and (2) if being dead was preferred to a state lasting 5 years, being dead should also be preferred to longer (10- or 20-year) durations. These two tests assumed that health-state values were independent of duration, and violations were not acceptable. These analyses were limited to health states 11121, 11312, 13311, 11113 and 23232, as these were identical to the states used in previous work [12].
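The MET definition and the two consistency tests can be written as simple rules on one respondent's answers for one health state at 5, 10 and 20 years. In the Python sketch below, the violations follow the wording of the Results (test 1: at least one shorter duration not preferred to being dead; test 2: both longer durations preferred to being dead); that reading, and the function name, are our assumptions.

```python
def classify_state(prefers_life: dict[int, bool]) -> dict[str, bool]:
    """Apply the MET definition and both consistency tests to one health state.

    prefers_life maps a duration in years (5, 10, 20) to True when the scenario
    (state, duration) was preferred to being dead.
    """
    p5, p10, p20 = prefers_life[5], prefers_life[10], prefers_life[20]
    return {
        # MET (acceptable time dependency): 5 years preferred, 20 years not.
        "met": p5 and not p20,
        # Test 1 violation: 20 years preferred, but not both shorter durations.
        "violates_test_1": p20 and not (p5 and p10),
        # Test 2 violation: dead preferred to 5 years, yet both longer durations preferred.
        "violates_test_2": (not p5) and p10 and p20,
    }

answers = [
    {5: True, 10: True, 20: False},   # MET pattern
    {5: True, 10: False, 20: True},   # violates test 1
    {5: False, 10: True, 20: True},   # violates both tests
    {5: True, 10: True, 20: True},    # consistent and duration-independent
]
results = [classify_state(a) for a in answers]
met_share = sum(r["met"] for r in results) / len(results)
```

Under this reading, every test-2 violation is also a test-1 violation, whereas the MET pattern violates neither test, which matches the text's distinction between acceptable and unacceptable time dependency.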

3.4 Modelling Health-State Values

The most commonly used version of the QALY model defines the QALY U for health state Q with duration t as \( U(Q,t) = V(Q) \cdot t \) [24]. The existence of time-dependent MET states violates this model, as the effect of t is different for different values of Q [24]. Thus, in order to model MET states, a generalization is needed:

$$ U(Q, t) = V(Q(t)) \cdot t $$
(1)

where \( V\left( {Q(t)} \right) \) allows the value of Q to depend on time. For mild states, BTD preferences do not depend on time, whereas \( U(Q, t) \) increases with t. Therefore, BTD preferences are assumed to measure not \( U(Q, t) \) but \( V(Q(t)) \) [12]. \( V(Q(t)) \) is estimated as the latent value of the binary logistic random-effects model fitted to the data:

$$ {\text{pr(BTD)}} = \frac{1}{{1 + {\text{e}}^{ - V(Q(t))} }} $$
(2)

where pr(BTD) is the proportion of respondents preferring (Q, t) to being dead.
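Eq. 2 is the logistic function, and its inverse, the logit, maps a proportion back onto the latent scale. The Python sketch below shows both directions; at pr(BTD) = 0.5 the latent value is exactly 0, which is why being dead sits at 0 on this scale. It ignores the respondent random effects, so applying the logit to a raw sample proportion illustrates the link function only and does not reproduce the model's fixed-effect estimates.

```python
import math

def pr_btd(v: float) -> float:
    """Eq. 2: probability of preferring (Q, t) to being dead, given latent value v."""
    return 1.0 / (1.0 + math.exp(-v))

def latent_from_pr(p: float) -> float:
    """Inverse of Eq. 2 (the logit): ln(p / (1 - p)), equal to -ln(1/p - 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

v_dead = latent_from_pr(0.5)   # 0.0: indifference with being dead
v_069 = latent_from_pr(0.69)   # about 0.80
```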

Preliminary analysis of the data, stratified by the six durations, showed that a model consisting of 11 dummy variables (two for each of the five EQ-5D attributes, with level 1 (no problems) as the base value, and an ‘N3’ term equalling 1 if any of the EQ-5D attributes are at level 3, otherwise it is 0) fitted the data best, according to likelihood ratio (LR) tests.
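The 11-variable coding can be sketched in Python as follows. The abbreviations for the attributes other than mobility (MO, which matches the MO2 label used in the Results) are hypothetical labels chosen for this example.

```python
ATTRIBUTES = ("MO", "SC", "UA", "PD", "AD")  # hypothetical labels, in EQ-5D order

def eq5d_dummies(state: str) -> dict[str, int]:
    """Code an EQ-5D-3L state such as '21233' as 10 level dummies plus N3."""
    if len(state) != 5 or any(level not in "123" for level in state):
        raise ValueError(f"not a valid EQ-5D-3L state: {state!r}")
    x: dict[str, int] = {}
    for name, level in zip(ATTRIBUTES, state):
        # Level 1 (no problems) is the reference category and gets no dummy.
        x[f"{name}2"] = int(level == "2")
        x[f"{name}3"] = int(level == "3")
    # N3 = 1 if any attribute is at level 3, otherwise 0.
    x["N3"] = int("3" in state)
    return x

example = eq5d_dummies("21233")  # MO2, UA2, PD3, AD3 and N3 equal 1
```

For state 11111 every entry is 0, which is the property used later when anchoring full health.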

Because the relation between duration and health state is unknown, a full model containing all interactions between attributes and duration was chosen. To predict values, a random-effects model was used to model whether or not a respondent preferred a scenario to being dead. The full model is:

$$ V(Q(t))_{ij} = (\alpha + \underbrace {{\beta_{1}^{'} x_{ij} }}_{\text{Attributes}}) + (\underbrace {{\beta_{2} t + \beta_{3} t^{2} }}_{\text{Duration}}) + (\underbrace {{\beta_{4}^{'} x_{ij} t + \beta_{5}^{'} x_{ij} t^{2} }}_{{{\text{Attributes}} \times {\text{Duration}}}}) + (\underbrace {{\varepsilon_{0i} + \varepsilon_{1i} t + \varepsilon_{2i} t^{2} }}_{{{\text{Random}}\quad {\text{effects}}}}) $$
(3)

where i = 1,…,n represents individual respondents, and j = 1,…,m represents health states. \( \alpha \) is the intercept that sets \( V(Q(t)) \) to zero for pr(BTD) = 50 % (see below), and \( \beta_{1}^{'}, \beta_{4}^{'}, \beta_{5}^{'} \) are vectors of weight estimates. \( x_{ij} \) is a vector containing the ten dummy variables of the EQ-5D attributes and the ‘N3’ term, \( \beta_{2} \) and \( \beta_{3} \) are weight estimates for the durations t and \( t^{2} \), and \( \varepsilon_{0i} + \varepsilon_{1i} t + \varepsilon_{2i} t^{2} \) are the random effects for \( \alpha \), t and \( t^{2} \), respectively. The β-weights are the values associated with the attributes, ‘duration’ terms and ‘attribute × duration’ terms. The three random effects in the model allow, per respondent, a different intercept, as well as different linear and quadratic effects for duration. Thus, for each respondent, health states may have different values on the \( V(Q(t)) \) scale, and different time-dependent effects may be modelled. In subsequent analyses, simplifications of this model were considered by dropping the quadratic and linear ‘attribute × duration’ terms in steps. Models were compared using LR tests, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) [25, 26].
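To make the structure of Eq. 3 concrete, the Python sketch below evaluates the linear predictor for one dummy vector and one duration. The coefficient values in the usage lines are hypothetical (they are not the Table 2 estimates), and only two dummies are used to keep the example short; `u` stands for one respondent's three random effects, and setting it to zero gives the population-level prediction.

```python
from typing import Sequence

def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product, e.g. beta_1' x_ij in Eq. 3."""
    if len(w) != len(x):
        raise ValueError("weight and dummy vectors differ in length")
    return sum(wk * xk for wk, xk in zip(w, x))

def latent_value(x, t, alpha, beta1, beta2, beta3, beta4, beta5, u=(0.0, 0.0, 0.0)):
    """Eq. 3: V(Q(t)) for dummy vector x (11 entries in the paper) and duration t.

    u = (u0, u1, u2) are one respondent's random intercept and random slopes for
    t and t**2; leaving them at zero gives the population-level prediction.
    """
    attributes = alpha + dot(beta1, x)
    duration = beta2 * t + beta3 * t ** 2
    interactions = dot(beta4, x) * t + dot(beta5, x) * t ** 2
    random_part = u[0] + u[1] * t + u[2] * t ** 2
    return attributes + duration + interactions + random_part

# Hypothetical coefficients for two dummies only; not the Table 2 estimates.
coef = dict(alpha=3.0, beta1=[-2.0, -3.0], beta2=-0.05, beta3=0.0,
            beta4=[-0.02, -0.04], beta5=[0.0, 0.0])
v_pop_10y = latent_value([1, 0], 10, **coef)                       # about 0.3
v_resp_10y = latent_value([1, 0], 10, **coef, u=(0.5, 0.0, 0.0))   # higher intercept
```

Dropping the interaction terms (setting `beta4` and `beta5` to zeros) gives the simpler models mentioned above, in which duration shifts all states equally on the latent scale.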

When respondents were on average indifferent between a certain scenario \( (Q^{d}, t) \) and being dead, that is, \( {\text{pr(BTD)}} = 0.5 \), this meant that \( V(Q^{d}(t)) = V(\text{dead}) \). By Eq. 2, if \( {\text{pr(BTD)}} = \frac{1}{1 + {\text{e}}^{-V(Q^{d}(t))}} = 0.5 \), then \( V(Q^{d}(t)) = -\ln\left( \frac{1}{0.5} - 1 \right) = 0 \). Thus, on the \( V(Q(t)) \) scale, the value of being dead equalled 0.

To anchor the values so that full health equals 1, the latent values were divided by the latent value of state 11111 (for which \( x_{ij} = 0 \)) at t = 10 years. This duration was chosen because it lies in the centre of our range and has commonly been used in other valuation methods. In Eq. 3, when \( x_{ij} = 0 \), the only remaining fixed-effect terms are \( \alpha \), \( \beta_{2} t \) and \( \beta_{3} t^{2} \), so the anchored value of health state \( Q_{j} \) with duration t becomes:

$$ V(Q_{j} ( t))_{\text{anchored}} = \frac{{V(Q_{j} (t))}}{{\alpha + 10\beta_{2} + 10^{2} \beta_{3} }} $$
(4)
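Eq. 4 amounts to dividing by the latent value of 11111 at 10 years. A minimal Python sketch, assuming the fixed-effect estimates \( \alpha \), \( \beta_{2} \) and \( \beta_{3} \) are already available (the numbers in the usage lines are hypothetical):

```python
def anchored_value(v_latent: float, alpha: float, beta2: float, beta3: float,
                   t_ref: float = 10.0) -> float:
    """Eq. 4: rescale a latent value so that state 11111 at t_ref years equals 1.

    For 11111 every dummy is 0, so its fixed-effect latent value at t_ref is
    alpha + beta2 * t_ref + beta3 * t_ref**2. Being dead (latent 0) stays at 0.
    """
    full_health = alpha + beta2 * t_ref + beta3 * t_ref ** 2
    if full_health <= 0:
        raise ValueError("11111 at t_ref must be preferred to being dead (latent value > 0)")
    return v_latent / full_health

# Hypothetical estimates: the anchor is 3.0 - 0.5 + 0.1 = 2.6.
params = dict(alpha=3.0, beta2=-0.05, beta3=0.001)
anchor_full = anchored_value(2.6, **params)    # about 1.0 (full health)
anchor_bad = anchored_value(-0.52, **params)   # about -0.2 (worse than being dead)
```

Because the rescaling is a division by a positive constant, it preserves the sign of every latent value, so states worse than being dead remain negative after anchoring.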

To visualize the time dependency assessed with the BTD preferences, we plotted the QALY model \( V(Q(t)) \cdot t \) for a number of health states [24]. This analysis assumed zero time preference (no discounting). Ten health states, spread evenly over the value range at 40 years, were selected to avoid the visual clutter of depicting all 50 health states.
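The curves in such a plot can be generated by evaluating \( V(Q(t)) \cdot t \) at the six questionnaire durations. In the Python sketch below, `value_at` is a hypothetical stand-in for the fitted, anchored model of one health state; a duration-independent state yields QALYs proportional to t, i.e. a straight line through the origin, which the helper checks. Plotting itself is omitted.

```python
DURATIONS = (1, 2, 5, 10, 20, 40)  # years, as in the questionnaire

def qaly_profile(value_at, durations=DURATIONS):
    """Undiscounted QALYs V(Q(t)) * t for each duration t.

    value_at is any function t -> anchored V(Q(t)) for one health state.
    """
    return {t: value_at(t) * t for t in durations}

def is_straight_line(profile, tol=1e-9):
    """True if QALYs / t is the same at every duration (no time dependence)."""
    per_year = [q / t for t, q in profile.items()]
    return max(per_year) - min(per_year) <= tol

# Hypothetical value functions, not model output.
mild = qaly_profile(lambda t: 0.8)               # constant value: straight line
severe = qaly_profile(lambda t: 0.3 - 0.01 * t)  # value falls with duration
```

In the `severe` example, QALYs are positive at 5 years but negative at 40 years, the kind of curvature the text describes for the more severe states.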

4 Results

4.1 Determinants of BTD Preferences

The study population was comparable to the Dutch population aged 18–45 years in terms of gender, religion, educational status and marital status, according to public data from the Dutch Bureau of Statistics (http://www.cbs.nl). The characteristics of the participants are listed in Table 1. Of the 291 respondents, 4 (1.4 %) did not complete the questionnaire, for unknown reasons, resulting in 197 missing responses. The partially completed questionnaires were included in the analyses, giving a total of 291 × 108 − 197 = 31,231 answered BTD questions. Scenarios were preferred to being dead in 69 % of all BTD questions. Fifty-one respondents (18 %) had lexicographic preferences: 50 preferred all scenarios to being dead and one preferred being dead to all scenarios. The median response time over all 108 BTD questions was 4 s, ranging from 2 to 20 s. Of the lexicographic respondents, 78 % were in the fastest tertile of response times.

Table 1 Population characteristics

Gender, age and marital status were not associated with preferences. Having a religious life stance [raw BTD preferences 76 versus 65 % for a non-religious life stance; adjusted odds ratio (OR) 4.086 (95 % CI 1.996–8.363)] and a low educational level [80 % versus 67 % for medium and 71 % for high; ORs 6.942 (95 % CI 1.645–29.297) for low and 1.276 (95 % CI 0.607–2.680) for high, respectively, versus the medium educational level] were associated with more states being preferred to being dead. Scenarios were preferred to being dead more often by those indicating that they had problems with daily activities [76 versus 68 %; OR 4.340 (95 % CI 1.637–11.504)] and less often by those with problems on the pain attribute of the EQ-5D [68 versus 70 %; OR 0.420 (95 % CI 0.195–0.905)]. The presentation order of a scenario and the side on which ‘dead’ was presented (left or right) were not significant, nor was the time of day when the questionnaire was completed. Respondents in the slowest and medium tertiles of the median response time preferred fewer health states to being dead [ORs 0.12 (95 % CI 0.05–0.25) and 0.06 (95 % CI 0.03–0.13), respectively, versus the fastest tertile]. This relation remained after exclusion of respondents with lexicographic preferences.

Block 1 included more participants than the other blocks (138 versus 78 and 72) to yield enough retest respondents. Significant differences between the blocks existed in raw BTD preferences (65, 77 and 69 %), gender, age and response times, but not in the responses to the warm-up questions. Full data are available in the Electronic Supplementary Material.

4.2 Test–Retest Properties

In the retest data, 15 out of 1,230 responses (41 respondents × 30 questions) were missing. The tetrachoric correlations showed overall agreement of 0.78 between the test and retest. For the fastest tertile of responders, the agreement was lower (0.45) compared with the medium and slowest tertiles (0.77 and 0.84, respectively).

4.3 Comparison with Previous Work

MET states were found in 15 % of all respondents; across the five health states 11121, 11312, 13311, 11113 and 23232, their prevalence was 2, 6, 8, 9 and 17 %, respectively. These percentages are lower than those found previously (4, 13, 28, 22 and 26 %, respectively) [12].

Of the 488 cases in which (Q, 20 years) was preferred to being dead, 44 cases (9 %) did not prefer both shorter durations over being dead. Of the 122 cases in which being dead was preferred to (Q, 5 years), 43 cases (35 %) preferred both longer durations over being dead. Both tests showed higher percentages of inconsistencies than the previous study, which showed 1 and 5 % of inconsistencies for these two tests, respectively [12]. For the second test, there were significantly more inconsistencies in the tertile with the fastest respondents (OR 2.78; 95 % CI 1.27–6.20).

4.4 Modelling Health-State Values

Figure 2 shows the raw data, that is, the proportion of respondents with BTD preferences for a selection of health states. If values were independent of duration, these lines would be horizontal. Table 2 shows the estimates of the full model (Eq. 3) in the last three columns. The relevant attribute weights were consistent (that is, coefficients were larger for level 3 than for level 2) and were significant, except for MO2. This model performed best of the two models presented. Table 2 also presents the reduced model, containing only the EQ-5D dummy variables and the ‘duration’ and ‘duration²’ terms. The negative coefficient for duration in the reduced model reflects that preference strength declines with duration (see Fig. 2). Because the estimated coefficients of the full and reduced models were similar, their predictions for \( V(Q(t)) \) were very similar (results not shown).

Fig. 2

Percentage of respondents who rated a scenario as better than being dead [pr(BTD)] for ten selected health states. Data were collected for t = 1, 2, 5, 10, 20 and 40 years. The lines were added for clarity

Table 2 Parameter estimates from modelled Better than Dead (BTD) preferences

Figure 3 shows the estimates for V(Q(t)) · t and their relation with the duration of a health state, based on the full model. Independence of utility and duration would show up in this graph as straight lines fanning from the origin. Though some curvature can be seen in Fig. 3 (especially for the more severe health states), visual inspection suggests that the overall dependence on duration is small. Out of all 243 possible EQ-5D health states, the full model yielded 13 MET states (12233, 12333, 13233, 22233, 23233, 31133, 31332, 32132, 32232, 32331, 33132, 33231 and 33331), where MET is defined as (5 years X) preferred to being dead but (20 years X) not preferred to being dead. A list of QALY values V(Q(t)) · t for all 50 health states calculated using the full model can be found in Table 3 in Appendix 1.
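A count of MET states across the EQ-5D system can be obtained by enumerating all 3⁵ = 243 states and applying the MET definition to the model predictions. The Python sketch below does this under the assumption that ‘preferred to being dead’ corresponds to a predicted latent value above 0 (pr(BTD) > 0.5); `toy_latent` is a hypothetical stand-in for the fitted full model, so the states it flags are not the 13 reported above.

```python
from itertools import product

def all_eq5d_states() -> list[str]:
    """All 3**5 = 243 EQ-5D-3L states as five-digit strings, '11111' to '33333'."""
    return ["".join(levels) for levels in product("123", repeat=5)]

def met_states(latent_value, states=None) -> list[str]:
    """States with predicted V > 0 at 5 years but V <= 0 at 20 years."""
    states = all_eq5d_states() if states is None else states
    return [s for s in states if latent_value(s, 5) > 0 and latent_value(s, 20) <= 0]

def toy_latent(state: str, t: float) -> float:
    """Hypothetical latent values: each level-3 attribute lowers V, more so for longer t."""
    k = state.count("3")
    return 1.0 - 0.3 * k - 0.012 * t * k

states = all_eq5d_states()
toy_met = met_states(toy_latent)  # here: exactly the states with two level-3 attributes
```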

Fig. 3

Quality-adjusted life-years (QALYs) estimated using the full model for ten selected health states. State 33323 (bold, solid line) is a maximal endurable time state. Calculations were performed for t = 1, 2, 5, 10, 20 and 40 years. The lines were added for clarity

5 Discussion

The BTD valuation method was used in a web-based questionnaire to value health states from the EQ-5D system. A consistent set of values was derived for the EQ-5D system. The values did not depend strongly on time, but, nevertheless, some MET states were detected. Test–retest properties were good but deteriorated for the fastest responders. The number of non-completers was small (1.4 %). The results indicated that, in the presence of introductory video and audio fragments, the respondents understood the tasks without help from interviewers or researchers.

Preferences depended on socio-demographic characteristics. The relation between religious life stance and preferring states to being dead was to be expected, as religiousness has been linked with non-trading in TTO [27] and difficulties in conceptualizing states worse than being dead, inadvertently linking them with euthanasia [28]. Self-described health states and a low educational level were also significant factors, consistent with findings in other studies [27, 29]. Other socio-demographic characteristics were not significant. Age has been found to be a determinant in other studies [30] but not here, possibly because of the restricted age range of our participants (18–45 years).

A comparison with the previous Dutch valuation study showed that the weights for mobility, self-care and usual activities were similar (between 0.123 and 0.184); pain/discomfort had the largest weight in both studies (0.414 and 0.419); however, the weight for anxiety/depression was lower in this study (0.241 versus 0.408). While these converging results are encouraging, a comparison with our previous study [12] raised concerns about strategic behaviour, that is, skimping on the task. Response times in this study (median 4 s per BTD question) were much shorter than the 20 s in the earlier study. While the proportion of lexicographic respondents was similar (18 and 19 % [12]), this study also showed that 78 % of the lexicographic respondents were in the fastest response-time tertile. In the same vein, more inconsistencies were found in the present web-based study than in the previous study, and again they were more frequent among the fast responders. These findings suggest that strategic behaviour may have affected the data quality. The causes may lie in any of the differences between the earlier study and the present one, such as the presence of an interviewer in the previous study, the paper-based questionnaire in the previous study versus a stand-alone web-based questionnaire in this study, the smaller number of BTD questions in the previous study (15 versus 108) and the fact that the previous study followed the MVH protocol more closely [31]. This study also found fewer MET states, which suggests that how strongly values depend on duration may be more sensitive to task characteristics than anticipated. This concurs with the view that preferences are not pre-existing but are constructed in response to task demands [32]. It is our personal impression that the data quality was lower in the web-based questionnaire used in this study, a view supported by mixed evidence [33–35].

Nevertheless, progress has been made. In the BTD method, positive and negative health states are valued using a single question. The method combines an advantage of a discrete-choice experiment (DCE), namely a low cognitive burden [36], with the inclusion of duration in the preference choices [37]. Values were estimated using a random-effects model, as used previously by others [15, 38]. On the resulting scale, the value of zero corresponds to the value of being dead, solving previous problems with estimating the value of the health state ‘dead’ [17]. The random-effects model led to valid estimates even though 18 % of the respondents had lexicographic preferences, which have previously been described as problematic [39]. The random factors per respondent appear to solve this problem.

6 Conclusion

The main goal of this study was to test the feasibility of using the BTD method to estimate values for multi-attribute health-state classification systems, such as the EQ-5D. The results of this study support the feasibility of the BTD method. Specifically, the method yielded consistent scores for the EQ-5D health attributes. The results showed that the values were mostly independent of duration, though some MET states were detected. Further research should explore how the experimental setting modifies the effect of time on values.