Introduction

Within the framework of conventional Cost-Utility Analysis (CUA) the initial health state of a person is only of importance to the extent that health improvement depends on health-related quality of life (HRQoL) before and after treatment. The initial health state of a person per se is irrelevant. However, when informed of the fact that individual patients find two health improvements to be of identical benefit, members of the public generally express a strong preference for allocating resources to those patients with the worst initial health state. This result has been independently derived in Norway, Australia, the USA, Spain and the UK (see Table 1). It is true that moderately ill patients can only benefit moderately from treatment, whereas severely ill patients can benefit more. But when patients are expected to derive the same benefit, and all else is equal, conventional CUA provides no basis for distinguishing between them. There is no value associated with the severity of the initial health state itself. Conventional CUA disregards entirely the following sort of sentiment, as expressed by Callahan: ‘Our bias, I contend, should be to give priority to persons whose suffering and inability to function in ordinary life is most pronounced, even if the available treatment for them is comparatively less efficacious than for other conditions’ [1, p. 463]; a point also noted by Cohen: ‘society may want to direct resources preferentially to those who are farthest from good health, even if larger aggregate benefits could be obtained under a different distribution’ [2, p. 287].

Table 1 Studies of social preferences for severity of illness

The idea that the worse off (e.g. the more severely ill) have a moral claim for special consideration has considerable intuitive appeal. It can be found in official government guidelines in several countries, and in reports of government-appointed commissions [3–5]. It is encapsulated in Rawls’s ‘Difference Principle’, which states that social and economic inequalities are justifiable only in so far as they are to the greatest benefit of the least advantaged members of society [6]. Daniels has explicitly extended Rawls’s Difference Principle to health care, arguing that fair equality of opportunity requires the provision of health care to those in greatest need: in the present context, the more severely ill [7].

The importance of severity is not a purely theoretical matter. In the US, severity has been the dominating factor in the allocation of heart and liver transplants (when need exceeds supply). Those with the best prognosis after receipt of an organ are those with the least severe illness, and maximum health gain would be achieved by giving this group priority. By contrast, the actual policy gives a very high weighting to those with the most severe problem. This results in the counterintuitive situation where the relatively healthy must wait until their health has deteriorated sufficiently for them to satisfy the severity criterion [8, 9]. However, this policy is only counterintuitive against a backdrop that assumes that maximising health gains is the overriding social objective. In the present case, health production is explicitly of secondary importance to severity. (Kidney transplantation is not included in this policy since dialysis is available as an alternative; that is, the condition without transplantation is not sufficiently severe for inclusion in the policy.)

Discordant evidence concerning social preferences

To test the importance of severity, Nord surveyed 150 Norwegian politicians involved in health-care decision making and found that 38% would give priority to the treatment of an illness that gives ‘severe’ problems in preference to an illness that gives ‘moderate’ problems, even though treatment would help those with the severe illness only ‘a little’ whereas it would help those with the moderate illness ‘considerably’ [10]. Another 45% would divide any increase in funding evenly between the two, leaving only 11% who would follow the health maximisation strategy of conventional CUA. Only general descriptors were used in this study and no attempt was made to measure strength of preference, but the results give some indication of the direction of the preferences, albeit with a large egalitarian component.

However, a study by Dolan and Green produced results apparently at odds with this [11]. Using EuroQoL health states, they asked subjects to select a health state, D, such that they would consider a move from C to D to be of equivalent value to a move from A to B, where A is a more severe health state than C. They then asked respondents to complete a Person Trade-Off (PTO) exercise involving treatment T1, which would take patients from A to B, and treatment T2, which would take them from C to D. In contrast to a concern for the more severely ill, only 7 out of 28 respondents preferred T1 whereas 17 preferred T2.

It is possible that respondents misunderstood aspects of this task. A hint of this comes from the qualitative results, which are curiously non-explanatory. Of the 17 respondents who preferred the T2 treatment, 12 commented: ‘I would say that treatment 2 was a definite improvement… in treatment 1 the difference isn’t so great’. If subjects were saying they considered T2 of greater value from a personal perspective, then this would seem to be inconsistent with their choice of D. If they meant that T2 was of greater value from a social perspective, it is not clear why. Lack of reflection is a possible explanation for these results. That notwithstanding, 25% of respondents still chose T1, which would benefit more severely ill patients.

In a subsequent study using prospective jurors as subjects, Ubel replicated in the US the earlier study by Nord [10]. Like Nord, he found that many people are prepared to sacrifice overall health gains to benefit those with the worst initial health state. Of 479 subjects, 9% gave priority to patients with ‘moderate’ health problems, 26% gave priority to those with ‘severe’ health problems, and 64% chose to divide resources equally between the two groups. However, Ubel noted that when subjects are not given the option of dividing resources, only a small majority favoured giving priority to patients with severe health problems. This raises the possibility that subjects in Nord’s original study may not have been expressing a preference for severity per se, but ‘may have simply been unwilling to make a difficult treatment choice’ [12, p. 897]. Also, Ubel found that responses were sensitive to the wording of the options. For example, when subjects were reminded about how much improvement each group was expected to undergo (‘a little’ for the severely ill, and ‘considerably’ for the moderately ill), fewer participants gave priority to the severely ill (6% compared with 26% without the reminder), and more gave priority to the moderately ill (21% compared with 9% without the reminder). This does not negate the significance of severity. However, it does indicate that considerable caution must be exercised in calculating precise severity weights.

A recent study by Dolan and Tsuchiya produced an even more startling result [13]. They asked subjects to rate a number of patient groups that differed, inter alia, in terms of the quality and length of their lives without treatment. They found that respondents gave consistently higher priority to patients with better prospects without treatment—be it in terms of life expectancy or quality of life—than patients with poorer prospects. If these results are to be believed, being more severely ill gives patients lower priority for health care from a social point of view.

Nord offered one possible explanation for this anomalous finding: respondents may have thought that the group with the better prospects had better prospects as a result of treatment [14]. Nord calls this the ‘fatal misunderstanding’ hypothesis. Dolan and Tsuchiya did not explicitly discuss this possibility in their original article, but pointed out, and reiterated in a rejoinder, that ‘respondents may not have processed the information as intended’ [15]. However, they do not dismiss the results entirely. They point out that the support for the more severely ill detected in other studies could likewise have resulted from respondents mistakenly thinking that the benefit to the more severely ill will be greater.

In light of these inconclusive findings, it is clear that a final assessment of the importance of severity can only be reached by weighing up the evidence for and against. The number and quality of studies in support of severity must be balanced against the number and quality of those against. Importantly, the conditions under which severity should be taken into account, and the weight it should be given, must also be derived from the evidence. The present paper aims to make a contribution to this debate by deriving inferred severity weights for QALYs based on data collected for the Assessment of Quality of Life (AQoL-II) project. This is undertaken in Sects. 4–6. First, however, we briefly review the remaining empirical evidence in Sect. 3, which supports the importance of severity. One objective in the following section is to observe the different contexts in which the severity hypothesis has been supported.

Supporting evidence

In one of the earliest studies on social concerns for severity—certainly one of the earliest that attempted to quantify preferences for severity—Nord found that returning one person to full health from the following state—‘unable to work, unable to pursue family and leisure activities, strong pain, depressed’—was considered as valuable as returning 50 people to full health from the following state—‘unable to work, moderate pain’. However, the utility values for these states (assigned by the participants using a rating scale) implied that curing one person in the more severe state should be equivalent to curing two people in the less severe state [16, 17]. This indicates that the social value assigned to treating the more severely ill was much higher than would be expected from the patient utility scores.

In a joint Norwegian-Australian study, also using the PTO, Nord, Richardson et al. surveyed members of the general public (in Norway) and students and nurses (in Australia). Subjects were asked to adopt the perspective of members of Parliament and to choose between two equally expensive special health units. Unit A would save ten people per year from dying and restore them to full health. Unit B would restore to full health a larger number of patients suffering from a chronic illness. The PTO results were again higher than would be expected from the individual utility scores for these states. The utility scores seriously underestimate the social value placed upon the health states when the alternative is death. The authors comment, however, that subjects’ responses to the PTO exercise were highly dispersed, ‘indicating the likelihood of a high sampling error for the median values’ [18, p. 467].

In another study, Nord asked a convenience sample of ten individuals from the National Institute of Public Health in Oslo to compare improvements in health on a seven-level disability scale with approximately equal distances between the levels [17]. For example, subjects were asked to indicate how many patients moving from level 5 to level 1 on the scale they considered equivalent to a smaller number of patients moving from level 6 to level 4 (where lower numbers represent better functioning). Again, the results showed a marked preference for treating the more severely ill (see Table 2). Although taking a patient from level 5 to level 1 should be twice as valuable as taking a patient from level 6 to level 4, taking into account only gains in HRQoL, Nord’s subjects judged them to be approximately equal: taking 16 patients from level 5 to level 1 and taking 17 patients from level 6 to level 4 were both found to be equivalent to taking one person from dying to a healthy state (the latter was used as a reference state). One of the important aspects of this study was that the levels on the scale were judged approximately equal by the subjects themselves. This makes it difficult to explain the discrepancy between the utility-based predictions and the direct measurements by arguing that the health states were not placed on an interval scale.

Table 2 Numbers of different outcomes that may be considered equivalent in social value

Ubel, Spranca and colleagues, using the same seven-step scale as Nord, found that the observed preference for more severely ill patients extends to preventative interventions [19]. Using prospective jurors as subjects, Ubel, Spranca, et al. found only a slight preference for preventative over curative interventions when they brought similar benefits at similar costs. However, there was a significant preference for helping the more severely ill patients in both contexts. Moreover, this preference was observed, in both the curative and preventative context, even when the more severely ill would benefit less.

In another study, Pinto-Prades asked subjects in Spain to assume the role of health planners in an exercise designed to compare the Visual Analogue Scale (VAS), the Standard Gamble (SG) and (three forms of) the Person Trade-Off (PTO) [20]. The study used four EuroQoL health states for comparison. The values assigned to health improvements from these health states are shown in Table 3. Again, it can be seen that the PTO places much higher value on life saving, and ameliorating severe conditions, than the VAS or SG, and less weight on curing milder conditions. For example, to gain the equivalent of returning 4 people from health state 12121 (a mild health state) to perfect health according to the VAS (0.75 × 4 = 3) it would be necessary to return only 3 people to full health according to the TTO (0.99 × 3 ≈ 3.0), whereas to gain the equivalent of saving 5 people’s lives who would be left in health state 32331 (a severe health state) according to the VAS (0.84 × 5 = 4.2) it would be necessary to save 10 people’s lives according to the TTO (0.41 × 10 = 4.1).

Table 3 Sizes of intervals as measured by VAS, SG and PTO

In the US, Ubel, Loewenstein and colleagues conducted a study with economics students, also using the VAS, the SG and the Time Trade-Off (TTO). They measured the utility associated with three health states: ganglion cyst of the hand, ligament damage to the knee, and severe headache. They then tested the same students 1 to 3 weeks later using the PTO to elicit their rationing choices for groups of patients with the same three conditions [21]. They also added a fourth, fatal condition: appendicitis. Participants were asked the following question concerning appendicitis and meningioma (and similar questions concerning the other conditions):

A. Which do you think would bring the most benefit?

  • ten people cured of appendicitis

  • ** people cured of meningioma

  • indifferent

The questions were tailored to individual participants—that is, the double asterisks were replaced by the number at which each participant was predicted to be indifferent, based on their answers to the utility elicitation questions. Subjects did not agree with the rationing implications of their answers to the utility elicitation questions. Higher values were assigned to health states that were initially more severe (despite this being taken into account in the evaluation of the utility scores) and the preference to treat more severely ill people was ‘consistent across all six rationing choices and all three methods of elicitation’ [21, p. 113].

Finally, in an Australian study involving 78 economics students, Richardson [22] ensured that the value of health improvement to patients at different levels of severity was perceived as being the same by informing subjects that the patients would be prepared to pay $30,000 for either the treatment of illness A or illness B; alternatively, that the patients considered health improvement from these treatments to be sufficiently valuable that they would sacrifice 1 year of their life to receive them in both cases. Subjects were then asked to adopt a social perspective by imagining that they were on a health committee of Parliament and had to prioritise the two treatments (see Fig. 1). Higher priority was given to illness A and illness B by 57% and 16% of respondents respectively, and equal priority (indicating that social value equals individual value) by 28%. When asked to nominate the number of people that would need treatment for illness B to generate the same social value as 100 people receiving treatment A the mean and median values were 318 and 200 respectively; that is, the value of the treatment for the more severe illness was valued between 2 and 3 times more highly than the value of the less serious illness despite patients valuing the treatments equally.

Fig. 1 Quality of Life Scale

Taking into consideration the information on public preferences revealed in the previous studies, Nord [23, pp. 37–38] divides health states into three classes (‘severe’, ‘considerable’ and ‘moderate’) and assigns them values consistent with the emerging empirical data outlined above (see Table 4). These values give rise to some ‘rules of thumb’ concerning severity: saving someone from death is something like 3–6 times better (that is, has greater social value) than curing someone of a severe health problem (and returning them to full health), something like 10–15 times better than curing someone of a considerable problem, and 50–200 times better than curing someone of a moderate problem. According to Nord: ‘Quantitative models that purport to be useful for estimating the social value of health care activities in these countries (Australia, England, Norway, Spain and the US), as well as in other countries with similar values, must reflect this structure of concern’ [23, p. 38]. By contrast with the health-state (QALY) values used in CUA, which are derived, for example, by means of the SG, TTO or rating scale (RS), these social values are higher, particularly at the upper end of the scale. The value structure encapsulated in Table 4 therefore ‘compresses health states to the upper end of the scale’ [23, p. 38].

Table 4 Rules of thumb concerning severity (after Nord 1999)

It would appear that, from a social perspective, conventional CUA underestimates the relative value of curing severe health problems, including life-saving treatments. Treatments for the more severely ill are favoured over those for the less severely ill, both when respondents themselves consider the two treatments of equivalent value from a personal point of view, and when they are told that (hypothetical) patients consider them equal. Priority for the more severely ill is supported even when it means reducing overall health gains (severity has more than just tie-breaking significance). This result has been observed in the case of preventative and curative interventions, and when non-fatal conditions of different severity are compared as well as when life-threatening and non-fatal conditions are compared.

Nevertheless, a small number of studies have produced contrary results. At this stage, the significance of these discordant studies is unclear: they may indicate the importance of other factors (e.g. expected benefit, final health state, age) that can override severity, misunderstanding on the part of respondents, or national differences. Clearly, more evidence on the social importance of severity is desirable. In the remainder of this paper we present evidence relevant to this question, based on data derived from the construction of the Assessment of Quality of Life (AQoL-II) project. These are used to test a general relationship between severity and utility gain as distinct from a single test of the main hypothesis.

A new study: methodology and data

In the present study, PTO values are used to indicate the social value of a movement from death to a health state less than full health. TTO scores are used to measure the individual utility of each of these final health states. The disutility of a health state (1 − utility) is the measure of severity of the state in which the person will be left by the treatment. Differences in PTO scores are regressed upon the difference in the utilities (health improvement) and also the severity of the more severe health states in which patients would remain without further treatment. The analysis is based upon three key assumptions. The first two, that the PTO and TTO measure the social value of a health program and the individual utility of a health state respectively, are widely, but not universally, accepted. The relevant arguments are outside the scope of the present article. The third is that the importance of severity declines as health improves: that the severity effect is subject to diminishing returns with respect to health. This implies that differences in health improvement as judged by the TTO will be more highly valued by society when health states are poor. The relevant health states are at the severe end of the health state spectrum.

The hypothesis is illustrated with a simple numerical example in Fig. 2. In this, four programs P1 … P4 will each save a person’s life and leave them in a health state with utility scores 0.2, 0.4, 0.7 and 0.9 respectively. P2 and P4 result in TTO scores that are 0.2 higher than P1 and P3 respectively. However, P1 leaves a patient in a more severe health state (TTO = 0.2) than P3 (TTO = 0.7), and this more severe state creates greater social disutility. The study hypothesis is therefore that the difference in the PTO scores between P2 and P1 will be greater (say 0.3) than the difference between P4 and P3 (say 0.15). This is because P2 does not leave a patient in a health state as severe as that of P1. While P4 likewise does not leave patients in the P3 state, this is of less concern as P3 is a less severe health state than P1 and the importance of severity is subject to diminishing returns. The differences (P2 − P1) and (P4 − P3) are therefore hypothesised to be positively related to the magnitude of severity after standardising for differences in TTO scores.

Fig. 2 Utility (TTO), program value (PTO) and severity

An alternative (over)simplified explanation of the hypothesis is to assume PTOi = TTOi + Si, where Si is the net social utility of a person being taken from death but left in the health state measured by TTOi. Consequently PTO2 − PTO1 = 0.2 + (S2 − S1) and PTO4 − PTO3 = 0.2 + (S4 − S3). Our hypothesis is that (S2 − S1) > (S4 − S3) as a result of diminishing returns to severity as health improves, and our data are used to test this.
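As a purely arithmetical illustration of this simplified additive form, the sketch below plugs in the hypothetical figures of the numerical example (TTO scores of 0.2, 0.4, 0.7 and 0.9, and assumed PTO differences of 0.3 and 0.15) and backs out the implied differences in S. The function name is an illustrative assumption, and none of the numbers are study estimates.

```python
# Illustrative arithmetic only: the TTO scores and PTO differences are the
# hypothetical values of the numerical example, not study data.

def implied_severity_difference(pto_diff, tto_low, tto_high):
    """Return S_high - S_low implied by the additive form PTO_i = TTO_i + S_i."""
    return pto_diff - (tto_high - tto_low)

# P2 versus P1: TTO rises from 0.2 to 0.4, assumed PTO difference of 0.3.
s2_minus_s1 = implied_severity_difference(0.3, 0.2, 0.4)

# P4 versus P3: TTO rises from 0.7 to 0.9, assumed PTO difference of 0.15.
s4_minus_s3 = implied_severity_difference(0.15, 0.7, 0.9)

print(round(s2_minus_s1, 2), round(s4_minus_s3, 2))
```

Under these assumed values the implied differences are 0.1 and −0.05, so that (S2 − S1) > (S4 − S3), which is the pattern the hypothesis predicts.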

While the equation above (PTOi = TTOi + Si) is numerically correct it implies an over-simplification of the relationship between TTO and PTO which measure different quantities—a state versus change—from different perspectives—individual versus social—and with different framing effects—time discounting and distributional preferences. Consequently, the difference between PTO and TTO is likely to reflect more than just severity. Our test is that differences in PTO are a function, inter alia, of changes in TTO and the severity of the worst state in which a person may be left.

The study used health state descriptions and data from the descriptive system of the Assessment of Quality of Life (AQoL-II) project [24–28]. The health state descriptions formed by this instrument have a very high level of psychometric integrity and represent a valid and reliable description of critical health states. In the second stage of the AQoL project the instrument was scaled and, as part of the validation process, TTO and PTO scores were obtained for 18 multi-attribute health states from respondents selected to represent the Australian socio-economic status (SES) and demographic profile. TTO data were collected in the conventional way. Using a slide board as a visual aid, subjects were asked to select between 10 years in the relevant health state and a reduced number of years in full health. The latter were ‘flip flopped’ until the subject believed the value of the reduced years was equivalent to the value of 10 years in the health state. TTO scores were obtained by dividing this number of years by 10.

PTO scores were obtained as shown in Fig. 3. Using a visual aid, subjects were asked to select between two programs, P1 and P2. Program 1 would save the life of 100 patients and return them to full health. Program 2 would save the life of x patients and leave them in the health state of interest, Us. The value of x was varied until the two programs appeared to be of equal value. ‘Social utility’ scores (i.e. value obtained from a social perspective) were obtained from the equation Us = 100/x.
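The two scoring rules just described (TTO score = equivalent years in full health divided by 10; social utility Us = 100/x) can be written out as a minimal sketch. The function names and the example responses are hypothetical and serve only to show the arithmetic.

```python
def tto_score(equivalent_years, horizon_years=10.0):
    """TTO utility: years in full health judged equal to `horizon_years` in the state."""
    if not 0 <= equivalent_years <= horizon_years:
        raise ValueError("equivalent_years must lie between 0 and the horizon")
    return equivalent_years / horizon_years


def pto_score(x, reference_patients=100.0):
    """PTO 'social utility': the reference group size divided by the equivalent number x."""
    if x <= 0:
        raise ValueError("x must be positive")
    return reference_patients / x


# Hypothetical responses: 6 years in full health judged equal to 10 years in
# the state, and 125 patients left in the state judged equal to 100 patients
# returned to full health.
example_tto = tto_score(6)
example_pto = pto_score(125)
print(example_tto, example_pto)
```

Because x appears in the denominator, very large stated values of x drive the social utility score towards zero, which is why implausibly large responses matter for the PTO.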

Fig. 3 Health gain, same start points

These PTO data all related, as described above, to programs commencing at death (without treatment). To test the study hypothesis, different combinations of programs were compared. This had a twofold advantage. First, it increased the number of observations which could be obtained from a single individual. Secondly, it allowed the observation of marginal improvements in health from two alternative programs. As data were not collected for the full 18 health states from all subjects, it was not possible to construct all combinations of health state movements. In total, the study constructed 36 comparisons of programs which left patients at different levels of severity. An average of 22.75 observations was obtained for each comparison, giving a total of 819 individual observations.

As described, the importance of severity was tested by econometrically regressing individual values for health changes upon the health state improvement measured by the TTO scores (TTOi–TTOj) and upon the severity (disutility) of the poorer health state in which a patient would remain without further treatment (1–TTOj). A power function was used as shown in Eq. 1. This was selected as a flexible functional form with the required property that the equation must pass through the points (0,0) and (1.00, 1.00).

$$ \text{PTO}_{ij} = \left( \text{U}_{i} - \text{U}_{j} \right)^{\alpha} \cdot \text{DU}_{j}^{\beta} \tag{1} $$

If social and individual assessments were identical then α = 1 and β = 0. With α < 1, β = 0 the PTO would give additional weight to smaller health improvements, which would be consistent with the greater scores given by PTO to life saving programs or programs removing people from severe health states. But it is inconsistent with the observation that PTO gives smaller, not larger, scores to programs giving incremental improvements to patients in less severe health states. The study hypothesis would result in α < 1 and 0 < β < 1, so that, for a given health gain, social value rises with the severity (DU) of the poorer health state. Small health improvements where health states are severe would have amplified importance. Small health improvements nearer full health would have relatively smaller effects, as determined by the relative magnitude of α and β.
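The behaviour of the power function in Eq. 1 can be explored with a short sketch. It assumes DU_j = 1 − U_j, as in the description of severity above; the function name and the exponents of 0.5 are illustrative placeholders, not the estimates reported in Table 6.

```python
def social_value(u_i, u_j, alpha, beta):
    """Eq. 1: PTO_ij = (U_i - U_j)**alpha * DU_j**beta, assuming DU_j = 1 - U_j."""
    if not 0.0 <= u_j <= u_i <= 1.0:
        raise ValueError("require 0 <= u_j <= u_i <= 1")
    gain = u_i - u_j          # health improvement measured by the TTO
    severity = 1.0 - u_j      # disutility of the poorer health state
    return gain ** alpha * severity ** beta


# Identical social and individual assessments (alpha = 1, beta = 0):
# the social value is simply the utility gain.
identical = social_value(0.4, 0.2, alpha=1.0, beta=0.0)

# Placeholder exponents below unity: the same 0.2 gain is worth more when the
# poorer state is severe (U_j = 0.2) than when it is mild (U_j = 0.7).
severe = social_value(0.4, 0.2, alpha=0.5, beta=0.5)
mild = social_value(0.9, 0.7, alpha=0.5, beta=0.5)

# The function passes through (1.00, 1.00): death to full health has value 1.
endpoint = social_value(1.0, 0.0, alpha=0.5, beta=0.5)

print(identical, severe, mild, endpoint)
```

With any positive β, holding the gain fixed and increasing DU raises the social value, which is the severity effect the hypothesis describes.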

Results

Demographic characteristics are summarised in Table 5. A response rate of 41.7% was obtained from the 1,030 possible respondents. Compared with the Australian population a disproportionate number of respondents had a tertiary degree. Otherwise the sample characteristics satisfactorily reflected those of the general population.

Table 5 Scaling survey: respondents and response rate

Results of the econometric analysis are reported as Eqs. 1 and 2 in Table 6. Both OLS and random effects models were employed, with the latter taking account of the clustering of observations on individuals. For both models, coefficients on health improvement and severity were significant at the 1% level. The α and β coefficients were, as hypothesised, less than unity. Taking account of individual clustering (Eq. 2), the random effects model assigned greater importance to severity, with a corresponding reduction in the importance of health improvement.
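The estimation procedure is not spelled out here, so the following is only a sketch of one common way to fit a power function by OLS: take logarithms, giving ln PTO = α ln(Ui − Uj) + β ln(DUj) with no intercept, and solve the least-squares normal equations. The log-linearisation, the function name and the synthetic, noise-free data are all assumptions for illustration; the random effects model is not reproduced.

```python
import math


def fit_power_model(gains, severities, pto_values):
    """Least-squares (alpha, beta) for ln PTO = alpha*ln(gain) + beta*ln(DU).

    All inputs must be strictly positive so that the logarithms exist.
    """
    x1 = [math.log(g) for g in gains]
    x2 = [math.log(d) for d in severities]
    y = [math.log(p) for p in pto_values]
    # Normal equations for a two-regressor model without an intercept.
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    alpha = (s1y * s22 - s2y * s12) / det
    beta = (s2y * s11 - s1y * s12) / det
    return alpha, beta


# Synthetic, noise-free data generated from assumed exponents (not study estimates).
TRUE_ALPHA, TRUE_BETA = 0.6, 0.4
pairs = [(u_low, gain) for u_low in (0.1, 0.3, 0.5) for gain in (0.1, 0.2, 0.4)]
gains = [gain for _, gain in pairs]
severities = [1.0 - u_low for u_low, _ in pairs]
pto_values = [g ** TRUE_ALPHA * d ** TRUE_BETA for g, d in zip(gains, severities)]

alpha_hat, beta_hat = fit_power_model(gains, severities, pto_values)
print(round(alpha_hat, 3), round(beta_hat, 3))
```

Omitting the intercept keeps the fitted curve passing through (1.00, 1.00), matching the property required of the functional form.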

Table 6 Model: PTO = (U1 − U2)^α · (DU2)^β

One criticism of the PTO technique used in this study is that some subjects may treat large numbers metaphorically, not literally (e.g. 1,000 means ‘many’; 10,000 means ‘very many’) [29]. Implausibly large values of x in the calculation of utility (100/x) will result in implausibly low utility scores. As a consequence, a second analysis was carried out to test the sensitivity of results to this possible problem. Values that were more than 0.4 below the resulting mean utility score for a health state were deleted, together with every other observation for that individual. The stringency of this criterion reduced the number of individual observations to 237.
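The censoring rule can be sketched as follows. The data layout (respondent, health state, score), the function name and the example values are hypothetical, and the sketch assumes the state means are computed once on the uncensored data.

```python
from collections import defaultdict


def censor(observations, threshold=0.4):
    """Drop every respondent with any score more than `threshold` below the state mean.

    `observations` is a list of (respondent_id, health_state, score) tuples.
    """
    by_state = defaultdict(list)
    for _, state, score in observations:
        by_state[state].append(score)
    state_mean = {s: sum(v) / len(v) for s, v in by_state.items()}

    # A single outlying value flags the respondent, so all of that
    # respondent's observations are removed, not only the outlier.
    flagged = {rid for rid, state, score in observations
               if score < state_mean[state] - threshold}
    return [obs for obs in observations if obs[0] not in flagged]


# Hypothetical scores: respondent r3 gives an outlying value for state A only.
data = [
    ("r1", "A", 0.80), ("r2", "A", 0.75), ("r3", "A", 0.10),
    ("r1", "B", 0.60), ("r2", "B", 0.55), ("r3", "B", 0.50),
]
kept = censor(data)
print(len(kept), sorted({rid for rid, _, _ in kept}))
```

In this example r3's plausible score for state B is discarded along with the outlier for state A, which illustrates why the rule removes so many observations.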

Results of the econometric analyses of the censored dataset are reported in equations 3 and 4 of Table 6.

Contrary to expectation, the results were not particularly sensitive to the editing of data. With and without censoring, both the improvement in health (the treatment effect), U1 − U2, and the initial severity of the health state, DU2, were significant, with the coefficient on the former falling marginally with data censoring and the coefficient on the latter increasing. Wald and R² summary statistics indicate that in the reduced dataset results also have high explanatory power.

Results from equation 4, which is the theoretically most reliable result, were used to generate value scores for a range of health gains and initial severity levels, as indicated in Table 7. The importance of severity may be seen by reading down the columns. For example, from column 1, a health gain of 0.2 has a social value of 0.22 if the initial severity (DU) is 0.2. The same health gain is worth 0.31 with an initial severity of 0.4 and 0.47 if the initial severity is 1.0 (i.e. the patient would otherwise have died).

Table 7 Social value by utility gain and severity

The effect of severity is highlighted in Table 8, which takes the ratio of the change in social value to the change in utility. Reading down the same column, a gain of 0.2 due to health improvement has rapidly increasing social value as the severity of the initial health state increases.
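The ratio calculation can be illustrated with the three social values quoted in the text for a utility gain of 0.2 (0.22, 0.31 and 0.47 at initial severities of 0.2, 0.4 and 1.0). The variable names are illustrative, and only these quoted figures are used, not the full tables.

```python
# Social values for a utility gain of 0.2, keyed by initial severity (DU),
# as quoted in the text.
utility_gain = 0.2
social_values = {0.2: 0.22, 0.4: 0.31, 1.0: 0.47}

# Ratio of social value to utility gain at each severity level.
ratios = {du: round(value / utility_gain, 2) for du, value in social_values.items()}

for du, ratio in ratios.items():
    print(f"initial severity {du}: ratio {ratio}")
```

The implied ratios are 1.1, 1.55 and 2.35, so the same 0.2 gain carries more than twice its individual utility value when the patient would otherwise have died.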

Table 8 Ratio change in social value to utility gain

Discussion

The results of the present study indicate that differences in the social valuation of programs as measured by the PTO are related to the value of the poorer health state in the comparison as measured by the TTO. One plausible explanation of this, consistent with observations elsewhere, is that there is a social disutility associated with patients remaining in severe health states and that, for reasons discussed in the literature, the effect is additional to the effect of health improvement per se as judged by the individual. The hypothesis implies that there will be a social benefit if an individual is taken out of a severe health state and a social loss if they are left in a severe health state. The present study tests this second implication. If the severity effect is subject to diminishing returns with respect to health, then the severity effect will rise disproportionately as the health state a person is left in deteriorates. Therefore, when a pair of programs is compared, the difference between the social and the personal valuation rises as health declines. The present study uses this result to test the hypothesis. The differences in the social valuation of pairs of programs are compared with the level of health. If the severity effect is subject to diminishing returns with respect to health, then, after standardising for the magnitude of the change, programs offering similar health gain will be more highly valued when health is poor.

Despite the different form of the test the fundamental hypothesis is the same as in other studies: people do not wish others to be in a severe health state. The results here suggest that it is not the reason for their being in the severe state that is of importance—whether the severe state occurs before or after a health program. Rather it is people ‘not being in the severe state’ which is valued.

As with the results of any statistical analysis, our hypothesis is only one of several that might explain the data. Despite their long use, the properties of the TTO and PTO are not fully documented. Results may be due to some unknown property of the particular PTO instrument used, to a framing effect, or to systematic misunderstanding of the question. However, the last explanation, in particular, is unconvincing. We applied an extremely stringent criterion for accepting data in the second analysis, and experienced interviewers reported a satisfactory level of comprehension by those interviewed.

A further possible problem is described by Nord et al. [30], who argue that changes in health states, as measured by conventional instruments like the TTO, may not have the interval property that is claimed for them. In particular, two changes in health may be valued differently from the value implied by the conventional numbers because people with unequal health potential may have varying aspirations and reference points. This argument may well be true, as it is very likely that people’s interest in health improvement will vary with their circumstances. As argued in Richardson and McKie [31] and more explicitly in Richardson et al. [32], orthodox theory implicitly assumes that disembodied health benefits may be redistributed to compensate losers (the Kaldor-Hicks criterion). As this is untrue, the social or individual benefits of health gains that are embodied in a particular individual become subject to a range of other considerations. However, in the present context, the pertinent point is that individual utility is currently measured by the TTO and SG. The present results suggest that, as measured, health improvement must be adjusted by the severity of health states to obtain correct social valuations. The interpretation of this for policy is, of course, open to discussion.

A third possible interpretation of the results sees a more significant role for respondents’ preferences regarding potential: that is, respondents believe it is important for others to realise their potential for health improvement, even if this is limited. The idea that it is unfair to give a lower priority to those with a limited capacity for improvement (e.g. the disabled and chronically ill) is well known in the philosophical literature [33–35], and empirical studies of public preferences provide further support for it, in the context of both life-saving and life-enhancing treatment [36–41]. What the present study shows, on this interpretation, is that the importance of potential is subject to diminishing marginal returns as severity decreases; conversely, the value of observing others achieve their potential increases disproportionately as the health state of the person deteriorates. While this possibility has been raised in other studies [37, 41], it has never been clearly demonstrated.

As with severity, it would be a mistake to think that the social benefit of potential is simply added to the private benefit of health improvement (and therefore explains why PTO exceeds TTO scores). This would be inconsistent with the data. As both TTO and PTO scales vary from 0.00 (death) to 1.00 (best health), the greater PTO values in the region of poor health imply a smaller, not greater, benefit from programs that treat patients in the regions of good health. Consistent with this, improvements in PTO scores are typically smaller than improvements in TTO utilities for programs treating patients in moderate to good health states.

A synthesis of these two interpretations is that PTO social values are a function of TTO individual utilities and the effect of potential and severity. One point in favour of this interpretation is that respondents in the PTO exercise were presented with the comparisons in Fig. 3, in which the patients who will benefit from program P2 (the program being evaluated) have less capacity to benefit than those in the invariant program, P1. It was not explained to respondents, and is irrelevant for present purposes, whether the patients who will benefit from P2 are already in state Us and will be returned to this state after life-saving treatment, or whether Us will begin after treatment [on this distinction, see 38]. On this interpretation, the results reported here arise from the combined effect of potential and severity: people derive value from seeing others achieve their potential, but this varies with the severity of the health state involved. The importance of our results, however, is not diminished by the alternative explanation, as both the ‘severity hypothesis’ and the ‘potential hypothesis’ are in conflict with the usual view that the value of a program can be measured by health improvement alone.

The results add further weight to the evidence already reported that members of the public have an aversion to the health inequalities implied by individuals remaining in serious health states. The results are consistent with widely recognised ethical perceptions. Indeed, Rawls has been criticised for not allowing natural inequalities, such as those arising from health status, to be a factor in determining who is worst off and therefore more deserving of compensation. For example, Kymlicka comments: ‘According to Rawls, people born into a disadvantaged class or race not only should not be denied social benefits, but also have a claim to compensation because of that disadvantage. Why treat people born with natural handicaps any differently? Why should they not also have a claim to compensation for their disadvantage?’ [42, pp. 72–73]. Green makes a similar point, arguing that health care is in fact a social good: ‘Access to health care is not only a social primary good, but possibly one of the most important such goods … [because] disease and ill health interfere with our happiness and undermine our self-confidence and self-respect’ [43, p. 117]. As we have seen, a growing body of evidence indicates widespread support for the ethical principle that, other things being equal, extra weight should be given to treatments that benefit those who are more severely ill.

Neither the results of the present study, nor those surveyed earlier, mandate that severity must be incorporated into health policy. Even leaving aside the few studies that find little support for severity, to assume that, because something is the case (because a number of studies have detected widespread support for a policy or principle), it ought to be the case (the policy or principle should be implemented) is to commit the ‘naturalistic fallacy’ [44]. It is to assume that normative conclusions can be derived directly from factual premises. On the contrary, the results of public consultation exercises must always remain open to ethical criticism, and no normative conclusions can be deduced from purely empirical enquiries.

This being said, the results of empirical inquiries are relevant, and a strong case can be made for applying a severity-based equity weighting to the value of health state improvements for the purposes of setting priorities in the health sector [45]. The present results suggest that the results of well-conducted studies will neither give overriding importance to severity nor neglect it entirely. They will represent a trade-off between health maximisation (efficiency) and giving priority to the more severely ill (equity). In liberal democracies, this can be seen as a socially acceptable compromise given the range of moral views in the community, some of which will tend to support more efficient policies and others more egalitarian ones. This does not gainsay the impossibility of deriving ethical conclusions from empirical data. Rather, it represents a reasonable compromise given the persistence of ethical disagreement.

Conclusion

There is a growing body of empirical evidence concerning severity of illness, most of which indicates that it is a significant factor for many people when allocating limited health care. When asked to judge for others, that is, when asked to adopt a social perspective, many respondents systematically re-weight individual patient preferences to take account of the severity of a patient’s health state. As a result, higher priority will be accorded to programs which allow patients to avoid being in severe conditions. This implies higher priority for some programs than would be obtained from an individual’s assessment of health improvement alone. This reflects a social judgement about the distribution of health benefits, about what is fair. Our results therefore suggest that the notion of ‘effectiveness’ in cost-effectiveness analyses should be extended to encompass not just the magnitude of health improvement but also a weighting of expected benefits to reflect the severity of the condition. In practice, this implies additional priority for programs treating patients in severe health states. Similarly, the values attached to health states by multi-attribute utility instruments (MAUs) need to be modified to account for the severity of the pre-treatment condition [46]. Our results are not general, which implies that ‘further empirical research is needed to determine the appropriate transformation functions with reasonable precision’ [47]. However, our analysis and results are robust and plausible, and they strengthen the case for independently including the importance of severity in economic evaluation studies.