Key Points for Decision Makers

The results revealed that interventions for younger patients were strongly preferred under both survey methods. Interventions for treatment and for severe disease states were also preferred when healthcare technologies had the same cost per quality-adjusted life-year (QALY).

To adjust the decision maker’s threshold according to people’s preferences, the preference-adjusted threshold (PAT) was calculated from societal preferences obtained in a discrete-choice experiment (DCE). This may help adjust the cost per QALY threshold quantitatively rather than qualitatively.

1 Introduction

Quality-adjusted life-years (QALYs) are frequently used as the unit of outcome in economic evaluations, with incremental cost-effectiveness ratios (ICERs) used as efficiency indicators. Therefore, calculating the cost per QALY helps assess the efficiency of healthcare technologies.

The principle of QALYs is sometimes stated as “A QALY is a QALY is a QALY” [1], which means that all QALYs have the same value. However, because the QALY cannot fully assess the value of healthcare technologies, other important factors such as clinical, social, and ethical parameters are also considered when making judgments regarding cost effectiveness. For example, the UK’s National Institute for Health and Care Excellence (NICE) recognizes that special circumstances such as disease severity, end-of-life situations, stakeholder interests, significant innovations, and pediatric patients should be taken into consideration [2] in addition to the cost per QALY.

Even if consideration of additional factors improves decision making based on the cost per QALY, the manner in which some factors are weighted remains unclear. Some decision makers try to decrease the ambiguity of these criteria and reflect social values. For example, NICE set up the “Citizens Council,” which consists of 30 people from the general population. It provides “a public perspective on overarching moral and ethical issues that NICE has to take account of when producing guidance” [3]. The “end-of-life” [4, 5] rule was accepted primarily because the Citizens Council and the general public supported it. In contrast, application of the “rule of rescue” [6, 7] is denied while recognizing that it is empirically supported by a “powerful human impulse” (see also the experience in Oregon, USA [8]).

It is important to use empirical evidence to show which factors are publicly supported as priority-setting criteria; however, few studies have compared preferences for interventions with the same cost per QALY (although some included efficiency or outcomes as experimental attributes). The primary objective of this study was to investigate societal preferences regarding decision making in addition to the cost per QALY. Budget allocation surveys are classically performed for this purpose, but discrete-choice experiments (DCEs) [9, 10] have also been widely used to survey public preferences for setting priorities [11–17]. DCE surveys can provide information about which attributes influence people’s choices and how much each attribute relates to these preferences. The present study compared DCE results with those of a simple budget allocation experiment to consider the difference in preference using two different survey methods.

The second objective was to consider how to use preference information for more transparent decision making. In general, the cost per QALY threshold of cost-effectiveness analysis is based not only on people’s preferences (from the perspective of the demand side) but also on other factors, such as the healthcare budget (from the perspective of the supply side). However, it is important to reflect these preferences in decision making because tax (or social insurance fee) payers generally have a say in deciding the allocation of medical resources. To promote public acceptance of decision making based on cost-effectiveness analysis, it may be important to reflect not only the preferences of decision makers but also the preferences of the general public (e.g., the end-of-life rule in the example of NICE). Although these factors are normally considered qualitatively in many cases, a numerical method is also needed for more transparent and predictable decision making. The concept of a preference-adjusted threshold (PAT) is introduced to combine the pre-determined cost per QALY thresholds and people’s preferences, especially in cases that exhibit a range of thresholds.

2 Methods

2.1 Survey Design

Public preferences for interventions were surveyed using two different methods: budget allocation (single attribute) and DCE (multi attribute). In the budget allocation experiment, each characteristic of an intervention or disease was independently presented and respondents were asked to allocate a limited budget between two different populations. In the DCE, attributes were combined into one health state and respondents were asked to choose the one they thought should be prioritized. The DCE yields only ordinal preference information, whereas budget allocation can elicit cardinal scores. Respondents were allowed to allocate equal budgets to different interventions in the budget allocation experiment, but were asked to choose one preferred option in the DCE, similar to other studies.

The attributes included were identified after reviewing published literature on budget allocation and DCEs. According to a systematic review examining priority setting [18], 12 attributes have been considered in previous studies: health gain (e.g., QALYs), disease severity, age, socioeconomic status, carer status, cause of disease (lifestyle or hereditary), prior medical care, availability of effective alternatives, cost or cost-effectiveness of treatment, disease prevalence, equality, and waiting times. This study targeted priority setting using cost-effectiveness analysis, which can then be applied to pricing, reimbursement, clinical guidelines, or other decision-making processes. The following six attributes were considered in this study: (a) age [11, 13, 15–17, 19–22]; (b) objective of care; (c) disease severity [11–15, 17, 20–24]; (d) prior medical care [11, 21, 25]; (e) cause of disease (lifestyle or hereditary) [11, 15, 16]; and (f) disease prevalence [17, 21].

2.2 Budget Allocation: Single-Attribute Experiment

In the single-attribute experiment, approximately 1000 respondents were asked via a web-based survey how they would allocate a medical budget (10 million Japanese yen [¥]) to two groups (interventions A and B) from the societal perspective [26]. Respondents were randomly sampled from the largest web panel in Japan and stratified by age and sex.

Six questions were prepared to address the attributes included in the study: (a) age (young or elderly patients); (b) objective of care (treatment or prevention); (c) disease severity (severe or mild); (d) prior medical care (yes or no); (e) cause of disease (lifestyle or hereditary); and (f) disease frequency (rare or common). Respondents chose the most preferred alternative from seven different budget allocations (Table 1), although all options allowed treatment for the same number of patients under the same budget. The budget for the disease was limited to a constant that was not sufficient for all patients to receive intervention A or B under public health insurance coverage. Respondents were asked to choose their preferred alternative within the context of a societal perspective, where respondents were asked to assume the role of decision makers in the central or local government.

Table 1 Budget allocation

All six questions on resource allocation were shown to each respondent. Each question included only one attribute and was not combined with any of the others considered in this experiment. The following is an example of a question regarding age (a):

There are two treatment options, as shown below. However, due to a limited budget, not all patients can pay for treatment with public health insurance. If you are a government decision maker, how would you allocate the healthcare budget to these two treatments?

Intervention A: Subjects are elderly patients with a disease. They currently have no symptoms, but will die within 1 year if they receive no intervention. Intervention A can extend life by 1 year with good health status.

Intervention B: Subjects are young patients with a disease. They currently have no symptoms, but will die within 1 year if they receive no intervention. Intervention B can extend life by 1 year with good health status.

Other questions were asked in the same manner, with patient and disease characteristics changed. Regarding the objective of care, we compared interventions for people with a 10 % risk of developing a disease (b1) and patients with a disease (b2). Since the cost of the former intervention (b1) is one-tenth that of the latter (b2), both interventions have the same efficiency. In the question regarding severity, we asked respondents to allocate the budget to two interventions for mild (c1) and severe (c2) disease states. We assumed that both interventions would lead to an improvement of the score by 10 points, such that the former intervention (c1) could improve mild patients from 70 points (out of 100) to 80 points, and the latter (c2) could improve severe patients from 20 to 30 points. For prior medical care, we compared interventions involving an additional treatment for a disease for which the patient had received prior treatment (d1) or a new treatment for a disease for which no other treatment options existed (d2). We also surveyed preferences for interventions for lifestyle (e1) or hereditary diseases (e2), as well as interventions for a disease with a high incidence rate of 1/100 (common disease, f1) or a low incidence rate of 1/100,000 (rare disease, f2).
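The equal-efficiency statement for prevention (b1) versus treatment (b2) can be checked with a short calculation. As a sketch only, assume (these symbols and simplifications are not in the original) that treating one patient costs \( c \) and yields a health gain \( g \), that the preventive intervention costs \( c/10 \) per person at risk, and that it fully averts the disease in the 10 % who would otherwise develop it:

$$ \left. \frac{\text{cost}}{\text{expected gain}} \right|_{b1} = \frac{c/10}{0.10 \cdot g} = \frac{c}{g} = \left. \frac{\text{cost}}{\text{gain}} \right|_{b2} $$

Under these assumptions the cost per unit of health gained is identical for the two interventions, which is what allows respondents' choices to be read as preferences rather than as responses to differing efficiency.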

2.3 Discrete-Choice Experiment (DCE): Multi-Attribute Experiment

Following the budget allocation experiment, we conducted a multi-attribute DCE via a door-to-door survey to measure public preferences. Approximately 1000 respondents were randomly sampled from 100 sites (municipalities) in Japan. The method used to select the 100 sites was as follows: first, the number of sites in each of the eight regions was calculated in proportion to the population of each region. Then, in every region, the number of sites belonging to each stratum (prefecture × size of municipality) was calculated based on the population of each stratum. The Basic Resident Register was used to randomly select respondents from the selected sites. In Japan, each municipality has its own Basic Resident Register, which includes the name, sex, address, and date of birth of all residents. After obtaining approval from each municipality, we collected registered data. Selected respondents were stratified by sex and age. Investigators visited the respondents at their homes, hand delivering the questionnaires. They collected the questionnaires a few days later and checked for apparent errors.

The profiles of hypothetical patients included four of the same attributes used in the budget allocation experiment: (a) age; (b) objective of care; (c) disease severity; and (d) prior medical care. We adopted binary levels for each attribute. Attributes (e) and (f) were excluded from this experiment because it is difficult to combine (e) cause of disease and (f) disease prevalence with the other attributes while assuming the same cost per QALY (e.g., it is not feasible to imagine young people with lifestyle diseases). Descriptions of attributes were similar to those in the first experiment.

The four attributes were orthogonally combined to construct \( 2^{4} = 16 \) patient profiles. All assumed interventions had the same ICER (the cost per QALY) because the cost of treatment was the same as that for improving one patient’s health status by the same points. Two medical doctor co-authors checked the descriptions in Japanese and concluded that the 16 profiles represented appropriate disease states. The feasibility of this survey was confirmed by a small number of people from the general population.

Respondents were randomly assigned to a pair of the 16 profiles (16 × 15/2 = 120 patterns). Descriptions of two disease states based on the assignment were provided, and respondents were asked which patient they thought should preferentially receive treatment from a societal point of view, given limited medical resources. One example of a health state profile is shown below:

  • X is a young patient with a certain disease.

  • Although he is currently fine, his health condition will deteriorate to 20 out of 100 points within a year (a lower score indicates a more severe condition).

  • With treatment, his health will improve to 30 points.

  • He has received several treatments to control disease symptoms.

  • Treatment costs are ¥100,000, i.e., ¥100,000 will be needed to recover health by 10 points.

The number of questions in the questionnaires had to be limited; therefore, each respondent was shown only one of the 120 patterns. The full questionnaire comprised one of the profile pairs in addition to other questions regarding health-related quality of life (HR-QOL), such as the EQ-5D and SF-36, the results of which are not reported in this paper.
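The profile construction and pair assignment described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' survey software: the attribute labels, the helper name `assign_pair`, and the use of a seeded random generator are assumptions made for the example.

```python
import random
from itertools import combinations, product

# Four binary attributes, as used in the DCE (labels are illustrative).
ATTRIBUTES = {
    "age": ("elderly", "younger"),
    "objective_of_care": ("prevention", "treatment"),
    "severity": ("mild", "severe"),
    "prior_care": ("yes", "no"),
}

# Every combination of levels: 2 ** 4 = 16 patient profiles.
profiles = [
    dict(zip(ATTRIBUTES, levels)) for levels in product(*ATTRIBUTES.values())
]

# Every unordered pair of distinct profiles: 16 * 15 / 2 = 120 patterns.
pairs = list(combinations(range(len(profiles)), 2))


def assign_pair(rng: random.Random) -> tuple[int, int]:
    """Pick one of the 120 patterns at random for a single respondent."""
    return rng.choice(pairs)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the example is repeatable
    first, second = assign_pair(rng)
    print(len(profiles), len(pairs))
    print(profiles[first])
    print(profiles[second])
```

Because each respondent sees a single pair, all information about trade-offs between attributes comes from variation across respondents rather than from repeated choices by the same person.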

2.4 Statistical Analysis

Results from the single-attribute experiment were obtained as average scores and distributions of responses. The choice number was treated as its score, which was interpreted as the degree of preference. In addition, multivariate three-level logistic regression (1, 2, 3 vs. 4 vs. 5, 6, 7) analysis was used to examine the relationship between score and demographic variables.
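The scoring and three-level grouping described above can be illustrated as follows. This is a sketch under stated assumptions: the response values are invented for the example, and the function name `collapse` and the level labels are hypothetical; the regression itself is not shown.

```python
from collections import Counter
from statistics import mean


def collapse(choice: int) -> str:
    """Map a budget-allocation choice (1-7) to the three regression levels."""
    if not 1 <= choice <= 7:
        raise ValueError("choice must be between 1 and 7")
    if choice <= 3:
        return "1-3"
    if choice == 4:
        return "4"  # equal budget allocation
    return "5-7"


# Hypothetical responses to one question, for illustration only.
responses = [4, 6, 7, 4, 2, 5, 4]

# The choice number is treated directly as the score.
average_score = mean(responses)

# Distribution over the three collapsed levels.
levels = Counter(collapse(choice) for choice in responses)

if __name__ == "__main__":
    print(round(average_score, 2), dict(levels))
```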

Discrete-choice data were analyzed using the conditional logistic model based on the random utility theory. We used the following equation as the utility function (\( V_{i} \)) of disease state i (Eq. 1):

$$ V_{i} = \beta_{1} \cdot {\text{age}}_{i} + \beta_{2} \cdot {\text{object}}\_{\text{of}}\_{\text{care}}_{i} + \beta_{3} \cdot {\text{severity}}_{i} + \beta_{4} \cdot {\text{prior}}\_{\text{care}}_{i} + \varepsilon $$
(1)

(Age: younger = 1, elderly = 0; object of care: treatment = 1, prevention = 0; severity: severe = 1, mild = 0; and prior care: no = 1, yes = 0.)

Assuming that ε (error term) follows a type I extreme value distribution, the probability (P) of choosing a given intervention over the alternative can be written in terms of ΔV, the difference in utility between the two presented profiles (Eq. 2):

$$ P\; = \;\frac{1}{1 + \exp ( - \Delta V)}. $$
(2)

This was used for the estimation of unknown parameters (β). Analyses were first performed without including interactions of demographic attributes. Statistically significant demographic interactions were added to the conditional logistic model to examine their influence on preference.
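To make Eqs. 1 and 2 concrete, the following sketch computes the deterministic part of the utility for two profiles and the resulting choice probability. The coefficient values are hypothetical placeholders chosen for illustration; they are not the estimates reported in Table 6, and the function names are assumptions of this example.

```python
import math

# Hypothetical coefficients for illustration only (NOT the Table 6 estimates).
BETA = {
    "age": 1.0,                # younger = 1, elderly = 0
    "objective_of_care": 0.5,  # treatment = 1, prevention = 0
    "severity": 0.5,           # severe = 1, mild = 0
    "prior_care": 0.1,         # no = 1, yes = 0
}


def utility(profile: dict[str, int], beta: dict[str, float] = BETA) -> float:
    """Deterministic part of Eq. 1: sum of coefficient times dummy-coded level."""
    return sum(beta[name] * profile[name] for name in beta)


def choice_probability(profile_a: dict[str, int], profile_b: dict[str, int],
                       beta: dict[str, float] = BETA) -> float:
    """Eq. 2: probability that profile A is chosen over profile B."""
    delta_v = utility(profile_a, beta) - utility(profile_b, beta)
    return 1.0 / (1.0 + math.exp(-delta_v))


young_severe = {"age": 1, "objective_of_care": 0, "severity": 1, "prior_care": 0}
elderly_mild = {"age": 0, "objective_of_care": 0, "severity": 0, "prior_care": 0}

p_young_severe = choice_probability(young_severe, elderly_mild)

if __name__ == "__main__":
    print(round(p_young_severe, 3))
```

With only two alternatives per choice set, the conditional logit probability reduces to this binary logistic form, so the two probabilities for a pair always sum to one and identical profiles give a probability of 0.5.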

2.5 Preference-Adjusted Threshold (PAT)

This estimated information could be used to clarify decision making based on cost-effectiveness analysis. Some countries use a threshold range (e.g., from £20,000 to £30,000 or £50,000 per QALY in the UK), which may be implicit, rather than a point threshold. In the Guide to the Methods of Technology Appraisal 2013 by NICE [33], if the ICER exceeds £20,000/QALY, the following factors are considered: the degree of certainty, inadequate capture of change in quality of life (QOL), innovative nature of technology, end-of-life situation, and non-health objectives of the National Health Service (NHS). Normally, when the ICER of a new intervention lies between the lower and upper bounds of the threshold, its cost effectiveness is assessed considering factors (clinical, social, and ethical) other than the cost per QALY. It is important to flexibly decide the cost effectiveness of interventions, but at some point this may lack transparency in the decision making. One way to improve decision making based on the cost per QALY is adjustment of the threshold that reflects people’s preferences.

To reflect people’s preferences in a pre-determined threshold range, latent utility was converted to a threshold scale for obtaining PAT. In this study, we considered characteristics of the disease and/or patients, not technical issues, on the cost-effectiveness analysis, such as the degree of uncertainty or the inadequate capture of QOL. Because these factors are measured separately from people’s preference, they should also be considered separately from PAT (Eq. 3).

$$ {\text{PAT}}(V) = T_{L} + \frac{{T_{U} - T_{L} }}{{\hat{V}_{\hbox{max} } - \hat{V}_{\hbox{min} } }}(V - \hat{V}_{\hbox{min} } ) \, $$
(3)

where \( \hat{V}_{\hbox{min} } \;[\hat{V}_{\hbox{max} } ] \) is the estimated minimum [maximum] utility among a presented profile set and \( T_{U} \;[T_{L} ] \) indicates the upper [lower] limit of the threshold. Therefore, the marginal PAT (MPAT) of the attribute is calculated as (Eq. 4):

$$ {\text{MPAT}} = \frac{{T_{U} - T_{L} }}{{\hat{V}_{\hbox{max} } - \hat{V}_{\hbox{min} } }}\hat{\beta }_{i} \, . $$
(4)

If the coefficient of V is not constant and a logarithmic function is applied, PAT can be calculated as (Eq. 5):

$$ {\text{PAT}}(V) = T_{L} + \log_{\gamma } (V - \hat{V}_{\hbox{min} } + 1) $$
(5)

where

$$ \gamma = \left( \hat{V}_{\hbox{max} } - \hat{V}_{\hbox{min} } + 1 \right)^{1/(T_{U} - T_{L} )} $$

We calculated MPATs and PATs using \( T_{U} \) = ¥10 million and \( T_{L} \) = ¥5 million (US$42,000; US$1 = ¥120 as of October 2015) per QALY, although this is not an official threshold in Japan.
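Equations 3 to 5 can be sketched in Python as below. The threshold bounds (in millions of yen per QALY) come from the text; the utility range and the example utility value are hypothetical, and the function names are assumptions of this sketch. The logarithmic form is written as a ratio of natural logarithms, which is algebraically equivalent to the base-γ logarithm in Eq. 5 and avoids computing γ explicitly.

```python
import math


def pat_linear(v: float, v_min: float, v_max: float,
               t_l: float, t_u: float) -> float:
    """Eq. 3: map utility v linearly onto the threshold range [t_l, t_u]."""
    return t_l + (t_u - t_l) / (v_max - v_min) * (v - v_min)


def mpat(beta_i: float, v_min: float, v_max: float,
         t_l: float, t_u: float) -> float:
    """Eq. 4: change in the linear PAT attributable to one attribute."""
    return (t_u - t_l) / (v_max - v_min) * beta_i


def pat_log(v: float, v_min: float, v_max: float,
            t_l: float, t_u: float) -> float:
    """Eq. 5: log_gamma(x) = (t_u - t_l) * ln(x) / ln(v_max - v_min + 1)."""
    scale = (t_u - t_l) / math.log(v_max - v_min + 1.0)
    return t_l + scale * math.log(v - v_min + 1.0)


# Threshold bounds from the text, in millions of yen per QALY.
T_L, T_U = 5.0, 10.0
# Hypothetical utility range and value, for illustration only.
V_MIN, V_MAX, V_EXAMPLE = 0.0, 2.0, 1.0

if __name__ == "__main__":
    print(pat_linear(V_EXAMPLE, V_MIN, V_MAX, T_L, T_U))
    print(pat_log(V_EXAMPLE, V_MIN, V_MAX, T_L, T_U))
```

Both mappings return the lower bound at the minimum utility and the upper bound at the maximum utility; they differ only in how the intermediate profiles are spread across the range.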

3 Results

3.1 Budget Allocation: Single-Attribute Experiment

A total of 1071 people between the ages of 21 and 69 years responded to the survey. The distribution of demographic attributes is shown in Table 2. For reference, in 2013, 3.1 % of the Japanese population lived in the Shikoku region, 4.3 % lived in Hokkaido, 5.9 % lived in Chugoku, 7.1 % lived in Tohoku, 11.4 % lived in Kyushu, 16.9 % lived in Chubu, 17.8 % lived in Kinki, and 33.5 % lived in Kanto. The actual Japanese median household income was ¥4.3 million, whereas the average was ¥5.4 million. Married and unmarried people accounted for 61.1 and 22.8 % of the population, respectively. Overall, 19.1 % were university graduates.

The distribution of responses to budget allocation questions was unimodal in all cases (Fig. 1), except for disease frequency (f) with the peak at No. 4 (equal budget allocation). However, the distribution of responses concerning age (a) was strongly skewed toward interventions that allocated more resources to younger people. Respondents generally preferred to allocate more resources to treatment as the objective of care (b), severe diseases (c), and hereditary diseases (e). The responses concerning prior medical care (d) were distributed almost symmetrically. Choice No. 4 was selected most frequently in all situations, suggesting that respondents prioritized neither of the two groups in allocating resources.

Table 2 Distribution of demographic factors in single- and multi-attribute experiments
Fig. 1 Distribution of responses on the basis of budget allocation

Table 3 shows that more than half of the respondents supported the prioritization of younger people. More than one-third of the respondents preferentially allocated medical resources to interventions for treatment, severe diseases, and hereditary diseases, whereas approximately 35–45 % selected equal allocation. Table 4 shows the average score of each question. Multivariate analysis (Table 5) showed the same preference tendencies across all ages and sexes, but to varying degrees; for example, respondents preferred to allocate more resources to interventions for severe and hereditary diseases, but younger respondents showed this preference to a lesser degree than older respondents.

Table 3 Preference of the single-attribute experiment
Table 4 Average score of the single-attribute experiment
Table 5 Odds ratio of the single-attribute experiment

3.2 DCE: Multi-Attribute Experiment

A total of 1091 responses from 100 sites across Japan were obtained in the DCE. As shown in Table 6, the most preferred attribute was younger patients (a2), followed by treatment (b2) and severe disease (c2). There was no statistically significant preference for lack of prior medical care (d2) in the simple or interaction models. These results were consistent with results of the single-attribute experiment. Models including interactions with demographic attributes showed lower Akaike information criterion values and log likelihoods. Four interactions had statistically significant relationships with attribute preferences. Public medical care preferences for elderly patients increased with increasing age. Respondents who were university graduates tended to prioritize care for patients who were younger and had severe disease.

Table 6 Estimation of parameters of the multi-attribute experiment

3.3 PAT

We calculated MPATs as defined in Sect. 2.5. \( \hat{V}_{\hbox{max} } \) was 2.446 and \( \hat{V}_{\hbox{min} } \) was 0, using Model 1 in Table 6. When the coefficient of V was constant (the linear PAT of Eq. 3), the MPATs of age, objective of care, disease severity, and prior medical care were ¥2.34 million (US$20,000), ¥1.23 million (US$10,000), ¥1.21 million (US$10,000), and ¥0.21 million (US$1800) per QALY, respectively. Linear and logarithmic PATs are shown in Table 7. Because the logarithm is a concave function, logarithmic PATs were higher than linear PATs.
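As a check on these figures, the scaling factor in Eq. 4 and the sum of the four MPATs can be worked out from the reported values (in millions of yen per QALY). The implied age coefficient below is back-calculated from the rounded MPAT for illustration and is not quoted from Table 6:

$$ \frac{T_{U} - T_{L} }{\hat{V}_{\hbox{max} } - \hat{V}_{\hbox{min} } } = \frac{10 - 5}{2.446 - 0} \approx 2.044 $$

$$ \hat{\beta }_{\text{age}} \approx \frac{2.34}{2.044} \approx 1.14 $$

$$ 2.34 + 1.23 + 1.21 + 0.21 = 4.99 \approx T_{U} - T_{L} $$

The four MPATs sum to the width of the threshold range up to rounding, as would be expected if the maximum-utility profile has all four attributes at their preferred levels and the minimum-utility profile has all four at their reference levels.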

Table 7 Preference-based threshold of each health state

4 Discussion

This study investigated factors influencing public preferences for healthcare interventions with equal efficiency. It also provided a method for applying such information to decision making, the preference-adjusted threshold, to reflect people’s preferences based on healthcare supplier criteria. For example, a healthcare technology for younger patients with more severe disease may be accepted if the cost per QALY does not exceed ¥8.5 million (US$70,000). This threshold is, of course, influenced by DCE settings, such as the attributes or the number of levels considered. However, if consensus on these factors can be obtained in advance, PAT may be used to interpret the results of the cost-effectiveness analysis quantitatively rather than qualitatively.
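The ¥8.5 million figure is consistent with the linear PAT if one assumes, as a reading of the text rather than a value quoted from Table 7, that the profile is a younger patient with severe disease and the other two attributes at their reference levels (prevention, prior care received). Using the MPATs reported in Sect. 3.3, in millions of yen per QALY:

$$ {\text{PAT}} \approx T_{L} + {\text{MPAT}}_{\text{age}} + {\text{MPAT}}_{\text{severity}} = 5 + 2.34 + 1.21 = 8.55 $$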

The results of the single- and multi-attribute experiments revealed similar preferences: interventions for younger patients were strongly preferred, followed by interventions for treatment, severe diseases, and hereditary diseases. Previous DCE surveys also indicated that younger patients and severe disease states are preferred, although preferences for prior medical care status show different results depending on the models used. These results provide robust evidence that is independent of the survey method. Although no differences were found in observed tendencies across respondent demographic characteristics, the degree of preference varied in some situations. The single-attribute experiment revealed that age was the only factor affecting prioritization; i.e., more than half the respondents preferred to allocate more resources to younger people. The largest proportion of respondents supported equal budget allocation for all other attributes except treatment as the objective of care (b; Table 3). Regarding comparability of the study design between the two surveys, the budget allocation survey included an equal-preference option, whereas the DCE did not. These results suggest that many people value equal opportunity to receive therapy. When interpreting our DCE results, it should therefore be borne in mind that many people may support equal resource allocation when a budget allocation method is used. It is thus important to consider the results of the budget allocation survey, including the appropriateness of weighted distribution, when DCE-based preferences are applied to policy making.

It may be interesting to compare our results with those of the International Social Survey Program (ISSP) Health and Health Care study conducted in 2011, which surveyed the prioritization of healthcare resources in major developed countries (approximately 30 countries and regions). The ISSP surveyed priority preferences between smokers and non-smokers (Q12) and young (aged 30 years) and elderly (aged 70 years) patients (Q13).

In Japan, 27.3 % of respondents chose the non-smoker in Q12, whereas 69.4 % supported the opinion that “their smoking habits should make no difference.” In our survey concerning cause of disease, the most preferred choice was No. 4, suggesting that neither lifestyle nor hereditary disease was prioritized. While this result is common in Japan, i.e., people do not necessarily support prioritization even if patients are not responsible for their own diseases, responses to Q12 differ greatly among countries. For example, 63.0 % of respondents in the UK supported prioritization of non-smokers. Thus, cultural differences may influence the response pattern. For Q13, 40.6 % in Japan prioritized a heart operation for individuals aged 30 years over those aged 70 years and 55.5 % in the UK did not differentiate between the two. In the ISSP survey, more than half the respondents supported equal resource allocation between young and elderly patients.

In the UK, Linley and Hughes [21] recently surveyed medical priority setting based on the budget allocation method. They considered the following nine priority setting criteria: disease severity, prior medical care, medicines with new mechanisms, carer status, disadvantaged population, age, end-of-life situation, cancer, and disease frequency. In cases where two interventions had the same efficiency, the majority of respondents supported prioritization only for the following three situations: severe disease, no other medicines available, and patients reliant on informal carers. For other circumstances, however, the majority of respondents supported equal allocation. On interpreting the DCE results, it is possible that many people would support equal resource allocation if the budget allocation method were used.

PAT is a method to combine the healthcare supplier’s threshold and consumer preferences. The cost-effectiveness threshold is a parameter given by the government or public insurer; however, consumer preferences can be reflected in the threshold range. While PAT can only indicate an approximate standard and should not be a rigid criterion of cost effectiveness, it may improve the current qualitative method of decision making. The PATs shown in Table 7 are only one example of the calculation. Other preference surveys will lead to different values. Although our DCE survey included only four attributes, the number of attributes may have to be increased for practical use.

This study has several potential limitations. The budget allocation survey respondents were different from those in the DCE. Moreover, the former survey was web-based and the latter was a door-to-door survey. This difference in survey method may decrease the comparability of results. In addition, we chose attributes for the budget allocation survey and DCE based on a literature review rather than a rigid qualitative study. Some reports have pointed out that the framing effect can influence the results of these surveys [27, 28]. For example, regarding age, we showed only two levels (elderly and younger). However, people’s preferences may be different between children and younger adults. The obtained life-year or QALY and costs shown to respondents may influence preference. For example, our question assumed that the ICER of new technologies was ¥0.1 million per life saved in the budget allocation survey. However, if their ICER was ¥10 million per life saved, the results may have changed. One limitation of our study was that our survey results may not be generalizable to other settings because the setting can influence people’s preferences.

Our survey included risk attributes, which are thought to increase the possibility of a framing effect [29–32]. It is difficult to confirm the existence of a framing effect, but it should be taken into consideration when interpreting the results. Finally, because only one DCE question was shown to each respondent and we did not include questions that checked for errors, we were unable to detect inappropriate respondents who gave inconsistent responses, continued to choose the same alternatives, or responded in a very short time.

5 Conclusion

Our surveys revealed societal preferences for interventions. The respondents supported the prioritization of younger people if the interventions were of equal efficiency. They also tended to allocate more resources to interventions for treatment and for patients with severe disease. We introduced the concept of PAT and calculated it using the measured preferences. We believe PAT can contribute to more transparent decision making by adjusting the threshold numerically according to people’s preferences.