Introduction

Valuation of health has long been a field of interest in health policy research and is integral to the prioritization of resources in health care [1]. Cost-effectiveness analysis (CEA) compares the relative costs and effects of treatment options and provides a rational approach to optimize health care spending [2]. Cost-utility analysis (CUA) is a special case of cost-effectiveness analysis where effectiveness is measured as quality-adjusted life years (QALYs) [3]. Quality-adjusted life years capture both duration and quality of life (QoL) and therefore provide a value of health benefits that can be compared across diseases and health states [4]. CUA has become the recommended practice for economic evaluation in health policy research, and preference-based QoL values (health state utility values, HSUVs) are preferred for estimation of QALYs in CUA [3, 5].

Health state utility values can be obtained using direct or indirect methods [6]. Direct methods derive HSUVs by mapping preferences directly on to a QoL index. The mapping can be conducted using choice-based methods such as time-trade-off (TTO) or standard gamble (SG), or Visual Analogue Scales (VAS). Indirect methods derive HSUVs by mapping preferences onto a QoL index indirectly via a generic health-related quality of life (HRQoL) questionnaire [for example, the EuroQol-5D (EQ-5D) and the short form six dimensions (SF-6D)]. Responses to the questionnaires are converted to HSUVs using a value set. The value set is available from a previous and separate study in which health states (responses to the questionnaire) have been assigned utility values using trade-off methods from a separate population, frequently a sample of the general population. Preferences for health states may be obtained from two distinct source populations: (i) from individuals who experience the health state (“experienced health,” EH); or (ii) from individuals who rate a hypothetical health state (“hypothetical health,” HH) [7]. It has been shown that both the preference elicitation method (direct vs. indirect) and the source population (EH vs. HH) may substantially affect the estimated HSUV and therefore potentially the results of CUA [8].

There is limited information on the impact of the different valuation approaches on HSUVs in general and in osteoporosis in particular. Osteoporosis is a disease characterized by loss of bone mass and microarchitectural deterioration, resulting in bone fragility [9]. The main clinical consequences of the disease are fractures [9]. The consequences of osteoporotic fracture differ by fracture site, with effects ranging from severe pain, disability, and even death for patients with hip fracture [10], to less serious and frequently transient effects after distal forearm fracture [11]. Osteoporosis is associated with a substantial burden to both patients and society. In Europe, it has been estimated that osteoporosis accounts for 2 million disability-adjusted life years (DALYs) annually [12], and the economic burden in the European Union was estimated at EUR 37 billion in 2010 [13].

The International Costs and Utilities Related to Osteoporotic fractures Study (ICUROS) is a prospective observational study on the consequences of osteoporotic fracture. In the study, patients who sustained an osteoporotic fracture completed three QoL instruments: the EQ-5D 3-level descriptive system (EQ-5D-3L), EQ Visual Analogue Scale (EQ-VAS), and TTO before fracture (recall), within 2 weeks after fracture, and at 4, 12, and 18 months after fracture. Analysis of ICUROS data allows for an assessment of the impact of both preference elicitation method and source population on HSUVs after osteoporotic fracture. Such analyses may improve understanding of the relative impact of different approaches to value QoL and facilitate comparisons of HSUVs across studies. Therefore, the aim of this study is to estimate and compare HSUVs after osteoporotic fractures using TTO, EQ-VAS, and EQ-5D-3L.

Methods

Data source

The ICUROS is conducted under the auspices of the International Osteoporosis Foundation. To date, 11 countries (Australia, Austria, Estonia, France, Italy, Lithuania, Mexico, Russia, Spain, the UK, and the USA) have participated in the ICUROS, with virtually the same study design applied in all countries. The ICUROS enrolls patients who sustain a low-energy fracture, defined as a fracture resulting from minimal trauma such as a fall from standing height or less, based on the following inclusion criteria: aged 50 years or more; first study interview within 2 weeks after the first health care contact for the fracture; lived in their own home prior to the fracture; fracture not caused by a co-morbidity (e.g., cancer); and judged to be capable of answering the patient-related questionnaire. Vertebral fractures were confirmed by X-ray examination. Patients who sustained an additional fracture during the study were excluded. For this study, only patients who sustained a hip, vertebral, or distal forearm fracture were included. HSUVs were elicited during scheduled contacts with patients at enrollment (current and pre-fracture recall), and at 4, 12, and 18 months after first health care contact for the fracture. The study design has been described in more detail elsewhere [14].

Study population

In the current study, we included only patients who had sustained a hip, vertebral, or distal forearm fracture and who completed the study having provided data on all QoL instruments at all interviews.

Quality of life measurements

Time-trade-off is a direct method to derive HSUVs. It is based on the respondent’s choice between staying in a given (e.g., their current) health state for a specified time (t1) or in a state of full health for a shorter period of time (t2), both alternatives followed by immediate death. The HSUV for the respondent’s health state is derived by dividing t2 by t1 [15].
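The TTO calculation described above can be illustrated with a minimal Python sketch. The function name, argument names, and the respondent's answers are hypothetical and chosen only for illustration; they are not taken from the ICUROS questionnaire.

```python
def tto_utility(t_full_health: float, t_current_health: float) -> float:
    """Return the TTO utility t2 / t1.

    t_current_health (t1): time offered in the respondent's current health state.
    t_full_health (t2): shorter time in full health judged equally desirable.
    """
    if t_current_health <= 0:
        raise ValueError("t1 must be positive")
    if not 0 <= t_full_health <= t_current_health:
        raise ValueError("t2 must lie between 0 and t1")
    return t_full_health / t_current_health


# Hypothetical respondent: indifferent between 10 years in the current
# health state and 7 years in full health, both followed by death.
print(tto_utility(t_full_health=7, t_current_health=10))  # 0.7
```

Because t2 is bounded between 0 and t1 in this formulation, the resulting HSUV is bounded between 0 and 1.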

The EQ-VAS is a direct method to derive HSUVs. It consists of a visual analogue scale from 0 to 100 with the endpoints labeled ‘Worst imaginable health state’ (0) and ‘Best imaginable health state’ (100). Respondents indicate the point on the scale that represents their current health state. Given that EQ-VAS does not entail a choice between alternatives, the method is not preference based and therefore arguably not suited for derivation of HSUVs intended for use in CUA [16].

The EQ-5D-3L descriptive system is an indirect method to derive HSUVs. It consists of five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression) with three levels: no problems, some problems, and severe problems, resulting in a total of 243 (3^5) health states [16]. Each individual health state is converted to an HSUV by applying a value set based on preferences elicited for the different health states. In this study, EQ-5D-3L HSUVs were obtained using the UK value set, which was derived by letting members of the general population provide their preferences for hypothetical EQ-5D-3L health states using TTO [17]. This value set was used because it is considered the most robust and is recommended by the EuroQol group in the absence of country-specific value sets [16], and is therefore suited to international studies. The three different methods used to derive HSUVs are described in brief in Table 1.
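The count of 243 health states follows from combining five dimensions with three levels each. The following Python sketch enumerates the profiles under the common five-digit coding (one digit per dimension, 1 to 3); the helper names are hypothetical, and no value set coefficients are reproduced here.

```python
from itertools import product

DIMENSIONS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)
LEVELS = {1: "no problems", 2: "some problems", 3: "severe problems"}


def all_health_states():
    """Return every EQ-5D-3L profile as a five-digit string, e.g. '11111'."""
    return [
        "".join(str(level) for level in profile)
        for profile in product(LEVELS, repeat=len(DIMENSIONS))
    ]


def describe(profile: str) -> dict:
    """Map a five-digit profile to a {dimension: level label} dictionary."""
    return {dim: LEVELS[int(digit)] for dim, digit in zip(DIMENSIONS, profile)}


states = all_health_states()
print(len(states))            # 243
print(states[0], states[-1])  # 11111 33333
print(describe("21231")["pain/discomfort"])  # severe problems
```

A value set can then be thought of as a lookup from each of these 243 profiles to a utility value.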

Table 1 Brief description of the methods used to derive health state utility values in the study

At each assessment time-point (before fracture [recall], within 2 weeks after fracture, and at 4, 12, and 18 months after fracture), a set of HSUVs was derived for each of the three approaches (TTO, EQ-VAS, and EQ-5D-3L) described above. In addition to the HSUVs, accumulated QoL loss and QoL multipliers were estimated for the time periods 0–12 and 12–18 months after fracture. HSUVs derived using the EQ-VAS were divided by 100 to facilitate comparisons with the EQ-5D-3L and TTO.

Accumulated QoL loss was estimated by subtracting the accumulated QoL for the relevant time period after fracture from the inferred accumulated QoL had the fracture not occurred, i.e., recalled pre-fracture QoL. QoL multipliers were estimated by dividing the accumulated QoL for the relevant time period by pre-fracture QoL. Further details on the estimation of accumulated QoL loss and QoL multipliers are provided in Supplementary Material 1.
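As an illustration of the two quantities defined above, the sketch below computes accumulated QoL loss and a QoL multiplier for a single hypothetical patient. It assumes linear interpolation between assessments (trapezoidal rule) and that pre-fracture QoL would have remained constant over the period; the exact procedure is given in Supplementary Material 1 and may differ. All function names and HSUVs are invented for the example.

```python
def accumulated_qol(times_months, hsuvs):
    """Area under the HSUV curve in QALYs, using the trapezoidal rule."""
    total = 0.0
    points = list(zip(times_months, hsuvs))
    for (t0, u0), (t1, u1) in zip(points, points[1:]):
        total += (t1 - t0) / 12 * (u0 + u1) / 2
    return total


def qol_loss_and_multiplier(pre_fracture, times_months, hsuvs):
    """Return (accumulated QoL loss, QoL multiplier) for one time period."""
    duration_years = (times_months[-1] - times_months[0]) / 12
    # Assumption: without the fracture, QoL would have stayed at the
    # recalled pre-fracture level throughout the period.
    expected = pre_fracture * duration_years
    if expected == 0:
        raise ValueError("multiplier undefined when pre-fracture QoL is zero")
    observed = accumulated_qol(times_months, hsuvs)
    return expected - observed, observed / expected


# Hypothetical patient: pre-fracture HSUV 0.80; HSUVs of 0.20, 0.60, and
# 0.70 directly after fracture and at 4 and 12 months.
loss, multiplier = qol_loss_and_multiplier(0.80, [0, 4, 12], [0.20, 0.60, 0.70])
print(round(loss, 3), round(multiplier, 3))  # 0.233 0.708
```

The explicit check for a zero denominator reflects the difficulty of ratio estimates when pre-fracture values are zero or negative.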

Statistical analysis of study data

For baseline characteristics, comparisons among groups were conducted using t tests, F tests, or Chi-square tests as appropriate. Comparisons between HSUVs at different time-points were conducted using paired t tests. All analyses were implemented in STATA 14.0 and the significance level was set at 5%.
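The paired comparison between time-points can be sketched as follows. The study analyses were run in STATA; this standard-library Python version is only an illustration, the data are hypothetical, and the p-value (which would be read from a t distribution with n − 1 degrees of freedom) is omitted.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t statistic, degrees of freedom) for paired observations."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # t = mean difference divided by its standard error
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Hypothetical HSUVs for five patients before fracture and at follow-up.
pre_fracture = [0.90, 0.80, 0.85, 0.70, 0.95]
follow_up = [0.60, 0.70, 0.50, 0.65, 0.80]
t_stat, df = paired_t_statistic(pre_fracture, follow_up)
print(t_stat, df)
```

A negative statistic here indicates that HSUVs at follow-up are lower on average than before fracture.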

Given that QoL multipliers are estimated using ratios and the underlying data comprise both negative values and zeros, estimates of QoL multipliers derived using the arithmetic mean may be biased. Therefore, bootstrapping of point estimates was implemented to derive QoL multipliers (0–6, 7–12, and 0–12 months) and 95% confidence intervals (CIs) were obtained using the percentile method [18]. Further details on the bootstrapping methodology are provided in Supplementary Material 1.
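The percentile bootstrap can be sketched as below. This is a minimal illustration that assumes the multiplier is estimated as a ratio of sums over patients and that patients are resampled with replacement; the procedure actually used is described in Supplementary Material 1 and may differ. The function name, number of replicates, seed, and data are all hypothetical.

```python
import random


def bootstrap_multiplier(pre, post, n_boot=2000, alpha=0.05, seed=0):
    """Ratio-of-sums multiplier with a percentile bootstrap interval."""
    rng = random.Random(seed)
    n = len(pre)
    estimates = []
    for _ in range(n_boot):
        # Resample patients (pairs of pre/post values) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        denominator = sum(pre[i] for i in idx)
        if denominator != 0:  # skip replicates where the ratio is undefined
            estimates.append(sum(post[i] for i in idx) / denominator)
    if not estimates:
        raise ValueError("no valid bootstrap replicates")
    estimates.sort()
    # Percentile method: read the interval off the sorted replicates.
    lower = estimates[int(alpha / 2 * len(estimates))]
    upper = estimates[max(int((1 - alpha / 2) * len(estimates)) - 1, 0)]
    return sum(post) / sum(pre), lower, upper


# Hypothetical accumulated QoL per patient; negative and zero values are
# included because some value sets assign them to severe health states.
pre = [0.85, 0.70, 1.00, 0.62, 0.90, 0.76, 0.00, 0.80]
post = [0.60, 0.52, 0.85, -0.10, 0.70, 0.59, 0.00, 0.66]
point, lower, upper = bootstrap_multiplier(pre, post)
print(point, lower, upper)
```

Taking the ratio of sums within each replicate avoids averaging patient-level ratios, which are undefined or unstable when an individual's pre-fracture value is zero or negative.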

Results

In total, 1410 patients, comprising 505, 316, and 589 patients with a hip, vertebral, or distal forearm fracture, respectively, were eligible for analysis (Fig. 1). The number of patients by country is presented in Supplementary Material 2.

Fig. 1 Patient inclusion/exclusion flowchart

Patient characteristics stratified by fracture type are presented in Table 2 below. The mean (SD) age across all patients was 68 (10) years, 82% were female, and 24% had experienced a previous fracture in the preceding 5 years.

Table 2 Patient characteristics at fracture

Regardless of the approach used to derive HSUVs, mean absolute HSUVs were lower at all follow-up time-points compared to pre-fracture for hip and vertebral fractures. For distal forearm fracture, the mean decrements compared to pre-fracture were significant for all three approaches directly after fracture and at 4 months after fracture. At 12 months after fracture, the decrements compared to pre-fracture were not significant for EQ-VAS, and at 18 months after distal forearm fracture, the decrements were not significant for EQ-VAS or EQ-5D-3L, but remained significant for TTO (Fig. 2).

Fig. 2 Health state utility values and 95% confidence intervals by elicitation approach for (a) hip fracture, (b) vertebral fracture, and (c) distal forearm fracture. TTO denotes time-trade-off; EQ-VAS denotes EuroQol Visual Analogue Scale; EQ-5D-3L denotes EuroQol 5-dimension 3-level descriptive system

Numerically, TTO consistently provided the highest mean absolute HSUVs across all fracture types and time-points, including pre-fracture recall. EQ-VAS and EQ-5D-3L consistently generated the lowest HSUVs before and immediately after fracture, respectively, across all fractures, with no such monotonic relationship observed for the other time-points (Fig. 2).

The largest differences between the highest and the lowest mean absolute estimates for the different methods were consistently observed directly after fracture, where the EQ-5D-3L yielded substantially lower absolute mean HSUVs compared to the other approaches. The mean differences between the EQ-5D-3L and EQ-VAS HSUVs for hip, vertebral, and distal forearm fracture directly after fracture were estimated at 0.49, 0.24, and 0.16, respectively.

The EQ-5D-3L resulted in the lowest estimated mean QoL multipliers across all fracture types and phases except for 13–18 months after distal forearm fracture. Furthermore, except for 13–18 months after distal forearm fracture, the confidence intervals for EQ-5D-3L derived multipliers did not overlap the confidence intervals of the multipliers derived using the other approaches (Table 3). The relative difference between the multipliers derived using the EQ-5D-3L and the lowest of the multipliers derived using the EQ-VAS and the TTO decreased from the 0–12 to the 12–18 months periods for all fracture types: from 27 to 8% for hip fracture, from 19 to 12% for vertebral fracture, and from 7 to 1% for distal forearm fracture.

Table 3 Accumulated QoL loss and QoL multiplier by fracture type and time since fracture

Discussion

This study shows that the approach used to value health materially influences the estimated QoL impact of osteoporotic fracture. Therefore, the choice of approach can have a substantial effect on the cost-effectiveness of pharmacologic fracture prevention.

Across all time-points for the three fracture types, TTO provided the highest HSUVs, whereas EQ-5D-3L consistently provided the lowest HSUVs directly after fracture. Except for 13–18 months after distal forearm fracture, EQ-5D-3L generated lower QoL multipliers compared to the other two methods, whereas no equally clear pattern was observed between EQ-VAS and TTO.

The fact that TTO produced higher HSUVs than EQ-5D-3L indicates that experienced health preferences are less sensitive to impaired health compared to hypothetical preferences. This finding is in line with most previous research, including one study in osteoporosis [20,21,22,23,24]. Potential reasons for the difference include individuals’ limited ability to predict future preferences, adaptation of expectations on QoL, and development of coping strategies [25]. Furthermore, the finding that the largest differences between EH and HH preferences were observed directly after fracture corresponds with previous observations that the differences between experienced and hypothetical HSUVs are most marked for severe health states [26]. In the context of fragility fractures, a potentially compounding factor is that patients with recent fracture may not expect their current health state to endure and therefore may be reluctant to trade life years for QoL. Even though the TTO questions pertain to the current health state, the TTO responses may reflect patients’ beliefs about their future health [27]. Therefore, the TTO may underestimate the immediate QoL impact of a fracture. The EQ-5D-3L and EQ-VAS do not require patients to trade quality for quantity of life, and therefore patients’ expectations of the duration of the current health state do not influence the responses to these questionnaires. In this context, it is interesting to note that QoL 18 months after fracture was significantly lower than pre-fracture when HSUVs were derived using the TTO, but not the EQ-5D-3L, potentially reflecting that TTO, which is a continuous index, can measure minor impairments in health that may not be captured by the discrete health states constituting the EQ-5D-3L.

It may also be noted that the mean TTO decrement observed 18 months after distal forearm fracture falls below the minimally important clinical difference (MICD) threshold for the TTO, which has been estimated at 0.05 [28]. However, such a difference may arguably be important on a population level given that the estimated mean change results from a distribution of outcomes observed on the patient level. For the mean estimated decrement to be 0.02, a proportion of patients will have experienced decrements larger than the MICD threshold of 0.05. Therefore, and because individuals may value differences that are smaller than the MICD, a mean decrement of 0.02 may be relevant in a health economic context.

The impact of the choice of elicitation method and source population on the outcome of an economic evaluation may be complex. In the context of osteoporosis, all else equal, the higher QoL multipliers observed with TTO and EQ-VAS compared to EQ-5D-3L would reduce QALY gains from avoiding fractures. On the other hand, fractures are associated with mortality; avoiding fractures therefore, on average, increases longevity, and the QALY impact of the reduced mortality would also be affected by the HSUVs. In addition, side effects of treatments may need to be taken into consideration, and the QALY impact of those also reflects the choice of elicitation method and source population. Finally, it is not evident that the willingness to pay per QALY, i.e., the threshold that society is willing to pay per QALY, is the same for EH- and HH-derived HSUVs.

The choice between preference source populations is inherently normative and depends on decision context [25]. For example, the UK National Institute for Health and Care Excellence (NICE) [29], the Dutch Zorginstituut [30], and the First and Second Panels on Cost-Effectiveness in Health and Medicine [32] advocate HSUVs derived using hypothetical health states, whereas the Swedish HTA agency TLV prefers experience-based health states [33]. In terms of preventive treatments, it has been argued that hypothetical health states are most relevant, given that the majority of the population under consideration for treatment have not experienced the health state [24].

Differences in perceptions of health and provision of specific health-related services may result in differences in valuation of health across countries. For this and other reasons, including elicitation methodology, EQ-5D-3L value sets differ between countries [34]. Therefore, it is important to note that the HSUVs from the EQ-5D-3L were derived using the UK value set and that other value sets may have produced other results [34]. While we are not aware of any studies specific to osteoporosis, in patients with acute lower respiratory tract infection, the UK value set produced HSUVs that were more sensitive to changes in health status than other value sets [35]. Therefore, the differences between HSUVs derived using EQ-5D-3L and HSUVs derived using EQ-VAS and TTO may have been smaller had another value set been implemented. However, it is unlikely that the implementation of another HH value set would have substantially altered the results observed with respect to preference source population, reflecting that the only experience-based EQ-5D value set produced the highest HSUVs of all EQ-5D value sets for severe health states [36], suggesting a substantial difference between HH- and EH-derived value sets in general. Indeed, had the EH value set been implemented, the differences between the EQ-5D-3L-derived HSUVs and those derived using the other approaches may have been smaller or even reversed. Such a result would indicate that the choice of source population (experienced vs. hypothetical health) may be more important than the elicitation method (direct vs. indirect) in terms of impact on HSUVs.

This study has several limitations. Firstly, the inclusion/exclusion criteria may have resulted in a study population that is healthier than the average patient sustaining a fracture, although this may differ between countries [14]. Secondly, all patients included in the analysis had to have completed all QoL instruments at all interviews, resulting in a substantial loss to follow-up, predominantly reflecting that patients did not complete the TTO instrument. The difficulty in completing the TTO instrument may reflect that older patients may not expect to live 10 years and therefore have difficulties in making a choice involving a 10-year time horizon. Another potential explanation is that mild cognitive impairment associated with advanced age may render the TTO instrument difficult to comprehend. In addition, it may be noted that even mild cognitive impairment may affect the elicited HSUVs [37], potentially introducing additional uncertainty to the estimates. Given that loss to follow-up may be associated with poor health, the QoL impact of osteoporotic fracture across all approaches is likely underestimated in this study. However, the effect on the relative impact on QoL from the different approaches is less likely to be biased. In this context, it is notable that a small minority of hip fractures did not result in hospitalization; more than 80% of those patients were enrolled in Russia, consistent with previous observations that a substantial minority of patients sustaining hip fracture in Russia may not be hospitalized [38]. More generally, differences between countries and cultures with respect to delivery of health care, perception of health, longevity, and their relative importance mean that the results in this study may not be generalizable to all settings.

The methods for deriving the HH and EH HSUVs were different. Important discrepancies include that EH valuations cannot include dead as a health state and do not incorporate negative values. In addition, recall was employed to estimate pre-fracture QoL. Therefore, it is possible that pre-fracture QoL estimates are biased. However, the maximum time between the first health care contact for the fracture and the first interview was 14 days, rendering substantial recall bias unlikely given that patients can accurately recall their QoL up to 6 weeks [39]. In this context, it may be noted that mean HSUV 18 months after distal forearm fracture was similar to mean HSUV prior to fracture across the three methods (maximum mean absolute difference 0.02 [cf. Fig. 2]). Given that sequelae after distal forearm fracture are generally mild [40], this finding suggests that pre-fracture QoL recall is unlikely to be systematically biased. If recalled pre-fracture QoL were systematically biased, long-term QoL would have differed from pre-fracture QoL after distal forearm fracture. Additionally, it has been shown that replacing recalled QoL with age-matched general population values for EQ-5D-3L does not systematically affect the estimated QoL impact of fracture [41].

Further research in this area is needed. It would be important to better understand the differences between HH and EH preferences. Determining the reasons for the apparent differences would be of value, and exploring the extent to which those differences are driven by the choice of HH value set could aid the interpretation of the results. In this context, exploring differences between countries and cultural clusters could also be valuable for policy makers, whose decisions are often regional or national in nature. Furthermore, it would be informative to explore valuation of health states in persons who have experienced a health state but have since recovered. Such data would inform policy makers as to whether experience of a health state results in a permanent or transitory change in preferences, potentially guiding the choice between experienced and hypothetical preferences.

With the caveats discussed above, this study shows that the approach to derive QoL markedly influences the estimated QoL impact of osteoporotic fracture and therefore has the potential to affect decisions on health care prioritization.