Introduction

To characterize the value of competing osteoporosis therapies, cost-utility analyses that use societal health state values to measure effectiveness are often undertaken [1]. The incremental cost-effectiveness ratio, typically defined as the ratio of added cost to changes in health using the Quality-Adjusted Life Year (QALY) when increasingly costly alternatives are compared, is the focus of the analysis. Estimating QALYs requires a measure of health using a single score, anchored at 0 for dead and 1 for perfect health, and weighted for the relative desirability of the health state. Standards for economic evaluations recommend societal values (also called preferences or utilities) [2]. Unfortunately, measures of health state value are rarely available from clinical trial participants. Instead, clinical trials more commonly include disease-targeted health status instruments, such as the Osteoporosis Assessment Questionnaire (OPAQ) [3]. The OPAQ yields four domain scores (physical function, psychological status, symptoms, and social interaction) that cannot be used directly for cost-utility analyses. There are two main approaches used to obtain “societal health state values” (HSVs): (1) direct measurement of value for health states from a representative sample of the population using methods such as standard gamble [4], time tradeoff [5], and visual analogue scale ratings [6] and (2) indirect measurement using preference-based measurement systems such as the Quality of Well Being Scale [7], the EuroQOL EQ-5D [8], the SF-36-based SF-6D [9], and the Health Utilities Index [10]. These systems include a generic self-report health state classification system and an algorithm to apply a set of societal preference weights for the range of health states described by the system.
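As an illustration of the incremental cost-effectiveness ratio defined above, the following minimal sketch computes QALYs as health state value multiplied by time and then divides the added cost by the added QALYs. The function names, costs, and health state values are hypothetical and chosen only to keep the arithmetic easy to follow; they are not taken from this study.

```python
def qalys(periods):
    """Sum QALYs over (health_state_value, years) pairs.

    Each health state value is on the 0 (dead) to 1 (perfect health) scale.
    """
    return sum(hsv * years for hsv, years in periods)


def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost-effectiveness ratio: added cost per added QALY."""
    delta_qaly = qaly_new - qaly_old
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when the QALY difference is zero")
    return (cost_new - cost_old) / delta_qaly


# Hypothetical inputs (not study data): four years lived at a constant HSV.
usual_care = qalys([(0.75, 4.0)])     # 0.75 * 4 = 3.0 QALYs
new_therapy = qalys([(0.875, 4.0)])   # 0.875 * 4 = 3.5 QALYs
ratio = icer(12000.0, 10000.0, new_therapy, usual_care)
print(ratio)  # 2000 added cost / 0.5 added QALYs = 4000.0 per QALY
```

The sketch ignores discounting and uncertainty, both of which a full economic evaluation would normally address.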

Preference-based systems offer the opportunity to measure changes in terms of societal HSVs without the resource requirements of direct measurement. However, in reality, many clinical trials in osteoporosis do not utilize preference-based systems. Choosing outcome measurement instruments for clinical trials involves serious tradeoffs. Researchers must balance the need to maximize measurement precision for the detection of differences against practicality of administration and limits in available resources. With these concerns in mind, unless cost-utility analysis is a specific research aim, investigators often favor disease-specific measures over the generic preference-based instruments because the former focus on the key aspects targeted by treatment. Therefore, HSVs required for economic evaluation are rarely available from trial participants.

There are situations in which researchers wish to conduct cost-utility analysis without measuring health states directly. They may choose to use age- and sex-specific HSVs from a published source, such as time tradeoff-based values of current health from a population sample [11]. However, populations used to estimate published HSVs may be different from the population of interest in important ways. To address this challenge, methods have been developed to estimate HSVs from data provided by more widely used outcome measures. Several methods have been explored to estimate HSVs from the SF-36 [12–14], the SF-12 [15–21], condition-specific tools [22–25], and national survey instruments, thereby allowing the calculation of QALYs [2, 26]. The objectives of this study were to provide a model to predict societal HSVs from the preference-based systems EQ-5D-US and SF-6D using data from the OPAQ and to characterize the impact of osteoporotic fracture in terms of mean OPAQ dimension scores and EQ-5D-US and SF-6D HSVs by fracture type.

Methods

Study population

This is a secondary analysis of pooled data from two studies of persons at least 50 years old with a non-traumatic fracture within the prior 5 years recruited at Dartmouth-Hitchcock Medical Center in Lebanon, NH and the Mayo Clinic in Rochester, MN [27, 28]. The first study included women with non-traumatic fractures of the hip or spine within the prior 5 years, and the second included men with non-traumatic fractures of the hip and spine and women with fractures in other locations.

Fracture categorizations used in this study were based on published expert panel classifications [29]. In summary, three main fracture locations of hip, vertebra, and wrist were identified, and fracture classification criteria were created for other fractures in order to characterize their effects relative to these more common fractures. A panel of physician experts familiar with aspects of osteoporosis-related fractures defined the following “like” categories: (1) like a hip = shaft/distal femur, proximal tibia/fibula; (2) like a spine = cervical spine, ribs, sternum, clavicle, scapula, pelvis; (3) like a wrist = humerus, shaft/proximal radius, ankle, distal tibia/fibula; and (4) like a finger/toe = hand/fingers, foot/toes, and patella.

Participation was requested from every patient visiting one of the participating medical centers’ osteoporosis clinics. Each participant completed a series of questionnaires at the time of visit to the medical center or via mail following a visit to a participating medical center. Patients were excluded if they had experienced multiple fractures or demonstrated impaired cognitive function. Participants with missing essential variables for analyses due to incomplete questionnaires were removed from the analysis.

Measures

The measures used in this study, including the OPAQ, the Medical Outcomes Study Short Form (SF-36), the SF-6D, and the EuroQoL EQ-5D, are described below.

OPAQ

The Osteoporosis Assessment Questionnaire is a disease-targeted instrument designed to measure health status in fracture patients [3, 30, 31]. There are several versions including the original 73-question OPAQ 1.0, the 60-question OPAQ 2.0, and the 34-question OPAQ-SV. In an early validity study, the OPAQ 1.0 demonstrated evidence of validity and mean Cronbach’s alpha of 0.86 ± 0.01 [3]. In this study, we used the self-completed OPAQ 2.0, which consists of 60 questions categorized within 14 domains, and further grouped within the following four dimensions: physical function, emotional status, symptoms, and social interaction. Item scores are summed to create the 14 domain scores, which are reported from 0 to 10. Domain scores are summed and rescaled to yield dimension scores from 0 to 100, with higher scores indicating better status.
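The rescaling step described above can be sketched as follows. This is an assumption-laden illustration, not the official OPAQ 2.0 scoring algorithm: it assumes a simple linear rescaling in which the sum of the 0 to 10 domain scores within a dimension is divided by the maximum attainable sum and multiplied by 100. The function name and the example domain scores are hypothetical.

```python
def dimension_score(domain_scores):
    """Rescale a dimension's 0-10 domain scores to a 0-100 dimension score.

    Assumed rule (not confirmed by the source): linear rescaling of the
    summed domain scores by their maximum possible total.
    """
    if not domain_scores:
        raise ValueError("at least one domain score is required")
    for score in domain_scores:
        if not 0 <= score <= 10:
            raise ValueError("domain scores must lie between 0 and 10")
    max_total = 10 * len(domain_scores)
    return 100 * sum(domain_scores) / max_total


# Hypothetical domain scores for one dimension (higher = better status).
example = dimension_score([8.0, 6.5, 9.0])
print(round(example, 1))
```

Which of the 14 domains belong to each dimension is defined by the instrument itself and is not reproduced here.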

SF-36-derived SF-6D

The SF-36 was developed in the Medical Outcomes Study for the measurement of health status, and its measurement properties have been extensively studied [32]. The SF-36 has been used in the Multicentre Osteoporosis Study (CaMos). It measures eight dimensions on up to six levels. Our research used version 1 of the SF-36, with the reference time frame for health status of “the past 4 weeks.” The SF-6D provides a method for deriving a preference-weighted health state classification system from the SF-36 instrument using items from six of the eight dimensions with response categories up to six levels [9, 33]. We used version 2 of the SF-6D, which utilizes an additive statistical model to provide health state values ranging from 0.29 to 1.00. Societal HSVs were modeled from a representative sample of 611 adults from the UK using the standard gamble.

EQ-5D

The widely used and extensively studied EuroQoL EQ-5D includes five attributes rated on three levels to define 245 health states (with “dead” and “unconscious” added) [8]. Preferences are available from multiple populations with different elicitation methods using the same EQ-5D descriptive system. We used preference weights from a US population sample [34]. We will refer to the health state classification system as EQ-5D and the HSVs as EQ-5D-US. The scoring algorithm uses the “D1” random effects model for EQ-5D-US, which provides scores from −0.11 to 1.00 based on a representative sample of 4,048 adults from the USA [34]. The reference time frame for health status used in our study was “today.”
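The count of 245 health states follows from the structure of the descriptive system: five attributes with three levels each give 3^5 = 243 combinations, and adding “dead” and “unconscious” gives 245. A short sketch of that enumeration is shown below; the variable names are illustrative, and the five-digit state labels follow the common EQ-5D shorthand.

```python
from itertools import product

# The five EQ-5D attributes, each rated on three levels (1 = no problems).
DIMENSIONS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)
LEVELS = (1, 2, 3)

# Every combination of levels, written as a five-digit label such as "11111".
states = [
    "".join(str(level) for level in combo)
    for combo in product(LEVELS, repeat=len(DIMENSIONS))
]

# "dead" and "unconscious" are added to the classified states.
total_states = len(states) + 2
print(len(states), total_states)  # 243 245
```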

Statistical analysis

Statistical analyses were conducted after pooling the two groups. To examine potential differences in characteristics or in the predictive ability of the models and final algorithms, additional analyses were performed on data stratified by fracture type and/or gender.

Demographic data were summarized for the entire participant population. As appropriate, the mean and standard deviation or percentage and count for each characteristic were calculated and described.

For each EQ-5D and SF-36 dimension, OPAQ domain and dimension means and standard deviations were calculated. Tests of trend across EQ-5D dimensions and OPAQ domain and dimension scores were performed to assess the relationship between the measures, with the hypothesis that EQ-5D and SF-6D dimensions would correlate with similar OPAQ dimensions and related domains.

We calculated Spearman correlations for the EQ-5D and SF-6D dimensions and the respective OPAQ domain. We performed linear regression analyses using OPAQ dimension scores to predict EQ-5D-US and SF-6D HSVs. We first conducted univariate regression to assess the relationships between HSVs and demographic and clinical variables. For each preference-based measure, a comprehensive multivariate model was generated to predict HSVs using OPAQ dimensions and variables found statistically significant in univariate regressions. From the comprehensive model, a parsimonious model was created by removing variables and OPAQ scores that were not statistically significant within the comprehensive model. In addition, the parsimonious model was examined stratified by gender and fracture type. Model fit was assessed by comparing the distribution of observed and expected HSVs, visual inspection of plots for observed versus predicted values and of the residuals. Model performance for prediction of HSVs was assessed using the R² and root mean square error estimates (RMSE). Analyses for this study were completed with STATA, version 9 (STATA Corporation, College Station, TX, USA).
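The regression and performance steps described above can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library, not the Stata analysis used in this study: it fits an ordinary least squares model by solving the normal equations and then reports R² and RMSE. All data values are synthetic and hypothetical, the three predictor columns merely stand in for OPAQ dimension scores, and the choice of RMSE denominator (n versus residual degrees of freedom) is an assumption left to the caller.

```python
import math


def fit_ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = []
    for i in range(p):
        lhs = [sum(r[i] * r[j] for r in rows) for j in range(p)]
        rhs = sum(r[i] * yi for r, yi in zip(rows, y))
        A.append(lhs + [rhs])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for k in range(p):
            if k != col:
                f = A[k][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][p] for i in range(p)]  # [intercept, slope_1, ...]


def predict(coefs, X):
    """Predicted values from fitted coefficients."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in X]


def r_squared(observed, predicted):
    """Proportion of variance in the observed values explained by the model."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def rmse(observed, predicted, n_params=0):
    """Root mean square error; set n_params to use residual degrees of freedom."""
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss_res / (len(observed) - n_params))


# Synthetic, hypothetical data: three dimension scores (0-100) per person.
X_demo = [(90, 70, 65), (60, 65, 40), (80, 50, 70),
          (40, 55, 30), (100, 80, 85), (70, 60, 55)]
y_demo = [0.88, 0.71, 0.80, 0.55, 0.97, 0.74]  # hypothetical HSVs

coefs = fit_ols(X_demo, y_demo)
fitted = predict(coefs, X_demo)
print("R2:", round(r_squared(y_demo, fitted), 3))
print("RMSE:", round(rmse(y_demo, fitted, n_params=len(coefs)), 3))
```

With real data, a dedicated statistics package would normally be preferred because it also supplies standard errors, p values, and residual diagnostics.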

Results

Among a total of 291 study participants with fracture, 16 were excluded due to missing questionnaire responses and 275 were included in the analysis. The mean age of the participants was 68.1 years (±10.6), with a range in age from 50 to 94 years, and 72% were female (Table 1). Grouped by fracture type, hip fracture participants had the highest mean age of 74.2 years (SD 11.23), while wrist-like fracture participants had the lowest average age of 64.41 years (SD 9.09).

Table 1 Characteristics of the sample (n = 275)

Participants’ mean EQ-5D-US HSV was 0.85 (SD 0.14) and mean SF-6D HSV was 0.76 (SD 0.13). OPAQ’s four main dimension means (SD) for the entire group were: physical function 86.1 (17.4), emotional status 70.4 (16.1), symptoms 65.74 (18.6), and social interaction 60.6 (16.9).

Statistically significant correlations were observed for all EQ-5D dimensions with all OPAQ dimensions except social interaction and for all SF-6D dimensions with all of the OPAQ dimensions (p < 0.001) (Table 2). OPAQ’s physical function dimension exhibited the strongest significant correlations with EQ-5D dimensions, ranging from −0.34 to −0.60, followed by emotional status, and symptoms. A similar pattern with stronger correlation coefficients was noted between OPAQ dimensions and SF-6D, and the correlation between SF-6D dimensions and OPAQ social interaction was significant. All domains that comprise OPAQ’s physical function, emotional status, and symptoms dimensions exhibited significant correlations to all EQ-5D dimensions. For the domains that form OPAQ’s social interaction dimension, the only significant, but slight, correlations were for the social network domain to EQ-5D usual activities and self-care dimensions.
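The negative sign of these coefficients is expected: higher EQ-5D levels indicate more problems, whereas higher OPAQ scores indicate better status. The sketch below shows a Spearman rank correlation computed with average ranks for ties, which matter because EQ-5D dimensions have only three levels. The data are hypothetical and the function names are illustrative; this is not the study's Stata code.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: EQ-5D mobility level (1-3) and OPAQ physical function.
mobility_level = [1, 1, 2, 3, 3]
physical_function = [95, 90, 70, 40, 35]
rho = spearman(mobility_level, physical_function)
print(round(rho, 3))  # negative: worse mobility goes with lower function
```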

Table 2 Spearman correlation coefficients for dimensions of EQ-5D and SF-6D with OPAQ

Tests for trend demonstrated similar consistency between OPAQ dimensions and EQ-5D and SF-6D dimensions. Tests for OPAQ’s physical function, emotional status, and symptoms dimensions were all highly significant (p < 0.001) with each of the five EQ-5D dimensions. OPAQ’s social interaction dimension, again, did not show a significant trend with EQ-5D. OPAQ domains demonstrated similar trend patterns.

Univariate regression models

In separate, univariate regression analyses, each of OPAQ’s four dimensions demonstrated statistically significant relationships with EQ-5D-US and SF-6D HSVs. The relationship between social interaction and EQ-5D-US and SF-6D HSVs was the weakest of the four dimensions. The following variables were also individually statistically significant predictors of EQ-5D-US HSVs: age (as continuous variable, p = 0.004), BMI (p = 0.006), alcohol use (categorized as <1/week or ≥1/week, p = 0.012), education (categorized as up to high school versus beyond high school, p < 0.001), current employment (p = 0.001), and Deyo score (an adaptation of the Charlson comorbidity index, p < 0.001) [35, 36]. Race, Hispanic origin, currently smoking, living alone, and taking alendronate or raloxifene were not statistically significant predictors of EQ-5D-US HSVs in individual regressions. For SF-6D, univariate relationships were similar except that BMI did not predict SF-6D HSVs.

Multivariate regression models

Age, alcohol use, education, and employment were no longer statistically significant in the comprehensive regression models. OPAQ’s social interaction dimension was also no longer a significant predictor of EQ-5D-US in the comprehensive model, which demonstrated an adjusted R² of 0.63, p < 0.001. The final regression model for EQ-5D-US, shown in Table 3, included three of the four OPAQ dimensions (physical function, emotional status, and symptoms). The ability of the model to predict EQ-5D-US HSVs was indicated by the R² = 0.56, RMSE = 0.096, and p < 0.001. The range for actual EQ-5D-US HSVs was 0.118 to 1 and the range for predicted values was 0.408 to 0.995. The model for SF-6D included age and all four of the OPAQ dimensions (Table 3). This model demonstrated a slightly stronger predictive ability than the EQ-5D-US model, as shown by the R² = 0.58, RMSE = 0.085, and p < 0.001.

Table 3 Regression estimates for linear models to predict EQ-5D-US and SF-6D using OPAQ dimension scores

Model fit

Figure 1 shows the comparisons between predicted and observed health state values for EQ-5D-US and SF-6D for all subjects and stratified by gender. The observed HSVs were skewed to the right for both, with a ceiling effect noted for EQ-5D-US. Similar patterns were found for male and female subjects. Transformations of the independent and dependent variables did not provide improved fit for the models, and therefore, untransformed data were used in the final models. Predicted scores were lower than observed at the low end of the range for SF-6D and higher than observed for EQ-5D-US.

Fig. 1

Predicted versus observed health state values (n = 275)

Effect of fracture type on health state values

Table 4 summarizes mean OPAQ dimension, EQ-5D-US, and SF-6D scores by fracture type group. The wrist/wrist-like/other fracture group had the highest OPAQ dimension means for physical function, emotional status, and symptoms, while the spine/spine-like fracture group had the highest mean for social interaction. The hip/hip-like group had the lowest means for physical function, emotional status, and social interaction, and the spine/spine-like group had the lowest mean score for symptoms.

Table 4 Mean scores by fracture type for OPAQ dimensions and estimated EQ-5D-US and SF-6D

The effect of fracture type on estimates of health state value was assessed by including indicators for fracture type in the model and evaluating models stratified by fracture type. We examined the effect of fracture type using each of the main fracture types and by collapsing fracture types into three categories, combining other fractures (hand/fingers, foot/toes, patella) with wrist and wrist-like fractures. Regardless of categorization, fracture type was not a significant predictor of health state value in multivariate models. Using the three most common fracture types, in Fig. 2, we illustrate that the confidence intervals for all regression coefficient estimates overlapped across fracture types.

Fig. 2

Impact of fracture type on OPAQ coefficients showing regression coefficients for fracture type compared to the overall coefficient for n = 275 participants. Fracture types: hip = hip/hip-like (n = 42), spine = spine/spine-like (n = 80), wrist = wrist/wrist-like/other (n = 153)

Discussion

This study of 275 persons with prior osteoporosis-related fracture provided a linear regression-based method to estimate EQ-5D-US and SF-6D HSVs from the widely used osteoporosis-targeted outcome measure, the OPAQ. The strength of the relationships demonstrated between EQ-5D-US, SF-6D, and the OPAQ supports the use of this approach when HSVs are not available from the trial data. The R² estimates of 0.56 for the EQ-5D-US model and 0.58 for the SF-6D model are at the high end of the range (0.31–0.66) for published models including those using similar methods reported by Mortimer and Segal [37, 38]. Furthermore, RMSE values found in this study indicated acceptable prediction of HSV compared to other prediction models [23, 24, 39].

Whenever possible, a preference-based system should be utilized to measure HSVs in clinical studies. However, it is often the case that such measures are not included due to the serious resource challenges faced by investigators. Under such circumstances, the use of societal HSVs estimated from condition-specific measures such as the OPAQ may provide the opportunity to conduct cost-utility analyses that would not otherwise be possible.

The important advantages of leveraging existing data to address pressing questions about the relative value of interventions must be considered against the limitations of this approach compared to more direct measurement. Limited variability of estimates and diminished measurement precision have been associated with regression-based “mapping” approaches [40–43]. Our data provide support for some concern about limited variability. The range of predicted HSVs was smaller for both EQ-5D-US and SF-6D, though this was less apparent for SF-6D than for EQ-5D-US. This could potentially translate into difficulty in detecting meaningful differences in health outcomes in studies that use this approach to estimate health state values. On the other hand, it could be argued that estimating HSVs from disease-targeted instruments such as the OPAQ may provide an additional degree of sensitivity or responsiveness by detecting change in clinically important areas where a generic instrument may not. It is unlikely that any improvements in measurement of change over time would outweigh the loss of variation and precision associated with predicting HSVs; however, future research could prospectively compare the performance of these approaches.

Another challenge presented in the application of estimation methods is in the interpretation of study results using different methods. Several approaches have been utilized to convert from health status data to HSVs, including regression modeling, effect size translation, and response mapping [37], and there is evidence of significant variation in their results [40–42]. One review in this area was unable to identify a superior approach [37]. From the perspective of policy decision-making, comparability of various methods used to estimate societal HSVs is the subject of ongoing discussion and investigation [40, 44–50]. Meaningful differences in estimates from different approaches may undermine the fundamental strength of cost-utility analysis: the ability to compare outcomes across diseases and interventions. While there is some indication that variation exists between estimates from different approaches in some cases, how often and how much is yet unknown. This serious concern must be weighed against the need to optimize the use of limited health care research resources.

Given that our models account for 56% and 58% of the variance in EQ-5D-US and SF-6D HSVs, respectively, it is important to consider possible sources of the unexplained variation. For each instrument, a portion of the unexplained variation would be attributable to measurement error. In addition, variation in coverage of constructs is a likely contribution. Although SF-6D, EQ-5D, and OPAQ cover similar key dimensions of health including physical function, emotional status, symptoms, and social interaction, there are important differences. SF-6D dimensions include role limitations and vitality and EQ-5D describes usual activities and self-care. Considering widely used models of health and disability such as the ICF model [51, 52], OPAQ, EQ-5D, and SF-6D appear to cover the level of the person completing specific tasks or activities (e.g., walking, bending, or lifting) and the level of the person participating in society (e.g., grocery shopping, using public transportation, going to a movie). However, the content within these levels of functioning is different across the instruments, in the number of questions covering the construct and in the range of functions addressed, with OPAQ appearing to emphasize activity more heavily than participation. Differences of coverage between OPAQ and the preference-based instruments could explain why both models leave a similar level of variation in HSVs unexplained and support the argument for inclusion of a preference-based measure in clinical studies when cost-utility analysis may be of interest.

There are limitations to this study that should be considered. First, although strong linear relationships were evident between the EQ-5D-US and SF-6D HSVs and the OPAQ dimension scores, particularly for EQ-5D-US, the data were skewed to the right. This would have some impact on the accuracy of EQ-5D-US coefficients and therefore the individual HSV estimates. However, this caveat may be tempered by the consideration that the purpose of this model is to estimate mean HSVs for cost-utility analysis and not individual HSVs.

This sample of persons with a history of osteoporosis-related fractures was largely white with HSVs centered at 0.75 for SF-6D and 0.85 for EQ-5D-US, and age centered at 68 years. Our OPAQ and HSV estimates are similar to those reported in the literature. Studies investigating the impact of osteoporosis on quality of life have published various HSV estimates. Although none have provided the link from OPAQ to HSV and only one presents EQ-5D-US HSV as in this study, results of other studies may be useful to relate to ours. Compared to the study of Silverman et al. of prevalent and incident vertebral fractures, the mean OPAQ dimension scores were very similar to the mean OPAQ scores found for our sample [53]. In regard to HSV, in their prospective study, Strom et al. reported lower mean EQ-5D-UK HSV for vertebral fracture and similar mean HSV for hip and wrist fracture groups at 1 year [54]. EQ-5D-UK HSV published from the European MORE and EPOS studies were slightly lower than those in our study [55, 56]. Hagino et al. reported similar mean EQ-5D HSV at 1-year post fracture to our mean EQ-5D HSV by fracture type using Japanese preference weights [57]. In studies of more severely affected patients, the EQ-5D-UK HSVs were lower than the EQ-5D-US HSVs reported here [58–60]. Other studies reporting mean EQ-5D-UK HSVs for osteoporosis-related health states from community-based samples found slightly higher HSVs overall [61, 62]. For example, van Schoor et al. reported mean EQ-5D-UK HSV of 0.80 (0.69–0.85) for severe osteoporosis based on radiographic evidence. Only one study provided estimates of mean EQ-5D-US HSVs related to osteoporosis for comparison [63]. Sullivan et al. analyzed data from the Medical Expenditure Panel Survey and reported on chronic conditions based on “quality priority conditions” coding and “clinical classification categories.” Preference-based scores for osteoporosis and other chronic conditions were included in this study, and the mean EQ-5D-US score for osteoporosis of 0.753 and median of 0.80 were reported. Our mean for those with osteoporotic fracture, at 0.85, is slightly higher. Taking these studies into consideration, our sample appears to be a reasonably representative one for many clinical situations; however, for trials involving severe osteoporosis-related health states or predominantly older persons, the estimates from this model may not be generalizable.

The strength of associations between dimensions of EQ-5D, SF-6D, and OPAQ was as expected and consistent with the nature of the constructs measured. It should be noted that multiple correlations were assessed for statistical significance in this study, which increases the likelihood of detecting statistically significant relationships due to chance alone. However, given the nature of the constructs and the level of statistical significance (p < 0.001), we believe that these relationships are not likely to be due to chance.

While the subjects in this study were included if they had experienced a fracture within 5 years, recency is not a focus of this project and details on recency of a fracture are not available for analysis. Our aim was to link the specific OPAQ scores with health state values at the same point in time. Therefore, we used a cross-sectional approach, analyzing the OPAQ, EQ-5D, and SF-6D scores representing the same health state in time for that subject.

Conclusions

Estimation of EQ-5D-US and SF-6D health state values from OPAQ using the regression model under investigation appears to be a reasonable approach when actual values are not available and may facilitate cost-effectiveness analyses in trials where OPAQ is used. Future research should test the performance of this model prospectively.