Key Points for Decision Makers

The Health Utilities Index Mark 3 (HUI-3) preference score is estimated from the PROMIS-29 V2.0 scales.

The estimated HUI-3 preference scores can be used for economic applications.

Future research is needed to derive preference scores directly from the Patient-Reported Outcomes Measurement Information System (PROMIS®) measures.

1 Introduction

Health-related quality of life (HR-QOL) measures are often used to examine the effects of medical interventions. Generic HR-QOL profile measures provide scores for multiple health domains but not an overall index score [1–4]. Preference-based measures provide a single summary score assessing overall HR-QOL and are useful as an outcome measure [5], for monitoring the health of populations [6], and for estimating quality-adjusted life-years for economic evaluations [7]. They provide information on the value of different health states and can be used to estimate health outcomes for cost-effectiveness analyses.

Preference-based measures include the EuroQol EQ-5D-3L [8], the Quality of Well-Being Scale [9], the SF-6D [10], and the Health Utilities Index Mark 3 (HUI-3) [11]. Although each of these health indexes provides valuations on a 0 (dead) to 1 (perfect health/best imaginable health) scale (three of the four indexes include health states rated less than 0), they differ in health state classification systems, methods for preference assessment, and scoring algorithms. US normative data for these measures were reported in the National Health Measurement Study [12]. The different health indexes vary in their precision along the range of the underlying health status concept, but they are all related [13].

The National Institutes of Health (NIH) launched the Patient-Reported Outcomes Measurement Information System (PROMIS®) in 2004 with the goal of developing, evaluating, and disseminating publicly available item banks assessing HR-QOL (http://www.nihpromis.org). The PROMIS® project developed global health items and profile measures to assess multiple HR-QOL domains that are now widely used in the USA. These measures are designed to be administered efficiently and provide a common language across conditions. PROMIS® domains and global health items have been mapped to the EQ-5D-3L [14] but not to other widely used preference-based measures. Estimated health preference scores from the PROMIS® measures are useful when preferences for health states have not been assessed in a study.

The HUI-3 has eight attributes: vision, hearing, speech, ambulation, dexterity, emotion, cognition, and pain and discomfort [11]. Three of the attributes have five levels (speech, emotion, pain) and the other five have six levels (vision, hearing, ambulation, dexterity, cognition). The objective of this study was to estimate HUI-3 scores from the PROMIS® global health items and PROMIS-29 V2.0 profile measure. We also compared the estimated health preference scores with HUI-3 index scores by age and gender groups. We followed recommendations for the reporting of mapping studies [15, 16]. Our completed Mapping onto Preference-based measures reporting Standards (MAPS) checklist is available upon request.
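The attribute structure just described can be encoded as a small lookup table, which is useful for checking that a coded HUI-3 health state is well formed before scoring. The sketch below assumes, as in the HUI-3 classification, that level 1 denotes no impairment; the function names are hypothetical. Multiplying the level counts (6^5 × 5^3) gives the number of distinct health states the system can describe.

```python
# Number of levels per HUI-3 attribute, as listed in the text.
HUI3_LEVELS = {
    "vision": 6,
    "hearing": 6,
    "speech": 5,
    "ambulation": 6,
    "dexterity": 6,
    "emotion": 5,
    "cognition": 6,
    "pain": 5,
}


def validate_health_state(state: dict[str, int]) -> None:
    """Raise ValueError unless every attribute has a level in 1..n (1 assumed = no impairment)."""
    missing = set(HUI3_LEVELS) - set(state)
    extra = set(state) - set(HUI3_LEVELS)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, unexpected={sorted(extra)}")
    for attribute, level in state.items():
        if not 1 <= level <= HUI3_LEVELS[attribute]:
            raise ValueError(f"{attribute}: level {level} outside 1..{HUI3_LEVELS[attribute]}")


def count_health_states() -> int:
    """Total number of distinct health states the classification can describe."""
    total = 1
    for n_levels in HUI3_LEVELS.values():
        total *= n_levels
    return total
```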

2 Methods

2.1 Measures

The HUI-3 yields a preference score based on a multi-attribute utility function derived from visual analog scale and standard gamble preferences elicited from a general population sample in Hamilton (Ontario, Canada). PROMIS® has ten global health questions or items [17], including the widely used excellent-to-poor general health rating question [18]. The remaining nine global health items assess global physical health, physical functioning, pain, fatigue, general mental health, emotional distress, overall quality of life, satisfaction with social activities and relationships, and ability to carry out usual social activities and roles. The PROMIS-29 V2.0 profile measure assesses pain intensity using a single 0–10 numeric rating scale item and seven health domains using four items each: physical functioning, fatigue, pain interference, depressive symptoms, anxiety, ability to participate in social roles and activities, and sleep disturbance. The PROMIS® items and scales (see the Electronic Supplementary Material) in this study are conceptually similar to the HUI-3 attributes but do not include direct measures of cognition and sensation (vision, hearing, and speech).
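To make "multi-attribute utility function" concrete: the HUI-3 function is commonly reported in the multiplicative form u = 1.371 × (b_vision × b_hearing × ... × b_pain) − 0.371, where each b is a single-attribute utility for the level reported on that attribute. The sketch below implements only that combining step under this assumption; the published level-specific b values are not reproduced here and would have to be supplied from the HUI-3 scoring documentation.

```python
import math

HUI3_ATTRIBUTES = ("vision", "hearing", "speech", "ambulation",
                   "dexterity", "emotion", "cognition", "pain")


def hui3_utility(single_attribute_utilities: dict[str, float]) -> float:
    """Combine eight single-attribute utilities b_j (each in [0, 1]) multiplicatively.

    Assumes the form u = 1.371 * prod(b_j) - 0.371; the caller must supply
    the level-specific b_j values from the official scoring tables.
    """
    b = [single_attribute_utilities[name] for name in HUI3_ATTRIBUTES]
    if any(not 0.0 <= value <= 1.0 for value in b):
        raise ValueError("single-attribute utilities must lie in [0, 1]")
    return 1.371 * math.prod(b) - 0.371
```

With all b equal to 1 the score is 1.0 (full health). Because the attributes multiply, a single severely affected attribute pulls the overall score down sharply, and the score falls below 0 (worse than dead) whenever the product drops below 0.371/1.371, or about 0.27.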

Study participants completed the ten PROMIS® global health items, PROMIS-29 V2.0 profile measure, HUI-3, and demographic questions on the internet. They received nominal incentives from Op4G (http://op4g.com/) for completing the survey. The specific nature and value of the incentive varied, but did not exceed US$10.

2.2 Sample

We analyzed data collected from members of the Op4G internet panel (http://op4g.com/our-panel/). Op4G maintains a US national sample, and participants are required to update demographic information regularly. We specified quotas (fulfilled by Op4G) for region (18 % Northeast, 20 % Midwest, 37 % South, 33 % West), race/ethnicity (500 Hispanic, 500 African American, and 200 Asian), and education (14 % less than high school, 31 % high school degree, 28 % some college, 27 % college degree). Quotas were also set for 24 age–gender subgroups.

2.3 Analysis Plan

We estimated Spearman correlations between HUI-3 attribute levels and the corresponding PROMIS® domain scores. We estimated ordinary least squares regression equations predicting the HUI-3 preference scores from the PROMIS® global health items, PROMIS-29 V2.0 domain scores, and PROMIS-29 V2.0 items. First, we regressed the HUI-3 preference scores on the PROMIS® global health items, retaining items that were statistically significantly (p < 0.05) associated with HUI-3 preference scores. The global health items were scored assuming equal intervals, with a higher score representing better health. The 0–10 global pain item was recoded in accordance with PROMIS® convention [4, 17] into five categories based on the grouping of the 0–10 response scales for the Sheehan Disability Scale and the Flushing Questionnaire: 10 = 1 (worst pain), 7–9 = 2, 4–6 = 3, 1–3 = 4, and 0 = 5 (no pain).
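The five-category recode of the 0–10 global pain rating described above can be written as a small function; this is a sketch and the function name is hypothetical. Higher categories mean less pain, matching the "higher = better health" scoring of the global health items. The reverse-direction coding used later with the PROMIS-29 V2.0 scales (higher = more pain) is simply 6 minus this value.

```python
def recode_global_pain(rating: int) -> int:
    """Collapse a 0-10 pain rating into five categories, higher = better health."""
    if not 0 <= rating <= 10:
        raise ValueError("pain rating must be an integer from 0 to 10")
    if rating == 10:
        return 1  # worst pain
    if rating >= 7:
        return 2  # 7-9
    if rating >= 4:
        return 3  # 4-6
    if rating >= 1:
        return 4  # 1-3
    return 5  # 0 = no pain
```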

We regressed the HUI-3 preference scores on the PROMIS-29 V2.0 scales. We scored the scales following the PROMIS® convention that larger scale scores correspond to more of the concept depicted in the name. Thus, higher scores sometimes represent better health and sometimes worse health, depending on the name of the scale. For physical functioning and ability to participate in social roles and activities a higher score indicates better health, while a higher score on anxiety, depression, fatigue, sleep disturbance, pain interference, and pain intensity indicates worse health. We recoded the 0–10 global pain item into five categories: 0 = 1 (no pain), 1–3 = 2, 4–6 = 3, 7–9 = 4, and 10 = 5 (worst pain). Next, we regressed the HUI-3 preference scores on PROMIS-29 V2.0 items, with higher scores corresponding to the item name (11 items were coded so that a higher score is worse health and six items were coded so that a higher score is better health). For this model, we used forward stepwise regression to identify the 17 items with statistically significant (p < 0.05) unique associations with the HUI-3 preference scores.
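The item-level selection step can be illustrated with a simplified forward-selection sketch: at each step the candidate item with the largest |t| is added if its p-value is below 0.05, and selection stops when no candidate qualifies. This is an assumption-laden illustration, not the authors' software: it has no removal step (some stepwise procedures also drop variables), and it approximates t-test p-values with the normal distribution, which is reasonable only for samples in the thousands. It assumes NumPy is available.

```python
import math

import numpy as np


def ols(X: np.ndarray, y: np.ndarray):
    """Fit OLS with an intercept; return coefficients and their standard errors."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    df = len(y) - design.shape[1]
    sigma2 = residuals @ residuals / df
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef, np.sqrt(np.diag(cov))


def two_sided_p(t: float) -> float:
    """Normal approximation to a two-sided t-test p-value (large-sample assumption)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def forward_stepwise(X: np.ndarray, y: np.ndarray, alpha: float = 0.05) -> list[int]:
    """Greedily add the column with the largest |t| while its p-value is below alpha."""
    selected: list[int] = []
    remaining = list(range(X.shape[1]))
    while remaining:
        best_col, best_abs_t = None, 0.0
        for col in remaining:
            coef, se = ols(X[:, selected + [col]], y)
            t = coef[-1] / se[-1]  # t statistic of the newly added column
            if two_sided_p(t) < alpha and abs(t) > best_abs_t:
                best_col, best_abs_t = col, abs(t)
        if best_col is None:
            break
        selected.append(best_col)
        remaining.remove(best_col)
    return selected
```

In this setting, each column of `X` would hold one PROMIS-29 V2.0 item (coded in the direction of its name) and `y` the HUI-3 preference scores.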

Regression-based predictions are biased because of regression to the mean: predicted scores have less variance than observed scores, so low scores tend to be overpredicted and high scores underpredicted. Linear equating reduces this problem [19]. Because our objective was to map PROMIS® scores to the equivalent HUI-3 preference-based scores, we linearly transformed the predicted scores from each of the three regression models to have the same mean and SD as the observed HUI-3 preference-based scores (i.e., linear equating). We then recoded mapped (equivalent) scores outside the observed −0.359 to 1.000 range to the nearest bound of that range [19].
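Linear equating as described here amounts to standardizing the regression predictions and re-expressing them on the observed HUI-3 mean and SD, followed by clipping to the observed range. A minimal sketch, assuming NumPy and sample SDs (`ddof=1`); the function name is hypothetical. Because clipping happens after the rescaling, the equated scores can end up with a slightly different mean and a slightly smaller SD than the observed scores.

```python
import numpy as np

HUI3_MIN, HUI3_MAX = -0.359, 1.000  # observed range reported in the text


def linear_equate(predicted, observed, lower=HUI3_MIN, upper=HUI3_MAX):
    """Rescale predictions to the observed mean and SD, then clip to [lower, upper]."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    z = (predicted - predicted.mean()) / predicted.std(ddof=1)  # standardized predictions
    equated = observed.mean() + z * observed.std(ddof=1)       # observed-score metric
    return np.clip(equated, lower, upper)
```

When applying a mapping to new data, the means and SDs would normally be fixed at their derivation-sample values rather than recomputed from the new sample.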

To obtain an estimate of capitalization on chance in our regression models, we split the sample into two random halves and derived regression equations on the first random half and applied those equations to the second random half sample. We estimated product-moment and intra-class correlations between predicted and observed HUI-3 preference scores in the first random half sample and compared them to the correlations of observed HUI-3 preference scores with predicted scores in the second half.
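The split-half check can be sketched as follows: fit OLS on one random half, predict both halves, and compare product-moment and intra-class correlations. The text does not specify the ICC form; the sketch assumes the two-way, absolute-agreement, single-measure ICC (often written ICC(A,1)), which penalizes differences in level and spread as well as imperfect ranking. NumPy is assumed, and the function names are hypothetical.

```python
import numpy as np


def icc_absolute_agreement(a, b) -> float:
    """ICC(A,1): two-way, absolute-agreement, single-measure ICC for two score sets."""
    data = np.column_stack([a, b]).astype(float)
    n, k = data.shape
    grand = data.mean()
    ss_rows = k * ((data.mean(axis=1) - grand) ** 2).sum()  # between persons
    ss_cols = n * ((data.mean(axis=0) - grand) ** 2).sum()  # between score sources
    ss_total = ((data - grand) ** 2).sum()
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


def split_half_validation(X, y, seed=0):
    """Fit OLS on a random half; return {half: (Pearson r, ICC)} for both halves."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    derivation, validation = idx[:half], idx[half:]
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design[derivation], y[derivation], rcond=None)
    results = {}
    for name, rows in (("derivation", derivation), ("validation", validation)):
        predicted = design[rows] @ coef
        r = np.corrcoef(predicted, y[rows])[0, 1]
        results[name] = (r, icc_absolute_agreement(predicted, y[rows]))
    return results
```

For example, two score sets that differ only by a constant shift of 1 (such as 1, 2, 3, 4 and 2, 3, 4, 5) have a Pearson correlation of 1 but an ICC(A,1) of 10/13. Unequated regression predictions also have a smaller SD than the observed scores, so their ICC tends to fall below their product-moment correlation.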

We compared estimated HUI-3 preference scores with observed scores overall and by age and gender subgroups. In addition, we estimated HUI-3 preference scores in the original PROMIS® Wave 1 data collected in 2007–2008 [20] using the regression equation based on the PROMIS-29 V2.0 scales and adding a constant: the difference between the US general population and current-study HUI-3 preference score means, multiplied by the ratio of their standard deviations (SDs; US general population SD divided by current-study SD).
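The adjustment constant can be worked through with the rounded summary statistics reported in the Results (US mean 0.87, SD 0.21; current sample mean 0.544, SD 0.400). This is a sketch of the arithmetic only; the function name is hypothetical. With these rounded inputs the constant is about 0.1712, close to the 0.17103 given in the Discussion; the small gap presumably reflects rounding of the published means and SDs.

```python
US_MEAN, US_SD = 0.87, 0.21          # Joint Canada/United States Survey of Health
STUDY_MEAN, STUDY_SD = 0.544, 0.400  # observed HUI-3 scores in this sample


def population_shift(target_mean, target_sd, sample_mean, sample_sd):
    """Mean difference rescaled by the SD ratio (target SD / sample SD)."""
    return (target_mean - sample_mean) * (target_sd / sample_sd)


shift = population_shift(US_MEAN, US_SD, STUDY_MEAN, STUDY_SD)  # about 0.171
```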

3 Results

The sample consisted of 3000 individuals: 51 % female; 17 % Hispanic, 60 % non-Hispanic white, 14 % non-Hispanic black, and 9 % Asian; and 14 % less than a high school education, 31 % high school graduates, and 55 % education beyond high school. Age was distributed as 30 % 18–34 years, 18 % 35–44 years, 19 % 45–54 years, 16 % 55–64 years, 9 % 65–74 years, and 8 % 75–88 years. Fifty-six percent of the sample were married or living with a partner. The demographic characteristics of the sample were similar to those of the US general population, but respondents reported worse health by about half an SD on PROMIS® domains compared with the PROMIS® Wave 1 general population sample, which is comparable with the 2000 US Census [21]. Thirty-four percent of the sample reported having been told by a doctor that they have high blood pressure, 20 % arthritis or rheumatism, 17 % asthma, 16 % migraines, 11 % diabetes mellitus, 10 % angina, 5 % heart attack, 5 % cancer (other than non-melanoma skin cancer), 5 % chronic lung disease, 4 % congestive heart failure, 3 % liver disease, and 3 % kidney disease. The relatively poor health of the sample was also reflected in an average HUI-3 preference score of 0.544 (SD = 0.400) compared with a US mean of 0.87 (SD = 0.21) in the Joint Canada/United States Survey of Health [22, 23].

Spearman correlations of the PROMIS® 4-item physical functioning scale with the HUI-3 ambulation and dexterity attributes were 0.70 and 0.55, respectively. The PROMIS® depressive symptoms scale (four items) correlated −0.62 with HUI-3 emotion. The PROMIS® pain interference scale (four items) correlated −0.68 with the HUI-3 pain attribute.

Item missing rates were less than 0.2 %; sample sizes for the multivariate analyses reported below were 2994 or larger.

3.1 Global Health Items

Six of the global health items were significantly associated with the HUI-3 preference score and together accounted for 48 % of its variance (adjusted R²) (Table 1). The strongest unique associations (standardized beta) with the HUI-3 preference scores were observed for the physical functioning and the pain rating items. The resulting equated HUI-3 preference scores had a mean of 0.530 and an SD of 0.377 compared with the observed HUI-3 preference score mean of 0.544 and SD of 0.400. The product–moment correlation between the equated and observed HUI-3 preference scores was 0.70 (n = 2994, p < 0.0001); the intra-class correlation between equated and observed scores was also 0.70.

Table 1 Regression of Health Utilities Index Mark 3 (HUI-3) preference scores on PROMIS® global health items

3.2 PROMIS-29 V2.0 Scales

Six of the PROMIS-29 V2.0 scales were significantly associated with the HUI-3 preference scores and together accounted for 61 % of their variance (Table 2). Because of a suppression effect for the global pain rating item (i.e., its zero-order correlation was negative but its regression coefficient was positive), we re-ran the regression model with the item removed (Table 3); the variance explained by the model did not change (61 %). The strongest unique associations with the HUI-3 preference scores were observed for the physical functioning and depressive symptoms scales.

Table 2 Regression of Health Utilities Index Mark 3 (HUI-3) preference scores on PROMIS-29 V2.0 scales
Table 3 Regression of Health Utilities Index Mark 3 (HUI-3) preference scores on PROMIS-29 V2.0 scales with the global pain rating item dropped

The equated HUI-3 preference scores had a mean of 0.524 and an SD of 0.371 compared with the observed HUI-3 preference score mean of 0.544 and SD of 0.400. The equated HUI-3 preference scores correlated (product–moment) 0.78 (n = 2996; p < 0.0001) with the observed HUI-3 preference scores; the intra-class correlation between equated and observed HUI-3 preference scores was also 0.78.

3.3 PROMIS-29 V2.0 Items

The regression model for the PROMIS-29 V2.0 items showed that 17 items had significant unique associations and accounted for 64 % of the variance in the HUI-3 preference scores (Table 4). Among the 17 items, two displayed suppression effects (sleep quality and feel fatigued). The four strongest unique associations with the HUI-3 preference scores were found for three physical functioning items (do chores such as vacuuming or yard work, run errands and shop, walk at least 15 minutes) and one depressive symptoms item (felt hopeless).

Table 4 Regression of Health Utilities Index Mark 3 (HUI-3) preference scores on PROMIS-29 V2.0 items

The equated HUI-3 preference scores had a mean of 0.542 and an SD of 0.391 compared with the observed HUI-3 preference score mean of 0.544 and SD of 0.400. The equated HUI-3 preference scores correlated (product–moment) 0.80 (n = 2994; p < 0.0001) with the observed HUI-3 preference scores; the intra-class correlation between equated and observed scores was also 0.80.

3.4 Cross-Validation of Regression Equations

The product–moment and intra-class correlations between estimated and observed HUI-3 preference scores from a regression equation of the global health items in the first random half were 0.72 and 0.68 (n = 1513), respectively, compared with 0.67 and 0.63 (n = 1481) when applying the equation to the second random half sample. The product–moment and intra-class correlations between estimated and observed HUI-3 preference scores from a regression equation of the PROMIS-29 V2.0 scales in the first random half were 0.79 and 0.77 (n = 1515), respectively, compared with 0.76 and 0.74 (n = 1481) when applying the equation to the second random half sample.
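Squaring these product-moment correlations shows how much explained variance is lost when the equations are applied to the hold-out half. A short worked calculation (the function name is hypothetical): the global health item model drops from about 0.518 to 0.449 (about 7 percentage points) and the PROMIS-29 V2.0 scale model from about 0.624 to 0.578 (about 5 percentage points).

```python
def r_squared_shrinkage(r_derivation: float, r_validation: float) -> float:
    """Drop in explained variance (R^2) from the derivation to the validation half."""
    return r_derivation**2 - r_validation**2


# (derivation r, validation r) pairs reported in this section.
split_half_r = {
    "global health items": (0.72, 0.67),
    "PROMIS-29 V2.0 scales": (0.79, 0.76),
}

for model, (r_der, r_val) in split_half_r.items():
    print(f"{model}: R^2 shrinkage = {r_squared_shrinkage(r_der, r_val):.4f}")
```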

3.5 Estimated Versus Observed Health Utilities Index Mark 3 (HUI-3) Preference Scores by Age and Gender

The correspondence between observed HUI-3 and equated preference scores overall and by age and gender groups is summarized in Table 5. Average equated scores were within 0.02 of observed scores for the overall sample, which is less than the 0.03 difference in scores that is regarded as minimally important [24]. Equated scores tended to be more discrepant from observed scores for the oldest study participants. For example, equated scores based on the PROMIS-29 V2.0 scales were 0.13 higher than observed scores for males 75–88 years old (0.25 vs. 0.12). The general pattern of equated HUI-3 preference scores showed a decline by age, but those aged 55–74 years (55–64 and 65–74 years age subgroups) tended to have higher scores than other age groups.
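The subgroup comparison amounts to averaging observed and equated scores within each age–gender cell and flagging cells whose difference reaches the 0.03 minimally important difference. A sketch in plain Python; the function name and the example records are hypothetical, chosen to mimic the pattern described for older males.

```python
from collections import defaultdict
from statistics import mean

MINIMALLY_IMPORTANT_DIFFERENCE = 0.03  # threshold cited in the text [24]


def group_discrepancies(records):
    """records: iterable of (group_label, observed_hui3, equated_hui3) tuples.

    Returns {group: (observed_mean, equated_mean, difference, reaches_mid)}.
    """
    grouped = defaultdict(lambda: ([], []))
    for group, observed, equated in records:
        grouped[group][0].append(observed)
        grouped[group][1].append(equated)
    summary = {}
    for group, (observed, equated) in grouped.items():
        diff = mean(equated) - mean(observed)
        summary[group] = (mean(observed), mean(equated), diff,
                          abs(diff) >= MINIMALLY_IMPORTANT_DIFFERENCE)
    return summary


# Hypothetical records for illustration only.
example = group_discrepancies([
    ("male 75-88", 0.10, 0.24),
    ("male 75-88", 0.14, 0.26),
    ("female 18-34", 0.60, 0.61),
])
```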

Table 5 Observed versus equated Health Utilities Index Mark 3 (HUI-3) preference scores by age and gender (standard error)

3.6 Estimated HUI-3 Preference Scores in the General Population from PROMIS-29 V2.0 Scales

The estimated HUI-3 preference scores in the PROMIS® Wave 1 sample using the PROMIS-29 V2.0 scales are similar to US general population norms reported for males by Fryback et al. [12], but HUI-3 preference estimates derived from the current study were higher (more positive) for females (Table 6).

Table 6 Estimated Health Utilities Index Mark 3 (HUI-3) preference scores in the PROMIS® Wave 1 general population sample using algorithm derived from PROMIS-29 V2.0 scales in the Op4G sample (standard error)

4 Discussion

The PROMIS® measures were rigorously developed and allow flexibility in administration using either targeted short forms or computerized adaptive testing [20]. The availability of HUI-3 preference scores based on the PROMIS® global items and PROMIS-29 V2.0 profile measure enables potential application of these measures to population-based studies and economic evaluations.

The regression models estimated here accounted for between 48 and 64 % of the variance in the HUI-3 preference scores. The best prediction was obtained for the PROMIS-29 V2.0 items, followed closely by the PROMIS-29 V2.0 scale scores and then the PROMIS® global health items. In comparison, PROMIS® Wave 1 scale scores and global health items accounted for 57 and 65 %, respectively, of the variance in the EQ-5D-3L [14]. The equated HUI-3 preference scores based on PROMIS® measures were comparable with those directly assessed using the HUI-3 in this sample. Intra-class correlations were good according to the poor, fair, moderate, good, or very good categorization suggested by Altman [25]. The largest differences between average equated and observed scores were found for older individuals, especially 75- to 88-year-old males. The higher mean equated scores for those 55–74 years old are consistent with the observed HUI-3 preference scores reported by Fryback et al. [12].
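The verbal labels can be made explicit with a small helper. The cut-points below (0.20, 0.40, 0.60, 0.80) are the ones commonly attributed to Altman for agreement coefficients; they are assumed here and should be checked against [25]. Under these cut-points the intra-class correlations reported in the Results (0.70, 0.78, and 0.80) all fall in the "good" band.

```python
def altman_category(coefficient: float) -> str:
    """Label an agreement coefficient using assumed Altman-style cut-points."""
    if coefficient <= 0.20:
        return "poor"
    if coefficient <= 0.40:
        return "fair"
    if coefficient <= 0.60:
        return "moderate"
    if coefficient <= 0.80:
        return "good"
    return "very good"
```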

We recommend use of the PROMIS-29 V2.0 scales to estimate HUI-3 preference scores in cases where only one approach is desired, because the variance explained was similar to that of the best regression prediction equation (PROMIS-29 V2.0 items) and the PROMIS-29 V2.0 scales allow for greater flexibility in choice of items in a study. That is, the HUI-3 can be estimated from any subset of PROMIS® items that yields estimates of physical functioning, depressive symptoms, pain interference, ability to participate in social roles and activities, and anxiety scale scores. Item response theory scores for the PROMIS-29 V2.0 scales can be estimated within Assessment Center (http://www.assessmentcenter.net). Predicted HUI-3 preference scores can be obtained using the following equation:

$$ \widehat{\text{HUI-3}} = 0.42094 + 0.01704 \times \text{physical functioning} - 0.00793 \times \text{depressive symptoms} - 0.00505 \times \text{pain interference} + 0.00451 \times \text{ability to participate in social roles and activities} - 0.00313 \times \text{anxiety} $$

These predicted scores can then be adjusted to the US general population by adding 0.17103, the difference between the US general population and current-study HUI-3 preference score means multiplied by the ratio of their SDs (US general population SD divided by current-study SD). Any scores below −0.359 should be recoded to −0.359, and scores greater than 1.000 should be recoded to 1.000.
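The scoring steps just described (linear predictor, optional US population adjustment, recoding to the −0.359 to 1.000 range) can be collected into one function. This is a sketch rather than official scoring code. It assumes the inputs are PROMIS-29 V2.0 T-scores (US general population mean 50, SD 10) as produced by Assessment Center, and the dictionary keys and function name are hypothetical. At T = 50 on all five scales it returns about 0.693 unadjusted and 0.864 after adjustment, close to the US mean of 0.87.

```python
# Coefficients from the PROMIS-29 V2.0 scale equation in the text; T-score inputs assumed.
INTERCEPT = 0.42094
COEFFICIENTS = {
    "physical_functioning": 0.01704,
    "depressive_symptoms": -0.00793,
    "pain_interference": -0.00505,
    "social_roles": 0.00451,  # ability to participate in social roles and activities
    "anxiety": -0.00313,
}
US_ADJUSTMENT = 0.17103
LOWER, UPPER = -0.359, 1.000


def estimate_hui3(scores: dict[str, float], adjust_to_us: bool = True) -> float:
    """Estimate an HUI-3 preference score from five PROMIS-29 V2.0 scale T-scores."""
    value = INTERCEPT + sum(COEFFICIENTS[name] * scores[name] for name in COEFFICIENTS)
    if adjust_to_us:
        value += US_ADJUSTMENT
    return min(max(value, LOWER), UPPER)  # recode out-of-range scores to the bounds


average_person = {name: 50.0 for name in COEFFICIENTS}
print(estimate_hui3(average_person))
```

Consistent with the Conclusion, estimates like these are intended for group-level summaries (for example, averaging `estimate_hui3` over a sample) rather than for individual patients.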

There are several limitations associated with these analyses. First, the participants in this study were from an internet panel and had worse average HR-QOL than US national probability-based samples, indicating that the sample is not representative of the US general population [26]. However, the sample included a wide range of HUI-3 preference scores and is therefore useful for equating PROMIS® scores to the HUI-3 preference score. Second, the analyses are based on only a single dataset, and variance explained in a derivation random half subsample was inflated by 5–7 percentage points compared with a cross-validated random half subsample. Third, the PROMIS® and HUI-3 items were self-administered by web-based methods and responses could differ for other modes of administration [27]. However, a comparison of responses to PROMIS® items administered by different modes (interactive voice response, paper questionnaire, personal digital assistant, or personal computer) showed method equivalence [28]. Fourth, the PROMIS® measures were collected in the US but the HUI-3 scoring function was derived from a representative sample of Canadians. However, estimated scoring functions for the HUI-3 are very similar among Canada [11], the Netherlands [29], France [30], and Spain [31]. Finally, it is preferable to include the HUI-3 itself or assess preferences directly (i.e., time trade-off, standard gamble) rather than estimate the HUI-3 preference scores. When neither is possible, however, the estimates provided here offer a second-best approach. A previous study used discrete-choice experiments to derive preferences for health states from the PROMIS-29 V1.0, but the estimates produced were implausible: the mean was 0.16 in a sample drawn from the US general population [32]. Potentially better alternative methods for directly eliciting preferences in PROMIS® have been proposed [33].

5 Conclusion

We estimated HUI-3 preference scores accurately from PROMIS® global health items and the PROMIS-29 V2.0 scales, and these mapped preference scores varied as expected by demographic characteristics in the PROMIS® sample. Additional research is needed to further evaluate the validity of the estimated index scores. In addition, studies are needed to examine other possible approaches to deriving preference-based scores from the PROMIS® measures. These mapped HUI-3 preference scores have applications in measuring the health of populations and estimating quality-adjusted life-years for economic evaluations. We recommend that these estimated HUI-3 preference scores be used only for group-level (not individual-level) applications. Given the flexibility of multi-domain short forms and computerized adaptive testing, the PROMIS® domain item banks and domain scores may be very useful in clinical studies.