Introduction

Quantifying the health of populations is important for designing and evaluating policies and setting research priorities [1, 2]. In the last two decades, there has been an increasing emphasis on using patient-reported outcomes of functioning and well-being, or health-related quality of life (HRQoL), to quantify and monitor population health [3]. For example, in the USA, either the SF-36 or the VR-12 has been used in the Medicare Health Outcomes Survey since 1998 [4, 5], and the SF-12 has been used in the Medical Expenditure Panel Survey since 2003 [6]. HRQoL measures selected for measuring population health have usually been disease agnostic (generic), providing an overall description of health not limited to one organ system or disease, thus making them applicable to all survey participants [7]. In addition to being generic, most measures used for population health evaluations have a single composite score to simplify analyses and comparisons. Multi-attribute utility theory provides a framework to develop a single preference-based score from multiple health domains [8, 9].

Because social determinants of health (SDoH) have many known direct and indirect effects on HRQoL, any measure used for population evaluation should be sensitive to these effects [10,11,12,13,14,15]. Several frameworks for understanding the relationships between SDoH, medical care, health conditions, and HRQoL are available [16]. For example, the World Health Organization constructed an explanatory and action-oriented conceptual framework of the relationship between SDoH and health and well-being, which includes elements of structural determinants, social determinants, socioeconomic position, individual circumstances, and the health system [17]. Models such as the WHO’s emphasize that HRQoL is not just a product of disease and healthcare, but is also directly and indirectly influenced by SDoH.

The most widely used generic preference-based measures in clinical intervention evaluations are the EuroQol-5D (EQ-5D) [18, 19], the Health Utilities Index (HUI) [20, 21], and the SF-6D [22]. These measures have also been shown to be sensitive to SDoH such as income, education, financial security, perceived discrimination, and gender [10,11,12,13,14,15]. Each of these measures, however, has limitations, including ceiling effects in the general population and an inability to robustly assess individual health domains (such as physical functioning or depression) [23]. The PROMIS-Preference (PROPr) score was built, in part, to address these limitations.

PROPr is based on the Patient-Reported Outcomes Measurement Information System (PROMIS) [24]. Using PROMIS as the foundation for a preference-based score addresses several of the limitations of other scoring systems because the PROMIS domain measures have been developed using item response theory (IRT) [25]. IRT is a modern psychometric measurement technique in which a pool of items is calibrated over a latent construct to create an “item bank”. In HRQoL measurement, the latent constructs include concepts such as depression, pain interference, physical function, and fatigue; PROMIS currently has over 90 adult item banks and over 20 pediatric item banks. Every response to every item within a bank provides information about a respondent’s location on the latent construct. This allows a single item bank to be adapted to fit many projects’ needs while maintaining a score on the latent construct that is comparable to that of any other project that used the same item bank and calibration. For example, physical function could be measured by a standardized six-item short form in a clinical setting, a custom two-item short form in a population study, and a computer adaptive test in a research study, and the resulting physical function scores can be compared. This is true even if each physical function measure used entirely different items from the item bank.
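To make the idea of calibrated items concrete, the following minimal sketch computes response-category probabilities for a single five-option item under Samejima's graded response model, a model commonly used for polytomous item banks. The discrimination and threshold values are hypothetical and chosen only for illustration; they are not PROMIS calibrations, and the function name is an assumption of this sketch.

```python
import math


def grm_category_probabilities(theta, discrimination, thresholds):
    """Probability of each response category for one item (graded response model).

    theta: respondent's location on the latent construct.
    discrimination: item slope (hypothetical value in this sketch).
    thresholds: increasing category boundaries; K - 1 values give K categories.
    """
    # Cumulative probabilities of responding in category k or higher.
    cumulative = [
        1.0 / (1.0 + math.exp(-discrimination * (theta - b))) for b in thresholds
    ]
    # Pad with 1 (lowest category or higher) and 0 (above the highest category).
    bounds = [1.0] + cumulative + [0.0]
    # Each category probability is the difference of adjacent cumulative values.
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


# Hypothetical item with five response options (four thresholds).
probabilities = grm_category_probabilities(
    theta=0.0, discrimination=2.0, thresholds=[-1.5, -0.5, 0.5, 1.5]
)
print([round(p, 3) for p in probabilities])
```

Because every item in a bank is placed on the same latent scale in this way, different subsets of items can in principle yield comparable estimates of a respondent's location.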

By linking directly to PROMIS domains, PROPr gains many of the advantages of an IRT-based descriptive system including flexible administration of items and the potential to avoid ceiling and floor effects [26]. PROPr also creates the possibility of collecting both health domain information and preference-based scores from the same questions. Moreover, PROMIS and PROPr are both freely available, whereas the other measures require licensing.

Before any measure can be adopted for widespread use, its validity needs to be demonstrated [7]. PROPr has been shown to have construct validity for chronic conditions with minimal ceiling and floor effects in the general population [27]. This report evaluates the construct validity of PROPr by its association with SDoH in the general US adult population.

Methods

Data

Data used in this study are from a general population sample designed to be representative of US adults aged 18 and older. Participants were selected from the AmeriSpeak Panel maintained by NORC at the University of Chicago using sampling strata based on age, race, Hispanic ethnicity, education, and gender [28]. The primary focus of the study was to better understand individuals’ HRQoL and its relationship to SDoH. The survey was offered in English and Spanish and was made available on the web and by phone in December 2018. AmeriSpeak panelists are consented for research when they join the panel, and study participants received “Ameripoints” (a cash equivalent point system) from AmeriSpeak for completing surveys. NORC provided demographic information about the sample, latitude and longitude of the participants’ household, and the sampling weights necessary to make nationally representative estimates. Data used in these analyses are available online at the Open Science Framework [29].

HRQoL measures

Because of substantial redundancy in questions across the HRQoL measures, participants were randomly assigned to complete either (1) the EQ-5D-5L and SF-12 or (2) the HUI Marks 2 and 3. All participants received PROPr and all SDoH questions.

PROPr

PROPr is based on 7 Adult PROMIS domains: Cognitive Function—Abilities v2.0, Depression v1.0, Fatigue v1.0, Pain Interference v1.0, Physical Function v2.0, Sleep Disturbance v1.0, and Ability to Participate in Social Roles and Activities v2.0 [30]. The PROMIS questions refer to the participant’s own health “in the past 7 days” and have five response options. Participants answered 2 questions per domain for a total of 14 questions. Responses were scored by uploading them to the scoring service on Assessment Center, and scoring used the default IRT parameters for each item [31]. The PROPr scoring algorithm was developed from standard gamble valuations from a US sample [26]. Possible PROPr scores range from −0.022 (worst) to 1.0 (best).

To keep sample sizes consistent across measures, PROPr scores were split into two sets (set 1 and set 2) using the same randomization that assigned participants to the other HRQoL measures.

EQ-5D-5L

The EQ-5D-5L questions refer to “your health today”. The EQ-5D descriptive system has five domains (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression) and the -5L version has five response options (no problems, slight problems, moderate problems, severe problems, extreme problems/unable to do) [32]. This study used the US time trade-off value set [19]. Possible scores range from −0.699 (worst) to 1.0 (best).

Health utilities index (HUI)

The original HUI Mark 2 classification system included seven attributes: sensation, mobility, emotion, cognition, self-care, pain, and fertility. Fertility is not currently used in the HUI Mark 2 questionnaire, so current versions have six attributes and the disutility of the fertility attribute is set at 0 [20]. The HUI Mark 3 defines health using eight attributes: vision, hearing, speech, ambulation, dexterity, emotion, cognition, and pain [21]. The 15-item self-administered HUI questionnaire allows scoring of both Mark 2 and Mark 3. HUI questions refer to “your level of ability or disability during the past week”. Scoring algorithms for both HUI Mark 2 and HUI Mark 3 were derived from standard gamble assessments made by adults in community samples in Hamilton, Ontario. HUI Mark 2 scores range from −0.03 (worst) to 1.0 (best) [20]; HUI Mark 3 scores range from −0.36 (worst) to 1.0 (best) [21].

Short form-6D (SF-6D) from the SF-12

The Short Form-12 (SF-12) version 1 has 12 multiple-choice items that relate to eight dimensions: physical functioning, physical role limitations, emotional role limitations, pain, general health, vitality, social functioning, and mental health [33]. The SF-6D scoring algorithm uses seven of the questions from the SF-12. These seven questions were evaluated using the standard gamble technique in a representative sample of the UK population [22]. Possible SF-6D scores range from 0.345 (worst) to 1.0 (best).

Self-reported SDoH

Participants had several demographic and survey variables provided by the panel company (age, gender, race, Hispanic ethnicity, survey language, survey mode). They were also asked questions about the SDoH listed below. Most of these questions were extracted from large US national surveys such as the National Health Interview Survey [34] or the National Health and Nutrition Examination Survey [35]. Specific sources, questions, responses, and coding are included in the Supplement. All SDoH were binary or categorical outcomes and coded such that the reference category would have the least hardship.

  • Health literacy

  • Educational attainment

  • Has a usual medical provider

  • Medical provider CAHPS scores

  • Difficulty of getting to medical appointments

  • Medical insurance type

  • Difficulty paying medical deductibles

  • Difficulty paying medical copays

  • Difficulty paying monthly bills

  • Experienced intimate partner violence in the last year

  • Food insecurity

  • Social isolation

  • Stress

  • Social support

  • Living with partner

  • Employment

  • Household income

  • House ownership

SDoH from geographic information system linkages

A participant’s household latitude and longitude were linked to several data sources. Census tract information was drawn from the 2016 American Community Survey [36]. Air quality information was drawn from the 2017 Environmental Protection Agency Air Quality System [37]. County crime statistics were drawn from the 2014 Uniform Crime Reporting Data [38]. Variables are given as follows:

  • Census tract percent education less than high school

  • Census tract percent below poverty

  • Census tract percent household income under $10,000

  • Census tract percent household with food stamps

  • Census tract percent unemployed

  • Census tract percent vacant housing

  • Census tract percent medically uninsured ages 18–34

  • Air quality unhealthy for sensitive groups

  • Air quality median score

  • County rate of violent crime

  • County rate of property crime

  • County rate of drug possession crime

  • County rate of drug sale crime

Self-reported chronic conditions

Participants were asked, “Have you ever been told by a doctor or other health professional that you have any of the following?”:

  • Coronary heart disease

  • Angina (also called angina pectoris)

  • A heart attack (also called myocardial infarction)

  • A stroke

  • Emphysema

  • Chronic obstructive pulmonary disease (also called COPD)

  • Asthma

  • Cancer or malignancy of any kind

  • Arthritis, gout, lupus, or fibromyalgia

  • A seizure disorder or epilepsy

  • A depressive disorder, including depression, major depression, dysthymia, or minor depression

  • Diabetes or sugar diabetes

Statistical analysis

Information about sample demographics, SDoH exposure, and presence of chronic conditions is presented as percentages or means, as appropriate. Because all participants were given PROPr but participants were randomly assigned to complete either the HUI Mark 2 and 3 or the EQ-5D and SF-6D, PROPr scores were split into two sets. This ensures that any differences in the number of SDoH that are statistically significant are not due to different sample sizes across measures. Summary scores of each measure are presented as histograms. Percent of participants at the ceiling was calculated, and score skewness was quantified with Pearson's moment coefficient of skewness. Convergent validity was evaluated using Pearson correlations between summary scores within subjects.
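The two distributional summaries named above can be sketched as follows. This is an illustrative, unweighted version: it ignores the survey weights used in the study, the scores are made up, and the function names are assumptions of this sketch rather than the authors' SAS code.

```python
def percent_at_ceiling(scores, ceiling=1.0, tol=1e-9):
    """Percentage of respondents whose score equals the maximum possible score."""
    at_ceiling = sum(1 for s in scores if abs(s - ceiling) <= tol)
    return 100.0 * at_ceiling / len(scores)


def moment_skewness(scores):
    """Pearson's moment coefficient of skewness: m3 / m2 ** 1.5.

    m2 and m3 are the second and third central moments (population form).
    """
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m3 = sum((s - mean) ** 3 for s in scores) / n
    return m3 / m2 ** 1.5


# Hypothetical preference scores, invented purely for illustration.
example_scores = [1.0, 1.0, 0.95, 0.9, 0.85, 0.8, 0.6, 0.3]
print(percent_at_ceiling(example_scores))
print(moment_skewness(example_scores))
```

A negative coefficient indicates a long tail toward low scores, which is the typical shape for preference-based scores in a general population sample.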

Known-groups validity was evaluated using ordinary least squares regression in which a summary score was regressed on age, gender, health conditions, and a single SDoH. The coefficient(s) for the SDoH is the estimated effect on the preference score of being exposed to an SDoH vs. not being exposed. Categorical variables (such as education category or insurance category) were coded as dummy variables. Because the ranges of the census tract and county variables varied widely, their associated coefficients also varied widely. These variables were standardized by multiplying each variable’s coefficient and standard error by the variable’s range. For example, “census tract education less than high school” values vary from 0 to 0.695 and its unstandardized coefficient is −0.237 with a standard error of 0.052; its standardized coefficient is −0.237 × 0.695 = −0.165 with a standard error of 0.052 × 0.695 = 0.036. A separate analysis was done for each SDoH. Since all SDoH were coded such that higher scores indicated more hardship, negative coefficients are expected. A p < 0.05 was considered statistically significant. For SDoH with statistically significant coefficients in at least four of the six measures, Spearman correlations were calculated using the SDoH coefficients between the different measures.
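The regression and range standardization described above can be illustrated with a small sketch. It is deliberately simplified: it is unweighted, it estimates coefficients only (no standard errors or p-values), and all data, coefficient values, and function names are hypothetical rather than taken from the study.

```python
def ols_coefficients(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def range_standardize(coefficient, standard_error, low, high):
    """Rescale a coefficient and its SE to a move across the variable's full range."""
    width = high - low
    return coefficient * width, standard_error * width


# Hypothetical rows: [intercept, age, female, has_condition, sdoh_exposed].
design = [
    [1, 25, 0, 0, 0], [1, 40, 1, 0, 1], [1, 55, 0, 1, 0], [1, 70, 1, 1, 1],
    [1, 30, 1, 0, 0], [1, 65, 0, 1, 1], [1, 45, 0, 0, 1], [1, 50, 1, 1, 0],
]
# Made-up scores generated from known coefficients so the fit can be checked.
true_b = [0.90, -0.002, -0.01, -0.05, -0.08]
scores = [sum(b * x for b, x in zip(true_b, row)) for row in design]

fitted = ols_coefficients(design, scores)
sdoh_coefficient = fitted[4]  # adjusted score difference, exposed vs. not exposed
print(round(sdoh_coefficient, 3))

# Made-up coefficient, SE, and observed range for a continuous area-level SDoH.
print(range_standardize(-0.10, 0.02, low=0.0, high=0.5))
```

Multiplying by the range expresses each area-level coefficient as the score difference between the lowest and highest observed values of the variable, which puts variables with very different scales on a comparable footing.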

All analyses were performed using SAS 9.4 (The SAS Institute, Cary, NC). All analyses except the Spearman correlations of SDoH coefficients were weighted to be nationally representative using person weights provided by the survey company.
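The unweighted Spearman step can be sketched as a Pearson correlation computed on ranks, where each measure contributes one coefficient per SDoH. The coefficient values and function names below are hypothetical and serve only to show the computation.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical SDoH coefficients from two measures, one entry per SDoH.
measure_a = [-0.10, -0.05, -0.08, -0.02]
measure_b = [-0.20, -0.09, -0.12, -0.03]
print(spearman(measure_a, measure_b))
```

In this made-up example the two measures assign different magnitudes to each SDoH but rank them identically, so the rank correlation is perfect even though the coefficients differ.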

Results

The sample had 4142 participants. The average age was 47 and 52% of participants were female. Seventy-three percent of the participants reported their race as white, 13% reported black, and 15% reported other races; 16% of the sample reported Hispanic ethnicity and 3% completed the survey in Spanish. Eleven percent of the sample completed the survey by phone and the rest completed it by web interface. Negative SDoH exposures were common. For example, 45% reported difficulty paying their bills, 15% reported intimate partner violence within the last year, 30% reported some food insecurity, and 40% reported social isolation. Full descriptive information is available in the supplement with the source of the question and coding information.

Figure 1 illustrates the distribution of summary scores. The percent of participants with a score at the ceiling was 11.8% for the EQ-5D, 4.8% for the HUI Mark 2, 3.6% for the HUI Mark 3, 0.4% for PROPr set 1, 0.7% for PROPr set 2, and 1.2% for the SF-6D. All distributions were negatively skewed; Pearson's moment coefficient of skewness was −2.7 for the EQ-5D, −2.2 for the HUI Mark 2, −1.9 for the HUI Mark 3, −0.1 for PROPr set 1, −0.1 for PROPr set 2, and −0.8 for the SF-6D. For comparison, this coefficient is 0 for a symmetric distribution (such as a normal distribution), around 1 for a half-normal distribution, and 2 for an exponential distribution.

Fig. 1 Distribution of summary scores

Pearson correlations were 0.72 between EQ-5D-5L and PROPr set 1; 0.74 between EQ-5D-5L and SF-6D; 0.60 between HUI Mark 2 and PROPr set 2; 0.85 between HUI Mark 2 and HUI Mark 3; 0.62 between HUI Mark 3 and PROPr set 2; and 0.71 between SF-6D and PROPr set 1. For context, Evans suggests interpreting correlations of 0.40 to 0.59 as moderate, 0.60 to 0.79 as strong, and 0.80 to 1.0 as very strong [39]. It should be noted that the correlation between HUI Mark 2 and HUI Mark 3 is inflated because these measures use some of the same questions.

Figures 2 and 3 illustrate the age-, gender-, and chronic condition-adjusted SDoH coefficients in self-reported SDoH variables and community SDoH variables. Results used to generate Figs. 2 and 3 are in Supplement 2. Of the 42 SDoH variables tested, 27 had statistically significant coefficients for EQ-5D, PROPr (both sets), and SF-6D. Statistical significance was reached for 17 SDoH variables with HUI Mark 2 and 23 variables with HUI Mark 3. As the figures illustrate, the measures tend to move together across SDoH. Fourteen of the SDoH had statistically significant coefficients in all of the measures and another six had statistically significant coefficients in five of the six measures. The measures show significant differences in HRQoL for a variety of person-level and community-level variables, particularly related to education, income, food and financial insecurity, and social interactions. The measures used in this study are not associated with county variables.

Fig. 2 Age-, gender-, and chronic condition-adjusted coefficients for self-reported social determinants of health

Fig. 3 Age-, gender-, and chronic condition-adjusted coefficients for census tract and county social determinants of health

The figures also illustrate that there is variance across the measures within each SDoH variable. The average SDoH coefficients by measure were −0.037 (95% CI −0.051 to −0.024) for the SF-6D, −0.039 (95% CI −0.053 to −0.026) for the HUI Mark 2, −0.063 (95% CI −0.084 to −0.043) for the HUI Mark 3, −0.067 (95% CI −0.086 to −0.048) for PROPr set 1, −0.059 (95% CI −0.077 to −0.041) for PROPr set 2, and −0.086 (95% CI −0.110 to −0.063) for the EQ-5D-5L. Despite the difference in absolute values of SDoH coefficients across the different measures, the rankings of SDoH coefficients by different measures were often similar. There were 27 SDoH with statistically significant coefficients in at least four of the six measures. These SDoH are starred in Supplement 2. Table 1 shows the pairwise Spearman correlations between different measures across estimates for these 27 SDoH coefficients. All have Spearman correlations above 0.70.

Table 1 Spearman correlations of measures using 27 social determinants of health coefficients

Discussion

This report provides evidence of the validity of the PROPr score by testing its association with SDoH using a large cross-sectional dataset from the general US population. PROPr discriminates between those with and without a variety of SDoH exposures in the expected direction. PROPr was administered alongside four other widely used preference-based measures that have been shown to be associated with SDoH: EQ-5D, HUI Mark 2, HUI Mark 3, and SF-6D. PROPr has good convergent validity with these other measures when measured as correlations of scores and as associations with SDoH. These findings support the use of PROPr to monitor population health in the general US population.

PROPr scores were associated with self-reported variables (education, income, employment, difficulty getting to medical appointments, food and financial insecurity, intimate partner violence, stress, and social isolation) and census tract variables (education, poverty, unemployment, vacant housing), but not county variables (air quality, crime). These county-level variables may be too coarse to apply to individual participants. Even when adjusted for demographic variables and chronic conditions, several expected SDoH exposures were not associated with a difference in scores (not having a usual medical provider, medical insurance type, employment categories, living with a partner, home ownership). This may be due to incomplete model adjustment. For example, though we may expect that those with a usual medical provider have better HRQoL than those without, this comparison may be confounded by healthy people not establishing a relationship with a medical provider. To keep the many analyses in this report consistent, further analyses were not performed to better understand these null findings.

Given the wide range of variables to which they are sensitive, any of the measures used in this report appears appropriate for use in population health monitoring, including calculating aggregated indices of morbidity and mortality such as Quality Adjusted Life Years (QALYs). If used for monitoring population health, these measures would provide the added benefit of allowing comparison with clinical interventions because clinical research often uses these measures [40,41,42]. As shown in other studies, even when controlling for demographics and health conditions, SDoH are associated with a substantial difference in HRQoL. Using these measures to evaluate programs intended to improve overall population health by addressing SDoH would make it possible to show the value of these programs relative to clinical interventions.

This study found that the coefficients associated with SDoH vary substantially by measure. The measures with the smallest average coefficients were the HUI Mark 2 and SF-6D. The measure with the largest average coefficients was the EQ-5D-5L with US scoring. These findings are consistent with other regression-based analyses of co-administered measures, as well as analyses using item response theory, which show that different measures assign varying decrements to the same change in a latent joint construct of health [43]. PROPr’s average coefficients were in the middle of the tested measures. Although the magnitude of the coefficients varied across measures, the relative ordering of the SDoH coefficients, as measured by Spearman correlation, was similar across these preference-based scores.

Findings in this report are limited by the data being cross-sectional, the unknown severity of chronic conditions, and relatively simple questions about SDoH. While there are several concerns about the representativeness of online panel surveys [44], the project was intended to show sensitivity to SDoH without quantifying the national impact of SDoH. The project’s strengths include its use of standardized questions from large US surveys when available and co-administration of common generic HRQoL measures.

As with several other studies that co-administered these measures, these results support selecting a single preference-based measure that can be uniformly applied across research, clinical, and population health studies. Though use of any one measure will give similar health condition rankings to those of any other single measure, mixing health condition estimates from different measures can create substantial variation in condition rankings [45,46,47,48]. PROPr has several arguments in its favor for use as a harmonizing measure: it is free to use, has less ceiling effect and skew in the general population than other measures, and is linked to a robust descriptive system (PROMIS). Though not presented in this report, collection of PROPr scores requires the collection of seven common PROMIS domains, which can also be analyzed and compared to national samples. Arguments against the use of PROPr include its relative newness, the limited number of validity studies, and its requirement to administer at least 14 questions.

Harmonization of measures has the potential to substantially improve the practice of clinical outcomes research, cost-effectiveness analysis, and population monitoring. The findings presented in this report provide evidence of construct validity for PROPr: it is correlated with other widely used generic preference-based summary scores and generates lower scores for those persons with negative SDoH exposures than for those without. Future work is needed to validate PROPr using longitudinal data collection, particularly with interventions that are expected to change HRQoL.