Introduction

Health-related quality of life (HRQoL) is generally understood as the ways in which a medical condition and/or therapy affect a patient’s well-being [1, 2]. HRQoL is both subjective and multidimensional, taking into account multiple aspects of well-being, including physical and occupational function, psychological state, social interaction, and somatic sensation. When compared with more objective measures such as survival, HRQoL captures a different, and potentially complementary, dimension of the impact of illness and therapy on a patient’s life. This is especially true with older adults with a considerable illness burden and limited life expectancy.

Patient-reported outcome measures (PROs), which are often based upon HRQoL, may improve clinicians’ ability to monitor symptoms and treatment effectiveness, and may engage patients in their care through better patient–physician communication [3, 4]. The most widely used HRQoL instruments to date include the Veterans RAND 36-Item Survey Instrument (VR-36) [5], which was developed in the early 1990s and has two well-established summary scores. The Patient-Reported Outcomes Measurement Information System 29-item profile measure (PROMIS-29 v2.0) is a relatively new HRQoL instrument (http://www.healthmeasures.net/explore-measurement-systems/promis), which was developed using modern measurement theory and was calibrated and scored based on contemporary samples [6].

PROMIS profile instruments are a collection of short forms containing a fixed number of items from the same 7 PROMIS domains (physical function, anxiety, depression, fatigue, sleep disturbance, ability to participate in social roles and activities, and pain interference) plus a single item on pain intensity. Depending on how many questions are asked in each domain, there are three different lengths of PROMIS Profile forms: PROMIS-29, PROMIS-43, and PROMIS-57. The PROMIS-29 is the shortest, with 4 questions asked for each of 7 domains, plus the single pain intensity item. Each short form is scored separately, yielding a total of 7 domain scores. There are two versions of PROMIS-29, and the one we used is the second version, PROMIS-29 v2.0. The PROMIS Profile instruments are intended for adults (ages 18+), and are designed to be focused on general HRQoL, as opposed to disease-specific measures intended to be used with a population who all have a certain medical condition (e.g., heart failure). All PROMIS instruments assess HRQoL over the past 7 days, with the exception of two domains (physical function and ability to participate in social roles and activities), which do not specify a timeframe.

Profile measures such as the SF-36 provide a wealth of information; for example, the SF-36 has 8 scales. However, it has also been important to summarize HRQoL with fewer scores [7]. In the case of the SF-36 and related instruments, this has been done with two summary scores, the physical component score (PCS) and the mental component score (MCS), which represent two underlying factors based on factor analyses of SF-36 scale scores [8, 9]. While all 8 scales of the SF-36 contribute to both PCS and MCS to some extent, some of the 8 scales are more strongly represented within one or the other. With PCS, physical health is defined mostly by measures of physical functioning, pain, and role limitations due to physical health problems. With MCS, mental health is primarily measured by emotional well-being and role limitations caused by emotional problems [8, 9]. Other scales, such as general health perceptions, vitality, and social functioning, contribute relatively strongly to both physical and mental health. One important feature of PCS and MCS is that, in their construction, they were forced to have zero correlation with each other, which may conflict with the reality of how most people experience HRQoL. This, in turn, may contribute to inconsistency between SF-36 scale scores and summary scores (see more in-depth discussion in [8]). In a recent study, Hays and colleagues used a sample that represented the general US population to combine the 7 PROMIS-29 v2.0 domain scores and the single pain intensity item into two composite scores, the physical and mental health summary scores [10]. The new summary scores were constructed allowing physical and mental health to be intercorrelated, rather than forcing a zero correlation between them. Here, we adapted these scores to a population much older and with a much higher illness burden than the general US population. Our sample was collected prospectively and consisted of patients age 65+ with at least 2 of 13 specified chronic conditions.

Our objective was to assess the utility of these adapted scores as a tool to characterize HRQoL in this population, as an extension of their original derivation across the entire population of US adults. We address this task through two companion manuscripts. In this paper, we demonstrate how we created similar summary scores with the PROMIS-29 v2.0 for the population of older adults with multiple chronic conditions (MCCs). In a companion paper [11], we establish construct validity for these summary scores by comparing them against summary scores of the VR-36, a modification of SF-36 which was developed in response to findings from the Medical Outcomes Study [5, 12].

Methods

Participants

Participants were recruited from Kaiser Permanente Colorado (KPCO), a not-for-profit integrated delivery system that directly provides primary and specialty care, including both ambulatory and hospital-based care. KPCO members were eligible to participate if they were age 65 or older, were assigned to a primary care provider at a KPCO ambulatory clinic (which indicates that they were not permanently residing at a skilled nursing facility), had been seen for clinical care at least once in the past 12 months, had a valid email address, and had at least 2 of 13 specific chronic conditions. The 13 conditions were arthritis, cancer, chronic lung disease, congestive heart failure, depression, diabetes, hypertension, inflammatory bowel disease, ischemic heart disease, osteoporosis, other heart problems, sciatica, and stroke. These conditions were selected because they are assessed as part of the Medicare Health Outcomes Survey (http://www.hosonline.org/). Each condition was defined using a list of International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) codes; these codes and more information on how we chose the 13 conditions can be found in “Appendix 1”. We oversampled participants age 80 and older to ensure an adequate representation of the oldest-old. We obtained approvals from the institutional review boards of both KPCO and the RAND Corporation. The work described in this manuscript was part of a broader program of work, which has been described in full elsewhere [13].

Data collection

The survey included the PROMIS-29 v2.0 questionnaire. Survey data were collected between 11/11/2016 and 3/6/2017. A total of 4991 patients were deemed to be eligible based on the criteria described above. Participants were given 2 weeks to opt out, after which time they were presumed to be eligible for initial contact. KPCO also performed one final check to remove any deceased respondents from the list. A total of 283 eligible patients opted out, 677 were excluded due to an invalid mailing address or email address, and 282 had died. The final sample file therefore contained 3749 potential participants who had valid addresses, were not deceased, and had not opted out.

We randomly assigned participants to different survey modes, with the same survey instrument administered across all modes. Initially, 2764 participants were assigned to the web survey, 376 to the mail survey, and 372 to the phone survey. The numbers assigned to the mail and phone surveys were based on the RAND Survey Research Group's previous experience with similar surveys, as well as on the final number of responses we wanted to gather. We assigned a much larger number to the web survey, based on a conservative assumption about the response rate.

Those assigned to the web survey were sent an email invitation, followed by weekly email reminders for 3 weeks if they had not completed the survey. Participants who only partially completed the web-based survey were sent reminders asking them to finish the survey they had started. Those invited to complete the survey by mail received an initial packet that included a personalized invitation letter, information sheet, mail survey, and prepaid reply envelope. Materials identical to the first packet were sent to non-responders 4 and 6 weeks after the first survey mailing. For those invited to complete the survey by phone, telephone center staff contacted respondents an average of five times. We anticipated that some of the participants would require help from a proxy, such as a family member, to complete the survey, and therefore proxy responses were allowed for all modes.

Analytic approach

Scoring the PROMIS-29

We scored the 7 domains of the PROMIS-29 using an online scoring tool intended for this purpose (https://www.assessmentcenter.net/ac_scoringservice). More extensive information about scoring is available at http://www.healthmeasures.net. These scores were calculated based on the principles of item response theory (IRT) [14], and hence they are called IRT scale scores. Two summary scores were developed to measure the physical and mental domains of HRQoL based on the 7 IRT scale scores and the single pain intensity item. The process for developing PROMIS-29 summary scores for the general population has been described by Hays et al. [10]; below, we describe how we adapted it for use in this population.

Initial exploration of our sample

We first estimated the reliability of the 4-item short forms for each of the 7 domains by calculating Cronbach’s Alpha [15]. Together with the single pain intensity item, we examined the Pearson product-moment correlations among these 8 scores.
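As an illustration of the reliability and correlation step, the following is a minimal sketch in Python with NumPy, not the SAS code used in the study. It assumes complete item responses arranged as a respondents-by-items array; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) array of item responses.

    Assumes complete cases (no missing responses) and k >= 2 items.
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

def pearson_matrix(scores):
    """Pearson product-moment correlations among the columns of `scores`."""
    return np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)

# Synthetic illustration only: four noisy items driven by one latent trait.
rng = np.random.default_rng(0)
trait = rng.normal(size=(500, 1))
four_items = trait + 0.5 * rng.normal(size=(500, 4))
print(round(cronbach_alpha(four_items), 2))
print(np.round(pearson_matrix(four_items), 2))
```

In practice, `cronbach_alpha` would be applied once per 4-item short form, and `pearson_matrix` to the 8 columns made up of the 7 domain scores and the pain intensity item.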

We then proceeded to exploratory factor analysis (EFA), using the PROMAX oblique rotation and maximum likelihood (ML) estimation method on these 8 scores. Eigenvalues [16], scree plot [17], and proportion of variance explained were used to guide decisions about the number of factors to extract from the 8 scores. We also tested alternatives of combining some of the scores, including combining the two pain scores (pain intensity and pain interference), and combining depression and anxiety into a single score that represents emotional distress. Based on the rotated factor loading pattern, we examined the interpretability of EFA results. When the criteria of the number of factors extracted were satisfied and the results were interpretable, we proceeded to confirmatory factor analysis (CFA) to examine model fit. The CFA model was based on the same loading pattern that we extracted from EFA results.
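The eigenvalue-based part of this decision can be sketched as follows. This is a simplified illustration in Python with NumPy that computes eigenvalues of the unreduced correlation matrix, the proportion of variance each explains, and the number of eigenvalues above one. It does not reproduce the ML estimation or PROMAX rotation performed in SAS, and the function name and synthetic data are assumptions made for the example.

```python
import numpy as np

def eigen_summary(scores):
    """Eigenvalues of the correlation matrix (largest first), the proportion
    of variance each explains, and the count of eigenvalues greater than one."""
    corr = np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    explained = eigvals / eigvals.sum()       # eigenvalues sum to the number of scores
    n_above_one = int(np.sum(eigvals > 1.0))  # eigenvalue-greater-than-one rule
    return eigvals, explained, n_above_one

# Synthetic illustration only: two independent latent traits, 4 scores each.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 2))
scores = np.repeat(latent, 4, axis=1) + 0.5 * rng.normal(size=(2000, 8))
eigvals, explained, n_above_one = eigen_summary(scores)
print(np.round(eigvals, 2))
print(np.round(explained, 2), n_above_one)
```

Plotting `eigvals` against their rank gives the scree plot; the eigenvalue rule and the visual elbow need not agree, which is why interpretability and model fit are also considered.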

Adapting the summary scores to our sample

The PROMIS-29 summary scores were originally developed by Dr. Hays and colleagues using a nationally representative sample recruited from the Opinions4Good (Op4G) panel (http://op4g.com/our-panel/). Op4G participants all completed the PROMIS-29 profile measure, v2.0, on the internet, and the sample represents the US adult population in terms of gender, region, race/ethnicity, and education. The average age of the Op4G sample is 46, with a range of 18–88 years.

We compared results from our MCC sample with those from Hays and colleagues’ Op4G sample to ensure consistent factor loading patterns across the two samples. The standardized scoring coefficients were derived from the Op4G sample and were applied to our MCC sample to score our participants. This process provides the two summary scores, the PROMIS-29 physical health summary score (PHS) and mental health summary score (MHS). We used SAS, version 9.4 [18] for EFA and correlation analyses. All CFAs were conducted using Mplus version 8 [19].

Results

Survey response

Out of a sample of 3749 people who received a survey, a total of 1359 participants responded, for a total response rate of 36%. Response rates varied by survey modality, and were 25% for the web mode, 51% for the mail mode, and 48% for the phone mode. The web mode was noteworthy in that 37% of those who began to respond to the web-based survey ultimately abandoned the effort before completing enough of the survey to support including them in the HRQoL analyses, which we defined by having enough responses to calculate at least one PROMIS-29 scale score. With the phone mode, the interviewer made sure the respondent completed all items; with the mail mode, it was also uncommon for the respondent to return a survey without enough information to support analysis.
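The sample flow and overall response rate can be checked with simple arithmetic. The short Python sketch below uses only the counts reported in the Data collection and Survey response sections; the variable names are ours.

```python
# Counts reported in the text.
eligible = 4991
opted_out = 283
invalid_address = 677
deceased = 282
responders = 1359

# Patients remaining in the final sample file after exclusions.
fielded = eligible - opted_out - invalid_address - deceased

# Overall response rate, as a proportion of the final sample file.
response_rate = responders / fielded

print(fielded)                     # 3749
print(round(100 * response_rate))  # 36
```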

Characteristics of the study participants are shown in Table 1. The mean age of the MCC sample was 81 (SD 6.9), ranging from 66 to 89. A majority of participants (56%) were age 80 or older, and a majority (89%) were White/non-Hispanic. By definition, all had at least 2 of 13 chronic conditions; 35% had exactly 2, 31% 3, 18% 4, and 16% 5 or more. Prevalence of some representative chronic conditions was relatively high; for example, 38% had chronic lung disease, and 31% had diabetes.

Table 1 Respondent demographic and clinical characteristics

“Appendix 2” shows a comparison of responders with non-responders based on data derived from the KPCO electronic medical record, as well as a regression analysis to predict response based on these characteristics. Responders and non-responders were generally comparable in terms of age, burden of chronic conditions, and healthcare utilization, and we did not find mode effects between the web, mail, and telephone surveys. Women were somewhat less likely to respond than men (adjusted odds ratio (AOR) 0.77, p = 0.002), and compared to White non-Hispanics, Hispanics and non-Whites were less likely to respond (AOR 0.47 and 0.58, respectively; p < 0.001). Details can also be found online at https://www.rand.org/pubs/research_reports/RR2176.html.

PROMIS-29 scores

The means, standard deviations, minimum, and maximum of the 8 scores are presented in Table 2. The IRT scale scores we obtained from the online scoring service follow the same PROMIS T-score metric with a mean of 50 and standard deviation of 10. The single pain intensity item is scaled from 0 to 10, with 0 being no pain and 10 being worst imaginable pain.

Table 2 Items used for PROMIS-29 summary score calculations

We measured internal consistency reliabilities of the 7 PROMIS-29 health domain scales that are based on the 4-item short forms. Cronbach’s Alphas by domain were as follows: physical functioning (0.91), anxiety (0.87), depressive symptoms (0.89), fatigue (0.92), sleep disturbance (0.87), ability to participate in social roles and activities (0.93), and pain interference (0.95). Product-moment correlations among these 7 domain scales and the single pain intensity item are presented in Table 3. The size of these correlations ranges from − 0.70 (fatigue and ability to participate in social roles) to 0.79 (pain interference and pain intensity). All correlations are statistically significant with p values < 0.001.

Table 3 Pearson product-moment correlations among the PROMIS-29 domain scores

Factor analyses

EFA results indicated a two-factor solution based on an overall evaluation of eigenvalues and scree plot. Kaiser’s criterion [16] considers factors with an eigenvalue greater than one as common factors. As shown in Fig. 1, two eigenvalues are above one, suggesting that two factors be retained. The scree plot itself, by contrast, suggested solutions with one or three factors. Although previous research [20] suggests a two-factor solution, we kept open the possibility of one-factor or three-factor solutions, to be explored further beyond the EFA stage.

Fig. 1 Scree plot and variance explained

We then proceeded to CFA to compare 1-, 2-, and 3-factor solutions. At the CFA stage, we also explored the possibility of combining some scores given their higher correlations (see Table 3). Specifically, we combined the pain intensity and pain interference scores (to create a single indicator, “pain,” r = 0.79), and also the anxiety and depression scores (“emotional distress,” r = 0.74). Our CFA results showed that the 2-factor solution based on these 6 variables (fatigue, sleep disturbance, social ability, and physical function, plus the two combined variables, pain and emotional distress) fit the data best. Specifically, the root mean square error of approximation (RMSEA) value for this model was 0.056, with a Comparative Fit Index (CFI) of 0.99 and a Tucker-Lewis Index (TLI, also called the non-normed fit index or NNFI) of 0.98; the accepted standard is that an RMSEA value smaller than 0.06, and CFI and TLI values above 0.95, indicate a relatively good fit between the hypothesized model and the observed data [21]. The reduced 6-indicator set includes four indicators representing original PROMIS domain scores (i.e., fatigue, sleep disturbance, social ability, and physical function), and two new indicators: emotional distress, created by averaging the anxiety and depression domain scores, and pain, which was the average of the pain intensity item and the pain interference domain score. All indicators were standardized to account for differences in their original scales. This result is consistent both with previous research [20] and with what was found in Hays et al.’s study [10], from which we derived the scoring coefficients we used here.
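The construction of the 6-indicator set can be sketched as below. This Python example is an illustration under stated assumptions, not the study's code: it assumes each of the 8 scores is standardized (z-scored) within the analysis sample before the two pairs are averaged, since the text does not specify the order of operations or the reference mean and SD. The dictionary keys and function names are hypothetical.

```python
import numpy as np

def zscore(x):
    """Standardize a vector to mean 0 and SD 1 (sample SD, ddof=1)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def build_indicators(domains):
    """Reduce 8 PROMIS-29 scores to the 6 indicators used in the CFA.

    `domains` maps hypothetical names to 1-D arrays of scores.
    Assumption: scores are z-scored within the sample before averaging, so
    that the 0-10 pain intensity item and the T-score domains are comparable.
    """
    z = {name: zscore(values) for name, values in domains.items()}
    return {
        "physical_function": z["physical_function"],
        "fatigue": z["fatigue"],
        "sleep_disturbance": z["sleep_disturbance"],
        "social_ability": z["social_ability"],
        "emotional_distress": (z["anxiety"] + z["depression"]) / 2.0,
        "pain": (z["pain_interference"] + z["pain_intensity"]) / 2.0,
    }

# Synthetic illustration only: random values standing in for real scores.
rng = np.random.default_rng(0)
names = ["physical_function", "anxiety", "depression", "fatigue",
         "sleep_disturbance", "social_ability", "pain_interference",
         "pain_intensity"]
domains = {name: rng.normal(50, 10, size=200) for name in names}
indicators = build_indicators(domains)
print(sorted(indicators))
```

Standardizing before averaging matters most for the pain indicator, because its two components are on different original scales.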

We therefore arrived at a two-factor model; the two factors were correlated (r = 0.59) in our final CFA model. Consistent with the findings in Hays et al. [10], we considered the first factor to be physical HRQoL, which we call the physical health score (PHS). The PHS was defined by physical functioning, fatigue, ability to participate in social roles and activities, and the combined pain measure. We considered the second factor to be mental HRQoL, and it was defined by all indicators except physical functioning. We call this second factor the mental health score (MHS).

This two-factor solution allowed cross-loadings of fatigue, social ability, and pain on both factors, suggesting that they contribute to both factors (see Table 4 for the factor loading pattern, which is very close to what Hays et al. [10] found). Previous research evaluating summary scores from the SF-36 [20] found that fatigue and sleep disturbance contributed equally to both physical and mental quality of life. In our case, however, fatigue contributed more to mental HRQoL (− 0.76) than to physical HRQoL (− 0.14), while sleep disturbance contributed only to mental HRQoL (− 0.51). Two-factor solutions of the SF-36 also showed that ability to participate in social roles and activities was related only to physical health [20]. Our findings showed that it was related to both physical and mental health, and in fact that it contributed more strongly to mental HRQoL (0.55) than to physical HRQoL (0.40).

Table 4 Factor loadings and standardized scoring coefficients from CFA model with 6 indicators

Scoring PHS and MHS

We proceeded to calculate the PHS and MHS scores in our MCC sample, using the standardized factor scoring coefficients that Hays and colleagues [10] had derived from the Op4G sample. The last two columns in Table 4, and a note below the table, explain how to calculate these summary scores. These scores were set on the PROMIS T-score metric (i.e., normed to the general US population) with a mean of 50 and standard deviation of 10. The mean MHS for the MCC sample was 50.1, similar to the US population, while the mean PHS was 42.2, almost a standard deviation below the US population. The correlation between the two measures of PROMIS-29 HRQoL in our sample was 0.74, implying that the two concepts of physical and mental HRQoL may overlap fairly strongly among the older and sicker population we studied here.
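The general form of this scoring step can be sketched in Python as a weighted sum of standardized indicators that is then placed on the T-score metric. This is an illustration only: the actual scoring coefficients are those reported in Table 4 and are not reproduced here, so the `placeholder_coefs` values, the `ref_mean` and `ref_sd` parameters, and the function names are hypothetical.

```python
import numpy as np

def summary_t_score(z_indicators, coefs, ref_mean=0.0, ref_sd=1.0):
    """Weighted sum of standardized indicators, rescaled to the T-score metric.

    z_indicators: (n_respondents, n_indicators) array of standardized scores.
    coefs: one scoring coefficient per indicator (see Table 4 for real values).
    ref_mean, ref_sd: assumed mean and SD of the weighted sum in the
    reference (norming) sample, so the result has mean 50 and SD 10 there.
    """
    raw = np.asarray(z_indicators, dtype=float) @ np.asarray(coefs, dtype=float)
    return 50.0 + 10.0 * (raw - ref_mean) / ref_sd

def sd_units_from_norm(t_score):
    """Distance of a T-score from the reference mean of 50, in SD units."""
    return (t_score - 50.0) / 10.0

# Placeholder coefficients for illustration; NOT the values from Table 4.
placeholder_coefs = np.full(6, 1.0 / 6.0)
example = np.zeros((3, 6))  # respondents exactly at the reference mean
print(summary_t_score(example, placeholder_coefs))

# Reported sample means expressed in SD units relative to the US norm.
print(round(sd_units_from_norm(42.2), 2))  # PHS
print(round(sd_units_from_norm(50.1), 2))  # MHS
```

On this metric, the reported mean PHS of 42.2 sits about 0.78 standard deviations below the general US population mean, while the mean MHS of 50.1 is essentially at the norm.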

Discussion

Despite the ever-increasing use of PROMIS-29 v2.0, this instrument has not been explicitly evaluated among older patients with multiple chronic conditions, a population that often requires intensive health care and complex care coordination. Measuring HRQoL, however, is particularly relevant for this population: given the limited life expectancy of most older adults with multiple chronic conditions, other ways of measuring quality of care (e.g., certain processes of care) or patient outcomes (e.g., long-term survival) may be less relevant.

The utility of PROMIS-29 is enhanced by the use of two summary scores. These two scores serve to distill the information from the seven domains of the PROMIS profile, and the single pain intensity item, into two scores, which represent the physical and mental aspects of HRQoL. Here, we adapted these two scores, which we call PHS and MHS, for use in a population of community-dwelling older adults (age 65+) who had at least two of thirteen pre-specified chronic health conditions. This extends their use beyond the general US population in which they were originally derived [6]. This paper also provides guidance for those who wish to adapt these summary scores for use in other populations, and to calibrate them to allow a comparison between these populations and the general US population, as we did.

Previous efforts relying upon the PROMIS-29 measures have used the seven parent scales, plus the pain intensity item, to compare populations [22]. A reliance on 8 scales to track changes in HRQoL over time, or to compare populations with each other, necessitates multiple comparisons and is not easily or intuitively grasped. In contrast, reliance on the physical and mental aspects of HRQoL concords with long-held understandings of how HRQoL works, and allows the PROMIS-29 to be used much as other older HRQoL instruments have been used. For example, the SF-36 has long featured summary scores, called the physical component score (PCS) and the mental component score (MCS). In one study, the HRQoL burden of low back pain was estimated at − 5.5 points on PCS and − 2.8 points on MCS [23].

Hays and colleagues’ study [10], using a nationally representative sample, showed strong support for some of the psychometric properties of the PROMIS-29 v2.0 summary scores. Their results provide the factor scoring coefficients (see Table 4) for estimating the summary scores that should apply to the general US population. The present study helps open up a similar possibility for the use of the PHS and MHS, and also points the way toward how these scores can be adapted for use in a particular population. In addition to future validation work that is needed to assess their scoring system in different samples, algorithms have been developed in the PROMIS project to estimate the EQ-5D-3L [24] and the HUI-3 [25] from the PROMIS-29 scales. Preference-based scoring functions can also be estimated directly from the PROMIS-29 [26,27,28].

We acknowledge several limitations. First, older KPCO members with MCC may be somewhat different from analogous older patients with MCC drawn from the general US population in several ways, not least of which is that older KPCO members are mostly enrolled in a Medicare Advantage benefit plan. Another limitation of the study is that our sample had limited diversity in terms of race/ethnicity, and relatively few participants lived in a high-poverty ZIP code.

Despite these limitations, this study has adapted the PHS and MHS, summary scores for the PROMIS-29 v2.0, for use in a population of older adults with a considerable illness burden. It is particularly important to have HRQoL summary scores for this older and sicker population with more chronic conditions, because this is a population that often has intensive health care needs requiring complex care coordination. Future research should use the PHS and MHS summary scores to quantify the impact on HRQoL from specific chronic conditions, as measured using the PROMIS-29.