Introduction

As the population ages and life expectancy increases in the USA, living a longer, high-quality life in good health is becoming more important to the public health community, the medical community, and individuals (Moriarty et al. 2005; Netuveli and Blane 2008; Sondik et al. 2010). Survey questionnaires are widely used to monitor health-related quality of life (HRQOL), a multidimensional construct of self-rated physical and mental health. "Generic" HRQOL instruments, which cover a wide range of health domains, are collected on a number of large population surveys. For example, the Centers for Disease Control and Prevention's (CDC) four-item measure of HRQOL (HRQOL-4; CDC 2000) has been collected since 1993 on the Behavioral Risk Factor Surveillance System (http://www.cdc.gov/brfss/), as reported in Moriarty et al. (2003) and Moriarty et al. (2005). This and other HRQOL measures are also included in ongoing population surveys such as the Medical Expenditure Panel Survey (http://meps.ahrq.gov/mepsweb/) and the National Health Interview Survey (http://www.cdc.gov/nchs/nhis.htm). Through such surveys, HRQOL measures provide a way to monitor both the health and well-being of the population over time and their potential association with risk factors. Several generic HRQOL measures have previously been compared in studies such as those of Hanmer et al. (2006), Fryback et al. (2007), and Cherepanov et al. (2010). References on HRQOL include Bowling (1991), Spilker (1995), and Fayers and Machin (2007).

An emerging body of research has shown that monitoring HRQOL in population surveys may have additional uses. HRQOL measures, and in particular general self-rated health (GSRH), have been shown to be associated with future adverse health events, such as hospitalization (Cavrini et al. 2012; DeSalvo et al. 2005) and mortality (DeSalvo et al. 2005; Tsai et al. 2007; Sargent-Cox et al. 2010), although most studies to date have used selected samples. If this relationship generalizes to other HRQOL measures and to more representative samples of older adults, HRQOL could be an important prevention tool for monitoring increased risk of adverse health events. One study (Dominick et al. 2002) found a significant relationship between HRQOL-4 and 1-month and 1-year mortality, but longer-term associations, the relative importance of the four measures when used together (Dominick et al. 2002; Sargent-Cox et al. 2010; DeSalvo et al. 2006a, b), and the generalizability of results remain unknown. Understanding these factors is important for surveillance and risk adjustment because health care resources are often heavily focused on end-of-life care.

This study investigates the following research questions: Do the items in HRQOL-4 predict short- or longer-term mortality? How much information do the three healthy days (HD) measures in HRQOL-4 contribute relative to GSRH in predicting mortality? How do new results from a national sample compare with previous estimates?

Research Design and Methods

Data

The data are from Cohorts 6–8 of the Medicare Health Outcomes Survey (HOS), an annual national survey of Medicare beneficiaries voluntarily enrolled in Medicare Advantage (MA) private health plans. Resource constraints limited us to three cohorts, and because the questionnaire changed versions beginning with Cohort 9, we selected earlier cohorts to ensure consistent measures. All managed care plans with MA contracts are required to participate in the HOS, which is used mainly for quality improvement programs, reporting, and monitoring and improving health outcomes. When the data were collected, about 13 % of the Medicare population was enrolled in MA plans (Gold et al. 2010).

MA plans include persons aged 65+ and a small number of persons aged <65 with disabilities. Surveys are mailed to a randomly selected cohort of MA plan enrollees each spring, with a follow-up survey 2 years later. Non-respondents are followed up by telephone. In large plans, a simple random sample of 1,000 members is surveyed; in smaller plans with fewer than 1,000 members, all members are surveyed. HOS eligibility requires only continuous enrollment in the same plan for at least 6 months and no end-stage renal disease. Approximately 160,000 individuals are sampled for each cohort, and about 105,000 complete the survey.

This study obtained an HOS limited data set including dates of death from the Centers for Medicare & Medicaid Services and an exemption from the investigators’ institutional review boards.

Sample

The analysis sample includes all individuals aged 65+ from HOS Cohorts 6–8 whose baseline surveys were collected between 2003 and 2005. HOS members under 65 were excluded because they are eligible for Medicare due to disability and represent a specialized population. Individuals were also excluded from the analysis if they had missing values for any of the HRQOL-4 variables, marital status, or health indicators (n = 96,583). Proxy respondents were excluded (n = 38,473) because perception of HRQOL varies by reporter (Albrecht and Devlieger 1999; Andresen et al. 2001; Ellis et al. 2003). The total analysis sample was 191,001.

Dependent Variable

The primary dependent variable is mortality status. Short-term mortality is defined as death occurring within 90 days of the interview; long-term mortality is vital status at the end of the maximum follow-up, slightly more than 2.5 years (analyses of 30-day and 1-year mortality are available upon request). The time between the baseline survey and an individual's last observation (either the date of death or the end of follow-up) was used in all Cox survival models described below. The exact date of death in the HOS is identified from the Medicare Enrollment Database and the Social Security Administration Master Death File.
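The construction of the survival outcome described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each respondent has a baseline survey date, an end-of-follow-up date, and an optional date of death, and it assumes that the 90-day outcome is obtained by administratively censoring follow-up at day 90. The function name and the example dates are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def survival_outcome(baseline: date,
                     end_of_followup: date,
                     death: Optional[date] = None,
                     horizon_days: Optional[int] = None) -> Tuple[int, int]:
    """Return (time_in_days, event) for a Cox model.

    time_in_days: days from baseline to death, or to the censoring date.
    event: 1 if death was observed within follow-up, else 0 (censored).
    horizon_days: optional truncation (e.g., 90 for the short-term outcome);
    deaths after the horizon are treated as censored at the horizon.
    """
    censor_days = (end_of_followup - baseline).days
    if horizon_days is not None:
        censor_days = min(censor_days, horizon_days)
    if death is not None:
        death_days = (death - baseline).days
        if death_days <= censor_days:
            return death_days, 1  # death observed within the window
    return censor_days, 0  # alive (censored) at end of the window


# Hypothetical respondent who died about 8 months after baseline.
base, end = date(2003, 5, 1), date(2005, 11, 15)
print(survival_outcome(base, end, date(2004, 1, 1)))                   # long-term: event
print(survival_outcome(base, end, date(2004, 1, 1), horizon_days=90))  # 90-day: censored
```

The same respondent therefore counts as a death in the long-term model but as censored at day 90 in the short-term model.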

Health-Related Quality of Life

The primary independent variables are the core CDC Healthy Days measures (HRQOL-4), which have been included on the HOS since 2003. These items have undergone cognitive testing and have demonstrated content, construct, criterion, and predictive validity, test–retest reliability, and internal consistency (Andresen et al. 2001; Moriarty et al. 2003; CDC 2000). The items are GSRH, physically unhealthy days (PUDs), mentally unhealthy days (MUDs), and days with activity limitations (ALDs). GSRH is assessed on a five-point Likert scale ("excellent," "very good," "good," "fair," or "poor"). PUDs are defined by the question "Now thinking about your physical health, which includes physical illness and injuries, for how many days during the past 30 days was your physical health not good?"; MUDs by "Now thinking about your mental health, which includes stress, depression, and problems with emotions, for how many days during the past 30 days was your mental health not good?"; and ALDs by "During the past 30 days, for about how many days did poor physical or mental health keep you from doing your usual activities, such as self-care, work, or recreation?"

In the analysis, GSRH was used as a categorical variable with "excellent" as the reference group. Following Dominick et al. (2002), PUDs, MUDs, and ALDs were each grouped into 0, 1–10, 11–20, and 21–30 days, with 0 days as the reference group.
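The grouping of the three HD counts can be sketched as below. This is an illustrative sketch rather than the study's code; the function names and the dummy-variable labels are hypothetical, and the only assumption is that each HD response is an integer count from 0 to 30.

```python
HD_LEVELS = ["0", "1-10", "11-20", "21-30"]  # "0" is the reference group


def hd_category(days: int) -> str:
    """Map a 0-30 count of unhealthy or activity-limited days to a category."""
    if not 0 <= days <= 30:
        raise ValueError(f"expected 0-30 days, got {days}")
    if days == 0:
        return "0"
    if days <= 10:
        return "1-10"
    if days <= 20:
        return "11-20"
    return "21-30"


def hd_dummies(days: int, prefix: str = "hd") -> dict:
    """Indicator coding with the 0-day category omitted as the reference."""
    cat = hd_category(days)
    return {f"{prefix}_{lvl}": int(cat == lvl) for lvl in HD_LEVELS[1:]}


print(hd_dummies(15, prefix="pud"))
# {'pud_1-10': 0, 'pud_11-20': 1, 'pud_21-30': 0}
```

Omitting the 0-day indicator means each HR for an HD category is interpreted relative to respondents reporting no such days.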

Other Independent Variables

All analyses adjusted for age, sex, race/ethnicity, marital status, household income categories, educational status, and Spanish survey language. In addition, a count of 12 self-reported chronic diseases (hypertension or high blood pressure; stroke; emphysema, asthma or chronic obstructive pulmonary disease; Crohn’s disease, ulcerative colitis, or inflammatory bowel disease; arthritis of hip, knee, hand or wrist; diabetes, high blood sugar, or sugar in the urine; congestive heart failure; any cancer other than skin cancer; angina pectoris or coronary heart disease; a myocardial infarction or heart attack; sciatica; other heart conditions, such as problems with heart valves or rhythm), six activities of daily living (ADLs; 0–2, 0 = no difficulty, 1 = has difficulty, 2 = unable to do the activity), back pain (1–5, 5 = greatest pain), smoking frequency (never, sometimes, or every day), and indicators for treatment of breast, colon, lung, and prostate cancer were included in all models. Survey mode and MA plan type indicators were also included.

Statistical Analysis

Bivariate and multivariate Cox proportional hazards models were estimated for the risk of mortality as a function of time since baseline, HRQOL, and other covariates. The Cox model (Cox 1972) is a semi-parametric survival model that estimates the hazard of death as a function of covariates while leaving the baseline hazard unspecified. Censored observations, in which death is not observed, enter the model naturally through the risk sets. The time variable in all models was the number of days from baseline until death, if observed, or until the censoring date. The Cox model assumes proportional hazards, implying that the hazard ratio for a given covariate does not change over time. This assumption was tested by interacting the HRQOL variables with time.
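For reference, the model and the proportional hazards check can be written in standard notation. The symbols are ours, ties are ignored, and the time function g(t) (for example, t or log t) is an assumption, because the functional form of the time interaction is not stated here.

```latex
% Cox proportional hazards model for respondent i with covariates x_i
h(t \mid \mathbf{x}_i) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_i\right),
\qquad \mathrm{HR}_j = \exp(\beta_j)

% Partial likelihood over observed deaths (\delta_i = 1); R(t_i) is the risk set
% of respondents still under observation at t_i, which is how censored
% respondents contribute information
L(\boldsymbol{\beta}) = \prod_{i:\,\delta_i = 1}
  \frac{\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_i\right)}
       {\sum_{k \in R(t_i)} \exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_k\right)}

% Time-interacted model for HRQOL term x_j; proportional hazards holds if \gamma_j = 0
h(t \mid \mathbf{x}_i) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_i
  + \gamma_j\,x_{ij}\,g(t)\right),
\qquad \mathrm{HR}_j(t) = \exp\!\left(\beta_j + \gamma_j\,g(t)\right)
```

Under this formulation, a significant \gamma_j indicates that the hazard ratio for term j changes with time since baseline.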

Relative Importance

Two approaches were used to assess the relative importance of the three HDs compared with GSRH for predicting mortality. Predictive validity was assessed using the C-index (18), which ranges from 0.5 (no better than chance) to 1.0 (perfect discrimination) and reflects the proportion of usable pairs of respondents for whom the model correctly orders the differing survival times. To compare the relative predictive power of different factors, the C-index was estimated for several models: (1) age; (2) age, GSRH; (3) age, HDs; and (4) age, GSRH, HDs. DeSalvo et al. (2005) and Quinten et al. (2009) used similar approaches to assess the performance of HRQOL for predicting mortality. In addition, the full model (age, GSRH, HDs) was re-estimated omitting each term in turn, to compare the fit of the full model with that of each reduced model.
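A minimal sketch of Harrell's C-index, a common form of the concordance statistic for censored survival data, is shown below to make the pairwise definition concrete. It assumes `risk` is a model-based risk score (for example, the Cox linear predictor), with higher values meaning higher predicted hazard. For simplicity, pairs with tied observed times are skipped, which is a simplification of the full definition; this is not necessarily the exact estimator used in the study.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(time: Sequence[float],
                    event: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data (simplified).

    A pair is usable when the respondent with the shorter observed time
    died at that time. It is concordant when that respondent also has the
    higher predicted risk; ties in predicted risk count as 0.5.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(time)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if time[i] < time[j] else (j, i)
        if time[a] == time[b] or not event[a]:
            continue  # tied times, or earlier respondent censored: not usable
        usable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Perfectly ordered risks give 1.0; constant risks give 0.5.
print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1]))
print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 1, 1]))
```

Comparing this statistic across models (1) to (4), or between the full model and each reduced model, gives the kind of contrasts described above.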

Relative importance was also assessed for the same models by the proportion of explained variation (PEV; Schemper 1993; Schemper and Stare 1996; Heinze and Schemper 2003). PEV measures the variation in mortality (0–100 %) attributable to the independent variables relative to the total variation in mortality. The partial PEV for each factor of interest is the difference between the PEV of the model with the factor and that of the model without it.
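The partial contributions can be summarized in a generic relative-reduction form. This is a sketch: the dispersion measure D behind the Schemper–Stare PEV is not detailed here, so D_0 and D_X are placeholders, and the symbols are ours.

```latex
% Generic explained-variation form:
% D_0 = variation in the survival process ignoring covariates,
% D_X = variation remaining given covariates X
\mathrm{PEV}(X) = \frac{D_0 - D_X}{D_0} \times 100\,\%

% Partial contribution of factor j: full model versus the model omitting j,
% applied in the same way to PEV and to the C-index
\Delta_j^{\mathrm{PEV}} = \mathrm{PEV}(X_{\mathrm{full}}) - \mathrm{PEV}(X_{\mathrm{full}} \setminus \{j\}),
\qquad
\Delta_j^{C} = C(X_{\mathrm{full}}) - C(X_{\mathrm{full}} \setminus \{j\})
```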

Results

Sample Characteristics

Among the 191,001 respondents in this analysis, mean age was 75 years, 10 % were non-White, 58 % were female, and 58 % were married (Table 1). Within 90 days of the baseline survey date, 718 respondents (0.4 %) died. Over the maximum observation time of slightly more than 2.5 years, 13,414 (7 %) died. Table 2 shows responses for each of the HRQOL-4 items. Compared with all persons in the HOS sampling frame, including non-respondents, the analysis sample is slightly younger (among those aged 65 and older) and has fewer minorities, more married persons, and higher education levels. These differences are consistent with restricting the sample to non-proxy respondents with complete data. A detailed table of comparisons is available from the authors upon request.
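The crude mortality percentages above follow directly from the reported counts; the short check below reproduces them (the constant and function names are ours).

```python
N_ANALYSIS = 191_001  # analysis sample size reported above
DEATHS = {"90 days": 718, "maximum follow-up (~2.5 years)": 13_414}


def crude_mortality_pct(deaths: int, n: int = N_ANALYSIS) -> float:
    """Unadjusted percentage of the analysis sample that died in a window."""
    return 100.0 * deaths / n


for window, d in DEATHS.items():
    print(f"{window}: {crude_mortality_pct(d):.1f}%")
# 90 days: 0.4%
# maximum follow-up (~2.5 years): 7.0%
```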

Table 1 Characteristics of the analysis sample
Table 2 HRQOL measured by Healthy Days

Bivariate Analysis

At the maximum follow-up time, age was the strongest predictor of mortality, with an estimated hazard ratio (HR) of 1.09 (P < 0.0001) at age 65 that increased with age (Table 3). The HRs for each category of the four HRQOL variables were also highly significant (P < 0.001), were greater than 1, and increased with each decrement in HRQOL. At every level of HDs (1–10, 11–20, and 21–30), the HRs were smallest for MUDs and largest for ALDs, ranging from 1.23 for 1–10 MUDs to 3.65 for 21–30 ALDs. Most other covariates had the expected relationship with mortality, such as increased HRs for males, persons with more chronic diseases, those with ADL difficulties, and smokers (all P < 0.0001). Married respondents and persons with higher income (both P < 0.0001) had decreased HRs. HRs were <1.0 for those responding in Spanish (P = 0.004) and for races other than White and Black (P = 0.001), and HRs exceeded 1.0 for telephone respondents (P < 0.0001).

Table 3 Cox Proportional Hazards Analysis, 3 Models of Mortality

Multivariate Results

For short-term (90-day) survival, HRs increased with worse categories of GSRH, PUDs, and ALDs. HRs for PUDs and ALDs were significant at 11–20 days (P < 0.05) and 21–30 days (P < 0.001), roughly 1.4 and 1.7, respectively, for each measure, with overlapping confidence intervals across comparable levels of the two variables (second and third columns of Table 3). HRs for "fair" and "poor" GSRH were statistically significant (P < 0.001) and larger than the HRs for any level of PUDs or ALDs. In contrast with the other measures and with the bivariate results, the HR for the only significant level of MUDs was <1, indicating a decreased risk of mortality (P < 0.05) with 11–20 days of "not good" mental health in the past 30 days relative to no such days. The relationships for the other control variables were largely as expected, although fewer of these variables were significantly associated with mortality than in the bivariate analysis. However, back pain score was associated with a decreased risk of mortality (P < 0.001), compared with an increased risk in the bivariate analysis.

Over the maximum follow-up, several relationships that were not significant over 90 days became statistically significant. HRQOL results were consistent with those for the 90-day period, although the HRs were closer to 1.0 in most cases (except for MUDs), suggesting that the proportional hazards assumption may not have been met. Likewise, worse HRQOL corresponded to larger HRs, except for MUDs. All included categories of GSRH were significantly associated with mortality (P < 0.05 or better). All categories of ALDs were significant at P < 0.01 or better, but only 21–30 days of PUDs was significant (P < 0.001). As at 90 days, more MUDs were associated with a decreased risk of mortality, but only 21–30 days was statistically significant (P < 0.01).

Specification Tests

To test the proportional hazards assumption of the Cox model, time interactions with the HRQOL variables were included in an additional multivariate model of maximum follow-up length (Table 4). Except for "fair" GSRH, the time interactions were statistically significant for the same HRQOL terms that were significant in the corresponding model without time interactions (Table 3). Between short- and long-term follow-up, the combined HR for "poor" GSRH decreased from 7.1 to 3.5 (P < 0.05). For PUDs, only 21–30 days was significantly associated with increased mortality (P < 0.05), and the HR declined from 1.5 to 1.1 over the same period. The HRs for ALDs decreased from 1.6 to 1.1 for 11–20 days (P < 0.05) and from 1.9 to 1.2 for 21–30 days (P < 0.01). The only significant interaction for MUDs, 11–20 days (P < 0.05), moved in the opposite direction over time, with the HR increasing from 0.8 to 1.1 by 2.5 years. All other covariates in Table 3 were also included in these models. In the corresponding time-interacted model for 90-day follow-up, only the interactions between GSRH and time were significant.
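A "combined" HR in a time-interacted model adds the main effect and the interaction before exponentiating. The sketch below assumes, purely for illustration, that the log-HR changes linearly with time measured in years and that "short-term" corresponds to t = 90/365.25; the time function actually used is not stated here. Under those assumptions it backs out the implied coefficients from the "poor" GSRH values of 7.1 and 3.5 reported above. The function names are hypothetical.

```python
import math
from typing import Tuple


def combined_hr(beta: float, gamma: float, t: float) -> float:
    """Combined HR exp(beta + gamma * t) under an assumed linear time interaction."""
    return math.exp(beta + gamma * t)


def coefficients_from_two_hrs(t1: float, hr1: float,
                              t2: float, hr2: float) -> Tuple[float, float]:
    """Solve for (beta, gamma) given combined HRs at two times (t1 != t2)."""
    gamma = (math.log(hr2) - math.log(hr1)) / (t2 - t1)
    beta = math.log(hr1) - gamma * t1
    return beta, gamma


# "Poor" GSRH: combined HR about 7.1 at 90 days and 3.5 at about 2.5 years.
t_short, t_long = 90 / 365.25, 2.5
beta, gamma = coefficients_from_two_hrs(t_short, 7.1, t_long, 3.5)
print(f"beta = {beta:.3f}, gamma = {gamma:.3f} per year")
print(f"implied HR at 1 year = {combined_hr(beta, gamma, 1.0):.2f}")
```

A negative gamma corresponds to the attenuation over time seen for GSRH, PUDs, and ALDs; the MUD interaction moved in the opposite direction.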

Table 4 Combined hazard ratio from Cox proportional hazards of time-interacted model (maximum follow-up length)

Model Fit and Predictive Validity

Using the total C-index, the HRQOL variables added more information to total fit in the short-term model, while age added more information in the long-term model (Table 5, first panel). The added contribution of the four HRQOL variables ranged from 0.12 to 0.17 (19–27 % relative increase) at 90 days, compared with only 0.07–0.08 (10–11 %) at 2.5 years. For all cohorts and both follow-up lengths, GSRH alone added more to the fit than the three HD measures combined.
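The absolute and relative gains can be illustrated as follows. This sketch assumes that the relative increase is expressed as a percentage of the C-index of the age-only model; the C-index values used are hypothetical and are not taken from Table 5.

```python
from typing import Tuple


def added_contribution(c_base: float, c_full: float) -> Tuple[float, float]:
    """Absolute gain in C-index and that gain as a percentage of the base C-index."""
    if c_base <= 0:
        raise ValueError("c_base must be positive")
    gain = c_full - c_base
    return gain, 100.0 * gain / c_base


# Hypothetical values: an age-only model and the same model plus HRQOL-4.
gain, rel = added_contribution(c_base=0.60, c_full=0.72)
print(f"gain = {gain:.2f}, relative increase = {rel:.0f}%")
# gain = 0.12, relative increase = 20%
```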

Table 5 Total model fit and contribution of individual variables

The relative contributions of age, GSRH, and the three HDs were assessed in terms of the average change in PEV and the individual contributions to the full-model C-index (Table 5, bottom panel). GSRH was the single largest predictor in the short-term (90-day) model and the second most important predictor in all cohorts at 2.5 years. Age was the most important predictor in the long-term model of about 2.5 years and the second most important predictor in the 90-day model. The relative importance of all three HD measures, compared with GSRH and age, was lower in both the short- and long-term models, whether assessed by PEV or C-index. Among the three HD measures, the long-term results display a clear pattern in both the PEV and C-index results: ALDs are the most predictive of long-term mortality, PUDs the next most predictive, and MUDs provide almost no information. In the short-term models, the pattern is less consistent. The PEV shows ALDs to be the most predictive of the three, while MUDs and PUDs vary in rank depending on the cohort. The C-index results show PUDs as more predictive in Cohorts 7–8 and ALDs in Cohort 6, with MUDs always at the bottom.

Discussion

This study provides new data describing the relationship between HRQOL and mortality by comparing the relationship with short- and long-term mortality. It is also the first to examine the association between healthy days and mortality in a large nationwide population, to assess all four HRQOL-4 measures in a single model, and to follow them for more than 1 year.

The bivariate results are consistent with the findings of Dominick et al. (2002), although HRs are smaller in this study for comparable models. The HRs <1.0 for Spanish surveys and other (non-White, non-Black) race were unexpected, but may reflect a crossover association with survival (Corti et al. 1999; Johnson 2000). The MA population is healthier, on average, than the general population (Atherly et al. 2005; Mello et al. 2003; Lied et al. 2003), which may exaggerate a crossover association if selection into MA plans varies by race/ethnicity.

Comparing the short (90-day) and long (about 2.5 years) multivariate models, the relationship between HRQOL and mortality weakens as time since measurement increases. Although more covariates and HRQOL categories were statistically significant over the longer follow-up period, this reflects greater statistical power from more observed deaths, not greater predictive power for the HDs, as shown in Table 5. An exception is GSRH, which is more predictive in the long-term model than in the short-term model. The weaker long-term relationship with mortality for the three HD measures is not unexpected, because these capture HRQOL during the past 30 days while the follow-up period is substantially longer. Physical health (GSRH and PUDs) and ALDs had the expected relationship with mortality, but the associations between mental health and mortality were less consistent and, in some cases, opposite in direction to what was expected. The finding of no significant relationship between mental health and mortality after adjusting for other measures of HRQOL is consistent with several other studies (Sargent-Cox et al. 2010; Steptoe and Wardle 2011; DeSalvo et al. 2006a, b; Franks et al. 2003; Korten et al. 1999; Vogt et al. 1994). One study of the association between mental health and mortality (Pratt 2009) found significant associations between poor mental health and mortality, but did not control for HRQOL measures, including GSRH.

In terms of predictive validity and relative importance, the much smaller impact of the HRQOL-4 variables in the long term than in the short term (10–11 % vs. 19–27 %) is expected because the HRQOL-4 questions refer to the past 30 days. If only one HRQOL measure were available for mortality modeling, over any length of time, GSRH would be by far the most predictive. Among the three HDs, ALDs are the largest predictor in the PEV analysis of long-term follow-up and also the most stable, based on the smaller time interactions for this variable. Thus, ALDs appear to be the most important of the three HD measures for long-term prognostic assessment, although their overall contribution in the models was still small. Adding further robustness to these results, the differences in the contributions of the HRQOL measures to the change in the C-index and PEV are consistent with a previous study that used another HRQOL measure (SF-36) and reported C-index statistics for alternative models (DeSalvo et al. 2005). In addition, a large association with the GSRH question after adjusting for demographics, comorbid conditions, and depression has been shown in a previous study (Sargent-Cox et al. 2010). Another study that examined the variance in HRQOL explained by GSRH found that it explained 48–59 % of the variance in one HRQOL scale, the -12V instrument (DeSalvo et al. 2006a, b).

These findings have several implications for monitoring population health. Because the GSRH item adds the most predictive information in HRQOL-4, the most parsimonious way to monitor for increased risk of adverse health events in the population is to collect this single measure. However, the other measures do add information. If all four HRQOL-4 items are collected, they could be used to build a better model of mortality risk to inform the targeting of future preventive interventions. For example, if these items were part of a clinical screening form or a routine wellness questionnaire, persons whose GSRH, PUDs, or ALDs worsened over repeated measurements could be flagged for additional clinician or caseworker review to better understand their health state and higher risk of mortality. The inconsistent and largely non-significant relationship for mental health suggests that less emphasis should be placed on the MUD measure when monitoring for population health risk. However, it remains an important outcome capturing quality of life associated with morbidity.
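The flagging idea above could be sketched as a simple rule comparing two consecutive surveys. The rule is hypothetical and has not been validated: it assumes GSRH is coded 1 (excellent) to 5 (worst category), treats any move to a worse GSRH category or to a higher day band (0, 1–10, 11–20, 21–30) for PUDs or ALDs as worsening, and omits MUDs, in line with the suggestion above. The class and function names are ours.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hrqol:
    gsrh: int   # 1 = excellent ... 5 = worst category (assumed coding)
    puds: int   # physically unhealthy days in past 30 days (0-30)
    alds: int   # activity limitation days in past 30 days (0-30)


def band(days: int) -> int:
    """Day band: 0 = 0 days, 1 = 1-10, 2 = 11-20, 3 = 21-30."""
    if not 0 <= days <= 30:
        raise ValueError(f"expected 0-30 days, got {days}")
    return 0 if days == 0 else (days - 1) // 10 + 1


def flag_worsening(prev: Hrqol, curr: Hrqol) -> List[str]:
    """Return the measures that worsened between two surveys (hypothetical rule)."""
    reasons = []
    if curr.gsrh > prev.gsrh:
        reasons.append("GSRH")
    if band(curr.puds) > band(prev.puds):
        reasons.append("PUDs")
    if band(curr.alds) > band(prev.alds):
        reasons.append("ALDs")
    return reasons


print(flag_worsening(Hrqol(gsrh=2, puds=0, alds=0), Hrqol(gsrh=3, puds=12, alds=0)))
# ['GSRH', 'PUDs']
```

Using day bands rather than raw counts keeps small fluctuations within a band from triggering a flag; any real screening rule would need its own validation.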

This research has limitations. First, the large sample of Medicare beneficiaries is an advance, but it is not representative of either the older adult population or the entire Medicare population. Our sample restrictions change the composition slightly relative to the HOS sampling frame, but we have no reason to believe that the HRQOL–mortality relationship is affected by this. Second, time to mortality is measured with error among mail respondents: the HOS records the date on which the survey was received, not when it was completed. This short lag should have minimal impact on the analysis. Third, the proportional hazards assumption was not met for the maximum follow-up time or for GSRH in the short-term model, so final conclusions about the HRs for HRQOL should be drawn from Table 4, which includes time interactions. Finally, the back pain relationship in the multivariate models is puzzling. Back pain is correlated with some of the other controls in the multivariate model; we suspect that after including the 12 chronic diseases, GSRH, and HDs, the residual information in back pain is small and may proxy for attitudinal differences in the sample. Because our main emphasis was on the HRQOL relationship, further exploration of this finding is left for future research.

Future research may address these limitations in several ways. Population differences are likely to be important and may be explored with different samples and broader collection of HRQOL measures, including the Healthy Days. The relationships may also differ for less healthy Medicare beneficiaries, which could alter conclusions about using these results for broader population surveillance. The question of proxy respondents may also be an important area for future research. In conclusion, this study demonstrates strong associations between HRQOL, measured with the CDC Healthy Days items, and mortality over both short- and long-term follow-up, with important differences over time.