Introduction

Self-reports of health, subjective well-being, and symptoms are widely used in gerontological, medical, and epidemiological research because they can serve as valuable indicators of general health and health-related outcomes across the lifespan (Schnittker 2005). Two of the measures studied here, subjective health and life-satisfaction, are also considered important predictors of individuals’ health trajectories and have robust links with objective physical health indices, healthcare utilization, and longevity (DeSalvo et al. 2005, 2006; for review see Diener and Chan 2011; Franks et al. 2003; Ostbye et al. 2006; Sargent-Cox et al. 2008). The other two measures studied here, pain and fatigue, are common symptoms in the general population and are associated with healthcare use, work absenteeism, and overall productivity (Cote et al. 2001; J. Ricci et al. 2006, 2007; van't Leven et al. 2010; Vasseljen et al. 2013).

A potentially important but sometimes overlooked factor in studies employing these four outcome measures is the comparison standard that respondents use to answer questions, also known as the frame of reference. Frames of reference (FoRs) are well established in survey methodology and refer to the time-frame or reference-group comparisons that respondents use when answering questions (Fienberg et al. 1985; Schwarz 1999). When asked about their life-satisfaction, for instance, some individuals may compare themselves to when they were younger (a time frame) or to other people of similar age (a reference group) (Kaplan and Baron-Epel 2003; Fayers et al. 2007). Such comparisons can affect how answers are formed.

FoRs might be a concern in self-reports if the comparison standard is not provided in survey questions. Different individuals, or different groups of individuals, may use different types of comparisons, and this “natural” variation in the use of FoRs may compromise the utility of self-reports for comparing people of different ages or of different demographic or medical circumstances (Sargent-Cox et al. 2008; Stone et al. 2008; Ubel et al. 2005). A hypothetical example illustrates the concern: if younger people compared themselves with an “ideal self” more often than older people did, and if such comparisons were associated with lower life-satisfaction, then life-satisfaction would be artificially low in younger age groups relative to older age groups. It is therefore important to know whether participants in different subgroups consider different FoRs when rating their health or life-satisfaction. This paper aims to describe the frames of reference that are naturally used in self-reports of health and life-satisfaction and the associations between FoRs and age.

Evidence supporting this examination comes from several lines of research. For example, three qualitative studies showed age differences in the use of reference standards when answering the single-item self-rated health question. Peersman and colleagues (Peersman et al. 2012) found that older people were more inclined to use comparisons with their own past health as a reference when responding to self-rated health questions, whereas younger people were more likely to use health behaviors (e.g., smoking, exercising) as a referent. Krause and Jay (1994) found similar age differences: younger respondents were more likely to use health behaviors as a reference standard when responding to the global self-rated health item, whereas middle-aged and older participants were more likely to think about health problems they had or could have. Finally, in a study by Simon and colleagues (Simon et al. 2005), older participants were more likely to refer to physical and functional aspects of their health when responding to global self-rated health items, whereas younger participants were more likely to refer to aspects of well-being.

Furthermore, research shows that explicit manipulation of a FoR in the question wording can impact the age-pattern of health and health-related outcomes. Self-rated health items containing a FoR directing respondents to use comparisons with “other people” have been shown to reduce age differences compared to self-rated health items without an explicit FoR or those with a time-comparative FoR (Baron-Epel and Kaplan 2001; Sargent-Cox et al. 2008, 2010a). Questions explicitly asking respondents to “rate your health for someone your age” also yield less pronounced age differences than health ratings without a specific FoR (Roberts 1999; Ubel et al. 2005; Vuorisalmi et al. 2006).

Questions using different FoRs have also been found to yield differential associations with health-related outcomes, such as survival, and differential trajectories of self-rated health over time (Ferraro and Wilkinson 2015; Sargent-Cox et al. 2008, 2010a, b). Sargent-Cox and colleagues (Sargent-Cox et al. 2010a) found that a self-rated health measure whose wording contained neither an age-comparative FoR (comparison to people who are younger, older, or of the same age) nor a self-comparative FoR (comparison of current health with previous health) was the best predictor of mortality in older adults. Self-rated health measured with an age-comparative FoR in the item wording improves with age in older adults (Dening et al. 1998; Seitsamo and Klockars 1997), whereas health ratings based on items with a self-comparative FoR decline with age (McCullough and Laurenceau 2004).

There is also reason to believe that the “directionality” of a FoR comparison (upward (better than oneself), lateral, or downward (worse than oneself) relative to the respondent’s perception of themselves) differs by age. Drawing on comparison theories (Albert 1977; Festinger 1950, 1954), it has been suggested that older adults are more likely to engage in downward social comparisons (i.e., comparisons with a social reference group that one perceives as inferior or downgraded) than younger adults in order to elevate their perspective on themselves and how they are doing (Heckhausen and Brim 1997; Suls et al. 1991). Temporal comparisons (comparing one’s current state with the past), in contrast, may yield less favorable self-evaluations in older adults due to age-related declines in physical health (Leinonen et al. 2001). For instance, prior research that manipulated FoRs in the item wording has reported that an age-comparative FoR in self-ratings of health yields better scores for older adults (Baron-Epel et al. 2004).

Despite the importance of FoRs in self-report research, we know very little about which FoRs, if any, people naturally use when a survey question does not make a FoR explicit. Only a few qualitative studies have explored which FoRs individuals naturally use, and they were relatively small-scale investigations with non-representative samples. Kaplan and Baron-Epel (Kaplan and Baron-Epel 2003) examined self-rated health in Israeli residents and found that most respondents compared themselves to “other people.” Fayers and colleagues (Fayers et al. 2007) examined FoRs for health-related quality of life in patients with Paget’s disease and found that the majority of patients compared themselves to (1) how they were a year ago, (2) how they were before they became ill, and (3) other people who are healthy. Both studies focused on health-related ratings.

In our own recent work (Junghaenel et al. 2018), we extended this prior research by querying one hundred adults from a community sample of the U.S. general population about their natural use of FoRs when rating the outcomes examined in this study: self-rated health, life-satisfaction, and two common, self-reported symptoms, pain and fatigue. In this qualitative work we found that participants reported three broad FoR categories: (1) interpersonal comparisons (references to other people), (2) historical comparisons (references to an earlier time in life or an important event in the past), and (3) imaginary comparisons (references to an imaginary situation). We use these categories in the current study.

The goal of the present study was to examine whether people of different age groups in the general U.S. population vary in their use of FoRs when responding to questions about health-related outcomes. The study aims were: (i) to examine the prevalence of FoRs used by respondents of different age groups for self-rated health, life-satisfaction, pain, and fatigue; (ii) to evaluate whether the natural use of FoRs (type and direction of comparisons) differs by outcome and age group; and (iii) to examine whether the differential use of FoRs by age (including type and direction of comparisons) could affect the observed age patterns of health, life-satisfaction, pain, and fatigue.

Methods

Participants and Procedure

Participants (n = 2000) were recruited from a U.S. national Internet panel of about one million households, hosted by Survey Sampling International (SSI). The opt-in panel consists of people who volunteered to periodically participate in Internet surveys, for which they receive modest compensation. Invitations to participate in the study were sent to panelists in three age groups (21-45 years, 46-64 years, and 65-85 years; n = 648-683 per group) until the targeted sample size of 2000 participants was reached. Recruitment also targeted equal proportions of female and male participants and a racial/ethnic composition based on the 2010 Census. No specific health conditions were required for inclusion.

Participants completed the questionnaire online and were presented with one item at a time. The FoR checklist was a computerized branching system programmed by the survey host SSI. Each participant was randomly assigned to two of the four outcome variables (i.e., health, life-satisfaction, fatigue, and pain). The checklist first presented the item for one of the two selected outcomes and asked participants to provide their self-report rating. This was followed by an open-ended question about what participants were thinking about, or which comparisons they were making, when answering the survey item. Next, participants were presented with a list of specific FoRs (derived from our prior qualitative work) and were asked to indicate which of these, if any, they had used when making their rating (“When you answered the health question, did you make any of the following comparisons?”). The list included references to other people, i.e., Interpersonal comparisons (“I compared myself with another person or other people”); to past events, i.e., Historical comparisons (“I made a comparison with how I was some time ago”); and to a hypothetical situation, i.e., Imaginary comparisons (“I thought about how I would feel if something about me or my life were different”); as well as an option for indicating that none of the listed FoRs had been used (“I did not think about any of the above when answering the question”). If a participant indicated that s/he did not think of any of these three FoRs, the checklist moved on to the next outcome variable. Participants could check all options that applied and, if they endorsed more than one FoR, were later asked to select the one that was “most important” to them when they were thinking about their answer to the survey item. Participants were then asked to think about their single or most important FoR and to rate whether it involved an upward, lateral, or downward comparison.
For example, when participants selected that they made Interpersonal comparisons, the subsequent question on the internet checklist asked them to rate whether this person/these people were 1) better off (had better health, greater life-satisfaction, less fatigue, or less pain), 2) similar to the participant (had similar health, life-satisfaction, fatigue, or pain) or 3) were worse off (worse health, lower life-satisfaction, more fatigue, or more pain) than the participant. These steps were repeated for both outcome variables that were assigned to the participant. The online branching system also administered other survey questions not reported in the present study. For example, these questions included asking participants what specific types of past events they were thinking about in their historical comparisons, whether participants were thinking about one or more people and what their characteristics were in their interpersonal comparisons, and what aspects of themselves or their lives participants were thinking about in their imaginary comparisons. Information about the FoR checklist is available from the authors.
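The branching flow described above can be sketched as a small function that returns the FoR and direction retained for analysis for one rating. This is a hypothetical illustration of the logic as described in the text (names such as `checklist_record` are ours), not the survey host's implementation; the open-ended question and the follow-up items not analyzed here are omitted.

```python
# Hypothetical sketch of the FoR checklist branching for one outcome item.
# Names (checklist_record, FORS, DIRECTIONS) are illustrative only; the
# actual checklist was programmed by the survey host.

FORS = ("Interpersonal", "Historical", "Imaginary")
DIRECTIONS = ("upward", "lateral", "downward")


def checklist_record(endorsed, most_important=None, direction=None):
    """Return the (FoR, direction) pair kept for analysis for one rating.

    endorsed: FoRs the respondent checked; empty means "none of the above".
    most_important: the FoR chosen as most important if several were checked.
    direction: upward/lateral/downward rating of the retained FoR.
    """
    endorsed = set(endorsed)
    unknown = endorsed - set(FORS)
    if unknown:
        raise ValueError(f"unknown FoR(s): {sorted(unknown)}")
    if not endorsed:
        # "None of the FoRs": the checklist skips the direction question
        # and moves on to the next outcome.
        return ("None", None)
    if len(endorsed) == 1:
        (retained,) = endorsed
    elif most_important in endorsed:
        retained = most_important
    else:
        raise ValueError("most_important must be one of the endorsed FoRs")
    if direction not in DIRECTIONS:
        raise ValueError("direction must be 'upward', 'lateral', or 'downward'")
    return (retained, direction)


# Example: two FoRs checked, Interpersonal judged most important.
print(checklist_record({"Historical", "Interpersonal"},
                       most_important="Interpersonal", direction="lateral"))
```

A "None" response carries no direction, which is why the direction analyses exclude that category.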

Measures

The self-report outcomes were derived from the following sources. None of the selected questions specified a FoR (i.e., a comparison standard) in its standard item wording. For subjective health, the global health item from the SF-36v2 (Ware Jr. and Sherbourne 1992; J. E. Ware et al. 2000) was used (“In general, would you say your health is…”; response options: excellent, very good, good, fair, poor). For life-satisfaction, the World Values Survey (www.worldvaluessurvey.org) life-satisfaction question was used (“All things considered, how satisfied are you with your life as a whole these days?”; response options ranging from 1 = completely dissatisfied to 10 = completely satisfied). The fatigue question was taken from the Brief Fatigue Inventory (Mendoza et al. 1999) (“Please rate your fatigue (weariness, tiredness) by circling the one number that best describes your usual level of fatigue”; response options ranging from 0 = no fatigue to 10 = fatigue as bad as you can imagine). The pain question was taken from the Brief Pain Inventory (Cleeland 1994) (“Please rate your pain by circling the number that best describes your pain on the average”; response options ranging from 0 = no pain to 10 = pain as bad as you can imagine).

Data Analysis

For the analyses, the participant’s most salient FoR for each rating was used: Interpersonal, Historical, or Imaginary comparisons, or None of the FoRs. If a participant checked multiple FoRs, the one selected as “most important” was used. Overall, there were only 340 instances of multiple FoRs (8.5% of the 4000 ratings). Younger and middle-aged people were more likely to use multiple FoRs than older people (younger vs. older: OR = 2.48, p < .001, 95% CI [1.86, 3.29]; middle-aged vs. older: OR = 1.42, p < .05, 95% CI [1.04, 1.93]). The analyses of the direction of comparisons excluded ratings for which participants reported using None of the FoRs. There were no missing data. To investigate whether the use of FoRs (type and direction) differed by outcome (health, life-satisfaction, fatigue, and pain), we used a series of logistic regression models (McCullagh and Nelder 1989; Venables and Gardner 2002) followed by pairwise comparisons with Bonferroni corrections to control for Type I error (Agresti 2007; MacDonald and Gardner 2000). Two approaches were used to evaluate the age effects on the use of FoRs. For the analyses conducted on the data pooled across outcomes, we used a series of logistic regression models followed by pairwise comparisons with Bonferroni correction. To account for non-independence (each participant rated two outcomes, so ratings were clustered within individuals), Huber-White cluster-robust standard errors were applied (Williams 2000). For the analyses conducted separately for each outcome domain, we used a series of logistic regression models in which the predictor variable “Age” was recoded into two dummy indicators contrasting young and middle-aged participants with older participants. The dependent variable FoR was recoded into three dummy indicators contrasting Interpersonal, Historical, and Imaginary comparisons with the absence of these FoRs (None of the FoRs).
The second dependent variable, direction of comparison, was likewise recoded into two dummy variables contrasting downward and lateral comparisons with upward comparisons. We checked for possible confounding effects of demographic variables (gender, SES, and marital status). Including these covariates did not change the substantive interpretation of the results; therefore, the final regression analyses used only participants’ age as the predictor. These analyses were conducted using R statistical software.
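A minimal sketch of the reference-cell (dummy) coding described above, assuming the level labels shown; `dummy_code` is a hypothetical helper, not part of the study's R code (R's default treatment contrasts produce the same kind of coding). The reference level (older adults; None of the FoRs; upward comparisons) is the row of all zeros, so each regression coefficient contrasts one level with the reference.

```python
def dummy_code(value, levels, reference):
    """Return 0/1 indicators for every non-reference level (hypothetical helper)."""
    if value not in levels:
        raise ValueError(f"{value!r} is not one of {levels}")
    if reference not in levels:
        raise ValueError(f"reference {reference!r} is not one of {levels}")
    return {level: int(value == level) for level in levels if level != reference}


AGE_LEVELS = ("young", "middle-aged", "older")                    # reference: older
FOR_LEVELS = ("Interpersonal", "Historical", "Imaginary", "None")  # reference: None
DIRECTION_LEVELS = ("upward", "lateral", "downward")              # reference: upward

print(dummy_code("young", AGE_LEVELS, "older"))
print(dummy_code("Historical", FOR_LEVELS, "None"))
print(dummy_code("upward", DIRECTION_LEVELS, "upward"))
```

With k levels, k - 1 indicators are created; including all k alongside an intercept would make the design matrix singular.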

To investigate how age differences in the use of FoRs affected the observed age differences in ratings of health, life-satisfaction, fatigue, and pain, we conducted two sets of univariate ANOVAs in Stata that examined (i) age differences in each outcome variable without any covariates (i.e., the main effect of age group), and (ii) age differences in each outcome holding FoRs constant (i.e., the main effect of age group controlling for FoR categories). The age effects obtained from models (i) and (ii) were compared statistically using ‘seemingly unrelated equation’ models (Baum 2006; Srivastava and Giles 1987), in which the results from separate models with different sets of explanatory variables are combined into a single model with a joint variance-covariance matrix. This makes it possible to compare parameter estimates (i.e., age effects) directly across the models.
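The comparison of age effects across models (i) and (ii) amounts to a Wald test on the difference between the two sets of age coefficients, using the cross-model covariance that the joint estimation supplies (this is, for example, what Stata's `suest` followed by `test` provides). A minimal sketch, assuming two age dummies (young and middle-aged vs. older), so the statistic is χ2 with 2 df; the function name and all numbers are hypothetical.

```python
import math


def wald_diff_two_models(b1, b2, V1, V2, C12):
    """Wald chi-square test (df = 2) of H0: b1 == b2 for two age dummies.

    b1, b2: (young, middle-aged) age effects, each relative to older adults,
        from the unadjusted model and the FoR-adjusted model.
    V1, V2: 2x2 covariance matrices of b1 and b2.
    C12: 2x2 cross-model covariance Cov(b1, b2) from the joint estimation.
    """
    d = [b1[i] - b2[i] for i in range(2)]
    # Var(b1 - b2) = V1 + V2 - C12 - C12^T
    S = [[V1[i][j] + V2[i][j] - C12[i][j] - C12[j][i] for j in range(2)]
         for i in range(2)]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    if S[0][0] <= 0 or det <= 0:
        raise ValueError("Var(b1 - b2) is not positive definite")
    S_inv = [[S[1][1] / det, -S[0][1] / det],
             [-S[1][0] / det, S[0][0] / det]]
    w = sum(d[i] * S_inv[i][j] * d[j] for i in range(2) for j in range(2))
    p = math.exp(-w / 2)  # chi-square upper tail for df = 2
    return w, p


# Hypothetical estimates: the young-vs-older effect shrinks after adjustment.
V = [[0.01, 0.0], [0.0, 0.01]]
C = [[0.005, 0.0], [0.0, 0.005]]
w, p = wald_diff_two_models([0.50, 0.20], [0.30, 0.20], V, V, C)
print(f"chi2(2) = {w:.2f}, p = {p:.3f}")
```

Ignoring the cross-model covariance C12 would overstate Var(b1 - b2), because both models are fit to the same respondents, and would make the test too conservative.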

Results

Participants

Participants had a mean age of 53.01 years (SD = 16.0), and 52.6% were female. Most participants were married or living as married (57.7%). The sample was predominantly White (80.2%), non-Hispanic (86.7%), and educated (81% with college education or more). Table 1 presents the demographic characteristics.

Table 1 Demographic characteristics of the study sample

Prevalence of FoRs Overall and Across Outcomes

Overall, slightly over one-third of respondents (36.7%) did not use any of the FoRs listed in the questionnaire. Among the FoRs, Historical comparisons were the most frequent (35% of all responses), followed by Imaginary (15.1%) and Interpersonal comparisons (13.2%).
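The four percentages in this paragraph, including None of the FoRs, sum to 100%, so they are shares of all responses rather than of responses that used a FoR. A short sketch that re-expresses them as shares among FoR users, using only the percentages reported above (the helper name is ours):

```python
# Reported shares (%) of all responses, including "None of the FoRs".
reported = {"None": 36.7, "Historical": 35.0, "Imaginary": 15.1, "Interpersonal": 13.2}


def shares_among_users(shares, none_key="None"):
    """Re-express shares of all responses as shares of responses that used a FoR."""
    used = sum(v for k, v in shares.items() if k != none_key)
    return {k: 100 * v / used for k, v in shares.items() if k != none_key}


print(round(sum(reported.values()), 1))  # all four categories: 100.0
for fr, pct in shares_among_users(reported).items():
    print(f"{fr}: {pct:.1f}% of FoR users")
```

Among responses that used a FoR, Historical comparisons account for about 55%.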

To examine differences in the use of FoRs by outcome, we applied a series of logistic regression models. Table 2 presents marginal means and a summary of the pairwise comparisons; each column of the table represents a separate logistic regression analysis. For Interpersonal comparisons, the main effect of Outcome was significant (χ2(3) = 84.25, p < 0.001). Interpersonal comparisons were used more frequently for health than for the other three outcomes, which did not differ from each other. There was also a significant effect of Outcome for Historical comparisons (χ2(3) = 10.3, p < 0.05). This FoR was used somewhat more frequently for fatigue than for health and life-satisfaction, but the frequency of Historical comparisons did not differ significantly between fatigue and pain. For Imaginary comparisons, the main effect of Outcome was also significant (χ2(3) = 93.71, p < 0.001). The pairwise tests showed that Imaginary comparisons were used more frequently for life-satisfaction than for the other three outcomes, which did not differ among themselves. Finally, there was also a significant effect of Outcome for None of the FoRs (χ2(3) = 33.07, p < 0.001); this category was reported most frequently for pain compared with health and life-satisfaction. The frequency of None of the FoRs did not differ between pain and fatigue.
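With four outcomes there are six pairwise contrasts, so a Bonferroni correction at α = 0.05 treats a contrast as significant only if its raw p-value is below 0.05/6 ≈ 0.0083 (equivalently, if 6 × p < 0.05). A minimal sketch; the raw p-values are hypothetical, not the study's:

```python
from itertools import combinations

OUTCOMES = ("health", "life-satisfaction", "fatigue", "pain")
PAIRS = list(combinations(OUTCOMES, 2))  # 6 pairwise contrasts


def bonferroni_adjust(raw_p):
    """Multiply each raw p-value by the number of tests, capping at 1."""
    m = len(raw_p)
    return {key: min(1.0, p * m) for key, p in raw_p.items()}


# Hypothetical raw p-values, one per pair of outcomes (not from the study).
raw = dict(zip(PAIRS, [0.001, 0.01, 0.2, 0.004, 0.5, 0.03]))
adjusted = bonferroni_adjust(raw)
significant = [pair for pair, p in adjusted.items() if p < 0.05]
print(significant)
```

A raw p of 0.01 would be significant on its own but not after correction, which is the protection against Type I error the correction buys.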

Table 2 The use of FoRs by outcome

Do Young, Middle-Aged and Older People Report FoRs Differently?

In the first step, we examined differences between the age groups in the use of FoRs using the data pooled across the four outcome domains. Table 3 presents marginal means and a summary of the pairwise comparisons. For Interpersonal comparisons, the main effect of Age was significant (χ2(2) = 114.67, p < 0.001): younger participants used Interpersonal comparisons more frequently than did middle-aged and older participants. For Imaginary comparisons, the main effect of Age was also significant (χ2(2) = 18.35, p < 0.001): younger participants made Imaginary comparisons more frequently than did older participants, whereas middle-aged and older participants did not differ significantly. There was also a significant effect of Age for None of the FoRs (χ2(2) = 114.22, p < 0.001): younger participants used None of the FoRs less frequently than middle-aged and older participants, who did not differ from each other. No significant age differences were found for Historical comparisons (χ2(2) = 0.18, p = 0.91).
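Because each participant rated two outcomes, pooled ratings are not independent. A minimal sketch, assuming the simplest case of an intercept-only model (a single proportion, such as the share of Interpersonal comparisons), shows what the Huber-White cluster-robust correction does: residuals are summed within each participant before squaring. The function name and toy data are hypothetical, and the sketch omits the finite-sample adjustments that statistical packages usually apply; it is not the study's R code.

```python
import math
from collections import defaultdict


def cluster_robust_se_proportion(ratings):
    """Naive and CR0 cluster-robust standard errors of a proportion.

    ratings: list of (participant_id, y) pairs with y in {0, 1}, for example
    y = 1 if the rating used an Interpersonal comparison. A participant may
    contribute several ratings (two outcomes here), forming one cluster.
    """
    n = len(ratings)
    p_hat = sum(y for _, y in ratings) / n
    # Sum the residuals within each participant before squaring ("meat").
    cluster_sums = defaultdict(float)
    for pid, y in ratings:
        cluster_sums[pid] += y - p_hat
    meat = sum(s * s for s in cluster_sums.values())
    robust_se = math.sqrt(meat) / n  # CR0, no small-sample correction
    naive_se = math.sqrt(p_hat * (1 - p_hat) / n)  # assumes independence
    return p_hat, naive_se, robust_se


# Hypothetical toy data: each participant gives the same answer twice.
toy = [(1, 1), (1, 1), (2, 0), (2, 0), (3, 1), (3, 1), (4, 0), (4, 0)]
p, naive, robust = cluster_robust_se_proportion(toy)
print(f"p = {p:.2f}, naive SE = {naive:.3f}, cluster-robust SE = {robust:.3f}")
```

In this fully correlated toy case the cluster-robust SE (0.25) exceeds the naive SE (about 0.18); with one rating per participant the two coincide.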

Table 3 The use of FoRs by age across outcome

In the next step, we investigated age differences in the use of FoRs for each outcome separately. Results from the logistic regression models are presented in Table 4 and Fig 1. For health, younger participants had about 1.5 times the odds of older participants of making Interpersonal comparisons (OR = 1.51, p < 0.001) and about 2.2 times the odds of using Imaginary comparisons (OR = 2.17, p < 0.001). No significant age effects were found for Historical comparisons for health. For life-satisfaction, younger participants had almost 2.4 times the odds of older participants of using Interpersonal comparisons (OR = 2.44, p < 0.001); no significant age effects were found for Historical or Imaginary comparisons. For fatigue, younger participants had over 5 times the odds of older participants of making Interpersonal comparisons (OR = 5.54, p < 0.001) and 1.6 times the odds of making Imaginary comparisons (OR = 1.62, p < 0.05); no significant age effects were found for Historical comparisons. For pain, younger participants had almost 4 times the odds of older participants of making Interpersonal comparisons (OR = 3.95, p < 0.001) and about 1.8 times the odds of making Imaginary comparisons (OR = 1.82, p < 0.001); again, there were no significant age effects for Historical comparisons. Finally, for every outcome and FoR, middle-aged and older participants did not differ significantly.
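Odds ratios are not ratios of probabilities, so "x times the odds" does not translate directly into "x times more likely". The implied probability for younger adults depends on the older group's baseline rate, which is not reported in this section; the sketch below therefore uses hypothetical baselines together with the fatigue OR of 5.54 (the helper name is ours).

```python
def prob_from_or(odds_ratio, p_ref):
    """Probability in the comparison group implied by an OR and a reference probability."""
    if not 0 < p_ref < 1:
        raise ValueError("p_ref must lie strictly between 0 and 1")
    odds = odds_ratio * p_ref / (1 - p_ref)  # odds in the comparison group
    return odds / (1 + odds)


for p_ref in (0.05, 0.10, 0.30):  # hypothetical baseline rates for older adults
    p_young = prob_from_or(5.54, p_ref)
    print(f"baseline {p_ref:.2f}: young {p_young:.3f}, "
          f"probability ratio {p_young / p_ref:.2f}")
```

With a 10% baseline, an OR of 5.54 corresponds to a probability ratio of about 3.8; the more common the comparison, the further the probability ratio falls below the OR.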

Table 4 Effects of age on the use of FoRs by outcome
Fig 1 The use of FoRs (Interpersonal, Imaginary and Historical Comparisons) by age group and outcome

Prevalence of Downward, Lateral, and Upward Comparisons Overall and Across Outcomes

For each of the broad FoR categories (notably, with the exception of the “None of the FoRs” category) respondents were subsequently asked about the direction of their comparison (i.e., upward, lateral or downward comparison). Investigation of the frequency of these comparisons indicated that upward comparisons were most frequently used (52%), followed by lateral (32%) and downward comparisons (16%).

We also examined whether the use of upward, lateral, and downward comparisons differs by outcome. Table 5 presents a summary of the logistic regression analyses, including marginal means and a summary of the pairwise comparisons. For each of the three directions of comparison, the effect of domain was significant (Upward: χ2(3) = 38.82, p < 0.001; Lateral: χ2(3) = 21.32, p < 0.001; Downward: χ2(3) = 18.64, p < 0.001). Upward comparisons were used more frequently for fatigue and life-satisfaction than for health and pain. Lateral comparisons were used more frequently for health and pain and less frequently for life-satisfaction and fatigue. Finally, downward comparisons were used more frequently for health and life-satisfaction, which did not differ from each other, than for fatigue and pain, which also did not differ from each other.
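The χ2 statistics reported in these results can be checked against their p-values with the closed-form chi-square upper tail for small integer degrees of freedom, via the recurrence Q(x; k+2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Γ(k/2 + 1). A minimal standard-library sketch; the function name is ours:

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        k, q = 2, math.exp(-x / 2)               # Q(x; 2)
    else:
        k, q = 1, math.erfc(math.sqrt(x / 2))    # Q(x; 1)
    while k < df:
        # Q(x; k+2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


for stat, df in [(38.82, 3), (21.32, 3), (18.64, 3), (10.3, 3), (0.18, 2)]:
    print(f"chi2({df}) = {stat}: p = {chi2_sf(stat, df):.2g}")
```

For example, χ2(3) = 38.82 gives p far below 0.001, and χ2(2) = 0.18 gives p ≈ 0.91, matching the value reported for Historical comparisons in the pooled age analysis.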

Table 5 The use of upwards, lateral and downwards comparisons by outcome

Do young, middle-aged, and older people differ in their reporting of upward, downward, and lateral comparisons?

We tested for effects of age on the use of upward, lateral, and downward comparisons using logistic regression models on the data pooled across the four outcomes. Table 6 presents a summary of the analyses. We found significant effects of age for lateral and downward comparisons (Lateral: χ2(2) = 26.51, p < 0.001; Downward: χ2(2) = 15.49, p < 0.001). Younger participants used lateral comparisons less frequently and downward comparisons more frequently than middle-aged and older participants, who did not differ significantly from each other.

Table 6 The use of upward, lateral and downward comparisons by age group

Does the use of FoRs Bias Observed Age-Patterns of Health, Life-Satisfaction, Pain, and Fatigue?

Our previous analyses showed that the age groups differ in their use of FoRs, in both the type and direction of comparisons. We next investigated whether the age patterns in health, life-satisfaction, fatigue, and pain differ before versus after statistically controlling for differences in FoRs. First, we investigated how age differences in the use of the broad FoR categories (i.e., Interpersonal, Historical, or Imaginary comparisons, or None of the FoRs) affect age differences in ratings of health, life-satisfaction, fatigue, and pain. Table 7 presents the ANOVA results (overall F-test) for the age group differences in means before and after adjusting for FoRs (first two columns), the estimated age group means, and the χ2 value for the test of differences in age effects between the two models (last column). Taking FoRs into account did not influence age differences in ratings of life-satisfaction and health (p > .10). However, FoRs significantly affected age differences in ratings of fatigue and pain (p < .01). As shown in Fig 2, for the fatigue and pain items the age differences are still present after accounting for FoRs, but they are somewhat reduced.

Table 7 Estimated means by age group before and after adjustment for FoRs and direction of comparisons, and tests for differences in age effects between the two models
Fig 2 Estimated age patterns before and after adjustment for FoRs (Interpersonal, Historical, and Imaginary Comparisons) and direction of comparisons (Upward, Downward, Lateral) by outcome

Finally, we examined whether differences in the direction of comparisons affect age differences in ratings of health, life-satisfaction, fatigue, and pain, following the same analysis steps as described in the previous paragraph. The results in Table 7 indicate that the direction of comparison did not significantly affect age differences in ratings of life-satisfaction, health, and fatigue (p > .10), but it significantly affected age differences in pain ratings (p < .01). As Fig 2 shows, after accounting for the direction of comparisons, the age differences in pain ratings are still present but reduced.

Discussion

Self-reported survey data are used by both governments and scientists to monitor the health and well-being of the population. For some outcomes, such as pain, fatigue, and life-satisfaction, self-reports are the only practical way to acquire the information. As with any measurement that is central to population monitoring and decision-making, self-reports need to be as precise as possible. Despite their widespread use, knowledge of the processes underlying people’s evaluations of their general health or life-satisfaction is limited (Kaplan and Baron-Epel 2003; Simon et al. 2005; Peersman et al. 2012). In this study we focused on the comparison standards, or FoRs, that people use when responding to survey questions about their health, life-satisfaction, pain, and fatigue, a topic addressed by only a handful of studies (Kaplan and Baron-Epel 2003; Fayers et al. 2007; Junghaenel et al. 2018). We examined the use of FoRs in questions that do not explicitly provide instructions about reference frames, because the field lacks empirical evidence on age differences in the natural use of FoRs and on whether such differences, if present, affect observed age patterns for the outcomes. The findings of this study make several contributions to the existing literature, which are outlined below.

The first contribution of this study is the finding that in the U.S. general population most participants used three broadly defined FoRs: comparisons to other people (Interpersonal comparisons), to past events (Historical comparisons), or to a hypothetical situation (Imaginary comparisons). This is consistent with our previous research (Junghaenel et al. 2018), which showed that participants mostly used these three broad categories when evaluating their health and life-satisfaction. In this study the most frequently used FoR was a comparison concerning past events; Historical comparisons were used more than twice as often as either of the other two FoRs, Interpersonal and Imaginary comparisons. We also found that slightly over one-third of respondents reported not using any of the FoRs specified in the survey. This is comparable to our previous qualitative research (Junghaenel et al. 2018), in which 20-40% of respondents reported not using FoRs when evaluating their health or life-satisfaction. This raises the question of whether there are implicit FoRs that are not accessible to individuals or whether such comparisons are truly not used at all. We recommend that future studies investigate this issue further using mixed-methods approaches.

The second contribution of this study is the finding that the use of frames of reference differs by outcome measure. Relative to the other outcomes, Interpersonal comparisons were most characteristic of self-rated health, Imaginary comparisons of life-satisfaction, and Historical comparisons of fatigue and pain. This is consistent with previous research suggesting that respondents most often compare themselves to other people or to past events when rating their health (e.g., Kaplan and Baron-Epel 2003; Fayers et al. 2007; Peersman et al. 2012). These findings also align with our previous qualitative work (Junghaenel et al. 2018), which showed that references to a hypothetical situation were mainly reported for ratings of life-satisfaction.

The third contribution is the finding of significant and substantial age differences in the reporting of FoRs. Younger participants used Interpersonal comparisons more often than the other two age groups, regardless of the domain being assessed. One possible explanation is that younger people tend to have more exposure to various social groups in their daily lives than middle-aged and older people, which could trigger social comparisons. Carstensen (1992) has shown that the size of social networks decreases gradually with age. Indeed, a recent meta-analysis (Wrzus et al. 2013) suggests age differences in the size and composition of social networks, with a peak in young adulthood and a steady decline in middle and older adulthood. This finding does not align with theoretical positions proposing that older people, not younger ones, use interpersonal comparisons more frequently as a strategy that protects them from the negative effects of aging (e.g., Baron-Epel and Kaplan 2001; Albert 1977; Festinger 1950, 1954). This literature suggests that by comparing themselves with other people who are doing worse, older people manage to maintain positive well-being as evident in self-reports, a phenomenon known as the paradox of aging (Mather 2012; Zhou et al. 2017). Nevertheless, these theoretical considerations have not been comprehensively examined. The association between age and the use of Interpersonal comparisons in self-reports, and possible explanations for these differences, require further research.

Furthermore, we found that younger people used Imaginary comparisons somewhat more frequently than middle-aged and older participants when self-reporting their fatigue and pain. To date, there is no empirical evidence regarding age differences in the natural use of this FoR. However, these age differences become more understandable when one considers that, in this study, younger people's more frequent use of Imaginary comparisons was specific to self-reports of pain and fatigue. Younger people, on average, may have less experience with chronic health conditions or fatigue than other age groups. Epidemiological studies indicate that people in their 40s and 50s most often experience chronic fatigue (e.g., Santhouse et al. 2010) and that rates of chronic pain are generally higher in older age groups (e.g., Docking et al. 2011). Hence, hypothetical situations may serve as a useful reference standard when younger respondents self-report fatigue and pain.

Another noteworthy contribution of this study is that we also investigated the direction of the comparisons (i.e., upward, lateral, or downward) used by respondents in self-reports. Overall, upward comparisons were used most frequently and downward comparisons least frequently. We also found that the direction of comparisons depended on the outcome being assessed. Upward comparisons were used most often for life-satisfaction and fatigue, lateral comparisons were used most often for health and pain, whereas downward comparisons were most frequent for health and life-satisfaction. No previous studies have examined how the outcome domain being evaluated affects the direction of comparisons in self-report surveys.

The results also revealed significant age effects in the direction of comparisons used for self-reported fatigue: younger participants used lateral comparisons less often and downward comparisons more often than middle-aged and older participants, whereas middle-aged and older participants did not differ from each other. This finding is again inconsistent with prior research suggesting that older people use downward comparisons more frequently than younger respondents to self-enhance their views of themselves and their overall health (Heckhausen and Brim 1997; Suls et al. 1991). These results suggest that self-enhancement via downward comparisons does not explain the paradox of aging, and hence they emphasize the need to investigate other cognitive or psychometric processes responsible for this effect.

The final contribution of this study is that we found that age differences in the use of FoRs slightly affected the age patterns for self-reported fatigue and pain. The overall effect of naturally used FoRs is that they magnify the existing age differences in self-reports of pain and fatigue, but do not change their direction or pattern. This is somewhat surprising given the existing literature suggesting that explicit manipulations of an FoR in the item wording can affect the age pattern of an outcome such as self-rated health (e.g., Baron-Epel and Kaplan 2001; Sargent-Cox et al. 2008, 2010a), or even the pattern of changes in self-rated health over time (e.g., Sargent-Cox et al. 2008, 2010a, b). We recommend that future research examine the impact of natural FoRs versus FoRs specified in questions to resolve this inconsistency.

There are limitations of the study that should be considered in interpreting the findings. The present study used a representative sample of the U.S. population; however, the sample consisted predominantly of White and highly educated respondents. This is often the case for studies using internet survey samples (e.g., Hays et al. 2015). Future studies should include more diverse samples to address whether the findings obtained in this study hold among respondents of different demographic composition. Moreover, this study focused on three FoRs, namely Interpersonal, Historical, and Imaginary comparisons. This choice was based on our previous research, which showed that participants mostly used these three broad categories when evaluating their health and life-satisfaction (Junghaenel et al. 2018). It is possible that people naturally use other comparison standards when self-rating their health or life-satisfaction that we did not include in this study. Finally, our design used only a single question for each outcome domain, whereas many self-reports of health or life-satisfaction are based on multiple-item measures. It remains an open research question whether people use the same comparison standard for each item in a multiple-item scale.

The limitations notwithstanding, this study adds to the existing literature on the effects of aging on self-report processes. The results revealed significant differences between younger participants and other age groups in the natural use of FoRs in self-reports of health and life-satisfaction. Specifically, our study found that age differences relate not only to the types of FoR (i.e., Interpersonal, Historical, and Imaginary) but also to the direction of these comparisons (i.e., upward, lateral, or downward). This suggests that researchers and practitioners working with self-report measures should be aware of these age differences in the natural use of FoRs. Furthermore, we showed that age differences in the natural use of FoRs amplify the observed age differences in self-reported pain and fatigue without changing their direction, and that they have no pronounced effect on the observed age patterns in self-reported health and life-satisfaction. More research is needed on the interactions between age, type of FoR, and direction of comparisons to provide clear guidelines for professionals working with self-report measures.