Introduction

Health-related quality of life is often assessed via the short-form general health survey (SF-36), as this instrument provides a systematic approach for comparison between studies and populations [1]. SF-36 scores can be summarised using two widely accepted constructs labelled as ‘physical component summary’ (PCS) and ‘mental component summary’ (MCS) scores, which are based on exploratory factor analysis of the eight SF-36 subscales: physical functioning (PF), role limitation caused by physical health problems (RP), bodily pain (BP), general health perceptions (GH), vitality (VT) for energy levels and fatigue, social functioning (SF), role limitation because of emotional problems (RE), and mental health (MH) [2]. PCS and MCS scores are represented on a standardised scale (as a T score with mean 50 and standard deviation 10) and have better distributional properties (continuous and symmetrical) than individual SF-36 subscales. While studies often compute PCS and MCS scores for their sample based on population norms for weightings and some longitudinal studies have tracked individual subscales, such as MH and PF, the precise trajectory of component summary scores over the life course is less clear.

The use of PCS and MCS scores as a means of characterising health has been subject to considerable debate [3–8]. Much of this focuses on the method used in their calculation, specifically constraining factor analysis to have two uncorrelated axes (achieved via orthogonal rotation). The labelling of the resultant uncorrelated summary scores PCSuc and MCSuc implies that ‘physical’ and ‘mental’ characterisations of health occupy distinct dimensions. PCSuc and MCSuc comprise loadings from all subscales, with some subscales in PCSuc, such as the mental health subscale, having a negative weighting. Thus, improved mental health (for instance reflected in higher MH and RE scores) can result in a counter-intuitive decline in PCSuc. Similarly, PF and some other subscales have negative weightings on MCSuc. Concerns have therefore been raised about the implications of these negative weightings when investigating changes in summary scores over time [5]. Some authors have suggested that, due to links between physical and mental health, correlated summary scores (PCSc and MCSc) should be derived using factor analysis where correlation between axes is permitted, via the use of oblique rather than orthogonal rotation [3]. This approach tends to reduce, rather than entirely eliminate, negative weightings. Ware and Kosinski [9] have argued, however, that PCSuc and MCSuc should always be viewed alongside scores for individual subscales so that unusual results can be identified.
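
To make the counter-intuitive effect concrete, the following worked example is a sketch under stated assumptions. It takes the scoring rule described in the Statistical method section (a weighted sum of baseline-standardised subscales, multiplied by 10 and added to 50) and the MH coefficient of −0.21 reported for PCSuc in the Results. It assumes, purely for illustration, that MH improves by one baseline standard deviation while the other seven subscales stay unchanged. The symbols w_k (scoring coefficient) and z_k (standardised subscale) are notation introduced here, not the authors'.

```latex
\begin{align*}
\mathrm{PCS_{uc}} &= 50 + 10 \sum_{k=1}^{8} w_k z_k, \\
\Delta \mathrm{PCS_{uc}} &= 10\, w_{\mathrm{MH}}\, \Delta z_{\mathrm{MH}}
  = 10 \times (-0.21) \times 1 = -2.1 .
\end{align*}
```

In practice other subscales usually change together with MH, so the net change in PCSuc depends on all eight weighted terms rather than on this single term.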

Despite the extensive use of uncorrelated SF-36 summary scores in many studies, the differences between PCS and MCS derived from the two methods remain unclear, specifically how these differences vary across life and what this implies for interpreting changes in summary scores over long time periods. This matters because the magnitude of change in summary scores, taken as a measure of the rate of change in patients’ health, may be used, for instance, to assess the effectiveness or otherwise of a treatment programme. In this paper, we compare PCS and MCS scores across the life course via both orthogonal and oblique settings for factor analysis of SF-36 subscales using data from three cohorts of women in the Australian Longitudinal Study on Women’s Health (ALSWH) [10].

Methods

The Australian Longitudinal Study on Women’s Health (ALSWH) is an ongoing, broad-ranging prospective study of factors affecting the health and well-being of Australian women. In 1996, self-reported data on health (including SF-36), health service use, socio-demographics, and personal information were collected from women in three age cohorts: the younger cohort, born in 1973–1978 (aged 18–23 years), the mid-age cohort, born in 1946–1951 (45–50 years), and the older cohort, born in 1921–1926 (70–75 years). To date, all cohorts have been followed up in at least five waves (younger cohort: 1996, 2000, 2003, 2006, 2009; mid-age cohort: 1996, 1998, 2001, 2004, 2007, 2010; and older cohort: 1996, 1999, 2002, 2005, 2008). The study has been approved by Ethics Committees at the University of Queensland and University of Newcastle. Further details of recruitment and response rates have been published elsewhere [10]. A total of 40,394 women completed the baseline survey; of these, 9.8 % died during the study, 9.5 % withdrew, 18 % did not complete any surveys, and 23 % completed surveys only intermittently. The present study focuses on the remaining 16,006 women who provided complete data for SF-36 at all surveys (younger cohort, n = 5458; mid-age cohort, n = 7212; older cohort, n = 3336).

Statistical method

Using SF-36 survey data from baseline and applying a very similar process to Ware and colleagues [2, 11], we obtained two versions of PCS and MCS scores for all women: uncorrelated (PCSuc and MCSuc) and correlated (PCSc and MCSc). First, scoring coefficients (factor loadings) for PCSuc and MCSuc were determined using factor analysis of SF-36 subscales with a two-factor orthogonal rotation. Second, all data for the eight SF-36 subscales for each woman were rescaled by taking the difference between the subscale and the mean of the subscale at baseline and then dividing by the standard deviation at baseline. Note that this results in SF-36 subscales that are z scores at baseline only (mean 0, standard deviation 1), while data for SF-36 subscales for subsequent surveys are standardised relative to baseline. Third, the two summary scores for each woman at every survey were computed by multiplying each of the standardised SF-36 subscales by its respective scoring coefficient (obtained in the first step) and summing the results. These two summary scores were then rescaled by multiplying by 10 and adding 50, so that at baseline the resultant PCSuc and MCSuc have mean 50 and standard deviation 10. Note that scores for subsequent surveys differ only in the standardised SF-36 subscales used in the calculation (scoring coefficients remain the same across surveys).
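
The second and third steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SAS code: the function names, the toy responses, and the toy coefficients are all hypothetical, and the use of the sample standard deviation is an assumption (the text does not say which estimator was used). The real scoring coefficients come from the factor analysis in the first step, which is not reproduced here.

```python
from statistics import mean, stdev

SUBSCALES = ["PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH"]


def baseline_stats(baseline_rows):
    """Return {subscale: (mean, sd)} computed from baseline responses only."""
    stats = {}
    for name in SUBSCALES:
        values = [row[name] for row in baseline_rows]
        # Assumption: sample standard deviation (n - 1 denominator).
        stats[name] = (mean(values), stdev(values))
    return stats


def summary_score(row, stats, coefficients):
    """Standardise against baseline, weight, sum, then rescale (x10, +50)."""
    total = 0.0
    for name in SUBSCALES:
        base_mean, base_sd = stats[name]
        z = (row[name] - base_mean) / base_sd  # relative to baseline at every survey
        total += coefficients[name] * z
    return 50.0 + 10.0 * total


# Hypothetical toy data: two women at baseline, identical on every subscale.
baseline = [
    {name: 40.0 for name in SUBSCALES},
    {name: 60.0 for name in SUBSCALES},
]
stats = baseline_stats(baseline)

# Hypothetical coefficients (NOT those of Table 2); note the negative MH weight.
toy_coefficients = {name: 0.0 for name in SUBSCALES}
toy_coefficients["PF"] = 0.5
toy_coefficients["MH"] = -0.2

follow_up = {name: 50.0 for name in SUBSCALES}   # exactly at the baseline mean
better_mh = dict(follow_up, MH=60.0)             # only MH improves

print(summary_score(follow_up, stats, toy_coefficients))
print(summary_score(better_mh, stats, toy_coefficients))
```

Because the baseline means, standard deviations, and coefficients are fixed, the same `stats` and `coefficients` objects are reused for every later survey. With made-up coefficients such as these, the resulting score does not have a standard deviation of 10 at baseline; that property depends on using the actual factor scoring coefficients.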

The correlated summary scores, PCSc and MCSc, were obtained by repeating the steps above, but using two-factor oblique rotation in the factor analysis to determine the baseline scoring coefficients. Spearman correlation coefficients were also calculated between the PF and MH subscales and the component summary scores. Intra-class correlation (ICC) coefficients were calculated for each of the summary scores. Random effects models, a method of longitudinal regression for correlated data [12], were used to investigate (1) whether the summary scores produced by each method varied across surveys; (2) whether the difference in summary scores produced by the two methods was significantly different from zero; and (3) the extent to which the magnitude of this difference varied across surveys. In these models, the intercept was treated as a random effect and survey number (representing time) as a fixed effect. All analyses were repeated for each cohort and performed in SAS version 9.1 [13].
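
As an illustration of the ICC step, the sketch below computes a one-way random-effects ICC from a balanced table of summary scores (one row per woman, one column per survey). The text does not state which ICC form was used, so the choice of the one-way form is an assumption, and the function name and toy data are hypothetical. A balanced layout matches the analysis sample, which was restricted to women with complete SF-36 data at all surveys.

```python
def icc_oneway(scores):
    """One-way random-effects ICC for a balanced table.

    scores: list of rows, one per woman; each row holds her summary score
    at every survey (same number of surveys for every woman).
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a balanced table with >= 2 rows and >= 2 columns")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]

    # Between-woman and within-woman mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical toy scores for three women over three surveys.
toy_scores = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
print(icc_oneway(toy_scores))
```

An ICC close to 1 means that most of the variation lies between women rather than between surveys for the same woman, which is the sense in which a higher ICC indicates greater agreement of measurements across time.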

Results

Table 1 shows the variation in individual subscale scores for each cohort through the study. Women in the younger cohort experienced a decline in PF, RP, and VT from survey 1 (aged 18–23) through to survey 5 (aged 31–36) but improved with respect to the GH, SF, RE, and MH subscales. For the mid-age cohort, PF, RP, BP, and GH declined from survey 1 (aged 45–50) to survey 6 (aged 59–64), but these women also experienced a rise in RE and MH scores. Women in the older cohort, however, showed a marked decline in all eight subscales of SF-36 from survey 1 (aged 70–75) to survey 5 (aged 82–87). For instance, PF for this cohort dropped from a mean score of 70.5 at baseline to 50.1 at survey 5, 12 years later (Table 1).

Table 1 Mean (standard deviation) of SF-36 subscales by age cohort

For both the uncorrelated and correlated factors (Table 2), the factor ‘physical component’ had coefficients with high values (factor loadings of 0.2 or more) from PF, RP, BP, and GH, while ‘mental component’ had high coefficients for VT, SF, RE, and MH. Scoring coefficients with negative values, however, were considerably reduced in magnitude for correlated factors compared with uncorrelated factors; for instance, the MH coefficient changed from −0.21 for PCSuc to −0.1 for PCSc, and similarly for PF in MCSuc compared with MCSc. The percentage of variance explained by each of the two correlated factors (obtained using oblique rotation) was also higher than that for the uncorrelated factors (obtained using orthogonal factor rotation).

Table 2 Standardised scoring coefficients by the type of factor rotation using data at Survey 1 (N = 39,241)

When tracking mean PCS obtained from both factor rotation methods, results from the three cohorts of women showed a declining trajectory over the life course (Fig. 1; Table 3), with PCSuc dropping from 54.3 at baseline for younger women to 37.9 by the last survey for older women. PCSc showed a similar decline, though not quite as steep overall: it started with slightly lower values than PCSuc in the younger cohort, moved to higher values than PCSuc in the mid-age cohort with a widening gap, and followed an essentially parallel declining trajectory in the older cohort at about 1.5 units above PCSuc. The higher value of the ICC for the correlated factors compared to the uncorrelated factors indicated a greater degree of agreement in measurements of the correlated factors across time. The ICC was lowest for the 1973–1978 cohort (Table 3).

Fig. 1 Mean and 95 % confidence interval of physical component summary measures (PCS_uc derived from orthogonal rotation; PCS_c derived from oblique rotation) by survey wave and age cohort (Y1 to Y5 refers to survey 1–survey 5 for the 1973–1978 cohort, M1–M6 refers to survey 1–survey 6 for the 1946–1951 cohort, and O1–O5 refers to survey 1–survey 5 for the 1921–1926 cohort)

Table 3 Mean (standard deviation) of SF-36 physical and mental health component summary scores by age cohort, derived with orthogonal and oblique rotation in factor analysis

Results from the random effects models shown in Table 4 confirm the difference in trajectories of the two measures for PCS: for the younger cohort, PCSuc is higher, but with the difference narrowing from 0.85 (0.81–0.88) at survey 1 to 0.33 (0.28–0.37) at survey 5. For the mid-age cohort, the difference is smaller, with an overall estimate of PCSuc being lower by 0.25 (0.22–0.28), though the gap widens such that by survey 6 the difference is −0.61 (−0.65 to −0.57). For the older cohort, PCSuc is consistently lower than PCSc by an estimated 1.10 (1.07–1.13) across the five surveys.

Table 4 Estimates (means and 95 % confidence intervals) from random effects models for a difference in SF-36 physical and mental health component summary scores, derived with orthogonal and oblique rotation in factor analysis by age cohort

A markedly different trajectory was indicated by MCS over the life course, rising from low levels in the young women and then at a slower rate in mid-life to peak at the baseline or second survey for the older women, before declining rapidly thereafter (Fig. 2). A noticeable divergence between MCSuc and MCSc was also evident across the three age cohorts. For the younger cohort, MCSuc is consistently lower than MCSc by an overall estimate of 0.90 (0.88–0.92) (Table 4). In the mid-age cohort, the two measures cross over, with MCSuc lower than MCSc by 0.39 (0.35–0.43) at survey 1 and this difference reversing to MCSuc being higher by 0.57 (0.53–0.61) by survey 6. At baseline in the older cohort, MCSc reached 54.6 but was lower than MCSuc by 1.63 (1.58–1.67) units; it had declined to 51.2 by the last survey, with the divergence widening to 3.44 (3.38–3.50) units lower than MCSuc.

Fig. 2 Mean and 95 % confidence interval of mental component summary measures (MCS_uc derived from orthogonal rotation, MCS_c derived from oblique rotation) by survey wave and age cohort (Y1 to Y5 refers to survey 1–survey 5 for the 1973–1978 cohort, M1–M6 refers to survey 1–survey 6 for the 1946–1951 cohort, and O1–O5 refers to survey 1–survey 5 for the 1921–1926 cohort)

Discussion

As far as we are aware, this is the first study to use SF-36 summary component scores from a population-based longitudinal study to present the trajectory of health-related quality of life for women over the life course. Using multiple waves of SF-36 data from three cohorts of Australian women has also permitted a detailed comparison of results using uncorrelated and correlated PCS and MCS scores. PCS scores from the two methods produced similar trajectories that showed an accelerating decline across the lifespan, with correlated summary scores tracking above the uncorrelated scores from midlife onwards. In contrast, MCS scores increased in young women and through midlife to a peak in later life, before declining in older women. There was also a notable divergence for MCS obtained from the two methods, with the trajectory of correlated MCS scores showing less of an increase in young and mid-age women, leading to a lower peak followed by a greater decline in older women than for the uncorrelated MCS score. This divergence occurs in part because allowing for correlation between factors, via oblique rotation in the factor analysis, reduced the contribution of subscales with negative scoring coefficients.

The findings suggest that caution is needed to avoid misleading conclusions when using uncorrelated summary component scores to measure change in quality of life, especially when comparing changes in MCS among older women. The vicious circle of poor physical and mental health and poor functioning is well documented, and it becomes more important with age [14, 15]. Correlated PCS and MCS scores can reflect such relationships in the underlying data, as was indicated by the correlation between the mental health subscale and the physical function and role physical subscales (Spearman correlation coefficients of 0.14 and 0.25, respectively). Mental and physical health may reflect common risk factors, however, and it may therefore be difficult to disentangle the precise relationship between the two [15].

In terms of previous longitudinal evidence on SF-36 subscales, the progressive decline in scores for the physical functioning subscale in this study is consistent with findings from the Whitehall II study [16]. Our results for the mental health subscale also show a similar trajectory to the Whitehall II findings, which show an increase from midlife to a peak when participants were aged around 70 years. The broader findings here regarding component summary scores also support previous evidence for the use of correlated PCS and MCS alongside uncorrelated PCS and MCS when measuring change in scores [8], though it should be noted that oblique factor rotation will generate health axes similar to the orthogonal setting if that provides an appropriate way of characterising the data.

The strength of this study lies in its population-based prospective design, with three cohorts and multiple follow-up surveys of SF-36. One weakness is that using data only from women who responded to all surveys may have led to biased results. We would expect, however, that if other women had been included, such as those who missed surveys due to poor health, the divergence in results from the two methods would have increased.

In conclusion, our findings show contrasting trajectories for PCS and MCS until later life and indicate that caution needs to be exercised when interpreting component summary scores as simple metrics of ‘physical health’ and ‘mental health’. Specifically, divergence in the component summary scores obtained using correlated factors compared with uncorrelated factors, most notably where MCS declines for older women, suggests that correlated MCS and PCS provide a more reliable metric of change in health over time. Health professionals, including researchers, should include correlated PCS and MCS as part of their assessment of change in quality of life rather than relying on uncorrelated summary scores, especially for older women. Further research is needed to see if similar trajectories for summary component scores over the life course occur for men and for other populations.