Introduction

Falls in older adults are a leading cause of disability, pain and reduced health-related quality of life (HRQOL), contributing considerably to the global burden of disease [1,2,3]. Given the serious consequences of falls, maintaining and improving HRQOL is one of the main goals of falls prevention [4]. Home care recipients have a higher incidence of falls and a lower level of HRQOL than the general older population [5, 6]. Effective interventions to prevent falls and to improve HRQOL in older home care recipients are necessary to maintain functional abilities and well-being, contributing to healthy ageing [1, 3], as well as reducing costs [7,8,9].

HRQOL is an important outcome in evaluations assessing the effects of health care and policy interventions for older adults [10]. Given the rise in number of older adults and the associated increase in health care costs, evaluations are necessary to inform about the cost-effectiveness of interventions [4]. Preferences over health states in older adults can be measured by various instruments which in turn can influence the outcome of the evaluation [4, 10, 11].

The Short Form 36 Health Survey (SF-36) and the EuroQOL EQ-5D (EQ-5D) are two of the most widely used generic measures of HRQOL, and both have been shown to be valid and reliable when assessing older adults [12]. In the population of community-dwelling older adults, EQ-5D has been recommended due to its brevity and high response rate, while SF-36 has been recommended for more detailed and broad assessments [12, 13]. Scores from SF-36 have been translated into the preference-based utility index SF-6D, comparable to EQ-5D. Both EQ-5D and SF-6D can be employed to calculate quality adjusted life years (QALYs) in economic evaluations [11]. Comparing EQ-5D and SF-6D in older adults can provide knowledge on how the instruments respond to changes in the underlying health state in this frail group [13]. This can assist clinicians and researchers when choosing an instrument, and policymakers when interpreting recommendations.

Differences and similarities in EQ-5D and SF-6D have been studied previously in various populations. Although both instruments measure HRQOL, differences have been demonstrated, in particular at the lower end of the utility scale [14]. Across various patient groups, floor effects in SF-6D and ceiling effects in EQ-5D have been shown [11, 13, 15]. In the domains pain and mood/mental health, similar utilities have been found, while EQ-5D puts more weight on physical functioning and SF-6D on social functioning [14, 15]. In the general population, EQ-5D seems to be more responsive to chronic physical conditions [16]. This was also evident when comparing the responsiveness of EQ-5D and the age-specific index of capability for older adults (ICECAP-O) in detecting changes over a 12-month period in fallers with mobility impairments [4]. However, in older people with severe chronic obstructive pulmonary disease, SF-6D was more efficient in detecting differences among subgroups in disease severity, but this observation was made at one time-point and not longitudinally [17, 18].

In economic evaluations, EQ-5D has demonstrated larger health gains and lower cost-utility ratios compared to SF-6D [19, 20]. Due to large differences in the incremental cost-utility ratios, the comparability of results from these instruments has been disputed [20, 21]. In patient groups with mild health conditions, the probability of acceptance of the incremental cost-utility ratio was larger when using EQ-5D, while in patient groups with worse health conditions the probability of acceptance was larger when using SF-6D [21]. Hence, when selecting a HRQOL instrument for evaluative purposes, the characteristics of the participants and the content of the intervention are relevant to consider. Frail older home care recipients have not been the focus of previous studies.

Health care interventions for older fallers receiving home care are important to maintain their physical function and quality of life and further enable them to remain at home. Evaluations are necessary to develop efficient and high-quality health care services for this group of frail older adults. By comparing SF-6D and EQ-5D, further knowledge on the instruments’ impact and their effect on evaluations can be obtained. This can be important information for health care managers and policy makers conducting evaluations to prioritise between different health care interventions for this population. We therefore conducted a study to examine the agreement between EQ-5D and SF-6D using longitudinal data on older home care recipients. We also assessed differences and similarities in HRQOL domains covered by the instruments and the responsiveness to changes in physical function over time.

Methods

Design

The study is a secondary analysis of data from a randomised controlled trial (RCT). Longitudinal data from three time points (baseline, 3 months and 6 months) were used. The RCT has been described elsewhere [5, 22, 23]. Reporting follows the STROBE guidelines [24].

Participants and setting

This study was conducted in primary health care service in six municipalities in Norway. Participants were recruited from lists of people receiving home care by health professionals. All assessments were performed in the participants’ homes by trained research assistants. Participants provided informed written consent before baseline testing.

Recruitment was conducted from February 2016 to February 2017 and follow-up assessments were carried out until September 2017. The sample size was determined based on power calculations for the RCT [23]. The inclusion criteria for the participants were that they received professional home care, either home nursing, practical assistance or safety alarm service. They were 67 years or older and had experienced at least one fall during the previous 12 months. They were able to walk with or without a walking aid and could understand Norwegian. Exclusion criteria were that they had any medical contraindications to exercise or a life expectancy below 1 year. They were also excluded if they had a score below 23 on the Mini Mental State Examination (MMSE), indicating cognitive decline, or if they already were participating in another falls prevention programme or trial.

Outcome measures

HRQOL was measured by two common self-report assessment tools, Short Form 36 Health Survey (SF-36) and EuroQOL EQ-5D (EQ-5D). SF-36 is a generic and validated questionnaire, which has been translated into Norwegian [25]. It consists of 36 questions on physical functioning, role limitations due to physical functions, role limitations due to emotional problems, bodily pain, general health perception, vitality, social functioning and mental health. Scores for the different items range from 1 to 6, where 1 is the best and 6 the worst. Based on SF-36, SF-6D utility indexes can be calculated, with scores ranging from 0.29 to 1 [26]. SF-6D is derived from the responses to 11 items of the SF-36, which are combined into six dimensions of health with four to six levels each [18]. The six dimensions are: physical functioning, role participation, social functioning, pain, mental health and vitality. EQ-5D is also a generic and validated tool, but briefer, and comprises five domains: mobility, self-care, usual activities, pain/discomfort and anxiety/depression [27]. Scores for the different domains range from 1 to 5, where 1 is the best and 5 the worst. The corresponding utilities in EQ-5D range from −0.59 to 1 [19]. Utility scores for SF-6D and EQ-5D were calculated using the published algorithms with parametric preference weights for the United Kingdom, as there is no Norwegian value set [27, 28]. Both instruments were administered as interviews to increase the response and completion rate [12].

Physical function was assessed by measures of balance, usual walking speed and leg muscle strength. Static and dynamic balance was assessed by the Berg Balance Scale (BBS), a 14-item scale on which each item is scored from 0 (cannot perform) to 4 (normal performance) [29]. The sum score ranges from 0 to 56, and a score below 45 indicates a high risk of falling. Usual walking speed was based on the time required to walk 4 meters at the participant's usual speed (4MWT) and was expressed in meters per second [30]. The participants could use a walking aid if needed. Leg muscle strength was assessed using the functional 30 s sit-to-stand test (STS), which reports the number of rises from a chair within 30 s [31].
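The scoring rules described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names and the example item scores are hypothetical and are not taken from the trial data or its analysis code.

```python
from typing import Sequence

BBS_ITEMS = 14             # number of items on the Berg Balance Scale
BBS_HIGH_RISK_BELOW = 45   # sum scores below this indicate a high risk of falling
WALK_DISTANCE_M = 4.0      # distance covered in the 4MWT, in meters


def bbs_sum_score(item_scores: Sequence[int]) -> int:
    """Sum the 14 BBS items, each scored 0 (cannot perform) to 4 (normal)."""
    if len(item_scores) != BBS_ITEMS:
        raise ValueError("BBS requires exactly 14 item scores")
    if any(score not in range(0, 5) for score in item_scores):
        raise ValueError("each BBS item must be an integer from 0 to 4")
    return sum(item_scores)


def high_fall_risk(sum_score: int) -> bool:
    """Apply the cut-off quoted in the text: below 45 means high risk."""
    return sum_score < BBS_HIGH_RISK_BELOW


def walking_speed(seconds: float) -> float:
    """Usual walking speed in meters per second over the 4 m course."""
    if seconds <= 0:
        raise ValueError("walking time must be positive")
    return WALK_DISTANCE_M / seconds


# Hypothetical participant (illustrative values, not study data).
items = [3, 3, 2, 3, 3, 3, 3, 2, 3, 3, 2, 3, 3, 3]
total = bbs_sum_score(items)
print(total, high_fall_risk(total), walking_speed(8.0))
```

The STS result needs no transformation, since it is simply the count of rises within 30 s.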

Falls self-efficacy was assessed by the Falls Efficacy Scale International (FES-I) [32]. This questionnaire measures concern about falling in 16 different activities in older adults and has been recommended for clinical practice as well as research [33]. Each activity has a four-point scale from 1 (not concerned) to 4 (very concerned), and the sum score ranges from 16 to 64.

Background variables like sex, age, falls history and health care services were collected at baseline. Mini Mental State Examination (MMSE) was performed at baseline to exclude participants with cognitive decline [34].

Statistical analyses

The statistical analyses were conducted using Stata/SE 14. Descriptive statistics on the study population are reported. Means and standard deviations (SD) were calculated for continuous data and percentages for categorical data. The distribution of SF-6D and EQ-5D scores over domains and levels is presented. Furthermore, for different baseline characteristics a threshold (median) value was calculated, and mean utility scores are reported for the groups below and above this threshold. Complete case analyses were conducted.

Bland–Altman plots illustrate the agreement between EQ-5D and SF-6D. Plots were drawn for absolute values and changes at different time points. In the Bland–Altman plots, a fitted regression line and limits of agreement were drawn. Outliers were investigated to check their characteristics, but were not excluded as they did not substantially affect the results. To study the associations between the domains of SF-6D and EQ-5D, and physical measures, Spearman rank correlations were calculated as most of the data were ordinal. The strength of correlations was interpreted according to Cohen, where 0.10 to 0.29 is weak, 0.30 to 0.49 is moderate, and 0.50 to 1.0 is strong [35].
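A minimal sketch of the quantities behind such a plot may help: the mean difference (bias), the limits of agreement, and the slope of the regression of the difference on the pairwise mean. It assumes the conventional limits of bias ± 1.96 SD, which the text does not specify, and it uses made-up utility values. The helper names and the label for correlations below 0.10 are likewise assumptions introduced for illustration.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def bland_altman(a: Sequence[float], b: Sequence[float]) -> Dict[str, float]:
    """Bias, 95% limits of agreement, and slope of (a - b) on the pair mean."""
    if len(a) != len(b) or len(a) < 3:
        raise ValueError("need two equally long series with at least 3 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2 for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    # Ordinary least-squares line: difference = intercept + slope * pair mean.
    m_bar = mean(means)
    sxx = sum((m - m_bar) ** 2 for m in means)
    if sxx == 0:
        raise ValueError("pair means are all identical; slope is undefined")
    sxy = sum((m - m_bar) * (d - bias) for m, d in zip(means, diffs))
    slope = sxy / sxx
    return {
        "bias": bias,
        "lower": bias - 1.96 * sd,   # assumed conventional 95% limits
        "upper": bias + 1.96 * sd,
        "slope": slope,
        "intercept": bias - slope * m_bar,
    }


def cohen_strength(rho: float) -> str:
    """Label a correlation using the thresholds quoted in the text."""
    r = abs(rho)
    if r >= 0.5:
        return "strong"
    if r >= 0.3:
        return "moderate"
    if r >= 0.1:
        return "weak"
    return "below weak"  # label for r < 0.10 is an assumption, not from the text


# Hypothetical utilities (not study data): EQ-5D spreads wider than SF-6D.
eq5d = [0.30, 0.50, 0.62, 0.75, 0.90]
sf6d = [0.45, 0.55, 0.62, 0.68, 0.75]
result = bland_altman(eq5d, sf6d)
print(result)
print(cohen_strength(0.71), cohen_strength(0.35))
```

A positive slope in this sketch corresponds to the pattern where the EQ-5D minus SF-6D difference grows with the mean score.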

To examine the responsiveness of SF-6D and EQ-5D in relation to measures of physical outcomes, elasticities were calculated from linear mixed regression models with individual-specific effects [36]. The elasticity of an outcome variable y with respect to a predictor variable x is defined so that it approximately equals the proportional change in y for a given proportional change in x. The elasticities were calculated at the mean level of x and can conveniently be interpreted as the percentage change in y in response to a one per cent change in x at this level. This removes the unit of measurement and makes responses in different regression models more illustrative and directly comparable.
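One way to write this definition down is shown here, assuming a linear mixed model with a participant-specific intercept. The exact model specification is not given in the text, so the symbols (the coefficients, the individual effect u_i, the error term and the elasticity symbol) are notation introduced here for illustration only.

```latex
\[
  y_{it} = \beta_0 + \beta_1 x_{it} + u_i + \varepsilon_{it},
  \qquad
  \hat{y}(\bar{x}) = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}
\]
\[
  \eta_{y,x}
  = \left.\frac{\partial y}{\partial x}\cdot\frac{x}{y}\right|_{x=\bar{x}}
  = \hat{\beta}_1\,\frac{\bar{x}}{\hat{y}(\bar{x})},
  \qquad
  \frac{\Delta y}{y} \approx \eta_{y,x}\,\frac{\Delta x}{x}
\]
```

Under this reading, an elasticity of 0.5 would mean that a 1% increase in x at its mean is associated with roughly a 0.5% increase in y.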

Results

Descriptive statistics

Eight hundred and sixty-five older adults receiving home care from the primary health care services in Norway were contacted and checked for eligibility. One hundred and sixty-seven were assessed at baseline and 12 were excluded, ten due to a score lower than 23 on MMSE and two due to ill-health. One hundred and fifty-five participants agreed to participate, gave informed consent and were tested at baseline (T0). At the end of the intervention at 3 months (T1), 138 completed the assessments. At follow-up at 6 months (T2), 136 participants completed the assessments. Missing data at the different time points were due to death, ill-health, declining to participate or other reasons. For EQ-5D, complete data were obtained for 155 at T0, 135 at T1 and 135 at T2. For SF-6D, complete data were obtained for 155 at T0, 136 at T1 and 133 at T2.

Table 1 shows baseline characteristics of the sample. The participants had a mean age of 82.7 (SD 6.7) and 73.3% were females. They all received home care services. Practical assistance (69.7%) and safety alarm service (75.5%) were most common. The mean number of falls in the previous 12 months was 2.7 (SD 3.7). In total, 36.1% had higher education with more than 12 years of education, and the mean score on MMSE was 27.4. The measures of physical function were generally low, with a mean of 39.1 points (SD 11.3) on BBS, a mean of 5.1 rises (SD 4.1) on STS, a mean of 0.6 m/s (SD 0.2) on 4MWT and a mean of 30.7 points (SD 9.8) on FES-I. The mean index score of SF-6D was 0.64 (SD 0.12) and the mean index score of EQ-5D was 0.62 (SD 0.23).

Table 1 Baseline characteristics of the sample N = 155

Tables 2 and 3 present the distribution of SF-6D and EQ-5D results at baseline. EQ-5D has a larger proportion of responses in the top category of each dimension compared to SF-6D. Very few responses are within level five of EQ-5D. The distributions of both SF-6D and EQ-5D scores appear to be reasonably spread across the scales.

Table 2 Distribution of SF-6D results at baseline (%) N = 155
Table 3 Distribution of EQ-5D results at baseline (%) N = 155

Table 4 presents a comparison of SF-6D and EQ-5D utility scores over selected groups of the sample. The table shows a larger spread of values in EQ-5D utility scores than in SF-6D utility scores between those with higher/lower age and better/worse physical function. The exception is the number of falls, where the two instruments are more similar. Higher age and better scores on physical measures are associated with higher utility scores of both SF-6D and EQ-5D. Having lower scores on physical measures is associated with relatively higher scores on SF-6D compared to EQ-5D, and vice versa.

Table 4 Comparison of SF-6D and EQ-5D utility scores over selected groups of baseline characteristics

More information on the flow of participants, on their characteristics at baseline, as well as the results of the RCT are presented elsewhere [5, 23].

Similarities and differences in SF-6D and EQ-5D

The index scores of SF-6D and EQ-5D are strongly correlated (0.71). This is also confirmed by the Bland–Altman plots in Figs. 1 and 2. Figure 1 shows the agreement of these two preference-based measures at the time points T0, T1 and T2. Many observations cluster around 0 difference between the instruments. Furthermore, the difference is proportional to the mean HRQOL scores, illustrated by the fitted regression line. A positive difference is associated with a higher mean score, and a negative difference is associated with a lower mean score. Participants with higher mean HRQOL have higher scores on EQ-5D compared to SF-6D, and analogously participants with lower mean HRQOL have generally lower scores on EQ-5D compared to SF-6D. This relationship becomes more evident from T0 to T2. Figure 2 illustrates the agreement on changes in HRQOL from T0 to T1, T1 to T2 and T0 to T2. A similar pattern can be observed as in Fig. 1. Participants with a positive change in mean HRQOL have a generally stronger positive change in EQ-5D than in SF-6D, while for the participants with negative changes in mean HRQOL, the negative change is generally stronger in EQ-5D than in SF-6D.

Fig. 1 Bland–Altman plot on the agreement of EQ-5D and SF-6D at time points T0, T1 and T2

Fig. 2 Bland–Altman plot on the agreement of changes in EQ-5D and SF-6D from T0 to T1, T1 to T2 and T0 to T2

Associations between subdomains of SF-6D and EQ-5D and physical measures

Table 5 presents correlations between the different subdomains of SF-6D and EQ-5D and physical measures. First, three items from EQ-5D are strongly correlated with four items from SF-6D: EQ-5D Self-care with SF Physical Function (0.65), EQ-5D Pain/Discomfort with SF Bodily Pain (0.71), and EQ-5D Anxiety/Depression with both SF Role Participation (0.61) and SF Mental Health (0.71). Weak to moderate correlations were shown between SF Social Functioning and SF Vitality and all domains of EQ-5D. Similarly, only weak to moderate correlations were shown between EQ-5D Mobility and all domains of SF-6D.

Table 5 Correlations between subdomains of SF-6D, subdomains of EQ-5D and physical outcomes

BBS is the only physical measure which is strongly correlated with one of the HRQOL domains, EQ-5D Mobility (0.54). On EQ-5D, moderate correlations were shown between all physical measures and Mobility, Self-care and Usual Activities. On SF-6D, moderate correlations were shown between BBS, STS and FES-I and Physical Functioning and Role Participation. In addition, BBS was moderately correlated on SF-6D with Bodily Pain and FES-I with Mental Health.

Responsiveness assessed by elasticities

Table 6 presents elasticities from linear mixed regressions. Each regression includes a physical measure as covariate and individual-specific effects. Changes in physical outcomes have larger impacts on changes of HRQOL measured by EQ-5D compared to SF-6D. Elasticities can be interpreted for small changes in the dependent variable. For instance, a 10% increase in BBS score is expected to increase EQ-5D by 5.4% (p < 0.001) and SF-6D by 1.8% (p < 0.001). A 10% improvement of FES-I is expected to give an improvement of 2.4% (p < 0.001) on EQ-5D and 1.3% (p < 0.001) on SF-6D. For STS and 4MWT, an increased score is expected to lead to significant changes (p < 0.05) only in EQ-5D.
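To give a feel for the size of these effects, the BBS example can be converted into absolute terms. The sketch below takes the elasticities implied by the sentence above (5.4% and 1.8% per 10% change in BBS) and, purely for illustration, applies them to the baseline means reported for the sample. This is an assumption: the elasticities were evaluated at the mean level used in the regression models, which need not coincide with the baseline means, so the output is a rough approximation only.

```python
# Elasticities implied by the text: a 10% rise in BBS -> 5.4% (EQ-5D), 1.8% (SF-6D).
ELASTICITY_BBS = {"EQ-5D": 0.54, "SF-6D": 0.18}

# Baseline means reported for the sample; using them here is an illustrative
# assumption, since the elasticities refer to the mean level in the models.
BASELINE_MEAN = {"BBS": 39.1, "EQ-5D": 0.62, "SF-6D": 0.64}


def expected_change(elasticity: float, pct_change_x: float, mean_y: float):
    """Return (percentage change in y, approximate absolute change in y)."""
    pct_change_y = elasticity * pct_change_x
    return pct_change_y, mean_y * pct_change_y / 100.0


pct_x = 10.0  # a 10% increase in BBS
bbs_points = BASELINE_MEAN["BBS"] * pct_x / 100.0  # about 3.9 BBS points

changes = {
    name: expected_change(eta, pct_x, BASELINE_MEAN[name])
    for name, eta in ELASTICITY_BBS.items()
}

print(f"BBS increase: {bbs_points:.2f} points")
for name, (pct_y, abs_y) in changes.items():
    print(f"{name}: {pct_y:.1f}% change, about {abs_y:.3f} utility points")
```

Under these assumptions, an improvement of roughly 4 BBS points corresponds to about 0.03 on the EQ-5D index and about 0.01 on the SF-6D index.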

Table 6 Elasticities from linear mixed regressions with individual-specific effects

Discussion

This study compares two widely used metrics of HRQOL in the growing population of older fallers living at home. The results show that SF-6D and EQ-5D are strongly correlated in index scores and some subdomains. There are, however, some differences in the utilities’ agreement and in their responsiveness to changes in physical function. EQ-5D gave generally higher scores than SF-6D at a higher mean HRQOL and/or when physical function was better. In contrast, SF-6D provided relatively higher scores than EQ-5D when mean HRQOL was lower and/or when physical function was poorer. EQ-5D proved to be more responsive to changes in physical function than SF-6D in older home care recipients who have experienced falls.

There seems to be a high level of agreement between SF-6D and EQ-5D, both in absolute values and in changes over time. There are also some differences. EQ-5D appears to be more sensitive to changes than SF-6D. Similar results have been presented previously. A study comparing the two instruments in mental health patients showed that EQ-5D resulted in larger health gains, in particular in the subgroup with higher severity of mental health problems [19]. Another study showed that, even though SF-6D had better distributional properties, this did not improve its sensitivity to change [15]. EQ-5D has fewer health states, but this does not seem to limit its ability to measure utilities [13]. Hence, the range and variability of the two outcomes, where SF-6D has a smaller range and lower variability, could contribute to its lower sensitivity [11]. The sensitivity of EQ-5D is even more evident at T2 (6 months) than at T0 (baseline). This result could be due to the change in HRQOL over the 6-month period, possibly caused by the intervention.

Another reason for limitations in the agreement could be differences in the domains of HRQOL covered by the instruments [15]. HRQOL is a complex concept and includes physical, mental and social domains of health [37]. SF Physical Function was strongly correlated with EQ-5D Self-Care, and SF Role Participation was strongly correlated with EQ-5D Anxiety/Depression. SF Bodily Pain was strongly correlated with EQ-5D Pain/Discomfort and SF Mental Health with EQ-5D Anxiety/Depression. Interestingly, SF Social Functioning and SF Vitality were only weakly to moderately correlated with EQ-5D subscales. This could be due to the discrepancy in domains included. Elements other than physical health, such as mental health, might be under-represented in EQ-5D compared to SF-6D [38]. A previous study showed that SF-6D was more efficient at detecting external indicators of health status, for instance longstanding illness or disability, compared to EQ-5D [14]. SF-6D can tap into broader aspects of HRQOL, such as role participation and social functioning, which could be the reason why it is more sensitive in complex health conditions. Moreover, a higher number of response levels on each of the domains can result in a larger descriptive system with possibly greater sensitivity to the external health indicators. Capturing several dimensions of health can be important when evaluating interventions for older home care recipients. This should be considered when selecting HRQOL measurement tools for this population.

The instruments’ responsiveness to changes seems to be influenced by the type of intervention being evaluated [15]. In the present study, a relatively higher responsiveness to changes in physical function was shown in the scores of EQ-5D compared to SF-6D. Positive changes in HRQOL were associated with improvement of balance, measured by Berg Balance Scale, and falls self-efficacy, measured by Falls Efficacy Scale International. Results were illustrated using elasticities, another representation of regression coefficients enabling direct comparison of the estimates. The findings emphasise the importance of picking an instrument that is sensitive to the elements that the intervention is aiming to change. When evaluating a falls prevention exercise intervention, responsiveness to underlying physical changes could therefore be of importance. However, if the falls prevention intervention had also included actions targeting other dimensions, such as social functioning or vitality, SF-6D could be more responsive. The HRQOL outcome selected will influence the economic evaluations conducted and, in turn, decision-making in public health policy.

Another element to consider when selecting HRQOL instruments is the older adults’ health status and level of function at baseline. EQ-5D is thought to be sensitive in patient groups with more severe health states at baseline and less sensitive in patient groups with milder health states at baseline, and the opposite applies to SF-6D [21]. In comparison with normative data on SF-6D and EQ-5D, the participants in the present study had a lower level of self-perceived HRQOL [39, 40]. The older home care recipients had a mean index score of 0.62 on EQ-5D. In a study including Danish population norms, the index score was 0.85 in males and 0.82 in females aged 70–79 [39]. In SF-6D, the mean index score in the present study was 0.64. Norms from a sample of the British population demonstrated a mean index score ranging from 0.77 in the age-group 70–74, to 0.69 in the age-group above 85 [40]. While the Danish and British populations of older adults are similar to the Norwegian, there might be some differences in socioeconomic status influencing the general health status [41]. Nevertheless, the low level of HRQOL in this population of home care recipients and fallers emphasises the need for effective health services in primary care to maintain or improve their HRQOL, contributing to healthy ageing.

In addition to health policy, this study has implications for patient management in primary care. Measuring HRQOL in older adults is increasingly seen as important in evaluations and there is a lack of tools that can be applied in clinical practice [4, 42, 43]. In the present study, both measurements were conducted as interviews, as recommended for this population. This achieved high completion rates, but interviews are at the same time more time-consuming and costly [12]. Previous research including older adults has suggested that EQ-5D might be sufficient when brevity is required and the health changes are expected to be substantial, while SF-36 is more beneficial when several details are required and the health changes expected are less substantial [12, 44]. Although general measures of HRQOL provide relevant information in the group of older adults, there might be some important age-specific factors that are missing, for instance sensory abilities and autonomy [42]. Instruments designed specifically for older adults could therefore be an important addition in clinical practice.

This study has both strengths and limitations. Owing to thorough follow-up, few data were missing for either SF-6D or EQ-5D. The participants were recruited from six municipalities to a falls prevention exercise intervention, and the results might not be representative of the general population of older home care recipients. A low percentage of males were included, but this is typical for the population of older home care recipients. The preference weights of SF-6D and EQ-5D have been developed specifically for the United Kingdom, and the Norwegian population could have different preferences. The two scales also differ in their range. To deal with this issue, elasticities were calculated, where responsiveness is expressed as comparable changes in percentages. Few index scores are in the lowest end of the distribution, where EQ-5D has health states regarded as worse than those SF-6D is able to generate. Finally, responsiveness was only explored in relation to important physical measures collected in the randomised controlled trial. Other outcomes might also be relevant for this population.

Conclusion

SF-6D and EQ-5D are strongly correlated, but there are some differences in their agreement, aspects of HRQOL covered and responsiveness to changes. Older adults with a generally higher level of HRQOL and/or better physical function achieved a relatively higher score on EQ-5D, and older adults with a generally lower level of HRQOL and/or poorer physical function achieved a relatively higher score on SF-6D. EQ-5D was more responsive to changes in physical function compared to SF-6D in older home care recipients who had experienced falls. This study shows that selecting a HRQOL instrument for evaluating an intervention may depend on the characteristics of the intervention and the studied population. The choice of instrument can affect the outcome of evaluations in the group of frail older adults and consequently health policy for this increasing population.