Introduction

Adverse health-related outcomes such as frailty, sarcopenia, falls, and fractures are well known to increase with age in older people [1, 2]. These outcomes are critical risk factors for functional disability in older people who live independently in the community. Muscle strength and physical performance tests are commonly used in clinical and community settings to evaluate the risk of frailty, falling, and sarcopenia [3,4,5]. Physical exercise interventions are carried out to help prevent frailty, falling, and sarcopenia, and muscle strength and physical performance tests are used to verify their effectiveness [6]. Accurate evaluation therefore requires estimates of the measurement errors of these tests in community-dwelling older people. However, these measurement errors have not been clearly defined, because previous studies had relatively small sample sizes and did not include older people with high functional capacities [7, 8]. These issues must be addressed to clarify the absolute reliability of muscle strength and physical performance tests and to establish accurate evaluations. Verifying the absolute reliability and determining reasonable measurement errors for these tests could, therefore, contribute to establishing criteria for judging the effectiveness of physical exercise interventions and the risk of functional decline. In particular, such criteria could be a useful index for evaluating intervention effects and the risk of functional decline at the individual rather than the population level.

The purpose of the present study was to clarify the absolute reliability of muscle strength and physical performance tests for community-dwelling older people with high functional capacities.

Materials and methods

Participants

We analyzed data obtained from 718 community-dwelling older volunteers recruited from Sagamihara city in Kanagawa Prefecture, Japan, using advertisements in newspapers and community newsletters. The inclusion criteria were as follows: age 65 years or older, and ability to perform activities of daily living (ADL) independently. Participants who could perform ADL independently were defined as those without a certified care level under the long-term care insurance (LTCI) system, and this was confirmed in interviews conducted by experienced researchers at recruitment. The LTCI system is a public insurance system established by the Japanese government that covers all Japanese individuals aged 65 years and older. An individual’s certified care level is determined according to uniform criteria under the LTCI system, and is assessed by a trained investigator and a primary physician. A certified care level means that an older person requires support to perform ADL [9]. Further, participants suspected of having dementia based on these interviews were excluded.

This study was approved by the Institutional Review Board of the School of Allied Health Sciences at Kitasato University (approval number 2016-G021B), and written informed consent was obtained from all participants.

Muscle strength and physical performance tests

We conducted two different isometric muscle strength tests and three different physical performance tests, as described below. For the isometric muscle strength tests, we administered a grip strength test and a knee extension strength test. For the physical performance tests, we administered the five-times chair stand test (FCST) and the timed up and go test (TUG), and measured 5-m walking time at a comfortable pace. These tests have been shown to be useful for predicting adverse health-related events such as falling and functional disabilities in community-dwelling older people. To verify the test–retest reliability and the measurement errors for all tests, all measurements were carried out twice by the same experienced physical therapist.

1. Grip strength

Grip strength, which is a simple and inexpensive isometric muscle strength test, is commonly used as a diagnostic criterion for sarcopenia and frailty [5, 10, 11]. Further, a meta-analysis found grip strength to be a predictive index for adverse outcomes in older people [12]. In this study, maximum voluntary grip strength was measured using a Smedley-type dynamometer (T.K.K.5401, TAKEI Scientific Instruments Co., Ltd., Niigata, Japan). All measurements were performed on the dominant hand with the elbow joint extended at the side of the trunk while the participant was in a standing position. The measurement precision was 0.1 kgf.

2. Knee extension strength

Knee extension strength is used clinically to assess the isometric strength of the quadriceps muscle and has been reported to be associated with mortality and functional decline in older people [13, 14]. In this study, maximum voluntary knee extension strength was measured in the right leg using a handheld dynamometer (μ-Tas F-1; Anima Inc., Tokyo, Japan). All measurements were performed while the participants were in a seated position with the knee joint in 90° flexion. The sensor of the handheld dynamometer was secured on the lower part of the crus (top of the medial and lateral malleoli) using a belt. The measurement precision was 0.1 kgf.

3. Five-times chair stand test [15]

The FCST is clinically used to assess lower extremity function and has been reported to be associated with knee extension strength [16]. Recently, the FCST has also been recommended as a clinical parameter of muscle strength for the identification of sarcopenia [5]. For the FCST, the participants sat on a standard armless chair (height 42 cm) with their feet apart at shoulder width and their arms crossed in front of their chest. They were then instructed to stand up from the chair and sit back down five times as quickly as possible. The time required to complete the task was measured using a digital stopwatch (ALBA W072; Seiko Watch Corporation, Tokyo, Japan). The measurement precision was 0.01 s.

4. 5-m walking time

Walking time at a comfortable pace is a simple test for assessing mobility and walking ability, and is commonly used as a diagnostic criterion for sarcopenia and frailty [5, 10, 11]. Further, walking time, compared with muscle strength and other physical performance tests such as balance and lower extremity function, has been suggested to be a powerful predictor of impaired ADL in older people [17]. In this study, the total length of the walkway was set at 9 m, including acceleration and deceleration zones at the start and end of the walkway. All participants were instructed to walk on the walkway at their usual walking pace without any assistance. The time required to walk the 5-m length in the middle of the walkway was measured using a digital stopwatch (ALBA W072; Seiko Watch Corporation). The measurement precision was 0.01 s.

5. Timed up and go test [18]

The TUG, which was developed as a test of functional mobility for older people, is also used as a screening tool for the assessment of frailty [11] and fall risk in older people [19]. For the TUG, the participants were instructed to rise from a standard armless chair (height 42 cm), walk 3 m, turn around, walk back to the chair, and sit back down as quickly as possible. The time required to complete this task was measured using a digital stopwatch (ALBA W072; Seiko Watch Corporation). The measurement precision was 0.01 s.

Basic characteristics

The participants’ age, height, body weight, body mass index, and Tokyo Metropolitan Institute of Gerontology Index of Competence (TMIG-IC) scores were recorded. The TMIG-IC, which is used to assess functional capacities higher than ADL in community-dwelling older people, consists of the following three sub-items: instrumental ADL (IADL), intellectual activity, and social roles [20]. TMIG-IC total scores range from 0 to 13 (the sub-scores for IADL, intellectual activity, and social roles range from 0 to 5, 0 to 4, and 0 to 4, respectively), with higher scores indicating greater functional capacity.

Statistical analysis

With respect to the reliability of the isometric muscle strength and physical performance tests, relative and absolute reliability were verified based on the two measurement values for each test. Intraclass correlation coefficients (ICCs) were calculated to investigate relative reliability [21]. For the purposes of the present study, the following ICC criteria were used: 0.90–0.99 indicated high reliability, 0.80–0.89 indicated good reliability, 0.70–0.79 indicated fair reliability, and ≤ 0.69 indicated poor reliability [22]. To investigate absolute reliability, systematic errors were examined and the margin of error was calculated. For systematic errors, Bland–Altman analysis was performed, and the presence or absence of fixed and proportional biases was investigated [23]. The presence of a fixed bias was assessed using a one-sample t test [23] and effect size (Cohen’s d) [24], and that of a proportional bias was assessed based on the relationship between the differences and the means of the two measurement values using Pearson’s correlation coefficient [25]. A fixed bias was considered present when the probability from the one-sample t test was < 0.05 and the effect size (d) was > 0.5. A proportional bias was considered present when Pearson’s correlation coefficient (r) was > 0.3. These criteria were used in accordance with a previous report [24]. If no fixed or proportional biases were found, further analysis was performed to determine the margin of error.
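The bias checks described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis code used in the study (which was run in R). The function names are hypothetical, Cohen's d is assumed to be the absolute mean difference divided by the SD of the differences, the correlation criterion is applied to the absolute value of r, and the t-test probability is obtained by numerically integrating the Student's t density so that no external library is needed.

```python
import math
from statistics import mean, stdev


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t, via Simpson integration of the density.

    Intended for moderate |t|; `steps` must be even.
    """
    b = abs(t)
    if b == 0.0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = b / steps
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman_bias(trial1, trial2):
    """Fixed/proportional bias flags using the thresholds stated in the text.

    Assumes the differences are not all identical (SD of differences > 0).
    """
    diffs = [a - b for a, b in zip(trial1, trial2)]
    means = [(a + b) / 2 for a, b in zip(trial1, trial2)]
    n = len(diffs)
    d_mean, d_sd = mean(diffs), stdev(diffs)
    t = d_mean / (d_sd / math.sqrt(n))   # one-sample t statistic against zero
    p = t_two_sided_p(t, n - 1)
    d = abs(d_mean) / d_sd               # assumed form of Cohen's d
    r = pearson_r(means, diffs)
    return {
        "mean_diff": d_mean,
        "limits_of_agreement": (d_mean - 1.96 * d_sd, d_mean + 1.96 * d_sd),
        "p": p,
        "d": d,
        "r": r,
        "fixed_bias": p < 0.05 and d > 0.5,
        "proportional_bias": abs(r) > 0.3,  # assumption: magnitude of r
    }


# Hypothetical chair-stand times (s): second trial faster, more so for slow performers.
example = bland_altman_bias([6, 8, 10, 12, 14, 16],
                            [5.8, 7.4, 9.0, 10.6, 12.2, 13.8])
```

In this made-up example both flags come out true, which mirrors the kind of pattern a learning effect would produce. In practice a statistics package would normally supply the t-test probability.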

Regarding the margin of error, the minimal detectable change (MDC) was calculated using the following formula: MDC = 1.96 × √2 × SD × √(1 − ICC), where SD is the standard deviation [26]. In addition, %MDC was calculated by dividing the MDC by the mean of the two measurement values for each test and expressing the result as a percentage. A %MDC of < 30% was considered acceptable, and a %MDC of < 10% was considered excellent [27].
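As a worked illustration of the MDC formula, the following Python sketch computes an ICC, the MDC, and the %MDC from two trials. Several details are assumptions because the text does not specify them: the ICC is computed as the one-way random-effects form ICC(1,1), the SD is taken as the SD of all measurements pooled across both trials, and the function names and sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def icc_oneway(trial1, trial2):
    """ICC(1,1) from a one-way random-effects ANOVA with k = 2 trials (assumed form)."""
    n, k = len(trial1), 2
    subject_means = [(a + b) / 2 for a, b in zip(trial1, trial2)]
    grand = mean(subject_means)
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(trial1, trial2, subject_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def mdc95(sd, icc):
    """MDC = 1.96 x sqrt(2) x SD x sqrt(1 - ICC), as given in the text."""
    return 1.96 * math.sqrt(2) * sd * math.sqrt(1 - icc)


def percent_mdc(mdc, grand_mean):
    """MDC expressed as a percentage of the mean measurement value."""
    return mdc / grand_mean * 100.0


def reliability_summary(trial1, trial2):
    """ICC, MDC, and %MDC for one test measured twice per participant."""
    pooled = list(trial1) + list(trial2)  # assumption: SD pooled over both trials
    icc = icc_oneway(trial1, trial2)
    mdc = mdc95(stdev(pooled), icc)
    return {"icc": icc, "mdc": mdc, "percent_mdc": percent_mdc(mdc, mean(pooled))}


# Hypothetical grip strength values (kgf) for four participants.
summary = reliability_summary([30.0, 25.0, 35.0, 28.0], [31.0, 24.0, 35.5, 27.5])
```

For instance, with a hypothetical SD of 5.0 kgf and an ICC of 0.95, `mdc95(5.0, 0.95)` gives an MDC of about 3.1 kgf. The formula also shows why the MDC shrinks as the ICC approaches 1.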

Statistical analysis was performed using the R programming language and environment (R version 3.2.2) [28].

Results

The participants’ demographic characteristics are presented in Table 1. The mean TMIG-IC total scores were 11.8 ± 1.6 in men and 12.5 ± 1.2 in women, and 37.6% of men and 46.6% of women showed the best functional capacity (i.e., highest possible TMIG-IC score). Further, 81.3% of men and 97.3% of women had the highest possible IADL sub-score (five points) and could perform IADL independently.

Table 1 Characteristics of the participants in this study

To assess systematic bias, Bland–Altman analysis was conducted. Bland–Altman plots of the muscle strength and physical performance tests are shown in Fig. 1. The presence or absence of fixed and proportional biases is shown in Table 2. Fixed and proportional biases were found in the FCST, but not in any of the other tests. The ICCs and margins of error for the muscle strength and physical performance tests are shown in Table 3. Furthermore, we stratified the data by gender and age (< 75 and ≥ 75 years) and analyzed the presence or absence of fixed and proportional biases. In all gender and age sub-groups, no fixed or proportional biases were found except in the FCST. The ICCs of the total sample for grip and knee extension strength indicated high reliability, while those for the 5-m walking time and TUG indicated good reliability. The MDC and %MDC of the FCST are not shown because of the presence of fixed and proportional biases. The %MDCs of grip strength, 5-m walking time, and the TUG were all < 10%, and the %MDCs of the sub-groups stratified by gender and age were also all < 10% (Table 3). By contrast, that of knee extension strength was 12%, and the %MDCs of the sub-groups were also ≥ 10%, except for that in men (Table 3).

Fig. 1 Bland–Altman plots representing the differences between repeated measurements of muscle strength and physical performance. Dotted lines indicate the mean difference between repeated measures. Bold lines indicate the 95% limits of agreement

Table 2 Systematic errors for the muscle strength and physical performance tests
Table 3 Relative and absolute reliability for the muscle strength and performance tests

Discussion

In the present study, we analyzed data obtained from 718 community-dwelling older people who could perform ADL independently and had high functional capacity, and clarified the absolute reliability of muscle strength and physical performance tests that are widely used in the clinical setting. Knowing the absolute reliability of these tests in older people with high functional capacities is essential for healthcare professionals to evaluate precisely the risk of adverse health-related events such as frailty and falling. The MDCs and %MDCs determined in the present study could be a useful index for interpreting muscle strength and physical performance tests in community-dwelling older people.

With respect to muscle strength and physical performance tests, relative reliability was investigated using ICCs, which are commonly used to determine test–retest reliability [21], and systematic errors were investigated using Bland–Altman analysis. The ICCs for all tests were found to be better than a fair level [22]. Furthermore, no systematic errors were found in any tests except for the FCST. These findings did not differ in the analysis stratified by gender and age. Regarding the systematic bias of the FCST, both fixed and proportional biases were found; the first measurement value was larger than the second, and the difference between the first and second measurement values increased as the mean values increased. That is, better performances tended to be seen for the FCST in the second trial, and this tendency was prominent in individuals who had a worse performance in the first trial; this finding appeared to be the result of a learning effect [29]. However, the extent of the learning effect could not be clarified from the data, and we are not aware of any method to offset this type of learning effect. Further study will be necessary to address this issue. Consequently, we could not estimate the MDC or clarify the margin of error for the FCST in this study. In a recent consensus published by the European Working Group on Sarcopenia in Older People regarding the diagnosis of sarcopenia, 15 s or more in the FCST was defined as low muscle strength [5]. Although the FCST is known to be a useful indicator of low muscle strength, careful attention is necessary when assessing changes over time because of the critical systematic error in measurement.

Regarding the margin of error, the %MDCs of grip strength, 5-m walking time, and the TUG were considered excellent. On the other hand, that of knee extension strength was only considered acceptable. Grip strength, 5-m walking time, and the TUG were thus able to detect smaller changes than knee extension strength. To detect changes in function and performance over time in older people, grip strength, 5-m walking time, and the TUG are, therefore, considered to be useful.

Previous studies have investigated MDC and %MDC in older people and in people with conditions such as Parkinson’s disease. With respect to muscle strength, the MDC for grip strength has been reported to be 5.2 kgf in patients who have undergone cardiac rehabilitation [30]. Furthermore, the %MDC of knee extension strength has been reported to be 50% in older nursing home residents [7]. Regarding physical performance tests, the %MDC of gait speed has been reported to be about 20% in older people who can walk independently; however, that study utilized a relatively small sample (n = 52). Furthermore, the %MDC of gait speed in people recovering from stroke has been reported to be about 60% [31]. In addition, the MDCs of the TUG have been reported to be 3.5 s in people with Parkinson’s disease [32] and 4.1 s in people with Alzheimer’s disease [33]. Thus, the MDCs and %MDCs reported in previous studies were larger than those in the present study, which included community-dwelling older people with high functional capacities. These findings suggest that the absolute reliability of muscle strength and physical performance tests is greatly influenced by factors such as age, functional level, and disease. Therefore, the absolute reliability values obtained in the present study appear to be a useful index for community-dwelling older people with high functional capacities.

This study had several limitations. First, grip strength was measured using a Smedley-type dynamometer, which is widely used clinically in Japan [34]. However, a previous study reported that grip strength measurement values differed between Jamar- and Smedley-type dynamometers [35]. Therefore, whether the MDC and %MDC for grip strength estimated using a Smedley-type dynamometer are identical to those assessed using a Jamar-type dynamometer remains unclear. Second, the measurement of knee extension strength was performed without body stabilization of the abdomen and pelvis because it was simpler and more clinically practical. Previous research has suggested that body stabilization in the measurement of knee extension strength affects muscle strength [36]. In this study, a large measurement error was found in knee extension strength compared with grip strength; however, measurement errors for knee extension strength with body stabilization were not assessed. Therefore, the measurement errors for knee extension strength estimated in this study may be limited to measurement values taken without body stabilization. Third, all measurements were carried out in two trials in this study, and absolute reliability was verified based on these data. The number of trials was set considering previous studies and clinical applications. For instance, many studies set the number of trials from one to three for measurements of grip strength and the TUG [34, 37]. Furthermore, applying additional trials may be difficult in clinical settings because of patient fatigue and burden, and the restricted amount of time available for treatment and assessment. Therefore, the absolute reliability established by this study may be limited to tests in which measurements are carried out twice.

In conclusion, the present study clarified the absolute reliability of muscle strength and physical performance tests commonly carried out in the clinical setting. The resulting values could serve as reference values for measurement errors when detecting changes in muscle strength and physical performance over time at the individual level in community-dwelling older people with high functional capacities. Because of critical systematic errors, the FCST was suggested to be inappropriate as an outcome measure to detect changes over time in lower extremity function. On the other hand, the measurement errors for grip strength, 5-m walking time, and the TUG were found to be < 10%. Thus, these tests were suggested to be useful indices to detect changes over time in muscle strength and physical performance in community-dwelling older people. Clinically, the findings suggest that changes of ≥ 10% detected in these tests may be interpreted as true changes in muscle strength or physical performance at the individual level. However, knee extension strength showed relatively large measurement errors, thereby suggesting that its sensitivity for detecting changes over time might be more limited than that of grip strength, 5-m walking time, and the TUG.
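The individual-level interpretation above amounts to a simple threshold check, sketched here in Python. This is an illustration under stated assumptions rather than a clinical tool: the function name is hypothetical, the percentage change is assumed to be expressed relative to the baseline value, and the sample values are made up.

```python
def exceeds_percent_mdc(baseline, follow_up, percent_mdc=10.0):
    """Return True when the relative change from baseline is at least percent_mdc.

    Assumption: change is expressed as a percentage of the baseline value.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    percent_change = abs(follow_up - baseline) / baseline * 100.0
    return percent_change >= percent_mdc


# Hypothetical grip strength (kgf) before and after an exercise intervention.
improved = exceeds_percent_mdc(25.0, 28.0)      # 12% change
within_error = exceeds_percent_mdc(25.0, 26.0)  # 4% change
```

Here a rise from 25.0 to 28.0 kgf exceeds the 10% threshold, whereas a rise to 26.0 kgf stays within the range attributable to measurement error.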