Introduction

Fragility fracture risk increases dramatically with advancing age, but bone mass does not have a comparable decline [1]. It is therefore apparent that something other than bone loss contributes to the age-related increase in fracture risk. A major contributor is sarcopenia with resultant impaired physical performance leading to increased falls and hip fracture risk [2]. As such, interest in sarcopenia within the osteoporosis field has dramatically increased in recent years.

Sarcopenia was first described as an age-related decline in lean mass that was associated with reduced mobility and independence [3, 4]. Initially, sarcopenia definitions were based solely on muscle mass measurements [5–7]. More recently, it has become clear that physical and muscle function testing and muscle imaging parameters that assess muscle quality, e.g., muscle fat infiltration, better predict outcomes in sarcopenic individuals than lean mass measurement alone [8–15]. Consequently, recent consensus sarcopenia definitions include both muscle mass and function assessment [7, 15–18]. As such, reproducible methods to assess both muscle mass and function are needed.

Numerous techniques exist to assess physical and muscle function [18, 19]. Physical function tests such as gait speed measure overall physical function and are generally characterized by complex movements that require a high degree of coordination of different muscle groups, low intensity (in terms of using maximal muscle force or power), and also rely heavily on other body organs such as the eyes, the vestibular system, or proprioception. In contrast, muscle function tests, e.g., maximal grip strength, focus on single muscle groups or extremities and are of high intensity (power or force). It is possible that tests combining these attributes, i.e., requiring complex and coordinated movements of high intensity, are superior to traditional physical function tests in assessing the effect of sarcopenia interventions on adverse health outcomes such as falls.

These traditional tests include grip strength, gait speed, timed-up and go, chair rise, and the short physical performance battery (SPPB) [9, 20–22], an assessment that combines gait speed, repeated chair rise time, and balance assessment [21, 23–27]. Despite widespread use, these existing functional tests have limitations. For example, some only measure one specific component of muscle function in a particular muscle group, as is the case for grip strength. Others use dichotomous (yes/no) determinations, for example, the tandem stance test. Importantly, many traditional tests (e.g., chair rise, gait speed, and timed-up and go) include some human subjectivity, potentially contributing error to the measured time. Finally, substantial changes in function are required before a change can be detected. The aforementioned factors might decrease test reliability and sensitivity to detect change over time. Availability of highly sensitive and reproducible tools is particularly important for studies that examine potential therapeutic interventions, as such tools can reduce sample size, study duration, and cost; they could also be used in clinical monitoring of an individual patient. Indeed, it is conceivable that medications in development for sarcopenia have failed due to measurement capabilities; specifically, agents that improve muscle mass have not always documented muscle function improvements [28–30]. This observation possibly reflects inability of traditional methods to detect small functional changes.

As such, we, and others, have become interested in using jumping mechanography (JM) as a tool to assess muscle function in populations at risk for muscle impairment and decreased bone strength [31–34]. For the JM evaluation, countermovement jumps are performed on a force platform that calculates jump power and height [27, 31, 33, 35, 36]. JM combines the components of physical and muscle testing by evaluating a high intensity, complex physical task that requires a high degree of muscle force and power and also necessitates the use of other systems (e.g., vestibular, neural, vision) that are required for activities of daily living. While one might be concerned about performing jumping tests in older adults, JM is safely performed, even in very elderly individuals [27, 31, 35]. Existing data suggest that JM has good reproducibility with no significant learning effect over time [35, 37, 38]. However, available studies included healthy adults and only a small number of individuals aged 70 years and older. Keeping in mind that performance variance is greater in older adults than in younger adults and that treatment interventions for sarcopenia will require muscle function assessment in older adults, it is important to demonstrate reproducibility and sensitivity to change in this age group. To this end, the purpose of this study was to compare JM reproducibility in older adults with that of other commonly used physical and muscle function tests.

Methods

Participants

Independently living adults aged 70 years or older, residing either in the community or retirement facilities, were invited to participate. The study population has been described previously [32]. Briefly, study inclusion criteria required the ability to stand without assistance, absence of clinically significant acute disease, and ability to sign informed consent. Potential participants were excluded if they had sustained a prior fragility fracture and if dual-energy x-ray absorptiometry (DXA) measured bone mineral density (BMD) T-score was −3.5 or worse. This study was reviewed and approved by the University of Wisconsin-Madison Institutional Review Board, and was conducted per Federal regulations.

Study design

Potential participants were evaluated at the University of Wisconsin-Madison Osteoporosis Clinical Research Center during a screening visit. At that time, following informed consent, BMD measurement and vertebral fracture assessment (VFA) were performed. Volunteers meeting entry criteria noted above underwent muscle/functional testing at the screening visit. They returned for a baseline evaluation 1–3 weeks later (mean 18 days) with subsequent visits 2 weeks and 3 months after baseline.

Physical and muscle function tests

As described above, muscle function tests were conducted at four visits: screening, baseline, 2 weeks, and 3 months post baseline. A window of 2 to 3 weeks was allowed between screening and baseline. Testing included the SPPB as a measure of physical function and JM and grip strength as measures of muscle function. Grip strength was performed at only baseline and 3-month time points as this measure is documented to have high reproducibility [39]. Consequently, acquisition of additional data in this regard was felt to have only limited value. A standardized testing order was followed; JM was always performed first, followed by grip strength, when conducted, then the SPPB. Research study coordinators conducted all tests and read the same scripted instructions to each subject for every test at all visits. One examiner conducted all jump testing and another administered all SPPB and grip strength evaluations.

Subjects performed two-leg countermovement jumps on a Leonardo force platform (Novotec, Pforzheim, Germany). Jumps were recorded and analyzed utilizing Leonardo software version 4.2. Each test session required subjects to perform countermovement jumps as previously described; they were required to complete two successful practice jumps, and the next three technically adequate jumps were recorded [31]. The jump achieving the greatest height from this set of three was used for this study analysis. Specific jump power (W/kg) and maximal jump height (cm) variables are reported.

Components of SPPB were completed in the standard order of chair rise, three tandem balance tests, and a 4-m walk [23]. This test generates a cumulative score based on the performance of the 3 components with a maximum achievable score of 12. For purposes of these data, we utilized total SPPB score and the performance time of the chair rise (seconds) and 4-m gait speed (m/s).

Grip strength was measured using a hand-held dynamometer (Jamar; Bolingbrook, IL) at the baseline and 3-month visits using a previously described technique [39]. Subjects performed the test standing, using the non-dominant hand. Three trials were performed at each exam with subjects resting for 10–20 s between trials. When trial results differed by greater than 3 kg, a fourth trial was obtained and maximal achieved grip strength (kg) was used for analyses.
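The trial rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's software: the function name and parameters are hypothetical, and it assumes that "differed by greater than 3 kg" refers to the range (maximum minus minimum) of the three initial trials.

```python
def max_grip_strength(trials_kg, extra_trial_kg=None, tolerance_kg=3.0):
    """Return the maximal grip strength (kg) recorded at one exam.

    Assumption (not stated in the text): a fourth trial is triggered when
    the range of the three initial trials exceeds `tolerance_kg`.
    """
    if len(trials_kg) != 3:
        raise ValueError("expected exactly three initial trials")

    results = list(trials_kg)
    needs_fourth = max(results) - min(results) > tolerance_kg
    if needs_fourth:
        if extra_trial_kg is None:
            raise ValueError("trials differ by more than the tolerance; "
                             "a fourth trial is required")
        results.append(extra_trial_kg)

    # The maximal value across all trials that count is used for analysis.
    return max(results)
```

For example, three trials of 20.0, 24.5, and 22.0 kg span 4.5 kg, so a fourth trial would be needed before the maximum is taken.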

DXA bone and body composition analysis

Proximal femur, lumbar spine BMD, VFA, and total body composition scans were acquired using a Lunar iDXA densitometer (GE Healthcare, Madison, WI). All scans were acquired and analyzed using enCORE software version 11.0 per manufacturer guidelines. A BMD T-score of ≤−2.5 at the lumbar spine, total femur, or femoral neck was utilized to categorize subjects as osteoporotic. An experienced reader (NB) blinded to subject and time point evaluated all VFA images obtained at screening and 3 months to determine vertebral fracture prevalence and incidence. Lean and fat mass were determined from the total body scans. Specifically, appendicular lean mass (ALM) was calculated by adding the lean mass of the arms and legs and dividing by height squared: (arm lean mass [kg] + leg lean mass [kg]) / (height [m])². Subjects were classified as sarcopenic or non-sarcopenic by applying the Baumgartner definition to their ALM calculation (females <5.45 and males <7.26 kg/m²) [5].
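As a worked illustration of the height-adjusted ALM calculation and the sex-specific cut-offs given above, a minimal Python sketch follows. The function and constant names are hypothetical; only the formula and the thresholds (5.45 and 7.26 kg/m²) come from the text.

```python
# Baumgartner cut-offs quoted in the text, in kg/m^2.
BAUMGARTNER_CUTOFFS = {"female": 5.45, "male": 7.26}


def alm_index(arm_lean_kg, leg_lean_kg, height_m):
    """Appendicular lean mass divided by height squared (kg/m^2)."""
    return (arm_lean_kg + leg_lean_kg) / height_m ** 2


def is_sarcopenic(arm_lean_kg, leg_lean_kg, height_m, sex):
    """True when the ALM index falls below the sex-specific cut-off."""
    return alm_index(arm_lean_kg, leg_lean_kg, height_m) < BAUMGARTNER_CUTOFFS[sex]
```

With 4.0 kg of arm lean mass, 12.0 kg of leg lean mass, and a height of 1.60 m, the index is 16.0 / 2.56 = 6.25 kg/m², which is above the female cut-off but below the male one.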

Statistical analysis

Differences in demographics and functional performance by sex were assessed using Student’s t test. Repeated-measures ANOVA examined the change in muscle function for each test and sex over time, i.e., over the four study visits. Microsoft Excel (Redmond, WA, USA), JMP, and SAS (Cary, NC, USA) were used for these calculations. Method variance for each test was evaluated by standard deviation and least significant change (LSC). LSC, calculated in routine manner (root mean square SD with 95 % confidence), accounts for test error and represents the minimum longitudinal difference that must occur with each test to document a physical change.
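The text does not spell out the LSC formula. A common convention in precision assessment multiplies the root mean square SD by 1.96 × √2 (about 2.77) for 95 % confidence, and the sketch below assumes that convention. The function names are hypothetical, and pooling the per-subject variances with equal weight is strictly appropriate only when every subject has the same number of repeat measurements.

```python
import math


def rms_sd(repeated_measures):
    """Root-mean-square of the per-subject standard deviations.

    `repeated_measures` is a list with one inner list of repeat
    measurements per subject (at least two values each).
    """
    variances = []
    for values in repeated_measures:
        n = len(values)
        mean = sum(values) / n
        # Sample variance (n - 1 denominator) for this subject.
        variances.append(sum((v - mean) ** 2 for v in values) / (n - 1))
    return math.sqrt(sum(variances) / len(variances))


def least_significant_change(repeated_measures, z=1.96):
    """LSC at 95 % confidence, assuming LSC = z * sqrt(2) * RMS SD."""
    return z * math.sqrt(2) * rms_sd(repeated_measures)
```

A change between two visits smaller than this value would not be distinguishable from measurement error under these assumptions.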

Mixed effects linear regression models were used to estimate between- and within-subject variance parameters for each functional performance outcome. All models included all follow-up visits; additional models also adjusted for sex and body mass index (BMI). Measurement reproducibility was summarized using the intra-class correlation coefficient (ICC). ICCs evaluate the reproducibility of observed measures relative to the variation in true measures between subjects in the population under study. Within a study/population, ICCs facilitate direct comparisons of reproducibility of different muscle function tests by automatically adjusting for the different measurement scales. Confidence intervals were obtained using the bootstrap. All analyses were performed in R version 3.0.2 (R Foundation for Statistical Computing, Vienna, Austria).
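In variance-component terms, the ICC is the between-subject variance divided by the sum of the between- and within-subject variances. The study estimated these components with mixed effects models in R; the Python sketch below swaps in a simpler technique, the one-way ANOVA estimator for a balanced design (every subject measured at the same number of visits), purely to illustrate the calculation. The function name is hypothetical, and no adjustment for visit, sex, or BMI is made.

```python
def icc_oneway(scores):
    """One-way random-effects ICC for a balanced subjects-by-visits table.

    `scores` holds one row per subject and one value per visit.
    This is an illustrative ANOVA estimator, not the mixed-model
    approach used in the study.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of visits per subject
    if any(len(row) != k for row in scores):
        raise ValueError("balanced design required: equal visits per subject")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]

    # Mean squares between and within subjects.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))

    # Method-of-moments variance components; the between-subject
    # estimate can be negative in small or noisy samples.
    var_between = (ms_between - ms_within) / k
    return var_between / (var_between + ms_within)
```

Because the ratio is unitless, tests recorded in W/kg, cm, kg, or seconds can be compared on the same scale, which is the property the text relies on.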

Results

Demographic data

Average age of the 49 women and 48 men in this trial was 80.7 (range 70–95) years. Group mean BMI was 25.6 kg/m², 24.7 % were classified as osteoporotic per WHO definition, 22.7 % had prevalent vertebral fracture, and 23.7 % were classified as sarcopenic using the Baumgartner definition. No differences between men and women were observed except that men were taller and more women were osteoporotic (mean height 172.7 vs. 160.8 cm; 36.7 vs. 12.5 %, p < 0.05, Table 1). No adverse events related to the functional tests were reported, and no new or incident vertebral fractures were identified on the 3-month VFA.

Table 1 Participant demographics and functional performance at the screening visit

Physical and muscle function testing

The repeated JM trials at each study visit did not demonstrate a pattern of decline or improvement in performance, suggesting that multiple trials did not cause fatigue or provide “practice” to improve performance (data not shown). Mean jump power and height were 20.0 W/kg and 17.8 cm, respectively; mean grip strength was 25.3 kg. Men generated higher (p < 0.0001) mean measurements than women for jump power 22.5 vs. 17.6 W/kg, jump height 20.5 vs. 15.3 cm, and grip strength 32.4 vs. 18.3 kg. Mean SPPB was 10.1, repeated chair rise 14.0 s, and gait speed 1.08 m/s with no difference in performance by sex. These data are detailed in Table 1.

No clear change in muscle function was observed over time (Table 2). No changes were observed for any test in the male cohort. In contrast, statistically significant differences were present in the female cohort; however, there was no consistent functional change pattern in that chair rise time improved while grip strength and jump height decreased. No clear pattern was seen in SPPB scores (Table 2).

Table 2 Change in muscle/physical function over time

Method variance for each test expressed by standard deviation and LSC is presented in Table 3. LSC values of the different tests varied greatly. This variation was related to the size of the test mean values (see Table 1), i.e., larger mean values were associated with larger LSCs.

Table 3 Method variance

All trial visits for each test were utilized to calculate ICC values and all were adjusted for study visit. Grip strength and jump power demonstrated the best reproducibility with ICCs of 0.95 and 0.93, respectively (Table 4). Total SPPB score and 4-m gait speed were the least reproducible with ICCs of 0.77 and 0.76, respectively. Adjustment for visit, sex, and BMI lowered ICCs for jump height, jump power, and grip strength, primarily due to the large sex differences in these parameters. Nonetheless, jump power and grip strength still demonstrated the best reproducibility with ICCs of 0.90 and 0.87, respectively.

Table 4 Method reproducibility

Discussion

The development of pharmacologic and non-pharmacologic approaches to mitigate sarcopenia and thereby reduce falls and fracture risk necessitates sensitive and reproducible testing methodologies. The data reported here demonstrate that JM is a highly reproducible and stable functional test over time. JM reproducibility is similar to grip strength, an isometric exercise that evaluates only one muscle group. However, JM requires coordinated activity and safely evaluates an individual’s maximal effort; as such, it may have advantages over traditional muscle/functional tests in older adults. Indeed, JM is a strength measurement that also incorporates the functional components assessed in traditional chair rise and gait speed testing. Considering the low injury risk of JM, it may prove to be a safe, and potentially more sensitive method of evaluating maximal effort in those with functional limitations. Additionally, recent data show that JM is also correlated to pQCT measured cortical bone strength, a finding which further highlights the potential usefulness in those at risk for fracture [34].

In contrast to the high reproducibility found for JM, these data also reinforce the limited ability of SPPB and 4-m gait speed to detect small changes. The high variance and relatively low ICC of these tests may be due to the inclusion of human variability in obtaining the measurements, a confounder minimized by the computerized data acquisition of JM.

Our results are similar to those published by Rittweger et al. [35], who concluded that JM and maximal gait speed performed better than timed get up and go, free gait speed, and chair rise tests. In that study, the gait speed tests were 10 m at full walking speed as distance was added before and after the walking course to remove potential effects of acceleration and deceleration. Consequently, these studies evaluated somewhat different functional tests. Additionally, the Rittweger study evaluated only a small number of participants (n = 58) over a wide age range (19 to 88 years); inclusion of young adults likely increased inter-individual variability of all the functional tests. For example, the range of weight-corrected jump power was 8–65 W/kg, whereas the range was 10–32 W/kg in this cohort. Similarly, Matheson et al. [38] examined JM intra- and inter-rater reproducibility in ten young adults and reported ICCs in the same range as this study. As such, this study finds similar reproducibility over time in older adults as that previously reported in younger cohorts.

Development of optimal tests to monitor sarcopenia interventions is sorely needed to adequately evaluate pharmacological therapies in phase 3 trials and ultimately bring such agents to clinical care [28, 29, 40, 41]. Such testing methods should be reproducible, stable over time (i.e., no learning effect), and sensitive surrogates that predict adverse outcomes of this disease, such as falls and fractures. In this regard, measurements of muscle mass, e.g., DXA lean mass, have good reproducibility, but do not predict clinical outcomes as well as muscle function tests [10, 13]. Although muscle function tests appear valid, there is concern about reproducibility in older adults. The data reported here (ICCs, LSCs) are likely to provide appropriate estimates of reproducibility variables because they focused on an older population at risk. An additional strength of this study is that several measurements (not only two) were obtained over a time period of a few months. This approach likely produced a more robust estimate of the true reproducibility of these physical and muscle function tests. One might argue that in an elderly population such as studied here, a biologic decline in muscle function could occur over 3 months. However, no clear change was observed in this community-dwelling cohort. Additionally, ICCs were also calculated based on only the first three visits and although the absolute measurements changed, the differences between the muscle function tests remained similar (data not shown). As such, true biological change seems unlikely to be confounding these results.

Limitations of this work include a relatively well-functioning older adult cohort as demonstrated by less than 25 % of the sample meeting ALM/height² diagnostic criteria for sarcopenia. Additionally, the applicability of utilizing various surrogates would be improved if these test results were correlated prospectively with outcomes such as falls, fracture, or chronic disease to assess predictive power. Finally, not all traditional functional tests were evaluated; for example, the fastest gait speed, 30-s chair rise, 6-min walk, and timed get up and go were omitted from the study due to concern about reducing reproducibility by causing fatigue.

In conclusion, JM is a highly reproducible test in older ambulatory adults and demonstrates less test variability than some traditional physical functional tests commonly used in clinical and research settings. Additionally, it evaluates a complex, high-intensity movement, thereby combining features of common physical and muscle function tests. As such, JM may well have enhanced capability to detect change due to interventions, potentially making this a valuable research tool. Further studies are indicated to evaluate whether the ability of JM to measure small increments over a wide range of performance may make it more sensitive to intervention-induced changes in function and to correlate JM parameters with hard outcomes such as falls and fractures.