Introduction

Postural instability is a common motor symptom in people with Parkinson’s disease (PD). Approximately half of all people with PD will develop severe gait and balance problems which are often associated with falls, loss of independent community ambulation and eventually decreased quality of life [1, 2].

Postural instability in PD is clinically evaluated with the Movement Disorder Society-Sponsored Revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) [3] pull test (item 3.12 “Postural Stability”). In this test, an examiner stands behind the patient and pulls abruptly on the patient’s shoulders. The pull must be strong enough that the patient requires one or more steps backward to recover balance (“reactive stepping”). Despite its widespread use in clinical settings and clinical trials, the test suffers from a major disadvantage: the variability of the clinician-imposed perturbation in terms of method of delivery, force and duration. Even highly trained examiners in a clinical trial do not perform the test consistently [4]. Such variability of the perturbation introduces both random noise and examiner bias, which undermine the validity of clinical trials.

Standardized perturbations, such as surface translation with movable platforms or a direct pull from mechanical actuators, have been used to study postural instability in laboratory settings [5,6,7,8,9]. These methods are considered to generate more accurate and reliable measurements, but it is unclear whether they discriminate postural deficits between people with PD and matched controls better than the simple pull test. Here, we compared balance recovery performance between a group with PD (OFF- and ON-medication) and matched controls in response to a rapid anterior translation of the support surface on a treadmill (imposed backward fall) and during the standard clinical pull test. Many balance assessment systems use partial body weight support (BWS) to reduce testing difficulty or ensure safety, but BWS might also compromise test sensitivity. Thus, we also examined whether the presence of partial BWS affected performance during the two perturbation tasks. Furthermore, we examined how repeated exposure to a perturbation affected reactive stepping performance, since a prominent “first trial” effect has been reported for reactive stepping [10,11,12,13].

We hypothesized: (1) the performance of reactive stepping would systematically differ between PD and controls in both perturbation types, (2) differences between the PD and controls would be smaller during ON-medication (ON-meds) state compared to OFF-medication (OFF-meds) state, (3) the surface translation test would distinguish PD vs. Control, and ON-meds vs. OFF-meds at least as well as the standard clinical test, (4) partial BWS would improve the reactive stepping performance of both PD and controls, (5) PD vs. Control and ON-meds vs. OFF-meds effects would be detectable even with partial body weight support, and (6) reactive stepping would improve with repeated exposure within a testing session as well as from one session to the next session.

Methods

We studied 27 individuals, including 14 diagnosed with PD by their treating neurologist (3 females) and 13 controls (8 females; see Table 1 for summary demographics and Table S1 for detailed demographics of individuals with PD). No participants had been treated surgically for PD. Exclusion criteria for all participants included medical conditions other than PD that significantly impaired use of the lower limbs, and dementia or cognitive impairment, including a documented Montreal Cognitive Assessment (MoCA) [14] or Folstein [15] score < 24. The PD participants were asked to complete two visits, ON-meds and OFF-meds, in random order. For the OFF-meds visit, the PD participants were tested after overnight withdrawal from immediate-release PD medications and 24-h withdrawal from sustained-release PD medications. Eleven PD participants completed both visits, which were at least one week apart.

Table 1 Summary of participant demographics

Participants stood on a stationary treadmill (C-Mill, Motek, Amsterdam) with a safety harness loosely suspended from an overhead sliding track and an experienced movement disorders specialist (S.C.) standing behind the participant. Backward reactive stepping was induced using two types of postural perturbation: (1) a rapid anterior translation of the support surface on a treadmill (“treadmill” condition) or (2) a standard clinical pull test (“clinical” condition). The treadmill perturbation has been previously described [8]. Briefly, after a verbal warning, the treadmill was turned on following a variable time interval to produce servocontrolled anterior belt movement sufficient to evoke 1–2 steps from neurologically normal individuals. A tachometer directly measured belt movement to verify the standard trajectory. All participants required at least one step to recover balance in all treadmill trials. For the clinical perturbation, the movement disorders specialist performed the pull test by administering a quick backward tug on the shoulders in conformance with the MDS-UPDRS specifications.

Falls were prevented by the safety harness loosely suspended from an overhead sliding track and by the examiner standing behind the participant. Safety bars were located on each side of the participant, who was, however, instructed not to grab them to regain balance during the testing. Participants started each trial looking at a fixation target (2.7 m away at head level) and with weight evenly distributed between the feet (verified by monitoring the real-time center of pressure trajectory). The participants were permitted to step with either foot. Most participants stepped consistently with the same foot on all trials. A total of 48 trials were collected in 3 sets. Each set included 16 trials, beginning with 3 trials of clinical perturbation, followed by 10 trials of treadmill perturbation and ending with 3 trials of the clinical perturbation. The first and third sets were without BWS, while BWS of 20% total body weight was provided during the second (middle) set. Participants were videotaped at 60 fps from the waist down in the frontal plane, and a blinded rater counted the number of steps taken and noted the video frame numbers of liftoff and touchdown for the first step. The duration of the first step was then calculated from these frame numbers. A total of 1767 trials were collected; 57 trials were missing due to equipment failure and one subject was unable to complete the protocol. An analysis of step lengths, extracted from the videos using machine-learning techniques, is the subject of a separate paper [16].
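The trial structure and the frame-based duration calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names (`build_session`, `step_duration_s`) and the example frame numbers are hypothetical, and the visit count assumes that the three PD participants who did not complete both visits each contributed one visit.

```python
FPS = 60  # video frame rate reported in the Methods


def build_session():
    """Return the 48-trial schedule as (set_index, perturbation, bws_percent)."""
    one_set = ["clinical"] * 3 + ["treadmill"] * 10 + ["clinical"] * 3
    schedule = []
    for set_index in (1, 2, 3):
        bws = 20 if set_index == 2 else 0  # BWS only in the middle set
        schedule += [(set_index, pert, bws) for pert in one_set]
    return schedule


def step_duration_s(liftoff_frame, touchdown_frame, fps=FPS):
    """First-step duration in seconds from two video frame numbers."""
    if touchdown_frame < liftoff_frame:
        raise ValueError("touchdown must not precede liftoff")
    return (touchdown_frame - liftoff_frame) / fps


session = build_session()

# Assumed visit count: 13 controls + 14 PD first visits + 11 PD second visits.
visits = 13 + 14 + 11
planned_trials = visits * len(session)

print(len(session))               # trials per visit
print(planned_trials)             # planned trials under the assumption above
print(step_duration_s(120, 138))  # hypothetical frames: 18 frames at 60 fps
```

Under that assumption the planned total is 38 × 48 = 1824 trials, which equals the 1767 collected trials plus the 57 missing trials. At 60 fps, durations are resolved in increments of 1/60 s (about 17 ms).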

Linear repeated-measures mixed-effects models were applied separately to the number of steps taken and to the first step duration, with fixed factors of Group (PD OFF-meds, PD ON-meds, Control), Pull Type (Treadmill vs. Clinical) and BWS (0% BWS vs. 20% BWS), subject as a random factor, and age and sex as covariates. Post-hoc analysis for Group was performed with Tukey’s test for multiple pairwise comparisons. Group differences in age and sex were examined using a t-test and a chi-square test, respectively. All statistical analyses were done in R [17], with significance testing by ANOVA (F test, Satterthwaite’s degrees of freedom method) using the lme4 and lmerTest libraries. The significance level was set to 0.05, with Holm–Bonferroni correction for multiple tests using the “p.adjust” function.
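For readers unfamiliar with the Holm–Bonferroni procedure, the following is a minimal pure-Python sketch of the step-down adjustment. It is an illustration only, not the authors' code (the analysis above was done in R); the function name `holm_adjust` and the raw p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the rank-th smallest p-value by (m - rank), cap at 1,
        # and enforce monotonicity with a running maximum.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]  # hypothetical raw p-values, not from the study
adj = holm_adjust(raw)
significant = [p < 0.05 for p in adj]
print(adj, significant)
```

In this made-up example only the smallest p-value remains below 0.05 after adjustment, because the second-smallest value is doubled to 0.06 and the monotonicity step carries that value forward to the largest one.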

Results

There were no significant differences in age (t(25) = −0.92, p = 0.37) or sex (χ² = 2.98, p = 0.08) between the PD and control groups.
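As a worked check, the sex comparison can be recomputed from the counts given in the Methods (PD: 3 females of 14; controls: 8 females of 13). The sketch below assumes that Yates' continuity correction was applied, which the text does not state; it is the default for 2×2 tables in R's `chisq.test`. The function name `yates_chi_square` is hypothetical.

```python
import math


def yates_chi_square(a, b, c, d):
    """Chi-square statistic with Yates' continuity correction for a 2x2
    table [[a, b], [c, d]], plus the p-value for 1 degree of freedom."""
    n = a + b + c + d
    numerator = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # For 1 df, the chi-square survival function equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: PD (3 females, 11 males), controls (8 females, 5 males).
chi2, p = yates_chi_square(3, 11, 8, 5)
print(round(chi2, 2), round(p, 2))  # 2.98 0.08
```

With the correction, the statistic matches the reported value; the uncorrected Pearson statistic computed from the same counts would be about 4.49.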

Number of steps

Group (F(2,44.5) = 17.08, p < 0.001), Pull Type (F(1,1737.9) = 19.51, p < 0.001) and BWS (F(1,1737.9) = 119.41, p < 0.001) effects were all statistically significant. Post hoc tests showed that PD patients took fewer steps (mean: −0.34 steps) during ON-meds than OFF-meds (p < 0.001) and that control participants took fewer steps (mean: −0.65 steps) than PD patients who were OFF-meds (p = 0.04), independent of Pull Type and BWS. Neither covariate, age (F(1,23.3) = 0.07, p = 0.80) nor sex (F(1,23.5) = 3.48, p = 0.08), was significant. These results are shown graphically in Fig. 1a.

Fig. 1 (a) Number of steps taken and (b) first step duration during the clinical and treadmill perturbation tasks for each group. Open circles represent group average values, and vertical lines represent the standard error of the group. Orange indicates testing under 0% body weight support (BWS) and blue indicates 20% BWS

Repeating the analysis with a Poisson link function (i.e., treating number of steps as a count variable) in a generalized linear mixed model (instead of the standard Gaussian) gave largely the same results as above, including that PD patients took fewer steps during ON-meds than OFF-meds (p < 0.01). However, with the Poisson link, the difference between control participants and PD patients while OFF-meds did not reach significance (p = 0.06).

The direction of the medication effect was a normalizing one, since PD patients in the OFF-meds state took more steps than Controls, and medication reduced this difference. The PD ON-meds vs. Control comparison was non-significant in all analyses.

To further assess the effects of Pull Type and BWS, we constructed two augmented models: (1) with an additional Group X BWS term, and (2) with an additional Group X Pull Type term. Neither interaction was significant (F(2,1735.8) = 1.57, p = 0.21; F(2,1735.7) = 0.05, p = 0.95), so we retained the null hypothesis that neither Pull Type nor BWS changed the effect of medication on the number of steps. The same result was obtained with the model using the Poisson link function.

First step duration

Similar results were observed for the first step duration (Fig. 1b), with Group (F(2,43.7) = 5.11, p = 0.01), Pull Type (F(1,1737.4) = 265.85, p < 0.001) and BWS (F(1,1737.4) = 98.02, p < 0.001) all statistically significant. PD patients had a longer first step duration (mean: +0.01 s) during ON-meds than OFF-meds (p < 0.001), independent of BWS. In the augmented model, Group X BWS was non-significant (F(2,1735.3) = 0.43, p = 0.65); however, the Group X Pull Type interaction was significant (F(2,1735.3) = 9.15, p < 0.001), reflecting a greater Group difference with the Treadmill than with the Clinical perturbation. Post hoc tests showed significant differences in PD patients between ON-meds and OFF-meds only with the Treadmill perturbation (p < 0.01). Neither covariate, age (F(1,23.0) = 0.89, p = 0.35) nor sex (F(1,23.1) = 1.78, p = 0.20), was significant (see supplementary material for an analysis restricted to PD patients, with clinical characteristics included as additional covariates).

Relationship between number of steps and first step duration

To examine whether the first step duration was related to number of steps taken, we augmented the main model of number of steps taken with a term for the first step duration. The result showed a significant relationship in both Gaussian (F(1,1749.5) = 161.98, p < 0.001) and Poisson models (χ2 = 65.54, p < 0.001) independent of the other terms (Age, Sex, BWS, Pull Type, Group). This is shown graphically in Fig. 2 (note that the relationship is only apparent in the PD group because there was very little variance in the number of steps for the Control group).

Fig. 2 The relationship between number of steps and first step duration during clinical (top row) and treadmill (bottom row) perturbations. Open circles represent individual average values (orange = 0% BWS, blue = 20% BWS). Regression lines for all subjects in each group are shown as solid lines (0% BWS) and dashed lines (20% BWS)

Habituation effect

We frequently observed participants requiring more steps to recover balance initially compared to subsequent trials. This has also been noted by others [10]. To quantify this effect, we augmented our main model with a term for TrialNumber and a TrialNumber X Group interaction term. The TrialNumber was significant (F(1,1736.7) = 60.99, p < 0.001; coefficient value: − 0.013) with a non-significant TrialNumber X Group interaction (F(2,1734.7) = 2.13, p = 0.12). This indicates that participants required more steps initially in a series of trials (Fig. 3a), independent of Group. Pull Type (F(1,1736.84) = 19.78, p < 0.001), BWS (F(1,1736.9) = 123.64, p < 0.001) and Group (F(1,44.42) = 17.24, p < 0.001) remained significant. Results were similar with the Poisson link function (TrialNumber coefficient: − 0.010).
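To give a sense of scale for the Gaussian TrialNumber coefficient, the sketch below extrapolates it linearly across a visit. It assumes that TrialNumber was coded from 1 to 48 within a visit and that the effect is linear over that range; neither is stated explicitly in the text, and the function name `predicted_change` is hypothetical.

```python
TRIAL_SLOPE = -0.013   # steps per trial, Gaussian model coefficient from the text
TRIALS_PER_VISIT = 48  # from the Methods


def predicted_change(trial, slope=TRIAL_SLOPE, first_trial=1):
    """Model-implied change in step count at `trial` relative to the first trial."""
    return slope * (trial - first_trial)


change_last = predicted_change(TRIALS_PER_VISIT)
print(f"{change_last:.2f}")  # about -0.61 steps from trial 1 to trial 48
```

Under these assumptions the fitted slope corresponds to roughly 0.6 fewer steps by the last trial of a visit than at the first.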

Fig. 3 (a) The relationship between number of steps and trial number. Each blue line represents a fitted regression line for one individual. (b) Fitted values of number of steps for each visit in PD. The values were estimated from the mixed model including factors of age, sex, pull type, body weight support, medication state and test order. Each blue circle represents an individual trial. Even taking into account medication state (ON/OFF) and other variables, the number of steps was significantly lower in the second session (asterisk denotes the significant main effect of session number, p < 0.001)

Since PD participants were tested on two separate days (for both ON-meds and OFF-meds), we were able to evaluate long-term (weeks) persistence of a practice/learning effect. To examine the effect, we applied the same model on number of steps with a term for session number (i.e., whether the measurements were made at the first or second session) including only PD participants and an interaction for session order and medication state. The session number was significant (F(1,1011.1) = 18.94, p < 0.001; coefficient: − 0.31). The added interaction term was not significant (F(1,7.01) = 1.30, p = 0.29). That is, PD participants tended to take fewer steps on the second session, independent of medication state. This suggests a persistent practice effect (shown graphically in Fig. 3b). Additionally, Pull Type (F(1,1011.2) = 9.99, p = 0.001), BWS (F(1,1011.2) = 63.97, p < 0.001) and medication state (F(1,1011.2) = 28.60, p < 0.001) remained significant. Results were unchanged with the Poisson link function (session number coefficient: − 0.15).
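The Poisson-model coefficients reported above can be read as multiplicative effects if they are on the log scale. That is an assumption here: the log link is the canonical and default link for Poisson models, but the text does not state which link was used. The sketch below converts the two reported coefficients to rate ratios; the function name `rate_ratio` is hypothetical.

```python
import math


def rate_ratio(log_coefficient):
    """Multiplicative change in expected count for a one-unit increase."""
    return math.exp(log_coefficient)


session_rr = rate_ratio(-0.15)   # second vs. first session
trial_rr = rate_ratio(-0.010)    # per additional trial
print(f"{session_rr:.3f} {trial_rr:.3f}")  # 0.861 0.990
```

On this reading, the expected number of steps in the second session is about 86% of that in the first session, and each additional trial reduces the expected count by about 1%.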

Discussion

This study examined postural instability in PD using two reactive stepping tasks: the standard clinical “pull test” and an imposed translation of the support surface on a treadmill. For both tasks, the number of steps taken by the PD patients in the OFF-meds condition was greater than that taken by the controls. This finding is consistent with previous studies [6, 9] and demonstrates that both perturbation types produce increased instability in people with PD compared to controls. PD patients also consistently took a first step of shorter duration than controls. The two metrics (number of steps, duration of first step) were negatively correlated and behaved very similarly across all analyses.

The performance of people with PD on both metrics was significantly closer to that of controls in the ON-meds than in the OFF-meds condition. Dopaminergic medication (primarily levodopa) is first-line therapy for PD [18]; however, whether medication improves postural instability in PD remains equivocal [8, 9, 19,20,21,22,23]. In our study, a clear improvement in reactive stepping due to medication was observed. This finding contrasts with the results of de Kam et al. [9], who found no significant improvement in reactive stepping performance with medication. One possible explanation is that our platform perturbation acceleration was greater (5 m/s² vs. 1.5 m/s²), thus producing greater imbalance and potentially a larger effect size. Discrepancies might also arise from differences in the PD cohorts, since our group had a slightly higher daily dose of medications (average levodopa equivalent dose 881 mg vs. 740 mg) and was also slightly older (mean: 68 years vs. 65 years). Another important possible explanation for why we detected a medication effect is the larger quantity of data in our study (48 measurements in each medication condition, compared to only 4 in de Kam et al.), which would be expected to increase statistical power to detect a medication effect. In addition, we replicated our surface perturbation results with the standard clinical pull test, which was not reported in de Kam et al.; this allows us to generalize our results beyond the specifics of the platform perturbation protocol.

Partial body weight support is widely available in rehabilitation settings, and can make reactive stepping tests easier, safer, and possibly less stressful for the patient. However, easier testing could translate into a “less sensitive” evaluation, i.e., a ceiling effect. In this study, all participants required fewer steps with 20% BWS, indicating the task was easier, but the difference between PD vs. Controls was preserved as was the difference between ON-meds vs. OFF-meds. This suggests that 20% BWS can be used without significantly affecting the sensitivity of the test. Body weight has a strong relationship with postural stability [24]. Thus, providing the extra weight support may reduce the biomechanical constraints due to weight and increase the dynamic stability range [25].

The clinical pull test has the great advantage of requiring no equipment. However, the correct technique is difficult to maintain even for trained and experienced examiners in clinical trials [4]. A recent study [26] using wearable sensors during a clinical pull test showed that the postural responses of hydrocephalus patients could be distinguished from those of healthy controls when the magnitude of the applied perturbation was systematically varied across multiple trials. Similar to our findings, a relationship between pull intensity and step length was observed, and the slope of this relationship was significantly different between groups. This approach requires multiple trials at various pull intensities and, as we observed, may require accounting for the practice/learning effect over trials.

A treadmill perturbation does not require a trained and experienced examiner or additional wearable sensors, and may allow for a more consistent perturbation. While the “clinical standard” pull test performed well compared to the treadmill perturbation, the treadmill perturbation may have performed better than the pull by the shoulders. Specifically, although both types of perturbation performed similarly for the number-of-steps metric, the secondary metric (i.e., first step duration) showed a difference between ON-meds vs. OFF-meds with the treadmill perturbation, but not with the pull test. This may reflect underlying biomechanical differences between the types of perturbation, but it could also be due to the greater consistency (lower variability) of the treadmill perturbation. Testing variability may have been further reduced by the larger number of treadmill trials than shoulder-pull trials; however, the clinical standard pull test does not allow an indefinite increase in the number of trials [3]. Note that in our protocol, the pull test was always performed by the same experienced movement disorders specialist, and 6 sets of 3 trials were tested. Thus, our results likely underestimate the variability of the clinical pull test that would occur when the single-examiner constraint is not enforced and only the usual smaller number of trials is performed. Substituting a mechanical actuator for a human examiner is a natural way to standardize the pull test perturbation, and our results support its effectiveness. We used a treadmill as our testing modality because treadmills are widely available in gait labs and even in some clinical (rehabilitation) settings, so no specialized or specially constructed equipment is required. Our results should be generalized with caution since we used one particular make and model of treadmill. The protocols and perturbation characteristics provided by other manufacturers and models of treadmill may be different and may not yield the same results.

We compared the “first step duration” to the classical “number of steps” metric and found similar performance. Both metrics were obtainable from video, but the first step duration is potentially more automatable using a motion analysis system, pressure-sensitive shoe insert, or forceplate (see our previous study [27]). We expected greater sensitivity with the more nearly continuous “duration of the first step” metric than with the coarser, integer “number of steps.” In fact, the duration metric performed as well as the number of steps. This may reflect that our “number of steps” was averaged over a large number of individual trials, giving a quantity that was more fine-grained than the individual integer values. Given its good performance, especially in combination with the treadmill perturbation, the duration of first step metric can serve as an alternative when averaging a large number of integer step counts is not feasible. One study limitation is that our main analysis omitted step latency for the shoulder perturbation because video recordings were from the waist down. This framing was chosen to maximize resolution of the lower extremities and to avoid capturing participants' faces, preserving their anonymity. As a result, shoulder perturbation onset was not available from video data. Nonetheless, these experiments were done with an accelerometer on the dorsal surface of the examiner's hand, allowing us to detect the onset of the examiner’s application of force to the shoulder in a total of 508 trials. For an analysis of shoulder perturbation step latency, see Supplementary Material and Fig. S1.

Previous studies have shown that PD patients and controls generate greater postural responses in the first trial of perturbation [10, 11]. This effect was also observed in the current study. Nanhoe-Mahabier et al. suggested that the habituation rate is slower in PD compared to controls [10]. We were not able to replicate this finding since the Group X TrialNumber interaction was not significant. Several factors could account for this, including differences in the perturbation protocol, outcome measures and demographics of the sample population. Importantly, the current study demonstrates that group differences in reactive stepping performance can be detected when controlling for the first step response (significant Group effect even with TrialNumber included in the model). Furthermore, we demonstrated a practice/learning effect over a longer time window (weeks apart) in people with PD (significant effect of visit number independent of medication state). This result supports the practice of randomizing the order of ON/OFF-medication testing.

Conclusions

In summary, people with Parkinson’s disease performed reactive stepping better in the ON-medication compared to the OFF-medication state, over a range of measurement techniques, including the standard clinical pull test. Both the clinical pull test and treadmill surface translation protocols induced an increased number of steps and reduced first step duration in PD, but the surface translation task performed slightly better in differentiating between conditions (OFF vs ON-meds). The additional 20% body weight support did not substantially degrade test performance. Step counts and first step duration were both useful metrics for evaluating reactive balance.

Supplier

Treadmill: Motek medical, Vleugelboot 14, 3991 CL Houten, Netherlands.