Key Points

The results from the present systematic review and meta-analysis demonstrate that endurance training (ET) is effective in decreasing motor signs in individuals with Parkinson’s disease (PD), regardless of the control condition used as a comparator. This decrease exceeds the moderate range of clinically important changes to motor signs suggested for PD.

Although considerable heterogeneity was observed between RCTs, some moderators that increased the effect of ET on motor signs decreased the heterogeneity of the analyses, such as combined endurance and physical therapy training, intensity of ET based on treadmill speed, and high-intensity ET based on self-perceived exertion rate.

Our findings show that the significant effects of ET on motor signs are not based on randomized controlled trials with low methodological quality, which strongly supports the necessity of using high-quality randomized controlled trials in future studies to further understand the effect of ET on motor signs of PD.

The findings of this meta-analysis are an important contribution to establishing clinical guidelines for ET prescription among individuals with PD. Although questions remain on the dose–response relationship between ET and reduction in motor signs, higher ET dosage (up to 64 weeks and up to 5760 min of exercise [number of sessions × session duration]) presented higher effect sizes.

1 Introduction

Parkinson’s disease (PD) is a complex and progressive neurodegenerative disorder that affected approximately 6.1 million individuals worldwide in 2016 and may affect 13 million individuals by 2040 [1]. PD comprises a range of motor signs, such as tremor, bradykinesia, hypokinesia, akinesia, rigidity, gait disorders, and postural disturbances. The severity of the motor signs is routinely evaluated by both the part III motor subscale of the Unified Parkinson’s Disease Rating Scale (UPDRS-III) [2] and the Movement Disorder Society (MDS) revision of the UPDRS, called the MDS-UPDRS [3], which are gold standard clinical assessments for PD motor signs. The motor impairments progress rapidly, at an annual rate of between 1.5 and 8.9 points in the UPDRS-III scores [4, 5] and between 1.8 and 4.2 points in the MDS-UPDRS-III [6, 7]. Thus, therapeutic strategies that are able to decrease the UPDRS-III and MDS-UPDRS-III scores are crucial for individuals with PD, as worsening of motor signs is the main cause of disability in this population [4].

Currently, there is no disease-modifying treatment for PD and only dopaminergic replacement therapy through medication and deep brain stimulation significantly reduce the signs. However, these treatments cause adverse effects and have decreased effectiveness over time [8, 9]. On the other hand, physical exercise has been deemed a treatment that could modify the course of PD; in animal models, physical exercise, specifically endurance exercise, promoted neurogenesis and neuroprotection [10,11,12]. It can also reduce the signs of the disease and cause positive neurophysiological changes. In individuals with PD, 36 sessions of endurance training (ET) increased caudate dopamine release and ventral striatal activation [13]. Additionally, ET increases corticomotor excitability and the levels of brain-derived neurotrophic factor, and changes brain grey matter volume [14].

Several randomized controlled trials (RCTs) have investigated the effects of ET on motor signs as assessed by UPDRS-III [15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33] and MDS-UPDRS-III [13, 34, 35]. To date, four meta-analyses have aimed at identifying effects of ET on motor signs of PD [36,37,38,39]. These studies have included a low number of RCTs (up to 8) [36,37,38,39] or focused only on one type of ET protocol [36,37,38,39]. However, RCTs have used different ET protocols, such as stationary cycling training [13, 28, 29, 35], treadmill training [15, 19, 20, 22, 24, 30,31,32,33,34, 40, 41], bodyweight-supported treadmill training (BWSTT) [23, 42, 43], walking training [42], Nordic walking training [17, 21, 26], combined endurance and physical therapy training (CEPTT) [16, 18, 27, 44, 45], low-intensity ET [23], moderate-intensity ET [18, 30, 31], and high-intensity ET [13, 15, 20, 21, 23, 26, 28,29,30,31,32,33,34,35, 41]. Nevertheless, to the best of our knowledge, no meta-analysis has yet attempted to determine if the different ET protocols (aforementioned) and variables related to ET dosage (frequency, training period, and number and duration of sessions) affect the ET-induced changes to motor signs assessed in ON- or OFF-medication state.

Therefore, in this systematic review and meta-analysis we investigated the effects of ET, compared with nonactive and active control conditions, on motor signs of individuals with PD. A secondary purpose was to test whether discrete moderators (e.g., active and nonactive control groups, CEPTT and isolated ET, moderate and high intensity) and continuous meta-regressors for variables related to ET dosage (e.g., frequency, training period, and number and duration of sessions) modulate the training-induced effects on motor signs.

2 Methods

2.1 Protocol

The meta-analysis followed the recommendations of the ‘Preferred Reporting Items for Systematic Reviews and Meta-Analyses’ (PRISMA) [46].

2.2 Search Strategy and Selection Criteria

Consistent with PRISMA’s suggestions, literature searches were conducted in the following five computerized databases from the earliest record up to March 2021: PubMed, ISI’s Web of Knowledge, Cochrane’s Library, Embase, and EBSCOhost. The following combination of keywords was used:

P = “Parkinson’s disease” OR “Parkinson” OR “PD”.

I = “aerobic training” OR “treadmill training” OR “bike training” OR “walk training” OR “Nordic walking” OR “body-weight supported treadmill training”.

C = “control group” OR “control” OR “active control” OR “no intervention”.

O = “motor signs” OR “motor symptoms” OR “motor severity” OR “UPDRS-III” OR “MDS-UPDRS-III” OR “motor subscale”.

S = “randomized controlled study” OR “randomized trial” OR “RCT”.

The search was limited to English language. All the identified and retrieved electronic search titles, selected abstracts, and full-text articles were independently evaluated by two of the authors (FOA and VS) to assess their eligibility. In case of disagreements, a consensus was adopted or, if necessary, a third reviewer evaluated the article (CSB).

2.3 Eligibility Criteria

The inclusion criteria were RCTs that (i) used ET as an intervention for individuals with PD; (ii) performed at least 20 min per session of continuous and rhythmic exercise, according to the American College of Sports Medicine (ACSM) recommendation for endurance exercise [47, 48]; (iii) assessed PD motor signs by UPDRS-III or MDS-UPDRS-III; (iv) compared ET versus a control group exposed to any type of intervention (active) or no intervention (nonactive); and (v) for crossover designs, either included a washout period long enough to rule out carryover effects [49] or performed statistical analyses to rule out carryover effects [50]. The exclusion criteria were as follows: (i) non-RCT designs (e.g., cross-sectional studies); (ii) no assessment of PD motor signs by UPDRS-III or MDS-UPDRS-III scores; (iii) other physical exercise interventions such as progressive resistance training, balance training, and aquatic therapy; and (iv) non-physical training interventions such as electrical/magnetic stimulation, virtual reality, walking-assist device, and robotic gait training.

2.4 Definition of Endurance Training (ET) and Classification of the Training Intensity

We included RCTs that used ET, which is defined by the ACSM guideline as a continuous and rhythmic exercise sustained for a period of time that requires a substantial activation of large skeletal muscles [47], such as treadmill walking or running, stationary cycling, and Nordic walking training.

Training intensities < 40% of heart rate reserve (HHR) or < 55% of maximum heart rate (HRmax), 40–60% of HHR or 55–69% of HRmax, and ≥ 60% of HHR or ≥ 70% of HRmax were considered as low, moderate, and high intensities, respectively, following ACSM’s guidelines for adult individuals without [51] or with risk classification based upon the presence or absence of cardiovascular disease risk factors, signs or symptoms, and/or known cardiovascular, pulmonary, renal, or metabolic disease [52].
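The heart-rate cut-offs above can be expressed as a small classification routine. The following Python sketch is illustrative only: the function and argument names are hypothetical, and the handling of boundary values (exactly 60% of HHR is treated as high, and values between 69 and 70% of HRmax as moderate) is an assumption, because the stated ranges meet or leave a gap at those points.

```python
def classify_intensity(percent, basis="HHR"):
    """Classify ET intensity as 'low', 'moderate', or 'high'.

    percent: prescribed intensity, as a percentage of heart rate
             reserve ("HHR") or of maximum heart rate ("HRmax").
    basis:   which of the two heart-rate measures `percent` refers to.
    """
    # (lower bound of moderate, lower bound of high) for each basis.
    # Boundary handling is an assumption of this sketch.
    cutoffs = {"HHR": (40.0, 60.0), "HRmax": (55.0, 70.0)}
    if basis not in cutoffs:
        raise ValueError("basis must be 'HHR' or 'HRmax'")
    moderate_from, high_from = cutoffs[basis]
    if percent < moderate_from:
        return "low"
    if percent < high_from:
        return "moderate"
    return "high"
```

For example, `classify_intensity(65, "HRmax")` falls in the moderate band, whereas `classify_intensity(65, "HHR")` falls in the high band.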

Training intensities based on alternative methods of prescribing exercise intensity were also included, such as self-selected intensity [53], intensity based on treadmill speed [54], self-perceived exertion rate [55], and intensity adjusted according to the 6-min walk distance [56].

2.5 Data Extraction

Two reviewers (FOA and VS) separately and independently assessed the articles and extracted data. If the full text did not provide sufficient information regarding the inclusion criteria, we contacted study authors to obtain missing data or additional information. The following information was mandatory: UPDRS-III or MDS-UPDRS-III as primary or secondary outcome, authors of the study, year of publication, design of the trial, intervention characteristics, number of participants randomized to treatment arm, demographics, anthropometric and clinical characteristics, adverse events, sample (individuals with PD with or without freezing of gait [FOG]), drug regimen during the experimental period, motor signs obtained in the ON- and/or OFF-medication state, isolated ET, CEPTT, BWSTT, training device (treadmill, stationary bike, Nordic walking), supervised ET and training location (facility, community, and home-based), intensity (low, moderate, and high), and variables related to training dosage (frequency, training period, number and duration of sessions). Two reviewers (FOA and VS) independently recorded 100% of the articles. Coder drift was assessed by randomly selecting 30% of the RCTs for recording by a separate investigator (CSB). A mean agreement of 0.90 was the required level of reliability in the coding procedures.

2.6 Assessment of Methodological Study Quality in Included Trials

We used the Physiotherapy Evidence Database (PEDro) scale to assess the methodological quality of the included RCTs [57]. The PEDro scale has 11 items but the first item (eligibility criteria) is used to establish external validity; thus, it is not included in the overall score. The PEDro scale rates trials on a scale from 0 (low quality) to 10 (high quality). Although a score of ≥ 6 represents a cut-off for high-quality trials [57], we did not exclude studies based on low quality. Two researchers (FOA and CSB) independently assessed the methodological study quality in the included trials. There is good inter-rater reliability with an intra-class correlation coefficient of 0.7 when using consensus ratings generated by two or three independent raters [57].
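The scoring rule described above (11 items, the first excluded from the total, and a cut-off of ≥ 6 for high quality) can be sketched as follows. This is a minimal illustration, not the official PEDro tooling; the function names are hypothetical and each item is assumed to be recorded as a simple yes/no rating.

```python
def pedro_score(items):
    """Return the PEDro total (0-10) from 11 yes/no item ratings.

    items: sequence of 11 booleans in scale order. Item 1 (eligibility
    criteria) relates to external validity and is not counted.
    """
    if len(items) != 11:
        raise ValueError("the PEDro scale has 11 items")
    return sum(1 for satisfied in items[1:] if satisfied)


def is_high_quality(score, cutoff=6):
    """Apply the >= 6 cut-off for high-quality trials."""
    return score >= cutoff
```

A trial that satisfies only the eligibility item therefore scores 0, and a trial that satisfies all items scores 10.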

2.7 Statistical Analyses

As implemented in a previous publication [58], we used a between-groups, pre- to post-intervention meta-analytic design based on standardized mean differences (Hedges’ g), and random-effects models due to the high heterogeneity between trials (I2 = 74%, n = 27). Hedges’ g, with 95% confidence interval (CI), is a variation of Cohen’s d that corrects for biases attributed to small sample size [59]. As none of the included trials presented pre- to post-intervention correlations, we estimated correlation values for each trial (Eq. 1).

$$r = \frac{S_{\text{pre}}^{2} + S_{\text{post}}^{2} - S_{\text{D}}^{2}}{2 \times S_{\text{pre}} \times S_{\text{post}}},$$
(1)

where \(S\) is the standard deviation and \(S_{\text{D}}\) is the standard deviation of the difference score (pre- to post-intervention, Eq. 2).

$$S_{\text{D}} = \sqrt{\frac{S_{\text{pre}}^{2}}{n} + \frac{S_{\text{post}}^{2}}{n}}.$$
(2)

In cases in which RCTs had more than one experimental group, pre- to post-intervention correlation values were defined as 1 divided by the number of experimental groups.

In RCTs that presented data as mean and standard error (SE) [27, 30, 35, 41, 45], we converted SE to standard deviation (SD) as follows: \({\text{SD}} = \sqrt {\text{sample size}} \times {\text{SE}}\) [60].
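The SE-to-SD conversion above is a one-line computation. The Python sketch below uses a hypothetical function name and made-up input values purely for illustration.

```python
import math


def se_to_sd(standard_error, sample_size):
    """Convert a standard error to a standard deviation: SD = sqrt(n) * SE."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return math.sqrt(sample_size) * standard_error


# Hypothetical example: SE of 1.5 points in a group of 16 participants.
example_sd = se_to_sd(1.5, 16)
```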

Effect sizes were considered small, moderate, and large if values ranged from 0.2 to 0.49, 0.5 to 0.79, and ≥ 0.8, respectively [61].
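To make the effect-size metric and its thresholds concrete, the sketch below computes Hedges' g from two group means and standard deviations using the standard textbook small-sample correction, and then labels its magnitude. It is an assumption-laden illustration: the function names are hypothetical, the inputs would be pre- to post-intervention change scores in this design, the "negligible" label for values below 0.2 is added here, and the software used for the analyses may compute g differently.

```python
import math


def hedges_g(mean_exp, sd_exp, n_exp, mean_ctl, sd_ctl, n_ctl):
    """Standardized mean difference with Hedges' small-sample correction."""
    df = n_exp + n_ctl - 2
    pooled_sd = math.sqrt(
        ((n_exp - 1) * sd_exp ** 2 + (n_ctl - 1) * sd_ctl ** 2) / df
    )
    cohens_d = (mean_exp - mean_ctl) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)  # shrinks d slightly for small samples
    return correction * cohens_d


def classify_effect(g):
    """Label the magnitude of g; the sign only indicates direction."""
    size = abs(g)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"  # label assumed for this sketch
```

With hypothetical change scores of − 5 (SD 6, n = 20) in the ET group and − 1 (SD 6, n = 20) in the control group, `hedges_g(-5.0, 6.0, 20, -1.0, 6.0, 20)` is about − 0.65, which `classify_effect` labels as moderate.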

Heterogeneity (I2) values between 0 and 40% indicate no heterogeneity, I2 between 30 and 60% indicate moderate heterogeneity, I2 between 50 and 90% indicate substantial heterogeneity, and I2 between 75 and 100% indicate considerable heterogeneity [62]. Heterogeneity in meta-analysis is an indicator of the variation in study outcomes between RCTs. The I2 value represents the percentage of variation across studies that is due to heterogeneity rather than chance [62].
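As a rough illustration of how I2 is obtained and interpreted, the sketch below derives I2 from Cochran's Q using inverse-variance weights and then returns every interpretation band that contains the value. The function names are hypothetical, and because the bands listed above overlap, returning all matching labels (rather than a single one) is a choice made for this sketch.

```python
def i_squared(effects, variances):
    """I2 (%) from Cochran's Q, using inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def heterogeneity_labels(i2):
    """Return all interpretation bands (which overlap) that contain i2."""
    bands = [
        ("no heterogeneity", 0, 40),
        ("moderate", 30, 60),
        ("substantial", 50, 90),
        ("considerable", 75, 100),
    ]
    return [name for name, low, high in bands if low <= i2 <= high]
```

For instance, `heterogeneity_labels(21)` returns only the lowest band, while a value of 85 falls in both the substantial and considerable bands.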

Publication bias was verified via a funnel plot, Kendall’s tau with continuity correction, and Egger’s regression indices. In case of significant publication bias, the trim and fill procedure was implemented [58]. In addition, a sensitivity analysis was carried out to identify the presence of highly influential studies that might bias the analyses [58]. Studies were considered influential if their removal significantly changed the summary effect (i.e., change from significant to non-significant).
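The sensitivity analysis described above can be sketched as a leave-one-out loop around a random-effects pooled estimate. The code below is a minimal illustration under stated assumptions: it uses the DerSimonian-Laird estimator of between-study variance and a normal-approximation 95% CI, which may differ from what the analysis software does; the function names are hypothetical; and a study is flagged if its removal flips significance in either direction.

```python
import math


def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooled estimate with a 95% confidence interval."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum_w
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se


def influential_studies(effects, variances):
    """Indices of studies whose removal changes statistical significance."""

    def significant(e, v):
        _, lower, upper = pool_random_effects(e, v)
        return lower > 0 or upper < 0  # CI excludes zero

    baseline = significant(effects, variances)
    flagged = []
    for i in range(len(effects)):
        rest_e = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        if significant(rest_e, rest_v) != baseline:
            flagged.append(i)
    return flagged
```

With three hypothetical studies of equal variance and effects of − 0.6, − 0.6, and 0.6, only the third is flagged, because removing it turns a non-significant pooled effect into a significant one.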

Finally, all moderators were implemented as categorical variables and continuous variables related to ET dosage (frequency, training period, number and duration of sessions) were tested as meta-regressors.
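A meta-regression of effect size on a continuous dosage variable can be illustrated with a weighted least-squares fit. The sketch below is a simplification and rests on assumptions: it uses fixed-effect inverse-variance weights with a single predictor, whereas the software used for the analyses may fit a different (for example, mixed-effects) model; all names and example values are hypothetical.

```python
import math


def weighted_meta_regression(predictor, effects, variances):
    """Inverse-variance weighted regression of effect size on one predictor.

    Returns (intercept, slope, standard error of the slope).
    """
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, predictor)) / sum_w
    y_bar = sum(wi * y for wi, y in zip(w, effects)) / sum_w
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, predictor))
    sxy = sum(
        wi * (x - x_bar) * (y - y_bar)
        for wi, x, y in zip(w, predictor, effects)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)
    return intercept, slope, se_slope
```

A slope whose 95% CI (slope ± 1.96 × standard error) includes zero would indicate that the dosage variable does not significantly predict the effect size under this simplified model.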

Meta-analytic findings were calculated using the Comprehensive Meta-analysis version 3.0 software (Biostat Inc., Englewood, NJ, USA). The significance level was set at p ≤ 0.05.

3 Results

3.1 Study Selection

Our systematic literature search retrieved 1032 studies, of which 756 remained after duplicates were removed. Titles and abstracts were read and 720 were excluded. The remaining 36 studies were fully read. Nine were excluded because they were not RCTs [63,64,65,66], did not assess motor signs using UPDRS-III or MDS-UPDRS-III scores [67], presented duplicated data from a previous RCT [42], did not use continuous and rhythmic ET intervention [47], or we did not have data access [68, 69]. Finally, 27 RCTs [13, 15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35, 40, 41, 43,44,45] were included in the meta-analysis and systematic review. The search process is depicted in Fig. 1.

Fig. 1 Diagram showing the search process. ISI Institute of Scientific Information, MDS-UPDRS-III Movement Disorders Society Unified Parkinson’s Disease Rating Scale part III motor subscale score, RCT randomized controlled trials, UPDRS-III Unified Parkinson’s Disease Rating Scale part III motor subscale score

3.2 Study Characteristics

A detailed summary of the RCTs and participants’ characteristics is presented in Table 1. All the RCTs, with the exception of two cross-over studies [18, 45], had parallel-group designs. Six studies were conducted in the USA [15, 23, 28, 30, 31, 41], four in Brazil [20, 26, 32, 33], four in Italy [18, 21, 22, 44], three in Canada [13, 29, 34], two in Germany [24, 43], two in Japan [27, 45], one in Turkey [16], one in Korea [17], one in Australia [19], one in Taiwan [40], one in India [25], and one in the Netherlands [35].

Table 1 Summary characteristics of the 27 included studies

Most of the RCTs (16) used motor signs as the primary outcome [17, 18, 20,21,22,23, 25,26,27,28,29, 31,32,33,34,35]. In addition, the majority of the RCTs had relatively small sample sizes ranging from five [28] to 65 participants [35].

Of the 27 RCTs, two included only individuals with PD and FOG [22, 27], only one included participants with de novo PD (without drug therapy) [31], three assessed motor signs by MDS-UPDRS-III [13, 34, 35], four assessed motor signs in OFF-medication state [13, 28, 31, 35], six had a nonactive control group as a comparator arm [15, 19, 21, 25, 29, 31], two used supervised home-based ET [19, 35], and one used community-based ET without direct supervision [31].

Four RCTs used stationary cycling training [13, 28, 29, 35], 12 RCTs used treadmill training [15, 19, 20, 22, 24, 30,31,32,33,34, 40, 41], three RCTs used BWSTT [23, 42, 43], one RCT used walking training [42], three RCTs used Nordic walking training [17, 21, 26], and five RCTs used CEPTT [16, 18, 27, 44, 45].

Low-intensity ET (< 40% of HHR or < 55% of HRmax) was used in one study [23], moderate-intensity ET (40–60% of HHR or 55–69% of HRmax) was used in three RCTs [18, 30, 31], and high-intensity ET (≥ 60% of HHR or ≥ 70% of HRmax) was used in 15 RCTs [13, 15, 20, 21, 23, 26, 28,29,30,31,32,33,34,35, 41]. Three RCTs used self-selected intensity [17, 42, 43], four RCTs used intensity based on treadmill speed [22, 27, 44, 45], two RCTs used self-perceived exertion [24, 40], and two RCTs used intensity adjusted according to the 6-min walk distance [16, 19].

Finally, the majority of the RCTs did not observe serious adverse events related to ET, as detailed in Table 1.

3.3 Overall Effect of Aerobic Training on Motor Signs

Twenty-seven RCTs [13, 15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35, 40, 41, 43,44,45] totaling 1152 participants (575 from experimental and 577 from control conditions; mean age 64.7 years, mean duration of PD 5.4 years, and mean motor severity scores of 23.1 for UPDRS-III and 23.6 for MDS-UPDRS-III) examined the effect of ET versus active and nonactive control conditions on motor signs. The sensitivity analysis revealed that the motor sign results were not highly affected by any study (data not shown). No study was considered influential as removal did not significantly change the summary effect (i.e., change from significant to non-significant). In addition, funnel plot analysis did not reveal the presence of any influential studies, requiring no data imputation (Fig. 2).

Fig. 2 Funnel plot of studies comparing motor signs response (UPDRS-III or MDS-UPDRS-III scores) to endurance training vs nonactive and active control conditions. The standard error of the estimates is displayed on the y-axis and Hedges’ standardized difference in means on the x-axis. On the x-axis, the open diamond represents the Hedges’ g overall standardized difference in means and the filled diamond represents the change in the overall Hedges’ g standardized difference in means after the trim and fill procedure. MDS-UPDRS-III Movement Disorders Society Unified Parkinson’s Disease Rating Scale part III motor subscale score, UPDRS-III Unified Parkinson’s Disease Rating Scale part III motor subscale score

Figure 3 shows the subgroup and the overall effect (average of g values) of ET on motor signs across RCTs. There was a small effect (g = − 0.42, 95% CI − 0.65 to − 0.19; p < 0.001) of ET on motor signs when compared with nonactive and active control conditions (Fig. 3). Although there was considerable heterogeneity between the 27 RCTs (I2 = 74%), 11 RCTs [16, 17, 20, 21, 24, 28, 29, 40, 41, 44, 45] reported significant effects of ET on motor signs, which represented a moderate clinically significant reduction in UPDRS-III scores of − 8.0 (within groups) and − 6.8 points (between groups) when visually analyzing these RCTs (Table 2). In addition, several moderators described in Sect. 3.5 decreased the heterogeneity across RCTs.

Fig. 3 Forest plot displaying the overall standardized mean difference (Hedges’ g) of motor signs response (UPDRS-III or MDS-UPDRS-III scores) to endurance training vs nonactive and active control conditions. The data are shown as Hedges’ g, standard error (SE), lower 95% confidence interval (CI), upper 95% CI, Z-value, and p value of the point estimates. BWSTT body-weight supported treadmill training, MDS-UPDRS-III Movement Disorders Society Unified Parkinson’s Disease Rating Scale part III motor subscale score, PP postural perturbation, UPDRS-III Unified Parkinson’s Disease Rating Scale part III motor subscale score

Table 2 Motor symptoms in the pre- and post-test assessments for each group across the 27 included studies. Mean (SD) and mean difference (MD) are shown

3.4 Methodological Study Quality

Even though only RCTs were included, Table 3 shows that the quality scores averaged 5.8 ± 1.4 points (range 4–8), which indicates low methodological quality. Only 37% of the RCTs (10 studies) exceeded the pre-determined cut-off score of 6 points, indicating good quality [16, 17, 19, 24, 29, 31, 32, 35, 40, 41].

Table 3 Physiotherapy Evidence Database (PEDro) scores of the 27 included studies

3.5 Moderators

PEDro score. Table 4 shows that only RCTs with high methodological quality showed a moderate effect of ET on motor signs versus nonactive and active controls (g = − 0.50, 95% CI − 0.79 to − 0.21; p = 0.001), although with substantial heterogeneity (I2 = 71%, n = 10).

Table 4 Potential moderators of the effect of endurance training on motor signs for the 27 included studies

Type of scale. Table 4 shows that ET versus nonactive and active controls had a moderate effect (g = − 0.50, 95% CI − 0.78 to − 0.22; p < 0.001) only on motor signs assessed by UPDRS-III, with substantial heterogeneity (I2 = 76%, n = 24).

ON/OFF-medication state. Table 4 shows that ET versus nonactive and active controls had a small effect (g = − 0.44, 95% CI − 0.70 to − 0.17; p = 0.001) only on motor signs assessed in the ON-medication state, with substantial heterogeneity (I2 = 75%, n = 24).

Sample. Table 4 shows a small (g = − 0.35, 95% CI − 0.61 to − 0.10; p = 0.006) and a moderate effect (g = − 0.69, 95% CI − 1.20 to − 0.18; p = 0.007) of ET versus nonactive and active controls on motor signs of individuals with PD without and with FOG, respectively. Samples composed of individuals with PD without FOG presented substantial heterogeneity (I2 = 75%, n = 21), while samples composed of individuals with PD and FOG presented no heterogeneity (I2 = 0%, n = 2).

Type of control group. Table 4 shows a moderate (g = − 0.68, 95% CI − 1.07 to − 0.29; p = 0.001) and a small effect of ET on motor signs (g = − 0.33, 95% CI − 0.61 to − 0.05; p = 0.018) versus nonactive and active controls, respectively. The type of control condition did not reduce the heterogeneity between studies for nonactive controls (I2 = 63%, n = 8) or for active controls (I2 = 75%, n = 19).

Training mode. Table 4 shows a small (g = − 0.29, 95% CI − 0.54 to − 0.05; p = 0.016) and a moderate effect (g = − 0.76, 95% CI − 1.16 to − 0.36; p < 0.001) for treadmill training and CEPTT versus nonactive and active controls on motor signs, respectively. Treadmill training presented moderate heterogeneity (I2 = 53%, n = 12), while CEPTT presented no heterogeneity (I2 = 21%, n = 5).

Training location. Table 4 shows that only supervised facility-based ET versus nonactive and active controls had a small effect (g = − 0.48, 95% CI − 0.78 to − 0.18; p = 0.002) on motor signs, with substantial heterogeneity (I2 = 77%, n = 24).

Training intensity. Table 4 shows a large effect for self-selected intensity (g = − 0.95, 95% CI − 1.63 to − 0.27; p = 0.006), a moderate effect for intensity based on treadmill speed (g = − 0.79, 95% CI − 1.47 to − 0.10; p = 0.024), and a large effect for self-perceived exertion (g = − 1.30, 95% CI − 2.25 to − 0.35; p = 0.007) of ET versus nonactive and active controls. Self-selected intensity presented considerable heterogeneity (I2 = 85%, n = 3), while intensity based on treadmill speed and self-perceived exertion presented no heterogeneity (I2 = 0%, n = 4, and I2 = 33%, n = 2, respectively).

3.6 Meta-Regression

Table 5 shows individual and averaged values of the variables related to ET dosage across RCTs. The RCTs included an average of 3.3 sessions per week (frequency), 11.5 weeks of training (training period), 36.4 training sessions (number of sessions), and 41.3 min per session (duration of session).
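Total exercise time, expressed as number of sessions × session duration, can be computed directly from these dosage variables. The sketch below is only indicative: the function name is hypothetical, and multiplying the across-trial averages gives an approximate figure rather than the average of each trial's own total.

```python
def total_exercise_minutes(number_of_sessions, minutes_per_session):
    """Total exercise time in minutes (sessions x session duration)."""
    return number_of_sessions * minutes_per_session


# Product of the reported averages: 36.4 sessions of 41.3 min each.
approximate_average_total = total_exercise_minutes(36.4, 41.3)
```

Using the averages above, this gives roughly 1503 min of exercise per trial.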

Table 5 Values of the variables related to dosage of endurance training for each included study. Mean (SD) are shown

Table 6 shows that no variable related to ET dosage significantly predicted the decrease in motor signs (p > 0.05).

Table 6 Meta-regression of the 27 included studies to predict endurance training effects on motor signs

4 Discussion

4.1 Summary of Overall Evidence for Benefit of ET

To the best of our knowledge, this is the first systematic review with meta-analysis of RCTs assessing the effect of ET on motor signs in individuals with PD and testing for the significance of eight moderators and four meta-regressors. Data from 27 RCTs including 1152 participants were analyzed. Our results provide evidence that ET is effective in decreasing the motor signs regardless of the control condition used as a comparator. Although considerable heterogeneity was observed between RCTs (I2 = 74%, n = 27), the moderators that increased the effect of ET on motor signs also decreased the heterogeneity of the analyses, such as CEPTT (I2 = 21%, n = 5), intensity based on treadmill speed (I2 = 0%, n = 4), self-perceived exertion (I2 = 33%, n = 2), and studies composed of individuals with PD and FOG (I2 = 0%, n = 2). Finally, the meta-regression did not show significant relationships between variables related to ET dosage (frequency, training period, and number and duration of sessions) and observed changes in motor signs.

4.2 Improvement in Motor Signs After ET is Greater Than the Clinically Important Changes for Parkinson’s Disease (PD)

Our meta-analysis showed that ET decreases UPDRS-III scores (g =  − 0.42) when compared with the average of both control conditions (Fig. 3). When visually analyzing the within- and between-groups average of the RCTs that presented significant effects of ET on motor signs (see Table 2), we observed decreases of − 8.0 points and − 6.8 points on UPDRS-III scores, respectively. These decreases exceed the moderate range of clinically important changes to UPDRS-III scores of − 4.5 to − 6.7 points suggested for PD [70], which demonstrates that ET is effective in decreasing motor signs. Thus, the findings of this meta-analysis are an important contribution to establishing clinical guidelines for ET prescription among individuals with PD when improvements in motor signs are the major goal.

4.3 Improvement in Motor Signs Depends on High-Quality Methodological Randomized Controlled Trials (RCTs)

Ten RCTs [16, 17, 19, 24, 29, 31, 32, 35, 40, 41] had high methodological quality (PEDro scores > 6); of these, six RCTs [16, 17, 24, 29, 40, 41] showed significant effects of ET on motor signs. Our moderator analysis showed that the effect of ET on motor signs was larger when only RCTs with high methodological quality were analyzed (g = − 0.50) than when only RCTs with low methodological quality were analyzed (g = − 0.31). High-quality RCTs in clinical research are the primary source of evidence on the safety and efficacy of clinical interventions, and help avoid or diminish risk of bias in these trials [71]. A study with high methodological quality, according to the PEDro scale, must satisfy a series of indicators of strong internal validity (e.g., allocation concealment, random allocation, blinding of assessors and therapists), which might indicate relevance of findings for clinical practice. Thus, our findings show that the significant effects of ET on motor signs are not based on RCTs with low methodological quality; this strongly supports the necessity of using high-quality RCTs in future studies to verify the effect of ET on motor signs of PD.

Interestingly, all included RCTs showed a lack of blinded therapists/trainers and participants, which is consistent with previous RCTs of rehabilitation [72]. Blinding for therapists and participants is frequently reported in trials involving pharmacological interventions [73], but the nature of rehabilitation interventions (e.g., exercises and devices) makes it impossible to blind therapists and participants. Although blinding is also used to prevent performance bias associated with participants’ and research teams’ expectations, its influence on the treatment effect in RCTs is still unclear. For example, a meta-epidemiological study showed that trials with inappropriate blinding of participants tended to underestimate treatment effects when compared with studies with appropriate blinding of participants [72]. However, a recent large meta-epidemiological study found no evidence for an average difference in estimated treatment effect between trials with and without blinded participants, which indicates that blinding is less important than often thought [74]. As in the rehabilitation field, these findings are still inconclusive due to small sample sizes, heterogeneity of datasets, and lack of/poor reporting of bias [49, 72]. As such, the implementation of creative solutions to avoid performance and detection bias is still required. Some recommendations to minimize this performance bias include blinding of the testing staff to group allocation, withholding information on the hypothesized intervention efficacy from the study participants, and standardizing uniform data collection throughout all study phases [49].

4.4 ET has Effects on Motor Signs Assessed by UPDRS-III

Most of the studies (942 participants, n = 24) used the UPDRS-III scale, and in these studies the effect of ET on motor signs was larger (g = − 0.50). Our findings corroborate published meta-analyses that included few studies (seven) and showed positive effects of ET on UPDRS-III scores [38, 75]. Regarding the MDS-UPDRS-III scale, although the MDS-UPDRS Task Force revised and expanded the UPDRS in 2008 [3], there is still limited published evidence regarding its measurement performance on motor signs in RCTs of ET [13, 34, 35]. Only three RCTs (210 participants) assessed motor signs using MDS-UPDRS-III scores as the primary outcome [13, 34, 35], with no significant effect (g = − 0.06). One used the MDS-UPDRS-III score as a secondary outcome since the primary outcome was the UPDRS-III score [31]. Future RCTs using MDS-UPDRS-III to assess motor signs are needed.

4.5 Improvement in Motor Signs Occurs in the ON-Medication State

Most of the RCTs (820 participants, n = 24) assessed motor signs in the ON-medication state. Our moderator analysis showed a small effect of ET on motor signs assessed in the ON-medication state (g =  − 0.44), which agrees with two previous meta-analyses that only included a few studies (7 studies) and identified positive effects of ET on UPDRS-III scores assessed during the ON-medication state [38, 75]. Our moderator analysis shows that ET decreases UPDRS-III scores in the ON-medication state, which demonstrates the clinical relevance of ET, as the positive change of ET on motor signs was found over and above medication. This result is interesting because a single levodopa dose is known to decrease UPDRS-III scores [76], while one vigorous-exercise session does not decrease the UPDRS-III scores [77]. Our results demonstrate that long-term ET can decrease UPDRS-III scores in the ON-medication state, which reflects the impact of ET on an individual’s condition in real life [39, 78]. In addition, 12 weeks of ET is able to increase caudate dopamine release [13] and 24 weeks of ET can reduce disease progression in untreated individuals with PD [31]. Thus, one may suggest that ET would modify the course of disease and/or the need for levodopa. Further studies are required to investigate if ET should be indicated for newly diagnosed individuals with PD to slow progression of motor signs without the need for medication (e.g., levodopa).

Interestingly, our moderator analysis did not show a significant effect of ET on motor signs in the OFF-medication state, which reflects the true disease state and minimizes medication confounding effects (e.g., fluctuations in the motor signs due to the wearing-off phenomenon) [35, 39]. The small number of studies included in our moderator analysis may have influenced our results, as only four RCTs (332 participants) assessed the motor signs in the OFF-medication state [13, 28, 31, 35]. Thus, more studies are needed to assess the effect of ET on motor signs in the OFF-medication state.

4.6 Individuals with PD and Freezing of Gait (FOG) Benefit More from ET

Most of the RCTs included individuals with PD but without FOG (1092 participants, n = 21), and presented a small effect of ET on motor signs (g =  − 0.35) compared with active and nonactive control conditions. Although only two RCTs [22, 27] included individuals with PD and FOG in the sample (60 participants), our moderator analysis showed that the effect of ET on motor signs increased (g =  − 0.69) when these two RCTs were analyzed, which presented no heterogeneity (I2 = 0%).

Individuals with PD and FOG have greater motor severity than those without FOG [69]. Thus, we suggest that individuals with FOG show a greater effect from ET than those without FOG because they have a larger window for improvement of motor signs. A recent meta-analysis demonstrated that treadmill training is effective in decreasing FOG severity [79]. Here, we extend these findings to the motor signs of individuals with FOG. One study used treadmill training with cues (visual and auditory) [22], which is effective in improving motor signs, gait parameters, and FOG, as auditory, visual, and somatosensory cues, coupled with the treadmill, may compensate for poor internal rhythm of the basal ganglia as well as enhance motor skill learning through task-specific repetition [80, 81]. Another study that used BWSTT with physical therapy showed an improvement in motor signs in individuals with FOG and mild-to-severe PD [27]. BWSTT allows safe walking practice by mechanically supporting a portion of the body weight during treadmill training [82]. Thus, BWSTT might enable longer durations of exercise, reducing the energy cost of walking and fatigue, increasing walking speed, and decreasing fall risk [16, 25, 42, 44, 83]. These benefits of BWSTT can be important for individuals with severe FOG, who are less responsive to exercise training [84, 85]. Thus, BWSTT may be important for individuals with severe PD and FOG, but implementing BWSTT protocols in clinical practice may be challenging as it is costly and requires well-structured facilities. Feasible, viable, and safe low-cost interventions that are easily transferable to clinical practice (e.g., treadmill training with cues) should be used for individuals with FOG to improve their motor signs. Although these results did not present heterogeneity, they should be interpreted with caution because we obtained effect sizes from only two studies.

4.7 Effect of ET on Motor Signs is Greater When Compared with a Nonactive Control Condition

Only six out of 27 RCTs (266 participants) [15, 19, 21, 25, 29, 31] used a nonactive control group as a comparator. Nonactive control groups help to clarify the actual magnitude of the effect of exercise interventions, as our moderator analysis shows that the effect of ET on motor signs was larger when ET was compared with nonactive control groups (g = −0.68). Nevertheless, the use of nonactive control groups may be ethically unviable because individuals with PD experience deterioration in functionality over time [86]. Thus, we suggest that future RCTs use active control groups as a comparator, even though the effect on motor signs is smaller when ET is compared with active control groups (g = −0.34). This smaller effect is expected, since active control conditions are traditionally interventions already known to be effective, which reduces differences between groups [87]. It is important to note that active control interventions can be subject to contamination because therapists can gradually change the content of the training to resemble the intervention of the experimental group; thus, study protocols should guard against this in order to maintain good methodological quality [86].

4.8 Improvement in Motor Signs Depends on Specific ET Modes

Only treadmill training (g = −0.29) [15, 19, 20, 22, 24, 30–34, 40, 41] and CEPTT (g = −0.76) [16, 18, 27, 44, 45] showed significant effects on motor signs, but only CEPTT presented low heterogeneity (I2 = 21%, n = 5).

Twelve out of 27 RCTs (850 participants) used treadmill training [15, 19, 20, 22, 24, 30–34, 40, 41]. Treadmill training causes a proprioceptive cueing effect due to the backward motion of the belt, which may help individuals with PD to improve the swing phase of the gait cycle and to drive the stepping pattern [88]. Treadmill training allows programming of a wide range of speeds, so individuals with PD can exercise at slow and fast speeds [88], which can increase gait speed and stride length and decrease bradykinesia and rigidity. Treadmill training, but not free walking training, is able to maintain the cadence of individuals with PD compared with healthy controls [89]. This is important because decreased cadence is a potential risk factor for falls in this population [90]. In addition, treadmill training may require an adapted control mode (e.g., coordination patterns, body orientation, and balance stability) that is important to challenge the locomotion of individuals with PD [91, 92]. Interestingly, treadmill training has also been implicated in locomotor skill acquisition and retention [93], which is related to motor learning and can also play a role in the improvement of motor signs via basal ganglia plasticity [94]. Thus, our moderator analysis reinforces the benefits of treadmill training on motor signs of PD. Future studies aiming to refine our understanding of ET in PD should consider training on a treadmill.

Five out of 27 RCTs (132 participants) used CEPTT [16, 18, 27, 44, 45], such as BWSTT or stationary bike training combined with physical therapy that included resistance, stretching, and balance exercises. Our moderator analysis showed that CEPTT increased the effect on motor signs (g = −0.76), with low heterogeneity (I2 = 21%). The complexity of PD and its wide range of signs require a comprehensive approach to the patient, which should include different exercise strategies (e.g., resistance and balance exercises) performed in the same exercise session. For example, increases in muscle strength and in the rate of torque development of the lower limbs [95] predict a decrease in UPDRS-III scores. In fact, ET is not as effective as resistance exercises in improving neuromuscular variables (e.g., muscle strength) in the elderly [96] and in individuals with PD [30]. In addition, ET alone is not effective in improving postural instability and balance in individuals with PD [24, 41]. Taken together, our meta-analysis findings indicate that combining ET with resistance and balance exercises may be promising in decreasing motor signs (e.g., bradykinesia, rigidity, tremor, postural instability, gait disturbance, imbalance, upper and lower limb functions) assessed by UPDRS-III.

4.9 Improvement in Motor Signs Depends on Supervised Facility-Based ET

Most of the included RCTs (787 participants, n = 24) showed small and significant effects of supervised facility-based ET on motor signs (g =  − 0.48). A previous meta-analysis showed that facility-based exercise training and specialized supervision led to greater improvements in balance and gait ability than community- and home-based exercise training in individuals with PD [97]. Our moderator analysis expands these results to motor signs of PD. Supervised facility-based exercise may have encouraged individuals with PD to perform the ET at the prescribed training intensity. In fact, verbal encouragement by exercise specialists in both sprint and endurance activities results in large improvements in performance and motivation to exercise [98]. Interestingly, supervised sprinting with verbal encouragement has recently been proven to be a feasible and biomechanically reliable intervention for individuals with mild to moderate PD [99]. Thus, the analyses of our moderators suggest that supervised ET may enhance the effects of ET on PD motor signs.

The three RCTs that used either supervised home-based ET [19, 35] or community-based ET without direct supervision [31] showed no significant improvements (g = −0.15 and g = −0.25, respectively). Although our meta-analysis does not provide support for supervised home-based ET, it is important to note that the study of van der Kolk et al. [35] did show a between-group difference of 4.2 points on the MDS-UPDRS part III at 6 months. This value exceeded the pre-specified clinically relevant value of 3.5 points [100]. As such, further studies of home-based interventions that provide more support for the participants and have larger sample sizes are needed.

4.10 Improvement in Motor Signs Depends on Alternative Methods for Prescribing High-Intensity Exercise

Our moderator analyses showed that the effect of ET on motor signs versus control conditions increased when using alternative methods for prescribing high-intensity exercise, such as self-selected intensity (g =  − 0.95), intensity based on treadmill speed (g =  − 0.79), and self-perceived exertion rate (g =  − 1.30), but only the last two presented low heterogeneity (I2 = 0%, n = 4, and I2 = 33%, n = 2, respectively).
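The I2 values quoted for these moderators express the percentage of between-study variability that exceeds what sampling error alone would produce. The sketch below illustrates the standard calculation from Cochran's Q, assuming inverse-variance weights; the function name `i_squared` and the effect sizes and variances are hypothetical and do not correspond to the RCTs analyzed here.

```python
def i_squared(effects, variances):
    """Return (Q, I2 in percent) for a set of study effect sizes.

    Uses inverse-variance weights and I2 = max(0, (Q - df) / Q) * 100.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    # Inverse-variance weighted mean effect
    pooled = sum(w * g for w, g in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled effect
    q = sum(w * (g - pooled) ** 2 for w, g in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Hypothetical example: two studies with identical effects (no heterogeneity)
q_same, i2_same = i_squared([-0.7, -0.7], [0.1, 0.1])
# Hypothetical example: two studies with opposite effects (high heterogeneity)
q_diff, i2_diff = i_squared([-1.0, 1.0], [0.1, 0.1])
print(f"I2 = {i2_same:.0f}% and I2 = {i2_diff:.0f}%")
```

Because Q has few degrees of freedom when only two to five studies are pooled, I2 is imprecise in such subgroups, which is one reason small-subgroup results call for cautious interpretation.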

Three out of 27 RCTs including 80 participants [17, 42, 43] used self-selected intensity, in which exercisers can regulate the intensity based on an intuitive sense of effort and exertion and affective responses (pleasure and displeasure) [53]. A previous study [101] demonstrated that individuals with PD without any diagnosis of cardiac disease, who exercise either with imposed intensity or self-selected intensity, have similar cardiovascular (e.g., blood pressure) and psychophysiological (e.g., perceived exertion and feeling) responses as well as exercise workloads and the percentage of HRmax during 25 min of aerobic exercise. Self-selected intensity gives a sense of control when performing ET, as individuals can either exercise within their capabilities or select a higher intensity if desired [102]. As a result, self-selected intensities generally chosen by individuals with PD [101] or individuals with chronic disease (e.g., heart disease and obesity) [53] are within the high intensity range recommended by the ACSM (≥ 70% of HRmax). Our meta-analysis demonstrates that individuals with PD who exercise with self-selected intensity have decreased motor signs after ET. These results have a practical applicability to the ET performed with self-selected intensity for individuals with PD without cardiac disease who do not have access to more precise exercise prescription via the maximal exercise test [53, 101].

Four out of 27 RCTs including 106 participants [22, 27, 44, 45] used intensity based on treadmill speed, which is primarily designed to improve an individual’s walking speed by training at the maximum tolerated treadmill belt speed [54, 103]. These studies [22, 27, 44, 45] used progressive speeds (varying from 0.13 to 0.83 m/s) with systematic and gradual increases over time, and showed that motor signs can be improved by speed-based ET practice. Studies have demonstrated that in chronic stroke survivors, aerobic capacity and gait parameters (e.g., walking speed and stride length) are more effectively increased with speed-based ET than with treadmill training without significant speed increases [104–106]. Thus, these results, which presented no heterogeneity, suggest that intensity prescription based on treadmill speed may be an efficient alternative for improving motor signs of PD.

Only two out of 27 RCTs [24, 40] totaling 62 participants used individuals’ self-perceived exertion rate assessed by the Borg scale, which showed a decrease in the UPDRS-III scores after treadmill training with postural perturbation [24] and after curved treadmill training with cues [40]. In these studies, the self-perceived exertion rate had a target range of 12–15 [24] and < 13 [40], defined as perceived exertion of ‘somewhat hard’ that corresponds to ~ 70% to 80% of HRmax, as the Borg scale is positively related to HRmax [107, 108]. A positive correlation was found in individuals with PD between the rate of perceived exertion and HR during a maximal progressive cycling exercise test [109]. Thus, the Borg scale may be an alternative way of monitoring and regulating the intensity of exercise in PD [110], although these results should be interpreted with caution because we obtained effect sizes from two studies only.

4.11 Dose–Response Relationship of ET for Motor Signs

Our meta-regression model presented no significant relationship of variables related to ET dosage (frequency, training period, and number and duration of sessions) with UPDRS-III scores (Table 6). ET performed with a high dosage (frequency = three times per week, training period ≥ 12 weeks, number of sessions ≥ 36, and duration of sessions ≥ 30 min) increases caudate dopamine release [13] and the expression of neurotrophic factors [111], and improves functional connectivity of brain motor pathways [112] and motor skill learning in PD [112, 113]. These neurophysiological benefits may continuously change brain functioning and, therefore, decrease motor signs. However, the treatment effect of higher ET dosages on motor signs is still unknown.

Interestingly, the 11 RCTs [16, 17, 20, 21, 24, 28, 29, 40, 41, 44, 45] with significant g values favoring ET (Fig. 3) had higher ET dosages (up to 64 weeks, up to 192 sessions, and up to 5760 min of exercise [number of sessions × session duration]) than the other RCTs (up to 24 weeks, up to 96 sessions, up to 4800 min of exercise, respectively), as demonstrated in Table 5. Thus, our results suggest, but do not confirm, that high ET dosage has positive effects on motor signs of PD. Future studies are needed to verify the dose–response relationship of ET and motor signs.
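The dosage figures above follow from simple arithmetic (number of sessions × session duration). The sketch below makes that calculation explicit; the function name `total_exercise_minutes` is hypothetical, and the weekly frequency and session duration passed in are illustrative assumptions rather than values from a specific RCT.

```python
def total_exercise_minutes(sessions_per_week, weeks, minutes_per_session):
    """Return (number of sessions, total minutes of exercise)."""
    n_sessions = sessions_per_week * weeks
    return n_sessions, n_sessions * minutes_per_session


# Lower bound of the "high dosage" profile: 3 sessions/week, 12 weeks, 30 min
low_sessions, low_minutes = total_exercise_minutes(3, 12, 30)
# Assuming 3 sessions/week of 30 min sustained for 64 weeks
high_sessions, high_minutes = total_exercise_minutes(3, 64, 30)
print(low_sessions, low_minutes, high_sessions, high_minutes)
```

Under these assumptions, the lower bound of the high-dosage profile amounts to 36 sessions and 1080 min, while 64 weeks at the same frequency and duration gives 192 sessions and 5760 min, consistent with the upper bounds reported for the RCTs favoring ET.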

4.12 Adverse Events in the RCTs Included

As detailed in Table 1, nine RCTs [16–19, 24, 31, 35, 40, 41] reported adverse events during the study period, mostly unrelated to the intervention. Six of them [17–19, 31, 40, 41] reported non-injurious falls, low-back pain, palpitations, and pain in the extremities. One study reported an increased number of falls (20 falls) during Nordic walking training with visual and auditory cues [17]. One study [19] reported a withdrawal due to chest pain during home-based treadmill training. Although a low number of adverse events was reported, the safety of the intervention should be considered in future RCTs. Factors that increase fall-related injuries should be minimized by matching exercise type to an individual’s physical capabilities, and by exercising during the ON-medication state and at the best time of the day according to the individual’s perception. Cardiovascular testing needs to be considered before starting ET if there is any evidence of potential cardiac abnormalities, with the goals of preventing cardiac events during the intervention and of having a clear idea of an individual’s HRmax. We have previously demonstrated that some individuals with PD have a blunted heart rate response [114] and impaired blood pressure control [115], and both factors could potentially affect the safety and efficacy of ET.

4.13 Limitations and Strengths of This Review

This review had two limitations: (i) we did not explore the prolonged persistence of the ET treatment effect on motor signs through follow-up data; and (ii) the (MDS-)UPDRS-III is the gold standard clinical assessment for PD motor signs, but it may not reflect actual performance in daily life [116]. Thus, we suggest that future studies investigate the use of wearable sensors (individuals using inertial sensors at home), which can serve three potentially useful purposes. First, they can be used to provide secondary outcome measures (e.g., number of steps walked) [117]. Second, they can provide controlled data to show that the benefits of the exercise intervention were not caused by enabling individuals to be more active outside of the exercise intervention. Third, they can provide surrogates for PD measures (e.g., rigidity and postural instability), although the feasibility of using sensors needs to be proven in future large trials [118].

The strengths of this review are as follows: (i) we used eight moderators that affected the magnitude (effect size) of the changes in PD motor signs after ET and decreased the heterogeneity across RCTs (e.g., samples composed of individuals with PD and FOG, and CEPTT); (ii) we found that only RCTs of high methodological quality showed significant effects on motor signs; (iii) we observed that ET is effective in decreasing motor signs when comparing the decrease (within- and between-group differences) in the UPDRS-III scores across RCTs with the moderate range of clinically important changes to UPDRS-III scores suggested for PD [70]; and (iv) we presented both the objective results from the meta-analysis and a careful interpretation of studies that clearly show benefits of ET on progression but do not show a clear improvement in the signs of the disease [31, 35]. For example, Schenkman et al. [31] demonstrated superior benefits of high-intensity ET on motor signs compared with usual care in untreated individuals with PD. In this study, there was a difference of 4.1 points on MDS-UPDRS-III scores between the high-intensity group and the usual-care group after 6 months, which is clinically relevant because signs of PD can progress by as much as 4.2 points (in untreated and treated individuals in the OFF state) to 6.3 points (in untreated individuals) when assessed by the MDS-UPDRS-III [7]. These data therefore suggest that high-intensity ET may slow disease progression, since this difference is close to the amount by which individuals with PD progress. The current SPARX3 study has been designed to further address this important issue (ClinicalTrials.gov identifier NCT04284436; available online at https://clinicaltrials.gov/ct2/show/NCT04284436). Likewise, supervised home-based ET performed at high intensity [35] did show a between-group difference of 4.2 points on the MDS-UPDRS part III at 6 months. This value exceeded the pre-specified clinically relevant value of 3.5 points in treated individuals with PD [100]. Thus, future RCTs are needed to assess the efficacy of ET in slowing PD progression.

5 Conclusion

ET is effective in decreasing UPDRS-III scores, regardless of the control condition used as a comparator. This decrease exceeds the moderate range of clinically important changes to UPDRS-III scores suggested for PD. Although questions remain on the dose–response relationship of ET and reduction in motor signs, higher ET dosage (up to 64 weeks and up to 5760 min of exercise [number of sessions × session duration]) presented higher effect sizes. Thus, future studies are needed to verify the dose–response relationship of ET and motor signs.