Key Points

Control groups used in exercise and depression randomised clinical trials experience large improvements in depressive symptoms.

The improvement in control groups equates to approximately double that reported in control groups of antidepressant studies.

Exercise has to ‘beat’ a powerful control group arm in order to demonstrate its effectiveness for depression.

1 Introduction

Major depressive disorder (MDD) is a serious public health concern, ranking second in the top ten causes of years lived with disability in all countries in the recent Lancet review [1]. Moreover, MDD and sub-threshold depressive symptoms are pervasive, affecting people of all ages and sexes and across a broad spectrum of chronic conditions [2]. Treatment of depression is multifaceted, often including both pharmacological therapy (e.g. antidepressants) and psychotherapies (e.g. cognitive behavioural therapy). However, remission rates following first treatment are often poor (approximately 37 %), and many patients require further intervention [3]. In recent years, the notion of ‘exercise is medicine’ for people with depression has received considerable attention [4–6], and treatment guidelines have proposed exercise as a therapeutic strategy for managing depressive symptoms [7]. The benefits of exercise for people with depression extend beyond alleviating depressive symptoms [4]. For instance, people with depression are at increased risk of diabetes mellitus [8] and premature mortality from cardiovascular disease [9] and, in the general population, exercise is broadly as effective as pharmacological interventions in preventing cardiovascular disease and related premature mortality [10]. Both depressive symptoms and cardio-metabolic disease are associated with lower levels of physical activity participation, compounding the risk of high levels of sedentary behaviour among people with depression [11].

Evidence regarding the antidepressant effect of exercise for people with depression is mixed. Whilst some randomised controlled trials (RCTs) have reported a positive impact [6, 12], others have found no significant treatment effect [13, 14], and it is important to understand why these outcomes are inconsistent [15, 16].

RCTs using the comparison of active interventions with control groups that receive either placebo, treatment as usual (TAU) or a non-active intervention, have been the basis of evidence-based medicine for many years. In the case of medication, control groups typically receive an inert pill with no active ingredient as the placebo [15] and, in order to demonstrate its effectiveness, the active medication has to show evidence of a treatment effect beyond that seen in the placebo response. The placebo response is defined as the change in illness symptoms occurring during a clinical trial in patients randomised to receive placebo [17, 18]. There are concerns about increasing evidence of pronounced placebo responses in studies of antidepressant medication, making it more challenging for novel medications to demonstrate effectiveness [17, 18]. A large placebo response has also been reported in non-pharmacological interventions for depression, with a recent meta-analysis demonstrating an effect size of 0.82 [95 % confidence interval (CI) 0.63–1.00] for control group responses in trials of repetitive transcranial magnetic stimulation [19].

Recently, a meta-analysis [20] quantified the placebo response for psychological outcomes in exercise studies in the general population, finding a mean effect size of 0.20 (95 % CI 0.02–0.41). In exercise trials among people with depression, control group participants do not receive a conventional placebo. For instance, many people in control groups are in receipt of antidepressant medication and/or receiving psychotherapy. Thus, TAU is frequently a proven and effective ‘intervention’ for depressive symptoms. Therefore, while RCTs of antidepressant medication typically contend with placebo responses of inert pills, exercise interventions, when added to TAU, have to demonstrate greater symptom reductions than potent control interventions, and failure to do so may be interpreted as a lack of efficacy. Thus, in exercise studies among patients with depression, it is important to quantify the control group response, and to consider the ‘actual’ effect of exercise interventions. This information is critical in designing appropriate control interventions, in guiding clinical decision making, and in facilitating policy makers in funding cost-effective interventions.

A recent Cochrane review [21] found that exercise improved symptoms of depression [standardised mean difference (SMD) −0.62, 95 % CI −0.81 to −0.42]. However, the results were criticised for questionable restrictions to high quality trials resulting in a shrinkage of the effect size [22]. Moreover, the extent to which control group responses reduced estimates of the effectiveness of exercise interventions was not considered. Therefore, we conducted the first systematic review and meta-analysis to quantify the control group response in studies of exercise interventions for people with depression. Additionally, we used sensitivity and meta-regression analyses to identify factors that may contribute to the control group response.

2 Methods

This systematic review adhered to the MOOSE (Meta-analysis of Observational Studies in Epidemiology) guidelines [23] and PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement [24], following a predetermined published protocol (PROSPERO registration CRD42015025333).

2.1 Inclusion Criteria

Included in this meta-analysis were studies that:

  1. Included adult participants with a primary diagnosis of MDD according to established criteria (e.g. DSM-IV [25] or ICD-10 [26]) or those with increased depressive symptoms determined by a validated screening measure (e.g. Hamilton Depression Scale [HAM-D] [27], Beck Depression Inventory [BDI or BDI-II] [28]). We also included studies meeting our criteria among people with MDD who had other co-morbid diagnoses such as bipolar disorder and dysthymia; inclusion of these studies was decided among three reviewers (BS, FS, SR).

  2. Measured depressive symptoms pre- and post-intervention using a validated measure (e.g. HAM-D, BDI).

  3. Were RCTs investigating exercise as the active arm of the trial.

  4. Included a control group, defined as receiving TAU (usual care, antidepressants, psychotherapies, or electroconvulsive therapy) or wait-list control conditions or, in line with the recent Cochrane review [21], any light-intensity activity programme (e.g. stretching, walking).

  5. Were published in a peer-reviewed journal or as a PhD dissertation.

2.2 Information Sources and Searches

Articles were identified in a two-step strategy. First, three authors (BS, FS, SR) reviewed all articles identified (both included and excluded with reasons) by the recent Cochrane review on exercise for depression [21]. Second, three independent reviewers (BS, FS, SR) searched Academic Search Premier, MEDLINE, Psychology and Behavioral Sciences Collection, PsycINFO, SPORTDiscus, CINAHL Plus and PubMed without language restrictions from January 2013 until 1 July, 2015, using the key words ((exercis* OR aerobic* OR running OR jogging OR walk* OR hiking OR swim* OR aquatic* OR cycling OR bicycl* OR strength* and activit* OR fitness OR train* OR “physical medicine” OR resistance OR lift*) AND (depression OR dysthymia)). In addition, reference lists of all eligible articles of recent reviews investigating the effectiveness of exercise versus control were screened to identify potentially eligible articles [21, 29, 30].

2.3 Study Selection

In the first stage of the search strategy, three authors (BS, FS, SR) determined potentially eligible articles meeting criteria from the Cochrane review [21]. In the second stage, after removal of duplicates, two independent reviewers screened titles and abstracts of all potentially eligible articles. Three authors applied the eligibility criteria, considered the full texts and a final list of included articles was reached through consensus.

2.4 Outcomes

Our primary outcome of interest was the mean change in depressive symptoms within the control group from baseline to the end of the intervention, defined as the ‘control group response’.

2.5 Data Extraction

Two authors (FS, SR) independently extracted data using a data extraction form, including study design, geographical location, details of control group participants [mean age, % females, presence of clinical comorbidities such as cardiovascular, metabolic, neurological conditions (yes/no)], psychopharmacologic treatment, including percentage on antidepressants, and details regarding depression diagnosis. We also extracted data on the publication type (peer-reviewed journal article or dissertation), the duration of the control intervention, type of control condition [i.e. wait list/no intervention, placebo pill, antidepressants, psychotherapies, light-intensity activity/stretching, usual treatment/routine care/counselling, meditation, light therapy, electroconvulsive therapy (ECT)], and drop-out frequencies in the control group. Finally, we extracted data on the mean and standard deviation (SD) of the pre- and post-test depressive symptom rating scale scores for the control group (primary outcome). If these were not available, we used the mean change and SD from pre- to post-test if reported within the study.
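To illustrate how extracted pre- and post-test means and SDs can be turned into a within-group effect size, the following Python sketch computes a standardised mean change. The standardiser used here (the pooled pre/post SD) is one common convention and is an assumption; the software used in this review may apply a different standardiser or incorporate a pre-post correlation. The function name and the input values are hypothetical.

```python
import math

def standardised_mean_change(mean_pre, sd_pre, mean_post, sd_post):
    """Within-group SMD: (post - pre) / pooled pre/post SD.

    Negative values indicate a reduction (improvement) in symptoms.
    Assumption: pooled SD = sqrt((sd_pre^2 + sd_post^2) / 2).
    """
    pooled_sd = math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2.0)
    return (mean_post - mean_pre) / pooled_sd

# Hypothetical control group: HAM-D falls from 20 to 14, SD 6 at both times.
smd = standardised_mean_change(20.0, 6.0, 14.0, 6.0)
print(round(smd, 2))
```

With these illustrative inputs the change of 6 points equals one pooled SD, so the sketch returns an SMD of −1.0.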

2.6 Risk of Bias and Quality Assessment

Three authors (FS, JR, BS) assessed studies for the presence of high, low or unclear risk of bias according to the Cochrane Handbook definition [31]. The risk of bias was assessed on the following items: random sequence generation, allocation concealment, blinding of participants, blinding of those delivering the intervention, blinding of outcome assessors, incomplete outcome data, selective reporting and other sources of bias. Studies presenting adequate allocation concealment, complete presentation of outcome data (intention-to-treat analysis) and blinding of outcome assessors were considered to be at low risk of bias (high quality trials). The criteria selection was based on previous studies [21].

2.7 Meta-Analysis

Because of the anticipated heterogeneity, we utilised a random effects meta-analysis and calculated SMD and 95 % CIs as the effect size measure (ES). The meta-analysis was conducted in the following steps. First, we calculated the SMD statistic together with 95 % CI to establish the control group response in our composite analysis across all studies using Comprehensive Meta-Analysis software (CMA; Version 3, Biostat, Englewood, New Jersey). We subsequently conducted a sensitivity analysis computing the control group response in studies using a wait list or TAU only in order to reduce the potential for any active influence on the control group response that may be present in other control groups employed. Second, we conducted meta-regression analyses with CMA to investigate potential moderators of the control group response. Potential moderators of interest were chosen that have previously been associated with responses in exercise RCTs: mean age, sex, sample size, study publication year, duration of control group and baseline depressive symptomology. We also investigated the influence of frequency of adherence and drop-out in the control groups. We conducted categorical meta-regression analyses to investigate the potential impact of risk of bias, comparing studies with a low risk of bias with those with a high or unclear risk of bias on key study design factors including allocation concealment, intention-to-treat analysis and blinding of assessors. Next, we conducted subgroup analyses to compare control group response according to depression diagnosis (MDD versus depressive symptoms), study setting (inpatient, outpatient, mixed), type of publication (peer-reviewed article or thesis), high quality (low risk of bias) versus low quality, and presence of comorbidity in the study participants (yes or no). Within each subgroup analysis, we also calculated the control group response restricted to wait list/usual care control groups.
Finally, in line with a meta-analysis investigating the placebo response in depression RCTs [17], we investigated mean changes in depressive symptoms with the HAM-D and BDI. Heterogeneity was assessed with the Cochran Q and I² statistics for each analysis [32]. Publication bias was assessed with a visual inspection of funnel plots and with the Begg-Mazumdar Kendall’s tau [33] and Egger bias test [34]. Moreover, for the main composite analysis we conducted a trim and fill adjusted analysis [35] to remove the most extreme small studies from the positive side of the funnel plot, and recalculated the effect size at each iteration, until the funnel plot was symmetric about the (new) effect size. Finally, we calculated the fail-safe number of negative studies that would be required to nullify (i.e. make p > 0.05) our main effect size [36].
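The pooling and heterogeneity steps described above can be sketched in Python as follows. This is a minimal illustration that assumes the DerSimonian-Laird estimator for the between-study variance; the text does not state which estimator CMA applied, and the function name and input values are hypothetical rather than study data.

```python
import math

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling (assumed estimator).

    Returns the pooled effect, its 95 % CI, Cochran's Q, I^2 (%) and tau^2.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                         # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }

# Hypothetical within-group SMDs and their variances (not study data).
result = random_effects_pool([-1.2, -0.5, -0.9], [0.04, 0.04, 0.04])
print(result)
```

Because the random-effects weights add the between-study variance to each study's own variance, small and large studies are weighted more evenly than under a fixed-effect model, which is why this approach suits analyses where heterogeneity is anticipated.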

3 Results

3.1 Search Results

In the first stage of our search strategy, 35 RCTs were identified from a previous review [21]. In the second stage, following the removal of duplicates, we identified 819 potentially relevant articles from our searches. At the full-text review stage, we reviewed 76 articles (35 from stage 1 and 41 from our searches in stage 2) and 30 were excluded with reasons (details summarised in Fig. 1). There were 46 full texts that met the eligibility criteria. Of these, 41 had complete data to enable inclusion within our meta-analysis, including 32 [12, 21, 37–66] from a previous review [21] and nine new RCTs [14, 67–74].

Fig. 1 Flowchart of study selection

3.2 Characteristics of Included Trials and Participants

Across the 41 studies, 1122 adults with depression were included in the control groups of the included studies [mean age 49.8 years (±18), 63 % female (range 17–100 %)]. The mean proportion of antidepressant use across the included studies was 38.1 % (range 0–100 %). Overall, 18 studies [12, 14, 37, 43–45, 51, 53, 54, 56, 65, 67–70, 72, 74, 75] confirmed a diagnosis of MDD, whilst 19 studies included participants with depressive symptoms [38, 40–42, 46–50, 54, 55, 58–62, 66, 71, 73], and a further four included participants with MDD together with a number of people with other mood disorder diagnoses. Specifically, Bonnet [39], Singh et al. [64] and Singh et al. [63] included participants with MDD or dysthymia, and Knubben et al. [52] included some participants with bipolar disorder. The majority of included studies involved outpatients with depression (n = 36), were published in peer-reviewed journals (n = 35), and included people without reported clinical comorbidities (n = 35). The most commonly used measures of depressive symptoms were the HAM-D (n = 15), BDI (n = 10) or BDI II (n = 3). Most control groups used a waiting list (n = 11) or usual care (n = 15). Full details of the included studies, participant details, control group condition and depressive symptom measurement are presented in Table 1.

Table 1 Summary of included studies

3.3 Risk of Bias

Overall, 11 studies were judged to be of good methodological quality and at low risk of bias [12, 14, 37, 38, 43, 53, 54, 67, 69, 73, 74] and the remaining 30 were of low quality (high risk of bias). Full details of the risk of bias are presented in the electronic supplementary material, Table S1.

3.4 Main Analysis—Control Group Response Across All Studies

Data were pooled from 41 studies and showed that the control groups experienced a large, significant improvement (i.e. reduction) in depressive symptoms (SMD −0.920, 95 % CI −1.11 to −0.729, p < 0.001; Q = 634, p < 0.01) (Fig. 2). The Begg test (tau −0.32, p = 0.003), but not the Egger test (intercept 1.5, p = 0.2), indicated publication bias. Therefore, the effect size was recalculated using Duval and Tweedie's trim and fill method, with five studies being adjusted, giving a new effect size of −1.09 (95 % CI −1.31 to −0.873, p < 0.001).

Fig. 2 Meta-analysis of control group response in all studies. CI confidence interval, Std Diff standardised difference

The fail-safe number of additional negative studies (i.e. studies reporting no changes in control group improvement in depressive symptoms) required to nullify the significance of the main analysis was high: 7886 studies with negative results would be required to nullify the large significant result.
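For readers unfamiliar with the fail-safe number, the Python sketch below shows one way such a figure can be derived. It assumes Rosenthal's approach, in which null studies (Z = 0) are added until the combined Stouffer Z no longer exceeds a critical value; the critical value of 1.96 and the input Z scores are assumptions for illustration and are not taken from the included studies.

```python
import math

def fail_safe_n(z_scores, z_critical=1.96):
    """Rosenthal-style fail-safe N (assumed method).

    Smallest number of additional null (Z = 0) studies for which the
    combined Z, sum(Z) / sqrt(k + N), no longer exceeds z_critical.
    Z scores are entered as magnitudes in the direction of the effect.
    """
    k = len(z_scores)
    total = sum(z_scores)
    n = (total / z_critical) ** 2 - k
    return max(0, math.ceil(n))

# Hypothetical Z scores from three studies (not study data).
print(fail_safe_n([3.0, 2.5, 4.0]))
```

The number grows with the square of the summed Z scores, so a pooled analysis of many studies with consistent effects can yield a fail-safe number in the thousands.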

3.5 Sensitivity Analysis Investigating Control Group Response in Wait List and Treatment as Usual

In our sensitivity analysis, the control group response in 26 RCTs utilising wait list or usual care was a SMD of −0.642 (95 % CI −0.853 to −0.431; Q = 332, p < 0.01). There was no evidence of publication bias (Egger 0.35, p = 0.7, Begg −0.26, p = 0.055). The fail-safe number was 2241 studies.

3.6 Meta-Regression of Control Group Response in Main Analysis

With borderline significance (p = 0.05), higher mean age of control group participants (β = 0.0226, 95 % CI −0.0009 to 0.0461; z = 1.88) was associated with a smaller reduction in depressive symptoms in the control group and explained some heterogeneity (R² = 0.13). A higher percentage of control group drop-outs was associated with a larger reduction in depressive symptoms (i.e. greater control group improvement) (β = −0.023, 95 % CI −0.0339 to −0.0134; z = −4.61, p = 0.001, R² = 0.16). Baseline depressive symptomology was not related to the control group response (β = −0.0058, 95 % CI −0.0342 to 0.0226, z = −0.4, p = 0.688). A diagnosis of MDD moderated a larger improvement in depressive symptoms in the control group compared with studies of participants with depressive symptoms (β = −0.6416, 95 % CI −1.0430 to −0.2402, z = −2.17, p = 0.01, R² = 0.10). A multivariate meta-regression model established that the percentage of drop-outs in the control group (β = −0.0211, 95 % CI −0.0388 to −0.0033, z = −2.32, p = 0.01), but not a diagnosis of MDD (β = −0.5543, 95 % CI −1.1735 to 0.0648, z = −1.72, p = 0.07) or mean age (β = 0.0159, 95 % CI −0.0042 to 0.0392, z = 1.63, p = 0.11), moderated a larger improvement in the control group response. The multivariate model explained about 30 % of the heterogeneity in the control group response effect size (R² = 0.30). A summary of all meta-regression analyses is presented in Table 2.
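To make the interpretation of the regression coefficients concrete, the following Python sketch fits a single-moderator meta-regression by weighted least squares. It is a simplified illustration: the between-study variance (tau2) is supplied by the caller rather than estimated, the function name is hypothetical, the data are invented, and the procedure used by CMA may differ.

```python
import math

def meta_regression(effects, variances, moderator, tau2=0.0):
    """Weighted least-squares meta-regression with a single moderator.

    Weights are 1 / (variance + tau2); tau2 is supplied by the caller.
    Returns the slope (beta), its standard error, z and the intercept.
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, moderator)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, moderator))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, moderator, effects))
    beta = sxy / sxx
    se = math.sqrt(1.0 / sxx)
    return {"beta": beta, "se": se, "z": beta / se,
            "intercept": y_bar - beta * x_bar}

# Hypothetical data: control group SMD by mean age in years (not study data).
fit = meta_regression(
    effects=[-1.2, -0.9, -0.6],
    variances=[0.05, 0.05, 0.05],
    moderator=[30.0, 45.0, 60.0],
)
print(fit)
```

Because improvement is expressed as a negative SMD, a positive slope for age (as in this invented example and in the age coefficient reported above) means that the control group response moves towards zero, i.e. less improvement, as mean age increases.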

Table 2 Meta regression of moderators of control group response

3.7 Mean Change in Depressive Symptoms

Data from 13 studies established that control group participants improved by −7.59 points (95 % CI −10.30 to −4.889, p < 0.001) on the HAM-D scale. A sensitivity analysis including only studies utilising a wait list or usual care control across seven studies established a mean improvement in depressive symptoms of −4.50 points on the HAM-D (95 % CI −8.18 to −0.864, p = 0.01). The mean control group improvement in depressive symptoms across ten studies using the BDI was −4.862 (95 % CI −8.374 to −1.350, p = 0.007), and −4.167 points (95 % CI −8.14 to −0.854, p = 0.01) in nine studies using a wait list or usual care.

3.8 Sensitivity Analyses

3.8.1 Control Group Response in Major Depressive Disorder (MDD) and Depressive Symptoms

Overall, the pooled control group response across 18 studies in participants with MDD (SMD −1.248, 95 % CI −1.58 to −0.91, Q = 317, p < 0.01, Begg −0.25, p = 0.13, Egger −4.13, p = 0.11) was larger than in studies considering depressive symptoms (SMD −0.520, 95 % CI −0.749 to −0.291, n = 19, Q = 209, p < 0.01, Begg −0.34, p = 0.04, Egger 1.41, p = 0.3) (see Fig. 3). The fail-safe number required to nullify the result was 3343.

Fig. 3 Control group response in MDD (1.00 top) and depressive symptoms (2.00 bottom) studies sub group analysis. CI confidence interval, MDD major depressive disorder, Std Diff standardised difference

The control group response remained significant in studies using wait list/usual care in MDD (SMD −0.886, 95 % CI −1.358 to −0.415, p < 0.01, n = 9) and in those studies investigating depressive symptoms (SMD −0.393, 95 % CI −0.636 to −0.150, p = 0.002, n = 15).

3.8.2 Meta Regression of Moderators of Control Group Response in MDD

Mean age (β = 0.0498, 95 % CI 0.0108–0.0888, z = 2.51, p = 0.01, R² = 0.34), the percentage of drop-outs (β = −0.0205, 95 % CI −0.0337 to −0.0073, z = −2.34, p = 0.02, R² = 0.25), baseline depressive symptoms (β = −0.0005, 95 % CI −0.0008 to −0.0001, z = 2.01, p = 0.002, R² = 0.03) and longer control group duration moderated the control group response in MDD studies (β = 0.0541, 95 % CI 0.0094–0.0988, z = 2.02, p = 0.01, R² = 0.01). When all were entered into a multivariate meta-regression model, none were significant moderators but the percentage drop-out demonstrated a marginal significance (β = −0.0214, 95 % CI −0.0668 to 0.0240, z = −1.89, p = 0.06) and the model explained about 20 % of the heterogeneity (R² = 0.18).

3.8.3 Meta Regression of Moderators of Control Group Response in Depressive Symptoms

No significant moderators were found (see Table 2).

3.9 Study Quality

We pooled the control group response for studies deemed at high/unclear risk of bias separately from those at low risk of bias and found a larger control group response in high quality studies (SMD −1.430, 95 % CI −1.771 to −1.090, p < 0.001; Q = 167, p < 0.01) than in lower quality studies (SMD −0.713, 95 % CI −0.925 to −0.498, p < 0.001; Q = 318, p < 0.01). In categorical meta-regression analyses, a high risk of bias rating on allocation concealment, intention-to-treat analysis or assessor blinding was not a significant moderator across all studies or in MDD studies alone (p > 0.05; the results of these meta-regression analyses are available from the first author upon request).

3.10 Study Setting

A larger control group response was observed in inpatient settings (SMD −1.783, 95 % CI −2.405 to −1.161, Q = 94, p < 0.01) compared with outpatient settings (SMD −0.884, 95 % CI −1.048 to −0.639, Q = 462, p < 0.01, n = 36). When restricted to wait list/usual care, the results remained significant in inpatient studies (SMD −1.99, 95 % CI −2.78 to −1.26, n = 2) and outpatient studies (SMD −0.542, 95 % CI −0.748 to −0.337, n = 23).

3.11 Control Group Response According to Type of Control Group Comparison

All of the control groups demonstrated a significant improvement in depressive symptoms (see Table 3).

Table 3 Meta-analysis results of control group responses

3.12 Comorbidity and Control Group Response

Studies conducted in participants without comorbidities (n = 35) demonstrated a larger effect size (SMD −0.978, 95 % CI −1.18 to −0.769) than those including people with comorbidities (n = 6, SMD −0.607, 95 % CI −1.09 to −0.115). The results remained significant for those without (n = 20, SMD −0.689, 95 % CI −0.94 to −0.438) and with comorbidities (n = 5, SMD −0.571, 95 % CI −1.05 to −0.08) when restricted to wait list/usual care only.

3.13 Type of Publication

Control group responses were evident in RCTs published in peer-reviewed journals (n = 34, SMD −0.932, 95 % CI −1.13 to −0.725) and in theses (n = 7, SMD −0.850, 95 % CI −1.34 to −0.357). The results were significant in peer-reviewed journals when restricted to wait list/usual care controls (n = 21, SMD −0.696, 95 % CI −0.920 to −0.472) but not in theses (n = 5, SMD −0.37, 95 % CI −0.886 to 0.13).

All sensitivity and subgroup analyses are presented in Table 3.

4 Discussion

The present review found consistent evidence that control groups in exercise RCTs experience pronounced improvements in depressive symptoms. This large control group improvement was evident across virtually all subgroup analyses, and was highest when we restricted the analyses to high quality studies. Moreover, a large control group response seems evident in participants with confirmed MDD. Of all control group ‘methods’ used, a waiting list (n = 11) appears to have the least improvement, although this is still in the medium effect size range. Meta-regression analyses demonstrate that increasing age is associated with less control group improvement, whilst a higher proportion of drop-outs is associated with a larger control group improvement. Moreover, higher depressive symptoms also appear to moderate a smaller control group response in people with MDD. Taken together, control groups improve by 7.5 points on the HAM-D scale across all studies, or 4.5 points in TAU/waiting list studies. This demonstrates that control groups in exercise RCTs show an approximate doubling in the improvement in HAM-D scores compared with observations for control groups in meta-analyses of antidepressant RCTs (4 points, [17]), with the National Institute for Health and Care Excellence (NICE) stating that an improvement of 3 points is clinically meaningful [77]. Thus, considering the substantial control group response that exercise RCTs must overcome to show a benefit in depressive symptoms and the benefits in absolute terms, critiques concerning the efficacy of exercise for depression (e.g. [13]) may be premature, particularly in light of the plethora of other health benefits of an active lifestyle [10, 78, 79].
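The comparison with antidepressant trials rests on simple arithmetic, which can be laid out in a few lines of Python. All figures are those quoted in the text above; the variable names are illustrative only.

```python
# Values quoted in the text (HAM-D points of improvement).
control_all = 7.59             # exercise RCT control groups, all studies
control_tau_wl = 4.50          # wait list / usual care controls only
antidepressant_placebo = 4.0   # control group change cited from [17]
nice_threshold = 3.0           # clinically meaningful change cited from [77]

# Ratio of exercise-trial control response to antidepressant-trial control response.
ratio_all = control_all / antidepressant_placebo
ratio_tau_wl = control_tau_wl / antidepressant_placebo

print(round(ratio_all, 2), round(ratio_tau_wl, 2))
print(control_all > nice_threshold, control_tau_wl > nice_threshold)
```

The ratio close to 2 corresponds to the all-studies estimate; with the wait list/usual care estimate the ratio is smaller, although both values exceed the 3-point threshold quoted from NICE.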

This meta-analysis has a number of strengths. First, the results are novel and address a critically important area, with MDD continuing to be a leading cause of disability worldwide [1]. Second, we conducted a comprehensive systematic review, critical appraisal and robust meta-analysis and meta-regression analyses. Third, our results were consistent across all analyses. We conducted numerous adjustments for publication bias, while the number of studies needed to nullify our results is extremely high, adding support to the robust nature of the current findings. A number of limitations should be considered when interpreting the findings of our meta-analysis. First, we encountered some heterogeneity in most of the analyses. However, we were able to explain large portions of this with our meta-regression analyses. Second, some important participant data were not available across most studies and it was therefore not possible to investigate the influence of these data (e.g. illness duration, psychotropic medication use). In addition, a lack of data in a number of RCTs meant they were not eligible for inclusion in our meta-analysis. Third, due to limitations in the dataset we could not consider the long-term control group responses in the RCTs in a systematic manner. An important area for future research is to consider the longer-term effects (e.g. 12 months after the study) in both the exercise and control arms of RCTs in depression. Finally, the time period that we examined encompassed significant changes in the diagnostic criteria used to identify patients with MDD and this may have influenced the results.

Nevertheless, allowing for the aforementioned caveats, our results are consistent and indicate that researchers assessing the effectiveness of exercise in RCTs for those with depression face a challenge in competing with the powerful improvements within control groups. Similar debate has occurred regarding the rising placebo response to ‘inert pills’ in antidepressant RCTs (e.g. [17, 18, 80]). One previous meta-analysis of 29 RCTs [19] has investigated control group response in a non-pharmacological intervention in MDD (repetitive transcranial magnetic stimulation). The authors [19] found a similar increased control group improvement (SMD −0.8) to ours but did not conduct robust meta-regression analyses. Thus, to our knowledge, our review provides the first comprehensive insight into the control group response in non-pharmacological interventions in depression.

A hypothesis for the larger control group response in inpatient trials might be that inpatients benefit from multidisciplinary treatment (which often includes pharmacotherapy, psychotherapy and even electroconvulsive therapy) and the therapeutic milieu and intensive attention from healthcare providers, regardless of how frequently they exercise during their inpatient treatment. The greater control group response in people with a formal MDD diagnosis might be due to a higher baseline symptom severity in these patients. Also, the percentage of drop-outs in the control group moderated the control group response. A recent correlational study [81] in the general population found that initial expectations predicted adherence to a 2-week walking programme. Therefore, it is plausible that participants with lower expectations are more likely to drop out from the study protocol than those who do expect a benefit. An alternative hypothesis might be that the supportive provision of empathy, a coherent narrative to understand one’s illness, and a therapeutic relationship are effective in themselves in the treatment of depression.

Ultimately, control group responses are important in elucidating, explaining and optimising the true antidepressant effects of regular physical activity in clinical settings. It is recommended that studies include real placebo groups in RCTs that are designed to examine the psychological effects of exercise. These could be traditional placebo (e.g. an inert placebo pill) or exercise placebo interventions, which is clearly a challenge. An exercise placebo intervention can be defined as an exercise intervention that is not generally recognised as efficacious, that lacks adequate evidence for efficacy, and that has no direct pharmacological, biochemical, or physical mechanism of action according to the current standard of knowledge [20]. An appropriate exercise placebo condition strategy might be to employ conditions that resemble some aspects of very low-intensity exercise, include equipment that passively moves the limbs of an individual (i.e. without an active muscle contraction) and imagery or hypnotic suggestion of exercise. It is, however, important that these kinds of exercise interventions are administered in a context that is believable to participants. Future research trials should also assess the role of a priori and subsequent expectations using validated questionnaires. Finally, an RCT is a limited research design for studying placebo effects in depression [82]. Due to the inability to blind participants to exercise training, expectations about the intervention are likely to introduce error to the observed effect of exercise [82] although this may increase the observed benefit of exercise also. A feasible alternative, that provides a better controlled estimate of the placebo effect, could be a between-subjects balanced-placebo design [20]. To date, no exercise studies in people with depression have attempted to use this design to study the size of placebo effects. 
Nevertheless, authors conducting exercise RCTs who find null effects should carefully consider our results for the control group response, and future studies should consult our results in an attempt to minimise control group responses. Overall, to more sensitively test for an antidepressant effect of exercise, studies need to use control groups with a less pronounced response and/or adjust for factors which moderate this effect.

5 Conclusion

We found consistent evidence that the control groups used in exercise studies experience large and significant improvements in depressive symptoms. Our results suggest that control group responses play a substantial role in the outcomes of studies investigating the impact of exercise for people with depression. The control group response in exercise studies is approximately double that reported in antidepressant meta-analyses.