Introduction

Numerous studies link physical activity (PA) and mental health. Research examining the influence of exercise interventions on clinical depression has been published for more than a century [1]. The link between PA and clinical depression has been documented [2]. Narrative reviews and meta-analyses have examined the sizable empirical literature testing the effect of PA interventions on subjects with clinical depression [3–8]. Longitudinal descriptive research suggests that PA may be effective for preventing clinical depression [2, 9].

Some depressive symptoms are common among people without clinical depression. Depressive symptoms include cognitive-affective and somatic symptoms [3, 10–12]. Depressive symptoms may precede clinical depression, which is related to health state and physical disease [12]. Many PA intervention studies conducted with healthy adults have measured depressive symptom outcomes because these symptoms are an important component of mental health and quality of life [2, 3, 10]. The mechanism by which PA may reduce depressive symptoms is not well understood [3, 12]. Biochemical or physiological explanations include endogenous opiates, endocannabinoids, brain neurotransmitters, anti-inflammatory cytokines, cerebral blood flow, and hypothalamic–pituitary–adrenal axis function [3, 12]. Hypothesized psychological mechanisms include distraction and enhanced self-efficacy, self-esteem, behavioral activation, sense of achievement/mastery, and self-determination [2, 5, 12]. The relationship between PA and depressive symptoms may be reciprocal. People who experience reduced depressive symptoms may be more likely to continue PA performance. Given the potential importance of depressive symptoms in adults without clinical depression, this meta-analysis synthesized the body of work examining depressive symptom outcomes of PA interventions.

Previous meta-analyses of PA interventions for subjects without clinical depression have focused on older adults [10], included diverse measures of negative mood outcomes without specifically addressing depressive symptoms [10, 13], or included not only studies that targeted subjects with clinical depression but also studies of participants without clinical depression [8, 11, 13, 14]. Only one previous meta-analysis examined the impact of PA interventions on depressive symptoms among subjects without clinical depression [14]. North et al. [14] synthesized depressive symptom outcomes of PA interventions from studies of subjects with and without clinical depression. They reported a standardized mean difference of 0.59 for depressive symptom scores among studies focused on adults without clinical depression. Their moderator analyses, however, pooled studies of subjects with and without clinical depression. They reported larger effect sizes in published studies (vs. unpublished reports), those with home-based exercise (vs. center-based PA), studies with random assignment of subjects, studies with more males, studies with clinically depressed subjects (vs. subjects without clinical depression), studies with resistance PA (vs. endurance PA), and studies with 21- to 24-week interventions (vs. less than 21 weeks or more than 24 weeks) [14].

No previous meta-analyses have reported moderator analyses focused on studies of subjects without clinical depression. Moderator analyses are important to determine if depressive symptom outcomes are associated with characteristics of samples, research methods, or interventions. For example, moderator analyses can determine if intervention effects are similar among women and men or across sample ages. This quantitative synthesis moves beyond previous reviews by focusing on studies of adults without clinical depression, by greatly expanding primary study search strategies to achieve a comprehensive synthesis, and by conducting moderator analyses. This meta-analysis addressed the following research questions:

  1. What are the overall effects of supervised PA and unsupervised PA interventions on depressive symptoms in healthy adults without clinical depression?

  2. Do interventions’ effects on depressive symptom outcomes vary depending on intervention, sample, and research design characteristics?

  3. What are the effects of interventions on depressive symptoms among studies comparing treatment subjects before versus after interventions?

Methods

We used standard procedures for conducting meta-analysis research to identify and obtain potential primary study reports, determine eligibility, extract data from reports, conduct meta-analysis statistical procedures, and interpret findings [15]. The project received approval from the university institutional review board for the protection of human subjects.

Primary Study Search Strategies

Multiple search strategies were used to ensure a comprehensive search and thus limit bias while moving beyond previous reviews [16–18]. An expert reference librarian searched 11 computerized databases (e.g., MEDLINE, PsycINFO, EMBASE) using broad search terms (sample MEDLINE intervention terms: adherence, behavior therapy, clinical trial, compliance, counseling, evaluation, evaluation study, evidence-based medicine, health care evaluation, health behavior, health education, health promotion, intervention, outcome and process assessment, patient education, program, program development, program evaluation, self care, treatment outcome, validation study; PA terms: exercise, physical activity, physical fitness, exertion, exercise therapy, physical education and training, walking). Search terms for depressive symptoms were not used to narrow the search because many PA intervention studies report depressive symptom outcomes without treating them as main outcomes, so the papers are not indexed by these terms. Several research registers were examined, including Computer Retrieval of Information on Scientific Projects and mRCT, which contains 14 active registers and 16 archived registers. Computerized author searches were completed for project principal investigators located from research registers and for the first three authors on eligible studies. Author searches were completed for dissertation authors to locate published papers. Ancestry searches were conducted on eligible and review papers. Hand searches were completed for 114 journals that frequently report PA intervention research.

Selection Criteria

We included studies with an outcome measure of depressive symptoms following PA interventions. Two types of PA interventions were studied: researcher supervised PA such that PA performance was verified, and interventions designed to increase unsupervised PA where the PA performance was not verified. All analyses were conducted separately for supervised PA and unsupervised PA studies. Ancillary intervention strategies that might directly affect depressive symptoms were uncommon and these studies were excluded. For example, studies were excluded if they used relaxation exercise, stress management strategies, cognitive-behavior therapy, etc. These strategies were found in a few interventions designed to affect multiple health behaviors and outcomes. These psychological interventions were excluded because the project focused on the link between depressive symptoms and supervised or unsupervised PA. Supervised PA studies were included if they measured depressive symptoms 1 to 7 days after the last exercise session. Studies with immediate measurement of depressive symptoms following an acute bout of exercise were excluded. Acute bouts of exercise may influence mood when mood is measured in the minutes immediately following the exercise. This synthesis was focused on changes in depressive symptoms beyond immediate effects of exercise on depressive symptoms.

Studies with diverse measures of depressive symptoms were included. Primary studies using self-report measures of cognitive-affective and somatic depressive symptoms were included (e.g., Beck Depression Inventory, Profile of Mood States depressive symptoms subscale, and Center for Epidemiologic Studies-Depression Scale). Only studies that measured depressive symptoms separately from other symptoms were included. For example, we excluded studies reporting overall mood disturbance and those which combined measures of anxiety and depression (unless they reported depressive symptoms separately). Samples included healthy adults without acute or chronic physical or mental illness. Studies targeting subjects with clinical depression or subjects that scored above a primary study specified depressive symptom criterion score were excluded.

English language studies reported between 1969 and 2008 were included. Both published and unpublished studies were included because the single most consistent difference between published and unpublished research is the statistical significance of findings. Meta-analyses with only published primary studies may overestimate the magnitude of the true population effect [18]. We included small-sample studies with as few as five treatment group subjects. Small-sample studies often lack statistical power to detect treatment effects but can contribute to meta-analysis findings. Studies were weighted so that those with larger samples had proportionally more impact on aggregate findings [19]. Pre-experimental and experimental studies were included. All analyses were conducted separately for two-group and single-group pre- and post-comparisons, with moderator analyses conducted exclusively on two-group comparisons because of validity concerns in single-group pre- and post-comparisons. We report the single-group findings only as ancillary information to the more valid two-group results.

Data Coding and Evaluation

A coding frame to extract primary study characteristics and outcomes was developed from previous meta-analyses on related topics, intervention components reported in empirical literature, and the research team’s previous meta-analyses. The coding frame was pilot tested with 20 studies prior to implementation. Year of dissemination, publication status (published articles/chapters vs. unpublished reports such as dissertations), and presence of funding were coded as report features. Gender, age, ethnicity, and subject weights were coded from descriptions of samples in primary studies. Methodological features that we coded included attrition rates, random vs. nonrandom assignment to treatment and control groups, and control group management (i.e., attention control vs. true control).

Supervised PA intervention characteristics were coded, including intervention dose (total minutes of supervised PA), exercise intensity, and whether the exercise included flexibility or resistance exercise as well as endurance exercise. For unsupervised PA interventions, the recommended PA intensity, form, and minutes per week were coded. Other coded intervention characteristics included whether the intervention was linked with worksites, whether it was delivered to individuals or groups, and whether it targeted only PA behavior or attempted to change multiple behaviors (e.g., PA plus diet). Outcome data coded included baseline and outcome sample sizes, means, measures of variability, change scores, and t statistics.

All data were independently coded by two extensively trained coders to enhance reliability. All outcome data were further verified by a doctorally prepared coder. Coding decisions not easily reconciled between coders were adjudicated by the principal investigator. All data were compared among coders until 100% agreement was achieved. Coded data were also duplicate-entered. To ensure independence among samples, we cross-checked author lists to identify reports that might contain overlapping samples. We contacted senior authors to clarify uniqueness of samples. Ancillary reports—additional publications about the same subjects—enhanced coding completion.

Data Analysis

We used standard meta-analytic approaches, taking the standardized mean difference (d) as the effect size (ES) and weighting each ES by the inverse of its variance to give larger samples more influence [20]. ESs were adjusted for bias. Random-effects models were used to acknowledge that individual ESs vary due both to subject-level sampling error and to other sources of study-level error such as variations among interventions [21]. Normal-theory standard errors were used to construct 95% confidence intervals. Between-study homogeneity was assessed with a conventional homogeneity statistic (Q) [20]. Statistical and clinical heterogeneity is common in behavior change research [22, 23]. We expected heterogeneity in this sample and handled it in four ways [22, 23]. First, random-effects analyses were used because they take into account heterogeneity not fully explained by moderators. Second, a location parameter was calculated along with quantification of heterogeneity. Third, potential heterogeneity was explored with moderator analyses. Fourth, findings were interpreted in the context of discovered heterogeneity. We explored potential publication bias by constructing funnel plots of ES against sampling variance, where a symmetrical distribution of ESs suggested absence of bias [24].
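The pooling steps described above can be illustrated with a minimal sketch. It assumes a Hedges-type small-sample correction for the bias adjustment and the DerSimonian-Laird moment estimator for the between-study variance; the text does not name either choice, so both are assumptions, as are the function names and the sign convention (positive ES means fewer depressive symptoms in the treatment group).

```python
import math


def hedges_g(m_t, m_c, sd_t, sd_c, n_t, n_c):
    """Bias-adjusted standardized mean difference and its sampling variance.

    Assumption: lower scores mean fewer depressive symptoms, so the effect
    is control minus treatment and a positive value favors the intervention.
    """
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (m_c - m_t) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor (assumed form)
    g = j * d
    var = j**2 * ((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return g, var


def random_effects(effects, variances):
    """Random-effects mean ES, 95% CI, Q, and tau^2 (DerSimonian-Laird, assumed)."""
    k = len(effects)
    w = [1 / v for v in variances]  # inverse-variance weights
    sw = sum(w)
    fixed_mean = sum(wi * e for wi, e in zip(w, effects)) / sw
    # Homogeneity statistic Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (e - fixed_mean) ** 2 for wi, e in zip(w, effects))
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, truncated at 0
    w_re = [1 / (v + tau2) for v in variances]
    mean = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    # Normal-theory 95% confidence interval
    return {"mean": mean, "se": se, "ci": (mean - 1.96 * se, mean + 1.96 * se),
            "q": q, "tau2": tau2}


if __name__ == "__main__":
    # Hypothetical effect sizes and variances, for illustration only
    print(random_effects([0.2, 0.5, 0.8], [0.04, 0.04, 0.04]))
```

Because the random-effects weights add the same between-study variance to every study, they are more nearly equal than fixed-effect weights, which is why large heterogeneity reduces the dominance of large samples.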

Whenever baseline data were available, treatment group pre- and post-scores were compared. These calculations require correlations between baseline and outcome scores that are not reported in primary studies [25]. These analyses were conducted under two assumptions: no correlation and high (0.80) correlation. Single-group comparisons were not combined with two-group comparisons, nor were moderator analyses conducted on single-group data. All analyses were conducted separately for supervised and unsupervised PA.
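To show why the assumed pre-post correlation matters, the following sketch computes a single-group standardized mean change and an approximate sampling variance under the two correlation assumptions named above. The variance expression is a common large-sample approximation and is an assumption here, not necessarily the formula used in this synthesis; the function name and the input values are hypothetical.

```python
def pre_post_effect(mean_pre, mean_post, sd_pre, n, r):
    """Standardized mean change for one group and its approximate variance.

    r is the assumed correlation between baseline and outcome scores.
    Assumption: lower scores mean fewer symptoms, so a positive effect
    indicates improvement from baseline to outcome.
    """
    d = (mean_pre - mean_post) / sd_pre
    # Large-sample approximation: a higher r shrinks the variance
    var = 2 * (1 - r) / n + d**2 / (2 * n)
    return d, var


if __name__ == "__main__":
    # Hypothetical study: baseline 10.0, outcome 8.0, baseline SD 5.0, n = 25
    for r in (0.0, 0.80):
        d, var = pre_post_effect(10.0, 8.0, 5.0, 25, r)
        print(f"r = {r:.2f}: d = {d:.3f}, variance = {var:.4f}")
```

The point estimate for a single study does not depend on r in this sketch, but its variance does. Since studies are weighted by inverse variance, the pooled mean ES can differ between the two assumptions.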

We conducted exploratory moderator analyses to determine if ESs were associated with primary study characteristics. Continuous moderators were examined by testing effects through an unstandardized regression slope, a meta-analytic analogue of regression [20]. Dichotomous moderators were examined through testing effects by between-group heterogeneity statistic (Q between), a meta-analytic analogue of analysis of variance [20].
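The two moderator tests can be sketched as follows. This is a simplified illustration with hypothetical function names and data. It uses whatever variances it is given, so a random-effects version would pass each study's sampling variance plus an estimate of the between-study variance; that choice is an assumption, because the text does not specify the weighting used in the moderator models.

```python
import math


def q_between(groups):
    """Between-group heterogeneity statistic for a categorical moderator.

    groups maps a group label to a list of (effect, variance) pairs.
    Q-between is compared to a chi-square with (number of groups - 1) df.
    """
    all_pairs = [pair for pairs in groups.values() for pair in pairs]
    total_w = sum(1 / v for _, v in all_pairs)
    grand_mean = sum(e / v for e, v in all_pairs) / total_w
    qb = 0.0
    for pairs in groups.values():
        w_sum = sum(1 / v for _, v in pairs)
        group_mean = sum(e / v for e, v in pairs) / w_sum
        qb += w_sum * (group_mean - grand_mean) ** 2
    return qb


def weighted_slope(x, effects, variances):
    """Unstandardized slope of ES on a continuous moderator (weighted least squares)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, effects))
    slope = sxy / sxx
    se = math.sqrt(1 / sxx)  # meta-analytic SE: weights are treated as known
    return slope, se, slope / se


if __name__ == "__main__":
    # Hypothetical data, for illustration only
    groups = {
        "published": [(0.3, 0.04), (0.4, 0.04)],
        "unpublished": [(0.8, 0.04), (0.9, 0.04)],
    }
    print("Q between:", q_between(groups))
    print("slope, SE, z:", weighted_slope([1, 2, 3], [0.2, 0.4, 0.6], [0.04] * 3))
```

Note that the slope's standard error comes from the inverse-variance weights rather than from regression residuals, which is the usual difference between meta-regression and ordinary weighted least squares.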

Results

Comprehensive searches yielded 70 reports with 38 eligible supervised PA treatment vs. control comparisons, 22 eligible unsupervised PA treatment vs. control comparisons, 67 supervised PA treatment group pre- vs. post-intervention comparisons, 45 unsupervised PA treatment group pre- vs. post-intervention comparisons, 29 supervised PA control group pre- vs. post-intervention comparisons, and nine unsupervised PA control group pre- vs. post-intervention comparisons [26–95]. The supervised PA two-group comparisons included 1,598 subjects. The unsupervised PA two-group comparisons included 1,081 subjects. The treatment single-group comparisons included 1,639 supervised PA and 3,420 unsupervised PA subjects. The pre- and post-comparisons included 676 control subjects in supervised PA studies and 198 control participants in unsupervised PA studies. Most primary studies were published articles (s = 54), and the remainder were dissertations (s = 14), a book chapter (s = 1), and conference presentation materials (s = 1; s indicates the number of reports). Publication bias was evident in the funnel plots for supervised PA and unsupervised PA two-group outcome comparisons and for treatment group pre- vs. post-intervention supervised PA and unsupervised PA comparisons. The control group pre- and post-comparison distributions on the funnel plots suggested less publication bias than plots of treatment groups. Unless otherwise specified, all results are from the treatment vs. control comparisons.

Attrition ranged from 0% to 54%. Interventions designed to increase unsupervised PA ranged from 1 day to 1 year between starting and completing the intervention. Supervised PA sessions typically lasted 45 to 60 min and were held three times weekly, with a mean of 62 sessions (range 8 to 156 sessions). Unsupervised PA studies typically recommended 30 to 60 min of PA three times weekly. Many unsupervised PA studies measured depressive symptoms within a week after PA interventions were completed. Among studies measuring outcomes later, the number of days between the intervention and outcome assessment ranged from 30 to 1,825 days. The most commonly used measures of depressive symptoms were the Profile of Mood States, Beck Depression Inventory, and Center for Epidemiologic Studies-Depression Scale. Studies averaged 65% female subjects. The mean age across samples ranged from 31 to 81 years, with a mean of 48.5 years. The mean weight was 75.9 kg, with a range from 62 to 90 kg. Body mass index means ranged from 25 to 31, with a mean of 27.6.

Effects of Interventions

Table 1 shows the effects of interventions on depressive symptom outcomes. We found a mean ES of 0.372 among the 38 supervised PA studies comparing treatment and control groups following interventions. Supervised PA treatment group pre- vs. post-intervention comparisons yielded mean ESs of 0.258 and 0.281 under the assumptions of high correlation and no correlation, respectively. ESs for two-group and treatment group pre- and post-comparisons of supervised PA were statistically significant with the confidence interval not including 0. For unsupervised PA interventions, the two-group comparison yielded an ES of 0.522. Treatment group pre- vs. post-intervention comparisons of unsupervised PA yielded mean ESs of 0.474 and 0.519 under the assumptions of high correlation and no correlation, respectively. All two-group and treatment pre- vs. post-intervention comparisons for supervised and unsupervised PA were significantly heterogeneous (Q in Table 1). Control group pre- vs. post-intervention comparisons for both supervised and unsupervised PA were much smaller, with all confidence intervals including 0.

Table 1 Random-effects depressive symptom outcome estimates and tests

Moderator Findings

We report moderator analyses for two-group comparisons for both supervised and unsupervised PA (Table 2: continuous moderators, Table 3: dichotomous moderators) when adequate data were reported in primary studies. Results of some potential moderator analyses (e.g., study focused on overweight subjects, worksite interventions) should be interpreted cautiously given small-sample sizes for these variables.

Table 2 Continuous moderator results for depressive symptoms: treatment vs. control at outcome
Table 3 Dichotomous moderator results for depressive symptoms: treatment vs. control at outcome

Report Moderators

Year of publication was unrelated to ES for supervised PA studies. Unsupervised PA studies published more recently had slightly smaller ESs than studies published earlier (Table 2). The magnitude was small, although statistically significant. For both supervised and unsupervised PA, unpublished studies reported larger ES (0.564, 0.870) than published studies (0.321, 0.432; Table 3). Although more distinct in unsupervised PA studies, a similar trend was apparent for unfunded studies having larger ES than funded studies for supervised PA (0.506 vs. 0.313) and unsupervised PA (1.090 vs. 0.192) studies.

Research Methods Moderators

Unsupervised PA studies with random assignment of subjects to treatment and control conditions reported smaller ES (0.201) than studies without random assignment (0.932). A similar trend among supervised PA studies (0.334 vs. 0.514) did not achieve statistical significance. The difference between unsupervised PA studies with true control groups (0.596) and studies with attention control groups (0.191) was borderline statistically significant. Neither sample size nor attrition rates were related to depressive symptom outcome ESs.

Primary Study Sample Characteristic Moderators

Unsupervised PA studies with more women reported slightly smaller depressive symptom ES (Table 2). Sample age was unrelated to outcomes for supervised PA studies. Gender was not a significant predictor of ESs. Primary study mean sample weight and whether studies focused on overweight subjects were both unrelated to depressive symptom outcomes in supervised PA studies (Tables 2 and 3). Inadequate primary study report information prevented a similar analysis for unsupervised PA studies.

Intervention Characteristic Moderators

Interventions that targeted only PA behavior were not significantly more effective than interventions attempting to change multiple behaviors (Table 3). The social setting of the intervention (group vs. individual) was not a statistically significant moderator of ES. Worksite-based unsupervised PA interventions reported similar ESs (0.512) compared to interventions not linked with workplaces (0.529).

The number of days over which the intervention to increase unsupervised PA was delivered was a statistically significant predictor, but the ES magnitude was very small. The relationship between the ES and the magnitude of the dose of supervised PA (minutes per session times the number of sessions) was small and not statistically significant. Low intensity supervised PA interventions resulted in a significantly larger ES (0.907) than moderate intensity interventions (0.271). The pattern of findings for three analyses addressing the exercise form (endurance, resistance, flexibility) suggested interventions are most effective when the intervention is not exclusively focused on endurance exercise (Table 3). The addition of either flexibility or resistance exercise improves depressive symptom outcomes.

Among unsupervised PA studies, neither the recommended PA intensity nor the recommended form of PA was related to outcome ES. Although only three of the unsupervised PA studies recommended that subjects exercise at a fitness center, these studies reported significantly larger ES (0.922) than studies that recommended home exercise (0.425). Unsupervised PA interventions that recommended more minutes per week of PA reported smaller ES. Shorter unsupervised PA interventions (days over which the intervention was delivered) resulted in larger ES outcomes. The magnitudes of the effects for both recommended minutes and intervention days were quite small.

Discussion

This research synthesis documented that both supervised and unsupervised PA interventions significantly improve depressive symptoms among healthy adults. The moderate ESs (0.372 and 0.522) are between the ES reported for negative affect among older adults (0.35) [10] and the 0.59 ES for non-depressed subjects reported in 1990 [14]. Meta-analyses of PA interventions delivered to adults with clinical depression have reported larger ESs of 0.72 [6], 1.10 [5], and 1.42 [7]. The smaller effects among subjects without clinical depression may partially reflect a floor effect where healthy subjects have less room to improve depressive symptoms than subjects with clinical depression [8]. The moderate ESs document that even subjects who are not clinically depressed experience improvement in depressive symptoms following either supervised or unsupervised PA interventions. Unfortunately, none of the studies reported subsequent episodes of clinical depression to determine the extent to which the interventions reduced clinical depression episodes such as Major Depressive Disorder, Dysthymic Disorder, or Depressive Disorder Not Specified [5, 96–98]. Longer follow-up would be necessary to determine the protective benefit of PA interventions.

As expected, the primary studies were heterogeneous. The exploratory moderator analyses of report, methods, and intervention characteristics may help explain existing heterogeneity. The findings indicated a tendency for unpublished and unfunded studies to report larger ESs than published and funded studies. Unpublished dissertation research may contain studies with heightened investigator involvement in especially strong interventions. Unfunded studies may include pilot projects where investigators are strongly vested in providing exceptional interventions to ensure subsequent funding of larger projects. Moderator analyses suggest cautious interpretation of research with some design characteristics. Studies without random assignment or without attention control groups may overestimate the benefits of PA interventions [5].

Encouraging findings, consistent with previous research with clinically depressed subjects [14], suggest that both supervised and unsupervised PA interventions may be broadly beneficial across age groups and weight distributions. Both men and women seem to experience positive results from PA interventions. Interventions delivered to groups are as effective as those focused on individuals, which is important for delivering interventions in a cost-effective manner.

Previous research has documented that interventions that target only PA behavior (vs. those attempting to change multiple behaviors) are more effective in increasing the amount of PA after interventions [23, 99]. These depressive symptom findings suggest that interventions targeting multiple health behaviors may be as effective as interventions focused only on PA behavior. The pattern of findings in this project, including the larger ES for unsupervised than supervised PA interventions, suggests changes in depressive symptoms are affected by something beyond actual PA behavior. Complex inter-related factors likely affect depressive symptoms.

The finding that supervised low-intensity interventions were more effective than moderate or high intensity interventions is consistent with a previous meta-analysis examining overall mood outcomes [10]. These findings do not support the common contention that insufficient training intensity is the reason for poor outcomes in some studies [10, 97]. It is possible that subjects were so unfit that low intensity PA was more achievable than more intense exercise [10]. Low intensity PA may elicit immediate positive feelings that may not be as apparent in more vigorous exercise [97]. Although most research has emphasized endurance type exercise, these novel findings suggest that including flexibility and resistance exercise may be important for improving depressive symptoms. Lawlor and Hopker [5] found no difference between endurance and resistance exercise in major depression symptoms. A meta-analysis of PA interventions among older adults reported lower ES among studies with endurance exercise than studies with resistance exercise [10]. Neither previous published meta-analysis included any primary studies with flexibility exercise. Our findings regarding exercise intensity and achieved exercise dose in supervised PA studies suggest that improved aerobic fitness may not be the mechanism by which depressive symptoms improve [11, 12, 98]. Although endurance exercise has cardiovascular benefits, our findings suggest that additional forms of exercise may have significant depressive symptom benefits. Studies that systematically manipulate dimensions of the exercise stimulus are needed. Center-based PA may offer advantages beyond PA itself that account for these studies’ larger improvements in depressive symptoms. Exercise centers may offer structured experiences, social interaction, and positive regard from others that may enhance the depressive symptom benefits of the PA.

This meta-analysis was limited by the primary studies and by inherent meta-analysis method limitations. Some potentially important moderators were inadequately reported for analyses. For example, few studies reported sample ethnicity. Challenges in designing adequate placebo/attention control groups for testing interventions on psychosocial outcomes are obvious in many primary reports. Studies comparing PA modes and intensities are infrequently reported. Moderator analyses must be interpreted cautiously given the observational nature of meta-analysis and the potential for confounding unknown variables. This synthesis focused on depressive symptoms among samples without clinical depression. A broader meta-analysis comparing interventions among subjects with and without clinical depression would be informative. This project excluded studies that used both PA interventions and psychological treatments. Meta-analyses of these studies would be useful to determine potential additive effects of both intervention types.

In conclusion, these meta-analysis findings document that both supervised and unsupervised PA interventions are effective in reducing depressive symptoms among adults without clinical depression. Subjects benefitting from the interventions included those of both genders and of diverse weights. Intriguing moderator analyses suggest that further examination is in order regarding the influence of the PA stimulus (e.g., flexibility vs. resistance vs. endurance PA) on depressive symptom outcomes.