Background and significance

People with mental disorders are at an increased risk of physical comorbidity [1, 2] and associated differential mortality [3, 4]. Compared to the general population, people with mental disorders engage in significantly less physical activity (PA) [5, 6]. PA interventions have demonstrated benefits for both the psychological and physical well-being of people with mental disorders [7, 8]. While much of the existing evidence for the efficacy of PA among people with mental disorders comes from randomized-controlled trials [9], these trials are often limited by factors such as selection bias (e.g., participants who completed the study are more likely to be motivated to be physically active than non-participants) and short study duration. Cohort studies can mitigate some of these limitations and may facilitate examination of the “real world effectiveness” of PA. To date, reviews examining the association between mental disorders and PA have been limited to people with depression. Mammen and Faulkner [10] found that 25 out of the 30 longitudinal studies they identified showed that increased PA was associated with a reduced risk of subsequent depression. Roshanaei-Moghaddam and colleagues [11], on the other hand, examined the effect of baseline depression on subsequent PA. Eight out of the eleven studies included found that depression at baseline was associated with a subsequent reduction in PA. More recently, a comprehensive systematic review by Schuch et al. examined 49 longitudinal studies consisting of 266,935 participants and found that those with high levels of PA had lower odds of developing depression (odds ratio (OR) 0.83; 95% confidence interval (CI) 0.79–0.88) [12]. It should be noted that previous reviews were based on a broad range of measures of depression: some were symptom scales and others diagnostic instruments.
This has implications for clinical interpretation of the findings and casts doubt on the validity of pooling such widely disparate measures of depression [10]. Thus, this systematic review examines the following research questions: (1) Do people with lower PA have an increased risk of subsequent mental disorders (including mood disorders, anxiety disorders, substance use disorders, and psychotic disorders) compared to those with higher PA? and (2) Do people with mental disorders have reduced subsequent PA compared to those without mental disorders? It was hypothesized that: (1) people with lower PA would have an increased risk of subsequent mental disorders compared to those with higher PA, and (2) people with mental disorders would have reduced subsequent PA. To optimize the clinical utility of the analyses, this review focused on traditional diagnostic criteria for mental disorders.

Methods

A systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline [13] and the Meta-Analysis of Observational Studies in Epidemiology (MOOSE) guideline [14]. This study was prospectively registered with PROSPERO (registration ID CRD42017071737). The following databases were searched for English-language, original research articles published in peer-reviewed journals from inception to September 2017 initially, and then updated in May 2019: Medline, PubMed, EMBASE, PsycINFO, CINAHL, and Web of Science. In addition, the reference lists of the identified articles and of several systematic reviews in the field [10, 11, 15,16,17,18,19,20,21] were examined to identify any other eligible articles. The following search algorithm was used: (“exercise” OR “physical activity”) AND (“schizophrenia” OR “psychosis” OR “depression” OR “bipolar” OR “serious mental illness” OR “severe mental illness” OR “anxiety” OR “substance use disorder” OR “substance dependence” OR “alcohol use disorder” OR “alcohol dependence” OR “stimulant use disorder” OR “stimulant dependence” OR “common mental illness” OR “common mental disorder”) AND (“longitudinal” OR “observational”).
The inclusion criteria of the current review were: (a) longitudinal observational study with either prospective or retrospective design and at least 12 months of follow-up; (b) study populations that include people with mental disorders (defined as those who meet the diagnostic criteria, either International Classification of Diseases (ICD, any version) or Diagnostic and Statistical Manual of Mental Disorders (DSM, any version), for mood disorders, anxiety disorders, substance use disorders, and psychotic disorders) as well as people without the mental disorders of interest (“the general population”); (c) presence of mental disorder diagnoses or self-reported or objectively measured PA at the study baseline; and (d) outcome measures of either change in PA over time in those with mental disorders compared to the general population, or incidence of mental disorders.
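As an illustration only, the Boolean search string quoted in the Methods can be assembled from its three term groups, which makes the structure of the query (activity terms AND disorder terms AND design terms) explicit. The sketch below is not the tooling used for this review; the helper names `or_group` and `build_query` are hypothetical, and the exact field tags and quoting rules differ between databases, so the output would need adapting to each interface.

```python
# Terms are copied from the search algorithm stated in the text.
ACTIVITY_TERMS = ["exercise", "physical activity"]
DISORDER_TERMS = [
    "schizophrenia", "psychosis", "depression", "bipolar",
    "serious mental illness", "severe mental illness", "anxiety",
    "substance use disorder", "substance dependence",
    "alcohol use disorder", "alcohol dependence",
    "stimulant use disorder", "stimulant dependence",
    "common mental illness", "common mental disorder",
]
DESIGN_TERMS = ["longitudinal", "observational"]


def or_group(terms):
    """Quote each term, join with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query():
    """Combine the three OR-groups with AND (hypothetical helper)."""
    groups = (ACTIVITY_TERMS, DISORDER_TERMS, DESIGN_TERMS)
    return " AND ".join(or_group(group) for group in groups)


query = build_query()
print(query)
```

Keeping the term lists separate from the joining logic means a term can be added to one group without risk of unbalancing the parentheses.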

Data extraction

Titles and abstracts of the articles were reviewed to identify studies that met the eligibility criteria. Eligibility assessment was then performed by two authors (SS and BS) independently. In keeping with previous reviews [5, 22], the following characteristics were extracted from each study when available: (a) study description (including author, publication year, location, study design, follow-up period, sample numbers, loss to follow-up, age, gender, PA measures, and mental disorder measures), and (b) study findings (effect size metrics, 95% CI, and confounders adjusted for). The study findings were examined and summarized according to consistency in direction and significance of the results. Two independent researchers extracted the data (SS and BS) and disagreements were resolved by consensus.

Methodological quality appraisal

Risk of bias in each study was assessed using the Newcastle–Ottawa Scale (NOS) [23] (Supplementary Table 1). Points are assigned based on the selection process of cohorts (0–4 points), the comparability of the cohorts (0–2 points), and the identification of the exposures and the outcomes of research participants (0–3 points). Two reviewers (SS and BS) independently assessed the methodological quality of each study, with disagreements resolved by discussion.
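The NOS scoring structure described above (three domains with maxima of 4, 2, and 3 points, giving a total out of 9) can be written as a small record type that rejects out-of-range domain scores. This is a minimal sketch under the point ranges stated in the text; the class name `NosScore` and the example values are hypothetical and do not correspond to any study in this review.

```python
from dataclasses import dataclass

# Maximum points per NOS domain, as stated in the text.
NOS_MAX = {"selection": 4, "comparability": 2, "outcome": 3}


@dataclass(frozen=True)
class NosScore:
    """Domain scores for one study (hypothetical container)."""
    selection: int
    comparability: int
    outcome: int

    def __post_init__(self):
        # Reject any domain score outside its allowed range.
        for domain, maximum in NOS_MAX.items():
            value = getattr(self, domain)
            if not 0 <= value <= maximum:
                raise ValueError(f"{domain} must be between 0 and {maximum}, got {value}")

    @property
    def total(self):
        """Total NOS score, between 0 and 9."""
        return self.selection + self.comparability + self.outcome


# Illustrative values only, not taken from Table 2.
example = NosScore(selection=3, comparability=2, outcome=2)
print(example.total)
```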

Statistical analysis

Descriptive statistics were used to present a summary of the findings. As the included studies were not sufficiently homogeneous in terms of exposure, comparator, and outcomes, a meta-analysis was not conducted in the current review.

Results

The search of Medline, PubMed, EMBASE, PsycINFO, CINAHL, and Web of Science databases identified a total of 8407 potential papers (Fig. 1). After removing duplicates, 4665 papers remained. Five additional papers were located through reference lists from other systematic reviews on the topic. Of these 4670 papers, 4647 were discarded after review of the titles and abstracts. The full texts of the remaining 23 papers were examined. Six papers required further discussion before consensus was reached. Three corresponding authors were contacted via email for further data and/or clarification. All three authors [24,25,26] responded to our correspondence and two [24, 25] provided us with further data. A total of 18 studies were identified for inclusion initially. The updated search was conducted in May 2019. An additional 2672 potential papers were identified. After discarding 1153 duplicates, 1519 papers remained. The full texts of eight papers were examined after reviewing the titles and abstracts. Four additional papers were identified for inclusion. Therefore, a total of 22 studies were identified for inclusion in the review.
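The screening counts reported above can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in the text; the variable names are chosen here for illustration.

```python
# Initial search (inception to September 2017)
after_deduplication = 4665
from_reference_lists = 5
screened = after_deduplication + from_reference_lists      # titles and abstracts screened
excluded_on_title_abstract = 4647
full_text_reviewed = screened - excluded_on_title_abstract
included_initial = 18

# Updated search (May 2019)
update_identified = 2672
update_duplicates = 1153
update_screened = update_identified - update_duplicates
included_update = 4

total_included = included_initial + included_update

print(screened, full_text_reviewed, update_screened, total_included)
```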

Fig. 1 Search process

Study characteristics

A summary of included studies is presented in Table 1. Of 22 studies included, four studies examined the association between baseline mental disorders and subsequent PA [25, 27,28,29], and only one study [30] reported on the bidirectional relationship between the variables of interest. The other 17 studies examined the association between baseline PA and subsequent mental disorders. The number of participants ranged from 496 [30] to 25,520 [31] and follow-up periods ranged from 15 months [24] to 26 years [32].

Table 1 Summary of included studies evaluating longitudinal association of physical activity status and mental disorder diagnoses

The majority of the included studies (13 out of 22) examined depressive disorders [27, 28, 30,31,32,33,34,35,36,37,38,39,40], while one examined both depressive disorders and anxiety disorders [25] and another [41] examined mood disorders in general. One study [42] examined anxiety disorders only. Two studies examined the relationship between PA and alcohol use disorder [24, 43]. Three studies assessed mood disorders, anxiety disorders, and substance use disorders within the same samples [29, 44, 45]. One study [46] examined mood disorders, anxiety disorders, substance use disorders, and psychotic disorders within the same cohort. The majority of the studies (16 out of 22) utilized DSM criteria [24, 25, 27,28,29,30, 34,35,36,37,38, 40, 42, 44,45,46], while five used ICD criteria [31,32,33, 41, 43]. One study did not specify the diagnostic criteria used, but was included because it utilized the Composite International Diagnostic Interview (CIDI) questionnaire to derive its diagnoses [39].

In contrast to mental disorder diagnoses, there was little consistency in how PA was measured and categorized. None of the included studies used objective PA measurement tools. Eight studies used validated questionnaires such as the International Physical Activity Questionnaire (IPAQ) [47], with the remaining 14 studies relying on isolated self-reported PA items, often from just one question (e.g., “Over the past 12 months, how often did you participate in sports or exercising?” [24]). These questionnaires often used exercise (defined as a subtype of PA that is repetitive and structured, and has a specific intention of improving or maintaining fitness [48]) as a proxy for overall PA. The wide range of measurement tools used to estimate PA led to inconsistent PA categorizations (i.e., different studies used the same terms, such as “physically active”, to describe a wide range of intensities, types, and frequencies of PA), thus compromising the current review’s ability to compare and synthesize the included studies in a systematic manner.

Mood disorders

Nineteen out of twenty-two studies examined the association between mood disorders and PA. Most (15/19) focused on depressive disorder, but four examined wider diagnostic entities that included dysthymia [46] and bipolar disorders [29, 44, 45]. Of these 19 studies, 15 [30,31,32,33,34,35,36,37,38,39,40,41, 44,45,46] examined the relationship between PA at baseline and subsequent mood disorders; eleven [30,31,32,33, 37,38,39,40,41, 44, 45] of these examined incident (new onset) mood disorders, whilst four studies [34,35,36, 46] did not measure the baseline mood disorder status. One [33] also examined the effect of persistent depression on PA. Five studies [25, 27,28,29,30] examined the impact of mood disorder at baseline on subsequent PA.

In terms of incident mood disorders, 5 out of 11 studies [33, 39,40,41, 44] found no association between PA at baseline and subsequent onset of mood disorders. Two studies [30, 37] found a significant association between higher PA and a reduced likelihood of incident mood disorders. The remaining four studies had mixed results. Mikkelsen et al. [32] found that women with low PA had an increased likelihood of incident depression compared to those with high PA (1.80; 95% CI 1.29–2.51). There were no significant associations for women with moderate PA compared to high PA (1.07; 95% CI 0.80–1.44), or for men (1.11; 95% CI 0.73–1.68 with moderate PA and 1.39; 95% CI 0.83–2.34 with low PA, both compared to the high PA group). Tanaka et al. [38], on the other hand, found that men who reported that they “never” engaged in PA had an increased risk of incident depression compared to those who reported that they “often, sometimes” engaged in PA (2.58; 95% CI 1.31–5.05). No significant association between the two variables was found in women in this study (1.23; 95% CI 0.69–2.21). Ten Have et al. [45] found that those who exercised for between 1 and 3 h per week (0.62; 95% CI 0.43–0.89), but not those who exercised for more than 4 h per week (0.79; 95% CI 0.53–1.18), had a reduced risk of subsequent mood disorders compared to those who engaged in no exercise. Finally, Hallgren et al. [31] found that compared to those who did not achieve the WHO recommendation (150 min of moderate-to-vigorous PA per week), those who engaged in PA exceeding the WHO recommendation (over 300 min of moderate-to-vigorous PA per week) had a reduced risk of subsequent major depressive disorder (0.71; 95% CI 0.53–0.96), but those who only achieved the WHO recommendation did not (0.86; 95% CI 0.64–1.14).

Four studies examined the association of baseline PA and subsequent mood disorders without assessing for baseline mood disorder diagnoses, making it impossible to determine if these were incident cases or not. One study [34] found no association between PA and mood disorders in adolescents, while another study [35] found that lower PA was associated with an increased risk of subsequent depression. Using the response to the question: “How often did you exercise or play sports in the last week?”, Suetani et al. [46] found that compared to adolescents who engaged in frequent PA (more than 4 days), those with no PA engagement (“Not at all”) had an increased risk of subsequent mood disorders (1.79; 95% CI 1.05–3.06). However, there was no association between the two variables when the infrequent PA engagement (1–3 days) was compared to the frequent PA engagement group (1.05; 95% CI 0.83–1.32). McKerracher et al. [36] used a self-reported past-week frequency and duration of school and extracurricular sport and exercise at baseline (aged 9–15), and the long-form IPAQ at follow-up (approximately 20 years later) [47] to categorize participants into four PA groups [49]; (1) decreasing, (2) increasing, (3) persistently active, and (4) persistently inactive. The study found that compared to men in the persistently inactive group, men in the increasing group (0.31; 95% CI 0.11–0.92) or the persistently active PA group (0.35; 95% CI 0.15–0.81), but not those in the decreasing group (0.40; 95% CI 0.15–1.05), had a reduced risk of subsequent depression. No difference among the four groups was seen in women [36].

Five studies examined the effect of mood disorders on subsequent PA. Jerstad et al. found that adolescent girls with depression were less likely to engage in PA (compared to the non-depressed cohort) 6 years later [30]. Three studies [25, 27, 29] found no association between depression at baseline and subsequent PA. One study [28] found that participants with baseline depression who were active (defined in this study as the total estimated energy expenditure greater than 1.5 kcal/kg per day) had a greater likelihood of transitioning into being physically inactive compared to those without depression (1.6; 95% CI 1.2–1.9). However, among those who were physically inactive at baseline, depression was not associated with the transition into being physically active at follow-up (1.0; 95% CI 0.8–1.2).

Overall, these studies reveal an inconsistent pattern of associations between PA and mood disorders.

Anxiety disorders

Six studies [25, 29, 42, 44,45,46] examined the association between anxiety disorders and PA. Of these, three studies [42, 44, 45] examined incident anxiety disorders and one study [46] did not examine baseline anxiety disorders. Two studies explored the impact of anxiety disorders on subsequent PA [25, 29].

Of three studies examining incident anxiety disorders, Strohle et al. [44] found that compared to those engaged in no PA (defined in this study as less than once a month or no exercise at all), participants who were engaged in regular PA, defined in this study as daily and several times a week (0.52; 95% CI 0.37–0.74), but not in non-regular PA, defined in this study as one-to-four times a month (0.73; 95% CI 0.45–1.19), showed a reduced risk of subsequent anxiety disorders. On the other hand, Ten Have et al. [45] found that compared to those engaged in no exercise, participants who engaged in between 1 and 3 h per week of exercise (0.56; 95% CI 0.40–0.79), but not those who engaged in more than 4 h per week of exercise (0.71; 95% CI 0.48–1.07), showed a reduced risk of subsequent anxiety disorders. Furthermore, McDowell et al. [42] and Suetani et al. [46] found no association between PA status and subsequent generalised anxiety disorder diagnosis. Overall, these studies suggest no consistent association between PA and subsequent anxiety disorders.

The two studies that examined the association between baseline anxiety disorders and subsequent PA had conflicting findings. Hiles et al. [25] found that a baseline anxiety disorder diagnosis was associated with reduced subsequent PA, whereas Suetani et al. [29] found no association.

Substance use disorders

Six studies examined the association between substance use disorders and PA [24, 29, 43,44,45,46]. Two studies explored alcohol use disorder, and the remaining four explored substance use disorders in general. One study [45] found no association between PA and subsequent substance use disorders. Four studies found mixed results. Ejsing et al. [43] found that compared with those with a moderate/high PA level, participants with a sedentary level (1.45; 95% CI 1.01–2.09 in women, 1.64; 95% CI 1.29–2.10 in men), but not those with a low PA level (0.88; 95% CI 0.64–1.19 in women, 1.15; 95% CI 0.93–1.42 in men), had an increased risk of subsequent alcohol use disorder. This pattern was seen in both men and women. Likewise, Henchoz et al. found that participants engaging in no sports or exercise (2.32; 95% CI 1.60–3.35 in “never”), but not those with lower engagement (1.14; 95% CI 0.79–1.65 in “a few times a year”, 1.09; 95% CI 0.80–1.49 in “1–3 times per month”, and 1.16; 95% CI 0.92–1.46 in “at least once per week”), had an increased risk of subsequent alcohol use disorders compared to those who engaged in sports or exercise “almost every day” [24]. Strohle et al. [44] found that non-regular PA (0.48; 95% CI 0.31–0.75), but not regular PA (0.95; 95% CI 0.72–1.24), was associated with a reduced risk of subsequent substance use disorders (compared to those with no PA). Finally, Suetani et al. [46] found that adolescents who engaged in infrequent PA (0.75; 95% CI 0.62–0.91), but not those who engaged in no PA (0.92; 95% CI 0.56–1.51), had a reduced risk of substance use disorders compared to adolescents who engaged in frequent PA. Overall, there was no consistent association across all studies of PA and subsequent substance use disorders.

Only one study [29] examined the association between the baseline substance use disorders and subsequent PA, and found no association.

Psychotic disorders

Only one study examined the association between PA and psychotic disorders. Suetani et al. [46] did not find any association between the baseline PA and subsequent psychotic disorders.

Quality assessment

Quality assessment scores for included studies are summarized in Table 2. The mean score was 7.1 with the lowest score of 5 and the highest score of 8.

Table 2 Quality assessment scores

The majority (17 out of 22) of included studies scored equal to or higher than the mean score (i.e., the NOS score of at least 7 out of 9). One study [37] found a consistent association between higher PA and a reduced risk of depression. Another study [30] found the bidirectional beneficial association between PA and depression. Ten studies [25, 28, 31,32,33, 36, 43,44,45,46] found mixed results (i.e., no consistency in direction and significance of the findings), and five studies [27, 29, 34, 39, 41] found no association between PA and mental disorder diagnoses.

Discussion

Summary of evidence

We would like to draw attention to three key findings. First, apart from depression, there has been relatively little research examining the association between PA and mental disorders using clinical diagnostic criteria. Second, a wide range of methods has been used to measure and classify PA, making it difficult to compare one study with another. Third, the findings from this review suggest that there are no convincing data to support the hypothesis that PA influences the risk of subsequent mental disorders, nor that mental disorders influence subsequent PA.

Among the 22 included studies, only two found a consistent association between higher PA and a reduced risk of subsequent mental disorders. One study found a bidirectional association (i.e., greater PA was associated with a reduced likelihood of subsequent major depression and major depression predicted lower subsequent PA engagement). Twelve studies found mixed results (i.e., no consistency in direction and significance of the findings). Seven studies found no association.

Previously, Mammen and Faulkner [10] reported that 25 out of 30 longitudinal studies identified in their review showed the association between greater PA and a reduced subsequent risk of developing depression. However, only five [30, 32, 37, 39, 41] of the 30 studies included in their study met the inclusion criteria for the current review (due to this review having a narrower definition of mental disorders). Similarly, Hoare et al. [18] have identified six longitudinal studies that examined the association between PA and subsequent depressive symptoms among adolescents. Even though all six studies found the association between greater PA and a reduced subsequent risk of depressive symptoms with one study also demonstrating the opposite direction of association, only one out of the six studies included [30] met the inclusion criteria for the current review. The rationale for the tighter inclusion criteria for this study was so that the findings were more relevant to those individuals who attended specialist mental health services. Requiring a diagnostic threshold favored studies examining the association between PA and mental disorders where participants had more severe symptom profiles. This is in contrast to previous reviews [12, 21] which included studies of people who had mental health symptoms which may not reflect those who would seek help from mental health service providers. Our findings are, therefore, more applicable to patient populations but not generalizable to the non-clinical population.

Little attention has been given to the association between PA and subsequent anxiety, substance use, and psychotic disorders. In the current review, six studies [25, 29, 42, 44,45,46] examined the association between PA and anxiety disorders. Another six studies [24, 29, 43,44,45,46] examined the association between PA and substance use disorders, and only one study examined the relationship between PA and psychotic disorders [46]. Overall, there was no consistent association among these studies.

Strengths and limitations

Strengths of the current analysis included the focus on participants with more robust definitions of mental disorders, thus allowing the findings to have more clinical utility compared to previous reviews. It also focused on data from longitudinal studies, facilitating a better exploration of the temporal association between PA and mental disorders.

The main limitation of the current analysis is the inconsistency in the measurement and definition of PA in the included studies and the absence of objective measurement of PA. As PA is often unstructured and varies in intensity, duration, and type, measuring and assessing PA is inherently complex [50]. Furthermore, the risk of under-reporting with self-report measurements may be particularly high in people with mental disorders. Given that all the studies included in this review relied on self-reports rather than objective measurements, and that the participants of interest were those with mental disorders, the risk of under-reporting due to both recall/reporting biases and the methodological limitations of the tools used is likely to be substantial.

It should also be noted that this review had stronger emphasis on the statistical significance, rather than the effect size, of each association in interpreting the findings of the original studies. Some studies were classified as either having no or mixed association even when the effect sizes were large (e.g., in one study [36], men in the decreasing group compared to the persistently inactive group had the adjusted relative risk of subsequent depression at 0.40. However, as its 95% CI was between 0.15 and 1.05, the study was classified as having a mixed association). This is particularly important given that a meta-analysis was not conducted in this review. It is possible that the effect of PA may have been underestimated by taking this approach.
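The significance-based reading described above can be made concrete with a short sketch. It assumes, as a simplification introduced here rather than a rule stated in the Methods, that an estimate counts as significant when its 95% CI excludes 1, and that a study is "consistent" when all of its estimates are significant in the same direction, "no association" when none are, and "mixed" otherwise. The names `Estimate`, `is_significant`, and `classify` are hypothetical. The example values are the three comparisons for men reported for study [36].

```python
from typing import NamedTuple, Sequence


class Estimate(NamedTuple):
    point: float   # ratio estimate (e.g., relative risk or odds ratio)
    lower: float   # lower bound of the 95% CI
    upper: float   # upper bound of the 95% CI


def is_significant(estimate: Estimate, null: float = 1.0) -> bool:
    """True when the confidence interval excludes the null value."""
    return estimate.lower > null or estimate.upper < null


def classify(estimates: Sequence[Estimate], null: float = 1.0) -> str:
    """Assumed rule: summarize a study by significance and direction only."""
    flags = [is_significant(e, null) for e in estimates]
    if not any(flags):
        return "no association"
    same_direction = len({e.point < null for e in estimates}) == 1
    if all(flags) and same_direction:
        return "consistent"
    return "mixed"


# Men in study [36], each compared with the persistently inactive group.
men_36 = [
    Estimate(0.31, 0.11, 0.92),  # increasing group
    Estimate(0.35, 0.15, 0.81),  # persistently active group
    Estimate(0.40, 0.15, 1.05),  # decreasing group: CI includes 1
]
print(classify(men_36))
```

Under this rule the decreasing-group estimate of 0.40 is treated the same as an estimate close to 1, which is how a significance-based summary can understate an effect when confidence intervals are wide.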

The majority of included studies measured mental disorder diagnoses at both time points, allowing incident disorders, or change in mental disorder diagnosis status, to be estimated. This allowed the current review to investigate whether PA was associated with a change in the outcome (mental disorder diagnosis status), rather than just examining the association between the two variables at different time points. However, few included studies accounted for the change in PA status using a consistent measurement tool over the same period of time, which limited our ability to examine the change in PA status in the same manner.

Another important limitation is that the current review only included observational studies. While most of the included studies controlled for potential confounders such as age, sex, and body mass index, only a few of them considered other factors like smoking status, other medical conditions, and socioeconomic status, and none of them included factors like cardiorespiratory function. There may be other unmeasured confounding factors that influence the association between the two variables in any given observational study. Given these limitations, even though the current review restricted its search to longitudinal studies, it is unable to comment on causal relationships between mental disorders and PA. This may, at least in part, explain why the results of the current review are not in line with the accumulating evidence from interventional studies indicating that PA is beneficial to people with mental disorders [7, 8].

The current review highlights the need for researchers in this field to explore an appropriate and consistent tool to measure PA in large cohort studies with more robust designs. While objective measurements such as accelerometers may have more favorable features in terms of validity, reliability, and sensitivity when measuring PA, this needs to be weighed against practical issues such as cost, feasibility, acceptability, and tolerance, especially if the study requires the participants to wear these devices for a considerable amount of time. Furthermore, given how rapidly the technology is advancing, most measurement devices used at baseline data collection may be out of date by the follow-up point in longitudinal studies with long durations. Thus, validated subjective measurement tools such as self-report questionnaires still have an important role [51].

In addition to how PA is measured, future studies should also consider the timing of the PA (e.g., at specific ages across the lifespan) and the duration of the PA (e.g., years as opposed to weeks). They should also examine how PA may affect physical and psychiatric outcomes both in the short term (e.g., changes in cardiometabolic parameters and symptom severity) and in the long term (e.g., changes in life expectancies and disease progression) for people with mental disorders. Future studies should incorporate novel designs that may help to strengthen the exploration of the causal nature of the relationship, such as Mendelian randomization and sibling comparison on a large scale in a collaborative manner [52, 53]. For example, Choi et al. [54] recently investigated the bidirectional relationship between depression and PA using Mendelian randomization. This study found that people with increased PA had a reduced likelihood of developing depression, but the association was evident only when objectively measured PA, not self-reported PA, was used. They found no association between depression and subsequent PA status regardless of how PA was measured.

Conclusion

While the evidence base for PA as an effective intervention option among people with mental disorders is growing [55], there remains a lack of consistent epidemiological evidence identifying PA as either a risk factor for or a consequence of mental disorders. Therefore, it may be important not to overstate the benefit of PA in mental health service settings at this stage. The current review identified the need for a more consistent approach to measuring and defining PA, as well as for novel approaches that account for the inherent complexities of PA among people with mental disorders. Given the potential importance of PA at both the individual and population levels [56], this issue warrants ongoing attention.