Introduction

Sick leave due to mental disorders poses great challenges for many high-income countries (Shiels et al. 2004), with both individual and societal consequences. Due to their high prevalence, common mental disorders, such as depression, anxiety, and stress, lead to a significant loss in productivity (Lidwall 2015). Thus, in recent years, there have been many attempts to develop effective return to work (RTW) interventions (Brouwers et al. 2006; Lagerveld et al. 2012; Netterstrøm et al. 2013) for these disorders, but evidence from available studies is mixed (Nigatu et al. 2016). To develop efficient RTW interventions, it is important to know which factors influence the probability of RTW for the target group. This knowledge may also be valuable for general practitioners when estimating the RTW probability for patients (Nieuwenhuijsen et al. 2006).

The term common mental disorder is a concept covering mild-to-moderate mental disorders. However, in the literature, consensus as to which diagnoses are included in the umbrella term common mental disorder is lacking; thus, some researchers include only anxiety and depression (Goldberg 1994), while others also include different stress-related conditions (Steel et al. 2014; Koopmans et al. 2011). In the present review, we wish to investigate the broadest definition of common mental disorders, and as stress induces a high risk of developing anxiety or depression and is in itself associated with a high level of sick leave, we include stress-related disorders in our definition of common mental disorders.

A large nationwide cohort study, including approximately 40,000 individuals with mood or anxiety disorders, found a higher relative risk of not being employed between the age of 25 and 52 compared to individuals without mental illness. In addition, the study found an increasing relative risk of not being employed up until the age of 40, and that the risk was slightly higher for women after the age of 40. Previous reviews on predictors of RTW for people with common mental disorders have either focused specifically on depression (Lagerveld et al. 2010; Ervasti et al. 2017) or had a much broader scope, including studies on mental disorders in general or studies on bipolar disorder or musculoskeletal disorders. These reviews found that the strongest predictors of a lower probability of RTW were older age, psychiatric or somatic comorbidity, the duration and severity of a depressive episode, and contact with a medical specialist, while factors predicting a higher probability of RTW were higher RTW self-efficacy, higher workability, fewer work demands, and a high score on the personality trait conscientiousness.

However, these reviews do not allow for interpretation of differences within diagnostic subgroups of anxiety, depression, and stress-related disorders. Although there are similarities in the symptoms and treatment of stress, anxiety, and depression, the disorders and their course are very heterogeneous (Nandi et al. 2009); hence, predictors may differ between diagnoses.

Studies have shown that different trajectories of RTW exist (Arends et al. 2019; Hellström et al. 2017), e.g., slow RTW or fast RTW, and that predictors for being in a certain RTW trajectory differ. Hence, one could also speculate that predictors of RTW vary during the period of sick leave. However, previous reviews have not investigated if predictors influence RTW probability differently over time.

Identifying predictors for RTW regarding specific diagnoses and in specific periods during sick leave might guide work and occupational psychologists in what to be aware of according to diagnosis and length of sick leave.

Therefore, the present systematic review and meta-analysis aims to investigate predictors of RTW for people on sick leave with common mental disorders to add to the evidence on the subject. Our review includes analyses at different RTW time points and in diagnostic subgroups of anxiety, depression, and stress-related disorders. To our knowledge, such analyses have not been conducted before, except for depression.

Objective

The objectives of the present systematic review and meta-analysis are: (1) to investigate if various predictors affect the probability of RTW differently at specific time points after sick leave, and (2) to investigate if predictors of RTW differ for the subcategories of depression, stress, and anxiety.

Method

Protocol and registration

The systematic review and meta-analysis was registered in PROSPERO on 27/08/18 (ID: CRD42018073396), and the registration was updated on 24/07/2019. The review followed the PRISMA guidelines (Page et al. 2020).

Inclusion and exclusion criteria

Inclusion criteria were:

  • Studies with a population of employees on sick leave due to a common mental disorder. Studies were included if they referred to common mental disorders either as a general term or specified as depression, generalized anxiety, social anxiety, panic disorder or stress disorders, and stress-related problems like adjustment disorder, exhaustion disorder, burnout, or other terms referring to stress. We included studies that used professional clinical assessments and formal diagnostic criteria as eligibility criteria, but studies using only self-report measurements were also included, thereby including more studies with stress-related problems, not always considered clinical diagnoses.

  • Studies looking for baseline predictors of return to work

  • Published or unpublished studies in English or a Scandinavian language from 1990 to 2018.

Exclusion criteria were:

  • studies where common mental disorders, seen either as a group of disorders or as a part of it (e.g., depression) were compared with another group of disorders (e.g., musculoskeletal disorders) without information on predictive factors within each group, since we would not be able to differentiate between diagnoses.

  • studies where a common mental disorder was a secondary diagnosis to either a somatic disease or another mental disorder.

  • studies where post-traumatic stress disorder (PTSD) was the primary target group

  • studies with a qualitative methodology.

Outcome

The outcome was full or partial RTW. During preliminary searches, we noticed heterogeneity among the studies with respect to the outcome. Some studies emphasized that RTW had to be stable, typically defined as a period of, e.g., 28 days in work after sick leave. Some studies used registers, whereas others used self-report measures as their primary source of RTW data. Some studies analyzed full and partial RTW separately, whereas others pooled the two outcomes into one. In our study, the primary outcome included both full and partial RTW. In studies reporting full and partial RTW separately, we included data on full RTW. Also, RTW can be measured either as a binary outcome (RTW or not) or as time to RTW, and we used both of these outcome definitions.

Literature search

We carried out a comprehensive systematic search in seven databases. The choice of databases and search strategy was discussed with a librarian with knowledge of the subject area. We searched Medline, Sociological Abstracts, Cochrane, OTseeker, PsycINFO, and Scopus. In addition, we searched ClinicalTrials.gov for unpublished studies. Keywords for the search strategy can be seen in Table 1.

Table 1 Keywords in the literature search

Quality assessment

We adopted a quality assessment tool used by Lagerveld et al. (2010), which is a 10-item instrument covering domains like selection of participants, measurement of variables, and control of confounding. Each item on the instrument is scored as either 1 (positive) or 0 (negative or unable to determine). The quality assessment was done independently by two reviewers (JF and LH), and any conflicts were discussed until consensus was reached. We did not exclude studies with lower quality, but aimed to perform analyses on differences in predictors between studies with high vs. low scores.

Data synthesis

The software Covidence was used to manage the abstract and full-text screening (2018). Two authors (JF and LH) independently assessed if studies were eligible. Any disagreement on eligibility was thoroughly discussed, and in case of lack of consensus, a third reviewer (LFE) was consulted. Reference lists of all included studies were thoroughly screened for relevant studies not included in the primary search. We organized the studies in tables, and before statistical analyses, we clustered the identified predictors into one of four domains: demographic, health-related, psychological, or work/contextual factors. Subsequently, we did a more detailed clustering based on their underlying theoretical construct. As an example, depression severity was measured with many different questionnaires aiming at the same theoretical construct. Some variables were pooled based on theoretical and psychometric similarities, even though they had different names.

Statistical analysis

We used Comprehensive Meta-analysis Version 3 for the statistical analyses. We conducted a meta-analysis on every variable that had at least two comparable results, both among common mental disorders and, if possible, in the subgroups. If studies provided insufficient data to be fully included, or in case of errors in data presentation, the authors would be contacted to provide the necessary data. Subsequently, if we did not receive the necessary data, the studies would be fully or partly excluded from the meta-analysis.

Based on findings from previous studies and preliminary searches, we selected potentially relevant predictor variables and sought to analyze the relationship between these and different pre-specified RTW time points in the following hierarchical order of analysis: 1. Common mental disorder → RTW any time point, 2. Common mental disorder → RTW < 3 months after baseline, 3. Common mental disorder → RTW 3–12 months after baseline, 4. Common mental disorder → RTW > 12 months after baseline, and 5. Subgroup analysis: depression, anxiety, or stress → RTW any time point. We used univariate analyses from the studies included. Additionally, we aimed at analyzing differences in studies assessed as high vs low quality.

Results in included studies were primarily reported as odds ratios (ORs), but some as hazard ratios (HRs) and means/SDs. We decided mainly to report OR and treated HR as OR, but we also conducted some analyses based on means/SDs converted to standardized mean differences. In some analyses, we re-calculated OR from the core data to permit comparison. Confidence intervals were set to 95%, and statistical significance was set as two-sided p < 0.05. We used inverse variance weights and random-effects models for every analysis. Heterogeneity of the analyses was assessed by the I² statistic, where low values indicate low between-study heterogeneity. To test for potential publication bias, we produced funnel plots for all results of the meta-analyses.
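The pooling step described above can be illustrated with a minimal sketch. It assumes the DerSimonian-Laird estimator for the between-study variance; the source does not state which estimator the software applied, so this is an assumption rather than a description of the authors' procedure. The function names and the example values are hypothetical and are not taken from the included studies.

```python
import math


def log_or_and_se(or_value, ci_low, ci_high, z=1.96):
    """Recover log(OR) and its standard error from an OR and its 95% CI."""
    log_or = math.log(or_value)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return log_or, se


def random_effects_pool(studies):
    """Pool (OR, ci_low, ci_high) tuples with a random-effects model.

    Assumption: DerSimonian-Laird estimate of tau^2 (not confirmed by the source).
    """
    effects, variances = [], []
    for or_value, lo, hi in studies:
        y, se = log_or_and_se(or_value, lo, hi)
        effects.append(y)
        variances.append(se ** 2)

    # Fixed-effect inverse variance weights, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1

    # Between-study variance tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I^2: percentage of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se_pooled),
        "ci_high": math.exp(pooled + 1.96 * se_pooled),
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical study-level ORs with 95% CIs, for illustration only.
example = [(1.20, 0.90, 1.60), (1.10, 0.85, 1.42), (1.35, 1.00, 1.82)]
result = random_effects_pool(example)
```

Working on the log scale keeps the confidence intervals symmetric around the pooled estimate before it is transformed back to an OR. When the estimated between-study variance is zero, the random-effects weights reduce to plain inverse variance weights.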

Results

The literature search resulted in 3486 references. After removing 1497 duplicates, 1989 abstracts were screened, which resulted in 145 articles that were full-text screened for eligibility (see Fig. 1). Of these, 32 studies were included (Netterstrøm et al. 2013, 2015; Nieuwenhuijsen et al. 2004, 2006; Nielsen et al. 2011, 2012; Post et al. 2006; Beurden et al. 2015; Victor et al. 2017, 2018; Virtanen et al. 2011; Volker et al. 2014; Wåhlin et al. 2012; Audhoe et al. 2012; Kausto et al. 2017; Roelen et al. 2012; Dewa et al. 2003; Hees et al. 2012; Huijs et al. 2017; Kronström et al. 2011; Vente et al. 2015; Eklund et al. 2013; Oostrom et al. 2010; Young and Russell 1995; Brouwer et al. 2010; Brouwers et al. 2009; Ekberg et al. 2015; Lagerveld et al. 2017; Lammerts et al. 2016; Mattila-Holappa et al. 2017). Studies were primarily excluded due to wrong study population (e.g., somatic diseases), wrong outcome, or wrong study design.

Fig. 1 Flowchart of studies in the review

Reviewing the 32 studies with RTW as the dependent variable, we identified 160 different potential predictor variables. Three studies had insufficient data or errors in data presentation; hence, 29 studies were included in the meta-analysis. The quality of the studies was rated independently by two reviewers and was generally assessed as high with a median score of 9 (0–10), and no studies had scores below 6. Due to these small differences in the quality assessment, no analyses on differences between low vs high-quality studies were done. An overview of the study characteristics and single-study predictors is presented in Table 2. We clustered the 160 potential predictors based on their underlying theoretical construct or psychometric properties, resulting in 25 groups of potential predictors.

Table 2 Characteristics of included studies

Meta-analyses on predictors of return to work at any time point, specified time points, and in diagnostic subgroups

Forest plots of all meta-analyses are shown in Fig. 2. The heterogeneity of the meta-analyses was generally low. One meta-analysis showed an I² of 78, but the mean I² across analyses was 15. For all analyses, funnel plots did not show any increased risk of publication bias.

Fig. 2 a–e Forest plots of meta-analyses

Among demographic and psychological factors, a large proportion of potential predictors could be included in the meta-analyses, with 4 of 7 of the demographic factors and 15 of 24 of the psychological factors. For health-related and work-related factors, the proportion of included predictors was much smaller, with 19 of 73 factors and 4 of 56 factors included, respectively. Overall, 27 meta-analyses were done at any time point measured with either OR (Victor et al. 2017) or standardized mean difference (Brouwers et al. 2006). A full overview of all potential predictors, and which could and could not be included in the meta-analyses, is presented in the supplementary material (Online resource 1: Overview of potential predictors).

For most of the potential predictor variables, there were only comparable data for analyses of RTW at any time point. Only sex, age, education, and positive expectations of length of sick leave had enough data to be analyzed at all specified time points. At the RTW < 3-month time point, only four variables could be included in the meta-analyses, seven were included for RTW at 3–12 months, and eight were included at RTW > 12 months. Regarding analyses in subgroups, seven studies had been included specifically on depression, and in this subgroup, meta-analyses could only be done on sex, age, partner status, and depressive symptom severity. Only five studies were identified in the stress subgroup and there were none in the anxiety group. Therefore, no comparable data existed to do meta-analyses in these diagnostic subgroups. Results of meta-analyses are presented below according to their category as either a demographic, health-related, psychological, or work-related factor.

Demographic factors

At any time point, women had a higher probability of RTW than men: OR: 1.16 (1.01–1.32). Meta-analyses on sex were also conducted at all specified time points and in the depression subgroup, but no significant results could be found in any of these analyses. No comparable data existed for other diagnostic subgroups.

Meta-analyses on education were done at any time point, and all specified time points, but in no diagnostic subgroups. Significant results were only found at > 12 months, where higher education was predictive of a higher probability of RTW: OR: 1.7 (1.17–2.50). Age (10-year increase) decreased the probability of RTW at any time point: OR: 0.79 (0.66–0.95), but when analyses were done on all specified time points, the decreased probability associated with higher age was only found at > 12 months: OR: 0.58 (0.46–0.73). Analyses were done in the depression subgroup, where age was not found to be a predictor. In the other diagnostic subgroups, no comparable data existed.

Age, measured as > 50 vs < 50, could only be analyzed at any time point, showing no significant results. Analyses on partner status were possible at any time point, 3–12 months, > 12 months, and in the depression subgroup. However, no significant predictors were found in these analyses.

Other demographic variables not included in meta-analyses, due to lack of comparable data, were ethnicity, socioeconomic position, and geographic region.

Health-related factors

Scoring high, compared to low, on self-reported depressive symptoms decreased the probability of RTW at any time point: OR: 0.97 (0.95–0.99). Specified time point analyses at 3–12 months and > 12 months found no significant results. No analyses could be done at the < 3 months time point or in other diagnostic subgroups. High self-reported stress was also associated with a lower probability of RTW at any time point: OR: 0.66 (0.49–0.87), or if measured as standardized mean difference: SDM: − 0.35 (− 0.53 to − 0.17), but no analyses could be done at specified time points or in diagnostic subgroups. Self-reported anxiety was not predictive of RTW at any time point measured as OR, but a lower mean score was associated with increased probability of RTW: SDM: − 0.26 (− 0.44 to − 0.08). No analyses could be done at specified time points or in diagnostic subgroups.

A lower mean somatization score was predictive of a higher probability of RTW at any time point: SDM: − 0.40 (− 0.62 to − 0.19). No further analyses of the variable were possible. Previous sickness absence within the last year with a mental health problem could only be analyzed at any time point and showed a decreased RTW probability: OR: 0.89 (0.86–0.92). Duration of the current episode and antidepressant use could also be analyzed at any time point, but neither was predictive of RTW probability. No comparable data were available for other health-related variables like self-perceived health, reason for sick leave, history of psychiatric treatment, or sleep and fatigue.
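As a sketch of how a standardized mean difference such as those reported above can be derived from group means and SDs, the snippet below computes Cohen's d with its approximate standard error. It assumes the difference is taken as the RTW group minus the non-RTW group, which matches the negative signs reported here but is not stated explicitly in the source. The function name and the input values are hypothetical.

```python
import math


def standardized_mean_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d (group A minus group B) and its approximate standard error."""
    # Pooled variance across the two groups.
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    d = (mean_a - mean_b) / math.sqrt(pooled_var)
    # Large-sample approximation of the variance of d.
    var_d = (n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b))
    return d, math.sqrt(var_d)


# Hypothetical symptom scores: group A returned to work, group B did not.
d, se = standardized_mean_difference(10.0, 5.0, 50, 12.0, 5.0, 50)
ci = (d - 1.96 * se, d + 1.96 * se)
```

A negative value under this convention means the group that returned to work had the lower mean symptom score at baseline.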

Psychological factors

Among psychological factors, positive expectations of length of sick leave could be analyzed at any time point and at all specified time points, but in no diagnostic subgroups. Significant results were found at any time point: OR: 1.50 (1.15–1.96) and < 3 months: OR: 1.64 (1.20–2.24), showing that positive expectations increased RTW probability.

Higher RTW self-efficacy also predicted a higher probability of RTW measured at any time point: OR: 1.83 (1.19–2.81), but this was not found in the analyses of 3–12 months or > 12 months. No comparable data existed for analyses of < 3 months or in diagnostic subgroups. General self-efficacy could only be analyzed at any time point, where a higher score predicted a higher RTW probability: OR: 1.68 (1.15–2.46).

Three personality variables could be analyzed at any time point, but not at specified time points or in diagnostic subgroups. Higher levels of neuroticism: OR: 0.93 (0.88–0.98) and openness: OR: 0.92 (0.87–0.97) decreased the probability of RTW, while higher levels of conscientiousness increased the probability: OR: 1.14 (1.07–1.21). Coping and mastery could only be analyzed at any time point, and no difference in RTW probability was seen. Many psychological factors, e.g., expression of emotions, locus of control, dysfunctional attitude, dispositional optimism, or other personality variables, could not be included in meta-analyses due to lack of comparable data.

Work-related factors

The only work-related factor that was predictive of RTW was the workability index (WAI): OR: 1.11 (1.02–1.21), where a higher score indicated a higher probability of RTW, measured at any time point. No analyses could be made at specified time points or in diagnostic subgroups. Social support and job demands could only be analyzed at any time point and were not predictive of RTW. Variables on working conditions, worker characteristics, job satisfaction, decision authority, supervisor support, and needs for better work climate could not be analyzed due to lack of comparable data.

Discussion

This systematic review and meta-analysis aimed to investigate predictors of RTW among people with common mental disorders. It provided an update that included more, and more recent, studies than previous reviews, and it differs from them by examining predictors of RTW probability at different time points during sick leave. Additionally, it included both common mental disorders as a general concept and the subgroups of stress, anxiety, and depression.

Major findings can be summarized in terms of the strength of the relationship between a predictor variable and the outcome variable in any single analysis as well as the consistency of a predictor variable’s relationship with the outcome variable at different time points and in subgroups.

Looking at the strength of single analysis predictors, high RTW self-efficacy, high general self-efficacy, positive expectations of length of sick leave, and high self-reported stress at any time point were the strongest predictors. Only four predictor variables (sex, age, education, and positive expectations of length of sick leave) could be analyzed at all time points. Partner status, depression severity, and RTW self-efficacy were included in some analyses at different time points, and for most of the variables analyzed at different time points, the results were not consistent across time points. The results suggest that variables important for RTW probability after, e.g., < 3 months of sick leave are not necessarily equally important after > 12 months and vice versa. For instance, higher education and younger age were associated with increased probability of RTW after > 12 months, but not at any of the other time points, just as positive RTW expectations were associated with RTW at < 3 months only. This is an important finding, as it indicates that predictors of RTW probability differ according to length of sick leave. However, in some of the meta-analyses, only two or three studies could be included. Very few diagnostic subgroup analyses were possible, and only in the depression subgroup, where no significant results were found.

Results from any time point analyses correspond with findings from earlier reviews, where age, RTW self-efficacy, gender, workability, history of sick leave, and depressive symptoms were predictors (Lagerveld et al. 2010; Ervasti et al. 2017; Cornelius et al. 2011), all with the same direction of prediction. However, we found that higher education increased RTW probability (but only at RTW > 12 months), in contrast to the findings of the review by Cornelius et al. Results from our meta-analysis correspond to findings from studies on people with severe mental disorders, where higher education is linked to a higher probability of being in competitive employment (Burke-Miller et al. 2006).

Only a few of the included single studies (Brouwers et al. 2006; Ekberg et al. 2015; Netterstrøm et al. 2015) and none of the previous reviews have analyzed whether predictors of RTW vary across different time points of sick leave. Our analyses at different time points pointed to a continuing lack of knowledge in the field. For instance, only 5 of 32 studies included analyses of predictors of RTW probability at < 3 months. More knowledge on which factors are predictive after < 3 months of sick leave compared to, e.g., > 12 months can help practitioners plan a proper RTW for people on sick leave.

Also, the focus of most included studies was on common mental disorders; hence, we know little about predictors in subgroups, especially among people with anxiety and stress. Within the four categories of variables, work-related factors were especially difficult to include in meta-analyses, with only 4 of 56 potential predictors included. The lack of significant predictors in the depression subgroup is in contrast with the previous reviews, which found health-related, demographic, and psychological factors to be predictive (Lagerveld et al. 2010; Ervasti et al. 2017).

How demographic, health-related, psychological, and work-related predictors are potentially interacting is worth discussing. At the individual level, it is difficult to imagine a person with severe depression having a high degree of self-efficacy, but there is a lack of knowledge on whether the effect of stress or depressive symptoms on RTW probability is confounded by the level of general or RTW self-efficacy. Also, what are the potential explanations for higher age decreasing the RTW probability? From a societal perspective, one could suggest that the labor market is more adapted to a younger population, thereby making it generally more difficult to get a job with age, and even more difficult to RTW after sick leave. Another explanation could be that the decreased probability of RTW with higher age is also confounded by other factors like a low RTW self-efficacy, which could be affected by, e.g., general self-efficacy, previous sick leaves, history of psychiatric treatment, or high scores on neuroticism.

Clinical implications and directions for future research

The findings of strong predictors within the psychological and demographic domains in this review have both clinical and research implications. Clinically, it is reasonable for practitioners to consider the severity of a patient's depressive or stress symptoms but also their age, self-efficacy, and RTW expectations when making individual estimations of the length of sick leave. This is in line with Norder et al. (2017), who designed a point system to estimate the RTW process based on age, depression/anxiety symptoms, own expectations, and educational level. Another implication is that since the effect of RTW interventions differs according to, e.g., baseline age or self-efficacy, the interventions should be designed specifically to accommodate these baseline factors. As an example, a recent Danish study (Rotger 2021) found that an RTW intervention targeting people on sick leave with complex problems had an effect on people with low baseline self-efficacy, but not on those with high self-efficacy.

Despite the size of this review and meta-analysis, not one study could be included specifically targeting anxiety disorders, either as a group of disorders or as specific disorders like panic disorder or generalized anxiety disorder. Among stress-related disorders, only four studies were included. Although depression, anxiety, and stress together are labeled as common mental disorders and have overlapping symptoms, they are different conditions, and more knowledge on specific predictors within these categories is warranted. Currently, a new study based on data from two large RCTs in the Danish IBBIS project (Poulsen et al. 2017a, b, 2018) analyzes predictors both in common mental disorders and in the subgroups of stress, anxiety, and depression.

Additionally, among work-related factors, there is a need for knowledge on potential predictors measured not only from the perspective of the individual on sick leave but also from the perspective of others, such as colleagues or supervisors. In this review, most predictor variables were based on individual self-report, particularly regarding measures of work-related factors. No meta-analyses could be conducted on objective variables like working hours, type of work, occupational code, or size of the company. Addressing this in future research would be relevant.

Strengths and limitations

The strength of this study was that, to our knowledge, it was the largest systematic review and meta-analysis on predictors of RTW for people on sick leave with common mental disorders. It included 32 studies in the review and 29 in the meta-analysis, which is more than any of the previous reviews have included. This meta-analysis also investigated RTW predictors at different time points, which has not been done in the previous reviews, and it attempted to find predictors within the diagnostic subgroups of depression, anxiety, and stress, which has previously only been done in reviews of depression.

We assessed the quality of included studies based on an instrument previously used within this field (Lagerveld et al. 2010). Although the quality assessment was done independently by two reviewers, the resulting quality scores were very high. Certain quality assessment components, e.g., the assessment of loss to follow-up, are not included in this instrument, unlike in, e.g., the Newcastle–Ottawa scale (Wells et al. 2012), which could be considered a limitation of our study. The meta-analyses were based on univariate analyses from included studies. Multivariate analyses can provide more knowledge on causal relationships, but comparing the results from many types of adjustments within each study would probably not provide a more complete picture of relevant predictors.

We conducted 27 parallel meta-analyses, which may have increased the risk of a type I error. However, bivariate analyses were conducted to quantify the strength and direction of each potential risk factor. We did consider other approaches, e.g., meta-regression, but this would have excluded studies with missing information on just one covariate, and would not have done much to reduce the risk of type I errors, but would inflate the risk of type II errors due to lack of power in meta-regressions. Furthermore, we firmly believe that identifying potential risk factors and protective factors is so important that it is preferable to have type I errors over type II errors.
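To give a sense of scale for the type I error concern, the sketch below computes the probability of at least one false-positive finding across a set of tests. It assumes the tests are independent, which is a simplification: the 27 meta-analyses draw on overlapping sets of studies, so the figure is a rough illustration rather than an estimate for this review. The function name is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    # Each test avoids a false positive with probability (1 - alpha).
    return 1.0 - (1.0 - alpha) ** n_tests


# 27 parallel analyses at a two-sided significance level of 0.05.
fwer = familywise_error_rate(27)
```

Under the independence assumption, the result is roughly 0.75, compared with 0.05 for a single test.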

Our choice of search terms may seem quite broad and may have resulted in many irrelevant papers; however, search terms were chosen based on terms used in the previous studies in the field and were not narrowed further for fear of missing relevant studies. Making a systematic search is always a matter of making the search precise enough to avoid too many irrelevant papers, but not so narrow that relevant studies are missed.

Furthermore, almost all of the studies included in the present review were from the Netherlands or Scandinavian countries, which may compromise generalizability. However, this is in line with other reviews (Ervasti et al. 2017) and is not related to bias in the selection of eligible studies, which were screened independently by two reviewers, nor do any of the inclusion or exclusion criteria favor studies from these countries.

A further limitation of the review is that there are many factors that we have not been able to investigate; for instance, institutional contexts and background characteristics not accounted for in the included studies, just as different workplace and work-related factors may have had an impact on RTW. Factors such as working conditions, worker characteristics, job satisfaction, supervisor support, and needs for better work climate could not be analyzed due to lack of comparable data. Investigating if predictors were modified or differed according to, e.g., sex would have been very interesting and relevant, but out of the scope of the present review.

Conclusion

In the present study, the results from the meta-analyses at any time point were consistent with earlier reviews in finding age, symptom severity, and self-efficacy/RTW expectations to be predictors of RTW. However, in contrast with earlier reviews, we found that higher education increased RTW probability, but only at time points beyond 12 months. The results from meta-analyses on specific time points indicated that predictors important earlier in the RTW process are not necessarily equally important later in the process and vice versa, which is important knowledge for practitioners and needs further investigation. The findings may be useful in the development of RTW interventions, which may need to be targeted differently at different time points in people’s sick leave. This review revealed a lack of studies on specified time points and in diagnostic subgroups. Overall, more studies in the area are needed, specifically studies examining whether predictors persist over the RTW process or change over time, as well as studies including more work-related factors. Moreover, studies in diagnostic subgroups are needed, specifically on stress-related disorders and anxiety disorders.