Introduction

Chronic whiplash-associated disorders (WADs) often include not only physical but also diverse psychological and cognitive impairments (Sterling, 2014), and their characteristics differ from those of chronic idiopathic neck pain (Coppieters et al., 2017; Ris et al., 2017). Clinical practice guidelines therefore suggest providing a Bio-Psycho-Social model of care, including psychological interventions, to individuals with chronic WADs (Scholten-Peeters et al., 2002).

One intervention commonly suggested by clinical practice guidelines is cognitive behavior therapy (CBT). CBT supports cognitive reconditioning and behavioral modification in specific activities (Butler et al., 2006; Flor & Turk, 1984; Morley, 2011). In 2015, Monticone et al. conducted a meta-analysis investigating the effects of CBT alone on chronic neck pain (Monticone et al., 2015). However, that meta-analysis combined patients with nonspecific neck pain and those with WADs, which limits its clinical implications given the different characteristics of chronic WADs and idiopathic neck pain. A new analysis restricted to participants with chronic WADs is therefore necessary.

Investigating the effects of CBT alone on chronic WADs through meta-analysis is an important step in weighing the advantages of including CBT in the Bio-Psycho-Social model of care. However, CBT is a psychological intervention, not a Bio-Psycho-Social intervention (Urits et al., 2019). Understanding the effects of combining physical interventions with CBT on chronic WADs is therefore clinically useful. In 2016, Shearer et al. investigated the effects of combining physical interventions and CBT on chronic WADs in a systematic review covering the literature from 1990 to 2015 (Shearer et al., 2016). However, they did not undertake data synthesis because multiple eligible studies were not available. We identified multiple randomized controlled trials (RCTs) eligible for meta-analysis (Michaleff et al., 2014; M. J. Stewart et al., 2007a, 2007b), as well as another eligible RCT published in 2020 (Andersen et al., 2020). An updated systematic review was therefore needed to understand the effects of combining physical interventions and CBT on chronic WADs.

This systematic review with meta-analysis has two purposes. The first is to investigate the effects of CBT on pain, disability, quality of life (QoL), and psychological parameters in patients with chronic WADs. The second is to investigate the effects of combining physical interventions with CBT, compared with those of CBT alone, on pain, disability, QoL, and psychological parameters in patients with chronic WADs.

Methods

Protocol Registration and Search Strategy

This review was preregistered in PROSPERO (CRD42020193904) and conducted according to the updated Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (Moher et al., 2009). The following items were modified after the initial PROSPERO registration: (1) review questions were added to investigate the effects of combining physical interventions and CBT; (2) chronic WADs were defined as symptoms lasting ≥ 3 months, revised from the originally registered ≥ 6 months; (3) single-author studies were also included; and (4) the PsycINFO database was excluded because of limited access.

One author (HK) systematically searched the following databases from inception to January 2021: CINAHL, Web of Science, MEDLINE, Embase, EMCare, and Physiotherapy Evidence Database (PEDro). The search strategies are presented in Online Appendix 1.

Study Selection

Screening and full-text inspection were performed by two authors (YK and TM) independently. Any disagreements on eligibility were resolved by discussion. Cross-referencing was performed with hand searches of the reference lists of studies included in the full-text screening.

Eligibility Criteria

All RCTs published as full-text articles were eligible for inclusion in this systematic review. No restrictions on language were employed.

Eligible participants were as follows: (1) adults (≥ 18 years of age) with WADs of grade I, II, or III in the Quebec Task Force Classification (Spitzer et al., 1995); (2) patients from primary, secondary, or tertiary care institutions; and (3) patients with any persistent symptoms, such as musculoskeletal pain, sensorimotor control disturbances, or psychological problems, lasting more than 3 months after the accident. Studies with the following participants were excluded: (1) patients with a cervical fracture or dislocation, (2) patients who sustained injuries to body areas other than the neck in the accident that caused the WAD, and (3) patients with previous WADs, preexisting neck pain, or previous neck surgery.

Eligible interventions were CBT with or without physical interventions. Because there is no consensus on a specific definition of CBT (Lamb et al., 2010), an intervention was classified as CBT in this study when it met the following criteria reported by Richmond et al. (2015): (1) the treatment was based on cognitive–behavioral principles, stated explicitly or implicitly (Fisher et al., 2018; Gatchel et al., 2007; Turk & Flor, 1984); (2) both cognitive and behavioral strategies were used in the same treatment package; (3) CBT was provided by an experienced healthcare professional; and (4) when multimodal treatments were provided, the intervention was assumed to be based on a CBT principle. Disagreements about whether an intervention qualified as CBT were resolved through discussion among the authors, by contacting the corresponding authors of the study for additional information, or by consulting a process paper associated with the study.

Eligible comparisons included any single type of intervention or a wait-and-see control.

The eligible primary outcomes were pain intensity, disability, and QoL, and the eligible secondary outcome was psychological status. Adverse events were also recorded where reported. For pain intensity, when more than one patient-reported outcome measure (PROM) was reported, a numerical rating scale was preferred, followed by a visual analog scale. For disability, when more than one PROM was reported, the Neck Disability Index (NDI) was used. For QoL, when more than one PROM was reported, the 36-Item Short Form Health Survey (SF-36) was preferred, followed by the 12-Item Short Form Health Survey (SF-12) and the EuroQol-5 Dimensions. For the SF-36 and SF-12, the physical and mental component summary scores were used in the analysis.
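The outcome-measure hierarchy above works as a priority lookup. The minimal sketch below assumes each trial's reported PROMs are available as a list of short labels. The labels (`NRS`, `VAS`, `NDI`, `SF-36`, `SF-12`, `EQ-5D`) and the function name `pick_prom` are hypothetical, and the sketch is illustrative rather than the procedure the authors coded.

```python
from typing import Dict, List, Optional

# Priority order transcribed from the text; the short labels are hypothetical.
PROM_PRIORITY: Dict[str, List[str]] = {
    "pain": ["NRS", "VAS"],
    "disability": ["NDI"],
    "qol": ["SF-36", "SF-12", "EQ-5D"],
}


def pick_prom(outcome: str, reported: List[str]) -> Optional[str]:
    """Return the PROM to analyze for one outcome of one trial."""
    if len(reported) == 1:
        return reported[0]  # a single reported PROM is used as-is
    for prom in PROM_PRIORITY[outcome]:
        if prom in reported:
            return prom  # first match in the stated priority order
    return None  # case not covered by the stated rules; needs a reviewer decision


print(pick_prom("pain", ["VAS", "NRS"]))
print(pick_prom("qol", ["EQ-5D", "SF-12"]))
```

Returning `None` when several PROMs are reported but none is on the list keeps the sketch from silently inventing a rule that the text does not state.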

Risk of Bias Assessment

The risk of bias was assessed using the PEDro scores (Maher et al., 2003). We used the scores reported in the PEDro (www.pedro.org.au). When no scores were available in the database, two authors (YK and TM) independently assessed the PEDro scores.

Disagreements were resolved by a third author (HK). Moderate to high quality studies were defined as studies with a PEDro score of ≥ 6 (Maher et al., 2003).

Data Extraction

Two authors (YK and TM) independently extracted data, and disagreements were resolved by discussion moderated by a third author (HK). The extracted data were (1) the country where data collection was performed, study design, setting and duration of the intervention, profession providing the intervention, and number of intervention sessions; (2) participants' diagnosis, age, gender, number, and pain duration; (3) intervention type and comparison; and (4) adverse events and dropouts, including reasons, and the means and standard deviations of the PROM scores for pain, disability, QoL, and psychological status at short-, intermediate-, and long-term follow-ups.

The short, intermediate, and long terms were defined according to previous studies (Gross et al., 2015; Monticone et al., 2015). Short term was defined as less than 3 months after the start of the intervention; when multiple eligible follow-up points were available, the time point closest to 4 weeks was used. Intermediate term was defined as ≥ 3 months and less than 12 months after the start of the intervention; the time point closest to 6 months was chosen when multiple eligible follow-up points were available. Long term was defined as ≥ 12 months after the start of the intervention; the time point closest to 1 year was chosen when multiple eligible time points were available.

When such data were lacking in the published study, we emailed the corresponding author to request the missing data. A reminder email was sent 2 weeks after the first contact. When no response was received after the second reminder, we considered the author uncontactable.
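The time-window rules above can be expressed as a "closest to target within a window" selection. The sketch below is hypothetical. It converts the month-based windows to weeks (3 months ≈ 13 weeks, 6 months ≈ 26 weeks, 12 months ≈ 52 weeks), and that conversion, together with the names `WINDOWS` and `select_follow_ups`, is an assumption for illustration only.

```python
from typing import Dict, List, Optional

# Assumed week-based windows: (lower bound inclusive, upper bound exclusive, target week).
# 3 months ~ 13 weeks, 6 months ~ 26 weeks, 12 months ~ 52 weeks (approximation).
WINDOWS = {
    "short": (0.0, 13.0, 4.0),           # < 3 months, target 4 weeks
    "intermediate": (13.0, 52.0, 26.0),  # >= 3 and < 12 months, target 6 months
    "long": (52.0, float("inf"), 52.0),  # >= 12 months, target 1 year
}


def select_follow_ups(weeks: List[float]) -> Dict[str, Optional[float]]:
    """Pick, for each term, the available follow-up closest to that term's target."""
    chosen: Dict[str, Optional[float]] = {}
    for term, (low, high, target) in WINDOWS.items():
        # Baseline (week 0) is never a follow-up point.
        eligible = [w for w in weeks if w > 0 and low <= w < high]
        chosen[term] = min(eligible, key=lambda w: abs(w - target)) if eligible else None
    return chosen


print(select_follow_ups([6, 12, 26, 52, 104]))
```

When two points are equally close to the target, `min` keeps the first one listed. The text does not specify a tie-break, so this is also an assumption.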

Data Synthesis and Analysis

When multiple datasets of similar outcomes were available, a meta-analysis was performed using Review Manager 5 (The Nordic Cochrane Centre, København Ø, Denmark). First, the meta-analysis was attempted using change values from the baseline to each follow-up point. When the change values were unavailable, the values at each follow-up point were used for the meta-analysis.
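Review Manager 5 computes the per-study SMDs internally. For illustration only, the sketch below computes Hedges' adjusted g and an approximate variance for a single trial from group means, SDs, and sample sizes. It prefers change-from-baseline summaries when they are reported and otherwise falls back to follow-up values. The use of Hedges' g as the SMD estimator is an assumption. The variance is a common large-sample approximation that may differ slightly from the exact formula RevMan applies. All function names are hypothetical.

```python
import math
from typing import Optional, Tuple

Summary = Tuple[float, float, int]  # (mean, SD, n) for one group


def choose_summary(change: Optional[Summary], final: Summary) -> Summary:
    """Prefer change-from-baseline summaries; fall back to follow-up values."""
    return change if change is not None else final


def hedges_g(m_cbt: float, sd_cbt: float, n_cbt: int,
             m_ctl: float, sd_ctl: float, n_ctl: int,
             reverse: bool = False) -> Tuple[float, float]:
    """Return (g, approximate variance of g) for CBT minus comparator."""
    n_total = n_cbt + n_ctl
    pooled_sd = math.sqrt(((n_cbt - 1) * sd_cbt ** 2 + (n_ctl - 1) * sd_ctl ** 2)
                          / (n_total - 2))
    d = (m_cbt - m_ctl) / pooled_sd
    if reverse:
        d = -d  # flip direction so all scales point the same way
    j = 1 - 3 / (4 * n_total - 9)  # small-sample correction factor
    var_d = n_total / (n_cbt * n_ctl) + d ** 2 / (2 * n_total)
    return j * d, j ** 2 * var_d


cbt = choose_summary(None, (3.0, 2.0, 20))
ctl = choose_summary(None, (4.0, 2.0, 20))
print(hedges_g(*cbt, *ctl))
```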

The standardized mean difference (SMD) with 95% confidence intervals (CIs) was calculated using a random-effects model. Where necessary, scores were reversed so that high scores indicated a healthy status. Heterogeneity among trials was assessed with the I² statistic and interpreted as follows: 0–40%, insignificant heterogeneity; 30–60%, moderate heterogeneity; 50–90%, substantial heterogeneity; and 75–100%, considerable heterogeneity (Deeks et al., 2019). Effect sizes were interpreted using Cohen's thresholds: 0.2, small; 0.5, moderate; and 0.8, large (Cohen, 1988).
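The sketch below shows how random-effects pooling and the interpretive thresholds fit together. It pools per-study SMDs with the inverse-variance DerSimonian-Laird method, then labels I² and the pooled SMD using the bands given above. The choice of DerSimonian-Laird is an assumption about the estimator, and the 1.96 multiplier assumes a normal approximation for the 95% CI. The "below small" label is hypothetical, because the text names no category under 0.2.

```python
import math
from typing import Dict, List, Tuple


def dersimonian_laird(effects: List[float], variances: List[float]) -> Dict[str, object]:
    """Pool per-study SMDs with an inverse-variance DerSimonian-Laird random-effects model."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    p = math.erfc(abs(pooled / se) / math.sqrt(2))  # two-sided normal p-value
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"smd": pooled, "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
            "p": p, "tau2": tau2, "i2": i2}


# Overlapping I² bands as listed in the text (Deeks et al., 2019).
I2_BANDS: List[Tuple[float, float, str]] = [
    (0, 40, "insignificant"), (30, 60, "moderate"),
    (50, 90, "substantial"), (75, 100, "considerable"),
]


def i2_labels(i2: float) -> List[str]:
    """Return every band that contains the I² value (bands overlap by design)."""
    return [label for low, high, label in I2_BANDS if low <= i2 <= high]


def effect_label(smd: float) -> str:
    """Classify |SMD| with Cohen's 0.2 / 0.5 / 0.8 thresholds."""
    size = abs(smd)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"  # hypothetical label


res = dersimonian_laird([-0.4, -0.1, -0.3], [0.05, 0.04, 0.06])
print(round(res["smd"], 2), i2_labels(res["i2"]), effect_label(res["smd"]))
```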

The overall quality of evidence for each meta-analysis was evaluated using the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) approach (Furlan et al., 2015; Pollock et al., 2016). The GRADE approach has five domains. Because our review included RCTs only, the evidence started at high quality and was downgraded by one or two levels per domain as follows: (1) risk of bias was downgraded one level when more than 25% of the participants came from studies of low methodological quality (i.e., PEDro score < 6); (2) inconsistency was downgraded one level when the I² value was more than 75%; (3) indirectness was downgraded one level when the available evidence on population, interventions, comparisons, or outcomes differed from that defined in the inclusion criteria of the review; (4) imprecision was downgraded two levels when fewer than 100 participants were included in the pooled analysis and one level when fewer than 200 were included (Pollock et al., 2016); and (5) publication bias was downgraded one level when a funnel plot of at least 10 studies suggested publication bias. Two authors (YK and TM) independently rated the GRADE scores, and disagreements were resolved by discussion.
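The five downgrading rules above can be read as a small decision procedure. The sketch below encodes them as stated: it starts at "high" and subtracts levels, with a floor at "very low." The `PooledEvidence` fields and the threshold parameter names are hypothetical. The imprecision cut-offs are exposed as parameters so they can be aligned with the values actually applied in Online Appendix 3.

```python
from dataclasses import dataclass

LEVELS = ["very low", "low", "moderate", "high"]


@dataclass
class PooledEvidence:
    n_participants: int         # participants in the pooled analysis
    n_low_quality: int          # participants from trials with PEDro score < 6
    i2: float                   # I² in percent
    indirect: bool              # PICO differs from the review's inclusion criteria
    n_studies: int
    funnel_suggests_bias: bool = False


def grade(e: PooledEvidence, very_serious_below: int = 100, serious_below: int = 200) -> str:
    """Return a GRADE level for one pooled analysis of RCTs (start at high)."""
    downgrade = 0
    if e.n_low_quality / e.n_participants > 0.25:
        downgrade += 1  # risk of bias
    if e.i2 > 75:
        downgrade += 1  # inconsistency
    if e.indirect:
        downgrade += 1  # indirectness
    if e.n_participants < very_serious_below:
        downgrade += 2  # very serious imprecision
    elif e.n_participants < serious_below:
        downgrade += 1  # serious imprecision
    if e.n_studies >= 10 and e.funnel_suggests_bias:
        downgrade += 1  # publication bias (funnel plot needs >= 10 studies)
    return LEVELS[max(0, 3 - downgrade)]


print(grade(PooledEvidence(n_participants=46, n_low_quality=0, i2=0.0,
                           indirect=False, n_studies=2)))
```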

Subgroup Analysis

Subgroup analyses were conducted to compare the effectiveness of CBT alone, and of CBT combined with physical interventions, with that of specific comparators.

Results

Study Selection

Figure 1 presents the flow of study selection. The two reports by Söderlund and Lindberg (2001, 2007) and the two reports by Wicksell et al. (2008, 2010) each came from a single study project and were therefore each treated as one study. The risk of bias was assessed in eight studies (Andersen et al., 2020; Dunne et al., 2012; Ehrenborg & Archenholtz, 2010; Michaleff et al., 2014; Pato et al., 2010; Söderlund & Lindberg, 2001, 2007; M. J. Stewart et al., 2007a, 2007b; Wicksell et al., 2008; Wicksell et al., 2010). Table 1 presents the results of the risk of bias assessment. Six studies (Andersen et al., 2020; Dunne et al., 2012; Michaleff et al., 2014; Söderlund & Lindberg, 2001, 2007; M. J. Stewart et al., 2007a, 2007b; Wicksell et al., 2008; Wicksell et al., 2010) had a low risk of bias, and two studies (Ehrenborg & Archenholtz, 2010; Pato et al., 2010) had a high risk of bias.

Fig. 1

PRISMA flow chart demonstrating the study search results

Table 1 PEDro scores of the studies included in this systematic review

Study Characteristics

The eight studies are summarized in Table 2, and their detailed characteristics are presented in Online Appendix 2. Andersen et al. (2020) compared trauma-focused CBT plus exercise with supportive therapy plus exercise. Söderlund and Lindberg (2001, 2007) compared CBT plus physical therapy with physical therapy alone. Pato et al. (2010) compared CBT in addition to other treatments (physical therapy, infiltration, or medication) with the other treatments alone. Ehrenborg and Archenholtz (2010) compared CBT plus surface electromyography biofeedback training with CBT alone. Dunne et al. (2012) and Wicksell et al. (2008, 2010) compared CBT with a wait-and-see control group. Michaleff et al. (2014) and Stewart et al. (2007a, 2007b) compared CBT combined with a comprehensive exercise program with advice alone.

Table 2 Summary of the eight studies included in this systematic review

One study (Ehrenborg & Archenholtz, 2010) was ineligible for the meta-analysis because it compared CBT combined with surface electromyography biofeedback training with CBT alone. Therefore, seven studies (Andersen et al., 2020; Dunne et al., 2012; Michaleff et al., 2014; Pato et al., 2010; Söderlund & Lindberg, 2001, 2007; M. J. Stewart et al., 2007a, 2007b; Wicksell et al., 2008; Wicksell et al., 2010) were included in the meta-analysis comparing CBT with all types of comparisons. Two studies comparing CBT with a wait-and-see control group (Dunne et al., 2012; Wicksell et al., 2008, 2010) were eligible for the subgroup analysis investigating the effects of CBT alone. Two studies comparing CBT combined with a comprehensive exercise program with advice alone (Michaleff et al., 2014; M. J. Stewart et al., 2007a, 2007b) were eligible for the subgroup analysis investigating the combined effects of CBT and physical interventions.

Across the eight studies, CBT was provided by psychologists in four studies (Andersen et al., 2020; Dunne et al., 2012; Pato et al., 2010; Wicksell et al., 2008, 2010) and by physical therapists in four studies (Ehrenborg & Archenholtz, 2010; Michaleff et al., 2014; Söderlund & Lindberg, 2001, 2007; M. J. Stewart et al., 2007a, 2007b). The corresponding authors were not contacted to resolve doubts about the type and treatment characteristics of CBT. Three studies (Andersen et al., 2020; Michaleff et al., 2014; M. J. Stewart et al., 2007a, 2007b) evaluated adverse events of CBT, and none observed serious adverse events. Minor adverse events, including muscle soreness, stiffness, headaches, and/or exacerbation of existing symptoms, were reported in the CBT group (Table 2).

Meta-analysis

Only one study reported change values from baseline to each follow-up point (Wicksell et al., 2010). No additional data were available for the other studies, so the values at each follow-up point were used in the meta-analysis. The two authors did not disagree on any GRADE rating.

CBT versus All Types of Comparisons

For short-term pain, 372 patients with chronic WADs from six studies (Andersen et al., 2020; Dunne et al., 2012; Pato et al., 2010; Söderlund & Lindberg, 2001, 2007; M. J. Stewart et al., 2007a, 2007b; Wicksell et al., 2008; Wicksell et al., 2010) were included in the meta-analysis. The forest plot is presented in Fig. 2. No statistically significant overall effect was observed (p = 0.17; SMD, −0.21; 95% CI, −0.50 to 0.09), providing no evidence that CBT was more effective than the comparison conditions in reducing pain at the short-term follow-up. The I² value was 43%, indicating moderate heterogeneity. Because of serious imprecision (downgraded one level), the quality of evidence was rated moderate (Online Appendix 3).

Fig. 2

Forest plot of CBT versus all types of comparisons on pain in the short, intermediate and long term

For intermediate-term pain, 348 patients with chronic WADs from five studies (Andersen et al., 2020; Michaleff et al., 2014; Pato et al., 2010; Söderlund & Lindberg, 2001, 2007; Wicksell et al., 2008, 2010) were included in the meta-analysis. The forest plot is presented in Fig. 2. No statistically significant overall effect was observed (p = 0.73; SMD, −0.04; 95% CI, −0.25 to 0.17), providing no evidence that CBT was more effective than the comparison conditions in reducing pain at the intermediate-term follow-up. The I² value was 0%, indicating insignificant heterogeneity. Because of serious imprecision (downgraded one level), the quality of evidence was rated moderate (Online Appendix 3).

For long-term pain, 361 patients with chronic WADs from three studies (Andersen et al., 2020; Michaleff et al., 2014; M. J. Stewart et al., 2007a, 2007b) were included in the meta-analysis. The forest plot is presented in Fig. 2. No statistically significant overall effect was observed (p = 0.58; SMD, −0.08; 95% CI, −0.36 to 0.20), providing no evidence that CBT was more effective than the comparison conditions in reducing pain at the long-term follow-up. The I² value was 44%, indicating moderate heterogeneity. Because of serious imprecision (downgraded one level), the quality of evidence was rated moderate (Online Appendix 3).
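Each pooled result above is reported with a symmetric 95% CI, so the standard error and the two-sided p-value can be recovered approximately as a consistency check: SE ≈ (upper − lower) / (2 × 1.96) and z = SMD / SE. This is a hedged sketch under a normal approximation, and the function name is hypothetical. The published SMDs and CIs are rounded to two decimals, so small discrepancies are expected. For short-term pain, for example, the sketch recovers p ≈ 0.16, whereas the reported value is 0.17.

```python
import math
from typing import Tuple


def se_and_p_from_ci(smd: float, lower: float, upper: float,
                     z_crit: float = 1.96) -> Tuple[float, float]:
    """Back-calculate SE and a two-sided p-value from an SMD and its 95% CI."""
    se = (upper - lower) / (2 * z_crit)  # CI half-width divided by the critical z
    z = smd / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, p


# Short-term pain, CBT versus all types of comparisons (values from the text above).
se, p = se_and_p_from_ci(-0.21, -0.50, 0.09)
print(f"SE = {se:.3f}, p = {p:.2f}")
```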

For short-term disability, 372 patients with chronic WADs from six studies (Andersen et al., 2020; Dunne et al., 2012; Pato et al., 2010; Söderlund & Lindberg, 2001, 2007; M. J. Stewart et al., 2007a, 2007b; Wicksell et al., 2008; Wicksell et al., 2010) were included in the meta-analysis. The forest plot is presented in Fig. 3. No statistically significant overall effect was observed (p = 0.18; SMD, −0.20; 95% CI, −0.50 to 0.10), providing no evidence that CBT was more effective than the comparison conditions in reducing disability at the short-term follow-up. The I² value was 45%, indicating moderate heterogeneity. Because of serious imprecision (downgraded one level), the quality of evidence was rated moderate (Online Appendix 3).

Fig. 3

Forest plot of CBT versus all types of comparisons on disability in the short, intermediate and long term

For intermediate-term disability, 348 patients with chronic WADs from five studies (Andersen et al., 2020; Michaleff et al., 2014; Pato et al., 2010; Söderlund & Lindberg, 2001, 2007; Wicksell et al., 2008, 2010) were included in the meta-analysis. The forest plot is presented in Fig. 3. No statistically significant overall effect was observed (p = 0.68; SMD, −0.05; 95% CI, −0.31 to 0.20), providing no evidence that CBT was more effective than the comparison conditions in reducing disability at the intermediate-term follow-up. The I² value was 23%, indicating insignificant heterogeneity. Because of serious imprecision (downgraded one level), the quality of evidence was rated moderate (Online Appendix 3).

For long-term disability, 361 patients with chronic WADs from three studies (Andersen et al., 2020; Michaleff et al., 2014; M. J. Stewart et al., 2007a, 2007b) were included in the meta-analysis. The forest plot is presented in Fig. 3. No statistically significant overall effect was observed (p = 0.20; SMD, −0.18; 95% CI, −0.46 to 0.10), providing no evidence that CBT was more effective than the comparison conditions in reducing disability at the long-term follow-up. The I² value was 42%, indicating moderate heterogeneity. Because of serious imprecision (downgraded one level), the quality of evidence was rated moderate (Online Appendix 3).

For short-term QoL, 247 patients with chronic WADs from three studies (Andersen et al., 2020; Dunne et al., 2012; M. J. Stewart et al., 2007a, 2007b) were included in the meta-analysis of SF-36 scores. The forest plot for the SF-36 physical component summary score is presented in Fig. 4, and the forest plot for the mental component summary score is presented in Fig. 5. No statistically significant overall effect was observed for the physical component summary (p = 0.08; SMD, −0.26; 95% CI, −0.55 to 0.03), providing no evidence that CBT was more effective than the comparison conditions in improving physical QoL at the short-term follow-up. For the mental component summary, CBT had a statistically significant small overall effect (p = 0.007; SMD, −0.35; 95% CI, −0.60 to −0.10), indicating that CBT was more effective than the comparison conditions in improving mental QoL at the short-term follow-up. The I² value was 18% for the physical component summary score and 0% for the mental component summary score, both indicating insignificant heterogeneity. Because of serious imprecision (downgraded one level), the quality of evidence was rated moderate (Online Appendix 3).

Fig. 4

Forest plot of CBT versus all types of comparisons on quality of life in the short, intermediate and long term (physical component summary)

Fig. 5

Forest plot of CBT versus all types of comparisons on quality of life in the short, intermediate and long term (mental component summary)

For intermediate-term QoL, 223 patients with chronic WADs from two studies (Andersen et al., 2020; Michaleff et al., 2014) were included in the meta-analysis of SF-36 scores. The forest plot for the SF-36 physical component summary score is presented in Fig. 4, and the forest plot for the mental component summary score is presented in Fig. 5. CBT had no statistically significant overall effect on either the physical component summary score (p = 0.90; SMD, −0.02; 95% CI, −0.33 to 0.29) or the mental component summary score (p = 0.28; SMD, −0.14; 95% CI, −0.41 to 0.12), providing no evidence that CBT was more effective than the comparison conditions in improving QoL at the intermediate-term follow-up. The I² value was 24% for the physical component summary score and 0% for the mental component summary score, both indicating insignificant heterogeneity. Because of serious imprecision (downgraded one level), the quality of evidence was rated moderate (Online Appendix 3).

For long-term QoL, 361 patients with chronic WADs from three studies (Andersen et al., 2020; Michaleff et al., 2014; M. J. Stewart et al., 2007a, 2007b) were included in the meta-analysis of SF-36 scores. The forest plot for the SF-36 physical component summary score is presented in Fig. 4, and the forest plot for the mental component summary score is presented in Fig. 5. CBT had no statistically significant overall effect on either the physical component summary score (p = 0.21; SMD, −0.13; 95% CI, −0.34 to 0.07) or the mental component summary score (p = 0.45; SMD, −0.08; 95% CI, −0.29 to 0.13), providing no evidence that CBT was more effective than the comparison conditions in improving QoL at the long-term follow-up. The I² value was 0%, indicating insignificant heterogeneity. Because of serious imprecision (downgraded one level), the quality of evidence was rated moderate (Online Appendix 3).

For the secondary outcomes in the short term (fear of physical activity, anxiety, depression, and posttraumatic stress), 135 patients with chronic WADs from three studies (Andersen et al., 2020; Dunne et al., 2012; Wicksell et al., 2008, 2010) were included in the meta-analysis. The forest plots and further details of the results are presented in Online Appendix 4. At the short-term follow-up, CBT was more effective than the comparison conditions in reducing fear of physical activity (p = 0.03; SMD, −0.70; 95% CI, −1.34 to −0.07), anxiety (p = 0.02; SMD, −0.62; 95% CI, −1.16 to −0.08), and depression (p = 0.005; SMD, −0.68; 95% CI, −1.15 to −0.20). It was not more effective in reducing posttraumatic stress (p = 0.34; SMD, −0.28; 95% CI, −0.87 to 0.30). Because of very serious imprecision (downgraded two levels), the quality of evidence was rated low for all secondary outcomes (Online Appendix 3).

For the secondary outcomes in the intermediate term (fear of physical activity, anxiety, and depression), 98 patients with chronic WADs from two studies (Andersen et al., 2020; Wicksell et al., 2008, 2010) were included in the meta-analysis. The forest plots and further details of the results are presented in Online Appendix 4. At the intermediate-term follow-up, CBT was more effective than the comparison conditions in reducing anxiety (p = 0.03; SMD, −0.44; 95% CI, −0.84 to −0.04). It was not more effective in reducing fear of physical activity (p = 0.48; SMD, −0.24; 95% CI, −0.92 to 0.43) or depression (p = 0.10; SMD, −0.70; 95% CI, −1.53 to 0.14). Because of very serious imprecision (downgraded two levels), the quality of evidence was rated low for all secondary outcomes (Online Appendix 3).

Subgroup Analysis: CBT Alone versus Wait-and-See Control

For short-term pain, 46 patients with chronic WADs from two studies (Dunne et al., 2012; Wicksell et al., 2008, 2010) were included in the meta-analysis. The forest plot is presented in Fig. 6. No statistically significant overall effect was observed (p = 0.11; SMD, −0.48; 95% CI, −1.07 to 0.11), providing no evidence that CBT alone was more effective than the wait-and-see control in reducing pain at the short-term follow-up. The I² value was 0%, indicating insignificant heterogeneity. Because the pooled sample was very small, imprecision was rated very serious (downgraded two levels), and the quality of evidence was rated low (Online Appendix 3).

Fig. 6

Forest plot of CBT alone versus wait-and-see control on pain in the short term

For short-term disability, 46 patients with chronic WADs from two studies (Dunne et al., 2012; Wicksell et al., 2008, 2010) were included in the meta-analysis. The forest plot is presented in Fig. 7. CBT alone had a statistically significant moderate overall effect (p = 0.05; SMD, −0.61; 95% CI, −1.21 to −0.01), indicating that CBT alone was more effective than the wait-and-see control in reducing disability at the short-term follow-up. The I² value was 0%, indicating insignificant heterogeneity. Because the pooled sample was very small, imprecision was rated very serious (downgraded two levels), and the quality of evidence was rated low (Online Appendix 3).

Fig. 7

Forest plot of CBT alone versus wait-and-see control on disability in the short term

For the secondary outcomes in the short term (fear of physical activity, anxiety, depression, and posttraumatic stress), 46 patients with chronic WADs from two studies (Dunne et al., 2012; Wicksell et al., 2008, 2010) were included in the meta-analysis. The forest plots and further details of the results are presented in Online Appendix 4. CBT alone had statistically significant large overall effects, indicating that it was more effective than the wait-and-see control at the short-term follow-up in reducing fear of physical activity (p = 0.001; SMD, −1.04; 95% CI, −1.67 to −0.41), anxiety (p = 0.002; SMD, −0.97; 95% CI, −1.59 to −0.35), and depression (p = 0.001; SMD, −1.04; 95% CI, −1.66 to −0.41). CBT alone was not more effective than the wait-and-see control in reducing posttraumatic stress at the short-term follow-up (p = 0.34; SMD, −0.28; 95% CI, −0.87 to 0.30). Because of very serious imprecision (downgraded two levels), the quality of evidence was rated low for all secondary outcomes (Online Appendix 3).

Subgroup Analysis: CBT in Addition to Physical Interventions versus Advice Only

For long-term pain, 282 patients with chronic WADs from two studies (Michaleff et al., 2014; M. J. Stewart et al., 2007a, 2007b) were included in the meta-analysis. The forest plot is presented in Fig. 8. CBT in addition to physical interventions had no statistically significant overall effect (p = 0.09; SMD, −0.20; 95% CI, −0.43 to 0.03), providing no evidence that it was more effective than advice only in reducing pain at the long-term follow-up. The I² value was 0%, indicating insignificant heterogeneity. Because of serious imprecision (downgraded one level), the quality of evidence was rated moderate (Online Appendix 3).

Fig. 8

Forest plot of CBT in addition to physical interventions versus advice only on pain in the long term

For long-term disability, 282 patients with chronic WADs from two studies (Michaleff et al., 2014; M. J. Stewart et al., 2007a, 2007b) were included in the meta-analysis. The forest plot is presented in Fig. 9. CBT in addition to physical interventions had a statistically significant small overall effect (p = 0.01; SMD, −0.29; 95% CI, −0.53 to −0.06), indicating that it was more effective than advice only in reducing disability at the long-term follow-up. The I² value was 0%, indicating insignificant heterogeneity. Because of serious imprecision (downgraded one level), the quality of evidence was rated moderate (Online Appendix 3).

Fig. 9

Forest plot of CBT in addition to physical interventions versus advice only on disability in the long term

For long-term QoL, 282 patients with chronic WADs from two studies (Michaleff et al., 2014; M. J. Stewart et al., 2007a, 2007b) were included in the meta-analysis of SF-36 scores. The forest plot for the SF-36 physical component summary score is presented in Fig. 10, and the forest plot for the mental component summary score is presented in Fig. 11. CBT in addition to physical interventions had no statistically significant overall effect on either the physical component summary score (p = 0.09; SMD, −0.20; 95% CI, −0.44 to 0.03) or the mental component summary score (p = 0.32; SMD, −0.12; 95% CI, −0.35 to 0.11), providing no evidence that it was more effective than advice only in improving QoL at the long-term follow-up. The I² value was 0%, indicating insignificant heterogeneity. Because of serious imprecision (downgraded one level), the quality of evidence was rated moderate (Online Appendix 3).

Fig. 10

Forest plot of CBT in addition to physical interventions versus advice only on quality of life in the long term (physical component summary)

Fig. 11

Forest plot of CBT in addition to physical interventions versus advice only on quality of life in the long term (mental component summary)

Discussion

To the authors' knowledge, this is the first meta-analysis to investigate the effects of CBT alone and of CBT combined with physical interventions in patients with chronic WADs. Eight studies were reviewed in detail, and most RCTs (six of eight) had a low risk of bias. Moderate-quality evidence indicated that CBT was not more effective than the comparison interventions for most primary outcomes. In the subgroup analysis of CBT alone, data synthesis was possible only for the short term with two RCTs (Dunne et al., 2012; Wicksell et al., 2008, 2010), and the evidence for all findings was of low quality. For the combination of CBT and physical interventions, data synthesis was possible only for the comparison between CBT with exercise and advice only in the long term with two RCTs (Michaleff et al., 2014; M. J. Stewart et al., 2007a, 2007b), and the evidence for all findings was of moderate quality.

The Meta-Analysis for the Effects of CBT Compared with Those of All Types of Comparisons

No statistically significant overall effect was observed for most primary outcomes, with the exception of QoL (mental component summary) at short-term follow-up, indicating that CBT was no more effective than the comparator interventions. However, this result may have been affected by heterogeneity in the interventions and comparisons of the included studies. Several studies (Andersen et al., 2020; Michaleff et al., 2014; Pato et al., 2010; Söderlund & Lindberg, 2001, 2007; M. J. Stewart et al., 2007a, 2007b) evaluated the effectiveness of CBT as a supplement to other treatments or as part of a comprehensive exercise program. Additionally, the comparison group varied between trials: two trials (Dunne et al., 2012; Wicksell et al., 2008, 2010) used a wait-list control, whereas the other studies compared CBT with other treatments (e.g., medication, acupuncture, physical therapy, naprapathy, and osteopathy) or with advice alone. Therefore, these results should be interpreted with caution, because the effects of CBT are difficult to separate from those of the other treatments or exercises.

None of the comparisons provided high-quality evidence either for or against the effectiveness of CBT. The main reason for downgrading the quality of evidence was serious imprecision, which has methodological causes: small sample sizes are considered acceptable in behavioral science research (Shearer et al., 2016), and the total sample size was below 400 participants for every outcome in the included studies, a common rule-of-thumb threshold for judging imprecision in continuous outcomes. Therefore, further RCTs are required to improve the quality of evidence on CBT effectiveness.
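To illustrate why totals below 400 participants limit precision, the normal-approximation sample-size formula for two equal arms, n per arm = 2(z₁₋α/₂ + z₁₋β)² / d², can be evaluated directly. This is a rough sketch that assumes a single two-arm comparison with equal allocation, two-sided α = 0.05, and 80% power; it ignores between-study variance, is not the GRADE procedure itself, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def _z_sum(alpha: float, power: float) -> float:
    """Sum of z_{1-alpha/2} and z_{power} for a two-sided test."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def n_per_group(smd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants per arm needed to detect `smd` with two equal arms."""
    return ceil(2 * _z_sum(alpha, power) ** 2 / smd ** 2)


def detectable_smd(total_n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest SMD detectable when `total_n` is split into two equal arms."""
    per_group = total_n // 2
    return _z_sum(alpha, power) * sqrt(2 / per_group)


print(n_per_group(0.20))              # 393 per arm for a small effect (SMD 0.20)
print(round(detectable_smd(400), 2))  # 0.28 with 400 participants in total
print(round(detectable_smd(282), 2))  # 0.33 with 282 participants in total
```

Under these assumptions, 400 participants can reliably detect only an SMD of about 0.28, and the 282 patients pooled for long-term QoL only about 0.33, both larger in magnitude than the SMDs of −0.20 and −0.12 observed for the SF-36 summary scores.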

The Meta-Analysis for the Effects of CBT Alone Compared with Those of the Wait-and-See Control

At short-term follow-up, statistically significant reductions in disability, fear of physical activity, anxiety, and depression were found in favor of CBT, whereas no significant differences were observed in pain or posttraumatic stress. The effect sizes for fear of physical activity, anxiety, and depression appeared larger than that for disability, consistent with the nature of CBT as a psychological intervention. In addition, no RCTs published since the previous meta-analysis in 2016 (Anstey et al., 2016) could be added to this comparison. This may indicate that recent research interest has shifted toward the combined effects of CBT and other treatments, such as exercise (Andersen et al., 2020; Michaleff et al., 2014; Pato et al., 2010; Söderlund & Lindberg, 2001, 2007; M. J. Stewart et al., 2007a, 2007b).

The Meta-Analysis of the Combined Effects of Physical Interventions and CBT Compared with Those of Advice Only

A statistically significant reduction in disability was found in favor of CBT combined with physical interventions. This moderate-quality evidence of a long-term effect on disability is an important finding that may help guide management strategies for chronic WADs from a Bio-Psycho-Social perspective. However, further investigations are needed before this finding can be implemented in clinical practice. First, the effect size of 0.29 is small; thus, further investigations are required to determine the most effective form and dose of CBT, its optimal combination with other therapeutic modalities, and the best ways to deliver these approaches. Second, integrating CBT components into physical interventions is already recognized as a management strategy for patients with chronic low-back pain, for example in cognitive functional therapy (O'Sullivan et al., 2018). However, Beissner et al. (2009) reported that physical therapists rarely implement CBT in clinical practice, primarily because of limited knowledge of CBT techniques. Increasing evidence suggests that educational/training level, rather than work experience, is associated with implementing the Bio-Psycho-Social model of care, including identifying patients’ psychological status (Miki et al., 2020; Suzuki & Takasaki, 2020; Takasaki et al., 2014). A remaining challenge is to establish a global educational/training system that enables physical therapists to implement the Bio-Psycho-Social model of care using not only CBT techniques but also other behavioral techniques, such as communication that increases patient autonomy (Murray et al., 2019) and motivational interviewing (Alperstein & Sharpe, 2016).

Evidence of a long-term effect of the combination of CBT and physical interventions compared with advice only on pain intensity was lacking, which is not surprising because reducing pain intensity is no longer the primary focus of care in patients with chronic WADs (Scholten-Peeters et al., 2002). However, evidence of a long-term effect of this combination compared with advice only was also lacking for QoL, measured with the SF-36 component summary scores, even though a statistically significant effect on disability, measured with the NDI, was observed. This discrepancy may reflect the lower responsiveness of the SF-36 compared with the NDI in patients with chronic WADs (Stewart et al., 2007). In this systematic review, all PROMs consisted of predetermined items. Such fixed-item PROMs can have reduced responsiveness in individuals with neck pain (Cleland et al., 2006; M. Stewart et al., 2007a, 2007b) because each item carries the same weight for every participant, which can undermine their validity for measuring the intended health construct (Walton et al., 2010). The recently developed Satisfaction and Recovery Index is an importance-weighted, health-related satisfaction tool that captures both the process and the status of recovery following musculoskeletal trauma, and it has been shown to be more responsive than the SF-12 and region-specific disability measures (Modarresi & Walton, 2020; Walton et al., 2014). Therefore, future studies should include importance-weighted PROMs such as this as outcome measures to clarify the effects of interventions for people with musculoskeletal trauma.

In both RCTs (Michaleff et al., 2014; M. J. Stewart et al., 2007a, 2007b) included in the meta-analysis of the combined effects of CBT and physical interventions, CBT was delivered by physical therapists. Therefore, it remains unknown whether a multidisciplinary approach, in which psychologists deliver CBT and physical therapists deliver the physical interventions, is superior in treatment effect and cost-effectiveness to physical therapists delivering both CBT and physical interventions. Nevertheless, we believe it is important to involve psychologists when planning future studies in order to enhance the quality of the intervention. Previous studies have shown the benefits of a multidisciplinary approach for chronic pain conditions (Casey et al., 2020; Kamper et al., 2014), and further research is needed.

Limitations

Our meta-analysis had some limitations. The first and greatest limitation is the small number of studies included in the meta-analysis. We were unable to compare the combination of physical interventions and CBT with treatments other than advice because of heterogeneity in interventions, comparisons, and outcomes. As more RCTs reporting the effects of CBT become available, the results of this study can be strengthened. Second, the analysis was performed with a limited number of participants; therefore, studies with larger sample sizes should be performed in the future. Finally, we did not actively seek unpublished studies. However, we believe this is unlikely to have had an important impact on the overall results.

Conclusion

This systematic review with meta-analysis involving patients with chronic WADs found moderate-quality evidence that CBT was no more effective than the comparator interventions for most primary outcomes. In subgroup analyses, we found low-quality evidence of favorable short-term effects of CBT alone compared with a wait-and-see control on disability and psychological status. In addition, we found moderate-quality evidence of a favorable long-term effect of the combination of physical interventions and CBT compared with advice only on disability.