Introduction

Tension-type headache (TTH) is one of the most prevalent neurological conditions [1,2,3]. Patients with TTH experience mild-to-moderate headaches with a tightening, constricting, or pressing sensation around the head [4]. Although TTH carries a lower disability weight than the other primary headaches [1, 3, 5], its global prevalence is as high as 26.8% [1]. Furthermore, because its symptoms are persistent, TTH causes distress and disturbance [6, 7], costs time at work and in family activities [8], and reduces students' learning capacity [9].

Multiple treatments have been proposed to manage TTH [3, 10]. For infrequent episodic TTH, symptomatic treatment with acute medications, such as simple analgesics and non-steroidal anti-inflammatory drugs, is recommended [11]. For chronic TTH or frequent episodic TTH, prophylactic treatment is typically necessary, including amitriptyline [11], mirtazapine [12], and non-pharmacological therapies such as physical exercise [13] and cognitive behavioral therapy [14].

Acupuncture is a complementary and alternative therapy that can reduce the frequency and intensity of idiopathic headaches [15, 16] and improve quality of life [17] in patients with TTH. Systematic reviews have suggested that acupuncture is effective for TTH prevention [18,19,20]. However, repeatedly updating a meta-analysis as data from small trials accumulate inflates the type I error rate [21, 22], which can lead to overestimating the efficacy of acupuncture. Therefore, the effectiveness of acupuncture for TTH and the reliability of the current evidence remain questionable.

Trial sequential analysis (TSA) is a method that estimates the required information size (RIS) for a meta-analysis, that is, the number of participants needed for the meta-analysis to be adequately powered to verify its conclusion [22, 23]. TSA also provides trial sequential monitoring boundaries (TSMB), which are adjusted thresholds for statistical significance and for futility (ineffectiveness) of the intervention [24, 25]. The method reduces the risk of early false-positive results from repeated analyses of accumulating data in a meta-analysis [25]. In addition, TSA helps determine how reliable the results of a meta-analysis are and whether additional trials are needed. If the cumulative Z-curve (the summary test statistic) crosses the TSMB, or the cumulative sample size exceeds the RIS, the evidence is considered firm and further trials may be unnecessary; otherwise, the evidence may be inconclusive and additional trials are needed [22, 25].
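The decision rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the algorithm of the TSA software: it uses the Lan-DeMets O'Brien-Fleming-type alpha-spending function, approximates the monitoring boundary at information fraction t as z(1 − α/2)/√t instead of the exact recursively computed TSMB, and omits futility boundaries. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def obf_alpha_spent(t: float, alpha: float = 0.05) -> float:
    """Cumulative two-sided alpha spent at information fraction t under the
    Lan-DeMets O'Brien-Fleming-type spending function."""
    if t <= 0:
        return 0.0
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 - 2 * _STD_NORMAL.cdf(z / math.sqrt(min(t, 1.0)))


def approx_obf_boundary(t: float, alpha: float = 0.05) -> float:
    """Approximate O'Brien-Fleming-shaped boundary z_(1-alpha/2) / sqrt(t).
    Simplification: the TSA software computes exact boundaries by recursive
    numerical integration, which this sketch does not do."""
    if t <= 0:
        return math.inf
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return z / math.sqrt(min(t, 1.0))


def tsa_verdict(z_cumulative: float, n_accrued: int, ris: int,
                alpha: float = 0.05) -> str:
    """Firm evidence if the Z-curve crosses the (approximate) monitoring
    boundary or the accrued sample size reaches the RIS."""
    t = n_accrued / ris  # information fraction
    if abs(z_cumulative) >= approx_obf_boundary(t, alpha):
        return "firm evidence: monitoring boundary crossed"
    if n_accrued >= ris:
        # RIS reached without crossing: the assumed effect is unlikely.
        return "firm evidence: RIS reached"
    return "inconclusive: additional trials needed"


# Hypothetical example: Z = 2.5 after 500 of 2000 required participants.
print(tsa_verdict(2.5, 500, 2000))  # prints: inconclusive: additional trials needed
```

Early in the accumulation (small t) the boundary is far above 1.96, which is how TSA guards against false-positive conclusions from repeated looks at the data.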

We aimed to examine whether acupuncture is effective and safe for TTH prophylaxis and to explore whether additional trials are required for particular outcomes (i.e., TTH frequency and responder rate). We therefore conducted a systematic review and meta-analysis with TSA to evaluate the efficacy of acupuncture versus sham acupuncture, no acupuncture, and other treatments in the preventive treatment of TTH.

Methods

Our systematic review and meta-analysis were conducted and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [26].

Search strategy

We searched Ovid MEDLINE, Embase, and the Cochrane Library from inception to September 29, 2022, without language restrictions. The detailed search strategies are provided in eTables 1–3. In addition, the ClinicalTrials.gov registry was searched for any missing studies, and we screened the reference lists of published systematic reviews for potentially eligible RCTs.

Inclusion and exclusion criteria

Included studies were required to meet the following criteria: (1) adult patients with TTH; (2) diagnosis according to the International Classification of Headache Disorders (ICHD) of the International Headache Society; (3) acupuncture in the experimental arm, and sham acupuncture, usual care, waiting list, or another active treatment (drugs recommended by guidelines, physical training, relaxation training, or cognitive treatment) in the control arm; (4) at least one of the following outcomes measured and reported: TTH frequency, responder rate (responder defined as a reduction of ≥ 50% in TTH frequency), or adverse events (AEs); (5) an RCT with a parallel design; crossover RCTs were also included if data from the first phase were available.

Studies meeting either of the following criteria were excluded: (1) not an original research article; (2) duplicate publication.

Outcome assessments

The primary outcome was TTH frequency (number of headache days), and the secondary outcome was the responder rate. The number of treatment-related AEs was also assessed. Outcomes were evaluated at two phases: after treatment and during follow-up. If more than one follow-up time point had available data, we used the time point closest to the end of treatment.
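The two outcome rules used here (the ≥ 50% responder definition from the inclusion criteria and the choice of the follow-up time point closest to the end of treatment) can be sketched in Python. The per-patient data layout, the week-based time points, and the function names are hypothetical; the review itself pooled published study-level results.

```python
def closest_follow_up(time_points_weeks, end_of_treatment_week):
    """Return the follow-up time point closest to the end of treatment,
    or None if no follow-up data are available."""
    follow_ups = [w for w in time_points_weeks if w > end_of_treatment_week]
    return min(follow_ups) if follow_ups else None


def is_responder(baseline_days, outcome_days):
    """Responder: headache frequency reduced by at least 50% from baseline."""
    if baseline_days <= 0:
        raise ValueError("baseline headache frequency must be positive")
    return (baseline_days - outcome_days) / baseline_days >= 0.5


def responder_rate(patients):
    """Proportion of responders among (baseline, outcome) headache-day pairs."""
    flags = [is_responder(b, o) for b, o in patients]
    return sum(flags) / len(flags)


# Hypothetical trial arm: headache days per month at baseline and at outcome.
arm = [(20, 8), (15, 10), (12, 6), (18, 17)]
print(closest_follow_up([8, 16, 32], end_of_treatment_week=8))  # 16
print(responder_rate(arm))  # 0.5
```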

Screening and data extraction

Two reviewers (Q-FT and X-YW) independently screened the titles and abstracts of retrieved records and then read the full texts to identify eligible RCTs. Disagreements were resolved by discussion, with a third reviewer (HZ) making the final judgment.

Two reviewers (X-YW and S-JF) extracted data from eligible studies using standardized extraction forms. The extracted information included: (1) characteristics of the included studies: author, year of publication, country, study design, diagnostic criteria, sample size, proportion of female patients, mean age, and duration of TTH; (2) details of the intervention and control arms: name of the intervention and control, dosage, and duration of treatment; (3) outcome data: name of the outcomes, number of participants in each treatment group, mean and standard deviation for continuous data, and number of events for dichotomous data. Discrepancies were resolved by discussion and judgment by a third reviewer (HZ).

Risk of bias assessment

We assessed the risk of bias (ROB) of the included studies with the second version of the Cochrane risk-of-bias tool (RoB 2) [27]. Each study was evaluated in five domains: (1) randomization process; (2) deviations from intended interventions; (3) missing outcome data; (4) measurement of the outcome; and (5) selection of the reported result. Each study was then classified as having low risk of bias, some concerns, or high risk of bias.
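How the five domain judgements combine into an overall rating can be sketched as follows. The mapping follows the RoB 2 rule that a study is at low risk only if all domains are low and at high risk if any domain is high; RoB 2 also lets reviewers rate a study high when several domains have some concerns, which is a judgement call, so this sketch takes it as an explicit flag. Names such as `overall_risk` are hypothetical.

```python
from enum import IntEnum
from typing import Mapping


class Risk(IntEnum):
    LOW = 0
    SOME_CONCERNS = 1
    HIGH = 2


DOMAINS = (
    "randomization process",
    "deviations from intended interventions",
    "missing outcome data",
    "measurement of the outcome",
    "selection of the reported result",
)


def overall_risk(judgements: Mapping[str, Risk],
                 concerns_substantially_lower_confidence: bool = False) -> Risk:
    """Overall judgement: low only if every domain is low, high if any domain
    is high, otherwise some concerns. Escalating multiple 'some concerns'
    to high is a reviewer decision, passed in as a flag."""
    missing = [d for d in DOMAINS if d not in judgements]
    if missing:
        raise ValueError(f"missing domain judgements: {missing}")
    worst = max(judgements[d] for d in DOMAINS)
    n_concerns = sum(judgements[d] == Risk.SOME_CONCERNS for d in DOMAINS)
    if (worst == Risk.SOME_CONCERNS and n_concerns > 1
            and concerns_substantially_lower_confidence):
        return Risk.HIGH
    return Risk(worst)


# Hypothetical study: all domains low except deviations from interventions.
study = {d: Risk.LOW for d in DOMAINS}
study["deviations from intended interventions"] = Risk.HIGH
print(overall_risk(study).name)  # HIGH
```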

Statistical analysis

The meta-analysis was performed with Review Manager 5.4. Effect sizes for continuous outcomes were expressed as standardized mean differences (SMDs), and those for dichotomous outcomes as risk ratios (RRs), each with 95% confidence intervals (CIs). Between-study heterogeneity was evaluated with the I² statistic [28]; an I² ≥ 50% was taken to indicate substantial heterogeneity. A random-effects model (DerSimonian-Laird method) was used when I² ≥ 50%, and a fixed-effect model (inverse variance method) when I² < 50%.
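The following Python sketch shows how these quantities can be computed from extracted summary data and how the I² ≥ 50% rule selects the pooling model. It is an illustration, not the RevMan 5.4 code: it assumes Hedges' adjusted g as the SMD, pools RRs on the log scale, and adds 0.5 to every cell of a 2×2 table that contains a zero, a common convention. Function names and the example data are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """SMD as Hedges' adjusted g (experimental minus control) and its variance."""
    n = n1 + n2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n - 2))
    g = (m1 - m2) / sd_pooled * (1 - 3 / (4 * n - 9))  # small-sample correction
    var = n / (n1 * n2) + g ** 2 / (2 * (n - 3.94))
    return g, var


def log_risk_ratio(e1, n1, e2, n2):
    """Log RR (experimental vs control) and its variance; 0.5 is added to
    every cell when any cell of the 2x2 table is zero."""
    if 0 in (e1, n1 - e1, e2, n2 - e2):
        e1, e2, n1, n2 = e1 + 0.5, e2 + 0.5, n1 + 1, n2 + 1
    log_rr = math.log((e1 / n1) / (e2 / n2))
    var = 1 / e1 - 1 / n1 + 1 / e2 - 1 / n2
    return log_rr, var


def pool(effects, variances):
    """DerSimonian-Laird random effects if I^2 >= 50%, otherwise
    fixed-effect inverse variance. Returns estimate, 95% CI, I^2, model."""
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    model = "fixed-effect (inverse variance)"
    if i2 >= 0.5:
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = (q - df) / c  # DerSimonian-Laird between-study variance
        w = [1 / (v + tau2) for v in variances]
        model = "random-effects (DerSimonian-Laird)"
    est = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return est, (est - Z95 * se, est + Z95 * se), i2, model


# Hypothetical responder data: (events, n) in acupuncture and sham arms.
studies = [(30, 100, 22, 100), (45, 120, 35, 118), (18, 60, 15, 62)]
effects, variances = zip(*(log_risk_ratio(*s) for s in studies))
est, (lo, hi), i2, model = pool(list(effects), list(variances))
print(f"RR {math.exp(est):.2f} (95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f}), "
      f"I2 = {i2:.0%}, {model}")
```

For SMDs, the same `pool` function is applied to the `(g, var)` pairs returned by `hedges_g`, and the estimate is reported without exponentiation.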

Subgroup analyses were performed to explore sources of heterogeneity. Sensitivity analysis was performed by leave-one-out analysis to test the robustness of the findings: each study was removed from the meta-analysis in turn to assess its influence on the pooled result.
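A leave-one-out sensitivity analysis can be sketched in Python as below. For brevity the sketch always re-pools with a DerSimonian-Laird random-effects model, a simplification of the review's I²-based model choice; the helper names and the example data are hypothetical. A row is flagged when omitting that study changes whether the pooled result is statistically significant, which is what "unstable" means in the results below.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def dl_pool(effects, variances):
    """DerSimonian-Laird random-effects estimate and its standard error."""
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    # c is 0 when only one study remains; tau^2 is then set to 0.
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    est = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    return est, math.sqrt(1 / sum(w_star))


def leave_one_out(labels, effects, variances):
    """Re-pool with each study omitted; flag changes in significance."""
    full_est, full_se = dl_pool(effects, variances)
    full_sig = abs(full_est / full_se) >= Z95
    rows = []
    for i, label in enumerate(labels):
        est, se = dl_pool(effects[:i] + effects[i + 1:],
                          variances[:i] + variances[i + 1:])
        changed = (abs(est / se) >= Z95) != full_sig
        rows.append((label, est, est - Z95 * se, est + Z95 * se, changed))
    return rows


# Hypothetical SMDs and variances for three studies.
for row in leave_one_out(["A", "B", "C"], [-0.5, -0.6, 0.4], [0.01, 0.01, 0.01]):
    print("omit {}: SMD {:.2f} ({:.2f} to {:.2f}), significance changed: {}".format(*row))
```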

TSA was performed with TSA software version 0.9.5.10 beta (https://www.ctu.dk/tsa/) to calculate the RIS and TSMB. The RIS was estimated with a type I error of 0.05 and a type II error of 0.2 (80% power). Significance boundaries were calculated with the O’Brien-Fleming alpha-spending function. For continuous data, the mean difference and variance were estimated from empirical assumptions generated by the software. For dichotomous data, the assumed event proportions were based on the incidence in studies at low risk of bias. The heterogeneity correction was based on the model variance.
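To show how the RIS inputs interact, here is a Python sketch of the standard two-group sample-size formula, inflated by 1/(1 − D²), where the diversity D² is computed from the variances of the fixed-effect and random-effects pooled estimates. We assume this corresponds to the model-variance-based heterogeneity correction named above. The proportions, mean difference, and D² in the example are hypothetical, not the review's actual assumptions.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def _z_sum(alpha, beta):
    """z_(1-alpha/2) + z_(1-beta) for a two-sided test."""
    return _N.inv_cdf(1 - alpha / 2) + _N.inv_cdf(1 - beta)


def diversity(var_fixed, var_random):
    """D^2 = (V_R - V_F) / V_R from the variances of the pooled estimates."""
    return (var_random - var_fixed) / var_random


def ris_dichotomous(p_control, p_experimental, alpha=0.05, beta=0.2, d2=0.0):
    """Heterogeneity-adjusted RIS (total participants) for a binary outcome."""
    if not 0 <= d2 < 1:
        raise ValueError("D^2 must be in [0, 1)")
    p_bar = (p_control + p_experimental) / 2
    delta = p_control - p_experimental
    n = 4 * _z_sum(alpha, beta) ** 2 * p_bar * (1 - p_bar) / delta ** 2
    return math.ceil(n / (1 - d2))


def ris_continuous(mean_diff, sd, alpha=0.05, beta=0.2, d2=0.0):
    """Heterogeneity-adjusted RIS (total participants) for a continuous outcome."""
    if not 0 <= d2 < 1:
        raise ValueError("D^2 must be in [0, 1)")
    n = 4 * _z_sum(alpha, beta) ** 2 * sd ** 2 / mean_diff ** 2
    return math.ceil(n / (1 - d2))


# Hypothetical: responder rate 30% with control vs 40% with acupuncture.
print(ris_dichotomous(0.30, 0.40))                                  # 715
print(ris_dichotomous(0.30, 0.40, d2=diversity(0.01, 0.02)))        # 1429
print(ris_continuous(mean_diff=0.5, sd=1.0))                        # 126
```

Because the RIS scales with 1/δ² and 1/(1 − D²), modest changes in the assumed effect or heterogeneity move it substantially, which is why TSA conclusions depend on these inputs.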

The certainty of the evidence

The certainty of evidence was assessed with the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach [29]. Evidence was classified as high, moderate, low, or very low certainty on the basis of five components: risk of bias, inconsistency, indirectness, imprecision, and other considerations.
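The rating-down logic can be sketched in a few lines of Python. Under GRADE, a body of evidence from RCTs starts at high certainty and is rated down one level for a serious concern or two levels for a very serious concern in a domain; upgrading factors, which mainly apply to observational evidence, are ignored here. The function name and the example profile are hypothetical.

```python
LEVELS = ("very low", "low", "moderate", "high")
DOMAINS = ("risk of bias", "inconsistency", "indirectness",
           "imprecision", "other considerations")


def grade_certainty(downgrades, start="high"):
    """Certainty after rating down; evidence from RCTs starts at 'high'.

    downgrades maps a GRADE domain to the number of levels rated down
    (0 = no concern, 1 = serious, 2 = very serious)."""
    for domain, levels in downgrades.items():
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain}")
        if levels not in (0, 1, 2):
            raise ValueError(f"invalid downgrade for {domain}: {levels}")
    index = LEVELS.index(start) - sum(downgrades.values())
    return LEVELS[max(index, 0)]  # cannot go below 'very low'


# Hypothetical profile: serious risk of bias and serious inconsistency.
print(grade_certainty({"risk of bias": 1, "inconsistency": 1}))  # low
```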

Results

Characteristics of the included RCTs

A total of 2459 records were identified in the databases, and 14 studies [15,16,17, 30,31,32,33,34,35,36,37,38,39,40] with 2795 participants were included after screening. The screening procedure is depicted in Fig. 1.

Fig. 1

PRISMA flow diagram of literature search and study selection

The major characteristics of the included RCTs are shown in Table 1. The included studies were conducted in ten countries. Six RCTs enrolled only participants with chronic TTH, one enrolled only participants with episodic TTH, and seven enrolled both. Overall, 71% of participants were female, and the mean age was 40.9 years. Ten studies compared acupuncture with sham acupuncture; one compared acupuncture with no acupuncture; one compared acupuncture with both sham acupuncture and no acupuncture; one compared acupuncture with physical training and relaxation training; and one compared acupuncture with both physical training and no acupuncture.

Table 1 Characteristics of the included RCTs

The ROB of the included RCTs is shown in eFigure 1. Six RCTs were judged to be at low risk of bias, five raised some concerns, and three were at high risk. Three studies were rated high risk in the domain of deviations from intended interventions; five were rated some concerns in the domain of the randomization process; three were rated some concerns in the domain of missing outcome data; and two were rated some concerns in the domain of selection of the reported result.

Tension-type headache frequency

Acupuncture vs. sham acupuncture

Eight studies (n = 1163, I² = 94%) compared acupuncture with sham acupuncture after treatment. Acupuncture reduced TTH frequency more than sham acupuncture after treatment (SMD −0.80, 95% CI −1.36 to −0.24, P = 0.005, Fig. 2A). Sensitivity analysis confirmed that the result was robust (eFigure 2A), and the GRADE quality of evidence was low (eTable 4).

Fig. 2

Meta-analysis (A) and TSA (B) of acupuncture vs. sham acupuncture for TTH frequency after treatment. TSA, trial sequential analysis. The blue curve represents the cumulative Z-curve; the upper and lower red curves represent the trial sequential monitoring boundaries; the dashed red line represents the conventional threshold of statistical significance; the red vertical line represents the RIS; and the red lines closest to the horizontal axis are the futility boundaries

Eight studies (n = 1143, I² = 97%) compared acupuncture with sham acupuncture at follow-up (eFigure 3). Acupuncture was superior to sham acupuncture (SMD −1.33, 95% CI −2.18 to −0.49, P = 0.002). The result was stable (eFigure 2B), and the quality of evidence was low (eTable 5).

TSA showed that the cumulative Z-curve did not cross the TSMB and the included sample size did not reach the RIS (n = 2317, Fig. 2B). Thus, firm evidence of a difference between acupuncture and sham acupuncture in reducing TTH frequency is still lacking.

Acupuncture vs. no acupuncture

Two studies (n = 1305, I² = 0%) compared acupuncture with no acupuncture after treatment. TTH frequency was reduced more in the acupuncture group than in the no-acupuncture group (SMD −0.52, 95% CI −0.63 to −0.41, P < 0.00001, eFigure 4A). Sensitivity analysis confirmed the robustness of the result (eFigure 5A), but the quality of evidence was very low (eTable 6).

At follow-up, pooling two studies (n = 226, I² = 90%) showed no significant difference (SMD −0.43, 95% CI −1.50 to 0.64, P = 0.43, eFigure 6). Sensitivity analysis showed that this result was unstable: after omitting one study [35], the pooled result tended toward a significant effect of acupuncture (eFigure 5B). GRADE rated the quality of evidence for this outcome as very low (eTable 7).

TSA showed that the Z-curve crossed the TSMB and the included sample size exceeded the RIS (n = 237, eFigure 4B), indicating firm evidence of a significant difference between acupuncture and no acupuncture in reducing TTH frequency.

Acupuncture vs. physical training

One study showed that physical training was more effective than acupuncture at the end of treatment (eFigure 7), whereas pooling two studies showed no statistically significant difference at follow-up (eFigure 8). The GRADE quality of evidence was very low (eTables 8–9).

Acupuncture vs. relaxation training

One study compared acupuncture with relaxation training and found that relaxation training led to a greater reduction in headache frequency than acupuncture after treatment (eFigure 9), but there was no statistically significant difference at follow-up (eFigure 10). GRADE rated the quality of evidence as very low (eTables 10–11).

Responder rate

Acupuncture vs. sham acupuncture

Five studies (n = 941, I² = 26%) compared acupuncture with sham acupuncture after treatment. A higher proportion of patients in the acupuncture group than in the sham acupuncture group achieved at least a 50% reduction in TTH frequency (RR 1.28, 95% CI 1.12 to 1.46, P = 0.0003, Fig. 3A). Sensitivity analysis showed that the result was stable (eFigure 11A), and the GRADE quality of evidence was moderate (eTable 4).

Fig. 3

Meta-analysis (A) and TSA (B) of acupuncture vs. sham acupuncture for the responder rate after treatment. TSA, trial sequential analysis. The blue curve represents the cumulative Z-curve; the upper and lower red curves represent the trial sequential monitoring boundaries; the dashed red line represents the conventional threshold of statistical significance; the red vertical line represents the RIS; and the red lines closest to the horizontal axis are the futility boundaries

Five studies (n = 941, I² = 0%) compared acupuncture with sham acupuncture at follow-up (eFigure 12). Acupuncture had a higher response rate than sham acupuncture (RR 1.37, 95% CI 1.19 to 1.58, P < 0.0001). The result was stable (eFigure 11B), and the quality of evidence was moderate (eTable 5).

TSA showed that the cumulative Z-curve did not cross the TSMB and the accrued sample size did not reach the RIS (n = 2140, Fig. 3B), indicating a lack of firm evidence of a significant difference in responder rate between acupuncture and sham acupuncture.

Acupuncture vs. no acupuncture

Two studies (n = 1472, I² = 79%) compared acupuncture with no acupuncture after treatment. There was no significant difference between the groups (RR 6.46, 95% CI 0.77 to 53.90, P = 0.08, eFigure 13A). Sensitivity analysis showed that the result was unstable: after omitting one study [17], the pooled result tended toward a significant effect of acupuncture, and after omitting the other study [15], acupuncture was more effective (eFigure 14A). The quality of evidence was low (eTable 6).

Two studies (n = 255, I² = 94%) compared acupuncture with no acupuncture at follow-up. There was no significant difference between the groups (RR 3.19, 95% CI 0.29 to 35.01, P = 0.34, eFigure 15). Sensitivity analysis showed that the result was unstable: after omitting one study [35], the pooled result tended toward a significant effect of acupuncture (eFigure 14B). GRADE rated the quality of evidence as very low (eTable 7).

TSA showed that the cumulative Z-curve did not cross the TSMB and the cumulative sample size did not exceed the RIS (n = 6554, eFigure 13B), indicating that firm evidence of a difference in responder rate between acupuncture and no acupuncture is lacking.

Acupuncture vs. physical training

One study compared acupuncture with physical training at follow-up; the acupuncture group had a higher response rate than the physical training group (eFigure 16), with very low quality of evidence (eTable 9).

Safety evaluation

Eight studies reported on AEs, and AEs occurred in six of them. The main AEs were dizziness, hematoma, pain, and severe headache. No serious AEs were reported.

Seven studies (n = 1114, I² = 0%) compared acupuncture with sham acupuncture after treatment (Fig. 4A). There was no statistically significant difference between the two groups (RR 1.14, 95% CI 0.67 to 1.94, P = 0.62). Sensitivity analysis showed that the result was stable (eFigure 17), and the GRADE quality of evidence was low (eTable 4).

Fig. 4

Meta-analysis (A) and TSA (B) of acupuncture vs. sham acupuncture for adverse events. TSA, trial sequential analysis. The blue curve represents the cumulative Z-curve; the upper and lower red curves represent the trial sequential monitoring boundaries; the dashed red line represents the conventional threshold of statistical significance; the red vertical line represents the RIS; and the red lines closest to the horizontal axis are the futility boundaries

TSA showed that the cumulative Z-curve did not cross the TSMB and the included sample size did not reach the RIS (n = 2235, Fig. 4B), indicating that the evidence is insufficient to conclude that acupuncture and sham acupuncture do not differ in safety.

One study compared acupuncture with no acupuncture after treatment: the incidence of AEs was significantly higher in the acupuncture group (13 of 24 participants, 54.17%) than in the no-acupuncture group (0 of 24 participants, 0%). In addition, one study compared acupuncture with physical training and found a higher incidence of AEs in the acupuncture group (13 of 24 participants, 54.17%) than in the physical training group (2 of 24 participants, 8.33%).

Subgroup analysis

To identify sources of heterogeneity, we conducted subgroup analyses. When studies comparing acupuncture with sham acupuncture were grouped by type of acupuncture, the I² was 0% at the end of treatment (eFigure 18) and 1% at follow-up (eFigure 19). Furthermore, these results suggest that electroacupuncture [40], dry needling [32], and laser acupuncture [30] may be more effective than manual acupuncture in reducing the frequency of headache attacks (eFigures 18–19).

To explore the effect of acupuncture on different types of TTH, we conducted a subgroup analysis of the studies that enrolled only patients with chronic TTH. There was no significant difference between acupuncture and sham acupuncture in reducing TTH frequency at the end of treatment (3 studies, 416 participants; I² = 96%; SMD −0.75, 95% CI −1.92 to 0.42, P = 0.21, eFigure 20), whereas acupuncture reduced TTH frequency more than sham acupuncture during follow-up (3 studies, 428 participants; I² = 99%; SMD −3.39, 95% CI −6.73 to −0.06, P = 0.05, eFigure 21).

Discussion

Our meta-analysis showed that acupuncture is an effective and safe treatment for preventing TTH attacks. Acupuncture reduced the number of headache days per month: the average reduction after treatment was 8.15 days in studies comparing acupuncture with sham acupuncture and 5.25 days in studies comparing acupuncture with no acupuncture. Acupuncture also had a high responder rate: 51% in studies comparing acupuncture with sham acupuncture and 44% in studies comparing acupuncture with no acupuncture. In terms of safety, the incidence of AEs in the acupuncture group was 6%, the AEs were mild, and no serious AEs occurred.

Compared with sham acupuncture, acupuncture was more effective in reducing TTH frequency and had a higher responder rate. After treatment, acupuncture reduced headache days by an additional 2.94 days per month, and its responder rate was 10% higher. Furthermore, after treatment, acupuncture reduced headache days by 4 more days per month than no acupuncture. These findings, which are consistent with previous studies [18, 19], suggest that acupuncture is an effective treatment for preventing TTH.

Furthermore, we performed TSA, which previous reviews had not done, to clarify whether the sample size was sufficient to support the conclusions. We found that the conclusions for acupuncture versus sham acupuncture were underpowered and that the quality of evidence was low to moderate, so high-quality trials are warranted to investigate the efficacy of acupuncture in TTH prevention. In contrast, for acupuncture versus no acupuncture, TSA showed that the sample size was sufficient to demonstrate a significant effect of acupuncture.

Physical training and relaxation training have been shown to be effective methods for managing TTH [41]. We found that both were superior to acupuncture in reducing headache frequency after treatment, with no significant difference at follow-up. However, few studies have compared acupuncture with physical training or relaxation training, and more trials are needed to verify the efficacy of acupuncture in preventing TTH attacks.

The safety of acupuncture has been recognized in previous studies [18, 42, 43]. Although the sample size was insufficient to establish that acupuncture and sham acupuncture do not differ in safety, the overall incidence of AEs with acupuncture was low (6%), the AEs were mild, and no serious AEs occurred.

Limitations of the study

Our study has some limitations. First, TSA results depend on the model and assumptions used. Different estimates of the effect size and different assumed event proportions lead to different RIS values and affect whether the Z-curve crosses the TSMB. Our TSA conclusions therefore depend on our assumptions and on the predefined variables in the model. For example, for dichotomous data, we calculated the RIS based on the incidence in studies at low risk of bias, so these conclusions rely on those studies. Second, despite our search strategy, some eligible trials may have been missed. Third, although drug therapy is effective for TTH, none of the retrieved studies compared acupuncture directly with drug therapy, so this comparison could not be made. Fourth, although TSA adjusts for the inflated type I error caused by repeated significance testing in a meta-analysis, it may lead to an underestimation of the acupuncture effect. Sham acupuncture has been reported to be significantly more effective than placebo pills [44]; because the difference between acupuncture and sham acupuncture is therefore narrower than that between acupuncture and placebo pills, TSA of sham-controlled trials requires a larger sample size. On the other hand, a finding that acupuncture is superior to sham acupuncture that passes TSA verification would be more robust than findings from other comparisons (e.g., acupuncture versus conventional medication). Fifth, because of the high risk of bias and heterogeneity of the included studies, the quality of evidence was downgraded to moderate, low, or very low.

Implications for practice

Our study has several implications for clinical practice. First, acupuncture has been shown to be effective in pain conditions such as migraine [45], knee osteoarthritis [46, 47], and TTH [18, 19]. However, despite its efficacy, acupuncture is not widely used in clinical practice [48]. Possible reasons include high costs because acupuncture is not covered by health insurance, a lack of preference for acupuncture among patients or physicians, and limited access to acupuncture [48, 49]. In addition, some patients and physicians may regard acupuncture as merely a placebo [50]. Our study showed that acupuncture can significantly reduce the frequency of TTH attacks, indicating that it can be a beneficial alternative therapy for TTH prevention, although more high-quality RCTs are needed to verify its efficacy. Second, we found that the effect of acupuncture on TTH can persist for some time after the intervention. One trial showed a significant difference between the acupuncture and sham acupuncture groups in the proportion of patients with at least a 50% reduction in headache frequency at week 32 [16]. The long-term effect of acupuncture on TTH deserves further research. Third, drugs such as amitriptyline are recommended as prophylactic treatment for TTH [11], but high-quality trials comparing acupuncture with drugs are still lacking, and such trials could be carried out in the future. Fourth, our subgroup analysis suggested that different acupuncture interventions may differ in efficacy for TTH: electroacupuncture, dry needling, and laser acupuncture might reduce headache frequency more than manual acupuncture. Studies comparing the therapeutic effects of different acupuncture interventions could therefore guide the use of acupuncture for TTH. Fifth, the efficacy of acupuncture may differ between types of TTH, so separate studies of the different types of TTH are needed.

Conclusions

This meta-analysis showed that acupuncture was effective for TTH prophylaxis compared with sham acupuncture and no acupuncture. Compared with sham acupuncture, acupuncture resulted in an additional reduction of 2.94 headache days per month and a 10% higher response rate after treatment. Compared with no acupuncture, acupuncture resulted in 4 more days of headache reduction per month after treatment. The GRADE quality of evidence ranged from very low to moderate. TSA indicated that the sample size was sufficient to demonstrate that acupuncture is more effective than no acupuncture in reducing headache frequency. Nevertheless, according to TSA, further high-quality trials are warranted to verify the efficacy and safety of acupuncture versus sham acupuncture.