Introduction

Low back pain (LBP) is a very common health problem in the adult population, with an estimated incidence of 5% and a lifetime prevalence of 60–80% [1, 2]. Chronic pain syndromes develop in 10–20% of these patients and represent a significant source of disability [2]. Among the multiple causes of chronic LBP, lumbar facet joint pain (LFJP) accounts for up to 40% of cases [3]. LFJP is characterized by the degeneration of the zygapophyseal or facet joints of two adjacent lumbar vertebrae, resulting from repetitive mechanical stress, inflammatory processes, or infections [4]. The facet joint is the only synovial joint in the spinal column, and its sensory innervation is provided by the medial branch of the dorsal ramus [5]. The pain associated with LFJP can radiate to the gluteal region and the posterior aspect of the leg, is exacerbated by lumbar extension, and improves with slight lumbar flexion [6]. To confirm LFJP, it is often necessary to perform a diagnostic block [7].

The therapeutic approach to LFJP should be multimodal, including hygienic-dietary measures, physical therapies, and pharmacological interventions [8,9,10]. In non-responders, second-line treatments include local anesthetic and corticosteroid infiltration, and radiofrequency (RF) treatment of the medial branch of the dorsal ramus [11, 12], all of which are frequently performed by musculoskeletal and interventional radiologists. The latter procedure involves sensory denervation of the facet joint by applying an electrical field around a nerve, which alters the transmission of painful stimuli, either through direct nerve injury (continuous RF, CRF) or modulation of nerve impulses (pulsed RF, PRF) [13, 14]. According to the American Society of Interventional Pain Physicians, radiofrequency neurotomy for lumbar chronic pain patients who test positive for blocks is recommended with level II evidence and moderate strength of recommendation [15]. Similarly, current consensus practice guidelines establish that lumbar RF ablation (RFA) may provide benefit to well-selected individuals and stress the importance of selection criteria to improve denervation outcomes [16]. These recommendations are based upon studies of varied quality and design.

To date, various randomized controlled trials (RCTs) have been conducted comparing the efficacy of RF with placebo [17,18,19], intra-articular infiltration of anesthetics and corticosteroids, and different RF modalities [20,21,22]. However, systematic reviews and meta-analyses of these trials have shown contradictory results [23, 24], which have been attributed to factors including small sample sizes [17, 25], significant differences in baseline variables, heterogeneous inclusion and exclusion criteria [23], presence or absence of prior diagnostic blocks [17, 26], short follow-up periods [18, 19, 27], and differences in the outcome measures and RF technique used [20, 21, 28]. Additionally, various biases in different stages of the clinical trials have been noted [9, 29].

Therefore, high-quality scientific evidence is needed to update and compare the results obtained from RCTs comparing RF versus placebo. Such evidence should consider potential biases and confounding factors that may impact the analysis, shedding light on the efficacy and clinical indications of RF treatment for LFJP.

The aim of this study is to conduct a systematic review and an updated meta-analysis of placebo-controlled RCTs examining the efficacy of RF in the treatment of chronic LBP caused by LFJP.

Materials and methods

Eligibility criteria

The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [30] were followed to collect and report the results obtained. The design and selection criteria were based on the PICOS strategy: adult patients with chronic LBP due to LFJP (P), treated with RF-based procedures (I) compared to placebo (C), and clinical outcomes (pain, functional status, quality of life [QoL], and perceived global effect) (O). Only RCTs were included (S).

Therefore, the inclusion criteria were: RCTs on RF versus placebo for LFJP that included quantitative results for at least one of the primary outcomes, original data, and adult populations. We included all identified studies regardless of the year of publication, language, or study quality criteria. For studies with limited access, efforts were made to obtain the full-text documents (through our institutional virtual library and by emailing the authors).

The exclusion criteria were: locations or conditions other than LFJP (e.g., cervical, thoracic, disc pathology, or sacroiliac pain), RF modalities other than PRF or CRF, quasi-experimental or observational designs without a control group, letters, editorials, or conference proceedings. Figure 1 shows the flow diagram of the study.

Fig. 1 Flow diagram of the study

Information sources and search strategy

A systematic search was conducted in PubMed, Scopus, Web of Science, and Cochrane Central Register of Controlled Trials (CENTRAL) databases. The search included literature published up to May 31, 2023. A search equation using the MeSH terms "facet joint," "zygapophyseal joint," "low back pain," "radiofrequency," "efficacy" was used, combining them with the boolean operators AND and OR. Only human studies with abstracts written in English or Spanish were considered, without any other restrictions. Additionally, to optimize the number of relevant results, studies of potential interest from the reference lists of the selected articles were reviewed.

The literature search was conducted by two of the authors (AJLRB and PMJG). Both evaluators are clinicians with 5 years of experience in the subject area and previous experience in systematic review and meta-analysis methodologies. All titles and abstracts of interest were reviewed. An article that could not be unequivocally excluded based on its title and abstract was considered potentially relevant. Then, the full text of the non-excluded articles was evaluated to determine whether they met all the eligibility criteria. Both evaluators performed the search and selection of studies independently according to PRISMA recommendations. This was done for all steps (title and abstract screening, and full-text assessment). Once the independent evaluation was completed, consensus was reached for final decisions. Discrepancies between the two evaluators were resolved by consensus with a senior researcher (FRS).

Variables analyzed and data extraction

The treatment-related outcomes included:

  • Pain relief, measured by the Visual Analog Scale (VAS) or other quantitative scales (e.g., Numeric Rating Scale, NRS).

  • Improvement in functional disability measured by the Roland-Morris Disability Questionnaire (RMDQ) and the Oswestry Disability Index (ODI).

  • Improvement in QoL measured by the Euro-Qol in 5 dimensions and other scales (e.g., SF-36 QoL Questionnaire, 6-item QoL scale).

  • Global perceived effect (GPE) measured by the GPE scale or surrogate scores on overall subjective assessment measured as quantitative data or as data that could be grouped as dichotomous variables (e.g., pain relief < 50% or > 50% compared to baseline, or bad/moderate vs good/excellent overall patient satisfaction).

These variables were grouped according to the time at which they were measured as follows: short-term (< 3 months), medium-term (3–12 months), and long-term (> 12 months). If any RCT reported several measurements within one of those intervals, the data from the last one were selected.
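The grouping rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the example measurements are hypothetical, and it assumes follow-up times are expressed in months, with exactly 3 and exactly 12 months falling in the medium-term window.

```python
def follow_up_window(months):
    """Map a follow-up time (in months) to the review's three windows."""
    if months < 3:
        return "short"
    if months <= 12:
        return "medium"
    return "long"


def last_per_window(measurements):
    """Keep, for each window, the latest measurement reported by one trial.

    `measurements` is a list of (months, value) pairs (hypothetical layout).
    """
    selected = {}
    for months, value in measurements:
        window = follow_up_window(months)
        # Replace the stored entry only if this measurement is later.
        if window not in selected or months > selected[window][0]:
            selected[window] = (months, value)
    return selected


# Hypothetical trial reporting pain scores at 1, 2, 6 and 12 months.
example = last_per_window([(1, 4.2), (2, 3.9), (6, 3.5), (12, 3.8)])
```

In this made-up example, the 2-month value would represent the short term and the 12-month value the medium term; no long-term value would be available.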

The primary outcome measure was pain relief. The secondary outcome measures were improvement in functional status, QoL, and perceived global effect.

The data from the selected articles were extracted by the author. The data were stored in anonymized spreadsheets and in Review Manager Web (RevMan Web) version 5.4.0 [31].

Risk of bias, heterogeneity, and publication bias

The Cochrane Risk of Bias Tool v. 2 was used to systematically address the presence of potential biases. For each RCT, the risk of bias of 10 categories was classified as low, intermediate, or high. Studies with < 5 low risk of bias items or > 2 high risk of bias items were considered as higher-risk-of-bias (lower quality) studies. A subgroup analysis according to the quality of the studies was performed for each association. Publication bias was analyzed through funnel plots.
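As an illustration of the quality threshold described above, the following Python sketch classifies a single trial from its 10 item ratings. The function name, the rating labels, and the example ratings are assumptions made for illustration and are not taken from the included trials.

```python
def is_lower_quality(ratings):
    """Apply the threshold stated in the text to one trial.

    `ratings` is a list of strings, one per assessed item, each being
    "low", "intermediate" or "high" (hypothetical encoding).
    A trial is flagged as lower quality when it has fewer than 5 low-risk
    items or more than 2 high-risk items.
    """
    low = sum(r == "low" for r in ratings)
    high = sum(r == "high" for r in ratings)
    return low < 5 or high > 2


# Hypothetical trial: 6 low-risk, 2 intermediate and 2 high-risk items.
flag = is_lower_quality(["low"] * 6 + ["intermediate"] * 2 + ["high"] * 2)
```

Under these assumptions, the hypothetical trial would be kept in the higher-quality subgroup, since it has at least 5 low-risk items and no more than 2 high-risk items.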

Statistical analysis

For variables measured on different scales (e.g., functional status), standardized mean differences (SMDs) with a 95% confidence interval (CI) were calculated. In the case of different scales with the same range (e.g., pain), mean differences (MDs) were applied, as in Shih et al. [11]. When standard deviations (SDs) were not available for a given variable, they were calculated from the standard error (SE) through the formula \(SD = SE \cdot \sqrt{N}\). For studies with sample sizes greater than 70 patients, SD was estimated from 95% confidence intervals using the formula \(SD = \sqrt{N} \cdot \frac{U_{L} - L_{L}}{3.92}\), and for studies with smaller sample sizes, SD was estimated using the formula \(SD = \sqrt{N} \cdot \frac{U_{L} - L_{L}}{4.13}\), where \(U_{L}\) and \(L_{L}\) are the upper and lower limits of the interval. If SDs or 95% CIs were not available, SD values were imputed based on the median of SDs from all studies in the same group [32,33,34].
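The SD conversions described above can be written as a short Python sketch. The function names are hypothetical, and the handling of a sample size of exactly 70 (treated here as a smaller sample) is an assumption, since the text only distinguishes "greater than 70" from "smaller".

```python
import math
from statistics import median


def sd_from_se(se, n):
    """SD = SE * sqrt(N)."""
    return se * math.sqrt(n)


def sd_from_ci(lower, upper, n):
    """SD from the limits of a 95% CI.

    Uses the divisor 3.92 when N > 70 and 4.13 otherwise
    (N == 70 is assigned to the smaller-sample rule by assumption).
    """
    divisor = 3.92 if n > 70 else 4.13
    return math.sqrt(n) * (upper - lower) / divisor


def impute_sd(known_sds):
    """Fallback: median of the SDs available in the same group."""
    return median(known_sds)
```

For instance, `sd_from_se(0.5, 100)` corresponds to 0.5 × 10, and `impute_sd` would only be used when neither an SE nor a 95% CI is reported.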

The inverse variance-weighted method with a random-effects model was applied to quantitative outcomes, and the Mantel–Haenszel method was applied for dichotomous GPE variables. The I2 statistic was used to analyze heterogeneity among studies (non-relevant, moderate, or substantial, with cutoff values of I2 < 40%, 40% < I2 < 75%, and I2 > 75%, respectively) [35]. Sensitivity analyses were performed in cases of significant heterogeneity (I2 > 40%) by sequentially removing each study to estimate its contribution to the overall analysis. Two-tailed tests were conducted, with significance set at p < 0.05. Statistical analyses were performed using RevMan Web [31].
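To make the pooling steps above concrete, the following Python sketch shows one common way of computing an inverse variance-weighted random-effects estimate, the I2 statistic, and a leave-one-out sensitivity analysis. It assumes the DerSimonian-Laird estimator for the between-study variance, which is not stated in the text, and it assigns the boundary values of exactly 40% and 75% to the moderate category by assumption. All function names are hypothetical; the analyses in this study were run in RevMan Web, not with this code.

```python
import math


def random_effects(effects, variances):
    """Random-effects pooling (DerSimonian-Laird, assumed estimator)."""
    w = [1.0 / v for v in variances]  # fixed-effect inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return {"pooled": pooled, "ci": ci, "i2": i2, "tau2": tau2}


def heterogeneity_label(i2):
    """Categories from the text; 40 and 75 go to 'moderate' by assumption."""
    if i2 < 40:
        return "non-relevant"
    if i2 <= 75:
        return "moderate"
    return "substantial"


def leave_one_out(effects, variances):
    """Re-pool after removing each study in turn (needs >= 2 studies)."""
    return [
        random_effects(effects[:i] + effects[i + 1:],
                       variances[:i] + variances[i + 1:])
        for i in range(len(effects))
    ]
```

Each entry returned by `leave_one_out` can be compared with the full-sample result to see whether removing a single study changes the direction or significance of the pooled estimate, or the I2 value.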

Results

Baseline characteristics of patients

The RCTs included in the meta-analysis encompassed data from 472 patients, with 249 in the RF group and 223 in the placebo group. The smallest RCT included 30 patients (Gallagher et al. 1994), while the largest included 150 patients (Moussa et al. 2020). Of the eight included studies, seven compared CRF with placebo, and one (Tekin et al. 2007) compared CRF, PRF, and placebo. Therefore, data from the latter were analyzed based on the corresponding subgroups, following the example of Maas et al. [29]. Table 1 summarizes the baseline characteristics of the participants in each study.

Table 1 Baseline characteristics of the patients included in the meta-analysis

Risk of bias

A high risk of bias was detected in several RCTs, specifically for performance (n = 1), detection (n = 1), attrition (n = 3), information (n = 1), and other biases (n = 6). Figure 2 summarizes the analysis of risk of bias in the RCTs included in the meta-analysis. Globally, 4 studies [19, 21, 27, 36] were considered of lower quality, mainly due to combinations of selection, blinding and attrition biases. The rest of RCTs were considered of higher quality (i.e., lower risk of bias) according to the criteria detailed in the methodology.

Fig. 2 Risk of bias assessment for the trials included in the meta-analysis. Each evaluated item is indicated as "+" for low risk of bias, "?" for unclear risk, and "-" for high risk

Pain relief

Pain relief was measured using the VAS in all studies except Van Tilburg et al. (2016), who used the 11-NRS. Statistically significant differences were found favoring RF over placebo in the short (MD − 1.01; 95% CI − 1.98 to − 0.04; p = 0.04), medium (MD − 1.42; 95% CI − 2.41 to − 0.43; p = 0.005), and long term (MD − 1.12; 95% CI − 1.57 to − 0.68; p < 0.001). Heterogeneity among the studies was high, particularly in the short and medium-term analyses (I2 = 90 and 91%, respectively). The results of the analysis are shown in Fig. 3.

Fig. 3 Forest plot comparing radiofrequency versus sham for pain relief at different time intervals after treatment or sham intervention
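As a rough consistency check on the pooled pain estimates reported above, a two-tailed p value can be back-calculated from an estimate and its 95% CI. The Python sketch below assumes a normal approximation with a symmetric interval; the function name is hypothetical and this is not the procedure used by RevMan Web.

```python
import math


def p_from_ci(estimate, lower, upper):
    """Approximate z statistic and two-tailed p value from a 95% CI."""
    se = (upper - lower) / (2 * 1.96)  # CI width equals 2 * 1.96 * SE
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal probability
    return z, p


# Pooled mean differences for pain relief, as reported in the text.
z_short, p_short = p_from_ci(-1.01, -1.98, -0.04)
z_medium, p_medium = p_from_ci(-1.42, -2.41, -0.43)
z_long, p_long = p_from_ci(-1.12, -1.57, -0.68)
```

Under this approximation, the three intervals give p values of roughly 0.04, 0.005, and below 0.001, in line with the values reported for the short, medium, and long term.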

Improvement in functional status

Statistically significant benefits of RF over placebo were observed in the short (SMD − 0.94; 95% CI − 1.73 to − 0.14; p = 0.02) and long term (SMD − 0.74; 95% CI − 1.09 to − 0.39; p < 0.0001). For medium-term outcomes, a trend toward significance favoring RF was observed (SMD − 1.43; 95% CI − 3.24 to 0.37; p = 0.12). Overall, significant differences favoring RF were found (SMD − 1.04; 95% CI − 1.65 to − 0.44; p < 0.0007). Heterogeneity was high, particularly in the short and medium term (I2 = 87 and 90%, respectively). The results are shown in Fig. 4.

Fig. 4 Forest plot comparing radiofrequency versus placebo for improvement in functional disability at different time intervals after treatment or sham intervention

Improvement in QoL

Only two RCTs (Nath et al., 2008; Van Kleef et al., 1999) included results on QoL, using different questionnaires. QoL was analyzed for short and medium term outcomes combined, and no statistically significant differences were observed (SMD − 0.28; 95% CI − 0.75 to 0.18; p = 0.23). Heterogeneity among the studies was low (I2 = 0%). The results of the analysis are shown in Fig. 5.

Fig. 5 Forest plot comparing RF treatment with placebo for improvement in quality of life at different time intervals after treatment or sham intervention

Global Perceived Effect

GPE was analyzed in six RCTs [17, 18, 20, 21, 25, 27]. No statistically significant differences were found in GPE measured as a continuous variable (SMD 0.04; 95% CI − 0.55 to 0.63; p = 0.90). The heterogeneity among studies was moderate (I2 = 64%).

Moussa et al. (2020), Van Wijk et al. (2005), and Tekin et al. (2007) measured GPE as variables that could be grouped as dichotomous. In the short term, the results showed a trend toward significance favoring RF (OR 0.55; 95% CI 0.31 to 1; p = 0.05). Statistically significant differences favoring RF were observed in the medium (OR 0.19; 95% CI 0.07–0.52; p = 0.001) and long term (OR 0.22; 95% CI 0.06 to 0.78; p = 0.02). Overall, significant benefits in favor of RF over placebo were found (OR 0.38; 95% CI 0.24–0.6; p < 0.0001). Heterogeneity was low (I2 = 0–3%). The results of the analysis are shown in Fig. 6.

Fig. 6 Forest plot comparing radiofrequency versus placebo for global perceived effect (GPE) at different time intervals after treatment or sham intervention. Top, GPE measured as a continuous variable. Bottom, GPE measured as a dichotomous variable

Subgroup analysis

Low back pain duration prior to patient inclusion

Regarding pain relief in the short term, the RCTs that established LBP duration > 1 year in the inclusion criteria [17, 21, 25] showed no significant differences, while significant differences favoring RF were found in the group of LBP < 1 year [18,19,20, 27, 36] (MD − 1.16; 95% CI − 2.11 to − 0.20). In the medium and long term there were similar results between both groups.

For functional status, significant differences favoring RF were found in both groups in the medium and long term. Although no significant differences were observed in the short term in either group, a trend toward significance (p = 0.11) was observed in the group of LBP < 1 year. Supplementary File 1 presents the forest plots for the subgroup analyses.

MRI prior to patient inclusion

Regarding pain relief, the RCTs that included the performance of MRI as an inclusion criterion [17, 21, 27] showed significant differences favoring RF in the long term (MD: − 1.21; 95% CI − 1.40 to − 1.02), but not in the short or medium term. In the RCTs where the previous performance of MRI was not established as inclusion criterion [18,19,20, 25, 26], significant differences favoring RF were found in the short term (MD − 1.42; 95% CI − 2.35 to − 0.50), with a trend toward significance in the medium (p = 0.11) and long (p = 0.08) term.

For functional status, no significant differences were found in the group with prior MRI in the short term, but significant differences favoring RF were found in the medium (SMD: − 3.12; 95% CI: − 3.72 to − 2.53) and long (SMD: − 0.69; 95% CI: − 1.14 to − 0.24) term. In the other group, significant differences were observed in the short, medium and long term. Supplementary File 2 presents the forest plots for the subgroup analysis.

Heterogeneity and publication bias

A subgroup analysis of all associations including 3 or more studies was performed according to their quality (Supplementary File 3). The heterogeneity of most associations (5 out of 7) disappeared in the highest-quality subgroup, suggesting that the low quality of certain studies might represent a relevant source of heterogeneity. Short-term associations presented more favorable estimates for RF in the higher-quality subgroup.

Regarding publication bias, funnel plots were obtained only for pain and functional status due to the low number of studies in the other analyzed variables [37]. The funnel plots did not suggest publication bias (Supplementary File 4).

Sensitivity analysis

The sensitivity analysis showed differences in pain and functional status when excluding certain studies in different time intervals. Specifically, for pain, the exclusion of the studies by Gallagher et al. (1994) in the medium term and Moussa et al. (2020) in the long term led to significant modifications (from "favorable to RF" to "no significant differences"). In the short term, no significant variations in the overall effect size were observed.

Regarding functional status, the exclusion of Van Kleef et al. (1999) in the short term led to significant modifications (from "significant in favor of RF" to "not significant") and a reduction in heterogeneity (I2 from 87 to 51%). Similarly, the exclusion of Moussa et al. (2020) in the medium term led to significant modifications (from "not significant" to "favorable to RF"), associated with significant changes in heterogeneity (from 95 to 0%). No sensitivity analysis was conducted for the remaining comparisons due to the low number of studies.

Discussion

This meta-analysis included eight placebo-controlled RCTs with a total of 472 patients (249 in the experimental group and 223 in the sham group). The results indicate that RF provides significant benefits in terms of pain relief and improvement in functional disability in the short, medium, and long term compared to placebo, which is consistent with previous studies [17, 21, 25]. However, the benefits in terms of QoL and perceived global effect are inconclusive, mainly due to the low number of RCTs that evaluated these variables in a comparable manner, although there are indications of a benefit in GPE, consistent with previous studies [38]. Overall, the analysis of risk of bias indicates that the quality of the studies is adequate, although there may be information biases, as reported elsewhere [29, 38]. Although no clear signs of publication bias were found, its assessment is limited by the low number of RCTs. High heterogeneity was found among studies, with the sensitivity analysis showing a mild influence of some studies (e.g., Gallagher et al., Moussa et al.).

The subgroup analysis according to the quality of the studies showed interesting results. First, the heterogeneity disappeared in most of the associations in the subgroup of higher-quality studies. This suggests that lower-quality studies represent a relevant source of heterogeneity and, therefore, future RCTs should focus on avoiding these biases. Second, in the higher-quality subgroup, the pooled estimates of short-term outcomes were much larger in magnitude than in the lower-quality (less reliable) group (− 1.57 vs. − 0.42 for pain relief, − 1.63 vs. − 0.14 for improvement in functional disability). This reinforces our results, as higher-quality studies showed even more favorable outcomes for RF in the short term.

Several systematic reviews and meta-analyses on this topic have been published. For example, a systematic review of RCTs conducted by Leggett et al. (2014) reported short-term benefits in favor of RF [39], while Manchikanti et al. (2020) found long-term pain relief benefits with level II evidence [15]. These findings contradict a Cochrane review published by Maas et al. (2015), which reported the absence of high-quality studies suggesting benefits of RF in chronic LBP [29]. Lee et al. (2017) published a meta-analysis with 7 RCTs and a total of 454 patients [40], which found benefits favoring RF compared to the control group in pain relief for up to 12 months, in line with our findings. Similarly, Chen et al. (2019) evaluated the efficacy of RF in the treatment of LFJP and sacroiliac joint pain [38], and reported favorable results for RF in pain relief, functionality, and QoL. Very recently, the meta-analysis conducted by Janapala et al. (2021) concluded that there is level II evidence in favor of RF efficacy [41]. These results are consistent with our meta-analysis, which includes the largest number of placebo-controlled RCTs published to date and focuses exclusively on traditional RF modalities (continuous and pulsed).

A noteworthy aspect of our study is the evaluation of QoL, which has only been assessed in the previous meta-analysis by Chen et al. (2019). We found no significant differences between groups, although the number of RCTs included is very low. In addition, we conducted an analysis on GPE and found significant differences in favor of RF. However, the number of studies is limited, warranting a more comprehensive and standardized approach when assessing QoL and GPE in future research.

The subgroup analysis based on the duration of LBP suggests an influence on the response to RF, being more favorable in less chronic cases. These findings could be explained by differences in structural spine changes or central sensitization phenomena [42]. The other subgroup analysis shows that patients who did not undergo an MRI before inclusion in the study reported more pain relief in the short term. This could be explained by a selection bias, as patients with different conditions that could show a more favorable response to RF might have been included, or by the influence of MRI on patients' expectations of treatment [43, 44]. It should be noted that these subgroup analyses were predicated on certain biologically plausible hypotheses that might introduce bias, such as temporal variations in pain-related neuromodulation and macroscopic edema resulting from local inflammation. Other potential sources of bias, such as heterogeneity in sham-related procedures, a well-known controversial topic in spinal pain-related procedures [45], have been discussed elsewhere [15, 31] and therefore were not specifically explored in this meta-analysis.

This meta-analysis has several strengths and some limitations. Notable strengths include the quantitative analysis of QoL and GPE, which have been poorly analyzed in previous studies [38], and the subgroup analyses conducted, which allowed us to suggest hypotheses to consider in future research. Regarding the limitations, the relatively low number of RCTs limits the statistical power of the meta-analysis [46], and there is high heterogeneity in study design, inclusion criteria, and outcome measures. In addition, the cutoff points chosen for short, medium, and long-term follow-up times lack universal consensus and thus could entail a potential source of bias, although they are comparable to those of previous meta-analyses (e.g., [29]). These limitations should be taken into account in future studies.

Conclusion

Radiofrequency treatment for lumbar facet joint pain provides significant benefits compared to placebo in terms of pain relief in the short, medium, and long term, as well as improved functionality in the short and long term. However, the evidence for benefits in quality of life and perceived global effect is inconclusive. The duration of low back pain and performing an MRI before treatment may influence therapeutic response. Future clinical trials should investigate the long-term effects of RF, its impact on quality of life, and define appropriate criteria for patient selection.