Introduction

The etiology of enlarged mediastinal or abdominal lymph nodes (LNs) ranges from benign reactive lymphadenopathy and infectious or inflammatory diseases to malignant conditions [1]. Distinguishing benign from malignant LNs is essential for planning treatment and determining prognosis [2]. Advanced imaging techniques such as computed tomography (CT) and positron emission tomography (PET) can detect enlarged LNs, but they are not accurate enough to distinguish benign from malignant lymphadenopathy [3, 4]. Traditional thoracotomy, thoracoscopy, and laparoscopy can provide an accurate pathological diagnosis, but they increase the economic and health burden on patients with benign lymphadenopathy [5].

Most mediastinal and abdominal lymph node lesions lie adjacent to the gastrointestinal tract, so endoscopic ultrasound (EUS) can readily reach them and obtain diagnostic cytological or histological material by fine-needle aspiration (FNA) [6]. EUS-FNA has therefore been widely applied in diagnosing lymphadenopathy. However, to our knowledge, existing research has mainly addressed the differential diagnosis of normal lymph nodes and lymphadenopathies (benign and malignant together) [7, 8], and no quantitative review has evaluated the accuracy of EUS-FNA in distinguishing benign from malignant LNs in the mediastinum and abdominal cavity. The purpose of our study is to systematically review the available evidence on the value of EUS-FNA in this field.

Methods

Protocol and Registration

The protocol of this systematic review with meta-analysis was prospectively registered in PROSPERO (http://www.crd.york.ac.uk/PROSPERO; registration number: CRD42020171901).

Literature Search

Data for this study were gathered through an electronic search of Embase, Cochrane Library, Web of Science, and PubMed up to February 2020. The search terms used were ((Mediastinal OR Abdominal) AND (mass(es) OR lymph node(s) OR lymphadenopathy(ies))) AND ((EUS OR endoscopic ultrasound OR EUS guided OR Endoscopic Ultrasound-Guided) AND (FNA OR fine-needle aspiration OR fineneedle OR fine-needle OR FNB OR FNAB OR Fine-Needle Aspiration Biopsy)). After duplicates were removed, two researchers (LB.C., HB.S.) independently screened the titles and abstracts and selected the appropriate studies for full-text review.

Study Selection Criteria

Inclusion criteria: (1) diagnostic clinical trials assessing the accuracy of EUS-FNA for differentiating benign and malignant mediastinal or abdominal LNs; (2) using surgical histology or long-term follow-up as reference standards; (3) providing data that can be used to reconstruct a 2 × 2 contingency table (true positive, false positive, true negative, false negative).

Exclusion criteria: (1) did not assess mediastinal or abdominal LNs; (2) samples obtained for cancer staging; (3) incomplete 2 × 2 contingency table data; (4) review articles, case reports, letters to the editor, and editorials.

Data Extraction and Quality Assessments

For each included study, we constructed a full 2 × 2 contingency table consisting of four cells: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). LNs diagnosed as malignant by EUS-FNA cytopathology and ultimately confirmed as malignant were regarded as TP, whereas LNs diagnosed as malignant by EUS-FNA but ultimately found to be benign at follow-up were regarded as FP. Likewise, benign aspirates ultimately confirmed as benign were regarded as TN, and benign aspirates ultimately diagnosed as malignant were regarded as FN. Other information (characteristics of studies and patients) was also collected. The data required for meta-analysis were extracted independently by two authors (LB.C., HB.S.). Disagreements were resolved by discussion between them or by consulting a third reviewer (XY.G.).
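The accuracy measures used in this review all derive from the four cells of the 2 × 2 table. The following is a minimal sketch of those definitions, assuming hypothetical counts that are not taken from any included study; the function names are illustrative only.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of truly malignant LNs that EUS-FNA called malignant."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of truly benign LNs that EUS-FNA called benign."""
    return tn / (tn + fp)


def positive_lr(tp: int, fp: int, tn: int, fn: int) -> float:
    """Positive likelihood ratio: sensitivity / (1 - specificity)."""
    return sensitivity(tp, fn) / (1.0 - specificity(tn, fp))


def negative_lr(tp: int, fp: int, tn: int, fn: int) -> float:
    """Negative likelihood ratio: (1 - sensitivity) / specificity."""
    return (1.0 - sensitivity(tp, fn)) / specificity(tn, fp)


def diagnostic_odds_ratio(tp: int, fp: int, tn: int, fn: int) -> float:
    """Diagnostic odds ratio: (TP * TN) / (FP * FN)."""
    return (tp * tn) / (fp * fn)


# Hypothetical counts for a single study (assumption, for illustration only).
tp, fp, tn, fn = 45, 1, 49, 5

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
print(f"positive LR = {positive_lr(tp, fp, tn, fn):.1f}")
print(f"negative LR = {negative_lr(tp, fp, tn, fn):.2f}")
print(f"DOR         = {diagnostic_odds_ratio(tp, fp, tn, fn):.0f}")
```

This sketch does not handle empty cells. A study with no false positives, for example, would cause a division by zero here, and meta-analysis software usually applies a continuity correction in that situation.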

The Quality Assessment of Diagnostic Accuracy Studies (QUADAS) questionnaire, a 14-item checklist, was used to evaluate the quality of each selected study [9]. Each item was rated as yes, no, or unclear.

Sensitivity Analysis

The jackknife (leave-one-out) method was used for sensitivity analysis to evaluate whether any single study had a disproportionate influence on the outcomes of the meta-analysis [10]. We removed one study at a time and recalculated the pooled results of the remaining studies to determine whether test performance changed significantly. Sources of heterogeneity were explored by subgroup analysis and meta-regression.
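The leave-one-out procedure can be sketched as follows. This is an illustration under stated assumptions: the study names and counts are hypothetical, and sensitivity is pooled by simply summing TP and FN across studies, which may differ from the pooling performed by the software used in this review.

```python
# Hypothetical (TP, FN) counts per study; not data from the included studies.
studies = {
    "Study A": (45, 5),
    "Study B": (80, 20),
    "Study C": (30, 2),
    "Study D": (60, 15),
}


def pooled_sensitivity(counts):
    """Pool sensitivity by summing TP and FN across studies (simple pooling)."""
    tp = sum(c[0] for c in counts)
    fn = sum(c[1] for c in counts)
    return tp / (tp + fn)


def leave_one_out(data):
    """Return the pooled sensitivity obtained with each study omitted in turn."""
    results = {}
    for omitted in data:
        remaining = [v for name, v in data.items() if name != omitted]
        results[omitted] = pooled_sensitivity(remaining)
    return results


overall = pooled_sensitivity(list(studies.values()))
print(f"all studies: {overall:.3f}")
for name, value in leave_one_out(studies).items():
    print(f"omit {name}: {value:.3f} (change {value - overall:+.3f})")
```

A study whose omission shifts the pooled estimate far more than the others would be flagged as disproportionately influential.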

Statistical Methods

The results of the meta-analysis were expressed as pooled sensitivity, specificity, positive likelihood ratio (LR), negative LR, and diagnostic odds ratio (DOR). Heterogeneity was assessed in two ways. First, the Chi-square test was used to examine whether the differences among study results could be explained by chance alone [11]; P < 0.1 was taken to indicate heterogeneity. Second, the inconsistency index (I-squared, I²) was computed to estimate the proportion of total variation attributable to heterogeneity rather than chance [12]. Values of < 30%, 30–60%, 61–75%, and > 75% were considered to indicate low, moderate, substantial, and considerable heterogeneity, respectively [13]. When there was no obvious heterogeneity, pooled results were generated using a fixed-effect model based on the Mantel–Haenszel method [14]. If heterogeneity was significant, a random-effects model based on the DerSimonian–Laird method [15] was applied. A summary receiver operating characteristic (SROC) curve was constructed according to the Moses–Shapiro–Littenberg method [16], and the area under the curve (AUC) was calculated numerically by integrating the SROC equation with the trapezoidal method. An AUC close to 1 indicated that the test performed well [17].
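The inconsistency index follows directly from the Chi-square (Cochran Q) statistic and its degrees of freedom, I² = 100% × (Q − df) / Q, truncated at zero. The sketch below applies that formula and the category thresholds stated above; the example Q value is hypothetical, and assigning values between 60% and 61% to the "substantial" band is an assumption, since the stated ranges leave that gap open.

```python
def i_squared(q: float, df: int) -> float:
    """Inconsistency index in percent: 100 * (Q - df) / Q, floored at zero."""
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


def heterogeneity_label(i2: float) -> str:
    """Map I-squared (percent) to the categories used in this review."""
    if i2 < 30:
        return "low"
    if i2 <= 60:
        return "moderate"
    if i2 <= 75:  # values between 60 and 61 are placed here by assumption
        return "substantial"
    return "considerable"


# Hypothetical Q statistic from 11 studies (df = 10).
q, df = 25.0, 10
i2 = i_squared(q, df)
print(f"I2 = {i2:.1f}% ({heterogeneity_label(i2)})")
```

When Q is smaller than its degrees of freedom, the formula would give a negative value, which is conventionally reported as 0%.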

Publication bias was evaluated with Stata Release 15.1 (StataCorp, College Station, TX). All other statistical analyses were conducted with Meta-DiSc Version 1.4 (Unit of Clinical Biostatistics, Ramón y Cajal Hospital, Madrid, Spain) [18].

Results

Study Characteristics and Quality Assessment

The initial search yielded 1160 studies, of which 466 duplicates were excluded. Based on the predefined inclusion and exclusion criteria, we excluded 500 studies after reading the titles and abstracts. The full texts of the remaining 194 studies were evaluated in depth. Finally, 26 studies were included in our analysis [19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44]. The search process is shown in Fig. 1, and Table 1 summarizes the characteristics of the selected studies. The selected studies comprised 24 full articles [19,20,21,22,23,24,25,26,27,28,29,30, 32, 34,35,36,37,38,39,40,41,42,43,44] and two abstracts [31, 33]. A total of 2833 LNs from 2753 patients were reviewed in the meta-analysis. The quality evaluation of the selected studies is presented in Fig. 2.

Fig. 1 Flowchart of search strategy

Table 1 Characteristics of the 26 selected studies

Fig. 2 Quality assessment of included studies. +: yes; −: no; ?: unclear

Discrimination Between Benign and Malignant LNs

In the differential diagnosis of benign and malignant mediastinal and abdominal LNs, EUS-FNA had a pooled sensitivity of 87% (95% CI 86–90%) and a pooled specificity of 100% (95% CI 99–100%). Sensitivity showed substantial heterogeneity (Chi-square = 87.71, df = 25, P < 0.0001, I² = 71.5%) (Fig. 3a, b).

Fig. 3 Forest plot of sensitivity (a), specificity (b), positive LR (c), and negative LR (d) for discrimination between benign and malignant LNs. LR likelihood ratio

The pooled positive LR (PLR) and negative LR (NLR) were 68.98 (95% CI 42.10–113.02) and 0.14 (95% CI 0.11–0.17), respectively (Fig. 3c, d). The pooled DOR for diagnosing malignant mediastinal and abdominal LNs was 519.15 (95% CI 297.70–905.32). The AUC of EUS-FNA was 0.99 (Q* = 0.96) (Fig. 4).

Fig. 4 SROC for EUS-FNA (with 95% CI). SROC summary receiver operating characteristic, AUC area under the curve, Q* the point at which sensitivity and specificity are equal, SE standard error
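For readers unfamiliar with how an SROC curve, its AUC, and Q* are obtained, the following sketch outlines the Moses–Littenberg approach: each study contributes D = logit(TPR) − logit(FPR) and S = logit(TPR) + logit(FPR), a line D = a + bS is fitted, and the curve is recovered by solving for TPR. The study points are hypothetical, and the unweighted least-squares fit is an assumption; the sketch is not a reproduction of the analysis behind Fig. 4.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def fit_moses_littenberg(points):
    """Unweighted least-squares fit of D = a + b*S.

    points: (sensitivity, false positive rate) pairs, each strictly between
    0 and 1 (a continuity correction must be applied beforehand if needed).
    """
    d = [logit(tpr) - logit(fpr) for tpr, fpr in points]
    s = [logit(tpr) + logit(fpr) for tpr, fpr in points]
    n = len(d)
    mean_s, mean_d = sum(s) / n, sum(d) / n
    sxx = sum((x - mean_s) ** 2 for x in s)
    sxy = sum((x - mean_s) * (y - mean_d) for x, y in zip(s, d))
    b = sxy / sxx
    a = mean_d - b * mean_s
    return a, b


def sroc_tpr(fpr: float, a: float, b: float) -> float:
    """Sensitivity on the fitted SROC curve at a given false positive rate."""
    odds_term = ((1.0 - fpr) / fpr) ** ((1.0 + b) / (1.0 - b))
    return 1.0 / (1.0 + math.exp(-a / (1.0 - b)) * odds_term)


def auc_trapezoid(a: float, b: float, steps: int = 10000) -> float:
    """Integrate the SROC curve over FPR in (0, 1) with the trapezoidal rule."""
    eps = 1e-9  # stay strictly inside (0, 1) to avoid division by zero
    xs = [eps + (1.0 - 2.0 * eps) * i / steps for i in range(steps + 1)]
    ys = [sroc_tpr(x, a, b) for x in xs]
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(steps))


def q_star(a: float) -> float:
    """Point on the curve where sensitivity equals specificity (S = 0)."""
    return 1.0 / (1.0 + math.exp(-a / 2.0))


# Hypothetical (sensitivity, false positive rate) pairs, one per study.
points = [(0.85, 0.02), (0.90, 0.05), (0.80, 0.01), (0.92, 0.04), (0.88, 0.03)]
a, b = fit_moses_littenberg(points)
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"AUC = {auc_trapezoid(a, b):.3f}, Q* = {q_star(a):.3f}")
```

When the slope b is zero the curve is symmetric and the DOR is constant along it, equal to exp(a).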

Seven included studies [19, 22, 24, 26, 29, 35, 42] enrolled patients with primary cancers or suspected malignancy at baseline. In studies of this type, the pretest probability of malignant lymphadenopathy is higher, so this patient selection bias may lead to overestimation of the sensitivity for malignant lymphadenopathy. To address this bias, we separately assessed 11 studies [19, 20, 22, 25, 27, 30,31,32, 37, 39, 43] that excluded patients with known or suspected malignancies.

The subgroup with primary cancers or suspected malignancies involved 477 patients with 477 LNs (sensitivity 88%, 95% CI 84–92%; specificity 100%, 95% CI 98–100%; DOR 254.12, 95% CI 78.49–822.70) while the subgroup without primary cancers or suspected malignancies included 947 patients with 983 LNs (sensitivity 86%, 95% CI 83–89%; specificity 100%, 95% CI 98–100%; DOR 379.63, 95% CI 160.01–900.73).

Mediastinal Versus Abdominal Lymph Nodes

Nine studies [19, 20, 22, 25, 29, 33, 35, 37, 44] (including 901 patients) reported on EUS-FNA for differentiating benign and malignant LNs in the mediastinum, and four studies [30, 32, 35, 38] (including 461 patients) reported the diagnostic accuracy in abdominal LNs. In the remaining 13 studies, mediastinal and abdominal LNs could not be separated.

EUS-FNA performed in mediastinal LNs gained a sensitivity of 85% (95% CI 81–88%), while in abdominal LNs, it reached 87% (95% CI 82–91%). The specificity was 100% (95% CI 99–100%) in mediastinal LNs and 99% (95% CI 97–100%) in abdominal LNs. The mediastinal LNs had a PLR of 73.28 (95% CI 29.63–181.19), and the abdominal LNs had a PLR of 64.12 (95% CI 21.11–194.83). The mediastinal LNs had an NLR of 0.15 (95% CI 0.11–0.22), and the abdominal LNs had an NLR of 0.14 (95% CI 0.07–0.29). The DOR of mediastinal and abdominal LNs were 508.60 (95% CI 181.86–1422.37) and 439.83 (95% CI 125.15–1545.78), respectively.

Rapid On-Site Evaluations

Nine studies [20, 21, 23, 25, 26, 28, 32, 34, 35] (including 1106 LNs) used ROSE, and 14 studies [19, 24, 27, 29, 30, 36,37,38,39,40,41,42,43,44] (including 1393 LNs) did not. The remaining three studies did not clearly describe whether ROSE was used. Based on these two subgroups, we assessed whether ROSE changed the diagnostic performance of EUS-FNA for malignant LNs. The sensitivity was 91% (95% CI 89–93%) with ROSE and 85% (95% CI 82–87%) without ROSE.

Adverse Events

We calculated the rate of adverse events (AEs) of EUS-FNA among the included patients. Twenty studies [19,20,21,22,23,24,25, 27, 29, 30, 32, 34, 35, 38,39,40,41,42,43,44] reported on the occurrence of AEs. These studies, involving 1907 patients, described 30 AEs. All were mild: postprocedural pain (n = 16), sore throat (n = 11), nausea and vomiting (n = 2), and postprocedural fever (n = 1) (pooled rate of AEs: 1.57%; 95% CI 1.06–2.24%). These symptoms were self-limited or resolved within a few days of symptomatic treatment.
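The pooled AE rate is a simple proportion of the reported counts. The sketch below recomputes it and attaches a Wilson score interval; the choice of the Wilson method is an assumption made for illustration, since the text does not state how the reported interval was calculated.

```python
import math


def wilson_interval(events: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = events / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Counts reported in the text.
ae_by_type = {
    "postprocedural pain": 16,
    "sore throat": 11,
    "nausea and vomiting": 2,
    "postprocedural fever": 1,
}
events = sum(ae_by_type.values())
patients = 1907

rate = events / patients
low, high = wilson_interval(events, patients)
print(f"{events} AEs in {patients} patients: rate = {rate:.2%}")
print(f"approximate 95% CI: {low:.2%} to {high:.2%}")
```

The interval obtained this way is close to, but not identical with, the reported one, as would be expected if a different method (for example an exact binomial interval) was used.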

Subgroup Analyses and Meta-Regression

We performed subgroup analyses based on the characteristics of the selected studies. These variables included the country (USA or other countries), publication year, study design (prospective or retrospective), publication type (full article or abstract), time interval, length of clinical follow-up, sample adequacy rate, on-site cytology (yes or no), number of performers, sample size (< 100 LNs vs. ≥ 100 LNs), location of LNs, and QUADAS score. The summary results are provided in Table 2.

Table 2 Subgroup analysis (with 95% CI) and meta-regression on DOR

We removed the two studies published as abstracts and recalculated the sensitivity of EUS-FNA in the 24 studies published as full articles, obtaining a sensitivity of 88% (95% CI 86–89%). This result suggested that publication type was not a factor affecting the overall outcomes.

We separately analyzed the studies that used only 22G needles [19,20,21,22,23,24,25,26, 28, 30, 32, 35, 36, 38, 41, 42, 44]. Data for 19G and 25G needles were sparse and mixed with those for other needle sizes, so they could not be compared or analyzed separately. The sensitivity of FNA using 22G needles was 88% (95% CI 86–89%), which was not significantly different from the overall result.

We also evaluated the sensitivity of EUS-FNA performed by a single expert. Ten studies [19,20,21,22, 25, 28, 35, 36, 38, 42] with a single performer had a higher pooled sensitivity (87%, 95% CI 84–90%) than studies with multiple performers (80%, 95% CI 76–84%). Compared with the overall results, the heterogeneity of sensitivity decreased in the single-performer subgroup (Chi-square = 9.81, df = 9, P = 0.3662, I² = 8.2%), whereas it did not change substantially in the multiple-performer subgroup (Chi-square = 14.07, df = 5, P = 0.0152, I² = 64.5%). This finding alone does not establish that the number of EUS-FNA operators is a source of heterogeneity in sensitivity.

In the next step, we reviewed whether several variables affecting the quality of selected studies were related to heterogeneity.

We assessed the studies with quality scores of more than ten on the QUADAS questionnaire. The pooled sensitivity reached 90% (95% CI 88–92%) within the 18 studies of higher quality.

We also pooled the 14 studies with a prospective design and obtained a sensitivity of 91% (95% CI 89–93%).

We also analyzed 13 studies with sample sizes of less than 100 and obtained a sensitivity of 87% (95% CI 83–90%) (Chi-square = 10.95, df = 12, P = 0.5332, I² = 0.0%). The other subgroup, with sample sizes of 100 or more, had a sensitivity of 88% (95% CI 85–89%) (Chi-square = 76.66, df = 12, P < 0.0001, I² = 84.3%). Thus, the subgroup with sample sizes of less than 100 showed low heterogeneity, but this does not prove that sample size was the source of heterogeneity.

Meta-regression analyses were performed following the subgroup analyses. Fourteen results of meta-regression were generated, and two of these were significant (P < 0.1). One of them was the sample size subgroup, which had an RDOR of 3.96 (95% CI 1.19–13.17) (P = 0.027). The other was a subgroup of patients involving malignant conditions with an RDOR of 2.11 (95% CI 0.96–4.61) (P = 0.063). This result indicated that the sample size and baseline conditions of patients might lead to differences in the accuracy of EUS-FNA for diagnosing malignant LNs up to 3.96 fold and 2.11 fold, respectively. Details of subgroup analyses and meta-regression are documented in Table 2.

Sensitivity Analysis

As described in the Methods, we removed one study at a time and re-evaluated the accuracy of EUS-FNA (Supplementary Table 1). Neither the overall results nor the heterogeneity was significantly affected by any individual study.

Publication Bias

The funnel plot in Fig. 5 was symmetric. In addition, publication bias was assessed with Begg’s test (z = 0.55, Pr > |z| = 0.582 > 0.05) and Egger’s test (t = 0.57, 95% CI − 1.52 to 2.68, P = 0.576 > 0.05). Neither test suggested significant publication bias in this meta-analysis.

Fig. 5 Begg’s publication bias plot

Discussion

Accurately distinguishing benign and malignant lymphadenopathy can not only effectively guide the formulation of clinical strategies, but also provide valuable information for cancer staging [45, 46]. Although CT, PET, and other imaging techniques are non-invasive, their accuracy in the diagnosis of malignant lymph nodes is not satisfactory [47, 48].

As an effective technique for detecting mediastinal and abdominal LNs, EUS can distinguish benign from malignant LNs on the basis of four specific echo characteristics: size greater than 1 cm, distinct margins, round shape, and hypoechogenicity [49]. Only when all four features were present in the same lymph node did the accuracy of EUS in predicting malignant invasion reach 80%. However, all four features were present in only 25% of malignant lymph nodes, and none of them could predict malignant invasion alone [49]. This limits the ability of EUS alone to discriminate between benign and malignant LNs. Moreover, some similar studies did not support the accuracy of these echo characteristics [50, 51]. Image enhancement techniques such as contrast-enhanced EUS (CE-EUS) have since supplemented the detection efficiency of conventional EUS for lymphadenopathy [52,53,54,55], but they still cannot replace histological or cytological diagnosis.

Recently, EUS-FNA has made great contributions to the diagnosis of malignant lymph nodes owing to its safety and accuracy [56]. Our analysis likewise yielded high sensitivity (87%) and specificity (100%). The likelihood ratios were also favorable. The positive LR of 68.98 provides strong evidence for establishing the diagnosis of malignant lymphadenopathy when EUS-FNA is positive. The negative LR of 0.14 is low, but a negative result still cannot reliably rule out malignant lymphadenopathy. Furthermore, the area under the SROC curve of 0.99 reflects the high diagnostic accuracy of EUS-FNA.
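The practical meaning of these likelihood ratios can be illustrated by converting a pretest probability into a post-test probability through odds. The sketch below uses the pooled LRs reported in this review; the 50% pretest probability is a hypothetical value chosen only for illustration.

```python
def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest / (1.0 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


PLR, NLR = 68.98, 0.14  # pooled values reported in this review
pretest = 0.50          # hypothetical pretest probability of malignancy

after_positive = post_test_probability(pretest, PLR)
after_negative = post_test_probability(pretest, NLR)
print(f"after a positive EUS-FNA: {after_positive:.1%}")
print(f"after a negative EUS-FNA: {after_negative:.1%}")
```

Under this assumed pretest probability, a positive result raises the probability of malignancy to nearly 99%, whereas a negative result still leaves a residual probability of about 12%, which is why a negative EUS-FNA alone does not exclude malignancy.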

The meta-analysis gave a pooled sensitivity of 87% for EUS-FNA, but the sensitivity reported by different centers varied from 74 to 99% [19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44]. We therefore carried out several exploratory analyses to identify the sources of this heterogeneity.

We compared the diagnostic results for abdominal and mediastinal LNs. The pooled sensitivity of EUS-FNA was slightly higher in the abdominal cavity (87%) than in the mediastinum (85%). However, the diagnostic odds ratio was 508.60 for mediastinal LNs and 439.83, with a wider confidence interval, for abdominal LNs. The smaller number of studies in the abdominal subgroup might be one reason for these differences. In addition, the differences might be related to the inability of EUS to evaluate the hilar and pretracheal LNs. To compensate for this limitation, endobronchial ultrasound-guided transbronchial needle aspiration (EBUS-TBNA) can be an excellent supplement [57, 58]. Felix et al. showed that combining EBUS-TBNA and EUS-FNA could increase the sensitivity for mediastinal lymph nodes to 96% [59].

Whether ROSE can increase the diagnostic accuracy for solid lesions is still controversial [60, 61]. We therefore evaluated the value of ROSE in the diagnosis of malignant LNs by EUS-FNA and found that the use of ROSE increased the pooled sensitivity by 6 percentage points (91% vs. 85%). However, most medical institutions are not capable of deploying ROSE, and other studies have suggested that fine-needle biopsy (FNB) can achieve accuracy similar to that of FNA with ROSE [62, 63]. FNB may therefore be another option in endoscopy centers that are not equipped with ROSE.

The results of studies using a single EUS-FNA operator caught our attention. In these studies, all procedures were performed by one experienced endoscopic ultrasound examiner. The subgroup analysis revealed that EUS-FNA performed by a single operator had a higher sensitivity (87%) than EUS-FNA performed by multiple operators (80%). This is in line with the study by Bohle et al., which highlighted that the sample quality and diagnostic accuracy obtained by EUS-FNA performers at different learning stages were significantly different [36].

False negatives of EUS-FNA in identifying malignant LNs cannot be ignored. They were closely related to sampling errors, a phenomenon that appeared to be common across many medical centers [22, 24, 32, 33, 37]. Among them, false negative results arising from lymphoma samples have been questioned by pathologists, so we paid special attention to the efficacy of EUS-FNA in diagnosing lymphoma. In several studies that analyzed lymphoma separately, EUS-FNA achieved 75–96% accuracy in diagnosing lymphoma [27, 32, 37]. Many investigators have sought ways to reduce false negatives and improve the efficiency of lymphoma diagnosis. Some studies have reported that flow cytometry can enhance the effectiveness of EUS-FNA in lymphoma diagnosis [27, 64, 65]. There are also many reports on FNA technique. Aspirating from multiple lymph nodes and various sites can greatly reduce sampling errors. Wallace et al. and Baradales et al. proposed that an accurate diagnosis of malignant LNs usually requires three passes, while Leblanc et al. suggested more passes (≥ 5) [66,67,68]. In our study, the pooled sensitivity was 89% in studies with an average of three or more passes and 86% in studies with fewer than three. Given this small difference, the optimal number of passes in EUS-FNA cannot be reliably determined from our data. Sample acquisition may also be improved by using advanced image enhancement techniques during EUS-FNA. The systematic reviews of Xu et al. and Fusaroli et al. showed that combining CH-EUS and EUS-FNA may help reduce false negatives, thereby improving overall diagnostic accuracy [52, 53].

This meta-analysis focused on the differential diagnosis of benign and malignant LNs, which distinguishes it from routine diagnostic analyses of lymphadenopathy. Although we collected a considerable sample, some limitations remain. The majority of the selected studies were conducted in a single center. Most lymph nodes defined as benign by EUS-FNA were not confirmed by surgical pathology, and the clinical follow-up strategies varied among studies. Fortunately, the follow-up period in almost all studies was more than 6 months, which made a benign diagnosis more reliable. The included studies used 22G needles alone or a mix of 19G, 22G, and 25G needles, so it is difficult to compare the effectiveness of different needle sizes. In addition, significant heterogeneity was found in the pooled sensitivity of EUS-FNA. We therefore sought sources of heterogeneity through sensitivity analysis, subgroup analyses, and meta-regression. The results suggested that the heterogeneity of sensitivity might arise from differences in sample size, baseline conditions, and the number of EUS-FNA performers. We analyzed the subgroups with reduced heterogeneity to obtain more convincing conclusions.

In conclusion, EUS-FNA is a sensitive, highly specific, and safe method for distinguishing benign and malignant mediastinal or abdominal LNs. However, the sensitivity of EUS-FNA still varies significantly among different centers, which requires us to further explore more stable strategies for the diagnosis of malignant LNs.