Introduction

Preoperative lymph node metastases in thyroid cancer are strongly associated with local recurrence and cancer-specific mortality [1, 2]. In addition, the rate of recurrence or persistent disease is reported to be up to 30% in post-thyroidectomy patients [3, 4], with the most common type being nodal metastasis [4, 5]. Therefore, detection of metastatic lymph nodes before and after thyroidectomy is important.

Ultrasound (US) is considered the modality of choice for the assessment of lymph node metastasis in thyroid cancer patients [6,7,8,9]. However, US is operator-dependent, and it may be difficult to evaluate the retropharyngeal, retrosternal, and mediastinal areas [6, 10, 11]. In such cases, contrast-enhanced computed tomography (CT) has advantages over US [10, 12,13,14]. In addition, CT can deliver detailed anatomic information for surgeons, especially concerning nodal locations and relationships to anatomical landmarks [10]. The number of articles describing the diagnostic performance of CT for cervical lymph node metastasis is increasing, especially against the background of “active surveillance” suggested by the new American Thyroid Association guidelines [6, 15, 16]. Several reports suggest that delaying radioactive iodine therapy because of the iodine content of the CT contrast agent is not necessary, because the iodine clears within 4–8 weeks and the iodine content of the body is not an essential determinant for radioactive iodine therapy [17,18,19,20,21]. Therefore, recent guidelines suggest CT for the detection of metastatic lymph nodes in patients with thyroid cancer [9, 22]. We therefore consider that an analysis of the diagnostic performance of thyroid CT is timely and clinically important.

One meta-analysis has addressed the diagnostic performance of CT in the preoperative diagnosis of cervical lymph node metastasis [23]. However, that article had a small sample size and evaluated only preoperative diagnostic performance; it did not analyze the effects of different CT parameters, focusing instead on a comparison of the diagnostic performances of US and CT.

Therefore, the purposes of the present study were to evaluate the diagnostic performance of CT for the pre- and postoperative detection of metastatic cervical lymph nodes in patients with thyroid cancer and to identify the parameters influencing diagnostic performance.

Materials and methods

This systematic review and meta-analysis was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines [24]. The literature search and quality assessment are described in the supplementary file.

Inclusion criteria

Studies that satisfied the following criteria were included: (1) involved patients with known thyroid cancer; (2) performed CT as the index test examination, regardless of the patients’ preoperative or postoperative state; (3) included reference standards based on histopathology tests; and (4) contained adequate information to reconstruct 2 × 2 tables to estimate the diagnostic performance of CT for the detection of metastatic cervical lymph nodes.
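As a minimal sketch of criterion (4), the snippet below shows how per-study sensitivity, specificity, and false-positive rate follow from a reconstructed 2 × 2 table. The class name `TwoByTwo` and the counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Reconstructed 2 x 2 table of CT result versus histopathology (hypothetical helper)."""

    tp: int  # CT positive, histopathology positive
    fp: int  # CT positive, histopathology negative
    fn: int  # CT negative, histopathology positive
    tn: int  # CT negative, histopathology negative

    def sensitivity(self) -> float:
        # Proportion of metastatic nodes that CT called positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of benign nodes that CT called negative.
        return self.tn / (self.tn + self.fp)

    def false_positive_rate(self) -> float:
        # Complement of specificity.
        return self.fp / (self.fp + self.tn)


# Illustrative counts only; not data from the review.
example = TwoByTwo(tp=30, fp=10, fn=20, tn=90)
print(f"sensitivity = {example.sensitivity():.2f}")
print(f"specificity = {example.specificity():.2f}")
```

A study that does not report enough information to fill all four cells cannot contribute to the pooled estimates, which is why such studies were excluded.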

Exclusion criteria

The exclusion criteria were as follows: (1) case reports or case series including fewer than 10 patients; (2) letters, editorials, conference abstracts, and review articles; (3) articles focusing on a topic other than the diagnostic performance of CT for the detection of metastatic cervical lymph nodes in thyroid cancer; (4) articles with, or with suspicion of, overlapping populations: if articles had an overlapping population, the article with the largest population was included; (5) insufficient data for the reconstruction of 2 × 2 tables; and (6) articles without reference standards based on histopathology tests.

To assess the eligibility of the studies, the literature search and selection were independently performed by two radiologists (S.J.C. and C.H.S., both 1 year of experience in neuroradiology and thyroid radiology).

Data extraction

Data were extracted from the included studies using a standardized form, and included the following: (1) study characteristics: authors, year of publication, institution, country of origin, duration of patient recruitment, study design (prospective vs. retrospective, multicenter vs. single center, consecutive or not), time interval between index test (CT), and reference standard (the postoperative histological data in all 17 articles [10, 13, 14, 25,26,27,28,29,30,31,32,33,34,35,36,37,38] and 13 articles with additional data of fine needle aspiration regarded as standard references [10, 13, 14, 25, 26, 28,29,30,31,32,33,34,35,36]), blinding of radiologist to reference standard, characteristics of readers (number and experience), blinding of pathologist to image findings; (2) clinical and patient characteristics: number of patients, male to female ratio, mean patient age, number of lymph nodes, number of metastatic lymph nodes, the level of the metastatic lymph node (central (level VI) vs. lateral (levels II, III, IV, and V)), clinical setting (preoperative vs. postoperative state), final diagnosis (the type of thyroid cancer); (3) technical characteristics of CT and contrast enhancement: CT machine and vendor, detector number, kVp, mAs, reconstruction slice thickness, contrast model, quantity of injected contrast, scan delay time after contrast injection, flushing with saline or not, flushing saline flow rate; (4) the definition of CT image findings of a metastatic cervical lymph node; and (5) the diagnostic performance of CT.

Data synthesis and analyses

The primary outcome for the systematic review and meta-analysis was the diagnostic performance of pre- and postoperative CT for the detection of metastatic cervical lymph nodes in patients with thyroid cancer. A secondary aim was to identify parameters responsible for heterogeneity among the included studies by performing subgroup analyses according to specific clinical settings. Heterogeneity was determined using the Higgins I2 statistic [39]. The presence of a threshold effect caused by the heterogeneity was evaluated by visual assessment of coupled forest plots of sensitivity and specificity. In addition, the presence of a threshold effect (a positive correlation between sensitivity and false-positive rate) among the articles was evaluated, with a Spearman correlation coefficient of greater than 0.6 between the sensitivity and false-positive rates being considered to indicate a threshold effect [40].
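The two heterogeneity checks described above can be sketched as follows. This is an illustration under stated assumptions, not the routine used by the Stata or R modules: I2 is computed from Cochran's Q on logit-transformed proportions with inverse-variance weights and a 0.5 continuity correction, and the function names (`higgins_i2`, `spearman_rho`, `has_threshold_effect`) are hypothetical. Only the 0.6 cutoff comes from the text.

```python
import math


def higgins_i2(events, totals):
    """Return I^2 (%) for study proportions on the logit scale.

    Assumption: inverse-variance weights with a 0.5 continuity correction.
    """
    effects, weights = [], []
    for e, n in zip(events, totals):
        e_c, non_e_c = e + 0.5, (n - e) + 0.5
        effects.append(math.log(e_c / non_e_c))
        weights.append(1.0 / (1.0 / e_c + 1.0 / non_e_c))
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def has_threshold_effect(sensitivities, false_positive_rates, cutoff=0.6):
    """Apply the rule from the text: rho greater than 0.6 suggests a threshold effect."""
    return spearman_rho(sensitivities, false_positive_rates) > cutoff
```

A strongly positive correlation means that studies with higher sensitivity also have higher false-positive rates, which is the pattern expected when studies differ mainly in their positivity threshold.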

The summary sensitivity and specificity, and their 95% CIs, were calculated using bivariate random-effects modeling [39, 41,42,43,44]. Additionally, the results are graphically presented by plotting hierarchical summary receiver operating characteristic (HSROC) curves with 95% confidence and prediction regions. Deeks’ funnel plot was used to assess publication bias, and Deeks’ asymmetry test was used to calculate the p value and determine statistical significance [45].
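The regression behind Deeks' funnel plot can be sketched as follows: the log diagnostic odds ratio of each study is regressed on the inverse square root of its effective sample size, weighted by the effective sample size, and the slope is then tested against zero. The function names and example tables are hypothetical, and the sketch stops at the slope and its standard error; the p value would come from a t distribution with k − 2 degrees of freedom, which the statistical packages named in the text provide.

```python
import math


def effective_sample_size(n_diseased, n_non_diseased):
    """ESS = 4 * n1 * n2 / (n1 + n2)."""
    return 4.0 * n_diseased * n_non_diseased / (n_diseased + n_non_diseased)


def log_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio; 0.5 is added to every cell if any cell is zero."""
    cells = [tp, fp, fn, tn]
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    tp, fp, fn, tn = cells
    return math.log((tp * tn) / (fp * fn))


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b * x.

    Returns (slope, intercept, slope_se). Needs at least three points.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    slope_se = math.sqrt(rss / (len(x) - 2) / sxx)
    return slope, intercept, slope_se


def deeks_regression(tables):
    """tables: iterable of (tp, fp, fn, tn) tuples, one per study."""
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        ess = effective_sample_size(tp + fn, fp + tn)
        x.append(1.0 / math.sqrt(ess))
        y.append(log_dor(tp, fp, fn, tn))
        w.append(ess)
    return weighted_line(x, y, w)


# Illustrative tables only; not data from the review.
slope, intercept, slope_se = deeks_regression(
    [(30, 10, 20, 90), (15, 5, 10, 40), (60, 12, 40, 150), (8, 3, 12, 30)]
)
print(f"slope = {slope:.3f}, SE = {slope_se:.3f}")
```

A slope close to zero relative to its standard error indicates little funnel-plot asymmetry, which is the sense in which a non-significant p value is read as a low likelihood of publication bias.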

Subgroup analyses were performed to assess the diagnostic performance in specific clinical settings as follows: (1) preoperative CT analysis, (2) lateral neck lymph node analysis, and (3) central neck lymph node analysis. Meta-regression was performed using several covariates to identify the sources of heterogeneity among the studies as follows: (1) malignant lymph node rate (≥ median value vs. < median value), (2) study design (retrospective vs. prospective), (3) quantity of contrast injected (≥ 100 ml vs. < 100 ml), (4) scan delay time after contrast injection (arterial ≤ 45 s vs. venous > 60 s), (5) reconstruction slice thickness (≤ 3 mm vs. > 3 mm), and (6) blinding of the radiologist to the reference standard (yes or no).
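The covariate definitions above amount to a dichotomization of each study before meta-regression. A minimal sketch of that coding step follows; the dictionary keys, function names, and example studies are hypothetical, and only the cutoffs come from the text. Scan delays between 45 and 60 s are not assigned to either phase in the text, so the sketch leaves them unclassified.

```python
from statistics import median


def scan_phase(delay_s):
    """Classify the scan delay using the cutoffs stated in the text."""
    if delay_s <= 45:
        return "arterial"
    if delay_s > 60:
        return "venous"
    return None  # 45-60 s: not assigned to either group in the text


def code_covariates(study, median_malignant_rate):
    """Turn one study record into the six binary covariates."""
    return {
        "high_malignant_rate": study["malignant_rate"] >= median_malignant_rate,
        "prospective": study["design"] == "prospective",
        "contrast_ge_100ml": study["contrast_ml"] >= 100,
        "scan_phase": scan_phase(study["delay_s"]),
        "thin_slice": study["slice_mm"] <= 3,
        "reader_blinded": study["blinded"],
    }


# Illustrative study records only; not data from the review.
studies = [
    {"malignant_rate": 0.30, "design": "retrospective", "contrast_ml": 90,
     "delay_s": 40, "slice_mm": 3.0, "blinded": True},
    {"malignant_rate": 0.50, "design": "prospective", "contrast_ml": 120,
     "delay_s": 70, "slice_mm": 5.0, "blinded": False},
    {"malignant_rate": 0.40, "design": "retrospective", "contrast_ml": 100,
     "delay_s": 50, "slice_mm": 2.5, "blinded": True},
]

cutoff = median(s["malignant_rate"] for s in studies)
coded = [code_covariates(s, cutoff) for s in studies]
for row in coded:
    print(row)
```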

The MIDAS and METANDI modules in STATA 15.0 (StataCorp) and the mada package in R version 3.2.3 (The R Foundation for Statistical Computing) were used for the statistical analyses, which were performed by one of the authors (C.H.S., with 5 years of experience performing systematic reviews and meta-analyses).

Results

Literature search

The detailed article selection process is described in Fig. 1. An initial systematic search identified 757 articles. After removing 215 duplicates, screening of the remaining 542 titles and abstracts resulted in the exclusion of a further 506 articles. No additional articles were identified in the searches of the bibliographies of the relevant articles. Full-text reviews of 36 provisionally eligible articles were performed, and 19 articles were excluded because of the following reasons: 15 articles were not in the field of interest [46,47,48,49,50,51,52,53,54,55,56,57,58,59,60], three articles included patient cohorts partially overlapping with other studies [61,62,63], and one article contained 10 or fewer enrolled patients [64]. Finally, 17 studies were included in this meta-analysis [10, 13, 14, 25,26,27,28,29,30,31,32,33,34,35,36,37,38].
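The counts in the selection process can be checked with simple arithmetic. The short sketch below uses only the numbers reported in the paragraph above; the variable names are illustrative.

```python
# Counts as reported in the text.
identified = 757
duplicates = 215
excluded_on_screening = 506
full_text_exclusions = {
    "not in the field of interest": 15,
    "partially overlapping cohorts": 3,
    "10 or fewer enrolled patients": 1,
}

screened = identified - duplicates                 # titles and abstracts screened
full_text = screened - excluded_on_screening       # provisionally eligible articles
included = full_text - sum(full_text_exclusions.values())

print(screened, full_text, included)  # 542 36 17
```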

Fig. 1 Flow diagram of the study selection process

Characteristics of the included studies

The detailed patient characteristics are listed in Table 1. The total study population was 6378, with individual studies ranging from 20 to 3668 patients. The studies had mean patient ages ranging from 34.1 to 65.6 years. The number of lymph nodes ranged from 85 to 6557, with a total sum of 11,590. In 15 articles, the population consisted of patients with proven papillary thyroid carcinoma (PTC) [10, 14, 25,26,27,28,29,30,31,32,33,34,35], while in the remaining two articles, the study population consisted of patients with both PTC and medullary thyroid carcinoma [37, 38]. Four of the included studies were prospective in design [10, 33,34,35], while the other 13 studies were retrospective [13, 14, 25,26,27,28,29,30,31,32, 36,37,38]. Fourteen of the included articles evaluated the preoperative diagnostic performance of CT [10, 14, 25,26,27,28, 30,31,32,33,34,35,36, 38], two articles evaluated it postoperatively [29, 37], and one article evaluated it both pre- and postoperatively [13].

Table 1 Characteristics of the included studies

The detailed CT characteristics are summarized in Table 2. The CT vendors and scanners were heterogeneous, with four studies having used only Siemens [10, 13, 25, 28], three studies only GE [34, 35, 38], two studies only Philips [26, 37], one study both GE and Siemens [14], one study both Siemens and Philips [27], and one study GE, Siemens, and Philips [30]. The contrast material used for CT was iopromide in five studies [13, 14, 25, 28, 30], iohexol in four studies [26, 35,36,37], iopamidol in one study [34], and a non-specified nonionic contrast agent in two studies [10, 38]. The injection rates for the contrast agent were 3 ml/s in seven studies [14, 25, 26, 28, 30, 36, 38], 2 ml/s in two studies [35, 37], 1.2 ml/s in one study [10], 3–3.5 ml/s in one study [33], 3.5 ml/s in one study [13], and 4 ml/s in one study [34]. The quantities of contrast agent injected were 90 ml in six studies [14, 25, 28, 30, 33, 38], 120 ml in two studies [26, 36], 100 ml in two studies [13, 37], 65 ml in one study [10], and 60 ml in one study [35]. Of the five studies reporting the use of a saline flush after contrast injection, three used a saline flush rate of 3 ml/s [25, 30, 33], one reported using 50 ml of saline [13], and one did not give information on the quantity or rate of saline injected [28]. Image slices were reconstructed at a thickness of 3 mm in five studies [10, 25, 28, 36,37,38], 5 mm in three studies [34, 35, 38], 2.5–3 mm in one study [14], and 0.5 mm in one study [26]. The CT criteria for metastatic cervical lymph nodes were mostly similar across the studies and were almost entirely morphologic, including size, central necrosis or cystic change, dense or heterogeneous enhancement as measured by CT attenuation, and calcification (Table 2).

Table 2 CT characteristics and criteria of the included studies

Quality assessment

The overall quality of the included studies according to the QUADAS-2 criteria was moderate, with the exception of three articles that satisfied fewer than four of the seven items (Fig. 2) [31, 32, 35]. In the patient selection domain, 11 studies showed an unclear risk of bias due to unclear information about consecutive patient enrollment [10, 13, 25, 27, 28, 31, 32, 35,36,37,38]. In the index test domain, there was an unclear risk of bias in seven studies where the index test was interpreted without the operators being blinded to knowledge of the results [26, 27, 29, 31, 32]. In terms of the reference standard, all included studies were considered to have an unclear risk of bias because it was unclear whether the reference standard test was interpreted without knowledge of the index test results [10, 13, 14, 25,26,27,28,29,30,31,32,33,34,35,36,37,38]. In the flow and timing domain, one study was considered to have a low risk of bias [27], while the others were considered to have an unclear or high risk. Ten studies were considered to have an unclear risk of bias because the intervals between the index test and reference standard were not specified [13, 25, 26, 28, 31,32,33,34,35, 38]. However, all studies were considered to have low concerns regarding applicability in the patient selection, index test, and reference standard domains [10, 13, 14, 25,26,27,28,29,30,31,32,33,34,35,36,37,38].

Fig. 2 Quality assessment of the included studies according to the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) criteria

Diagnostic performance of CT

The sensitivity and specificity of the individual included studies were variable, ranging from 23 to 83% and 64 to 94%, respectively. Heterogeneity was present (p < 0.01) in the Q test, and the Higgins I2 statistic demonstrated substantial heterogeneity in sensitivity (I2 = 96.3%) and specificity (I2 = 93.8%); however, there was no threshold effect (Spearman correlation coefficient of < 0.6). For all 17 included studies, the pooled sensitivity was 55% (95% CI, 47–63%) and the pooled specificity was 87% (95% CI, 90–95%) (Fig. 3). In the HSROC curve, there was a relatively large difference between the 95% confidence and prediction regions, indicating heterogeneity between the studies (Fig. 4), while the area under the HSROC curve was 0.82 (95% CI, 0.79–0.85). According to Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.41 for the slope coefficient (Fig. 5).

Fig. 3 Pooled sensitivity and specificity for pre- and postoperative CT for diagnosis of metastatic cervical lymph nodes in patients with thyroid cancer. Horizontal lines indicate 95% CIs of the individual studies

Fig. 4 Hierarchical summary receiver operating characteristic (HSROC) curve of CT for the diagnosis of cervical lymph node metastasis in patients with papillary thyroid cancer

Fig. 5 Deeks’ funnel plot to evaluate publication bias

Subgroup analysis

We performed several subgroup analyses to evaluate various clinical scenarios. For preoperative CT analysis (n = 14), the pooled sensitivity was 54% (95% CI, 45–64%) and the pooled specificity was 87% (95% CI, 83–90%). In the per-neck level analysis, the summary sensitivity for lateral neck lymph nodes was 69% (95% CI, 59–77%) and the specificity was 91% (95% CI, 87–94%), while the summary sensitivity for central neck lymph nodes was 43% (95% CI, 32–55%) and the specificity was 91% (95% CI, 83–94%).

In the per-neck level subgroup analysis, the heterogeneity in both sensitivity and specificity, as demonstrated by the Higgins I2 statistic, was reduced, with the I2 values being 86.2% for sensitivity and 84.3% for specificity for lateral neck lymph nodes, and 90.4% for sensitivity and 90% for specificity for central neck lymph nodes.

Meta-regression

We performed meta-regression analysis to determine the causes of heterogeneity (Table 3). Among the potential covariates, three were shown to be significantly associated with study heterogeneity in the joint model, with these being contrast amount, scan phase, and reconstruction slice thickness. Sensitivity was higher in the arterial scan phase than in the venous scan phase, and was also higher with thin slices (≤ 3 mm) than with thick slices (> 3 mm). Apart from these three covariates, no other factor was demonstrated to significantly affect heterogeneity.

Table 3 Meta-regression of pre- and postoperative CT for diagnosing metastatic cervical lymph nodes in patients with thyroid cancer

Discussion

This systematic review and meta-analysis showed that CT for pre- and postoperative diagnosis of metastatic cervical lymph nodes in patients with thyroid cancer had a pooled sensitivity of 55% (95% CI, 47–63%), a pooled specificity of 87% (95% CI, 90–95%), and an area under the HSROC curve of 0.82 (95% CI, 0.79–0.85). The Higgins I2 statistic demonstrated substantial heterogeneity in the sensitivity (I2 = 96.3%) and specificity (I2 = 93.8%); however, in the per-neck level subgroup analysis, this heterogeneity was reduced. Therefore, our meta-analysis demonstrated that CT showed acceptable diagnostic performance. In the meta-regression analysis, variations in the CT protocols such as contrast amount, scan phase, and reconstruction slice thickness were significant factors influencing the heterogeneity in diagnostic performance. Therefore, optimization and standardization of CT protocols are required in the future.

Although ultrasound and ultrasound-guided FNA are first-line diagnostic procedures for the detection of pathological nodes in the neck, the number of articles reporting the performance of CT for the diagnosis of cervical lymph node metastasis in thyroid cancer patients in routine clinical practice is increasing. There are several reasons for this. First, the new American Thyroid Association guidelines suggest “active surveillance” for some recurrent patients, according to the risk stratification system [6, 15, 16]. Therefore, detection of preoperative lymph node metastasis is important before active surveillance. Second, there was concern over the use of iodinated contrast agents for thyroid CT for detecting metastatic lymph nodes in preoperative evaluation. However, recent studies reported that contrast-enhanced CT does not decrease the effect of subsequent radioactive iodine therapy [17,18,19,20,21]. Third, CT is an objective technique that is less operator-dependent than US. Moreover, CT provides more anatomical information and improved detection of deeply located lesions, such as those in the retropharyngeal, retrosternal, and mediastinal areas [6, 10, 11]. Fourth, CT provides added value to US [23]. Fifth, in a postoperative setting, US diagnosis has many challenging clinical aspects, including anatomical distortion, postoperative fibrosis, and patients’ resistance to re-operation [65]. Finally, minimally invasive treatments such as radiofrequency ablation (RFA) and laser ablation are increasingly being used to treat recurrent thyroid cancers [66, 67], and current RFA guidelines suggest using both CT and US to detect lymph nodes before RFA and then to evaluate the effect of RFA. On the basis of this background, this analysis of the diagnostic performance of CT is timely and clinically important.

Although one meta-analysis on the performance of CT has been published [23], several original articles on the diagnostic performance of CT have been newly published since 2017 [13, 27,28,29, 31, 33, 38]. Moreover, the previously published meta-analysis had a relatively small sample size, and it evaluated only preoperative CT performance (no postoperative CT examinations). Therefore, we considered there to be a need to update the literature and add the newly available information to a systematic review and meta-analysis, including a larger sample size (926 to 6378 patients), the addition of postoperative CT, and the analysis of various CT parameters.

The result of per-neck level subgroup analysis indirectly suggests that the level of the lymph node could be a cause of heterogeneity. In the meta-regression analysis, the thyroid CT protocol was revealed to be the main factor causing heterogeneity. Sensitivity was higher in the arterial scan phase than in the venous scan phase, which is in agreement with a study that demonstrated that arterial phase CT may be helpful for improving the detection of lymph node metastasis [13]. In addition, sensitivity was also higher with thin slices (≤ 3 mm) than with thick slices (> 3 mm). These additional results indicating that variation in the CT parameters causes heterogeneity suggest that optimization and unification of CT protocols could be crucial for improving the performance of CT.

This study has several limitations. First, not all subtypes of thyroid cancer were enrolled in this study. The number of medullary thyroid cancer patients was small, and patients with follicular thyroid cancer and anaplastic thyroid cancer were not enrolled. Second, the criteria used to define a metastatic lymph node were not exactly the same, although they were mostly similar across the studies. Third, most studies (13 of 17) were retrospective, resulting in a risk of bias in patient selection, with the possibility of increased diagnostic sensitivity [68].

In conclusion, CT demonstrated acceptable diagnostic performance in the pre- and postoperative diagnosis of metastatic cervical lymph nodes in patients with thyroid cancer. Variation in the CT protocols is a main factor behind the heterogeneity across the included studies. Therefore, this study could be considered to indicate the need for optimization and unification of CT protocols in patients with thyroid cancer.