Introduction

Ultrasound (US) is the pivotal tool for detecting and characterizing thyroid nodules. Indeed, according to its US presentation, a nodule will either undergo further investigations or be referred to clinical and US follow-up. This approach is supported by a large number of papers published in the last two decades [1,2,3,4,5,6]. However, some limitations of US, such as low reproducibility and operator-dependent performance, might reduce its diagnostic value. For these reasons, during the last decade, several additional applications of US have been introduced, including contrast-enhanced ultrasound (CEUS). CEUS has been reported to increase the reliability of conventional US, is associated with a rate of adverse events close to zero (1:10,000 vs. 1–12:100 for iodinated contrast agents), and has a reasonable price, depending on the country. Moreover, with the development of second-generation US contrast media, both durability and reproducibility have improved [7, 8]. It is worth noting that, at the time of writing, regulatory agencies have not yet approved these contrast agents for use in clinical practice in all countries, and this has also limited the diffusion of thyroid CEUS [9]. As a consequence, despite several studies on the reliability of CEUS in detecting and excluding malignancy in thyroid nodules, a high level of evidence in the literature is currently lacking.

The present study was undertaken to systematically review the available data on CEUS in evaluating thyroid nodules for the risk of thyroid carcinoma. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were determined. Importantly, to avoid the potential bias present in several studies that report the performance of CEUS using fine-needle aspiration cytology (FNAC) as the reference standard, only studies adopting histological diagnosis as the gold standard were selected for the meta-analysis.

Materials and methods

Conduct of review

The present systematic review was conducted according to PRISMA guidelines.

Search strategy

A comprehensive literature search in PubMed and Scopus was conducted using the following terms: “((contrast) AND ultrasound) AND thyroid.” This allowed us to retrieve the largest number of manuscripts on CEUS in assessing the thyroid gland. No beginning date limit was used, the search was updated until June 6, 2018, and no language restriction was applied. To expand our search, references of the retrieved articles were also screened to identify additional studies.

Study selection

Only original articles with complete data on thyroid nodules with histological diagnosis and their preoperative CEUS evaluation were eligible for inclusion. Specifically, to be included in the present meta-analysis, a study had to aim to classify a nodule as positive or negative on CEUS by US examiners (i.e., a true clinical study undertaken to inform practice) rather than being based on a particular feature of CEUS (i.e., an experimental study searching for the most reliable ancillary characteristic of CEUS). Exclusion criteria were: studies reporting nodules with a particular condition (i.e., inconclusive or indeterminate FNAC report, specific echostructure or vascular presentation); studies using a non-histological reference standard; studies with no categorical results (i.e., positive and negative CEUS evaluation); studies with overlapping patient or nodule data; case reports and case series (i.e., fewer than 15 cases); studies using first-generation US contrast; non-English language studies. Two researchers (MC and FB) explored online databases to define the final search algorithm. Three researchers (PT, MC, and CV) independently and in duplicate reviewed titles and abstracts of the retrieved articles, applying the above criteria; then, all authors independently reviewed the full text of the remaining articles to determine their final inclusion.

Data extraction

For each included study, the following information was extracted independently and in duplicate by three researchers (PT, MC, and CV) in a piloted form: (1) study data (authors, year of publication, and country of origin); (2) number of patients evaluated; (3) number of lesions; (4) histological diagnosis; (5) preoperative CEUS classification. The main paper and its supplementary data were searched. Data were cross-checked, and any discrepancy was mutually discussed.

Study quality assessment

The risk of bias of included studies was assessed independently by two reviewers (PT and MC) through the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool for the following aspects: patient selection; index test; reference standard; flow and timing. Risk of bias and concerns about applicability were rated as low (L), high (H), or unclear (U).

Statistical analysis

Sensitivity, specificity, PPV, and NPV of CEUS were calculated according to Galen and Gambino predictivity tests. For statistical pooling of the data, the DerSimonian and Laird method (random-effects model) was used. In this model, pooled data represent weighted averages related to the sample size of the individual studies. Pooled data were presented with 95% confidence intervals (95% CI) and displayed using forest plots. The I-square index was used to quantify the heterogeneity among the studies as follows: < 25%, no heterogeneity; 25–50%, mild heterogeneity; 50–75%, moderate heterogeneity; > 75%, high heterogeneity. Egger’s test was carried out to evaluate the possible presence of significant publication bias. Statistical analyses were performed using the StatsDirect statistical software (StatsDirect Ltd; Altrincham, UK).
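The calculations described above can be sketched as follows. This is a minimal illustration, not the authors' StatsDirect workflow: the function names (`diagnostic_metrics`, `dersimonian_laird`, `heterogeneity_label`) and all input counts are hypothetical, raw proportions are pooled with a simple binomial variance (the transformation applied by StatsDirect is not stated in the text), and the handling of values falling exactly on a threshold is an assumption.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table (histology as reference)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def dersimonian_laird(events, totals):
    """Pool per-study proportions with a DerSimonian-Laird random-effects model.

    Returns (pooled proportion, 95% CI, I-square in percent).
    Assumption: raw proportions with binomial variance p(1-p)/n, so a study
    with a proportion of exactly 0 or 1 would need a continuity correction
    or a transformation before it could be used here.
    """
    p = [e / n for e, n in zip(events, totals)]
    v = [pi * (1 - pi) / n for pi, n in zip(p, totals)]  # within-study variance
    w = [1 / vi for vi in v]                             # fixed-effect weights
    fixed = sum(wi * pi for wi, pi in zip(w, p)) / sum(w)
    q = sum(wi * (pi - fixed) ** 2 for wi, pi in zip(w, p))  # Cochran's Q
    df = len(p) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]                 # random-effects weights
    pooled = sum(wi * pi for wi, pi in zip(w_re, p)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2


def heterogeneity_label(i2):
    """Map I-square (percent) to the categories used in this review."""
    if i2 < 25:
        return "no heterogeneity"
    if i2 < 50:
        return "mild heterogeneity"
    if i2 <= 75:
        return "moderate heterogeneity"
    return "high heterogeneity"


# Hypothetical example: true positives / histologically malignant nodules per study
pooled, ci, i2 = dersimonian_laird(events=[45, 30, 80], totals=[50, 50, 100])
print(f"pooled = {pooled:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), "
      f"I2 = {i2:.0f}% ({heterogeneity_label(i2)})")
```

When the between-study variance is estimated as zero, the random-effects weights reduce to the fixed-effect (inverse-variance) weights, so larger studies dominate the pooled estimate.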

Results

Eligible articles

The comprehensive computer literature search retrieved 1885 articles. After the exclusion of 326 duplicates, titles and abstracts of 1559 manuscripts were reviewed. Finally, 14 articles were included in the study according to the above criteria [10–23]. No additional studies were retrieved after screening the references of these papers. Figure 1 details the flowchart of the search.

Fig. 1 Flowchart of study search

Qualitative analysis (systematic review)

The included articles were published by authors from China (n = 9), Italy (n = 2), Germany (n = 2), and Austria (n = 1). The publication period was from 2006 to 2017. All studies used SonoVue (Bracco S.p.A., Milan, Italy) as the contrast agent. The overall number of reported nodules was 1515 from 1363 patients undergoing surgery for several reasons. A total of 775 nodules were classified as positive, and 740 lesions were negative at CEUS; 741 proved to be cancers at histology. Sensitivity and specificity recorded in each single study ranged from 73 to 93% and from 63 to 100%, respectively. The characteristics and findings of the included articles are summarized in Tables 1 and 2.

Table 1 Characteristics of the included studies
Table 2 Results of the included studies

Quantitative analysis (meta-analysis)

Pooled sensitivity of CEUS was 85% (95% CI 83–88), without inconsistency (I² = 5%, 95% CI 0–50.2), with publication bias (P = 0.02) (Fig. 2). Pooled specificity was 82% (95% CI 77–87), with moderate inconsistency (I² = 71%, 95% CI 44.9–82), without publication bias (P = 0.23) (Fig. 3). PPV was 83% (95% CI 77–88), with moderate inconsistency (I² = 72%, 95% CI 46.6–82.3), without publication bias (P = 0.16) (Fig. 4). NPV was 85% (95% CI 81–88), with mild inconsistency (I² = 42%, 95% CI 0–67.4), with publication bias (P = 0.01) (Fig. 5).
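As a rough plausibility check, the predictive values implied by the pooled sensitivity and specificity can be derived from the overall cancer prevalence in the included nodules (741 of 1515) using Bayes' rule. This is only a back-of-the-envelope sketch: the PPV and NPV reported above were pooled across studies, not derived this way, and the function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Derive PPV and NPV from sensitivity, specificity and prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Prevalence of malignancy among all nodules in the included studies
prevalence = 741 / 1515
ppv, npv = predictive_values(sensitivity=0.85, specificity=0.82, prevalence=prevalence)
print(f"prevalence = {prevalence:.3f}, implied PPV = {ppv:.2f}, implied NPV = {npv:.2f}")
```

The implied values, roughly 82% for PPV and 85% for NPV, are close to the pooled estimates, which is expected because nearly half of the operated nodules were malignant. In a population with a lower prevalence of cancer, the same sensitivity and specificity would yield a lower PPV and a higher NPV.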

Fig. 2 Forest plot of sensitivity (95% CI) of CEUS (random effect)

Fig. 3 Forest plot of specificity (95% CI) of CEUS (random effect)

Fig. 4 Forest plot of positive predictive value (95% CI) of CEUS (random effect)

Fig. 5 Forest plot of negative predictive value (95% CI) of CEUS (random effect)

Study quality assessment

Quality assessment of the studies is reported in Table 3. Overall, the studies enrolled consecutive patients with thyroid nodules in a specific period. CEUS was conducted and interpreted before histology in almost all cases. Reference standard bias was rated as high since histology was performed with knowledge of the results of the index test. Flow and timing bias was rated as low since thyroid cancer is a chronic condition. As exceptions, in some studies [12, 14, 15, 16, 18, 20, 23] the patients’ enrollment was unclear, while in others [15, 18] the blinding of CEUS reading was not explained.

Table 3 Quality assessment of the studies according to QUADAS-2

Discussion

An increase in thyroid nodule prevalence has been recorded in recent years. Interestingly, the malignancy rate among all nodules has been up to 10–15%. However, mortality from thyroid cancer has remained stable over time [24]. Thus, a significant challenge for clinicians is to identify those nodules harboring a clinically relevant malignancy. FNAC has traditionally been used for this purpose [1, 6]. However, at least half of all biopsied nodules prove to be benign [25, 26], and up to one-third have inconclusive cytological findings [27]. Therefore, a noninvasive diagnostic method that allows a reliable differentiation between malignant and benign thyroid nodules, superior to the current B-mode US features, is warranted. US is considered to be the best imaging modality for the assessment of thyroid nodular or diffuse diseases. Specifically, it has been widely used in patients with known or suspected thyroid nodules. To optimize gray-scale US performance, some investigators have adopted “score” or “pattern-based” approaches for malignancy risk stratification [1,2,3,4,5,6,7]. However, internationally endorsed sonographic risk stratification systems vary widely in their ability to reduce the number of unnecessary FNACs; inter-observer variability is another reported limitation [28, 29].

In recent years, several studies have assessed the potential of US-elastography and CEUS to increase the accuracy of baseline US. The main advantage of the latter is that it can accurately evaluate the sequence and intensity of tumor perfusion and vascularity, which are characteristic of malignant tumors. Its diagnostic efficacy has been extensively evaluated in the examination of the liver, uterus, prostate, and other organs. Previous studies showed that CEUS can also provide both a qualitative and a quantitative evaluation of the contrast enhancement of thyroid nodules [7]. Nevertheless, there are no unified standards for quantitative or qualitative studies, and no single feature of CEUS seems to be sensitive and specific enough for the diagnosis of malignancy. Furthermore, a previous meta-analysis of seven eligible studies found that qualitative evaluation showed better sensitivity and specificity than quantitative evaluation for the differentiation of benign and malignant nodules [30]. Notably, the authors of that paper included various methods of CEUS interpretation, resulting in relatively high heterogeneity. Therefore, more advanced and detailed methods need to be addressed in future studies.

The present systematic review found a substantial number of articles reporting histological data on 1515 thyroid nodules with preoperative CEUS. Unlike previous studies, we analyzed the diagnostic accuracy of CEUS on thyroid nodules by searching and including all the eligible studies using histology as the reference standard. Pooled sensitivity and specificity of CEUS were 85% and 82%, respectively. Interestingly, the result for sensitivity was without inconsistency, while specificity showed moderate inconsistency. Also, PPV and NPV were 83% and 85%, respectively. Our results are in line with previous meta-analyses by Yu et al. [30] and Ma et al. [31] conducted using non-histological reference standards. However, the present study is the first meta-analysis on this topic using histological diagnosis as the only reference standard. Thus, this review extends the previous information and could represent evidence of the high accuracy of CEUS in discriminating benign thyroid nodules from malignant ones.

A further issue needs to be discussed. Would the inclusion of CEUS in clinical practice be associated with any additional benefit in the evaluation of thyroid nodules? As already stated, US represents the pivotal tool in these patients. In a recent meta-analysis, a head-to-head comparison among the most common TI-RADS was carried out, and ACR TI-RADS showed the highest performance in selecting thyroid nodules for FNA [32]. Interestingly, one study found that the accuracy of this TI-RADS can be further increased by combining it with CEUS [33].

Strengths and limitations of this review should be discussed. Generally, small studies reporting positive findings are more likely to be published than those describing negative results; here, we included only studies with a significant number of cases. In studies evaluating the performance of a diagnostic tool, the choice of the reference standard is crucial; here, we selected only studies with histological confirmation, which excludes the bias due to weaker standards (i.e., cytology). CEUS results may be interpreted according to several parameters (washout, washout peak, ring sign, etc.); here, we selected only papers in which the authors aimed to classify the CEUS result in the simplest fashion (i.e., just positive or negative). Moderate heterogeneity was found for two of the evaluated outcomes, so caution should be taken in generalizing the results; this could be due to the operator’s CEUS interpretation, study design, and patient features. CEUS has only recently been introduced into clinical practice, as confirmed by the limited number of both studies and evaluated nodules; the results of the present review are thus meant to be preliminary, and further studies are needed. As recognized for US stratification of the risk of malignancy of thyroid nodules, high accuracy may be reached by combining several CEUS parameters.

In conclusion, CEUS appears to perform well in detecting or excluding thyroid cancer in thyroid nodules. CEUS may thus represent a useful diagnostic tool and can be used for selecting patients for FNAC or clinical follow-up.