Introduction

Thyroid nodules are very common in the general population. The prevalence of palpable nodules in people up to 50 years of age ranges from 4 to 21%, whereas the prevalence of nodules detected by ultrasound ranges from 21 to 67% [1,2,3,4,5]. The diagnostic work-up of thyroid nodules includes biochemical tests such as measurement of calcitonin, TSH, and thyroid autoantibodies, which are of limited value for diagnosing the common types of thyroid cancer, as well as ultrasound-guided core needle biopsy (CNB), fine-needle aspiration (FNA), Doppler sonography, and scintigraphy [6]. Preventive management of thyroid nodules is challenging because no method detects thyroid cancer with 100% sensitivity and specificity [7]. Ultrasound-guided fine-needle aspiration (FNA) is a useful method for assessing thyroid nodules. However, ultrasound-guided FNA cytology has limitations, including samples classified as indeterminate or non-diagnostic [8,9,10]. Previous studies have shown that the malignancy rates for indeterminate and non-diagnostic ultrasound-guided FNA samples are 60.0% and 10.9%, respectively [8,9,10]. This can delay the final diagnosis of thyroid cancer and lead to unnecessary thyroidectomy [11]. Today, color flow Doppler ultrasonography (CFDS) is used to further evaluate suspected malignant tumors. However, specialists disagree widely about the value of CFDS for differentiating benign from malignant thyroid nodules: some researchers consider it very valuable, while others do not [12]. Several studies have sought to identify ultrasound features with both high sensitivity and high specificity for malignant versus benign disease, but whether these features can be identified reliably remains an open question [13,14,15].
Sonographic features identified in previous studies as indicative of malignancy include irregular or microlobulated margins, hypoechogenicity, a taller-than-wide shape, intranodular vascularity, and the presence of microcalcifications [13, 15, 16]. In 2010, Iared et al. performed a meta-analysis of three articles and reported an overall sensitivity of 96% (95% CI 88–100%) and a specificity of 14% (95% CI 11–18%) [17]. The aim of our study is to investigate the role of CFDS in the timely diagnosis and preventive management of malignant thyroid nodules.

Materials and methods

This meta-analysis was performed in accordance with the PRISMA guidelines and was registered in PROSPERO (CRD42018111198).

Search strategy

The literature search was conducted by two independent researchers from September 5, 2018 to December 11, 2018, using a data extraction form. The researchers identified primary studies and selected those that met our selection criteria. Disagreements between the two reviewers were resolved through discussion with a third reviewer. The following data sources were searched: Web of Science, International Medical Sciences, Scopus, MEDLINE, PubMed, Index Copernicus, DOAJ, Embase, Google Scholar, EBSCO-CINAHL, and the Persian databases Magiran and SID, using keywords such as “color doppler ultrasoun*”, “thyroid nodule”, “malignant thyroid lesion”, “malignant thyroid nodule”, “follicular thyroid lesion”, “thyroid cancer”, and “neoplasm”.

Inclusion and exclusion criteria

Eligible study designs were randomized controlled trials or other controlled trials, cohort studies, and cross-sectional studies. Exclusion criteria were letters, reviews, editorials, case reports, articles available in abstract form only, articles identified as preliminary reports, irrelevant articles, and articles lacking sufficient quantitative data.

Risk of bias (quality) assessment

Articles selected for retrieval were assessed for methodological validity by two independent reviewers before inclusion. Review Manager version 5.0.20 (Cochrane Collaboration, Oxford, UK) was used to calculate sensitivity and specificity with corresponding 95% confidence intervals (CIs). Study quality was evaluated using 7 items from the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. Each item was answered “yes”, “no”, or “unclear” and scored as follows: “yes” = 2, “unclear” = 1, and “no” = 0. The highest possible score was therefore 14 and the lowest 0 (Table 1) [18].
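The scoring rule above can be made concrete with a minimal sketch. The text gives only the answer-to-score mapping and the number of items, not the item wording, so the answers below are hypothetical and do not correspond to any study in Table 1. Because the text uses both “unclear” and “uncertain” for the middle option, both spellings are accepted.

```python
# Scoring scheme described in the text: "yes" = 2, "unclear"/"uncertain" = 1, "no" = 0.
ITEM_SCORES = {"yes": 2, "unclear": 1, "uncertain": 1, "no": 0}
N_ITEMS = 7  # number of QUADAS-2-derived items used in this review


def quality_score(answers):
    """Return the total quality score (0 to 14) for one study's 7 item answers."""
    if len(answers) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} answers, got {len(answers)}")
    # Normalise case and whitespace so "Yes " and "yes" score the same.
    return sum(ITEM_SCORES[a.strip().lower()] for a in answers)


# Hypothetical answers for a single study (not taken from Table 1).
example = ["yes", "yes", "unclear", "no", "yes", "yes", "unclear"]
print(quality_score(example))  # → 10
```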

Table 1 Quality assessment of studies that were included in the meta-analysis
Table 2 Summary estimates of each parameter and their heterogeneity statistics
Table 3 Results of meta-regression to assess source of heterogeneity

Statistical analysis

After extracting the true positive (TP), false positive (FP), false negative (FN), and true negative (TN) counts from the included studies, a sensitivity analysis was first performed using graphical residual-based diagnostics: goodness-of-fit, bivariate normality, influence, and outlier detection plots.
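Each study's diagnostic measures follow directly from its 2×2 table. The sketch below computes per-study sensitivity, specificity, PLR, NLR, and DOR from TP, FP, FN, and TN; the counts are hypothetical, and the 0.5 continuity correction for zero cells is a common convention assumed here, not something stated in the text.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts extracted from one study: true/false positives and negatives."""
    tp: float
    fp: float
    fn: float
    tn: float


def accuracy_measures(t, cc=0.5):
    """Per-study sensitivity, specificity, PLR, NLR and DOR from a 2x2 table.

    If any cell is zero, cc is added to every cell (assumed continuity correction).
    """
    if 0 in (t.tp, t.fp, t.fn, t.tn):
        t = TwoByTwo(t.tp + cc, t.fp + cc, t.fn + cc, t.tn + cc)
    sens = t.tp / (t.tp + t.fn)      # malignant nodules correctly flagged
    spec = t.tn / (t.tn + t.fp)      # benign nodules correctly cleared
    plr = sens / (1 - spec)          # positive likelihood ratio
    nlr = (1 - sens) / spec          # negative likelihood ratio
    dor = plr / nlr                  # algebraically equal to (tp * tn) / (fp * fn)
    return {"sens": sens, "spec": spec, "plr": plr, "nlr": nlr, "dor": dor}


# Hypothetical study: 50 malignant and 50 benign nodules.
m = accuracy_measures(TwoByTwo(tp=37, fp=15, fn=13, tn=35))
print({k: round(v, 2) for k, v in m.items()})
```

In a meta-analysis these per-study values are not simply averaged; they are the inputs to the pooling model.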

Heterogeneity was assessed using Cochran's Q statistic (chi-square test) and the inconsistency index \( I^{2} \), with \( I^{2} > 50\% \) taken to indicate substantial heterogeneity. Sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR), each with a 95% confidence interval, were estimated using a random-effects model.
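The heterogeneity statistics named above can be sketched as follows. This uses the generic Higgins formula \( I^{2} = \max(0, (Q - df)/Q) \) with \( df = k - 1 \) and, as one common random-effects method, univariate DerSimonian–Laird pooling. The Stata routines used in this review may instead fit a bivariate model, so this is an illustration of the concepts under that assumption, not a reproduction of the analysis. The log-DOR values and variances are hypothetical.

```python
import math


def cochran_q(estimates, variances):
    """Cochran's Q around the fixed-effect (inverse-variance weighted) mean."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    return sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))


def i_squared(q, k):
    """Higgins' I^2 in percent for k studies (df = k - 1), truncated at 0."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100.0


def dersimonian_laird(estimates, variances):
    """Univariate random-effects pooling: returns (pooled, tau^2, standard error)."""
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / v for v in variances]
    q = cochran_q(estimates, variances)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, tau2, math.sqrt(1.0 / sum(w_star))


# Hypothetical log-DOR values and within-study variances for four studies.
log_dor = [0.0, 1.0, 2.0, 3.0]
var = [0.01, 0.01, 0.01, 0.01]
q = cochran_q(log_dor, var)
print(round(i_squared(q, len(log_dor)), 1))  # well above 50%, so random effects
pooled, tau2, se = dersimonian_laird(log_dor, var)
```

When \( I^{2} \) exceeds 50%, \( \tau^{2} \) is positive and the random-effects weights become more equal across studies than fixed-effect weights would be.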

Forest plots were constructed to show the variation across studies in the sensitivity and specificity estimates of color Doppler ultrasound for distinguishing malignant thyroid nodules. Sensitivity, specificity, and DOR values with 95% confidence intervals (CIs) were calculated. Summary receiver operating characteristic (SROC) curves were used to assess the relationship between sensitivity and specificity, and the area under the SROC curve (AUC) was calculated to estimate the performance of color Doppler ultrasound in distinguishing malignant thyroid nodules. An AUC ≥ 0.97 indicates excellent accuracy; 0.93 ≤ AUC ≤ 0.96, very good; 0.75 ≤ AUC ≤ 0.92, good; and AUC < 0.75 may still be reasonable, but the test then has obvious deficiencies in diagnostic accuracy and approaches a random test [19].
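The AUC bands above translate into a simple lookup. As written, the bands leave small gaps (0.92 < AUC < 0.93 and 0.96 < AUC < 0.97); this sketch assumes that each band extends up to the next lower bound, which is an interpretation rather than part of the cited scheme.

```python
def classify_auc(auc):
    """Map an SROC AUC to the accuracy bands used in the text.

    Assumption: each band runs up to the next band's lower bound, which
    closes the gaps between 0.92/0.93 and 0.96/0.97 in the original scheme.
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie in [0, 1]")
    if auc >= 0.97:
        return "excellent"
    if auc >= 0.93:
        return "very good"
    if auc >= 0.75:
        return "good"
    return "reasonable, but with obvious deficiencies"


for value in (0.78, 0.82, 0.91, 0.98, 0.70):
    print(value, classify_auc(value))
```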

Meta-regression was used to assess sources of heterogeneity, and Deeks' test was applied to detect publication bias. All analyses were performed using Stata version 14.0 for Windows (StataCorp, College Station, TX).

Results

Research findings

Of the 1125 articles found, 288 were assessed independently. After duplicate papers were excluded, 132 articles were examined. In the next phase, 106 articles, including case reports, irrelevant studies, and articles without sufficient information, were excluded. After full-text evaluation of the remaining papers, 20 were included in the meta-analysis. Figure 1 shows the selection process, and the key results of the selected studies are summarized in Table 4. The included studies investigated a total of 6272 patients with malignant or benign thyroid nodules, and all of them compared color Doppler ultrasonography with FNA.

Fig. 1

Flowchart of the study

Table 4 Details of the studies included in this systematic review and meta-analysis

Sensitivity analysis

A sensitivity analysis was performed to assess the influence of individual studies on the pooled effect sizes of color Doppler ultrasound for distinguishing malignant thyroid nodules (Fig. 2). Comparing the pooled diagnostic parameters from all studies with those obtained after excluding two outliers [21, 25] showed that excluding these two studies reduced sensitivity from 0.74 to 0.70, specificity from 0.70 to 0.68, DOR from 6.0 to 5.0, and PLR from 2.4 to 2.2, whereas it increased NLR from 0.38 to 0.44 and AUC from 0.82 to 0.91. Specificity in both cases was 0.98. The \( I^{2} \) value changed from 89.94% to 88.59% for sensitivity and from 99.77% to 97.61% for specificity. Because these parameters did not change substantially, the meta-analysis was performed on the full set of studies.

Fig. 2

Sensitivity analysis of included studies in the diagnostic meta-analysis: a graphical depiction of residual-based goodness-of-fit, b bivariate normality, c influence analysis, d outlier detection

Heterogeneity

The results showed high heterogeneity for sensitivity (\( I^{2} \) = 89.94%), specificity (\( I^{2} \) = 97.77%), DOR (\( I^{2} \) = 92.8%), PLR (\( I^{2} \) = 93.81%), and NLR (\( I^{2} \) = 92.67%). Therefore, a random-effects model was used to estimate the pooled effect size for each parameter (Table 2).

Under the random-effects model, the pooled sensitivity and specificity of color Doppler ultrasound for distinguishing malignant thyroid nodules were estimated at 0.74 (95% CI 0.62–0.83; \( I^{2} = 89.94\% \)) and 0.70 (95% CI 0.56–0.81; \( I^{2} = 97.79\% \)), respectively. This indicates a relatively high sensitivity (74%) of color Doppler ultrasound in this meta-analysis (Fig. 3).

Fig. 3

Forest plot for the pooled estimates of sensitivity and specificity of color Doppler ultrasound to distinguish malignant thyroid nodules

To identify sources of heterogeneity, meta-regression was performed on publication year, country, and study quality. For all three covariates, both sensitivity and specificity were reduced, but study quality had the greatest effect on heterogeneity (Table 3).

The SROC curve plots the paired sensitivity and specificity results of the studies. The AUC was 0.78 (95% CI 0.74–0.81); because this value lies between 0.75 and 0.92, color Doppler ultrasound has good accuracy. Furthermore, the PLR of 2.4 (95% CI 1.5–4.0), NLR of 0.38 (95% CI 0.22–0.63), and DOR of 6.0 (95% CI 2.0–17) indicate that color Doppler ultrasound is a reliable method for detecting malignant thyroid nodules (Table 2 and Fig. 4).
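Because PLR, NLR, and DOR are defined from sensitivity and specificity, the reported pooled values can be cross-checked against the pooled sensitivity (0.74) and specificity (0.70). Each parameter was pooled separately, so the identities need not hold exactly; this sketch only shows that the point estimates are mutually consistent.

```python
def implied_ratios(sens, spec):
    """Likelihood ratios and DOR implied by a single sensitivity/specificity pair."""
    plr = sens / (1 - spec)   # PLR = sensitivity / (1 - specificity)
    nlr = (1 - sens) / spec   # NLR = (1 - sensitivity) / specificity
    return plr, nlr, plr / nlr


# Pooled point estimates reported in this meta-analysis.
plr, nlr, dor = implied_ratios(0.74, 0.70)
print(f"PLR ≈ {plr:.2f}, NLR ≈ {nlr:.2f}, DOR ≈ {dor:.1f}")
# → PLR ≈ 2.47, NLR ≈ 0.37, DOR ≈ 6.6  (reported: 2.4, 0.38, 6.0)
```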

Fig. 4

SROC curve for color Doppler ultrasound to distinguish malignant thyroid nodules. Each circle represents an individual research study. The size of the circle is proportional to the sample size of the study. The best-fitting curve lies between the other two curves demarcating its 95% confidence interval [summary receiver operating characteristic (SROC), area under the curve (AUC)]

A Deeks’ funnel plot was constructed for the 20 included studies. The plot was symmetric, and Deeks’ test for the DOR (p = 0.54) showed no evidence of publication bias among the included studies (Fig. 5).
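Deeks’ test regresses the log DOR of each study on \( 1/\sqrt{ESS} \), weighted by the effective sample size \( ESS = 4 n_{1} n_{2} / (n_{1} + n_{2}) \), where \( n_{1} \) and \( n_{2} \) are the numbers of diseased and non-diseased subjects. A slope far from zero suggests funnel-plot asymmetry. The minimal sketch below returns the slope, its standard error, the t statistic, and the degrees of freedom; a two-sided p-value can then be obtained from the t distribution (for example with `2 * scipy.stats.t.sf(abs(t), df)`). The 0.5 continuity correction is an assumed convention, and this is not the exact Stata implementation used in the review.

```python
import math


def deeks_test(studies, cc=0.5):
    """Deeks' funnel-plot asymmetry regression for diagnostic accuracy studies.

    studies: iterable of (tp, fp, fn, tn) counts, at least three studies.
    Regresses ln(DOR) on 1/sqrt(ESS) with weights ESS. Returns (slope, se, t, df).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in studies:
        n1, n2 = tp + fn, fp + tn                # diseased, non-diseased
        ess = 4.0 * n1 * n2 / (n1 + n2)          # effective sample size
        if 0 in (tp, fp, fn, tn):                # assumed continuity correction
            tp, fp, fn, tn = tp + cc, fp + cc, fn + cc, tn + cc
        xs.append(1.0 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))
        ws.append(ess)
    if len(xs) < 3:
        raise ValueError("need at least three studies")

    # Weighted least squares fit of y = intercept + slope * x.
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    df = len(xs) - 2
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se = math.sqrt(rss / df / sxx)
    t = slope / se if se > 0 else math.nan
    return slope, se, t, df


# Hypothetical counts: smaller studies report larger DORs (positive slope).
print(deeks_test([(10, 2, 2, 10), (40, 20, 20, 40), (100, 80, 80, 100), (30, 10, 12, 28)]))
```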

Fig. 5

Deeks’ funnel plot for the assessment of publication bias among 20 included studies

We updated our search of the databases from September 5, 2018 to December 11, 2018 using the search terms. This search identified 20 papers reporting the diagnostic performance of color Doppler for thyroid nodules. A sensitivity analysis was used to evaluate the influence of each study separately on the pooled estimates for color Doppler ultrasound in detecting malignant thyroid nodules (Fig. 2).

Comparing the diagnostic parameters pooled from all studies with those obtained after excluding two studies [21, 25] showed that, without these two studies, sensitivity decreased from 0.74 to 0.70, specificity from 0.70 to 0.68, DOR from 6.0 to 5.0, and PLR from 2.4 to 2.2, while NLR increased from 0.38 to 0.44 and AUC from 0.82 to 0.91. In this study, significant heterogeneity in sensitivity and specificity was found across studies, which was explored using meta-regression. Clinically, this heterogeneity may arise from factors such as differences between investigators, study locations, settings and equipment, the accuracy and calibration of devices, and the skills and training of operators and technicians. The findings of these studies were consistent with our meta-analysis results, showing that color Doppler is a valuable method for evaluating thyroid nodules and can be used as a paraclinical method for assessing the risk of malignancy in patients with palpable and non-palpable thyroid nodules.

Although color Doppler findings alone cannot differentiate benign from malignant nodules with 100% accuracy, several features of a thyroid nodule increase the probability of papillary thyroid carcinoma [41].

Color Doppler can be useful in managing patients with non-diagnostic or indeterminate fine-needle aspiration cytology (FNAC) results, which are the main limitation of FNAC of thyroid nodules. Non-diagnostic cytology is common in cystic or hemorrhagic lesions because they yield too few cells. Ultrasound is valuable for distinguishing solid from cystic lesions. Wienke et al. [42] reported that 60% of benign thyroid nodules are solid and 40% have a cystic structure, but these results conflict with the studies of a few other authors (Lanonchicchi et al. and Appetecchia). In general, the presence of multiple cystic components almost rules out papillary cancer, which can help distinguish benign cystic nodules from those that require prompt FNAC or biopsy, especially when the patient has risk factors for thyroid cancer, including a family history of thyroid cancer and head and neck cancer [41].

Calcification can be detected in about 10–15% of all thyroid nodules, but its location and pattern are more useful for distinguishing benign from malignant nodules. Calcification is perhaps the most reliable sonographic sign of malignancy, but unfortunately it can also occur in a small percentage of benign nodules. When calcifications are large and coarse, the nodule is more likely to be benign; when they are small and punctate, malignancy is more likely. Pathologically, these fine calcifications may represent psammoma bodies, which are commonly seen in papillary cancers. In all reviewed articles except two, microcalcification was significantly associated with malignancy. Color Doppler ultrasonography was able to detect the presence of calcification (p < 0.05). Some studies have shown that microcalcification has a high specificity (95%) but a low sensitivity (29–36%) for thyroid malignancy [33].

Irregular or microlobulated margins are a common histopathologic finding in malignancy [15]. In studies by Papini et al. [13] and Lu et al. [43], poorly defined margins were found in 77.4% and 79% of cases, respectively. Such an irregular border reflects the malignant nature of these lesions. Similarly, Solbiati et al. [44] found that 82% of the thyroid nodules were benign with regular margins, and 55% were malignant with irregular margins [33]. In Kalantari’s study, an irregular nodule margin had the lowest sensitivity (33%) among the features investigated by color Doppler [39].

Some authors have recommended sampling nodules larger than 1.0 or 1.5 cm [16, 45], and clinicians usually refer patients for fine-needle aspiration biopsy (FNAB) if the nodule is larger than 1 cm. However, larger nodule size is not a risk factor for malignancy [46]. Papillary thyroid cancers smaller than 1–1.5 cm may already show metastasis to regional lymph nodes or local spread [15]. Varverakis et al. concluded that nodules smaller than 2.5 cm have more vascularity, although the vascularity pattern (central or peripheral) is not related to nodule size. In general, nodule size is not a good criterion for distinguishing benign from malignant nodules. According to the examined articles, if size alone (< 15 mm) is used as a criterion for malignancy, 50% of papillary carcinoma nodules will be missed [27].

Nodular vascular patterns on color Doppler have been proposed as a diagnostic tool for predicting thyroid malignancy, on the assumption that peripheral vessels indicate a benign lesion, whereas central vessels indicate malignancy [16]. Several studies have shown that central vessels are associated with malignant solid thyroid nodules [13, 16]. In particular, Frates et al. reported that central vascularity was seen in a greater percentage of malignant than benign nodules (42% vs. 14%). In contrast, a color Doppler ultrasonography study by Kalantari et al. found no significant association between intranodular or perinodular vascularity and benign or malignant status.

In studies of papillary and follicular lesions, a resistive index (RI) in the malignant range (> 0.75) was always associated with a Doppler vascular pattern indicative of malignancy. Follicular lesions pose a particular challenge in FNAC, so the vascular flow pattern on color Doppler can help in this setting. Papillary carcinomas do not usually show a vascular pattern, probably because some papillary carcinomas tend to be fibrotic and are therefore avascular. In contrast, all follicular lesions showed increased central vascularity, so color Doppler can be helpful in follicular nodules for which FNA is inconclusive, and the two methods appear to complement each other.

Varverakis et al. [5] found that nodules smaller than 2.5 cm have more vascularity than larger nodules, although the vascularization pattern (central or peripheral) was not correlated with nodule size. Peripheral vascularity is strongly suggestive of a benign nodule, whereas central vascularity is moderately suggestive of malignancy. Several authors have suggested that vascular pattern III may be associated with malignancy, as shown by Bakhshaee et al. [47].

Rago et al. [48] reported that among 74 patients with benign nodules, 38 (51.8%) had pattern III, whereas among 30 patients with malignant nodules, 20 had pattern III. Studies by Moon et al., Argalia et al., Tamsel et al., and Rosario et al. showed no association between intranodular vascularity and malignancy [49,50,51,52]. Moreover, in a color Doppler ultrasonography study, Clarke et al. [53] reported that cold nodules are mainly characterized by peripheral vessels and hot nodules by central vessels, and they concluded that color Doppler ultrasonography could not reliably distinguish benign from malignant thyroid nodules. According to the studies, thyroid nodules with elastography grade I or II do not require intervention. Nodules of grade III on color Doppler are suspicious, and we recommend FNAC, whereas nodules of grade IV or V are highly suspicious for malignancy, and we recommend surgery.
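The counts from Rago et al. can be read as a 2×2 table in which “pattern III” is the positive test result. The sketch below treats them this way to show how weakly pattern III separates the two groups in that series. It uses only the counts quoted above; the function name is illustrative.

```python
def pattern_as_test(mal_total, mal_positive, ben_total, ben_positive):
    """Sensitivity, specificity and PLR when a Doppler pattern is treated as a positive test."""
    sens = mal_positive / mal_total                # malignant nodules showing the pattern
    spec = (ben_total - ben_positive) / ben_total  # benign nodules without the pattern
    plr = sens / (1 - spec)                        # positive likelihood ratio
    return sens, spec, plr


# Counts quoted from Rago et al.: 20 of 30 malignant and 38 of 74 benign nodules had pattern III.
sens, spec, plr = pattern_as_test(30, 20, 74, 38)
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}, PLR {plr:.2f}")
# → sensitivity 0.67, specificity 0.49, PLR 1.30
```

A PLR close to 1 means that, in this series, pattern III alone barely changes the probability of malignancy. This is consistent with the studies above that found no association between vascularity and malignancy.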

Various color Doppler ultrasonography studies have shown that a thin, complete halo strongly suggests a benign nodule. The halo sign corresponds to blood vessels located around the periphery of the lesion (the basket pattern). Solbiati et al. [44] reported a halo sign in 36% of thyroid nodules, found more often in benign than in malignant cases (86% vs. 14%). The halo appears to consist of normal thyroid parenchyma; in fast-growing cancers it is often thick, irregular, and incomplete, and it appears hypovascular or avascular on color Doppler, as Singh and colleagues also confirmed.

Palaniappan et al. also observed comet-tail artifacts in 23 nodules, all of which were benign (100%). Hence, this sign is a specific criterion for benign thyroid nodules. These findings are comparable with those of Wang and Ahuja, who concluded that comet-tail artifacts are a sign of benignity [54].

In the reviewed articles, the prevalence of malignancy was higher in hypoechoic nodules than in hyperechoic and isoechoic nodules. In the study of Kalantari et al. [39], microcalcification and hypoechogenicity were the most and least predictive factors for malignancy, respectively (77% sensitivity, 76% specificity versus 24%, 41% PPV versus 14%, and 94% NPV vs. 86%). A study by Papini et al. of cytologically examined non-palpable nodules showed that a hypoechoic appearance combined with at least one independent ultrasonographic risk factor detected the majority of non-palpable thyroid tumors [13].

Various studies have found that the risk of thyroid cancer is lower in glands with multiple nodules than in those with a solitary nodule. Brown reported that ultrasonography showed multiple nodules in 28% of glands, and none of the glands with multiple nodules harbored a malignancy [55].

In one study, 85% of patients with nodular disease were female [23], and other epidemiological studies show that the incidence of palpable thyroid nodules is about 5% in women and 1% in men. However, studies show that age and sex are not the most important criteria for differentiating benign from malignant thyroid nodules.

According to the reviewed studies, the features most strongly associated with malignancy on color Doppler include microcalcification, hypoechogenicity, a thick, weak, and irregular halo, lymphadenopathy, and additional local thyroid stimulation. Other features, such as the absence of a halo and macrocalcification, are less useful on their own, but they may be helpful as supporting findings.

Conclusion

Fine-needle aspiration cytology is the most accurate, sensitive, and specific diagnostic tool for preoperative thyroid evaluation and can be very helpful. However, when FNAC yields too few cells to identify malignancy and there are no clinical features indicating malignancy, another test of comparable strength can be helpful. Color Doppler is widely available and easy to apply in practice as a diagnostic tool for evaluating thyroid nodules, and in this meta-analysis it showed a pooled sensitivity of 74% and specificity of 70%.

A resistive index > 0.75 and a vascular pattern of grade III or higher on color Doppler predict malignancy with confidence. We therefore recommend FNAC for thyroid nodules with positive ultrasonographic findings. Because of its precision, cost-efficiency, accessibility, and non-invasive nature, Doppler can play a complementary role to FNAC in the evaluation of individual thyroid nodules. Not only can it help avoid unnecessary FNAC, but it also improves diagnostic ability and helps the surgeon plan the operation for a malignant neoplasm in advance. This approach can prevent important problems, such as repeated biopsies and delayed diagnosis.
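The Doppler criterion in the conclusion (RI > 0.75 together with a vascular pattern of grade III or higher) can be written as a simple rule. This is a hypothetical illustration of that criterion only. The grading scale as an integer, the class and function names, and the wording for Doppler-negative nodules are assumptions, and the rule does not replace clinical and cytological assessment.

```python
from dataclasses import dataclass


@dataclass
class DopplerFindings:
    """Hypothetical record of Doppler findings for one nodule."""
    resistive_index: float  # RI, dimensionless
    vascular_pattern: int   # vascular pattern grade as an integer (assumed scale, e.g. 1 to 5)


def doppler_suggests_malignancy(f, ri_cutoff=0.75, min_pattern=3):
    """True when RI exceeds the cutoff AND the pattern grade is III or higher."""
    return f.resistive_index > ri_cutoff and f.vascular_pattern >= min_pattern


def next_step(f):
    """Map the rule to the recommendation in the text; negatives defer to other criteria."""
    if doppler_suggests_malignancy(f):
        return "Doppler-positive: FNAC recommended"
    return "Doppler-negative: decide using other clinical and sonographic criteria"


print(next_step(DopplerFindings(resistive_index=0.82, vascular_pattern=3)))
print(next_step(DopplerFindings(resistive_index=0.60, vascular_pattern=4)))
```

The RI comparison is strict (> 0.75) to match the wording in the text, so a nodule with RI exactly 0.75 is not flagged.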