Introduction

Thyroid nodules are a common disorder of the endocrine system. In recent years, studies have shown that the detection rate of malignant thyroid nodules is increasing year by year and that patients are being diagnosed at progressively younger ages. Reported prevalence ranges from 10 to 50%, reflecting differences in the age, race, and sex composition of study populations as well as the relatively small sample sizes of previous studies [1, 2]. The number of patients with overdiagnosed low-risk nodules is projected to reach 5.1 million between 2019 and 2030 [3]. Because overdiagnosed patients may receive extensive and unnecessary treatment, accurate assessment of the malignancy risk of thyroid nodules remains a key clinical concern. Current guidelines for the evaluation of thyroid nodules recommend high-resolution ultrasound as the core method for deciding whether to perform fine-needle aspiration biopsy (FNAB) and for selecting subsequent treatment [4,5,6,7].

To standardize the assessment and reporting of the ultrasound characteristics of thyroid nodules, researchers have proposed the Thyroid Imaging Reporting and Data System (TIRADS) as a risk stratification scheme; it has existed for more than a decade, and several versions have been developed. The most widely investigated is the version published by the American College of Radiology in 2017 (ACR-TIRADS) [8], a point-based system that can classify all nodules. Nodules are assigned to five categories (TR1 to TR5) according to five ultrasound characteristics: composition, echogenicity, shape, margin, and echogenic foci. The decision to perform FNAB or further examination is then based on the nodule's category together with its size.
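The point-based logic of ACR-TIRADS can be sketched in Python as below. This is a minimal illustration, not the guideline itself: the point values and category ranges (0 points TR1, 2 points TR2, 3 points TR3, 4 to 6 points TR4, 7 or more TR5) follow the published 2017 ACR chart as we understand it, while the feature labels, the function names, and the grouping of a 1-point total with TR1 are assumptions made for the example.

```python
# Hypothetical sketch of ACR TI-RADS (2017) scoring; labels are illustrative.
COMPOSITION = {"cystic": 0, "spongiform": 0, "mixed": 1, "solid": 2}
ECHOGENICITY = {"anechoic": 0, "hyperechoic": 1, "isoechoic": 1,
                "hypoechoic": 2, "very_hypoechoic": 3}
SHAPE = {"wider_than_tall": 0, "taller_than_wide": 3}
MARGIN = {"smooth": 0, "ill_defined": 0, "lobulated": 2, "irregular": 2,
          "extrathyroidal_extension": 3}
# Echogenic foci are additive: the points of every type present are summed.
ECHOGENIC_FOCI = {"none": 0, "large_comet_tail": 0, "macrocalcification": 1,
                  "peripheral_rim": 2, "punctate": 3}


def acr_points(composition, echogenicity, shape, margin, foci=()):
    """Total ACR TI-RADS points for one nodule."""
    if composition == "spongiform":
        # Spongiform nodules receive no further points in other categories.
        return 0
    points = (COMPOSITION[composition] + ECHOGENICITY[echogenicity]
              + SHAPE[shape] + MARGIN[margin])
    # set() so that a foci type listed twice is counted only once.
    return points + sum(ECHOGENIC_FOCI[f] for f in set(foci))


def acr_category(points):
    """Map total points to TR1..TR5 (1 point grouped with TR1: an assumption)."""
    if points <= 1:
        return "TR1"
    if points == 2:
        return "TR2"
    if points == 3:
        return "TR3"
    if points <= 6:
        return "TR4"
    return "TR5"


# Solid (2) + hypoechoic (2) + wider-than-tall (0) + smooth (0) = 4 points.
print(acr_category(acr_points("solid", "hypoechoic", "wider_than_tall", "smooth")))  # TR4
```

Keeping the point tables as plain dictionaries makes the additive rule for echogenic foci explicit and easy to check against the official chart.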

The ACR TIRADS Committee estimated the malignancy risk of TR5 nodules only as greater than 20% [9]. Under internationally accepted principles for the diagnosis and management of thyroid nodules, FNAB results are required before surgery. In China, however, FNAB has not been widely implemented in some local primary hospitals, and there is no uniform guideline for its application and management [10]. In response, the Chinese Society of Ultrasound Medicine developed the “2020 Chinese guideline for ultrasound risk stratification of thyroid nodules for malignancy: C-TIRADS” [11]. Following the well-established BI-RADS for breast masses, C-TIRADS classifies thyroid nodules by counting suspicious features, and the estimated malignancy rate of each category closely mirrors that of the corresponding BI-RADS category.

To date, few studies have evaluated C-TIRADS, which has gradually gained popularity in China, where ACR-TIRADS was the most widely used system before C-TIRADS was published. The aim of this study was to evaluate the diagnostic performance of C-TIRADS and its ability to avoid unnecessary FNAB by comparing it with the widely recognized ACR-TIRADS, and thereby to identify the strengths and weaknesses of each guideline.

Materials and methods

This study was approved by the Institutional Review Board and Ethics Committee of Sino-Japanese Friendship Hospital Affiliated with Jilin University (code: 20221124001).

Patients

The study population comprised 2356 thyroid nodules in 1862 consecutive patients who underwent thyroid ultrasonography at our hospital from October 2019 to November 2021 and had FNA biopsy or postoperative pathological findings. Inclusion criteria were as follows: (1) nodules with a definite pathological diagnosis; (2) a complete ultrasound report before FNA biopsy or surgery, with images saved as JPEG files. Exclusion criteria were as follows: (1) incomplete preoperative ultrasound images of the nodule; (2) "zombie" thyroid nodules; (3) no definitive pathological diagnosis after surgical resection; (4) atypical cells or scant heterogeneous cells on FNAB cytology. In total, 292 nodules were excluded: 51 for lack of a final pathological diagnosis after surgical resection, 13 with pathological findings of calcified nodules, and 228 with a cytopathological diagnosis of follicular lesion with heterogeneous cells or atypical hyperplasia. Ultimately, 2064 thyroid nodules from 1627 patients were included in this study (Fig. 1).
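The nodule-level flow summarized in Fig. 1 can be checked with simple arithmetic. The sketch below uses only the counts stated above; the reason labels are paraphrased. Patient-level counts (1862 to 1627) cannot be derived the same way, because exclusions were made per nodule and a patient may contribute several nodules.

```python
# Nodule-level exclusion counts reported in the Patients section.
screened = 2356
excluded_by_reason = {
    "no final pathology after surgical resection": 51,
    "calcified nodules on pathology": 13,
    "follicular lesion with heterogeneous cells or atypical hyperplasia": 228,
}

excluded = sum(excluded_by_reason.values())
included = screened - excluded
print(f"excluded={excluded}, included={included}")  # excluded=292, included=2064
```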

Fig. 1

Flowchart of the selection of patients with 2064 thyroid nodules

Ultrasonography

All conventional examinations were performed with a Philips EPIQ 7 ultrasound system equipped with a 5–18 MHz broadband linear-array probe. Ultrasound images of thyroid nodules were recorded and stored as JPEG files by an ultrasound specialist with more than 15 years of experience. The location of each nodule (right lobe, left lobe, or isthmus) was recorded, and its longitudinal (superior–inferior), transverse (left–right), and anteroposterior diameters were each measured three times and the mean values recorded.
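The measurement protocol can be expressed as a small helper. This is a sketch under stated assumptions: diameters are in centimetres, the largest averaged diameter is taken as the nodule size used for FNA size thresholds, and "taller than wide" (vertical orientation) is assumed to mean an anteroposterior diameter exceeding the transverse diameter. The function name `summarize_nodule` is hypothetical.

```python
from statistics import mean


def summarize_nodule(longitudinal, transverse, anteroposterior):
    """Average three repeated readings (cm) per axis for one nodule.

    Hypothetical helper: longitudinal = superior-inferior, transverse =
    left-right, anteroposterior = front-back diameter.
    """
    axes = {"longitudinal": longitudinal, "transverse": transverse,
            "anteroposterior": anteroposterior}
    for name, readings in axes.items():
        if len(readings) != 3:
            raise ValueError(f"{name}: expected 3 readings, got {len(readings)}")
    means = {name: mean(readings) for name, readings in axes.items()}
    return {
        **means,
        # Largest averaged diameter, assumed to be the size used for FNA thresholds.
        "max_diameter": max(means.values()),
        # Assumed definition of vertical orientation (aspect ratio > 1).
        "taller_than_wide": means["anteroposterior"] > means["transverse"],
    }


summary = summarize_nodule([1.2, 1.3, 1.1], [0.8, 0.8, 0.8], [0.9, 1.0, 0.95])
print(summary["taller_than_wide"])  # True
```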

Nodules analysis

All selected thyroid nodules were evaluated by two experts, each with more than 15 years of experience in ultrasound diagnosis, who were blinded to the pathological findings. Rather than first analyzing all nodules separately and then discussing the results, the two experts analyzed each nodule simultaneously according to the C-TIRADS and ACR-TIRADS guidelines, discussed it immediately to reach a consensus, and then proceeded to the next nodule. Disagreements were resolved by discussion with a third expert with 20 years of diagnostic experience. Consensus decisions on earlier nodules served as the reference for the analysis of subsequent nodules. Classification was based on five aspects of the nodule's ultrasound images: composition, echogenicity, shape, margin, and echogenic foci.

Statistics

SPSS 26.0 and MedCalc 20.0.22 were used for statistical analysis. Quantitative data are presented as median (interquartile range) and qualitative data as proportions. Nodule size was compared between groups with the Mann–Whitney U test. Receiver operating characteristic (ROC) curves were constructed with pathology as the reference standard to determine the optimal cut-off value of the ACR-TIRADS and C-TIRADS guidelines for distinguishing benign from malignant nodules. The DeLong test was used to compare the area under the curve (AUC) of the two guidelines. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (+LR), negative likelihood ratio (–LR), and accuracy were compared between the two guidelines with the χ2 test. Agreement between the two guidelines, and between each guideline and pathology, was assessed with Cohen’s kappa coefficient. The unnecessary biopsy rate of each guideline was also calculated. P < 0.05 was considered statistically significant.
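For reference, the accuracy measures listed above follow directly from a 2×2 table of guideline calls against pathology. The sketch below uses the standard formulas and the unweighted form of Cohen's kappa; whether the study used weighted or unweighted kappa is not stated, and the counts in the example are hypothetical, not the study's data. The χ2 and DeLong comparisons are not reproduced here.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 accuracy measures (pathology as reference standard)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # lr_pos divides by zero when specificity is 1; lr_neg when it is 0.
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


def cohens_kappa(table):
    """Unweighted Cohen's kappa for a k x k agreement table (rows: rater A, cols: rater B)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts, not the study's data.
m = diagnostic_metrics(tp=90, fp=10, fn=10, tn=90)
print(round(m["sensitivity"], 3), round(m["lr_pos"], 2))  # 0.9 9.0
print(round(cohens_kappa([[40, 10], [10, 40]]), 2))        # 0.6
```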

Results

Basic characteristics of nodules

Of the 2064 nodules, 1247 were malignant and 817 were benign. In all, 944 nodules were located in the left lobe, 1063 in the right lobe, and 57 in the isthmus. A total of 1049 nodules were less than 1 cm in diameter, of which 334 were benign and 715 were malignant. The number of nodules in each size category is shown in Table 1.
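As a consistency check, the counts above sum correctly and reproduce the 50.8% share of sub-centimetre nodules quoted in the limitations. The percentages printed below are simple ratios of the reported counts, not additional results.

```python
malignant, benign = 1247, 817
left_lobe, right_lobe, isthmus = 944, 1063, 57
small_benign, small_malignant = 334, 715  # nodules < 1 cm

total = malignant + benign
assert total == left_lobe + right_lobe + isthmus == 2064
small = small_benign + small_malignant

print(f"malignant, all nodules: {malignant / total:.1%}")           # malignant, all nodules: 60.4%
print(f"nodules < 1 cm: {small / total:.1%}")                      # nodules < 1 cm: 50.8%
print(f"malignant, nodules < 1 cm: {small_malignant / small:.1%}")  # malignant, nodules < 1 cm: 68.2%
```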

Table 1 Number of thyroid nodules in different size categories

Risk of malignancy for different categories of nodules in both guidelines

For ACR-TIRADS, the observed malignancy rate of nodules graded TR3 and above exceeded the risk estimated by the guideline, and the discrepancy was more pronounced in nodules less than 1 cm in diameter. For C-TIRADS, the observed malignancy rate of nodules graded TR3 to TR4C was also higher than the estimated risk, but the discrepancy was smaller than for ACR-TIRADS. In addition, the observed malignancy rate of C-TIRADS TR5 nodules exceeded 90%, in line with the guideline's estimate (Table 2). Among the individual ultrasound signs of C-TIRADS, marked hypoechogenicity, vertical orientation, and microcalcifications were each associated with a malignancy rate above 85% (Table 3).

Table 2 Comparison of malignancy risk stratification results between C-TIRADS and ACR-TIRADS guidelines
Table 3 Frequency of ultrasound findings in all and <1 cm thyroid nodules according to the C-TIRADS

Comparison of diagnostic efficacy in different guidelines

All 2064 nodules could be classified by both guidelines. As shown by the ROC curves (Fig. 2), the optimal diagnostic cut-off value for ACR-TIRADS was 3.5, meaning that nodules were diagnosed as malignant when TR ≥ 4 and as benign when TR < 4. C-TIRADS also distinguished benign from malignant nodules with a cut-off value of 3.5, meaning that a nodule was diagnosed as malignant when rated moderately suspicious for malignancy (4B) or higher, and as benign when rated low suspicion for malignancy (4A) or lower. Using these criteria, the overall diagnostic agreement between the two guidelines was high (Kappa = 0.86 > 0.75, P < 0.001); the specific values are shown in Table 4. Among all nodules, C-TIRADS showed higher specificity and PPV (81.64% and 88.72%, both P < 0.05), whereas ACR-TIRADS showed numerically higher sensitivity and NPV (96.23% and 96.26%), although these differences were not statistically significant (both P > 0.05). The diagnostic performance of the two guidelines for nodules of different sizes is shown in Table 5. The difference in AUC between the <1 cm and ≥1 cm groups was not statistically significant (all P > 0.05).
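Statistical software commonly reports an optimal ROC threshold for ordinal scores as the midpoint between adjacent categories, so a cut-off of 3.5 corresponds to "rank ≥ 4". The sketch below selects that threshold by maximizing Youden's J (sensitivity + specificity − 1), a common criterion that we assume here; the text does not state which criterion was used. The ordinal coding of C-TIRADS categories (2 → 1, 3 → 2, 4A → 3, 4B → 4, 4C → 5, 5 → 6) is also an assumption, chosen because it makes 3.5 separate 4A from 4B. The toy data are hypothetical.

```python
# Assumed ordinal coding of C-TIRADS categories; with it, a cut-off of 3.5
# separates 4A (rank 3) from 4B (rank 4).
C_TIRADS_RANK = {"2": 1, "3": 2, "4A": 3, "4B": 4, "4C": 5, "5": 6}


def youden_cutoff(ranks, malignant):
    """Rank threshold t maximizing Youden's J = sensitivity + specificity - 1.

    A nodule is called malignant when rank >= t. Returns (t, J, sens, spec);
    on ties the lowest threshold is kept.
    """
    n_pos = sum(malignant)
    n_neg = len(malignant) - n_pos
    best = None
    for t in sorted(set(ranks)):
        tp = sum(1 for r, m in zip(ranks, malignant) if r >= t and m)
        tn = sum(1 for r, m in zip(ranks, malignant) if r < t and not m)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (t, j, sens, spec)
    return best


# Toy data (hypothetical): one nodule per category, 4B and above malignant.
labels = ["2", "3", "4A", "4B", "4C", "5"]
ranks = [C_TIRADS_RANK[c] for c in labels]
print(youden_cutoff(ranks, [False, False, False, True, True, True]))  # (4, 1.0, 1.0, 1.0)
```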

Fig. 2

ROC curves of the C-TIRADS and ACR-TIRADS guidelines

Table 4 Comparison of diagnostic consistency between ACR-TIRADS and C-TIRADS guidelines
Table 5 Comparison of the diagnostic efficacy of the two guidelines for thyroid nodules

Comparison of the unnecessary FNAB rates between C-TIRADS and ACR-TIRADS guidelines

FNAB was performed on 1108 nodules. Of these, 695 met the biopsy criteria of C-TIRADS and 630 met those of ACR-TIRADS. The unnecessary biopsy rate was higher for C-TIRADS than for ACR-TIRADS (24.9% vs 20.2%), as was the false-positive rate for FNAB (21.2% vs 15.5%); both differences were statistically significant (P < 0.05). The missed biopsy rates of the two guidelines were almost identical (Table 6).
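The text does not spell out the denominators of these rates. A common convention, assumed in the sketch below, is: unnecessary biopsy rate = benign nodules meeting a guideline's FNA criteria divided by all nodules meeting them; missed rate = malignant nodules not meeting the criteria divided by all malignant nodules. The function name and the six-nodule example are hypothetical.

```python
def biopsy_rates(recommended, malignant):
    """Unnecessary-FNA and missed-malignancy rates for one guideline.

    recommended: per-nodule booleans, True if the guideline's category and
    size criteria call for FNA. malignant: per-nodule reference diagnosis.
    Assumed definitions:
      unnecessary = benign among recommended / all recommended
      missed      = malignant not recommended / all malignant
    """
    pairs = list(zip(recommended, malignant))
    n_recommended = sum(1 for r, _ in pairs if r)
    n_malignant = sum(1 for _, m in pairs if m)
    benign_recommended = sum(1 for r, m in pairs if r and not m)
    malignant_missed = sum(1 for r, m in pairs if m and not r)
    return {
        "unnecessary": benign_recommended / n_recommended if n_recommended else float("nan"),
        "missed": malignant_missed / n_malignant if n_malignant else float("nan"),
    }


# Hypothetical six-nodule example.
rates = biopsy_rates([True, True, True, True, False, False],
                     [True, True, True, False, True, False])
print(rates)  # {'unnecessary': 0.25, 'missed': 0.25}
```

Computing both rates from the same per-nodule flags makes the trade-off visible: tightening size thresholds lowers the unnecessary rate but can only keep or raise the missed rate.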

Table 6 Comparison of unnecessary biopsy rates between C-TIRADS and ACR-TIRADS guidelines

Discussion

In this retrospective study, the optimal diagnostic threshold of C-TIRADS was 4B, whereas Chen and Wu [12] reported a threshold of 4C for this guideline, possibly because of bias in their data source. The optimal threshold of ACR-TIRADS was TR4, consistent with the findings of Mao et al. [13]. The difference in optimal thresholds between the two guidelines reflects their different scoring criteria and the malignancy risk assigned to each category. C-TIRADS had a slightly higher accuracy (89.49%) than ACR-TIRADS (88.23%), improving classification accuracy while remaining clinically practical. The AUC of ACR-TIRADS (0.922) was slightly but significantly higher than that of C-TIRADS (0.913; P < 0.05), indicating better overall diagnostic performance.

ACR-TIRADS evaluates thyroid nodules on five ultrasound characteristics. Each characteristic receives a single score, except echogenic foci, for which the points of every type present are added together. Nodules are classified into five risk categories, from TR1 (benign) to TR5 (highly suspicious for malignancy). C-TIRADS defines the following ultrasound features as suspicious for malignancy: solid composition, microcalcifications, marked hypoechogenicity, ill-defined or irregular margins or extrathyroidal extension, and vertical orientation (aspect ratio >1) (Fig. 3). Microcalcification is one of the most specific features of malignancy, with a specificity of 85.8–95% [14,15,16,17]. Unlike ACR-TIRADS, which scores different types of calcification, C-TIRADS scores only microcalcifications, although previous studies have found that interrupted eggshell calcifications, coarse calcifications, and peripheral calcifications are strongly associated with malignancy [18,19,20]. Likewise, C-TIRADS scores only marked hypoechogenicity. Studies have found that 78% of papillary thyroid carcinomas are hypoechoic, but because 30.6–55% of benign nodules are also hypoechoic, hypoechogenicity is a sensitive but nonspecific sign [21,22,23]. ACR-TIRADS assigns 2 points to hypoechogenicity, which may explain its higher sensitivity compared with C-TIRADS. According to C-TIRADS, echogenic foci within thyroid nodules are of three types: microcalcifications, comet-tail artifacts, and punctate echogenic foci of uncertain significance; one or more types may be present in the same nodule. A novel feature of C-TIRADS is that punctate echogenic foci of uncertain significance (<1 mm) are not scored. In addition, C-TIRADS regards the comet-tail artifact as a benign feature and assigns it a score of –1, whereas ACR-TIRADS does not. These differences may explain the higher specificity of C-TIRADS.
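The C-TIRADS counting method described above can be sketched as follows: +1 for each of the five suspicious features and -1 for a comet-tail artifact. The score-to-category mapping used here (-1 to category 2, 0 to 3, 1 to 4A, 2 to 4B, 3 or 4 to 4C, 5 to 5) is our reading of the 2020 guideline [11] and should be verified against it; the feature labels and function name are illustrative assumptions.

```python
# The five suspicious features counted by C-TIRADS (labels are illustrative).
SUSPICIOUS = (
    "solid",
    "marked_hypoechogenicity",
    "microcalcifications",
    "ill_defined_or_irregular_margin_or_extrathyroidal_extension",
    "vertical_orientation",  # aspect ratio > 1
)
BENIGN = "comet_tail_artifact"  # scored -1


def c_tirads_category(features):
    """Count +1 per suspicious feature and -1 for a comet-tail artifact."""
    score = sum(1 for f in SUSPICIOUS if f in features)
    if BENIGN in features:
        score -= 1
    if score < 0:
        return "2"
    if score == 0:
        return "3"
    if score == 1:
        return "4A"
    if score == 2:
        return "4B"
    if score <= 4:
        return "4C"
    return "5"


# Fig. 3c: solid, (plain) hypoechoic, smooth margins -> only "solid" counts.
print(c_tirads_category({"solid"}))  # 4A
```

Applied to the features described for Fig. 3a (solid, vertical orientation, extrathyroidal extension), the sketch returns 4C, and for Fig. 3d (solid, lobulated margin, assuming a lobulated margin counts as irregular) it returns 4B, matching the legend.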

Fig. 3

a A solid, hypoechoic, vertically oriented nodule invading the thyroid capsule; pathology showed a malignant nodule. ACR-TIRADS classified it as TR5 and C-TIRADS as TR4C, both indicating malignancy, illustrating the diagnostic consistency of the two guidelines. b A mixed cystic-solid, irregularly shaped nodule with microcalcifications; pathology showed a malignant nodule. ACR-TIRADS classified it as TR5, indicating malignancy. Because C-TIRADS classifies multiple predominantly solid and/or predominantly cystic nodules with similar ultrasound appearances in the same thyroid as TR3, this nodule was rated TR3, indicating a benign nodule. The ACR-TIRADS classification was correct, reflecting its relatively high sensitivity. c A solid, hypoechoic nodule with smooth margins; pathology showed a benign nodule. ACR-TIRADS classified it as TR4, indicating malignancy, whereas C-TIRADS classified it as TR4A, indicating a benign nodule. The C-TIRADS classification was correct, reflecting its higher specificity. d A solid, hypoechoic nodule with lobulated margins; pathology showed a malignant nodule. ACR-TIRADS classified it as TR4 and C-TIRADS as TR4B, both indicating malignancy. Both classifications were correct, illustrating the high agreement between the two guidelines.

C-TIRADS had higher PPV, +LR, and –LR in both size groups. This may reflect its more rigorous definition of malignant signs: its five suspicious features were derived from multivariate logistic regression analysis of large multicenter data, which makes them relatively realistic and reliable. Each ultrasound feature is interpreted only as present or absent, and the category is determined by counting five positive features and one negative feature. Such a simplified classification is likely to improve interobserver agreement. In contrast, ACR-TIRADS requires multiple interpretations of each ultrasound feature, the options may overlap between categories, and the malignancy risk of some features may vary with the presence of others. Moreover, the scores for each ultrasound feature were determined mainly by expert opinion rather than statistical analysis, which may explain why ACR-TIRADS predicted the malignancy risk of thyroid nodules less accurately than C-TIRADS. Our consensus-based reading protocol takes interobserver variability into account to some extent, and it has been suggested that artificial intelligence can effectively limit interobserver variability through standardized mathematical algorithms [24].

In this study, C-TIRADS showed higher sensitivity, specificity, PPV, accuracy, and AUC in nodules ≥1 cm than in nodules <1 cm, indicating better diagnostic performance in larger nodules. In nodules ≥1 cm in diameter, the diagnostic performance of ACR-TIRADS and C-TIRADS did not differ significantly; in nodules <1 cm, the specificity of C-TIRADS (79.04%) was significantly higher than that of ACR-TIRADS (67.66%), while sensitivity did not differ significantly. Zhou et al. [25] reported similar findings. In contrast, Zhu et al. [26] found that C-TIRADS had high specificity regardless of whether the nodule diameter was smaller or larger than 1 cm, but significantly lower sensitivity than ACR-TIRADS. Few studies have examined the diagnostic ability of each guideline for subcentimeter nodules, and their results are controversial; in our study, however, the diagnostic performance of both guidelines for nodules <1 cm in diameter was not inferior to that for nodules ≥1 cm.

Recommendations for FNA or ultrasound follow-up are based on the nodule's category and maximum diameter. ACR-TIRADS recommends biopsy for TR3 to TR5 nodules that reach category-specific size thresholds, with a threshold of 2.5 cm for mildly suspicious (TR3) nodules. C-TIRADS recommends biopsy for nodules of category 4A and above, with a size threshold of 1.5 cm for 4A nodules; if the nodule is multifocal, immediately adjacent to the thyroid capsule or trachea, or involves the recurrent laryngeal nerve, the threshold is reduced to 1 cm. Although both guidelines reduce the rate of unnecessary biopsies, they are expected to miss a higher proportion of malignant nodules. This is unavoidable because some malignant tumors have benign ultrasound features. In the present study, the unnecessary biopsy rate was lower for ACR-TIRADS (20.2%) than for C-TIRADS (24.9%; P < 0.05), possibly because ACR-TIRADS applies higher biopsy size thresholds than C-TIRADS to nodules of equivalent malignancy risk. The missed biopsy rate was slightly higher for ACR-TIRADS (59.7%) than for C-TIRADS (58.1%), but the difference was not statistically significant (P > 0.05).
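The size-threshold logic can be sketched as below. The ACR-TIRADS thresholds (2.5 cm for TR3, 1.5 cm for TR4, 1 cm for TR5, each "at or above", no FNA for TR1/TR2) follow the 2017 ACR recommendations as we understand them. For C-TIRADS only the 4A rule summarized in the text is modelled; whether its thresholds are inclusive is an assumption, and the rules for 4B to 5 are deliberately left out. Both function names are hypothetical.

```python
# ACR TI-RADS FNA size thresholds (maximum diameter, cm).
ACR_FNA_THRESHOLD_CM = {"TR3": 2.5, "TR4": 1.5, "TR5": 1.0}


def acr_recommends_fna(category, max_diameter_cm):
    """TR1/TR2 never meet FNA criteria; TR3-TR5 do at or above their threshold."""
    threshold = ACR_FNA_THRESHOLD_CM.get(category)
    return threshold is not None and max_diameter_cm >= threshold


def c_tirads_4a_recommends_fna(max_diameter_cm, high_risk_context=False):
    """C-TIRADS 4A rule as summarized in the text (hypothetical helper).

    high_risk_context: multifocal, abutting the thyroid capsule or trachea,
    or involving the recurrent laryngeal nerve. Inclusive ">=" is assumed.
    """
    threshold = 1.0 if high_risk_context else 1.5
    return max_diameter_cm >= threshold


print(acr_recommends_fna("TR3", 2.0), acr_recommends_fna("TR4", 2.0))          # False True
print(c_tirads_4a_recommends_fna(1.2), c_tirads_4a_recommends_fna(1.2, True))  # False True
```

Combined with a per-nodule category and maximum diameter, such rules yield the "recommended" flags from which unnecessary and missed biopsy rates are computed.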

This study has some limitations. First, most nodules were diagnosed by histopathology and the remainder by definitive cytopathology. Because medical resources in China still lag behind those of developed countries, FNA is not yet widely performed in Chinese hospitals at all levels and its value is not widely recognized, which may have introduced selection bias. Second, this was a single-center retrospective study. Although this ensured consistent diagnostic assessment of all thyroid nodules, the patient population was less heterogeneous than that of a multicenter study. Third, our hospital is a tertiary referral center that treats patients with more severe disease; the resulting sample bias may have increased the proportion of malignant nodules and reduced the number of low-grade nodules, affecting the diagnostic performance of the guidelines. This may explain why the observed malignancy rates for some categories were much higher than the risks estimated by the guidelines. Finally, ACR-TIRADS does not recommend FNAB for nodules smaller than 1 cm, whereas C-TIRADS recommends FNAB for nodules smaller than 0.5 cm under certain conditions. In this study, 50.8% of nodules were smaller than 1 cm, which somewhat limits the comparison of C-TIRADS with other guidelines.

Conclusion

Both guidelines showed excellent diagnostic performance and high overall diagnostic agreement for thyroid nodules, and their performance in nodules <1 cm was not inferior to that in nodules ≥1 cm. Although C-TIRADS has a slightly higher unnecessary biopsy rate than ACR-TIRADS, it is simple to apply and substantially improves the efficiency of thyroid nodule risk stratification, making it better suited to clinical practice in China.