Introduction

Although thyroid nodules are rare in the pediatric population, the incidence of pediatric thyroid cancer has been increasing in recent years [1]. The prevalence of thyroid nodules in the pediatric population, ranging between 1% and 1.65%, is lower than that in adults [2, 3]. However, in pediatric patients, thyroid nodules are more likely to be malignant and to show aggressive features, including extrathyroidal extension, lymph-node metastasis, and distant metastasis [3-6]. Therefore, accurately differentiating malignant from benign thyroid nodules is crucial because their management strategies differ.

Ultrasound (US) is widely used to differentiate malignant thyroid nodules from benign ones in adult and pediatric populations. Based on US imaging features, the American College of Radiology Thyroid Imaging Reporting and Data System (ACR TI-RADS) was developed to provide a standard system for risk stratification and management of thyroid nodules in adult patients [7, 8]. The TI-RADS assigns points to five US imaging features, namely composition, echogenicity, shape, margin, and echogenic foci, and the sum of these points determines the risk level, which ranges from TR1 (benign) to TR5 (high suspicion of malignancy). The TI-RADS displays favorable sensitivity and moderate specificity in predicting thyroid cancer in adult patients [9-11]. However, the diagnostic performance of TI-RADS in pediatric patients with thyroid nodules has not been well studied. Recently, some studies [12-16] have applied TI-RADS to pediatric patients, but the reported diagnostic performance varies greatly. Kim et al [12] and Lee et al [13] applied new size cutoffs for biopsy based on US risk stratification systems to identify malignant thyroid nodules; these cutoffs showed acceptable diagnostic accuracy but lacked external validation.

In this study, we aimed to develop and validate modified size cutoffs based on TI-RADS to improve the diagnostic performance in predicting malignant thyroid nodules in patients younger than 19 years.

Materials and methods

The study was approved by the ethics committee of the two centers (the Second Xiang ya Hospital, Central South University; the Third Xiang ya Hospital, Central South University). Informed consent was waived due to the retrospective nature of this study.

Patients

Patients younger than 19 years who underwent thyroid US at two tertiary referral hospitals between May 2005 and August 2022 were included in our study. Patients from the Second Xiang ya Hospital, Central South University, were classified as the training cohort, and patients from the Third Xiang ya Hospital, Central South University, were classified as the validation cohort. The inclusion criteria were as follows: (1) age < 19 years; (2) presence of a nodule on thyroid US; and (3) an acceptable diagnostic reference standard. The exclusion criteria were as follows: (1) atypia of undetermined significance or follicular lesion of undetermined significance; (2) an unclear final diagnosis; (3) repeated US monitoring of thyroid nodules; and (4) lost images or poor image quality. The patient selection flowchart of the training cohort is presented in Fig. 1. Patient demographics were also recorded.

Fig. 1 Flowchart of patient selection for the training cohort of the study

Biopsies were performed based on the American Thyroid Association Management Guidelines for Children with Thyroid Nodules in both centers, namely for nodules 1 cm or larger or for smaller nodules with suspicious features at US [3]. However, for some patients, the indications for FNA or surgery were determined by clinicians based on patient age, symptoms, history of irradiation, cancer predisposition syndromes, and patient’s or parent’s preference [12].

Reference standard

The final diagnosis of all thyroid nodules was determined by the cytopathologic results of fine-needle aspiration (FNA), reported according to the Bethesda system [17], or by surgical pathology. The Bethesda system includes six diagnostic categories. Bethesda I: nondiagnostic or unsatisfactory; Bethesda II: benign; Bethesda III: atypia of undetermined significance or follicular lesion of undetermined significance; Bethesda IV: follicular neoplasm or suspicious for a follicular neoplasm; Bethesda V: suspicious for malignancy; Bethesda VI: malignant.

US image acquisition and analysis

The Acuson Sequoia (Siemens Medical Solutions) equipped with a 4–10 MHz linear transducer, the LOGIQ 9 (GE Healthcare) equipped with a 10–14 MHz linear transducer, and the Resona 7 (Mindray Medical International Ltd.) equipped with a 3–11 MHz linear transducer were used to perform thyroid US examinations. Imaging parameters were adjusted by the radiologist performing the US examination. For each target nodule, at least one image in the largest transverse plane, one image in the largest long-axis plane, and one Doppler US image in the largest long-axis plane were routinely obtained. Additional images showing important features of the nodules (location, composition, echogenicity, shape, margin, echogenic foci, etc.) were also acquired by the radiologist. All US images were stored in the picture archiving and communication system.

Two radiologists (H.Y.X. and W.G.T., with 5 years and more than 8 years of experience in pediatric thyroid US, respectively) reviewed all the US images in random order and in consensus. Neither reader had been involved in the original examinations, and both were blinded to the final diagnosis and other imaging findings of the patients. For each nodule, the readers documented the following US image characteristics according to the TI-RADS lexicon and User’s Guide [7, 8]: (1) composition: cystic or almost completely cystic, or spongiform (0 points); mixed cystic and solid (1 point); solid or almost completely solid (2 points); (2) echogenicity: anechoic (0 points); hyperechoic or isoechoic (1 point); hypoechoic (2 points); very hypoechoic (3 points); (3) shape: wider-than-tall (0 points); taller-than-wide (3 points); (4) margin: smooth or ill-defined (0 points); lobulated or irregular (2 points); extra-thyroidal extension (3 points); (5) echogenic foci: none or large comet-tail artifacts (0 points); macrocalcifications (1 point); peripheral (rim) calcifications (2 points); punctate echogenic foci (3 points). If the two readers disagreed, a third experienced radiologist (L.M.H., more than 35 years of experience in pediatric thyroid US) reviewed the US images to make the final decision.

The points of the above 5 categories were added to determine the TI-RADS level. In the TI-RADS, recommendations for FNA, follow-up, or neither are based on nodules’ TI-RADS level and their maximum diameter.
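The scoring described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' software: the dictionary keys are hypothetical labels, a single echogenic-foci finding per nodule is assumed, and the point bands that map a total to TR1 through TR5 (0, 2, 3, 4 to 6, and 7 or more) are taken from the published ACR TI-RADS chart, which this section does not restate. Treating a 1-point total as TR1 is a further assumption.

```python
# Point values follow the lexicon listed in the text; the keys are hypothetical labels.
COMPOSITION = {"cystic": 0, "spongiform": 0, "mixed": 1, "solid": 2}
ECHOGENICITY = {"anechoic": 0, "hyper_or_isoechoic": 1,
                "hypoechoic": 2, "very_hypoechoic": 3}
SHAPE = {"wider_than_tall": 0, "taller_than_wide": 3}
MARGIN = {"smooth_or_ill_defined": 0, "lobulated_or_irregular": 2,
          "extrathyroidal_extension": 3}
ECHOGENIC_FOCI = {"none_or_comet_tail": 0, "macrocalcifications": 1,
                  "rim_calcifications": 2, "punctate": 3}


def tirads_points(composition, echogenicity, shape, margin, foci):
    """Sum the points of the five feature categories (one finding per category)."""
    return (COMPOSITION[composition] + ECHOGENICITY[echogenicity]
            + SHAPE[shape] + MARGIN[margin] + ECHOGENIC_FOCI[foci])


def tirads_level(points):
    """Map a point total to a TI-RADS level (1..5) using the ACR chart bands.

    Assumption: a total of 1 point is grouped with TR1.
    """
    if points <= 1:
        return 1
    if points == 2:
        return 2
    if points == 3:
        return 3
    if points <= 6:
        return 4
    return 5


# Example: a solid, hypoechoic, wider-than-tall nodule with smooth margins
# and no echogenic foci scores 2 + 2 + 0 + 0 + 0 = 4 points.
example_points = tirads_points("solid", "hypoechoic", "wider_than_tall",
                               "smooth_or_ill_defined", "none_or_comet_tail")
example_level = tirads_level(example_points)
```

Keeping the point tables as plain dictionaries makes it easy to check each value against the lexicon by eye.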

Simulation of size cutoff for FNA indication

We evaluated the accuracy of multiple size cutoffs for FNA of thyroid nodules categorized as TR3 and TR4. For TR3 nodules, seven size criteria from 1.0 to 4.0 cm, in steps of 0.5 cm, were evaluated. Similarly, three size criteria from 0.5 to 1.5 cm, in steps of 0.5 cm, were evaluated for TR4 nodules. For nodules scored as TR3 and TR4, we adopted the size with the highest accuracy for detecting thyroid cancer as the cutoff value for FNA indication. For TR5 nodules, FNA was also indicated for nodules smaller than 1 cm [12]; that is, biopsy was indicated for all TR5 nodules. Simulation 1 was identical to TI-RADS except that the optimal size cutoff of 35 mm was used for TR3 nodules. In our study, the optimal criterion for FNA of TR4 nodules was 15 mm, which was identical to the size cutoff of the TI-RADS guideline. Similarly, simulation 2 was identical to TI-RADS except that the newly suggested criterion for TR5 nodules, which also indicates FNA for nodules smaller than 10 mm, was used. Simulation 3 combined the modifications of simulation 1 and simulation 2 (Table 1).
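The FNA rules of the three simulations can be written down compactly. The sketch below is a hedged illustration rather than the study's code: the function and variable names are hypothetical, only the FNA indication is modeled (follow-up thresholds are not), and the 25 mm (TR3) and 10 mm (TR5) ACR thresholds come from the published ACR TI-RADS chart rather than from this section. The 15 mm TR4 threshold, the 35 mm TR3 cutoff, and the removal of the TR5 threshold are as stated in the text.

```python
# FNA size cutoffs in mm by TI-RADS level; None means FNA is never indicated.
# Assumption: TR3 = 25 mm and TR5 = 10 mm are the published ACR values.
ACR_FNA_CUTOFF_MM = {1: None, 2: None, 3: 25, 4: 15, 5: 10}


def fna_cutoffs(simulation=0):
    """Return the cutoff table for TI-RADS (0) or simulation 1, 2, or 3."""
    cutoffs = dict(ACR_FNA_CUTOFF_MM)
    if simulation in (1, 3):
        cutoffs[3] = 35  # modified TR3 cutoff
    if simulation in (2, 3):
        cutoffs[5] = 0   # no size threshold: biopsy every TR5 nodule
    return cutoffs


def fna_indicated(level, size_mm, simulation=0):
    """True if FNA is indicated for a nodule of this level and maximum diameter."""
    cutoff = fna_cutoffs(simulation)[level]
    return cutoff is not None and size_mm >= cutoff
```

For example, a 30 mm TR3 nodule is biopsied under the original rule but not under simulation 1 or 3, whereas a 6 mm TR5 nodule is biopsied only under simulation 2 or 3.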

Table 1 Size cutoff values for ACR TI-RADS and three simulations of fine-needle aspiration

In our study, the unnecessary biopsy rate was defined as the proportion of benign nodules among the nodules that were recommended for FNA. The missed malignancy rate was defined as the proportion of malignant nodules that were not recommended for FNA.
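As a worked illustration of these two rates, the sketch below uses the training-cohort fractions reported in the Results. The function names are hypothetical. The denominator of the missed malignancy rate is taken to be all malignant nodules, which is an inference from the reported fractions (for example 4/70, where 70 is the number of malignant nodules in the training cohort).

```python
def unnecessary_biopsy_rate(benign_with_fna, total_with_fna):
    """Benign nodules recommended for FNA / all nodules recommended for FNA."""
    return benign_with_fna / total_with_fna


def missed_malignancy_rate(malignant_without_fna, total_malignant):
    """Malignant nodules not recommended for FNA / all malignant nodules."""
    return malignant_without_fna / total_malignant


def as_percent(rate):
    """Format a proportion as a percentage with one decimal place."""
    return round(100 * rate, 1)


# Training cohort, counts as reported in the Results section.
sim3_unnecessary = as_percent(unnecessary_biopsy_rate(54, 120))   # simulation 3
tirads_unnecessary = as_percent(unnecessary_biopsy_rate(75, 132)) # TI-RADS
sim3_missed = as_percent(missed_malignancy_rate(4, 70))           # simulation 3
tirads_missed = as_percent(missed_malignancy_rate(13, 70))        # TI-RADS
```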

Statistical analysis

Descriptive statistics were presented as numbers and percentages. Continuous variables were expressed as medians and ranges. The Mann–Whitney U test was used to compare continuous variables and the chi-square test was used to compare categorical variables. The diagnostic performance in the detection of thyroid cancer was assessed in terms of accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The area under the receiver operating characteristic curve (AUC) was used to assess discrimination between malignant and benign nodules. The DeLong test was used to compare different AUCs. SPSS software (version 22.0, IBM Corp) and MedCalc software (version 15.2.2, MedCalc Software) were used for data analysis. A two-sided p < 0.05 was considered statistically significant.
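The performance measures listed above all follow from a 2 × 2 table. The sketch below is illustrative and assumes that a positive test means "FNA recommended", which the text does not state explicitly. The example counts for simulation 3 in the training cohort are derived by arithmetic from fractions reported in the Results (120 of 236 nodules recommended for FNA, 54 of them benign, 4 of 70 malignant nodules missed); the class and property names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionTable:
    tp: int  # malignant, FNA recommended
    fp: int  # benign, FNA recommended
    fn: int  # malignant, FNA not recommended
    tn: int  # benign, FNA not recommended

    @property
    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self):
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self):
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self):
        return self.tn / (self.tn + self.fn)

    @property
    def accuracy(self):
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


# Derived counts: TP = 70 - 4, FP = 54, FN = 4, TN = 236 - 120 - 4.
sim3_training = ConfusionTable(tp=66, fp=54, fn=4, tn=112)
sens_pct = round(100 * sim3_training.sensitivity, 1)
spec_pct = round(100 * sim3_training.specificity, 1)
```

With these derived counts, the sensitivity and specificity round to the 94.3% and 67.5% quoted for simulation 3 in the Discussion, which supports the assumed definition of a positive test.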

Results

Patient characteristics and thyroid nodules

The baseline information of the enrolled patients and nodules in the training and validation cohorts is summarized in Table 2. There were no significant differences between the two cohorts in age, sex, method of diagnosis, nodule location, or nodule diameter (all p > 0.05).

Table 2 Patient and nodule characteristics

Malignancy rates according to the ACR TI-RADS category

The distribution of benign and malignant thyroid nodules according to the TI-RADS category in the training and validation cohorts is summarized in Table 3. The malignancy rate of thyroid nodules was 29.7% (70/236) in the training cohort and 28.9% (65/225) in the validation cohort. The largest number of nodules fell into category TR5 (70 of 236 nodules) in the training cohort and TR3 (80 of 225 nodules) in the validation cohort.

Table 3 Distribution of benign and malignant thyroid nodules according to ACR TI-RADS risk levels in the training and validation cohort

Management of thyroid nodules based on ACR TI-RADS category and three simulations

According to the TI-RADS guideline, 55.9% (132/236) and 52.4% (118/225) of nodules would have undergone FNA, 12.3% (29/236) and 15.1% (34/225) would have been assigned follow-up, and 31.8% (75/236) and 32.5% (73/225) would have been recommended neither follow-up nor FNA in the training cohort and validation cohort, respectively (Table 4).

Table 4 Distribution of benign and malignant thyroid nodules according to ACR TI-RADS and three simulations in the training and validation cohort

The accuracy in the prediction of malignant thyroid nodules based on each size cutoff is shown in Fig. 2. In the training cohort, for TR3, the size cutoff value of 35 mm showed the highest accuracy (70.6%); for TR4, the size cutoff value of 15 mm showed the highest accuracy (60.5%), which was identical to the size cutoff of the TI-RADS guideline. We adopted the size cutoffs with the highest accuracy for TR3 and TR4 as the new size cutoffs for FNA indication.
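The cutoff search amounts to scoring each candidate threshold by accuracy and keeping the best one. The sketch below is a minimal illustration under stated assumptions: a nodule counts as test-positive when its maximum diameter is at or above the cutoff, the function names are hypothetical, and the eight toy nodules are invented purely to make the example runnable. They are not study data and say nothing about the study's results.

```python
def accuracy_at_cutoff(nodules, cutoff_mm):
    """Fraction of nodules classified correctly when size >= cutoff means positive.

    nodules: list of (size_mm, is_malignant) pairs.
    """
    correct = sum((size >= cutoff_mm) == malignant for size, malignant in nodules)
    return correct / len(nodules)


def best_cutoff(nodules, candidates_mm):
    """Candidate with the highest accuracy (the first one wins on ties)."""
    return max(candidates_mm, key=lambda c: accuracy_at_cutoff(nodules, c))


# Candidate grids described in the text: 10 to 40 mm (TR3) and 5 to 15 mm (TR4),
# both in 5 mm steps.
TR3_CANDIDATES_MM = list(range(10, 45, 5))
TR4_CANDIDATES_MM = list(range(5, 20, 5))

# HYPOTHETICAL toy data, not from the study: (size in mm, malignant?).
toy_tr3 = [(12, False), (18, False), (22, False), (28, False),
           (33, False), (36, True), (41, True), (16, True)]

toy_best = best_cutoff(toy_tr3, TR3_CANDIDATES_MM)
toy_best_accuracy = accuracy_at_cutoff(toy_tr3, toy_best)
```

Selecting the cutoff on the same data used to report its accuracy tends to be optimistic, which is why checking the chosen cutoff in a separate validation cohort matters.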

Fig. 2 Comparison of accuracy for differentiation of malignant thyroid nodules according to multiple size cutoff values. This graph shows a comparison of accuracy according to the variable nodule size threshold for TR3 (a) and TR4 (b). For TR3 nodules, the size cutoff value of 35 mm showed the highest accuracy. For TR4 nodules, the size cutoff value of 15 mm showed the highest accuracy

The distribution of thyroid nodules based on the simulations in the training and validation cohorts is summarized in Table 4. In simulation 3, 50.9% (120/236) and 45.3% (102/225) of nodules would have undergone FNA, 17.8% (42/236) and 23.1% (52/225) would have been assigned follow-up, and 31.4% (74/236) and 31.6% (71/225) would have been recommended neither follow-up nor FNA in the training cohort and validation cohort, respectively.

Diagnostic performance in the prediction of thyroid malignancy

The diagnostic performance of the original TI-RADS, simulation 1, simulation 2, and simulation 3 is presented in Table 5.

Table 5 Diagnostic performance for the diagnosis of malignant thyroid nodules according to ACR TI-RADS and three simulations

The AUC of simulation 3 was higher than that of the TI-RADS in both the training cohort and the validation cohort (both p < 0.001). In addition, simulation 3 had lower unnecessary biopsy rates (simulation 3 vs. TI-RADS: 45.0% (54/120) vs. 56.8% (75/132) in the training cohort; 42.2% (43/102) vs. 56.8% (67/118) in the validation cohort) and lower missed malignancy rates (simulation 3 vs. TI-RADS: 5.7% (4/70) vs. 18.6% (13/70) in the training cohort; 9.2% (6/65) vs. 21.5% (14/65) in the validation cohort).

Discussion

In this study, we developed and validated new criteria (≥ 35 mm for TR3 and no threshold for TR5) to indicate FNA based on TI-RADS to improve diagnostic performance in the prediction of thyroid malignancy in patients younger than 19 years. The AUC of simulation 3 was higher than that of the original TI-RADS for the differentiation of malignant thyroid nodules in the training cohort (0.809 vs. 0.681, p < 0.001) and in the validation cohort (0.819 vs. 0.683, p < 0.001). In addition, simulation 3 had lower unnecessary biopsy rates (simulation 3 vs. TI-RADS: 45.0% vs. 56.8% in the training cohort; 42.2% vs. 56.8% in the validation cohort) and lower missed malignancy rates (simulation 3 vs. TI-RADS: 5.7% vs. 18.6% in the training cohort; 9.2% vs. 21.5% in the validation cohort). Thus, our results suggest that the new criteria could improve the diagnostic performance in the prediction of malignant thyroid nodules and reduce unnecessary biopsy rates and missed malignancy rates in patients younger than 19 years.

In our study, the malignancy rate of thyroid nodules was 29.7% (70/236) in the training cohort and 28.9% (65/225) in the validation cohort, which was lower than in previous reports (46.6–55%) [12, 13]. In the training cohort, the TI-RADS guideline showed higher sensitivity (81.4% vs. 57.0%), but lower specificity (54.8% vs. 97%) and AUC (0.681 vs. 0.80) compared with the previous meta-analysis [16]. When the new criteria were applied to patients younger than 19 years in simulation 3, they showed higher sensitivity (94.3% vs. 81.4%), specificity (67.5% vs. 54.8%), and AUC (0.809 vs. 0.681) than the TI-RADS guideline. Therefore, indicating FNA with the new criteria (≥ 35 mm for TR3 and no threshold for TR5) would be a reasonable option for the management of thyroid nodules in patients younger than 19 years.
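The text does not say how the AUC of each FNA rule was computed. One reading, offered here as an assumption, is that the FNA recommendation was treated as a single yes/no test. In that case the ROC curve has one operating point, and the area under it reduces to the mean of sensitivity and specificity. The sketch below (hypothetical function name) checks that reading against the training-cohort percentages quoted above.

```python
def single_threshold_auc(sensitivity, specificity):
    """AUC of a binary (yes/no) test: area under the ROC curve through
    (0, 0), (1 - specificity, sensitivity), and (1, 1)."""
    return (sensitivity + specificity) / 2


# Training cohort, sensitivity and specificity as quoted in the text.
auc_simulation3 = round(single_threshold_auc(0.943, 0.675), 3)
auc_tirads = round(single_threshold_auc(0.814, 0.548), 3)
```

Both values agree with the reported AUCs (0.809 and 0.681), which is consistent with, but does not prove, the single-threshold interpretation.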

Kim et al [12] suggested that modifying the nodule size cutoff for FNA could help improve diagnostic performance in differentiating benign and malignant thyroid nodules in pediatric patients. In our study, for TR3 thyroid nodules, the cutoff value of 35 mm showed the highest accuracy. This optimal cutoff value is higher than that of the TI-RADS guideline, probably because the mean thyroid nodule size is larger in children than in adults [18]. The mean thyroid nodule size in the present study was larger than in Liang et al’s study of adults (27 mm vs. 15 mm) [19]. In addition, adjusting the size cutoff for biopsy improved diagnostic performance without increasing unnecessary biopsy rates. In the present study, the optimal size for FNA of TR4 nodules was 15 mm, which was identical to the size cutoff of the TI-RADS guideline.

In the present study, the malignancy rate of TR5 thyroid nodules was 71.4% (50/70). A previous study on pediatric TR5 thyroid nodules showed a similar malignancy rate (74.2%) [15]. Kim et al suggested that biopsy of TR5 thyroid nodules smaller than 10 mm might be an effective strategy in pediatric populations [12]. Since the thyroid volume of children is smaller than that of adults, TR5 nodules smaller than 10 mm should not be ignored. A previous meta-analysis showed that the TI-RADS guideline had a high missed malignancy rate (21.7%) and a high unnecessary biopsy rate (62.7%) for pediatric thyroid nodules [16], which were comparable to our results (18.6% and 56.8%). However, the unnecessary biopsy rates in pediatric patients seemed to be higher than those reported in the adult population, which range from 25.3 to 40.5% [20, 21]. One possible explanation is that ectopic thymus tissue within the thyroid gland in children can mimic a thyroid nodule, leading to unnecessary biopsy and surgery [22]. Another possible explanation is greater concern about cancer in children, given the higher overall malignancy rate of nodules in children (22–26%) than in adults (5–10%) [3, 4, 23]. Hence, adult guidelines may be unsuitable for managing thyroid nodules in patients younger than 19 years.

Our study had several limitations. Firstly, the sample size of this study was relatively small. Secondly, the time span of the US examinations was long, and the image quality in the early period was relatively poor. Thirdly, only thyroid nodules confirmed by FNA or surgery were included, which may have contributed to the high overall malignancy rate in the study. Fourthly, there could be selection bias due to the retrospective nature of this study. In addition, false-negative FNA results might have introduced bias. Fifthly, the size criterion of 5 mm was not evaluated for TR3 nodules. Lastly, patients who underwent FNA but whose pathology revealed atypia of undetermined significance or follicular lesion of undetermined significance were excluded.

In conclusion, we developed and validated new criteria (≥ 35 mm for TR3 and no threshold for TR5) to indicate FNA based on the TI-RADS to improve the diagnostic performance and reduce unnecessary biopsy rates and missed malignancy rates for thyroid nodules in patients younger than 19 years.