Introduction

Fine needle aspiration (FNA) has reported false-negative rates between 0.7% and 11.0% [1], and this variability has been taken to justify repeated biopsy of thyroid nodules found to be benign at the first biopsy. However, some authors have reported low false-negative rates [2, 3] and suggested that there is limited benefit from repeated biopsy. Indeed, in a study of 2000 benign thyroid nodules that were followed for 8 years, there were no patient deaths due to thyroid cancer, thyroidectomy was performed in 24.0%, and the false-negative rate was estimated to be 1.3% [4]. In the absence of clinical or sonographic features of concern, repeated biopsy rarely yields a different result; thus, in many cases, the first biopsy without further intervention is sufficient.

The revised American Thyroid Association (ATA) guidelines emphasize the concept of “less is more” [5], such that less extensive surgeries and less surveillance testing will become more common because it is reasonable to do less to provide responsibly individualized therapy for patients [6]. Indeed, several researchers [7, 8] have reported that suspicious sonographic features are better predictors of malignancy in thyroid nodules with previous benign cytology [9, 10]. This integrated management, encompassing pathology results and sonographic features, is now more frequently advocated because routine repeat biopsy of initially biopsy-proven benign nodules may not be necessary in all cases.

Recently, various Thyroid Imaging Reporting and Data System (TIRADS) guidelines have been reported from several institutions and societies [11,12,13,14,15]. To our knowledge, the integration of TIRADS guidelines in initially biopsy-proven benign nodules has not been previously described. Therefore, our aim was to evaluate whether one biopsy is sufficient for benign thyroid nodule diagnosis via comprehensive analysis of initially biopsy-proven benign thyroid nodules by various TIRADS guidelines.

Materials and Methods

Patient selection

This retrospective study was approved by the institutional review board of the Asan Medical Center, which waived the requirement for informed consent for the use of the data. The study population was obtained from a historical cohort of 6762 thyroid nodules from 6493 consecutive patients, all of whom underwent core needle biopsy (CNB) or FNA between January 2013 and December 2013. These patients were enrolled in a previous study regarding the efficacy and safety of CNB in initially detected thyroid nodules [16]. After exclusion of nodules that had previously undergone CNB or FNA (n = 1940), the final diagnosis was obtained for 2114 nodules from 1928 patients in the CNB group and 2708 nodules from 2625 patients in the FNA group. As a reference standard, the final diagnoses of benign nodules were determined by (i) pathology results of surgical resections, (ii) benign cytology results of FNA or CNB procedures that were repeated at least twice, or (iii) an initial benign result from FNA or CNB and decreased or stable nodule size at an ultrasound (US) follow-up of ≥1 year. Malignancy was diagnosed after surgery or after repeated or initial biopsy. For the purpose of this study, we analyzed 2747 (57.0%, 2747 of 4822) thyroid nodules from 2643 patients that were diagnosed by initial biopsy with 28.2 ± 9.1 months of follow-up (range, 12–41 months).
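The cohort arithmetic in this paragraph can be checked with a short sketch. The counts are those reported in the text; the variable names are ours and are introduced only for illustration.

```python
# Counts as reported in the text; variable names are illustrative only.
total_nodules = 6762          # nodules biopsied during 2013
previously_biopsied = 1940    # excluded: prior CNB or FNA
cnb_nodules = 2114            # nodules with a final diagnosis, CNB group
fna_nodules = 2708            # nodules with a final diagnosis, FNA group
analyzed_nodules = 2747       # diagnosed by initial biopsy and analyzed here

# Nodules remaining after the exclusion step.
eligible = total_nodules - previously_biopsied

# The two biopsy groups should account for every remaining nodule.
groups_match = (cnb_nodules + fna_nodules == eligible)

# Share of the remaining nodules that entered the present analysis.
analyzed_pct = round(100 * analyzed_nodules / eligible, 1)

print(eligible, groups_match, analyzed_pct)
```

The sketch only confirms that the reported counts are internally consistent; it does not reproduce the selection criteria themselves.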

US-guided FNA and CNB procedures

US images were obtained for the evaluation of thyroid nodules before the US-guided procedure by using either an HDI 5000 (ATL Ultrasound, Philips) or Sequoia (Acuson, Siemens Healthineers) instrument equipped with a 5–12 MHz or an 8–15 MHz linear-array transducer. All US-guided procedures were performed by radiologists under the supervision of two faculty radiologists (J.H.B. and J.H.L., with 19 and 14 years of clinical experience, respectively, in performing and evaluating thyroid US). US-guided CNB and FNA procedures for thyroid nodules were performed according to current practice guidelines [17,18,19,20]. US-guided CNBs were performed under local anesthesia with 1% lidocaine by using a disposable 1.1- or 1.6-cm excursion, 18-gauge, double-action, spring-activated needle (TSK Ace-cut; Create Medic) [21, 22]. Before insertion of the core needle, the vessels along the approach route were evaluated by power Doppler US to prevent hemorrhage. Using a freehand technique, we advanced the core needle from the isthmus of the thyroid toward the nodule [22]. After the needle tip had been advanced toward the edge of the nodule, the distance of firing (1.1 or 1.6 cm) was measured before sequential firing of the stylet and cutting cannula of the needle [22]. A total of 1056 nodules in 983 patients were examined with CNB: 101 nodules had suspicious features on US, 270 were heavily calcified, 51 were predominantly cystic, and the remaining 634 underwent CNB at the request of the referring physicians. Either US-guided FNA or CNB was used at the discretion of the US operator or referring physician without definite indications for selection.

US-guided FNAs were routinely performed by using a 23-gauge needle. In all cases, direct smears were made and immediately fixed with alcohol after the FNA procedure, then stained by using the Papanicolaou method [21]. The number of needle passes was determined by the operator during the procedure, and a maximum of four needle passes were permitted for each nodule. Additional FNAs were recommended in the case of incomplete visual assessment results. In the case of US-guided CNB, the adequacy of the procedure was monitored via real-time US, and the adequacy of the tissue core was assessed by visual inspection [22]. An additional CNB was performed when the targeting of the lesion was considered to be inaccurate. The maximum number of CNBs performed during a single session was two. Each patient was observed after firm local compression of the biopsy site for 10–20 min after the biopsy. If patients complained of pain or neck swelling, a repeat US examination was performed to evaluate possible complications.

All CNB specimens and FNA cytology specimens were reviewed by a thyroid cytopathologist (D.E.S., with 11 years of clinical experience in thyroid cytopathology). FNA cytology diagnoses were categorized into six categories according to the Bethesda System for Reporting Thyroid Cytopathology [21, 23, 24]. Although standardization of CNB diagnostic criteria for thyroid nodules had not yet been established during our study period, the histological results of CNB were categorized into the six categories of the Bethesda System [20, 21, 23, 25, 26].

Analysis of US findings and malignancy rate

When analyzing the US images, two radiologists (S.M.H. and J.H.B., with 9 and 20 years of clinical experience, respectively, in performing and evaluating thyroid US) assessed the thyroid nodules by using criteria that were obtained from published reports [11,12,13, 27], including size, internal content (solid, predominantly solid, predominantly cystic, cystic), shape (ovoid to round, taller than wide, irregular), margin (well-defined smooth, microlobulated or spiculated, ill-defined), echogenicity of the solid portion (hyper- or iso-echogenicity, hypoechogenicity, or marked hypoechogenicity), and the presence of microcalcification, macrocalcification, and/or rim calcification. The relationship between the final diagnosis and US features was assessed [13, 28]. The suspicious US features, as suggested by the Korean Society of Thyroid Radiology (KSThR) and Moon et al, are as follows: taller than wide shape, spiculated/microlobulated margin, marked hypoechogenicity, and presence of microcalcifications [13, 27]. A thyroid nodule with at least one of these suspicious US findings was classified as a suspicious nodule during US examination. We calculated the probability of malignancy by using a web-based TIRADS (www.gap.kr/thyroidnodule.php) [14], the Korean-TIRADS (K-TIRADS) guideline [12], the revised ATA guidelines [11], and the French TIRADS guideline (proposed by Russ et al) [15]. For our analysis, suspicious nodules were defined as those with a score of ≥8 on the web-based TIRADS, high suspicion according to the K-TIRADS guideline, high suspicion according to the ATA guidelines, and a score of ≥4B according to the French TIRADS guideline.
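The definitions of a suspicious nodule given in this paragraph can be written as simple rules. The following is a minimal sketch, not the software used in the study: the class, field, and function names are hypothetical, and the French TIRADS rule assumes that the scale is ordered 1, 2, 3, 4A, 4B, 5, so that "≥4B" means category 4B or 5. The sketch does not reproduce how each system assigns a score or category to a nodule.

```python
from dataclasses import dataclass


@dataclass
class NoduleFeatures:
    """The four suspicious US features listed in the text (field names are ours)."""
    taller_than_wide: bool = False
    spiculated_or_microlobulated_margin: bool = False
    marked_hypoechogenicity: bool = False
    microcalcifications: bool = False


def is_suspicious_on_us(nodule: NoduleFeatures) -> bool:
    """Suspicious if at least one of the four features is present."""
    return any((
        nodule.taller_than_wide,
        nodule.spiculated_or_microlobulated_margin,
        nodule.marked_hypoechogenicity,
        nodule.microcalcifications,
    ))


def suspicious_by_web_tirads(score: int) -> bool:
    """Web-based TIRADS: a score of 8 or more."""
    return score >= 8


def suspicious_by_pattern(category: str) -> bool:
    """K-TIRADS or ATA: the 'high suspicion' category."""
    return category.strip().lower() == "high suspicion"


def suspicious_by_french_tirads(category: str) -> bool:
    """French TIRADS: category 4B or above (assumed order 1, 2, 3, 4A, 4B, 5)."""
    return category.strip().upper() in {"4B", "5"}


example = NoduleFeatures(microcalcifications=True)
print(is_suspicious_on_us(example),
      suspicious_by_web_tirads(7),
      suspicious_by_french_tirads("4b"))
```

Keeping one function per system mirrors the text, in which each guideline has its own threshold and the results are compared rather than combined.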

Statistical analysis

Statistical analysis was performed by using the SPSS software package (Version 19.0 for Windows; SPSS, IBM). Each of the US characteristics was analyzed to determine its association with a benign or malignant diagnosis. Categorical data were summarized by using frequencies and percentages. Either the chi-squared test or Student’s t-test was used to evaluate the relationship between the variable factors and benignity or malignancy of the nodules. Two-tailed p-values of <0.05 were considered as indicative of statistical significance.

Results

Demographic data and US characteristics of all nodules

The size of nodules that were diagnosed by initial biopsy ranged from 2 mm to 96 mm (mean size, 14.4 mm). The proportion of nodules with a diameter <1 cm was 37.0% (1017/2747) and that of nodules with a diameter ≥1 cm was 63.0% (1730/2747). Two nodules that measured 2 mm underwent FNA because the patients had contralateral cancer and the diagnosis of the nodule would determine whether lobectomy or total thyroidectomy was performed. The malignancy rate was significantly higher in the thyroid nodules with a diameter <1 cm than in the nodules with a diameter ≥1 cm (14.2% vs. 4.9%, respectively; p < 0.001; Table 1). Significant differences between benign and malignant nodules were found in all US features, regardless of size (Table 2). None of the included patients experienced any major complications associated with intervention or hospitalization. Four patients (0.4%; four of 1056) developed a hematoma after the CNB, but resolution of the hematoma occurred following compression and rest for 1 h.
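The size comparison above can be illustrated with a Pearson chi-squared test on a 2 × 2 table. This is a sketch under stated assumptions, not the SPSS analysis itself: the malignant counts (144 and 85) are not given in the text and are reconstructed here from the reported percentages and the overall total of 229 malignant nodules, and no continuity correction is applied. The function name is ours.

```python
import math


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test (1 degree of freedom, no continuity
    correction) for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Totals are reported in the text; malignant counts are reconstructed
# (assumption) from 14.2% of 1017 and 4.9% of 1730.
small_total, small_malignant = 1017, 144
large_total, large_malignant = 1730, 85

stat, p_value = chi_squared_2x2(
    small_malignant, small_total - small_malignant,
    large_malignant, large_total - large_malignant,
)
print(f"chi-squared = {stat:.1f}, p < 0.001: {p_value < 0.001}")
```

With these reconstructed counts, the p-value falls below 0.001, which agrees with the reported result; the exact statistic depends on the true counts in Table 1.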

Table 1 Demographic data of patients in this study (n=2747)
Table 2 Ultrasonography characteristics of benign and malignant thyroid nodules in this study

Overall, the calculated thyroid malignancy rate was 8.3% (229/2747) when an initial biopsy with ≥1 year of follow-up was used as the gold standard for benign diagnosis. For malignant nodules (n = 229, 8.3%), the diagnoses upon histological examination included papillary thyroid carcinomas (PTC) (n = 222), medullary carcinomas (n = 2), anaplastic carcinomas (n = 3), and metastases (n = 2). The benign lesions (n = 2518, 91.7%) were all benign non-neoplastic nodules upon histological examination, including nodular hyperplasia, adenomatous goiter, follicular proliferating lesions, and thyroiditis.

Comparison of the malignancy probability assessed by various malignant risk systems

Table 3 shows the malignancy probability of initially biopsy-proven benign thyroid nodules according to various TIRADS guidelines. The malignancy probability was 1.3% for the “low suspicion” category of K-TIRADS (category 3), 1.2% for the “low suspicion” category of the ATA guidelines, and 1.2% for the “very probably benign” category of the French TIRADS guideline (category 3), all of which fall within the ≤3.0% malignancy probability threshold. Applying the web-based TIRADS [14], we could subdivide the malignancy probability by score over a range of 0–13. The malignancy probability remained ≤3.0% up to a score of 3 (1.8%) and increased substantially at a score of 4 (7.3%) (Fig. 1). We additionally analyzed subgroups of nodules that matched the criteria for the “intermediate suspicion” category by K-TIRADS guidelines (category 4), divided into three categories: i) iso- or hyperechoic nodule with any suspicious US feature, ii) solid hypoechoic nodule without any suspicious US feature, and iii) partially cystic nodule with any suspicious US feature [12]. Among these subgroups, the lowest malignancy probability was observed in solid hypoechoic nodules without any suspicious US feature (9.5%).
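The comparison against the ≤3.0% threshold can be made explicit with a short sketch. The percentages are those reported in this paragraph; the dictionary labels and variable names are ours, and the full Table 3 values are not reproduced.

```python
# Threshold used in the text for an acceptably low malignancy probability.
BENIGN_THRESHOLD_PCT = 3.0

# Malignancy probabilities (%) reported in the paragraph above; labels are ours.
reported_pct = {
    "K-TIRADS category 3 (low suspicion)": 1.3,
    "ATA low suspicion": 1.2,
    "French TIRADS category 3": 1.2,
    "Web-based TIRADS score 3": 1.8,
    "Web-based TIRADS score 4": 7.3,
    "K-TIRADS category 4, solid hypoechoic, no suspicious feature": 9.5,
}

# True where the reported probability is at or below the threshold.
within_threshold = {
    label: pct <= BENIGN_THRESHOLD_PCT for label, pct in reported_pct.items()
}

for label, ok in within_threshold.items():
    status = "within" if ok else "above"
    print(f"{label}: {reported_pct[label]}% ({status} threshold)")
```

The first four entries fall within the threshold and the last two exceed it, which is the pattern that separates nodules for which one biopsy appears sufficient from those for which it does not.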

Table 3 Malignancy rate according to various Thyroid Image Reporting and Data System (TIRADS) guidelines and a web-based risk stratification system (n=2747)
Fig. 1 Management pathway algorithm of initially biopsy-proven benign thyroid nodules by various US-based risk stratification systems

Discussion

Our study results demonstrated that initially biopsy-proven benign nodules exhibited a ≤3.0% malignancy probability when assessed as “low suspicion” by the K-TIRADS guideline (category 3), “low suspicion” by the ATA guideline, “very probably benign” by the French TIRADS guideline (category 3), and a score of ≤3 by the web-based TIRADS. Therefore, one biopsy was found to be sufficient for these thyroid nodules. However, even when the initial biopsy result was benign, the lowest malignancy probability was 6.8% in nodules with a “high suspicion” US pattern. Therefore, one biopsy was found to be insufficient for the initially biopsy-proven benign nodules with a higher TIRADS assessment category. These results stress the value of combining biopsy results with TIRADS assessment categories. Moreover, current thyroid radiofrequency ablation (RFA) guidelines require pathological confirmation of benign status at two separate US-guided FNAs before RFA. The current study results suggest that two biopsy procedures before RFA may be unnecessary for nodules with a “low suspicion” US pattern. This shift toward a more conservative approach to initially proven benign nodules is concordant with the “less is more” theme suggested by the 2015 ATA guideline [5], which suggests that less surveillance testing will become more common because it is reasonable to do less.

Several investigators [10, 11, 29,30,31,32] have emphasized the combination of US features and pathological information rather than merely requiring nodule growth as an indication for repeat biopsy. The ATA guideline [11] states that given the low false-negative rate of nodules with prior benign cytology and higher yield of missed malignancy based on US pattern rather than on nodule growth, management should be determined by risk stratification based on US pattern. In a prospective study [31], the cytologically benign nodules without suspicious features exhibited only a 0.7% incidence of cancer and 1.1% false-negative rate during 5 years of follow-up. Therefore, repeated biopsy is unnecessary for these nodules. In contrast, the malignancy risk (20.4–56.6%) has been found to be significantly higher in benign nodules with accompanying suspicious US features [9, 10, 29, 32]; thus, repeated biopsy or even diagnostic surgery is necessary for US–pathology mismatched nodules (28, 29, 38, 52). This approach is suggested by the ATA guidelines as well [11]. In our study, we applied various TIRADS guidelines and found a relatively low malignancy probability in initially biopsy-proven benign nodules with “low suspicion” US patterns. Similarly, Hong et al [30] reported a malignancy probability of <3.0% (0%, 0.4%, and 2.4%, respectively) in thyroid nodules with benign cytology results stratified according to K-TIRADS categories 2, 3, and 4 and recommended observation in lieu of repeated biopsy. Nevertheless, the initially biopsy-proven benign nodules with higher TIRADS assessment categories by various TIRADS guidelines exhibited a malignancy probability of ≥6.8%, which is higher than the malignancy probability of the benign category (0–3.0%) [24].

For malignancy risk stratification, several attempts have been made to convert the TIRADS “pattern-based” approach to a “score-based” approach [33, 34]. Such scoring risk stratification models can be implemented in clinical practices with varying volumes of patients and in sites with varying levels of professional expertise. Moreover, they permit more personalized management, with >10 different ranges of malignancy risk scores for thyroid nodules [35]. More recently, the American College of Radiology (ACR) developed its TIRADS [36] by allocating points to more suspicious features, summing the points, and determining the TIRADS category of the nodules. With a lower malignancy risk assigned to benign nodules without any suspicious US features or a score of 0, a lower biopsy rate can be expected. According to the web-based scoring risk stratification model (www.gap.kr/thyroidnodule.php), the overall risk of malignancy for score 0 nodules was <5% [14]. In our study, the malignancy probability based on the web-based scoring system was 1.8% in score 3 nodules and increased to 7.3% in score 4 nodules. Thus, predicting the malignancy probability of initially biopsy-proven benign thyroid nodules with such scoring systems may make the selection of nodules for repeated biopsy more efficient, thereby enabling more personalized and optimized management.

Given the potentially higher false-negative rate of biopsy in larger nodules, some groups have suggested performing lobectomy when thyroid nodules are above a certain size [37]. In addition to surgery, RFA is one of the management options for reducing nodule volume and relieving nodule-related clinical problems. US-guided RFA is a nonsurgical technique that has been used for the treatment of benign nodules [38]. Pathological confirmation of benignity at two separate US-guided FNAs is usually a requirement before an RFA procedure [39]. When we analyzed initially biopsy-proven benign thyroid nodules according to size, we observed consistently low malignancy probabilities when assessed as “low suspicion” by K-TIRADS (category 3, 1.4% malignancy probability in nodules >3 cm), “low suspicion” by the ATA guideline (1.5%), “very probably benign” in the French TIRADS guideline (category 3, 1.5%), and a score of ≤3 (1.2%) by the web-based TIRADS. With further accumulation of evidence indicating that one biopsy result is sufficient for benign thyroid nodule diagnosis, patients may proceed to ablation therapy and be relieved of nodule-related clinical problems more promptly.

There were several limitations in our study. First, its retrospective design may have induced selection bias; however, such a problem would be overcome to a certain extent by the large study population. Second, we assumed that benign cases were “truly benign” after ≥1 year of follow-up; however, the fact that no test has a 100.0% negative predictive value may explain the relatively high calculated thyroid malignancy rate of 8.3% observed in our study. In addition, 2.8% (77/2747) of the nodules measured >3 cm and were palpable. Thus, larger prospective cohorts are required to determine the number of benign nodules that behave as “truly benign” and to investigate any significance regarding an asymptomatic patient population for screening. Third, because of the relatively short follow-up period, we could not assess the long-term outcome or growth of these benign nodules. However, a previous study [40] that compared the rates of new diagnosis of malignancy between long-term and short-term follow-up of patients with biopsy-proven benign nodules found no significant difference in outcomes. Last, we did not evaluate the interobserver variability.

In conclusion, for initially biopsy-proven benign nodules that exhibit a “low suspicion” US pattern and a low malignancy probability when stratified by various TIRADS guidelines, imaging surveillance, rather than a second biopsy, is warranted. However, repeated biopsy should be performed for thyroid nodules with suspicious US features or those that belong to higher TIRADS assessment categories even when initial biopsy results indicate that the lesion is benign.