Introduction

In the diagnosis of thyroid nodules, both ultrasound (US) and US-guided fine-needle aspiration biopsy (US-FNAB) are important diagnostic tools because they are simple, safe, accurate, and cost-effective. Because of the high prevalence of thyroid nodules, it is impossible to aspirate every thyroid nodule detected on US, and determining which nodules should be aspirated is an essential part of treatment. Many organizations, such as the American Thyroid Association (ATA) [1], the American Association of Clinical Endocrinologists and Associazione Medici Endocrinologi (AACE/AME) [2], and the Society of Radiologists in Ultrasound (SRU) [3], have recommended guidelines to determine which nodules should be aspirated. Although the SRU [3] emphasized the size of masses, all of these organizations [1–3] stressed the importance of US features of focal thyroid nodules.

Although US-FNAB is highly regarded as a diagnostic tool, it also has a relatively high false-negative rate, ranging from 0.7 to 21% [4–8]. To reduce the false-negative rate of FNAB, performing repeat aspirations may be considered. While several researchers recommend performing routine repeat aspirations on thyroid nodules [9, 10], others disagree [1, 2, 11]. Shin et al. [12] demonstrated that the risk of malignancy was higher in thyroid nodules with clinico-radiological suspicion of malignancy than in nodules without such suspicion, and they suggested that follow-up for benign-looking nodules with benign cytological results could be performed using imaging surveillance rather than repeated US-FNAB. However, they did not state which findings constituted sonographically suspicious features or what the risk of malignancy was according to US features, cytological findings, or a combination of US and cytological results.

The purpose of this study was to investigate the role of sonographic-cytological correlation in determining which nodules should be reaspirated to reduce the false-negative rate of US-FNAB.

Patients and methods

Patients

From May 2006 to December 2006, US-FNAB was performed on 3,029 focal thyroid nodules in 2,785 patients at Severance Hospital. Of these, data from 672 focal thyroid nodules in 568 patients who underwent surgery were used as our study population. Demographic data for these 568 patients are shown in Table 1. The records of each patient were reviewed at the time of initial examination and included information regarding age, gender, and TSH levels. A retrospective review of US, cytological, and pathological records was also conducted. Our institutional review board approved this retrospective study, and informed consent was not required from patients. However, informed consent for FNAB was obtained from all patients prior to biopsy.

Table 1 Patient demographics and characteristics

Imaging and image analysis

US was performed using a 7- to 15-MHz linear array transducer (HDI 5000; Philips Medical Systems, Bothell, Wash.), an 8- to 15-MHz linear array transducer (Acuson Sequoia; Siemens Medical Solutions, Mountain View, Calif.), or a 5- to 12-MHz linear array transducer (iU22; Philips Medical Systems). Compound imaging was performed in all cases that used the HDI 5000 or iU22 machine. Real-time US was performed by one of five board-certified radiologists, all of whom were aware of the clinical problems of the patients.

Focal thyroid nodules were interpreted using sonographic features, including internal components, echogenicity, margin, calcifications, and shape (Fig. 1). The internal components were defined as solid, mixed, or cystic. A pure cyst with or without comet-tail artifacts was classified as benign. A mass with mixed components was defined as a mass with both solid and cystic components, and such masses were evaluated on the basis of their internal solid components. Malignant sonographic features (Fig. 1) were defined as marked hypoechogenicity (decreased echogenicity compared with the surrounding strap muscle), microlobulated or irregular margin, microcalcification, and taller-than-wide shape (being greater in the anteroposterior dimension than the transverse dimension). The sonographic features in this study were based on previously published criteria [13]. Nodules were classified as “probably benign” or “suspicious”.

Fig. 1

Scheme with benign and malignant signs as defined in the study. A pure cyst with or without comet-tail artifacts was classified as benign. Malignant sonographic features were marked hypoechogenicity (decreased echogenicity compared with the surrounding strap muscle), microlobulated or irregular margin, microcalcifications, and taller-than-wide shape (being greater in the anteroposterior dimension than transverse dimension). A suspicious malignant nodule was defined if one of the above findings was present. If a nodule showed no suspicious features, it was classified as a “probably benign” nodule. a Colloid cyst with internal comet-tail artifacts; b -i well-circumscribed margin, -ii microlobulated margin, -iii irregular margin; c -i hyperechogenicity, -ii isoechogenicity, -iii hypoechogenicity, -iv marked hypoechogenicity; d -i microcalcifications, -ii macrocalcifications; e -i wider than tall in shape, -ii taller than wide in shape
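The grouping rule described in the Fig. 1 legend can be expressed as a short sketch. It assumes that a nodule is “suspicious” when at least one of the four malignant features is present and “probably benign” otherwise. The class and field names are illustrative and are not part of the study, and treating a pure cyst as falling in the “probably benign” group is an assumption made here for simplicity.

```python
from dataclasses import dataclass


@dataclass
class NoduleUS:
    """Sonographic findings for one nodule (hypothetical field names)."""
    pure_cyst: bool = False
    marked_hypoechogenicity: bool = False
    microlobulated_or_irregular_margin: bool = False
    microcalcification: bool = False
    taller_than_wide: bool = False


def classify_us(nodule: NoduleUS) -> str:
    """Return the US grouping: 'suspicious' or 'probably benign'."""
    # Assumption: a pure cyst is reported in the 'probably benign' group.
    if nodule.pure_cyst:
        return "probably benign"
    # One malignant feature is enough to call the nodule suspicious.
    has_malignant_feature = any([
        nodule.marked_hypoechogenicity,
        nodule.microlobulated_or_irregular_margin,
        nodule.microcalcification,
        nodule.taller_than_wide,
    ])
    return "suspicious" if has_malignant_feature else "probably benign"
```

For example, `classify_us(NoduleUS(microcalcification=True))` yields the “suspicious” grouping, whereas a nodule with none of the four features is “probably benign”.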

Tissue examinations

After US evaluation of the thyroid gland, a US-FNAB was performed by the same radiologist who evaluated the thyroid gland by US. At our institution, US-FNABs were performed on either thyroid nodules with suspicious US features or the largest thyroid nodule if no suspicious US features were detected. In most cases, benign cysts were not aspirated, except upon the clinician or patient’s request due to the patient’s fear of cancer. Local anesthesia was not routinely applied. According to the radiologist’s preference, US-FNABs were performed with either a 23-gauge needle attached to a 20-ml disposable plastic syringe and aspirator, or a 23-gauge needle attached to a 2-ml disposable plastic syringe. Freehand biopsies were performed for all FNABs. Each lesion was aspirated at least twice. Two different techniques were used for aspiration biopsies, depending upon the radiologist’s preference, which included aspiration using an aspirator and capillary-action without aspiration [14, 15]. Samples obtained from aspiration biopsies were expelled onto glass slides and smeared. All smears were placed immediately in 95% alcohol for Papanicolaou staining. The remaining material was rinsed in saline for processing as a cell block. The cytopathologist was not on site during biopsy. Additional special staining was undertaken on a case-by-case basis upon the request of the cytopathologist.

One of five cytopathologists specializing in thyroid cytology interpreted US-FNAB. At that time, the cytopathologist had information from the US diagnosis made by the radiologist. At our institution, cytological reports of US-FNAB were classified into one of two categories: “adequate” or “inadequate”. A specimen was considered “adequate” if there was a minimum of six groupings of well-preserved thyroid cells, consisting of at least ten cells per group [2]. The “adequate” group was further divided into four subgroups: “benign”, “indeterminate”, “suspicious for papillary carcinoma”, and “malignancy”. A benign cytology included colloid nodules, nodular hyperplasia, lymphocytic thyroiditis, Graves’ disease, and postpartum thyroiditis. Indeterminate cytology included follicular or Hurthle cell neoplasm. The “suspicious for papillary carcinoma” cytological result was designated when the specimen exhibited cytological atypia (nuclei are crowded and overlapping, enlarged, and pleomorphic) but showed insufficient cellularity for definite diagnosis of papillary carcinoma. Cytological results were designated “malignancy” when the specimen exhibited abundant cells with unequivocal cytological features of cancer.
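The adequacy criterion can be illustrated with a minimal sketch. It assumes that the specimen is summarized as a list of cell counts, one per group of well-preserved thyroid cells; that input representation and the function name are hypothetical, and cell preservation itself is not modeled.

```python
from typing import Sequence


def is_adequate(cells_per_group: Sequence[int],
                min_groups: int = 6,
                min_cells_per_group: int = 10) -> bool:
    """Apply the stated threshold: at least six groups of at least ten cells."""
    # Count only the groups that reach the per-group cell threshold.
    qualifying = sum(1 for n in cells_per_group if n >= min_cells_per_group)
    return qualifying >= min_groups
```

Under this sketch, a smear with six groups of 12 cells is “adequate”, while a smear with five such groups plus several smaller clusters is “inadequate”.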

Repeat aspiration

Repeat aspirations were performed on 48 focal thyroid nodules for the following reasons: benign cytological results but suspicious sonographic features (n = 25), inadequate cytological results (n = 14), cytological results suspicious for malignancy (n = 5), patient anxiety (n = 3), and indeterminate cytological results (n = 1). To avoid aspiration-related nuclear atypia [16], the interval between the first and repeat aspirations was more than 90 days for most patients (range, 7–350 days).

Result interpretation and statistical analysis

FNAB results were classified as either benign (negative) or malignant (positive). The malignant or positive category included “suspicious for papillary carcinoma” and “malignancy”. The “indeterminate” category was excluded from analysis because it could not be classified as a benign or malignant subtype by cytological features. Inadequate cytological results were also excluded from analysis. The benign or negative categories included benign cytological results. Results from FNAB were compared with the final histopathological diagnosis.
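The mapping from cytological category to test result can be sketched as follows. The function name and the use of `None` to mark categories excluded from analysis are illustrative choices, not part of the study.

```python
from typing import Optional

# Categories as named in the cytology reports.
POSITIVE = {"suspicious for papillary carcinoma", "malignancy"}
NEGATIVE = {"benign"}
EXCLUDED = {"indeterminate", "inadequate"}


def fnab_test_result(category: str) -> Optional[bool]:
    """Return True (positive), False (negative), or None (excluded from analysis)."""
    key = category.strip().lower()
    if key in POSITIVE:
        return True
    if key in NEGATIVE:
        return False
    if key in EXCLUDED:
        return None
    raise ValueError(f"unknown cytological category: {category!r}")
```

Returning `None` for excluded categories keeps them distinguishable from negative results when the nodules are later compared with the histopathological diagnosis.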

An independent two-sample t-test was used to compare the risk of malignancy according to age, gender, TSH levels, size, and US groupings of 672 focal thyroid nodules that were confirmed pathologically. Logistic regression analysis was performed to determine whether US groupings of focal thyroid nodules were a significant predictor of thyroid malignancy. Multivariate logistic regression analyses were performed to assess independent associations of malignancy with all factors found to be significant by univariate analysis with adjustment for significant factors. We evaluated the risk stratification of malignancy according to US groupings and cytological results, calculated the false-negative rate of FNAB, and investigated the cytological results of repeated aspiration. Statistical significance was assumed when the P value was less than 0.05. Statistical analysis was performed using SPSS 14.0 K for Windows (SPSS, Chicago, Ill., USA).

Results

Histopathology

There were 567 malignant and 105 benign thyroid nodules. Of 567 malignant thyroid nodules, 552 (97.4%) were papillary carcinoma, seven (1.2%) were a follicular variant of papillary carcinoma, four (0.7%) were medullary carcinoma, three (0.5%) were minimally invasive follicular carcinoma, and one (0.2%) was Hurthle cell carcinoma. Pathological results of 105 benign nodules included 82 (78.1%) benign adenomatous nodules, 13 (12.4%) lymphocytic thyroiditis, six (5.7%) follicular adenoma, and four (3.8%) fibrotic nodules. Table 2 shows the reasons for surgeries.

Table 2 Reasons for surgery for 568 patients

Statistics

The age, tumor size, and US groupings were significantly different between benign and malignant pathological groups. However, a statistically significant relationship did not exist between the risk of malignancy and either gender or TSH values. The mean maximal diameter of the malignant nodules (10 ± 6.8 mm) was significantly smaller than that of benign nodules (15.7 ± 12.8 mm) (P < 0.001). The mean age of patients with benign thyroid nodules (50.8 ± 11.9 years old) was significantly higher than that of those with malignant nodules (47.7 ± 11.8 years old) (P = 0.013). Logistic regression analysis demonstrated that the odds ratios were 0.968 (0.944–0.993, 95% confidence limits, P = 0.012), 0.964 (0.945–0.984, 95% confidence limits, P < 0.001), and 10.236 (6.008–17.442, 95% confidence limits, P < 0.001) for size, age, and US groupings, respectively. After adjustments for the size of masses and the ages of the patients, logistic regression analysis showed that the “suspicious” US group showed a significant association with thyroid cancer (P < 0.001).

Risk stratification of thyroid malignancy

Table 3 shows the US and initial cytological results in relation to the pathological results. The rate of malignancy was high in focal thyroid nodules with “malignancy” or “suspicious for papillary carcinoma” readings on FNAB, regardless of US features (98.5 and 92.2% in nodules that had “suspicious” and “probably benign” US features, respectively). In contrast, when the focal thyroid nodules were “benign” on FNAB, the rate of malignancy was lower for probably benign US features (2.9%) (Fig. 3) than for suspicious features (56.6%) (Fig. 2). Focal thyroid nodules with “indeterminate” or “cell paucity” readings on FNAB had a lower risk of malignancy with probably benign US features (16.7 and 7.7%, respectively) than with suspicious features (66.7 and 60.9%, respectively), although the number of such nodules was small.

Fig. 2

Papillary carcinoma in the thyroid gland of a 43-year-old woman. Transverse (a) and longitudinal (b) US images show a 0.6-cm-diameter, irregular (arrows) hypoechoic thyroid mass with internal microcalcifications (arrowheads). Cytological diagnosis was adenomatous hyperplasia. Because of the discrepancy between the sonographic features and the cytological diagnosis, surgery was performed. A papillary carcinoma was found

Fig. 3

Adenomatous hyperplasia in the thyroid gland of a 46-year-old woman. Transverse (a) and longitudinal (b) US images show a 1.9-cm-diameter, well-defined (arrows) hypoechoic thyroid mass without suspicious US features. Cytological diagnosis was adenomatous hyperplasia, which was compatible with US findings, but surgery was undertaken because there was a malignant mass in the right thyroid lobe. The final diagnosis after surgery was adenomatous hyperplasia

Table 3 Histopathological correlation with sonographic, initial cytological, and pathological results for 672 focal thyroid nodules

Cytological results

The sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of FNAB were 94.2%, 83.8%, 97.9%, 64.8%, and 93%, respectively, for the 601 focal thyroid nodules based on pathology. With a false negative defined as a reading consistent with benign results on initial cytology and malignant results on final histopathological diagnosis, the false-negative rate of FNAB was 5.8% (31/533). The 31 false-negative FNAB results occurred in samples from 30 thyroid nodules with papillary carcinoma and one thyroid nodule with medullary carcinoma. Of these, 30 (96.8%) had “suspicious” and one (3.2%) had “probably benign” US features.
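The reported figures can be checked with a short sketch of the standard 2×2 calculations. The cell counts used below (502 true positives, 31 false negatives, 11 false positives, 57 true negatives) are not given in the text; they are back-calculated here from the reported total of 601 nodules, the 31/533 false-negative fraction, and the reported percentages, and should be read as an assumption rather than as data from Table 3.

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard 2x2 diagnostic measures, returned as percentages."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
        "accuracy": 100 * (tp + tn) / total,
        # False-negative rate as defined in the text: FN / all malignant nodules.
        "false_negative_rate": 100 * fn / (tp + fn),
    }


# Assumed counts, reconstructed from the reported percentages (see lead-in).
metrics = diagnostic_metrics(tp=502, fn=31, fp=11, tn=57)
rounded = {name: round(value, 1) for name, value in metrics.items()}
print(rounded)
```

With these assumed counts, the rounded values agree with the reported sensitivity (94.2%), specificity (83.8%), positive predictive value (97.9%), negative predictive value (64.8%), accuracy (93%), and false-negative rate (5.8%).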

Results of repeat aspiration

Table 4 shows the histopathological correlation with initial and repeat cytological results for 48 focal thyroid nodules according to the US classification. Repeat aspiration yielded “adequate” samples in 13 (92.9%) of 14 nodules with inadequate samples on the initial aspirate, and “suspicious for malignancy” or “malignancy” readings in 15 (93.8%) of 16 thyroid cancers with “benign” results on the initial aspirate. However, repeat aspirations were misdiagnosed as “suspicious for malignancy” in two (16.7%) of 12 benign thyroid nodules with “benign” results on the initial aspirate. The intervals between initial and repeat aspiration for these two nodules were 25 and 30 days, respectively.

Table 4 Histopathological correlation of initial and repeat cytological results for 48 focal thyroid nodules according to sonographic classification

Discussion

Several investigators have tried to find useful indicators of malignancy on US. Although there are some overlapping US characteristics between benign and malignant nodules [17, 18], several US features have been accepted as malignant features, such as irregular or microlobulated margins [13, 19–23], hypoechogenicity [13, 19–21], taller-than-wide shape [13, 24], microcalcification [13, 19, 20, 22, 25, 26], solidity [20, 22, 27], and intratumoral vascularity [19, 21, 27]. In this study, we used the US classification scheme that we proposed on the basis of our published data [13] for thyroid nodules detected on US because of its simplicity, high sensitivity, and high negative predictive value (Fig. 1) [13, 28].

For diagnosing thyroid nodules, US-FNAB is an important diagnostic tool because it is simple, safe, accurate, and cost-effective. Although the use of US-FNAB for diagnosing thyroid nodules has improved detection rates of thyroid cancer, decreased the number of surgeries for benign disease, and increased the cancer-detection rate for thyroidectomies [29–32], it has some unavoidable limitations such as “inadequate” and false-negative FNAB results. Because a missed diagnosis of malignant disease is possible, false-negative FNAB results can be a major concern for both clinicians and patients in the management of focal thyroid nodules. Reasons for false-negative FNAB results include sampling errors and cytodiagnostic errors [33]. The Papanicolaou Society of Cytopathology Task Force on Standards of Practice recommends that the false-negative rate of FNAB should not exceed 2% [34]. However, many studies demonstrated unsatisfactory FNAB false-negative rates (6.1–21%) [5–8]. To overcome the limitations of FNAB in diagnosing thyroid nodules, several studies have explored and reported on the diagnostic accuracy or inadequate rate of core biopsies with or without FNAB [35–45]. A few studies demonstrated that a combined approach of FNAB and core biopsy showed higher adequacy [36, 44] and accuracy [35, 39, 42] than either procedure alone, although core biopsies showed better adequacy than FNAB [38, 43]. Therefore, core biopsy may be useful in cases classified as “inadequate” by FNAB [35, 46] but should not be used as an initial diagnostic tool as a replacement for FNAB [37, 46].

To date, there has been no definite guideline for reducing the false-negative rate of FNAB. Although recent studies [9, 10] stressed that repeat aspiration may reduce the false-negative rate of FNAB, organizations such as the ATA and AACE/AME do not recommend routine re-biopsy, but rather further clinical follow-up as long as the nodule does not show growth [1, 2]. However, our study revealed that thyroid nodules with suspicious US groupings have a high risk (56.6%) of cancer, even when the results of cytology are benign, which suggests the possibility of a false-negative FNAB. Shin et al. [12] suggested that benign-looking nodules with benign cytological results can be followed up using imaging surveillance rather than repeated US-FNAB because the risk of malignancy of thyroid nodules with clinico-radiological suspicion of malignancy was higher than that of those without such suspicion. However, they did not state which findings constituted sonographically suspicious features or what the risk of malignancy was according to US features, cytological findings, or a combination of US and cytological findings. In this study, we evaluated the risk of malignancy according to US groupings and cytological readings.

In this study, age, tumor size, and US groupings were significantly different between the benign and malignant pathological groups. However, we think that age and tumor size had little impact on the risk of malignancy because the odds ratios of these were nearly 1, implying no discernible relative risk of malignancy. Conversely, the US grouping had a high odds ratio (10.568), suggesting that it is an important predictor of malignancy. Additionally, we evaluated the risk stratification of malignancy based on US groupings and cytological results, and we investigated the role of sonographic-cytological correlation for deciding which nodules should be reaspirated to reduce the false-negative rate of US-FNAB. When focal thyroid nodules had “malignancy” or “suspicious for papillary carcinoma” readings on FNAB, the rate of malignancy was high (more than 92%). In contrast, when focal thyroid nodules had “benign” readings on FNAB, the risk of malignancy was lower with “probably benign” (2.9%) than with “suspicious” sonographic features (56.6%). For focal thyroid nodules with “indeterminate” or “cell paucity” readings on FNAB, the risk of malignancy was lower with “probably benign” (16.7 and 7.7%, respectively) than with “suspicious” sonographic features (66.7 and 60.9%, respectively), although the number of such nodules was small. These results demonstrated that focal thyroid nodules with “probably benign” US groupings and “benign”, “indeterminate”, or “cell paucity” cytological results have a low chance of being malignant compared with nodules with a “suspicious” US grouping and the same cytological results. Of 31 nodules with false-negative results on FNAB, 96.8% (30/31) had suspicious findings based on US features.

Our results demonstrated that repeat aspiration yielded adequate samples in 13 (92.9%) of 14 nodules with inadequate samples on the initial aspirate, and “suspicious for malignancy” or “malignancy” readings in 15 (93.8%) of 16 thyroid cancers with “benign” results on the initial aspirate. In contrast, repeat aspirations were misdiagnosed as “suspicious for malignancy” in two (16.7%) of 12 benign thyroid nodules with “benign” results on the initial aspirate. The intervals between the initial and repeat aspirations were 25 and 30 days for these two nodules. The false-positive results for these two nodules may be related to nuclear atypia induced by the initial aspiration [16]. These results may help clinicians in various situations in determining whether to aspirate thyroid nodules again. When discrepancies exist between radiological and cytological findings, we suggest performing repeat aspiration selectively for focal thyroid nodules with suspicious US features, even when the cytological results are benign.

There are several potential limitations of this study. First, selection bias was an inevitable limitation. This study included a large proportion of malignant nodules (567/672, 84.4%) because only surgically confirmed nodules were included. This remarkably high percentage of malignancy can be attributed to the conservative management of most benign nodules at our institution. When the results of FNAB and US indicated that the lesion was benign, we chose clinical follow-up rather than surgery. Focal thyroid nodules that had not been operated on were excluded from the analysis of the false-negative rate, resulting in a relatively low negative predictive value compared with other studies [6, 47–49]. Therefore, the false-negative rate of 5.8% may not be an accurate measurement. We believe that the true false-negative rate would be lower if we included benign thyroid nodules that were not surgically removed. Second, we did not include the vascularity of the thyroid nodules due to limitations of the retrospective data collection. Several reports demonstrated that intratumoral vascularity may be a risk factor for malignancy [19, 21, 27], although others have found the opposite [17, 50, 51]. Third, five cytopathologists interpreted the FNAB slides at our institution. Although many reports dealt with interobserver variability with respect to follicular-patterned thyroid nodules [52–55], some interobserver variability likely exists for papillary thyroid carcinoma as well.

In conclusion, this study demonstrated the risk stratification of thyroid malignancy based on US features and cytological results, and the importance of the correlation between sonographic features and cytological results. Repeat FNAB should be performed on focal thyroid nodules with suspicious US features even when the initial FNAB results indicate that the lesion is benign.