Introduction

Fine-needle aspiration (FNA) is widely accepted as the primary diagnostic tool for the evaluation of thyroid nodules owing to its simplicity, safety, cost-effectiveness, and its high sensitivity of 83–98 % and specificity of 70–92 % [1, 2]. However, a major limitation of FNA is its rate of inconclusive diagnoses: non-diagnostic results are reported in 2–24 % of cases and atypia of undetermined significance/follicular lesion of undetermined significance (AUS/FLUS) results in 1–27 % [3–5]. Core needle biopsy (CNB) has been suggested as a complementary method to FNA, mainly to overcome possible inconclusive diagnoses [6]. However, inconclusive results remain unavoidable with CNB, with reported rates of 6.0–31.8 % [6–17].

A few limited studies have compared the diagnostic accuracy of ultrasound (US)-guided CNB with that of US-guided FNA in the diagnosis of malignancy [6, 9, 18–21]. The results of these studies have not been consistent: CNB was more accurate and sensitive than FNA in some studies [6, 9, 21], but other studies, including one meta-analysis, showed that CNB had similar or even lower accuracy and sensitivity compared to FNA [18–20]. If the diagnostic accuracy does not significantly differ between the two methods, there may be no benefit in performing CNB, given that it carries a risk of complications and technical difficulty, both of which are more common with CNB than with FNA.

Thus, the objective of our study was to compare the diagnostic performances of FNA and CNB in the diagnosis of thyroid malignancy and neoplasm in patients who underwent surgery for thyroid nodules.

Materials and methods

Patients

The institutional review board approved this retrospective study, and patient approval and informed consent were not required for the retrospective review of US images, medical records or cytopathology reports. However, written informed consent for US-FNA, CNB and surgery was obtained from all patients prior to procedures.

From July 2013 to April 2015, 3382 consecutive patients underwent staging US examinations prior to surgery. Of 3382 patients, 184 patients who did not proceed with surgery, and six patients who underwent FNA at another hospital but who did not have a slide review performed were excluded. Finally, a total of 3192 patients were included. The mean ± standard deviation age of the patients was 44.4 ± 12.3 years (range 10–87 years); there were 2522 women (mean age 44.4 ± 12.3 years, range 10–87 years) and 670 men (mean age 44.4 ± 12.3 years, range 20–85 years). Among the total included patients, 3048 (95.5 %) underwent US-FNA and 144 (4.5 %) underwent CNB to diagnose thyroid nodules prior to surgery. Of 3048 patients with FNA, 2538 (83.3 %) underwent FNA at another hospital, and 510 (16.7 %) did so at our institution. Of 144 patients with CNB, 134 (93.1 %) underwent CNB at another hospital, and 10 (6.9 %) did so at our institution to evaluate thyroid nodules with non-diagnostic (n = 2) or AUS/FLUS (n = 5) cytology on prior FNA, benign but discordant FNA cytology (n = 1) or increased size on follow-up US (n = 2). All 2672 patients with CNB or FNA performed at other hospitals had their slides reviewed by our pathologists prior to surgery. There were no complications or technical failures among the CNBs performed at our institution, but data on the occurrence of complications due to CNBs performed at other hospitals were not available.

Staging US examinations

Preoperative staging US included an examination of the thyroid nodule (size, extrathyroidal extension and US features) and cervical lymph nodes. US features of each thyroid nodule were described and recorded by the radiologists who performed staging US examinations according to the following categories: (a) internal component (completely solid nodules, or mainly cystic (cystic portion ≥ 50 %) or mainly solid (cystic portion <50 %) nodules in mixed cystic and solid nodules), (b) echogenicity (hyper-, iso- and hypoechogenicity when compared to the echogenicity of the underlying thyroid parenchyma, or marked hypoechogenicity compared to the adjacent strap muscle), (c) margin (circumscribed or non-circumscribed i.e. microlobulated or irregular), (d) calcifications (no calcification, microcalcification, macrocalcification or mixed calcification with both micro- and macrocalcifications) and (e) shape (parallel, greater in the transverse dimension than the anteroposterior dimension or non-parallel, greater in the anteroposterior dimension than the transverse dimension). Suspicious US features included marked hypoechogenicity, non-circumscribed margin, microcalcification or mixed calcification, and non-parallel shape, based on published criteria [22]. Final assessment was given as ‘probable benign’ for nodules without suspicious US features and ‘suspicious malignant’ for those with one or more suspicious US features.
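The final assessment rule described above can be illustrated with a minimal sketch. The function name and the string labels used for each US feature are hypothetical and chosen only for illustration; the rule itself (one or more suspicious features gives ‘suspicious malignant’) follows the text.

```python
def final_assessment(echogenicity, margin, calcification, shape):
    """Return the US final assessment for one nodule.

    The argument values are illustrative string labels (an assumption),
    not the exact wording recorded in the radiology reports.
    """
    suspicious_features = [
        echogenicity == "marked hypoechogenicity",
        margin == "non-circumscribed",
        calcification in {"microcalcification", "mixed calcification"},
        shape == "non-parallel",
    ]
    # One or more suspicious features is enough for a suspicious assessment.
    if any(suspicious_features):
        return "suspicious malignant"
    return "probable benign"


if __name__ == "__main__":
    print(final_assessment("isoechogenicity", "circumscribed",
                           "macrocalcification", "parallel"))
    print(final_assessment("hypoechogenicity", "non-circumscribed",
                           "no calcification", "parallel"))
```

Note that, under this rule, macrocalcification alone and ordinary hypoechogenicity do not count as suspicious features.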

Cytopathology reports

During the study period, eight cytopathologists with 8–21 years of experience in thyroid cytopathology reviewed the FNA and CNB slides according to the review schedule. FNA cytology diagnoses were reported on the basis of the Bethesda System for Reporting Thyroid Cytopathology with the following six categories: non-diagnostic, benign, atypia of undetermined significance or follicular lesion of undetermined significance (AUS/FLUS), follicular neoplasm or suspicious for follicular neoplasm (FN/SFN), suspicious for malignancy, and malignancy [23]. The diagnostic criteria for CNB of thyroid nodules have not yet been standardized [6, 9, 15]; in this study, CNB readings were therefore classified according to the six categories of the Bethesda system, as in prior studies [6, 9, 15, 23]. A non-diagnostic reading included the absence of any identifiable follicular thyroid tissue, the presence of only a normal thyroid gland, and tissue containing only a few follicular cells insufficient for diagnosis. A benign reading included colloid nodules, nodular hyperplasia and lymphocytic thyroiditis. An AUS/FLUS reading included nodules in which some atypical cells were present but were not diagnostic of suspicious malignancy or malignancy, and cellular follicular nodules in which distinction between adenomatous hyperplasia (AH) and FN was not possible. An FN/SFN reading included nodules with histological features favouring follicular neoplasm. A suspicious for malignancy reading was assigned when a specimen exhibited some atypical cells but lacked sufficient evidence for malignancy. A malignancy reading was assigned when a specimen exhibited unequivocal features of cancer.

Data and statistical analysis

Because there were significant differences in nodule characteristics between patients with CNB and FNA as well as the overall number of nodules, patients with CNB were matched with those with FNA using propensity score matching with greedy algorithms and logistic regression analysis [24]. Matched variables included sex, age, Bethesda category, size on US, US composition, and US final assessment. Clinical and US characteristics were compared between patients with CNB and those with FNA using the independent two-sample t test for continuous variables and the Chi-square or Fisher’s exact test for categorical variables before matching, and the paired t test for continuous variables and McNemar’s test for categorical variables after matching.
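As an illustration of the greedy matching step, the following minimal sketch pairs each CNB patient with the nearest unused FNA patient by propensity score. It assumes the propensity scores have already been estimated (in the study, by logistic regression on the matched variables); the 1:1 ratio, the caliper value, and all names are assumptions made for this example and are not taken from the study.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on propensity score.

    treated, controls: dicts mapping a patient id to a propensity score.
    caliper: maximum allowed score difference (hypothetical value).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)  # controls not yet used
    pairs = []
    # Process treated patients in ascending score order.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Greedy step: take the closest remaining control, never revisiting
        # earlier choices.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # match without replacement
    return pairs


if __name__ == "__main__":
    cnb = {"cnb1": 0.30, "cnb2": 0.62}
    fna = {"fna1": 0.10, "fna2": 0.31, "fna3": 0.60, "fna4": 0.90}
    print(greedy_match(cnb, fna))
```

A greedy algorithm is simple and fast but does not guarantee the globally smallest total distance; optimal matching would be an alternative design.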

Surgical pathologic diagnosis was defined as the reference standard. Malignancy rates and neoplasm rates were compared between patients with CNB and those with FNA using the Chi-square test in the non-matched cohort and generalized estimating equations (GEE) in the matched cohort. Diagnostic performances (sensitivity, specificity, accuracy, positive predictive value (PPV) and negative predictive value (NPV)) to assess malignancy and neoplasm were calculated and compared between the CNB and FNA groups using the Chi-square or Fisher’s exact test in the non-matched cohort and GEE in the matched cohort, under the seven criteria defined below (four for malignancy and three for neoplasm). Neoplasm was defined as malignancy plus follicular adenoma and Hürthle cell adenoma. For the diagnostic criteria to assess malignancy, test-positives were defined by the Bethesda categories of suspicious for malignancy and malignancy. Test-negatives were defined by the Bethesda categories according to the following four criteria to assess malignancy:

  • Criterion 1: Non-diagnostic, Benign, AUS/FLUS, FN/SFN

  • Criterion 2: Non-diagnostic, Benign, AUS/FLUS

  • Criterion 3: Non-diagnostic, Benign

  • Criterion 4: Benign

For the diagnostic criteria to assess neoplasm, test-positives were defined by the Bethesda categories of FN/SFN, suspicious for malignancy and malignancy. Test-negatives were defined by the Bethesda categories according to the following three criteria to assess neoplasm:

  • Criterion 1: Non-diagnostic, Benign, AUS/FLUS

  • Criterion 2: Non-diagnostic, Benign

  • Criterion 3: Benign
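The criteria above can be sketched as a small calculation. This is an illustrative sketch only: the function and variable names are hypothetical, the toy cases are invented, and it assumes that a nodule whose Bethesda category is neither test-positive nor test-negative under a given criterion is excluded from that calculation.

```python
MALIGNANCY_POSITIVE = {"suspicious for malignancy", "malignancy"}
MALIGNANCY_NEGATIVE = {
    1: {"non-diagnostic", "benign", "AUS/FLUS", "FN/SFN"},
    2: {"non-diagnostic", "benign", "AUS/FLUS"},
    3: {"non-diagnostic", "benign"},
    4: {"benign"},
}
NEOPLASM_POSITIVE = {"FN/SFN", "suspicious for malignancy", "malignancy"}
NEOPLASM_NEGATIVE = {
    1: {"non-diagnostic", "benign", "AUS/FLUS"},
    2: {"non-diagnostic", "benign"},
    3: {"benign"},
}


def diagnostic_performance(cases, positive, negative):
    """cases: iterable of (bethesda_category, disease_on_surgical_pathology)."""
    tp = fp = tn = fn = 0
    for category, diseased in cases:
        if category in positive:
            tp, fp = (tp + 1, fp) if diseased else (tp, fp + 1)
        elif category in negative:
            fn, tn = (fn + 1, tn) if diseased else (fn, tn + 1)
        # Categories in neither set are skipped (assumption, see lead-in).

    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Invented toy data, not study data.
    toy = ([("malignancy", True)] * 8 + [("benign", True)]
           + [("AUS/FLUS", True)] + [("benign", False)] * 2)
    for criterion, negative in MALIGNANCY_NEGATIVE.items():
        print(criterion, diagnostic_performance(toy, MALIGNANCY_POSITIVE, negative))
```

Because each successive criterion narrows the test-negative set, the denominator shrinks and the computed sensitivity can only stay the same or rise.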

All analyses were performed with SAS, version 9.2 (SAS Institute, Cary, NC). Two-sided P values less than 0.05 were considered statistically significant.

Results

Clinical and US characteristics of patients with FNA versus CNB

Table 1 demonstrates the clinical and US characteristics of the two groups with FNA and CNB before and after matching. Before matching, sex and age were similar for both groups. In terms of the Bethesda category, no patients had non-diagnostic results from CNB, while 0.5 % (15 of 3048) of patients had non-diagnostic results from FNA. The proportions of the AUS/FLUS, FN/SFN and malignancy categories were higher in the CNB group than in the FNA group. The proportion of the suspicious for malignancy category was lower in the CNB group than in the FNA group. Patients with CNB had more nodules that were 1 cm or larger in size, with mixed cystic and solid composition, and with probable benign assessments on US than patients with FNA. Patients with FNA were statistically matched with patients with CNB. After matching, there were no differences between the two groups with respect to the matched variables.

Table 1 Clinical and ultrasonographic characteristics of patients with fine-needle aspiration (FNA) or core needle biopsy (CNB) before and after propensity score matching

Comparison of malignancy rate and neoplasm rate of FNA versus CNB

Table 2 compares the malignancy rates and the neoplasm rates between the two groups according to the Bethesda category, and Table 3 compares the FNA or CNB results to the final surgical diagnosis. Before matching, the overall malignancy rate of the FNA group was higher than that of the CNB group (Table 2, 97.4 % vs. 91.0 %, P < 0.001), and the overall neoplasm rate was not different between the two groups (98.0 % vs. 96.5 %, P = 0.239). Malignancy rates and neoplasm rates according to each Bethesda category were not significantly different between the two methods, with the exception that the malignancy rate of the FNA group was higher than that of the CNB group in the AUS/FLUS category (Table 2, 83.0 % vs. 60.9 %, P = 0.022).
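The pre-matching comparison of overall malignancy rates can be checked with a minimal Pearson Chi-square sketch for a 2 × 2 table. The counts used here are not quoted from the tables: they are back-calculated from the reported percentages and group sizes (97.4 % of 3048 and 91.0 % of 144) and should be read as an approximation. No continuity correction is applied, which is also an assumption.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson Chi-square test (1 degree of freedom, no continuity correction).

    Table layout:        malignant   not malignant
        group 1              a            b
        group 2              c            d
    Returns (statistic, two-sided p value).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Back-calculated counts (assumption): FNA 2968 of 3048, CNB 131 of 144.
    stat, p = chi_square_2x2(2968, 3048 - 2968, 131, 144 - 131)
    print(f"chi-square = {stat:.2f}, p = {p:.2g}")
```

With these reconstructed counts the p value falls well below 0.001, in line with the reported result.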

Table 2 Comparison of the malignancy rate and neoplasm rate of fine-needle aspiration (FNA) versus core needle biopsy (CNB) according to the Bethesda category
Table 3 Comparison of the fine-needle aspiration (FNA) or core needle biopsy (CNB) diagnosis with the final surgical diagnosis

After matching, no differences were found between the two groups for the overall malignancy rate (Table 2, 90.3 % for FNA and 91.0 % for CNB, P = 0.796), the overall neoplasm rate (93.8 % for FNA and 96.5 % for CNB, P = 0.273), or the malignancy rates and neoplasm rates according to each Bethesda category.

Diagnostic performances of FNA versus CNB

Before matching, when predicting malignancy, the sensitivity and accuracy of FNA were significantly higher than those of CNB using criterion 1 (Table 4, 93.8 % vs. 84.7 %, P < 0.001 for sensitivity, 93.7 % vs. 86.1 %, P < 0.001 for accuracy) and criterion 2 (94.0 % vs. 88.1 %, P = 0.008 for sensitivity, 93.9 % vs. 89.1 %, P = 0.023 for accuracy). No differences were found when using criteria 3 and 4. When predicting neoplasm, the sensitivity and accuracy of FNA were significantly higher than those of CNB using criterion 1 (Table 5, 93.6 % vs. 84.9 %, P < 0.001 for sensitivity, 93.3 % vs. 85.4 %, P < 0.001 for accuracy). No differences were found when using criteria 2 and 3. The specificity, negative predictive value and positive predictive value were comparable between FNA and CNB.

Table 4 Diagnostic performances of fine-needle aspiration (FNA) versus core needle biopsy (CNB) to predict malignancy
Table 5 Diagnostic performances of fine-needle aspiration (FNA) versus core needle biopsy (CNB) to predict neoplasm

After matching, diagnostic performances to predict malignancy or neoplasm were not different between the FNA and CNB groups, except that the specificity of CNB was significantly higher than that of FNA when using criterion 2 or 3 to predict neoplasm (Table 5, 100.0 % vs. 50.0 %, P = 0.046), because of the two false-positive diagnoses (Table 3, two AH) for the FN/SFN category of FNA.

Discussion

We compared the diagnostic performances of FNA and CNB to diagnose malignancy and neoplasm in patients who underwent surgery for thyroid nodules. Since there were significant differences in terms of the Bethesda category, nodule size, internal composition on US, and final US assessment, as well as the overall number of nodules between the FNA and CNB groups, the results found after matching these variables were considered to be more appropriate for the comparison of FNA and CNB. Before matching, FNA showed similar or significantly higher sensitivity and accuracy than CNB. After matching, the diagnostic performances of the two methods were similar, with the exception of criteria 2 and 3 showing a higher specificity of CNB in predicting neoplasm. Therefore, our results suggest that there may be no benefit in performing CNB over FNA, given that the two methods show comparable diagnostic performances.

Until now, only a few studies have compared the diagnostic performances of CNB with those of FNA in the diagnosis of thyroid malignancy [6, 9, 18–21]. Of these prior studies, the study by Sung et al. evaluated 555 thyroid nodules, the largest sample among them, and used the same diagnostic criteria as our criterion 1 to calculate diagnostic performances [9]. In that study, the sensitivity and accuracy of CNB were significantly higher than those of FNA to diagnose malignancy (86.8 % vs. 68.6 % for sensitivity, 92.1 % vs. 82.0 % for accuracy) and neoplasm (84.7 % vs. 65.5 % for sensitivity, 90.3 % vs. 78.9 % for accuracy) [9], contrary to our results. These conflicting results may be due to the different ways in which the final diagnoses were determined in the two studies. In our study, all final diagnoses were determined by histopathological results after surgical resection, while in the study by Sung et al. the final diagnoses for benignity were mostly determined by the combination of follow-up US and FNA or CNB results without surgery (82.7 % of benign nodules) [9]. Also, inter-observer variability among the different pathologists might have affected the interpretations of FNA and CNB results [25].

Several recent studies have reported the usefulness of CNB for nodules with initial non-diagnostic FNA results, showing significantly lower non-diagnostic rates in CNB compared to repeat US-FNA [6, 8, 10, 26, 27]. The lower non-diagnostic rate of CNB may be a natural result because CNB can obtain larger tissue samples, and thus CNB may help clinicians appropriately manage nodules with non-diagnostic FNA results [16, 28]. However, the full clinical impact of this approach remains uncertain [16, 28]. In recent studies which included non-diagnostic FNA samples with repeat FNA or follow-up US, as well as those with surgery, the malignancy rate was 1.6–3.0 % for nodules lacking suspicious US features. Given this low malignancy rate, a more conservative approach with follow-up US examinations instead of repeat FNA or CNB has been proposed for nodules with non-diagnostic FNA results, particularly for those lacking suspicious US features [29, 30].

A recent meta-analysis by Li et al. [20] found that FNA and CNB do not differ significantly in sensitivity and specificity for diagnosis of thyroid malignancy, concordant with our results. The area under the ROC curve of FNA was even higher than that of CNB, although without statistical significance (0.905 ± 0.030 for FNA vs. 0.745 ± 0.095 for CNB, P = 0.053) [20]. This meta-analysis included the aforementioned study by Sung et al. along with four other studies [6, 9, 18, 19, 21]. Of note, despite the large sample size of the study by Sung et al. (64.7 %, 555 of the total 858 thyroid nodules included in the meta-analysis), the results of the meta-analysis showed comparable sensitivity and accuracy of FNA and CNB in the diagnosis of malignancy [20]. A previous study by Yoon et al. reported that CNB can provide more accurate information in differentiating follicular neoplasms from non-neoplasms, because CNB can obtain tissue samples containing nodular tissue, the fibrous capsule of the nodule, and the extranodular tissue [31]. However, in our study, the diagnostic performance of CNB to diagnose neoplasm did not differ from that of FNA. Also, our results showed that it is still difficult to differentiate AH (which lacks a well-defined, complete fibrous capsule) from FN (a completely encapsulated lesion) using CNB, concordant with several previous studies [32, 33]. The CNB readings in which AH and FN could not be differentiated accounted for 8.3 % (footnote of Table 1; 12/144) of total readings, and the neoplasm rate and malignancy rate of these readings were 83.3 % (footnote of Table 2; 10/12) and 33.3 % (4/12), respectively. CNB cannot be used to distinguish between follicular carcinoma and follicular adenoma, because the pathologist might not be able to review the whole nodule for invasion through the capsule [16, 33].

The main concerns with performing CNB are safety problems and complications [16, 32, 34]. The most common complications are bleeding and haematoma [32]. Although the reported complication rates of CNB are low, ranging from 0.5 % to 1.0 % [32], and although reported patient discomfort levels and tolerability are similar between FNA and CNB [35], CNB can lead to severe and critical complications such as injury to the carotid artery, trachea or adjacent nerves. An iatrogenic arteriovenous fistula can occur after CNB, causing tinnitus [34]. Because it uses a larger needle, CNB has a greater potential for serious and permanent complications than FNA, whose complications are transient. Also, CNB may be technically unfeasible or difficult to perform in certain cases, especially in nodules in close proximity to the carotid artery or trachea or in nodules located at the posterior margin of the thyroid [9, 16]. While FNA is relatively safe and feasible even when performed by less-experienced operators, CNB should be performed by experienced radiologists who are familiar with the radiologic features of important anatomic structures of the cervical region to avoid major complications [16].

We acknowledge several limitations. First, there may be a selection bias. From the beginning of the study, thyroid nodules without surgery were not included because they did not have gold standard results. This may have contributed to the high malignancy rates of the non-diagnostic, benign and AUS/FLUS categories. Also, a comparison of the inconclusive rates (i.e. non-diagnostic plus AUS/FLUS) between FNA and CNB was inappropriate with our study population. Second, we did not perform simultaneous FNA and CNB on thyroid nodules, unlike the previously mentioned studies [6, 9, 18–21]. Instead, we chose to compare the diagnostic performances of the two methods by establishing a matched cohort. Third, the pathologic interpretation of CNB lacks standardized diagnostic categories. Recently, the Korean Endocrine Pathology Thyroid Core Needle Biopsy Study Group suggested a guideline for the standard pathology reporting system of CNB [36]. Using a standard pathology reporting system would help clinicians properly manage patients with CNB, and help them accurately evaluate the diagnostic performances of CNB. Fourth, the majority of FNA (83.3 %) and CNB (93.1 %) procedures were performed at other hospitals, although all slides from other hospitals were re-reviewed by our cytopathologists. Inter-hospital and inter-performer variability might have occurred. Fifth, CNB has been reported to be helpful in the specific diagnosis of malignancies other than papillary thyroid carcinoma (PTC), such as lymphoma or anaplastic carcinoma [37], but an analysis was not performed on this issue because of the small number of malignancies other than PTC. Our results cannot be generalized to thyroid malignancies other than PTC, because most of the malignancies included in our study were PTC (98.9 %, 3065 of 3099).

In conclusion, FNA showed comparable diagnostic performance to CNB; therefore, there may be no benefit in performing CNB to diagnose papillary thyroid carcinoma and neoplasm.