Thyroid nodules, a common disorder of the endocrine system, are found in 4% to 8% of the general population by palpation,1 17% to 46% of patients by ultrasound,2 and 50% of patients in autopsy series.2 Only about 5% of thyroid nodules are thought to be malignant, although rates as high as 15% have been reported.3,4 The accurate diagnosis of cancer without resort to thyroidectomy can be a major challenge5,6 and is an important clinical problem because more than 100,000 thyroidectomies are still performed annually in the United States.7

Thyroid nodule diagnostics have sequentially and significantly improved during the last 40 years, beginning with the routine use of fine-needle aspiration (FNA) cytology, which when introduced in the 1980s was associated with a significant decrease in the rate of thyroidectomy and a doubled rate of thyroid cancer in surgical specimens.8 In addition, ultrasound imaging has become a core component of thyroid nodule management, and technologic advances in high-resolution ultrasound have improved the characterization and differentiation of benign from malignant nodules.9

In 2007, the Bethesda System for Reporting Thyroid Cytopathology provided another major innovation by standardizing the reporting of FNA results into six distinct categories: I (nondiagnostic), II (benign nodule), III (atypia of undetermined significance/follicular lesion of undetermined significance [AUS/FLUS]), IV (follicular neoplasm/suspicious for follicular neoplasm [FN/SFN]), V (nodule suspicious for malignancy), and VI (nodule positive for malignancy).10 The rates of malignancy within each Bethesda category vary greatly, with a thyroid cancer (TC) probability of 0% to 3% for benign FNA (BII) compared with 97% to 99% for malignant FNA (BVI).11 However, in the three indeterminate categories (BIII, BIV, BV), which account for approximately 20% of FNA specimens,11,12,13 the rates of malignancy are less distinct, comprising 5% to 15% for AUS/FLUS, 15% to 30% for FN/SFN, and 50% to 75% for suspicious cytology.11 Thus, although current medical and surgical guidelines recommend diagnostic thyroidectomy for most indeterminate thyroid nodules, the final histology for the majority of patients who undergo such surgery still will be benign.6

Molecular testing (MT) is the fourth major advance in the management of thyroid nodules. During the past two decades, several test types were developed to improve the stratification of cancer risk in cytologically indeterminate lesions, with the dual goals of reducing rates of diagnostic surgery and informing about the correct extent of initial surgery. Currently, the major molecular techniques have evolved to allow for sensitive and cost-effective genetic analysis, and such testing can be readily performed on even a small amount of cellular material (i.e., 2.5–25 ng of nucleic acid material) harvested during thyroid FNA (Table 1), and even from stored slides or cytology smears in many cases.14

Table 1 Comparison of commercially available molecular testing platforms

The current MTs use several combinations of genomic sequencing, messenger RNA (mRNA) analysis, and/or microRNA (miRNA) expression analysis of cancer-associated genes, with high diagnostic accuracy documented in multiple studies.15,16,17,18,19,20,21,22 Based on these results, MT was included as an option for the management of indeterminate thyroid nodules in the 2015 guidelines of the American Thyroid Association (ATA) and recommended for use in several settings in the 2020 guidelines of the American Association of Endocrine Surgeons (AAES).5,6

After its analytical validity is established, any new clinical test should then demonstrate both clinical validity (does it perform well?) and clinical utility (is it safe; does it actually help patients; is it cost-effective?).23 The purposes of this review are to provide an overview of the current available MT platforms, to summarize published data on clinical validity, and to highlight recent studies analyzing the clinical utility of MT in the management of thyroid nodules and thyroid cancer.

Current Available Thyroid Molecular Tests

Three MT platforms are commercially available in the United States: the Afirma Gene Sequencing Classifier and Xpression Atlas (GSC & XA; Veracyte, South San Francisco, CA, USA), ThyroSeq version 3 (TSv3; CBLPath, Rye Brook, NY, USA), and ThyGeNEXT and ThyraMIR (Interpace Diagnostics, Parsippany, NJ, USA). With each test type, although limited material is needed for analysis, it is important to note that additional dedicated passes during FNA biopsy often are required for MT.

A comparison of molecular platform methods, breadth, sampling requirements, and published accuracy is provided in Table 1. Although high sensitivity and specificity are characteristics of good diagnostic tests, clinicians also should carefully consider both the intended use of the test and the specific local cancer prevalence, which greatly affect test performance.5 Generally, in a region of low thyroid cancer prevalence (i.e., when the pretest probability of cancer is low) and/or when an MT has high sensitivity, the negative predictive value (NPV) can be high, suggesting that cancer can accurately be excluded. Conversely, when test specificity is high and/or cancer prevalence is high (or when there is another reason for a high pretest probability of cancer), the positive predictive value (PPV) may be high, allowing for an accurate diagnosis of malignancy.6,24
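The dependence of NPV and PPV on prevalence described above follows directly from Bayes' theorem. The short Python sketch below is an illustration only: the function name `predictive_values` and the sensitivity and specificity values (90% and 80%) are hypothetical choices made for the example and are not taken from any of the platforms in Table 1.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given pretest probability.

    All arguments are proportions strictly between 0 and 1.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must be strictly between 0 and 1")

    # Expected fractions of the tested population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical test characteristics, chosen only to show the trend.
    sens, spec = 0.90, 0.80
    for prev in (0.05, 0.15, 0.30, 0.50):
        ppv, npv = predictive_values(sens, spec, prev)
        print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With sensitivity and specificity held fixed, the printed NPV falls and the PPV rises as prevalence increases, which is why the same test can be a reliable "rule-out" tool in one practice setting and a less reliable one in another.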

The clinical validation data for the Afirma GSC and ThyroSeq v3 platforms are summarized in Fig. 1.

Fig. 1 Comparison of clinical validation data for Bethesda III and IV nodules: sensitivity (a), specificity (b), negative predictive value (NPV) (c), and positive predictive value (PPV) (d) for the Afirma GSC and ThyroSeq v3 platforms

We omitted the ThyGeNEXT/ThyraMIR platform from Fig. 1 because the clinical validation data for this platform remain limited.

Importantly, for all three types of thyroid MT, the 2016 introduction of the novel term “noninvasive follicular thyroid neoplasm with papillary-like features” (NIFTP) to describe what was previously classified as noninvasive encapsulated follicular variant papillary thyroid cancer25 likely has modified the interpretation of prior accuracy and validity studies. For definitive diagnosis and management, NIFTP lesions require surgery and commonly are counted together with thyroid cancer in studies.

Clinical Validity of MT Platforms

GEC/GSC

The Afirma GSC test is the most recent version of what used to be called the gene expression classifier (GEC), and both are microarray-based tests designed using a proprietary algorithm based on the mRNA expression pattern of genes selected to identify benign thyroid nodules.26 In 2012, an initial and influential prospective, double-blinded, multicenter validation study of GEC examined 265 samples from 249 patients with indeterminate cytology and demonstrated NPVs of 95% for 129 BIII nodules, 94% for 81 BIV nodules, and 85% for 55 BV nodules.15 The prevalence of malignancy was 24% for BIII nodules and 25% for BIV nodules, and the PPVs were low, at 38% and 37%, respectively. Thus, the high sensitivity and NPV were proposed to allow patients with GEC-negative nodules the informed option to pursue clinical surveillance instead of diagnostic surgery. However, the observed low specificity and PPV meant that a GEC-positive result (termed “suspicious” for this test) still led to diagnostic surgery for many histologically benign lesions.27 Because the likelihood of missed malignancy was too high for BV lesions (15%), GEC was not recommended for suspicious cytology results.

The GSC added mRNA classifiers to identify parathyroid lesions, medullary thyroid cancer, the BRAF V600E mutation, RET/PTC1 and RET/PTC3 fusions, and Hurthle cell lesions.

A clinical validation study of GSC used the same multi-institutional patient cohort as the GEC validation study,15 although with somewhat smaller numbers due to exclusion of BV nodules and insufficient RNA in some samples, to assess GSC performance in 191 BIII or BIV thyroid nodules.28 One sample was assigned no result and excluded from the final analysis after it was deemed to have inadequate follicular content. Of the remaining 190 samples, the sensitivity and specificity were, respectively, 92.9% and 70.9% for 114 BIII nodules, and 88.2% and 64.4% for 76 BIV nodules. The NPV and PPV were respectively 96.8% and 51% for BIII and 95% and 41.7% for BIV nodules.28

In 2019, Endo et al. compared the performances of GSC and GEC, noting that GSC improved specificity and PPV while maintaining high sensitivity and NPV.21 In addition, since 2018, Afirma has also offered testing for a gene and fusion panel termed the Xpression Atlas (XA), which provides information on 905 gene variants and 235 fusions, including clinically relevant alterations in BRAF, DICER1, RAS, ALK, NTRK, and RET. However, TERT-promoter mutations, an important prognostic marker,5,6 are not assessed.

The GSC test does not provide detailed genetic information about the type of detected mutation, which can potentially help in prognostic assessment, systemic therapy, and hereditary syndrome risk.29 To date, a blinded clinical validation study using the proprietary XA platform has not been performed. The results of utility studies examining whether the use of GEC/GSC provides safety, reduces thyroidectomy rates, and/or affords cost efficacy are presented later.

ThyroSeq

In contrast to focusing on genetic profiles seen in benign nodules, the rationale for clinical development of ThyroSeq was to identify alterations associated with malignancy. The earliest iteration was a seven-gene panel (ThyroSeq v0), which used real-time polymerase chain reaction (PCR) to detect point mutations in BRAF V600E/K601E, NRAS codon 61, HRAS codon 61, and KRAS codons 12 and 13, as well as gene rearrangements in RET/PTC1, RET/PTC3, and PAX8/PPARγ. Its use was first reported in 2009 by Nikiforov at the University of Cincinnati,30 and further clinical validation was described in 1056 indeterminate FNA samples from the University of Pittsburgh, which showed that detection of any mutation in an indeterminate nodule increased the risk of cancer from 14% to 87%.31 In the University of Pittsburgh validation study, 479 patients underwent thyroidectomy, which provided a pathologic diagnosis for 513 FNA samples. In the BIII cohort (n = 247), the overall cancer risk was 14%, and MT had a PPV of 88% and an NPV of 95%. In the BIV cohort (n = 214), the overall cancer risk was 27%, and MT had a PPV of 87% and an NPV of 86%. Finally, in the BV cohort (n = 52), the cancer risk was 54%, and MT had a PPV of 95% and an NPV of 72%.31 The seven-gene panel was externally validated.32 Although it helped in the diagnosis of thyroid carcinoma and objectively reduced the rate of thyroidectomy,33 it lacked sufficient sensitivity (range, 57–68%) to allow avoidance of diagnostic thyroidectomy altogether, which limited its clinical utility.

With the advent of high-throughput techniques such as next-generation sequencing (NGS) and availability of data from comprehensive whole-genome sequencing from The Cancer Genome Atlas (TCGA) program,34 in subsequent versions, ThyroSeq was sequentially broadened to include a 12-gene panel (ThyroSeq v1) and a 56-gene panel (ThyroSeq v2).17,35,36 These changes led to improvement in the sensitivity of ThyroSeq v2 for cancer to 90.9% in BIII nodules and 90% in BIV nodules, and the observation of ThyroSeq v2-negative nodules became a proposed management option.35,36 The most recent version (ThyroSeq v3) includes 112 genes and detects five different classes of genetic alterations: 1) mutations; 2) insertions and deletions; 3) gene fusions; 4) gene expression alterations; and 5) copy number alterations.20

In a 2019 prospective, double-blinded clinical validation study of 286 indeterminate (BIII–BV) thyroid nodules from 10 clinical sites,22 ThyroSeq v3 sensitivity and specificity were respectively 91% and 85% for 154 BIII nodules, and 97% and 75% for 93 BIV nodules, with 29 samples that had uninformative results from insufficient biopsy material; the BV cohort accounted for 10 samples. The NPV and PPV were respectively 97.1% and 64% for BIII (cancer/NIFTP prevalence, 28%) and 98% and 68% for BIV lesions (cancer/NIFTP prevalence, 35%).22 An external single-institution experience with ThyroSeq v3 showed a higher NPV for BIII (99.5%) than for BIV (95.4%).37 The current data on ThyroSeq v3 clinical utility, safety, and cost efficacy are presented later.

In general, MT still has some limitations with Hurthle cell predominant lesions, which frequently are placed in a BIII or BIV category cytologically. Hurthle cells are large oxyphilic (pink) cells characterized by prominent nucleoli and abundant mitochondria, which can be present in Hurthle cell carcinoma but are much more commonly present in a variety of benign conditions such as Hurthle cell adenoma, Hashimoto’s thyroiditis, and nodular goiter. Although earlier versions of MT did not have sufficient specificity to decrease the rates of surgical intervention for cytologic Hurthle cell neoplasms, the latest versions of MT (ThyroSeq v3 and Afirma GSC) have improved performance for Hurthle cell lesions.38 The benign or negative call rate of the published experiences with Hurthle cell lesions is 53% to 61% for ThyroSeq v2/v3 and 63% to 89% for Afirma GSC,38 suggesting that unnecessary surgery is avoidable in a majority of cases. It is important to note that the published experience to date is based on limited numbers, and more studies including larger cohorts and longer follow-up periods are needed.

ThyGeNEXT/ThyraMIR

Using methodology and rationale similar to those for ThyroSeq, ThyGeNEXT/ThyraMIR primarily uses sequencing for a targeted gene mutation and fusion panel, and if testing is negative, additional testing for a microRNA gene expression panel also is performed.19,39 The only clinical validation study published for this platform used an earlier iteration termed ThyGenX/ThyraMIR. The study assessed a cohort of 109 BIII/BIV thyroid nodules from 12 clinical sites and reported a sensitivity of 89%, a specificity of 85%, an NPV of 94%, and a PPV of 74%, with a thyroid carcinoma prevalence of 32%. However, in that study, a negative ThyGenX plus a low-risk ThyraMIR result was associated with a relatively high residual risk of 6% for malignancy.39
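The residual risk after a negative result is simply 1 − NPV, and it can also be derived from the pretest probability and the test's likelihood ratios. The Python sketch below illustrates that arithmetic using the sensitivity, specificity, and prevalence reported above for the ThyGenX/ThyraMIR validation cohort. The function names are hypothetical, and because the inputs are rounded published figures, the output should be read as an approximation rather than a reanalysis of the study data.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def negative_likelihood_ratio(sensitivity, specificity):
    """LR- = (1 - sensitivity) / specificity."""
    return (1.0 - sensitivity) / specificity


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert pretest probability to post-test probability via odds."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Rounded figures quoted in the text for the ThyGenX/ThyraMIR cohort.
SENSITIVITY = 0.89
SPECIFICITY = 0.85
PREVALENCE = 0.32

# Probability of cancer despite a negative result (equals 1 - NPV).
residual_risk = post_test_probability(
    PREVALENCE, negative_likelihood_ratio(SENSITIVITY, SPECIFICITY))

# Probability of cancer after a positive result (equals PPV).
ppv = post_test_probability(
    PREVALENCE, positive_likelihood_ratio(SENSITIVITY, SPECIFICITY))

print(f"residual risk after a negative test: {residual_risk:.1%}")
print(f"PPV after a positive test: {ppv:.1%}")
```

Under these inputs the computed residual risk is roughly 6% and the PPV roughly 74%, in line with the published values. The odds form is convenient because the same likelihood ratios can be reapplied to a different pretest probability, for example a local cancer prevalence that differs from the study cohort.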

In a more recent study funded by the manufacturer40 that included 178 BIII-BV nodules, after post hoc exclusion of nearly 40% of the initial study cohort, sensitivity was 97% for BIII (TC prevalence 36%) and 86% for BIV (TC prevalence of 24%). It is not clear why the cancer prevalence rate was higher for the BIII nodules than for the BIV nodules. In addition, performance for BV nodules was not assessed due to a limited number of samples (n = 19) in this cohort. This study reported improvement in diagnostic performance with the microRNA panel, particularly for nodules that were RAS-positive.

Clinical Utility of MT Platforms

Perhaps the most important question about thyroid MT is whether it has a beneficial impact on patient care. Whereas earlier studies that addressed the effect of MT on surgical decision-making for indeterminate nodules either demonstrated a clear benefit33 or seemed to suggest no benefit,41,42 the two studied test types (i.e., ThyroSeq v0 and Afirma GEC) currently are outmoded, due not only to the evolution of the tests themselves, but also to updated clinical guidelines for management of thyroid nodules6,43 and even to the introduction of NIFTP terminology.44 Cytologic findings with the addition of MT may aid the clinician in distinguishing papillary thyroid cancer (PTC) from NIFTP/follicular variant PTC lesions, because the latter have a distinct miRNA profile (43–45) and an increased association with RAS mutations rather than with the BRAF V600E mutation, which is virtually synonymous with PTC. Clinician reluctance to forgo diagnostic surgery for molecular-negative indeterminate nodules has been reported.45,46 Unfortunately, some studies have included patients with BVI (malignant) FNA results, even though MT has no role when its results will not alter clinical management.5

Several small single-institution clinical utility studies of GEC/GSC have been performed under prior and existing clinical guidelines. In 2018, Deaver et al.47 provided a long-term follow-up study for more than 2000 BIII and BIV thyroid nodules. With a malignancy rate for surgically resected BIII nodules of 24.5%, GEC-suspicious nodules had a surgical rate of 78.9% and a malignancy rate of 37.8%. The malignancy rate for all the BIV nodules that underwent surgery, with or without MT, was 20%.

In 2018, Livhits et al.48 compared the diagnostic performance of GEC with that of ThyroSeq v2. They found that ThyroSeq v2 had a higher specificity and allowed more patients (n = 28) to avoid diagnostic thyroid surgery on the basis of a negative molecular result (GEC, 39% vs ThyroSeq v2, 62%). Among the nodules tested with GEC, 49% were suspicious and 43% were benign. Of the nodules tested with ThyroSeq v2, 19% were mutation-positive and 77% were mutation-negative.

In 2019, Wei et al.49 compared GEC with GSC and found that a larger percentage of indeterminate FNA specimens were classified as benign using GSC, especially among samples with oncocytic features.

In 2020, Vora et al.50 evaluated more than 400 thyroid nodules by GEC, and the rate of surgical resection with “suspicious” GEC results was 85%, but the malignancy rate was only 43%. Nearly one fourth (24%) of the patients with benign GEC results underwent surgical resection, with an NPV of 90%.

In 2020, a single-institution study performed under the current clinical guidelines5,6 assessed the clinical utility of reflexive MT for 405 BIV (follicular neoplasm) nodules in 389 consecutive patients managed by ThyroSeq v2/3 (281 v2 vs 124 v3 after November 2017), excluding cytologic Hurthle cell neoplasm.51 This analysis represents the largest real-time utility study of MT to date and also provides the results of nonoperative surveillance. The patients were offered surgery for positive MT, nodule-related symptoms, size greater than 4 cm, hyperthyroidism, and/or concurrent hyperparathyroidism. During programmatic implementation from November 2014 to September 2019, 39% of BIV nodules were molecular-positive. A positive result was associated with much higher use of thyroidectomy (91% for MT-positive vs 27% for molecular-negative nodules; p < 0.001) and a quadrupled rate of histologic thyroid cancer/NIFTP (78% vs 19%). All molecular-negative cancer/NIFTP lesions found on final pathology were low risk and had been assessed using ThyroSeq v2. Importantly, 81% of the molecular-negative BIV patients were triaged to active surveillance, and during a mean follow-up period of 24.6 months, 82% of their molecular-negative BIV nodules remained stable on ultrasound reevaluation.51 However, although nonoperative surveillance appeared to be safe in the short-term follow-up evaluation, compliance was incomplete. The study was not designed to detect whether molecular use in BIV nodules affects the extent of initial thyroidectomy under current management guidelines (lobectomy vs total thyroidectomy).

In 2020, Guan et al.52 reported that use of ThyroSeq v2/3 (546 v2 and 34 v3 patients) was associated with a threefold higher rate of malignancy for 58 RAS-positive BIII/BIV nodules and a fivefold decrease in the surgery rate when MT was negative in 233 patients. A very recent clinical utility study from Canada showed that application of ThyroSeq testing to 50 indeterminate nodules led to a 54% decrease in the rate of diagnostic surgery.53 Zhu et al.54 studied trends in the surgical management of thyroid cancer and found that early adoption of MT was a factor in decreasing the rate of diagnostic thyroidectomy from 67.3% down to 35.5%.

Two significant cost-efficacy studies have compared diagnostic thyroid lobectomy for cytologic indeterminate nodules with several types of thyroid MT. In 2019, we used a hypothetical model to perform cost-effectiveness analysis under the current national management criteria5,6 and demonstrated significantly improved cost efficacy with the use of either GSC or TSv3 versus routine diagnostic lobectomy, and the results remained consistent regardless of the length of surveillance.55 In 2020, a decision analysis by Zanocco et al.56 using Markov modeling showed that Afirma GEC compared with diagnostic lobectomy can be cost-effective for cytologically indeterminate nodules with intermediate or low ATA or sonographic suspicion for malignancy, but not for those with high sonographic suspicion.

Prognostic Value of Thyroid MT

In addition to providing diagnostic information, preoperative MT can offer prognostic value for patients with suspected or known thyroid malignancy because different mutations often are associated with specific thyroid cancer subtypes and can inform long-term management.57 For example, whereas RAS-like mutations typically are more indolent in the absence of a secondary mutation such as TERT or TP53, BRAF-like mutations are associated with lymph node metastasis and/or aggressive histologic subtypes such as tall cell variant.57,58 Furthermore, when late-hit mutations such as TP53, PIK3CA, or TERT are seen, the risk of aggressive disease, including disease recurrence and distant metastasis, is significantly higher.58,59 Because either lobectomy or total thyroidectomy currently is an acceptable choice for intrathyroidal differentiated thyroid cancer (DTC) 1 to 4 cm,6 nodules with isolated RAS mutations may be adequately treated with lobectomy alone or may be candidates for active surveillance, whereas a patient with a BRAF V600E mutation may benefit from total thyroidectomy. However, determining the role of MT in guiding the extent of thyroid surgery remains a controversial and active area of investigation with ongoing clinical trials (NCT02947035).

Conclusion

Molecular testing for indeterminate thyroid nodules has been in clinical use for more than a decade. As our understanding of thyroid tumors and their genetic alterations has evolved, and as the technical parameters of testing have improved, MT provides a safe and cost-effective strategy that decreases the rate of diagnostic thyroidectomy in the management of indeterminate nodules. Additionally, MT can provide valuable prognostic information in a preoperative setting and may safely guide clinical management.

Acknowledgments

The authors thank Dr. Linwah Yip for her generosity and expert assistance during the manuscript preparation process. Dr. Carty gratefully acknowledges support from the William and Susan Johnson Fund for Endocrine Surgery Research.