Introduction

Fine-needle aspiration (FNA) biopsy is the most accurate and reliable diagnostic test available for the evaluation of a thyroid nodule. However, 20–30 % of FNA results are indeterminate or suspicious, and of those resected, 10–40 % are confirmed to be malignant on final pathology [1–3]. To improve upon the diagnostic accuracy of FNA, ancillary molecular tests have emerged to help distinguish preoperatively between benign and malignant nodules. However, the clinical utility of these tests and their implications for optimal patient management are not well established. This review focuses on the efficacy of these molecular markers in thyroid nodule diagnosis, specifically on when a marker or panel might provide added benefit and how the results might be incorporated into clinical decision-making.

Accuracy of FNA

Although FNA is the gold standard for diagnosis of a thyroid nodule, its accuracy and reproducibility vary considerably, mainly because cytologic interpretation is quite subjective. To address this, the 2007 National Cancer Institute Thyroid FNA conference proposed the Bethesda System for Reporting Thyroid Cytopathology (TBSRTC) in an attempt to standardize diagnostic terminology and improve the clinical utility of FNA. This six-tiered system comprises the following diagnoses: nondiagnostic, benign, atypia of undetermined significance (AUS), follicular neoplasm or suspicious for follicular neoplasm (FN/SFN), suspicious for malignancy (SFM), and malignant [4]. The proposed risk of malignancy for each indeterminate and suspicious category is as follows: AUS, 5–15 %; FN/SFN, 15–30 %; and SFM, 60–75 %. Based on these risks, the recommended management is repeat FNA for AUS; surgical lobectomy for FN/SFN; and total thyroidectomy or lobectomy for SFM. However, these rates of malignancy are not consistent across clinical practices, which challenges these clinical recommendations [5]. For example, we and others have demonstrated the risk of malignancy associated with AUS to be as high as 39 %, and thus our group recommends surgery as opposed to repeat FNA [5–8]. Furthermore, significant intra- and interobserver variation in cytological diagnosis also occurs. When 3885 outside thyroid cytological specimens were rereviewed at our institution, the diagnosis changed 32 % of the time [9]. Thus, despite the TBSRTC, substantial variation in cytological diagnoses persists, which further emphasizes the need for ancillary, more definitive diagnostic testing. Research over the past decade suggests that molecular markers may add diagnostic value to an indeterminate or suspicious FNA biopsy.

The Gold Standard in Diagnosis

The gold standard in the diagnosis of a thyroid nodule is histopathology, and thus the accuracy of FNA cytology or any molecular test is judged against the final histologic diagnosis. However, significant intra- and interobserver variation also exists in making histopathologic diagnoses, and studies have reported disagreement rates as high as 21 % [10, 11]. This variation most typically arises when evaluating follicular lesions. Although histopathologic definitions exist, distinguishing follicular variant of papillary thyroid cancer (FVPTC) from follicular carcinoma or a follicular adenoma can be difficult, especially if nuclear features of papillary carcinoma are not well developed or are only focally present. The absence of clear diagnostic criteria for FVPTC has led to an overcalling of this malignant diagnosis. Furthermore, the lack of consensus on the definition of capsular invasion makes it difficult to distinguish a benign adenoma from a follicular carcinoma. Therefore, subjective variation in the diagnoses of follicular lesions complicates the evaluation of a molecular test for an indeterminate thyroid nodule, as the measured accuracy of the test relies on an accurate and consistent histologic diagnosis.

Molecular Markers

Rapidly Accelerated Fibrosarcoma Isoform B

Rapidly accelerated fibrosarcoma isoform B (BRAF) is one of the three RAF paralogs (ARAF, BRAF, and CRAF) and is the most potent activator of the mitogen-activated protein kinase (MAPK) pathway [12]. BRAF mutation is one of the most common protein kinase gene mutations in human malignancies: it is found in 7 % of all cancers and is the most studied molecular marker in thyroid cancer [13, 14]. It occurs in 27.3–87.1 % of papillary thyroid cancers (PTC), 35 % of FVPTC, and 25 % of anaplastic thyroid cancers [15–20]. BRAF mutation is not, however, present in pure follicular thyroid cancers, medullary thyroid cancers, or benign tumors. Given the high prevalence of this mutation in PTC, BRAF has been widely investigated to determine whether its detection can improve upon the diagnostic accuracy of indeterminate thyroid FNA. Most studies demonstrate that, although BRAF testing is highly specific (100 % specificity), it has relatively low sensitivity, ranging from 15 to 45 % for indeterminate/suspicious nodules [21–24]. BRAF V600E mutation testing therefore fails to detect a high proportion of malignant lesions with initially indeterminate or suspicious cytology, an important consideration when examining the efficacy of BRAF testing in thyroid nodule diagnosis [25].

Mutation/Rearrangement Panel

Over the last three decades, multiple mutations and chromosomal translocations have been identified in thyroid cancer. More than 70 % of PTCs carry mutually exclusive mutations or chromosomal translocations in genes that activate the MAPK or the PI3 kinase/AKT signaling pathways; these include BRAF and RAS mutations and RET/PTC and TRK rearrangements (Fig. 11.1). Similarly, mutations in the RAS gene or rearrangement of PAX8/PPARγ have been detected in 70–75 % of follicular carcinomas [26]. Because of the limited diagnostic utility of a single molecular marker, a somatic mutation panel including BRAF, RAS, RET/PTC, and PAX8/PPARγ rearrangement was evaluated to predict the likelihood of malignancy in a thyroid nodule. Most recently, The Cancer Genome Atlas (TCGA) program, sponsored by the NCI and NIH, reported a comprehensive analysis of approximately 500 PTCs. The study revealed two distinct PTC subtypes, one primarily BRAF-like and another primarily RAS-like, with the RAS-like tumors having more follicular features [27]. Additional molecular changes, including copy number variation, chromosomal translocations, and other less frequent alterations, have been identified, the majority of which are mutually exclusive. This comprehensive study sets the stage for a possible future histologic reclassification of PTCs.

Fig. 11.1
The MAPK and PI3K-AKT pathways. Dysregulation of the MAPK or the PI3K-AKT pathways is involved in thyroid carcinogenesis. The MAPK pathway is frequently activated in thyroid cancer via point mutations of BRAF and RAS genes (or chimeric fusion proteins RET/PTC), and the PI3K pathway is frequently activated via point mutations of PIK3CA and mutation/deletion of PTEN. RAS rat sarcoma, BRAF rapidly accelerated fibrosarcoma isoform B, MAPK mitogen-activated protein kinases, MEK mitogen-activated protein/extracellular signal-regulated kinase kinase, ERK extracellular signal-regulated kinases, PI3K phosphatidylinositol 3-kinase, PTEN phosphatase and tensin homologue, mTOR mammalian target of rapamycin, PPAR-γ peroxisome proliferator-activated receptor gamma

Nikiforov et al. performed one of the earliest studies evaluating the feasibility and role of a mutation panel [28]. They prospectively correlated cytology, mutational status, and either surgical pathology or follow-up for an average of 34 months in 470 FNA specimens, of which 51 samples had indeterminate cytology. All mutation-positive cases of AUS, FN/SFN, and SFM were malignant at surgery, and therefore the panel had 100 % specificity. However, the sensitivity and accuracy for these 51 samples were both 100 % for the 21 AUS samples; 75 % and 87 % for the 23 SFN samples; and 60 % and 71 % for the 7 SFM samples, respectively. The authors concluded that the panel improved on the diagnostic accuracy of cytology alone, as the cancer probability for indeterminate cytology increased to 100 % with a positive molecular test result, and those patients would therefore be strong candidates for a total thyroidectomy.

Subsequently, a large multi-institutional prospective analysis of 513 consecutive thyroid FNA samples with indeterminate or suspicious cytology demonstrated high specificity and positive predictive value for this panel [29]. The risk of malignancy with any mutation detected was 88 % for the AUS category, 87 % for SFN, and 95 % for the category of SFM. However, the risk of malignancy of a nodule that had no mutation was 6 % for AUS, 14 % for SFN, and 28 % for SFM. Although the specificity was greater than 96 % in the indeterminate categories, the sensitivity ranged from 57 to 68 % (Table 11.1).

Table 11.1 Summary of sensitivity, specificity, and accuracy of different molecular markers in indeterminate thyroid nodules

Cantara and colleagues evaluated the impact of somatic mutations, including BRAF, RAS, RET, TRK, and PAX8/PPARγ, on cytology in 235 thyroid nodules [30]. Cytology alone had a sensitivity of 59 %, a specificity of 94.9 %, and an accuracy of 83 %. With the addition of molecular testing, the sensitivity increased to 89.7 %, specificity was 94.9 %, and accuracy was 93.2 %. The addition of molecular markers in this study improved sensitivity and accuracy, but added nothing to the specificity of cytology. The authors included 87 nodules with benign cytology in addition to 53 nodules with inadequate cytology. Thus, the added benefit for indeterminate nodules specifically is difficult to determine from this study. From these studies one can conclude that although the somatic mutation panel is highly specific, its main limitation is sensitivity. Without incorporation into a decision analysis tool, its overall clinical utility remains unclear, especially given the tremendous variability in both cytologic and pathologic diagnosis from one pathologist to another and from one institution to another.

Afirma®

An alternative approach for classifying indeterminate nodules is the commercially available gene expression classifier (GEC) panel, Afirma® (Veracyte). It measures expression of 142 genes representing well-known cancer biologic pathways. In contrast to the somatic mutation panel and BRAF testing, which are both positive predictors of malignancy, this test was designed to improve the negative predictive value (NPV) and in turn reduce or eliminate the need for diagnostic surgery. The company, Veracyte, requires two sets of FNA samples, one for cytological evaluation and the other for gene expression profiling. The second sample only undergoes GEC if the cytology is read as AUS or FN/SFN.

Two large prospective studies were the first to evaluate this test. In a preliminary study, Chudova and colleagues [31] measured more than 247,186 transcripts in 315 thyroid nodules to create a molecular panel to distinguish benign and malignant thyroid nodules. An algorithm, the Afirma GEC, was generated to identify nodules as benign or suspicious and was tested using an independent set of 24 indeterminate FNA samples. The NPV and specificity of this test were estimated to be 96 % and 84 %, respectively.

Subsequently, in an industry-sponsored prospective, multicenter study of 265 nodules with indeterminate cytology, Alexander and colleagues validated the clinical utility of this algorithm [32]. In this study, thyroidectomy was performed on the basis of the clinical judgment of the treating physician at each site without knowledge of the GEC test results. Histopathologic diagnosis was rendered by a central panel of blinded academic endocrine pathologists and served as the reference standard for clinical validation. Of the 265 nodules, 85 (32 %) were malignant. For each Bethesda category, the sensitivities were as follows: AUS, 90 %; FN/SFN, 90 %; and SFM, 94 %; whereas the specificities were lower: AUS, 53 %; FN/SFN, 49 %; and SFM, 52 %. The NPV for each indeterminate category was as follows: AUS, 95 %; FN/SFN, 94 %; and SFM, 85 %. The overall sensitivity for indeterminate nodules was 92 %, and the specificity was 52 %. The overall NPV was 93 %, corresponding to a 7 % risk of malignancy after a benign GEC result, which is similar to the risk associated with benign cytology alone. Based upon this study, approximately half of benign nodules with indeterminate cytology could be identified preoperatively with this test and surgery avoided in this population of patients.
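The relationship between the overall sensitivity, specificity, and predictive values reported above can be checked with Bayes' rule. The following is a minimal sketch, not part of the original study: the function name `predictive_values` is hypothetical, and the inputs are the overall figures quoted in the text (92 % sensitivity, 52 % specificity, 85 malignant nodules out of 265).

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence.

    Hypothetical helper for illustration; inputs are fractions in [0, 1].
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Overall figures quoted in the text for the GEC validation study.
ppv, npv = predictive_values(sensitivity=0.92, specificity=0.52,
                             prevalence=85 / 265)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")  # PPV = 48%, NPV = 93%
```

Under these inputs the NPV is about 93 %, that is, a residual malignancy risk of about 7 % after a benign result, while the PPV is below 50 %. Because both values depend on prevalence, the same test would yield a lower NPV in a practice with a higher baseline rate of malignancy.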

The above studies did not distinguish Hürthle cell-rich nodules from other types of indeterminate nodules. Several recent small studies using this test in routine clinical practice have differentiated this subset, noting a difference in GEC results [33–35].

Lastra and colleagues retrospectively examined a cohort of 132 indeterminate nodules that had Afirma® testing [33]. They reported that the test classified only 8 of 25 (32 %) cases with the cytologic diagnosis of follicular neoplasm with oncocytic features (FNOF) as benign, whereas 45 of 68 (66 %) cases of AUS and 17 of 39 (44 %) cases of FN were read as benign [36]. Forty-eight patients with suspicious Afirma® results underwent surgery, and 11 of 13 (85 %) with FNOF had benign histopathology compared to 7 of 18 (39 %) with AUS and 8 of 17 (47 %) with FN. McIver et al. reported that only 1 of 13 (8 %) nodules with Hürthle cell predominance was read as benign by Afirma® and only 2 of the 12 read as suspicious by Afirma® were malignant on final pathology [35]. Harrell and Bimston retrospectively reviewed Afirma® results of 58 indeterminate nodules, of which 20 were read as benign by the GEC [34]. They noted that 21 of the 58 FNA samples had a predominance of Hürthle cells. Of those, Afirma® read 2 as benign and 19 as suspicious, and yet only 35 % were malignant on final pathology. Thus, Afirma®-suspicious Hürthle cell-rich lesions had a low rate of malignancy on surgical follow-up. Although these studies have small numbers of patients, they call into question the performance of Afirma® in Hürthle cell-rich lesions, since the majority will be suspicious on Afirma® testing but benign on final histopathology.

In summary, because of its low specificity, Afirma® correctly identifies approximately half of the benign nodules with indeterminate cytology (true negatives) but mistakenly reports the other half as suspicious (false positives). Several studies suggest that many of these false-positive results arise from Hürthle cell-rich lesions, although larger studies are needed to confirm this finding, especially since the test is not marketed with this distinction in mind.

Next-Generation Sequencing

Although the Afirma® and somatic mutation panels offer some improvement on cytological diagnosis, the ability to preoperatively identify a cancer needs further refinement. The mutation panel relies on the automated Sanger method for genetic sequencing analysis, the dominant method over the past several decades [37, 38]. Recently, next-generation sequencing (NGS) was introduced to enable simultaneous sequencing of multiple genes (targeted sequencing), with as little as 5–10 ng of DNA, in a more cost-effective manner [39–42]. Additionally, NGS can perform whole-genome sequencing, whole-exome sequencing, and whole-transcriptome sequencing [42]. As a result, this method can detect mutations with higher sensitivity in small tissue samples that would otherwise have been excluded because of quantity limitations [39].

Nikiforova and colleagues used NGS to expand the diagnostic mutational panel from 4 to 12 cancer genes. The targeted NGS panel (ThyroSeq v1) included BRAF, RAS, PIK3CA, TP53, TSHR, PTEN, GNAS, CTNNB1, and RET and was performed on 228 DNA samples, which consisted of 105 snap-frozen tissue samples; 72 formalin-fixed, paraffin-embedded tissue samples; and 51 FNA samples. Molecular profiles for the common types of thyroid cancer with point mutations were generated. NGS identified mutations in one of 12 cancer genes in 99 of 145 (68 %) malignant samples. The panel identified mutations in 70 % of PTCs, 83 % of FVPTCs, 78 % of conventional FTCs, 39 % of oncocytic follicular carcinomas, 30 % of poorly differentiated thyroid carcinomas, 74 % of anaplastic thyroid carcinomas, and 73 % of medullary thyroid carcinomas [42]. In contrast, only 6 % of benign nodules were mutation positive. This NGS panel was then modified to create ThyroSeq v2, which detects mutational hotspots in an additional gene, the telomerase reverse transcriptase (TERT) promoter, and 42 types of gene fusions that occur in thyroid cancer. In another study, Nikiforov et al. evaluated 143 consecutive FNA samples with a cytologic diagnosis of FN/SFN from patients with known surgical outcomes [13]. On final histologic analysis, 104 nodules were benign and 39 were malignant. The ThyroSeq v2 NGS panel had 90 % sensitivity, 93 % specificity, a PPV of 83 %, an NPV of 96 %, and 92 % accuracy. The authors concluded that this broad NGS panel provides a highly accurate method to preoperatively identify malignant nodules.
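The five performance figures quoted for ThyroSeq v2 all derive from a single 2 × 2 table of test result against histology. As an illustration only, the sketch below computes them from cell counts. The text reports just the totals (39 malignant and 104 benign nodules); the split into 35 true positives, 4 false negatives, 97 true negatives, and 7 false positives is an assumption inferred here because it reproduces the reported percentages, and the function name `metrics_from_counts` is hypothetical.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Compute standard diagnostic accuracy measures from a 2 x 2 table."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # malignant nodules called positive
        "specificity": tn / (tn + fp),   # benign nodules called negative
        "ppv": tp / (tp + fp),           # positive calls that are malignant
        "npv": tn / (tn + fn),           # negative calls that are benign
        "accuracy": (tp + tn) / total,   # all correct calls
    }


# ASSUMED cell counts (not stated in the text), chosen to match the reported
# totals of 39 malignant and 104 benign nodules.
thyroseq_v2 = metrics_from_counts(tp=35, fp=7, fn=4, tn=97)
for name, value in thyroseq_v2.items():
    print(f"{name}: {value:.0%}")
```

With these assumed counts the output is 90 % sensitivity, 93 % specificity, 83 % PPV, 96 % NPV, and 92 % accuracy, which agrees with the published summary figures. The same helper can be used to sanity-check any of the other studies in this chapter when the underlying counts are available.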

Le Mercier and colleagues used NGS to retrospectively analyze mutations in 50 genes in 34 indeterminate FNA samples. The histological diagnoses were benign in 27 cases and malignant in 7 cases (3 PTCs, 3 minimally invasive follicular cancers, and 1 follicular tumor of uncertain malignant potential). The authors classified results as molecular test positive, a subgroup with a 63 % risk of malignancy, or molecular test negative, a subgroup with an 8 % risk of malignancy. The sensitivity of this test was 71 %, and the specificity was 89 %, with a PPV and NPV of 63 % and 92 %, respectively, and an accuracy of 85 % [38]. Although the authors concluded that NGS was feasible and may improve the diagnostic accuracy of FNA biopsy, the low sensitivity of this test suggests that further refinement of the panel is necessary before it can be clinically useful.

The well-known association between multiple gene mutations and thyroid cancer and the ability of NGS to detect multiple mutations by analyzing a very small amount of DNA that can be obtained from preoperative FNA raise the hope of development of a sensitive and accurate method to improve the preoperative diagnosis of thyroid cancer. These promising preliminary findings of NGS warrant further investigation with larger prospective studies that carefully evaluate their true clinical utility.

MicroRNA

MicroRNAs (miRNAs) are short, noncoding, single-stranded RNAs of 19–23 nucleotides (Fig. 11.2) that were initially described in studies on Caenorhabditis elegans in 1993 [14]. They are present in both tissue and the circulation and regulate a number of cellular processes by either upregulating or silencing target genes [43]. The tissue specificity of miRNAs and the stability of circulating miRNAs make them suitable candidates as diagnostic markers of malignancy [44]. Although the exact mechanism is unclear, recent studies have reported dysregulation of several miRNAs in thyroid carcinoma [45, 46], and investigations of miRNA expression patterns in PTC, FTC, and FVPTC compared with benign tissue have identified several differentially expressed miRNAs [13, 45–49]. However, only a few studies have examined the diagnostic utility of these miRNA panels for an indeterminate FNA [50].

Fig. 11.2
miRNA synthesis and function. miRNAs are small nonprotein-coding single strand RNAs that bind to the untranslated regions of target mRNAs to regulate their translation and stability

One of the earliest studies by Nikiforova et al. investigated the differential expression of a panel of seven miRNAs (miR-187, miR-222, miR-221, miR-146b, miR-155, miR-224, and miR-197) in 60 resected thyroid nodules and then validated their results on 62 FNA specimens [46]. Only 13 patients in the FNA validation group underwent surgery based on atypical cytology (eight patients), malignant cytology (four patients), or clinical suspicion (one patient). On histopathology eight were malignant nodules and five were benign hyperplastic nodules. They found that a twofold upregulation of at least one of these miRNAs was associated with a sensitivity, specificity, and accuracy in diagnosing cancer of 88 %, 94 %, and 95 %, respectively. However, a subgroup analysis of indeterminate FNAs was not performed, likely due to the small sample size.

As the largest miRNA study in indeterminate thyroid nodules to date, Keutgen and colleagues derived a predictive model for an miRNA panel with 101 indeterminate thyroid lesions (29 indeterminate thyroid FNAs and 72 independent validation FNAs) [51]. After model selection, a panel of four miRNAs (miR-222, miR-328, miR-197, and miR-21) was validated on 72 consecutive indeterminate thyroid FNAs, of which 22 were malignant on final pathology. The model correctly classified 65 of the 72 samples, with 100 % sensitivity, 86 % specificity, and 90 % overall accuracy for differentiating malignant from benign thyroid lesions. Of the seven incorrectly predicted lesions, five had a diagnosis of Hürthle cell neoplasm on FNA. After excluding all Hürthle cell lesions, performance of the model improved with a specificity of 95 % and overall accuracy of 97 %. Again, this questions the predictive value of molecular panels in Hürthle cell-rich lesions, which are also one of the main diagnostic challenges for cytologists.

In another study, Kitano and colleagues evaluated expression of miR-7, miR-126, miR-374, and let-7g in 95 FNA samples, of which 31 had indeterminate cytology. From these data they created a thyroid malignancy prediction model [52]. Validation in 59 samples demonstrated downregulation of miR-7 as the only marker that was differentially expressed in malignant thyroid lesions. Overall, miR-7 was 100 % sensitive, 29 % specific, had a PPV of 36 %, NPV of 100 %, and an overall accuracy of 76 %. Subgroup analysis of the 21 indeterminate samples in the validation cohort revealed a sensitivity of 100 %, specificity of 20 %, PPV of 25 %, NPV of 100 %, and overall accuracy of 37 %. Given the high NPV of miR-7, the authors concluded that a patient with a benign miR-7 result could be followed instead of undergoing diagnostic thyroidectomy.

Shen et al. measured the expression of eight miRNAs (miR-146b, miR-221, miR-187, miR-197, miR-346, miR-30d, miR-138, and miR-302c) in 60 indeterminate, suspicious, or malignant FNAs [53]. Evaluation of a validation set of 68 samples confirmed the diagnostic role of four miRNAs (miR-146b, miR-221, miR-187, and miR-30d) in the differentiation of benign from malignant lesions with a sensitivity of 88.9 %, specificity of 78.3 %, and accuracy of 85.3 %. After subgroup analysis of 30 cases with atypia, the diagnostic accuracy dropped to 73.3 % with a sensitivity and specificity of 63.6 % and 78.9 %, respectively. This group noted that while their panel of miRNA could accurately identify PTC, it was inaccurate for follicular tumors, which unfortunately generally comprise the majority of indeterminate FNAs.

Dettmer et al. evaluated the role of miRNA expression in differentiating conventional FTC (cFTC) from oncocytic FTC (oFTC) [54]. They found that a novel miRNA, miR-885-5p, was upregulated (>40-fold) in oFTC, but not in cFTC. A classification and regression tree algorithm applied to an additional 19 indeterminate FNA samples demonstrated that three dysregulated miRNAs (miR-885-5p, miR-221, and miR-574-3p) could differentiate follicular thyroid carcinomas from benign hyperplastic nodules with 100 % diagnostic accuracy. Although the sample of indeterminate lesions was small, this study introduced an miRNA panel that may accurately discriminate between follicular carcinomas and hyperplastic nodules.

Several of the above studies are promising. Further larger prospective studies, however, are needed to compare various miRNAs and panels to determine a signature for each type of thyroid cancer prior to clinical application.

Surgical Decision-Making

Although molecular markers may improve upon the diagnostic accuracy of FNA biopsy, their true impact on surgical decision-making remains unclear. In clinical practice, the decision to proceed with surgery and the choice of surgical procedure reflect a multitude of clinical considerations. Often, patient preference or clinical variables, such as nodule size, presence of compressive symptoms, family history, or other risk factors, influence decision-making for an indeterminate thyroid nodule (Fig. 11.3). Furthermore, patients may also have other indications for a total thyroidectomy, which again limits the impact that a molecular marker or panel may actually have.

Fig. 11.3
Surgical management algorithm. FNA fine-needle aspiration, AUS atypia of undetermined significance, SFN suspicious for follicular neoplasm, SHCN suspicious for Hürthle cell neoplasm, SFM suspicious for malignancy, MNG multinodular goiter, US ultrasound, Hx history, FHx family history, TT total thyroidectomy, CLND central lymph node dissection. Reproduced from Han, P.A. (2014) The impact of molecular testing on the surgical management of patients with thyroid nodules. Annals of Surgical Oncology 21(6)

Two studies have evaluated the clinical impact of Afirma® on operative decision-making. The first, a multicenter study of 339 patients with indeterminate cytology (165 AUS, 161 FN, and 13 SFM) who underwent Afirma® testing, evaluated the effect of the GEC on operative decision-making [55]. This study, conducted over a 3-year period, included patients from five academic medical centers. Among the 339 patients, surgery was initially recommended in 4 out of 174 (2 %) patients with a benign GEC, 141 out of 148 (95 %) patients with a suspicious GEC, and 4 out of 17 (34 %) patients with a nondiagnostic result. However, because of other factors such as additional clinical features, loss to follow-up, or patient preference, eventually 11 out of 174 (6 %) patients with a benign GEC and 121 out of 148 (82 %) patients with a suspicious GEC underwent surgery. Of the resected nodules with a suspicious GEC, only 53 (44 %) were malignant. The authors performed an intention-to-treat analysis with the assumption that thyroidectomy is typically recommended for all patients with indeterminate nodules and determined that Afirma® modified care recommendations in 171 of 339 patients (50 %). However, according to TBSRTC, a cytologic diagnosis of AUS does not mandate surgery, which challenges the authors’ conclusion that the test modified clinical decision-making. One must also account for other factors that may have led to surgery, such as compressive symptoms or family history, before one can accurately assess the impact of a molecular marker.

In the second study, Duick and colleagues evaluated the impact of a benign Afirma® test result on the endocrinologist-patient decision to operate on patients with thyroid nodules with indeterminate cytology [56]. This cross-sectional multicenter study, sponsored by Veracyte and involving 51 endocrinologists at 21 different practice sites, demonstrated that a benign Afirma® test result could substantially reduce the percentage of patients managed surgically for an indeterminate thyroid nodule from 74 to 7.6 % [56]. Interestingly, when surgery was performed, hemithyroidectomy was performed twice as frequently as total thyroidectomy, which is the inverse of the trend over the past 25 years. However, because the majority of indeterminate thyroid nodules with a benign Afirma® result were managed nonsurgically, no data are available on final pathology. It is therefore not possible to determine whether nonsurgical management was chosen appropriately. Although the authors concluded that a benign Afirma® result can significantly reduce surgical management of indeterminate thyroid nodules, longer follow-up of the patients who were not operated on is warranted to fully evaluate the impact of a benign Afirma® test and to determine whether these patients eventually required surgery for other indications.

Our group performed a retrospective evaluation of 114 patients who presented for surgical consultation and who had already undergone molecular testing (Afirma®, Asuragen®, BRAF, NRAS, and/or RET/PTC translocation) to determine the effect on surgical decision-making [57]. A surgical management algorithm including cytology, presence of symptoms, size of nodule, history, and clinical features was created by consensus of four thyroid surgeons. Postsurgical pathology analysis was used to determine the appropriateness of the surgical decision and the utility of the preoperative molecular test. Of the 114 patients, 87 (72 %) underwent surgery, and of those 87, test results altered surgical management in only 9 (8 %) patients. Review of final pathology demonstrated that molecular testing resulted in appropriate changes in only three (2 %) patients compared to inappropriate changes in six (5 %) patients [57]. This very low proportion of appropriate changes in surgical management suggests that molecular markers are overused and calls into question their practicality in the management of indeterminate thyroid nodules, particularly in a patient who would otherwise be referred for surgical consultation. It is also concerning that the molecular test resulted in inappropriate surgical management in a greater number of patients than it helped. Therefore, the clinical utility of ancillary molecular tests has yet to be elucidated, and this should be done in the context of a clinical algorithm.

Cost-Effectiveness

The other major concern regarding clinical use of molecular markers is cost-effectiveness. Molecular markers are expensive tests and often not covered by insurance companies. However, their cost could be offset if they can prevent unnecessary surgical interventions and limit diagnostic thyroidectomies. Otherwise, ordering a test that would not alter clinical management would become a financial burden to the patient [25].

Currently available studies on cost-effectiveness of molecular diagnostic markers mostly use hypothetical models to compare costs related to standard of care with and without molecular markers. They suggest that molecular testing for indeterminate cytology may reduce costs mainly because of reduction in either two-stage thyroidectomies for malignant thyroid lesions or unnecessary surgical interventions for benign thyroid lesions [58, 59]. However, with the exception of two studies, none have put them in the context of clinical practice in order to evaluate their true impact, and until this is performed, one cannot assess their efficacy.

Yip and colleagues created a decision tree model for a hypothetical group of patients with a 1 cm or larger solitary thyroid nodule [58]. The model was constructed based on the American Thyroid Association (ATA) guidelines with and without molecular testing using the gene expression panel. The authors found that molecular testing decreased the number of diagnostic lobectomies from 11.6 to 9.7 %, and although it caused an additional diagnostic cost of $5031 for every indicated total thyroidectomy ($11,383), the cumulative cost was still less than performing a lobectomy ($7684) followed by a completion thyroidectomy ($11,954).
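The cost comparison in this model reduces to simple arithmetic on the four dollar figures quoted above. The sketch below is an illustration under one assumed reading of the text, namely that the $5031 of additional testing cost is added to the cost of a single-stage total thyroidectomy and compared with the two-stage pathway; the variable names are hypothetical and the calculation is not taken from the original model.

```python
# Dollar figures as quoted in the text.
ADDED_TESTING_COST = 5_031         # extra diagnostic cost per indicated total thyroidectomy
TOTAL_THYROIDECTOMY = 11_383       # single-stage operation
LOBECTOMY = 7_684                  # first stage of the two-stage pathway
COMPLETION_THYROIDECTOMY = 11_954  # second stage of the two-stage pathway

# ASSUMPTION: the testing cost is simply added to the single-stage operation.
one_stage_with_testing = ADDED_TESTING_COST + TOTAL_THYROIDECTOMY
two_stage_without_testing = LOBECTOMY + COMPLETION_THYROIDECTOMY
difference = two_stage_without_testing - one_stage_with_testing

print(f"Testing + total thyroidectomy: ${one_stage_with_testing:,}")   # $16,414
print(f"Lobectomy + completion:        ${two_stage_without_testing:,}")  # $19,638
print(f"Difference:                    ${difference:,}")                # $3,224
```

Under this reading, the single-stage pathway with testing is about $3224 cheaper per patient who would otherwise need a two-stage operation. The saving applies only to that subgroup, so the net effect across all tested patients depends on how often testing actually changes the initial operation.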

Li et al. also used decision analysis of a hypothetical group and used a Markov model to evaluate the 5-year cost-effectiveness of routine use of Afirma® in patients with indeterminate nodules [59]. They reported a 74 % reduction in thyroid surgery for benign lesions with no increase in the number of untreated cancers. Based on their model, the median cost of current practice was $1453 more than the practice with a molecular test over 5 years ($12,172 vs. $10,719). Not only was the duration of hypothetical long-term follow-up unclear, but also costs associated with frequent ultrasounds and repeat FNAs were unaccounted for. Patients who would eventually undergo surgery due to either growth of a nodule or development of clinical symptoms were not factored into the observation group.

The main limitation of the above studies is that they are based on analysis of hypothetical patient cohorts and are not prospective. Because of this, they may not have considered the multiple clinical factors that significantly influence the management of an indeterminate thyroid nodule in the real world. Furthermore, no study has compared the cost-effectiveness of different molecular markers.

In order to estimate cost-effectiveness of using a diagnostic test, Najafizadeh and colleagues constructed a patient-level simulation model for the diagnosis of thyroid nodules. They measured incremental clinical benefits in terms of quality-adjusted life-years and incremental 10-year costs [60]. They concluded that theoretically a molecular diagnostic test with 95 % sensitivity and specificity should cost less than $1087 per test (including costs of all related procedures such as pathology, physician time, and specimen transport and processing) in order to save quality-adjusted life-years and reduce costs when used as an adjunct to FNA.

To illustrate the existing gap between a commercially available diagnostic test and an ideal test, one can compare Afirma® with the suggested features for a cost-effective diagnostic test: with a specificity of 40–52 %, which is significantly lower than the suggested 95 %, Afirma® costs more than $3350, approximately three times the maximum cost of a cost-effective test.
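The comparison can be made explicit by checking a test's characteristics against the benchmark derived from the simulation model (95 % sensitivity, 95 % specificity, and a cost below $1087). The sketch below is illustrative only: `AssayProfile` and `unmet_criteria` are hypothetical names, treating the benchmark as three independent thresholds is a simplifying assumption, and the Afirma® inputs are the overall sensitivity (92 %) and specificity (52 %) and the price ($3350) quoted in this chapter.

```python
from dataclasses import dataclass


@dataclass
class AssayProfile:
    """Summary characteristics of a diagnostic test (hypothetical container)."""
    name: str
    sensitivity: float
    specificity: float
    cost_usd: float


def unmet_criteria(assay, min_sensitivity=0.95, min_specificity=0.95,
                   max_cost_usd=1087):
    """List which benchmark criteria the assay fails to meet.

    ASSUMPTION: the benchmark is treated as three independent thresholds.
    """
    gaps = []
    if assay.sensitivity < min_sensitivity:
        gaps.append("sensitivity")
    if assay.specificity < min_specificity:
        gaps.append("specificity")
    if assay.cost_usd >= max_cost_usd:
        gaps.append("cost")
    return gaps


afirma = AssayProfile("Afirma GEC", sensitivity=0.92, specificity=0.52,
                      cost_usd=3350)
print(unmet_criteria(afirma))                       # ['sensitivity', 'specificity', 'cost']
print(f"Cost ratio: {afirma.cost_usd / 1087:.1f}x")  # Cost ratio: 3.1x
```

The largest gap is in specificity, with cost roughly three times the benchmark, which is consistent with the comparison made in the text.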

Therefore, currently available molecular markers should target a significantly higher diagnostic power along with a lower cost in order to be considered cost-effective.

Conclusion

Over the past decade, significant progress has been made in the investigation of several molecular markers to further refine the diagnostic role of FNA biopsy and improve the accuracy of preoperative diagnosis of indeterminate thyroid lesions. However, because of the complexity of surgical decision-making processes, the clinical utility and impact of these markers remain unclear. A summary of pros and cons of some of the available molecular markers is presented in Table 11.2.

Table 11.2 Summary of pros and cons of different molecular markers in indeterminate thyroid nodules

Larger prospective comparative studies are still needed to address these questions to determine the optimal molecular test(s) and to identify the exact clinical scenarios in which they will both make a difference clinically and be cost-effective.