Introduction

Thyroid cancer is the most common endocrine malignancy, and its incidence continues to rise worldwide [1]. Differentiated thyroid cancer (DTC), which includes papillary and follicular cancers, comprises the majority (90%) of all thyroid cancers [2]. In general, DTC has a favorable prognosis with excellent survival rates. However, a minority of patients with DTC develop locoregional recurrence, including cervical lymph node metastases, which eventually leads to death in some of these patients [2]. Therefore, tailoring treatment to the expected prognosis of each patient with thyroid cancer is critically important.

DTC management guidelines are valuable instruments for conveying state-of-the-art data and clinical best practices. However, these documents are generally formulated by experts from “world centers of excellence” whose resources are not widely available elsewhere. Unsurprisingly, a 2013 systematic review [3] of 10 recent DTC guidelines identified real-world applicability as their greatest weakness.

The American Joint Committee on Cancer/Union for International Cancer Control (AJCC/UICC) staging system is recommended for all patients with DTC based on its utility in predicting disease mortality and its requirement for cancer registries [2]. However, the previous AJCC/UICC staging system did not adequately predict the risk of recurrence in DTC [4]. Therefore, the 2009 version of the American Thyroid Association (ATA) thyroid cancer guidelines proposed a three-tiered risk stratification system (ATA 2009-RSS) that classified patients as having a low, intermediate, or high risk of recurrence [5]. Recently, both the AJCC/UICC staging system and the ATA DTC management guidelines and RSS were revised (AJCC 8th edition and ATA 2015-RSS) [2, 6] from their previous versions (AJCC 7th edition and ATA 2009-RSS) [5, 7]. Therefore, further studies are needed to determine the best guidelines or staging system for predicting DTC progression and recurrence. To address this question, we compared how well these four staging systems predict patient outcome using data from The Cancer Genome Atlas (TCGA).

Materials and methods

Data acquisition and characteristics

The primary and processed data were downloaded from the Genomic Data Commons Data Portal (https://gdc-portal.nci.nih.gov/) in January 2017. All TCGA data were available without restrictions on publications or presentations according to the TCGA publication guidelines. We downloaded somatic mutation data and clinical information, which were last updated in May 2016. Of the 509 patients obtained, those with samples from metastatic tissues (n = 8), a history of other malignancies (n = 33), a history of neoadjuvant therapy (n = 4), or missing data (n = 11) were excluded. A total of 457 patients were finally included in this study. Patients were categorized according to the 7th [7] and 8th [6] editions of the AJCC/UICC staging system (Tables 1, 2) and the 2009 [5] and 2015 [8] ATA guidelines (Table 3) for thyroid nodules and DTC.
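
The exclusion step above can be sketched as a simple filter. This is only an illustration: the `Patient` record layout, its field names, and the toy cohort are hypothetical and do not correspond to the TCGA clinical file format. The sketch counts each criterion separately, which matters because a single patient can meet more than one criterion, so the per-criterion counts need not sum to the number of patients excluded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    # Hypothetical record layout; real TCGA clinical files use different fields.
    patient_id: str
    metastatic_sample: bool = False
    other_malignancy: bool = False
    neoadjuvant_therapy: bool = False
    missing_data: bool = False


def exclusion_reasons(p):
    """Return every exclusion criterion a patient meets (possibly several)."""
    reasons = []
    if p.metastatic_sample:
        reasons.append("metastatic sample")
    if p.other_malignancy:
        reasons.append("other malignancy")
    if p.neoadjuvant_therapy:
        reasons.append("neoadjuvant therapy")
    if p.missing_data:
        reasons.append("missing data")
    return reasons


def apply_exclusions(patients):
    """Split a cohort into included patients and per-criterion exclusion counts."""
    included = []
    counts = {}
    for p in patients:
        reasons = exclusion_reasons(p)
        if not reasons:
            included.append(p)
        for reason in reasons:
            counts[reason] = counts.get(reason, 0) + 1
    return included, counts


# Toy cohort (not TCGA data): patient "C" meets two criteria at once.
cohort = [
    Patient("A"),
    Patient("B", other_malignancy=True),
    Patient("C", other_malignancy=True, missing_data=True),
    Patient("D", metastatic_sample=True),
    Patient("E"),
]
included, counts = apply_exclusions(cohort)
print([p.patient_id for p in included])
print(counts)
```

In the toy cohort, three patients are excluded while the per-criterion counts add up to four, because patient "C" is counted under two criteria.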

Table 1 Changes between categories of AJCC/UICC 7th and 8th editions
Table 2 Changes between stages of AJCC/UICC 7th and 8th editions
Table 3 Changes between ATA 2009-RSS and 2015-RSS

Statistical analysis

Disease-free survival (DFS) was analyzed according to each staging system: Kaplan–Meier survival plots were generated, and survival curves were compared using the log-rank test. The AJCC/UICC staging system (7th and 8th editions) and the ATA guidelines (2009 and 2015) were compared using four measures derived from Cox proportional hazards models (R packages ‘survival’ and ‘survAUC’): the concordance index (c-index), which quantifies the predictive ability of a survival model [9]; the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which compare the quality of a set of statistical models based partly on the likelihood function and are used for model selection [10]; and the Brier score, which measures the accuracy of probabilistic predictions [11]. A higher c-index and lower AIC, BIC, and Brier score indicated a better model for predicting outcome. Statistical analyses were performed using GraphPad Prism 7 for Mac OS X (GraphPad Software Inc., San Diego, CA, USA) and R statistical software (The R Foundation for Statistical Computing, 2016).
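
For readers unfamiliar with the survival methods named above, the following is a minimal sketch of a Kaplan–Meier estimate and a two-group log-rank test. It is written in plain Python for illustration only; the analyses in this study were run in GraphPad Prism and R, and the function names and toy data below are assumptions, not the study's code or data.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    curve = []
    surv = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - failed / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p-value) with 1 degree of freedom.

    Assumes at least one event time at which both groups still have patients
    at risk; otherwise the variance is zero and the statistic is undefined.
    """
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)
        n_b = sum(1 for u in times_b if u >= t)
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e)
        d_b = sum(1 for u, e in zip(times_b, events_b) if u == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data (not from TCGA): event = 1 means recurrence, 0 means censored.
km_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
chi2, p_value = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [5, 6, 7, 8], [1, 1, 1, 1])
print(km_curve)
print(round(chi2, 3), round(p_value, 4))
```

The quadratic loops are kept for readability; production code would normally rely on an established survival library rather than a hand-written estimator.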

Results

A total of 457 patients with papillary thyroid cancer (PTC), with a mean age of 45.9 years, were included in this study (120 males, 337 females). Among these patients, 43 (9.4%) experienced recurrence/progression during follow-up (591.2 ± 833.5 months). Patients’ characteristics are summarized in Table 4. When patients were divided according to age, a cut-off value of 45 years did not make a difference in DFS (p = 0.4799); however, patients aged 55 years or older showed a worse prognosis (p = 0.0183, hazard ratio 2.03; 1.044–3.948). In addition, BRAF mutation status did not affect the prognosis of patients with PTC (p = 0.2869).

AJCC/UICC staging

According to the AJCC 7th edition, 270 patients were categorized as stage I (59.1%), 45 as stage II (9.8%), 92 as stage III (20.1%), and 50 as stage IV (10.9%). All 270 stage I patients remained in stage I under the AJCC 8th edition. In addition, 39 patients from stage II, 47 from stage III, and 16 from stage IV according to the AJCC 7th edition were recategorized as stage I according to the AJCC 8th edition, giving a total of 372 stage I patients. Approximately 96.7% of the 457 patients were classified as stage I or II by the AJCC 8th edition (Fig. 1a). The weighted kappa between the AJCC 7th and 8th editions was 0.318, indicating fair agreement. Both the 7th (p = 0.0041) and 8th (p < 0.0001) editions of the AJCC/UICC staging system were significantly associated with DFS. However, the DFS curves of stage II and stage III patients overlapped (Fig. 2).
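
The agreement statistic reported above can be illustrated with a short sketch of weighted Cohen's kappa computed from a cross-tabulation of two ordinal classifications. The text does not state which weighting scheme was used, so linear weights are an assumption here (quadratic weights are offered as an option), and the 3 × 3 table is hypothetical toy data, not the TCGA cross-tabulation.

```python
def weighted_kappa(table, weighting="linear"):
    """Weighted Cohen's kappa for a square table of ordinal categories.

    table[i][j] is the number of patients placed in category i by the first
    system and category j by the second. Disagreement weights grow with the
    distance |i - j| between categories.
    """
    k = len(table)
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("table must be square with at least two categories")
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    power = 1 if weighting == "linear" else 2
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (abs(i - j) / (k - 1)) ** power
            observed += weight * table[i][j]
            expected += weight * row_totals[i] * col_totals[j] / total
    return 1.0 - observed / expected


# Hypothetical toy table: rows = older system, columns = newer system.
# Counts below the diagonal represent patients moved to a lower category.
toy_table = [
    [20, 0, 0],
    [8, 4, 0],
    [3, 5, 2],
]
toy_kappa = weighted_kappa(toy_table)
print(round(toy_kappa, 3))
```

A value of 1 means perfect agreement and 0 means agreement no better than chance; extensive downstaging pulls the statistic toward the lower end of that range.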

Fig. 1 Distribution of patients according to (a) AJCC/UICC 7th and 8th editions and (b) ATA 2009-RSS and 2015-RSS

Fig. 2 DFS curves of patients according to the AJCC/UICC staging system: (a) 7th and (b) 8th editions

ATA guidelines

According to the ATA 2009-RSS, 277 patients were classified as low risk (60.6%), 155 as intermediate risk (33.9%), and 25 as high risk (5.5%). Of the 277 low-risk patients, 177 remained in the same risk group when analyzed using the ATA 2015-RSS. However, 74 patients moved from the intermediate- to the low-risk group under the ATA 2015-RSS (Fig. 1b). The weighted kappa between the 2009 and 2015 ATA guidelines was 0.651, indicating good agreement. Both the 2009 (p < 0.0001) and 2015 (p < 0.0001) ATA guidelines significantly predicted DFS. In addition, the DFS curves of the risk groups did not overlap under either the 2009 or the 2015 ATA guidelines (Fig. 3).

Fig. 3 DFS curves of patients according to ATA (a) 2009-RSS and (b) 2015-RSS

Comparing the AJCC and ATA models

The AJCC 8th edition, which showed the highest c-index and the lowest AIC, BIC, and Brier score, was identified as the best of the models compared. The ATA 2009-RSS had the second highest c-index and the second lowest AIC and BIC; however, it had the highest Brier score (Table 5).
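
To make the comparison criteria concrete, the sketch below computes Harrell's c-index from a risk score and the AIC and BIC from a model log-likelihood. It is an illustration under stated assumptions: the stage number is used directly as the risk score (the study fitted Cox models in R instead), the sample-size term in the BIC differs between implementations (number of observations versus number of events), and all data are hypothetical.

```python
import math


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by the risk score.

    A pair is comparable when the patient with the shorter time had an
    observed event. Ties in the risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower is better.

    n_obs is the effective sample size, which some survival software takes
    to be the number of events rather than the number of patients.
    """
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical toy data: event = 1 means recurrence, 0 means censored.
times = [5, 8, 12, 20, 25, 30]
events = [1, 1, 0, 1, 0, 0]
stages_system_a = [3, 2, 2, 1, 1, 1]  # hypothetical staging system A
stages_system_b = [1, 2, 1, 2, 1, 1]  # hypothetical staging system B
c_a = harrell_c_index(times, events, stages_system_a)
c_b = harrell_c_index(times, events, stages_system_b)
print(round(c_a, 3), round(c_b, 3))
```

In this toy example, system A assigns higher stages to the patients who recur earlier and therefore has the higher c-index.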

Table 4 Patients’ characteristics

Discussion

This study showed that, although all four staging systems examined were useful in predicting the risk of recurrence after initial treatment, the AJCC 8th edition was the most accurate in predicting patient outcome (Table 5).

Table 5 Comparison of AJCC/UICC TNM staging systems and ATA guidelines

The marked increase in thyroid cancer incidence has prompted debate about appropriate management strategies for patients with DTC. However, because accurate prognostic indicators or markers for predicting tumor progression and recurrence are lacking, the appropriate management of DTC remains unclear. Over the years, multiple staging systems have been developed to predict the risk of mortality in patients with DTC [12]. However, none of these staging systems has been shown to be clearly superior to the others [2].

The AJCC published the first edition of its staging manual in 1977 and first used an age cut-off of 45 years in the 2nd edition in 1983 [13]. This cut-off has remained in use since then, gaining international acceptance [7]. Several studies have demonstrated that the AJCC/UICC staging system had the highest proportion of variance explained when applied to a broad range of patient cohorts, and it has been validated through retrospective studies and prospective clinical practice [14, 15]. With an age cut-off of 45 years, younger patients are limited to stage II disease in the presence of distant metastasis and stage I in its absence. Although younger patients outperformed older patients in terms of DFS, irrespective of the age cut-off selected, clinical experience revealed that many older patients remained at a low risk for disease-specific death, despite the stage grouping assigned by the AJCC/UICC model [13]. In addition, mounting evidence suggests that an age cut-off of 45 years is too low and that many older patients remain at a low risk for disease-specific death [16–18]. A recent analysis of the survival of patients treated at the Memorial Sloan Kettering Cancer Center between 1986 and 2005 concluded that a change from 45 to 55 years in the current AJCC/UICC model would lead to a significant increase in the number of patients being considered at a lower stage, while maintaining excellent outcomes for those considered to have early-stage disease [19]. Findings from a large multicenter study also showed that increasing the age cut-off in the current AJCC/UICC staging system to 55 years would help avoid overtreatment in 12% of the patients, while improving the statistical validity of the model [13]. Collectively, these findings indicated that changing the cut-point of the AJCC/UICC staging system for DTC from 45 to 55 years was warranted, and the AJCC/UICC model was changed accordingly. 
Our results show that a change in the age cut-off of the AJCC/UICC model would affect a large number of patients, particularly those between the ages of 45 and 54 years. A change in the age cut-off would place all patients in this age category under stage I, unless they were M1 at presentation. Our findings are consistent with those of previous reports regarding this change in the AJCC 8th edition [13, 19]. The overwhelming majority of patients in this group, who had previously been categorized under a more advanced stage, would be restaged to AJCC/UICC stage I. This shift will help avoid overtreatment and its complications/sequelae, while providing relatively conservative cancer treatment with equally good outcomes.
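
The effect of the age cut-off described above can be sketched as a small rule. This is a deliberately simplified illustration that isolates the age rule only: patients below the cut-off are stage I, or stage II if distant metastasis (M1) is present. It ignores the other revisions in the 8th edition, and the function name and the `stage_if_older` parameter (the stage the patient would receive at or above the cut-off) are hypothetical.

```python
def age_based_stage(age, has_distant_metastasis, stage_if_older, age_cutoff):
    """Apply the age rule of the AJCC/UICC scheme for DTC (simplified sketch).

    Patients younger than age_cutoff are limited to stage I (no distant
    metastasis) or stage II (distant metastasis). Patients at or above the
    cut-off keep the stage supplied in stage_if_older.
    """
    if age < age_cutoff:
        return "II" if has_distant_metastasis else "I"
    return stage_if_older


# Hypothetical 50-year-old patient without distant metastasis who would be
# stage III if treated as an older patient.
stage_with_cutoff_45 = age_based_stage(50, False, "III", age_cutoff=45)
stage_with_cutoff_55 = age_based_stage(50, False, "III", age_cutoff=55)
print(stage_with_cutoff_45, "->", stage_with_cutoff_55)
```

Under these assumptions, a 50-year-old patient without distant metastasis moves from the older-patient stage to stage I once the cut-off is raised from 45 to 55 years, which is the pattern of downstaging described for the 45 to 54 age group.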

The 2015 ATA guidelines were published on the basis of recent scientific advances [2]. These guidelines feature a greatly expanded section on risk stratification of thyroid cancer [20]. The most obvious changes in the 2015 guidelines are structural in nature, involving new approaches to risk stratification. There are also a handful of incremental alterations in guidance related to specific clinical features in the new system. Importantly, the definition of “low risk of recurrence” has expanded, most notably because of the inclusion of “small volume” lymph node involvement. This means that patients having five or fewer lymph node metastases, each <2 mm in the central neck, could still be considered to have a low risk of recurrence. On the other hand, the detection of microscopic extrathyroidal extension and BRAF mutation places patients at an intermediate risk of recurrence, at the least. The “high-risk” group remains mostly unchanged. When the ATA 2015-RSS was applied to the TCGA data, 74 patients moved from the intermediate- to the low-risk group because of the modified criterion for the volume of lymph node (LN) metastasis. A similar pattern of migration was shown in previous studies [9, 21]. Unlike those reports [9, 21], the revised ATA 2015-RSS did not show better predictability than the ATA 2009-RSS in this study. Among the four systems, the ATA 2009-RSS was the second best in predicting PTC recurrence. The previous studies had their own limitations. Pitoia et al. applied a provisional version of the RSS rather than the officially revised one in a small number of patients [9], whereas Lee et al. did not include BRAF mutation status during restaging and analysis [21]. In contrast, our analysis included 457 patients with BRAF mutation status and full histologic data. Moreover, molecular markers and clinical risk assessment in DTC have been rigorously investigated, with ample evidence of their prognostic significance. 
However, BRAF mutation status did not affect the prognosis of patients with PTC in this study, consistent with a previous study by Henke et al. [22]. Nevertheless, the 2015 ATA guidelines convey a positive message that mutational analysis of thyroid cancer has the potential to refine risk estimates [20]. These discrepancies may be partly explained by the different study populations and follow-up periods of each study. Further studies should examine the clinical utility of the ATA 2015-RSS. Although the AJCC staging systems were developed to predict disease mortality and the ATA RSSs were built to predict recurrence, the AJCC 8th edition predicted recurrence more accurately than the other staging systems. Because the AJCC 8th edition tended to downstage patients compared with the AJCC 7th edition, 81.4% of patients were included in stage I, leaving fewer patients in stages III and IV. Consequently, a higher percentage of the patients in stage III (18.2%) or IV (75.0%) under the AJCC 8th edition experienced recurrence, which made the AJCC 8th edition the most accurate staging system.

Aside from the initial risk stratification, clinicians should continually re-evaluate the risk of recurrence and prognosis for each patient in real time as clinical data accrue. The 2015 ATA guidelines suggest that clinicians use the response-to-therapy re-stratification system, derived from previous studies, to refine the estimated risk of recurrence based on the effectiveness of initial therapy [4]. This iterative process is formalized under the concept of “dynamic risk stratification (DRS).” DRS comprises simple new terminology to indicate whether a complete or incomplete biochemical (thyroglobulin) or structural (imaging) response to therapy exists. Although data are insufficient to support specific recommendations based on the new DRS response categories, the use of these terms in the literature is expected to increase [20].

This study has some limitations. The TCGA cohort is not a consecutive series. All data were retrospectively collected, which limits the conclusions that can be drawn. In addition, we could not analyze overall survival (OS) because only two patients died from PTC in this study. However, this study is the first to compare the 8th and 7th editions of the AJCC/UICC staging system and the 2015 and 2009 ATA guidelines for DTC using data from TCGA. We used four different statistical measures to compare the predictive accuracy of the four staging systems, and the AJCC 8th edition was consistently the most accurate in predicting patient outcome across all four.

In conclusion, the AJCC 8th edition predicted patient outcome more accurately than the other staging systems. Therefore, the AJCC 8th edition might be a better and more cost-effective predictor of outcome in patients with PTC.