Thyroid nodules are prevalent in North America, and the incidence of thyroid cancer has been increasing during the past four decades.1,2 The rise in thyroid cancers in Ontario has been largely in cancers smaller than 2 cm and is related, at least in part, to the increased use of ultrasound detection of incidental nodules.3,4

Because patients with thyroid cancer have excellent survival results, there has been a recent trend toward treatment de-escalation. In keeping with this trend, a group of thyroid cancer experts recently studied a group of 109 patients with noninvasive encapsulated follicular variant papillary thyroid cancer (FVPTC) who underwent lobectomy without radioactive iodine (RAI) and found no evidence of disease recurrence during a median follow-up period of 13 years, suggesting a nomenclature change for this group to noninvasive follicular thyroid neoplasm with papillary-like features (NIFTP).5 This group has never been studied using population-based data, and little is known about the impact of this new diagnostic category on the incidence and outcomes for patients with thyroid neoplasms.

This study aimed to determine the incidence of NIFTP among all well-differentiated thyroid cancers (WDTCs) in Ontario (1991–2000) and the predictors of disease-free survival (DFS) by comparing patients with FVPTC and those with NIFTP in a cohort with long-term follow-up evaluation.

Methods

Study Design and Setting

This population-based retrospective cohort study included all patients who had definitive surgery for well-differentiated thyroid cancer (follicular and papillary thyroid cancers) in Ontario, Canada between 1 January 1990, and 31 December 2001 and were followed up until 31 December 2014.

Data Collection

Patients with a diagnosis of thyroid cancer (ICD-9 code 193) during the study years were identified in the Ontario Cancer Registry (OCR), a well-validated cancer registry with a cancer capture rate higher than 98% for all noncutaneous malignancies.6,7 Specific histology codes were assessed, and initially, all well-differentiated thyroid cancers were included (n = 3122; Fig. 1).

Fig. 1 Cohort development flow diagram showing how the cohort was developed, including reasons for exclusion

Sampling

Wide variations in incidence and treatment exist across geographic regions in Ontario, as defined by eight regional cancer treatment centers (RCTC) and as previously demonstrated by our group.8 Inherent to this variation was that 47% of the patients were treated in the Toronto RCTC. Both to ensure generalizability and to decrease the cost of pathology report data abstraction, only every fourth (25%) OCR case was sampled by date of diagnosis from the Toronto RCTC for each year plus all the patients from the remainder of the province (3122 unweighted, 6212 weighted).
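As an illustration only, the sampling scheme described above can be sketched in Python. The sketch assumes a simple systematic 1-in-4 sample within each diagnosis year for the over-represented region, with each sampled case given a weight of 4 and all other cases a weight of 1. The `Case` record, the region labels, and the function name are hypothetical, and the sketch does not attempt to reproduce the reported unweighted and weighted counts.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Case:
    case_id: int
    region: str       # hypothetical label for the treating RCTC
    diagnosed: date


def sample_with_weights(cases, oversampled_region="Toronto", step=4):
    """Return (case, weight) pairs.

    Assumption: every `step`-th case, ordered by date of diagnosis within
    each year, is kept from the over-represented region with weight `step`;
    every case from the other regions is kept with weight 1.
    """
    kept = []
    by_year = {}
    for case in cases:
        if case.region == oversampled_region:
            by_year.setdefault(case.diagnosed.year, []).append(case)
        else:
            kept.append((case, 1))
    for year_cases in by_year.values():
        year_cases.sort(key=lambda c: c.diagnosed)
        for case in year_cases[::step]:
            kept.append((case, step))
    return kept
```

When the number of cases in a year is a multiple of the sampling step, the weights sum back to the original case count; otherwise the weighted total is only approximate.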

Pathology Report Review

As part of a previous study,3,9 all thyroid cancer-related surgical pathology reports were requested from the date of diagnosis forward from the OCR. Patients who had only a fine-needle aspiration or biopsy were excluded from the study. All reports then were abstracted by two trained research associates for variables related to histologic features, extent of disease, and extent of surgery.3,9

Primary Predictor-Pathology Report Re-review (FVPTC vs NIFTP)

In this study, we used patient identification numbers and pathology report numbers to return to the original pathology reports. The same two trained abstractors then reabstracted key histopathologic variables, only for those patients with FVPTC, to assess which patients might be candidates for NIFTP. We then used a conservative decision rule to subtype certain FVPTCs into NIFTPs. For a patient in our FVPTC cohort to be considered for NIFTP, that patient had to have an encapsulated tumor, no tumor capsule invasion, no vascular invasion, no thyroid capsule invasion, and no extrathyroidal extension or spread.

Furthermore, we excluded from the NIFTP group tumors with true papilla (> 1%), psammoma bodies, infiltrative border, tumor necrosis, high mitotic activity, cellular or morphologic characteristics of other variants of papillary thyroid cancer (PTC) (including aggressive variants such as tall-cell, columnar-cell and diffuse sclerosing variants), and poorly differentiated tumors.
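The decision rule described above can be sketched as a small Python function. This is a hedged illustration, not the abstraction instrument used in the study. The field names are hypothetical, and the handling of unstated features is an assumption: the required criteria must be explicitly documented, whereas an exclusion feature disqualifies a tumor only when the report states that it is present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathologyReport:
    # None means the feature was not stated in the report.
    encapsulated: Optional[bool] = None
    tumor_capsule_invasion: Optional[bool] = None
    vascular_invasion: Optional[bool] = None
    thyroid_capsule_invasion: Optional[bool] = None
    extrathyroidal_extension: Optional[bool] = None
    true_papillae_over_1pct: Optional[bool] = None
    psammoma_bodies: Optional[bool] = None
    infiltrative_border: Optional[bool] = None
    tumor_necrosis: Optional[bool] = None
    high_mitotic_activity: Optional[bool] = None
    other_ptc_variant_features: Optional[bool] = None
    poorly_differentiated: Optional[bool] = None


def is_potential_niftp(report: PathologyReport) -> bool:
    """Apply the conservative rule to one FVPTC pathology report."""
    # Required: encapsulation explicitly documented.
    if report.encapsulated is not True:
        return False
    # Required (assumption): each invasion feature explicitly stated absent.
    must_be_absent = [
        report.tumor_capsule_invasion,
        report.vascular_invasion,
        report.thyroid_capsule_invasion,
        report.extrathyroidal_extension,
    ]
    if any(value is not False for value in must_be_absent):
        return False
    # Exclusions (assumption): disqualify only when stated as present.
    exclusions = [
        report.true_papillae_over_1pct,
        report.psammoma_bodies,
        report.infiltrative_border,
        report.tumor_necrosis,
        report.high_mitotic_activity,
        report.other_ptc_variant_features,
        report.poorly_differentiated,
    ]
    return not any(value is True for value in exclusions)
```

Treating unstated exclusion features as non-disqualifying is one possible reading; a stricter rule that requires every feature to be documented would classify far fewer tumors as potential NIFTPs.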

Finally, after reviewing thousands of reports from the original study and basing their rating on the detail provided in each report, the abstractors were able to categorize report quality as “excellent,” “good,” “poor,” or “very poor” as part of our multivariable and sensitivity analysis. This scoring system was not based on an objective score but rather on a global subjective assessment of the report as it related to the objectives of this study.

Database Linkage

We identified 751 patients with FVPTC, whose records were re-linked using health administrative databases at the Institute for Clinical Evaluative Sciences (ICES), an independent, nonprofit research organization funded by the Ontario Ministry of Health and Long-Term Care. The linkage into ICES included the Ontario Health Insurance Plan physician billing codes with dates (surgery, type of surgery, RAI treatment), the Canadian Institute for Health Information (CIHI) for hospital procedure codes (surgery, types of surgery, RAI treatment, dates of treatment), and the Office of the Registrar General Death Database (ORGD) for survival data. Of the 751 pathology reports, 26 could not be linked, leaving a final population of 725.

Primary Outcome

The primary outcome of the study was DFS, with disease failure defined as death from thyroid cancer or a recurrence event. Vital status and cause of death were captured from the ORGD. The first recurrence and the date of the first recurrence were captured using one of the following administrative data events: (1) a positive biopsy at least 1 year after the index surgery, (2) a neck dissection alone any time after the index surgery but not until RAI had been administered to signify the end of treatment, (3) thyroid or thyroid and neck surgery at least 12 months after the index surgery, (4) the first RAI administration 12 months after the index surgery, or (5) a subsequent RAI at least 4 months after the first RAI.3,9 In Ontario, before the 2015 American Thyroid Association guidelines and certainly during the years of this study, patients with PTC in a hemi- or subtotal thyroidectomy specimen were nearly always recommended to have a completion thyroidectomy (with or without central neck dissection) within 12 months after their index surgery.
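The five recurrence rules above lend themselves to a short sketch. The following Python function is an illustration under stated assumptions, not the study's actual algorithm: the event labels are hypothetical, 12 months is approximated as 365 days and 4 months as 122 days, and rule 2 is read as requiring at least one earlier RAI administration before a neck dissection counts as a recurrence.

```python
from datetime import date, timedelta

TWELVE_MONTHS = timedelta(days=365)  # assumption: 12 months as 365 days
FOUR_MONTHS = timedelta(days=122)    # assumption: 4 months as 122 days


def first_recurrence(index_surgery, events):
    """Return the date of the first recurrence event, or None.

    `events` is an iterable of (kind, date) pairs, where kind is one of the
    hypothetical labels "positive_biopsy", "neck_dissection",
    "thyroid_surgery", or "rai".
    """
    first_rai = None
    for kind, when in sorted(events, key=lambda e: e[1]):
        if when <= index_surgery:
            continue  # only events after the index surgery are considered
        since_index = when - index_surgery
        if kind == "positive_biopsy" and since_index >= TWELVE_MONTHS:
            return when  # rule 1
        if kind == "neck_dissection" and first_rai is not None:
            return when  # rule 2: neck dissection after RAI was given
        if kind == "thyroid_surgery" and since_index >= TWELVE_MONTHS:
            return when  # rule 3
        if kind == "rai":
            if first_rai is None:
                if since_index >= TWELVE_MONTHS:
                    return when  # rule 4: late first RAI
                first_rai = when
            elif when - first_rai >= FOUR_MONTHS:
                return when  # rule 5: repeat RAI
    return None
```

Under these assumptions, a completion thyroidectomy within 12 months of the index surgery is not counted as a recurrence, which matches the practice pattern described in the paragraph above.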

Covariates

Age was dichotomized at 45 years, according to the seventh edition of the American Joint Committee on Cancer (AJCC) tumor-node-metastasis (TNM) classification.10 Comorbidity was measured using the Elixhauser scale, with a look-back period before diagnosis extending to 1988. The Elixhauser scale is a summative scale based on 31 domains from hospitalization data, with higher scores indicating greater comorbidity (categories: 0, 1, 2, and > 2).11,12,13 Pathology reports were used to capture tumor size in centimeters, lymph nodes removed and the number of involved nodes, positive margin, lymphatic invasion, and tumor focality.

Initial treatment was based on the information available in the operative and pathology reports. The time between treatments was used to define initial versus salvage treatments. For example, a completion thyroidectomy within 90 days after lobectomy was classified as an initial total thyroidectomy unless recurrence was clearly stated in the documentation. Treatments such as combinations with radiotherapy or inconsistent sequences such as RAI after biopsy alone were grouped as “other.” The six initial treatment categories were (1) lobectomy ± isthmusectomy, (2) lobectomy ± isthmusectomy plus completion thyroidectomy within 12 months, (3) total thyroidectomy, (4) total thyroidectomy plus RAI within 12 months, (5) lobectomy ± isthmusectomy plus completion thyroidectomy within 12 months plus RAI within 12 months, and (6) other.

Statistical Analysis

Demographic, pathology, and treatment-related information was summarized using descriptive statistics. Comparisons between the NIFTP and FVPTC (non-NIFTP) groups were made using chi-square tests because the predictor variables were categorical. The Kaplan–Meier method was used to estimate the time-to-event outcome statistics, which included 5-, 10-, and 15-year DFS rates. Time-to-event statistics were calculated from the date of diagnosis to the event or outcome of interest. Univariable Cox proportional hazards models were used to examine the unadjusted association between predictors and the main outcome of interest (DFS). The following factors were assessed as potential predictors of DFS: age, sex, margin status, lymphatic invasion, positive lymph nodes, tumor focality, Elixhauser comorbidity scale, tumor size, and our primary predictor (FVPTC vs NIFTP). Based on our a priori statistical plan, we adjusted for these same variables in a multivariable Cox proportional hazards model predicting DFS. Before this, collinearity was assessed using a variance inflation factor cutoff of 2.5 (values below the cutoff were considered acceptable), and no multicollinearity was seen between the variables included in the multivariable model. The adjusted DFS survival curves for FVPTC versus NIFTP are also presented. Statistical analyses were performed using Statistical Analysis Software (SAS version 9.3; SAS Institute, Cary, NC, USA).
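To make the Kaplan–Meier step concrete, the product-limit estimator can be written in a few lines of Python. This is a minimal sketch for illustration, not the SAS procedure used in the analysis. The function names are hypothetical, and it takes follow-up times with an event indicator (1 for disease failure, 0 for censoring) as input.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up durations (for example, years from diagnosis)
    events: 1 if disease failure was observed at that time, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Group all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures > 0:
            survival *= 1 - failures / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Read the estimated survival probability at time t from the curve."""
    estimate = 1.0
    for time, surv in curve:
        if time > t:
            break
        estimate = surv
    return estimate
```

Calling `survival_at(curve, 5)`, `survival_at(curve, 10)`, and `survival_at(curve, 15)` on a curve built from follow-up measured in years would give the 5-, 10-, and 15-year DFS estimates; confidence intervals and the Cox models are outside the scope of this sketch.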

Sensitivity Analysis

Two sensitivity analyses were performed, both based on a pathology report quality variable given the importance of this variable for this particular study. We re-ran the analysis twice after excluding the “very poor” followed by both the “very poor” and “poor” pathology reports to assess whether this would change any of our conclusions. A third sensitivity analysis also was performed to re-categorize patients with lymph node metastases after their initial thyroid procedure to the FVPTC group to assess whether this would have an impact on our conclusions for the DFS analysis.

Incidence Rate

Using our cohort, incidence rates were plotted for (1) all well-differentiated thyroid cancers, (2) follicular thyroid cancers, (3) papillary thyroid cancers (non-NIFTP), and (4) NIFTP from 1990 to 2001. Because categories 2, 3, and 4 are mutually exclusive, their rates are additive and sum to the rate for category 1.

Results

Pathology Re-review Cohort Development

At the pathology re-review of the 725 FVPTCs, 318 were reclassified as potential NIFTPs based on the pathology of the primary tumor. Comparisons of the tumor variables used to derive the two cohorts are listed in Table 1. All the NIFTP tumors were encapsulated, without extra-thyroidal spread, infiltrative borders, lymphatic invasion, thyroid capsule invasion, or vascular invasion. Many additional pathology report features were used to exclude patients from the NIFTP group, specifically true papilla (> 1%), psammoma bodies, infiltrative borders, tumor necrosis, high mitotic rate, and morphologic features of an aggressive variant. However, these were “unstated” in respectively 87, 74, 89, 95, 90, and 90% of the pathology reports. With regard to tumors at the tumor capsule but not through it, no difference was observed between the FVPTC (n = 12, 2.9%) and NIFTP (n = 14, 4.4%) (p = 0.30). Differences were noted in report quality between the two groups (p < 0.01), with the FVPTC group having overall better-quality reports (67.6% vs 50.0% excellent or good in the NIFTP group).

Table 1 Pathology review by tumor classification

Demographic and Clinical Data

Demographic and clinical variables of the FVPTC and NIFTP cohorts are presented in Table 2. The population was largely young (≤ 45 years: 58.2% of FVPTC and 52.5% of NIFTP) and female (80.3% of FVPTC and 84.6% NIFTP). The FVPTC and NIFTP groups did not differ in terms of age, sex, Elixhauser score, or primary treatment method. However, differences in tumor size were noted between the two groups, with more large tumors (> 2 cm) in the FVPTC group (53.4%) than in the NIFTP group (44.2%). Similarly, more patients had positive lymph nodes in the FVPTC group (n = 63, 15.5%) than in the NIFTP group (n = 21, 6.6%).

Table 2 Demographic and clinical variables by tumor classification

Outcomes

The median follow-up time was 15.3 years for the entire cohort and 15.9 years for those alive at the last follow-up visit. Disease failure occurred for 109 patients, 79 (19.4%) in the FVPTC group and 30 (9.4%) in the NIFTP group (p < 0.01). Because our cohort had too few disease-specific deaths (n = 18), an analysis of disease-specific survival was not possible. In our study cohort, 112 deaths occurred, 61 (15%) in the FVPTC group and 51 (16%) in the NIFTP group (p = 0.70).
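The disease failure comparison above can be checked with a short calculation. The sketch below runs a Pearson chi-square test on the 2 × 2 table of failures by group, taking the FVPTC group size as 725 − 318 = 407. It assumes no continuity correction, because the exact test variant is not stated, and the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom, no continuity correction)
    for the table [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the text: failures and non-failures in each group.
fvptc_fail, fvptc_total = 79, 725 - 318
niftp_fail, niftp_total = 30, 318

chi2, p_value = chi_square_2x2(
    fvptc_fail, fvptc_total - fvptc_fail,
    niftp_fail, niftp_total - niftp_fail,
)
fvptc_rate = 100 * fvptc_fail / fvptc_total
niftp_rate = 100 * niftp_fail / niftp_total
```

Under these assumptions the failure rates round to the reported 19.4% and 9.4%, and the p value falls below 0.01, consistent with the text.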

The 5-, 10-, and 15-year DFS rates for the FVPTC group were respectively 85.7% (95% confidence interval [CI], 81.9–88.8%), 82.9% (95% CI, 78.9–86.2%), and 80.4% (95% CI, 76.2–84.0%). The 5-, 10-, and 15-year DFS rates for the NIFTP group were respectively 93.1% (95% CI, 89.6–95.4%), 92.1% (95% CI, 88.5–94.6%), and 90.6% (95% CI, 86.8–93.4%). In the univariable analysis, the difference between the groups was statistically significant (hazard ratio [HR], 2.14; 95% CI, 1.41–3.26; p < 0.01).

Univariable Analysis

The eight clinical and pathologic factors chosen a priori for multivariable analysis were analyzed using univariable techniques to determine whether they were predictors of DFS (Table 3). The following variables were significant predictors of DFS in the univariable analysis: male sex (HR, 1.61; 95% CI, 1.04–2.49; p = 0.03), Elixhauser score (score 1 vs 0: HR, 1.72; 95% CI, 1.10–2.68 [p = 0.02]; score 2 vs 0: HR, 2.13; 95% CI, 1.09–4.17 [p = 0.03]; score 3 + vs 0: HR, 2.68; 95% CI, 1.37–5.25 [p < 0.01]), lymphatic invasion (HR, 2.37; 95% CI, 1.04–5.41; p = 0.04), tumor size (≥ 7 vs 1–2 cm; HR, 2.89; 95% CI, 1.12–7.44; p = 0.03), and lymph node involvement (HR, 2.83; 95% CI, 1.83–4.37; p < 0.01).

Table 3 Uni- and multivariable analyses of disease-free survival

Multivariable Analysis

Based on a priori hypotheses, our multivariable analysis included the following variables: age, sex, margin status, lymphatic invasion, lymph node involvement, tumor focality, Elixhauser comorbidity score, and tumor size. After controlling for these variables (Table 3), our analysis demonstrated that FVPTC had significantly worse DFS than NIFTP (HR, 1.84; 95% CI, 1.17–2.89; p < 0.01). This analysis also demonstrated that high Elixhauser score (3 + vs 0: HR, 2.22; 95% CI, 1.08–4.59; p = 0.03), large tumor size (≥ 7 vs 1–2 cm: HR, 3.19; 95% CI, 1.18–8.61; p = 0.02), and lymph node involvement (HR, 2.31; 95% CI, 1.35–3.96; p < 0.01) all were independent predictors of DFS. The adjusted multivariable DFS curves for the FVPTC and NIFTP groups are presented in Fig. 2.

Fig. 2 Adjusted multivariable disease-free survival. The adjusted Kaplan–Meier disease-free survival curve is based on the multivariable Cox regression model

Sensitivity Analysis

Two sensitivity analyses were performed, both based on the pathology report quality variable given the importance of this variable for this particular study. The analyses excluded first the “very poor” reports and then both the “very poor” and “poor” pathology reports, and this did not change any of our findings or conclusions. A third sensitivity analysis moved all patients with lymph node metastases automatically to the FVPTC group (even though all of these metastases were diagnosed at subsequent procedures and not at the time of the original thyroidectomy), and this also did not change our DFS analysis results or conclusions.

Incidence Rate

Using our cohort, incidence rates were plotted for all well-differentiated thyroid cancers, follicular thyroid cancers, papillary thyroid cancers (non-NIFTP), and NIFTP from 1990 to 2001 (Fig. 3). During the study period, a marked increase in the incidence of NIFTP was noted (from 0.18 per 100,000 in 1991 to 1.46 per 100,000 in 2001). However, this proved to be a small proportion of the incidence of papillary thyroid cancers during those study years (from 3.02 per 100,000 in 1991 to 6.97 per 100,000 in 2001). After recategorization of certain FVPTCs into NIFTPs, we found that NIFTPs accounted for 16.8% (1.461/8.699) of all WDTCs. Therefore, if NIFTP is categorized in cancer registries as a nonmalignant diagnosis, this may have a significant impact on the incidence of thyroid cancers.
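The proportions quoted above follow from simple arithmetic on the reported rates. The sketch below, with hypothetical helper names, recomputes the NIFTP share of all WDTCs from the two rates given in the text and the relative increase in each incidence rate between 1991 and 2001.

```python
def share_percent(part_rate, total_rate):
    """Share of the total incidence rate attributable to one category."""
    return 100 * part_rate / total_rate


def fold_change(start_rate, end_rate):
    """Relative increase in an incidence rate over the study period."""
    return end_rate / start_rate


# Rates per 100,000 as reported in the text.
niftp_share = share_percent(1.461, 8.699)  # NIFTP among all WDTCs
niftp_fold = fold_change(0.18, 1.46)       # NIFTP, 1991 to 2001
ptc_fold = fold_change(3.02, 6.97)         # papillary thyroid cancer, 1991 to 2001
```

On these figures, the NIFTP rate rose roughly eightfold over the period, whereas the papillary thyroid cancer rate a little more than doubled.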

Fig. 3 Incidence rates of noninvasive follicular thyroid neoplasm with papillary-like features (NIFTP) and non-NIFTP thyroid lesions in Ontario. Incidence rates are presented for all well-differentiated thyroid cancers, follicular thyroid carcinoma, NIFTP, and non-NIFTP thyroid cancers between 1990 and 2001

Discussion

Our study demonstrated a rise in WDTC, related in part to a rise in NIFTP cases, which accounted for 16.8% of cases in 2001 based on our study. This incidence rose dramatically during the study years. In our dataset, NIFTP carries a significantly lower risk of locoregional disease failure (9.4%) than FVPTC (19.4%) (p < 0.01). This effect is sustained in uni- and multivariable Cox proportional hazards regression models. However, despite a very conservative algorithm used to categorize certain FVPTCs into NIFTPs, our study demonstrated a higher than expected disease failure rate in the NIFTP group (9.4%).

Our results regarding disease failure are in strong contrast to the results originally reported by those supporting the change in nomenclature to NIFTP. The original manuscript on this topic reviewed 109 patients with noninvasive encapsulated FVPTC who were treated with lobectomy and did not receive RAI.5 In this group, all the patients were alive during a median follow-up period of 13 years, with no evidence of disease. This is in stark contrast to the group of patients with invasive encapsulated FVPTC (n = 101), 12% of whom had an adverse event, including five patients who experienced distant metastases and two who died of their disease. Smaller cohorts have demonstrated similar results, showing a zero recurrence rate even when lobectomy was performed without completion thyroidectomy or when RAI was performed for both large (> 4 cm) and small (< 1 cm) NIFTPs, both of which were initially excluded from the reclassification study.14,15,16 However, even top endocrine pathologists disagree regarding the diagnostic criteria for this entity, which has already been modified since publication.17 Most importantly, the pathologists who participated in the original study were not blinded to the molecular panel associated with each thyroid tumor being reviewed, which was used to assist in the development of the NIFTP criteria.

Despite these preliminary findings, others have reported a recurrence rate higher than zero and have warned against de-escalation of treatment and loss to follow-up evaluation after the nomenclature change.18,19 A study that reviewed the pathology reports for a cohort of 903 potential NIFTP candidates (FVPTC) demonstrated a 2.1% incidence of NIFTP among all WDTCs.18 The incidence probably was higher in this cohort, but many specimens (134/903) could not be assessed for capsular invasion based on the slides reviewed. This is a critical finding from this study. Even at a high-volume quaternary endocrine pathology practice, many specimens were not handled in such a way as to assess the entire tumor capsule for invasion, a key criterion in the new NIFTP categorization. Interestingly, despite very stringent criteria, this study found that 6% of patients with NIFTP demonstrated malignant behavior and warned against the use of NIFTP as a benign diagnostic category.19

Our study was particularly interested in how the NIFTP diagnosis may be interpreted in nonacademic centers by largely non-endocrine pathologists, and this focus is reflected in the differences between our methodologic and statistical approach and that of Parente et al.18 It also explains the discrepancy in the incidence of NIFTP between these two studies, and we suspect that non-endocrine pathologists may have a higher NIFTP diagnosis rate. However, data from Ontario are not the only data demonstrating the metastatic potential of NIFTP tumors, albeit at a low rate.20,21 This is consistent with our study, in which despite a very conservative definition, 9.4% of the patients had a locoregional failure.

Our study had a number of strengths. It was the largest reported study in the literature on patients with NIFTP. It was population-based in a universal health care system with a very high capture rate of health care events. Our study also provided valuable information on how this new diagnostic entity may be interpreted in the real world with a large population for which most thyroidectomy specimens were not being reviewed by academic endocrine anatomic pathologists. The extensive and thorough review of the pathology reports by expert thyroid abstractors strengthened our results. Our study also had internal validity given the differences in DFS between the NIFTP and FVPTC groups, which were sustained after controlling for pathology report quality. Even in quaternary centers, the entire tumor capsule may not be assessed given how tedious this can be for both the pathology technician and the pathologist. Furthermore, even in scenarios with full tumor capsule assessment, poor interrater reliability exists among pathologists concerning the definition of tumor capsular invasion, which can be subtle and difficult to assess.17 Therefore, our study, using population-based data, demonstrated the potential for misclassification of FVPTC into the NIFTP category.

These data must be interpreted in the context of the study design. The most important limitation of this study was that our definition of NIFTP was based on a thorough review of pathology reports. We did not have access to the slides, and a formal pathology slide review was not possible. However, very conservative criteria for classifying NIFTP and expert abstraction of the pathology reports were applied. Our findings are strikingly similar to those of other groups regarding the non-zero recurrence rate in this population.18,20,21 Furthermore, other groups studying reclassification of this patient population through pathology review have used slides that frequently did not fully assess the entire tumor capsule, and this equally limited the accuracy of those studies in relation to our study.15,16

Removing the word “cancer” from a neoplasm that has a potential misclassification error and metastatic potential may result in undertreatment and inadequate surveillance. We therefore recommend caution about this diagnosis until further research demonstrates its widespread safety. Our recommendation is based on the following facts: (1) NIFTP is likely to undergo further changes to its diagnostic criteria, (2) the small retrospective studies that have demonstrated the zero recurrence rate all have been performed at quaternary centers, (3) there is significant risk of misclassification, particularly if the entire tumor capsule is not assessed, as is the case at most centers, and (4) the malignant potential, even in cases that have been confirmed as NIFTP with full tumor capsule assessment, has been demonstrated. Even if the entire tumor capsule is assessed, unless the neoplasm is completely sectioned (“breadloafed”), there continues to be the risk of misclassification, particularly for large nodules. During the implementation phase of this new diagnosis, increased training and communication between academic and nonacademic pathologists and the increasing use of pathology review may prevent misclassification. The NIFTP diagnosis is challenging for the pathologist, and this may make tumor behavior difficult to predict for this entity. Ultimately, molecular testing may be required to differentiate between NIFTP and FVPTC with higher reliability and accuracy than the current diagnostic criteria allow, although this currently would be prohibitively expensive.

Reclassifying NIFTP as a nonmalignant disease could have significant unintended consequences, including the possibility that cancer agencies may stop collecting incidence and outcomes data at a cancer registry level. We recommend that these tumors be captured in cancer registries to assist with future study of this entity as we work toward de-escalating the management of thyroid nodules and cancers.

In conclusion, NIFTPs comprise approximately 16.8% of WDTCs in Ontario. They are associated with a 9.4% disease failure rate, in keeping with other recent reports. Therefore, further prospective and population-based studies are required to understand better the implications of this new diagnostic category. Until these studies are available, caution should be used in the management of patients with an NIFTP.