Introduction

Postoperative thyroid remnant ablation with “therapeutic” activities of 131I has traditionally been applied in most (near-)totally thyroidectomized patients with differentiated thyroid carcinoma (DTC). Three main rationales have been offered for this practice [1–3]. First, ablative radioiodine administration allows highly sensitive posttherapy whole-body scintigraphy (rxWBS) to discover previously undetected disease. Second, the ablative activity also treats microscopic or other occult radioiodine-sensitive tumour, possibly decreasing DTC recurrence and mortality [4, 5]. Last, by eliminating sources of thyroglobulin (Tg) secretion and radioiodine uptake, ablation should improve the sensitivity of serial monitoring for recurrence using serum Tg testing or diagnostic whole-body scintigraphy (dxWBS).

However, these benefits, especially benefits related to outcome, can be difficult to prove. Most published data in the DTC setting are retrospective and observational. Additionally, studies may be underpowered or of insufficient duration to detect infrequent, slow-to-emerge events, e.g. recurrence or cause-specific death, in patients with this generally relatively indolent neoplasm with high survival [6, 7]. For these reasons, and because radioiodine therapy sometimes has side effects [8–10], recent DTC management guidelines (for example references [11–15]) generally advocate “selective” use of this procedure in patients considered to be at “low-risk” or even “intermediate-risk” of DTC recurrence or mortality. Thus, the American Thyroid Association 2009 DTC management guidelines [11] include patients with T1–2 or N1 disease in the “selective ablation use” category, and indeed, recommend against ablation for N0–Nx disease. T1–2 and N1 patients also should be among the “selective use” population according to the National Comprehensive Cancer Network DTC management guidelines [15]. However, although they are valuable compendia of state-of-the-art data and disease management suggestions, DTC management guidelines are usually formulated by physicians from international centres of excellence, and may not be fully applicable elsewhere [16, 17]. For example, clinicopathological characteristics themselves, or their documentation, may be less clear-cut in other centres, and higher-risk features such as anti-Tg autoantibodies (TgAb) may be more prevalent in certain countries or regions.

Despite guideline recommendations, at our centres, we have in recent years continued to use radioiodine ablation in nearly all patients with a primary tumour ≤4 cm and no evidence of distant metastasis, i.e. pT1–2 M0 status. We make these therapeutic decisions on an individualized, patient-by-patient basis in everyday practice. In our centres, such decision-making generally does not rely on postsurgical neck ultrasonography or scintigraphic imaging; the former may be difficult to interpret due to postoperative inflammation [11] and the latter may impose excessive logistical demands on patients, may slow down workflow, or may require prohibitively expensive isotopes (e.g. 123I) to avoid “stunning”.

For these reasons, it can be challenging to gain a “big picture” of the factors leading to the decision to ablate in this putatively “low–intermediate-risk” population. We therefore performed the present retrospective quality assurance study with three objectives. First, we sought to determine key preablation clinicopathological and treatment characteristics of pT1–2 M0 DTC patients ablated in our centres during a recent period of about 3 years, i.e. the factors influencing our decision to ablate. Second, through retrospective evaluation of postablation scintigrams regarding the number and intensity of foci of thyroid bed uptake, we aimed to assess thyroid remnant size in this cohort. This endpoint reflects the necessity for ablation, at least provided that rationales for the procedure are accepted. Our third goal was to identify clinicopathological or treatment characteristics associated with thyroid remnant size before ablation. Such variables might help identify “surgically-ablated” patients, and hence could help improve ablation-related decision-making.

Materials and methods

Patients, setting, and ethics

The study included 336 patients with classical or follicular variants of papillary DTC or with follicular DTC (including Hürthle cell histology), and with pT1–2 M0 status before ablation. This cohort comprised all such patients with available medical records who from 17 September 2010 to 31 October 2013 underwent a first radioiodine ablation procedure at the Nuclear Medicine Departments of Bank of Cyprus Oncology Centre (BOCOC), Strovolos Nicosia, Cyprus (n = 198) or Papageorgiou Hospital (PGH), Thessaloniki, Greece (n = 138). These institutions are the largest tertiary referral centres for DTC in Cyprus and Northern Greece, respectively. The cohort comprised >99 % and about 95 %, respectively, of patients with pT1–2 M0 papillary or follicular DTC initially presenting at BOCOC or PGH during the study period. The patients seen during this period who were not included in this study (nine patients for both centres combined) had missing medical records (two patients), refused ablation due to fears regarding radiation (six patients) or refused consent for use of their data in scientific studies/publications (one patient).

Ethics committee approval for this study was not sought, since the analysis was anonymous and retrospective, and therefore neither affected patient privacy nor entailed additional interventions. Moreover, before radioiodine treatment, patients provided written informed consent allowing use of their data in such analyses.

Surgery and ablation

Patients had to have undergone (near-)total thyroidectomy. Ablation took place a median of 73 days [interquartile range, IQR, 52–136 days] after surgery (334 patients). Patients were prescribed a low-iodine diet for the 10 days before the procedure.

All but one patient at BOCOC and 64.5 % at PGH (89/138) had ablation stimulated by thyroid hormone withdrawal (THW; no thyroid hormone for the 2–6 weeks before ablation). One patient at BOCOC and 35.5 % at PGH (49/138) had ablation stimulated by the approved regimen of recombinant thyroid-stimulating hormone (rhTSH, Thyrogen; Genzyme, Cambridge, MA) comprising two consecutive daily intramuscular injections of 0.9 mg. Within a few days after surgery, the patients receiving rhTSH were placed on thyroid hormone therapy, which was not interrupted or altered for ablation.

TSH at ablation was <30 mIU/L in 0.6 % of patients with available data (2 of 334), but was always >20 mIU/L. Patients were ablated with fixed empirical radioiodine activities, typically (280 of 336 cases, 83.3 %) 3.7 GBq (100 mCi). Patients were hospitalized for 2–4 days after ablation, and were discharged when the exposure rate at 1 m was <40 μSv/h, unless discharge was medically contraindicated.

rxWBS

rxWBS was performed 5–10 days after ablative radioiodine administration. Anterior and posterior planar images of the cranium, thorax and abdomen from the top of the head to the inguinal region, and spot images as needed, were obtained while the patients were supine. Large field-of-view double-headed gamma cameras (BOCOC: Infinia Hawkeye 4GP3, GE Healthcare, Tirat Carmel, Israel; PGH: ADAC dual-head, Philips, Amsterdam, The Netherlands) equipped with high-energy collimators were used. Scanning was performed for ≥500,000 counts (about 30 min). To minimize workflow disruption and time and logistical demands on patients, neck uptake was not quantified.

Remnant size scoring

Thyroid remnant was retrospectively given classification scores based on visual assessment of the number of foci of thyroid bed uptake and their overall intensity on postablation scintigrams. The classification scores were derived as the product of two subscores, the number of thyroid bed foci and the overall uptake intensity. The number of thyroid bed foci was objectively scored as: 0 no foci, 1 one focus, 2 two foci, 3 three foci, and 4 more than three foci. The overall uptake intensity was subjectively scored, without reference to other tissues, e.g. liver, as: 0 no uptake, 1 low uptake, 2 intermediate uptake, and 3 star effect. (Representative postablation WBS images illustrating scores for intensity of uptake and maximum remnant scores are shown in Fig. 1.) Thus there were nine possible remnant size scores, ranging from 0, no apparent remnant, to 12, multiple remnants, high uptake.
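The arithmetic of the scoring scheme described above can be checked with a short sketch. The Python below is illustrative only and was not part of the study's methods; the function name `remnant_score` is a hypothetical label. It multiplies the foci subscore (0–4) by the intensity subscore (0–3) and enumerates the distinct products, which should number nine and range from 0 to 12.

```python
def remnant_score(foci_subscore: int, intensity_subscore: int) -> int:
    """Return the product of the two subscores (hypothetical helper name)."""
    if not 0 <= foci_subscore <= 4:
        raise ValueError("foci subscore must be between 0 and 4")
    if not 0 <= intensity_subscore <= 3:
        raise ValueError("intensity subscore must be between 0 and 3")
    return foci_subscore * intensity_subscore


# Enumerate every subscore combination and keep the distinct products.
# Combinations with one subscore equal to 0 all collapse to a score of 0.
possible_scores = sorted(
    {remnant_score(f, i) for f in range(5) for i in range(4)}
)
print(possible_scores)  # [0, 1, 2, 3, 4, 6, 8, 9, 12]
```

Because the score is a product, values such as 5, 7, 10 and 11 cannot occur, which is why only nine scores are possible on a nominal 0–12 scale.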

Fig. 1 Representative postablation WBS images illustrating scores for intensity of uptake and maximum remnant scores: a score 1, low uptake; b score 2, intermediate uptake; c score 3, star effect; d maximum remnant score of 12 (foci subscore of 4, intensity subscore of 3)

Remnant classification scoring for each centre’s images was performed by two experienced nuclear medicine physicians at the respective institutions (N.E. and S.F. at BOCOC; E.I.G. and I.P.I. at PGH). Each physician pair worked jointly; disagreements were resolved by consensus between one scorer from each centre (S.F. and I.P.I.). The scorers from the respective centre were involved in the patients’ treatment; however, images were labelled only with five-digit patient identification numbers, and were evaluated approximately 1–3 years after ablation, without reference to other patient data.

Histopathological classification

Histopathological classification was performed by the pathologist working with the given surgeon, based on the surgeon’s report as well as the pathologist’s analysis of excised tissue. The histology report and the American Joint Committee on Cancer/Union Internationale contre le Cancer Tumour, Nodes, Metastasis (TNM) system, 6th edition [18] were used to stage each patient’s disease before ablation. Patients were classified as free of cervical lymph node metastasis (N0) if one or more such nodes were excised and all excised nodes were negative for DTC on pathological examination. Patients not meeting these criteria but without evidence of cervical node involvement before ablation were classified as Nx.

Biochemistry

Tg and TgAb were measured using commercial assays (see Supplementary Table 1 for methodological details) by one accredited central laboratory for each institution (an independent firm at BOCOC; part of the Nuclear Medicine Department at PGH). Throughout the study period, each laboratory used a single assay for each of these two analytes. Samples were drawn immediately before ablative radioiodine administration in the 286 patients undergoing THW or 72 h after the second rhTSH injection, i.e. 48 h after ablative administration, in the 50 patients receiving rhTSH.

Statistics

Discrete variables are expressed as counts and percentages, or vice versa; continuous variables are expressed as median (minimum–maximum) or median [IQR]. Patient characteristics were compared between centres using the chi-square test or Mann-Whitney test, as appropriate.

Spearman’s rank order correlation coefficient was calculated to explore the strength and direction of associations between predefined variables of clinical interest and the thyroid remnant score as a surrogate for remnant size. The variables comprised age at ablation, DTC histology, number of primary tumours, T stage, primary tumour multifocality, maximum primary tumour diameter, N stage, TSH stimulation method, surgeon’s referral score, number of cervical lymph nodes excised, and number of metastatic cervical nodes excised. The referral score was assigned on a scale of 1–4 points based on the number of patients operated on by a given surgeon who were referred to BOCOC or PGH during the study period. The score served as a proxy for the surgeon’s experience in thyroid surgery. Surgeons referring <5 patients received a referral score of 1; 5–9 patients, a score of 2; 10–19 patients, a score of 3; and ≥20 patients, a score of 4. Age, maximum primary tumour diameter, number of cervical lymph nodes excised, number of excised cervical lymph nodes with DTC involvement, and referral and remnant scores were analysed as continuous variables. The following were analysed as categorical variables: DTC histology (follicular vs. classical variant papillary vs. follicular variant papillary vs. Hürthle cell), T stage (T1 vs. T2), multifocality (yes vs. no), N stage (N0 vs. N1 vs. Nx), and TSH stimulation method (THW vs. rhTSH).
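The referral score thresholds stated above can be expressed as a small lookup. The following Python is a minimal sketch for illustration, not the code used in the study; the function name `referral_score` and the rejection of counts below 1 are assumptions.

```python
def referral_score(patients_referred: int) -> int:
    """Map a surgeon's number of referred patients to the 1-4 referral score.

    Thresholds follow the text: <5 -> 1, 5-9 -> 2, 10-19 -> 3, >=20 -> 4.
    Rejecting counts below 1 is an assumption (every surgeon in the cohort
    referred at least one patient).
    """
    if patients_referred < 1:
        raise ValueError("a referring surgeon must have at least one patient")
    if patients_referred < 5:
        return 1
    if patients_referred < 10:
        return 2
    if patients_referred < 20:
        return 3
    return 4


# Example: scores at each threshold boundary.
for count in (1, 4, 5, 9, 10, 19, 20):
    print(count, referral_score(count))
```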

Two-tailed P values of the Spearman correlations were calculated based on a test statistic assuming a t-distribution with 334 degrees of freedom. Independent variables with a statistically significant Spearman correlation with the remnant score were then included in a multiple linear regression model, with the dependent variable remnant score as a continuous variable. A stepwise backward elimination procedure was applied by eliminating one at a time the factors with the highest P values ≥0.05.
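The test statistic referred to above is commonly computed as t = r·√((n − 2)/(1 − r²)), with n − 2 degrees of freedom, which gives 334 for a cohort of 336. The Python below is a hedged sketch of that standard formula rather than a reproduction of the SPSS procedure used in the study; the function name `spearman_t_statistic` is hypothetical, and the example coefficient of −0.2 is an arbitrary illustrative value, not a study result.

```python
import math
from typing import Tuple


def spearman_t_statistic(r_s: float, n: int) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for a Spearman coefficient r_s.

    Uses the usual large-sample approximation
        t = r_s * sqrt((n - 2) / (1 - r_s**2)),  df = n - 2.
    """
    if n < 3:
        raise ValueError("at least three observations are required")
    if not -1.0 < r_s < 1.0:
        raise ValueError("r_s must lie strictly between -1 and 1")
    df = n - 2
    t = r_s * math.sqrt(df / (1.0 - r_s ** 2))
    return t, df


# Illustrative input only: r_s = -0.2 is NOT a value reported in the study.
t_value, dof = spearman_t_statistic(-0.2, 336)
print(dof)      # 334
print(t_value)
```

A two-tailed P value would then be obtained from the t-distribution with the returned degrees of freedom, for example with a survival function such as `scipy.stats.t.sf`, if SciPy is available.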

Testing was two-tailed; P < 0.05 was considered to be statistically significant. SPSS version 18.0.1 (IBM SPSS, Armonk, NY) was used.

Results

Cohort characteristics: comparison between centres

Table 1 summarizes and compares key preablation characteristics, and Table 2, Tg and TgAb findings, by study centre. At both centres, roughly three-quarters of the patients were female, and patients tended to be in early middle age.

Table 1 Patient characteristics
Table 2 Tg and TgAb findings

The cohorts from the two centres differed significantly regarding DTC histology, T and N status before ablation, median longest diameter of the primary tumour, capsule infiltration, surgeons’ referral score, cervical lymph nodes excised before ablation, and rates of Tg <1 μg/L, TgAb positivity, and unavailable data regarding TgAb. Specifically, the BOCOC patients overwhelmingly comprised patients with the classical papillary histotype, whereas fewer than 40 % of the PGH patients had this characteristic – half had follicular variant papillary thyroid cancer. Virtually all the BOCOC patients had T1 disease, whilst slightly over one-quarter of the PGH patients had T2 disease. In line with these T status profiles, the median longest diameter of the primary tumour was 50 % larger in the PGH patients than in their BOCOC counterparts. However, the PGH patients tended to have less advanced and more frequently characterized capsule infiltration than did the BOCOC patients. Perhaps reflecting their larger median primary tumour size, PGH patients significantly less frequently had Tg <1 μg/L than did the BOCOC patients. TgAb positivity was significantly more frequent among the BOCOC patients than among the PGH patients, although it was frequent among patients from both centres. Virtually all BOCOC patients had available data regarding TgAb status, but about one in twenty PGH patients lacked such data.

The PGH thyroid surgeons operated on fewer patients than did the BOCOC surgeons; the PGH surgeons excised fewer cervical lymph nodes and less frequently removed such nodes. Related to the latter, the PGH cohort included a far larger percentage of patients with Nx disease than did the BOCOC cohort.

Cohort characteristics and risk classification

Despite favourable histotypes and primary tumour classifications, substantial proportions of patients at both centres had characteristics that suggested higher risk or uncertainty regarding risk, or both. Thus about one-third of each cohort had multifocal primary tumour. The largest tumour diameter in the thyroid was ≥1 cm in more than one-third of the BOCOC patients and in more than three-fifths of the PGH patients. Almost one-quarter of the BOCOC patients and almost one-fifth of the PGH patients had known cervical lymph node metastases before ablation. Reflecting frequently limited cervical lymph node excision, another quarter of BOCOC patients and a majority of PGH patients had unknown cervical lymph node status before ablation.

Regarding biochemistry, at ablation, more than two-fifths of BOCOC patients and almost one-third of PGH patients were TgAb-positive. Of TgAb-negative patients with available Tg measurements, 82.5 % of BOCOC patients (94/114), 90.7 % of PGH patients (78/86), and 86.0 % overall (172/200) had Tg concentrations ≥1 μg/L.

Figure 2 shows the overall pattern of patient referrals by surgeons at BOCOC and PGH. Altogether 90 surgeons performed thyroid surgery on the 336 patients. Only six of these surgeons (7 %) operated on ten or more patients, and 41 (46 %) operated on only a single patient.

Fig. 2 Numbers of patients operated on by each surgeon at a BOCOC, b PGH, and c both centres combined. At BOCOC, altogether 44 surgeons performed thyroid surgery on 198 patients; at PGH, altogether 46 surgeons performed thyroid surgery on 138 patients. At BOCOC, only 4 of the 44 surgeons (9 %) and at PGH, only 7 of the 46 surgeons (15 %) each operated on >5 % of patients

Thyroid remnant size

Table 3 presents data regarding thyroid remnant size, based on retrospective assessment of the number and intensity of foci of thyroid bed uptake on postablation planar scintigrams. Virtually all patients had evidence of gross postsurgical thyroid remnant (thyroid bed uptake was absent in only 4 % of the BOCOC patients and in 1.4 % of the PGH patients). Remnant was substantial in almost one-third of BOCOC patients and almost one-half of PGH patients, as reflected by remnant scores of 6, 8, 9 and 12, the four highest of nine possible scores.

Table 3 Thyroid remnant scores

Factors associated with thyroid remnant scores

Table 4 shows Spearman correlations between pre-defined factors of clinical interest and thyroid remnant scores for the overall cohort and by centre. Also presented are the results of a multivariate analysis of the relationships of these variables.

Table 4 Results of Spearman and multivariate regression analyses of factors potentially related to thyroid remnant score

In the univariate analysis, for the entire cohort and each centre’s cohort, the surgeon’s referral score and the number of cervical lymph nodes excised were significantly negatively correlated with the remnant score. T stage was significantly correlated with the remnant score for the entire cohort, but the relationship did not attain significance for either centre alone. N stage was significantly correlated with the remnant score for the entire cohort and the BOCOC cohort, but not the PGH cohort. Conversely, the number of metastatic cervical nodes excised was significantly negatively correlated with the remnant score for the entire cohort and the PGH cohort, but not the BOCOC cohort.

Backward multiple regression analysis revealed that the dependent variable, remnant score, could be independently predicted by the surgeon’s referral score for the entire cohort (P = 0.025) and for the BOCOC cohort (P = 0.017), but not for the PGH cohort (P = 0.307). The number of cervical lymph nodes excised independently predicted the remnant score for the entire cohort (P = 0.037) and the PGH cohort (P = 0.005), but not the BOCOC cohort (P = 0.35). T stage, N stage, and number of excised metastatic lymph nodes did not significantly predict remnant score for the entire cohort or for the individual centre cohorts for which these variables were tested. These findings suggest that surgeon experience or the completeness of neck surgery predicts postsurgical, preablation thyroid remnant size.

Discussion

This retrospective analysis of numerous patients with putatively “low–intermediate-risk” pT1–2 M0 DTC ablated over a recent period of about 3 years at two tertiary referral centres had three main findings. First, virtually all patients had at least one clinicopathological characteristic suggesting higher risk of recurrence, disease-specific mortality, or both, or creating uncertainty as to risk stratification. Thus according to three leading recent consensus statements, 61.3–80.6 % of our cohort would have had probable or definite indications for ablation (Table 5).

Table 5 Classification of the study population in relation to the indications for thyroid remnant ablation according to selected DTC clinical guidelines/consensus statements

Second, based on the number and intensity of thyroid bed uptake foci on postablation scans, virtually all (97 %) of our patients had gross thyroid remnant after surgery, and a considerable proportion (39.8 %) had a remnant score ≥6, i.e. one of the four highest of nine possible remnant size scores. This finding suggests that the decision to ablate was appropriate in most of our patients, if one accepts the desirability of eliminating thyroid remnant to improve the sensitivity of surveillance for recurrence using Tg testing, dxWBS, or both, or to remove sites of potential polyclonal malignant transformation [19]. Notably, even among the ten patients with a remnant score of 0, indicating no visible thyroid bed uptake on postablation scintigrams, six were TgAb-positive.

Third, in multivariate analyses, two surgery-related factors were significantly independently associated with thyroid remnant size for the entire cohort and for one of the individual centre cohorts. The first of these factors, the referral score, was based on the number of patients referred for ablation per surgeon. This variable presumably reflected surgeon experience in thyroid operations, at least in regions such as ours where few tertiary referral centres for DTC exist and surgeons routinely refer DTC patients for nuclear medicine evaluation. The second of these factors, the number of cervical lymph nodes dissected, presumably reflected the thoroughness of DTC excision. Both surgeon experience and completeness of surgery are well-documented as critical to patient outcome [20–25]. Indeed, our observations support the concept of radioiodine ablation as a “backstop” when an appreciable possibility exists of suboptimal surgery [13]. It should be noted, however, that each factor identified as significantly related to remnant size for the entire cohort was significant for only one of the two centre cohorts, and did not approach P = 0.05 for the other cohort. These observations may raise a caveat regarding the generalizability of the multivariate findings.

The prevalence of certain clinicopathological and data characteristics among our patients merits further comment. The high frequency of serum Tg ≥1 μg/L (observed in 172 of 200 evaluable patients, 86.0 %, i.e. TgAb-negative patients with available Tg data) is unsurprising given the appreciable remnant volume in most of our cohort. The high rate of TgAb positivity (38.1 %, 128 of 328 patients with TgAb data) may also reflect the presence of gross remnant. Additionally, given that these autoantibodies have a half-life of about 10 weeks [26] and the comparable median time between the thyroid operation and TgAb measurement in our patients, TgAb positivity in some patients may have been a transient response to surgically stimulated antigen release into the circulation. However, persistent TgAb positivity increases reliance on dxWBS for follow-up [27], and hence the desirability of remnant ablation.

The present quality assurance survey clearly revealed frequent limitations in surgical treatment in our cohort and deficiencies in histopathological reports regarding many patients. A total of 90 surgeons operated on our patients. Most referred only modest numbers of patients (median 2 [IQR 1–4] for the entire cohort), often creating uncertainty about the surgical protocol and the completeness of resection. A further source of doubt about the completeness of resection, as well as the true extent of disease, was the frequently non-existent or very limited cervical lymph node dissection (no nodes excised in 123 of the 336 patients, 36.6 %, only one or two nodes removed in another 33 patients, 9.8 %). Additionally, due to our surgical referral pattern, our patients’ histopathological reports were from many different institutions, depended on input from surgeons of diverse experience levels, and took many forms. Classifications regarding capsular invasion appeared to be particularly inexact. These observations regarding surgical referrals and histopathology reports suggest that programmes to improve DTC patient care in our regions would be valuable. Such programmes might include efforts to educate generalist surgeons and pathologists regarding DTC, to direct referrals for thyroid surgery to a smaller number of centres, and to standardize and ensure thoroughness of histopathology report contents.

Some limitations of our study should be considered. First, there were appreciable imprecision, heterogeneity and gaps in the data. For example, thyroid remnant size was assessed using an unvalidated, partly subjective scoring system. Our remnant score relied in part on quantification of foci of uptake. Since larger remnant foci with high uptake can effectively obscure further smaller foci, patients with large remnants may have had artificially low remnant scores. This phenomenon might have introduced bias by resulting in conservative remnant scoring. Nonetheless, our remnant size rating relied on the highly sensitive modality of rxWBS, and was conducted by four experienced nuclear medicine physicians. Additionally, our remnant size scoring system was chosen to reflect “real-life” conditions in our countries, where centres typically do not perform uptake measurement, SPECT/CT, or 123I scintigraphy. Furthermore, Tg and TgAb results might have been affected by variations in the time since surgery [IQR 52–136 days] and in TSH stimulation method (rhTSH in 50 of the 336 patients, 14.9 %). It should be noted, however, that treatment guidelines [14, 15] basing indications for ablation on Tg or TgAb levels make no distinctions regarding indications based on time since surgery or TSH preparation.

Another gap in our data was ultrasonography results. However, in the initial weeks after surgery, oedema and other signs of inflammation may confound sonogram interpretation. Moreover, a disadvantage of neck ultrasonography is that its accuracy is highly operator-dependent. Most importantly, heterogeneity, imprecision, and gaps in the data are inherent to a retrospective study. However, these limitations have the virtue of reflecting real-world conditions and real-world inputs to decision-making regarding whether or not to ablate, a main endpoint of our study.

It should be noted that the present investigation did not address outcomes or harm related to thyroid remnant ablation, including effects on patient quality of life, DTC recurrence, cause-specific or overall survival, or side effects. Additionally, the study took place at tertiary referral centres in the European Union. Our results may not be generalizable to other geographic areas, and may be conservative with respect to less specialized settings or more resource-constrained regions. Further, our cohort included only pT1–2 M0 DTC patients referred for nuclear medicine evaluation, and this might have biased patient inclusion towards those more likely to have indications for ablation. However, to our knowledge, our local surgeons refer essentially all pT1–2 M0 patients for nuclear medicine evaluation, suggesting that any such bias was unlikely to have materially affected our findings.

In conclusion, our large two-centre retrospective quality assurance study found that patients with putatively “low–intermediate-risk” DTC frequently had higher-risk features, or characteristics confounding risk stratification. This finding suggests that outside international centres of excellence, limitations in surgical experience and completeness and in histopathology reporting may cast important doubt on the classification of such patients as “low-risk” or “intermediate-risk”.

We also noted that our patients often had considerable thyroid remnant despite putative (near-)total thyroidectomy. Our observations suggest that “selective use” of radioiodine ablation even in pT1–2 M0 DTC patients may seldom be feasible outside international centres of excellence; this hypothesis is in line with the finding of a recent systematic review that applicability may be the greatest weakness of current DTC treatment guidelines [17]. However, surgeon experience, as reflected by the number of patients referred for possible ablation, and completeness of surgery, as reflected by the number of cervical lymph nodes excised, may aid in choosing ablation candidates.