Introduction

Radioactive iodine (RAI) therapy has been administered after surgery to patients with differentiated thyroid carcinoma (DTC) for several decades. However, previous studies have reported conflicting results regarding its indications and the appropriate administered RAI activity [1,2,3,4]. The clinical usefulness of RAI therapy remains controversial, especially in low-risk and intermediate-risk patients [5, 6].

RAI therapy differs from chemotherapy or other adjuvant therapies used for other types of cancer in that it targets both residual cancer tissue (adjuvant therapy) and normal thyroid tissue (remnant ablation) [7]. Its therapeutic effect is influenced by many physiologic factors related to RAI avidity, as well as by clinical and pathologic factors. Because these factors have not been sufficiently considered in either prospective or retrospective studies, the indications for RAI therapy remain a matter of debate.

Unlike follow-up protocols focused on lesion size, such as the Response Evaluation Criteria in Solid Tumors (RECIST), the response assessment in patients with DTC after surgery and RAI therapy is based on a combination of thyroglobulin (Tg) measurement and imaging studies such as neck ultrasonography (US) or diagnostic radioiodine scan [8]. Because the assigned response category may vary depending on which tests are combined, these follow-up protocols may classify the same patient inconsistently. This variability could also affect which factors appear significant in predicting response.

In this study, we used a multicenter retrospective cohort to investigate whether predictive clinicopathologic factors are affected by different response criteria and how the clinical usefulness of RAI therapy should be evaluated when these variable factors are considered.

Materials and methods

Patients

This retrospective study included 4322 patients with DTC who underwent their first RAI therapy after surgery (total or near-total thyroidectomy with or without central and/or lateral neck compartment dissection) at 25 hospitals (n = 40–318 patients per hospital) between 2013 and 2014. All patients were 18 years of age or older and had no evidence of distant metastasis on pre- or post-therapy studies. Patients meeting any of the following criteria were excluded: interval between surgery and RAI therapy < 2 weeks or > 12 months (n = 22); administered RAI activity greater than 1.11 GBq but less than 3.7 GBq (n = 284); serum TSH level before RAI therapy lower than 30 mIU/L (n = 110); low-iodine diet for less than 1 week (n = 111); interval between RAI therapy and follow-up < 6 or > 24 months (n = 277); no follow-up diagnostic radioiodine scan or stimulated Tg (n = 1260); no follow-up neck US (n = 571); and abnormal anti-Tg antibody (TgAb) at follow-up (n = 124) (Fig. 1). Finally, a total of 1563 patients with DTC who underwent first RAI therapy after total or near-total thyroidectomy were retrospectively enrolled.
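As a quick consistency check, the exclusion counts can be tallied in code. This is a minimal sketch that assumes each excluded patient was counted under exactly one criterion (a sequential cascade, as in the flow diagram); the reason labels are paraphrases, not field names from the case report forms.

```python
# Exclusion counts reported in the Patients section (Fig. 1).
# Assumption: each patient is counted under exactly one criterion,
# which is what makes the counts additive.
SCREENED = 4322

EXCLUSIONS = {
    "surgery-to-RAI interval < 2 weeks or > 12 months": 22,
    "administered activity > 1.11 and < 3.7 GBq": 284,
    "pre-therapy TSH < 30 mIU/L": 110,
    "low-iodine diet < 1 week": 111,
    "RAI-to-follow-up interval < 6 or > 24 months": 277,
    "no follow-up radioiodine scan or stimulated Tg": 1260,
    "no follow-up neck US": 571,
    "abnormal TgAb at follow-up": 124,
}


def remaining_after_exclusions(screened: int, exclusions: dict[str, int]) -> int:
    """Subtract each excluded group in turn and return the number left."""
    remaining = screened
    for reason, n in exclusions.items():
        remaining -= n
        print(f"{reason:<50} -{n:>5} -> {remaining}")
    return remaining


if __name__ == "__main__":
    enrolled = remaining_after_exclusions(SCREENED, EXCLUSIONS)
    print("Enrolled:", enrolled)
```

Run as a script, it prints each step and ends at 1563, matching the enrolled cohort.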

Fig. 1

Flow diagram of patient selection. DxWBS, diagnostic iodine whole body scan; sTg, stimulated thyroglobulin

All clinical data from the 25 hospitals were collected and managed using electronic case report forms provided by the Internet-based Clinical Research and Trial Management System (iCReaT) of the Korean National Institute of Health. This study was approved by the institutional review board of each participating hospital, each of which waived the requirement for written informed consent.

Radioactive iodine (I-131) therapy

Enrolled patients received either low-dose (1.11 GBq) or high-dose (3.7 GBq or higher) RAI therapy. All patients were prepared for RAI therapy with recombinant human TSH (rhTSH) (Thyrogen, Sanofi Genzyme, Cambridge, MA, USA) or prolonged thyroid hormone withdrawal (THW). rhTSH (0.9 mg) was injected intramuscularly on 2 consecutive days while patients continued thyroid hormone replacement, and RAI was administered on the day after the second injection. THW consisted of discontinuing levothyroxine for 3–4 weeks (or liothyronine for 14 days). To limit exposure to environmental iodine, patients followed a low-iodine diet for 1–2 weeks before RAI therapy.
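The dose-group definitions can be expressed as a small classification rule. The sketch below is hypothetical: the function names, the numeric tolerance, and treating any activity other than 1.11 GBq below 3.7 GBq as unassigned are assumptions. It also converts activities to millicuries using the definition 1 mCi = 37 MBq, so 1.11 GBq and 3.7 GBq correspond to 30 mCi and 100 mCi.

```python
from typing import Optional

MBQ_PER_MCI = 37.0        # 1 mCi = 37 MBq by definition
LOW_DOSE_GBQ = 1.11       # low-dose group
HIGH_DOSE_MIN_GBQ = 3.7   # high-dose group: 3.7 GBq or higher


def gbq_to_mci(activity_gbq: float) -> float:
    """Convert an activity in GBq to mCi."""
    return activity_gbq * 1000.0 / MBQ_PER_MCI


def dose_group(activity_gbq: float, tol: float = 1e-6) -> Optional[str]:
    """Return 'low', 'high', or None when the activity falls outside both groups.

    Assumption: only exactly 1.11 GBq counts as low dose; anything else below
    3.7 GBq is treated as excluded (the study excluded > 1.11 and < 3.7 GBq).
    """
    if abs(activity_gbq - LOW_DOSE_GBQ) <= tol:
        return "low"
    if activity_gbq >= HIGH_DOSE_MIN_GBQ - tol:
        return "high"
    return None


for a in (1.11, 2.22, 3.7, 5.55):
    print(a, "GBq", round(gbq_to_mci(a)), "mCi", dose_group(a))
```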

Response assessment

The response assessment was performed 6–24 months after RAI therapy. The follow-up protocol included serum Tg measurement, neck US, and radioiodine scan. Neck US was performed by experienced radiologists, and diagnostic radioiodine whole-body scan was performed 1 day after I-123 administration (n = 883, 56.5%) or 2 days after I-131 administration. SPECT/CT was not performed in this study. Responses to therapy were defined according to the 2015 American Thyroid Association (ATA) management guidelines [8]: (1) excellent response (ER), negative imaging and either suppressed Tg < 0.2 ng/mL or TSH-stimulated Tg < 1 ng/mL; (2) indeterminate response (IR), non-specific findings on imaging studies, detectable suppressed Tg < 1 ng/mL, or stimulated Tg between 1 and 10 ng/mL; (3) biochemical incomplete response (BIR), negative imaging and suppressed Tg ≥ 1 ng/mL, stimulated Tg ≥ 10 ng/mL, or rising TgAb values; (4) structural incomplete response (SIR), structural or functional evidence of disease with any Tg level.
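The four definitions above can be sketched as a classification function. This is a hypothetical illustration, not the software used in the study. It makes three assumptions: the imaging labels are invented strings, "detectable" suppressed Tg is taken as ≥ 0.2 ng/mL (the ER cut-off), and when several categories apply the least favorable one wins (SIR > BIR > IR > ER). Passing `rai_scan=None` emulates the Tg + neck US protocol.

```python
from typing import Optional

# Hypothetical labels for imaging findings (not from the case report forms).
NEGATIVE, NONSPECIFIC, DISEASE = "negative", "nonspecific", "disease"


def classify_response(
    neck_us: str,
    rai_scan: Optional[str] = None,
    suppressed_tg: Optional[float] = None,   # ng/mL, on TSH suppression
    stimulated_tg: Optional[float] = None,   # ng/mL, after TSH stimulation
    tgab_rising: bool = False,
) -> str:
    """Assign 'ER', 'IR', 'BIR', or 'SIR' using the thresholds quoted in the text."""
    imaging = [neck_us] if rai_scan is None else [neck_us, rai_scan]
    for finding in imaging:
        if finding not in (NEGATIVE, NONSPECIFIC, DISEASE):
            raise ValueError(f"unknown imaging finding: {finding!r}")

    # SIR: structural or functional evidence of disease, any Tg level.
    if DISEASE in imaging:
        return "SIR"
    if suppressed_tg is None and stimulated_tg is None:
        raise ValueError("at least one Tg value is required")
    # BIR: suppressed Tg >= 1, stimulated Tg >= 10, or rising TgAb.
    if (suppressed_tg is not None and suppressed_tg >= 1.0) or (
        stimulated_tg is not None and stimulated_tg >= 10.0
    ) or tgab_rising:
        return "BIR"
    # IR: non-specific imaging, detectable suppressed Tg < 1, or stimulated Tg 1 to < 10.
    if NONSPECIFIC in imaging:
        return "IR"
    if suppressed_tg is not None and suppressed_tg >= 0.2:
        return "IR"
    if stimulated_tg is not None and stimulated_tg >= 1.0:
        return "IR"
    # ER: negative imaging and suppressed Tg < 0.2 or stimulated Tg < 1.
    return "ER"


# Same hypothetical patient: negative neck US, faint thyroid bed uptake, stimulated Tg 0.5 ng/mL.
print(classify_response(NEGATIVE, stimulated_tg=0.5))               # Tg + US protocol
print(classify_response(NEGATIVE, NONSPECIFIC, stimulated_tg=0.5))  # scan added
```

The usage lines show how the same patient is classified as ER without the scan and IR once non-specific uptake on the scan is taken into account.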

According to the response to therapy, patients were classified into an ER group and a non-ER group (IR, BIR, or SIR). In addition, responses were classified into an acceptable response (AR) group (ER or IR) and a non-AR group (BIR or SIR) to exclude the influence of non-specific imaging findings and to better reflect the correlation between short-term response to therapy and long-term prognosis.
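The two binary groupings amount to a mapping from the four response categories to a favorable (1) or comparison (0) outcome, which is the form an outcome variable takes in logistic regression. The scheme names and example patients below are hypothetical.

```python
from typing import Iterable

VALID_CATEGORIES = {"ER", "IR", "BIR", "SIR"}

# Categories counted as the favorable outcome under each binary scheme.
FAVORABLE = {
    "ER_vs_nonER": {"ER"},
    "AR_vs_nonAR": {"ER", "IR"},   # acceptable response = ER or IR
}


def binary_outcome(categories: Iterable[str], scheme: str) -> list[int]:
    """Map each response category to 1 (favorable group) or 0 (comparison group)."""
    favorable = FAVORABLE[scheme]
    outcome = []
    for cat in categories:
        if cat not in VALID_CATEGORIES:
            raise ValueError(f"unknown response category: {cat!r}")
        outcome.append(1 if cat in favorable else 0)
    return outcome


# Hypothetical example: five patients' response categories.
patients = ["ER", "IR", "BIR", "ER", "SIR"]
print(binary_outcome(patients, "ER_vs_nonER"))  # [1, 0, 0, 1, 0]
print(binary_outcome(patients, "AR_vs_nonAR"))  # [1, 1, 0, 1, 0]
```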

Study design

Response to therapy was evaluated with two different protocols based on a combination of biochemical and imaging studies: (1) serum Tg and neck US and (2) serum Tg, neck US, and radioiodine scan. We investigated whether the distribution of the four response categories could be affected by the combination of biochemical and imaging studies. We examined which factors, including patient age, sex, tumor size, multiplicity, extrathyroidal extension, T and N categories, preparation methods (THW vs. rhTSH), and administered RAI activity (low dose vs. high dose), were associated with response to therapy (ER vs. non-ER or AR vs. non-AR). Tumors and lymph node metastases were classified using the staging system of the Union for International Cancer Control (UICC) and the seventh edition of the American Joint Committee on Cancer (AJCC) staging manual.

Statistical analyses

Descriptive quantitative data are presented as means ± standard deviations or ranges. Qualitative data are expressed as percentages. The differences in variables were evaluated by Student’s t tests and chi-square tests for continuous and categorical variables, respectively. Multivariate logistic regression analysis was performed to identify factors significantly predicting response to therapy. P values lower than 0.05 were considered statistically significant. The analysis was performed using IBM SPSS for Windows®, version 21.0 (IBM Corp., Armonk, NY, USA).
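As a reading aid, the odds ratios (ORs) and 95% confidence intervals (CIs) reported from the logistic regression correspond to the quantities below under the usual Wald formulation. We assume Wald intervals (the SPSS default for Exp(B)) and that $p_i$ is the probability of the favorable outcome (ER or AR) for patient $i$; the exact coding of the outcome and covariates is not stated here. $L_j$ and $U_j$ denote the lower and upper CI bounds.

```latex
\ln\frac{p_i}{1-p_i} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij},
\qquad
\mathrm{OR}_j = e^{\hat\beta_j},
\qquad
[L_j,\,U_j] = \exp\!\left(\hat\beta_j \pm 1.96\,\widehat{\mathrm{SE}}(\hat\beta_j)\right),
\qquad
\widehat{\mathrm{SE}}(\hat\beta_j) = \frac{\ln U_j - \ln L_j}{2 \times 1.96}
```

Each $\mathrm{OR}_j$ is the multiplicative change in the odds of the outcome per unit increase in $x_{ij}$ (or relative to the reference category for a categorical factor), holding the other covariates fixed.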

Results

Patient characteristics

The patients’ characteristics are summarized in Table 1. Most patients were women (n = 1226, 78.4%), and ages ranged from 18 to 82 years (mean, 49.2 years). The histological DTC subtype was papillary in 1516 cases (97.0%). The mean diameter of the primary tumor was 1.3 ± 0.9 cm (range, 0.1–12.0 cm). According to the AJCC staging system, the most common pathologic categories were T3 (n = 1002, 64.1%) and N1a (n = 738, 47.2%). The interval between surgery and RAI therapy was 76.8 ± 40.0 days (range, 19–361 days). In preparation for RAI therapy, 1031 patients (66.0%) underwent THW. High-dose (3.7 GBq or higher) RAI therapy was administered to 1121 patients (71.7%). The interval between RAI therapy and follow-up was 272.4 ± 73.0 days (range, 180–725 days).

Table 1 Patients’ characteristics (n = 1563)

Distribution of response to therapy depending on the combination of biochemical and imaging studies

When response to therapy was evaluated using serum Tg and neck US, 1196 patients (76.5%) were classified into the ER group and 273 (17.5%) into the IR group based on the 2015 ATA management guidelines (Fig. 2). The BIR and SIR groups included 55 (3.5%) and 39 (2.5%) patients, respectively. In contrast, when response was assessed with serum Tg, neck US, and radioiodine scan, 932 patients (59.6%) were classified into the ER group and 530 (33.9%) into the IR group. The distribution of response categories differed significantly according to whether radioiodine scan was included (P < 0.001). This suggests that a substantial number of patients in the ER group were reclassified into the IR group when radioiodine scan was added to the follow-up protocol (the ER group decreased by 264 patients and the IR group increased by 257).
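The reported counts can be used to reproduce the percentages and to illustrate the size of the shift between protocols. This is a sketch under stated assumptions, not necessarily the authors' exact computation: for the protocol with radioiodine scan only ER and IR counts are given, so BIR and SIR are pooled as non-AR (1563 − 932 − 530 = 101), and the Pearson chi-square below treats the two protocols as independent rows even though they are the same patients. A paired test (for example, Stuart–Maxwell) would require the patient-level cross-tabulation, which is not reported. With 2 degrees of freedom the chi-square upper-tail probability is exactly exp(−x/2).

```python
import math

N = 1563
# Counts per protocol; non-AR pools BIR and SIR.
tg_us = {"ER": 1196, "IR": 273, "nonAR": 55 + 39}
tg_us_scan = {"ER": 932, "IR": 530, "nonAR": N - 932 - 530}


def percentages(counts: dict[str, int]) -> dict[str, float]:
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


def chi_square_2xk(row_a: dict[str, int], row_b: dict[str, int]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table."""
    total_a, total_b = sum(row_a.values()), sum(row_b.values())
    grand = total_a + total_b
    stat = 0.0
    for k in row_a:
        col = row_a[k] + row_b[k]
        for observed, row_total in ((row_a[k], total_a), (row_b[k], total_b)):
            expected = row_total * col / grand
            stat += (observed - expected) ** 2 / expected
    return stat, len(row_a) - 1


def chi2_sf_df2(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 2 df (exp(-x/2))."""
    return math.exp(-stat / 2)


# Net change per category (not individual transitions, which are unreported).
shift = {k: tg_us_scan[k] - tg_us[k] for k in tg_us}
stat, df = chi_square_2xk(tg_us, tg_us_scan)
assert df == 2, "closed-form tail probability below is valid only for df = 2"
p = chi2_sf_df2(stat)

print(percentages(tg_us), percentages(tg_us_scan))
print(shift)
print(f"chi-square = {stat:.1f}, df = {df}, P = {p:.1e}")
```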

Fig. 2

Distribution of response to therapy depending on the combination of biochemical (thyroglobulin, anti-thyroglobulin antibody, TSH) and imaging studies (n = 1563). a Biochemical study and neck ultrasonography. b Biochemical study, neck ultrasonography, and diagnostic radioiodine scan. ER, excellent response; IR, indeterminate response; BIR, biochemical incomplete response; SIR, structural incomplete response

Clinicopathologic factors for the prediction of response to therapy (Tg and neck US)

Univariate analysis was performed to evaluate the clinical or pathologic variables significantly associated with an ER or AR based on the follow-up protocol using serum Tg and neck US (Table 2). An ER was observed in 1196 patients (76.5%). Female sex (P < 0.001), smaller tumors (P < 0.001), single lesions (P = 0.039), no extrathyroidal extension (P = 0.002), and the T (P = 0.003) and N (P < 0.001) categories significantly predicted an ER. However, preparation methods (P = 0.713) and administered RAI activity (P = 0.872) did not significantly predict an ER.

Table 2 Clinicopathologic factors for the prediction of therapeutic response (Tg and neck US)

An AR was observed in 1469 patients (94.0%). Older age (P = 0.011), smaller tumors (P = 0.024), no extrathyroidal extension (P = 0.049), and the T (P = 0.011) and N (P < 0.001) categories significantly predicted an AR. However, sex (P = 0.138), multiplicity (P = 0.362), preparation methods (P = 0.262), and administered RAI activity (P = 0.738) did not significantly predict an AR.

Clinicopathologic factors for the prediction of response to therapy (Tg, neck US, and radioiodine scan)

Univariate analysis was also performed to evaluate the clinical or pathologic variables significantly associated with an ER or AR based on the follow-up protocol using serum Tg, neck US, and radioiodine scan (Table 3). An ER was observed in 932 patients (59.6%). Female sex (P = 0.002), the N category (P = 0.022), preparation methods (P < 0.001), and administered RAI activity (P < 0.001) significantly predicted an ER. However, tumor size (P = 0.114), multiplicity (P = 0.109), extrathyroidal extension (P = 0.102), and the T category (P = 0.128) did not significantly predict an ER.

Table 3 Clinicopathologic factors for the prediction of therapeutic response (Tg, neck US, and iodine scan)

An AR was observed in 1462 patients (93.5%). Older age (P = 0.026), smaller tumors (P = 0.033), no extrathyroidal extension (P = 0.029), and the T (P = 0.005) and N (P < 0.001) categories significantly predicted an AR. In contrast, sex (P = 0.119), multiplicity (P = 0.407), preparation methods (P = 0.342), and administered RAI activity (P = 0.742) did not significantly predict an AR.

Multivariate analysis of response to therapy prediction-related parameters

In the multivariate logistic regression analysis (Table 4) based on serum Tg and neck US only, female sex (odds ratio (OR) 1.520; 95% confidence interval (CI) 1.147–2.014; P < 0.004), tumor size (OR 1.222; 95% CI 1.061–1.407; P = 0.005), the N category (OR 4.122; 95% CI 2.776–6.122; P < 0.001), and administered RAI activity (OR 1.737; 95% CI 1.268–2.379; P = 0.001) significantly predicted an ER. In contrast, only the N category (OR 7.429; 95% CI 3.539–15.595; P < 0.001) and administered RAI activity (OR 2.378; 95% CI 1.384–4.086; P = 0.002) significantly predicted an AR.

Table 4 Multiple logistic regression analysis of therapeutic response prediction-related parameters

When all three tests (serum Tg, neck US, and radioiodine scan) were considered in the multivariate logistic regression analysis, female sex (OR 1.403; 95% CI 1.084–1.817; P = 0.010), the N category (OR 2.085; 95% CI 1.459–2.981; P < 0.001), preparation method (OR 2.129; 95% CI 1.687–2.685; P < 0.001), and administered RAI activity (OR 3.854; 95% CI 2.917–5.091; P < 0.001) significantly predicted an ER, whereas age (OR 0.982; 95% CI 0.966–0.999; P = 0.044), the N category (OR 5.582; 95% CI 2.803–11.115; P < 0.001), and administered RAI activity (OR 2.205; 95% CI 1.304–3.731; P = 0.003) significantly predicted an AR.
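Because Table 4 reports ORs with 95% CIs, their internal consistency can be checked: for a Wald interval the OR is the geometric mean of the CI bounds, and the standard error of the log OR can be recovered from the CI width. The sketch below assumes Wald intervals (symmetric on the log scale) and uses only the rounded published values, so the recovered P values are approximate; `wald_from_ci` is a hypothetical helper name.

```python
import math


def wald_from_ci(odds_ratio: float, lower: float, upper: float,
                 z_crit: float = 1.959964) -> tuple[float, float, float]:
    """Back-calculate SE(log OR), the z statistic, and a two-sided P value.

    Assumes a 95% Wald interval that is symmetric on the log-odds scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, z, p


# ORs and 95% CIs for ER prediction with serum Tg and neck US, as reported.
er_tg_us = {
    "female sex": (1.520, 1.147, 2.014),
    "N category": (4.122, 2.776, 6.122),
    "RAI activity": (1.737, 1.268, 2.379),
}

for name, (or_, lo, hi) in er_tg_us.items():
    se, z, p = wald_from_ci(or_, lo, hi)
    print(f"{name:<13} OR={or_:.3f} sqrt(L*U)={math.sqrt(lo * hi):.3f} "
          f"SE={se:.3f} z={z:.2f} P~{p:.4f}")
```

For female sex, the back-calculated P is about 0.0035, consistent with the reported value. For intervals whose bounds lie very close to 1, rounding of the published values to three decimals can make the recovered P noticeably imprecise.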

Discussion

In this large multicenter retrospective study, we investigated whether predictive clinicopathologic factors are affected by different response criteria and how the clinical usefulness of RAI therapy should be evaluated when variable factors are considered. The distribution of response to therapy differed according to the follow-up protocol: the proportion of patients in the ER group decreased significantly when radioiodine scan was added. The clinical impact of the preparation method and sex also differed depending on the follow-up protocol or the classification of the response criteria. However, in multivariate analysis, a high dose of RAI was a significant predictor of a favorable response regardless of the follow-up protocol or the classification of the response criteria.

RAI (I-131) therapy has been administered to patients with DTC after total thyroidectomy to facilitate initial staging and the detection of recurrent disease, or to improve disease-free and disease-specific survival by treating suspected or persistent disease [8]. Although RAI therapy is usually recommended for these purposes, its potential benefits and the optimal selection of patients have not been consistent across the available studies [9,10,11]. Some authors have reported a benefit of RAI therapy in patients with non-metastatic microcarcinomas [3, 4], whereas other studies have shown no benefit [1, 5]; there is, however, a loose tendency for studies with larger cohorts and longer follow-up to report improved outcomes [9, 10, 12].

Administered RAI activity and preparation method have been central issues in evaluating the effectiveness of RAI therapy. In ATA low-risk disease, the ablation success rate with an administered activity of 1.11 GBq has been reported to be non-inferior to that with 3.7 GBq after preparation with either THW or rhTSH [13, 14]. Recently, long-term follow-up of a prospective trial showed that 98% of patients with low-risk DTC who received RAI therapy had no evidence of disease, independent of the preparation method or administered RAI activity, confirming that the combination of rhTSH and low-dose (1.1 GBq) RAI can be safely used in these patients [15]. In contrast, another study showed that low-dose RAI may lead to a worse prognosis than high-dose RAI, even in patients with low-risk DTC [16]. This is one reason why the European Association of Nuclear Medicine, in contrast to the 2015 ATA management guidelines, advised caution in altering long-established and successful practice until sufficient evidence shows that postoperative RAI therapy can be safely omitted in patients with non-microcarcinoma DTC [17].

Although previous studies have addressed the same disease with the same therapeutic tools, the variability of their results indicates that many sources of heterogeneity exist in patient selection and response assessment in DTC [18]. Therefore, it is very important to identify RAI-related factors and adjust for them to evaluate the effectiveness of RAI therapy precisely. The absorbed dose of RAI delivered to a suspicious lesion is influenced by many physiologic factors, such as kidney function, blood volume, height, and weight, in addition to sodium/iodide symporter (NIS) expression [19]. In prospective studies, various clinical and pathologic factors should be considered in randomization, but some of these factors are not directly related to the characteristics of RAI, and others may be underestimated. Furthermore, many patients enrolled in prospective studies may have had very limited remnant tissue before RAI therapy and might have had a good prognosis with surgery alone. Nevertheless, factors reflecting the amount of remnant thyroid or occult malignant tissue (such as Tg) were not considered in the design of several prospective trials [13,14,15].

To compare the effectiveness of low-dose and high-dose RAI therapy, it is also important to adjust simultaneously for variable physiologic, clinical, and pathologic factors. Sabra et al. reported that older patients with papillary thyroid cancer and nodal metastasis tended to respond better to escalating administered RAI activities [19], possibly indicating that RAI metabolism differs with patient age, which should be considered when evaluating the effectiveness of RAI therapy. Our multivariate analysis showed that administered RAI activity predicted both ER and AR regardless of the follow-up protocol, although it was not significant in the univariate analyses (except for ER prediction with radioiodine scan). Although our cohort included some high-risk patients, their proportion was very small, and adjustment for multiple factors such as age, sex, and the T and N categories could have affected these results.

A higher administered RAI activity is expected, on average, to yield greater therapeutic efficacy by delivering higher absorbed doses to target lesions [20,21,22]. At the same time, the absorbed dose to non-target tissues also increases with administered activity, potentially leading to a higher incidence of adverse effects such as sialadenitis [23]. The selection of administered RAI activity should therefore balance therapeutic efficacy against potential side effects. Although our results showed that a high administered activity of RAI predicted a favorable response, well-designed, long-term follow-up studies are necessary before a specific range of administered activity can be recommended for selected patients, and patient selection could be based on the factors discussed in this study.

The preparation method significantly predicted an ER only when radioiodine scan was included in the follow-up protocol, suggesting that rhTSH preparation was associated with a lower ER rate independently of administered RAI activity and other factors. In clinical practice, thyroid bed uptake is sometimes observed on follow-up radioiodine scan even when stimulated serum Tg is undetectable, and several studies have reported a relationship between the use of rhTSH and visible uptake on diagnostic radioiodine scan [24, 25]. According to the ATA guidelines, DTC patients with non-specific thyroid bed uptake on follow-up radioiodine scan are classified as IR, even if stimulated serum Tg is undetectable. In our study, many patients in the ER group were reclassified into the IR group when radioiodine scan was added to the follow-up protocol, possibly because of such thyroid bed uptake. In contrast to ER prediction, rhTSH preparation was not significantly associated with AR. AR may therefore be more closely related to long-term prognosis, given that some studies have found no remarkable differences in clinical outcomes among DTC patients with thyroid bed uptake despite undetectable stimulated Tg levels [26].

The significance of risk factors other than the preparation method could also depend on how the response criteria are classified. Female sex predicted an ER but not an AR, regardless of the follow-up protocol (Table 4). Men have been reported to have less favorable outcomes, although thyroid cancer is more common in women than in men [27]. When the response is dichotomized as ER (stimulated Tg < 1 ng/mL) versus non-ER, only patients with very low Tg levels are included in the ER group; the non-ER group therefore becomes larger, and the discriminative power of risk factors such as sex may increase. In contrast, when the response is dichotomized as AR versus non-AR, the discriminative power of risk factors may decrease substantially because the non-AR group is very small (94 patients [6.0%] with serum Tg and neck US, and 101 patients [6.5%] with radioiodine scan added).
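The imbalance between the two dichotomizations can be made concrete from the reported counts. The sketch below (a hypothetical helper, not part of the study's analysis) computes the size of the smaller outcome group under each binary scheme. A common rule of thumb for logistic regression is roughly 10 outcome events per candidate predictor, so a small group limits how many factors can be assessed reliably. The predictor count of 9 is an assumption based on the factors listed in the Study design and ignores extra dummy variables for multi-level T and N categories.

```python
# Reported numbers of patients in the favorable group (ER or AR) per protocol.
N_TOTAL = 1563
FAVORABLE_COUNTS = {
    ("Tg + US", "ER"): 1196,
    ("Tg + US", "AR"): 1469,
    ("Tg + US + scan", "ER"): 932,
    ("Tg + US + scan", "AR"): 1462,
}
# Assumption: one model parameter per factor listed in the Study design.
N_PREDICTORS = 9


def smaller_group(n_favorable: int, n_total: int = N_TOTAL) -> tuple[int, float]:
    """Return the size and percentage of the smaller of the two outcome groups."""
    smaller = min(n_favorable, n_total - n_favorable)
    return smaller, round(100 * smaller / n_total, 1)


for (protocol, scheme), n_fav in FAVORABLE_COUNTS.items():
    n_small, pct = smaller_group(n_fav)
    print(f"{protocol:<15} {scheme} vs non-{scheme}: smaller group {n_small} ({pct}%), "
          f"~{n_small / N_PREDICTORS:.1f} events per predictor")
```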

Our study has several limitations. First, although we attempted to adjust for various risk factors using multivariate analysis, some selection bias was inevitable given the retrospective design. In particular, the low-dose group included only patients who received 1.11 GBq, whereas the high-dose group included patients who received 3.7 GBq or higher; this may have affected the results and makes it difficult to recommend a specific administered RAI activity for a favorable outcome. Second, serum Tg and TgAb were measured at each participating hospital without central assessment. Follow-up radioiodine scan data also could not be centrally collected or reviewed, although typical radioiodine scan patterns were categorized and shared with the investigators at each hospital. To minimize bias in image interpretation, we additionally used the AR vs. non-AR classification, and administered RAI activity significantly predicted AR as well as ER. Finally, several known prognostic factors, such as pre-ablation Tg, capsular invasion, characteristics of metastatic lymph nodes, and mutation profiles, were not included in this study because these data were highly heterogeneous among the participating hospitals. Well-designed, long-term follow-up studies are needed to overcome these limitations.

Conclusion

The distribution of response to therapy may differ according to the follow-up protocol. In particular, the proportion of patients in the IR group increased significantly when radioiodine scan was added to the follow-up protocol. The clinical impact of factors related to response prediction, such as the preparation method for TSH stimulation and sex, differed depending on the follow-up protocol or the classification of response criteria. In multivariate analysis, a high dose of RAI was a significant predictor of a favorable response regardless of the follow-up protocol or classification of response criteria, suggesting that adjustment for multiple factors is crucial to evaluate the clinical effectiveness of RAI therapy precisely.