Introduction

Differentiated thyroid carcinoma is the most common endocrine neoplasm worldwide, and its incidence has risen in recent years [1, 2]. In Brazil, it is currently the fifth most common neoplasm in women [3]. The most frequent histological type is papillary thyroid carcinoma (PTC), which generally shows less aggressive behavior and low mortality [4].

The most frequently used treatment for PTC is total thyroidectomy (TT) followed by iodine-131 (131I) therapy. After initial treatment, patients are followed up through periodic clinical and ultrasound examinations and, above all, by monitoring serum thyroglobulin (Tg) concentrations. This thyroid-specific glycoprotein is a highly sensitive and specific tumor marker. It can be measured with normal/low thyroid-stimulating hormone (TSH) levels (basal thyroglobulin, BTg) or with elevated TSH (stimulated thyroglobulin, STg), the latter obtained endogenously or by administration of recombinant TSH [4]. Antithyroglobulin antibody (TgAb) levels should always be measured together with Tg, since these antibodies may interfere with the Tg assay [5].

Although most PTC cases have a favorable course, some progress unsatisfactorily [6]. Staging systems have therefore been used to predict which cases will require more aggressive therapy and closer follow-up. After thyroidectomy, patients are initially staged for risk of recurrence according to the system proposed by the American Thyroid Association (ATA), and for risk of death according to the American Joint Committee on Cancer (AJCC) TNM system [4]. Although useful for predicting recurrence and mortality, these systems assess the patient only at a single initial time point and do not take into account changes occurring during follow-up or the response to interventions. To address this issue, a more recent dynamic classification system based on the response to initial therapy has been proposed, which can be used both at the start of and throughout long-term follow-up [4, 7]. In this system, patients are classified into four response categories (excellent, biochemical incomplete, structural incomplete, and indeterminate) according to imaging results and serum TgAb concentrations, but mainly according to Tg concentrations [4, 7]. Although Tg is central to assessing response to therapy, the guidelines do not specify whether it should be measured with normal/low TSH (BTg) or with elevated TSH (STg) [4], probably because they consider the two approaches comparable.
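As an illustration of how Tg drives this classification, the sketch below assigns a response category from a single Tg value. It is a minimal sketch, not the guideline itself: the cut-offs (BTg < 0.2 ng/mL or STg < 1 ng/mL for an excellent response; BTg > 1 ng/mL or STg > 10 ng/mL for a biochemical incomplete response) are assumptions taken from the commonly cited ATA-2015 criteria rather than from this text, and the names `Response` and `classify_by_tg` are hypothetical. Imaging and TgAb criteria, and therefore the structural incomplete category, are deliberately left out.

```python
from enum import IntEnum


class Response(IntEnum):
    """Response categories ordered from best (0) to worst (2)."""
    EXCELLENT = 0
    INDETERMINATE = 1
    BIOCHEMICAL_INCOMPLETE = 2


# Assumed ATA-2015 Tg cut-offs in ng/mL (not given in this text; verify against [4]).
BASAL_EXCELLENT_BELOW = 0.2       # also the lower detection limit of the assay used here
BASAL_INCOMPLETE_ABOVE = 1.0
STIMULATED_EXCELLENT_BELOW = 1.0
STIMULATED_INCOMPLETE_ABOVE = 10.0


def classify_by_tg(tg: float, stimulated: bool) -> Response:
    """Classify the response to therapy from one Tg value (ng/mL).

    Assumes a TgAb-negative patient without structural disease; imaging and
    TgAb criteria are ignored, so 'structural incomplete' is never returned.
    """
    if stimulated:
        excellent_below = STIMULATED_EXCELLENT_BELOW
        incomplete_above = STIMULATED_INCOMPLETE_ABOVE
    else:
        excellent_below = BASAL_EXCELLENT_BELOW
        incomplete_above = BASAL_INCOMPLETE_ABOVE
    if tg < excellent_below:
        return Response.EXCELLENT
    if tg > incomplete_above:
        return Response.BIOCHEMICAL_INCOMPLETE
    # Detectable Tg between the two cut-offs (boundary handling is an assumption).
    return Response.INDETERMINATE


# Basal Tg below the detection limit vs. a stimulated Tg of 1.0 ng/mL:
print(classify_by_tg(0.1, stimulated=False).name)  # EXCELLENT
print(classify_by_tg(1.0, stimulated=True).name)   # INDETERMINATE
```

Under these assumed cut-offs, the same patient can move from excellent to indeterminate with only a small absolute rise in Tg once stimulation is used.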

The aim of this study was therefore to evaluate PTC patients treated with TT and 131I to establish whether the response to therapy 1 year after initial treatment changed when STg was used instead of BTg and, when a change was observed, to determine which classification was better associated with patient outcome.

Patients and methods

This observational, retrospective cohort study was approved by the Research Ethics Committee of Botucatu Faculty of Medicine (FMB), São Paulo State University “Júlio de Mesquita Filho” (UNESP) (CAAE no. 28223620.1.0000.5411; protocol no. 3.823.265) and is reported according to the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines.

Study population and selection criteria

Data from 575 patients with differentiated thyroid carcinoma followed up at a tertiary center specializing in the treatment of thyroid carcinoma were initially analyzed. We included cases that had undergone initial treatment with TT and 131I between January 2006 and December 2015, a period during which the same Tg assay (lower detection limit < 0.2 ng/mL) was always used and which allowed a minimum follow-up of 36 months. Exclusion criteria were: nonadherence to the follow-up protocol; failure to undergo the proposed tests or loss to follow-up; and positive TgAb, defined as a serum antibody concentration greater than 20.0 IU/mL (chemiluminescence, Immulite 2000, Siemens, Llanberis, Gwynedd, UK). We also excluded patients with structural disease detected by ultrasound (US) or other imaging tests at the 1-year follow-up, and those with tumors classified as NIFTP (noninvasive follicular thyroid neoplasm with papillary-like nuclear features). For this purpose, tumor material from cases with an anatomopathological diagnosis of the follicular variant of PTC was reassessed for the diagnostic criteria of NIFTP. In total, 148 cases were selected (Fig. 1), a number that exceeded the minimum obtained from the sample size calculation. Briefly, given that 52 to 73% (mean 62%) of differentiated thyroid carcinoma cases show no evidence of disease at their last visit [8, 9], and assuming a 95% confidence level and an 8% margin of error, the minimum sample size was 141 patients.
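The sample size stated above can be reproduced, approximately, with the usual normal-approximation formula for estimating a single proportion, n = z²·p·(1 − p)/e². The snippet below assumes this formula (the text does not name it) with p = 0.62, e = 0.08, and a two-sided 95% confidence level; `sample_size_for_proportion` is a hypothetical helper.

```python
from statistics import NormalDist


def sample_size_for_proportion(p: float, margin: float, confidence: float = 0.95) -> float:
    """Minimum n to estimate a proportion p within +/- margin (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value, ~1.96
    return z ** 2 * p * (1 - p) / margin ** 2


# Expected proportion without disease at the last visit (mean 62%), 8% margin, 95% confidence.
n = sample_size_for_proportion(p=0.62, margin=0.08)
print(f"{n:.1f}")  # 141.4
```

The result is about 141.4, which matches the minimum of 141 patients reported above when the fractional part is dropped; rounding up, the more conservative convention, would give 142, still below the 148 cases selected.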

Fig. 1

Flowchart of the patient selection process. DTC: differentiated thyroid carcinoma; 131I: radioactive iodine; NIFTP: noninvasive follicular thyroid neoplasm with papillary-like nuclear features; PTC: papillary thyroid carcinoma; TgAb+: positive antithyroglobulin antibody; TT: total thyroidectomy

Variables of interest and outcomes

The main variable of interest was the response to therapy 1 year after initial treatment, classified as excellent, biochemical incomplete, or indeterminate [4] using BTg and, separately, STg. Additional tests routinely performed to assess response to therapy were US and 131I whole-body scans. The main outcome was the change in the 1-year response to therapy when STg was used instead of BTg, together with the percentage of cases whose classification changed. STg was measured 1 month after BTg, under endogenous TSH stimulation 30 days after levothyroxine withdrawal.
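To make the main outcome concrete, the sketch below tallies changed, worsened, and improved classifications from paired (BTg-based, STg-based) responses, treating excellent < indeterminate < biochemical incomplete as the order from best to worst. The function name and data layout are illustrative assumptions; the example rebuilds only the 28 reclassified cases from the transition counts reported in the Results section.

```python
from collections import Counter

# Response categories ranked from best (0) to worst (2).
RANK = {"excellent": 0, "indeterminate": 1, "biochemical incomplete": 2}


def summarize_changes(pairs):
    """Count changed, worsened, and improved classifications.

    `pairs` holds one (btg_response, stg_response) tuple per patient.
    """
    transitions = Counter(pairs)
    worsened = sum(n for (b, s), n in transitions.items() if RANK[s] > RANK[b])
    improved = sum(n for (b, s), n in transitions.items() if RANK[s] < RANK[b])
    changed = worsened + improved
    return {
        "changed": changed,
        "worsened": worsened,
        "improved": improved,
        "pct_changed": 100 * changed / len(pairs),
        "pct_worsened_of_changed": 100 * worsened / changed if changed else 0.0,
    }


# The 28 reclassified cases, rebuilt from the reported transitions.
reclassified = (
    [("excellent", "indeterminate")] * 12
    + [("excellent", "biochemical incomplete")] * 4
    + [("indeterminate", "biochemical incomplete")] * 1
    + [("indeterminate", "excellent")] * 11
)
summary = summarize_changes(reclassified)
print(summary["worsened"], summary["improved"])      # 17 11
print(round(summary["pct_worsened_of_changed"], 1))  # 60.7
```

Passing the full cohort, including unchanged pairs such as ("excellent", "excellent"), would make `pct_changed` the percentage of all patients who were reclassified.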

Clinical course was also evaluated according to the patient's clinical status at the final evaluation (excellent, biochemical incomplete, structural incomplete, or indeterminate response) and disease-free survival time (the number of months of follow-up during which an excellent response was maintained, and its percentage of total follow-up time). The following variables were also recorded: age at surgery (years); gender; neck dissection; anatomopathological data (histological subtype, tumor diameter, presence of a capsule, capsule invasion, multicentricity, lymphocytic thyroiditis, lymph node metastasis, vascular invasion, perineural invasion, and soft tissue invasion); AJCC/TNM stage (8th edition; stages I to IVB) [10]; ATA-2015 risk of recurrence (low, intermediate, or high) [4]; 131I treatment (first and cumulative dose, in mCi); presence of metastases; TSH levels at the time of BTg and STg measurement; and follow-up time (months).

TSH and Tg levels were determined by chemiluminescence (DPC, Los Angeles, CA, USA; reference values 0.40–4.0 μIU/mL and 0.83–68.0 ng/mL, respectively). The analytical and functional sensitivities of the Tg assay were 0.2 and 0.9 ng/mL, respectively (for values higher than 2 ng/mL).

Statistical analysis

Data were tabulated in an Excel® spreadsheet (Microsoft Corporation, USA) and analyzed using SPSS for Windows, version 21. Continuous variables were described as means and standard deviations (SD) or as medians and quartiles (p25–p75), and categorical variables as numbers and percentages. Because this was a retrospective study, not all parameters were available for every case; percentages were therefore calculated from the number of cases with available information rather than from the total number of patients. Student's t test was used for age and Poisson's test for variables related to 131I dose. A generalized linear model with a gamma distribution was fitted for variables with asymmetric distributions (tumor diameter, BTg, TSH, and STg concentrations, follow-up time, and percentage of follow-up time), followed by Wald multiple comparisons. The chi-square test was used for categorical variables, and column proportions were additionally compared between groups with a test of differences in proportions. For response to therapy, logistic regression was used to assess the odds of an excellent response at the last visit according to an excellent response at 1 year, evaluated with BTg or STg [odds ratio (OR); 95% confidence interval (95% CI)]. Kaplan–Meier curves were drawn for follow-up time under excellent response, according to the 1-year response to therapy obtained with BTg or STg, and were compared using the log-rank test with Sidak-adjusted pairwise comparisons. Receiver operating characteristic (ROC) curves were also drawn for BTg and STg measured 1 year after initial treatment to identify the most sensitive and specific cut-off points for predicting an excellent response at the final visit. P < 0.05 was considered statistically significant.
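For a single binary predictor, such as an excellent versus a nonexcellent response at 1 year, univariate logistic regression yields the same odds ratio and Wald 95% CI as the cross-product ratio of a 2 × 2 table with Woolf's standard error. The sketch below uses that shortcut instead of fitting a model; `odds_ratio_wald` is a hypothetical helper, and the counts in the usage example are invented for illustration, not study data.

```python
import math
from statistics import NormalDist


def odds_ratio_wald(a: int, b: int, c: int, d: int, confidence: float = 0.95):
    """Odds ratio with a Wald (Woolf) confidence interval from a 2x2 table.

    a: excellent at 1 year and excellent at last visit
    b: excellent at 1 year, not excellent at last visit
    c: not excellent at 1 year, excellent at last visit
    d: not excellent at 1 year, not excellent at last visit
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the log odds ratio undefined")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error of log(OR)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Invented counts, for illustration only.
or_, lower, upper = odds_ratio_wald(a=20, b=10, c=10, d=20)
print(f"OR = {or_:.2f}; 95% CI: {lower:.2f}-{upper:.2f}")
```

Because the interval is built on the log scale, it is asymmetric around the OR, as in the intervals reported in Table 3.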

Results

Descriptive data are shown in Table 1. Considering BTg, the response to therapy 1 year after initial treatment (TT + 131I) was excellent in 94 (67.1%), indeterminate in 43 (30.7%), and biochemical incomplete in three (2.1%) patients. Considering STg, it was excellent in 89 (63.6%), indeterminate in 43 (30.7%), and biochemical incomplete in eight (5.7%) patients. Twenty-eight cases (20.4%) changed classification when the STg-based response was compared with the BTg-based response. Seventeen patients (12.4%) showed a worse response: 12 changed from excellent to indeterminate and four from excellent to biochemical incomplete, while one changed from indeterminate to biochemical incomplete. Conversely, 11 patients (7.4%) showed a better response with STg; all of them had an initially indeterminate response and were reclassified as having an excellent response.

The patients were followed up for 83.43 ± 32.64 (mean ± SD) months. At the last visit, the response to therapy was excellent in 102 (69.9%), indeterminate in 40 (27.4%), biochemical incomplete in three (2%), and structural incomplete in one (0.7%) patient; that is, 145 cases (99.3%) showed no evidence of structural disease at final follow-up (Table 1).

When the 28 cases whose response to therapy changed with STg were compared with the 109 cases without change (Table 2), the groups differed in the presence of perineural invasion (14.3% vs. 0.0%; p = 0.017); first 131I dose (148.21 ± 48.08 vs. 141.36 ± 49.06 mCi; p = 0.007); cumulative 131I dose (155.36 ± 61.37 vs. 147.45 ± 66.94 mCi; p = 0.002); serum STg concentration (12.68 ± 56.37 vs. 2.23 ± 13.39 ng/mL; p < 0.0001); excellent (39.3% vs. 69.7%; p = 0.006) or biochemical incomplete (17.8% vs. 2.7%; p = 0.009) response to therapy at 1 year using STg; worsening of the response to therapy (60.7% vs. 0.0%; p < 0.001); and percentage of follow-up time under excellent response (41.22 ± 24.43% vs. 49.16 ± 28.56%; p < 0.0001). When cases whose 1-year response worsened were compared with those without worsening, the groups differed in BTg (0.21 ± 0.01 vs. 0.44 ± 0.22 ng/mL, respectively; p < 0.0001), STg (20.55 ± 72.07 vs. 0.52 ± 0.28 ng/mL, respectively; p < 0.0001), and excellent (88.2 vs. 9.1%, p = 0.0001) or indeterminate (90.9 vs. 11.8%, p = 0.0002) 1-year response using BTg. These groups also differed in excellent (5.9 vs. 90.9%, p < 0.0001) or indeterminate (64.7 vs. 9.1%) 1-year response using STg (Supplementary Material, Table 1).

Table 1 Clinical, laboratory, histological, and evolutionary data of patients
Table 2 Comparison between patients with or without changes in the classification of response to therapy 1 year after initial treatment when stimulated thyroglobulin was used in relation to baseline thyroglobulin, regarding clinical, laboratory, histological, and evolutionary variables

Regarding clinical status at the final visit, the response to therapy at 1 year, with both BTg (p < 0.0001) and STg (p < 0.0001), was associated with the response to therapy at the final assessment. Of the three patients with a biochemical incomplete response at 1 year with BTg, two received no additional therapy and maintained a biochemical incomplete response, while one received an additional dose of radioiodine and had an indeterminate response at the last consultation. Of the eight cases with a biochemical incomplete response at 1 year with STg, two maintained this response at the last evaluation, two progressed to an excellent response, and four to an indeterminate response (Supplementary Material, Table 2). Only one of these eight cases received an additional dose of radioiodine, and this patient had an indeterminate response at the last evaluation. An excellent response to therapy at 1 year with STg (OR = 4.61; 95% CI: 2.13–9.98) was associated with higher odds of an excellent response at the final visit than an excellent response with BTg (OR = 2.84; 95% CI: 1.33–6.06; Table 3).

An excellent response to therapy at 1 year, with both BTg (Fig. 2A) and STg (Fig. 2B), was associated with a longer time with maintained excellent response (p < 0.0001), which differed statistically from that of cases with a biochemical incomplete or indeterminate response at 1 year. With BTg, biochemical incomplete differed from excellent (p < 0.0001) and excellent differed from indeterminate (p = 0.0036), whereas biochemical incomplete and indeterminate did not differ (p = 0.2681). With STg, excellent differed from both biochemical incomplete and indeterminate (p < 0.0001), whereas biochemical incomplete and indeterminate did not differ (p = 0.2189).

Table 3 Univariate logistic regression to assess the chance of excellent response in the last assessment, according to the excellent response to therapy at 1 year, using basal or stimulated thyroglobulin
Fig. 2

Kaplan–Meier curves for time under excellent response, according to the response to therapy at 1 year using basal or stimulated thyroglobulin. A Basal thyroglobulin (ng/mL): log-rank chi-square = 36.0326; DF = 2; p < 0.0001. Pairwise comparisons (Sidak): B ≠ E, p < 0.0001; E ≠ I, p = 0.0036; B and I did not differ (p = 0.2681). B Stimulated thyroglobulin (ng/mL): log-rank chi-square = 30.1067; DF = 2; p < 0.0001. Pairwise comparisons (Sidak): B ≠ E and E ≠ I, p < 0.0001; B and I did not differ (p = 0.2189). B: biochemical incomplete therapeutic response; E: excellent therapeutic response; I: indeterminate therapeutic response
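The curves in Fig. 2 are Kaplan–Meier estimates. A minimal product-limit sketch follows, assuming that the event is loss of excellent response and that patients still in excellent response at their last visit are censored; the text does not spell out this coding, and `kaplan_meier` is a hypothetical helper run on invented data. The log-rank comparison and the Sidak adjustment are not reproduced.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t), returned as (time, survival) at each event time.

    times:  follow-up in months for each patient
    events: 1 if excellent response was lost at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_removed = 0
        # Pool every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed  # events and censored cases both leave the risk set
    return curve


# Invented data: five patients, three of whom lose the excellent response.
print(kaplan_meier([6, 12, 12, 24, 36], [1, 1, 0, 1, 0]))
```

Patients censored at the same time as an event are kept in the risk set for that time, which is the usual convention for this estimator.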

The ROC curve did not identify an adequate BTg cut-off point for predicting an excellent response at the final visit (area under the curve: 0.462; 95% CI: 0.347–0.576; p = 0.484; Fig. 3A). In contrast, an STg cut-off point of 0.27 ng/mL showed 66% sensitivity and 60% specificity for predicting this outcome, with an area under the curve (accuracy) of 0.644 (p = 0.009; Fig. 3B).
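One common way to choose such a cut-off is to maximize Youden's index (sensitivity + specificity − 1); the text does not state which criterion was used, so this is an assumption. The sketch below computes sensitivity and specificity at every observed value, calls a case positive (predicted excellent response) when Tg is at or below the cut-off, and estimates the area under the curve as the probability that a patient with the outcome has a lower Tg than one without it. Function names and data are hypothetical.

```python
def roc_table(scores, labels):
    """(cut-off, sensitivity, specificity) at every observed score.

    A case is called positive (predicted excellent response, label 1)
    when its score is at or below the cut-off.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    rows = []
    for cut in sorted(set(scores)):
        sensitivity = sum(s <= cut for s in pos) / len(pos)
        specificity = sum(s > cut for s in neg) / len(neg)
        rows.append((cut, sensitivity, specificity))
    return rows


def youden_cutoff(scores, labels):
    """Row of roc_table that maximizes Youden's J = sensitivity + specificity - 1."""
    return max(roc_table(scores, labels), key=lambda row: row[1] + row[2] - 1)


def auc_lower_is_positive(scores, labels):
    """AUC as P(score of a positive < score of a negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p < n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented STg values (ng/mL); label 1 = excellent response at the final visit.
stg = [0.1, 0.2, 0.3, 0.8, 0.25, 0.6, 0.9]
excellent = [1, 1, 1, 1, 0, 0, 0]
print(youden_cutoff(stg, excellent))          # (0.2, 0.5, 1.0)
print(auc_lower_is_positive(stg, excellent))  # 0.75
```

If higher values predicted the outcome instead, both comparisons would simply be reversed.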

Fig. 3

ROC (receiver operating characteristic) curves for predicting an excellent response to therapy at the last appointment. A Basal thyroglobulin (ng/mL) (area under the curve: 0.462; 95% confidence interval: 0.347–0.576; p = 0.484). B Stimulated thyroglobulin (ng/mL) (area under the curve: 0.644; 95% confidence interval: 0.529–0.759; p = 0.009; cut-off = 0.27 ng/mL; sensitivity = 66%; specificity = 60%)

Discussion

In this study of PTC patients, the response to therapy 1 year after initial treatment changed in about 20% of cases when STg was used instead of BTg, and the STg-based response was a better predictor of an excellent response to therapy at the final consultation.

The response-to-therapy assessment proposed by Tuttle et al. in 2010 has proved valuable in the follow-up of PTC patients [7]. Although serum Tg measurement is one of its main components, it remains controversial whether Tg should be measured during levothyroxine therapy (BTg) or after TSH stimulation (STg). In recent years, increasingly sensitive Tg assays have led to questions about the real need for STg; however, this matter has not been fully clarified. In the present study, which compared the response to therapy 1 year after initial treatment using STg and BTg in patients treated with TT and 131I, the classification changed in 20.4% of cases with STg, and the response worsened in about 60% of these. These results agree with those of Rosário et al., who, evaluating only patients with excellent or indeterminate responses, also observed that STg altered the response to therapy in approximately 24% of cases, with worsening in approximately 30% of patients [11]. It is important to emphasize that most cases with a worse response changed from excellent to indeterminate with STg. This may reflect only a slight rise in Tg, for example from < 0.2 ng/mL to 1.0 ng/mL, whose clinical significance is debatable. Indeed, although the mean STg in the cases whose response changed was 12.68 ng/mL, the median was 0.97 ng/mL. Nevertheless, even if the clinical relevance of such small changes in Tg can be questioned, the reclassification followed the criteria currently applied to assess response to therapy. Unexpectedly, about 40% of the cases whose response changed showed an improvement with STg; the reasons for this finding remain to be clarified.

We also observed that the response to therapy 1 year after initial treatment, with both BTg and STg, was associated with the response to therapy at the final visit. However, this association was stronger with STg: the odds ratio for an excellent final response associated with an excellent 1-year response was about 1.6 times higher with STg than with BTg (4.61 vs. 2.84). The prognostic value of STg in differentiated thyroid carcinoma has been observed previously by other authors. When measured after thyroidectomy and before 131I ablation, STg has been shown to predict both the risk of structural disease [12] and death in the most severe cases [13]. In addition, when assessed 6 to 12 months after 131I, it was associated with the chance of an excellent response and disease-free status at the final visit [5]. The response to therapy at 1 year has also been shown to predict long-term outcome [14]. However, the question that generally persists is whether the response to therapy at 1 year is a better predictor of outcome with STg than with BTg, as observed in the present study. A recent prospective study comparing BTg with STg after 131I therapy observed that BTg < 0.1 ng/mL indicates a low risk of recurrence and does not require stimulation [15]. However, when BTg is between 0.1 and 0.2 ng/mL, stimulation would be beneficial for reclassifying patients and predicting their long-term course [15]. That study differed from ours in that it did not specifically assess the change in response to therapy with STg relative to BTg; in addition, it established BTg ranges and evaluated the marker 3 to 6 months after 131I, whereas we evaluated it 1 year after initial treatment. Our findings disagree with those of another recent study, which observed that BTg was a better predictor of long-term structural response [16]. However, that study was designed differently from ours, which makes comparison difficult: for example, the authors measured STg only postoperatively, before 131I therapy, and used only BTg during follow-up. Despite some overlap between STg and BTg values in the follow-up of PTC patients, some authors have reported that tumor burden is underestimated if patients are followed using BTg alone [17]. In contrast, other authors have reported that the differences in sensitivity and specificity between STg and BTg are not significant and that BTg is therefore a better parameter for defining an excellent response because of its stability, low cost, and convenience [18]. However, those authors measured STg at different time points from those of the present study.

We also observed that an excellent response to therapy at 1 year, with both BTg and STg, was associated with a longer time under excellent response. These findings agree with those of other authors, who reported that an excellent response at 1 year was associated with a good long-term prognosis in patients with high-risk differentiated thyroid carcinoma [19]; however, they evaluated progression-free and disease-specific survival rather than time under excellent response. Other authors have reported that, when recurrence occurs in patients with an excellent response, it tends to occur much later [20]. In this respect, the mean follow-up time in our study was relatively long compared with that of similar studies. We found no studies evaluating the association between an excellent response at 1 year and the time during which the excellent response was maintained.

Considering only the serum STg concentration 1 year after initial treatment, the 0.27 ng/mL cut-off point showed a sensitivity of 66%, a specificity of 60%, and an accuracy (area under the curve) of 0.644 for predicting an excellent response at the final visit. Similarly, Ha et al. reported high sensitivity and specificity for predicting a structural incomplete response using recombinant TSH-stimulated Tg, with a higher cut-off of about 2 ng/mL, approximately 6 to 12 months after 131I therapy [21]. In addition, high STg concentrations have been associated with worse survival in patients with differentiated thyroid carcinoma older than 55 years classified as having a high risk of recurrence [3].

Comparing cases in which the response to therapy at 1 year changed with those in which it did not, we observed a higher percentage of perineural invasion in the former group. However, the clinical relevance of this finding is debatable, as there was only one such case in the first group versus none in the second. Higher percentages of biochemical incomplete response, higher 131I doses, and higher serum STg concentrations 1 year after initial treatment were also observed in the group with changes. These findings were expected, as they are associated with more severe disease and, consequently, worse prognosis [22, 23]. In addition, cases with an altered response spent less time under excellent response, and those with a worse response had lower BTg than those without. In other words, a change in response when STg is used to assess response to therapy is associated with a worse course, and these patients may have lower BTg, which could mislead the interpretation of these cases.

This study has some limitations. Its retrospective design may be associated with the loss of important data and a consequent reduction in the number of cases studied; however, the cases selected had the main information required. In addition, the relatively modest sample size may have influenced the statistical results, although it exceeded the planned minimum sample size. Another limitation relates to the initial treatment: because all patients underwent TT and 131I, the results cannot be extrapolated to patients treated otherwise. This is particularly relevant because the therapeutic approach to PTC has become increasingly conservative, with partial thyroidectomies and omission of radioiodine therapy. However, this selection criterion standardized the patient sample, reducing other possible sources of interference in the results; moreover, the dynamic classification system proposed by the ATA [4] was developed considering patients treated with this approach. A further limitation is that many of the evaluated tumors were relatively small (median diameter 1.0 cm), and most of these patients would now probably be treated with lobectomy alone. The results of this study should therefore be interpreted in this context and may apply only to smaller tumors. Despite these limitations, our study has the merit of evaluating a rigorously selected series, with laboratory measurements always performed with the same methodology and with long minimum and mean follow-up times (3 and 7 years, respectively), which is particularly relevant for monitoring PTC patients, who are known to have indolent behavior and late relapses.

In conclusion, in this group of PTC patients treated with TT and 131I, the response to therapy 1 year after initial treatment changed in approximately 20% of cases when assessed with STg instead of BTg (with a worse response in around 12% of patients), and the STg-based response was a better predictor of an excellent response at the final consultation. However, given the retrospective nature of this study, prospective studies are needed to confirm these results.