Introduction

Radiofrequency ablation (RFA) is recommended by guidelines as a safe and effective treatment modality for patients with benign thyroid nodules [1, 2]. Many studies have shown that the volume reduction after ablation is significant and is accompanied by clinical improvement in local symptoms or cosmetic problems [3,4,5,6,7,8,9]. Although the anatomy of the thyroid gland and neck is complex, the reported complication rate after RFA for benign nodules is only 2.11%, with no life-threatening complications [10].

There are several parameters for post-ablation evaluation, such as the volume reduction rate (VRR), therapeutic success rate, and cosmetic and symptom scores, most of which are based mainly on gray-scale and color Doppler ultrasound (US). Recent studies have introduced some novel parameters. Dividing the total volume (Vt) of the ablated nodule after RFA into the necrotic ablated volume (Va) and the incompletely treated vital volume (Vv) [11, 12], Sim et al [11] found that an increase in Vv could be an early sign of nodule regrowth after RFA. A quantitative index, the initial ablation ratio (IAR), calculated as the ratio of Va to Vt at the first follow-up, predicted therapeutic success after RFA [13]. All these parameters require Va for calculation, which is identified on US as a decreased hypoechoic zone without vascularity in the treated nodule [11, 13]. However, the boundary between the ablated and vital zones is not easily differentiated on US, making Va measurement potentially inaccurate.
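To make the dependence of these parameters on Va concrete, the following minimal sketch computes Vv as Vt minus Va and IAR as the percentage ratio of Va to Vt. The function names and the example volumes are hypothetical and are not taken from the study data.

```python
def vital_volume(vt_ml: float, va_ml: float) -> float:
    """Vital volume Vv (ml): total volume Vt minus ablated volume Va."""
    if va_ml > vt_ml:
        raise ValueError("Va cannot exceed Vt")
    return vt_ml - va_ml


def initial_ablation_ratio(vt_ml: float, va_ml: float) -> float:
    """IAR (%): ratio of Va to Vt, both measured at the first follow-up."""
    if vt_ml <= 0:
        raise ValueError("Vt must be positive")
    return 100.0 * va_ml / vt_ml


# Hypothetical nodule: Vt = 4.0 ml, Va = 3.0 ml at the first follow-up.
vv = vital_volume(4.0, 3.0)             # 1.0 ml
iar = initial_ablation_ratio(4.0, 3.0)  # 75.0 %
```

Because both quantities are simple functions of Va, any overestimation of Va lowers Vv and raises IAR by the same mechanism.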

Contrast-enhanced ultrasound (CEUS) is a contrast harmonic imaging technique that allows the detection and characterization of focal lesions by assessing the microvascularization with a second-generation contrast agent [14,15,16,17]. Compared with US, CEUS is superior for detecting microvascular circulation dynamics and is useful for precisely defining the size and margins of the necrotic zone induced by ablation [18]. CEUS has been used to evaluate the completeness of ablation immediately after the procedure, but it has not been used to measure Va after thyroid ablation. To our knowledge, no study has compared the measurement of Va using US with that using CEUS during the follow-up of RFA for benign thyroid nodules.

Therefore, the purpose of this study was to investigate the intra- and inter-observer concordance and agreement between US and CEUS in measuring Va after RFA for benign thyroid nodules.

Materials and methods

This retrospective study was approved by the Institutional Review Board of Chinese PLA Hospital. Written informed consent was obtained from all the patients prior to RFA and CEUS.

Patients

All enrolled patients fulfilled the following inclusion criteria: (1) confirmation of benign nodule status on two separate fine-needle aspirations (FNA) or core-needle biopsies (CNB); (2) no suspicious malignant features on US examination; (3) solid or predominantly solid nodules; (4) indication to treat because of cosmetic problems, clinical symptoms, or rapid growth; (5) serum thyroid hormone and thyrotropin levels within normal ranges; (6) refusal of or ineligibility for surgery; (7) follow-up time of ≥ 12 months; (8) CEUS performed during the follow-up. Exclusion criteria were as follows: (1) malignant findings or follicular neoplasm on FNA or CNB; (2) nodules with benign results on FNA or CNB but with suspicion of malignancy on US; (3) follow-up time of < 12 months; (4) refusal of CEUS during the follow-up.

From August 2014 to December 2018, 517 patients with benign thyroid nodules underwent RFA in this institution. Among them, patients who refused CEUS (N = 295) or had a follow-up time of < 12 months (N = 103) were excluded. Finally, 173 patients with 190 benign thyroid nodules were enrolled in this study. The flowchart of patient enrolment is shown in Fig. 1.

Fig. 1 Flowchart of patient enrolment

Pre-ablation assessment

US and CEUS before and after RFA, as well as during follow-up, were performed using an Acuson Sequoia 512 (Siemens Healthineers) with a 15L8W linear array transducer, an iU22 (Philips Medical Systems) with an L12-5 linear array transducer, or an M9 (Mindray) with an L12-4 linear array transducer.

US-guided RFA was always performed using the Acuson Sequoia 512 with a 6L3 linear array transducer.

CEUS was used to evaluate the ablated zone of the nodule immediately after RFA and during follow-up. CEUS was performed after a bolus injection of 2.4 ml of SonoVue (Bracco), followed by a 5-ml normal saline flush.

Before treatment, the volume of thyroid nodules was calculated by the ellipsoid formula: V = πabc/6 (V is the volume, while a is the largest diameter, b and c are the other two perpendicular diameters). The nodules were divided into two subgroups (< 10 ml and ≥ 10 ml) based on the initial volume. Symptom score was self-measured by patients using a 10-cm visual analogue scale (grade 0–10) [2]. The cosmetic score was assessed by a physician (1, no palpable mass; 2, no cosmetic problem but palpable mass; 3, a cosmetic problem on swallowing only; and 4, a readily detected cosmetic problem) [2].
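As an illustration of the ellipsoid formula and the subgroup split described above, a minimal sketch follows. It assumes the three diameters are given in centimetres, so that the result in cubic centimetres equals millilitres; the function names and example diameters are hypothetical.

```python
import math


def ellipsoid_volume_ml(a_cm: float, b_cm: float, c_cm: float) -> float:
    """Nodule volume V = pi * a * b * c / 6, with diameters in cm (1 cm^3 = 1 ml)."""
    return math.pi * a_cm * b_cm * c_cm / 6.0


def size_subgroup(volume_ml: float) -> str:
    """Assign a nodule to the < 10 ml or >= 10 ml subgroup by initial volume."""
    return "< 10 ml" if volume_ml < 10.0 else ">= 10 ml"


# Hypothetical nodule with diameters 3.0 x 2.5 x 2.0 cm.
v = ellipsoid_volume_ml(3.0, 2.5, 2.0)  # about 7.85 ml
group = size_subgroup(v)                # "< 10 ml"
```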

Ablation procedure

All RFA procedures were performed by a US physician with more than 20 years of experience in thyroid US and interventional US (Y.K.L.). A bipolar RFA generator (CelonLabPOWER, Olympus Surgical Technologies Europe) and an 18-gauge bipolar RF electrode with a 0.9-cm active tip (CelonProSurge micro 100-T09, Olympus Surgical Technologies Europe) were used in this study.

Patients lay on an operating table in the supine position with the neck extended. Local anesthesia with 1% lidocaine was administered. RFA was performed using the trans-isthmic approach, hydrodissection technique, and moving-shot technique. CEUS was performed immediately after the RFA procedure to evaluate the ablation area. If any enhancement remained, complementary ablation could be performed. Each patient was observed for 1–2 h in the hospital, and any adverse events, including complications and side effects occurring during and immediately after ablation, were evaluated [19].

Post-ablation measurement of Va and assessment

Two physicians (observer A, Y.L. with more than 10 years’ experience in thyroid US and CEUS; observer B, X.J. with 3 years’ experience in thyroid US and CEUS) performed all the measurements. Before this study, the two observers standardized the measurement method. Va was defined as a decreased hypoechoic zone in the treated nodule without Doppler signal on US [11, 13]. On CEUS, it presented as a non-enhancement zone within the treated nodule during both the arterial phase and venous phase. The anteroposterior and transverse diameters of Va were measured on the transverse US image with the largest dimensions, and the longitudinal diameter was measured on the longitudinal US image with the largest dimensions. Va was measured with the calipers placed outside of the halo [20].

Patients were scanned consecutively by the two observers during the same visit. Only one observer was present in the ultrasound room at any time. For each patient, each observer performed a complete new set of scans for measurement, consisting of US and CEUS, without knowledge of the other observer’s results. During the examination, US was performed first. The three diameters of the hypoechoic zone in the treated nodule were measured twice, and the mean was calculated for each observer. The scanner was then switched to CEUS mode. The real-time microbubble perfusion within the ablated nodule and surrounding tissues was observed for a minimum of 2 min and recorded digitally for further analysis. After the CEUS images were reviewed, the three diameters of the non-enhancement zone during both phases were measured twice, and the mean was calculated for each observer. When both observers had finished their examinations, the mean for each measurement modality was calculated from the means of the two observers. Thus, a total of 6 measurements were obtained for each nodule at each follow-up period. The cost and measurement time of each modality were also recorded.

After RFA, patients were followed up at 1, 3, 6, 12 months, and every 12 months thereafter. The nodule volume, Va, VRR, cosmetic, and symptom scores were evaluated during the follow-up period. The volume reduction was calculated as follows: VRR = ([initial volume - final volume] × 100%)/initial volume. Therapeutic success was defined as a > 50% volume reduction at the last follow-up point [19].
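The VRR formula and the definition of therapeutic success can be sketched as follows. The function names and the example volumes are hypothetical and serve only to show the arithmetic.

```python
def volume_reduction_rate(initial_ml: float, final_ml: float) -> float:
    """VRR (%) = (initial volume - final volume) * 100 / initial volume."""
    if initial_ml <= 0:
        raise ValueError("initial volume must be positive")
    return (initial_ml - final_ml) * 100.0 / initial_ml


def is_therapeutic_success(vrr_percent: float) -> bool:
    """Therapeutic success: volume reduction strictly greater than 50%."""
    return vrr_percent > 50.0


# Hypothetical nodule shrinking from 10.0 ml to 2.0 ml at the last follow-up.
vrr = volume_reduction_rate(10.0, 2.0)  # 80.0 %
success = is_therapeutic_success(vrr)   # True
```

A nodule that regrows beyond its initial volume yields a negative VRR under this definition.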

Statistical analysis

Statistical analysis was performed using SPSS statistical software (version 25.0) and GraphPad Prism (version 8.0.0). Continuous data were expressed as mean ± SD (range). Wilcoxon signed-rank tests were used for pairwise comparisons. Reliability was defined as the extent to which measurements can be replicated, which reflects not only the degree of correlation but also the agreement between measurements [21]. The intra- and inter-observer reliability was assessed using the intraclass correlation coefficient (ICC) with 95% confidence intervals (CIs), based on an absolute-agreement, two-way random-effects model. Reliability was classified as follows: excellent (ICC > 0.90), good (ICC = 0.75–0.90), moderate (ICC = 0.50–0.74), and poor (ICC < 0.50) [21].
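For readers who wish to reproduce this type of analysis outside SPSS, the sketch below computes an absolute-agreement, two-way random-effects ICC from the usual ANOVA mean squares and applies the classification above. It assumes the single-measurement form, often written ICC(2,1); the text does not state whether single or average measures were used, and confidence intervals are omitted. The function names and the example data are hypothetical.

```python
def icc_2_1(ratings):
    """Single-measure, absolute-agreement, two-way random-effects ICC.

    ratings: list of rows, one row per nodule, one column per rater
    (or per repeated measurement for intra-observer reliability).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)             # between-subject mean square
    msc = ss_cols / (k - 1)             # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def classify_reliability(icc: float) -> str:
    """Map an ICC value to the categories used in this study [21]."""
    if icc > 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    return "poor"


# Hypothetical Va values (ml) for four nodules measured by two observers.
va = [[1.0, 1.1], [2.0, 2.2], [0.5, 0.4], [3.0, 3.1]]
print(classify_reliability(icc_2_1(va)))
```

Because the absolute-agreement form includes the between-rater mean square in the denominator, a systematic offset between observers lowers the ICC even when their measurements are perfectly correlated.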

The Bland-Altman analysis was used to evaluate the pairwise agreement of the two measurement modalities. Agreement was expressed as the mean difference with 95% limits of agreement (LOA, mean difference ± 1.96 SD). The mean difference, also called bias, was the tendency for one modality to underestimate or overestimate the measurement relative to the other [22]. LOA was the range within which 95% of the differences between measurements by the two modalities would lie [23] and expressed the absolute magnitude of the agreement between the two modalities. The width of LOA varied with the precision of the measurements: LOA was wider when measurements were imprecise and narrower when they were precise [24]. Before the Bland-Altman analysis, the Kolmogorov-Smirnov test was used to assess the normality of the distribution. If the distribution was non-normal, a logarithmic transformation was performed, and the Bland-Altman analysis was applied to the transformed data. The antilogarithm was then taken so that the mean difference and LOA were expressed as ratios of the measurements by the two modalities, which made the results of the Bland-Altman analysis easier to interpret [23, 24]. Moreover, the conclusion on agreement should be based on the width of LOA in comparison with a priori–defined clinical criteria [24, 25]. Because no study has evaluated the agreement of the two modalities in measuring Va, the clinical criteria for thyroid nodule volume measured by the ellipsoid formula, reported to be between ± 13.1% and ± 48.6% [26,27,28], were used as a reference. Therefore, acceptable agreement in this study was defined as an LOA ranging from 0.5 to 1.5. The intra- and inter-observer reliability and agreement analyses were performed on the total number of nodules and then on the subgroups defined by the initial volume before RFA, namely < 10 ml and ≥ 10 ml. Differences with p < 0.05 were considered statistically significant.
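The log-transformed Bland-Altman procedure can be sketched as follows. This is a minimal illustration under stated assumptions: natural logarithms and the sample SD (n − 1 denominator) are used, the difference is taken as US minus CEUS, and the normality check is omitted. The function names and the example measurements are hypothetical; only the 0.5 to 1.5 acceptance range and the reported LOA of 0.413 to 3.294 come from the text.

```python
import math
import statistics


def log_bland_altman(us_ml, ceus_ml):
    """Return (bias ratio, lower LOA ratio, upper LOA ratio) for US relative to CEUS.

    Differences are taken on the log scale and back-transformed, so each
    value is a multiplicative factor: US = factor * CEUS.
    """
    diffs = [math.log(u) - math.log(c) for u, c in zip(us_ml, ceus_ml)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (assumption)
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    return math.exp(bias), math.exp(lower), math.exp(upper)


def within_clinical_criteria(lower_ratio, upper_ratio, low=0.5, high=1.5):
    """True if the whole LOA lies inside the acceptable range of ratios."""
    return lower_ratio >= low and upper_ratio <= high


# Hypothetical Va measurements (ml) of five nodules by each modality.
us = [2.10, 1.58, 0.90, 3.20, 0.45]
ceus = [1.13, 0.94, 0.80, 2.90, 0.30]
ratio, lo, hi = log_bland_altman(us, ceus)
print(round(ratio, 3), round(lo, 3), round(hi, 3))

# The best LOA reported in this study (0.413 to 3.294) fails the criteria.
print(within_clinical_criteria(0.413, 3.294))
```

A bias ratio above one indicates that US tends to give larger values than CEUS, and the choice of logarithm base does not affect the back-transformed ratios.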

Results

The clinical characteristics of patients are presented in Table 1. A total of 173 patients (mean age 46.35 ± 12.06 years) with 190 benign thyroid nodules (initial volume of 9.90 ± 12.85 ml) were enrolled in this study. The number of nodules < 10 ml was 129 and ≥ 10 ml was 61. All patients underwent a single session ablation.

Table 1 Clinical characteristics of patients before RFA

During RFA, the mean power was 6.75 ± 2.95 W. The mean RFA time was 385.87 ± 269.32 s and the mean energy was 2390.83 ± 1944.19 J.

The costs of one US examination and one CEUS examination in our country were 16.96 USD and 142.28 USD, respectively. The mean measurement times of each modality by the two observers at each follow-up period are shown in Table 2. The measurement times by CEUS were significantly longer than those by US at each follow-up period (all p < 0.001).

Table 2 The measurement time of CEUS and conventional US by observers at each follow-up period

Efficacy

During a mean follow-up time of 23.17 ± 12.70 months (range 12–67 months), the volume decreased significantly from 9.90 ± 12.85 to 2.20 ± 4.51 ml (p < 0.001) with a VRR of 85.63 ± 14.27%. At the last follow-up, the therapeutic success rate was 97.37% (185/190). Symptom score significantly decreased from 2.71 ± 2.15 to 0.94 ± 1.16 (p < 0.001). The cosmetic score significantly decreased from 2.47 ± 1.20 to 1.33 ± 0.56 (p < 0.001).

Safety

All patients tolerated the RFA procedure. Side effects such as local pain occurred in 16 patients (9.25%) and resolved spontaneously within 3 days. No complications occurred during or after RFA. No patients had side effects or delayed complications related to CEUS.

Intra- and inter-observer reliability

The measurement methods of Va by CEUS and US are shown in Fig. 2, and the results of the two modalities are summarized in Table 3. The measurements by US during the follow-up were all significantly larger than those by CEUS (all p < 0.001). Representative cases are shown in Figs. 3 and 4. The intra- and inter-observer reliability of the two measurement modalities is presented in Table 4. The intra- and inter-observer reliability for all the nodules decreased over follow-up time: it was excellent at 1 month, good at 3–6 months, and moderate at 12–24 months. At 1 month, the inter-observer reliability was excellent in nodules < 10 ml (ICC = 0.902) but only good in nodules ≥ 10 ml (ICC = 0.894). The intra- and inter-observer reliability became moderate at 6 months in nodules ≥ 10 ml (ICC = 0.744) and at 12 months in nodules < 10 ml (ICC = 0.743).

Fig. 2 The measurements of ablated volume (Va) on US and CEUS images of a benign thyroid nodule at 6 months after RFA. a–c The longitudinal and transverse US images showed Va as a decreased hypoechoic zone (arrows) without color signal. d, e CEUS showed Va as a non-enhanced zone during both the arterial phase and the venous phase. Diagrams (f, g) showed the measurement method and the relationship of total volume (Vt), Va, and vital volume (Vv). Va measured by US (arrows) was 2.10 ml and Va measured by CEUS was 1.13 ml. Va measured by US was much larger than Va measured by CEUS

Table 3 The measurement of Va by CEUS and conventional US

Fig. 3 US and CEUS images of a benign thyroid nodule at 3 months after RFA. a–c US showed a relatively clear boundary between Va and Vv. However, Va (1.58 ml) measured by US (arrows) was much larger than Va (0.94 ml) measured by CEUS (d, e)

Fig. 4 US and CEUS images of a benign thyroid nodule at 24 months after RFA. a–c US showed that the boundary between Va and Vv was not easily differentiated. Va (0.04 ml) measured by US (arrows) was much larger than Va (0.01 ml) measured by CEUS (d, e)

Table 4 The intra-observer and inter-observer reliability of the two measurement modalities

Agreement

The Bland-Altman analysis of the measurements by the two modalities during the follow-up is shown in Table 5 and Fig. 5. After the antilogarithm was taken, the mean differences were all above one and became larger during the follow-up period. LOA also became wider during the follow-up period. The best agreement between the two modalities was in nodules < 10 ml at 1 month, with a mean difference of 1.166 and an LOA of 0.413 to 3.294. This means that for approximately 95% of cases, the measurements by US were between 0.413 and 3.294 times the measurements by CEUS, a range wider than the clinical criteria. The same interpretation applies to all LOAs reported hereinafter.

Table 5 Bland-Altman analysis of the agreement between conventional US and CEUS

Fig. 5 Bland-Altman plots of agreement of measurements between the two modalities during the follow-up. The Bland-Altman plots of Va by the two modalities at 1, 3, 6, 12, and 24 months after RFA are shown in (a) to (e). Logarithmic transformation was used to show the data. The x-axes show the log means of the two measurement modalities of Va. The y-axes show the log differences between the measurements. Solid lines are the mean difference (bias). Top and bottom dashed lines correspond to the upper and lower margins of the 95% limits of agreement (LOA)

Discussion

This study investigated the intra- and inter-observer reliability and agreement between US and CEUS in measuring Va after RFA for benign thyroid nodules. The results showed that Va measured by US was significantly larger than that measured by CEUS. The intra- and inter-observer reliability and agreement of the two modalities decreased over follow-up time. The intra- and inter-observer reliability became moderate 12 months after RFA. In terms of agreement, US overestimated the measurements compared with CEUS. The best agreement was found at 1 month, and even then LOA exceeded the clinical criteria. These results demonstrated that US was neither highly reliable nor equivalent to CEUS in the measurement of Va. Therefore, US could not replace CEUS for the measurement of Va.

Although CEUS is a superior method for precise definition of the ablated necrotic zone [18], it has not been widely used to measure Va after thyroid ablation. Some researchers believed that the margin of Va in the treated nodules could be easily detected on US [29]. Other studies found that it was difficult to clarify the boundary between the ablated and vital zones on US and that CEUS should be used [13, 30]. At present, no studies have investigated the reliability and agreement of US with CEUS in the measurement of Va after RFA. This study found that the intra- and inter-observer reliability of the two modalities decreased over the follow-up period. The intra-observer reliability of the two observers was similar, which indicated that the experience of the observer did not affect the reliability. The intra- and inter-observer reliability was good to excellent in nodules < 10 ml during the first 6 months and was good in nodules ≥ 10 ml during the first 3 months. This indicated that the reliability in nodules < 10 ml was better than that in nodules ≥ 10 ml. However, reliability became moderate in both subgroups at 12 months after RFA, which suggested that even in nodules with a small volume, the intra- and inter-observer reliability was not satisfactory 1 year after RFA.

The Bland-Altman analysis was used to evaluate the agreement between the two modalities. This study showed that the mean differences were all above one and became larger during the follow-up period. This indicated that the measurements by US were all overestimated, which was consistent with the significantly larger measurements by US compared with CEUS. Given that no studies have reported clinical criteria for the agreement of the two modalities, the conclusion on agreement in this study was based on the clinical criteria for thyroid nodule volume. Choi et al [27] performed Bland-Altman analysis to evaluate the agreement of thyroid nodule volume and found that the LOA was ± 13.1%. However, Lee et al [26] also used Bland-Altman analysis and reported a wider LOA of 44.1%. Brauer et al [28] used a logarithmic method and estimated an inter-observer variation of 48.6% in nodule volume, suggesting that volume changes of less than 50% should be regarded as measurement variation, as also recommended by the 2015 American Thyroid Association Guidelines [1]. Therefore, in this study, an LOA ranging between 0.5 and 1.5 was used as the clinical criterion. The best agreement in this study was found in nodules < 10 ml at 1 month, with an LOA ranging between 0.413 and 3.294, which was much wider than the clinical criteria. Moreover, the LOA for all the nodules became much wider during the follow-up period and always exceeded the clinical criteria, which indicated that the agreement between the two modalities was unsatisfactory.

Accurate detection and measurement of the true treated volume are essential for a successful evaluation [14]. In recent years, some novel parameters have emerged to evaluate the efficacy of ablation, all of which are based on the measurement of Va. Sim et al [11] found that an increase in Vv occurred about 1 year before nodule regrowth and suggested that it could be an early sign of regrowth. However, because Vv equals Vt minus Va, an overestimated Va with limited reliability and agreement would inevitably affect the evaluation of nodule regrowth. In addition, a quantitative index, IAR, was defined as the ratio of Va to Vt at the first month after RFA to predict the therapeutic success of ablation [13]. If IAR was larger than 70%, therapeutic success could be expected. Although the intra- and inter-observer reliability at 1 month was good to excellent, depending on the initial volume of the nodules, US showed unsatisfactory agreement with CEUS. Because Va measured by US was larger, the IAR could be overestimated in some nodules, which could affect the prediction of therapeutic success and the follow-up management. To date, there is no consensus about the indications for CEUS in benign thyroid nodules after thermal ablation. US, as a cost-effective and non-invasive modality, is the most common method to evaluate thyroid nodules, both before and after treatment [31]. However, its sensitivity and specificity are limited by the low contrast between blood and tissues [32], and color Doppler is not sensitive enough to detect slow and low-volume flow at the level of perfusion [33]. In contrast, although CEUS is a relatively expensive and time-consuming technique, it can overcome the limitations of US by displaying the parenchymal microvasculature and assessing vascularization and tissue perfusion with microbubble contrast agents [14, 16, 34].
CEUS can clearly differentiate the necrotic ablated zone within the treated nodule [14, 16, 34] and can therefore be an effective modality for the evaluation of efficacy. When Va is needed to calculate the novel parameters of efficacy, such as IAR for therapeutic success or Vv for prediction of regrowth, CEUS should be applied for accurate measurement. Moreover, studies have shown that several factors are related to nodule regrowth after ablation, such as a large initial volume [11, 12, 35, 36], solidity [36], location close to critical structures [12, 35, 37], abundant vascularity [35, 38], low energy applied per volume [35], and 12-month VRR < 50% [39]. If nodules with any of these factors are suspected of regrowth, CEUS should also be considered.

In terms of safety, CEUS has been performed safely in various applications with minimal risk to patients [17]. The US contrast agent has demonstrated an excellent safety profile with no specific renal, cerebral, or liver toxicity, which allows for repeated administration in the same session when needed [16, 33, 40]. Complications caused by CEUS are rare, and the most frequent adverse events are headache, nausea, chest pain, and chest discomfort [17]. In this study, all patients tolerated CEUS and no complications occurred.

There were some limitations in this study. First, it was a single-center study, and future multicenter studies are needed to confirm our findings. Second, the sample size was relatively small. Given the number of nodules in each subgroup, the nodules in this study were not divided into three subgroups as recommended by the recent reporting criteria for thyroid ablation [16]. Third, the follow-up time was relatively short. Nevertheless, this study showed that the intra- and inter-observer reliability and agreement decreased over a mean follow-up time of 23.17 ± 12.70 months. Follow-up of these patients will continue in order to draw further conclusions. Fourth, this study did not compare the intra- and inter-observer reliability and agreement between US, CEUS, and microvascular imaging techniques, such as superb microvascular imaging (SMI), in measuring Va. SMI, a recently developed modality, has good sensitivity for differentiating slow blood flow signals from movement artifacts within the lesion [41]. SMI not only displays microvasculature in thyroid nodules better than US but is also a convenient, noninvasive, and cost-effective technique for patients [41, 42]. Zhao et al [43] reported that SMI alone was sufficient for the evaluation of blood flow in thyroid nodules and that its diagnostic value was comparable to that of hypo-enhancement on CEUS for differentiating thyroid cancer. Further investigations comparing the different modalities for nodule measurements in the follow-up period after ablation are also needed.

In conclusion, the intra- and inter-observer reliability and agreement of US and CEUS in measuring Va were unsatisfactory. CEUS should be considered an effective modality when Va is needed for further evaluation or when nodule regrowth is suspected.