Introduction

Differentiated thyroid cancer (DTC) is the most common endocrine cancer, and its incidence has increased continuously over the last three decades worldwide, including in Korea [1, 2]. Although DTC is perceived as a cancer with a high survival rate, this sustained worldwide increase in incidence has recently attracted considerable attention. DTC treatment employs a three-tiered approach comprising surgery, radioactive iodine treatment, and long-term exogenous hormone replacement [3, 4]. Despite the excellent prognosis, a minority of patients experience recurrence [5]. Therefore, DTC care places an inevitable demand on health care resources and imposes an economic burden on society.

During follow-up, a serum marker (thyroglobulin; Tg) and imaging workup (neck ultrasonography and iodine whole-body scan) are used to identify persistent or recurrent disease. When ultrasonography findings are suspicious, fine-needle aspiration (FNA) is generally required to confirm or exclude recurrence cytologically [6]. However, cytological analysis of FNA samples (FNA-C) may be inadequate or even false-negative, especially in lymph nodes (LN) with small metastases, partial involvement, or cystic changes [7]. To improve the diagnostic yield of FNA, several authors have proposed directly measuring the Tg concentration in the washout fluid of the aspiration needle (FNA-Tg), beginning with the first study by Pacini et al. [8]. Although FNA-Tg serves as an important standard for evaluating suspected recurrent or metastatic lesions [9], the cutoff value that defines a “high” result is still controversial. Moreover, differences in sample treatment, Tg assays, and reference values make it difficult to compare studies.

Therefore, we meta-analyzed previous studies of the diagnostic performance of FNA-Tg in patients with thyroid cancer in the preoperative or postoperative setting, and we suggest optimal cutoff values of FNA-Tg.

Materials and methods

Data search and study selection

We performed a systematic search of MEDLINE (inception to October 2013) and EMBASE (inception to October 2013) for English-language publications using the keywords “thyroid,” “aspiration,” “washout,” and “thyroglobulin.” All searches were limited to human studies. We included studies in which thyroid FNA was used in the initial workup (preoperative) or during follow-up (postoperative), FNA-Tg was measured in washout fluid rinsed with 1 ml of normal saline, and the reference standard was based on pathologic reports. We excluded studies that washed the needle and syringe with a volume of normal saline other than 1 ml or with a different solution. Reviews, abstracts, and editorial materials were also excluded. Two authors conducted the searches and screening independently. Any discrepancies were resolved by consensus.

Quality assessment

Two reviewers independently assessed study quality using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [10]. QUADAS-2 assesses the quality of included studies in terms of risk of bias and applicability to the clinical question across four domains: patient selection, index test, reference standard, and flow/timing. Each domain was rated as low, high, or unclear.

Data synthesis and statistical analysis

For each study, we constructed a 2 × 2 contingency table of true-positive, false-positive, false-negative, and true-negative results. Pooled sensitivity and specificity were estimated for each cutoff, and heterogeneity was assessed with the I² statistic, as described by Higgins et al. [11]. Summary receiver operating characteristic (SROC) curves with pooled sensitivity and specificity were estimated using the Moses–Littenberg method [12]. To avoid calculation problems caused by zero values, 0.5 was added to each cell of the respective contingency table. The threshold effect on heterogeneity was evaluated with the Spearman correlation coefficient between sensitivity and 1 − specificity [12]. To explore the influence of studies that reported multiple cutoffs, we conducted a sensitivity analysis using, for each study, the single cutoff nearest the upper left corner of the ROC graph. Exact 95 % confidence intervals for sensitivity and specificity were calculated based on the binomial distribution. Data were analyzed with SAS software (version 9.3).
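The per-study quantities described above can be illustrated with a minimal Python sketch. It is not the SAS code used in this analysis. It assumes that the "exact" interval is the Clopper–Pearson interval, which is the usual meaning of an exact binomial interval, and the function names (`sens_spec`, `binom_cdf`, `exact_ci`) and the example counts are hypothetical.

```python
from math import comb


def sens_spec(tp, fp, fn, tn):
    """Sensitivity and specificity from the four cells of a 2 x 2 table."""
    return tp / (tp + fn), tn / (tn + fp)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n (assumed method)."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p)))
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper


# Hypothetical counts, for illustration only (not taken from any included study).
sens, spec = sens_spec(tp=18, fp=4, fn=2, tn=26)
sens_ci = exact_ci(18, 18 + 2)   # interval for sensitivity
spec_ci = exact_ci(26, 26 + 4)   # interval for specificity
```

The bisection avoids any dependency beyond the standard library; with a statistics package available, the same bounds can be read from beta-distribution quantiles.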

Results

Study characteristics and quality assessment

A total of 440 articles were identified through database searching. After excluding non-English articles (n = 54) and conference abstracts (n = 121), 265 abstracts were assessed for eligibility. Eight studies including 843 LNs were eligible for this study [13–20]. The selection procedure is shown as a flowchart in Fig. 1. Studies were divided into 2 groups according to preoperative or postoperative Tg measurement. The included studies are summarized in Table 1. Quality assessment was conducted on all 8 included studies using QUADAS-2. Generally, studies met most of the quality criteria (Supplementary 1).

Fig. 1 Flowchart of study selection

Table 1 Studies included in meta-analysis

Diagnostic performance according to the cutoffs

Figure 2 shows the SROC curve of the 8 studies. The study that defined its cutoff value using serum Tg was excluded from the SROC curve [21]. The pooled sensitivity and specificity of preoperative studies were 0.89 [95 % CI 0.82–0.95] and 0.60 [0.49–0.70], respectively, and those of postoperative studies were 1.0 [0.83–1.0] and 1.0 [0.92–1.0].
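For readers unfamiliar with the Moses–Littenberg approach, the following Python sketch shows the standard unweighted form of the method: each 2 × 2 table is converted to D = logit(TPR) − logit(FPR) and S = logit(TPR) + logit(FPR), a line D = a + bS is fitted by ordinary least squares, and the fit is back-transformed into a curve. The 0.5 continuity correction is applied to every cell, as described in the methods. The function names and example tables are hypothetical, and whether the original analysis used a weighted or unweighted fit is not stated, so the unweighted fit here is an assumption.

```python
from math import exp, log


def logit(p):
    return log(p / (1 - p))


def moses_littenberg(tables, correction=0.5):
    """Fit D = a + b * S by unweighted least squares.

    tables: list of (tp, fp, fn, tn) tuples, one per study or cutoff.
    """
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in tables:
        # Continuity correction: add 0.5 to each cell so logits stay finite.
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
        tpr = tp / (tp + fn)          # sensitivity
        fpr = fp / (fp + tn)          # 1 - specificity
        d_vals.append(logit(tpr) - logit(fpr))
        s_vals.append(logit(tpr) + logit(fpr))
    n = len(tables)
    mean_s, mean_d = sum(s_vals) / n, sum(d_vals) / n
    sxx = sum((s - mean_s) ** 2 for s in s_vals)
    sxy = sum((s - mean_s) * (d - mean_d) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = mean_d - b * mean_s
    return a, b


def sroc_sensitivity(fpr, a, b):
    """Sensitivity on the fitted SROC curve at a given 1 - specificity."""
    # Rearranging D = a + b * S gives
    # logit(TPR) = a / (1 - b) + (1 + b) / (1 - b) * logit(FPR).
    lt = a / (1 - b) + (1 + b) / (1 - b) * logit(fpr)
    return 1 / (1 + exp(-lt))


# Hypothetical tables, for illustration only.
example_tables = [(45, 5, 5, 45), (40, 20, 10, 30), (30, 2, 8, 60)]
a, b = moses_littenberg(example_tables)
curve = [(fpr, sroc_sensitivity(fpr, a, b)) for fpr in (0.05, 0.1, 0.2, 0.5)]
```

A slope b close to zero indicates that diagnostic accuracy does not depend strongly on the threshold, in which case the curve is symmetric around the anti-diagonal.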

Fig. 2 Summary receiver operating characteristic curves of preoperative and postoperative studies (filled circles, preoperative studies; stars, postoperative studies)

Data from preoperative studies

Data from 3 preoperative studies [13, 15, 18] were included in the calculation of sensitivity and specificity. Six different cutoff values (100, 39.3, 32.04, 10, 5, and 1 ng/ml) were adopted. The pooled sensitivity and specificity for each cutoff are summarized in Table 2. Figure 3a shows that specificity decreased as the cutoff decreased; however, sensitivity did not follow a consistent trend. The study by Sohn et al. [15] showed relatively low sensitivity. Four different methods were applied to determine cutoffs. The 97.5th percentile of FNA-Tg in cytologically negative LNs was used as the cutoff value in 1 study [13], arbitrarily chosen cutoffs in 2 studies [15, 18], and the mean + 2SD of FNA-Tg in cytologically negative LNs in 1 study [15]. Pooled sensitivity across cutoffs ranged from 0.69 to 0.95, and pooled specificity from 0.53 to 1.0.

Table 2 Pooled sensitivity and specificity according to cutoff values (Preop)

Fig. 3 Forest plots of sensitivity and specificity of preoperative (a) and postoperative (b) studies according to cutoff values

Data from postoperative studies

All 7 postoperative studies [13–17, 19, 20] were included in the calculation of sensitivity and specificity. Six different cutoff values (100, 32.04, 10, 1.1, 1, and 0.9 ng/ml) were applied in the postoperative studies. The pooled sensitivity and specificity for each cutoff are summarized in Table 3. Pooled sensitivity decreased as the cutoff increased; however, specificity did not follow a consistent trend (Fig. 3b). Pooled sensitivity across cutoffs ranged from 0.79 to 1.0, and pooled specificity from 0.5 to 1.0.

Table 3 Pooled sensitivity and specificity according to cutoff values (Postop)

Determination of best cutoff values

To determine the best cutoffs for the preoperative and postoperative studies, we calculated the distance between the point (0, 1) and the point (1 − specificity, sensitivity) observed at each cutoff value, and selected the cutoff closest to (0, 1). The distance was minimal at a cutoff value of 32.04 ng/ml for preoperative studies and 0.9 ng/ml for postoperative studies (Tables 2, 3).
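The selection rule can be sketched in a few lines of Python. For a cutoff with sensitivity Se and specificity Sp, the Euclidean distance to the ideal point (0, 1) is the square root of (1 − Sp)² + (1 − Se)². The function name `best_cutoff` and the values in `example` are hypothetical and do not reproduce Tables 2 or 3.

```python
from math import hypot


def best_cutoff(performance):
    """Return the cutoff whose ROC point lies closest to (0, 1).

    performance: dict mapping cutoff (ng/ml) -> (sensitivity, specificity).
    """
    def distance(sens, spec):
        # Distance from (1 - specificity, sensitivity) to the ideal point (0, 1).
        return hypot(1 - spec, 1 - sens)

    return min(performance, key=lambda c: distance(*performance[c]))


# Hypothetical pooled estimates, for illustration only.
example = {
    1.0: (0.95, 0.55),
    10.0: (0.90, 0.85),
    100.0: (0.60, 0.98),
}
chosen = best_cutoff(example)
```

This criterion weights sensitivity and specificity equally; a setting in which missed metastases are costlier than unnecessary surgery would call for a different weighting.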

Discussion

Collectively, FNA-Tg measurement was shown to be a useful tool for detecting neck LN metastases from DTC, with high sensitivity and specificity in both preoperative and postoperative settings. This assay is now recommended by both the American [3] and European [22] guidelines. Despite these results, controversy remains on several issues.

Measuring Tg is a major challenge for laboratories because of interfering factors that may alter test results: the lack of methodological standards, inadequate functional sensitivity, and variability in the specificity of commercially available antibody kits. All of these factors affect the accuracy of the results and are associated with large inter-assay variation, which makes comparison between studies difficult. All but 2 of the studies included in this meta-analysis used an immunoradiometric technique (Table 1).

In this study, we selected studies that measured FNA-Tg in washout fluid rinsed with 1 ml of normal saline to minimize assay variables. Saline solution is the most widely accessible and least expensive option and is, therefore, the most commonly used [23]. A volume of 1 ml of fluid for rinsing FNA needles was recommended in a previous study [24]. The type and volume of washout fluid used for FNA-Tg measurement should be identical when deciding whether LNs are metastatic. Because the same methods are used to measure FNA-Tg and serum Tg, possible interference of antithyroglobulin antibodies with FNA-Tg was a concern until Boi et al. showed the absence of any significant interaction between antithyroglobulin antibodies and FNA-Tg in patients with DTC [25]. A recent study evaluating 528 cases of FNA-Tg also showed that antithyroglobulin antibodies were not correlated with FNA-Tg concentration, regardless of the final diagnosis [26].

FNA-Tg may be higher if the patient has a remnant thyroid gland because the thyroid is the main source of serum Tg [15]. Serum Tg is also a potential source of bias [25] because of blood contamination during aspiration. Our meta-analysis confirms that the optimal cutoff value of FNA-Tg is much lower in postoperative than in preoperative assessments (0.9 vs. 32.04 ng/ml), as shown in Fig. 2. The absence of thyroid tissue after thyroidectomy and suppression of serum thyroid-stimulating hormone (TSH) may both contribute to lowering serum Tg and FNA-Tg. Recently, one study [26] suggested that serum TSH suppression and the presence of serum Tg should be considered when diagnosing LN malignancy with FNA-Tg in papillary thyroid cancer, because both independently affected the diagnosis made by FNA-Tg [26]. False-negative FNA-Tg results can occur when serum TSH is suppressed, and false-positive results when serum Tg is present [26]. Therefore, measuring FNA-Tg after TSH stimulation is recommended [26]. However, this has not been investigated in previous studies and needs further investigation. Moreover, a recent article by Torres et al. concluded that the diagnostic performance of FNA-Tg is not affected by the presence of thyroid tissue [23]. In this study, the usefulness of FNA-Tg was confirmed in both preoperative and postoperative assessments, although diagnostic performance differed between preoperative and postoperative studies.

A wide variety of diagnostic cutoff values, ranging from 0.9 to 100.0 ng/ml, have been described in the literature (Table 1). Our analysis investigated the best cutoff value separately in preoperative and postoperative studies. The distance to the ideal point was minimal at cutoffs of 32.04 ng/ml for preoperative studies and 0.9 ng/ml for postoperative studies (Tables 2, 3). Up to 30 % of patients with papillary thyroid cancer exhibit recurrence or persistent metastasis to the neck LN [27]. The clinical evaluation of enlarged local LN is difficult at presentation and throughout follow-up because inflammatory lymphadenopathies are extremely frequent. Furthermore, neck LN metastases from a variety of extrathyroidal malignancies are a relatively common finding [25]. Although the specificity of serum Tg for detecting recurrent disease is good, it is not the ideal measurement for metastases affecting small LN [28]. In addition, serum Tg cannot establish the location of the neoplastic focus and is highly affected by the presence of serum antithyroglobulin antibodies, which are found in 25–30 % of patients with DTC [29].

Another meta-analysis estimating the diagnostic accuracy of the FNA-Tg technique was recently published [30]. In total, 24 studies with diverse assays and methods were included, which resulted in significant heterogeneity. The authors did not state clear cutoff values because their results were affected by this heterogeneity among the selected studies. In this study, we included only studies in which FNA-Tg was measured in washout fluid rinsed with 1 ml of normal saline. Our study has several limitations. First, because we excluded several studies with different methods and assays, only a small number of studies were eligible. In addition, some studies presented results for several cutoff values of FNA-Tg [15, 16, 26]. Further studies with individual patient data will be needed to propose standardized cutoff values. Second, although the authors independently reviewed the primary studies, this strategy could not ensure complete accuracy of the data.

FNA-C of LN has been widely used to confirm suspicious findings on ultrasonography, mainly in patients with undetectable serum Tg due to suppressed TSH or with a non-iodine-concentrating metastasis [31, 32]. Nevertheless, the sensitivity of this method is far from ideal: it varies from 75 to 85 %, with a false-negative rate of 6–8 % [8] and up to 20 % of samples being non-representative or of inadequate cellularity, depending on the cytopathologist's skill and experience [33]. Our cutoff values can be helpful for screening for LN metastases in patients who have undergone thyroidectomy, as well as for LN staging in patients with DTC who have not yet undergone initial surgery.

In conclusion, FNA-Tg can be used to screen for LN metastases in patients who have undergone thyroidectomy, as well as for LN staging in patients with DTC who have not yet undergone initial surgery. This measurement should be performed routinely in addition to FNA-C, since the latter is often non-diagnostic, especially when LN are small. Although cutoff values for FNA-Tg have not been standardized, a preoperative value of 32.04 ng/ml and a postoperative value of 0.9 ng/ml are recommended for identifying neck LN metastasis.