Introduction

The detection of thyroid nodules is increasing worldwide, and their management has shifted toward a more conservative approach to reduce overdiagnosis and overtreatment [1]. The initial assessment includes the evaluation of clinical risk factors and ultrasound (US) examination of the neck, followed by fine-needle aspiration (FNA) biopsy when indicated by nodule features and size [2]. In recent years, several US risk stratification systems (i.e., Thyroid Imaging Reporting and Data Systems, TIRADSs) have been proposed by international societies, and their high performance has been demonstrated by studies using histology as the reference standard [3, 4]. Both US and FNA reports are based on classification systems that standardize terminology and allow risk stratification of the patient. Cytological examination of FNA samples is the mainstay for separating surgical candidates from patients who can be safely referred for clinical follow-up; however, about 20% of cytological reports are indeterminate and cannot exclude the presence of cancer [5, 6]. The Bethesda classification system [7], a widely endorsed reporting system for thyroid cytology, divides the results of cytological examination into six classes: non-diagnostic or unsatisfactory (I), benign (II), atypia of undetermined significance or follicular lesions of undetermined significance (AUS/FLUS) (III), suspicion of follicular neoplasia (IV), suspicion of malignancy (V) and malignancy (VI). The indeterminate classes (III and IV) share a similar malignancy rate and represent a critical point for clinical management. In particular, class III is a heterogeneous group of pathological conditions that includes doubtful nuclear and/or architectural atypia. In class III, the risk of malignancy spans a wide range (10–30%), and the suggested clinical action varies from repeat FNA to surgical lobectomy.
In this context, patient management largely depends on the personal skills of the endocrinologist and sometimes requires additional information from molecular analysis. Class III has been further divided into subcategories grouped by similar morphological features, namely: focal cytological atypia; extensive but mild cytological atypia; atypical cyst-lining cells; a scantly cellular specimen with architectural atypia; cytological and architectural atypia (NIFTP may be present); Hürthle cell aspirates with low-risk pattern; atypia, not otherwise specified (NOS), not papillary type; psammomatous calcifications in the absence of cellular atypia; and atypical lymphoid cells, rule out lymphoma [8]. Subclassification is encouraged in the pathological report and may be the key to better classifying patients with indeterminate nodules; however, subcategory-based management requires knowledge of the specific risk of malignancy of each subcategory.

The present study was conceived to evaluate the rate of malignancy of the above subcategories within the indeterminate class of Bethesda III.

Methods

Search strategy

A specific search strategy was planned. First, sentinel studies were searched in PubMed. Second, keywords and MeSH terms were identified in PubMed. Third, to test the strategy, the terms “thyroid” OR “thyroid nodule” AND “Bethesda” AND “risk of malignancy” AND “atypia of undetermined significance” OR “AUS” were searched in PubMed and Scopus. Studies meeting all of the following criteria were then included: (1) the definition of the Bethesda AUS/FLUS category was clearly stated; (2) the rate of malignancy in at least one of the AUS/FLUS subcategories was reported; (3) the publication was a full-length article in English. Studies were excluded if (1) they lacked a definitive histological evaluation of the surgical specimen to establish the true risk of malignancy; (2) at least one of the AUS/FLUS subcategories could not be clearly distinguished; (3) they were systematic reviews, narrative reviews or guidelines; (4) they were performed in pediatric patients. References of the included studies were screened for additional papers. The search covered publications from January 1st 2010 to April 30th 2020. However, two older papers (published in 1998 and 2009, respectively) were also included because they are the reference articles for the following subcategories: psammomatous calcifications in the absence of cellular atypia [9] and atypical cyst-lining cells [10]. Two investigators (AC, AP) independently and in duplicate searched for papers, screened the titles and abstracts of the retrieved articles, reviewed the full texts and selected articles for inclusion.

Data extraction

The following information was extracted independently and in duplicate by two investigators (AC, AP) in a piloted form: (1) general information on the study (author, year of publication, country, study type, number of patients, number of nodules, selection criteria of included nodules); (2) the total number of fine-needle aspirations (FNAs); (3) total number of analyzed AUS/FLUS subcategories; (4) number of malignant nodules; (5) the rate of diagnosis and malignancy in each subgroup of AUS/FLUS. Data were cross-checked and any discrepancy was discussed.

Study quality assessment

The risk of bias of included studies was assessed independently by two reviewers (AC, AP). Each domain was assigned low (L), high (H) or unclear (U) according to QUADAS-2 [11].

Data analysis

The characteristics of the included studies were summarized. Then, separate meta-analyses were performed to obtain the pooled prevalence of malignancy in the different subcategories of Bethesda class III. Heterogeneity between studies was assessed using the I² statistic, with values of 50% or higher regarded as high heterogeneity. Egger’s test was carried out to evaluate the possible presence of significant publication bias. A meta-analysis could be performed only for those cytologic categories for which more than three papers were available, because a smaller number of studies does not allow Egger’s test to be calculated. A random-effects model was used for statistical pooling of the data. All analyses were performed using StatsDirect statistical software (StatsDirect Ltd; Altrincham, UK). A p value < 0.05 was regarded as significant.
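To make the pooling step concrete, the following is a minimal sketch of a random-effects meta-analysis of proportions with the I² statistic. It is not the StatsDirect procedure used in this study: it assumes a DerSimonian-Laird estimator on the logit scale with a 0.5 continuity correction, and the function name `pool_prevalence` and the input numbers are hypothetical.

```python
import math


def _expit(x):
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_prevalence(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions (logit scale).

    Returns (pooled_prevalence, ci_low, ci_high, i_squared_percent).
    """
    # Logit-transformed proportions and their approximate variances.
    # Assumption: a 0.5 continuity correction is applied to every study
    # so that studies with 0 or n events remain finite.
    y, v = [], []
    for e, n in zip(events, totals):
        e_c, n_c = e + 0.5, n + 1.0
        y.append(math.log(e_c / (n_c - e_c)))
        v.append(1.0 / e_c + 1.0 / (n_c - e_c))

    k = len(y)
    w = [1.0 / vi for vi in v]  # inverse-variance (fixed-effect) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

    # Cochran's Q and I^2 (share of variability beyond chance).
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights, pooled logit and its 95% confidence interval.
    w_re = [1.0 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return _expit(y_re), _expit(y_re - 1.96 * se), _expit(y_re + 1.96 * se), i2


# Hypothetical malignant/total counts for four studies; not data from this review.
events = [12, 30, 8, 25]
totals = [40, 70, 35, 60]
pooled, low, high, i2 = pool_prevalence(events, totals)
print(f"pooled prevalence {pooled:.1%} (95% CI {low:.1%} to {high:.1%}), I2 = {i2:.0f}%")
```

Under the threshold stated above, an I² of 50% or more returned by such a routine would be read as high heterogeneity. Other transformations (for example, an arcsine-based one) could give slightly different pooled values.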

Results

A total of 591 papers were found. After removal of 129 duplicates, 462 articles were analyzed for title and abstract; 433 records were excluded (guidelines, review, meta-analysis, lack of information regarding specific AUS/FLUS subgroups, pediatric patients, not within the field of the review). The remaining 29 papers were retrieved in full-text and 23 studies were finally included in the systematic review (Fig. 1). No additional study was retrieved from references of included studies.
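The selection flow can be checked with simple arithmetic. The short sketch below uses only the counts stated in the text; the number of full-text exclusions is derived here and is not reported explicitly, and the variable names are illustrative.

```python
# Counts stated in the text.
identified = 591
duplicates = 129
excluded_on_title_abstract = 433
included = 23

# Derived steps of the flow.
screened = identified - duplicates                  # records screened by title/abstract
full_text = screened - excluded_on_title_abstract   # papers retrieved in full text
excluded_at_full_text = full_text - included        # derived, not stated in the text

print(screened, full_text, excluded_at_full_text)   # prints: 462 29 6
```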

Fig. 1 Flow diagram of the systematic review.

Study quality assessment

The risk of bias of the included studies is shown in Table 1.

Table 1 Quality assessment of the studies according to QUADAS-2

Qualitative analysis (systematic review)

The characteristics of the 23 included articles and the data relevant to the present meta-analysis are summarized in Table 2. All the included studies were retrospective and published in English. Data on the final histologic follow-up were clearly identified in all these manuscripts, so the true percentage of malignancy could be calculated. Of the nine subcategories, focal cytologic atypia was evaluated in 7 papers [12,13,14,15,16,17,18]; extensive but mild cytologic atypia was evaluated in 7 papers [13, 15, 19,20,21,22,23]; a scantly cellular specimen with architectural atypia was evaluated in 19 articles [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30]; cytologic and architectural atypia (NIFTP may be present) was evaluated in 6 articles [18, 19, 26,27,28, 31]; Hürthle cell aspirates with low-risk pattern were evaluated in 13 papers [12,13,14,15,16, 21, 24, 26,27,28, 32,33,34]; and atypia, not otherwise specified, not papillary type was evaluated in 9 papers [12,13,14, 16, 19, 21, 26,27,28]. Two subcategories (psammomatous calcifications in the absence of cellular atypia and atypical cyst-lining cells) were evaluated in only one article each. No articles were found on the malignancy rate of the subcategory “Atypical lymphoid cells, rule out lymphoma”.

Table 2 Data availability from the literature

A total of 4241 indeterminate nodules with postoperative histologic follow-up were entered into the meta-analysis, and 1163 (27.4%) of these were malignant. Table 3 describes the results of the meta-analysis.

Table 3 Result of the meta-analyses feasible with the data found in the literature

The meta-analyses were performed only for the cytologic subcategories in which there were at least three papers.
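The crude overall proportion reported above (1163 malignant among 4241 nodules) can be reproduced directly. The sketch below also attaches a Wilson 95% confidence interval, which is an illustrative addition and is not reported in this study; the function name `wilson_interval` is hypothetical. This crude, unweighted figure is distinct from the random-effects pooled prevalences in Table 3.

```python
import math


def wilson_interval(events, total, z=1.96):
    """Wilson score confidence interval for a single proportion."""
    p = events / total
    denom = 1.0 + z ** 2 / total
    centre = (p + z ** 2 / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z ** 2 / (4.0 * total ** 2)) / denom
    return centre - half, centre + half


malignant, nodules = 1163, 4241          # counts stated in the text
crude = malignant / nodules
ci_low, ci_high = wilson_interval(malignant, nodules)
print(f"crude malignancy rate {crude:.1%} (Wilson 95% CI {ci_low:.1%} to {ci_high:.1%})")
```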

Quantitative analysis (meta-analysis)

For two subcategories (i.e., “atypical cyst-lining cells” and “psammomatous calcifications in the absence of cellular atypia”) we found only one article each; the cancer prevalence in these two subcategories was therefore not meta-analyzed. Likewise, we did not meta-analyze the subcategory “atypical lymphoid cells, rule out lymphoma”, for which we found no publications fulfilling the previously reported inclusion criteria. The data retrieved from the 23 articles were used to perform six separate meta-analyses (Fig. 2), one for the pooled cancer prevalence in each of the remaining six subcategories. As shown in Table 3, the pooled cancer prevalence in the six subcategories ranged from 15 to 44%. The highest cancer prevalence was found in the subcategory “focal cytologic atypia” (Fig. 3). Heterogeneity was moderate to high. Publication bias was never detected (it was not calculable in one case).
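As an illustration of how publication bias is assessed, the sketch below implements the regression behind Egger’s test: the standardized effect (effect divided by its standard error) is regressed on precision (the reciprocal of the standard error), and an intercept far from zero suggests small-study effects. It is a minimal sketch, not the StatsDirect implementation; the function name `egger_intercept` and the example inputs are hypothetical. It also shows why the test cannot be computed with fewer than three studies, since the residual degrees of freedom are k − 2.

```python
import math


def egger_intercept(effects, std_errors):
    """Egger's regression of standardized effect (y/se) on precision (1/se).

    Returns (intercept, t_statistic, degrees_of_freedom).
    """
    k = len(effects)
    if k < 3:
        raise ValueError("Egger's test needs at least three studies (df = k - 2).")

    x = [1.0 / se for se in std_errors]                  # precision
    z = [y / se for y, se in zip(effects, std_errors)]   # standardized effect
    x_mean = sum(x) / k
    z_mean = sum(z) / k
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("All standard errors are identical; the regression is undefined.")

    # Ordinary least-squares slope and intercept.
    slope = sum((xi - x_mean) * (zi - z_mean) for xi, zi in zip(x, z)) / sxx
    intercept = z_mean - slope * x_mean

    # Standard error of the intercept from the residual variance.
    df = k - 2
    resid_ss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_intercept = math.sqrt((resid_ss / df) * (1.0 / k + x_mean ** 2 / sxx))
    if se_intercept > 0:
        t_stat = intercept / se_intercept
    else:
        t_stat = math.copysign(math.inf, intercept)      # perfect fit
    return intercept, t_stat, df


# Hypothetical logit prevalences and standard errors for five studies.
effects = [-0.85, -0.30, -1.20, -0.35, -0.60]
std_errors = [0.35, 0.24, 0.40, 0.26, 0.30]
intercept, t_stat, df = egger_intercept(effects, std_errors)
print(f"Egger intercept {intercept:.2f}, t = {t_stat:.2f} on {df} degrees of freedom")
```

The two-sided p value is then obtained from a t distribution with k − 2 degrees of freedom and compared with the 0.05 threshold stated in the Methods.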

Fig. 2 Flow diagram of data included in the six different meta-analyses

Fig. 3 Rate of malignancy found in the present meta-analysis. The subcategories are ordered according to the risk of malignancy (ROM) recorded in the meta-analysis

Discussion

In this study, we observed an overall range of malignancy within the indeterminate category III (15–44%) that was wider than originally estimated in the Bethesda document (10–30%) [7]. The wide range reflects the complexity of this indeterminate category and is mainly due to the variability of the pathological conditions it includes, as is also the case for the indeterminate categories of other classification systems [35]. Since the first publication of the classification systems for reporting thyroid cytology, it has been evident that these systems are a valid tool for standardizing medical communication and patient management. At the same time, their major drawback lies in the indeterminate lesions, which are not benign but not clearly malignant, and for which US and clinical examination are also poorly informative. The recent edition of the Bethesda system for reporting thyroid cytology (TBSRTC) describes different scenarios within AUS/FLUS corresponding to the various morphological patterns included in this indeterminate category [8]. We focused the meta-analysis on each of these subcategories to identify a more specific risk of malignancy through knowledge of the detailed cytological pattern. Previous studies demonstrate that cytological atypia confers a higher risk of malignancy than architectural alterations [23, 36]. This observation is recognized by clinicians and pathologists, but it is not yet endorsed by international guidelines for thyroid nodules. Moreover, a specific meta-analysis for each subcategory, based on a large number of cases, is still lacking. Our results underline that the subcategories differ from each other in their pooled prevalence of cancer, as confirmed by the final histological report [37].
In particular, we showed that the presence of cytological atypia of papillary type is the highest-risk condition (cancer prevalence 44%), whereas architectural atypia alone carries the lowest risk: 21% for the sparse microfollicular pattern and slightly lower (15%) when Hürthle cell features are present. Of interest, diffuse mild nuclear atypia has a lower pooled cancer prevalence (42%) than focal but consistent nuclear atypia (44%). Finally, combined nuclear and architectural atypia has a pooled risk of 36%, bearing in mind that cases of non-invasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP) may fall into this subgroup. Since NIFTP was recognized as an entity in 2016 [38], data were reported in only three of the studies included in the present meta-analysis [26,27,28]. The prevalence of NIFTP in these papers was 5/117 (4.27%) [26], 1/37 (2.7%) [27], and 11/51 (21.57%) [28]. In two of these studies [26, 27] NIFTP was considered malignant. In the study by Guleria et al. [28], the authors reported that the risk of malignancy (ROM) of the “Cytologic and architectural atypia (NIFTP may be present)” subcategory changed depending on whether NIFTP was considered a malignant (58.8%) or non-malignant (37.3%) entity. The analyzed papers did not report information on thyroid autoimmunity or thyroid function, so their potential impact on AUS/FLUS cannot be assessed. Our data support an accurate description of the cytological specimens in the diagnostic report, rather than the use of the category definition alone. Accurate pattern-based subclassification of cytological samples, especially within the indeterminate group, allows a more precise risk stratification and may serve as the reference for patient management and follow-up. This approach has also been proposed for the indeterminate category Thy3a of the UK RCPath reporting system [39].
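The NIFTP figures quoted above can be verified with a few lines of arithmetic. The sketch below uses only numbers given in the text. It shows that the drop in ROM reported in [28] when NIFTP is reclassified as non-malignant (58.8% to 37.3%) is approximately equal to the NIFTP share in that series (11/51). The variable names are illustrative, and treating each denominator as the number of nodules in the subcategory is an assumption.

```python
# (NIFTP cases, denominator) as quoted in the text for references [26]-[28].
niftp_counts = {"ref26": (5, 117), "ref27": (1, 37), "ref28": (11, 51)}
niftp_share = {ref: 100.0 * cases / total for ref, (cases, total) in niftp_counts.items()}

# ROM reported in [28] with NIFTP counted as malignant vs. non-malignant.
rom_niftp_malignant = 58.8
rom_niftp_nonmalignant = 37.3
rom_shift = rom_niftp_malignant - rom_niftp_nonmalignant

for ref, share in niftp_share.items():
    print(f"{ref}: NIFTP share {share:.2f}%")
print(f"ROM shift in ref28: {rom_shift:.1f} points; NIFTP share {niftp_share['ref28']:.2f}%")
```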
A relevant advantage of our pattern-based analysis is that it provides a more detailed risk of malignancy within indeterminate nodules, independently of the classification system adopted. Other reporting systems, such as the Italian system [40], distribute the subcategories differently among the indeterminate classes; knowledge of the pattern-related risk allows patients to be managed according to the specific pathological situation.

Nevertheless, the use of subcategories is complicated by the differences in cancer prevalence reported across the analyzed papers. The widest range was observed for focal cytological atypia (23–66%), which reflects the subjective interpretation of single cells or groups of cells with nuclear clearing or overlapping in an otherwise benign context. Nuclear atypia is in fact not a well-defined condition but a spectrum of alterations ranging from modest reactive changes to neoplastic papillary transformation. The narrowest range was found in the subcategories with architectural atypia alone (Hürthle or non-Hürthle type), which shows that the microfollicular pattern is a well-recognized alteration based on a detailed definition: a group of fewer than 15 thyrocytes arranged in a circle [41]. This observation shows that morphological criteria for cytological classification, when available, strongly support diagnostic standardization and reduce the heterogeneity of results.

Some limitations of this study should be addressed. (1) Publication bias is a major concern in any meta-analysis: studies reporting positive findings are more likely to be published than those with negative results, and small studies are likely to report a positive relationship that cannot be confirmed in larger series. Here, importantly, Egger’s test did not detect publication bias. (2) The present study is limited by heterogeneity and selection bias. In particular, the heterogeneity may have arisen from differences in patient management. Also, the six meta-analyses were performed only for those cytologic subcategories for which at least three papers could be pooled. (3) Finally, it must be underlined that the histologic outcome of indeterminate lesions is known only for patients who underwent surgery; in this context, patients with indeterminate thyroid nodules are managed according to their clinical and ultrasound characteristics, and surgery is favored when one or more risk factors are present. Thus, a selection bias is always present in this field.

Although the use of AUS/FLUS subcategories is not recommended in clinical practice, since they are intended for internal documentation only, our study suggests that they may have a role in patient management. In view of the reported malignancy risk for each specific scenario, we recommend further dedicated studies, ideally prospective.

In conclusion, this meta-analysis found a pooled cancer prevalence in the subcategories of AUS/FLUS of between 15 and 44%, a range wider than that estimated in the Bethesda document (10–30%). The evidence-based data on each subcategory represent a reference for the management of indeterminate nodules in clinical practice.