Introduction

The prevalence of thyroid nodules in the general population is increasing worldwide. In China, the incidence of thyroid cancer has risen year by year, especially among women [1]. Guidelines for the management of thyroid lesions have been published by the American Thyroid Association and the American Association of Clinical Endocrinologists [2, 3]. High-resolution ultrasound is recommended as the first-line modality for evaluating thyroid nodules [4, 5]. However, the sonographic features of malignant and benign nodules overlap, so fine-needle aspiration biopsy (FNAB) is necessary for clinical decision making [6]. Although FNAB has good specificity for malignant thyroid nodules, its high false-negative rate (about 30 %) may cause oncologists to miss a relevant number of patients with thyroid cancer.

In addition, to avoid overuse of FNAB in patients with multiple benign thyroid nodules, several reports have investigated which suspicious ultrasonographic features indicate a risk of malignancy that warrants ultrasound-guided biopsy [9-11]. For breast cancer, a preoperative reporting scheme was needed; hence, the American College of Radiology developed the breast imaging reporting and data system (BI-RADS), which defines categories according to the risk of malignancy [7]. By contrast, there was no standard sonographic reporting system for thyroid imaging before clinical procedures. Modeled on the BI-RADS classification [8], Horvath et al. [10] and Park et al. [9] established thyroid ultrasonographic systems to stratify cancer risk, called the thyroid imaging reporting and data system (TI-RADS), with 5 and 6 categories based on 12 and 10 sonographic features. Kwak et al. [11] identified six main sonographic features that are significantly associated with malignancy: solid component, hypoechogenicity, marked hypoechogenicity, micro-lobulated or irregular margins, microcalcifications, and taller-than-wide shape. These risk stratifications of thyroid cancer facilitated the practice of TI-RADS.

To date, various studies have applied TI-RADS to distinguish malignant from benign thyroid nodules and to assess its diagnostic performance. Although its clinical use has been questioned and multicenter data are still needed to improve its practicability, several studies have shown promising results for TI-RADS in the diagnosis of thyroid cancer. To our knowledge, no systematic review has addressed TI-RADS in the differential diagnosis of thyroid nodules. The aim of this review is to assess the overall diagnostic value of TI-RADS in the thyroid imaging strategy.

Methods

Data source

A comprehensive search for abstracts on TI-RADS analysis of thyroid nodules was performed in PubMed (1966 to December 2013), the Cochrane Library, and Google Scholar. The keywords or Medical Subject Headings (MeSH) were as follows: “TI-RADS” OR “thyroid imaging reporting and data system” OR “thyroid imaging” OR “thyroid nodules ultrasound imaging”. The reference lists of included studies and review articles were checked manually. Unpublished data were not included in this review.

Study selection strategies

Following the Cochrane reviewer’s handbook [12], we used a double abstraction process for study selection. Two investigators (W.X. and L.Y.), who were blinded to the journal, author, institution, and other relevant information, independently screened the retrieved articles. They read all titles and abstracts to identify potentially eligible articles and then assessed the full text of these articles for the final analysis. Disagreements between the two readers were resolved by consensus.

Selection criteria and data extraction

Articles were included if they met the following criteria: (a) the article was written in English; (b) the article had more than nine “yes” answers on the quality assessment of diagnostic accuracy studies (QUADAS) tool, which was used to assess study quality [13]; (c) TI-RADS was applied to the differential diagnosis of thyroid nodules, and studies that did not evaluate it as a diagnostic tool in thyroid nodules were excluded; (d) histological and/or cytological analysis was used as the reference standard; (e) sufficient data were presented to calculate the true-positive (TP), false-negative (FN), false-positive (FP), and true-negative (TN) values; and (f) the data or subsets of data were not published more than once. Among duplicated articles, the one with the most detail or the most recent one was chosen.

Using a standardized form, the same two readers (W.X. and L.Y.) independently extracted relevant data on study characteristics (Table 1). The following data were extracted from each article: (a) first author name, (b) publication year, (c) country/region, (d) language, (e) number of patients, (f) average age of patients, (g) number of nodules, (h) study design (prospective or retrospective), (i) reference standard (histopathology or cytology), and (j) TI-RADS criteria.

Table 1 Basic characteristics of five studies

Data analysis

Meta-analysis was based on 2 × 2 contingency tables constructed for each study, from which sensitivity and specificity were calculated. First, we used the Spearman correlation coefficient to test for a diagnostic threshold effect. If a threshold effect was present, a summary receiver operating characteristic (SROC) curve was fitted using the Mantel-Haenszel or Moses-Shapiro-Littenberg model; if no threshold effect was present across studies, pooled sensitivity and specificity with 95 % confidence intervals (CIs) were obtained. Heterogeneity was assessed with the likelihood ratio chi-squared test, and a P value of less than 0.05 was taken to indicate apparent heterogeneity. A random-effects or fixed-effects model was used for the primary meta-analysis according to whether heterogeneity was present [14].
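As an illustration of the threshold analysis described above, the following minimal Python sketch computes sensitivity and specificity from each 2 × 2 table and then the Spearman correlation between the logit of the true-positive rate and the logit of the false-positive rate; a strong positive correlation would suggest a threshold effect. The study names and counts are hypothetical and are not the data of the included studies, and the helper functions are illustrative rather than the Stata routines used in this review.

```python
import math

# Hypothetical 2x2 counts (TP, FP, FN, TN), for illustration only.
# These are NOT the data of the studies included in this review.
STUDIES = {
    "study_A": (90, 40, 10, 160),
    "study_B": (45, 30, 15, 110),
    "study_C": (70, 120, 5, 105),
    "study_D": (30, 10, 20, 140),
    "study_E": (60, 60, 8, 72),
}


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def accuracy(tp, fp, fn, tn):
    """Return (sensitivity, specificity) for one 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = average_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def threshold_check(studies):
    """Spearman rho between logit(TPR) and logit(FPR) across studies."""
    logit_tpr, logit_fpr = [], []
    for tp, fp, fn, tn in studies.values():
        sens, spec = accuracy(tp, fp, fn, tn)
        logit_tpr.append(logit(sens))
        logit_fpr.append(logit(1.0 - spec))
    return spearman(logit_tpr, logit_fpr)
```

With these hypothetical counts, `threshold_check(STUDIES)` returns approximately 0.7.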

Combined sensitivity, specificity, likelihood ratio (LR), and diagnostic odds ratio (DOR) were calculated after heterogeneity assessment, and forest plots were drawn for these parameters. Funnel plots were used to assess possible publication bias. In addition, the summary ROC curve, obtained from a mathematical transformation of sensitivity and specificity, and its Q* value were derived as described by Moses et al. [15]. The weighted area under the curve (AUC) was obtained to measure the diagnostic performance of TI-RADS. Statistical significance was set at P < 0.05.
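The summary measures named above can be sketched as follows. The Python example derives LR+, LR−, and DOR from a single 2 × 2 table and fits the Moses-Littenberg linear model D = a + bS, where D = logit(TPR) − logit(FPR) and S = logit(TPR) + logit(FPR); Q* is the point on the fitted curve where sensitivity equals specificity. It is an unweighted least-squares sketch with an assumed 0.5 continuity correction, and the function names are hypothetical; it should not be read as the exact procedure or software output of this review.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def summary_measures(tp, fp, fn, tn, cc=0.5):
    """Return (LR+, LR-, DOR) for one table; cc is added to every cell."""
    tp, fp, fn, tn = tp + cc, fp + cc, fn + cc, tn + cc
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1.0 - spec)
    lr_neg = (1.0 - sens) / spec
    return lr_pos, lr_neg, lr_pos / lr_neg


def fit_sroc(tables, cc=0.5):
    """Unweighted least-squares fit of D = a + b*S; returns (a, b)."""
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in tables:
        tpr = (tp + cc) / (tp + fn + 2 * cc)
        fpr = (fp + cc) / (fp + tn + 2 * cc)
        d_vals.append(logit(tpr) - logit(fpr))
        s_vals.append(logit(tpr) + logit(fpr))
    n = len(tables)
    mean_s, mean_d = sum(s_vals) / n, sum(d_vals) / n
    sxx = sum((s - mean_s) ** 2 for s in s_vals)
    sxy = sum((s - mean_s) * (d - mean_d) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = mean_d - b * mean_s
    return a, b


def sroc_tpr(fpr, a, b):
    """Sensitivity on the fitted SROC curve at a given false-positive rate."""
    return expit(a / (1.0 - b) + (1.0 + b) / (1.0 - b) * logit(fpr))


def q_star(a):
    """Point on the curve where sensitivity equals specificity (S = 0)."""
    return expit(a / 2.0)


def auc(a, b, steps=10000):
    """Midpoint-rule area under the fitted SROC curve for FPR in (0, 1)."""
    h = 1.0 / steps
    return sum(sroc_tpr((i + 0.5) * h, a, b) for i in range(steps)) * h
```

A slope b close to zero corresponds to a symmetric curve, that is, a DOR that does not change with the threshold.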

To explore the sources of heterogeneity, subgroup analyses were performed in the meta-analysis (Table 2). We extracted covariates including (a) the year of publication, (b) the number of thyroid nodules enrolled, and (c) the TI-RADS criteria used by the authors. Covariate adjustment was performed using regression analysis. All data analyses were carried out using Stata/SE statistical software Version 11.1 (StataCorp LP, Texas, USA).

Table 2 Diagnostic accuracy of subgroups studies

Results

Literature search

The study selection process is shown in Fig. 1. A total of 576 records were retrieved from the databases mentioned above. Of these, 560 were excluded after review of the title and abstract. Eleven articles were then excluded after full-text review: (a) nine were written in French or Chinese; (b) one analyzed only the ultrasonographic features indicating the probability and risk of malignancy and did not test TI-RADS as a diagnostic tool in thyroid nodules; and (c) one was a duplicate. Finally, five studies [9, 10, 23, 24, 33] enrolling 7,753 thyroid nodules satisfied all of the inclusion criteria and were included in the meta-analysis.

Fig. 1 The procedure of study selection in our meta-analysis. A total of five studies that fulfilled all of the inclusion criteria were included in this systematic review

Study characteristics, study quality, and publication bias

The studies were published from 2009 to 2013. The study designs were prospective (n = 3) and retrospective (n = 2). The number of thyroid nodules per study ranged from 114 to 4,550. The average age of patients was 51.6 years. The average tumor size was 12.8 mm (range 4–86 mm). Four studies used FNAB and surgery to obtain cytological and histopathological results. One study used surgery only, for the final histological results (Table 1).

The funnel plot, which plots the log DORs of individual studies against their sample sizes, was symmetric, indicating that there was probably no publication bias (Fig. 2).

Fig. 2 Funnel plot showing symmetry, which suggests that there was probably no publication bias in the meta-analysis

Summary sensitivity, specificity, positive and negative likelihood ratio, diagnostic odds ratio, and summary receiver operating characteristic curves

The pooled sensitivity and specificity of TI-RADS were 0.75 (95 % confidence interval (CI) 0.72–0.78) and 0.69 (95 % CI 0.68–0.70), respectively (Fig. 3a, b). Sensitivity ranged from 0.57 to 0.96. The lowest sensitivity and specificity were 0.57 and 0.23, respectively (Fig. 3a, b). The summary positive and negative LRs were 3.19 (95 % CI 1.60–6.34) and 0.17 (95 % CI 0.06–0.51), respectively (Fig. 4a, b).
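To make the likelihood ratios concrete, the sketch below converts a pre-test probability of malignancy into a post-test probability through odds, using the pooled LRs reported above. The 10 % pre-test probability is an assumed value chosen only for illustration and is not taken from the included studies; the function name is hypothetical.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


PRE_TEST = 0.10   # assumed pre-test probability, for illustration only
LR_POS = 3.19     # pooled positive LR reported in the text
LR_NEG = 0.17     # pooled negative LR reported in the text

after_positive = post_test_probability(PRE_TEST, LR_POS)
after_negative = post_test_probability(PRE_TEST, LR_NEG)
```

Under this assumed pre-test probability, a positive result raises the probability of malignancy to roughly 0.26, and a negative result lowers it to roughly 0.02.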

Fig. 3 Forest plots showing the pooled sensitivity (a) and specificity (b) of TI-RADS in the differential diagnosis of thyroid nodules

Fig. 4 Forest plots showing the pooled positive likelihood ratio (LR) (a) and negative likelihood ratio (LR) (b) of TI-RADS in the differential diagnosis of thyroid nodules

The pooled DOR was 24.28 (95 % CI 14.25–41.38) (Fig. 5). The SROC curve was symmetric: the regression coefficient b did not differ significantly from zero (P = 0.07) (Fig. 6). The overall AUC was 0.9026, and the Q* index was 0.8304, indicating very good diagnostic accuracy (Fig. 6).
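As a rough plausibility check on these summary values, a symmetric SROC curve (b = 0) has a constant DOR, in which case AUC and Q* have simple closed forms: AUC = DOR/(DOR − 1)² × [(DOR − 1) − ln DOR] and Q* = √DOR/(1 + √DOR). The Python sketch below evaluates both for the pooled DOR. The reported AUC and Q* come from the fitted curve rather than from the pooled DOR, so only approximate agreement should be expected; the function name is hypothetical.

```python
import math


def symmetric_sroc_summary(dor):
    """Return (AUC, Q*) for a symmetric SROC curve with constant DOR > 1."""
    auc = dor / (dor - 1.0) ** 2 * ((dor - 1.0) - math.log(dor))
    q = math.sqrt(dor) / (1.0 + math.sqrt(dor))
    return auc, q


# Pooled DOR reported in the text.
auc_check, q_check = symmetric_sroc_summary(24.28)
```

For DOR = 24.28 this gives an AUC of about 0.90 and a Q* of about 0.83, close to the reported 0.9026 and 0.8304.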

Fig. 5 Forest plot showing the diagnostic odds ratio (DOR) of TI-RADS in the differential diagnosis of thyroid nodules

Fig. 6 Summary receiver operating characteristic (SROC) curves of TI-RADS in the evaluation of thyroid nodules. AUC, area under the curve; SE, standard error

Subgroup analysis and meta-regression

Meta-regression across the five studies showed that heterogeneity was associated with three covariates (year of publication, number of nodules, and TI-RADS criteria). Among them, two main factors (number of nodules and TI-RADS criteria) contributed to the heterogeneity of this meta-analysis (relative DOR (RDOR) = 4.50, 95 % CI 0.25–80.25 and RDOR = 2.83, 95 % CI 0.04–214.66, respectively) (Table 2). These results indicated that the accuracy of TI-RADS was higher in the three studies with ≥ 500 nodules than in the other two, which enrolled fewer nodules (<500). The TI-RADS criteria used (those of Horvath et al. [10] or not) also had a significant influence on the overall sensitivity and specificity (P < 0.05).

Discussion

Recently, with the number of thyroid cancer cases increasing in many countries, the American Thyroid Association (ATA) and the British Thyroid Association (BTA) have recommended ultrasonography as the first-line technique for evaluating thyroid nodules and cervical lymph nodes before surgery [2, 3]. According to a number of reports, ultrasound can detect small thyroid nodules (>2 mm), especially nodules that are not palpable on clinical examination [26, 27]. Although ultrasound has great advantages in the diagnosis of solid and cystic thyroid nodules, overlapping ultrasonographic features and small tumor sizes can lead to misdiagnosis [28]. The American College of Radiology (ACR) established the BI-RADS system to standardize mammographic and sonographic reports and to increase diagnostic accuracy [29, 30]. In 2009, Horvath et al. [10] developed TI-RADS to stratify thyroid cancer risk for clinical practice. To avoid unnecessary surgical resection or biopsy of thyroid nodules, ultrasound screening needs high sensitivity and a high negative predictive value (NPV) for surgical decision making [31]. TI-RADS gives ultrasonographers more information for classifying benign and malignant nodules, and several studies in China have tested this system in recent years [16-22]. Because of the language restriction, data from Chinese-language articles were not included in this meta-analysis (the pooled sensitivity and specificity of these data were 0.79 (95 % CI 0.77–0.81) and 0.71 (95 % CI 0.70–0.72), respectively). However, few studies published in English have focused on the clinical use of TI-RADS, because criteria differ between observers and interobserver agreement is not good [23, 24, 33].

In our view, a systematic method is needed to improve the management of patients with thyroid nodules based on FNAB diagnosis. The classifications in the TI-RADS system are based on stratifying sonographic features by cancer risk [11]. Several studies have classified TI-RADS into five or six categories similar to the BI-RADS classification, in which TI-RADS 1–3 correspond to a normal gland or benign nodules and TI-RADS 4–5 to nodules suspicious for malignancy [10, 16-25]. Some authors also added TI-RADS 0 or 6, denoting no nodules in the thyroid gland or a diffuse thyroid lesion, or biopsy-proven malignant nodules, respectively [10, 25]. Others divided TI-RADS category 4 or 5 into two or three subsets indicating an increasing probability of malignancy (10–80 %) [16-24].

Our meta-analysis focused on the value of TI-RADS in differentiating benign from malignant thyroid nodules. In our systematic review, the pooled sensitivity and specificity of TI-RADS were 0.75 and 0.69, respectively, and the AUC of the SROC curve was 0.90. These results indicate that TI-RADS has good diagnostic accuracy for the differentiation of thyroid nodules. Moreover, the diagnostic odds ratio was 24.28 (95 % CI 14.25–41.38), which further supports TI-RADS as an effective diagnostic test for the differentiation of thyroid nodules.

However, sensitivity and specificity varied widely (0.57–0.96 and 0.43–0.94, respectively), with high heterogeneity (P = 0.0001). The different TI-RADS classifications used in these publications may contribute to the differences in sensitivity and specificity. Variable interobserver reproducibility of scanning results may hamper the reliability of thyroid ultrasound and the application of the TI-RADS system [24]. The reported kappa values were between 0.51 and 0.61 and depended on tumor size (interobserver variation was greater for tumors smaller than 2 cm) [24]. In addition, the multiple TI-RADS classifications used in different institutions may confuse sonographers performing thyroid ultrasonography [9, 10]. A unified TI-RADS standard is needed to standardize classification in the preoperative evaluation of thyroid nodules.

In our review, we also performed subgroup regression analyses to identify the main factors responsible for the marked heterogeneity. The data showed that the number of thyroid nodules enrolled and the TI-RADS criteria used (those of Horvath et al. or not) were two significant factors influencing the accuracy of TI-RADS in patient management. The short time that TI-RADS has been in use and the variety of criteria are likely to influence the final diagnostic decisions of specialists. Three prospective studies used pathological and cytological diagnoses as reference standards, which is more reliable than using pathological diagnosis alone as in the retrospective studies, and thyroid nodule surgery was performed more often for diagnostic than for therapeutic purposes [32]. Cheng et al., Russ et al., and Friedrich-Rust et al. all reported interobserver kappa values to test whether the classifications were practicable [23, 24, 33]. They concluded that interobserver consistency is important for diagnostic performance and that tumor size may be a considerable factor affecting interobserver concordance [24]. The authors of the three studies published in 2013 generally accepted that TI-RADS offers higher sensitivity and NPV. Thus, authoritative organizations should define more practicable TI-RADS categories to unify sonographic reports of thyroid nodules.

To reduce selection bias in our systematic review, we searched several databases, including PubMed, MEDLINE, EMBASE, the Cochrane Database, and Google Scholar. The QUADAS tool is an evidence-based quality assessment tool with 14 questions [13]. Studies were included in our review only if they had more than nine “yes” answers on the QUADAS tool, to minimize bias in selection and data extraction [13].

However, our review has several limitations. First, the lack of unified TI-RADS criteria introduced bias into the evaluation of thyroid nodules. The different categories and the various standards for differentiating benign from malignant thyroid nodules limit the use of TI-RADS as an effective diagnostic tool. Second, not all participants in each study had pathological confirmation. Most of them underwent follow-up after being classified as TI-RADS 1–3, whereas those classified as TI-RADS 4–5 had surgery or FNAB for the final diagnosis. The data available in some studies were insufficient to calculate the sensitivity of TI-RADS. In addition, the higher false diagnosis rate of FNAB compared with histopathology may have influenced diagnostic accuracy.

Conclusion

The TI-RADS classification is an accurate diagnostic tool for differentiating benign from malignant thyroid nodules. Unified TI-RADS classification criteria and high-quality prospective studies of thyroid nodule diagnosis are still needed.