Introduction

Nodules in the thyroid gland occur commonly in the general population, with a prevalence of 3 to 7% on palpation, 20 to 76% on ultrasonography, and over 50% in autopsy specimens [1]. The main clinical concern in managing thyroid nodules is to exclude malignant disease, present in 4 to 6% of all nodules [2]. Clinical features such as family history of thyroid cancer, or of a syndrome associated with thyroid cancer, prior history of cancer, or history of head or neck or total body irradiation are suggestive of malignancy, but most malignant nodules lack these features. On palpation, the suspicion of malignancy, apart from fixity to surrounding tissue and abnormal lateral cervical lymphadenopathy, is also related to the substantial firmness of thyroid nodules. However, palpation is a subjective method that can be easily influenced by the size and location of the nodule, as well as the experience of the examiner. Thus, to differentiate benign from malignant thyroid nodules, an optimal diagnostic test is needed, which should be accurate, economical, preferably non-invasive, and able to avoid unnecessary surgery.

Ultrasonography (US) is the most advocated and frequently used imaging modality to evaluate thyroid nodules; however, its efficacy in distinguishing benign from malignant thyroid nodules is relatively limited [3]. Hypoechogenicity, solid composition, microlobulated or spiculated margin, microcalcifications, taller-than-wide shape, and intra-nodular vascularity are the suspicious features on US that can predict malignancy [4]. The occurrence of at least one of the above malignant features has a sensitivity and specificity of 83% and 74%, respectively, for the diagnosis of thyroid cancer [5]. Thyroid nodules with such suspicious US findings are classified using various scales, and need further cytological evaluation for the confirmation of malignancy. Fine needle-aspiration cytology (FNAC) of the nodule, ultrasound guided or otherwise, is a minimally invasive office procedure, and has demonstrated a sensitivity of up to 97% for thyroid cancer diagnosis [6]. However, one of its major limitations is that even in the best centers, with an experienced cytologist and a high volume load, 2 to 16% of FNAC are non-diagnostic and a repeat biopsy is needed to gain adequate results [6]. Also, because of the high prevalence of nodular thyroid disease, performing a cytological assessment is not cost-effective for all, or even the majority of, thyroid nodules, especially in resource-constrained settings.

Real-time ultrasound elastography, also called “electronic palpation”, is a non-invasive imaging technique that has been used to assess tissue stiffness. This technique was first utilized by Ophir and colleagues in 1991, with the underlying principle that under external compression, the softer tissues distort more easily than the harder tissues [7]. Recording this degree of distortion or deformation allows for an objective determination of the tissue firmness or stiffness. Malignant tissues are more firm, have a higher stiffness, and distort less to an external force as compared to the benign lesions. Initial use of elastography was to differentiate cancers from benign lesions in breast, prostate, lymph nodes, and pancreas, and for assessment of hepatic fibrosis. Several studies have reported a high specificity and sensitivity of ultrasound elastography in predicting thyroid cancer [8,9,10,11,12]. However, diagnostic accuracy of this imaging technique has been quite variable in different settings.

Elastography is evaluated based on two parameters, namely, the elastography score and the strain ratio. Tissue distortion or deformation under external compression with the ultrasound probe is pictured in a split-screen mode consisting of the B-mode image and the elastogram on the same screen. The elastogram is then superimposed on the B-mode image, and tissue stiffness is displayed as a continuum of colors ranging from green (softest tissue) to blue (hardest tissue) (Fig. 1). Strain elastography includes elasticity evaluation by drawing two regions of interest, one over the nodule and the other over an adjoining reference area. The strain ratio for the two regions of interest is automatically calculated by the dedicated software installed in the ultrasound machine (Fig. 2). In addition to elastography score and strain ratio, shear-wave elastography is a recent advancement wherein the fixed radiation force produced by the probe provides a real-time elastogram, making the procedure less subjective, more reproducible, and quantitative [13].

Fig. 1

The images are B-mode sonograms and elastograms. There is a color bar on the right side of every elastogram. Ultrasound elastography score: A score of 1 indicates elasticity in the entire examined area. A score of 2 indicates elasticity in a large part of the examined area. A score of 3 indicates stiffness in a large part of the examined area. A score of 4 indicates a nodule without elasticity

Fig. 2

In strain elastography, elasticity is assessed by drawing two regions of interest, one over the target region and one over the adjacent reference region. A strain ratio is automatically calculated using dedicated software connected to the ultrasound machine. Each lesion was assessed at least three times, and the average value was recorded as the final result

This prospective study was undertaken to assess the diagnostic value of elastography in differentiating benign and malignant thyroid nodules. We also evaluated whether application of ultrasound elastography can decrease the need for FNAC in patients with thyroid nodules, and thus help optimize the use of health care resources.

Materials and methods

This prospective comparative study was done in patients attending the Endocrinology Clinic at P. D. Hinduja Hospital and Medical Research Centre Mumbai, India, from July 2019 to December 2019. The study was performed in accordance with the Helsinki Declaration of 1975 (revised in 2000), and was approved by the Institutional Review Board of our hospital (IRB No. 959–15-PC).

Considering an expected prevalence of thyroid nodules of 5.5% in the general population [1], and 94% sensitivity of elastography to diagnose benign nodules [12], with a precision of 0.3 and a confidence interval of 95%, the resultant sample size by applying Buderer's formula was 44 [14]. Thus, 44 consecutive adult patients (>18 years of age) with thyroid nodule(s), requiring FNAC as per the American Thyroid Association guidelines based on the clinical and the US findings [15], and having normal thyroid function tests, were enrolled after obtaining written informed consent from each participant. Patients with cystic thyroid lesions were not included in the study. A total of 52 nodules from 44 patients were assessed. Patient’s history, including the symptoms, duration of symptoms, history of childhood irradiation exposure, and family history of thyroid cancers, as well as neck examination, serum thyroid stimulating hormone (TSH) levels and US thyroid findings, were recorded.
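The sample size arithmetic can be reproduced with a short sketch. It assumes the usual form of Buderer's formula for sensitivity, n = z² · Se · (1 − Se) / (d² · prevalence), and a z value of 1.96 for the 95% confidence level; the function name and the z value are illustrative assumptions, not details taken from the study.

```python
import math


def buderer_sample_size(sensitivity, prevalence, precision, z=1.96):
    """Total subjects needed to estimate sensitivity with the given precision.

    Assumed form of Buderer's formula:
        n = z^2 * Se * (1 - Se) / (d^2 * prevalence)
    z = 1.96 is assumed for a two-sided 95% confidence level.
    """
    # Subjects with the target condition needed for the desired precision
    n_with_condition = (z ** 2) * sensitivity * (1.0 - sensitivity) / (precision ** 2)
    # Scale up by prevalence to obtain the total number to enroll, rounded up
    return math.ceil(n_with_condition / prevalence)


# Inputs quoted in the text: 94% sensitivity, 5.5% prevalence, precision 0.3
print(buderer_sample_size(sensitivity=0.94, prevalence=0.055, precision=0.3))
```

With the inputs quoted in the text, the unrounded value is about 43.8, which rounds up to the 44 patients enrolled.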

Serum TSH was measured in the hospital laboratory by a third-generation solid-phase chemiluminescent immunometric assay on the Siemens® Immulite 2000®, with analytical sensitivity of 0.004 mIU/mL, and normal reference range of 0.4 to 4.5 mIU/mL.

Ultrasound examination was performed by a single experienced sonologist, after which the patients underwent FNAC of the suspicious thyroid nodule. In case of non-diagnostic or unsatisfactory result (Bethesda category-I) [6], the FNAC was repeated after three weeks. If the patient underwent surgery subsequently, the histopathology findings were noted.

Ultrasound elastography

The patients were examined in the supine position with the neck slightly extended. All studies were performed by the same sonologist, using the same US machine, a Philips® HDI 5000® unit, which had both conventional B-mode US and elastography capabilities. A C5-I Linear probe of 5 to 12 MHz was used for evaluation of the nodules. The probe was positioned lightly in contact with the skin. The ultrasound examination started with B-mode imaging to assess nodular size and the presence of sufficient surrounding reference tissue. The patient was asked to avoid swallowing and to hold their breath during the examination to minimize the motion of the thyroid gland. The deformation was represented by a color scale over the B-mode image that ranged from green (i.e., softest components with the greatest elastic strain) to blue (i.e., hardest components with no strain) (Fig. 1).

Evaluation based on elastography scores

Each nodule was assigned an elastography score based on a four-point scale according to the classification proposed by Itoh and colleagues [16].

  • Score 1: Low stiffness over the entire nodule (entirely green).

  • Score 2: Low stiffness over most of the nodule (almost green with blue spots).

  • Score 3: High stiffness over most of the nodule (almost blue with green spots).

  • Score 4: High stiffness over the entire nodule (entirely blue).

Scores of 1 and 2 were considered benign, while scores of 3 and 4 were considered malignant.
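The scoring rule above can be written as a small lookup, shown here as a minimal sketch. The dictionary and function names are hypothetical; only the four score descriptions and the benign/malignant grouping come from the text.

```python
# Four-point elastography scale as described in the text (after Itoh and colleagues)
ITOH_SCORE_DESCRIPTIONS = {
    1: "Low stiffness over the entire nodule (entirely green)",
    2: "Low stiffness over most of the nodule (almost green with blue spots)",
    3: "High stiffness over most of the nodule (almost blue with green spots)",
    4: "High stiffness over the entire nodule (entirely blue)",
}


def classify_itoh_score(score):
    """Map an elastography score (1 to 4) to the category used in the study."""
    if score not in ITOH_SCORE_DESCRIPTIONS:
        raise ValueError("elastography score must be 1, 2, 3 or 4")
    # Scores 1 and 2 are treated as benign, scores 3 and 4 as malignant
    return "benign" if score <= 2 else "malignant"


for s in sorted(ITOH_SCORE_DESCRIPTIONS):
    print(s, classify_itoh_score(s), "-", ITOH_SCORE_DESCRIPTIONS[s])
```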

Evaluation based on strain ratio

The strain ratio (normal tissue to lesion strain ratio) of each nodule was calculated by dividing the strain value of the normal tissue by that of the nodule. Strain ratio was automatically calculated using dedicated software connected to the ultrasound machine. Each lesion was assessed at least three times, and the average value was recorded as the final result (Fig. 2).
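In the study the ratio was produced by the machine's own software, so the following is only an illustrative sketch of the arithmetic described: divide the reference-tissue strain by the nodule strain, repeat at least three times, and average. The function names and the strain values are hypothetical and are not study data.

```python
from statistics import mean


def strain_ratio(reference_strain, nodule_strain):
    """Strain of the normal reference tissue divided by strain of the nodule."""
    if nodule_strain <= 0:
        raise ValueError("nodule strain must be positive")
    return reference_strain / nodule_strain


def mean_strain_ratio(measurements):
    """Average the strain ratio over repeated (reference, nodule) measurements."""
    if len(measurements) < 3:
        raise ValueError("each lesion should be measured at least three times")
    return mean([strain_ratio(ref, nod) for ref, nod in measurements])


# Hypothetical strain readings (reference tissue, nodule) for one lesion
readings = [(2.0, 0.5), (1.5, 0.5), (1.75, 0.5)]
print(mean_strain_ratio(readings))  # individual ratios are 4.0, 3.0 and 3.5
```

A stiffer nodule deforms less, so its strain is smaller and the ratio is larger, which is why higher strain ratios point towards malignancy.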

Statistical analysis

The SPSS for Windows version 13.0 software package (SPSS Inc.®, Chicago, IL) was used for statistical data analysis. Categorical data were expressed as frequency and percentage, while continuous variables were expressed as mean ± standard deviation (SD). The unpaired Student's t-test and the Chi-square test were used for comparison between groups with continuous data and categorical data, respectively; the Mann–Whitney U test was used to compare serum TSH levels between patients with benign and malignant nodules. A P value of less than 0.05 was considered statistically significant. The diagnostic sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy of the elastography score to diagnose benign thyroid nodules were calculated, considering final histopathology as the gold standard. For strain ratio, receiver operating characteristic (ROC) curves were constructed, and the diagnostic parameters for the strain ratio cut-point that would best predict benign nodules were computed.
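The text does not state which criterion was used to select the best strain ratio cut-point from the ROC curve. One common choice is Youden's J statistic (sensitivity + specificity − 1), and the sketch below assumes that criterion purely for illustration. The function name and the strain ratio values are hypothetical and are not study data.

```python
def youden_cut_point(values, is_malignant):
    """Return (cut_point, sensitivity, specificity) that maximizes Youden's J.

    A nodule is called malignant when its value is strictly greater than
    the cut-point, and benign otherwise.
    """
    n_pos = sum(1 for m in is_malignant if m)
    n_neg = len(is_malignant) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one malignant and one benign nodule")

    best = None
    # Every observed value is tried as a candidate cut-point
    for cut in sorted(set(values)):
        tp = sum(1 for v, m in zip(values, is_malignant) if m and v > cut)
        tn = sum(1 for v, m in zip(values, is_malignant) if not m and v <= cut)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        j = sensitivity + specificity - 1.0
        if best is None or j > best[0]:
            best = (j, cut, sensitivity, specificity)
    return best[1], best[2], best[3]


# Hypothetical strain ratios with their final diagnosis (True = malignant)
ratios = [2.1, 2.6, 3.0, 3.4, 3.9, 4.1, 4.6, 5.0]
labels = [False, False, False, False, True, False, True, True]
cut, sens, spec = youden_cut_point(ratios, labels)
print(cut, sens, spec)
```

Other criteria, such as the point closest to the top-left corner of the ROC plot, can give a different cut-point on the same data.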

Results

A total of 52 nodules from 44 patients, 30 (68.2%) females and 14 (31.8%) males, were assessed. Mean age of the patients in the study population was 45.18 ± 11.23 years. Ten (22.7%) patients had compressive symptoms, including breathlessness, difficulty in swallowing and/or change in voice. None of the patients had a history of childhood irradiation exposure, or a family history of thyroid cancers. Mean dimensions of the nodules were 2.56 ± 1.28 cm in the transverse axis, 2.07 ± 1.08 cm in the antero-posterior axis, and 1.61 ± 0.83 cm in the cranio-caudal axis. Mean TSH level of the study participants was 1.92 ± 1.09 mIU/mL.

On FNAC, 38 (73.1%) nodules were Category-II (benign), while 8 (15.4%) were Category-V or VI (suspicious for malignancy or malignant), according to the Bethesda system for reporting thyroid cytopathology (Table 1).

Table 1 FNAC results of 52 thyroid nodules, according to the Bethesda system for reporting thyroid cytopathology [6]

Fourteen (31.8%) patients underwent thyroidectomy, and histopathology was reported for 18 (34.6%) nodules. Papillary carcinoma was diagnosed in eight nodules, and follicular carcinoma in one nodule. Follicular adenoma was reported in four nodules, while five nodules were reported as benign colloid goiter. In all, nine out of 52 (17.3%) nodules were malignant, and 43 (82.7%) nodules were considered benign.

Though the mean age (44.19 ± 11.31 vs. 48.86 ± 10.27 years, P = 0.258) and mean TSH (1.88 ± 1.01 vs. 2.19 ± 1.55 mIU/mL, P = 0.849) were lower in patients with benign nodules than in those with malignant nodules, the differences were not statistically significant. Also, no significant difference in gender (30 vs. 6 females, P = 0.762) or symptomatology (15 vs. 3 compressive symptoms, P = 0.732) was noted between patients with benign nodules and those with malignant nodules.

Hypoechogenicity was the most sensitive high-risk US feature (77.8%), but it was the least specific (58.1%). Peripheral halo had a specificity of 100%, but it was the least sensitive (22.2%). Hypoechogenicity, margin irregularity, presence of peripheral halo, presence of microcalcifications, taller-than-wide shape, and intra-nodular vascularity all showed a good ability to rule out malignancy, with NPV >85%. Sensitivity, specificity, PPV, NPV and accuracy of all the high-risk US features are given in Table 2.

Table 2 Measures of diagnostic accuracy for various high risk ultrasound features of thyroid nodules

Amongst the 39 nodules in the benign category (I or II) on elastography, one (2.6%) was malignant on histopathological examination, while of the 13 nodules in the malignant category (III or IV) on elastography, eight (61.5%) were carcinomas on histopathological examination (Table 3). Elastography scoring had a sensitivity of 88.9%, specificity of 88.4%, PPV of 61.5%, and NPV of 97.4% to identify benign thyroid nodules (Table 4).

Table 3 Ultrasound elastography scores of the thyroid nodules, and comparison with final diagnosis
Table 4 Measures of diagnostic accuracy for elastography score and strain ratio, for benign thyroid nodules

The mean strain ratio for benign nodules was significantly lower than that for malignant nodules (2.72 ± 0.62 vs. 4.52 ± 0.75, P < 0.0001). For strain ratio, the area under the ROC curve (AUC) was 0.963 (95% CI 0.869–0.996, P < 0.0001). The optimal cut-point value for strain ratio to differentiate benign and malignant thyroid nodules was determined to be 3.8 (Fig. 3). Forty-two nodules had a strain ratio of ≤3.8, of which 41 were benign, while ten nodules had a strain ratio of >3.8, of which eight were malignant. Using 3.8 as the cut-point value, the sensitivity, specificity, PPV, NPV and accuracy for elastography strain ratio were 88.9%, 95.4%, 80%, 94.6% and 94.2%, respectively (Table 4).
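The reported measures can be recomputed from the nodule counts given in the text. The sketch below treats a malignant nodule as the positive class, which is an assumption that appears to match the reported percentages. For the elastography score the counts are 13 nodules with score III or IV (8 malignant, 5 benign) and 39 with score I or II (1 malignant, 38 benign); for the strain ratio they are 10 nodules above 3.8 (8 malignant, 2 benign) and 42 at or below 3.8 (1 malignant, 41 benign). The function name is hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic measures, returned as fractions between 0 and 1."""
    return {
        "sensitivity": tp / (tp + fn),  # malignant nodules correctly flagged
        "specificity": tn / (tn + fp),  # benign nodules correctly cleared
        "ppv": tp / (tp + fp),          # flagged nodules that are malignant
        "npv": tn / (tn + fn),          # cleared nodules that are benign
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Counts taken from the text; positive class assumed to be malignancy
score_metrics = diagnostic_metrics(tp=8, fp=5, fn=1, tn=38)
ratio_metrics = diagnostic_metrics(tp=8, fp=2, fn=1, tn=41)

for label, metrics in (("score", score_metrics), ("strain ratio", ratio_metrics)):
    summary = ", ".join(f"{name} {100 * value:.1f}%" for name, value in metrics.items())
    print(f"{label}: {summary}")
```

For the elastography score this reproduces the reported 88.9%, 88.4%, 61.5%, 97.4% and 88.5%. For the strain ratio, sensitivity, PPV and accuracy match, specificity comes out at about 95.3%, and the NPV from these counts (41/42) is about 97.6% rather than the quoted 94.6%, so either the counts or that percentage may contain a typographical slip.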

Fig. 3

Receiver operating characteristic (ROC) curve analysis to determine the cut-point value of the strain ratio to discriminate between benign and malignant thyroid nodules

Of the six nodules classified as Bethesda category-III or IV on FNAC, the histopathological diagnosis was follicular carcinoma in one nodule (Bethesda category-IV), follicular adenoma in four, and colloid goiter in one. The elastography score of the nodule with follicular carcinoma was category-III (malignant), while amongst the rest, two nodules were classified as category-II (benign), and three nodules as category-III (malignant) on elastography. The strain ratio of the follicular carcinoma was 4.3, while the mean strain ratio in the remaining nodules was 3.48 (P = 0.001). Though the follicular carcinoma was classified as malignant by elastography, the histopathologically benign lesions in the intermediate Bethesda categories could not be reliably differentiated by elastography. Two nodules (5.3%) of the 38 in Bethesda category-II were classified as malignant on elastography (one each as category-III and IV). One nodule (12.5%) amongst the eight nodules in Bethesda category-V and VI was classified as benign on elastography (category-II).

Discussion

Thyroid US elastography is a real-time non-invasive imaging technique wherein the firmness of the thyroid tissue is assessed by measuring the degree of tissue deformation in response to application of a controlled pressure using the ultrasound-probe, and relies on the underlying principle that the softer benign tissues deform more easily than the harder malignant tissues [7, 13].

In our study, 39 nodules had an elastography score of I or II, of which 38 (97.4%) were benign and one (2.6%) was malignant. Thirteen nodules had an elastography score of III or IV, of which eight (61.5%) were malignant and five (38.5%) were benign. Thus, scores of I and II as per the Itoh criteria were predominantly seen in benign nodules, whereas scores of III and IV were notably seen in malignant nodules. The high elastography score in five benign nodules can be explained by the presence of microcalcifications (present in four nodules), or may be due to fibrotic changes in long-standing nodules [13]. The low elastography score seen in one malignant nodule, with the final diagnosis of papillary thyroid carcinoma, could be due to its smaller size (2 × 1.6 × 1.2 cm) compared with the mean size of nodules in the study, and may also be due to the deeper location of the nodule in the thyroid gland. The deeper the nodule, the less pressure it receives, and thus the less tissue distortion is produced [13]. The sensitivity, specificity, PPV, NPV and accuracy for elastography score in the current study were 88.9%, 88.4%, 61.5%, 97.4% and 88.5%, respectively. Multiple studies in distinct populations, with differing methodologies for diagnosis, have reported a sensitivity of 80 to 95%, and specificity of 80 to 90% for elastography scoring to diagnose benign nodules [9,10,11,12, 17,18,19,20,21,22].

Lyshchik and colleagues reported that, among the US elastography parameters analyzed, the strain index was the strongest independent predictor of thyroid gland malignancy [23], while Azizi and co-workers reported that the PPV of strain elastography was equal to or greater than that of conventional ultrasonographic characteristics, and its NPV was greater than that of any other predictor of malignancy [20]. In our study, the mean strain ratio for benign nodules was 2.72 ± 0.62, and that for malignant nodules was 4.52 ± 0.75 (P < 0.0001), while the optimal cut-point value to differentiate between benign and malignant nodules was determined to be 3.8. Elastography studies using strain ratio as the parameter to distinguish benign from malignant nodules have reported significantly lower strain ratios for benign nodules than for malignant nodules [13, 19, 21, 24,25,26]. Measures of diagnostic accuracy for strain ratio to distinguish between benign and malignant nodules, as reported by various studies, are tabulated in Table 5.

Table 5 Comparison of measures of diagnostic accuracy for elastography strain ratio of select studies

Our results showed a high sensitivity for high elastography score and strain ratio in detection of malignancy (88.9% for both), which is seen without a compromise in the specificity (88.4% and 95.4%) of the testing parameters. Thus, this method is equally strong in diagnosing the malignant, as well as the benign lesions.

In the present study, 15 (34.1%) patients had an elastography score of I. In all these patients, the final diagnosis was benign. Thus, elastography had a 100% NPV for the elastography score of I. These 34.1% of patients could have been safely followed on ultrasound instead of undergoing invasive FNAC, and the number of FNAC procedures could have been reduced by more than one-third. Dighe and colleagues also reported that, using elastography, the number of FNAC could be reduced by 60%; however, theirs was a retrospective study, and it used the systolic thyroid stiffness index, unlike the strain ratio in this study [11].

In addition to reducing FNAC, the elastography strain ratio, due to its high PPV (80%), may reduce potential delay or error in the detection of malignancy. This is particularly important in nodules which lack suspicious US features, or which are smaller than 1 cm, and also in multi-nodular goiter, where the choice of nodules to biopsy is frequently a challenge. Currently it is recommended to biopsy all nodules as per the sonographic criteria. By application of elastography, the nodules with a high elastography score or high strain ratio, which have a high probability of malignancy, could be preferentially biopsied.

Presence of any one high-risk feature on US showed a sensitivity of 88.9% and a specificity of 40.9% in the present study (Table 2). A multicenter study reported 83.3% sensitivity of US in differentiating benign and malignant nodules [6]. Various other studies have shown a wide range of US sensitivity, from 55% to 95%, and specificity, from 52% to 81%, in differentiating benign from malignant nodules [3, 22,23,24]. These differences are due to the differences in the studied population, the sample sizes, and the experience of operators. The specificity of 40.9% and accuracy of 48.07% in our study show that US is not very accurate in differentiating benign from malignant thyroid nodules. The mean TSH for benign nodules was 1.88 ± 1.01 mIU/mL, while it was 2.19 ± 1.55 mIU/mL for malignant nodules in our study. Haymart and co-workers have reported that a higher TSH level, even within the upper part of the reference range, is associated with an increased risk of malignancy in a thyroid nodule [27]. However, the association between TSH values and presence of malignancy was not significant (P = 0.849) in our study, which could be due to the small sample size. Thus, neither ultrasonography nor serum TSH level is sufficient by itself to determine the malignant potential of a thyroid nodule.

This is the first study from the Western India region investigating the usefulness of thyroid US elastography to distinguish benign and malignant thyroid nodules, in a heterogeneous urban population. The key strength of the study is its prospective nature. A few limitations of the study need to be acknowledged. First, malignancies other than papillary carcinoma were not adequately represented among the nodules. Second, the inter-observer variability of elastography values was not analyzed. Third, the final diagnosis was obtained only from FNAC in most nodules, as subjecting each patient to thyroidectomy for the gold-standard histopathological diagnosis would have been unethical.

Conclusion

Ultrasound elastography has good diagnostic efficacy for the differentiation of benign and malignant thyroid nodules. It can potentially reduce the rates of unnecessary fine needle-aspiration biopsy, and can be a good supplementary tool to gray-scale ultrasonography. However, whether elastography score or strain ratio will be useful in diagnosis of nodules which are indeterminate on FNAC is open to further studies.