Introduction

The prevalence of thyroid nodules in the general population is about 3–8 % [1–3] and exceeds 50 % in people over 65 years of age [3]. Although most thyroid nodules are benign, the prevalence of cancer among them is as high as 5–15 % [4].

Palpation can detect large and firm nodules suspicious for malignancy, but its reported accuracy is low [1, 3]. Ultrasound and colour Doppler ultrasound (CDUS), despite being extremely accurate in identifying thyroid nodules, have limited effectiveness in differentiating benign from malignant nodules and in selecting the cases that require fine-needle aspiration cytology (FNAC) [5, 6]. Although several ultrasound features (microcalcifications, hypoechogenicity, irregular margins and intranodular vascularity) are correlated with malignancy in thyroid nodules, they are not highly predictive of it [7]. Ultrasound sensitivity and specificity vary considerably from study to study and range between 52–81 % and 54–83 %, respectively [5]. FNAC is the standard procedure for determining whether a thyroid nodule is cancerous, but it is invasive, subject to sampling errors, limited in its diagnostic capabilities and expensive, and it may cause minor complications [8–10].

More recently, elastography has been introduced to select nodules requiring FNAC with higher accuracy than conventional ultrasound. Compressive or quantitative elastography is based upon the principle that malignancies have stiffer tissues than benign lesions and that, under compression, the softer parts of tissues deform more easily than the harder parts [11]. The force of compression can be applied either directly by the operator's hand or by carotid artery pulsation [11]. The evaluation of stiffness can be qualitative, with a colour-coded or grey-scale system, or quantitative, with offline measurements [11]. Elastography has proved to be a promising imaging technique, with good diagnostic accuracy for both qualitative and quantitative methods [12–25]. Rago et al. [18] showed sensitivity and specificity in detecting malignant thyroid nodules to be as high as 97 % and 100 % using ultrasound elastography. Similarly, in a study of 145 thyroid nodules referred for surgical removal, Hong et al. [18] reported a sensitivity of 88 % and a specificity of 90 %. However, in spite of these excellent results, Park et al. [26] reported significant interobserver variability in the application of some of the interpretation techniques and suggested that the assessment of nodule elasticity is influenced by variability in both data acquisition and interpretation.

Therefore, in our prospective study, we assessed the accuracy of Q-elastography as compared with CDUS in a large patient population. Moreover, we assessed the interobserver variability between two operators with different levels of experience, with pathological analysis as the reference standard. We employed free-hand compression Q-elastography with a new tool that allows time-elasticity graphs to be plotted over a region of interest during the compression or relaxation cycles in order to obtain a semiquantitative evaluation of tissue stiffness.

Materials and methods

Study design

This prospective study was designed as an extension and follow-up of our preliminary experience [24], with the aims of re-evaluating those results in a larger sample and estimating the interobserver agreement. Data were collected from a series of consecutive thyroid nodules evaluated with conventional ultrasound and elastography in patients presenting at our institution between May 2010 and March 2012. After ultrasound, FNAC was obtained, and suspicious or malignant results led to surgery. For all patients, cytology or post-surgical histopathological results were considered the standard of reference.

Subjects

Of 347 consecutive patients evaluated with thyroid nodules (with dimensions greater than 5 mm), 59 were excluded owing to a lack of cytological or histopathological data (38 patients), refusal of the intervention (11 patients) or inadequate specimen material (10 patients).

The remaining 288 patients (median age 53 years, range 15–85; 198 female and 90 male) with 344 nodules underwent both conventional ultrasound and elastographic examination by two independent radiologists, one with 10 years and the other with 4 years of experience in thyroid ultrasound, both blinded to the patients' data. Institutional review board approval was obtained and all subjects gave signed informed consent.

Ultrasound and elastography technique

Both examinations were performed with a Toshiba Aplio XG machine (Toshiba, Osaka, Japan) using a 5–13-MHz linear probe.

Conventional ultrasound examinations

Each lesion was characterised according to ultrasound parameters [echogenicity, margins and halo features, microcalcifications and colour Doppler (CDUS) pattern]. The nodules included in this study were evaluated on the basis of the criteria stated in the “Revised American Thyroid Association Management Guidelines for Patients with Thyroid Nodules” [27]. To allow statistical analysis, we processed the categorical CDUS features as if they were numerical data by applying the following simplified score criteria:

  • echogenicity: marked hypoechogenicity (more hypoechoic than the strap muscles) as a sign of malignancy was scored 1, whereas hyperechogenicity or isoechogenicity, as signs of a benign nodule, was scored 0;

  • margins: margin irregularity, an asymmetric halo or microcalcific halo as signs of malignancy was scored 1, whereas a symmetrical halo or regular margins as signs of a benign lesion was scored 0;

  • blood flow pattern at CDUS: absent (pattern I) and peripheral (pattern II) as signs of a benign lesion were scored 0, whereas peri-intralesional (pattern III) as a sign of malignancy was scored 1;

  • microcalcifications (excluding echogenic spots with a reverberation artefact, a finding indicative of inspissated colloid): absent microcalcifications as a sign of a benign lesion were scored 0, whereas present microcalcifications as a sign of malignancy were scored 1.
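The score criteria above can be sketched as a small lookup, purely for illustration. The category labels and the function name are hypothetical, and summing the four scores into a total is an assumption: the text does not state how the CDUS features were combined when considered together. Categories not listed in the criteria (for example mild hypoechogenicity) are deliberately left out, so they raise a KeyError.

```python
# Illustrative sketch only: labels and names are hypothetical; the 0/1 values
# follow the simplified score criteria listed in the text.
ECHOGENICITY = {"marked_hypoechoic": 1, "isoechoic": 0, "hyperechoic": 0}
MARGINS = {
    "irregular": 1,
    "asymmetric_halo": 1,
    "microcalcific_halo": 1,
    "regular": 0,
    "symmetric_halo": 0,
}
FLOW_PATTERN = {"I": 0, "II": 0, "III": 1}  # absent, peripheral, peri-intralesional


def score_cdus(echogenicity, margins, flow_pattern, microcalcifications):
    """Return the four binary CDUS scores and their sum for one nodule."""
    scores = {
        "echo": ECHOGENICITY[echogenicity],
        "margins": MARGINS[margins],
        "pattern": FLOW_PATTERN[flow_pattern],
        "microcal": 1 if microcalcifications else 0,
    }
    # Assumption: the combined score is a plain sum of the four binary scores.
    scores["total"] = sum(scores.values())
    return scores


if __name__ == "__main__":
    # Hypothetical nodule: hyperechoic, regular margins, pattern III, no microcalcifications.
    print(score_cdus("hyperechoic", "regular", "III", False))
```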

Elastography examinations

Elastography was performed with the Elasto-Q technique (Toshiba's semiquantitative relaxation elastography). This technique allows compression of the target tissue with the probe and visualisation, even if not in real time, of the dynamics of the compression by recording it on a compression/time curve, so that the measurements can be standardised on a sinusoid-shaped compression-decompression curve. The ultrasound probe was placed gently on the thyroid in a transverse orientation. In order not to alter the measurements, the operators excluded macrocalcifications and cystic areas from the imaging. The exclusion criterion was insufficient normal tissue around the target mass to obtain an adequate strain ratio. A series of compressions was performed, the compression dynamics were visualised and data were recorded if the dynamics fitted the requirements. For the compression with the most symmetrical dynamics, corresponding to the best cycle, we obtained colorimetric strain images in off-line processing, and regions of interest (ROIs) were then set to obtain strain data. We evaluated the strain corresponding to the highest acceleration value during decompression. The strain ratio was calculated by dividing the strain value of the normal tissue in the same imaging section as the lesion by that of the nodule. For each operator, lesion strain ratios were recorded and compared with the results of cytological or histopathological analysis as the reference method.
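As a minimal sketch of the strain ratio definition given above (normal-tissue strain divided by nodule strain), the following function may help. The function name and the example strain values are hypothetical; the actual values come from the ROIs in the vendor's off-line software.

```python
def strain_ratio(reference_strain, nodule_strain):
    """Strain of normal thyroid tissue divided by strain of the nodule.

    A stiffer nodule deforms less than the surrounding gland, so its strain
    is smaller and the ratio rises above 1.
    """
    if reference_strain < 0 or nodule_strain <= 0:
        raise ValueError("strain values must be positive")
    return reference_strain / nodule_strain


if __name__ == "__main__":
    # Hypothetical strain values read from two ROIs in the same imaging section.
    print(strain_ratio(0.5, 0.25))  # nodule deforms half as much as the gland
```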

Statistical analysis

Data were collected prospectively and recorded by each radiologist performing CDUS and elastography and were entered into a computerised spreadsheet (Excel 2007, Microsoft Corp., USA). Statistical analyses were carried out using statistical software (SAS system for Windows, version 9.1.3; SAS Institute, Cary, NC, USA).

The sensitivity, specificity, positive predictive value, negative predictive value and diagnostic accuracy of each test were calculated. The optimal cutoff value for the strain ratio was calculated using receiver-operating characteristic (ROC) analysis. Areas under the ROC curve were compared using the Bonferroni test.
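The measures listed above follow directly from a 2 × 2 table of test result against final diagnosis, and can be sketched as follows. The text does not state which criterion was used to select the optimal cutoff on the ROC curve; the sketch assumes the Youden index (sensitivity + specificity − 1), which is only one common choice. All function names and the example data are hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard accuracy measures from the counts of a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def best_cutoff(strain_ratios, is_malignant):
    """Return (cutoff, J) maximising the Youden index J.

    Assumption: a nodule is called malignant when its strain ratio is greater
    than or equal to the cutoff; every observed value is tried as a cutoff.
    """
    n_pos = sum(1 for m in is_malignant if m)
    n_neg = len(is_malignant) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both malignant and benign nodules are required")
    best_value, best_j = None, -1.0
    for cutoff in sorted(set(strain_ratios)):
        pairs = list(zip(strain_ratios, is_malignant))
        tp = sum(1 for s, m in pairs if m and s >= cutoff)
        tn = sum(1 for s, m in pairs if not m and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_value, best_j = cutoff, j
    return best_value, best_j


if __name__ == "__main__":
    # Toy data, not study data: three benign and three malignant nodules.
    ratios = [1.0, 1.2, 1.5, 2.1, 2.5, 3.4]
    labels = [False, False, False, True, True, True]
    print(best_cutoff(ratios, labels))
    print(diagnostic_metrics(tp=3, fp=0, tn=3, fn=0))
```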

Interobserver variability in choosing CDUS feature descriptors in each category and in measuring the strain ratio (treating the strain ratio qualitatively as categorical data: benign or malignant) was assessed between the two observers using Cohen’s kappa (κ) statistic, which quantifies the agreement between two raters after accounting for chance agreement [28, 29]. Values were interpreted according to Landis and Koch, who ascribe κ of <0.00 as “poor”, 0.00–0.20 as “slight”, 0.21–0.40 as “fair”, 0.41–0.60 as “moderate”, 0.61–0.80 as “substantial” and 0.81–1.00 as “almost perfect” [28]. All statistical calculations were performed using Stata version 12.0 (Stata Corp., College Station, TX, USA).
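For readers unfamiliar with the statistic, Cohen's κ is (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance from each rater's marginal frequencies. The sketch below computes it for two raters and maps the result to the Landis and Koch labels quoted above. The function names and the example ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used the same single category throughout
    return (observed - expected) / (1.0 - expected)


def landis_koch(kappa):
    """Verbal label for kappa according to the Landis and Koch scale."""
    if kappa < 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical benign (0) / malignant (1) calls by two operators on ten nodules.
    op1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
    op2 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
    k = cohen_kappa(op1, op2)
    print(round(k, 3), landis_koch(k))
```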

Results

Final diagnosis was based on the cytological and histological findings: FNAC was used as the reference standard for the diagnosis of benign nodules if the patients had not undergone thyroid surgery and histopathology was used if the patients had undergone thyroid surgery. The size of nodules ranged between 4 and 50 mm (mean 19.1 mm, SD: 12.68 mm). The final diagnosis of the 344 nodules was that 232 were benign (Fig. 1) and 112 were malignant (Fig. 2). Among the malignant nodules, 102 were papillary cancers, 6 were follicular carcinomas, 3 were medullary carcinomas and 1 was an anaplastic carcinoma. Among the benign nodules, 186 were hyperplastic nodules, 41 were adenomas and 5 were focal thyroiditis.

Fig. 1 At baseline ultrasound the thyroid lesion appeared hyperechoic with a regular hypoechoic halo, while at colour Doppler it showed pattern III vascularisation. At Q-elastography, operator 1 obtained a strain ratio of 1.5 and operator 2 a strain ratio of 2.04. Histopathology showed follicular hyperplasia

Fig. 2 At colour Doppler ultrasound the lesion appeared mildly hypoechoic, with fluid areas and microcalcifications, vascularisation of pattern III, and lobulated margins. At Q-elastography, operator 1 and operator 2 obtained a strain ratio of 2.99 and 7.23, respectively. Histopathology resulted in papillary carcinoma

The mean age of patients with malignant nodules was lower than that of patients with benign nodules (P < 0.001). There was no association between the sex of the patients and the malignancy of the nodules (P = 0.079). The mean strain ratio of malignant nodules (3.47 for the first operator, 95 % confidence interval [CI] 3.13–3.81, and 2.89 for the second operator, 95 % CI 2.59–3.17) was significantly different from that of the benign lesions (1.29 for the first operator, 95 % CI 1.19–1.40, and 1.49 for the second operator, 95 % CI 1.36–1.62).

Receiver-operating characteristic (ROC) analysis was performed for operator 1 (the expert operator) and for operator 2 (the non-expert operator), for each CDUS feature and for the strain ratio measurements (Table 1). For both operators the best performance (highest area under the ROC curve) was found for the strain ratio measurements, and the lowest area under the ROC curve was found for the microcalcification score. The area under the ROC curve was significantly higher for expert operator 1 than for non-expert operator 2 for the margins score, blood flow pattern score and strain ratio score (Table 1).

Table 1 The performance for the detection of cancer for operators 1 and 2 as expressed by the area under the ROC curve, of the nodule echogenicity score (echo score), the nodule margins score, the nodule blood flow pattern score (pattern score), the nodule microcalcifications score (microcal score), CDUS features considered together (echo score total) and the nodule strain ratio score for operator 1 (sr1num) and operator 2 (sr2num)

In Table 1 we also present the ROC analysis of CDUS features considered together (and not individually this time) as an expression of the general performance of CDUS. The best performance was found for the strain ratio measurements as shown by the highest area under the ROC curve with a significant difference between the performance of the strain ratio and the total of CDUS features, P = 0.0001.

Q-elastography showed an excellent diagnostic performance: for operator 1 (Fig. 3) with a strain ratio best cutoff point selected at 2.02, sensitivity was 93 % and specificity 92 %; for operator 2 (Fig. 4) with a strain ratio best cutoff point found at 1.86, sensitivity was 84 % and specificity 79 %.

Fig. 3 ROC curve analysis for operator 1. Area under the ROC curve by extended trapezoidal rule = 0.937962. Wilcoxon estimate of the area under the ROC curve = 0.937962. DeLong standard error = 0.015469: 95 % CI = 0.907643 to 0.96828. Optimal cutoff point selected = 2.02; sensitivity (95 % CI) = 0.928571 (0.864103 to 0.968659); specificity (95 % CI) = 0.922414 (0.880151 to 0.953371)

Fig. 4 ROC curve analysis for operator 2. Area under the ROC curve by extended trapezoidal rule = 0.836786. Wilcoxon estimate of area under the ROC curve = 0.836786. DeLong standard error = 0.024361: 95 % CI = 0.789038 to 0.884533. Optimal cutoff point selected = 1.86; sensitivity (95 % CI) = 0.839286 (0.757946 to 0.901879); specificity (95 % CI) = 0.791111 (0.732086 to 0.842293)
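The confidence intervals quoted with the ROC areas appear consistent with a normal-approximation interval, AUC ± z × SE, built from the DeLong standard error. This is an inference from the reported figures rather than a stated method, and the function name below is hypothetical. A minimal sketch:

```python
from statistics import NormalDist


def auc_confidence_interval(auc, standard_error, level=0.95):
    """Normal-approximation interval for an area under the ROC curve."""
    # Two-sided critical value, about 1.96 for a 95 % interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * standard_error
    # An area under the ROC curve cannot lie outside [0, 1].
    return max(0.0, auc - half_width), min(1.0, auc + half_width)


if __name__ == "__main__":
    # Areas and standard errors as reported for operators 1 and 2.
    for name, auc, se in [("operator 1", 0.937962, 0.015469),
                          ("operator 2", 0.836786, 0.024361)]:
        low, high = auc_confidence_interval(auc, se)
        print(name, round(low, 4), round(high, 4))
```

With the reported areas and standard errors, the resulting limits agree with the published intervals to within rounding of the inputs.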

The area under the ROC curve, sensitivity and specificity of Q-elastography with the strain ratio measurement were significantly higher for the first (expert) operator than for the second (non-expert) operator. Inter-operator agreement was excellent according to Cohen's kappa (κ) statistic, ranging from 0.83 for the echogenicity score (the lowest) to 0.95 for the strain ratio measurements (the highest). All κ values were within the range 0.81–1.00, which corresponds to “almost perfect” agreement on the Landis and Koch scale [28].

Discussion

Thyroid nodules are a common clinical problem, and their incidental detection at ultrasound is increasing; the main issue is the need to exclude malignancy, which occurs in 5–15 % of nodules [1–5]. Most individual ultrasound features are not sufficiently predictive of malignancy. When several features are present together, they are associated with a fair likelihood of thyroid malignancy, and ultrasound characteristics are more reliable indicators of potential malignancy than nodule size [5–7].

The need to further reduce the number of unnecessary FNACs might be met with the use of elastography as a separate tool or in combination with CDUS features. Especially when there are multiple nodules in the thyroid gland, suspicious CDUS features with the aid of elastography may be helpful in targeting the right nodule for aspiration.

The recent American Thyroid Association guidelines [27] state that: “with the exception of suspicious cervical lymphadenopathy, which is a specific but insensitive finding, no single ultrasound feature or combination of features is adequately sensitive or specific to identify all malignant nodules”.

In the literature, a number of studies on elastography of thyroid malignancies report encouraging results [11–25]. A recent meta-analysis [30] examined eight studies, selected on the basis of a high rating in quality assessment, that included a total of 639 thyroid nodules. For the diagnosis of malignant thyroid nodules with elastography, the overall mean sensitivity of the eight studies was 92 % (confidence interval 88–96) and the overall mean specificity was 90 % (confidence interval 85–95). However, significant heterogeneity was found among the studies with regard to specificity. In addition, a recent study [31] of qualitative elastography, assessed using two different systems of colour-coded elastograms (Rago criteria and Asteria criteria), showed that qualitative elastography performed worse than grey-scale ultrasound features in combination in differentiating malignant from benign thyroid nodules. Conversely, in the present study we achieved a good performance using a different method, semiquantitative Q-elastography, as shown by the area under the ROC curve (Table 1) for both operators (operator 1, the expert operator, 0.938; operator 2, the non-expert, 0.837). The sensitivity and specificity were respectively 93 % and 92 % for the first operator (with a strain ratio best cutoff point at 2.02) and 84 % and 79 % for the second operator (with a strain ratio best cutoff point at 1.86). When each CDUS feature was compared with the strain ratio measurement, the best performance (highest area under the ROC curve) was found for the strain ratio measurements in both operators. Semiquantitative evaluation using the strain ratio was therefore shown to be a more accurate and objective imaging tool than ultrasound features, with better results than those of the studies using qualitative elastography.

The lowest value of the area under the ROC curve was found for the microcalcifications score for both operators (Table 1), which is in agreement with the well-known fact that microcalcifications are a highly specific sign of malignancy, but as they are not often encountered, this sign has low sensitivity and thus a low diagnostic performance.

Another issue that is still under debate is interobserver variability. Park et al. [26] found no interobserver agreement among three radiologists using free-hand compression with colour-coded qualitative elastography, and concluded that the extent of compression influences the score. On the other hand, various studies found good interobserver agreement. Merino et al. [32] and Ragazzoni et al. [33], employing a qualitative system that scored the nodules according to strain homogeneity, found, respectively, an excellent agreement between operators (κ = 0.82, 0.74–0.89) and a good accuracy (84 %, OR 29) with interobserver concordance (κ = 0.643). Lim et al. [34], employing a semiquantitative quasistatic method based on carotid artery pulsation, found an overall agreement between operators ranging from good to very good.

The interobserver agreement between the two operators in this study, assessed using Cohen's κ, was excellent for all features; it was highest for the strain ratio measurements (0.95) and lowest for the echogenicity score (0.83). Nevertheless, the performance of the expert operator was significantly better than that of the non-expert operator (Table 1) for the strain ratio measurements and for the margins and blood flow pattern scores. As with ultrasound in general, experience and progress along the learning curve improve diagnostic performance with the semiquantitative Q-elastography method.

Although we operate in a referral centre, a variety of patients present to our department with a broad range of thyroid nodules (from low to high probability of malignancy), including many cases directly referred to us by the general practitioner as well. It was our aim to avoid a spectrum composition bias; thus, our patient inclusion criteria were not restricted to cases previously selected for FNAC or surgery. A limitation in this study is the need for off-line post-processing with an estimated time ranging between 3 and 5 min approximately for obtaining the strain ratio value.

In our opinion, the method we employed should be improved with respect to the following issues: the difficulty in providing harmonic dynamics of compression, which affects the resulting curve morphology; the positions of the ROIs relative to each other (because the strain depends upon the depth of the ROI, the nodule ROI should be at the same depth as the gland ROI); the need for the ROIs to remain within the nodule during the whole cycle; and tissue features (e.g. the presence of calcifications, cystic areas or thyroiditis). Solutions to these issues are required, and further studies with improved equipment and methods are needed in order to better validate Q-elastography, and elastography in general, for thyroid nodules. Furthermore, purely quantitative techniques that seem promising, such as shear-wave elastography, could offer advantages and need to be tested on the thyroid.

According to the results of our study, we can conclude that Q-elastography is a valid and useful diagnostic method that helps to improve characterisation of thyroid nodules in order to select candidates for surgery and to follow up patients with benign features.