Introduction

Breast ultrasound elastography (USE) is a new ultrasonic imaging technique that has shown effectiveness in detecting malignancy within breast lesions. USE provides information about the mechanical properties of tissue, such as elasticity and strain, and maps it into color images [14]. Elasticity is the tendency of a tissue to resume its original size and shape, while strain is the degree of change in size or shape in response to external compression (stress) [4]. Each pixel of the image is assigned one of 256 specific colors that represents the magnitude of tissue strain, which depends on physiological and pathological changes in breast structure [3, 5]. Harder tissues, such as malignant lesions, may show decreased strain and are displayed in blue, while softer tissues show increased strain and are displayed in red [3]. Normal breast tissue, which shows average strain, is displayed in green [3].

The color image is superimposed on the B-mode ultrasound (USB) image for better recognition of the relationship between the strain distribution and the anatomical borders of the lesion [3, 4, 6]. This information is further interpreted by evaluating the color pattern in a hypoechoic lesion (i.e., within the lesion borders on the USB image) and in the surrounding breast tissue [3]. An elasticity score (ES) on a scale of 1 to 5 is assigned to each image based on its overall pattern, with harder tissues (e.g., breast cancer) showing higher elasticity scores [3].

The diagnostic accuracy of elasticity scoring has been investigated in several previous studies [2, 3, 5, 7–36] and a previous meta-analysis [37]. Our prior meta-analysis showed a sensitivity of 79% and a specificity of 89% for the use of ES in differentiating benign from malignant lesions [38]. Further, individual studies have reported that USE alone may increase the specificity of breast ultrasound in the characterization of breast lesions and potentially decrease unnecessary biopsies of benign breast lesions [1, 7, 9, 16, 39]. Only seven studies have reported the diagnostic performance of the combination of USE (using ES) and USB, using the breast imaging reporting and data system (BIRADS) [2, 8, 17, 22, 27, 33, 40]. Five of them showed an improvement in the specificity of USB [8, 22, 27, 33, 40], while four reported an increase in sensitivity [2, 17, 27, 33]. However, no meta-analysis has directly compared the diagnostic performance of USB with that of USE alone or of USE combined with USB. We performed a meta-analysis of studies that reported a direct comparison of ES with BIRADS in differentiating breast lesions, according to the Cochrane Diagnostic Test Accuracy Review Working Group guidelines [41]. We further evaluated the diagnostic performance of the combination of USE and USB compared with USB alone, using specific statistical methods [42] on the same database, which represents a significant expansion of our previous work [38].

Methods

Criteria for considering studies for this review

Types of studies

All analytical studies reporting a direct comparison of elasticity score alone with BIRADS in differentiation of focal breast lesions that were published in full text were considered for eligibility. No language restriction was used.

Participants

Study participants were patients who had breast symptoms or an abnormal clinical breast examination, breast US or mammography. There was no age restriction for the study participants.

Index test

Breast USE was the index test. Only papers in which a 5-point scale elasticity score according to Itoh et al. [3] was calculated were included. Lesions with ES of 4 and 5 were considered malignant, while the other ES were grouped as benign lesions.

Comparator test

Conventional USB was the comparator. USB images were reported according to BIRADS categories [43]. Lesions with BIRADS categories of 4 and 5 were considered malignant, while the other categories were grouped as benign lesions.

Target condition

The index test is used to differentiate benign from malignant breast lesions.

Reference standards

Histopathological (core biopsy or surgical biopsy) or cytological (fine needle aspiration) confirmation of breast lesion is the reference standard.

Search methods for identification of studies

For the purpose of this study, we used the search results of our prior meta-analysis [38].

Electronic searches

Electronic searches of PubMed, EMBASE, ISI Web of Knowledge and Cochrane database from inception through to August 22, 2011 were performed without any constraints. We used relevant text words and Medical Subject Heading terms that included breast combined with sonoelastography, elastosonography, elastography, elasticity imaging and strain imaging.

Searching other resources

Reference lists from identified studies were manually scanned to identify other relevant studies.

Data collection and analysis

Selection of studies

Two authors independently conducted the literature search. A list of articles meeting the inclusion criteria based on abstracts was compiled, and these articles were retrieved in full text. Two reviewers independently reviewed the list of full texts for inclusion. Discrepancies were discussed and resolved by agreement on a final set of studies.

Data extraction and management

Data, extracted by two reviewers (GS, BAD), included patient characteristics (number, gender, mean age), lesion palpability, technical characteristics of USE, the reference standard, and the study results for USE and USB (number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN)).

Assessment of methodological quality

Two reviewers (GS, BAD) independently assessed the methodological quality of included studies, using 11 items of the extensively validated Quality Assessment of Diagnostic Accuracy Studies (QUADAS) checklist [44]. Disagreements were resolved by consensus.

Statistical analysis and data synthesis

TP, FP, TN and FN for USE and USB were extracted directly from the source literature, where possible. Otherwise, values were calculated from the data provided. If a study did not report results for USB, we requested the original data by corresponding directly with the authors or principal investigators. Summary sensitivity and specificity, each with a 95% credible interval (CrI, the Bayesian analog of the confidence interval), were calculated for USE and USB using bivariate generalized linear mixed modeling (a random effects model) [45, 46]. We used Bayesian Markov chain Monte Carlo simulation with non-informative hyperpriors, implemented in the WinBUGS program interfaced with STATA. Summary positive and negative likelihood ratios (LR) were calculated from the model estimates. LRs can be interpreted as follows: an LR of 0 excludes disease, an LR of infinity (∞) excludes normality, and an LR of 1 means no change in the likelihood of disease. For the diagnostic information to have a high probability of altering clinical management, a likelihood ratio greater than 10 or less than 0.1 would be required for a positive or negative test result, respectively. Moderate informational value can be achieved with likelihood ratios of 5–10 and 0.1–0.2; likelihood ratios of 2.0–5.0 and 0.2–0.5 indicate very little informational value [47].
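The relationship between sensitivity, specificity and the likelihood ratios, together with the interpretation bands quoted above, can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen for illustration only; the sketch uses the standard point formulas, whereas the summary LRs in this study were derived from the bivariate model estimates, so the two need not agree exactly.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test given proportions in (0, 1)."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def informational_value(lr):
    """Classify a likelihood ratio using the bands quoted in the text."""
    if lr <= 0.0:
        raise ValueError("likelihood ratio must be positive")
    # Fold LRs below 1 onto the scale of LRs above 1 (0.1 -> 10, 0.2 -> 5, 0.5 -> 2)
    # so that one set of cut-offs serves positive and negative results alike.
    strength = lr if lr >= 1.0 else 1.0 / lr
    if strength > 10.0:
        return "high"
    if strength >= 5.0:
        return "moderate"
    if strength >= 2.0:
        return "very little"
    return "unclassified"  # the text gives no band for LRs between 0.5 and 2
```

For example, `likelihood_ratios(0.79, 0.88)` gives a positive LR of about 6.6, which `informational_value` places in the moderate band.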

The data were displayed graphically as summary receiver operating characteristic (SROC) curves, with summary operating points for sensitivity and specificity marked on the curves together with their 95% confidence regions. We used the Rutter and Gatsonis formulas for constructing the SROC curves [48].

Further, we assessed the diagnostic accuracy (sensitivity, specificity, LRs and area under the SROC curve) of the combination of USE and USB, using the method described by Ament et al. [42]. We combined the two tests based on two different positivity criteria: (1) conjunctive, where the outcome of the combination of tests is positive only if both test results are positive, and negative in all other cases; (2) disjunctive, where the outcome of the combination of tests is negative only if both test results are negative, and positive in all other cases.
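The two positivity criteria can be sketched in Python as follows. The first function applies the rules to a single lesion. The second shows how the accuracy of a combination behaves under the simplifying assumption that the two tests are conditionally independent given disease status. That assumption and both function names are introduced here for illustration only and are not claimed to reproduce the method of Ament et al. [42].

```python
def combine_results(use_positive, usb_positive, rule):
    """Combine two binary test results for one lesion."""
    if rule == "conjunctive":
        # Positive only if both tests are positive.
        return use_positive and usb_positive
    if rule == "disjunctive":
        # Negative only if both tests are negative.
        return use_positive or usb_positive
    raise ValueError("rule must be 'conjunctive' or 'disjunctive'")


def combined_accuracy_independent(se_a, sp_a, se_b, sp_b, rule):
    """Return (sensitivity, specificity) of the combination.

    ASSUMPTION: tests A and B are conditionally independent given disease status.
    """
    if rule == "conjunctive":
        sensitivity = se_a * se_b                        # both must detect the cancer
        specificity = 1.0 - (1.0 - sp_a) * (1.0 - sp_b)  # false positive only if both err
    elif rule == "disjunctive":
        sensitivity = 1.0 - (1.0 - se_a) * (1.0 - se_b)  # missed only if both miss
        specificity = sp_a * sp_b                        # both must call benign
    else:
        raise ValueError("rule must be 'conjunctive' or 'disjunctive'")
    return sensitivity, specificity
```

Under this assumption the conjunctive rule can only lower sensitivity and raise specificity relative to either test alone, and the disjunctive rule does the opposite.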

Publication bias was assessed using Deeks’ funnel plot asymmetry test [49]. Heterogeneity between studies was assessed using the I² statistic. I² values range between 0 and 100%, where 0% indicates no observed heterogeneity and values greater than 50% may be considered to indicate substantial heterogeneity [50]. All statistical analyses were performed with the user-written “midas” module for STATA, version 11 (Stata Corp., College Station, Texas) [51].
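As an illustration of what the inconsistency index measures, the following Python sketch computes it from Cochran's Q in the common univariate form, I² = 100% × (Q − df)/Q, truncated at zero. The function name and the inverse-variance weighting are assumptions made for this example; the computation inside the “midas” module may differ in detail.

```python
def i_squared(effects, variances):
    """Return I² (in percent) for study-level effects and their variances.

    ASSUMPTION: fixed-effect inverse-variance weights and the univariate
    formula I² = 100 * (Q - df) / Q, truncated at 0.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0.0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0
```

When every study reports the same effect, Q is zero and the index is 0%; the index approaches 100% as between-study differences grow large relative to within-study variance.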

Results

The literature search from our previous study yielded 2,927 articles, of which 172 were reviewed in abstract, and 51 of these were further reviewed in full text (Fig. 1). Of these, 29 studies were eligible for inclusion. Of the excluded studies, thirteen did not fully meet the inclusion criteria, two had used a 4-point scale elasticity score [52, 53] that differed from the classification introduced by Itoh et al. [3], two did not report data on USE performance as a single test [33, 40], and five did not report data on USB performance [11, 21, 35, 36, 54].

Fig. 1 Literature search and selection schema. USB B-mode ultrasound, USE ultrasound elastography

Characteristics of included studies

Table 1 summarizes the clinical characteristics of the patients and their breast lesions, and the reference standard used in each of the included studies. Included studies were published between July 2005 and May 2011 in peer-reviewed journals. The mean age of included patients ranged from 39 to 55 years. Overall, our analysis included 29 studies with information on 5,153 patients and 5,511 breast masses. Of the included lesions, 2,065 were malignant and 3,446 were benign. Except for Parajuly et al. [17], which used an acoustic vibration source for strain measurement, the studies used freehand compression elastography probes. All the studies that reported the correlation method for strain measurement had used the combined autocorrelation method (CAM) [3].

Table 1 Baseline characteristics of included studies

Methodological quality of included studies

Appendix Fig. 1 in Supplementary material summarizes the frequency of each study quality indicator across the studies. All studies used appropriate reference standard(s) for verification, and 97% of them explained withdrawals. On the other hand, none of the studies clearly stated whether the reference standard interpretation was performed without the knowledge of index test results.

Diagnostic performance of USE alone

The summary test operating measures for USE alone were a sensitivity of 79% (95% credible interval (CrI), 74–83%) and a specificity of 88% (95% CrI, 82–92%) (Fig. 2; Table 2). The positive and negative likelihood ratios (LR) for USE were 6.71 (95% CrI, 4.60–10.20) and 0.24 (95% CrI, 0.19–0.30), respectively. This translates into a moderately informative test, with which neither exclusion nor confirmation of breast cancer is possible using the test alone [38].

Fig. 2 Forest plot of studies reporting elasticity score shows individual estimated sensitivities and specificities of the studies evaluated in the meta-analysis, as well as pooled values (open diamond), with corresponding 95% credible intervals (in brackets). The broken black line represents the pooled estimates of sensitivity and specificity. FN false-negative, FP false-positive, TN true-negative, TP true-positive, 95% CrI 95% credible interval

Table 2 Diagnostic and clinical performance of elasticity score, B-mode ultrasound and their combination

Figure 3 shows the resulting SROC curve with summary operating points for sensitivity and specificity on the curves. The summary area under the curve was 91% (95% confidence region, 89–93%), compatible with a good test accuracy [55]. The inconsistency index (I²) for heterogeneity was 36% (95% CrI, 26–49%). When assessing I² for sensitivity and specificity separately, the index was 11% (95% CrI, 6–21%) for sensitivity and 29% (95% CrI, 19–44%) for specificity. Funnel plot and linear regression showed no evidence of publication bias.

Fig. 3 Summary receiver operator characteristics curves including a summary operating point for sensitivity and specificity (green diamond) and a 95% confidence region (gray square) for elasticity score (left upper), BIRADS (right upper), conjunctive combination of the two tests (left lower), and disjunctive combination of the two tests (right lower). The individual circles around each study number (observed data) describe the sample size weight of the individual studies. AUC area under curve, SEN sensitivity, SPE specificity

Diagnostic performance of USB alone

The summary estimates of sensitivity and specificity for USB were 96% (95% CrI, 93–98%) and 70% (95% CrI, 55–83%), respectively (Fig. 4; Table 2). The pooled positive and negative LRs for USB alone were 3.10 (95% CrI, 2.12–5.14) and 0.06 (95% CrI, 0.04–0.10), respectively. This means that a negative USB can exclude malignancy within a breast lesion, while a positive USB is only slightly informative and requires additional testing to confirm malignancy [47]. The summary area under the curve for USB was 92% (95% confidence region, 90–94%), compatible with a good test accuracy (Fig. 3) [55]. The I² for heterogeneity was 62% (95% CrI, 49–75%). When assessing I² for sensitivity and specificity separately, the index was 35% (95% CrI, 20–55%) for sensitivity and 51% (95% CrI, 37–67%) for specificity.

Fig. 4 Forest plot of studies shows individual estimated sensitivities and specificities for B-mode ultrasound in the studies evaluated in the meta-analysis, as well as pooled values (open diamond), with corresponding 95% credible intervals (in brackets). The broken black line represents the pooled estimates of sensitivity and specificity. FN false-negative, FP false-positive, TN true-negative, TP true-positive, 95% CrI 95% credible interval

Diagnostic performance of combination of USB and USE

Conjunctive positivity criterion

We further evaluated the diagnostic performance of the combination of USB and USE using the conjunctive positivity criterion. The analysis demonstrated that the summary sensitivity for the combination of the tests was 73% (95% CrI, 67–79%), and the summary specificity was 97% (95% CrI, 95–98%) (Fig. 5; Table 2). The summary positive and negative LRs for the combination of the tests were 26.20 (95% CrI, 16.00–48.68) and 0.28 (95% CrI, 0.22–0.34), respectively, showing that a positive result is strongly capable of confirming the disease, while a negative result is about as informative as USE alone and requires further testing to rule out cancer. The summary area under the curve for the conjunctive combination was 98% (95% confidence region, 96–98%), which translates into better detection performance compared to USE or USB alone (Fig. 3) [55, 56]. The I² for heterogeneity was 42% (95% CrI, 30–58%). When assessing I² for sensitivity and specificity separately, the index was 13% (95% CrI, 7–24%) for sensitivity and 36% (95% CrI, 23–54%) for specificity.

Fig. 5 Forest plot of studies shows individual estimated sensitivities and specificities for conjunctive combination of elastography and B-mode ultrasound of the studies evaluated in the meta-analysis, as well as pooled values (open diamond), with corresponding 95% credible intervals (in brackets). The broken black line represents the pooled estimates of sensitivity and specificity. FN false-negative, FP false-positive, TN true-negative, TP true-positive, 95% CrI 95% credible interval

Disjunctive positivity criterion

Assessing the combination of the two tests using the disjunctive positivity criterion showed that the summary sensitivity, specificity, positive LR and negative LR for the combination of tests were 99% (95% CrI, 98–99%), 59% (95% CrI, 44–68%), 2.41 (95% CrI, 1.79–3.05), and 0.02 (95% CrI, 0.01–0.02), respectively (Fig. 6; Table 2). This can be interpreted as a test that performs similarly to USB alone: a negative test can exclude malignancy, while a positive test has little informational value and requires further tests for confirmation. The summary area under the curve was 91% (95% confidence region, 89–93%) (Fig. 3). The I² for heterogeneity was 45% (95% CrI, 30–94%). When assessing I² for sensitivity and specificity separately, the index was 5% (95% CrI, 0–43%) for sensitivity and 43% (95% CrI, 26–90%) for specificity.

Fig. 6 Forest plot of studies shows individual estimated sensitivities and specificities for disjunctive combination of elastography and B-mode ultrasound of the studies evaluated in the meta-analysis, as well as pooled values (open diamond), with corresponding 95% credible intervals (in brackets). The broken black line represents the pooled estimates of sensitivity and specificity. FN false-negative, FP false-positive, TN true-negative, TP true-positive, 95% CrI 95% credible interval

Comparative effectiveness of USE versus USB

Figure 7 and Appendix Fig. 2 in Supplementary material show the post-test probabilities of USE alone, USB alone and the combination of the two tests across a range of disease prevalence (or pre-test probability) of breast cancer. According to Bayesian statistics, in patients with a low pre-test probability of disease, a high specificity is important [42]. Therefore, the conjunctive combination of the two tests, which has the highest specificity (97%) of the options considered, would be the best strategy to avoid unnecessary biopsies. On the other hand, for patients with a high pre-test probability of disease, a high sensitivity is important, and therefore the disjunctive combination of the two tests (sensitivity of 99%) or USB alone (sensitivity of 96%) would be the best test options.
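The dependence of post-test probability on pre-test probability follows from the odds form of Bayes' theorem: post-test odds equal pre-test odds multiplied by the likelihood ratio of the observed result. A minimal Python sketch, with a hypothetical function name chosen for illustration:

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability.

    Uses the odds form of Bayes' theorem:
    post-test odds = pre-test odds * likelihood ratio.
    """
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio must be non-negative")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)
```

Evaluating this function over a grid of pre-test probabilities, once with a test's positive LR and once with its negative LR, traces out conditional probability curves of the kind described here.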

Fig. 7 Conditional probability curves after a positive and negative test result for US elastography (esscore), B-mode ultrasound (bmode), and their combination using conjunctive positivity criterion (conj), and disjunctive positivity criterion (disj). The horizontal axis shows the pre-test probability of malignancy within a breast lesion, and the vertical axis shows the post-test probability of malignancy

Discussion

This study is the first comparative effectiveness meta-analysis of USE, USB and their combination, covering 5,511 breast masses. Our results demonstrated that ES can improve on USB specificity (70 vs. 88%) at the cost of a drop in test sensitivity (96 vs. 79%). We further demonstrated that in patients with a low risk of disease the conjunctive combination of the two tests would be the most useful test option, while in patients with a high risk of breast cancer a single USB test or the disjunctive combination of the two tests would be the best option.

Conventional USB, palpation and mammography are the three steps routinely performed in the clinic for diagnosing a suspicious breast lesion. However, none of them, alone or in combination, is able to differentiate malignancy, and a biopsy or fine needle aspiration is always needed to confirm the diagnosis. A high percentage of these biopsies are benign [57], yet they may increase patient anxiety and impose a cost burden on the health care system [58]. On the other hand, the risk of a missed malignancy in a non-palpable lesion always remains despite the use of USB and mammography [59]. Therefore, any advance in medical technology that can improve the diagnostic performance of these modalities is encouraged.

Breast USE integrates the diagnostic ability of palpation into an ultrasound instrument with a compressive probe, and reflects tissue stiffness (hardness) and elasticity in response to pressure, even in lesions that are not palpable by hand. The 1–5 point scale ES introduced by Itoh et al. [3] has provided a standardized interpretation of elasticity images, which may then be translated into a 1–5 point scale similar to BIRADS categories.

In this study, our results show that USE, when used alone, improves on the specificity of USB. Compared to USB, a 19% increase in test specificity with ES may come with a 17% decrease in sensitivity, with no overall improvement in test accuracy (area under curve (AUC), 91 vs. 92%) (Fig. 3). Considering that USE is not currently reimbursed and that the overall test performance of USE alone is similar to that of USB, USE alone does not appear sufficiently superior to USB to recommend its clinical use independent of USB.

In evaluating the combination of USB and USE, the current study tested two combinations, the conjunctive combination and the disjunctive combination, as defined above. Our results demonstrate that the conjunctive combination of USE and USB is capable of improving the specificity by 28% (compared to USB alone) at the cost of decreasing the sensitivity by 23%. However, the overall improvement in test accuracy (AUC, 97 vs. 92%), as well as the high positive LR of 26.28, makes the test an ideal option to confirm the disease, resulting in a significant decrease in the number of unnecessary biopsies, especially in low risk patients. In order to avoid unnecessary costs and to achieve the highest test efficiency, we recommend that all low risk patients first undergo USB and that, only if that test is positive, USE be performed as a supplemental modality to assist in the decision to biopsy a lesion or follow it up.

The disjunctive combination of the two tests may result in a slight increase in test sensitivity compared to USB (99 vs. 96%), at the cost of a greater decrease in test specificity (59 vs. 69%). Previous single-site studies (which we used as source publications) that reported the diagnostic performance of the combination of USB and USE [2, 8, 17, 22, 27, 33, 40] used different definitions of what constituted a “combined USB/USE examination” (Table 3). Therefore, the literature shows heterogeneous improvement for a “combined USB/USE examination” compared to either test individually. The two studies [2, 17] that reported a disjunctive combination of the two tests fitting our definition also showed improved sensitivity, similar to our meta-analytic results. None of the prior studies evaluated the conjunctive combination.

Table 3 List of the studies that reported the diagnostic performance of USE and USB

Limitations

Our study has limitations. First, the best cutoff score for determining benign or malignant lesions varied between studies [3, 8, 20, 23, 57]. However, for the purpose of this study we extracted data with a cutoff score of 3 (ES of 4 and 5 considered malignant) for all individual studies. Second, USE performs better in differentiating malignancy within small lesions that are surrounded by a large amount of normal tissue than within large lesions [18]. Scaperrotta et al. and Zhi et al. [21, 28] reported higher sensitivity but lower specificity of USE for lesions less than 1 cm in size. Giuseppetti et al. and Regini et al. [5, 32] reported an improvement in both sensitivity and specificity for lesions less than 2 cm. Since we did not have access to patient level data in the current study, it was not possible to evaluate the performance of USE or its combination with USB by lesion size.

Implications

In summary, the application of USE as a single test is not superior to USB alone. USE improves on the specificity of conventional USB, but the resulting decrease in the number of unnecessary biopsies may come at the cost of an increase in the number of missed cancers, with no change in overall diagnostic accuracy. In low risk patients, however, we recommend that USE be performed following a positive B-mode result. If both USB and USE are positive, the patient should be referred for biopsy. Patients with a positive USB and a negative USE could be evaluated with imaging follow-up, which may serve to decrease the rate of benign biopsies. For high risk patients, we recommend that USB alone, rather than USE alone or the combination of the two, be used to evaluate breast masses. In these patients, if the USB is positive, we recommend further evaluation with biopsy.