Introduction

Ultrasound (US) is widely used to distinguish malignant from benign breast lesions. When compared with other early detection methodologies for breast cancer, such as mammography, magnetic resonance imaging, and biopsy, US provides several advantages, such as high spatial resolution, real-time imaging, rapid frame rate, and low cost. However, US has low specificity in the characterization of breast lesions, and its usefulness varies depending on the skill of the operator. Moreover, conventional US cannot assess tissue stiffness.

Conventional strain elastography is a new technique that has shown effectiveness for detection of malignancy by enabling measurement of tissue deformation in response to compression and displaying tissue stiffness [1, 2]. However, strain elastography is only a qualitative method. Several factors, such as poor reproducibility and high operator dependence, have also been proposed as potential causes of incorrect diagnoses [3].

With the development of elastography, acoustic radiation force impulse (ARFI) technology has emerged to display tissue stiffness via a qualitative gray-scale map (virtual touch tissue imaging, VTI) or a quantitative response (virtual touch tissue quantification, VTQ). In VTI, focused acoustic radiation force ‘pushing’ pulses are used to deform the tissue. The resulting tissue displacement is measured within the focal region of each push within a specified region of interest (ROI), and the distribution of displacement or its normalized values within the ROI is displayed in a gray-scale map [1, 2]. In VTQ, focused acoustic radiation force pushing pulses of short duration (i.e., temporal impulse <1 ms) are used to generate shear waves within an organ of interest, and the speed of the shear waves propagating away from the pushing location can be measured. The result is reported as an average shear wave speed (m/s) within an ROI. More recently, a newer ARFI technique, virtual touch tissue imaging quantification (VTIQ), has emerged and is being used to evaluate breast lesions, adding a new dimension to ARFI [4, 5].

To date, although several studies have investigated the diagnostic efficacy of ARFI technology for distinguishing breast lesions, those studies reported wide ranges of sensitivity (55.2–100 %) and specificity (55.3–97.0 %) [4–18]. In 2013, Li et al. [19] performed a meta-analysis to summarize the diagnostic performance of shear wave elastography (SWE), including ARFI and supersonic shear imaging (SSI), for breast lesion evaluation. Unfortunately, only 447 patients from four studies using ARFI were included in this meta-analysis.

Therefore, we performed a meta-analysis to assess the ability of elastography by ARFI technology to differentiate benign and malignant breast lesions.

Materials and methods

Literature search

PubMed, the Cochrane Library, and the Web of Knowledge were searched for articles published in English before September 2014, using the following search terms: acoustic radiation force impulse/ARFI/point shear wave elastography/shear wave velocity/breast. The last search date was September 24, 2014 without publication year limitation. In addition, all reference lists were checked manually to find more potentially relevant studies.

Inclusion criteria were: (1) publication in English; (2) a study population of at least 10 patients; (3) qualitative and/or quantitative analysis of the characteristics of the breast lesions by ARFI; and (4) availability of the data necessary to calculate the true positive (TP), false positive (FP), true negative (TN), and false negative (FN) diagnostic results for the differentiation of breast lesions. Editorials, letters to the editor, reviews, case reports, and animal experimental studies were excluded. The literature search and assessment were carried out by two researchers (B.X.L. and Y.L.Z.) independently. Discrepancies were resolved by the adjudicating senior author (X.Y.X.).

Quality assessment and data extraction

The methodological quality of each study was independently assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [20].

Data extraction was performed independently by two reviewers (B.X.L. and Q.Y.S.) using a standardized form and mainly covered demographic characteristics, characteristics of lesions, technical protocol, and diagnostic test results. In addition, TP, FN, FP, and TN results were recorded. If a study did not provide the data needed to construct a 2 × 2 contingency table directly, the table was calculated from the diagnostic sensitivity and specificity reported in that study. Any disagreement was resolved by the adjudicating senior author (X.Y.X.).
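The back-calculation of a 2 × 2 table from reported sensitivity and specificity can be illustrated with a minimal Python sketch. The function name and the example counts are hypothetical and are not taken from any included study; rounding to whole lesions is an assumption about how fractional counts would be handled.

```python
def reconstruct_2x2(sensitivity, specificity, n_malignant, n_benign):
    """Back-calculate TP, FN, FP, TN from reported accuracy and lesion counts.

    Assumes the reference standard classified n_malignant lesions as
    malignant and n_benign lesions as benign.
    """
    tp = round(sensitivity * n_malignant)  # malignant lesions called positive
    fn = n_malignant - tp                  # malignant lesions missed
    tn = round(specificity * n_benign)     # benign lesions called negative
    fp = n_benign - tn                     # benign lesions called positive
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Hypothetical study: 40 malignant and 60 benign lesions,
# reported sensitivity 0.90 and specificity 0.85.
table = reconstruct_2x2(0.90, 0.85, 40, 60)
print(table)
```

Because published sensitivities and specificities are themselves rounded, a reconstructed table of this kind may differ from the true counts by one lesion per cell.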

Data analysis

Estimates of summary sensitivity, specificity, positive likelihood ratio (PLR), and negative likelihood ratio (NLR) were calculated using the bivariate mixed-effects regression model. Summary receiver operating characteristic (sROC) curves with 95 % confidence intervals (95 % CI) were constructed to summarize the results quantitatively. Heterogeneity of the included studies was assessed by the likelihood ratio (I²) index. I² values range between 0 and 100 %, where 0 % indicates no observed heterogeneity. An I² index greater than 50 % was considered to indicate substantial heterogeneity beyond chance alone [21]. If heterogeneity existed, meta-regression analysis was performed to explore its potential sources, which included publication year, continent of study origin (Asian versus non-Asian country), sample size, mean age of patients, malignant rate, and the cut-off value of shear wave velocity (SWV) [22].
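For orientation, the general form of the I² index can be written as a short Python sketch. This is the generic definition based on a heterogeneity chi-square statistic Q and its degrees of freedom, not necessarily the exact computation performed by the software used here; the function name and the example values are hypothetical.

```python
def i_squared(q_statistic, degrees_of_freedom):
    """Return I^2 (%) = 100 * (Q - df) / Q, truncated at 0 %."""
    if q_statistic <= 0:
        return 0.0
    return max(0.0, (q_statistic - degrees_of_freedom) / q_statistic) * 100.0


# Hypothetical heterogeneity statistic: Q = 20 on 5 degrees of freedom.
print(i_squared(20, 5))
# A Q smaller than its degrees of freedom gives no observed heterogeneity.
print(i_squared(3, 5))
```

Under this definition, a value above 50 % means that more than half of the observed variability exceeds what chance alone would be expected to produce.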

Publication bias was examined by construction of a funnel plot of the logarithm of the diagnostic odds ratio (lnDOR) versus the inverse of the square root of the effective sample size (1/ESS^(1/2)). Testing for publication bias was conducted by a regression of lnDOR against 1/ESS^(1/2), weighting by ESS, with p < 0.05 for the slope coefficient indicating significant asymmetry [23].
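The weighted regression behind this asymmetry test can be sketched in Python as follows. This is a minimal illustration rather than the MIDAS implementation used for the analysis: the function name, the 0.5 continuity correction for zero cells, and the example counts are assumptions, and the p value for the slope (which requires a t distribution) is omitted.

```python
import math


def deeks_regression(studies):
    """Weighted regression of lnDOR on 1/sqrt(ESS), with ESS as weights.

    studies: iterable of (TP, FP, FN, TN) counts, one tuple per study.
    Returns (intercept, slope); a slope far from zero suggests asymmetry.
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in studies:
        if 0 in (tp, fp, fn, tn):
            # Assumed continuity correction so that lnDOR stays finite.
            tp, fp, fn, tn = (v + 0.5 for v in (tp, fp, fn, tn))
        n_diseased = tp + fn
        n_healthy = fp + tn
        # Effective sample size: 4 * n1 * n2 / (n1 + n2).
        ess = 4 * n_diseased * n_healthy / (n_diseased + n_healthy)
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))
        ws.append(ess)

    # Closed-form weighted least squares for a single predictor.
    total_w = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total_w
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total_w
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical studies with identical accuracy but different sizes:
# the diagnostic odds ratio does not depend on study size, so the slope is ~0.
example = [(9, 1, 1, 9), (18, 2, 2, 18), (36, 4, 4, 36)]
intercept, slope = deeks_regression(example)
print(intercept, slope)
```

In the example, every study has a diagnostic odds ratio of 81, so the fitted line is flat and the intercept equals ln 81; small studies with inflated odds ratios would instead produce a positive slope.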

All statistical analyses were performed using the MIDAS module of Stata, version 12.0 (Stata, College Station, TX, USA). Two-sided p < 0.05 was considered to be statistically significant.

Results

Search results and study characteristics

The described search strategies retrieved a total of 153 studies. Forty-two studies were excluded due to duplication. Of the remaining 111 studies screened by title, abstract, or both, 93 were unrelated to the research subject or were review articles. Therefore, 18 potentially relevant studies were identified for further evaluation. Three of these studies were excluded: one study [24] because it was not a diagnostic article, and two studies [25, 26] due to insufficient data. Finally, 15 studies [4–18] fulfilled our inclusion criteria. A flowchart describing the study selection is shown in Fig. 1.

Fig. 1 Literature search and selection

The main characteristics of the studies included in the meta-analysis are shown in Table 1. These studies included 1873 breast lesions (743 malignant, 1130 benign) in 1720 patients. The overall prevalence of malignant breast lesions was 39.7 % (range 20.87–56.52 %). In total, VTI was used in six studies [7, 8, 12, 14, 17, 18], VTQ in eight [9–14, 16, 18], combined VTI and VTQ in four [6, 14, 15, 18], and VTIQ in three [4, 5, 13]. In the VTI group, only one study used pattern classification to detect breast cancer, whereas five studies investigated the area ratio between the elastic image and the B-mode image. In the VTQ group, the result showing the best diagnostic performance in each study was chosen for analysis. In the combined VTI and VTQ group, three out of four studies combined ARFI with the breast imaging-reporting and data system (BI-RADS). All the data were analyzed. The QUADAS-2 scale showed that the included studies were of acceptable methodological quality (Table 2).

Table 1 Characteristics of the included studies
Table 2 Quality assessment of included studies

Diagnostic accuracy of VTI

Six studies [7, 8, 12, 14, 17, 18] evaluated the diagnostic accuracy of VTI for the differentiation of benign from malignant breast lesions (Table 1). Forest plots of sensitivity and specificity with corresponding 95 % CI from the six eligible studies are shown in Fig. 2a. The summary sensitivity, summary specificity, PLR, and NLR were 0.913 (95 % CI 0.779–0.969), 0.871 (95 % CI 0.773–0.930), 7.071 (95 % CI 4.025–12.420), and 0.100 (95 % CI 0.038–0.261), respectively. The area under sROC (AUROC) was 0.95 (95 % CI 0.93–0.97) (Fig. 3a).
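As a quick consistency check, the likelihood ratios can be recomputed from the summary sensitivity and specificity using the standard definitions PLR = sensitivity / (1 − specificity) and NLR = (1 − sensitivity) / specificity. The Python sketch below is illustrative only; the function name is hypothetical, and small differences from the reported values are expected because the inputs are rounded to three decimals.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (PLR, NLR) from sensitivity and specificity."""
    plr = sensitivity / (1 - specificity)
    nlr = (1 - sensitivity) / specificity
    return plr, nlr


# Summary estimates reported for VTI.
plr, nlr = likelihood_ratios(0.913, 0.871)
print(round(plr, 2), round(nlr, 2))
```

The recomputed values agree with the reported PLR of 7.071 and NLR of 0.100 to within rounding.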

Fig. 2 Forest plots of sensitivity and specificity of VTI (a), VTQ (b), and combined VTI and VTQ (c)

Fig. 3 Summary receiver operating characteristic curves of VTI (a), VTQ (b), and combined VTI and VTQ (c)

The I² of the test results of the six studies was 80.81 % (95 % CI 58.89–100.00 %), indicating that 80.81 % of the variability across studies was attributable to heterogeneity rather than chance. According to meta-regression analysis, continent of study origin (p = 0.01) and malignant rate (p = 0.01) were found to be the most significant causes of heterogeneity. No significant publication bias existed among these studies (p = 0.60).

Diagnostic accuracy of VTQ

The diagnostic performance of VTQ for differentiating between benign and malignant breast lesions was evaluated in eight studies [9–14, 16, 18]. Forest plots of sensitivity and specificity with corresponding 95 % CI are shown in Fig. 2b. The sensitivity, specificity, PLR, and NLR were 0.849 (95 % CI 0.805–0.884), 0.889 (95 % CI 0.771–0.950), 7.634 (95 % CI 3.600–16.190), and 0.170 (95 % CI 0.133–0.217), respectively. The AUROC was 0.88 (95 % CI 0.85–0.91) (Fig. 3b). For the eight eligible studies, the weighted mean cut-off value was 4.4 m/s (range 2.3–9.1 m/s).

Heterogeneity was statistically significant and exceeded that expected by chance (I² = 95.58 %; 95 % CI 92.10–99.05 %). Meta-regression analysis was performed to explore the potential sources of heterogeneity. Only mean size of breast lesions (p < 0.01) was found to be a significant cause of heterogeneity. Significant publication bias existed among these studies (p = 0.025).

Diagnostic accuracy of combined VTI and VTQ

The diagnostic performance of combined VTI and VTQ was evaluated in four studies (Table 1). Forest plots of sensitivity and specificity with corresponding 95 % CI are shown in Fig. 2c. The sensitivity, specificity, PLR, and NLR were 0.935 (95 % CI 0.892–0.961), 0.881 (95 % CI 0.818–0.924), 7.859 (95 % CI 5.024–12.296), and 0.074 (95 % CI 0.044–0.125), respectively. The corresponding AUROC was 0.96 (95 % CI 0.93–0.97) (Fig. 3c).

There was no statistically significant heterogeneity (I² = 0 %; 95 % CI 0–100 %). No significant publication bias existed among these studies (p = 0.44).

Diagnostic accuracy of VTIQ

The diagnostic performance of VTIQ was evaluated in only three studies (Table 1). The obtained sensitivity ranged from 80.4 to 90.3 %, while the specificity ranged from 73.0 to 93.0 %. The summary diagnostic value of VTIQ could not be evaluated due to insufficient data.

Discussion

In this meta-analysis, we evaluated the ability of elastography by ARFI technology, including VTI, VTQ, and combined VTI and VTQ with or without conventional US, to differentiate benign and malignant breast lesions. A summary sensitivity of 0.913 and a summary specificity of 0.871 were obtained for VTI, and a summary sensitivity of 0.849 and a summary specificity of 0.889 were obtained for VTQ. The AUROCs for the diagnosis of malignant breast lesions by VTI alone and VTQ alone were 0.95 and 0.88, respectively. When using combined VTI and VTQ, the AUROC was raised to 0.96. Therefore, ARFI could be considered a reliable tool to classify benign and malignant breast lesions and could be integrated into current imaging protocols.

ARFI showed good diagnostic value for breast lesions in the meta-analysis. ARFI, including VTQ and VTI, can yield additional diagnostic information about tissue stiffness, which is a strong complement to conventional US. The weighted mean cut-off value of SWV in VTQ was 4.4 m/s. Theoretically, VTQ is more independent and objective than VTI. In VTI, tissue stiffness is displayed as a gray-level image. The greater the stiffness, the darker the gray level. This information can be interpreted either subjectively with pattern classification or semi-quantitatively with strain ratio or area ratio. In contrast, VTQ obtains the elastic value by measuring SWV directly, expressed in meters per second (m/s) [27]. VTQ is a truly quantitative technique for measuring tissue stiffness. However, in the meta-analysis, VTQ improved on the specificity of VTI at the cost of a drop in sensitivity, with no overall improvement in AUROC. VTQ still has limitations, such as dependence on the degree of pre-compression [24] and a fixed-size ROI of 5 × 5 mm. Most importantly, if the stiffness of the tissue is beyond the limits of measurement, whether high or low, the SWV is displayed as “x.xx m/s”. This reading is caused by a lack of shear wave generation or by high shear wave attenuation, which means a poor signal-to-noise ratio [9, 24]. In such cases, the exact SWV is difficult to measure with VTQ. Moreover, VTI reflects the stiffness of the whole target nodule, whereas VTQ provides only single-point data from the nodule [28]. Because VTI and VTQ each have their own characteristics, both may be applied in the work-up of breast lesions.

According to our meta-analysis, in the four studies using a combination of VTI and VTQ, the diagnostic performance was improved. However, three out of four studies combined VTI and VTQ with US using BI-RADS in the evaluation of breast lesions. As we know, sonographic BI-RADS scoring presents a high diagnostic value for breast lesions, showing a malignant rate of 17 % in category 4 and 94 % in category 5 [29]. We cannot rule out the help of US in the combined evaluation in the meta-analysis. Thus, the efficacy of ARFI using a combination of VTI and VTQ in the detection of breast cancer should be further evaluated in more studies in the future.

In contrast to VTQ, which provides only single point velocity data, VTIQ software synthesizes information from up to 256 sequential acquisition beam lines inside a two-dimensional user-defined region of interest (ROI) to display a qualitative and quantitative map of shear wave velocities, as well as qualitative maps for shear wave quality, travel time, and tissue displacement [5]. Perhaps VTIQ has added a new dimension to ARFI. Unfortunately, the summary diagnostic value of VTIQ could not be evaluated due to insufficient data.

Significant heterogeneity was found in the present meta-analysis with the exception of the combined VTI and VTQ group. Even in this group, variation cannot be ruled out because of the small number of included studies and the wide confidence intervals. We utilized meta-regression analysis to identify factors that may have caused the observed heterogeneity. Several reasons could explain the heterogeneity. To date, no consensus on the diagnostic criteria for ARFI has been reached in any group. The cut-off value varied from 2.03 to 9.10 m/s, although the cut-off value was not found to contribute heterogeneity to the summary test results. Moreover, one study investigated the diagnostic value of both single-point SWV and two-dimensional SWV, while another study used the interval SWV and boundary SWV to differentiate the breast lesions. In addition, there may be differences between the breast cancers of Asian and non-Asian women, and the results of Yao et al. [9] demonstrated that the sensitivity of VTQ for lesions <10 mm was relatively low. Indeed, continent of study origin and lesion size were identified as potential sources of heterogeneity in the meta-regression analysis. Nevertheless, these factors could not sufficiently explain the heterogeneity between the studies.

With regard to publication bias, Deeks’ funnel plot asymmetry test showed significant asymmetry in the evaluation of VTQ, which suggested the presence of potential publication bias, language bias, and inflated estimates caused by flawed methodological design in smaller studies. Publication bias is a potential limitation of any meta-analysis, because studies with optimistic results may be more likely to be published than studies with unfavorable results, and studies with a large sample size may be more likely to be published than studies with a small sample size [30]. This study was also subject to publication bias because most of the analyzed clinical research was concerned with the efficiency of ARFI, i.e., positive studies. In addition, only studies published in English were included. Therefore, the results should be interpreted cautiously, especially with regard to the VTQ assessment.

This meta-analysis has several limitations. Firstly, we have not summarized the diagnostic value of VTIQ, which is the newest ARFI technique. To date, only a few studies using VTIQ to differentiate breast lesions have been found, and these data are not enough to perform a bivariate mixed-effects regression analysis. Therefore, further meta-analyses focusing on the diagnostic value of SWV, regardless of whether it is obtained by VTQ or by VTIQ, are necessary, and updated analyses are also needed. Secondly, there was significant heterogeneity among the eligible studies in the evaluation of VTI alone and VTQ alone. In addition, there was the possibility of publication bias in the evaluation of VTQ accuracy. Thirdly, the number of studies included in the meta-analysis was relatively small. Fourthly, we cannot exclude the possibility that the literature search was biased because, although the literature search and assessment were carried out by two researchers independently, the adjudicating senior author ultimately resolved the discrepancies. Finally, three studies did not use pathology as the diagnostic reference, although the reference standard was not found to contribute heterogeneity to the summary test results in meta-regression analysis. Therefore, large international studies of ARFI for the classification of breast lesions that satisfy high-quality criteria are awaited.

Conclusion

We performed a meta-analysis to assess the ability of elastography by ARFI technology to differentiate benign and malignant breast lesions. As a noninvasive procedure, elastography by ARFI technology can be used in the work-up of breast lesions with high sensitivity and specificity. Large prospective international multicenter studies in various regions are necessary to further evaluate the potential of ARFI.