Introduction

Dynamic contrast-enhanced breast magnetic resonance (MR) is a promising emerging technique for evaluating breast lesions [1]. Several investigations have demonstrated the potentially higher sensitivity of breast MR imaging relative to mammography or other conventional imaging techniques [1–3]. In breast MR, diagnosis is based on dynamic contrast enhancement, e.g., the increase in signal intensity, the speed and pattern of enhancement, and the degree of regularity of the lesion border [1–3]. Levels of suspicion are reported according to the Breast Imaging Reporting and Data System (BI-RADS) lexicon, with morphology rated on a scale of 0–5 (0, needs additional imaging evaluation; 1, normal enhancement; 2, benign enhancement; 3, probably benign, recommend short-term follow-up; 4, suspicious; 5, highly suggestive of malignancy) [4].

Lesions of uncertain malignant potential include atypical ductal hyperplasia (ADH), atypical lobular hyperplasia (ALH), atypical columnar cell hyperplasia (ACH), lobular carcinoma in situ (LCIS), lobular neoplasia (LN), papillary lesions (PL), radial sclerosing lesions (RSL), fibroepithelial lesions, mucocele-like lesions, and columnar cell lesions [5]. According to some studies, atypical epithelial hyperplasia (ADH, ALH), LCIS, PL, multiple micropapillomas (MMP), and radial scars are considered high-risk breast lesions that predispose toward the future development of non-invasive or invasive breast cancer [6–9]. Dynamic contrast-enhanced breast MR is emerging as a powerful tool for detecting high-grade ductal carcinoma in situ. It has major capabilities for the diagnosis, detection, and monitoring of malignancy. Its benefits include being non-invasive and three-dimensional, allowing visualization of the extent of disease and its angiogenic properties, visualization of lesion heterogeneity, detection of changes in angiogenic properties before morphological alterations, and the potential to predict the overall response either before the start of therapy or early during treatment.

The purpose of this systematic review and meta-analysis was to assess the accuracy of breast MR in patients with suspicious breast lesions.

Materials and methods

Search strategy

A comprehensive search of the PubMed, Cancerlit, Lilacs, Scopus, and Embase databases for relevant publications from January 1985 to August 2010 was conducted. Databases were searched using the following terms, both as text words and, as appropriate, Medical Subject Headings (MeSH) or equivalent subject heading/thesaurus terms: “breast neoplasms”, “breast lesions”, “breast cancer”, “breast tumor”, “mammary neoplasms”, “mammary cancer”, “mammary tumor”, and “magnetic resonance”. These were combined with the MeSH term diagnosis (sensitivity, specificity, false positive, false negative, predictive value, reference value, ROC, likelihood ratio, accuracy, diagnosis, measurement and analysis). This sensitive filter was created by combining three filters for the identification of diagnostic studies via the Boolean operators “OR” and “AND”. The search was limited to human studies but had no language restrictions. In addition, the Cochrane Central Register of Controlled Trials was searched up to July 2010. Reference lists of all retrieved primary diagnostic studies were checked for additional relevant diagnostic studies. Additionally, we checked references of relevant reviews, meta-analyses, guidelines, and commentaries identified in PubMed and Embase. The authors who published the studies were not contacted. The complete search strategy is available on request.

Screening of abstract for eligibility

Abstracts/titles identified from the search were screened by six reviewers (M.I.R., M.I.E., F.R.S., E.P.W., M.E., and D.D.R.). Disagreements about study inclusion or exclusion were initially resolved by consensus, and when this was not possible, they were resolved by arbitration by one reviewer (L.R.M.).

Study selection

We included primary diagnostic studies with prospective or retrospective cohort and cross-sectional designs that evaluated clinically suspected breast lesions (target conditions) by breast MR (index test) according to the American College of Radiology practice guidelines [3].

Patients

Studies were eligible if they included women with mammographic and sonographic abnormalities (BI-RADS 3–5) in whom physical examination findings were inconclusive [3]. We excluded studies with patients who (a) were known carriers of a BRCA 1 or BRCA 2 mutation, (b) had a first-degree family member known to carry a BRCA 1 or BRCA 2 mutation but did not know their own status, (c) had a personal or family history highly suggestive of BRCA 1 or BRCA 2 mutation, (d) had a 15–20% lifetime risk for breast cancer on the basis of a personal history of breast or ovarian cancer, (e) had biopsy-proven lobular neoplasia or ADH, and/or (f) had received radiation therapy to the chest before the age of 30 years and at least 8 years previously.

Index test

The diagnostic test was performed only with high-field strength using a commercially available 0.5 or 3 T system with a dedicated surface breast coil. Breast MR was performed using a phased-array coil and T1-weighting for the transverse plane and T2-weighting for the transverse and sagittal planes. The increase in signal intensity of the solid components was evaluated on dynamic contrast-enhanced breast MR images when available. In the absence of dynamic contrast enhancement, the uptake of the contrast medium by solid tissues was evaluated by comparing pre- and post-contrast breast MR images. The following signs were judged as indicators of malignancy in the dynamic criteria: (a) fast wash-in, (b) plateau phenomenon, (c) wash-out phenomenon, (d) blooming sign, (e) inhomogeneous enhancement, and (f) centripetal enhancement. In the morphological criteria, the indicators of malignancy were as follows: (a) unsharp borders of the lesion on T1-weighted images, (b) iso- or hypointensity on T1-weighted images, (c) adjacent vessel sign (AVS), (d) cutaneous thickening, (e) iso- or hypointensity on T2-weighted images, (f) perifocal, diffuse, unilateral, or bilateral edema, (g) connection of the lesion to the M. pectoralis, and (h) lymph nodes larger than 1 cm. The results of the index test were compared to the results of a reference standard.

Reference standard

The diagnostic reference test was the result of the histological analysis of standard paraffin-embedded sections after surgery. The breast MR diagnosis was considered correct if it did not differ from that derived from the paraffin section. For inclusion in this systematic review, the final histological diagnoses of the breast lesions had to include at least two of the following diagnoses as reference standards: benign, high-risk, or malignant breast lesion. We excluded studies presenting exclusively benign lesions, exclusively high-risk lesions, or exclusively breast cancer as the reference standard. Breast lesions were grouped into high-risk lesions and breast cancer and compared with benign breast lesions. We grouped high-risk breast lesions with malignant lesions because they are considered to predispose the individual to the future development of non-invasive or invasive breast cancer [6–8]. Thus, the primary outcome analyzed was the accuracy of breast lesion diagnoses (high-risk lesions or breast cancer vs. benign lesions) by breast MR. A secondary outcome was the distribution of histological types of breast lesions according to paraffin section diagnosis (benign, high-risk, and breast cancer).

Data collection and quality assessment

We extracted data on studies, patients, and test characteristics by using a standardized form. Two reviewers (F.R.S., C.S.D.) independently abstracted data regarding the prevalence of benign, high-risk, and malignant breast lesions. They also calculated the sensitivities, specificities, positive and negative likelihood ratios, and positive and negative post-test probabilities from the primary studies of breast MR diagnoses. Studies that lacked the data needed to construct 2×2 contingency tables were excluded. The assessment of non-English articles was performed independently (D.D.R.) following translation (when necessary). Any disagreement was resolved by consensus for studies published in all languages. Final inclusion or exclusion was made with reference to a selection criteria checklist. Disagreements about study inclusion or exclusion were initially resolved by consensus, and when this was not possible, they were resolved by arbitration by one reviewer (L.R.M.). The agreement statistics among reviewers were computed.
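The per-study calculations described above can be sketched as follows. This is a minimal illustration rather than the reviewers' actual procedure: the class name `TwoByTwo` and the example counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from one study: breast MR (index test) against histology."""
    tp: int  # MR positive, histology high-risk or malignant
    fp: int  # MR positive, histology benign
    fn: int  # MR negative, histology high-risk or malignant
    tn: int  # MR negative, histology benign

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def lr_positive(self) -> float:
        # Positive likelihood ratio: sensitivity / (1 - specificity)
        return self.sensitivity() / (1.0 - self.specificity())

    def lr_negative(self) -> float:
        # Negative likelihood ratio: (1 - sensitivity) / specificity
        return (1.0 - self.sensitivity()) / self.specificity()


# Hypothetical counts chosen only for illustration.
example = TwoByTwo(tp=90, fp=25, fn=10, tn=75)
print(round(example.sensitivity(), 2), round(example.specificity(), 2))
print(round(example.lr_positive(), 2), round(example.lr_negative(), 2))
```

A study reporting too little to fill all four cells cannot be passed through these functions, which is why such studies were excluded.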

Methodological quality assessment of diagnostic accuracy studies was performed according to QUADAS criteria, as modified for use by the Cochrane Collaboration (items that related to reporting quality were removed) [10, 11]. The modified version consists of 11 items on study characteristics with the potential to introduce bias. Items were scored as positive (no bias), negative (potential bias), or insufficient information [10, 11]. This involved scrutinizing the studies’ designs, such as the methods of data collection, the relevant features of the patient population/selection, a description of the breast MR test and the histological reference standard, and the presence of verification biases [10–14].

Data synthesis and statistical analysis

To evaluate the agreement between assessments of methodological quality, as well as between the results of breast MR and paraffin-embedded section analyses, the observed percentage of agreement and the κ coefficient for interrater reliability were calculated [13]. For each study, 2×2 contingency tables were constructed in which all biopsies were classified as benign lesions, high-risk breast lesions, or breast cancer. We calculated the true-positive rate (TPR; sensitivity), specificity, false-positive rate (FPR; 1 − specificity), and likelihood ratios (LRs). When a 2×2 table contained a zero cell, a correction was applied to the calculations, and when a study contained two cells with the value 0, it was excluded from the analysis [14]. Bivariate analysis was used to calculate pooled estimates of sensitivity, specificity, and LRs along with 95% confidence intervals (CIs) for the summary estimates [15, 16]. The bivariate model preserves the two-dimensional nature of diagnostic data by analyzing the logit-transformed sensitivity and specificity of each study in a single model, and takes into account both within-study and between-study variability [15–20]. The model produces the following results: a random-effects estimate of the mean sensitivity and specificity with corresponding 95% CIs, the amount of between-study variation for sensitivity and specificity separately, and the strength and shape of the correlation between sensitivity and specificity. Pooled estimates were calculated only for studies showing sufficient clinical and statistical homogeneity. Although the I² statistic and the Q test are commonly used in meta-analysis, they are not recommended for assessing statistical homogeneity in diagnostic reviews because they do not take into account the association between sensitivity and specificity [21]. Statistical homogeneity was therefore defined as overlapping 95% CIs of both sensitivity and specificity and differences in point estimates among the studies of less than 20%.
When assessing heterogeneity, we always considered sensitivity and specificity simultaneously [22, 23]. A summary receiver operating characteristic (SROC) curve was generated using data from all thresholds [16, 18, 19, 24]. Test performance can vary with the threshold used to define an abnormal examination, resulting in the expected trade-off between sensitivity and specificity. The SROC is an excellent graphical summary, but for comparison purposes, we calculated a further statistic, Q* [16]. Q* is the point on the SROC where sensitivity and specificity are equal. Like the area under the curve (AUC), the Q* point indicates how closely a test approaches the desirable performance of 100% sensitivity and specificity [16]. The higher the Q* value, the better the diagnostic test performance [16]. Sensitivity analyses were performed to assess the effect of verification bias by calculating pooled estimates of diagnostic performance after excluding studies published before the year 2000. The statistical analysis was performed with Meta-DiSc® version 1.4 (Clinical Biostatistics Unit, Ramón y Cajal Hospital, Madrid, Spain) [25], RevMan 5.0.21 (The Nordic Cochrane Centre, Copenhagen, Denmark) [26], and STATA version 11 [27].
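The zero-cell rule and the homogeneity definition given in this section can be written down compactly. The sketch below rests on three assumptions that are not stated in the text: the correction adds 0.5 to every cell (a common convention), "overlapping" means that all intervals share a common region, and "less than 20%" refers to an absolute difference in proportions. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Counts = Tuple[float, float, float, float]  # (tp, fp, fn, tn)
Interval = Tuple[float, float, float]       # (point estimate, CI lower, CI upper)


def handle_zero_cells(counts: Counts, correction: float = 0.5) -> Optional[Counts]:
    """Return usable counts, or None when the study should be excluded.

    Assumption: one zero cell triggers adding `correction` to every cell;
    two or more zero cells lead to exclusion.
    """
    zeros = sum(1 for c in counts if c == 0)
    if zeros >= 2:
        return None
    if zeros == 1:
        return tuple(c + correction for c in counts)
    return counts


def is_homogeneous(estimates: List[Interval], max_spread: float = 0.20) -> bool:
    """Check overlapping 95% CIs and point estimates differing by < 20%."""
    points = [p for p, _, _ in estimates]
    # All intervals share a common region when the largest lower bound
    # does not exceed the smallest upper bound.
    overlap = max(lo for _, lo, _ in estimates) <= min(hi for _, _, hi in estimates)
    return overlap and (max(points) - min(points)) < max_spread


# Hypothetical sensitivities (estimate, lower, upper) from two studies.
sens = [(0.90, 0.85, 0.95), (0.88, 0.80, 0.93)]
print(handle_zero_cells((10, 0, 5, 20)), is_homogeneous(sens))
```

Because heterogeneity was always judged on sensitivity and specificity together, `is_homogeneous` would be applied to both sets of estimates, and pooling would proceed only if both checks pass.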

Results

Study identification and eligibility

The process of study selection is summarized in Fig. 1. We identified 5,054 citations from electronic searches in PubMed, and 2,017 additional citations from EMBASE. After initial evaluation, 355 full articles were retrieved, 69 of which were finally considered eligible for the review [8, 28–95]. A complete list of the excluded studies is available from the authors. Sixty-nine primary studies, including 9,298 women with 9,884 breast lesions, met the criteria for inclusion and were analyzed (Table 1) [8, 28–95]. Interrater overall agreement for study eligibility and methodological quality was 88% (κ = 0.67), indicating good agreement. Disagreement between reviewers related to inclusion or exclusion criteria occurred during analysis of the 69 studies. This disagreement was resolved by consensus.
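To show how an observed agreement of 88% can correspond to κ = 0.67, the sketch below computes Cohen's κ for two raters. The table is hypothetical: it was chosen only to reproduce numbers of this magnitude and does not represent the reviewers' actual decisions, and the review does not state how κ was extended to more than two reviewers.

```python
from typing import Sequence


def cohen_kappa(table: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa for a square agreement table (rows: rater A, columns: rater B)."""
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / total
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (observed - expected) / (1.0 - expected)


# Hypothetical include/exclude decisions by two reviewers on 100 records:
# rows = reviewer A (include, exclude), columns = reviewer B (include, exclude).
decisions = [[18, 6], [6, 70]]
observed_agreement = (18 + 70) / 100
kappa = cohen_kappa(decisions)
print(observed_agreement, round(kappa, 2))
```

κ is lower than the raw agreement because most records are excluded by both reviewers, so a large share of the agreement would be expected by chance alone.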

Fig. 1 Study selection process

Table 1 Characteristics of primary diagnostic studies on diagnosing breast lesions

Descriptions of studies

Details of the participants, reference standard, and index test are summarized in Table 1. The mean age of participants was reported in 59 studies [8, 28, 30, 32, 34–36, 38, 39, 41, 43, 45, 46, 49–72, 75–89, 91–94]. Forty-three studies were prospective and consecutive with a small population, but included sufficient experimental details, proper diagnostic tests, and diagnostic reference standards [8, 28–34, 37–45, 51, 52, 55–59, 62, 63, 65, 70, 73, 75–77, 79, 82–84, 86–95]. The results of quality assessment are presented in Fig. 2 according to QUADAS [10]. On average, the reviewers disagreed on 3 of 11 items (range, 0–6). All disagreements were resolved by consensus. The most common source of bias was results that were difficult to interpret. Patient withdrawals were explained in only 10% of studies, and we were unable to determine whether interpretation was done in a blinded manner in 90% of the studies included.

Fig. 2 Quality assessment by QUADAS [10]

Breast cancer was found in 5,751 cases (58%) and 4,133 (42%) cases were benign lesions. Table 2 shows the results of contingency tables (TP, FP, FN, and TN) from each study considered in the systematic review.

Table 2 Contingency table

Diagnostic performance and summary of results

Interrater overall agreement between breast MR and paraffin sections was 79% (κ = 0.55), indicating moderate agreement (Table 2). Pooled sensitivity and specificity were calculated by bivariate analysis because the studies were homogeneous (patients, breast lesions) and point estimates differed by less than 20% (Table 3). Pooled sensitivity was 90% (95% CI 88–92%) and specificity was 75% (95% CI 70–79%). The pooled positive likelihood ratio was 3.64 (95% CI 3.0–4.2) and the negative likelihood ratio was 0.12 (95% CI 0.09–0.15). For breast cancer or high-risk lesions versus benign lesions, the area under the curve (AUC) was 0.91 for breast MR, and the point Q* (summary point) was 0.84 (Fig. 3) (Q* is the point on the SROC where sensitivity and specificity are equal). The AUC and point Q* for the ROC curve were estimated by the trapezoidal rule (STATA®; version 11) [27].
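The pooled likelihood ratios can be turned into post-test probabilities through the odds form of Bayes' theorem. The sketch below is illustrative only: it assumes the proportion of breast cancer among the analyzed lesions (58%) as the pre-test probability, which will not apply to every clinical setting, and the function name is hypothetical.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Assumption: pre-test probability equals the 58% prevalence in this review.
PREVALENCE = 0.58
after_positive_mr = post_test_probability(PREVALENCE, 3.64)  # pooled LR+
after_negative_mr = post_test_probability(PREVALENCE, 0.12)  # pooled LR-
print(round(after_positive_mr, 2), round(after_negative_mr, 2))
```

Under this assumption, a positive breast MR raises the probability of a high-risk or malignant lesion from 58% to roughly 83%, and a negative breast MR lowers it to roughly 14%.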

Fig. 3 Summary ROC curve with summary Q* point
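The relationship between Q* and the AUC can be illustrated under a simplifying assumption that is not made in the review itself: a symmetric SROC curve of the Moses-Littenberg form D = a + bS with b = 0, i.e., a constant diagnostic odds ratio (DOR). The review estimated the AUC by the trapezoidal rule, so the closed-form expression below is a rough consistency check rather than a reproduction of that calculation, and the function names are hypothetical.

```python
import math


def q_star_from_intercept(a: float) -> float:
    """Q* for the line D = a + b*S: the point where S = 0, so logit(TPR) = a / 2."""
    return 1.0 / (1.0 + math.exp(-a / 2.0))


def symmetric_sroc_auc(dor: float) -> float:
    """Area under the curve TPR = DOR*FPR / (1 + (DOR - 1)*FPR)."""
    if dor == 1.0:
        return 0.5  # uninformative test
    return dor / (dor - 1.0) ** 2 * ((dor - 1.0) - math.log(dor))


# At Q*, sensitivity equals specificity, so DOR = (Q* / (1 - Q*)) ** 2.
q_star = 0.84
dor = (q_star / (1.0 - q_star)) ** 2
intercept = math.log(dor)  # with b = 0, the intercept a is ln(DOR)
auc = symmetric_sroc_auc(dor)
print(round(q_star_from_intercept(intercept), 2), round(auc, 2))
```

Under the symmetric-curve assumption, a Q* of 0.84 corresponds to an AUC of about 0.91, which is in line with the reported value.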

Table 3 Diagnostic performance of magnetic resonance mammary test

Sensitivity analysis

The robustness of the results was tested by repeating the analysis after excluding studies published before the year 2000. We found 42 studies meeting this exclusion criterion and performed the same bivariate analysis using the revised group of studies. Pooled sensitivity was 90% (95% CI 85–92%) and specificity was 73% (95% CI 67–79%). The pooled positive likelihood ratio was 3.4 (95% CI 2.7–4.2) and the negative likelihood ratio was 0.14 (95% CI 0.10–0.19). Excluding the 42 studies published before the year 2000 from the pooling of sensitivity, specificity, and likelihood ratios did not alter the accuracy of breast MR diagnoses of benign breast lesions, high-risk breast lesions, or breast cancer. Therefore, all 69 studies were retained in the analysis [8, 28–95].

Discussion

The results of this meta-analysis show that breast MR is an accurate, non-invasive test for identifying breast cancer with high sensitivity and specificity (90 and 75%, respectively), both with narrow confidence intervals. Breast MR has been shown to be the best imaging modality for determining the extent of index lesions in the breast before surgery [75]. Breast MR may reveal unsuspected multifocal, multicentric, or contralateral breast carcinoma and may result in therapy changes [46]. Therefore, breast MR is a useful preoperative test for prediction of the benign or malignant nature of breast lesions.

We identified four systematic reviews evaluating breast MR in the diagnosis of breast cancer [96–99]. Only one meta-analysis has used bivariate analysis and a hierarchical summary ROC approach; it included studies that analyzed patients with suspect lesions on mammography (BI-RADS 4 or 5) and studies that used an MR system with a field strength of at least 1.5 T [99]. We consider our meta-analysis of breast MR in women with suspicious lesions of the breast to be an update and extension of Peters et al.’s meta-analysis [99], which retrieved 40 studies. We identified almost all of these studies as well as 29 more. When pooling sensitivities and specificities according to a random-effects model, Peters et al. [99] found sensitivity and specificity of 90 and 72%, respectively. In comparison, our bivariate analysis yielded sensitivity and specificity of 90 and 75%, respectively [99]. In the SROC, we found an AUC of 0.91 for breast MR, and the point Q* was 0.84 (Q* is the point on the SROC where sensitivity and specificity are equal). This showed the discriminatory power of breast MR for suspicious lesions of the breast.

The potentially limited specificity of breast MR imaging has been attributed to the fact that, in addition to cancer, many benign lesions as well as presumably normal breast tissue may enhance after administration of contrast material [51]. Enhancement has been seen in many benign lesions, including fibroadenomas, proliferative and non-proliferative fibrocystic changes, and mastitis [51]. Similarly, diagnoses of high-risk breast lesions such as radial scars, atypical ductal hyperplasia, and lobular carcinoma in situ may be suggested by breast MR [5]. Therefore, breast MR may be able to assess the malignant potential of borderline lesions, thus identifying those women for whom a surgical biopsy can be avoided [5]. Our systematic review found false-positive results in 10.8% of the lesions, most of them being fibroadenomas and high-risk breast lesions. The screening of high-risk women by breast MR has high sensitivity for detecting breast cancer compared to conventional techniques. Breast MR showed 81% sensitivity, compared with mammography (40%) and ultrasonography (43%) [2]. Annual breast MR screening should be offered to patients with the following characteristics: BRCA1, BRCA2, and TP53 mutation carriers; women at 50% risk for a BRCA1, BRCA2, or TP53 mutation that runs in their family; women who have had previous mantle radiotherapy before age 30; and women who have been diagnosed and treated for breast cancer (5.3%) or developed synchronous or metachronous contralateral breast cancer. In BRCA mutation carriers, the risk of contralateral disease was reported to be 29% at 10 years and 40% overall [2].

Current indications for breast MR include the following: invasive carcinoma and ductal carcinoma in situ, to determine the extent of disease and the presence of multifocality and multicentricity; evaluation of breast carcinoma prior to surgical treatment, to define the relationship of the tumor to the fascia and its extension into the pectoralis major, serratus anterior, and/or intercostal muscles; evaluation of residual disease in post-lumpectomy patients whose pathology specimens demonstrate close or positive margins for residual disease; and assessment of response to neoadjuvant chemotherapy and the extent of residual disease prior to surgical treatment [2]. In patients who are potential lumpectomy candidates, determining the extent of disease is important in surgical planning, as the goal is to completely excise the tumor with clean margins with as few surgical procedures as possible [101]. Breast MR has been beneficial in improving surgical planning when assessing for extent of disease, and, therefore, may be a valuable tool in this field of medicine [100].

Strengths and weaknesses of our review

We extracted and reconstructed diagnostic data collected from cross-sectional, retrospective, and prospective studies. The methodological quality of included studies was very high, and the main problems were non-reporting of blinding status for the index test, difficult-to-interpret results, and non-explained withdrawals [10]. Bivariate analysis was used to calculate pooled estimates of sensitivity, specificity, and likelihood ratios (LRs) along with 95% confidence intervals (CIs) for the summary estimates [15, 16]. The bivariate model preserves the two-dimensional nature of diagnostic data and takes into account both within-study and between-study variability [15–20]. We adhered to the most recent guidelines for conducting diagnostic reviews, as described in the Cochrane Diagnostic Reviewers’ Handbook [10, 11]. We used an extensive search strategy, but by using a methodological filter, we may have missed some relevant publications. We did not use a language restriction. There were quite a few discrepancies in the phase of abstract selection and in the agreement on some items of QUADAS. All articles were evaluated by six reviewers independently for methodological quality, and consensus was reached by discussing disagreements on individual scores. We defined statistical homogeneity as overlapping 95% CIs combined with less than 20% variation between point estimates. The population of interest was adults presenting with breast tumors (benign and malignant) and high-risk lesions.

Recommendations

Diagnostic tests used as first-line investigations in primary care need to be valid, easy to perform, well tolerated by patients, and sensitive, especially in the setting of serious diseases. Our systematic review shows that breast MR has excellent sensitivity for suspicious breast lesions and could help doctors make decisions about surgical intervention for diagnosis. In future research, cancer location and stage of disease should be important factors in analysis, especially as tests that are able to diagnose early stages of breast cancer become increasingly used as tools to reduce the burden of the disease.