Introduction

Neoadjuvant chemotherapy (NAC) has been widely used in patients with breast cancer over the last decade. NAC can reduce tumor size before surgery, as well as axillary nodal involvement [1,2,3,4,5,6,7]. This increases the likelihood of successful breast-conserving surgery with minimal axillary nodal excision, and survival outcomes are improved in patients who achieve a complete response [1,2,3,4,5,6,7]. Avoiding a total mastectomy may also improve patients' quality of life.

Presently, histopathologic evaluation remains the gold standard for assessing pathologic complete response (pCR) after NAC. Consequently, patients must undergo surgery after NAC to determine the presence or absence of residual cancer cells. Moreover, definitions of pCR vary across the literature, for example in whether axillary nodal pCR is required in addition to pCR in the breast. Nevertheless, research has demonstrated that patients who achieve pCR after NAC have a significantly better prognosis and improved long-term outcomes compared with partial responders or nonresponders [2, 4, 6,7,8]. Furthermore, there is rising interest in determining the extent of residual disease after NAC, since protocols that evaluate radiation alone after NAC in cases of optimal tumor response are now available. Therefore, accurately determining tumor response to NAC is vital to patient outcomes.

It has been noted in recent years that breast imaging may be able to predict pCR. A future goal is to spare patients surgery when pCR can be reliably predicted by less invasive procedures, such as imaging. Various imaging modalities, including mammography, magnetic resonance imaging (MRI), combined positron emission tomography/computed tomography (PET/CT), and ultrasonography (US), have been used to evaluate response to NAC in breast cancer patients. Of these, MRI and nuclear imaging appear to be the most accurate, as they correlate best with pathologic breast tumor size [1,2,3, 7, 9,10,11,12].

Mammography and US are presently the most widely used imaging methods at initial diagnosis [12]. Although these modalities are sufficient to estimate primary tumor size, they are inadequate for assessing pCR after NAC, because they rely on detecting macroscopic changes in tumor size [12]. Since most NAC regimens induce an angiogenic response that precedes the reduction in tumor size, changes in tumor vasculature and metabolism may be more effective measures of treatment response. Because mammography and US cannot identify these changes, MRI and nuclear imaging may play an invaluable role in predicting pCR [12]. In addition, studies have demonstrated that MRI is superior to mammography and US for monitoring response to NAC [12,13,14,15,16,17,18]. Lastly, MRI appears to be better at identifying particular tumor phenotypes, specifically lobular, multifocal, and multicentric tumors [12].

While MRI has multiple advantages, it also has shortcomings. MRI equipment is expensive to purchase, maintain, and operate, and it is less widely accessible than other imaging modalities. Because patients must remain still for a prolonged period, movement during scanning may degrade image quality. MRI may also be particularly difficult for patients with claustrophobia. Finally, the high sensitivity of MRI can lead to many false-positive results and unnecessary biopsies, which cause patient anxiety and may delay surgery [3, 11].

Although much research has compared imaging modalities, findings are inconsistent across studies because of differences in the breast cancer molecular subtypes studied and the NAC regimens used. To the best of our knowledge, no published review has specifically assessed the performance of MRI in detecting pCR after NAC in different tumor subtypes. Our review aimed to collate findings from various studies to evaluate MRI performance in distinct tumor subtypes. Although surgery is currently considered indispensable in the treatment of breast cancer, there is hope that if pCR can be predicted with imaging such as MRI, surgery may be avoided, improving quality of life and patient outcomes.

Methods

The literature search

We searched PubMed, EMBASE, and the Cochrane Library (CENTRAL) to identify eligible articles published from March 2013 to March 2018. Only the most recent 5 years were included, to reflect improvements in MRI technique and NAC. The following search terms were used in PubMed: (“Imaging” OR “Diagnostic Imaging” OR “Magnetic Resonance Imaging” OR “Positron Emission Tomography” OR “Ultrasonography”) AND (“Breast Cancer” OR “Breast Neoplasms”) AND (“Neoadjuvant Chemotherapy”). The following keywords were used in the Cochrane Library and EMBASE: breast cancer OR breast neoplasm AND imaging AND response AND chemotherapy. The search was expanded to include all major imaging methods in order to capture studies that compared MRI with other modalities. In addition, previous reviews were manually screened for additional eligible studies published in the past 5 years, to capture the most recent findings.

Selection criteria

Two reviewers independently conducted the first screen using the following exclusion criteria: (i) publication in a language other than English, (ii) MRI not being the primary imaging modality, or lack of relevance, and (iii) duplication across databases. The reviewers then screened the abstracts of the remaining studies to determine their relevance. After this second screen, the remaining articles were read in full, and studies without molecular subtype data or that did not report sensitivity/specificity or positive/negative predictive value (PPV/NPV) were excluded. Articles that focused on MRI prediction of axillary lymph node involvement, and articles that used clinical complete response (cCR) as the primary endpoint, were also excluded.

Data extraction

For each article, the following items were extracted: author, country, mean patient age, sample size, study design (retrospective or prospective), magnetic field strength, contrast dose, use of contrast-enhanced (CE) or diffusion-weighted imaging (DWI), pCR definition, NAC regimen, whether response to NAC was monitored with imaging, and timing of surgery. Because of the complexity and variability of NAC regimens across studies, one reviewer coded each regimen; however, no analysis was conducted using the NAC regimens. pCR definitions also varied between studies, so one reviewer coded them into four categories: absence of residual invasive cells with residual ductal carcinoma in situ (DCIS) permitted was coded as “absent or DCIS”; absence of both invasive cells and DCIS was coded as “absent”; presence of a small number of scattered invasive tumor cells was coded as “scattered invasive tumor cells”; and presence of residual cancer cells, or near pCR, was coded as “scattered cancer cells.” In addition, pCR rate, sample size, NPV, PPV, sensitivity (se), and specificity (sp) were extracted for each molecular subtype. Pooling NPV, PPV, and se/sp would have been the preferred way to compare MRI performance between subtypes; however, because of large heterogeneity across the included studies, pooling was inappropriate.
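The four coded pCR categories can be read as thresholds on a single ordinal scale of how much residual disease each definition still accepts as pCR. The Python sketch below illustrates that reading under stated assumptions: the `Residual` scale, the ordering of the categories, the helper names `is_pcr` and `pcr_rate`, and the ten-patient cohort are all hypothetical and are not taken from the included studies. It shows that, for a fixed cohort, a less stringent definition classifies the same number of patients or more as pCR.

```python
from enum import IntEnum


class Residual(IntEnum):
    """Hypothetical ordinal scale of post-NAC residual disease (assumption)."""
    NONE = 0                # no invasive cells and no DCIS
    DCIS_ONLY = 1           # in situ carcinoma only
    SCATTERED_INVASIVE = 2  # small number of scattered invasive tumor cells
    NEAR_PCR = 3            # residual cancer cells / near pCR
    GROSS = 4               # more residual disease: never counted as pCR


# Each coded definition mapped to the worst residual finding it still accepts
# as pCR. This ordering is an illustrative assumption, not a published scale.
PCR_DEFINITIONS = {
    "absent": Residual.NONE,
    "absent or DCIS": Residual.DCIS_ONLY,
    "scattered invasive tumor cells": Residual.SCATTERED_INVASIVE,
    "scattered cancer cells": Residual.NEAR_PCR,
}


def is_pcr(finding: Residual, definition: str) -> bool:
    """True if `finding` counts as pCR under the named coded definition."""
    return finding <= PCR_DEFINITIONS[definition]


def pcr_rate(findings: list[Residual], definition: str) -> float:
    """Fraction of a cohort classified as pCR under one definition."""
    return sum(is_pcr(f, definition) for f in findings) / len(findings)


# Hypothetical cohort of 10 patients (not data from any included study).
cohort = (
    [Residual.NONE] * 2
    + [Residual.DCIS_ONLY] * 2
    + [Residual.SCATTERED_INVASIVE, Residual.NEAR_PCR]
    + [Residual.GROSS] * 4
)

for name in PCR_DEFINITIONS:
    print(f"{name:32s} {pcr_rate(cohort, name):.0%}")
```

Encoding the definitions as thresholds makes the comparison explicit, but it is a simplification: the published definitions do not all fit one ordering (for example, some also require nodal pCR).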

Results

As shown in Fig. 1, of the 510 articles identified through a systematic computerized search, 47 were selected for full-text reading. Of these 47, 37 were excluded, leaving 10 articles for this review. To the best of our knowledge, no published review has focused on MRI performance stratified by molecular subtype (Table S-1).

Fig. 1 Flowchart of the literature search

Study characteristics

Across the 10 studies, study population sizes ranged from 35 to 746 patients, with a mean overall age of 48.7 years. All studies used MRI, either alone or alongside other imaging modalities such as PET/CT or US. As shown in Table 1, five studies were conducted in Japan, three in the Netherlands, one in South Korea, and one in the USA. Seven studies enrolled patients retrospectively and three prospectively. Where available, magnetic field strength (T) was also recorded: three studies used 3.0 T exclusively, three used 1.5 T exclusively, and two used a combination of 1.5 T and 3.0 T without specifying how patients were distributed between them. All studies except two, which did not specify, used contrast-enhanced MRI; we therefore could not analyze the data by this variable.

Five studies defined pCR as the absence of residual invasive cancer, with residual in situ carcinoma permitted [19,20,21,22,23]. Two studies used the most stringent definition: absence of tumor cells in the breast, with resolution of both invasive disease and ductal carcinoma in situ [10, 24]. Only two studies incorporated lymph node status into their pCR definition [20, 24], and one study did not define pCR [25]. Four studies administered different NAC regimens according to breast cancer subtype [5, 10, 19, 22]. No study administered endocrine therapy. Three studies enrolled patients as part of clinical trials evaluating the efficacy of different NAC regimens [10, 25, 26]. Only one study monitored tumor response to NAC with MRI and adjusted the course of chemotherapy accordingly [5]. Two studies reported that MRI scans were reviewed locally [10, 25]; the others did not specify. Similarly, three studies described central pathology review [21, 25, 26], while the others did not specify whether review was local or central. When reported, the timing of surgery varied widely across studies. Given this heterogeneity, future studies should standardize definitions and primary endpoints to produce clinically meaningful results.

Table 1 Key characteristics of included studies

Triple negative

Eight studies reported MRI performance in patients with triple negative breast cancer. Sample sizes ranged from 24 to 176 patients, and pCR rates from 20.4 to 56.4% (Table 2). Table 2a shows the five studies that reported NPV (58–100%) and PPV (57.6–94.7%), and Table 2b the six studies that reported sensitivity (45.5–100%) and specificity (49–94.4%).

Table 2 Performance of MR imaging on triple negative breast cancer as evaluated by (a) NPV and PPV, (b) sensitivity and specificity
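PPV, NPV, sensitivity, and specificity, as reported in Tables 2–4, all derive from the same 2×2 cross-tabulation of MRI findings against pathology. The Python sketch below shows these relationships; the class and method names are hypothetical, the counts are invented for illustration, and the orientation (a "positive" MRI and a "positive" reference both meaning residual disease, i.e. non-pCR) is an assumption, since studies differ in which outcome they treat as positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """MRI vs. pathology counts for one subtype (hypothetical numbers).

    Assumed orientation: a 'positive' MRI shows residual enhancement, and the
    reference 'positive' is residual disease at pathology (non-pCR).
    """
    tp: int  # MRI residual, pathology residual
    fp: int  # MRI residual, pathology pCR
    fn: int  # MRI no residual (radiologic CR), pathology residual
    tn: int  # MRI no residual, pathology pCR

    def sensitivity(self) -> float:
        # Share of pathologic residual disease that MRI detected.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of pathologic pCR cases that MRI called free of residual disease.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Probability of residual disease given residual enhancement on MRI.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Probability of pCR given no residual enhancement on MRI.
        return self.tn / (self.tn + self.fn)

    def pcr_rate(self) -> float:
        # Share of the whole cohort with pathologic complete response.
        return (self.fp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


table = TwoByTwo(tp=30, fp=8, fn=5, tn=17)
print(f"se={table.sensitivity():.2f} sp={table.specificity():.2f} "
      f"PPV={table.ppv():.2f} NPV={table.npv():.2f} "
      f"pCR rate={table.pcr_rate():.2f}")
```

Reversing the orientation (treating radiologic CR as "positive") swaps sensitivity with specificity and PPV with NPV, which is one reason values reported by different studies are not always directly comparable.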

HER2+ enriched

Six studies reported MRI performance in patients with HER2+ enriched breast cancer. Sample sizes ranged from 25 to 101 patients, and pCR rates from 31.2 to 76.1% (Table 3). Table 3a shows the four studies that reported NPV (62–94.6%) and PPV (34.9–72%), and Table 3b the five studies that reported sensitivity (36.2–83%) and specificity (47–90%).

Table 3 Performance of MR imaging on HER2+ enriched breast cancer as evaluated by (a) NPV and PPV, (b) sensitivity and specificity

HR+/HER2−

Five studies reported MRI performance in patients with HR+/HER2− breast cancer. Sample sizes ranged from 71 to 327 patients, and pCR rates from 1.9 to 21.1% (Table 4). Table 4a shows the five studies that reported NPV (29.4–100%) and PPV (21.4–95.1%), and Table 4b the four studies that reported sensitivity (43–100%) and specificity (45–93%).

Table 4 Performance of MR imaging on HR+/HER2− breast cancer as evaluated by (a) NPV and PPV, (b) sensitivity and specificity
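Because PPV and NPV depend on prevalence, they cannot be compared directly between subtypes whose pCR rates differ as widely as those reported here (1.9–21.1% for HR+/HER2− versus up to 76.1% for HER2+). The Python sketch below applies Bayes' theorem, PPV = se·p / (se·p + (1 − sp)(1 − p)) and NPV = sp(1 − p) / (sp(1 − p) + (1 − se)·p), with hypothetical sensitivity and specificity of 0.8 at three of the reported pCR rates. It assumes that a "positive" result means residual disease, so the prevalence p is one minus the pCR rate; the function name is hypothetical.

```python
def predictive_values(se: float, sp: float, pcr_rate: float) -> tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity, and pCR rate.

    Assumes 'positive' = residual disease, so prevalence = 1 - pcr_rate.
    """
    prev = 1.0 - pcr_rate
    ppv = se * prev / (se * prev + (1.0 - sp) * (1.0 - prev))
    npv = sp * (1.0 - prev) / (sp * (1.0 - prev) + (1.0 - se) * prev)
    return ppv, npv


# Same hypothetical test accuracy, different pCR rates taken from the ranges above.
for rate in (0.019, 0.211, 0.761):
    ppv, npv = predictive_values(0.8, 0.8, rate)
    print(f"pCR rate {rate:.1%}: PPV={ppv:.2f}, NPV={npv:.2f}")
```

Under these assumptions, NPV (the probability of pCR given no residual enhancement on MRI) is low when pCR is rare and high when pCR is common, even though sensitivity and specificity are held fixed.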

Discussion

Traditionally, US has been widely used for response-guided NAC because of its low cost, relative accessibility, and ease of use [21]. With emerging interest in the potential of MRI to predict pCR after NAC, we aimed to provide an overview of its performance in each breast cancer molecular subtype. Numerous studies published in the past 5 years have compared the performance of MRI, US, and PET/CT in breast cancer imaging; however, very few had the statistical power to stratify patients by molecular subtype. With only ten articles in this review, our results should be interpreted with caution, but they suggest several areas for improvement in future research.

Our findings also reflect the rising interest in MRI and PET/CT as alternatives to, or in combination with, US for detecting pCR after NAC. Only one review published in the last 5 years examined the performance of MRI across molecular subtypes [27]; however, it focused on 18F-FDG PET/CT and included only one MRI study for each of the following subtypes: ER+/HER2−, triple negative, and HER2+.

It is important to standardize pCR definitions in studies on this subject, particularly within a given region. For example, the Japanese Breast Cancer Society defines pCR as the absence of cancer cells, or the presence of only necrotic or nonviable residual cancer cells [28]. However, the included studies, including those conducted in Japan, used widely varying definitions of pCR, ranging from allowing scattered cancer cells to requiring the absence of any cancer cells or DCIS, and one study did not report how pCR was defined [25]. The studies by De Los Santos and Michishita used the most stringent pCR definition, whereas those by Schmitz and Kaise used the least stringent. One would therefore expect higher pCR rates in the Schmitz and Kaise studies, but this was not consistently observed. Although Schmitz reported the highest pCR rates among studies of triple negative patients (56.4%) and HER2+ patients (76.1%), Kaise reported one of the lowest pCR rates for the HR+/HER2− group (9.2%). Conversely, De Los Santos reported one of the lowest pCR rates among HER2+ studies (37.6%), a rate of 36.8% among triple negative patients, and a moderate rate among HR+/HER2− patients (13.5%). This heterogeneity suggests that other variables, such as response to NAC and timing of surgery, may have contributed to the differing pCR rates. For instance, of the four studies that included trastuzumab in their NAC regimen, only Schmitz and Okamoto reported administering trastuzumab to all HER2+ patients [5, 10, 19, 22]. De Los Santos did not administer trastuzumab to 54 of its HER2+ patients because they were treated before 2005 [10].

The timing of surgery and imaging was not reported in five of the ten articles [19,20,21,22, 24]. In the studies that reported it, surgery was performed 4 weeks after the last course of NAC, or 7.8 to 20 days after the last MRI. Tumor response, and possibly regrowth, would be expected to differ between a patient imaged 1 week after the last course of NAC and one imaged 1 month after NAC. Previous studies have attempted to clarify the appropriate timing of MRI for each molecular subtype, and preliminary reports suggest that HER2+ patients may benefit from an MRI examination at the midpoint of NAC [5, 22]. Further studies are needed to explore and streamline the process from the last course of NAC to MRI and, finally, to surgery.

We excluded studies such as I-SPY2 because they did not quantify the performance of MRI using sensitivity/specificity [29,30,31]. Similarly, the study by Weber et al. [32] was excluded for lack of molecular subtype-specific data. With only ten articles that are heterogeneous across most variables, we were underpowered to reach definitive conclusions for each molecular subtype, but several important findings emerged. Even when stratified by subtype, MRI detection of pCR after NAC is neither as accurate nor as consistent as individual studies suggest. To determine the true potential of MRI in detecting pCR after NAC, larger studies with standardized pCR definitions and appropriate timing of surgery and MRI relative to the last course of NAC are needed. Surprisingly, even in the molecular subtypes with the highest response to NAC, MRI was not accurate enough to obviate surgery. This suggests that either biopsies of the tumor bed need to be performed, as proposed in the NRG Oncology BR005 study, or radiologic approaches based on nuclear imaging of tumor metabolism must be developed.
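To illustrate why subtype strata of this size are underpowered, the Python sketch below computes a 95% Wilson score interval for a proportion such as sensitivity or specificity. The Wilson interval is our choice for illustration, not a method used by the included studies; the function name and counts are hypothetical. Small denominators arise readily: if specificity is computed among pCR patients, a pCR rate of 1.9% in a hypothetical cohort of 100 patients leaves only about two pCR cases.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% with z = 1.96)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half


# Hypothetical: the same observed sensitivity of 0.9 from 10 vs. 100 patients.
for hits, total in ((9, 10), (90, 100)):
    lo, hi = wilson_interval(hits, total)
    print(f"{hits}/{total}: 95% CI {lo:.2f}-{hi:.2f} (width {hi - lo:.2f})")
```

The same observed proportion of 0.9 yields a much wider interval from 10 patients than from 100, which is consistent with the wide ranges of sensitivity and specificity seen across studies.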

Conclusion

By subtype, the accuracy of MRI in detecting pCR after NAC is neither as consistent nor as high as hoped. To capture the true potential of MRI in detecting pCR, larger studies with standardized pCR definitions and appropriate timing of surgery and MRI need to be conducted. Meanwhile, clinicians should question the reliability of MRI findings after NAC and take a careful approach to managing residual disease.