Neoadjuvant chemotherapy (NACT) is increasingly used to treat larger tumors in patients with invasive breast cancer, with the aim of reducing the extent of surgery and its adverse effects. This has enabled more breast-conserving surgery (BCS), resulting in better cosmesis and fewer axillary lymph node dissections without compromising survival.1,2,3,4

In the last decade, studies have reported complete eradication of the tumor, that is, a pathologic complete response (pCR), in 20 % to 60 % of patients receiving NACT.1,5–7

Response to NACT varies between breast cancer subtypes,7–10 and studies report complete response rates of up to 70 % in triple-negative (estrogen receptor-negative [ER−]/human epidermal growth factor receptor 2-negative [HER2−]) and HER2-positive (ER−/HER2+) breast cancers,7 reflecting a substantial improvement in the efficacy of NACT. Surgery after NACT to ensure that no cancer remains is still advised for women with a complete response on imaging,11,12 but this practice is currently being questioned because of the high rates of pCR.13,14

Diagnostic mammography (MG), ultrasound (US), and magnetic resonance imaging (MRI) are the imaging methods used to measure tumor response during and after NACT. Surgeons use the extent of residual tumor on imaging when planning the surgical treatment.8,15,16 Overestimation of residual tumor size by imaging results in unnecessarily large operations, whereas underestimation could result in multiple operations to obtain free margins. Accurate imaging is therefore crucial when surgery is planned, and the possibility of omitting surgery also depends heavily on the accuracy of the imaging method.

In 2013, a meta-analysis by Marinovich et al.17 concluded that MRI and US are similar in predicting residual tumor compared with pathology, although further comparisons between MRI and other tests are still warranted. Later, in 2016, a study by Vriens et al.18 found US to be at least as good as MRI in predicting residual tumor size. However, smaller studies have shown MRI to be a more valid predictor of pCR than US,16,19–21 leaving no definite answer as to which imaging method should be used. In Denmark, the Danish Breast Cancer Group (DBCG) recommends the use of MRI, and we hypothesized that MRI is better than US at predicting pCR.

This study aimed to investigate whether MRI is better than US at predicting pCR in breast cancer patients receiving NACT at two breast cancer centers. Furthermore, we compared the imaging-based prediction of pCR across receptor subgroups.

Methods

Study Population

For this institutional retrospective study, women with breast cancer who received NACT and were examined before surgery by either combined MG and US or by MRI between 1 January 2016 and 31 December 2019 at Herlev Hospital or Rigshospitalet were identified using the DBCG database, the Patobank database, and the IMPAX radiology system. The eligibility of the identified patients was assessed by review of the original patient files.

The study enrolled women with invasive breast cancer diagnosed by core needle biopsy and planned for six series of NACT. Patients planned for eight series of NACT and patients who did not start the sixth series of NACT were excluded from the study. Patients were also excluded if surgery was performed more than 65 days after the last imaging or if imaging was insufficient (Fig. 1). The Danish Patient Safety Authority waived the requirement for ethical approval because of the retrospective design of the study and because all procedures performed were part of routine care.
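The inclusion and exclusion criteria can be expressed as a simple filter. The sketch below is illustrative only: the `Candidate` record and its field names are hypothetical and do not reflect the structure of the DBCG database, Patobank, or IMPAX, and treating any planned regimen other than six series as an exclusion is an assumption based on the description above.

```python
from dataclasses import dataclass


# Hypothetical record: field names are illustrative, not a real registry schema.
@dataclass
class Candidate:
    invasive_on_core_biopsy: bool      # invasive cancer diagnosed by core needle biopsy
    planned_series: int                # number of NACT series planned (e.g. 6 or 8)
    started_sixth_series: bool         # whether the sixth series was started
    days_last_imaging_to_surgery: int  # interval from last imaging to surgery
    sufficient_imaging: bool           # imaging adequate for response evaluation


MAX_DAYS_IMAGING_TO_SURGERY = 65  # surgery later than this led to exclusion


def is_eligible(c: Candidate) -> bool:
    """Return True if the candidate meets the criteria described in the text."""
    if not c.invasive_on_core_biopsy:
        return False
    if c.planned_series != 6:  # e.g. eight-series regimens were excluded
        return False
    if not c.started_sixth_series:
        return False
    if c.days_last_imaging_to_surgery > MAX_DAYS_IMAGING_TO_SURGERY:
        return False
    return c.sufficient_imaging


candidates = [
    Candidate(True, 6, True, 30, True),   # eligible
    Candidate(True, 8, True, 20, True),   # planned for eight series
    Candidate(True, 6, True, 70, True),   # surgery more than 65 days after imaging
    Candidate(True, 6, False, 10, True),  # did not start the sixth series
]
eligible = [c for c in candidates if is_eligible(c)]
print(len(eligible))  # 1
```

Note that the 65-day limit is applied as "more than 65 days excludes", so a patient operated on exactly 65 days after the last imaging remains eligible in this sketch.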

Fig. 1. Inclusion of breast cancer patients planned for neoadjuvant chemotherapy (NACT) at Herlev Hospital and Rigshospitalet between 1 January 2016 and 31 December 2019. Lucidchart.com was used for the creation of this figure. DBCG, Danish Breast Cancer Group; MRI, magnetic resonance imaging; NACT, neoadjuvant chemotherapy; 6s, 6th series

The patients were allocated to treatment at either Herlev Hospital or Rigshospitalet based on their municipality. Both breast centers are highly specialized with similar diagnostic procedures.

Chemotherapeutic Protocols

The neoadjuvant treatment comprised three series of cyclophosphamide and epirubicin every third week followed by three series of taxane-based chemotherapy. Patients with HER2+ cancers received a trastuzumab/pertuzumab-based regimen together with taxanes.

Imaging Methods

Before initiation of the neoadjuvant treatment, the initial tumor size was measured by handheld US in connection with the primary diagnostic MG. An MR-compatible metallic coil was then placed inside the tumor under US guidance to identify the primary tumor bed at the time of surgery. This was performed in the same way in both the US and MRI groups. For most patients treated at Herlev Hospital, the initial US measurement was performed at designated private clinics under a regional financial agreement intended to provide the fastest logistics within the Danish health system, after which the patients were referred to the breast center.

At Rigshospitalet, the initial US measurement was performed at the breast center in connection with the primary diagnostic MG. In addition, patients in the MRI group had an MRI scan before initiation of the neoadjuvant treatment. Tumor size was evaluated after two series of NACT and before surgery, using handheld US in the US group and MRI in the MRI group. These scans were performed and evaluated at the breast centers by trained breast radiologists.

Ultrasound

The coil used in the US group was the UltraClip II Titanium Tissue Marker coil from Bard Biopsy Systems. All US procedures were performed by a highly experienced, trained breast radiologist using an Esaote Mylab 70XVG with an LA523 13–4 MHz linear probe. Patients with no measurable residual tumor on US or MG were considered to have a radiologic complete response (rCR), and the remaining patients were considered to have a non-radiologic complete response (non-rCR). Persisting suspicious and malignant calcifications were not included in the response evaluation but were recorded to guide the choice of surgical procedure.

MRI

All MRI scans, including the scan before NACT, were dynamic contrast-enhanced examinations performed at Rigshospitalet using either a GE 1.5 T Discovery or a GE 1.5 T Optima scanner with an eight-channel 1.5 T HD Breast Array, a Liberty 9000 8 Breast coil, and a 1.5 T HD flat gem table breast array. The coil used in the MRI group was the UltraClip II BioDur 108 Tissue Marker coil from Bard Biopsy Systems. The following scan protocol was applied in all MR examinations: one axial T2W fast spin echo (FSE) sequence, one axial diffusion-weighted imaging (DWI) sequence, and one T1W sequence before infusion of MultiHance 0.2 ml/kg at an infusion rate of 1.5 ml/s.
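The weight-based contrast dose and fixed infusion rate in the protocol imply a simple calculation of injected volume and injection time. The sketch below only illustrates that arithmetic; the 60 kg patient weight is a hypothetical example and the function names are our own.

```python
DOSE_ML_PER_KG = 0.2          # MultiHance dose stated in the protocol
INFUSION_RATE_ML_PER_S = 1.5  # infusion rate stated in the protocol


def contrast_volume_ml(weight_kg: float) -> float:
    """Weight-based contrast volume in millilitres."""
    return DOSE_ML_PER_KG * weight_kg


def injection_duration_s(weight_kg: float) -> float:
    """Time needed to deliver that volume at the fixed infusion rate."""
    return contrast_volume_ml(weight_kg) / INFUSION_RATE_ML_PER_S


weight = 60.0  # hypothetical patient weight in kg
print(f"{contrast_volume_ml(weight):.1f} ml over {injection_duration_s(weight):.1f} s")
# 12.0 ml over 8.0 s
```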

After administration of contrast, five T1W sequences (multiphase), one T1W sagittal sequence, and one TW sequence (with phase AP), including subtraction recordings, were acquired. To ensure consistent reading, the MRIs were reviewed by a senior radiologist with 7 years of experience in breast radiology, and the rCRs were double-checked by another senior radiologist with more than 25 years of experience in breast radiology. Both radiologists were blinded to the pathologic data.

An rCR on MRI was defined as no enhancement in the tumor bed. A near rCR was defined as a minimal rim of enhancement around the coil artifact and was counted as a non-rCR. Any enhancement in the primary tumor bed was considered a non-rCR.

Histopathologic Analyses

Pathologic examination and immunohistochemistry (IHC) were performed according to the national guidelines of the DBCG.22 Tumor characteristics on biopsy were assessed by HER2 status, ER status, and the Ki-67 index. A HER2 IHC score of 3+ was defined as positive, and scores of 1+ and 0 as negative. Tumors scoring 2+ were further evaluated by fluorescence in situ hybridization (FISH): gene amplification was classified as positive and no gene amplification as negative. ER status was defined as positive if 1 % or more of the cells stained and negative if less than 1 % stained.
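The receptor-status rules above amount to a small decision procedure, with FISH acting as a reflex test only for IHC 2+ tumors. The following is a minimal sketch of those rules as stated; the function names are hypothetical, and raising an error when a 2+ tumor lacks a FISH result is a design choice of the sketch, not part of the DBCG guidelines.

```python
from typing import Optional


def her2_positive(ihc_score: int, fish_amplified: Optional[bool] = None) -> bool:
    """HER2 status from the IHC score (0-3), with FISH reflex testing for 2+."""
    if ihc_score == 3:
        return True
    if ihc_score in (0, 1):
        return False
    if ihc_score == 2:
        if fish_amplified is None:
            raise ValueError("IHC 2+ requires a FISH result")
        return fish_amplified  # amplification counts as positive
    raise ValueError(f"unexpected IHC score: {ihc_score}")


def er_positive(percent_stained: float) -> bool:
    """ER positive if 1 % or more of the cells stain."""
    return percent_stained >= 1.0


print(her2_positive(2, fish_amplified=True), er_positive(0.5))  # True False
```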

Based on these results, the tumors were divided into four subgroups: ER−/HER2−, ER−/HER2+, ER+/HER2−, and ER+/HER2+. The Ki-67 index was evaluated manually for samples from Rigshospitalet and by an automated procedure for samples from Herlev Hospital. A pCR was defined as no remaining invasive tumor cells or ductal carcinoma in situ (DCIS). A non-pCR was defined as any remaining invasive tumor cells or DCIS, according to the Residual Cancer Burden classification.23,24 For patients with multifocal cancers, only information on the largest tumor was collected for this study. The response in the lymph nodes was not included in this study.
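The subgroup assignment and the study's pCR definition can be written as two small functions. This is an illustrative sketch only: it takes ER and HER2 status as booleans (assumed to have been determined beforehand) and treats residual DCIS as excluding pCR, matching the ypT0 definition used here.

```python
def receptor_subgroup(er_pos: bool, her2_pos: bool) -> str:
    """Label one of the four receptor subgroups used in the study."""
    er = "ER+" if er_pos else "ER−"
    her2 = "HER2+" if her2_pos else "HER2−"
    return f"{er}/{her2}"


def is_pcr(residual_invasive: bool, residual_dcis: bool) -> bool:
    """pCR as defined here (ypT0): no invasive tumor cells and no DCIS."""
    return not residual_invasive and not residual_dcis


print(receptor_subgroup(False, True))  # ER−/HER2+
print(is_pcr(False, True))             # False: residual DCIS means non-pCR
```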

Statistical Analyses

Baseline characteristics were compared between the US and MRI groups using the chi-square test for categorical variables and Student's t test for comparison of means of continuous variables. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were calculated for each group and for each receptor subgroup. Residual tumor was treated as the positive condition: a true-positive (TP) result was defined as non-pCR with non-radiologic complete response (non-rCR), a true-negative (TN) result as pCR with rCR, a false-positive (FP) result as pCR with non-rCR, and a false-negative (FN) result as non-pCR with rCR. Sensitivity was calculated as TP/(TP+FN), specificity as TN/(TN+FP), PPV as TP/(TP+FP), NPV as TN/(TN+FN), and accuracy as (TP+TN)/(TP+TN+FP+FN). Proportions were compared between the US and MRI groups using a z test. A p value lower than 0.05 was considered significant. Analyses were performed using R version 3.5.2.
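The 2×2 classification and the five diagnostic measures defined above can be sketched as follows. The analyses in the study were run in R; this Python version is only an illustration of the same definitions, and the counts in the example are hypothetical, not the study's data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Counts:
    tp: int  # non-pCR and non-rCR: residual tumor seen on imaging
    tn: int  # pCR and rCR: no tumor on pathology or imaging
    fp: int  # pCR but non-rCR: imaging suggests tumor that pathology did not find
    fn: int  # non-pCR but rCR: imaging missed residual tumor


def tally(pairs: Iterable[Tuple[bool, bool]]) -> Counts:
    """Build the 2x2 table from (pcr, rcr) pairs, one pair per breast."""
    c = Counts(0, 0, 0, 0)
    for pcr, rcr in pairs:
        if not pcr and not rcr:
            c.tp += 1
        elif pcr and rcr:
            c.tn += 1
        elif pcr and not rcr:
            c.fp += 1
        else:  # residual tumor on pathology, rCR on imaging
            c.fn += 1
    return c


def diagnostic_metrics(c: Counts) -> Dict[str, float]:
    """Residual tumor is the positive condition, as in the study."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn),
    }


# Hypothetical counts for illustration only.
example = Counts(tp=90, tn=15, fp=15, fn=10)
for name, value in diagnostic_metrics(example).items():
    print(f"{name}: {value:.3f}")
```

Under this convention, specificity and NPV are the measures that describe how well imaging identifies patients with pCR.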

Approval

The study was approved by the head of management at Rigshospitalet, the head of the Department of Breast Surgery at Herlev and Gentofte Hospital, the head of the Department of Radiology, and The Danish Patient Safety Authority.

Results

Patient and Tumor Characteristics

The study identified 1888 breast cancer patients through searches of the DBCG database, the Patobank database, and the IMPAX radiology system. Of these 1888 patients, 305 (307 breasts) fulfilled the inclusion criteria and were included in the study (Fig. 1).

Data on ethnicity were not available from the patient files. The study included 156 patients in the US group and 151 patients in the MRI group. Two patients had bilateral breast cancer, and their tumors were evaluated individually. One patient in the US group had incomplete receptor status data and was not included in the subgroup analyses.

The patient and tumor characteristics of the two imaging groups are shown in Tables 1 and 2. No significant differences in patient or tumor characteristics were found between the US and MRI groups except for small but significant differences in age and body mass index (BMI). The mean age was 52.4 years in the US group and 49.5 years in the MRI group (p = 0.028). The mean BMI was 26.3 kg/m2 in the US group and 24.9 kg/m2 in the MRI group (p = 0.020). The number of days between the last imaging and surgery, and between the final chemotherapy session and surgery, did not differ significantly between the groups. Reoperation rates were comparable in the two groups: 12 (11.1 %) in the US group and 10 (9.9 %) in the MRI group (p = 0.776). Approximately one third of the patients in both groups were treated by mastectomy, with no significant difference between the groups. In the US group, 56.4 % (n = 88) of the tumors were overestimated in size and 26.9 % (n = 42) were underestimated. In the MRI group, 51.0 % (n = 77) of the tumors were overestimated in size and 21.9 % (n = 33) were underestimated. The mean numeric deviation between tumor size on imaging and on pathology was 9.1 mm in the US group and 8.6 mm in the MRI group, and the difference was not statistically significant (p = 0.609).
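The over- and underestimation rates and the numeric deviation can be computed from paired imaging and pathology sizes. In this sketch we assume that "numeric deviation" means the mean absolute difference between imaging and pathology size; the tumor sizes in the example are hypothetical.

```python
from typing import Dict, Sequence


def size_agreement(imaging_mm: Sequence[float],
                   pathology_mm: Sequence[float]) -> Dict[str, float]:
    """Summarize imaging versus pathology tumor size, one entry per tumor."""
    if len(imaging_mm) != len(pathology_mm) or not imaging_mm:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(imaging_mm, pathology_mm))
    n = len(pairs)
    over = sum(img > path for img, path in pairs)   # imaging larger than pathology
    under = sum(img < path for img, path in pairs)  # imaging smaller than pathology
    mean_abs_dev = sum(abs(img - path) for img, path in pairs) / n
    return {
        "overestimated": over / n,
        "underestimated": under / n,
        "mean_abs_deviation_mm": mean_abs_dev,
    }


# Hypothetical sizes (mm) for four tumors, not study data.
print(size_agreement([20, 10, 0, 15], [12, 14, 0, 15]))
# {'overestimated': 0.25, 'underestimated': 0.25, 'mean_abs_deviation_mm': 3.0}
```

Because tumors measured identically on imaging and pathology count as neither over- nor underestimated, the two rates need not sum to 100 %, as is also the case in both study groups.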

Table 1 Patient characteristics of 305 breast cancer patients (307 breasts) planned for NACT between 1 January 2016 and 31 December 2019 at Herlev Hospital or Rigshospitalet and evaluated by US or MRI
Table 2 Tumor characteristics for 305 breast cancer patients (307 breasts) planned for NACT between 1 January 2016 and 31 December 2019 at Herlev Hospital or Rigshospitalet and evaluated by US or MRI

Response Evaluation

In the US group, 51 (32.7 %) of the patients achieved a pCR compared with 37 (24.5 %) of the patients in the MRI group (p = 0.113) (Table 3). Of the four subgroups in the US group, the ER−/HER2+ subgroup had the highest pCR rate (65.2 %), and the ER−/HER2− subgroup had the second highest (55.9 %) (Table 4). The MRI group had lower pCR rates, with the highest rates in the ER−/HER2+ (31.8 %) and ER+/HER2+ (29.8 %) subgroups. The ER+/HER2− subgroup had the lowest pCR rate in both groups. In the MRI group, 23.2 % of the patients had an rCR compared with 16.7 % in the US group (p = 0.153).
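A pooled two-proportion z test, without continuity correction, can be sketched with the standard library alone. Using the pCR counts reported above (51 of 156 vs 37 of 151), this version gives p ≈ 0.113, consistent with the reported value. We cannot confirm that every comparison in the study used this exact variant; for example, R's `prop.test` applies Yates's continuity correction by default, which yields somewhat larger p values.

```python
import math
from typing import Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-sample z test for proportions (no continuity correction).

    Returns the z statistic and the two-sided p value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# pCR in the US group (51/156) versus the MRI group (37/151)
z, p = two_proportion_z_test(51, 156, 37, 151)
print(f"z = {z:.2f}, p = {p:.3f}")  # z = 1.59, p = 0.113
```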

Table 3 Response evaluation after NACT, in 305 breast cancer patients (307 breasts) planned for NACT between 1 January 2016 and 31 December 2019 at Herlev Hospital or Rigshospitalet according to type of imaging used for evaluation
Table 4 Pathologic response after NACT according to receptor subgroups in 304 breast cancer patients (306 breasts) evaluated by US or MRI treated between 1 January 2016 and 31 December 2019 at Herlev Hospital or Rigshospitalet

Comparison of pCR Prediction

Agreement between the pathologic and radiologic responses in the US and MRI groups is shown in Table 5. Sensitivity, specificity, PPV, NPV, and accuracy for the US and MRI groups are shown in Table 6. Specificity was lower in the US group than in the MRI group (p = 0.049), whereas sensitivity was high in both groups, with no significant difference. The PPV was significantly higher in the MRI group (p = 0.025), and accuracy was also higher in the MRI group, although the difference was not significant.

Table 5 Agreement between the pathologic and radiologic responses after NACT in 305 breast cancer patients (307 breasts) according to type of imaging used for evaluation
Table 6 Comparison of MRI and US in predicting pathologic complete response after NACT in 305 breast cancer patients (307 breasts)

No significant difference in NPV was found between the groups. Estimates for the subgroups are shown in the supplementary material. Patients in the ER−/HER2− subgroup had the highest NPV in both the US and MRI groups, but these results should be interpreted with caution because of the small number of patients in each subgroup, and therefore no further statistical analysis was performed.

Discussion

We found that MRI had a higher specificity and PPV than US in predicting pCR after NACT in breast cancer patients. The MRI group had a higher rate of rCR, which agreed more closely with pCR and accounted for the higher specificity and PPV. Sensitivity was high in both the MRI and US groups, supporting the view that both imaging methods are good at detecting residual cancer.18 Although the two imaging methods differed in specificity, this did not affect the rates of reoperation or mastectomy, which were comparable between the two groups. This suggests that both imaging methods are similarly useful to clinicians planning post-NACT surgery. In addition, the choice of surgical procedure depends not only on the response to chemotherapy but also on other factors such as patient preference, persisting microcalcifications, and multifocality. The NPV was comparable between the two groups but far from adequate (US, 65.4 %; MRI, 60.0 %) if the aim is to omit surgery.

Consistent with previous findings,10 the ER−/HER2+ subgroup had the highest pCR rate in both the MRI and US groups. The pCR rates in the US group were comparable with those previously reported.7,10,25,26 In the MRI group, the subgroup pCR rates differed from what would be expected: the ER+/HER2+ rate was almost as high as the ER−/HER2+ rate, and the ER−/HER2− rate was the second lowest. This variation could be explained by the relatively small number of patients in the subgroups.

In the literature, pCR is defined either without DCIS (ypT0, pathologic evaluation after NACT showing no microscopic cancer cells or in situ components)8,27,28 or with DCIS allowed (ypT0/is, pathologic evaluation after NACT showing no invasive cancer cells but allowing in situ components).29,30,31 Clinicians should consider this when comparing pCR rates after NACT among studies. The definitions of positive and negative results also differ among studies: some count pCR as a positive result,8,31,32 whereas others define the presence of tumor as a positive result.27,29,30,33 This must be kept in mind when sensitivity, specificity, PPV, and NPV are compared between studies. In this study, pCR without DCIS (ypT0) was regarded as a negative result.
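The effect of these definitional choices can be made concrete. In the sketch below (function names are our own), `allow_dcis` switches between ypT0 and ypT0/is, and `swap_convention` relabels a 2×2 table when pCR rather than residual tumor is the positive condition. The relabeling shows that sensitivity and specificity trade places, as do PPV and NPV, so the same data can yield apparently different values across studies. The counts are hypothetical.

```python
from typing import Dict, Tuple


def is_pcr(residual_invasive: bool, residual_dcis: bool, allow_dcis: bool = False) -> bool:
    """ypT0 when allow_dcis is False (this study); ypT0/is when True."""
    if residual_invasive:
        return False
    return allow_dcis or not residual_dcis


def metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def swap_convention(tp: int, tn: int, fp: int, fn: int) -> Tuple[int, int, int, int]:
    """Relabel counts when pCR (and rCR) become the positive result.

    Old TN (pCR, rCR) becomes TP; old TP becomes TN; old FN (non-pCR, rCR)
    becomes FP; old FP (pCR, non-rCR) becomes FN.
    """
    return tn, tp, fn, fp


tumor_positive = metrics(90, 15, 15, 10)  # hypothetical counts
pcr_positive = metrics(*swap_convention(90, 15, 15, 10))
print(tumor_positive["specificity"], pcr_positive["sensitivity"])  # 0.5 0.5
```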

This study of more than 300 patients is one of the larger studies on the topic. The prediction of pCR with MRI (NPV) in our study was comparable with that found in a smaller study of 91 patients by Bouzon et al.,34 but our specificity was lower (56 % vs 79 %). Bouzon et al. used ypT0/is as their definition of pCR and did not include a comparison with US.

A recent large study of 1219 patients by Zhang et al.,32 also using ypT0/is, compared US with MRI and found a higher specificity for MRI than for US (44.4 % vs 36.2 %), consistent with our findings, although the difference was not significant. In a study of 182 patients, Vriens et al.18 concluded that US is at least as good as MRI in predicting tumor size after NACT. However, the prediction of absent residual tumor (NPV) in that study was only 33 % for US and 26 % for MRI, whereas the NPV was higher for both imaging methods in our study (65.4 % for US and 60.0 % for MRI).

A limitation of this study was its design, with two groups examined by different imaging methods rather than a paired design in which all patients were examined by both MRI and US. A paired design would have reduced the risk of selection bias.

However, the differences in patient and tumor characteristics between the groups were small and mostly nonsignificant. The differences in age and BMI between the two groups were significant but small and were not considered to affect the evaluation of the imaging. Furthermore, patients were allocated to treatment based on their municipality rather than their own choice of hospital, and both breast centers are highly specialized with similar diagnostic procedures.

The user-dependent nature of handheld US should also be considered, although in the current study only highly experienced breast radiologists performed US. The signal void artifact of about 10 mm caused by the coil on MRI is another notable consideration because it could have increased the rate of false-negative results in the MRI group. A strength of the study was its strict definition of pCR (ypT0), which did not accept residual DCIS in the breast, in line with the precautionary principles of surgical practice.

The interval between the last imaging and surgery was determined by the logistics of clinical practice at the centers and could have allowed further tumor shrinkage and eradication. However, this interval did not differ between the US and MRI groups and was therefore considered to affect both groups equally. The interval could also explain why both imaging methods more often overestimated than underestimated tumor size, as seen in our study.

For patients with pCR, breast surgery is redundant. However, neither MRI nor US predicted pCR reliably enough for surgery to be safely omitted in complete responders. In this study, it was not possible to identify subgroups in which the imaging methods predicted pCR better. To enable omission of surgery in selected breast cancer patients after NACT, an important step is to compare the prediction of pCR across subgroups, especially the ER−/HER2− and ER−/HER2+ subgroups, because these are known to have the highest rates of pCR and a better correlation between MRI and pathology.9 The subgroup estimates in our study were inconsistent and had relatively wide confidence intervals, so no further subgroup comparison was performed. Larger studies are needed to compare subgroups and identify those with both a good response and good prediction of pCR.
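Why small subgroups give wide confidence intervals can be illustrated with the Wilson score interval for a proportion such as NPV. The study does not state which interval method it used, so the Wilson interval here is our own choice, and the counts are hypothetical: the same point estimate of 60 % is shown for 10 and for 100 imaging-negative patients.

```python
import math
from statistics import NormalDist
from typing import Tuple


def wilson_interval(successes: int, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half


# Same hypothetical NPV of 60 % in a small and a larger group.
for tn, negatives in [(6, 10), (60, 100)]:
    lo, hi = wilson_interval(tn, negatives)
    print(f"{tn}/{negatives}: {lo:.2f} to {hi:.2f}")
```

With 10 patients the interval spans roughly half of the possible range, which is why subgroup comparisons of this size are hard to interpret.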

In future studies, the accuracy of MRI could be further improved by minimizing the coil artifact through careful selection of coils. Several studies have also suggested combining imaging with biopsy techniques to improve the prediction of pCR.13,14,35,36,37 A recent study by Lee et al.38 reported that MRI findings indicating complete eradication of the tumor, combined with US-guided preoperative biopsies, could raise the NPV for pCR to 87.1 %, which is higher than the NPV of either US or MRI in this study. The results of our study, based on a large study population, support the use of MRI in such studies.

Conclusion

In summary, surgery cannot yet be omitted for patients receiving NACT, but MRI should be the imaging method used to optimize and further improve the accuracy of pCR prediction. Compared with US, MRI was significantly more specific in predicting pCR, although its specificity was still not high enough for MRI to serve as a valid predictor of pCR for omission of surgery. The NPV was comparable between US and MRI. In future studies, MRI should be the preferred imaging method for more accurate prediction of pCR. No subgroup in the current study showed a significantly higher specificity or NPV, but the number of patients in each subgroup was relatively small, and larger subgroups are required in further studies.