Introduction

Axillary lymph node status is the most important prognostic factor in patients with early-stage breast cancer. Morbidities associated with axillary lymph node dissection have led to the development of sentinel lymph node biopsy (SLNB) to reduce the rate of negative axillary clearances [1, 2]. Reported sensitivity of intraoperative sentinel lymph node (SLN) evaluation for breast cancer ranges from 58 to 72% [3,4,5], with an accuracy of 75% [6]. These rates are consistent with the recently published 33% false-negative (FN) rate for intraoperative SLN evaluation [7].

Although SLNB is a minimally invasive procedure, it is still associated with morbidities, including a risk of lymphedema amounting to 8.2% at 12 months [8]. Other complications such as seroma, localized swelling, pain and paresthesia, infectious neuropathy, decreased arm strength, and shoulder stiffness have been reported in up to 19.5% of patients with SLNB [9]. There is potential for a non-invasive imaging technique for axillary evaluation that may be comparable to SLNB without the associated morbidities. Prior studies have investigated axillary ultrasound (AUS) and positron emission tomography-computed tomography (PET-CT) for evaluation of the axillary lymph nodes. These modalities have shown only moderate accuracy and sensitivity for detecting metastatic axillary lymph nodes, with 67–77% accuracy and 43.5–72.3% sensitivity for AUS and 81.1% accuracy and 56–62.7% sensitivity for PET-CT [10,11,12]. In addition, AUS is operator dependent and PET-CT involves potentially harmful ionizing radiation exposure.

Breast MRI for axillary evaluation reportedly shows low intra- and inter-observer variability and higher diagnostic accuracy (71–85%) and sensitivity (47.8–89%) for nodal status [12,13,14,15]. Although MRI is the most promising of the imaging modalities, previously published studies are limited by small sample sizes and by subjective identification of the region of interest, which was manually defined within the lymph node by the reader.

In recent years, there has been investigation into quantitative analysis of specific extracted imaging features, termed “radiomics.” The field of radiomics has developed largely due to the contribution of machine learning techniques that extract pertinent imaging features and correlate them with clinical data. Most recently, a subset of machine learning utilizing a type of artificial neural network called a convolutional neural network (CNN) has begun to proliferate in medical imaging analysis due to advances in computer hardware technology. In contrast to traditional algorithms, which utilize hand-crafted features based on human-extracted patterns, neural networks allow the computer to automatically construct predictive statistical models tailored to solve a specific problem subset [16]. The laborious task of human engineers inputting specific patterns to be recognized could be replaced by inputting curated data and allowing the technology to self-optimize and discriminate through increasingly complex layers.

A convolutional neural network (CNN) is a deep artificial neural network that automatically constructs predictive statistical models, tailored to solve a specific problem subset. It allows the technology to self-optimize and discriminate through increasingly complex layers [16]. The purpose of this study is to develop an objective and accurate approach to MRI axillary evaluation applying a novel CNN algorithm.

Methods

Patient Population

An institutional review board-approved retrospective review from 1/2013 to 6/2016, compliant with the Health Insurance Portability and Accountability Act (HIPAA), identified 133 biopsy-proven metastatic axillary lymph nodes on core biopsy from 133 patients. One hundred forty-two negative control lymph nodes were identified: 100 from patients with benign biopsies and subsequent negative SLN evaluation, and 42 from healthy MRI screening patients with at least 3 years of negative follow-up.

MRI Acquisition and Analysis

MRI was performed on a 1.5-T or 3.0-T commercially available system (Signa Excite, GE Healthcare) using an eight-channel breast array coil. A bilateral sagittal T1-weighted fat-suppressed fast spoiled gradient-echo sequence (17/2.4; flip angle, 35°; bandwidth, 31–25 Hz) was performed before and after a rapid bolus injection of gadobenate dimeglumine (Multihance; Bracco Imaging; 0.1 mmol/kg) delivered through an IV catheter. Post-contrast image acquisition started after contrast material injection, and images were obtained consecutively with an acquisition time of 120 s each. Section thickness was 2–3 mm, with a matrix of 256 × 192 and a field of view of 18–22 cm. Frequency encoding was in the antero-posterior direction.

Image Pre-processing

For all patients, lymph nodes were segmented by a breast fellowship-trained radiologist with 8 years of experience using 3D Slicer [17], based on the first T1-weighted post-contrast subtraction images. For each segmented lymph node, the slice with the largest cross-sectional area as determined on any orthogonal plane (axial, sagittal, or coronal) was identified. The center of mass of each 2D cross-sectional ROI was used as a landmark to create a uniform 4.0 × 4.0 cm bounding box around the lymph node of interest. A fixed-size bounding box was chosen to preserve the relative size of lymph nodes from patient to patient.
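The bounding-box step can be illustrated with a minimal sketch. This is not the study's code: the helper names `center_of_mass` and `fixed_bounding_box`, the toy mask, and the in-plane pixel spacing of 0.1 cm are all assumptions made for illustration.

```python
def center_of_mass(mask):
    """Return the (row, col) centroid of the nonzero pixels in a 2D mask."""
    row_sum = col_sum = count = 0
    for r, line in enumerate(mask):
        for c, value in enumerate(line):
            if value:
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        raise ValueError("mask is empty")
    return row_sum / count, col_sum / count


def fixed_bounding_box(mask, pixel_spacing_cm, box_cm=4.0):
    """Return (row_min, row_max, col_min, col_max) of a square box of side
    box_cm centred on the ROI centroid, in pixel units.

    Bounds may fall outside the image; a caller would pad or clip them.
    """
    centre_r, centre_c = center_of_mass(mask)
    half = (box_cm / pixel_spacing_cm) / 2.0
    return (round(centre_r - half), round(centre_r + half),
            round(centre_c - half), round(centre_c + half))


# Toy 9 x 9 mask with a 3 x 3 "lymph node" centred at (4, 4).
toy_mask = [[1 if 3 <= r <= 5 and 3 <= c <= 5 else 0 for c in range(9)]
            for r in range(9)]
# Hypothetical spacing of 0.1 cm per pixel, so 4.0 cm spans 40 pixels.
box = fixed_bounding_box(toy_mask, pixel_spacing_cm=0.1)
print(center_of_mass(toy_mask), box)
```

Because the box has a fixed physical size, a small node occupies fewer pixels of the crop than a large one, which is how relative node size is preserved across patients.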

All 2D images were rescaled to a 32 × 32 voxel resolution. The intensity values were normalized by conversion to a z score map. In addition, the ROI mask was dilated by five voxels, and every voxel outside the mask was set to a z score of − 5.
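A minimal sketch of the normalization and masking step follows. Several details are assumptions, because the text does not specify them: the z score is computed over the whole cropped image, the dilation uses a square neighbourhood, and the function names are hypothetical. Resampling to 32 × 32 is omitted.

```python
import statistics


def z_score(image):
    """Convert a 2D image (list of lists) to a z score map."""
    values = [v for row in image for v in row]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("image has constant intensity")
    return [[(v - mean) / std for v in row] for row in image]


def dilate(mask, radius):
    """Grow a binary mask by `radius` pixels using a square neighbourhood."""
    height, width = len(mask), len(mask[0])
    grown = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(height, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(width, c + radius + 1)):
                        grown[rr][cc] = 1
    return grown


def suppress_background(z_map, mask, radius=5, fill=-5.0):
    """Set every pixel outside the dilated mask to the fill z score."""
    grown = dilate(mask, radius)
    return [[z if keep else fill for z, keep in zip(z_row, m_row)]
            for z_row, m_row in zip(z_map, grown)]


# Toy 32 x 32 intensity ramp with a one-pixel ROI at the centre.
image = [[float(r + c) for c in range(32)] for r in range(32)]
mask = [[1 if (r, c) == (16, 16) else 0 for c in range(32)] for r in range(32)]
z_map = z_score(image)
result = suppress_background(z_map, mask)
```

With a one-pixel ROI and a five-pixel dilation, an 11 × 11 window of z scores survives and everything else is set to −5.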

Data augmentation in this study involved several real-time modifications to the source images at the time of training. Specifically, 50% of all images in a mini-batch were modified randomly by means of (1) addition across all pixels of a scalar between [− 0.1, 0.1] in order to simulate the effect of random Gaussian noise from different acquisition parameters and (2) random affine transformation of the original image, which alters each lymph node slightly, essentially making the same lymph node appear as a unique input to the network. Given a two-dimensional affine matrix,

$$ \begin{bmatrix} s_1 & t_1 & r_1 \\ t_2 & s_2 & r_2 \\ 0 & 0 & 1 \end{bmatrix} $$

the random affine transformation was initialized with values drawn from uniform distributions over the intervals s1, s2 ∈ [0.8, 1.2], t1, t2 ∈ [− 0.3, 0.3], and r1, r2 ∈ [− 16, 16]. These parameters were confirmed on visual inspection as applying enough of a warp to simulate a different lymph node without making the lymph node appear unrealistic. The choice to apply data augmentation to only 50% of the example images was made to bias the network towards recognition of real data over augmented data.
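The sampling of the augmentation parameters can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's implementation: the function names are hypothetical, exactly half of each mini-batch is selected, and image resampling under the affine matrix is omitted (only the mapping of a single point is shown).

```python
import random


def sample_affine(rng):
    """Draw one 3 x 3 affine matrix using the intervals quoted in the text."""
    s1, s2 = rng.uniform(0.8, 1.2), rng.uniform(0.8, 1.2)    # diagonal terms
    t1, t2 = rng.uniform(-0.3, 0.3), rng.uniform(-0.3, 0.3)  # off-diagonal terms
    r1, r2 = rng.uniform(-16, 16), rng.uniform(-16, 16)      # last-column offsets
    return [[s1, t1, r1],
            [t2, s2, r2],
            [0.0, 0.0, 1.0]]


def apply_affine(matrix, x, y):
    """Map the point (x, y) through the matrix in homogeneous coordinates."""
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


def augment_batch(batch, rng):
    """Add one random scalar in [-0.1, 0.1] to every pixel of half the images.

    Returns the new batch and the sorted indices of the modified images.
    """
    chosen = sorted(rng.sample(range(len(batch)), len(batch) // 2))
    out = [[row[:] for row in image] for image in batch]
    for i in chosen:
        offset = rng.uniform(-0.1, 0.1)
        out[i] = [[v + offset for v in row] for row in out[i]]
    return out, chosen


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
matrix = sample_affine(rng)
batch = [[[0.0, 1.0], [2.0, 3.0]] for _ in range(12)]
augmented, chosen = augment_batch(batch, rng)
```

Leaving the other half of each mini-batch untouched mirrors the stated intent of biasing the network towards real rather than augmented data.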

Neural Network Architecture

Several neural network architectures were tested with varying network depths and kernel sizes, including a pretrained network architecture based on VGG-16. The final overall network architecture is shown in Figs. 1, 2, and 3. The CNN is implemented entirely with a series of 3 × 3 convolutional kernels to prevent overfitting [18]. No pooling layers are used; instead, downsampling is implemented by a 3 × 3 convolutional kernel with a stride length of 2, which decreases the feature maps by 75% in size. All non-linear functions utilize the rectified linear unit (ReLU), which allows training of deep neural networks by limiting vanishing gradients on backpropagation [19]. Additionally, batch normalization is used between the convolutional and ReLU layers to stabilize training by limiting vanishing gradients and to prevent covariate shift [20]. Upon downsampling, the number of feature channels is doubled, reflecting increasing representational complexity and preventing a representation bottleneck. Dropout at 50% was applied to the second-to-last fully connected layer to limit overfitting and add stochasticity to the training process [21].
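The effect of strided convolution on feature-map size and channel count can be traced with a short sketch. The layer layout below is hypothetical: the text does not state which of the convolutions use a stride of 2, the starting channel count, or the padding mode, so the stride list, the 16 base channels, and "same" padding are all assumptions.

```python
import math


def conv_output_size(size, stride):
    """Spatial size after a 3 x 3 convolution, assuming 'same' padding."""
    return math.ceil(size / stride)


def plan_layers(input_size, base_channels, strides):
    """Return (spatial size, channels) after each convolution.

    Channels double whenever a stride-2 convolution downsamples the map.
    """
    size, channels = input_size, base_channels
    plan = []
    for stride in strides:
        if stride == 2:
            channels *= 2
        size = conv_output_size(size, stride)
        plan.append((size, channels))
    return plan


# Hypothetical layout of seven convolutions on a 32 x 32 input.
strides = [1, 1, 2, 1, 2, 1, 2]
plan = plan_layers(32, 16, strides)

# A stride of 2 halves each spatial dimension, so the area falls by 75%.
area_reduction = 1 - (16 * 16) / (32 * 32)
for index, (size, channels) in enumerate(plan, start=1):
    print(f"conv {index}: {size} x {size} x {channels}")
```

Doubling the channels at each downsampling step keeps the total number of activations from shrinking by more than half, which is one common reading of "preventing a representation bottleneck".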

Fig. 1 a–c Representative images after pre-processing of metastatic lymph nodes

Fig. 2 a–c Representative images after pre-processing of negative control lymph nodes

Fig. 3 CNN architecture. An eight hidden layer neural network was constructed based on a 32 × 32 input image with filter sizes as above. Seven consecutive convolution operations are performed with feature map downsampling done by utilizing convolutional layers with a stride size of two. A fully connected layer with 512 neurons was added as the final hidden layer. Network output consisted of a two class score prediction

Training was implemented using the Adam optimizer, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments [22]. Parameters were initialized to equalize input and output variance utilizing the heuristic described by He et al. [23]. L2 regularization is implemented to prevent overfitting of data by limiting the squared magnitude of the kernel weights. To account for training dynamics, the learning rate is annealed and the mini-batch size is increased whenever training loss plateaus. Furthermore, a normalized gradient algorithm is employed to allow for locally adaptive learning rates that adjust according to changes in the input signal [22].
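The Adam update with an L2 penalty can be sketched on a one-parameter toy problem. This is a generic illustration of the published update rule, not the study's training loop: the objective, the learning rate of 0.1, and the penalty weight of 0.01 are assumptions, and learning-rate annealing and mini-batch growth are omitted.

```python
import math


def adam_minimize(grad_fn, w, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam on a scalar parameter and return the trajectory of w."""
    m = v = 0.0
    history = [w]
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
        history.append(w)
    return history


L2_WEIGHT = 0.01  # hypothetical regularization strength


def grad(w):
    """Gradient of the toy loss (w - 3)^2 + L2_WEIGHT * w^2."""
    return 2.0 * (w - 3.0) + 2.0 * L2_WEIGHT * w


history = adam_minimize(grad, w=0.0, steps=500)
# The L2 term pulls the optimum slightly towards zero: 3 / (1 + L2_WEIGHT).
optimum = 3.0 / (1.0 + L2_WEIGHT)
print(history[1], history[-1], optimum)
```

Two properties are visible in the toy run: the first step has magnitude close to the learning rate regardless of the gradient's scale, and the penalty shifts the minimizer from 3 to roughly 2.97, illustrating how L2 regularization limits weight magnitude.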

Due to the small sample size, five-fold cross-validation was utilized to evaluate network performance (80% training and 20% validation in each fold). This method involves initially splitting the available data into five random groupings. One of the groups is utilized as the initial validation set to fine-tune the parameters of the network trained on the other four groups. After parameter tuning is complete, the group utilized as the validation set is changed and the network is retrained on the remaining four groups using the same parameters. The process is repeated until every one of the five groups of data is utilized as a validation set once.
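The splitting procedure can be sketched as follows. The function name, the fixed seed, and the unstratified shuffle are assumptions; the text does not say whether folds were balanced by class.

```python
import random


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (training_indices, validation_indices) for each fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        validation = folds[k]
        training = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield training, validation


# 133 + 142 = 275 lymph nodes in total.
splits = list(five_fold_splits(275))
for training, validation in splits:
    print(len(training), len(validation))
```

With 275 nodes, each fold trains on 220 nodes and validates on 55, and every node is used for validation exactly once.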

Software code for this study was written in Python using the TensorFlow module (1.0.0). Experiments and CNN training were performed on a Linux workstation with an NVIDIA GTX 1070 Pascal GPU with 8 GB of on-chip memory, an i7 CPU, and 32 GB of RAM.

Results

A total of 133 metastatic lymph nodes and 142 negative control lymph nodes were included in this study. For each lymph node, a final softmax score threshold of 0.5 was used for classification. Based on this, mean five-fold cross-validation accuracy was calculated at 84.3%.
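The classification rule can be sketched in a few lines. The two-element score layout (negative control first, metastatic second), the strict inequality at the threshold, and the toy labels are assumptions for illustration; none of the numbers below are study results.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, threshold=0.5):
    """Return 1 (metastatic) if the metastatic probability exceeds the threshold.

    `scores` is assumed to be [negative_score, metastatic_score].
    """
    return 1 if softmax(scores)[1] > threshold else 0


def accuracy(predictions, labels):
    """Fraction of predictions that match the reference labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy scores and labels, purely illustrative.
toy_scores = [[0.0, 2.0], [1.0, -1.0], [0.2, 0.9], [0.1, 1.5]]
toy_labels = [1, 0, 0, 1]
toy_predictions = [classify(s) for s in toy_scores]
print(toy_predictions, accuracy(toy_predictions, toy_labels))
```

For a two-class softmax, a threshold of 0.5 is equivalent to choosing the class with the larger raw score.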

Manual inspection of the network's false positive and false negative predictions revealed no consistent features that led to misclassification.

The CNN was trained for a total of 22,000 iterations (approximately 1500 epochs with batch sizes ranging from 12 to 24) before convergence. A single forward pass during test time for classification of new cases can be achieved in 0.0043 s.
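As a consistency check on these figures, the implied average mini-batch size can be derived from the iteration and epoch counts. The assumption here is that each fold trains on 80% of the 275 lymph nodes; the variable names are illustrative.

```python
total_nodes = 133 + 142                      # all lymph nodes in the study
training_nodes = total_nodes * 4 // 5        # 80% used for training per fold
iterations, epochs = 22_000, 1_500

iterations_per_epoch = iterations / epochs
mean_batch_size = training_nodes * epochs / iterations

print(training_nodes, round(iterations_per_epoch, 2), mean_batch_size)
```

Under that assumption the implied average mini-batch holds 15 images, which falls inside the reported range of 12 to 24.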

Discussion

To our knowledge, this is the first study applying deep machine learning with a CNN-based algorithm to predict axillary lymph node metastasis from imaging data. Our study shows that it is feasible to use a CNN-based algorithm for axillary evaluation on a breast MRI dataset, yielding reasonable diagnostic performance (accuracy of 84%) even with a relatively small dataset.

Previously published studies evaluating the axilla with MRI have reported an average accuracy of 75% (range 71–85%) in predicting axillary metastasis [13,14,15]. In a retrospective study, Hwang et al. analyzed the performance of AUS, MRI, and PET-CT in the detection of axillary lymph node metastasis (ALNM). AUS, MRI, and PET-CT had accuracies of 77.1, 77.9, and 81.1%, respectively. The combination of MRI and PET-CT was most accurate, with an accuracy of 83.1%. However, routine use of both MRI and PET-CT for axillary evaluation may not be cost effective.

In a retrospective analysis of 505 patients by Hiecken et al. [14], the performance of breast MRI was assessed on both a patient-by-patient and a node-by-node basis. Their patient pool included patients with stages T1–T4. The accuracy of MRI in detection of ALNM was 69.7–71.3%. Abe et al. [15] performed a prospective patient-by-patient analysis of 50 patients with stages T1–T3 breast cancer. The accuracy of MRI in detection of ALNM was 74%. Scaranelo et al. [13] prospectively evaluated the performance of MRI in evaluation of ALNM in 61 patients. The reported accuracy was 85%. The study was limited by a small sample size and subjective evaluation of the lymph nodes. Furthermore, inter-observer agreement was limited when the T1-weighted images were interpreted qualitatively (κ = 0.57 for the first reading and κ = 0.78 for the second reading).

In our study, we have shown a validation accuracy of 84%, which is comparable to the highest accuracy previously published in the literature [13,14,15]. In comparison to the Scaranelo study, we had a larger sample size, and our approach was more objective: the entire lymph node was segmented and then analyzed systematically, rather than relying on a region of interest manually defined within the lymph node by the reader.

By applying deep machine learning with a CNN-based algorithm, we were able to generate reasonable diagnostic performance in predicting axillary lymph node metastasis even with a small dataset. A larger dataset will likely improve our prediction model.

Our study has limitations. It is a small, retrospective study in a single institution. The performance of CNN has been shown to increase logarithmically with larger datasets [15]. Larger MRI datasets are likely to significantly improve the metastatic axillary lymph node prediction model. In addition, patients in this study underwent MRI at different magnetic field strengths (1.5 or 3.0 T), but the field strength was determined randomly based on scanner availability, thus limiting selection bias. Other limitations are inherent to this technology, including potentially long training times. Traditional algorithms take comparatively much less time to train; however, this is reversed at test time, where a CNN can take much less time to execute.

In conclusion, it is feasible for current deep CNN architectures to be trained to predict the likelihood of axillary lymph node metastasis. A larger dataset will likely improve our prediction model, which could potentially become a non-invasive alternative to core needle biopsy and even sentinel lymph node evaluation. Future research with a prospective randomized study is needed to further validate our findings.