
1 Introduction

Screening for indeterminate breast lesions makes early detection of breast cancer possible [1,2,3,4,5,6]. Clinically, the most common and most effective techniques are ultrasound [7] and mammography [8, 9] imaging. When suspicious lesions are found, further analyses using biopsies [10], histopathological images [11,12,13] and magnetic resonance imaging (MRI) are performed [14].

Ultrasound produces high-quality images without the need for ionizing radiation and enables the detection of very small lesions, including masses and microcalcifications. However, mammography (X-ray) is currently the most widely used imaging method for early breast cancer detection in both symptomatic and asymptomatic patients [2], and it reduces unnecessary biopsies. The World Health Organization also recommends it as the standard imaging procedure for early diagnosis.

Specialists interpret breast images using the latest version of the Breast Imaging Reporting and Data System (BI-RADS) [15,16,17]. Nevertheless, traditional manual diagnosis is time-consuming and prone to diagnostic errors [18, 19]. Digital images of physiological structures can be processed to reveal hidden diagnostic features [20].

Automated techniques based on Deep Learning (DL) and Machine Learning (ML) [20,21,22,23,24,25] can be used for classification, to improve diagnostic accuracy and lesion localization, and to monitor tumor progression. Convolutional neural networks (CNNs) have been extensively used to analyze medical images [27,28,29,30,31,32,33]. A recent paper by Jimenez et al. [27] reviews DL applications in breast cancer using ultrasound and mammography images.

There are many semi-automated breast tumor classification methodologies [34, 35]. For instance, Ragab et al. [2] used a deep CNN and replaced the last fully connected layer with an SVM as the breast tumor classifier. However, these semi-automated methods cannot fully relieve the diagnostic burden of the pathologist. Thus, fully automatic DL techniques have recently gained attention due to their superior performance in automatic feature extraction, feature selection and feature discrimination for breast lesion classification [16, 36,37,38]. A number of CNN architectures, e.g. AlexNet [39], VGGNet [40], ResNet [41], Inception (GoogLeNet) [42] and DenseNet [43], are of great value in screening and reduce the need for manual processing by experts, thus saving time and resources. In this work we selected DenseNet because it alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse and reduces the number of parameters, as indicated by Huang et al. [18].

Therefore, the principal contribution of this paper is a novel deep CNN method for automatic segmentation, combined with a DenseNet for feature selection and classification of breast lesions in both Cranio-Caudal (CC) and Medio-Lateral Oblique (MLO) mammography views, together with a discussion of the results obtained with this network.

2 Materials and Methodology

The workflow for this methodology is illustrated in Fig. 1 and consists of the following steps: (1) breast dataset acquisition and preprocessing; (2) Region of Interest (RoI) segmentation using a Mask R-CNN with the RoIAlign technique; (3) feature selection, extraction and classification using the DenseNet architecture; (4) evaluation of performance metrics. The Mask R-CNN and RoIAlign are discussed below.

Fig. 1. Methodology workflow.

2.1 Dataset

Images from the public Breast Cancer Digital Repository (BCDR) were used for training and evaluation of the CNN. The BCDR-DM [44] mammography dataset contains 724 patients (723 female and 1 male). In addition to individual clinical data, each patient's mammograms include both CC and MLO views as well as the coordinates of the lesion contours. The images are grey-level mammograms with a resolution of 3328 (width) by 4084 (height) or 2560 (width) by 3328 (height) pixels, depending on the compression plate used in the acquisition (chosen according to the breast size of the patient).

2.2 Segmentation

Preprocessing consists of breast border extraction, pectoral muscle removal and tumor delineation from the background [24]. This is followed by Region of Interest (RoI) segmentation, which is necessary to target and crop the bounding box of the lesions automatically. For that, a hold-out split was used to divide the dataset into 80% for training (579 images) and 20% for testing (145 images); the 579 training segmentations were manually delineated by specialized radiologists based on BI-RADS criteria (Fig. 2).
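For illustration, the 80/20 hold-out split can be reproduced with scikit-learn; the sketch below is not the authors' code, and the image directory, file pattern and random seed are assumptions.

```python
# Minimal sketch of the 80/20 hold-out split described above (not the authors' code);
# the image directory, file pattern and random seed are illustrative assumptions.
from glob import glob

from sklearn.model_selection import train_test_split

image_paths = sorted(glob("BCDR/images/*.png"))  # hypothetical location of the 724 mammograms

train_paths, test_paths = train_test_split(
    image_paths,
    test_size=0.20,   # 20% held out for testing (approx. 145 images)
    random_state=42,  # fixed seed so the split is reproducible
    shuffle=True,
)
print(len(train_paths), len(test_paths))
```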

Fig. 2. Illustration of the RoI binary mask contour, selected by radiologists.

Once the RoI is detected and cropped, we extract the feature maps of the tumor contour with a Mask R-CNN [45] network trained using the RoI alignment (RoIAlign) technique. RoIAlign uses bilinear interpolation to smoothly crop a patch from the full-image feature maps, guided by the proposals of a region proposal network (RPN), and then resizes the cropped patch to a fixed spatial size. This has been shown to outperform RoI pooling [28].

The RoIAlign method samples four points in each bin of the dashed grid (Fig. 3).

Fig. 3. Breast-Dense workflow for breast tumor classification: (1) the image is input into the Mask R-CNN for feature extraction; (2) the Region Proposal Network (RPN) generates N proposal windows for each image; (3) the RoIAlign layer generates a fixed-size feature map for each RoI, which is passed to the convolutional layers; and (4) feature selection and classification of the RoI feature maps are carried out by the DenseNet architecture.

Here, the value of each sampling point is computed by bilinear interpolation from the nearby grid points on the feature map. In contrast, RoI pooling uses max-pooling to convert the features in a projected region of arbitrary size h × w into a small fixed window H × W: the input region is divided into an H × W grid of sub-windows of approximate size (h/H) × (w/W), and max-pooling is applied to each sub-window.
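For illustration, both operators are available in torchvision; the sketch below (not the authors' code; the feature-map size, box coordinates and output size are illustrative) contrasts RoIAlign with RoI pooling on a single proposal.

```python
# Contrast RoIAlign (bilinear sampling) with RoI pooling (quantized max-pooling)
# using torchvision's operators; sizes and coordinates below are illustrative.
import torch
from torchvision.ops import roi_align, roi_pool

features = torch.randn(1, 256, 50, 50)  # backbone feature map: 1 image, 256 channels, 50x50

# One region proposal in feature-map coordinates: (batch_index, x1, y1, x2, y2)
boxes = torch.tensor([[0.0, 7.3, 11.8, 30.1, 42.6]])

# RoIAlign: sampling_ratio=2 means 2x2 = four bilinear sampling points per output bin
aligned = roi_align(features, boxes, output_size=(7, 7), spatial_scale=1.0, sampling_ratio=2)

# RoI pooling: quantizes the box to the feature grid and max-pools each sub-window
pooled = roi_pool(features, boxes, output_size=(7, 7), spatial_scale=1.0)

print(aligned.shape, pooled.shape)  # both: torch.Size([1, 256, 7, 7])
```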

During the Mask R-CNN training, the loss function L (Eq. 1) is minimized,

$$ L = L_{class} + L_{box} + L_{mask} $$
(1)

where \(L_{class}\) is the classification loss, \(L_{box}\) is the bounding-box regression loss, and \(L_{mask}\) is the average binary cross-entropy loss of the mask prediction. The terms \(L_{class} + L_{box}\) and \(L_{class}\) are defined by Eqs. (2) and (3).

$$ L_{class} + L_{box} = \frac{1}{{N_{class} }}\sum\nolimits_{i} {L_{class} \left( {p_{i} ,p_{i}^{*} } \right)} + \frac{1}{{N_{box} }}\sum\nolimits_{i} {p_{i}^{*} L_{1}^{smooth} \left( {t_{i} - t_{i}^{*} } \right)} $$
(2)
$$ L_{class} \left( {p_{i} ,p_{i}^{*} } \right) = - p_{i}^{*} \log p_{i} - (1 - p_{i}^{*} )\log (1 - p_{i} ) $$
(3)

where the smooth L1 function in Eq. (2) is given by:

$$ smooth_{L1} (x) = \left\{ {\begin{array}{*{20}c} {0.5x^{2} } & {if\left| x \right| < 1} \\ {\left| x \right| - 0.5} & {otherwise} \\ \end{array} } \right. $$
(4)

and the mask loss \(L_{mask}\) is:

$$ L_{mask} = - \frac{1}{{m^{2} }}\sum\nolimits_{1 \le i,j \le m} {\left[ {y_{ij} \log \hat{y}_{ij}^{k} + \left( {1 - y_{ij} } \right)\log \left( {1 - \hat{y}_{ij}^{k} } \right)} \right]} $$
(5)

The variables used in Eqs. (1)–(5) are defined in Table 1.

Table 1. Definition of the variables in Eqs. (1)–(5).
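For illustration only, a simplified PyTorch sketch of the multi-task loss of Eqs. (1)–(5) is given below; the tensor shapes, inputs and the normalization of the box term are assumptions, not the authors' training code.

```python
# Simplified sketch of the Mask R-CNN multi-task loss of Eqs. (1)-(5);
# shapes, inputs and normalization choices are illustrative, not the authors' implementation.
import torch
import torch.nn.functional as F

def mask_rcnn_loss(p, p_star, t, t_star, y_hat, y):
    """p:       predicted object probability per anchor, shape (N,)
    p_star:  ground-truth anchor label (1 = object, 0 = background), shape (N,)
    t, t_star: predicted / ground-truth box offsets, shape (N, 4)
    y_hat, y:  predicted mask probabilities / binary ground-truth mask, shape (m, m)"""
    # Eq. (3): binary cross-entropy classification loss, averaged over anchors
    l_class = F.binary_cross_entropy(p, p_star)
    # Eqs. (2) and (4): smooth L1 box-regression loss on positive anchors only,
    # normalized here by the number of positive anchors as a simplification of N_box
    per_box = F.smooth_l1_loss(t, t_star, reduction="none").sum(dim=1)
    l_box = (p_star * per_box).sum() / p_star.sum().clamp(min=1)
    # Eq. (5): average binary cross-entropy over the m x m mask
    l_mask = F.binary_cross_entropy(y_hat, y)
    # Eq. (1): total loss
    return l_class + l_box + l_mask
```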

2.3 Feature Extraction and Classification: DenseNet Architecture

After the Mask R-CNN segmented each RoI, DenseNet carried out the feature extraction and classification process. DenseNet presents several advantages over other pretrained CNN methods: higher accuracy, a lower tendency to overfit, and more efficient training of its cross-layer connection structure, because it contains shorter connections between layers [18].

In addition, the CNN consists of a number of feedforward layers implementing convolutional filters and pooling layers. After the last pooling layer, the CNN has several fully connected layers that convert the 2D feature maps of the previous layers into a 1D vector for classification [22]. This is represented as:

$$ G\left( X \right) = g_{N} \left( {g_{N - 1} \left( { \ldots \left( {g_{1} \left( X \right)} \right)} \right)} \right) $$
(6)

Here, N is the number of hidden layers, X is the input signal and \(g_{N}\) is the function corresponding to layer N. A typical convolutional layer of a CNN model consists of a function g with multiple convolutional kernels (\(h_{1}\), …, \(h_{k - 1}\), \(h_{k}\)). Each \(h_{k}\) denotes the linear function of the kth kernel, given by:

$$ h_{k} \left( {x,y,z} \right) = \sum\limits_{s = - m}^{m} {\sum\limits_{t = - n}^{n} {\sum\limits_{v = - d}^{w} {V_{k} } } } \left( {s,t,v} \right)X\left( {x - s,y - t,z - v} \right) $$
(7)

where (x, y, z) is the pixel position in the input X; m, n and w are the height, width and depth of the filter; and \(V_{k}\) is the weight of the kth kernel. The CNN schematic is shown in Fig. 3.
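For illustration, a minimal torchvision sketch of a DenseNet backbone with a two-class (benign/malignant) head is given below; the DenseNet-121 variant, pretrained weights and input size are assumptions rather than the authors' exact configuration.

```python
# Sketch of a DenseNet classifier with a two-class (benign/malignant) head;
# the DenseNet-121 variant, ImageNet weights and 224x224 input size are assumptions.
import torch
import torch.nn as nn
from torchvision import models

# torchvision >= 0.13 weight enum; older versions use pretrained=True
model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)

# Replace the final fully connected layer with a 2-class head (benign vs. malignant)
model.classifier = nn.Linear(model.classifier.in_features, 2)

# Forward pass on a batch of RoI crops resized to 224x224
rois = torch.randn(4, 3, 224, 224)
logits = model(rois)           # shape (4, 2)
probs = logits.softmax(dim=1)  # per-class probabilities
```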

2.4 Evaluation Metrics

Various metrics are used to quantitatively evaluate the classifier performance of a DL system [33]. These include Accuracy (Acc), Sensitivity (Sen), Specificity (Spe), Area Under the Curve (AUC), Precision, and F1 score.
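For reference, the short sketch below shows how these metrics can be computed with scikit-learn from predicted labels and scores; the arrays are placeholders, not the paper's results.

```python
# Illustrative computation of the reported classification metrics with scikit-learn;
# y_true, y_pred and y_score are placeholder arrays, not the paper's results.
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true = np.array([0, 1, 1, 0, 1, 0])               # 0 = benign, 1 = malignant (ground truth)
y_pred = np.array([0, 1, 1, 0, 0, 0])               # predicted labels
y_score = np.array([0.1, 0.9, 0.8, 0.3, 0.4, 0.2])  # predicted probability of malignancy

acc = accuracy_score(y_true, y_pred)
sen = recall_score(y_true, y_pred)                  # sensitivity (recall on the malignant class)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
spe = tn / (tn + fp)                                # specificity
prec = precision_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)                # area under the ROC curve
```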

The performance of the trained Mask R-CNN model was quantitatively assessed with the mean average precision (MAP), i.e. the accuracy of lesion detection/segmentation on the validation set:

$$ MAP = \frac{{\left| {A \cap B} \right|}}{{\left| {A \cup B} \right|}} = \frac{1}{{N_{T} }}\sum\limits_{i = 1}^{{N_{T} }} {\left( {\frac{{N_{i}^{DR} }}{{N_{i}^{D} }}} \right)} $$
(8)

where A is the model segmentation result and B is the tumor contour delineated by the radiologist (the ground truth). In the above equation, \(N_{T}\) is the number of images, \({N}_{i}^{DR}\) is the area of overlap between the model-detected lesion and the true clinical lesion region, and \({N}_{i}^{D}\) is the size of the true clinical lesion.
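The right-hand side of Eq. (8) can be computed directly from binary masks, as in the short numpy sketch below; the masks and mask sizes are illustrative placeholders, not the study's data.

```python
# Sketch of the overlap-based MAP of Eq. (8): overlap between detected and true lesion
# divided by the true lesion area, averaged over images; masks below are placeholders.
import numpy as np

def lesion_overlap_ratio(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    """pred_mask: binary model segmentation (A); true_mask: radiologist ground truth (B)."""
    overlap = np.logical_and(pred_mask, true_mask).sum()  # N_i^{DR}
    true_area = true_mask.sum()                           # N_i^{D}
    return float(overlap / true_area) if true_area > 0 else 0.0

# Average over N_T validation images (random masks used only for illustration)
pairs = [(np.random.rand(64, 64) > 0.5, np.random.rand(64, 64) > 0.5) for _ in range(5)]
map_score = np.mean([lesion_overlap_ratio(p, t) for p, t in pairs])
print(map_score)
```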

3 Results

The model was trained on the training dataset and evaluated on the held-out testing dataset. The left side of Fig. 3 shows the original cropped image, and the right side the mask produced by a radiologist. The trained Mask R-CNN model achieved a MAP of 0.75 for automatic lesion delineation on the testing dataset.

3.1 Breast DenseNet

Table 2 summarizes the results of the Breast DenseNet model and compares its performance with different pre-trained models in terms of Acc, Sen, Spe and AUC.

Table 2. Summary of pre-trained DL model results in mammograms.

4 Discussion

In this work we used the BCDR dataset, which is one of the most widely used mammography databases for image processing; the others are MIAS, DDSM and INbreast [33]. The BCDR database contains 1734 patient cases with mammography and ultrasound images, clinical history and lesion segmentations, and has been used to train convolutional networks.

With respect to the segmentation process, several traditional methodologies have been used to extract the RoI area: (i) threshold-based segmentation, (ii) region-based, (iii) pixel-based, (iv) model-based, (v) edge-based, (vi) fuzzy theory, (vii) artificial neural network (ANN) and (viii) active contour-based methods [27]. However, those studies used manual segmentation, and errors in the accuracy of the tumor delineation can affect the classification results. This is one of the main reasons why researchers are adopting DL architectures. For example, Chiao et al. [25] built an automatic segmentation and classification model based on Mask R-CNN for ultrasound images. It reached a mean average precision (MAP) of 0.75 for detection and segmentation, which is similar to our result, and a benign/malignant classification accuracy of 85%.

For the detection and classification process, some traditional studies used the support vector machine (SVM) methodology [2, 49]. Those methods extracted features manually from the RoI in breast ultrasound images and then fed these features into the SVM classifier, which labeled lesions as benign or malignant using texture, morphological and fractal features. Such manual feature engineering was not necessary in the present work.

DL methods have been widely used for their excellent performance in medical image classification. AlMasni [46] trained the YOLO method on clinical mammography images and successfully identified breast masses (Acc = 97%). Alkhaleefah et al. [50] used a transfer learning technique to classify benign and malignant breast cancer with various CNN architectures: AlexNet, VGGNet, GoogLeNet and ResNet. However, these networks had been trained on large datasets such as ImageNet, which do not contain labeled breast cancer images, leading to poor performance. Huang et al. [18] used a dense CNN for object recognition tasks and obtained significant improvements over the previous state-of-the-art [50, 51] with less computation. He et al. [50] and Huang et al. [51] showed that not all layers may be needed and highlighted the fact that there is a great amount of redundancy in deep residual (ResNet) networks.

Based on these observations, our work used the DenseNet architecture. The Breast-DenseNet DL system presented here can detect the locations of masses on mammograms and classify them as benign or malignant from the automatically segmented region, achieving an accuracy of 97.7%. The proposed methodology also successfully identified breast masses in dense tissues. We did not require filtering and noise elimination before segmentation and feature extraction to improve accuracy [46]. The RoI regions were automatically delineated, and tumor feature extraction was carried out by the Mask R-CNN.

5 Conclusions

We conclude that DL offers an improvement over other approaches. The Breast-Dense strategy improves the state-of-the-art classification accuracy on the BCDR dataset. The Mask R-CNN + DenseNet model trained on this dataset achieved the best overall accuracy and was used to develop a tumor lesion classification tool.

Breast-DenseNet provided highly accurate diagnoses when distinguishing benign from malignant tumors. Therefore, its predictions could be used as a preliminary tool to assist the radiologist's diagnosis. Our future research includes deeper architectures as well as ultrasound, histopathology and PET images to deal with the problems encountered in mammography images of highly dense breasts. It will also be helpful to include other imaging techniques, in combination with mammography, during the learning process, to help the model work as a robust breast mass predictor. In conclusion, Table 2 shows that Breast DenseNet achieved better results than other state-of-the-art methods that used the same public dataset.