1 Introduction

Breast cancer is a major health care problem worldwide, accounting for 627,000 cancer deaths among women, according to the World Health Organization (WHO) [10, 28]. Mammography is the standard clinical imaging protocol for detecting cancer and pre-cancer in patients [20]. However, reading mammograms requires considerable skill, especially in early cases. Computer-aided detection (CAD) systems have been developed to reduce the workload of doctors and experts and to improve their detection accuracy [22, 25, 32].

Machine learning, deep learning, and artificial intelligence have shown great success in solving medical problems [1, 14]. Recently, convolutional neural networks (CNNs) have shown impressive performance in pattern recognition, classification [32], object detection [29], and disease diagnosis [2, 7, 33], and more specifically in breast cancer detection [3, 6, 8, 12, 18, 24, 27, 30, 31]. For example, Lévy et al. [18] applied GoogLeNet to classify breast masses as benign or malignant using the cropped Digital Database for Screening Mammography (DDSM) dataset, achieving an accuracy of 92.9%. Yi et al. [31] used GoogLeNet to classify cropped DDSM mammograms as benign or malignant with an accuracy of 85%. Chen et al. [8] used a fine-tuned ResNet model to classify region-of-interest (ROI) CBIS-DDSM images as benign or malignant, achieving an accuracy of 93.15%. Xi et al. [30] used VGGNet to classify ROI CBIS-DDSM data as mass or calcification, achieving an accuracy of 92%. Castro et al. [6] used a CNN model to classify mass nodules in full mammograms as benign or malignant, achieving a sensitivity of 80% on the CBIS-DDSM database. Tsochatzidis et al. [27] fine-tuned the ResNet-101 model to classify mass nodules as benign or malignant, achieving an accuracy of 75.3% on the CBIS-DDSM database. Ragab et al. [24] achieved an accuracy of 87.2% using the AlexNet model and the ROI CBIS-DDSM database to classify mass nodules as benign or malignant. Ansar et al. [3] achieved an accuracy of 74.5% on the same task using the MobileNet model and ROI CBIS-DDSM data. Hekal et al. [12] used AlexNet and ROI CBIS-DDSM data to classify tumor-like regions as benign or malignant, achieving an accuracy of 95%.

Although efficient methods for breast cancer detection have been presented, further advances are needed to improve the detection accuracy. This paper develops a system for the early detection of breast cancer using the ROI CBIS-DDSM database [17]. Table 1 shows typical examples from the ROI CBIS-DDSM database, which contains two categories, i.e., malignant and benign, and two types of nodules, i.e., mass and calcification. The main features and contributions of this work are as follows:

  • The proposed ensemble learning processes suspected nodule regions (SNRs) instead of the full ROI images, which offers four advantages: (i) the ability to detect smaller nodules within the SNRs, (ii) helping the model focus only on the part to be classified (the tumor), (iii) eliminating the overhead of processing the whole ROI image, and (iv) improving the detection accuracy.

  • The proposed ensemble learning applies transfer learning with a shallow classifier (SVM), which offers three advantages: (i) eliminating the need for big data for training, (ii) transferring the weights of the convolutional layers without training, and (iii) eliminating the need to design and build new CNN models.

  • The proposed system applies a simple first-order momentum, guided by the achieved training accuracies of the models, to fuse the binary outputs of the ensemble, which further improves the accuracy of the system.

  • The proposed system achieves superior performance over the state-of-the-art methods on the challenging standard ROI CBIS-DDSM dataset [17].

Table 1 Typical samples of the classes of the ROI CBIS-DDSM database

Fig. 1 Proposed ensemble learning system for breast cancer detection, composed of four steps: extracting suspected regions (SNR image), ensemble learning, shallow classification, and fusion

The rest of this paper is organized as follows: Sect. 2 presents the materials and methods; Sect. 3 presents the findings and relevant discussion; finally, Sect. 4 concludes the paper.

2 Research methods

The proposed ensemble system (Fig. 1) is based on four processing steps: extracting suspected regions (SNR image), ensemble learning, shallow classification, and decision fusion for the final diagnosis. This section describes each of these steps.

Fig. 2 Extracting SNR image using Otsu thresholding

2.1 Extracting suspected nodule regions (SNRs)

The proposed system extracts the SNRs from the ROI images using automated Otsu thresholding [21], where the threshold is computed dynamically for each input ROI image. The method operates on a smoothed version of the input ROI image, obtained with a zero-mean Gaussian kernel whose variance is set to four in order to suppress high-frequency noise. The optimal Otsu threshold is estimated by minimizing the intra-class intensity variance computed from the image histogram [9, 26]. Figure 2 illustrates the steps of extracting the SNR image. The SNR estimation algorithm is summarized in Algorithm I:

Algorithm I SNR estimation

The main advantages of SNR extraction are the ability to detect smaller size nodules within the SNRs, less algorithmic overhead, and the improvement in the detection accuracy.
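The extraction step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' Algorithm I: the ROI is assumed to be an 8-bit grayscale image stored as a list of rows, the Gaussian kernel is truncated at three standard deviations with replicated borders, and pixels whose smoothed intensity does not exceed the Otsu threshold are set to zero. The function names `gaussian_kernel`, `smooth`, `otsu_threshold`, and `extract_snr` are hypothetical.

```python
import math


def gaussian_kernel(variance=4.0):
    """1-D zero-mean Gaussian weights, truncated at three standard deviations."""
    radius = int(math.ceil(3.0 * math.sqrt(variance)))
    weights = [math.exp(-(k * k) / (2.0 * variance)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(image, variance=4.0):
    """Separable Gaussian blur; borders are handled by replicating edge pixels (assumption)."""
    kernel = gaussian_kernel(variance)
    radius = len(kernel) // 2

    def blur_line(line):
        last = len(line) - 1
        return [
            sum(w * line[min(max(i + k - radius, 0), last)] for k, w in enumerate(kernel))
            for i in range(len(line))
        ]

    blurred_rows = [blur_line(row) for row in image]
    blurred_cols = [blur_line(list(col)) for col in zip(*blurred_rows)]
    return [list(row) for row in zip(*blurred_cols)]


def otsu_threshold(pixels, levels=256):
    """Gray level that maximizes the between-class variance, which is
    equivalent to minimizing the intra-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]            # pixels with intensity <= t
        sum0 += t * hist[t]
        w1 = total - w0          # pixels with intensity > t
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def extract_snr(roi, variance=4.0):
    """Keep original pixels whose smoothed intensity exceeds the Otsu threshold."""
    smoothed = [[int(round(v)) for v in row] for row in smooth(roi, variance)]
    t = otsu_threshold([v for row in smoothed for v in row])
    return [
        [pixel if s > t else 0 for pixel, s in zip(row, srow)]
        for row, srow in zip(roi, smoothed)
    ]
```

Because the threshold is recomputed from each image's own histogram, no global intensity cut-off has to be tuned across the dataset.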

2.2 Ensemble learning

The ensemble is composed of four pretrained CNN networks: AlexNet, ResNet-50, ResNet-101, and DenseNet-201. We select these networks because they are widely used for data classification, especially for breast tumor classification (e.g., [12] and [24] used AlexNet, [8] and [27] used ResNet, and [19] used DenseNet).

The input to the ensemble is the SNR image resized to the standard input size of each network, i.e., 227 \(\times\) 227 for AlexNet and 224 \(\times\) 224 for ResNet-50, ResNet-101, and DenseNet-201 (see Table 2). AlexNet [15] contains five convolutional layers (conv), three max pooling layers, and three fully connected layers (Fc6, Fc7, and Fc8) (see Fig. 3). Each convolutional layer consists of convolutional filters and a nonlinear ReLU activation function. ResNet [27] is an abbreviation for Residual Network. The basic idea of a ResNet model is to skip blocks of convolutional layers by using shortcut connections. ResNet-50 contains 49 convolutional layers, one max pooling layer, one average pooling layer, and one fully connected layer (see Fig. 4). ResNet-101 contains 100 convolutional layers, one max pooling layer, one average pooling layer, and one fully connected layer (see Fig. 5). DenseNet [13] also relies on shortcut connections, as ResNet does, but connects each layer to all preceding layers within a block. DenseNet-201 contains 200 convolutional layers, four max pooling layers, one average pooling layer, and one fully connected layer (see Fig. 6).
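As a brief illustration of the shortcut idea, the following expressions use the standard formulations from the ResNet and DenseNet literature; the symbols are introduced here for illustration and are not notation from this paper. A residual block with input \(x\) and a stack of convolutional layers \(F\) outputs

$$\begin{aligned} y = F(x) + x, \end{aligned}$$

whereas the \(l\)-th layer \(H_l\) of a dense block operates on the concatenation of all earlier feature maps in that block:

$$\begin{aligned} x_l = H_l\left( [x_0, x_1, \ldots , x_{l-1}]\right) . \end{aligned}$$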

Fig. 3 AlexNet architecture

Fig. 4 ResNet-50 architecture

Fig. 5 ResNet-101 architecture

Fig. 6 DenseNet-201 architecture

Table 2 Summary of the four pre-trained CNN models. The symbol “#” in the table indicates the number, I/P indicates the input, FC denotes fully connected, and CL denotes convolutional layers

In the proposed system, transfer learning is adopted to decrease the training overhead. Therefore, the weights of the convolutional layers of the pretrained models are transferred without training, and only the fully connected layers are trained using the ROI CBIS-DDSM data. To apply transfer learning to the CNN models, the last fully connected layer of each pretrained model (the FC8 layer in AlexNet or the FC1000 layer in ResNet-50, ResNet-101, and DenseNet-201) is replaced by a shallow classifier, namely a support vector machine (SVM). The activation vector of the FC7 layer in AlexNet, or of the flatten layer (just before FC1000) in ResNet-50, ResNet-101, and DenseNet-201, serves as the feature descriptor of the input ROI CBIS-DDSM image. The features are then normalized to the range between 0 and 1 before being fed to the SVM classifier.
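The normalization step can be sketched as below. The text does not state how the scaling is computed, so this sketch assumes per-feature min-max scaling with the minimum and maximum taken from the training features and reused for test features (values outside the training range are clipped). The function names `fit_min_max` and `apply_min_max` are hypothetical.

```python
def fit_min_max(train_features):
    """Per-feature minimum and maximum over the training feature vectors."""
    columns = list(zip(*train_features))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(features, lows, highs):
    """Scale each feature to [0, 1] using the training statistics (assumption)."""
    scaled = []
    for row in features:
        scaled_row = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            x = (value - low) / span if span > 0 else 0.0  # constant feature -> 0
            scaled_row.append(min(max(x, 0.0), 1.0))       # clip unseen extremes
        scaled.append(scaled_row)
    return scaled
```

Reusing the training statistics at test time keeps the test set from influencing the scaling, which is a common design choice rather than something the paper specifies.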

2.3 Shallow classifier

To classify the ROI images, a binary SVM classifier is used to account for the variability of the classes. The support vector machine has the advantage of a lower risk of over-fitting [4, 16]. In addition, it has been used repeatedly in the literature for breast cancer classification [12, 24]. The idea of the SVM is to learn separating hyperplanes in a high-dimensional feature space [11, 23]. The input of the classifier is the activation vector (FC7 in AlexNet or the flatten layer in ResNet-50, ResNet-101, and DenseNet-201), and the output is the binary classification of the input image (e.g., benign or malignant).
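For reference, the separating-hyperplane idea can be written in the standard soft-margin form. The symbols are assumptions introduced here for illustration: \(x_i\) is the normalized feature vector of the \(i\)-th training image, \(y_i \in \{-1, +1\}\) its class label, \(\phi\) the feature map induced by the kernel, \(C\) the regularization constant, and \(\xi_i\) the slack variables. The kernel and the value of \(C\) used in this work are not stated in the text.

$$\begin{aligned} \min _{w, b, \xi } \ \frac{1}{2}\Vert w\Vert ^2 + C \sum _{i=1}^{N} \xi _i \quad \text {subject to} \quad y_i\left( w^{\top } \phi (x_i) + b\right) \ge 1 - \xi _i, \quad \xi _i \ge 0 \end{aligned}$$

A new feature vector \(x\) is then assigned the class given by the sign of \(w^{\top } \phi (x) + b\).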

2.4 Fusion

A binary Support Vector Machine (SVM) follows each CNN model to provide either a binary one, e.g., malignant class, or a binary zero, e.g., benign class. To obtain the final system decision, the first-order momentum is derived over the four outputs of the ensemble, taking into account the network training accuracies, following Algorithm II (see Fig. 7 for typical examples).

Algorithm II Fusion of the ensemble outputs

Fig. 7 Four typical examples illustrating the proposed fusion of the four networks’ SVM binary outputs (left), multiplied by the accuracy of each network (right). If the first-order momentum (mean) is \(>0.5\), class 1 is chosen; if the mean is \(<0.5\), class 0 is chosen; if the mean \(=\) 0.5, a tie is reached and the system chooses the class with the larger sum of accuracies
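The decision rule described in the caption of Fig. 7 can be sketched as follows. This is a minimal reading of that caption, not a reproduction of Algorithm II: the function name `fuse_decisions` is hypothetical, and the behavior when the two accuracy sums are exactly equal is not specified in the text, so the sketch assumes class 1 in that case.

```python
def fuse_decisions(outputs, accuracies):
    """Fuse binary SVM outputs (0 or 1) of the ensemble members.

    outputs    -- one binary decision per network
    accuracies -- training accuracy of each network, in the same order
    """
    votes_for_one = sum(outputs)
    n = len(outputs)
    # Mean of the binary outputs compared with 0.5, written with integers
    # (2 * votes vs. n) to avoid floating-point comparisons.
    if 2 * votes_for_one > n:
        return 1
    if 2 * votes_for_one < n:
        return 0
    # Tie: choose the class whose supporters have the larger sum of accuracies.
    sum_one = sum(a for o, a in zip(outputs, accuracies) if o == 1)
    sum_zero = sum(a for o, a in zip(outputs, accuracies) if o == 0)
    # Assumption: if the two sums are also equal, decide class 1.
    return 1 if sum_one >= sum_zero else 0
```

With four networks, a tie occurs only for a two-against-two split, so the accuracies influence the result only in that case.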

2.5 Performance evaluation

To test the proposed system, we used six standard metrics to evaluate the system performance, i.e., Accuracy (ACC), Sensitivity (SEN), Specificity (SPE), Positive Predictive Value (PPV), Negative Predictive Value (NPV), and F1 score (FSC), defined as follows [34]:

$$\begin{aligned} ACC = \frac{TP + TN}{TP + TN + FP + FN} \end{aligned}$$
(1)
$$\begin{aligned} SEN = \frac{TP}{TP + FN} \end{aligned}$$
(2)
$$\begin{aligned} SPE = \frac{TN}{TN + FP} \end{aligned}$$
(3)
$$\begin{aligned} PPV = \frac{TP}{TP + FP} \end{aligned}$$
(4)
$$\begin{aligned} NPV = \frac{TN}{TN + FN} \end{aligned}$$
(5)
$$\begin{aligned} FSC = 2 \times \frac{PPV \times SEN}{PPV + SEN} \end{aligned}$$
(6)

Here, \(TP\), \(TN\), \(FP\), and \(FN\) denote the numbers of true positive, true negative, false positive, and false negative assessments, respectively.
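As a small worked example, the six metrics can be computed from the four confusion counts (true positives, true negatives, false positives, and false negatives). The function name `classification_metrics` and the counts used in the usage line are hypothetical and are not results from this paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Six evaluation metrics from the confusion counts.

    Assumes every denominator is non-zero, i.e., both classes occur in the
    ground truth and in the predictions.
    """
    sen = tp / (tp + fn)   # sensitivity: share of actual positives found
    spe = tn / (tn + fp)   # specificity: share of actual negatives found
    ppv = tp / (tp + fp)   # positive predictive value
    npv = tn / (tn + fn)   # negative predictive value
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "SEN": sen,
        "SPE": spe,
        "PPV": ppv,
        "NPV": npv,
        "FSC": 2 * ppv * sen / (ppv + sen),
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Sensitivity and PPV share the same numerator but differ in the denominator: the former divides by the actual positive cases, the latter by the cases predicted positive.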

3 Results and discussions

This section describes, in detail, the database, the experimental setup, and the results with the related discussion.

3.1 Collected database (CBIS-DDSM)

To test the proposed CAD system, the ROI CBIS-DDSM [17] database is used, a standardized version of the Digital Database for Screening Mammography (DDSM) [5]. It contains 3549 ROI mammogram images, with 1852 calcification and 1697 mass images (typical examples are shown in Table 1).

3.2 Experimental setting

The CBIS-DDSM ROI dataset is used to test and evaluate the proposed system. The data are divided randomly into a training set (70%) and a testing set (30%). The Bayesian optimizer is used to minimize the binary cross-entropy function during the training of the deep transfer learning model, with a learning rate of \(10^{-4}\). During training, the data are shuffled and processed in mini-batches of size 128. The maximum number of epochs is set to 20.
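The random 70/30 split and the mini-batching can be sketched as follows. The seed, the rounding of the training-set size, and the function names `split_dataset` and `mini_batches` are assumptions made for illustration; the text only states the split ratio, the shuffling, and the batch size.

```python
import random


def split_dataset(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of the items and split it into training and testing parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # fixed seed (assumption) for repeatability
    n_train = int(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def mini_batches(items, batch_size=128):
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Image indices stand in for the 3549 ROI images of the dataset.
train_ids, test_ids = split_dataset(range(3549))
batches = list(mini_batches(train_ids))
```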

3.3 Comparison results

To evaluate the potential of each individual learning system, performance metrics are derived for AlexNet, ResNet-50, ResNet-101, and DenseNet-201 and compared to those of the proposed system, applying the same SNR preprocessing (see Algorithm I and Fig. 2 for the proposed SNR extraction method).

As illustrated in Fig. 8 and Table 3, all CNN models achieve acceptable accuracy, even when each CNN is used alone, since they all apply the proposed SNR extraction method. To quantify the benefit of applying SNR extraction, Fig. 9 presents the improvement in accuracy for all individual transfer learning systems. The figure shows that SNR extraction significantly improves the performance of all investigated systems, e.g., the accuracy increases by around 50% for the proposed system.

Table 3 Comparison between five systems (AlexNet, ResNet-50, ResNet-101, DenseNet-201, and the proposed system), using the SNRs extracted from the ROI CBIS-DDSM dataset and a binary SVM classifier to separate benign (B) from malignant (M), mass (MA) from calcification (CA), benign mass (BM) from malignant mass (MM), and benign calcification (BC) from malignant calcification (MC)

Fig. 8 Quantitative comparison between five systems (AlexNet, ResNet-50, ResNet-101, DenseNet-201, and the proposed system), using the SNRs extracted from the ROI CBIS-DDSM dataset and a binary SVM classifier to separate benign (B) from malignant (M), mass (MA) from calcification (CA), benign mass (BM) from malignant mass (MM), and benign calcification (BC) from malignant calcification (MC)

Fig. 9 Quantitative comparison between five systems (AlexNet, ResNet-50, ResNet-101, DenseNet-201, and the proposed system), with and without using the SNRs extracted from the ROI CBIS-DDSM dataset, with a binary SVM classifier to separate benign (B) from malignant (M)

To further highlight the advantages of the proposed system, visual and quantitative assessments of the ensemble learning and our fusion algorithm (Algorithm II) have been carried out. Sample classifications of four mass images are shown in Fig. 10. The ground truth (GT) diagnosis is benign (“B”) for the first two columns and malignant (“M”) for the last two columns. Figure 10 demonstrates how the proposed ensemble fusion algorithm (Algorithm II) produces better classification results: even if one or two CNN outputs are in error, the proposed system can still reach the correct output. The visual results are verified quantitatively in Table 3. Notably, the proposed system achieves the best performance on all three investigated metrics over all competing individual CNN systems. Furthermore, the comparison results in Table 4 show the advantage of the proposed system over other related competing methods. This is due to the inclusion of the SNR extraction step, which limits the search area to the tumor-like regions and enables the system to find small nodules correctly.

Fig. 10 Sample classification of four mass images, showing the advantage of the proposed ensemble fusion to improve the performance

Table 4 Comparison results

3.4 Computational complexity

All results in this paper are obtained using an ordinary laptop with an Intel Core i5-6200U CPU @ 2.30 GHz and 6 GB of RAM. The time performance is summarized in Table 5.

Table 5 Analysis of the time performance of the proposed system

The computational complexity of each model is determined mainly by its number of convolutional layers (CL), which Table 2 summarizes for each model. As shown in the table, AlexNet has the fewest layers, so it takes the least time (i.e., a mean time between 1.8 and 8.2 s, depending on the problem, as shown in Table 5). On the other hand, DenseNet-201 has the largest number of layers, so it takes more time (i.e., a mean time between 1.9 and 48 s, depending on the problem). The fusion step relies on a simple first-order momentum, so it takes little time (1.7 s, as shown in Table 5). The overall mean time of the proposed system is around 15–60 s per test image, depending on the problem solved. The current processing time (0.25–1 min) is acceptable for this medical application. However, since the proposed system is not near real time, in future work we will replace the current Matlab implementation with Python and rely on GPU parallel processing to reduce the processing time per test image.

4 Conclusion

In this work, we proposed a CAD system for the early detection of breast cancer based on deep learning. Unlike related work, the utilized CNN models extract features from the SNRs, obtained by automated adaptive Otsu thresholding, in order to improve the training capabilities of the deep learning model. An ensemble is used for feature extraction, followed by SVM classifiers. The final decision is taken by fusing the binary outputs of the SVM classifiers, taking into account the training accuracy of each classifier. Experimental results on the ROI CBIS-DDSM data confirm the superiority of the proposed method over the related work. In the future, other public databases for breast cancer detection will be investigated to test the robustness of the proposed system. In addition, other features and models will be tested in order to improve the performance.