
1 Introduction

Breast cancer is a complicated and multidimensional disease with a wide range of risk factors [1], histological findings, clinical manifestations, and treatment choices [2]. It is characterized by the unrestrained growth of malignant cells in the mammary epithelial tissue and affects both men and women. Breast cancer is the most common cancer among women worldwide, with a prevalence that rises with age [3], and it is the second leading cause of cancer death in women after lung cancer [4].

Recent medical advances have produced new and improved techniques to prevent, diagnose, categorize, and treat breast cancer [5]. One of the most successful ways to automatically detect and diagnose diseases at an early stage is to use a computer-aided diagnosis (CAD) tool for medical imaging. CAD binary classification methods use intelligent approaches to automatically label breast findings as benign or malignant. These medical imaging techniques may be useful in early breast cancer diagnosis [6]. As a result, given the rising importance of predictive diagnosis and therapy, there is a growing trend in cancer detection and classification research to apply machine learning algorithms for projection and prognosis [7].

The use of histopathology data to classify breast abnormalities has grown in popularity in recent years. In classic machine learning-based CAD systems, a set of hand-crafted attributes is extracted and fed to an independent classifier, so the extraction of features from histopathology images is critical [8]. Because histopathological images include many aspects, identifying the relevant features is challenging, and pathologists may lack the experience to recognize the specific requirements. Most of the time, images can be identified by patterns, fractals, colors, and intensity levels. Manual feature extraction is time-consuming and can require in-depth prior knowledge of the disease in order to find highly representative characteristics [9].

As processing power and massive amounts of data have become more readily available for the learning process, the number of studies using machine learning has increased. Because of its ability to learn from raw data, machine learning is becoming increasingly popular for dealing with both simple and complex situations, and well-chosen machine learning models can reduce diagnostic error rates [10]. Massive volumes of complex data can now be examined and comprehended using new machine learning techniques [11].

To circumvent the drawbacks of conventional machine learning techniques, deep learning was created to efficiently exploit the rich information that can be extracted from raw images for classification [12]. Deep learning uses general-purpose learning techniques to do away with manual feature tuning. Deep learning with convolutional neural networks has recently made many advances in medical image analysis, including the classification of mitotic cells in microscopic images and the identification of tumors [13]. Convolutional neural networks perform admirably with large data sets but poorly with smaller ones.

Several research teams have investigated the application of convolutional networks in medical image processing with encouraging findings [14]. Convolutional networks have been used to tackle the difficulty of identifying and classifying tumors in ultrasound images [15]. Furthermore, several recent studies evaluated the feasibility of CNNs such as AlexNet, U-Net, VGG16, VGG19, ResNet18, ResNet50, MobileNet-V2, and Xception [16] for the problem of classifying ultrasound images as benign or malignant [17].

The purpose of this article is to compare the performance of pre-trained deep learning models in order to determine which convolutional techniques are best for breast cancer binary classification, to identify the best-performing model among a large number of alternatives, and to propose a mechanism for differentiating them. Eight pre-trained CNN architectures are used to train these classification models on the ultrasound image dataset.

The remainder of the paper is structured as follows. Section 2 reviews previous studies on breast cancer classification. Section 3 covers the data pre-processing techniques, feature extraction, and methodology. Section 4 encompasses the dataset description, experimental assessment, and discussion. Finally, Sect. 5 provides the conclusion and future directions.

2 Literature Review

In recent years, numerous breast cancer classification strategies have been proposed, with CNN models receiving a lot of research interest [18]. Various machine learning and deep learning algorithms are available for cancer detection and classification. Convolutional Neural Networks, Recurrent Neural Networks, and pre-trained models such as AlexNet, GoogleNet, VGG16, VGG19, ResNet50, InceptionV3, DenseNet121, DenseNet169, DenseNet201, and Xception are among the most used deep learning approaches for breast cancer classification [19].

Using the idea of transfer learning, SanaUllah et al. developed a CNN-based framework that can recognize and binary-classify breast cytology images; their proposed framework reached 97.52% accuracy [20]. They also applied data augmentation to expand the size of the data set and improve the efficiency of the CNN algorithms. Gupta et al. used SVM and Logistic Regression to compare deep feature extraction and classification performance [24]. On the standard breast cancer dataset, their study outperforms previous strategies and produces state-of-the-art results.

Hong Fang et al. proposed an improved multilayer perceptron for breast cancer detection [21], while Saeed Talatal et al. proposed an Intelligent Ensemble Classification method based on a Multilayer Perceptron Neural Network (IEC-MLP) with an average accuracy of 98.74% for breast cancer detection [5]. The suggested approach is composed of two components: parameter optimization and ensemble classification. Esraa et al. demonstrated a fully automated breast cancer diagnosis technique with a 99.33% accuracy rate, utilizing a U-Net to segment the breast region from thermal images and a deep learning method to evaluate abnormal breast tissues [22].

Mohammed Abdullah et al. created DCNN classifier models based on Inception V3 and V4 for breast cancer detection, in order to study the behavior of several modern deep learning techniques for breast cancer diagnosis [23]. The results showed that employing color thermal imaging, DCNN Inception V4, and a modified Inception MV4 considerably increased accuracy and efficiency in detecting breast cancer. Karan Gupta et al. used a deep learning model that relied on pre-trained CNN activation features fed to traditional classifiers to automatically categorize breast cancer images [6].

As with many machine learning applications in healthcare, the difficulty of gathering sufficient positive cases complicates the development of breast cancer binary classification algorithms and makes overfitting hard to avoid [26]. Numerous subsequent papers have used generative adversarial networks (GANs) for data augmentation [27], since the training dataset can be enlarged with a GAN [28]. Shuyue et al. devised a technique for detecting breast cancer from artificial mammograms that used GANs as an image enhancement technique, achieving a validation accuracy of 79.8% [29].

Asha et al. devised a discriminating robustness approach to increase accuracy and improve the classification model's performance [30]. They compared the performance of numerous CNN + traditional classifier configurations, such as VGG-16 + SVM, VGG-19 + SVM, Xception + SVM, and ResNet-50 + SVM. According to the researchers, the ResNet-50 network reached a maximum accuracy of 93.27%. Aditya et al. suggested a Modified VGG (MVGG) model based on transfer learning to diagnose breast cancer in mammography images [25]. According to their trials, the suggested transfer learning combination of MVGG and ImageNet achieves 94.3% accuracy, and the suggested hybrid network outperforms other convnets.

The background investigation above shows that earlier authors have employed several approaches with varying degrees of success. Assessing their work uncovered several gaps that motivated this study: most prior efforts compare only a handful of algorithms, whereas we compare far more, and data augmentation approaches are rarely employed in low-data scenarios, leaving issues such as feature forecasting unaddressed. Because convolutional networks outperform earlier models, the proposed comparative performance study can aid pathologists in accurately diagnosing breast cancer at an early stage. This paper therefore presents a comparative analysis of breast cancer binary classification using convolutional networks: eight pre-trained CNN models and a GAN augmentation technique are employed to extract and forecast attributes from ultrasound images for the purpose of classifying benign and malignant lesions.

3 Methodology

The workflow of the proposed Deep Convolutional Comparison Architecture (DCCA) is divided into five components and prepared for binary classification, as described in Fig. 1. The first stage of this comparison approach is data pre-processing: the dataset is prepared and then augmented using a Generative Adversarial Network (GAN) architecture, which completes the pre-processing stage. Ian Goodfellow and his colleagues first proposed the GAN technique in June 2014 [31].

A GAN is a data enrichment tool that pits two neural networks against each other to create new, synthetic data instances that can pass for actual data; GANs are commonly used in image, video, and voice generation. After the experimental environment was set up, the networks were employed to prepare the data for classification through feature extraction and selection. Once all of the features were identified and selected, the model training phase began, in which eight different convolutional techniques were used to classify the data. Finally, the model assessment step determines the best binary classification result. The complete architecture is displayed in Fig. 1.

Fig. 1. Deep Convolutional Comparison Architecture (DCCA)

3.1 Data Pre-processing Using GAN

The size of the training dataset heavily influences a deep learning model's performance. As a result, strategies for increasing dataset cardinality, such as data augmentation, are crucial. By mitigating over-fitting, data augmentation enhances network performance. In this study, a GAN-based data augmentation tactic improves the generalizability of the fitted model.

A generator and a discriminator network were built to provide GAN augmentation. A noise vector is fed into the generator as input, and the generator creates augmented data, which are then supplied to the discriminator along with actual data; the discriminator's task is to identify which distribution each sample came from. The generator's purpose, in turn, is to learn the true distribution without seeing it, such that its output is indistinguishable from actual samples. Both networks are trained simultaneously and in opposite directions until an equilibrium is attained.

For x ∈ R^d, y = Pdata(x) is a mapping from x to actual data y in d-dimensional space. A neural network dubbed the generator G models this mapping. A sample y is genuine if it comes from Pdata; a sample z is synthetic if it comes from G. The discriminator D is a neural network that determines whether or not a sample is genuine; D(y) = 1, D(z) = 0 represents the ideal situation. G and D are the two neural networks that constitute the GAN. These adversarial networks are trained with the loss function of a two-player min-max game:

$$ \mathop {\min }\limits_{G} \mathop {\max }\limits_{D} V\left( {G,D} \right) = E_{y \sim P_{data} } \left[ {\log D\left( y \right)} \right] + E_{x} \left[ {\log \left( {1 - D\left( {G\left( x \right)} \right)} \right)} \right] $$
(1)

This min-max problem has a global optimal solution at G(x) = Pdata(x); the goal is to recover the distribution of the real data. When D(y) = D(z) = 0.5, the discriminator D can no longer tell the difference between a genuine and a manufactured sample. By varying the input x, G can be used to produce artificial samples. In this investigation, the input x for G was a noise vector with 100 attributes drawn from a Gaussian distribution N(0, 1). A well-trained GAN must be able to turn such noise vectors into data samples that appear real.
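To make the objective concrete, the sketch below shows one adversarial training step in TensorFlow/Keras, using the commonly used non-saturating variant of the generator loss; the model objects, optimizers, and batch shapes are illustrative assumptions, not the paper's published code.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def train_step(real_images, generator, discriminator, g_opt, d_opt, noise_dim=100):
    # Noise vector x ~ N(0, 1) with 100 attributes, as described above
    noise = tf.random.normal([tf.shape(real_images)[0], noise_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_out = discriminator(real_images, training=True)   # D(y)
        fake_out = discriminator(fake_images, training=True)   # D(G(x))
        # Discriminator loss: push D(y) toward 1 and D(G(x)) toward 0
        d_loss = bce(tf.ones_like(real_out), real_out) + \
                 bce(tf.zeros_like(fake_out), fake_out)
        # Generator loss (non-saturating form): push D(G(x)) toward 1
        g_loss = bce(tf.ones_like(fake_out), fake_out)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return g_loss, d_loss
```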

With the exception of the output layer, the generator network was designed with four up-sampling layers and five convolutional layers. The ReLU activation function is used in every layer, whereas the tanh function is used at the output layer. The generator's purpose is to create a 229 × 229 × 3 image from a 100-length vector. In contrast, the discriminator takes a 229 × 229 × 3 image as input and outputs a value between 0 and 1, indicating whether the image is augmented or real. Like a standard CNN, the discriminator network is constructed with four conv layers with max-pooling layers and one fully connected layer. Figure 2 shows sample outputs of augmented benign and malignant class images produced by this GAN network.
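A minimal Keras sketch of a generator/discriminator pair matching the layer counts described above (four up-sampling plus five conv layers in the generator; four conv/max-pooling stages and one fully connected layer in the discriminator) might look as follows. Since 229 is not reachable by doubling alone, the sketch upsamples to 240 × 240 and crops, which is an assumption on our part rather than the paper's exact design.

```python
from tensorflow.keras import layers, models

def build_generator(noise_dim=100):
    # Upsample a 100-length noise vector to a 229x229x3 image
    return models.Sequential([
        layers.Dense(15 * 15 * 128, input_shape=(noise_dim,)),
        layers.Reshape((15, 15, 128)),
        layers.UpSampling2D(), layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.UpSampling2D(), layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.UpSampling2D(), layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.UpSampling2D(), layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.Conv2D(3, 3, padding="same", activation="tanh"),  # tanh at the output
        layers.Cropping2D(((5, 6), (5, 6))),                     # 240x240 -> 229x229
    ])

def build_discriminator():
    # Standard CNN: four conv + max-pooling stages, one fully connected output
    return models.Sequential([
        layers.Conv2D(16, 3, padding="same", activation="relu",
                      input_shape=(229, 229, 3)),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, padding="same", activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(128, 3, padding="same", activation="relu"), layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),  # 1 = real, 0 = augmented
    ])
```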

3.2 Feature Extraction, Selection and Classification

In this study, eight different deep convolutional methods are used to extract, select, and classify features: the input image flows forward until it reaches a pre-specified layer, and that layer's outputs are taken as the resulting features. The pre-trained network thus acts as an arbitrary feature extractor and selector.
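As a hedged illustration of this pattern, the snippet below freezes a pre-trained backbone (Xception is shown, but any of the eight architectures available in tf.keras.applications could be swapped in) and uses its pooled output as the feature vector for a binary classification head.

```python
import tensorflow as tf

# Pre-trained backbone as a fixed feature extractor/selector
base = tf.keras.applications.Xception(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(299, 299, 3))
base.trainable = False                      # freeze the pre-trained weights

inputs = tf.keras.Input(shape=(299, 299, 3))
x = tf.keras.applications.xception.preprocess_input(inputs)
features = base(x, training=False)          # forward pass up to the chosen layer
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(features)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```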

DenseNet:

DenseNet is a cutting-edge CNN model for visual object recognition that requires fewer parameters to attain state-of-the-art performance. DenseNet is similar to ResNet in that the output of a preceding layer is combined with a subsequent layer, but DenseNet concatenates (.) the features, whereas ResNet uses addition (+) to integrate the output of prior layers with the output of subsequent layers. The DenseNet architecture densely connects all layers. The DenseNet-121, DenseNet-169, and DenseNet-201 architectures were used in this study.
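The contrast between the two merge operations can be stated in a few lines of Keras; the layer sizes below are illustrative assumptions only.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_step(x, growth_rate=32):
    # DenseNet: new feature maps are concatenated (.) onto everything
    # computed so far, so every layer sees all preceding outputs
    y = layers.Conv2D(growth_rate, 3, padding="same", activation="relu")(x)
    return layers.Concatenate()([x, y])

def residual_step(x, filters):
    # ResNet: new feature maps are added (+) to the previous output;
    # `filters` must match the channel count of x for the addition
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Add()([x, y])

inp = tf.keras.Input(shape=(56, 56, 64))
d = dense_step(inp)          # 64 + 32 = 96 channels after concatenation
r = residual_step(inp, 64)   # still 64 channels after addition
```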

VGG:

The VGG Net architecture delivers excellent accuracy. The Visual Geometry Group architecture comes in six configurations, each consisting of repeated convolution and pooling layers. VGG-19 has 19 layers, comprising 16 conv and three fully connected (FC) layers, whereas VGG-16 has 13 conv layers and three FC layers. The deep structure of VGG-Net shows that network depth is essential for good performance.

Xception:

The Xception architecture uses a sequential array of depth-wise separable convolution layers with residual blocks. The purpose of depth-wise separable convolution is to cut down processing time and memory usage. The 36 conv layers of Xception are divided into 14 modules. In Xception, separable convolution helps resolve issues such as vanishing gradients and representational bottlenecks: the channels of the sequential network separate channel-wise and space-wise feature learning. Instead of concatenation, the shortcut connection uses a summing operation, so that the outcome of the previous layer can be used as an input to the following layer.
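A sketch of one Xception-style module in Keras, illustrating depth-wise separable convolutions with a summed (not concatenated) shortcut; the filter counts and strides are illustrative assumptions, not Xception's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def xception_style_block(x, filters=128):
    # 1x1 projection so the shortcut matches the pooled main path
    shortcut = layers.Conv2D(filters, 1, strides=2, padding="same")(x)
    # Depth-wise separable convs factor spatial and channel mixing
    y = layers.SeparableConv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.SeparableConv2D(filters, 3, padding="same")(y)
    y = layers.MaxPooling2D(3, strides=2, padding="same")(y)
    return layers.Add()([shortcut, y])  # summed residual shortcut

inp = tf.keras.Input(shape=(72, 72, 64))
out = xception_style_block(inp)         # 36x36x128 output
```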

Inception V3:

InceptionV3, developed at Google Brain, is the third iteration of the Inception module; it is 42 layers deep (159 layers as counted in the Keras implementation). The Inception module's main idea is to mix small and big kernels to learn multi-scale representations while keeping the computational cost and parameter count to a minimum.

ResNet-50:

The idea of a residual block was introduced by ResNet, a deep residual learning network. Through residual connections, a block's input is added to that block's output. This strategy enables the residual block to learn the residual function without inflating the parameter count. ResNet50 is a 50-layer network made up of an initial conv layer, 48 conv layers organized into residual blocks with small 1 × 1 and 3 × 3 filters, and a final classifier layer.
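The bottleneck pattern that ResNet-50 stacks can be sketched as follows; the projection shortcut and filter sizes follow the standard published design, not code released with this paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x, filters=64):
    # ResNet-50 bottleneck: 1x1 reduce, 3x3 transform, 1x1 expand,
    # then sum with the shortcut
    shortcut = x
    y = layers.Conv2D(filters, 1, activation="relu")(x)                  # 1x1
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(y)  # 3x3
    y = layers.Conv2D(4 * filters, 1)(y)                                 # 1x1 expand
    if x.shape[-1] != 4 * filters:
        shortcut = layers.Conv2D(4 * filters, 1)(x)   # project the shortcut
    return layers.Activation("relu")(layers.Add()([shortcut, y]))

inp = tf.keras.Input(shape=(56, 56, 64))
out = bottleneck_block(inp)   # 56x56x256 output
```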

4 Result Analysis and Discussion

This study employed several statistical criteria to evaluate the techniques, including accuracy, precision, recall, and F1 score. Classification confusion matrices are provided for both normalized and non-normalized data. To aid comprehension, graphs of model accuracy and loss are also provided.
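For reference, these metrics and the confusion matrix can be computed with scikit-learn as below; the labels shown are dummy values for illustration only.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Hypothetical labels: 0 = benign, 1 = malignant
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))  # rows: actual, cols: predicted
```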

Fig. 2. Augmented Data in Various Epochs

4.1 Dataset

For this investigation, we used the Breast Ultrasound Image Dataset for binary classification. It is a freely accessible dataset found on Kaggle [32], constructed by Walid et al. in 2018 [33]. At baseline, the original dataset contained 780 ultrasound images of women between the ages of 25 and 75. The classification is based on groups of 306 benign and 294 malignant images, respectively. The data augmented at various epochs using the GAN approach is shown in Fig. 2.
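Assuming the Kaggle dataset has been downloaded into class-named folders (a hypothetical busi/benign and busi/malignant layout), it could be loaded for binary classification with Keras as follows; the 229 × 229 size matches the GAN output described in Sect. 3.1.

```python
import tensorflow as tf

# Hypothetical directory layout:  busi/benign/*.png  and  busi/malignant/*.png
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "busi", label_mode="binary", image_size=(229, 229), batch_size=32,
    validation_split=0.2, subset="training", seed=42)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "busi", label_mode="binary", image_size=(229, 229), batch_size=32,
    validation_split=0.2, subset="validation", seed=42)
```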

4.2 Experimental Setup

In this experiment, the data passed through eight algorithms to extract and select features, and the retrieved features were then used to train the convolutional models. Before the feature extraction and selection stages, as well as the classification phase, some experimental setup is done to ensure that the pipeline runs smoothly. Finally, the experimental findings are assessed and the model that best fits the data is chosen.

  a. Workstation: The work is done on the Google Colaboratory premium virtual environment.

  b. Packages and Libraries: The following libraries and resources were used in this experiment:

    • NumPy version 1.21.5.

    • TensorFlow version 2.8.0.

    • Colab GPU NVIDIA Tesla K80

    • Keras version 2.x

4.3 Classification Result and Confusion Matrix

Data augmentation with GAN produced 1200 images in total, of which 600 were (benign + malignant) and 600 were (normal), for use in model training in this study. We split the data into an 80:20 train/test ratio; the results are based on 240 test images that were not used during training. Table 1 summarizes the accuracy, precision, recall, and F1 score of the gathered findings for the binary classification. Table 1 shows that for the benign class, DenseNet-121 and DenseNet-169 both had an accuracy of 92%, while Xception had the highest accuracy of 95% and the highest precision of 97%; DenseNet-121, on the other hand, earns the highest recall of 98%, and all three end up with a 95% F1 score. The same analysis is performed for the malignant class: the accuracy for DenseNet-121, DenseNet-169, and Xception remains 92%, while the precision is 90% for DenseNet-121 (with an 88% F1 score) and 84% for Xception.

Table 1. Results of Binary Classification
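The 80:20 split with a 240-image hold-out can be reproduced with scikit-learn as sketched below; the placeholder arrays merely stand in for the 1200 augmented images.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the 1200 GAN-augmented images
X = np.zeros((1200, 229, 229, 3), dtype=np.uint8)
y = np.array([0] * 600 + [1] * 600)          # two balanced groups of 600

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(X_test.shape[0])   # 240 held-out test images, as reported
```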

In this case study, the confusion matrix is vital to emphasize, since it shows the fundamental importance of the data processing strategies. Figure 3 shows the performance evaluation of the classification results for normalized data. Each model recovers more of the actual results on normalized data and is less effective on non-normalized data. Looking more closely, we can observe that in each case Xception, DenseNet-169, DenseNet-121, and VGG-16 determine the real outcome with a high degree of accuracy.

4.4 Model Accuracy, Loss Function, and ROC Graph Analysis

Graphs are crucial to conclude the result analysis and understand the performance of the presented models. As seen in Fig. 4, the consistency of the Xception model across both the training and testing phases is considerably superior to the others, whereas DenseNet-121 and DenseNet-169 have higher training accuracy but lower testing accuracy. The performance of VGG-16 is also considerable in this scenario.

Figure 5 shows that with the categorical loss function, various irregularities appear in the loss graphs: VGG-16 and Xception exhibit the most consistent loss, while the other models show larger fluctuations.

Finally, the ROC analysis in Fig. 6 completes the scenario. This case study revealed that the Xception model's ROC is the highest, implying that Xception is more resilient than the other models in this binary classification case study; it reached the peak position in each dimension and proved to be the best-performing model.


4.5 Discussion

In this binary classification scenario, the Xception model outperformed the DenseNet-121, DenseNet-169, and VGG-16 models. Furthermore, in dealing with the classification issue, it is vital to note that the balance between convolutional layers and residual connections is critical. The separable convolution channels of Xception's sequential network combine channel-wise and space-wise feature learning, which helps resolve difficulties such as vanishing gradients and representational limits, and its shortcuts sum rather than concatenate. The alternative models, by contrast, focus on attribute management, deep structural analysis, computational cost, and parameter explosion.

Fig. 3. Confusion Matrix of Data with Normalization

Fig. 4. Model Accuracy Graphs of Presented Model

Fig. 5. Model Loss Graphs of Presented Model

Fig. 6. ROC Graphs of Presented Model

Based on the layering behavior of these models, the DenseNet design maximizes the residual mechanism by having each layer tightly connected to the ones below it. The model's compactness makes it non-redundant, because learned features are shared through collective knowledge. Its densely linked deep networks of convolutions, average pooling, max pooling, dropouts, and fully connected layers train with implicit deep supervision, and the shortcut connections allow the gradient to flow back more quickly. In the Inception model, on the other hand, convolutions, average pooling, max pooling, dropouts, and fully connected layers are combined as symmetric and asymmetric building components.

Inception primarily uses factorized convolutions to reduce the number of connections and parameters to train, resulting in faster processing and better results; it thus serves as an optimized image classification booster. ResNet works similarly, stacking its blocks very deeply and regularly. The VGG approach, in turn, uses both standalone conv layers and fully connected layers, resulting in a computationally costly strategy with a lower error rate than others.

Xception, an extreme version of Inception using depth-wise separable convolution, performs even better than Inception V3. In Xception, the depth-wise convolution is a channel-wise n × n spatial convolution, and the point-wise convolution is a 1 × 1 convolution that changes the dimension. Xception employs the most detailed and in-depth feature-digging technique to extract explicit features from an image. Its residual network aids in achieving an optimal learning rate, making it the most efficient and best-fitting of all the convolutional models presented.

Finally, despite the model's strength, it can reasonably be said that it is rather lightweight; the emphasis here is on efficiency in data training. Using transfer learning and GAN data augmentation, the approach avoids bias and reaches a new level. Some earlier studies report higher accuracy than ours, but the weaknesses of those models frequently stem from the use of typical machine learning models, biases, and simple data augmentation methods such as shifting and rotating, and other factors may make those models less transparent. As a result, even though our model is not the most accurate, it remains preferable to the others.

5 Conclusion and Future Work

This study compared eight different convolutional models to discover which one outperformed the rest across all case studies. The fittest model was determined by comparing each model's performance on normalized and non-normalized forms of an ultrasound image dataset, with the Xception model outperforming in each dimensional case study; in the binary categorization of breast cancer, it is the most appropriate and consistent model. The use of the GAN architecture to pre-process the dataset enhanced the performance of every model, implying that data processing approaches are helpful in reaching the intended result. This also aids in determining which convolutional model performs best in binary breast cancer classification scenarios with fewer images.

In addition, working with biomedical data is difficult due to its scarcity, and most imaging work necessitates augmented data and data pre-processing to extract the appropriate parameters. The contribution above rests on the idea of using deep learning methods rather than hybrid methods: advanced data pre-processing techniques generate synthetic images to teach models with different dimensional parameters, suitable features are selected from these to train the deep learning models, and the transfer learning process is thereby enriched, taking biomedical imaging toward multidimensional or correlational grounds.

In the future, we will quantify breast cancer severity to develop a system that will represent an automated version of the BI-RADS severity measurement scale.