1 Introduction

Breast cancer ranks among the foremost cancers affecting women worldwide. It occurs when cancerous cells grow in the breast tissue. A tumor is a mass of abnormal tissue, and breast tumors fall into two types: benign and malignant. There are numerous methods to diagnose breast cancer, such as breast self-examination (BSE), mammography or imaging, clinical breast examination (CBE), and surgery. The most reliable method for breast cancer screening and diagnosis is the mammogram, which can identify 85 to 90% of all breast cancers. Masses and calcifications are the most commonly reported anomalies that signify breast cancer.

A mass examined on a mammogram may be either benign or malignant based on its shape. Benign tumors typically have oval or circular forms, whereas malignant tumors have a roughly circular form with a spiked or uneven margin. Benign, or noncancerous, tumors include fibroadenomas, breast hematomas, and cysts. A malignant, or cancerous, breast tumor is an abnormal, uncontrolled growth of breast tissue [1]. Therefore, early diagnosis is very important. The biggest challenge is to determine the location of the tumor and the level of severity. A variety of image processing methods are applied to a mammogram image to diagnose pathology, namely: I. image enhancement, II. segmentation, III. feature extraction, and IV. classification.

Figure 1 demonstrates the overall process flow of the proposed approach. Image enhancement is the first step and is performed with the Contrast Limited Adaptive Histogram Equalization (CLAHE) methodology [60]. The second step is segmentation, for which a novel deep-learning-based model is proposed; it extracts the affected area from the enhanced image. The third step is feature extraction: a gray-level co-occurrence matrix (GLCM) and shape strategies are utilized to extract valuable features from the segmented image. Finally, machine learning approaches, including decision tree, random forest, SVM [2], and Naive Bayes [3], are used to determine whether the tissue is normal or abnormal and whether a mass is benign or malignant.

Fig. 1 Overall process flow for the proposed approach

1.1 Image enhancement

Image enhancement in a mammogram is a way of adjusting mammogram images to improve their brightness and reduce visible noise, assisting radiologists in identifying abnormalities. The CLAHE methodology is used in this paper for image enhancement.

1.2 Segmentation

Recently, many methodologies have been proposed for image segmentation [4]. This study focuses on semantic segmentation, and a novel deep-learning-based architecture built on GoogLeNet is proposed. GoogLeNet, also known as Inception v1, was introduced by Google researchers in 2014 [5]. This architecture won the ILSVRC 2014 image classification challenge; its error rate was substantially lower than those of the previous winners AlexNet and ZF-Net, and slightly lower than that of VGG. The model uses techniques such as 1 × 1 convolutions in the middle of the network and global average pooling to build a deeper structure.

1.3 Features extraction

Feature extraction acquires more specific information from the segmented image. Commonly, shape and GLCM techniques are used for feature extraction [6].

1.4 Classification

Classification approaches generally fall into three categories, namely machine learning [7], deep learning [8], and neural networks [9]. Here, machine learning algorithms are used for severity-level classification.

The remainder of the study is organized as follows: related work is addressed in Sect. 2; details of the proposed technique are given in Sect. 3; the outcomes and discussion of the study are provided in Sect. 4; and the conclusions are set out in Sect. 5.

2 Review of related works

In the past, many studies were devoted to enhancing mammogram images, and numerous spatial- and frequency-domain methods have been explored [10]. A comparative review [11] covered digital mammography enhancement mechanisms, including wavelet-based improvement, CLAHE, unsharp masking, and morphological operators. For digital mammography images, methods have been developed for both regional contrast enhancement and background texture reduction [12]. The CLAHE method [14] is the most widely utilized methodology for improving the contrast of medical images.

Previous research has suggested various methods for mammogram image segmentation, including region-growing techniques [15], contour-based methods [16], clustering methods [17], threshold-based methods [18], watershed-based techniques [19], and deep-learning-related techniques [20, 48,49,50,51,52,53,54]. Threshold-based segmentation risks failure if an incorrect threshold value is used [21]. Several hybrid variants of clustering methods have also been suggested to achieve better results [22,23,24]; however, it is difficult to select the number of clusters in k-means and the centroids in FCM for cluster-based approaches. Apart from the deep-learning techniques, the other methods provide low-performance results.

A few other complex, efficient architectures were also addressed in the following papers [25,26,27,28,29, 55,56,57]. Severity-level classification has been performed with many proposed methods, including SVM [30, 31], decision tree [32], naïve Bayes [33, 34], random forest [31], hybrid versions [35], PNN [36], and deep learning [37]. Yiqiu Shen et al. [55] presented a weakly supervised localization technique for high-resolution breast cancer images. Several authors have also used metaheuristic algorithms [58, 59] to enhance classification performance. However, these techniques suffer from high computational time, low training efficiency, manual processing, and low accuracy. To overcome these problems, a novel methodology is presented in this paper and explained in detail in the subsequent sections.

3 Methodology

Figure 1 demonstrates the overall process flow of the proposed approach. The proposed model has four stages: image enhancement, segmentation, feature extraction, and classification. Image enhancement is the first step and is performed with the CLAHE methodology [60]. The second step is segmentation, where a novel deep-learning-based architecture is used to extract the affected area from the enhanced image. The third step is feature extraction, where the gray-level co-occurrence matrix and shape strategies are utilized to extract valuable features from the segmented image. Finally, machine learning approaches, including decision tree [39], random forest [40], SVM [2], and Naive Bayes [3], classify the features as benign or malignant.

3.1 Dataset details

The MIAS dataset [47] contains mammography scan images along with their labels. The mammogram images are centered in the matrix, and the size of each image is 1024 × 1024 pixels. In most of the images the calcifications lie near the center, and the annotated centre locations and radii apply to clusters of calcifications rather than to individual calcifications. In certain cases the calcifications are dispersed throughout the image rather than concentrated at a single site; for these cases, both the centre location and the radius are considered inapplicable and are omitted. A detailed description of the MIAS dataset is presented in Table 1. The dataset consists of a total of 322 images, of which 70% are used for training and the remaining 30% for testing.

Table 1 MIAS dataset description

3.2 Enhancement

Image enhancement is the process of digitally adjusting an image so that it is more suitable for further analysis such as segmentation. CLAHE is a widely used image processing technique that improves prediction accuracy by enhancing fine structures, such as tiny vessels, that are often lost during contrast enhancement. CLAHE limits the contrast enhancement of standard Histogram Equalization (HE), which would otherwise amplify noise along with the signal; the main aim of the CLAHE technique is to limit the noise introduced during contrast enhancement, which is a major hurdle for medical images. The histogram is clipped at a certain threshold level before equalization is applied. CLAHE is an adaptive histogram equalization approach [41] in which the contrast of an image is boosted by applying equalization to limited data sections called tiles instead of the entire image. Adjacent tiles are then blended using bilinear interpolation to avoid artificial boundaries. Because contrast is limited within each region, noise amplification is avoided [14].

Contrast enhancement can be viewed as a slope function relating the intensity values of the input image to the desired output intensities. By controlling the slope of this function, the contrast gain is limited. The height of the histogram at a particular intensity largely determines the contrast enhancement there, so contrast enhancement is controlled by limiting the slope value and clipping the histogram height. The CLAHE algorithm limits the contrast via a clip limit.

The clip limit (CL) of the CLAHE algorithm is given in Eq. (1):

$$CL = \left[ {\frac{\psi }{GS}} \right] + \left[ {\alpha \cdot \left( {\psi - \left[ {\frac{\psi }{GS}} \right]} \right)} \right]$$
(1)

The controllable threshold value used in our proposed methodology is given in Eq. (2):

$$CL = \frac{GT}{{80}}$$
(2)

In Eqs. (1) and (2), GS is the number of grayscale levels, \(\psi\) is the pixel population of each block, \(GT\) is the global threshold, and \(\alpha\) is the clip factor.
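For illustration, the following minimal sketch applies CLAHE to a mammogram with OpenCV; the file name is a placeholder, and the clip limit and tile grid size are illustrative defaults rather than the exact values derived from Eqs. (1) and (2).

```python
import cv2

# Load a mammogram as an 8-bit grayscale image (placeholder file name).
img = cv2.imread('mdb001.pgm', cv2.IMREAD_GRAYSCALE)

# clipLimit caps the histogram height of each tile (cf. Eqs. (1)-(2));
# tileGridSize sets the number of local regions ("tiles").
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(img)

cv2.imwrite('mdb001_clahe.png', enhanced)
```

Raising clipLimit strengthens the local contrast gain but also re-admits the noise that clipping is meant to suppress.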

3.3 Segmentation

3.3.1 GoogleNet layer description

GoogLeNet, described as Inception v1, was created by Google researchers in 2014 [5]. This architecture won the ILSVRC 2014 image classification challenge; its error rate was substantially lower than those of the previous winners AlexNet and ZF-Net, and slightly lower than that of VGG. The model uses techniques such as 1 × 1 convolutions in the middle of the network and global average pooling to build a deeper structure.

GoogLeNet is a deep convolutional neural network, 22 layers deep. Pre-trained versions are available, trained on either the ImageNet or the Places365 dataset. The ImageNet-trained network classifies 1000 object categories, such as mice, keyboards, pencils, and various animals. The Places365-trained network is closely related but categorizes images into 365 categories of places, such as field, park, lobby, and runway. Transfer learning makes it possible to retrain GoogLeNet on another image dataset.

The first convolutional layer in GoogLeNet uses a patch size of 7 × 7, which is relatively large compared with the other patches used in the network; the main purpose of this layer is to reduce the size of the input image without losing spatial features. The input is reduced by a factor of four by the time it reaches the second convolutional layer and by a factor of eight before it reaches the first inception module, while a large number of feature maps is generated. The second convolutional layer uses a 1 × 1 convolutional block with a depth of 2. The main aim of this 1 × 1 convolutional block is dimensionality reduction, which decreases the number of operations performed by subsequent layers and thus reduces the computational burden.

The nine inception modules are the crucial building blocks of GoogLeNet. The inception module's main functionality is to identify features at varying scales, using convolution operators with different filter sizes, while keeping the computational cost low through dimensionality reduction. Two max-pooling layers are placed between some of the inception modules; their purpose is to downsample the input as it propagates through the network. Downsampling reduces the height and width of the input data, so the computational burden between inception modules is reduced. After the last inception module, an average pooling layer takes the mean value of every feature map, reducing the height and width to 1 × 1. To prevent overfitting, a dropout layer is used, which randomly disables a fraction of the interconnected neurons. The linear layer comprises 1000 hidden units, each representing an image class of the ImageNet dataset. The last layer is the softmax layer, which applies the softmax function to derive a probability distribution over the classes; the resulting vector is a set of values whose probabilities sum to 1.
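As an illustration of this design, the sketch below builds a generic GoogLeNet-style inception module in Keras; the branch filter counts are caller-supplied and are not the exact numbers of any particular module in the original network.

```python
from tensorflow.keras import layers

def inception_module(x, f1, f3_reduce, f3, f5_reduce, f5, pool_proj):
    """GoogLeNet-style inception module: four parallel branches at
    different scales, with 1x1 convolutions for dimensionality reduction."""
    # Branch 1: plain 1x1 convolution.
    b1 = layers.Conv2D(f1, 1, padding='same', activation='relu')(x)
    # Branch 2: 1x1 reduction followed by a 3x3 convolution.
    b2 = layers.Conv2D(f3_reduce, 1, padding='same', activation='relu')(x)
    b2 = layers.Conv2D(f3, 3, padding='same', activation='relu')(b2)
    # Branch 3: 1x1 reduction followed by a 5x5 convolution.
    b3 = layers.Conv2D(f5_reduce, 1, padding='same', activation='relu')(x)
    b3 = layers.Conv2D(f5, 5, padding='same', activation='relu')(b3)
    # Branch 4: 3x3 max pooling followed by a 1x1 projection.
    b4 = layers.MaxPooling2D(3, strides=1, padding='same')(x)
    b4 = layers.Conv2D(pool_proj, 1, padding='same', activation='relu')(b4)
    # Concatenate the four branches along the channel axis.
    return layers.Concatenate(axis=-1)([b1, b2, b3, b4])
```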

3.3.2 Novel deep learning architecture

The pre-trained GoogLeNet architecture cannot be applied directly to segmentation tasks, since its output layer is designed for classification. This gives rise to the need to modify the pre-trained architecture for segmentation by using transfer learning. Transfer learning reuses a model trained on one task to solve a similar task. In this way, the training time of the model is reduced, and performance is improved for a small training set. The weights learned by the architecture on the previous problem are transferred to the novel architecture and slightly adjusted to suit the new dataset.

In this paper, we provide a novel architecture for segmentation that is based on GoogLeNet (Table 2). The main difference between our model and GoogLeNet lies in the changes to the first and last few layers. In GoogLeNet, the last three layers are a dropout layer, a softmax layer, and an output classification layer. In place of these three layers, we add fully connected layers and a pixel classification layer. The pixel classification layer provides a classification output for each pixel of the image, so it plays a major role in semantic segmentation. Semantic segmentation connects image pixels to their class labels, classifying the image at the pixel level. In this way a fine-grained labeling of the image is obtained, which helps the system understand via computer vision what is actually present in the image and enhances accuracy. By using a pixel-wise loss and in-network upsampling, the fully convolutional network is employed for dense prediction.

Table 2 Novel GoogLeNet architecture after improvement

Table 2 presents the details of the novel GoogLeNet segmentation architecture: the number of layers used, the output sizes, the numbers of kernels, the kernel sizes, and the depths. The architecture used in this paper is 26 layers deep. Initially, there are ordinary convolutional layers, followed by blocks of inception layers and max-pooling layers, as shown in Table 2. Both the convolution and inception modules use the ReLU (Rectified Linear Unit) activation function. The pixel classification layer at the end provides a class label for each image pixel, and undefined pixel labels are ignored during training. In the training phase, the enhanced image and the ground-truth image are given to the new model, and the proposed model is trained on the MIAS dataset. During the testing phase, the images are segmented with the trained model.
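As a rough sketch of this modification (not the authors' exact layer configuration), the snippet below removes the classification head of a pretrained Inception-style backbone and attaches a 1 × 1 convolution plus in-network upsampling so that every pixel receives a class probability. Keras ships Inception v3 rather than the original GoogLeNet/Inception v1, so it stands in for the backbone here.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 2  # assumed labels: background vs. lesion

# Pretrained Inception backbone without its classification head.
backbone = tf.keras.applications.InceptionV3(
    include_top=False, weights='imagenet', input_shape=(1024, 1024, 3))

x = backbone.output                             # coarse feature map
x = layers.Conv2D(NUM_CLASSES, 1)(x)            # per-location class scores
x = layers.UpSampling2D(32, interpolation='bilinear')(x)  # in-network upsampling
x = layers.Resizing(1024, 1024)(x)              # align exactly with the input size
outputs = layers.Softmax(axis=-1)(x)            # pixel-wise class probabilities

model = Model(backbone.input, outputs)
model.compile(optimizer='sgd',
              loss='sparse_categorical_crossentropy')  # pixel-wise loss
```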

3.4 Features extraction

3.4.1 Gray level co-occurrence matrix (GLCM)

The GLCM [43] is a strategy for extracting second-order statistical texture features from the segmented image. Differences in complex image textures can be analyzed via GLCM matrices; such differences are usually caused by differences in the relative arrangement of pixels at different intensities, and the GLCM [61, 62] captures them by encoding the spatial relationships of pixels. For a binary image, the 2 × 2 GLCM, denoted the black-and-white GLCM (BWglcm), is computed in three directions, namely the 45° diagonal (upper-left to bottom-right), vertical down, and horizontal left-to-right. The input image is rectangular with M columns and N rows, and the grey level of each pixel is quantized to G levels.

For the G quantized grey levels \(Q = \left\{ {0,1,2,\ldots,G - 1} \right\}\), the columns and rows are indexed by \(L_{M} = \left\{ {1,2,3,\ldots,M} \right\}\) and \(L_{N} = \left\{ {1,2,3,\ldots,N} \right\}\), and \(L_{N} \times L_{M}\) is the set of pixels in the row-column description. The input image is treated as a function \(I:L_{M} \times L_{N} \to Q\) that assigns a grey level to every pixel coordinate. The texture-related information is expressed using a matrix of relative frequencies \(BS_{\alpha } (m,n)\), where m and n are the grey levels of adjacent pixels separated by a distance α. These grey-level co-occurrence frequencies are thus a function of both the adjacent-pixel distance and the angular relationship. This paper utilizes a total of 19 GLCM-based texture features, whose equations are shown in Table 3. The (m,n)th normalized GLCM entry is denoted \(BS_{\alpha } (m,n)\). The mean (μ) and variance (σ²) along each index of the normalized GLCM are computed as shown in the equations below:

$$\mu_{M} = \sum\limits_{m} {\sum\limits_{n} {m \cdot BS_{\alpha } (m,n)} } ,\quad \mu_{N} = \sum\limits_{m} {\sum\limits_{n} {n \cdot BS_{\alpha } (m,n)} }$$
(3)
$$\sigma_{M}^{2} = \sum\limits_{m} {\sum\limits_{n} {\left( {m - \mu_{M} } \right)^{2} \cdot BS_{\alpha } (m,n)} } ;\quad \sigma_{N}^{2} = \sum\limits_{m} {\sum\limits_{n} {\left( {n - \mu_{N} } \right)^{2} \cdot BS_{\alpha } (m,n)} }$$
(4)
Table 3 GLCM features and their corresponding equations

The co-occurrence matrix \(BS_{\alpha } (m,n)\) records, for a reference image R, how often a pixel with grey level m has a pixel with grey level n at the offset defined by the angle-distance pair α. The reference matrix R is the image, and (m,n) indexes the pair of grey levels. In the GLCM, the value of each component (m,n) equals the number of times a pixel with value m is associated with a pixel with value n at the angle-distance offset. The numbers of rows and columns are determined by the grayscale intensity range: since an image may contain grayscale values ranging from 0 to 255, a full-grayscale GLCM comprises 256 rows and columns. Here, every GLCM is a 2 × 2 matrix, since segmentation yields a binary image whose pixel values are either 0 or 1. The working of the GLCM is illustrated using the matrix A as follows:

$$A = \left[ {\begin{array}{*{20}c} 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ \end{array} } \right]$$
(5)

If the adjacent pixel distance is 1 and the focus moves from left to right, the angle is 0°. The frequency with which a certain component has the same or a different neighbor is counted, giving:

$$B_{1,0} = \left[ {\begin{array}{*{20}c} 6 & 0 \\ 0 & 6 \\ \end{array} } \right]$$
(6)

The sum of the elements in the matrix is 12 because there are 12 pixel pairs in which one component lies directly to the right of the other. The elements \(b_{\alpha ;1,2}\) and \(b_{\alpha ;2,1}\) of \(B_{\alpha }\) are zero because no 0 has a 1 immediately to its right (\(b_{\alpha ;1,2}\)) and no 1 has a 0 immediately to its right (\(b_{\alpha ;2,1}\)). From the actual i × j binary image (\(B_{img}\)), the GLCM is computed as follows:

$$Z(m,n) = 2B_{img} \left( {m,n} \right) - B_{img} \left( {m + x,n + y} \right)$$
(7)
$$GLCM = \left[ {\begin{array}{*{20}c} {\sum\limits_{n = 0}^{u - y - 1} {\sum\limits_{m = 0}^{v - x - 1} {\varphi \left( {Z(m,n),1} \right)} } } & {\sum\limits_{n = 0}^{u - y - 1} {\sum\limits_{m = 0}^{v - x - 1} {\varphi \left( {Z(m,n),2} \right)} } } \\ {\sum\limits_{n = 0}^{u - y - 1} {\sum\limits_{m = 0}^{v - x - 1} {\varphi \left( {Z(m,n), - 1} \right)} } } & {\sum\limits_{n = 0}^{u - y - 1} {\sum\limits_{m = 0}^{v - x - 1} {\varphi \left( {Z(m,n),0} \right)} } } \\ \end{array} } \right]$$
(8)
$${\text{where}}\;\varphi \left( {Z(m,n),h} \right) = \left\{ {\begin{array}{*{20}c} {1,} & {Z(m,n) = h} \\ {0,} & {{\text{otherwise}}} \\ \end{array} } \right.$$
(9)

The above GLCM formula is for a binary image. The values x and y determine the angle-distance metric α. When x is positive and y is zero, the offset represents the vertical down direction; the diagonal direction has equal positive x and y values; and the horizontal left-to-right direction has a positive y value and a zero x value. For an image of interest there are eight scales, represented as \(x,y \in \left\{ {2^{0} ,2^{1} ,2^{2} ,2^{3} ,2^{4} ,2^{5} ,2^{6} ,2^{7} } \right\}\). Table 3 presents the different texture elements of \(BS_{\alpha }\).
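A direct sketch of Eqs. (7)–(9) in Python; the function name and the convention that x offsets rows and y offsets columns are our assumptions.

```python
import numpy as np

def binary_glcm(B, x, y):
    """2x2 GLCM of a binary image B per Eqs. (7)-(9); (x, y) is the
    row/column offset encoding the angle-distance pair alpha."""
    H, W = B.shape
    a = B[:H - x, :W - y].astype(int)   # reference pixels B(m, n)
    b = B[x:, y:].astype(int)           # neighbours B(m + x, n + y)
    Z = 2 * a - b                       # 1->(1,1), 2->(1,0), -1->(0,1), 0->(0,0)
    return np.array([[np.sum(Z == 1), np.sum(Z == 2)],
                     [np.sum(Z == -1), np.sum(Z == 0)]])

# The example image A of Eq. (5); horizontal direction, distance 1.
A = np.array([[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1]])
print(binary_glcm(A, 0, 1))   # [[6 0], [0 6]], matching Eq. (6)
```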

Contrast is a measure of the relative intensity between a pixel and its neighbor at offset α; the contrast value is 0 when all of the GLCM mass lies on the diagonal, as for a constant image. Homogeneity measures the proportion of values on the GLCM diagonal versus the proportion of values off the diagonal. The homogeneity value lies in the range [0,1]; it equals 1 when every pixel at the given distance matches its reference pixel, i.e., when the GLCM is diagonal. Energy is a normalized measure of the orderliness of the image, and its value falls in the range [0,1]; the value 1 corresponds to a constant image, and energy is inversely related to entropy. The correlation value measures the correlation between the reference pixel and the pixel at distance α. The mean measures the average grey level in the image, and the variance is a measure of heterogeneity: it increases as the grey-level values deviate from the mean. Dissimilarity is similar to contrast, except that the weights of the components increase linearly rather than quadratically.
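The same matrix and several Table 3 descriptors can be obtained with scikit-image, as in this minimal sketch (the full set of 19 features would need to be coded from their equations):

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# The binary example image A from Eq. (5).
A = np.array([[0, 0, 0, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 0],
              [1, 1, 1, 1]], dtype=np.uint8)

# Distance 1, angle 0 (horizontal left-to-right), two grey levels.
glcm = graycomatrix(A, distances=[1], angles=[0], levels=2)
print(glcm[:, :, 0, 0])   # [[6 0], [0 6]], as in Eq. (6)

# A few of the texture descriptors discussed above (normalized internally).
for prop in ('contrast', 'homogeneity', 'energy', 'correlation', 'dissimilarity'):
    print(prop, float(graycoprops(glcm, prop)[0, 0]))
```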

3.4.2 Shape features

Shape features describe the geometry of the segmented region, and a set of five shape features is used in this paper. Table 4 presents the shape features and their corresponding equations. If the shape of the calcification is regular, the ratio is close to 1; otherwise it is near zero.

Table 4 Shape features and their corresponding equation
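As an illustration of such descriptors, the sketch below computes a few standard shape measures from a binary mask with scikit-image; these are stand-ins, since the exact five features of Table 4 are not reproduced here.

```python
import numpy as np
from skimage.draw import disk
from skimage.measure import label, regionprops

# Toy binary mask containing a roughly circular "calcification".
mask = np.zeros((64, 64), dtype=np.uint8)
rr, cc = disk((32, 32), 10)
mask[rr, cc] = 1

# Largest connected region of the segmentation.
region = max(regionprops(label(mask)), key=lambda r: r.area)

# Circularity is 1 for a perfect circle and approaches 0 for irregular shapes.
circularity = 4 * np.pi * region.area / region.perimeter ** 2
features = [region.area, region.perimeter, region.eccentricity,
            region.solidity, circularity]
print(features)
```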

3.5 Breast cancer classification


The normal, malignant, and benign classes of the mammography images are classified using the Naïve Bayes (NB) [3], decision tree (DT) [39], Support Vector Machine (SVM) [2], and random forest (RF) [40] classifiers. A brief description of each classifier follows:


Decision Tree (DT): The decision tree is a supervised machine learning classifier in which the data is repeatedly partitioned based on a certain parameter. The nodes represent the features extracted for the breast cancer classification problem, and the edges represent the outcomes of the tests, connecting to the next node or to a leaf. The classification result (benign, malignant, or normal) resides in the leaf nodes.


Random Forest (RF): A random forest supplies multiple trained decision tree classifiers at the testing stage, which generally makes it preferable to a single decision tree. Suitable input features must be selected to act as the nodes. There are N decision tree classifiers, and the features obtained from the input image are passed through every decision tree to obtain class labels. Finally, the results of the individual trees are aggregated using a bagging technique.


Support Vector Machine (SVM): SVM is a machine learning classifier that provides high accuracy with modest computational cost. The main aim of the SVM is to find a hyperplane in N-dimensional space that separates the data points, where N is the number of features and the hyperplane is the decision boundary used to classify them. The hyperplane with the maximum margin separates the classes most reliably.


Naïve Bayes (NB): To classify the breast cancer classes, a probabilistic machine learning model known as Naïve Bayes is used. It is formulated using Bayes' theorem:

$$P(R|M) = \frac{P(M|R)P(R)}{{P(M)}}$$
(10)

The above equation is Bayes' theorem: given that event M has occurred, it yields the probability of R. Here, R is the evidence (for instance, the number of malignant cases) and M is the hypothesis (for instance, disease progression). The features are assumed to be independent, so that no feature depends on any other; hence the name "naïve". Let m be the class variable indicating whether or not a patient has breast cancer, and let R = r1, r2, r3, …, rn be the list of input features. Expanding the Naïve Bayes rule by substituting for R, we get

$$P(m|r_{1} ,r_{2} ,\ldots,r_{n} ) = \frac{{P(r_{1} |m)P(r_{2} |m) \cdots P(r_{n} |m)P(m)}}{{P(r_{1} )P(r_{2} ) \cdots P(r_{n} )}}$$
(11)
$$P(m|r_{1} ,r_{2} ,\ldots,r_{n} ) \propto P(m)\prod\limits_{i = 1}^{n} {P(r_{i} |m)}$$
(12)

Here, the class variable m takes two values: yes or no. The main aim is to find the class m with the maximum probability:

$$m = \arg \max_{m} P(m)\prod\limits_{i = 1}^{n} {P(r_{i} |m)}$$
(13)

Using the above equation, one can make a classification by selecting the most probable class.
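A minimal sketch of this classification stage with scikit-learn, using the 70/30 split of Sect. 3.1; synthetic data stands in for the real GLCM and shape feature matrix.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Stand-in for the per-image feature vectors (19 GLCM + 5 shape features).
X, y = make_classification(n_samples=322, n_features=24, random_state=0)

# 70% training / 30% testing, as described in Sect. 3.1.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

classifiers = {
    'DT': DecisionTreeClassifier(),
    'RF': RandomForestClassifier(n_estimators=100),
    'SVM': SVC(kernel='rbf'),
    'NB': GaussianNB(),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, accuracy_score(y_test, clf.predict(X_test)))
```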

4 Experiments results and discussions

The experiments are conducted on an Intel Core i9-10850K 3.60 GHz processor with 32 GB of memory and 1 TB of storage. The model is implemented in MATLAB. Table 5 presents the original input image, the enhanced image, and the segmented image for various abnormalities. The performance metrics used are the Dice coefficient, Jaccard index, accuracy, sensitivity, and specificity. The Dice coefficient computes the similarity between two sets: it is defined as twice the overlap of M and N divided by the sum of the sizes of M and N. The Jaccard index computes the similarity and diversity of sample image sets; it measures the similarity of finite sample sets as the ratio of the size of the intersection to the size of the union.

$$Dice(M,N) = \frac{{2\left| {M \cdot N} \right|}}{{\left| M \right| + \left| N \right|}}$$
(14)
$$Jaccard(M,N) = \frac{{\left| {M \cdot N} \right|}}{{\left| M \right| + \left| N \right| - \left| {M \cdot N} \right|}}$$
(15)
Table 5 Performance results

where M and N are binary vectors of equal length whose entries are 1 or 0. The value 1 indicates that an element is present in the set, whereas 0 indicates its absence. \(\left| {M \cdot N} \right|\) denotes the inner product of M and N, which counts the true positives. Sensitivity (S1) identifies the percentage of pixels in the diseased area that are accurately segmented as abnormal masses. It is computed using the following formula:

$$S1 = \frac{{X_{1} }}{{X_{1} + Y_{2} }}\,$$
(16)

Specificity (S2) is the percentage of normal tissues correctly segmented by the model.

$$S2 = \frac{{X_{2} }}{{X_{2} + Y_{1} }}\,$$
(17)
$$Accuracy = \frac{{X_{1} + X_{2} }}{{X_{1} + X_{2} + Y_{1} + Y_{2} }}\,$$
(18)

A true positive (X1) is abnormal tissue correctly segmented as abnormal, and a true negative (X2) is a normal mass segmented as normal. A false positive (Y1) is normal tissue incorrectly segmented as abnormal, and a false negative (Y2) is an abnormal mass incorrectly segmented as normal.
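Expressed as a minimal sketch over binary masks (the function name is ours):

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Metrics of Eqs. (14)-(18) for binary masks pred and truth."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    X1 = np.sum(pred & truth)      # true positives
    X2 = np.sum(~pred & ~truth)    # true negatives
    Y1 = np.sum(pred & ~truth)     # false positives
    Y2 = np.sum(~pred & truth)     # false negatives
    dice = 2 * X1 / (pred.sum() + truth.sum())
    jaccard = X1 / (X1 + Y1 + Y2)
    sensitivity = X1 / (X1 + Y2)
    specificity = X2 / (X2 + Y1)
    accuracy = (X1 + X2) / (X1 + X2 + Y1 + Y2)
    return dice, jaccard, sensitivity, specificity, accuracy
```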

Figure 2 compares the segmentation performance of the proposed model with existing models such as the modified Xception [63], modified AlexNet [64], and modified VGG-19 [65]. In the modified Xception model, multilevel features acquired from different convolutional layers are fed into a multilayer perceptron (MLP) network for training. In the modified AlexNet architecture, a multiclass Support Vector Machine (SVM) layer is used instead of the normal classification layer. In the modified VGG-19 model, the authors replaced the max-pooling layer in the final block with a global pooling layer.

Fig. 2 Segmentation performance comparison between the proposed model, VGG16 [44], AlexNet [45] and CNN [46]

The pre-trained CNN architectures are trained for 100 epochs with a learning rate of 0.001 using stochastic gradient descent. Segmentation performance is assessed by comparing the output image with the ground-truth segmentation acquired from the radiologist. During training, a snapshot of the CNN model is taken at each epoch, and the model with the highest Dice coefficient is selected. The proposed model achieves accuracy, sensitivity, specificity, Dice coefficient, and Jaccard coefficient scores of 99.12%, 99.89%, 98.45%, 82.15%, and 89.11%, respectively, which are higher than those of the other techniques.

According to the segmentation performance analysis, compared with the existing models the proposed model increases accuracy by approximately 4.61%, sensitivity by 5.43%, precision by 4.76%, the Dice coefficient by 13.4%, and the Jaccard coefficient by 17.1%. The experimental results show that the proposed methodology offers significant performance and outperforms the conventional methodologies across the different performance evaluation metrics.

Figures 3 and 4 present the classification training and testing performance of the various machine learning approaches. From the analysis of the results, the SVM classifier provides the best classification performance in terms of accuracy, sensitivity, precision, and F-measure, surpassing the DT, NB, and RF techniques on the MIAS dataset. The SVM mainly achieves this performance thanks to the detailed calcification segmentation produced by the GoogLeNet architecture, whose precise lesion segmentation improves the extracted shape features. Even though the performance of the SVM depends on the segmentation results, no additional time or effort is required because no manual intervention is used for segmentation. We can therefore conclude that, with the help of the transfer learning approach, training and testing efficiency has been improved significantly.

Fig. 3 Classification training performance comparison between various ML models

Fig. 4 Classification testing performance comparison between various ML models

5 Conclusion

According to information released by the World Health Organization (WHO), breast cancer is one of the most common cancers among women. Mammography is the most effective tool for the early detection of this type of cancer and can detect breast cancer up to ten years before it manifests physically. We employed segmentation to assess the breast tumor, which aids doctors in determining the volume of the tumor and results in more effective treatment. In this study, we proposed a GoogLeNet-based architecture for breast cancer segmentation. From the analysis of the results, the proposed model provides the best segmentation performance when compared with the existing models. Machine learning approaches are used to determine whether tissue is normal or abnormal and whether a mass is benign or cancerous; according to the classification training and testing analyses, SVM gives the best results among the compared methods. The proposed method attains a segmentation accuracy of 99.12%, which helps to improve the classification performance of the various machine learning architectures. As a result, our proposed methodology has proven to be very beneficial when applied to the medical field.