1 Introduction

Breast cancer is a deadly cancer caused by the uncontrolled growth of abnormal cells [8, 9, 39]. In 2020, 2.3 million new breast cancer cases were diagnosed, posing a serious threat to the health of women around the world [37, 42]. Early screening for breast disease is therefore particularly important. At present, there are many methods for early screening of breast disease, such as mammography, computed tomography, and photoacoustic imaging [4, 14, 24, 49]. Mammography is recognized as the most effective method for the early detection of breast cancer [31]. It is primarily used for breast cancer detection [18, 43], microcalcification treatment [3], and image classification [51].

Boumaraf et al. [9] used the Breast Imaging Reporting and Data System (BI-RADS) to analyze and classify mammograms. Accurate grading of mammograms can not only reduce mortality but also protect patients’ physical and mental health by avoiding unnecessary biopsies and surgery [50]. As shown in Table 1, the fifth edition of BI-RADS divides mammographic findings into seven categories [11, 33]. However, mammograms of the different BI-RADS categories have very low contrast. As a result, doctors who read the images with the naked eye are easily limited by their own experience, and visual fatigue can lead to misdiagnosis and missed diagnosis. Even for an expert, assigning a BI-RADS category to each mammogram is an arduous task [6]. Therefore, further study of the classification of mammography images of breast diseases has important research value.

Table 1 Fifth edition BI-RADS classification

In recent years, many researchers have studied computer-aided diagnosis (CAD) to help doctors diagnose breast diseases. An effective CAD method can provide radiologists with reliable support for detecting abnormalities in mammograms [10, 13, 45]. Therefore, for the early diagnosis of breast diseases based on mammograms, this paper explores a computer-aided breast cancer screening and classification method based on feature extraction from a pre-trained neural network model. To classify normal breasts (BI-RADS 1), benign breasts (BI-RADS 2), and probably benign breasts (BI-RADS 3), we created a new image enhancement method called INCIE (Image Negatives and Contrast Limited Adaptive Histogram Equalization Image Enhancement). The algorithm first performs an image negatives operation on the dataset and then performs a contrast-limited adaptive histogram equalization operation. After the training set was enhanced with INCIE and expanded, the pre-trained ResNet-50 neural network was used as the feature extractor, and the classification results of the K-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM) classifiers were compared. The final KNN classification accuracy is 85%. The contributions of this paper are as follows:

  • Firstly, this study focuses for the first time on the subtle differences between normal, benign, and probably benign breasts so that patients can effectively avoid unnecessary biopsies. At the same time, the different classification effects of machine learning algorithms in breast disease classification were discussed, indicating that machine learning algorithms still have great potential for development in mammography diagnosis.

  • Secondly, we compare multiple mammogram preprocessing methods for the first time and create a new image enhancement method, which can serve as a reference for future work.

  • Finally, this study has certain application value. It can not only help radiologists to provide accurate diagnosis and effective interpretation, but also better help clinicians to provide rapid and efficient analysis results, thus reducing the workload of doctors.

2 Related works

With the deepening of relevant research, the diagnosis and grading of breast diseases have attracted the attention of several research institutions. Kumar et al. [19] analyzed machine learning algorithms such as Naïve Bayes, Support Vector Machine, and Decision Trees in the WEKA environment. The SVM algorithm achieved the best accuracy, 97.89%. However, this research still has some problems, such as incomplete experimental subjects. Many research institutions have also studied the BI-RADS categories of breast cancer based on mammograms. Chokri et al. [11] proposed a CAD system that automatically classifies mammograms into two categories or into four BI-RADS categories. They used 480 images from the DDSM database for model evaluation and achieved accuracy rates of 88.02% and 83.85%. Although the study recognized the importance of the different BI-RADS categories of breast cancer, its experimental subjects were still incomplete, and it did not consider the difference between the normal breast and the other stages. Miranda et al. [32] invited professional doctors to evaluate 46 mammograms in the DDSM database and predicted four BI-RADS categories of breast disease with a fuzzy logic inference system. The final accuracy was 76.67%. This study did not consider the classification of the normal breast, and its classification performance was poor, so further study is needed. Loizidou et al. [26] collected 100 pairs of mammograms for the classification of benign tissue (BI-RADS 1, BI-RADS 2) and suspicious tissue (BI-RADS 4, BI-RADS 5). They extracted 96 features and evaluated performance by measuring sensitivity, specificity, accuracy, and other metrics on 9 different classifiers. The final result had an accuracy of 90.3% and an AUC of 0.87. However, this study covered too few classification types, and its division of categories was not reasonable. Zhang et al. [54] evaluated the ultrasound images of 1311 lesions according to the BI-RADS grading scale and with reference to the pathological findings. They divided the lesions into benign and malignant and used sensitivity, specificity, and accuracy to compare the diagnostic performance of AI and radiologists. The AI model had a diagnostic accuracy of 77.0%, a sensitivity of 82.0%, and a specificity of 71.7%. Ghaemian et al. [15] evaluated a total of 213 breast masses and analyzed the data using descriptive statistical methods. The sensitivity of mammography and ultrasonography alone was 72.6% and 68.9%, respectively. The accuracy of combined mammography and ultrasound was 84.9%. The relevant research papers published in recent years are summarized in Table 2.

Table 2 Summary of research literature related to the early detection and diagnosis of breast disease

3 Materials and methods

3.1 Experimental datasets

This study selected 134 mammograms (2560 × 3328 pixels or 3328 × 4096 pixels) of different BI-RADS categories from 68 breasts, acquired from January 2016 to December 2020 in the Affiliated Tumor Hospital of Xinjiang Medical University (details are shown in Table 3). Among them, BI-RADS 1 includes 25 breasts with a total of 50 mammograms, all of which are normal tissue; BI-RADS 2 includes 25 breasts with a total of 50 mammograms, all of which are benign tissue showing signs such as calcified fibroadenoma and no malignant signs; BI-RADS 3 includes 18 breasts with a total of 34 mammograms, all of which are probably benign tissue showing signs such as clusters of fine dot-like calcifications and requiring short-term follow-up. Mammograms of all patients were retrieved from the pathology department of the hospital. All samples were taken in the craniocaudal (CC) or mediolateral oblique (MLO) position, as shown in Figs. 1, 2, and 3.

Table 3 The obtained experimental images

Fig. 1 Mammograms of the normal breast. (a) Mediolateral oblique (MLO) view of the normal breast (BI-RADS 1); (b) craniocaudal (CC) view of the normal breast (BI-RADS 1)

Fig. 2 Mammograms of the benign breast. (a) Mediolateral oblique (MLO) view of the benign breast (BI-RADS 2); (b) craniocaudal (CC) view of the benign breast (BI-RADS 2)

Fig. 3 Mammograms of the probably benign breast. (a) Mediolateral oblique (MLO) view of the probably benign breast (BI-RADS 3); (b) craniocaudal (CC) view of the probably benign breast (BI-RADS 3)

3.2 Proposed method

Mammograms have low contrast, so assigning the correct BI-RADS category to each mammogram, and thus avoiding unnecessary surgery, can be a difficult task [6, 50]. To solve this problem, this paper proposes a computer-aided breast cancer screening and classification method based on feature extraction from a pre-trained neural network model. We created the INCIE image enhancement method, used the pre-trained ResNet-50 model as the feature extractor, fed the extracted features into the KNN, RF, and SVM classifiers respectively, and plotted ROC curves and confusion matrices [15, 41, 54] to evaluate the experimental results. Figure 4 illustrates the framework of the early detection method for breast disease proposed in this study.

Fig. 4 Overall process of the proposed model. The blue boxes in the figure represent the five main parts of this paper

3.3 Preprocessing

The first stage of image classification is preprocessing [36]. Preprocessing improves classification accuracy by improving image quality. Different researchers have used different preprocessing methods for mammograms. For example, Wiener filtering has been used to filter equalized image sets [23, 36]; median filtering preserves edge information [1]; Laplacian filtering is used because of its simplicity and ability to eliminate noise [5]; and CLAHE enhances the contrast of mammograms [31]. In addition, we also discuss the contrast stretching method and the image negatives method. Preprocessing in this paper involves removing the artifacts and background, eliminating the pectoral muscles, and applying image enhancement [7]. The specific steps are as follows. First, we convert the CR sequence images from DICOM format to JPG format. Then, using the LabelMe software, two professional doctors manually outline the pectoral muscle region in each mammogram; the pectoral muscles, artifacts, and background are then removed, leaving the breast region intact. The implementation is shown in Fig. 5.

Fig. 5 Eliminating the pectoral muscles and removing the artifacts and background. (a) Manually outlined area; (b) pectoral muscle eliminated; (c) artifacts and background removed

In addition, we explored the effect of different preprocessing methods. Because a mammogram contains large black areas, the features of the white and gray regions need to be highlighted, so we chose the image negatives method. To improve the local contrast of the image, we then apply contrast-limited adaptive histogram equalization (CLAHE) to the negative image and resize the result to 1024 × 1024 pixels [1, 31]. The image negatives transformation is given in Eq. (1).

$$s=L-1-r$$
(1)

Here, \(s\) is the gray value of a pixel in the negative image, \(L\) is the number of gray levels, and \(r\) is the gray value of the corresponding pixel in the original image. The implementation is shown in Fig. 6.

Fig. 6 Image Negatives and Contrast Limited Adaptive Histogram Equalization Image Enhancement (INCIE). (a) Image negatives; (b) INCIE
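The two INCIE operations can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this study. The function names, the default of 256 gray levels, and the `clip_limit` value are assumptions made for illustration, and `clipped_equalize` covers only the contrast-limited equalization of a single tile; full CLAHE additionally splits the image into tiles and interpolates between neighbouring tile mappings.

```python
def image_negative(image, levels=256):
    """Apply Eq. (1), s = L - 1 - r, to every pixel of a 2-D list of gray values."""
    return [[levels - 1 - r for r in row] for row in image]


def clipped_equalize(tile, levels=256, clip_limit=4):
    """Contrast-limited histogram equalization of one tile (simplified).

    Assumption: this is only the per-tile step of CLAHE, without tiling
    or interpolation between tiles.
    """
    # Histogram of gray values in the tile.
    hist = [0] * levels
    for row in tile:
        for r in row:
            hist[r] += 1

    # Clip each bin at clip_limit and spread the clipped excess evenly.
    excess = sum(max(h - clip_limit, 0) for h in hist)
    hist = [min(h, clip_limit) + excess / levels for h in hist]

    # Cumulative distribution mapped onto the range [0, levels - 1].
    total = sum(hist)
    mapping, running = [], 0.0
    for h in hist:
        running += h
        mapping.append(round((levels - 1) * running / total))
    return [[mapping[r] for r in row] for row in tile]


def incie_sketch(image, levels=256, clip_limit=4):
    """Negative first, then contrast-limited equalization, in the INCIE order."""
    return clipped_equalize(image_negative(image, levels), levels, clip_limit)
```

Applying `image_negative` twice returns the original image, which is a quick sanity check on Eq. (1).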

3.4 Data augmentation

Data augmentation is an attractive way to reduce model overfitting and improve model generalization and performance [2, 17]. Currently, publicly available mammograms are limited. Therefore, we adopted data augmentation to prevent the overfitting caused by small-sample datasets [24]. Data augmentation not only increases the amount of data but also makes the dataset “stronger.” In this study, we randomly divided the dataset into a training set and a test set at a ratio of 7:3. The training set is augmented with horizontal and vertical flipping, translation by 10 pixels, rotation by 90°, 180°, and 270°, and added noise [20, 22, 37, 41]. The augmented training set contains a total of 9400 mammography images.
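The augmentation operations listed above can be sketched as follows, using nested lists as images. This is an illustrative sketch rather than the code used in this study. The translation direction, the fill value, the use of Gaussian noise, and the noise standard deviation `sigma` are assumptions, since the text does not specify them.

```python
import random


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rotate90(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def translate_right(img, shift=10, fill=0):
    """Shift pixels to the right by `shift` columns, padding with `fill`."""
    out = []
    for row in img:
        s = min(shift, len(row))
        out.append([fill] * s + row[:len(row) - s])
    return out


def add_gaussian_noise(img, sigma=5.0, levels=256, seed=0):
    """Add zero-mean Gaussian noise (assumed noise model) and clamp to valid range."""
    rng = random.Random(seed)
    return [[min(levels - 1, max(0, round(r + rng.gauss(0, sigma)))) for r in row]
            for row in img]


def augment(img):
    """Return one augmented copy per operation described in the text."""
    r90 = rotate90(img)
    r180 = rotate90(r90)
    r270 = rotate90(r180)
    return [flip_horizontal(img), flip_vertical(img), translate_right(img, 10),
            r90, r180, r270, add_gaussian_noise(img)]
```

Only the training split should be passed through `augment`, so that the test set stays free of transformed copies of training images.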

3.5 Feature extraction

Traditional classification algorithms require considerable time and effort to decide which feature extraction algorithm to use. In contrast, convolutional neural networks (CNNs) are widely used because their local weight-sharing structure gives them unique advantages in image processing [16, 21, 40]. ResNet was proposed by He et al. in 2015 and has attracted much attention for its excellent performance in image classification and object detection [55]. Researchers have applied ResNet-50 to feature extraction from mammograms and achieved good classification results [27, 30, 35]. Although mammograms differ somewhat from natural images, pre-trained neural network models can still detect basic features in them, such as edges and shapes [16, 52]. Considering the complexity of the feature extraction and classification process, this study uses the ResNet-50 model to extract features from the images.

3.6 Classifier

3.6.1 RF classifier

In 2001, Breiman modified bagging and proposed the random forest algorithm [29]. The algorithm is essentially an improvement of the decision tree algorithm. It is flexible and is one of the most widely used ensemble classifiers, and it has proven its efficiency and superiority in many classification applications [46]. A test sample is assigned the class that receives the most votes after the classification results of all trees are counted [12, 34]. RF is relatively simple to implement and well suited to parallel computing. Each tree is built from a random selection of samples and a random selection of features [38, 47]. Since the number of trees (n_estimator) directly affects the final classification performance, we had to determine an appropriate number of trees during the experiment.

3.6.2 SVM classifier

SVM is a supervised learning algorithm proposed by Vapnik et al. [44], which is mainly used for data classification in the field of pattern recognition. The biggest advantage of this algorithm is that it is backed by rigorous mathematical theory. In the vector space formed by the sample points, SVM classifies and predicts data by finding a separating hyperplane that places the two classes of data on opposite sides [25, 53]. SVM separates samples of different categories with the hyperplane that has the maximum margin, as shown in Eq. (2):

$$\min_{w,b}\frac{1}{2}{\left\|w\right\|}^{2}+C\sum\nolimits_{i=1}^{m}{l}_{i}\left({y}_{i}\left({w}^{T}{x}_{i}+b\right)-1\right)$$
(2)

where \({l}_{i}\) is the loss function and \(C\) is the penalty coefficient. This paper uses the Lagrangian multiplier method to solve the problem. The equivalent dual problem is given in Eq. (3).

$$\max_{\alpha }\sum\nolimits_{i=1}^{m}{\alpha }_{i}-\frac{1}{2}\sum\nolimits_{i=1}^{m}\sum\nolimits_{j=1}^{m}{\alpha }_{i}{\alpha }_{j}{y}_{i}{y}_{j}{x}_{i}^{T}{x}_{j}\quad \text{s.t.}\quad \sum\nolimits_{i=1}^{m}{\alpha }_{i}{y}_{i}=0,\; 0\le {\alpha }_{i}\le C$$
(3)

Equation (3) shows that the samples enter the problem only through the inner product \({x}_{i}^{T}{x}_{j}\). Selecting an appropriate kernel function is therefore very important, because it can not only reduce the cost of computing this inner product but also improve the accuracy of classification.
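To make the role of the kernel concrete, the following sketch evaluates the objective of Eq. (3) with the inner product replaced by a kernel value. The text does not state which kernel was used in this study, so the linear and RBF kernels, the `gamma` default, and all function names here are illustrative assumptions.

```python
import math


def linear_kernel(x, z):
    """Plain inner product x^T z, as written in Eq. (3)."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel exp(-gamma * ||x - z||^2) (assumed choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def gram_matrix(samples, kernel):
    """Kernel value for every pair of training samples."""
    return [[kernel(x, z) for z in samples] for x in samples]


def dual_objective(alpha, y, gram):
    """Objective of Eq. (3) with x_i^T x_j replaced by gram[i][j]."""
    m = len(alpha)
    quadratic = sum(alpha[i] * alpha[j] * y[i] * y[j] * gram[i][j]
                    for i in range(m) for j in range(m))
    return sum(alpha) - 0.5 * quadratic
```

Because the Gram matrix is computed once and reused for every candidate \(\alpha\), swapping the kernel changes the decision boundary without changing the optimization procedure.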

3.6.3 KNN classifier

The KNN algorithm is a statistical method for pattern recognition that can be used for both regression and classification, and it plays a significant role in the field of machine learning [19]. The basic idea of the KNN algorithm is like the saying “Birds of a feather flock together.” First, the distances between the features of the image to be classified and the features of the training images are calculated and sorted. Then the category of the new sample is determined from the categories of the most similar training samples [48]. For a data sample X to be classified, its K nearest neighbors are found, and X is assigned the class label to which most of these neighbors belong. The choice of K also affects the performance of the KNN algorithm [28]. The algorithm is concise, easy to implement, fast to train, predicts well, and is especially suitable for multi-class problems. The KNN classifier works as follows:

  1. Initialize the value of K.

  2. Calculate the distance between the input sample and the training samples.

  3. Sort the distances.

  4. Take the top K nearest neighbors.

  5. Apply simple majority voting.

  6. Predict the class label held by the most neighbors for the input sample.
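The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this study. The Euclidean distance, the default `k=3`, and the function name are assumptions, and in the proposed pipeline the feature vectors would come from the ResNet-50 extractor rather than the toy vectors used here.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, sample, k=3):
    """Classify `sample` by majority vote among its k nearest training samples."""
    # Step 2: distance from the input sample to every training sample
    # (Euclidean distance is an assumed choice).
    distances = [(math.dist(features, sample), label)
                 for features, label in zip(train_features, train_labels)]
    # Step 3: sort by distance, nearest first.
    distances.sort(key=lambda pair: pair[0])
    # Step 4: keep the labels of the k nearest neighbors.
    nearest_labels = [label for _, label in distances[:k]]
    # Steps 5 and 6: simple majority vote decides the predicted label.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy two-dimensional features with hypothetical labels.
train_x = [[0, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
train_y = ["BI-RADS 1", "BI-RADS 1", "BI-RADS 3", "BI-RADS 3", "BI-RADS 3"]
prediction = knn_predict(train_x, train_y, [0.2, 0.1], k=3)
```

When two classes receive the same number of votes, `Counter.most_common` returns the class that was encountered first, which here is the class of the nearer neighbor.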

3.7 Evaluative criteria

In this experiment, we evaluate model performance by drawing the ROC curve and the confusion matrix. When testing the classification model, we calculated the predicted class probabilities of the test samples. From these we derive the true-positive rate (TPR) and false-positive rate (FPR) at each threshold, use the TPR-FPR pairs to draw the ROC curve, and calculate the area under the ROC curve (AUC). The confusion matrix, also called the error matrix, is a standard format for presenting accuracy evaluation. The indexes derived from it reflect the accuracy of mammogram classification from different aspects. In the confusion matrix, all correct predictions lie on the diagonal, so the errors can be seen easily and intuitively. In this experiment, we use the following indicators for evaluation: precision, sensitivity, specificity, and F1 score. They are defined in Eqs. (4)–(7), where TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives, respectively.

$$Precision= \frac{TP}{TP+FP}$$
(4)
$$Sensitivity= \frac{TP}{TP+FN}$$
(5)
$$Specificity= \frac{TN}{FP+TN}$$
(6)
$$F1\;Score=2\cdot\frac{Precision\cdot Sensitivity}{Precision+Sensitivity}$$
(7)
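For a three-class problem, Eqs. (4)–(7) are evaluated per class by treating that class as positive and the other two as negative. The sketch below shows one way to do this from a confusion matrix. It assumes that rows are true classes and columns are predicted classes, and the example matrix is made up for illustration; it is not a result of this study.

```python
def per_class_metrics(confusion, cls):
    """Precision, sensitivity, specificity and F1 for class `cls` (one-vs-rest).

    Assumption: confusion[t][p] counts samples of true class t predicted as p.
    """
    n = len(confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(confusion[row][cls] for row in range(n)) - tp
    tn = sum(sum(row) for row in confusion) - tp - fn - fp

    precision = tp / (tp + fp) if tp + fp else 0.0      # Eq. (4)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0    # Eq. (5)
    specificity = tn / (fp + tn) if fp + tn else 0.0    # Eq. (6)
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0  # Eq. (7)
    return {"precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / sum(sum(row) for row in confusion)


# Hypothetical 3-class confusion matrix (illustrative values only).
example = [[5, 0, 0],
           [1, 3, 1],
           [0, 1, 4]]
metrics_class0 = per_class_metrics(example, 0)
```

The zero-denominator guards return 0.0 when a class is never predicted or never present, which is a convention chosen for this sketch.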

4 Results and discussion

4.1 Experimental setup

In this experiment, data preprocessing is implemented on the MATLAB R2016a platform, the LabelMe tool is used to outline the pectoral muscles, and the convolutional neural network is implemented using TensorFlow. All experiments are performed on an Intel(R) Core (TM) i5-7200U CPU @ 2.50 GHz 2.71 GHz machine. The software and hardware conditions of the experimental environment are shown in Table 4.

Table 4 Experimental environment

4.2 Experimental results

4.2.1 Preprocessing results

In this section, the classification results produced by different preprocessing methods are analyzed in detail. Table 5 shows the accuracy of classification on different classifiers using different preprocessing methods.

Table 5 Comparison of results of different image preprocessing methods

To find a preprocessing method that helps accurate classification, this experiment compares a variety of methods [1, 5, 31, 36]. Different filter window sizes and different center elements of the filter kernel produce images with different effects. Therefore, for the median filtering and Wiener filtering methods, we compare 3 × 3, 5 × 5, 7 × 7, and 11 × 11 windows; for the Laplacian filtering method, we compare kernels whose center element is −4 and −8. Most importantly, we processed the mammograms with the image negatives technique, which is suited to their characteristics. Experimentally, the INCIE method we created proved to be the most effective and better able to show the feature details in mammograms of different BI-RADS categories.

4.2.2 Data augmentation results

Since few mammograms are publicly available at present, we use data augmentation to expand the number of images in an attempt to improve generalization performance [24]. Based on the preprocessing experiment above, we selected the four best-performing groups of data in Table 5 for data augmentation. The data augmentation methods are translation, rotation, and flipping. As shown in Table 6, for each model the value on the left is the result after preprocessing only, and the value on the right is the result after data augmentation. The experiments show that applying data augmentation to the preprocessed images improves classification. In particular, the images processed with image negatives and CLAHE show the largest improvement on the KNN classifier, with accuracy increasing by 25%.

Table 6 Comparison of results before and after data enhancement

In addition, we found that many researchers also add noise when performing data augmentation for mammograms [54]. Therefore, we added noise to the best-performing data augmentation method in Table 6, and the classification accuracy of the KNN classifier increased by 2.5% (as shown in Table 7). The results show that, when classifying mammograms, adding noisy data helps improve the performance of the model.

Table 7 Comparison of results of different data augmentation methods

4.2.3 Comparison of KNN classifier with others

In this study, the KNN classifier was used to classify mammograms, and two current mainstream classification algorithms, SVM and RF, were selected for comparison in terms of precision, sensitivity, specificity, F1-score, and accuracy. The comparison results are shown in Table 8. For BI-RADS 1, the sensitivity of the KNN classifier is as high as 100.0%, which is 13.3% higher than that of the RF classifier and 20.0% higher than that of the SVM classifier. For BI-RADS 2, the precision of the KNN classifier reaches 91.7%, which is 35.4% and 29.2% higher than that of the RF classifier and the SVM classifier, respectively. For BI-RADS 3, the F1-score of the KNN classifier is 84.2%, whereas the results of the RF classifier and the SVM classifier are less satisfactory, at 42.9% and 62.5%, respectively. The results show that the KNN classifier performs better on all the evaluation criteria. The accuracy of the KNN classifier reaches 85.0%, which supports the choice of the KNN classifier (Table 8).

Table 8 Performance analysis of RF, SVM and KNN models

4.2.4 ROC curve and confusion matrix

In addition, we evaluate performance by drawing ROC curves and confusion matrices. In Fig. 7, (a), (b), and (c) show the ROC curves and confusion matrices of the classification results of RF, SVM, and KNN, respectively (‘0’ means BI-RADS 1, ‘1’ means BI-RADS 2, and ‘2’ means BI-RADS 3). According to Fig. 7, the micro-average AUC of the KNN classifier reaches 0.89, which is 0.1 and 0.02 higher than that of the RF classifier and the SVM classifier, respectively. The AUC values of the KNN classifier for BI-RADS 1, BI-RADS 2, and BI-RADS 3 are 0.92, 0.85, and 0.88, respectively. Compared with the RF classifier and the SVM classifier, the KNN classifier shows its largest AUC gain on BI-RADS 3, with increases of 0.19 and 0.05, respectively. The results show that KNN has a higher recognition rate for mammograms of the three BI-RADS categories, which further supports the KNN classifier as the best choice.

Fig. 7 ROC curves and confusion matrices. (a) ROC curves and confusion matrices of the RF classifier; (b) ROC curves and confusion matrices of the SVM classifier; (c) ROC curves and confusion matrices of the KNN classifier
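As a rough illustration of how an AUC value of this kind can be obtained, the sketch below builds an ROC curve from predicted scores and integrates it with the trapezoidal rule. The micro-average here pools the one-vs-rest decisions of all classes, which is a common convention; the text does not state how the micro-average was computed in this study, so that choice, the function names, and the example values are assumptions.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points for binary labels (1 = positive), one per score threshold.

    Requires at least one positive and one negative label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    pairs = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with the same score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def micro_average_auc(prob_rows, true_classes):
    """Pool every (sample, class) pair into one binary problem, then compute AUC."""
    scores = [p for row in prob_rows for p in row]
    labels = [1 if c == t else 0
              for row, t in zip(prob_rows, true_classes)
              for c in range(len(row))]
    return auc(roc_points(scores, labels))
```

For a per-class AUC, the same `roc_points` and `auc` functions can be applied to one column of the probability matrix with labels set to 1 for that class and 0 otherwise.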

4.3 Discussion

In this paper, we propose the INCIE image enhancement method to process mammograms according to their own characteristics. Experiments show that the proposed method provides more detailed feature information, which lays a good foundation for the subsequent classification. A side-by-side comparison of the three classifiers across the evaluation metrics shows that the RF classifier does not achieve ideal results for any of the categories. We believe that this poor performance is due to the small sample size: when the RF classifier is applied to mammograms with high feature similarity, its predictions for BI-RADS 3 are biased toward the categories with more samples. The evaluation metrics of the SVM classifier are slightly better than those of the RF classifier. Although SVM partly compensates for the weak performance of deep learning on small-sample datasets, its classification of BI-RADS 3 under small samples is still unsatisfactory. Therefore, we also evaluated the one-vs-the-rest (OVR) multiclass strategy, and the experimental results are shown in Fig. 8. The classification accuracy of the OVR-SVM model is improved by 2.5%, and the model is superior to the SVM model in classification accuracy and performance across the different types of breast disease. The KNN classifier achieves better results in all three breast classes, which shows that it is suitable for multi-class problems and has good classification performance and generalization ability for mammogram datasets with highly similar features. At the same time, the method helps radiologists to provide accurate diagnosis and effective interpretation to a certain extent and helps clinicians to provide fast and efficient analysis results, thus reducing the workload of doctors.

Fig. 8 ROC curves and confusion matrices of the SVM classifier using the OVR strategy

4.4 Study limitations

This study has some limitations. First, since the data in this study were collected from a local hospital and annotations differ between radiologists, the data do not generalize well. Second, the number of mammography images included in this study was not large enough, and the types were not comprehensive enough, to guarantee that the conclusions of this study hold for other images. Finally, the mammograms were scaled down to fit the available GPU; future studies can maintain the resolution of the original images, which would provide finer detail features and potentially improve performance.

5 Conclusion

One of the best approaches to the early detection of breast cancer is classification based on mammography images. It not only enables patients to receive more appropriate treatment options but also effectively avoids unnecessary surgery. In this study, intelligent classification was performed for early mammogram images of different grades. Since mammogram images are predominantly black, we compared multiple data preprocessing methods for the first time and, on that basis, created a new image enhancement framework, INCIE. It enhances the white and gray areas in the image to better show subtle differences between different categories of mammograms. In the experiments, considering the complexity of feature extraction and classification, we adopted the pre-trained ResNet-50 neural network model for feature extraction and evaluated the extracted features with three different classifiers. The results showed that the KNN classifier has the best classification performance, with an accuracy of 85% and an AUC of 0.89. This method is practical and reliable in the diagnosis of early breast disease.

As an applied study, this work can assist physicians in diagnosis while reducing the time and labor associated with traditional manual film reading. In future work, we will continue to collect more types of data from different institutions to build better-performing and more generalizable auxiliary diagnostic models. At the same time, we will combine mammograms taken from multiple views for analysis to maximize the significance of the early diagnosis of breast diseases (Appendix Tables 9 and 10).