1 Introduction

According to a report of the International Agency for Research on Cancer (IARC), presented by the World Health Organization (WHO) in 2012, breast cancer is the cancer that most commonly affects women, and it is considered the second deadliest disease in women [1]. Cancer arises from the abnormal growth of tumor cells followed by their invasion into the surrounding tissues. In general, tumors are classified as benign or malignant. The cells of a benign tumor are non-cancerous, whereas the cells of a malignant tumor are cancerous. Benign tumor cells grow locally and cannot spread to the surrounding tissues by invasion. Malignant tumor cells, on the other hand, can grow uncontrollably, invade the surrounding tissues and later spread to other parts of the body [2]. Among the screening methods used to diagnose breast cancer, mammography is the most efficient. In mammography, different views such as the mediolateral oblique (MLO) and craniocaudal (CC) views are used for a better understanding of the abnormalities present in the breasts. To classify an abnormality found in the breast as benign, normal or malignant, radiologists use either the MLO or the CC view to analyze breast lesion markers such as masses and micro-calcifications [3]. Interpreting mammographic images is not only time consuming but also requires the expertise of a highly experienced and trained radiologist [4]. For these reasons, computer-assisted diagnosis/detection (CAD) tools are nowadays in demand to automate medical image analysis.

In the last decade, deep learning has been a major breakthrough technology across many fields of research, and it has also proved its value in healthcare research. Different deep architectures [5] have been explored and used efficiently in breast cancer research, with promising results. Many researchers continue to work on computer-aided diagnosis of breast cancer. Arevalo et al. [6] proposed a hybrid Convolutional Neural Network (CNN) method that uses supervised learning to learn features, in combination with image-based handcrafted features. For mammographic tumor classification, Huynh et al. [7] applied transfer learning with a pre-trained AlexNet model, without fine-tuning it, and used a Support Vector Machine (SVM) as the back-end classifier.

In this paper, an attempt is made to develop an automated system trained with several convolutional neural network architectures on mammographic images from the Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM) and INbreast datasets. The investigations address the classification of breast cancer into benign and malignant classes. In the first approach, the models are trained from scratch; in the second approach, they are fine-tuned using transfer learning. The architectures considered are VGG19, ResNet50, MobileNet-v2, Inception-v3, Xception and Inception-ResNet-v2. Existing research has mostly focused on the accuracy achieved on various breast cancer databases by different machine learning approaches and proposed techniques. This work, in contrast, focuses on how different deep learning models behave on full mammogram images and region of interest (ROI) images for a particular database. The main contributions of this paper are as follows:

  • Utilization of transfer learning with fine-tuning and regularization approaches in breast cancer detection.

  • Analysis of the usefulness of various standard CNN models trained from scratch without modifying the deep architectures.

  • Assessment of the impact of the transfer learning approach on pre-trained convolutional neural networks using a regularization technique.

The paper is organized as follows: Sect. 2 reviews the related work. Section 3 describes the datasets. Section 4 elaborates the proposed methodology. Experimental results and discussion are presented in Sect. 5, and Sect. 6 concludes the paper.

2 Related Work

In the healthcare sector, recent advances in deep learning have the potential to improve the quality of medical care. Carneiro et al. [8] fine-tuned a CNN pre-trained on ImageNet to discriminate micro-calcifications and masses. The Breast Imaging-Reporting and Data System (BI-RADS) score is very helpful for classifying the breast cancer type, and the authors also used the BI-RADS score in their classification. For the classification of mammographic images, Levy and Jain [9] used transfer learning on two pre-trained models, AlexNet and GoogleNet. They compared the two networks, and AlexNet was found to give the superior results. Ting et al. [10] proposed a new network named Convolutional Neural Network Improvement for Breast Cancer Classification (CNNI-BCC). The proposed model was trained from scratch for breast cancer (BC) classification. The experiment was conducted on the MIAS database, and the regions of interest (ROIs) detected by a one-shot detector were fed to the model. Rampun et al. [11] trained a modified version of the AlexNet model and combined the predictions of the three best-performing models in an ensemble. Tsochatzidis et al. [12] carried out a detailed study comparing different architectures, namely AlexNet, VGG16, VGG19, ResNet50, ResNet101, ResNet152, GoogleNet and Inception-BN (v2), trained from scratch as well as fine-tuned on the DDSM-400 and CBIS-DDSM datasets. Arora et al. [13] proposed a deep ensemble transfer learning model, combined with a neural network classifier, for automatic feature extraction and discrimination of benign and malignant tumors.

3 Database Description

Two datasets are used in this research: the publicly available CBIS-DDSM dataset [14] and the INbreast dataset, which is obtained on request under a license agreement. Figure 1 shows sample images from the CBIS-DDSM and INbreast datasets.

CBIS-DDSM:

It is a subset of the DDSM [15] database and includes a total of 6775 studies. The mammographic images from DDSM were selected by trained and experienced radiologists and are provided in an updated and standardized form in CBIS-DDSM. All images were converted to DICOM format after lossless decompression. Along with the pathologic diagnosis information, the segmented region of interest (ROI) for the training data is also included in the database. Although the dataset is broadly organized by type of abnormality (mass and calcification), it can also be divided by subclass (benign and malignant).

INbreast:

The INbreast dataset contains a total of 410 digital mammographic images. Experienced radiologists interpreted the mammograms, and after the analysis each lesion captured in the images was assigned a standard score known as BI-RADS [16]. The BI-RADS scores, which range from 0 to 6, indicate different stages of abnormality found in the breast: score 0 denotes a non-conclusive examination; score 1 implies no findings; score 2 indicates benign findings; score 3 implies probably benign findings; score 4 indicates suspicious findings; score 5 indicates a high probability of malignancy; and score 6 indicates confirmed breast cancer. The INbreast dataset is not publicly available, but it can be obtained by request from [17].
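The score list above can be written down as a small lookup table. The text does not state how BI-RADS scores were converted into the benign and malignant labels used for classification, so the binary rule in the following sketch (scores 1 to 3 as benign, 4 to 6 as malignant, score 0 excluded) is an assumption based on one convention found in the literature, and the names `BI_RADS_MEANING` and `birads_to_label` are hypothetical.

```python
from typing import Optional

# Meaning of each BI-RADS score, following the list in the text.
BI_RADS_MEANING = {
    0: "non-conclusive examination",
    1: "no findings",
    2: "benign findings",
    3: "probably benign findings",
    4: "suspicious findings",
    5: "high probability of malignancy",
    6: "breast cancer",
}


def birads_to_label(score: int) -> Optional[str]:
    """Map a BI-RADS score to a binary class label.

    ASSUMPTION: scores 1-3 map to "benign" and scores 4-6 to "malignant";
    score 0 (non-conclusive) returns None so the image can be excluded.
    The paper does not state which rule was actually used.
    """
    if score not in BI_RADS_MEANING:
        raise ValueError(f"unknown BI-RADS score: {score}")
    if score == 0:
        return None
    return "benign" if score <= 3 else "malignant"


if __name__ == "__main__":
    for score, meaning in BI_RADS_MEANING.items():
        print(score, meaning, "->", birads_to_label(score))
```

Other groupings are possible (for example, treating score 3 separately), so the threshold should be adapted to the labelling protocol actually followed.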

Fig. 1. Example breast images: (a), (b), (c), (d) full and ROI images of mass and calcification cases from CBIS-DDSM; (e), (f) images from INbreast

4 Proposed Methodology

A key strength of deep learning architectures is that they can extract low-level to high-level features on their own. CNNs are among the most effective models for extracting deep features from images, and they learn feature representations automatically rather than relying on handcrafted features.

In this work, several pre-trained CNN architectures, namely VGG19, ResNet50, MobileNet-v2, Inception-v3, Xception and Inception-ResNet-v2, are examined. In the pre-processing step, all mammographic images are converted from DICOM to portable network graphics (PNG) format; PNG is chosen to avoid degradation of image quality. The pixel values are then normalized to the range [0, 1] so that high pixel values do not dominate the results. Two approaches are considered. In the first approach, the models are trained from scratch on both datasets. In the second approach, transfer learning is used.

Transfer learning can be applied to deep learning models in two ways. The first is transfer learning for feature extraction, in which all layers from the fully connected layer to the output layer are removed and the outputs of the remaining network, known as deep features, are extracted. The second is transfer learning with fine-tuning, in which some layers are frozen, the dimension of the fully connected layer is changed and a new architecture is built on top of the pre-trained model. Only transfer learning with fine-tuning is used in this paper. The aim of transfer learning is to apply the knowledge a network has learned on one problem to another, similar problem: the weights and bias values that the pre-trained network learned in order to obtain features are reused for the new problem. Accordingly, in the first approach the network is initialized with random weights, whereas in the second approach the network is initialized with pre-trained weights and its parameters are fine-tuned to obtain better performance.
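The normalization step of the pre-processing described above can be sketched as follows. The text only says that pixel values are scaled to the range 0 to 1; dividing by the largest value of the image bit depth (255 for 8-bit, 65535 for 16-bit images) is an assumption, and `normalize_pixels` is a hypothetical helper that works on a plain nested list rather than on a real PNG file.

```python
from typing import List, Sequence


def normalize_pixels(image: Sequence[Sequence[int]],
                     max_value: int = 255) -> List[List[float]]:
    """Scale integer pixel intensities to floats in [0, 1].

    ASSUMPTION: the scaling divides by `max_value`, the largest intensity
    the image format can hold. Values outside [0, max_value] are clipped
    first so that the result never leaves the [0, 1] range.
    """
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    return [
        [min(max(pixel, 0), max_value) / max_value for pixel in row]
        for row in image
    ]


if __name__ == "__main__":
    toy_image = [[0, 64, 128], [192, 255, 255]]  # made-up 8-bit pixel values
    for row in normalize_pixels(toy_image):
        print([round(value, 3) for value in row])
```

For 16-bit mammograms the same function would be called with `max_value=65535`.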

4.1 Fine-Tuning Using Transfer Learning with CNNs

Table 1 lists the CNN architectures employed in the classification task along with the fine-tuned parameters used for transfer learning.

Table 1. The CNNs and their fine-tuned parameters for transfer learning
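The difference between the two training scenarios comes down to how each layer is initialized and whether it is updated during training. The following framework-independent sketch records only that bookkeeping; it does not train a network. The layer names, the number of frozen layers and the `LayerPlan` structure are hypothetical and are not taken from Table 1.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class LayerPlan:
    name: str
    init: str        # "random" or "pretrained"
    trainable: bool  # False means the weights stay frozen during training


def plan_from_scratch(backbone: Sequence[str]) -> List[LayerPlan]:
    """Training from scratch: every layer starts random and is trained."""
    layers = [LayerPlan(name, "random", True) for name in backbone]
    return layers + [LayerPlan("classifier_head", "random", True)]


def plan_fine_tuning(backbone: Sequence[str], n_frozen: int) -> List[LayerPlan]:
    """Fine-tuning: the backbone starts from pre-trained weights and its
    first `n_frozen` layers are frozen; the remaining layers are trained."""
    if not 0 <= n_frozen <= len(backbone):
        raise ValueError("n_frozen must be between 0 and len(backbone)")
    layers = [
        LayerPlan(name, "pretrained", index >= n_frozen)
        for index, name in enumerate(backbone)
    ]
    # The original output layer is replaced by a new benign/malignant head,
    # which has no pre-trained weights and therefore starts random.
    return layers + [LayerPlan("classifier_head", "random", True)]


if __name__ == "__main__":
    blocks = ["block1", "block2", "block3", "block4"]  # hypothetical names
    for layer in plan_fine_tuning(blocks, n_frozen=2):
        print(layer)
```

In a deep learning framework the same plan would typically be realized by loading pre-trained weights, marking the frozen layers as non-trainable and attaching a new output layer sized for the two classes.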

5 Results and Discussion

To assess the performance of the models in different scenarios, two experimental settings are considered for the CBIS-DDSM dataset, which contains two types of abnormality: mass and calcification. In the first setting, the models are trained and tested separately on the calcification full mammogram images and on the calcification ROI images. In the second setting, training and testing are performed separately on the mass full mammogram images and on the mass ROI images to distinguish benign from malignant tumors. The performance measures used for evaluation are given in the following equations, where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively:

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} $$
$$ \text{Precision} = \frac{TP}{TP + FP} \tag{2} $$
$$ \text{Recall}/\text{Sensitivity} = \frac{TP}{TP + FN} \tag{3} $$
$$ \text{Specificity} = \frac{TN}{TN + FP} \tag{4} $$
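Eqs. (1) to (4) can be computed directly from the four confusion-matrix counts. The following is a minimal sketch; the function name `classification_metrics` and the counts in the usage example are made up for illustration and are not results from this paper.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute Eqs. (1)-(4) from confusion-matrix counts.

    A ratio with a zero denominator is returned as NaN instead of raising.
    """
    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),  # Eq. (1)
        "precision": ratio(tp, tp + fp),                # Eq. (2)
        "recall": ratio(tp, tp + fn),                   # Eq. (3), sensitivity
        "specificity": ratio(tn, tn + fp),              # Eq. (4)
    }


if __name__ == "__main__":
    # Hypothetical counts, with malignant taken as the positive class.
    for name, value in classification_metrics(tp=40, tn=35, fp=15, fn=10).items():
        print(f"{name}: {100 * value:.1f}%")
```

Which class counts as positive is a convention; treating malignant as positive makes recall the fraction of malignant cases that are detected.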

5.1 Analysis for CBIS-DDSM Dataset

Tables 2, 3, 4 and 5 report the results for the calcification full, calcification ROI, mass full and mass ROI mammogram images, respectively.

Table 2. Result set for calc-full mammogram images (in %)
Table 3. Result set for calc-ROI images (in %)
Table 4. Result set for mass-full mammogram images (in %)
Table 5. Result set for mass-ROI images (in %)

From the above results, the highest accuracy for the calc-full mammogram images is 74%, achieved by both the ResNet50 and Inception-v3 models. For calc-ROI images, the highest accuracy of 75% is achieved by the Xception model. For mass-full mammogram images, the highest accuracy of 75% is achieved by the MobileNet-v2 model, and for mass-ROI images the highest accuracy of 71% is obtained with the ResNet50 model. From Tables 2, 3, 4 and 5, it can also be concluded that the fine-tuned models outperform the same models trained from scratch.

5.2 Analysis for INbreast Dataset

The experimental results for the full mammogram images from the INbreast dataset are shown in Tables 6 and 7. The state-of-the-art models are evaluated using both the training-from-scratch approach and the fine-tuning approach. When the models are trained from scratch on the INbreast full mammogram images, the highest accuracy achieved is 82%, whereas with the fine-tuning approach the accuracy reaches 84%. Here too, the fine-tuned models outperform the models trained from scratch.

Table 6. Result set for training from scratch (in %)
Table 7. Result set after fine tuning models (in %)

The state-of-the-art method [12] also exploited deep features for breast cancer diagnosis using from-scratch and fine-tuning approaches on the CBIS-DDSM dataset. It achieved an accuracy of 66% with a ResNet-101 model trained from scratch, whereas on the same dataset ResNet-152 achieved an accuracy of 76% with fine-tuning. This paper extends that work to other pre-trained networks and compares their performance to identify which network improves accuracy. For the CBIS-DDSM dataset, Inception-ResNet-v2 achieved an accuracy of 76% when trained from scratch, and MobileNet-v2 and Xception achieved 77% accuracy with the fine-tuning approach. From the comparison of the obtained results with the state-of-the-art method, it can also be concluded that the fine-tuning approach performs better than training from scratch.

6 Conclusion

This paper applies deep convolutional neural networks with transfer learning to computer-aided breast cancer classification from mammographic images. The performance of various networks is analyzed on two mammogram databases, CBIS-DDSM and INbreast. Two training scenarios are considered. In the first scenario, training is carried out from scratch, with the network weights initialized from a random distribution. In the second scenario, fine-tuning is performed using transfer learning, with training initiated from already learned network weights. Both approaches are applied to the VGG19, ResNet50, MobileNet-v2, Inception-v3, Xception and Inception-ResNet-v2 models. Experimental results show that the fine-tuned models outperform the models trained from scratch, with accuracies of 75% and 83% for CBIS-DDSM and INbreast, respectively.