1 Introduction

Breast cancer is the most common form of cancer in women [1]. The American Cancer Society [2] reported that breast cancer caused the deaths of 40,610 women and 460 men in 2017 in the USA. Another report by the American Cancer Society reveals that 1 in 8 women is diagnosed with invasive breast cancer and 1 in 39 women dies from the disease [3]. In developed countries such as the USA and UK, the breast cancer survival rate is relatively high, but in poor and developing countries such as India, survival rates are much lower; the major reasons are lack of awareness, delayed diagnosis, and costly screening. Although survival rates have increased with the advancement of technology, the number of cases has also risen over the years. Invasive ductal carcinoma (IDC) constitutes about 80% of all breast cancer cases [4]. Doctors use various techniques to detect IDC, such as physical examination, mammography, ultrasound, breast MRI, and biopsy. A biopsy is often performed to follow up a suspicious mammogram: a pathologist examines the abnormal-looking tissue under a microscope. Such an examination requires considerable time, skill, and precision. Therefore, over the years, many computer-aided systems have been developed to make the work of a pathologist easier and more efficient.

Since the advancements in machine learning and image processing, much research has been oriented toward creating better and more efficient models to detect breast cancer. Machine learning is a field of computer science that uses data or past experience to predict outcomes for unseen data [5,6,7,8, 37]. Multiple methods have been applied to create efficient machine learning models for detecting breast cancer. Classification using standard machine learning techniques often requires feature selection. Some studies used image segmentation, such as thresholding, along with an SVM classifier to create a classification model [9,10,11,12]. Preprocessing is an important task in machine learning and depends on the type of data. Some studies used mammography images, for which a common preprocessing step was cropping and resizing [9,10,11,12]. Different studies used different image preprocessing techniques before training the classifier: adaptive histogram equalization followed by segmentation, image masking, thresholding, feature extraction, and normalization [9]; morphological operations followed by an SVM classifier [13]; and high-pass filtering followed by a clustering algorithm [14]. Other preprocessing and classification pipelines involved median filtering and fuzzy C-means clustering with thresholding [15], region-based image segmentation followed by an SVM classifier [16], and Otsu-based global thresholding followed by a radial basis function neural network for classification [17].

Later advancements in deep learning and the availability of high-performance GPUs led to multiple studies in this sub-domain of machine learning. Deep neural networks do not require manual feature selection or extensive image preprocessing: these networks, loosely modeled on neurons in the human brain, learn features automatically [18]. The use of neural network architectures in breast cancer classification has led to state-of-the-art results. Convolutional neural networks (CNNs) have proved to be among the most effective models for images and videos. A CNN preserves the spatial structure of the data while learning, which helps it achieve higher accuracy than traditional machine learning methods [19]. Therefore, multiple studies on breast cancer detection revolve around convolutional neural networks [20,21,22,23,24]. Some studies combined deep neural architectures for feature extraction with standard machine learning classifiers such as SVM for the classification task [22]. Other approaches rely on transfer learning, a technique that reuses the knowledge of a network trained on a different dataset and adapts it to new data. Various studies have used ResNet50, VGG16, VGG19, and similar models for breast cancer detection and have shown impressive results [25,26,27,28]. The studies mentioned in [29] also used CNNs and deep learning architectures with transfer learning to classify invasive ductal carcinoma; with their proposed methods, they achieved an accuracy of 85.41%. We worked to improve these results.

Many of these approaches used a multi-class dataset containing more than one class of breast cancer, and most were applied to mammographic datasets. Mammography can detect abnormalities in the tissue, which must then be confirmed by biopsy. Our approach uses a binary classification dataset that addresses the problem of invasive ductal carcinoma; effective classification on this dataset will help ease the task of the pathologist. In this paper, we approach the problem using various deep learning architectures through transfer learning. We applied image undersampling to handle the imbalanced dataset and image augmentation to increase the accuracy of the model by providing suitable image transformations. The objectives of this paper are (a) to compare the performance of various well-known convolutional deep learning architectures through transfer learning and achieve high efficiency in less computational time on the given dataset, and (b) to improve on previous works in the IDC classification task.

2 Approach

In our approach, we used transfer learning and image augmentation on the given dataset. The following sections describe the steps applied to the dataset to detect invasive ductal carcinoma.

2.1 Undersampling and Data Preparation

In medical imaging, class imbalance is a common problem: there are generally more images of negative results than positive results for a disease. Similarly, in our dataset, there is a huge imbalance between the IDC(−) and IDC(+) classes. Such imbalanced data could mislead the model into learning one class better than the other. Therefore, we used resampling techniques to create balanced data.

2.2 Image Augmentation

Image augmentation is a technique used to generate more data by applying certain transformations to the images. It helps create unseen yet valuable data, which can further increase accuracy. Common image augmentation techniques are zooms, flips, rotations, etc. These techniques help the model learn the variation in the dataset and not become constrained to one particular format; augmentation is also a powerful technique for addressing a lack of data. We applied image augmentation techniques to add variation to the dataset, which could further increase the efficiency of the model.

2.3 Transfer Learning

Transfer learning is a technique of reusing models that were trained on a different dataset, with some tuning, on your own data. We used various state-of-the-art CNN architectures that were trained on the ImageNet dataset. The idea is that the early layers of a CNN capture generic features such as texture and shape that are largely independent of the dataset; therefore, reusing those trained weights and fine-tuning the network on our data could help us achieve good results. The deep learning architectures we used are presented in Sect. 3.
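As an illustration, the following is a minimal Keras sketch of this setup; the base model, head layers, and optimizer shown here are illustrative, and the exact heads we used are described in Sect. 4.3.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Load a CNN pre-trained on ImageNet without its classification head;
# include_top=False keeps only the convolutional feature extractor.
base = tf.keras.applications.VGG16(include_top=False,
                                   weights="imagenet",
                                   input_shape=(50, 50, 3))
base.trainable = False  # freeze the pre-trained weights

# Add a small task-specific head for the binary IDC classification.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="softmax"),  # IDC(-) vs IDC(+)
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```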

3 Deep Learning Architectures

The various deep learning architectures used are described below. During training, each architecture was fine-tuned to give the most efficient solution, and a validation set was used to prevent overfitting. The weights were stored when the minimum loss was found, and the same weights were used to test the model's efficiency on the test set. All the following architectures were originally trained on the ImageNet dataset [30] and achieved remarkable results.

3.1 VGG16 and VGG19

VGG16 and VGG19 are sequential CNN models with 3 × 3 convolutional layers stacked upon one another. The architecture contains max-pooling layers that reduce the spatial volume as depth increases, followed by fully connected layers with 4096 nodes and a final 1000-node layer with a softmax activation function [31]. VGG16 and VGG19 are slow to train, and their weights are large.

3.2 ResNet50

ResNet50, unlike the VGG models, is a non-sequential model. It is a stack of convolutional layers with residual (skip) connections added: the output of each convolutional block is added to that block's input [32]. ResNet50 contains 50 weight layers and is faster to train than the VGG networks.

3.3 DenseNet

DenseNet introduces an advancement over residual networks: instead of adding feature maps, it concatenates output feature maps with input feature maps. Each layer's output is concatenated with the outputs of all previous layers, creating a densely connected architecture [38].
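The contrast between the two connection styles can be illustrated with the Keras functional API; the layer shapes below are arbitrary and chosen only for the sketch.

```python
from tensorflow.keras import Input, layers

inputs = Input(shape=(32, 32, 64))
conv_out = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)

# ResNet-style residual connection: the block output is ADDED to its input
# (shapes must match, so the channel count stays at 64).
residual = layers.Add()([inputs, conv_out])

# DenseNet-style dense connection: the output is CONCATENATED with the
# input along the channel axis (64 + 64 = 128 channels), so later layers
# see all earlier feature maps directly.
dense = layers.Concatenate()([inputs, conv_out])
```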

3.4 EfficientNet

This model performed so well on the ImageNet dataset that it achieved 84.4% top-1 accuracy, surpassing the previous state of the art with roughly ten times better efficiency; the model was both smaller and faster. Width, depth, and image resolution were scaled jointly to find the best result [33]. EfficientNet has shortcut connections directly between the bottlenecks and uses fewer channels than its expansion layers.

3.5 MobileNet

MobileNet is a lightweight model. Instead of performing a standard convolution across all color channels at once, it uses depthwise separable convolutions: a convolution applied to each channel separately, followed by a pointwise convolution that combines the results [34]. This makes MobileNet ideal for mobile devices.
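A minimal Keras sketch of a depthwise separable convolution, the building block behind this design; the input shape and filter count are illustrative.

```python
from tensorflow.keras import Input, layers

inputs = Input(shape=(50, 50, 3))

# Depthwise step: one 3x3 filter is applied to each input channel separately.
x = layers.DepthwiseConv2D(kernel_size=3, padding="same")(inputs)

# Pointwise step: 1x1 convolutions mix the per-channel outputs into new
# feature maps; together the two steps replace one standard convolution
# at a fraction of the multiply-add cost.
x = layers.Conv2D(filters=32, kernel_size=1, activation="relu")(x)
```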

4 Experiment and Results

4.1 Datasets

The dataset from which our data was derived consists of 162 whole-mount slide images of breast cancer specimens scanned at 40× [35, 36]. The derived dataset contains 25,633 positive and 64,634 negative samples. Each image is a 50 × 50 patch extracted from the original slides (Fig. 1). The image filename has the form z_xX_yY_classD.png, for example, 55634_idx8_x1551_y1000_class0.png, where z is the patient ID (55634_idx8), X and Y are the x- and y-coordinates from which the patch was cropped, and D indicates the class: class0 is IDC(−) and class1 is IDC(+).
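Because the label is encoded in the filename, it can be recovered by simple string parsing. A minimal sketch follows; the helper name is ours.

```python
import os

def label_from_filename(path):
    """Extract the IDC class from a patch filename such as
    55634_idx8_x1551_y1000_class0.png (0 = IDC(-), 1 = IDC(+))."""
    name = os.path.basename(path)  # drop any directory components
    # Take the digits between "class" and the ".png" extension.
    return int(name.split("class")[-1].split(".")[0])

assert label_from_filename("55634_idx8_x1551_y1000_class0.png") == 0
```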

Fig. 1 Sample images from the dataset

4.2 Data Preparation

We performed undersampling on our dataset, i.e., removing samples from the majority class to make it more balanced (Fig. 2). The undersampling was done at random without replacement to create a subset of the data for the target classes.
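A minimal sketch of this undersampling step, assuming the patch paths are gathered from a directory layout like the one shown; the directory name and seed value are our assumptions.

```python
import glob
import random

random.seed(42)  # seed value is our choice, for reproducibility

# Collect patch paths per class (the directory layout is an assumption).
positive_files = glob.glob("IDC_patches/**/*class1.png", recursive=True)
negative_files = glob.glob("IDC_patches/**/*class0.png", recursive=True)

# Undersample at random WITHOUT replacement: random.sample never
# picks the same element twice, so no patch is duplicated.
n = min(len(positive_files), len(negative_files))
balanced = random.sample(positive_files, n) + random.sample(negative_files, n)
random.shuffle(balanced)
```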

Fig. 2 Final distribution of classes after undersampling

The final distribution contains 25,367 positive and 25,366 negative samples. The data was split into train, test, and validation sets containing 31,393, 7849, and 11,490 files, respectively. The class names were extracted from the filenames and one-hot encoded, i.e., the binary class was represented by two features instead of one: class0 was represented by the array [1, 0] and class1 by [0, 1], so that the deep learning model outputs a probability for each index of the array, and the index with the maximum probability is the predicted class. After this, image augmentation techniques were applied, such as zoom, rotation, width shift, height shift, shear, and flip.
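The one-hot encoding described above corresponds to Keras's to_categorical utility; a toy example:

```python
from tensorflow.keras.utils import to_categorical

labels = [0, 1, 1, 0]  # 0 = IDC(-), 1 = IDC(+)
one_hot = to_categorical(labels, num_classes=2)
# one_hot -> [[1, 0], [0, 1], [0, 1], [1, 0]]
# The argmax over the model's two output probabilities recovers the class.
```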

Several deep learning model architectures were then used for transfer learning. They were fine-tuned to adapt their domain to our dataset.

4.3 Results

The experiments were performed on a Kaggle Notebook using the Keras API of TensorFlow with an NVIDIA K80 GPU. We used various well-known deep learning architectures for transfer learning (VGG16, VGG19, ResNet50, DenseNet169, DenseNet121, DenseNet201, MobileNet, and EfficientNet). All these models were trained on the ImageNet dataset, on which they have achieved high accuracy over the years. We used only the CNN part of these architectures and then added layers according to the complexity of the dataset. The first set of experiments was conducted without any data augmentation, and then data augmentation was added to observe the change. The evaluation metrics used to compare results were F1-score, recall, precision, accuracy, specificity, and sensitivity; they evaluate the model in terms of true positives, true negatives, false positives, and false negatives.
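In terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), these metrics are defined as follows:

$$\mathrm{Precision} = \frac{TP}{TP+FP}, \qquad \mathrm{Recall\ (Sensitivity)} = \frac{TP}{TP+FN},$$

$$\mathrm{Specificity} = \frac{TN}{TN+FP}, \qquad \mathrm{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN},$$

$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$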

DenseNet169 was trained for 60 epochs with a global pooling layer and two dense layers with ReLU activation, followed by dropout layers of 0.15 and 0.25, respectively, added after the frozen CNN layers. The last layer was a dense layer with two classes and a softmax function. DenseNet121, ResNet50, VGG19, and VGG16 were trained for 60, 60, 100, and 100 epochs, respectively, with a different head than DenseNet169: we added batch normalization (epsilon 1e-05, momentum 0.1) followed by a dense layer with 512 nodes and a dropout of 0.45, and finally a dense layer with 2 nodes (one per class) with softmax activation. The experiment results without data augmentation can be seen in Table 1; the highest value in each column is highlighted in bold.

Table 1 Results without data augmentation
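A sketch of the classifier head used for DenseNet121, ResNet50, VGG19, and VGG16, shown here on DenseNet121; the global pooling step before the head is our assumption, as the description above does not state how the convolutional output was flattened.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.DenseNet121(include_top=False,
                                         weights="imagenet",
                                         input_shape=(50, 50, 3))
base.trainable = False  # frozen CNN layers

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),  # assumed pooling step
    layers.BatchNormalization(epsilon=1e-05, momentum=0.1),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.45),
    layers.Dense(2, activation="softmax"),  # one node per class
])
```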

Data augmentation was then added to push accuracy further by introducing more varied images into the dataset. The augmentation settings were general-purpose: a zoom range of 0.3; rotation range, width shift range, shear range, and height shift range all set to 20; and horizontal flip enabled (a sketch of this configuration follows). With this augmentation, all previous models were used with the SGD optimizer and a head consisting of a global pooling layer followed by 32-node and 64-node layers with 0.15 and 0.25 dropout, respectively, and ReLU activation. Results can be seen in Figs. 3 and 4.
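A sketch of this augmentation configuration using Keras's ImageDataGenerator, with the values as reported above; note that in Keras, shift values of 1 or more are interpreted in pixels rather than as a fraction of the image size.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    zoom_range=0.3,
    rotation_range=20,       # degrees
    width_shift_range=20,    # interpreted in pixels (value >= 1)
    height_shift_range=20,   # interpreted in pixels (value >= 1)
    shear_range=20,          # shear angle in degrees
    horizontal_flip=True,
)
```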

Fig. 3 Precision and recall versus fine-tuned models

Fig. 4 Confusion matrix of DenseNet169 with data augmentation

The results of these experiments can be seen in Table 2.

Table 2 Results with data augmentation

By using different deep architectures for transfer learning, we were able to achieve higher accuracy and recall than previously reported methods. Table 3 compares our best model's efficiency with the efficiency claimed by previous works.

Table 3 Our score versus existing deep learning approaches

We present the F-score as the evaluation metric for comparison with previous methods. Our approach improved the F-score by 10.2% compared with the latest research.

5 Conclusion

We classified invasive ductal carcinoma (IDC) using deep learning. In our study, we took advantage of various pre-trained models, reusing their knowledge and fine-tuning them to obtain an efficient model. We first applied undersampling techniques to balance the classes, then image augmentation, followed by transfer learning. We obtained a high-precision, moderate-recall model with VGG19 and image augmentation: precision was 94.46 and recall was 78.51. We achieved an accuracy of 86.97% with DenseNet121, compared with the 85.41% accuracy reported by the latest research [29].

Hence, we can conclude that transfer learning, one of the simplest techniques available in deep learning frameworks, can be used to detect dangerous diseases such as cancer. We have also shown that deep learning performs well in detecting IDC and is therefore an improvement over manual examination. Deep learning gives us the ability to detect the disease on smaller datasets with small images, from which only deep learning models can infer useful information. A larger dataset should improve the results; hence, in the future, we plan to work on the full dataset with full-size slide images and with more advanced techniques such as GANs to obtain better results.