
1 Introduction

Breast cancer is the most common cancer in women worldwide and is associated with high morbidity and mortality [1]. Diagnosis based on histopathological images under the microscope is one of the gold standards in clinical practice. With the development of imaging sensors, histopathological slides can be scanned and saved as digital images. Since the digital image size increases dramatically with magnification, it is desirable to develop image processing and analysis tools, e.g. classification, for computer-aided diagnosis (CAD) of breast cancer.

Hand-crafted features, such as the scale-invariant feature transform (SIFT), histogram of oriented gradients (HOG), gray-level co-occurrence matrix, and kernel methods, have been reported for recognition and classification tasks in breast cancer histopathological image analysis, as have well-known classifiers such as the support vector machine (SVM). Recently, deep learning methods, for example convolutional neural networks (CNN), have received more attention and achieved impressive performance in many histopathological image processing tasks for breast cancer research, including recognition, classification and segmentation [2]. Chen et al. [3] detected cell mitosis in breast histology images using a deep cascaded CNN, which dramatically improved detection accuracy over other methods in the 2014 ICPR MITOS-ATYPIA Challenge. Wang et al. [4] used a 27-layer CNN for breast cancer metastasis detection and won first place in the Metastasis Detection Challenge of ISBI 2016. Spanhol et al. [5] classified benign and malignant breast cancer pathological images with AlexNet [6], obtaining results 6% higher than traditional machine learning classification algorithms. Bayramoglu et al. [7] applied deep learning to magnification-independent breast pathology image classification, reaching a recognition rate of 83%. Spanhol et al. [8] assessed breast cancer recognition using deep (DeCAF) features extracted from a pre-trained CNN, increasing the accuracy to 89%. Wei et al. [9] proposed a novel breast cancer histopathological image classification method based on deep convolutional neural networks, named the BiCNN model, achieving a higher classification accuracy (up to 97%).

The reported state-of-the-art methods rely heavily on large-scale labeled data for training. However, from the perspective of real-world applications, large-scale labeling of medical images is tedious and extremely expensive, since strong professional expertise is required compared with annotating natural images. Very few works have attempted to reduce the labeling burden for this task. We previously proposed a deep domain adaptation method based on PCANet with a domain alignment operation, which reduces the labeling cost by transferring knowledge from a source dataset to the target one [10], and we introduced self-taught learning into PCANet for the same purpose [11]. However, in these previous works the labeled training images are still randomly selected.

In this work, we improve the deep learning pipeline for the classification of breast cancer histopathological images with a deep active learning framework. Instead of random selection, active learning methods typically select the samples with the lowest confidence (highest entropy) as the most valuable ones, add them to an annotation query, and then fine-tune the network incrementally [12]. In the proposed method, inspired by boosting, the query strategy is further improved: samples with both high and low confidence are considered simultaneously to obtain a confidence boosting effect. We argue that the network should be fine-tuned with additional supervision and with the regularization from its previous state simultaneously. The contributions of this work can be summarized as follows: (1) the labeling cost is reduced compared with random selection; (2) the method outperforms the standard active learning query strategy thanks to the entropy boosting effect.

2 Proposed Method

As a topic in machine learning, active learning seeks the most informative samples from a large pool of unlabeled data and submits them to an annotation query, in order to reduce the labeling effort. We introduce this idea into our method to reduce the labeling cost required by deep learning for breast cancer pathological image classification. First, the network is initialized with a very small set of randomly selected labeled data. Second, the key problem in active learning is how to define the criterion of a 'valuable' sample. In the standard query strategy, this 'worthiness' is usually measured by the entropy of the prediction of the deep architecture:

$$ e_{i} = - \sum_{k = 1}^{Y} p_{i}^{j,k} \log \left( p_{i}^{j,k} \right) $$
(1)

where \( p_{i}^{j,k} \) is the confidence (predicted probability) that network \( M_{j} \) assigns to sample \( x_{i} \) for class \( k \), and \( Y \) is the number of categories. Entropy captures the uncertainty of the classification system in each prediction: a larger entropy value denotes higher uncertainty. In the standard query strategy, active learning methods select high-entropy samples for the annotation query until the query is full, and the network is then fine-tuned incrementally with the newly labeled samples.
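As a minimal illustration (not the authors' released code), the per-sample entropy of Eq. (1) can be computed directly from the softmax confidences; the function name and array shapes below are assumptions made for the sketch.

```python
import numpy as np

def prediction_entropy(probs: np.ndarray) -> np.ndarray:
    """Entropy of each prediction, following Eq. (1).

    probs: (num_samples, Y) array of softmax confidences p_i^{j,k}
           produced by the current network M_j.
    Returns one entropy value e_i per sample.
    """
    eps = 1e-12                                  # avoid log(0) for saturated confidences
    return -np.sum(probs * np.log(probs + eps), axis=1)

# A confident prediction has low entropy; an uncertain one has high entropy.
p = np.array([[0.98, 0.02],    # confident  -> small e_i (about 0.098)
              [0.55, 0.45]])   # uncertain  -> large e_i (about 0.688)
print(prediction_entropy(p))
```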

In the proposed work, we believe the evolution of the network should be driven by two factors simultaneously: the additional supervision from manual labeling and the regularization from the previous network. Thus, in the proposed query strategy, inspired by the idea of boosting, samples with high entropy values and samples with low entropy values are both considered to obtain a boosting effect. Note that the samples with high confidence (low entropy) are labeled by the previous network instead of by manual annotation, so there is no additional labeling cost compared with the standard active learning query strategy. The algorithm is detailed as follows.

Algorithm 1. Deep active learning with the proposed entropy-boosting query strategy (described below).

As shown in Algorithm 1, let \( B \) denote the whole dataset with \( n_{B} \) images, divided into a training set \( B_{train} \) and a test set \( B_{test} \). The CNN model, denoted \( M_{0} \), is initialized with \( n_{i} \) randomly selected samples per category, where \( n_{i} \) is set to a very small value, for example two. For convenience, \( B_{train} \) is divided into labeled data \( B_{l} \) and the remaining unlabeled data \( B_{u} \), whose sizes are \( n_{l} \) and \( n_{u} \) respectively, with \( n_{l} + n_{u} = n_{B} \). Both \( n_{l} \) and \( n_{u} \) change during the incremental learning, since the main idea of active learning is to move the most valuable samples from \( B_{u} \) to an annotation queue \( A_{t} \) for manual annotation. In each query round the network is fine-tuned with all the labeled samples in \( A_{t} \); \( n_{l} \) then becomes \( n_{l} + n \) and \( n_{u} \) becomes \( n_{u} - n \), where \( n \) is the number of manually annotated samples in \( A_{t} \). A widely used criterion is to select the \( n/2 \) samples with the highest entropy values in each category. The number of queries is set to \( T_{a} \), so the fine-tuned network after each query is denoted \( M_{j} \), \( j = 0, \cdots , T_{a} \). In the proposed work, the network in each query round is fine-tuned with samples of both high and low entropy: besides the \( n \) manually annotated samples, \( A_{t} \) contains \( m \) additional samples with the lowest entropy values in each category, whose labels are auto-labeled by the previous network.
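The query strategy above can be sketched as follows. This is an illustrative reconstruction under the notation of Algorithm 1 for the binary case (Y = 2); the helper name `build_query`, the grouping by predicted class (the true category of an unlabeled sample is unknown at query time), and the tie handling are assumptions, not the authors' implementation.

```python
import numpy as np

def build_query(probs, unlabeled_ids, n, m):
    """Form the queue A_t for one query round (binary case, Y = 2).

    probs:         (n_u, 2) softmax confidences of the previous network M_j on B_u.
    unlabeled_ids: (n_u,) array of sample indices matching the rows of probs.
    Returns the ids of n high-entropy samples (n/2 per class) sent for manual
    labeling, and (id, pseudo_label) pairs for the m lowest-entropy samples of
    each class, auto-labeled by M_j at no manual cost.
    """
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)   # Eq. (1)
    pseudo = np.argmax(probs, axis=1)                          # label guessed by M_j

    manual_ids, auto_pairs = [], []
    for k in (0, 1):                                           # benign / malignant
        idx = np.where(pseudo == k)[0]
        order = idx[np.argsort(entropy[idx])]                  # ascending entropy within class k
        manual_ids.extend(unlabeled_ids[order[-(n // 2):]])    # most uncertain -> ask the oracle
        auto_pairs.extend((unlabeled_ids[i], k) for i in order[:m])  # most confident -> keep M_j's label
    return manual_ids, auto_pairs
```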

3 Experiment

3.1 Dataset Description

The proposed framework is evaluated on a public dataset of breast cancer histopathological images, BreaKHis [13]. This large-scale dataset contains 7909 images from 82 breast cancer patients. The images are divided into benign and malignant tumors and are acquired at four magnification factors: 40X, 100X, 200X, and 400X. Each pathological image is 700 × 460 pixels in RGB format. Sample images from the dataset are shown in Fig. 1.

Fig. 1. Breast cancer histopathological image samples from BreaKHis (top: benign; bottom: malignant).

3.2 Implementation Details

The proposed algorithm is implemented with the TensorFlow framework. The basic CNN architecture is AlexNet pre-trained on ImageNet [14]. The server is equipped with an Intel 2.2 GHz CPU and an NVIDIA GeForce GTX 1080Ti GPU. The dataset is randomly divided into training data (70%) and testing data (30%) with no overlap, and the size of each category is balanced in both sets. The proposed work is evaluated on image-level binary classification, i.e. each image is predicted as benign or malignant. Since the two categories are balanced, classification accuracy is used as the validation metric:

$$ \text{Image-level accuracy} = \frac{N_{c}}{N_{im}} $$
(2)

where \( N_{im} \) is the total number of images in the dataset and \( N_{c} \) is the number of images that are correctly classified.
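For concreteness, a minimal sketch of Eq. (2) with illustrative label arrays:

```python
import numpy as np

def image_level_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Eq. (2): fraction of images whose predicted label matches the ground truth."""
    n_correct = int(np.sum(y_true == y_pred))    # N_c
    return n_correct / len(y_true)               # N_c / N_im

# Illustrative labels (0 = benign, 1 = malignant): 3 of 4 correct -> 0.75.
print(image_level_accuracy(np.array([0, 1, 1, 0]), np.array([0, 1, 0, 0])))
```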

The network is initialized with one benign sample and one malignant sample randomly selected from the training data. Each experiment runs 5 query rounds, with a manual-labeling query size of \( N_{m} \) per round. After each query the network is fine-tuned incrementally with 64 labeled images, of which 48 are manually labeled and the other 16 are auto-labeled by the previous network.
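The schedule above can be sketched as a toy, self-contained loop. A logistic-regression classifier on synthetic features stands in for the pre-trained AlexNet purely to keep the example runnable; `build_query` is the function sketched in Sect. 2, and the numbers follow the setting above (5 rounds, 48 manual plus 16 auto-labeled samples per round). All data, helper names and the pool bookkeeping are assumptions for illustration, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 64))                    # stand-in image features
y = (X[:, 0] > 0).astype(int)                      # stand-in labels: 0 = benign, 1 = malignant

labeled = [int(np.argmax(y == 0)), int(np.argmax(y == 1))]   # init: one sample per class
pool = np.array([i for i in range(len(y)) if i not in labeled])
model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])   # stands in for M_0

for t in range(5):                                 # T_a = 5 query rounds
    probs = model.predict_proba(X[pool])           # confidences of M_t on B_u
    manual_ids, auto_pairs = build_query(probs, pool, n=48, m=8)   # 48 oracle + 16 pseudo-labels
    auto_ids = [i for i, _ in auto_pairs]
    labeled += [int(i) for i in manual_ids]        # only oracle-labeled samples stay in B_l
    fit_X = np.vstack([X[labeled], X[auto_ids]])
    fit_y = np.concatenate([y[labeled], [lab for _, lab in auto_pairs]])
    model.fit(fit_X, fit_y)                        # "fine-tune" M_t -> M_{t+1}
    used = set(labeled) | set(int(i) for i in auto_ids)
    pool = np.array([i for i in pool if int(i) not in used])   # sketch-level pool update
```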

3.3 Experiment Result

Experimental results for the four magnification factors are shown in Fig. 2 and Table 1. Both the standard deep active learning method and the proposed framework consistently outperform incremental learning with random selection in all experiments. Deep active learning can save up to 52% of the labeling cost compared with random selection while achieving similar accuracy, which demonstrates that, for real-world applications, the proposed framework is a better option for recognition tasks with deep learning methods. Moreover, the proposed method also outperforms the deep active learning method that queries only high-entropy samples.

Fig. 2. Performance of entropy-based active learning, random selection and the proposed method over 5 query rounds.

Table 1. Annotation cost of the proposed method, random selection and entropy-based active learning at similar accuracies. The cost is the number of labeled samples, i.e. the annotation effort.

4 Conclusions

We have proposed a deep active learning framework for histopathological image analysis in breast cancer research. The main purpose of the work is to reduce the tedious labeling burden when deep learning methods are used in medical applications. Instead of randomly selecting samples for annotation, the framework actively seeks the most valuable unlabeled data for manual labeling and then fine-tunes the network incrementally. In addition, we improve the query strategy with a confidence boosting operation, where samples predicted with both high and low confidence are used for network training in each query round. The high-confidence samples are auto-labeled by the network, so there is no additional manual labeling cost compared with standard active learning methods. Experimental results on a large breast cancer histopathological image dataset demonstrate that the proposed method significantly reduces the labeling cost compared with random selection and achieves higher accuracy than the standard query strategy.