1 Introduction

Breast cancer is the most common cancer in women worldwide and the second leading cause of cancer deaths in women, after lung cancer [1]. One of the common approaches to cancer detection is histopathology: the microscopic examination of histological sections of a biopsy sample by a pathologist in order to study the effects of a particular disease. Computer-aided approaches based on image analysis and machine learning can be incorporated into digital pathology to achieve quick and reproducible results. Computer-aided diagnostic (CAD) models are also important as they assist pathologists in locating and identifying abnormalities in breast tissue images.

A concern in histopathology image based assessment is the variation in color that arises due to a number of factors, such as differing chemical reactivity of stains from different manufacturers, differences in the color responses of slide scanners, or light transmission varying with slide thickness. This variability only partially limits the interpretation of images by pathologists, but it leads to a large variation in the efficiency of automated image analysis algorithms. The problem can be partially mitigated by using standardized staining protocols and automated staining machines. Stain normalization algorithms [2,3,4] have also recently been introduced to address stain variation, with the aim of matching the stain colors of a whole slide image to those of a given template.

One of the naive options to deal with color constancy is to convert the color image to grayscale. In [5], the authors extracted features from a grayscale version of a query image. However, converting an image to grayscale yields only the average of the concentrations of the tissue components and does not reveal their relative amounts. Further, it does not make effective use of the color information that is present. Recent research in histopathology has confirmed that color information is quite significant for quantitative analysis.

The outstanding ability of a pathologist to identify stain components is not only due to the utilization of color information but also because of incorporating the spatial dependency of tissue structures. The use of standardized staining protocols and automated staining machines may improve staining quality by yielding a more accurate and consistent staining. However, eliminating all the underlying sources of variation is infeasible.

Indeed, methods that investigate the importance of staining in conjunction with a classification framework have also been developed. In [6], the authors investigate the importance of stain normalization in tissue classification using convolutional networks. Similarly, the authors in [7] perform classification of prostate tumor regions via stain normalization and adaptive cell density estimation. On the other hand, there also exists work that exploits color information without stain normalization. The works in [8] and [9] propose the use of color information in addition to texture. Milagro et al. [8] propose combinations of traditional texture features and color spaces, and also consider different classifiers such as AdaBoost learning, bagging trees, random forests, Fisher discriminant analysis, and SVM. In [9], the authors use color and differential invariants to assign class posterior probabilities to pixels and then perform a probabilistic classification.

The above approaches take differing views on the role of stain normalization in classification. Inspired by this, our primary contribution in the present work is to explicitly address the question of how important stain normalization is for automatic classification, and whether there exist useful features that inherently capture the color-texture variability without stain normalization. In other words, we attempt to establish the role of color-texture information in an automated classification framework that does not perform stain normalization. We believe such a study is important from a systems perspective, as it may help reduce the stain normalization overhead in automated histopathology classification systems. We note that this indication of mitigating the need for stain normalization assumes that the training data contains images with different stain colors/intensities, which helps in capturing the color-texture variability. We validate this hypothesis on a reasonably large dataset containing such images.

In this primary context, our work also makes the following salient contributions in methodology and evaluation: (1) exploring joint color-texture features and various contemporary classifiers, which also makes the proposed work serve as an extended comparative review; (2) suggesting an automatic approach for selecting the reference (target) image for stain normalization when one is not available (Fig. 1 shows examples of stain-normalized images at each magnification generated using such target images); (3) demonstrating improved performance of the proposed method with joint color-texture features with respect to the state-of-the-art.

Fig. 1. Reference images chosen for normalization (top), sample of original images (middle), and obtained stain-normalized images using the target images (bottom)

2 Proposed Approach

We now discuss the overall framework of the proposed approach, including: dataset description, stain normalization, feature descriptors, and contemporary classifiers. Due to lack of space, we describe the features and classifiers only briefly, with suitable references. Figure 3 depicts our overall framework.

2.1 Dataset Description

In this work, we use the BreakHis dataset [10], which contains a fairly large number of histopathology images (7,909). A detailed description is provided in Table 3.

In [10,11,12], the authors developed frameworks to classify breast cancer histopathological images using the BreakHis dataset. In [10], a series of experiments with six different state-of-the-art texture descriptors, namely Local Binary Patterns (LBP), Completed Local Binary Patterns (CLBP), Parameter-free Threshold Adjacency Statistics (PFTAS), Grey-Level Co-occurrence Matrices (GLCM), Local Phase Quantization (LPQ), and Oriented FAST and Rotated BRIEF (ORB), and four different classifiers was evaluated, reporting accuracy at the patient level. In [11], AlexNet [13] was used for feature extraction and classification. Bayramoglu et al. [12] proposed a magnification-independent model using deep learning to classify benign and malignant cases. The magnification-independent system is trained with images of different magnifications, and thus can handle the scale diversity in microscopic images.

2.2 Stain Normalization Procedure

Various methods [2,3,4] have been developed to automate the standardization of histopathological images and reduce the effect of variation in staining protocols. In [2], the authors utilized chromatic and density distributions for each individual stain class in the hue-saturation-density (HSD) color model, i.e., the spatial dependency of tissue structures was incorporated along with color information. In their experiments, the target template was chosen based on the opinion of two pathologists, who studied a large number of slides from two different laboratories. The high contrast between hematoxylin and eosin (H&E) staining and the visibility of the nuclear texture were taken into consideration while choosing the template image. In [3], the authors use a linear transform in a perceptual color space to match the color distribution of an image to that of a target image. In [4], the authors also utilized a pathologist-preferred target image to generate a structure-preserving color normalization. These stain normalization methods require prior knowledge of reference stain vectors for every dye present in the histopathological images.
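The linear-transform approach of [3] can be sketched as a per-channel mean/standard-deviation match in a perceptual color space. The following is a minimal illustration, not the reference implementation: it uses CIELAB via scikit-image (the original work uses the lαβ space), and the epsilon guard against zero variance is our addition.

```python
import numpy as np
from skimage.color import rgb2lab, lab2rgb

def reinhard_normalize(source_rgb, target_rgb):
    """Match the per-channel LAB mean/std of `source_rgb` to `target_rgb`.

    Minimal sketch of linear-transform stain normalization in the spirit
    of [3]; both inputs are float RGB images with values in [0, 1].
    """
    src, tgt = rgb2lab(source_rgb), rgb2lab(target_rgb)
    out = np.empty_like(src)
    for c in range(3):  # transform each perceptual channel independently
        mu_s, sd_s = src[..., c].mean(), src[..., c].std() + 1e-8
        mu_t, sd_t = tgt[..., c].mean(), tgt[..., c].std() + 1e-8
        # shift/scale source statistics onto the target statistics
        out[..., c] = (src[..., c] - mu_s) / sd_s * sd_t + mu_t
    return np.clip(lab2rgb(out), 0.0, 1.0)
```

After this transform, the source image carries the global color statistics of the target template while retaining its own structure.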

Due to the unavailability of a target template for the public dataset used in this work, we automatically select the target image from the dataset. Our approach assumes that the target stain chromatic information should be that which occurs most commonly, so that a large number of images need not be color-transformed. Thus, we suggest the following process: (1) First, all images in the dataset are converted from the RGB color space to the HSV color space. (2) We choose the H and S components for further analysis, as hue and saturation essentially carry the chromatic information. (3) A K-means clustering algorithm is applied to form the desired number of clusters. The number of clusters is chosen as double the number of distinct hues found on manual examination of different images, to ensure a sufficient separation of images of different hues. (4) The number of stain hues found in the dataset after manual examination is five, so we create 10 clusters. (In the pictorial representation of Fig. 2, we show fewer clusters for clarity.) (5) We choose the cluster containing the highest number of images. Within this cluster, we find the image whose mean H and S values are closest to the cluster centroid under the Euclidean distance. The corresponding image, converted back to RGB, is used as the target image.
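The selection procedure above can be sketched as follows. This is an illustrative implementation under our own assumptions (scikit-image HSV conversion, scikit-learn K-means); the function name and interface are hypothetical.

```python
import numpy as np
from skimage.color import rgb2hsv
from sklearn.cluster import KMeans

def select_target_image(images, n_clusters=10, seed=0):
    """Pick a reference (target) image for stain normalization.

    Sketch of the text's procedure: cluster per-image mean Hue/Saturation
    with K-means, take the largest cluster, and return the index of the
    image closest to its centroid. `images` holds float RGB arrays in [0, 1].
    """
    # Steps (1)-(2): mean H and S summarize each image's chromatic content
    feats = np.array([rgb2hsv(im)[..., :2].reshape(-1, 2).mean(axis=0)
                      for im in images])
    # Steps (3)-(4): cluster; 10 = double the five hues found by inspection
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(feats)
    # Step (5): largest cluster, then the member nearest its centroid
    largest = np.bincount(km.labels_).argmax()
    members = np.flatnonzero(km.labels_ == largest)
    dists = np.linalg.norm(feats[members] - km.cluster_centers_[largest], axis=1)
    return members[dists.argmin()]
```

The returned index identifies the target image; the corresponding RGB image is then used as the template for normalization.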

The proposed procedure is applied separately to images of each magnification, so that we obtain one target image per magnification group. After selecting the target image, we use the stain normalization method of [3] to normalize the dataset. Figure 2 illustrates the overall procedure for selecting the target image; it is a schematic depiction of the procedure, not an actual plot of our data.

Fig. 2. Selection of target image.

2.3 Feature Descriptors and Contemporary Classifiers

Various texture features that consider the mutual dependency of color channels, as well as features that do not use color information, are extracted in order to support the hypothesis stated above. Due to space constraints, we do not provide details of the features. Grey-level co-occurrence matrices (GLCM), Completed Local Binary Patterns (CLBP) [14], and Local Phase Quantization (LPQ) [15] are used to extract plain texture. For capturing joint color-texture variability, features such as Opponent Color Local Binary Patterns (OCLBP) [16], Gabor features on the Gaussian color model [17], Multilayer Coordinate Cluster Representation (MCCR) [18], and Parameter-free Threshold Adjacency Statistics (PFTAS) [19] are used. Note that the choice of features and classifiers is based on the popularity of the traditional texture features, together with some recently reported color-texture features.
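To illustrate how a color-texture descriptor can couple texture with chromatic information, the sketch below computes LBP histograms over opponent color channels. Note this is a simplified variant: the full OCLBP of [16] additionally thresholds the neighbors of one channel against the center pixel of another channel, which is omitted here.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def opponent_lbp_features(rgb, P=8, R=1):
    """Concatenated uniform-LBP histograms over opponent color channels.

    Simplified sketch of a color-texture descriptor (not the full
    inter-channel OCLBP of [16]). `rgb` is a float image in [0, 1].
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Standard opponent color transform of the RGB channels
    o1 = (r - g) / np.sqrt(2)
    o2 = (r + g - 2 * b) / np.sqrt(6)
    o3 = (r + g + b) / np.sqrt(3)
    hists = []
    for ch in (o1, o2, o3):
        codes = local_binary_pattern(ch, P, R, method="uniform")
        # Uniform LBP with P neighbors yields P + 2 distinct codes
        h, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
        hists.append(h)
    return np.concatenate(hists)  # length 3 * (P + 2)
```

Each image is thus summarized by a fixed-length vector that the classifiers of the next section can consume directly.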

Fig. 3. Overall process of image classification.

Various contemporary classifiers [20], such as Support Vector Machines (SVM), Nearest Neighbors (NN), and Random Forests (RF), are evaluated with the above listed texture features. For the proposed study, we use the following variants of these classifiers: (1) Linear SVM: linear kernel; (2) Gaussian SVM: RBF kernel with kernel scale set to 2\(\sqrt{P}\); (3) K-NN: 100 neighbors with the Euclidean distance metric; (4) RF: 30 trees with a maximum of 20 splits. Thus, the study also serves as a comparative review of performance of the above mentioned features and classifiers for the considered problem.
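The four classifier variants can be instantiated as below with scikit-learn. Two mappings here are our assumptions, not stated in the text: the kernel scale 2√P (with P taken as the feature dimension) is translated to scikit-learn's gamma as 1/(2·scale²), and the "maximum number of splits" is approximated by a tree depth limit.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

def build_classifiers(P):
    """The four classifier variants listed above; P = feature dimension.

    Assumptions: kernel scale s maps to gamma = 1 / (2 * s**2), and the
    'maximum 20 splits' is approximated via max_depth=20.
    """
    scale = 2.0 * np.sqrt(P)
    return {
        "linear_svm": SVC(kernel="linear"),
        "gaussian_svm": SVC(kernel="rbf", gamma=1.0 / (2.0 * scale ** 2)),
        "knn": KNeighborsClassifier(n_neighbors=100, metric="euclidean"),
        "rf": RandomForestClassifier(n_estimators=30, max_depth=20),
    }
```

Each feature vector from the descriptors above is fed to all four models, so every feature-classifier pair yields one entry in the result tables.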

Figure 3 illustrates the overall structure of the proposed study. As indicated earlier, the primary intention of this study is to assess the effect of stain normalization on automated breast cancer histopathology image classification. Thus, we test the classification performance with grayscale images, stain-normalized images, and original non-normalized images.

3 Results and Discussion

3.1 Training-Testing Protocol and Evaluation Metric

In our experiments, we randomly chose 58 patients (70%) for training and the remaining 25 (30%) for testing. This also enables a fair comparison with state-of-the-art approaches [10,11,12]. We train the above listed contemporary classifiers using images of the chosen 58 patients, over five trials of random training-testing selection. The trained models are tested using the images of the remaining 25 patients.
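A patient-wise split as described above (all images of a patient kept on one side of the split) can be obtained with scikit-learn's `GroupShuffleSplit`; this is a sketch of the protocol, not the authors' code.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def patient_split(patient_ids, test_size=0.3, seed=0):
    """Split image indices so that no patient appears in both sets.

    Sketch of the protocol above: roughly 70% of patients for training
    and 30% for testing, with all images of a patient kept together.
    """
    gss = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(gss.split(patient_ids, groups=patient_ids))
    return train_idx, test_idx
```

Varying `seed` across the five trials yields the repeated random training-testing selections mentioned in the text.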

To compare results with the existing approaches [10,11,12], we use the patient recognition rate (PRR) as the evaluation metric. Other evaluation metrics such as recall, precision, and area under the ROC curve (AUC) could also be used. The patient recognition rate (PRR), which uses the patient score (PS), is defined as follows:

$$\begin{aligned} PRR = \frac{\sum ^{N}_{i=1}PS_{i}}{N}, \qquad PS = \frac{N_{rec}}{N_{P}} \end{aligned}$$
(1)

where N is the total number of patients (available for testing), and \( N_{rec}\) and \( N_{P} \) are the number of correctly classified images and the total number of cancer images of patient P, respectively.
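Eq. (1) amounts to averaging, over patients, each patient's fraction of correctly classified images:

```python
import numpy as np

def patient_recognition_rate(y_true, y_pred, patient_ids):
    """PRR of Eq. (1): the mean over patients of the per-patient
    fraction of correctly classified images (the patient score PS)."""
    scores = []
    for pid in np.unique(patient_ids):
        mask = patient_ids == pid
        n_rec = np.sum(y_true[mask] == y_pred[mask])  # N_rec for patient pid
        scores.append(n_rec / mask.sum())             # PS = N_rec / N_P
    return float(np.mean(scores))
```

Note that PRR weights every patient equally, regardless of how many images each patient contributes, unlike plain image-level accuracy.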

Table 1. Evaluation of color channel information along with contemporary classifiers for 40x and 100x. Best performance at each magnification is highlighted.

3.2 Performance Evaluation and Comparison

Tables 1 and 2 show the performance of the contemporary classifiers at the 40X, 100X, 200X, and 400X magnifications, respectively. For each magnification, seven texture features are extracted: three plain-texture features computed directly from the grayscale version of the image, and four color-texture features that use color-channel information.

Table 2. Evaluation of color channel information along with contemporary classifiers for 200x and 400x. Best performance at each magnification is highlighted.

From the tables following can be observed (for most or all cases):

(1) The performance when using color information (with or without stain normalization) is better than when using only gray-level information. This highlights the importance of color in classification.

(2) Comparing the cases with and without stain normalization, the classification performance is better in the latter case. However, the difference is not large, and is quite close in many cases where traditional texture features are used. This is expected, as the texture information is similar, except in some cases where even independent color channels may help in capturing the color variation.

(3) However, in case of joint color-texture features, except for a very few cases, the performance without stain normalization is consistently quite high. This indicates that the joint color-texture features, which consider the mutual dependency of color channels, indeed, better capture the color-texture variation for classification.

(4) It can also be observed that opponent LBP, in which opponent channels are considered for extracting color-channel information, shows superior performance at three of the four magnifications. However, the Gabor features on the Gaussian color model and MCCR also yield comparable performance.

There are very few exceptions from the above observations for some feature-classifier combinations. While these need to be better scrutinized, we note that most of the results support our hypothesis of the effectiveness of color-texture information in mitigating the need for stain normalization.

Table 3. Detailed description.
Table 4. Performance comparison.

Finally, Table 4 compares the results of the proposed approach using joint color-texture features with some existing state-of-the-art methods. In Table 4, we report the best results of the proposed method across the various combinations of color-texture descriptors and classifiers. We observe that, except at 40X, the proposed method outperforms the existing approaches. We also note from Table 4 that at 400X, the accuracy obtained by most methods is lower than at 100X and 200X. The reason could be that there are relatively fewer images at 400X, and perhaps more data is required to capture the finer traits at higher magnification. Furthermore, the proposed work yields a lower variance in scores in most cases. This comparison further demonstrates the effectiveness of joint color-texture features for classification.

4 Conclusion

In this work, we attempted to establish the usefulness of joint color-texture information for classification without the need for stain normalization. We experimented with various classifiers to show the importance of independent and mutually dependent color-channel information, and found some interesting aspects of the same. From our experiments, it is apparent that the joint dependency of color and texture can better capture the color-texture variability. We have also shown the role of contemporary classifiers with these sophisticated color-texture features. We believe this is an interesting study that points towards obviating the need for stain normalization, given effective features and classifiers. We also demonstrate that joint color-texture features can outperform state-of-the-art methods for breast cancer histopathology image classification.