Abstract
Stain variations often degrade the generalization ability of deep learning based approaches in digital histopathology analysis. Two separate proposals, stain normalization (SN) and stain augmentation (SA), have been spotlighted to reduce the generalization error: the former alleviates the stain shift across different medical centers using a template image, while the latter enriches the accessible stain styles by simulating more stain variations. However, their applications are bounded by the selection of template images and the construction of unrealistic styles. To address these problems, we unify SN and SA with a novel RandStainNA scheme, which constrains variable stain styles within a practicable range to train a stain-agnostic deep learning model. RandStainNA is applicable to stain normalization in a collection of color spaces, i.e. HED, HSV, and LAB. Additionally, we propose a random color space selection scheme to gain extra performance improvement. We evaluate our method on two diagnostic tasks, i.e. tissue subtype classification and nuclei segmentation, with various network backbones. The performance superiority over both SA and SN shows that the proposed RandStainNA consistently improves the generalization ability, so that our models can cope with more incoming clinical datasets with unpredicted stain styles. The code is available at https://github.com/yiqings/RandStainNA.
Y. Shen and Y. Luo—Equal contributions.
1 Introduction
Pathology visually examines a diverse range of tissue types, obtained by biopsy or surgical procedure, under microscopes [6]. Stains are often applied to reveal underlying patterns and increase the contrast between nuclear components and their surrounding tissues [14]. Nevertheless, the substantial variance in each staining manipulation, e.g. staining protocols, scanners, manufacturers, and batches of staining, may eventually result in a variety of hues [10]. In contrast to pathologists, who have adapted to these variations through years of training, deep learning (DL) methods are prone to performance degradation in the presence of inter-center stain heterogeneity [2]. Specifically, since color is a salient feature extracted by deep neural networks, current successful applications for whole slide image (WSI) diagnosis are subject to their robustness to color shift among different data centers [5]. There are two primary directions to reduce the generalization error, namely stain normalization and stain augmentation [22].
Stain normalization (SN) aims to reduce the variation by aligning the stain-color distribution of source images to a target template image [11, 17, 24]. Empirical studies regard stain normalization as an essential prerequisite of downstream applications [2, 12, 22]. Yet, the capability to pinpoint a representative template image for SN relies heavily on domain prior knowledge. Moreover, in real-world settings such as federated learning, template-image selection is not feasible due to privacy regulations [10], as source images are, as a rule, inaccessible to the central processor. Some generative adversarial networks (GANs) have recently been proposed for SN [17, 18], yet preserving phenotype recognizability remains problematic. A salient drawback of the sole stain style in SN is that only restricted color-correlated features can be mined by deep neural networks. Stain augmentation (SA) takes the converse direction to SN by simulating stain variations while keeping morphological features intact [21]. Tellez et al. [20] first tailored data augmentations from RGB color space to H&E color space. Afterward, in parallel with SN approaches, GANs have also been widely adopted for stain augmentation, e.g. HistAuGAN [23].
Previous works have compared the performance of SN and SA without interpreting their differences [22]. Moreover, we observe that the mathematical formulations of SN and SA coincide, where the transfer of SN depends on a prior Dirichlet distribution [25] and SA distorts images with a uniform distribution [20], as depicted in Fig. Hence, we make the first attempt to unify SN and SA for histology image analysis. Our two primary contributions are summarized as follows. First, a novel Random Stain Normalization and Augmentation (RandStainNA) method is proposed to bridge stain normalization and stain augmentation, so that images can be augmented with more realistic stain styles. Second, a random color space selection scheme is introduced to extend the target scope to various color spaces, including HED, HSV, and LAB, to increase flexibility and provide extra augmentation. The evaluation tasks include tissue classification and nuclei segmentation, and both show that our method consistently improves performance across a variety of network architectures.
2 Methodology
Method Overview. Random Stain Normalization and Augmentation (RandStainNA) is a hybrid framework designed to fuse stain normalization and stain augmentation to generate more realistic stain variations. It incorporates randomness into SN by automatically sorting out a random virtual template from pre-estimated stain style distributions. More specifically, from SN's viewpoint, the stain styles 'visible' to the deep neural network are enriched in the training stage. Meanwhile, from SA's viewpoint, RandStainNA imposes a restriction on the distortion range, so that only a constrained, practicable range is 'visible' to the network. The framework is a general, task-agnostic strategy, as depicted in Fig.
Stain Style Creation and Characterization. Unlike the comprehensive formulations of color styles for natural images, the stain style of histology remains a vague concept, primarily based on visual examination, which makes it hard to obtain a precise objective for alleviating stain style variation [16, 24]. To narrow this gap, our work first quantitatively defines the stain style with six parameters, namely the average and standard deviation of each channel in LAB color space [16]. We pick LAB space for its notable capability to represent heterogeneous styles in medical images [16]. We first transfer all histology slides in the training set from RGB to LAB color space. Then the stain style of image \(\textbf{x}_i\) is depicted by \(\textbf{A}_i=[a^{(l)}_i,a^{(a)}_i,a^{(b)}_i]\in \mathbb {R}^3\) and \(\textbf{D}_i=[d^{(l)}_i,d^{(a)}_i,d^{(b)}_i]\in \mathbb {R}^3\), where \(a^{(c)}_i,d^{(c)}_i\) are the average value and standard deviation of channel \(c\in \{l,a,b\}\) in image \(\textbf{x}_i\), as shown in the pre-processing stage block in Fig.
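The six-parameter characterization above can be sketched as follows; this is a minimal numpy illustration (function name is ours, not from the released code), assuming the image has already been converted from RGB to LAB, e.g. with skimage.color.rgb2lab or OpenCV's cvtColor.

```python
import numpy as np

def stain_style(lab_image: np.ndarray):
    """Characterize the stain style of one image by the per-channel
    average A_i and standard deviation D_i in LAB space.

    lab_image: H x W x 3 float array, already converted from RGB to LAB.
    Returns (A_i, D_i), each a length-3 array ordered (l, a, b).
    """
    pixels = lab_image.reshape(-1, 3)
    avg = pixels.mean(axis=0)   # A_i = [a_l, a_a, a_b]
    std = pixels.std(axis=0)    # D_i = [d_l, d_a, d_b]
    return avg, std
```

Running this over every training image yields the per-image statistics collected in the pre-processing stage.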
Virtual Stain Normalization Template. In routine stain normalization approaches [16, 24], a source image is normalized to a pre-selected template image by aligning the average \(\textbf{A}_s\) and standard deviation \(\textbf{D}_s\) of its pixel values to the template's \(\textbf{A}_t\) and \(\textbf{D}_t\). Thus, it is sufficient to formulate a template image with \([\textbf{A}_t,\textbf{D}_t]\). In the proposed RandStainNA, we expand the uniformly shared one-template mode to a broad scheme of randomly generated virtual templates \([\textbf{A}_v,\textbf{D}_v]\). More specifically, at each iteration a random \(\textbf{A}_v\) is sampled from distribution \(F_\textbf{A}\), and likewise \(\textbf{D}_v\) is drawn from another distribution \(F_\textbf{D}\); the two are jointly used as the normalization target for every training sample. Empirical results show that the eventual performance is robust to a wide range of distribution types for \(F_\textbf{A}\) and \(F_\textbf{D}\), such as the Gaussian and t-distribution. In the rest of this section, we simply use the Gaussian distribution as the estimate, i.e. setting \(F_\textbf{A} = \mathcal {N}(\textbf{M}_A,\mathbf {\Sigma }_A)\), \(F_\textbf{D} = \mathcal {N}(\textbf{M}_D,\mathbf {\Sigma }_D)\), where \(\mathcal {N}(\textbf{M},\mathbf {\Sigma })\) denotes the Gaussian distribution with expectation \(\textbf{M}\) and covariance matrix \(\mathbf {\Sigma }\). Notably, due to the orthogonality of the channels, \(\mathbf {\Sigma }\) is a diagonal matrix, i.e. \(\mathbf {\Sigma } = {\text {diag}}(\sigma _1^2,\sigma _2^2,\sigma _3^2)\) for some \(\sigma _j\), \(j=1,2,3\).
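Because the covariance matrices are diagonal, sampling a virtual template reduces to three independent 1-D Gaussians per statistic. A sketch under that assumption (clamping the sampled standard deviations to be positive is our own safeguard, not stated in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_virtual_template(mean_A, sigma_A, mean_D, sigma_D):
    """Draw one random virtual template [A_v, D_v].

    mean_A, mean_D: length-3 expectations M_A, M_D.
    sigma_A, sigma_D: length-3 per-channel standard deviations, i.e. the
    square roots of the diagonals of Sigma_A and Sigma_D.
    """
    A_v = rng.normal(mean_A, sigma_A)          # template channel means
    D_v = np.abs(rng.normal(mean_D, sigma_D))  # template channel stds (kept positive)
    return A_v, D_v
```

A fresh template is drawn for each image at each epoch, so no two passes over the data see identical normalization targets.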
Statistical Parameter Estimation for Virtual Template Generation. The estimated statistical parameters \(\textbf{M}_A,\mathbf {\Sigma }_A,\textbf{M}_D,\mathbf {\Sigma }_D\) are afterwards applied to form the stain styles discussed above. A natural candidate takes the sample mean of the per-image channel means over all training images for \(\textbf{M}_A\) and \(\textbf{M}_D\), and the corresponding sample standard deviations for \(\mathbf {\Sigma }_A\) and \(\mathbf {\Sigma }_D\), computed over the whole training set. However, two defects arise in this discipline: one is the inefficiency of traversing the whole set, and the other is infeasibility in special cases, e.g. federated or lifelong learning. Therefore, we provide a more computation-efficient alternative: randomly curate a small number of patches from the training set and apply their sample mean and standard deviation as \(\textbf{M}_A,\mathbf {\Sigma }_A,\textbf{M}_D,\mathbf {\Sigma }_D\). Empirical results suggest this achieves competitive performance.
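The subset-based estimator can be sketched as below; it only needs the per-image statistics of a small, randomly curated set of patches (names are illustrative):

```python
import numpy as np

def estimate_style_distribution(patch_stats):
    """Estimate (M_A, sigma_A, M_D, sigma_D) from per-patch stain statistics.

    patch_stats: list of (A_i, D_i) pairs, each a length-3 array holding
    the channel means / channel stds of one randomly curated patch.
    Returns the sample mean and sample std of each statistic across the
    subset, which parameterize the diagonal Gaussians F_A and F_D.
    """
    A = np.stack([a for a, _ in patch_stats])  # shape (n_patches, 3)
    D = np.stack([d for _, d in patch_stats])
    return A.mean(axis=0), A.std(axis=0), D.mean(axis=0), D.std(axis=0)
```

Only the four summary vectors need to be shared, which is what makes the scheme compatible with federated settings where raw images never leave their center.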
Image-Wise Normalization with Random Virtual Template. After transferring image \(\textbf{x}_i\) from RGB into LAB space, we write the pixel value as [l, a, b]. We denote the average and standard deviation (std) of each channel of this image as \(\textbf{A}_i=[a^{(l)}_i,a^{(a)}_i,a^{(b)}_i]\) and \(\textbf{D}_i=[d^{(l)}_i,d^{(a)}_i,d^{(b)}_i]\), and the random virtual template generated for \(\textbf{x}_i\) from \(F_\textbf{A},F_\textbf{D}\) as \(\textbf{A}_v=[a^{(l)}_v,a^{(a)}_v,a^{(b)}_v]\) and \(\textbf{D}_v=[d^{(l)}_v,d^{(a)}_v,d^{(b)}_v]\). Then the image-wise normalization based on the random virtual template is formulated per channel as
\( l^\prime = \frac{d^{(l)}_v}{d^{(l)}_i}\left(l - a^{(l)}_i\right) + a^{(l)}_v, \) and analogously for \(a^\prime\) and \(b^\prime\).
Then, we transfer \([l^\prime ,a^\prime ,b^\prime ]\) from LAB back to RGB space. Notably, we generate a different virtual template for each image at every epoch during the training stage. Therefore, RandStainNA can largely increase data variation with on-the-fly generated virtual templates.
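Putting the pieces together, the per-image alignment to a virtual template is a Reinhard-style channel-wise shift and rescale. A minimal numpy sketch (the small eps guard against zero-variance channels is our addition):

```python
import numpy as np

def normalize_to_template(lab_image, A_v, D_v, eps=1e-6):
    """Align one LAB image to a random virtual template [A_v, D_v]:
    subtract the image's channel mean, rescale its channel std to the
    template's, then shift to the template mean.
    """
    pixels = lab_image.reshape(-1, 3)
    A_i = pixels.mean(axis=0)
    D_i = pixels.std(axis=0)
    return (lab_image - A_i) / (D_i + eps) * D_v + A_v
```

After this step the result would be converted from LAB back to RGB before being fed to the network.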
Determining a Random Color Space for Augmentation. By computing the stain style parameters \([\textbf{M}, \mathbf {\Sigma }]\) of distinct color spaces, we can derive their associated \(F_\textbf{A}\) and \(F_\textbf{D}\). We thus extend RandStainNA from LAB to other color spaces, e.g. HED and HSV. This extension enables a random color space selection scheme, which further strengthens the regularization effect. The candidate pool comprises three color spaces widely used in histology, i.e. HED, HSV, and LAB. At each training iteration, a color space \(\mathcal {S}\) is chosen arbitrarily, either with equal probability, i.e. \(p=\frac{1}{3}\), or with manually assigned values depending on the performance of each independent space. Subsequently, a virtual template associated with \(\mathcal {S}\) is generated to perform image-wise stain normalization.
3 Experiments
Dataset and Evaluation Metrics. We evaluate the proposed RandStainNA on two image analysis tasks, i.e. classification and segmentation. For the patch-level classification task, we use the widely-used public histology dataset NCT-CRC-HE-100K-NONORM for training and validation, with the CRC-VAL-HE-7K dataset for external testing [9]. These two sets comprise 100,000 and 7180 histological patches respectively, from colorectal cancer patients of multiple data centers with heterogeneous stain styles. We randomly select 80% of NCT-CRC-HE-100K-NONORM for training and the remaining 20% for validation. The original dataset covers nine categories, but the background category can be straightforwardly identified in the pre-processing stage with the OTSU algorithm [15]; it is therefore removed in our experiments for a more reliable result. Top-1 classification accuracy (%) is used as the metric for the 8-category classification task. For the nuclei segmentation task, we use the small public MoNuSeg dataset [12], with Dice and IoU as metrics.
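For reference, the background filtering mentioned above rests on Otsu's classic threshold [15], which picks the gray level maximizing between-class variance. A self-contained numpy sketch (libraries such as scikit-image or OpenCV provide equivalent built-ins):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method on a uint8 grayscale image: return the threshold t
    that maximizes the between-class variance of pixels <= t vs > t."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                        # class-0 probability
    mu = np.cumsum(prob * np.arange(256))          # class-0 cumulative mean
    mu_t = mu[-1]                                  # global mean
    denom = omega * (1.0 - omega)
    denom[denom == 0] = np.inf                     # skip degenerate splits
    sigma_b2 = (mu_t * omega - mu) ** 2 / denom    # between-class variance
    return int(np.argmax(sigma_b2))
```

On histology patches, near-white background separates cleanly from stained tissue under this threshold, so background patches can be discarded before training.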
Network Architecture and Settings. In the classification task, we employ six backbone architectures for evaluation, namely ResNet-18 [7], ResNet-50 [7], MobileNetV3-Small [8], EfficientNet-B0 [19], ViT-Tiny [3], and SwinTransformer-Tiny [13]. These networks, spanning CNNs and transformers, represent a wide range of network capacities and effectively demonstrate the adaptability of our method in different settings. In the nuclei segmentation task, we use CIA-Net as the backbone [26] for its notable performance on small datasets. We use a consistent training scheme across networks for fair comparison with stain augmentation and stain normalization methods. Detailed training schemes and hyper-parameter settings are given in the supplementary material. We perform 3 random runs and report the average for each experiment.
Compared Methods. All models are trained with morphology augmentation, namely random vertical and horizontal flips. In both evaluation tasks, we compare our method with existing stain normalization [16] and stain augmentation [21, 22] approaches performed in the three color spaces, i.e. HED, HSV, and LAB. For stain augmentation in HED, we employ a multiplicative rule [22] that perturbs each channel, i.e. \(p^\prime = p * \varepsilon _1 + \varepsilon _2\), where \(p^\prime \) is the augmented pixel value, p is the original pixel value, and \(\varepsilon _1 \) and \(\varepsilon _2\) are uniform random noises; we term this stain augmentation scheme #1 (SA1). For SA in HSV, we adopt an additive rule, i.e. \(p^\prime = p+ p * \varepsilon \) [22], termed stain augmentation scheme #2 (SA2). We integrate the above two schemes for LAB stain augmentation, due to the absence of prior work on SA in LAB. We also configure two augmentation settings according to the degree of distortion, i.e. the range of the random noise, denoted light (L) and strong (S) [21, 22]. To fully retain recognizable morphological features, we do not take GAN-related approaches for comparison.
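The two baseline perturbation rules can be sketched directly from their formulas; the distortion ranges below are illustrative placeholders for the light/strong settings, not the papers' exact values:

```python
import numpy as np

rng = np.random.default_rng(0)

def sa1(channel, alpha=0.05, beta=0.05):
    """SA scheme #1 (HED): p' = p * eps1 + eps2, with eps1 uniform in
    [1 - alpha, 1 + alpha] and eps2 uniform in [-beta, beta]."""
    eps1 = rng.uniform(1.0 - alpha, 1.0 + alpha)
    eps2 = rng.uniform(-beta, beta)
    return channel * eps1 + eps2

def sa2(channel, alpha=0.05):
    """SA scheme #2 (HSV): p' = p + p * eps, with eps uniform in
    [-alpha, alpha]."""
    eps = rng.uniform(-alpha, alpha)
    return channel + channel * eps
```

Each rule is applied channel-wise after converting the image into its respective color space; the light and strong settings differ only in alpha and beta.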
Results. Our method consistently improves the baseline performance of the six backbone architectures in terms of test accuracy when implemented in each of the three color spaces, as shown in Table, demonstrating the effectiveness of RandStainNA. The hybrid scheme outperforms a sole deployment of either SN or SA. With the random color space selection scheme (denoted 'full'), RandStainNA achieves further performance improvement. To demonstrate the effects of the different approaches straightforwardly, we visualize the original raw images with stain variations, SN images, SA images, and images processed with our RandStainNA in Fig. The visualization uses SN and SA performed in HSV space as an example; the outcomes in LAB and HED spaces are very similar. As shown, SN unifies stain styles into a shared template that may leave out many useful features [22], and SA may generate unrealistic images. In contrast, our method generates much more realistic images, recognizable by both humans and deep learning techniques. The figure also provides the UMAP embedding of the stain style parameters \([\textbf{M},\mathbf {\Sigma }]\) of the associated solutions. The nuclei segmentation results are listed in Table. For SN and SA in each color space, we pick the configuration with the higher performance in the classification task. Our method again achieves the best performance, demonstrating its effectiveness across downstream tasks.
Ablation Study. The ablation study is performed on the classification task. First, we test the effect of the distribution type of \(F_\textbf{A},F_\textbf{D}\). The test accuracies are 93.98, 93.90, 93.04, and 92.48 for the Gaussian, t-, uniform, and Laplace distributions, respectively. The effect of the number of samples used to compute the sample mean and standard deviation is also evaluated. The test accuracies are 93.42, 93.29, 94.08, and 93.98 when computing the averages and standard deviations with 10 images per category, 100 images per category, 1000 images per category, and the whole training set, respectively, which demonstrates robustness in the estimation of \(\textbf{M}\) and \(\mathbf {\Sigma }\).
4 Conclusion
The proposed RandStainNA framework copes with the inevitable stain variance problem in clinical pathology image analysis. Leveraging the advantages of both stain normalization and stain augmentation, the framework produces more realistic stain variations to train stain-agnostic DL models. Additionally, RandStainNA is practically straightforward and efficient when applied as an on-the-fly augmentation technique, in comparison with most current GANs. Moreover, the results show the feasibility of training robust downstream classification and segmentation networks on various architectures. One future direction of our current work is the expansion to further color spaces, e.g. YUV, YCbCr, YPbPr, YIQ, and XYZ [4], to further improve the generalization ability.
References
Becht, E., et al.: Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37(1), 38–44 (2019)
Ciompi, F., et al.: The importance of stain normalization in colorectal tissue classification with convolutional networks. In: 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pp. 160–163. IEEE (2017)
Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Gowda, S.N., Yuan, C.: ColorNet: Investigating the importance of color spaces for image classification. In: Jawahar, C.V., Li, H., Mori, G., Schindler, K. (eds.) ACCV 2018. LNCS, vol. 11364, pp. 581–596. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20870-7_36
Gupta, V., Singh, A., Sharma, K., Bhavsar, A.: Automated classification for breast cancer histopathology images: Is stain normalization important? In: Cardoso, M.J., et al. (eds.) CARE/CLIP -2017. LNCS, vol. 10550, pp. 160–169. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67543-5_16
Gurcan, M.N., et al.: Histopathological image analysis: A review. IEEE Rev. Biomed. Eng. 2, 147–171 (2009)
He, K., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Howard, A., et al.: Searching for MobileNetV3. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1314–1324 (2019)
Kather, J.N., et al.: Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study. PLoS Med. 16(1), e1002730 (2019)
Ke, J., et al.: Style normalization in histology with federated learning. In: 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), pp. 953–956. IEEE (2021)
Khan, A.M., et al.: A nonlinear mapping approach to stain normalization in digital histopathology images using image-specific color deconvolution. IEEE Trans. Biomed. Eng. 61(6), 1729–1738 (2014)
Kumar, N., et al.: A multi-organ nucleus segmentation challenge. IEEE Trans. Med. Imaging 39(5), 1380–1391 (2019)
Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
Nadeem, S., Hollmann, T., Tannenbaum, A.: Multimarginal Wasserstein Barycenter for stain normalization and augmentation. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12265, pp. 362–371. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59722-1_35
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
Reinhard, E., et al.: Color transfer between images. IEEE Comput. Graphics Appl. 21(5), 34–41 (2001)
Salehi, P., et al.: Pix2pix-based stain-to-stain translation: a solution for robust stain normalization in histopathology images analysis. In: 2020 International Conference on Machine Vision and Image Processing (MVIP), pp. 1–7. IEEE (2020)
Shaban, M.T., et al.: StainGAN: stain style transfer for digital histological images. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), pp. 953–956. IEEE (2019)
Tan, M., et al.: EfficientNet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105–6114. PMLR (2019)
Tellez, D., et al.: H and E stain augmentation improves generalization of convolutional networks for histopathological mitosis detection. In: Medical Imaging 2018: Digital Pathology, vol. 10581, p. 105810Z. International Society for Optics and Photonics (2018)
Tellez, D., et al.: Whole-slide mitosis detection in H&E breast histology using PHH3 as a reference to train distilled stain-invariant convolutional networks. IEEE Trans. Med. Imaging 37(9), 2126–2136 (2018)
Tellez, D., et al.: Quantifying the effects of data augmentation and stain color normalization in convolutional neural networks for computational pathology. Med. Image Anal. 58, 101544 (2019)
Wagner, S.J., et al.: Structure-preserving multi-domain stain color augmentation using style-transfer with disentangled representations. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12908, pp. 257–266. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87237-3_25
Wang, Y.Y., et al.: A color-based approach for automated segmentation in tumor tissue classification. In: 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 6576–6579. IEEE (2007)
Zanjani, F.G., et al.: Stain normalization of histopathology images using generative adversarial networks. In: 2018 IEEE 15th International symposium on biomedical imaging (ISBI 2018), pp. 573–577. IEEE (2018)
Zhou, Y., Onder, O.F., Dou, Q., Tsougenis, E., Chen, H., Heng, P.-A.: CIA-Net: robust nuclei instance segmentation with contour-aware information aggregation. In: Chung, A.C.S., Gee, J.C., Yushkevich, P.A., Bao, S. (eds.) IPMI 2019. LNCS, vol. 11492, pp. 682–693. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20351-1_53
Acknowledgements
This work has been supported by NSFC grants 62102247.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Shen, Y., Luo, Y., Shen, D., Ke, J. (2022). RandStainNA: Learning Stain-Agnostic Features from Histology Slides by Bridging Stain Augmentation and Normalization. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13432. Springer, Cham. https://doi.org/10.1007/978-3-031-16434-7_21