Abstract
Stain color variation between Whole Slide Images (WSIs) is a key challenge in the application of Computational Histopathology. Deep learning-based algorithms are susceptible to domain shift, and their performance degrades on WSIs captured from a different source than the training data due to stain color variations. We propose a training methodology, Stain-AgLr, that achieves high invariance to stain color changes on unseen test data. In addition to the task loss, Stain-AgLr training is supervised with a consistency regularization loss that enforces consistent predictions for training samples and their stain altered versions. An additional decoder is used to regenerate stain color from the feature representation of the stain altered images. We compare the proposed approach to state-of-the-art strategies using two histopathology datasets and show significant improvement in model performance on unseen stain variations. We also visualize the feature space distribution of test samples from multiple diagnostic labs and show that Stain-AgLr achieves a significant overlap between the distributions.
Deep learning-based computational histopathology has demonstrated the ability to enhance patient healthcare quality by automating the time-consuming and expensive task of analyzing high-resolution WSIs [4, 9]. The analysis involves identifying tissue or cellular level morphological features highlighted by staining dyes like Hematoxylin and Eosin (H&E). The stain color distribution of a WSI depends upon factors including the process of tissue preparation, dye manufacturer, and scanning equipment. As a result, there exists a high variability in the appearance of histopathology images, as seen in Fig. 1. Stain variations may appear between WSIs scanned at different centers as well as within the same center [2]. Often the model training data is obtained from a single lab, but the model is deployed across multiple other labs. This domain shift hampers the performance of deep learning models on out-of-distribution samples [11, 22, 29], as seen in Fig. 2a.
Related Work: Existing solutions reduce the effect of variation in stain distribution by normalizing stain color or by improving the model’s generalization on the test set. Traditional stain normalization approaches [17, 26] normalize the target domain images by matching their color distribution to a single reference image from the source domain. Recently, generative deep CNNs [3, 8, 13, 19, 20] have been trained to perform image-to-image domain transfer, learning the color distribution from the entire set of source domain images. However, the regenerated images may display undesired artifacts [3, 7], which can lead to misdiagnosis. Furthermore, stain normalization significantly adds to the computation cost, as each image needs to be pre-processed. Model generalization to stain variations can also be improved by learning stain agnostic features using Domain Adversarial training [12, 15] or by using color augmentation to simulate variations in stain [6, 10, 25, 27, 28]. These methods provide better generalization without the need for a dedicated normalization network during inference.
Many state-of-the-art unsupervised approaches [3, 6, 16, 27, 28, 31] rely on unlabeled images from the test set to generalize the model to a target distribution. Specifically, [6, 27, 28] adapt the stain of the training images based on samples from the test lab data and use the result as augmentations; [3, 20] require images from the target domain to learn a stain normalization model; whereas [16, 31] use semi-supervised models for unsupervised domain adaptation, learning from unlabeled target domain images. Similarly, [31] showed that semi-supervised learning methods like [21, 23, 30] can learn from a consistency loss between real and noisy versions of the unlabeled target domain images. However, computational histopathology models need to be invariant to unseen intra-lab as well as inter-lab stain variations for application in patient care. In this work, we propose a novel strategy that learns stain invariant features without requiring any knowledge of the test data distribution. The goal of our approach is to learn a feature space that has a high overlap between the distributions of the training set and the unseen test set with stain variations.
Our Contributions: Stain invariance is induced with a two-pronged strategy, as shown in Fig. 3. We use a stain altered version of an image to mimic test samples from a different lab, and impose consistency between the predictions for the raw image and its stain altered version by penalizing their relative entropy. The consistency regularization loss enforces similar feature space representations for differently stained versions of the same image; that is, the model learns that the difference in stain color has no bearing on the prediction task. In parallel, a generator network is tasked with regenerating the original stain from the feature space representation of the stain altered image, that is, with performing stain normalization as an auxiliary task. We show that the two tasks are complementary and help the model learn features invariant to stain variations. During inference, only the underlying model for tissue analysis is used, without adding any computational overhead.
We compare the proposed method with state-of-the-art stain normalization as well as stain augmentation methods. We show that Stain-AgLr achieves better generalization on unseen stain variations based on evaluation on two histopathology datasets. The increased stain invariance is a result of high overlap between train and test domain data in feature space produced by Stain-AgLr. To the best of our knowledge, our work is the first to employ stain normalization as an auxiliary task rather than a preprocessing step and show that it leads to improved generalization on unseen test data with stain variations. Furthermore, the inference time corresponds to that of only the classification network which is significantly lower compared to stain normalization methods.
1 Method
We train a classification network that shares a feature extractor with a stain regeneration network. In addition to Cross-Entropy Loss, the network is supervised by two loss functions - Consistency Regularization Loss and Stain Regeneration Loss.
1.1 Model Architecture
Let \(M_{ft}\), \(M_{cls}\) & \(M_{gen}\) represent the feature extractor, the classifier, and the generator respectively. Together, the networks \(M_{ft}\) and \(M_{gen}\) constitute a stain regeneration network that learns the mapping to regenerate the stain color distribution of the training images from stain altered images. On the other hand, the network \(M_{ft}\), in conjunction with \(M_{cls}\), classifies the input image into a set of task specific classes. Global average pooling (GAP) and Dropout (50%) are applied to the output of \(M_{ft}\), which generates a feature vector that is the input to \(M_{cls}\). During inference, only the classification network is used, with layers particular to the stain regeneration network removed. We follow the CNN architecture provided by [10, 25]. Details are provided in the Supplementary Material.
1.2 HED Jitter
We employ HED jitter to generate stain altered histopathology images [24, 25]. This results in samples that resemble data from different sources. As shown in Fig. 3, the model is fed with both raw and HED jittered images in a single batch. The altered image is used to learn stain invariant features by matching its logits to those of the original image, as well as to train the stain regeneration network, which regenerates the raw image. We use the HED-light [25] configuration for Stain-AgLr with default parameters, including morphological and brightness-contrast augmentations.
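To make the stain alteration step concrete, the following is a minimal NumPy sketch of HED jitter, not the authors' implementation. It assumes the commonly used Ruifrok-Johnston stain vectors for color deconvolution and a per-channel perturbation of the form alpha * c + beta; the function names and the default `sigma=0.05` are illustrative assumptions, and the exact HED-light parameters should be taken from [25].

```python
import numpy as np

# Ruifrok-Johnston stain vectors (rows: Hematoxylin, Eosin, DAB).
# Assumption: the same deconvolution matrix is used for all images.
RGB_FROM_HED = np.array([[0.65, 0.70, 0.29],
                         [0.07, 0.99, 0.11],
                         [0.27, 0.57, 0.78]])
HED_FROM_RGB = np.linalg.inv(RGB_FROM_HED)


def rgb_to_hed(rgb):
    """Map an RGB image in [0, 1], shape (H, W, 3), to stain concentrations."""
    optical_density = -np.log(np.clip(rgb, 1e-6, 1.0))
    return optical_density @ HED_FROM_RGB


def hed_to_rgb(hed):
    """Map stain concentrations back to an RGB image in [0, 1]."""
    optical_density = hed @ RGB_FROM_HED
    return np.clip(np.exp(-optical_density), 0.0, 1.0)


def hed_jitter(rgb, sigma=0.05, rng=None):
    """Scale and shift each stain channel by a small random amount."""
    rng = np.random.default_rng() if rng is None else rng
    hed = rgb_to_hed(rgb)
    alpha = rng.uniform(1.0 - sigma, 1.0 + sigma, size=3)  # per-channel scale
    beta = rng.uniform(-sigma, sigma, size=3)              # per-channel shift
    return hed_to_rgb(hed * alpha + beta)
```

With `sigma=0` the function returns the input image up to numerical precision, which is a convenient sanity check; larger values of `sigma` produce stronger color shifts.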
1.3 Loss Functions
Two loss functions, Consistency Regularization and Stain Regeneration, are used to train the Stain-AgLr model in addition to the task specific Cross-Entropy Loss. We guide the model to produce similar predictions for a raw image and its stain altered version using the Consistency Regularization Loss (\(L_{Cons}\)). Specifically, we minimize the divergence \(D(P_\theta (y|x) || P_\theta (y|x, \epsilon )) \), where D is the Kullback-Leibler (KL) divergence, y is the ground truth label corresponding to input x and \(\epsilon \) represents stain color noise. This encourages the model to be insensitive to stain color noise.
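The consistency term can be illustrated with a short NumPy sketch that computes the batch-averaged KL divergence between the class distributions predicted for raw and stain altered images. The function names are hypothetical, and details such as whether gradients flow through both branches are not specified in the text, so this only shows the quantity being minimized.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the class axis (axis 1)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def consistency_loss(logits_raw, logits_altered):
    """Mean KL(P(y|x) || P(y|x, eps)) over a batch of logits, shape (N, C)."""
    log_p = log_softmax(logits_raw)      # prediction for the raw image
    log_q = log_softmax(logits_altered)  # prediction for the stain altered image
    p = np.exp(log_p)
    return float((p * (log_p - log_q)).sum(axis=1).mean())
```

The loss is zero when both versions of an image receive identical predictions and grows as the predicted distributions diverge.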
The features of the altered image from \(M_{ft}\) are passed to \(M_{gen}\), which regenerates the stain color to match the raw image. We use MSE loss as the Stain Regeneration Loss (\(L_{Reg}\)) between the raw image and the regenerated image. This auxiliary task helps the model improve generalization on images with stain variations by learning shared features useful for both the classification and regeneration tasks. As a result, a weighted combination of three loss functions is used to train the model (Eq. 1), where \(\lambda _1\) and \(\lambda _2\) are weights, and x and \(\hat{x}\) are the raw and stain altered images.
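The display form of Eq. 1 is not reproduced in this text. One form consistent with the definitions above is sketched below. It assumes that \(\lambda _1\) weights the consistency term and \(\lambda _2\) the regeneration term, following the order in which the losses are introduced; the text does not state this assignment explicitly, nor whether the cross-entropy term is computed on the raw image, the altered image, or both.

```latex
\[
L = L_{CE} + \lambda_1 \, L_{Cons} + \lambda_2 \, L_{Reg}, \qquad
L_{Cons} = D_{KL}\big(P_\theta(y|x) \,\|\, P_\theta(y|\hat{x})\big), \qquad
L_{Reg} = \mathrm{MSE}\big(x, \, M_{gen}(M_{ft}(\hat{x}))\big)
\]
```

Here \(P_\theta(y|\hat{x})\) is shorthand for \(P_\theta (y|x, \epsilon )\), since \(\hat{x}\) is the image x perturbed by the stain color noise \(\epsilon \).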
2 Experiments
2.1 Setup
We evaluate Stain-AgLr using two publicly available datasets: TUPAC Mitosis Detection and Camelyon17 Tumor Metastasis Detection. Both datasets segregate images based on the lab of origin. This allows models to be trained on data from a single lab, while unseen data samples from other labs are used to test the model’s robustness to stain variations.
Camelyon17 [1]. The dataset contains H&E stained WSIs of sentinel lymph nodes from five different medical centers. In our experiments, 10 WSIs for which annotation masks are available are used from each center. A total of 95,500 patches of size 256\(\,\times \,\)256 were created at 40x magnification, of which 48,300 represent metastasis. Patches from RUMC were used for training and validation; the remaining centers (CWZ, UMCU, RST, LPON) are used as test sets.
TUPAC Mitosis Detection [5]. The dataset consists of 73 breast cancer cases from three pathology centers. The first 23 cases were obtained from a single center, whereas cases 24–48 and 49–73 were collected from two other centers. We use the binary labels provided by [5], which comprise 1,898 Mitotic figures and 5,340 Hard Negative patches of size 128\(\,\times \,\)128. For training and validation, we use samples from the first 23 cases and separately report performance on the other two subsets.
Training Setup: We use an initial learning rate of 5e-3 for the TUPAC dataset and 1e-2 for the Camelyon dataset, obtained using a grid search. In the case of TUPAC, we sample an equal number of images per batch for both classes to mitigate the effect of class imbalance. All models are trained using the Adam optimizer with a batch size of 64, reducing the learning rate by a factor of 0.1 if the validation loss does not improve for 4 epochs for Camelyon and 15 epochs for TUPAC. Training was stopped when the learning rate dropped to 1e-5. For each run, we select the model weights with the lowest validation loss. We found that the weighting factors \(\lambda _1=0.1\) & \(\lambda _2=10\) in the multi-task loss (Eq. 1) gave the best performance. Geometric augmentations, including random rotation in multiples of 90\(^\circ \) and horizontal and vertical flipping, were employed for all models. HED-Light augmentation is used with default parameters as described in [25], including morphological and brightness-contrast augmentation. Models were trained on NVIDIA Tesla A100 GPUs using the PyTorch library.
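The learning rate schedule and stopping rule described above can be simulated in plain Python. This is a sketch under assumptions: the function name and return values are hypothetical, and "does not improve for `patience` epochs" is read as `patience` consecutive epochs without a new best validation loss. PyTorch's `ReduceLROnPlateau` scheduler offers this kind of behavior, but the text does not say which scheduler was used.

```python
def plateau_schedule(val_losses, lr=1e-2, factor=0.1, patience=4, min_lr=1e-5):
    """Simulate reduce-on-plateau training with a learning rate stopping rule.

    Returns the learning rate used in each epoch that was run and the index
    of the epoch with the lowest validation loss (the checkpoint to keep).
    """
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0
    lr_history = []
    for epoch, loss in enumerate(val_losses):
        lr_history.append(lr)
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
        if bad_epochs >= patience:
            lr *= factor          # reduce the learning rate on a plateau
            bad_epochs = 0
            if lr <= min_lr * (1 + 1e-6):  # tolerance for float rounding
                break             # stop once the learning rate reaches min_lr
    return lr_history, best_epoch
```

Starting from 1e-2 with a factor of 0.1, three plateaus bring the learning rate to 1e-5 and end training; with the TUPAC settings one would pass `lr=5e-3` and `patience=15` instead.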
2.2 Evaluation
We conduct experiments to compare our proposed method with stain normalization as well as color augmentation techniques for improving model generalisability on unseen test data. Tables 1 and 2 report AUC scores on test data obtained from multiple labs with differing stain color distributions, along with standard deviations. All experiments are repeated ten times, with different random seeds.
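As an illustration of the reporting protocol, the sketch below computes AUC for a single run and then summarizes several runs by mean and standard deviation. The function names are hypothetical, the pairwise AUC is quadratic in the number of samples and is meant for clarity rather than speed, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_runs(aucs):
    """Mean and sample standard deviation of per-seed AUC scores."""
    return mean(aucs), stdev(aucs)
```

In this setting, `roc_auc` would be evaluated once per random seed and per test lab, and `summarize_runs` applied to the resulting ten scores for each lab.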
The Vanilla model does not use any stain normalization or stain augmentation. Vahadane [26], STST GAN [19] and DSCSI GAN [13] represent classifier performance on images normalized using the corresponding stain normalization method. Neither [13] nor [19] requires samples from the target domain when training the GAN. HED-Light Aug represents a model trained with HED-Light augmentation. Lastly, Stain-AgLr represents the proposed approach.
3 Discussion
Classifiers trained on data from one lab show poor performance on data from other labs. The performance degradation depends upon the deviation of the stain color distribution from the training set. All stain normalization methods improve classifier performance, thus inducing invariance to stain color changes in the downstream classification model. Both GAN-based approaches [13, 19] provide better stain normalization, learning the stain color distribution from the entire training set, unlike [26], which uses a single reference image from the training set. We observe that a classifier trained with HED-light augmentation matches or outperforms deep learning-based stain normalization approaches, as also reported by [22, 25]. This indicates that the vanilla model overfits to the stain color information from a single lab, which is alleviated by the use of color augmentation.
Impact of Consistency Regularization and Stain Regeneration Loss. Stain-AgLr outperforms models trained using stain normalization algorithms as well as HED-light augmentation. Both the Consistency Regularization Loss and the Stain Regeneration Loss contribute to the improvement in stain invariance. Using either loss individually improves performance over a model trained with HED-Light augmentation alone; however, the best results are obtained by the network trained using both loss functions together. This demonstrates that the two tasks are complementary to one another for learning stain invariant features. The multi-task network learns a shared representation that is less likely to overfit to the noise in the form of stain color.
Stain invariance of Stain-AgLr can be further established by analyzing the distribution of validation set and test data in feature space using UMAP plots, visualized in Fig. 2. For the test set data, the plots show significantly better class separation produced by the proposed Stain-AgLr model as compared to the Vanilla model. Importantly, the distribution of class samples from the test set corresponds with respective classes from validation data. In other words, the feature space produced by proposed Stain-AgLr model shows a high overlap between the validation and test data distributions. This verifies that quantitative performance gain is obtained by Stain-AgLr learning stain invariant features.
All stain normalization approaches significantly increase the base classifier’s inference time, as seen in Table 2. Although Stain-AgLr employs additional convolution layers during training, the model’s inference time is identical to that of a vanilla classifier. A higher throughput is beneficial in reducing turnaround time in patient diagnostics, especially when processing high resolution histopathology WSIs, as well as in reducing the computational requirements for deployment in a diagnostic laboratory setup. Thus, the proposed approach combines the best of both worlds: improved stain invariance and fast inference.
4 Conclusion
Invariance to stain color variations in histopathology images is essential for the effective deployment of computational models. We present a novel technique, Stain-AgLr, which learns stain invariant features that lead to improved performance on images from different labs. We also show that Stain-AgLr results in a high overlap between feature space distributions of images with varying H&E staining. Unlike many state-of-the-art techniques, Stain-AgLr neither requires unlabeled images from the test data nor adds any computational burden during inference.
References
Bandi, P., et al.: From detection of individual metastases to classification of lymph node status at the patient level: the camelyon17 challenge. IEEE Trans. Med. Imaging 38(2), 550–560 (2018)
Bejnordi, B.E., et al.: Stain specific standardization of whole-slide histopathological images. IEEE Trans. Med. Imaging 35(2), 404–415 (2015)
de Bel, T., Bokhorst, J.M., van der Laak, J., Litjens, G.: Residual cyclegan for robust domain transformation of histopathological tissue slides. Med. Image Anal. 70, 102004 (2021)
Bera, K., Schalper, K.A., Rimm, D.L., Velcheti, V., Madabhushi, A.: Artificial intelligence in digital pathology-new tools for diagnosis and precision oncology. Nat. Rev. Clin. Oncol. 16(11), 703–715 (2019)
Bertram, C.A., et al.: Are pathologist-defined labels reproducible? comparison of the tupac16 mitotic figure dataset with an alternative set of labels. In: Cardoso, J., et al. (eds.) IMIMIC/MIL3ID/LABELS -2020. LNCS, vol. 12446, pp. 204–213. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61166-8_22
Chang, J.-R., et al.: Stain mix-up: Unsupervised domain generalization for histopathology images. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12903, pp. 117–126. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87199-4_11
Cohen, J.P., Luck, M., Honari, S.: Distribution matching losses can hallucinate features in medical image translation. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11070, pp. 529–536. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00928-1_60
Cong, C., Liu, S., Di Ieva, A., Pagnucco, M., Berkovsky, S., Song, Y.: Semi-supervised adversarial learning for stain normalisation in histopathology images. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12908, pp. 581–591. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87237-3_56
Cui, M., Zhang, D.Y.: Artificial intelligence and computational pathology. Lab. Invest. 101(4), 412–422 (2021)
Faryna, K., van der Laak, J., Litjens, G.: Tailoring automated data augmentation to H&E-stained histopathology. In: Medical Imaging with Deep Learning (2021)
Koh, P.W., et al.: Wilds: a benchmark of in-the-wild distribution shifts. In: International Conference on Machine Learning, pp. 5637–5664. PMLR (2021)
Lafarge, M.W., Pluim, J.P.W., Eppenhof, K.A.J., Moeskops, P., Veta, M.: Domain-adversarial neural networks to address the appearance variability of histopathology images. In: Cardoso, M.J., et al. (eds.) DLMIA/ML-CDS -2017. LNCS, vol. 10553, pp. 83–91. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67558-9_10
Liang, H., Plataniotis, K.N., Li, X.: Stain style transfer of histopathology images via structure-preserved generative learning. In: Deeba, F., Johnson, P., Würfl, T., Ye, J.C. (eds.) MLMIR 2020. LNCS, vol. 12450, pp. 153–162. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61598-7_15
Macenko, M., et al.: A method for normalizing histology slides for quantitative analysis. In: 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 1107–1110. IEEE (2009)
Marini, N., Atzori, M., Otálora, S., Marchand-Maillet, S., Muller, H.: H&E-adversarial network: a convolutional neural network to learn stain-invariant features through hematoxylin & eosin regression. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 601–610 (2021)
Melas-Kyriazi, L., Manrai, A.K.: Pixmatch: Unsupervised domain adaptation via pixelwise consistency training. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12435–12445 (2021)
Reinhard, E., Ashikhmin, M., Gooch, B., Shirley, P.: Color transfer between images. IEEE Comput. Graphics Appl. 21(5), 34–41 (2001)
Sainburg, T., McInnes, L., Gentner, T.Q.: Parametric umap: learning embeddings with deep neural networks for representation and semi-supervised learning. ArXiv e-prints (2020)
Salehi, P., Chalechale, A.: Pix2pix-based stain-to-stain translation: A solution for robust stain normalization in histopathology images analysis. In: 2020 International Conference on Machine Vision and Image Processing (MVIP), pp. 1–7. IEEE (2020)
Shaban, M.T., Baur, C., Navab, N., Albarqouni, S.: Staingan: Stain style transfer for digital histological images. In: 2019 IEEE 16th international symposium on biomedical imaging (Isbi 2019), pp. 953–956. IEEE (2019)
Sohn, K., et al.: Fixmatch: simplifying semi-supervised learning with consistency and confidence. Adv. Neural. Inf. Process. Syst. 33, 596–608 (2020)
Stacke, K., Eilertsen, G., Unger, J., Lundström, C.: A closer look at domain shift for deep learning in histopathology. arXiv preprint arXiv:1909.11575 (2019)
Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Tellez, D., et al.: Whole-slide mitosis detection in H&E breast histology using PHH3 as a reference to train distilled stain-invariant convolutional networks. IEEE Trans. Med. Imaging 37(9), 2126–2136 (2018)
Tellez, D., et al.: Quantifying the effects of data augmentation and stain color normalization in convolutional neural networks for computational pathology. Med. Image Anal. 58, 101544 (2019)
Vahadane, A., et al.: Structure-preserving color normalization and sparse stain separation for histological images. IEEE Trans. Med. Imaging 35(8), 1962–1971 (2016)
Vasiljević, J., Feuerhake, F., Wemmert, C., Lampert, T.: Towards histopathological stain invariance by unsupervised domain augmentation using generative adversarial networks. Neurocomputing 460, 277–291 (2021)
Wagner, S.J., et al.: Structure-Preserving Multi-domain Stain Color Augmentation Using Style-Transfer with Disentangled Representations. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12908, pp. 257–266. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87237-3_25
Wang, M., Deng, W.: Deep visual domain adaptation: a survey. Neurocomputing 312, 135–153 (2018)
Xie, Q., Dai, Z., Hovy, E., Luong, T., Le, Q.: Unsupervised data augmentation for consistency training. Adv. Neural. Inf. Process. Syst. 33, 6256–6268 (2020)
Zhang, Y., Zhang, H., Deng, B., Li, S., Jia, K., Zhang, L.: Semi-supervised models are strong unsupervised domain adaptation learners. arXiv preprint arXiv:2106.00417 (2021)
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Raipuria, G., Shrivastava, A., Singhal, N. (2022). Stain-AgLr: Stain Agnostic Learning for Computational Histopathology Using Domain Consistency and Stain Regeneration Loss. In: Kamnitsas, K., et al. Domain Adaptation and Representation Transfer. DART 2022. Lecture Notes in Computer Science, vol 13542. Springer, Cham. https://doi.org/10.1007/978-3-031-16852-9_4
DOI: https://doi.org/10.1007/978-3-031-16852-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16851-2
Online ISBN: 978-3-031-16852-9
eBook Packages: Computer Science, Computer Science (R0)