1 Introduction

Pathology relies on the visual examination, under the microscope, of a diverse range of tissue types obtained by biopsy or surgical procedure [6]. Stains are often applied to reveal underlying patterns and to increase the contrast between nuclear components and their surrounding tissues [14]. Nevertheless, the substantial variance in the staining process, e.g., staining protocols, scanners, manufacturers, and staining batches, may eventually result in a wide variety of hues [10]. In contrast to pathologists, who have adapted to these variations through years of training, deep learning (DL) methods are prone to performance degradation in the presence of inter-center stain heterogeneity [2]. Specifically, since color is a salient feature extracted by deep neural networks, current successful applications for whole slide image (WSI) diagnosis are limited by their robustness to color shift among different data centers [5]. There are two primary directions to reduce this generalization error, namely stain normalization and stain augmentation [22].

Fig. 1. The overall framework of the proposed RandStainNA.

Stain normalization (SN) aims to reduce the variation by aligning the stain-color distribution of source images to a target template image [11, 17, 24]. Empirical studies regard stain normalization as an essential prerequisite of downstream applications [2, 12, 22]. Yet, the capability to pinpoint a representative template image for SN relies heavily on domain prior knowledge. Moreover, in real-world settings such as federated learning, template-image selection is not feasible due to privacy regulations [10], as source images are generally inaccessible to the central processor. Generative adversarial networks (GANs) have recently been proposed for SN [17, 18], yet preserving phenotype recognizability remains problematic. A salient drawback of the single stain style in SN is that only restricted color-correlated features can be mined by deep neural networks. Stain augmentation (SA) takes the converse direction to SN by simulating stain variations while keeping morphological features intact [21]. Tellez et al. [20] first tailored data augmentations from the RGB color space to the H&E color space. Afterward, in parallel with SN approaches, GANs have also been widely adopted for stain augmentation, e.g., HistAuGAN [23].

Fig. 2. The overall pipeline of the proposed RandStainNA, which fuses stain normalization and stain augmentation. Prior to the training stage, random virtual template generation functions are defined, i.e., \(F_\textbf{A}^\mathcal {S}=\mathcal {N}(\textbf{M}_A^\mathcal {S},\mathbf {\Sigma }_A^\mathcal {S})\) and \(F_\textbf{D}^\mathcal {S}=\mathcal {N}(\textbf{M}_D^\mathcal {S},\mathbf {\Sigma }_D^\mathcal {S})\). The three-step training stage comprises the random selection of a color space \(\mathcal {S}\), the generation of an associated random stain style template \([\textbf{M}_v^\mathcal {S},\mathbf {\Sigma }_v^\mathcal {S}]\), and the normalization of a batch with the generated virtual template. Our approach is downstream task agnostic.

Previous works have compared the performance of SN and SA without interpreting their differences [22]. Moreover, we observe that the mathematical formulation of SN coincides with that of SA: the transfer in SN depends on a prior Dirichlet distribution [25], while SA distorts images with a uniform distribution [20], as depicted in Fig. 1. Hence, we make the first attempt to unify SN and SA for histology image analysis. Two primary contributions are summarized as follows. First, a novel Random Stain Normalization and Augmentation (RandStainNA) method is proposed to bridge stain normalization and stain augmentation, so that images can be augmented with more realistic stain styles. Second, a random color space selection scheme is introduced to extend the target scope to various color spaces, including HED, HSV, and LAB, to increase flexibility and provide extra augmentation. The evaluation tasks include tissue classification and nuclei segmentation, and both show that our method consistently improves performance across a variety of network architectures.

2 Methodology

Method Overview. Random Stain Normalization and Augmentation (RandStainNA) is a hybrid framework designed to fuse stain normalization and stain augmentation to generate more realistic stain variations. It incorporates randomness into SN by automatically drawing a random virtual template from pre-estimated stain style distributions. More specifically, from the SN viewpoint, the stain styles ‘visible’ to the deep neural network are enriched in the training stage. Meanwhile, from the SA viewpoint, RandStainNA restricts the distortion range, so that only a constrained, practicable range is ‘visible’ to the network. The framework is a general, task-agnostic strategy, as depicted in Fig. 2.

Stain Style Creation and Characterization. Unlike the comprehensive formulation of color styles in natural images, the stain style of histology remains a vague concept that is primarily based on visual examination, which makes it hard to obtain a precise objective for alleviating stain style variation [16, 24]. To narrow this gap, our work first quantitatively defines the stain style with six parameters, namely the average and standard deviation of each channel in the LAB color space [16]. We choose the LAB space for its notable capability to represent heterogeneous styles in medical images [16]. We first transfer all histology slides in the training set from the RGB space to the LAB color space. Then the stain style of image \(\textbf{x}_i\) is characterized by \(\textbf{A}_i=[a^{(l)}_i,a^{(a)}_i,a^{(b)}_i]\in \mathbb {R}^3\) and \(\textbf{D}_i=[d^{(l)}_i,d^{(a)}_i,d^{(b)}_i]\in \mathbb {R}^3\), where \(a^{(c)}_i\) and \(d^{(c)}_i\) are the average value and standard deviation of channel \(c\in \{l,a,b\}\) in image \(\textbf{x}_i\), as shown in the pre-processing stage block in Fig. 2.
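As a concrete illustration, the characterization step can be sketched as follows; this is a minimal sketch assuming 8-bit RGB patches and OpenCV's LAB conversion, with illustrative function names that are not from the original implementation.

```python
import cv2
import numpy as np

def stain_style(img_rgb):
    """Per-channel mean A_i and std D_i of an 8-bit RGB patch in LAB space."""
    lab = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2LAB).astype(np.float32).reshape(-1, 3)
    return lab.mean(axis=0), lab.std(axis=0)  # A_i, D_i, each in R^3
```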

Virtual Stain Normalization Template. In routine stain normalization approaches [16, 24], a source image is normalized to a pre-selected template image by aligning its per-channel average \(\textbf{A}_s\) and standard deviation \(\textbf{D}_s\) of pixel values to the template's \(\textbf{A}_t\) and \(\textbf{D}_t\). Thus, it is sufficient to characterize a template image with \([\textbf{A}_t,\textbf{D}_t]\). In the proposed RandStainNA, we expand the uniformly shared one-template mode to a scheme of broad, randomly generated virtual templates \([\textbf{A}_v,\textbf{D}_v]\). More specifically, at each iteration a random \(\textbf{A}_v\) is sampled from a distribution \(F_\textbf{A}\), and likewise \(\textbf{D}_v\) is drawn from another distribution \(F_\textbf{D}\); together they serve as the normalization target for each training sample. Empirical results show that the eventual performance is robust to a wide range of distribution types for \(F_\textbf{A}\) and \(F_\textbf{D}\), such as the Gaussian and t-distribution. In the rest of this section, we simply adopt the Gaussian distribution, i.e., setting \(F_\textbf{A} = \mathcal {N}(\textbf{M}_A,\mathbf {\Sigma }_A)\) and \(F_\textbf{D} = \mathcal {N}(\textbf{M}_D,\mathbf {\Sigma }_D)\), where \(\mathcal {N}(\textbf{M},\mathbf {\Sigma })\) denotes the Gaussian distribution with expectation \(\textbf{M}\) and covariance matrix \(\mathbf {\Sigma }\). Notably, due to the orthogonality of the channels, \(\mathbf {\Sigma }\) is a diagonal matrix, i.e., \(\mathbf {\Sigma } = {\text {diag}}(\sigma _1^2,\sigma _2^2,\sigma _3^2)\) for some \(\sigma _j\), \(j=1,2,3\).
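Since \(\mathbf {\Sigma }\) is diagonal, sampling a virtual template reduces to three independent univariate Gaussian draws per statistic. A hedged sketch (names are illustrative):

```python
import numpy as np

def sample_virtual_template(M_A, sigma_A, M_D, sigma_D, rng=np.random.default_rng()):
    """Draw A_v ~ N(M_A, diag(sigma_A^2)) and D_v ~ N(M_D, diag(sigma_D^2))."""
    A_v = rng.normal(M_A, sigma_A)  # channel-wise independent draws
    D_v = rng.normal(M_D, sigma_D)
    return A_v, D_v
```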

Statistical Parameter Estimation for Virtual Template Generation. The statistical parameters \(\textbf{M}_A,\mathbf {\Sigma }_A,\textbf{M}_D,\mathbf {\Sigma }_D\) estimated here are subsequently used to generate the stain styles discussed above. A natural candidate is to set \(\textbf{M}_A\) and \(\textbf{M}_D\) to the means of the per-image channel statistics over all training images, and \(\mathbf {\Sigma }_A\) and \(\mathbf {\Sigma }_D\) to the corresponding standard deviations over the whole training set. However, this discipline has two defects: one is the inefficiency of traversing the whole set, and the other is its infeasibility in special cases, e.g., federated learning or lifelong learning. Therefore, we provide a more computation-efficient alternative by randomly sampling a small number of patches from the training set and using their sample means and standard deviations as \(\textbf{M}_A,\mathbf {\Sigma }_A,\textbf{M}_D,\mathbf {\Sigma }_D\). The empirical results suggest that it achieves competitive performance.
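A possible sketch of this subset-based estimation, assuming a list of RGB patches in memory (the sample size of 1000 is illustrative, not the paper's setting):

```python
import random
import cv2
import numpy as np

def estimate_template_distributions(images, n_samples=1000):
    """Estimate (M_A, sigma_A) and (M_D, sigma_D) from a random subset of patches.
    The covariance matrices are diagonal, so only per-channel sigmas are kept."""
    subset = random.sample(images, min(n_samples, len(images)))
    avgs, stds = [], []
    for img in subset:
        lab = cv2.cvtColor(img, cv2.COLOR_RGB2LAB).astype(np.float32).reshape(-1, 3)
        avgs.append(lab.mean(axis=0))  # per-image channel means A_i
        stds.append(lab.std(axis=0))   # per-image channel stds D_i
    avgs, stds = np.stack(avgs), np.stack(stds)
    return (avgs.mean(axis=0), avgs.std(axis=0)), (stds.mean(axis=0), stds.std(axis=0))
```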

Image-Wise Normalization with Random Virtual Template. After transferring image \(\textbf{x}_i\) from the RGB into the LAB space, we write its pixel values as [l, a, b]. We denote the average and standard deviation (std) of each channel of this image as \(\textbf{A}_i=[a^{(l)}_i,a^{(a)}_i,a^{(b)}_i]\) and \(\textbf{D}_i=[d^{(l)}_i,d^{(a)}_i,d^{(b)}_i]\), and the random virtual template generated for \(\textbf{x}_i\) from \(F_\textbf{A},F_\textbf{D}\) as \(\textbf{A}_v=[a^{(l)}_v,a^{(a)}_v,a^{(b)}_v]\) and \(\textbf{D}_v=[d^{(l)}_v,d^{(a)}_v,d^{(b)}_v]\). Then the image-wise normalization based on the random virtual template is formulated as

$$\begin{aligned} l^\prime &= \frac{d^{(l)}_v}{d^{(l)}_i}\left(l - a^{(l)}_i\right) + a^{(l)}_v \\ a^\prime &= \frac{d^{(a)}_v}{d^{(a)}_i}\left(a - a^{(a)}_i\right) + a^{(a)}_v \\ b^\prime &= \frac{d^{(b)}_v}{d^{(b)}_i}\left(b - a^{(b)}_i\right) + a^{(b)}_v \end{aligned}$$
(1)

Then, we transfer \([l^\prime ,a^\prime ,b^\prime ]\) from LAB back to the RGB space. Notably, we generate a different virtual template for each image at every epoch during the training stage. Therefore, RandStainNA can largely increase the data variation with these on-the-fly generated virtual templates.
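Putting the pieces together, one training-time transform might look like the following sketch; it samples a fresh virtual template per call and applies Eq. (1), assuming OpenCV's 8-bit LAB representation (the epsilon guard is our addition, not from the paper):

```python
import cv2
import numpy as np

rng = np.random.default_rng()

def randstain_normalize(img_rgb, M_A, sigma_A, M_D, sigma_D):
    """Normalize one RGB image to a freshly sampled virtual template via Eq. (1)."""
    lab = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2LAB).astype(np.float32)
    avg_i = lab.reshape(-1, 3).mean(axis=0)        # A_i
    std_i = lab.reshape(-1, 3).std(axis=0) + 1e-6  # D_i (eps avoids division by zero)
    avg_v = rng.normal(M_A, sigma_A)               # A_v ~ N(M_A, Sigma_A)
    std_v = rng.normal(M_D, sigma_D)               # D_v ~ N(M_D, Sigma_D)
    lab = (lab - avg_i) * (std_v / std_i) + avg_v  # channel-wise affine map of Eq. (1)
    return cv2.cvtColor(np.clip(lab, 0, 255).astype(np.uint8), cv2.COLOR_LAB2RGB)
```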

Random Color Space Determination for Augmentation. By computing the stain style parameters \([\textbf{M}, \mathbf {\Sigma }]\) in distinct color spaces, we can derive their associated \(F_\textbf{A}\) and \(F_\textbf{D}\). This allows us to extend RandStainNA from LAB to other color spaces, e.g., HED and HSV, and to propose a random color space selection scheme that further strengthens the regularization effect. The candidate pool comprises three widely-used color spaces in the domain of histology, i.e., HED, HSV, and LAB. At each training iteration, a color space \(\mathcal {S}\) is first selected at random, either with equal probability, i.e., \(p=\frac{1}{3}\), or with manually assigned probabilities depending on the performance of each independent space. Subsequently, a virtual template associated with \(\mathcal {S}\) is generated to perform the image-wise stain normalization.
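The selection step itself is a weighted categorical draw; a minimal sketch (the probability values and names are illustrative):

```python
import random

COLOR_SPACE_PROBS = {"LAB": 1 / 3, "HSV": 1 / 3, "HED": 1 / 3}  # equal by default

def pick_color_space(probs=COLOR_SPACE_PROBS):
    """Draw the color space S for the current training iteration."""
    spaces, weights = zip(*probs.items())
    return random.choices(spaces, weights=weights, k=1)[0]

# Per training image: S = pick_color_space(); then apply the Eq. (1) normalization
# in S, using the (M_A, Sigma_A, M_D, Sigma_D) pre-estimated for that space.
```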

3 Experiments

Dataset and Evaluation Metrics. We evaluate the proposed RandStainNA on two image analysis tasks, i.e., classification and segmentation. For the patch-level classification task, we use the widely-used public histology dataset NCT-CRC-HE-100K-NONORM for training and validation, with the additional CRC-VAL-HE-7K dataset for external testing [9]. These two sets comprise 100,000 and 7,180 histological patches, respectively, from colorectal cancer patients of multiple data centers with heterogeneous stain styles. We randomly select 80% of NCT-CRC-HE-100K-NONORM for training and the remaining 20% for validation. The original dataset covers nine categories, but the background category can be straightforwardly identified in the pre-processing stage with the Otsu algorithm [15], so it is removed in our experiments for a more reliable result. Top-1 classification accuracy (%) is used as the metric for the resulting 8-category classification task. For the nuclei segmentation task, we use the small public MoNuSeg dataset [12], with Dice and IoU as the metrics.

Network Architecture and Settings. In the classification task, we employ six backbone architectures for the evaluations, namely ResNet-18 [7], ResNet-50 [7], MobileNetV3-Small [8], EfficientNet-B0 [19], ViT-Tiny [3], and SwinTransformer-Tiny [13]. These networks, spanning CNNs and transformers, represent a wide range of network capacities and effectively demonstrate the adaptability of our method in different settings. In the nuclei segmentation task, we use CIA-Net [26] as the backbone for its notable performance on small datasets. We use a consistent training scheme across the distinct networks for performance comparison with stain augmentation and stain normalization methods. Detailed training schemes and hyper-parameter settings are given in the supplementary material. We perform 3 random runs and report the average for each experiment.

Table 1. Test accuracy (%) comparison on the tissue type classification task. We compare our method with stain augmentation (SA) [22] and stain normalization (SN) [16] in three color spaces, i.e., LAB, HSV, and HED [22]. For SA, we follow previous work [22] in using two settings, namely light (L) and strong (S), determined by the degree of distortion. The best and second best are marked in boldface and with *, respectively.
Table 2. Performance comparison on nuclei segmentation in terms of Dice and IoU.
Fig. 3. Illustrative patch examples of (a) raw images, (b) stain-normalized images, (c) stain-augmented images, and (d) images processed with the proposed RandStainNA. In (c) and (d), we incorporate the results of four random runs into one image patch to demonstrate the different degrees of randomness achieved by the stain augmentation methods and our RandStainNA.

Fig. 4. UMAP [1] embedding of the stain style characteristic statistics, i.e., \([\textbf{M},\mathbf {\Sigma }]\), of raw images, stain-normalized images, stain-augmented images, and images augmented with our RandStainNA. As shown, our method can enrich the realistic stain styles used in training CNNs.

Compared Methods. All models are trained with morphology augmentation, namely random vertical and horizontal flips. In both evaluation tasks, we compare our method with existing stain normalization [16] and stain augmentation [21, 22] approaches performed in the three color spaces, i.e., HED, HSV, and LAB. For stain augmentation in HED, we employ a combined multiplicative and additive rule [22] that perturbs each channel, i.e., \(p^\prime = p * \varepsilon _1 + \varepsilon _2\), where \(p^\prime \) is the augmented pixel value, p is the original pixel value, and \(\varepsilon _1 \) and \(\varepsilon _2\) are uniform random noises; we term this stain augmentation scheme #1 (SA1). For SA in HSV, we adopt an additive rule, i.e., \(p^\prime = p+ p * \varepsilon \) [22], termed stain augmentation scheme #2 (SA2). We integrate the above two schemes for LAB stain augmentation, due to the absence of prior work on SA in LAB. We also configure two augmentation settings according to the degree of distortion, i.e., the range of the random noise, denoted light (L) and strong (S) [21, 22]. To fully retain recognizable morphological features, we do not include GAN-related approaches in the comparison.
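For reference, the two noise schemes reduce to simple per-channel perturbations; the sketch below uses illustrative noise ranges, not the exact light/strong settings of [21, 22]:

```python
import numpy as np

rng = np.random.default_rng()

def sa1(channel, alpha=0.05, beta=0.01):
    """SA1: p' = p * eps1 + eps2, with one uniform draw per channel (HED space)."""
    eps1 = rng.uniform(1.0 - alpha, 1.0 + alpha)
    eps2 = rng.uniform(-beta, beta)
    return channel * eps1 + eps2

def sa2(channel, alpha=0.1):
    """SA2: p' = p + p * eps, with one uniform draw per channel (HSV space)."""
    eps = rng.uniform(-alpha, alpha)
    return channel + channel * eps
```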

Results. Our method consistently improves the baseline performance of the six backbone architectures in terms of test accuracy, with implementations in all three color spaces, as shown in Table 1, which demonstrates the effectiveness of RandStainNA. The hybrid scheme outperforms a sole deployment of either SN or SA. With the random color space selection scheme (denoted ‘full’), RandStainNA achieves a further performance improvement. To illustrate the effects of the different approaches, we visualize the original raw images with stain variations, SN images, SA images, and images processed with our RandStainNA in Fig. 3. In the visualization, we use the results of SA and SN performed in the HSV space as an example; the outcomes in the LAB and HED spaces are very similar. As shown, SN unifies stain styles into a shared template, which may leave out many useful features [22], while SA may generate unrealistic images. In contrast, our method generates much more realistic images that are recognizable by both humans and deep learning techniques. Fig. 4 provides the UMAP embedding of the stain style parameters \([\textbf{M},\mathbf {\Sigma }]\) of the associated solutions. The nuclei segmentation results are listed in Table 2. For SN and SA in each color space, we pick the configuration with the higher performance in the classification task. Our method again achieves the best performance, demonstrating its effectiveness in various downstream tasks.

Ablation Study. The ablation study is performed on the classification task. First, we test the effect of the distribution type of \(F_\textbf{A},F_\textbf{D}\). The test accuracies are 93.98%, 93.90%, 93.04%, and 92.48% for the Gaussian, t-, uniform, and Laplace distributions, respectively. The effect of the number of samples used to compute the sample mean and standard deviation is also evaluated. The test accuracies are 93.42%, 93.29%, 94.08%, and 93.98% when computing the averages and standard deviations with 10 images per category, 100 images per category, 1000 images per category, and the whole training set, respectively, which demonstrates robustness to the estimation of \(\textbf{M}\) and \(\mathbf {\Sigma }\).

4 Conclusion

The proposed RandStainNA framework copes with the inevitable stain variance problem in clinical pathology image analysis. Leveraging the advantages of both stain normalization and stain augmentation, the framework produces more realistic stain variations to train stain-agnostic DL models. Additionally, in comparison with most current GANs, RandStainNA is practically straightforward and efficient when applied as an on-the-fly augmentation technique. Moreover, the results show the feasibility of training robust downstream classification and segmentation networks on various architectures. One future direction of our current work is the expansion to more color spaces, e.g., YUV, YCbCr, YPbPr, YIQ, and XYZ [4], to further improve the generalization ability.