
1 Introduction

1.1 Domain Adaptation

Convolutional Neural Networks (CNNs) have achieved great success in many tasks such as image classification, object detection, and action recognition. However, CNNs cannot guarantee performance on unseen data because of the variety of environments (domain gaps). Thus, data for each new domain must be annotated and the models retrained, and obstacles arise here: object detection and semantic segmentation, for example, require fine annotation, which has a high labor cost. Annotation cost is an important issue when applying machine learning and deep learning to social problems; the goal is to reduce annotation cost while building proper models across a wide variety of domains. Domain adaptation tackles such problems and aims to reduce the domain gap between training data (source data) and testing data (target data). By using source data that is fully annotated together with target data that is not annotated, or only weakly annotated, we can build an appropriate model for a new domain at lower cost. In domain adaptation (DA), the setting is defined by the target data's annotation level: when the source data is fully annotated and the target data is not annotated at all, the setting is called unsupervised domain adaptation (UDA); if some target data is annotated, it is semi-supervised domain adaptation (SDA); and if the target data is not fully annotated but some weak annotation exists, it is weakly supervised domain adaptation (WDA). Several UDA methods have shown great progress [7, 9, 13, 27, 30], and many such methods have been proposed for semantic segmentation [17, 18, 32, 34, 36]. However, UDA methods have become increasingly complex, and the number of hyperparameters they require keeps growing. Hyperparameter tuning in medical applications, which span several domains, is difficult due to the shortage of experts. In this paper we introduce weak annotation into a simple UDA method, creating a WDA. The result is a simple domain adaptation method that maintains performance on the target data at a lower cost than previous methods.

1.2 Medical Image Analysis

Recently, there have been many studies on medical image segmentation, such as histopathological image segmentation [8, 10, 37], MRI tumor image segmentation [21], and retinal vessel image segmentation [12]. These studies achieved significant progress, but wide domain gaps still exist in biomedical image analysis (e.g., camera, organ, staining method). In medical applications it is particularly necessary to guarantee high performance, so a proper model must be built for each area across the many domain gaps. Semantic segmentation requires fine annotation with a high labor cost, and experts are needed, which raises the total annotation cost further; this is a serious problem in medical applications.

1.3 Weakly Supervised Semantic Segmentation

Recently, weakly supervised segmentation methods have been developed [1, 3, 15, 22, 28, 29]. Weak annotation covers labels at the image level, the point level, and the bounding-box level that are not full annotations but still provide helpful supervision. UDA methods have progressed, but their number of hyperparameters has increased, complicating the process accordingly. We use weak annotation as the target label and aim to build an easy-to-handle model for medical applications. In histopathological image segmentation, point annotations and bounding boxes are primarily used; we chose point-level annotation because of the fineness of cell size and the ease of handling point information in histopathological images. This paper contributes to the literature by applying WDA to histopathological image segmentation, showing that point-level annotation, which is low cost compared to full annotation, can improve accuracy in the target domain when combined with a simple UDA method, yielding an easy-to-handle model with few hyperparameters.

2 Related Work

2.1 Histopathological Image Segmentation

Many histopathological image segmentation methods have been developed [4, 5, 19, 25, 37]. Semantic segmentation addresses a wide variety of issues unique to histopathological images. [25] treats regression in cell images, and [37] focuses on a cell segmentation problem that requires finer classification. [14] is a weakly supervised method with point annotation; it produces pseudo-labels by combining k-means clustering and Conditional Random Fields (CRF).

2.2 Medical Image Domain Adaptation

Domain adaptation in medical image analysis has also progressed [6, 16, 20, 33]. In many cases, a common domain representation is obtained to bridge the domain gap. [20] deals with a pneumonia classification problem: it uses a Generative Adversarial Network (GAN) to generate images such that source and target are difficult to discriminate, so the classification model operates on a common domain. Domain adaptation is also progressing in histopathological image segmentation. [6] prepared separate models for source and target and used maximum mean discrepancy (MMD) or correlation alignment (CORAL), which measure the difference between the feature distributions of the two models, as a loss function to resolve the domain discrepancy. [33] transferred source images to the target style using Cycle-GAN to close style gaps in the training data. [16] used an adversarial learning method with a shared segmenter and a discriminator: the discriminator decides which domain an input came from based on the shared segmenter's output, and a common domain representation is obtained.

3 Weakly Supervised Domain Adaptation

Fig. 1. An overview of our method. Segmenter \(\mathbf {G}\) outputs the segmentation result. Discriminator \(\mathbf {D}\) distinguishes whether the segmenter's softmax output comes from source data \(\varvec{I_s} \in \mathbb {R}^{(H\times W \times 3)}\) or target data \(\varvec{I_t} \in \mathbb {R}^{(H\times W \times 3)}\). The adversarial loss \(L_{adv}\) is then optimized so that the segmenter becomes a good model for both domains, one whose outputs the discriminator cannot assign to source or target. In UDA, the segmentation loss is only \(L_{seg}\) on source data; in WDA, the weak segmentation loss \(L_{weakseg}\) on target data is added.

3.1 Unsupervised Domain Adaptation

As an introduction to our method, we first explain UDA. In UDA, adversarial learning has been shown experimentally to be effective, and many methods have adopted it [11, 17, 30, 34, 36]. These methods commonly set up a discriminator that distinguishes whether input data comes from the source or the target and optimize an adversarial loss \(L_{adv}\), so the segmenter obtains a proper model for both the source and target domains. Figure 1 shows an overview of our networks. The segmenter \(\mathbf {G}\) outputs the segmentation result. Discriminator \(\mathbf {D}\) distinguishes whether input data is a source image \(\varvec{I_s} \in \mathbb {R}^{(H\times W \times 3)}\) (fully annotated by \(\varvec{Y_s} \in \mathbb {R}^{(H\times W)}\)) or a target image \(\varvec{I_t} \in \mathbb {R}^{(H\times W \times 3)}\) (not annotated). Work on domain adaptation for semantic segmentation [32] shows that the low-dimensional softmax output \(\varvec{P}=\mathbf{G} (\varvec{I}) \in \mathbb {R}^{(H\times W \times C)}\), where C is the number of categories, is a better discriminator input than high-dimensional hidden-layer outputs, so we adopt it in this study. The discriminator learns to tell which domain the segmenter's outputs come from, while the segmenter learns to produce outputs whose domain the discriminator cannot distinguish. After adversarial learning, the adapted segmenter thus matches the feature distributions of source and target. The UDA scheme can be written as follows.

Segmenter Training. We define the segmentation loss in (1) as the cross-entropy loss for source data {\(\varvec{I_s}\), \(\varvec{Y_s}\)}:

$$\begin{aligned} L_{seg}(\varvec{I_s})=-\sum _{\varvec{h,w}}\sum _{{\varvec{c}} \in C} \varvec{Y_s^{(h,w,c)}}\log \varvec{P_s^{(h,w,c)}} \end{aligned}$$
(1)

Discriminator Training. As discriminator input, we use the segmenter's softmax output \(\varvec{P}=\mathbf{G} (\varvec{I}) \in \mathbb {R}^{(H\times W \times C)}\). To train the discriminator, we use the discriminator loss \(L_{\varvec{D}}\) in (2), a cross-entropy loss over the two classes (source and target):

$$\begin{aligned} L_D(\varvec{P})=-\sum _{\varvec{h,w}} (1-z)\log (\mathbf{D} (\varvec{P}^{(h,w,0)}))+z\log ( \mathbf{D} (\varvec{P}^{(h,w,1)})) \end{aligned}$$
(2)

where \(z = 0\) if the input is drawn from the target domain, and \(z = 1\) if it is drawn from the source domain.

Adversarial Learning. For target data, to bring the target prediction distribution \(\varvec{P}_t=\mathbf{G} (\varvec{I}_t) \in \mathbb {R}^{(H\times W \times C)}\) close to the source prediction distribution \(\varvec{P}_s\), we use the adversarial loss \(L_{adv}\) in (3), written as:

$$\begin{aligned} L_{adv}(\varvec{I_t})=-\sum _{\varvec{h,w}}\log ( \mathbf{D} (\varvec{P}_{t}^{(h,w,1)})) \end{aligned}$$
(3)

We thus formulate the objective function for domain adaptation:

$$\begin{aligned} L( \varvec{I_s},\varvec{I_t})=L_{seg}(\varvec{I_s})+ \gamma L_{adv}(\varvec{I_t}). \end{aligned}$$
(4)

and optimize the min-max criterion:

$$\begin{aligned} \max _\mathbf{D} \min _\mathbf{G} L(\varvec{I_s}, \varvec{I_t}), \end{aligned}$$
(5)

so as to maximize the probability that target predictions are judged as source predictions while minimizing the segmentation loss on source data. By optimizing the min-max criterion (5), the segmenter obtains a common representation that bridges the domain gap.
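To make the alternating optimization concrete, the following is a minimal PyTorch sketch of one training iteration over losses (1)-(3); the module and variable names (G, D, opt_G, opt_D, the image batches) are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def uda_step(G, D, opt_G, opt_D, img_s, lbl_s, img_t, gamma):
    """One iteration of the min-max criterion (5): update G with (1) + (3), then D with (2)."""
    # Segmenter update: segmentation loss on source, adversarial loss on target.
    opt_G.zero_grad()
    P_s = torch.softmax(G(img_s), dim=1)               # source softmax output
    L_seg = F.nll_loss(torch.log(P_s + 1e-8), lbl_s)   # cross-entropy loss (1)
    P_t = torch.softmax(G(img_t), dim=1)               # target softmax output
    d_t = D(P_t)                                       # per-pixel domain logits
    # Adversarial loss (3): push target outputs toward the source label (z = 1).
    L_adv = F.binary_cross_entropy_with_logits(d_t, torch.ones_like(d_t))
    (L_seg + gamma * L_adv).backward()
    opt_G.step()

    # Discriminator update: loss (2) on detached segmenter outputs.
    opt_D.zero_grad()
    d_s, d_t = D(P_s.detach()), D(P_t.detach())
    L_D = F.binary_cross_entropy_with_logits(d_s, torch.ones_like(d_s)) + \
          F.binary_cross_entropy_with_logits(d_t, torch.zeros_like(d_t))
    L_D.backward()
    opt_D.step()
```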

3.2 Weakly Supervised Domain Adaptation

There are many kinds of weak annotation: image-level annotation gives only object identity, point annotation gives object position, a bounding box gives an object rectangle, and so on. For this paper, point annotation was judged best for histopathological image segmentation because of its fineness over a large number of cells. In addition, as shown in Fig. 2, we experimented with three types of weak label: point-level annotation, Gaussian-level annotation, and superpixel-level annotation.

Fig. 2. An overview of the weak labeling method on target data. A point-level annotation is given to each nucleus in the image (point-level weak annotation). In addition, we used Gaussian-level annotation, in which a Gaussian function is centered at each point annotation (Gaussian-level weak annotation). For superpixel-level annotation, images are divided into superpixels and an annotation is given to every superpixel that contains a point-level annotation (superpixel-level weak annotation).

Point Level Annotation. Point-level annotation gives a single point per cell. In this paper, this weak label is referred to as point-level weak annotation.

Point Annotation with Gaussian Function (Gaussian Level). In addition to point-level annotation, we use Gaussian-level annotation, in which a Gaussian function is centered at each point annotation. In this paper, this weak label is referred to as Gaussian-level weak annotation.

Point Annotation with Superpixel (Superpixel Level). First, each image is divided into superpixels (we used the SLIC algorithm [2]), and an annotation is given to every superpixel that contains a point-level annotation. In this paper, this weak label is referred to as superpixel-level weak annotation.
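To make the three label constructions concrete, the following is a minimal Python sketch under our own assumptions (a binary nucleus/background task, SLIC from scikit-image, Gaussian smoothing from SciPy); the function and parameter names are illustrative, not the authors' preprocessing code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.segmentation import slic

def make_weak_labels(image, points, sigma=3.0, n_segments=1000):
    """image: (H, W, 3) RGB array; points: iterable of (row, col) nucleus centers."""
    H, W = image.shape[:2]
    # Point level: a single labeled foreground pixel per nucleus.
    point_lbl = np.zeros((H, W), dtype=np.float32)
    for r, c in points:
        point_lbl[r, c] = 1.0
    # Gaussian level: a Gaussian function centered at each point annotation.
    gauss_lbl = gaussian_filter(point_lbl, sigma=sigma)
    gauss_lbl /= gauss_lbl.max() + 1e-8
    # Superpixel level: label every SLIC superpixel that contains a point.
    segments = slic(image, n_segments=n_segments, compactness=10)
    fg_ids = {segments[r, c] for r, c in points}
    sp_lbl = np.isin(segments, list(fg_ids)).astype(np.float32)
    return point_lbl, gauss_lbl, sp_lbl
```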

Segmentation Loss with Weak Labels. For weakly supervised segmentation, [31] showed that the partial cross-entropy loss, which uses only the labeled points \(\varvec{p} \in \varOmega _{L}\) with ground truth, is effective, so we adopted it in our method.

$$\begin{aligned} L_{weakseg}(\varvec{I_{t}})=-\sum _{{\varvec{p}} \in \varOmega _{L}} \varvec{Y_t^{(p)}}\log \varvec{P_t^{(p)}} \end{aligned}$$
(6)
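A possible PyTorch form of the partial cross-entropy loss (6), assuming a binary mask that marks the labeled pixels \(\varOmega _{L}\); the normalization by the number of labeled pixels is our own choice for scale stability, and all names are illustrative.

```python
import torch

def partial_cross_entropy(P_t, y, mask):
    """P_t: (N, C, H, W) softmax output; y: (N, H, W) integer labels;
    mask: (N, H, W) float, 1 on labeled pixels in Omega_L, 0 elsewhere."""
    logp = torch.log(P_t.clamp_min(1e-8))
    # Log-probability of the annotated class at every pixel.
    logp_y = logp.gather(1, y.unsqueeze(1)).squeeze(1)
    # Sum only over labeled pixels; averaging over |Omega_L| is our choice.
    return -(mask * logp_y).sum() / mask.sum().clamp_min(1.0)
```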

We add this weak segmentation loss to the unsupervised domain adaptation loss function (4). Thus we optimize the weakly supervised domain adaptation objective (7):

$$\begin{aligned} L( \varvec{I_s},\varvec{I_t})=L_{seg}(\varvec{I_s})+\gamma _1 L_{weakseg}(\varvec{I_t})+ \gamma _2 L_{adv}(\varvec{I_t}), \end{aligned}$$

$$\begin{aligned} \max _\mathbf{D} \min _\mathbf{G} L(\varvec{I_s}, \varvec{I_t}) \end{aligned}$$
(7)

4 Experiments

4.1 Dataset

Fig. 3. MoNuSeg dataset

Fig. 4. TNBC dataset

Source Data. The source data is the MoNuSeg dataset [24]. The dataset consists of annotated hematoxylin and eosin (H&E) stained histology images captured at 40\(\times \) magnification and made available by the Indian Institute of Technology, Guwahati. The images were selected from several patients at several hospitals and extracted as 1000 \(\times \) 1000 patches. There are seven cancer types; an example is shown in Fig. 3. This dataset consists of 30 images with 21,623 annotated nuclei (Fig. 4).

Target Data. The target data is the TNBC dataset [23]. This dataset consists of annotated H&E stained histology images captured at 40\(\times \) magnification and made available by the Curie Institute. All slides are taken from a cohort of Triple Negative Breast Cancer (TNBC) patients and were scanned with a Philips Ultra Fast Scanner 1.6RA. For eleven patients, 512 \(\times \) 512 patches were extracted from different areas of tissue. This dataset consists of 50 images with 4,022 annotated nuclei. Additionally, the dataset was annotated by three experts, guaranteeing its annotation quality. In this paper, in order to evaluate on the target data, the 50 images were divided into two groups for 2-fold cross-validation.

4.2 Experiment Conditions

Segmenter Network and Pre-training. As the segmenter model, we used DRN-26 [35], which has dilated convolutions and is pre-trained on ImageNet. To pre-train the segmenter on \(L_{seg}\) in (1), we used the source data {\(\varvec{I_s}\), \(\varvec{Y_s}\)} and the Adam optimizer with learning rate \(1 \times 10^{-2}\).

Discriminator Network. As the discriminator network, we use an architecture similar to [32]. It consists of 5 convolution layers (kernel size 4 \(\times \) 4, stride 2) with channel numbers {64, 128, 256, 512, 1}. Except for the last layer, each convolution is followed by batch normalization and a leaky ReLU parameterized by 0.2.
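From this description, the discriminator might be sketched in PyTorch as follows; the padding and exact layer ordering are our assumptions, since they are not specified.

```python
import torch.nn as nn

def make_discriminator(num_classes):
    """5 conv layers (4x4 kernels, stride 2), channels {64, 128, 256, 512, 1};
    batch norm + LeakyReLU(0.2) after every layer except the last."""
    chans = [num_classes, 64, 128, 256, 512]
    layers = []
    for c_in, c_out in zip(chans[:-1], chans[1:]):
        layers += [nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
                   nn.BatchNorm2d(c_out),
                   nn.LeakyReLU(0.2, inplace=True)]
    layers.append(nn.Conv2d(512, 1, kernel_size=4, stride=2, padding=1))
    return nn.Sequential(*layers)
```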

Network Training in Domain Adaptation. In all experiments we set the batch size to 8 and used random cropping (512 \(\times \) 512, source data only) and random 90-degree rotations for data augmentation. To train the segmenter, we used the Adam optimizer with learning rate \(1 \times 10^{-4}\); to train the discriminator, we used the momentum SGD optimizer (momentum 0.9, weight decay \(5 \times 10^{-4}\)). The learning rate is decreased with polynomial decay with power 0.9. For \(\gamma _1\) and \(\gamma _2\), the optimum parameters were selected in the range 0.01 to 0.5. We implemented our network using the PyTorch toolbox on a single NVIDIA GeForce GTX 1080 Ti GPU. All source images were used as training data. The target data is divided into two groups for 2-fold cross-validation, and the two scores are averaged.
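The polynomial decay can be reproduced with a LambdaLR scheduler; the following sketch assumes a total iteration budget max_iter, a value the paper does not state.

```python
import torch

# Placeholder module standing in for the DRN-26 segmenter.
segmenter = torch.nn.Conv2d(3, 2, kernel_size=1)
max_iter = 20000  # assumed total number of iterations; not stated in the paper

opt_G = torch.optim.Adam(segmenter.parameters(), lr=1e-4)
# lr(i) = lr0 * (1 - i / max_iter) ** 0.9  -- polynomial decay with power 0.9
sched = torch.optim.lr_scheduler.LambdaLR(
    opt_G, lr_lambda=lambda i: max(0.0, 1.0 - i / max_iter) ** 0.9)
# Call sched.step() once per iteration, after opt_G.step().
```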

4.3 Results

The results are shown in Table 1, evaluated by foreground intersection-over-union (fIoU) and F-measure. The experimental conditions for the comparative experiments follow.

Source Model. The source model is trained using only the source data.

Target Model. The target model is trained using only 25 images from the target data. Domain adaptation aims at this value; Table 1 reports the difference in fIoU from the target model as the domain gap.

DA Only (Unsupervised DA). This is the unsupervised domain adaptation result. The source data with full annotation and the target data with no annotation were used as training data.

Point Level (Weakly Supervised DA). This is the weakly supervised domain adaptation result. Source data with full annotation and target data with point-level weak annotation were used as training data.

Gauss Level (Weakly Supervised DA). This is the weakly supervised domain adaptation result. Source data with full annotation and target data with Gaussian-level weak annotation were used as training data.

Superpixel Level (Weakly Supervised DA). This is our proposed weakly supervised domain adaptation result. Source data with full annotation and target data with superpixel-level weak annotation were used as training data.

Overall Results. Table 1 lists the evaluation values of fIoU, F-measure, pixel accuracy, and the fIoU gap, which shows the difference from the target model. In domain adaptation, the target model result is the upper bound; in this experiment, the upper bound is 0.682. For WDA with superpixel-level annotation, although the fIoU gap is 0.154, the domain gap is reduced significantly compared to the other methods.

Figure 5 shows the outputs of the methods used in these comparative experiments. The source model does not capture the target information well, so there are many misidentified areas where no annotation is given. The results of unsupervised domain adaptation are improved, but mis-recognition is not sufficiently reduced, whereas our method, given the superpixel weak labels, reduces mis-recognition to a much lower level. Comparing the weakly supervised variants, the point-level result is about the same as the unsupervised method, the Gaussian-level result is an improvement, and the superpixel level is the best. Thus, it is important to give weak annotations that capture some of the object's shape.

Figure 6 shows the output of Grad-CAM [26], which visualizes where the discriminator focuses. The result on source data remains unchanged because the source data is fully annotated. The discriminator appears to focus on the object regions of the segmenter output, and thus tends to judge the target by its worse predictions, pushing the segmenter toward better outputs for the target data.

Table 1. List of evaluation values of fIoU and F-measure. The difference from the target model is shown as the fIoU gap.

Fig. 5. Output results. The top row shows the input data and the ground truth. Next is the result of the source model, trained on source data only. Third from the top is the result of unsupervised domain adaptation, and the bottom rows are the results of weakly supervised domain adaptation with point-level, Gaussian-level, and superpixel-level annotation.

Fig. 6. Output of Grad-CAM [26], which visualizes the focus of the discriminator. Yellow regions indicate larger values and show where the discriminator looks to distinguish the domain of the input data.

5 Conclusion

In this paper, we showed that weakly supervised domain adaptation is useful in histopathological image segmentation. Our method combines a simple unsupervised domain adaptation method with weak labeling: the image is divided into superpixels, and annotations are given to the superpixels that contain point-level annotations. The experiments show that this method resolves domain gaps better than unsupervised domain adaptation and demonstrates the effectiveness of weak annotation. In the future, we hope to combine it with weakly supervised semantic segmentation methods.