
1 Introduction

With the recent emergence of large-scale datasets supervised by high-quality annotations, deep neural networks (DNNs) have exhibited impressive performance in numerous domains, particularly in medical applications. They have proved to be worthy computer assistants in solving many medical problems, including early disease diagnosis, disease progression prediction, patient classification, and many other crucial medical image processing tasks such as image registration and segmentation [7]. However, this success is mostly attributable to the availability of well-labeled data. In practice, it is a great challenge to obtain large high-quality datasets with accurate labels in medical imaging, because such labeling is not only time-consuming but also expertise-intensive. In most cases, the labeled datasets contain some noisy labels, especially for the segmentation task, which generally needs pixel-wise annotation. Therefore, a segmentation model that is robust to such noisy training data is highly desirable.

To overcome this problem, a few approaches have recently been proposed. Mirikharaji et al. [11] proposed a semi-supervised method that optimizes the weights on the images in the noisy dataset by reducing the loss on a small clean dataset for skin lesion segmentation. Inspired by [13], Zhu et al. [20] detected incorrect labels in the segmentation of heart, clavicles, and lung in chest radiographs by decreasing the weight of samples with incorrect labels. Wang et al. [18] combined meta learning with the re-weighting method to adapt to corrupted pixels and re-weight their relative influence on the loss for lung and liver segmentation.

All these methods are built on excluding or simply re-weighting the suspected noisy samples to reduce their negative influence on training. However, simple exclusion or re-weighting cannot make full use of noisy labels and ignores the cause of the label noise, which leaves room for further performance improvement. This motivates us to explore the feasibility of taking full advantage of noisy labels by estimating a pixel transition confidence map and using it for pixel loss correction, so as to make the model noise-robust and improve segmentation performance with corrupted pixels.

In this paper, we propose a novel meta pixel loss correction (MPLC) method to address the problem of medical image segmentation with noisy labels. Specifically, we design a meta guided network that takes the segmentation network prediction as input and generates the pixel transition confidence map. The obtained pixel transition confidence map represents the possibility of transitioning from the latent clean label to the observed noisy label, which improves the robustness of the segmentation network to noisy labels through further pixel loss correction. The contributions of this paper can be summarized as follows: 1) We propose a novel meta pixel loss correction method that produces a noise-robust segmentation model and makes full use of the training data. 2) With the introduction of noise-free meta-data, the whole model can be trained in an alternating manner to automatically estimate a pixel transition confidence map, which is then used for pixel loss correction. 3) We conduct experiments on three medical datasets, LIDC-IDRI, LiTS and BraTS19, for segmentation tasks with noisy labels. The results show that our method achieves state-of-the-art performance in medical image segmentation with noisy labels.

2 Methodology

We propose a novel meta pixel loss correction method (MPLC) that corrects the loss function and produces a noise-robust segmentation network from noisy labels. The detailed architecture and workflow of the proposed framework are shown in Fig. 1. It consists of two components: (1) a segmentation network based on U-Net and (2) a meta guided network that generates the pixel transition confidence map used for pixel loss correction. The components are trained in an end-to-end manner and are described as follows.

Fig. 1. Overview of our workflow in one loop.

2.1 Meta Pixel Loss Correction

Given a set of training samples with noisy labels \(S = \left\{ (X^{i}, \tilde{Y^{i}} ),1 \le i \le N \right\} \), where \(X^{i}\) denotes a training input image and \(\tilde{Y^{i}} \in \left\{ 0,1 \right\} ^{h \times w \times c}\) denotes its observed noisy segmentation annotation, we use U-Net [14] as the backbone DNN for segmentation. It generates a prediction \(P^i = f(X^i,\omega )\), where f denotes the U-Net and \(\omega \) denotes the parameters of U-Net. For a conventional segmentation task, cross entropy is used as the loss function \(Loss = l(P^i, \tilde{Y^{i}})\) to learn the parameters \(\omega \).

However, the training dataset may contain many noisy labels, which contribute to the poor performance of the trained U-Net, because the influence of these errors on the loss function can push the gradient in a wrong direction and cause overfitting issues [16]. Instead of simply excluding the corrupted unreliable pixels [11, 18], we aim to take advantage of these noisy labels.

T Construction. We assume that there is a pixel transition confidence map T that bridges the clean label and the noisy label by specifying the probability of a clean label flipping to a noisy label. T is applied to the segmentation prediction through the transition function to obtain the revised prediction, which resembles the corresponding noisy mask. Thus, the noisy labels are used properly and the original cross entropy loss between the revised prediction and the noisy mask can work as usual, which is approximately equivalent to training on clean labels.

In this paper, we design a learning framework that takes the prediction \(P^i\) as input and adaptively generates the pixel transition confidence map T at every training step:

$$\begin{aligned} T^i = g(P^i,\theta ), \end{aligned}$$
(1)

where \(\theta \) denotes the parameters of that framework. Specifically, for every pixel of T, we have

$$\begin{aligned} T^{i}_{xy} = p(\tilde{Y}^{i}_{xy}=m | Y^{i}_{xy}=n),\forall m,n\in \left\{ 0,1 \right\} , \end{aligned}$$
(2)

where \(T^{i}_{xy}\) represents the confidence of transitioning from the latent clean label \(Y^{i}_{xy}\) to the observed noisy label \(\tilde{Y}^{i}_{xy}\) at pixel (x,y). Corrupted pixels have low pixel confidence but high transition probability. Since we consider binary segmentation, we assume the size of the pixel transition matrix is \(N \times C \times H \times W\), where \(C=2\) corresponds to the foreground and background in our paper. Each value in channel C of the transition matrix represents the confidence that a foreground or background pixel does not flip to the other class.

We can use \(T^{i}_{xy}\) to do pixel loss correction and the loss function of the whole model can be written as:

$$\begin{aligned} Loss = \frac{1}{Nhw}\sum _{i=1}^N\sum _{x=1}^h\sum _{y=1}^wl(\mathcal {H}_{trans}(T^{i}_{xy}, f(X^{i}_{xy},\omega )),\tilde{Y}^{i}_{xy}), \end{aligned}$$
(3)
$$\begin{aligned} \mathcal {H}_{trans}(T^{i}_{xy}, f(X^{i}_{xy},\omega )) = P^{i}_{xy} * T^{i}_{xy}(C=1) + (1-P^{i}_{xy}) * (1-T^{i}_{xy}(C=0)) \end{aligned}$$
(4)

where l is the BCE loss function and \(\mathcal {H}_{trans}\) is the transition function between foreground and background. In our method, the transition function in Eq. 4 states that a predicted foreground pixel stays foreground with confidence \(T^{i}_{xy}(C=1)\), while a predicted background pixel flips into the foreground with probability \(1-T^{i}_{xy}(C=0)\).
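As a worked illustration of Eqs. 3 and 4, the following minimal Python sketch applies the transition function to single pixels and averages the BCE loss over one small patch. The function names (`h_trans`, `bce`, `corrected_loss`), the nested-list layout and the clipping constant `eps` are assumptions made for this example; they are not taken from the original implementation.

```python
import math

def h_trans(p, t_fg, t_bg):
    """Eq. 4 for one pixel. p is the predicted foreground probability;
    t_fg = T(C=1) and t_bg = T(C=0) are the confidences of not flipping."""
    return p * t_fg + (1.0 - p) * (1.0 - t_bg)

def bce(q, y, eps=1e-7):
    """Binary cross entropy for one pixel; eps (an assumption) avoids log(0)."""
    q = min(max(q, eps), 1.0 - eps)
    return -(y * math.log(q) + (1.0 - y) * math.log(1.0 - q))

def corrected_loss(pred, t_fg, t_bg, noisy):
    """Mean BCE between the revised prediction and the noisy mask (one image)."""
    total, count = 0.0, 0
    for x in range(len(pred)):
        for y in range(len(pred[x])):
            revised = h_trans(pred[x][y], t_fg[x][y], t_bg[x][y])
            total += bce(revised, noisy[x][y])
            count += 1
    return total / count

# A background pixel (p = 0.1) whose noisy label is foreground:
# a low background confidence moves the revised prediction toward 1.
revised = h_trans(0.1, t_fg=0.9, t_bg=0.2)  # 0.1 * 0.9 + 0.9 * 0.8 = 0.81
```

For this pixel the uncorrected loss against the noisy label 1 is `bce(0.1, 1)`, roughly 2.30, whereas the corrected loss `bce(0.81, 1)` is roughly 0.21, so the segmentation network is penalized far less for keeping its background prediction.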

Optimization. Given a fixed \(\theta \), the optimized solution to \(\omega \) can be found through minimizing the following objective function:

$$\begin{aligned} \omega ^*(\theta ) = arg\mathop {min}\limits _{\omega } \frac{1}{Nhw}\sum _{i=1}^N\sum _{x=1}^h\sum _{y=1}^wl(\mathcal {H}_{trans}(T^{i}_{xy}, f(X^{i}_{xy},\omega )),\tilde{Y}^{i}_{xy}). \end{aligned}$$
(5)

We then introduce how to learn the parameters \(\theta \) through our meta guided network. Motivated by the success of meta-parameter optimization, our method takes advantage of a small trusted dataset to correct a probably wrong gradient direction and guide the generation of the pixel loss correction map. Specifically, we leverage an additional meta dataset \(\mathcal {S}=\left\{ (\mathcal {X}^{j}, \mathcal {Y}^{j} ),1 \le j \le M \right\} \) which has clean annotations. M is the number of meta-samples and \(M \ll N\). Given a meta input \(\mathcal {X}^{j}\) and optimized parameters \(\omega ^*(\theta )\), the segmentation network produces the prediction map \(\mathcal {P}^j = f(\mathcal {X}^{j},\omega ^*(\theta ))\), and the meta loss for the meta dataset can be written as:

$$\begin{aligned} Loss_{meta} = \frac{1}{Mhw}\sum _{j=1}^M\sum _{x=1}^h\sum _{y=1}^wl(f(\mathcal {X}^{j}_{xy},\omega ^*(\theta )),\mathcal {Y}^{j}_{xy}). \end{aligned}$$
(6)

Combining Eq. (5) and Eq. (6), we obtain a bi-level minimization problem, and the optimal \(\theta ^{*}\) can be acquired by minimizing the following objective function:

$$\begin{aligned} \theta ^{*} = arg\mathop {min}\limits _{\theta } \frac{1}{Mhw}\sum _{j=1}^M\sum _{x=1}^h\sum _{y=1}^wl(f(\mathcal {X}^{j}_{xy},\omega ^*(\theta )),\mathcal {Y}^{j}_{xy}). \end{aligned}$$
(7)

After obtaining \(\theta ^{*}\), we can compute the pixel transition confidence map, which estimates the confidence of transitioning from correct labels to corrupted ones and helps train a noise-robust segmentation model.

Fig. 2. Illustration of the working process of the meta guided network. (The dilation operator is used to generate noise.)

Meta Guided Network. For the meta guided network g in Eq. (1), we explored different architectures, which need to fit the auto-encoder structure of U-Net and also be easy to train on the small meta dataset by meta-learning. In this paper, SENet [5] is used as the backbone; it is a simple, easily trained structure that generates an output of the same size as the U-Net prediction for the transition. Given the prediction \(P^i\), this meta guided network can adaptively recalibrate the latent transition confidence by explicitly modeling interdependencies between channels, which favors finding the transition confidence from correct labels to corrupted ones.

Fig. 2 shows how our meta guided network works to build a noise-robust model. By feeding the prediction (c) to the meta guided network, the corresponding pixel transition confidence map is obtained. Corrupted pixels have low pixel confidence but high transition probability. After the transition function is applied with the confidence map, the prediction is turned into the revised prediction (d), which is very similar to the noisy mask (e). Finally, cross entropy can be used between the revised prediction and the observed noisy mask to train the segmentation model. This enables our method to train a noise-robust segmentation network with noisy labels.

2.2 Optimization Algorithm

The algorithm mainly includes the following steps. Given the training input \((X^{i},\tilde{Y^{i}} )\), we can derive the formula for a one-step update of \(\omega \) with respect to \(\theta \) as

$$\begin{aligned} \hat{\omega }(\theta ) = \omega ^{(t)} - \alpha \frac{1}{Nhw}\sum _{i=1}^N\sum _{x=1}^h\sum _{y=1}^w\nabla _\omega l(\mathcal {H}_{trans}(T^{i(t)}_{xy}, f(X^{i}_{xy},\omega )),\tilde{Y}^{i}_{xy}), \end{aligned}$$
(8)

where \(\alpha \) is the learning rate and \(T^{i(t)}_{xy}\) is computed by feeding the pixel-level prediction into the meta guided network with parameters \(\theta ^{(t)}\).

Then, with the current mini-batch of meta data samples \((\mathcal {X}^{j}, \mathcal {Y}^{j} )\), we can perform a one-step update of \(\theta \):

$$\begin{aligned} \theta ^{(t+1)} = \theta ^{(t)} - \beta \frac{1}{Mhw}\sum _{j=1}^M\sum _{x=1}^h\sum _{y=1}^w\nabla _\theta l(f(\mathcal {X}^{j}_{xy},\hat{\omega }(\theta )),\mathcal {Y}^{j}_{xy}), \end{aligned}$$
(9)

where \(\beta \) is the learning rate and we use autograd to calculate the Jacobian. After obtaining \(\theta ^{(t+1)}\), we can update \(\omega \), that is

$$\begin{aligned} \omega ^{(t+1)} = \omega ^{(t)} - \alpha \frac{1}{Nhw}\sum _{i=1}^N\sum _{x=1}^h\sum _{y=1}^w\nabla _\omega l(\mathcal {H}_{trans}(T^{i(t+1)}_{xy}, f(X^{i}_{xy},\omega )),\tilde{Y}^{i}_{xy}). \end{aligned}$$
(10)

Here the predicted \(T^{i(t+1)}_{xy}\) is computed by the meta guided network with the updated parameters \(\theta ^{(t+1)}\), and \(\omega ^{(t+1)}\) denotes the updated parameters of the segmentation network. The entire algorithm is summarized in Algorithm 1.
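The gradient in Eq. (9) depends on \(\theta \) only through the one-step look-ahead \(\hat{\omega }(\theta )\) of Eq. (8). As a sketch of what autograd evaluates, write \(L_{train}(\omega ;\theta )\) for the averaged training loss in Eq. (8) and \(L_{meta}(\omega )\) for the averaged meta loss in Eq. (9); this shorthand is introduced here for illustration and is not notation from the original. The chain rule then gives:

$$\begin{aligned} \nabla _\theta L_{meta}(\hat{\omega }(\theta )) = \left( \frac{\partial \hat{\omega }(\theta )}{\partial \theta } \right) ^{\top } \nabla _\omega L_{meta}(\omega )\Big |_{\omega =\hat{\omega }(\theta )}, \qquad \frac{\partial \hat{\omega }(\theta )}{\partial \theta } = -\alpha \frac{\partial }{\partial \theta } \nabla _\omega L_{train}(\omega ;\theta )\Big |_{\omega =\omega ^{(t)}}. \end{aligned}$$

The meta update therefore involves a mixed second-order derivative of the training loss, which is why the look-ahead step must be kept inside the computation graph when \(\theta \) is updated.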

Algorithm 1.
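The three updates in Eqs. (8) to (10) can be sketched on a toy problem as follows. This is a minimal illustration under strong assumptions, not the authors' implementation: the segmentation network is replaced by a single logistic unit with scalar parameter `w`, the meta guided network by a single scalar `theta` with \(T(C=0)\) set to `sigmoid(theta)` and \(T(C=1)\) fixed to 1, and autograd by central finite differences. All names, the toy data and the step sizes are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(q, y, eps=1e-7):
    q = min(max(q, eps), 1.0 - eps)
    return -(y * math.log(q) + (1.0 - y) * math.log(1.0 - q))

def predict(x, w):
    # Toy stand-in for the U-Net f(X, omega): one logistic unit per pixel.
    return sigmoid(w * x)

def train_loss(w, theta, noisy_set):
    # Corrected loss on noisy labels (Eq. 5) with T(C=1) = 1, T(C=0) = sigmoid(theta).
    t_bg = sigmoid(theta)
    total = 0.0
    for x, y_noisy in noisy_set:
        p = predict(x, w)
        revised = p + (1.0 - p) * (1.0 - t_bg)  # Eq. 4
        total += bce(revised, y_noisy)
    return total / len(noisy_set)

def meta_loss(w, meta_set):
    # Plain BCE on the clean meta set (Eq. 7).
    return sum(bce(predict(x, w), y) for x, y in meta_set) / len(meta_set)

def num_grad(fn, v, h=1e-4):
    # Central finite difference, used here in place of autograd.
    return (fn(v + h) - fn(v - h)) / (2.0 * h)

def mplc_step(w, theta, noisy_set, meta_set, alpha, beta):
    # Eq. 8: one-step look-ahead of w, kept as a function of theta.
    def w_hat(th):
        return w - alpha * num_grad(lambda v: train_loss(v, th, noisy_set), w)
    # Eq. 9: update theta through the look-ahead, using the clean meta set.
    theta_new = theta - beta * num_grad(lambda th: meta_loss(w_hat(th), meta_set), theta)
    # Eq. 10: update w using the transition confidence given by the new theta.
    w_new = w - alpha * num_grad(lambda v: train_loss(v, theta_new, noisy_set), w)
    return w_new, theta_new

# Toy data: x > 0 is foreground, x < 0 is background; (-1.0, 1) is a flipped label.
noisy_set = [(2.0, 1), (1.5, 1), (1.0, 1), (-1.0, 1), (-1.5, 0), (-2.0, 0)]
meta_set = [(1.0, 1), (-1.0, 0)]
w, theta = 0.0, 0.0
for _ in range(20):
    w, theta = mplc_step(w, theta, noisy_set, meta_set, alpha=0.5, beta=0.5)
```

In a real setting the two finite-difference calls would be replaced by automatic differentiation through the look-ahead step, and `w` and `theta` would be the full parameter sets of the U-Net and the meta guided network.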

3 Experiment Results

3.1 Dataset

We evaluate our method on three medical segmentation datasets: LIDC-IDRI [1], LiTS [4] and BraTS19 [10], which were selected for lesion segmentation. We follow the same preprocessing and experiment settings as [18] on the LIDC-IDRI and LiTS datasets, with 64\(\,\times \,\)64 cropped lesion patches. LIDC-IDRI is a lung CT dataset consisting of 1018 lung CT scans. 3591 patches are adopted, which are split into a training set of 1906 images, a testing set of 1385 images and a meta set of the remaining 300 images. LiTS is a liver and tumor segmentation challenge dataset containing 130 abdominal CT scans. 2214 samples are drawn from this dataset, of which 1471, 300 and 443 images are used for training, meta weight learning, and testing respectively. BraTS19 is a brain tumor challenge dataset. It consists of 385 labeled 3D MRI scans and each MRI scan has four modalities (T1, T1 contrast-enhanced, T2 and FLAIR). 3863 ET lesion patches are adopted, and the training, meta and testing datasets contain 1963, 300 and 1600 samples respectively. Note that because our inputs are cropped lesion patches, the challenge results cannot be cited in our experiments.

3.2 Experiment Setting

Noise Setting: Extensive experiments have been conducted under different types of noise. We artificially corrupted the target lesion mask with two types of label degradation: the dilation morphology operator and ElasticDeform. 1) Dilation morphology operator: the foreground region is expanded by several pixels (randomly drawn from [0, 6]). 2) ElasticDeform [17]: label noise is generated by complicated operations such as rotation, translation, deformation and morphology dilation on ground-truth labels. Specifically, we set a probability r as the noisy label ratio, which represents the proportion of corrupted labels in all data.
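The dilation noise setting can be sketched in plain Python as below. The square structuring element, the integer expansion drawn uniformly from 0 to 6, the per-sample corruption with probability r, and the names `dilate` and `corrupt_labels` are assumptions made for illustration; the original text does not specify these details.

```python
import random

def dilate(mask, radius):
    """Binary dilation of a nested-list mask with a square
    (2 * radius + 1) structuring element (an assumed shape)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for x in range(h):
        for y in range(w):
            if mask[x][y] == 1:
                for dx in range(-radius, radius + 1):
                    for dy in range(-radius, radius + 1):
                        nx, ny = x + dx, y + dy
                        if 0 <= nx < h and 0 <= ny < w:
                            out[nx][ny] = 1
    return out

def corrupt_labels(masks, r, max_pixels=6, seed=0):
    """With probability r, replace a mask by a dilated copy whose expansion
    is drawn uniformly from the integers 0..max_pixels."""
    rng = random.Random(seed)
    noisy = []
    for mask in masks:
        if rng.random() < r:
            noisy.append(dilate(mask, rng.randint(0, max_pixels)))
        else:
            noisy.append([row[:] for row in mask])
    return noisy

# Example: a single foreground pixel grows into a 3 x 3 block for radius 1.
clean = [[0] * 5 for _ in range(5)]
clean[2][2] = 1
grown = dilate(clean, 1)
```

Because dilation only adds foreground pixels, every clean foreground pixel stays foreground in the noisy mask, which matches the one-directional transition (background to foreground) modeled by Eq. 4.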

Implementation Detail: We train our model with SGD using an initial learning rate of \(1e{-}4\), a momentum of 0.9, a weight decay of \(1e{-}3\) and a mini-batch size of 60. We set \(\alpha = 1e{-}4, \beta = 1e{-}3\) in all the experiments. The learning rate is decayed by a factor of 0.1 at the 30th and 60th epochs, for a total of 120 epochs. mIOU, Dice and Hausdorff distance are used to evaluate our method.
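For reference, the two overlap metrics can be computed for a single binary mask as in the sketch below. The function name `dice_and_iou`, the nested-list mask layout and the convention of scoring two empty masks as a perfect match are assumptions for this example; how the scores are averaged into mIOU over classes or samples is not specified in the text and is left out.

```python
def dice_and_iou(pred, target):
    """Foreground Dice and IoU for two binary masks given as nested lists."""
    inter = sum(p * t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    n_pred = sum(map(sum, pred))
    n_true = sum(map(sum, target))
    union = n_pred + n_true - inter
    if union == 0:
        # Both masks are empty: scored as a perfect match (an assumption).
        return 1.0, 1.0
    return 2.0 * inter / (n_pred + n_true), inter / union

# Two predicted foreground pixels, one of which is correct.
dice, iou = dice_and_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]])  # 2/3 and 1/2
```

Dice weights the intersection twice, so it is never lower than IoU for the same pair of masks.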

Table 1. Results of segmentation models on LIDC-IDRI. (r = 0.4)

3.3 Experimental Results

Comparisons with State-of-the-Art Methods. In this section, we set r to \(40\%\) for all experiments, which means that \(40\%\) of the training labels are noisy labels with corrupted pixels. We compare with 9 existing segmentation methods for similar tasks on the LIDC-IDRI dataset: Prob U-Net [9], Phi-Seg [3], UA-MT [19], Curriculum [8], Few-Shot GAN [12], Quality Control [2], U2 Net [6], MWNet [15] and MCPM [18]. Visualization results are shown in Fig. 3.

Table 1 shows the results of all competing methods on the LIDC-IDRI dataset with the aforementioned experiment setting. It can be observed that our method achieves the best performance. Specifically, compared with MCPM and MWNet, which use the re-weighting method, our algorithm achieves a competitive Dice result (87.16) and outperforms the second-best method (MCPM) by \(2.52\%\).

An additional t-test was performed between our method and the second-best method (MCPM). The result shows a P-value \(<0.01\), which indicates a statistically significant difference between our method and MCPM.

Fig. 3. Visualization of segmentation results under r = 0.8 in this section. Green and red contours indicate the ground-truths and segmentation results, respectively. The Dice value is shown at the bottom line, and our method produces much better results than other methods on every dataset. (Color figure online)

Robustness to Various r-s. We explore the robustness of our MPLC under various noisy label ratios \(r \in \{0.2,0.4,0.6,0.8\}\). The method is evaluated on the LIDC-IDRI, LiTS and BraTS19 datasets under the dilation operation. Table 2 shows the results compared with baseline approaches. Our method consistently outperforms the other methods across all noise ratios on all datasets, which shows the effectiveness of our meta pixel loss correction strategy.

Table 2. Results (mIOU) of segmentation methods using various r-s. (Noise = Dilation)

3.4 Limitation

Our approach is based on the instance-independent assumption that \(P(\tilde{y} | y)=P(\tilde{y} | x,y)\). It is therefore better suited to modeling a single noise distribution, and it fails on real-world stochastic noise such as a complicated setting that mixes multiple noise types (erosion, dilation, deformity, false negatives, false positives). To extend it to the instance-dependent case, future work should model the relationship among the clean label, the noisy label and the instance for \(P(\tilde{y} | x,y)\).

4 Conclusion

We present a novel Meta Pixel Loss Correction method to alleviate the negative effect of noisy labels in medical image segmentation. Given a small number of high-quality labeled images, the derived learning regime enables our meta guided network to make full use of noisy labels and estimate the pixel transition confidence map, which is then used for pixel loss correction to train a noise-robust segmentation network. We extensively evaluated our method on three datasets: LIDC-IDRI, LiTS and BraTS19. The results show that the proposed method can outperform state-of-the-art methods in medical image segmentation with noisy labels.