1 Introduction

Segmentation, which delineates anatomical structures, is a fundamental task in medical image analysis. Supervised learning has driven a series of advances in medical image segmentation [1]. However, the availability of fully and densely labeled data is a common bottleneck in supervised learning, especially in medical image segmentation, since annotating pixel-wise labels is tedious, time-consuming, and requires expert knowledge. Training a model with limited supervision from imperfectly labeled datasets is therefore essential.

Existing works have sought to exploit unlabeled data and weakly labeled data to train segmentation models [2] through semi-supervised learning (SSL) [3,4,5] and weakly-supervised learning [6,7,8]. Semi-supervised segmentation [9,10,11] is an effective paradigm for learning a model from scarce annotations by exploiting both labeled and unlabeled data. Weakly-supervised segmentation aims to reduce the need for densely labeled data by using sparse annotations, e.g., points and scribbles, as supervision signals [2]. In this study, in addition to semi-supervised segmentation, we focus on scribble-supervised segmentation, one of the most active topics in the family of weakly-supervised learning. A conceptual comparison of fully-supervised, semi-supervised, and scribble-supervised segmentation is shown in Fig. 1.

Fig. 1.

Conceptual comparison of fully-supervised (using fully and densely labeled data), semi-supervised (using a part of densely labeled data and unlabeled data), and scribble-supervised (using data with scribble annotations) segmentation.

Consistency regularization enforces agreement among predictions under different kinds of perturbations, e.g., input augmentation [3, 9], network diversity [11, 12], and feature perturbation [13]. Recent works [7, 8, 14,15,16,17,18,19,20] involving consistency regularization show strong performance in tackling limited supervision. Despite their success in learning from imperfect supervision, existing studies are task-specific for semi- or scribble-supervised segmentation. Driven by this limitation, a natural question arises: does a framework generic to semi- and scribble-supervised segmentation exist? Although the two tasks leverage different kinds of imperfect labels, they share the same intrinsic goal: extracting as much informative signal as possible from pixels without ground truth. Thus, such a framework should exist as long as it can learn strong representations from the unlabeled pixels.

Consistency regularization under a more rigorous perturbation empirically leads to better generalization [11]. However, lacking sufficient supervision, models may output inaccurate predictions and then learn from them under consistency enforcement. This vicious cycle would accumulate prediction mistakes and eventually degrade performance. The key to turning the vicious cycle into a virtuous one is therefore to improve the quality of model outputs when adopting a more challenging consistency regularization. From these perspectives, we hypothesize that an eligible framework should have two characteristics: (i) it should output more accurate predictions, and (ii) it should be trained with consistency regularization under a more challenging perturbation.

Based on the above hypothesis, we present a general and effective framework that, for the first time, serves both semi- and scribble-supervised segmentation. The method is simple: it jointly trains triple models and adopts a mix augmentation scheme, hence the name TriMix. To meet requirement (i), TriMix maintains triple networks with identical structures but different initializations to introduce model perturbation, and imposes consistency to minimize disagreement among the models, inspired by the original tri-training strategy [21]. Intuitively, more diverse models can extract more information from the dataset; each model receives valuable information from the other two through inter-model communication and thus generates more accurate predictions. To meet requirement (ii), the model diversity is further blended with the data perturbation introduced by the mix augmentation scheme, forming a more challenging hybrid perturbation. We hypothesize that the tri-training scheme within TriMix complements consistency regularization under this hybrid perturbation well. This self-complementary design enables TriMix to serve as a general learner under limited supervision with different kinds of imperfect labels. Our contributions are:

  • We propose a simple and effective method called TriMix and show, for the first time, that it offers a generic solution for both semi- and scribble-supervised segmentation.

  • We show that purely imposing consistency under a more challenging perturbation, i.e., combining data augmentation and model diversity, on the tri-training framework can serve as a general mechanism for learning with limited supervision.

  • We first validate TriMix on the semi-supervised task. TriMix delivers competitive performance against state-of-the-art (SOTA) methods and shows surprisingly strong potential under the one-shot setting, which is rarely attempted by existing semi-supervised segmentation methods.

  • We then evaluate TriMix on the scribble-supervised task. TriMix surpasses the mainstream methods by a large margin and sets new SOTA performance on the public benchmarks.

2 Related Work

Semi-supervised learning (SSL) trains a model using both labeled and unlabeled data. Existing SSL methods are generally based on pseudo-labeling (also called self-training) [5, 25,26,27] and consistency regularization [3, 4, 28, 29]. Pseudo-labeling takes the model’s class prediction as a label to train against, but performance depends heavily on label quality. Consistency regularization assumes predictions should be invariant under perturbations, such as input augmentation [3, 9], network diversity [11, 12], and feature perturbation [13]. Consistency regularization usually performs better than self-training and has been widely adopted for semi-supervised segmentation [14,15,16,17,18, 30, 31]. A more challenging perturbation empirically benefits model generalization, provided the model can sustainably generate accurate predictions [11]. In this work, we introduce a hybrid perturbation stronger than either of its components, i.e., data augmentation and model diversity.

Weakly-supervised segmentation learns a model from a dataset with weak annotations, e.g., bounding boxes, scribbles, sparse dots, and polygons [2]. In this work, we adopt scribbles as weak annotations; owing to their convenient format, scribbles are widely used in the computer vision community, from classical methods [32, 33] to current scribble-supervised methods [6,7,8, 19, 34,35,36,37]. To learn from scribble supervision, some methods [34,35,36] construct complete labels from the scribbles for training. Other works [37, 38] explore losses to regularize training from scribble annotations, and the scheme of [6] adds extra modules to improve segmentation accuracy. Recently, consistency regularization has been explored in several works [7, 8, 20, 39].

Data augmentation generates virtual training examples to improve model generalization. Auto-augmentation methods [40,41,42,43] automatically search for optimal augmentation policies and achieve higher accuracy than hand-crafted schemes, but at a relatively high search cost. In this study, we focus on mix augmentation [44,45,46,47,48,49,50], a type of strong data augmentation that is more efficient than auto-augmentation. Mix augmentation combines two inputs and their corresponding labels to create virtual samples for training. It has been widely applied in semi-supervised segmentation [9,10,11] as an effective way to introduce data perturbation and synthesize new samples during training. In [7], mix augmentation was first introduced to increase supervision for scribble-supervised segmentation.

Co-training and tri-training are two SSL approaches of a similar flavor: both maintain multiple models and regularize the disagreement among their outputs. The co-training framework [51, 52] assumes sufficient and different views of the training data, each of which can independently train a model. Maintaining view diversity is, in some sense, similar to data perturbation in SSL. Co-training has been extended to semi-supervised segmentation [18, 53]. Unlike co-training, tri-training [21] does not require view differences; instead, it introduces model diversity and minimizes the disagreement among the outputs. This strategy is similar to imposing consistency under model perturbation in SSL. There are several variants of tri-training [54,55,56,57], but none target semi- or scribble-supervised segmentation. In this work, we revisit tri-training and explore its potential as a general solution for limited supervision when combined with mix augmentation.

Fig. 2.

Overview of TriMix. TriMix maintains triple networks \(f_{1}\), \(f_{2}\), and \(f_{3}\), which have the same architecture but different weights. Three steps are taken when given a mini-batch containing images \({\textbf {x}}\) and ground truth \({\textbf {y}}\) at each training iteration. Step 1: first forward pass. For \(i\in \left\{ 1,2,3\right\} \), each network \(f_{i}\) outputs \({\textbf {p}}_{i}\) for \({\textbf {x}}\), under the supervision of \({\textbf {y}}\). Step 2: mix augmentation. Three batches \(\left\{ {\textbf {x}},{\textbf {y}},{\textbf {p}}_{1}\right\} \), \(\left\{ {\textbf {x}},{\textbf {y}},{\textbf {p}}_{2}\right\} \), and \(\left\{ {\textbf {x}},{\textbf {y}},{\textbf {p}}_{3}\right\} \) are randomly shuffled to obtain new batches \(\left\{ \tilde{{\textbf{x}}}_{1},\tilde{{\textbf{y}}}_{1},\tilde{{\textbf{p}}}_{1}\right\} \), \(\left\{ \tilde{{\textbf{x}}}_{2},\tilde{{\textbf{y}}}_{2},\tilde{{\textbf{p}}}_{2}\right\} \), and \(\left\{ \tilde{{\textbf{x}}}_{3},\tilde{{\textbf{y}}}_{3},\tilde{{\textbf{p}}}_{3}\right\} \). Each pair of these new batches is then mixed up to form batches \(\left\{ \bar{{\textbf{x}}}_{1},\bar{{\textbf{x}}}_{2},\bar{{\textbf{x}}}_{3}\right\} \), \(\left\{ \bar{{\textbf{y}}}_{1},\bar{{\textbf{y}}}_{2},\bar{{\textbf{y}}}_{3}\right\} \), and \(\left\{ \hat{{\textbf{y}}}_{1},\hat{{\textbf{y}}}_{2},\hat{{\textbf{y}}}_{3}\right\} \). Squares with mixed colors indicate mixed samples. Step 3: second forward pass. For \(i\in \left\{ 1,2,3\right\} \), each network \(f_{i}\) outputs \(\bar{{\textbf{p}}}_{i}\) for \(\bar{{\textbf{x}}}_{i}\), under the supervision of \(\bar{{\textbf{y}}}_{i}\). An unsupervised loss is calculated between \(\bar{{\textbf{p}}}_{i}\) and \(\hat{{\textbf{y}}}_{i}\). Note that \(\hat{{\textbf{y}}}_{i}\) can be soft (probability maps) or hard pseudo-labels (one-hot maps). (Color figure online)

3 Method

3.1 Overview

This paper proposes a simple and general framework, TriMix, to tackle semi- and scribble-supervised segmentation. The plain architecture of TriMix is illustrated in Fig. 2. TriMix adheres to the spirit of tri-training, simultaneously learning triple networks \(f_{1}\), \(f_{2}\), and \(f_{3}\), which have identical structures but different weights \(\textbf{w}_{1}\), \(\textbf{w}_{2}\), and \(\textbf{w}_{3}\), to introduce network perturbation. In addition, mix augmentation is adopted to introduce input data perturbation. Assume a mini-batch \(\textbf{b} = \left\{ \textbf{x},\textbf{y}\right\} \) is fetched at each training iteration, where \(\textbf{x}\) and \(\textbf{y}\) are the images and the corresponding ground truth. TriMix processes each batch in three steps.

Step 1: first forward pass. For \(i\in \left\{ 1,2,3\right\} \), each network \(f_{i}\) is fed with images \(\textbf{x}\) and outputs the prediction \({\textbf {p}}_{i}\). A supervised loss \(L_{sup} \left( {\textbf {p}}_{i}, \textbf{y}\right) \) is then imposed between \({\textbf {p}}_{i}\) and the ground truth \(\textbf{y}\).

Step 2: mix augmentation. After Step 1, we obtain three batches \({\textbf {b}}_{1} = \left\{ {\textbf {x}},{\textbf {y}},{\textbf {p}}_{1}\right\} \), \({\textbf {b}}_{2} =\left\{ {\textbf {x}},{\textbf {y}},{\textbf {p}}_{2}\right\} \), and \({\textbf {b}}_{3} =\left\{ {\textbf {x}},{\textbf {y}},{\textbf {p}}_{3}\right\} \). The goal is to mix up the pairs \(\left( {\textbf {b}}_{2}, {\textbf {b}}_{3}\right) \), \(\left( {\textbf {b}}_{1}, {\textbf {b}}_{3}\right) \), and \(\left( {\textbf {b}}_{1}, {\textbf {b}}_{2}\right) \) to generate new batches. Similar to the mixing operation described in the original papers [44, 46], we first randomly shuffle \({\textbf {b}}_{1}\), \({\textbf {b}}_{2}\), and \({\textbf {b}}_{3}\) to generate three new batches \(\tilde{{\textbf{b}}}_{1} = \left\{ \tilde{{\textbf{x}}}_{1},\tilde{{\textbf{y}}}_{1},\tilde{{\textbf{p}}}_{1}\right\} \), \(\tilde{{\textbf{b}}}_{2} = \left\{ \tilde{{\textbf{x}}}_{2},\tilde{{\textbf{y}}}_{2},\tilde{{\textbf{p}}}_{2}\right\} \), and \(\tilde{{\textbf{b}}}_{3} = \left\{ \tilde{{\textbf{x}}}_{3},\tilde{{\textbf{y}}}_{3},\tilde{{\textbf{p}}}_{3}\right\} \), in which \(\tilde{{\textbf{x}}}_{1}\), \(\tilde{{\textbf{x}}}_{2}\), and \(\tilde{{\textbf{x}}}_{3}\) have different image orders, and each \(\tilde{{\textbf{y}}}_{i}\) and \(\tilde{{\textbf{p}}}_{i}\) correspond to \(\tilde{{\textbf{x}}}_{i}\) for \(i\in \left\{ 1,2,3\right\} \). Afterward, we apply the mix augmentation to the pairs \(\left( \tilde{{\textbf{b}}}_{2}, \tilde{{\textbf{b}}}_{3}\right) \), \(\left( \tilde{{\textbf{b}}}_{1}, \tilde{{\textbf{b}}}_{3}\right) \), and \(\left( \tilde{{\textbf{b}}}_{1}, \tilde{{\textbf{b}}}_{2}\right) \) to generate new batches \(\bar{{\textbf{b}}}_{1} = \left\{ \bar{{\textbf{x}}}_{1},\bar{{\textbf{y}}}_{1},\hat{{\textbf{y}}}_{1}\right\} \), \(\bar{{\textbf{b}}}_{2} = \left\{ \bar{{\textbf{x}}}_{2},\bar{{\textbf{y}}}_{2},\hat{{\textbf{y}}}_{2}\right\} \), and \(\bar{{\textbf{b}}}_{3} = \left\{ \bar{{\textbf{x}}}_{3},\bar{{\textbf{y}}}_{3},\hat{{\textbf{y}}}_{3}\right\} \) with mixed samples. Take the pair \(\left( \tilde{{\textbf{b}}}_{2}, \tilde{{\textbf{b}}}_{3}\right) \) as an example: each image in \(\tilde{{\textbf{x}}}_{2}\) is mixed with the image at the same index in \(\tilde{{\textbf{x}}}_{3}\) to yield \(\bar{{\textbf{x}}}_{1}\); then \(\tilde{{\textbf{y}}}_{2}\) and \(\tilde{{\textbf{y}}}_{3}\), as well as \(\tilde{{\textbf{p}}}_{2}\) and \(\tilde{{\textbf{p}}}_{3}\), are mixed in the same proportions to obtain \(\bar{{\textbf{y}}}_{1}\) and \(\hat{{\textbf{y}}}_{1}\). Squares with mixed colors in Fig. 2 indicate mixed samples. A minimal sketch of this step is given below.
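To make Step 2 concrete, here is a minimal PyTorch-style sketch of the shuffle-and-mix operation for 2D batches shaped (B, C, H, W), using CutMix as the mix strategy (our default; see the settings below); all function and variable names are illustrative, not taken from a released implementation.

```python
import torch

def rand_box(h, w, ratio=0.2):
    """Sample one rectangular region covering `ratio` of the image area."""
    ch, cw = int(h * ratio ** 0.5), int(w * ratio ** 0.5)
    cy = torch.randint(0, h - ch + 1, (1,)).item()
    cx = torch.randint(0, w - cw + 1, (1,)).item()
    return cy, cx, ch, cw

def mix_pair(batch_a, batch_b, ratio=0.2):
    """CutMix each tensor in batch_a with its counterpart in batch_b,
    sharing one box so images, labels, and predictions stay aligned."""
    h, w = batch_a[0].shape[-2:]
    cy, cx, ch, cw = rand_box(h, w, ratio)
    out = []
    for ta, tb in zip(batch_a, batch_b):
        t = ta.clone()
        t[..., cy:cy + ch, cx:cx + cw] = tb[..., cy:cy + ch, cx:cx + cw]
        out.append(t)
    return out  # [x_bar, y_bar, y_hat] for a {x, y, p} triple

def trimix_step2(x, y, p1, p2, p3, ratio=0.2):
    """Shuffle {x, y, p_i} independently, then mix complementary pairs."""
    b = x.size(0)
    tilde = []
    for p in (p1, p2, p3):
        perm = torch.randperm(b)  # a different image order per batch
        tilde.append((x[perm], y[perm], p[perm]))
    bar_b1 = mix_pair(tilde[1], tilde[2], ratio)  # (b2, b3) -> bar_b1
    bar_b2 = mix_pair(tilde[0], tilde[2], ratio)  # (b1, b3) -> bar_b2
    bar_b3 = mix_pair(tilde[0], tilde[1], ratio)  # (b1, b2) -> bar_b3
    return bar_b1, bar_b2, bar_b3
```

Under the default pseudo supervision consistency described below, each \({\textbf {p}}_{i}\) passed to this routine would be the hard pseudo-label map obtained from the first forward pass.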

Step 3: second forward pass. For \(i\in \left\{ 1,2,3\right\} \), we feed each network \(f_{i}\) with the mixed images \(\bar{{\textbf{x}}}_{i}\) to obtain the individual prediction \(\bar{{\textbf{p}}}_{i}\). Each \(\bar{{\textbf{p}}}_{i}\) is optimized to be close to the mixed ground truth \(\bar{{\textbf{y}}}_{i}\) via a supervised loss \(L_{sup} \left( \bar{{\textbf{p}}}_{i}, \bar{{\textbf{y}}}_{i}\right) \). Besides, consistency is enforced between \(\bar{{\textbf{p}}}_{i}\) and the mixed pseudo-labels \(\hat{{\textbf{y}}}_{i}\) via an unsupervised loss \(L_{unsup} \left( \bar{{\textbf{p}}}_{i}, \hat{{\textbf{y}}}_{i}\right) \). Note that \(\hat{{\textbf{y}}}_{i}\) can be soft (probability maps) or hard pseudo-labels (one-hot maps). A typical choice, adopted by most methods [4, 14, 17], is the soft pseudo-label, where an unsupervised loss \(L_{unsup}^{p}\) measures probability consistency with the mean squared error (MSE). By contrast, several works, e.g., [8, 10], utilize the hard pseudo-label, where an unsupervised loss \(L_{unsup}^{s}\) measures pseudo supervision consistency. The two choices are contrasted in the sketch below.
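A hedged sketch of the two unsupervised-loss options, with plain MSE and CE standing in for the concrete instantiations (this paper uses Dice as \(L_{unsup}\) in Sect. 3.2 and CE in Sect. 3.3); the function names are ours.

```python
import torch
import torch.nn.functional as F

def unsup_prob_consistency(p_bar_logits, y_hat_soft):
    """L_unsup^p: match probabilities against the mixed soft
    pseudo-labels y_hat from Step 2 via MSE."""
    return F.mse_loss(torch.softmax(p_bar_logits, dim=1), y_hat_soft.detach())

def unsup_pseudo_supervision(p_bar_logits, y_hat_soft):
    """L_unsup^s: supervise with hard (argmax) pseudo-labels via CE."""
    hard = y_hat_soft.detach().argmax(dim=1)  # (B, H, W) class indices
    return F.cross_entropy(p_bar_logits, hard)
```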

To conclude, the total optimization objective of each network is

$$\begin{aligned} L_{i} = L_{sup} \left( {\textbf {p}}_{i}, \textbf{y}\right) + \lambda _{1} L_{sup} \left( \bar{{\textbf{p}}}_{i}, \bar{{\textbf{y}}}_{i}\right) + \lambda _{2} L_{unsup} \left( \bar{{\textbf{p}}}_{i}, \hat{{\textbf{y}}}_{i}\right) , \end{aligned}$$
(1)

where \(i\in \left\{ 1,2,3\right\} \) indexes the items corresponding to network \(f_{i}\), and \(\lambda _{1}\) and \(\lambda _{2}\) are hyperparameters balancing the terms.

Default Settings. In this study, we adopt pseudo supervision consistency; as Sect. 4.4 will show, TriMix achieves better accuracy with pseudo supervision consistency than with probability consistency. Besides, we utilize CutMix [46] as the mix strategy, similar to [9,10,11], though other kinds of mix augmentation should also fit our framework.

Inference Process. TriMix contains triple networks with different weights; for a test sample, each network outputs a prediction individually. We report both the average result over the three networks and their ensemble result obtained by soft voting.
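A small sketch of this inference procedure, assuming three trained PyTorch networks that output logits of shape (B, K, H, W):

```python
import torch

@torch.no_grad()
def trimix_infer(nets, x):
    """Return the three individual segmentations and the soft-voting ensemble."""
    probs = [torch.softmax(f(x), dim=1) for f in nets]
    individual = [p.argmax(dim=1) for p in probs]            # per-model results
    ensemble = torch.stack(probs).mean(dim=0).argmax(dim=1)  # averaged probabilities
    return individual, ensemble
```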

The following two sections show how TriMix applies to the semi- and scribble-supervised tasks, following the standard process from Step 1 to Step 3.

3.2 TriMix in Semi-Supervised Segmentation

Semi-supervised segmentation aims to learn a model by exploiting two given datasets: a labeled dataset \(\textbf{D}_{l} = \left\{ \textbf{X}_{l}, \textbf{Y}_{l}\right\} \) and an unlabeled dataset \(\textbf{D}_{u} = \left\{ \textbf{X}_{u}\right\} \), where \(\textbf{X}\) and \(\textbf{Y}\) denote images and the corresponding ground truth.

Assume a mini-batch of labeled data \(\textbf{b}_{l} = \left\{ \textbf{x}_{l},\textbf{y}_{l}\right\} \in {\textbf{D}_{l}}\) and a mini-batch of unlabeled data \(\textbf{b}_{u} = \left\{ \textbf{x}_{u}\right\} \in {\textbf{D}_{u}}\) are sampled at each training iteration. We illustrate the training details for \(\textbf{b}_{l}\) and \(\textbf{b}_{u}\) in the following.

First, the mini-batch \(\textbf{b}_{l}\) contains images and the corresponding ground truth, so TriMix could be optimized with \(\textbf{b}_{l}\) following the standard process illustrated in Fig. 2. However, existing SSL methods, e.g., [10, 11], rarely introduce perturbations to the labeled data, even though doing so can benefit performance. Following previous methods, we optimize TriMix with only Step 1 and skip Step 2 and Step 3 when using \(\textbf{b}_{l}\). Thus, for \(i\in \left\{ 1,2,3\right\} \), assuming each network \(f_{i}\) outputs prediction \(\textbf{p}_{l_{i}}\) for images \(\textbf{x}_{l}\), only a supervised loss \(L_{sup} \left( \textbf{p}_{l_{i}}, \textbf{y}_{l}\right) \) is calculated between \(\textbf{p}_{l_{i}}\) and the ground truth \(\textbf{y}_{l}\).

Second, the mini-batch \(\textbf{b}_{u}\) contains images \(\textbf{x}_{u}\) but no labels. TriMix can still be optimized with \(\textbf{b}_{u}\) following the standard process illustrated in Fig. 2, but without the supervised terms. Specifically, for \(i\in \left\{ 1,2,3\right\} \), each network \(f_{i}\) outputs an individual prediction \(\textbf{p}_{u_{i}}\) for \(\textbf{x}_{u}\) in the first forward pass at Step 1; no supervised term applies here due to the lack of ground truth. At Step 2, the three batches \({\textbf {b}}_{u_{1}} = \left\{ {\textbf {x}}_{u},{\textbf {p}}_{u_{1}}\right\} \), \({\textbf {b}}_{u_{2}} =\left\{ {\textbf {x}}_{u},{\textbf {p}}_{u_{2}}\right\} \), and \({\textbf {b}}_{u_{3}} =\left\{ {\textbf {x}}_{u},{\textbf {p}}_{u_{3}}\right\} \), which contain no ground truth, are mixed up to generate augmented batches \(\bar{{\textbf{b}}}_{u_{1}} = \left\{ \bar{{\textbf{x}}}_{u_{1}},\hat{{\textbf{y}}}_{u_{1}}\right\} \), \(\bar{{\textbf{b}}}_{u_{2}} = \left\{ \bar{{\textbf{x}}}_{u_{2}},\hat{{\textbf{y}}}_{u_{2}}\right\} \), and \(\bar{{\textbf{b}}}_{u_{3}} = \left\{ \bar{{\textbf{x}}}_{u_{3}},\hat{{\textbf{y}}}_{u_{3}}\right\} \), which carry no mixed ground truth. At Step 3, each network \(f_{i}\), fed with mixed images \(\bar{{\textbf{x}}}_{u_{i}}\), is expected to output a prediction \(\bar{\textbf{p}}_{u_{i}}\) close to \(\hat{{\textbf{y}}}_{u_{i}}\), enforced by an unsupervised loss \(L_{unsup} \left( \bar{\textbf{p}}_{u_{i}}, \hat{{\textbf{y}}}_{u_{i}}\right) \).

To conclude, the total training objective of each network in this task is

$$\begin{aligned} L_i = L_{sup} \left( \textbf{p}_{l_{i}}, \textbf{y}_{l}\right) + \lambda L_{unsup} \left( \bar{\textbf{p}}_{u_{i}}, \hat{{\textbf{y}}}_{u_{i}}\right) , \end{aligned}$$
(2)

where items with \(i\in \left\{ 1,2,3\right\} \) correspond to network \(f_{i}\), and \(\lambda \) is a trade-off hyperparameter. Moreover, we use the Dice loss [58] \(L_{dice}\) as both the supervised and unsupervised loss. Thus, Eq. (2) is re-written as

$$\begin{aligned} L_i = \underbrace{ L_{dice} \left( \textbf{p}_{l_{i}}, \textbf{y}_{l}\right) }_\mathrm{{sup}} + \underbrace{\lambda L_{dice} \left( \bar{\textbf{p}}_{u_{i}}, \hat{{\textbf{y}}}_{u_{i}}\right) }_\mathrm{{unsup}}. \end{aligned}$$
(3)
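For reference, here is a minimal sketch of a soft Dice loss usable for both terms of Eq. (3); the exact formulation of [58] may differ in details (e.g., squared denominators), and the pseudo-label target \(\hat{{\textbf{y}}}_{u_{i}}\) should be detached from the graph before the call.

```python
import torch

def dice_loss(logits, target, eps=1e-5):
    """Soft Dice loss. logits: (B, K, ...); target: (B, K, ...) one-hot
    or (detached) pseudo-label maps."""
    probs = torch.softmax(logits, dim=1)
    dims = tuple(range(2, probs.ndim))  # sum over spatial dimensions
    inter = (probs * target).sum(dims)
    union = probs.sum(dims) + target.sum(dims)
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()
```

With this helper, the unsupervised term of Eq. (3) amounts to \(\lambda \) times dice_loss applied to \(\bar{\textbf{p}}_{u_{i}}\) and \(\hat{{\textbf{y}}}_{u_{i}}\).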

3.3 TriMix in Scribble-Supervised Segmentation

Scribble-supervised segmentation trains a model from a given dataset \(\textbf{D}_{s} = \left\{ \textbf{X}_{s}, \textbf{Y}_{s}\right\} \), where \(\textbf{X}_{s}\) and \(\textbf{Y}_{s}\) are images and the related scribble annotations.

Let \(\textbf{b}_{s} = \left\{ \textbf{x}_{s},\textbf{y}_{s}\right\} \in {\textbf{D}_{s}}\) denote a mini-batch fetched at each training iteration. Since \(\textbf{b}_{s}\) contains images and the corresponding ground truth in the form of scribbles, we follow the standard process illustrated in Fig. 2 to train TriMix with \(\textbf{b}_{s}\). Suppose, for \(i\in \left\{ 1,2,3\right\} \), each network \(f_{i}\) outputs its prediction \(\textbf{p}_{s_{i}}\) for \(\textbf{x}_{s}\) at Step 1; we obtain mixed batches \(\bar{{\textbf{b}}}_{s_{1}} = \left\{ \bar{{\textbf{x}}}_{s_{1}},\bar{{\textbf{y}}}_{s_{1}},\hat{{\textbf{y}}}_{s_{1}}\right\} \), \(\bar{{\textbf{b}}}_{s_{2}} = \left\{ \bar{{\textbf{x}}}_{s_{2}},\bar{{\textbf{y}}}_{s_{2}},\hat{{\textbf{y}}}_{s_{2}}\right\} \), and \(\bar{{\textbf{b}}}_{s_{3}} = \left\{ \bar{{\textbf{x}}}_{s_{3}},\bar{{\textbf{y}}}_{s_{3}},\hat{{\textbf{y}}}_{s_{3}}\right\} \) at Step 2; and each network outputs \(\bar{{\textbf{p}}}_{s_{i}}\) for \(\bar{{\textbf{x}}}_{s_{i}}\) at Step 3. Then, identical to Eq. (1), the training objective of each network \(f_{i}\) in scribble-supervised segmentation is

$$\begin{aligned} L_i = L_{sup} \left( \textbf{p}_{s_{i}}, \textbf{y}_{s}\right) + \lambda _{1} L_{sup} \left( \bar{{\textbf{p}}}_{s_{i}}, \bar{{\textbf{y}}}_{s_{i}}\right) + \lambda _{2} L_{unsup} \left( \bar{{\textbf{p}}}_{s_{i}}, \hat{{\textbf{y}}}_{s_{i}}\right) , \end{aligned}$$
(4)

where \(\lambda _{1}\) and \(\lambda _{2}\) are hyperparameters balancing each term.

Besides, since \(\textbf{y}_{s}\) and \(\bar{{\textbf{y}}}_{s_{i}}\) are scribble annotations, we apply the partial cross-entropy (pCE) loss [38] \(L_{pce}\), which is computed only over annotated pixels, as the supervised loss, following [7, 8, 38]. Formally, let \(\textbf{m}\) and \(\textbf{n}\) be the prediction and the scribble annotation; \(L_{pce} \left( \textbf{m}, \textbf{n}\right) \) is defined as

$$\begin{aligned} L_{pce} \left( \textbf{m}, \textbf{n}\right) = -\sum _{j\in J}\sum _{k=1}^{K} \textbf{n}^{jk} \log \textbf{m}^{jk}, \end{aligned}$$
(5)

in which \(J\) is the set of pixels with scribble annotations and \(K\) is the number of classes. \(\textbf{m}^{jk}\) denotes the predicted value of the \(k\)-th channel for the \(j\)-th pixel in \(\textbf{m}\), and \(\textbf{n}^{jk}\) is the corresponding ground truth of the \(k\)-th channel for the \(j\)-th annotated pixel in \(\textbf{n}\).
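Since Eq. (5) is a cross-entropy restricted to annotated pixels, it can be sketched compactly in PyTorch; encoding unannotated pixels with an ignore index is a choice of this sketch, not prescribed by the paper.

```python
import torch.nn.functional as F

def pce_loss(logits, scribble, ignore_index=255):
    """Partial cross-entropy. logits: (B, K, H, W); scribble: (B, H, W)
    class indices with unannotated pixels set to ignore_index.
    cross_entropy with ignore_index averages over annotated pixels only."""
    return F.cross_entropy(logits, scribble, ignore_index=ignore_index)
```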

Lastly, we use the cross-entropy (CE) loss \(L_{ce}\) as the unsupervised loss. Thus, Eq. (4) is re-written as

$$\begin{aligned} L_{i} = \underbrace{L_{pce}^{unmix} \left( \textbf{p}_{s_{i}}, \textbf{y}_{s}\right) + \lambda _{1} L_{pce}^{mix} \left( \bar{{\textbf{p}}}_{s_{i}}, \bar{{\textbf{y}}}_{s_{i}}\right) }_\mathrm{{sup}} + \underbrace{\lambda _{2} L_{ce}^{mix} \left( \bar{{\textbf{p}}}_{s_{i}}, \hat{{\textbf{y}}}_{s_{i}}\right) }_\mathrm{{unsup}}, \end{aligned}$$
(6)

where the superscript unmix denotes that the labels used are the original ones without mix augmentation, and the superscript mix denotes that the labels and pseudo-labels are generated by the mix augmentation.
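Reusing pce_loss from the sketch above, the per-network objective of Eq. (6) could be assembled as follows; a plain cross-entropy serves as the unsupervised term since \(\hat{{\textbf{y}}}_{s_{i}}\) is a hard pseudo-label map. This is a sketch under the same encoding assumptions, not the authors' released code.

```python
import torch.nn.functional as F

def scribble_objective(p, y_s, p_bar, y_bar, y_hat, lam1=1.0, lam2=1.0):
    """Eq. (6) for one network. p, p_bar: logits for original and mixed
    images; y_s, y_bar: scribble maps (ignore index marks unannotated
    pixels); y_hat: mixed hard pseudo-labels of shape (B, H, W)."""
    sup_unmix = pce_loss(p, y_s)               # L_pce^unmix
    sup_mix = pce_loss(p_bar, y_bar)           # L_pce^mix
    unsup_mix = F.cross_entropy(p_bar, y_hat)  # L_ce^mix (pseudo supervision)
    return sup_unmix + lam1 * sup_mix + lam2 * unsup_mix
```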

4 Experiments on Semi-Supervised Segmentation

4.1 Data and Evaluation Metric

ACDC Dataset [59] consists of 200 MRI volumes from 100 patients; each volume is manually delineated with ground truth for the left ventricle (LV), the right ventricle (RV), and the myocardium (Myo). The original volume sizes are \( (154-428)\times (154-512)\times (6-18)\) pixels. We resized all volumes to \(256\times 256\times 16\) pixels and normalized the intensities to zero mean and unit variance. We performed 4-fold cross-validation and validated our method under the 16/150 partition protocol: in each fold, we sampled 16 of the 150 training volumes as labeled data and treated the remaining ones as unlabeled data.

Hippocampus Dataset, collected by the Medical Segmentation Decathlon, comprises 390 MRI volumes of the hippocampus. We utilized its training set (260 volumes), which contains the corresponding ground truth for the anterior and posterior regions of the hippocampus, for validation. Volume sizes are \( (31-43)\times (40-59)\times (24-47)\) pixels, and we resized all volumes to \(32\times 48\times 32\) pixels. With this dataset, we tackled a tougher problem in which only one labeled sample is available for training, i.e., the one-shot setting. We conducted 4-fold cross-validation, sampling 1 of the 195 training volumes in each fold as labeled data and treating the rest as unlabeled data.

Evaluation Metric. The Dice score and the 95% Hausdorff Distance (95HD) were used to measure volume overlap and surface distance, respectively.

Table 1. Comparison with semi-supervised state-of-the-arts on ACDC dataset under 16/150 partition protocol. We report the average (standard deviation) results based on 4-fold cross-validation. \(_{}^{\dagger }\): method with ensemble strategy.

4.2 Experimental Setup

Implementation Details. We adopted V-Net [58] as the backbone architecture. To fit the volumetric data, we extended CutMix [46] to 3D and set the cropped volume ratio to 0.2. We empirically set \(\lambda \) to 0.5 in Eq. (3). We trained TriMix 300 epochs using SGD with a weight decay of 0.0001 and a momentum of 0.9. The initial learning rate was set to 0.01 and was divided by 10 every 100 epochs. At each training iteration, 4 labeled and 4 unlabeled samples were fetched for the ACDC dataset, and 1 labeled and 4 unlabeled samples were fetched for the Hippocampus dataset.
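One plausible way to sample the 3D CutMix box under these settings is sketched below; the paper specifies only the cropped-volume ratio of 0.2, so taking an equal fraction of each axis is our assumption.

```python
import torch

def rand_box_3d(d, h, w, ratio=0.2):
    """Sample a cuboid covering `ratio` of a (D, H, W) volume."""
    side = ratio ** (1 / 3)  # equal fraction per axis
    cd, ch, cw = int(d * side), int(h * side), int(w * side)
    z = torch.randint(0, d - cd + 1, (1,)).item()
    y = torch.randint(0, h - ch + 1, (1,)).item()
    x = torch.randint(0, w - cw + 1, (1,)).item()
    return z, y, x, cd, ch, cw
```

The sampled box is then pasted across images, labels, and pseudo-labels exactly as in the 2D sketch of Sect. 3.1.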

Baseline and Upper Bound. We provide baseline and upper-bound settings for reference. We trained the backbone V-Net only with the partitioned labeled data and treated the result as the baseline. The result trained with the complete labeled data was regarded as the upper bound accuracy.

Mainstream Approaches. We implemented several SSL algorithms: Mean Teacher (MT) [4], Uncertainty-Aware Mean Teacher (UA-MT) [14], CutMix-Seg [9], Spatial-Temporal Smoothing Mean Teacher (STS-MT) [28], Uncertainty-Aware Multi-View Co-Training (UMCT) [18], and Cross Pseudo Supervision (CPS) [10], and compared TriMix against them. CutMix-Seg and CPS were incorporated with the 3D CutMix augmentation, and UMCT was trained with three different views. We report the student-model results for MT, UA-MT, STS-MT, and CutMix-Seg. Since CPS and UMCT train more than one model, we report the average result over their trained models, plus the ensemble result for UMCT, the same as for TriMix.

4.3 Experiment Results

Improvement over the Baseline. We investigated TriMix’s effectiveness in exploiting the unlabeled data. As illustrated in Table 1 and Table 2, TriMix significantly improves the baseline: it gains +15.7% in Dice and -13.7 in 95HD on the ACDC dataset, and +54.7% in Dice and -6.8 in 95HD on the Hippocampus dataset, demonstrating that TriMix can effectively mine informative cues from the unlabeled data to improve generalization.

Comparison with SOTAs. For the ACDC dataset under the 16/150 partition protocol (see Table 1), CutMix-Seg achieves better average results than MT, confirming the effectiveness of strong input perturbation. STS-MT employs a spatial-temporal smoothing mechanism and outperforms CutMix-Seg. UMCT, in a co-training style, takes advantage of multi-view information; it yields higher accuracy than STS-MT but does not reach the performance of CPS. TriMix obtains the best results among all methods. For the Hippocampus dataset under the one-shot setting (see Table 2), the existing SSL methods generally improve upon the baseline, verifying that they can exploit the unlabeled data. TriMix greatly outperforms the other methods, producing meaningful accuracy even with a single labeled volume; notably, it surpasses the second-best method, CPS, by +12.5% in Dice and -2.1 in 95HD. Validation on these two datasets reveals that TriMix is competitive with SOTAs under typical partition protocols and has strong potential for learning from extremely scarce labeled data.

Table 2. Comparison with semi-supervised state-of-the-arts on Hippocampus dataset with one-shot setting. We report the average (standard deviation) results based on 4-fold cross-validation. \(_{}^{\dagger }\): method with ensemble strategy.
Fig. 3.

Empirical study on different types of consistency regularization and various partition protocols on the ACDC and Hippocampus datasets. \(L_{unsup}^{s}\): an unsupervised loss that compares pseudo supervision consistency. \(L_{unsup}^{p}\): an unsupervised loss that calculates probability consistency. \(_{}^{\dagger }\): method with ensemble strategy.

4.4 Empirical Study and Analysis

Pseudo Supervision Consistency vs. Probability Consistency. We compared pseudo supervision consistency (denoted \(L_{unsup}^{s}\)) and probability consistency (denoted \(L_{unsup}^{p}\)) on the ACDC and Hippocampus datasets under different partition protocols. Results are shown in Fig. 3. Overall, TriMix with \(L_{unsup}^{s}\) outperforms TriMix with \(L_{unsup}^{p}\) across all partition protocols on both datasets. In particular, under the one-shot setting on the Hippocampus dataset, \(L_{unsup}^{s}\) surpasses \(L_{unsup}^{p}\) by +54.2% in Dice and -5.9 in 95HD, indicating that a one-hot label map serves better than a probability map as the expanded ground truth for supervising the other models within TriMix. Previous works [5, 8, 10] have reported similar observations. Using hard pseudo-labels encourages models to make low-entropy/high-confidence predictions and is closely related to entropy minimization [60]. Based on this ablation, we use pseudo supervision consistency as the default setting for TriMix in both semi- and scribble-supervised segmentation.

Robustness to Different Partition Protocols. We studied TriMix’s robustness to various partition protocols on the ACDC and Hippocampus datasets. As shown in Fig. 3, TriMix consistently improves upon the baseline and outperforms UA-MT across all partition protocols, demonstrating the robustness and effectiveness of our method under different data settings. Moreover, TriMix surpasses the upper bound accuracy under the 72/150 partition protocol on the ACDC dataset and the 96/195 protocol on the Hippocampus dataset, revealing that TriMix can greatly reduce dependence on labeled data.

Relations to Existing Methods. Among the semi-supervised methods compared, UMCT and CPS are the two most closely related to TriMix. UMCT is a co-training-based strategy that introduces view differences; TriMix resembles UMCT in that both follow the spirit of multi-model joint training and encourage consistency among models, but TriMix adopts a stricter perturbation. Moreover, CPS can be regarded as a downgraded version of TriMix, in which two perturbed networks generate hard pseudo-labels to supervise each other. TriMix outperforms both UMCT and CPS on the ACDC and Hippocampus datasets, demonstrating the superiority of our strategy of applying consistency regularization under a more challenging perturbation within tri-training.

5 Experiments on Scribble-Supervised Segmentation

5.1 Data and Evaluation Metric

ACDC Dataset [59], introduced in Sect. 4.1, was reused in this task, now with the corresponding scribble annotations [6]. We resized all slices to 256\(\times \)256 pixels and normalized their intensities to [0,1], identical to [8].

MSCMRseg Dataset [61] comprises LGE-MRI images from 45 patients. We utilized the scribble annotations of LV, Myo, and RV released by [7] and used the same data partition as theirs: 25 images for training, 5 for validation, and 15 for testing. For data preprocessing, we re-sampled all images to a resolution of 1.37\(\times \)1.37 mm, cropped or padded them to 212\(\times \)212 pixels, and normalized each image to zero mean and unit variance.

Evaluation Metric. Dice score and 95HD were utilized.

5.2 Experimental Setup

Implementation Details. We adopted the 2D U-Net architecture [62] as the backbone for all experiments in this task. The cropped area ratio was set to 0.2 for the CutMix augmentation, and \(\lambda _{1}\) and \(\lambda _{2}\) in Eq. (6) were empirically set to 1. For the ACDC dataset, we used almost the same settings as [8]: we optimized TriMix with SGD (weight decay = 0.0001, momentum = 0.9) for a total of 60000 iterations under a poly learning rate schedule with an initial value of 0.03 and a batch size of 12, and performed 5-fold cross-validation. For the MSCMRseg dataset, we followed [7] to train TriMix for 1000 epochs with the Adam optimizer and a fixed learning rate of 0.0001, conducting 5 runs with seeds 1, 2, 3, 4, and 5.

Baseline and Upper Bound. A 2D U-Net trained with scribble annotations using the pCE loss [38] was regarded as the baseline. The upper bound accuracy was obtained by training with fully dense annotations.

Mainstream Approaches. We compared TriMix with several methods, including training with pseudo-labels generated by Random Walks (RW) [33], Scribble2Labels (S2L) [19], Uncertainty-Aware Self-Ensembling and Transformation Consistency Model (USTM) [39], Entropy Minimization (EM) [60], Mumford-Shah Loss (MLoss) [63], Regularized Loss (RLoss) [37], Dynamically Mixed Pseudo Labels Supervision (abbreviated to DMPLS in this paper) [8], CycleMix [7], and Shape-Constrained Positive-Unlabeled Learning (ShapePU) [20].

5.3 Experiment Results

Improvement over Baseline. As shown in Table 3 and Table 4, TriMix significantly improves the baseline on the ACDC and MSCMRseg datasets, gaining +20.2% and +49.6% in Dice, respectively, which shows that TriMix can learn good representations from sparse scribble annotations.

Comparison with SOTAs. For the ACDC dataset (see Table 3), TriMix achieves the highest average Dice and 95HD among all scribble-supervised methods and comes closest to the upper bound accuracy. It is worth noting that TriMix obtains a gain of +1.6% in Dice over DMPLS and a reduction of 1.0 in 95HD compared with RLoss. For the MSCMRseg dataset (see Table 4), TriMix surpasses all mix augmentation-based schemes, i.e., MixUp, CutOut, CutMix, PuzzleMix, CoMixUp, and CycleMix, as well as the two SOTAs, CycleMix and ShapePU. TriMix outperforms CycleMix by +7.4% and ShapePU by +2.2% in Dice, and even exceeds the upper bound accuracy by +11.9%. Evaluations on these two benchmarks reveal that TriMix generalizes better from sparse annotations than the SOTAs.

Table 3. Comparison with scribble-supervised state-of-the-arts on ACDC dataset. Other average (standard deviation) results are from [8]. Ours are based on 5-fold cross-validation. \(_{}^{\dagger }\): method with ensemble strategy.

5.4 Empirical Study and Analysis

Ablation on Different Loss Combinations. We investigated the effect of different loss combinations on accuracy, as illustrated in Fig. 4. Leveraging only the original scribble annotations, \(L_{pce}^{unmix}\) yields the lower bound accuracy. \(L_{pce}^{mix}\) boosts the lower bound by +2.8% in Dice, showing that mix augmentation effectively augments the scribble annotations and thus improves accuracy. \(L_{ce}^{mix}\) contributes far more, improving the lower bound by +41.0% in Dice, which reveals that pseudo supervision is essential for TriMix. Combining all losses yields the highest accuracy.

Relations to Existing Methods. TriMix is related to DMPLS and CycleMix. DMPLS utilizes co-labeled pseudo-labels from multiple diverse branches to supervise single-branch outputs via consistency regularization. CycleMix employs mix augmentation to augment scribble annotations and imposes consistency under input perturbation. TriMix sits in the middle ground: it imports mix augmentation similar to CycleMix and enforces consistency among multiple outputs with pseudo-label supervision, resembling DMPLS. TriMix thus incorporates the features beneficial for scribble-supervised segmentation and achieves new SOTA performance on two public benchmarks, i.e., the ACDC and MSCMRseg datasets.

Table 4. Comparison with scribble-supervised state-of-the-arts on MSCMRseg dataset. Other average (standard deviation) results in Dice score are from [7, 20]. Ours are based on 5 runs. \(_{}^{\dagger }\): method with ensemble strategy.
Fig. 4.

Ablation study on different loss combinations on the ACDC dataset with scribble annotations using Dice score.

6 Discussion and Conclusion

This paper addresses semi- and scribble-supervised segmentation in a general way. We hypothesize that a general learner under limited supervision should (i) output more accurate predictions and (ii) be trained with consistency regularization under a more challenging perturbation. We empirically verify this hypothesis with a simple framework, TriMix, which purely imposes consistency on a tri-training framework under a stricter perturbation, i.e., combining data augmentation and model diversity. Our method is competitive with task-specific mainstream methods; it shows strong potential when trained with extremely scarce labeled data and achieves new SOTA performance on two popular benchmarks when learning from sparse annotations. We also provide extra evaluations of our method in the appendix.

Moreover, as suggested by [64], deep ensembles provide a simple and scalable way to estimate uncertainty. TriMix maintains triple diverse networks, a nature that allows efficient uncertainty modeling; estimating and quantifying uncertainty for models learned from limited supervision is essential yet rarely explored. It is also interesting to investigate whether TriMix can handle other types of imperfect annotations, e.g., noisy labels [2, 65]. In addition, TriMix’s mechanism is similar to that of BYOL [66], which employs two networks and enforces representation consistency between them; TriMix may thus be applicable to self-supervised learning, though this requires further evaluation. Last but not least, similar to multi-view co-training [18], TriMix is inherently expensive in computation. To make TriMix more efficient, we may investigate strategies such as MIMO [67] in the future. We regard the above avenues as follow-up work.