Flow-Based Deformation Guidance for Unpaired Multi-contrast MRI Image-to-Image Translation

Bui, Toan Duc; Nguyen, Manh; Le, Ngan; Luu, Khoa

doi:10.1007/978-3-030-59713-9_70

Toan Duc Bui¹⁶,
Manh Nguyen^16,17,
Ngan Le¹⁸ &
…
Khoa Luu¹⁸

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12262))

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

8427 Accesses
9 Citations

Abstract

Image synthesis from corrupted contrasts increases the diversity of diagnostic information available for many neurological diseases. Recently the image-to-image translation has experienced significant levels of interest within medical research, beginning with the successful use of the Generative Adversarial Network (GAN) to the introduction of cyclic constraint extended to multiple domains. However, in current approaches, there is no guarantee that the mapping between the two image domains would be unique or one-to-one. In this paper, we introduce a novel approach to unpaired image-to-image translation based on the invertible architecture. The invertible property of the flow-based architecture assures a cycle-consistency of image-to-image translation without additional loss functions. We utilize the temporal information between consecutive slices to provide more constraints to the optimization for transforming one domain to another in unpaired volumetric medical images. To capture temporal structures in the medical images, we explore the displacement between the consecutive slices using a deformation field. In our approach, the deformation field is used as a guidance to keep the translated slides realistic and consistent across the translation. The experimental results have shown that the synthesized images using our proposed approach are able to archive a competitive performance in terms of mean squared error, peak signal-to-noise ratio, and structural similarity index when compared with the existing deep learning-based methods on three standard datasets, i.e. HCP, MRBrainS13 and Brats2019.

Access provided by Autonomous University of Puebla. Download conference paper PDF

TPSDicyc: Improved Deformation Invariant Cross-domain Medical Image Synthesis

Unsupervised Learning for Cross-Domain Medical Image Synthesis Using Deformation Invariant Cycle Consistency Networks

Make-A-Volume: Leveraging Latent Diffusion Models for Cross-Modality 3D Brain MRI Synthesis

Keywords

1 Introduction

In medical imaging, the task of obtaining diagnostic images from multiple modalities is necessary for accurate and comprehensive prediction of disease diagnosis. For example, T1-weighted (T1) brain images provide clear differentiate images of gray and white matter tissues, whereas T2-weighted (T2) images differentiate fluid from cortical tissue. By leveraging the information provided by both of these image modalities, we can gain a more in-depth and completed picture of the diagnosis. However, obtaining separately both images is often costly, time-consuming, and maybe corrupted by noise and artifacts. Therefore, cross-modalities synthesis is a promising application to improve the clinical feasibility and utility of multi-contrast MRI. Image-to-image translation has recently gained attention in the medical imaging community, where the task is to estimate the corresponding image in the target domain from a given source domain image of the same subject. Generally, the image-to-image translation methods can be divided into two categories including: Generative Adversarial Networks (GANs) and Flow-based Generative Networks and summarized as follows:

Generative Adversarial Networks GANs are a class of latent variable generative models that clearly identify the generator as deterministic mapping. The deterministic mapping represents an image as a point in the latent space without regarding its feature ambiguity. Several different GAN-based models have been used to explore image-to-image translation in a literature study [2, 3, 14, 16]. For example, Zhu et al. [16] proposed a cycleGAN method for mapping between unpaired domains by using cycle-consistency dependence to constrain the optimal solutions provided by the generative network. Balakrishnan et al. [2] proposed a RecycleGAN to explore the temporal information by learning a prediction of the next frame for video generation. Chen et al. [3] proposed a 3D cycleGAN network to learn the mapping between CT and MRI. The drawback of 3D cycleGAN is it is memory consumption and loses the global information due to working on small patch sizes.

Flow-based Generative Networks are a class of latent variable generative models that clearly identify the generator as an invertible mapping. The invertible mapping provides a distributional estimation of features in the latent space. Recently, many efforts making use of flow-based generative networks have been proposed to transfer between two unpaired data [4, 5, 7, 10, 12]. For example, Grover et al. [5] introduced a flow to flow (alignflow) network for unpaired image-to-image translation. Sun et al. [12] introduced a conditional dual flow-based invertible network to transfer between positron emission tomography (PET) imaging and magnetic resonance imaging (MRI) images. By using invertible properties, the flow-based methods can ensure exact cycle consistency in translation from a source domain to the target and returning to the source domain without any further loss functions.

Limitations of Existing Methods and Our Contributions. The primary drawback of the cycleGAN model is that it can not perform one-to-one mapping for accurate and unique unpaired image translation, generates biased image translations of the inverse mapping [11]. Different from the GANs-based method, the flow-based method guarantees precise cycle consistency in mapping data points from a source domain to the target and returning to the source domain. However, the flow-based methods do not take into account the temporal information between consecutive slices. To address this problem, we propose a new method by inheriting the merits of the flow-based method and exploiting temporal information between consecutive slices. Our approach provides more constraints to the optimization for transforming one domain to another domain. To capture temporal information, we employ a deformation field between consecutive slices by training a convolutional neural network. In our proposed approach, the deformation field plays a role of guidance to keep slices realistic and consistent across translation.

2 Related Work

2.1 Cycle-Consistent Adversarial Networks (cycleGAN)

Let $\{x_i\}_{i=1}^N$ and $\{y_i\}_{i=1}^M$ be unpaired data samples for two domains, i.e. the source domain X and the target domain Y, respectively. Denote D and G as a discriminator network and a generator network. The cycleGAN model [16] solves unpaired image-to-image translation between these two domains by estimating two independent mapping functions $G_{X \rightarrow Y}: X\rightarrow Y$ and $G_{Y \rightarrow X}: Y\rightarrow X$. The two mapping functions $G_{X \rightarrow Y}$ and $G_{Y \rightarrow X}$ performed by neural networks are trained to fool the discriminator $D_X$ and $D_Y$ respectively. The discriminator $D_X$, and $D_Y$ encourage the transferred images and the real images to be similar. Hence, the cycleGAN loss is defined as:

$$\begin{aligned} \begin{aligned}&\mathcal {L}_{cycleGAN}(G_{X \rightarrow Y},G_{Y \rightarrow X},D_X,D_Y) = \mathcal {L}_{GAN}(G_{X \rightarrow Y},D_Y) +\mathcal {L}_{GAN}(G_{Y \rightarrow X},D_X) \\&\quad \quad \quad \quad \quad \quad \quad \quad +\lambda \mathcal {L}_{cycle}(G_{X \rightarrow Y},G_{Y \rightarrow X}) +\beta \mathcal {L}_{identity}(G_{X \rightarrow Y},G_{Y \rightarrow X}) \end{aligned} \end{aligned}$$

(1)

where $\mathcal {L}_{GAN}$ is a GAN loss for the D network [16]. $\mathcal {L}_{cycle}$ is a cycle consistency loss that guarantees the transferred image from a time-point is able to bring back to the original image after appearance translation by the generator network G. For example, the cycle consistency loss of the data translated from $X \rightarrow Y$ via $G_{X}$ and mapped back to the original domain X via $G_{Y}$ is defined as:

$$\begin{aligned} \mathcal {L}_{cycle}(G_{X \rightarrow Y},G_{Y \rightarrow X})=\left\Vert G_{Y \rightarrow X}(G_{X \rightarrow Y} (x)) - x \right\Vert _{1} \end{aligned}$$

(2)

The identity loss $\mathcal {L}_{identity}$ is to regularize the generator to be near an identity mapping when real samples of the target domain are given as the input to the generator. The $\lambda $ and $\beta $ control the contribution of the two objective functions.

2.2 Flow-Based Generative Models

Flow-based Generative Models are a class of latent variable generative models that clearly identify the generator as an invertible mapping $h: Z \rightarrow X$ between a set of latent variables Z and a set of observed variables X. Let $p_X$ and $p_Z$ indicate the marginal densities given by the model over X and Z, respectively. Using the change-of-variables formula, these marginal densities are defined as

$$\begin{aligned} p_{X}(x) = p_{Z}(z) \bigg |\det \frac{\partial h^{-1}}{\partial X}\bigg |_{X=x} \end{aligned}$$

(3)

where $z=h^{-1}(x)$ because of the invertibility constraints. In particular, we use a multivariate Gaussian distribution $p_{Z}(z) = \mathcal {N} (\mu ,\,0, \,\mathbf{I} )$. Unlike adversarial training, flow models trained with maximum likelihood estimation (MLE) explicitly require a prior $p_{Z}(z)$ with a tractable density to evaluate model likelihoods using the change-of-variables formula in the Eq. (3).

Based on flow-based method [4], Grover et al. [5] proposed an alignflow method for unpaired image-to-image translation. In the method, the mapping between two domains $X \rightarrow Y$ can be represented through a shared feature space of latent variables Z by the composition of two invertible mapping [5]:

$$\begin{aligned} G_{X \rightarrow Y} = G_{Z \rightarrow Y} \circ G_{X \rightarrow Z}, \quad \quad \quad G_{Y \rightarrow X} = G_{Z \rightarrow X} \circ G_{Y \rightarrow Z} \end{aligned}$$

(4)

where $G_{X \rightarrow Z}= G_{Z \rightarrow X}^{-1}$ and $G_{Y \rightarrow Z}= G_{Z \rightarrow Y}^{-1}$. Due to the fact that composition of invertible mappings is invertible, both $G_{X \rightarrow Y}$ and $G_{Y \rightarrow X}$ are invertible [5]. On the other hand, we can obtain $G_{X \rightarrow Y}^{-1} = G_{Y \rightarrow X}$. Thus the Eq. (2) can rewrite as

$$\begin{aligned} \begin{aligned} \mathcal {L}_{cycle}(G_{X \rightarrow Y},G_{Y \rightarrow X})&=\left\Vert G_{Y \rightarrow X}(G_{X \rightarrow Y} (x)) - x \right\Vert _{1} \\&=\left\Vert G_{X \rightarrow Y}^{-1}(G_{X \rightarrow Y} (x)) - x \right\Vert _{1} = 0 \end{aligned} \end{aligned}$$

(5)

where $G_{X \rightarrow Y}^{-1}G_{X \rightarrow Y}$ results in an identical matrix.

Equation 5 implies that the flow-based methods can guarantee precise cycle consistency in mapping from a source domain to the target and returning to the source domain without additional loss functions. Hence, the alignflow objective loss is defined as:

$$\begin{aligned} \begin{aligned} \mathcal {L}_{flow}(G_{X \rightarrow Y},G_{Y \rightarrow X},D_X,D_Y)&= \mathcal {L}_{GAN}(G_{X \rightarrow Y},D_Y) +\mathcal {L}_{GAN}(G_{Y \rightarrow X},D_X) \\&- \lambda _{X}\mathcal {L}_{MLE}(G_{Z \rightarrow X})- \lambda _{Y}\mathcal {L}_{MLE}(G_{Z \rightarrow Y}) \end{aligned} \end{aligned}$$

(6)

where $\lambda _{Y}, \lambda _{Y} \ge 0$ are hyperparameters that control the importance of the MLE terms for domains X and Y respectively.

Figure 1 illustrates the difference between cycleGAN and alignflow methods. Unlikes cycleGAN, the alignflow method is the full invertible architecture that guarantees the cycle-consistency translations between two unpaired domains without an additional $\mathcal {L}_{cycle}$ function.

3 Proposed Method

Our motivation is to learn a mapping between unpaired images from different domains by leveraging the temporal information between consecutive slices. We use the temporal information to constrain the mapping between two domains which should be consistent. Our method is an extension of alignflow [5] method with making use of temporal information between consecutive slides.

3.1 Deformation Guided Temporal Constraints

To obtain the displacement between consecutive slices, we use an unsupervised registration network [1] to learn a deformation field $\phi $ of a slice $x_t$ and its consecutive slices $x_k$. The deformation field $\phi $ can be obtained using a convolutional neural network (CNN) [1] by minimizing the loss function

(7)

where denotes the spatial transformation operation. The first term ensures that the distance between the next slice $x_t$ and the warped current slice to be close. The second term imposes regularization on $\phi (.)$.

To guarantee the consistency of the image translation, the $\mathcal {L}_1$ loss is used to measure the difference between the warping of fake images on consecutive slice $t^{th}$ and the translation of reference slice $k^{th}$. We define the temporal consistency loss function for mapping $X \rightarrow Y$ and $Y \rightarrow X$ as:

(8)

Figure 2 illustrates an example for image-to-image translation from domain $X \rightarrow Y$ using temporal constraints. Let $x_t, x_{t+1}, x_{t+2}$ be consecutive slices of real images in the source domain X. A mapping function $G_{X \rightarrow Y}$ generates the fake image $y_t, y_{t+1}, y_{t+2}$ on target domain Y. On the source domain, we can learn displacement fields $\phi _t(.), \phi _{t+2}(.)$ between $(x_t, x_{t+1})$ and $(x_{t+2}, x_{t+1})$. To constrain the consistency of the mapping from $X \rightarrow Y$, we minimize the distance (i) between the warped fake image and $y_{t+1}$ for mapping from $t^{th}$ slice and $(t+1)^{th}$ slice, and (ii) between the warped fake image and $y_{t+1}$ for mapping from $(t+2)^{th}$ slice and $(t+1)^{th}$ slice.

3.2 Network Diagram

Figure 3 illustrates the proposed network diagram for unpaired image-to-image translation. Our proposed network architecture inherits the advantages of invertible property of alignflow [5]. During training, we add two additional networks $Reg_X$ and $Reg_Y$ for each domain to learn the deformation field $\phi (.)$. These additional networks only use in training time, without increasing the model complexity and inference time comparison with the baseline flow-based method. The temporal constraint via $\mathcal {L}_{reg}(.)$ losses ensures the mapping of consecutive slices on the source domain should be consistent on the target domain. Finally, our objective function is defined as:

$$\begin{aligned} \begin{aligned}&\mathcal {L}_{flow\_reg}(G_{X \rightarrow Y},G_{Y \rightarrow X},D_X,D_Y, \phi )= \mathcal {L}_{flow}(G_{X \rightarrow Y},G_{Y \rightarrow X},D_X,D_Y) \\&\quad \quad \quad \quad +\lambda _{1}\mathcal {L}_{reg}(X, G_{X\rightarrow Y})+\lambda _{2}\mathcal {L}_{reg}(Y, G_{Y\rightarrow X}) + \beta _{1}\mathcal {L}_X(\phi ) + \beta _{2}\mathcal {L}_Y(\phi )\\&\quad \quad \quad \quad +\gamma _1 \mathcal {L}_{TV}(X) + \gamma _2 \mathcal {L}_{TV}(Y) \end{aligned} \end{aligned}$$

(9)

where $\lambda _{1}$, $\lambda _{2}$, $\beta _{1}$, and $\beta _{2}$ control the relative importance of the temporal consistence losses and the two registration losses. $\mathcal {L}_{TV}$ denotes total variation (TV) loss to impose spatial smoothness by measuring the horizontal and vertical gradient of generated images [15]. These TV losses are weighted by $\gamma _1, \gamma _2$.

4 Experimental Results

4.1 Datasets and Training

We used common medical datasets to measure the robustness of our method against the existing methods: cycleGAN [16], recycleGAN [2], cycleflow [11] and alignflow [5]. cycleGAN [16] is an unpaired image-to-image translation that works on single slice level. RecycleGAN [2] built upon the cycleGAN and add a temporal predictor that is trained to predict future slice in a set of previous consecutive slices. cycleflow [11] is a flow-based method, but ignores the shared latent space Z (directly map from $X \rightarrow Y$, instead of $X \rightarrow Z \rightarrow Y$ as the alignflow method). The synthetic image from each method was quantitatively compared with the real paired image using the following performance metrics: mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM).

Human Connectome Project (HCP) is provided by the Human Connectome project [13]. We used T1 as the source domain and T2 as the target domain. We extract the axial view of T1/T2 images into 2D images. We split the 2D images into 1150 images for training set and 500 images for testing set.

MRBrainS13: [8] contains 15 subjects for training and validation and 6 subjects for testing. For each subject, two modalities are available that include T1-weighted, and T2-FLAIR with an image size of $48 \times 240 \times 240$. We extract the dataset into 2D images with 450 images for training and 150 images for testing

Brats2019: [9] includes 210 HGG scans and 75 LGG scans. Each scan has a dimension of $240 \times 240 \times 155$. For each scan, we extract it to 2D images and use 770 images for training and 250 images for testing.

Training. All networks were implemented using the Pytorch framework and trained on the 12GB GPU. The input image is resized to $128 \times 128$ and normalized to $[-1, 1]$. We used axial slices (10 slices around the middle slice) from the each subject. The Adam optimizer with a batch size of two was used to train the network. The initialization learning rate was set as 0.0002 and was decreased ten times every 20 epochs. We trained each model for 100 epochs. The balance weights were set as $\lambda _X=\lambda _Y=1e^{-5}, \lambda = \lambda _{1}=\lambda _{2}=10, \beta _1=\beta _2=1,\gamma _1=\gamma _2=1$. The discriminator network is a $70 \times 70$ PatchGAN [6]. For alignflow network [5], we set the number of scale was 1, number of block was 3. We use two consecutive slices (before and later slices) to learn the temporal constraint.

4.2 Performance Evaluation

Qualitative Evaluation. Figure 4 illustrates the image translation on different datasets. The proposed methods (in the last column) provided a better synthetic image, resulting in better MSE, SSIM and PSNR scores. For example, the proposed synthetic T2 image provides a high qualitatively difference along the tumor boundary (indicated by the red arrows in the fifth row) than in existing methods using the available source T1 image as input.

Quantitative Evaluation. Tables 1 reports the MSE, PSNR and SSIM values of the proposed method and existing methods. From the table, it is clear that the flow-based method (such as cycleflow [11], alignflow [5] and our method) provides competitive results with GAN-based method (such as cycleGAN, recycleGAN). By adding temporal constraints, the proposed network outperforms the baseline method (alignflow) on all performance metrics. Different from recycleGAN, that exploits temporal information via future slice prediction from consecutive slices, the proposed method measures pixel-wise temporal consistency by directly warping the synthetic slices with the deformation field of the consecutive slices from the source, and thus achieves better performances. This indicates the effectiveness of the proposed method in the unpaired image to image translation for medical image.

Table 1. Comparison between the proposed method against other image-to-image translation methods on HCP, MRBrainS13, Brats19 datasets.

Full size table

5 Conclusion

We presented an effective method for image-to-image translation based on flow-based methods and deformation information that allows the proposed method to exploit the temporal information between consecutive slices to constrain the translation image. We show that the proposed method can provide a good translation image, yielding a better MSE, PSNR, and SSIM on various MRI datasets. Although our network is a fully invertible property, it requires more memory resource than GAN-based methods (such as cycleGAN, recycleGAN, ...).

References

Balakrishnan, G., Zhao, A., Sabuncu, M.R., Guttag, J., Dalca, A.V.: An unsupervised learning model for deformable medical image registration. In: Proceedings of the CVPR, pp. 9252–9260 (2018)
Google Scholar
Bansal, A., Ma, S., Ramanan, D., Sheikh, Y.: Recycle-GAN: unsupervised video retargeting. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11209, pp. 122–138. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01228-1_8
Chapter Google Scholar
Chen, X., et al.: One-shot generative adversarial learning for MRI segmentation of craniomaxillofacial bony structures. IEEE Trans. Med. Imaging (2019)
Google Scholar
Dinh, L., Sohl-Dickstein, J., Bengio, S.: Density estimation using real nvp. arXiv preprint arXiv:1605.08803 (2016)
Grover, A., Chute, C., Shu, R., Cao, Z., Ermon, S.: AlignFlow: cycle consistent learning from multiple domains via normalizing flows. arXiv preprint arXiv:1905.12892 (2019)
Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134 (2017)
Google Scholar
Kingma, D.P., Dhariwal, P.: Glow: generative flow with invertible 1 $\times $ 1 convolutions. In: Advances in Neural Information Processing Systems, pp. 10215–10224 (2018)
Google Scholar
Mendrik, A.M., et al.: Mrbrains challenge: online evaluation framework for brain image segmentation in 3t MRI scans. Comput. Intell. Neurosci. 2015 (2015)
Google Scholar
Menze, B.H., Jakab, A., Bauer, et al.: The multimodal brain tumor image segmentation benchmark (brats). TMI 34(10), 1993–2024 (2015)
Google Scholar
van der Ouderaa, T.F., Worrall, D.E.: Reversible GANs for memory-efficient image-to-image translation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4720–4728 (2019)
Google Scholar
Shen, Z., Zhou, S.K., Chen, Y., Georgescu, B., Liu, X., Huang, T.: One-to-one mapping for unpaired image-to-image translation. In: The IEEE Winter Conference on Applications of Computer Vision, pp. 1170–1179 (2020)
Google Scholar
Sun, H., et al.: Dual-Glow: conditional flow-based generative model for modality transfer. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 10611–10620 (2019)
Google Scholar
Van Essen, D.C., et al.: The WU-Minn human connectome project: an overview. Neuroimage 80, 62–79 (2013)
Google Scholar
Welander, P., Karlsson, S., Eklund, A.: Generative adversarial networks for image-to-image translation on multi-contrast MR images-a comparison of cyclegan and unit. arXiv preprint arXiv:1806.07777 (2018)
Yuan, Y., Liu, S., Zhang, J., Zhang, Y., Dong, C., Lin, L.: Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 701–710 (2018)
Google Scholar
Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: IEEE CVPR, pp. 2223–2232 (2017)
Google Scholar

Download references

Author information

Authors and Affiliations

VinAI Research, Hanoi, Vietnam
Toan Duc Bui & Manh Nguyen
FPT University, Hanoi, Vietnam
Manh Nguyen
Department of Computer Science, University of Arkansas in Fayetteville, Fayetteville, USA
Ngan Le & Khoa Luu

Authors

Toan Duc Bui
View author publications
You can also search for this author in PubMed Google Scholar
Manh Nguyen
View author publications
You can also search for this author in PubMed Google Scholar
Ngan Le
View author publications
You can also search for this author in PubMed Google Scholar
Khoa Luu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Toan Duc Bui .

Editor information

Editors and Affiliations

University of Toronto, Toronto, ON, Canada
Anne L. Martel
The University of British Columbia, Vancouver, BC, Canada
Purang Abolmaesumi
University College London, London, UK
Danail Stoyanov
École Centrale de Nantes, Nantes, France
Diana Mateus
EURECOM, Biot, France
Maria A. Zuluaga
Chinese Academy of Sciences, Beijing, China
S. Kevin Zhou
Sorbonne University, Paris, France
Daniel Racoceanu
The Hebrew University of Jerusalem, Jerusalem, Israel
Leo Joskowicz

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Bui, T.D., Nguyen, M., Le, N., Luu, K. (2020). Flow-Based Deformation Guidance for Unpaired Multi-contrast MRI Image-to-Image Translation. In: Martel, A.L., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. MICCAI 2020. Lecture Notes in Computer Science(), vol 12262. Springer, Cham. https://doi.org/10.1007/978-3-030-59713-9_70

Download citation

DOI: https://doi.org/10.1007/978-3-030-59713-9_70
Published: 29 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59712-2
Online ISBN: 978-3-030-59713-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The Medical Image Computing and Computer Assisted Intervention Society (opens in a new tab)