Cardiac Segmentation from LGE MRI Using Deep Neural Network Incorporating Shape and Spatial Priors

Yue, Qian; Luo, Xinzhe; Ye, Qing; Xu, Lingchao; Zhuang, Xiahai

doi:10.1007/978-3-030-32245-8_62

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11765)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

Cardiac segmentation from late gadolinium enhancement MRI is an important task in clinics to identify and evaluate the infarction of myocardium. Automatic segmentation is, however, still challenging due to the heterogeneous intensity distributions and indistinct boundaries in the images. In this paper, we propose a new method, based on deep neural networks (DNN), for fully automatic segmentation. The proposed network, referred to as SRSCN, comprises a shape reconstruction neural network (SRNN) and a spatial constraint network (SCN). SRNN aims to maintain a realistic shape of the resulting segmentation. It can be pre-trained on a set of label images and then embedded into a unified loss function as a regularization term. Hence, no manually designed feature is needed. Furthermore, SCN incorporates the spatial information of the 2D slices. It is formulated and trained with the segmentation network via a multi-task learning strategy. We evaluated the proposed method on 45 patients and compared it with two state-of-the-art regularization schemes, i.e., the anatomically constrained neural network and the adversarial neural network. The results show that the proposed SRSCN outperformed the conventional schemes and obtained a Dice score of 0.758 ± 0.227 for myocardial segmentation, which compares with 0.757 ± 0.083 from the inter-observer variations.

1 Introduction

Analysis of myocardial (Myo) viability is crucial to better understand the physiological and pathological processes for patients suffering from myocardial infarction (MI). Late gadolinium enhancement (LGE) MRI is a valuable tool for MI assessment, because it can visualize the important pathological information. For quantitative assessment, segmentation of the myocardium is a prerequisite.

Manual segmentation is time-consuming and suffers from inter-observer variations, so automating this process is desirable in the clinic. Rajchl et al. proposed to segment the myocardium indirectly using a multi-region approach [1]. Many automatic methods use cine MRI as prior knowledge and apply image registration techniques to obtain more accurate segmentations [2]. These methods generally require an accurate registration between the cine MRI and the LGE MRI. However, this registration can itself be challenging, considering the intra-image misalignments as well as inter-image misregistration. Therefore, manual interaction is commonly used. Liu et al. employed a multi-component Gaussian mixture model to automatically segment the myocardium from a single LGE MRI sequence [3]. A coupled level set is employed as a spatial constraint, which can be iteratively adapted according to the image characteristics.

Fully automated segmentation of LGE MRI is challenging due to the heterogeneous intensity distributions of the images and the large shape variation of the heart. Furthermore, annotated data are limited, so attempts to solve this problem fully automatically are still rarely reported. In medical imaging, anatomical priors can be essential in assisting segmentation with deep neural network (DNN)-based algorithms. Therefore, in this work we propose an enhanced DNN model with shape reconstruction (SR) and spatial constraint (SC) to tackle this challenging segmentation task, particularly with a small set of annotated training data. The resulting network is expected to constrain the segmentation to produce results with realistic heart shapes.

We first propose a shape reconstruction neural network (SRNN). SRNN can be pre-trained on anatomical priors such as a set of label images, and it works as a shape constraint to regularize the results. Hence, SRNN can maintain a realistic heart shape in the segmentation result. Furthermore, we propose the spatial constraint network (SCN) to handle the large variation of the 2D slices across different positions of a 3D cardiac MRI. The 2D slices may come from any position, from the apex to the base of the ventricles, and their shape and appearance can vary considerably with position. SCN is designed to incorporate this information. By combining the learning of spatial information with the segmentation problem and formulating them as a two-task learning problem, one can expect the SCN to significantly improve the general performance of the network, as opposed to training the two tasks separately. In addition, we investigate two state-of-the-art alternatives for shape regularization, i.e., the anatomically constrained neural network (ACNN) [4] and the generative adversarial network (GAN) [5], though neither of them has been used for this segmentation task, to the best of our knowledge.

2 Method

Figure 1 presents the structure of the proposed network, i.e., SRSCN, which is based on an enhanced U-Net [6]. SRSCN includes two modules to incorporate the prior knowledge, i.e., the SR module and the SC module. The models solely combining U-Net with SR and SC are denoted as SRNN and SCN, respectively.

2.1 Architecture of the Segmentation Network

Ronneberger et al. proposed U-Net for medical image segmentation, which has two key modules, i.e., the feature extraction module and the upsampling module [6]. Based on the fully convolutional network (FCN), it has the advantage of utilizing multi-scale information of the images. U-Net has a symmetric pyramid structure, where an input image is compressed into higher-level semantic features and then upsampled to its original resolution. The combination of local and contextual information enables a good segmentation of medical images.

In our work, we adopted the Exponential Logarithmic Loss [7] as the loss function to measure the result of the segmentation. This loss function combines cross entropy and Dice score in a balanced fashion to facilitate training, and it takes the label balance into account to accelerate convergence, i.e.,

$$ L_{Seg} = \lambda_{Dice} L_{Dice} + \lambda_{Cross} L_{Cross} , $$

(1)

where $ \lambda_{Dice} $ and $ \lambda_{Cross} $ are the balancing parameters for, respectively, the weighted Dice score term, $ L_{Dice} = \mathbb{E}\left[ \left( - \ln \left( Dice_{i} \right) \right)^{\gamma_{1}} \right] $, and the weighted cross entropy term, $ L_{Cross} = \mathbb{E}\left[ w_{l} \left( - \ln \left( p_{l} \left( \boldsymbol{x} \right) \right) \right)^{\gamma_{2}} \right] $, with $ \boldsymbol{x} $ the pixel position, $ i $ the label, $ l $ the ground-truth label at $ \boldsymbol{x} $, $ p_{l} \left( \boldsymbol{x} \right) $ the predicted probability of that label, and $ w_{l} $ the weight of label $ l $; $ \gamma_{1} $ and $ \gamma_{2} $ are two hyperparameters that control the nonlinearities of the loss functions.
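To make Eq. (1) concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a soft Dice per label with a small smoothing constant `eps`, label weights passed in by the caller, and expectations taken as plain averages over labels and pixels. The default values of `lam_dice`, `lam_cross`, `gamma1` and `gamma2` are illustrative placeholders and are not reported in this paper.

```python
import math

def exp_log_loss(probs, labels, weights, lam_dice=0.8, lam_cross=0.2,
                 gamma1=0.3, gamma2=0.3, eps=1e-6):
    """Exponential logarithmic loss over a flat list of pixels.

    probs[k][i] -- predicted probability of label i at pixel k
    labels[k]   -- ground-truth label at pixel k
    weights[i]  -- weight w_i of label i (supplied by the caller here)
    """
    num_labels = len(weights)

    # Dice term: average over labels of (-ln Dice_i)^gamma1, with a soft Dice.
    dice_terms = []
    for i in range(num_labels):
        inter = sum(p[i] for p, l in zip(probs, labels) if l == i)
        total = sum(p[i] for p in probs) + sum(1 for l in labels if l == i)
        dice_i = min(1.0, (2.0 * inter + eps) / (total + eps))
        dice_terms.append(max(0.0, -math.log(dice_i)) ** gamma1)
    l_dice = sum(dice_terms) / num_labels

    # Cross entropy term: average over pixels of w_l * (-ln p_l(x))^gamma2.
    ce_terms = []
    for p, l in zip(probs, labels):
        p_true = min(1.0, max(p[l], eps))  # clamp to avoid log(0)
        ce_terms.append(weights[l] * max(0.0, -math.log(p_true)) ** gamma2)
    l_cross = sum(ce_terms) / len(ce_terms)

    return lam_dice * l_dice + lam_cross * l_cross
```

With exponents below 1, both terms grow steeply for small errors, so a prediction that is only slightly wrong still yields a sizeable loss; this is the nonlinearity that $ \gamma_{1} $ and $ \gamma_{2} $ control.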

DNN, however, generally requires a large set of annotated data to train the network. With limited training data, the generalization capacity of the network could be impaired. Therefore, constraints from prior knowledge should be included to enhance the performance of the DNN.

2.2 SRNN for Prior Knowledge of Shapes

SRNN aims to learn an intermediate representation from which the original inputs can be reconstructed. Internally, through several downsampling operations, it compresses the information of the original input into codes that act as a compact representation of the input image. Through this information compression, features of the inputs are captured and mapped into a high-density space.

Hence, an SRNN model, pre-trained on a set of shape images, can function as a constraint that regularizes a segmentation result into a desired realistic shape. The architecture of this SRNN is illustrated in Fig. 1, where the SR module (in dark red) is connected to U-Net as an extension network. During optimization, a regularization term produced by SRNN constrains the segmentation output. The loss function for training SRNN is formulated as follows,

$$ L_{SRNN} = L_{Seg} + \lambda_{SR} L_{SR} , $$

(2)

where $ \lambda_{SR} $ is the balancing parameter; $ L_{SR} $ is the SR module loss, defined using the Frobenius norm,

$$ L_{SR} = \sum\nolimits_{i = 1}^{n} {\left\| {\widehat{R}_{i} - R_{i} } \right\|_{F}^{2} } . $$

(3)

Here, $ n $ is the number of training samples, $ R_{i} $ denotes the reconstruction of the gold standard segmentation, and $ \widehat{R}_{i} $ denotes the reconstruction of the segmentation predicted by SRNN; $ \left\| \cdot \right\|_{F} $ is the Frobenius norm of a matrix, defined as the square root of the sum of the squared absolute values of its elements.
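The SR loss of Eq. (3) can be sketched as follows. This is an illustration under assumptions rather than the authors' code: masks are small 2D lists, and `reconstruct` is a hypothetical callable standing in for the pre-trained SR module, which in the paper is a neural network. The point of the sketch is that both the predicted segmentation and the gold standard pass through the same module before they are compared.

```python
def frobenius_sq(a, b):
    """Squared Frobenius norm of the difference of two equally sized 2D lists."""
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

def sr_loss(pred_segs, gold_segs, reconstruct):
    """Eq. (3): sum over samples of ||R_hat_i - R_i||_F^2.

    `reconstruct` is a placeholder for the pre-trained SR module; it maps a
    segmentation mask to its reconstruction.
    """
    return sum(frobenius_sq(reconstruct(p), reconstruct(g))
               for p, g in zip(pred_segs, gold_segs))

def identity(mask):
    """Trivial stand-in reconstruction, used only to exercise the code."""
    return mask

pred = [[[1, 0], [0, 1]]]   # one predicted 2x2 mask
gold = [[[1, 1], [0, 1]]]   # the matching gold standard mask
loss = sr_loss(pred, gold, identity)   # one differing pixel
```

With the identity stand-in the loss simply counts squared pixel differences; a trained SR module would instead compare the two masks after projecting them towards plausible shapes.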

2.3 SCN for Prior Knowledge of Spatial Constraints

The idea of utilizing spatial information comes from the fact that the shape and appearance of the heart in the basal and apical slices can vary significantly. Therefore, we develop an SC module that predicts the spatial information of each slice. The segmentation task is trained jointly with this spatial information prediction task, which forms a multi-task learning problem. Multi-task learning has been shown, both empirically [8] and theoretically [9, 10], to significantly improve performance compared with learning each task independently. This is the case not only when only a few samples per task are available but also when the two tasks can intuitively strengthen each other.

As Fig. 1 shows, we propose the SC module (in dark blue), connected to the bottom of the U-Net, to predict the position of an LGE MRI slice. The SC loss is designed to penalize the erroneous prediction of the spatial positions,

$$ L_{SC} = \sum\nolimits_{i = 1}^{n} {\left\| {\widehat{P}_{i} - P_{i} } \right\|_{F}^{2} } , $$

(4)

where $ P_{i} $ is the ground truth spatial information of slice $ i $, and $ \widehat{P}_{i} $ is the prediction. Similarly, the SCN loss is formulated with the weighted loss terms,

$$ L_{SCN} = L_{Seg} + \lambda_{SC} L_{SC} . $$

(5)

By incorporating SC, the network can combine two tasks, i.e., the regression of position and the segmentation of images, to form a two-task-learning problem.

2.4 The Proposed SRSCN

Finally, we combine the SRNN and SCN to obtain the SRSCN, as shown in Fig. 1, whose loss function is then defined as follows,

$$ L_{SRSCN} = L_{Seg} + \lambda_{SC} L_{SC} + \lambda_{SR} L_{SR} . $$

(6)

These two techniques can strengthen each other and result in better segmentation. The two weights, $ \lambda_{SC} $ and $ \lambda_{SR} $, balance the regularization effect of these two terms.
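As a rough illustration of Eqs. (4) and (6), the sketch below combines the three loss terms. It assumes that the spatial information of a slice is a single scalar (for example a normalized position between apex and base), which is an assumption of this sketch since the encoding is not specified here. The default weights are the values reported in Sect. 3.1, read as $ \lambda_{SR} $ = 5e-4 and $ \lambda_{SC} $ = 1e-6.

```python
def sc_loss(pred_positions, true_positions):
    """Eq. (4) for scalar slice positions: sum of squared prediction errors."""
    return sum((p - t) ** 2 for p, t in zip(pred_positions, true_positions))

def srscn_loss(l_seg, l_sc, l_sr, lam_sc=1e-6, lam_sr=5e-4):
    """Eq. (6): segmentation loss plus weighted SC and SR regularization terms."""
    return l_seg + lam_sc * l_sc + lam_sr * l_sr

# Two slices with hypothetical normalized positions (0 = apex, 1 = base).
position_error = sc_loss([0.2, 0.9], [0.0, 1.0])
total = srscn_loss(l_seg=0.5, l_sc=position_error, l_sr=100.0)
```

Setting `lam_sr` to zero recovers the SCN loss of Eq. (5), and setting `lam_sc` to zero recovers the SRNN loss of Eq. (2).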

2.5 Alternative Technology for Shape Constraints

For comparisons, we further investigate the two state-of-the-art networks for shape regularization, i.e., ACNN and GAN.

ACNN takes a series of cardiac label images as the inputs [4]. Through the pre-trained auto-encoder network, the shape features are encoded as the compact codes of the network. In contrast to the proposed SRNN using the reconstruction to assist segmentation, ACNN solely uses the codes created by the encoder. Specifically, one can obtain the ACNN by replacing the regularization term in SRNN with the L2-norm between the codes coming from the segmentation result and gold standard.

GAN trains a discriminator to judge the authenticity of its inputs [5], while the generator is responsible for producing more realistic samples to fool the discriminator. Integrating this idea into the segmentation task, it is natural to train a discriminator whose task is to distinguish gold standard labels from segmentation results. Our main purpose is to guide the segmentation network, i.e., U-Net, towards better segmentation results under this regularization. Specifically, two major modifications have been made to U-Net to obtain the GAN-regularized U-Net segmentation. Firstly, both the segmentation results and the gold standard labels are fed to the discriminator, which predicts the probability that the current input is a gold standard label. The sigmoid cross entropy used for GAN penalizes the discriminator for wrong predictions. Secondly, the cost function includes a regularization term created by GAN, with fixed parameters and an input of gold standard label.
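The sigmoid cross entropy mentioned above can be sketched as follows. This is a generic formulation, not the authors' code: the discriminator is assumed to output one raw logit per input, and the regularizer for the segmentation network is assumed to follow the common adversarial form in which segmentation results are scored against the "gold standard" target while the discriminator is held fixed. The function names are hypothetical.

```python
import math

def sigmoid_cross_entropy(logit, target):
    """Numerically stable cross entropy between sigmoid(logit) and a 0/1 target."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def discriminator_loss(logits_gold, logits_seg):
    """Penalize the discriminator for misjudging gold standard (1) or segmentation (0)."""
    real = [sigmoid_cross_entropy(x, 1.0) for x in logits_gold]
    fake = [sigmoid_cross_entropy(x, 0.0) for x in logits_seg]
    return (sum(real) + sum(fake)) / (len(real) + len(fake))

def adversarial_regularizer(logits_seg):
    """Assumed regularizer for the segmenter: low when its outputs look like gold standard."""
    return sum(sigmoid_cross_entropy(x, 1.0) for x in logits_seg) / len(logits_seg)
```

A confident and correct discriminator, for example logits of 10 for gold standard inputs and -10 for segmentation results, drives `discriminator_loss` close to zero, while the same logits give a large `adversarial_regularizer`, which pushes the segmentation network towards more label-like outputs.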

3 Experiment

3.1 Data, Experimental Setup and Implementation Details

The LGE MRI used in the study were collected from 45 patients, of which 25 were randomly selected for training, 5 for validation and 15 for testing. Note that all the methods failed on one of the 15 test cases, due to its particularly poor image quality. Hence, the statistics of the results reported here exclude this outlier. To augment the training data, we registered the training images to other image spaces using a set of artificially generated rigid, affine and deformable transformations, resulting in 1,350 augmented 3D images and 20,405 2D slices.

We used the Dice coefficient, average symmetric surface distance (ASD) and Hausdorff distance (HD) as metrics for the evaluation of segmentation accuracy. ASD averages the distances from all points on the boundary of the segmentation (Seg) to the boundary of the gold standard (GS), and vice versa,

$$ {\text{ASD}} = \frac{1}{{\left| {B_{Seg} } \right| + \left| {B_{GS} } \right|}} \times \left( {\sum\nolimits_{{x \in B_{Seg} }} {d\left( {x, B_{GS} } \right)} + \sum\nolimits_{{y \in B_{GS} }} {d\left( {y, B_{Seg} } \right)} } \right). $$

The HD metric measures how far two subsets of a metric space are from each other, $ {\text{HD}} = \max_{x \in Seg} \min_{y \in GS} \left\| x - y \right\| $.
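The three metrics can be sketched in a few lines of plain Python. This is an illustrative implementation on small point sets, assuming that masks and boundaries are given as sets of pixel coordinates and that distances are Euclidean; it is not the evaluation code used in the study. The function `directed_hd` follows the formula above literally, taking the maximum over points of Seg only.

```python
import math

def dice(seg, gs):
    """Dice coefficient of two sets of foreground pixel coordinates."""
    if not seg and not gs:
        return 1.0  # convention chosen here for two empty masks
    return 2.0 * len(seg & gs) / (len(seg) + len(gs))

def dist_to_boundary(point, boundary):
    """d(x, B): Euclidean distance from a point to its nearest boundary point."""
    return min(math.dist(point, b) for b in boundary)

def asd(b_seg, b_gs):
    """Average symmetric surface distance between two boundary point sets."""
    total = (sum(dist_to_boundary(x, b_gs) for x in b_seg)
             + sum(dist_to_boundary(y, b_seg) for y in b_gs))
    return total / (len(b_seg) + len(b_gs))

def directed_hd(seg, gs):
    """max over x in seg of min over y in gs of ||x - y||."""
    return max(dist_to_boundary(x, gs) for x in seg)

seg = {(0, 0), (0, 1)}
gs = {(0, 0), (0, 3)}
print(dice(seg, gs), asd(seg, gs), directed_hd(seg, gs), directed_hd(gs, seg))
# prints: 0.5 0.75 1.0 2.0
```

The two directed distances differ in this toy example (1.0 versus 2.0); a symmetric Hausdorff distance is commonly obtained by taking the larger of the two.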

For SRSCN, we used 5e-4 as the weight of the SR term and 1e-6 as the weight of the SC term by default. Note that better performance might be obtained if an exhaustive search for the optimal values were employed. The inputs to the networks were 2D slices of size $ 240 \times 240 $ pixels; the mini-batch size was 32; the learning rate was 0.001. We trained each model for 30 epochs. GAN was trained for 10 epochs with manual monitoring of convergence, due to its particularly expensive training. The code and models were implemented using TensorFlow [11], and the optimizer for training was Adam [12]. We used one GTX 1080 Ti GPU for training and testing. Each model required 5 to 8 h to train, and testing a subject took 2 to 3 s.

3.2 Performance of the Proposed Method

Table 1 presents the statistics of the three metrics of the proposed SRSCN. The Dice score for myocardium segmentation reaches 0.812 ± 0.105, which compares with the inter-observer Dice of 0.757 ± 0.083. Note that the mean Dice score drops to 0.758 ± 0.227 if the one failure case is included.
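The two mean Dice scores quoted above can be cross-checked with a short calculation. The sketch below only uses numbers stated in this paper (15 test cases, mean 0.812 over the 14 retained cases, mean 0.758 over all 15) and derives the Dice score that the failed case must have had; the rounding allowance is an assumption that each reported mean is accurate to three decimals.

```python
n_test = 15          # test subjects (Sect. 3.1)
mean_excl = 0.812    # mean Dice over the 14 cases, failure case excluded
mean_incl = 0.758    # mean Dice over all 15 cases

# Sum over all cases minus sum over the retained cases.
implied_failure_dice = n_test * mean_incl - (n_test - 1) * mean_excl

# Each mean may be off by up to 0.0005 due to rounding to three decimals.
rounding = 0.0005
slack = n_test * rounding + (n_test - 1) * rounding

print(f"implied Dice of the failed case: {implied_failure_dice:.3f} +/- {slack:.4f}")
```

The implied value is about 0.002, i.e., within rounding error of zero, which is consistent with the statement that all methods failed on this case.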

Table 1. Segmentation performance of SRSCN for cardiac LGE MRI.

3.3 Study of Constraints

3.3.1 Ablation Study of SRSCN

The results of the ablation study are presented in Table 2. SCN outperforms U-Net by 8% in terms of generalized Dice score. SRNN further improves Dice performance by 3%. The proposed model, which consists of both the SR and SC modules, achieves more than 13% improvement. Figure 2 visualizes three typical slices, i.e., from the apical, middle and basal ventricle, and Fig. 3 compares the distributions of Dice scores of the different methods. The segmentation improvements are evident in the ablation study.

Table 2. Dice scores of the different methods from the study of shape constraints.

3.3.2 Comparisons with Two State-of-the-Art Models

Table 2 and Fig. 3 also present the segmentation results from the two state-of-the-art deep-learning-based algorithms, i.e., ACNN [4] and GAN [13]. Compared to ACNN, SRSCN obtains a marginally better mean Dice; compared to GAN, it achieves more than 5% improvement. Compared to U-Net without shape regularization, SRSCN has significantly better Dice scores in all categories (p < 0.01).

4 Conclusion

In this work, we propose the SRSCN for cardiac segmentation of LGE MRI. SRSCN incorporates the shape and spatial priors via the SR and SC modules, respectively. The SC module is introduced as a spatial constraint for 2D slices and is formulated in the unified loss function as a multi-task learning problem. The SR module aims to maintain a realistic shape of the resulting segmentation. We have evaluated the proposed method on 45 patients and compared it with two state-of-the-art regularization schemes, i.e., ACNN and GAN. The results have demonstrated the effectiveness of the SR and SC regularization terms and shown the superiority of the segmentation performance of the proposed SRSCN over the conventional schemes.

References

1. Rajchl, M., et al.: Interactive hierarchical-flow segmentation of scar tissue from late-enhancement cardiac MR images. IEEE Trans. Med. Imaging 33(1), 159–172 (2014)
2. Dikici, E., O'Donnell, T., Setser, R., White, R.D.: Quantification of delayed enhancement MR images. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS, vol. 3216, pp. 250–257. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30135-6_31
3. Liu, J., et al.: Myocardium segmentation from DE MR using multicomponent Gaussian mixture model and coupled level set. IEEE Trans. Biomed. Eng. 64(11), 2650–2661 (2017)
4. Oktay, O., Ferrante, E., Kamnitsas, K., et al.: Anatomically constrained neural networks (ACNNs): application to cardiac image enhancement and segmentation. IEEE Trans. Med. Imaging 37(2), 384–395 (2018)
5. Goodfellow, I., Pouget-Abadie, J., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
6. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
7. Wong, K.C.L., Moradi, M., Tang, H., Syeda-Mahmood, T.: 3D segmentation with exponential logarithmic loss for highly unbalanced object sizes. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11072, pp. 612–619. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00931-1_70
8. Evgeniou, T., Micchelli, C.A., Pontil, M.: Learning multiple tasks with kernel methods. J. Mach. Learn. Res. 6(Apr), 615–637 (2005)
9. Ando, R.K., Zhang, T.: A framework for learning predictive structures from multiple tasks and unlabeled data. J. Mach. Learn. Res. 6(Nov), 1817–1853 (2005)
10. Argyriou, A., Evgeniou, T., Pontil, M.: Multi-task feature learning. In: Advances in Neural Information Processing Systems, pp. 41–48 (2007)
11. Abadi, M., Barham, P., et al.: TensorFlow: a system for large-scale machine learning. arXiv preprint arXiv:1603.04467 (2016)
12. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
13. Luc, P., Couprie, C., Chintala, S., Verbeek, J.: Semantic segmentation using adversarial networks. arXiv preprint arXiv:1611.08408 (2016)

Acknowledgement

This work was funded by the National Natural Science Foundation of China (NSFC) grant (61971142), and the Science and Technology Commission of Shanghai Municipality grant (17JC1401600).

Author information

Authors and Affiliations

School of Data Science, Fudan University, 200433, Shanghai, China
Qian Yue, Xinzhe Luo, Qing Ye, Lingchao Xu & Xiahai Zhuang

Corresponding author

Correspondence to Xiahai Zhuang.

Editor information

Editors and Affiliations

University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Dinggang Shen
University of Georgia, Athens, GA, USA
Tianming Liu
Western University, London, ON, Canada
Terry M. Peters
Yale University, New Haven, CT, USA
Lawrence H. Staib
University of Strasbourg, Illkirch, France
Caroline Essert
United Imaging Intelligence, Shanghai, China
Sean Zhou
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Pew-Thian Yap
Western University, London, ON, Canada
Ali Khan

About this paper

Cite this paper

Yue, Q., Luo, X., Ye, Q., Xu, L., Zhuang, X. (2019). Cardiac Segmentation from LGE MRI Using Deep Neural Network Incorporating Shape and Spatial Priors. In: Shen, D., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. Lecture Notes in Computer Science, vol 11765. Springer, Cham. https://doi.org/10.1007/978-3-030-32245-8_62

DOI: https://doi.org/10.1007/978-3-030-32245-8_62
Published: 10 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32244-1
Online ISBN: 978-3-030-32245-8
eBook Packages: Computer Science, Computer Science (R0)
