Abstract
It is difficult to register images involving large deformation and intensity inhomogeneity. In this paper, a new multi-channel registration algorithm using modified multi-feature mutual information (α-MI) based on the minimal spanning tree (MST) is presented. First, instead of relying on handcrafted features, a convolutional encoder-decoder network is employed to learn latent feature representations from cardiac MR images. Second, forward computation and backward propagation are performed in a supervised fashion to make the learned features more discriminative. Finally, local features containing appearance information are extracted and integrated into α-MI to achieve multi-channel registration. The proposed method has been evaluated on cardiac cine-MRI data from 100 patients. The experimental results show that features learned by the deep network are more effective than handcrafted features in guiding intra-subject registration of cardiac MR images.
Keywords
- Multi-channel image registration
- Multi-feature mutual information
- Supervised feature learning
- Convolutional encoder-decoder network
1 Introduction
Image registration is an important technique in medical image analysis [1]. Many clinical applications, such as multi-modal image fusion, radiotherapy, and computer-assisted surgery, can benefit from this technique. However, large deformation and intensity inhomogeneity pose great challenges to this procedure: standard metrics such as the sum of squared differences (SSD), the correlation coefficient (CC), and mutual information (MI) are not sufficient for intensity-based registration under these conditions.
Recently, several studies have addressed these issues with multi-channel image registration. Legg et al. [2] extracted several feature images from the original images, and subsequently incorporated them into a dissimilarity measure based on regional mutual information for multi-modal image registration. Staring et al. [3] adopted the k-nearest neighbors graph (KNNG) to implement multi-feature mutual information (α-MI) in order to register cervical MRI data. Rivaz et al. [4] introduced a self-similarity weighted α-MI using local structural information to register multiple feature images. Li et al. [5] developed an objective function based on the autocorrelation of local structure (ALOST) for intra-image registration in the presence of signal fluctuations. Guyader et al. [6] formulated multi-channel registration as a group-wise image registration problem, in which the modality independent neighborhood descriptor (MIND) was used to form the feature images.
It is critical for these methods to select discriminative features that can establish accurate anatomical correspondences between two images. Most multi-channel registration methods utilize handcrafted features, such as multi-scale derivatives or engineered descriptors, to achieve good performance. In general, handcrafted features require labor-intensive manual design for a specific task. Learning-based methods have been developed to select the best feature set from a large feature pool, adapting it to the data at hand [7]. Moreover, deep learning can automatically and hierarchically learn effective feature representations from the data. Shin et al. [8] applied stacked auto-encoders to organ identification in MR images. Chmelik et al. [9] classified lytic and sclerotic metastatic lesions in spinal 3D CT images with a deep convolutional neural network (CNN). Wu et al. [10] employed a convolutional stacked auto-encoder to identify intrinsic deep feature representations for multi-channel image registration.
In contrast, we propose an end-to-end feature learning method to improve the performance of α-MI based on the minimal spanning tree (MST). A convolutional encoder-decoder architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer is trained in a supervised fashion. Various latent features can be learned by forward computation and backward propagation. The local feature representation of the test image, extracted from the first layer of the encoder, is integrated into the α-MI metric. The proposed method is evaluated on intra-subject registration of cardiac MR images.
2 Method
2.1 α-MI Implementation Using MST
In the previous work [11], multi-channel registration of two images \( I_{f} \left( \varvec{x} \right) \) and \( I_{m} \left( \varvec{x} \right) \) can be formulated as \( \hat{\mu } = \arg \mathop {\hbox{min} }\limits_{\mu } \alpha MI\left( {T_{\mu } ;I_{f} \left( \varvec{x} \right),I_{m} \left( \varvec{x} \right)} \right) \), where \( T_{\mu } \) is the free-form deformation (FFD) model based on B-splines. Assume that \( \varvec{z}\left( {x_{i} } \right) = \left[ {z_{1} \left( {x_{i} } \right) \cdots z_{d} \left( {x_{i} } \right)} \right] \) denotes a vector of dimension \( d \) containing all feature values at point \( x_{i} \). Let \( \varvec{z}^{f} \left( {x_{i} } \right) \) be the feature vector of the fixed image at point \( x_{i} \), and \( \varvec{z}^{m} \left( {T_{\mu } \left( {x_{i} } \right)} \right) \) be that of the moving image at the transformed point \( T_{\mu } \left( {x_{i} } \right) \). Let \( \varvec{z}^{fm} \left( {x_{i} ,T_{\mu } \left( {x_{i} } \right)} \right) \) be the concatenation of the two feature vectors: \( \left[ {\varvec{z}^{f} \left( {x_{i} } \right), \varvec{z}^{m} \left( {T_{\mu } \left( {x_{i} } \right)} \right)} \right] \). Three MST graphs with \( N \) samples can be constructed by:

\[ L\left( {\varvec{z}^{c} } \right) = \mathop {\hbox{min} }\limits_{T} \sum\limits_{{\left( {i,j} \right) \in T}} {\left\| {\varvec{z}^{c} \left( {x_{i} } \right) - \varvec{z}^{c} \left( {x_{j} } \right)} \right\|^{\gamma } } ,\quad c \in \left\{ {f,m,fm} \right\} \]

where \( T \) runs over all spanning trees of the \( N \) samples, \( \left\| \cdot \right\| \) is the Euclidean distance, and \( \gamma \in \left( {0, d} \right) \). So α-MI based on MST can be expressed as:

\[ \alpha MI = \frac{1}{1 - \alpha }\left[ {\log \frac{{L\left( {\varvec{z}^{f} } \right)}}{{N^{\alpha } }} + \log \frac{{L\left( {\varvec{z}^{m} } \right)}}{{N^{\alpha } }} - \log \frac{{L\left( {\varvec{z}^{fm} } \right)}}{{N^{\alpha } }}} \right] \]

where \( \alpha = \left( {d - \gamma } \right)/d \).
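To make the construction concrete, here is a minimal NumPy/SciPy sketch of the MST length and the resulting α-MI estimate. The function names are ours, and the additive constants of the Rényi entropy estimator are dropped, as they do not affect the optimization:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_length(z, gamma):
    # z: (N, d) array of feature samples; edge weights are Euclidean
    # distances raised to the power gamma, summed over the MST edges.
    dist = squareform(pdist(z))
    mst = minimum_spanning_tree(dist)
    return float((mst.data ** gamma).sum())

def alpha_mi_mst(zf, zm, gamma):
    # MST-based alpha-MI estimate: H(f) + H(m) - H(f, m), with each
    # Renyi entropy approximated (up to constants) by log(L / N**alpha).
    n, d = zf.shape
    alpha = (d - gamma) / d
    zfm = np.hstack([zf, zm])                       # joint feature vectors
    h = lambda z: np.log(mst_length(z, gamma) / n ** alpha)
    return (h(zf) + h(zm) - h(zfm)) / (1.0 - alpha)
```

For instance, with \( d = 3 \) and \( \gamma = 1.5 \), this gives \( \alpha = 0.5 \).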
2.2 Network Architecture
The network architecture for deep feature learning, similar to the 2D U-Net [12], consists of encoding and decoding branches connected with skip connections. The encoding stage contains padded \( 3 \times 3 \) convolutions followed by rectified linear unit (ReLU) activation functions. A \( 2 \times 2 \) max-pooling operation with stride 2 is applied after every two convolutional layers, and after each downsampling the number of feature channels is doubled. In the decoding stage, a \( 2 \times 2 \) upsampling operation is applied after every two convolutional layers, and the resulting feature map is concatenated with the corresponding feature map from the encoding part. After each upsampling, the number of feature channels is halved.
The input size of the encoder-decoder architecture should be divisible by 16 and equal to the output size. At the final layer, a \( 1 \times 1 \) convolution is used to map the feature channels to the desired number of classes.
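The divisibility constraint follows from the pooling scheme: with four \( 2 \times 2 \) poolings, the input size must be a multiple of \( 2^{4} = 16 \). The bookkeeping can be traced with a small sketch (the function name is ours; it assumes four levels and 64 channels in the first layer, as used in Sect. 2.3):

```python
def encoder_shapes(size, base_channels=64, levels=4):
    # Trace (spatial size, channels) through the encoder: channels double
    # and the spatial size halves at every 2x2 max-pooling with stride 2.
    if size % (2 ** levels) != 0:
        raise ValueError(f"input size must be divisible by {2 ** levels}")
    shapes = [(size, base_channels)]
    for _ in range(levels):
        size //= 2
        shapes.append((size, shapes[-1][1] * 2))
    return shapes

print(encoder_shapes(224))
# [(224, 64), (112, 128), (56, 256), (28, 512), (14, 1024)]
```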
2.3 Feature Representation with Supervised Learning
To train the encoder-decoder network, the input images and their labels are used to optimize the weights of the convolutional layers through the softmax classifier. To handle the class imbalance between foreground and background, we adopt the weighted cross entropy as the loss function:

\[ Loss = - \sum\limits_{{x \in\Omega }} {\omega \left( x \right)\log \hat{y}_{y\left( x \right)} \left( x \right)} \]

where \( y\left( x \right) \) is the true label, \( \hat{y}_{y\left( x \right)} \left( x \right) \) is the softmax probability assigned to that label, and \( \omega\left( x \right) \) is the weight coefficient at the pixel \( x \) within the domain \( \Omega \).
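A NumPy sketch of this loss, written in one-hot form (equivalent to weighting the per-pixel log-probability of the true class); the function name is ours, and the default weights follow Sect. 3.2:

```python
import numpy as np

def weighted_cross_entropy(y_true, y_prob, w_fg=0.3, w_bg=0.1):
    # y_true: one-hot labels (H, W, C), with channel 0 as background.
    # y_prob: softmax probabilities (H, W, C).
    eps = 1e-7
    weights = np.where(y_true[..., 0] == 1, w_bg, w_fg)   # per-pixel weight
    ce = -np.sum(y_true * np.log(y_prob + eps), axis=-1)  # per-pixel CE
    return float(np.sum(weights * ce))
```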
Due to supervised learning, the global features containing semantic information are prone to be biased toward the segmentation task. Here, local features containing appearance information are extracted from the first layer of our network for multi-channel registration. Figure 1 shows an example of the 64 features from a 2D slice of a cardiac MR image. Finally, we embed 65 features (the original intensity image plus the 64 deep features) into the α-MI based on MST metric. Before registration, these features are normalized to have zero mean and unit variance. Note that feature extraction is executed in a 2D manner, while registration is performed in 3D.
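The stacking and normalization step can be sketched as follows (the helper name is ours; it assumes one 2D slice and its 64 first-layer feature maps):

```python
import numpy as np

def build_feature_stack(intensity, deep_features):
    # intensity: (H, W) image; deep_features: (H, W, 64) from the first
    # encoder layer. Stack into 65 channels and normalize each channel
    # to zero mean and unit variance.
    stack = np.concatenate([intensity[..., None], deep_features], axis=-1)
    mu = stack.mean(axis=(0, 1), keepdims=True)
    sigma = stack.std(axis=(0, 1), keepdims=True)
    return (stack - mu) / (sigma + 1e-8)
```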
3 Experiment and Result
The multi-feature mutual information using MST was implemented in the registration package elastix [13] with multi-threaded mode, which is mainly based on the Insight Toolkit. The registration experiments were run on a Windows platform with an Intel Dual Core 3.40 GHz CPU and 32.0 GB memory. A Tensorflow implementation of convolutional encoder-decoder network was trained on a Nvidia GeForce GTX 1070 GPU.
3.1 Dataset and Evaluation Method
To evaluate the performance of the proposed method, experiments were conducted on the cardiac cine-MRI training data of the ACDC challenge [14], which consists of 100 patient scans. The image spacing varies from \( 0.70 \times 0.70 \times 5 \) mm to \( 1.92 \times 1.92 \times 10 \) mm. We resampled the data to an in-plane spacing of \( 1.37 \times 1.37 \) mm, and then cropped all resampled images to an in-plane size of \( 224 \times 224 \) pixels. The manual delineations of the left ventricle (LV), the left ventricle myocardium (LVM), and the right ventricle (RV) at the end-diastolic (ED) and end-systolic (ES) phases of each patient are provided as the ground truth for quantitative evaluation.
The data were divided into a training and a validation set. The training set, comprising 80 subjects, was used to train the deep network in a slice-by-slice manner for feature extraction. On the validation set of the remaining 20 subjects, registration was performed between the images at ED and ES, yielding 40 registration results in total for evaluation. The propagated segmentations were generated by transforming the manual segmentation of the moving image to the fixed image domain using the obtained deformation field.
The Dice Similarity Coefficient (DSC) as a measure of overlap was calculated between propagated segmentation and ground truth of the fixed image. To compare two methods, a value of \( {\text{p}} < 0.05 \) in two-sided Wilcoxon tests is regarded as a statistically significant difference. The Hausdorff distance (HD) between the surface of propagated segmentation and the surface of ground truth was also used to measure the quality of registration.
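Both measures can be sketched with NumPy/SciPy (the helper names are ours; in practice the HD is computed between the surface voxel coordinates of the two segmentations):

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(a, b):
    # Dice Similarity Coefficient between two binary masks.
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hausdorff(surface_a, surface_b):
    # Symmetric Hausdorff distance between two point sets (N x dim),
    # taking the max of the two directed distances.
    return max(directed_hausdorff(surface_a, surface_b)[0],
               directed_hausdorff(surface_b, surface_a)[0])
```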
3.2 Parameter Settings
The proposed α-MI based on MST using the deep feature representation (in total 65 features, called aMI+SDF) was compared to localized MI (called LMI) [15] and α-MI based on MST with the Cartesian feature set [3] (in total 15 features, called aMI+HCF). Since cardiac MR images only show local deformations between the time phases, initial rigid registration was not necessary.
For the weighted cross entropy, we set a weight of 0.3 for the foreground classes and 0.1 for the background class. To train the encoder-decoder network, we used the Adam optimizer with a learning rate of \( 1.0 \times 10^{ - 3} \), 60 epochs, and a batch size of 4.
For all experiments on intra-subject registration, a multiresolution scheme using Gaussian smoothing was applied, with scales σ = 4.0, 2.0, and 1.0 voxels in the x and y directions and σ = 2.0, 1.0, and 0.5 voxels in the z direction. As the transformation model, parameterized B-splines with grid spacings of 20, 10, and 5 mm were employed for the three resolution levels, respectively.
For LMI, a local region of \( 50 \times 50 \times 25 \) mm was randomly selected. For parameter optimization, A = 200, τ = 0.6, a = 2000, and 2000 iterations were set, with the number of random samples set to N = 2000. For aMI+HCF and aMI+SDF, A = 50, τ = 0.602, a = 2000, and 600 iterations were set, with the number of random samples set to N = 5000.
In the multi-feature mutual information, k-d trees with a standard splitting rule, a bucket size of 50, and an error-bound value of 10.0 were selected, and the k = 20 nearest neighbors were used. In addition, the α value was set to 0.99.
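For orientation, these settings map onto an elastix parameter file roughly as follows. This is an illustrative sketch, not the authors' actual file: the stock elastix metric shown is the KNN-graph α-MI, whereas the MST-based metric of [11] is a custom component; the remaining parameter names are standard elastix options.

```
(Metric "KNNGraphAlphaMutualInformation")  // MST variant of [11] is a custom component
(Alpha 0.99)
(TreeType "KDTree")
(BucketSize 50)
(ErrorBound 10.0)
(KNearestNeighbours 20)

(Optimizer "StandardGradientDescent")      // step size a_k = a / (A + k + 1)^tau
(SP_a 2000.0)
(SP_A 50.0)
(SP_alpha 0.602)
(MaximumNumberOfIterations 600)

(ImageSampler "RandomCoordinate")
(NumberOfSpatialSamples 5000)

(Transform "BSplineTransform")
(FinalGridSpacingInPhysicalUnits 5.0 5.0 5.0)
(GridSpacingSchedule 4.0 2.0 1.0)          // 20, 10, 5 mm over three levels

(NumberOfResolutions 3)
(FixedImagePyramid "FixedSmoothingImagePyramid")
(ImagePyramidSchedule 8 8 4  4 4 2  2 2 1) // sigma = schedule / 2 voxels
```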
3.3 Registration Accuracy
The boxplots of the overlap scores for the three methods are shown in Fig. 2. It is clear that the registration quality of LMI is the worst. Compared to aMI+HCF, the median overlap of aMI+SDF increases significantly from 0.898 to 0.921 (\( {\text{p}} = 2.70 \times 10^{ - 3} \)) for the LV, from 0.781 to 0.822 (\( {\text{p}} = 4.57 \times 10^{ - 6} \)) for the LVM, and from 0.775 to 0.813 (\( {\text{p}} = 1.92 \times 10^{ - 5} \)) for the RV. The overall mean and standard deviation of the measures are summarized in Table 1. The same trend can be found in the HD measure, where the median HD of aMI+SDF for the LV is as low as 9.171 mm. Figure 3 displays a typical example of the registration results. It can be observed that aMI+SDF performs much better than aMI+HCF on these anatomical structures.
4 Conclusion
In this paper, we have presented a multi-channel registration algorithm for cardiac MR images. To make the feature representation more robust to large appearance variations of cardiac substructures, we propose to extract features with a convolutional encoder-decoder network. The features, learned in a supervised fashion, are then incorporated into the multi-feature mutual information framework. In experiments on cardiac cine-MRI data, the proposed method demonstrates superior intra-subject registration accuracy.
References
1. Aristeidis, S., Christos, D., Nikos, P.: Deformable medical image registration: a survey. IEEE Trans. Med. Imag. 32(7), 1153–1190 (2013)
2. Legg, P.A., Rosin, P.L., Marshall, D., Morgan, J.E.: A robust solution to multi-modal image registration by combining mutual information with multi-scale derivatives. In: Yang, G.-Z., Hawkes, D., Rueckert, D., Noble, A., Taylor, C. (eds.) MICCAI 2009. LNCS, vol. 5761, pp. 616–623. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04268-3_76
3. Staring, M., Heide, U.A., Klein, S., Viergever, M.A., Pluim, J.P.W.: Registration of cervical MRI using multifeature mutual information. IEEE Trans. Med. Imag. 28(9), 1412–1421 (2009)
4. Rivaz, H., Karimaghaloo, Z., Collins, D.L.: Self-similarity weighted mutual information: a new nonrigid image registration metric. Med. Image Anal. 18, 343–358 (2014)
5. Li, Z., Mahapatra, D., Tielbeek, J.A.W., Stoker, J., Vliet, L.J., Vos, F.M.: Image registration based on autocorrelation of local structure. IEEE Trans. Med. Imag. 35(1), 63–75 (2016)
6. Guyader, J.M., et al.: Groupwise multichannel image registration. IEEE J. Biomed. Health Inform. 23(3), 1171–1180 (2019)
7. Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)
8. Shin, H., Orton, M.R., Collins, D.J., Doran, S.J., Leach, M.O.: Stacked autoencoders for unsupervised feature learning and multiple organ detection in a pilot study using 4D patient data. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1930–1943 (2013)
9. Chmelik, J., et al.: Deep convolutional neural network-based segmentation and classification of difficult to define metastatic spinal lesions in 3D CT data. Med. Image Anal. 49, 76–88 (2018)
10. Wu, G.R., Kim, M.J., Wang, Q., Munsell, B.C., Shen, D.G.: Scalable high-performance image registration framework by unsupervised deep feature representations learning. IEEE Trans. Biomed. Eng. 63(7), 1505–1516 (2016)
11. Lu, X.S., Zha, Y.F., Qiao, Y.C., Wang, D.F.: Feature-based deformable registration using minimal spanning tree for prostate MR segmentation. IEEE Access 7, 138645–138656 (2019)
12. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
13. Shamonin, D.P., Bron, E.E., Lelieveldt, B.P.F., Smits, M., Klein, S., Staring, M.: Fast parallel image registration on CPU and GPU for diagnostic classification of Alzheimer’s disease. Front. Neuroinform. 7(50), 1–15 (2014)
14. Bernard, O., et al.: Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Trans. Med. Imag. 37(11), 2514–2525 (2018)
15. Klein, S., Heide, U.A., Lips, I.M., Vulpen, M., Staring, M., Pluim, J.P.: Automatic segmentation of the prostate in 3D MR images by atlas matching using localized mutual information. Med. Phys. 35(4), 1407–1417 (2008)
© 2020 Springer Nature Switzerland AG
Lu, X., Qiao, Y. (2020). Multi-channel Image Registration of Cardiac MR Using Supervised Feature Learning with Convolutional Encoder-Decoder Network. In: Špiclin, Ž., McClelland, J., Kybic, J., Goksel, O. (eds) Biomedical Image Registration. WBIR 2020. Lecture Notes in Computer Science(), vol 12120. Springer, Cham. https://doi.org/10.1007/978-3-030-50120-4_10