Abstract
This paper shows that deep modelling of subtle changes in cardiac motion can help in the automated diagnosis of early-onset cardiac disease. We model left ventricular (LV) cardiac motion in MRI sequences with a hybrid spatio-temporal network. The model takes temporal data over long time periods as input and delivers a dense displacement field (DDF) for regional analysis of LV function. A segmentation mask of the end-diastole (ED) frame is deformed by the predicted DDF, from which the regional measures of LV function, namely endocardial radius, thickness, circumferential strain (Ecc) and radial strain (Err), are estimated. Cardiac motion is estimated over MR cine loops. We compare the proposed technique to two other deep learning-based approaches and show that the proposed approach achieves promising predicted DDFs. Predicted DDFs are estimated on imaging data from healthy volunteers and patients with primary pulmonary hypertension from the UK Biobank. Experiments demonstrate that the proposed method performs well in obtaining estimates of endocardial radii as cardiac motion-characteristic features for regional LV analysis.
This work is supported by the SmartHeart EPSRC grant EP/P001009/1.
Keywords
- Cardiac MRI sequences
- Cardiac motion
- U-Net
- Convolutional LSTM
- Dense displacement field
- Left ventricular function
1 Introduction
Magnetic resonance imaging (MRI) is widely used to assess cardiac function for cardiovascular disease diagnosis. Cardiac motion estimation highlights regional deformation of the myocardium, which is related to the severity of cardiovascular disease. Cardiac motion can be determined from the displacement field in MRI. Moreover, cardiac motion estimation can be regarded as an image registration problem. Shen et al. [8] proposed a spatio-temporal 4D deformable registration method for cardiac motion estimation in MR image sequences. De Craene et al. [3] estimated motion and strain in 3D echocardiography by finding the 4D velocity field with spatio-temporal B-Spline kernels.
In recent years, deep learning-based methods have achieved promising results for deformable registration-based motion characterization. Zheng et al. [10] estimated cardiac motion using a variant of U-Net [7] with a semi-supervised learning strategy. Qin et al. [6] suggested a Siamese style recurrent spatial transformer network for cardiac motion estimation, to guide cardiac segmentation. Both of these works required expert manual segmentation of the left ventricle.
A major challenge is to estimate the effect of cardiac functional changes via automated cardiac motion analysis. The early onset of symptoms already causes an increased strain on the heart, but the strain-related changes are not always easy to see by eye until more significant cardiac structural changes occur. Motion-characteristic features, such as time series of the endocardial radius, thickness, circumferential strain (Ecc) and radial strain (Err) are related to cardiac disease and they are easy to explain as characteristics of pathological cardiac motion. Motion analysis is therefore also useful for early stage characterization of disease.
In this paper, we propose a deep learning-based architecture with a self-supervised strategy to characterise the spatio-temporal patterns of left ventricular (LV) cardiac motion in cardiac MR cine loops for improving the characterization of heart conditions. We compare the proposed method with two other state-of-the-art methods. Specifically, we extract motion-characteristic features and time series of the endocardial radius, thickness, Ecc and Err, based on the output dense displacement field (DDF) of the proposed method, and compare these features between a healthy group and a primary pulmonary hypertension (PPH) pathological group.
Contributions. The contributions of this work are as follows. (1) To our knowledge, this is the first attempt to exploit \(2D +t\) spatio-temporal patterns with convolutional Long Short-Term Memory (ConvLSTM) in LV cardiac motion with a self-supervised strategy. (2) The predicted DDF of this method can be used to determine motion-characteristic features, namely a time series of the endocardial radius, thickness, Ecc and Err. These features are able to characterize different cardiac motion in health and pathologies. (3) We demonstrate that spatio-temporal patterns achieve better performance than the spatial-only pattern for cardiac motion estimation and regional analysis of LV function.
2 Spatio-Temporal Network
In this paper, cardiac motion estimation is considered as an image registration problem. The goal is then to estimate the spatial transformation of each point in the cardiac structure over the whole cardiac cycle. Let \(\{I_{t}\}_{t=0,1,2,...,N}\) indicate the cardiac MR cine loop frames, where N is the total number of frames. Each pixel-wise point \(x_{0}\) from the end-diastole (ED) frame \(I_{0}\) corresponds to a certain point \(x_{t}\) at the time frame t. In image registration, \(I_{t}(x_t)\) and \(I_{0}(T(x_{0}))\) denote the pixel values at the same physical location. The spatial transformation T is represented by a DDF, described as \(u_{t}\), where \(u_{t}(x_{0}) = x_{t} - x_{0}\).
We model a function \(g_{\theta }(I_{0}, I_{t}) = u_{t}\) using a deep learning architecture, where \(\theta \) denotes the parameters of the architecture. These parameters are learned by optimising an objective that combines the similarity of the source-target image pair \((I_{0}, I_{t})\) with a spatio-temporal smoothness constraint. We estimate the motion from the ED frame \(I_{0}\) to all other time frames \(I_{t}\), and generate a new image sequence \(\{I^{'}_{t}\}_{t=0,1,2,...,N}\). The complete pipeline of the proposed architecture is presented in Fig. 1, and is described in Sect. 2.1.
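To make the role of the DDF concrete, the following is a minimal NumPy sketch of how a frame \(I^{'}_{t}\) could be generated from \(I_{0}\) and a displacement field by bilinear sampling. It is an illustration under stated assumptions, not the implementation used in this work: the function name `warp_bilinear`, the channel order (x displacement first, then y) and the backward-sampling convention (each output pixel reads the source at its own position plus the displacement) are all hypothetical choices.

```python
import numpy as np

def warp_bilinear(source, ddf):
    """Warp a 2D image with a dense displacement field (bilinear sampling).

    source: (H, W) array.
    ddf:    (2, H, W) array in pixels; ddf[0] is the x (column) displacement
            and ddf[1] the y (row) displacement (assumed channel order).
    Assumed convention: output[y, x] = source at (y + ddf[1], x + ddf[0]).
    """
    source = np.asarray(source, dtype=float)
    h, w = source.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Sampling positions, clamped so that out-of-image samples reuse the border.
    sx = np.clip(xs + ddf[0], 0, w - 1)
    sy = np.clip(ys + ddf[1], 0, h - 1)

    # Integer neighbours and fractional offsets for the bilinear weights.
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = sx - x0
    wy = sy - y0

    top = (1 - wx) * source[y0, x0] + wx * source[y0, x1]
    bottom = (1 - wx) * source[y1, x0] + wx * source[y1, x1]
    return (1 - wy) * top + wy * bottom
```

A zero displacement field returns the source unchanged, and a constant x displacement of one pixel shifts the image content by one column, which is a quick sanity check on the sign convention.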
2.1 Network Architecture
Our deep learning architecture is a combination of a fully convolutional network (FCN) and a recurrent neural network (RNN). We describe the function of the FCN and RNN as follows.
U-Net. The FCN component explores the spatial information in each 2D slice (intra-slice information). U-Net [7] is employed due to its well-known ability to represent image features for biomedical image segmentation. It consists of encoder and decoder parts with skip connections. Details of the U-Net are shown in the middle part of Fig. 1. A sequence of source-target image pairs \(\{(I_{0}, I_{t})\}_{t=1,2,3,...,N}\) is input to the U-Net. Each image pair is concatenated into a 2-channel 2D image. The encoder uses blocks of 2D convolutional layers (\(3 \times 3\) kernel size), 2D batch normalization, a rectified linear unit (ReLU) and a 2D max pooling layer (\(2 \times 2\) window size). The decoder uses blocks of transposed 2D convolutional layers (\(2 \times 2\) kernel size), 2D batch normalization and ReLU. The output of the U-Net is an initial dense displacement field (DDF), which is passed to the ConvLSTM, where it is used to update the hidden states.
Convolutional LSTMs. The RNN component learns temporal relationships along the timeline (inter-slice information). We stack multiple convolutional LSTMs (ConvLSTM) [9], in order to increase the likelihood of detecting long-term dependencies of the cardiac motion over the cardiac cycle. We ran our architecture with different numbers of layers and kernel sizes in the ConvLSTM. Based on the validation performance, we stack 2 ConvLSTM layers with a 3-pixel kernel size in each layer. The number of input channels and the number of hidden channels of the ConvLSTM are each 2, where information in one channel represents the displacement in the x direction and in the other represents the displacement in the y direction.
The ConvLSTM can learn which information to keep in the long-term state, which information to drop, and which information to read. We present the details of the ConvLSTM in Fig. 1. Let the current input be \(X_{T}\) and the previous hidden state be \(H_{T-1} \). Then,
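Assuming the standard ConvLSTM formulation of [9] without peephole connections, which is consistent with the filters and biases named in the text (the exact variant is an assumption here, so this should be read as a sketch), and writing \(\sigma \) for the logistic sigmoid and \(C_{T-1}\) for the previous long-term state, the gate updates take the form

\[
\begin{aligned}
I_{T} &= \sigma \left( W_{XI} * X_{T} + W_{HI} * H_{T-1} + B_{I} \right) \\
F_{T} &= \sigma \left( W_{XF} * X_{T} + W_{HF} * H_{T-1} + B_{F} \right) \\
O_{T} &= \sigma \left( W_{XO} * X_{T} + W_{HO} * H_{T-1} + B_{O} \right) \\
C_{T} &= F_{T} \circ C_{T-1} + I_{T} \circ \tanh \left( W_{XC} * X_{T} + W_{HC} * H_{T-1} + B_{C} \right) \\
H_{T} &= O_{T} \circ \tanh \left( C_{T} \right)
\end{aligned}
\]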
Here \( *\) is the convolution operator and \( \circ \) is the Hadamard product (also called the element-wise product). \(W_{XI}, W_{XF}, W_{XO}, W_{XC}, W_{HI}, W_{HF}, W_{HO}\) and \( W_{HC} \) represent the convolutional filters. \(B_{I}, B_{F}, B_{O} \) and \(B_{C}\) are the corresponding biases. The input gate \( I_{T}\) controls which part of the new input information will be kept in the long-term state. The forget gate \(F_{T}\) decides which part of the long-term state is removed. The output gate \(O_{T}\) decides which part of the long-term state is read. \(C_{T} \) is the long-term state. The short-term state \( H_{T}\) is the motion state in the cardiac MR cine loop frames and is the output of the network, i.e. the predicted DDF.
Loss Function. The loss function (L) is defined as the sum of an image intensity-based similarity loss \(L_m\) and a regularisation loss \(L_s\) on the predicted DDF displacements. Namely, \(\varvec{L} = \varvec{L_m} + \varvec{L_s}\). \(\varvec{L_m}\) measures the mean squared error between each pixel in the registered source image \(I_{t}^{'}\) and the target image \(I_{t}\): \(\varvec{L_m} = \frac{1}{N}\sum _{t=1}^{N}(I_t-I_{t}^{'})^2\). Following the spatial transformer network [4], \(I_{0}\) is transformed to \(I_{t}^{'}\) using bilinear sampling. The second term, \(L_s\), is the spatial and temporal smoothness penalty, which controls the variation of displacements over space and time via an approximated Huber loss [6]. Mathematically, \(\varvec{L_s} = \lambda _{1} \varvec{L_{spatial}} + \lambda _{2} \varvec{L_{temporal}}\), where \(\varvec{L_{spatial}}\) penalises first-order spatial derivatives and \( \varvec{L_{temporal}}\) penalises first-order temporal derivatives of the DDF. \(\lambda _{1} \) and \(\lambda _{2} \) are regularization parameters which are chosen empirically.
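The loss can be sketched in NumPy as follows. This is a hedged illustration rather than the training code: the function names are hypothetical, the approximated Huber penalty is assumed to take the form \(\sqrt{\epsilon + \sum \delta ^{2}}\) with \(\epsilon = 0.01\) (a value not given in the text), derivatives are approximated by forward differences, and the similarity term is averaged over pixels as well as frames. The default weights are the \(\lambda _{1}\) and \(\lambda _{2}\) values reported in Sect. 3.2.

```python
import numpy as np

def approx_huber(squared_sum, eps=0.01):
    # Assumed form of the approximated Huber penalty: mean of sqrt(eps + sum of squares).
    return float(np.mean(np.sqrt(eps + squared_sum)))

def registration_loss(targets, warped, ddfs, lam1=0.002, lam2=0.0002, eps=0.01):
    """Similarity plus spatio-temporal smoothness loss.

    targets, warped: (N, H, W) target frames I_t and registered frames I'_t.
    ddfs:            (N, 2, H, W) predicted displacement fields.
    Returns (total, l_m, l_spatial, l_temporal).
    """
    targets = np.asarray(targets, dtype=float)
    warped = np.asarray(warped, dtype=float)
    ddfs = np.asarray(ddfs, dtype=float)

    # L_m: mean squared intensity error over all frames and pixels.
    l_m = float(np.mean((targets - warped) ** 2))

    # First-order spatial differences, cropped to a common (H-1, W-1) grid.
    dx = np.diff(ddfs, axis=3)[:, :, :-1, :]
    dy = np.diff(ddfs, axis=2)[:, :, :, :-1]
    l_spatial = approx_huber((dx ** 2 + dy ** 2).sum(axis=1), eps)

    # First-order temporal differences between consecutive frames.
    if ddfs.shape[0] > 1:
        dt = np.diff(ddfs, axis=0)
        l_temporal = approx_huber((dt ** 2).sum(axis=1), eps)
    else:
        l_temporal = 0.0

    total = l_m + lam1 * l_spatial + lam2 * l_temporal
    return total, l_m, l_spatial, l_temporal
```

With this form of the penalty, a perfectly smooth field still contributes \(\sqrt{\epsilon }\) to each smoothness term, so only differences in the loss are meaningful, not its absolute value.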
2.2 The Regional Analysis of Left Ventricular Function
The high-level steps in regional analysis of LV function are summarised in Fig. 2. The segmentation mask of the ED frame is deformed to another frame based on the predicted DDF. Automatic post-processing is applied to identify the LV endocardial and epicardial borders. To smooth the borders of deformed masks on the mid-slice 6-segments model of the 17-Segment AHA model [2], we performed a morphological closing operation (kernel size = 2) on them.
We divide the resulting predicted myocardium mask into segments based on the 17-Segment Model (AHA). Firstly, we find the barycenters of the LV and the right ventricle (RV) in the middle slice of the short-axis view image. Secondly, we define the straight line between these two points as the initial line. Thirdly, we rotate this initial line around the barycenter of the LV by 60, 120, 180, 240 and \(300^\circ \) to divide the middle slice into 6 segments. Morphological transformations and barycenter location are implemented using OpenCV. The time series of the endocardial radius, thickness, Ecc and Err are measured in these 6 segments. In each segment, the mean and standard deviation of each measure are reported. To this end, we sample all the points on the endocardial border for the endocardial radius, and 5 points, one every \(12^\circ \), for the thickness and Err. Considering the small perimeter on the end-systolic (ES) frame, we divide the endocardial border into 3 sets instead of 5 sets for Ecc.
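The segment assignment described above can be sketched with the standard library alone. The function names, the zero-based segment index and the direction of rotation are assumptions made for illustration; in particular, the index returned here is not claimed to match the AHA segment labels, and in image coordinates (y pointing down) the rotation direction appears mirrored.

```python
import math
from statistics import mean, pstdev

def segment_index(point, lv_center, rv_center, n_segments=6):
    """Index of the angular sector containing `point`.

    The sector boundaries are the LV-to-RV line rotated about the LV
    barycenter in steps of 360 / n_segments degrees. Points are (x, y).
    """
    ref = math.atan2(rv_center[1] - lv_center[1], rv_center[0] - lv_center[0])
    ang = math.atan2(point[1] - lv_center[1], point[0] - lv_center[0])
    rel = (ang - ref) % (2 * math.pi)
    # The final modulo guards against rel rounding up to exactly 2*pi.
    return int(rel // (2 * math.pi / n_segments)) % n_segments

def radius_stats_per_segment(border_points, lv_center, rv_center, n_segments=6):
    """Mean and (population) standard deviation of the radius in each sector."""
    radii = {k: [] for k in range(n_segments)}
    for p in border_points:
        k = segment_index(p, lv_center, rv_center, n_segments)
        radii[k].append(math.hypot(p[0] - lv_center[0], p[1] - lv_center[1]))
    # Sectors without any border point are omitted from the result.
    return {k: (mean(v), pstdev(v)) for k, v in radii.items() if v}
```

Here `border_points` would be the endocardial border pixels of the deformed mask, so calling `radius_stats_per_segment` once per frame yields the per-segment radius time series.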
Strain Computation. Left ventricular strain indicates the deformation of the myocardium over the whole cardiac cycle and is shown in percentages. In each time frame T, circumferential strain (Ecc) and radial strain (Err) are computed as \(E = \frac{d_{T} - d_{ED}}{ d_{ED}} \times 100\% \). Here \(d_{ED}\) is the length on the ED frame, \(d_{T}\) is the length on the time frame T. In each sample, we choose the arc length of the endocardial border for the Ecc computation and LV wall thickness for the Err computation.
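As a worked illustration of the strain formula, the short Python sketch below computes the percentage strain for a single length and for a time series. The function names and the numbers in the usage note are hypothetical and are not measurements from this study.

```python
def strain_percent(d_t, d_ed):
    """Strain E = (d_T - d_ED) / d_ED * 100, in percent."""
    if d_ed == 0:
        raise ValueError("ED length must be non-zero")
    return (d_t - d_ed) / d_ed * 100.0

def strain_series(lengths):
    """Strain of every frame relative to the first (ED) frame."""
    d_ed = lengths[0]
    return [strain_percent(d, d_ed) for d in lengths]
```

For example, a wall thickness that grows from 6 pixels at ED to 9 pixels at frame T gives a radial strain of +50%, while an endocardial arc length that shortens from 20 to 15 pixels gives a circumferential strain of -25%.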
3 Experiments
3.1 Data Acquisition
Short-axis view cardiac MR image sequences from the UK BioBank (see Note 1) were used in this study. The CMR images were acquired on a 1.5 T scanner (MAGNETOM Aera, Syngo Platform VD13A, Siemens Healthcare, Erlangen, Germany). A stack of cine balanced steady-state free precession (bSSFP) short-axis images, around 12 slices, covers the entire left and right ventricles. The in-plane resolution is \(1.8\times 1.8\) mm\(^{2}\), while the slice thickness is 8.0 mm and the slice gap is 2.0 mm. Each sequence contains 50 consecutive time frames per cardiac cycle. We randomly selected image sequences of 450 subjects for training, 47 subjects for validation and 100 subjects for testing.
3.2 Implementation Details
Pre-processing. For training and testing the deep learning architecture, all images were cropped to a size of \(192 \times 192\) pixels because of GPU limitations, and intensity normalisation was applied to the cropped images. The segmentation mask of the LV endocardial and epicardial borders and the right ventricular (RV) endocardial borders at the ED frame was generated using the FCN method proposed by Bai et al. [1] and used to quantify cardiac motion.
Training. The model is trained for 150 epochs using Adaptive Moment Estimation (Adam) optimisation [5] with a learning rate of 0.0001 and a batch size of 1. For the smoothness penalty of the loss, we set \(\lambda _{1} \) to 0.002 and \(\lambda _{2}\) to 0.0002 based on algorithm performance on the validation dataset. Further, we randomly select one frame in the selected slice to be frame \(I_{0}\). We set the input image sequence length to 20 frames due to GPU memory limitations. The proposed network was implemented in Python 3.7 with PyTorch. All experiments were run on a GeForce GTX 1080 Ti GPU (10 GB).
3.3 Evaluation Metrics
To quantify the similarity between the predicted image and the target image, we use three image metrics: the normalised root mean-squared error (NRMSE), the mean structural similarity index (MSSIM) and the peak signal to noise ratio (PSNR). A two-sided Wilcoxon signed rank test is used to test whether there is a statistically significant difference in these three metrics among the three methods.
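Two of these metrics are simple enough to sketch directly in NumPy. The normalisation is an assumption: the text does not state how the RMSE is normalised or which data range is used for the PSNR, so the sketch below uses the intensity range of the target image for both, and the function names are illustrative. MSSIM is omitted because it requires a windowed computation.

```python
import numpy as np

def nrmse(target, pred):
    """RMSE divided by the target intensity range (assumed normalisation).

    Assumes the target image is not constant.
    """
    target = np.asarray(target, dtype=float)
    pred = np.asarray(pred, dtype=float)
    rmse = np.sqrt(np.mean((target - pred) ** 2))
    return float(rmse / (target.max() - target.min()))

def psnr(target, pred):
    """Peak signal to noise ratio in dB, using the target intensity range as peak."""
    target = np.asarray(target, dtype=float)
    pred = np.asarray(pred, dtype=float)
    mse = np.mean((target - pred) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    data_range = target.max() - target.min()
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

Under these definitions a lower NRMSE and a higher PSNR both indicate that the predicted frame is closer to the target frame.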
4 Results
4.1 Quantitative Results
Table 1 summarizes the comparative results on the MRI sequences and the ES frame between the proposed and other methods. The proposed method outperforms both Qin et al.’s method [6] and U-Net [7]. It achieves an NRMSE of \(0.053 \pm 0.017\), MSSIM of \( 0.851 \pm 0.049\), and PSNR of \(35.391 \pm 2.976\) on the MRI sequences, and an NRMSE of \(0.065 \pm 0.012\), MSSIM of \(0.836 \pm 0.036\), and PSNR of \(33.399 \pm 1.120\) on the ES frame. U-Net yielded the lowest MSSIM and PSNR values and the highest NRMSE value on both the MRI sequences and the ES frame among the evaluated approaches. Using a two-sided Wilcoxon signed rank test, the improvements over Qin et al.’s method and U-Net were statistically significant (\(p < 0.05\)) for all the measurements.
4.2 Representative Examples
Cardiac Motion Estimation. Figure 3 shows an example comparison of cardiac motion estimation on the 19th frame (ES) of the MRI sequence among the proposed method, Qin et al.’s method [6] and U-Net [7], the latter using spatial-only patterns. The proposed method provides a higher MSSIM (0.853) and PSNR (32.556) and a lower NRMSE (0.090) than the other methods on the predicted ES frame. The displacement image visualizes the DDF. Different colours describe the different motion directions, and the colour intensity expresses the magnitude of the displacement. The proposed method estimates higher displacements (visualised as a stronger colour in Fig. 3, middle column) compared to the other methods, especially at the centre area of the LV blood pool. The U-Net seems to be less accurate, because it has strong background noise (shown in green) compared to the proposed method and Qin et al.’s method. The displacement error maps show that the U-Net has the largest difference at the LV and the surrounding area, followed by the method of Qin et al.
Left Ventricular Function Evaluation. Our dataset does not include manual image segmentations. In order to perform regional analysis of LV function, we ran Bai et al.’s algorithm [1] to obtain the segmented ED frame. We then warped the segmented ED frame to the other frames in the sequence. Table 2 and Fig. 4 show an example of a healthy volunteer and a primary pulmonary hypertension (PPH) patient analysed with the proposed method. Figure 4 shows the time series of the endocardial radius, thickness, Err and Ecc in the six segments of the myocardium estimated for the healthy volunteer and the PPH patient. Compared to the healthy volunteer, the LV of the PPH patient has poor contraction over the whole cardiac cycle, and as a result, the endocardial radius of the PPH patient is larger than that of the healthy volunteer. For instance, the endocardial radius (orange) of segment 1 contracts less. Table 2 shows that on the 19th frame (ES), the mean radius of segment 1 is 10.69 pixels for the PPH patient, while it is 9.79 pixels for the healthy volunteer. In clinical practice, the endocardial radius should take on its smallest value over the cardiac cycle on the ES frame, because the volume of the LV blood pool reaches its minimum then. Moreover, the LV wall thickness in all six segments is smaller for the PPH patient than for the healthy volunteer. Due to the reduced thickness, we conclude that this left ventricle exhibits atrophy.
5 Discussion
In this work we have proposed a deep learning-based approach to cardiac MR motion analysis that uses a self-supervised paradigm to learn spatio-temporal features in cardiac MR cine loops. The results show the ability of the proposed approach to capture spatio-temporal patterns and predict a dense displacement field (DDF) over a full cardiac cycle. The proposed method has higher accuracy than the method of Qin et al. and U-Net, which we attribute to the use of spatio-temporal features. According to our experiments, the best DDF results are obtained when we stack 2 ConvLSTM layers with a 3-pixel kernel size in each layer.
The predicted DDF is employed to deform an ED myocardium mask to other frames and perform regional LV endocardial radius, thickness, Ecc and Err time-series analysis. The results show the potential of the proposed approach to evaluate the clinical parameters for cardiovascular diseases. Currently, we do not use interpolation to smooth feature time series. In our experiments, we find that it is not necessary to smooth the curve. We can use the unsmoothed curve of the endocardial radius to explain the abnormal motion phenomenon in the PPH pathological group.
This work has some limitations. The UK BioBank consists mainly of healthy volunteers and contains only a small number of PPH patients. The model may therefore not represent well the motion and strain patterns typically seen in PPH patients.
6 Conclusion
We present a novel spatio-temporal network to characterise cardiac motion, visualise the dense displacement field and explain motion-characteristic features in a healthy group and a pathological group. The model learns meaningful spatio-temporal patterns of the cardiac motion that can be used for LV regional function analysis. Future work will extend this method to analyse the basal, mid-cavity and apical slices of the LV. The motion and strain analysis method is not disease-specific and could be extended to other cardiac conditions such as ischaemic heart disease, assuming suitable training examples are available.
Notes
1. UK BioBank. https://www.ukbiobank.ac.uk/.
References
1. Bai, W., et al.: Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. J. Cardiovasc. Magn. Reson. 20(1), 65 (2018)
2. Cerqueira, M.D., et al.: Standardized myocardial segmentation and nomenclature for tomographic imaging of the heart: a statement for healthcare professionals from the Cardiac Imaging Committee of the Council on Clinical Cardiology of the American Heart Association. Circulation 105(4), 539–542 (2002)
3. De Craene, M., et al.: Temporal diffeomorphic free-form deformation: application to motion and strain estimation from 3D echocardiography. Med. Image Anal. 16(2), 427–450 (2012)
4. Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks. In: Advances in Neural Information Processing Systems, pp. 2017–2025 (2015)
5. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
6. Qin, C., et al.: Joint learning of motion estimation and segmentation for cardiac MR image sequences. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11071, pp. 472–480. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00934-2_53
7. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
8. Shen, D., Sundar, H., Xue, Z., Fan, Y., Litt, H.: Consistent estimation of cardiac motions by 4D image registration. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3750, pp. 902–910. Springer, Heidelberg (2005). https://doi.org/10.1007/11566489_111
9. Xingjian, S., Chen, Z., Wang, H., Yeung, D.Y., Wong, W.K., Woo, W.C.: Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Advances in Neural Information Processing Systems, pp. 802–810 (2015)
10. Zheng, Q., Delingette, H., Ayache, N.: Explainable cardiac pathology classification on cine MRI with motion characterization by semi-supervised learning of apparent flow. Med. Image Anal. 56, 80–95 (2019)
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Lu, P., Qiu, H., Qin, C., Bai, W., Rueckert, D., Noble, J.A. (2020). Going Deeper into Cardiac Motion Analysis to Model Fine Spatio-Temporal Features. In: Papież, B., Namburete, A., Yaqub, M., Noble, J. (eds) Medical Image Understanding and Analysis. MIUA 2020. Communications in Computer and Information Science, vol 1248. Springer, Cham. https://doi.org/10.1007/978-3-030-52791-4_23
DOI: https://doi.org/10.1007/978-3-030-52791-4_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-52790-7
Online ISBN: 978-3-030-52791-4