
Introduction

The physics of the data acquisition process in magnetic resonance imaging (MRI) causes a substantial slowdown compared to other medical imaging modalities, leading to prolonged scan times. This is further exacerbated in cardiac MRI (CMR), where additional considerations due to cardiac and respiratory motion further increase scan durations. Therefore, acceleration of MRI acquisitions remains an active topic of research, with many clinical applications. Acceleration is typically achieved by acquiring less data, i.e., subsampling, and using image reconstruction or enhancement strategies that exploit redundancies in the acquisition or the images of interest to produce high-quality images from these subsampled raw data.

Over the past two decades, numerous approaches, including parallel imaging (PI) [25, 60] and compressed sensing (CS) [50], have been proposed for accelerating MRI. These have also found many applications in cardiac MRI [2, 4, 5, 6, 8, 26, 53, 57, 58, 73, 74, 75, 80]. Recently, deep learning (DL) methods have received substantial interest due to their superior image quality. In this chapter, we will review DL-based image enhancement and reconstruction strategies for cardiac MRI, with a particular focus on fast imaging and artifact reduction.

Basics of Deep Learning for Image Enhancement and Reconstruction

The success of DL is based on recent advances in model architectures, the availability of large training databases, and the availability of powerful GPUs. In this section, we introduce the basic tools to build powerful reconstruction networks. An overview of different types of reconstruction networks is depicted in Fig. 13.1; these are further discussed in sections “Deep Learning for Image/k-Space Enhancement in CMR” and “Deep Learning with Algorithm Unrolling in CMR Reconstruction”. Since it may be challenging to obtain fully sampled data for cardiac applications, the networks can be trained in a supervised, self-supervised, or unsupervised way, depending on the available training data. In the following, we briefly outline important building blocks and base architectures for reconstruction networks and give an overview of training data and network training. For more details on these topics, we refer the interested reader to survey papers [21, 41, 49] and book chapters [27].

Fig. 13.1

Overview of different methods for cardiac image reconstruction. In the first method, a neural network (NN) is trained to fill in the missing k-space lines. The reconstructed k-space is then transformed to image space using an inverse Fourier transform (IFT) and coil combined. In the second method, enhancement is performed in the image domain on the IFT image. The third method combines NN building blocks and data consistency (DC) layers, where the NN acts in the image domain, and data-related information is used in the physics-driven DC layer

Network Types

Deep convolutional neural networks (CNNs) are the cornerstone of deep model architectures. CNNs are powerful feature extractors that perform sequential operations of convolutions and nonlinear activation functions. Further improvements can be achieved by analyzing the image at multiple scales, as in UNets [64]. To downsample the image, building blocks such as strided or dilated convolutions and pooling layers are used. Skip connections are used to avoid the vanishing gradient problem during training. Upsampling is performed by using transposed convolutions and interpolation methods.

Convolution operations can be performed in N dimensions. However, the higher the dimensionality, the more memory is required. A solution to this issue is separable convolutions. These not only reduce the computational burden but also result in fewer parameters and improved performance compared to high-dimensional N-D convolutions [43, 45, 67]. In the case of 2D+t data, 2D convolutions may first be performed in the spatial domain, followed by 1D convolutions in the temporal domain.
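The parameter saving of separated convolutions can be illustrated with a simple count. The sketch below is only an illustration: the helper `conv_params`, the 3×3×3 kernel size, and the channel counts are assumptions chosen for the example and are not taken from the cited works.

```python
def conv_params(kernel_shape, c_in, c_out, bias=True):
    """Number of learnable parameters of one N-D convolution layer."""
    n = c_in * c_out
    for k in kernel_shape:
        n *= k
    return n + (c_out if bias else 0)

# Hypothetical channel counts, chosen only for illustration.
c_in, c_mid, c_out = 32, 32, 32

# One full 3D (2D+t) convolution with a 3x3x3 kernel.
full_3d = conv_params((3, 3, 3), c_in, c_out)

# Separated version: 3x3 spatial convolution followed by a 3-tap temporal one.
separated = conv_params((3, 3), c_in, c_mid) + conv_params((3,), c_mid, c_out)

print(full_3d, separated)  # 27680 12352
```

With these assumed sizes, the separated pair uses fewer than half the parameters of the full 3D convolution; the exact ratio depends on the kernel sizes and on the number of intermediate channels.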

Recurrent Neural Networks (RNNs) provide another way to process spatiotemporal sequences of arbitrary length. The individual frames are processed sequentially, and an additional variable termed hidden unit extracts and propagates information to later frames. Like CNNs, a number of RNN blocks can be sequentially applied to process the data.

Consistency to Acquired k-Space Data

The network architecture is not the only important component for processing the acquired MRI data accurately. The acquired k-space data contain a great deal of information that can support the reconstruction process. Hence, consistency with the acquired k-space data is incorporated into many reconstruction approaches. This can be done by estimating nonlinear interpolation kernels from pairs of undersampled and fully sampled k-space data as in RAKI [7], similar to the estimation of linear GRAPPA kernels [25]. Alternatively, data consistency can be achieved, as in traditional optimization approaches, by considering the data term D

$$ D\left(\mathbf{Ax},\mathbf{y}\right)=\left\Vert \mathbf{Ax}-\mathbf{y}\right\Vert_2^2 $$
(13.1)

in the optimization procedure. Here, x denotes the reconstruction, and y denotes the acquired raw k-space data. The linear MR operator A models the acquisition physics, i.e., sampling pattern, coil sensitivity maps, and the Fourier transform. In the case of non-Cartesian sampling, the operator A further includes the sampling trajectory and/or density compensation function. Inspired by classical optimization, various unrolled optimization schemes have been introduced, such as gradient descent [28], proximal gradient [3, 68], variable splitting [14], primal-dual [1], or ADMM [82]. Hence, data consistency layers can be defined that follow, e.g., a proximal mapping [3, 68] or a gradient descent scheme [28] or are derived from other optimization schemes.
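As a minimal sketch of a gradient-descent data consistency step on the data term of Eq. 13.1, the following NumPy example assumes a single-coil Cartesian operator A (an orthonormal 2D Fourier transform followed by a binary sampling mask). The function names, the step size, and the sampling pattern are assumptions made for illustration; coil sensitivities and non-Cartesian trajectories are deliberately left out.

```python
import numpy as np

def forward(x, mask):
    """A x for single-coil Cartesian sampling: orthonormal 2D FFT, then masking."""
    return mask * np.fft.fft2(x, norm="ortho")

def adjoint(k, mask):
    """A^H k: masking, then inverse orthonormal 2D FFT."""
    return np.fft.ifft2(mask * k, norm="ortho")

def data_term(x, y, mask):
    """D(Ax, y) = ||Ax - y||_2^2 as in Eq. 13.1."""
    return float(np.sum(np.abs(forward(x, mask) - y) ** 2))

def gd_dc_step(x, y, mask, step=0.25):
    """One gradient-descent update: x <- x - step * 2 A^H (A x - y)."""
    return x - step * 2.0 * adjoint(forward(x, mask) - y, mask)

rng = np.random.default_rng(0)
x_true = rng.standard_normal((32, 32)) + 1j * rng.standard_normal((32, 32))
mask = np.zeros((32, 32))
mask[::4, :] = 1.0                        # keep every fourth k-space row
y = forward(x_true, mask)                 # retrospectively undersampled data

# Stand-in for the output of a regularization network (here: a perturbed image).
x0 = x_true + 0.1 * rng.standard_normal((32, 32))
x1 = gd_dc_step(x0, y, mask)
d0, d1 = data_term(x0, y, mask), data_term(x1, y, mask)
print(d0, d1)
```

For this particular operator, A A^H acts as a projection onto the sampled locations, so a step size of 0.25 halves the k-space residual and reduces the data term to a quarter of its previous value. In an unrolled network, such a step would alternate with a learned regularization block.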

Training Data

To train an NN, suitable training data are required targeting the selected application. For MRI reconstruction or enhancement, ideally, fully sampled k-space data are available, which can then be retrospectively undersampled. While this is feasible to obtain for certain static 2D MRI applications, e.g., knee imaging [42], it is more challenging to acquire training data for numerous CMR applications, including both static and dynamic imaging. For static CMR scans, especially at high resolutions, such as whole-heart coronary MRI or late gadolinium enhancement (LGE), data are acquired with electrocardiogram (ECG) triggering in a k-space-segmented manner during the diastolic quiescence, which lasts approximately 100 ms. Thus, these large 3D volumetric acquisitions often require data acquisition well beyond a breath-hold duration, necessitating free-breathing acquisitions, which itself reduces acquisition efficiency by two- to threefold. Thus, for instance, a fully sampled whole-heart coronary MRI would require about 20 min of scan time, in which the image quality degrades due to respiratory drift. On the other hand, for many dynamic or quantitative CMR scans, including real-time cine imaging, perfusion imaging, 4D flow imaging, and quantitative parametric mapping, the trade-offs between coverage, spatial, and temporal resolutions make it impossible to acquire fully sampled data in the first place.

In these cases, where fully sampled data are unavailable, it is more challenging to define a suitable ground truth. Thus, a surrogate reconstruction, e.g., based on PI at a lower acceleration rate or CS reconstruction at a similar rate, is frequently used as reference data [17, 20, 45, 71]. However, this inherently limits the benefits of DL methods, since their performance is bounded by that of the conventional surrogate method used to generate the reference data.

Another challenge in cardiac MRI is the availability of a sufficiently large number of training samples. For musculoskeletal imaging and neuroimaging, thousands of raw datasets have been made publicly available in the fastMRI dataset [42]. For cardiac imaging, only small databases are used for research purposes, which hinders the progress of reproducible research and the effectiveness of NNs. Recently, El-Rewaidy et al. [16] published a radial cine cardiac MRI dataset with 101 patients and 7 healthy subjects, constituting the first publicly available cardiac MRI raw dataset. Transfer learning or domain adaptation can be used to deal with limited training data [13, 32, 40, 85]. While this is mostly feasible for 2D applications, where the networks can be pretrained on, e.g., natural images, large databases are hardly available for high-dimensional applications. Another way to increase the robustness of NNs is to perform data augmentation in the image domain, where transformations are applied to generate a diverse dataset. In computer vision applications, transformations including translation, rotation, scaling, shearing, and nonrigid deformations are applied. However, these transformations cannot be directly applied to augment the raw k-space data, which requires more realistic transformations. Oksuz et al. introduced physics-aware data augmentation by simulating motion [55]. Adversarial strategies to generate realistic datasets have so far only been introduced for CMR segmentation [11].

Training the Neural Network

Once the training data have been procured and the NN architecture has been finalized, the tunable parameters of the network need to be learned from the database. The training procedure depends on two main components: the availability of reference data and the choice of a function to evaluate the quality of the NN output. When reference data are available, training is traditionally performed with supervision, where the output of the network is compared to the reference via a loss function, and the network parameters are updated during training to reduce this loss.

Loss Functions

A loss function (cost function, error) measures how well the NN is able to predict the reconstruction based on its network input, usually an image and/or k-space data. Several loss functions have been used in the literature [41], including pixel-wise measures, patch-based measures, and adversarial losses. Pixel-wise measures are simple quality measures, including the mean-square error (or ℓ2) loss and the mean-absolute error (or ℓ1) loss. However, pixel-wise measures are not able to represent the complex human perceptual system. Patch-based losses such as the structural similarity index [76, 84] or perceptual losses that are based on the extracted features of the VGG network [37, 47, 70] add more realistic information to the loss function. However, they are usually used in combination with a pixel-wise loss function that is able to stabilize the training [31, 37, 70, 84]. Pixel-wise and perceptual losses can additionally be combined with adversarial losses to push the image quality toward realistic images [24, 47, 70]. However, in the field of medical imaging, it is still an open question if and how adversarial training strategies distort the real image content.

Supervised Learning

In case of supervised learning, the output of the NN is compared against a clean, artifact-free reference image [28, 68] or fully sampled k-space data. For many CMR exams, fully sampled data is difficult to acquire; thus, a surrogate reconstruction is used as reference data [17, 20, 45, 71]. Hybrid loss functions exist that add both a loss in image domain and k-space domain. This is typically used for image enhancement methods where data consistency is imposed only in the loss function [36].
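A hybrid loss of this kind might be sketched as follows. This is only an illustration under stated assumptions: the function name `hybrid_loss`, the choice of an ℓ1 image term with an ℓ2 k-space term, the weighting factor, and the single-coil Cartesian Fourier model are all introduced for the example and are not taken from [36].

```python
import numpy as np

def hybrid_loss(x_pred, x_ref, y, mask, weight=0.1):
    """Image-domain l1 loss plus weighted l2 loss on the sampled k-space points.

    x_pred, x_ref : complex images (network output and reference)
    y             : acquired (undersampled) k-space data
    mask          : binary sampling mask, 1 where k-space was acquired
    weight        : hypothetical weighting between the two terms
    """
    image_loss = np.mean(np.abs(x_pred - x_ref))
    k_pred = mask * np.fft.fft2(x_pred, norm="ortho")
    n_samples = max(float(mask.sum()), 1.0)
    kspace_loss = np.sum(np.abs(k_pred - mask * y) ** 2) / n_samples
    return float(image_loss + weight * kspace_loss)

rng = np.random.default_rng(1)
x_ref = rng.standard_normal((16, 16)) + 1j * rng.standard_normal((16, 16))
mask = np.zeros((16, 16))
mask[::2, :] = 1.0
y = mask * np.fft.fft2(x_ref, norm="ortho")

loss_perfect = hybrid_loss(x_ref, x_ref, y, mask)        # prediction equals reference
loss_offset = hybrid_loss(x_ref + 0.1, x_ref, y, mask)   # constant intensity offset
print(loss_perfect, loss_offset)
```

The k-space term penalizes disagreement with the measured samples only, so it can be evaluated even when the image reference is a surrogate reconstruction.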

Unsupervised and Self-Supervised Learning

An alternative line of training approaches, broadly falling under unsupervised learning, considers training NNs from unpaired data. Hence, no pairs of undersampled/fully sampled images or any other information about the measured k-space data are available. Self-supervised learning mitigates the absence of reference data by automatically generating training labels from the data itself and has been adopted in many areas of AI. For image reconstruction and enhancement, this is typically done by a masking operation, in which parts of the acquired k-space are hidden from the NN and used as training labels, as proposed by Yaman et al. [77]. The network is then trained to predict these hidden parts of the k-space. It has been shown that such methods can match the performance of supervised learning without significant differences [78], and they have found applications in CMR [79]. Ke et al. proposed a semi-supervised learning strategy for CMR that relies on a view-sharing method to generate fully sampled references from time-interleaved samples [38].
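The masking operation can be sketched as a split of the acquired k-space locations into two disjoint sets. The example below is a simplified illustration, not the implementation of [77]: the function name `split_mask`, the uniform random selection, and the 40% label fraction are assumptions.

```python
import numpy as np

def split_mask(mask, label_fraction=0.4, seed=0):
    """Split acquired k-space locations into two disjoint sets.

    Returns (input_mask, loss_mask): the network only sees samples in
    input_mask, while samples in loss_mask are hidden and used as labels.
    """
    rng = np.random.default_rng(seed)
    acquired = np.flatnonzero(mask)
    n_label = int(round(label_fraction * acquired.size))
    label_idx = rng.choice(acquired, size=n_label, replace=False)
    loss_mask = np.zeros(mask.shape, dtype=bool)
    loss_mask.flat[label_idx] = True
    input_mask = mask.astype(bool) & ~loss_mask
    return input_mask, loss_mask

mask = np.zeros((32, 32), dtype=bool)
mask[::4, :] = True                       # acquired k-space rows
input_mask, loss_mask = split_mask(mask)
print(mask.sum(), input_mask.sum(), loss_mask.sum())
```

During training, the loss would be evaluated only at the locations in `loss_mask`, comparing the k-space predicted by the network with the acquired samples that were withheld from its input.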

Deep Learning for Image/k-Space Enhancement in CMR

Image/k-space enhancement techniques have been popular in CMR applications. One of the motivations for using such methods is the large size of multi-coil CMR data, which leads to large memory requirements for unrolled NNs and to slow processing times via traditional iterative methods. Additionally, non-Cartesian trajectories are utilized in many translational CMR scans, but the encoding operator is costly to implement for DL image reconstruction with algorithm unrolling, leading to a preference for enhancement methods for faster processing.

In image enhancement methods, the input to the NN is generated via an initial reconstruction, such as a coil-combined zero-filled solution. The NN is designed and trained to enhance this solution so that it resembles a reference solution, but this is often done without any knowledge of the acquisition physics. Several strategies to utilize additional information about the acquisition physics have been proposed to augment these methods, including enforcing data consistency after enhancement [36] and using a k-space data consistency term in the training loss function [81]. The NN for image enhancement methods is often trained to predict the residual, which may include noise, aliasing artifacts, and other corruptions in the data. This output is then subtracted from the input image to generate the final output.
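The residual formulation can be written compactly. The notation here is introduced only for illustration and is not taken from the cited works: x_in denotes the initial reconstruction, x_ref the reference image, and f_θ the NN with trainable parameters θ. The network is trained so that its output approximates the residual x_in − x_ref, and the enhanced image follows as

$$ \hat{\mathbf{x}}=\mathbf{x}_{\mathrm{in}}-f_{\theta}\left(\mathbf{x}_{\mathrm{in}}\right),\qquad f_{\theta}\left(\mathbf{x}_{\mathrm{in}}\right)\approx \mathbf{x}_{\mathrm{in}}-\mathbf{x}_{\mathrm{ref}} $$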

In k-space enhancement, the input to the NN is typically the zero-filled k-space, and the goal is similarly to interpolate/enhance the solution to resemble the true solution. As in image enhancement, acquisition physics is not incorporated into the training explicitly, though dependencies among coils may be learned implicitly due to the use of multi-coil k-space data. Nevertheless, unlike image enhancement methods, k-space enhancement methods retain consistency with the acquired k-space data.

Next, several of these enhancement strategies that have been used in CMR applications will be outlined.

Dynamic/Quantitative CMR Applications

Image acceleration is critical for numerous dynamic and/or quantitative CMR scans where a series of images of the same anatomy is acquired over time and/or with different contrast weightings. Due to the need for acquiring multiple images over time, these methods have long scan times and often necessitate trade-offs between spatiotemporal resolution and coverage. Thus, improved reconstruction methods are critical for such acquisitions, making them an important application for DL methods. A further complication arises due to the large amount of data that is acquired in these scans, leading to large memory requirements. This is a challenge both for conventional iterative techniques and for DL methods based on algorithm unrolling. Thus, DL-based image enhancement methods have received attention in this setting.

Residual CNNs have been used to remove undersampling artifacts in cine imaging. Hauptmann et al. [33] reconstructed undersampled tiny golden angle radial SSFP data with a residual UNet architecture. The UNet was trained on the magnitude of pairs of gridded/reference reconstructions. The core of the UNet consisted of 3D convolutions. The UNet achieved 5× faster reconstruction times than conventional CS approaches. Assessment of biventricular volumes showed a superior reconstruction quality of the UNet reconstructions compared to CS.

Spatiotemporally Separated Convolutions for Dynamic Data

A drawback of UNets is the large number of network parameters, which makes it challenging to train networks on small training datasets. Kofler et al. proposed a residual UNet architecture for 2D radial cine MRI [44]. The convolutions in the UNet were implemented as a combination of 2D convolutions in the xt and yt planes instead of 3D convolutions. This substantially reduced the number of network parameters and made it easier to train on small training datasets.

Though cine imaging has received attention for DL-based image enhancement due to the availability of fully sampled reference datasets, several other important dynamic and quantitative applications also exist, often without such ground-truth data. One example is perfusion imaging where Fan et al. proposed a 2D+t UNet for reconstructing multi-slice non-Cartesian myocardial perfusion MRI [20]. The conventional CS reconstruction was used as reference for supervised training with an MSE loss function. The results showed that the UNet reduced the reconstruction time by 14.4-fold, including the preprocessing and enhancement pipeline, while showing no visual difference to the CS reconstruction.

Another such application without ground-truth data is real-time cine imaging. Shan et al. [71] proposed a perceptual complex NN to accelerate the reconstruction times for 2D+t non-Cartesian real-time cine acquisitions. The core network was a UNet architecture with complex-valued building blocks, including complex convolutional layers. This complex UNet was trained using a combination of MSE loss and a perceptual VGG loss with the conventional CS reconstruction as reference. It yielded a reconstruction time of 24.5 s for 80 frames of a single image slice, including 23.7 s of preprocessing and 0.8 s of NN calculations. No significant differences in image quality and LV functional parameters were observed using the perceptual complex NN.

Combination with Pre-estimation of Temporal Basis Functions

For dynamic quantitative acquisitions using CMR multitasking [12], Chen et al. proposed a hybrid approach that combines conventional estimation of temporal features along with an NN approach for memory-efficient reconstruction of these massive imaging datasets [10]. In particular, they estimated the temporal features using a conventional principal component analysis-based method and combined this with the application of a dilated multilevel densely connected NN for enhancement in the corresponding spatial feature space. Using the conventional CS-type reconstruction as reference, the network was trained in a supervised manner with ℓ1 loss. This approach sped up the calculation compared to the conventional iterative method, from 20 min to 0.4 s, while achieving similar image quality.
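The idea of separating temporal basis functions from spatial feature maps can be sketched with a truncated singular value decomposition, which is one standard way to perform a principal component analysis. The example is a simplified illustration on synthetic data and is not the pipeline of [10]: the basis is estimated here from the image series itself, and the function name, matrix sizes, and rank are assumptions.

```python
import numpy as np

def temporal_basis(casorati, rank):
    """Leading `rank` temporal basis functions (rank x n_frames) via SVD."""
    _, _, vh = np.linalg.svd(casorati, full_matrices=False)
    return vh[:rank]

rng = np.random.default_rng(0)
n_voxels, n_frames, rank = 200, 40, 3

# Synthetic, exactly rank-3 image series arranged as a voxels x frames matrix.
spatial = rng.standard_normal((n_voxels, rank))
temporal = rng.standard_normal((rank, n_frames))
series = spatial @ temporal

phi = temporal_basis(series, rank)        # temporal features
coeffs = series @ phi.conj().T            # spatial feature maps (n_voxels x rank)
recon = coeffs @ phi                      # back to the full image series
rel_err = np.linalg.norm(recon - series) / np.linalg.norm(series)
print(coeffs.shape, rel_err)
```

In such a hybrid scheme, an enhancement network only needs to operate on the few spatial feature maps in `coeffs` rather than on every time frame, which is what keeps the memory footprint small.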

Superresolution Approaches

An alternative approach for the use of enhancement has been through superresolution. In such applications, instead of acquiring an undersampled dataset and trying to enhance the image by reducing residual aliasing artifacts, one acquires a low-resolution dataset and seeks to enhance its resolution retrospectively. In the context of dynamic CMR, Oktay et al. proposed a superresolution approach to map a stack of low-resolution short-axis images to high-resolution short-axis images using a residual NN [56]. The quality of the high-resolution volume is further improved by embedding additional long-axis scans in a Siamese multi-image superresolution network. Using CNNs, the quality of superresolution is highly improved compared to linear and cubic interpolation.

Another application of superresolution for cine imaging was explored by Masutani et al. [52]. This work used both single-frame and multi-frame CNNs based on the SRNet and UNet. The training data was generated by retrospectively downsampling standard cine acquisitions to lower resolutions by keeping the central part of the k-space from DICOM images without applying any ringing filters. A combination of ℓ1 and SSIM loss was utilized. The CNNs outperformed zero padding and bilinear interpolation and yielded similar left ventricular volumes compared to full-resolution images for up to threefold downsampling.
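Generating low-resolution training inputs by keeping the central part of k-space can be sketched as below. This is an illustrative example only: the function name, the cropping in both in-plane directions, and the intensity rescaling are assumptions, and the exact preprocessing in [52] may differ. No window or ringing filter is applied.

```python
import numpy as np

def lowres_from_image(image, factor):
    """Downsample a magnitude image by keeping the central part of its k-space."""
    ny, nx = image.shape
    k = np.fft.fftshift(np.fft.fft2(image))          # k-space center in the middle
    cy, cx = ny // 2, nx // 2
    hy, hx = ny // (2 * factor), nx // (2 * factor)
    crop = k[cy - hy:cy + hy, cx - hx:cx + hx]       # central k-space block
    low = np.fft.ifft2(np.fft.ifftshift(crop))
    # Rescale so that the mean intensity of the image is preserved.
    return np.abs(low) * (crop.size / k.size)

flat = np.ones((64, 64))                             # constant test image
low_flat = lowres_from_image(flat, factor=2)

rng = np.random.default_rng(0)
low_rand = lowres_from_image(rng.random((64, 64)), factor=4)
print(low_flat.shape, low_rand.shape)
```

Pairs of such low-resolution images and the original full-resolution images can then serve as input and target for training a superresolution network.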

Static CMR Applications

Image and k-space enhancement methods have also found utility in static CMR applications. Similar to the aforementioned dynamic CMR applications, one focus area has been to improve the computational speed for large volumetric datasets while matching the performance of conventional CS-type methods. Along these lines, El-Rewaidy et al. proposed a complex-valued UNet for undersampled 3D LGE imaging [17]. The network worked with 2D slices, generated by taking an inverse Fourier transform along the fully sampled frequency-encoding direction. It was trained using a CS-type reconstruction as reference with an MSE loss function. The results showed a 300-fold acceleration compared to the conventional reconstruction without a substantial change in image quality as depicted in Fig. 13.2.

Fig. 13.2

Reconstruction results from 3D LGE imaging. The complex-valued NN from [17] matches the performance of compressed sensing, whereas its real-valued counterpart exhibits artifacts

An alternative line of work based on robust artificial neural networks for k-space interpolation (RAKI) explored the use of k-space enhancement in a scan-specific manner [7]. These methods use the calibration data to train a convolutional NN for k-space interpolation, extending the linear convolutional kernels used in conventional parallel imaging methods. Since the training is performed on calibration data, these methods do not rely on reconstructions from existing CS-type algorithms or acquisition of ground-truth data for training, improving their applicability to undersampled CMR datasets while potentially outperforming their conventional counterparts. For myocardial T1 mapping, Akçakaya et al. showed that this approach outperforms its clinical parallel imaging counterpart calibrated on the same dataset in terms of noise reduction. An extension of this method, called self-consistent RAKI, was proposed by Hosseini et al. [34]. For coronary MRI, it was shown to outperform its conventional counterpart based on the SPIRiT formulation. One downside of these scan-specific methods is the longer reconstruction time due to the need for retraining for each scan.

Superresolution methods have also been used in the context of whole-heart CMR [72]. Steeden et al. trained a 3D residual UNet using synthetic downsampling by keeping 50% of the slice resolution and 50% of the phase resolution. Then using prospectively acquired data with threefold savings in scan time, the super-resolved images were shown to have better edge sharpness than low-resolution or high-resolution images and significantly better SNR than high-resolution data. However, a significant underestimation was found in the proximal left coronary artery diameter using the superresolution approach, with no other significant differences for measurements in other segments.

Deep Learning with Algorithm Unrolling in CMR Reconstruction

While image/k-space enhancement methods have received attention in CMR for their speed and lower memory requirements, more advanced reconstruction strategies may be developed by incorporating further information about the measured k-space data. Algorithm unrolling is a natural fit for such methods. Powerful regularization networks are alternately applied with data consistency layers, which allows one to exploit not only the available information in the raw measurement data but also redundant information in different domains. Various applications in static and dynamic CMR settings will be outlined in the following sections.

Static CMR Applications

Learned unrolled optimization schemes have been shown to be beneficial in a variety of reconstruction approaches for static CMR applications. Fuin et al. proposed a multi-scale variational network that uses a multi-scale fields-of-experts regularization [22, 28, 65], realized by filter kernels of different sizes, for coronary magnetic resonance angiography (CMRA). Furthermore, applying the activation functions on magnitude and phase images further improved the reconstruction results. The overall reconstruction time is ~14 s for the multi-scale variational network, compared to ~5 min for the CS approach. The multi-scale variational network allows for up to ninefold accelerated acquisitions (2:34 min) with comparable image quality to the fully sampled reference scan (acquisition time: 18:55 min).

Malave et al. proposed a CNN with a proximal mapping step as data consistency layer for the reconstruction of 3D image navigators (iNAV) in a CMRA sequence [51]. The reconstructed 3D iNAV enables fast nonrigid motion estimation, without loss of quality compared to ℓ1 ESPIRiT. Reported average reconstruction times per 3D iNAV are 0.53 s for the proposed network with data consistency and 1.55 s for ℓ1 ESPIRiT using an NVIDIA Titan RTX graphics card.

Self-Supervised Learning

As training data is challenging to acquire for dynamic cardiac applications, self-supervised learning approaches have gained attention in the field of CMR reconstruction. Yaman et al. [79] proposed a physics-guided self-supervised learning approach for late gadolinium enhancement in cardiac MRI. The core of self-supervision lies in the sampling masks. Two masks are used to divide the acquired k-space locations into a training set and a set that is used during data consistency, embedded in an unrolled reconstruction algorithm. At 6× undersampling, this approach outperforms the clinically used CS at 3× acceleration [4], as depicted in Fig. 13.3.

Fig. 13.3

Representative reconstruction results of a 3D LGE scan from a patient. The compressed sensing (CS) reconstruction has been used in clinical studies [5, 8]. The self-supervised deep learning approach at both R = 3 and 6 outperforms CS reconstruction at R = 3 by suppressing noise and residual artifacts. All reconstruction methods successfully identify LGE shown with yellow arrows

A specific time-interleaved sampling strategy serves as training data for semi-supervised learning in Ke et al. [38]. The training data is generated by using the acquired k-space data from adjacent temporal frames. The network consists of an ADMM-Net-III architecture and data consistency layers to reconstruct coil-by-coil images. A subsequent CNN block performs coil combination of the single-coil images to form the final reconstruction.

Dynamic/Quantitative CMR Applications

One of the first unrolled reconstruction approaches to accelerate dynamic single-coil cine reconstruction was proposed by Schlemper et al. in 2018 [68]. The data consistency CNN (DC-CNN) follows a proximal gradient scheme. The proximal mapping for the data consistency is solved in closed form, and a 5-layer CNN is used as the image regularization network. An average reconstruction time of 8 s is reported on an NVIDIA GeForce GTX 1080. In contrast, dictionary learning took 6.6 h per subject on the CPU.
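For single-coil Cartesian sampling, a closed-form data consistency step can be sketched as follows: at sampled k-space locations, the network prediction is either replaced by the measurement (noiseless case) or averaged with it using a weight λ, while unsampled locations keep the network prediction. The NumPy example is a simplified illustration; the function name `dc_layer`, the test data, and the default arguments are assumptions and not the implementation of [68].

```python
import numpy as np

def dc_layer(x_cnn, y, mask, lam=None):
    """Closed-form data consistency for single-coil Cartesian sampling.

    x_cnn : complex image predicted by the regularization network
    y     : acquired k-space (zeros where not sampled)
    mask  : boolean sampling mask
    lam   : None for hard replacement, otherwise weight of the measurements
    """
    k_cnn = np.fft.fft2(x_cnn, norm="ortho")
    if lam is None:
        k_dc = np.where(mask, y, k_cnn)
    else:
        k_dc = np.where(mask, (k_cnn + lam * y) / (1.0 + lam), k_cnn)
    return np.fft.ifft2(k_dc, norm="ortho")

rng = np.random.default_rng(2)
x_true = rng.standard_normal((16, 16)) + 1j * rng.standard_normal((16, 16))
mask = np.zeros((16, 16), dtype=bool)
mask[::3, :] = True
y = np.where(mask, np.fft.fft2(x_true, norm="ortho"), 0)

x_cnn = x_true + 0.2 * rng.standard_normal((16, 16))   # stand-in for a CNN output
x_dc = dc_layer(x_cnn, y, mask)
k_dc = np.fft.fft2(x_dc, norm="ortho")
k_cnn = np.fft.fft2(x_cnn, norm="ortho")
err_before = np.linalg.norm(x_cnn - x_true)
err_after = np.linalg.norm(x_dc - x_true)
print(err_before, err_after)
```

Because the step only involves Fourier transforms and element-wise operations, it is inexpensive and can be inserted after every regularization block of an unrolled network.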

Based on this work [68], Qin et al. proposed a convolutional recurrent neural network (CRNN) approach that exploits two types of recurrence. First, a bidirectional convolutional recurrent unit is used to propagate information through time. Second, recurrence over iterations is used to propagate information across the unrolled iterations. While reducing the number of network parameters substantially (by a factor of 3), the CRNN improves image quality compared to DC-CNN [68]. Additionally, runtime is reduced from 8 s (DC-CNN) to 6 s (CRNN).

Machine learning for free-breathing, ungated multi-coil cardiac MRI was studied by Biswas et al. [9]. The proposed MoDL-StoRM network alternates between a denoising CNN, a smoothness regularization on manifolds that exploits nonlocal redundancies, and the solution of the proximal mapping for multi-coil MRI using the conjugate gradient algorithm [3]. The manifold for smoothness regularization is estimated from a navigator signal [59].
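The conjugate gradient step for a multi-coil proximal mapping can be sketched as solving the normal equations (A^H A + λI) x = A^H y + λz, where z is the output of the denoising network. The example below is a minimal sketch under stated assumptions: Cartesian sampling, random synthetic coil sensitivity maps normalized to unit sum of squares, and hypothetical function names. The manifold regularization term of the cited method is omitted.

```python
import numpy as np

def sense_forward(x, smaps, mask):
    """A x: coil weighting, orthonormal 2D FFT per coil, then sampling mask."""
    return mask * np.fft.fft2(smaps * x, norm="ortho")

def sense_adjoint(k, smaps, mask):
    """A^H k: masking, inverse FFT per coil, then coil combination."""
    return np.sum(np.conj(smaps) * np.fft.ifft2(mask * k, norm="ortho"), axis=0)

def cg_dc(z, y, smaps, mask, lam=1.0, n_iter=30, tol=1e-16):
    """Solve (A^H A + lam I) x = A^H y + lam z with conjugate gradients."""
    def normal_op(v):
        return sense_adjoint(sense_forward(v, smaps, mask), smaps, mask) + lam * v
    b = sense_adjoint(y, smaps, mask) + lam * z
    x = z.copy()
    r = b - normal_op(x)
    p = r.copy()
    rs = np.vdot(r, r).real
    for _ in range(n_iter):
        if rs < tol:                      # residual small enough, stop early
            break
        q = normal_op(p)
        alpha = rs / np.vdot(p, q).real
        x = x + alpha * p
        r = r - alpha * q
        rs_new = np.vdot(r, r).real
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(3)
n_coils, ny, nx = 4, 16, 16
smaps = rng.standard_normal((n_coils, ny, nx)) + 1j * rng.standard_normal((n_coils, ny, nx))
smaps /= np.sqrt(np.sum(np.abs(smaps) ** 2, axis=0, keepdims=True))
x_true = rng.standard_normal((ny, nx)) + 1j * rng.standard_normal((ny, nx))
mask = np.zeros((ny, nx))
mask[::2, :] = 1.0
y = sense_forward(x_true, smaps, mask)

lam = 1.0
z = np.zeros((ny, nx), dtype=complex)     # stand-in for the denoiser output
x_dc = cg_dc(z, y, smaps, mask, lam=lam)
lhs = sense_adjoint(sense_forward(x_dc, smaps, mask), smaps, mask) + lam * x_dc
rhs = sense_adjoint(y, smaps, mask) + lam * z
rel_res = np.linalg.norm(lhs - rhs) / np.linalg.norm(rhs)
print(rel_res)
```

Unlike the single-coil Cartesian case, this system has no simple element-wise solution because the coil sensitivities couple the k-space samples, which is why an iterative solver is used inside the data consistency layer.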

High-Dimensional Cine Reconstruction with Separated Convolutions

Training deep NNs is limited by the availability of data and resources, which makes it challenging to deal with high-dimensional data. Kuestner et al. proposed CINENet for the reconstruction of 3D+t data [45]. Unrolled optimization is performed with a multi-coil data consistency layer and a complex-valued UNet as the image prior network. To deal with the 3D+t data, the convolutions were separated into 3D spatial convolutions followed by 1D temporal convolutions. Results comparing single breath-hold CINENet to conventional multi-breath-hold 2D+t CINE acquisitions are depicted in Fig. 13.4. CINENet enables single breath-hold acquisitions, which not only reduces acquisition time but also minimizes slice misalignment and the impact of respiratory motion.

Fig. 13.4

Example reconstructions of CINENet [45]. As reference, an accelerated 3D single breath-hold CINE scan (acquisition time 31 s) and the clinical standard multi breath-hold scan (acquisition time 260 s) are shown. CINENet shows the least artifacts compared to zero-filling and compressed sensing for ninefold undersampling (acquisition time 12 s) and 15-fold undersampling (acquisition time 7 s)

Separated convolutions not only reduce the number of parameters but also yield improved performance compared to 3D convolutions. This was further shown in Kofler et al. for image enhancement [43], and Hammernik et al. for non-Cartesian cine reconstruction using a Proximal Gradient Variational Network [30].

Sandino et al. [66] showed that separated 2D+t convolutions produced higher image quality measures for DL-ESPIRiT compared to 3D convolutions. Additionally, this network makes use of an extended set of coil sensitivity maps to overcome limited field-of-view issues. The coil sensitivity maps are embedded in the gradient-descent data consistency layer. Both the data consistency layer and a 2D+t convolutional unit are applied iteratively. The proposed DL-ESPIRiT network yields not only superior image quality measures but also more accurate functional cardiac parameters, such as left ventricular end-diastolic volume, end-systolic volume, stroke volume, and ejection fraction. Additionally, the runtime for a single slice is lower for both the 3D approach (3.89 s) and the 2D+t approach (4.89 s) than for ℓ1 ESPIRiT (5.36 s).

Exploiting Complementary Domains

Most of the approaches learn an NN in the image domain. However, hybrid approaches make use of NNs for both k-space interpolation and artifact reduction in the image domain [19]. El-Rewaidy et al. use complex-valued subnetworks in k-space and image domain for the reconstruction of radial dynamic cardiac MRI data [18]. The re-gridded k-space is processed by a k-space network that interpolates the missing k-space data and exploits redundancies across different time frames. After an inverse Fourier transform, an image domain network is applied to remove remaining aliasing artifacts and to exploit information from multiple coils and different time frames in the finally reconstructed output frame. The networks themselves consist of complex-valued operations, radial batch normalization, and complex ReLUs (XReLUs).

Strong correlations between space, time, and frequency domain are exploited in [62, 63]. Inspired by k-t reconstruction, kt-NEXT [62] recovers signals in x-f domain, combined with an image domain network and data consistency layers. The complementary time-frequency domain networks [63] propagate information in x-t space and x-f space using a bidirectional and a unidirectional CNN, respectively. Multi-coil information is handled efficiently using a variable splitting scheme. Results show that imposing structure and exploiting all kinds of available information and redundancies is beneficial to reconstruct high-quality images from highly undersampled cardiac MRI data, as shown in Fig. 13.5.
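The x-f representation used by such methods is obtained by a Fourier transform along the temporal dimension. The sketch below is only an illustration on synthetic data, with function names and array sizes chosen for the example; it shows why this domain is attractive: static pixels occupy a single temporal-frequency bin, and periodically varying pixels only a few.

```python
import numpy as np

def to_xf(x_t):
    """Temporal FFT along axis 0: (frames, ny, nx) from x-t to x-f space."""
    return np.fft.fftshift(np.fft.fft(x_t, axis=0, norm="ortho"), axes=0)

def to_xt(x_f):
    """Inverse of to_xf: back from x-f to x-t space."""
    return np.fft.ifft(np.fft.ifftshift(x_f, axes=0), axis=0, norm="ortho")

n_frames = 16
t = np.arange(n_frames)
series = np.zeros((n_frames, 8, 8))
series[:, 2, 2] = 1.0                                            # static pixel
series[:, 5, 5] = 1.0 + 0.5 * np.cos(2 * np.pi * t / n_frames)   # periodic pixel

xf = to_xf(series)
static_bins = int(np.sum(np.abs(xf[:, 2, 2]) > 1e-9))
dynamic_bins = int(np.sum(np.abs(xf[:, 5, 5]) > 1e-9))
roundtrip_ok = bool(np.allclose(to_xt(xf), series))
print(static_bins, dynamic_bins, roundtrip_ok)  # 1 3 True
```

A network operating in x-f space therefore sees a sparse signal for quasi-periodic cardiac motion, which complements the information available in x-t space.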

Fig. 13.5

Qin et al. [63]: Results are shown for an undersampling rate of R=16 (b) for different methods (c, e, g, and i) on spatial and temporal dimensions, with their respective error maps (d, f, h, and j), compared to the ground truth image (a). The proposed (i) k-t VS-NEXT exploits information in the complementary time-frequency domain. It outperforms traditional methods such as (c) kt-SLR [48] and learning-based methods such as the (e) variational network [29] and the (g) cascaded recurrent network [61].

Advanced Topics and Limitations

Apart from the enhancement and reconstruction approaches discussed earlier for handling subsampled data, there are several other avenues in which DL may be useful for making CMR faster and more robust. We give an overview of new advances in motion correction as one such direction and also discuss challenges and future directions for DL-based CMR reconstruction and enhancement.

Deep Learning for Motion Correction in CMR

Motion-related artifacts can distort images to the point that they become unusable for image analysis. Motion artifacts may occur when, e.g., the patient moves during the examination or when mis-triggering in the ECG occurs. In the following, we outline approaches that use machine learning to correct for motion artifacts in cardiac MR imaging.

Zhang et al. propose to correct CMR motion artifacts using adversarial training [83]. Generated motion artifacts serve as training data. A generator network, modeled by a residual architecture, tries to correct for the motion in image domain. The network is trained using a combination of Wasserstein loss, content loss, and edge loss to efficiently remove image blurring.

In Ghodrati et al., Variational Autoencoders (VAEs) were used for retrospective respiratory motion correction in free-breathing cine MRI [23]. The VAEs are trained on unpaired data of healthy volunteers and patients with suspected cardiovascular disease. While the encoder-decoder structure preserves the general structure of the free-breathing images by learning an identity mapping, a discriminator network directs the encoder to remove artifacts.

Oksuz et al. address MR motion correction directly from k-space [54]. A generator network, motivated by Automap [85], transforms the k-space directly to image space. The network is trained with an adversarial training strategy on pairs of synthetically corrupted k-space data and uncorrupted images. The proposed network is able to correct the motion artifacts; however, textural information is lost.

NNs are used to detect and correct for cardiac MRI motion artifacts in Oksuz et al. [55]. A detection network is used to identify the corrupted k-space lines. The motion-corrected predicted sampling masks are fed into a data consistency layer. An additional CRNN is used to account for artifacts in image domain. For training, a reconstruction loss, i.e., mean-squared error, is combined with a detection loss, which is defined via cross-entropy of the binary mask and the probability of the predicted line being uncorrupted. This network outperforms the Automap-GAN [54] substantially.
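The combination of a reconstruction loss and a detection loss described above can be sketched as follows. This is an illustrative NumPy example under stated assumptions, not the training code of [55]: the weighting factor `weight`, the clipping constant `eps`, and all function names are hypothetical, and the mask is assumed to hold 1 for uncorrupted and 0 for corrupted k-space lines.

```python
import numpy as np

def mse_loss(recon, target):
    """Reconstruction loss: mean-squared error (works for complex images)."""
    return float(np.mean(np.abs(recon - target) ** 2))

def detection_loss(mask, p_uncorrupted, eps=1e-7):
    """Binary cross-entropy between the true line mask and the predicted
    probability that each k-space line is uncorrupted."""
    p = np.clip(p_uncorrupted, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(mask * np.log(p) + (1.0 - mask) * np.log(1.0 - p)))

def combined_loss(recon, target, mask, p_uncorrupted, weight=1.0):
    """Weighted sum of both terms; the weight is an assumed hyperparameter."""
    return mse_loss(recon, target) + weight * detection_loss(mask, p_uncorrupted)
```

A detector that assigns probability 0.5 to every line incurs a detection loss of log 2, whereas confident and correct predictions drive this term toward zero.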

Kuestner et al. proposed an end-to-end image reconstruction and motion-correction network in 4D magnetic resonance imaging of the body trunk [46]. A local all-pass (LAP) network is deployed to estimate nonrigid deformation fields on motion-resolved k-spaces. These fields are directly fed into the data consistency layer for reconstruction. Both the LAP net and the reconstruction network are trained jointly by minimizing a combined mean-squared error and end-point error as loss function.
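The end-point error used in such a joint loss is commonly defined as the mean Euclidean distance between predicted and reference displacement vectors. The sketch below assumes 2D deformation fields of shape (2, ny, nx) and a hypothetical weighting factor between the two terms. It illustrates the loss definition only and is not the implementation of [46].

```python
import numpy as np

def end_point_error(flow_pred, flow_ref):
    """Mean Euclidean distance between displacement vectors.

    Both fields have shape (2, ny, nx): one component per spatial direction.
    """
    distance = np.sqrt(np.sum((flow_pred - flow_ref) ** 2, axis=0))
    return float(np.mean(distance))

def joint_loss(recon, target, flow_pred, flow_ref, weight=1.0):
    """Mean-squared reconstruction error plus weighted end-point error.

    The weight is an assumed hyperparameter.
    """
    mse = float(np.mean(np.abs(recon - target) ** 2))
    return mse + weight * end_point_error(flow_pred, flow_ref)
```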

Motion compensation is applied as a post-processing step for dynamic MRI in Huang et al. [35]. The reconstruction is performed in three steps. First, dynamic MRI reconstruction is addressed with Convolutional Gated Recurrent Units (ConvGRUs), followed by a data consistency layer. The reconstruction step is followed by a motion estimation step using a FlowNet with a convolutional UNET architecture as backbone. Finally, the estimated motion is used to refine the reconstructed frames. The proposed approach outperforms DC-CNN [68] and classical approaches such as kt-SLR [48].

Seegolam et al. embed motion estimation directly in a cascaded reconstruction approach [69]. A UNET architecture is used to estimate the motion fields. A novel data-consistent motion-augmented cine (DC-MAC) layer is presented that warps an image to a consecutive frame using the estimated motion field to serve for data consistency. As all frames are consecutively processed, information is carried over the whole temporal axis. This approach allows for highly effective reconstruction of extremely accelerated data.
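Warping a frame with a dense motion field can be illustrated with a generic backward-warping routine. The NumPy sketch below is not the DC-MAC layer of [69]. It assumes a 2D real- or complex-valued frame, a motion field of shape (2, ny, nx) given in pixels, bilinear interpolation, and clamping at the image border; the function name `warp` is hypothetical.

```python
import numpy as np

def warp(image, flow):
    """Backward-warp a 2D image: out[y, x] = image[y + flow[0], x + flow[1]].

    flow has shape (2, ny, nx) with row and column displacements in pixels.
    Sampling positions are clamped to the image border and interpolated
    bilinearly.
    """
    ny, nx = image.shape
    yy, xx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    ys = np.clip(yy + flow[0], 0, ny - 1)
    xs = np.clip(xx + flow[1], 0, nx - 1)

    # Integer neighbours and fractional interpolation weights.
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, ny - 1)
    x1 = np.minimum(x0 + 1, nx - 1)
    wy = ys - y0
    wx = xs - x0

    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom
```

In a motion-augmented reconstruction, a frame warped in this way could act as an estimate of its neighbouring frame before data consistency is enforced against that neighbour's acquired k-space samples.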

Challenges and Future Directions

In this chapter, we presented an overview of recent advances in using machine learning for cardiac MRI reconstruction and enhancement. While the presented methods hold a lot of promise, there are still some challenging open questions, which require further investigation.

In static CMR exams, one is often interested in the detection of subtle anatomical changes, such as stenoses in coronary MRI. Similarly, in dynamic CMR exams, the temporal dynamics are of utmost importance. Thus, in CMR, there is often a question about the degree of spatiotemporal blurring artifacts for regularized reconstruction methods. While DL methods tend to outperform compressed sensing methods for regularized reconstruction, more clinical studies are needed to establish that no artifact is introduced due to regularization. DL methods also face an additional challenge since they are trained on large databases. Thus, further investigations are also needed to assess how the methods will generalize to rare pathologies that are not prominent in the training databases [15] or adverse conditions like arrhythmia.

Several contemporary CMR scans are multidimensional in nature, often incorporating anatomical, dynamical, and quantitative information. While algorithm unrolling has proved to be very successful for DL reconstruction, its application to multidimensional data may be challenging. As these methods require enforcing consistency with acquired k-space data, as well as multidimensional NNs for regularization, the memory requirements may be prohibitive for typical GPU training. Thus, alternative methodologies for efficient training [39] and deployment strategies for multidimensional CMR data may be necessary.

CMR scans are often acquired in an undersampled manner. This is either due to acquisition constraints that make it impossible to fully sample data or due to time constraints in clinical workflow. In either case, it is difficult or impossible to get fully sampled ground-truth data for most CMR scans. Early work in the field tackled this challenge by using surrogate reconstructions [17, 20, 45, 71], while later work has focused on using self-supervision with undersampled datasets [77] to achieve the full potential of DL methods. However, such techniques tend to rely on pixel-wise k-space losses, which are prone to blurring artifacts. The incorporation of more advanced loss functions, such as perceptual losses that align better with visual assessments, into such learning strategies is an open research problem.

A very important, yet less frequently studied, area is motion compensation in cardiac MR imaging. The currently available approaches are very promising and can improve image reconstruction substantially by incorporating motion into the reconstruction process. However, further investigations are needed on multichannel MRI data used in a clinical setting and for applications such as perfusion and real-time imaging.

Finally, while large MR reconstruction datasets exist for musculoskeletal imaging, public cardiac datasets are still under development, with only one such example [16]. The sharing of larger training databases will help development for CMR reconstruction and enhancement.