
1 Introduction

Ensuring high image quality is essential for image analysis pipelines to extract clinically useful information. Misleading diagnoses can be made when the original data are of low quality, in particular for cardiac magnetic resonance (CMR) imaging, where cardiac indices are extracted using post-processing techniques including segmentation and registration. CMR images can contain a range of image artefacts [1], which can reduce the accuracy of image analysis. Improving the quality of such images acquired on MR scanners is a challenging task.

Fig. 1. Detection and correction of MR artefacts using predicted data consistency masks.

One approach for correcting artefacts is image reconstruction using deep neural networks. In deep learning based reconstruction of accelerated (i.e. undersampled) CMR, a pre-determined k-space acquisition trajectory is used and those parts of k-space that have not been sampled are estimated using an inverse problem formulation to reconstruct the image [13]. A different but related problem exists in fully sampled acquisitions that have been corrupted by motion artefacts, for example due to mis-triggering or arrhythmias. In these cases, the data contain correct k-space lines as well as corrupted lines, but it is unknown which are correct and which are corrupted. This problem is our focus in this paper, but we draw inspiration from work on accelerated imaging in devising our solution.

We propose a k-space artefact detection network that generates an individual data consistency term for any given acquisition and converts the image artefact correction task to an undersampled image reconstruction problem, which is subsequently addressed by an algorithm developed for reconstruction of undersampled CMR acquisitions (see the illustration in Fig. 1). Our proposed method is evaluated using 300 cine SSFP (2D+time) CMR datasets from the UK Biobank.

The major contributions of this work are as follows: First, we introduce a novel solution for the detection of artefacts in CMR images. Second, we use the output of this k-space artefact detection network to introduce a data consistency term to be used by an image reconstruction network. By training both networks end-to-end we are able to ignore motion corrupted k-space lines during the reconstruction. Finally, our algorithm is trained and tested also on uncorrupted images, which demonstrates its utility as a generic image reconstruction algorithm.

2 Background

Deep learning has recently shown great promise in reconstruction of highly undersampled MR acquisitions with convolutional neural networks (CNNs) [4, 12, 13]. For example, Schlemper et al. [13] proposed to use a deep cascaded network to generate high quality images, and Hauptmann et al. [3] proposed to use a residual U-net to reduce aliasing artefacts due to undersampling with the purpose of accelerating image acquisition.

For automatic correction of CMR, Lotjonen et al. [7] used reconstructed short-axis and long-axis slices to optimise the locations of the slices using mutual information as a similarity measure. Estimating high quality images from corrupted (or under-sampled) k-space has been a well investigated subject in the literature [2]. The problem can be addressed either in the k-space domain or the image domain. One choice is to correct the k-space before applying the inverse Fourier transform (IFT) as proposed by Han et al. [2]. A more common approach is to use the IFT on k-space and learn a mapping between the corrupted reconstructed images and good quality images. To this end, a variety of image denoising techniques can be utilized such as autoencoders [15], residual learning networks [16] or wide networks [6]. Zhu et al. [17] proposed an end-to-end image reconstruction approach (Automap) for MR and evaluated it on undersampled k-space data.

3 Methods

Our network architecture consists of two sub-networks that are trained jointly, as visualized in Fig. 2. The first network is an artefact detection network, which is used to identify potentially corrupted k-space lines and hence define a data-consistency term. The second network is a recurrent convolutional neural network (RCNN) that performs the reconstruction using this data-consistency term [12]. Details of both networks are provided below.

3.1 Network Architecture

The proposed artefact detection CNN consists of eight layers. Its architecture is similar to that of [14], which was originally developed for video classification using a spatio-temporal 3D CNN. In our case we use the third dimension as the time component and use 2D+time mid-ventricular sequences as the input to the network. Each image sequence has 50 time frames. The network has 4 convolutional layers and 4 pooling layers, followed by 1 fully-connected layer and a softmax loss layer to predict corrupted k-space lines. After each convolutional layer, a ReLU activation is used. We then apply pooling on each feature map to reduce the filter responses to a lower dimensionality. For regularization, we apply dropout with a probability of 0.2 at all convolutional layers and after the fully-connected layer. All convolutional layers are applied with a padding of 2 and a stride of 1.
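The effect of the four convolution and pooling stages on the input size can be checked with simple arithmetic. The following is a minimal sketch, not the authors' implementation: the kernel size of 5 and the pooling window of 2 along all three axes are assumptions (the text states only the padding of 2 and stride of 1, which preserve the size exactly when the kernel size is 5), and the input of 50 frames of 64 x 64 pixels is taken from the sequence length above and the ROI size given in Sect. 4.

```python
def conv_out(size, kernel, padding, stride):
    """Output length along one axis of a convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window):
    """Output length along one axis of non-overlapping pooling."""
    return size // window


def feature_shapes(shape, n_blocks=4, kernel=5, padding=2, stride=1, window=2):
    """Track (time, height, width) through n_blocks of convolution then pooling.

    kernel=5 and window=2 are assumed values; padding and stride follow the text.
    """
    shapes = [tuple(shape)]
    for _ in range(n_blocks):
        shape = [conv_out(s, kernel, padding, stride) for s in shape]
        shape = [pool_out(s, window) for s in shape]
        shapes.append(tuple(shape))
    return shapes


# 50 frames of 64 x 64 pixels, as described for the input sequences.
shapes = feature_shapes((50, 64, 64))
for stage, shape in enumerate(shapes):
    print(f"after block {stage}: {shape}")
```

Under these assumptions each convolution leaves the size unchanged and each pooling stage halves it (rounding down), so the feature maps that reach the fully-connected layer are small.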

Fig. 2. The CNN architecture for motion artefact correction. The proposed network architecture consists of two building blocks: (1) a corrupted k-space line detection network to define the data-consistency term; (2) a recurrent convolutional neural network (RCNN) architecture to correct image artefacts.

The reconstruction network features an RCNN architecture [12]. This network reconstructs high quality cardiac MR images from highly undersampled k-space data by jointly exploiting the dependencies of the temporal sequences as well as the iterative nature of traditional optimisation algorithms. In addition, spatio-temporal dependencies are simultaneously learned by exploiting bi-directional recurrent hidden connections across time sequences. Any reconstruction network can replace this network in our architecture. We chose this particular network for its capability to incorporate information from different frames, which is instrumental in correcting the artefacts that occur due to displacement of k-space lines in time.
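To illustrate how a per-line mask can act as a data-consistency term, the sketch below applies a hard data-consistency step of the kind commonly used in reconstruction networks for undersampled data: k-space lines that are trusted keep their measured values, and flagged lines are filled from the current image estimate. This is an assumption about the general mechanism, not the exact formulation of [12]; the function name `data_consistency`, the choice of rows as k-space lines, and the hard (noise-free) replacement are all illustrative.

```python
import numpy as np


def data_consistency(estimate, measured_kspace, line_mask):
    """Keep trusted measured k-space lines and fill the rest from the estimate.

    estimate:        (H, W) image produced by a reconstruction step
    measured_kspace: (H, W) complex k-space of the acquired frame
    line_mask:       (H,) array, 1 = line trusted, 0 = line flagged as corrupted
    """
    estimate_kspace = np.fft.fft2(estimate)
    # One mask value per k-space row (rows as lines is an assumption).
    mask = np.asarray(line_mask, dtype=float)[:, None]
    merged = mask * measured_kspace + (1.0 - mask) * estimate_kspace
    return np.fft.ifft2(merged)


rng = np.random.default_rng(0)
image = rng.random((8, 8))        # stand-in for an acquired frame
estimate = rng.random((8, 8))     # stand-in for a network output
measured = np.fft.fft2(image)

all_trusted = data_consistency(estimate, measured, np.ones(8))
none_trusted = data_consistency(estimate, measured, np.zeros(8))
```

With every line trusted the step returns the acquired image unchanged, which is consistent with the behaviour reported for uncorrupted inputs; with no line trusted it returns the network estimate.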

3.2 Loss Function and Training

Our training objective is a linearly weighted combination of the image reconstruction loss and a cross-entropy loss for the detection of corrupted lines:

$$\mathcal {L}_{\text {total}}= \lambda \mathcal {L}_{\text {detection}} + (1-\lambda ) \mathcal {L}_{\text {reconstruction}} $$

The reconstruction loss is computed using the mean square error, defined as:

$$\mathcal {L}_{\text {reconstruction}}= \dfrac{1}{N_{p}} \sum _{p=1}^{N_{p}} (I_{x}(p)-I_{y}(p))^2 $$

where p denotes each pixel and \(N_{p}\) denotes the total number of pixels in images \(I_{x}\) and \(I_{y}\). The detection loss is the cross entropy loss, defined as:

$$\mathcal {L}_{\text {detection}}(pr,y)= -\dfrac{1}{N_{l}} \sum _{l=1}^{N_{l}} \left( y_{l} \log (pr_{l}) + (1-y_{l}) \log (1-pr_{l})\right) $$

where \(y_{l}\) is a binary indicator (0 or 1) of whether k-space line l is corrupted, \(pr_{l}\) is the predicted probability of that line being uncorrupted, and \(N_{l}\) denotes the total number of k-space lines in an image.
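The combined objective can be written out numerically as a check on the definitions above. This is a minimal NumPy sketch rather than the training code: the function names are illustrative, the clipping constant `eps` is an added safeguard against log(0), and the sketch assumes that the label and the predicted probability refer to the same class. The weight of 0.3 used in the example is the value of \(\lambda \) reported in the training details.

```python
import numpy as np


def reconstruction_loss(i_x, i_y):
    """Mean square error over all pixels of two images."""
    return float(np.mean((i_x - i_y) ** 2))


def detection_loss(pr, y, eps=1e-12):
    """Binary cross entropy averaged over all k-space lines.

    pr and y are assumed to refer to the same class; eps avoids log(0).
    """
    pr = np.clip(pr, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(pr) + (1 - y) * np.log(1 - pr)))


def total_loss(i_x, i_y, pr, y, lam):
    """Linearly weighted combination of detection and reconstruction losses."""
    return lam * detection_loss(pr, y) + (1 - lam) * reconstruction_loss(i_x, i_y)


# Toy example: every pixel is off by 1, and the detector is undecided (0.5).
i_x = np.zeros((2, 2))
i_y = np.ones((2, 2))
pr = np.array([0.5, 0.5])
y = np.array([1.0, 0.0])

loss = total_loss(i_x, i_y, pr, y, lam=0.3)
```

In this toy case the reconstruction loss is 1 and the detection loss is log 2, so the total is 0.3 log 2 + 0.7.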

We used the Adam optimizer to minimise the combined binary cross entropy and mean square error loss function. The weight \(\lambda \), which defines the contribution of each loss, was set to 0.3 using the validation set. The cross entropy term represents the dissimilarity of the predicted output distribution to the true distribution of labels after a softmax layer. The detection and reconstruction networks were pre-trained separately for 50 epochs to enable faster convergence. End-to-end training ended when the network did not significantly improve its performance on the validation set for a predefined number of epochs (100). An improvement was considered sufficient if the relative increase in performance was at least 0.5%.
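The stopping rule described above can be sketched as follows. This is one possible reading, not the authors' code: it assumes that validation performance is a positive score where higher is better, and that the relative increase is measured against the best score seen so far. The function name `should_stop` and its parameters are illustrative.

```python
def should_stop(history, patience=100, min_rel_improvement=0.005):
    """Return True once `patience` epochs pass without a sufficient gain.

    history: validation scores per epoch, positive and higher is better
             (an assumption; the text does not name the metric).
    """
    if not history:
        return False
    best = history[0]
    epochs_since_best = 0
    for score in history[1:]:
        if score >= best * (1 + min_rel_improvement):
            # Relative increase of at least 0.5%: accept as the new best.
            best = score
            epochs_since_best = 0
        else:
            epochs_since_best += 1
        if epochs_since_best >= patience:
            return True
    return False


# Small patience for illustration: one real gain, then three negligible ones.
stalled = should_stop([1.0, 1.01, 1.011, 1.012, 1.013], patience=3)
still_running = should_stop([1.0, 1.01, 1.011, 1.012], patience=3)
```

Gains below the 0.5% threshold do not reset the counter, so a slowly creeping score still triggers the stop.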

During training, a batch size of 5 2D+time sequences was used. The momentum of the optimizer was set to 0.90 and the learning rate was \(10^{-4}\). The parameters of the convolutional and fully-connected layers were initialised randomly from a zero-mean Gaussian distribution. In each trial, training was continued until the network converged. Convergence was defined as a state in which no substantial progress was observed in the training loss. We used PyTorch for implementation of the network and training took around 3 h on an NVIDIA Quadro P6000 GPU. After training, deployment of the network to correct a single image sequence took less than 1 s.

4 Experimental Results

We evaluated our algorithm on a subset of the UK Biobank dataset containing 300 datasets, each a good quality 2D+time cine CMR acquisition of 50 temporal frames at a mid-ventricular short-axis slice. For each subject, the 50 temporal frames were used to generate synthetic motion artefacts. We used 200 datasets for training, 50 for validation and 50 for testing. The 300 subjects were chosen to be free of other types of image quality issues, such as missing axial slices, and were visually verified by an expert cardiologist for sufficient image quality. The details of the acquisition protocol of the UK Biobank dataset can be found in [11].

Data Preprocessing: Given a 2D+time cine CMR sequence of images, we first normalise the pixel values to lie between 0 and 1. Since the image dimensions vary from subject to subject, instead of resizing the images we use a motion information based ROI extraction to crop them to \(64\times 64\) pixels [5]. Briefly, the ROI is determined by performing a Fourier analysis of the sequences of images, followed by a circular Hough transform to highlight the center of periodically moving structures.
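The intensity normalisation step can be sketched as below. The text does not say whether the scaling is computed per frame or per sequence; this sketch assumes min-max scaling over the whole sequence, and the function name `normalise_sequence` is illustrative. The ROI extraction of [5] is not reproduced here.

```python
import numpy as np


def normalise_sequence(seq):
    """Min-max scale a (T, H, W) sequence to [0, 1] over the whole sequence."""
    seq = np.asarray(seq, dtype=np.float64)
    lo, hi = seq.min(), seq.max()
    if hi == lo:
        # Constant input: return zeros rather than divide by zero.
        return np.zeros_like(seq)
    return (seq - lo) / (hi - lo)


raw = np.array([[[2.0, 4.0], [6.0, 10.0]]])   # one tiny 2 x 2 frame
scaled = normalise_sequence(raw)
```

Scaling over the whole sequence keeps relative intensities between frames intact, which matters when k-space lines are later exchanged between frames.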

K-space Corruption for Synthetic Data: We generated k-space corrupted data in order to simulate motion artefacts. We followed a Cartesian sampling strategy for k-space corruption to generate synthetic but realistic motion artefacts [8, 10]. We retrospectively transformed each 2D short axis sequence to the Fourier domain and changed a number (0, 2, 4, 8, 16) of Cartesian sampling k-space lines to the corresponding lines from other cardiac phases to mimic motion artefacts. These lines were selected randomly in order to mimic the randomness of real motion artefacts. The newly introduced k-space lines were also selected randomly from all other frames in the image sequence. In this way CMR images with artefacts were generated from the original ‘good quality’ images in the training set. This is a realistic approach as the motion artefacts that occur from mis-triggering arise from similar displacement (in time) of an arbitrary set of k-space lines.
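The corruption procedure can be sketched in NumPy as follows. This is an illustrative reading of the description above, not the authors' code: it assumes that the stated number of lines is replaced in every frame, that k-space rows correspond to the Cartesian lines, and that the magnitude image is kept after the inverse transform. The function name `corrupt_kspace_lines` and the returned ground-truth mask are hypothetical conveniences.

```python
import numpy as np


def corrupt_kspace_lines(sequence, n_lines, rng):
    """Swap random k-space lines of each frame with lines from other frames.

    sequence: (T, H, W) real-valued image sequence with T >= 2
    n_lines:  number of lines replaced per frame (per-frame is an assumption)
    Returns the corrupted magnitude images and a (T, H) boolean mask that is
    True where a line was replaced.
    """
    n_frames, n_rows, _ = sequence.shape
    kspace = np.fft.fft2(sequence, axes=(-2, -1))
    corrupted = kspace.copy()
    mask = np.zeros((n_frames, n_rows), dtype=bool)
    for t in range(n_frames):
        # Randomly pick which lines of this frame to replace.
        rows = rng.choice(n_rows, size=n_lines, replace=False)
        for r in rows:
            # Take the same line from a randomly chosen different frame.
            other = rng.choice([f for f in range(n_frames) if f != t])
            corrupted[t, r, :] = kspace[other, r, :]
            mask[t, r] = True
    images = np.abs(np.fft.ifft2(corrupted, axes=(-2, -1)))
    return images, mask


rng = np.random.default_rng(0)
seq = rng.random((6, 16, 16))      # stand-in for a normalised cine sequence
clean, clean_mask = corrupt_kspace_lines(seq, 0, rng)
images, mask = corrupt_kspace_lines(seq, 4, rng)
```

The returned mask provides per-line labels of the kind needed to train a line detection network, and calling the function with zero lines returns the input unchanged, matching the uncorrupted case in the list above.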

Methods of Comparison: We compared our algorithm to a variety of classes of artefact correction strategy as outlined in Sect. 2. For image-to-image artefact removal (i.e. post-reconstruction), we used a deep network based on residual learning (DNCNN) as well as a wide network with larger receptive fields and more channels in each layer as proposed in [6] (WIN5). We also compared our method to a reconstruction algorithm that uses an end-to-end correction methodology [9] (Automap-GAN) based on Automap [17]. Additionally, we compared our approach to two of its variants: (1) training the detection and reconstruction networks separately (Proposed-separate); and (2) treating the corruption mask as a pre-determined mask to illustrate the top performance achievable in this setup (Proposed-known mask).

Table 1. Mean image quality results of motion artefact correction for corrupted and uncorrupted inputs. Uncorrupted results use the correct k-space as input and indicate the potential of our method to be used as a global image reconstruction framework.

Quantitative Results: Table 1 shows the image quality metrics for the corrected images produced by each image artefact correction algorithm for corrupted and original images. For these experiments the ground truth was the uncorrupted original 2D+time image sequence, and we use peak signal to noise ratio (PSNR), root mean square error (RMSE) and structural similarity index metric (SSIM) for evaluation. The proposed end-to-end k-space detection and correction algorithm outperforms the other methods. The joint end-to-end network also performs better than separate training of the two architectures. Compared to the image-to-image denoising techniques (WIN5, DNCNN), k-space based correction (Automap-GAN) results in improved reconstructions of the images. We also show results using original images as input to illustrate the capability of our method as a general image reconstruction algorithm. Our method outperforms the other state-of-the-art techniques and, thanks to the detection network, does not diminish the image quality of the original k-space. The baseline and Proposed-known mask methods provide perfect image quality for uncorrupted images and are therefore omitted.
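Two of the three metrics are simple enough to state in a few lines. The sketch below uses the standard definitions of RMSE and PSNR; the default `data_range` of 1.0 is an assumption that follows from the normalisation of pixel values to [0, 1], and SSIM is omitted because it requires a windowed implementation.

```python
import numpy as np


def rmse(reference, estimate):
    """Root mean square error between two images or sequences."""
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))


def psnr(reference, estimate, data_range=1.0):
    """Peak signal to noise ratio in dB; data_range=1.0 assumes [0, 1] images."""
    err = rmse(reference, estimate)
    if err == 0:
        return float("inf")
    return float(20 * np.log10(data_range / err))


# Toy example: a constant error of 0.1 on a [0, 1] intensity scale.
reference = np.zeros((4, 4))
estimate = np.full((4, 4), 0.1)

error = rmse(reference, estimate)
quality = psnr(reference, estimate)
```

A constant error of 0.1 gives an RMSE of 0.1 and a PSNR of 20 dB, and an exact reconstruction gives an infinite PSNR.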

Fig. 3. The results produced by WIN5 (second column), Automap-GAN (third column) and the proposed algorithm (final column). In the first column, the top row shows the corrupted image and the bottom row shows the corresponding uncorrupted image. It is evident from the difference images in the bottom row that image quality is recovered at the septum using the proposed method.

Qualitative Results: In Fig. 3 we illustrate the performance of our technique on artefact correction in comparison to the top state-of-the-art techniques [6, 9]. The difference image shows improved image quality with the proposed technique, especially in the left ventricular and right ventricular regions and with regard to the sharpness of the myocardial boundaries. These results demonstrate that the network reduced the impact of k-space corruption on reconstruction quality, as the (beating) ventricles and their myocardial borders are mostly affected by such corruption.

5 Discussion and Conclusion

In this paper, we have proposed a CNN-based technique for correcting motion-related artefacts in a large 2D+time CINE CMR dataset. We have addressed the issue of incorrect k-space lines using a combined architecture to detect, correct and reconstruct images. The proposed network clearly outperforms competing algorithms. Moreover, the current architecture can also be used as a global image reconstructor, as we have shown that it does not diminish the quality of uncorrupted images, compared to the original reconstruction performed on the scanner.

We have shown for the first time that a 3D CNN based neural network architecture is capable of classifying k-space lines that cause motion artefacts. Our work brings fully automated assessment of ventricular function from CMR imaging a step closer to clinically acceptable standards by enabling reconstruction of high quality images from data containing artefacts, so that they can be analysed in large imaging datasets such as the UK Biobank. In future work, we plan to validate our method on the entire UK Biobank cohort, which is eventually expected to comprise 100,000 CMR images.