1 Introduction

Cardiac magnetic resonance imaging (MRI) is the gold standard modality for the assessment of cardiac anatomy and function [30]. In current clinical practice, most cine MRI protocols acquire a set of 2D image slices that intersect the underlying anatomy at multiple spatial locations and orientations. However, this only provides an approximate representation of the true 3D shape of the human heart, which limits the accuracy of cardiac disease diagnosis [12, 14, 21, 31]. Accordingly, many previous works have investigated methods to reconstruct complete 3D cardiac surfaces from the acquired 2D slices [3, 5, 19, 20, 32]. This is a challenging task, primarily due to the high levels of sparsity in the 2D image acquisitions and the presence of slice misalignment induced by patient, respiratory, and cardiac motion [4]. Previous approaches have tackled these challenges by first segmenting the acquired 2D images into the cardiac substructures of interest, and then fitting a template mesh [17, 19, 20, 29] or optimized surface mesh [3, 5, 32] to the resulting contours in a per-case regularized optimization procedure.

More recently, deep learning techniques have increasingly been employed in various parts of the reconstruction process, with the key benefits of faster execution and easier scalability once network training is complete. However, these methods either rely on highly inefficient voxel grid representations of anatomical surface data [11, 34], lack validation on real datasets [34], require additional preprocessing and postprocessing steps that are complex and error-prone [6, 11], can only process single image inputs [33, 36], or have only been evaluated on a small number of real cases [6]. In this work, we present Point2Mesh-Net, a novel geometric deep learning approach to convert sparse and misaligned MRI contours to dense cardiac surface meshes in a fully automated and efficient manner. It combines a point cloud-specific encoder and a mesh-specific decoder in a hierarchical design, for effective multi-scale feature learning on highly efficient representations of the anatomical surface data in a multi-domain setting. The tailored encoder architecture can directly process 3D point cloud representations of MRI contours and hence can be smoothly integrated into a multi-step surface reconstruction pipeline. Furthermore, the decoder outputs high-resolution cardiac meshes with shared vertex connectivity across the dataset, which are suitable for a variety of follow-up 3D cardiac modeling tasks [7, 8, 9, 10, 12, 14, 18, 20, 28].

2 Methods

2.1 Overview

We present an overview of the proposed Point2Mesh-Net combined with its training data preparation and its application as part of a multi-step cardiac reconstruction pipeline in Fig. 1.

Fig. 1. Overview of the purpose-built training dataset and the proposed Point2Mesh-Net embedded in a multi-step cardiac surface reconstruction pipeline. First, we create a synthetic dataset of sparse and misaligned point cloud MRI contours and the corresponding high-resolution cardiac surface meshes from a 3D MRI-based statistical shape model (SSM) (A) (Sect. 2.2). Next, we use this dataset to train the proposed Point2Mesh-Net (B) (Sect. 2.3). Finally, we apply the pre-trained Point2Mesh-Net as the key component of a three-step pipeline for efficient 3D cardiac shape reconstruction from cine MR images (C) (Sect. 3.2).

We first create a synthetic dataset based on high-resolution 3D MRI acquisitions to obtain both ground truth meshes and sparse and misaligned point cloud contours (Fig. 1-A) (Sect. 2.2). We then use this dataset to train the Point2Mesh-Net and conduct an initial validation (Fig. 1-B) (Sect. 2.3). Finally, we apply the pre-trained Point2Mesh-Net as the key component of a three-step cardiac shape reconstruction pipeline from raw cine MRI acquisitions in a cross-domain transfer setting (Fig. 1-C) (Sect. 3.2).

2.2 Datasets and Preprocessing

We use two datasets in this work: a synthetic dataset for network training and an initial evaluation, and a real dataset. The synthetic dataset is derived from a statistical shape model (SSM) based on 3D MRI acquisitions of over 1000 subjects [1]. We first randomly generate 250 meshes from the SSM [1]. For each mesh, we randomly sample from a standard normal distribution along the first 99 modes of variation and apply the resulting deformations to the mean template mesh. Since the original 3D magnetic resonance (MR) images have a relatively high voxel resolution of \(1.25 \times 1.25 \times 2\) mm compared to the UK Biobank’s cine MRI protocol (\(1.8 \times 1.8 \times 8.0\) mm), we consider the resulting SSM-based meshes as our high-resolution ground truth for network training in this work. To obtain the sparse and misaligned input contours, we first determine the typical location and orientation of the short-axis (SAX), 2-chamber long-axis (LAX), and 4-chamber LAX planes of a standard cine MRI protocol for each of the deformed meshes. We then introduce random translations and rotations to each plane in a way that mimics the motion-induced misalignment of real acquisitions. We repeat this procedure 10 times for each of the 250 deformed meshes, resulting in a total of 2500 input point clouds in our SSM dataset. We randomly split the 250 meshes into train, validation, and test datasets with a ratio of 75%/5%/20%, respectively, and assign the 10 misaligned point clouds of each mesh to the dataset of that mesh. Finally, we extract three separate datasets, one for each of the anatomical substructures: left ventricular (LV) endocardium, LV epicardium, and right ventricular (RV) endocardium.
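The sampling and splitting steps described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the SSM arrays (`mean_shape`, `modes`, `std_devs`) are random stand-ins with made-up sizes, scaling each standard-normal weight by a per-mode standard deviation is an assumption, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a PCA-based SSM (random data, illustrative sizes only).
n_vertices, n_modes = 200, 99
mean_shape = rng.normal(size=(n_vertices, 3))        # mean template vertices
modes = rng.normal(size=(n_modes, n_vertices * 3))   # one mode of variation per row
std_devs = np.linspace(5.0, 0.1, n_modes)            # assumed per-mode standard deviations


def sample_ssm_meshes(n_meshes):
    """Draw standard-normal mode weights and deform the mean template."""
    weights = rng.standard_normal((n_meshes, n_modes))
    # Assumption: each weight is scaled by the standard deviation of its mode.
    offsets = (weights * std_devs) @ modes            # (n_meshes, n_vertices * 3)
    return mean_shape[None] + offsets.reshape(n_meshes, n_vertices, 3)


def split_by_mesh(n_meshes, n_variants, fractions=(0.75, 0.05, 0.20)):
    """Split mesh IDs first, then assign all variants of a mesh to its subset."""
    order = rng.permutation(n_meshes)
    n_train = int(fractions[0] * n_meshes)
    n_val = int(fractions[1] * n_meshes)
    subsets = {
        "train": order[:n_train],
        "val": order[n_train:n_train + n_val],
        "test": order[n_train + n_val:],              # remainder goes to the test set
    }
    # Each sample is identified by a (mesh_id, variant_id) pair.
    return {
        name: [(int(m), v) for m in ids for v in range(n_variants)]
        for name, ids in subsets.items()
    }


meshes = sample_ssm_meshes(250)
splits = split_by_mesh(n_meshes=250, n_variants=10)
print(meshes.shape, {name: len(samples) for name, samples in splits.items()})
```

Splitting at the mesh level rather than the point cloud level keeps all 10 misaligned variants of a shape in the same subset, so no test shape is seen during training.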

As our real dataset, we choose 1000 cine MRI acquisitions from the UK Biobank study [24] with equal representation of female and male cases. We then pass the raw images through the first two steps of our reconstruction pipeline (Fig. 1-C) to obtain point cloud contours in a suitable format for the Point2Mesh-Net. To this end, we first delineate the contours of the LV endocardium, LV epicardium, and RV endocardium. For the SAX and 4-chamber LAX images, we apply the pre-trained fully convolutional neural networks from [2]. For the 2-chamber LAX images, we apply a conditional generative adversarial network with a U-Net generator trained on an in-house annotated UK Biobank dataset. In the second pipeline step, we combine the resulting contours from all views and slices and place them in 3D space as separate point clouds for each anatomical substructure [3], analogous to the input data used by the network on the SSM dataset.
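To illustrate the second pipeline step, the sketch below maps 2D contour pixels of a single slice to 3D coordinates. The exact procedure follows [3] and is not detailed here, so the sketch rests on an assumption: each slice comes with DICOM-style geometry (image position, image orientation, and pixel spacing), and the standard DICOM pixel-to-patient mapping applies. The function name `contour_to_3d` and all numerical values are hypothetical.

```python
import numpy as np


def contour_to_3d(pixels_rc, image_position, image_orientation, pixel_spacing):
    """Map (row, col) contour pixels of one slice to 3D patient coordinates in mm."""
    pixels_rc = np.asarray(pixels_rc, dtype=float)
    origin = np.asarray(image_position, dtype=float)           # position of pixel (0, 0)
    row_dir = np.asarray(image_orientation[:3], dtype=float)   # direction of increasing column index
    col_dir = np.asarray(image_orientation[3:], dtype=float)   # direction of increasing row index
    row_spacing, col_spacing = pixel_spacing                   # mm between rows, mm between columns
    rows, cols = pixels_rc[:, 0:1], pixels_rc[:, 1:2]
    return origin + cols * col_spacing * row_dir + rows * row_spacing * col_dir


# Made-up example: an axis-aligned slice with 1.8 mm in-plane resolution.
slice_points = contour_to_3d(
    pixels_rc=[(2, 3), (2, 4)],
    image_position=(10.0, 20.0, 30.0),
    image_orientation=(1, 0, 0, 0, 1, 0),
    pixel_spacing=(1.8, 1.8),
)
print(slice_points)
```

Under this assumption, the point cloud of one substructure would be obtained by applying the mapping to every SAX and LAX slice and stacking the results, for example with `np.vstack`.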

2.3 Network Architecture

The architecture of the proposed Point2Mesh-Net combines recent advances in point cloud and mesh-based geometric deep learning in a hierarchical encoder-decoder structure (Fig. 2).

The network inputs are sparse and misaligned MRI contours represented as point clouds in 3D space with 900 points. They are passed into an encoder consisting of two stacked PointNet layers, which are inspired by the PointNet [25], PointNet++ [26], and Point Completion Network [35] architectures, and a multilayer perceptron (MLP) to allow for step-wise multi-scale feature extraction directly on point cloud data. More specifically, we first apply two 1D convolutions connected by a batch normalization layer and a rectified linear unit activation function to the input point clouds as part of a Point Conv Block. Then, a per-point max pooling and an expansion operation are applied, and the output is concatenated with the result of the first Point Conv Block. Next, a second Point Conv Block followed by a point-wise maxpool layer is used before the final output of the point cloud encoder is generated by the shared MLP. The resulting latent space vector of size \(1 \times 128\) acts as a low-dimensional representation of the respective cardiac input shape and is passed to the network decoder. Its structure combines an initial MLP with four levels of spectral graph convolutions [13] in a hierarchical setup. Each graph convolution layer is followed by a rectified linear unit activation function, and four mesh upsampling layers [27] are used to increase the mesh resolution with each successively higher level in the decoder. This setup enables effective automatic feature learning on both a local and global scale and results in a gradual conversion of the low-dimensional latent space vector to a high-resolution 3D surface mesh as a dense representation of cardiac anatomy with corrected misalignment. Each output mesh consists of 1780 vertices and a vertex-to-vertex connectivity consistent across the entire dataset. 
We use the Chebyshev polynomial approximation [13] with order 5 for all spectral graph convolutions and quadric error minimization [27] to determine all mesh upsampling operations. In total, the network has approximately \(4\times 10^5\) trainable parameters.
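As an illustration of the spectral graph convolutions used in the decoder, the following NumPy sketch applies a Chebyshev-approximated filter [13] to per-vertex features. It is not the PyTorch Geometric code used in this work. The toy graph, the feature sizes, the random weights, and the reading of "order 5" as five polynomial terms (\(k = 0, \dots, 4\)) are all assumptions made for the example.

```python
import numpy as np


def scaled_laplacian(adjacency):
    """Return L_hat = 2 L / lambda_max - I for the symmetric normalized Laplacian L."""
    degree = adjacency.sum(axis=1)            # assumes no isolated vertices
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    identity = np.eye(len(adjacency))
    laplacian = identity - d_inv_sqrt[:, None] * adjacency * d_inv_sqrt[None, :]
    lambda_max = np.linalg.eigvalsh(laplacian).max()
    return 2.0 * laplacian / lambda_max - identity


def cheb_conv(x, l_hat, weights):
    """Compute sum_k T_k(L_hat) x W_k using the Chebyshev recurrence."""
    t_prev, t_curr = x, l_hat @ x             # T_0 x and T_1 x
    out = t_prev @ weights[0]
    if len(weights) > 1:
        out = out + t_curr @ weights[1]
    for w_k in weights[2:]:
        # T_k = 2 L_hat T_{k-1} - T_{k-2}
        t_prev, t_curr = t_curr, 2.0 * (l_hat @ t_curr) - t_prev
        out = out + t_curr @ w_k
    return out


rng = np.random.default_rng(0)
adjacency = np.ones((4, 4)) - np.eye(4)       # toy graph: vertices of a tetrahedron
l_hat = scaled_laplacian(adjacency)
x = rng.normal(size=(4, 3))                   # 3 input features per vertex
weights = [rng.normal(size=(3, 8)) for _ in range(5)]     # five terms, 8 output features
features = np.maximum(cheb_conv(x, l_hat, weights), 0.0)  # followed by a ReLU
print(features.shape)
```

Because each term only multiplies by the sparse Laplacian once more, a filter with \(K\) terms aggregates information from vertices up to \(K-1\) edges away, which is what gives each decoder level its local receptive field.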

2.4 Training and Implementation

We train the Point2Mesh-Net with a vertex-wise mean squared error loss function using the Adam optimizer [16] and a batch size of 8, until no improvement on the validation dataset is observed for 10 epochs. The deep learning code is implemented using the PyTorch [22] and PyTorch Geometric [15] frameworks. All training and evaluation steps are run on a CPU with 8 GB memory. We record an average network training time of approximately 5 h and an average per-case inference time of 0.2 s.
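The loss and stopping criterion can be sketched in a framework-agnostic way as follows. This is a simplified illustration under stated assumptions, not the training code of this work: `vertex_mse` and `EarlyStopping` are hypothetical helper names, and the validation losses are made-up numbers that stand in for a real training loop.

```python
import numpy as np


def vertex_mse(pred, target):
    """Mean squared error over all vertices and coordinates of two (N, 3) arrays."""
    return float(np.mean((pred - target) ** 2))


class EarlyStopping:
    """Stop once the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Record one epoch and return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Made-up validation losses: three improving epochs, then a plateau.
val_losses = [1.0, 0.8, 0.7] + [0.75] * 20
stopper = EarlyStopping(patience=10)
stopped_epoch = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_epoch = epoch
        break
print(stopped_epoch, stopper.best_loss)
```

The vertex-wise loss is only meaningful because all output meshes share the same vertex connectivity, so vertex \(i\) of a prediction can be compared directly with vertex \(i\) of its ground truth mesh.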

Fig. 2. Architecture of the proposed Point2Mesh-Net. A sparse and misaligned point cloud with 900 points and a dense 3D mesh with corrected misalignment constitute the network inputs and outputs, respectively. The architecture consists of an encoder and a decoder connected by a low-dimensional latent space vector. The encoder and decoder are specifically designed for direct and efficient geometric deep learning-based processing of point cloud and mesh data, respectively.

3 Experiments and Results

3.1 Surface Reconstruction on Synthetic Dataset

As our first experiment, we aim to assess the ability of the proposed Point2Mesh-Net to accurately reconstruct different cardiac shapes with different types of misalignment on the synthetic SSM dataset. We therefore train three separate networks, one for each of the three cardiac substructures, on the respective training datasets and then apply each of them to the corresponding unseen test dataset. Figure 3 depicts the results obtained by the network for three sample cases of the LV endocardial, LV epicardial, and RV endocardial data, respectively.

Fig. 3. Qualitative reconstruction results of 3 sample cases from the SSM dataset.

We observe that the predicted meshes closely resemble the corresponding ground truth ones on both a local and global level. The results are consistent across different cardiac shapes, types of misalignment and sparsity, and the three different cardiac substructures. The largest reconstruction errors are typically found in the basal area of the ventricular anatomy.

In addition to a qualitative comparison, we also want to quantify the reconstruction quality achieved by our network. To this end, we select the median surface distance, the mean surface distance, and the Hausdorff distance between the predicted and ground truth meshes as our evaluation metrics and report the results on the unseen test dataset for each of the three cardiac substructures in Table 1. All metrics are calculated directly on the meshes output by the network without applying any post-processing steps. For each subject, we use the average scores of all 10 variations of input contours in our calculations.
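One possible way to compute these three metrics is sketched below. The exact distance definitions are not specified in the text, so the sketch makes two assumptions: distances are measured between mesh vertices (a point-to-point approximation of the point-to-surface distance), and they are evaluated symmetrically in both directions. The function names and the toy coordinates are hypothetical.

```python
import numpy as np


def nearest_distances(source, target):
    """Distance from every source point to its nearest target point."""
    diff = source[:, None, :] - target[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def surface_metrics(pred_vertices, gt_vertices):
    """Symmetric median, mean, and Hausdorff distances between two vertex sets."""
    d_pred = nearest_distances(pred_vertices, gt_vertices)
    d_gt = nearest_distances(gt_vertices, pred_vertices)
    both = np.concatenate([d_pred, d_gt])
    return {
        "median": float(np.median(both)),
        "mean": float(both.mean()),
        "hausdorff": float(max(d_pred.max(), d_gt.max())),
    }


# Toy example with two vertices per "mesh" (coordinates in mm, made up).
gt = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
pred = np.array([[0.0, 0.0, 1.0], [10.0, 0.0, 3.0]])
metrics = surface_metrics(pred, gt)
print(metrics)
```

A per-subject score would then be the average of each metric over the 10 input variations of that subject, as described above.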

Table 1. Reconstruction results of Point2Mesh-Net on the synthetic dataset.

We find that the median and mean surface distances are below the pixel size of the underlying image acquisitions and that network performance differs only slightly between the anatomical substructures.

3.2 Surface Reconstruction Pipeline on Real Dataset

After the evaluation of the Point2Mesh-Net on the synthetic SSM dataset, we also want to analyze its applicability to real data as part of a multi-step reconstruction pipeline. Accordingly, we execute steps 1 and 2 of the pipeline for 1000 cases of the UK Biobank dataset and then pass the resulting sparse and misaligned contour point clouds through the Point2Mesh-Net pre-trained on the SSM dataset. We conduct this procedure separately for each cardiac substructure with the pertinent pre-trained networks and depict the results for three sample cases in Fig. 4.

Fig. 4. Qualitative reconstruction results of 3 sample cases from the UK Biobank dataset.

We observe realistic and smooth 3D mesh reconstructions without noticeable misalignment. They accurately capture the varied shapes of the corresponding input point clouds, both globally and locally, for a variety of misalignment types. Reconstruction quality is similar across the different cardiac substructures.

The UK Biobank dataset lacks ground truth 3D shapes. To nevertheless evaluate the reconstruction performance of our method quantitatively, we calculate multiple commonly used clinical metrics and compare the results with two large-scale population studies. More specifically, we compute the LV volume, LV mass, and RV volume based on the 3D meshes reconstructed by the Point2Mesh-Net for 500 female and 500 male cases of the UK Biobank dataset. As our benchmarks, we use the results obtained by a 2D slice-based calculation [23] and a 3D MRI-based computation approach [1]. All values are reported in Table 2.
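As an example of how such metrics can be derived from a surface mesh, the sketch below computes the enclosed volume of a triangle mesh by summing signed tetrahedron volumes, and a mass from the difference between two volumes. This illustrates one common approach and is not necessarily the computation used in this work. It assumes a closed (capped) mesh with consistently oriented faces, vertex coordinates in mm, and a myocardial density of 1.05 g/mL; the function names are hypothetical.

```python
import numpy as np


def mesh_volume_ml(vertices_mm, faces):
    """Volume enclosed by a closed, consistently oriented triangle mesh, in mL."""
    v0, v1, v2 = (vertices_mm[faces[:, i]] for i in range(3))
    # Each face forms a tetrahedron with the origin; signed volumes sum to the total.
    signed = np.einsum("ij,ij->i", v0, np.cross(v1, v2)).sum() / 6.0
    return abs(signed) / 1000.0               # 1 mL = 1000 mm^3


def lv_mass_g(epi_volume_ml, endo_volume_ml, density_g_per_ml=1.05):
    """Myocardial mass from epicardial and endocardial volumes (assumed density)."""
    return (epi_volume_ml - endo_volume_ml) * density_g_per_ml


# Toy closed mesh: a tetrahedron with 10 mm edges along the axes (volume 1000/6 mm^3).
vertices = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]])
faces = np.array([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]])
volume = mesh_volume_ml(vertices, faces)
print(volume, lv_mass_g(epi_volume_ml=200.0, endo_volume_ml=100.0))
```

Because the meshes share vertex connectivity across the dataset, the same `faces` array could be reused for every subject, with only the vertex coordinates changing.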

Table 2. Comparison of clinical metrics calculated with different approaches.

We find a high degree of similarity between the results obtained by our method and the two clinical benchmarks for both mean and standard deviation values of all three metrics. Volume and mass differences between female and male cases are accurately reflected by the Point2Mesh-Net reconstructions with scores larger than the 2D slice-based approach and smaller than the 3D MRI-based calculation in all but one metric.

4 Discussion and Conclusion

In this work, we have proposed and evaluated the Point2Mesh-Net as a novel geometric deep learning approach for cardiac surface reconstruction from cine MRI contours. It achieves small average reconstruction errors below the underlying pixel resolution on a large synthetic dataset for multiple cardiac substructures, with high degrees of similarity between reconstructed and ground truth surfaces. This demonstrates not only the suitability of its architectural design for the task at hand, but also its ability to cope with a variety of cardiac shapes, misalignment types, and sparsity levels on both a local and global level. When applying the networks pre-trained on the SSM dataset to the real UK Biobank dataset as part of a multi-step reconstruction pipeline, we also find highly realistic reconstructions both individually and on a population level, with widely used clinical metrics in line with previous large-scale studies. This shows not only that the synthesized dataset adequately reflects the acquisition conditions found in real datasets, but also that the features learned by the Point2Mesh-Nets on the SSM dataset successfully transfer to the real domain and work well together with the preceding steps of a multi-step pipeline. This is crucial for a more widespread applicability of the technique in various clinical and research settings and enables a more detailed analysis of 3D cardiac shape variability in the population that goes beyond purely volume-based metrics.

The results are achieved despite the challenging combination of point cloud inputs and mesh outputs, indicating that the modality-specific design of the encoder and decoder branches can handle both the reconstruction and modality-transfer tasks with high degrees of accuracy. In addition, the output meshes exhibit vertex correspondence, which is an important requirement for many follow-up tasks, such as shape analysis with principal component analysis or graph neural networks. These characteristics are in contrast to previous deep learning approaches [6, 34] that require transformations to inefficient voxel grids or meshes as separate processing steps, with considerable negative effects on complexity, execution times, and memory requirements. While we used separate networks to reconstruct the different anatomical substructures in this work, we hypothesize that the presented architecture with its compact latent space representation can also be expanded to decode multiple cardiac structures at the same time.

Compared to per-subject optimization approaches to cardiac shape reconstruction [3, 17, 19, 20, 29, 32], the Point2Mesh-Net drastically reduces the execution time, since the feature optimization is already conducted during the training phase based on population-wide shape information. Furthermore, our approach can successfully incorporate long-axis information, which is especially beneficial for an accurate reconstruction near the apical and basal regions of the ventricles and further sets it apart from many previous techniques such as [17].