1 Introduction

With the advent of advanced medical imaging technologies, such as computed tomography (CT) and magnetic resonance (MR) imaging, non-invasive visualization of various human organs and tissues has become feasible and is widely utilized in clinical practice [8, 22]. Cardiac CT and MR imaging play important roles in the understanding of cardiac anatomy, the diagnosis of cardiac diseases [20], and multimodal visualization [9]. Potential applications range from patient-specific treatment planning, virtual surgery, and morphology assessment to biomedical simulations [2, 17]. However, traditional procedures for visualizing human organs usually require significant expert effort and can take up to dozens of hours depending on the specific organs of interest [7, 24], which makes large-cohort studies prohibitive and limits clinical applications [16].

Empowered by the strong feature extraction ability of deep neural networks (DNNs) and the parallel computing power of graphics processing units (GPUs), automated visualization of cardiac organs has been extensively explored in recent years [18, 24]. These methods typically follow a common processing flow that requires a series of post-processing steps to produce acceptable reconstruction results. Specifically, the organs of interest are first segmented from the medical imaging data. An isosurface generation algorithm, such as marching cubes [15], is then used to create 3D visualizations, typically with a staircase appearance, followed by smoothing filters to produce smooth meshes. Finally, manual corrections or connected component analyses [11] are applied to remove artifacts and improve topological correctness. The entire flow is not optimized in an end-to-end fashion, which may introduce and accumulate multi-step errors and still demands non-trivial manual effort.

In this context, automated approaches that can directly and efficiently generate cardiac shapes from medical imaging data are highly desirable. Recently, various DNN-based works [1, 3, 4, 12,13,14, 19] have delved into this topic and achieved promising outcomes. In particular, the method in [1] predicts cardiac ventricles from both cine MR and patient metadata based on statistical shape modeling (SSM). Similarly built on SSM, [4] uses 2D cine MR slices to generate five cardiac meshes. Another approach [19] employs a distortion energy to produce meshes of the aortic valves. Inspiringly, graph neural network (GNN) based methods [12,13,14] have been shown capable of simultaneously reconstructing seven cardiac organs in a single pass, producing whole-heart meshes suitable for computational simulations of cardiac function. The training processes of these methods are usually optimized via the Chamfer distance (CD) loss, a point-cloud-based evaluation metric. Such point-cloud-based losses are first computed for each individual vertex and then averaged across all vertices, and therefore do not take the overall mesh topology into consideration. This can result in suboptimal or even incorrect topology in the reconstructed mesh, which is undesirable.
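For concreteness, the per-vertex-then-average behavior of the CD loss can be sketched in a few lines of numpy (the function name is ours, for illustration only):

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (N, 3) and q (M, 3).

    It is computed per point and then averaged, so it carries no
    information about mesh connectivity or topology.
    """
    # Pairwise squared distances, shape (N, M).
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    # Nearest-neighbor terms in both directions.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

Two meshes with very different connectivity but similar vertex positions can thus attain nearly identical CD values, which is exactly the limitation motivating the surface loss below.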

To solve this issue, we introduce a novel surface loss that inherently accounts for the topology of the two meshes being compared, with the goal of optimizing the anatomical topology of the reconstructed mesh. The surface loss is defined by a computable norm on currents [6] and was originally introduced in [23] for diffeomorphic surface registration, which has extensive applicability in shape analysis and disease diagnosis [5, 21]. Motivated by its inherent ability to characterize and quantify a mesh's topology, we use it to minimize the topology-aware overall difference between a reconstructed mesh and its corresponding ground-truth mesh. Such currents-guided supervision enables effective and efficient whole-heart mesh reconstruction of seven cardiac organs, attaining high reconstruction accuracy and correct anatomical topology.

Fig. 1. Illustration of the proposed pipeline. Abbreviations are as follows: instance normalization (IN), downsampling (DS), convolution (conv), graph convolution (GC), residual convolution (res-conv), and residual graph convolution (res-GC).

2 Methodology

Figure 1 illustrates the proposed end-to-end pipeline, consisting of a voxel feature extraction module (top panel) and a deformation module (middle panel). The inputs comprise a CT or MR volume accompanied by seven initial spherical meshes. Note that the seven initial spherical meshes are identical across all training and testing cases. A volume encoder followed by a decoder serves as the voxel feature extraction module, which is supervised by a segmentation loss comprising binary cross entropy (BCE) and Dice terms. This ensures that the extracted features explicitly encode the characteristics of the regions of interest (ROIs). For the deformation module, a GNN is utilized to map the coordinates of the mesh vertices, combine and map the trilinearly-interpolated voxel features indexed at each mesh vertex, extract mesh features, and deform the initial meshes to reconstruct the whole-heart meshes. Three deformation blocks progressively deform the initial meshes. Each deformation block is optimized with three types of losses: a surface loss for both accuracy and topological correctness, a point cloud loss for accuracy, and three regularization losses for smoothness. The network structures of the two modules are detailed in the supplementary material.

An input CT or MR volume first passes through the voxel feature extraction module, which predicts a binary segmentation of the to-be-reconstructed ROIs. Meanwhile, the initial spherical meshes enter the first deformation block along with the trilinearly-interpolated voxel features, and the block predicts the vertex-wise displacements of the initial meshes. The updated meshes then go through the following blocks for subsequent deformations, and the third deformation block finally outputs the reconstructed whole-heart meshes. The three deformation blocks follow the same process, except for the meshes they deform and the trilinearly-interpolated voxel features they operate on. In the first deformation block, we use the high-level voxel features, \(f_{3}\) and \(f_{4}\), obtained from the deepest layers of the volume encoder. In the second deformation block, the middle-level voxel features, \(f_{1}\) and \(f_{2}\), are employed. The input meshes of the last deformation block are usually already quite accurate and only need to be locally refined; thus, low-level voxel features are employed to guide this refinement.
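As an illustration of how voxel features can be indexed at continuous mesh-vertex locations, the following is a minimal numpy sketch of trilinear interpolation (the function name is ours; in practice a GPU operator such as PyTorch's `grid_sample` would typically be used):

```python
import numpy as np

def trilinear_sample(vol, pts):
    """Sample a scalar feature volume vol (D, H, W) at continuous
    points pts (N, 3) given in voxel coordinates."""
    idx0 = np.floor(pts).astype(int)        # lower corner of each cell
    frac = pts - idx0                       # fractional position in the cell
    out = np.zeros(len(pts))
    for corner in range(8):
        # Binary offset (0/1 per axis) selecting one of the 8 cell corners.
        offs = np.array([(corner >> i) & 1 for i in range(3)])
        # Trilinear weight: product of frac or (1 - frac) per axis.
        w = np.prod(np.where(offs == 1, frac, 1.0 - frac), axis=1)
        c = np.clip(idx0 + offs, 0, np.array(vol.shape) - 1)
        out += w * vol[c[:, 0], c[:, 1], c[:, 2]]
    return out
```

A vertex lying exactly at a voxel center recovers that voxel's feature value, while intermediate positions blend the eight surrounding voxels.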

Surface Representation as Currents. In line with [23], we employ a generalized distribution from geometric measure theory, namely currents [6], to represent surfaces. Specifically, surfaces are represented as objects in a linear space equipped with a computable norm. A triangular mesh S embedded in \(\mathbb {R}^{3}\) can be associated with a linear functional on the space of 2-forms via the following equation

$$\begin{aligned} S(\omega ) = \int _{S}\omega (x)(u_{x}^{1},u_{x}^{2})d\sigma (x), \end{aligned}$$
(1)

where, for each \(x\in {S}\), \(u_{x}^{1}\) and \(u_{x}^{2}\) form an orthonormal basis of the tangent plane at x. \(\omega (x)\) is a skew-symmetric bilinear function on \(\mathbb {R}^{3}\), and \(d\sigma (x)\) represents the basic element of surface area. Subsequently, a surface can be represented as currents via the following expression

$$\begin{aligned} S(\omega ) = \sum _{f}\int _{f} \bar{\omega }(x)\cdot (u_{x}^{1}\times u_{x}^{2})d\sigma _{f} (x), \end{aligned}$$
(2)

where \(S(\omega )\) denotes the currents representation of the surface. f denotes each face of S and \(\sigma _{f}\) is the surface measure on f. \(\bar{\omega }(x)\) is the vectorial representation of \(\omega (x)\), with \(\cdot \) and \(\times \) respectively representing dot product and cross product. After the currents representation is established, an approximation of \(\omega \) over each face can be obtained by using its value at the face center.

Let \(f_{v}^{1}\), \(f_{v}^{2}\), and \(f_{v}^{3}\) denote the three vertices of a face f. Then \(e^1=f_{v}^{2}-f_{v}^{3}\), \(e^2=f_{v}^{3}-f_{v}^{1}\), and \(e^3=f_{v}^{1}-f_{v}^{2}\) are its edges, \(c(f) = \frac{1}{3}(f_{v}^{1}+f_{v}^{2}+f_{v}^{3})\) is the center of the face, and \(N(f)=\frac{1}{2}(e^{2} \times e^{3})\) is the normal vector of the face, whose length equals the face area. \(\omega \) can thus be approximated over the face by its value at the face center, resulting in \(S(\omega )\approx \sum _{f}\bar{\omega }(c(f))\cdot N(f)\). In fact, this approximation is a sum of linear evaluation functionals \(C(S)=\sum _{f}\delta _{c(f)}^{N(f)}\) associated with a Reproducing Kernel Hilbert Space (RKHS) under the constraints presented in [23]. Thus, \(S_{\varepsilon }\), the discrepancy between two surfaces S and T, can be approximately calculated via the RKHS as below

$$\begin{aligned} \begin{aligned} S_{\varepsilon } = \left| \left| C(S)-C(T) \right| \right| ^{2}_{W^{*}}&=\sum _{f,g}N(f)^{T}k_{W}(c(g),c(f))N(g) \\&- 2\sum _{f,q}N(f)^{T}k_{W}(c(q),c(f))N(q) \\ {}&+ \sum _{q,r}N(q)^{T}k_{W}(c(q),c(r))N(r), \end{aligned} \end{aligned}$$
(3)

where \(W^{*}\) is the dual space of a Hilbert space \((W,\left\langle \cdot ,\cdot \right\rangle _{W})\) of differential 2-forms and \(\left| \left| \cdot \right| \right| ^{2}_{W^{*}}\) denotes the squared norm on \(W^{*}\). \(()^{T}\) denotes the transpose operator. f and g index the faces of S, while q and r index the faces of T. \(k_{W}\) is an isometry between \(W^{*}\) and W, and we have \(\left\langle \delta _{x}^{\xi },\delta _{y}^{\eta }\right\rangle _{W^{*}}=k_{W}(x,y)\xi \cdot \eta \) [23]. The first and third terms enforce the structural integrity of the two surfaces, while the middle term penalizes the geometric and spatial discrepancies between them. With this preferable property, Eq. 3 fulfills the topology correctness purpose, which is key to the proposed pipeline.
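The per-face quantities \(c(f)\) and \(N(f)\) defined above can be computed as in the following numpy sketch (the function name is ours, for illustration):

```python
import numpy as np

def face_centers_normals(verts, faces):
    """Compute c(f) and N(f) for a triangular mesh.

    verts: (V, 3) vertex coordinates; faces: (F, 3) vertex indices.
    N(f) has length equal to the face area, as defined in the text.
    """
    v1, v2, v3 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    centers = (v1 + v2 + v3) / 3.0
    e2 = v3 - v1          # e^2 = f_v^3 - f_v^1
    e3 = v1 - v2          # e^3 = f_v^1 - f_v^2
    normals = 0.5 * np.cross(e2, e3)
    return centers, normals
```

For a unit right triangle in the xy-plane, this yields a normal of length 0.5 along z, matching the face area.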

Surface Loss. As in [23], we choose a Gaussian kernel as the instance of \(k_{W}\), namely \(k_{W}(x,y)=\exp (-\frac{\left\| x-y \right\| ^2}{\sigma _{W}^{2}})\), where x and y are the centers of two faces and \(\sigma _{W}\) is a scale parameter that controls the range over which the two faces affect each other. The surface loss can therefore be expressed as

$$\begin{aligned} \begin{aligned} L_{surface}&=\sum _{t_{1},t_{2}}\exp (-\frac{\left\| c(t_{1}) - c(t_{2}) \right\| ^2}{\sigma _{W}^{2}})N(t_{1})^{T} N(t_{2}) - 2\sum _{t,p}\exp (-\frac{\left\| c(t) - c(p) \right\| ^2}{\sigma _{W}^{2}})N(t)^{T} N(p) \\&+ \sum _{p_{1},p_{2}}\exp (-\frac{\left\| c(p_{1}) - c(p_{2}) \right\| ^2}{\sigma _{W}^{2}})N(p_{1})^{T} N(p_{2}), \end{aligned} \end{aligned}$$
(4)

where \(t_{1}\), \(t_{2}\), t and \(p_{1}\), \(p_{2}\), p respectively index faces on the reconstructed surfaces \(S_{R}\) and on the corresponding ground truth surfaces \(S_{T}\). \(L_{surface}\) considers not only each face on the surfaces but also its direction. When the reconstructed surfaces exactly match the ground truth, the surface loss \(L_{surface}\) is 0; otherwise, \(L_{surface}\) is a bounded positive value [23]. Minimizing \(L_{surface}\) therefore drives the reconstructed surfaces progressively closer to the ground truth as training proceeds.
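Under the Gaussian kernel choice, Eq. 4 can be sketched as a naive \(O(F^2)\) numpy implementation (function names are ours; in training this would be written with autodiff-friendly tensor ops):

```python
import numpy as np

def currents_term(cs, ns, ct, nt, sigma):
    """One kernel term: sum_{i,j} exp(-||c_i - c_j||^2 / sigma^2) N_i^T N_j."""
    d2 = np.sum((cs[:, None, :] - ct[None, :, :]) ** 2, axis=-1)
    k = np.exp(-d2 / sigma ** 2)
    return float(np.sum(k * (ns @ nt.T)))

def surface_loss(c_r, n_r, c_t, n_t, sigma=1.0):
    """Currents-based surface loss of Eq. (4).

    c_*, n_* are the face centers and area-weighted normals of the
    reconstructed and ground-truth surfaces. The loss is zero when the
    two surfaces coincide and positive otherwise.
    """
    return (currents_term(c_r, n_r, c_r, n_r, sigma)
            - 2.0 * currents_term(c_r, n_r, c_t, n_t, sigma)
            + currents_term(c_t, n_t, c_t, n_t, sigma))
```

Since the loss is a squared RKHS norm of the difference of two currents, the three terms cancel exactly for identical inputs.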

Fig. 2. Illustration of how \(\sigma _{W}\) controls the affecting scale in the surface loss. The three surfaces represent an identical left atrium structure under different \(\sigma _{W}\) values. The red, blue, and green circles denote three face centers on the same surface. The varying colors represent the magnitudes of the effects between the red circle and all other face centers on the same surface.

Figure 2 illustrates how \(\sigma _{W}\) controls the affecting scale of a face on a surface. The three surfaces are identical meshes of a left atrium structure, differing only in the affecting scale (shown in different colors) rendered on them. The three colored circles (red, blue, and green) respectively mark the centers of three faces, and the arrowed vectors on these circles denote the corresponding face normals. The color bar ranges from 0 to 1, with 0 representing no effect and 1 the most significant effect. In Fig. 2, the blue circle is closer to the red one than the green circle is, and the effect between the red and blue circles is accordingly larger than that between the red and green ones. As \(\sigma _{W}\) decreases, the effects between the red face and the remaining faces become increasingly small. In this way, we can control the acting scale of the surface loss by changing the value of \(\sigma _{W}\). Assigning \(\sigma _{W}\) a value that covers the entire surface yields a global topology encoding of the surface, while assigning a small value that only covers neighboring faces yields a topology encoding focused on local geometry.
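The effect illustrated in Fig. 2 can be reproduced numerically with the kernel alone (a small sketch; the function name is ours):

```python
import numpy as np

def kernel_weight(x, y, sigma_w):
    """Gaussian kernel weight k_W between two face centers x and y."""
    d2 = np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return float(np.exp(-d2 / sigma_w ** 2))
```

For a fixed pair of faces, shrinking \(\sigma _{W}\) drives the weight toward 0, restricting the loss to local geometry, while a large \(\sigma _{W}\) keeps distant face pairs interacting.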

Loss Function. In addition to the surface loss introduced above, we employ two segmentation losses \(L_{BCE}\) and \(L_{Dice}\), one point cloud loss \(L_{CD}\), and three regularization losses \(L_{laplace}\), \(L_{edge}\), and \(L_{normal}\), following [13]. The total loss function can be expressed as:

$$\begin{aligned}&L_{total} = L_{seg} + L_{mesh_{1}} + L_{mesh_{2}} + L_{mesh_{3}},\end{aligned}$$
(5)
$$\begin{aligned}&L_{seg} = w_{s}(L_{BCE} + L_{Dice}),\end{aligned}$$
(6)
$$\begin{aligned}&L_{mesh} = L_{surface}^{w_{1}}\cdot L_{CD}^{w_{2}}\cdot L_{laplace}^{w_{3}}\cdot L_{edge}^{w_{4}}\cdot L_{normal}^{w_{5}}, \end{aligned}$$
(7)

where \(w_{s}\) is the weight for the segmentation loss, and \(w_{1}\), \(w_{2}\), \(w_{3}\), \(w_{4}\), and \(w_{5}\) are respectively the weights (exponents) for the surface loss, the Chamfer distance loss, the Laplace loss, the edge loss, and the normal loss. The geometric mean is adopted to combine the five individual mesh losses so as to accommodate their different magnitudes.
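The weighted geometric-mean combination of Eq. 7 can be sketched as follows (a hypothetical helper, not the authors' code; computed in log space for numerical stability):

```python
import numpy as np

def combined_mesh_loss(losses, weights):
    """Weighted geometric-mean combination of positive mesh losses (Eq. 7).

    Working in log space keeps terms of very different magnitudes
    numerically comparable: prod(l_i^w_i) = exp(sum(w_i * log(l_i))).
    """
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.exp(np.sum(weights * np.log(losses))))
```

Unlike a weighted sum, this combination is insensitive to the absolute scale of any single loss term dominating the gradient.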

\(L_{seg}\) ensures useful feature learning of the ROIs. \(L_{surface}\) enforces the integrity of the reconstructed meshes and makes them topologically similar to the ground truth. \(L_{CD}\) encourages the point cloud representation of the reconstructed meshes to be close to that of the ground truth. Additionally, \(L_{laplace}\), \(L_{edge}\), and \(L_{normal}\) are employed to encourage smoothness of the reconstructed meshes.

3 Experiments

Datasets and Preprocessing. We evaluate and validate our method on the publicly accessible MM-WHS (multi-modality whole heart segmentation) dataset [24], which contains 3D cardiac images of both CT and MR modalities. The training set provides 20 cardiac CT volumes and 20 cardiac MR volumes; the testing set offers 40 held-out cardiac CT volumes and 40 held-out cardiac MR volumes. All training and testing cases are accompanied by expert-labeled segmentations of seven heart structures: the left ventricle (LV), the right ventricle (RV), the left atrium (LA), the right atrium (RA), the myocardium of the LV (Myo), the ascending aorta (Ao), and the pulmonary artery (PA). For preprocessing, we follow [13] to perform resizing, intensity normalization, and data augmentation (random rotation, scaling, shearing, and elastic warping) for each training case. Data characteristics and preprocessing details are summarized in the supplementary material.

Evaluation Metrics. In order to compare with existing state-of-the-art (SOTA) methods, four metrics as in [13] are employed for evaluation, including Dice, Jaccard, average symmetric surface distance (ASSD), and Hausdorff distance (HD). Furthermore, intersected mesh facets are detected by TetGen [10] and used for quantifying self-intersection (SI).

Table 1. Comparisons with two SOTA methods on the MM-WHS CT test data. The MeshDeform [13] results are obtained from our self-reimplementation, while the Voxel2Mesh results are directly copied from [13] since its code has not been open sourced yet. WH denotes whole heart.

Results. We compare our method with two SOTA methods on the five evaluation metrics. Our method and MeshDeform [13] are trained on the same dataset, consisting of 16 CT and 16 MR volumes randomly selected from the MM-WHS training set with 60 augmentations each; the remaining 4 CT and 4 MR volumes are used for validation. Evaluations are performed on the encrypted testing set with the officially provided executables. We reimplement MeshDeform [13] in PyTorch according to the publicly available TensorFlow version; the Voxel2Mesh results are taken directly from [13] since its code has not been open sourced yet. Training settings are detailed in the supplementary material.

Table 1 shows evaluation results on the seven heart structures and the whole heart of the MM-WHS CT testing set. Our method achieves the best results in most entries. For SI, Voxel2Mesh performs best in most entries owing to the unpooling operations in each of its deformation steps, in which topological information is additionally used. However, as described in [13], Voxel2Mesh easily encounters out-of-memory errors because of the increasing number of vertices along the reconstruction process. More results on the MR data can be found in the supplementary material.

Fig. 3. The best and the worst cases for MeshDeform with respect to Dice, alongside our results on the same cases.

Figure 3 shows the best and the worst CT cases for MeshDeform with respect to Dice, together with our results on the same cases. Notably, the best case for MeshDeform is not the best for our method. In that best case of MeshDeform, obvious folded areas appear on the mesh of the PA, while our method yields a more satisfactory visualization. In the worst case, both methods produce unsatisfactory visualizations; however, the two structures (PA and RV) obtained from MeshDeform intersect each other, leading to significant topological errors. Our method does not exhibit such topology issues.

Ablation Study. For the ablation study, we train a model without the surface loss while keeping everything else the same. Table 2 shows the ablation analysis results on the CT data, which clearly validates the effectiveness of the surface loss.

Table 2. A comparison of reconstruction accuracy on the MM-WHS CT test data for the proposed method with and without (w.o.) the surface loss.

4 Conclusion

In this work, we propose and validate a whole-heart mesh reconstruction method incorporating a novel surface loss. Due to the intrinsic and favorable property of the currents representation, our method is able to generate accurate meshes with the correct topology.