Abstract
Systolic and diastolic registration of coronary arteries is a critical yet challenging step in coronary artery disease analysis. Most existing methods ignore the important relationship between vascular geometric shape and image contextual information in the two phases, leading to limited performance. In this paper, we propose a novel structural point registration network, which comprehensively captures both point-level geometric features and image-level semantic features as enriched feature representations to assist coronary registration. Specifically, given the systolic and diastolic CCTA images, our method improves coronary artery registration from three aspects. First, the point cloud encoder learns the spatial geometric features of the points in the 3D coronary mask to effectively capture the vascular shape representation. Second, a vision transformer (ViT) is employed to extract the image semantic information as a complementary condition of the geometric features to identify the bi-phasic correspondence of different vascular branches. Third, we design a transformer module to fuse the features across points and images to obtain the corresponding structural points in the two phases and then use structural points to guide the coronary artery registration via the thin-plate spline (TPS) method. We evaluated our method on a real-clinical dataset. Extensive experiments show that our proposed method significantly outperforms the state-of-the-art methods in coronary artery registration.
1 Introduction
Coronary artery disease (CAD) is one of the most prevalent critical cardiovascular diseases, with a mortality rate of up to 32% [18]. CAD diagnosis requires reconstructing a 3D coronary artery tree, e.g., from CCTA images, so that the diagnostic decision can be made according to the vascular anatomical information, e.g., annotations of vascular branches and vascular morphological properties [7]. However, conventional reconstruction methods exploit only the images obtained from the diastolic phase, which reveals only part of the coronary arteries [1, 15, 22]. As a result, vessel lesions may remain invisible, potentially leading to misdiagnosis.
In fact, a cardiac cycle has two phases, i.e., diastole and systole. The arteries reconstructed from the two-phase CCTA images are incomplete coronary trees, but they complement each other. By accurately aligning the arteries in both phases, the complete coronary tree can be reconstructed. Nevertheless, there are three challenges for successful coronary reconstruction. 1) Since the heart beats vigorously, its surrounding arteries can be squeezed by heart chambers and become invisible in one of the phases, easily causing misalignment of a significant number of arteries between the two phases (referred to as component variation), as indicated by the yellow (visible in diastole only) and cyan (visible in systole only) arrows in Fig. 1(a). 2) Arteries deform along with heartbeats, so their shape, size, and location may vary significantly across the two phases, causing difficulties in alignment, as demonstrated in Fig. 1(b). 3) Arteries are tiny tubular tissues, which occupy only a very small part (\(\le 0.5\%\)) of the whole CCTA image (Fig. 1(c)), causing imbalance issues for image-based registration methods.
For vessel registration, there are three main branches of methods, i.e., image-based, point-cloud-based, and hybrid-based registration. Image-based methods utilize image features to register the entire volume, and the obtained deformation field is then used to align vessels to the target space. Those methods have been extensively applied to the registration of coronary arteries [14, 16], pulmonary vessels [13, 17], cerebral vessels [10], heart chambers [11], etc. Although those methods demonstrate promising performance at the whole-image scale, the vessels are not necessarily well aligned, so the results cannot be used to reconstruct the complete coronary tree. By contrast, point-cloud-based registration directly aligns the vessels, which are first labeled or segmented from CCTA images and then modeled as point clouds for registration. For example, point-cloud networks [20, 21] or graph convolutional networks [24] commonly exploit geometric features of the vascular point cloud, which makes them more flexible and accurate than image-based methods. The main limitation of point-cloud-based methods is that geometric features alone cannot reliably distinguish the arteries, because different arteries or artery branches can share very similar morphology [12]. Similarly, the hybrid-based methods [4, 8] also extract the vessel masks from the images for registration, but the lack of effective image information limits their performance. Integrating the advantages of both domains (image and point cloud) may produce improved outcomes, but has not yet been explored.
In this paper, we propose a structural point registration network (SPR-Net) to align coronary arteries from the systolic and diastolic phases. The SPR-Net is designed to exploit both image-based and point-cloud-based features, in which the image and point cloud are encoded as intrinsic features. Additionally, we propose a transformer-based feature fusion module to fully exploit the obtained intrinsic features in extracting structural points, i.e., key points that delineate the anatomical morphology of arteries across the two phases and are solely used to compute the deformation field. Given the obtained structural points, a simple thin-plate spline [5] method is employed to align the coronary arteries of systole and diastole. Extensive experimental results demonstrate the superiority of our method over eight competing methods (Fig. 2).
2 Method
We propose the SPR-Net method, which simultaneously utilizes geometric features extracted from point clouds and image features extracted from CCTA images with the goal of generating structural points to align arteries across systole and diastole, with shape, location, and component variations. In this section, we first introduce the extraction of geometric features (Sect. 2.1), then the extraction of image features (Sect. 2.2), next the extraction of structural points and their usage in registration procedures (Sect. 2.3), and finally the loss function (Sect. 2.4).
2.1 Geometric Feature Learning
The coronary arteries share a tubular structural shape. A point cloud network can effectively learn the spatial geometric shape of arteries and provide accurate relative positional relationships of points [24], so that the obtained point features are more discriminative. Inspired by [6], we employ a point cloud encoder with the same structure as in [6], which comprises three layers (i.e., sampling layer, multi-scale grouping layer, and PointNet layer), to extract the geometric features of each point.
Given the input diastolic and systolic point clouds P and Q, we first use a sampling layer in the point cloud encoder to obtain the down-sampled points \(\bar{P}=\{\bar{p}_1,\bar{p}_2,\cdots , \bar{p}_m\}\) with \(\bar{p}_i\in R^3\) and \(\bar{Q}=\{\bar{q}_1,\bar{q}_2,\cdots , \bar{q}_m\}\) with \(\bar{q}_i\in R^3\), respectively. \(\bar{P}\) and \(\bar{Q}\) are then fed into the multi-scale grouping layer, which aggregates the neighboring points of each sampled point within different radii r. After that, the multi-scale aggregated points are fed into the PointNet layer to extract geometric features.
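The sampling and multi-scale grouping steps can be illustrated with a small NumPy sketch. It assumes farthest point sampling for the sampling layer and a ball query for the grouping layer, as is common in PointNet++-style encoders; the text does not state which sampling rule is used, and the function names `farthest_point_sampling` and `ball_group`, the toy cloud size, and the radii are illustrative choices only.

```python
import numpy as np

def farthest_point_sampling(points, m, start=0):
    """Return indices of m points chosen so that they spread over the cloud."""
    chosen = np.empty(m, dtype=int)
    chosen[0] = start
    # Distance from every point to its nearest already-chosen point.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for i in range(1, m):
        chosen[i] = int(np.argmax(min_dist))
        dist_new = np.linalg.norm(points - points[chosen[i]], axis=1)
        min_dist = np.minimum(min_dist, dist_new)
    return chosen

def ball_group(points, centers, radius):
    """For each center, list the indices of points lying within `radius`."""
    dist = np.linalg.norm(centers[:, None, :] - points[None, :, :], axis=2)
    return [np.flatnonzero(row <= radius) for row in dist]

rng = np.random.default_rng(0)
cloud = rng.random((512, 3))                 # toy cloud, coordinates in [0, 1]
center_idx = farthest_point_sampling(cloud, 64)
centers = cloud[center_idx]
# One neighbor list per center and per radius (multi-scale grouping).
groups = {r: ball_group(cloud, centers, r) for r in (0.1, 0.2, 0.4)}
```

In a full encoder, each neighbor list would then be passed through a shared PointNet-style MLP and pooled; that learned part is omitted here.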
2.2 Geometric and Image Feature Encoding
Point clouds can provide good geometric shapes and spatial location information, but they lack sufficient semantic features of coronary arteries. Meanwhile, the images contain rich contextual information that can complement the geometric features. Therefore, we design a transformer-based module to integrate both advantages. Specifically, 1) we employ a shallow 3D vision transformer (ViT) [9] to extract image features of the artery; and 2) we employ general transformers [19] to fuse image features and geometric features extracted by the point-cloud encoder.
1) Image Feature Extraction. For efficiency, we only crop image blocks of size \(h\times w \times d\), with each point as the centroid, and the ViT block is employed to extract local features. Since these blocks are extracted along the tubular structures, the extracted local features reveal intrinsic relationships. To exploit their correlations, we employ a self-attention mechanism-based transformer. The coordinates of each point serve as the position encoding, which is added to its local image feature as the input to the following transformer blocks.
\[f^{img}_i = E_i(a, b, c) + I_i,\]
where \(E_i\) and \(I_i\) respectively denote the position encoding and the image features of the i-th rectangular volume, \((a, b, c)\in R^3\) are the point coordinates, \(f^{img}_i\in R^l\) is the self-attention input of the transformer layer, and l is the feature dimension.
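As a rough illustration of the local image token described above, the following sketch crops a block around one point and adds a position encoding to a block feature. Several parts are assumptions rather than the authors' design: the zero padding at volume borders, the random linear maps `w_img` and `w_pos` (which merely stand in for the ViT block and a learned position encoder), and the toy volume size.

```python
import numpy as np

def crop_block(volume, center, size=(16, 16, 8)):
    """Crop a block of `size` voxels around voxel index `center`, zero-padded at borders."""
    half = [s // 2 for s in size]
    padded = np.pad(volume, [(h, h) for h in half])
    # After padding, original index c sits at c + h, so the window starts at c.
    window = tuple(slice(c, c + s) for c, s in zip(center, size))
    return padded[window]

def image_token(block, coord, w_img, w_pos):
    """Add a position encoding of `coord` to a feature of the cropped block."""
    image_feat = block.reshape(-1) @ w_img   # I_i: stand-in for the ViT output
    pos_enc = coord @ w_pos                  # E_i: encoding of (a, b, c)
    return pos_enc + image_feat              # f_i^img

rng = np.random.default_rng(0)
volume = rng.random((64, 64, 32))            # toy CCTA volume
center = (0, 40, 31)                         # a point on the volume border
feat_dim = 128                               # l, the feature dimension
w_img = rng.normal(size=(16 * 16 * 8, feat_dim)) * 0.01
w_pos = rng.normal(size=(3, feat_dim))

block = crop_block(volume, center)
coord = np.array(center) / np.array(volume.shape)   # coordinates scaled to [0, 1)
token = image_token(block, coord, w_img, w_pos)
```

Cropping one block per point keeps the cost proportional to the number of points rather than to the volume size, which matches the efficiency argument in the text.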
2) Geometry and Image Co-embedding. Given concatenated features of pointwise and image features, four transformer layers are employed to further explore comprehensive contextual features between the two phases. The transformer layer incorporates an encoder and decoder block, which are based on a multi-head attention mechanism. We use the concatenated features of the diastolic phase as input to the transformer encoder and decoder respectively, and the opposite for the systolic phase, to learn the feature dependencies between the two phases.
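The idea of letting each phase attend to the other can be sketched with a single scaled dot-product cross-attention step. This is a deliberately reduced illustration: it uses one head, no learned projections, no feed-forward sublayer, and random 640-D features, whereas the text describes four full encoder-decoder transformer layers with multi-head attention. The helper name `cross_attention` is hypothetical.

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention of `queries` over `keys`/`values`."""
    scale = np.sqrt(queries.shape[-1])
    scores = queries @ keys.T / scale
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights

rng = np.random.default_rng(0)
feat_dia = rng.normal(size=(256, 640))   # toy diastolic point + image features
feat_sys = rng.normal(size=(256, 640))   # toy systolic point + image features

# Each phase queries the other phase, so features become phase-aware.
dia_ctx, w_dia = cross_attention(feat_dia, feat_sys, feat_sys)
sys_ctx, w_sys = cross_attention(feat_sys, feat_dia, feat_dia)
```

Row i of `w_dia` can be read as a soft correspondence between diastolic point i and all systolic points, which is the kind of cross-phase dependency the co-embedding aims to learn.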
2.3 Registration via Structural Point Correspondences
1) Integration of Structural Points. The input of MLP is the contextual features extracted by the transformer, and the output is the probability of each point. Specifically, given the sampled points \(\bar{P}\) with the fused features \(F_P\) from diastole, we input the features into the shared MLP to generate the probability maps \(V_p = \{ v_1,v_2,\cdots ,v_k \}\) with \(v_i\in R^m\). Thus, the diastolic structural points \(S_p\) can be calculated as follows:
Note that, the systolic structural points \(S_q\) are calculated in the same way as the diastolic structural points.
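A minimal sketch of how such structural points might be computed is given below. It assumes, in the spirit of the formulation in [6], that each probability map \(v_i\) is normalized over the m sampled points (here with a softmax) and that a structural point is the probability-weighted average of the sampled points. The normalization choice and the function name `structural_points` are assumptions, not details confirmed by the text.

```python
import numpy as np

def structural_points(logits, sampled_points):
    """Turn (k, m) per-point scores into k structural points in R^3."""
    z = logits - logits.max(axis=1, keepdims=True)
    prob = np.exp(z)
    prob = prob / prob.sum(axis=1, keepdims=True)   # each map v_i sums to 1
    # Each structural point is a convex combination of the sampled points.
    return prob @ sampled_points, prob

rng = np.random.default_rng(0)
p_bar = rng.random((256, 3))             # toy down-sampled diastolic points
logits = rng.normal(size=(8, 256))       # stand-in for the shared MLP output
s_p, v_p = structural_points(logits, p_bar)

# A sharply peaked map selects (almost exactly) one sampled point.
peaked = np.zeros((1, 256))
peaked[0, 17] = 1000.0
s_peak, _ = structural_points(peaked, p_bar)
```

Because every structural point is a convex combination of sampled points, it always lies inside their convex hull, and a peaked probability map places it on the vessel itself.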
2) Structural Points based Registration using TPS. Based on the correspondence established between the structural points \(S_p\) and \(S_q\) in the two phases, we apply a simple but effective idea of the TPS method to interpolate the dense deformation field. For the two sets of structural points, \(S_p\) and \(S_q\), the nearest projection from structural points \(S_q\) to the \(S_p\) is calculated, and the \(S_q\) is warped to the \(S_p\) in the diastolic phase. Eventually, each systolic point is re-meshed by the closest point to the structural point and further warped to the original points Q using the estimated dense deformation field.
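The TPS interpolation step can be sketched as follows, under stated assumptions: the structural points are already in one-to-one correspondence, the radial kernel is \(U(r) = r\) (a usual choice for 3D thin-plate splines, whereas the 2D formulation in [5] uses \(r^2 \log r\)), and the function names `fit_tps` and `apply_tps` are illustrative. The nearest-projection and re-meshing steps described above are not reproduced.

```python
import numpy as np

def fit_tps(src, dst, reg=0.0):
    """Fit a 3D thin-plate spline that maps control points src onto dst."""
    k = src.shape[0]
    kernel = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=2)  # U(r) = r
    kernel = kernel + reg * np.eye(k)
    poly = np.hstack([np.ones((k, 1)), src])          # affine basis [1, x, y, z]
    system = np.zeros((k + 4, k + 4))
    system[:k, :k] = kernel
    system[:k, k:] = poly
    system[k:, :k] = poly.T
    rhs = np.zeros((k + 4, 3))
    rhs[:k] = dst
    params = np.linalg.solve(system, rhs)
    return params[:k], params[k:]                     # kernel weights, affine part

def apply_tps(points, src, weights, affine):
    """Warp arbitrary points with a fitted spline."""
    kernel = np.linalg.norm(points[:, None, :] - src[None, :, :], axis=2)
    poly = np.hstack([np.ones((points.shape[0], 1)), points])
    return kernel @ weights + poly @ affine

rng = np.random.default_rng(0)
s_q = rng.random((32, 3))                             # toy systolic structural points
s_p = s_q + 0.05 * np.sin(2 * np.pi * s_q)            # toy diastolic counterparts
weights, affine = fit_tps(s_q, s_p)

q_dense = rng.random((1000, 3))                       # toy dense systolic points
q_warped = apply_tps(q_dense, s_q, weights, affine)   # dense deformation applied
```

With `reg=0.0` the spline interpolates the control points exactly; a small positive `reg` would trade exact interpolation for a smoother field when correspondences are noisy.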
2.4 Loss Function
We design a structure-constrained registration loss for SPR-Net,
where,
Here \(L_{rec}\) is the Chamfer distance, and X and Y denote two point clouds. The first term \(L_{rec}(S_p, P)\) and the second term \(L_{rec}(S_q, Q)\) ensure that the predicted structural points in the two phases are close to their corresponding original point clouds. The third term \(L_{rec}(S_p, S_q)\) encourages an accurate alignment of structural points between the two phases, ensuring that structural points with the same semantics align on the same vessel branch.
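A small sketch of a three-term loss of this kind is shown below. It assumes a symmetric Chamfer distance with squared Euclidean distances and mean reduction, and it sums the three terms with equal weights; the exact reduction and any weighting factors are not given in the text, so these are illustrative choices, as are the function names `chamfer` and `spr_loss`.

```python
import numpy as np

def chamfer(x, y):
    """Symmetric Chamfer distance between point sets x (n, 3) and y (m, 3)."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)   # squared distances
    # Nearest neighbor in y for each x, plus nearest neighbor in x for each y.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def spr_loss(s_p, s_q, p, q):
    """Structure-constrained loss: two reconstruction terms and one alignment term."""
    return chamfer(s_p, p) + chamfer(s_q, q) + chamfer(s_p, s_q)

rng = np.random.default_rng(0)
p = rng.random((512, 3))                  # toy diastolic cloud
q = rng.random((512, 3))                  # toy systolic cloud
s_p = p[:32]                              # toy structural points
s_q = q[:32]
loss = spr_loss(s_p, s_q, p, q)
```

Since the structural points here are subsets of the clouds, the one-directional distance from `s_p` to `p` is zero, while the reverse direction is not; the symmetric form therefore also rewards structural points that cover the whole cloud.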
3 Experiments and Results
3.1 Dataset and Evaluation Metrics
Data Processing. In our experiments, we collected 58 pairs of CCTA images with both diastolic and systolic phases. All coronary artery masks were first extracted using [25] and then refined by three experts. The annotated arteries were down-sampled and modeled as 3D point clouds, and their coordinates were normalized to the range [0, 1]. We adopted a five-fold cross-validation evaluation strategy, with 40 training subjects and 18 testing subjects.
Evaluation Metrics. Since the artery branches of systole and diastole only partially overlap, i.e., some coronary branches only appear in one phase, we define a common Dice coefficient (CoDice) to accurately evaluate the results.
where \(P_o\) and \(Q_o\) denote the set of coronary branches common to diastolic and systolic phases, respectively. Moreover, the Dice coefficient (Dice), Chamfer distance (CD), and Hausdorff distance (HD) are also employed for evaluation.
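One way to read the common Dice coefficient is as the ordinary Dice score restricted to branches present in both phases. The sketch below follows that reading and assumes a hypothetical input format in which each voxel (or point) carries an integer branch label, with 0 for background; the function names `dice` and `co_dice` and the toy labels are illustrative, not taken from the paper.

```python
import numpy as np

def dice(a, b):
    """Dice coefficient of two boolean masks."""
    a = a.astype(bool)
    b = b.astype(bool)
    total = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / total if total else 1.0

def co_dice(labels_p, labels_q):
    """Dice restricted to branch labels that occur in both phases."""
    common = np.intersect1d(np.unique(labels_p), np.unique(labels_q))
    common = common[common != 0]                 # drop background
    p_o = np.isin(labels_p, common)              # common branches, diastole
    q_o = np.isin(labels_q, common)              # common branches, systole
    return dice(p_o, q_o)

# Toy example: branch 1 is in both phases, branch 2 only in diastole,
# branch 3 only in systole.
labels_p = np.array([1, 1, 2, 0, 0, 0])
labels_q = np.array([1, 0, 0, 3, 3, 0])
plain = dice(labels_p > 0, labels_q > 0)
common_only = co_dice(labels_p, labels_q)
```

In this toy case the plain Dice is penalized by branches 2 and 3, which have no counterpart in the other phase, while the restricted score only measures overlap on branch 1.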
3.2 Implementation Details
The initial input of the SPR-Net contains 4096 points for each phase, and a volume of size \(16 \times 16 \times 8\) is cropped around each point. The point cloud encoder consists of two set abstraction blocks with 1024 and 256 grouping centers, respectively. In each set abstraction block, we utilize the grouping layer with two sets of radii r to combine the multi-scale features, namely (0.1, 0.2, 0.4) and (0.2, 0.4, 0.8), respectively. The transformer blocks are composed of vanilla transformer layers. The outputs of the point cloud encoder and the ViT are 512-D and 128-D features, respectively, which are concatenated to form 640-D contextual features. The configuration of the MLP block in the structural point integration depends on the number of structural points. All experiments were implemented in PyTorch on a single NVIDIA Tesla A100 GPU. We trained the networks using the Adam optimizer with an initial learning rate of \(10^{-4}\), 600 epochs, and a batch size of 8.
3.3 Comparison with State-of-the-Art Methods
Our SPR-Net was quantitatively and qualitatively evaluated and compared with eight SOTA registration methods, which belong to three categories: 1) image-based registration, including SyN [2], VoxelMorph [3], and DiffuseMorph [11]; 2) hybrid-based registration, TMM [8]; 3) point-cloud-based registration, including Go-ICP [23], DCP [20], STORM [21], and ISRP [6].
Quantitative Results. The quantitative results are listed in Table 1. Point-cloud-based methods outperform image-based methods, which supports the earlier observation about the limitation of image-based methods. Our proposed method also significantly outperforms the other methods, since SPR-Net fully encodes and fuses features of the images and point clouds. Notably, SPR-Net achieves significantly better performance than ISRP, the closest competing method, with an improvement of about 10 percentage points in Dice (from 58.31% to 68.58%).
Qualitative Visualization. Since the correspondence of structural points is vital for registration, we show the structural points (colored) in systole (green) and diastole (red) in Fig. 3 to demonstrate their correspondence. Structural points that correspond to the same vascular branch are marked with the same color, as denoted by the dashed boxes in the 2nd column of Fig. 3. Notably, when the number of structural points is small, they are distributed at positions such as the endpoints or bifurcation points, as shown in the 1st and 2nd columns, which properly delineates the morphology of the point clouds. As the number increases, structural points are located not only at endpoints or bifurcation points but also along the vessel branches, forming the vessel skeleton, as shown in the 3rd and 4th columns of Fig. 3. In the 5th column of Fig. 3, a complete coronary tree is obtained from the registration (\(K=768\)).
3.4 Ablation Study
We also conduct ablation studies with the same backbone point cloud encoder under three groups of configurations: 1) whether the four transformer layers, denoted as CoF, are used to encode and fuse the systolic and diastolic geometry; 2) whether the geometric features of the point cloud are fused with image-level semantic features, denoted as GIF; 3) different numbers of structural points (Number-SP). Table 2 summarizes the ablation study results.
Without CoF and GIF, only the backbone encoder is used to generate structural points. 1) With the same 768 structural points, the individual modules CoF and GIF both improve the Dice performance. Meanwhile, combining the two modules leads to the best performance, which may suggest the importance of fusing the two different aspects of features. 2) With both CoF and GIF equipped, SPR-Net's performance improves when the number of structural points increases from 256 to 768. However, the performance decreases when the number is further increased to 1024, indicating that overly dense structural points negatively affect the results, probably because of the increasing number of outlier points. SPR-Net is also inferior to both backbone+CoF and backbone+GIF when using 256 structural points. This is probably caused by the sparsity of the structural points, which are largely located at the endpoints and bifurcation positions and therefore cannot delineate the morphology of the vessel tree well. Therefore, the number of structural points is a key parameter that affects registration performance. Through extensive experiments, we determine the optimal number of structural points to ensure one-to-one correspondences between diastole and systole (Table 2).
4 Conclusion
In this paper, we have proposed an intrinsic structural point learning-based framework for systolic and diastolic coronary artery registration. The framework identifies structural points in the arteries across the two different phases using both the spatial geometric features extracted by the point cloud network and the complementary image semantic information extracted by ViT. By strategically fusing the image and point geometric features through a transformer, structural points with strong correlations in two different phases are extracted and used to guide the registration process. Compared with the existing image-based registration methods and point cloud-based methods, our integrated method achieves superior performance and outperforms the state-of-the-art methods by a large margin, which suggests the potential applicability of our framework in real-world clinical scenarios for CAD diagnosis.
References
1. Achenbach, S., et al.: Influence of heart rate and phase of the cardiac cycle on the occurrence of motion artifact in dual-source CT angiography of the coronary arteries. J. Cardiovasc. Comput. Tomogr. 6(2), 91–98 (2012)
2. Avants, B.B., Epstein, C.L., Grossman, M., Gee, J.C.: Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Med. Image Anal. 12(1), 26–41 (2008)
3. Balakrishnan, G., Zhao, A., Sabuncu, M.R., Guttag, J., Dalca, A.V.: VoxelMorph: a learning framework for deformable medical image registration. IEEE Trans. Med. Imaging 38(8), 1788–1800 (2019)
4. Bayer, S., et al.: Intraoperative brain shift compensation using a hybrid mixture model. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11073, pp. 116–124. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00937-3_14
5. Bookstein, F.L.: Principal warps: thin-plate splines and the decomposition of deformations. IEEE Trans. Pattern Anal. Mach. Intell. 11(6), 567–585 (1989)
6. Chen, N., et al.: Unsupervised learning of intrinsic structural representation points. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9121–9130 (2020)
7. Çimen, S., Gooya, A., Grass, M., Frangi, A.F.: Reconstruction of coronary arteries from X-ray angiography: a review. Med. Image Anal. 32, 46–68 (2016)
8. Çimen, S., Gooya, A., Ravikumar, N., Taylor, Z.A., Frangi, A.F.: Reconstruction of coronary artery centrelines from X-Ray angiography using a mixture of student’s t-Distributions. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. LNCS, vol. 9902, pp. 291–299. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46726-9_34
9. Dosovitskiy, A., et al.: An image is worth 16 \(\times \) 16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
10. Fu, K., Liu, Y., Wang, M.: Global registration of 3D cerebral vessels to its 2D projections by a new branch-and-bound algorithm. IEEE Trans. Med. Robot. Bionics 3(1), 115–124 (2021)
11. Kim, B., Ye, J.C.: Diffusion deformable model for 4D temporal medical image generation. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds.) International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 539–548. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-16431-6_51
12. Li, Y., Harada, T.: Lepard: learning partial point cloud matching in rigid and deformable scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5554–5564 (2022)
13. Pan, Y., Christensen, G.E., Durumeric, O.C., Gerard, S.E., Reinhardt, J.M., Hugo, G.D.: Current-and varifold-based registration of lung vessel and airway trees. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 126–133 (2016)
14. Pang, J., et al.: High efficiency coronary MR angiography with nonrigid cardiac motion correction. Magn. Reson. Med. 76(5), 1345–1353 (2016)
15. Schroeder, S., et al.: Influence of heart rate on vessel visibility in noninvasive coronary angiography using new multislice computed tomography: experience in 94 patients. Clin. Imaging 26(2), 106–111 (2002)
16. Shechter, G., Resar, J.R., McVeigh, E.R.: Rest period duration of the coronary arteries: implications for magnetic resonance coronary angiography. Med. Phys. 32(1), 255–262 (2005)
17. Smeets, D., Bruyninckx, P., Keustermans, J., Vandermeulen, D., Suetens, P.: Robust matching of 3D lung vessel trees. In: MICCAI Workshop on Pulmonary Image Analysis, vol. 2, pp. 61–70 (2010)
18. Timmis, A., et al.: European society of cardiology: cardiovascular disease statistics 2021. Eur. Heart J. 43(8), 716–799 (2022)
19. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
20. Wang, Y., Solomon, J.M.: Deep closest point: learning representations for point cloud registration. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3523–3532 (2019)
21. Wang, Y., Yan, C., Feng, Y., Du, S., Dai, Q., Gao, Y.: STORM: structure-based overlap matching for partial point cloud registration. IEEE Trans. Pattern Anal. Mach. Intell. 45(1), 1135–1149 (2022)
22. Weissman, N.J., Palacios, I.F., Weyman, A.E.: Dynamic expansion of the coronary arteries: implications for intravascular ultrasound measurements. Am. Heart J. 130(1), 46–51 (1995)
23. Yang, J., Li, H., Campbell, D., Jia, Y.: Go-ICP: a globally optimal solution to 3D ICP point-set registration. IEEE Trans. Pattern Anal. Mach. Intell. 38(11), 2241–2254 (2015)
24. Yao, L., et al.: TaG-Net: topology-aware graph network for centerline-based vessel labeling. IEEE Trans. Med. Imaging (2023)
25. Zhang, X., et al.: Progressive deep segmentation of coronary artery via hierarchical topology learning. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds.) International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 391–400. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-16443-9_38
Acknowledgment
This work was supported in part by National Natural Science Foundation of China (grant number 62131015, 62073260, 62203355), and Science and Technology Commission of Shanghai Municipality (STCSM) (grant number 21010502600).
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zhang, X. et al. (2023). SPR-Net: Structural Points Based Registration for Coronary Arteries Across Systolic and Diastolic Phases. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14226. Springer, Cham. https://doi.org/10.1007/978-3-031-43990-2_74
DOI: https://doi.org/10.1007/978-3-031-43990-2_74
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43989-6
Online ISBN: 978-3-031-43990-2
eBook Packages: Computer Science, Computer Science (R0)