
1 Introduction

Vascular disease is among the most severe diseases, with high mortality, morbidity and medical risk [1]. Blood vessel binary segmentation and anatomical labeling are of high interest in medical image analysis, since vessel quantification is crucial for diagnosis, treatment planning, prognosis and clinical outcome evaluation. In clinical practice, users have to manually edit or correct tracking errors by defining the starting and ending points of each vessel segment, which is time-consuming. It is therefore desirable to automatically and accurately segment and label vessels to facilitate vessel quantification. Computed tomography angiography (CTA) is the most commonly used modality for studying vascular diseases. As shown in Fig. 1, head and neck vessels have long and tortuous tubular vascular structures with diverse shapes and sizes, and span the entire image volume. In particular, vessels inside the head are much thinner than those passing through the neck. It is therefore challenging to handle vessels with such varied shapes and sizes.

Fig. 1.

An example of head and neck CTA image, along with major vessels. (a) A coronal slice of CTA; (b) A 3D vessel mask consisting of all major head and neck vessels; (c) 13 annotated segments: AO, BCT, L/R CCA, L/R ICA, L/R VA, and BA in the neck and L/R MCA, ACA and PCA in the head; (d) Point cloud representation of the head and neck vessels.

In the literature, traditional techniques have been developed for head and neck vessel segmentation [1, 2] from CTA images, as well as for cerebral vasculature segmentation and labeling [3, 4] from MRA images. Convolutional neural networks (CNNs) have also been developed for this purpose [5]. However, despite these efforts, CNN-based segmentation still faces great challenges in handling complicated structures. In particular, although CNN-based techniques outperform many traditional algorithms for blob or large-region segmentation owing to the nature of spatial convolution, complicated shapes such as vessels and surfaces have to be addressed specifically by redesigning the networks or the loss functions. Thus, head and neck vessel segmentation and labeling from CTA images remains an open problem.

A possible key technique for tackling this problem is to effectively model the spatial relationships among vascular points. Recently, point cloud learning has attracted much attention in 3D data processing [6]. As shown in Fig. 1(d), the point cloud representation of head and neck vessels allows for quantifying spatial relationships among points in vascular structures, as well as for effectively extracting vessels by leveraging their spatial information over the entire volume. Previous point cloud methods [7,8,9] have shown impressive performance in 3D classification and segmentation. Balsiger et al. [10] reported improved volumetric segmentation of peripheral nerves from magnetic resonance neurography (MRN) images by point cloud learning. However, compared to peripheral nerves, head and neck vessels have more complicated structures. On the other hand, graph convolutional networks (GCNs) have already been used for vessel segmentation in the literature [11,12,13,14], for learning tree-like graph structures in images.

In this paper, we propose a GCN-based point cloud learning framework to improve CNN-based vessel segmentation and further perform vessel labeling. Specifically, the first point cloud network, named I, performs two-class classification to refine vessel segmentation, and the second point cloud network (combined with a GCN), named II, performs thirteen-class classification to label vessel segments. The proposed method incorporates 1) the advantage of the GCN in exploiting the prior knowledge of tree-like tubular vessel structures and 2) the advantage of point cloud networks in handling the whole image volume, and it learns anatomical shapes to obtain accurate vessel segmentation and labeling. The performance of our proposed method is evaluated on 72 subjects, using the overlapping ratio as the metric for both binary segmentation and labeling of vessels.

Fig. 2.

Overview of the proposed method.

2 Method

Figure 2 shows the overall workflow of the proposed method, with two stages:

Fig. 3.

The proposed GCN-based point cloud network (the point cloud network II). (a) The point cloud network I utilized to improve binary segmentation of vessels, and (b) GCN branch.

1) Vessel segmentation with point cloud refinement. In particular, a V-Net [16] model is first applied for coarse vessel segmentation, from which the point cloud is constructed. Then, the point cloud network I is applied to refine the coarse segmentation.

2) Anatomical labeling with GCN-based point cloud network. A GCN-based point cloud network II is constructed to further label the vessels into 13 major segments. The detailed architecture of the GCN-based point cloud network is shown in Fig. 3.

2.1 Vessel Segmentation Refinement with Point Cloud Network

Coarse Vessel Segmentation. A V-Net is trained to delineate the coarse head and neck vessels as the initialization of our proposed method. In particular, we dilate the ground-truth mask (labeled by radiologists) to expand the vessel area, ensuring inclusion of all vessel voxels (at the cost of a high false positive rate). The V-Net trained with these dilated masks then generates a probability map \(I_Q\) for each image.
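The dilation step above can be sketched as follows. This is a minimal pure-NumPy illustration with a spherical structuring element; the function name and the shifting strategy are our own choices, and in practice a library routine such as `scipy.ndimage.binary_dilation` would be used instead.

```python
import numpy as np

def dilate_gt_mask(mask, k=5):
    """Dilate a binary ground-truth vessel mask with a spherical structuring
    element of size k, so the expanded mask safely covers all vessel voxels
    (at the cost of false positives). Illustrative pure-NumPy sketch."""
    r = k // 2
    out = np.zeros_like(mask, dtype=bool)
    src = mask.astype(bool)
    # OR together shifted copies of the mask for every offset inside the ball
    for dz in range(-r, r + 1):
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                if dz * dz + dy * dy + dx * dx <= r * r:
                    out |= np.roll(src, (dz, dy, dx), axis=(0, 1, 2))
    return out
```

Note that `np.roll` wraps around the volume boundary; for real CTA volumes the mask should be padded first, which is omitted here for brevity.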

Point Cloud Construction. A vessel point cloud P, as shown in Fig. 2(a), is constructed from the aforementioned probability map \(I_Q\) by setting a threshold \(\theta \). Note that the number of points depends on the output of the V-Net and also on the size of the object to be segmented. We denote the point cloud as \(P=[p_{1}, p_{2}, ..., p_{N}]\) with N points \(p_{i} \in R^{3}\). Each voxel \(v \in I_{Q}\) whose probability \(q \in [0, 1]\) is larger than \(\theta \) becomes a point \(p_{i} = (x, y, z)\) given by the Cartesian coordinates of v.
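Concretely, the thresholding step can be written in a few lines; this sketch (our own helper, not the authors' code) keeps each point's probability as an extra feature:

```python
import numpy as np

def build_point_cloud(prob_map, theta=0.1):
    """Construct a vessel point cloud P from a probability map I_Q: every
    voxel with probability above theta becomes a 3D point whose coordinates
    are the voxel's Cartesian indices."""
    keep = prob_map > theta
    coords = np.argwhere(keep)          # (N, 3) array of (x, y, z) indices
    probs = prob_map[keep]              # per-point probability, kept as a feature
    return coords.astype(np.float32), probs
```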

Point Cloud Network \({\textit{\textbf{I}}}\) for Vessel Segmentation Refinement. The point cloud network I is designed to improve vessel segmentation by removing false positive points from the vessel point cloud P. The network is built upon the relation-shape CNN (RS-CNN) introduced by Liu et al. [7]. As shown in Fig. 3(a), it has a hierarchical architecture for contextual shape-aware learning on point clouds. An encoding structure gradually down-samples the point cloud to capture context, followed by a decoding structure that up-samples it; features are combined through skip connections. The set abstraction (SA) module performs sampling, grouping, and relation-shape convolution (RS-Conv). In the SA module, a set of L representative points is sampled from the input vessel point cloud P using farthest point sampling (FPS) to perform RS-Conv, the core layer of RS-CNN. Multi-scale local subsets of points grouped around each representative point serve as the input of RS-Conv, and a multi-layer perceptron (MLP) implements the convolution to learn the mapping from low-level to high-level relations between points in the local subset. The feature propagation (FP) module then propagates features from the subsampled points back to the original points for set segmentation. Two fully-connected (FC) layers reduce the per-point features to the number of classes, and a softmax function finally gives the class probabilities of each point. Image information, contained in the original image and the probability map \(I_{Q}\), is fed into the first SA operator together with the point Cartesian coordinates. The image information of each point is extracted from a nearby volume of interest \(I_{p} \in R^{X \times Y \times Z}\) using a sequence of two 3D convolutions, and is then transformed into a vector of the input point cloud size before being fed into the network.
The trained point cloud network I is applied to the point cloud P, and the refined point cloud \(P'\) shown in Fig. 2(b) is obtained and converted into a 3D volumetric image as the refined binary segmentation of vessels.
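The FPS step used inside the SA module can be sketched as below: starting from an arbitrary point, each iteration adds the point farthest from the set already chosen. This is a generic NumPy illustration of the standard algorithm, not the authors' implementation.

```python
import numpy as np

def farthest_point_sampling(points, L):
    """Farthest point sampling (FPS): iteratively pick the point farthest
    from those already chosen; returns indices of L representatives."""
    n = points.shape[0]
    chosen = np.zeros(L, dtype=np.int64)
    dist = np.full(n, np.inf)           # distance to nearest chosen point
    chosen[0] = 0                       # start from an arbitrary point
    for i in range(1, L):
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        dist = np.minimum(dist, d)      # update nearest-chosen distances
        chosen[i] = int(np.argmax(dist))  # farthest remaining point
    return chosen
```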

2.2 Vessel Labeling with GCN-based Point Cloud Network

Point Cloud Construction. Vessel labeling is carried out on the refined vessel point cloud \(P'\) acquired from the first stage. To train the labeling model, a point cloud representation \(O=[o_{1}, o_{2}, ..., o_{M}]\) with M points \(o_{i} \in R^{3}\) is established from the ground-truth anatomical mask \(M_{al}\) annotated by radiologists. Point labels are denoted by \(l_{1}, l_{2}, ..., l_{13}\). The anatomical point cloud O with its labels, together with the constructed graphs, is used to train the point cloud network II for vessel labeling.

Graph Construction. The point cloud graph G, as shown in Fig. 2(b), is built from the L representative points (the vertices) sampled from the point cloud \(P'\) using the aforementioned FPS. Edge weights are set as the Euclidean distance between vertices after normalization of the point coordinates. To account for the varying diameters of vessels, a threshold d determines whether two vertices are connected. Vertices and edges are thus obtained. The output features of the first SA operator, shown in Fig. 3(b), are used as the input features of the vertices.
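A minimal sketch of this graph construction, assuming a simple min-max coordinate normalization (the exact normalization is not specified in the text):

```python
import numpy as np

def build_graph(vertices, d=0.05):
    """Build the point cloud graph G: normalize vertex coordinates, connect
    vertex pairs whose Euclidean distance is below the threshold d, and use
    that distance as the edge weight."""
    lo, hi = vertices.min(0), vertices.max(0)
    v = (vertices - lo) / np.maximum(hi - lo, 1e-8)   # min-max normalization
    dist = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)
    adj = (dist < d) & ~np.eye(len(v), dtype=bool)    # no self-loops
    weights = np.where(adj, dist, 0.0)                # edge weight = distance
    return adj, weights
```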

GCN-based Point Cloud Network \({\textit{\textbf{II}}}\) for Labeling. The architecture of the point cloud network II used in Fig. 2(b) is shown in Fig. 3 and is built upon the point cloud network I. The training process of the second stage is similar to that of the first stage, apart from three aspects. First, the input is the point cloud O and the output has 13 classes (representing the 13 labels). Second, image information from only the original image is used to enrich each point's representation. Third, a graph branch is constructed to extract structural and spatial features of the vessels. The constructed graph is processed by a two-layer GCN [15], whose output is concatenated to the input of the last FP operation as additional features to help labeling. Finally, these features improve the classification of every point. The anatomical point cloud R shown in Fig. 2(b) is obtained and transformed into a 3D image as the final vessel labeling result.
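The two-layer GCN of [15] computes \(\hat{A}\,\mathrm{ReLU}(\hat{A} X W_1) W_2\), where \(\hat{A}\) is the symmetrically normalized adjacency with self-loops. A NumPy sketch of this forward pass (weights and shapes are illustrative):

```python
import numpy as np

def gcn_forward(adj, X, W1, W2):
    """Two-layer GCN forward pass: A_hat @ ReLU(A_hat @ X @ W1) @ W2, with
    A_hat = D^{-1/2} (A + I) D^{-1/2} (normalized adjacency, self-loops)."""
    A = adj.astype(np.float32) + np.eye(adj.shape[0], dtype=np.float32)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(1)))
    A_hat = d_inv_sqrt @ A @ d_inv_sqrt        # symmetric normalization
    H = np.maximum(A_hat @ X @ W1, 0.0)        # layer 1 + ReLU
    return A_hat @ H @ W2                      # layer 2: structural features
```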

Fig. 4.

Qualitative results. Each panel (a, b, c, d) shows the ground-truth vessel segments, the vessel labeling results of our proposed method, and the corresponding results of the V-Net.

3 Experiments and Results

3.1 Materials and Parameters Setting

In our experiments, CT angiography images (covering the head and neck) of 72 subjects are used to evaluate the proposed method. The size of each CTA image is \(512\,\times \,512\,\times \,533\) and the voxel spacing is \(0.4063\,\times \,0.4063\,\times \,0.7000\) mm\(^{3}\). The head and neck vessels were manually annotated by a radiologist and reviewed by a senior radiologist. Four-fold cross-validation is used in the experiments. An ellipsoidal kernel of size \(k=5\) is used for the dilation of the ground-truth segmentation. The threshold \(\theta \) is set to 0.1 to construct the point cloud P from the V-Net probability map. L is set to 2048, which is the number of representative points used to perform RS-Conv and also the number of graph vertices. The size of the volume of interest \(I_{p}\) is set to \(X=5, Y=5, Z=5\) in both point cloud networks I and II. The point cloud network I is trained for 85 epochs using the Adam optimizer with a learning rate of 0.001 and the cross-entropy loss. Note that, during testing, we repeatedly extract random subsets of 2048 points until all points of a point cloud have been classified. The labels of vessel segments \(l_{1}, l_{2}, ..., l_{13}\) are set to 1, 2, ..., 13, corresponding to the vessel segment labels. The threshold d used for edge connection is set to 0.05 after coordinate normalization.

3.2 Results

To make a fair comparison of vessel labeling results, both the proposed method and the V-Net use the same data set and the same four-fold cross-validation setting. Qualitative and quantitative evaluations are performed.

Qualitative Results. Qualitative vessel labeling results of four patients are shown in Fig. 4. For each patient, the images from left to right show the ground-truth vessel labeling, the labeling result of our proposed method, and that of the V-Net, respectively. Specifically, the V-Net used to train the labeling model takes the 3D original image as input, the ground-truth vessel labeling as supervision, and the Dice ratio as loss. As shown in Fig. 4(a) and Fig. 4(b), complete head and neck vessel structures are segmented and labeled accurately by our proposed method. As illustrated in Fig. 4(c) and Fig. 4(d), the multi-label segmentation results are obtained without RCA and LCA; compared to the ground truth, the corresponding labels are missing there as well. This shows that our proposed method performs well on both healthy and diseased cases. Compared to labeling by direct use of the V-Net, our proposed method achieves better performance on both binary segmentation and labeling. As shown in Fig. 4, for head vessels with small diameters, our proposed method achieves notably better segmentations.

Table 1. Quantitative evaluation of multi-label segmentation; here, the Proposed w/o represents the proposed point cloud network without GCN.

Quantitative Results. For binary vessel segmentation, the point cloud network I is used to improve the V-Net-based segmentation. Points in the point cloud P are classified into vessel points belonging to the target and outliers not belonging to it. Average accuracy (ACC) and intersection over union (IOU) are used to evaluate the point cloud network I. In the four-fold cross-validation, the average ACC is 0.972 and 0.986 for the outlier and vessel points, respectively, and the IOU is 0.964 and 0.976, respectively. The Dice coefficient is used to evaluate the binary segmentation of vessels. The refined point cloud \(P'\) is transformed into volumetric images as the binary segmentation result of vessels. The average Dice coefficient of the refined binary segmentation is 0.965, an improvement of 0.08 over the 0.885 achieved by the V-Net. This shows a large improvement in binary vessel segmentation by the point cloud network I, which contributes significantly to the subsequent vessel labeling.
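For reference, the Dice coefficient used above can be computed as follows (a standard definition, shown here as a small NumPy helper of our own):

```python
import numpy as np

def dice_coefficient(pred, gt):
    """Dice coefficient between two binary masks:
    2 * |pred AND gt| / (|pred| + |gt|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)
```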

Vessel labeling is carried out on the binary segmentation results. The quantitative vessel labeling results are presented in Table 1. The first column lists the label indices and the corresponding abbreviations of vessel segments. The second column gives the average accuracy (ACC) and intersection over union (IOU) used to evaluate the point cloud network II. The third column shows the Dice coefficient used to evaluate vessel labeling. Three methods are evaluated: 1) the V-Net, 2) the proposed point cloud method without GCN, and 3) the proposed method. Compared to the V-Net and to the proposed method without GCN, the average Dice coefficient over all vessel segments of our proposed method is higher by 0.07 and 0.026, respectively. Taking the varying diameters of different segments into account, the segments are further divided into head and neck groups. Compared to the V-Net, the average Dice coefficient of the proposed method is higher by 0.042 for neck vessels and by 0.135 for head vessels. This shows that our proposed method performs better on both neck vessels and the small-diameter head vessels, and that it is robust.

4 Conclusion

In this paper, we have proposed a GCN-based point cloud framework for labeling head and neck vessels from CTA images. Specifically, we formulated the vessel segmentation problem as a point-wise classification problem to improve over CNN-based binary segmentation results. The GCN-based point cloud learning then further leverages the vascular structures and anatomical shapes to improve vessel labeling. Experimental results indicate that our proposed method is effective in improving volumetric image segmentation and in learning complex structures for anatomical labeling. Future work will focus on graph construction as well as more efficient point-based sparse learning for volumetric image segmentation.