Introduction

Time-of-flight magnetic resonance angiography (TOF-MRA) is a non-contrast imaging technique for the intracranial arteries that is commonly used clinically to evaluate cerebral arterial disease [1]. Volume rendering (VR) reconstruction is the main post-processing technique, usually performed by experienced radiologists, and allows three-dimensional (3D) visualization of vascular structures to aid early identification and diagnosis [2, 3]. However, as the number of TOF-MRA examinations increases, technicians are overwhelmed by time-consuming manual procedures. Consequently, there is growing interest in automated solutions for 3D visualization of intracranial arteries in clinical workflows, and deep learning (DL)-based automatic vessel segmentation may provide a labor-saving alternative. In addition, the 3D vessel structure extracted by segmentation is a prerequisite for further digitization of the cerebral vasculature [4].

Recent studies of DL-based models have shown significant potential in various medical image-processing tasks [5,6,7]. In particular, the convolutional neural network (CNN), one of the mainstream DL technologies, is inspired by the human visual system and is well suited to image tasks with local spatial correlations [8, 9]. U-Net [10], a CNN designed specifically for segmentation tasks, has become the preferred method for object segmentation, including cerebrovascular segmentation, and has shown promising results in recent applications [11,12,13,14,15,16,17].

However, although these DL approaches can achieve high segmentation accuracy, they rely on a large amount of labeled data for model training [18]. Here, labels refer to the intracranial artery contours hand-drawn by experts, which also serve as the ground truth for segmentation. Although labeling is an essential component of CNN model construction and evaluation, it is extremely time- and labor-intensive, making it nearly impossible to implement in large-scale studies.

Consequently, few studies have verified the feasibility and accuracy of DL-based vessel segmentation models on external, independent datasets with large sample sizes and comprehensive case types [19,20,21]. This is one of the most important obstacles to the clinical application and development of automatic segmentation schemes for TOF-MRA [22, 23]. Assessment methods that combine expert image analysis with diagnostic scoring may represent a turning point in addressing these barriers from a more clinical perspective [24,25,26,27].

This study proposed a method for extracting the 3D structure of intracranial arteries based on an attention mechanism and multi-scale feature extraction, and compared it with 3D U-Net [11] and 3D Brave-Net [12], which have performed well in cerebrovascular segmentation in recent years [15]. A large external independent dataset of 408 subjects was used for qualitative evaluation. The dataset contains healthy subjects and the two most common cerebrovascular lesions: cerebral aneurysm and stenosis. By comparing the visualization of the segmented vascular structure against manual VR, which is widely used in clinical practice, together with clinical scoring and diagnostic assessment, the practical applicability of the proposed model was verified from a clinical perspective.

We expect that the developed model can be seamlessly integrated into the radiology workflow to automate segmentation of TOF-MRA images. This would not only provide fine-grained automatic visualization of the cerebral arteries and eliminate manual processing steps, but also advance future automated quantitative vessel analysis and further improve the accuracy and efficiency of cerebral arterial disease diagnosis.

Methods

Patients

Data for this retrospective study were obtained from three institutions. A retrospective search of two independent radiology datasets from two tertiary hospitals identified TOF-MRA examinations performed between January 2007 and December 2023 for the assessment of intracranial artery-related status. A third, academic dataset, collected by the Centre of Advanced Studies and Innovation Lab (CASILab) and publicly available online (URL: https://marron.web.unc.edu/brain-artery-tree-data/), contains TOF-MRA images of 109 healthy volunteers from five age groups [3]. Hilbert et al. [12] randomly selected four scans from each age group and established the ground truth by manual delineation of vessel contours with cross-validation by three radiologists. Ultimately, a total of 20 scans with ground truth were included in this study.

Demographic characteristics are detailed in Table 1. The overall experimental design and data flow are shown in the workflow diagram in Fig. 1.

Table 1 Patient clinical characteristics
Fig. 1

The data flowchart shows the training and testing process of the models and their respective data distributions. VR, volume rendering; DSC, Dice similarity coefficient

MR Acquisition

The TOF-MRA scans included in this study were performed on three different 3 T MR scanners with different acquisition parameters, which are summarized in Table 2.

Table 2 Participant Characteristics and TOF-MRA acquisition parameters

Segmentation Model

We implemented an integrated segmentation pipeline using a CNN architecture with an optimization strategy, as shown in Fig. 2. Adding the CBAM module to the input and output parts of the network increases the network's attention to the target object while maintaining the U-shaped structure [28]. The MSFE module contains convolutional kernels of different sizes, corresponding to different receptive fields, which create feature maps with convolutional features of different granularities; concatenating these feature maps fuses the multi-scale features [29]. To ensure that the MSFE module performs well, we inserted it after each convolutional feature map with a spatial size smaller than 20 pixels [30]. The combination of 1 × 1 × 1 convolutions and residual connections helps fuse the outputs of each module and avoid the degradation that can occur as the network deepens.
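As an illustration of how such a multi-scale block can be assembled, the following PyTorch sketch runs parallel 3D convolutions with different kernel sizes, concatenates their feature maps, and fuses them with a 1 × 1 × 1 convolution and a residual connection. The kernel sizes and channel counts are assumptions chosen for the example, not values reported in this study, and the CBAM module is omitted for brevity.

```python
# Minimal sketch of a multi-scale feature extraction (MSFE) style block in PyTorch.
# Kernel sizes and channel counts are illustrative assumptions, not the study's values.
import torch
import torch.nn as nn


class MSFEBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Parallel branches with different receptive fields (assumed kernel sizes).
        self.branch3 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.branch5 = nn.Conv3d(channels, channels, kernel_size=5, padding=2)
        self.branch7 = nn.Conv3d(channels, channels, kernel_size=7, padding=3)
        # 1 x 1 x 1 convolution fuses the concatenated multi-scale features.
        self.fuse = nn.Conv3d(3 * channels, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi_scale = torch.cat(
            [self.branch3(x), self.branch5(x), self.branch7(x)], dim=1
        )
        # Residual connection helps avoid degradation as the network deepens.
        return self.act(x + self.fuse(multi_scale))
```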

Fig. 2

Overview of the proposed CNN framework for automatic cerebrovascular segmentation. A Detailed flowchart of data preprocessing before model training and prediction. B Architecture of the proposed 3D CNN. The network is trained on 64 × 64 × 8-voxel patches extracted from pre-processed images. A 1 × 1 × 1 convolution is used as the last layer to calculate the DSC loss, and a sigmoid activation function maps the feature map to a segmentation probability map; voxels with a probability greater than 0.5 are set to 1 and the rest to 0, yielding the final binary vessel segmentation. The learning rate was 0.0001 and the maximum feasible batch size was 16. C Implementation of the CBAM and MSFE modules. CBAM, convolutional block attention module; MSFE, multi-scale feature extraction; DSC, Dice similarity coefficient; ReLU, rectified linear unit; Conv, convolutional layer; BN, batch normalization; AvgPool, average pooling; MaxPool, max pooling

The network is trained on 3D image patches and corresponding vessel labels, learning vascular features from the TOF-MRA data and then predicting the probability that each voxel belongs to a vascular region; the output is a binary image of the spatial structure of the cerebral arteries. To evaluate the performance of our approach against recent DL-based vascular segmentation studies, we replicated two state-of-the-art CNN models, 3D U-Net [11] and 3D Brave-Net [12], as comparators. Figure 1 details the data allocation during model training and testing; all test data were obtained from independent external datasets.
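The patch-level prediction step implied above can be sketched as follows, assuming a trained 3D CNN (`model`) that outputs one logit per voxel of a 64 × 64 × 8 patch; the sigmoid activation and the 0.5 probability threshold follow the description in Fig. 2, while the tensor shapes are illustrative.

```python
# Minimal sketch of patch-level prediction: logits -> sigmoid -> 0.5 threshold.
import torch


@torch.no_grad()
def predict_patch(model: torch.nn.Module, patch: torch.Tensor) -> torch.Tensor:
    """patch: (1, 1, 64, 64, 8) pre-processed TOF-MRA patch; returns a binary mask."""
    logits = model(patch)            # voxel-wise logits
    prob = torch.sigmoid(logits)     # segmentation probability map
    return (prob > 0.5).float()      # 1 = vessel, 0 = background
```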

Quantitative Metrics of Model Effectiveness

The performance of a vessel segmentation model can be evaluated from many perspectives. To provide a broader view of our model's performance, we report three metrics: DSC, sensitivity, and accuracy. DSC assesses the spatial overlap between the prediction and the ground truth, sensitivity measures the ability to segment the vascular region, and accuracy represents the proportion of correctly predicted voxels among all predictions.
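For reference, the three reported metrics can be computed from binary masks as in the sketch below; `pred` and `gt` are assumed to be arrays of 0/1 voxels (prediction and ground truth).

```python
# Minimal sketch of DSC, sensitivity, and accuracy from binary voxel masks.
import numpy as np


def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8):
    tp = np.sum((pred == 1) & (gt == 1))   # vessel voxels correctly found
    fp = np.sum((pred == 1) & (gt == 0))   # background mistaken for vessel
    fn = np.sum((pred == 0) & (gt == 1))   # vessel voxels missed
    tn = np.sum((pred == 0) & (gt == 0))   # background correctly rejected
    dsc = 2 * tp / (2 * tp + fp + fn + eps)        # spatial overlap
    sensitivity = tp / (tp + fn + eps)             # ability to segment vessels
    accuracy = (tp + tn) / (tp + tn + fp + fn)     # correct voxels overall
    return dsc, sensitivity, accuracy
```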

Qualitative Clinical Assessment of Image Quality

To evaluate the diagnostic confidence of the CNN models for segmenting the cerebral arteries, we conducted a multi-reader study. Chief physicians with 22 (S.Q.J.) and 20 (Y.H.L.) years of experience, respectively, reviewed the historical TOF-MRA scans of their tertiary hospitals and randomly selected scans evaluated for different intracranial artery-related statuses (healthy, aneurysm, stenosis), which were then included in the reader study.

The image quality of the 3D visualizations from manual VR and automatic CNN segmentation was scored, with the vessels displayed in the original TOF-MRA slices and maximum intensity projections (MIP) serving as the reference standard. Clinical scores were assigned independently by two radiologists experienced in cerebrovascular imaging (T.Z., with 10 years of experience, and J.L., with 13 years of experience) according to the diagnosis of cerebral arterial integrity, visualization of collateral circulation, and description of lesion morphology (types of aneurysms and stenosis). The readers independently graded the images in these three categories on a five-point Likert-type scale (5, excellent; 4, good; 3, acceptable; 2, poor; and 1, unacceptable); scans with all category scores greater than 2 were defined as clinically acceptable (see Text, Supplemental Digital Content 1, which provides scoring details). We also calculated the summed visual score (SVS) as the sum of the category scores [22].

The same TOF-MRA scans were reviewed independently by two readers (Y.L. and S.J.), and the scores of all samples were cross-validated to obtain a unified result, mitigating the influence of subjective bias. If the two readers diverged, a third party (S.Q.J. or Y.H.L.) joined the discussion until a consensus was reached.

As shown in Fig. 1, a total of three independent datasets were randomized and de-identified for visual scoring, with the readers blinded to clinical history. Multiple datasets were used for the following reasons. First, because the data for model training were sourced from the local institution-1 dataset, 58 of the remaining 89 scans were randomly selected to compare the performance of the CNN models and manual VR. Second, the generalization performance of the three CNN models and manual VR was tested on the external hospital-1 dataset. Finally, a further comparison between the proposed CNN and manual VR was conducted on the external hospital-2 dataset.

Statistical Analysis

The Shapiro-Wilk test was used to assess the normality of all indicators; medians and interquartile ranges (IQRs) were calculated for non-normally distributed data. Reader clinical scores were used to compare the two types of cerebrovascular 3D visualization: automatic CNN segmentation and expert manual VR. For scores from the local institution-1 dataset and the external hospital-1 dataset, the Friedman test was applied to compare characteristics derived from 3D U-Net, 3D Brave-Net, the proposed CNN, and VR (hospital-1 dataset only). For the external hospital-1 and hospital-2 datasets, pairwise comparisons between the proposed CNN and manual VR scores were performed with the Wilcoxon paired signed-rank test. All statistical analyses were performed by one researcher (Y.Q.M.) using Prism version 9. P < 0.05 was considered to indicate a statistically significant difference for all tests.
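For illustration only, the nonparametric tests named above could be reproduced in Python with SciPy as sketched below; the score lists are placeholder values, not study data, and the actual analysis was performed in Prism 9.

```python
# Minimal sketch of the Shapiro-Wilk, Friedman, and Wilcoxon signed-rank tests
# with SciPy; the per-scan reader scores below are hypothetical placeholders.
from scipy import stats

scores_unet = [2, 3, 2, 4, 3, 2, 3, 2]       # hypothetical paired scores per scan
scores_bravenet = [3, 3, 2, 4, 3, 2, 2, 3]
scores_proposed = [5, 4, 5, 5, 4, 5, 5, 4]
scores_vr = [5, 5, 4, 4, 5, 4, 5, 5]

# Normality check; non-normal data are summarized as median and IQR.
w_stat, p_normal = stats.shapiro(scores_proposed)

# Friedman test across related methods rated on the same scans.
chi2, p_friedman = stats.friedmanchisquare(scores_unet, scores_bravenet, scores_proposed)

# Wilcoxon paired signed-rank test: proposed CNN versus manual VR.
w, p_wilcoxon = stats.wilcoxon(scores_proposed, scores_vr)
```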

Results

Patient Characteristics

A total of 394 patients (mean age ± SD, 59 years ± 13; 141 men) were included in the final analysis (Table 1), comprising 230, 79, and 85 TOF-MRA scans with three common clinical cerebral arterial diagnoses: healthy, aneurysm, and stenosis, respectively.

Ablation Studies for Proposed Network

In this subsection, the contributions of the different components of the proposed network are evaluated. First, the 20 TOF-MRA scans with ground-truth labels were randomly divided into training, validation, and test sets in a 12:4:4 ratio to quantitatively evaluate the effects of CBAM and MSFE. The image quality of the segmentation results was then qualitatively assessed using the independent hospital datasets. The segmentation performance of each model on the test data is summarized in Fig. 3.

Fig. 3

Contribution of the different components to the segmentation performance of the proposed network. A, B Comparison of TP, FP, FN, DSC, and sensitivity values for each model on the 4 labeled test scans; TP values are plotted after subtracting 45,000 to make the differences between models more visible. A shows the effect of each optimization method on the ability to distinguish vessels from background. Adding the CBAM attention mechanism to the 3D U-Net backbone effectively improves background recognition and reduces FP, but inhibits the ability to distinguish vessels. In contrast, adding the MSFE module has the opposite effect: TP improves, but the probability of background being mistaken for vessels increases significantly, raising FP. The proposed network combines both optimization strategies, improving TP while suppressing the rise in FP and FN. C One representative visual example; yellow arrows indicate regions that are easily misidentified or segmented as vessels. TP, true positives; FP, false positives; FN, false negatives; DSC, Dice similarity coefficient; CBAM, convolutional block attention module; MSFE, multi-scale feature extraction

The DSC of the vessel mask extracted by 3D U-Net is 88.1%. When the CBAM module is integrated into the U-Net backbone, the DSC increases to 88.7%, while integration of the MSFE module increases sensitivity from 85.1% to 86.0% (Fig. 3B). The proposed network achieved the best sensitivity and the second-highest DSC. Although its number of FPs was slightly higher than that of the other models, the proposed network better matched the clinical need to obtain the most complete vessels while controlling the number of FPs. Qualitative assessment of segmented vessel morphology in the hospital test data yielded results consistent with those described above (Fig. 3C). Because the labeled dataset is small, the differences in evaluation indicators between the networks are very small, but the visualizations clearly show the superior vessel segmentation of our model.

Quantitative Evaluation of Models’ Effectiveness

Twenty pairs of TOF-MRA scans from institution-1 and their corresponding ground-truth labels were included in the DSC evaluation. The three CNN models for automatic vessel segmentation achieved DSCs of 0.937 vs 0.942 vs 0.947 on the training sets and 0.922 vs 0.928 vs 0.927 on the validation sets (3D U-Net vs 3D Brave-Net vs proposed CNN).

A total of 60,000 3D image patches extracted from the 20 TOF-MRA scans were used for model training and validation. In addition, 3D U-Net and 3D Brave-Net models were built separately in this study for comparison as existing state-of-the-art techniques. Figure 4 presents the evolution of the DSC for the three models on the training and validation sets, with all curves converging before the 15th epoch.
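As an illustration of how such a patch set can be generated, the sketch below samples randomly located 64 × 64 × 8 patches from a pre-processed volume and its label mask (about 3,000 per scan would yield roughly 60,000 patches from 20 scans); the array sizes and sampling strategy are assumptions for the example and do not reproduce the study's exact procedure.

```python
# Minimal sketch of random 3D patch sampling for patch-based training.
# Volume shape and sampling strategy are illustrative assumptions.
import numpy as np


def sample_patch(image: np.ndarray, label: np.ndarray, size=(64, 64, 8), rng=None):
    """Return one randomly located image/label patch pair of the given size."""
    rng = rng or np.random.default_rng()
    corner = [rng.integers(0, dim - s + 1) for dim, s in zip(image.shape, size)]
    sl = tuple(slice(c, c + s) for c, s in zip(corner, size))
    return image[sl], label[sl]


# Placeholder arrays standing in for a pre-processed TOF-MRA scan and its label.
volume = np.zeros((320, 320, 128), dtype=np.float32)
mask = np.zeros_like(volume)
patches = [sample_patch(volume, mask) for _ in range(3000)]
```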

Fig. 4

Performance of the training and validation process of the three cerebrovascular segmentation models on brain TOF-MRA images, illustrating the evolution and comparison of the DSC and accuracy of the three models on the training and validation sets

The validation DSCs of the 3D U-Net, 3D Brave-Net, and proposed models were 0.922, 0.928, and 0.927, respectively, which initially shows the improvement achieved by both optimized models over 3D U-Net. The proposed network achieved the highest accuracies of 0.9982 and 0.9974 on the training and validation datasets, respectively, outperforming the original network (0.9977 and 0.9972, respectively) and Context U-Net (0.9980 and 0.9973, respectively).

Because the scarcity of vessel-labeled data left a test set too small to support a reliable qualitative assessment of segmentation performance, an independent, comprehensive, hospital-based dataset was used to visually validate the segmented vessels of each model. Figure 5B shows two representative examples from the validation and test datasets. The proposed network outperforms the other two methods in the segmentation completeness of major vessels and side branches, whereas the 3D U-Net model exhibited robust segmentation performance with good generalization capabilities. In contrast, the segmentation results of Brave-Net suggest that the model may overfit the training dataset and generalize poorly in practical applications.

Fig. 5

Evaluation results showing the visualization performance of the three CNN models on healthy intracranial arteries in the local and external datasets. A Bar chart of the artery scores for the three CNN segmentation models. B Three-dimensional visualization of intracranial arteries obtained with the three CNN models. On the local dataset there is no practical gap between the models, and the cerebral vessels are relatively complete. On the external dataset, however, only the proposed CNN maintains robust segmentation performance; the visualization results of the other two models lack most of the main vessels and cannot meet diagnostic needs. SVS, summed visual score

Clinical Assessment of Image Quality

As summarized in Table 3, the qualitative image quality scores of U-Net, Brave-Net, and the proposed CNN were all of diagnostic quality (acceptable, good, or excellent) (P ≥ 0.20) on the entire local institution-1 dataset. However, on the external comprehensive hospital-1 dataset, less than half of the U-Net and Brave-Net scoring results were considered diagnostically acceptable (25 of 50 [36%] vs 28 of 58 [40%]). The two readers rated 316 of 316 (100%) of the scans processed by the proposed CNN and by expert manual VR as good to excellent (P ≥ 0.12) on the two external datasets. The clinical scoring results of the three CNN models and manual VR on the three datasets are detailed below.

Table 3 Multi-reader assessment of the image quality of CNN model segmentation versus manual cerebrovascular reconstruction

First, the local institution-1 dataset contains 78 TOF-MRA scans from healthy subjects, 58 of which were used for reader clinical scoring to evaluate the vascular image quality of the four methods. Median reader scores for the two individual scores (cerebrovascular integrity and collateral circulation) and the summed visual score (SVS) did not differ significantly between U-Net, Brave-Net, and the proposed CNN (5.0 [IQR, 4.0–5.0]; P ≥ 0.02). This qualitative assessment is consistent with the results of the quantitative DSC evaluation (Fig. 5). All 58 healthy scans processed by the three CNNs were diagnostically acceptable (both individual scores > 2).

Next, we verified the practical performance of each model and of expert manual VR on the external comprehensive hospital-1 dataset, which contains 69 scans of the healthy, aneurysm, or stenosis type; manual VR reconstruction is commonly used in clinical practice. As shown in Fig. 5A, there are significant differences among the three CNN methods in the SVS and the three individual cerebrovascular scores of integrity, collateral circulation, and lesion morphology (P < 0.0001).

U-Net and Brave-Net showed low scores (median, 2.0 [IQR, 1.0–3.0]). A group comparison of the proposed CNN and manual VR showed no significant difference (P ≥ 0.12). Moreover, the IQR of the proposed CNN was better than that of VR for the lesion-morphology score in patients with aneurysm (Fig. 6), although the difference was not significant (median, 5 [IQR, 5–5] vs 5 [IQR, 4–5]; P = 0.12). Notably, the proposed CNN for automated segmentation and visualization of vessels was rated as having the appearance of an expert manual VR reconstruction (intraclass correlation coefficient, 0.992–0.993; P < 0.001).

Fig. 6

MIP of the raw images and of the vessels segmented by the three models in two healthy samples. The first column is the original TOF-MRA MIP image, the second column is U-Net, the third column is Brave-Net, and the fourth column is the proposed method. The two samples are from the training dataset (rows 1 to 5) and the independent dataset (rows 6 to 8); poorly segmented regions are indicated by red arrows. In the training data, long branch segments of the left posterior cerebral artery (PCA), anterior cerebral artery (ACA), and middle cerebral artery (MCA) are missing in 3D U-Net, and ACA and PCA branch segments are missing in Brave-Net. In the independent external data, 3D U-Net misses the bilateral vertebral arteries (VA), the right MCA, and the bilateral posterior inferior cerebellar arteries (PICA). Brave-Net misses most vessels, so arrows are not used for indication there. The proposed CNN misses the left VA and MCA branches

Finally, to further compare our method with manual VR in terms of clinical applicability, reader scores were collected for the external comprehensive hospital-2 dataset. Scores from both techniques had the same median and IQR in most scans (179 of 247 [72%]), and there were no significant between-group differences (P ≥ 0.07), except for the SVS of aneurysm (median of CNN vs VR, 15 [IQR, 15–15] vs 15 [IQR, 15–15]; P = 0.01) and the SVS of stenosis (median of CNN vs VR, 15 [IQR, 14–15] vs 15 [IQR, 15–15]; P = 0.02).

Discussion

There is growing interest in alternatives to expert manual VR reconstruction for cerebrovascular visualization. Our results demonstrate the feasibility of using DL to automatically segment and visualize the 3D structure of cerebral vessels from TOF-MRA scans, with high cerebrovascular overlap (mean DSC, 0.927), a high degree of diagnostic quality (median score, 5 [IQR, 4–5]; P > 0.05), and a processing time of a few seconds per scan (mean, 10 ± 3.7 s); the hardware platform used is detailed in the code environment on the last page. The proposed CNN has promising generalization applicability (number of acceptable diagnoses, 25 [36%] vs 28 [40%] vs 69 [100%]) and reproduces good vessel segmentation performance (median score ranges, 2–3 vs 2–3 vs 5) on the external hospital-1 dataset. Furthermore, in our multi-reader study, the image quality scores of the proposed CNN were diagnostically acceptable on scans from both external datasets (316 of 316).

On the hospital-1 dataset, the proposed CNN for automatic vessel segmentation and visualization was rated as having the appearance of an expert manual VR reconstruction (intraclass correlation coefficient, 0.992–0.993; P < 0.001). Unlike the pseudo-3D visualization of VR, segmentation extracts the 3D structure of the cerebral vessels from the TOF-MRA data, which provides a prerequisite for the subsequent automated measurement of morphological features, such as vessel radius, and can also assist the intelligent, assisted diagnosis of cerebrovascular disease [31], providing researchers with richer tools for cerebrovascular analysis [32].

However, there were some significant differences between the proposed CNN and VR for the aneurysm and stenosis scans in the hospital-2 dataset (P = 0.001 vs P = 0.02), which may be related to differences in scanner imaging parameters and equipment across centers; the model also lacks a sufficiently large labeled training set of the relevant patient types to further enhance its performance. Overall, the proposed CNN method did not show substantial over- or underestimation of the 3D structure of the cerebral vessels, because the individual scores also did not differ significantly for the patient TOF-MRA scans (P > 0.07). With the inclusion of larger, more comprehensive, and higher-quality labeled data in the future, there is still considerable room to improve the performance of deep learning models.

Our study has several limitations. First, the model was constructed on a single-type, small training set of 20 TOF-MRA scans, although each scan was cropped into 3,000 small cubes with or without vessels, which limits the achievable performance of the DL model. Second, as a proof-of-concept study, we focused on only three common types of cerebrovascular status. Third, we did not consider the degree of realism of the tertiary circulation in the collateral circulation score, because in clinical diagnosis this type of microvessel is of less concern than the larger vessels. Fourth, we did not assess the relationship between vessel diameter and lesion grade, although these quantitative parameters may serve as risk factors for vascular events [33]. As shown in Fig. 7, although we found that the vessel diameters presented by automatic segmentation were closer to those in the original TOF-MRA slices than those of VR, we did not obtain the relevant parameters for further statistical analysis, because automated vessel parameter measurement has not yet been implemented; this is planned as the next step of our work.

Fig. 7

Cerebrovascular image quality comparison between the proposed CNN and manual VR. The first column is the original TOF-MRA MIP image, which serves as one of the reference standards for evaluation; the second column is manual VR; and the third column is the proposed method. In VR, the collateral circulation was not fully reconstructed, whereas the CNN model successfully segmented and reconstructed relatively complete collateral circulation. Red arrows indicate the locations of the patients' cerebrovascular abnormalities

In conclusion, this proof-of-concept study demonstrated the feasibility of using DL to automatically segment and visualize cerebral vessels in 3D from raw TOF-MRA scans. The proposed automated CNN method and expert manual VR showed comparable image quality for vessel integrity, collateral circulation, and lesion morphology, with few significant qualitative differences. Although the automated CNN approach is neither intended nor likely to replace all expert manual processing, it has the potential to expand the accessibility of automated processing for the large volume of TOF-MRA examinations by relieving the labor involved. The potential role of CNN automatic segmentation of TOF-MRA images, for example in cerebrovascular digitization and intelligent disease-assisted diagnosis, needs further research and verification; such applications have gradually emerged for CT angiography images in recent years [3, 34].