1 Introduction

Contrast-enhanced Whole-Body Magnetic Resonance Angiography (WBMRA) is performed by injecting a contrast agent and acquiring images using an MRI scanner as the agent passes through the arteries of interest [14]. This technique generates high contrast in the lumen (the channel where the blood is flowing), providing a non-invasive, comprehensive imaging method for assessing cardiovascular disease (CVD) throughout the entire body [19]. Analysing these large datasets is very labour-intensive, however, so there is a great need for automated, quantitative analysis tools to help stage the disease from these scans.

The first stage of any such system is to locate and segment the arteries of interest. The segmentation of vascular structures is a task common to many medical applications [3, 8, 13], and is a fundamental step in the quantification of pathologies such as stenoses.

Many vessel segmentation techniques have been proposed in the literature, as explored in [8, 12]. In this work we examine three commonly used techniques—active contours and two “vesselness” filters—comparing their results against a more recent approach using a convolutional neural network (Convnet), structured as a voxel-wise binary classifier following the network structures explored in [21].

2 Materials and Methods

2.1 Patient Data and Ground Truth

The data used in this study consist of three whole-body datasets, each of which is split into four “stations”: station 1 comprises the head and neck, station 2 the thorax and abdomen, station 3 the pelvis and thighs, and station 4 the feet (see Fig. 1). These were acquired at Ninewells Hospital in Dundee, UK, using a 3.0 Tesla MRI scanner (Siemens Magnetom Trio).

Fig. 1. Maximum intensity projections of the four stations of patient 1, shown after digital subtraction of the pre-contrast from the post-contrast volumes.

The arteries in each individual station were manually segmented by a trained observer using the software package “3DSlicer” [4]. The following pre-processing steps were applied to aid visualisation during the manual segmentation. Firstly, all volumes were scaled in one direction to compensate for varying slice thicknesses, giving an isotropic voxel size of \(0.98\,\mathrm{{mm}}^3\) (the raw slice thicknesses varied from 0.98 mm to 1.3 mm). Next, the pre-contrast volume was registered to the post-contrast volume using the mutual information similarity measure, and then subtracted from it to suppress static tissues. An intensity equalisation step was then applied in the axial direction, ensuring a consistent vessel intensity across the entirety of each volume. Finally, an artefact correction step was applied, masking border voxels to remove MR artefacts and tissues not covered by the pre-contrast volume.

2.2 Active Contours

Segmentation using active contours, where an initial curve is evolved using a cost function depending on local gradients (external forces) and shape constraints (internal forces), was first proposed in [7] and has been successfully applied to many segmentation problems [1, 12]. In general, active contour methods are based on image gradient, detecting edges and generating well-defined boundaries on which to evaluate the internal and external energy.

The level set model tries to solve the optimisation problem by embedding the active contour as a constant set (zero level) in a function \(\phi \) that evolves in time with speed S.

For our comparative study we chose the classic Chan-Vese model [1]. It has been applied to the segmentation of objects whose edges are not well defined by the gradient, and has a well defined implementation for 3D segmentation as described in [25].

The Chan-Vese model is formulated as a “mean-curvature flow”-like evolving active contour, where the stopping term depends not on the gradient of the image, as in classical active contour models, but is instead related to a particular segmentation of the image [1].

For 3D data, we define the bounded domain \(\varOmega \subset \mathbb {R}^n\) (in our case \(n=3\)), and the bounded image function \(I : \varOmega \rightarrow \mathbb {R}\). \(\varOmega \) can be divided into a set of connected domains by a curve C such that \(\varOmega - C = \cup _{i \in I} \varOmega _i\). We then define two different regions \(R_1 = \cup _{i \in I_1} \varOmega _i\) and \(R_2 = \cup _{i \in I_2} \varOmega _i\) that represent the object support and the background support respectively. The final energy functional \(E(\phi , \mu _1, \mu _2)\) is then given by

$$\begin{aligned} \begin{aligned} E(\phi , \mu _1, \mu _2)&= \lambda _1 \int _\varOmega (I-\mu _1)^2 H(\phi ) d\varOmega + \lambda _2 \int _\varOmega (I-\mu _2)^2 (1-H(\phi )) d\varOmega \\&\qquad \qquad \qquad \qquad \quad + \alpha \int _\varOmega H(\phi ) d\varOmega + \beta \int _\varOmega |\nabla H(\phi )| d\varOmega \end{aligned} \end{aligned}$$
(1)

where \(\mu _1\) and \(\mu _2\) represent the mean value of the object support region and background support region of image I respectively, and \(H(\phi )\) is the Heaviside function. Here, the first two terms measure the variations inside and outside the active contour, the third term measures the area inside the contour and the fourth term measures the length of the contour [25].
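To make the four terms of Eq. (1) concrete, the following is a minimal sketch that evaluates a discrete version of the energy for a fixed binary mask standing in for \(H(\phi )\). The function name `chan_vese_energy`, the default weights and the use of central differences for the gradient are assumptions made for illustration; this is not the toolbox implementation of [25] and it performs no contour evolution.

```python
import numpy as np

def chan_vese_energy(image, inside, lam1=1.0, lam2=1.0, alpha=0.0, beta=0.0):
    """Discrete version of Eq. (1) for a 2D or 3D image and a binary mask.

    `inside` plays the role of H(phi): True inside the contour, False outside.
    Returns (energy, mu1, mu2). All weights are illustrative defaults.
    """
    image = np.asarray(image, dtype=float)
    h = np.asarray(inside, dtype=float)
    # Region means mu_1 (object) and mu_2 (background); 0 if a region is empty.
    mu1 = image[h > 0].mean() if (h > 0).any() else 0.0
    mu2 = image[h == 0].mean() if (h == 0).any() else 0.0
    # First two terms: intensity variation inside and outside the contour.
    fit_in = lam1 * np.sum((image - mu1) ** 2 * h)
    fit_out = lam2 * np.sum((image - mu2) ** 2 * (1.0 - h))
    # Third term: area (volume in 3D) enclosed by the contour.
    area = alpha * h.sum()
    # Fourth term: contour length (surface area in 3D), approximated by the
    # summed gradient magnitude of the mask (central differences).
    grads = np.gradient(h)
    length = beta * np.sum(np.sqrt(sum(g ** 2 for g in grads)))
    return fit_in + fit_out + area + length, mu1, mu2
```

With the area and length weights set to zero, a mask that matches a piecewise-constant object exactly gives zero energy, and any other mask gives a larger value, which is the behaviour the region-based stopping term relies on.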

2.3 Vessel Enhancement Filters

Frangi Filter. This method of enhancing vessel-like structures is based on calculating the local curvature by analysing the eigenvalues of the Hessian matrix, so as to extract the main directions in which the local structure can be decomposed [5].

To derive the “vesselness” function we first define \(\lambda _{k}\) as being the eigenvalue with the k-th smallest magnitude, i.e. (\(|\lambda _{1}| \le |\lambda _{2}| \le |\lambda _{3}|\)). Therefore, for an ideal tubular structure in a 3D image

$$\begin{aligned} |\lambda _{1}| \approx 0, |\lambda _{1}| \ll |\lambda _{2}|, |\lambda _{2}| \approx |\lambda _{3}| \end{aligned}$$
(2)

In other words, the curvature should be large in two directions (around the circumference of the vessel), and very small along the length of the vessel.

The final vesselness function to be evaluated was defined in [5] as being

$$\begin{aligned} V_0(s) = \left\{ { \begin{array}{cc} 0 &{} \text {if } \lambda _2> 0 \text { or } \lambda _3 > 0 \\ \left( 1-\exp \left( -\dfrac{R_A^2}{2\alpha ^2}\right) \right) \exp \left( -\dfrac{R_B^2}{2\beta ^2}\right) \left( 1-\exp \left( -\dfrac{S^2}{2c^2}\right) \right) &{} \text {otherwise} \end{array}} \right. \end{aligned}$$
(3)

where \(R_A = |\lambda _2| / |\lambda _3|\), \(R_B = |\lambda _1| / \sqrt{|\lambda _2 \lambda _3|}\) and \(S = \sqrt{\lambda _1^2 + \lambda _2^2 + \lambda _3^2}\). Here, \(\alpha \), \(\beta \) and c are thresholds which control the sensitivity of the filter to the measures \(R_A\), \(R_B\) and S.

For the application of enhancing vessels in a 3D angiographic dataset, the vesselness measure in Eq. (3) is analysed at different scales, s, corresponding to the sigma of the Gaussian kernel used in the construction of the Hessian matrix. It logically follows that the response of the filter will be maximised at the scale which approximately matches the size of the vessel in that region. Therefore, the final estimate of vesselness is obtained by integrating the vesselness measure provided by the filter response at different scales,

$$\begin{aligned} V_0(\gamma ) = \max _{s_{min} \le s \le s_{max}}V_0(s,\gamma ) \end{aligned}$$
(4)

where \(s_{min}\) and \(s_{max}\) are the minimum and maximum scales at which relevant structures are expected to be found, chosen so that they cover the range of relevant vessel widths. The vesselness map can then be thresholded to provide a binary vessel tree.
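As an illustration of Eqs. (3) and (4), the sketch below evaluates the vesselness from Hessian eigenvalues that are assumed to have been computed already and sorted by magnitude at each scale. The function names, the small `eps` guard against division by zero and the default values of `alpha`, `beta` and `c` are assumptions for this example, not the parameters used in this study (those are given in Table 1).

```python
import numpy as np

def frangi_vesselness(l1, l2, l3, alpha=0.5, beta=0.5, c=1.0):
    """Eq. (3) for eigenvalues sorted so that |l1| <= |l2| <= |l3|."""
    l1, l2, l3 = (np.asarray(v, dtype=float) for v in (l1, l2, l3))
    eps = 1e-12  # assumption: avoids division by zero in flat regions
    r_a = np.abs(l2) / (np.abs(l3) + eps)
    r_b = np.abs(l1) / (np.sqrt(np.abs(l2 * l3)) + eps)
    s = np.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    v = ((1.0 - np.exp(-r_a ** 2 / (2.0 * alpha ** 2)))
         * np.exp(-r_b ** 2 / (2.0 * beta ** 2))
         * (1.0 - np.exp(-s ** 2 / (2.0 * c ** 2))))
    # Bright vessels on a dark background require negative l2 and l3.
    return np.where((l2 > 0) | (l3 > 0), 0.0, v)

def multiscale_vesselness(eigs_per_scale, **kwargs):
    """Eq. (4): voxel-wise maximum of Eq. (3) over a list of (l1, l2, l3)."""
    return np.max([frangi_vesselness(*e, **kwargs) for e in eigs_per_scale],
                  axis=0)
```

For example, an ideal tube with eigenvalues (0, -5, -5) scores higher than a blob with (-5, -5, -5), and any voxel with a positive second or third eigenvalue scores zero.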

Optimally Oriented Flux. The “optimally oriented flux” filter, first published in [10], evaluates a scalar measure of the flux flowing through a spherical surface. Before computing this value, directional information is extracted by projecting the gradient along “optimal” axes, and the flux measure then evaluated. For each voxel a sphere with variable radius is built, centred on the voxel, which produces an “OOF response” when touching an object edge. If the voxel is inside the curvilinear structure the response will be positive, otherwise it will be negative.

The outwardly oriented flux along the direction \(\hat{\rho }\) is first computed by projecting the gradient of the image \(\mathbf v \) along \(\hat{\rho }\), with the flux then evaluated through the surface \(\partial S_r\) of the spherical region \(S_r\) with radius r using the definition

$$\begin{aligned} f(\mathbf x ; r, \hat{\rho }) = \int _{\partial S_r} ((\mathbf v (\mathbf x + \mathbf h ) \cdot \hat{\rho })\hat{\rho }) \cdot \hat{n} \, dA \end{aligned}$$
(5)

where dA is the infinitesimal area element on \(\partial S_r\), and \(\hat{n}\) is the unit normal to the surface at position \(\mathbf h = r \hat{n}\).

As before, the goal is to obtain the principal eigenvalues for each voxel. Inside the vessel, when the surface of the local spherical region \(S_r\) touches the boundaries of the object, \(\mathbf v \) is aligned opposite to the direction of \(\hat{n}\), and therefore the eigenvalues satisfy \(\lambda _1 \le \lambda _2 \ll 0\). Because the gradient of the image is perpendicular to the direction of the curvilinear structure, \(\lambda _3 \approx 0\). In the case where the voxel is in the background, \(\mathbf v \) will have the same direction as \(\hat{n}\), and therefore \(\lambda _3 \gg 0\).

To obtain the maximum response of the OOF while changing the radius r, we evaluate the geometric mean of the eigenvalues, as

$$\begin{aligned} M(\mathbf x ; s) = \left\{ { \begin{array}{cc} \sqrt{|\lambda _{1}(\mathbf x ,s) \lambda _{2}(\mathbf x ,s)|} &{}\quad \lambda _{1}(\mathbf x ,s) \le \lambda _{2}(\mathbf x ,s) < 0 \\ 0 &{}\quad \text {otherwise} \end{array}} \right. \end{aligned}$$
(6)

where s represents the scale factor. Similar to the Frangi approach, evaluating the maximum response over an appropriate range of scales generates the final map, which can be thresholded to produce a final segmentation.
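Eq. (6) and the maximum over scales can be sketched as follows, assuming the two most negative OOF eigenvalues have already been computed per voxel at each scale. The function names `oof_response` and `oof_multiscale` are hypothetical, and the computation of the flux matrix itself (Eq. 5) is not attempted here.

```python
import numpy as np

def oof_response(l1, l2):
    """Eq. (6): geometric mean of l1 and l2 where l1 <= l2 < 0, else 0."""
    l1 = np.asarray(l1, dtype=float)
    l2 = np.asarray(l2, dtype=float)
    inside = (l1 <= l2) & (l2 < 0)
    return np.where(inside, np.sqrt(np.abs(l1 * l2)), 0.0)

def oof_multiscale(eig_pairs_per_scale):
    """Voxel-wise maximum of Eq. (6) over a list of (l1, l2) arrays."""
    return np.max([oof_response(l1, l2) for l1, l2 in eig_pairs_per_scale],
                  axis=0)
```

A binary segmentation would then follow by comparing the returned map against a threshold.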

2.4 Convolutional Neural Network

In recent years, deep Convnet approaches have been driving advances in many computer vision tasks, such as image classification [9, 21] and image segmentation [18, 20]. Many network models have been developed for these tasks, and it is a very active area of research [11]. The network structure we chose was inspired by those explored in [21], and recently applied to segmentation tasks in MRI [2, 15, 17]. To the best of our knowledge, this is the first reported result of applying a Convnet to vessel segmentation in WBMRA, for which no public sets of manually annotated vascular networks currently exist.

Fig. 2. Structure of the 3D Convnet segmentation network. All layer activation functions were “ReLU” except the final output node, which was “sigmoid”. The “Adam” optimiser was used during training, with “binary cross-entropy” selected as the loss function.

The final network structure is shown in Fig. 2, consisting of five layers: three sets of convolutional and max-pooling layers, followed by two fully connected layers. The output node of the final layer gives a single binary output of vessel/non-vessel for the central voxel of the input patch.

Our network was implemented using Keras v1.1.0 and Theano v0.8.2. All layer activation functions were “ReLU” except the final output node, which was “sigmoid”. The “Adam” optimiser was used during training, with “binary cross-entropy” selected as the loss function [6].

While this approach of a voxel-wise classifier has been shown to be less computationally efficient than a fully convolutional network [20], it allowed fine control over dataset balancing for our limited amount of ground truth data. The fully connected layers also gave additional flexibility to the network without increasing the required input volume size, an increase that is inherent to adding further convolutional layers.

2.5 Comparison Metric

There are many metrics used for evaluating the quality of segmentation in medical images [24]. For our data, we have selected the Dice Similarity Coefficient (DSC, also referred to as the F1-measure). This is given by

$$\begin{aligned} DSC = \dfrac{2 |X \cap Y|}{|X| + |Y|} \end{aligned}$$
(7)

where |X| is the number of all the vessel voxels in the segmentation obtained by the tested method and |Y| is the number of all the vessel voxels in the ground truth.
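Eq. (7) translates directly into a few lines of code for binary masks. In the sketch below, the function name `dice` and the convention of returning 1.0 when both masks are empty are assumptions; the source does not state how that case is handled.

```python
import numpy as np

def dice(segmentation, ground_truth):
    """Dice Similarity Coefficient (Eq. 7) between two binary masks."""
    x = np.asarray(segmentation, dtype=bool)
    y = np.asarray(ground_truth, dtype=bool)
    denominator = x.sum() + y.sum()
    if denominator == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * np.logical_and(x, y).sum() / denominator
```

For instance, `dice([1, 1, 0, 0], [1, 0, 1, 0])` gives 0.5: one voxel is shared, and each mask contains two vessel voxels.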

2.6 Pre-processing and Parameter Optimisation

Pre-processing. As can be seen in Fig. 1, the subtraction of the pre-contrast from the post-contrast volume still leaves some tissues and non-arterial structures behind, particularly in stations 1 and 2. The most problematic of these are the lungs in station 1, which contain vessels that were not included in the manual ground truth. For this reason, the small region around the lungs and heart was masked out in both the original volumes and the ground truth data, excluding this area from our analyses.

Another artefact which remained after subtraction was the variation of lumen intensity along the length of vessels. This may arise from poor timing of the contrast agent during acquisition, or from inhomogeneities in the magnetic field (such as surface coil artefacts). A simple procedure was followed to correct these variations in each station, in the form of a local intensity normalisation.

First we make the assumption that each axial slice contains an artery, and that arteries are the brightest objects present. This holds true for all regions except for slices above the head and below the feet; these slices were simply masked to zero after the procedure was applied. We then applied a 7-slice sliding window axially, in which the local vessel intensity was estimated from its histogram by choosing the highest frequency bin above 70% of the maximum intensity, with this value corresponding to the vessel intensity estimate for the central slice. Once the estimates had been calculated for the entire volume, they were smoothed with a Gaussian filter and each slice was then divided by its corresponding estimate. An example of the results of this processing is shown in Fig. 3.

Fig. 3. Results of intensity equalisation on station 3 of patient 1. The MIP of the raw volume is shown on the left, and the equalised volume on the right.
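The local intensity normalisation described above can be sketched as follows. This is an illustrative reading of the procedure rather than the code used in this study: the function name `equalise_axial`, the number of histogram bins, the Gaussian width `sigma`, the edge padding, and the choice of taking the 70% level relative to the maximum within each window are all assumptions.

```python
import numpy as np

def equalise_axial(volume, window=7, frac=0.7, bins=64, sigma=2.0):
    """Divide each axial slice by a smoothed local vessel intensity estimate.

    `volume` is indexed (slice, row, column). Returns the equalised volume
    and the smoothed per-slice estimates.
    """
    vol = np.asarray(volume, dtype=float)
    n_slices = vol.shape[0]
    half = window // 2
    estimate = np.ones(n_slices)
    for z in range(n_slices):
        slab = vol[max(0, z - half):z + half + 1]
        top = slab.max()
        if top <= 0:
            continue  # empty slab: leave the estimate at 1
        # Histogram of voxels above `frac` of the local maximum; the most
        # populated bin is taken as the vessel intensity for slice z.
        bright = slab[slab >= frac * top]
        counts, edges = np.histogram(bright, bins=bins, range=(frac * top, top))
        k = int(np.argmax(counts))
        estimate[z] = 0.5 * (edges[k] + edges[k + 1])
    # Gaussian smoothing of the per-slice estimates (edge-padded).
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(estimate, radius, mode="edge")
    smooth = np.convolve(padded, kernel, mode="valid")
    return vol / smooth[:, None, None], smooth
```

On a synthetic volume whose vessel intensity decays linearly along the axial direction, the relative spread of vessel intensities after this step is smaller than before it.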

Active Contours. For the active contour method, the toolbox implementation provided by [25] was used. Values for \(\beta \), \(\lambda \) and \(\varDelta t\) were fixed using a grid search optimisation procedure across all patients. In our case, the optimal values were found to be 0.08 for the smoothing weight term (\(\beta \)), 0.0002 for the image weight term (\(\lambda \)), and 2.72 for the time step (\(\varDelta t\)).

The final step was the initialisation of \(\phi _0\). This choice was critical as it affects the time and the speed of the evolution of the curve. So again under the hypothesis that the highest intensity voxels belong to the vessels, we took a set of seed points with high grey levels as \(\phi _0\). To keep the process completely automatic we used Otsu’s method for generating thresholds from grey-level histograms [16].

For each station we generated 10 thresholds, which served as 10 different sets of seed points. The active contour method was then applied using the above parameters, and the highest Dice score recorded.
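For reference, the basic single-threshold form of Otsu's method [16] is sketched below: it picks the grey level that maximises the between-class variance of the histogram. How the 10 thresholds per station were derived from it is not detailed here, so this sketch does not attempt to reproduce that step; the function name `otsu_threshold` and the bin count are assumptions.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Grey level maximising the between-class variance of the histogram."""
    counts, edges = np.histogram(np.ravel(values), bins=bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(counts)              # voxels at or below each bin
    w1 = w0[-1] - w0                    # voxels above each bin
    cum_sum = np.cumsum(counts * centres)
    with np.errstate(divide="ignore", invalid="ignore"):
        m0 = cum_sum / w0               # mean of the lower class
        m1 = (cum_sum[-1] - cum_sum) / w1  # mean of the upper class
        between = w0 * w1 * (m0 - m1) ** 2
    between = np.nan_to_num(between)    # empty classes contribute nothing
    return centres[int(np.argmax(between))]
```

Voxels brighter than the returned value would then serve as seed points, under the hypothesis that the highest intensity voxels belong to the vessels.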

Enhancement Filters. The optimal parameters for the enhancement filters are shown in Table 1. These were optimised for each station across all patients using a grid search, with a fixed segmentation threshold.

The final segmentations were acquired by calculating the vesselness map using the parameters in Table 1, and then applying 20 thresholds automatically calculated using Otsu’s method [16]. The highest Dice score achieved from all 20 segmentation maps was then recorded.

Table 1. Enhancement filter parameters. The scale factor and radii values are written in the form minimum:step:maximum.

3D Convolutional Neural Network. A number of network structures were explored during optimisation. Inspired by models discussed in [20, 21], we trained models consisting of 2–6 convolutional layers with 16–128 \(3 \times 3 \times 3\) kernels, 1–3 max-pooling layers, and 1–2 fully-connected layers. To help combat overfitting, \(l_2\) weight regularisation was used for each convolutional layer [6], and 20% dropout was used on the fully connected layers [22]. All layer weights were initialised from a scaled Gaussian distribution.

A single network was trained for each station, with training patches extracted from two patients and the trained model applied to the held-out third patient in a 3-fold cross-validation setup.
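The cross-validation setup amounts to leave-one-patient-out splitting, which can be sketched as below. The generator name `leave_one_patient_out` and the patient identifiers are hypothetical.

```python
def leave_one_patient_out(patients):
    """Yield (training_patients, held_out_patient) for each fold."""
    for i, held_out in enumerate(patients):
        training = [p for j, p in enumerate(patients) if j != i]
        yield training, held_out

# With three patients this gives the three folds, repeated for each station.
folds = list(leave_one_patient_out(["patient1", "patient2", "patient3"]))
```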

The cubic patches were varied in size according to the network structure used, based on ensuring that the deepest layer still received a patch large enough to perform meaningful calculations on. A minimum side length of 15 voxels was needed to capture the thickest vessels, leading to a side length range of 15–50 voxels for the network structures we explored. For our final network, a patch size of \(27 \times 27 \times 27\) was found to be optimal.
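The dependence of the minimum patch size on network depth can be illustrated with a little arithmetic. The sketch below assumes, purely for illustration, that each block consists of one unpadded \(3 \times 3 \times 3\) convolution followed by \(2 \times 2 \times 2\) max-pooling; the padding, pooling size and number of convolutions per block used in this study are not stated here, and the function name `final_side_length` is hypothetical.

```python
def final_side_length(side, n_blocks=3, kernel=3, pool=2):
    """Side length left after n_blocks of (unpadded conv, max-pool).

    Returns 0 if the patch becomes too small for a layer to operate on.
    """
    for _ in range(n_blocks):
        side = side - (kernel - 1)  # unpadded convolution shrinks each side
        if side < 1:
            return 0
        side = side // pool         # non-overlapping max-pooling
    return side
```

Under these assumptions a side length of 27 reduces as 27, 25, 12, 10, 5, 3, 1, leaving a single voxel for the fully connected layers, whereas a side length of 15 is exhausted before the third block completes.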

Finally, the number of training patches was chosen to maximise the available data. For each station, the minimum number of ground truth vessel voxels across all 3 patients was calculated, and this was used as the number of positive samples to be extracted from each patient.

The data was balanced by extracting an equal number of background samples. The positions of the background samples were weighted so that two-thirds came from regions within 5 voxels of a vessel and one third were sampled randomly from the rest of the volume. This was found to reduce the network’s tendency to overestimate the diameter of the vessels, which was observed when the background patches were sampled completely at random.
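The sampling scheme can be sketched as follows. This is a minimal illustration, not the code used in this study: the function names are hypothetical, “within 5 voxels” is interpreted as a Chebyshev (box) distance so that no external library is needed, and only the centre coordinates of the patches are drawn.

```python
import numpy as np

def positives_per_patient(vessel_masks):
    """Smallest vessel voxel count across patients for one station."""
    return int(min(np.asarray(m, dtype=bool).sum() for m in vessel_masks))

def near_vessel_mask(vessels, radius=5):
    """Background voxels within `radius` (box distance) of any vessel voxel."""
    v = np.asarray(vessels, dtype=bool)
    grown = v.copy()
    for axis in range(v.ndim):  # separable box dilation, one axis at a time
        pad = [(radius, radius) if a == axis else (0, 0) for a in range(v.ndim)]
        padded = np.pad(grown, pad)
        n = grown.shape[axis]
        acc = np.zeros_like(grown)
        for s in range(2 * radius + 1):
            acc |= np.take(padded, np.arange(s, s + n), axis=axis)
        grown = acc
    return grown & ~v

def sample_background(vessels, n_samples, radius=5, near_frac=2 / 3, seed=0):
    """Draw background coordinates: `near_frac` near vessels, the rest far."""
    rng = np.random.default_rng(seed)
    v = np.asarray(vessels, dtype=bool)
    near = near_vessel_mask(v, radius)
    far = ~v & ~near
    n_near = int(round(n_samples * near_frac))
    near_idx = np.argwhere(near)
    far_idx = np.argwhere(far)
    pick_near = near_idx[rng.choice(len(near_idx), n_near, replace=False)]
    pick_far = far_idx[rng.choice(len(far_idx), n_samples - n_near,
                                  replace=False)]
    return pick_near, pick_far
```

Concentrating the negative samples close to the vessel wall exposes the classifier to the hardest background examples, which is consistent with the reduced diameter overestimation reported above.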

The total number of training samples used was 136000 for station 1, 160000 for station 2, 28000 for station 3, and 24000 for station 4. During training, 5% of the training data was held out for validation, and the best network weights were saved as those giving the highest validation accuracy score after 20 epochs (the network performance was found to typically converge after 8–12 epochs).

The networks were trained using an Nvidia Titan X Pascal GPU, with training times of 3–4 h for each model (depending on the station and number of training samples used).

3 Results

The results of applying the four automated strategies described in Sect. 2 to all stations and patients are shown in Table 2, with the corresponding segmentation results of patient 1 shown in Fig. 4.

Fig. 4. Segmentation results for patient 1, shown as coronal projections.

Table 2. Dice coefficients for each method

It can be seen from Table 2 that for our case of only three patients, the OOF filter achieved the greatest mean DSC of 0.705. The 3D Convnet typically outperforms at least one of the other techniques, except for station 2. The main reason for this appears to be the additional artefacts left over from the imperfect volume registration and subtraction procedure (particularly the kidneys and bladder). The network approach had the most difficulty distinguishing between these artefacts and the arteries, causing it to over-segment station 2 and resulting in a lower Dice score.

Looking at the segmentation results in Fig. 4, a number of observations can be made. The active contour method often produces broken vessels, such as the right branch in station 2, and has the most difficulty segmenting the finest vessels in station 4.

The Frangi and OOF enhancement filters produce visually similar results, though the OOF performs better at rejecting non-vessel artefacts (most noticeably in the brain and abdomen of station 1). Both filters do exhibit difficulties segmenting the finest vessels, such as at the bottom of station 2, and in different cases tend to either underestimate (Frangi in station 2, OOF in station 4) or overestimate (Frangi and OOF in station 3) the true diameter of the vessels as compared to the ground truth.

The Convnet performs poorest at rejecting non-arterial artefacts in station 2, but it also has the highest sensitivity to extracting fine vessels. Indeed, some of the finest vessels in the lower half of stations 2 and 3 were not present in the ground truth, having been either overlooked or rejected due to low contrast.

4 Conclusions

In this paper we have presented a quantitative comparison between four automated vessel segmentation techniques for whole-body MRA data, using three manually segmented patient datasets.

In this regime of having limited ground truth data, it has been found that the Optimally Oriented Flux filter provides the best average DSC of 0.705. Visually, the Convnet approach segments vessels most consistently, with the fewest breaks, picking up finer vessels and producing the most consistently accurate diameters when compared with the ground truth. However, it performed poorest at rejecting non-arterial artefacts, resulting in a lower DSC overall. It was also noted that some of the fine vessels segmented by the Convnet were not present in our ground truth. Because our ground truth comes from a single observer, we are unable to estimate its quality and reliability, and therefore its impact on the DSC results cannot be easily estimated for our data. We are not aware of any publicly available sets of manually annotated vascular networks for WBMRA volumes.

The Convnet approach appears to be mainly limited by the lack of training data. Other deep learning approaches which integrate large amounts of data augmentation, such as U-Net [18], may achieve better results at rejecting non-arterial artefacts; however, those techniques have not been explored here. Another approach often used is to fine-tune a previously trained network such as GoogLeNet, a 22-layer network trained on a database of 1 million natural images [23]. Given that currently no pre-trained 3D networks exist for medical data, the training of deeper networks will require a larger database of ground truth segmentations.