1 Introduction

Blood vessel segmentation in ultrasound images may be useful in medical applications such as deep venous thrombosis detection [5], anesthesia guidance [12] and catheter placement. The goal of vessel detection in this work was to identify the position and size of blood vessels in the image. Several segmentation and tracking methods require this as an initialization [1, 6]. In [12], a real-time vessel detection method was introduced, removing the need for manual initialization. This method performs an ellipse fitting at each pixel in the image using a graphics processing unit (GPU). However, this method has problems distinguishing vessels from non-vessels when user settings on the ultrasound scanner, such as gain, are varied, and on individuals with more subcutaneous fat tissue, due to increased amounts of reverberation artifacts. Also, this method was only made to detect a single vessel in each image.

In this paper, we propose to use a similar ellipse fitting method to find vessel candidate regions which are passed on to a deep neural network classifier which determines if the region contains a vessel or not. As the proposed detection method provides both position and size, it may also be used as a vessel segmentation method, assuming the vessel has an elliptical shape. The proposed method also enables detection of multiple vessels at the same time.

2 Methods

The next section introduces the elliptic vessel model, which is used to find vessel candidate regions in the ultrasound image. A subimage is created from the ultrasound image for each vessel candidate. A deep convolutional network then determines if each subimage is of an actual blood vessel. Figure 1 provides an overview of the steps involved in the proposed method.

Fig. 1. Overview of the proposed method. The first step finds vessel candidates and creates subimages for each. The subimages are then passed on to a deep neural network which identifies the subimages belonging to vessels, and discards those that are not of vessels.

2.1 Vessel Model

Each vessel is modelled as an ellipse with center \(\varvec{c} = [c_x, c_y]\) and major and minor radii a and b. For N evenly distributed points on the ellipse, the position \(\varvec{p}_i\) and normal \(\varvec{n}_i\) of point i can be calculated with the following equations.

$$\begin{aligned} \alpha _i&= \frac{2\pi i}{N} \end{aligned}$$
(1)
$$\begin{aligned} \varvec{d}_i&= \left[ a\cos (\alpha _i), b\sin (\alpha _i) \right] \end{aligned}$$
(2)
$$\begin{aligned} \varvec{p}_i&= \varvec{c} + \varvec{d}_i \end{aligned}$$
(3)
$$\begin{aligned} \varvec{n}_i&= \frac{[b\cos (\alpha _i), a\sin (\alpha _i)]}{|[b\cos (\alpha _i), a\sin (\alpha _i)]|} \end{aligned}$$
(4)
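
As an illustration, Eqs. (1)-(4) can be sketched in a few lines of Python. This is a minimal sketch using only the standard library; the function name `ellipse_points_and_normals` and its argument names are hypothetical and not taken from the original implementation, which ran on the GPU. Note that the points are evenly spaced in the angle \(\alpha _i\), not in arc length.

```python
import math


def ellipse_points_and_normals(cx, cy, a, b, n=32):
    """Sample n points on an axis-aligned ellipse and their unit outward normals.

    cx, cy: ellipse center; a, b: radii along x and y (both assumed > 0).
    """
    points, normals = [], []
    for i in range(n):
        alpha = 2.0 * math.pi * i / n                       # Eq. (1)
        dx, dy = a * math.cos(alpha), b * math.sin(alpha)   # Eq. (2)
        points.append((cx + dx, cy + dy))                   # Eq. (3)
        nx, ny = b * math.cos(alpha), a * math.sin(alpha)   # numerator of Eq. (4)
        length = math.hypot(nx, ny)                         # denominator of Eq. (4)
        normals.append((nx / length, ny / length))
    return points, normals
```

For a circle (a = b), the normal reduces to \([\cos (\alpha _i), \sin (\alpha _i)]\), i.e. the radial direction.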

2.2 Vessel Candidate Search

First, the image is blurred using convolution with a Gaussian mask \((\sigma = 0.5\mathrm{{mm}})\) and then the image gradients \(\varvec{G}\) are calculated using a central difference scheme. For given radii a and b, the vessel score S is calculated as the average dot product of the outward normal \(\varvec{n}_i\) and the corresponding image gradient at N points on the ellipse, as shown in (5).

$$\begin{aligned} S(\varvec{c}, a, b)&= \frac{1}{N}\sum _{i = 0}^{N-1} \varvec{n}_i \cdot \varvec{G}(\varvec{p}_i) \end{aligned}$$
(5)
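
The score in (5) can be sketched as follows. This is a simplified illustration, not the GPU implementation: the image gradient is passed in as a hypothetical callable `gradient_at(x, y)` returning a pair `(gx, gy)`, so that interpolation and image handling are left out. A dark lumen surrounded by brighter tissue has gradients pointing outward, which yields a positive score.

```python
import math


def vessel_score(gradient_at, cx, cy, a, b, n=32):
    """Average dot product of outward normal and gradient on an ellipse, Eq. (5).

    gradient_at: assumed callable (x, y) -> (gx, gy) for the blurred image.
    """
    total = 0.0
    for i in range(n):
        alpha = 2.0 * math.pi * i / n
        # Point on the ellipse, Eqs. (2)-(3).
        px = cx + a * math.cos(alpha)
        py = cy + b * math.sin(alpha)
        # Unit outward normal, Eq. (4).
        nx, ny = b * math.cos(alpha), a * math.sin(alpha)
        length = math.hypot(nx, ny)
        gx, gy = gradient_at(px, py)
        total += (nx * gx + ny * gy) / length
    return total / n
```

As a sanity check, a gradient field of unit vectors pointing radially away from the center gives a score of 1 for a circle around that center, and a zero gradient field gives a score of 0.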

For each pixel, ellipses with major radius a ranging from 3.5 to 6 mm, flattening factor f from 0 to 0.5 (minor radius \(b = (1-f)a\)) and \(N = 32\) samples were used to calculate the vessel score. An increment of 0.25 mm was used for the radius, and 0.1 for the flattening factor. The ellipse with the highest score is selected for each pixel. The best score and the corresponding values of a and b are stored for each pixel. Any vessel candidate with a score below 1.5 is discarded. This is a low threshold, which will not discard vessels with low contrast, but will also allow several non-vessel regions. Next, the vessel candidates are sorted according to their score from high to low. These are then processed in order, and a vessel candidate is accepted if its center is not inside another vessel candidate structure already accepted. Any vessel candidates which overlap with previously accepted vessel candidates are discarded.
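
The parameter grid and the greedy selection described above can be sketched as below. The function names and the tuple layout `(score, cx, cy, a, b)` are assumptions made for this example. The overlap test implements one reading of the text, namely that a candidate is rejected when its center lies inside an already accepted ellipse.

```python
def ellipse_grid():
    """Enumerate the (a, b) radius pairs in mm that are tested at each pixel."""
    grid = []
    for k in range(11):                 # a = 3.5, 3.75, ..., 6.0 mm
        a = 3.5 + 0.25 * k
        for j in range(6):              # f = 0.0, 0.1, ..., 0.5
            f = j / 10.0
            grid.append((a, (1.0 - f) * a))
    return grid


def center_inside(x, y, candidate):
    """True if point (x, y) lies inside the axis-aligned ellipse of candidate."""
    _, cx, cy, a, b = candidate
    return ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1.0


def select_candidates(candidates, min_score=1.5):
    """Greedy selection: best score first, skip centers inside accepted ellipses."""
    accepted = []
    kept = [c for c in candidates if c[0] >= min_score]
    for cand in sorted(kept, key=lambda c: c[0], reverse=True):
        _, cx, cy, _, _ = cand
        if not any(center_inside(cx, cy, other) for other in accepted):
            accepted.append(cand)
    return accepted
```

With these increments the grid contains 11 radii and 6 flattening factors, i.e. 66 ellipses per evaluated pixel.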

For each vessel candidate, a square subimage is created from the ultrasound image as shown in Fig. 1. Examples of vessel candidate images are shown in Fig. 2. The vessel candidate image is centered at the vessel center \(\varvec{c}\) and the size of the image is determined by the major radius a so that the width and height of the image are \(4a \times 4a\) converted to pixels. This image size will thus include the vessel as well as some surrounding tissue.
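
A minimal sketch of the subimage creation is given below. The pixel spacing parameter `mm_per_pixel` and the handling of candidates near the image border (padding with a constant `fill` value) are assumptions for this example; the text does not state how borders are treated.

```python
def subimage_box(cx_px, cy_px, a_mm, mm_per_pixel):
    """Return (x0, y0, side) in pixels of the 4a x 4a square centered on the vessel."""
    side = int(round(4.0 * a_mm / mm_per_pixel))
    x0 = int(round(cx_px - side / 2.0))
    y0 = int(round(cy_px - side / 2.0))
    return x0, y0, side


def crop(image, x0, y0, side, fill=0):
    """Crop a side x side square from a list-of-rows image, padding outside pixels."""
    height, width = len(image), len(image[0])
    rows = []
    for y in range(y0, y0 + side):
        row = []
        for x in range(x0, x0 + side):
            inside = 0 <= y < height and 0 <= x < width
            row.append(image[y][x] if inside else fill)
        rows.append(row)
    return rows
```

For example, a candidate with a = 5 mm at a spacing of 0.25 mm per pixel gives an 80 x 80 pixel subimage before resizing.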

2.3 Vessel Classifier

The next step of the proposed method is to send each vessel candidate image through a deep convolutional neural network classifier to determine if the image belongs to a blood vessel. Caffe [7] was used as the underlying framework both for training and testing of the classifier, while the vessel candidate search was implemented with the FAST medical image computing framework [11].

Data: The data used for training and validation was acquired by first scanning the femoral region of both legs of 15 subjects with varying image quality and different ultrasound acquisition settings. Every tenth frame was run through the vessel candidate search step and the resulting images were stored on disk. This resulted in 12,804 images in total. All images were resized to \(128 \times 128\), and classified manually as either vessel or non-vessel. Figure 2 shows some image examples of both blood vessels and non-vessel structures. The ultrasound system used was an Ultrasonix SonixMDP (Analogic, Boston, USA) with an L14-5 linear array probe. To increase the amount of training data, all vessel candidate images were flipped horizontally, effectively doubling the amount of training data.
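
The horizontal flip augmentation can be sketched as follows, assuming for illustration that each image is a list of pixel rows and the dataset is a list of `(image, label)` pairs; these representations and the function names are not taken from the original implementation.

```python
def flip_horizontal(image):
    """Mirror an image (list of rows) around its vertical axis."""
    return [list(reversed(row)) for row in image]


def augment_with_flips(dataset):
    """Return the dataset plus a horizontally flipped copy of every image."""
    flipped = [(flip_horizontal(image), label) for image, label in dataset]
    return dataset + flipped
```

The label is unchanged by the flip, since a mirrored vessel is still a vessel.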

Fig. 2. Vessel candidate images of blood vessels and other non-vessel structures used for training the neural network.

The Network: The AlexNet [8] network was used initially, and gradually simplified by removing convolution-pooling blocks and reducing the number of convolutions, while maintaining the validation accuracy. The network was simplified mainly to improve the test runtime speed, which was important in order to achieve real-time performance. The final vessel classification network consisted of two convolution layers, one normalization layer, two max pooling layers and three fully connected layers. Additionally, rectified linear units (ReLU), which have been shown to improve training [4], were used as non-linear activation units both for the convolution layers and the fully connected (FC) layers. Thus, including ReLU layers, the network consisted of 13 layers in total, as shown in Fig. 3. Additionally, a softmax loss layer was used for the training of the network. The data layer size was fixed to \(110 \times 110\) pixels. During training, random patches of size \(110 \times 110\) were cropped from the \(128 \times 128\) vessel candidate images to prevent overfitting. This technique increased accuracy by about 1 %. The mean image was calculated from the training data and subtracted from the input image. The first convolution layer had 9 convolutions of size \(11 \times 11\) pixels and the second had 32 convolutions of size \(15 \times 15\). The max pooling was done over patches of \(3 \times 3\). Local response normalization (LRN) [8] was used after the first convolution layer with the same parameters as in [8]. Dropout was used on the fully connected layers with a probability of 0.5. The network was trained with stochastic gradient descent, batch size 128, momentum 0.9 and weight decay 0.0005. The base learning rate was 0.01 with a sigmoid learning rate decay.
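
The random cropping used during training can be sketched as below. This is a plain Python illustration with hypothetical function names; in practice the cropping was handled by the training framework. Each \(128 \times 128\) image admits \(19 \times 19 = 361\) distinct \(110 \times 110\) patches, since the patch offset can range from 0 to 18 in each direction.

```python
import random


def random_crop(image, crop_size=110, rng=random):
    """Cut a random crop_size x crop_size patch from a list-of-rows image."""
    height, width = len(image), len(image[0])
    y0 = rng.randint(0, height - crop_size)   # randint is inclusive at both ends
    x0 = rng.randint(0, width - crop_size)
    return [row[x0:x0 + crop_size] for row in image[y0:y0 + crop_size]]


def count_crop_offsets(image_size=128, crop_size=110):
    """Number of distinct square patches that fit inside a square image."""
    return (image_size - crop_size + 1) ** 2
```

Passing a seeded `random.Random` instance as `rng` makes the crops reproducible.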

Fig. 3. The vessel detection network. A fixed-size input image of size \(110 \times 110\) is fed into two convolution-pooling stages with 9 and 32 convolutions respectively. This is followed by three fully connected (FC) layers with dropout to reduce overfitting. A local response normalization (LRN) is performed after the first convolution. Rectified linear units (ReLU) are used as non-linear activation units in all stages.

2.4 Performance Optimizations

Ultrasound is a real-time imaging modality, typically delivering 10–20 images per second. The proposed method thus has to find vessels in each image in less than 100 ms to be able to process the ultrasound image stream in real-time. The vessel candidate search, subimage creation and resizing were all implemented on the GPU using the FAST framework. Caffe was run in GPU mode and all vessel candidates for a given image frame were batch processed, which significantly boosts performance. Additionally, the vessel candidate search was only performed on every fourth pixel.

3 Results

Figure 4 shows the convolutions learned by the neural network. The figure shows that the first convolutional layer learns to detect horizontal edges, and the second layer learns to identify different patterns of horizontal edges. The trained neural network does not seem to find vertical edges as important in the ultrasound images. This seems sensible, as vertical edges are often weaker or missing in ultrasound images.

Fig. 4. Features learned by the neural network. The first layer has learned several horizontal edge detectors, while the second convolutional layer has learned to recognize patterns of horizontal edges.

Fig. 5. Vessel detection result on two ultrasound images.

Leave-one-subject-out cross validation was used: in each fold, 14 subjects were used for training and 1 subject was kept for validation. The average classification accuracy for the cross validation was 94.5 %, with a standard deviation of 2.9. This was calculated using a discrimination threshold of 0.5 on the softmax output of the vessel classifier. Figure 5a shows the result of the vessel detection on an image of the femoral region.
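
The evaluation protocol can be sketched as follows. The function names are hypothetical, `subject_ids` is assumed to hold one subject identifier per image, and `probabilities` is assumed to hold the softmax output for the vessel class.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, validation_indices) per fold."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        val = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, val


def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of images whose thresholded vessel probability matches the label."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(1 for pred, y in zip(predictions, labels) if pred == y)
    return correct / len(labels)
```

Splitting by subject rather than by image keeps all frames of one person in the same fold, so the validation images come from a subject the network has not seen during training.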

This dataset was only from a single area of the body, the femoral region covering the femoral artery and vein. To see how well the proposed method can generalize to other parts of the body, a dataset of the left and right carotid artery was acquired from two subjects and used as validation data, while the dataset with the 15 subjects of the femoral region was used as training data. The dataset was created with the same method described in Sect. 2.3. With this data, the method achieved an accuracy of 96 %. Figure 5b shows the result of the vessel detection on an image of the carotid artery.

The proposed method was compared to another state-of-the-art vessel detection method [12]. This method achieved an average accuracy of 84 % on the femoral region dataset. The receiver operating characteristic (ROC) curves in Fig. 6 show how the two methods perform when varying the discrimination threshold for the same dataset.
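
For reference, the points of an ROC curve can be obtained by sweeping the discrimination threshold over the classifier outputs, as in the sketch below. The function name and the input representation (a list of vessel probabilities and a list of 0/1 labels, with at least one example of each class) are assumptions for this example.

```python
def roc_points(probabilities, labels):
    """Return (false positive rate, true positive rate) pairs for all thresholds."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]   # threshold above every score: nothing is detected
    for threshold in sorted(set(probabilities), reverse=True):
        tp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 1)
        fp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points
```

Lowering the threshold moves along the curve toward the upper right corner, trading more detected vessels for more false detections.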

Fig. 6. ROC curve of the proposed method and the method in [12].

Training time was about 10 min on a laptop computer with an NVIDIA GTX 980M GPU with 8 GB of memory. The average runtime of all steps including the vessel candidate search and vessel classification was 46 ms, enabling the ultrasound images to be processed in real-time.

4 Discussion

The vessel model used in the proposed method assumes that the vessels are elliptical. While this often holds true for arteries, it may not be ideal for veins, which often have a more irregular shape. Thus, the proposed method is more suited for arteries than veins. The vessel model also does not consider rotation of the ellipse. However, in our experiments this has not been an issue, as vessels usually are compressed in the vertical direction due to pressure from the ultrasound probe applied by the user. Including rotation in the vessel candidate search would significantly reduce runtime performance.

An alternative to the proposed ellipse fitting method would be to use more general object detection methods, such as R-CNN [3, 10]. However, these methods are more complex and bounding boxes would have to be created manually around each vessel in each image, which is time consuming. With the ellipse fitting method, the user only has to choose between the classes “vessel” and “non-vessel” for each vessel candidate subimage. Thus, the proposed ellipse fitting method aids in the labeling of the data.

Another alternative would be to use a fully convolutional neural network [9]. Such a network would provide a classification of each pixel. The ground truth data could be created by a user selecting the center of each blood vessel. Using such a network may be more robust in terms of rotation and deformation of the blood vessels. However, it would not provide the radius of the vessels and a segmentation as shown in Fig. 5.

The validation accuracy was 94.5 %, which is a major improvement over the vessel detection method in [12], which achieved an accuracy of 84 % on the same validation dataset. The accuracy may be improved by adding more training data, including the temporal dimension of the data with recurrent neural networks, and including Doppler data in a separate image channel. In the proposed network, the weights were initialized using Gaussian noise with standard deviation 0.01. Unsupervised pre-training has been shown to be a good way to initialize the weights of deep neural networks [2]. With ultrasound imaging, a large amount of unlabeled data can easily be acquired from the target body regions. Thus, we believe unsupervised pre-training of deep networks will be a useful technique within ultrasound imaging.

5 Conclusion

A robust real-time vessel detection method for ultrasound images was presented. The method uses a deep convolutional neural network to classify subimages. Although the neural network was only trained on images of the femoral artery and vein, it is able to generalize to other vessels such as the carotid artery.