1 Introduction

Accurate segmentation of liver tumors is essential to the success of liver cancer surgery. CT imaging is a common way for doctors to diagnose liver cancer. Compared with other medical imaging technologies, CT images offer clear imaging and a high signal-to-noise ratio, playing an important role in the diagnosis and treatment of liver diseases. In clinical practice, doctors are required to manually segment liver tumors on CT images. This segmentation process requires a lot of time and effort, and the results are strongly affected by human subjectivity. To alleviate this issue, researchers have developed computer-aided methods for liver tumor segmentation. These methods can be divided into three categories: traditional methods [1,2,3,4], machine learning-based methods [5, 6], and deep learning-based methods [7,8,9,10,11].

Traditional segmentation methods mainly include thresholding [1], region growing [2], and level sets and active contours [3, 4]. These methods rely on manually extracted features, which leads to inaccurate segmentation, especially when the target in a medical image lacks an obvious contour. Machine learning-based segmentation usually consists of feature extraction followed by classification or regression. Massoptier et al. [5] first segmented the liver in the CT image with an active contour method and then segmented the tumors on the liver using K-means clustering. Shi et al. [6] used an AdaBoost-based algorithm to realize automatic segmentation of liver tumors. Machine learning-based methods require manually specifying the features to be extracted, and the chosen features can heavily affect the segmentation results; they are thus still influenced by subjective experience and prior knowledge, and their segmentation efficiency needs to be improved.

Deep learning methods are widely used in medical image processing. For example, Guo et al. [7] proposed a liver tumor segmentation model that combines AlexNet [8] with the FCN structure. Ronneberger et al. [9] proposed U-Net for medical image segmentation. Christ et al. [10] proposed a method of segmenting liver tumors with a cascaded fully convolutional network. He et al. [11] proposed ResNet, which introduces an identity mapping structure so that the input information can be passed directly to the next layer, forming a residual network. With the advent of ResNet, more and more deep convolutional networks have been applied in medical image processing.

After examining existing liver tumor segmentation methods based on convolutional neural networks, we find that they still have two limitations. First, shallow convolutional neural networks have limited feature extraction ability, while naively deepening them runs into the network degradation problem. Second, the output of high-level convolutional layers tends to lose detailed information as the number of layers increases and pooling is applied, so the feature map recovered by upsampling is rough, which affects the accuracy of liver tumor segmentation. To alleviate these issues, we propose an improved liver tumor segmentation method based on a deep fully convolutional network (DFCN). Experimental results show that the DFCN model has better feature expression capability and generalization performance, which improves segmentation accuracy.

2 Method

To achieve accurate segmentation of liver tumors, we propose an improved segmentation method. First, the method overcomes the network degradation problem and refines the rough segmentation results of a plain fully convolutional network; a balanced loss function is then introduced to train the network. Finally, a fully connected conditional random field (FC-CRF) is used to optimize the liver tumor segmentation results of the DFCN.

2.1 DFCN Segmentation Model

ResNet with the fully connected layer removed is used as the backbone of the DFCN; this backbone is formed by stacking residual units and has 24 layers in total. Each residual unit consists of two convolutional layers with BN layers. The backbone is divided into five convolutional stages, with pooling layers as the demarcation points, so the feature maps generated at the five stages have different scales: from shallow to deep, the original image size and 1/2, 1/4, 1/8, and 1/16 of the original image size. A side output layer is connected to the end of each convolutional stage and supervises the feature map generated by that stage. Each side output layer consists of a convolutional layer with a 3 × 3 kernel and 16 output channels, followed by a deconvolution layer that upsamples the feature map back to the original size. The feature maps carrying different scale information generated by the side output layers are stacked and fed into the fusion layer, which linearly fuses the multi-scale features through a convolutional layer with a 1 × 1 kernel. Finally, the fused result is sent to the classifier as the output of the DFCN; a structural sketch of these components follows.
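As a structural illustration, the following is a minimal sketch of one residual unit, one side output layer, and the fusion layer using the TensorFlow Keras API (the paper used an earlier TensorFlow version). The filter counts, the 1 × 1 shortcut projection, the deconvolution kernel size, and the sigmoid classifier are illustrative assumptions, not the paper's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_unit(x, filters):
    """One residual unit: two 3x3 convolutions with BN, plus an identity shortcut."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    if shortcut.shape[-1] != filters:  # project the shortcut if channels differ (assumption)
        shortcut = layers.Conv2D(filters, 1, padding='same')(shortcut)
    return layers.ReLU()(layers.Add()([y, shortcut]))

def side_output(x, upsample_factor):
    """Side output layer: a 3x3 convolution with 16 channels, then a
    deconvolution that upsamples the feature map to the original image size."""
    y = layers.Conv2D(16, 3, padding='same')(x)
    if upsample_factor > 1:
        y = layers.Conv2DTranspose(16, 2 * upsample_factor,
                                   strides=upsample_factor, padding='same')(y)
    return y

def fusion(side_outputs):
    """Fusion layer: stack the side outputs and fuse them linearly with a 1x1
    convolution; the sigmoid stands in for the final classifier (assumption)."""
    y = layers.Concatenate()(side_outputs)
    return layers.Conv2D(1, 1, activation='sigmoid')(y)
```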

When using the DFCN for liver tumor segmentation on CT images, we find that the receptive field of the first convolutional stage is small and easily picks up local image noise, which harms tumor segmentation, so we only use the feature maps of the last four convolutional stages. Inspired by the cascaded fully convolutional network proposed by Christ et al. [10], we design a cascaded DFCN. As shown in Fig. 1, two DFCNs with the same structure are trained to segment the liver and the tumor, respectively. The first DFCN segments the liver from abdominal CT slices, the liver ROI is then cropped from the original image according to the liver segmentation result, and the second DFCN segments liver tumors from the liver ROI; a sketch of this cascade follows Fig. 1.

Fig. 1. Cascaded liver tumor segmentation network.
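Below is a minimal sketch of the cascaded inference under these assumptions: each trained DFCN exposes a Keras-style predict that returns a per-pixel probability map, and the liver ROI is taken as the bounding box of the predicted liver mask; the 0.5 thresholds are illustrative.

```python
import numpy as np

def cascade_segment(ct_slice, liver_net, tumor_net):
    """Cascaded DFCN inference (sketch): liver first, then tumors in the ROI."""
    liver_prob = liver_net.predict(ct_slice[None, ..., None])[0, ..., 0]
    liver_mask = liver_prob > 0.5
    ys, xs = np.where(liver_mask)
    if ys.size == 0:                           # no liver found on this slice
        return np.zeros_like(liver_mask)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    roi = ct_slice[y0:y1, x0:x1]               # crop the liver ROI
    tumor_prob = tumor_net.predict(roi[None, ..., None])[0, ..., 0]
    tumor_mask = np.zeros_like(liver_mask)
    tumor_mask[y0:y1, x0:x1] = (tumor_prob > 0.5) & liver_mask[y0:y1, x0:x1]
    return tumor_mask
```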

2.2 Training of DFCN Network

This paper introduces the cost-sensitive loss objective function [12] when computing the network loss. The loss generated by all the side output layers of the DFCN is:

$$ L_{side}(W, w) = \sum_{m=1}^{M} \alpha_{m}\, l_{side}^{(m)}\big(W, w^{(m)}\big). $$
(1)

Because the numbers of positive and negative sample pixels are imbalanced, this paper introduces a balance parameter β following the cost-sensitive method. For the m-th side output layer, the loss function is:

$$ \begin{aligned} l_{side}^{(m)}\big(W, w^{(m)}\big) = & -\beta \sum_{j \in Y_{+}} \log \Pr\big(y_{j} = 1 \mid X; W, w^{(m)}\big) \\ & -(1-\beta) \sum_{j \in Y_{-}} \log \Pr\big(y_{j} = 0 \mid X; W, w^{(m)}\big), \end{aligned} $$
(2)

where \( \beta = |Y_{-}|/|Y| \) and \( 1-\beta = |Y_{+}|/|Y| \); \( Y_{+} \) and \( Y_{-} \) denote the sets of positive and negative sample pixels, respectively. The loss of the whole network consists of two parts: the loss \( L_{side}(W, w) \) generated by all side output layers and the loss \( L_{fuse}(W, w, h) \) generated when the fusion layer predicts the final segmentation result, where \( h \) denotes the weight parameters of the fusion layer.
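A minimal sketch of the per-side-output balanced loss of Eq. (2); summing these over the M side outputs with weights α_m gives Eq. (1). The epsilon clip and the per-batch computation of β are our assumptions.

```python
import tensorflow as tf

def balanced_side_loss(y_true, y_pred, eps=1e-7):
    """Class-balanced cross-entropy of Eq. (2): beta = |Y-|/|Y| weights the
    scarce positive (tumor) pixels, 1 - beta the negative ones."""
    y_true = tf.cast(y_true, tf.float32)
    n_total = tf.cast(tf.size(y_true), tf.float32)
    n_pos = tf.reduce_sum(y_true)
    beta = (n_total - n_pos) / n_total              # |Y-| / |Y|
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    return -tf.reduce_sum(beta * y_true * tf.math.log(y_pred)
                          + (1.0 - beta) * (1.0 - y_true) * tf.math.log(1.0 - y_pred))
```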

A stochastic gradient descent algorithm with momentum is used to optimize the network loss. During training, the learning rate is set to \( 10^{-7} \), the momentum is 0.9, and, to prevent overfitting, the regularization coefficient is set to 0.0002; the network is trained for a total of 50,000 iterations. To visualize the training process, we record the Loss produced when the network segments tumors every 100 iterations and plot it as a line chart.
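In a current TensorFlow Keras API this training configuration might look as follows (a sketch; the paper used an earlier TensorFlow version, and attaching the regularizer per layer is our assumption):

```python
import tensorflow as tf

# SGD with momentum, matching the stated hyper-parameters.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-7, momentum=0.9)

# The 0.0002 regularization term, attached to each convolutional layer.
conv = tf.keras.layers.Conv2D(16, 3, padding='same',
                              kernel_regularizer=tf.keras.regularizers.l2(2e-4))
```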

The Loss curve is shown in Fig. 2(a). Not all CT slices in the training and validation sets contain tumors, and the Loss produced on tumor-free slices is small, so the curve fluctuates locally. As the number of iterations increases, however, the overall Loss gradually declines and eventually stabilizes in a low range. The Dice curve is shown in Fig. 2(b) (the Dice computation is sketched after Fig. 2). When recording the Dice similarity coefficient, CT slices that contain no tumor yield a coefficient of 0, and such items are discarded when plotting. As the number of iterations increases, the Dice similarity coefficient gradually increases and finally stabilizes at 70% ± 20% on the training set and 55% ± 20% on the validation set.

Fig. 2. Line charts during training: (a) Loss, (b) Dice.
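For reference, the Dice similarity coefficient tracked above can be computed from binary masks as in the following sketch:

```python
import numpy as np

def dice_coefficient(pred, label, eps=1e-8):
    """Dice = 2|A ∩ B| / (|A| + |B|); eps guards against empty masks."""
    pred, label = pred.astype(bool), label.astype(bool)
    intersection = np.logical_and(pred, label).sum()
    return 2.0 * intersection / (pred.sum() + label.sum() + eps)
```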

2.3 FC-CRF Optimization Process

The DFCN improves the roughness of the segmentation results. However, it does not fully consider the relationships between pixels and lacks prior constraints from context information, so its segmentation results lack spatial consistency. To resolve this issue, we use a fully connected conditional random field (FC-CRF) [13] to further optimize the segmentation results.

The energy function \( E\left( x \right) \) in the fully connected conditional random field is:

$$ E(x) = \sum_{i} \varphi_{u}(x_{i}) + \sum_{i \ne j} \varphi_{p}(x_{i}, x_{j}), $$
(3)

where \( \varphi_{u}(x_{i}) \) is the unary potential, derived from the probability that the i-th pixel belongs to the category label \( x_{i} \), and \( \varphi_{p}(x_{i}, x_{j}) \) is the pairwise potential, which models the probability that pixels \( i \) and \( j \) take the labels \( x_{i} \) and \( x_{j} \) simultaneously. The pairwise potential considers the interaction between pixels and exploits spatial context information. Its expression is:

$$ \varphi_{p}(x_{i}, x_{j}) = \mu(x_{i}, x_{j})\left( w^{(1)} \exp\left( -\frac{\| p_{i} - p_{j} \|^{2}}{2\sigma_{\alpha}^{2}} - \frac{\| I_{i} - I_{j} \|^{2}}{2\sigma_{\beta}^{2}} \right) + w^{(2)} \exp\left( -\frac{\| p_{i} - p_{j} \|^{2}}{2\sigma_{\gamma}^{2}} \right) \right), $$
(4)

where \( \mu(x_{i}, x_{j}) = [x_{i} \ne x_{j}] \) is the label compatibility function: when neighboring pixels are assigned different category labels, \( \mu(x_{i}, x_{j}) \) acts as a penalty term, so similar pixels tend to be classified into the same category. The parameters \( \sigma_{\alpha} \), \( \sigma_{\beta} \), and \( \sigma_{\gamma} \) control the scales of the Gaussian kernels, \( p_{i} \) denotes the position of pixel \( i \), and \( I_{i} \) its intensity.

Solving the FC-CRF can be cast as minimizing the energy function; the mean-field approximation algorithm proposed by Krähenbühl et al. [13] is used to reduce the computational complexity. First, the pre-processed abdominal CT image is fed into the DFCN, which predicts the probability of each pixel being a tumor and outputs a probability map; an FC-CRF is then attached to optimize the DFCN segmentation result. The input of the FC-CRF has two parts: the probability map, which provides the unary potentials, and the pre-processed CT image, whose intensity and spatial position information between pixels provides the pairwise potentials. Finally, the mean-field approximation algorithm iterates until the energy function is minimized, and the liver tumor segmentation result is output, as illustrated in the sketch below.
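A minimal sketch of this optimization step using the pydensecrf package mentioned in Sect. 3.3; the kernel widths and compatibility weights are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def refine_with_crf(prob_map, image, n_iters=5):
    """prob_map: (H, W) tumor probability from the DFCN;
    image: (H, W, 3) pre-processed CT slice as uint8."""
    h, w = prob_map.shape
    # Stack background/tumor probabilities into a (2, H, W) softmax volume.
    softmax = np.stack([1.0 - prob_map, prob_map]).astype(np.float32)
    crf = dcrf.DenseCRF2D(w, h, 2)
    crf.setUnaryEnergy(unary_from_softmax(softmax))   # unary potentials
    # Smoothness kernel: nearby pixels prefer the same label.
    crf.addPairwiseGaussian(sxy=3, compat=3)
    # Appearance kernel: nearby, similar-intensity pixels prefer the same label.
    crf.addPairwiseBilateral(sxy=60, srgb=10,
                             rgbim=np.ascontiguousarray(image), compat=5)
    q = crf.inference(n_iters)                        # mean-field iterations
    return np.argmax(q, axis=0).reshape(h, w)         # refined label map
```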

3 Experimental Results

3.1 Data Preprocessing

The experiments use the data set officially provided by the Liver Tumor Segmentation Challenge (LiTS) [14]. Since the LiTS organizers do not disclose the liver and tumor labels of 70 of the patients, the data of the remaining 130 patients are used: 100 patients for network training, 10 for validation, and 20 for testing the trained network. Abdominal CT images need to be pre-processed before segmentation.

The pre-processing mainly includes windowing [15], data augmentation, and normalization. Sahi et al. [16] reported a liver window of [−62, 238] HU. To enhance the contrast of the liver in abdominal CT images, the window is set to [−150, 250]; since liver lesions have lower density than normal liver tissue, the lower bound is set to −150 to ensure that the lesions are not clipped away. Because CT image data are scarce, data augmentation is used to enlarge the data set. To make different data features comparable, min-max normalization is applied: the minimum value \( X_{min} \) and maximum value \( X_{max} \) of the image's pixel matrix are found, and the data are normalized with Eq. (5), in which the coefficient \( f \) controls the normalization range: \( f = 1 \) normalizes to [0, 1] and \( f = 255 \) normalizes to [0, 255]. A sketch of this pre-processing follows Eq. (5).

$$ X_{norm} = f \cdot \frac{X - X_{min}}{X_{max} - X_{min}}. $$
(5)
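A minimal sketch of the windowing and min-max normalization, assuming the input slice is given in Hounsfield units; the epsilon guard against constant slices is our addition:

```python
import numpy as np

def preprocess(ct_hu, window=(-150.0, 250.0), f=255.0):
    """Apply the liver window, then the min-max normalization of Eq. (5)."""
    x = np.clip(ct_hu, window[0], window[1])    # windowing to [-150, 250]
    x_min, x_max = x.min(), x.max()
    return f * (x - x_min) / (x_max - x_min + 1e-8)
```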

3.2 Segmentation Results

To verify the superiority of the proposed liver tumor segmentation method over the counterparts FCN [17] and DRIU [18], 30 images from the test set are selected for comparison. Figure 3 compares the doctor's annotations with the results of DFCN, FCN, and DRIU. The segmentation results of FCN are relatively rough: the tumor contour differs considerably from the doctor-annotated label, and FCN fails to segment tumors that are small or have uneven grayscale. DRIU is more accurate than FCN and its results are closer to the label map, but it also fails on small tumors with uneven grayscale. DFCN is more accurate than DRIU and its results are closest to the doctor-annotated tumor labels; it can also segment small tumors with uneven grayscale. The experimental environment is Ubuntu 16.04 + Python 2.7 + TensorFlow, running on a Dell computer with a TITAN X GPU.

Fig. 3. Qualitative comparison of segmentation results obtained by different methods: (a) original CT image, (b) tumor label, (c) FCN [17], (d) DRIU [18], (e) DFCN.

Table 1 lists the quantitative comparison of the segmentation results. DFCN outperforms the other two deep learning methods on the Dice similarity coefficient, Recall, Precision, and F-measure [19]. Both FCN and DRIU are relatively shallow networks and cannot learn the deep semantic features of liver tumors in CT images.

Table 1. Quantitative comparison on liver tumor segmentation.

3.3 Optimization Results

To verify the optimization effect of the FC-CRF on the DFCN tumor segmentation results, 100 CT images are selected from the test set for comparative experiments. Part of the tumor segmentation results is shown in Fig. 4: from left to right, the abdominal CT image, the tumor label, the DFCN segmentation result, and the FC-CRF-optimized result. The FC-CRF-optimized results recover more detail than the unoptimized segmentation and are closer to the doctor-annotated labels. The experiment is conducted on Ubuntu, where the FC-CRF is implemented with Python's pydensecrf package, whose densecrf interface solves the FC-CRF using the mean-field approximation algorithm.

Fig. 4. FC-CRF optimization results: (a) original CT image, (b) tumor label, (c) DFCN, (d) DFCN + FC-CRF [13].

Table 2 shows that after the DFCN segmentation results are optimized by the FC-CRF, all four indicators improve and the results are closer to the tumor labels. The FC-CRF considers not only the predicted probability of each pixel but also the correlations between the gray values and positions of all pixels in the CT image, which adds context constraints and thereby improves the detail and spatial consistency of the liver tumor segmentation results.

Table 2. Segmentation accuracy comparison of DFCN without and with FC-CRF optimization.

4 Conclusion

In this paper, we propose a liver tumor segmentation method based on a deep fully convolutional network. The method uses a cascaded network to segment liver tumors and a fully connected conditional random field to further optimize the segmentation results. We evaluate the method qualitatively and quantitatively on clinical data containing 30 sets of CT images. Experimental results show that the proposed method improves the accuracy of liver tumor segmentation. However, it does not exploit the 3D spatial information of liver tumors; in the future, we will develop 3D convolutional networks to address this.