1 Introduction

Segmenting the pancreas in 3D radiological scans (e.g. an MRI volume) could provide significant insight into the severity or progression of type 2 diabetes [1] and ductal adenocarcinoma [2]. However, pancreas segmentation presents several challenges due to high structural and inter-patient variability in size and location. The greyscale intensity of the pancreas can be very similar to that of neighbouring tissue, and the boundary contrast can vary depending on the level of surrounding visceral fat. Unlike computed tomography (CT), the lower resolution and slower imaging speed of MRI produce edge-based artefacts that blur the boundaries between the pancreas and surrounding organs [3]. In the existing research literature, pancreas segmentation has been driven by two major methodologies: multi-atlas-based approaches [4, 5] coupled with statistical shape modelling [6], and, in more recent years, convolutional neural networks (CNNs) or deep learning [3, 7, 8]. While CNNs have achieved higher quantitative accuracy scores in 2D medical image segmentation, such methods can exhibit discontinuities in the predicted pancreatic regions between consecutive slices of an input volume.

This paper presents a novel approach for automatic pancreas segmentation in MRI. As illustrated in Fig. 1, the proposed method consists of two successive stages. First, a CNN specialising in blurred boundary detection is trained to predict pancreas tissue pixel-wise. This deep learning stage first identifies the main pancreas region of interest (ROI) in a dataset of MRI volumes [8] by training a random forest on texture and probability-based features extracted from image patches of \(25 \times 25\) pixels. Next, inspired by the encoder-decoder architecture of SegNet [9], a new model termed the Hausdorff Sine SegNet (HSSN) is developed using the ROI data. A novel loss function incorporates the modified Hausdorff distance metric and a sinusoidal component to capture local boundary information, enforce edge detection and thus raise the true pancreas prediction rate on a 2D (slice-by-slice) basis. The testing stage consists of two phases. First, the output of the trained HSSN for a given test MRI volume encodes spatial information to classify every pixel in each slice, thus forming a volumetric binary mask (VBM). The second phase generates dense contouring by further tackling the low dissimilarity between organ boundaries: a digital contrast enhancement model is utilised to improve the greyscale variation between the surrounding background classes in close proximity to the pancreas. A 3D energy-minimising algorithm performs refined segmentation on the enhanced pancreas fused with the VBM, producing greater consistency in spatial smoothness and prediction among successive slices.

The proposed method, evaluated on two MRI datasets with varying noise levels, outperforms state-of-the-art approaches [8, 10,11,12] and, moreover, surpasses the performance of commonly employed deep learning loss functions. Although this approach has been tested on pancreas segmentation, the methodology is reproducible, scalable and generalisable to other organ segmentation tasks. The implementation is available at https://github.com/med-seg/p.

Fig. 1. Overview of proposed approach: (1) develop the HSSN deep learning model using training MRI; and (2) apply the test MRI to generate the segmented pancreas volume.

Fig. 2. Overview of the HSSN model. An encoder stage (five HSSN-E blocks) downsamples the MRI input through convolution, batch normalisation and ReLU. A decoder stage (five HSSN-D blocks) upsamples its input using the pooling indices transferred from its corresponding encoder to generate sparse feature maps. Convolution is then performed with a trainable filter of weights to densify each feature map. The resulting decoder output feature maps are fed to a soft-max classifier for two-channel pixel-wise classification of the input image as “pancreas” or “non-pancreas”.

2 Methodology

2.1 Training the HSSN

The proposed HSSN model has an encoder-decoder topology, as illustrated in Fig. 2. The decoder network uses max-pooling indices to upsample low-resolution feature maps, consequently retaining high-frequency details to improve pancreatic boundary delineation and reducing the total number of trainable parameters in the decoders. Unlike other models that are fine-tuned from CNNs pre-trained on a large number of natural images [3, 13], this network is trained from scratch using exclusively pancreas datasets. Since this organ accounts for \(\sim \)1% of a scan, the loss must be weighted differently based on the true class: median frequency balancing [14] is utilised, whereby the weight assigned to a class in the loss function is the median of the class frequencies computed over the entire training set divided by the frequency of that class. The HSSN also employs data augmentation of random reflections and translations to reduce overfitting [15] and to further address problems caused by high shape variability.
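The class weights can be computed directly from the training labels. The following is a minimal NumPy sketch of median frequency balancing under its standard definition (the frequency of a class is its pixel count divided by the total pixels of the volumes containing that class); the function and variable names are illustrative and not taken from the released implementation.

```python
import numpy as np

def median_frequency_weights(label_volumes, num_classes=2):
    """Median frequency balancing: weight_c = median(freq) / freq_c.

    `label_volumes` is assumed to be a list of integer ground-truth arrays
    (0 = non-pancreas, 1 = pancreas); names are illustrative only.
    """
    pixel_counts = np.zeros(num_classes)   # pixels belonging to each class
    total_counts = np.zeros(num_classes)   # pixels of volumes containing each class
    for labels in label_volumes:
        for c in range(num_classes):
            n_c = np.sum(labels == c)
            if n_c > 0:
                pixel_counts[c] += n_c
                total_counts[c] += labels.size
    freqs = pixel_counts / total_counts    # per-class frequency over the training set
    return np.median(freqs) / freqs        # rare classes (pancreas) receive larger weights
```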

Integrated Hausdorff-Sine Loss Function: A novel loss function is proposed for training the segmentation network. Optimising the modified Hausdorff distance together with a sinusoidal component reduces the boundary matching error and “enhances” the resulting pixel-wise pancreas prediction. Let \(T_{H}\) and \(Y_{H}\) represent the ground-truth and network boundary predictions respectively, where \(T_{H},{\textstyle Y_{H}}\subset \mathbb {R}^{n}\) such that \(\left| T_{H}\right| ,\left| Y_{H}\right| <\infty \). Furthermore, \(t_{j}\) and \(y_{j}\) are indexed boundary points in \(T_{H}\) and \(Y_{H}\) respectively. The Euclidean distance between a point \(t_{j}\) and the set of points \(Y_{H}\) is \(s(t_{j},Y_{H}) = \min \limits _{{\scriptstyle y_{j}\in Y_{H}}} \left\| t_{j}-y_{j}\right\| \), and the distance between a point \(y_{j}\) and the set of points \(T_{H}\) is \(s(y_{j},T_{H}) = \min \limits _{{\scriptstyle t_{j}\in T_{H}}}\left\| y_{j}-t_{j}\right\| \). If \(\varepsilon _{Y}=\frac{1}{\left| Y_{H}\right| }\sum \limits _{{\scriptstyle y_{j}\in Y_{H}}}s(y_{j},T_{H})\) and \(\varepsilon _{T}=\frac{1}{\left| T_{H}\right| }\sum \limits _{{\scriptstyle t_{j}\in T_{H}}}s(t_{j},Y_{H})\), the modified Hausdorff distance loss \(L_{mhd}\) is:

$$\begin{aligned} L_{mhd}=\max \left\{ \varepsilon _{Y},\,\varepsilon _{T} \right\} \end{aligned}$$
(1)

Thus, computing the gradient yields:

$$\begin{aligned} \frac{\partial L_{mhd}}{\partial Y_{H}}={\left\{ \begin{array}{ll} \frac{\partial }{\partial Y_{H}}\left( \varepsilon _{Y}\right) &{} \text {if }\varepsilon _{Y}>\varepsilon _{T}\\ \frac{\partial }{\partial Y_{H}}\left( \varepsilon _{T}\right) &{} \text {if }\varepsilon _{T}>\varepsilon _{Y}\\ \text {undefined} &{} \text {if }\varepsilon _{Y}=\varepsilon _{T} \end{array}\right. } \end{aligned}$$
(2)
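As an illustration of Eq. (1), the following sketch computes \(L_{mhd}\) from two sets of boundary pixel coordinates using SciPy; representing each boundary as an \(N \times 2\) coordinate array is an assumption of this sketch, which is not the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def modified_hausdorff_loss(t_pts, y_pts):
    """Eq. (1): the maximum of the two directed average boundary distances.

    `t_pts` and `y_pts` are assumed to be (N, 2) arrays of boundary pixel
    coordinates taken from the ground-truth and the network prediction.
    """
    d = cdist(t_pts, y_pts)           # pairwise Euclidean distances, shape (|T_H|, |Y_H|)
    eps_t = d.min(axis=1).mean()      # average over t_j of s(t_j, Y_H)
    eps_y = d.min(axis=0).mean()      # average over y_j of s(y_j, T_H)
    return max(eps_y, eps_t)
```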

An additional sinusoidal component increases non-linearity during network training and, as evaluated empirically, raises the rate of true positive predictions. If T and Y represent the ground-truth and network predictions, the loss \(L_{sine}\) is defined as:

$$\begin{aligned} L_{sine}=-\frac{1}{\left| Y\right| }\sum _{i=1}^{nC}\sin (T_{i})\log _{2}(Y_{i}) \end{aligned}$$
(3)

where \(nC=2\) is the number of classes (e.g., \(Y_{1}\) refers to “pancreas” and \(Y_{2}\) refers to “non-pancreas”). From here, computing the gradient yields:

$$\begin{aligned} \frac{\partial L_{sine}}{\partial Y_{i}}=-\frac{1}{\left| Y\right| }\frac{\sin (T_{i})}{Y_{i}\ln (2)} \end{aligned}$$
(4)

The model is updated via the combined gradients of \(L_{sine}\) and \(L_{mhd}\).
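A minimal NumPy sketch of Eqs. (3) and (4) is given below; it assumes \(T\) and \(Y\) hold the one-hot ground truth and soft-max probabilities for a slice, and the simple summation of the two gradients indicated at the end is an assumption about how the combination could be realised.

```python
import numpy as np

def sine_loss_and_grad(T, Y, eps=1e-7):
    """Eqs. (3)-(4): L_sine = -(1/|Y|) * sum_i sin(T_i) * log2(Y_i).

    `T` and `Y` are assumed to be arrays of shape (num_classes, H, W)
    holding the one-hot ground truth and soft-max probabilities.
    """
    Y = np.clip(Y, eps, 1.0)                       # guard against log2(0)
    n = Y.size                                     # |Y|
    loss = -np.sum(np.sin(T) * np.log2(Y)) / n
    grad = -np.sin(T) / (Y * np.log(2)) / n        # Eq. (4): dL_sine / dY_i
    return loss, grad

# The network update then combines this gradient with that of L_mhd,
# e.g. total_grad = grad_sine + grad_mhd (the combination rule is assumed here).
```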

2.2 Testing Stage

(A) Targeted Pancreas Binary Mask: The trained HSSN model performs pixel-wise prediction on each slice in a test MRI volume to generate a resulting volumetric binary mask (VBM). Columns (a) and (b) in Fig. 3 display three sample input slices from three different image volumes, and the corresponding positive pancreas region (white mask) as predicted by the HSSN model. The red contouring in each image in column (b) is the ground-truth.
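The slice-by-slice inference and stacking could look as follows; this is a sketch assuming a PyTorch model that outputs two-channel class scores per pixel, with `hssn` and `volume` as hypothetical names.

```python
import torch

def predict_vbm(hssn, volume):
    """Stack slice-wise pixel predictions into a volumetric binary mask (VBM).

    `hssn` is assumed to be a trained two-class segmentation network and
    `volume` a tensor of shape (num_slices, H, W); both names are hypothetical.
    """
    hssn.eval()
    masks = []
    with torch.no_grad():
        for s in volume:                                 # 2D (slice-by-slice) inference
            scores = hssn(s.unsqueeze(0).unsqueeze(0))   # (1, 2, H, W) class scores
            masks.append(scores.argmax(dim=1)[0])        # 1 = pancreas, 0 = non-pancreas
    return torch.stack(masks)                            # (num_slices, H, W) binary mask
```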

Fig. 3. Visualising the proposed approach.

(B) Achieve Dense Contouring: The test MRI volume undergoes non-local means denoising, after which a learned intensity model incorporates a sigmoid function to exhaustively differentiate pancreatic tissue from background classes. Every slice \(s_{i}\) is transformed as \(C(s_{i})=1/(1+\exp \left[ g(c-s_{i})\right] )\), where g controls the contrast gain and c is the cut-off value, i.e. the (normalised) greyscale value about which the contrast is varied [12, 16]. The VBM is applied to the enhanced image volume and processed through a 3D unsupervised energy-minimisation method via continuous max-flow [17], revealing detailed contouring as highlighted in Fig. 3, column (c). The accurate HSSN predictions reduce the level of non-pancreatic tissue carried into the max-flow segmentation stage, as shown in Fig. 3, column (d), eliminating the need for post-processing.
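The sigmoid contrast transform is straightforward to reproduce; the sketch below assumes slice intensities normalised to [0, 1], and the default values of g and c are illustrative only.

```python
import numpy as np

def sigmoid_contrast(slice_norm, g=10.0, c=0.5):
    """C(s_i) = 1 / (1 + exp(g * (c - s_i))) applied to a [0, 1]-normalised slice.

    `g` (contrast gain) and `c` (greyscale cut-off) are illustrative defaults,
    not the values used in the paper.
    """
    return 1.0 / (1.0 + np.exp(g * (c - slice_norm)))
```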

3 Experimental Results and Analysis

3.1 Datasets and Evaluation

Two pancreas datasets with expert-led annotations are utilised. MRI-A and MRI-B contain 180 and 120 abdominal MRI scans (T2-weighted, fat suppressed), obtained using a Philips Intera 1.5T and a Siemens Trio 3T scanner, respectively. Every MRI-A scan has 50 slices of size \(384 \times 384\), with 2 mm slice spacing and a 0.9766 mm pixel interval in the axial and sagittal directions. Every MRI-B scan has 80 slices of size \(320 \times 260\), with 1.6 mm slice spacing and a 1.1875 mm pixel interval in the axial and sagittal directions. The proposed approach is evaluated using the Dice Similarity Coefficient (DSC), precision (PC), recall (RC) and the Hausdorff distance (HSD), which represents the maximum boundary deviation between the segmentation and the ground-truth.
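For reference, these metrics can be computed from binary masks as in the sketch below; the Hausdorff distance is evaluated on boundary (surface) point coordinates, which are assumed here to be supplied already scaled to millimetres.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dsc_pc_rc(seg, gt):
    """Dice, precision and recall for non-empty binary masks `seg` and `gt`."""
    tp = np.logical_and(seg, gt).sum()
    dsc = 2.0 * tp / (seg.sum() + gt.sum())
    pc = tp / seg.sum()
    rc = tp / gt.sum()
    return dsc, pc, rc

def hausdorff_distance(seg_pts, gt_pts):
    """Symmetric Hausdorff distance between two (N, 3) surface point sets,
    assumed to be given in millimetres."""
    return max(directed_hausdorff(seg_pts, gt_pts)[0],
               directed_hausdorff(gt_pts, seg_pts)[0])
```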

Fig. 4. Segmentation results in six different MRI scans (volumes). Every column corresponds to a single MRI volume. From left, the first row displays sample MRI axial slices with the segmentation outcome (green) against the ground-truth (red), and the computed DSC. The second row displays a 3D reconstruction of the entire pancreas with the computed DSC. (Color figure online)

3.2 Network Implementation

The training and testing data are randomly split into 160 and 20 scans (MRI-A) and 100 and 20 scans (MRI-B). The HSSN model employs stochastic gradient descent with momentum (0.9), initial learning rate (0.001), maximum epochs (300) and mini-batch size (10). The mean time for model training is \(\sim \)11 h and the testing phase takes \(\sim \)7.5 min per MRI volume using an Intel Core i7-5930K CPU at 3.50 GHz. Future work can potentially reduce these run-times by a factor of 10 via multiple GeForce Titan X GPUs.
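A hedged PyTorch sketch of a training loop with the reported optimisation settings is shown below; the model, dataset and combined loss function are placeholders supplied by the caller, and only the hyperparameter values are taken from the text.

```python
import torch
from torch.utils.data import DataLoader

def train_hssn(model, train_set, loss_fn, epochs=300, lr=0.001,
               momentum=0.9, batch_size=10):
    """SGD training loop with the hyperparameters reported in Sect. 3.2.

    `model`, `train_set` and `loss_fn` (the combined Hausdorff-Sine loss)
    are placeholders supplied by the caller.
    """
    optimiser = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for slices, labels in loader:
            optimiser.zero_grad()
            loss = loss_fn(model(slices), labels)
            loss.backward()
            optimiser.step()
    return model
```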

Fig. 5. Box plots of DSC and JI.

3.3 Analysis of Proposed Approach

Figure 4 displays the final segmentation results in six MRI scans, equally split between MRI-A and MRI-B. Columns (a, b, c) are part of MRI-A, yet there is high variation in intensity and contrast among the original axial MRI slices. Columns (d, e, f) correspond to exemplars from MRI-B. As reflected in Fig. 5, 85% of MRI-A segmentations, compared to 95% of MRI-B segmentations, score above 80% in DSC, demonstrating the robust performance of the approach with respect to poor image quality, intensity distribution and spatial dimensions.

Fig. 6. DSC across the threshold range [0.05, 0.95] via multiple loss functions.

Fig. 7. Averaged ROC curves via multiple loss functions.

Hausdorff-Sine Loss: Figure 6 compares the segmentation results (in DSC) using Hausdorff-Sine against the Hausdorff, cross-entropy, Dice [18] and Jaccard [19] losses over the probability threshold range [0.05, 0.95]. The cross-entropy loss penalises true positive predictions, forcing the “optimum” probability threshold to approximately 0.5. Although the Dice loss minimises the distance between class distributions, squaring the weights in the backpropagation stage causes instability and a higher rate of false negative predictions. Similarly, the Jaccard loss suffers from a low rate of true positive predictions. As empirically tested, the Hausdorff loss minimises the maximum deviation between a prediction and the desired outcome; the addition of a sinusoidal component further increases non-linearity during training, and thus Hausdorff-Sine achieves improved true positive predictions across differing thresholds while delivering strong discrimination of true negatives. The ROC curves in Fig. 7 highlight the inferior performance of the other loss functions in this extremely unbalanced segmentation task, whereas Hausdorff-Sine generally improves the true positive accuracy.

Phase B of Testing Stage: Integrating the second phase (B) produces contextual boundary information that is essential for accurate segmentation in biomedical imaging. Figure 3, columns (b) and (e), visibly highlights the differences in segmentation boundary delineation against the ground-truth before and after this phase. The mean HSD metric confirms less deviation from the ground-truth by approximately 1 mm (see Tables 1 and 2), and furthermore, the mean DSC rises by approximately 4% in both MRI-A and MRI-B.

Table 1. Deep learning model performance using state-of-the-art loss functions versus the integrated novel Hausdorff and Hausdorff-Sine loss. Datasets MRI-A and MRI-B are evaluated in 9-fold and 6-fold cross-validation (FCV), respectively. DSC, PC, RC and HSD are presented as mean ± standard deviation.
Table 2. DSC, PC, RC and HSD as mean ± standard deviation [lowest, highest] for automatic segmentation methods. Datasets MRI-A and MRI-B are evaluated in 9-fold and 6-fold cross-validation (FCV), respectively.

Comparison with the State-of-the-Art: Table 2 shows that the proposed approach outperforms state-of-the-art methods [8, 10,11,12] in terms of accuracy and statistical stability, despite employing data acquired with protocols that are not optimised for this organ.

4 Conclusion

This paper presents a novel approach for automatic pancreas segmentation in MRI volumes generated from different scanner protocols. Combined with the proposed Hausdorff-Sine loss, an encoder-decoder network reinforces pancreatic boundary detection in MRI slices, achieving a higher rate of true positive predictions than multiple widely used loss functions. In the later stage, a 3D hybrid energy-minimisation algorithm addresses the intensity consistency problem that often arises when segmenting image volumes on a 2D basis. The proposed approach generates quantitative accuracy results that surpass reported state-of-the-art methods and, moreover, preserves detailed contouring.