1 Introduction

Pancreatic ductal adenocarcinoma (PDAC) is the fourth most common cause of cancer death, with an overall five-year survival rate of 8%. Currently, detection or segmentation at the localized disease stage followed by complete resection offers the best chance of survival, with a five-year survival rate of 32%. Accurate segmentation of the PDAC mass is also important for further quantitative analysis, e.g., survival prediction [1]. Computed tomography (CT) is the most commonly used imaging modality for the initial evaluation of PDAC. However, the textures of PDAC on CT are very subtle (Fig. 1) and can therefore be easily overlooked, even by experienced radiologists. To the best of our knowledge, the state of the art for this task is [17], which reports an average Dice of only 56.46%. For better detection of the PDAC mass, a dual-phase pancreas protocol using contrast-enhanced CT imaging, which comprises arterial and venous phases acquired at different delays after intravenous contrast injection, is recommended.

Fig. 1.

Visual comparison of arterial and venous images (after alignment), together with the manual segmentation of normal pancreatic tissues (yellow), the pancreatic duct (purple) and the PDAC mass (green). Orange arrows indicate ambiguous boundaries and differences in the abnormal appearance between the two phases. Best viewed in color. (Color figure online)

In recent years, deep learning has substantially advanced computer-aided diagnosis (CAD), especially biomedical image segmentation [4, 10, 11, 16]. However, applying existing segmentation algorithms to dual-phase images poses several challenges. First, these algorithms are optimized for segmenting a single type of input and therefore cannot be directly applied to multi-phase data. More importantly, properly handling the variations between different views requires an effective strategy for exchanging information between the phases. While the efficient integration of information from multiple modalities has been widely studied [3, 6, 15], learning from multi-phase information has rarely been explored, especially for tumor detection and segmentation.

To address these challenges, we propose a multi-phase segmentation algorithm, the Hyper-Pairing Network (HPN), to improve segmentation performance, especially for pancreatic abnormalities. Following HyperDenseNet [3], which is effective for multi-modal image segmentation, we construct a dual-path network for multi-phase data in which each path handles one phase. To enable information exchange between phases, we apply skip connections across the paths of the network [3], referred to as hyper-connections. Moreover, a standard segmentation loss (cross-entropy loss, Dice loss [8]) only minimizes the difference between the final prediction and the groundtruth, and thus cannot adequately handle the variance between views. We therefore introduce an additional pairing loss term that encourages commonality between the high-level features of both phases, for better incorporation of multi-phase information. HPN jointly segments three structures: the PDAC mass, normal pancreatic tissues, and the pancreatic duct, which serves as an important clue for localizing PDAC. Extensive experiments demonstrate that the proposed HPN significantly outperforms prior methods by a large margin on all three targets.

2 Methodology

We focus on dual-phase inputs, although our approach can be generalized to multi-phase scans. Given phase A and phase B aligned to it by deformable registration, we have the set \({\mathcal {S}} = \{\left( {\mathbf {X}}_{i}^\text {A}, {\mathbf {X}}_{i}^\text {B}, {\mathbf {Y}}_{i}\right) |i=1,...,M\}\), where \(\text {X}^\text {A}_i\in {{\mathbb {R}}^{W_i\times H_i\times L_i}}\) is the i-th 3D CT volume of phase A with dimension \(\left( W_i\times H_i\times L_i\right) = {\mathcal {D}}_{i}\), and \(\text {X}^\text {B}_i\in {{\mathbb {R}}}^{{\mathcal {D}}_{i}}\) is the corresponding aligned volume of phase B. \({\mathbf {Y}}_i = \{ y_{ij} | j=1,..., {\mathcal {D}}_{i}\}\) denotes the corresponding voxel-wise label map of the i-th volume, where \(y_{ij}\in {\mathcal {L}}\) is the label of the j-th voxel in the i-th image and \({\mathcal {L}}\) denotes the set of labels of the target structures. In this study, \({\mathcal {L}}\) = {normal pancreatic tissues, PDAC mass, pancreatic duct}. The goal is to learn a model \(\hat{{\mathbf {Y}}}={f({\mathbf {X}}^\text {A}, {\mathbf {X}}^\text {B})}\) that predicts the label of each voxel by utilizing multi-phase information.

Fig. 2.

(a) The single-path network, where only one phase is used. The dashed arrows denote skip connections between low-level and high-level features. (b) The HPN structure, where multiple phases are used. The black arrows between the two single-path networks indicate hyper-connections between the two streams. An additional pairing loss is employed to regularize view variations, which benefits the integration of the different phases. Blue and pink stand for the arterial and venous phases, respectively. (Color figure online)

2.1 Hyper-connections

Segmentation networks (e.g., UNet [2, 10], FCN [7]) usually consist of a contracting encoder followed by an expanding decoder that produces a full-resolution segmentation result, as illustrated in Fig. 2(a). As the network goes deeper, the output features evolve from low-level detailed representations to high-level abstract semantic representations. The encoder and the decoder share an equal number of resolution steps [2, 10].

However, this type of network can only handle single-phase data. We therefore construct a dual-path network in which each phase has its own branch with the U-shaped encoder-decoder architecture described above. The two branches are connected via hyper-connections, which enrich feature representations by learning more complex combinations of the two phases. Specifically, hyper-connections are applied between layers that output feature maps of the same resolution across the two paths, as illustrated in Fig. 2(b). Let \(\mathbf{R }_{1}, \mathbf{R }_{2},..., \mathbf{R }_{\text {T}}\) denote the intermediate feature maps of a general segmentation network, where \(\mathbf{R }_{t}\) (on the encoder path) and \(\mathbf{R }_{\text {T} - t}\) (on the decoder path) share the same resolution. Hyper-connections are applied as follows: \(\mathbf{R }^\text {A}_{t}\longrightarrow \mathbf{R }^\text {B}_{t }\), \(\mathbf{R }^\text {B}_{t}\longrightarrow \mathbf{R }^\text {A}_{t }\), \(\mathbf{R }^\text {A}_{t}\longrightarrow \mathbf{R }^\text {B}_{\text {T} - t}\), \(\mathbf{R }^\text {B}_{t}\longrightarrow \mathbf{R }^\text {A}_{\text {T} - t}\), \(\mathbf{R }^\text {A}_{\text {T} - t}\longrightarrow \mathbf{R }^\text {B}_{\text {T} - t}\), \(\mathbf{R }^\text {B}_{\text {T} - t}\longrightarrow \mathbf{R }^\text {A}_{\text {T} - t}\), while the original skip connections within each path, i.e., \(\mathbf{R }^\text {A}_{t}\longrightarrow \mathbf{R }^\text {A}_{\text {T} - t}\) and \(\mathbf{R }^\text {B}_{t}\longrightarrow \mathbf{R }^\text {B}_{\text {T} - t}\), are maintained.
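The connection pattern above can be made concrete with a small bookkeeping sketch. For one encoder level t it lists the six hyper-connections plus the two within-path skips, then collects the source maps routed into each target map. Feature maps are hypothetical `(phase, index)` tuples and the function names are ours. In an actual network each source would presumably be concatenated with the target's input along the channel axis, as in HyperDenseNet [3], and the mutual same-level connections would have to feed the following layer to avoid cycles; both points are assumptions, not details given in the text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A feature map is identified by (phase, index): ("A", 3) stands for R^A_3.
Node = Tuple[str, int]
Edge = Tuple[Node, Node]


def connections_at_level(T: int, t: int) -> List[Edge]:
    """Edges for encoder map R_t and its same-resolution decoder map R_{T-t}."""
    enc, dec = t, T - t
    hyper = [
        (("A", enc), ("B", enc)), (("B", enc), ("A", enc)),  # encoder <-> encoder
        (("A", enc), ("B", dec)), (("B", enc), ("A", dec)),  # encoder -> other decoder
        (("A", dec), ("B", dec)), (("B", dec), ("A", dec)),  # decoder <-> decoder
    ]
    # Original skip connections that stay within each path.
    skips = [(("A", enc), ("A", dec)), (("B", enc), ("B", dec))]
    return hyper + skips


def incoming_sources(T: int, levels: List[int]) -> Dict[Node, List[Node]]:
    """For each target map, list the source maps routed into it."""
    sources: Dict[Node, List[Node]] = defaultdict(list)
    for t in levels:
        for src, dst in connections_at_level(T, t):
            sources[dst].append(src)
    return dict(sources)


if __name__ == "__main__":
    # Hypothetical network with T = 10 maps; encoder levels 1..4 pair with 9..6.
    for dst, srcs in sorted(incoming_sources(10, [1, 2, 3, 4]).items()):
        print(dst, "<-", srcs)
```

For example, with T = 10 the decoder map R^B_8 receives R^A_2, R^A_8 and its own skip from R^B_2, which is the mix of cross-phase and within-phase context the hyper-connections are meant to provide.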

2.2 Pairing Loss

The standard loss for segmentation networks only minimizes the difference between the groundtruth and the final estimation, and therefore cannot adequately handle the variance between different views. Using this loss alone is suboptimal in our setting, because training involves heavy integration of both arterial and venous information. To this end, we apply an additional pairing loss, which encourages commonality between the two sets of high-level semantic representations, to reduce view divergence.

We instantiate this additional objective as a correlation loss [13]. Mathematically, for any pair of aligned images (\(\text {X}^\text {A}_i\), \(\text {X}^\text {B}_i\)) passing through the corresponding view sub-networks, the two sets of high-level semantic representations (feature responses in later layers) of the two phases are denoted as \(f_1(\text {X}^\text {A}_i; \varvec{\Theta }_1)\) and \(f_2(\text {X}^\text {B}_i; \varvec{\Theta }_2)\), where the two sub-networks are parameterized by \(\varvec{\Theta }_1\) and \(\varvec{\Theta }_2\), respectively. The outputs of the two branches are simultaneously fed to the final classification layer. To better integrate the outcomes of the two branches, we use a pairing loss that exploits the consensus of \(f_1(\text {X}_i^\text {A}; \varvec{\Theta }_1)\) and \(f_2(\text {X}_i^\text {B}; \varvec{\Theta }_2)\) during training. The loss is formulated as follows:

(1)

where N denotes the total number of voxels in the i-th sample and \(\varvec{\Theta }\) denotes the parameters of the entire network. During the training stage, we impose this additional loss to further encourage the commonality between the two intermediate outputs. The overall loss is the weighted sum of this additional penalty term and the standard voxel-wise cross-entropy loss:

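\[\mathcal {L}(\varvec{\Theta }) = -\frac{1}{N}\sum _{j=1}^{N}\sum _{k=1}^{K}\mathbbm {1}(y_{ij}=k)\log p^k_{ij} + \lambda \,\mathcal {L}_{\text {pair}}(\varvec{\Theta }) \qquad (2)\]

where \(\mathcal {L}_{\text {pair}}(\varvec{\Theta })\) denotes the pairing loss of Eq. (1) and \(\lambda \) its weight, \(p^k_{ij}\) denotes the probability of the j-th voxel being classified as label k in the i-th sample, and \(\mathbbm {1} (\cdot )\) is the indicator function. K is the total number of classes. The overall objective function is optimized via stochastic gradient descent.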
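Because the exact form of Eq. (1) is not reproduced in this text, the sketch below uses one plausible correlation-style pairing term, which is an assumption and not necessarily the authors' formula: one minus the mean, over channels, of the Pearson correlation across voxels between the two phases' high-level features, each flattened to shape (C, N). It is combined with the voxel-wise cross-entropy as a weighted sum, assuming the weight λ = 0.5 (Sect. 3.2) multiplies the pairing term. NumPy stands in for the PyTorch implementation, and all function names are hypothetical.

```python
import numpy as np


def pairing_loss(f1: np.ndarray, f2: np.ndarray, eps: float = 1e-8) -> float:
    """Hypothetical correlation-style pairing loss between two (C, N) feature arrays.

    Returns 1 - mean_c Pearson(f1[c], f2[c]); it is 0 when every channel of f1 is
    perfectly positively correlated (across the N voxels) with the same channel of f2.
    """
    if f1.shape != f2.shape:
        raise ValueError("feature maps must have the same shape")
    a = f1 - f1.mean(axis=1, keepdims=True)  # center each channel over voxels
    b = f2 - f2.mean(axis=1, keepdims=True)
    cov = (a * b).sum(axis=1)
    norm = np.sqrt((a * a).sum(axis=1) * (b * b).sum(axis=1)) + eps
    corr = cov / norm  # per-channel correlation in [-1, 1]
    return float(1.0 - corr.mean())


def voxel_cross_entropy(probs: np.ndarray, labels: np.ndarray, eps: float = 1e-12) -> float:
    """-(1/N) * sum_j sum_k 1(y_j = k) * log p_j^k, with probs of shape (K, N)."""
    n = labels.shape[0]
    # The indicator keeps, for each voxel j, only the probability of its true class.
    p_true = probs[labels, np.arange(n)]
    return float(-np.log(p_true + eps).mean())


def total_loss(probs: np.ndarray, labels: np.ndarray,
               f1: np.ndarray, f2: np.ndarray, lam: float = 0.5) -> float:
    """Weighted sum: voxel-wise cross-entropy plus lam times the pairing term."""
    return voxel_cross_entropy(probs, labels) + lam * pairing_loss(f1, f2)
```

In practice f1 and f2 would come from reshaping 3D feature maps of shape (C, D, H, W) with `reshape(C, -1)`; with uniform probabilities over K = 4 classes the cross-entropy equals log 4, and perfectly correlated features add no penalty.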

3 Experiments

3.1 Experiment Setup

Data Acquisition. This is an institutional review board-approved, HIPAA-compliant retrospective case-control study. 239 patients with pathologically proven PDAC were retrospectively identified from the radiology and pathology databases between 2012 and 2017, and cases with a tumor (PDAC mass) diameter of \(\le \)4 cm were selected for the experiment. Patients were scanned on a 64-slice multidetector CT scanner (Sensation 64, Siemens Healthineers) or a dual-source multidetector CT scanner (FLASH, Siemens Healthineers) and were injected with 100–120 mL of iohexol (Omnipaque, GE Healthcare) at an injection rate of 4–5 mL/s. Scan protocols were customized for each patient to minimize dose. Arterial phase imaging was performed with bolus triggering, usually 30 s post-injection, and venous phase imaging was performed at 60 s post-injection.

Evaluation. Let \({\mathcal {Y}}\) and \({\mathcal {Z}}\) denote the sets of foreground voxels in the ground truth and the prediction, i.e., \({{\mathcal {Y}}}={\left\{ i\mid y_i=1\right\} }\) and \({{\mathcal {Z}}}={\left\{ i\mid z_i=1\right\} }\). Segmentation accuracy is evaluated by the Dice-Sørensen coefficient (DSC): \({\mathrm {DSC}\,\left( {\mathcal {Y}},{\mathcal {Z}}\right) }= {\frac{2\,\times \,\left| {\mathcal {Y}}\,\cap \,{\mathcal {Z}}\right| }{\left| {\mathcal {Y}}\right| \,+\,\left| {\mathcal {Z}}\right| }}\). We evaluate the DSC for all three targets, i.e., abnormal pancreas, PDAC mass and pancreatic duct, where, throughout our experiments, abnormal pancreas denotes the union of normal pancreatic tissues, PDAC mass and pancreatic duct. All experiments are conducted with three-fold cross-validation, i.e., the models are trained on two folds and tested on the remaining one. We report the mean DSC over all cases together with the standard deviation.
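A minimal NumPy sketch of the DSC defined above, plus the label union that forms the "abnormal pancreas" target. The integer label codes (1 = normal pancreatic tissue, 2 = PDAC mass, 3 = pancreatic duct) and the convention for two empty masks are assumptions made for illustration.

```python
import numpy as np


def dice(y: np.ndarray, z: np.ndarray) -> float:
    """Dice-Sørensen coefficient between binary ground truth y and prediction z."""
    y = y.astype(bool)
    z = z.astype(bool)
    denom = y.sum() + z.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(y, z).sum() / denom)


def abnormal_pancreas_mask(label_map: np.ndarray, target_labels) -> np.ndarray:
    """Union of the given structure labels (hypothetical codes, e.g. 1, 2, 3)."""
    return np.isin(label_map, list(target_labels))
```

A missed tumor (non-empty ground truth, empty prediction) yields a DSC of 0, which is how the false-negative cases discussed later are counted.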

3.2 Implementation Details

Our experiments were performed on whole CT scans, and the implementation is based on PyTorch. We adopt a variant of diffeomorphic demons with direction-dependent regularization [9, 12] for accurate and efficient deformable registration between the two phases. For pre-processing, we clipped the raw intensity values to the range [−100, 240] HU and normalized each CT case to zero mean and unit variance. The input size of all networks is set to \(64\times 64\times 64\). The coefficient \(\lambda \) of the correlation loss is set to 0.5. No further post-processing strategies were applied.
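The pre-processing step can be sketched as follows, assuming NumPy arrays in HU. How the 64×64×64 inputs are sampled from whole scans is not specified in the text; the random paired crop below, which applies the same window to both aligned phases so they stay voxel-aligned, is a hypothetical choice.

```python
import numpy as np


def preprocess(volume_hu: np.ndarray, lo: float = -100.0, hi: float = 240.0) -> np.ndarray:
    """Clip to [lo, hi] HU, then normalize the case to zero mean and unit variance."""
    v = np.clip(volume_hu.astype(np.float32), lo, hi)
    std = v.std()
    return (v - v.mean()) / (std if std > 0 else 1.0)


def random_paired_patch(xa: np.ndarray, xb: np.ndarray, size: int = 64, rng=None):
    """Crop the same random size^3 window from both aligned phases (hypothetical sampler).

    Assumes every axis is at least `size` voxels long.
    """
    if xa.shape != xb.shape:
        raise ValueError("phases must be aligned to the same voxel grid")
    rng = np.random.default_rng() if rng is None else rng
    starts = [int(rng.integers(0, d - size + 1)) for d in xa.shape]
    window = tuple(slice(s, s + size) for s in starts)
    return xa[window], xb[window]
```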

We also use data augmentation during training. In addition to rotation and scaling, which are commonly used in single-phase segmentation [5, 17], we utilize virtual sets [14]. Even though arterial and venous phase scanning is customized for each patient, the level of enhancement can still vary across patients due to differences in blood circulation, which causes inter-subject enhancement variations in each phase. We therefore construct virtual examples by interpolating between venous and arterial data, similar to [14]. The i-th augmented training sample pair can be written as: \(\tilde{\text {X}}^\text {A}_i = \lambda \text {X}^\text {A}_i + (1 - \lambda ) \text {X}^\text {B}_i, \quad \tilde{\text {X}}^\text {B}_i = \lambda \text {X}^\text {B}_i + (1 - \lambda ) \text {X}^\text {A}_i,\) where \(\lambda \sim \text {Beta}(\alpha , \alpha ) \in [0, 1]\) (not to be confused with the loss coefficient \(\lambda \) above), and we set the hyper-parameter \(\alpha = 0.4\) following [14]. The final outcome of HPN is obtained by taking the union of the regions predicted by the models trained with the original paired sets and with the virtual paired sets.
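A sketch of the virtual-pair construction and the final union, assuming NumPy arrays for the two aligned phases. We assume the label map Y_i is reused unchanged for the virtual pair, since both phases are aligned to it, and that the union is taken per target structure on binary masks; the function names are hypothetical.

```python
import numpy as np


def virtual_pair(xa: np.ndarray, xb: np.ndarray, alpha: float = 0.4, rng=None):
    """Build one virtual (arterial, venous) pair by interpolating the two phases."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # mixing weight drawn from Beta(alpha, alpha)
    xa_virtual = lam * xa + (1.0 - lam) * xb
    xb_virtual = lam * xb + (1.0 - lam) * xa
    return xa_virtual, xb_virtual, lam


def union_of_predictions(mask_original: np.ndarray, mask_virtual: np.ndarray) -> np.ndarray:
    """Union of the binary regions predicted by the two trained models."""
    return np.logical_or(mask_original, mask_virtual)
```

With α = 0.4 the Beta distribution is U-shaped, so most virtual pairs stay close to one of the original phases while still varying the apparent enhancement level; note that each virtual pair sums to the same intensities as the original pair.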

Table 1. DSC (%) comparison of abnormal pancreas, PDAC mass and pancreatic duct. We report results in the format of mean ± standard deviation.

3.3 Results and Discussions

All results are summarized in Table 1. We compare the proposed HPN with the following algorithms: (1) single-phase algorithms, which are trained exclusively on one phase (denoted as “single-phase”); (2) a multi-phase algorithm in which arterial and venous data are used to train a dual-path network bridged by hyper-connections (denoted as “HyperNet”). In general, multi-phase algorithms (i.e., HyperNet, HPN) achieve significant improvements over single-phase algorithms for all target structures. This is unsurprising, since multi-phase algorithms can distill more useful information from the two phases.

Efficacy of Hyper-connections. To show the effectiveness of hyper-connections, we fuse the outputs of the single-phase algorithms on the two phases by averaging the predicted probabilities at each position (denoted as “fusion”). We observe that simply fusing the outcomes of the different phases usually yields performance similar to, or only slightly better than, that of the single-phase algorithms. This indicates that fusing the estimations at inference time cannot effectively integrate multi-phase information. By contrast, hyper-connections let the two phase branches communicate during training and thus effectively improve performance. Note that directly applying [3] yields unsatisfactory results: our hyper-connections are not densely connected, but are carefully designed based on the previous state of the art in PDAC segmentation [17] for better segmentation of PDAC. Moreover, HPN achieves a PDAC mass DSC of 63.94%, compared with 56.46% reported in [17].
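The "fusion" baseline amounts to averaging the two single-phase probability maps voxel by voxel. The sketch below assumes maps of shape (K, ...) with the class axis first; the final argmax that turns averaged probabilities into labels is our assumption.

```python
import numpy as np


def late_fusion(prob_arterial: np.ndarray, prob_venous: np.ndarray) -> np.ndarray:
    """Average two single-phase probability maps (K, ...) and take the per-voxel argmax."""
    if prob_arterial.shape != prob_venous.shape:
        raise ValueError("probability maps must have the same shape")
    avg = 0.5 * (prob_arterial + prob_venous)
    return avg.argmax(axis=0)  # most probable class at each voxel
```

Because the two predictions only meet after inference, neither branch can use the other phase's features during training, which is the limitation that hyper-connections address.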

Fig. 3.

Qualitative comparison of different methods, where HPN enhances PDAC mass segmentation (green) significantly compared with other methods. (Best viewed in color) (Color figure online)

Fig. 4.

Qualitative example where HPN detects the PDAC mass (green) while the single-phase methods fail on both phases. From left to right: venous and arterial images (aligned), groundtruth, predictions of the single-phase algorithms, HyperNet prediction, and HPN prediction (overlaid on the venous and arterial images). (Best viewed in color) (Color figure online)

Efficacy of Data Augmentation. As shown in Table 1, HyperNet-aug achieves performance gains over HyperNet, especially for PDAC mass (from 60.87% to 61.69% for 3D-ResDSN and from 54.36% to 55.72% for 3D-UNet), which validates the use of virtual paired sets as data augmentation.

Efficacy of HPN. HPN provides an additional benefit over HyperNet-aug (e.g., for 3D-ResDSN, abnormal pancreas: 85.87% to 86.65%, PDAC mass: 61.69% to 63.94%, pancreatic duct: 54.07% to 56.77%). Overall, HPN achieves a clear improvement over HyperNet, i.e., abnormal pancreas: 85.79% to 86.65%, PDAC mass: 60.87% to 63.94%, pancreatic duct: 54.07% to 56.77% (3D-ResDSN). The p-values for testing the difference between HyperNet and HPN are \(p < 0.0001\) for all three targets, indicating statistically significant improvements. We also show two qualitative examples in Fig. 3, where HPN achieves much better segmentation accuracy, especially for the PDAC mass.

Notably, 11 of the 239 cases are false negatives, in which the single-phase models failed to detect any PDAC mass on either phase (DSC = 0%). Of these 11 cases, 7 are successfully detected by HPN. An example is shown in Fig. 4: the PDAC mass is missed in both single phases and almost entirely missed by the original HyperNet (DSC = 0.27%), but HPN detects a reasonable portion of it (DSC = 61.5%).

The deformable registration error, measured as the pancreas surface distance between the two phases, is \(1.01\pm 0.52\) mm (mean ± standard deviation), which we consider acceptable for this study. The effect of different alignment methods is left for future study.
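The surface-distance metric is not defined precisely in the text. One common choice, sketched here as an assumption, is the symmetric mean surface distance between the pancreas surfaces in the two phases: surface voxels are foreground voxels with at least one background 6-neighbor, scaled by the voxel spacing in mm. The brute-force pairwise distance is only suitable for small point sets; a KD-tree or a distance transform would be the usual choice for full scans.

```python
import numpy as np


def surface_points(mask: np.ndarray, spacing) -> np.ndarray:
    """Coordinates (in mm) of foreground voxels that touch a 6-connected background voxel."""
    m = np.pad(mask.astype(bool), 1)  # zero border so np.roll never wraps foreground
    interior = m.copy()
    for axis in range(3):
        interior &= np.roll(m, 1, axis) & np.roll(m, -1, axis)
    surface = m & ~interior
    return np.argwhere(surface[1:-1, 1:-1, 1:-1]) * np.asarray(spacing, dtype=float)


def mean_surface_distance(pts_a: np.ndarray, pts_b: np.ndarray) -> float:
    """Symmetric mean surface distance between (M, 3) and (N, 3) point sets."""
    d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)  # (M, N)
    a_to_b = d.min(axis=1)  # nearest surface point of B for each point of A
    b_to_a = d.min(axis=0)  # nearest surface point of A for each point of B
    return float((a_to_b.sum() + b_to_a.sum()) / (len(pts_a) + len(pts_b)))
```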

4 Conclusions

Motivated by the fact that radiologists usually rely on multi-phase data for better image interpretation, we develop an end-to-end framework, HPN, for multi-phase image segmentation. Specifically, HPN consists of a dual-path network whose paths are connected for multi-phase information exchange, together with an additional loss that reduces view divergence. Extensive experimental results demonstrate that HPN substantially and significantly improves segmentation performance, e.g., an improvement of up to 7.73% in DSC over prior methods that use single-phase data. In the future, we plan to examine the behavior of HPN under different alignment strategies and to extend the current approach to other multi-phase learning problems.