1 Introduction

The incidence of chronic pancreas disease, particularly non-alcoholic fatty pancreas disease (NAFPD), is rising rapidly, reflecting the increasing worldwide prevalence of metabolic disease and obesity [1, 2]. Fat infiltration due to obesity in the pancreas triggers an inflammatory response that can lead to chronic pancreatitis and ultimately to pancreatic cancer. While multiparametric magnetic resonance imaging (MRI) of the liver has become the gold standard tool for early detection, diagnosis and monitoring of chronic disease, this technique has been understudied in the pancreas. Multiparametric MRI provides the advantages of soft tissue contrast, lack of radiation, high accuracy and high precision, even in the most obese patients. This has resulted in the development of liver imaging biomarkers such as corrected T1 (cT1) [3] and proton density fat fraction (PDFF) [4]. This motivates the development of quantitative imaging biomarkers for assessing the pancreas, which is increasingly important as it is fundamentally implicated in obesity-related conditions such as type 2 diabetes (T2D).

Volume [5], morphology [6], T1 [7] and PDFF [8] have been proposed as multiparametric MRI biomarkers for the pancreas. The UK Biobank is a rich resource, currently acquiring dedicated pancreas volumetric and quantitative images from 100,000+ volunteers, alongside other non-imaging data [9], which enables the assessment of these biomarkers. UK Biobank imaging data have previously been used for, among other purposes, establishing reference ranges [10], validating novel processing methods [11], and automated processing and quality control [12].

Since quantitative parametric maps are primarily 2D, recent methods for analysing UK Biobank pancreas MRI data [13, 14] have proposed using the 3D segmentation from a volumetric acquisition to extract a region of interest (ROI) from the 2D quantitative parametric maps. The method proposed in these studies uses the pancreas-specific volumetric scan and the 2D quantitative map derived from the multiecho gradient-recalled echo scan. The DICOM header coordinates are used to intersect the 3D segmentation from the volumetric scan with the 2D quantitative map in the same coordinate space. This approach is appealing over the alternative of segmenting the pancreas on the 2D quantitative maps directly, for several reasons: (1) only one segmentation method may be used for all quantitative map types, as opposed to training (and validating) map-specific methods; (2) fewer annotated subjects are needed in order to obtain a robust segmentation method: a model trained on 2D maps needs to have been exposed to many possible orientations and positions of slices through the pancreas observed in practice, and (3) the segmentation on the volumetric scan may be used for pancreas volumetry and morphometry, which provide insight into the pathophysiology of T2D.

However, the volumetric and slice scans are acquired in separate breath-holds (usually 4 to 5 min apart in the case of T1), which may lead to misalignment between the scans due to breathing motion or scan re-positioning. Structures that are larger and more regular than the pancreas, such as the liver, may not be affected by misalignment to the same extent, so using the DICOM header position may be a reasonable approach for them. The pancreas, however, is a small, irregularly shaped organ whose appearance can vary significantly with changes in the viewing slice. Even slight motion can move the pancreas outside the reference plane. In this context, using only the DICOM header information for alignment may be insufficient (Fig. 1, top); we will refer to this method as No Registration in this paper.

No considerations of inter-scan misalignment have previously been made for pancreas MRI analysis in the UK Biobank. We conjecture that accurate alignment of the volumetric and the slice acquisitions may improve quantitative pancreas analysis downstream, for example T1 and PDFF quantification. In this study, we propose using Slice-to-Volume Registration prior to ROI extraction and quantitative reporting. Slice-to-Volume Registration (SVR) aims to align data from more than one scan, one consisting of a planar acquisition (slice), and the other consisting of a 3D volume. SVR has been used in multiple applications, from real-time surgical navigation to volume reconstruction, with rigid registration and iconic matching criteria being the most commonly used SVR strategy [15]. Recent advances have shown the potential of deep learning within SVR, for instance used as feature extraction mechanisms towards robust matching criteria, or to drive SVR optimisation as a whole [16].

In this paper, we show an example implementation of automated SVR with a multimodal similarity criterion. We apply the method to pancreatic T1 scans, and demonstrate more accurate T1 quantification in the UK Biobank imaging substudy.

Fig. 1.
An example of the need for alignment. (Top) Using No Registration (i.e. only using the DICOM header position). The segmentation resampled from the pancreas-specific volume (PSV) to the T1 slice is not aligned with the pancreas boundary, so T1 quantification is inaccurate. (Bottom) Using SVR-SSC. The output segmentation is aligned with the pancreas on the T1 slice. The method’s optimal z translation was +10 mm from the DICOM header position. The cost evaluation mask used to evaluate image similarity for all PSV slices is shown as the red contour. The pancreas subcomponents head (blue), body (green), and tail (yellow), are shown on the PSV candidates. (Color figure online)

2 Materials and Methods

2.1 UK Biobank Data

In this work, we present an exemplary implementation of SVR for UK Biobank where the 3D volume used was the pancreas-specific volumetric scan (named “Pancreas fat - DICOM”, Data-Field 20202 on the UK Biobank Showcase; see footnote 1), which we refer to in this work as PSV. As the 2D image, we used the T1 map derived from the Shortened Modified Look-Locker Inversion recovery (ShMoLLI) scan (named “Pancreas Images - ShMoLLI”, Data-Field 20259 on the UK Biobank Showcase). The T1 map was computed using a proprietary algorithm from Perspectum Ltd, previously used to compute T1 maps for UK Biobank ShMoLLI images of the liver [3].

Both scans were acquired using a Siemens Aera 1.5T (Siemens Healthineers AG, Erlangen, Germany). The PSV imaging data were acquired using the FLASH-3D acquisition, echo time (TE)/repetition time (TR) = 1.15/3.11 ms, voxel size = 1.1875 \(\times\) 1.1875 \(\times\) 1.6 mm, with 10\(^{\circ}\) flip angle and fat suppression. The PSV scans were resampled to 2 mm isotropic using linear interpolation. The ShMoLLI imaging data were collected using the same parameters as for the liver ShMoLLI, with voxel size = 1.146 \(\times\) 1.146 \(\times\) 8 mm, TE/TR = 1.93/480.6 ms, 35\(^{\circ}\) flip angle [9], and often had an oblique orientation to better capture the pancreas. Only data from the first imaging visit (Instance 2) were used.

3D pancreas segmentations were predicted on the PSVs using the implementation of U-Net described in [17]. 14,439 subjects with both T1 and PSV scans were processed using (1) No Registration and (2) the proposed SVR method.

2.2 Slice-to-Volume Registration Method

We have observed that, in practice, translations in Z are the most prominent source of misalignment in the data. The SVR method we used is based on an initial affine alignment in the XY plane, followed by an exhaustive search along the Z direction, above and below the DICOM header reference position (z = 0). At each z, the method evaluates image similarity between the resampled 3D volume (moving image) at z and the quantitative 2D slice (fixed image). The method then chooses the z that gives the highest image similarity between the two images, as illustrated in Fig. 2.

Fig. 2.
(Top) Illustration of the Slice-to-Volume Registration procedure. The resampled pancreas-specific volume (PSV) candidates generated using resampling along Z are shown. In yellow is the chosen PSV candidate slice that gives highest image similarity with the quantitative slice (lowest registration cost). The corresponding resampled segmentation is chosen as the output segmentation for the quantitative slice. (Bottom) Alignment of the 3D PSV with the T1 slice and intersection of the segmentation from the 3D volume scan. The oblique T1 slice may extend beyond the PSV bounding box, which may cause missing data in the resampled candidates. (Color figure online)

Resample Volume into ‘Candidate’ Slices. We evaluated similarity over a Z range whose half-width was computed from the total volume height as (voxel Z resolution in mm \(\times\) number of voxels in Z)/4. The Z range used was the same for all subjects. We incremented Z in steps of 1 mm. In this work, using the PSV as our volume gave a half-width of (2 mm \(\times\) 42)/4 = 21 mm, so the SVR method searched over Z in [−21, 21] mm.
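The arithmetic behind the search range can be sketched as follows. This is a minimal illustration rather than the released Matlab implementation; the function name `z_search_offsets` and its parameters are hypothetical.

```python
def z_search_offsets(voxel_z_mm, n_voxels_z, step_mm=1):
    """Return candidate z offsets (mm) relative to the DICOM header position."""
    # Half-width of the search range: a quarter of the total volume height.
    half_range_mm = (voxel_z_mm * n_voxels_z) / 4
    n_steps = int(half_range_mm // step_mm)
    # Symmetric offsets above and below z = 0.
    return [k * step_mm for k in range(-n_steps, n_steps + 1)]


# PSV resampled to 2 mm isotropic with 42 voxels in Z, as stated in the text.
offsets = z_search_offsets(voxel_z_mm=2, n_voxels_z=42)
print(offsets[0], offsets[-1], len(offsets))  # -21 21 43
```

With a 1 mm step this yields 43 candidate positions, including z = 0.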

At each z, the 3D PSV was resampled by considering the slice acquisition profile, since the T1 slice thickness is greater than the PSV slice thickness (8 mm and 2 mm, respectively). Resampled PSV slices across the 8 mm T1 slice profile were weighted by a Gaussian function (standard deviation of 1.4 mm) before being merged together. The corresponding 3D segmentation was resampled using nearest neighbour interpolation.
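A minimal sketch of the Gaussian-weighted merge is given below. The 1.4 mm standard deviation comes from the text; the sampling offsets (five sub-slices at 2 mm spacing spanning the 8 mm profile), the weight normalisation and the function names are assumptions made for illustration only.

```python
import math


def gaussian_weights(offsets_mm, sigma_mm=1.4):
    """Normalised Gaussian weights for sub-slices at the given through-plane offsets."""
    raw = [math.exp(-0.5 * (d / sigma_mm) ** 2) for d in offsets_mm]
    total = sum(raw)
    return [w / total for w in raw]


def merge_slices(slices, weights):
    """Weighted sum of equally sized 2D slices, each given as a list of rows."""
    rows, cols = len(slices[0]), len(slices[0][0])
    return [
        [sum(w * s[r][c] for s, w in zip(slices, weights)) for c in range(cols)]
        for r in range(rows)
    ]


# Assumed sampling: five thin sub-slices across the 8 mm T1 slice profile.
offsets = [-4, -2, 0, 2, 4]
weights = gaussian_weights(offsets)

# Toy data: five 1x1 sub-slices with intensities 0..4.
slices = [[[float(i)]] for i in range(5)]
merged = merge_slices(slices, weights)
```

Because the weights are symmetric about the slice centre, the toy example merges to the central intensity; with real data the central sub-slices dominate the merged candidate.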

Initial Within-Plane Alignment. An initial within-plane registration step was carried out to align the body contour in the T1 slice and PSV data. This allows candidate slices to be compared more reliably in the subsequent Z alignment step. The initial affine registration in XY was performed using the ‘multimodal’ configuration of the Matlab R2019b imregister function, which uses Mattes Mutual Information as the similarity metric. For this initial XY alignment, we used the resampled PSV at z = 0 mm and the T1 slice as the moving and fixed images, respectively. The resulting affine transformation in XY was subsequently applied to each resampled PSV ‘candidate’ along Z.

Evaluate Similarity over Z Range. For the exhaustive search along Z, image similarity at each z was evaluated only within a predefined cost evaluation mask, common to all ‘candidates’ (resampled PSV images). The cost evaluation mask was computed as the intersection of the body masks from all candidates, which were computed using a simple thresholding operation (>10). We evaluated similarity over the computed Z range, and the inverse of similarity was used as the cost function for alignment (see Fig. 2).
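The construction of the cost evaluation mask can be sketched as below. The threshold of 10 is taken from the text; the function names and the toy images are hypothetical, and images are represented as plain lists of rows to keep the example self-contained.

```python
def body_mask(image, threshold=10):
    """Binary body mask from a simple intensity threshold (> threshold)."""
    return [[pixel > threshold for pixel in row] for row in image]


def cost_evaluation_mask(candidates, threshold=10):
    """Intersection of the body masks of all candidate slices."""
    masks = [body_mask(c, threshold) for c in candidates]
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [all(m[r][c] for m in masks) for c in range(cols)]
        for r in range(rows)
    ]


# Toy candidates: the second one has missing data (zeros) in its last column,
# as can happen at the extrema of the Z range.
cand_a = [[0, 50, 60], [0, 70, 80]]
cand_b = [[0, 55, 0], [0, 75, 0]]
mask = cost_evaluation_mask([cand_a, cand_b])
print(mask)  # [[False, True, False], [False, True, False]]
```

Only pixels that lie inside the body in every candidate contribute to the cost, so all candidates are compared over the same region.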

We explored 2 different similarity metrics: normalised mutual information (NMI) and Self-Similarity Context descriptors (SSC) based similarity from Heinrich et al. [18]. The SSC-based similarity computes the inverse of the squared differences between the SSC descriptors of image 1 and image 2 within the cost evaluation mask. This rendered two methods that we will refer to as SVR-NMI and SVR-SSC in the text.
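For the NMI variant, one common definition is NMI = (H(A) + H(B)) / H(A, B), computed from intensity histograms within the cost evaluation mask. The sketch below uses that definition as an assumption, since the text does not state which normalisation was used; the bin width and function names are likewise illustrative. The SSC descriptors are more involved and are not reproduced here.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Shannon entropy (nats) of a histogram given as counts summing to n."""
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def normalised_mutual_information(fixed, moving, bin_width=32):
    """NMI of two equally long intensity lists (pixels inside the cost mask)."""
    a = [int(v // bin_width) for v in fixed]
    b = [int(v // bin_width) for v in moving]
    n = len(a)
    h_a = entropy(Counter(a).values(), n)
    h_b = entropy(Counter(b).values(), n)
    h_ab = entropy(Counter(zip(a, b)).values(), n)
    if h_ab == 0:
        # Both inputs constant: treated here as minimal similarity (a choice, not from the text).
        return 1.0
    return (h_a + h_b) / h_ab


fixed = [0, 0, 100, 100, 200, 200]
# Different contrast but perfectly dependent on `fixed` (the multimodal case).
moving_aligned = [200, 200, 0, 0, 100, 100]
# Same intensities, but unrelated spatial arrangement.
moving_misaligned = [0, 100, 200, 0, 100, 200]

nmi_aligned = normalised_mutual_information(fixed, moving_aligned)
nmi_misaligned = normalised_mutual_information(fixed, moving_misaligned)
cost_aligned = 1.0 / nmi_aligned  # inverse of similarity used as cost
```

Under this definition NMI lies between 1 (independent) and 2 (perfectly dependent), so the aligned pair scores higher even though its intensity mapping differs.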

Following the above steps, the z position for which the resampled 3D PSV gave the highest similarity (i.e., lowest cost) was selected, and the corresponding resampled 3D segmentation on the 2D slice was calculated. This output segmentation may be used for subsequent pancreas quantification on the 2D slice, for instance extracting global descriptive statistics (such as mean or median) or local measurements.
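The selection step and a subsequent global statistic can be sketched as follows. The cost values, the T1 values and the helper names are made up for illustration; only the logic (lowest cost wins, then the median is taken inside the output mask) follows the text.

```python
from statistics import median


def select_best_z(costs_by_z):
    """Return the z offset (mm) whose candidate has the lowest registration cost."""
    return min(costs_by_z, key=costs_by_z.get)


def median_in_mask(t1_slice, mask):
    """Median T1 over pixels where the resampled segmentation is True."""
    values = [
        v
        for row_values, row_mask in zip(t1_slice, mask)
        for v, inside in zip(row_values, row_mask)
        if inside
    ]
    if not values:
        raise ValueError("empty segmentation mask")
    return median(values)


# Hypothetical costs (inverse similarity) for four candidate positions.
costs = {-1: 0.8, 0: 0.6, 1: 0.3, 2: 0.5}
best_z = select_best_z(costs)

# Toy T1 slice (ms) and the segmentation resampled at best_z.
t1 = [[700, 720, 1500], [710, 690, 1400]]
seg = [[True, True, False], [True, True, False]]
t1_median = median_in_mask(t1, seg)
print(best_z, t1_median)  # 1 705.0
```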

Figure 1 shows an example case after our method has been applied, including comparisons between using No Registration and using the SVR-SSC approach. Note the entire body of the pancreas (green) is missing from the No Registration candidate alignment, while it is present on the T1 map.

2.3 SVR Implementation and Inference at Scale

The implementation of the SVR method is available on GitHub (see footnote 2). The Matlab R2019b Compiler tool was used to package our SVR implementation into a Matlab application. The application was moved into a Docker image containing the compatible version of Matlab Runtime (see footnote 3). This enabled us to run the SVR method using Docker containers, in parallel and at scale on Amazon Web Services (AWS) EC2 instances, in order to process the large UK Biobank data set.

2.4 Automated Quality Control

For T1 quantification results in this paper, we opted for the SVR-SSC method due to its higher robustness in validation (see Sect. 3.2). The 14,439 subjects were processed by a quality control (QC) pipeline prior to T1 quantification. We first excluded those subjects where the resulting 2D pancreas mask was empty for either of the two methods. This was mostly due to the initial automated segmentation prediction being empty, for instance in the presence of imaging artefacts. For the remaining N = 13,845 subjects, we excluded those that met any of the following QC exclusion criteria: output 2D mask size < 1000 pixels using the No Registration method (N = 3,014 excluded), percent overlap between the PSV at z = 0 and the T1 slice < 50% (N = 2,150 excluded), or non-axial T1 acquisitions (see examples in Fig. 3) (N = 106 excluded). We also excluded those subjects where the SVR-SSC method gave an output z > 10 mm (N = 317), in order to ensure there was enough overlap between the volume and the slice scans to compute image similarity. Figure 4 shows the histogram of output optimum z displacements for all subjects prior to this QC step. N = 8,829 subjects remained for analysis. Note that a given subject may meet more than one of these QC exclusion criteria.
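The QC rules above can be expressed as a small function that returns every criterion a subject meets. This is a sketch under stated assumptions: the function and argument names are hypothetical, and the z criterion is implemented literally as written (z > 10 mm), since the text does not say whether the magnitude of z was intended.

```python
def qc_exclusion_reasons(noreg_mask_px, svr_mask_px, overlap_pct, is_axial, z_mm):
    """Return the QC exclusion criteria met by one subject (empty list means keep)."""
    # Stage 1: an empty 2D mask for either method excludes the subject outright.
    if noreg_mask_px == 0 or svr_mask_px == 0:
        return ["empty 2D mask"]

    reasons = []
    if noreg_mask_px < 1000:
        reasons.append("No Registration mask < 1000 pixels")
    if overlap_pct < 50:
        reasons.append("PSV/T1 overlap < 50%")
    if not is_axial:
        reasons.append("non-axial T1 acquisition")
    # Literal reading of the text; use abs(z_mm) if the magnitude was intended.
    if z_mm > 10:
        reasons.append("SVR-SSC z > 10 mm")
    return reasons


kept = qc_exclusion_reasons(1500, 1400, 80.0, True, 3)
dropped = qc_exclusion_reasons(800, 900, 40.0, True, 12)
print(kept)     # []
print(dropped)  # three reasons: small mask, low overlap, large z
```

Returning a list rather than a single flag reflects that one subject can meet several criteria, which is why the per-criterion counts need not sum to the total number excluded.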

Fig. 3.
Example of T1 slices from 2 different subjects where the primary view was coronal. These subjects were excluded from analysis.

Fig. 4.
Histogram of output optimum z-displacement by the SVR-SSC algorithm, showing a median displacement of +1 mm.

2.5 SVR Validation

We performed direct validation of SVR using the alignment error from manually annotated reference positions. We also performed indirect validation by assessing quality of the output 2D segmentation, compared to manual delineations performed on the 2D slice directly.

Direct Validation Using Alignment Error. For alignment error, we selected the N = 50 most outlying cases from our T1 quantification experiment (see Sect. 3.1). We then manually annotated the reference z position that corresponded to the best alignment between the resampled PSV and T1 slice. Note the reference z is expressed relative to the DICOM header position. We then computed the alignment error of each method compared to the manually annotated z position. We did not consider translations in X or Y for this experiment.

Indirect Validation Using Segmentations. We had an available dataset of N = 157 from UK Biobank of healthy and diabetic subjects where manual delineations of pancreas had been performed on the T1 maps directly. These data had been quality-checked also (see Sect. 2.4). We evaluated output segmentation quality via overlap and surface distance measures, namely Dice Similarity Coefficient (DSC) overlap and 95th percentile Hausdorff Distance (HD), respectively. The output 2D pancreas masks of all methods (No Registration, SVR-NMI, SVR-SSC) were compared to the reference manual delineations.
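The two segmentation metrics can be sketched as below. DSC follows its standard definition. For the 95th percentile Hausdorff Distance, definitions vary between toolkits; this sketch assumes the maximum of the two directed 95th percentile nearest-neighbour distances with a nearest-rank percentile, and expects contour pixel coordinates as input. All names are illustrative.

```python
import math


def dice(mask_a, mask_b):
    """Dice Similarity Coefficient between two collections of pixel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def hd95(points_a, points_b, spacing_mm=1.0):
    """95th percentile Hausdorff distance between two non-empty point sets."""
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")

    def directed(src, dst):
        # Distance from each source point to its nearest destination point.
        return [min(math.dist(p, q) for q in dst) * spacing_mm for p in src]

    return max(
        percentile_nearest_rank(directed(points_a, points_b), 95),
        percentile_nearest_rank(directed(points_b, points_a), 95),
    )


# Two 2x2 squares shifted by one pixel along the second axis.
manual = {(0, 0), (0, 1), (1, 0), (1, 1)}
predicted = {(0, 1), (1, 1), (0, 2), (1, 2)}
dsc = dice(manual, predicted)
hd = hd95(manual, predicted)
print(dsc, hd)  # 0.5 1.0
```

Passing `spacing_mm` equal to the in-plane pixel size converts the distance from pixels to millimetres for isotropic in-plane resolution.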

Fig. 5.
Bland-Altman density plot showing T1 quantification using No Registration and the proposed SVR-SSC method. T1 is reported as the median of the output 2D pancreas segmentation for each method.

3 Results

3.1 T1 Quantification: No Registration vs SVR-SSC

We compared the T1 quantification results of the SVR-SSC method vs No Registration. Figure 5 shows the Bland-Altman density plot for N = 8,829 subjects, where T1 was quantified for each method using the median of the output segmentation. While the bias was small at 1.4 ms, the observed variability between No Registration T1 and SVR-SSC T1 of 53.3 ms renders the two approaches not equivalent for pancreatic T1 quantification at the subject level. We further explored these differences in our validation experiments (see Sect. 3.2), in order to determine which method had performed best.
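A Bland-Altman comparison of two methods is conventionally summarised by the bias (mean of the paired differences), the standard deviation of the differences, and the 95% limits of agreement (bias ± 1.96 SD). The sketch below computes these on made-up T1 values; it does not reproduce the reported numbers, and the text does not state which of these quantities the 53.3 ms variability refers to.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, SD of differences, 95% limits of agreement) for paired values."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)


# Made-up median pancreatic T1 values (ms) for four subjects.
t1_svr = [710, 740, 790, 840]
t1_no_reg = [700, 750, 800, 820]

bias, sd, limits = bland_altman(t1_svr, t1_no_reg)
print(bias, sd)  # 2.5 15.0
```

The toy example shows the pattern discussed in the text: a bias close to zero can coexist with per-subject differences that are much larger.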

3.2 SVR Validation

Direct Validation Using Alignment Error. Figure 6 shows the alignment error in mm of each method in the selected N = 50 subset of the most outlying cases from our T1 quantification experiment (see Sect. 3.1). The median ± std for each method were: No Registration (8 ± 4.0) mm, SVR-NMI (3 ± 5.5) mm, SVR-SSC (3 ± 4.2) mm. The alignment error was substantially reduced using either of the two SVR implementations compared to No Registration. The differences between each SVR implementation and No Registration were statistically significant (paired t-test, p < 1e−3 for SVR-NMI, p < 1e−3 for SVR-SSC). The difference between the two SVR methods, which use different similarity metrics, was not statistically significant (p = 0.73), though the SVR-SSC method appeared more robust to challenging examples.
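The paired t-test compares per-subject differences between two methods. The sketch below computes the test statistic and degrees of freedom on made-up alignment errors using only the standard library; obtaining the p-value requires the t-distribution, for which a statistics package such as SciPy (`scipy.stats.ttest_rel`) would typically be used. The data and function name are illustrative and are not taken from the study.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Made-up absolute alignment errors (mm) for five subjects.
err_no_reg = [8, 10, 6, 9, 7]
err_svr = [3, 4, 2, 5, 1]

t_stat, dof = paired_t_statistic(err_no_reg, err_svr)
print(round(t_stat, 2), dof)  # 11.18 4
```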

Fig. 6.
Validation of SVR using alignment error in mm relative to the manually obtained reference positions. The alignment error was substantially reduced using either of the two SVR implementations compared to No Registration.

Fig. 7.
Indirect validation of SVR was performed by assessing the quality of the output segmentations, in terms of Dice Similarity Coefficient (DSC) overlap (left) and 95th percentile Hausdorff surface distance (HD, right). The boxplots show summarised metrics for all three methods considered: No Registration, SVR-NMI and SVR-SSC.

Indirect Validation Using Segmentations. Figure 7 shows the segmentation quality of the three methods for the selected N = 50 subset of the most outlying cases from our T1 quantification experiment (see Sect. 3.1). We used DSC overlap and 95% HD metrics. The DSC median ± std for each method were: No Registration 0.799 ± 0.114, SVR-NMI 0.819 ± 0.118, SVR-SSC 0.822 ± 0.098. The differences between each SVR implementation and No Registration were statistically significant (paired t-test, p = 0.0069 for SVR-NMI, p < 1e−3 for SVR-SSC). The difference between the two SVR methods, which use different similarity metrics, was not statistically significant (p = 0.1614), though the SVR-SSC method appeared more robust to challenging examples, based on Fig. 8.

The 95% HD median ± std for each method were: No Registration (4.123 ± 11.875) mm, SVR-NMI (3.606 ± 12.026) mm, SVR-SSC (3.606 ± 11.154) mm. The differences between each SVR implementation and No Registration were statistically significant (paired t-test, p = 0.0122 for SVR-NMI, p = 0.0054 for SVR-SSC). The difference between the two SVR methods, which use different similarity metrics, was not statistically significant (p = 0.83).

Fig. 8.
Output segmentation quality compared to a manual annotation: a challenging example. The manual annotation is shown in yellow, while the predicted segmentation for each method is shown in red. Measured DSC for No Registration, SVR-NMI and SVR-SSC were 0.506, 0.689 and 0.739, respectively. Upon closer inspection, arguably the disconnected component is part of the pancreas, but the annotator might have excluded it from the quantifiable mask. T1 colormap was set to gray to increase visibility of the segmentation contours. (Color figure online)

4 Discussion and Conclusions

In this work, we report a method for Slice-to-Volume Registration (SVR) that enables automated pancreas segmentation and accurate downstream quantification in multiparametric MRI protocols like UK Biobank. We showed that SVR improved T1 segmentation quality, evaluated using overlap with manual annotations (Sect. 3.2), and improved pancreas T1 quantification for individuals in UK Biobank. To our knowledge, this study is the first to report on utilising SVR for deriving pancreas MRI biomarkers from the UK Biobank. As discussed previously in [13], such a pipeline means that a segmentation model has to be built only for the 3D volumetric scan, which removes the need for generating new annotated data sets in order to segment each quantitative slice type.

In this study, we chose to use the 3D PSV scan for initial segmentation, and the 2D T1 map as our target quantitative slice to segment. However, the method is compatible with other 3D and 2D image types in UK Biobank, for instance the stitched 3D Whole-Body scans (named “Dixon technique for internal fat - DICOM”, Data-Field 20201 on the UK Biobank Showcase) or the 2D quantitative PDFF maps derived from the multiecho gradient-recalled echo scan (Data-Field 20260). The choice of a feature-based similarity metric originally proposed in the context of multimodal image registration, the Self-Similarity Context descriptors (SSC) [18], was shown to work robustly on non-quantitative (3D PSV data) and quantitative (2D T1 data) scans, and may transfer well to other data types.

The limitation of choosing the dedicated pancreas volume scan as our moving image was its limited total Z coverage of 84 mm. In practice, inter-scan motion may exceed this range. Furthermore, obliquely placed T1 slices (acquired to traverse the pancreas longitudinally) extended beyond the PSV bounding box at the Z range extrema (see Fig. 2, bottom), causing the PSV candidates at those extrema to have missing data (see Fig. 2, top), which in turn raised a set of challenges we have sought to address.

First, the cost evaluation mask, computed as the intersection of the body masks from all slices, was small (see Fig. 1, red contour). In those cases, the cost evaluation mask does not optimally include all the potentially useful image features for registration. This could have led to more noisy image similarity measurements and have introduced spurious local minima into the cost function. The alternative of considering a different cost evaluation mask for each z independently could have led to unfair comparisons of cost when selecting the global minimum. Second, we initially had considered an independent affine XY transformation for each z. However, candidates with missing data at the Z range extrema misled the registration procedure by producing a more noisy cost function. Using the same affine XY transformation on all candidates, obtained from the resampled candidate at z = 0 (avoiding missing data), addressed this problem, but could have introduced error.

Future work will use the Whole-Body scans from UK Biobank as our moving images for SVR. This will extend the SVR Z range and could lead to more robust estimates of similarity, as well as improved alignment in XY. However, obtaining robust 3D pancreas segmentations from the Whole-Body scans is challenging, as discussed in [14], since they are lower resolution compared to the dedicated pancreas volumes used in this work.

The proposed SVR methodology did not consider local deformations or deformations through the image plane, which can be expected in the abdominal region when the images are taken during different breath-holds. Our method’s performance using axial affine registration and rigid alignment in Z encourages more advanced deformable image registration approaches, for instance those focusing on the organ surroundings rather than a global whole-body cost mask.

The exclusion criteria that we used during QC caused nearly 39% of cases to be excluded from reporting, which could have biased the comparison between the methods. This was due in part to upstream method failure, for instance where the segmentation model produced empty predictions or masks with a low number of voxels. We observed this effect mainly on images that contained image artefacts, such as wrap-around. Furthermore, we expect that the future work described above will increase our method’s throughput, notably when using volumes with higher coverage, since they will lead to full pancreas segmentations as well as to an extended Z range for SVR optimisation.

The small bias at 1.4 ms when comparing T1 quantification differences between No Registration and SVR-SSC indicates that we may be able to use No Registration for T1 quantification at the population level, for instance for median pancreatic T1 in the UK Biobank. However, for individual subjects, or even for comparisons between relatively small groups (such as type 2 diabetics in UK Biobank), the clinically significant variability between methods renders the No Registration approach insufficient.

Moreover, note that we quantified T1 for each method as the median of the output 2D segmentation, a metric that is relatively robust to outlier values. However, researchers often make distinctions in quantification between pancreatic head, body and tail [19, 20]. When comparing medians of pancreas subsegments, we expect that small misalignments will amplify quantification differences between methods. This will further increase the need for registration in order to obtain accurate pancreas quantification in UK Biobank for given individuals. We have recently developed the first fully automated method for pancreas subsegmentation into head, body and tail [21] that, combined with SVR, will enable accurate MRI biomarker quantification regionally.