Introduction

Chronic pancreatitis (CP) involves progressive fibro-inflammatory changes of the pancreatic gland leading to fibrosis, calcifications, atrophy, fat infiltration, and ductal distortion [1, 2]. The clinical presentation of CP includes chronic visceral pain, impairment of exocrine and endocrine functions, and reduced quality of life [3]. Recently, several publications and guidelines have highlighted the clinical need to detect morphological alterations of the gland at an early stage and to monitor/predict progression of CP in order to improve disease outcomes [1, 4,5,6,7,8]. In this context, magnetic resonance imaging (MRI) seems promising and allows quantitative assessment of image data that reflects fibrosis, atrophy, fat infiltration, and ductal changes [9,10,11,12,13,14]. In particular, the development of pancreatic fibrosis seems to play a key role in the early development of CP [1]. Conventional imaging analysis with assessment of the mean apparent diffusion coefficient (ADC) from diffusion weighted imaging (DWI) has been shown to correlate well with the histological degree of pancreatic fibrosis [15]. However, the mean ADC value in a single pancreatic region of interest may not be an optimal estimate of the true histological tissue composition. Hence, more advanced and detailed quantitative image analysis, such as image texture analysis, could be of relevance.

Texture analysis of medical imaging has recently emerged as a promising tool for quantitative assessment of solid abdominal organs, such as the liver [16,17,18]. By advanced analysis of textural information in the image, quantitative information is obtained on tissue heterogeneity, including distribution of and relationship between voxel intensities [19]. This technique can provide additional information compared to clinical morphological and intensity-based image analysis and may describe more subtle changes in tissue structures compared to what is visible by the human eye. Additionally, it may describe changes in tissue characteristics that to a higher degree reflect the true histological composition, as compared to conventional imaging assessments. Accordingly, texture features obtained from MRI of the liver have shown fair accuracy for the diagnosis and classification of hepatic fibrosis [16,17,18].
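As a minimal illustration of what "relationship between voxel intensities" means in practice, the sketch below builds a gray level co-occurrence matrix for horizontally adjacent pixels and computes its joint energy. The function names, the two toy images, and the restriction to a single 2D direction are assumptions made for illustration only; they are not the implementation used in this study.

```python
import numpy as np

def glcm_horizontal(image, levels):
    """Symmetric, normalized co-occurrence matrix for horizontally adjacent pixels."""
    image = np.asarray(image)
    counts = np.zeros((levels, levels))
    left, right = image[:, :-1].ravel(), image[:, 1:].ravel()
    np.add.at(counts, (left, right), 1)   # count each (left, right) gray level pair
    counts += counts.T                    # make the matrix symmetric
    return counts / counts.sum()          # convert counts to joint probabilities

def joint_energy(p):
    """Sum of squared joint probabilities; high for homogeneous patterns."""
    return float(np.sum(p ** 2))

# Hypothetical toy images with two gray levels (0 and 1)
uniform = np.zeros((4, 4), dtype=int)            # perfectly homogeneous
checker = np.indices((4, 4)).sum(axis=0) % 2     # alternating pattern

energy_uniform = joint_energy(glcm_horizontal(uniform, levels=2))
energy_checker = joint_energy(glcm_horizontal(checker, levels=2))
print(energy_uniform, energy_checker)
```

The homogeneous image concentrates all probability in one cell of the matrix, whereas the alternating pattern spreads it over two cells, so its joint energy is lower.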

To the best of our knowledge, MRI texture analysis has never been performed on the pancreas in patients with CP. However, pancreatic texture analysis applied to computed tomography (CT) images can differentiate patients with functional abdominal pain, recurrent acute pancreatitis, and CP [20]. Supported by the abovementioned results seen for hepatic fibrosis, MRI texture analysis could also be relevant for more detailed characterization of the pancreatic gland. Texture analysis of DWI could be a highly relevant research tool for more detailed pancreatic assessment, potentially providing more information, especially on fibrosis.

This explorative feasibility and validation study was performed to address the use of MRI texture analysis of the pancreas in patients with CP. We hypothesized that: (1) pancreatic MRI texture analysis of DWI is feasible, and (2) the extracted MRI texture features provide relevant information allowing reliable classification between well-known risk factors for development/progression of CP and clinically relevant disease characteristics (complications) with acceptable accuracy. Hence, the aims were, in a well-characterized cohort of CP patients and healthy controls, to: (1) extract MRI texture features from DWI of the pancreas, and (2) identify texture features allowing accurate classification between (a) CP patients and healthy controls, (b) patients with different etiological risk factors (smoking and alcohol exposure), and (c) patients with and without presumed pancreatogenic diabetes and pancreatic exocrine insufficiency (PEI).

Methods

Study subjects

This study initially included pancreatic MRI from 82 patients with a CP diagnosis from the outpatient clinic at the Department of Gastroenterology and Hepatology, Aalborg University Hospital. The CP diagnosis was based on the Lüneburg criteria, where CP was defined as a score ≥ 4 points [21]. MRI from 22 healthy controls recruited from outside the hospital was also included; the controls were screened to exclude any history of pancreas-related or gastrointestinal diseases. The MRI was performed at the Department of Radiology, Aalborg University Hospital. The findings from the conventional image analysis of both CP patients and healthy controls were reported in two previous publications [10, 22]. None of the controls had MRI findings indicating CP.

Age, gender, body mass index, disease duration, and clinical information of patients were retrieved from review of the medical records. This included the two risk factors: presence of alcoholic etiology of CP according to Etemad et al. [23] and ongoing use of tobacco according to Tolstrup et al. [24]. Two complications were also included: presence of presumed pancreatogenic diabetes mellitus (defined as a diagnosis of diabetes established after the CP diagnosis) and PEI (defined as fecal elastase < 100 µg/g).

Written informed consent was obtained from both CP patients and healthy controls prior to study procedures, and the study was approved by the North Denmark Region Committee on Health Research Ethics (N-20130040; N-20130059) and conducted according to the Declaration of Helsinki.

Magnetic resonance image acquisition

MRI was performed on all subjects in supine position after a minimum of 4 h of fasting using the same 1.5T magnet (Signa HDxt, version 23, General Electric Healthcare, Milwaukee, Wisconsin, USA) with an 8-channel body coil. Axial DWI was acquired with TR/TE 4000/70 ms, slice thickness 6 mm, matrix 160 × 192, FOV 44 cm, and with b values of 0, 50, 400, and 800 s/mm2. For anatomical mapping and to aid in the DWI texture feature extraction process (see below), T2-weighted fast imaging employing steady-state acquisition (FIESTA, axial, slice thickness 4 mm), T2-weighted SSFSE fat saturated (axial, slice thickness 5 mm), and 3-dimensional MRCP (coronal, free breathing, slice thickness 2.6 mm) were used. For details about the image acquisition, see [10].
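For context, the ADC referred to in the Introduction is conventionally estimated from a multi-b acquisition such as this one by fitting a mono-exponential decay, S(b) = S0 · exp(−b · ADC). The sketch below is a simplified illustration of that fit; the function name, the signal values, and the assumed ADC are hypothetical, and the present study did not use ADC maps for the texture analysis.

```python
import numpy as np

def fit_adc(b_values, signals):
    """Least-squares mono-exponential fit: ln S(b) = ln S0 - b * ADC."""
    b = np.asarray(b_values, dtype=float)
    log_s = np.log(np.asarray(signals, dtype=float))
    slope, intercept = np.polyfit(b, log_s, 1)   # straight line in log space
    return -slope, float(np.exp(intercept))      # (ADC in mm^2/s, S0)

b_values = [0, 50, 400, 800]        # s/mm^2, as in the acquisition protocol
true_adc, s0 = 1.2e-3, 1000.0       # hypothetical tissue values, for illustration
signals = s0 * np.exp(-np.asarray(b_values) * true_adc)   # noise-free synthetic voxel

adc, s0_hat = fit_adc(b_values, signals)
print(adc, s0_hat)
```

A mono-exponential fit that includes low b values mixes diffusion and perfusion contributions, which is one reason the choice of b values matters for interpretation.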

MRI texture analysis

In short, the pancreatic gland was segmented using the DWI b = 0 s/mm2 image and quantitative texture features were extracted for each subject. The entire data set was divided into subgroups (classifiers). The classifiers were used to select a subset of features which in combination provided the best classification performance in terms of accuracy, sensitivity, specificity, and positive predictive value (PPV) between the subgroups. An overview of the texture analysis is shown in Fig. 1 and details are provided in the following.
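The four performance metrics named above follow directly from the counts of a two-class confusion matrix. The sketch below shows their standard definitions; the function name and the example counts are hypothetical and are not taken from the study data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard two-class metrics from confusion-matrix counts.

    tp/fn: positive-class subjects classified correctly/incorrectly
    tn/fp: negative-class subjects classified correctly/incorrectly
    Assumes every denominator is non-zero.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
    }

# Hypothetical counts for illustration only
metrics = classification_metrics(tp=8, fp=1, tn=9, fn=2)
print(metrics)
```

Sensitivity and specificity are each computed within one class, so they do not depend on the class proportions, whereas accuracy and PPV do.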

Fig. 1

Overview of the texture analysis process. MRI magnetic resonance imaging, DWI diffusion weighted imaging

Extraction of pancreatic MRI texture features

A radiologist (MVL) with 12 years of experience manually delineated the pancreatic glands using the 3D Slicer 4.10.1 open-source software (Kitware Inc., Carrboro, NC, USA). The entire gland was segmented in all slices containing the pancreas using the DWI b = 0 s/mm2 image as a part of MultiVolume with an in-built software extension, see Fig. 2. The segmentation included the pancreatic ductal system, but excluded the common biliary duct and pseudocysts.

Fig. 2

Segmentation of the pancreas in representative patients with chronic pancreatitis. The axial diffusion weighted imaging (DWI, b = 0 s/mm2) images were used. In subjects a and b, the gland is segmented including small cystic changes in the head (arrows in a, b), as they could also be enlarged side-ducts, whereas in subject c larger pseudocysts are excluded (arrows in c)

Quantitative texture features were calculated using the SlicerRadiomics extension. In total, 851 texture features of the gland were extracted for each subject: Shape-based (3D) (excluding Compactness 1, Compactness 2, and Spherical Disproportion as they correlate with Sphericity), First Order Statistics (excluding standard deviation as this feature is correlated with variance), Gray Level Co-occurrence Matrix, Gray Level Dependence Matrix, Gray Level Run Length Matrix, Gray Level Size Zone Matrix, and Neighboring Gray Tone Difference Matrix for the original image and after wavelet transformation in eight directions (LLL, LLH, LHL, LHH, HLL, HLH, HHL, and HHH) [25,26,27].
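The total of 851 features can be reproduced with simple arithmetic, as sketched below. The per-class feature counts are an assumption based on the default feature sets of the PyRadiomics library underlying SlicerRadiomics; they are not stated in this paper. Shape features are assumed to be computed once from the segmentation mask, while the intensity and texture classes are computed on the original image and on each of the eight wavelet decompositions.

```python
# Assumed number of enabled features per class (PyRadiomics defaults, not from the paper)
per_image_classes = {
    "first_order": 18,   # standard deviation excluded
    "glcm": 24,          # Gray Level Co-occurrence Matrix
    "gldm": 14,          # Gray Level Dependence Matrix
    "glrlm": 16,         # Gray Level Run Length Matrix
    "glszm": 16,         # Gray Level Size Zone Matrix
    "ngtdm": 5,          # Neighboring Gray Tone Difference Matrix
}
shape_features = 14      # 3D shape, three correlated features excluded

wavelets = ["LLL", "LLH", "LHL", "LHH", "HLL", "HLH", "HHL", "HHH"]
n_image_types = 1 + len(wavelets)            # original image plus eight wavelet images

features_per_image = sum(per_image_classes.values())
total_features = shape_features + n_image_types * features_per_image
print(features_per_image, total_features)
```

Under these assumptions each image type contributes 93 features, and 14 + 9 × 93 gives the reported total.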

Subgroups for classification

The data set was divided into subgroups characterizing presence of disease, etiological risk factors, and complications to CP. Thus, five different comparison groups were generated and used as classifiers for the texture analyses: (1) disease classifier: CP vs. healthy controls, (2) alcohol classifier: alcoholic vs. nonalcoholic etiology of CP, (3) tobacco classifier: ongoing tobacco use vs. no tobacco, (4) diabetes classifier: diabetes vs. no diabetes, and (5) exocrine pancreatic function classifier: PEI vs. normal exocrine pancreatic function. For each classifier, the subjects were partitioned into the two groups.

Feature selection process

The total number of texture features extracted for each subject was 851. To find the most significant texture features, a Bayes classifier was trained using a tenfold cross-validation forward selection procedure in Matlab R2016b (MathWorks, Natick, MA, USA) [28, 29]. This was repeated for each classifier. The forward selection procedure was used to reduce the initial texture feature set to the most compact and discriminative feature set. Subjects were divided semi-randomly into ten folds by ensuring that each m-fold contained at least one patient from each class. In each m-fold, the features from the training samples were used to train the Bayes classifier, and the test samples were classified by obtaining posterior probabilities for each test sample using \(P\left( {\omega_{i} |x} \right) = \frac{p\left( {x|\omega_{i} } \right)P\left( {\omega_{i} } \right)}{\sum_{j = 1}^{n} p\left( {x|\omega_{j} } \right)P\left( {\omega_{j} } \right)}\), where \(P\left( {\omega_{i} |x} \right)\) is the posterior probability for each class \(\left( {\omega_{i} } \right)\), \(P\left( {\omega_{i} } \right)\) is the prior probability for each class, \(p\left( {x|\omega_{i} } \right)\) is a class conditional probability density function, and \(n\) is the number of classes. During the forward selection process, \(p\left( {x|\omega_{i} } \right)\) for each class was modeled with a multivariate Gaussian distribution from the feature values of the training samples from each class. Each classifier was trained using equal \(P\left( {\omega_{i} } \right)\) for each class [29, 30].
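The posterior computation described above can be sketched as follows: a multivariate Gaussian is fitted to the training samples of each class, and the posterior of a test sample is obtained from Bayes' rule with equal priors. This is a minimal sketch under stated assumptions, not the Matlab code used in the study; the function names and the synthetic two-feature data are hypothetical.

```python
import numpy as np

def fit_gaussian(X):
    """Mean vector and covariance matrix of the training samples of one class."""
    X = np.asarray(X, dtype=float)
    return X.mean(axis=0), np.atleast_2d(np.cov(X, rowvar=False))

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density p(x | class) evaluated at one sample x."""
    d = mean.size
    diff = x - mean
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm

def posteriors(x, class_params, priors):
    """Bayes' rule: likelihood times prior, normalized over all classes."""
    likelihoods = np.array([gaussian_pdf(x, m, c) for m, c in class_params])
    joint = likelihoods * priors
    return joint / joint.sum()

# Hypothetical training data: two classes, two features each
rng = np.random.default_rng(0)
class_a = rng.normal([0.0, 0.0], 1.0, size=(50, 2))
class_b = rng.normal([3.0, 3.0], 1.0, size=(50, 2))

params = [fit_gaussian(class_a), fit_gaussian(class_b)]
priors = np.array([0.5, 0.5])                  # equal priors for each class
post = posteriors(np.array([0.2, -0.1]), params, priors)
print(post)
```

The test sample is assigned to the class with the largest posterior; with equal priors this is the class with the largest class conditional density.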

The forward selection procedure was initiated using an empty feature set. In each iteration, the classification performance was evaluated by combining the features in the selected feature set with each of the non-selected features remaining from the initial feature set. The feature that improved classification performance the most was then added to the selected feature set for the classifier. To evaluate the performance of each feature combination in each iteration, the classifier was trained by obtaining a multivariate normal distribution for each \(p\left( {x|\omega_{i} } \right)\) from the feature values for the selected features combined with each of the remaining features for each m-fold. The trained classifier was then used to classify the test samples for each m-fold. The average classification performance across the m-folds was obtained for each feature, and the feature with the best performance was added to the feature set. The average classification performance was obtained by calculating the average of the following performance metrics across the m-folds: accuracy, sensitivity, specificity, and PPV. The performance metrics were selected to obtain the most accurate classifier with a balanced performance between sensitivity and specificity. PPV was selected to ensure that the classification performance was not affected by skewed class distributions. To prevent overfitting of the classifiers, the feature selection procedure was stopped when the improvement in average classification performance between iterations was less than 0.005.
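The greedy procedure described above can be sketched in a few functions. This is an illustrative approximation under several assumptions: the four metrics are combined by a plain mean within each fold, a small ridge term is added to the covariance for numerical stability, an undefined metric is set to zero, and the data are synthetic. All names are hypothetical and none of this is the study's Matlab implementation.

```python
import numpy as np

def stratified_folds(y, k, rng):
    """Semi-random fold labels: each class is spread over all k folds."""
    folds = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        folds[idx] = np.arange(len(idx)) % k
    return folds

def predict(X_train, y_train, X_test, ridge=1e-6):
    """Gaussian Bayes classifier with equal priors; returns 0/1 labels."""
    scores = []
    for label in (0, 1):
        Xc = X_train[y_train == label]
        cov = np.atleast_2d(np.cov(Xc, rowvar=False)) + ridge * np.eye(Xc.shape[1])
        diff = X_test - Xc.mean(axis=0)
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        scores.append(-0.5 * (maha + np.linalg.slogdet(cov)[1]))  # log density + const
    return (scores[1] > scores[0]).astype(int)

def safe_div(a, b):
    return a / b if b else 0.0   # assumption: undefined metric counts as 0

def cv_score(X, y, folds):
    """Mean over folds of the average of accuracy, sensitivity, specificity, PPV."""
    fold_scores = []
    for f in np.unique(folds):
        test = folds == f
        pred, truth = predict(X[~test], y[~test], X[test]), y[test]
        tp = np.sum((pred == 1) & (truth == 1)); fp = np.sum((pred == 1) & (truth == 0))
        tn = np.sum((pred == 0) & (truth == 0)); fn = np.sum((pred == 0) & (truth == 1))
        fold_scores.append(np.mean([safe_div(tp + tn, len(truth)), safe_div(tp, tp + fn),
                                    safe_div(tn, tn + fp), safe_div(tp, tp + fp)]))
    return float(np.mean(fold_scores))

def forward_selection(X, y, folds, min_gain=0.005):
    """Add the best remaining feature until the gain drops below min_gain."""
    selected, best = [], 0.0
    while len(selected) < X.shape[1]:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        scores = [cv_score(X[:, selected + [j]], y, folds) for j in candidates]
        top = int(np.argmax(scores))
        if scores[top] - best < min_gain:
            break
        selected.append(candidates[top])
        best = scores[top]
    return selected, best

# Synthetic data: 60 subjects, 6 features, only feature 2 carries class information
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 6))
X[:, 2] += 3.0 * y
folds = stratified_folds(y, 10, rng)
selected, score = forward_selection(X, y, folds)
print(selected, round(score, 3))
```

On this synthetic example the informative feature is chosen first, and further features are added only if they raise the cross-validated score by at least 0.005.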

Statistical analysis

Data are presented as numbers (%) for categorical variables and means (standard deviations [SD]) for continuous variables. QQ plots were used to determine if assessment parameters were normally or non-normally distributed and hence to guide the selection of appropriate statistical analysis. A Student's t test for continuous variables and Fisher's exact test for categorical variables were used to compare demographic variables for CP patients and healthy controls. Since CP disease duration was not normally distributed, the Wilcoxon rank-sum test was used to compare CP disease duration between the subgroups of classifiers. The software package STATA version 15.1 (StataCorp LP, College Station, Texas, USA) was used for all statistical analyses, and P < 0.05 was considered statistically significant.

Results

Extraction of texture features was completed in 77 CP patients and in all 22 healthy controls, since segmentation was not possible in 5 patients due to magnetic susceptibility artifacts in two patients and major atrophy of the parenchyma (parenchymal thickness less than 1–2 mm) in three patients. Demographic and clinical characteristics of the 77 CP patients and 22 healthy controls included in the final texture analysis are given in Table 1. There was no difference in gender distributions (P = 0.32) between the two groups, but patients were older (P = 0.019) and had lower body mass index (P = 0.026) compared to the healthy controls.

Table 1 Demographical characteristics for patients with chronic pancreatitis (CP) and healthy controls (HC)

Disease and patient subgroup classifiers

The disease classifier included all CP patients (n = 77) and the healthy controls (n = 22). The alcohol classifier divided patients into alcoholic etiology of CP (n = 46) and nonalcoholic etiology of CP (n = 31). The tobacco classifier divided patients into ongoing smokers (n = 51) and non-smokers (n = 26). The diabetes classifier divided patients into patients with presumed pancreatogenic diabetes (n = 21) and non-diabetic patients (n = 50); 6 patients with type 1 or type 2 diabetes diagnosed before the diagnosis of CP were excluded. The exocrine pancreatic function classifier divided patients into a PEI group (n = 54) and an exocrine sufficient group (n = 19); 4 subjects were excluded due to missing fecal elastase data. The CP disease duration was comparable between the subgroups of the alcohol classifier (P = 0.51), the tobacco classifier (P = 0.69), and the exocrine pancreatic function classifier (P = 0.08). However, patients classified with diabetes had longer CP disease duration compared to patients without diabetes (diabetes: median 12 (range 2–32) years, no diabetes: median 7 (range 2–31) years, P = 0.02).

Selected MRI texture features for the disease and patient subgroup classifiers

The number of selected features and average m-fold performance metrics obtained for each classifier are reported in Table 2. The accuracy of each classifier is also illustrated in Fig. 3.

Table 2 The number of selected features and average m-fold performance metrics obtained for each classifier
Fig. 3

Best average m-fold accuracy of each classifier: (1) disease (chronic pancreatitis (CP) vs. healthy controls (HC)), (2) alcoholic etiology (alcoholic vs. nonalcoholic etiology of CP), (3) tobacco use (ongoing tobacco use vs. no tobacco use), (4) diabetes (presumed pancreatogenic diabetes vs. no diabetes), and (5) pancreatic exocrine insufficiency (PEI) vs. normal exocrine pancreatic function

Disease classifier

The best classification performance was obtained for the disease classifier (CP vs. healthy controls) by selecting five features from the original feature set of 851 features, having an accuracy of 0.98, sensitivity of 0.97, specificity of 1.00, and PPV of 1.00.

Etiological risk factor classifiers

The classifiers based on alcohol (alcoholic vs. nonalcoholic etiology) and tobacco (use of tobacco vs. no tobacco use) both obtained good performance using 9 and 10 features of the original feature set, respectively. The average accuracy and sensitivity were slightly lower for tobacco (accuracy 0.86, sensitivity 0.86, specificity 0.85, and PPV 0.92) compared to alcohol (accuracy 0.88, sensitivity 0.91, specificity 0.84, and PPV 0.89), while the specificity and PPV were similar between the two classifiers. Figure 4 illustrates two subjects with alcoholic etiology of CP with the feature values furthest from the nonalcoholic group, and two subjects with nonalcoholic etiology of CP with the most normal feature values.

Fig. 4

Illustration of two subjects with alcoholic etiology of chronic pancreatitis (CP) (a, b) with the texture feature values furthest from the nonalcoholic etiology group, and two subjects with nonalcoholic etiology of CP (c, d) with the most normal feature values. Anatomical T2 weighted images and diffusion weighted imaging (DWI, b = 0 s/mm2) with segmentation of the pancreas are illustrated. The two subjects with alcoholic etiology of CP have visually lower average signal intensity compared to the two subjects without alcoholic etiology

Complication classifiers

The two complication classifiers for diabetes (diabetes vs. no diabetes) and exocrine pancreatic function (PEI vs. normal exocrine pancreatic function) obtained similar accuracies, with 4 and 3 features selected, respectively. Diabetes had an accuracy of 0.83, sensitivity of 0.71, specificity of 0.88, and PPV of 0.71, while exocrine pancreatic function had an accuracy of 0.82, sensitivity of 0.81, specificity of 0.84, and PPV of 0.94, so performance was generally lower compared to the two risk factor classifiers. The diabetes classifier obtained the lowest sensitivity and PPV of all the classifiers, but the second highest specificity. The exocrine pancreatic function classifier also had a lower sensitivity compared to the risk factor classifiers, but still obtained a similar specificity and a high PPV.

Detailed information of the features selected and their corresponding mean feature values for each class in each classifier are reported in Table 3.

Table 3 Selected features for each classifier and corresponding mean feature values for each class, presented as the true positive (TP) class or the true negative (TN) class mean values

Discussion

We demonstrated the feasibility and validity of pancreatic MRI texture analysis (using DWI) in patients with CP, and reported good classification accuracies between patients and healthy controls, as well as for etiological risk factors and complications associated with CP. The method is based on analysis of the entire pancreatic gland including all the different types of morphological changes seen in CP. This may bypass the sampling error problems seen with conventional radiological assessment methods based on measurements in small pre-specified regions of interest. Altogether, MRI-based texture analysis may be a promising tool for extraction of additional information from MRI of the pancreas in patients with CP, which in future studies may be used for detection of early-stage CP and provide useful information on disease progression.

Feasibility and discriminative accuracy of the method

MRI texture analysis of the pancreas using DWI was feasible in most (77 of 82) patients with CP, even though the analysis process is cumbersome and time-consuming. The 3D Slicer software used is open-source and intended for research (without FDA approval), and it is suitable for clinical use as no special training in programming is needed to segment and extract features. So far, texture analysis will mainly be a research tool, but it could be integrated in more advanced clinical analysis packages. In this study, a Bayes classifier was used in the feature selection procedure, as this classifier is a popular and optimal classifier which minimizes the average probability of error [30]. However, other feature selection techniques could be investigated in future studies to optimize classification accuracies.

The relevance of the selected features for each classifier reported in Table 3 has to be discussed, since it is not enough only to have high values of the performance metrics if the selected features are not meaningful or are conflicting.

For the disease classifier (CP vs. healthy controls), the selected features had very good classification accuracies and were observed to generally describe the coarseness (reflecting roughness and graininess) of the pancreatic texture patterns between the two classes. From the mean feature values in the two classes, it can be seen that CP patients had higher feature values in long run emphasis, 10th percentile intensity value, zone entropy, and minimum intensity, indicating that the CP patients generally had a coarser texture and a brighter intensity level compared to the healthy controls. These findings are plausible, since more fibrosis, calcifications, and ductal changes will overall give a more inhomogeneous tissue structure when segmenting the entire pancreas.
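To make the link between long run emphasis and coarseness concrete, the sketch below computes long run emphasis as the mean squared run length of consecutive equal gray levels along image rows. It is a simplified, single-direction 2D illustration under stated assumptions; the function names and toy images are hypothetical, and radiomics software typically aggregates several directions in 3D.

```python
import numpy as np

def horizontal_run_lengths(image):
    """Lengths of all runs of equal gray level along each row."""
    runs = []
    for row in np.asarray(image):
        length = 1
        for prev, cur in zip(row[:-1], row[1:]):
            if cur == prev:
                length += 1          # run continues
            else:
                runs.append(length)  # run ends, start a new one
                length = 1
        runs.append(length)          # close the last run of the row
    return np.array(runs)

def long_run_emphasis(image):
    """Mean of squared run lengths; larger values indicate coarser texture."""
    runs = horizontal_run_lengths(image).astype(float)
    return float(np.mean(runs ** 2))

# Hypothetical toy images with two gray levels
fine = np.indices((4, 8)).sum(axis=0) % 2                # checkerboard, runs of length 1
coarse = np.repeat(np.array([[0, 1]] * 4), 4, axis=1)    # rows of 0,0,0,0,1,1,1,1

lre_fine = long_run_emphasis(fine)
lre_coarse = long_run_emphasis(coarse)
print(lre_fine, lre_coarse)
```

Because run lengths are squared, a few long runs dominate the value, which is why a higher long run emphasis is read as a coarser texture.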

For the alcohol classifier (alcoholic vs. nonalcoholic etiology of CP), the selected features had good classification accuracies. The features mainly characterize fine textures (e.g., zone percentage), intensity variation (e.g., maximum correlation coefficient and total energy), and the intensity level of the pancreas tissue (e.g., mean intensity and long run low gray level emphasis) between the classes. Based on the mean feature values for each class, it seems that the pancreas tissue in the alcoholic etiology class was characterized by more fine texture patterns, a lower average intensity (as seen in Fig. 4), and more homogeneous appearance compared to the pancreas tissue in the nonalcoholic etiology class. This could reflect that alcohol and alcoholic etiology of CP is associated with different degrees of histopathological findings and calcifications.

For the tobacco classifier (use of tobacco vs. no tobacco), the selected features also had good classification accuracies and were observed to mainly characterize intensity variation (joint energy and busyness), intensity range (maximum and minimum intensity), and coarse texture patterns (large area low gray level emphasis, maximum correlation coefficient, and long run emphasis). Based on the mean feature values, it appears that the pancreas tissue in the smokers was characterized by coarser texture patterns and a larger variance in the intensity values compared to the non-tobacco class, which was characterized by a more homogeneous intensity. This likely reflects that the tissue changes associated with smoking differ from those of other etiologies and may be related to an increased prevalence of pancreatic calcifications seen in patients with a smoking history [31, 32].

For the diabetes classifier (presumed pancreatogenic diabetes vs. no diabetes), the selected features had acceptable classification accuracies and mainly describe intensity homogeneity (inverse difference normalized, variance, and sum of squares) and general texture intensity level (long run low gray level emphasis) of the pancreas tissue. Based on the mean feature values, it appears that the pancreas tissue of the diabetic patients had a more inhomogeneous intensity distribution and a darker appearance compared to non-diabetic pancreas tissue. These findings may be explained by the fact that patients with diabetes typically have a longer disease duration, as also found in our study, with replacement of the normal tissue by fibrosis as well as calcifications [33]. Also, patients with a history of alcohol use and smoking typically have an increased risk of developing diabetes.

For the exocrine pancreatic function classifier (PEI vs. normal exocrine pancreatic function), the selected features had acceptable classification accuracies and describe the local intensity correlation, intensity range, and the concentration of pixels with high intensity levels (large dependence high gray level emphasis) between the two groups. Based on the mean feature values, it appeared that the pancreas tissue in patients with PEI is more homogeneous since pixels are more correlated. Additionally, the pancreas tissue tended to have a darker appearance for this group of exocrine insufficiency patients based on the intensity range and due to the low mean value in large dependence high gray level emphasis. The findings are plausible and could reflect that changes in exocrine function of the pancreas are associated with different degrees of histopathological findings.

Potential use of the method

Regarding the potential future use of MRI texture analysis in CP, the method could be a relevant research tool for improved understanding of CP. More detailed information on pancreatic tissue structure and texture properties could potentially add to the already valuable characteristics obtained by conventional MRI analysis. This will support the ongoing research strategy focusing on identifying biomarkers of early-stage CP and markers of disease progression [1]. This will call for future longitudinal follow-up studies to truly identify reliable biomarkers useful in the detection of early CP. Other types of images (CT, MRI with T1 mapping, DWI with other b values, ADC maps, etc.) should also be evaluated, including different methods of texture analysis and radiomics approaches.

Limitations

Several study limitations should be addressed. In this study, we based the texture analysis on DWI (the b = 0 s/mm2 images), since we know from previous studies that diffusion (ADC) relates to the histological degree of fibrosis [15]. Analysis of the DWI b = 0 s/mm2 image could be better than the ADC map, due to higher signal-to-noise ratio and less image distortion. Also, the analysis could have been based on other b values. It is well known that high b values mainly reflect the diffusion fraction of the image, while low b values reflect a combination of the diffusion and perfusion properties [34]. Hence, the findings in our study likely reflect that properties of the image are related to both tissue diffusion and perfusion, both of which are known to be affected in CP [35]. Indeed, as a potential tool in research and clinical studies, future studies should aim at identifying which images (other b values, etc.) are the best and most accurate for characterizing the gland. Also, other feature selection techniques could be investigated in future studies to optimize classification accuracies, including investigations of how the sensitivity and specificity vary depending on the number of included features. A major limitation, as also seen for most other pancreatic imaging studies, is the lack of biopsies for histological verification. Also, only a few patients had a CT examination in relation to the MRI study, making it impossible to explore the exact role of calcifications on selected texture features. Another potential bias is that our method included segmentation of the entire pancreatic gland, including all types of pathological tissue changes, meaning that interpretation of the features selected for the different classifiers becomes more challenging. As seen in the sections above, this makes it very difficult to establish specific links between alterations of certain texture features and specific pathological tissue changes.
On the other hand, this could also be a strength of the method, since it includes assessment of the entire gland in an objective way. This can only be verified by conducting validation studies of this texture analysis technique including larger sample sizes, longitudinal studies, and multiple institutions.

Conclusion

In conclusion, the present pancreatic MRI (DWI) texture analysis approach proved feasible in patients with CP. Good classification accuracy was found between CP patients and healthy controls, and high classification accuracies were found for different etiological risk factors and clinically relevant complications of CP. The method may prove to be an additional useful tool for morphological characterization of the pancreas, and could be a promising tool for characterization of early-stage CP and monitoring of disease progression.