Introduction

Diseases of the pancreas are complex, being affected by a wide array of genetic, environmental, and behavioural factors, and often lie on a continuum. Acute pancreatitis (AP) is the most common disease of the exocrine pancreas, with a global incidence of 33.7 per 100,000 individuals per year [1]. One-fifth of individuals develop recurrent acute pancreatitis (RAP) after a first episode of AP, and 36% of those with RAP progress to chronic pancreatitis (CP) [2]. Pancreatic cancer is the most lethal disease of the pancreas, with a global incidence and mortality of 8.1 and 6.9 per 100,000 general population per year, respectively [1]. Its common risk factors include familial pancreatic cancer kindred and deleterious germline mutations in pancreatic cancer susceptibility genes [3]. In addition, several focal pancreatic lesions (pancreatic intraepithelial neoplasms-grade 3 (PanIN-3), intraductal papillary mucinous neoplasms, and mucinous cystic neoplasms) are considered precancerous [4]. Both pancreatitis and pancreatic cancer often lead to new-onset diabetes, termed ‘diabetes of the exocrine pancreas’, which is the second most common type of new-onset diabetes in adults [5].

Imaging modalities (such as computed tomography (CT), magnetic resonance imaging (MRI), endoscopic ultrasonography, and positron emission tomography) are frequently used in the management of diseases of the pancreas [6, 7]. Traditionally, their use has predominantly involved subjective assessment of a handful of generic qualitative features that describe the underlying pathology of the pancreas. However, images of the pancreas contain a vast amount of objective, patient-specific data that could be harnessed to provide personalised management of patients [8, 9]. The field of quantitative image analysis has evolved in recent years, and automatically extracted features can now be analysed. The process of high-throughput extraction of features from radiological images has been termed ‘radiomics’ [10]. Organ-specific radiomics promises to be a cornerstone of personalised medicine in the future. The use of radiomics in lung, liver, prostate, breast, kidney, rectum, and central nervous system diseases has been reviewed [11,12,13,14,15,16,17]. However, to date, there has been no systematic review on the use of radiomics in diseases of the pancreas.

The aim of the present study was to systematically benchmark published studies on radiomics of the pancreas and to determine their quality as well as the factors associated with it.

Methods

Search strategy

The search strategy was developed in consultation with an experienced subject librarian to identify all relevant studies that reported on the use of radiomics of the pancreas in humans. A systematic literature search of the MEDLINE database was conducted to identify all studies published from January 1, 2000 to April 15, 2020. No language restrictions were applied. The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines were followed. Initial screening was done by reviewing titles and abstracts. Full-text articles of potentially relevant studies were retrieved and assessed for eligibility. Relevant articles were also identified through the reference lists of the retrieved full-text articles.

Eligibility criteria

Eligible studies had to investigate the applications of image analysis in pancreatic benign or precancerous lesions, pancreatic cancer, pancreatitis, or diabetes mellitus through extracting quantitative imaging features (i.e. radiomics). All imaging modalities were eligible. Studies were excluded if they were not conducted in humans; reported qualitative imaging features alone; used machine learning (e.g. convolutional neural network) to recognise image patterns without extracting quantitative features; focused on technical (e.g. image pre-processing or image acquisition parameters) or patient-related parameters and their effect on the stability and reproducibility of extracted features; or focused on a complication of pancreatic surgery (e.g. pancreatic fistula). Publications other than original articles (e.g. reviews, book chapters, editorials) were not considered.

Data extraction

The following data were extracted from the included studies, if available: authors, year of publication, country, cohort size, goal(s) of the study, type of imaging modality, parameters of imaging (e.g. slice thickness, imaging phase, MRI sequence), method of segmentation, type of feature extraction software, type of extracted features (i.e. quantitative radiomics only or semantic), type and number of extracted quantitative features, number of statistically significant quantitative features, and method of feature reduction and classification.

The included studies were grouped into three main categories based on the main goal of each study. The first category was differential diagnosis of diseases of the pancreas (e.g. differentiation between healthy pancreas and chronic pancreatitis or pancreatic cancer). The second category was classification of diseases of the pancreas, where more than two subtypes of the same disease of the pancreas were studied (e.g. classification of subtypes of pancreatitis or classification of histologic grades of pancreatic cancer). The third category was prediction of diseases of the pancreas (e.g. prediction of survival in patients with unresectable pancreatic cancer or prediction of patient response to a certain treatment). Within each category, all radiomics features were categorised into three main groups (at least one significant feature reported, no significant feature reported, and non-investigated feature) with the view to determining a clinically useful pattern.

Radiomics quality score

The radiomics quality score (RQS) was calculated for each individual study. In brief, the RQS assesses the quality of a radiomics study in terms of robustness and reproducibility by assigning points based on 16 criteria [18]. The number of points depends on the importance of the respective criterion, with 36 points (100%) being the maximum.
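As a small illustration of the scoring arithmetic, the sketch below converts a raw RQS point total into the percentage scale used in this review. Only the 36-point maximum comes from the text; the function name and the example point total are hypothetical.

```python
RQS_MAX_POINTS = 36  # maximum RQS stated in the text (36 points = 100%)


def rqs_percent(points: float) -> float:
    """Express a raw RQS point total as a percentage of the 36-point maximum."""
    if points > RQS_MAX_POINTS:
        raise ValueError("RQS cannot exceed the 36-point maximum")
    return 100.0 * points / RQS_MAX_POINTS


# Hypothetical study that earned 9 of the 36 available points
example_percent = rqs_percent(9)
print(f"{example_percent:.1f}%")
```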

Statistical analysis

The associations of the RQS with the number of extracted features and the cohort size were investigated using the Pearson correlation coefficient. The association between the RQS and the type of imaging modality was investigated using linear regression analysis (with CT set as the reference). Statistical analysis was performed using SPSS software (version 24). A p value of < 0.05 was considered statistically significant.
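The two analyses described above can be sketched in a few lines of Python. This is an illustration only: the review itself used SPSS, the data below are hypothetical toy values, and the function names are invented for this example. The sketch relies on the fact that, in a linear regression with a single dummy-coded categorical predictor, each coefficient equals the mean of that group minus the mean of the reference group. It does not compute p values.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def modality_effects(scores, modalities, reference="CT"):
    """Regression coefficients of score on dummy-coded modality.

    With one categorical predictor, the least-squares coefficient for a
    modality is its group mean minus the mean of the reference group.
    """
    groups = {}
    for score, modality in zip(scores, modalities):
        groups.setdefault(modality, []).append(score)
    ref_mean = mean(groups[reference])
    return {m: mean(v) - ref_mean for m, v in groups.items() if m != reference}


# Hypothetical toy data (not taken from the included studies)
rqs = [10, 12, 8, 14, 9, 11, 4]
n_features = [50, 200, 20, 400, 30, 150, 10]
modality = ["CT", "CT", "CT", "MRI", "MRI", "PET", "EUS"]

print(round(pearson_r(rqs, n_features), 3))
print(modality_effects(rqs, modality))
```

A positive coefficient means a higher average score than in the CT group, and a negative coefficient a lower one.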

Results

Characteristics of the included studies

The total number of retrieved publications was 120 (Fig. 1). Seventy-two studies met the eligibility criteria and were included in the systematic literature review [19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90]. These studies encompassed a total of 8863 individuals. The sample size varied between 17 and 690 individuals, with a median of 100 individuals. Fifty-four studies (75.0%) employed CT [23, 24, 26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43, 45,46,47,48,49, 51, 52, 55,56,57, 59, 61,62,63,64,65, 68,69,70,71, 74,75,76,77, 80,81,82,83, 85,86,87,88,89,90]; nine studies (12.5%), MRI [44, 50, 53, 54, 58, 60, 66, 67, 78]; three studies (4.2%), endoscopic ultrasound [19,20,21]; and six studies (8.3%), positron emission tomography [22, 25, 72, 73, 79, 84]. Forty-four studies (61.1%) were conducted in Asia [19,20,21, 33, 35,36,37,38, 41,42,43,44, 48,49,50,51,52, 56,57,58,59,60,61, 63, 64, 66,67,68,69,70,71,72,73,74, 76, 77, 79, 80, 83, 84, 87,88,89,90], 20 studies (27.8%) in North America [22,23,24,25,26,27,28,29,30,31,32, 39, 40, 45, 55, 62, 65, 75, 82, 85], and eight studies (11.1%) in Europe [34, 46, 47, 53, 54, 78, 81, 86]. Other details are presented in Table 1.

Fig. 1 Flowchart of the study selection process

Table 1 Characteristics of the included studies

Sixty-six studies (91.7%) applied manual segmentation of the pancreas, whereas six studies used semi-automated segmentation [24, 25, 29, 46, 56, 84]. Twenty-seven studies (37.5%) extracted only quantitative radiomics features, whereas 45 studies combined radiomics and semantic features. The semantic features in the 45 studies were clinical features (e.g. age, gender, body mass index) (n = 45), histopathological features (e.g. tumour grades, mitotic index) (n = 15), blood biomarkers (e.g. cancer antigen 19-9, carcinoembryonic antigen) (n = 14), and genetic signatures (e.g. HMGA2 and c-Myc genes, miRNA genomic classifier) (n = 2). Out of the 45 studies, 20 studies reported that the performance of the combined model (i.e. significant radiomics features plus semantic features) was higher than the performance of the radiomics model alone. The superiority of the combined model was confirmed statistically in eight studies [22, 24, 31, 46, 48, 51, 59, 88].

Various approaches to dimensionality reduction were applied in the included studies in order to select the most useful radiomics features and reduce the effect of overfitting. These approaches included univariate filter technique (n = 17), multivariate filter technique (n = 20), least absolute shrinkage and selection operator regression (n = 18), as well as principal component analysis (n = 2). The useful features were used as an input for training and validating classification models. Out of the 72 included studies, 36 studies applied supervised machine learning techniques (including random forest in nine studies); 14 studies, support vector machine; and 22 studies, logistic regression. The median RQS of the included studies was 28% (interquartile range 22–36%). The three most frequently observed RQS characteristics were discrimination statistics and applying resampling techniques, employing a well-documented imaging protocol, and clinical usefulness of the model (Fig. 2). The three least frequently observed RQS characteristics were prospective study design, imaging at different time points, and comparing the radiomics model with the current gold standard method (Fig. 2). Fourteen studies (19.4%) analysed feature robustness through detecting inter-scanner differences and vendor-dependent features [33, 38, 42, 48, 55, 58, 61,62,63, 69, 71, 73, 82, 86].
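To make the idea of a univariate filter concrete, the sketch below scores each feature separately by the absolute Welch t statistic between two outcome groups and keeps the top-ranked features. This is one possible instance of the technique, not the method of any particular included study; the feature names, values, and labels are hypothetical.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t statistic for the difference in means of two groups."""
    diff = mean(group_a) - mean(group_b)
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0:
        # No within-group spread: uninformative if means match, else perfect separation
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def univariate_filter(features, labels, k):
    """Rank features one at a time by |t| and return the names of the top k."""
    scores = {}
    for name, values in features.items():
        positives = [v for v, y in zip(values, labels) if y == 1]
        negatives = [v for v, y in zip(values, labels) if y == 0]
        scores[name] = abs(welch_t(positives, negatives))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical feature table: one list of values per feature, one label per patient
features = {
    "glcm_contrast": [1.0, 1.2, 0.9, 3.1, 3.0, 2.8],
    "mean_intensity": [40, 55, 47, 52, 44, 49],
    "constant": [5, 5, 5, 5, 5, 5],
}
labels = [0, 0, 0, 1, 1, 1]

print(univariate_filter(features, labels, k=2))
```

Because each feature is scored in isolation, a filter of this kind ignores redundancy between features, which is what the multivariate techniques listed above aim to address.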

Fig. 2 Radiomics quality score of the included studies

Applications of radiomics of the pancreas

The median number of extracted radiomics features in the included studies was 166 (interquartile range 14–416). Eighteen studies (25%) used in-house developed software [22, 24, 28,29,30, 33, 36, 38, 39, 45, 56, 61, 63, 69, 72, 80, 85, 90], whereas the remaining studies used open-source or commercial software. The types and number of extracted features in individual studies are presented in Table 2. The significant radiomics features in individual studies are presented in Tables 3, 4, and 5 (stratified by the primary goal of using radiomics). The main focus of radiomics in 56 studies (77.8%) was pancreatic cancer; four studies (5.6%), pancreatic precancerous lesions [23, 24, 32, 39]; six studies (8.3%), pancreatic benign lesions [29, 68,69,70, 87, 88]; four studies (5.6%), pancreatitis [42, 60, 78, 85]; and two studies (2.8%), diabetes mellitus [61, 80]. Nineteen studies (26.4%) primarily applied radiomics for differentiation between various diseases of the pancreas, 23 studies (31.9%) for classification of subtypes/histologic grades, and 30 studies (41.7%) for prediction of prognosis/treatment response (Fig. 3). Out of the 72 included studies, 28 studies extracted radiomics features and patterns with the use of different filters (including wavelet, square, square root, exponential, logarithm, gradient, Laws, local binary pattern, Laplacian of Gaussian, and fractal dimension filters).

Table 2 Radiomics features investigated in the included studies
Table 3 Radiomics features in the studies focused on differential diagnosis of diseases of the pancreas
Table 4 Radiomics features in the studies focused on classification of diseases of the pancreas
Table 5 Radiomics features in the studies focused on prediction of diseases of the pancreas
Fig. 3 Applications of radiomics of the pancreas. Only primary goals of individual studies are depicted. The complete list of goals of the individual studies is presented in Table 1. Abbreviations: AIP, autoimmune pancreatitis; IPAS, intrapancreatic accessory spleen; IPMN, intraductal papillary mucinous neoplasm; MCN, mucinous cystic neoplasm; MFP, mass forming pancreatitis; PCN, pancreatic cystic neoplasm; PDAC, pancreatic ductal adenocarcinoma; PL, pancreatic lymphoma; PNET, pancreatic neuroendocrine tumour; SCN, serous cystic neoplasm; SPN, solid pseudopapillary neoplasm

Factors that affect radiomics quality score

Supplementary Table 1 details the RQS of individual studies. Overall, the RQS was significantly correlated with the number of extracted features (r = 0.529, p < 0.001) as well as with the cohort size (r = 0.343, p = 0.003). In turn, the number of extracted features and the cohort size were significantly correlated with the number of statistically significant features (r = 0.437, p < 0.001; and r = 0.437, p < 0.001, respectively). With CT as the reference, radiomics features extracted from MRI images were associated with an RQS higher by 1 point (p = 0.732); those extracted from positron emission tomography images, with an RQS higher by 2 points (p = 0.570); and those extracted from endoscopic ultrasonography images, with an RQS lower by 10 points (p = 0.102). Likewise, with CT as the reference, radiomics features extracted from MRI images yielded 5 more statistically significant features (p = 0.134); those extracted from endoscopic ultrasonography images, 3 more statistically significant features (p = 0.495); and those extracted from positron emission tomography images, 1 fewer statistically significant feature (p = 0.680). None of these differences between imaging modalities reached statistical significance.

Discussion

This is the first systematic review to investigate the use of radiomics of the pancreas and the factors that affect quantitative imaging features of the pancreas. A total of 72 studies that enrolled more than eight thousand participants were included, with a median sample size of 100 participants. The median number of investigated radiomics features in the included studies was 166, ranging from 4 to 2041 features. These features could be grouped into five main categories: shape features, first-order texture features, second-order texture features, filtered image features, and customised features [91]. Filtered image features appeared to be the most frequently observed significant radiomics features in the studies that employed them for classification (Table 4) and prediction (Table 5) of diseases of the pancreas. However, only 39% of studies (28 out of 72) used these features, and more research is needed to confirm their usefulness in diseases of the pancreas. Future research also needs to determine the optimal filters, as the included studies used a total of 10 different filters (including wavelet, square, square root, exponential, logarithm, gradient, Laws, local binary pattern, Laplacian of Gaussian, and fractal dimension filters). Second-order texture features were used in 94% of studies (68 out of 72), and they appeared to be the most frequently observed significant radiomics features in the studies focused on differential diagnosis of diseases of the pancreas (Table 3). The superiority of this group of features is likely explained by the fact that they capture the spatial arrangement and distribution of intensities within the pancreas using different types of matrices (e.g. grey level co-occurrence matrix, grey level run length matrix). Making use of a large number of matrices may be required as the pancreas is a complex glandular organ of a relatively small size, located deep in the retroperitoneal space, and composed of different types of cells (i.e. acinar, ductal, endocrine), each with different functions [92]. The exocrine part constitutes around 95% of the pancreas, with the two main types of cells being acinar cells and ductal cells. The endocrine part (i.e. islets of Langerhans) constitutes less than 5% of the pancreas, with the five major types of cells being alpha cells, beta cells, delta cells, epsilon cells, and pancreatic polypeptide cells. In addition, the size of the pancreas may change during consumption of food [93,94,95]. These physical and physiological characteristics of the pancreas make radiomics investigation of the pancreas quite challenging and justify the extraction of a large number of radiomics features that could describe the pancreas comprehensively. Further, extracting a large number of features likely captures variability in the genetic, environmental, and behavioural factors that cause diseases of the pancreas, hence enabling characterisation of each patient individually and ultimately resulting in personalised management [1]. In the future, the performance of radiomics models may, in principle, be enhanced by considering certain semantic features (e.g. demographics, blood biomarkers, genomics). However, there is a long way to go because, out of the 45 studies that reported on combined models, only 8 studies (18%) demonstrated a statistically significant superiority of combining radiomics and semantic features.
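To illustrate how a second-order texture feature captures the spatial arrangement of intensities, the sketch below builds a grey level co-occurrence matrix for a tiny image and derives one feature (contrast) from it. It is a deliberately simplified, hypothetical example: it uses a single direction (each pixel paired with its right-hand neighbour), does not symmetrise the matrix, and uses a 4 × 4 image with four grey levels, whereas radiomics software packages typically aggregate over several directions and may define the feature differently.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often grey level i has grey level j at pixel offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """GLCM contrast: squared grey-level difference weighted by pair probability."""
    total = sum(sum(row) for row in matrix)
    return sum(
        (i - j) ** 2 * count / total
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )


# Hypothetical 4 x 4 image quantised to four grey levels (0 to 3)
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

cooccurrence = glcm(image, levels=4)
for row in cooccurrence:
    print(row)
print(contrast(cooccurrence))
```

Two images with identical intensity histograms can yield different co-occurrence matrices, which is why such features carry information that first-order features cannot.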

The other notable finding of the present systematic review was that the RQS had a significant positive correlation with the number of extracted features and the cohort size. However, the RQS across the included studies was rather low, with a median of 28%. The three most consistently reported criteria were reporting discrimination statistics, applying a well-documented imaging protocol, and studying the clinical utility of the extracted biomarker. By contrast, the three least frequently reported criteria were prospective study design, applying delta radiomics (i.e. extracting features at different imaging time points), and comparing the results with the gold standard. Worryingly, none of the included studies were prospective. Therefore, future radiomics studies of the pancreas should be conducted in a prospective fashion. The second least frequently observed criterion was the use of delta radiomics, where multiple images were obtained at different time points in order to test the reproducibility and stability of extracted radiomics features over a specific period of time. Out of the 72 included studies, only three studies met this criterion. Therefore, future studies on radiomics of the pancreas should extract and test radiomics features at multiple time points. The third common omission was the lack of comparison of radiomics findings with the gold standard. For example, the results of many studies on the use of radiomics of the pancreas to determine the prognosis of patients with pancreatic cancer were not compared with the well-established gold standards (tumour node metastasis (TNM) staging system and MD-Anderson pre-treatment classification) [96]. It is also worth noting that, while histology is the gold standard for diagnosing focal pancreatic lesions, only 15 included studies used it (although it may not be ethical to use it in patients with benign lesions). Further, all six radiomics studies on pancreatic benign lesions (such as serous cystadenoma) used CT only, which is considered suboptimal as accurate diagnosis of these benign lesions is quite challenging without the use of MRI (especially if lesions are small). Careful selection of the gold standard in future studies on radiomics of the pancreas is encouraged.
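As a minimal sketch of what delta radiomics involves computationally, the example below takes the same features extracted at two imaging time points and expresses each as a relative change from baseline. The function name, feature names, and values are hypothetical, and relative change is only one of several ways in which a change over time could be quantified.

```python
def delta_features(baseline, follow_up):
    """Relative change of each feature between two imaging time points."""
    deltas = {}
    for name, before in baseline.items():
        if name not in follow_up or before == 0:
            continue  # skip features that cannot be compared
        deltas[name] = (follow_up[name] - before) / before
    return deltas


# Hypothetical feature values for one patient at two time points
baseline = {"glcm_contrast": 2.0, "entropy": 4.0, "volume_ml": 0.0}
follow_up = {"glcm_contrast": 3.0, "entropy": 3.0, "volume_ml": 1.5}

print(delta_features(baseline, follow_up))
```

Features whose values change little between repeated scans of an unchanged pancreas can be regarded as more stable, whereas large changes after treatment may carry predictive information.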

There are several limitations that need to be acknowledged when interpreting the findings of the present review. First, there was heterogeneity between the included studies in terms of image acquisition protocols. For example, different phases of CT and MRI sequences were employed in the primary studies. This brings to the fore the need to standardise image acquisition protocols in future radiomics studies of the pancreas. Second, the included studies used a range of software packages that not infrequently offer different algorithms for defining the same radiomics features. This highlights the need to standardise the definitions of imaging features. The Image Biomarker Standardisation Initiative aspires to standardise the extraction of imaging biomarkers from acquired imaging for the purpose of high-throughput quantitative image analysis [97]. This initiative needs to be taken into account in research on radiomics of the pancreas. Third, the included studies disproportionately focused on focal pancreatic lesions. Only six studies investigated benign diseases that are characterised by diffuse changes of the pancreas (i.e. pancreatitis and diabetes mellitus), and high-quality radiomics studies in these diseases are now warranted [98]. Fourth, we designed the present systematic review to include only studies that applied handcrafted radiomics as the main method for extracting quantitative imaging features from radiological images. However, it is also possible to use machine learning to extract some features. For example, one study extracted 256 deep learning features from the first three layers of a convolutional neural network model, in addition to the radiomics features [57]. Last, building predictive models is one of the promising applications of radiomics in diseases of the pancreas. Thirty studies applied radiomics for building predictive models; however, none of them appeared to follow the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis) guidelines [99].

In conclusion, the present systematic review demonstrated that radiomics of the pancreas is emerging as a promising tool that could be used for personalised management of patients with diseases of the pancreas. To maximise the benefits of radiomics of the pancreas, future studies should ideally have a large sample size (more than 100 participants), use standardised software packages that offer a large number of radiomics features (especially second-order texture features and filtered image features), investigate radiomics in a prospective fashion, compare radiomics results with an appropriate gold standard, and apply delta radiomics.