Introduction

Pancreatic cancer (PC) is one of the most common cancers of the digestive tract and one of the most lethal malignant neoplasms worldwide [1]; pancreatic ductal adenocarcinoma (PDAC) is its most common type. Once diagnosed, the prognosis is poor, with a 5-year survival rate below 10% [2, 3]. Some pancreatic cystic lesions (PCLs) are well-known precursors of PDAC, with a prognosis that differs according to their characteristics.

PCLs are increasingly common incidental findings on abdominal imaging owing to population aging, the wider use of abdominal imaging tests, and improved image quality [4]. The reported prevalence of PCLs varies widely with the imaging method and among studies, ranging from 3% of all patients undergoing routine computed tomography (CT) to 13–20% when magnetic resonance imaging (MRI) is used [5, 6]. Autopsy studies, however, have found a much higher prevalence, revealing that up to 50% of the elderly population may have at least one pancreatic cyst.

Discrimination between different cyst types is difficult. Several studies have reported that there are still no clinically available methods to effectively differentiate PCLs among benign, premalignant, and malignant lesions. Cystic lesions with malignant potential include intraductal papillary mucinous neoplasms (IPMNs), mucinous cystic neoplasms (MCNs), and solid-pseudopapillary tumors. On the other hand, benign cysts such as serous cystic neoplasms (SCNs) rarely or never progress to cancer [7].

In clinical practice, the accuracy of discrimination between these cysts ranges from 43% [8] to 70% [9], the latter achieved by physicians with more than 10 years of experience in abdominal imaging. Other diagnostic techniques, such as endoscopic ultrasound (EUS), have shown a high sensitivity, ranging from 75 to 95%, although they are invasive and not always available [4]. Unfortunately, aside from the assessment of morphological changes on costly and inconvenient serial imaging tests, there are to date no reliable biomarkers to predict the progression of these cysts. Usually, a final diagnosis can only be made on follow-up examinations or after histopathological analysis of the lesion. Importantly, resection of these PCLs may dramatically reduce patients’ quality of life, as these surgeries carry a 50% chance of complications and a 5% chance of death, and around 60% of them have been shown to be unnecessary because the cyst was benign [10,11,12].

In this challenging scenario, artificial intelligence (AI) could help to improve and speed up the detection and classification of PCLs and PC in early stages [13]. Many publications on this topic have been released in recent years, most of them in an experimental offline setting and applying different methodologies. Through machine learning (ML), AI allows machines to analyze extremely large numbers of training images and find patterns from which an algorithm extracts specific clinical features. Based on the accumulated clinical features, machines can diagnose newly acquired clinical images. There are many different algorithms, which can be classified as supervised or unsupervised learning. Supervised learning infers an answer from labeled training data that come from a set of training examples [14]. In unsupervised learning, by contrast, the training data are not labeled. The ML algorithms most used in radiology are random forest (RF), support vector machine (SVM), and convolutional neural networks (CNNs), all of which fall into the supervised learning category. CNNs are among the most widely used systems for this purpose [14], as they have a great capacity to automate the analysis and process a large number of images. AI has been shown to be a promising tool to help radiologists detect neoplastic and pre-neoplastic lesions in the pancreas [15, 16]. The aim of this manuscript is to review the articles addressing the diagnostic capacity of AI-based algorithms processing CT or MR images for the detection of PC and PCLs.
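The distinction between supervised and unsupervised learning can be illustrated with a toy example. The following is a minimal Python sketch under stated assumptions: a single hypothetical numeric feature per lesion, made-up values and labels, and two deliberately simple algorithms (a nearest-centroid classifier and a two-cluster k-means) that stand in for the far more complex models used in the reviewed studies.

```python
def fit_centroids(features, labels):
    """Supervised: use the labels to compute one mean feature value per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Assign a new case to the class whose centroid is closest."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def two_means(features, iterations=10):
    """Unsupervised: group unlabeled 1-D values into two clusters (k-means, k=2)."""
    lo, hi = min(features), max(features)
    for _ in range(iterations):
        a = [x for x in features if abs(x - lo) <= abs(x - hi)]
        b = [x for x in features if abs(x - lo) > abs(x - hi)]
        if not a or not b:  # degenerate data: keep the last valid centers
            break
        lo, hi = sum(a) / len(a), sum(b) / len(b)
    return lo, hi


# Hypothetical feature values and labels, for illustration only.
features = [1.0, 1.2, 0.9, 3.1, 2.9, 3.3]
labels = ["benign", "benign", "benign", "mucinous", "mucinous", "mucinous"]

centroids = fit_centroids(features, labels)  # needs the labels
new_case = predict(centroids, 1.1)           # returns a named class
cluster_lo, cluster_hi = two_means(features) # never sees the labels
```

The supervised model returns a named class because it was shown labels during training, whereas the unsupervised model only finds two groups whose clinical meaning would have to be interpreted afterwards.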

Data sources and search strategy

We performed a comprehensive literature search using PubMed, EMBASE, and Scopus (from January 2010 to April 2021) to identify full articles evaluating the diagnostic accuracy of AI-based methods processing CT or MR images to detect PDAC or PCLs. Electronic searches were supplemented by manual searches of references of included studies.

We excluded studies addressing neuroendocrine lesions, pseudocysts, or lesions arising from non-pancreatic tissue. We also excluded studies aiming to classify histologic subtypes of PC, prognostic studies, and studies assessing only pancreatic segmentation. Two review authors (JA and JR-C) independently screened the titles and abstracts retrieved by the search against the inclusion criteria.

Artificial intelligence: design of studies

Over the last decade, many published studies have demonstrated the major step forward that AI applied to medical image analysis has represented in, for example, the detection and treatment of breast nodules or liver lesions [17, 18]. This technology has reached a level of maturity at which it can assist radiologists in a straightforward, robust, and trustworthy way in locating lesions and imaging biomarkers for several diseases at an early stage, with a direct impact on radiologists’ workload and patients’ quality of life.

We found 20 studies meeting our inclusion criteria. Of these, 7 had a multicenter design [19,20,21,22,23,24,25] and 13 came from a single center [20, 24,25,26,27,28,29,30,31,32,33,34,35].

The most used radiological technique was contrast CT, for the detection of both PC and PCLs. Two studies used ML-assisted techniques for MRI processing [26, 36]. Most of the AI-based systems used were CNNs. Two studies [21, 37] used a combination of an RF, which classifies a set of predefined features (e.g., demographic features), and a CNN, which analyzes the radiological features of the lesions. In 4 other studies, the models used to develop the system were RF [20, 24, 38], and the study by Shen et al. [30] compared the performance of several systems: SVM, RF, and artificial neural network (ANN). The reference standard in all detection studies was manual delineation of the neoplasia on radiological images by expert radiologists. The gold standard was histology obtained from surgery, except for 2 articles including autoimmune pancreatitis (AP), in which imaging and analytical data were also used to diagnose AP [20, 26]. Characteristics of included studies are detailed in Tables 1 and 2.

Table 1 Characteristics of studies addressing AI and PAC
Table 2 Characteristics of studies addressing AI and pancreatic cysts

Pancreatic ductal adenocarcinoma (PDAC) detection

Ten studies addressed the use of AI to detect PDAC [20, 22, 23, 26, 31, 34,35,36, 38, 39]. Two of these studies aimed to detect PC and differentiate it from AP [20, 26]. Only 4 studies used external validation [22, 34, 36, 37]. All 4 used an offline, image-based validation, and none was validated in a clinical setting. Six studies reported the number of patients included in the training and testing sets [20, 22, 23, 36, 38]. Accuracy for the detection of pancreatic cancer ranged from 83 to 98%. Sensitivity (S), specificity (Sp), and accuracy of all studies are reported in Table 3.
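For reference, the three performance measures reported in Table 3 can all be derived from the counts of a binary confusion matrix. The following is a minimal Python sketch; the function name and the example counts are illustrative assumptions and do not come from any of the included studies.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: diseased cases called positive/negative by the model.
    tn/fp: non-diseased cases called negative/positive by the model.
    """
    sensitivity = tp / (tp + fn)                 # share of diseased cases detected
    specificity = tn / (tn + fp)                 # share of non-diseased cases cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all cases classified correctly
    return sensitivity, specificity, accuracy


# Hypothetical test set: 50 patients with cancer and 100 controls.
s, sp, acc = diagnostic_metrics(tp=45, fp=8, tn=92, fn=5)
print(f"S={s:.2f}, Sp={sp:.2f}, accuracy={acc:.2f}")
```

Because accuracy pools diseased and non-diseased cases, it depends on the case mix of the test set, which is one reason S and Sp are reported alongside it.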

Almost all AI studies used contrast CT, which is generally the preferred and most accessible tool for the initial diagnostic approach [41]. Pancreatic cancer often carries a poor prognosis because it is diagnosed at advanced stages. This usually occurs because of the lack of specific symptoms and the subtlety of the parenchymal changes in its early phases [42,43,44]. For that reason, an early diagnosis of PC requires expertise in reading radiological images [45]. Indeed, it has been reported that in a tertiary medical center, radiologists missed 7.1% of the PCs finally diagnosed [24].

Table 3 Sensitivity (S), specificity (Sp), and accuracy for the detection of PC, classification of PCL and detection of HGD/cancer, -: Not reported

Zhu et al. [35] initially described a system using deep learning (DL) to detect and segment PC tissue and to differentiate it from normal tissue, with a sensitivity and specificity above 90%. Ziegelmayer [26] and Park et al. [20] investigated the performance of a CNN and an RF algorithm, respectively, to differentiate between PC and AP. To define AP, they used both clinical and histopathological criteria. In the study by Park, 95% of the 62 test patients were correctly classified as having either PC or AP. Notably, all patients with PC were correctly classified. Indeed, the highest accuracy was obtained in the study by Park et al. [20]. Ma et al. [31] also obtained one of the highest accuracies for detecting PC. They developed a CNN-based model using a dataset of 3494 CT images and then evaluated an approach based on binary and ternary classifiers, for the purposes of detecting and localizing masses, respectively. In the binary classifier, performance did not differ between the plain, arterial, and venous phases, and the accuracy was 95%. In the ternary classifier, however, the arterial phase had the highest sensitivity among the three phases for detecting cancer in the head of the pancreas (85%), whereas the sensitivity for cancer in the tail was much lower (52%). For this reason, they note that the model is suitable mainly for screening purposes in pancreatic cancer detection.

However, these studies used images of carcinoma and normal tissue to detect pancreatic cancer, whereas in clinical practice the differentiation between pancreatic diseases is of key importance [46].

Gao et al. [36] designed a CNN to differentiate pancreatic diseases in MR images, including cancerous and normal tissue as well as images of various kinds of tumors. To overcome the shortage of images, they used a generative adversarial network (GAN) to augment and balance the dataset with synthetic images. Most of the images used for the training and testing sets were carcinomas. In the external validation set, the patch-level area under the receiver operating characteristic (ROC) curve (AUC) reported for carcinomas and pancreatic neuroendocrine tumors was 0.903.
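An AUC can be read as the probability that a randomly chosen positive patch receives a higher model score than a randomly chosen negative patch. The following is a minimal Python sketch of that pairwise definition; the labels and scores are hypothetical, and the function is a generic illustration, not the evaluation code of any reviewed study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for positive (e.g., tumor) patches, 0 for negative patches.
    scores: model output per patch; higher means more suspicious.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical patch-level labels and scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

In this toy example, 8 of the 9 positive/negative pairs are ranked correctly, so the AUC is 8/9. Unlike accuracy, the measure does not depend on choosing a single decision threshold.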

The largest studies to date are those by Liu et al. [34] and Si et al. [22], with more than 600 patients and an external validation cohort. Liu et al. [34] conducted a study including 370 patients with PC and 320 controls from a Taiwanese center. They used 2 internal sets and 1 external set for testing. The sensitivity of the CNN for tumors < 2 cm was 92% in the local test sets and 63% in an external set from the USA. Thus, the system achieved excellent results in the internal validation cohort and good but lower accuracy in the external cohort, underscoring the importance of a large and representative training cohort of patients. In this study, AI performance was better than that of the radiologists participating in the analysis. In a recently released publication, Si et al. [22] carried out a large study that also included other types of pancreatic tumors, with 319 patients and 143,945 CT images, obtaining an average accuracy of 82.7% across all tumor types and an accuracy of 87.6% for pancreatic ductal adenocarcinoma alone in an external validation cohort. These data show that AI could help to reduce the problems arising from differences in radiologists’ expertise and to mitigate the heavy workload resulting from the increasing number of CT scans performed. However, these systems still need to prove that they can detect small and early pancreatic lesions, which is likely the main limitation in clinical practice.

Characteristics of included studies addressing PDAC are detailed in Table 1.

Pancreatic cystic lesions

Eight studies aimed to detect and classify PCLs (2 of them also included other kinds of tumors as mentioned before) [21, 22, 24, 25, 30, 33, 36, 37], and 4 aimed to predict the presence of high-grade dysplasia (HGD) or cancer [19, 27, 32, 40]. Six studies reported the number of patients included in the image extraction for the training and testing sets [22, 25, 30, 32, 36, 37].

Classification of PCL

Of the studies designed to classify PCLs, 6 aimed to detect and classify any type of cyst [21, 22, 30, 33, 36, 37], 1 to differentiate serous cystadenoma from the other types [25], and 1 to differentiate between serous and mucinous cysts [24].

Only the studies from Gao et al. and Dmitriev et al. [36, 37] used an offline external validation. However, in the study from Gao et al., the corresponding PCL results were only given for the internal validation set due to the insufficient number of images including PCLs in the external validation cohort. The rest of the studies used only internal validation.

Accuracy for classifying cystic lesions ranged from 73 to 91%. S, Sp, and accuracy of all studies are reported in Table 3.

Early detection of pancreatic cysts could be a great opportunity to prevent the development of PC. These cysts are often detected on CT scans carried out for other reasons [46,47,48]. However, differentiating between the cyst types is crucial because of their different malignant potential and different need for follow-up [49]. In this regard, the best results were obtained in the study by Dmitriev et al. [21], who initially presented an algorithm to discriminate between the 4 main types of neoplastic pancreatic cysts: intraductal papillary mucinous neoplasm (IPMN), mucinous cystic neoplasm (MCN), serous cystic neoplasm (SCN), and solid pseudopapillary neoplasm (SPN). They developed a model using a Bayesian combination of an RF classifier and a CNN, merging patient demographic factors with signal intensity and shape features from the cyst images. The overall accuracy obtained was 83.6%. Then, in a more recent publication [37], they tested the algorithm in an external cohort of 134 patients, with an accuracy of 95%. However, most of the cases included were IPMN, so the other lesions may have been underrepresented. Interestingly, the median size of the misclassified cysts was 4.8 cm, suggesting that the network could not correctly distinguish smaller lesions due to a lack of distinctive internal features. This paper also included an analysis to provide visual clarification of the decision-making process of the computer-aided diagnosis (CAD) system. The analysis focused on which input features are the most important for the RF component and how their values affect the final prediction, and it examined the function of the CNN by studying the semantic separability and characteristics of the learned radiological features. This is important for understanding how the system works and which changes could be made to improve it in the near future.
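One generic way to combine the outputs of two classifiers in a Bayesian manner is the product rule: if the demographic features and the image features are assumed to be conditionally independent given the cyst type, the combined posterior is proportional to the product of the two posteriors divided by the class prior. The following Python sketch illustrates this rule only. It is not the implementation used by Dmitriev et al., whose exact formulation is not described here, and the probabilities, the uniform prior, and the function name are hypothetical.

```python
def combine_posteriors(p_rf, p_cnn, prior=None):
    """Fuse two class-probability estimates with the naive-Bayes product rule.

    p_rf:  class probabilities from a classifier on demographic features.
    p_cnn: class probabilities from a classifier on image features.
    prior: class prior; assumed uniform when not given.
    Assumes the two feature sets are conditionally independent given the class.
    """
    classes = list(p_rf)
    if prior is None:
        prior = {c: 1.0 / len(classes) for c in classes}
    unnormalized = {c: p_rf[c] * p_cnn[c] / prior[c] for c in classes}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


# Hypothetical outputs for a single patient.
p_rf = {"IPMN": 0.50, "MCN": 0.30, "SCN": 0.15, "SPN": 0.05}
p_cnn = {"IPMN": 0.40, "MCN": 0.45, "SCN": 0.10, "SPN": 0.05}

combined = combine_posteriors(p_rf, p_cnn)
predicted = max(combined, key=combined.get)
```

In this toy case the CNN alone favors MCN, but the combined estimate favors IPMN because the demographic classifier supports it more strongly, which shows how non-imaging information can change the final call.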

Li et al. [33] also developed a CNN model to classify PCLs on whole-pancreas CT images. In addition, saliency maps were generated to highlight the pixels and areas that contributed most to the classification output and to help physicians understand how the deep learning method works in the diagnostic process. This system showed an accuracy of 72.8%, improving on the radiologists’ baseline manual reading in the same study. They observed that MCN was easily misclassified as IPMN, probably because of their similar appearance on CT images.

As already stated, the malignant potential of PCLs varies widely. Serous cystic neoplasms have negligible malignant potential; therefore, identifying them is key to detecting patients who will not need long-term surveillance [50, 51]. To distinguish pancreatic serous cystic neoplasms from other pancreatic cystic neoplasms, Wei et al. [25] conducted a study including 260 patients, of whom 102 had an SCN. They achieved an accuracy of 83%, an S of 67%, and an Sp of 81%. They also reported that only 31 of the 102 serous cystic neoplasm cases in this study were recognized correctly by clinicians before surgery. Previous publications [9, 52] reported an accuracy of ~ 70% for the discrimination of pancreatic cysts on CT scans read by radiologists with > 10 years of experience in abdominal imaging. Against this benchmark, the accuracy of 83% achieved by Wei et al. evidences the potential of these networks to classify PCLs.

Yang et al. [24] also investigated the performance of an RF to discriminate between serous and mucinous cysts, with similar results. Later, Shen et al. [30] carried out a study to compare the performance of 3 different ML systems, namely SVM, RF, and an artificial neural network (ANN), for the differential diagnosis of SCN, MCN, and IPMN. In this case, the RF model showed the highest overall accuracy in both the training and validation datasets.

Detection of HGD and cancer

Three studies aimed to detect HGD or cancer in IPMN [19, 27, 40] and another included the 4 main kinds of cysts [32]. Accuracy ranged from 75 to 88%.

One of the most important tasks of the radiologists when analyzing pancreatic cysts is to determine whether they have malignant features or not. This task is extremely challenging, especially for recognition of high-grade dysplasia as the changes in the cyst may be extremely subtle. Many guidelines have been developed to address this topic, but results are still not accurate enough [53,54,55].

Chakraborty et al. [27] initially developed an ML model that included clinical and imaging features from CT to predict high- or low-risk IPMNs. Using the imaging features, they reported a sensitivity of 80% and a specificity of 59%. Interestingly, when they also included clinical variables, the Sp rose to 70%.

Later, Corral et al. [19] published a study using a CNN on MR images to detect HGD in IPMN. They included 139 patients, achieving an S and Sp of 75%. They reported that, once features were extracted, the computer code took 0.18 s to run the complete algorithm. They stated that the accuracy was similar to that of an expert radiologist, but that the algorithm was much faster [38, 39]. The use of clinical or biochemical data could probably also help to improve its performance.

In another study released in 2020, Kang et al. [40] compared the performance of an ML-based system with the traditionally used logistic regression (LR) to detect HGD in IPMN, reporting similar results (accuracy ~ 75%) for both systems without including clinical or biochemical information.

When approaching the diagnosis of dysplasia in PCL, the balance between S and Sp is crucial. These studies showed a performance comparable to the current diagnostic guidelines, with a slight increase in sensitivity [4, 56,57,58]. The use of these tools could increase the chances of some patients having a curative pancreatic resection, which may reduce pancreatic cancer mortality. However, the still low Sp raises concerns about false positive results, which could lead to an increase in unneeded major surgeries, with the mortality and comorbidities they often entail [59,60,61].
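The practical weight of a low Sp can be made concrete with Bayes’ rule, which converts S, Sp, and disease prevalence into the positive predictive value (PPV), that is, the probability that a positive result is a true positive. The following Python sketch uses the S of 80% and the Sp values of 59% and 70% reported above for IPMN risk prediction. The 30% prevalence of HGD or cancer is an arbitrary assumption chosen for illustration, as is holding S at 80% in the second scenario.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence               # true positive fraction
    fp = (1 - specificity) * (1 - prevalence)   # false positive fraction
    tn = specificity * (1 - prevalence)         # true negative fraction
    fn = (1 - sensitivity) * prevalence         # false negative fraction
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


ASSUMED_PREVALENCE = 0.30  # hypothetical; not taken from any reviewed study

# Imaging features only: S 80%, Sp 59%.
ppv_imaging, npv_imaging = predictive_values(0.80, 0.59, ASSUMED_PREVALENCE)

# Imaging plus clinical variables: Sp 70%; S assumed unchanged at 80%.
ppv_clinical, npv_clinical = predictive_values(0.80, 0.70, ASSUMED_PREVALENCE)

print(f"PPV imaging only: {ppv_imaging:.2f}; with clinical data: {ppv_clinical:.2f}")
```

Under these assumptions, fewer than half of the positive results in the imaging-only scenario would correspond to true HGD or cancer, and raising Sp to 70% lifts the PPV to just over one half.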

It has been shown that adding different imaging techniques could improve the outcomes regarding the diagnosis of dysplasia [62, 63]. Endoscopic ultrasonography (EUS) is a technique used to evaluate the pancreas with high accuracy, and it adds valuable information for assessing the malignancy of IPMNs [64,65,66]. In a recent study, Kuwahara et al. [66] reported an accuracy of 94% for detecting HGD in IPMN using EUS images. They reported that the accuracy of AI was higher than that of human diagnosis (56.0%) and of the mural nodule (68.0%). They also performed a multivariate logistic regression analysis, which showed that the AI malignant probability was the only independent factor for IPMN-associated malignancy. However, this was a single-center retrospective study with a small sample size, and these results should be further validated.

It is likely that these systems will also benefit from including clinical information and biochemical and genetic data, as has been recently reported for CompCyst, an ML tool designed to characterize PCL and guide clinical decisions [67]. CompCyst was tested in 474 patients, and it correctly identified 71% of pancreatic adenocarcinomas with cystic degeneration, whereas clinical and imaging criteria correctly identified 58%, although with slightly lower Sp. However, Sp to detect serous cystic lesions was very high. Application of the CompCyst test would have spared surgery in more than half of the patients who underwent unnecessary resection of their cysts. These systems will probably not be a substitute for imaging techniques but an aid to clinicians, contributing additional information that allows doctors to make a better diagnosis [68, 69]. The way in which these tests will be implemented in routine clinical settings remains to be determined.

Complete characteristics of included studies addressing PCLs are reported in Table 2.

Conclusions

In this review, we searched publications on machine learning for pancreatic ductal adenocarcinoma or pancreatic cystic lesions diagnosis in CT or MRI images, observing that in the last 3 years, there has been a huge increase in the number of publications regarding this topic. However, most of them are still in experimental stages.

With the arrival of higher-resolution cross-sectional imaging techniques, incidental PCLs have been increasingly discovered over the past years [5, 70]. Some carry malignant potential or may even already contain malignant cells, and in most cases these changes are very difficult to detect and classify [71]. Yet while PCLs are increasingly being discovered, the survival rate of pancreatic cancer patients has barely improved in the last few decades. Indeed, correct management of these PCLs, focusing on the stratification of their malignant potential, may prevent the mortality associated with progression to pancreatic cancer. The current consensus guidelines for the management of PCL, which rely on standard imaging characteristics to predict the malignant potential of cysts, have shown limited accuracy in detecting and characterizing PCLs [4, 57, 58]. For this reason, the introduction of a new technology such as AI through machine learning has attracted a lot of attention [72].

In recent years, there has been a fast development of AI tools that have shown the great potential of ML and DL models to detect pancreatic lesions and pancreatic adenocarcinoma and to help classify PCL [73]. Some of them have shown very good performance, with an accuracy over 90% for differentiating pancreatic adenocarcinoma from normal pancreatic tissue [35] or for differentiating carcinoma from autoimmune pancreatitis [20], which is another important differential diagnosis of PC. However, other features such as parenchymal atrophy or pancreatic duct enlargement are not yet recognized by AI [74].

Regarding PCL, some groups focused on classifying the different types of cystic lesions, with varying results that were often more accurate for diagnoses of IPMN [21, 33], and others tried to differentiate between mucinous and serous lesions, which is important because of their different prognosis and follow-up [25]. In addition, the use of these tools can dramatically speed up these tasks, which usually place a great burden on radiologists. There are also very promising results in the field of detecting HGD or cancer, with a recent study reporting a high S of over 80%, although with a moderate Sp that will probably be increased in the future by including clinical, biochemical, and genetic data [32, 67].

However, most of the studies referred to in this review present several limitations and methodological concerns that need to be addressed in the near future. The main concern is the retrospective and offline design, which makes it difficult to elucidate the applicability of these systems in clinical practice. Another crucial limitation is that many of the models were trained on a small internal dataset. The low prevalence of some PCLs may hinder the collection of the large number of images needed to construct a reliable algorithm. This, together with the inclusion of only the best radiological images in the studies, could lead to poor generalization of the model. Another problem is the reference gold standard chosen in the different studies. The most reliable gold standard is the anatomopathological sample of the cyst. However, these samples are usually obtained only for larger cysts according to clinical guidelines [4], which could bias the results of the studies and exclude the smallest cysts from the analysis. More importantly, most of the studies were carried out without external validation [72]. External validation is crucial to estimate the prospective performance of the model in an unseen population. Testing the system with an external prospective cohort is necessary to characterize the model bias and leads to a more reliable tool [75, 76].

An important aspect to consider is how human doctors and AI tools will interact. Multidisciplinary groups are working on this question, such as the one involved in the “Felix Project,” which describes the future of this tool as a “second reader” integrated in the radiology workflow that will segment the organs, annotate any suspicious pancreatic pathology, and then send the images back to the radiologists to be double-checked [77].

In conclusion, the increased number of cross-sectional imaging tests and of PCL-related diagnoses implies an increase in workload in clinical practice, but it also entails a greater probability of finding lesions of mucinous origin with premalignant characteristics. This may lead to an increase in early-stage pancreatic cancer diagnoses. AI techniques have been shown to be a promising tool that is expected to be helpful for most radiologists’ tasks. However, methodological concerns must be addressed, and prospective clinical studies should be carried out before implementation in clinical practice.