1 Introduction

Pancreatic ductal adenocarcinoma (PDAC) is a common and aggressive malignancy, with a 5-year survival rate of 4–6 % and a median survival time of less than 6 months. The poor prognosis and clinical outcome are due to the inability to detect PDAC at an early stage and to the limited efficacy of the therapeutic options currently available. Therefore, a further understanding of the molecular mechanisms underlying PDAC development is necessary to identify new early diagnostic biomarkers and therapeutic targets [1]. It has, for example, been postulated that epithelial to mesenchymal transition (EMT) processes may yield therapeutic targets for PDAC and other solid tumours [2, 3]. KRAS, CDKN2A, TP53 and SMAD4 are the most frequently mutated genes in PDAC, whereas several other genes are mutated at relatively low frequencies [4]. Also, several expression-related biomarkers at both the RNA and protein level with diagnostic, prognostic and predictive value have been identified, including carbohydrate antigen 19–9 (CA 19–9) and carcinoembryonic antigen (CEA, currently named CEACAM5) [5], whereas other biomarkers have been excluded as PDAC prognosticators [6]. In attempts to unravel the molecular mechanisms underlying the pathogenesis of PDAC, microarray-based gene expression profiling studies have been carried out. In this way, Badea et al. identified 65 over-expressed genes in PDAC tumour epithelia and, among them, KRT7, LAMC2, SFN, PFKP, ANXA2, MAP4K4 and MBOAT2 were found to be inversely related to patient survival [7]. Donahue et al. identified 171 genes whose expression levels defined two PDAC prognostic subgroups. In particular, high levels of PIK3R1 expression were found to be strongly associated with improved survival rates, whereas high levels of SRC expression were found to be associated with poorer survival rates [8]. Zhang et al. identified 2620 differentially expressed genes in PDACs, including 277 genes that were found to be associated with clinical outcome. Among them, DPEP1 showed the strongest association [9].

More recently, systems biology approaches have been applied to microarray-based gene expression datasets, aiming to extract new information besides a simple list of differentially expressed genes. Further in-depth analyses of the data generated by Zhang et al. [9] suggested important roles for the SP1 and TK1 genes in the progression of PDAC [10]. Additional approaches include the analysis of transcriptome maps to reveal genomic regions enriched in over/under-expressed genes [11]. Finally, although widely employed in the field of oncology, the analysis of microarray-based gene expression data by the weighted gene co-expression network analysis (WGCNA) systems biology approach [12] has, to our knowledge, so far not been applied to PDAC-derived data [13–15]. WGCNA allows a global interpretation of gene expression data by constructing gene networks based on similarities in expression profiles among samples. Highly co-expressed genes are connected in the network and, therefore, can be grouped into modules (i.e., highly connected network regions). Since these modules often consist of functionally related genes, different modules are involved in individual functions [16]. Within the modules, WGCNA also allows the identification of the most central and connected genes, i.e., the so-called “hub” genes. When this approach is applied to two groups of samples, for example healthy and diseased samples, it is possible to obtain a network for each group and, therefore, to identify common modules and modules that differ between networks. These latter modules and their key genes may be involved in pathological processes and, thus, may have important clinical implications as potential diagnostic and prognostic biomarkers or therapeutic targets. As such, they warrant validation. 
In order to improve our understanding of the biological mechanisms underlying PDAC, we analysed existing PDAC gene expression datasets by applying an advanced network analysis strategy to detect key genes potentially involved in the pathogenesis of PDAC.

2 Materials and methods

2.1 Gene expression data and pre-processing

Raw CEL files of five microarray-based gene expression datasets (GSE15471 [7], GSE32676 [8], GSE28735 [9], GSE41368 [17] and GSE71989 (unpublished)) containing expression data from in total 105 normal pancreatic and 129 PDAC tissue samples were downloaded from the NCBI Gene Expression Omnibus (GEO) (Table 1). Data processing and figure generation were performed using the R 3.1.2 statistical environment (www.r-project.org) and Bioconductor (version 2.14) (www.bioconductor.org). Raw data from each microarray dataset were pre-processed identically with the R package affy using the Robust Multichip Average (RMA) function for background correction and normalization with the quantiles method [18]. Previous comparison studies have reported that RMA outperforms other normalization methods for analysis of microarray-based gene expression data [19]. In order to be able to merge the five microarray datasets, which were derived from two different platforms, they were made compatible. To this end, we first mapped the array probes to the respective Entrez Gene ID, a cross-platform common identifier, using the array annotation data hgu133plus2.db or hugene10sttranscriptcluster.db, depending on the platform used. Since the expression of a given gene is usually measured by multiple probes, we next summarized the expression values using the function collapseRows implemented in the R package WGCNA [20]. We selected the parameter “MaxMean” as method for collapsing rows, which chooses the probe with the highest mean value among samples, since this generally produces the most robust results [21]. In order to limit further analyses to genes common to all datasets, we finally created an overlapping gene set by selecting the rows with an Entrez Gene ID present in both platforms using the R function intersect, resulting in a total of 17,536 common genes. 
Since the Affymetrix Human Gene 1.0 ST and Human Genome U133 Plus 2.0 Arrays represent 19,878 and 19,851 unique Entrez Gene IDs, respectively, the percentages of common genes included in our analyses are considered to be very high (88.2 and 88.3 %). In order to remove cross-platform batch effects, we used the ComBat method [22], which is implemented in the R package sva and carries out another normalization step using an empirical Bayes approach. Among the available methods for batch correction, ComBat has been reported to reach the highest precision, accuracy and overall performance [23]. After this, we were able to merge all normalized microarray-based data into two global datasets, i.e., Normal and PDAC, that we used for the subsequent WGCNA analyses. Outlier samples (to be excluded) were identified by hierarchical cluster analysis of each global dataset using the R function hclust.
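The probe-collapsing and gene-intersection steps described above can be illustrated with a minimal sketch. It is written in plain Python rather than R and is not the authors' pipeline: the probe IDs, gene IDs and expression values are hypothetical toy data, and the helper collapse_max_mean only mimics the "MaxMean" rule (keep, per gene, the probe with the highest mean across samples).

```python
from statistics import mean

def collapse_max_mean(probe_expr, probe_to_gene):
    """For each gene, keep the probe with the highest mean expression across samples."""
    best = {}  # gene ID -> (mean expression, probe ID)
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probes without a gene annotation are dropped
        m = mean(values)
        if gene not in best or m > best[gene][0]:
            best[gene] = (m, probe)
    return {gene: probe_expr[probe] for gene, (_, probe) in best.items()}

# Hypothetical toy data: two platforms, two samples each.
platform_a = collapse_max_mean(
    {"p1": [5.0, 7.0], "p2": [8.0, 9.0], "p3": [4.0, 4.5]},
    {"p1": "1000", "p2": "1000", "p3": "2000"},
)
platform_b = collapse_max_mean(
    {"q1": [6.0, 6.5], "q2": [3.0, 3.5]},
    {"q1": "1000", "q2": "3000"},
)

# Genes measured on both platforms (the analogue of the intersection step).
common_genes = sorted(set(platform_a) & set(platform_b))
```

In this toy case, gene "1000" is represented by probe "p2" on the first platform, and it is the only gene retained after intersection.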

Table 1 Summary of the five gene expression datasets used

2.2 Dataset comparability analyses

In order to assess the comparability of the Normal and PDAC datasets, necessary for subsequent analyses, we correlated the gene expression levels and the overall connectivity, i.e., the co-expression level correlation, between the datasets. In case the correlations are positive, a higher correlation value indicates a higher comparability between the Normal and PDAC datasets. To this end, we used the function softConnectivity from package WGCNA, with the “randomly selected genes” parameter set at 5000, other parameters set as default, and the power parameter pre-calculated by the pickSoftThreshold function of WGCNA. This function provides the appropriate soft-thresholding power for network construction by calculating the scale-free topology fit index for several powers. If the scale-free topology fit index for the reference dataset reaches values above 0.8 for low powers (<30), as defined in [12], the network topology is considered scale-free, which we also took as an indirect indication that batch effects had been adequately removed.
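As an illustration of the connectivity measure used here, the sketch below computes a soft connectivity per gene as the sum of absolute Pearson correlations with all other genes, raised to a soft-thresholding power. It is a plain-Python approximation of the idea, not the WGCNA implementation; the function names and the toy expression profiles are assumptions made for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def soft_connectivity(expr, power):
    """expr holds one list of sample values per gene; returns one connectivity per gene."""
    connectivity = []
    for i, gene_i in enumerate(expr):
        k = sum(abs(pearson(gene_i, gene_j)) ** power
                for j, gene_j in enumerate(expr) if j != i)
        connectivity.append(k)
    return connectivity

# Hypothetical toy data: genes 0 and 1 are perfectly correlated, gene 2 is uncorrelated with both.
toy_expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, -1.0, -1.0, 1.0],
]
k_toy = soft_connectivity(toy_expr, power=10)
```

Computing such a vector for the same genes in each dataset and correlating the two vectors (for example with pearson) gives a comparability measure of the kind described above.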

2.3 Construction of weighted gene co-expression networks and identification of modules

Using standard WGCNA procedures [12], we created two weighted gene co-expression networks based on the Normal and PDAC expression data, respectively. Briefly, in each dataset we first created a matrix of adjacencies using the WGCNA function adjacency, by calculating Pearson correlations between each gene pair to determine concordances of gene expression, after which this matrix was transformed into a Topological Overlap Matrix (TOM) using the function TOMsimilarity. The resulting topological overlap is a biologically meaningful measure of gene similarity based on co-expression relationships between two genes [12]. Each TOM was used as input for hierarchical clustering analysis, which was performed with the function flashClust. Finally, in the resulting dendrograms we identified network modules present in the Normal dataset (used here as reference dataset) with the function cutreeHybrid from the R package dynamicTreeCut, using a relatively large minimum module size (minClusterSize = 30), and a medium sensitivity (deepSplit = 2), with other parameters set as default.
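To make the transformation from adjacency to topological overlap concrete, the following sketch implements the unsigned topological overlap measure as it is commonly defined for weighted networks: TOM(i,j) = (sum over u of a(i,u)·a(u,j) + a(i,j)) / (min(k(i), k(j)) + 1 − a(i,j)). It is a plain-Python illustration on a hypothetical 3-gene adjacency matrix; whether the authors used the unsigned or a signed variant is not stated in the text, so the unsigned form is an assumption.

```python
def tom_similarity(adj):
    """Unsigned topological overlap from a symmetric adjacency matrix with values in [0, 1]."""
    n = len(adj)
    # Connectivity of each gene, ignoring the self-connection on the diagonal.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]  # diagonal is set to 1 by convention
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Strength of the connections that genes i and j share via third genes.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom

# Hypothetical adjacency: gene 0 is connected to genes 1 and 2, which are not directly connected.
toy_adj = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.0],
    [0.5, 0.0, 1.0],
]
toy_tom = tom_similarity(toy_adj)
# Dissimilarity (1 - TOM) is the kind of distance typically passed to hierarchical clustering.
toy_dissimilarity = [[1.0 - t for t in row] for row in toy_tom]
```

Genes 1 and 2 have no direct connection in this toy matrix, yet they receive a non-zero overlap because they share gene 0 as a neighbour, which is the property that makes the measure more robust than pairwise correlation alone.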

2.4 Module preservation analyses

The preservation levels of Normal network modules in the PDAC network were assessed first by plotting the two networks, thereby imposing modules from the Normal network onto the PDAC network. Secondly, using the function modulePreservation from the WGCNA package, a permutation test was carried out that assesses the preservation of connectivity and density of each module between the Normal and PDAC networks. This function provides a summary preservation Z-score for each module. The higher the Z-score, the higher the module preservation is, whereas values below 10 indicate a moderate to low preservation. Note that, since the grey module contains the genes not assigned to any module, and the gold module contains random genes used for statistical purposes by the modulePreservation function, they should have low Z-scores [24]. For the modulePreservation function we set the following parameters: nPermutations = 30, maxGoldModuleSize = 100 and maxModuleSize = 400. All other parameters were left as default.
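The logic of a permutation-based preservation Z-score can be sketched as follows. This is a simplified plain-Python illustration, not the modulePreservation code: each observed statistic is compared with its distribution under random module labels, and a summary score averages a density-based and a connectivity-based median Z, which is how the composite is described for WGCNA to the best of our understanding. All numbers are hypothetical, and the cut-off of 10 is the one quoted in the text.

```python
from statistics import mean, median, stdev

def permutation_z(observed, permuted):
    """Z-score of an observed statistic relative to its permutation distribution."""
    return (observed - mean(permuted)) / stdev(permuted)

def z_summary(density_zs, connectivity_zs):
    """Average of the median density Z and the median connectivity Z."""
    return (median(density_zs) + median(connectivity_zs)) / 2.0

def preservation_label(z, high_cutoff=10.0):
    """Classify a module using the cut-off of 10 mentioned in the text."""
    return "high" if z > high_cutoff else "moderate to low"

# Hypothetical statistic: observed value and three values from permuted labels.
z_example = permutation_z(0.8, [0.1, 0.2, 0.3])

# Hypothetical component Z-scores for one module.
summary_example = z_summary([12.0, 9.0, 11.0], [7.0, 8.0, 9.0])
label_example = preservation_label(summary_example)
```

With only a few permutations the estimated mean and standard deviation are noisy, which is worth keeping in mind when interpreting modules whose scores lie close to the cut-off.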

2.5 Detection of hub genes and their functional annotations

Module hub genes, which are highly connected intra-modular genes, have the highest Module Membership (MM) scores to the respective module [25]. The MM of each gene was calculated by the WGCNA function signedKME, which correlates the expression profile of a gene with the Module Eigengene (ME) of a module and thus quantifies how close a gene is to a given module. The ME is the representation of a module in one synthetic expression profile, obtained by the WGCNA function moduleEigengenes. Next, we mapped these hub genes to the associated Gene Ontology (GO) terms and KEGG pathways using the DAVID tool (http://david.abcc.ncifcrf.gov/) [26]. Functional enrichment analysis of the identified hub genes consists of statistically highlighting the most over-represented (enriched) GO terms and KEGG pathways (p < 0.05), in order to facilitate the interpretation of the biological mechanisms related to a given gene list. Next, we used the Enrichr tool (http://amp.pharm.mssm.edu/Enrichr/) [27] to perform enrichment analyses of our hub gene lists. These enrichment analyses were carried out on predicted transcription factor binding sites using the “TRANSFAC_and_JASPAR_PWMs” section, on predicted miRNA binding sites using the “TargetScan_microRNA” section, and on chromosomal regions where these genes are located using the “Chromosome_Location” tool section. Only statistically significant results are reported (p < 0.02).
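The relation between module eigengene and module membership can be illustrated with a small sketch. It assumes that the eigengene is the first principal component of the standardized module expression matrix, oriented along the average expression, and that MM is the Pearson correlation of each gene with that eigengene. The code is plain Python with a simple power iteration instead of the SVD used in practice, and the expression values are hypothetical.

```python
from math import sqrt

def standardize(values):
    """Centre a profile to mean 0 and scale it to unit sample standard deviation."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / sd for v in values]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def module_eigengene(expr, n_iter=200):
    """First principal component across samples, found by power iteration."""
    z = [standardize(g) for g in expr]  # genes x samples
    n_genes, n_samples = len(z), len(z[0])
    v = z[0][:]  # start from the first gene's standardized profile
    for _ in range(n_iter):
        scores = [sum(z[g][s] * v[s] for s in range(n_samples)) for g in range(n_genes)]
        v = [sum(z[g][s] * scores[g] for g in range(n_genes)) for s in range(n_samples)]
        norm = sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    # Orient the eigengene so that it follows the average standardized expression.
    avg = [sum(z[g][s] for g in range(n_genes)) / n_genes for s in range(n_samples)]
    if sum(a * b for a, b in zip(v, avg)) < 0:
        v = [-a for a in v]
    return v

# Hypothetical module of four genes measured in five samples.
module_expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 11.0],
    [5.0, 4.0, 3.0, 2.0, 1.5],
    [1.0, 3.0, 2.0, 5.0, 4.0],
]
eigengene = module_eigengene(module_expr)
kme = [pearson(g, eigengene) for g in module_expr]  # module membership per gene
ranked = sorted(range(len(kme)), key=lambda i: kme[i], reverse=True)
```

Genes at the top of such a ranking are the hub candidates. A gene that is anti-correlated with the rest of the module, like the third toy gene, receives a negative signed membership and ends up last.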

2.6 Survival analyses

Survival analyses were carried out using the SurvExpress tool [28], through which comparisons and validations of candidate cancer biomarkers using patient survival data present in the microarray datasets are made. The SurvExpress tool divides samples into two groups (high-risk and low-risk) through the median of the prognostic index obtained via a Cox regression model. After this, it generates risk hazard ratios (HR), relative confidence intervals (CI) and p-values. Our survival analyses were performed on an independent GEO dataset (GSE21501), containing gene expression and survival data derived from 132 PDAC patients.
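The risk-group assignment described above can be sketched in a few lines. The example assumes that the prognostic index of a sample is the sum of its gene expression values weighted by Cox regression coefficients, and that samples above the median index form the high-risk group. The coefficients and expression values below are hypothetical, the Cox model fit itself is not shown, and how the tool handles values exactly at the median is not specified in the text.

```python
from statistics import median

def prognostic_index(expression, betas):
    """Linear predictor: expression values weighted by Cox coefficients."""
    return sum(b * x for b, x in zip(betas, expression))

def split_by_median(indices):
    """Label each sample as high-risk if its index exceeds the median, else low-risk."""
    cut = median(indices)
    return ["high-risk" if pi > cut else "low-risk" for pi in indices]

# Hypothetical coefficients for two genes and expression values for four patients.
betas = [0.5, -0.3]
patients = [[2.0, 1.0], [1.0, 3.0], [3.0, 0.0], [0.0, 2.0]]

indices = [prognostic_index(p, betas) for p in patients]
groups = split_by_median(indices)
```

The two groups obtained in this way are then compared with respect to survival, yielding the hazard ratio, confidence interval and p-value reported by the tool.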

3 Results

3.1 Pre-processing of the Normal and PDAC datasets

In order to increase the sample size and thus enhance the reliability of our analyses, five microarray datasets containing raw gene expression data of both normal and tumour pancreatic samples were pre-processed and merged into two global datasets, i.e., Normal and PDAC (Table 1; see Section 2). This data pre-processing step was needed since weighted gene co-expression network analysis (WGCNA) is sensitive to batch effects, i.e., to systematic and technical differences between different platforms and datasets, and to the presence of outlier samples [16]. Besides removal of batch effects, we discarded a clear outlier, the normal sample GSM388111, from the GSE15471 dataset, as indicated by the dendrogram of sample clustering (Supplementary Fig. 1). As a result, the Normal dataset consisted of 104 normal pancreas samples and the PDAC dataset of 129 tumour samples. Subsequently, we assessed whether our datasets were comparable, since highly comparable datasets provide better chances of finding similarities and, therefore, also differences among them during subsequent analyses. In Supplementary Fig. 2 we show that our datasets are indeed comparable, since the correlation of gene expression between datasets was found to be 0.97 (p < 1e-200) and that of gene connectivity was found to be 0.43 (p < 1e-200). The latter parameter reflects the weighted co-expression level correlation, which indicates how strongly a gene is connected to all other genes in the network. Next, we verified whether the networks to be constructed had a scale-free topology, as is required for WGCNA. A scale-free topology is a fundamental property of metabolic and signalling networks in which some nodes (here genes) are more connected than others, that is, some nodes are central (hub nodes) and others are peripheral. 
To this end, we applied the R function pickSoftThreshold and found that the scale-free topology fit index correctly reached values above 0.8 for a low power of 10 in the Normal dataset, here used as a control dataset (Supplementary Fig. 3). This result is also an indirect indication that we have efficiently removed batch-effects. In order to confirm this latter notion, we applied the function pickSoftThreshold to the Normal dataset not corrected for batch-effects, and found that it indeed failed to reach 0.8 at low powers (data not shown). As a further confirmation of efficient removal of batch-effects, the hclust function was applied to the non-corrected Normal and PDAC datasets, yielding several clusters of samples corresponding to the original datasets, whereas the hclust function applied to batch-effect corrected datasets yielded clustering dendrograms in which the samples were correctly sorted in no specific order (Supplementary Fig. 4).

3.2 Identification of gene co-expression networks and modules

In the weighted gene co-expression network deriving from the Normal dataset, we identified via hierarchical clustering a total of 27 modules of different size in terms of gene number, which were labelled by different colours according to WGCNA package functions. In WGCNA analyses, a module is a group of genes with strongly shared co-expression relationships and, therefore, these genes are more connected than other genes in the network. Next, we set out to evaluate how well the characteristics of the modules that we identified in the reference network (Normal) are reproduced in the test network (PDAC). By doing so, we may be able to identify non-preserved modules, i.e., modules whose network properties are altered between the Normal and PDAC networks. These modules may, in turn, be related to the development of PDAC. To this end, we first plotted the two networks and imposed the modules from the Normal network onto the PDAC network (Supplementary Fig. 5). Since we found that the module-coloured labels still cluster together in the PDAC network, this indicates that the module preservation between the two datasets is good. In order to subsequently quantify the module preservation and to identify lowly preserved modules, we used the modulePreservation function from the WGCNA package, resulting in Z-scores for each module (Supplementary Table 1). Modules with a Z-score higher than 10 are highly preserved, i.e., they have similar network characteristics in the PDAC network, whereas the grey60 (Z-score = 9.6) and lightgreen (Z-score = 8.4) modules are lowly preserved between the datasets, and they may thus distinguish normal from pathological conditions. The Grey and Gold modules are special WGCNA modules that should not be considered here (see Section 2).

3.3 Identification of hub genes and their functional annotations

For each identified module, the hub nodes (that is hub genes) common to both networks were identified (Supplementary Table 2). Next, we focused on the grey60 and lightgreen modules, since these were found to be lowly preserved between the Normal and PDAC networks (see above) and, as such, can potentially distinguish PDAC from normal samples. In order to identify the central (hub) nodes that well represent these modules, we analysed these modules in further detail. In Table 2 we have listed the top 20 hub genes identified in the PDAC network. Since these hub genes were found exclusively in the PDAC network, they may play important roles in the pathogenesis of PDAC and, therefore, warrant further validation. In order to provide an interpretation of the biological mechanisms associated with these hub genes, we used the DAVID tool for a functional enrichment analysis. In Table 3 over-represented (enriched) Gene Ontology (GO) terms with p-values < 0.05 are listed. In the grey60 module, genes related to endoplasmic reticulum were found to be significantly over-represented among the hub genes. In the lightgreen module, several GO terms were found, some of which are related to cellular compartments, such as “cytoplasmic part”, “membrane” and “mitochondrion”, while other terms are related to biological processes such as “lipid metabolic process”, “transferase activity”, “hydrolase activity” and “transmembrane transport”.

Table 2 Hub genes identified in the PDAC network restricted to the grey60 and lightgreen modules
Table 3 Functional annotation of hub genes in the grey60 and lightgreen modules

In order to identify common elements involved in gene expression regulation, we next performed gene enrichment analyses using the Enrichr tool. In Table 4 the transcription factors that are most over-represented in the grey60 and lightgreen modules are listed, of which RBPJ and FOXO3A were the most statistically significant ones. Regarding microRNAs, only miR-202, predicted to target the BCL7A and MANEAL genes, was found to be enriched in the grey60 module (p = 0.0112). Finally, we found that the chromosomal regions 7q21 and 3q28 were enriched in genes belonging to the lightgreen module, i.e., PON2 and SLC25A13 (p = 0.0012) and B3GNT5 (p = 0.0115), respectively. In the grey60 module, the chromosomal region 20q13, in which the STAU1 and ZNF334 genes are located, was found to be enriched (p = 0.0064).

Table 4 Enriched transcription factors binding to the promoters of hub genes

3.4 Stratification of PDAC patients into high- and low-risk groups based on novel candidate biomarkers

Finally, we assessed whether the grey60 and lightgreen modules, i.e., the very lowly preserved modules between the Normal and PDAC datasets, are associated with the overall survival (OS) of PDAC patients [14]. For each module, we performed both single-gene and multi-gene (top 20 hub genes) survival analyses using the SurvExpress tool on an independent PDAC dataset (GSE21501). In Supplementary Fig. 6 the Kaplan-Meier survival plots for OS are shown using as input either the top 20 hub genes or each gene individually. By doing so, we found that both the grey60 and the lightgreen top 20 hub gene signatures successfully stratified patients into high- and low-risk groups, with OS times of the high-risk group patients being more than three-fold shorter than those of patients in the low-risk group (HR 3.83 [95 % CI 2.26–6.5] p = 6.474e-07 for the grey60 module and HR 3.41 [95 % CI 1.95–5.85] p = 8.813e-06 for the lightgreen module). In addition, we carried out a single-gene analysis in order to reveal which genes were most significantly associated with OS. We found that an increased expression of the CAMKMT (HR 1.76 [95 % CI 1.07–2.89] p = 0.02552), PON2 (HR 1.97 [95 % CI 1.19–3.27] p = 0.008798) and SLC25A13 (HR 1.65 [95 % CI 1.01–2.7] p = 0.04636) genes was associated with a poor OS. Conversely, increased expression levels of the TBC1D24 (HR 1.8 [95 % CI 1.1–2.95] p = 0.02005) and CASK (HR 1.81 [95 % CI 1.11–2.96] p = 0.01742) genes were found to correlate with a better OS.

4 Discussion

Here we identified, for the first time, candidate genetic biomarkers for pancreatic ductal adenocarcinoma (PDAC) by applying weighted gene co-expression network analysis (WGCNA), a systems biology approach, on expression data derived from five microarray-based datasets of PDAC and normal samples. We found that two modules of co-expressed genes differed significantly between the Normal and PDAC networks, suggesting a role in the pathogenesis of PDAC. Subsequently, we narrowed down the list of genes within these modules by identifying only the hub genes, i.e., the most PDAC-related genes according to WGCNA. Functional enrichment analysis of these genes revealed that they are related to either endoplasmic reticulum (ER), mitochondrion or membrane functions, exhibit transferase or hydrolase activities, and are related to biological processes such as lipid metabolism or transmembrane transport.

In particular, we identified the ERLIN2 gene, coding for an ER lipid raft associated protein, that was previously found to be over-expressed in PanIN-3, a pancreatic intraepithelial neoplasia-derived cell line [29]. Moreover, we identified TXNDC12, a gene that was previously found to inhibit ER stress-induced apoptosis of cancer cells [30]. It is well known that high proliferation rates of cancer cells can lead to ER stress, since these cells require increased protein folding activity in the ER [31]. Under stress conditions, increased levels of unfolded and misfolded proteins can induce an unfolded protein response (UPR) that can either mitigate ER damage or induce apoptosis [31]. However, cancer cells may develop effective escape strategies to overcome and prevent UPR-induced apoptosis [31]. Recent work has shown that these strategies are also used by PDAC cells and has yielded therapeutic molecules that induce apoptosis by acting on the ER stress response system [32, 33].

Regarding mitochondria, we identified the VDAC1 gene coding for a mitochondrial protein that belongs to a tethering protein complex that allows physical contact between the ER and mitochondria, i.e., the mitochondria-associated membrane (MAM) [34]. Many proteins regulating the MAM have been associated with cancer [34], of which the VDAC1 protein has very recently been found to be highly expressed in PDAC samples and has been suggested to serve as a reliable biomarker [35]. In addition, siRNA-mediated silencing of VDAC1 gene expression has been found to inhibit the growth of pancreas cancer-derived cell lines and xenograft models [36], highlighting a pivotal role of VDAC1 in PDAC development. Moreover, it has been reported that during apoptosis VDAC1 allows the release from mitochondria of the apoptogenic factor cytochrome c (CYCS) [34], which we identified as another key factor and which was previously found to be highly expressed in invasive ductal adenocarcinomas [37]. Additionally, since it was recently reported that PDAC cells rely on mitochondrial oxidative phosphorylation for survival [38], CYCS, which is a central element in this pathway, may play a critical role. Whereas VDAC1 is an outer mitochondrial membrane protein, the hub gene MCU that we also identified codes for a calcium uniporter localized at the mitochondrial inner membrane. MCU has been proposed to act as a tumour suppressor gene, along with other calcium-related genes, since it shows loss of heterozygosity (LOH) in PDACs [39]. Besides MCU, our hub gene list includes additional calcium-related genes, such as CASK, KCNMB3, PON2, SLC25A13 and ZDHHC3. It should be noted here that alteration of the calcium pathway plays an important role in the initiation and progression of PDAC via the Ca2+/calmodulin, PI3Kα/Akt and Raf/MEK/ERK pathways [40, 41]. Moreover, it has recently been reported that higher serum calcium levels are associated with a poor prognosis in PDAC patients [42]. 
Among the calcium-related genes, PON2 seems to be interesting since it was found to be over-expressed in tumour epithelia [7] and since the PON1 gene, which is another member of the paraoxonase gene family, has been found to serve as a candidate biomarker for pancreatic cancer [43]. Also the ZDHHC3 gene warrants further investigation since, like its related hub gene ZDHHC4, it codes for a palmitoyltransferase. Recently, integrin palmitoylation levels have been correlated with breast cancer invasiveness [44].

Besides alterations in the calcium pathway, it is well known that cancer cells can reprogram their metabolism to support rapid proliferation, including a metabolic switch to anaerobic glycolysis (Warburg effect) [45, 46]. Another relevant altered metabolic pathway is lipid synthesis, which strongly increases [47]. It has been found that inhibition of lipogenesis, a decrease in lipoprotein lipase and enhancement of the biosynthesis of ceramide, a pro-apoptotic molecule, may inhibit cancer growth [47]. Moreover, it has been found that defects in ceramide metabolism may contribute to tumour cell survival and chemoresistance [48]. Among the key genes that we identified, ELOVL6 codes for a fatty acid elongase and NAPEPLD codes for a lipase, whereas CERS6 and B3GNT5 play a role in the ceramide synthesis pathway. In fact, CERS6 has been found to be strongly involved in several tumour types [49, 50]. Moreover, it was recently found that changes in lipid metabolism play a role especially in pancreatic carcinogenesis, since cholesterol uptake is increased in PDAC tissues and LDL receptor silencing reduces cell proliferation and enhances the chemotherapeutic efficacy in PDAC cells [51].

An additional key gene that should be further investigated is CEACAM1, a member of the carcinoembryonic antigen (CEA) gene family. In fact, it is considered as a PDAC biomarker, since it was found at higher expression levels in PDAC samples compared to noncancerous pancreatic samples [52], to be highly expressed in metastatic PDAC-derived cell lines [53] and to be present at high levels in serum of PDAC patients [54]. Moreover, its related protein CEA (also known as CEACAM5), is currently used as a conventional PDAC biomarker. Finally, some of the hub genes belonging to the grey60 module have been found by others to be highly expressed in PDAC, such as the C15ORF52 [55], TMEM51 [29] and LARP1 [37] genes.

We also sought to identify genes potentially related to PDAC by searching for gene expression regulatory elements that were enriched in our hub gene lists. Through this analysis, several common transcription factors were identified that have previously been related to PDAC development, including the well-studied SP1, HIF1A, FOXO3A and TP63 proteins (Supplementary Table 3). Through our miRNA enrichment analysis we identified miR-202, which has been proposed to serve as a therapeutic target since, when expressed at a low level, it has been found to induce apoptosis of PDAC cells [56]. Finally, the genomic regions 7q21-q22 and 20q13, that we identified through chromosomal enrichment analysis, are known to be frequently altered in PDAC and, thus, may play a causal role in its tumorigenesis [57, 58].

Moreover, patient overall survival analyses showed that the grey60 and lightgreen modules could effectively stratify high- and low-risk PDAC patients in an independent gene expression dataset. In particular, five genes (CAMKMT, CASK, PON2, SLC25A13 and TBC1D24) were found to be the major contributors to these gene signatures and, therefore, they may serve as prognostic biomarkers for PDAC.

Overall, the systems biology approach that we adopted in this study (WGCNA) allowed us to identify two modules of co-expressed genes related to pancreatic cancer, including some genes already known to serve as candidate PDAC biomarkers, which supports the reliability of our results. Our results do, however, differ from those of three previously reported studies, performed on single PDAC microarray datasets [7–9]. This could be due to differences in input data and the adopted method. In fact, our analysis has been performed on five PDAC datasets, including the three above-mentioned ones, since large sample sizes are required for WGCNA. Moreover, WGCNA seeks hub genes in a co-expression network, in contrast to the cited studies in which primarily differentially expressed genes were identified. The previously reported studies yielded different results even among each other, both in terms of differentially expressed genes and biomarker genes related to overall survival. These differences could be due to patient variables such as treatment, stage, gender, demographics, ethnicity, etc. Interestingly, several physiologic, hormonal and genomic differences between males and females may be relevant to gender-specific cancer susceptibility [59] or even to differential sensitivities of cells to anti-neoplastic agents [60]. It has been reported that significant differences in PDAC incidences do exist among different ethnic groups, correlating both with disparities in access to care and with genetic and other unknown risk factors [61, 62]. Unfortunately, we cannot assess these variables since this patient information is lacking in the microarray datasets used in this and in the three other cited studies [7–9]. Therefore, a further analysis and validation of the candidate PDAC biomarkers reported here is necessary, including those that have not yet been associated with PDAC.