Introduction

Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal cancers in the world. It is characterized by high aggressiveness, early metastasis, and insensitivity to chemotherapy and radiotherapy [1]. Its 5-year survival rate is only 8% according to the latest data [2], and little improvement has been seen over the past years [1, 3]. Hence, elucidating the molecular pathophysiological mechanisms and identifying the key signaling pathways and regulators are urgently needed to develop novel screening, diagnostic, and therapeutic strategies.

Recently, microarray technology has been used extensively to detect genetic alterations during tumorigenesis and cancer progression. With this technology, several gene expression profiling studies have reported hundreds of differentially expressed genes (DEGs) in PDAC carcinogenesis, which are involved in various pathways, biological processes, and molecular functions. Analysis of the DEGs that overlap across several profiles may be more reliable than analysis of a single expression profile. In this study, we used integrated bioinformatics methods to find the overlapping DEGs, analyzed functional and pathway enrichment and the protein–protein interaction (PPI) network to find possible hub genes, and used The Cancer Genome Atlas (TCGA) database to obtain survival data and predict the prognostic value of the hub genes.

Methods

Collection of studies

We searched the GEO database (https://www.ncbi.nlm.nih.gov/geo/) for publicly available datasets from inception to February 10, 2018, using the following strategy: “pancreatic cancer”, “homo sapiens” (Organism), “tissue” (Attribute Name), “RNA” (Sample Type), “expression profiling by array” (Study Type). Further inclusion criteria were as follows: (1) samples comprising both PDAC tissues and normal tissues, (2) gene expression profiling of mRNA, (3) more than 10 samples in each group and more than 30 samples in total, and (4) sufficient information to perform the analysis. Five gene expression profiles (GSE62452, GSE15471, GSE102238, GSE16515, and GSE62165) were finally chosen.

Microarray data and data processing

The GSE62452 dataset contained 69 tumor samples and 61 normal samples [4], GSE15471 consisted of 39 tumor samples and 39 normal samples [5], GSE102238 included 50 tumor samples and 50 normal samples [6], GSE16515 comprised 36 tumor samples and 16 normal samples [7], and GSE62165 comprised 118 tumor samples and 13 normal samples [8].

GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/) is an R-based tool for screening DEGs [9]. After the series accession number is entered, the groups are defined, samples are assigned to the groups, and “Top 250” is clicked, the tool compares the groups. After saving the results, we selected the genes with adjusted P-values (adj. P) < 0.05 and |logFC| > 1. A Venn diagram (http://bioinformatics.psb.ugent.be/webtools/Venn/) was drawn to identify the overlapping genes.
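The selection and overlap steps can be sketched in a few lines of Python. This is only an illustration of the stated thresholds (adj. P < 0.05, |logFC| > 1) and of a set intersection across datasets; the tuple layout, the function names, and the placeholder gene names are assumptions made here, not the GEO2R export format or the Venn tool itself. The text does not say whether overlapping genes also had to agree in direction, so the sketch intersects gene symbols only.

```python
ADJ_P_CUTOFF = 0.05   # adj. P threshold stated in the text
LOGFC_CUTOFF = 1.0    # |logFC| threshold stated in the text


def select_degs(rows):
    """Keep genes with adj. P < 0.05 and |logFC| > 1.

    rows: iterable of (gene_symbol, adj_p, log_fc) tuples (assumed layout).
    Returns a dict mapping gene symbol to "up" or "down".
    """
    degs = {}
    for gene, adj_p, log_fc in rows:
        if adj_p < ADJ_P_CUTOFF and abs(log_fc) > LOGFC_CUTOFF:
            degs[gene] = "up" if log_fc > 0 else "down"
    return degs


def overlapping_degs(degs_by_dataset):
    """Return the genes selected in every dataset (the Venn intersection)."""
    gene_sets = [set(degs) for degs in degs_by_dataset.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


# Toy input with placeholder gene names; the values are invented for illustration.
toy_tables = {
    "dataset_1": [("GENE_A", 0.001, 2.3), ("GENE_B", 0.20, 1.8), ("GENE_C", 0.01, -1.5)],
    "dataset_2": [("GENE_A", 0.004, 1.4), ("GENE_B", 0.01, 1.2), ("GENE_C", 0.03, -2.0)],
}
toy_degs = {name: select_degs(rows) for name, rows in toy_tables.items()}
shared = overlapping_degs(toy_degs)
print(sorted(shared))
```

Both cut-offs are strict inequalities, as in the text, so a gene with adj. P exactly 0.05 or |logFC| exactly 1 is excluded.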

Functional and pathway enrichment analysis

The Database for Annotation, Visualization and Integrated Discovery (DAVID) Version 6.8 (https://david.ncifcrf.gov/) is a comprehensive set of functional annotation tools for understanding the biological meaning behind gene lists [10]. Using Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, we identified the potentially relevant functional annotations. P < 0.05 was considered statistically significant.
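To illustrate what an enrichment P-value measures, the following minimal sketch computes a one-sided hypergeometric tail probability for a single annotation term. It is a simplification: DAVID reports a modified Fisher's exact P-value (the EASE score), which is more conservative than the plain tail used here. The function name and argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(hits, list_size, term_size, background_size):
    """One-sided P(X >= hits) for X ~ Hypergeometric.

    hits:            genes in the input list annotated with the term
    list_size:       genes in the input list
    term_size:       genes in the background annotated with the term
    background_size: genes in the background
    """
    max_hits = min(list_size, term_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


def is_enriched(hits, list_size, term_size, background_size, alpha=0.05):
    """Apply the P < 0.05 significance cut-off stated in the text."""
    return hypergeom_enrichment_p(hits, list_size, term_size, background_size) < alpha
```

The sketch tests one term at a time and applies no multiple-testing correction, matching the plain P < 0.05 criterion given above.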

PPI network construction and module analysis

The STRING database Version 10.5 (http://string-db.org/) collects and integrates interactions between proteins, including direct (physical) and indirect (functional) interactions [11]. A combined score > 0.4 was set as the cut-off criterion. We then used Cytoscape Version 3.6.0 [12] to visualize the biomolecular interaction network of the DEGs. Genes with node degree ≥ 10 were defined as hub genes. The Molecular Complex Detection (MCODE) plug-in was used to screen modules from the PPI network with degree cutoff 2, haircut on, node score cutoff 0.2, k-core 2, maximum depth 100, and more than 8 nodes. Functional and pathway enrichment analysis of the genes in the modules was performed with DAVID.
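The hub-gene criterion amounts to counting each node's neighbors after the score filter. The sketch below assumes the interactions are available as (gene_a, gene_b, combined_score) tuples with scores on a 0 to 1 scale; that layout and the function name are assumptions for illustration, and the sketch does not reproduce Cytoscape or MCODE.

```python
from collections import defaultdict


def hub_genes(edges, min_score=0.4, min_degree=10):
    """Return {gene: degree} for genes with degree >= min_degree.

    edges: iterable of (gene_a, gene_b, combined_score) tuples (assumed layout).
    Only interactions with combined_score > min_score are kept, and
    duplicate or self interactions are counted at most once.
    """
    neighbors = defaultdict(set)
    for gene_a, gene_b, score in edges:
        if score > min_score and gene_a != gene_b:
            neighbors[gene_a].add(gene_b)
            neighbors[gene_b].add(gene_a)
    return {
        gene: len(linked)
        for gene, linked in neighbors.items()
        if len(linked) >= min_degree
    }


# Toy star-shaped network with placeholder names; scores are invented.
toy_edges = [("CENTER", f"LEAF_{i}", 0.9) for i in range(10)]
toy_edges.append(("LEAF_0", "LEAF_1", 0.3))  # dropped by the score filter
toy_hubs = hub_genes(toy_edges)
print(toy_hubs)
```

Storing neighbors in sets keeps the degree equal to the number of distinct interaction partners even if the same pair appears more than once in the input.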

Survival analysis of hub genes

OncoLnc (http://www.oncolnc.org/) is a tool for interactively exploring survival correlations, which contains survival data for 21 cancer studies performed by TCGA [13]. The PDAC patients were divided into two groups: low (expression lower than the first quartile) and high (expression higher than the third quartile). Overall survival of the two groups was assessed with Kaplan–Meier plots and the log-rank test, and a log-rank P-value < 0.05 was the cut-off criterion. Since OncoLnc does not provide the hazard ratio (HR) with its 95% confidence interval (CI), we downloaded the data from OncoLnc and used IBM SPSS Statistics Version 24.0.0.0 to perform the survival analysis and calculate the HR and 95% CI. Finally, we also performed survival analysis of the hub genes using data from the GEO datasets GSE62452 and GSE71729 to validate the results.
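The grouping and the two-group comparison can be sketched as follows. This is a minimal standard-library illustration, not the OncoLnc or SPSS procedure: the quartile convention (the default of Python's `statistics.quantiles`), the data layout, and the function names are assumptions, and the HR with 95% CI is not computed.

```python
import math
import statistics


def quartile_groups(expression):
    """Split patients into low (< first quartile) and high (> third quartile).

    expression: dict mapping patient id to expression value (assumed layout).
    """
    q1, _, q3 = statistics.quantiles(list(expression.values()), n=4)
    low = [pid for pid, value in expression.items() if value < q1]
    high = [pid for pid, value in expression.items() if value > q3]
    return low, high


def logrank_test(group_a, group_b):
    """Two-group log-rank test.

    Each group is a list of (time, event) pairs, event = 1 for death and
    0 for censoring. Returns (chi_square, p_value) with 1 degree of freedom.
    """
    event_times = sorted({t for t, event in group_a + group_b if event})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for time, _ in group_a if time >= t)
        at_risk_b = sum(1 for time, _ in group_b if time >= t)
        deaths_a = sum(1 for time, event in group_a if time == t and event)
        deaths_b = sum(1 for time, event in group_b if time == t and event)
        n = at_risk_a + at_risk_b
        d = deaths_a + deaths_b
        observed_minus_expected += deaths_a - d * at_risk_a / n
        if n > 1:
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value
```

Patients between the two quartiles fall into neither group, which mirrors the low/high definition given above. A result would be called significant under the stated criterion when the returned P-value is below 0.05.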

Results

Identification of DEGs

A total of 295, 1793, 2133, 1824, and 4063 genes were extracted from GSE62452, GSE15471, GSE102238, GSE16515, and GSE62165, respectively. Among them, 179 DEGs overlapped (Fig. 1), of which 130 were up-regulated and 49 were down-regulated.

Fig. 1 Identification of differentially expressed genes (DEGs) in GSE62452, GSE15471, GSE102238, GSE16515, and GSE62165

Functional and pathway enrichment analysis

GO biological process (BP) analysis indicated that the up-regulated DEGs were significantly enriched in extracellular matrix and structure organization and in collagen catabolic and metabolic processes, while the down-regulated genes were mainly involved in proteolysis, reactive oxygen species metabolic process, homeostatic process, and cellular response to starvation. For GO cellular component (CC), the up-regulated DEGs were significantly enriched in extracellular region part, extracellular matrix, and proteinaceous extracellular matrix, and the down-regulated genes were mainly involved in extracellular space and extracellular region part. For GO molecular function (MF), the up-regulated DEGs were significantly enriched in extracellular matrix structural constituent, collagen binding, and integrin binding, while the down-regulated genes were mainly involved in exopeptidase activity, peptidase activity, and serine-type peptidase activity. Additionally, KEGG analysis showed that the up-regulated DEGs were significantly enriched in extracellular matrix (ECM)-receptor interaction, focal adhesion, and the phosphoinositide 3-kinase (PI3K)-Akt signaling pathway, while the down-regulated genes were mainly involved in pancreatic secretion and in complement and coagulation cascades (Fig. 2; where more than five terms were enriched in a category, the top five were chosen according to P-value).

Fig. 2 Functional and pathway enrichment analysis of up-regulated and down-regulated genes in pancreatic ductal adenocarcinoma (PDAC) tissue

PPI network construction and module analysis

In total, 126 nodes and 327 edges were mapped in the PPI network of identified DEGs (Fig. 3a). Twenty-one genes with degree ≥ 10 were chosen as hub genes (Table 1). Through the MCODE plug-in, one significant module was selected with average MCODE score = 5.6, nodes = 16 and edges = 42 (Fig. 3b). Functional enrichment analysis indicated that the up-regulated genes in the significant module were enriched in extracellular matrix organization, extracellular structure organization, extracellular matrix disassembly, and collagen metabolic process, while the down-regulated genes were involved in platelet degranulation (Fig. 4).

Fig. 3 a Protein–protein interaction (PPI) network of differentially expressed genes in pancreatic ductal adenocarcinoma. b A significant module selected from the PPI network. Red nodes represent up-regulated genes and green nodes represent down-regulated genes. A darker node color indicates a larger |log FC|, and a larger node size indicates a higher degree of connectivity. The lines represent interactions between the nodes, and a wider line indicates a larger combined score

Table 1 Top 21 hub genes with higher degree of connectivity

Fig. 4 Functional and pathway enrichment analysis of up-regulated and down-regulated genes in the significant module

Survival analysis of hub genes

OncoLnc predicted that, among the selected hub genes, high mRNA expression of ITGA2, MMP7, ITGB4, ITGA3, VCAN, and PLAU may be associated with poor survival of PDAC patients (P < 0.05). The survival analysis performed with SPSS was more conservative than that of OncoLnc: it suggested that the association between high MMP7 expression and poor survival is debatable, as its P-value was 0.053 (Fig. 5). Survival analysis of the hub genes using the data in GSE62452 confirmed that high expression of ITGA2, ITGB4, and ITGA3 predicted poor survival. Data from GSE71729 showed that high expression of ITGB4 and PLAU was associated with poor survival (Table 2).

Fig. 5 Prognostic values of ITGA2, MMP7, ITGB4, ITGA3, VCAN and PLAU in pancreatic ductal adenocarcinoma patients

Table 2 Validation of prognostic values of ITGA2, MMP7, ITGB4, ITGA3, VCAN and PLAU in PDAC patients

Discussion

PDAC, with its high mortality and short survival, is a malignancy that poses a serious threat to human health [1]. Unfortunately, early diagnosis and effective treatment of PDAC remain major problems because the molecular mechanisms that drive its occurrence and development are poorly understood. It is therefore vitally important to investigate these factors and mechanisms in depth, which might help in PDAC diagnosis and therapy [14]. With the development of bioinformatics and microarray technology, large volumes of valuable patient data can be shared, and it is much easier to determine the general genetic alterations involved in disease occurrence, progression, and prognosis, which may shed light on hub genes or targets of clinical utility.

In this study, we identified five datasets comparing mRNA expression between tumor tissues and normal tissues. A total of 179 DEGs were screened, including 130 up-regulated genes and 49 down-regulated genes. Functional enrichment analysis revealed that the up-regulated DEGs were significantly enriched in extracellular matrix and structure organization and in collagen catabolic and metabolic processes, while the down-regulated genes were mainly involved in proteolysis, reactive oxygen species metabolic process, and homeostatic process. Additionally, KEGG pathway analysis showed that the up-regulated DEGs were significantly enriched in ECM-receptor interaction, focal adhesion, and the PI3K-Akt signaling pathway, while the down-regulated genes were mainly involved in pancreatic secretion and in complement and coagulation cascades.

These results are in accordance with previous studies, which showed that PDAC is characterized by a dense stromal response and that stromal elements contribute to its progression [15, 16]. Begum et al. showed that ECM proteins increased PDAC tumor-initiating potential, self-renewal, and the frequency of cancer stem cells. This agreement suggests that the present method is effective in identifying hub genes. The PI3K-Akt pathway is vital in various cancers [17, 18]. Its dysregulation is common in PDAC [17], and up to 60% of PDAC cases have increased PI3K-Akt activity [17, 19, 20]. Liu et al. showed that inactivation of PI3K-Akt increased gemcitabine-induced apoptosis in pancreatic cancer cells [21]. Hence, targeting this pathway might be a novel therapeutic approach for PDAC. Ebrahimi et al. summarized the agents targeting PI3K-Akt; only Wortmannin, LY294002, and Perifosine have been tested in pancreatic cancer, and the results may be encouraging [17].

Through PPI network construction, we identified the top 21 genes with high degrees of connectivity: ALB, COL11A1, COL1A1, COL3A1, COL5A2, COL6A3, FBN1, FN1, COMP, EGF, ITGA2, ITGA3, ITGB4, MMP1, MMP7, PLAT, PLAU, THBS2, POSTN, SPARC, and VCAN. Among them, only ALB and EGF were down-regulated. Further survival analysis indicated that high expression of ITGA2, MMP7, ITGB4, ITGA3, VCAN, and PLAU may predict poor survival. In the validation analysis using data from GSE62452 and GSE71729, high expression of ITGA3, ITGA2, ITGB4, and PLAU was found to be associated with poor survival.

To date, no study has reported on the tissue expression of ALB (albumin) in PDAC; there are only studies investigating the correlation between serum ALB and PDAC. Deng et al. developed a nomogram for predicting survival in PDAC patients in which a decreased ALB level indicated poor survival [22]. Arima and Liu et al. reported that a high C-reactive protein (CRP)/ALB ratio also indicated poor survival [23]. However, the serum ALB level is influenced by the patient’s nutritional status, liver function, and other factors, and may not reflect the expression of ALB in the tissues. Further studies are therefore needed on this issue.

COL11A1, COL1A1, COL3A1, COL5A2, and COL6A3 are all collagen genes. In the study by Garcia-Pravia et al., the expression of COL11A1 was significantly increased in PDAC samples compared with normal and chronic pancreatitis (CP) tissues, and the authors further pointed out that proCOL11A1 may be a powerful new marker for PDAC diagnosis [24]. Araft et al. showed that PDAC tissues had significantly upregulated COL6A3 protein levels compared with paired adjacent tissues, and that the presence of COL6A3 isoforms and high protein levels appeared to correlate with tumor stage [25]. For the remaining collagen genes, no study has yet shown a relationship with PDAC, which may be a direction for future work.

FBN1 (fibrillin 1) and FN1 (fibronectin 1) are also ECM components. Studies of FBN1 in PDAC are rare; only one reported that FN1 and FBN1 were found in significantly higher abundance as pancreatic islets progressed through the angiogenic stage to insulinoma [26]. Hu et al. verified 25 protein biomarker candidates for PDAC prognosis and proposed that upregulated FN1 may predict poor survival [27]. Our survival analysis did not confirm this, and the discrepancy may lie in how the groups were defined: in their study, a fold change ≥ 2 was regarded as upregulated, while in our study, expression higher than the third quartile was taken as high expression.

COMP (cartilage oligomeric matrix protein) is a member of the thrombospondin family of ECM proteins. It has been shown to be preferentially expressed in degenerating acinar cells in CP-like areas in pancreatic cancers and in CP, indicating that this molecule is important in the course of acinar cell deterioration and dedifferentiation [28]. Based on those results, COMP could be a marker for PDAC with CP-like areas, but may not help in differentiating CP from PDAC.

The epidermal growth factor receptor (EGFR) signaling pathway is closely related to tumorigenesis and progression [29]. Early studies [30, 31, 32] reported overexpression of EGF (epidermal growth factor) and EGFR in pancreatic cancer samples compared with normal ones; on this point, our study contradicts them. However, in another microarray analysis study, which used GSE16515 alone [33], the result was in accordance with ours, supporting that EGF was downregulated. The reason for this discrepancy is unclear. It may be that what is known about the EGFR family is only the tip of the iceberg, and that multiple members participate in the aberrant autocrine and paracrine activation of this pathway [34]. On the other hand, Uegaki et al. suggested that the expression of EGF or EGFR alone does not reflect the prognosis of patients, whereas their coexpression does [35]. Therefore, new drugs blocking the EGFR pathway still need more exploration before improved treatments become available.

ITGA2, ITGA3, and ITGB4 are all integrin subunits, which have important functions in epithelial-mesenchymal transition (EMT). Nones et al. [36] showed that, in patients with PDAC, hypomethylation of ITGA2 correlated with high gene expression, which was related to poor survival; this result agrees with ours. Another study showed that ITGA3 was overexpressed in PDAC and that overexpression of ITGA3 correlated with poor survival [37], which also agrees with ours. Yamazaki and Masugi et al. first used microarray analysis to identify ITGB4 as upregulated in high-EMT xenografts derived from PDAC patients [38]. They additionally showed that overexpression of ITGB4 promoted cell motility and may play a role in regulating invasion and EMT [39]. Our study further suggests that high expression of ITGB4 is a risk factor for poor survival, and the results were validated in both GSE62452 and GSE71729, though no trial has been reported on this issue.

MMP1 and MMP7 are both matrix metalloproteinases (MMPs), which have long been implicated in cancer initiation and invasion [40]. Pancreatic cancer cells can induce alterations in MMPs in pancreatic stem cells (PSCs), including upregulated expression and activation of MMP1, and enhanced migration [41]. Fukuda et al. established that Stat3 signaling enforces MMP7 expression in pancreatic cancer cells, that MMP7 deletion restricts tumor size and metastasis in mice, and that increased expression of MMP7 predicted shortened survival [42]. Our survival analysis of MMP7 also supports this. Although data from GSE62452 and GSE71729 did not support this result, this may be because the sample sizes of the two datasets are not large enough, and studies with larger sample sizes are still needed for validation.

Plasminogen activator, tissue type (PLAT) and plasminogen activator, urokinase (PLAU) are both plasminogen activators. Bournet et al. used endoscopic ultrasound-guided fine needle aspiration biopsy samples to compare gene expression between advanced PDAC and pseudotumoural CP, and they demonstrated that PLAT and PLAU were significantly overexpressed in cancer samples [43]. In addition, PLAU is highly expressed in more invasive pancreatic cells, and a panel combining CDH3, LENG, and PLAU was significantly associated with poor survival [44]. In our study, high expression of PLAU also predicted poor survival, and data from GSE71729 further validated this result.

Basic research on THBS2 (thrombospondin-2) in PDAC is scarce. Kim et al. revealed that the concentration of plasma THBS2 discriminated among all stages of PDAC, and that a new measurement combining THBS2 and CA19-9 helped to increase the specificity to 98% in diagnosing PDAC [45], suggesting that a combined blood marker panel may improve the detection of PDAC. POSTN (periostin) is a secreted protein that functions in cell adhesion. It has been shown to drive the carcinogenic process and, furthermore, to increase chemoresistance to gemcitabine in pancreatic cancer cells [46]. Yu et al. showed that SPARC (secreted protein acidic and rich in cysteine) was differentially expressed not only between PDAC samples and normal samples but also between metastatic and normal lymph nodes; moreover, patients with positive POSTN expression had poor overall survival [47]. VCAN (versican) is a proteoglycan that has been shown to be greatly increased in the PDAC matrix, and a disproportionate increase of VCAN relative to decorin, another proteoglycan with an opposing role, may be associated with the aggressiveness of PDAC [48]. Survival analysis in our study indicated that high expression of VCAN was a risk factor for poor survival, but we failed to validate this with the data from GSE62452 and GSE71729, and no study has yet investigated the value of VCAN expression for survival in PDAC, so this result still needs verification.

Conclusion

This study identified 179 DEGs and 21 hub genes among them: ALB, COL11A1, COL1A1, COL3A1, COL5A2, COL6A3, FBN1, FN1, COMP, EGF, ITGA2, ITGA3, ITGB4, MMP1, MMP7, PLAT, PLAU, THBS2, POSTN, SPARC, and VCAN. In addition, high expression of ITGA2, ITGA3, ITGB4, MMP7, PLAU, and VCAN may be a predictor of poor survival. High-throughput technology, such as microarray analysis, together with integrated bioinformatic analysis, assists in the identification of hub genes in tumorigenesis and progression, and the results are consistent with previous studies. The significant genes and pathways may open up new possibilities for early detection and treatment of PDAC; however, further research is still required to untangle the mechanisms of PDAC occurrence and development.