Introduction

Pancreatic adenocarcinoma is one of the most malignant and aggressive solid tumors. It is characterized by atypical symptoms, a concealed anatomical location, rapid disease progression and poor prognosis. According to the American Cancer Society 2021 report, the 5-year survival rate of pancreatic cancer patients is only 8% [1]. Surgical resection remains the only potentially curative treatment for pancreatic cancer [2], and it is followed by adjuvant gemcitabine or targeted therapy. However, most patients develop gemcitabine resistance [3], resulting in recurrence and chemotherapy failure. New biomarkers of pancreatic cancer are therefore needed for early diagnosis and for the development of new drug targets.

In pancreatic cancer tissue, malignant tumor cells make up only a small fraction of the tumor mass; most of the remainder consists of extracellular matrix [4] and proliferating pancreatic stellate cells and fibroblasts [5]. In addition, pancreatic cancer has a highly immunosuppressive microenvironment that promotes cancer cell proliferation by directly suppressing antitumor immunity and by enabling evasion of immune surveillance [6,7,8]. Clinically, disrupting this immunosuppressive network and enhancing the tumor-killing activity of immune effector cells could improve patient outcomes. Several immunotherapies for pancreatic cancer are currently in clinical trials, including immune system modulators targeting T cell receptors [9], immune checkpoint inhibitors, CAR-T cell therapy and tumor vaccines. However, although various immunotherapies have been tested in pancreatic cancer patients, most have failed to achieve the clinical benefit seen in other malignancies [10, 11]. Understanding the molecular mechanisms that shape the immune microenvironment of pancreatic cancer may therefore provide new therapeutic opportunities.

Estimation of STromal and Immune cells in MAlignant Tumours using Expression data (ESTIMATE) is an algorithm proposed by Yoshihara and colleagues in 2013 [12]. It calculates stromal and immune scores for tumor tissue from gene expression data, from which tumor purity can be inferred. This method has been used to explore the immune microenvironment of leukemia [13], stomach adenocarcinoma [14] and hepatocellular carcinoma [15], among other cancers. In this study, we used this algorithm to explore previously uncharacterized aspects of the immune microenvironment of pancreatic cancer.

In this research, gene expression profiles of pancreatic adenocarcinoma patients were obtained from TCGA (excluding special pathological types). R (including the edgeR package) and weighted correlation network analysis (WGCNA) were used to identify prognosis-associated differentially expressed genes (DEGs). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were then performed on these DEGs, and the DEGs were integrated into a protein–protein interaction (PPI) network to identify prognosis-related hub genes. Finally, the hub genes were validated in the Gene Expression Omnibus (GEO) database and in Chinese pancreatic cancer tissues.

Materials and methods

Database and ESTIMATE score

The RNA-Seq expression profiles and corresponding clinical information of PAAD patients were obtained from the TCGA data portal of the Genomic Data Commons (https://portal.gdc.cancer.gov/). The gene expression datasets GSE78229 and GSE85916 were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). GSE78229 was generated on the GPL6244 platform ([HuGene-1_0-st] Affymetrix Human Gene 1.0 ST Array) and contains 50 PAAD patients. GSE85916 was generated on the GPL13669 platform (Affymetrix Human Genome U219 Array) and contains 80 PAAD patients.

ESTIMATE is a tumor purity algorithm that uses gene expression data to infer the abundance of infiltrating stromal and immune cells in tumor tissue. Stromal and immune scores are calculated with single-sample gene set enrichment analysis (ssGSEA). Stromal, immune and ESTIMATE scores for each sample in the TCGA-PAAD cohort were downloaded from the official website (https://bioinformatics.mdanderson.org/estimate/). The association between these scores and patient overall survival was then analyzed.
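To illustrate how the downloaded scores relate to one another, the sketch below combines the two component scores and converts the result to a purity estimate. It assumes the ESTIMATE score is the sum of the stromal and immune scores and uses the purity equation published with ESTIMATE, cos(0.6049872018 + 0.0001467884 × ESTIMATE score). That equation was derived from Affymetrix array data, so applying it to TCGA RNA-seq scores is an assumption; the function names are hypothetical.

```python
import math


def estimate_score(stromal: float, immune: float) -> float:
    """ESTIMATE score, assumed here to be the stromal score plus the immune score."""
    return stromal + immune


def tumor_purity(estimate: float) -> float:
    """Tumor purity from the ESTIMATE score (equation calibrated on Affymetrix data)."""
    return math.cos(0.6049872018 + 0.0001467884 * estimate)


# Higher ESTIMATE scores (more stroma and immune cells) give lower purity.
for score in (-3000.0, 0.0, 4000.0):
    print(score, round(tumor_purity(score), 3))
```

Across the score ranges reported in the Results, purity falls monotonically as the ESTIMATE score rises, which is why high immune/stromal scores are read as low tumor purity.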

Identification of DEGs based on immune and stromal scores

All PAAD patients were classified into two groups (high vs. low) according to their immune or stromal score, using 0 as the threshold. Differential expression analysis was performed with the edgeR package. DEGs were defined as genes with |log2(fold change)| > 1 and adjusted P < 0.05.
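The grouping and filtering rule can be sketched in plain Python. This is not edgeR: it assumes per-gene log2 fold changes and raw P values have already been computed (for example by edgeR's exact test), applies the Benjamini–Hochberg adjustment that edgeR's topTags uses by default, and then applies the |log2FC| > 1 and adjusted P < 0.05 cutoffs. Placing samples with a score of exactly 0 in the low group is an assumption, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def split_by_score(scores: Dict[str, float], threshold: float = 0.0) -> Tuple[List[str], List[str]]:
    """Split samples into high (> threshold) and low (<= threshold) groups."""
    high = sorted(s for s, v in scores.items() if v > threshold)
    low = sorted(s for s, v in scores.items() if v <= threshold)
    return high, low


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(results: List[Tuple[str, float, float]],
                lfc_cutoff: float = 1.0,
                fdr_cutoff: float = 0.05) -> Tuple[List[str], List[str]]:
    """results holds (gene, log2 fold change, raw P value) tuples."""
    padj = benjamini_hochberg([p for _, _, p in results])
    up, down = [], []
    for (gene, lfc, _), q in zip(results, padj):
        if q >= fdr_cutoff:
            continue
        if lfc > lfc_cutoff:
            up.append(gene)
        elif lfc < -lfc_cutoff:
            down.append(gene)
    return up, down
```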

Weighted correlation network analysis (WGCNA)

WGCNA is a method for describing patterns of gene correlation across samples. Genes with similar expression patterns are clustered into modules, and the relationships between modules and specific traits or phenotypes can then be analyzed. Co-expression networks of the DEGs were constructed with the WGCNA package, and the clustering results are shown as a dendrogram in which genes with similar expression patterns are assigned to the same color-coded module. To identify prognosis-related modules, heatmaps of the correlations (with P values) between modules and the clinical traits age, gender, overall survival (OS) time and survival status were plotted.
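The core network construction step can be illustrated with a small sketch, assuming WGCNA's default unsigned network: the adjacency is a_ij = |cor(x_i, x_j)|^β, and the topological overlap is TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij = Σ_u a_iu a_uj and k_i is the connectivity of gene i. The default β = 5 mirrors the power reported in the Results. This pure-Python version is for illustration only and is not the WGCNA package, which then clusters 1 - TOM hierarchically to define modules.

```python
import math
from typing import List

Matrix = List[List[float]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_adjacency(expr: Matrix, beta: int = 5) -> Matrix:
    """Unsigned adjacency a_ij = |cor(gene_i, gene_j)| ** beta (rows are genes)."""
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return adj


def topological_overlap(adj: Matrix) -> Matrix:
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with TOM_ii = 1."""
    g = len(adj)
    k = [sum(row) for row in adj]  # connectivity; the diagonal of adj is 0
    tom = [[1.0 if i == j else 0.0 for j in range(g)] for i in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            shared = sum(adj[i][u] * adj[u][j] for u in range(g))
            value = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom
```

Raising correlations to a power suppresses weak correlations (0.8^5 ≈ 0.33) while keeping strong ones near 1, which is what pushes the network toward a scale-free shape.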

DEGs function analysis

To analyze the function of the above DEGs, GO and KEGG pathway enrichment analyses were performed with the Database for Annotation, Visualization and Integrated Discovery (DAVID, https://david-d.ncifcrf.gov/). The GO analysis covered biological processes (BP), cellular components (CC) and molecular functions (MF). P < 0.05 was considered statistically significant.
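Term enrichment of this kind is commonly based on a one-sided Fisher exact (hypergeometric) test. The sketch below computes that tail probability for a single GO term or KEGG pathway. DAVID itself reports a more conservative modified Fisher exact P value (the EASE score), so this approximates rather than reimplements DAVID's statistic; the parameter names are hypothetical.

```python
from math import comb


def enrichment_p(hits: int, list_size: int, term_size: int, population: int) -> float:
    """One-sided P(X >= hits) for X ~ Hypergeometric.

    hits: genes in the submitted list annotated to the term
    list_size: genes in the submitted list
    term_size: background genes annotated to the term
    population: background genes
    """
    upper = min(term_size, list_size)
    tail = sum(comb(term_size, i) * comb(population - term_size, list_size - i)
               for i in range(hits, upper + 1))
    return tail / comb(population, list_size)
```

For example, if all 4 genes in a list fall in a 5-gene term drawn from a 20-gene background, the tail probability is 5 / 4845, about 0.001.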

PPI network establishment and Hub gene identification

The STRING database (https://string-db.org/) [16] was used to obtain protein interaction information and construct the PPI network of the DEGs. MCODE, a plug-in for Cytoscape [17], was used to identify densely connected functional modules (clusters) within the network.
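Ranking nodes by degree, the measure used for node size in the PPI figure, can be sketched from an undirected edge list such as one exported from STRING. This is plain degree centrality, not MCODE, which instead scores vertices by the density of their local (k-core based) neighborhood and grows clusters from high-scoring seeds. The gene names in the test are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def degree_ranking(edges: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Rank nodes of an undirected network by degree (ties broken by name)."""
    degree: Dict[str, int] = defaultdict(int)
    seen = set()
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        key = frozenset((a, b))
        if key in seen:
            continue  # count each undirected edge once
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))
```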

Survival analysis

Kaplan–Meier survival curves were constructed for these DEGs, and the log-rank test was used to identify DEGs whose expression was significantly associated with overall survival (P < 0.05).
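The two survival calculations can be sketched as follows: the Kaplan–Meier product-limit estimate, and the two-group log-rank statistic, which compares observed and expected deaths at each event time and has one degree of freedom. How patients were split into high and low expression groups is not stated in the text, so the inputs here are simply two hypothetical groups of follow-up times and event indicators.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate at each distinct event time.

    events[i] is 1 for death and 0 for censoring.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored cases both leave the risk set
    return curve


def logrank_test(times_a, events_a, times_b, events_b) -> Tuple[float, float]:
    """Two-group log-rank chi-square statistic (1 df) and its P value."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    event_times = sorted({t for t, e in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```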

Relationship between tumor immune infiltration and hub genes

We analyzed the correlation between hub genes and six types of tumor-infiltrating immune cells with TIMER (http://timer.comp-genomics.org/), which contains data from all tumor samples in TCGA. We also explored the correlation between hub genes and 22 immune cell types using CIBERSORT [18].
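The correlation measures behind these analyses can be sketched briefly: Pearson correlation (used for the CIBERSORT heatmap in Fig. 7) and Spearman correlation, computed as the Pearson correlation of ranks with ties given their average rank. TIMER reports purity-adjusted (partial) Spearman correlations; that purity adjustment is omitted in this minimal sketch.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Spearman correlation is insensitive to monotone but non-linear relationships between gene expression and estimated infiltration, which is one reason rank-based measures are often preferred for deconvolution estimates.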

Immunohistochemistry (IHC)

A tissue microarray of pancreatic cancer from Chinese patients was purchased from SHANGHAI OUTDO BIOTECH CO., LTD. IHC staining for PLAU (antibody dilution 1:200) was performed following the manufacturer’s protocol. For antigen retrieval, the tissue microarray was boiled in citrate buffer (pH 6.0) for 10 min. Staining was scored by two independent, experienced pathologists. Staining intensity was scored as negative (0), weak (1), moderate (2) or strong (3), and the percentage of positive cells was scored as 1 (1–25%), 2 (26–50%), 3 (51–75%) or 4 (76–100%). The final score was calculated as the intensity score multiplied by the percentage score, and tissues were divided into high (final score > 6) and low (final score ≤ 6) PLAU expression groups.
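The scoring scheme can be written out as a small sketch, assuming the high group is defined by a final score above 6 (the maximum possible score is 3 × 4 = 12). The scheme does not define a percentage score for 0% positive cells; mapping it to 0 is an assumption, as are the function names.

```python
def percentage_score(percent_positive: float) -> int:
    """Map the percentage of stained cells to the 1-4 percentage score.

    0% is not covered by the scoring scheme; mapping it to 0 is an assumption.
    """
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_positive == 0:
        return 0
    if percent_positive <= 25:
        return 1
    if percent_positive <= 50:
        return 2
    if percent_positive <= 75:
        return 3
    return 4


def final_ihc_score(intensity: int, percent_positive: float) -> int:
    """Final score = intensity score (0-3) x percentage score (0-4)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    return intensity * percentage_score(percent_positive)


def expression_group(final_score: int, cutoff: int = 6) -> str:
    """Scores above the cutoff are 'high'; the rest are 'low'."""
    return "high" if final_score > cutoff else "low"
```

For example, strong staining (3) in 80% of cells gives 3 × 4 = 12 (high), while moderate staining (2) in 60% of cells gives 2 × 3 = 6 (low).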

Statistical analyses

All data were analyzed with R (version 3.6.1) in RStudio and with GraphPad Prism 8.0.2. Survival differences were compared with the log-rank test, and associations between categorical variables were assessed with the chi-square test. P < 0.05 was considered statistically significant.
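For associations such as PLAU expression group versus lymph node metastasis, a Pearson chi-square test on a 2×2 table can be sketched as below. Whether a continuity correction was applied in the original analysis is not stated; this sketch applies none, and the counts in the test are hypothetical rather than the Table 1 data.

```python
import math
from typing import List, Tuple


def chi_square_2x2(table: List[List[int]]) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # 1 degree of freedom: P(chi2 > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

With small expected counts (below about 5 per cell), Fisher's exact test is usually preferred over this approximation.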

Results

Immune and stromal scores correlated with overall survival

Complete gene expression profiles and clinical information for 177 PAAD cases were obtained from the TCGA database. Patient age ranged from 35 to 88 years, with a median of 65 years. There were 97 males (54.8%) and 80 females (45.2%).

The immune, stromal and ESTIMATE scores of each sample were obtained from the ESTIMATE website. Immune scores ranged from −1559.87 to 3037.78, stromal scores from −1843.32 to 2179.19, and ESTIMATE scores from −3178.36 to 4435.59. To explore the potential relationship between immune/stromal scores and patient survival, we divided PAAD patients into high and low groups using a score of zero as the cutoff and performed Kaplan–Meier survival analysis. Patients with lower immune scores tended to have a better prognosis than those with higher immune scores (Fig. 1A).

Fig. 1

The relationship between immune status and overall survival in PAAD. PAAD patients were divided into high and low groups according to their immune, stromal or ESTIMATE scores. A, Kaplan–Meier curve shows the overall survival of the high and low immune score groups. B, Kaplan–Meier curve shows the overall survival of the high and low stromal score groups. C, Kaplan–Meier curve shows the overall survival of the high and low ESTIMATE score groups

Identification of DEGs based on immune scores and stromal scores in pancreatic adenocarcinoma

To elucidate the relationship between gene expression profiles and immune status, the edgeR package was used to identify genes that were significantly up- or down-regulated between the high and low immune score groups and between the high and low stromal score groups. For both comparisons, |log2(fold change)| > 1 and adjusted P < 0.05 were used as cutoff criteria. A total of 1760 up-regulated and 155 down-regulated genes were identified in the immune score comparison (Fig. 2A), and 1670 up-regulated and 129 down-regulated genes in the stromal score comparison (Fig. 2B). The significant DEGs of each comparison are shown in heatmaps (Fig. 2C, D). Intersecting the two gene sets yielded 1433 up-regulated and 67 down-regulated genes shared by both comparisons (Fig. 2E, F).
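The intersection step amounts to set operations on the two DEG lists, kept separate by direction. The sketch below also reports discordant genes (up-regulated in one comparison but down-regulated in the other), a check not described in the text; the function name and gene identifiers are hypothetical.

```python
from typing import Iterable, List, Tuple


def shared_degs(immune_up: Iterable[str], immune_down: Iterable[str],
                stromal_up: Iterable[str], stromal_down: Iterable[str]
                ) -> Tuple[List[str], List[str], List[str]]:
    """Genes changed in the same direction in both comparisons, plus discordant genes."""
    iu, idn, su, sdn = map(set, (immune_up, immune_down, stromal_up, stromal_down))
    up = sorted(iu & su)
    down = sorted(idn & sdn)
    # Up in one comparison and down in the other.
    discordant = sorted((iu & sdn) | (idn & su))
    return up, down, discordant
```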

Fig. 2

Identification of differentially expressed genes (DEGs) based on immune/stromal scores in pancreatic adenocarcinoma. A, B, Volcano plots of the immune score and stromal score comparisons. Red indicates genes with log2(fold change) > 1 and adjusted P < 0.05, blue indicates genes with log2(fold change) < −1 and adjusted P < 0.05, and gray indicates genes that were not significantly differentially expressed. C, D, Heat maps of the significant DEGs in the two comparisons, with red representing high expression and blue representing low expression. E, F, Intersections of the DEG sets from the two comparisons

Functional enrichment analysis of DEGs

DEGs with |fold change| > 2 were selected for GO and KEGG analysis using DAVID. The GO enrichment results cover cellular component (CC) (Fig. S1A), molecular function (MF) (Fig. S1B) and biological process (BP) (Fig. S1C). For BP, DEGs were predominantly enriched in complement activation, receptor-mediated endocytosis and immune response. For CC, DEGs were mainly enriched in extracellular region, extracellular space and plasma membrane. For MF, DEGs were mainly enriched in antigen binding, serine-type endopeptidase activity and immunoglobulin receptor binding. KEGG pathway enrichment analysis showed that DEGs were predominantly enriched in phagosome, cytokine–cytokine receptor interaction and chemokine signaling pathway (Fig. S1D).

WGCNA and identification of key module

The up-regulated and down-regulated DEGs were analyzed by WGCNA. A soft-thresholding power of 5 gave the best fit of the co-expression network to a scale-free topology. The DEGs were divided into 8 modules according to their co-expression patterns, and the blue module showed the most significant positive correlation with overall survival (OS) time (Fig. 3).
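Choosing the soft-thresholding power works by checking, for each candidate β, how well log10 p(k) falls on a straight line against log10 k. The sketch below mirrors that idea in simplified form. It assumes 10 equal-width connectivity bins, skips empty bins, reports the signed R² as -sign(slope) × R², and uses 0.85 as the R² cutoff; we believe the bin count and cutoff match WGCNA's pickSoftThreshold defaults, but this is not the WGCNA implementation and will not reproduce its numbers exactly. All names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple


def connectivity(corr: List[List[float]], beta: float) -> List[float]:
    """Whole-network connectivity k_i = sum over j != i of |r_ij| ** beta."""
    g = len(corr)
    return [sum(abs(corr[i][j]) ** beta for j in range(g) if j != i) for i in range(g)]


def scale_free_fit(k: List[float], n_bins: int = 10) -> Tuple[float, float]:
    """Signed R^2 and slope of log10 p(k) against log10 k over binned connectivity."""
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins or 1.0
    bins: List[List[float]] = [[] for _ in range(n_bins)]
    for value in k:
        bins[min(int((value - lo) / width), n_bins - 1)].append(value)
    xs, ys = [], []
    for members in bins:
        mean_k = sum(members) / len(members) if members else 0.0
        if mean_k > 0:  # empty bins and zero connectivity are skipped
            xs.append(math.log10(mean_k))
            ys.append(math.log10(len(members) / len(k)))
    n = len(xs)
    if n < 2:
        return 0.0, 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    # A scale-free network has a negative slope, so report -sign(slope) * R^2.
    return (r2 if slope < 0 else -r2), slope


def pick_power(corr: List[List[float]], powers: Iterable[int] = range(1, 21),
               cutoff: float = 0.85) -> Tuple[int, Dict[int, float]]:
    """Lowest power whose signed R^2 reaches the cutoff, else the best-fitting power."""
    powers = list(powers)
    fits = {b: scale_free_fit(connectivity(corr, b))[0] for b in powers}
    for b in powers:
        if fits[b] >= cutoff:
            return b, fits
    return max(fits, key=fits.get), fits
```

Choosing the lowest power that reaches the cutoff balances scale-free fit against mean connectivity, which drops quickly as β grows.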

Fig. 3

Weighted correlation network analysis (WGCNA). A, Analysis of the scale-free fit index (left) and the mean connectivity (right) for various soft-thresholding powers. B, Gene clustering dendrograms. C, Topological overlap heat maps. D, Heat map of correlations between modules and clinical features

Survival analysis of genes in blue module

To clarify the relationship between genes in the blue module and overall survival of pancreatic cancer patients, we constructed Kaplan–Meier survival curves for these genes; 57 of them were significantly associated with overall survival according to the log-rank test (P < 0.05) (Fig. S4, Table S1).

Visualization of gene expression patterns and functional enrichment analysis

We visualized the expression patterns of the 57 prognosis-related genes and their chromosomal locations (Fig. S2). Ranked by fold change, the top five up-regulated genes, COL10A1, MUC16, CXCL5, GREM1 and EPYC, are located on chromosomes 6, 19, 4, 15 and 12, respectively, and were significantly negatively correlated with overall survival. The top four down-regulated genes, FOXA2, DNASE1L2, EPHA10 and WFIKKN1, are located on chromosomes 20, 16, 1 and 16, respectively, and were significantly positively correlated with overall survival. We then performed GO and KEGG analyses. The most significantly enriched GO terms in biological processes, cellular components and molecular functions, and the most significant KEGG pathways, are shown in Fig. 4.

Fig. 4

Chord diagram demonstrates GO and KEGG analysis of prognosis-related genes. A, biological processes (BP), B, cellular components (CC) and C, molecular functions (MF). D, KEGG pathways

Construction of PPI network and identification of prognosis-related genes

The 57 prognosis-related genes were used to construct a PPI network with the STRING database (Fig. 5A). The network was then imported into Cytoscape 3.8.0 for visualization and to calculate the topological properties of each node. The most densely connected subnetworks of the whole network were identified with the MCODE plug-in (Fig. 5B, C). In all networks, node size represents degree. Together, these analyses identified 16 hub genes in the protein interaction network (MMP14, LOX, SERPINE1, DKK1, WNT2, TWIST1, FOXA2, WNT7A, COL6A1, COL10A1, COL6A3, COL12A1, SNAI2, FGF2, MMP11 and PLAU).

Fig. 5

Construction of PPI network and identification of key subnetworks. A, PPI network with 41 nodes and 81 edges constructed with the STRING database and Cytoscape software. Node color represents degree in the network. Two key subnetwork modules were identified. B, Subnetwork with 8 nodes and 12 edges. C, Subnetwork with 8 nodes and 11 edges

Validation analysis in the GEO database

We further evaluated the prognostic value of the 57 genes in independent datasets (Fig. 6). Two datasets, GSE78229 and GSE85916, were downloaded from the GEO database for validation. Nineteen genes were validated as inversely associated with PAAD survival (SERPINB2, MMP14, GJB2, SCEL, MMP11, IL1RN, COL6A3, ANXA1, AQP9, CXCL5, GREM1, DCBLD2, SEMA7A, MUC16, CHST11, MSN, TGM2, PLAU and FRMD6) (Table S2).

Fig. 6

Genes of prognostic value were validated in the GEO database. Representative genes were significantly negatively correlated with overall survival

Association of prognostic genes’ expression with tumor purity and immune infiltration

By combining the PPI hub genes with the GEO-validated genes, we identified four prognosis-related hub genes: COL6A3, PLAU, MMP11 and MMP14. We used TIMER to explore the potential association of their expression with tumor purity and immune cell infiltration in PAAD (Fig. S3). All four genes were weakly negatively correlated with tumor purity. In contrast, they showed partial positive correlations with the infiltration of CD8+ T cells, CD4+ T cells, dendritic cells, B cells, neutrophils and macrophages.

In addition, using PAAD gene expression data, we explored the correlation between the four hub genes and the infiltration of 22 immune cell types. As shown in Fig. 7, the expression levels of COL6A3, PLAU, MMP11 and MMP14 were positively correlated with M0 macrophages, whereas PLAU expression was negatively correlated with resting memory CD4+ T cells. These results suggest that the hub genes may be involved in regulating immune cell infiltration.

Fig. 7

The correlation between hub genes and 22 immune cell types. Colors indicate Pearson correlation coefficients and numbers indicate P values; significant correlations (P < 0.05) are shown in green

Clinical experimental validation

Previous studies have shown that MMP11 and MMP14 are overexpressed in pancreatic cancer compared with normal tissues [42,43,44]. We therefore examined PLAU expression in clinical specimens from Chinese patients by IHC. PLAU protein levels in pancreatic adenocarcinoma tissues were significantly higher than in adjacent non-tumorous tissues (Fig. 8). In addition, PLAU protein levels were positively correlated with lymph node metastasis and TNM stage, but not with gender, age or tumor size (Table 1).

Fig. 8

Representative immunohistochemistry images showing higher PLAU expression in pancreatic adenocarcinoma tissues. The left column shows tumor tissue and the right column shows adjacent non-tumorous tissue. Scale bar = 500 μm

Table 1 PLAU expression and clinicopathological features in pancreatic cancer

Discussion

Severe immunosuppression in the pancreatic cancer microenvironment is a major contributor to its resistance to immunotherapy, including immune checkpoint blockade and cytokine therapy [19, 20]. In this study, we analyzed the prognostic value of immune microenvironment-related genes in pancreatic cancer. We divided the TCGA RNA-seq dataset into high and low groups based on immune/stromal scores. First, DEGs shared by the two comparisons were identified. Next, we explored the function of these genes through GO and KEGG enrichment analyses, PPI network construction and hub gene identification. Finally, the prognosis-related genes were validated in the GEO database.

Based on the immune/stromal scores of the samples, we found that the immune score was significantly associated with overall survival in pancreatic cancer. In pancreatic cancer, immune, stromal [21] and extracellular components strongly influence proliferation, metastasis and treatment resistance [22,23,24]. The immune cells involved in pancreatic cancer pathology include regulatory T cells (Tregs) [25], myeloid-derived suppressor cells (MDSCs) [26] and dendritic cells [27]. These cells and cancer cells can secrete cytokines such as IL-6 [28], IL-10 and TGF-β, which help maintain the immunosuppressive microenvironment of pancreatic cancer. Stromal components can stimulate the proliferation of immunosuppressive cells and angiogenesis, and promote the metastasis of pancreatic cancer [29, 30].

We also performed enrichment analysis on the identified DEGs. Consistent with published data, these DEGs were enriched in GO and KEGG terms such as complement activation, immune response, extracellular region/space, chemokine signaling pathway, ECM–receptor interaction and PI3K-Akt signaling pathway, all of which have been implicated in the progression of pancreatic cancer [31,32,33,34].

The co-expression network was constructed by WGCNA, and the blue module was identified as the key module. GO and KEGG analyses of the prognosis-related genes in the blue module showed enrichment in collagen, fibrous tissue, extracellular matrix, cytokine activity and receptor activation. In pancreatic cancer, fibroblast proliferation forms a physical barrier that hinders drugs and immune cells from reaching the tumor [35, 36]. Remodeling of the extracellular matrix promotes pancreatic cancer growth by activating multiple cytokines [37]. In addition, a type I collagen-dominant phenotype during extracellular matrix remodeling tends to stimulate angiogenesis and neurogenesis and promote neointima formation, which favors tumor metastasis [38]. Abundant extracellular matrix components are also associated with infiltration of neural tissue in pancreatic cancer [39, 40]. Taken together with the GO and KEGG results, these findings suggest that these DEGs are closely related to pancreatic carcinogenesis.

Through PPI network construction and validation in the GEO database, we identified four hub genes associated with the prognosis of pancreatic cancer, namely COL6A3, PLAU, MMP11 and MMP14, and explored their diagnostic and prognostic value. Matrix metalloproteinases are closely associated with extracellular matrix remodeling and play an important role in the progression and invasion of solid tumors [41]; MMP11 and MMP14 are often up-regulated in pancreatic cancer [42,43,44]. Type VI collagen (COL6) often binds type I collagen protofibrils to form a microfibrillar network [45], which may prevent drugs from reaching cancer cells. Type VI collagen also inhibits apoptosis and oxidative stress damage [46, 47] and promotes tumor cell growth. COL6A3 has been reported to be involved in the pathogenesis of various cancers, such as ovarian cancer [48] and breast cancer [49]. However, its mechanism in pancreatic cancer remains unclear.

PLAU encodes urokinase-type plasminogen activator (uPA), a serine protease of the plasminogen activator (PA) family that converts plasminogen to plasmin. PLAU can degrade or remodel extracellular matrix proteins [50], leading to the release or activation of growth factors that promote cancer cell migration and invasion. PLAU also activates several signaling pathways, including JAK-STAT, ERK and MAPK, which enhance tumor cell proliferation [51]. PLAU plays an essential role in the metastasis and infiltration of a variety of tumors, such as breast cancer and glioma [52, 53]. However, its role in the diagnosis and prognosis of pancreatic cancer remains unclear.

Our study revealed that PLAU was highly expressed in PAAD, positively correlated with M0 macrophages and negatively correlated with resting memory CD4+ T cells. Macrophages are among the most abundant immune cells in the tumor microenvironment and exert antitumor or pro-tumor effects depending on their differentiation phenotype [54]. High PLAU expression in pancreatic cancer may be related to macrophage differentiation, as reported in colon cancer [55]. In esophageal squamous cell carcinoma, PLAU can increase IL-8 expression and recruit MDSCs to sustain tumor growth [56]. PLAU can also maintain the immunosuppressive function of Tregs through the STAT5 and ERK signaling pathways [57]. Taken together, PLAU may contribute to the pathogenesis of pancreatic cancer by regulating macrophage differentiation and maintaining an immunosuppressive microenvironment.

Most TCGA data were derived from Caucasian patients. To further verify the association between PLAU and PAAD, we performed IHC staining of 85 pairs of PAAD tissues from Chinese patients. PLAU was highly expressed in PAAD tissues, consistent with the TCGA results. In addition, high PLAU expression in PAAD was associated with TNM stage and lymph node metastasis, suggesting that PLAU may be involved in the invasion and metastasis of PAAD.

Although several published studies have explored prognostic genes in pancreatic cancer, our study offers the following novel findings. First, we sequentially used the edgeR package and WGCNA to identify immune-related prognostic genes. Second, we performed bioinformatics analysis using the TCGA and GEO databases and then validated our results in Chinese clinical samples. Third, our results show that PLAU protein expression correlates with lymph node metastasis and TNM stage.

In conclusion, we performed an integrated bioinformatics analysis of the expression matrix of pancreatic cancer patients in TCGA based on the immune microenvironment. In-depth exploration of the DEGs identified four genes that may have prognostic value. Further validation in Chinese clinical samples revealed that PLAU was highly expressed in PAAD tissues and was associated with tumor stage and lymph node metastasis. This study may provide new insights into the complex crosstalk network of the pancreatic cancer microenvironment and new targets for pancreatic cancer therapy.

Conclusions

We performed a comprehensive bioinformatic analysis of TCGA pancreatic adenocarcinoma data based on the immune microenvironment and identified four genes that may have prognostic value. Among them, PLAU was significantly associated with TNM stage and lymph node metastasis in Chinese pancreatic cancer patients and may therefore affect the prognosis of PAAD patients. Our findings suggest that PLAU may serve as a biomarker for predicting the prognosis of PAAD patients and as a potential therapeutic target in pancreatic cancer. However, further clinical studies are needed to validate these findings.