Introduction

Gastric cancer is among the tumors with the highest incidence and mortality and poses a major challenge to human health worldwide (Ferlay et al. 2013; Jemal et al. 2011; Torre et al. 2015; Znaor et al. 2013). In China, gastric cancer is the most common malignant tumor of the digestive tract. Data further suggest that gastric cancer is becoming more prevalent, that males are more susceptible than females, and that most cases are detected in adults aged between 55 and 70 (Catalano et al. 2009). Additionally, inflammation caused by Helicobacter pylori (HP) is closely related to gastric cancer. Inflammation-related malignant transformation is a biological process involving multiple cells, genes, and non-coding RNAs (Migita et al. 2018; Senol et al. 2014; Wang et al. 2017a, b). Gastric adenocarcinoma (GA) is a type of gastric cancer that arises from gastric gland cells and accounts for 95% of all gastric cancers (Blank et al. 2014; Dixon et al. 2016). Early diagnosis can significantly improve the prognosis of this cancer type, but because effective approaches for early diagnosis are lacking, most patients are diagnosed at a late stage or after metastasis has developed, which reduces the effectiveness of treatment. Elucidating the mechanisms underlying the development and progression of GA is therefore critical for the identification of new tumor biomarkers and therapeutic targets.

Between 80 and 90% of human RNAs are non-coding RNAs. In the last decade, studies have described how many non-coding RNAs play important roles in various cellular events (Song et al. 2013). Mature microRNAs (miRNAs) are approximately 22 nucleotides (nt) long; they are recognized by ribonucleoproteins in the cell and assembled into an RNA-induced silencing complex (RISC). By pairing completely or incompletely with the 3′ untranslated region of an mRNA, RISC can cause mRNA degradation or inhibit mRNA translation (Kim et al. 2009), which can significantly modulate gene expression. Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nt that are not translated. Abnormal expression of lncRNAs has been implicated in various tumorigenic processes, and changes in lncRNA expression levels have been reported to be closely related to the occurrence of gastric cancer (Fang et al. 2015). Work has shown that lncRNAs contain microRNA response elements (MREs), binding sites through which they can sponge miRNAs and thereby impair miRNA-mediated post-transcriptional regulation of target mRNAs. In 2011, after the interaction between PTEN and its pseudogene PTENP1 had been clarified (Poliseno et al. 2010), such transcripts were named competing endogenous RNAs (ceRNAs). Recently, various reports have described ceRNA networks in a variety of cancers, including gastric cancer (Xia et al. 2014). Identifying gastric cancer-associated ceRNA regulatory networks may therefore provide insight into the role of these RNAs in tumorigenesis and treatment outcomes in gastric cancer.

In this study, we constructed a lncRNAs–miRNAs–mRNAs regulatory network. Ten lncRNAs and two mRNAs were found to be associated with the prognosis of GA, and further analysis revealed that AC018781.1 and VCAN-AS1 are independent risk factors for GA. Gene Ontology (GO) enrichment analysis of the differentially expressed (DE) mRNAs identified significantly enriched terms (P < 0.05), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis showed that the DE-mRNAs were enriched in the AGE-RAGE signaling pathway in diabetic complications. The differential expression of three candidate lncRNAs and two mRNAs was confirmed by RT-qPCR. Ultimately, a promising ceRNA regulatory network related to the progression of GA was identified. This approach to predicting GA-related lncRNAs and lncRNAs–miRNAs–mRNAs networks could help to clarify the underlying mechanisms of GA.

Materials and methods

Database screening

GA-associated gene expression data (miRNA, lncRNA, mRNA) were collected from the TCGA database (https://gdc-portal.nci.nih.gov/), comprising 375 GA samples and 32 normal tissue samples. The data were merged, and gene ID conversion was performed with a Perl script to generate the gene expression matrix.
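The merging and ID-conversion step can be sketched as follows. This is a minimal Python illustration, not the Perl script used in the study; the sample names, identifiers, counts, and the `build_expression_matrix` helper are all hypothetical, and summing the counts of IDs that map to the same symbol is an assumption of the sketch.

```python
from collections import defaultdict

def build_expression_matrix(per_sample_counts, id_to_symbol):
    """Merge per-sample count tables into one gene-by-sample matrix.

    per_sample_counts: {sample_name: {gene_id: count}}
    id_to_symbol:      {gene_id: gene_symbol}
    Returns (genes, samples, matrix), where matrix[i][j] is the count of
    genes[i] in samples[j]. IDs without a symbol are dropped, and counts of
    IDs that share a symbol are summed (an assumption of this sketch).
    """
    samples = sorted(per_sample_counts)
    merged = defaultdict(lambda: defaultdict(int))
    for sample, counts in per_sample_counts.items():
        for gene_id, count in counts.items():
            symbol = id_to_symbol.get(gene_id)
            if symbol is None:
                continue  # no symbol available for this ID
            merged[symbol][sample] += count
    genes = sorted(merged)
    matrix = [[merged[g].get(s, 0) for s in samples] for g in genes]
    return genes, samples, matrix

# Hypothetical toy input: two samples and three gene IDs.
counts = {
    "tumor_01": {"ID_A": 120, "ID_B": 5, "ID_X": 9},
    "normal_01": {"ID_A": 30, "ID_B": 50},
}
id_map = {"ID_A": "GENE_A", "ID_B": "GENE_B"}
genes, samples, matrix = build_expression_matrix(counts, id_map)
```

Rows of `matrix` follow `genes` and columns follow `samples`, which is the layout a differential expression tool would typically expect.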

Identifying differentially expressed genes

The expression matrices of lncRNAs, miRNAs and mRNAs were analyzed with the edgeR package in R (version 3.5.1) to identify differentially expressed RNAs (cut-off: false discovery rate (FDR) < 0.01 and |fold change| > 2). Correlated pairs of DE-lncRNAs and DE-miRNAs, as well as of DE-miRNAs and DE-mRNAs, were evaluated using Perl scripts. Regulatory relationships between DE-lncRNAs and DE-miRNAs were predicted with miRcode (http://www.mircode.org), and those between DE-miRNAs and DE-genes (DEGs) were predicted with miRDB, miRTarBase and TargetScan. DE-miRNA and DEG pairs, as well as DE-lncRNA and DE-miRNA pairs, that showed opposite expression trends were selected with a Venn diagram generated in R. The lncRNAs–miRNAs–mRNAs regulatory loops were then assembled from these lncRNA-miRNA and miRNA-mRNA pairs to construct the regulatory network.
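The filtering and pairing logic described here can be illustrated with a short Python sketch. It is not the authors' edgeR or Perl code. All RNA names and values are hypothetical, the fold-change cut-off is interpreted as |log2 fold change| > 1, and requiring agreement among all three target databases is an assumption.

```python
def select_de(results, fdr_cutoff=0.01, min_abs_log2fc=1.0):
    """Return {rna: 'up' | 'down'} for rows that pass both thresholds."""
    selected = {}
    for rna, (log2fc, fdr) in results.items():
        if fdr < fdr_cutoff and abs(log2fc) > min_abs_log2fc:
            selected[rna] = "up" if log2fc > 0 else "down"
    return selected

def consensus_pairs(*databases):
    """Keep only the pairs predicted by every database (assumption)."""
    return set.intersection(*(set(db) for db in databases))

def opposite_de_pairs(pairs, de):
    """Keep pairs whose members are both DE with opposite directions."""
    return {(a, b) for a, b in pairs
            if a in de and b in de and de[a] != de[b]}

def build_triplets(lnc_mir_pairs, mir_mrna_pairs):
    """Join lncRNA-miRNA and miRNA-mRNA pairs on the shared miRNA."""
    targets = {}
    for mirna, mrna in mir_mrna_pairs:
        targets.setdefault(mirna, set()).add(mrna)
    return sorted((lnc, mirna, mrna)
                  for lnc, mirna in lnc_mir_pairs
                  for mrna in targets.get(mirna, ()))

# Hypothetical toy results: rna -> (log2 fold change, FDR).
de = select_de({
    "lncA": (2.3, 0.001), "lncB": (0.4, 0.0001),
    "miR-1": (-1.8, 0.002), "mRNA-X": (1.5, 0.005), "mRNA-Y": (3.0, 0.2),
})
db1 = {("miR-1", "mRNA-X"), ("miR-1", "mRNA-Y")}
db2 = {("miR-1", "mRNA-X")}
db3 = {("miR-1", "mRNA-X"), ("miR-2", "mRNA-X")}
mir_mrna = opposite_de_pairs(consensus_pairs(db1, db2, db3), de)
lnc_mir = opposite_de_pairs({("lncA", "miR-1"), ("lncB", "miR-1")}, de)
triplets = build_triplets(lnc_mir, mir_mrna)
```

In this toy input only `lncA`, `miR-1` and `mRNA-X` pass both thresholds, so a single lncRNA, miRNA and mRNA loop remains.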

Univariate and multivariate Cox regression

Univariate Cox regression was used to assess DE-lncRNAs, DE-mRNAs and DE-miRNAs in combination with clinicopathological data. DE-RNAs with a P value below 0.05 were selected for multivariate Cox regression analysis, which was used to develop a prognostic model of gastric cancer. A receiver operating characteristic (ROC) curve was generated to verify the model.
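How a fitted multivariate model can be turned into risk groups and checked with an ROC curve is sketched below in Python. The text does not state the exact formula, so the sketch assumes the common convention of a risk score equal to the sum of coefficient times expression, with a median split into high- and low-risk groups. The coefficients, expression values and outcomes are hypothetical, and the AUC here ignores censoring and follow-up time, which is a simplification.

```python
from statistics import median

def risk_scores(expression, coefficients):
    """Linear predictor: sum of coefficient * expression per patient."""
    return [sum(coefficients[g] * patient[g] for g in coefficients)
            for patient in expression]

def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median score."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]

def auc(scores, events):
    """Probability that a patient with the event has a higher score than
    a patient without it (ties count as 0.5)."""
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical coefficients and expression values for four patients.
coefs = {"lnc1": 0.5, "lnc2": -0.2}
patients = [
    {"lnc1": 2.0, "lnc2": 1.0},
    {"lnc1": 0.0, "lnc2": 1.0},
    {"lnc1": 4.0, "lnc2": 0.0},
    {"lnc1": 1.0, "lnc2": 5.0},
]
events = [1, 0, 1, 1]  # 1 = death observed, 0 = no event (hypothetical)
scores = risk_scores(patients, coefs)
groups = split_by_median(scores)
model_auc = auc(scores, events)
```

For survival data, time-dependent ROC methods are often preferred over this simple rank-based AUC.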

ceRNA regulatory network construction and survival analysis

The integrated co-expression network of DE-lncRNAs, DE-miRNAs and DE-mRNAs was visualized with Cytoscape software. Prognostic DEGs, DE-lncRNAs and DE-miRNAs in the ceRNA network were then identified, and Kaplan–Meier survival plots of representative miRNAs, lncRNAs and mRNAs were drawn with the R survival package.
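For readers unfamiliar with the Kaplan–Meier method, the product-limit estimate behind such survival plots can be sketched in a few lines of Python. This is an illustration only, not the R survival package used in the study, and the follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns [(time, survival_probability)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve

# Hypothetical follow-up data for five patients.
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
```

Computing one such curve for the high-expression group and one for the low-expression group gives the two lines compared in each survival plot.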

Functional enrichment analysis

GO analysis was performed using the BiNGO plugin for Cytoscape. KEGG analysis was performed using KOBAS (http://kobas.cbi.pku.edu.cn/).
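Enrichment tools of this kind typically ask whether a GO term or pathway is over-represented in a gene list. A minimal Python sketch follows, assuming a one-sided hypergeometric test; the settings actually used in the study are not stated here, the numbers are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """P(X >= overlap) when `sample` genes are drawn without replacement
    from `population` genes, `annotated` of which carry the term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(overlap, upper + 1))
    return tail / total

# Hypothetical example: 20 background genes, 5 annotated with the term,
# 4 genes in the list of interest, 3 of which carry the term.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=4, overlap=3)
```

A small P value indicates that the term appears in the gene list more often than expected by chance given the background.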

RNA extraction and reverse transcription quantitative polymerase chain reaction (RT-qPCR) assay

Total RNA was extracted from GA samples and matched adjacent normal samples of 6 patients using TRIzol reagent (Invitrogen Life Technologies, Carlsbad, CA, USA) and reverse transcribed into complementary DNA (cDNA) with the PrimeScript™ RT reagent Kit with gDNA Eraser (Takara, Otsu, Shiga, Japan). Real-time PCR was performed using TaKaRa TB Green™ Premix Ex Taq™ II (Takara, Otsu, Shiga, Japan). GAPDH was used as the reference gene to normalize the expression of candidate genes. Primer sequences are shown in Table 1. The study was approved by the Ethics Committee of Xingtai People's Hospital of Hebei Medical University, and informed consent was obtained from all patients.

Table 1 PCR primers used for RT-qPCR
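Normalization to a reference gene such as GAPDH is often done with the 2^(-ΔΔCt) method. The text does not state which calculation was used, so the Python sketch below is an assumption, and the Ct values are hypothetical.

```python
def relative_expression(ct_target_tumor, ct_ref_tumor,
                        ct_target_normal, ct_ref_normal):
    """Fold change of a target gene in tumor versus normal tissue,
    normalized to a reference gene (2^-ddCt method)."""
    delta_tumor = ct_target_tumor - ct_ref_tumor      # dCt in tumor
    delta_normal = ct_target_normal - ct_ref_normal   # dCt in normal tissue
    return 2 ** -(delta_tumor - delta_normal)

# Hypothetical Ct values: the target amplifies two cycles earlier in tumor
# tissue after normalization, which corresponds to a 4-fold increase.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
```

A value above 1 indicates higher expression in GA than in adjacent normal tissue, and a value below 1 indicates lower expression.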

Results

Identification of DE-lncRNAs, DE-miRNAs and DE-mRNAs associated with GA

After analysis of RNA-seq and miRNA-seq data from TCGA, a total of 1632 mRNAs, 1008 lncRNAs, and 104 miRNAs were identified as differentially expressed. Target prediction using miRDB, miRTarBase and TargetScan indicated that mRNAs expressed in GA contain potential targets of lncRNAs and miRNAs; among these, 10 mRNAs, 65 lncRNAs and 10 miRNAs were differentially expressed RNAs (DE-RNAs) (supplement 1).

Identification of DE-lncRNAs, DE-miRNAs and DE-mRNAs that are associated with GA survival

Univariate Cox analysis of DE-RNAs (P < 0.001) suggested that 17 DE-RNAs (CADM2, ADAMTS9-AS1, ADAMTS9-AS2, C15orf54, VCAN-AS1, AC110491.1, FRMD6-AS2, AC011374.1, LINC00326, POU6F2-AS2, AC018781.1, AL391152.1, AL139002.1, ERVMER, COL1A1, ATAD2, SERPINE1) are risk factors for GA (supplement 2). A heat map of these DE-RNAs is shown in Fig. 1a. Multivariate Cox regression analysis showed that 10 DE-RNAs in combination can predict GA prognosis, and AC018781.1 and VCAN-AS1 were identified as independent prognostic factors of GA (Table 2). In the prognostic model constructed from these 10 DE-RNAs, the survival time of low-risk patients was significantly longer than that of high-risk patients (Fig. 1b). In addition, ROC analysis indicated good predictive performance, with an area under the curve (AUC) of 0.704 (95% CI 0.639–0.749) (Fig. 1c).

Fig. 1 The identification of DE-lncRNAs, DE-miRNAs and DE-mRNAs that are associated with GA. a Construction of a DE-RNA heat map that affects the development of GA. b ROC analysis of DE-RNA predictability. c Survival analysis of DE-RNA and prognosis of GA

Table 2 Multivariate Cox regression analysis for significant DE-RNAs

Kaplan–Meier Survival analysis of DE-lncRNAs, DE-miRNA, DE-mRNA for GA

A total of 10 DE-lncRNAs (AC010145.1, AC018781.1, ADAMTS9-AS1, ADAMTS9-AS2, AL139002.1, AL391152.1, IGF2-AS, LINC00326, POU6F2-AS2, VCAN-AS1) (Fig. 2a–j) and two DE-mRNAs (ATAD2, SERPINE1) (Fig. 2k, l) were found to be associated with survival rates of GA (P < 0.05) by Kaplan–Meier analyses. Additionally, no DE-miRNAs were found to be related to the prognosis of GA.

Fig. 2 Kaplan–Meier survival analysis of DE-lncRNAs (a AC010145.1, b AC018781.1, c ADAMTS9-AS1, d ADAMTS9-AS2, e AL139002.1, f AL391152.1, g IGF2-AS, h LINC00326, i POU6F2-AS2, j VCAN-AS1) and DE-mRNAs (k ATAD2, l SERPINE1) for GA. Blue lines represent low expression and red lines represent high expression

Construction of a ceRNA regulatory network seen in GA

DE-lncRNAs, DE-miRNAs and DE-mRNAs were used to construct a ceRNA network to further analyze their regulatory relationships. A total of 169 lncRNA-miRNA pairs and 16 DE-miRNA and DE-mRNA pairs were involved in the construction of the ceRNA network map (Fig. 3). The lncRNAs–miRNAs–mRNAs regulatory network includes 85 nodes and 184 edges.

Fig. 3 Construction of a ceRNA regulatory network for GA. Triangles represent mRNAs, squares represent miRNAs, and circles represent lncRNAs. Red represents high expression and yellow represents low expression
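Node and edge counts for a network of this kind can be derived directly from the two pair lists. The Python sketch below shows the bookkeeping on hypothetical toy pairs. It does not reproduce the network in Fig. 3, and the `summarize_network` helper is illustrative only.

```python
def summarize_network(lnc_mir_pairs, mir_mrna_pairs):
    """Count distinct nodes (by RNA type) and distinct edges."""
    edges = set(lnc_mir_pairs) | set(mir_mrna_pairs)
    nodes = {
        "lncRNA": {lnc for lnc, _ in lnc_mir_pairs},
        "miRNA": {m for _, m in lnc_mir_pairs} | {m for m, _ in mir_mrna_pairs},
        "mRNA": {g for _, g in mir_mrna_pairs},
    }
    return {
        "nodes": sum(len(members) for members in nodes.values()),
        "edges": len(edges),
        "by_type": {kind: len(members) for kind, members in nodes.items()},
    }

# Hypothetical toy pairs.
lnc_mir = [("lncA", "miR-1"), ("lncB", "miR-1"), ("lncA", "miR-2")]
mir_mrna = [("miR-1", "mRNA-X")]
summary = summarize_network(lnc_mir, mir_mrna)
```

Because duplicate pairs are collapsed into a set, the edge count can be smaller than the raw number of predicted pairs.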

Functional analysis of GO and KEGG pathways that are regulated in GA

GO analysis of the 10 DE-mRNAs revealed that three genes, ESRRG, ATAD2 and COL1A1, were enriched in the biological process "positive regulation of transcription, DNA-templated" (Table 3). As shown in Fig. 4, a total of 108 nodes and 166 edges constitute a functional regulatory network. KEGG analysis revealed 11 signaling pathways (P < 0.05), the most significant of which was the AGE-RAGE signaling pathway associated with diabetic complications (Fig. 5).

Table 3 Representative DE-mRNAs were selected for gene ontology analysis with DAVID

Fig. 4 Functional analysis of GO in GA. Each node stands for a certain process and a larger size indicates a larger number of genes involved in the process. The colored nodes indicate statistical difference (P < 0.05). White-colored nodes were only used to connect the biologic processes without statistical difference

Fig. 5 KEGG pathways in GA. The color represents the P value and the length represents the number of enriched genes

Expression of candidate DE-RNAs in patients

To further verify our findings, three DE-lncRNAs (VCAN-AS1, IGF2-AS, POU6F2-AS2) and two DE-mRNAs (ATAD2, SERPINE1) were examined by RT-qPCR. The expression of VCAN-AS1, IGF2-AS, POU6F2-AS2 and SERPINE1 was higher in GA than in matched adjacent normal samples, whereas the expression of ATAD2 was lower in GA than in matched adjacent normal samples (Fig. 6). These results were consistent with the bioinformatic analysis.

Fig. 6 Relative expression of ATAD2 (a), SERPINE1 (b), VCAN-AS1 (c), IGF2-AS (d), POU6F2-AS2 (e). GA gastric adenocarcinoma, Con adjacent normal samples

Discussion

Epidemiological investigations have suggested that the main factors contributing to GA development are Helicobacter pylori infection, diet, lifestyle, host genotype, and smoking (Kato and Asaka 2012; Uemura et al. 2001). The pathological process of GA is complex and involves multiple genetic and phenotypic factors. GA results not only from the expression of specific prognostic genes but also from the contribution of miRNAs and lncRNAs (Chong et al. 2018), post-transcriptional events such as changes in methylation level (Maeda et al. 2018), and other translational events. The development of GA entails the formation of a complex regulatory network involving RNAs. However, many studies are currently limited to specific genes or specific gene regulatory pathways, and few have examined the lncRNAs–miRNAs–mRNAs regulatory network in GA. Here we explore the ceRNA regulatory network through bioinformatics and integrated analysis, with the aim of identifying key genes involved in the development of GA and providing data that may be used in the development of molecular biomarkers and in targeted drug screening for GA.

In this study, we investigated the factors affecting the development of GA by constructing a lncRNAs–miRNAs–mRNAs regulatory network based on the ceRNA hypothesis. To this end, we collected expression data of GA-related genes (miRNAs, lncRNAs and mRNAs) from the TCGA database and identified 1632 DE-mRNAs, 1008 DE-lncRNAs, and 104 DE-miRNAs. A ceRNA regulatory network was then built from 65 DE-lncRNAs, 10 DE-miRNAs and 10 DE-mRNAs. We also identified 10 DE-lncRNAs (AC010145.1, AC018781.1, ADAMTS9-AS1, ADAMTS9-AS2, AL139002.1, AL391152.1, IGF2-AS, LINC00326, POU6F2-AS2, VCAN-AS1) and 2 DE-mRNAs (ATAD2, SERPINE1) that were associated with survival time in GA. Studies have shown that ADAMTS9-AS2 participates in gastric cancer cell proliferation, apoptosis, migration and invasion, and that it plays a key role in the development of gastric cancer by regulating the PI3K/Akt pathway (Cao et al. 2018). In addition, the SERPINE1 gene is involved in tumor gene activation (Rivas-Ortiz et al. 2017) and has been studied in esophageal cancer (Klimczak-Bitner et al. 2016), rectal cancer (Wang et al. 2017a, b), endometrial cancer (Yildirim et al. 2017), head and neck cancer (Pavon et al. 2016) and other cancers; high expression of SERPINE1 is associated with poor prognosis in these patients. These reports are consistent with the results of our analysis, which supports confidence in the data generated. For the Cox regression analysis, we included AC018781.1, ADAMTS9-AS1, AL139002.1, AL391152.1, C15orf54, ERVMER61-1, LINC00326, VCAN-AS1, ATAD2 and SERPINE1 in the model. This analysis revealed that AC018781.1 and VCAN-AS1 are independent risk factors for GA. To validate these results, RT-qPCR was performed. VCAN-AS1, IGF2-AS, POU6F2-AS2 and SERPINE1 were expressed at higher levels, and ATAD2 at a lower level, in GA than in matched adjacent normal samples. These data indicate that VCAN-AS1, IGF2-AS, POU6F2-AS2 and SERPINE1 may act as oncogenes in GA, whereas ATAD2 may act as a tumor suppressor gene; further studies are needed to confirm whether they are potential tumor biomarkers.

lncRNAs appear to play an important regulatory role in the modulation of gene expression. They can bind endogenous miRNAs and thereby take part in the ceRNA network (Cesana et al. 2011; Kallen et al. 2013; Wang et al. 2013). A previous study constructed a gastric cancer-related lncRNA-mRNA network and showed that lncRNA RP11-363E7.4 was a key regulator in both the topology analysis and the random walk with restart analysis. This method of predicting gastric cancer-related lncRNAs and lncRNA–miRNA–mRNA networks helps to clarify the underlying mechanisms of gastric cancer (Wang et al. 2018).

Another bioinformatics study of gastric cancer in patients from India predicted 19 lncRNA-regulated miRNAs and mRNAs related to gastric cancer and conducted a comprehensive network analysis of lncRNAs, miRNAs and mRNAs to determine the presence of ceRNA networks (Arun et al. 2018). In addition, studies on specific lncRNA expression patterns and ceRNA networks in gastric cancer have revealed that gastric cancer-specific lncRNAs are associated with clinical features and can be used as new candidate biomarkers and potential prognostic indicators for the clinical diagnosis of gastric cancer (Li et al. 2016). In this study, we constructed a GA-related ceRNA network map to further analyze the regulatory relationships between lncRNAs, miRNAs and mRNAs. A total of 169 lncRNA-miRNA pairs and 16 DE-miRNA and DE-mRNA pairs were involved in the construction of the ceRNA network map. The lncRNAs–miRNAs–mRNAs regulatory network included 85 nodes and 184 edges, and such networks are important for further study of biomarkers and potential prognostic indicators of GA.

Further, we performed functional enrichment and pathway analysis of DE-RNAs. GO analysis of the 10 DE-mRNAs revealed that three genes, ESRRG, ATAD2 and COL1A1, were enriched in the biological process "positive regulation of transcription, DNA-templated". In this case, a total of 108 nodes and 166 edges constituted a functional regulatory network. Additionally, KEGG analysis revealed 11 signaling pathways, of which the AGE-RAGE signaling pathway associated with diabetic complications was the most significant. Interestingly, studies have shown that the AGE-RAGE axis, composed of AGE and its receptor RAGE, plays an important role in the development of various tumors (Abe and Yamagishi 2008; Taguchi et al. 2000). RAGE is highly expressed in prostate cancer (Allmen et al. 2008; Ishiguro et al. 2005), liver cancer (Yaser et al. 2012) and pancreatic cancer (DiNorcia et al. 2012), and is associated with tumor growth and progression. These results suggest that further evaluation of key regulatory genes may help to improve the prognosis of GA.

Conclusions

In summary, we established a lncRNAs–miRNAs–mRNAs ceRNA regulatory network in GA. The current findings provide new insights into the role of ceRNA networks in GA and identify potential diagnostic and prognostic biomarkers. Furthermore, the analysis provides a new reference for a better understanding of the pathogenesis of GA. Although our findings may have important clinical implications, they have certain limitations. First, the analysis relies solely on the TCGA database, which lacks diversity as all the participants are Caucasian. Whether the findings can be extrapolated to other ethnic groups is not yet clear, and further research is needed. Second, this study examines only expression data and does not include functional tests to verify the analysis. It will be important for further studies to verify these results.