Introduction

Gastric cancer (GC) stands as one of the most common and life-threatening cancers in China [1]. According to the Correa hypothesis, GC commonly progresses from chronic atrophic gastritis, intestinal metaplasia, and dysplasia to carcinoma; a majority of GCs develop from gastric intramucosal neoplasia [2]. GC can be classified into two histological subtypes: differentiated and undifferentiated types or intestinal and diffuse types [3]. According to the 5th edition of the WHO Classification [4], gastric differentiated-type intramucosal neoplasia includes low-grade dysplasia, high-grade dysplasia, and intramucosal cancers (IMCs). This classification is widely adopted by pathologists internationally and is helpful for evaluating gastric carcinogenesis. Given the significance of early diagnosis and treatment in improving patient outcomes [5], it is essential to gain a deeper understanding of early GC clinically and biologically.

GC is a complex disease involving multiple genetic and epigenetic alterations [6]. Previous studies have revealed that altered DNA methylation is associated with Helicobacter pylori infection, Epstein-Barr virus (EBV) infection, microsatellite instability, intestinal metaplasia, the tumor microenvironment, and the prognosis of GC [7,8,9]. In the case of gastric tumors, aberrant DNA methylation occurs more frequently than mutations [10]. Moreover, accumulated epigenomic aberrations in apparently normal tissues reportedly modify tumor risk and are thus suggested to be utilized as tumor risk markers [11]. Recent studies on epigenetic alterations in GC have primarily focused on the DNA methylation of advanced GC, yet little is known about DNA methylation changes in IMCs.

In this study, Illumina 850K microarrays were used to detect differences in DNA methylation between eight paired IMC and adjacent nontumor mucosa FFPE tissues. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted to investigate the biological functions of differentially methylated genes (DMGs). To identify candidate methylation genes, two gene expression profiling microarray datasets (GSE130823 and GSE55696) were integrated and analyzed through bioinformatics. Finally, we used quantitative real-time PCR, the TCGA methylation database, and single-sample gene set enrichment analysis (GSEA) to validate the candidate genes. This study explored the biological differences between IMCs and nontumor mucosa by examining methylation alterations and identified genes potentially involved in IMC development.

Materials and methods

Patients and tissues

FFPE tissues were collected from early GC patients who underwent endoscopic resection at Beijing Chao-Yang Hospital (Beijing, China) between April 2019 and April 2022. The reference standard was the 5th edition of the WHO Classification of Tumors of the Digestive System. All hematoxylin and eosin (H&E)-stained slides were assessed independently by two pathologists. The inclusion criterion was a histopathological diagnosis of differentiated intramucosal adenocarcinoma with no lymphatic or vascular invasion. For tumor tissues, 8- to 10-µm-thick unstained FFPE sections were cut, and cancerous areas were isolated by microdissection with reference to an H&E-stained slide. For nontumor tissues, we selected the gastric mucosa farthest from the tumor that showed no dysplastic lesions under the microscope. Altogether, eight tumor and adjacent nontumor FFPE tissues from eight patients with EBV-negative and DNA mismatch repair proficient (pMMR) tumors underwent genome-wide methylation analysis. Quantitative real-time PCR was performed on twenty-four tumor and adjacent nontumor FFPE tissues from twenty-four patients. The inclusion criteria are depicted in Fig. 1A; the patient and tumor characteristics are presented in Supplementary file2 and file3.

Fig. 1

Flow chart for illustrating the study design. A Histology-based screening for tumor and adjacent normal gastric mucosa. B Flow chart for filtering pipeline used to generate a set of high-confidence probes. A total of 727,890 finalized probes were generated

DNA extraction, bisulfite conversion, and Illumina 850K microarray data

DNA was extracted from FFPE tissues (1-2 years old) using a ReliaPrep™ FFPE gDNA Miniprep System (Promega). The quality of the DNA was assessed with the real-time PCR-based Infinium HD FFPE QC Assay Kit (Illumina, Inc.). Only DNA samples with a purity > 500 ng/µl, OD260/280 > 1.8, or delta CT < 5 compared to the quality control (QC) template were deemed eligible for restoration. All DNA samples were bisulfite-converted using the EZ DNA methylation kit (Zymo Research). DNA purified from FFPE tissue was subsequently repaired using the Infinium HD FFPE DNA Restore Kit (Illumina, Inc.) and ZR-96 DNA Clean & Concentrator-5 (Zymo Research). Global DNA methylation was profiled with the Infinium® MethylationEPIC BeadChip Kit (manual protocol; Illumina, Inc.), and the Illumina iScan system was used to determine image intensities. Raw data quality control assessment, filtering, normalization, and differential methylation analysis were performed in the R statistical environment. Following these intrasample normalization procedures, DNA methylation at each CpG was scored as a β value, ranging from 0 (no methylation) to 1 (100% methylation) at a given CpG dinucleotide [12].
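
The β value described above can be illustrated with a short sketch. The text does not state the formula, so this assumes the commonly used definition β = M / (M + U + offset), where M and U are the methylated and unmethylated signal intensities and the offset of 100 is the conventional default for Illumina arrays. The function name and arguments are hypothetical and are not taken from the authors' R pipeline.

```python
def beta_value(meth, unmeth, offset=100.0):
    """Return the beta value for one CpG from its two signal intensities.

    meth / unmeth: methylated and unmethylated intensities (assumed inputs).
    offset: stabilizing constant; 100 is the conventional default (assumption).
    """
    # Negative intensities can appear after background correction; clamp to 0.
    meth = max(meth, 0.0)
    unmeth = max(unmeth, 0.0)
    return meth / (meth + unmeth + offset)
```

With a positive offset the result stays below 1 even for a fully methylated site, which is why β is read as an approximate methylation fraction rather than an exact one.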

Differential methylation analysis and functional enrichment

Statistical analyses were conducted on the obtained β values, and mean β values were calculated and compared between cases (IMCs) and controls (nontumor). The criteria for differentially methylated probes (DMPs) were set as follows: significance after Benjamini-Hochberg correction (adjusted p < 0.05) and a difference in the mean β value between groups larger than 0.2 (|Δβ| > 0.2). DNA methylation levels at different genomic locations were compared using the Wilcoxon rank test, and the distribution of DMPs was assessed using chi-squared tests. We also conducted GO (biological process, cellular component, and molecular function) and KEGG pathway enrichment analyses of differentially methylated genes (DMGs) using the R package “clusterProfiler.” A cutoff criterion of p < 0.05 was chosen to identify significant pathway terms.
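
The DMP criteria above (Benjamini-Hochberg adjusted p < 0.05 and |Δβ| > 0.2) can be sketched as follows. This is a minimal illustration rather than the authors' R code: it assumes per-probe raw p values and group mean β values are already available, and the function names and probe identifiers are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_dmps(probes, alpha=0.05, min_delta=0.2):
    """Classify probes as hyper- or hypomethylated DMPs.

    probes: list of (probe_id, mean_beta_tumor, mean_beta_nontumor, raw_p).
    Returns a dict probe_id -> "hyper" or "hypo" for probes passing both cutoffs.
    """
    adjusted = benjamini_hochberg([p[3] for p in probes])
    calls = {}
    for (probe_id, beta_tumor, beta_nontumor, _), q in zip(probes, adjusted):
        delta = beta_tumor - beta_nontumor
        if q < alpha and abs(delta) > min_delta:
            calls[probe_id] = "hyper" if delta > 0 else "hypo"
    return calls
```

Both cutoffs must hold at once, so a probe with a very small adjusted p value but a Δβ of 0.05 is not called.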

Data acquisition and processing

The Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) was searched to identify relevant gene expression data. The search utilized keywords such as “early gastric cancer,” “Homo,” and “tissue,” with consideration for racial differences. Ultimately, two gene expression microarray datasets, GSE130823 and GSE55696, were identified. Thirty paired chronic gastritis, fourteen high-grade dysplasia, and sixteen intestinal-type early GC cases were obtained from the GSE130823 cohort (platform: GPL1707 Human GE v2 8 × 60K Microarray 039381). Nineteen chronic gastritis and twenty high-grade dysplasia samples were obtained from the GSE55696 cohort (platform: GPL6480 Whole Human Genome Microarray 4 × 44K G4112F). Both datasets were derived from Chinese individuals. Differential expression analysis was conducted (|log2FC| > 1.5, p < 0.05) by comparing IMC tissues to normal tissues using the R package “limma.” DNA methylation data (Illumina HumanMethylation 450k microarrays) were obtained from The Cancer Genome Atlas (TCGA) stomach adenocarcinoma (STAD) cohort. All CpG probes overlapping eEF1A2 were investigated, and their mean values were calculated to obtain a stable signal.
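
The last step above, averaging all CpG probes that overlap a gene to obtain one value per sample, can be sketched as follows. The data layout, function name, and the handling of missing values are assumptions made for illustration; the text does not describe how missing probe values were treated.

```python
from statistics import mean


def gene_level_methylation(beta_by_probe, gene_probes):
    """Average beta values across a gene's probes, separately for each sample.

    beta_by_probe: dict probe_id -> dict sample_id -> beta (None if missing).
    gene_probes: probe IDs annotated to the gene of interest (assumed input).
    Samples with no usable probe value are omitted from the result.
    """
    samples = set()
    for probe_id in gene_probes:
        samples.update(beta_by_probe.get(probe_id, {}))
    result = {}
    for sample in sorted(samples):
        values = [
            beta_by_probe[probe_id][sample]
            for probe_id in gene_probes
            if probe_id in beta_by_probe
            and beta_by_probe[probe_id].get(sample) is not None
        ]
        if values:
            result[sample] = mean(values)
    return result
```

Averaging over several probes reduces the influence of any single noisy probe, which is the stated purpose of the step.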

RNA extraction, cDNA preparation, and quantitative real-time PCR

RNA was extracted from FFPE tissues using TRIzol Reagent (Invitrogen) according to the protocol presented in Supplementary file1. cDNA was synthesized using a TIANscript RT Kit. PCR was performed using the following primers: eEF1A2-f, 5′-GGACCATTGAGAAGTTCGAGA-3′; and eEF1A2-r, 5′-AGCACCCAGGCATACTTGAA-3′. Beta-actin was amplified as the internal control using the primers ACTB-f, 5′-CTGAGCGCAAGTACTCCGTG-3′; and ACTB-r, 5′-CATTTGCGGTGGACGATGGA-3′. The reaction conditions included a denaturation program (95 °C for 15 min) and an amplification and quantification program repeated 40 times (95 °C for 10 s and 60 °C for 43 s). Each sample was tested in triplicate. The eEF1A2 expression level in IMC tissues was directly compared to that in matched nontumor tissues, and the relative expression level was calculated using the 2^(−ΔCT) method.
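
The relative expression calculation can be sketched as follows, assuming ΔCT is the mean CT of the target gene minus the mean CT of the internal control across the triplicate wells. The function name is hypothetical, and the handling of replicates is an assumption since the text does not specify it.

```python
from statistics import mean


def relative_expression(target_cts, reference_cts):
    """Return 2^(-dCT) from replicate CT values.

    target_cts: CT values of the gene of interest (e.g. triplicate wells).
    reference_cts: CT values of the internal control in the same sample.
    """
    delta_ct = mean(target_cts) - mean(reference_cts)
    return 2.0 ** (-delta_ct)
```

Dividing the value for a tumor sample by the value for its matched nontumor sample gives a per-patient fold change; a one-cycle decrease in ΔCT corresponds to a twofold increase in relative expression.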

Single-sample gene set enrichment analysis

After annotation, the expression matrix was sorted by the expression of the selected gene from low to high, and all samples were divided into high- and low-expression groups based on mean expression. GSEA of the selected genes was performed with the “GSVA” package. A normalized enrichment score (NES) > 1, nominal p < 0.05, and FDR < 0.25 were considered the thresholds for statistical significance.
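
The grouping step that precedes the enrichment analysis can be sketched as follows. The function name is hypothetical, and assigning samples that fall exactly on the mean to the low group is an assumption, since the text does not say how ties were handled.

```python
from statistics import mean


def split_by_mean_expression(expression):
    """Split samples into high and low groups around the mean expression.

    expression: dict sample_id -> expression value of the selected gene.
    Returns (high, low), each ordered from low to high expression.
    Samples exactly at the mean go to the low group (assumption).
    """
    cutoff = mean(list(expression.values()))
    ordered = sorted(expression, key=expression.get)
    high = [s for s in ordered if expression[s] > cutoff]
    low = [s for s in ordered if expression[s] <= cutoff]
    return high, low
```

Because the cutoff is the mean rather than the median, a few samples with very high expression can leave the two groups unequal in size.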

Results

Global DNA methylation profile of IMCs

The results of quality control and preprocessing are presented in Fig. 1B, and a set of 727,890 probes was utilized for further analysis. The overall DNA methylation patterns exhibited a bimodal distribution, with high or low levels at most CpG sites in both tumor and nontumor controls. The DNA methylation levels for different genomic locations are presented in Fig. 2. Separating CpGs in the gene context based on genomic features revealed a distinct pattern: most unmethylated CpGs were restricted to gene promoters. The distribution of DNA methylation in specific gene regions was similar in both groups. Global genome-wide methylation analysis revealed that the global methylation level in tumors was lower than that in nontumor mucosa tissues (median methylation = 0.567 and 0.596, respectively; p < 0.01; Wilcoxon rank test). When the CpG sites were analyzed separately by gene region, significantly different methylation levels were also observed between tumor and nontumor tissues. These results indicate that DNA methylation is significantly altered during IMC development.

Fig. 2

The DNA methylation levels at different genomic locations: Violin plots showing DNA methylation at the genome-scale (n = 727,890), gene promoters (n = 233,210), intergenic regions (n = 199,436), gene body regions (n = 277,090), and 3’UTRs (n = 18,154). Box plots within each violin plot indicate the interquartile range, and the red horizontal lines denote the median methylation. In all cases, the y-axis represents the methylation level on a 0 to 1 scale (i.e., 0 to 100%). p values were calculated by Wilcoxon rank tests

Genomic features of DMPs between tumors and nontumor tissues

We next sought to identify DNA methylation alterations specific to IMCs. The differentially methylated probes identified in IMCs are presented in Fig. 3. By comparing gastric IMC tissues to normal tissues, we found a total of 449 differentially methylated probes (DMPs; adjusted p value < 0.05, |Δβ| > 0.2; Supplementary file5), and most DMPs (66.1%) were hypomethylated in IMCs (297 hypo- and 152 hypermethylated). We then examined the genomic distribution of DMPs, revealing significant differences between hypo- and hypermethylated DMPs in terms of functional genomic distribution, as well as the CpG content and neighborhood context. Moreover, we observed a significant enrichment of hypermethylated DMPs in 3’UTRs (61.5%) and a notable overrepresentation of hypomethylated DMPs located in intergenic regions (87.9%) (p < 0.05 for both, Pearson χ2 test). Regarding CpG island regions (island, shore, shelf, and open sea), hypermethylated DMPs were overrepresented in shelf regions (42.2%), whereas hypomethylated DMPs were enriched in shore regions (81.8%) (p < 0.05, Pearson χ2 test). The genomic distribution of the probes in the different groups is presented in Supplementary file4.
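
The Pearson χ2 comparison of probe distributions can be sketched as follows. This computes only the test statistic and degrees of freedom from a table of observed counts; obtaining the p value additionally requires the chi-squared distribution. The function name is hypothetical and the layout of the contingency table (for example, rows for DMP class and columns for genomic region) is an assumption, as the text does not give the exact tables that were tested.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts.

    table: list of rows, each a list of observed counts.
    Every row and column total must be positive, otherwise an expected
    count of zero would make the statistic undefined.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof
```

For a 2 x 2 table, a statistic above roughly 3.84 corresponds to p < 0.05 at one degree of freedom.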

Fig. 3

Differentially methylated probes in IMCs. A Volcano plot of probe-level methylation in IMCs versus controls. The plot illustrates the relationship between the magnitude of difference in β values (Δβ values; x-axis) and adjusted p values (negative log10 transformed adjusted p values; y-axis). Each dot represents a single probe. B Bar charts showing the distribution of DMPs in relation to the gene region (left panel) and CpG island region (right panel). The distribution of CpG probes was as follows: All probes in the 850K array available for analysis (grey; n = 727,890 probes), hypermethylated DMPs (red; n = 152 probes), and hypomethylated DMPs (blue; n = 297 probes) according to gene region (left panel) and CpG island region (right panel). p values were calculated by Pearson χ2 tests (*p < 0.05)

Functional annotation analysis of differentially methylated genes

To identify changes in biological behavior during gastric tumorigenesis, we performed GO and KEGG enrichment analyses of the identified differentially methylated genes (DMGs); the results are presented in Fig. 4. The 152 significantly hypermethylated DMPs mapped to 120 genes, and functional annotation of the hypermethylated DMGs revealed enrichment in biological process terms, including the regulation of cell morphogenesis involved in differentiation and the regulation of GTPase activity. Additionally, pathways related to the occurrence and development of GC, such as the Rap1 signaling pathway, MAPK signaling pathway, PI3K-Akt signaling pathway, and T-cell receptor signaling pathway, were implicated. The 297 significantly hypomethylated DMPs represented 136 genes. In contrast, the hypomethylated DMGs were enriched mainly for cellular component terms, including those related to the regulation of cell adhesion molecules, cell membrane ion channels, and the VEGF signaling pathway. These results suggest that the molecular expression and biological functions of IMCs are substantially altered during gastric tumorigenesis compared to those of adjacent nontumor mucosal tissues.

Fig. 4

GO and KEGG enrichment results of DMGs in paired IMC and nontumor tissues. A GO analysis in terms of cellular component (CC), biological process (BP), and molecular function (MF). B KEGG enrichment results for hypermethylated and hypomethylated DMGs, respectively

Identification of overlapping DMGs

To validate the functional impact of aberrant methylation observed in IMCs, we further investigated the correlation of DMGs with their corresponding gene expression levels in IMC cases through the GEO database. Differentially expressed genes (DEGs) between IMC and normal tissues were screened separately in each microarray dataset. By setting the thresholds (|log2FC| > 1.5, p < 0.05), we identified 206 upregulated and 160 downregulated DEGs in GSE130823 and 268 upregulated and 499 downregulated DEGs in GSE55696. By examining the overlap between the two GEO datasets and the methylation data, two genes were found to be both differentially expressed and differentially methylated: one hypomethylated, highly expressed gene, eEF1A2, and one hypomethylated, lowly expressed gene, CLDN4. In this study, we selected eEF1A2 for further research. (DEGs of GSE130823 and GSE55696 are presented in Supplementary file6 and file7, respectively.)
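
The overlap step can be sketched with plain set operations. This assumes a candidate must be differentially expressed in the same direction in every expression dataset and also appear among the DMGs; the text does not spell out the exact rule, so the intersection used here is an assumption, and the function name is hypothetical.

```python
def overlap_candidates(hypo_dmgs, hyper_dmgs, deg_sets):
    """Intersect DMGs with DEGs shared by all expression datasets.

    hypo_dmgs / hyper_dmgs: gene symbols with hypo- or hypermethylated DMPs.
    deg_sets: list of (upregulated, downregulated) gene collections,
              one pair per expression dataset (at least one pair required).
    Returns a dict keyed by methylation and expression direction.
    """
    up_all = set.intersection(*(set(up) for up, _ in deg_sets))
    down_all = set.intersection(*(set(down) for _, down in deg_sets))
    hypo, hyper = set(hypo_dmgs), set(hyper_dmgs)
    return {
        "hypo_up": hypo & up_all,
        "hypo_down": hypo & down_all,
        "hyper_up": hyper & up_all,
        "hyper_down": hyper & down_all,
    }
```

Requiring agreement across datasets is conservative: it trades sensitivity for candidates that replicate, which is consistent with only two genes surviving the overlap.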

eEF1A2 mRNA expression is elevated in IMC tissues

A subset of 24 IMC FFPE tissues and their 24 paired nontumor FFPE tissues was randomly selected to verify the mRNA expression levels of eEF1A2 using quantitative real-time PCR. The relative expression of eEF1A2 in IMC tissues was significantly higher than that in matched adjacent nontumor tissues (Fig. 5A, ***p < 0.001, Wilcoxon signed-rank test).

Fig. 5

The final validation of eEF1A2. A The expression level of eEF1A2 in IMC tissues was examined by quantitative real-time PCR. Expression of eEF1A2 was higher in 24 IMC tissues than in their pair-matched adjacent normal tissues. p values were calculated by Wilcoxon signed-rank tests (***p < 0.001). B eEF1A2 methylation levels in normal gastric tissues versus IMCs and advanced GC. C GSEA of eEF1A2. Pathways related to the eEF1A2 high-expression group are shown

eEF1A2 methylation in advanced GC and single-sample gene set enrichment analysis

To further examine the diagnostic value of eEF1A2 methylation, we analyzed its methylation status in advanced GC in the TCGA STAD cohort. After matching eEF1A2 methylation levels with TNM stage, complete clinical data were obtained for 187 patients. As shown in Fig. 5B, consistent with our 850K assay results, eEF1A2 was hypomethylated in advanced GC.

To investigate the potential functions of eEF1A2, GSEA was performed on the GSE130823 and GSE55696 expression data, and the results for eEF1A2 are presented in Fig. 5C and Supplementary file1 (Fig. 6). The genes in the high-expression group were enriched in “gastric acid secretion,” “mucin type O-glycan biosynthesis,” and the “Hedgehog signaling pathway.” The genes in the low-expression group were enriched in “arginine biosynthesis,” “cholesterol metabolism,” “fat digestion and absorption,” “renin-angiotensin system,” and “vitamin digestion and absorption.” These results suggest that low expression of eEF1A2 is mainly associated with nontumor pathways, while high expression of eEF1A2 is associated with typical GC pathways and tumor invasion.

Discussion

In this study, we investigated the genome-wide DNA methylation patterns of IMCs using 727,890 high-confidence probes. Genome-wide DNA methylation profiling revealed a global decrease in methylation in IMCs compared with nontumor tissues. Overall, 449 DMPs were identified, including 152 hypermethylated sites and 297 hypomethylated sites. The distributions of these sites across gene regions, CpG islands, and their surrounding regions differed between hyper- and hypomethylated DMPs. Our results indicated that gastric IMC and nontumor tissues exhibited distinct DNA methylation profiles. GO and KEGG enrichment analyses were then used to systematically analyze the DMGs between IMC and nontumor tissues. The functional analysis revealed that hyper- and hypomethylated genes encompass diverse functional aspects. Hypermethylated DMGs were mostly enriched in biological process terms, whereas hypomethylated DMGs were mainly enriched in cellular component terms. The enrichment of hypermethylated DMGs suggested that methylation may play a regulatory role in cell morphogenesis and contribute to the differentiation of gastric epithelial cells. Furthermore, multiple signaling pathways were identified, including the Rap1 signaling pathway, MAPK signaling pathway, PI3K-Akt signaling pathway, and T-cell receptor signaling pathway, suggesting that DNA hypermethylation might simultaneously regulate essential pathways and the tumor microenvironment to mediate IMC development. Additionally, hypomethylated DMGs were primarily enriched in the regulation of cell membrane ion channels. The aberrant expression and/or activity of these channels can contribute to malignant transformation and tumor progression in gastrointestinal cancer [13], and the enrichment of molecular processes and signaling pathways related to cell adhesion in IMCs suggested that gastric tumor cells acquire the potential for invasion and metastasis. Our study contributes more comprehensive information on the methylation profile of IMCs.

To investigate the DMGs of IMCs, we analyzed gene expression profiles from the GSE130823 and GSE55696 datasets. Previous studies [14, 15] have shown that the molecular differences between high-grade dysplasia and early GC are not significant; during GC tumorigenesis, cancer-like changes occur in low-grade dysplasia and accumulate in high-grade dysplasia and early GC. Therefore, utilizing expression data from the GSE130823 and GSE55696 cohorts is meaningful for studying IMC expression. By integrating two microarray expression datasets from Chinese individuals with our DNA methylation data and employing bioinformatics analysis, we identified one hypomethylated/upregulated gene, eEF1A2, and one hypomethylated/downregulated gene, CLDN4. Additionally, we assessed the mRNA expression level of eEF1A2 in 24 paired differentiated IMC and adjacent nontumor mucosa FFPE tissues, and quantitative real-time PCR analysis revealed that expression of eEF1A2 was significantly greater in IMCs than in nontumor tissue. GSEA of eEF1A2 revealed that it is associated with specific pathways related to GC. Moreover, the analysis of methylation datasets from TCGA further confirmed the hypomethylation of eEF1A2 in advanced GC. Previous studies have demonstrated that the DNA methylation process is gene- and tumor-stage-dependent during gastric carcinogenesis. Some genes exhibit higher methylation during dysplasia and the early cancer phase than in normal tissue but show lower methylation in advanced GC [16]. Our research revealed that, compared with normal tissues, eEF1A2 is hypomethylated from the early stage to the advanced stage of GC. eEF1A2 belongs to the eEF1A family and plays a central role in the elongation step of translation. It also performs critical functions in tumorigenesis and influences survival in a cancer-specific manner [17]. eEF1A2 has been reported as a putative oncogene due to its high expression in many cancers [18,19,20]. According to previous studies [21, 22], it plays a crucial biological role by promoting the guanosine triphosphate (GTP)-dependent binding of aminoacyl tRNA to ribosomal A sites during protein biosynthesis. This leads to regulated changes or disturbances in cell function that significantly impact the growth and proliferation of tumor cells, prevent immune surveillance, induce angiogenesis, and activate metastasis. Additionally, higher mRNA levels of eEF1A2 predict poor overall patient survival and first progression in GC [23]. These studies suggest that eEF1A2 may have the potential for wide-ranging clinical significance and application. The present study is the first to report aberrant hypomethylation of eEF1A2 in gastric IMCs using endoscopic resection tissues. However, the corresponding hypomethylated CpG sites were located not within a promoter CpG island but within a gene body region. Numerous studies have shown that gene body methylation is widely involved in regulating the expression of many genes and is closely related to the occurrence and progression of malignant tumors [24]. The mechanism of gene body hypomethylation in tumors is still unclear, and more complex mechanisms may exist. In our study, this candidate was confirmed to have upregulated mRNA expression by quantitative real-time PCR analysis, supporting the view that the candidate DMG identified in this study is not an artifact. To analyze the potential pathogenic mechanism underlying the high expression of eEF1A2, we conducted pathway analysis of the high and low mRNA expression groups of eEF1A2 through GSEA. We found that the eEF1A2 high-expression group was mainly enriched in gastric acid secretion, mucin-type O-glycan biosynthesis, and the Hedgehog signaling pathway. These pathways are closely related to GC progression [25,26,27]. Taken together, these findings suggest that abnormal methylation and mRNA expression of eEF1A2 may have clinical significance and potential for application in GC research.

Early GCs are typically diagnosed and treated by endoscopic resection and are characterized by small lesions and relatively precious specimens. Any mucosal defects can impact diagnosis and treatment. To preserve the integrity of the lesion and ensure the accuracy of the diagnosis, this study utilized FFPE specimens instead of fresh frozen mucosal specimens. Notably, many reports have confirmed that the overall DNA methylation profile in FFPE tissue is highly concordant with that found in fresh frozen tissue [28, 29]. The limitations of this study are as follows. First, although we selected tumor and adjacent nontumor tissues from the same patient to avoid individual differences, it is also necessary to collect normal mucosa without intestinal metaplasia as a control to avoid the influence of other factors on methylation status. Second, the thresholds selected for gene expression differences and methylation differences were |log2FC| > 1.5 and |Δβ| > 0.2. However, thresholds vary among studies, and the next step may involve exploring additional potential overlapping genes by adjusting the screening thresholds. Third, our study used only public expression data from the GEO database, and expression profiling data specific to IMCs were lacking, so our results should be interpreted cautiously. Moreover, with a comparative and cross-sectional design, it is impossible to infer causal relationships between methylation status and expression levels or between the identified genes and carcinogenesis. These findings warrant further functional studies to elucidate whether methylation-induced overexpression of eEF1A2 is a driver event in the development of IMCs.

In conclusion, we present a detailed description of genome-wide DNA methylation alterations in IMCs and propose that aberrant hypomethylation and high mRNA expression of eEF1A2 might play a role in the development of IMCs.