Introduction

Neuroblastoma (NB) is the most common extracranial solid tumor in children and an important cause of childhood cancer deaths. It originates from neural crest progenitor cells and mostly occurs in the abdomen, particularly the adrenal gland [1]. Although NB accounts for only 8% of all childhood cancers, it is responsible for 15% of childhood cancer deaths [2, 3]. The clinical manifestations of NB are diverse. Some tumors regress spontaneously or differentiate into benign ganglioneuroma, whereas others continue to progress despite intensive treatment [4, 5]. According to the clinical manifestations and biological characteristics of tumors, including age, disease grade, MYCN amplification status, and histopathological manifestations, NB patients can be divided into low-, intermediate-, and high-risk groups [2, 6]. Patients in the low- and intermediate-risk groups respond well to surgery and chemotherapy, and their long-term survival rate is over 90% [7]. Patients in the high-risk group often have extensive metastatic lesions. Even with intense chemotherapy combined with surgery, radiotherapy, and autologous bone marrow stem cell transplantation, their long-term survival rate is still less than 50% [8,9,10]. Improving the cure rate and long-term survival rate of high-risk NB patients is key to improving the overall prognosis and remains an urgent problem in basic research and clinical treatment. Therefore, it is necessary to find new valid targets for NB diagnosis and treatment.

Studies have shown that many biochemical molecular markers are related to tumor occurrence and development and can be used for early tumor screening [11]. However, many markers are highly expressed in various types of tumors and therefore lack specificity [12]. It is thus necessary to explore new NB-specific diagnostic markers as an auxiliary detection scheme for early diagnosis. Microarray technology and bioinformatic analysis have become promising and useful tools for screening significant genetic or epigenetic variations that occur during carcinogenesis and for determining cancer diagnosis and prognosis [13]. Gene Expression Omnibus (GEO) is an international public repository for the archival and free distribution of microarray, next-generation sequencing, and other forms of high-throughput functional genomic data [14, 15]. Researchers can obtain publicly available cancer data from around the world, which provides opportunities for mining cancer gene expression profiles [15] and lays a foundation for improving the early diagnosis, treatment, and prevention of various cancers.

In this study, we downloaded two NB microarray datasets, GSE66586 and GSE78061, from the GEO database [16, 17]. Differentially expressed genes (DEGs) were screened by comparing gene expression between NB and control cells. A protein-protein interaction (PPI) network was constructed and module analysis of DEGs was performed using the STRING database and Cytoscape software. Then, functional annotation and signaling pathway analysis of the DEGs were performed using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis in the DAVID database, and survival analysis was performed to confirm their prognostic importance. Finally, CTGF, EDN1, GATA2, LOX, and SERPINE1 were identified as hub genes. These findings may provide insights into the occurrence and development of NB, as well as potential therapeutic targets for future research.

Materials and methods

Microarray data

Two gene expression datasets (GSE66586, GSE78061) were obtained from the NCBI Gene Expression Omnibus (GEO) database, available at http://www.ncbi.nlm.nih.gov/geo/ [18]. The GSE66586 array data were submitted by Gu L. et al. and include eight NB and two control cells [17]. The GSE78061 dataset was submitted by Cole KA. et al. and consists of 25 NB and four control cells [16]. Both datasets are based on the GPL6244 platform (Affymetrix Human Gene 1.0 ST Array; Affymetrix, Santa Clara, California, USA).

Microarray data processing

The affy package in R (http://cran.r-project.org/) was used to apply the robust multi-array average (RMA) algorithm, which converts the raw array data into expression values through background correction, normalization, and probe summarization [19, 20]. The paired t-test in the limma package in R was used to identify the DEGs between NB and control cells [21, 22]. An adjusted P value (AdjP-value) < 0.01 and |log2FC| > 2 were used as the cutoff criteria for DEG screening.
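The screening rule above can be illustrated with a short sketch. This is a Python stand-in for illustration only, not the limma workflow used in the study. It assumes the adjusted P value is a Benjamini-Hochberg adjustment (the text does not name the adjustment method), and the gene names and statistics are hypothetical.

```python
from typing import Dict, List, Tuple

ADJ_P_CUTOFF = 0.01   # adjusted P value threshold stated in the text
LOG2FC_CUTOFF = 2.0   # |log2FC| threshold stated in the text


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Assumption: BH is the adjustment method; the paper does not say.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify_degs(stats: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """Label each gene 'up', 'down', or 'ns' from (log2FC, raw P) pairs."""
    genes = list(stats)
    adj = benjamini_hochberg([stats[g][1] for g in genes])
    labels = {}
    for gene, adj_p in zip(genes, adj):
        log2fc = stats[gene][0]
        if adj_p < ADJ_P_CUTOFF and log2fc > LOG2FC_CUTOFF:
            labels[gene] = "up"
        elif adj_p < ADJ_P_CUTOFF and log2fc < -LOG2FC_CUTOFF:
            labels[gene] = "down"
        else:
            labels[gene] = "ns"
    return labels


# Hypothetical toy input: gene -> (log2FC, raw P value).
toy_stats = {
    "GENE_A": (3.1, 0.0001),
    "GENE_B": (-2.6, 0.0004),
    "GENE_C": (2.4, 0.2),
    "GENE_D": (0.3, 0.0002),
}
toy_labels = classify_degs(toy_stats)
```

In this toy input, a gene must pass both thresholds to be labeled as a DEG; a large fold change with a weak P value, or a strong P value with a small fold change, is left as not significant.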

Integration of microarray data

The lists of DEGs obtained from the two microarray datasets by limma analysis were saved as CSV files [21]. We downloaded the robust rank aggregation (RRA) software package and ran it in R [23]. The up- and downregulated gene lists from the two microarrays were integrated with RRA and used for subsequent analysis. The RRA method is publicly available as an R package from the Comprehensive R Archive Network.

Enrichment analysis of DEGs with GO and KEGG

The GO (http://www.geneontology.org) database can provide a functional classification for genomic data, including biological process (BP), cellular component (CC), and molecular function (MF) [24]. Hence, GO analysis is a widely used gene and gene product annotation tool. The “Kyoto Encyclopedia of Genes and Genomes” (KEGG, http://www.genome.ad.jp/kegg/) database is a networked website designed to analyze, explain, and visualize gene functions [25, 26]. DAVID (http://david.abcc.ncifcrf.gov/) is an annotation, visualization, and comprehensive discovery database, and an online tool for gene function classification, useful for assessing the biological function of genes [27]. In this study, GO enrichment analysis and KEGG pathway analysis were performed using the DAVID website to study the function of DEGs. Values with P < 0.05 were deemed statistically significant.
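The idea behind an enrichment P value can be sketched as an over-representation test. The example below uses a plain hypergeometric upper tail as a simplification; DAVID itself reports the EASE score, a more conservative modified Fisher exact P value, so this is not DAVID's exact computation. The function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(list_hits: int, list_size: int,
                           term_size: int, background_size: int) -> float:
    """P(X >= list_hits) for a gene list drawn without replacement.

    list_hits: genes in the list annotated to the term
    list_size: genes in the list
    term_size: background genes annotated to the term
    background_size: all background genes
    """
    max_hits = min(list_size, term_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical example: 12 of 150 list genes fall in a 300-gene term,
# against a 20000-gene background.
p_example = hypergeom_enrichment_p(12, 150, 300, 20000)
is_significant = p_example < 0.05  # cutoff stated in the text
```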

PPI network construction and analysis

STRING (version 11.0, https://string-db.org) is a search tool for identifying interacting genes and proteins. DEGs were imported into the STRING database to construct a PPI network showing physical and functional interactions [26]. In this study, protein pairs with a combined score > 0.4 were selected for PPI network construction. Additionally, Cytoscape software (version 3.6.1) was used to calculate the node degree through the Network Analyzer application and to draw PPI networks in which color and size indicate the direction of regulation (up or down) and the node degree [28]. Twelve methods in cytoHubba (Betweenness, BottleNeck, Closeness, Clustering Coefficient, Degree, DMNC, EcCentricity, EPC, MCC, MNC, Radiality, and Stress) were used to rank and evaluate the hub genes and to generate the final hub gene network. DAVID was then used for GO and KEGG enrichment analysis of the hub genes to further assess the reliability of the results.
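The score filter and node degree described above can be sketched as follows. This is a minimal illustration in Python, not the STRING or Cytoscape implementation; the edge list format, protein labels, and scores are hypothetical, and scores are assumed to be on a 0 to 1 scale.

```python
from typing import Dict, Iterable, Set, Tuple

SCORE_CUTOFF = 0.4  # interaction score threshold stated in the text


def node_degrees(edges: Iterable[Tuple[str, str, float]]) -> Dict[str, int]:
    """Count distinct interaction partners per protein above the cutoff."""
    partners: Dict[str, Set[str]] = {}
    for a, b, score in edges:
        # Skip weak edges and self-loops; duplicate edges collapse in the set.
        if score > SCORE_CUTOFF and a != b:
            partners.setdefault(a, set()).add(b)
            partners.setdefault(b, set()).add(a)
    return {node: len(nbrs) for node, nbrs in partners.items()}


# Hypothetical edge list: (protein, protein, score).
toy_edges = [
    ("P1", "P2", 0.90),
    ("P1", "P3", 0.70),
    ("P2", "P3", 0.30),  # below the cutoff, dropped
    ("P3", "P4", 0.45),
    ("P2", "P1", 0.90),  # duplicate of the first edge
]
toy_degrees = node_degrees(toy_edges)
```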

Hub gene survival analysis

The R2 platform (http://r2platform.com) is a genomic analysis and visualization platform that provides a biologist-friendly interface for high-throughput data analysis. It was developed in the Netherlands AMC Cancer Genomics Department, where it remains the main entry point for all types of high-throughput data. The R2 platform consists of two parts: a publicly accessible database that stores data, and a web interface that provides a set of tools and visualizations to mine the database. In this study, the hub DEGs were selected, and survival analysis based on their expression was performed through the R2 database to determine the relationship between their expression in NB and patient prognosis. A Bonferroni-corrected P value (Bonf P) < 0.05 was considered statistically significant.
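The two ingredients of this step, a survival curve and a Bonferroni correction, can be sketched in a few lines. This is a generic Python illustration, not the R2 platform's implementation; the function names and the follow-up data are hypothetical, and the number of tests used for the correction is an assumption the text does not specify.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is True for an observed death and False for censoring.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += 1 if pairs[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-corrected P value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical follow-up times and event flags for four patients.
toy_curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
```

Computing one curve per expression group (high versus low) and comparing them is the usual next step; the corrected P value is then compared with the 0.05 threshold.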

Results

Microarray data information and identification of DEGs

To identify DEGs, we performed background correction and normalization of the NB expression microarray datasets GSE66586 and GSE78061. Filtering the GSE66586 dataset with the limma package in R (AdjP-value < 0.01 and |log2FC| > 2) yielded 778 DEGs, including 355 upregulated and 423 downregulated DEGs. In addition, 846 DEGs were screened from the GSE78061 dataset, including 500 upregulated and 346 downregulated DEGs. The differential expression of genes in each of the two microarray datasets is shown in Fig. 1a and b, and the cluster heatmap of the top 100 DEGs is shown in Fig. 1c and d.

Fig. 1

Volcano plot distribution of DEGs and heatmap of the top 100 DEGs in the two datasets. Volcano plot of (a) GSE66586 and (b) GSE78061. Blue points indicate the screened downregulated DEGs, red points indicate the screened upregulated DEGs, and gray points indicate genes with no significant difference. Heatmap of the top 100 DEGs of (c) GSE66586 and (d) GSE78061. From red to blue, the expression level of the gene in the sample gradually decreases. All DEGs were screened based on an adjusted P value < 0.01 and |log2FC| > 2. (DEGs, differentially expressed genes)

Identification of DEGs using integrated bioinformatics

To identify overlapped DEGs, we used the limma package to analyze the two NB gene expression microarray datasets, ranked the genes according to their log fold change values, and then conducted an RRA analysis (adjusted P value < 0.01). RRA is based on the null assumption that each gene in each experiment is randomly ordered. The more consistently a gene is ranked near the top across experiments, the smaller its adjusted P value and the higher the likelihood that it is differentially expressed. RRA analysis showed 238 overlapping DEGs (Table 1), including 151 upregulated and 87 downregulated DEGs (Fig. 2).
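The reasoning behind RRA can be sketched with order statistics: under the null assumption that ranks are random, normalized ranks behave like uniform values, so a gene that sits near the top of every list receives a small score. The code below is a simplified Python illustration of that idea, not the R package used in the study; the function names, the handling of missing genes, and the toy lists are assumptions.

```python
from math import comb
from typing import Dict, List, Tuple


def order_stat_cdf(x: float, k: int, n: int) -> float:
    """P(k-th smallest of n independent uniform(0,1) values <= x)."""
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def rho_score(normalized_ranks: List[float]) -> float:
    """Smallest order-statistic probability, multiplied by n and capped at 1."""
    n = len(normalized_ranks)
    ordered = sorted(normalized_ranks)
    rho = min(order_stat_cdf(r, k, n) for k, r in enumerate(ordered, start=1))
    return min(1.0, rho * n)


def aggregate(ranked_lists: List[List[str]]) -> List[Tuple[str, float]]:
    """Score every gene across best-first lists; lowest score first."""
    genes = set().union(*ranked_lists)
    scores: Dict[str, float] = {}
    for gene in genes:
        ranks = []
        for lst in ranked_lists:
            if gene in lst:
                ranks.append((lst.index(gene) + 1) / len(lst))
            else:
                ranks.append(1.0)  # assumption: absent genes get the worst rank
        scores[gene] = rho_score(ranks)
    return sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical ranked lists from two experiments, best gene first.
toy_lists = [["G1", "G2", "G3", "G4"], ["G1", "G3", "G2", "G4"]]
toy_ranking = aggregate(toy_lists)
```

In the toy lists, only G1 is ranked first in both experiments, so it is the only gene whose score drops well below 1.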

Table 1 Overlapped differentially expressed genes (DEGs) between GSE66586 and GSE78061 microarray data

Fig. 2

Identification of overlapped DEGs. Venn diagram of (a) 151 overlapped upregulated DEGs and (b) 87 overlapped downregulated DEGs between GSE66586 and GSE78061. (DEGs, differentially expressed genes)

GO and KEGG enrichment analysis of overlapped DEGs

In order to understand the molecular functions and pathways involving DEGs, we conducted a functional enrichment analysis. GO-based BP analysis showed that upregulated DEGs were significantly enriched in cell adhesion, extracellular matrix organization, angiogenesis, regulation of cell growth, and cell migration (Fig. 3a), while downregulated DEGs were significantly enriched in negative regulation of transcription, neuron migration, cell fate determination, axon guidance, and cell maturation (Fig. 4a). GO analysis of CC showed that upregulated DEGs were significantly enriched in the plasma membrane, extracellular exosome, extracellular region, extracellular space, and integral component of plasma membrane (Fig. 3b), while downregulated DEGs were mainly involved in the plasma membrane, Golgi apparatus, neuron projection, cell junction, and neuronal cell body (Fig. 4b). Regarding MF, upregulated DEGs were significantly enriched in protein binding, transcription factor activity, RNA polymerase II regulatory region DNA binding, transcriptional activator activity, and RNA polymerase II transcription factor binding (Fig. 3c), while downregulated DEGs were mainly involved in calcium ion binding, actin binding, heparin binding, actin filament binding, and collagen binding (Fig. 4c). In addition, KEGG analysis showed that upregulated DEGs were significantly enriched in focal adhesion, the TNF signaling pathway, and arrhythmogenic right ventricular cardiomyopathy (Fig. 5a), while downregulated DEGs were mainly involved in cholinergic synapse, dopaminergic synapse, morphine addiction, cancer pathways, and signaling pathways regulating stem cell pluripotency (Fig. 5b).

Fig. 3

Functional enrichment analysis of upregulated differentially expressed genes (DEGs). Analysis of (a) biological process, (b) cellular component, and (c) molecular function

Fig. 4

Functional enrichment analysis of downregulated differentially expressed genes (DEGs). Analysis of (a) biological process, (b) cellular component, and (c) molecular function

Fig. 5

Pathway enrichment analysis of DEGs. KEGG pathway analysis of (a) upregulated and (b) downregulated DEGs. (DEGs, differentially expressed genes; KEGG: Kyoto Encyclopedia of Genes and Genomes)

PPI network construction, module analysis, and hub gene determination

To study the protein-protein interactions of DEGs, we used STRING to generate a PPI network from the 238 DEGs that overlapped between the two datasets (Fig. 6). Following further analysis in Cytoscape, the top 100 DEGs ranked by each of the 12 methods in cytoHubba were selected and intersected. A total of 48 overlapped DEGs were identified and visualized (Fig. 7a). The 48 overlapped DEGs were sorted according to their degree scores, and the top 15 DEGs with the highest scores, namely ACTA2, COL4A1, LOX, CTGF, FBN1, SERPINE1, FSTL1, GATA3, GATA2, TAGLN, ISL1, HAND2, GJA1, MMP14, and EDN1, were selected (Table 2) and visualized (Fig. 7b).
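The selection logic in this paragraph (take the top N genes from each ranking method, intersect them, then keep the highest-degree genes) can be sketched as follows. This is an illustrative Python sketch, not cytoHubba's implementation; the function names, method labels, gene labels, and degree values are hypothetical.

```python
from typing import Dict, List, Set


def intersect_top_n(rankings: Dict[str, List[str]], n: int) -> Set[str]:
    """Genes that appear in the top n of every ranking method."""
    top_sets = [set(genes[:n]) for genes in rankings.values()]
    return set.intersection(*top_sets) if top_sets else set()


def top_by_degree(genes: Set[str], degree: Dict[str, int], k: int) -> List[str]:
    """The k genes with the highest degree; ties broken alphabetically."""
    return sorted(genes, key=lambda g: (-degree.get(g, 0), g))[:k]


# Hypothetical rankings from three methods, best gene first.
toy_rankings = {
    "method_1": ["G1", "G2", "G3", "G4", "G5"],
    "method_2": ["G3", "G1", "G5", "G2", "G4"],
    "method_3": ["G2", "G3", "G1", "G6", "G4"],
}
toy_degree = {"G1": 7, "G2": 4, "G3": 9, "G4": 2, "G5": 5, "G6": 1}

toy_shared = intersect_top_n(toy_rankings, 3)
toy_hubs = top_by_degree(toy_shared, toy_degree, 2)
```

In the study the corresponding values are N = 100 per method across 12 methods, followed by the top 15 genes by degree.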

Fig. 6

Protein-Protein Interaction (PPI) network of differentially expressed genes (DEGs) constructed in STRING

Fig. 7

PPI network of DEGs constructed in Cytoscape. PPI network of (a) overlapped DEGs, filtered by the 12 methods of cytoHubba in Cytoscape, and (b) the top 15 overlapped hub DEGs. Red points represent upregulated DEGs, and green points represent downregulated DEGs. (DEGs, differentially expressed genes; PPI, protein-protein interaction)

Table 2 Top 15 differentially expressed genes (DEGs) by degree score ranking

Hub gene survival analysis

To study the correlation between DEG expression and NB patient prognosis, we further analyzed data for 649 samples through the R2 database. Among the 15 DEGs, the expression of CTGF, EDN1, GATA2, LOX, and SERPINE1 was closely related to the prognosis of NB patients, and these genes were therefore taken as hub genes. Overall survival was higher in NB patients with low CTGF and LOX expression and in those with high EDN1, GATA2, and SERPINE1 expression (Fig. 9a-e). Subsequent GO and KEGG enrichment analyses showed that the hub genes were mainly enriched in intracellular signal transduction, cell-cell signaling, protein binding, the HIF-1 signaling pathway, and the Hippo signaling pathway (Table 3), and the visualization results are shown in Fig. 8. Bonf P < 0.05 was considered statistically significant.

Fig. 8

Distribution of hub DEGs in NB for GO enrichment. (DEGs, differentially expressed genes; NB, Neuroblastoma; GO, gene ontology)

Fig. 9

Survival analysis of hub differentially expressed genes (DEGs). Survival analysis of (a) CTGF, (b) EDN1, (c) GATA2, (d) LOX, and (e) SERPINE1 in NB. Bonf P < 0.05 was considered statistically significant

Table 3 Functional and pathway enrichment analysis of hub DEGs. (GO, gene ontology; BP, biological process; CC, cellular component; MF, molecular function; KEGG: Kyoto Encyclopedia of Genes and Genomes)

Discussion

NB is a significant cause of childhood death, and early diagnosis and treatment are essential to prolonging the survival time of patients [29]. Therefore, it is necessary to further explore predictive indicators and therapeutic targets for NB. With the development of bioinformatics, DNA microarrays are increasingly used to explore the early diagnosis, treatment, and prognosis of cancer [30]. This study aimed to identify DEGs between NB and control cells to further understand the pathogenesis of NB and potentially provide diagnostic biomarkers and therapeutic targets.

According to reports, studies using multiple cohorts tend to have lower false positive and false negative rates than single cohort studies [31]. However, due to factors such as batch effects and biological differences, combining multiple microarrays from different platforms may obscure the true signal [32]. To improve the reliability of DEG identification, we selected two microarray datasets from the same platform and identified a total of 704 upregulated and 682 downregulated DEGs across the two datasets, counting each gene once. Among them, 151 upregulated and 87 downregulated DEGs were differentially expressed in both datasets. To further define the role of these DEGs in NB, we conducted a series of bioinformatic and prognostic analyses.
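The totals quoted in this paragraph are consistent with counting each gene once across the two datasets (inclusion-exclusion), using the per-dataset and overlap counts reported in the Results. The check below is a minimal sketch of that arithmetic; the function name is hypothetical, and the interpretation is an assumption, since the text does not state how the totals were derived.

```python
def distinct_count(n_first: int, n_second: int, n_shared: int) -> int:
    """Size of the union of two gene sets given their sizes and overlap."""
    return n_first + n_second - n_shared


# Upregulated: 355 (GSE66586) and 500 (GSE78061), 151 shared.
up_distinct = distinct_count(355, 500, 151)    # 704
# Downregulated: 423 (GSE66586) and 346 (GSE78061), 87 shared.
down_distinct = distinct_count(423, 346, 87)   # 682
```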

GO enrichment analysis showed that upregulated DEGs were mainly involved in BP such as cell adhesion, regulation of cell growth, and cell migration, whereas downregulated DEGs were mainly involved in BP such as negative regulation of transcription, cell fate determination, and cell maturation. Studies have shown that reduced cell adhesion and altered cell migration ability are critical steps in cancer metastasis, which is consistent with the results of our analysis [33, 34]. For MF, upregulated DEGs were mainly involved in protein binding, transcription factor activity, and cadherin binding, whereas downregulated DEGs were mainly enriched in calcium ion binding, actin binding, and heparin binding. Villalobos et al. pointed out that calmodulin plays an essential role in tumor cell migration, invasion, and metastasis, which supports our findings [35]. CC analysis showed that upregulated DEGs were concentrated in the plasma membrane, extracellular exosome, and extracellular region, whereas downregulated DEGs were concentrated in the plasma membrane, Golgi apparatus, and cell junction. Previous studies have shown that extracellular exosomes and transcription factor activity play roles in tumor development and progression, which is consistent with the results of this study [36]. In addition, KEGG enrichment analysis showed that upregulated DEGs were significantly enriched in the TNF signaling pathway, whereas downregulated DEGs were significantly enriched in cancer pathways, signaling pathways regulating pluripotency of stem cells, and cell adhesion molecules. This is consistent with reports that TNF is a cytokine that can directly kill tumor cells with no apparent cytotoxicity to healthy cells, and that activation of the TNF signaling pathway plays a crucial role in tumor regulation [37, 38].

A PPI network was constructed for the identified DEGs, and critical genes were defined according to degree levels. ACTA2, COL4A1, LOX, CTGF, FBN1, SERPINE1, FSTL1, GATA3, GATA2, TAGLN, ISL1, HAND2, GJA1, MMP14, and EDN1 were found to have a high degree of network connectivity. By combining gene expression with its correlation to NB prognosis, we finally identified CTGF, EDN1, GATA2, LOX, and SERPINE1 as hub genes among these 15 genes. Functional enrichment analysis of the hub genes showed that the development of NB is related to angiogenesis, protein binding, the HIF-1 signaling pathway, and the Hippo signaling pathway.

CTGF, also known as CCN2 (Cell Communication Network Factor 2), is a protein-coding gene whose product plays a role in cell adhesion in many cell types and participates in ERK signaling and the TGF-β pathway [39,40,41,42]. Wang et al. pointed out that in cells overexpressing TAZ, knocking down CTGF with small interfering RNA can inhibit the expression of CTGF induced by TAZ, thereby inhibiting the proliferation and colony formation of NB cells [43]. Although there are relatively few studies on CTGF in NB, this finding suggests that its role in NB merits further study.

EDN1 is a protein-coding gene and a member of the endothelin family. Abnormal expression of this gene may promote tumorigenesis [44]. SERPINE1 is a member of the serine protease inhibitor (serpin) superfamily and is reported to be involved in the Hippo signaling pathway [45]. Although there are no reports on the role of EDN1 and SERPINE1 in NB, our GO and KEGG enrichment analysis showed that EDN1 and SERPINE1 were involved in the regulation of cell-cell signaling, protein binding, the HIF-1 signaling pathway, and the Hippo signaling pathway. This agrees with previous reports that the HIF-1 and Hippo signaling pathways play a vital role in NB regulation [46,47,48].

GATA2 (GATA binding protein 2) is a protein-coding gene that plays an essential role in regulating the transcription of genes involved in the development and proliferation of hematopoietic and endocrine cell lineages [49]. Hoene et al. pointed out that changes in the expression levels of GATA2 and its family members in NB may be related to the pathogenesis of neuroblastoma [50]. Wei et al. pointed out that in the transcriptome sequencing of three tumors including NB, LPAR1, GATA2, and NUFIP1 had high expression levels of mutant alleles, indicating that these mutant genes may have carcinogenic effects [51]. These results indicate that GATA2 may be a potential marker for early cancer detection and prognosis.

LOX is an extracellular copper-dependent amine oxidase, which is involved in the crosslinking of collagen and elastin lysine residues in the extracellular matrix. Its expression level in tumors is related to tumor prognosis [52]. Studies have shown that LOX/COX inhibitors can promote the differentiation of neuroblastoma cells induced by all-trans retinoic acid to a certain extent [53, 54]. Based on the role of LOX in tumors, with further in-depth research, LOX is expected to become a potential target molecule for NB treatment.

In summary, the results from this study suggest that CTGF, EDN1, GATA2, LOX, and SERPINE1 are NB hub genes. GO and KEGG enrichment analysis of these five hub genes further revealed their functions and pathways, and survival analysis found them to be closely associated with the prognosis of NB patients. These genes may become potential markers for improving diagnosis, optimizing chemotherapy, and predicting prognosis for NB, and the pathways related to these genes are potential therapeutic targets for NB. We plan to verify the potential functions and pathways of these genes in future research.