Introduction

The brain is the most important organ in higher organisms since it controls the functioning of the other organs in the body. Specifically, it is responsible for higher cognitive functions, such as memory, learning and perception. Environmental and genetic factors can affect the brain and cause neurodegenerative diseases, but time alone is sufficient to lead to brain dysfunction (Crowder 2014). This functional loss is called aging, which is negatively associated with cognition and memory performance of the brain. This decline in brain function is not well understood because of its complexity; it affects multiple systems and molecular processes such as lipid metabolism, insulin balance, calcium balance, inflammatory processes, mitochondrial function, myelination and extracellular vesicles (Schiera et al. 2020; Harman and Martín 2020; Spinelli et al. 2019; Alberini et al. 2018).

Transcriptome data are commonly used to study the activity of genes in response to diseases and other dysfunctions (Malone and Oliver 2011). Several studies have used transcriptome analysis to understand the molecular mechanisms of learning and memory. These experiments use the mouse and the rat as model organisms because they are readily available and have high genetic similarity to humans (Perlman 2016). Rattus norvegicus is commonly used for memory and learning experiments. In these experiments, the effect of aging on the brain is investigated mostly by using behavioral responses of animals to memory tests (Verbitsky et al. 2004). These experiments create a model for the aging brain to understand the molecular mechanisms of aging and their effects on memory performance by using experimental data from the hippocampus region of the brain. Blalock et al. (2003) created an aging brain model by comparing young and aged rats, which shows that the disturbance in the Ca2+ signaling mechanism negatively affects neuronal activity in the cell and has side effects on bio-processing mechanisms, causing nutrient deficiency in neurons. This deficit reduces energy production, which is required for signaling mechanisms, and contributes to diminished memory and learning performance (Blalock et al. 2003). Rowe et al. (2007) created a rat-based aging brain model and showed that memory-impaired animals have maintenance problems for glucose utilization, leading to downregulation in energy-providing astrocytic processes. This disturbance causes injury in signaling mechanisms and triggers myelination/demyelination processes. Plasticity mechanisms can be affected by myelination/demyelination processes and, consequently, influence learning and memory formation (Rowe et al. 2007).

These studies are limited in terms of elucidating mechanisms because the transcriptome data was analyzed without considering molecular interactions. Subnetwork discovery algorithms such as BioNet (Beisser et al. 2010) and KeyPathwayMiner (KPM) (Alcaraz et al. 2014) can be employed to map transcriptome data on organism-specific protein–protein interactome (PPI) data. The aim of these tools is to extract significantly affected and interacting proteins from the vast PPI data. These algorithms have been used successfully to identify essential mechanisms and proteins. KPM has been used, among others, to investigate the molecular mechanism of Huntington’s disease (Alcaraz et al. 2011), to find biomarkers for tumor cells (Huang et al. 2017) and to detect the chemotherapy response of breast cancer patients (Warsow et al. 2013). The BioNet algorithm was used to analyze the effect of silencing specific genes in bladder cancer (Chen et al. 2017), the mechanisms of ovarian cancer (Yin et al. 2016) and the recurrence risk in ovarian cancer patients (Cheng et al. 2018). Alternatively, network inference algorithms can be used to predict co-expression networks based on the correlation between gene expression levels of gene pairs. The weighted gene correlation network analysis (WGCNA) algorithm (Langfelder and Horvath 2008) was used in several studies, including understanding the effect of aging on DNA methylation (Horvath et al. 2012), comparing normal aging with Alzheimer’s disease condition (Miller et al. 2008) and comparing samples from various cancer types (Kalamohan et al. 2019). These studies demonstrate the success of these algorithms in discovering disease-associated subnetworks.

Here, we used these efficient subnetwork discovery algorithms, KPM and BioNet, alongside memory- and learning-related rat-based transcriptome data to discover subnetworks from the organism-specific PPI data. Further, we used the WGCNA algorithm to find highly correlated protein pairs in the discovered subnetworks. The subnetworks from the two algorithms and from two independent datasets were compared and functionally analyzed, and memory- and learning-related proteins were deduced. The analyses led to the construction of a memory-learning network. The main objective of this study is to reveal memory- and cognition-related molecular changes in the brain in response to aging by integrating transcriptome data with molecular interaction networks.

Methods

Microarray Data and Data Processing

Experimental data was downloaded from the Gene Expression Omnibus (Clough and Barrett 2016) database with the accession numbers GSE854 (Blalock et al. 2003) and GSE5666 (Rowe et al. 2007). The samples in both datasets were collected from the hippocampal regions of 7-day (GSE854) or 5-day (GSE5666) trained rats. Samples from nine rats aged 4 months (young) and ten rats aged 24 months (old) were used from the GSE854 dataset to understand the effect of aging in memory and learning performance. The animals were trained with the Object Memory Test (OMT) and the Morris water maze (MWM) test. Similarly, the GSE5666 dataset included data from ten rats aged 4–6 months (young) and 20 rats aged 24–26 months (old) trained by the MWM test. In order to make the two datasets comparable, only data from 5-day-trained rats were selected from the dataset, and the subcategories in the aged rats were ignored in our study. The raw transcriptome data from both datasets were normalized by the RMA package in R, and PCA and Sammon mapping were used to detect outliers in MATLAB. Only one sample (GSM132538) from the GSE5666 dataset was detected as an outlier and removed from the dataset for further analysis. p values between age groups were calculated using the limma package in R (Ritchie et al. 2015).

Protein–Protein Interactome Data

Rattus norvegicus PPI data was created in December 2018 using five different interactome databases: BioGrid (Stark et al. 2006), Mint (Licata et al. 2012), Intact (Orchard et al. 2014), UniProt (Bateman 2019) and iRefindex (Razick et al. 2008). This interactome consists of 3704 proteins and 7304 interactions. The transcriptome datasets used in this study include about 6000 (GSE854) and 12,000 (GSE5666) unique proteins. Only a fraction of those proteins could be mapped on the interactome, and many proteins that may be relevant to memory and cognition were lost. To prevent this loss, the Rattus norvegicus interactome was enriched with the Mus musculus interactome by using the following criteria: (i) the Mus musculus interactome was created by using the five databases mentioned above (December 2018), and the size of the interactome was identified as 10,377 proteins and 36,380 interactions; (ii) orthologous genes were found by Ensembl BioMart (Yates et al. 2020), which matches genes of different organisms based on sequence similarity. For this study, only the orthologous genes that have the same gene name in both organisms were selected to create the final rat interactome. The assumption here is as follows: if the genes have the same name in both organisms, they have a higher chance of having the same function(s). All the interactions of orthologous genes in the Mus musculus interactome were transferred to the Rattus norvegicus interactome, leading to a final interactome size of 9500 proteins and 37,043 interactions.
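The ortholog-based enrichment step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name, the toy gene names (GeneA to GeneD) and the rule that both interaction partners must be same-named orthologs are assumptions made for illustration only.

```python
def enrich_interactome(rat_edges, mouse_edges, same_name_orthologs):
    """Merge mouse interactions into a rat interactome.

    Assumption (not stated explicitly in the text): a mouse interaction is
    transferred only when BOTH partners are same-named orthologs.
    Edges are stored as frozensets so that (a, b) and (b, a) are identical.
    """
    merged = {frozenset(edge) for edge in rat_edges}
    for a, b in mouse_edges:
        if a in same_name_orthologs and b in same_name_orthologs:
            merged.add(frozenset((a, b)))
    return merged


# Toy placeholder data; these are not real interactions.
rat_edges = [("GeneA", "GeneB")]
mouse_edges = [("GeneB", "GeneC"), ("GeneC", "GeneD"), ("GeneB", "GeneA")]
same_name_orthologs = {"GeneA", "GeneB", "GeneC"}  # GeneD has no same-named ortholog

final_edges = enrich_interactome(rat_edges, mouse_edges, same_name_orthologs)
final_proteins = set().union(*final_edges)
print(len(final_proteins), "proteins,", len(final_edges), "interactions")
```

Storing edges as unordered pairs avoids counting an interaction twice when it is reported in both organisms, which matters when interactome sizes are compared before and after enrichment.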

Subnetwork Discovery

Two different subnetwork discovery algorithms were used to create subnetworks. BioNet (Beisser et al. 2010) is an R package, and KeyPathwayMiner (Alcaraz et al. 2014) is an open-source software project. Both methods use organism-specific interactome data and p values derived from the transcriptome data. The KPM algorithm was used as a Cytoscape application (Shannon et al. 2003), and all the analyses were performed in Cytoscape. p values were binarized based on the threshold of 0.01 for KPM analysis. If the p value of the gene is below the threshold value, this gene is considered to be changed significantly, and it is assigned with the value of 1; otherwise, it is assigned to 0. The threshold value is the first parameter of KPM, and the second important parameter is “K value,” which gives the number of allowed nonsignificant genes in the discovered subnetworks. Interactome data and binarized p values of genes were introduced to KPM, and K value was adjusted to 2 in all simulations. BioNet analysis was performed in R. The false discovery rate (FDR) parameter was set to 0.1 for the GSE854 dataset and to 0.05 for the GSE5666 dataset.
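The binarization of p values and the role of the K parameter can be sketched as follows. This is a hedged illustration of the input preparation described above, not KPM's search algorithm; the function names and the toy p values are hypothetical.

```python
def binarize_p_values(p_values, threshold=0.01):
    """Map each gene to 1 if its p value is below the threshold, else 0."""
    return {gene: 1 if p < threshold else 0 for gene, p in p_values.items()}


def within_exception_limit(subnetwork, indicator, k=2):
    """Check that a candidate subnetwork has at most k nonsignificant genes.

    Genes missing from the indicator are treated as nonsignificant
    (an assumption made for this sketch).
    """
    nonsignificant = sum(1 for gene in subnetwork if indicator.get(gene, 0) == 0)
    return nonsignificant <= k


# Toy p values for illustration only.
p_values = {"GeneA": 0.0004, "GeneB": 0.009, "GeneC": 0.03, "GeneD": 0.2, "GeneE": 0.5}
indicator = binarize_p_values(p_values)

print(indicator)
print(within_exception_limit(["GeneA", "GeneB", "GeneC", "GeneD"], indicator))
print(within_exception_limit(["GeneA", "GeneC", "GeneD", "GeneE"], indicator))
```

Note that GeneA and GeneB receive the same value of 1 although their p values differ by more than an order of magnitude, which is the information loss caused by binarization.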

Network Inference

The WGCNA algorithm (Langfelder and Horvath 2008) available as an R package was used to identify highly correlated protein pairs in the detected subnetworks by using Pearson correlation. Application of two subnetwork discovery algorithms on two datasets led to a total of four subnetworks. The combined list of genes from the four subnetworks was selected for WGCNA analysis to identify co-expression patterns between them. The gene expression data from each dataset was introduced to the algorithm separately along with the gene list to identify dataset-specific correlated pairs. The soft threshold parameter was chosen as 18, which corresponded to a 90% scale-free rate. “networkType” was set to “signed” to consider only positively correlated gene pairs, since they are more likely to have protein–protein interactions (Ramani et al. 2008), and they tend to have a more significant functional association (Song et al. 2012) compared to negatively correlated genes. “minModuleSize” was adjusted to 100, and “mergeCutHeight” was adjusted to 0.25 for both datasets.
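The idea of keeping only strongly and positively correlated gene pairs can be illustrated with a plain Pearson correlation. The sketch below is a simplification and does not reproduce WGCNA's module detection; the expression values, gene names and the cutoff used in the example are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (nonzero variance assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def positively_correlated_pairs(expression, cutoff):
    """Return gene pairs whose correlation exceeds the cutoff."""
    genes = sorted(expression)
    pairs = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(expression[g], expression[h])
            if r > cutoff:  # negatively correlated pairs never pass a positive cutoff
                pairs.append((g, h, r))
    return pairs


# Toy expression profiles across four samples (illustrative values only).
expression = {
    "GeneA": [1.0, 2.0, 3.0, 4.0],
    "GeneB": [2.0, 4.0, 6.0, 8.5],
    "GeneC": [4.0, 3.0, 2.0, 1.0],
}
for g, h, r in positively_correlated_pairs(expression, cutoff=0.8):
    print(g, h, round(r, 3))
```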

Functional Analysis

g:Profiler (Reimand et al. 2016) was used to detect the common functions of the proteins in the subnetworks through Gene Ontology (GO), pathway, miRNA and transcription factor (TF) enrichment analyses. Each subnetwork was functionally analyzed separately in g:Profiler with the default correction type (g:Profiler correction) and p value cutoff of 0.05 for significance. As an alternative approach, GeneCards (Stelzer et al. 2016) and NCBI (Agarwala et al. 2016) were used to go through the functions of specific proteins manually when needed.
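Over-representation analyses of this kind are commonly based on a one-sided hypergeometric test. The sketch below assumes that test and uses toy counts; it does not reproduce g:Profiler's own multiple-testing correction, and the function name is hypothetical.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k): probability of seeing at least k annotated genes in a
    subnetwork of n genes, drawn from a background of N genes of which
    K carry the annotation."""
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


# Toy example: background of 10 genes, 3 annotated with a term,
# subnetwork of 2 genes, both annotated.
p = hypergeom_tail(k=2, n=2, K=3, N=10)
print(p)           # uncorrected p value for this single term
print(p < 0.05)    # compared against the 0.05 cutoff used in the text
```

In a real analysis one such p value is computed per term and then corrected for multiple testing before the 0.05 cutoff is applied.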

Results and Discussion

Subnetwork Discovery from Transcriptome Datasets

In this study, BioNet and KPM were used to discover subnetworks from the organism-specific PPI networks by mapping learning- and memory-related transcriptome data. The size of the discovered subnetworks is given in Table 1. The subnetworks are given in list format in Online Resource 1. These four subnetworks have proteins in common, as shown in Fig. 1. The GSE854 dataset has 106 proteins in common in BioNet and KPM results, and the KPM subnetwork is a subset of the BioNet subnetwork. The GSE5666 dataset has 145 proteins in common in BioNet and KPM results. All four subnetworks have only 18 proteins in common, and 229 proteins are subnetwork-specific, meaning that they appear only in one of the subnetworks (Fig. 1).
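The two quantities reported here, proteins common to all four subnetworks and subnetwork-specific proteins, are plain set operations. The following sketch uses toy membership lists with placeholder names (P1 to P6); it is not the actual content of Online Resource 1.

```python
from collections import Counter


def common_and_specific(subnetworks):
    """Return (proteins present in every subnetwork,
               proteins present in exactly one subnetwork)."""
    common = set.intersection(*subnetworks.values())
    counts = Counter(p for members in subnetworks.values() for p in members)
    specific = {p for p, c in counts.items() if c == 1}
    return common, specific


# Toy subnetworks for illustration only.
subnetworks = {
    "GSE854_BioNet": {"P1", "P2", "P3", "P4"},
    "GSE854_KPM": {"P1", "P2"},
    "GSE5666_BioNet": {"P1", "P5"},
    "GSE5666_KPM": {"P1", "P2", "P6"},
}
common, specific = common_and_specific(subnetworks)
print(sorted(common))
print(sorted(specific))
```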

Table 1 The size of the subnetworks created by BioNet and KPM algorithms for two datasets

Fig. 1 Intersection of four subnetworks identified from two different datasets by two different subnetwork discovery algorithms. Venn diagram representation was obtained from the Venny tool (https://bioinfogp.cnb.csic.es/tools/venny/).

The exact mechanism of aging and memory deficits has not yet been identified. However, several mechanisms related to learning and memory have been reported in the literature. Each subnetwork reported in Fig. 1 is enriched by several memory- and learning-related terms reported in the literature (Online Resource 2). Among the terms commonly observed in all identified subnetworks are gliogenesis, neurogenesis, central nervous system development, neuron and glial cell differentiations, cytokine production and stimuli, signaling, circadian rhythm and oxidative stress. Cytokines are proteins that are responsible for cell interactions and communications. They are known as messenger molecules; they bind to receptors and evoke biological activity. In the nervous system, they work for inflammatory responses. In case of nerve injury in neurons or any part of the nervous system, macrophages or microglia cells can migrate to the injured area, and they produce specific growth factors or cytokines that can be used for regeneration of nerve cells and foster neurogenesis, contributing to learning and memory mechanisms by creating new connections between neurons. Gliogenesis and neurogenesis are also related mechanisms in learning and memory processes. Neurogenesis produces neurons from neuronal stem cells, and these neurons are further used in memory and learning mechanisms. Gliogenesis is responsible for myelination and for the production of supporting glial cells, oligodendrocytes and astrocytes. These cells protect neurons from the environment, repair the damaged parts of neurons and create myelin sheaths for faster signal transmission (Rusznák et al. 2016). Myelin sheaths cover axons to protect them from damage and are responsible for safe and faster information transmission, also contributing to the formation of new connections (Dutta et al. 2013). Transmission of matter and signal occurs through intracellular and intercellular signaling mechanisms. Those mechanisms create signaling networks to transport stimuli and response and help neurons communicate with each other, contributing to learning and memory processes (Koseska and Bastiaens 2017). Circadian rhythm is a major molecular process that regulates physical events such as the cell cycle, feeding, body temperature, sleep–wake cycle and metabolism. This process also influences memory and learning performance in animals and cognitive performance in humans (Antoniadis et al. 2000). More specific terms regarding those mechanisms such as learning and memory, synaptic signaling, cognition, synaptic plasticity, aging, astrocyte differentiation and myelination also appeared in the enrichment results of some of the subnetworks.

Reconciliation of Between-Subnetwork Differences

Subnetwork analysis was performed by using two methods: BioNet (Beisser et al. 2010) and KeyPathwayMiner (Alcaraz et al. 2011). BioNet and KPM algorithms both use PPI and transcriptome data to discover significant and meaningful subnetworks, but they use different methods and different parameters. Therefore, these two methods have some advantages and disadvantages over each other. KPM allows users to choose the maximum number of nonsignificant genes allowed in the subnetwork (the K parameter), but p values are introduced to the algorithm in binarized format, which makes two significant but different p values the same. BioNet has only one parameter (FDR) and does not binarize p values; it weighs each node by using its p value. However, the number of nonsignificant genes allowed in discovered subnetworks cannot be specified by users. Therefore, we first analyzed BioNet modules in terms of nonsignificant genes and compared them with KPM modules, which had two nonsignificant genes as set by the K parameter. BioNet modules have a higher number of nonsignificant genes than KPM modules, with eight genes for the GSE854 dataset and 36 genes for the GSE5666 dataset (p value < 0.05). For a threshold of 0.01, the numbers are 89 and 49, respectively. The functional enrichment analysis results show that desirable subnetworks were created by both algorithms, but KPM results have more related functional terms about memory, learning and cognition (Online Resource 2).

The low overlap between the subnetworks derived from different datasets and different discovery algorithms (Fig. 1) led us to test the hypothesis that algorithm-specific or dataset-specific proteins can indeed have the same functions, reconciling the differences in the results. To this aim, we checked the functions of the proteins not common between the subnetworks in terms of learning and memory. Even in the same dataset, nearly 40–50% of the proteins differ between the BioNet and KPM subnetworks. In the GSE854 dataset, the KPM subnetwork is a subset of the BioNet subnetwork, that is, the BioNet subnetwork covers all the proteins identified by the KPM subnetwork. In the GSE5666 dataset, 145 proteins are common in both subnetworks, while 58 proteins are specific to the BioNet subnetwork, and 95 proteins are specific to the KPM subnetwork (Fig. 1). To understand the reason behind the difference between subnetworks identified by the two algorithms in terms of the number of not-shared proteins, functional enrichment analysis was performed for subnetwork-specific proteins using g:Profiler with a high p value cutoff (0.5) to capture all protein-function associations. Comparison of the functions of specific proteins reveals that 46 proteins from the BioNet subnetwork and 66 proteins from the KPM subnetwork in the GSE5666 dataset are different, but they have common functions (GO terms) such as system development, regulation of biological processes and structure development. Only 12 proteins from the BioNet subnetwork and 29 proteins from the KPM subnetwork are functionally different (Table 2). The analysis shows that the majority of the subnetwork-specific proteins indeed have the same functions. This means that one reason behind the discrepancy between the results of different algorithms is that different algorithms capture different proteins functioning in the same molecular processes.

Table 2 Analysis of proteins for the GSE5666 dataset between the subnetworks discovered by the two algorithms

Reconciliation of Between-Dataset Differences

Both transcriptomic datasets studied here aim to understand the effect of aging on learning and memory mechanisms by using Rattus norvegicus as a model organism, but there are some differences between the two studies. These studies used the same organism and same region of the brain (hippocampal CA1 region), but different platforms were used for transcriptome analysis. GSE5666 used the Affymetrix Rat Expression 230A Array, and GSE854 used the Affymetrix Rat Genome U34 Array. The number of genes that are captured by these chip designs is different. GSE854 dataset consists of 8800 probes, and 5748 of these probes are unique in terms of associated genes, but the GSE5666 dataset consists of 15,923 probes, and 11,860 of the probes are unique. The number of significantly changed genes is also different for both datasets for young–aged comparison. GSE854 has 407 (7% of data) significantly changed genes at threshold of p = 0.01, and GSE5666 has 924 (8% of data) significantly changed genes at the same threshold. Nearly the same percentage of data significantly changed across the two datasets. One other difference between the two datasets, which may be the reason behind the differences in the discovered subnetworks, is that the samples were taken immediately after the 5-day training in GSE5666, while they were taken 24 h after the 7-day training in GSE854.
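The statement that nearly the same percentage of each dataset changed significantly can be re-derived from the counts given above. The sketch below only repeats that arithmetic; the variable names are illustrative.

```python
# Counts taken from the text: unique genes per dataset and genes with p < 0.01.
unique_genes = {"GSE854": 5748, "GSE5666": 11860}
significant = {"GSE854": 407, "GSE5666": 924}

# Percentage of unique genes that changed significantly in each dataset.
percent_significant = {
    dataset: 100 * significant[dataset] / unique_genes[dataset]
    for dataset in unique_genes
}

for dataset, pct in percent_significant.items():
    print(f"{dataset}: {pct:.1f}% of unique genes significantly changed")
```

The two values round to 7% and 8%, matching the figures quoted in the text despite the roughly twofold difference in chip coverage.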

KPM and BioNet algorithms map transcriptome data on interactome data. For GSE854, 61% of the transcriptome data was mapped on the interactome data by KPM, and this corresponds to 37% of the proteins in the interactome. Nearly 3500 genes were used from the transcriptome data for subnetwork discovery analysis. For GSE5666, 72% of the transcriptome data and 58% of the interactome data was used. Nearly 8500 genes were used for subnetwork discovery analysis. The size of the subnetworks may vary due to the chips used in the experiments. Both datasets have 5368 genes in common (93% of the GSE854 dataset and 45% of the GSE5666 dataset). The size of the subnetworks affects the common functional property of the subnetwork since as the number of nodes increases, their common functions may change. Functional enrichment analysis was performed for the identified subnetworks in g:Profiler. In the subnetwork analysis, GSE854 gave more learning- and memory-related terms than GSE5666. They both have neuron, nervous system-related terms, cytokine production and regulation, immune system processes, neurogenesis and gliogenesis functional terms, but in GSE854 synaptic plasticity, learning and memory terms also explicitly exist in the enrichment analysis results. As a result, GSE854 gives more reliable subnetworks based on functional analysis at first sight.

We next checked the differences between the datasets in terms of the subnetworks created by BioNet and KPM. In BioNet, only 33 proteins were common between the datasets, and we checked the functions of the proteins that differed between the datasets (Fig. 1). The functional enrichment analysis results reveal that the number of functionally different proteins between the subnetworks derived from the two datasets is indeed much lower, since most of the proteins were associated with the same GO functional terms (Table 3, Online Resource 3). These different but functionally same proteins are related to the cytokine production and response, aging, nervous system development, gliogenesis, axon development, stress response, signaling and lipid response. In KPM, only 26 proteins are common in both datasets at first sight, and we compared the functions of dataset-specific proteins. These proteins were found to be commonly associated with cytokine production, oxidative stress, nervous system development, gliogenesis, postsynaptic density and neuron-to-neuron synapse functions, and the number of functionally same genes is indeed much higher, implying that the discovered subnetworks are functionally much more similar to each other (Table 3). With this analysis, we have shown that the subnetworks that were discovered from the different datasets and initially found to differ significantly have a high degree of functional similarity.

Table 3 Number of functionally same proteins between datasets

Critical Assessment of Functional Analysis Results

Functional enrichment analysis reveals that the four discovered subnetworks include memory- and learning-related proteins but some of the proteins did not seem functionally relevant, as reported in Tables 2 and 3 (labelled as functionally different proteins). We used two approaches to better understand this result; first, we focused on the less-known pathways, processes, miRNAs and transcription factors identified in our enrichment analysis results as significantly affected and checked if they were reported to be linked to memory-learning mechanisms in the literature. Then, we performed one-by-one investigation of the proteins in the subnetworks that were not relevant to learning- and memory-associated molecular mechanisms at first sight (Online Resource 3). Online Resource 3 provides the list of functionally common/different proteins for each subnetwork.

Kinase binding is one of the terms identified in the functional enrichment analysis that seems not relevant to learning and memory. There are some kinases known to be involved in memory and learning mechanisms such as calcium/calmodulin-dependent protein kinase II (CaMKII), extracellular-signal-regulated kinase (ERK1/2), protein kinase A (PKA), protein kinase G (PKG), etc. These kinases regulate synaptic transmission via changing ion channel densities or trigger protein synthesis, which affects synaptogenesis. Kinase binding and kinase activity are important to create, store and recall memory in the adult brain (Giese and Mizuno 2013). PKA is a serine–threonine kinase, which is known to form hippocampus-dependent memory via controlling synaptic plasticity (Abel and Nguyen 2008). miR-19b was identified to be enriched in the functional enrichment analysis. It is one of the important miRNAs for learning and memory mechanisms. miR-19a-b, miR-20a and miR-92a have functions in neurite remodeling and neurogenesis (Schiera et al. 2020). miR-19b targets the ADRB1 gene, and they have a role in memory stabilization (Volk et al. 2014).

There are some transcription factors (EGR1, EGR2, EGR3, SRF, WT1) in our functional enrichment analysis that seem unrelated to the learning and memory terms at first sight. We also checked whether these transcription factors have a function in controlling learning or memory mechanisms. Early growth response (EGR) factor family proteins increase long-term memory and synaptic plasticity with CREB (cAMP response element binding protein) and activating protein 1 (AP-1) (Alberini 2009). Serum response factor (SRF) is one of the important TFs that control long-term memory and synaptic plasticity. This TF is primarily expressed in neurons and targets EGR1 and EGR2 TFs (Alberini and Kandel 2015). The Zif268/EGR TF family is expressed during development and triggers memory formation (Veyrac et al. 2014). EGR1 and EGR3 TF families are related to the long-term potentiation and synaptic plasticity and are expressed in the nervous system, and EGR2 is expressed in Schwann cells to trigger myelination in the peripheral nervous system (Adams et al. 2017). There are five members in the EGR family, one of which is Wilms tumor 1 (WT1). Synaptic plasticity and memory flexibility are important mechanisms for learning and memory to regulate and control behavior. Any impairment in memory flexibility causes neurodegenerative disorders such as autism. WT1 is one of the important TFs responsible for regulating synaptic plasticity via decreasing memory strength. This TF is activated in the hippocampus, and its overexpression causes memory weakness (Mariottini et al. 2019).

The mitogen-activated protein kinase (MAPK) signaling pathway and some of the related transcription factors (E2F1, TEAD2, SP1) were detected in the enrichment analysis, and their relationships with the memory and learning mechanisms were checked from the literature. The MAPK signaling pathway is important for short- and long-term memory formation in early phases of development (Ribeiro et al. 2005). MAPK expression and activity increase after training to help store information in the brain and use it when necessary (Michel et al. 2011). MAPK/ERK signaling pathway inhibition causes learning and memory problems in many species, especially in rats (Miao et al. 2018). There are many transcription factors that regulate the MAPK signaling pathway, among which are TEA domain transcription factor 2 (TEAD2), E2F transcription factor 1 (E2F1) and Sp1 transcription factor (SP1) (Zellmer et al. 2010). SP1 is important in Alzheimer’s disease to respond to inflammatory signals. It influences memory performance, and the absence of this TF causes memory deficits and cognitive dysfunction in mice. SP1 also regulates neuronal survival genes (Citron et al. 2015). E2F1 mutation causes memory-related deficits in mice, and expression of genes targeted by E2F1 decreases with aging in mice and causes memory deficits (Ting et al. 2014). TEAD2 is related to neural development and neural tube closure at some stages of brain development, and it regulates paired box gene 3 (PAX3) gene expression, which is related to mammalian brain development (Kaneko et al. 2007).

As a complementary approach to the enrichment analysis, we searched one by one the proteins that are subnetwork-specific and seemingly irrelevant to the memory-learning mechanism to uncover their functions. Proteins identified as functionally different by BioNet and KPM between the two datasets (Table 3) were searched individually in NCBI and GeneCards. Some of these proteins are related to the brain, learning and cognition mechanism based on NCBI and GeneCards. Bcas1, brain-enriched myelin-associated protein 1, is a specific protein for the GSE5666 BioNet subnetwork. This protein is required for myelination in the brain and mostly expressed in Schwann cells and oligodendrocytes (Ishimoto et al. 2017). By knocking out the corresponding gene by point mutation and comparing the behavior of the knock-out mice with that of the control group, Ishimoto and colleagues showed that the lack of this protein causes hypomyelination, which brings about schizophrenia and anxiety-like behaviors in mice. Myelination is important for learning and memory mechanisms, so this protein is indeed relevant for our analysis (Ishimoto et al. 2017). In an RNA-seq-based independent analysis, Bcas1 was found to be enriched in oligodendrocytes (Sharma et al. 2015). Agrin is another GSE5666 BioNet-specific protein and related to the synapse development and regeneration based on Rat Genome Database (Smith et al. 2020). Also, agrin is important in synaptogenesis. It functions by organizing electrical and chemical transmission between neurons, and this transmission is important in memory mechanisms (Martin et al. 2005). Neural cell adhesion molecule 1 (NCAM1) is one of the GSE854 BioNet-specific proteins. Djordjevic and colleagues exposed rats to chronic stress and showed that mRNA expression of NCAM1 is increased in the hippocampus in stressed animals, which implies involvement of NCAM1 in stress-induced cognitive disturbance and synaptic instability. This protein is responsible for nervous system development, and it can act as a synaptic plasticity biomarker (Djordjevic et al. 2012). NCAM1 is also important in enhancing spatial learning and memory in rats (Knafo et al. 2012). Schizophrenia is a neurodevelopmental disorder with alterations in cognitive functions, and NCAM1 was also reported to be differentially methylated in schizophrenia patients in a genome-wide DNA methylation study (Viana et al. 2017). Probable global transcription activator SNF2L2 (SMARCA2) is another GSE854 BioNet-specific protein and responsible for the differentiation of neuronal stem cells to neurons in the brain based on a proteomic study (Lessard et al. 2007). ATP-binding cassette sub-family A member 2 (ABCA2) is a GSE5666 KPM-specific protein and mostly expressed in the brain and has a function in myelination (Tanaka et al. 2003). The authors showed through immunohistochemical analysis that ABCA2 expression level is specifically different in oligodendrocytes during brain development (Tanaka et al. 2003). ABCA2 is also related to Alzheimer’s disease, and overexpression of this gene upregulates the production of amyloid-beta and its precursor protein (Chen et al. 2004). Ferritin heavy chain 1 (FTH1) is another GSE5666 KPM-specific protein that is responsible for iron storage, and lack of this protein causes neurodegenerative disease in rats, because iron accumulation can cause toxicity in the brain (Finazzi and Arosio 2014). Nerve growth factor receptor (NGFR) is a GSE854 KPM-specific protein, and its levels were found to be lower in the blood samples of schizophrenia patients, implying a role of the protein in neuron differentiation and survival during development (Zakharyan et al. 2014).

Analysis of Co-expression-Based Interactions

Subnetwork discovery analysis led to four subnetworks from the two datasets. These subnetworks have common and specific proteins, as shown in Fig. 1. We showed in our previous analyses that the subnetwork-specific proteins, even if their names are different, are involved in the same cellular tasks. In this section, we aim to show that there is a correlation-based relationship between subnetwork-specific proteins and proteins shared by different subnetworks (common proteins). In other words, we here show that although the specific proteins were identified only in one subnetwork in the protein–protein interactome-based analysis, they are strongly co-expressed with the common proteins.

The two subnetwork discovery methods led to two subnetworks for each dataset. The four subnetworks were combined, and the combined network includes 471 genes and 829 unique interactions. These 471 genes were used to create a highly correlated network through WGCNA to identify co-expressed pairs. Of these genes, 467 are available in the GSE5666 dataset, while 327 are available in the GSE854 dataset. WGCNA was performed for the two datasets separately, and one module was created for each dataset. These two modules were combined under the name “WGCNA Network,” which has 138 genes and 564 interactions (Online Resource 4). In the WGCNA network, the edge score cutoff was set to 0.05, which corresponds to a Pearson correlation value of 0.85 (0.05 raised to the power of 1/18, the exponent given by the soft threshold). That is, only gene pairs showing a positive correlation higher than 0.85 were considered. We then combined the WGCNA network with the combined subnetworks, which had 471 genes and 829 interactions, and we refer to the final network as the “Memory-Learning Network.” The Memory-Learning Network includes 471 genes and 1389 interactions (Fig. 2) (Online Resource 4, Online Resource Fig. 1).
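The conversion between the edge score cutoff and the Pearson correlation described above can be checked with a short calculation. The following is a minimal sketch, assuming that the edge score equals the Pearson correlation raised to a soft-thresholding power of 18 (inferred from the 1/18 exponent) and that only positive correlations are kept. It does not use the WGCNA package. The function names, gene names and expression values are hypothetical and are not taken from GSE5666 or GSE854.

```python
from itertools import combinations
from math import sqrt

SOFT_POWER = 18     # assumed soft-thresholding power (from the 1/18 exponent)
EDGE_CUTOFF = 0.05  # edge score cutoff stated in the text


def cutoff_to_correlation(edge_cutoff, power):
    """Invert edge_score = correlation ** power to get the correlation cutoff."""
    return edge_cutoff ** (1.0 / power)


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpressed_pairs(expression, edge_cutoff=EDGE_CUTOFF, power=SOFT_POWER):
    """Keep gene pairs with a positive correlation whose edge score exceeds the cutoff."""
    pairs = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if r > 0 and r ** power > edge_cutoff:
            pairs.append((gene_a, gene_b, r))
    return pairs


min_r = cutoff_to_correlation(EDGE_CUTOFF, SOFT_POWER)
print(f"correlation cutoff ~ {min_r:.2f}")  # prints: correlation cutoff ~ 0.85

# Toy expression profiles (hypothetical genes, five samples each).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [1.1, 2.1, 2.9, 4.2, 5.1],  # closely tracks geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],  # negatively correlated with both
}
pairs = coexpressed_pairs(expression)
```

In this toy example only the geneA/geneB pair passes the filter, because the other two pairs are negatively correlated.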

Fig. 2

Memory-learning-related network. Green represents 49 specific proteins that are common between the WGCNA co-expression network and the list of subnetwork-specific proteins. Turquoise proteins are other subnetwork proteins discovered by KPM or BioNet subnetworks. Red interactions come from WGCNA co-expression analysis

Our major goal in incorporating co-expression information was to understand the relationship of the subnetwork-specific proteins with other proteins in the Memory-Learning Network. Therefore, we focused on 49 proteins in the Memory-Learning Network that are in the set of subnetwork-specific proteins and co-expressed with other proteins in the network. Fifteen of those proteins are specific to the GSE5666 KPM subnetwork, 16 to the GSE5666 BioNet subnetwork and 18 to the GSE854 BioNet subnetwork. The 49 specific proteins were found to be co-expressed with 89 proteins from the common protein set based on the WGCNA analysis. Complement C1Q B chain (C1Qb) is one of the specific proteins. It has a function in inflammatory/immunity processes, and its expression increases with aging (Qiu et al. 2016). Its co-expression with Cathepsin S (CTSS) was identified in both WGCNA networks obtained from the two datasets. Impairment of CTSS causes neurodegenerative and psychiatric diseases such as anxiety, stress-related impairments in the brain and major depressive disorder (Niemeyer et al. 2020).
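The selection described above amounts to a few set operations: label each protein as specific (found in one subnetwork) or common (found in several), keep the specific proteins that appear in the co-expression network, and collect their common co-expression partners. The following is a minimal sketch of that logic, not the authors' implementation. The subnetwork names follow the text, but the protein identifiers (P1 to P6), their memberships, the edges and the function names are hypothetical placeholders.

```python
from collections import Counter


def split_specific_and_common(subnetworks):
    """Split proteins into those found in one subnetwork and those found in several."""
    counts = Counter(p for members in subnetworks.values() for p in set(members))
    specific = {p for p, c in counts.items() if c == 1}
    common = {p for p, c in counts.items() if c > 1}
    return specific, common


def specific_with_coexpression(specific, coexpr_edges):
    """Specific proteins that take part in at least one co-expression edge."""
    in_coexpr = {p for edge in coexpr_edges for p in edge}
    return specific & in_coexpr


def common_partners(selected, common, coexpr_edges):
    """Common proteins co-expressed with at least one selected specific protein."""
    partners = set()
    for a, b in coexpr_edges:
        if a in selected and b in common:
            partners.add(b)
        if b in selected and a in common:
            partners.add(a)
    return partners


# Hypothetical memberships for the four subnetworks.
subnetworks = {
    "GSE5666 KPM": {"P1", "P2", "P5"},
    "GSE5666 BioNet": {"P2", "P3", "P5"},
    "GSE854 KPM": {"P2", "P6"},
    "GSE854 BioNet": {"P4", "P5"},
}
# Hypothetical undirected co-expression edges.
coexpr_edges = [("P1", "P2"), ("P3", "P4"), ("P5", "P2")]

specific, common = split_specific_and_common(subnetworks)
selected = specific_with_coexpression(specific, coexpr_edges)
partners = common_partners(selected, common, coexpr_edges)
```

Here P6 is specific but has no co-expression edge, so it is dropped. The edge between P3 and P4 illustrates a link between specific proteins from two different subnetworks.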

Figure 3 shows how WGCNA identifies co-expression-based interactions between the specific proteins and the proteins identified by multiple subnetworks (common proteins). The 49 proteins belong to three different subnetworks (GSE854 BioNet, GSE5666 KPM and GSE5666 BioNet subnetworks), and they do not interact based on the subnetwork discovery algorithms. However, the WGCNA approach reveals that specific proteins from a subnetwork can be co-expressed with the specific proteins from other subnetworks or with the common subnetwork proteins. Peptidyl arginine deiminase 2 (PADI2) is one of these proteins. It is specific to the GSE854 BioNet subnetwork, and it is co-expressed with C1Qb, which is specific to the GSE5666 KPM subnetwork. PADI2 is related to myelination and has a role in the onset of neurodegenerative diseases such as multiple sclerosis. The overexpression of this protein in transgenic mice was shown to cause demyelination in brain tissue (Musse et al. 2008). In another study, ATAC-seq was used to show PADI2-driven inhibition of oligodendrocyte differentiation, and the protein was shown to interact with several myelin proteins via a pull-down assay (Falcão et al. 2019). The C1Qb protein has co-expression-based interactions with the C4A and FCGR2B proteins, which are specific to the GSE5666 BioNet subnetwork. Complement component 4 (C4A) protein is found in human dendrites, cell bodies and neuronal synapses, and an increase in its expression together with C4B is related to schizophrenia (Sekar et al. 2016). FCGR2B (Fc fragment of IgG receptor IIb) is related to Alzheimer’s disease via increasing amyloid-β toxicity in the brain, and knocking out this gene increases amyloid-β resistance (Kam et al. 2013). C1Qb also has co-expression interactions with common proteins (e.g. CTSS, ANXA3, LAMP2, B2M). Annexin A3 (ANXA3) is upregulated in the brain in nerve injury or post-ischemic conditions (Kessler et al. 2008; Konishi et al. 2006).
Lysosome-associated membrane protein 2 (LAMP2, GSE854 BioNet KPM-specific) is a Parkinson’s disease (PD)-related gene, and its expression significantly decreases in PD (Wu et al. 2011). Beta-2 microglobulin (B2M, a common protein) is also related to PD, and its expression increases in dopaminergic striatal regions in the brain (Mogi et al. 1995). Therefore, the WGCNA-derived co-expression network shows that although these proteins were identified by different datasets/algorithms and do not interact in the discovered subnetworks, they are co-expressed. Prostaglandin D2 synthase (PTGDS) is also one of the 49 proteins, and it is specific to the GSE5666 KPM subnetwork. It has three co-expression interactions: plasmolipin (PLLP) (common protein), crystallin alpha B (CRYAB) (common protein) and cysteine- and glycine-rich protein 1 (CSRP1) (specific protein). CSRP1 has a function in neuronal development and maintenance (Hetmańczyk-Sawicka et al. 2020). In microarray experiments, the authors identified a decrease in the mRNA expression of the CSRP1 gene in patients with Niemann-Pick C disease, a neurodegenerative disease, and this decrease was additionally confirmed by quantitative real-time PCR (Hetmańczyk-Sawicka et al. 2020). PLLP is related to myelination, and its expression decreases in the brain in chronic social stress (Cathomas et al. 2019). Co-expression analysis links PTGDS with the CSRP1 protein, which is specific to the GSE5666 BioNet subnetwork, and with the PLLP protein, which is specific to the GSE854 subnetworks, bridging the two proteins identified by different datasets. Protein phosphatase 3 catalytic subunit alpha (PPP3CA), a GSE854 BioNet-specific protein, is another of the 49 specific proteins, and it is co-expressed with ribosomal protein L21 (RPL21) (common protein). PPP3CA is one of the long-term potentiation genes, and it is related to the synaptic plasticity mechanism in immature rats (Göl et al. 2019).
Fumarylacetoacetate hydrolase (FAH) is another of the 49 specific proteins. It is primarily expressed in white matter in the brain, and mutation of this gene decreases visual/spatial learning performance in mice (Hillgartner et al. 2016). FAH is specific to the GSE5666 BioNet subnetwork and has co-expression-based interactions with GSE5666 KPM-specific proteins (FIS1, C4B, C1Qa), GSE854 BioNet-specific proteins (SRSF5, IRF1 and EDF1) and many common proteins such as B2M, CTSS and CDH1. FIS1 is known as mitochondrial fission 1 protein, and its overexpression causes abnormal mitochondrial functions and triggers Alzheimer’s disease (Wang et al. 2009). Interferon regulatory factor 1 (IRF1) protein expression increases in traumatic brain injury (Rao et al. 2003), and it was reported to be associated with programmed cell death and inflammation (Yanai and Taniguchi 2008). Another study reported that IRF1 is downregulated by miR130b, leading to suppression of cell apoptosis in cerebral ischemia/reperfusion, a condition that accelerates neurodegeneration (Liu et al. 2020). The cadherin 1 (CDH1) protein has a protective effect in the hippocampus and increases neuroplasticity in ischemia (Zhang et al. 2019). WGCNA-based results show that many subnetwork-specific proteins (Fig. 1) are co-expressed with each other or with the common proteins in the combined subnetwork, implying that although they were classified as functionally different proteins (Table 3), these proteins are indeed linked to the proteins with memory-learning-related functions.

Fig. 3

Co-expression of 49 subnetwork-specific proteins with common and specific proteins. Non-colored proteins are proteins captured in more than one subnetwork; green shows GSE5666 BioNet-specific proteins, blue shows GSE5666 KPM-specific proteins and pink shows GSE854 BioNet-specific proteins. Red interactions represent the co-expression interaction of 49 specific proteins with each other. This figure shows how 49 WGCNA proteins are co-expressed with each other and with the commonly identified subnetwork proteins

Novelty of Network-based Data Analysis Over the Traditional Analysis

Several experimental designs were presented in the literature to elucidate the effect of aging on learning and memory performance. Rattus norvegicus is commonly used as a model organism in these studies, and animals from different age groups are trained with SWM, OMT or other memory tests. At the end of the training, the hippocampal region of the animals’ brains was extracted, and transcriptome data was collected. In these studies, basic statistical analyses (Student’s t test, ANOVA) were performed with transcriptome data, differentially expressed genes were identified and functional enrichment analysis was performed on significantly changed genes to detect the underlying mechanisms of the effect of aging on memory. In this study, these analyses were expanded with network-based approaches. Two main network-based approaches were used: subnetwork discovery, which maps transcriptome data onto the organism-specific PPI network to find subnetworks, and network inference, which predicts co-expression-based interactions by using Pearson correlation. Network-based approaches are important to understand topological relationships of genes and their functions in specific conditions. Not only significant genes but also experimentally proven interactions/relationships of genes were used in this study to understand the effect of aging on memory deficits. Functional analyses of the discovered subnetworks showed that these modules include cognition-, learning- and memory-related terms. Some of these terms were already identified in the original studies that reported those transcriptome data, without incorporating networks into the analysis (Blalock et al. 2003; Rowe et al. 2007). These terms are nervous system development, immune system processes, signal transduction, axonal growth, myelinogenesis, cytokine production and regulation.
In the network-based analysis, terms directly relevant to learning and memory (learning, memory, synaptic plasticity and circadian rhythm) were clearly detected. Our analysis and the two studies reporting the transcriptome datasets analyzed here (Blalock et al. 2003; Rowe et al. 2007) share some genes and processes and differ in others. The shared genes and processes are cAMP/protein kinase A (PKA)-related signaling, EGR1 transcription factor, FTH1, CTSS, C1Qb, FAH, ANXA3, LAMP2, FCGR2B, B2M, PTGDS, RPL21 and Agrin genes, cytokine metabolism, neurogenesis and myelination. On the other hand, our network-based analysis discovered memory-related genes (BCAS1, NCAM1, SMARCA2, ABCA2, NGFR, PADI2, CSRP1, PLLP, PPP3CA, FIS1, IRF1, C4A-B, CRYAB and CDH1), transcription factors (EGR 2,3, WT1, TEAD2, SP1, E2F1, CaMKII, ERK1/2, PKG) and a miRNA (Mir-19b), which were not captured in the original analyses that solely focused on statistical analysis of the data.

Conclusions

Learning and memory processes have not yet been fully resolved at a molecular scale. In the literature, there are related transcriptome data collected from rats. The studies reporting these data conducted various statistical analyses aimed at elucidating memory and learning mechanisms but ignored interactome data. In this study, we mapped the learning- and memory-related transcriptome data onto organism-specific interactome data and determined subnetworks that include significantly altered and interacting protein pairs in the young–aged comparison of trained rats. The subnetworks were generated for two different datasets by using two different algorithms to document the effect of dataset and algorithm differences on the results. Our analysis showed that within the same dataset, different subnetwork discovery algorithms create different subnetworks, but these subnetworks are enriched with proteins with common functions. Also, with the same subnetwork discovery algorithm, different datasets create different subnetworks. However, we showed here that they also have common functions in terms of memory-learning mechanisms. In addition, functionally different proteins were searched in NCBI and GeneCards. With a detailed functional examination, we have shown that proteins whose memory-learning association could not be detected by functional analysis are indeed related to memory-learning and the brain. Our analysis of co-expressed gene pairs within the combined subnetwork using the WGCNA algorithm further supported our findings, since high co-expression between the subnetwork-specific/dataset-specific and common proteins showed that even if the specific proteins do not interact physically across different subnetworks/datasets, they are positively correlated based on the WGCNA analysis.
Based on those analyses, we created the Memory-Learning Network, which combines organism-specific interactome data, significantly changed genes and positive correlations between proteins by using two different subnetwork discovery algorithms as well as a network inference algorithm. We believe that the network-based approach presented here gives novel insights into extracting memory- and learning-related molecular mechanisms from transcriptome data.