Identification of key gene modules and pathways of human breast cancer by co-expression analysis

Zhao, Qingnan; Song, Wenqing; He, Dai yu; Li, YanSong

doi:10.1007/s12282-017-0817-5


Original Article
Published: 23 November 2017

Volume 25, pages 213–223, (2018)

Journal: Breast Cancer


Qingnan Zhao¹, Wenqing Song², Dai yu He² & YanSong Li²


Abstract

Background

Breast cancer is the most common and aggressive tumor affecting women worldwide. Although gene expression analyses have been performed previously, a systematic co-expression analysis of this cancer is still lacking. We attempted to identify the critical co-expression modules of breast cancer.

Methods

Co-expression modules were constructed with WGCNA, and the interactions among them were analyzed in R. The biological processes and pathways of the co-expressed genes were identified by GO and KEGG functional enrichment analysis using the DAVID database.

Results

Expression data for 4,000 genes from 136 breast cancer samples were used to construct co-expression modules, and nine modules were identified. Interaction analysis indicated that the modules were largely independent of one another, and the adjacency degree differed markedly between module pairs. GO and KEGG enrichment analysis identified the modules enriched for immune response and ubiquitin-mediated proteolysis as the most critical modules of breast cancer.

Conclusion

Our results indicate that immune response and ubiquitin-mediated proteolysis could serve as prognostic and predictive markers for the occurrence of breast cancer, providing evidence for further analysis of its prognosis and treatment.


Introduction

Breast cancer is the most common and aggressive tumor in women and causes great physical and mental harm [1]. The disease largely affects women in their 40s to 60s, and women around the time of menopause are more prone to be affected. It is now the second most common cancer, just after lung cancer, and the principal cause of cancer death among women in both developing and developed countries [2]. However, the critical pathways involved in the occurrence and development of breast cancer, and the interactions among them, remain largely unknown. To date, early diagnosis is still the key to improving the curative effect of clinical treatment [3, 4]. In this study, we therefore aimed to explore the molecular mechanisms underlying the development of breast cancer and thus provide evidence for further research.

Weighted Gene Co-expression Network Analysis (WGCNA) is a method frequently used to analyze correlations among co-expression modules in microarray samples [5]. It is also a comprehensive collection of R functions covering various aspects of weighted correlation network analysis. It is widely used across biological fields, including cancer, genetics, and brain imaging data analysis [6], and is helpful for identifying candidate biomarkers or therapeutic targets. It can help not only in comparing differentially expressed genes, but also in working out the interactions among genes in different co-expression modules [7]. It has been reported that WGCNA has been applied to publicly available microarray data covering genes on a genome-wide scale, and that it is a promising and reliable tool for the clinical diagnosis of breast cancer. In this study, WGCNA identified a total of nine co-expression modules of genes with high topological overlap.

The Kyoto Encyclopedia of Genes and Genomes (KEGG) [8] is a bioinformatics resource for understanding the high-level functions and utilities of biological systems, such as the cell, the organism and the ecosystem, and is widely used in mechanistic research. The KEGG analysis in this study showed that the enriched pathway hsa04120 (ubiquitin-mediated proteolysis) in co-expression module nine was particularly meaningful for the occurrence of breast cancer. We hope our study will contribute to biomarker discovery for the clinical diagnosis of breast cancer.

Materials and methods

Expression value analysis of microarray data of breast cancer samples

Probe values were downloaded from the GEO database of NCBI (https://www.ncbi.nlm.nih.gov/geo/) using the keyword “breast cancer”. The annotation information of the microarray data was used to match probes with the corresponding genes. Probes matching more than one gene were eliminated, and for genes matching more than one probe the average expression value was calculated. The number of genes retained was calculated at different expression thresholds in order to determine an appropriate threshold. The WGCNA algorithm was used to evaluate the expression values of the genes. In addition, the flashClust package in R [9] was used to perform the cluster analysis of samples at the chosen threshold.
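The probe-to-gene step described above can be sketched as follows. This is a minimal illustration in Python rather than the authors' R workflow; the function name `collapse_probes` and the input layout (one list of per-sample values per probe, one list of gene symbols per probe) are assumptions made for the example.

```python
from collections import defaultdict

def collapse_probes(probe_values, probe_to_genes):
    """Collapse probe-level values to gene-level values.

    probe_values:   {probe_id: [value per sample]}        (hypothetical layout)
    probe_to_genes: {probe_id: [gene symbols it maps to]}  (hypothetical layout)
    """
    per_gene = defaultdict(list)
    for probe, values in probe_values.items():
        genes = probe_to_genes.get(probe, [])
        # Discard probes that are unannotated or match more than one gene.
        if len(genes) != 1:
            continue
        per_gene[genes[0]].append(values)

    collapsed = {}
    for gene, rows in per_gene.items():
        # Average across all probes of the gene, sample by sample.
        collapsed[gene] = [sum(col) / len(col) for col in zip(*rows)]
    return collapsed

probe_values = {
    "p1": [1.0, 3.0],
    "p2": [3.0, 5.0],
    "p3": [9.0, 9.0],
    "p4": [2.0, 2.0],
}
probe_to_genes = {"p1": ["G1"], "p2": ["G1"], "p3": ["G2", "G3"], "p4": ["G4"]}
gene_values = collapse_probes(probe_values, probe_to_genes)
```

In this toy input, probe `p3` is dropped because it maps to two genes, and `G1` receives the mean of its two probes.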

Analysis of co-expression modules of breast cancer

Candidate power values were screened with the WGCNA algorithm [5] during the construction of the co-expression modules. Scale independence and average connectivity were evaluated for each power value in a gradient test (power values ranging from 1 to 20). The power value was chosen as the one at which the scale independence reached 0.8. The WGCNA algorithm was then used to construct the co-expression modules and to extract the genes in each module. The minimum module size was set to 50 genes to ensure reliable results.
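The role of the power value can be illustrated with a short sketch. It assumes an unsigned network, in which the adjacency of two genes is the absolute Pearson correlation raised to the power; the text does not state which network type was used. The function names and the fit values passed to `choose_power` are hypothetical, and the scale-free fit itself (the scale independence value) is taken as given rather than computed here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def soft_threshold_adjacency(expr, power):
    """Unsigned soft-threshold adjacency: |cor(i, j)| ** power (assumed form)."""
    genes = list(expr)
    adj = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            a = abs(pearson(expr[g], expr[h])) ** power
            adj[g][h] = a
            adj[h][g] = a
    return adj

def connectivity(adj):
    """Connectivity of each gene: sum of its adjacencies to all other genes."""
    return {g: sum(neighbours.values()) for g, neighbours in adj.items()}

def choose_power(fit_by_power, threshold=0.8):
    """Smallest power whose scale independence reaches the threshold, else None."""
    for power in sorted(fit_by_power):
        if fit_by_power[power] >= threshold:
            return power
    return None

# Toy expression profiles: every pair is perfectly (anti-)correlated.
expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
k = connectivity(soft_threshold_adjacency(expr, power=8))

# Hypothetical scale independence values, for illustration only.
chosen = choose_power({6: 0.71, 7: 0.78, 8: 0.81, 9: 0.85})
```

Raising correlations to a higher power suppresses weak correlations more strongly than strong ones, which is why mean connectivity falls as the power increases and why the lowest power meeting the threshold is usually preferred.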

Interaction analysis of co-expression modules of breast cancer

The WGCNA algorithm was used to analyze the interactions among the co-expression modules. The heatmap tools in R were used to display the strength of each relationship (strong or weak).

Functional annotation analysis of co-expression genes of breast cancer

The co-expression modules were ranked from largest to smallest by number of genes, and functional enrichment analysis was then performed on the genes in each module. The gene information was mapped to the DAVID database (https://david.ncifcrf.gov/summary.jsp) [10], and Gene Ontology (GO) [11] and KEGG [8, 12] enrichment analyses were performed to obtain the enriched biological processes and metabolic pathways. A significance threshold of P < 0.05 was applied. If a module had more than five significant records, the top five were selected for further analysis.
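The selection rule above can be sketched as a small filter. The text does not say how the "top five" were ranked, so ordering by smallest P value is an assumption here, and the record layout, the function name `top_enriched` and the P values are all hypothetical.

```python
def top_enriched(records, alpha=0.05, limit=5):
    """Keep records with P < alpha and return at most `limit` of them,
    ordered by ascending P value (assumed ranking criterion)."""
    significant = [r for r in records if r["p_value"] < alpha]
    significant.sort(key=lambda r: r["p_value"])
    return significant[:limit]

# Hypothetical enrichment records for one module.
records = [
    {"term": "term_a", "p_value": 0.030},
    {"term": "term_b", "p_value": 0.001},
    {"term": "term_c", "p_value": 0.200},  # fails P < 0.05
    {"term": "term_d", "p_value": 0.010},
    {"term": "term_e", "p_value": 0.040},
    {"term": "term_f", "p_value": 0.002},
    {"term": "term_g", "p_value": 0.020},
]
selected = [r["term"] for r in top_enriched(records)]
```

Six of the seven toy records pass the threshold, so the weakest of them (`term_e`) is cut by the top-five limit.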

Results

Expression values analysis of microarray data of breast cancer

A total of 136 typical breast cancer samples were obtained from NCBI GEO under accession number GSE12903 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12903) [13]. The platform was GPL96 ([HG-U133A] Affymetrix Human Genome U133A Array), and the sample accessions ranged from GSM305129 to GSM30526. This dataset was comparatively large and recent. The 136 tumors came from breast cancer patients who had received adjuvant tamoxifen therapy only. The sources of the frozen tumor specimens and the clinical information for the patients are listed in Table 1 [13]. The raw microarray data were converted to gene-level expression values. Probes matching more than one gene were eliminated, and for genes matching more than one probe the average across probes was taken as the final expression value of the gene. In addition, genes with negative values were eliminated. As a result, expression values for a total of 12,389 genes were obtained. The 4000 genes with the highest average expression values were then selected for cluster analysis with the flashClust package used alongside WGCNA (Fig. 1). As shown in Fig. 1, the 136 breast cancer samples fell into two main clusters. Cluster I contained two samples (GSM305262 and GSM305263), and Cluster II contained the remaining 134 samples, which could be further divided into Sub-Cluster I (124 samples) and Sub-Cluster II (10 samples).
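The gene filtering described in this paragraph can be sketched as follows. The sketch assumes that a gene is eliminated if any of its sample values is negative, which is one reading of the text; the function name `select_top_genes` and the toy values are hypothetical.

```python
def select_top_genes(expr, n_top):
    """Drop genes with any negative value, then return the names of the
    n_top genes with the highest mean expression, highest first."""
    kept = {g: v for g, v in expr.items() if all(x >= 0 for x in v)}
    ranked = sorted(kept, key=lambda g: sum(kept[g]) / len(kept[g]), reverse=True)
    return ranked[:n_top]

# Toy gene-level expression values (two samples per gene).
expr = {
    "G1": [5.0, 7.0],
    "G2": [-1.0, 20.0],  # removed: contains a negative value
    "G3": [1.0, 2.0],
    "G4": [9.0, 9.0],
}
top = select_top_genes(expr, n_top=2)
```

In the study the same idea is applied with `n_top` equal to 4000 on the 12,389 retained genes.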

Table 1 Tumor characteristics for breast cancer patients in the present study


Construction of co-expression module of breast cancer

Co-expression modules were constructed from the expression values of 4000 genes in 136 breast cancer samples using the WGCNA algorithm. The power value is one of the most critical parameters in this process, because it mainly affects the scale independence and average connectivity of the co-expression modules. We first screened for an appropriate power value. At a power value of 8, the scale independence reached 0.8 (Fig. 2a) while the average connectivity remained relatively high (Fig. 2b). A power value of 8 was therefore used for further analysis. The 4000 genes with the highest expression values in the 136 breast cancer samples were used to construct the co-expression modules (Fig. 2c). As a result, a total of nine co-expression modules were constructed with the selected power value (8), each shown in a different color. The modules were numbered from largest to smallest by number of genes. There were 996 genes in module 1 (gray), 607 genes in module 2 (turquoise), 563 genes in module 3 (blue), 553 genes in module 4 (brown), 403 genes in module 5 (yellow), 371 genes in module 6 (green), 305 genes in module 7 (red), 120 genes in module 8 (black) and 82 genes in module 9 (pink). The average number of genes across the nine modules was 444. The module to which each gene belongs is listed in supplementary Table 2.
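As a quick arithmetic check of the module sizes reported above, the nine counts can be summed and averaged. The sketch takes the pink module to be module 9, as the size ordering implies.

```python
# Module sizes as reported in the text (pink taken as module 9).
module_sizes = {
    1: 996,  # gray
    2: 607,  # turquoise
    3: 563,  # blue
    4: 553,  # brown
    5: 403,  # yellow
    6: 371,  # green
    7: 305,  # red
    8: 120,  # black
    9: 82,   # pink
}

total_genes = sum(module_sizes.values())
mean_size = total_genes / len(module_sizes)
smallest = min(module_sizes.values())
```

The sizes account for all 4000 input genes, the mean rounds to the reported 444, and the smallest module is above the minimum of 50 genes set in the methods.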

Table 2 GO enrichment analysis of genes in the co-expression module


Interaction relationship among co-expression modules of genes

The interactions among the nine co-expression modules were analyzed further (Fig. 3). Overall, the interaction pattern showed no obvious differences, indicating that genes in each module were expressed relatively independently and that the modules had a high degree of scale independence. In addition, the connectivity of the module eigengenes was analyzed to better understand the interactions among the constructed modules. Cluster analysis of the eigengenes (Fig. 4a) grouped the nine modules into two clusters, one containing six modules (modules 1, 3, 5, 7, 8 and 9) and the other containing three modules (modules 2, 4 and 6). Furthermore, the connectivity degree differed markedly between modules. Apart from the self-comparisons, which had the highest adjacency, three module pairs showed a much higher adjacency degree than the rest: modules 2 and 6, modules 3 and 5, and modules 7 and 8.
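For orientation, the adjacency between two module eigengenes is conventionally defined in WGCNA eigengene networks as a rescaled correlation. The formula below is that standard definition; that the adjacency values discussed here were computed in exactly this form is an assumption, since the text does not give the formula. Here $E_I$ and $E_J$ denote the eigengenes (first principal components) of modules $I$ and $J$.

```latex
A_{IJ} = \frac{1 + \operatorname{cor}(E_I, E_J)}{2},
\qquad 0 \le A_{IJ} \le 1,
\qquad A_{II} = 1
```

Under this definition every self-comparison has adjacency 1, the maximum possible value, so informative comparisons are those between distinct modules.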

Functional enrichment analysis of critical modules

GO and KEGG enrichment analysis was performed on the genes in the nine constructed modules. The biological process analysis showed that the enriched functions differed considerably between modules. The enriched GO terms in module 1 mainly concerned cell adhesion, cell division and DNA replication, including GO:0098609 (cell–cell adhesion), GO:0051301 (cell division) and GO:0006260 (DNA replication). The GO terms in module 2 were mainly enriched in the splicing and regulation of mRNA, including GO:0000398 (mRNA splicing, via spliceosome) and GO:0043488 (regulation of mRNA stability). Genes in module 3 were similar to those in module 2 and were mainly enriched in mRNA splicing, including GO:0000398 (mRNA splicing, via spliceosome) and GO:0008380 (RNA splicing). Genes in module 4 were significantly enriched in rRNA processing and translation initiation, including GO:0006364 (rRNA processing) and GO:0006413 (translational initiation). Genes in module 5 were mainly enriched in mitochondrial processes associated with energy supply, including GO:0006120 (mitochondrial electron transport, NADH to ubiquinone). Modules 6 and 7 were similar to module 1 and were mainly enriched in GO:0098609 (cell–cell adhesion). Module 8 was mainly enriched in immune and defense responses, including GO:0006955 (immune response), GO:0006954 (inflammatory response) and GO:0051607 (defense response to virus). In module 9, genes were mainly enriched in protein ubiquitination and destabilization, including GO:0031648 (protein destabilization) and GO:0016925 (protein sumoylation).

The results of the KEGG enrichment analysis of the genes in the nine modules are shown in Fig. 5. Significantly enriched metabolic pathways were found in every module, and the degree of enrichment differed considerably: it was highest in module 8 and lowest in module 1. The KEGG results are listed in Table 3. Genes in module 1 were mainly enriched in hsa01100 (metabolic pathways) and hsa04110 (cell cycle). Genes in module 2 were mainly enriched in hsa03040 (spliceosome) and hsa00190 (oxidative phosphorylation). Genes in module 3 were mainly enriched in splicing and antibiotic synthesis pathways, including hsa03040 (spliceosome) and hsa01130 (biosynthesis of antibiotics). Genes in module 4 were mainly enriched in the hsa03010 (ribosome) and hsa03040 (spliceosome) pathways. Genes in module 5 were mainly enriched in hsa00190 (oxidative phosphorylation), while genes in module 6 were mainly enriched in hsa04141 (protein processing in endoplasmic reticulum) and hsa01130 (biosynthesis of antibiotics). Genes in module 7 were mainly enriched in the hsa04512 (ECM-receptor interaction) and hsa04510 (focal adhesion) pathways. Genes in module 8 were mainly enriched in the hsa04612 (antigen processing and presentation) and hsa04145 (phagosome) pathways, in accordance with the GO result for the biological process of immune response. Genes in module 9 were mainly enriched in the biological process of immune response pathways.

Table 3 KEGG enrichment analysis of genes in the co-expression modules


Discussion

Breast cancer is the second most common tumor worldwide and especially affects women around the time of menopause. It is also one of the principal causes of death among cancer patients [14]. At present there is still no fully effective treatment for patients with breast cancer, and the most effective measure against the disease is prevention [3]. Moreover, patients at the same stage of disease can differ considerably in treatment response and overall outcome, which complicates the situation and makes research on prognostic or predictive markers of breast cancer more urgent. In this study, we aimed to identify critical biomarkers for a better understanding of the molecular mechanism, which could then be applied to the diagnosis or treatment of breast cancer. Co-expression patterns in breast cancer and matched normal tissues were examined by WGCNA, a powerful method for extracting co-expressed groups of genes from large expression data sets. As a result, a total of nine co-expression modules were identified by WGCNA in the training dataset GSE12903 from NCBI. In addition, the critical co-expression modules and the genes they contain were identified by GO and KEGG functional enrichment analysis. Early studies on breast cancer relied mostly on gene expression profiles, which have some disadvantages. Although genome-wide gene expression datasets for breast cancer are available and offer opportunities for translational advances and personalized medicine, challenges remain in data analysis. For example, the results of a differential expression analysis may not agree with those obtained on a different platform, which makes the results unreliable.

The WGCNA approach, however, can largely avoid this disadvantage because it performs well across all types of data and focuses on groups of gene modules rather than individual genes. In addition, it does not rely on prior assumptions about genes or covariates. WGCNA can therefore avoid biologically incorrect assumptions about the independence of gene expression levels, since it transforms gene expression profiles into functional co-expressed gene modules. To date, WGCNA has been applied to many types of cancer, such as lung cancer, brain cancer, and breast cancer. In this study, we found that the genes in two co-expression modules, module 8 and module 9, played an essential role in immune response and ubiquitin-mediated proteolysis, and these two modules were recognized as the most important modules in the occurrence of breast cancer. GO analysis showed that genes in module 8 were mainly involved in immune, inflammatory, and defense responses. Similarly, genes in module 9 played important roles in protein-related processes such as ubiquitin-mediated proteolysis, protein destabilization, and protein sumoylation. Furthermore, KEGG analysis revealed that module 8 was mainly enriched in the hsa01130 (biosynthesis of antibiotics) and hsa00190 (oxidative phosphorylation) pathways. Most co-expression modules were closely associated with immune reaction and ubiquitin-mediated proteolysis, and these two pathways were regarded as potential biomarkers for mechanistic studies of breast cancer. The enriched pathway hsa04120 (ubiquitin-mediated proteolysis) was recognized as the most critical prognostic marker in the occurrence of breast cancer.

Two other pathways, hsa01130 (biosynthesis of antibiotics) and hsa00190 (oxidative phosphorylation), were enriched in more than one co-expression module and were also closely associated with ubiquitin-mediated proteolysis. Taken together, these results give us reason to believe that the enriched pathways can function as biomarkers in the diagnosis of breast cancer. Cell proliferation has been reported to correlate with relapse rate in pre- and postmenopausal women with breast cancer [15], and women around this period experience changes in hormone levels in vivo. Ubiquitin-mediated proteolysis is closely associated with the protein synthesis required for cell proliferation and hormone synthesis. For example, estrogen and progesterone, the two main hormones of the menopausal period, are markedly affected in women with breast cancer [16, 17]. Together with the protein components involved, this strengthens the case for the ubiquitin-mediated proteolysis pathway as a critical biomarker, although further investigation is required.

In summary, our study used the systems biology-based WGCNA approach to construct co-expression modules that play a critical role in breast cancer. The ubiquitin-mediated proteolysis pathway, significantly enriched in module 8 and module 9, could function as a prognostic and predictive marker in the clinical management of breast cancer.

References

1. Berg JW, Robbins G. Factors influencing short and long term survival of breast cancer patients. Surg Gynecol Obstet. 1966;122:1311.
2. Adair F, Berg J, Joubert L, Robbins GF. Long-term followup of breast cancer patients: the 30-year report. Cancer. 1974;33:1145–50.
3. Saez RA, McGuire WL, Clark GM. Prognostic factors in breast cancer. Semin Surg Oncol. 1989;5:102–10.
4. Bloom H, Richardson W. Histological grading and prognosis in breast cancer: a study of 1409 cases of which 359 have been followed for 15 years. Br J Cancer. 1957;11:359.
5. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 2008;9:559.
6. Ivliev AE, AC’t Hoen P, Sergeeva MG. Coexpression network analysis identifies transcriptional modules related to proastrocytic differentiation and sprouty signaling in glioma. Cancer Res. 2010;70:10060–70.
7. Clarke C, Madden SF, Doolan P, Aherne ST, Joyce H, O’Driscoll L, et al. Correlating transcriptional networks to breast cancer survival: a large-scale coexpression analysis. Carcinogenesis. 2013;34:2300–8.
8. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30.
9. Ihaka R, Gentleman R. R: a language for data analysis and graphics. J Comput Graph Stat. 1996;5:299–314.
10. Dennis G, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, et al. DAVID: database for annotation, visualization, and integrated discovery. Genome Biol. 2003;4:R60.
11. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25:25–9.
12. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004;32:D277–80.
13. Zhang Y, Sieuwerts AM, McGreevy M, Casey G, Cufer T, Paradiso A, et al. The 76-gene signature defines high-risk patients that benefit from adjuvant tamoxifen therapy. Breast Cancer Res Treat. 2009;116:303–9.
14. Fisher B, Bauer M, Wickerham DL, Redmond CK, Fisher ER, Cruz AB, et al. Relation of number of positive axillary nodes to the prognosis of patients with primary breast cancer. An NSABP update. Cancer. 1983;52:1551–7.
15. Isola J, Visakorpi T, Holli K, Kallioniemi O-P. Association of overexpression of tumor suppressor protein p53 with rapid cell proliferation and poor prognosis in node-negative breast cancer patients. J Natl Cancer Inst. 1992;84:1109–14.
16. Foekens JA, Portengen H, Van Putten WL, Peters HA, Krijnen HL, Alexieva-Figusch J, et al. Prognostic value of estrogen and progesterone receptors measured by enzyme immunoassays in human breast tumor cytosols. Cancer Res. 1989;49:5823–8.
17. Berger U, Wilson P, Thethi S, McClelland RA, Greene GL, Coombes RC. Comparison of an immunocytochemical assay for progesterone receptor with a biochemical method of measurement and immunocytochemical examination of the relationship between progesterone and estrogen receptors. Cancer Res. 1989;49:5176–9.

Author information

Authors and Affiliations

Department of Breast Surgeon, China-Japan Union Hospital of JILIN University, Chang Chun, 130033, Jilin Province, China
Qingnan Zhao
Daqing Longnan Hospital, Daqing, 163453, China
Wenqing Song, Dai yu He & YanSong Li


Corresponding author

Correspondence to YanSong Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Zhao, Q., Song, W., He, D.y. et al. Identification of key gene modules and pathways of human breast cancer by co-expression analysis. Breast Cancer 25, 213–223 (2018). https://doi.org/10.1007/s12282-017-0817-5


Received: 10 June 2017
Accepted: 08 November 2017
Published: 23 November 2017
Issue Date: March 2018
DOI: https://doi.org/10.1007/s12282-017-0817-5
