Construction of Co-expression and Co-regulation Network with Differentially Expressed Genes in Bone Marrow Stem Cell Microarray Data

Biswas, Paramita; Barman, Bandana; Mukhopadhyay, Anirban

doi:10.1007/978-81-322-2250-7_76

Paramita Biswas⁷,
Bandana Barman⁸ &
Anirban Mukhopadhyay⁷

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 339))

1730 Accesses

Abstract

It is important to understand the interaction mechanism among co-expressed and co-regulated genes in stem cell to restrict the abnormal growth of cell tissues (tumor) which may lead to cancer. In this article, differentially co-expressed and co-regulated genes exist in normal stem cells and stem cell derived tumors are identified from sample Bone Marrow microarray data. By performing statistical t-test between sample groups, first we have identified differentially expressed genes (DEG). Then up-regulated (UR) and down-regulated (DR) genes are separated by setting a p-value cutoff at 0.001. After identifying the differentially expressed genes, distinguished co-expressed up-regulated and down-regulated genes are found. Subsequently, we have constructed pair-wise co-expression networks with the co-expressed genes. Finally, we have studied the significance of co-expressed genes with gene ontology (GO) and we have found significant GO-ids. This study is expected to lead to finding of pathways for diseases.

Access provided by Autonomous University of Puebla. Download conference paper PDF

Identification of key gene modules for human osteosarcoma by co-expression analysis

Article Open access 02 May 2018

Identifying key genes and functionally enriched pathways in acute myeloid leukemia by weighted gene co-expression network analysis

Article 09 July 2024

Identification of key gene modules and pathways of human breast cancer by co-expression analysis

Article 23 November 2017

Keywords

1 Introduction

Stem cells [1] have remarkable potential i.e. atypical characteristics to develop into many different cell types in the body during growth of early life. Sometimes, the unrestricted growth of stem cells or the abnormal growth of cell tissue causes cancer. Stem cells are distinguished from other cell types by two important characteristics. First, they are unspecialized cells capable of renewing themselves through cell division or sometimes after long periods of inactivity. Second, under certain physiologic or experimental conditions, they can be induced to become tissue or organ-specific cells with special functions. Researchers primarily worked with two kinds of stem cells: embryonic stem cells and non-embryonic “somatic” or “adult” stem cells. The induced pluripotent stem cells (iPSCs) [2–4] are some specialized adult cells which “reprogrammed” genetically to assume a stem cell like state in special condition. This property is unique property of stem cell it can be explored by gene expression analysis. Microarray gene expression data to predict and analysis of cancer disease becomes very important. These data can be characterized by genome variables and with their corresponding observations (experiments) in a experimental limitations [5]. To discover co-regulated genes, analysis of gene expression data [6] is required. Previously, it has been assumed that similar patterns in gene expression profiles usually suggest relationships between the genes. Now it is proved genes targeted by same transcription factors, tend to show similar expression patterns along time. Analyzing expression profiles of genes, targeted by same transcription factors revealed complex relationships between co-regulated gene pairs and it also includes co-expression relationships. In this article, we developed a simple algorithm to find differentially co-expressed and co-regulated genes, and then, to construct pairwise co-expression network. We applied the algorithm on sample gene expression microarray data of normal stem cells (nscr), stem cells derive tumors cells (scdtr) and patch tumor cells (ptr).

2 Materials and Methods

For finding and analyzing the relationship between differentially co-expressed genes (DCEG), many techniques have been developed in the literature. Here, we applied statistical t-test [7, 8] and Benjamini Hochberg method [9] to identify differentially expressed genes (DEG) [10] within sample groups of nscr and scdtr, nscr and ptr (http://www.ncbi.nlm.nih.gov/GSE20948). In addition to this, we have studied the significance of up-regulated and down-regulated genes. We code the algorithms with using Matlab. The main steps of proposed algorithms are discussed in the following subsections.

2.1 Preprocessing of the Dataset

Our sample Bon Marrow microarray gene expression data is in normalized form. Normalized data for each gene is typically known as an ʻexpression ratio’ or as the logarithm of expression ratio. We have done data preprocessing with filtered out low expressed values and null values from the datasets.

2.2 Identification of Differentially Co-expressed Genes

In this section, we discuss the steps to identify DEG. First, a standard statistical t-test is performed for detecting significant changes between measurements of genes in sample microarray groups. It may occurs two types of errors, 1: a false positive by declaring that a gene is differentially expressed when it is not, and 2: a false negative when the test fails to identify a truly differentially expressed gene. Second, Benjamini Hochberg method [9] is used for choosing significantly differentially expressed genes. It is done by the following equation:

$$ P \le y*\frac{\alpha }{m}, $$

(1)

where $ P $ is the largest p-value called significant, $ y $ is the number of genes called significant and $ m $ is the total number of genes tested, $ \alpha $ is false discovery rate (FDR) [11, 12], defined as the expected ratio of the number of false positives to the total number of positive calls in a differential expression analysis between two sample groups [11]. FDR can be measured [11] as,

$$ FDR = Err\left[ {\frac{F}{F + T}} \right] = Err\left[ {\frac{{n_{0} \cdot [1 - specificity]}}{{n_{0} \cdot [1 - specificity] + n_{1} \cdot sensitivity}}} \right], $$

(2)

where $ F $ is the number of false positives, $ T $ is the number of true positives, and $ S $ is the total number of features called significant. Also, $ n $, number of p-values is seen more clearly, $ n_{0} $ is the number of truly null features in the study, and $ n_{1} = n - n_{0} $ is the number of truly alternative features. Regardless of whether the p-value threshold is fixed or data-dependent, the quantities $ F $, $ T $ and $ S $ are random variables. Therefore, it is common statistical practice to write the overall error measure in terms of an expected value, which we denote by $ Err $ [11].

2.3 Proposed Algorithm

Input::

phenotype1: gene expression values of normal stem cell (nscr),

phenotype2: gene expression values of stem cell derive tumor (scdtr),

phenotype3: gene expression values of patched tumor (ptr).

Output::

All differentially expressed genes.

Step1::

Perform two-sample t-test to evaluate differential expression of genes from phenotype1 and phenotype2. Their p-values and t-scores are stored.

Step2::

Perform a permutation t-test to compute the p-values of 10,000 permutations by permuting the columns of the gene expression data matrix of phenotype1 and phenotype2.

Step3::

Determine the number of genes considered to have statistical significance at the p-value ʻcutoff’ of 0.001.

Step4::

Estimate FDR for the genes with statistically significant p-values.

Step5::

Create a scatter plot of gene expression data, plotting significance versus fold change of gene expression ratios of phenotype1 and phenotype2.

2.4 Separation of Up-regulated and Down-regulated Genes

We plot the volcano plot of identified all differentially expressed genes as volcano plot (Fig. 4) of two phenotypes returns a structure containing information for genes that are considered to be both statistically significant and significantly differentially expressed. This information helps us to identify co-regulated genes, specially up-regulated and down-regulated genes. Now based on cutoff value (0.001), up-regulated and down-regulated genes are separated from the total set of differentially co-expressed genes.

2.5 Extraction of Co-expressed and Co-regulated Genes

To separate co-expressed and co-regulated genes from the identified DEG, we generate a algorithm and code the algorithm with using Matlab. DEGs which have the same $ p $-value are called co-expressed genes. Up-regulated and down-regulated genes with similar p-values are known as co-expressed and co-regulated genes. Our proposed algorithm is as follows:

Algorithm

Input::: Exp: differentially expressed genes or up-regulated or down-regulated genes.
Output::: All Co-expressed and Co-regulated genes.
Step1::: Each distinct p-value of Exp and its respective position are stored in separate data vector and index vector.
Step2::: These data vector p-values are compared with the p-values of the original data, and if they match then their corresponding information is extracted from Exp with the help of the index vectors.
Step3::: The process is repeated until all distinct data vector p-values are compared.

2.6 Visualization of Differentially Co-expression and Co-regulation Network

We constructed pair-wise co-expression network to understand the relationship among co-expressed and co-regulated genes. Co-expression networks are built depending on their corresponding p-values. We have used cytoscape software to visualize the pair-wise network. The genes having same p-value construct a paired structure.

3 Results and Discussion

The Bon Marrow microarray data consists of 45,101 genes with having normal stem cell tissue (nscr) responses (5no.), stem cell derived tumor tissue (scdtr) responses (5no.) and patched tumor cell tissue (ptr) responses (4no.) are taken for the study. After analysis, we considered only the genes having p-value $ \underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{ \prec } $ 0.001 as significant genes (Figs. 1 and 2). Out of 45,101 genes from the dataset, 2,325 genes for sample first group (nscr–scdtr), 2,056 genes for sample second group (nscr–ptr) of data have been extracted on the basis of p-value which is approximately 5 % of total number of genes. Now, estimated false discovery rate (FDR) for statistically significant p-values are computed by using Eq. 2. As a result, we get 1,110, 792 up-regulated and 1,215, 1,264 down-regulated genes for two sample groups (Figs. 3 and 4) respectively. It implies that 2–3 % genes of total 45,101 genes are up-regulated and down-regulated.

Further, with finding similar p-values co-expressed and co-regulated genes are extracted from all differentially co-expressed genes (DCEG), up-regulated (UR) and down-regulated (DR) genes. We get 8, 4 numbers of up-regulated and down-regulated genes for group1 i.e. approximately 0.01 and 0.008 % of total 45,401 genes. Again we get 6, 14 numbers of up-regulated and down-regulated genes for group2 i.e. approximately 0.01 and 0.03 % of the total 45,101 genes. It is shown in the Table 1. The interaction among co-expressed and co-regulated genes depending on corresponding p-values, 6 paired networks are developed (Figs. 5 and 6).

Table 1 The table shows resulting differentially expressed and co-expressed genes of all sample groups

Full size table

The significance of the identified DEG are studied and analyzed with Gene Ontology (GO). We have listed it in Fig. 7.

4 Conclusion

In this paper, we first found DEG and then found correlations between gene-pairs for construction of co-expressed and co-regulated networks under diseased conditions that assist the interpretability of network. We also generated pairwise differentially co-expression network and constructed the same for differentially co-expressed and co-regulated genes of Bone Marrow stem cell microarray data. We also analyze the significance of DEG for the same microarray data. From our sample data we found 82 significantly co-expressed genes and 30 co-expressed and co-regulated genes. In future study, we can apply artificial intelligence based sophisticated techniques (fuzzy logic, neural networks, evolutionary computation) for better construction of cancer-specific regulatory networks.

References

William, et al, J.B.: Functional recovery of spinal cord injury following application of intralesional bone marrow mononuclear cells embedded in polymer scaffold-two year follow-up in a canine. J. Stem Cell Res. Ther. 1(3) 2011
Google Scholar
Hanna, J., et al.: Treatment of sickle cell anemia mouse model with iPS cells generated from autologous skin. Science 318(5858), 1920–1923 (2007)
Article Google Scholar
Saha, K., Jaenisch, R.: Technical challenges in using human induced pluripotent stem cells to model disease. Cell Stem Cell 5(6), 584–595 (2009)
Article Google Scholar
Jiang, Y., et al.: Pluripotency of mesenchymal stem cells derived from adult marrow. Nature 418(6893), 41–49 (2002)
Article Google Scholar
Vaishali, K., Vinayababu, A.: Application of microarray technology and soft computing in cancer biology: a review. Int. J. Biometr. Bioinform. 5(4), 225–233 (2011)
Google Scholar
Chen, J.J.: Key aspects of analyzing microarray gene-expression data. Pharmacogenomics 8(5), 473–482 (2007)
Article Google Scholar
Allen, et al. J.D.: Comparing statistical methods for constructing large scale gene networks. PLoS ONE 7(1) (2012)
Google Scholar
Vardhanabhuti, S., et al.: A comparison of statistical tests for detecting differential expression using affymetrix oligonucleotide microarrays. OMICS 10(4), 555–566 (2006)
Article Google Scholar
Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Stat. Soc. 57(1), 289–300 (1995)
MATH MathSciNet Google Scholar
Bandyopadhyay, S., Mallik, S., Mukhopadhyay, A.: A survey and comparative study of statistical tests for identifying differential expression from microarray data. IEEE/ACM Trans. Comput. Biol. Bioinform. 11(1), 95–115 (2014)
Article Google Scholar
Storey, J.D., Tibshirani, R.: Statistical significance for genome-wide studies. Proc. Natl. Acad. Sci. 100(16), 9440–9445 (2003)
Article MATH MathSciNet Google Scholar
Storey, J.D., Taylor, J.E., Siegmund, D.: Strong control conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. J. Roy. Stat. Soc. 66(1), 187–205 (2004)
Article MATH MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Kalyani, Kalyani, 741235, West Bengal, India
Paramita Biswas & Anirban Mukhopadhyay
Department of Electronics and Communication Engineering, Kalyani Government Engineering College, Kalyani, 741235, West Bengal, India
Bandana Barman

Authors

Paramita Biswas
View author publications
You can also search for this author in PubMed Google Scholar
Bandana Barman
View author publications
You can also search for this author in PubMed Google Scholar
Anirban Mukhopadhyay
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Anirban Mukhopadhyay .

Editor information

Editors and Affiliations

University of Kalyani, Kalyani, West Bengal, India
J. K. Mandal
Department of Computer Science and Engineering, Anil Neerukonda Institute of Technology and Sciences, Vishakapatnam, India
Suresh Chandra Satapathy
Dean, Faculty of Engineering, Technology, University of Kalyani, Kalyani, West Bengal, India
Manas Kumar Sanyal
Engineering and Technological Studies, University of Kalyani, Kalyani, West Bengal, India
Partha Pratim Sarkar
Department Computer Science & Engineering, University of Kalyani, Kalyani, India
Anirban Mukhopadhyay

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Biswas, P., Barman, B., Mukhopadhyay, A. (2015). Construction of Co-expression and Co-regulation Network with Differentially Expressed Genes in Bone Marrow Stem Cell Microarray Data. In: Mandal, J., Satapathy, S., Kumar Sanyal, M., Sarkar, P., Mukhopadhyay, A. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 339. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2250-7_76

Download citation

DOI: https://doi.org/10.1007/978-81-322-2250-7_76
Published: 21 January 2015
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2249-1
Online ISBN: 978-81-322-2250-7
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics