1 Introduction

Advances in high-throughput technology have enabled researchers to obtain a great deal of gene expression data during the past ten years [1]. Meanwhile, a large number of models and methods have emerged for accurately reconstructing gene regulatory networks, such as Boolean networks, systems of differential equations, artificial neural networks, and Bayesian networks. Among these, Bayesian networks have become a mainstream model in gene regulatory network research because of their advantages, including the ability to handle incomplete data sets, to fully incorporate prior domain knowledge, and to present the network in an intuitive graphical form.

Generally, there are two kinds of algorithms for Bayesian network structure learning: search-and-score methods and correlation analysis methods. A search-and-score method introduces a scoring function \( S(G,D) \) and uses it to evaluate candidate network structures in order to find an optimal solution among all possible structures. A correlation analysis method captures the dependencies between nodes via independence tests to infer the network structure. In this study, we adopt the classic K2 algorithm [2] based on the Bayesian Dirichlet Equivalence (BDE) scoring function to learn the Bayesian network structure. The novelty of our method lies in introducing Part Mutual Information (PMI) to initialize the Bayesian network structure.

Methods based on Mutual Information (MI) first obtain a mutual information matrix by calculating the mutual information between all possible gene pairs, and then construct the regulatory network from this matrix [3]. Although this approach achieves good results, it cannot distinguish direct from indirect associations between genes, which leads to a high false-positive rate. Wang et al. [4] therefore proposed a method based on conditional mutual information (CMI), which can distinguish direct and indirect relationships between genes and greatly reduces the false-positive rate. However, CMI-based models also have limitations. Chen et al. [5] pointed out that CMI suffers from a false-negative problem: when measuring the dependency between variables X and Y given variable Z, CMI cannot correctly measure the direct association if X (or Y) is strongly associated with Z. To solve this problem, a method based on part mutual information [5] was proposed, which overcomes both the false-positive problem of the MI model and the false-negative problem of CMI.

In this paper, we take advantage of part mutual information to determine the relationships between variables and thereby initialize the Bayesian network structure. The K2 algorithm based on the BDE scoring function is then used to search for the optimal network structure. Two groups of experiments illustrate the effectiveness of the proposed method.

2 Method

2.1 Part Mutual Information

Let X and Y be two random variables. MI is defined on the basis of the Kullback–Leibler (KL) divergence D [6]:

$$ MI(X;Y) = D(p(x,y)\left\| {p(x)p(y)} \right.) = \sum\limits_{x,y} {p(x,y)\log \frac{p(x,y)}{p(x)p(y)}} $$
(1)

where \( p(x,y) \) is the joint probability distribution of X and Y, and \( p(x) \) and \( p(y) \) are the marginal distributions of X and Y, respectively. Clearly, MI in Eq. (1) is evaluated against the 'mutual independence' of X and Y, which is defined as

$$ p(x)p(y) = p(x,y) $$
(2)
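As a concrete illustration of Eq. (1), the following sketch (an illustrative Python implementation, not part of the original study) computes MI from a discrete joint distribution given as a 2-D array:

```python
import numpy as np

def mutual_information(pxy):
    """MI(X;Y) from a discrete joint distribution p(x, y), per Eq. (1).

    pxy: 2-D array whose entries sum to 1 (rows index x, columns index y).
    """
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x), column vector
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y), row vector
    mask = pxy > 0                        # treat 0 * log(0) as 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))
```

For independent variables (Eq. 2) the sum vanishes, while for perfectly coupled binary variables it equals log 2.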

CMI for variables X and Y given Z can further detect nonlinear direct associations and is defined as:

$$ \begin{aligned} CMI(X;Y\left| Z \right.) & = D(p(x,y,z)\left\| {p(x\left| z \right.)p(y\left| z \right.)} \right.p(z)) \\ & = \sum\limits_{x,y,z} {p(x,y,z)\log \frac{p(x,y\left| z \right.)}{p(x\left| z \right.)p(y\left| z \right.)}} \\ \end{aligned} $$
(3)

Accordingly, the conditional independence of X and Y given Z is defined as:

$$ p(x\left| z \right.)p(y\left| z \right.) = p(x,y\left| z \right.) $$
(4)

By analogy with Eqs. (2) and (4), the partial independence of X and Y given Z is defined as:

$$ p^{*} (x\left| z \right.)p^{*} (y\left| z \right.) = p(x,y\left| z \right.) $$
(5)

where \( p^{*} (x\left| z \right.) \) and \( p^{*} (y\left| z \right.) \) are defined [7] as

$$ p^{*} (x\left| z \right.) = \sum\limits_{y} {p(x\left| {z,y} \right.)} p(y),p^{*} (y\left| z \right.) = \sum\limits_{x} {p(y\left| {z,x} \right.)} p(x) $$
(6)

PMI is then defined as

$$ PMI(X;Y\left| Z \right.) = D(p(x,y,z)\left\| {p^{*} (x\left| z \right.)p^{*} (y\left| z \right.)} \right.p(z)) $$
(7)

or, equivalently, as

$$ PMI(X;Y\left| Z \right.) = \sum\limits_{x,y,z} {p(x,y,z)\log \frac{p(x,y\left| z \right.)}{{p^{*} (x\left| z \right.)p^{*} (y\left| z \right.)}}} $$
(8)
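To make Eqs. (6)–(8) concrete, here is an illustrative Python sketch (our own implementation, not code from the paper) that computes PMI from a discrete joint distribution p(x, y, z) stored as a 3-D array:

```python
import numpy as np

def pmi(pxyz):
    """Part mutual information PMI(X;Y|Z) from a discrete joint p(x, y, z).

    pxyz: 3-D array (axes x, y, z) whose entries sum to 1.
    Implements Eqs. (6) and (8): p*(x|z) = sum_y p(x|z,y) p(y), etc.
    """
    px = pxyz.sum(axis=(1, 2))                     # p(x)
    py = pxyz.sum(axis=(0, 2))                     # p(y)
    pz = pxyz.sum(axis=(0, 1))                     # p(z)
    pyz = pxyz.sum(axis=0)                         # p(y, z)
    pxz = pxyz.sum(axis=1)                         # p(x, z)

    with np.errstate(divide="ignore", invalid="ignore"):
        px_given_zy = np.where(pyz > 0, pxyz / pyz, 0.0)        # p(x|z,y)
        py_given_zx = np.where(pxz[:, None, :] > 0,
                               pxyz / pxz[:, None, :], 0.0)     # p(y|z,x)
    pstar_xz = np.einsum("xyz,y->xz", px_given_zy, py)          # p*(x|z), Eq. (6)
    pstar_yz = np.einsum("xyz,x->yz", py_given_zx, px)          # p*(y|z), Eq. (6)

    total = 0.0                                    # accumulate Eq. (8)
    for i in range(pxyz.shape[0]):
        for j in range(pxyz.shape[1]):
            for k in range(pxyz.shape[2]):
                p = pxyz[i, j, k]
                if p > 0:
                    ratio = (p / pz[k]) / (pstar_xz[i, k] * pstar_yz[j, k])
                    total += p * np.log(ratio)
    return total
```

Under full independence the PMI vanishes, and for X identical to Y but independent of Z it reduces to MI(X;Y).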

2.2 Bayesian Networks

A Bayesian network is a directed acyclic graph \( B = (G,\Theta ) \) that represents the probabilistic dependency relationships among variables. \( G = (V,E) \) is a directed acyclic graph and \( \Theta \) is the collection of conditional probability tables (CPTs). \( V = (V_{1} , \cdots ,V_{n} ) \) is the set of nodes, and each node represents a random variable \( X_{i} \) of the domain.

$$ E = \left\{ { < X_{i} ,X_{k} > \left| {X_{i} ,X_{k} \in V,i \ne k} \right.} \right\} $$
(9)

where \( < X_{i} ,X_{k} > \) denotes a directed edge. \( X_{i} \to X_{k} \) indicates a probabilistic dependency between the variables \( X_{i} \) and \( X_{k} \): \( X_{i} \) is a parent node of \( X_{k} \), and \( Pa(X_{k} ) \) denotes the parent set of \( X_{k} \).

For every variable \( X_{k} \), each value \( x_{k} \) has an associated parameter \( \theta_{{x_{k} \left| {Pa(X_{k} )} \right.}} = P(x_{k} \left| {Pa(X_{k} )} \right.) \), which gives the probability that \( x_{k} \) occurs given the configuration of \( Pa(X_{k} ) \). The joint probability distribution over the variables of a Bayesian network then factorizes as:

$$ P(X_{1} ,X_{2} , \cdots ,X_{n} ) = \mathop \prod \limits_{i = 1}^{n} P(X_{i} \left| {Pa(X_{i} )} \right.) $$
(10)
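As a minimal illustration of the factorization in Eq. (10), consider a hypothetical three-node chain \( X_{1} \to X_{2} \to X_{3} \); the CPT entries below are purely illustrative numbers, not values from the paper:

```python
# Joint probability via Eq. (10) for a hypothetical chain X1 -> X2 -> X3.
p_x1 = {0: 0.6, 1: 0.4}                       # P(X1)
p_x2 = {(0, 0): 0.7, (0, 1): 0.3,             # P(X2 | X1), keyed by (x1, x2)
        (1, 0): 0.2, (1, 1): 0.8}
p_x3 = {(0, 0): 0.9, (0, 1): 0.1,             # P(X3 | X2), keyed by (x2, x3)
        (1, 0): 0.4, (1, 1): 0.6}

def joint(x1, x2, x3):
    """P(X1, X2, X3) = P(X1) * P(X2 | X1) * P(X3 | X2)."""
    return p_x1[x1] * p_x2[(x1, x2)] * p_x3[(x2, x3)]
```

Summing the product over all eight assignments returns 1, confirming that the factorization defines a valid joint distribution.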

2.3 K2 Algorithm

Cooper [2] proposed the K2 algorithm based on the Bayesian Dirichlet Equivalence (BDE) scoring function to learn Bayesian network structure. The K2 algorithm uses hill-climbing search to learn the network structure.

According to prior knowledge, we first obtain an initial network. Search operators, including deleting edges, adding edges and reversing edges, are then applied to modify the current network and generate candidate networks. According to the BDE scoring function, the candidate network with the best score replaces the current network, and the search continues until the best network is obtained.

The BDE scoring function is an approximation of the marginal likelihood function under large-sample conditions. It is defined as:

$$ S(G,D) = LgP(D\left| G \right.) + LgP(G) - \frac{d}{2}\log m $$
(11)

where \( P(D\left| G \right.) \) is the marginal likelihood of the dataset \( D \) given the structure \( G \), obtained by averaging over the parameters; \( P(G) \) is the prior probability of the network structure \( G \); and \( \frac{d}{2}\log m \) is a penalty term depending on the number of parameters \( d \) and the sample size \( m \). Because of this penalty term, the scoring function avoids overfitting.
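Equation (11) approximates the marginal likelihood that the K2 search maximizes; for discrete data that likelihood has the closed-form Cooper–Herskovitz family score. The sketch below (an illustrative implementation under our own naming, not code from the paper) evaluates the log score of one node given a candidate parent set:

```python
import math

def k2_family_score(data, child, parents, arity):
    """Log Cooper-Herskovits score of one node given a parent set.

    data: list of dicts mapping variable name -> discrete value (0..r-1).
    arity: dict mapping variable name -> number of states r.
    Score per parent configuration j: (r-1)! / (N_j + r - 1)! * prod_k N_jk!
    """
    r = arity[child]
    counts = {}                                  # parent config -> per-state counts
    for row in data:
        cfg = tuple(row[p] for p in parents)
        counts.setdefault(cfg, [0] * r)[row[child]] += 1
    score = 0.0
    for cfg_counts in counts.values():
        n_j = sum(cfg_counts)
        score += math.lgamma(r) - math.lgamma(n_j + r)       # (r-1)!/(N_j+r-1)!
        score += sum(math.lgamma(n + 1) for n in cfg_counts) # prod_k N_jk!
    return score
```

In a K2-style search, this score is evaluated for each candidate parent set of a node, and the parent whose addition increases it most is greedily added.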

2.4 Evaluation Index

In our study, five criteria (True Positive Rate (TPR), False Positive Rate (FPR), Positive Predictive Value (PPV), Accuracy (ACC) and F-score) were used to evaluate the performance of the proposed method. Their definitions are given as follows:

$$ TPR = \frac{TP}{TP + FN} $$
(12)
$$ FPR = \frac{FP}{FP + TN} $$
(13)
$$ PPV = \frac{TP}{TP + FP} $$
(14)
$$ ACC = \frac{TP + TN}{TP + FP + TN + FN} $$
(15)
$$ F{\text{-}}score = \frac{2 \cdot PPV \cdot TPR}{PPV + TPR} $$
(16)

where True Positive (TP) means that an edge in the real network is identified as an edge by the model; False Positive (FP) means that an edge not in the real network is identified as an edge by the model; True Negative (TN) means that an edge not in the real network is not identified by the model; and False Negative (FN) means that an edge in the real network is not identified by the model.
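Given the four edge counts, the five criteria of Eqs. (12)–(16) can be computed directly; an illustrative sketch:

```python
def metrics(tp, fp, tn, fn):
    """TPR, FPR, PPV, ACC and F-score from edge-prediction counts (Eqs. 12-16)."""
    tpr = tp / (tp + fn)                          # Eq. (12)
    fpr = fp / (fp + tn)                          # Eq. (13)
    ppv = tp / (tp + fp)                          # Eq. (14)
    acc = (tp + tn) / (tp + fp + tn + fn)         # Eq. (15)
    f_score = 2 * ppv * tpr / (ppv + tpr)         # Eq. (16)
    return tpr, fpr, ppv, acc, f_score
```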

3 Experimental Results and Analysis

3.1 Experiment 1: Simulated Gene Regulatory Network

Figure 1 shows a simulated gene regulatory network modeled by an S-system model [8]. The model consists of \( n \) non-linear ordinary differential equations, and the generic form of equation \( i \) is given as follows:

Fig. 1. A simulated gene regulatory network containing 15 genes

$$ X_{i}^{'} (t) = \alpha_{i} \mathop\Pi \limits_{j = 1}^{N} X_{j}^{{g_{ij} }} (t) - \beta_{i} \mathop\Pi \limits_{j = 1}^{N} X_{j}^{{h_{ij} }} (t) + \varepsilon $$

where \( X_{i} \) is the expression level of gene \( i \), \( N \) is the number of variables, \( \alpha_{i} \) and \( \beta_{i} \) are non-negative rate constants, \( g_{ij} \) and \( h_{ij} \) are kinetic orders, and independent random Gaussian noise (\( \varepsilon \)) is added to each equation.
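The S-system dynamics can be integrated numerically to generate time-course data; the following is an illustrative Euler-integration sketch (the function name, step size and positivity floor are our own choices, and the actual parameters would come from Table 1):

```python
import numpy as np

def simulate_s_system(alpha, beta, g, h, x0, dt=0.01, steps=1000, noise_sd=0.0):
    """Euler integration of dX_i/dt = a_i prod_j X_j^g_ij - b_i prod_j X_j^h_ij + eps."""
    rng = np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        prod_g = np.prod(x ** g, axis=1)          # production terms
        prod_h = np.prod(x ** h, axis=1)          # degradation terms
        dx = alpha * prod_g - beta * prod_h
        if noise_sd > 0:
            dx = dx + rng.normal(0.0, noise_sd, size=x.shape)
        x = np.maximum(x + dt * dx, 1e-9)         # keep expression levels positive
        traj.append(x.copy())
    return np.array(traj)
```

Sampling the returned trajectory at 20, 30 or 50 evenly spaced points would yield datasets of the kind described below.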

Table 1 lists the parameters of the S-system containing 15 genes. The initial values are random, and we generated simulated datasets with 20, 30 and 50 time points, respectively.

Table 1. Parameters of the S-system

Table 2 reports the TPR and FPR obtained by the proposed algorithm on the simulated data.

Table 2. Performances of the proposed model to the simulated data

3.2 Experiment 2: SOS DNA Repair Network of E. coli Bacterium

The datasets from the SOS DNA repair network of the E. coli bacterium [9] contain four experiments under various light intensities (5 \( Jm^{ - 2} \), 5 \( Jm^{ - 2} \), 20 \( Jm^{ - 2} \), 20 \( Jm^{ - 2} \)). Each experiment (http://www.weizmann.ac.il/mcb/UriAlon/Papers/SOSData) consists of 50 time points evenly spaced by 6 min and involves eight genes: uvrD, lexA, umuD, recA, uvrA, uvrY, ruvA and polB. Figure 2 displays the known real regulations among the 8 genes.

Fig. 2. The true SOS network with eight genes and 9 interactions

In the experiment, we first normalized the gene expression data for each gene to the interval [0, 1] using

$$ x_{i}^{'} (t) = \frac{{x_{i} (t) - \min_{i} }}{{\max_{i} - \min_{i} }} $$

where \( x_{i} (t) \) is the measured expression level of gene \( i \) at time point \( t \), and \( \min_{i} \) and \( \max_{i} \) are the minimum and maximum expression levels of gene \( i \), respectively. Table 3 shows a detailed comparison between the proposed method and other methods, including the S-system [10], ODEs [11], RNN [12] and DBN [13]. The results show that our proposed method performs better than the other methods.
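The normalization above is a per-gene min–max scaling; an illustrative sketch (assuming rows are genes and columns are time points):

```python
import numpy as np

def minmax_normalize(expr):
    """Scale each gene's expression profile to [0, 1] (rows = genes)."""
    mins = expr.min(axis=1, keepdims=True)
    maxs = expr.max(axis=1, keepdims=True)
    return (expr - mins) / (maxs - mins)
```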

Table 3. Comparison of different methods for SOS dataset

4 Conclusions

In this paper, Part Mutual Information is introduced into the structure-learning algorithm for Bayesian networks. This means that the initial network already identifies which genes are related to each other, so the subsequent search with the K2 algorithm only needs to determine the directions of the edges in the Bayesian network. We carried out experiments on two networks: a simulated gene regulatory network and the SOS DNA repair network of the E. coli bacterium. The comparison with other methods on the SOS DNA repair network demonstrates the effectiveness of the proposed method.