
1 Introduction

The reverse engineering of Gene Regulatory Networks (GRN) is an important problem studied in the fields of Systems Biology and Bioinformatics. Solving it can help us understand the regulatory mechanisms of a biological phenomenon under study. For instance, researchers have studied GRN in the context of the cell cycle [15], the circadian cycle of plants [18], and, more importantly, diseases such as cancer [9, 19]. Given the importance of this problem, several models of GRN, along with algorithms for their inference, have been proposed over the years. Generally, these algorithms receive a gene expression dataset as input and output a regulatory network represented by a specific model. The input data are provided by technologies such as DNA microarrays [22] and, more recently, RNA-Seq [27].

Among the proposed mathematical models of GRN are Boolean networks [12], Bayesian networks [4], and models based on ordinary differential equations [6], to name a few. For a review of GRN models we suggest the papers [7, 11, 17]. Algorithms for the reverse engineering of GRN are based on mathematical models, and each has its own characteristics as well as advantages and disadvantages. We cannot say that one algorithm outperforms the others in every situation. For instance, some algorithms are designed to work with time-series data, while others work with steady-state data. It is worth mentioning that the reverse engineering of GRN is an ill-posed problem, i.e., given a set of gene expression data, there may be several solutions (networks) consistent with these data. This fact makes the problem a very difficult one.

In this work, we modeled gene expression in a Boolean fashion, where a gene can take on two possible values: 0 (not expressed) and 1 (expressed). We also used a probabilistic graphical model known as the Dynamic Bayesian Network, which is commonly used to represent complex stochastic processes. It differs from the standard Bayesian Network in that it models the stochastic evolution of a set of variables over time. Both models are defined by a graphical structure and a set of parameters, which together specify a joint distribution over the random variables. Algorithms for learning the parameters and the structure of such networks have been developed [14, 21]. In this paper, to learn the structure of Dynamic Bayesian Networks from time-series gene expression data, we search for the network structure that best matches the data using the Bayesian Information Criterion (BIC) score [23] and the BDe score [8].

Most reverse engineering algorithms for gene regulatory networks do not take into account any source of prior biological knowledge. Using additional biological information in this problem has shown interesting results, as can be seen in [5, 26, 29]. The source of prior knowledge used in this work is a database called STRING [20, 25], which covers more than 2000 organisms in its latest version (10.0). In [13] a similar approach was taken using the Boolean network model and a feature selection algorithm. Here, we use a different model and inference approach, but the same source of prior biological knowledge. Because we are facing an ill-posed problem, this approach aims to improve the inference of gene interactions, since it takes into account interactions already supported by the literature.

To validate the methodology we used data from the DREAM challenge [2]. We also applied the methodology in a dataset of yeast (Saccharomyces cerevisiae) cell-cycle gene expression [24] and compared the results against the literature through the KEGG (Kyoto Encyclopedia of Genes and Genomes) database [10].

In the next section, we present the methodology, beginning with the introduction of the Dynamic Bayesian Networks, followed by the definitions of the score functions used in the learning process. We also explain how the prior biological knowledge database was used in the process. In Sect. 3 we discuss the results, followed by our conclusions, in Sect. 4.

2 Methodology

In this section we present the mathematical model of GRN used in our methodology along with the methodology to learn the model from the data. We also present the source of prior biological knowledge used and how it is used in the process of reverse engineering a GRN. In the final part of the section we discuss how the methodology is validated.

2.1 Dynamic Bayesian Networks

A Dynamic Bayesian Network (DBN) is a probabilistic graphical model defined by a network structure and a set of parameters to specify the joint probability of random variables. Let \(\mathcal {X} = \{X_1,X_2, \ldots , X_n\}\) be a set of random variables where each variable \(X_i\) can assume values in the finite set \(\mathrm {Val}(X_i)\). In DBN, unlike standard Bayesian networks, the time is taken into account such that \(X_i[t]\) is the value of \(X_i\) at time t. Thus, \(\mathcal {X}[t]\) is the set of all random variables at time t.

To represent the beliefs about the process, a probability distribution over the variables \(\mathcal {X}[0] \cup \mathcal {X}[1] \cup \mathcal {X}[2] \cup \cdots \) is necessary. We assume that the process is Markovian in the sense that \(\mathrm {P}(\mathcal {X}[t+1]\,|\,\mathcal {X}[0],\ldots ,\mathcal {X}[t]) = \mathrm {P}(\mathcal {X}[t+1]\,|\,\mathcal {X}[t])\). The probability \(\mathrm {P}(\mathcal {X}[t+1]\,|\,\mathcal {X}[t])\) is time independent.

A DBN representing the joint distribution over all possible trajectories of a process is composed of: (i) a prior network \(B_0\) specifying a distribution over \(\mathcal {X}[0]\); (ii) a transition network \(B_{\rightarrow }\) over \(\mathcal {X}[0] \cup \mathcal {X}[1]\) that is taken to specify \(\mathrm {P}(\mathcal {X}[t+1]\,|\,\mathcal {X}[t])\) for all t. In Fig. 1(a) and (b) we can see a simple example.

The transition probability of this network is given by

$$\begin{aligned} \mathrm {P}_{B_{\rightarrow }}(\mathcal {X}[1]\,|\,\mathcal {X}[0]) = \prod ^{n}_{i=1} \mathrm {P}_{B_{\rightarrow }}(X_i[1]\,|\,\varvec{\mathrm {Pa}}(X_i[1])), \end{aligned}$$
(1)

where \(\varvec{\mathrm {Pa}}(X_i)\) denotes the set of parents of \(X_i\). Thus, the structure of a DBN is defined by a pair \((B_0, B_{\rightarrow })\) corresponding to a network over the variables \(\mathcal {X}[0], \ldots , \mathcal {X}[\infty ]\). In practice, we reason only about the finite interval \(0, \ldots , T\). To this end, we “unroll” the DBN structure into a Bayesian Network over \(\mathcal {X}[0], \ldots , \mathcal {X}[T]\). In slice 0, the parents of \(X_i[0]\) are those specified in \(B_0\). In slice \(t+1\), the parents of \(X_i[t+1]\) are nodes in slices t and \(t+1\) corresponding to the parents of \(X_i[1]\) in \(B_{\rightarrow }\). In Fig. 1(c) we can see the result of unrolling the network for 3 time slices. Given a DBN, the joint distribution over \(\mathcal {X}[0], \ldots , \mathcal {X}[T]\) is

$$\begin{aligned} \mathrm {P}(\mathcal {X}[0], \ldots , \mathcal {X}[T]) = \mathrm {P}_{B_0}(\mathcal {X}[0]) \prod _{t=0}^{T-1}\mathrm {P}_{B_{\rightarrow }}(\mathcal {X}[t+1]\,|\,\mathcal {X}[t]). \end{aligned}$$
(2)
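To make Eq. 2 concrete, the following sketch computes the probability of a Boolean trajectory under a tiny two-gene DBN and checks that probabilities over all trajectories of a fixed length sum to one. The network, gene count, and all probability values are illustrative assumptions of this example, not taken from the paper's experiments.

```python
from itertools import product

# Prior network B0: P(X1[0]) and P(X2[0] | X1[0])  (hypothetical values)
p_x1_0 = {0: 0.6, 1: 0.4}
p_x2_0 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p_x2_0[x1][x2]

# Transition network B->: P(Xi[t+1] | X1[t], X2[t]) for each gene
p_x1_next = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
             (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.1, 1: 0.9}}
p_x2_next = {(0, 0): {0: 0.8, 1: 0.2}, (0, 1): {0: 0.3, 1: 0.7},
             (1, 0): {0: 0.6, 1: 0.4}, (1, 1): {0: 0.2, 1: 0.8}}

def trajectory_probability(traj):
    """P(X[0..T]) = P_B0(X[0]) * prod_t P_B->(X[t+1] | X[t])  (Eq. 2)."""
    x1, x2 = traj[0]
    p = p_x1_0[x1] * p_x2_0[x1][x2]
    for prev, cur in zip(traj, traj[1:]):
        p *= p_x1_next[prev][cur[0]] * p_x2_next[prev][cur[1]]
    return p

# Sanity check: the distribution over all length-3 trajectories sums to 1.
total = sum(trajectory_probability([s0, s1, s2])
            for s0 in product((0, 1), repeat=2)
            for s1 in product((0, 1), repeat=2)
            for s2 in product((0, 1), repeat=2))
```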

2.2 Learning DBN from Complete Data

In this work, the reverse engineering of gene networks is treated as a problem of learning a DBN from complete data. The data in question is a time-series gene expression dataset \(\mathcal {D}\) consisting of K complete observation sequences \(D_1,D_2, \ldots , D_K\). Each sequence \(D_{\ell }\) has length \(N_{\ell }\) and specifies values for the variables \(\mathbf {x}^{\ell }[0], \mathbf {x}^{\ell }[1], \ldots , \mathbf {x}^{\ell }[N_{\ell }]\). Thus, we have K instances of initial slices to learn \(B_0\) and \(N = \sum _{\ell } N_{\ell }\) instances of transitions to learn \(B_{\rightarrow }\).

Fig. 1. (a) A prior network \(B_0\) and (b) a transition network \(B_{\rightarrow }\) for three variables. In (c) is the corresponding “unrolled” network. Figure adapted from [4].

Formally, the reverse engineering problem is stated as follows. Given a dataset \(\mathcal {D}\) of instances of \(\mathcal {X}\), find a DBN \(\mathcal {B}\) with structure \(\mathcal {G}=(B_0, B_{\rightarrow })\) and its parameters \(\mathbf {\Theta }\) that best matches \(\mathcal {D}\). We use a scoring function to measure how a DBN matches the data. The score combines the likelihood of the data according to the network, \(L(\mathcal {B}:\mathcal {D}) = \log \mathrm {P}(\mathcal {D}\,|\,\mathcal {B})\), with some penalty according to the complexity of the network.

BIC Score. We define the Bayesian Information Criterion score and the BDe score according to [3]. First, let us define

$$\begin{aligned} \theta ^{(0)}_{i,j^{'}_i,k^{'}_i} = P(X_i[0] = k^{'}_i\,|\,\varvec{\mathrm {Pa}}(X_i[0])=j^{'}_i) \end{aligned}$$
(3)

and similarly

$$\begin{aligned} \theta ^{\rightarrow }_{i,j_i,k_i} = P(X_i[t] = k_i\,|\,\varvec{\mathrm {Pa}}(X_i[t])=j_i) \end{aligned}$$
(4)

for \(t=1,\ldots ,T\). The notation for the sufficient statistics is given by

$$\begin{aligned} N^{(0)}_{i,j^{'}_i,k^{'}_i} = \sum _{\ell } I(X_i[0] = k^{'}_i, \varvec{\mathrm {Pa}}(X_i[0])=j^{'}_i ; \mathbf {x^{\ell }}) \end{aligned}$$
(5)

and

$$\begin{aligned} N^{\rightarrow }_{i,j_i,k_i} = \sum _{\ell } \sum _t I(X_i[t] = k_i, \varvec{\mathrm {Pa}}(X_i[t])=j_i ; \mathbf {x^{\ell }}) \end{aligned}$$
(6)

where \(I(\cdot ;\mathbf {x^{\ell }})\) is an indicator function which takes on value 1 if the event \(\cdot \) occurs in \(\mathbf {x}^{\ell }\), and 0 otherwise.
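The sufficient statistics in Eqs. 5 and 6 are simple counts over the data. A minimal sketch for Boolean genes, under the simplifying assumptions that all parents lie in the previous time slice and that \(B_0\) has an empty parent set (the function name and data layout are our own):

```python
from collections import Counter

def sufficient_statistics(sequences, parents):
    """Compute the counts of Eqs. 5 and 6 for Boolean genes.

    sequences : K observation sequences, each a list of tuples of
                0/1 values (one tuple per time point)
    parents   : parents[i] is the tuple of gene indices forming
                Pa(Xi) in the previous slice
    Returns (n0, nt): n0[(i, pa, v)] counts over initial slices,
    nt over transitions, pooled across sequences and time points.
    """
    n0, nt = Counter(), Counter()
    for seq in sequences:
        # initial slice: empty parent set assumed for B0 here
        for i, v in enumerate(seq[0]):
            n0[(i, (), v)] += 1
        # every consecutive pair of slices is one transition instance
        for prev, cur in zip(seq, seq[1:]):
            for i, v in enumerate(cur):
                pa = tuple(prev[j] for j in parents[i])
                nt[(i, pa, v)] += 1
    return n0, nt
```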

Using Eq. 2 and rearranging the terms, the likelihood function is decomposed according to the DBN structure:

$$\begin{aligned} \mathrm {P}(\mathcal {D}\,|\,\mathcal {G},\mathbf {\Theta }_{\mathcal {G}}) = \prod _i \prod _{j^{'}_i} \prod _{k^{'}_i} (\theta ^{(0)}_{i,j^{'}_i,k^{'}_i})^{N^{(0)}_{i,j^{'}_i,k^{'}_i}} \times \prod _i \prod _{j_i} \prod _{k_i} (\theta ^{\rightarrow }_{i,j_i,k_i} )^{N^{\rightarrow }_{i,j_i,k_i}} \end{aligned}$$
(7)

and the log-likelihood is given by

$$\begin{aligned} L(\mathcal {B}:\mathcal {D}) = \sum _i \sum _{j^{'}_i} \sum _{k^{'}_i} N^{(0)}_{i,j^{'}_i,k^{'}_i} \log \theta ^{(0)}_{i,j^{'}_i,k^{'}_i} + \sum _i \sum _{j_i} \sum _{k_i} N^{\rightarrow }_{i,j_i,k_i} \log \theta ^{\rightarrow }_{i,j_i,k_i} . \end{aligned}$$
(8)

This decomposition facilitates the computation of the BIC and BDe scores.

Using the standard maximum likelihood estimate for multinomial distributions, we get the following expression for \(\hat{\mathbf {\Theta }}_{\mathcal {G}}\):

$$\begin{aligned} \hat{\theta }^{(0)}_{i,j^{'}_i,k^{'}_i} = \frac{N^{(0)}_{i,j^{'}_i,k^{'}_i}}{\sum _{k^{'}_i}N^{(0)}_{i,j^{'}_i,k^{'}_i}} , \end{aligned}$$
(9)

and similarly for the transition case. Thus, the BIC score is given by

$$\begin{aligned} \mathrm {BIC}(\mathcal {G}:\mathcal {D}) = \mathrm {BIC}_0 + \mathrm {BIC}_{\rightarrow }, \end{aligned}$$
(10)

where

$$\begin{aligned} \mathrm {BIC}_0 = \sum _i \sum _{j^{'}_i} \sum _{k^{'}_i} N^{(0)}_{i,j^{'}_i,k^{'}_i} \log \hat{\theta }^{(0)}_{i,j^{'}_i,k^{'}_i} - \frac{\log K}{2}\#G_0 \end{aligned}$$
(11)

and

$$\begin{aligned} \mathrm {BIC}_{\rightarrow } = \sum _i \sum _{j_i} \sum _{k_i} N^{\rightarrow }_{i,j_i,k_i} \log \hat{\theta }^{\rightarrow }_{i,j_i,k_i} - \frac{\log N}{2}\#G_{\rightarrow }, \end{aligned}$$
(12)

where \(\#G_0\) and \(\#G_{\rightarrow }\) are the number of parameters in \(B_0\) and \(B_{\rightarrow }\), respectively.
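Given the sufficient statistics, each BIC term in Eqs. 11 and 12 decomposes into a sum over (variable, parent-configuration) families. A hedged sketch, assuming Boolean variables so that each family contributes one free parameter (the function name and count layout are assumptions of this illustration):

```python
import math

def bic_component(counts, num_instances, arity=2):
    """One of the two BIC terms in Eqs. 11-12: the maximized
    log-likelihood minus (log M / 2) * #parameters, where M is K for
    the initial-slice term and N for the transition term. `counts`
    maps (variable, parent_config, value) to sufficient statistics."""
    # group counts by (variable, parent configuration)
    families = {}
    for (i, pa, v), n in counts.items():
        families.setdefault((i, pa), {})[v] = n

    loglik = 0.0
    for vals in families.values():
        total = sum(vals.values())
        for n in vals.values():
            if n > 0:
                # maximum likelihood estimate: theta_hat = n / total (Eq. 9)
                loglik += n * math.log(n / total)
    # arity - 1 free parameters per family (probabilities sum to 1)
    n_params = (arity - 1) * len(families)
    return loglik - (math.log(num_instances) / 2) * n_params
```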

BDe Score. The BDe score is computed by evaluating the following integral:

$$\begin{aligned} \mathrm {P}(\mathcal {D}\,|\,\mathcal {G}) = \int \mathrm {P}(\mathcal {D}\,|\,\mathcal {G},\mathbf {\Theta }_{\mathcal {G}})\mathrm {P}(\mathbf {\Theta }_{\mathcal {G}}\,|\,\mathcal {G})\,d\mathbf {\Theta }_{\mathcal {G}}. \end{aligned}$$
(13)

Using Eq. 7 and assuming Dirichlet priors we can obtain a closed form. A Dirichlet prior for a variable X is specified by a set of hyperparameters \(\{N'_{x}:x \in \mathrm {Val}(X)\}\):

$$\begin{aligned} \mathrm {P}(\mathbf {\Theta }_{X}) = \mathrm {Dirichlet}({N'_{x}:x \in \mathrm {Val}(X)}) \propto \prod _x \theta ^{N'_{x}-1}_x . \end{aligned}$$
(14)

The hyperparameters can be interpreted as pseudo counts. Assuming that for each structure \(\mathcal {G}\) we have hyperparameters \(N'^{(0)}_{i,j^{'}_i,k^{'}_i}\) and \(N'^{\rightarrow }_{i,j_i,k_i}\) we can rewrite \(\mathrm {P}(\mathcal {D}\,|\,\mathcal {G})\) as a product of two terms

$$\begin{aligned} \prod _i \prod _{j^{'}_i} \frac{\varGamma (\sum _{k^{'}_i} N'^{(0)}_{i,j^{'}_i,k^{'}_i})}{\varGamma (\sum _{k^{'}_i} (N'^{(0)}_{i,j^{'}_i,k^{'}_i} + N^{(0)}_{i,j^{'}_i,k^{'}_i}))} \times \prod _{k^{'}_i} \frac{\varGamma (N'^{(0)}_{i,j^{'}_i,k^{'}_i} + N^{(0)}_{i,j^{'}_i,k^{'}_i})}{\varGamma (N'^{(0)}_{i,j^{'}_i,k^{'}_i})}, \end{aligned}$$
(15)

and similarly for the transition case, where \(\varGamma (x) = \int ^{\infty }_{0} t^{x-1}e^{-t}dt\) is the Gamma function, satisfying \(\varGamma (1) = 1\) and \(\varGamma (x+1)=x\varGamma (x)\). Following [8], we assign the Dirichlet hyperparameters as

$$\begin{aligned} N'^{(0)}_{i,j^{'}_i,k^{'}_i} = \hat{N}^{(0)} \times \mathrm {P}_{B'_0}(X_i[0] = k^{'}_i | \varvec{\mathrm {Pa}}(X_i[0]) = j^{'}_i)\;\;\mathrm {and} \end{aligned}$$
(16)
$$\begin{aligned} N'^{\rightarrow }_{i,j_i,k_i} = \hat{N}^{\rightarrow } \times \mathrm {P}_{B'_{\rightarrow }}(X_i[1] = k_i | \varvec{\mathrm {Pa}}(X_i[1]) = j_i), \end{aligned}$$
(17)

given two equivalent sample sizes \(\hat{N}^{(0)}\) and \(\hat{N}^{\rightarrow }\).
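For one (variable, parent-configuration) family, the corresponding factor of Eq. 15 can be evaluated in log space with the log-Gamma function to avoid overflow. A sketch under the same Dirichlet assumptions (the function name and argument layout are illustrative):

```python
import math

def log_bde_family(pseudo, counts):
    """Log contribution of one (variable, parent-config) family to
    P(D | G), i.e. one factor of Eq. 15 evaluated via log-Gamma.
    `pseudo[k]` are the Dirichlet hyperparameters N' and `counts[k]`
    the sufficient statistics N, over the values k of the variable."""
    s_prior = sum(pseudo.values())
    s_posterior = s_prior + sum(counts.values())
    # Gamma(sum N') / Gamma(sum N' + sum N), in log space
    score = math.lgamma(s_prior) - math.lgamma(s_posterior)
    for k, n_prime in pseudo.items():
        # Gamma(N'_k + N_k) / Gamma(N'_k), in log space
        score += math.lgamma(n_prime + counts.get(k, 0)) - math.lgamma(n_prime)
    return score
```

As a sanity check, with uniform hyperparameters \(N'=1\) for a Boolean variable and two observations of the same value, the result equals \(\log \int \theta ^2 d\theta = \log (1/3)\).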

2.3 Prior Biological Knowledge

The source of prior biological knowledge used is the STRING database (Search Tool for the Retrieval of Interacting Genes/Proteins) [25]. This database aims to provide a critical assessment and integration of protein-protein interactions, including direct (physical) as well as indirect (functional) associations. The current version of STRING is v10 and it covers more than 2000 organisms. In addition, an API for the R computing environment is available, allowing us to easily retrieve information about gene interactions.

The interactions are derived from multiple sources: (i) known experimental interactions are imported from primary databases, (ii) pathway knowledge is parsed from manually curated databases, (iii) automated text-mining is applied to uncover statistical and/or semantic links between proteins, based on Medline abstracts and a large collection of full-text articles, (iv) interactions are predicted de novo by a number of algorithms using genomic information as well as by co-expression analysis and (v) interactions that are observed in one organism are systematically transferred to other organisms, via pre-computed orthology relations [25].

For each pair of genes of a given organism, the STRING database provides a set of scores based on association evidence type: conserved neighborhood, gene fusions, phylogenetic co-occurrence, co-expression, database imports, large-scale experiments and literature co-occurrence. There is also a combined score expressing confidence when an association is supported by several types of evidence. For details see reference [20].

Let \(s(X_i,X_j)\) be the combined score for a pair of genes obtained from STRING. First, we normalize the scores so that \(0 \le s(X_i,X_j) \le 1\). Given that, the biological score is defined as the mean

$$\begin{aligned} \beta (X_i, \varvec{\mathrm {Pa}}(X_i)) = \frac{1}{|\varvec{\mathrm {Pa}}(X_i)|} \sum _{X_j \in \varvec{\mathrm {Pa}}(X_i)} s(X_i,X_j) . \end{aligned}$$
(18)

The decomposition in Eq. 8 facilitates the computation of the BIC and BDe scores. Notice that the log-likelihood is expressed as a sum of terms, where each term depends only on the conditional probability of a variable \(X_i\) given its parents \(\varvec{\mathrm {Pa}}(X_i)\). Thus, when computing the BIC or BDe score, we add the term \(\omega \cdot \beta (X_i, \varvec{\mathrm {Pa}}(X_i))\), where \(\omega \in \mathbb {R}\) is the weight given to the prior biological knowledge. Setting \(\omega =0\) means that the biological knowledge is not taken into account.
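The biological score of Eq. 18 and its weighted addition to a family's score can be sketched as follows. The normalization assumes STRING's combined scores are reported on the 0-1000 scale (the paper only states that scores are normalized to [0, 1]); the gene names and score values are illustrative:

```python
def biological_score(gene, parents, string_scores):
    """Mean normalized STRING combined score between `gene` and its
    candidate parents (Eq. 18). `string_scores` maps unordered gene
    pairs to combined scores, assumed here on the 0-1000 scale;
    absent pairs get score 0."""
    if not parents:
        return 0.0
    total = sum(string_scores.get(frozenset((gene, p)), 0) / 1000.0
                for p in parents)
    return total / len(parents)

def augmented_score(base_score, gene, parents, string_scores, omega=1.0):
    """Family score with the prior-knowledge term omega * beta added."""
    return base_score + omega * biological_score(gene, parents, string_scores)

# Illustrative gene names and combined scores (hypothetical values):
scores = {frozenset(("CLN3", "CDC28")): 800,
          frozenset(("CLN3", "CLB2")): 400}
beta = biological_score("CLN3", ["CDC28", "CLB2"], scores)  # mean of 0.8, 0.4
```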

2.4 Validation

To validate and analyze the methodology, assume that \(\mathcal {G}^{*}\) is the gold standard network and \(\mathcal {G}\) is the output network. Then, it is possible to fill the entries of a confusion matrix [28] as shown in Table 1. For instance, if an edge/connection is present in the gold standard and is correctly inferred, it counts as a true positive (TP).

Table 1. Confusion matrix. TP = true positive; FP = false positive; FN = false negative; TN = true negative.

To quantify the quality of the output network, we used a similarity measure [1] defined as

$$\begin{aligned} \mathrm {Similarity}(\mathcal {G}^{*}, \mathcal {G}) = \sqrt{\mathrm {PPV} \times \mathrm {Specificity}}, \end{aligned}$$
(19)

where

$$\begin{aligned} \mathrm {PPV} = \frac{\mathrm {TP}}{\mathrm {TP} + \mathrm {FP}} \text{ and } \mathrm {Specificity} = \frac{\mathrm {TN}}{\mathrm {TN} + \mathrm {FP}}. \end{aligned}$$
(20)
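The similarity measure of Eqs. 19 and 20 follows directly from the confusion matrix entries; a minimal sketch (the function name is ours):

```python
import math

def similarity(tp, fp, fn, tn):
    """Geometric mean of PPV and specificity (Eqs. 19-20).
    `fn` is accepted for a complete confusion matrix, although this
    particular measure does not use it."""
    ppv = tp / (tp + fp)           # positive predictive value
    specificity = tn / (tn + fp)
    return math.sqrt(ppv * specificity)
```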

2.5 Implementation

The methodology was implemented by extending an open source package named Pgmpy (http://pgmpy.org), which is a Python library for working with probabilistic graphical models. This is an ongoing project, and for DBN only parameter inference had been implemented. We extended the library by implementing structure learning for DBN given a gene expression dataset and the source of prior biological knowledge provided by the STRING database.

3 Results and Discussion

In order to apply the proposed methodology, we performed experiments using data from the DREAM4 Challenge (Dialogue for Reverse Engineering Assessments and Methods) [2] and a yeast (Saccharomyces cerevisiae) cell cycle study [24]. All datasets used were discretized using an algorithm called Bikmeans [16].

3.1 DREAM4 Data Results

The DREAM4 in silico challenge data consist of time series generated from networks of 10 and 100 genes. There are 5 networks of size 10, and for each of them there are 5 datasets of 21 time points. There are also 5 networks of size 100, each with 10 datasets of 21 time points. This experiment validates the methodology without the use of prior biological knowledge, since it consists of synthetic data.

In Fig. 2 we compare the standard Bayesian Network (BN), the Bayesian Network considering the time factor (BN with time), and the Dynamic Bayesian Network (DBN) using the BIC score. The BN with time simply means that, when estimating the conditional probabilities for a gene \(X_i[t]\), the values of its parents in the samples are taken at time \(t+1\). Networks 1 to 5 are composed of 10 genes and networks 6 to 10 of 100 genes. We can observe that DBN performs better in most cases.

Fig. 2. Comparison between the standard Bayesian Network and the Dynamic Bayesian Network, using the BIC score.

In Fig. 3 we performed the same comparison, but using the BDe score. In some cases DBN presented better results, but in others the BN with time was better.

Fig. 3. Comparison between the standard Bayesian Network and the Dynamic Bayesian Network, using the BDe score.

We also compared the true positive rate between the BIC and BDe scores. In Fig. 4 we can observe that the BDe score performs slightly better in most experiments. In experiments 1 to 10 we used BN with time (1 to 5 are the networks of 10 genes and 6 to 10 the networks of 100 genes); in experiments 11 to 20 we used DBN (11 to 15 are networks of size 10 and 16 to 20 of size 100).

Fig. 4. Comparing the true positive rates obtained when using the BIC and BDe scores for the DREAM4 data.

3.2 Yeast Cell Cycle Data Results

We tested the methodology using a dataset of yeast cell cycle gene expression provided by [24]. This dataset was obtained from six experiments, named cln3 (2 time points), clb2 (2), alpha (18), cdc15 (24), cdc28 (17) and elutriation (14). In this experiment, we selected 11 genes considered to be key regulators of the cell cycle process according to [15]. The gold standard network was obtained from the KEGG database.

In Fig. 5 we show the similarity values when using the BIC score and the prior biological knowledge for \(\omega =1,2,4\). The completeData bars were generated using all six experiments together. Because the cln3 and clb2 experiments contain only 2 samples each, we did not use them individually. We can observe that the biological knowledge improves the quality of the output network, especially in the cdc28 and alpha experiments.

Fig. 5. Similarity of the yeast cell cycle network using the BIC score and considering biological knowledge with \(\omega = 1,2,4\).

In Fig. 6 we show the results when using the same data, but with the BDe score. We can observe that the similarity values are slightly worse when compared to the BIC score experiment. It is also possible to notice a slight improvement in quality when considering the biological knowledge.

Fig. 6. Similarity of the yeast cell cycle network using the BDe score and considering biological knowledge with \(\omega = 1,2,4\).

Fig. 7. Comparing the true positive rates obtained when using the BIC and BDe scores for the yeast cell cycle data.

Comparing the true positive rates between the BIC and BDe scores shows that BDe performs better in most of our experiments, as we can observe in Fig. 7. In experiments 1 to 4 we used the completeData with biological knowledge and \(\omega =0,1,2,4\), respectively. In experiments 5 to 8 the alpha dataset was used, following the same pattern of \(\omega \). Experiments 9 to 12 correspond to the cdc15 dataset, 13 to 16 to the cdc28 dataset, and 17 to 20 to the elutriation dataset.

4 Conclusion

In this work, we proposed a methodology for the reverse engineering of gene regulatory networks from time-series gene expression data using Dynamic Bayesian Networks. We also proposed the use of a source of prior biological knowledge, the STRING database, to improve the quality of the inferred networks. The methodology was tested on two datasets: an artificial dataset provided by the DREAM4 in silico challenge and a biological dataset from a yeast cell cycle study.

To learn the structure of the DBN, we search for the network that best matches the data, based on a score function. We used two score functions, BIC and BDe, and compared them. The results show that the BDe score performs better in most of the experiments.

We also compared the standard Bayesian Network model against the Dynamic Bayesian Network, showing that the DBN provides better results than its standard version. One important point to highlight is that our methodology considers a source of prior biological knowledge, when available, unlike most reverse engineering algorithms. We showed through our experiments that this approach can improve the quality of the inferred networks. As future work, we plan to explore the use of multiple sources of biological knowledge instead of only one.