1 Introduction

Advances in high-throughput technology have enabled researchers to obtain a great deal of gene expression data during the past ten years [1]. Meanwhile, a large number of models and methods have emerged for accurately reconstructing gene regulatory networks, such as Boolean networks, systems of differential equations, artificial neural networks, and Bayesian networks. Among these, Bayesian networks have become a mainstream model in gene regulatory network research because of their advantages, including the ability to handle incomplete data sets, to fully incorporate prior domain knowledge, and to present the network in an intuitive graphical form.

Generally, there are two kinds of algorithms for Bayesian network structure learning: search-and-score methods and correlation analysis methods. A search-and-score method introduces a scoring function \( S(G,D) \) and uses it to evaluate candidate network structures in order to find an optimal solution among all possible structures. A correlation analysis method captures the dependencies between nodes via independence tests to infer the network structure. In this study, we adopt the classic K2 algorithm [2] based on the Bayesian Dirichlet Equivalence (BDE) scoring function to learn the Bayesian network structure. The novelty of our method lies in introducing Part Mutual Information (PMI) to initialize the Bayesian network structure.

Methods based on Mutual Information (MI) first obtain a mutual information matrix by calculating the mutual information between all possible gene pairs, and then construct the regulatory network from this matrix [3]. Although this approach achieves good results, it cannot distinguish direct from indirect associations between genes, which leads to a high false-positive rate. Wang et al. [4] therefore proposed a method based on conditional mutual information (CMI), which can distinguish direct and indirect relationships between genes and greatly reduces the false-positive rate. However, CMI-based models also have limitations. Chen et al. [5] pointed out that CMI suffers from a false-negative problem: when measuring the dependency between variables X and Y given variable Z, CMI cannot correctly measure the direct association if X (or Y) is strongly associated with Z. To solve this problem, a method based on part mutual information [5] was proposed, which overcomes both the false-positive problem of the MI model and the false-negative problem of CMI.

In this paper, we take advantage of part mutual information to determine the relationships between variables and thereby initialize the Bayesian network structure. The K2 algorithm based on the BDE scoring function is then used to search for the optimal network structure. Two groups of experiments illustrate the effectiveness of the proposed method.

2 Method

2.1 Part Mutual Information

Let X and Y be two random variables. MI is defined on the basis of the Kullback–Leibler (KL) divergence D [6]:

$$ MI(X;Y) = D(p(x,y)\left\| {p(x)p(y)} \right.) = \sum\limits_{x,y} {p(x,y)\log \frac{p(x,y)}{p(x)p(y)}} $$
(1)

where \( p(x,y) \) is the joint probability distribution of X and Y, and \( p(x) \) and \( p(y) \) are the marginal distributions of X and Y, respectively. Clearly, MI in Eq. (1) is evaluated against the 'mutual independence' of X and Y, which is defined as

$$ p(x)p(y) = p(x,y) $$
(2)
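As a concrete illustration of Eq. (1), the following sketch (an illustrative Python implementation, not part of the original study) computes MI from a discrete joint distribution given as a 2-D array:

```python
import numpy as np

def mutual_information(pxy):
    """MI(X;Y) from a discrete joint distribution p(x, y), per Eq. (1).

    pxy: 2-D array whose entries sum to 1 (rows index x, columns index y).
    """
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x), column vector
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y), row vector
    mask = pxy > 0                        # treat 0 * log(0) as 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))
```

For independent variables (Eq. 2) the sum vanishes, while for perfectly coupled binary variables it equals log 2.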

CMI for variables X and Y given Z can further detect nonlinear direct associations and is defined as:

$$ \begin{aligned} CMI(X;Y\left| Z \right.) & = D(p(x,y,z)\left\| {p(x\left| z \right.)p(y\left| z \right.)} \right.p(z)) \\ & = \sum\limits_{x,y,z} {p(x,y,z)\log \frac{p(x,y\left| z \right.)}{p(x\left| z \right.)p(y\left| z \right.)}} \\ \end{aligned} $$
(3)

Accordingly, the conditional independence of X and Y given Z is defined as:

$$ p(x\left| z \right.)p(y\left| z \right.) = p(x,y\left| z \right.) $$
(4)

By analogy with Eqs. (2) and (4), the partial independence of X and Y given Z is defined as:

$$ p^{*} (x\left| z \right.)p^{*} (y\left| z \right.) = p(x,y\left| z \right.) $$
(5)

where \( p^{*} (x\left| z \right.) \) and \( p^{*} (y\left| z \right.) \) are defined [7] as

$$ p^{*} (x\left| z \right.) = \sum\limits_{y} {p(x\left| {z,y} \right.)} p(y),p^{*} (y\left| z \right.) = \sum\limits_{x} {p(y\left| {z,x} \right.)} p(x) $$
(6)

PMI is then defined as

$$ PMI(X;Y\left| Z \right.) = D(p(x,y,z)\left\| {p^{*} (x\left| z \right.)p^{*} (y\left| z \right.)} \right.p(z)) $$
(7)

or, equivalently, as

$$ PMI(X;Y\left| Z \right.) = \sum\limits_{x,y,z} {p(x,y,z)\log \frac{p(x,y\left| z \right.)}{{p^{*} (x\left| z \right.)p^{*} (y\left| z \right.)}}} $$
(8)
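To make Eqs. (6)–(8) concrete, here is an illustrative Python sketch (our own implementation, not code from the paper) that computes PMI from a discrete joint distribution p(x, y, z) stored as a 3-D array:

```python
import numpy as np

def pmi(pxyz):
    """Part mutual information PMI(X;Y|Z) from a discrete joint p(x, y, z).

    pxyz: 3-D array (axes x, y, z) whose entries sum to 1.
    Implements Eqs. (6) and (8): p*(x|z) = sum_y p(x|z,y) p(y), etc.
    """
    px = pxyz.sum(axis=(1, 2))                     # p(x)
    py = pxyz.sum(axis=(0, 2))                     # p(y)
    pz = pxyz.sum(axis=(0, 1))                     # p(z)
    pyz = pxyz.sum(axis=0)                         # p(y, z)
    pxz = pxyz.sum(axis=1)                         # p(x, z)

    with np.errstate(divide="ignore", invalid="ignore"):
        px_given_zy = np.where(pyz > 0, pxyz / pyz, 0.0)        # p(x|z,y)
        py_given_zx = np.where(pxz[:, None, :] > 0,
                               pxyz / pxz[:, None, :], 0.0)     # p(y|z,x)
    pstar_xz = np.einsum("xyz,y->xz", px_given_zy, py)          # p*(x|z), Eq. (6)
    pstar_yz = np.einsum("xyz,x->yz", py_given_zx, px)          # p*(y|z), Eq. (6)

    total = 0.0                                    # accumulate Eq. (8)
    for i in range(pxyz.shape[0]):
        for j in range(pxyz.shape[1]):
            for k in range(pxyz.shape[2]):
                p = pxyz[i, j, k]
                if p > 0:
                    ratio = (p / pz[k]) / (pstar_xz[i, k] * pstar_yz[j, k])
                    total += p * np.log(ratio)
    return total
```

Under full independence the PMI vanishes, and for X identical to Y but independent of Z it reduces to MI(X;Y).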

2.2 Bayesian Networks

A Bayesian network is a directed acyclic graph \( B = (G,\Theta ) \) that represents the probabilistic dependency relationships among variables. \( G = (V,E) \) is a directed acyclic graph and \( \Theta \) is the collection of conditional probability tables (CPTs). \( V = (V_{1} , \cdots ,V_{n} ) \) is the set of nodes, and each node represents a random variable \( X_{i} \) of the domain.

$$ E = \left\{ { < X_{i} ,X_{k} > \left| {X_{i} ,X_{k} \in V,i \ne k} \right.} \right\} $$
(9)

where \( < X_{i} ,X_{k} > \) denotes a directed edge. \( X_{i} \to X_{k} \) indicates a probabilistic dependency between the variables \( X_{i} \) and \( X_{k} \): \( X_{i} \) is a parent node of \( X_{k} \), and \( Pa(X_{k} ) \) denotes the parent set of \( X_{k} \).

For every variable \( X_{k} \), each value \( x_{k} \) has an associated parameter \( \theta_{{x_{k} \left| {Pa(X_{k} )} \right.}} = P(x_{k} \left| {Pa(X_{k} )} \right.) \), which gives the probability that \( x_{k} \) occurs given the configuration of \( Pa(X_{k} ) \). The joint probability distribution over the variables of a Bayesian network then factorizes as:

$$ P(X_{1} ,X_{2} , \cdots ,X_{n} ) = \mathop \prod \limits_{i = 1}^{n} P(X_{i} \left| {Pa(X_{i} )} \right.) $$
(10)
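As a minimal illustration of the factorization in Eq. (10), consider a hypothetical three-node chain \( X_{1} \to X_{2} \to X_{3} \); the CPT entries below are purely illustrative numbers, not values from the paper:

```python
# Joint probability via Eq. (10) for a hypothetical chain X1 -> X2 -> X3.
p_x1 = {0: 0.6, 1: 0.4}                       # P(X1)
p_x2 = {(0, 0): 0.7, (0, 1): 0.3,             # P(X2 | X1), keyed by (x1, x2)
        (1, 0): 0.2, (1, 1): 0.8}
p_x3 = {(0, 0): 0.9, (0, 1): 0.1,             # P(X3 | X2), keyed by (x2, x3)
        (1, 0): 0.4, (1, 1): 0.6}

def joint(x1, x2, x3):
    """P(X1, X2, X3) = P(X1) * P(X2 | X1) * P(X3 | X2)."""
    return p_x1[x1] * p_x2[(x1, x2)] * p_x3[(x2, x3)]
```

Summing the product over all eight assignments returns 1, confirming that the factorization defines a valid joint distribution.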

2.3 K2 Algorithm

Cooper [2] proposed the K2 algorithm based on the Bayesian Dirichlet Equivalence (BDE) scoring function to learn Bayesian network structure. The K2 algorithm uses hill-climbing search to learn the network structure.

According to prior knowledge, we first obtain an initial network. Search operators, including deleting edges, adding edges and reversing edges, are then applied to modify the current network and generate candidate networks. According to the BDE scoring function, the candidate network with the best score replaces the current network, and the search continues until the best network is obtained.

The BDE scoring function is an approximation of the marginal likelihood function under large-sample conditions. It is defined as:

$$ S(G,D) = LgP(D\left| G \right.) + LgP(G) - \frac{d}{2}\log m $$
(11)

where \( P(D\left| G \right.) \) is the marginal likelihood of the dataset \( D \) given the structure \( G \), obtained by averaging over the parameters; \( P(G) \) is the prior probability of the network structure \( G \); and \( \frac{d}{2}\log m \) is a penalty term depending on the number of parameters \( d \) and the sample size \( m \). Because of this penalty term, the scoring function avoids overfitting.
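Equation (11) approximates the marginal likelihood that the K2 search maximizes; for discrete data that likelihood has the closed-form Cooper–Herskovitz family score. The sketch below (an illustrative implementation under our own naming, not code from the paper) evaluates the log score of one node given a candidate parent set:

```python
import math

def k2_family_score(data, child, parents, arity):
    """Log Cooper-Herskovits score of one node given a parent set.

    data: list of dicts mapping variable name -> discrete value (0..r-1).
    arity: dict mapping variable name -> number of states r.
    Score per parent configuration j: (r-1)! / (N_j + r - 1)! * prod_k N_jk!
    """
    r = arity[child]
    counts = {}                                  # parent config -> per-state counts
    for row in data:
        cfg = tuple(row[p] for p in parents)
        counts.setdefault(cfg, [0] * r)[row[child]] += 1
    score = 0.0
    for cfg_counts in counts.values():
        n_j = sum(cfg_counts)
        score += math.lgamma(r) - math.lgamma(n_j + r)       # (r-1)!/(N_j+r-1)!
        score += sum(math.lgamma(n + 1) for n in cfg_counts) # prod_k N_jk!
    return score
```

In a K2-style search, this score is evaluated for each candidate parent set of a node, and the parent whose addition increases it most is greedily added.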

2.4 Evaluation Index

In our study, five criteria (True Positive Rate (TPR), False Positive Rate (FPR), Positive Predictive Value (PPV), Accuracy (ACC) and F-score) were used to evaluate the performance of the proposed method. Their definitions are given as follows:

$$ TPR = \frac{TP}{TP + FN} $$
(12)
$$ FPR = \frac{FP}{FP + TN} $$
(13)
$$ PPV = \frac{TP}{TP + FP} $$
(14)
$$ ACC = \frac{TP + TN}{TP + FP + TN + FN} $$
(15)
$$ F{\text{-}}score = \frac{2 \cdot PPV \cdot TPR}{PPV + TPR} $$
(16)

where True Positive (TP) means that an edge in the real network is identified as an edge by the model; False Positive (FP) means that an edge not in the real network is identified as an edge by the model; True Negative (TN) means that an edge not in the real network is not identified by the model; and False Negative (FN) means that an edge in the real network is not identified by the model.
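Given the four edge counts, the five criteria of Eqs. (12)–(16) can be computed directly; an illustrative sketch:

```python
def metrics(tp, fp, tn, fn):
    """TPR, FPR, PPV, ACC and F-score from edge-prediction counts (Eqs. 12-16)."""
    tpr = tp / (tp + fn)                          # Eq. (12)
    fpr = fp / (fp + tn)                          # Eq. (13)
    ppv = tp / (tp + fp)                          # Eq. (14)
    acc = (tp + tn) / (tp + fp + tn + fn)         # Eq. (15)
    f_score = 2 * ppv * tpr / (ppv + tpr)         # Eq. (16)
    return tpr, fpr, ppv, acc, f_score
```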

3 Experimental Results and Analysis

3.1 Experiment 1: Simulated Gene Regulatory Network

Figure 1 shows a simulated gene regulatory network modeled by an S-system model [8]. The model consists of \( n \) non-linear ordinary differential equations, and the generic form of equation \( i \) is given as follows:

Fig. 1. A simulated gene regulatory network containing 15 genes

$$ X_{i}^{'} (t) = \alpha_{i} \mathop\Pi \limits_{j = 1}^{N} X_{j}^{{g_{ij} }} (t) - \beta_{i} \mathop\Pi \limits_{j = 1}^{N} X_{j}^{{h_{ij} }} (t) + \varepsilon $$

where \( X_{i} \) is the expression level of gene \( i \), \( N \) is the number of variables, \( \alpha_{i} \) and \( \beta_{i} \) are non-negative rate constants, \( g_{ij} \) and \( h_{ij} \) are kinetic orders, and independent random Gaussian noise (\( \varepsilon \)) is added to each equation.
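The S-system dynamics can be integrated numerically to generate time-course data; the following is an illustrative Euler-integration sketch (the function name, step size and positivity floor are our own choices, and the actual parameters would come from Table 1):

```python
import numpy as np

def simulate_s_system(alpha, beta, g, h, x0, dt=0.01, steps=1000, noise_sd=0.0):
    """Euler integration of dX_i/dt = a_i prod_j X_j^g_ij - b_i prod_j X_j^h_ij + eps."""
    rng = np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        prod_g = np.prod(x ** g, axis=1)          # production terms
        prod_h = np.prod(x ** h, axis=1)          # degradation terms
        dx = alpha * prod_g - beta * prod_h
        if noise_sd > 0:
            dx = dx + rng.normal(0.0, noise_sd, size=x.shape)
        x = np.maximum(x + dt * dx, 1e-9)         # keep expression levels positive
        traj.append(x.copy())
    return np.array(traj)
```

Sampling the returned trajectory at 20, 30 or 50 evenly spaced points would yield datasets of the kind described below.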

Table 1 lists the parameters of the S-system containing 15 genes. The initial values are random, and we generated simulated datasets with 20, 30 and 50 time points, respectively.

Table 1. Parameters of the S-system

Table 2 reports the TPR and FPR obtained by the proposed algorithm on the simulated data.

Table 2. Performances of the proposed model to the simulated data

3.2 Experiment 2: SOS DNA Repair Network of E. coli Bacterium

The datasets from the SOS DNA repair network of the E. coli bacterium [9] contain four experiments under various light intensities (5 \( Jm^{ - 2} \), 5 \( Jm^{ - 2} \), 20 \( Jm^{ - 2} \), 20 \( Jm^{ - 2} \)). Each experiment (http://www.weizmann.ac.il/mcb/UriAlon/Papers/SOSData) consists of 50 time points evenly spaced by 6 min and involves eight genes: uvrD, lexA, umuD, recA, uvrA, uvrY, ruvA and polB. Figure 2 displays the known real regulations among the 8 genes.

Fig. 2. The true SOS network with eight genes and 9 interactions

In the experiment, we first normalized the gene expression data for each gene to the interval [0, 1] using

$$ x_{i}^{'} (t) = \frac{{x_{i} (t) - \min_{i} }}{{\max_{i} - \min_{i} }} $$

where \( x_{i} (t) \) is the measured expression level of gene \( i \) at time point \( t \), and \( \min_{i} \) and \( \max_{i} \) are the minimum and maximum expression levels of gene \( i \), respectively. Table 3 shows a detailed comparison between the proposed method and other methods, including the S-system [10], ODEs [11], RNN [12] and DBN [13]. The results show that our proposed method performs better than the other methods.
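The normalization above is a per-gene min–max scaling; an illustrative sketch (assuming rows are genes and columns are time points):

```python
import numpy as np

def minmax_normalize(expr):
    """Scale each gene's expression profile to [0, 1] (rows = genes)."""
    mins = expr.min(axis=1, keepdims=True)
    maxs = expr.max(axis=1, keepdims=True)
    return (expr - mins) / (maxs - mins)
```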

Table 3. Comparison of different methods for SOS dataset

4 Conclusions

In this paper, Part Mutual Information is introduced into the structure-learning algorithm for Bayesian networks. This means that the initial network already identifies which genes are related to each other, so the subsequent search with the K2 algorithm only needs to determine the directions of the edges in the Bayesian network. We carried out experiments on two networks: a simulated gene regulatory network and the SOS DNA repair network of the E. coli bacterium. The comparison with other methods on the SOS DNA repair network demonstrates the effectiveness of the proposed method.