1 Introduction

In recent years, a large number of methods have been proposed for inferring gene regulatory networks (GRNs) from gene expression data. Correlations and other statistical measures that group genes by profile similarity identify functionally related groups of genes [1, 2]. Much effort has been devoted to inferring GRNs using various modeling approaches, ranging from simple Boolean networks to dynamical models of cellular processes [3, 4]. However, correlation measures do not provide direct insight into the gene interactions that give rise to the observed expression patterns [5]. For this reason, pairwise maximum-entropy probability models have been introduced to infer GRNs [6]. The logic of such methods is to determine the probability distribution governing the microarray data in which the entropy-reducing constraint imposed by the pairwise correlations is faithfully encoded [6]. Consequently, the real-valued maximum-entropy distribution given the first and second moments is a Boltzmann-like distribution, which is determined by the mean and the covariance matrix C or, equivalently, its inverse M (also known as the precision or concentration matrix) [7].

Statistical inference methods using partial correlations in the context of Gaussian graphical models (GGMs) have led to similar results. Under the assumptions that the precision matrix is sparse and that the data samples are drawn independently from the same distribution, the most commonly used methods to infer maximum-entropy networks are spectral decomposition [6] and the graphical lasso [8].

Although the above methods have been successfully used to estimate the regulatory relationships among genes, their performance is limited in some respects. The spectral decomposition method infers networks poorly when very few samples are available. Although the graphical lasso performs better than spectral decomposition on small samples, its GRNs cannot reflect the type of gene regulatory relation (expression or repression), and it is difficult to choose the right lasso penalty to match the sparseness of the network [9].

To address the above problems, in this paper we propose a new way to infer maximum-entropy GRNs, formulated as a constrained optimization problem and solved with a differential evolution (DE) algorithm. DE has achieved wide success on various complex constrained optimization problems [10]. Based on the pairwise maximum-entropy probability model, we set the maximum entropy as the objective function and the first- and second-moment constraints as the penalty functions. To demonstrate its performance, the proposed method is compared with other state-of-the-art methods on two synthetic datasets and four real-world datasets. The GRNs obtained by the DE are fully connected networks that reflect gene regulatory relations and can be used to identify GRNs involved in diverse cellular processes. The experimental results demonstrate that our method outperforms the other two state-of-the-art methods for inferring maximum-entropy GRNs on synthetic datasets. In addition, the real data results suggest that the proposed approach is robust for inferring small-scale GRNs.

The rest of this paper is organized as follows. In Sect. 2, we introduce the background of maximum-entropy GRNs. In Sect. 3, we present our proposed framework in detail. In Sect. 4, we describe the performance of our method on synthetic and real-world datasets. Finally, Sect. 5 concludes this work.

2 Maximum-Entropy GRNs

Pairwise associations between genes can be determined by gene expression and are commonly estimated by the sample Pearson correlation coefficient computed for each pair of genes. However, the Pearson correlation is a misleading measure for direct dependence as it only reflects the association between two genes while ignoring the influence of the remaining ones. Therefore, the relevance network approach is not suitable to deduce direct interactions from a dataset [11].

To address these problems, maximum-entropy GRNs have been proposed. The approach relies on Boltzmann’s concept of entropy maximization to support statistical inference with minimal reliance on the form of missing information, which removes the variation caused by the influence of the remaining variables.

Let the state vector x = (x1, …, xN) denote the expression levels of the N genes that are probed in a microarray experiment; a series of T measurements is then associated with T distinct state vectors. Let ρ(x) denote the probability that the genome is in the arbitrary state x. We determine ρ(x) by maximizing the Shannon entropy

$$S=-{\sum }_{\overrightarrow{x}}\rho (\overrightarrow{x})\ln \rho (\overrightarrow{x})$$
(1)

subject to the normalization of ρ(x),

$${\sum }_{\overrightarrow{x}}\rho (\overrightarrow{x})=1$$
(2)

and to the constraints on the first moment, <xi>, and the second moment, <xi, xj>:

$${<}{x}_{i}{>}={\sum }_{\overrightarrow{x}}\rho (\overrightarrow{x}){x}_{i}=\frac{1}{T}{\sum }_{k=1}^{T}{x}_{i}^{k}$$
(3)
$${<}{x}_{i}, {x}_{j}{>}={\sum }_{\overrightarrow{x}}\rho (\overrightarrow{x}){x}_{i}{x}_{j}=\frac{1}{T}{\sum }_{k=1}^{T}{x}_{i}^{k}{x}_{j}^{k}$$
(4)

Equation (2) provides the normalization condition that the probabilities of all observable states sum to 1. Equations (3) and (4) ensure that the distribution ρ(x) preserves the mean expression level of each gene and the correlations between genes. This procedure leads to a Boltzmann-like distribution:

$$\rho (x)\sim {e}^{-H}$$

where

$$H=\frac{1}{2}{\sum }_{ij}{x}_{i}{M}_{ij}{x}_{j}$$
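For completeness, the Boltzmann-like form can be recovered with Lagrange multipliers. The derivation below is a sketch: the multiplier names λ0 and λi, and the choice to write the second-moment multipliers as Mij∕2, are notation introduced here for illustration. The Lagrangian built from the entropy and the normalization, first-moment, and second-moment constraints is

$$\mathcal{L}=-{\sum }_{\overrightarrow{x}}\rho (\overrightarrow{x})\ln \rho (\overrightarrow{x})-{\lambda }_{0}\left({\sum }_{\overrightarrow{x}}\rho (\overrightarrow{x})-1\right)-{\sum }_{i}{\lambda }_{i}\left({\sum }_{\overrightarrow{x}}\rho (\overrightarrow{x}){x}_{i}-{<}{x}_{i}{>}\right)-\frac{1}{2}{\sum }_{ij}{M}_{ij}\left({\sum }_{\overrightarrow{x}}\rho (\overrightarrow{x}){x}_{i}{x}_{j}-{<}{x}_{i}, {x}_{j}{>}\right)$$

Setting the derivative with respect to ρ(x) to zero gives

$$-\ln \rho (\overrightarrow{x})-1-{\lambda }_{0}-{\sum }_{i}{\lambda }_{i}{x}_{i}-\frac{1}{2}{\sum }_{ij}{x}_{i}{M}_{ij}{x}_{j}=0\quad \Rightarrow \quad \rho (\overrightarrow{x})\propto \exp \left(-{\sum }_{i}{\lambda }_{i}{x}_{i}-\frac{1}{2}{\sum }_{ij}{x}_{i}{M}_{ij}{x}_{j}\right)$$

For real-valued expression levels, completing the square shows that the linear term only shifts the mean, so in mean-centered variables the exponent reduces to the quadratic form H, and the covariance of the resulting Gaussian is the inverse of M.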

Note that, because <xi, xj>  =  <xj, xi>, the number of constraints is 1 + N + N(N + 1)∕2. For the same reason, Mij = Mji, so the number of elements Mij to be estimated is N(N + 1)∕2.

The elements of the matrix M are the effective pairwise gene interactions that reproduce the gene profile covariances exactly while maximizing the entropy of the system. The magnitude and sign of an element Mij give the intensity and type of the interaction: a positive value denotes expression (facilitation), a negative value denotes repression, and a zero value implies that there is no interaction between i and j [6, 11].

The matrix M can be obtained by inverting the covariance matrix C. However, in the high-dimensional setting where the number of features p is larger than the number of observations n, the empirical covariance matrix C is singular and so cannot be inverted to yield an estimate of M. If p ≈ n, then even if C is not singular, the estimate for C will suffer from very high variance [5].
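The singularity of C for p > n can be checked numerically. The following sketch uses synthetic Gaussian data; the sizes n = 5 and p = 8 and the random seed are arbitrary choices made here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, for reproducibility of the illustration

n, p = 5, 8                      # fewer observations (n) than genes (p)
X = rng.normal(size=(n, p))      # rows are samples, columns are genes
C = np.cov(X, rowvar=False)      # p x p empirical covariance matrix

# After mean-centering, n samples span at most n - 1 directions,
# so C cannot have full rank p and has no ordinary inverse.
rank = np.linalg.matrix_rank(C)
print("shape:", C.shape, "rank:", rank)
```

Because the rank is bounded by n − 1, direct inversion is not available in this regime, which is the situation the methods discussed next are designed for.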

Spectral decomposition and the graphical lasso have been proposed to get around this problem by estimating the inverse covariance matrix under the assumption that the network is sparse, so that the problem can be approximated via convex optimization. However, this assumption might not hold in reality. Spectral decomposition does not perform well on the small-sample problem. Although the graphical lasso has better inference performance, its GRNs cannot reflect the gene regulatory relations (expression and repression). It is also difficult to choose the right lasso penalty to control the sparseness of the inferred networks [9].

3 Method

To address the above problems, in this paper we propose a new way to infer maximum-entropy GRNs, treated as a constrained optimization problem, by using an adaptive differential evolution (DE) algorithm. As a nature-inspired method, DE has become a feasible and popular choice among researchers because of its competitive performance on complex search spaces when addressing constrained optimization problems [10].

However, according to both experimental studies and theoretical analyses, the performance of the classic DE depends strongly on its control parameters and mutation strategies [12]. Adaptive and self-adaptive DE algorithms have shown faster and more reliable convergence than the classic DE algorithms on many benchmark problems.

For this reason, the main objectives of this work are three-fold. First, we set the maximum entropy as the objective function and the first- and second-moment constraints as the penalty functions. Second, the Probability Matching (PM) method is integrated into DE to implement adaptive strategy selection. Third, JADE [13] is used to set the mutation factor F and the crossover probability CR in an adaptive manner. The details of the algorithm are given below.

3.1 Problem Formulation

The general form of a constrained optimization problem can be expressed as follows:

$$ \begin{aligned} & \min \,f(x) \\ {\text{s.t.}}\; & g_{i}(x) \le 0, \, i = 1,2, \cdots ,j \\ & h_{i}(x) = 0, \, i = j + 1,j + 2, \cdots ,m \\ \end{aligned} $$
(5)

where x = (x1, x2, …, xn) are the decision variables of the objective function f(x). Each gi is an inequality constraint, whose role is to delimit the search area within the feasible domain. Each hi is an equality constraint that imposes a boundary condition on the feasible domain, and so controls the boundary of the search area. Normally, we define the objective function as min f(x) and the penalty function as

$$ G_{i}(x)=\left\{\begin{array}{ll} \max \left\{0, g_{i}(x)\right\}, & 1 \leq i \leq j \\ \max \left\{0,\left|h_{i}(x)\right|-\delta\right\}, & j+1 \leq i \leq m \end{array}\right.$$
$$ G(x) = \sum\limits_{i = 1}^{m} G_{i}(x) $$
(6)

where δ is a small positive tolerance value. The final fitness function is then

$$\text{Fit} = f(x)+\sigma G(x)$$
(7)

where σ is the penalty coefficient.
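The penalty construction can be illustrated on a toy problem. The sketch below is not the objective used in this paper: the function names, the toy constraints, and the values σ = 1000 and δ = 10⁻⁴ are assumptions chosen for illustration. It also assumes the usual tolerance convention in which an equality constraint counts as satisfied when |hi(x)| ≤ δ.

```python
def penalty(x, ineq=(), eq=(), delta=1e-4):
    """Total constraint violation G(x).

    ineq: callables g with the convention g(x) <= 0 when satisfied.
    eq:   callables h with the convention h(x) == 0 when satisfied,
          relaxed to |h(x)| <= delta (assumed tolerance convention).
    """
    g_terms = [max(0.0, g(x)) for g in ineq]
    h_terms = [max(0.0, abs(h(x)) - delta) for h in eq]
    return sum(g_terms) + sum(h_terms)


def fitness(x, f, ineq=(), eq=(), sigma=1e3, delta=1e-4):
    """Penalized fitness f(x) + sigma * G(x), to be minimized."""
    return f(x) + sigma * penalty(x, ineq, eq, delta)


# Toy problem (hypothetical): minimize x0^2 + x1^2
# subject to x0 + x1 = 1 and x0 >= 0.2.
f = lambda x: x[0] ** 2 + x[1] ** 2
eq = [lambda x: x[0] + x[1] - 1.0]
ineq = [lambda x: 0.2 - x[0]]

feasible = fitness([0.5, 0.5], f, ineq, eq)    # no violation, equals f(x)
infeasible = fitness([0.0, 0.0], f, ineq, eq)  # both constraints violated
print(feasible, infeasible)
```

A feasible point keeps its raw objective value, while an infeasible point is pushed far up the fitness scale, so the search is steered toward the feasible region without discarding infeasible candidates outright.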

In this paper, Eq. (1) is set as the objective function, and the normalization and first- and second-moment constraints, Eqs. (2)–(4), are set as the penalty functions. The optimization problem (1)–(4) is thereby converted into the optimization problem (7), which is minimized by DE. The initial population {xi = (x1,i,0, x2,i,0, …, xD,i,0)|i = 1, 2, …, NP} is randomly generated according to a uniform distribution on [−1, 1], where D = N(N + 1)∕2 (the number of Mij) is the dimension of the problem and NP is the population size.
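One way to realize this encoding is sketched below. Only the uniform initialization on [−1, 1] and the dimension D = N(N + 1)∕2 come from the text; the function names and the choice to store the upper triangle of M row by row are assumptions made here for illustration.

```python
import numpy as np


def init_population(n_genes, pop_size, rng):
    """Draw NP individuals uniformly from [-1, 1]^D with D = N(N+1)/2."""
    dim = n_genes * (n_genes + 1) // 2
    return rng.uniform(-1.0, 1.0, size=(pop_size, dim))


def vector_to_symmetric(vec, n_genes):
    """Unpack a length-D vector into a symmetric N x N matrix M.

    The vector is assumed to hold the upper triangle of M
    (diagonal included) in row-major order.
    """
    M = np.zeros((n_genes, n_genes))
    M[np.triu_indices(n_genes)] = vec
    # Mirror the upper triangle; subtract the diagonal so it is not doubled.
    return M + M.T - np.diag(np.diag(M))


rng = np.random.default_rng(1)
pop = init_population(n_genes=5, pop_size=20, rng=rng)
M0 = vector_to_symmetric(pop[0], n_genes=5)
print(pop.shape, np.allclose(M0, M0.T))
```

Encoding only the upper triangle enforces Mij = Mji by construction, so the symmetry of M never has to be handled as an additional constraint.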

3.2 Strategy Selection: Probability Matching

Suppose there are K > 1 strategies in the pool A = {a1, · · ·, aK} and a probability vector \(P(t)=\left\{{p}_{1}(t),\cdots ,{p}_{K}(t)\right\}\) with \(\forall t:{p}_{min}\le {p}_{i}(t)\le 1;{\sum }_{i=1}^{K}{p}_{i}\left(t\right)=1.\) In this work, the PM technique is used to adaptively update the probability pa(t) of each strategy a based on its known performance and the rewards it receives. Denote by ra(t) the reward that strategy a receives after its application at time t, and by qa(t) the empirical quality estimate of strategy a, which is updated as follows [14]:

$${q}_{a}(t+1)={q}_{a}(t)+\alpha \left[{r}_{a}(t)-{q}_{a}(t)\right]$$
(8)

where α ∈ (0, 1] is the adaptation rate. Based on this quality estimate, the PM method updates the probability pa(t) of applying each operator as follows:

$$p_{a}(t + 1) = p_{min} + (1 - K \cdot p_{min})\frac{q_{a}(t + 1)}{\sum_{i = 1}^{K} q_{i}(t + 1)}$$
(9)

where pmin ∈ (0, 1) is the minimal probability value of each strategy, used to ensure that no operator gets lost [13].
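The two PM updates can be sketched in a few lines. The quality update uses the standard relaxation rule q ← q + α(r − q); the values α = 0.5 and pmin = 0.05, the function name, and the uniform fallback used when all quality estimates are zero are assumptions made here for illustration.

```python
import numpy as np


def pm_update(q, rewards, alpha=0.5, p_min=0.05):
    """One Probability Matching step for K strategies.

    q:       current quality estimates, shape (K,)
    rewards: rewards received in this generation, shape (K,)
    Returns the updated quality estimates and selection probabilities.
    """
    q = np.asarray(q, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    q_new = q + alpha * (rewards - q)          # quality update
    K = q_new.size
    total = q_new.sum()
    if total > 0:
        p_new = p_min + (1.0 - K * p_min) * q_new / total
    else:
        p_new = np.full(K, 1.0 / K)            # assumed fallback: uniform
    return q_new, p_new


q_new, p_new = pm_update(q=[1.0, 1.0, 1.0, 1.0], rewards=[2.0, 0.0, 0.0, 0.0])
print(q_new, p_new, p_new.sum())
```

The rewarded strategy gains probability mass, but every strategy keeps at least pmin, so a strategy that is temporarily unproductive can still be selected later.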

In order to assign the credit for each strategy, we adopt the relative fitness improvement ηi proposed in [12] as follows:

$${\eta }_{i}=\frac{\delta }{c{f}_{i}}\cdot \left|p{f}_{i}-c{f}_{i}\right|$$
(10)

where i = 1, · · ·, NP. δ is the fitness of the best-so-far solution in the population. pfi and cfi are the fitness of the target parent and its offspring, respectively. If no improvement is achieved, a null credit is assigned.

Denote by Sa the set of all relative fitness improvements achieved by the application of strategy a (a = 1, · · ·, K) during generation t. At the end of the generation, a single reward per strategy is used to update the quality measure kept by the PM method (Eq. 8). The credit assignment is as follows [14]:

$$r_{a}(t) = \frac{\sum_{i = 1}^{\left| S_{a} \right|} S_{a}(i)}{\left| S_{a} \right|}$$
(11)

where |Sa| is the number of elements in Sa. If | Sa | = 0, ra(t) = 0.
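The credit assignment in Eqs. (10) and (11) can be sketched as follows. The sketch assumes minimization with strictly positive fitness values, and it assumes that null credits are counted as members of Sa; the function names are hypothetical.

```python
def relative_improvement(best_fit, parent_fit, child_fit):
    """Relative fitness improvement (Eq. 10) for one offspring.

    Assumes minimization and child_fit > 0. Returns a null credit
    when the offspring does not improve on its parent.
    """
    if child_fit >= parent_fit:
        return 0.0
    return best_fit / child_fit * abs(parent_fit - child_fit)


def strategy_rewards(improvements, strategy_ids, n_strategies):
    """Average improvement per strategy (Eq. 11); 0 if a strategy was unused."""
    sums = [0.0] * n_strategies
    counts = [0] * n_strategies
    for eta, a in zip(improvements, strategy_ids):
        sums[a] += eta
        counts[a] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


eta = relative_improvement(best_fit=1.0, parent_fit=4.0, child_fit=2.0)
rewards = strategy_rewards([1.0, 0.0, 3.0], [0, 0, 2], n_strategies=4)
print(eta, rewards)
```

Scaling the raw improvement by the ratio of the best fitness to the offspring fitness gives more credit to improvements made close to the current best solution.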

In DE, many schemes have been proposed, applying different mutation strategies and/or recombination operations in the reproduction stage [15]. In order to constitute the strategy pool used in this work, we have chosen four strategies: ‘DE/rand/1’, ‘DE/rand/2’, ‘DE/rand-to-best/1’ and ‘DE/current-to-rand/1’.
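The four strategies are not written out here, and several variants exist in the DE literature under the same names. The sketch below uses commonly cited forms and should be read as an assumption rather than as the exact operators of this work, in particular for ‘DE/rand-to-best/1’. The function names, the index array r, and the coefficient k_coef are hypothetical.

```python
import numpy as np


def rand_1(pop, r, F):
    """DE/rand/1: v = x_r1 + F (x_r2 - x_r3)."""
    return pop[r[0]] + F * (pop[r[1]] - pop[r[2]])


def rand_2(pop, r, F):
    """DE/rand/2: v = x_r1 + F (x_r2 - x_r3) + F (x_r4 - x_r5)."""
    return pop[r[0]] + F * (pop[r[1]] - pop[r[2]]) + F * (pop[r[3]] - pop[r[4]])


def rand_to_best_1(pop, r, F, best):
    """DE/rand-to-best/1 (one common variant, assumed here)."""
    return pop[r[0]] + F * (pop[best] - pop[r[0]]) + F * (pop[r[1]] - pop[r[2]])


def current_to_rand_1(pop, r, F, i, k_coef):
    """DE/current-to-rand/1 with a combination coefficient k_coef in [0, 1]."""
    return pop[i] + k_coef * (pop[r[0]] - pop[i]) + F * (pop[r[1]] - pop[r[2]])


def pick_indices(rng, pop_size, i, k):
    """Draw k mutually distinct indices, all different from the target i."""
    candidates = [j for j in range(pop_size) if j != i]
    return rng.choice(candidates, size=k, replace=False)


pop = np.arange(12.0).reshape(6, 2)    # toy population: 6 individuals, D = 2
v1 = rand_1(pop, [0, 1, 2], F=0.5)
v2 = rand_2(pop, [0, 1, 2, 3, 4], F=0.5)
idx = pick_indices(np.random.default_rng(2), pop_size=6, i=3, k=5)
print(v1, v2, idx)
```

At each generation, the PM probabilities decide which of these operators generates the mutant for a given target individual.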

3.3 Parameter Adaptation

The parameter adaptation is similar to JADE. At each generation g, the crossover probability CRi of each individual xi is independently generated according to a normal distribution of mean μCR and standard deviation 0.1 as

$$C{R}_{i}={\text{randn}}_{i}({\mu }_{CR},0.1)$$
(12)

and then truncated to [0, 1] [13]. Denote SCR as the set of all successful crossover probabilities CRi’s at generation g. The mean μCR is initialized to be 0.5 and then updated at the end of each generation as

$${\mu }_{CR}=(1-c)\cdot {\mu }_{CR}+c\cdot {\text{mean}}_{A}\left({S}_{CR}\right)$$
(13)

where c is a positive constant between 0 and 1 and meanA(·) is the usual arithmetic mean.
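A minimal sketch of the CR adaptation follows. The value c = 0.1 and the decision to leave μCR unchanged when no crossover probability was successful in a generation are assumptions made here; the function names are hypothetical.

```python
import numpy as np


def sample_cr(mu_cr, size, rng):
    """Draw CR_i ~ N(mu_cr, 0.1) and truncate to [0, 1]."""
    return np.clip(rng.normal(mu_cr, 0.1, size=size), 0.0, 1.0)


def update_mu_cr(mu_cr, successful_cr, c=0.1):
    """Move mu_cr toward the arithmetic mean of the successful CR values."""
    if len(successful_cr) == 0:
        return mu_cr                      # assumed: no update without successes
    return (1.0 - c) * mu_cr + c * float(np.mean(successful_cr))


rng = np.random.default_rng(3)
cr = sample_cr(0.5, size=50, rng=rng)
mu_cr = update_mu_cr(0.5, [0.9, 0.7])
print(cr.min(), cr.max(), mu_cr)
```

With c = 0.1, a generation whose successful crossover probabilities average 0.8 moves μCR from 0.5 to 0.53, so the mean drifts gradually rather than jumping to the latest successes.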

Similarly, at each generation g, the mutation factor Fi of each individual xi is independently generated according to a Cauchy distribution with location parameter μF and scale parameter 0.1 as:

$${F}_{i}={\text{randc}}_{i}({\mu }_{F},0.1)$$
(14)

and then truncated to 1 if Fi ≥ 1 or regenerated if Fi ≤ 0 [13]. Denote by SF the set of all successful mutation factors in generation g. The location parameter μF of the Cauchy distribution is initialized to 0.5 and then updated at the end of each generation as follows:

$${\mu }_{F}=(1-c)\cdot {\mu }_{F}+c\cdot {\text{mean}}_{L}({S}_{F})$$
(15)

where meanL(·) is the Lehmer mean [13].
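The F adaptation can be sketched in the same way. The Lehmer mean used below is the sum of squares divided by the sum, as in JADE; c = 0.1, the function names, and the decision to keep μF unchanged when SF is empty are assumptions made here for illustration.

```python
import numpy as np


def sample_f(mu_f, rng, scale=0.1):
    """Draw F_i from a Cauchy(mu_f, scale) distribution.

    Values >= 1 are truncated to 1; values <= 0 are regenerated.
    """
    while True:
        f = mu_f + scale * rng.standard_cauchy()
        if f > 0.0:
            return min(f, 1.0)


def lehmer_mean(values):
    """Lehmer mean: sum of squares over sum, which favors larger values."""
    v = np.asarray(values, dtype=float)
    return float(np.sum(v ** 2) / np.sum(v))


def update_mu_f(mu_f, successful_f, c=0.1):
    """Move mu_f toward the Lehmer mean of the successful F values."""
    if len(successful_f) == 0:
        return mu_f                       # assumed: no update without successes
    return (1.0 - c) * mu_f + c * lehmer_mean(successful_f)


rng = np.random.default_rng(4)
fs = [sample_f(0.5, rng) for _ in range(100)]
mu_f = update_mu_f(0.5, [0.2, 0.8])
print(min(fs), max(fs), mu_f)
```

The heavy tails of the Cauchy distribution occasionally produce large mutation factors, and the Lehmer mean weights large successful values more strongly than the arithmetic mean does, which counteracts a drift of μF toward small values.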

3.4 Optimization Work Flow

The probability matching method and JADE are used, respectively, to select the mutation strategy and to set the parameters adaptively. Algorithm 1 illustrates the use of our adaptive DE for inferring maximum-entropy GRNs.

[Algorithm 1: adaptive DE for inferring maximum-entropy GRNs]

4 Experimental Results

Let \( \theta _{{ij}} \) and \( \hat{\theta }_{{ij}} \) denote the elements in the true GRNs and the inferred GRNs, respectively. Each element is set to 0 or 1 by comparing its absolute value with a threshold that defines whether an interaction is included in a GRN. An edge can then be classified into one of four types: true positive (TP), false positive (FP), true negative (TN), and false negative (FN), defined as follows:

  • TP: if \( \theta _{{ij}} \) = 1 and \( \hat{\theta }_{{ij}} \) = 1; TN: if \( \theta _{{ij}} \) = 0 and \( \hat{\theta }_{{ij}} \) = 0.

  • FP: if \( \theta _{{ij}} \) = 0 and \( \hat{\theta }_{{ij}} \) = 1; FN: if \( \theta _{{ij}} \) = 1 and \( \hat{\theta }_{{ij}} \) = 0.

The proposed methodology is evaluated with the following metrics. (1) True Positive Rate (TPR)/Recall: the fraction of the existing edges in the original network that are correctly predicted in the inferred GRNs. (2) False Positive Rate (FPR)/Complementary Specificity: the fraction of the nonexistent edges that are incorrectly predicted in the inferred GRNs. (3) Positive Predictive Value (PPV)/Precision: the fraction of the inferred edges that are correct. (4) F-Score: the harmonic mean of the precision and recall.
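The four metrics can be computed from binary adjacency matrices as sketched below. The sketch assumes undirected networks, so each gene pair is counted once and the diagonal is ignored; the function name and the convention of returning 0 when a denominator is zero are assumptions made here.

```python
import numpy as np


def edge_metrics(true_adj, pred_adj):
    """TPR, FPR, PPV and F-score for symmetric binary adjacency matrices."""
    iu = np.triu_indices_from(true_adj, k=1)   # each pair once, no diagonal
    t = np.asarray(true_adj)[iu].astype(bool)
    p = np.asarray(pred_adj)[iu].astype(bool)
    tp = int(np.sum(t & p))
    fp = int(np.sum(~t & p))
    tn = int(np.sum(~t & ~p))
    fn = int(np.sum(t & ~p))

    def ratio(num, den):
        return num / den if den else 0.0       # assumed convention for empty sets

    tpr = ratio(tp, tp + fn)
    fpr = ratio(fp, fp + tn)
    ppv = ratio(tp, tp + fp)
    f_score = ratio(2 * ppv * tpr, ppv + tpr)
    return {"TPR": tpr, "FPR": fpr, "PPV": ppv, "F": f_score}


true_adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
pred_adj = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]])
scores = edge_metrics(true_adj, pred_adj)
print(scores)
```

In this toy case, two of the three true edges are recovered and one of the three non-edges is wrongly predicted, giving TPR = PPV = F = 2∕3 and FPR = 1∕3.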

4.1 Simulation Studies

We first build an Erdős–Rényi (ER) random network represented by its binary adjacency matrix M, in which each non-zero element is replaced by a value drawn uniformly from [−0.6, −0.3] ∪ [0.3, 0.6]. To ensure the positive definiteness of the covariance matrix, the true precision matrix \(\Theta\) is set as

$$\Theta = M + \sigma I$$

where σ is the absolute value of the eigenvalues of M, and I is the identity matrix. After this procedure, the synthetic gene expression data can be generated with zero means and covariance C = \(\Theta^{-1}\). To compare our method with the other two state-of-the-art methods that can deal with the small-sample problem (p ≈ n), we generate two small-scale groups of samples (group 1 with n = 5 samples and p = 5 genes; group 2 with n = 10 samples and p = 10 genes) to simulate small-scale GRNs. Ten random datasets are generated for each of the two groups. The empirical covariance matrix C in each dataset is singular and so cannot be inverted to yield an estimate of M.
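The data-generating procedure can be sketched as follows. Several details are assumptions made here because they are not specified above: the ER edge probability (0.3), the choice of σ as the magnitude of the smallest eigenvalue of M plus a small margin eps, and the function names.

```python
import numpy as np


def make_precision(n_genes, edge_prob, rng, eps=0.1):
    """Build a positive definite precision matrix Theta = M + sigma * I."""
    M = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            if rng.random() < edge_prob:                 # ER edge
                w = rng.uniform(0.3, 0.6) * rng.choice([-1.0, 1.0])
                M[i, j] = M[j, i] = w
    lam_min = np.linalg.eigvalsh(M).min()
    sigma = abs(lam_min) + eps   # assumed choice: smallest eigenvalue of Theta is eps
    return M + sigma * np.eye(n_genes)


def sample_expression(theta, n_samples, rng):
    """Draw zero-mean Gaussian samples with covariance Theta^{-1}."""
    cov = np.linalg.inv(theta)
    cov = (cov + cov.T) / 2.0                            # remove round-off asymmetry
    return rng.multivariate_normal(np.zeros(theta.shape[0]), cov, size=n_samples)


rng = np.random.default_rng(5)
theta = make_precision(n_genes=5, edge_prob=0.3, rng=rng)
X = sample_expression(theta, n_samples=5, rng=rng)
print(theta.round(2))
print(X.shape)
```

Since M has a zero diagonal, its smallest eigenvalue is never positive, so shifting the spectrum by |λmin| + eps makes every eigenvalue of Θ at least eps while leaving the off-diagonal couplings unchanged.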

The gene interaction network comprising the genes showing the strongest couplings is highly interconnected. For this reason, we choose the top 20% strongest pairwise interactions to identify the most consistent predicted edges for the construction of the final GRNs.
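The edge selection can be sketched as follows. Ranking the gene pairs by |Mij|, rounding 20% of the number of pairs to the nearest integer, and the function name are assumptions made here for illustration.

```python
import numpy as np


def top_fraction_edges(M, fraction=0.2):
    """Keep the strongest `fraction` of pairwise couplings in a symmetric M.

    Returns a binary adjacency matrix and a signed version in which
    +1 marks a positive coupling and -1 a negative one.
    """
    iu = np.triu_indices_from(M, k=1)
    strengths = np.abs(M[iu])
    k = max(1, int(round(fraction * strengths.size)))
    order = np.argsort(strengths)[::-1][:k]              # k strongest pairs
    rows, cols = iu[0][order], iu[1][order]
    adj = np.zeros(M.shape, dtype=int)
    adj[rows, cols] = 1
    adj[cols, rows] = 1
    signed = (np.sign(M) * adj).astype(int)
    return adj, signed


M = np.zeros((5, 5))
M[0, 1] = M[1, 0] = 0.9
M[2, 3] = M[3, 2] = -0.8
M[0, 4] = M[4, 0] = 0.1
adj, signed = top_fraction_edges(M, fraction=0.2)
print(adj)
print(signed)
```

For five genes there are ten gene pairs, so the top 20% corresponds to the two strongest couplings; the signed matrix keeps the distinction between positive and negative interactions.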

Fig. 1. The experimental results on two small-scale groups of samples: (A) 10 datasets of a 5-gene network with 5 samples. (B) 10 datasets of a 10-gene network with 10 samples.

Figure 1 presents the average performance on datasets from networks of two different scales. For the graphical lasso, the sparsity-controlling parameter is chosen automatically by cross-validation. For our method, we run the DE 10 times on each dataset and take the best optimization result as the final inferred network.

Our method dominates the other two methods in inferring maximum-entropy GRNs. As the number of genes increases, the performance of all three methods degrades, but our method still achieves performance competitive with the other two methods. The results suggest that our approach can effectively handle the small-sample problem while also reflecting the gene regulatory relations.

4.2 Real Data Analysis

In this section, the proposed method for inferring maximum-entropy GRNs is employed to identify the causal relationships among the genes from an in vivo (experimental) microarray dataset. The dataset summarizes the dynamics of the well-characterized transcriptional network involved in the SOS DNA repair mechanism of E. coli, studied experimentally by Ronen et al. [16]. The study included eight genes heavily involved in the SOS repair mechanism: recA, lexA (the master repressor), uvrA, uvrD, uvrY, umuD, ruvA, and polB. The original network is shown in Fig. 2.

Fig. 2. The original structure of the SOS DNA repair transcriptional network of E. coli. The solid black lines denote activation, and the dashed red lines denote repression.

We retain the interactions with |Mij| > 0.7 as the inferred maximum-entropy GRNs for the 4 datasets. Table 1 compares the statistical properties of the inferred GRNs with those presented in recent investigative work [17,18,19] for the different experimental datasets. Table 2 shows the top 3 genes in our inferred GRNs.

Table 1. Comparison of results obtained from the E. coli experiments with those presented in recent investigative work.

The experimental results show that our method achieves higher Precision than the other methods in all experiments, except for the method of [17] in experiment 2. However, the method of [17] fails to identify any true positive in experiment 4.

We have to concede that the method of [19] has a higher Recall than our method, but its Precision is significantly lower. In terms of the F-score, which considers Precision and Recall together, our method performs better in all 4 experiments.

In Table 2, it should be noted that lexA (the master gene) is the node with the highest degree in our inferred GRNs in all four experiments, so our GRNs identify the master gene correctly. However, the edges lexA-uvrY and lexA-polB are not predicted in our GRNs except in experiment 2.

Table 2. Top 3 nodes with the highest degree in our inferred GRNs

5 Conclusion

In this paper, we propose a DE-based inference method that can effectively estimate maximum-entropy GRNs. Unlike other inference methods, which invert the covariance matrix, we treat the inference of maximum-entropy GRNs as a constrained optimization problem, and the best individual found by our method is taken as the inferred maximum-entropy GRN.

First, we assume that the maximum-entropy distribution is a Boltzmann-like distribution. Under this assumption, we set the maximum entropy as the objective function and the first- and second-moment constraints as the penalty functions. Then the probability matching method and JADE are used, respectively, to select the mutation strategy and to set the parameters adaptively, which improves the success rate of the algorithm.

The GRN resulting from the DE is a fully connected network that satisfies the maximum-entropy formulation and reflects the gene regulatory relations (expression and repression), which the graphical lasso cannot. For this reason, we can identify connections between genes involved in diverse cellular processes by selecting pairwise interactions of different strengths.

The proposed method outperforms the other two state-of-the-art methods for inferring maximum-entropy GRNs on synthetic datasets. In addition, the real data results show that it can find the master gene, which suggests that the proposed approach is robust for inferring small-scale GRNs.

The performance of nature-inspired algorithms often deteriorates rapidly as the dimensionality of the problem increases. Small-scale and large-scale constrained optimization are two completely different problems. There are too few true predictions and a large number of incorrect predictions [19]. Thus, the methodology implemented in this paper needs to be enriched further by studying its performance in larger networks. This provides a vital scope for further research.