Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

It has been repeatedly suggested that biological (and perhaps also artificial) evolution should preferentially lead to states that are dynamically critical [16]. These states, sometimes said to be “at the edge of chaos”, are neither too rigidly ordered nor chaotic; if the system is described by a dynamical system, the claim translates into the statement that evolution should tune the system’s parameters, so they should be at (or close to) the separatrices between regions of ordered behavior (where the attractors are, e.g., fixed points or limit cycles) and regions where the attractors are chaotic.

It is also often assumed that the presence of power-law distributions is the hallmark of criticality. Indeed, slightly different (although overlapping) notions of criticality have been used [7]. In this paper we show that a well-known model of genetic regulatory networks, introduced by one of us several years ago [8], i.e. that of Random Boolean Networks (RBNs), can be used to study the relationships between power-law distributions and criticality issues.

This work is based in part on previous investigations by some of us [9, 10], where it had been shown that RBNs can simulate the statistical properties of the changes induced by single gene knock-out in the expression levels of all the genes of S. Cerevisiae. In this paper we are not concerned with the comparison of the model with experimental data, but we rather deepen the analysis of the behavior of the model when subject to small permanent perturbations. The smallest perturbation of this type consists in fixing the value of a single node. Here we will consider perturbations that simulate the knock-out of a randomly chosen gene: among the N nodes of the network, one is chosen at random and its value is fixed to 0. However, as it will be discussed in Sect. 2, RBNs have cyclic attractors, and we perform the perturbation after the network has reached an attractor. It is therefore possible that the candidate node be always 0 in every state of the attractor, but in this case clamping it to 0 would have no effect at all; so we discard that node and we choose another one.

In our studies we then compare the time behavior of the unperturbed (“wild type”, briefly WT) network with that of the perturbed one (“knocked-out”, KO) that differs from the first by the clamping to 0 of the chosen node (let us call it node R). A node is said to be affected if its value in the KO network differs from that in the WT network at least once, after the clamping in root. Since nodes are connected, the perturbation can in principle spread, and it is not limited to node R, or to those nodes that are directly connected to it. The avalanche associated to that particular knock-out is the set of affected genes, and the size of the avalanche is the cardinality of that set (let us call it V). In order to compare results concerning different networks, it is sometimes useful to use the relative size of the avalanche, i.e. the ratio V/N.

One of the most intriguing features of the RBN model is that it allows one to distinguish ordered from disordered (often called “chaotic”) regimes on the basis of a single parameter, sometimes called the Derrida parameter λ; as it will be discussed in Sect. 2 this parameter depends upon the choice of the Boolean functions and upon the average number of links per node. Ordered states have λ < 1, for chaotic states λ > 1; the value λ = 1 separates order from chaos, and it is therefore the critical value.

Under the assumptions that the number of incoming links per node A is small (A << N) and that the overall avalanche is small (V << N), it can be proven, as it will be shown in Sect. 3, that the distribution of avalanches depends only upon the same Derrida parameter that determines the dynamical regime of the network. The assumptions made here amount to suppose that an avalanche never interferes with itself. Precisely: an affected node B is defined to be the parent of another affected node C if the first deviation of C from the unperturbed value is due to the influence of B. The non-interference condition amounts to assuming that every node C in the avalanche is not affected by any other affected node different from B (neither at a later stage nor at the same time). Therefore, under these assumptions the topology of a spreading avalanche is that of a tree, where each node has a single parent.

The dependency of the avalanche distribution upon λ had already been derived in our previous paper [10], however at that time it was not possible to provide a formula for avalanches of arbitrary size, because a numerical coefficient had to be manually computed. Here, after correcting a missing term, a recursive formula appears to correctly describe the distribution of avalanches up to size 8. It has then been hypothesized that the formula holds for any avalanche size v, an ansatz that has been numerically verified on simulated avalanches in networks with 1000 nodes and also theoretically proven.

The correct formula for p(v) = Pr(V = v) is then:

$$ p(v) = \frac{{v^{v - 2} }}{(v - 1)!}\lambda^{v - 1} e^{ - \lambda v} $$
(1)

Equation 1 is the same as the one that had been previously reported by Ramo [11], but here it is derived in the physically sound “quenched” model, where all the connections and Boolean functions are fixed for each network, without resorting to the “annealed” approximation [12], where connections and Boolean functions are changed at random at each time step, thus losing any possibility of identifying dynamical attractors.

Equation 1 is valid for avalanches of any size and it is not a power law; by inserting the value λ = 1 we can derive the distribution for avalanches of any size in dynamically critical networks. As it will be shown in Sect. 3, this does not lead to a true power law. However, if we consider fairly large avalanches, for which the Stirling approximation holds (while still being V << N), it turns out that the distribution indeed approximates a power law with slope −3/2.

These results help to clarify the relationship between the concepts of dynamical criticality and those based upon the existence of power-law distributions. In our view, dynamical criticality is a more profound concept, and it may lead (and it often leads) to approximate power-law distributions of interesting quantities.

Due to their modularity, it is sometimes interesting to consider relatively small gene regulatory networks. In these cases the approximation V << N may not hold, and an affected node may be subject to the influence of another changed node, so self-interference can take place. We have also numerically explored this phenomenon, by counting the fraction of self-interfering avalanches as a function of the network size. It is shown in Sect. 4 that this fraction can be a substantial one in networks composed by tens or even hundreds of genes. Note that, while real genetic networks usually host thousands of genes, most of those networks that have been described in detail in the literature are relatively small ones; if it were true that their behavior be largely uncoupled from that of the whole network, then the self-interference of avalanches might have relevant biological implications.

Some comments and conclusions are finally drawn in Sect. 5

2 Random Boolean Networks

Here below a synthetic description of the model main properties is presented, referring the reader to [1, 2, 13] for a more detailed account. Several variants of the model have been presented and discussed, but we will restrict our attention here to the “classical” model. A classical RBN is a dynamical system composed of N genes, or nodes, which can take either the value 0 (inactive) or 1 (active). Let x i (t)∈{0,1} be the activation value of node i at time t, and let X(t) = [x 1 (t), x 2 (t) … x N (t)] be the vector of activation values of all the genes.

The relationships between genes are represented by directed links and Boolean functions, which model the response of each node to the values of its input nodes. In a classical RBN each node has the same number of incoming connections k in , and its k in input nodes are chosen at random with uniform probability among the remaining N − 1 nodes: in such a way the distribution of the outgoing connections per node tends to a Poisson distribution for large N. The Boolean functions can be chosen in different ways: in this paper we will only examine the case where they are chosen at random with uniform probability in a predefined set of allowed transition functions.

In the quenched model, both the topology and the Boolean function associated to each node do not change in time. The network dynamics are discrete and synchronous, so fixed points and cycles are the only possible asymptotic states in finite networks (a single RBN can have, and usually has, more than one attractor). The model shows two main dynamical regimes, ordered and disordered, depending upon the degree of connectivity and upon the Boolean functions. Typically, the average cycle length grows as a power of the number of nodes N in the ordered region and diverges exponentially in the disordered region [1]. The dynamically disordered region also shows sensitive dependence upon the initial conditions, not observed in the ordered one.

It should be mentioned that some interesting analytical results have been obtained by the annealed approach [12], in which the topology and the Boolean functions associated to the nodes change at each step. Several results for annealed nets hold also for the corresponding ensembles of quenched networks. Although the annealed approximation may be useful for analytical investigations [13], in this work we will always be concerned with quenched RBNs, which are closer to real gene regulatory networks.

A very important aspect concerns how to determine and measure the RBNs’ dynamical regime: while several procedures have been proposed, an interesting and well-known method directly measures the spreading of perturbations through the network. This measure involves two parallel runs of the same system, whose initial states differ for only a small fraction of the units. This difference is usually measured by means of the Hamming distance h(t), defined as the number of units that have different activations on the two runs at the same time step (the measure is performed on many different initial condition realizations, so one actually considers the average value <h(t)>, but we will omit below the somewhat pedantic brackets). If the two runs converge to the same state, i.e. h(t)→0, then the dynamics of the system are robust with respect to small perturbations (a signature of the ordered regime), while if h(t) grows in time (at least initially) then the system is in a disordered state. The critical states are those where h(t) remains initially constant. If a single node is perturbed, the average number of differing nodes at the following time step will be equal to [the probability that a node changes value if one of its input changes] times [the average number of connections per node], a quantity that is sometimes called the Derrida parameter.

In the classical model of RBNs, Boolean functions are often chosen at random among all those with k in values, but a detailed study of tens of actual genetic control circuits [14] has shown that in real biological systems only canalizing functions are found: a function is said to be canalizing if there is at least one value of one of its inputs that uniquely determines the output. Therefore it may be interesting to consider cases where only canalizing functions are allowed. Moreover, if we associate the value 0 to inactivity, a node that is always 0 will never show its presence, so it may be interesting also to consider cases where the null function is excluded [9].

3 Perturbations in RBNs

As discussed in the Introduction, one can compare what happens in the WT RBN and in the KO RBN: at the beginning, a single node (that is, the knocked-out one, also called the root of the perturbation) will differ in the two cases, so the size of the initial avalanche will be 1. If no one of the nodes that receive input from the root changes its value, then the avalanche stops there and it will turn out to be of size 1.

Therefore one can compute p 1 , i.e. the probability that an avalanche has size 1, as follows. Let q be the probability that a node chosen at random changes its value if one (and only one) of its inputs changes its value; p 1 is then the probability that all the output nodes of the root do not change, and if there are k outgoing connections, this probability is q k; therefore, integrating over the outgoing distribution:

$$ p_{1} = \sum\limits_{k = 0}^{N - 1} {p_{out} (k)q^{k} } $$
(2)

where p out (k) is the probability that a node chosen at random has k outgoing connections.

As far as larger avalanches are concerned, we will limit in this section to consider the case of large sparse networks with (on average) a few connections per node; therefore the probability that an output node of the root is also one of its input nodes is negligible. In this case the probability that an avalanche has size 2 equals the probability that only one of the output nodes of the root (i.e. a node at level 1) changes its value, and that the perturbation does not propagate downwards to level 2 (i.e. that nodes which receive connections from the affected node do not change their value). Therefore:

$$ p_{2} = \sum\limits_{k = 0}^{N - 1} {kp_{out} (k)q^{k - 1} } (1 - q)\sum\limits_{m = 0}^{N - k - 1} {p_{out} (m)} q^{m} $$
(3)

By applying the same reasoning, one can continue and compute the probability of avalanches of increasing size. Of course, calculations become more and more cumbersome, as the same size can be achieved in different ways (for example, an avalanche of size 3 may be composed by the root and by two nodes at level 1, none at level 2, or by the root, one node at level 1 and one at level 2).

It is however possible to show that every p m can be written as a function of the probability generating function F(q) of the outdegree distribution, defined as:

$$ F = \sum\limits_{k = 0}^{N - 1} {q^{m} p_{out} (m)} $$
(4)

and of its derivatives. Indeed p 1 directly coincides with F (see Eq. 2); noting that \( \frac{\partial F}{\partial q} = \sum\limits_{k = 0}^{N - 1} {p_{out} (k)kq^{k - 1} } \) one can show that p 2 (Eq. 3) can be written as:

$$ p_{2} = (1 - q)F\frac{\partial F}{\partial q} $$
(5)

In the same way it can be shown [10] that also the higher order probabilities can be expressed as functions of F and its derivatives.

One can move one step further by taking into account the fact that the outdegree distribution in the (classical) model networks is approximately Poissonian:

$$ p_{out} (k) = e^{ - A} \frac{{A^{k} }}{k!} $$
(6)

where A = <k> (note that the average of the number of ingoing connections necessarily equals that of the outgoing connections, so there is no need to specify). In this case Eq. 4 becomes:

$$ F = \sum\limits_{k = 0}^{N - 1} {q^{k} e^{ - A} \frac{{A^{k} }}{k!}} \cong \sum\limits_{k = 0}^{\infty } {q^{k} e^{ - A} \frac{{A^{k} }}{k!} = e^{ - 1} e^{qA} } $$
(7)

and therefore, introducing the variable λ = ln(1/F) [15]:

$$ \begin{aligned} & \lambda = \left( {1 - q} \right)A \\ & F = e^{ - \lambda } \\ & p_{n} = B_{n} \lambda^{n - 1} e^{ - n\lambda } \\ \end{aligned} $$
(8)

From Eq. 8 one can observe that F, and therefore the avalanche distribution (the coefficient B n depending only on the graph of perturbation spreading) depends only upon the parameter λ that is the product of two terms, i.e. [probability that a node changes value if one of its input changes]*[average number of connections per node]. Therefore it coincides with the same Derrida parameter defined in Sect. 1.

The computation of the coefficients B n is lengthy an tedious; it has been explicitly performed [15] up to the avalanche of size 8, and the results are summarized in Table 1.

Table 1. Coefficients of the avalanche distribution

By looking at the way in which these numbers are generated, the following formula can be suggested [15]:

$$ B_{n} = {{n^{n - 2} } \mathord{\left/ {\vphantom {{n^{n - 2} } {\left( {n - 1} \right)!}}} \right. \kern-0pt} {\left( {n - 1} \right)!}} $$
(9)

Equation 9 correctly describes the entries of Table 1, and it can be conjectured that it holds for every avalanche size n; taken together with Eq. 8 it leads to Eq. 1 (where the size of the avalanche was denoted by v). In Fig. 1 it is shown that this formula does well approximate the observed distribution of avalanches in simulated RBNs with 1000 nodes (right panel), while the comparison is not satisfactory for small networks (20 nodes, left panel), where self-interference plays a key role; see Sect. 4 for further comments.

Fig. 1.
figure 1

The theoretical avalanche distribution given by Eq. 1 (red circles) is shown together with the distribution observed in simulations (blue triangles). Every node has exactly 2 inputs, and all the 16 Boolean function are allowed with uniform probability. Left: networks with 20 nodes; right: networks with 1000 nodes (Color figure online)

Let us now come back to the issue of critical systems. The formula for the avalanche distribution (Eq. 1) is valid for avalanches of any size; by inserting the value λ = 1 we can derive the distribution for avalanches of any size in dynamically critical networks:

$$ p(v) = \frac{{v^{v - 2} }}{(v - 1)!}e^{ - v} $$
(10)

It is often stated that power-law distributions are associated with critical states, but Eq. 10 does not describe a power law. However, if we consider fairly large avalanches, such that they still are v << N, but for which the Stirling approximation holds, i.e. \( v! \cong \sqrt {\left( {2\pi v} \right)} \left( {{v \mathord{\left/ {\vphantom {v e}} \right. \kern-0pt} e}} \right)^{v} \), we obtain the approximate formula:

$$ p\left( v \right) \cong \left( {2\pi } \right)^{{ - \frac{1}{2}}} \nu^{{ - \frac{3}{2}}} $$
(11)

that is indeed a power law. This result helps to clarify the relationship between the concepts of dynamical criticality (that, for the reasons given in Sect. 1, appear to be the deeper ones) and those based upon the existence of power-law distributions (that are approximate relationships that often hold at criticality).

It goes without saying that the above remark holds for power-law distributions; close to critical states another type of power-law relationship often holds, which describes the relationship between two different variables (i.e., scaling laws of order parameters as a function of the distance from the critical point).

4 Self-Interfering Avalanches

As it has been observed in Sects. 1 and 3, the theoretical Eq. 1 has been derived by assuming that avalanches do not interfere with themselves. This approximation breaks down when the network is “small”, so that it is likely that some avalanches actually show the phenomenon of self-interference. Indeed, it has been shown in Fig. 1 that the distribution of avalanches is largely different from the theoretical one for small networks of 20 nodes. If a portion of a genetic network is, at least under some circumstances, largely decoupled from the rest, then it may be interesting to consider also small networks; therefore we have analyzed the behavior of networks of different sizes, while keeping the connection fixed (two inputs per node).

Fig. 2.
figure 2

Distribution of total avalanches (left) and of non-interfering ones (right) for a network with 20 nodes, two connections per node, only canalizing functions allowed

We have developed an algorithm that provides a good approximation to the number of really interfering avalanches, thus separating them from the non-interfering ones. The results obtained for N = 20 networks are shown in Fig. 2 and it can be seen that they provide support to our guess that departures from the theoretical formula Eq. 1. are largely due to self-interference. Similar results have been obtained by considering networks of different sizes; as it should be expected, ceteris paribus the fraction of interfering avalanches is a monotonous decreasing function of the network size N.

The reported simulations have been performed by considering the case where only canalizing functions are used. As it has been observed, there is a biological reason for that. In this case, the 1−q term, i.e. the probability that a node is changed when one of its parent nodes is changed, is no longer ½ (like in the case with all the Boolean functions) but it is rather 3/7. Another biologically interesting case is the one where all the non-canalizing functions and the NULL function are excluded (in this case 1−q = 6/13). Simulations of avalanche distributions also in this case (not shown here) broadly confirm the above remarks.

5 Conclusions

We have shown here that a very simple formula describes the distribution of non-interfering avalanches of all sizes (provided that they fulfill the non-interference constraint). A similar formula had been obtained by Ramo [11], but by resorting to the annealed approximation. Here the distribution has been explicitly computed for avalanches up to size 8: a recursive formula shows up, so a generalization has been proposed and checked against simulations. It is worth observing that the formula has actually also been proven by the theory of branching process; the interested reader is referred to [15] for details.

The formula allows one to show that approximate power-law distributions can indeed be observed in critical systems for not too small avalanches.

The very interesting phenomenon of avalanche self-interference has been observed and described. It certainly deserves more careful future investigations.

Last but not least, it will be extremely interesting to consider the results of the interaction among different avalanches. This might indeed be the most common case in biology: for example when a chemical is introduced into a cell it is likely to affect more genes at the same time. Interesting effects like the nonlinear dose-response relationships that have been observed might perhaps find at least a partial explanation in the study of the interactions among avalanches.

A final mention of data on real systems is worth, although this paper is not concerned with comparisons with experimental data. It is however interesting to mention that by comparing the experimental data on S.Cerevisiae with the theoretical distribution of Eq. 1, using the jackknife method [16, 17], it is possible to locate the λ parameter in the 95 % confidence interval [0.84, 0.93]. Moreover, an analysis of the same data using Bayes factors [18, 19], leads to reject the hypothesis that the network is precisely critical, since the probability that λ = 1 given the data is smaller than 10−3 under a broad range of prior distributions. The interested reader is referred to [15] for further details.

Of course these results are not conclusive, given that we have analyzed a single data set, but they are very interesting. Note also that Kauffman had suggested that living beings might live “in the ordered region, close to the order-chaos border”, and this is perfectly compatible with the results of the above analysis. Note also that the claim concerning the advantages of critical states might refer to organisms or colonies, and not necessarily to single cells. The relationship between the dynamics of a single cell and that of an organism (or of an organ) may be far from trivial and we refer the interested reader to [2023].