Introduction

Reservoir computing (RC) is a recent paradigm in the field of recurrent neural networks (for a recent overview, see Lukosevicius and Jaeger (2009)). RC approaches have been employed as mathematical models for generic cortical microcircuits, to investigate and explain computations in neocortical columns (see e.g., Maass et al. (2002)). A key element of reservoir computing approaches is the randomly constructed, fixed hidden layer—typically, only connections to output units are trained.

A fundamental question is how the recurrent hidden layer or reservoir should be prepared, designed or guided, to best facilitate the training of connections to output units and consequently maximize task performance. It has been previously shown that the ability of reservoir computing networks to achieve the desired computational outcome is maximized when the network is prepared in a state near the edge of chaos (Legenstein and Maass 2007a, b; Büsing et al. 2010). This refers to a critical state between ordered dynamics (where disturbances quickly die out) and chaotic dynamics (where disturbances are amplified). This property is particularly interesting because of evidence in the literature that cortical circuits are tuned to criticality (see e.g., Beggs and Plenz (2003), Chialvo (2004), Beggs (2008)). The reasons why network performance is increased near the edge of chaos are, however, not yet fully understood.

Other approaches to improving network performance have also been investigated. For example, in a previous study, we have addressed performance issues of echo state networks (ESNs), a particular reservoir computing approach, and investigated methods to optimize for longer short-term memory capacity or prediction of highly non-linear mappings (Boedecker et al. 2009). A general method for improving network performance is the use of permutation matrices for reservoir connectivity. However, problem-specific methods such as unsupervised learning also exist. Bell and Sejnowski (1995), for example, changed connection weights to maximize information, whereas intrinsic plasticity (IP) (Triesch 2005) aims to increase the entropy of each output of the internal units by adapting transfer functions. As we reported elsewhere (Boedecker et al. 2009), IP for tanh neurons unfortunately improves performance only slightly compared to a setup based on random or permutation matrices (at least for a number of tasks; see also further comments in the “Discussion”).

The phenomenon of increased computational performance in recurrent neural networks at the edge of chaos has been addressed in the literature before. Bertschinger and Natschläger (2004) examined networks of threshold units operating on input streams and found computational performance maximized at the phase transition. The “network-mediated separation” criterion was proposed as a measure to quantify computational capability, and it was found to peak at the critical point. In Legenstein and Maass (2007a), the authors proposed two new measures in the context of liquid state machines (LSM) (Maass et al. 2002), another reservoir computing approach using neuron models closer to the detailed biology. They suggested considering the kernel quality and the generalization ability of a reservoir. Its computational capabilities, they argued, are characterized by a trade-off between the two, and they showed that the reservoir is most efficient at the edge of chaos.

These quantitative studies helped to gain insight into the increased computational performance at the critical point. However, we argue that they measured the elements of ongoing computation only indirectly and on a global scale (network perspective).

In this study, we seek to directly measure the computational capabilities of the reservoir as it undergoes the phase transition to chaotic dynamics. In particular, we will measure the information storage at each neuron, and information transfer between each neuron pair in the reservoir. This contrasts with examining the entropy of each unit alone, since these measures relate directly to the computational tasks being performed. Furthermore, it means that we can directly quantify whether the computational properties provided by the reservoir are maximized at the edge of chaos, and we can do so on a more local scale (node perspective). Finally, the general applicability of these measures allows us to compare the computations in different kinds of dynamical systems.

We begin by describing in “Echo state networks” the reservoir computing approach used here (ESNs). We then explain the parameter variation under which the reservoirs of these networks undergo a transition from ordered to chaotic dynamics in “Estimating the criticality of an input-driven ESN”. Subsequently, we describe the information-theoretical framework used for analysis here in “Information-theoretical measures”, including the active information storage (AIS) (Lizier et al. 2007, 2008a) and transfer entropy (TE) (Schreiber 2000). We show in “Results” that direct measurement of these computational operations reveals that both information storage and transfer in the reservoir are maximized near the edge of chaos. This is an important result, since it provides quantitative evidence that a critical reservoir is useful in reservoir computation specifically because the computational capabilities of the reservoir are maximized in that regime. Finally, we discuss the significance of these results in “Discussion”.

Echo state networks

ESNs provide a specific architecture and a training procedure that aims to solve the problem of slow convergence (Jaeger 2001a; Jaeger and Haas 2004) of earlier recurrent neural network training algorithms. ESNs are normally used with a discrete-time model, i.e., the network dynamics are defined for discrete time-steps t, and they consist of inputs, a recurrently connected hidden layer (also called reservoir) and an output layer (see Fig. 1).

Fig. 1

Architecture of an echo state network. In ESNs, usually only the connections represented by the dashed lines are trained; all other connections are set up randomly and remain fixed. The recurrent layer is also called a reservoir, analogously to a liquid, which has fading memory properties. As an example, consider throwing a rock into a pond; the ripples caused by the rock will persist for a certain amount of time and thus information about the event can be extracted from the liquid as long as it has not returned to its single attractor state (the flat surface)

We denote the activations of units in the individual layers at time t by \({\bf u}_{t}, {\bf x}_{t},\) and \({\bf o}_{t}\) for the inputs, the hidden layer and the output layer, respectively. We use \({\bf w}^{\rm in}, {\bf W},\) and \({\bf w}^{\rm out}\) as matrices of the respective synaptic connection weights. Using \(f(x) = \tanh x\) as output nonlinearity for all hidden layer units, the network dynamics are defined as:

$$ \begin{aligned} {\mathbf{x}}_{t}&=\hbox{tanh}({\mathbf{W x}}_{t-1}+ {\mathbf{w}}^{\rm{in}} {\mathbf{u}}_{t})\\ {\mathbf{o}}_{t}&={\mathbf{w}}^{\rm{out}} {\mathbf{x}}_{t} \end{aligned}. $$

The main differences between ESNs and traditional recurrent network approaches are the setup of the connection weights and the training procedure. To construct an ESN, units in the input layer and the hidden layer are connected randomly. Connections between the hidden layer and the output units are the only connections that are trained, usually with a supervised, offline learning approach using linear regression (see Jaeger (2001a) for details on the learning procedure).
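The update equations and the offline training of the output weights can be sketched in a few lines of NumPy. This is a minimal illustration rather than the setup used in the experiments: the function names, the reservoir size, the weight scales, the washout length and the one-step recall target are all assumptions chosen for the example, and the pseudoinverse is used as one common way to solve the linear regression.

```python
import numpy as np

def run_reservoir(W, w_in, inputs, x0=None):
    """Iterate x_t = tanh(W x_{t-1} + w_in u_t) and return all states."""
    n_units = W.shape[0]
    x = np.zeros(n_units) if x0 is None else np.asarray(x0, dtype=float)
    states = np.empty((len(inputs), n_units))
    for t, u in enumerate(inputs):
        x = np.tanh(W @ x + w_in @ np.atleast_1d(u))
        states[t] = x
    return states

def train_readout(states, targets):
    """Least-squares output weights via the pseudoinverse, so o_t = w_out x_t."""
    return (np.linalg.pinv(states) @ targets).T

rng = np.random.default_rng(0)
n_units, n_steps, washout = 50, 600, 100           # illustrative sizes (assumed)
W = rng.normal(0.0, 0.1, size=(n_units, n_units))  # fixed random reservoir
w_in = rng.uniform(-0.1, 0.1, size=(n_units, 1))   # fixed random input weights
u = rng.uniform(-1.0, 1.0, size=n_steps)

states = run_reservoir(W, w_in, u)
targets = u[washout - 1:-1].reshape(-1, 1)         # toy task: recall u_{t-1}
w_out = train_readout(states[washout:], targets)   # only these weights are trained
outputs = states[washout:] @ w_out.T
```

Only `w_out` is fitted; `W` and `w_in` stay at their random initial values, which is the point of the approach.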

For the approach to work successfully, however, connections in the reservoir cannot be completely random; ESN reservoirs are typically designed to have the echo state property. The definition of the echo state property has been outlined in Jaeger (2001a) and is summarized in the following section.

The echo state property

Consider a time-discrete recursive function:

$$ {\mathbf{x}}_{t+1}=F({\mathbf{x}}_{t}, {\mathbf{u}}_{t + 1}) $$
(1)

that is defined at least on a compact sub-area of the vector space \({\bf x} \in R^{n},\) with n the number of internal units. The \({\bf x}_{t}\) are to be interpreted as internal states and \({\bf u}_{t}\) is some external input sequence, i.e. the stimulus.

Definition 1

Assume an infinite stimulus sequence \(\bar{\bf u}^{\infty} = {\bf u}_0, {\bf u}_1, \ldots,\) and two random initial internal states of the system \({\bf x}_{0}\) and \({\bf y}_{0}.\) From both initial states \({\bf x}_{0}\) and \({\bf y}_{0}\) the sequences \(\bar{\bf x}^{\infty} = {\bf x}_{0}, {\bf x}_{1}, \ldots\) and \( \bar{\bf y}^{\infty} = {\bf y}_{0}, {\bf y}_{1},\ldots\) can be derived from the update equation Eq. (1) for \({\bf x}_{t+1}\) and \({\bf y}_{t+1}.\) If, for all right-infinite input sequences \({\bf \bar{u}}^{+\infty } = {\bf u}_{t}, {\bf u}_{t+1},\cdots\) taken from some compact set U, for any \(({\bf x}_{0}, {\bf y}_{0})\) and all real values \(\epsilon > 0,\) there exists a \(\delta(\epsilon)\) for which \(\lVert {\bf x}_{t} - {\bf y}_{t} \rVert \leq \epsilon\) for all \(t \geq \delta(\epsilon)\) (where \(\lVert \cdot \rVert\) is the Euclidean norm), the system \(F(\cdot)\) will have the echo state property relative to the set U.

In simple terms, the system has the echo state property if different initial states converge (for all inputs taken from U).
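The convergence of different initial states can be observed numerically by driving two copies of the same reservoir with one input sequence and tracking the distance between their states. The sketch below is an illustration under assumptions: the function name, the sizes and the input range are hypothetical. The reservoir matrix is scaled so that its largest singular value is 0.5, a sufficient (but not necessary) condition for contraction with tanh units, so the distance must shrink by at least half per step.

```python
import numpy as np

def state_distance(W, w_in, inputs, x0, y0):
    """Euclidean distance ||x_t - y_t|| for two initial states under one input."""
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    dist = np.empty(len(inputs))
    for t, u in enumerate(inputs):
        drive = w_in * u                 # identical input for both copies
        x = np.tanh(W @ x + drive)
        y = np.tanh(W @ y + drive)
        dist[t] = np.linalg.norm(x - y)
    return dist

rng = np.random.default_rng(1)
n = 30                                   # illustrative reservoir size (assumed)
W = rng.normal(size=(n, n))
W *= 0.5 / np.linalg.norm(W, 2)          # largest singular value 0.5: a contraction
w_in = rng.uniform(-0.1, 0.1, size=n)
u = rng.uniform(-1.0, 1.0, size=200)
d = state_distance(W, w_in, u, rng.uniform(-1, 1, n), rng.uniform(-1, 1, n))
```

A single pair of initial states and a single input sequence do not prove the property, which quantifies over all of them; the run only illustrates the behavior the definition describes.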

Estimating the criticality of an input-driven ESN

To determine whether a dynamical system has ordered or chaotic dynamics, it is common to look at the average sensitivity to perturbations of its initial conditions (Derrida and Pomeau 1986; Bertschinger and Natschläger 2004; Büsing et al. 2010). The rationale behind this is that small differences in the initial conditions of two otherwise equal systems should eventually die out if the system is in the ordered phase, or persist (and amplify) if it is in the chaotic phase. A measure for the exponential divergence of two trajectories of a dynamical system in state space with very small initial separation is the Lyapunov (characteristic) exponent (LE). Although a whole spectrum of Lyapunov exponents is defined, the rate of divergence is dominated by the largest exponent. It is defined as:

$$ \lambda = \lim_{k \rightarrow \infty} \frac{1}{k}\ln\left(\frac{\gamma_{k}}{\gamma_{0}}\right) $$

with \(\gamma_{0}\) being the initial distance between the perturbed and the unperturbed trajectory, and \(\gamma_{k}\) being the distance at time k. For sub-critical systems, \(\lambda < 0\) and for chaotic systems \(\lambda > 0.\) A phase transition thus occurs at \(\lambda \approx 0\) (called the critical point, or edge of chaos).

Since this is an asymptotic quantity, it has to be estimated for most dynamical systems. We adopt here the method described in Sprott (2003, Chap. 5.6). Two identical networks are simulated for a period of 1,000 steps (longer durations were tried but found not to make a significant difference). After this initial period, which serves to run out transient random initialization effects, we proceed as follows.

  1. Introduce a small perturbation into a unit n of one network, but not the other. This separates the state of the perturbed network \({\bf x}^{2}\) from the state of the unperturbed network \({\bf x}^{1}\) by an amount \(\gamma_{0}\) (see Footnote 1).

  2. Advance the simulation one step and record the resulting state difference for this kth step \(\gamma_{k}=\Vert {\bf x}^{1}(k) -{\bf x}^{2}(k)\Vert\). The norm \(\Vert \cdot \Vert\) denotes the Euclidean norm in our case, but can be chosen differently.

  3. Reset the state of the perturbed network \(\mathbf{x}^{2}\) to \(\mathbf{x}^{1}(k)+(\gamma_{0}/\gamma_{k})(\mathbf{x}^{2}(k)- \mathbf{x}^{1}(k)).\) This renormalization step keeps the two trajectories close to avoid numerical overflows (see Fig. 2 for an illustration of these steps).

Fig. 2

Numerical estimation of the largest Lyapunov exponent λ. Trajectories are kept close by resetting the distance to \(\gamma_{0}\) after each update step in order to avoid numerical overflows (illustration after Zhou et al. 2010). See text for more details

In Sprott (2003), \(\gamma_{k}\) is added to a running average and steps 2 and 3 are performed repeatedly until the average converges. Here, we repeat these simulation and renormalization steps for a total of 1,000 times (again, longer durations were tested, but found not to change results significantly), and then average the logarithm of the distances along the trajectory as \(\lambda_{n}=\langle \ln (\gamma_{k}/\gamma_{0})\rangle_{k}.\)

For each reservoir with N units that is tested, we calculate N different \(\lambda_{n}\) values, choosing a different reservoir unit n to be perturbed each time. These values are then averaged to yield a final estimate of the Lyapunov exponent \(\lambda = \langle \lambda_{n}\rangle_{n}.\)
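The perturb, advance and renormalize steps, followed by the two averages, can be sketched as follows. This is an illustrative reading of the procedure, not the original implementation: the function name, the default perturbation size `gamma0`, the zero initial state before the washout and the small sizes in the example are assumptions.

```python
import numpy as np

def lyapunov_estimate(W, w_in, inputs, gamma0=1e-8, washout=1000, n_steps=1000):
    """Estimate the largest Lyapunov exponent of an input-driven tanh reservoir."""
    n = W.shape[0]

    def step(x, u):
        return np.tanh(W @ x + w_in * u)

    # Run out transients once; both copies share this reference state.
    x_ref = np.zeros(n)                      # assumed initial state
    for u in inputs[:washout]:
        x_ref = step(x_ref, u)

    lambdas = []
    for unit in range(n):                    # perturb each unit in turn
        x1 = x_ref.copy()
        x2 = x_ref.copy()
        x2[unit] += gamma0                   # step 1: small perturbation
        log_ratios = []
        for u in inputs[washout:washout + n_steps]:
            x1, x2 = step(x1, u), step(x2, u)                 # step 2: advance
            gamma_k = np.linalg.norm(x1 - x2)
            log_ratios.append(np.log(gamma_k / gamma0))
            x2 = x1 + (gamma0 / gamma_k) * (x2 - x1)          # step 3: renormalize
        lambdas.append(np.mean(log_ratios))                   # lambda_n
    return float(np.mean(lambdas))                            # average over units

rng = np.random.default_rng(0)
n = 20
W = rng.normal(size=(n, n))
W *= 0.5 / np.linalg.norm(W, 2)              # contraction, so the ordered regime
w_in = rng.uniform(-0.1, 0.1, size=n)
u = rng.uniform(-1.0, 1.0, size=400)
lam = lyapunov_estimate(W, w_in, u, washout=200, n_steps=200)
```

For this deliberately contracting reservoir the estimate is negative, as expected in the ordered phase; `inputs` must contain at least `washout + n_steps` values.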

Information-theoretical measures

A natural framework to describe distributed computation in dynamical systems is found in information theory (Shannon and Weaver 1949; Cover and Thomas 2006). It has proven useful in the analysis and design of a variety of complex systems (Klyubin et al. 2005; Lungarella and Sporns 2006; Sporns and Lungarella 2006; Prokopenko et al. 2006; Olsson et al. 2006; Ay et al. 2008; Lizier et al. 2008b), as well as in theoretical neuroscience (Strong et al. 1998; Tang et al. 2008; Tang and Jackson 2008; Borst and Theunissen 1999). To introduce the measures we use for information storage and transfer in multivariate systems, we briefly review important concepts of information theory.

The (Shannon) entropy is a fundamental measure that estimates the average uncertainty in a sample x of stochastic variable X. It is defined as

$$ H_{X}=-\sum_{x}p(x)\log_{2}p(x) $$

If a base two logarithm is used in this quantity as above, entropy is measured in units of bits.

The joint entropy of two random variables X and Y is a generalization to quantify the uncertainty of their joint distribution: \(H_{X,Y}=-\sum_{x,y} p(x,y)\log_2{p(x,y)}.\) The conditional entropy of X given Y is the average uncertainty that remains about x when y is known: \(H_{X|Y}=-\sum_{x,y} p(x,y)\log_2{p(x|y)}.\) The mutual information between X and Y measures the average reduction in uncertainty about x that results from learning the value of y, or vice versa: \(I_{X;Y}=H_{X}-H_{X|Y}.\) The conditional mutual information between X and Y given Z is the mutual information between X and Y when Z is known: \(I_{X;Y|Z}=H_{X|Z}-H_{X|Y,Z}.\)
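For discrete samples, these quantities can be estimated by replacing probabilities with relative frequencies. The sketch below is a plug-in estimator written for illustration; the function names are hypothetical, and the conditional entropy is computed through the chain rule \(H_{X|Y}=H_{X,Y}-H_{Y},\) which is equivalent to the definition above.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Plug-in estimate of H_X in bits from a sequence of hashable symbols."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())

def joint_entropy(xs, ys):
    """H_{X,Y}: entropy of the paired samples."""
    return entropy(list(zip(xs, ys)))

def conditional_entropy(xs, ys):
    """H_{X|Y} = H_{X,Y} - H_Y (chain rule)."""
    return joint_entropy(xs, ys) - entropy(ys)

def mutual_information(xs, ys):
    """I_{X;Y} = H_X - H_{X|Y}."""
    return entropy(xs) - conditional_entropy(xs, ys)

def conditional_mutual_information(xs, ys, zs):
    """I_{X;Y|Z} = H_{X|Z} - H_{X|Y,Z}."""
    return conditional_entropy(xs, zs) - conditional_entropy(xs, list(zip(ys, zs)))

xs = [0, 0, 1, 1]
ys = [0, 1, 0, 1]
h_x = entropy(xs)                      # two equally frequent symbols
i_xx = mutual_information(xs, xs)      # a variable shares all its entropy with itself
i_xy = mutual_information(xs, ys)      # every (x, y) pair occurs once: independent
```

In this toy sample `h_x` and `i_xx` are one bit and `i_xy` is zero. Plug-in estimates are biased for small samples, so real data needs far more observations than this.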

These information-theoretic measures can be used to describe the process by which each variable or node X in a system updates or computes its next state. Such computations utilize information storage from the node itself, and information transfer from other nodes.

The information storage of a node is the amount of information in its past that is relevant to predicting its future. We quantify this concept using the AIS to measure the stored information that is currently in use in computing the next state of the node (Lizier et al. 2007, 2008a). The AIS for a node X is defined as the average mutual information between its semi-infinite past \(x_n^{(k)}=\left\lbrace x_{n},x_{n-1},\ldots,x_{n-k+1} \right\rbrace\) and its next state \(x_{n+1}\):

$$ A_X =\lim_{k \rightarrow \infty} \sum_{x_{n+1},x^{(k)}_{n}} p(x^{(k)}_{n},x_{n+1}) \log_2 \frac{p(x^{(k)}_{n},x_{n+1})}{p(x^{(k)}_{n})p(x_{n+1})}. $$
(2)

\(A_{X}(k)\) represents an approximation with finite history length k. From our computational perspective, a node can store information regardless of whether it is causally connected with itself; i.e., for ESNs, this means whether or not the node has a self-link. This is because information storage can be facilitated in a distributed fashion via one’s neighbors, which amounts to the use of stigmergy [e.g., see Klyubin et al. (2004)] to communicate with oneself (Lizier et al. 2008a, b, c).

The information transfer between a source and a destination node is defined as the information provided by the source about the destination’s next state that was not contained in the past of the destination. The information transfer is formulated in the TE, introduced by Schreiber (2000) to address concerns that the mutual information (as a de facto measure of information transfer) was a symmetric measure of statically shared information. The TE from a source node Y to a destination node X is the mutual information between the previous state of the source \(y_{n}\) (see Footnote 2) and the next state of the destination \(x_{n+1},\) conditioned on the semi-infinite past of the destination \(x^{(k)}_{n}\) (as \(k \rightarrow \infty\); Lizier et al. 2008c):

$$ T_{Y \rightarrow X}=\lim_{k \rightarrow \infty} \sum_{{\mathbf{u}}_n} p({\mathbf{u}}_n) \log_2 \frac{ p(x_{n+1}|x^{(k)}_{n},y_{n})}{p(x_{n+1}|x^{(k)}_{n})}, $$
(3)

where \({\bf u}_{n}\) is the state transition tuple \((x_{n+1}, x^{(k)}_{n}, y_{n}).\) Again, \(T_{Y \rightarrow X}(k)\) represents a finite-k approximation.
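Both finite-k measures can be illustrated with a plug-in estimator on discrete time series, where probabilities are replaced by counts of observed tuples. This is a sketch under assumptions: the function names are hypothetical, and counting only works for discrete symbols, so continuous-valued activations would need a density estimator in place of the counters.

```python
from collections import Counter
from math import log2
import random

def _past(series, k, n):
    """History x_n^{(k)} = (x_n, x_{n-1}, ..., x_{n-k+1}) as a tuple."""
    return tuple(series[n - i] for i in range(k))

def active_information_storage(x, k):
    """Plug-in estimate of A_X(k) in bits for a discrete series x."""
    obs = [(_past(x, k, n), x[n + 1]) for n in range(k - 1, len(x) - 1)]
    total = len(obs)
    joint = Counter(obs)
    past = Counter(p for p, _ in obs)
    nxt = Counter(s for _, s in obs)
    return sum(c / total * log2(c * total / (past[p] * nxt[s]))
               for (p, s), c in joint.items())

def transfer_entropy(y, x, k):
    """Plug-in estimate of T_{Y->X}(k) in bits for discrete series y and x."""
    obs = [(x[n + 1], _past(x, k, n), y[n]) for n in range(k - 1, len(x) - 1)]
    total = len(obs)
    full = Counter(obs)                               # (x_{n+1}, past, y_n)
    past_src = Counter((p, s) for _, p, s in obs)     # (past, y_n)
    nxt_past = Counter((nx, p) for nx, p, _ in obs)   # (x_{n+1}, past)
    past = Counter(p for _, p, _ in obs)
    return sum(c / total
               * log2((c / past_src[(p, s)]) / (nxt_past[(nx, p)] / past[p]))
               for (nx, p, s), c in full.items())

alternating = [0, 1] * 50                    # fully predictable from one past state
ais = active_information_storage(alternating, k=1)

rnd = random.Random(0)
src = [rnd.randint(0, 1) for _ in range(2000)]
dst = [0] + src[:-1]                         # destination copies the source with lag 1
te_forward = transfer_entropy(src, dst, k=1)
te_backward = transfer_entropy(dst, src, k=1)
```

The alternating series stores close to one bit, and the copy relation transfers close to one bit from `src` to `dst` but almost nothing in the reverse direction, which shows the asymmetry that distinguishes TE from mutual information.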

Results

To investigate the relation between information transfer, AIS, and criticality in ESNs, we used networks whose reservoir weights were drawn from a normal distribution with mean zero and variance \(\sigma^{2}.\) We changed this parameter between simulations so that log σ varied over the interval [−1.5, −0.5], increasing in steps of 0.1. A more fine-grained resolution was used close to the edge of chaos: between −1.2 and −0.9, we increased log σ in steps of 0.02. We recorded the estimated Lyapunov exponent λ as described in “Estimating the criticality of an input-driven ESN”, the information measures described in the previous section, and a parameter for task performance described below.
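The parameter sweep can be written down as a grid of log σ values plus a function that samples a reservoir for each value. The sketch is illustrative: the function name is hypothetical, and the base of the logarithm is not stated in the text, so base 10 is an assumption exposed as a parameter.

```python
import numpy as np

coarse = np.linspace(-1.5, -0.5, 11)       # steps of 0.1
fine = np.linspace(-1.2, -0.9, 16)         # steps of 0.02 near the edge of chaos
# Round before merging so that shared grid points collapse into one value.
log_sigmas = np.unique(np.round(np.concatenate([coarse, fine]), 2))

def sample_reservoir(log_sigma, n_units, rng, base=10.0):
    """Draw reservoir weights from N(0, sigma^2); `base` is an assumption."""
    sigma = base ** log_sigma
    return rng.normal(0.0, sigma, size=(n_units, n_units))

rng = np.random.default_rng(0)
W = sample_reservoir(-1.0, 150, rng)       # one reservoir for log sigma = -1.0
```

Merging the two grids yields 23 distinct values, since four points of the fine grid coincide with the coarse one.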

The AIS was measured for each reservoir unit, and the TE between each reservoir unit pair. A history size of k = 2 was used in the TE and AIS calculations, and kernel estimation with a fixed radius of 0.2 was used to estimate the required probabilities. We recorded 15,000 data points for each time series after discarding 1,000 steps to get rid of transients. The output weights were trained with 1,000 simulation samples using a one-shot pseudoinverse regression. Input weights were drawn uniformly between [−0.1, 0.1].

We used two common benchmark tasks to evaluate network performance. The first task was used to assess the memory capacity of the networks as defined in Jaeger (2001b). For this task, ESNs with a single input, 150 reservoir nodes, and 300 output nodes were used. The input to the network was a uniformly random time series drawn from the interval [−1, 1]. Each of the outputs was trained on a delayed version of the input signal, i.e., output k was trained on input(t − k), \(k = 1\ldots 300.\) To evaluate the short-term memory capacity, we computed the k-delay memory capacity \(({\text{MC}}_{k})\) defined as

$$ {\text{MC}}_k =\frac{{\text{cov}}^{2}({\mathbf{u}}_{t-k},{\mathbf{o}}_{t})}{\sigma^2({\mathbf{u}}_{t-k})\sigma^{2}({\mathbf{ o}}_{t})}. $$

The actual short-term memory capacity of the network is defined as \({\text{MC}}=\sum_{k=1}^{\infty}{\text{MC}}_{k}.\) However, since we can only use a finite number of output nodes, we limited their number to 300. This provided sufficiently large delays to see a significant drop-off in performance for the tested networks.
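Given the input series and the trained outputs, the truncated sum can be computed directly from the definition. The sketch below assumes a hypothetical layout in which column k − 1 of `outputs` holds the readout trained on delay k and row t corresponds to time t; the function name and the toy data are illustrative.

```python
import numpy as np

def memory_capacity(u, outputs):
    """Return (MC, MC_k) where MC_k is the squared correlation of u_{t-k} and o_t."""
    n_steps, n_delays = outputs.shape
    mc_k = np.zeros(n_delays)
    for k in range(1, n_delays + 1):
        target = u[:n_steps - k]             # u_{t-k} for t = k, ..., n_steps - 1
        out = outputs[k:, k - 1]             # o_t of the readout for delay k
        cov = np.cov(target, out)            # 2x2 covariance matrix
        denom = cov[0, 0] * cov[1, 1]
        mc_k[k - 1] = cov[0, 1] ** 2 / denom if denom > 0 else 0.0
    return float(mc_k.sum()), mc_k

rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, size=500)
# Toy outputs: perfect recall for delays 1 and 2, a constant output for delay 3.
outputs = np.column_stack([np.roll(u, 1), np.roll(u, 2), np.zeros(500)])
mc, mc_k = memory_capacity(u, outputs)
```

Each perfectly recalled delay contributes one unit and the constant output contributes nothing, so the toy example has a capacity of two.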

The second benchmark task we used was a systems modeling task. We trained networks with a single input and 150 reservoir neurons to model a 30th order nonlinear autoregressive moving average (NARMA) system. In this task, the output y(t) of the system is calculated by combining a window of past inputs x(t) (sampled from a uniform random distribution between [0.0, 0.5]) in a highly nonlinear way:

$$ \begin{array}{l} y(t+1)=0.2 y(t) + 0.004 y(t)\sum_{i=0}^{29}y(t-i)\\ \qquad + 1.5x(t-29) x(t)+ 0.001. \end{array} $$
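The recursion can be turned into a generator for the target series. This sketch follows the equation term by term; the function name and the zero initial conditions for the first 30 outputs are assumptions, since the text does not state how the series is started.

```python
import numpy as np

def narma30(x):
    """Generate y from y(t+1) = 0.2 y(t) + 0.004 y(t) sum_{i=0}^{29} y(t-i)
    + 1.5 x(t-29) x(t) + 0.001, with y(0..29) = 0 (assumed)."""
    x = np.asarray(x, dtype=float)
    y = np.zeros(len(x))
    for t in range(29, len(x) - 1):
        window = y[t - 29:t + 1]             # y(t-29), ..., y(t)
        y[t + 1] = (0.2 * y[t]
                    + 0.004 * y[t] * window.sum()
                    + 1.5 * x[t - 29] * x[t]
                    + 0.001)
    return y

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 0.5, size=200)          # inputs sampled from [0.0, 0.5]
y = narma30(x)
```

The first nonzero value, `y[30]`, depends only on `x[0]` and `x[29]` because all earlier outputs are zero, which makes the start-up easy to check by hand.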

The performance for this task was evaluated using the normalized root mean squared error measure:

$$ {\text{NRMSE}} =\sqrt{\frac{\langle (\tilde{y}(t)- y(t))^{2}\rangle_{t}}{\langle(y(t)-\langle y(t)\rangle_{t})^{2} \rangle_{t}}}, $$

where \(\tilde{y}(t)\) is the sampled output and y(t) is the desired output.
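The error measure translates directly into code. The function name is hypothetical; `np.var` with its default settings computes the population variance, which matches the denominator of the formula.

```python
import numpy as np

def nrmse(predicted, desired):
    """Root of the mean squared error divided by the variance of the desired output."""
    predicted = np.asarray(predicted, dtype=float)
    desired = np.asarray(desired, dtype=float)
    mse = np.mean((predicted - desired) ** 2)
    return float(np.sqrt(mse / np.var(desired)))

desired = np.array([0.1, 0.4, 0.2, 0.5, 0.3])
perfect = nrmse(desired, desired)
baseline = nrmse(np.full_like(desired, desired.mean()), desired)
```

A perfect prediction scores 0 and a constant prediction of the target mean scores 1, so values above 1 indicate a model that does worse than predicting the mean.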

The results of the experiments described above are shown in Fig. 3 (left) for the MC task, and in Fig. 3 (right) for the NARMA modeling task. For each value of log σ, the simulations were repeated 50 times (the clusters that can be observed in the figures are the result of slightly different LE values for each of these repetitions). The MC performance in Fig. 3 (left) shows a lot of variance, but a general increase can be seen as the LE approaches the critical value zero. After peak performance is reached very close to this point, the performance drops rapidly. The performance in the NARMA task does not show as much variation. The NRMSE stays around 0.8 for LE values from −0.9 to −0.4. As the LE approaches zero, the NRMSE decreases from around 0.5 to its lowest value of 0.4125 at an LE of −0.081. Shortly after that, however, as the LE approaches zero even more closely, the NRMSE increases sharply and reaches values as high as 142 (at an LE of −0.011). After this peak, the NRMSE values stay at an increased level of about 2.

Fig. 3

Left: Memory capacity versus estimated Lyapunov exponent. Right: Normalized root mean squared error (NRMSE) versus estimated Lyapunov exponent

To arrive at a single value for the TE and AIS per reservoir, we took averages over all the nodes in the reservoir. The TE plots in Fig. 4 and AIS plots in Fig. 5 show very similar behavior for both tasks. Both TE and AIS can hardly be measured for LE values below −0.2. Around the critical point, however, there is a sharp increase in TE/AIS, followed by a sharp decline between LE values 0 and about 0.05. Both quantities stay at a slightly elevated level compared to the values in the stable regime after that, decreasing only slowly.

Fig. 4

Left: Average TE in the reservoir for the memory capacity task versus estimated Lyapunov exponent. Right: Average TE in the reservoir for the NARMA task versus estimated Lyapunov exponent

Fig. 5

Left: Average AIS in the reservoir for the memory capacity task versus estimated Lyapunov exponent. Right: Average AIS in the reservoir for the NARMA task versus estimated Lyapunov exponent

Discussion

The conjecture that computational performance of dynamical systems is maximized at the edge of chaos can be traced back at least to Langton (1990), and a significant number of works have addressed this issue [see Legenstein and Maass (2007b) for a good review]. A number of quantitative studies, including those mentioned in the “Introduction”, have been presented and have helped to elucidate the mechanisms underlying this maximization of computational performance. In this study, we adopt a more general framework and at the same time are able to measure the elements contributing to ongoing computation more directly and in a more localized fashion.

By investigating the information dynamics, this study provides new insights into the problem of relating computation in recurrent neural networks to elements of Turing universal computation, namely information transfer and information storage. Our motivation for this study was to explore why tuning the ESN reservoir to the edge of chaos produces optimal network performance for many tasks. We confirmed previous results (Legenstein and Maass 2007a; Büsing et al. 2010) which have shown that performance peaks at the edge of chaos (for the MC task in our case). We then showed that our information-theoretic approach quantitatively suggests that this is due to maximized computational properties (information storage and transfer) near this state. This also indicates that information transfer and information storage are potential candidates to guide self-organized optimization for the studied (and maybe other) systems (see, however, the points below).

Our results for these information dynamics through the phase transition in ESNs are similar to previous observations of these dynamics through the order-chaos phase transition in Random Boolean Networks (RBNs) (Lizier et al. 2008b). A distinction, however, is that in the RBN study, the information storage was observed to be maximized slightly on the ordered side of the critical point and the information transfer was maximized slightly on the chaotic side of the critical point. This is in contrast to our results here, where both maximizations appear to coincide with criticality. Both results, however, imply maximization of computational properties near the critical state of the given networks. The similarity of the results seems natural on one hand (given similar descriptions of the phase transitions in both systems), but on the other hand these two types of networks are quite different. Here, we used analog activations and connections, whereas RBNs have discrete connections and binary states (supported by Boolean logic). Also, our networks are input driven, and RBNs [in Lizier et al. (2008b)] are not. Since we know that the transition from binary to analog networks can change system dynamics to a very large degree (Büsing et al. 2010), the similarity in results across these network types is intriguing. The implications are also quite interesting, since relevant natural systems in each case are suggested to operate close to the edge of chaos (gene regulatory networks for RBNs, and cortical networks here).

We must place a number of caveats on these results, however. Certainly, the computational capability of the network will be dependent on the input, and we will not find universal behavior through the order-chaos phase transition.

We also note that the network is always performing some computation, and does not need to be at the critical state to do so. While the critical state may maximize computational capabilities, the given task may require very little in terms of computation. For these reasons, it is known that systems do not necessarily evolve to the edge of chaos to solve computational tasks (Mitchell et al. 1993). Moreover, neural networks are applied to a large variety of different tasks, and certainly not all of them will benefit from networks close to criticality. Training a network for fast input-induced switching between different attractors (“multiflop” task), for instance, is known to work best with reservoirs whose spectral radius is small, i.e., those on the very stable side of the phase transition (cf. Jaeger 2001a, Sect. 4.2). Instead of a long memory, this task requires the networks to react quickly to new input. We also see that the networks in the NARMA task show best performance slightly before the phase transition, while performance is actually worst right at the measured edge of chaos. A possible explanation for this might be that the memory in the network actually gets too long. The networks in this task need access to the last 30 inputs to compute a correct output, but if information stays in the reservoir from inputs older than 30 steps, it might interfere with the ongoing computation. Figure 3 (left) for the memory capacity task supports this to some extent, showing that memory capacity reaches values in excess of 30 around the critical point. Lazar et al. (2009) present evidence that when the reservoirs of RNNs are initialized close to the phase transition point and subsequently shaped through a combination of different plasticity mechanisms (IP, synaptic scaling, and a simple version of spike timing dependent plasticity), these mechanisms actually drive the network further away from the critical region toward more stable dynamics. Nonetheless, the resulting networks outperform networks with fixed random reservoirs close to that region, at least for the task they tested (predicting the next character in a sequence).

Further, we see that performance on the two tasks we studied, shown in Fig. 3, is still quite good while the network remains in the ordered regime, even though storage and transfer are not measured to be very high here. This suggests that much of the storage and transfer we measure in the reservoir is not related to the task, an interesting point for further investigation. The effect of different reservoir sizes on the computational capabilities may be interesting to investigate: while the memory capacity increases with the number of reservoir units, the prediction of some time series will only require a finite amount of memory. Adjusting the reservoir size so that the reservoir is exactly large enough for the given task and data may produce networks where the computational capabilities are only dedicated to the task at hand. Also, information transfer between inputs and outputs of the reservoir corresponds to a quantification of computational properties of the task (rather than computational capabilities of the reservoir); taking this into account may complete the picture.

Using these insights to improve or guide the design of the reservoir, beyond confirming that best performance occurs near the edge of chaos, gives a number of opportunities for future study. Levina et al. (2007) present a possible mechanism for self-organized criticality in biologically realistic neuron models, and it would be interesting to examine their results from the information dynamics perspective presented in this article. First steps toward using the information transfer to improve performance of reservoirs have been taken in Obst et al. (2010). There, information transfer of individual units is tuned locally by adapting self-recurrence, dependent on the learning goal of the system. In addition, the information dynamics framework might be useful to gain insight into how the different plasticity mechanisms drive networks away from the edge of chaos in Lazar et al. (2009), but still achieve superior performance.

We emphasize that our main finding is that information storage and transfer are maximized near the critical state, regardless of the resulting performance. Indeed, there is certainly not a one-to-one correspondence between either of the information dynamics and performance. We also note the results of Lizier et al. (2010), showing that maximizing these functions in other systems does not necessarily lead to complex behavior.

Therefore, our results represent a promising starting point for an understanding of the individual computational properties of ESN nodes. However, there is certainly much work remaining in exploring how these properties can be guided to best support network computation.