1 Introduction

About ten years ago, a new trend in understanding, training, and using Recurrent Neural Networks (RNNs) was started with Echo State Networks (ESNs) [19, 21] and Liquid State Machines (LSMs) [34]. While the former came from the field of Machine Learning (ML) and the latter from computational neuroscience, both approaches share the same basic idea. It stems from the observation that, as long as an RNN possesses certain generic properties, supervised adaptation of all interconnection weights is not necessary: training only a memoryless supervised readout from it is enough to obtain excellent performance in many tasks (Fig. 1). The RNN is called a reservoir in this context.

Fig. 1: The difference between a full gradient descent training of an RNN (A) and the ESN training (B)

This was a welcome discovery, since training RNNs has always been much more difficult than training feedforward neural networks. Cyclic dependencies in RNNs lead to bifurcations during training: infinitesimally small changes to RNN parameters can lead to drastic, discontinuous changes in its behavior. This phenomenon may render classical gradient descent RNN training methods (like [52, 53]) non-converging [11]. Even if they do converge, the process is typically slow, computationally expensive, requires careful selection of learning parameters, and ends in a local minimum. Learning long-term dependencies in the data is hard [2] (but see [15] for an RNN architecture specialized on learning such dependencies, and [35] for recent progress in generic RNNs). Because of the complexity and computational costs, the number of neurons used in so-trained RNNs has typically been on the order of tens, which in turn limits their expressive capacity.

The approach started by ESNs and LSMs reinvigorated interest in RNN research and applications, a stream which became collectively known as Reservoir Computing (RC) [49]. The field now encompasses many related methods and extensions of the original idea (see [30] for an extensive overview; http://reservoir-computing.org is a web portal collectively maintained by leading RC groups). We will mention a few selected variants here, but let us start with the original basic ESN RC approach.

2 The Basic ESN Approach

Here are the update equations of a typical RNN used in ML with leaky-integrated discrete-time continuous-value units:

$$ \tilde{\mathbf{x}}(n)=\tanh \bigl({\mathbf{W}^{\mathrm{in}}}\bigl[1;\mathbf{u}(n)\bigr]+\mathbf{W}\mathbf{x}(n-1) \bigr), $$
(1)
$$ \mathbf{x}(n)=(1-\alpha)\mathbf{x}(n-1)+\alpha\tilde{\mathbf{x}}(n), $$
(2)

where n is discrete time, \(\mathbf{u}(n) \in{\mathbb {R}}^{N_{\mathrm{u}}}\) is the input signal, \(\mathbf{x}(n) \in{\mathbb{R}}^{N_{\mathrm{x}}}\) is a vector of reservoir neuron activations and \(\tilde{\mathbf{x}}(n) \in{\mathbb{R}}^{N_{\mathrm {x}}}\) is its update, all at time step n, tanh(⋅) is applied element-wise, [⋅;⋅] stands for a vertical vector concatenation, W in and W are the input and recurrent weight matrices respectively, and α∈(0,1] is the leaking rate. The model is also frequently used without the leaky integration, which is a special case obtained by setting α=1 and thus \(\tilde{\mathbf{x}}(n) \equiv\mathbf {x}(n)\). The linear readout layer is defined as

$$ \mathbf{y}(n)={\mathbf{W}^{\mathrm{out}}}\bigl[1;\mathbf{u}(n);\mathbf{x}(n) \bigr], $$
(3)

where \(\mathbf{y}(n) \in{\mathbb{R}}^{N_{\mathrm{y}}}\) is the network output and W out is the output weight matrix. An additional nonlinearity can be applied to y(n) in (3), and feedback connections W fb from y(n−1) to \(\tilde{\mathbf{x}}(n)\) can be added in (1).
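To make the notation concrete, here is a minimal NumPy sketch of the update equations (1)–(2) and the readout (3); the weight matrices and the leaking rate α are assumed to be given, and names such as `esn_step` are our own, not from the cited works.

```python
import numpy as np

def esn_step(x, u, W_in, W, alpha):
    """One reservoir update, equations (1) and (2)."""
    # Eq. (1): candidate state from bias, current input, and previous state
    x_tilde = np.tanh(W_in @ np.concatenate(([1.0], u)) + W @ x)
    # Eq. (2): leaky integration of the previous state and the update
    return (1.0 - alpha) * x + alpha * x_tilde

def esn_readout(x, u, W_out):
    """Linear readout, equation (3): y(n) = W_out [1; u(n); x(n)]."""
    return W_out @ np.concatenate(([1.0], u, x))
```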

The original method of RC introduced with ESNs [19] is to:

  • generate a large random reservoir (W in,W,α);

  • run it using the training input u(n) and collect the corresponding reservoir activation states x(n);

  • compute the linear readout weights W out from the reservoir using linear regression, minimizing the mean square error of the network output w.r.t. the training target signal y target(n);

  • use the trained network on new input data u(n) by computing y(n) employing the trained output weights W out.

Let us look at these steps in more detail.

For the approach to work, the reservoir must possess the echo state property, which can roughly be described as a fading memory of the input: trajectories of the reservoir state should converge given the same input, irrespective of the previous history. This is typically ensured by appropriately scaling the recurrent connection weights W [19]. A few other parameters, most importantly the scaling of the input weights W in and the leaking rate α, should also be adjusted for an optimal validation performance in a given task.
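In practice the scaling of W is often done by rescaling a randomly generated matrix to a target spectral radius around or below unity; a minimal sketch under that common convention, where the target radius `rho` and the input scaling are hyperparameters to be tuned on validation data:

```python
import numpy as np

def random_reservoir(N_x, N_u, rho=0.9, input_scaling=1.0, seed=None):
    """Generate random (W_in, W), rescaling W to spectral radius rho."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(N_x, N_x))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # set the spectral radius of W
    W_in = input_scaling * rng.uniform(-1.0, 1.0, size=(N_x, 1 + N_u))  # bias + input columns
    return W_in, W
```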

While running the generated model with training data, the vectors [1;u(n);x(n)] as in (3) are collected into a matrix X and the desired teacher targets y target(n) into a matrix Y target, both having a column for every training time step n. The training is typically done by computing the output weights via ridge regression

$$ {\mathbf{W}^{\mathrm{out}}}=\mathbf{Y}_{\mathrm{target}}\mathbf {X}^{{\mathrm{T}}} \bigl(\mathbf{X}\mathbf{X}^{{\mathrm{T}}}+ \gamma^2 \mathbf{I} \bigr)^{-1}, $$
(4)

where I is the identity matrix and γ is a regularization parameter. For optimal results γ should also be selected through validation; note that the network needs no rerunning with a different γ to recompute W out. By avoiding training of the RNN connections W, the learning is done in a single pass through the training data and the optimal output weights W out are computed with high precision using the closed-form solution (4). This also enables the practical use of reservoirs with sizes of thousands or even tens of thousands of units on contemporary computers [46]. Also note that Y target X T and XX T in (4) can be computed incrementally and stored in memory, instead of Y target and X, for arbitrarily long training data sequences. Alternatively, W out can be continuously adapted by an online learning algorithm [19].
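A minimal sketch of this batch training step, built on the `esn_step` sketch above: the extended states are collected into X and the output weights are computed via the ridge regression (4). In practice one would typically also discard an initial transient of states before collecting, a detail omitted here.

```python
import numpy as np

def train_esn(inputs, targets, W_in, W, alpha, gamma):
    """Collect extended states [1; u(n); x(n)] into X and solve Eq. (4) for W_out."""
    x = np.zeros(W.shape[0])
    cols = []
    for u in inputs:                          # inputs: sequence of u(n) vectors
        x = esn_step(x, u, W_in, W, alpha)
        cols.append(np.concatenate(([1.0], u, x)))
    X = np.array(cols).T                      # shape (1 + N_u + N_x) x T
    Y = np.array(targets).T                   # shape N_y x T
    # Eq. (4): ridge regression; Y X^T and X X^T could also be accumulated incrementally
    return Y @ X.T @ np.linalg.inv(X @ X.T + gamma**2 * np.eye(X.shape[0]))
```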

Such simple and efficient RNN training was demonstrated to outperform fully-trained RNNs in many benchmark tasks, e.g., [17, 22, 23, 46, 50]. Some examples of applications are presented in Sect. 7.

3 Perspectives on RC

The principles of RC can be perceived from several different perspectives.

There are certain parallels between RC and kernel methods in ML. The reservoir can be seen as a nonlinear high-dimensional expansion x(n) of the input signal u(n). For classification tasks, input data u(n) which are not linearly separable in the original space \({\mathbb{R}}^{N_{\mathrm{u}}}\) often become linearly separable in the expanded space \({\mathbb{R}}^{N_{\mathrm{x}}}\) of x(n), where they are separated by W out. At the same time, the reservoir serves as a memory, providing the temporal context. The “kernel trick” is typically not used in RC; however, it is possible to define recursive, temporal context-sensitive kernels that integrate over a continuum of W in and W, which can be used as in regular Support Vector Machines (SVMs), but for sequence data [13]. SVM-style readouts can also be trained from the reservoirs [41].

The separation between the fixed reservoir and the adaptive readout can also be arrived at when analyzing the dynamics of a full gradient descent RNN training, and optimizing it. In an efficient version of gradient descent RNN training introduced by Atiya and Parlos [1] the output weights W out are adapted much more than W and W in [38], which led to a further optimization where they remain constant, an RC method called BackPropagation-DeCorrelation (BPDC) [43]. BPDC is an online RNN learning algorithm which runs with O(N x) time complexity.

From a biological perspective, RC gives a simple and yet powerful interpretation of how generic cortical circuits with no well-understood supervised adaptation can be utilized for purposeful computation [34]. Reservoirs also correspond well to how temporal information is spatially encoded in the brain and provide a context for interpreting current inputs [6]. Fixed RNNs had been employed as parts of sensory-motor sequence [8] and speech [10] learning architectures even before the original ESN and LSM publications.

Another advantage of RC is that the same reservoir can be used as a generic computational substrate for multiple tasks concerning the same input. For each task a new readout can be learned independently and without interfering with what has been learned before. This might have potential for general-purpose artificial intelligence mechanisms and corresponds well to natural intelligence.

4 Other RC Approaches

Despite the success of the original ESN approach described in Sect. 2, many extensions, modifications, and improvements are possible.

For example, intuitively, there should be something better than a random reservoir. The error landscape of the RNN parameter space for a given task is usually notoriously complicated (this is why gradient descent is difficult). The probability of exactly hitting the global, or even a local, minimum in this landscape by picking a random point is virtually equal to zero. The slight but ever-present variations in performance among randomly sampled reservoirs confirm this.

The linear readout is also quite limited in its expressive power.

Guided by such intuitions, the modern field of RC has substantially widened and differentiated. It has moved away from the initial paradigm of having a fixed RNN and training only the output. However, what still sets the RC approach apart from other RNN training methods is that the recurrent part (the reservoir) is generated or trained differently than the readout. This has become the modern paradigm of RC.

The RC paradigm of separating the reservoir and readout training allows for these two research directions to be pursued virtually independently, and the best results from both to be combined. There are numerous different methods proposed in the literature for both of the directions.

Output training is in essence a standard ML problem, where virtually any method capable of learning an input-to-output mapping can be employed, with its respective strengths and weaknesses.

For the reservoir part, there has also been a large number of proposals in the literature. They can roughly be classified into three categories:

  • Generic methods for generating RNNs with different neuron models, connectivity patterns and dynamics;

  • Unsupervised adaptation of the reservoir, based on the input data u(n), but not y target(n);

  • Supervised or semi-supervised adaptation of the reservoir (e.g., via reinforcement learning), using task-specific information from both u(n) and y target(n), but exploiting it differently than for the readout training.

Since the readout training can be very efficient, the quality of a reservoir for a particular task can be tested quickly by measuring the error of the readout. This makes RC a convenient and popular testing ground for many types of RNN models, topologies, unsupervised, reinforcement, and biologically inspired adaptation algorithms.

Most of these different approaches are reviewed in [30], updated in [29]. For the sake of brevity only a few of those have been mentioned here.

The numerous proposed RC modifications have introduced multiple improvements, often case-specific, extending the power of RC to new domains and offering new insights into the workings of RNNs. The original ESN approach of Sect. 2, however, still holds its ground for its combination of simplicity and power.

5 Beyond Neural Networks

The RC principle can also be seen as a strategy to implement useful computations on generic dynamical systems, treating them as reservoirs, either in simulations or even in physical instantiations. Thus RC has spread well beyond the world of artificial neural networks. In particular, it enables useful computation on hardware platforms where, e.g., it is hard or just impractical to implement equivalents of basic electronic logic gates and memory cells. Potential and functioning examples include: analog electronics [39, 40], randomly crystallized nonlinear electronic networks [44], opto-electronic [26, 36] and optical [47] systems, or just a bucket filled with water [12]. Readouts from such reservoirs are typically implemented in more conventional ways.

Many of these directions are very active research areas. In the long run such non-neural, physical reservoirs might significantly complement or even, for some domains, replace the omnipresent digital electronic computers.

6 Training of the Dynamics

Even if the reservoir is kept fixed, for some tasks the trained readouts are fed back to the reservoir, and thus the training process changes its dynamics. In other words, a recurrence exists between the reservoir and the readout. Pattern generation is a typical example of such a task. This is either realized by feedback connections W fb from the trained output y(n−1) to the reservoir \(\tilde{\mathbf{x}}(n)\), or by looping the output y(n−1) back as an input u(n) for the next update step n in a predictive generator mode in (1). Note that these two options are equivalent and just a matter of notation: u(n) and W in instead of y(n−1) and W fb. In some cases, however, both external input and output feedback can be present.

This extends the power of RC, because it no longer relies on fixed random input-driven dynamics to construct the output, but the dynamics are adapted to the task. This power has its price, because stability issues arise here. In order to avoid falling prey to the same difficulties as with full RNN training algorithms, two strategies are used in RC.

The first strategy is to disengage the recurrent relationship between the reservoir and the readout using teacher forcing and treat output learning as a feedforward task. This is done by feeding the desired output y target(n−1) through the feedback connections W fb (or W in) instead of the real output y(n−1) while learning. The target signal y target(n) “bootstraps” the learning process and if the output is learned with high precision (i.e., y(n)≈y target(n)), the recurrent system runs much the same way with the real y(n) in feedbacks after training as it did with y target(n) during training.
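A minimal sketch of this strategy in the generator setting, reusing the `esn_step` and `train_esn` sketches from Sect. 2 and treating y(n−1) as the input u(n); the names and layout are our own conventions:

```python
import numpy as np

def train_generator(y_target, W_in, W, alpha, gamma):
    """Teacher forcing: drive the reservoir with y_target(n-1), learn to produce y_target(n)."""
    return train_esn(y_target[:-1], y_target[1:], W_in, W, alpha, gamma)

def generate(n_steps, y0, W_in, W, W_out, alpha):
    """After training, run autonomously: loop the real output y(n-1) back as input."""
    x, y, ys = np.zeros(W.shape[0]), y0, []
    for _ in range(n_steps):
        x = esn_step(x, y, W_in, W, alpha)
        y = W_out @ np.concatenate(([1.0], y, x))
        ys.append(y)
    return ys
```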

There are some caveats here. The approach works very well if the output can be learned precisely [19]. However, if this is not the case, the distorted feedback leads to an even more distorted output and feedback at the next time step, and so on, with the actual generated output y(n) quickly diverging from the desired y target(n). Even with well-learned outputs the dynamical stability of the autonomous running system is often an issue. In both cases the problem is alleviated by some kind of regularization of the weights or “immunization” of the state and/or the feedbacks with noise.

The second strategy is using a specialized RC learning algorithm to train the outputs W out while the real feedbacks are present. The aforementioned BPDC algorithm is an efficient online option with an optimal time complexity [43]. A recent approach named FORCE learning uses a powerful 2nd-order online learning algorithm to vigorously adapt W out in the presence of the real feedbacks [45]. By the initial fast and strong adaptation of W out the feedbacks y(n) are kept close to the desired y target(n) from the very beginning of the learning process, similar to teacher forcing. It appears that FORCE learning is well suited to yield very stable and accurate neural pattern generators.
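To illustrate what such a 2nd-order online update looks like, here is a schematic recursive-least-squares step in the spirit of FORCE learning; it is a sketch under our own conventions (r denotes the extended state vector used by the readout, P a running inverse correlation estimate), not a faithful reproduction of the algorithm in [45].

```python
import numpy as np

def force_step(W_out, P, r, y_target):
    """One schematic RLS update of the readout while the real feedbacks are running."""
    k = P @ r
    c = 1.0 / (1.0 + r @ k)
    P -= c * np.outer(k, k)        # update the inverse correlation estimate of the states
    e = W_out @ r - y_target       # error committed with the real feedback in place
    W_out -= c * np.outer(e, k)    # strong immediate correction keeps y(n) near y_target(n)
    return W_out, P
```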

7 Applications

RC methods have been widely employed in more or less academic applications. The nature of these applications spans all kinds that are amenable to supervised modeling of temporal systems, e.g., temporal pattern classification, temporal pattern generation, time series prediction, timing, routing, memorizing, or controlling nonlinear systems. We refrain from giving an ad hoc selection here; googling “echo state network application” will retrieve a few hundred relevant instances.

Useful hints for setting up RC learning systems for practical tasks are given in [20, 48]. It should be clearly appreciated that, as always in machine learning, achieving very good results requires experience, experimentation, and insight into the nature of the respective task. Furthermore, an understanding of the basic principles of machine learning is a necessary precondition. Specifically, an insightful use of cross-validation and regularization is key for good performance. RC is not a miracle method that can be used out-of-the-box and then be expected to excel.

Instead of attempting a comprehensive overview, we will highlight a number of applications in which the authors have been (or still are) personally involved.

Speech Recognition

One of the textbook examples of temporal sequence recognition is speech recognition. ESNs and LSMs have been applied to this domain early on. The first approaches focused specifically on isolated recognition of Japanese vowels [20] and digits [49, 51]. The first attempt at continuous speech recognition was based on a rather atypical setup: a large committee of predictive classifiers using ESNs [42]. It showed good results on a benchmark dataset, but due to the use of a custom acoustic front-end, it is not trivial to compare to state-of-the-art work. More recently, in the European FP7 project ORGANIC (http://reservoir-computing.org/organic), which set out to establish neurodynamical architectures as a viable alternative to statistical methods for speech and handwriting recognition, different approaches to speech recognition have been applied. In [46] it was demonstrated that competitive phoneme recognition rates can be achieved using a straightforward application of the ESN setup on a hard benchmark dataset. Based on this front-end, ESN-HMM hybrids are currently being investigated to realize word recognition, with excellent results. Research on noise-robust recognition using ESNs [24] also demonstrates that they perform better than classic HMM approaches.

Handwriting Recognition

Handwriting recognition is in many respects very similar to speech recognition, and traditionally similar computational approaches have been employed [5]. Therefore it is no coincidence that the Organic project also hosts an industrial partner who develops text recognition solutions, e.g., for car number plate reading (easy) or address recognition in automated postal parcel sorting plants (difficult). This partner, Planet intelligent systems GmbH, has been developing ESN-based recognition modules in a long-standing co-operation with the Machine Learning group at Jacobs University. Important customers of Planet’s parcel sorting technology are FedEx and the US Postal Service. ESN-based offline text recognition functions by scanning the text with a virtual linear camera from left to right, obtaining a time series of pixel vectors, which is passed to a hierarchical reservoir recognizer architecture. On subsequent layers, increasingly aggregate “chunks” are recognized (e.g., letters → words). Importantly, no explicit segmentation routine is necessary (“segmentation-free” processing). The different layers are trained individually in a supervised way, which requires training data that are teacher-annotated on each representational level. Planet seeks collaboration with academic partners, and, quite remarkably, allows scientific results which emerge from such collaborations to be published (e.g., [23, 31]). Planet furthermore has made its very large annotated training dataset available to the scientific community as a benchmark (http://reservoir-computing.org/organic/benchmarks/294).

Robot Motor Control

ESNs can be conveniently trained as deadbeat controllers for nonlinear plants. The setup for such controllers is detailed in the original ESN patent document [18] and was first employed in practice for the tracking control of omniwheel Robocup robots at Fraunhofer AIS (now Fraunhofer IAIS) [37]. The training principle is to feed the controller ESN with the current plant output observation and an n-timestep-delayed version of the same, which enables the ESN to acquire an nth order model of the plant. In exploitation, the direct plant output feedback channel is replaced by the reference signal, while the input that received the delayed output observation during training now receives the direct observations. In a very different way, ESNs are currently being explored as neural pattern generators for the humanoid iCub robot (http://www.icub.org/) within the European FP7 project AMARSi (http://www.amarsi-project.eu/). Here, the objective is to obtain neural pattern generators which can be modulated by higher-level control input, e.g., in order to adapt the frequency, amplitude, offset, phase, or waveform of the generated pattern. Modulatable neural pattern generation is an extensive field of research [16]. The innovation offered by ESNs is a generic learning mechanism by which an existing neural pattern generator can acquire essentially arbitrary novel modes of modulatability by learning [28].
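To illustrate the input wiring described above for the deadbeat controller, a hypothetical sketch; the names, the fixed delay value, and the signal layout are our own and not taken from [18]:

```python
import numpy as np

N_DELAY = 5  # hypothetical delay n; in practice chosen to match the plant order

def controller_input_training(plant_out, t):
    """Training: the current plant output and its n-step-delayed version."""
    return np.concatenate((plant_out[t], plant_out[t - N_DELAY]))

def controller_input_exploitation(reference, plant_out, t):
    """Exploitation: the reference replaces the direct feedback channel,
    while the formerly delayed channel now receives the direct observation."""
    return np.concatenate((reference[t], plant_out[t]))
```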

Financial Forecasting

Here is an episode worth telling. In a graduate seminar held at Jacobs University in 2007, a group of five students with no previous exposure to machine learning engaged in an international financial time series prediction contest (http://neural-forecasting-competition.com/NN3/). The competition data consisted of a set of 111 time series of very diverse nature (it was part of the challenge to develop versatile predictors). Within 3 months, the students acquired the basic knowledge of standard data preprocessing methods used in the field, applied them to the raw data, developed ESN predictors, implemented them, submitted their predictions, and won the contest against competitors with years of professional experience in financial forecasting [17]. An informal account of this story is given at http://minds.jacobs-university.de/teaching/highlights. The predictions were obtained by combining the outputs of ensembles of 500 independently created reservoirs whose sizes ranged around 100 units. This episode underlines the simplicity of RC modeling and its motivational capacity in education as much as it illustrates its modeling performance.

Medical

Ghent University has been actively pursuing the use of ESNs in bio-medical applications with great success. ESNs have been applied to real-time detection of epileptic seizures with very low latency and high accuracy, outperforming the state of the art [7]. This technology would enable treatments for epilepsy that are based on closing the loop: rapidly detecting the seizure and actively counteracting it using, e.g., medication or brain stimulation. Based on the good results on seizure detection, we also started investigating various forms of Brain Computer Interfaces (BCIs). The most notable result here was that ESNs are very good at detecting the so-called “rest state”: the interval between specific thoughts. Combining ESNs with Common Spatial Patterns led to state-of-the-art results in motor imagery BCI [25].

Here we mentioned only applications in engineering and machine learning. This is one of two main directions of utilizing RC, the other being to model biological phenomena in the cognitive and neurosciences. This is often done with more biologically plausible reservoirs made up of spiking neurons, and is mostly associated with the “liquid state machine” flavor of RC. Pioneers in this area are Peter F. Dominey and Wolfgang Maass. Dominey was actually the first to explicitly spell out the RC principle, as early as 1995 [8], and ever since he has continued to extend and refine his models of the cortico-striatal processing loop for temporal sequence learning (e.g., [9, 10, 14]). Maass et al. widely explored the RC principle to understand generic computational properties of cortical microcircuits (e.g., [32–34]). Recently he and his group have added reinforcement learning [27] and Bayesian inference [4] to the picture of microcircuit adaptation. RC principles have been taken up by other leading researchers in computational neuroscience (e.g., [3, 45]).

8 Resources

Leading European RC groups jointly maintain an RC web portal at http://reservoir-computing.org. Here potential users can find introductory tutorials, an extensive bibliography, an option to subscribe to an RC mailing list, and links to a choice of RC tools. Among the latter we want to point out the OGER engine, a very comprehensive Python-based toolbox with interfaces to a number of standard (spiking) neural simulators (supporting the computational neuroscience branch of RC) and numerous pre-installed validation, regularization, and optimization methods supporting the machine learning side of RC. This engine has been developed within the Organic FP7 project.