Abstract
Reservoir Computing (RC) is a high-speed machine learning framework for temporal data processing. In particular, the Echo State Network (ESN), one of the RC models, has been successfully applied to many temporal tasks. However, its prediction ability depends heavily on the hyperparameter values. In this work, we propose a new ESN training method inspired by Generative Adversarial Networks (GANs). Our method aims to minimize the difference between the distribution of the teacher data and that of the generated samples, so that the generated samples reflect the dynamics underlying the teacher data. We employ a feedforward neural network as the discriminator so that backpropagation through time is not needed in training. We demonstrate the effectiveness of the proposed method in time series prediction tasks.
Keywords
- Echo State Network
- Recurrent Neural Network
- Generative Adversarial Network
- Nonlinear time series prediction
1 Introduction
Reservoir Computing (RC) has been widely researched as a fast machine learning method. RC is a Recurrent Neural Network (RNN) based framework that trains only the linear readout layer and fixes the parameters in the other layers before training. RC shows excellent performance in various benchmark tasks despite its simple training algorithm. Moreover, another major advantage is that RC is suitable for hardware implementation with a wide variety of physical systems [8].
In particular, Echo State Networks (ESNs), initially proposed by Jaeger [2], have been shown to be useful in nonlinear time series prediction [4]. The underlying principle is that a well-trained ESN is able to reproduce the attractor of a given nonlinear dynamical system. However, the performance of ESNs is significantly sensitive to the settings of hyperparameters such as the input scaling and the spectral radius of the reservoir connection weight matrix. In physical implementations, it is hard to change and adjust such hyperparameters. Therefore, reducing the hyperparameter sensitivity solely by changing the training method is an important research topic.
In this work, we incorporate the concept of Generative Adversarial Networks (GANs) [1] into ESNs to address this problem. A GAN consists of two networks: a discriminator that distinguishes between teacher data and generated samples, and a generator that tries to deceive the discriminator. The original GAN minimizes the Jensen-Shannon divergence between the real data distribution and the generated one instead of minimizing the squared error. In our method, the generator is an ESN, and a Deep Neural Network (DNN) based discriminator distinguishes between the real time series data and those generated by the ESN. We then use the weighted sum of the conventional squared error and the adversarial loss. By introducing the adversarial loss, the ESN is expected to generate samples that reflect the dynamics underlying the given data better than the conventional ESN training based on the least squared error.
The proposed method has three major advantages. First, the prediction accuracy can be improved even when the settings of the hyperparameters in the ESN are inappropriate. Simply by introducing the adversarial loss in the training step, we can construct a high-quality predictor even with ‘bad’ reservoirs. Second, the computational cost of training in our method is much smaller than that of RNNs. We use a simple feedforward neural network as the discriminator, instead of a temporal model like an RNN, to avoid the computationally expensive backpropagation through time (BPTT) [9] in training. At the same time, we introduce the concept of time-delay embeddings to construct the input to the discriminator, so that the discriminator can capture time-dependent features without BPTT. Third, the only trained parameters in the ESN are those in the readout layer, and therefore our method can be applied to other types of reservoir models and to physical reservoirs.
We demonstrate the effectiveness of our method on benchmark nonlinear time series prediction tasks. In particular, when the hyperparameter settings are not appropriate, our training method outperforms the conventional one.
2 Methods
2.1 Echo State Network
The ESN uses an RNN-based reservoir composed of discrete-time artificial neurons. The time evolution of the reservoir state vector is described as follows:
\[
{\varvec{r}}(t+1) = \tanh \left( {\varvec{A}}{\varvec{r}}(t) + {\varvec{W}}_{in}{\varvec{u}}(t) \right),
\]
where \({\varvec{r}}(t)\) denotes the state vector of the reservoir units, \({\varvec{u}}(t)\) is the input, \({\varvec{W}}_{in}\) is the weight matrix of the input layer, and \({\varvec{A}}\) is the reservoir connection weight matrix. The readout layer gives the linear combination of the states of the reservoir neurons, \(\hat{{\varvec{y}}}(t) = {\varvec{W}}_{out}{\varvec{r}}(t)\), where \({\varvec{W}}_{out}\) denotes the weight matrix of the readout layer.
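As a concrete illustration, the reservoir update can be sketched in a few lines of NumPy. The dimensions, weight distributions, and spectral-radius rescaling below are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: N reservoir units, K-dimensional input.
N, K = 100, 1

# Fixed random weights: input matrix W_in and reservoir matrix A.
W_in = rng.uniform(-1.0, 1.0, size=(N, K))
A = rng.uniform(-1.0, 1.0, size=(N, N))
# Rescale A so that its spectral radius becomes 0.8 (an example value).
A *= 0.8 / np.max(np.abs(np.linalg.eigvals(A)))

def step(r, u):
    """One reservoir update: r(t+1) = tanh(A r(t) + W_in u(t))."""
    return np.tanh(A @ r + W_in @ u)

# Drive the reservoir for one step from the zero state.
r = step(np.zeros(N), np.array([0.5]))
```

Only `W_in` and `A` are fixed at initialization; the readout weights are the sole trained parameters.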
In the training period \(-T \le t < 0\), the readout weight is adjusted so as to minimize the squared error between the predicted value \(\hat{{\varvec{y}}}(t)\) and the teacher output \({{\varvec{y}}}(t)\) as follows:
\[
{\varvec{W}}_{out} = \mathop{\mathrm{arg\,min}}\limits_{{\varvec{W}}} \left( \sum _{t=-T}^{-1} \left\Vert {\varvec{W}}{\varvec{r}}(t) - {\varvec{y}}(t) \right\Vert ^2 + \beta \left\Vert {\varvec{W}} \right\Vert ^2 \right),
\]
where \(\beta > 0\) is the Tikhonov regularization parameter to avoid overfitting.
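This regularized least-squares problem has the standard closed-form solution \({\varvec{W}}_{out} = {\varvec{Y}}{\varvec{R}}^{\mathsf{T}}({\varvec{R}}{\varvec{R}}^{\mathsf{T}} + \beta {\varvec{I}})^{-1}\), where the columns of \({\varvec{R}}\) and \({\varvec{Y}}\) collect the reservoir states and teacher outputs. A minimal NumPy sketch with toy, hypothetical data:

```python
import numpy as np

def train_readout(R, Y, beta=1e-4):
    """Tikhonov-regularized least squares (ridge regression):
    W_out = Y R^T (R R^T + beta I)^{-1}.
    R: (N, T) reservoir states over the training period.
    Y: (M, T) teacher outputs."""
    N = R.shape[0]
    return Y @ R.T @ np.linalg.inv(R @ R.T + beta * np.eye(N))

# Toy check: recover a known linear readout from noiseless states.
rng = np.random.default_rng(1)
R = rng.standard_normal((5, 200))
W_true = rng.standard_normal((1, 5))
Y = W_true @ R
W_out = train_readout(R, Y, beta=1e-8)
```

With noiseless data and a tiny `beta`, the solution recovers `W_true` up to numerical precision.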
2.2 Echo State Network with Adversarial Training
In our method, adversarial training [6] is used to optimize the readout weight in the ESN. Figure 1 shows the architecture of the proposed model. The discriminator D is trained such that its output represents the probability that the input is drawn from the teacher data. \({\varvec{u}}(t)\) and \(\hat{{\varvec{y}}}(t)\) are embedded into the same time-delay coordinates, and the concatenation of them is fed to the discriminator as
\[
{\varvec{v}}(t) = \left[ {\varvec{u}}(t), {\varvec{u}}(t-\tau ), \ldots , {\varvec{u}}(t-(n-1)\tau ), \hat{{\varvec{y}}}(t), \hat{{\varvec{y}}}(t-\tau ), \ldots , \hat{{\varvec{y}}}(t-(n-1)\tau ) \right],
\]
where \(n\) and \(\tau \) represent the dimension and the time delay of the time-delay coordinates, respectively.
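A time-delay embedding of this kind can be sketched as follows; the toy series, `n`, and `tau` values are illustrative only:

```python
import numpy as np

def delay_embed(x, n, tau):
    """Stack [x(t), x(t-tau), ..., x(t-(n-1)tau)] for each valid t.
    x: 1-D series of length T; returns shape (T - (n-1)*tau, n)."""
    T = len(x)
    start = (n - 1) * tau
    return np.stack([x[start - k * tau : T - k * tau] for k in range(n)],
                    axis=1)

# The discriminator input concatenates the embeddings of u and y_hat.
u = np.arange(10.0)
y_hat = np.arange(10.0) * 2
n, tau = 3, 1
v = np.hstack([delay_embed(u, n, tau), delay_embed(y_hat, n, tau)])
```

Each row of `v` corresponds to one time index and feeds the feedforward discriminator, so no recurrence (and hence no BPTT) is needed on the discriminator side.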
We define the discriminator loss as follows:
\[
L_D = - E\left[ \log D({\varvec{v}}_{real}(t)) \right] - E\left[ \log \left( 1 - D({\varvec{v}}_{fake}(t)) \right) \right],
\]
where \({\varvec{v}}_{real}(t)\) and \({\varvec{v}}_{fake}(t)\) denote the embedded vectors constructed from the teacher output and from the ESN output, respectively.
Then we define the generator loss as follows:
\[
L_G = w_D E_{L_{ADV}} + (1 - w_D) E_{L_{SE}},
\]
where \(E_{L_{ADV}}\) and \(E_{L_{SE}}\) denote the expectation values of the adversarial loss \(L_{G}^{ADV} = -\log D({\varvec{v}}_{fake}(t))\) and the squared error \(L_{G}^{SE}\), respectively, and \(w_D\) is the weight of the adversarial loss in the generator loss. The discriminator loss \(L_D\) and the generator loss \(L_G\) are minimized alternately. Note that only \({\varvec{W}}_{out}\) is optimized in the training of the generator. The procedure is formally presented in Algorithm 1.
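The alternating minimization can be sketched in plain NumPy. This is a deliberately simplified stand-in, not the paper's implementation: the four-layer ReLU discriminator is replaced by a scalar logistic unit, the delay embedding is omitted, the weighting is assumed to be the convex combination \(w_D \cdot \mathrm{adv} + (1 - w_D) \cdot \mathrm{se}\), and exact gradients are replaced by finite differences:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy stand-ins: fixed reservoir states R and a teacher series y.
N, T = 5, 50
R = rng.standard_normal((N, T))
y = rng.standard_normal(T)
W_out = np.zeros((1, N))             # the only trained ESN parameter
theta = np.zeros(2)                  # discriminator params [w, b]
w_D, lr, eps = 0.25, 0.05, 1e-5

def D(v, th):
    """Scalar logistic discriminator (stands in for the ReLU network)."""
    return sigmoid(th[0] * v + th[1])

def L_D(th):
    """Discriminator loss: real samples up, generated samples down."""
    y_hat = (W_out @ R).ravel()
    return (-np.mean(np.log(D(y, th) + 1e-9))
            - np.mean(np.log(1.0 - D(y_hat, th) + 1e-9)))

def L_G(W):
    """Generator loss: weighted sum of adversarial loss and squared error."""
    y_hat = (W @ R).ravel()
    adv = -np.mean(np.log(D(y_hat, theta) + 1e-9))
    se = np.mean((y_hat - y) ** 2)
    return w_D * adv + (1.0 - w_D) * se

def num_grad(f, x):
    """Central finite-difference gradient (sketch only; slow but simple)."""
    g = np.zeros_like(x, dtype=float)
    for i in np.ndindex(x.shape):
        d = np.zeros_like(x, dtype=float)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

for _ in range(20):                  # alternate D and G updates
    theta -= lr * num_grad(L_D, theta)
    W_out -= lr * num_grad(L_G, W_out)
```

In practice the discriminator would be a proper network trained with automatic differentiation; the key point mirrored here is that only `W_out` is updated on the generator side.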
3 Results
We demonstrate the effectiveness of the proposed method in prediction tasks with two benchmark time series, NARMA10 and the Lorenz system. In both experiments, we set the reservoir size at 100 and the sparsity of the connection weight matrix at 0.95. Before adversarial training, we pretrained the output weight of the ESN using Tikhonov regularization with \(\beta =10^{-4}\). The discriminator is a feedforward network consisting of four hidden layers of 32 ReLU units, and we set \(n=20\).
3.1 NARMA10
The NARMA10 task [3] is the identification of a tenth-order discrete-time nonlinear dynamical system. We used \(-900 \le t < 0\) for training and \(0 \le t < 100\) for testing. In this task, we set the input scaling of the ESN at 1.0 and the time delay in the embeddings at \(\tau =1\). We conducted experiments for two different cases, where the spectral radius is good (\(\rho =0.8\)) and bad (\(\rho =0.4\)).
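The paper does not restate the exact NARMA10 variant it uses; the commonly cited form, with inputs drawn from \(U[0, 0.5]\), can be generated as follows:

```python
import numpy as np

def narma10(T, seed=0):
    """Generate a NARMA10 input/output pair using the standard benchmark
    recurrence (assumed here; the paper's exact variant may differ):
    y(t+1) = 0.3 y(t) + 0.05 y(t) sum_{i=0}^{9} y(t-i)
             + 1.5 u(t-9) u(t) + 0.1, with u(t) ~ U[0, 0.5]."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 0.5, size=T)
    y = np.zeros(T)
    for t in range(9, T - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * np.sum(y[t - 9 : t + 1])
                    + 1.5 * u[t - 9] * u[t]
                    + 0.1)
    return u, y

u, y = narma10(300)
```

The tenth-order memory in the recurrence is what makes this a demanding benchmark for reservoirs with short fading memory.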
Figure 2 shows the root mean squared error (RMSE) for the NARMA10 task, plotted against the value of \(w_D\). In the case with \(\rho =0.8\), the prediction performance is improved by the introduction of the adversarial loss in some settings. Moreover, in the case with \(\rho =0.4\), the prediction error for \(0.05 \le w_D \le 0.6\) is lower than in the case where only the squared error is used. From these results, we conclude that the adversarial loss in ESN training improves the prediction accuracy, especially when the hyperparameter settings of the ESN are not well tuned.
3.2 Lorenz System
The Lorenz system [5] is a continuous-time nonlinear dynamical system which shows chaotic behavior and is described by the following differential equations:
\[
\frac{dx}{dt} = a(y - x), \quad \frac{dy}{dt} = x(b - z) - y, \quad \frac{dz}{dt} = xy - cz,
\]
with the standard chaotic parameter setting \(a = 10\), \(b = 28\), and \(c = 8/3\).
In this experiment, we predict the first variable x(t) to evaluate the performance of the proposed model. We used \(-100 \le t < 0\) for training and \(0 \le t < 25\) for testing, with the step size \(\varDelta t=0.02\). The input scaling of the ESN is 0.1 and the time delay is \(\tau = 0.08\). We conducted an experiment with a bad parameter setting, where the spectral radius is \(\rho =0.4\).
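For reference, a Lorenz trajectory with step size \(\varDelta t = 0.02\) can be generated by fourth-order Runge-Kutta integration; the parameter values \(a=10\), \(b=28\), \(c=8/3\) and the initial condition are assumptions, not taken from the paper:

```python
import numpy as np

def lorenz_rhs(s, a=10.0, b=28.0, c=8.0 / 3.0):
    """Right-hand side of the Lorenz equations for state s = (x, y, z)."""
    x, y, z = s
    return np.array([a * (y - x), x * (b - z) - y, x * y - c * z])

def integrate(s0, dt=0.02, steps=1000):
    """Fourth-order Runge-Kutta integration of the Lorenz system."""
    traj = np.empty((steps + 1, 3))
    traj[0] = s = np.asarray(s0, dtype=float)
    for i in range(steps):
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs(s + 0.5 * dt * k1)
        k3 = lorenz_rhs(s + 0.5 * dt * k2)
        k4 = lorenz_rhs(s + dt * k3)
        s = s + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        traj[i + 1] = s
    return traj

x_series = integrate([1.0, 1.0, 1.0])[:, 0]   # the observed variable x(t)
```

Only `x_series` would be shown to the model in this experiment; the other two variables remain hidden, which is why the delay embedding matters.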
Figure 3 shows the RMSE in this task, plotted against the value of \(w_D\). The proposed method improves the prediction performance compared with the conventionally trained ESN over a wide range of \(w_D\) (the optimal setting appears to be around \(w_D = 0.25\)). Since our method uses time-delay embeddings, the generator can reflect the overall dynamics even when only one variable is observed, on the basis of Takens' embedding theorem [7].
4 Conclusion
In this work, we proposed a new ESN training method using adversarial training, where the loss function is the weighted sum of the conventional squared error and the adversarial loss. We demonstrated that the proposed method can improve the prediction accuracy in nonlinear time series prediction tasks. In future work, we will test other discriminator models and examine the effectiveness of our method on other tasks.
References
Goodfellow, I., et al.: Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27, 2672–2680 (2014)
Jaeger, H.: The “echo state” approach to analysing and training recurrent neural networks. GMD Report 148, GMD - German National Research Institute for Computer Science (2001)
Jaeger, H.: Adaptive nonlinear system identification with echo state networks. In: Proceedings of the 15th International Conference on Neural Information Processing Systems, pp. 609–616 (2002)
Jaeger, H., Haas, H.: Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science 304(5667), 78–80 (2004)
Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–141 (1963)
Saito, Y., Takamichi, S., Saruwatari, H.: Statistical parametric speech synthesis incorporating generative adversarial networks. IEEE/ACM Trans. Audio Speech Lang. Process. 26(1), 84–96 (2018)
Takens, F.: Detecting strange attractors in turbulence. In: Rand, D., Young, L.-S. (eds.) Dynamical Systems and Turbulence, Warwick 1980. LNM, vol. 898, pp. 366–381. Springer, Heidelberg (1981). https://doi.org/10.1007/BFb0091924
Tanaka, G., et al.: Recent advances in physical reservoir computing: a review. Neural Netw. 115, 100–123 (2019)
Werbos, P.J.: Backpropagation through time: what it does and how to do it. Proc. IEEE 78(10), 1550–1560 (1990)
Acknowledgment
This work was partially supported by JSPS KAKENHI Grant Number JP16K00326 (GT) and based on results obtained from a project subsidized by the New Energy and Industrial Technology Development Organization (NEDO).
© 2019 Springer Nature Switzerland AG
Akiyama, T., Tanaka, G. (2019). Echo State Network with Adversarial Training. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions. ICANN 2019. Lecture Notes in Computer Science(), vol 11731. Springer, Cham. https://doi.org/10.1007/978-3-030-30493-5_8