Abstract
In this paper we present a new training algorithm for the Long Short-Term Memory (LSTM) recurrent neural network. This algorithm uses entropy instead of the usual mean squared error as the cost function for the weight update. More precisely, we use the Error Entropy Minimization (EEM) approach, where the entropy of the error is minimized after each symbol is presented to the network. Our experiments show that this approach makes the LSTM converge more frequently than the traditional learning algorithm does. This in turn relaxes the burden of parameter tuning, since learning is achieved for a wider range of parameter values. The use of EEM also reduces, in some cases, the number of epochs needed for convergence.
This work was supported by the Portuguese FCT-Fundação para a Ciência e Tecnologia (project POSC/EIA/56918/2004).
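As background to the abstract, the EEM criterion is commonly formulated as Renyi's quadratic entropy of the error samples, estimated with Gaussian Parzen windows; minimizing the entropy is then equivalent to maximizing the so-called information potential. The Python sketch below illustrates this cost under those standard assumptions. The function names and the kernel width sigma are illustrative, not taken from the paper, and the paper's exact formulation may differ.

import numpy as np

def gaussian_kernel(x, sigma):
    """Zero-mean Gaussian kernel used as the Parzen window."""
    return np.exp(-x ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

def information_potential(errors, sigma=0.5):
    """Parzen estimate of the information potential V(e).

    With Gaussian windows of width sigma, V(e) reduces to the mean of
    pairwise kernels of width sigma*sqrt(2) over all error differences.
    """
    diffs = errors[:, None] - errors[None, :]  # all pairwise e_i - e_j
    return gaussian_kernel(diffs, sigma * np.sqrt(2.0)).mean()

def error_entropy(errors, sigma=0.5):
    """Renyi's quadratic entropy H2(e) = -log V(e): the EEM cost.

    Gradient descent on H2 (equivalently, gradient ascent on V) takes
    the place of the usual MSE gradient in the weight update.
    """
    return -np.log(information_potential(errors, sigma))

# Tightly concentrated errors have lower entropy than widely spread ones.
rng = np.random.default_rng(0)
spread = rng.normal(scale=1.0, size=100)  # hypothetical error samples
tight = rng.normal(scale=0.1, size=100)
assert error_entropy(tight) < error_entropy(spread)

In an LSTM training loop, errors would be the output errors collected as each symbol is presented, and the gradient of this entropy with respect to the network weights would drive the update instead of the MSE gradient.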
References
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997)
Gers, F., Schmidhuber, J., Cummins, F.: Learning to forget: Continual prediction with LSTM. Neural Computation 12(10), 2451–2471 (2000)
Gers, F., Schmidhuber, J.: Recurrent nets that time and count. In: Proc. IJCNN 2000, Int. Joint Conf. on Neural Networks, Como, Italy (2000)
Pérez-Ortiz, J., Gers, F., Eck, D., Schmidhuber, J.: Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets. Neural Networks 16(2), 241–250 (2003)
Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice Hall, Englewood Cliffs (1999)
Erdogmus., D., Principe, J.: An error-entropy minimization algorithm for supervised training of nonlinear adaptive systems. IEEE Trans. Signal Processing 50(7), 1780–1786 (2002)
Santos, J., Alexandre, L., Sereno, F., de Sá, J.M.: Optimization of the error entropy minimization algorithm for neural network classification. In: ANNIE 2004, St.Louis, USA. Intelligent Engineering Systems Through Artificial Neural Networks, vol. 14, pp. 81–86. ASME Press Series, St. Louis (2004)
Santos, J., Alexandre, L., de Sá, J.M.: The error entropy minimization algorithm for neural network classification. In: Lofti, A. (ed.) Proceedings of the 5th International Conference on Recent Advances in Soft Computing, Nottingham, United Kingdom, pp. 92–97 (2004)
Silva, L., de Sá, J.M., Alexandre, L.: Neural network classification using Shannon’s entropy. In: 13th European Symposium on Artificial Neural Networks - ESANN 2005, Bruges, Belgium, pp. 217–222 (2005)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alexandre, L.A., de Sá, J.P.M. (2006). Error Entropy Minimization for LSTM Training. In: Kollias, S.D., Stafylopatis, A., Duch, W., Oja, E. (eds) Artificial Neural Networks – ICANN 2006. ICANN 2006. Lecture Notes in Computer Science, vol 4131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11840817_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-38625-4
Online ISBN: 978-3-540-38627-8