A Comparison of Deep Neural Network Training Methods for Large Vocabulary Speech Recognition

Tóth, László; Grósz, Tamás

doi:10.1007/978-3-642-40585-3_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)

Included in the following conference series:

International Conference on Text, Speech and Dialogue


Abstract

The introduction of deep neural networks to acoustic modelling has brought significant improvements in speech recognition accuracy. However, this technology has huge computational costs, even when the algorithms are implemented on graphics processors. Hence, finding the training algorithm that offers the best performance with the lowest training time is now an active area of research. Here, we compare three methods, namely the unsupervised pre-training algorithm of Hinton et al., a supervised pre-training method that constructs the network layer by layer, and deep rectifier networks, which differ from standard nets in their activation function. We find that the three methods can achieve similar recognition performance but have quite different training times. Overall, for the large vocabulary speech recognition task we study here, deep rectifier networks offer the best tradeoff between accuracy and training time.
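The abstract notes that deep rectifier networks differ from standard nets only in their activation function. The following minimal sketch (not the authors' implementation) illustrates that difference. It assumes that a "standard" net uses the logistic sigmoid in its hidden layers and a softmax output layer producing class posteriors. The function names `forward`, `affine` and `rectifier` and the toy weights are hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights, biases); one weight row per output unit


def sigmoid(x: float) -> float:
    """Logistic sigmoid: the hidden activation ASSUMED here for a 'standard' net."""
    return 1.0 / (1.0 + math.exp(-x))


def rectifier(x: float) -> float:
    """Rectified linear unit max(0, x): the activation of a deep rectifier net."""
    return max(0.0, x)


def softmax(z: Sequence[float]) -> Vector:
    """Numerically stable softmax turning output activations into posteriors."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def affine(x: Sequence[float], layer: Layer) -> Vector:
    """Compute W x + b for one fully connected layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def forward(x: Sequence[float], hidden: Sequence[Layer], output: Layer,
            act: Callable[[float], float]) -> Vector:
    """Forward pass; only `act` differs between the sigmoid and rectifier variants."""
    h = list(x)
    for layer in hidden:
        h = [act(v) for v in affine(h, layer)]
    return softmax(affine(h, output))


# Hypothetical toy weights: 2 inputs -> one hidden layer of 2 units -> 2 classes.
hidden = [([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0])]
output = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
frame = [1.0, -1.0]
print(forward(frame, hidden, output, sigmoid))
print(forward(frame, hidden, output, rectifier))
```

One commonly given explanation for why rectifier nets are easier to train with plain backpropagation is that the rectifier's derivative is 1 for every positive input, while the sigmoid's derivative never exceeds 0.25.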

References

Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.R., Jaitly, N., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Processing Magazine 29, 82–97 (2012)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proc. AISTATS, pp. 249–256 (2010)
Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Computation 18, 1527–1554 (2006)
Mohamed, A.R., Dahl, G.E., Hinton, G.: Acoustic modeling using deep belief networks. IEEE Trans. Audio, Speech, and Language Processing 20, 14–22 (2012)
Seide, F., Li, G., Chen, X., Yu, D.: Feature engineering in context-dependent deep neural networks for conversational speech transcription. In: Proc. ASRU, pp. 24–29 (2011)
Jaitly, N., Nguyen, P., Senior, A., Vanhoucke, V.: Application of pretrained deep neural networks to large vocabulary conversational speech recognition. Technical report, Dept. Comp. Sci., University of Toronto (2012)
Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio, Speech, and Language Processing 20, 30–42 (2012)
Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier networks. In: Proc. AISTATS, pp. 315–323 (2011)
Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proc. ICML, pp. 807–814 (2010)
Tóth, L.: Phone recognition with deep sparse rectifier neural networks. In: Proc. ICASSP (accepted, in print, 2013)
Bourlard, H., Morgan, N.: Connectionist speech recognition: a hybrid approach. Kluwer Academic (1994)
Young, S., et al.: The HTK book. Cambridge Univ. Engineering Department (2005)
Abari, K., Olaszy, G., Zainkó, C., Kiss, G.: Hungarian pronunciation dictionary on Internet. In: Proc. MSZNY, pp. 223–230 (2006) (in Hungarian)

Author information

Authors and Affiliations

MTA-SZTE Research Group on Artificial Intelligence, Hungarian Academy of Sciences and University of Szeged, Hungary
László Tóth & Tamás Grósz


Editor information

Editors and Affiliations

University of West Bohemia, 306 14, Pilsen, Czech Republic
Ivan Habernal & Václav Matoušek


About this paper

Cite this paper

Tóth, L., Grósz, T. (2013). A Comparison of Deep Neural Network Training Methods for Large Vocabulary Speech Recognition. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_6


DOI: https://doi.org/10.1007/978-3-642-40585-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3
eBook Packages: Computer Science, Computer Science (R0)
