Abstract
We present a study of Russian language models built with recurrent artificial neural networks for automatic continuous speech recognition systems. We construct neural network models with different numbers of units in the hidden layer and linearly interpolate them with a baseline trigram language model. The resulting models are used to rescore the N-best list. In our experiments on continuous Russian speech recognition with an extra-large vocabulary (150 thousand word forms), rescoring the 50-best list with the neural network language models interpolated with the trigram model yielded a relative word error rate reduction of 14%.
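The interpolation and rescoring steps mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each hypothesis in the N-best list already carries an acoustic log-score and per-word probabilities from both language models. The function names, the dictionary keys, the interpolation weight `lam`, the language model weight `lm_weight`, and all numbers are hypothetical and are not taken from the article.

```python
import math


def interpolate(p_rnn, p_ngram, lam):
    """Linear interpolation of two word probabilities.

    lam is the weight of the neural network model (assumed value, 0..1).
    """
    return lam * p_rnn + (1.0 - lam) * p_ngram


def hypothesis_log_prob(rnn_probs, ngram_probs, lam):
    """Log-probability of a word sequence under the interpolated model."""
    return sum(
        math.log(interpolate(p_rnn, p_ngram, lam))
        for p_rnn, p_ngram in zip(rnn_probs, ngram_probs)
    )


def rescore_nbest(nbest, lam=0.5, lm_weight=1.0):
    """Return the hypothesis with the highest combined score.

    Each hypothesis is a dict with the hypothetical keys 'words',
    'acoustic_score' (a log-score), 'rnn_probs', and 'ngram_probs'.
    """
    def total_score(hyp):
        lm_score = hypothesis_log_prob(hyp["rnn_probs"], hyp["ngram_probs"], lam)
        return hyp["acoustic_score"] + lm_weight * lm_score

    return max(nbest, key=total_score)


def relative_wer_reduction(wer_baseline, wer_new):
    """Relative reduction in word error rate, as a fraction of the baseline."""
    return (wer_baseline - wer_new) / wer_baseline


# Toy two-hypothesis list with made-up scores (not data from the article).
toy_nbest = [
    {"words": ["a", "b"], "acoustic_score": -10.0,
     "rnn_probs": [0.1, 0.2], "ngram_probs": [0.3, 0.2]},
    {"words": ["a", "c"], "acoustic_score": -10.5,
     "rnn_probs": [0.5, 0.5], "ngram_probs": [0.5, 0.5]},
]
best = rescore_nbest(toy_nbest, lam=0.5)
print(best["words"])
```

In this toy list the second hypothesis has the worse acoustic score but wins after rescoring, because the interpolated language model assigns it a higher probability. In practice the interpolation weight would be tuned on held-out data; the value 0.5 here is only a placeholder.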
Additional information
Original Russian Text © I.S. Kipyatkova, A.A. Karpov, 2017, published in Avtomatika i Telemekhanika, 2017, No. 5, pp. 110–122.
About this article
Cite this article
Kipyatkova, I.S., Karpov, A.A. A study of neural network Russian language models for automatic continuous speech recognition systems. Autom Remote Control 78, 858–867 (2017). https://doi.org/10.1134/S0005117917050083
DOI: https://doi.org/10.1134/S0005117917050083