Abstract
Automatic speech recognition systems are of two types: monolingual and multilingual. Because it can use transfer learning to build better speech recognition models for resource-scarce languages, multilingual speech recognition has recently become more prevalent. Multilingual speech recognition models generally use language-specific parameters and modules that are activated according to the language given to the model. These models are less efficient when the language identity is not specified, and they lack the ability to recognize code-switched speech. In this work, we propose a multilingual model for English and Indian languages that converts speech to text without requiring the language identity and recognizes code-switched speech by using transliterated and English text as the input transcription. The transliterated text helps the model learn to map each sound to its appropriate English character in multilingual and code-switched speech. The model uses the DeepSpeech architecture by Baidu. The best model achieved a word error rate (WER) of 30.5% and a character error rate (CER) of 11.69%.
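As a reading aid for the two reported metrics, the following minimal sketch computes WER and CER in their standard form: the edit distance between reference and hypothesis, divided by the reference length in words or characters. It is not the authors' evaluation code. The function names and the transliterated sample strings are hypothetical, and it assumes whitespace tokenization with spaces counted as characters for CER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical code-switched transcript in transliterated (Latin-script) form.
reference = "mujhe office jaana hai"
hypothesis = "mujhe ofice jana hai"

print(f"WER = {wer(reference, hypothesis):.2%}")
print(f"CER = {cer(reference, hypothesis):.2%}")
```

In this illustrative pair, two of the four words are wrong but only two of the 22 reference characters are, which shows why a CER can be much lower than the corresponding WER, as in the figures reported above.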
References
Kumar, C.S., Wei, F.S.: A bilingual speech recognition system for English and Tamil. In: Fourth International Conference on Information, Communications and Signal Processing, 2003 and the Fourth Pacific Rim Conference on Multimedia. Proceedings of the 2003 Joint, pp. 1641–1644. IEEE (2003)
Udhaykumar, N., Ramakrishnan, S.K., Swaminathan, R.: Multilingual speech recognition for information retrieval in Indian context. In: Proceedings of the Student Research Workshop at HLT-NAACL, pp. 1–6 (2004)
Burget, L., Schwarz, P., Agarwal, M., Akyazi, P., Feng, K., Ghoshal, A., Glembek, O., Goel, N., Karafiát, M., Povey, D., Rastrow, A.: Multilingual acoustic modeling for speech recognition based on subspace Gaussian mixture models. In: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4334–4337. IEEE (2010)
Kumar, C.S., Mohandas, V.P., Li, H.: Multilingual speech recognition: a unified approach. In: Ninth European Conference on Speech Communication and Technology, pp. 3357–3360 (2005)
Heigold, G., Vanhoucke, V., Senior, A., Nguyen, P., Ranzato, M.A., Devin, M., Dean, J.: Multilingual acoustic models using distributed deep neural networks. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 8619–8623. IEEE (2013)
Toshniwal, S., Sainath, T.N., Weiss, R.J., Li, B., Moreno, P., Weinstein, E., Rao, K.: Multilingual speech recognition with a single end-to-end model. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4904–4908. IEEE (2018)
Kannan, A., Datta, A., Sainath, T.N., Weinstein, E., Ramabhadran, B., Wu, Y., Bapna, A., Chen, Z., Lee, S.: Large-scale multilingual speech recognition with a streaming end-to-end model (2019). arXiv preprint arXiv:1909.05330
Cho, J., Baskar, M.K., Li, R., Wiesner, M., Mallidi, S.H., Yalta, N., Karafiát, M., Watanabe, S., Hori, T.: Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling. In: 2018 IEEE Spoken Language Technology Workshop, pp. 521–527. IEEE (2018)
Antony, P.J., Soman, K.P.: Machine transliteration for Indian languages: a literature survey. Int. J. Sci. Eng. Res. 2, 1–8 (2011)
Hannun, A., Case, C., Casper, J., Catanzaro, B., Diamos, G., Elsen, E., Prenger, R., Satheesh, S., Sengupta, S., Coates, A., Ng, A.Y.: Deep speech: scaling up end-to-end speech recognition (2014). arXiv preprint arXiv:1412.5567
Agarwal, A., Zesch, T.: German end-to-end speech recognition based on DeepSpeech. In: KONVENS (2019)
Xu, J., Matta, K., Islam, S., Nürnberger, A.: German speech recognition system using DeepSpeech. In: Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval, pp. 102–106. Association for Computing Machinery, New York (2020)
Iakushkin, O., Fedoseev, G., Shaleva, A., Degtyarev, A., Sedova, O.: Russian language speech recognition system based on DeepSpeech. In: Proceedings of the VIII International Conference on Distributed Computing and Grid-technologies in Science and Education (2018)
Bhat, I.A., Mujadia, V., Tammewar, A., Bhat, R.A., Shrivastava, M.: IIIT-H system submission for FIRE 2014 shared task on transliterated search. In: Proceedings of the Forum for Information Retrieval Evaluation, pp. 48–53 (2014)
Common Voice Download page. https://voice.mozilla.org/en/datasets. Last accessed 18 Apr 2021
Microsoft Speech Corpus for Indian languages. https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e. Last accessed 15 Apr 2021
IIT Madras: Indic TTS. https://www.iitm.ac.in/donlab/tts/. Last accessed 26 Apr 2021
Yadav, M., Kumar, R., Yadav, P.S.: Gender identification over voice sample using machine learning. In: Proceedings of International Conference on Computational Intelligence and Data Engineering, pp. 111–121. Springer, Singapore (2021)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Priyamvada, R., Kumar, S.S., Ganesh, H.B.B., Soman, K.P. (2022). Multilingual Speech Recognition for Indian Languages. In: Gupta, D., Sambyo, K., Prasad, M., Agarwal, S. (eds) Advanced Machine Intelligence and Signal Processing. Lecture Notes in Electrical Engineering, vol 858. Springer, Singapore. https://doi.org/10.1007/978-981-19-0840-8_41
DOI: https://doi.org/10.1007/978-981-19-0840-8_41
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-0839-2
Online ISBN: 978-981-19-0840-8
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)