Multilingual Speech Recognition for Indian Languages

Priyamvada, R.; Kumar, S. Sachin; Ganesh, H. B. Barathi; Soman, K. P.

doi:10.1007/978-981-19-0840-8_41

R. Priyamvada (ORCID: orcid.org/0000-0003-4455-4104),
S. Sachin Kumar,
H. B. Barathi Ganesh &
K. P. Soman

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 858)

Abstract

Automatic speech recognition (ASR) systems are of two types: monolingual and multilingual. Because it can use transfer learning to build better speech recognition models for resource-scarce languages, multilingual speech recognition has recently become more prevalent. Multilingual speech recognition models generally use language-specific parameters and modules that are activated according to the language given to the model. These models are less efficient when the language identity is not specified, and they cannot recognize code-switched speech. In this work, we propose a multilingual model for English and Indian languages that converts speech to text without requiring the language identity and recognizes code-switched speech by using transliterated and English text as the input transcription. The transliterated text helps the model learn to map each sound to its appropriate English character in multilingual and code-switched speech. The model uses the DeepSpeech architecture by Baidu. The best model achieved a word error rate (WER) of 30.5% and a character error rate (CER) of 11.69%.
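The abstract reports WER and CER but does not describe the scoring procedure. The sketch below assumes the conventional definitions: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, computed over words for WER and over characters (spaces included) for CER. The function names and the romanized code-switched sentence are made up for illustration and are not taken from the chapter.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Substitutions, insertions, and deletions each cost 1.
    """
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.

    Assumption: spaces count as characters; some toolkits strip them first.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical romanized Hindi-English utterance, for illustration only.
reference = "mujhe coffee chahiye"
hypothesis = "mujhe cofee chahiye"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

One misspelled word costs a full word error but only a single character error. This is why CER is usually well below WER, as it is in the figures the abstract reports.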

Author information

Authors and Affiliations

Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
R. Priyamvada, S. Sachin Kumar & K. P. Soman
Federated AI Services, Coimbatore, India
H. B. Barathi Ganesh

Corresponding author

Correspondence to S. Sachin Kumar.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology Arunachal Pradesh (NITAP), Itanagar, Arunachal Pradesh, India
Deepak Gupta
Department of Computer Science and Engineering, National Institute of Technology Arunachal Pradesh (NITAP), Itanagar, Arunachal Pradesh, India
Koj Sambyo
School of Computer Science, Faculty of Engineering and IT, University of Technology, Sydney, NSW, Australia
Mukesh Prasad
Department of Information Technology, Indian Institute of Information Technology (IIIT), Allahabad, Uttar Pradesh, India
Sonali Agarwal

About this paper

Cite this paper

Priyamvada, R., Kumar, S.S., Ganesh, H.B.B., Soman, K.P. (2022). Multilingual Speech Recognition for Indian Languages. In: Gupta, D., Sambyo, K., Prasad, M., Agarwal, S. (eds) Advanced Machine Intelligence and Signal Processing. Lecture Notes in Electrical Engineering, vol 858. Springer, Singapore. https://doi.org/10.1007/978-981-19-0840-8_41

DOI: https://doi.org/10.1007/978-981-19-0840-8_41
Published: 26 June 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-0839-2
Online ISBN: 978-981-19-0840-8
eBook Packages: Intelligent Technologies and Robotics (R0)
