Multilingual Speech Recognition for Indian Languages

  • Conference paper
Advanced Machine Intelligence and Signal Processing

Abstract

Automatic speech recognition (ASR) systems fall into two types: monolingual and multilingual. Multilingual speech recognition has recently become more prevalent because it can use transfer learning to build better speech recognition models for resource-scarce languages. Typically, multilingual models rely on language-specific parameters and modules that are activated according to the language supplied to the model. Such models lose efficiency when the language identity is not specified, and they cannot recognize code-switched speech. In this work, we propose a multilingual model for English and Indian languages that converts speech to text without requiring the language identity and recognizes code-switched speech by using transliterated and English text as the input transcription. The transliterated text helps the model learn to map each sound to its appropriate English character in multilingual and code-switched speech. The model uses the DeepSpeech architecture by Baidu. The best model achieved a word error rate (WER) of 30.5% and a character error rate (CER) of 11.69%.
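For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how WER and CER are conventionally computed: the edit distance between the reference and the recognized text, divided by the reference length, counted in words or in characters. The abstract does not say which scoring tool the authors used, so the function names and the transliterated example sentence here are illustrative assumptions, not the paper's evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions and
    deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical code-switched utterance written in transliterated text.
reference = "mujhe coffee chahiye"
hypothesis = "mujhe cofee chahiye"
print(f"WER = {wer(reference, hypothesis):.2%}")
print(f"CER = {cer(reference, hypothesis):.2%}")
```

A single misspelled word counts as a full word error but only one character error, which is one reason a system's CER is usually much lower than its WER, as in the figures reported above.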

Author information

Corresponding author

Correspondence to S. Sachin Kumar.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Priyamvada, R., Kumar, S.S., Ganesh, H.B.B., Soman, K.P. (2022). Multilingual Speech Recognition for Indian Languages. In: Gupta, D., Sambyo, K., Prasad, M., Agarwal, S. (eds) Advanced Machine Intelligence and Signal Processing. Lecture Notes in Electrical Engineering, vol 858. Springer, Singapore. https://doi.org/10.1007/978-981-19-0840-8_41
