Automatic Speech Recognition of Bengali Using Kaldi

Guchhait, Subhadeep; Hans, Arnold Sachith A; Augustine, Jacob

doi:10.1007/978-981-16-7657-4_14

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 351)

Abstract

Bengali is a prominent language of the Indian subcontinent. This paper compares Bengali speech recognition models built with the Kaldi and PyTorch toolkits. Deep learning has been used to improve speech recognition performance, and we explore how these techniques perform on Bengali datasets. Seven different deep neural network techniques are compared. Grapheme-to-phoneme (G2P) conversion is an important module for Indian languages: it maps words written in Unicode to their phone sequences. We develop an RNN-based G2P model for Bangla and show that it performs well for this purpose. This research also shows that Kaldi-based feature extraction with a DNN-HMM acoustic model yields the best WER, 4.16%, when combined with a Li-GRU (light gated recurrent unit) neural network. The aim is to demonstrate Bengali speech recognition performance using the current state-of-the-art Kaldi toolkit.
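Results above are reported as word error rate (WER). As a hedged illustration of how this metric is conventionally defined, and not the paper's own scoring code (Kaldi ships its own scoring tools), WER is the word-level Levenshtein distance between a recognizer's hypothesis and the reference transcript, divided by the number of reference words: WER = 100 × (S + D + I) / N, where S, D and I are substitutions, deletions and insertions. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this sketch.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: 100 * (S + D + I) / N.

    Hypothetical helper for illustration; tokenizes on whitespace,
    which works for Unicode text such as Bengali.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr

    # Total edits divided by reference length, as a percentage
    return 100.0 * prev[len(hyp)] / len(ref)


# One substituted word out of four reference words gives 25% WER.
print(word_error_rate("আমি বাংলায় গান গাই", "আমি বাংলা গান গাই"))
```

Because the denominator is the reference length, insertions can push WER above 100%, which is why the metric is an error rate rather than an accuracy.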

Acknowledgements

We would like to acknowledge Vikram Lakkavalli, Research & Development Head, Kaizen Secure Voiz Pvt. Ltd., for suggesting the problem and for taking part in technical discussions, without which this work would not have been possible. This work was carried out at Kaizen Secure Voiz Pvt. Ltd. as part of my thesis requirement.

Author information

Authors and Affiliations

Department of CSE, Presidency University, Bangalore, India
Subhadeep Guchhait, Arnold Sachith A Hans & Jacob Augustine

Editor information

Editors and Affiliations

Institute of Engineering, Tribhuvan University, Kirtipur, Nepal
Subarna Shakya
Department of Electrical and Computer Engineering, Concordia University, Montréal, QC, Canada
Ke-Lin Du
Go Perception Laboratory, Cornell University, Ithaca, NY, USA
Wang Haoxiang

About this paper

Cite this paper

Guchhait, S., Hans, A.S.A., Augustine, J. (2022). Automatic Speech Recognition of Bengali Using Kaldi. In: Shakya, S., Du, KL., Haoxiang, W. (eds) Proceedings of Second International Conference on Sustainable Expert Systems. Lecture Notes in Networks and Systems, vol 351. Springer, Singapore. https://doi.org/10.1007/978-981-16-7657-4_14

DOI: https://doi.org/10.1007/978-981-16-7657-4_14
Published: 26 February 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-7656-7
Online ISBN: 978-981-16-7657-4
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)