Abstract
Bengali is one of the major languages of the Indian subcontinent. This paper compares Bengali speech recognition models built with the Kaldi and PyTorch toolkits. Deep learning has improved speech recognition performance in general, and we examine how well these techniques work on Bengali datasets by comparing seven deep neural techniques. Grapheme-to-phoneme (G2P) conversion, which maps words written in Unicode to phone sequences, is an important module for Indian languages. We develop an RNN-based G2P model for Bangla and show that it performs well for this purpose. Our experiments also show that Kaldi-based feature extraction with DNN-HMM acoustic models gives the best word error rate (WER), 4.16, when combined with a Li-GRU neural network. The aim is to demonstrate the recognition performance achievable for Bengali with the current state-of-the-art Kaldi-based approach.
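For readers unfamiliar with the metric quoted in the abstract, the following minimal sketch shows the standard definition of WER: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. It is an illustration only and is not the scoring procedure used by the authors; the function name `word_error_rate` and the sample strings are hypothetical, and the result is expressed as a percentage on the assumption that the reported 4.16 is a percentage.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: 100 * (S + D + I) / N, using word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical romanized example: one substitution and one deletion
    # against a four-word reference.
    print(word_error_rate("ami aaj bhat khabo", "ami kal bhat"))
```

Because the denominator is the reference length, WER can exceed 100 when the hypothesis contains many insertions.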
Acknowledgements
We would like to acknowledge Vikram Lakkavalli, Research & Development Head, Kaizen Secure Voiz Pvt. Ltd., for suggesting the problem and taking part in the technical discussions, without which this work would not have been possible. This work was carried out at Kaizen Secure Voiz Pvt. Ltd. as part of my thesis requirement.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Guchhait, S., Hans, A.S.A., Augustine, J. (2022). Automatic Speech Recognition of Bengali Using Kaldi. In: Shakya, S., Du, KL., Haoxiang, W. (eds) Proceedings of Second International Conference on Sustainable Expert Systems. Lecture Notes in Networks and Systems, vol 351. Springer, Singapore. https://doi.org/10.1007/978-981-16-7657-4_14
DOI: https://doi.org/10.1007/978-981-16-7657-4_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-7656-7
Online ISBN: 978-981-16-7657-4
eBook Packages: Intelligent Technologies and Robotics (R0)