Abstract
This study introduces a method for improving the accuracy and noise robustness of End-to-End Automatic Speech Recognition (ASR) systems by combining Gammatone Frequency Cepstral Coefficient (GTCC) and Mel Frequency Cepstral Coefficient (MFCC) features with a hybrid CNN-BiGRU acoustic model. The combined MFCC and GTCC features capture complementary temporal and spectral aspects of speech, while the hybrid architecture models both local patterns (CNN) and long-range context (BiGRU). The approach is evaluated on a low-resource Gujarati multi-speaker speech dataset under both clean and noisy conditions, the latter created by adding white noise. Compared with a baseline using MFCC features and greedy decoding, the proposed method reduces Word Error Rate (WER) by 4.6% on clean speech and by a substantial 7.83% on noisy speech. These results suggest the method can make ASR systems more reliable and accurate for real-world applications that require precise speech-to-text conversion.
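As a rough illustration of the feature pipeline the abstract describes, the sketch below extracts MFCC-style and GTCC-style cepstra frame by frame and concatenates them. It is a minimal NumPy approximation, not the authors' implementation: the "GTCC" branch uses triangular filters spaced on the ERB scale as a stand-in for true gammatone filters, and all frame sizes, filter counts, and function names are illustrative assumptions.

```python
import numpy as np

# Frequency-scale warps: mel (for MFCC) and ERB (stand-in for gammatone/GTCC).
def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
def hz_to_erb(f): return 21.4 * np.log10(1 + 0.00437 * f)
def erb_to_hz(e): return (10 ** (e / 21.4) - 1) / 0.00437

def triangular_fb(n_filters, n_fft, sr, to_scale, from_scale):
    """Triangular filterbank with centers equally spaced on the warped scale."""
    pts = from_scale(np.linspace(to_scale(0), to_scale(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l: fb[i, l:c] = (np.arange(l, c) - l) / (c - l)  # rising edge
        if r > c: fb[i, c:r] = (r - np.arange(c, r)) / (r - c)  # falling edge
    return fb

def dct_ii(x, n_out):
    """Type-II DCT along the first axis, keeping the first n_out coefficients."""
    n = x.shape[0]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    return basis @ x

def cepstra(signal, fb, n_ceps=13, frame=400, hop=160, n_fft=512):
    """Windowed power spectrum -> filterbank log-energies -> DCT cepstra."""
    frames = np.lib.stride_tricks.sliding_window_view(signal, frame)[::hop]
    frames = frames * np.hanning(frame)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2       # (n_frames, 257)
    energies = np.log(power @ fb.T + 1e-10)                 # (n_frames, 26)
    return dct_ii(energies.T, n_ceps).T                     # (n_frames, 13)

sr = 16000
x = np.random.randn(sr)  # stand-in for one second of Gujarati speech
mel_fb = triangular_fb(26, 512, sr, hz_to_mel, mel_to_hz)
erb_fb = triangular_fb(26, 512, sr, hz_to_erb, erb_to_hz)
mfcc = cepstra(x, mel_fb)
gtcc = cepstra(x, erb_fb)  # ERB-warped approximation, not true gammatone
features = np.concatenate([mfcc, gtcc], axis=1)  # per-frame MFCC + GTCC vector
```

The concatenated per-frame vectors would then feed the CNN-BiGRU acoustic model; that network and the CTC training loop are omitted here for brevity.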
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bhagat, B., Dua, M. (2024). Enhancing Performance of Noise-Robust Gujarati Language ASR Utilizing the Hybrid Acoustic Model and Combined MFCC + GTCC Feature. In: Verma, O.P., Wang, L., Kumar, R., Yadav, A. (eds) Machine Intelligence for Research and Innovations. MAiTRI 2023. Lecture Notes in Networks and Systems, vol 832. Springer, Singapore. https://doi.org/10.1007/978-981-99-8129-8_19
DOI: https://doi.org/10.1007/978-981-99-8129-8_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8128-1
Online ISBN: 978-981-99-8129-8
eBook Packages: Intelligent Technologies and Robotics (R0)