MLN-based Bangla ASR using context sensitive triphone HMM

Hassan, Foyzul; Kotwal, Mohammed Rokibul Alam; Muhammad, Ghulam; Huda, Mohammad Nurul

doi:10.1007/s10772-011-9095-3

Published: 30 June 2011

Volume 14, pages 183–191, (2011)

International Journal of Speech Technology

Foyzul Hassan¹,
Mohammed Rokibul Alam Kotwal¹,
Ghulam Muhammad² &
…
Mohammad Nurul Huda¹

122 Accesses
4 Citations

Abstract

Building a continuous speech recognizer for the Bangla (also widely known as Bengali) language is a challenging task because of inherent features of the language, such as long and short vowels and many instances of allophones. Stress and accent in spoken Bangla vary from region to region, but in formal read Bangla speech, stress and accent are ignored. Continuous speech recognition (CSR) can follow three approaches, distinguished by the recognition unit: word, phoneme, and syllable. The pronunciation of words and sentences is strictly governed by a set of linguistic rules. Many attempts have been made to build continuous speech recognizers for Bangla for small and restricted tasks. However, medium and large vocabulary CSR for Bangla is relatively new and largely unexplored. In this paper, the authors attempt to build an automatic speech recognition (ASR) method based on context-sensitive triphone acoustic models. The method comprises three stages: the first stage extracts phoneme probabilities from acoustic features using a multilayer neural network (MLN), the second stage designs triphone models to capture context on both sides of each phoneme, and the final stage generates word strings based on triphone hidden Markov models (HMMs). The objective of this research is to build a medium vocabulary triphone-based continuous speech recognizer for the Bangla language. In experiments on a Bangla speech corpus prepared by the authors, the recognizer provides higher word accuracy as well as word correct rate for trained and tested sentences with fewer mixture components in the HMMs.
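
The second stage described in the abstract models each phoneme together with its left and right neighbours. As a rough illustration of what "context of both sides" means, the sketch below expands a phoneme sequence into triphone labels. The `left-center+right` label format is the convention used by common HMM toolkits and is an assumption here, as are the function name `to_triphones`, the `sil` boundary symbol, and the example phoneme sequence; none of them are taken from the paper.

```python
def to_triphones(phones, boundary="sil"):
    """Expand a phoneme sequence into left-center+right triphone labels.

    Assumption: utterance edges are padded with a silence symbol so that
    the first and last phonemes also receive a two-sided context.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


# Hypothetical phoneme sequence, used only for illustration.
example = ["a", "m", "i"]
print(to_triphones(example))  # ['sil-a+m', 'a-m+i', 'm-i+sil']
```

Each label names one context-dependent unit, so the same center phoneme yields a different model whenever its neighbours differ.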

Author information

Authors and Affiliations

Department of CSE, United International University, Dhaka, Bangladesh
Foyzul Hassan, Mohammed Rokibul Alam Kotwal & Mohammad Nurul Huda
Department of CE, College of CIS, King Saud University, Riyadh, Kingdom of Saudi Arabia
Ghulam Muhammad

Corresponding author

Correspondence to Foyzul Hassan.

About this article

Cite this article

Hassan, F., Kotwal, M.R.A., Muhammad, G. et al. MLN-based Bangla ASR using context sensitive triphone HMM. Int J Speech Technol 14, 183–191 (2011). https://doi.org/10.1007/s10772-011-9095-3

Received: 02 February 2011
Accepted: 13 June 2011
Published: 30 June 2011
Issue Date: September 2011
DOI: https://doi.org/10.1007/s10772-011-9095-3
