Abstract
Building a continuous speech recognizer for Bangla (widely known as Bengali) is challenging because of inherent features of the language such as long and short vowels and many instances of allophones. Stress and accent in spoken Bangla vary from region to region, but in formal read speech they are ignored. Continuous speech recognition (CSR) can follow one of three approaches, distinguished by the recognition unit: word, phoneme, or syllable. The pronunciation of words and sentences is strictly governed by a set of linguistic rules. Many attempts have been made to build continuous speech recognizers for Bangla for small and restricted tasks, but medium and large vocabulary CSR for Bangla is relatively new and largely unexplored. In this paper, the authors build an automatic speech recognition (ASR) method based on context-sensitive triphone acoustic models. The method comprises three stages: the first extracts phoneme probabilities from acoustic features using a multilayer neural network (MLN), the second designs triphone models that capture the context on both sides of each phoneme, and the third generates word strings using triphone hidden Markov models (HMMs). The objective of this research is a medium vocabulary, triphone-based continuous speech recognizer for Bangla. In experiments on a Bangla speech corpus prepared by the authors, the recognizer provides higher word accuracy and word correct rate for both training and test sentences, with fewer mixture components in the HMMs.
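To make the idea of context on both sides concrete, the following minimal sketch expands a phoneme sequence into triphone labels. It is an illustration only, not the authors' implementation: the `l-p+r` label notation is a convention borrowed from common HMM toolkits, the function name `to_triphones` is hypothetical, and the example phoneme symbols are placeholders rather than entries from the paper's phoneme set.

```python
from typing import List


def to_triphones(phones: List[str]) -> List[str]:
    """Expand a phoneme sequence into context-dependent triphone labels.

    Each label has the form "left-phone+right" (an assumed notation).
    Phonemes at the sequence boundaries keep only the context that exists.
    """
    labels = []
    for i, phone in enumerate(phones):
        # Left context is the preceding phoneme, if any.
        left = f"{phones[i - 1]}-" if i > 0 else ""
        # Right context is the following phoneme, if any.
        right = f"+{phones[i + 1]}" if i < len(phones) - 1 else ""
        labels.append(f"{left}{phone}{right}")
    return labels


# Hypothetical three-phoneme sequence, for illustration only.
example = ["a", "m", "i"]
print(to_triphones(example))  # ['a+m', 'a-m+i', 'm-i']
```

Because each label encodes its neighbors, the same phoneme gets a separate acoustic model for each distinct context, which is what lets a triphone HMM account for coarticulation.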
Cite this article
Hassan, F., Kotwal, M.R.A., Muhammad, G. et al. MLN-based Bangla ASR using context sensitive triphone HMM. Int J Speech Technol 14, 183–191 (2011). https://doi.org/10.1007/s10772-011-9095-3
DOI: https://doi.org/10.1007/s10772-011-9095-3