Abstract
Visual speech information plays a pivotal role in automatic speech recognition (ASR), especially in the presence of acoustic noise. This paper provides a short review of the different methods used in visual speech recognition (VSR) systems. We discuss the stages of VSR, including face and lip localization techniques and visual feature extraction techniques. We also provide details of the audio-visual databases related to this study.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bhaskar, S., Thasleema, T.M., Rajesh, R. (2019). A Survey on Different Visual Speech Recognition Techniques. In: Nagabhushan, P., Guru, D., Shekar, B., Kumar, Y. (eds) Data Analytics and Learning. Lecture Notes in Networks and Systems, vol 43. Springer, Singapore. https://doi.org/10.1007/978-981-13-2514-4_26
DOI: https://doi.org/10.1007/978-981-13-2514-4_26
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2513-7
Online ISBN: 978-981-13-2514-4
eBook Packages: Intelligent Technologies and Robotics (R0)