Abstract
Lipreading has become a hot research topic in recent years, since the visual information extracted from lip movements has been shown to improve the performance of automatic speech recognition (ASR) systems, especially in noisy environments [1]-[3], [5]. There are two important issues in lipreading: 1) how to extract the most efficient features from lip image sequences, and 2) how to build lipreading models. This paper mainly focuses on how to choose more efficient features for lipreading.
Keywords
- Linear Discriminant Analysis
- Discrete Cosine Transform
- Discrete Fourier Transform
- Local Binary Pattern
- Automatic Speech Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
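As a concrete illustration of the two feature families named above (not the paper's exact pipeline, which the abstract does not specify), the sketch below computes a holistic 2-D DCT descriptor and a local-texture LBP histogram from a toy lip-region image; the `roi` array, block sizes, and neighborhood ordering are all illustrative assumptions.

```python
# Hedged sketch: a "global" descriptor via the 2-D DCT and a "local"
# descriptor via 8-neighbor LBP histograms, two feature families from
# the keyword list. Array sizes and parameters are illustrative only.
import numpy as np

def dct2(img):
    """2-D DCT-II built from two separable 1-D transforms (global features)."""
    def dct1(x):
        # DCT-II along the last axis: C[k, n] = cos(pi*k*(2n+1)/(2N))
        N = x.shape[-1]
        n = np.arange(N)
        k = n.reshape(-1, 1)
        C = np.cos(np.pi * k * (2 * n + 1) / (2 * N))
        return 2.0 * (x @ C.T)
    return dct1(dct1(img).T).T

def lbp_histogram(img):
    """Normalized histogram of 8-neighbor local binary patterns (local features)."""
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=int)
    # Clockwise 8-neighborhood; each neighbor contributes one bit of the code.
    neighbors = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                 (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(neighbors):
        shifted = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (shifted >= center).astype(int) << bit
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()

# Toy stand-in for a cropped lip region of interest.
roi = np.random.default_rng(0).random((16, 16))
global_feat = dct2(roi)[:4, :4].ravel()  # 16 low-frequency DCT coefficients
local_feat = lbp_histogram(roi)          # 256-bin texture histogram
```

Low-order DCT coefficients summarize the overall mouth shape in a few numbers, while LBP histograms capture fine texture around the lips; a real system would typically apply a dimensionality reduction such as LDA on top of either representation.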
References
Morishima, S., Ogata, S., Murai, K., Nakamura, S.: Audio-visual speech translation with automatic lip synchronization and face tracking based on 3D head model. In: Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 2, pp. 2117–2120 (2002)
Potamianos, G., Graf, H.P., Cosatto, E.: An image transform approach for HMM based automatic lipreading. In: Proc. Int. Conf. Image Process, Chicago, pp. 173–177 (1998)
Dupont, S., Luettin, J.: Audio-visual speech modeling for continuous speech recognition. IEEE Trans. on Multimedia 2, 141–151 (2000)
Shen, L., Bai, L.: Gabor feature based face recognition using kernel methods. In: Proc. IEEE Int. Conf. Automatic Face and Gesture Recognition (AFGR), pp. 170–176 (2004)
Matthews, I., et al.: Extraction of visual features for lipreading. IEEE Trans. on Pattern Analysis and Machine Intelligence 24(2) (2002)
Duchnowski, P., et al.: Toward movement-invariant automatic lip-reading and speech recognition. In: Proc. Int. Conf. Acoust. Speech Signal Process., Detroit, pp. 109–111 (1995)
Navon, D.: Forest before the trees: the precedence of global features in visual perception. Cognitive Psychology 9, 353–383 (1977)
Biederman, I.: On the semantics of a glance at a scene. In: Kubovy, M., Pomerantz, J. (eds.) Perceptual organization, pp. 213–253. Erlbaum (1981)
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Zhang, S., Yao, H., Wan, Y., Wang, D. (2007). Combining Global and Local Classifiers for Lipreading. In: Paiva, A.C.R., Prada, R., Picard, R.W. (eds) Affective Computing and Intelligent Interaction. ACII 2007. Lecture Notes in Computer Science, vol 4738. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74889-2_73
DOI: https://doi.org/10.1007/978-3-540-74889-2_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74888-5
Online ISBN: 978-3-540-74889-2
eBook Packages: Computer Science, Computer Science (R0)