Abstract
Effective fusion of acoustic and visual modalities has long been an important issue in human-computer interfaces, as it promises gains in the intelligibility and robustness of speech recognition. Speaker lip motion is the most linguistically relevant visual cue for speech recognition. In this paper, we present a new hybrid approach to lip localization and tracking, aimed at improving speech recognition in noisy environments. The approach begins with a new color-space transformation that enhances lip segmentation: a PCA-based method derives a one-dimensional color space that maximizes the discrimination between lip and non-lip colors, and intensity information is incorporated to improve the contrast of the upper and corner lip segments. In the subsequent step, a constrained deformable lip model with high flexibility is constructed to accurately capture and track lip shapes. The model requires only six degrees of freedom, yet provides a precise description of lip shapes using a simple least-squares fitting method. Experimental results indicate that the proposed hybrid approach delivers reliable and accurate localization and tracking of lip motion under various measurement conditions.
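The abstract does not give the details of the color-space transformation, but the idea of projecting RGB pixels onto a single axis that separates lip from non-lip colors can be sketched as follows. This is an illustration only, not the authors' method: the synthetic pixel samples and the rank-one between-class scatter construction (whose leading eigenvector, for two classes, points along the difference of the class means) are assumptions standing in for the paper's trained transform.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic RGB samples standing in for labeled lip / non-lip pixels
# (hypothetical data; the actual method would train on segmented images).
lip = rng.normal([170, 80, 90], 12, size=(500, 3))       # reddish lip tones
nonlip = rng.normal([180, 140, 120], 12, size=(500, 3))  # skin-like tones

X = np.vstack([lip, nonlip])
mu = X.mean(axis=0)
m_lip, m_non = lip.mean(axis=0), nonlip.mean(axis=0)

# Between-class scatter for two equally sized classes is rank-1;
# its leading eigenvector is proportional to the mean difference.
Sb = np.outer(m_lip - mu, m_lip - mu) + np.outer(m_non - mu, m_non - mu)
w = np.linalg.eigh(Sb)[1][:, -1]  # eigh sorts ascending; take last column
w /= np.linalg.norm(w)

# Project every pixel onto the one-dimensional discriminant axis.
c_lip = lip @ w
c_non = nonlip @ w

# The two classes should now be well separated along a single dimension.
sep = abs(c_lip.mean() - c_non.mean()) / (c_lip.std() + c_non.std())
print(f"1-D separation ratio: {sep:.2f}")
```

In this sketch the projection collapses three color channels into one scalar per pixel, after which a simple threshold (e.g., Otsu's method) could binarize the image into lip and non-lip regions.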
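The six-degree-of-freedom deformable model is likewise not specified in the abstract. As a rough sketch of how few parameters can describe a lip contour and be fitted by linear least squares, the following assumes a hypothetical parameterization: two mouth corners at a common height, plus upper and lower parabolic heights (a rotation angle would make a sixth parameter, omitted here for brevity). None of this is the authors' exact model.

```python
import numpy as np

def fit_lip(points_up, points_lo):
    """Least-squares fit of a two-parabola lip contour (hypothetical model).

    Returns (xl, xr, y0, h_up, h_lo): corner x-positions, corner height,
    and the upper/lower parabolic apex heights.
    """
    pts = np.vstack([points_up, points_lo])
    xl, xr = pts[:, 0].min(), pts[:, 0].max()
    y0 = pts[pts[:, 0].argmin(), 1]   # assume corners lie at a common height
    xc, w = (xl + xr) / 2.0, (xr - xl) / 2.0

    def fit_height(points):
        # Model: y = y0 + h * (1 - ((x - xc)/w)^2) -- linear in h,
        # so a one-column least-squares problem solves it directly.
        basis = 1.0 - ((points[:, 0] - xc) / w) ** 2
        h, *_ = np.linalg.lstsq(basis[:, None], points[:, 1] - y0, rcond=None)
        return float(h[0])

    return xl, xr, y0, fit_height(points_up), fit_height(points_lo)

# Synthetic contour points (hypothetical geometry, noise-free for clarity).
xs = np.linspace(-30, 30, 41)
up = np.column_stack([xs, 10.0 * (1 - (xs / 30) ** 2)])
lo = np.column_stack([xs, -8.0 * (1 - (xs / 30) ** 2)])
xl, xr, y0, h_up, h_lo = fit_lip(up, lo)
print(f"corners=({xl:.0f},{xr:.0f}), heights=({h_up:.1f},{h_lo:.1f})")
```

The point of such a low-dimensional model is that each tracked frame only requires solving small linear least-squares problems, which keeps fitting fast and constrains the contour to plausible lip shapes even when segmentation is noisy.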
© 2009 Springer-Verlag Berlin Heidelberg
Cite this chapter
Ooi, W.C., Jeon, C., Kim, K., Ko, H., Han, D.K. (2009). Effective Lip Localization and Tracking for Achieving Multimodal Speech Recognition. In: Hahn, H., Ko, H., Lee, S. (eds) Multisensor Fusion and Integration for Intelligent Systems. Lecture Notes in Electrical Engineering, vol 35. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89859-7_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89858-0
Online ISBN: 978-3-540-89859-7