A New Visual Speech Recognition Approach for RGB-D Cameras

Rekik, Ahmed; Ben-Hamadou, Achraf; Mahdi, Walid

doi:10.1007/978-3-319-11755-3_3


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8815)

Included in the following conference series:

International Conference Image Analysis and Recognition


Abstract

Visual speech recognition remains a challenging topic because of the variability of speaking characteristics. This paper proposes a new lipreading approach that recognizes isolated speech segments (words, digits, phrases, etc.) using both 2D image and depth data. The proposed system works in three consecutive steps: mouth region tracking and extraction, computation of appearance and motion descriptors (HOG and MBH, respectively), and classification with a Support Vector Machine (SVM). The approach was evaluated on three public databases (MIRACL, OuluVS, and CUAVE) in both speaker-dependent and speaker-independent settings. The recognition results show that lipreading can be performed effectively: the proposed approach outperforms recent work in the literature in the speaker-dependent setting and is competitive in the speaker-independent setting.
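The abstract pairs an appearance descriptor (HOG) with a motion descriptor (MBH). As a rough illustration only, and not the authors' implementation, the stdlib-only Python sketch below computes a simplified HOG on a grayscale mouth patch and an MBH descriptor from an optical-flow field (u, v) that is assumed to be precomputed. Following Dalal, Triggs and Schmid (2006), MBH applies the same oriented-gradient histogram to each flow component separately (MBHx and MBHy). The cell size (4), bin count (9), hard binning, and per-cell L2 normalization in place of overlapping block normalization are assumptions, as are the hypothetical function names `gradients`, `orientation_histograms`, `hog`, and `mbh`. Mouth tracking, use of the depth data, temporal pooling, and the SVM stage are not shown.

```python
import math


def gradients(img):
    """Central differences inside the image, one-sided differences at the borders.

    `img` is a 2D list of floats with rows of equal length.
    """
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            xl, xr = max(x - 1, 0), min(x + 1, w - 1)
            yu, yd = max(y - 1, 0), min(y + 1, h - 1)
            gx[y][x] = (img[y][xr] - img[y][xl]) / max(xr - xl, 1)
            gy[y][x] = (img[yd][x] - img[yu][x]) / max(yd - yu, 1)
    return gx, gy


def orientation_histograms(gx, gy, cell=4, bins=9, signed=False):
    """Magnitude-weighted orientation histogram for each non-overlapping cell.

    Assumed simplification: hard binning and per-cell L2 normalization
    (standard HOG uses interpolated binning and overlapping blocks).
    """
    h, w = len(gx), len(gx[0])
    span = 2 * math.pi if signed else math.pi
    feats = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            hist = [0.0] * bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    mag = math.hypot(gx[y][x], gy[y][x])
                    ang = math.atan2(gy[y][x], gx[y][x]) % span
                    hist[min(int(ang / span * bins), bins - 1)] += mag
            norm = math.sqrt(sum(v * v for v in hist)) + 1e-6
            feats.extend(v / norm for v in hist)
    return feats


def hog(patch, **kw):
    """Hypothetical appearance descriptor: HOG of a grayscale mouth patch."""
    return orientation_histograms(*gradients(patch), **kw)


def mbh(u, v, **kw):
    """Hypothetical motion descriptor: MBHx + MBHy, i.e. HOG of each flow component."""
    return orientation_histograms(*gradients(u), **kw) + orientation_histograms(*gradients(v), **kw)


if __name__ == "__main__":
    patch = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]  # 8x8 vertical step edge
    u = [[3.0] * 8 for _ in range(8)]   # uniform horizontal flow
    v = [[-1.0] * 8 for _ in range(8)]  # uniform vertical flow
    feature = hog(patch) + mbh(u, v)    # concatenated vector for a classifier
    print(len(feature))                 # 108 = 36 (HOG) + 72 (MBHx + MBHy)
```

Because MBH differentiates the flow field, a uniform flow such as a rigid head translation produces an all-zero motion descriptor in this sketch, which is the usual motivation for MBH over histograms of raw flow. The concatenated vector would then be passed to an SVM classifier.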



Gabor Based Lipreading with a New Audiovisual Mandarin Corpus

Lip-Reading Using Pixel-Based and Geometry-Based Features for Multimodal Human–Robot Interfaces

An adaptive approach for lip-reading using image and depth data

Article 09 July 2015


References

Bakry, A., Elgammal, A.: MKPLS: Manifold kernel partial least squares for lipreading and speaker identification. In: CVPR, pp. 684–691. IEEE (2013)
Ben-Hamadou, A., Soussen, C., Daul, C., Blondel, W., Wolf, D.: Flexible projector calibration for active stereoscopic systems. In: 2010 IEEE International Conference on Image Processing, pp. 4241–4244 (September 2010)
Ben-Hamadou, A., Soussen, C., Daul, C., Blondel, W., Wolf, D.: Flexible calibration of structured-light systems projecting point patterns. Computer Vision and Image Understanding 117(10), 1468–1481 (2013)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 1, pp. 886–893. IEEE (2005)
Dalal, N., Triggs, B., Schmid, C.: Human detection using oriented histograms of flow and appearance. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 428–441. Springer, Heidelberg (2006)
Huang, D., Shan, C., Ardabilian, M., Wang, Y., Chen, L.: Local binary patterns and its application to facial image analysis: a survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 41(6), 765–781 (2011)
Nanni, L., Lumini, A., Brahnam, S.: Survey on LBP based texture descriptors for image classification. Expert Syst. Appl. 39(3), 3634–3641 (2012)
Patterson, E.K., Gurbuz, S., Tufekci, Z., Gowdy, J.: CUAVE: A new audio-visual database for multimodal human-computer interface research. In: 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2, pp. II-2017–II-2020. IEEE (2002)
Pei, Y., Kim, T.K., Zha, H.: Unsupervised random forest manifold alignment for lipreading. In: ICCV, pp. 129–136 (2013)
Rekik, A., Ben-Hamadou, A., Mahdi, W.: Face pose tracking under arbitrary illumination changes. In: VISAPP (2014)
Shaikh, A.A., Kumar, D.K., Yau, W.C., Che Azemin, M., Gubbi, J.: Lip reading using optical flow and support vector machines. In: 2010 3rd International Congress on Image and Signal Processing (CISP), vol. 1, pp. 327–330. IEEE (2010)
Shin, J., Lee, J., Kim, D.: Real-time lip reading system for isolated Korean word recognition. Pattern Recognition 44(3), 559–571 (2011)
Vapnik, V.: The nature of statistical learning theory. Springer (2000)
Yargic, A., Dogan, M.: A lip reading application on MS Kinect camera. In: 2013 IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA), pp. 1–5. IEEE (2013)
Zhao, G., Barnard, M., Pietikainen, M.: Lipreading with local spatiotemporal descriptors. IEEE Transactions on Multimedia 11(7), 1254–1265 (2009)
Zhou, Z., Zhao, G., Pietikainen, M.: Towards a practical lipreading system. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 137–144. IEEE (2011)


Author information

Authors and Affiliations

Multimedia Information Systems and Advanced Computing Laboratory (MIRACL), Sfax University, Pôle technologique de Sfax, route de Tunis Km 10, BP 242, 3021, Sfax, Tunisia
Ahmed Rekik & Walid Mahdi
Valeo Driving Assistance Research Center, 34 rue St-André Z.I. des Vignes, 93012, Bobigny, France
Achraf Ben-Hamadou


Corresponding author

Correspondence to Ahmed Rekik.

Editor information

Editors and Affiliations

Faculty of Engineering, University of Porto, Porto, Portugal
Aurélio Campilho
Dept. of Electrical and Computer Eng., University of Waterloo, Waterloo, Ontario, Canada
Mohamed Kamel


About this paper

Cite this paper

Rekik, A., Ben-Hamadou, A., Mahdi, W. (2014). A New Visual Speech Recognition Approach for RGB-D Cameras. In: Campilho, A., Kamel, M. (eds) Image Analysis and Recognition. ICIAR 2014. Lecture Notes in Computer Science, vol 8815. Springer, Cham. https://doi.org/10.1007/978-3-319-11755-3_3


DOI: https://doi.org/10.1007/978-3-319-11755-3_3
Published: 10 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11754-6
Online ISBN: 978-3-319-11755-3
eBook Packages: Computer Science, Computer Science (R0)
