Robot Command Interface Using an Audio-Visual Speech Recognition System

Ceballos, Alexánder; Gómez, Juan; Prieto, Flavio; Redarce, Tanneguy

doi:10.1007/978-3-642-10268-4_102

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5856)

Included in the following conference series:

Iberoamerican Congress on Pattern Recognition

Abstract

In recent years, audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing and machine vision. Its ultimate goal is to enable human-computer communication by voice while taking into account the visual information contained in the audio-visual speech signal. This paper presents an automatic command recognition system that uses audio-visual information. The system is intended to control the da Vinci laparoscopic surgical robot. The audio signal is parametrized using Mel-Frequency Cepstral Coefficients (MFCCs). Visual speech information is extracted using features based on the points that define the outer contour of the mouth according to the MPEG-4 standard.
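The abstract names MFCC parametrization for the audio stream but does not state its settings. The following is a minimal sketch of the standard MFCC pipeline (pre-emphasis, framing, Hamming window, power spectrum, triangular mel filterbank, log, DCT-II) in Python with NumPy. The 16 kHz sample rate, 25 ms frames with a 10 ms step, 512-point FFT, 26 filters, 13 coefficients and 0.97 pre-emphasis factor are common defaults assumed here, not values taken from the paper, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (assumed; the paper's exact variant is not stated)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced uniformly on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # shape (n_out, n)
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13, preemph=0.97):
    """Return an array of shape (n_frames, n_ceps) of cepstral coefficients."""
    # Pre-emphasis boosts the high-frequency part of the spectrum
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    # Number of frames; the tail is zero-padded so the last frame is complete
    n_frames = 1 + max(0, int(np.ceil((len(emphasized) - flen) / fstep)))
    pad = (n_frames - 1) * fstep + flen - len(emphasized)
    padded = np.append(emphasized, np.zeros(max(pad, 0)))
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = padded[idx] * np.hamming(flen)
    # Periodogram estimate of the power spectrum of each frame
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, np.finfo(float).eps))
    # Decorrelate the log filterbank energies and keep the first n_ceps
    return dct_ii(log_energies, n_ceps)
```

For a 1 s signal sampled at 16 kHz, these defaults yield 99 frames of 13 coefficients each. The DCT is written out explicitly to keep the sketch dependent on NumPy alone.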

Author information

Authors and Affiliations

Instituto Tecnológico Metropolitano, Medellín, Colombia
Alexánder Ceballos
DIEEC, Universidad Nacional de Colombia Sede Manizales, Manizales, Colombia
Alexánder Ceballos & Juan Gómez
DIMM, Universidad Nacional de Colombia Sede Bogotá, Bogotá, Colombia
Flavio Prieto
Institut National des Sciences Appliquées de Lyon, Lyon, France
Juan Gómez & Tanneguy Redarce

Editor information

Editors and Affiliations

Departamento de Ingeniería Eléctrica y Ciencias de la Computación, CINVESTAV, Unidad Guadalajara, Jalisco, México
Eduardo Bayro-Corrochano
Computer Vision and Active Perception Laboratory, CSC, KTH, SE-100 44, Stockholm, Sweden
Jan-Olof Eklundh

About this paper

Cite this paper

Ceballos, A., Gómez, J., Prieto, F., Redarce, T. (2009). Robot Command Interface Using an Audio-Visual Speech Recognition System. In: Bayro-Corrochano, E., Eklundh, JO. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2009. Lecture Notes in Computer Science, vol 5856. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10268-4_102

DOI: https://doi.org/10.1007/978-3-642-10268-4_102
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10267-7
Online ISBN: 978-3-642-10268-4
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition