Abstract
This work presents an attentional mechanism that locates a speaker for interaction purposes using audio and video information. The location is expressed as azimuth and elevation angles, which serve as input values for controlling mobile systems such as a pan-tilt video camera or a robotic head. To this end, the SRP-PHAT algorithm has been implemented on a commercial microphone array for embedded devices in order to estimate the location of a sound source in the surroundings of the array. To mitigate the limitations of the SRP-PHAT algorithm in estimating the z coordinate, the elevation angle is corrected with video information, using Haar cascade classifiers for face detection. Simulations and experiments show the accuracy of the system, as well as its application to controlling a pan-tilt video camera in a real scenario with speakers and ambient noise.
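As a rough illustration of the angle computation described in the abstract, the following sketch converts an estimated source position into azimuth and elevation angles, and derives an elevation angle from the vertical position of a detected face. It is not the authors' implementation: the axis convention, the ideal pinhole camera model, and the names `source_angles` and `elevation_from_face` are all assumptions made for this example.

```python
import math


def source_angles(x, y, z):
    """Convert a source position (metres, array at the origin) to angles in degrees.

    Assumed convention (not taken from the paper): x points forward, y to the
    left, z up. Azimuth is measured in the horizontal plane from the x axis,
    elevation from the horizontal plane.
    """
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation


def elevation_from_face(face_center_row, image_height, vertical_fov_deg):
    """Estimate the elevation of a detected face relative to the optical axis.

    Assumes an ideal pinhole camera. A face above the image centre
    (smaller row index) yields a positive angle.
    """
    # Focal length in pixels, derived from the vertical field of view.
    focal_px = (image_height / 2) / math.tan(math.radians(vertical_fov_deg) / 2)
    # Vertical offset of the face centre from the image centre, in pixels.
    offset_px = image_height / 2 - face_center_row
    return math.degrees(math.atan2(offset_px, focal_px))
```

In a setup like the one described, the azimuth would come from the audio estimate, while the video-derived angle would replace or refine the audio elevation whenever a face is detected. How the two estimates are actually combined is not specified in the abstract.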
Acknowledgements
This work has been supported by the Economy and Competitiveness Department of the Spanish Government and the European Regional Development Fund under the project TIN2015-65686-C5-2-R (MINECO/FEDER, UE).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Martinez-Colon, A., Perez-Lorenzo, J.M., Rivas, F., Viciana-Abad, R., Reche-Lopez, P. (2019). Attentional Mechanism Based on a Microphone Array for Embedded Devices and a Single Camera. In: Fuentetaja Pizán, R., García Olaya, Á., Sesmero Lorente, M., Iglesias Martínez, J., Ledezma Espino, A. (eds) Advances in Physical Agents. WAF 2018. Advances in Intelligent Systems and Computing, vol 855. Springer, Cham. https://doi.org/10.1007/978-3-319-99885-5_12
DOI: https://doi.org/10.1007/978-3-319-99885-5_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99884-8
Online ISBN: 978-3-319-99885-5
eBook Packages: Intelligent Technologies and Robotics (R0)