Abstract
In real-world auditory scene analysis for human-robot interaction, three types of information must be extracted from the observed data: who speaks when and where. We present a speaker diarization system that extracts this information. Multiple signal classification (MUSIC) is a powerful method for voice activity detection (VAD) and direction-of-arrival (DOA) estimation. We describe the proposed system and compare its VAD and DOA performance with that of a MUSIC-based method.
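The abstract names MUSIC as the baseline for both VAD and DOA estimation. The following is a minimal sketch of how one MUSIC pseudo-spectrum can serve both purposes. It assumes a uniform linear array with half-wavelength spacing, far-field narrowband signals, and a single source. The function names (`steering_vector`, `music_spectrum`, `detect`, `simulate`), the 10 dB detection threshold, and the synthetic data are hypothetical illustrations. They are not the authors' system or the exact baseline evaluated in the chapter.

```python
import numpy as np

def steering_vector(theta_deg, n_mics, spacing, wavelength):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    k = np.arange(n_mics)
    phase = 2 * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)) / wavelength
    return np.exp(-1j * phase)

def music_spectrum(x, n_sources, angles_deg, spacing, wavelength):
    """MUSIC pseudo-spectrum for narrowband snapshots x of shape (n_mics, n_frames)."""
    n_mics, n_frames = x.shape
    # Spatial correlation matrix R = E[x x^H], estimated by averaging frames
    R = x @ x.conj().T / n_frames
    # eigh returns the eigenvalues of a Hermitian matrix in ascending order
    _, vecs = np.linalg.eigh(R)
    # Noise subspace: eigenvectors of the (n_mics - n_sources) smallest eigenvalues
    En = vecs[:, : n_mics - n_sources]
    p = np.empty(len(angles_deg))
    for i, ang in enumerate(angles_deg):
        a = steering_vector(ang, n_mics, spacing, wavelength)
        # P(theta) = a^H a / ||E_n^H a||^2 peaks where a is nearly
        # orthogonal to the noise subspace, i.e. at a source direction
        p[i] = np.real(np.vdot(a, a)) / np.linalg.norm(En.conj().T @ a) ** 2
    return p

def detect(x, n_sources=1, threshold_db=10.0, spacing=0.5, wavelength=1.0):
    """Hypothetical MUSIC-based VAD + DOA: threshold the spectrum peak (in dB)."""
    angles = np.arange(-90, 91)  # 1-degree search grid
    p_db = 10 * np.log10(music_spectrum(x, n_sources, angles, spacing, wavelength))
    idx = int(np.argmax(p_db))
    # Peak height decides voice activity; peak location is the DOA estimate
    return bool(p_db[idx] > threshold_db), float(angles[idx])

def simulate(doa_deg, with_source, n_mics=8, n_frames=200, noise_var=0.1, seed=0):
    """Synthetic narrowband frames: one unit-power source plus white noise."""
    rng = np.random.default_rng(seed)
    shape = (n_mics, n_frames)
    noise = np.sqrt(noise_var / 2) * (rng.standard_normal(shape)
                                      + 1j * rng.standard_normal(shape))
    if not with_source:
        return noise
    s = (rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)) / np.sqrt(2)
    return np.outer(steering_vector(doa_deg, n_mics, 0.5, 1.0), s) + noise
```

For example, `detect(simulate(30.0, True))` should flag activity with a DOA near 30°, while `detect(simulate(0.0, False))` should report no activity. Practical robot-audition systems usually compute MUSIC per STFT frequency bin and integrate over bins before thresholding. This single-bin sketch omits that step.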
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Huang, Y., Otsuka, T., Okuno, H.G. (2013). A Speaker Diarization System with Robust Speaker Localization and Voice Activity Detection. In: Ali, M., Bosse, T., Hindriks, K., Hoogendoorn, M., Jonker, C., Treur, J. (eds) Contemporary Challenges and Solutions in Applied Artificial Intelligence. Studies in Computational Intelligence, vol 489. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00651-2_11
DOI: https://doi.org/10.1007/978-3-319-00651-2_11
Publisher Name: Springer, Heidelberg
Print ISBN: 978-3-319-00650-5
Online ISBN: 978-3-319-00651-2
eBook Packages: Engineering, Engineering (R0)