Abstract
In this work, we address the problem of multi-channel speech separation. Following a recently proposed method, we use a localization network to estimate time delays, compute steering vectors from those delays, and derive spatial filters from the steering vectors and the mixtures. Such a beamformer has difficulty separating speech when the speakers are close to each other or when their locations are estimated inaccurately. To overcome this problem, we propose to inform the beamformer about the speakers, so that it tracks them throughout utterances using not only their locations but also their speaker characteristics. We investigate and compare different methods of using the speaker information in beamforming, such as multiplying the steering vectors by speaker weights. Experiments on simulated data demonstrate that the proposed method can improve the performance of both speech separation and speech recognition.
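As a rough illustration of the pipeline the abstract outlines (delays, then steering vectors, then spatial filters derived from the mixtures), the following sketch assumes an MVDR-style filter with diagonal loading. The abstract does not name the filter type, so this is an assumption rather than the authors' implementation. The function names, the loading constant, and the far-field model d_m(f) = exp(-j 2π f τ_m) are all hypothetical choices made for this example, and the speaker-weighting step is left out because its exact form is not given here.

```python
import numpy as np

def steering_vectors(delays, freqs):
    """Far-field steering vectors d[f, m] = exp(-j 2 pi f tau_m).

    delays: (M,) per-microphone delays in seconds (assumed already estimated).
    freqs:  (F,) STFT bin centre frequencies in Hz.
    """
    return np.exp(-2j * np.pi * np.outer(freqs, delays))

def spatial_covariance(X):
    """Per-frequency covariance of the mixture STFT X with shape (F, T, M)."""
    return np.einsum("ftm,ftn->fmn", X, X.conj()) / X.shape[1]

def mvdr_filters(d, R, loading=1e-3):
    """MVDR-style weights w = R^{-1} d / (d^H R^{-1} d) for each frequency.

    A diagonal loading term, scaled by the average channel power,
    keeps R well conditioned (the constant is an arbitrary choice).
    """
    F, M = d.shape
    w = np.empty((F, M), dtype=complex)
    for f in range(F):
        Rf = R[f] + loading * (np.trace(R[f]).real / M) * np.eye(M)
        num = np.linalg.solve(Rf, d[f])        # R^{-1} d
        w[f] = num / (d[f].conj() @ num)       # normalise by d^H R^{-1} d
    return w

def apply_filters(w, X):
    """Beamformer output y[f, t] = w[f]^H x[f, t]."""
    return np.einsum("fm,ftm->ft", w.conj(), X)
```

By construction, each filter satisfies the distortionless constraint w^H d = 1 in the steered direction. When two speakers share nearly the same delays, their steering vectors nearly coincide and such a filter cannot tell them apart, which is the failure case the abstract describes.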
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, C., Liu, Y. (2019). Multi-channel Speaker Separation Using Speaker-Aware Beamformer. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Intelligent Computing. CompCom 2019. Advances in Intelligent Systems and Computing, vol 997. Springer, Cham. https://doi.org/10.1007/978-3-030-22871-2_32
DOI: https://doi.org/10.1007/978-3-030-22871-2_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22870-5
Online ISBN: 978-3-030-22871-2
eBook Packages: Intelligent Technologies and Robotics (R0)