Multi-channel Speaker Separation Using Speaker-Aware Beamformer

Liu, Conggui; Liu, Yinhua

doi:10.1007/978-3-030-22871-2_32

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 997)

Included in the following conference series:

Intelligent Computing - Proceedings of the Computing Conference

Abstract

In this work, we address the problem of multi-channel speech separation. Following a recently proposed method, we use a localization network to estimate time delays, compute steering vectors from them, and derive spatial filters from these vectors and the mixtures. Such a beamformer has difficulty separating speech when the speakers are close to each other or when their locations are estimated inaccurately. To overcome this problem, we propose to inform the beamformer about the speakers so that it tracks them throughout the utterances using not only their locations but also their speaker characteristics. We investigate and compare different ways of using the speaker information in beamforming, such as multiplying the steering vectors by speaker weights. Experiments on simulated data demonstrate that the proposed method improves the performance of both speech separation and speech recognition.
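
The delay-to-filter step described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the far-field steering-vector model, the minimum-variance (MVDR-style) solution with diagonal loading, and the names `steering_vector`, `mvdr_weights`, `apply_beamformer`, and `loading` are all assumptions made for illustration. The delays are taken as given here, whereas the chapter estimates them with a localization network, and the speaker-weighting step is left out because the abstract does not state its exact form.

```python
import numpy as np

def steering_vector(delays, freqs):
    """Steering vectors d[f, m] = exp(-j * 2*pi * f * tau_m).

    delays: (M,) per-microphone delays in seconds (assumed already estimated).
    freqs:  (F,) STFT bin centre frequencies in Hz.
    """
    return np.exp(-2j * np.pi * freqs[:, None] * delays[None, :])

def mvdr_weights(mixture_stft, steer, loading=1e-3):
    """Minimum-variance filters w = R^-1 d / (d^H R^-1 d), one per frequency bin.

    mixture_stft: (F, T, M) complex STFT of the multi-channel mixture.
    steer:        (F, M) steering vectors.
    loading:      hypothetical diagonal-loading factor, relative to the mean power.
    """
    _, n_frames, n_mics = mixture_stft.shape
    # Spatial covariance of the mixture per bin: R[f] = mean_t x x^H, shape (F, M, M).
    cov = np.einsum("ftm,ftn->fmn", mixture_stft, mixture_stft.conj()) / n_frames
    # Diagonal loading keeps R well conditioned before solving.
    power = np.trace(cov, axis1=1, axis2=2).real / n_mics
    cov = cov + (loading * power)[:, None, None] * np.eye(n_mics)
    # Solve R x = d per bin instead of forming an explicit inverse.
    numerator = np.linalg.solve(cov, steer[..., None])[..., 0]
    denominator = np.einsum("fm,fm->f", steer.conj(), numerator)
    return numerator / denominator[:, None]

def apply_beamformer(weights, mixture_stft):
    """Beamformed single-channel output y[f, t] = w[f]^H x[f, t]."""
    return np.einsum("fm,ftm->ft", weights.conj(), mixture_stft)
```

Under these assumptions each filter passes a signal arriving with the given delays undistorted (w^H d = 1) while minimizing the output power of the mixture. That also suggests why inaccurate delay estimates or closely spaced speakers are harmful: the constraint then points at the wrong direction, or at a direction shared by both speakers.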

Author information

Authors and Affiliations

Tokyo Institute of Technology, Tokyo, Japan
Conggui Liu
Institute for Future, Qingdao University, Qingdao, China
Yinhua Liu

Corresponding author

Correspondence to Conggui Liu.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai
The Science and Information SAI Organization, Bradford, West Yorkshire, UK
Rahul Bhatia
The Science and Information SAI Organization, Bradford, West Yorkshire, UK
Supriya Kapoor

About this paper

Cite this paper

Liu, C., Liu, Y. (2019). Multi-channel Speaker Separation Using Speaker-Aware Beamformer. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Intelligent Computing. CompCom 2019. Advances in Intelligent Systems and Computing, vol 997. Springer, Cham. https://doi.org/10.1007/978-3-030-22871-2_32

DOI: https://doi.org/10.1007/978-3-030-22871-2_32
Published: 23 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22870-5
Online ISBN: 978-3-030-22871-2
eBook Packages: Intelligent Technologies and Robotics (R0)
