Abstract
Voice Activity Detection (VAD) is an essential front-end component in many speech processing systems. Because such systems are commonly deployed in environments with diverse noise types and low signal-to-noise ratios (SNRs), an effective VAD method should robustly detect speech regions within noisy background signals. In this paper, we propose applying an adversarial domain adaptation technique to VAD. The proposed method trains DNN models for the VAD task in a supervised manner while simultaneously mitigating the domain mismatch between noisy and clean audio streams in an unsupervised manner. The experimental results show that the proposed method improves detection robustness in noisy environments compared to other DNN-based models trained on hand-crafted acoustic features.
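The training scheme described above follows the domain-adversarial pattern (a supervised task loss on labeled clean data plus a domain discriminator whose gradient is reversed before reaching the shared feature extractor, as in DANN/ADDA-style methods). The minimal numpy sketch below illustrates that pattern on synthetic data; all dimensions, variable names, and hyperparameters (`lam`, `lr`) are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

# Toy stand-ins for the two domains: labeled "clean" (source) frames and
# unlabeled "noisy" (target) frames. Dimensions are illustrative only.
d = 8
Xs = rng.normal(0.0, 1.0, (64, d))        # source (clean) feature frames
ys = (Xs[:, 0] > 0).astype(float)         # speech / non-speech labels
Xt = Xs + rng.normal(0.5, 0.5, (64, d))   # target (noisy) frames, shifted

# One linear layer each: feature extractor F, VAD head C, domain head D.
Wf = rng.normal(0, 0.1, (d, d))
wc = rng.normal(0, 0.1, d)
wd = rng.normal(0, 0.1, d)

lam, lr = 0.1, 0.05                       # reversal weight, learning rate
history = []                              # supervised VAD loss per step
for step in range(300):
    Fs, Ft = Xs @ Wf, Xt @ Wf

    # Supervised VAD loss (binary cross-entropy) on labeled source frames.
    ps = sigmoid(Fs @ wc)
    history.append(-np.mean(ys * np.log(ps + 1e-9)
                            + (1 - ys) * np.log(1 - ps + 1e-9)))

    # Domain discriminator targets: source = 0, target = 1.
    qs, qt = sigmoid(Fs @ wd), sigmoid(Ft @ wd)

    # BCE gradients with respect to the logits.
    g_vad = (ps - ys) / len(ys)
    g_ds, g_dt = qs / len(qs), (qt - 1.0) / len(qt)

    # Feature extractor: descend the VAD loss but *ascend* the domain loss
    # (gradient reversal), pushing features toward domain invariance.
    grad_Wf = (Xs.T @ np.outer(g_vad, wc)
               - lam * (Xs.T @ np.outer(g_ds, wd) + Xt.T @ np.outer(g_dt, wd)))

    # Each head descends its own loss; the extractor uses the reversed grad.
    wc -= lr * Fs.T @ g_vad
    wd -= lr * (Fs.T @ g_ds + Ft.T @ g_dt)
    Wf -= lr * grad_Wf
```

The key design choice is the minus sign in `grad_Wf`: the discriminator is trained to separate domains, while the shared extractor receives the negated domain gradient, so the two play the adversarial game that aligns clean and noisy feature distributions.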
Acknowledgment
This research was partially supported by the National Research Foundation (NRF) Grant (No. 2019R1F1A1048115), the Institute of Information & communications Technology Planning & Evaluation (IITP) Grant (No. IITP-2021-0-00066), and the ICT Creative Consilience program (No. IITP-2020-0-01821) funded by the Korea government (MSIT).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kim, T., Ko, J.H. (2022). Application of Adversarial Domain Adaptation to Voice Activity Detection. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2021. Lecture Notes in Networks and Systems, vol 296. Springer, Cham. https://doi.org/10.1007/978-3-030-82199-9_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82198-2
Online ISBN: 978-3-030-82199-9
eBook Packages: Intelligent Technologies and Robotics