
Application of Adversarial Domain Adaptation to Voice Activity Detection

  • Conference paper
Intelligent Systems and Applications (IntelliSys 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 296)


Abstract

Voice Activity Detection (VAD) is becoming an essential front-end component in various speech processing systems. As those systems are commonly deployed in environments with diverse noise types and low signal-to-noise ratios (SNRs), an effective VAD method should robustly detect speech regions within noisy background signals. In this paper, we propose applying an adversarial domain adaptation technique to VAD. The proposed method trains DNN models for the VAD task in a supervised manner while simultaneously mitigating the domain mismatch between noisy and clean audio streams in an unsupervised manner. The experimental results show that the proposed method improves detection robustness in noisy environments compared to other DNN-based models trained with hand-crafted acoustic features.
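To make the idea concrete, the sketch below shows one common way such an adversarial domain-adaptation setup can be wired for frame-level VAD: a shared feature encoder, a supervised speech/non-speech head trained on labeled clean frames, and a domain discriminator trained adversarially through a gradient-reversal layer on both clean and unlabeled noisy frames. This is only an assumed, minimal illustration; the layer sizes, feature dimension, loss weighting, and the gradient-reversal formulation are not taken from the paper, which does not specify its architecture in the abstract.

```python
# Illustrative sketch only: a gradient-reversal (DANN-style) adversarial
# domain-adaptation setup for frame-level VAD. All dimensions, layer sizes,
# and the weight `lam` are assumptions, not the configuration from the paper.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses and scales gradients in backward."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None


class AdversarialVAD(nn.Module):
    def __init__(self, feat_dim=40, hidden=128):
        super().__init__()
        # Shared acoustic feature encoder (e.g. over log-mel frames).
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Supervised VAD head: speech vs. non-speech.
        self.vad_head = nn.Linear(hidden, 2)
        # Domain discriminator: clean (source) vs. noisy (target) frames.
        self.dom_head = nn.Linear(hidden, 2)

    def forward(self, x, lam=1.0):
        z = self.encoder(x)
        vad_logits = self.vad_head(z)
        # Gradient reversal pushes the encoder toward domain-invariant features.
        dom_logits = self.dom_head(GradReverse.apply(z, lam))
        return vad_logits, dom_logits


# One illustrative training step: VAD loss on labeled clean frames,
# domain loss on both clean and noisy frames (no labels needed for noisy data).
model = AdversarialVAD()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

clean_x, vad_y = torch.randn(32, 40), torch.randint(0, 2, (32,))  # labeled source
noisy_x = torch.randn(32, 40)                                     # unlabeled target

vad_logits, dom_clean = model(clean_x, lam=0.5)
_, dom_noisy = model(noisy_x, lam=0.5)
dom_logits = torch.cat([dom_clean, dom_noisy])
dom_y = torch.cat([torch.zeros(32, dtype=torch.long),
                   torch.ones(32, dtype=torch.long)])

loss = ce(vad_logits, vad_y) + ce(dom_logits, dom_y)
opt.zero_grad()
loss.backward()
opt.step()
```

In this arrangement the VAD head only ever sees labeled (clean) frames, while the discriminator sees frames from both domains, so the encoder is driven toward noise-invariant representations without requiring labels for the noisy stream.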



Acknowledgment

This research was partially supported by the National Research Foundation (NRF) Grant (No. 2019R1F1A1048115), the Institute of Information & communications Technology Planning & Evaluation (IITP) Grant (No. IITP-2021-0-00066), and the ICT Creative Consilience program (No. IITP-2020-0-01821) funded by the Korea government (MSIT).

Author information


Corresponding author

Correspondence to Jong Hwan Ko.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Kim, T., Ko, J.H. (2022). Application of Adversarial Domain Adaptation to Voice Activity Detection. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2021. Lecture Notes in Networks and Systems, vol 296. Springer, Cham. https://doi.org/10.1007/978-3-030-82199-9_55

