Sound Classification Using Residual Convolutional Network

Conference paper
Data Engineering for Smart Systems

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 238)

Abstract

In this paper, we propose a new architecture for environmental sound classification, evaluated on the ESC-50 and urban sound datasets. ESC-50 is a collection of 2000 labeled environmental audio recordings, and the urban sound dataset contains 8732 labeled sound excerpts. Mel-frequency cepstral coefficients are used to represent the power spectrum of each sound wave; the resulting feature matrix makes it possible to apply a convolutional neural network to the data. The proposed architecture repeatedly extracts increasingly complex features and, using residual (ResNet-style) connections, carries them through a much deeper network. After fine-tuning, the network achieves 89.5% validation accuracy on the environmental sound classification dataset and 96.76% on the urban sound dataset.
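The two ingredients named in the abstract, an MFCC feature matrix and a residual connection, can be sketched in plain NumPy. This is an illustrative sketch only, not the authors' implementation: the frame size, hop length, and filter counts below are assumed defaults, and the paper uses a full residual CNN rather than this single toy block.

```python
import numpy as np

def hz_to_mel(f):
    """Standard mel scale: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):                      # rising edge
            fb[m - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):                      # falling edge
            fb[m - 1, k] = (hi - k) / max(hi - c, 1)
    return fb

def mfcc_matrix(signal, sr, n_fft=512, hop=256, n_mels=40, n_mfcc=13):
    """Return an (n_frames, n_mfcc) matrix, the 2D input a CNN can consume."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hanning(n_fft)                      # taper each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft  # power spectrum
    log_mel = np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)
    # DCT-II decorrelates the log-mel energies into cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2.0 * n_mels)))
    return log_mel @ dct.T

def residual_block(x, W1, W2):
    """relu(F(x) + x): the identity shortcut lets features (and gradients)
    pass through unchanged, which is what allows the greater depth."""
    relu = lambda v: np.maximum(0.0, v)
    return relu(W2 @ relu(W1 @ x) + x)
```

For example, one second of audio at 22 050 Hz with the defaults above yields an 85 x 13 feature matrix; the residual block with zero weights reduces to the identity on non-negative inputs, showing why the shortcut preserves features across depth.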





Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.


Cite this paper

Jangid, M., Nagpal, K. (2022). Sound Classification Using Residual Convolutional Network. In: Nanda, P., Verma, V.K., Srivastava, S., Gupta, R.K., Mazumdar, A.P. (eds) Data Engineering for Smart Systems. Lecture Notes in Networks and Systems, vol 238. Springer, Singapore. https://doi.org/10.1007/978-981-16-2641-8_23
