Sound Classification Using Residual Convolutional Network

Jangid, Mahesh; Nagpal, Kabir

doi:10.1007/978-981-16-2641-8_23

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 238)

Abstract

In this paper, we propose a new architecture for environmental sound classification on the ESC-50 and urban sound datasets. The ESC-50 dataset is a collection of 2000 labeled environmental audio recordings, and the urban sound dataset is a collection of 8732 labeled sound records. The Mel-frequency cepstrum is used to obtain the power spectrum of the sound wave. The resulting matrix makes it possible to apply a convolutional neural network architecture to the dataset. The new architecture repeatedly extracts more complex features and carries them through greater depth using a ResNet-type architecture. After fine-tuning the network, we achieved 89.5% validation accuracy on the environmental sound classification dataset (ESC-50) and 96.76% on the urban sound dataset.
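The two ideas named in the abstract, a mel-scaled representation of the spectrum and a ResNet-type skip connection, can be illustrated with a minimal sketch. The abstract does not give the authors' layer configuration, so everything here is an assumption for illustration: the function names are hypothetical, the mel formula is one commonly used variant, and a toy linear layer stands in for a convolution.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to mels using the common 2595*log10(1 + f/700) variant."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def dense(v, weights, bias):
    """Toy linear layer (assumption: stands in for a convolution).

    weights is a list of rows, one row per output element.
    """
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def residual_block(v, w1, b1, w2, b2):
    """Compute relu(F(v) + v), where F is two layers with a ReLU in between.

    The "+ v" term is the skip connection: the input is added back to the
    transformed features, so the block can fall back to passing v through.
    """
    h = relu(dense(v, w1, b1))
    f = dense(h, w2, b2)
    return relu([fi + xi for fi, xi in zip(f, v)])
```

With all weights set to zero, the block reduces to `relu(v)`, which shows how the skip connection lets features pass through unchanged when a block has learned nothing; this is the property that makes greater depth trainable in ResNet-type networks.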

Author information

Authors and Affiliations

Department of CSE, Manipal University Jaipur, Jaipur, Rajasthan, India
Mahesh Jangid & Kabir Nagpal

Editor information

Editors and Affiliations

School of Electrical and Data Engineering, University of Technology Sydney, Sydney, NSW, Australia
Priyadarsi Nanda
Manipal University Jaipur, Jaipur, Rajasthan, India
Vivek Kumar Verma
Manipal University Jaipur, Jaipur, Rajasthan, India
Sumit Srivastava
Manipal University Jaipur, Jaipur, Rajasthan, India
Rohit Kumar Gupta
Department of Computer Science and Engineering, Malviya National Institute of Technology, Jaipur, Rajasthan, India
Arka Prokash Mazumdar

About this paper

Cite this paper

Jangid, M., Nagpal, K. (2022). Sound Classification Using Residual Convolutional Network. In: Nanda, P., Verma, V.K., Srivastava, S., Gupta, R.K., Mazumdar, A.P. (eds) Data Engineering for Smart Systems. Lecture Notes in Networks and Systems, vol 238. Springer, Singapore. https://doi.org/10.1007/978-981-16-2641-8_23

DOI: https://doi.org/10.1007/978-981-16-2641-8_23
Published: 14 November 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-2640-1
Online ISBN: 978-981-16-2641-8
eBook Packages: Intelligent Technologies and Robotics (R0)
