Abstract
Advances in technology have made large amounts of audio data available. Audio plays a decisive role in data analysis tasks such as activity recognition and event detection, and classifying an audio stream helps corroborate results obtained from other media. We trained a CNN model to classify the benchmark data sets ESC-10 and ESC-50, and we also evaluated it on a custom data set. The CNN is trained on low-level audio features extracted from the benchmark and custom audio snippets, which come from both constrained and noisy environments. We identified a CNN architecture with a minimal number of layers that performs well on both the benchmark and custom data sets. We also ran experiments to find the most influential feature, one that is sufficient on its own to classify multiple classes of audio data. We report classification accuracy as high as 98%.
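The abstract does not say which low-level features were used, so the following is only an illustrative sketch of frame-wise feature extraction, not the authors' pipeline. It assumes two common low-level descriptors, zero-crossing rate and RMS energy, and uses synthetic sine tones in place of real audio snippets. The function names, frame length, and hop size are all hypothetical choices.

```python
import math


def frame_signal(signal, frame_len, hop):
    """Split a list of samples into overlapping fixed-length frames."""
    last_start = len(signal) - frame_len
    return [signal[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Root-mean-square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def extract_features(signal, frame_len=400, hop=200):
    """Return one (zcr, rms) pair per frame (assumed feature set)."""
    return [
        (zero_crossing_rate(f), rms_energy(f))
        for f in frame_signal(signal, frame_len, hop)
    ]


def sine(freq_hz, sample_rate=8000, seconds=0.5, amplitude=1.0):
    """Synthetic test tone standing in for a real audio snippet."""
    n = int(sample_rate * seconds)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n)
    ]


# Two hypothetical "classes": a low tone and a high tone.
low = extract_features(sine(100))
high = extract_features(sine(1000))

if __name__ == "__main__":
    print("frames per snippet:", len(low))
    print("first low-tone frame (zcr, rms):", low[0])
    print("first high-tone frame (zcr, rms):", high[0])
```

The resulting per-frame feature rows form a small two-dimensional array per snippet, which is the kind of input a CNN classifier could consume. In this toy case the zero-crossing rate alone separates the two tones, which loosely mirrors the abstract's idea of looking for a single feature that is sufficient for classification.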
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Prathima, T., Govardhan, A., Palla, S., Sri Yagna, K. (2022). Constrained and Unconstrained Audio Classification. In: Pandian, A.P., Palanisamy, R., Narayanan, M., Senjyu, T. (eds) Proceedings of Third International Conference on Intelligent Computing, Information and Control Systems. Advances in Intelligent Systems and Computing, vol 1415. Springer, Singapore. https://doi.org/10.1007/978-981-16-7330-6_75
DOI: https://doi.org/10.1007/978-981-16-7330-6_75
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-7329-0
Online ISBN: 978-981-16-7330-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)