Automatic Identification of Vietnamese Singer Voices Using Deep Learning and Data Augmentation

Le Thuy, Dao Thi; Thanh, Chu Ba; Van Loan, Trinh; Thanh, Le Xuan

doi:10.1007/978-3-031-50818-9_27

Dao Thi Le Thuy ORCID: orcid.org/0000-0001-6283-4869¹⁴,
Chu Ba Thanh ORCID: orcid.org/0009-0005-6049-9469¹⁵,
Trinh Van Loan¹⁶ &
…
Le Xuan Thanh¹⁶

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 848)

Included in the following conference series:

International Conference on Advances in Information and Communication Technology

Abstract

Singer identification is a complex task in audio processing and voice recognition. It aims to determine the identity of a singer from a given audio clip or song. This is useful in applications ranging from constructing personalised playlists to categorising and sorting large music libraries. Traditional methods of voice identification often rely on handcrafted features and statistical models. Deep learning models, particularly neural networks, have instead demonstrated exceptional capabilities in learning relevant features automatically and directly from raw audio data, which allows them to uncover subtle patterns and representations that were previously difficult to detect. In this paper, we present our study of the automatic identification of some Vietnamese singer voices using deep learning and data augmentation. We built a dataset of 2200 excerpts from ten Vietnamese artists, then identified these artists' voices using GRU, LSTM, and CNN models. In our experiments, the GRU model achieved higher identification accuracy than the LSTM and CNN models. We also propose a new data augmentation method for singer voice identification. This method proved to be much more effective than the regular augmentation methods for audio data, such as noise addition and pitch shifting.
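
For readers unfamiliar with the regular augmentation methods named in the abstract, the following is a minimal sketch of noise addition and pitch shifting. It is not the authors' new method, which the abstract does not describe, and it is not their implementation. The function names, the 20 dB signal-to-noise ratio, and the synthetic 220 Hz excerpt are all assumptions chosen for illustration. The pitch shift is deliberately naive: it resamples with linear interpolation, so it also changes the excerpt's duration.

```python
import math
import random


def add_noise(samples, snr_db, seed=0):
    """Return samples plus white Gaussian noise at roughly the given SNR (dB)."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def naive_pitch_shift(samples, semitones):
    """Shift pitch by resampling with linear interpolation.

    Simplification: the duration changes too. Practical tools pair
    resampling with time stretching to keep the length fixed.
    """
    rate = 2 ** (semitones / 12)  # frequency ratio for the requested shift
    out_len = int(len(samples) / rate)
    shifted = []
    for i in range(out_len):
        pos = i * rate
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        shifted.append((1 - frac) * samples[lo] + frac * samples[hi])
    return shifted


# Hypothetical excerpt: one second of a 220 Hz tone sampled at 8 kHz.
sr = 8000
excerpt = [math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
noisy = add_noise(excerpt, snr_db=20)              # same length, added noise
higher = naive_pitch_shift(excerpt, semitones=12)  # one octave up, half as long
```

Each augmented copy keeps the label of the excerpt it was derived from, so a dataset can be enlarged without recording new material.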

Author information

Authors and Affiliations

Faculty of Information Technology, University of Transport and Communications, Hanoi, Vietnam
Dao Thi Le Thuy
Faculty of Information Technology, University of Technology and Education Hung Yen, Hungyen, Vietnam
Chu Ba Thanh
School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi, Vietnam
Trinh Van Loan & Le Xuan Thanh

Corresponding author

Correspondence to Chu Ba Thanh.

Editor information

Editors and Affiliations

Thai Nguyen University of Information and Communication Technology, Thai Nguyen, Vietnam
Phung Trung Nghia
Thai Nguyen University of Information and Communication Technology, Thai Nguyen, Vietnam
Vu Duc Thai
VNU University of Engineering and Technology, Vietnam National University, Ha Noi, Vietnam
Nguyen Thanh Thuy
Information Technology Institute, Vietnam National University, Ha Noi, Vietnam
Le Hoang Son
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Van-Nam Huynh

About this paper

Cite this paper

Le Thuy, D.T., Thanh, C.B., Van Loan, T., Thanh, L.X. (2024). Automatic Identification of Vietnamese Singer Voices Using Deep Learning and Data Augmentation. In: Nghia, P.T., Thai, V.D., Thuy, N.T., Son, L.H., Huynh, VN. (eds) Advances in Information and Communication Technology. ICTA 2023. Lecture Notes in Networks and Systems, vol 848. Springer, Cham. https://doi.org/10.1007/978-3-031-50818-9_27

DOI: https://doi.org/10.1007/978-3-031-50818-9_27
Published: 04 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50817-2
Online ISBN: 978-3-031-50818-9
eBook Packages: Intelligent Technologies and Robotics (R0)