Spoken Language Identification of Indian Languages Using MFCC Features

Biswas, Mainak; Rahaman, Saif; Kundu, Satwik; Singh, Pawan Kumar; Sarkar, Ram

doi:10.1007/978-981-15-9492-2_12

Part of the book series: Studies in Big Data (SBD, volume 82)

Abstract

Spoken language has long been the most natural mode of human communication. Today, advances in telecommunication technology have taken communication to a new level, which motivates the development of automatic speech recognition (ASR). As ASR research progresses, language identification becomes a necessary component of any multilingual ASR system. The problem is highly pertinent to India, which has a wide diversity of languages, each with hundreds of dialects. In this chapter, we propose a spoken language identification framework for recognizing six of the most widely spoken languages in India, namely English, Hindi, Bangla, Marathi, Tamil and Telugu. First, each input audio clip is pre-processed to remove its silent frames. This, in turn, improves recognition, since redundant and non-informative content is eliminated. The silence removal parameters are optimized to achieve the best performance. Second, Mel Frequency Cepstral Coefficient (MFCC) features are extracted from the speech signals. These features are used to train a Support Vector Machine (SVM) classifier, which is widely used because of its large-margin classification property and its ability to classify complex, non-linear data. The proposed framework is tested on the standard IndicTTS speech database. The SVM is trained on 13 (static), 26 (static + delta) and 39 (static + delta + delta-delta) MFCC features. The best results are achieved using only the 13 static features; adding delta and delta-delta features decreases performance. The maximum accuracy obtained is 60.33% for all six languages, and 89.33% for the three-language subset of Bangla, English and Hindi.
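The abstract names two concrete processing steps: silence removal, and stacking static, delta and delta-delta MFCCs into 13-, 26- and 39-dimensional vectors. The sketch below illustrates both under stated assumptions. Silence removal here is a simple short-time-energy threshold with a hypothetical frame length and relative threshold (the chapter's actual criterion and tuned parameters are not given in this abstract). Deltas use the common regression formula over ±2 frames. `mean_pool` is a hypothetical way of turning a variable-length clip into one fixed-length SVM input. MFCC extraction and SVM training themselves are assumed to be handled by a library.

```python
from typing import List

Frame = List[float]


def remove_silence(signal: List[float], frame_len: int = 400,
                   rel_threshold: float = 0.1) -> List[float]:
    """Keep only frames whose mean energy is at least rel_threshold times
    the loudest frame's energy. frame_len=400 (25 ms at 16 kHz) and
    rel_threshold=0.1 are hypothetical defaults, not the chapter's values."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    energies = [sum(x * x for x in f) / len(f) for f in frames]
    peak = max(energies, default=0.0)
    if peak == 0.0:
        return []  # empty or all-zero input: nothing but silence
    kept: List[float] = []
    for frame, energy in zip(frames, energies):
        if energy >= rel_threshold * peak:
            kept.extend(frame)
    return kept


def deltas(feats: List[Frame], n: int = 2) -> List[Frame]:
    """Regression deltas d_t = sum_k k*(c[t+k] - c[t-k]) / (2*sum_k k^2),
    k = 1..n, repeating the first/last frame at the boundaries."""
    T = len(feats)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out: List[Frame] = []
    for t in range(T):
        row = []
        for j in range(len(feats[t])):
            acc = 0.0
            for k in range(1, n + 1):
                acc += k * (feats[min(t + k, T - 1)][j] - feats[max(t - k, 0)][j])
            row.append(acc / denom)
        out.append(row)
    return out


def stack(static: List[Frame], order: int) -> List[Frame]:
    """order=0: static only (13-D); 1: + delta (26-D); 2: + delta-delta (39-D)."""
    parts = [static]
    if order >= 1:
        parts.append(deltas(static))
    if order >= 2:
        parts.append(deltas(parts[1]))  # delta-delta = delta of the deltas
    # Concatenate the per-frame vectors of every part, frame by frame.
    return [[v for p in parts for v in p[t]] for t in range(len(static))]


def mean_pool(feats: List[Frame]) -> Frame:
    """Hypothetical clip-level vector: average each dimension over frames."""
    return [sum(col) / len(feats) for col in zip(*feats)]
```

For example, given hypothetical `mfcc_frames` with 13 coefficients per frame, `stack(mfcc_frames, 2)` yields 39-D rows, and `mean_pool` over those rows gives one 39-D vector per clip that an SVM implementation (for instance scikit-learn's `SVC`) could consume. Replicating the edge frames keeps the delta sequence the same length as the static sequence, so the three parts can be concatenated frame by frame.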

Author information

Authors and Affiliations

Department of Information Technology, Jadavpur University, Kolkata, 700106, West Bengal, India
Mainak Biswas, Saif Rahaman, Satwik Kundu & Pawan Kumar Singh
Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, West Bengal, India
Ram Sarkar

Corresponding author

Correspondence to Pawan Kumar Singh.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering and Information Technology, Jaypee University of Information Technology, Solan, Himachal Pradesh, India
Pardeep Kumar
Department of Computer Science and Engineering, National Institute of Technology, Patna, Bihar, India
Amit Kumar Singh

About this chapter

Cite this chapter

Biswas, M., Rahaman, S., Kundu, S., Singh, P.K., Sarkar, R. (2021). Spoken Language Identification of Indian Languages Using MFCC Features. In: Kumar, P., Singh, A.K. (eds) Machine Learning for Intelligent Multimedia Analytics. Studies in Big Data, vol 82. Springer, Singapore. https://doi.org/10.1007/978-981-15-9492-2_12

DOI: https://doi.org/10.1007/978-981-15-9492-2_12
Published: 17 January 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9491-5
Online ISBN: 978-981-15-9492-2
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)