Abstract
Spoken language has long been the most natural mode of communication, and advances in telecommunication technology have taken communication to the next level. This motivates research into automatic speech recognition (ASR). As ASR research progresses, language identification becomes a necessary component of any multilingual ASR system. The problem is highly pertinent to India because of its wide diversity of languages, each having hundreds of dialects. In this chapter, we propose a spoken language identification framework for recognizing six of the most widely used spoken languages in India, namely English, Hindi, Bangla, Marathi, Tamil and Telugu. First, the input audio clip is pre-processed to remove its silent frames. This increases the probability of correct recognition because redundant and non-informative content is eliminated. The silence removal parameters are optimized to achieve the best performance. Second, Mel Frequency Cepstral Coefficient (MFCC) features are extracted from the speech signals. Using these features, a Support Vector Machine (SVM) classifier is trained; the SVM has been established as a superior classifier owing to its large-margin classification property and its ability to classify complex, non-linear data. The proposed framework is tested on the standard IndicTTS speech database. The SVM is trained on 13 (static), 26 (static + delta) and 39 (static + delta + delta-delta) MFCC features. The best results are achieved using only the 13 static features; adding the delta and delta-delta features decreases performance. The maximum accuracy obtained is 60.33% for the six languages listed above, and 89.33% when only Bangla, English and Hindi are considered.
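The two pre-classification steps named in the abstract, silent-frame removal and the 13/26/39-dimensional feature stacking, can be illustrated with a minimal sketch. It is not the chapter's implementation: the energy-based silence rule, the `rel_threshold` parameter, the central-difference delta formula and all function names are assumptions chosen for illustration. MFCC extraction and SVM training are omitted, and only the Python standard library is used.

```python
def frame_signal(samples, frame_len, hop):
    """Split a list of samples into frames of frame_len, stepping by hop.

    A trailing partial frame is dropped.
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def remove_silent_frames(frames, rel_threshold=0.05):
    """Keep frames whose mean energy is at least rel_threshold * peak energy.

    rel_threshold is a hypothetical tunable parameter (an assumption),
    not the optimized value used in the chapter.
    """
    energies = [sum(x * x for x in f) / len(f) for f in frames]
    peak = max(energies, default=0.0)
    if peak == 0.0:
        return []
    return [f for f, e in zip(frames, energies) if e >= rel_threshold * peak]


def deltas(features):
    """Central difference over time, d[t] = (c[t+1] - c[t-1]) / 2.

    Indices are clamped at the first and last frame. This is one simple
    delta definition (an assumption); toolkits often use a regression window.
    """
    n = len(features)
    out = []
    for t in range(n):
        prev = features[max(t - 1, 0)]
        nxt = features[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def stack_features(static, order):
    """order=0: static only; 1: static + delta; 2: static + delta + delta-delta."""
    blocks = [static]
    for _ in range(order):
        blocks.append(deltas(blocks[-1]))
    # Concatenate the per-frame vectors of every block.
    return [sum((b[t] for b in blocks), []) for t in range(len(static))]


if __name__ == "__main__":
    import math

    # Synthetic clip: silence, a tone, then silence again.
    tone = [math.sin(2 * math.pi * i / 20) for i in range(400)]
    clip = [0.0] * 400 + tone + [0.0] * 400
    voiced = remove_silent_frames(frame_signal(clip, 100, 100))
    print(len(voiced), "voiced frames kept")

    # Placeholder 13-dimensional "static" vectors standing in for real MFCCs.
    static = [[float(t)] * 13 for t in range(5)]
    for order in (0, 1, 2):
        print(order, len(stack_features(static, order)[0]))
```

With 13 static coefficients per frame, `stack_features` yields 13, 26 or 39 values per frame for orders 0, 1 and 2, which matches the three feature configurations compared in the abstract.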
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Biswas, M., Rahaman, S., Kundu, S., Singh, P.K., Sarkar, R. (2021). Spoken Language Identification of Indian Languages Using MFCC Features. In: Kumar, P., Singh, A.K. (eds) Machine Learning for Intelligent Multimedia Analytics. Studies in Big Data, vol 82. Springer, Singapore. https://doi.org/10.1007/978-981-15-9492-2_12
DOI: https://doi.org/10.1007/978-981-15-9492-2_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9491-5
Online ISBN: 978-981-15-9492-2
eBook Packages: Intelligent Technologies and Robotics (R0)