
An Efficient Feature Fusion Technique for Text-Independent Speaker Identification and Verification

  • Conference paper
Advances in Data Computing, Communication and Security

Abstract

Speaker identification and verification is an important research area with applications in forensic voice verification, mobile banking and security authentication for access control. Various feature extraction techniques are available in the literature. In this work, a speech feature extraction technique based on the fusion of time-domain, frequency-domain and cepstral-domain features is proposed. Supervised machine learning classification algorithms are used to classify the speaker features. The performance of the proposed technique has been evaluated on two open-source speech datasets; training time and accuracy (validation and test) are measured with the help of the confusion matrix. The results indicate that, even with smaller training datasets, the average accuracy achieved is 2.97% and 8.97% better, and the training time is 1.95 s and 2.03 s shorter, compared with MFCC and MFCC + Δ + Δ² (MFCC with delta and delta-delta coefficients), respectively.
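The fusion idea sketched in the abstract can be illustrated with a short example. The snippet below is a minimal, assumption-laden sketch rather than the authors' implementation: it assumes librosa for feature extraction and scikit-learn for the supervised classifier, extracts a few time-domain (zero-crossing rate, RMS energy), frequency-domain (spectral centroid, roll-off) and cepstral (MFCC) descriptors, concatenates their per-utterance statistics into one fused feature vector, and trains an SVM on it. The file names, the exact feature set and the classifier are placeholders; the paper's actual choices may differ.

```python
# Minimal sketch of time/frequency/cepstral feature fusion for speaker ID.
# Assumptions (not from the paper): librosa + scikit-learn, placeholder file
# names and labels, SVM as one example of a supervised classifier.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def fused_features(path, sr=16000, n_mfcc=13):
    """Return one fused feature vector (mean + std of frame-level features)."""
    y, sr = librosa.load(path, sr=sr)
    # Time-domain descriptors
    zcr = librosa.feature.zero_crossing_rate(y)
    rms = librosa.feature.rms(y=y)
    # Frequency-domain descriptors
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
    # Cepstral descriptors
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Stack frame-level features, then summarise each over time
    frames = np.concatenate([zcr, rms, centroid, rolloff, mfcc], axis=0)
    return np.concatenate([frames.mean(axis=1), frames.std(axis=1)])

# Supervised classification of the fused vectors (paths/labels are placeholders).
wav_paths = ["speaker1_utt1.wav", "speaker2_utt1.wav"]
labels = np.array(["speaker1", "speaker2"])
X = np.stack([fused_features(p) for p in wav_paths])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, labels)
print(clf.predict(X))
```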




Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Bansal, S., Bansal, R.K., Sharma, Y. (2022). An Efficient Feature Fusion Technique for Text-Independent Speaker Identification and Verification. In: Verma, P., Charan, C., Fernando, X., Ganesan, S. (eds) Advances in Data Computing, Communication and Security. Lecture Notes on Data Engineering and Communications Technologies, vol 106. Springer, Singapore. https://doi.org/10.1007/978-981-16-8403-6_56
