Comparative Analysis of Spectral and Cepstral Feature Extraction Techniques for Phoneme Modelling

  • Conference paper
Multimedia and Network Information Systems (MISSI 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 833)

Abstract

A phoneme parameter extraction framework based on spectral and cepstral parameters is proposed. Within this framework, the phoneme signal is divided into frames and a Hamming window is applied to each frame. Performance is evaluated on the recognition of Lithuanian vowel and semivowel phonemes. Different feature sets are considered, both without noise and at different noise levels. Two classical machine learning methods (Naive Bayes and Support Vector Machine) are applied separately to each classification problem. The experimental results show that cepstral parameters give higher accuracy than spectral parameters. Moreover, cepstral parameters perform better than spectral parameters in noisy conditions.
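
The framing and windowing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the frame length, hop size, synthetic test tone, and the specific choice of features (DFT magnitudes as the spectral parameters, the real cepstrum as the cepstral parameters) are assumptions made for illustration, and all function names are hypothetical. The code uses only the Python standard library, so the DFT is a naive one.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window: w[k] = 0.54 - 0.46 * cos(2*pi*k / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def split_into_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def dft(x):
    """Naive O(n^2) discrete Fourier transform, kept dependency-free."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def magnitude_spectrum(frame):
    """Spectral parameters (assumed): DFT magnitudes of the windowed frame."""
    window = hamming(len(frame))
    return [abs(c) for c in dft([s * w for s, w in zip(frame, window)])]


def real_cepstrum(frame, floor=1e-12):
    """Cepstral parameters (assumed): inverse DFT of the log magnitude spectrum."""
    # The floor avoids log(0) for bins whose magnitude is numerically zero.
    log_mag = [math.log(max(m, floor)) for m in magnitude_spectrum(frame)]
    n = len(log_mag)
    # For a real frame the log magnitude spectrum is real and even, so its
    # inverse DFT equals the real part of its forward DFT divided by n.
    return [c.real / n for c in dft(log_mag)]


# Synthetic stand-in for a phoneme segment: a pure tone with 4 cycles per 32 samples.
tone = [math.sin(2.0 * math.pi * 4 * t / 32) for t in range(64)]
frames = split_into_frames(tone, frame_len=32, hop=16)
spectral = [magnitude_spectrum(f) for f in frames]  # one spectral vector per frame
cepstral = [real_cepstrum(f) for f in frames]       # one cepstral vector per frame
```

Each frame yields one spectral and one cepstral vector. In a setup like the one the abstract describes, such per-frame vectors (or a subset of their coefficients) would be the inputs to the Naive Bayes and Support Vector Machine classifiers.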

Acknowledgment

This research is funded by the European Social Fund under measure No. 09.3.3-LMT-K-712, “Development of Competences of Scientists, other Researchers and Students through Practical Research Activities”.

Author information

Corresponding author

Correspondence to Grazina Korvel.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Korvel, G., Kurasova, O., Kostek, B. (2019). Comparative Analysis of Spectral and Cepstral Feature Extraction Techniques for Phoneme Modelling. In: Choroś, K., Kopel, M., Kukla, E., Siemiński, A. (eds) Multimedia and Network Information Systems. MISSI 2018. Advances in Intelligent Systems and Computing, vol 833. Springer, Cham. https://doi.org/10.1007/978-3-319-98678-4_48
