Comparative Analysis of Spectral and Cepstral Feature Extraction Techniques for Phoneme Modelling

Korvel, Grazina; Kurasova, Olga; Kostek, Bozena

doi:10.1007/978-3-319-98678-4_48

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 833)

Included in the following conference series:

International Conference on Multimedia and Network Information System

Abstract

A phoneme parameter extraction framework based on spectral and cepstral parameters is proposed. Within this framework, the phoneme signal is divided into frames and a Hamming window is applied to each frame. Performance is evaluated on the recognition of Lithuanian vowel and semivowel phonemes. Feature sets are considered both without noise and at different noise levels. Two classical machine learning methods (Naive Bayes and Support Vector Machine) are used to classify each problem separately. The experimental results show that cepstral parameters give higher accuracy than spectral parameters, and that they also perform better than spectral parameters in noisy conditions.
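
The framing and windowing step, and the difference between a spectral and a cepstral description of a frame, can be illustrated with a minimal sketch. This is not the authors' implementation: the frame length, hop size, test tone, sampling rate and all function names are assumptions made for illustration, and a naive DFT from the standard library stands in for the FFT routine a real system would use.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def split_into_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames and window each with Hamming."""
    window = hamming(frame_len)
    return [
        [signal[start + i] * window[i] for i in range(frame_len)]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def log_magnitude_spectrum(frame, eps=1e-12):
    """Spectral description: log of the DFT magnitude (eps avoids log(0))."""
    return [math.log(abs(c) + eps) for c in dft(frame)]


def real_cepstrum(frame):
    """Cepstral description: inverse DFT of the log magnitude spectrum."""
    return [c.real for c in dft(log_magnitude_spectrum(frame), inverse=True)]


# Hypothetical input: 200 samples of a 440 Hz tone at an assumed 8 kHz rate.
signal = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(200)]
frames = split_into_frames(signal, frame_len=64, hop=32)
spectral = [log_magnitude_spectrum(f) for f in frames]
cepstral = [real_cepstrum(f) for f in frames]
```

Each frame yields one spectral vector and one cepstral vector of the same length. In a setup like the one the abstract describes, a subset of such coefficients per frame would be passed to a classifier such as Naive Bayes or a Support Vector Machine; which coefficients the authors actually use is not stated on this page.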

Acknowledgment

This research is funded by the European Social Fund under measure No. 09.3.3-LMT-K-712, “Development of Competences of Scientists, other Researchers and Students through Practical Research Activities”.

Author information

Authors and Affiliations

Institute of Data Science and Digital Technologies, Vilnius University, Akademijos str. 4, 04812, Vilnius, Lithuania
Grazina Korvel & Olga Kurasova
Audio Acoustics Laboratory, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, G. Narutowicza 11/12, 80-233, Gdansk, Poland
Bozena Kostek

Corresponding author

Correspondence to Grazina Korvel.

Editor information

Editors and Affiliations

Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wrocław, Poland
Kazimierz Choroś
Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wrocław, Poland
Marek Kopel
Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wrocław, Poland
Elżbieta Kukla
Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wrocław, Poland
Andrzej Siemiński

About this paper

Cite this paper

Korvel, G., Kurasova, O., Kostek, B. (2019). Comparative Analysis of Spectral and Cepstral Feature Extraction Techniques for Phoneme Modelling. In: Choroś, K., Kopel, M., Kukla, E., Siemiński, A. (eds) Multimedia and Network Information Systems. MISSI 2018. Advances in Intelligent Systems and Computing, vol 833. Springer, Cham. https://doi.org/10.1007/978-3-319-98678-4_48

DOI: https://doi.org/10.1007/978-3-319-98678-4_48
Published: 15 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98677-7
Online ISBN: 978-3-319-98678-4
eBook Packages: Intelligent Technologies and Robotics (R0)
