Hybrid Approach of Feature Extraction and Vector Quantization in Speech Recognition

Manchanda, Sarthak; Gupta, Divya

doi:10.1007/978-981-10-8228-3_59

Sarthak Manchanda¹⁹ &
Divya Gupta¹⁹

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 712))

958 Accesses
3 Citations

Abstract

This paper examines the speech recognition process. Speech recognition has two phases: the front end which comprises of preprocessing of the speech waveform and the back end which comprises of feature extraction and feature matching. In this review, we discuss some feature extraction techniques like MFCC, LPC, LPCC, PLP, and RASTA-PLP. As these techniques have some demerits stated in the paper, we discuss a hybrid approach of feature extraction with some combinations of the above techniques. Feature matching helps in the recognition part of speech recognition. It is done by comparing the feature vectors of the current user to the feature vectors stored in the database. It can be optimized by vector quantization (VQ) in order to speed up the recognition process.

Access provided by CONRICYT-eBooks. Download conference paper PDF

The State of the Art of Feature Extraction Techniques in Speech Recognition

Speech classification using SIFT features on spectrogram images

Article Open access 16 June 2016

A Review of Various Techniques Related to Feature Extraction and Classification for Speech Signal Analysis

Keywords

1 Introduction

Speech recognition is known as the ability of a machine to detect and identify the words or phrases uttered by the human being and to detect and convert this speech signal into machine read format. This concept of Artificial Intelligence is quite common these days and many advancements are being done toward it [1].

Speech recognition has front end and back end analysis.

Front End:

1.
Preprocessing: It is used to increase efficiency for further processes. It helps in making the system more robust. It also helps in generating parametric representation of the speech signal.
2.
Framing and Windowing: Most part of a speech signal processing depends on short time analysis done with the help of framing. The signal gets blocked into frames of samples N with a duration ranging between 10 and 30 ms.

Back End:

1.
Feature Extraction: It is useful to extract useful information and remove the unwanted and redundant information. Certain features are extracted using this process which helps in speech recognizing. The goal of this technique is to compute the sequence savings of feature vectors which represent the computing signals. Different types of feature extraction techniques are: MFCC, LPCC, LPC, PLP, and RATSA-PLP [2].
2.
Feature Matching: It is known as the process in which identification of the speaker is done by comparing the extracted features to the features which are stored in a database. It is basically done by first storing the input signal features in the database and then comparing the stored values to the input values of the unknown speaker. It gives the result as matched or not matched [2].

2 Type of Speech Uttered by Human Beings

There are many types of speech uttered by the human beings which differ in parameters [3]. These types are:

1.
Isolated Speech:

It requires single word at a time. This is one of the best speech types in which there is very less noise on both sides of the window.
2.
Connected Speech:

This type has minimum pause between the words uttered by the human.
3.
Continuous Speech:

This kind of speech has no pause between the words. It is basically what computer dictates and what human beings say the most.
4.
Spontaneous Speech:

This type of speech can be thought of as natural and without trying it before. An ASR system can handle spontaneous speech when it has every word with its meaning in the database.

3 Feature Extraction Techniques

It is the most important part in any ASR system. The meaning of extracting feature is to extract the useful information and remove unwanted information (Figs. 1 and 2).

1.
Mel Frequency Cepstral Coefficients(MFCC):

It is the most common and most popular feature extraction technique used for an ASR. Frequency bands in MFCC are placed in a logarithmical order so that it can approximate the human being system closely than any of the other system (Fig. 3).
Fig. 3
Linear prediction coding coefficient(LPCC)
Full size image
2.
Linear Prediction Coding(LPC):

It is a good signal feature extraction method for linear prediction in speech recognition processes (Fig. 4). The basic idea behind LPC is that the speech signal can be approximated as a linear combination of past speech samples stored in the database [4].
Fig. 4
Vector quantization
Full size image
3.
Linear Prediction Coding Coefficient(LPCC):

This technique works at a low bit by demonstrating the speech signal by a finite number of measure of the signals. It represents a mimic of speech after computing a smooth version of cepstral coefficient using autocorrelation method. In LPCC feature extraction, cepstrum analysis is done on the LPC analysis as seen below [4] (Fig. 3).

Other feature extraction techniques are PLP, RASTA-PLP etc.
4.
Hybrid and Robust Technique:

In order to obtain a new and more effective feature extraction technique, certain hybrid algorithms make a distinction from the previous feature techniques. The previous techniques have some drawbacks which do not make the recognition process accurate and robust. The drawbacks are given in Table 1 [4].

Table 1 Demerits of feature extraction techniques

Full size table

To overcome these drawbacks, a hybrid technique is used. Combination of previous features is taken to make the extraction more robust.

There is a generation of 13 parameter coefficients of these techniques

(MFCC, LPC, LPCC, PLP, and RASTA-PLP). In four experiments, different combinations of these techniques were used to give “39 coefficient parameters” in a single vector:

(a)
13(MFCC) + 13(LPCC) + 13(PLP)
(b)
13(MFCC) + 13(LPCC) + 13(RASTA PLP)
(c)
13(MFCC) + 13(PLP) + 13(RASTA PLP)
(d)
13(LPCC) + 13(PLP) + 13(RASTA-PLP).

4 Feature Matching

While the above feature extraction techniques were used in the front end to extract certain important characteristics, vector quantization is used in the back end to generate a correct decision while maintaining a codebook [5, 6].

It is a procedure to identify an unknown speaker by comparing the extracted features from the above approach with the data of the known speaker stored in the database [2].

Vector Quantization(VQ):

VQ is nothing more than “rounding off to the nearest integer.” It is also known as the centroid model. It was introduced in the year 1980 and it has its roots from data compression. It has many advantages:

speed up the recognition process
for lightweight practical use
ease of implementation.

Apart from these advantages, it can lead to loss of potential data.

In VQ, the signal vectors from a very large vector scale can be mapped on a finite region of space (Fig. 4). Every region of the space is known as “cluster” and is shown by a center called “codeword”. The group of these code words is known as a “codebook” [7].

In the figure are two different speakers on a two-dimensional space. Triangle refers to the vector from “speaker 1” while the circle refers to the vector from “speaker 2”. There are two phases in VQ. In the first phase which is also known as the training phase, a codebook is generated which is speaker specific for every single known speaker taking the training vectors in the database. The result of these code words is represented as the centroid of every triangle or circle as shown in the figure by Green Circles for “speaker 1” and Green Triangles for “speaker 2”.

In the second phase which is also known as the recognition phase, speech input signal is “vector quantized” using a trained codebook and the total VQ-distortion is computed. The speaker having the codebook with the least distortion is discovered.

5 Conclusion and Future Work

The main objective of this research is to analyze why the hybrid approach is better to use for feature extraction and how can vector quantization help in fast and accurate feature matching. This paper gives the detail about various feature extraction techniques and how hybrid approach is better than these techniques. It shows some of the advantages of the hybrid approach in the table.

On the basis of the review, it is analyzed that the combination of techniques can help in achieving good results in a noisy environment also as variety of filters is used in hybrid approach. This hybrid approach is used for a flexible kind of environment.

For feature matching, vector quantization is used in which optimization is done which speeds up the recognition process. The future work is to develop an ASR System with high accuracy.

References

Karpagavalli, S., and Chandra, E “A Review on Automatic Speech Recognition Architecture and Approaches” International Journal of Signal Processing, Image Processing and Pattern Recognition Vol. 9, No. 4, (2016) pp. 393–404.
Article MathSciNet Google Scholar
Geeta Nijhawan 1, Dr. M.K Soni 2 “Speaker Recognition Using MFCC and Vector Quantisation” ACEEE Int. J. on Recent Trends in Engineering and Technology, Vol. 11, No. 1, July 2014.
Google Scholar
Akansha Madan, Divya Gupta “Speech Feature Extraction and Classification: A Comparative Review” International Journal of Computer Applications (0975–8887) Volume 90 – No 9, March 2014.
Google Scholar
Veton Z. Këpuska, Hussien A. Elharati “Robust Speech Recognition System Using Conventional and Hybrid Features of MFCC, LPCC, PLP, RASTA-PLP and Hidden Markov Model Classifier in Noisy Conditions” Journal of Computer and Communications, 2015, 3, 1–9 Published Online June 2015.
Google Scholar
Hemlata Eknath Kamale, Dr.R. S. Kawitkar “Vector Quantization Approach for Speaker Recognition” International Journal of Computer Technology and Electronics Engineering (IJCTEE) Volume 3, Special Issue, March-April 2013, An ISO 9001: 2008 Certified Journal.
Google Scholar
Preeti Saini, Parneet Kaur “Automatic Speech Recognition: A Review” “FLEXIBLE FEATURE EXTRACTION AND HMM DESIGN FOR A HYBRID DISTRIBUTED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENTS” International Journal of Engineering Trends and Technology-Volume 4 Issue 2– 2013.
Google Scholar
Dipmoy Gupta, Radha Mounima C. Navya Manjunath, Manoj PB “Isolated Word Speech Recognition Using Vector Quantization (VQ)” International Journal of Advanced Research in Computer Science and Software Engineering Volume 2, Issue 5, May 2012 ISSN: 2277 128X.
Google Scholar
Veton Z. Këpuska, Hussien A Elharati “Performance Evaluation of Conventional and Hybrid Feature Extractions Using Multivariate HMM Classifier” Int. Journal of Engineering Research and Applications www.ijera.com ISSN: 2248-9622, Vol. 5, Issue 4, (Part -1) April 2015, pp. 96–101.
Pratik K. Kurzekar 1, Ratnadeep R. Deshmukh 2, Vishal B. Waghmare 2, Pukhraj P. Shrishrimal 2 “A Comparative Study of Feature Extraction Techniques for Speech Recognition System” International Journal of Innovative Research in Science, Engineering and Technology (An ISO 3297: 2007 Certified Organization) Vol. 3, Issue 12, December 2014.
Google Scholar
Pawan Kumar, Astik Biswas, A. N. Mishra and Mahesh Chandra “Spoken Language Identification Using Hybrid Feature Extraction Methods” JOURNAL OF TELECOMMUNICATIONS, VOLUME 1, ISSUE 2, MARCH 2010.
Google Scholar
H. B. Kekre, Tanuja K. Sarode “Speech Data Compression using Vector Quantization” International Journal of Computer, Electrical, Automation, Control and Information Engineering Vol: 2, No: 3, 2008.
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University, Noida, Uttar Pradesh, India
Sarthak Manchanda & Divya Gupta

Authors

Sarthak Manchanda
View author publications
You can also search for this author in PubMed Google Scholar
Divya Gupta
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Sarthak Manchanda .

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial Group of Professional Colleges, Lucknow, Uttar Pradesh, India
Vikrant Bhateja
Departamento de Engenharia Mecânica, Universidade do Porto, Porto, Portugal
João Manuel R.S. Tavares
Department of Computer Science and Engineering, JNTUH College of Engineering Hyderabad (Autonomous), Hyderabad, Telangana, India
B. Padmaja Rani
Department of Computer Science and Engineering, JNTUH College of Engineering Hyderabad (Autonomous), Hyderabad, Telangana, India
V. Kamakshi Prasad
CMR Technical Campus, Hyderabad, Telangana, India
K. Srujan Raju

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Manchanda, S., Gupta, D. (2018). Hybrid Approach of Feature Extraction and Vector Quantization in Speech Recognition. In: Bhateja, V., Tavares, J., Rani, B., Prasad, V., Raju, K. (eds) Proceedings of the Second International Conference on Computational Intelligence and Informatics . Advances in Intelligent Systems and Computing, vol 712. Springer, Singapore. https://doi.org/10.1007/978-981-10-8228-3_59

Download citation

DOI: https://doi.org/10.1007/978-981-10-8228-3_59
Published: 24 July 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8227-6
Online ISBN: 978-981-10-8228-3
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics

Hybrid Approach of Feature Extraction and Vector Quantization in Speech Recognition

Abstract

Similar content being viewed by others

The State of the Art of Feature Extraction Techniques in Speech Recognition

Speech classification using SIFT features on spectrogram images

A Review of Various Techniques Related to Feature Extraction and Classification for Speech Signal Analysis

Keywords

1 Introduction

2 Type of Speech Uttered by Human Beings

3 Feature Extraction Techniques

4 Feature Matching

5 Conclusion and Future Work

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Hybrid Approach of Feature Extraction and Vector Quantization in Speech Recognition

Abstract

Similar content being viewed by others

The State of the Art of Feature Extraction Techniques in Speech Recognition

Speech classification using SIFT features on spectrogram images

A Review of Various Techniques Related to Feature Extraction and Classification for Speech Signal Analysis

Keywords

1 Introduction

2 Type of Speech Uttered by Human Beings

3 Feature Extraction Techniques

4 Feature Matching

5 Conclusion and Future Work

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation