Abstract
We investigate and compare several techniques for automatic recognition of unconstrained context-independent phoneme strings from the TIMIT and NTIMIT databases. Among the compared techniques, the one based on TempoRAl Patterns (TRAP) achieves the best results on clean speech, with about a 10% relative improvement over the baseline system. Its advantage is also observed in the presence of a mismatch between training and testing conditions. Issues such as the optimal length of temporal patterns in the TRAP technique, the effectiveness of mean and variance normalization of the patterns, and the multi-band input to the TRAP estimations are also explored.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Matějka, P., Schwarz, P., Hermansky, H., Černocký, J. (2003). Phoneme Recognition Using Temporal Patterns. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_28
DOI: https://doi.org/10.1007/978-3-540-39398-6_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20024-6
Online ISBN: 978-3-540-39398-6
eBook Packages: Springer Book Archive