Phoneme Recognition Using Temporal Patterns

  • Conference paper
Text, Speech and Dialogue (TSD 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2807)

Abstract

We investigate and compare several techniques for automatic recognition of unconstrained context-independent phoneme strings from the TIMIT and NTIMIT databases. Among the compared techniques, the one based on TempoRAl Patterns (TRAP) achieves the best results on clean speech, with about 10% relative improvement over the baseline system. Its advantage is also observed in the presence of a mismatch between training and testing conditions. We also explore issues such as the optimal length of the temporal patterns in the TRAP technique, the effectiveness of mean and variance normalization of the patterns, and the use of multi-band input to the TRAP estimation.
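To make the terms in the abstract concrete, the following is a minimal sketch of how a temporal pattern might be cut out of one frequency band and then mean and variance normalized. It is an illustration only, not the authors' implementation: the abstract gives no details, so the input (a per-frame energy trajectory for a single band), the function names `temporal_pattern` and `mean_variance_normalize`, the `context` parameter that sets the pattern length, and the edge-repetition padding are all assumptions.

```python
from math import sqrt


def temporal_pattern(band_energies, center, context):
    """Return the trajectory of one band around frame `center`.

    The pattern has 2 * context + 1 frames. Frames that fall outside the
    utterance are filled by repeating the first or last frame (an assumption
    made for this sketch).
    """
    last = len(band_energies) - 1
    return [
        band_energies[min(max(t, 0), last)]
        for t in range(center - context, center + context + 1)
    ]


def mean_variance_normalize(pattern, eps=1e-8):
    """Shift a pattern to zero mean and scale it to unit variance.

    A (nearly) constant pattern is only mean-shifted, to avoid dividing
    by a vanishing standard deviation.
    """
    n = len(pattern)
    mean = sum(pattern) / n
    std = sqrt(sum((x - mean) ** 2 for x in pattern) / n)
    scale = std if std > eps else 1.0
    return [(x - mean) / scale for x in pattern]


# Toy band-energy trajectory (hypothetical data, 50 frames).
energies = [float(t % 7) for t in range(50)]

# An 11-frame pattern centered on frame 10; the length is a free parameter.
pattern = temporal_pattern(energies, center=10, context=5)
normalized = mean_variance_normalize(pattern)
```

In a multi-band variant, the same extraction would presumably be applied to several neighbouring bands and the resulting patterns passed together to the estimator; how the bands are combined is not stated in the abstract.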



Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Matějka, P., Schwarz, P., Hermansky, H., Černocký, J. (2003). Phoneme Recognition Using Temporal Patterns. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_28

  • DOI: https://doi.org/10.1007/978-3-540-39398-6_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20024-6

  • Online ISBN: 978-3-540-39398-6

  • eBook Packages: Springer Book Archive
