Abstract
Corpus-based stochastic language models have achieved significant success in speech recognition, but constructing a corpus for a specific application is a difficult task. This paper introduces a Case-Based Reasoning system that generates natural language corpora. Compared with traditional natural language generation approaches, this system overcomes the inflexibility of template-based methods while avoiding the linguistic sophistication that rule-based packages require. The evaluation indicates that our approach is effective in generating users’ specifications or queries: 98% of the generated sentences are grammatically correct. The results also show that the language model derived from the generated corpus can significantly outperform a general language model or a dictation grammar.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fan, Y., Kendall, E. (2005). A Case-Based Reasoning Approach for Speech Corpus Generation. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_86
DOI: https://doi.org/10.1007/11562214_86
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)