Robustness in Statistical Language Modeling: Review and Perspectives

Bellegarda, Jerome R.

doi:10.1007/978-94-015-9719-7_4

Part of the book series: Text, Speech and Language Technology (TLTB, volume 17)

Abstract

Robustness in statistical language modeling refers to the need to maintain adequate speech recognition accuracy as fewer and fewer constraints are placed on the spoken utterances, or more generally when the lexical, syntactic, or semantic characteristics of the discourse in the training and testing tasks differ. Obstacles to robustness involve the dual issues of model coverage and parameter reliability, which are intricately related to the quality and quantity of training data, as well as the estimation paradigm selected. Domain-to-domain differences impose further variations in vocabulary, context, grammar, and style. This chapter reviews a selected subset of recent approaches proposed to deal with some of these issues, and discusses possible future directions of improvement.

Author information

Authors and Affiliations

Two Infinite Loop, Apple Computer, Cupertino, California, 95014, USA
Jerome R. Bellegarda

Editor information

Editors and Affiliations

Panasonic Speech Technology Laboratory, Santa Barbara, California, USA
Jean-Claude Junqua
University of Groningen, The Netherlands
Gertjan van Noord

About this chapter

Cite this chapter

Bellegarda, J.R. (2001). Robustness in Statistical Language Modeling: Review and Perspectives. In: Junqua, J.-C., van Noord, G. (eds) Robustness in Language and Speech Technology. Text, Speech and Language Technology, vol 17. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-9719-7_4

DOI: https://doi.org/10.1007/978-94-015-9719-7_4
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5643-6
Online ISBN: 978-94-015-9719-7
eBook Packages: Springer Book Archive
