Robustness in Statistical Language Modeling: Review and Perspectives

  • Chapter
Robustness in Language and Speech Technology

Part of the book series: Text, Speech and Language Technology (TLTB, volume 17)

Abstract

Robustness in statistical language modeling refers to the need to maintain adequate speech recognition accuracy as fewer and fewer constraints are placed on the spoken utterances, or more generally when the lexical, syntactic, or semantic characteristics of the discourse in the training and testing tasks differ. Obstacles to robustness involve the dual issues of model coverage and parameter reliability, which are intricately related to the quality and quantity of training data, as well as the estimation paradigm selected. Domain-to-domain differences impose further variations in vocabulary, context, grammar, and style. This chapter reviews a selected subset of recent approaches proposed to deal with some of these issues, and discusses possible future directions of improvement.

Copyright information

© 2001 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Bellegarda, J.R. (2001). Robustness in Statistical Language Modeling: Review and Perspectives. In: Junqua, JC., van Noord, G. (eds) Robustness in Language and Speech Technology. Text, Speech and Language Technology, vol 17. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-9719-7_4

  • DOI: https://doi.org/10.1007/978-94-015-9719-7_4

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-5643-6

  • Online ISBN: 978-94-015-9719-7

  • eBook Packages: Springer Book Archive
