Abstract
Robustness in statistical language modeling refers to the need to maintain adequate speech recognition accuracy as fewer and fewer constraints are placed on the spoken utterances, or more generally when the lexical, syntactic, or semantic characteristics of the discourse in the training and testing tasks differ. Obstacles to robustness involve the dual issues of model coverage and parameter reliability, which are intricately related to the quality and quantity of training data, as well as the estimation paradigm selected. Domain-to-domain differences impose further variations in vocabulary, context, grammar, and style. This chapter reviews a selected subset of recent approaches proposed to deal with some of these issues, and discusses possible future directions of improvement.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Adda, G., Jardino, M. and Gauvain, J. L. (1999). Language modeling for broadcast news transcription, Proceedings of the Sixth European Conference Speech Communication and Technology, Vol. 4, Budapest, Hungary, pp. 1759–1762.
Bahl, L. R., Brown, P. E, de Souza, P. V. and Mercer, R. L. (1989). A tree-based statistical language model for natural language speech recognition, IEEE Transactions on Acoustics, Speech, and Signal ProcessingASSP-37(7): 1001–1008.
Bahl, L. R., Jelinek, E. and Mercer, R. L. (1983). A maximum likelihood approach to continuous speech recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI5 (2): 179–190.
Bellegarda, J. R. (1996). Context-dependent vector clustering for speech recognition, in C.-H. Lee, E K. Soong and K. K. Paliwal (eds), Automatic Speech and Speaker Recognition: Advanced Topics, Kluwer Academic Publishers, New York, chapter 6, pp. 133–157.
Bellegarda, J. R. (1997). A latent semantic analysis framework for large-span language modeling, Proceedings of the Fifth European Conference Speech Communication and Technology, Vol. 3, Rhodes, Greece, pp. 1451–1454.
Bellegarda, J. R. (1998a). Exploiting both local and global constraints for multi-span statistical language modeling, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, Seattle, WA, pp. 677–680.
Bellegarda, J. R. (1998b). A multi-span language modeling framework for large vocabulary speech recognition, IEEE Transactions on Speech and Audio Processing 6 (5): 456–467.
Bellegarda, J. R. (1999). Speech recognition experiments using multi-span statistical language modeling, Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. II, Phoenix, AZ, pp. 717–720.
Bellegarda, J. R., Butzberger, J. W, Chow, Y.-L., Coccaro, N. B. and Naik, D. (1996). A novel word clustering algorithm based on latent semantic analysis, Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. I, Atlanta, GA, pp. 172–175.
Bellegarda, J. R. and Nahamoo, D. (1990). Tied mixture continuous parameter modeling for speech recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP38(12): 2033–2045.
Berry, M. and Sameh, A. (1989). An overview of parallel algorithms for the singular value and dense symmetric eigenvalue problems, Journal of Computational Applied Mathematics 27: 191–213.
Berry, M. W. (1992). Large-scale sparse singular value computations, International Journal for Supercomputer Applications 6 (1): 13–49.
Berry, M. W, Dumais, S. T. and O’Brien, G. W. (1995). Using linear algebra for intelligent information retrieval, SIAM Review 37 (4): 573–595.
Brousseau, J., Drouin, C., Foster, G., Isabelle, P, Kuhn, R., Normandin, Y. and Plamondon, P (1995). French speech recognition in an automatic dictation system for translators: The TransTalk project, Proceedings of the Fourth European Conference Speech Communication and Technology, Vol. 1, Madrid, pp. 193–196.
Chase, L., Rosenfeld, R. and Ward, W. (1994). Error-responsive modifications to speech recognizers: Negative n-grams, Proceedings of the 1994 International Conference Spoken Language Processing, Yokohama.
Chelba, C., Engle, D., Jelinek, F., Jimenez, V, Khudanpur, S., Mangu, L., Printz, H., Ristad, E. S., Rosenfeld, R., Stolcke, A. and Wu, D. (1997). Structure and performance of a dependency language model, Proceedings of the Fifth European Conference Speech Communication and Technology, Vol. 5, Rhodes, Greece, pp. 2775–2778.
Chelba, C. and Jelinek, E (1999). Recognition performance of a structured language model, Proceedings of the Sixth European Conference Speech Communication and Technology, Vol. 4, Budapest, pp. 1567–1570.
Chen, S. (1996). Building Probabilistic Models for Natural Language, PhD thesis, Harvard University, Cambridge, MA.
Chou, P. A. (1988). Applications of Information Theory to Pattern Recognition and the Design of Decision Trees and Trellises, PhD thesis, Stanford University, Stanford, CA.
Church, K. W. (1987). Phonological Parsing in Speech Recognition, Kluwer Academic Publishers, New York.
Clarkson, P. R. and Robinson, A. J. (1997). Language model adaptation using mixtures and an exponentially decaying cache, Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 1, Munich, pp. 799–802.
Cullum, J. K. and Willoughby, R. A. (1985). Real rectangular matrices, Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. 1 Theory, Brickhauser, Boston, chapter 5.
Darroch, J. N. and Ratcliff, D. (1972). Generalized iterative scaling for log-linear models, Annals of Mathematical Statistics 43 (5): 1470–1480.
Deerwester, S., Dumais, S. T, Fumas, G. W, Landauer, T. K. and Harshman, R. (1990). Indexing by latent semantic analysis, Journal of the American Society for Information Science 41: 391–407.
Della Pietra, S., Della Pietra, V. and Lafferty, J. (1997). Inducing features of random fields, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-19(1): 1–13.
Della Pietra, S., Della Pietra, V, Mercer, R. and Roukos, S. (1992). Adaptive language model estimation using minimum discrimination estimation, Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. I, San Francisco, CA, pp. 633–636.
Dumais, S. T. (1991). Improving the retrieval of information from external sources, Behavior Research on Methods, Instrumentation, and Computers 23 (2): 229–236.
Dumais, S. T. (1994). Latent semantic indexing (LSI) and TREC-2, in D. Harman (ed.), Second Text REtrieval Conference (TREC-2), NIST Publication 500–215, pp. 105–116.
Essen, U. and Steinbiss, V. (1992). Co-occurrence smoothing for stochastic language modeling, Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, San Francisco, CA, pp. 161–164.
Farhat, A., Isabelle, J. and O’Shaughnessy, D. (1996). Clustering words for statistical language models based on contextual word similarity, Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. I, Atlanta, GA, pp. 180–183.
Federico, M. and de Mori, R. (1998). Language modeling, in R. de Mori (ed.), Spoken Dialogues with Computers, Academic Press, London, chapter 7, pp. 199–230.
Foltz, P. W. and Dumais, S. T. (1992). Personalized information delivery: An analysis of information filtering methods, Communications of the ACM 35 (12): 51–60.
Gildea, D. and Hoffman, T. (1999). Topic-based language modeling using EM, Proceedings of the Sixth European Conference Speech Communication and Technology, VoL 5, Budapest, pp. 2167–2170.
Gotoh, Y. and Renais, S. (1997). Document space models using latent semantic analysis, Proceedings of the Fifth European Conference Speech Communication and Technology, Vol. 3, Rhodes, Greece, pp. 1443–1448.
Isotani, R. and Matsunaga, S. (1994). A stochastic language model for speech recognition integrating local and global constraints, Proceedings of the 1994 IFF.R International Conference on Acoustics, Speech, and Signal Processing, Vol. II, Adelaide, Australia, pp. 5–8.
Iyer, R. and Ostendorf, M. (1999). Modeling long distance dependencies in language: Topic mixtures versus dynamic cache models, IEEE Transactions on Speech and Audio Processing 7 (1): 30–39.
Iyer, R., Ostendorf, M. and Rohlicek, J. R. (1994). Language modeling with sentence-level mixtures, Proceedings of the ARPA Speech and Natural Language Workshop, Morgan Kaufmann Publishers, pp. 82–86.
Jardino, M. (1996). Multilingual stochastic n-gram class language models, Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. I, Atlanta, GA, pp. 161–163.
Jardino, M. and Adda, G. (1993). Automatic word classification using simulated annealing, Proceedings of the 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, Minneapolis, MN, pp. 41–44.
Jelinek, F. (1985). The development of an experimental discrete dictation recognizer, Proceedings of the IEEE 73 (11): 1616–1624.
Jelinek, E. (1990). Self-organized language modeling for speech recognition, in A. Waibel and K.-F. Lee (eds), Readings in Speech Recognition, Morgan Kaufmann Publishers, pp. 450–506.
Jelinek, F. and Chelba, C. (1999). Putting language into language modeling, Proceedings of the Sixth European Conference Speech Communication and Technology, Vol. 1, Budapest, pp. KN1KN5.
Jelinek, F. and Lafferty, J. D. (1991). Computation of the probability of initial substring generation by stochastic context-free grammars, Computational Linguistics 17: 315–323.
Jelinek, F. and Mercer, R. L. (1980). Interpolated estimation of Markov source parameters from sparse data, Pattern Recognition in Practice, Amsterdam, pp. 381–397.
Jurafsky, D., Wooters, C., Segal, J., Stolcke, A., Fosler, E., Tajchman, G. and Morgan, N. (1995). Using a stochastic context-free grammar as a language model for speech recognition, Proceedings of the 1995 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. I, Detroit, MI, pp. 189–192.
Katz, S. M. (1987). Estimation of probabilities from sparse data for the language model component of a speech recognizer, IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP35: 400–401.
Kenne, P. E., O’Kane, M. and Pearcy, H. G. (1995). Language modeling of spontaneous speech in a court context, Proceedings of the Fourth European Conference Speech Communication and Technology, Vol. 3, Madrid, pp. 1801–1804.
Kneser, R. (1996). Statistical language modeling using a variable context, Proceedings of the 1996 International Conference on Spoken Language Processing, Philadelphia, PA, pp. 494–497.
Kneser, R. and Ney, H. (1995). Improved backing-off for n-gram language modeling, Proceedings of the 1995 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. I, Detroit, MI, pp. 181–184.
Kneser, R. and Steinbiss, V. (1993). On the dynamic adaptation of stochastic language models, Proceedings of the 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. II, Minneapolis, MN, pp. 586–588.
Kubala, F, Bellegarda, J. R., Cohen, J. R., Pallett, D., Paul, D. B., Phillips, M., Rajasekaran, R., Richardson, F, Riley, M., Rosenfeld, R., Roth, R. and Weintraub, M. (1994). The hub and spoke paradigm for CSR evaluation, Proceedings of the ARPA Speech and Natural Language Workshop, Morgan Kaufmann Publishers, pp. 40–44.
Kuhn, R. and de Mori, R. (1990). A cache-based natural language method for speech recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-12-(6): 570–582.
Lafferty, J. D. and Suhm, B. (1995). Cluster expansion and iterative scaling for maximum entropy language models, in K. Hanson and R. Silver (eds), Maximum Entropy and Bayesian Methods, Kluwer Academic Publishers, Norwell, MA.
Landauer, T. K. and Dumais, S. T. (1997). Solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge, Psychological Review 104 (2): 211–240.
Landauer, T. K., Laham, D., Rehder, B. and Schreiner, M. E. (1998). How well can passage meaning be derived without using word order: A comparison of latent semantic analysis and humans, Proceedings of the Cognitive Science Society.
Lau, R., Rosenfeld, R. and Roukos, S. (1993). Trigger-based language models: A maximum entropy approach, Proceedings of the 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. II, Minneapolis, MN, pp. 45–48.
Maltese, G. and Mancini, F. (1992). An automatic technique to include grammatical and morphological information in a trigram-based statistical language model, Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, San Francisco, CA, pp. 157–160.
Martin, S. C., Liermann, J. and Ney, H. (1997). Adaptive topic-dependent language modelling using word-based varigrams, Proceedings of the Fifth European Conference Speech Communication and Technology, Vol. 3, Rhodes, Greece, pp. 1447–1450.
Mood, A., Graybill, F. and Boes, D. (1974). Introduction to the Theory of Statistics, McGraw-Hill, New York.
Ney, H., Essen, U. and Kneser, R. (1994). On structuring probabilistic dependences in stochastic language modeling, Computer, Speech, and Language 8: 1–38.
Niesler, T. and Woodland, p (1996). A variable-length category-based n-gram language model, Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. I, Atlanta, GA, pp. 164–167.
Pereira, E C., Singer, Y. and Tishby, N. (1995). Beyond word n-grams, in D. Yarowsky and K. Church (eds), Proceedings of the Third Workshop on Very Large Corpora, Massachusetts Institute of Technology, Cambridge, MA, pp. 95–106.
Rabiner, L. R., Juang, B.-H. and Lee, C.-H. (1996). An overview of automatic speech recognition, in C.-H. Lee, F. K. Soong and K. K. Paliwal (eds), Automatic Speech and Speaker Recognition: Advanced Topics, Kluwer Academic Publishers, Boston, MA, chapter 1, pp. 1–30.
Rosenfeld, R. (1994). The CMU statistical language modeling toolkit and its use in the 1994 ARPA CSR evaluation, Proceedings of the ARPA Speech and Natural Language Workshop, Morgan Kaufmann Publishers.
Rosenfeld, R. (1995). Optimizing lexical and n-gram coverage via judicious use of linguistic data, Proceedings of the Fourth European Conference on Speech Communication and Technology, Madrid, pp. 1763–1766.
Rosenfeld, R. (1996). A maximum entropy approach to adaptive statistical language modeling, Computer Speech and Language 10: 187–228.
Roukos, S. (1997). Language representation, in R. Cole (ed.), Survey of the State of the Art in Human Language Technology, Cambridge University Press, chapter 6.
Schwartz, R., Imai, T, Kubala, F., Nguyen, L. and Makhoul, J. (1997). A maximum likelihood model for topic classification of broadcast news, Proceedings of the Fifth European Conference Speech Communication and Technology, Vol. 3, Rhodes, Greece, pp. 1455–1458.
Spies, M. (1995). A language model for compound words in speech recognition, Proceedings of the Fourth European Conference on Speech Communication and Technology, Madrid, pp. 1767–1770.
Stolcke, A. and Segal, J. (1994). Precise n-gram probabilities from stochastic context-free grammars, Proceedings of the 32nd Meeting of the Association for Computational Linguistics, Las Cruces, NM, pp. 74–79.
Story, R. E. (1996). An explanation of the effectiveness of latent semantic indexing by means of a bayesian regression model, Information Processing and Management 32 (3): 329–344.
Tamoto, M. and Kawabata, T. (1995). Clustering word category based on binomial posteriori cooccurrence distribution, Proceedings of the 1995 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. I, Detroit, MI, pp. 165–168.
Witten, I. H. and Bell, T. C. (1991). The zero-frequency problem: Estimating the probability of novel events in adaptive text compression, IEEE Transactions on Information Theory 37(4): 10851094.
Woodland, P C., Odell, J. J., Valtchev, V. and Young, S. J. (1994). Large vocabulary continuous speech recognition using HTK, Proceedings of the 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, Adelaide, Australia, pp. 125–128.
Younger, D. H. (1967). Recognition and parsing of context-free languages in time N 3, Information and Control 10: 198–208.
Zhang, R., Black, E. and Finch, A. (1999). Using detailed linguistic structure in language modeling, Proceedings of the Sixth European Conference Speech Communication and Technology, Vol. 4, Budapest, pp. 1815–1818.
Zue, V, Glass, J., Goodine, D., Leung, H., Phillips, M., Polifroni, J. and Seneff, S. (1991). Integration of speech recognition and natural language processing in the MIT voyager system, Proceedings of the 1991 IEEE International Conference on Acoustics, Speech, and Signal Processing, Toronto, pp. 713–716.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2001 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Bellegarda, J.R. (2001). Robustness in Statistical Language Modeling: Review and Perspectives. In: Junqua, JC., van Noord, G. (eds) Robustness in Language and Speech Technology. Text, Speech and Language Technology, vol 17. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-9719-7_4
Download citation
DOI: https://doi.org/10.1007/978-94-015-9719-7_4
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5643-6
Online ISBN: 978-94-015-9719-7
eBook Packages: Springer Book Archive