Improving command and control speech recognition on mobile devices: using predictive user models for language modeling

Paek, Tim; Chickering, David Maxwell

doi:10.1007/s11257-006-9021-6

Original Paper
Published: 18 January 2007

Volume 17, pages 93–117 (2007)

User Modeling and User-Adapted Interaction

Abstract

Command and control (C&C) speech recognition allows users to interact with a system by speaking commands or asking questions restricted to a fixed grammar containing pre-defined phrases. Whereas C&C interaction has been commonplace in telephony and accessibility systems for many years, only recently have mobile devices had the memory and processing capacity to support client-side speech recognition. Given the personal nature of mobile devices, statistical models that can predict commands based in part on past user behavior hold promise for improving C&C recognition accuracy. For example, if a user calls a spouse at the end of every workday, the language model could be adapted to weight the spouse more than other contacts during that time. In this paper, we describe and assess statistical models learned from a large population of users for predicting the next user command of a commercial C&C application. We explain how these models were used for language modeling, and evaluate their performance in terms of task completion. The best performing model achieved a 26% relative reduction in error rate compared to the base system. Finally, we investigate the effects of personalization on performance at different learning rates via online updating of model parameters based on individual user data. Personalization significantly increased relative reduction in error rate by an additional 5%.
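
As a rough illustration of the spouse example above, the following Python sketch estimates per-contact weights conditioned on a coarse time-of-day bucket, using counts from one user's call history. It is a toy count-based model with add-alpha smoothing, not the models evaluated in the paper. The class name `ContactPredictor`, the bucket labels, and the contact list are all hypothetical.

```python
from collections import Counter, defaultdict


class ContactPredictor:
    """Toy user model (hypothetical): P(contact | time bucket) from call counts."""

    def __init__(self, contacts, alpha=1.0):
        self.contacts = list(contacts)
        # Add-alpha smoothing keeps unseen contacts at a nonzero weight.
        self.alpha = alpha
        self.counts = defaultdict(Counter)

    def observe(self, time_bucket, contact):
        """Record one past call to `contact` made during `time_bucket`."""
        if contact not in self.contacts:
            raise ValueError(f"unknown contact: {contact}")
        self.counts[time_bucket][contact] += 1

    def weights(self, time_bucket):
        """Return smoothed, normalized weights over all contacts."""
        seen = self.counts[time_bucket]
        total = sum(seen[c] for c in self.contacts) + self.alpha * len(self.contacts)
        return {c: (seen[c] + self.alpha) / total for c in self.contacts}


# Assumed history: eight evening calls to the spouse, one to the boss.
model = ContactPredictor(["spouse", "boss", "dentist"])
for _ in range(8):
    model.observe("evening", "spouse")
model.observe("evening", "boss")

evening = model.weights("evening")   # spouse gets (8 + 1) / (9 + 3) = 0.75
morning = model.weights("morning")   # no data yet, so all weights are equal
```

Weights of this kind could, in principle, be attached to the alternatives of a fixed C&C grammar so that likely commands are favored at recognition time. Smoothing matters here because a contact with zero weight could never be recognized.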

Author information

Authors and Affiliations

Microsoft Research, Redmond, WA, USA
Tim Paek & David Maxwell Chickering

Corresponding author

Correspondence to Tim Paek.

About this article

Cite this article

Paek, T., Chickering, D.M. Improving command and control speech recognition on mobile devices: using predictive user models for language modeling. User Model User-Adap Inter 17, 93–117 (2007). https://doi.org/10.1007/s11257-006-9021-6

Received: 01 November 2005
Accepted: 11 July 2006
Published: 18 January 2007
Issue Date: March 2007
DOI: https://doi.org/10.1007/s11257-006-9021-6