Modern Conversational Agents

Suendermann-Oeft, David

doi:10.1007/978-3-658-04745-0_4

Abstract

Conversational agents are computer programs that engage human users in conversation to assist, educate, or entertain. They have been the subject of substantial research interest ever since the advent of artificial intelligence in the 1950s and 60s, and recent advances in cloud computing and the availability of smart devices with wireless high-speed Internet connections have led to rapid progress in the engineering of conversational technology. “Modern” conversational agents understand spoken language, can answer complicated questions, or interact with humans in dialogs of hundreds of user turns. The reader of this article will learn about the strengths of modern conversational agents, which are driven by synergy among high-performing speech recognition, smart devices, high-speed Internet, cloud computing, standardization, and crowdsourcing. We will also see how the field is primarily driven by commercial stakeholders and how open-source alternatives are expected to play a major role in the future of modern conversational agents.

Author information

Authors and Affiliations

Head of the Computer Science Degree Program, Duale Hochschule Baden-Württemberg Stuttgart, Stuttgart, Germany
Prof. Dr. David Suendermann-Oeft

Corresponding author

Correspondence to David Suendermann-Oeft.

Editor information

Editors and Affiliations

MFG Baden-Württemberg, Stuttgart, Germany
Jürgen Jähnert
MFG Baden-Württemberg, Stuttgart, Germany
Christian Förster

About this chapter

Cite this chapter

Suendermann-Oeft, D. (2014). Modern Conversational Agents. In: Jähnert, J., Förster, C. (eds) Technologien für digitale Innovationen. Springer VS, Wiesbaden. https://doi.org/10.1007/978-3-658-04745-0_4

DOI: https://doi.org/10.1007/978-3-658-04745-0_4
Published: 21 December 2013
Publisher Name: Springer VS, Wiesbaden
Print ISBN: 978-3-658-04744-3
Online ISBN: 978-3-658-04745-0
eBook Packages: Humanities, Social Science (German Language)
