
1 The Rise of the Conversational Interface

The year 2016 marked a tipping point for chatbots and conversational interfaces. Major tech companies started to invest heavily in the technologies required to develop sophisticated systems capable of interacting with users in a natural, conversational style, in particular artificial intelligence (AI) technologies such as deep learning and natural language processing (NLP). They also hired many of the best researchers from university research labs and bought up smaller companies that had specialized in these technologies. Microsoft’s CEO Satya Nadella declared that “conversational interfaces will be born on the devices you use today” and that “chatbots will fundamentally revolutionize how computing is experienced by everybody”,Footnote 1 while David Marcus, VP of Facebook Messaging, stated that “threads (of conversation) are the new app”,Footnote 2 and Chris Messina of Uber announced that “2016 will be the year of conversational commerce”.Footnote 3

There are many reasons why chatbots, also known as messaging apps, have suddenly come to the fore. One reason concerns usage. BI Intelligence reported that the usage of messaging apps on smartphones has surpassed the usage of social networks.Footnote 4 Another factor is that interaction with a messaging app is easier than with a traditional smartphone app that has to be downloaded and installed. In contrast, messaging apps run on a standard platform such as Facebook Messenger or Skype that users are already familiar with. Although users often have a large number of apps on their device, it has been shown that many of these are used only once or twice and then abandoned. In fact, only a small number of all the apps on a user’s device are used regularly. A further advantage is that chatbots do not have to be built separately for different operating systems, such as Android, iOS, or Windows – they can work across apps such as Skype, Facebook Messenger, Line, or any other messaging service. This makes life easier for users as well as developers.

There are also various technological drivers that have facilitated the development and deployment of chatbots and conversational interfaces:

  • Advances in AI, particularly in deep learning (deep neural networks).

  • Greater processing power to support the massively parallel computations required to run deep neural networks.

  • The availability of vast amounts of data (known as big data) that enable AI systems to learn and become increasingly intelligent.

  • Increased connectivity, allowing users to connect their smart devices to vast cloud-based resources.

  • Advances in speech recognition and natural language processing technologies, mainly as a result of the application of deep neural networks.

  • The interest of the major technology companies in chatbots and conversational interfaces, enabling them to more accurately profile their users and thus gain a competitive advantage in the promotion of their e-commerce services.

2 Defining Chatbots and Conversational Interfaces

Various terms are used to refer to systems that provide conversational interaction for users, including bot, chatbot, virtual personal assistant, digital assistant, conversational agent, conversational bot, and messaging app.

2.1 Bots and Chatbots

It is useful to distinguish between bots and chatbots. Bots can be defined as software applications that perform automated tasks. Bots are used to automatically post messages to social media, to crawl search results, and for various other routine and mundane tasks that would otherwise be costly and time-consuming for humans to do. For example, bots are used by Wikipedia to scan its millions of articles to fix errors, add links to other pages, and perform various other housekeeping tasks. In some cases bots are used for less desirable purposes, such as spreading spam emails, and there is a growing use of political bots that can send large numbers of automated messages on social media to sway opinions at elections and to spread propaganda [25].

A chatbot is also a software application that performs automated tasks, but it differs from a bot in that it also engages in a conversation (or chat) with the user. We can distinguish between task-oriented chatbots that use conversation to automate a task, such as scheduling a meetingFootnote 5 or ordering a pizza,Footnote 6 and those chatbots that engage in conversation mainly for entertainment or to take part in competitions to find the most humanlike chatbot.Footnote 7

2.2 Conversational Interface

A conversational interface, also known as conversational user interface (CUI), provides the front-end to a chatbot or virtual personal assistant, allowing the user to interact with the app using speech, text, touch, and various other input and output modes.

What does it mean to be conversational? The term conversational covers two different dimensions of a conversational interface.

Firstly, in terms of language, a conversational interface can mean a system in which the language being used is natural, as in naturally occurring conversation, as opposed to language restricted to a fixed set of commands and phrases. Conversational language also implies flexibility, so that messages can be expressed in a variety of different ways as opposed to a single fixed expression. For chatbots the language may also take the form of textese or chatspeak, terms that refer to the special forms of language, including abbreviations and syntactic variations, that are used commonly in messaging apps.

A second meaning of conversational can refer to the interactional style supported in the interface. At a basic level this can refer simply to the interaction style used in messaging apps where the user and system interact on a turn-by-turn basis, as opposed to clicking and selecting from drop-down menus on a graphical user interface. A more advanced usage refers to a style of interaction that is more flexible, as in human-human conversation, where both system and user can make contributions to the conversation (known as mixed-initiative interaction). A more advanced conversational system should also keep track of the context of the conversation in order to allow follow-up questions and topic tracking. Most current chatbots support one-shot queries that are stateless, i.e. the user asks a question and the system answers, but any subsequent question is treated as completely independent of previously asked questions.
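The difference between a stateless one-shot query and an interaction that tracks context can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the toy `FORECASTS` table, the keyword-based `find_city` helper, and the `ContextualBot` class are assumptions, not part of any system discussed in this chapter.

```python
from dataclasses import dataclass
from typing import Optional

# Toy knowledge base; the cities and forecasts are made up for illustration.
FORECASTS = {"belfast": "rain", "madrid": "sunshine"}


def find_city(utterance: str) -> Optional[str]:
    """Return the first known city mentioned in the utterance, if any."""
    for word in utterance.lower().replace("?", " ").split():
        if word in FORECASTS:
            return word
    return None


def stateless_reply(utterance: str) -> str:
    """One-shot query: each utterance is interpreted with no memory of earlier turns."""
    city = find_city(utterance)
    if city is None:
        return "Which city do you mean?"
    return f"The forecast for {city.title()} is {FORECASTS[city]}."


@dataclass
class ContextualBot:
    """Keeps a minimal dialogue context: the last city that was mentioned."""
    last_city: Optional[str] = None

    def reply(self, utterance: str) -> str:
        # Fall back on the remembered city when the current turn names none.
        city = find_city(utterance) or self.last_city
        if city is None:
            return "Which city do you mean?"
        self.last_city = city
        return f"The forecast for {city.title()} is {FORECASTS[city]}."


bot = ContextualBot()
print(stateless_reply("Will I need an umbrella there?"))  # no context available
print(bot.reply("What is the weather in Belfast?"))
print(bot.reply("Will I need an umbrella there?"))        # resolved from context
```

The stateless function has to ask for the city again on the follow-up question, whereas the contextual version resolves "there" from the previous turn. A realistic system would need far richer context tracking than a single remembered value.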

In sum, the interaction afforded by a conversational app can range in complexity from a tightly constrained form in which the user is restricted to simple inputs or selecting from a small set of options (also known as quick replies), to a form that is similar to a conversation between humans.

3 Origins of the Conversational Interface

Although conversational interfaces and chatbots became hot topics in 2016, the idea of creating a conversational computer has been around for a long time, going back in fact to the 1960s. It is worth considering the main achievements and findings of this work, both in order to inform current efforts and also to avoid the problem of unnecessarily re-inventing the wheel.

There are four different communities that have worked on conversational interfaces, largely independently of one another (for more detail, see [14]):

  • Spoken dialogue systems (SDSs)

  • Voice user interfaces (VUIs)

  • Embodied conversational agents (ECAs)

  • Chatbots

3.1 Spoken Dialogue Systems

Spoken dialogue systems allow humans to interact with a computer on a turn-by-turn basis using spoken natural language for input and output. The earliest dialogue systems, developed in the 1960s and 1970s, were text-based and mainly motivated by efforts to apply techniques from linguistics to dialogue. In the 1980s researchers focused more on the nature of conversational competence, looking at issues such as how to recover from conversational breakdowns arising from the user’s misconceptions and false assumptions. This work also built on areas of artificial intelligence such as user modeling and planning. For a detailed account of early text-based dialogue systems, see [13].

Around the late 1980s and early 1990s, with the emergence of more powerful and more accurate speech recognition engines, spoken dialogue systems began to appear. Early examples were ATIS (Air Travel Information System) in the USA [6], while in Europe SUNDIAL was a major project funded by the European Community [12]. Later systems include MIT’s Mercury [19], the DARPA Communicator systems [24], Ravenclaw,Footnote 8 and TRIPS [2].

Some of the achievements from research and development in spoken dialogue systems are still relevant today, including:

  • The application of techniques from logic-based AI, such as plan-based dialogue [1], Information State Update Theory [22], and dialogue as rational interaction [9].

  • The application of techniques from data-driven and statistical AI, such as reinforcement learning [18], corpus-based dialogue systems [8], and deep neural networks for sequence to sequence learning [21].

  • The production of toolkits to support developers of spoken dialogue systems, including the CSLU Toolkit,Footnote 9 TRINDIKIT,Footnote 10 and OpenDialFootnote 11.

3.2 Voice User Interfaces

While spoken dialogue systems were being developed in academic and industrial research laboratories, similar systems were being developed by various companies and enterprises for commercial deployment. These were called Voice User Interfaces. AT&T’s How May I Help You? (HMIHY) system is an early example [7]. HMIHY supported call routing and by the end of 2001 was handling more than 2 million calls per month and showing significant improvements in customer satisfaction over alternative solutions. Many similar systems have been developed subsequently to support automated customer self-service tasks such as directory assistance, information enquiries, and simple transactions.

Some of the achievements of the voice user interface community are:

  • The production of design and evaluation guidelines [4, 11, 16].

  • The development of standards, such as VoiceXML, a W3C standard for scripting spoken dialogues. For a recent book on W3C standards for voice user interfaces, see [5].

  • Toolkits, especially for developing VoiceXML-based applications, for example, Voxeo Evolution.Footnote 12

  • Speech Analytics – the process of mining recorded conversations between a company’s service agents and customers to obtain information about the quality of the interaction, agent performance, customer engagement, and other factors that determine customer satisfaction and loyalty.Footnote 13

  • Usability testing – the application of effective metrics and methods for the evaluation of the usability of voice user interfaces.Footnote 14

3.3 Embodied Conversational Agents

Embodied conversational agents (ECAs) are computer-generated animated characters that combine facial expression, body stance, hand gestures, and speech to provide a more human-like and more engaging interaction. Examples are:

  • Smartakus, an animated character used in the SmartKom project to present information [23].

  • REA, a real-time, multimodal, life-sized ECA that plays the role of a real estate agent [3].

  • GRETA, a real-time three-dimensional ECA that talks and displays facial expressions, gestures, gaze, and head movements.Footnote 15

  • The Aldebaran robots: Pepper, NAO, and Romeo.Footnote 16

  • Jibo, a social robot with a single eye and a moving head and body that are used to give it a personality and promote social engagement.Footnote 17

  • Furhat, a robotic head based on a projection system that renders facial expressions, with motors to move the neck and head.Footnote 18

  • Hello Barbie, a commercially available conversational toy that responds to children’s inputs and retrieves answers from data sources on the Web.Footnote 19

The achievements of the ECA community include the following:

  • Advances in technology, such as how to handle multimodal input and output, the development of avatars and talking heads, and the production and interpretation of gestures and emotions.

  • The development of standards, such as SAIBA (Situation, Agent, Intention, Behaviour, Animation), BML (Behavior Markup Language), FML (Functional Markup Language), MURML (Multimodal Utterance Representation Markup Language), and EML (Emotion Markup Language). See [5] for descriptions of many of these standards.

  • Toolkits, for example, the Virtual Human ToolkitFootnote 20 and ACE (Articulated Communicator Engine).Footnote 21

For more detailed descriptions of ECAs, see [14], especially Chaps. 13–16.

3.4 Chatbots

Chatbots, also known as chatterbots, produce simulated conversations in which the human user inputs some text and the chatbot makes a response. One of the motivations for developers of chatbots is to try to fool the user into thinking that they are conversing with another human. Competitions such as the Loebner Prize, launched in 1991 by the late Dr Hugh Loebner, have the aim of finding the most human-like chatbot.

To date most conversations with chatbots have been text-based, although some more recent chatbots make use of spoken input and output and in some cases also include avatars or talking heads to endow the chatbot with a more human-like personality. Generally, the interaction with chatbots takes the form of small talk as opposed to the task-oriented dialogues of SDSs and VUIs.

The achievements of the chatbot community include the following:

  • The development of scripting languages such as AIML (Artificial Intelligence Markup Language)Footnote 22 and ChatScript.Footnote 23

  • Toolkits, for example, PandorabotsFootnote 24 and PullString.Footnote 25

  • Advances in technology, such as the use of knowledge repositories to provide some degree of world knowledge and discourse mechanisms to provide limited support for anaphora resolution and topic tracking.

  • The incorporation of mobile functions to enable the deployment of chatbots on smartphones and other smart devices.

  • Machine learning of conversational patterns from a corpus of conversational data [20].
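The basic mechanism behind scripted chatbots, in which an input is matched against a pattern and a response is generated from an associated template, can be sketched in a few lines of Python. This is only loosely inspired by pattern-and-template scripting; it is not AIML or ChatScript syntax, and the rules, the `*` wildcard convention, and the function names are assumptions made for illustration.

```python
import re

# Hypothetical rules: (pattern, response template). "*" matches one or more
# words, and "{star}" in a template is replaced by the matched words.
RULES = [
    ("HELLO", "Hi there! What would you like to talk about?"),
    ("MY NAME IS *", "Nice to meet you, {star}."),
    ("I LIKE *", "What do you like most about {star}?"),
    ("*", "Tell me more."),  # catch-all rule, tried last
]


def normalise(text: str) -> str:
    """Drop punctuation and collapse whitespace, keeping the words themselves."""
    return " ".join(re.findall(r"[A-Za-z0-9']+", text))


def compile_rule(pattern: str) -> "re.Pattern[str]":
    """Turn a wildcard pattern into a case-insensitive regular expression."""
    parts = [r"(.+)" if token == "*" else re.escape(token) for token in pattern.split()]
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


COMPILED_RULES = [(compile_rule(pattern), template) for pattern, template in RULES]


def respond(text: str) -> str:
    """Return the response of the first rule whose pattern matches the whole input."""
    cleaned = normalise(text)
    for regex, template in COMPILED_RULES:
        match = regex.fullmatch(cleaned)
        if match:
            star = match.group(1) if match.groups() else ""
            return template.format(star=star)
    return "Tell me more."  # reached only when the input contains no words


print(respond("My name is Ada."))
print(respond("I like jazz"))
```

Rules are tried in order, so more specific patterns are listed before the catch-all. The sketch also shows why such systems are limited to small talk: nothing is remembered between turns and there is no model of the task.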

4 What Is Different Now?

Given the extensive work on conversational interfaces described in the previous section, we might ask what is different now with present-day chatbots and conversational interfaces. While there is much to be learned from the achievements and also some of the failures of the past, many of the systems described above suffered from one or more of the following limitations:

  • Some early systems were extremely brittle and would fall over or crash if there was the slightest deviation from the expected input.

  • The systems worked well for the purposes for which they were designed but did not scale up or transfer easily to other domains.

  • They were often developed using proprietary toolkits and languages.

  • They were deployed on specialized platforms and could not be easily ported and deployed on other platforms.

In contrast, present-day chatbots and conversational interfaces benefit from the following advantages:

  • As mentioned earlier, they can be developed and deployed on messaging apps that users are already familiar with and thus they work seamlessly across multiple devices and platforms. Moreover, the user does not need to download and install separate apps for each application.

  • Chatbots have access to contextual information about users, such as their location, health, and other data that may have been acquired through sensors. This allows them to provide a more personalised experience for each user.

  • Chatbots can learn from experience in contrast with earlier systems that were static and did not alter or improve their behaviour over time.

  • A number of toolkits have become available that incorporate the latest developments in AI, machine learning, and NLP, and provide an intuitive and easy-to-learn facility for developers. Examples include:

5 Some Issues for Future Work

Chatbots and conversational interfaces provide a new and easy interface for users as well as an opportunity for businesses to promote goods and services more effectively. However, as mentioned earlier, there is a danger with all the hype that lessons learned in the past are being ignored. Two areas are of particular interest here:

  • How to design and evaluate a conversational interface.

  • How to provide a satisfying conversational experience.

5.1 Guidelines for Design and Evaluation

Many articles are being published on an almost daily basis setting out how to design chatbots, providing tips and highlighting common design mistakes, with titles such as “Eleven rules to follow when designing a chatbot”Footnote 35 and “Top 6 conversational skills to teach your chatbots”.Footnote 36 While articles such as these often provide useful hints for novice developers, in some cases the advice given is very high-level and not easy to actually operationalise, for example:

  • Make life easier.

  • Make interactions simpler.

  • Don’t be a broken record.

  • Don’t sound like a robot.

Instead of a proliferation of hints and suggestions, what is needed is a coherent set of design and evaluation guidelines. As one commentator has written:

With the hype around bots, everyone is running around like headless chickens to build bots that will solve every existing problem in the world.Footnote 37

Chatbot developers are in danger of re-inventing the wheel when there has already been considerable work done in areas such as voice user interfaces, backed by years of experience in commercial deployment. See, for example, [5, 11, 16].

Little has been written on how a chatbot should be evaluated, although companies are beginning to emerge that specialise in the analysis of conversational data with bots.Footnote 38

A common distinction has been made in the evaluation of spoken dialogue systems and voice user interfaces between objective and subjective measures:

  • Objective metrics are computed from logs of the interactions of users with the system, such as the duration of the dialogue, the number of system corrections, word error rate, and the containment rate, i.e. the proportion of interactions handled entirely by the automated system.

  • Subjective metrics elicit the opinions of users about some aspect of quality, such as the intelligibility of the synthesized speech, the overall user experience, and the expected future usage of the system.
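As an illustration of how objective metrics might be computed from interaction logs, the following Python sketch calculates word error rate, containment rate, and mean dialogue duration. The log format (dictionaries with `duration_s` and `transferred_to_agent` fields) and the sample data are hypothetical, and containment rate is taken here to be the proportion of interactions that were not handed over to a human agent.

```python
from typing import Dict, List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def containment_rate(logs: List[Dict]) -> float:
    """Proportion of interactions handled without transfer to a human agent."""
    contained = sum(1 for log in logs if not log["transferred_to_agent"])
    return contained / len(logs)


def mean_duration(logs: List[Dict]) -> float:
    """Average dialogue duration in seconds."""
    return sum(log["duration_s"] for log in logs) / len(logs)


# Made-up sample logs, for illustration only.
SAMPLE_LOGS = [
    {"duration_s": 40.0, "transferred_to_agent": False},
    {"duration_s": 95.0, "transferred_to_agent": True},
    {"duration_s": 30.0, "transferred_to_agent": False},
    {"duration_s": 35.0, "transferred_to_agent": False},
]

print(word_error_rate("book a flight to boston", "book flight to austin"))
print(containment_rate(SAMPLE_LOGS))
print(mean_duration(SAMPLE_LOGS))
```

Word error rate applies to spoken input only. For a text-based chatbot the log-based metrics would have to be adapted, for example by counting user rephrasings or fallback responses instead of recognition errors.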

An initial step in developing evaluation guidelines for chatbots and conversational interfaces should be to consider and possibly further develop and amend metrics that have been used successfully over several decades for spoken dialogue systems and voice user interfaces. For more detail on the evaluation of conversational interfaces, see [14], Chap. 17.

There are two interesting new initiatives within the speech and W3C communities that are currently addressing issues of design guidelines and standards:

  • Guidelines for conversational interfaces – this initiative has been launched by the Association for Voice Interaction Design (AVIxD),Footnote 39 aiming to investigate issues such as:

    • How to evaluate conversational interfaces.

    • Contexts of use.

    • Interactions between different modalities (visual, spoken, touch).

    • Whether conversations should always be user-initiated or sometimes system-initiated, and under what circumstances.

  • Standards for virtual assistants – this is a W3C community group, led by Deborah Dahl,Footnote 40 investigating issues such as:

    • Collection of new use cases for voice interaction.

    • Languages for defining intelligent, conversational dialogues.

    • Standard semantic representations for common concepts, e.g. time, location, etc.

    • Communication standards between different virtual assistants.

5.2 Providing a Satisfying Conversational Experience

As yet there has been little written on how to provide a satisfying conversational experience for users. One aspect concerns the management of the conversation flow in a chatbot application. Some toolkits allow the designer to design the conversation flow using a graphical tool.Footnote 41 However, while this approach is useful for simple conversations with few choices, it has been shown in spoken dialogue research that when a dialogue increases in complexity with several branches at each dialogue state, trying to represent the complete dialogue flow as a graph becomes difficult, if not impossible [15, 17]. Other methods for dealing with conversation flow need to be considered (see, for example, discussion in [14], Chap. 10).

There are many other issues that have been addressed in research on dialogue management for spoken dialogue systems and voice user interfaces that are also potentially relevant for chatbots, including:

  • How to manage multi-turn conversations as opposed to one-shot queries, where the conversation either involves slot-filling or simply an extended interaction. See [14], Chap. 10 for discussion of dialogue management issues.

  • Within a multi-turn conversation how to handle requests for clarification and follow-up questions.

  • How to manage context, including the conversational context, the user’s physical context, and other relevant contextual factors such as user preferences and attributes.

  • How to handle more advanced pragmatic features of conversation, such as the interpretation of indirect speech acts, conversational implicature, and presupposition (see [10] for a comprehensive treatment of these issues).

  • How to deal with elements of conversational behaviour, such as social engagement, personality, and emotion (see [14], Chap. 14 for a detailed discussion of these issues).
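A multi-turn conversation based on slot-filling, the first issue listed above, can be sketched as a simple frame that is filled over several turns. The pizza-ordering frame, its slot names, prompts, and permitted values are hypothetical, and the keyword spotting used here stands in for a proper natural language understanding component.

```python
from typing import Dict, Optional

# Hypothetical frame for a pizza-ordering task: slot name -> system prompt.
PROMPTS = {
    "size": "What size would you like?",
    "topping": "Which topping would you like?",
}

# Values the toy keyword spotter recognises for each slot.
VALUES = {
    "size": {"small", "medium", "large"},
    "topping": {"margherita", "pepperoni", "mushroom"},
}


class SlotFillingDialogue:
    """Frame-based dialogue manager: prompt for each slot that is still empty."""

    def __init__(self) -> None:
        self.slots: Dict[str, Optional[str]] = {name: None for name in PROMPTS}

    def update(self, utterance: str) -> None:
        """Fill any slots whose values are mentioned in the utterance."""
        words = utterance.lower().replace(",", " ").replace(".", " ").split()
        for name, allowed in VALUES.items():
            for word in words:
                if word in allowed:
                    self.slots[name] = word

    def next_prompt(self) -> Optional[str]:
        """Return the prompt for the first unfilled slot, or None if the frame is complete."""
        for name, prompt in PROMPTS.items():
            if self.slots[name] is None:
                return prompt
        return None

    def turn(self, utterance: str) -> str:
        """Process one user turn and return the system's response."""
        self.update(utterance)
        prompt = self.next_prompt()
        if prompt is not None:
            return prompt
        return f"Ordering a {self.slots['size']} {self.slots['topping']} pizza."


dialogue = SlotFillingDialogue()
print(dialogue.turn("I'd like a large pizza"))
print(dialogue.turn("pepperoni please"))
```

Because the frame is updated from whatever the user says, a user who supplies several values in one turn is not asked for them again, which is one reason frame-based control scales better than enumerating every path of the dialogue as a graph. Clarification requests, corrections, and confirmation strategies are not handled in this sketch.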

6 Concluding Remarks

Chatbots with conversational interfaces provide a new and exciting medium for users interacting with smart devices. There are great opportunities but also many challenges ahead. It is important that developers should not ignore the rich body of scientific work on conversational interfaces that has produced guidelines and standards as well as highlighting some of the pitfalls to be avoided.