1 Introduction

Fifty years ago, the chatbot ELIZA was created; it is considered the first piece of conversational software. ELIZA was intended to emulate a psychotherapist. At that time, it did not pass the Turing test (Turing 1950). Today, conversational computer systems are emerging in many domains, ranging from hotline support to game environments and educational contexts. Some of them have been reported to pass the Turing test (e.g., Eugene Goostman (Eugene 2014)). Conversational computer systems are not only found in many application domains. The smartphones that almost everyone uses daily also come with an integrated natural language speech assistant (e.g., “Siri” on Apple devices, “S-Voice” on Samsung tablets/smartphones, “Google Now”), which allows the user to give commands or to ask for information. Recently, Amazon has released smart speakers with its voice assistant “Alexa”, which is available for English and German speakers. We are facing a change in human-computer interaction: the interaction between humans and computer systems is shifting towards natural language-based interfaces. This paper reviews the technologies that have been developed to build conversational systems. Concretely, we investigate the following research questions: Which technologies have been deployed for developing conversational systems? Which language tricks have been commonly exploited? What are typical evaluation methods for conversational systems?

2 Methodology

In order to answer the above questions, we searched the Internet using search engines. We collected documents that matched the keywords “chatbot”, “conversational agent”, “pedagogical agent”, or “conversational system”. The number of resulting papers was enormous. Since we intended to investigate technologies for developing conversational systems, we restricted the results using the following criteria:

  1. The conversational system was developed for scientific purposes;
  2. The conversational system had been scientifically evaluated or had participated in a competition;
  3. Information about the technologies deployed in the system was available.

In the end, we reviewed 59 conversational systems, which are summarized in Appendix “Table of reviewed conversational systems”. We categorized the collected systems into “chatbots” and “dialog systems” (Klüwer 2011; Dingli and Scerri 2013; van Woudenberg 2014). The term “chatbot” originated from the system CHATTERBOT, which was invented as a game character for the 1989 multi-user dungeon game “TinyMUD” (Mauldin 1994). From a technical point of view, Klüwer (2011) summarized the following typical processing steps of a chatbot: (1) input cleaning (removal and substitution of characters and words, such as smileys and contractions), (2) matching input templates against the cleaned input with a pattern-matching algorithm, (3) determining the response templates, and (4) generating a response. The second category of conversational systems is “dialog systems”. This term denotes a system that is able to hold a conversation with another agent or with a human. McTear notes the following differences between dialog systems and chatbots: “Dialog systems make use of more theoretically motivated techniques” and “dialog systems often are developed for a specific domain, whereas simulated conversational systems [chatbots] are aimed at open domain conversation.” (McTear 2004). A typical chatbot is built on a knowledge base comprising a fixed set of input-response templates and a pattern-matching algorithm. A dialog system, by contrast, typically requires four components: a preprocessing component, a natural language understanding component, a dialog manager, and a response generation component (Lester et al. 2004). The main architectural differences between dialog systems and chatbots are therefore the natural language understanding component and the dialog manager.
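The four chatbot processing steps summarized by Klüwer (2011) can be illustrated with a minimal Python sketch. The smiley and contraction tables, the rules, and all function names are hypothetical choices made for illustration and are not taken from any of the reviewed systems. Real chatbots use far larger rule sets.

```python
import re

# Step 1 resources (hypothetical): smileys to strip and contractions to expand.
SMILEYS = [":-)", ":)", ":("]
CONTRACTIONS = {"i'm": "i am", "don't": "do not", "can't": "cannot"}

# Steps 2-3 resources (hypothetical): input template -> response template.
# "*" in an input template matches one or more words.
RULES = [
    ("i am *", "Why are you {0}?"),
    ("do you like *", "I have no strong opinion about {0}."),
]
FALLBACK = "Tell me more."


def clean(utterance):
    """Step 1: input cleaning (lowercase, strip smileys and punctuation, expand contractions)."""
    text = utterance.lower().replace("\u2019", "'")
    for smiley in SMILEYS:
        text = text.replace(smiley, " ")
    text = re.sub(r"[^\w\s']", " ", text)  # drop punctuation except apostrophes
    return " ".join(CONTRACTIONS.get(word, word) for word in text.split())


def match_rule(cleaned):
    """Step 2: pattern matching; step 3: pick the associated response template."""
    for input_template, response_template in RULES:
        regex = re.escape(input_template).replace(r"\*", "(.+)")
        found = re.fullmatch(regex, cleaned)
        if found:
            return response_template, found.groups()
    return None, ()


def respond(utterance):
    response_template, captured = match_rule(clean(utterance))
    if response_template is None:
        return FALLBACK
    return response_template.format(*captured)  # step 4: generate the response
```

For example, `respond("I'm tired :)")` is cleaned to `"i am tired"`, matches the first input template, and yields `"Why are you tired?"`.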

These two categories of conversational systems are not sharply delimited. Rather, they describe the typical components of each type of conversational system. A chatbot may also be implemented using natural language understanding technologies, e.g., LSABot (Agostaro et al. 2005), or may share other components with a typical dialog system. Despite this overlap, we use the two categories to classify the collected conversational systems and their technologies.

3 Results

3.1 Chatbots

Pattern Matching.

Pattern matching techniques were used by many chatbots, including ELIZA (Weizenbaum 1966), SHRDLU (Winograd 1972; Hutchens 1997), Speech Chatbot (Senef et al. 1991), PARRY (Colby 1981; Hutchens 1997), PC Therapist III (Weintraub 1986; Hutchens 1997), Chatterbot in “TinyMUD” (Mauldin 1994), TIPS (Whalen 1996; Hutchens 1997), FRED (Garner 1996; Hutchens 1997), CONVERSE (Batacharia et al. 1997; Bradeško and Mladenić 2012), HeX (Hutchens 1997), Albert One (Garner 2005; Bradeško and Mladenić 2012), and Jabberwock (Pirner 2005; Bradeško and Mladenić 2012). ELIZA, the first chatbot, developed by Weizenbaum (1966), used pattern matching to generate an appropriate response to the user’s utterance. For example, ELIZA analyzes the user’s input “He says I’m depressed much of the time” by matching it against the keywords in a pre-specified dictionary. For each keyword found, ELIZA applies an associated input-response rule. Based on this principle, ELIZA transforms the phrase “I am” into the phrase “you are”. The response generation algorithm then places the phrase “I am sorry to hear” before “you are depressed”, which yields the response “I am sorry to hear you are depressed.”
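The keyword mechanism in the ELIZA example can be sketched in Python as follows. This is a toy reconstruction that rests on several assumptions. The script format (keyword, rank, decomposition regex, reassembly template), the reflection table, and the fallback text are hypothetical simplifications. They are not Weizenbaum's original script, which uses richer decomposition and reassembly rules.

```python
import re

# Pronoun reflections applied to captured fragments ("i am" -> "you are").
REFLECTIONS = {"i": "you", "am": "are", "my": "your", "me": "you"}

# Hypothetical mini-script: keyword -> (rank, decomposition regex, reassembly).
# When several keywords occur in the input, the higher-ranked one is tried first.
SCRIPT = {
    "depressed": (5, r"\b(i am depressed)\b", "I am sorry to hear {0}."),
    "mother": (3, r"\bmy mother\b", "Tell me more about your family."),
}


def reflect(fragment):
    """Swap first-person words for second-person ones."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.split())


def respond(utterance):
    # Normalize case, typographic apostrophes, and the contraction "i'm".
    text = utterance.lower().replace("\u2019", "'")
    text = re.sub(r"\bi'm\b", "i am", text)
    found = set(re.findall(r"[a-z']+", text)).intersection(SCRIPT)
    for keyword in sorted(found, key=lambda k: SCRIPT[k][0], reverse=True):
        _, decomposition, reassembly = SCRIPT[keyword]
        match = re.search(decomposition, text)
        if match:
            return reassembly.format(*(reflect(g) for g in match.groups()))
    return "Please go on."  # content-free fallback when no rule applies


print(respond("He says I’m depressed much of the time"))
# I am sorry to hear you are depressed.
```

The reflection step performs the “I am” to “you are” transformation. The reassembly template supplies the prefix “I am sorry to hear”.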

Cleverscript.

Rollo Carpenter invented the core concepts and developed an algorithm for a chatbot in 1982 (https://www.existor.com/products/cleverbot-data-for-machine-learning). In 1996, this algorithm and the chatbot went online under the name “Jabberwacky”. Since 2006, the chatbot has been known as Cleverbot, and the authoring language Cleverscript for developing chatbots has been announced (Cleverscript 2016). The main concept of Cleverscript is based on spreadsheets: words and phrases that can be recognized (input) or generated (output) by Cleverscript are written on separate lines of the spreadsheet (Jackermeier 2015). This spreadsheet-based concept makes the development of chatbots relatively easy. In 2007, Eviebot (https://www.eviebot.com/en/), a female embodied chatbot with realistic facial expressions, went online. Boibot, a male counterpart of Eviebot, followed in 2015 (https://www.boibot.com/en/). Both use the same technology as Cleverbot and are able to speak several languages.
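The following sketch makes the spreadsheet idea concrete. It stores input and output phrases as rows of a CSV table and links them through a shared label. The column layout, the “/” separator for alternative phrasings, the “*” wildcard, and the row labels are our own assumptions for illustration. This is not the actual Cleverscript file format or engine.

```python
import csv
import io
import re

# Hypothetical spreadsheet (NOT the real Cleverscript format): one phrase per row.
# "/" separates alternative phrasings; "*" is a wildcard for any words.
SHEET = """type,label,text
input,greet,hello/hi/good morning
input,name,my name is *
output,greet,Hello! Nice to meet you.
output,name,Pleased to meet you.
output,fallback,I am not sure what you mean.
"""


def load_sheet(sheet):
    """Return (compiled input phrase, label) pairs and a label -> output mapping."""
    inputs, outputs = [], {}
    for row in csv.DictReader(io.StringIO(sheet)):
        if row["type"] == "input":
            for alternative in row["text"].split("/"):
                pieces = [re.escape(p.strip()) for p in alternative.split("*")]
                regex = r"\s*(.*?)\s*".join(pieces)  # "*" becomes a lazy capture
                inputs.append((re.compile(regex, re.IGNORECASE), row["label"]))
        else:
            outputs[row["label"]] = row["text"]
    return inputs, outputs


INPUTS, OUTPUTS = load_sheet(SHEET)


def respond(utterance):
    cleaned = re.sub(r"[^\w\s]", "", utterance).strip()  # drop punctuation
    for regex, label in INPUTS:
        if regex.fullmatch(cleaned):
            return OUTPUTS.get(label, OUTPUTS["fallback"])
    return OUTPUTS["fallback"]
```

An author extends such a bot by adding rows, not code. This separation of content from engine is what makes spreadsheet-style authoring comparatively easy.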

Chatscript.

Chatscript is another authoring language that facilitates the development of chatbots. Like Cleverscript, Chatscript is based on pattern matching (Jackermeier 2015). A special feature of Chatscript is the so-called concept set, which groups semantically related words so that a single pattern element can cover any of them in the user input. Chatbots developed using Chatscript include Suzette (Wilcox and Wilcox 2010), Rosette (Abdul-Kader and Woods 2015), Albert (Latorre-Navarro and Harris 2015), and a conversational agent by Bogatu and colleagues (2015).
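The following Python sketch illustrates the idea of a concept set. It only mimics the idea. The concept names, the word lists, and the in-order matching rule are our own assumptions. The `~` prefix follows ChatScript's notation for concepts, but this is not ChatScript's engine or pattern syntax.

```python
import re

# Hypothetical concept sets: each name stands for a group of related words.
CONCEPTS = {
    "~like": {"like", "love", "adore", "enjoy"},
    "~pet": {"dog", "dogs", "cat", "cats", "hamster", "parrot"},
}


def element_matches(element, word):
    """A pattern element is either a literal word or a concept name."""
    if element.startswith("~"):
        return word in CONCEPTS.get(element, set())
    return element == word


def pattern_matches(pattern, utterance):
    """True if all pattern elements occur in the utterance in the given order
    (other words may appear in between)."""
    words = iter(re.findall(r"[a-z']+", utterance.lower()))
    # Sharing one iterator makes each element resume where the previous one stopped.
    return all(any(element_matches(e, w) for w in words)
               for e in pattern.lower().split())
```

A single rule such as `"I ~like ~pet"` then covers “I love my cat”, “Honestly, I adore dogs”, and many other phrasings.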

AIML.

In 2001, an XML-based language for developing chatbots called AIML was released. The “A.L.I.C.E.” chatbot (Wallace 2003) was the first one developed using this technology. In recent years, AIML has established itself as one of the most widely used technologies in chatbots. AIML is based on pattern matching (das Graças Bruno Marietto et al. 2013). An AIML script consists of several “categories”, each defined by a <category> tag. Each category contains exactly one <pattern> tag, which defines a possible user input, and one <template> tag, which specifies the chatbot’s response to that input. Like Cleverscript, AIML uses wildcards in order to cover a wide range of user inputs. To interpret these AIML tags, a chatbot needs an AIML interpreter, which is implemented according to the corresponding AIML specification (either 1.0 or 2.0). Various AIML interpreters in different programming languages such as Java or Python are available (http://www.alicebot.org). Since developing AIML chatbots does not require skills in a specific programming language, this technology facilitates the development of chatbots. Consequently, many chatbots have been developed using AIML, such as Freudbot (Freudbot 2009), Max (Kopp et al. 2005), the chatbots in (Pilato et al. 2005), Penelope and Alex (Doering et al. 2008), HumoristBot (Augello et al. 2008), the chatbot of Alencar and Netto (2011), the system of van Rosmalen et al. (2012), Ella (Bradeško and Mladenić 2012), MathGame (Silvervarg et al. 2013), Chappie (Behera 2016), and Mitsuku (Abdul-Kader et al. 2015).
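To show how categories, patterns, templates, and wildcards fit together, the following sketch embeds a two-category AIML document and interprets it with Python's standard `xml.etree.ElementTree`. It is a toy interpreter under simplifying assumptions. It supports only literal patterns, the `*` wildcard, and `<star/>`. Real interpreters follow the full AIML 1.0/2.0 specifications, including tags such as `<srai>`, `<that>`, and `<topic>`, and they also apply pattern-priority rules that this sketch ignores.

```python
import re
import xml.etree.ElementTree as ET

# A tiny AIML document with two categories (one pattern and one template each).
AIML_SOURCE = """<aiml version="1.0.1">
  <category>
    <pattern>HELLO</pattern>
    <template>Hi there!</template>
  </category>
  <category>
    <pattern>MY NAME IS *</pattern>
    <template>Nice to meet you, <star/>.</template>
  </category>
</aiml>"""


def load_categories(source):
    """Compile each <pattern> into a regex; '*' captures one or more words."""
    categories = []
    for category in ET.fromstring(source).iter("category"):
        pattern = category.find("pattern").text.strip()
        regex = re.escape(pattern).replace(r"\*", "(.+)")
        categories.append((re.compile(regex, re.IGNORECASE), category.find("template")))
    return categories


def render(template, match):
    """Fill the template, replacing each <star/> with the wildcard text."""
    parts = [template.text or ""]
    for child in template:
        if child.tag == "star":
            parts.append(match.group(1))
        parts.append(child.tail or "")
    return "".join(parts).strip()


CATEGORIES = load_categories(AIML_SOURCE)


def respond(utterance):
    # Rough normalization: drop punctuation and collapse whitespace.
    normalized = " ".join(re.sub(r"[^\w\s]", " ", utterance).split())
    for regex, template in CATEGORIES:
        match = regex.fullmatch(normalized)
        if match:
            return render(template, match)
    return "I have no answer for that."
```

For example, `respond("my name is Anna")` matches `MY NAME IS *` and returns `"Nice to meet you, Anna."`.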

Language Tricks.

In addition to the technologies above, we noticed that many chatbots use language tricks in order to fool users and pass the evaluation. Abdul-Kader (2015) and Bradeško and Mladenić (2012) summarized four language tricks commonly used by chatbots: canned responses, a model of personal history, statements with no logical conclusion, and typing errors combined with simulated keystrokes. Chatbots use canned responses to cover user questions or answers that are not anticipated in their knowledge base. A model of personal history (e.g., memories of the past, childhood stories, social environment, and political and religious attitudes) enriches the “social background” of a chatbot and makes users believe that they are talking to a real “person”. Statements with no logical conclusion, such as “today is today”, are embedded in chatbots to enrich small talk. Typing errors and simulated keystrokes imitate a “human being” who is typing and making typos. HeX (Hutchens 1997), CONVERSE (Batacharia et al. 1997; Bradeško and Mladenić 2012), PC Therapist III (Bradeško and Mladenić 2012), and TIPS (Bradeško and Mladenić 2012) are conversational systems that use one or more of these language tricks.
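The last trick, typing errors with simulated keystrokes, can be sketched as follows. The reviewed papers do not specify an implementation. The typo model (swapping adjacent letters), the delay range, and the function names are therefore our own assumptions.

```python
import random
import sys
import time


def add_typos(text, typo_rate=0.05, rng=None):
    """Swap adjacent letters with probability typo_rate to mimic typing errors."""
    rng = rng or random.Random()
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < typo_rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair so it is not swapped back
        else:
            i += 1
    return "".join(chars)


def simulate_typing(text, chars_per_second=6.0, rng=None, sleep=time.sleep):
    """Emit the text one keystroke at a time with slightly irregular pauses."""
    rng = rng or random.Random()
    for ch in text:
        sys.stdout.write(ch)
        sys.stdout.flush()
        sleep(rng.uniform(0.5, 1.5) / chars_per_second)
    sys.stdout.write("\n")
```

A chatbot would call `simulate_typing(add_typos(reply))` instead of printing its reply at once. The `sleep` parameter exists only so that the pauses can be replaced in tests.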

3.2 Dialog Systems

Based on the typical components of a dialog system (Lester et al. 2004), we reviewed the technologies used for each of these components.

Preprocessing.

Most dialog systems process the user’s input before it is forwarded to the natural language understanding component. The preprocessing tasks are diverse. Berger (2014) summarized the following preprocessing tasks of dialog systems: sentence detection, coreference resolution, tokenization, lemmatization, POS tagging, dependency parsing, named entity recognition, and semantic role labeling. We found that the reviewed dialog systems mostly deployed the following preprocessing tasks: tokenization (Veselov 2010; Wilks et al. 2010; Eugene 2014; Bogatu et al. 2015; Amilon 2015), POS tagging (Lasguido et al. 2013; Dingli et al. 2013; Higashinaka et al. 2014; Ravichandran et al. 2015), sentence detection or chunking (Latorre-Navarro et al. 2015), and named entity recognition (Wilks et al. 2010; Lasguido et al. 2013).
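The following deliberately simplistic sketch implements two of these steps, sentence detection and tokenization, with regular expressions only. The rules are our own assumptions. Real systems usually rely on NLP toolkits with trained models, which also cover POS tagging, parsing, and named entity recognition.

```python
import re

# Naive rules: a sentence ends at ., ! or ? followed by whitespace and a capital letter.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")
# A token is a word (keeping simple contractions such as "I'm" together) or a punctuation mark.
TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def detect_sentences(text):
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence):
    return TOKEN.findall(sentence)


def preprocess(text):
    """Return one token list per detected sentence."""
    return [tokenize(s) for s in detect_sentences(text)]
```

For example, `preprocess("I'm looking for a train. Does it leave at 9?")` yields two token lists, one per sentence. Abbreviations such as “Dr. Smith” would break this naive splitter, which is one reason trained models are preferred.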

Natural Language Understanding.

The output of preprocessing is passed on to the natural language understanding component. For this step, dialog systems use the following approaches: Latent Semantic Analysis (LSA) based on the Vector Space Model (VSM), e.g., in LSABot (Agostaro et al. 2005), IRIS (Branchs et al. 2012), AutoTutor (Graesser et al. 1999), Operation ARIES! (Forsyth et al. 2013), and the dialog system of Pilato et al. (2005); and TF-IDF techniques, e.g., in Discussion-Bot (Feng et al. 2007).
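As an illustration of the TF-IDF approach, the following sketch maps a user input to the most similar entry of a small, hypothetical FAQ. It compares TF-IDF vectors by cosine similarity. The IDF variant log(N/df) + 1, the tokenizer, and the FAQ entries are assumptions. We do not claim that Discussion-Bot computes its matches this way. LSA-based systems additionally project the term-document matrix onto a low-rank latent space via a truncated singular value decomposition before comparing vectors. That step is omitted here.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    docs = [Counter(tokens(d)) for d in documents]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # document frequency
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def best_match(query, documents):
    """Index of the document most similar to the query (ties: first one)."""
    vectors, idf = tfidf_vectors(documents)
    q = {t: tf * idf[t] for t, tf in Counter(tokens(query)).items() if t in idf}
    scores = [cosine(q, v) for v in vectors]
    return max(range(len(documents)), key=scores.__getitem__)


FAQ = [
    "How do I reset my password?",
    "What are the opening hours of the library?",
    "Where can I find the exam schedule?",
]
print(best_match("When is the library open?", FAQ))
# 1
```

Words that do not occur in any FAQ entry (“when”, “open”) are ignored, so exact word overlap matters. LSA aims to relax this limitation by comparing inputs in the latent space.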

Dialog Manager.

The dialog manager is responsible for coordinating the flow of the conversation in a dialog system. Approaches to developing dialog managers fall into three categories: (1) finite state-based systems, (2) frame-based systems, and (3) agent-based systems (Klüwer 2011; Berger 2014).

In finite state-based dialog systems, a set of dialog states specifies the flow of the dialog, and transitions between states denote alternative paths through a dialog graph. At each state, the system produces prompts, recognizes (or rejects) specific words and phrases in response to the prompt, and performs actions based on the recognized response. The dialog states and their transitions must be designed in advance. Many dialog systems have been developed using this approach, e.g., the Nuance automatic banking system (van Woudenberg 2014).

Frame-based systems ask the user questions that enable the system to fill slots in a template in order to perform a task, such as providing train timetable information. In this type of system, the dialog flow is not fixed. It depends on the content of the user input and on the information elicited by the system. This approach has been used in systems that provide information about movies, train schedules, and the weather. Because these domains are simple, very robust dialog systems can be built without a full linguistic analysis of the user input.

Agent-based dialog systems detect the plans, beliefs, and desires of the users and model this information in a Belief-Desire-Intention (BDI) agent. This approach is challenging because constructing the users’ plans, beliefs, and desires requires multiple reasoning steps.

Response Generation.

The technologies deployed for generating responses vary across dialog systems. CONVERSE has a generation module that adds different variants of the same expression to an utterance and generates a smooth response (Batacharia et al. 1997). RITEL has a natural language generation module based on a set of template sentences (Galibert et al. 2005). The conversational system proposed by Higashinaka et al. (2014) combines different modules for utterance generation: the versatile, question answering, personal question answering, topic-inducing, related-word, Twitter, predicate-argument structure, pattern, and user predicate-argument structure modules. These modules generate utterances based on the last estimated dialogue act. The conversational agent Albert (Latorre-Navarro et al. 2015) has a language generation module consisting of templates that contain text, pointers, variables, and other control functions.
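Template-based generation, in the spirit of the template sentences used by RITEL and Albert, can be sketched with Python's standard `string.Template`. The template bank, the dialogue-act names, and the random choice among alternative phrasings are hypothetical illustrations. They do not reproduce any of the cited systems.

```python
import random
from string import Template

# Hypothetical template bank: dialogue act -> alternative surface forms with variables.
TEMPLATES = {
    "confirm_booking": [
        Template("Your train from $origin to $destination leaves at $time."),
        Template("OK, there is a train from $origin to $destination at $time."),
    ],
    "ask_repeat": [Template("Sorry, could you repeat that?")],
}


def generate(act, rng=random, **values):
    """Pick one template for the dialogue act and fill in its variables."""
    template = rng.choice(TEMPLATES[act])
    return template.substitute(values)  # raises KeyError if a variable is missing
```

Choosing among several templates for the same dialogue act varies the surface form of repeated responses. `substitute` fails loudly when the dialog manager has not supplied a required value.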

Special Features.

In addition to the technologies of typical dialog systems, we found that some conversational systems include special features that make them appear more human-like. For instance, some systems are able to learn from conversations and apply this knowledge later. The chatbot MegaHAL (Hutchens 1997; Hutchens et al. 1998) produces a lot of gibberish in order to fool its user, whereas the system Ella (Bradeško and Mladenić 2012) is able to detect gibberish entered by the user and react appropriately. Moreover, many multimodal systems (Ferguson et al. 1996; Bickmore et al. 2000; Bohus et al. 2004; Pradhan et al. 2016) can communicate with the user through both text and speech channels. With the development of embodied conversational agents, features like gestures, facial expressions, and eye gaze are becoming increasingly important (Alexander et al. 2006; Ayedoun et al. 2015). Developers of pedagogical agents also often integrate graphics, videos, animations, and interactive simulations into their systems to increase students’ motivation (Kim et al. 2007; Forsyth et al. 2013; Pradhan et al. 2016).

3.3 Evaluation Methods

Since we only collected conversational systems that had been evaluated or had participated in a competition, we categorized the evaluation methods used into four classes: (1) qualitative analysis, (2) quantitative analysis, (3) pre-/post-tests, and (4) chatbot competitions. Note that many systems have been evaluated using more than one evaluation method.

The most frequently applied evaluation method was the qualitative method, which used interviews or questionnaires. Examples of conversational systems evaluated using this method include Speech Chatbot (Senef et al. 1991), TRAINS-95 (Ferguson et al. 1996; Sikorski and Allen 1996), Herman the Bug in Design-A-Plant (Lester et al. 1997), REA (Bickmore et al. 2000; Bickmore and Cassell 2000), LARRI (Bohus et al. 2004), FAQchat (Shawar et al. 2005), Discussion-Bot (Feng et al. 2007), Freudbot (Freudbot 2009), Justin and Justina (Kenny et al. 2011), the dialogue system of Shibata et al. (2014), and Pharmabot (Comendador et al. 2015).

The second most widely used evaluation method is quantitative analysis, which makes use of dialog protocols generated by conversations between the user and the system. Examples of conversational systems evaluated using this method include RAILTEL (Bennacef et al. 1996), Max (Kopp et al. 2005), HumoristBot (Augello et al. 2008), Senior Companion (Wilks et al. 2010, 2008), SimStudent (MacLellan et al. 2014), Betty’s Brain (Leelawong et al. 2008; Biswas et al. 2005), CALMsystem (Kerly et al. 2007), Discussion-Bot (Feng et al. 2007), the dialogue system of Planells et al. (2013), and Albert (Latorre-Navarro et al. 2015).

The third evaluation method uses pre- and post-tests. Evaluators have usually applied this method to pedagogical agents to measure their learning effect. Examples include the evaluations of MathGirls (Kim et al. 2007), My Science Tutor (Pradhan et al. 2016), Herman the Bug (Lester et al. 1997), and MetaTutor (Bouchet et al. 2013; Harley et al. 2014).
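The learning effect in a pre-/post-test design is often summarized as a raw gain (post − pre) and a normalized gain (post − pre) / (max − pre). The normalized gain expresses the improvement relative to the room for improvement. The following sketch computes both per student and averages them. These are common summary measures, not necessarily the ones used in the cited studies, and the example scores are invented.

```python
from statistics import mean


def learning_gains(pre, post, max_score):
    """Mean raw gain and mean normalized gain (post - pre) / (max - pre) per student.

    Students who already reached max_score on the pre-test are excluded from
    the normalized gain, because their denominator would be zero.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must contain one score per student")
    raw = [b - a for a, b in zip(pre, post)]
    normalized = [(b - a) / (max_score - a) for a, b in zip(pre, post) if a < max_score]
    return mean(raw), mean(normalized)
```

For invented scores pre = [4, 6, 5] and post = [7, 8, 5] on a 10-point test, the mean raw gain is 5/3 points and the mean normalized gain is 1/3. In practice, such gains are typically accompanied by a significance test against a control group.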

The fourth evaluation method is the participation of a conversational system in a competition, for example, the Loebner Prize, which is based on the Turing test (Abdul-Kader et al. 2015). Loebner Prize winners include, for instance, PARRY (Colby 1981; Hutchens 1997), CONVERSE (Batacharia et al. 1997; Bradeško and Mladenić 2012), A.L.I.C.E. (Wallace 2003), Albert One (Garner 2005; Bradeško and Mladenić 2012), Elbot (Abdul-Kader et al. 2015), and Mitsuku (Abdul-Kader et al. 2015).

4 Discussion and Conclusions

In this paper, we have reviewed the technologies, language tricks, special features, and evaluation methods of conversational systems. While chatbots predominantly deploy pattern matching techniques and language tricks, most dialog systems exploit natural language processing technologies. We also found that most chatbots participated in Turing test contests (e.g., the Loebner Prize), while dialog systems were mostly evaluated using pre-/post-tests, quantitative, or qualitative methods. A likely explanation is that dialog systems are more goal-oriented (e.g., aiming to improve students’ learning gains), whereas chatbots mainly serve small talk across different domains. Based on the summary table in the Appendix, we observe a trend in the technologies applied to conversational systems: they are becoming more AI-oriented and increasingly deploy natural language processing technologies.

Due to the page limit, this paper only summarizes the technologies for developing conversational systems. We plan to elaborate on these technologies in more detail in a journal article.