1 Introduction

The application of Artificial Intelligence (AI) in education is expanding quickly. Some of the most popular AI technologies, such as conversational agents, are being used to support teaching and learning activities in the classroom or at home [1].

Conversational agents (CAs) or dialog systems, also called chatbots or chatterbots, have become increasingly common. Language-based human-machine interfaces (HMIs), such as virtual assistants or chatbots, provide information without time-consuming queries. Moreover, they hide the complexity and size of the underlying information [2]. Applications of chatbots include personal assistants on cell phones, sales bots on e-commerce websites, information retrieval, helpdesk and customer support, digital assistants, and support for the teaching-learning process, among others. These systems are intended to carry out coherent conversations with humans in natural language, through text, speech or both. The creation of chatbots dates back to the ELIZA conversational system, which emulated a psychotherapist [3]. Over the years, new AI techniques have been applied to the construction of these agents, so that existing systems can be broadly grouped into three paradigms or “generations”: the first, based on the combination of patterns and grammar rules; the second, grounded in production rules and artificial neural networks; and the third, which makes use of markup languages such as AIML [4]. However, the development of intelligent conversation with agents is still an unsolved research problem that raises many challenges in the artificial intelligence community [5].

This paper aims to identify, categorize and compare the main existing approaches to building chatbots, highlighting their main strengths and weaknesses. It also seeks to contextualize their use in education. The main goal is to identify open issues that may guide future research on conversational NLP in an educational context.

2 Categories of Conversational Agents

CAs can be categorized according to different characteristics, such as the interaction type, the domain of application, the purpose and the response generation model. These characteristics can also reflect the main learning strategy of the CA and the contextualization capabilities of the model. In general, CAs can be classified according to the following aspects [6]:

  • Mode of interaction

  • Goals

  • Design approach

  • Knowledge domain

Regardless of how response generation is done, these chatbots share the same basis: they analyze what the user says, interpret that analysis, and finally provide a response.

3 Approaches in the Implementation of Conversational Agents

This section discusses how CAs can be developed, highlighting rule-based CAs and AI-based CAs. Within AI-based CAs, a distinction is also made between information retrieval CAs and generative CAs. The pros and cons of each approach are also discussed. It should be noted that different models can be combined in order to produce the best possible results.

3.1 Rule-Based

The first CAs were based on rules. These approaches are generally simpler, but narrower in scope because they are unable to respond to difficult questions [7]. Rule-based CAs reply to queries through pattern matching. As a result, they are inflexible and unable to adapt to unknown patterns. In addition, pattern-matching rules can be difficult and time-consuming to produce and maintain. They are also specific to a domain and not easily transferable among different contexts [7].
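The pattern-matching behaviour described above can be illustrated with a minimal sketch. The rules, responses and the function name `reply` are hypothetical and chosen only for illustration; they are not taken from any system cited here.

```python
import re

# Hypothetical rule table: each rule pairs a regular expression with a
# response template. Captured groups fill the {0}, {1}, ... placeholders.
RULES = [
    (re.compile(r"\b(?:hello|hi)\b", re.IGNORECASE),
     "Hello! How can I help you?"),
    (re.compile(r"\bwhen is the (\w+) exam\b", re.IGNORECASE),
     "Please check the course calendar for the {0} exam date."),
    (re.compile(r"\bi need help with (.+)", re.IGNORECASE),
     "What part of {0} is giving you trouble?"),
]

FALLBACK = "Sorry, I do not understand."


def reply(utterance: str) -> str:
    """Return the response of the first rule whose pattern matches."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    # No rule matched: the agent cannot adapt to unknown patterns.
    return FALLBACK
```

Any utterance outside the hand-written patterns, such as "Explain photosynthesis", falls through to the fallback answer, which reflects the limited scope and maintenance cost noted above.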

3.2 AI-Based

Unlike rule-based models, AI-based approaches usually rely on machine learning (ML) models and extract information by learning from previous knowledge or through interaction with humans. To accomplish this, an ML algorithm is used to learn a model from training samples. Using ML algorithms removes the need to manually define and code new pattern-matching rules, making chatbots more flexible and much less dependent on specific domain knowledge [7]. These models can be subcategorized into Information Retrieval-based models and generative models.

Information Retrieval (IR) Based

Given a dataset of Question-Response (Q-R) pairs and a user query Q, an IR-based model searches the dataset for the pair (Q’,R’) whose question Q’ best matches Q and returns R’ as the answer to Q [5]. In this way, responses are drawn directly from the training samples. Many baseline retrieval models have been proposed for this purpose [8]. Various works have used Term Frequency-Inverse Document Frequency (TF-IDF) retrieval models as a way to create CAs. For example, in [9] this approach is used to create a model aimed at customer assistance and product suggestion. The authors propose applying Rhetorical Structure Theory [10] as a way to represent the connections among different replies. The open-domain datasets most widely used to build dialogue systems for generalist IR-based chatbots include WikiAnswers, Yahoo Answers, and Twitter conversations [7].
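As an illustration of the retrieval step, the following sketch ranks stored questions by TF-IDF cosine similarity to the user query and returns the response of the best match. It is a simplified, self-contained example and not the implementation used in [9]; the class name `TfidfRetriever`, the smoothed IDF formula and the Q-R pairs are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfRetriever:
    """Toy IR-based agent: returns R' of the stored pair whose Q' best matches Q."""

    def __init__(self, qr_pairs):
        self.pairs = list(qr_pairs)
        docs = [tokenize(question) for question, _ in self.pairs]
        n_docs = len(docs)
        # Document frequency: number of stored questions containing each term.
        df = Counter(term for doc in docs for term in set(doc))
        # Smoothed IDF (an assumption of this sketch): always positive.
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
        self.vectors = [self._vectorize(doc) for doc in docs]

    def _vectorize(self, tokens):
        # Terms never seen in the stored questions are ignored.
        counts = Counter(t for t in tokens if t in self.idf)
        return {t: c * self.idf[t] for t, c in counts.items()}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    def answer(self, query):
        """Return (response, similarity score) for the best-matching question."""
        query_vec = self._vectorize(tokenize(query))
        scores = [self._cosine(query_vec, vec) for vec in self.vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.pairs[best][1], scores[best]


# Hypothetical Q-R pairs for an educational helpdesk.
QR_PAIRS = [
    ("When is the homework due?", "Homework is due on Friday."),
    ("How do I reset my password?", "Use the reset link on the login page."),
    ("Where can I find the lecture slides?", "Slides are posted on the course page."),
]
```

Calling `TfidfRetriever(QR_PAIRS).answer("I forgot my password, how can I reset it?")` selects the password pair. A practical system would typically also reject matches whose score falls below a threshold, since a query with no overlapping terms scores zero against every stored question.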

Generator Model-Based

Generative models create new answers according to the human interaction, so completely new sentences can be generated in response to different queries. Accomplishing this requires such models to learn how to identify text structure and syntax, which is a difficult task. Consequently, the generated texts may lack consistency and even elegance [11]. Generative models are usually trained on sentences drawn from conversations, with the goal of producing good, linguistically correct answers to input texts. Such models are generally based on deep learning (DL) algorithms built from encoders and decoders [12].

Standard Models

Sequence-to-Sequence (Seq-to-Seq) models are the standard for chatbot modeling [7]. These models are suited to machine translation problems; however, they also perform well in natural language generation. The typical approach uses encoders and decoders [12]. This type of approach has several advantages. It is able to learn from data of different natures, domains and contexts, rather than from one specific domain. It does not require domain-specific knowledge to yield valuable results, but can be adapted to work with other algorithms if domain-specific knowledge needs to be incorporated. The result is a straightforward but dynamic model, which may be applied to very distinct NLP problems [13]. However, the main problem is that the contextual information is restricted to a single vector, which means that as the input text grows, there is a much higher chance that possibly relevant information will be lost. As a consequence, sequence models under-perform when analysing long sentences and often generate confusing responses. Additionally, Seq-to-Seq models address a single response at a time, and hence often produce an inconsistent conversational flow [11].
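The single-vector limitation can be made concrete with a toy recurrent encoder. This is only a sketch: the weights are random and untrained, and `HIDDEN_SIZE`, `VOCAB_SIZE`, `EMBED`, `W_H` and `encode` are hypothetical names introduced here. It shows that the encoder output has the same size regardless of the input length, so longer inputs must be compressed into the same number of values.

```python
import math
import random

HIDDEN_SIZE = 4   # size of the single context vector (assumed for the example)
VOCAB_SIZE = 10   # number of distinct token ids (assumed for the example)

# Untrained, randomly initialised parameters; a real model would learn these.
_rng = random.Random(0)
EMBED = [[_rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]
         for _ in range(VOCAB_SIZE)]
W_H = [[_rng.uniform(-0.5, 0.5) for _ in range(HIDDEN_SIZE)]
       for _ in range(HIDDEN_SIZE)]


def encode(token_ids):
    """Fold a token sequence into one fixed-size hidden state.

    Each step computes h = tanh(embedding(token) + W_H @ h), so the state
    after the last token is the only summary passed on to a decoder.
    """
    h = [0.0] * HIDDEN_SIZE
    for token in token_ids:
        x = EMBED[token]
        h = [
            math.tanh(x[i] + sum(W_H[i][j] * h[j] for j in range(HIDDEN_SIZE)))
            for i in range(HIDDEN_SIZE)
        ]
    return h


short_context = encode([1, 2, 3])
long_context = encode(list(range(VOCAB_SIZE)) * 20)  # 200 tokens
# Both contexts contain exactly HIDDEN_SIZE numbers.
```

Because a 3-token input and a 200-token input are both summarised by the same four numbers, information from early tokens is increasingly likely to be overwritten as the sequence grows.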

Transformers

Transformers are the new trend in automatic/intelligent language models [14]. Transformers learn how to measure the importance of different pieces of data/text. They also support parallel training, which allows them to deal with much larger amounts of data than before. These models have given rise to some of the best-known pre-trained systems, such as BERT (Bidirectional Encoder Representations from Transformers) [15] and GPT (Generative Pre-trained Transformer). These models have been created, and have evolved, using large language datasets, such as the Wikipedia and Common Crawl corpora. However, they can still be fine-tuned for specific problems [16]. Other models were developed to address specific challenges, e.g. Reformer [17] and Transformer XL [18].
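The idea of measuring the importance of different pieces of text is commonly realised through scaled dot-product attention. The sketch below is a plain-Python illustration of that operation under simplifying assumptions: a single attention head, no learned projection matrices, and made-up toy vectors. The function names are chosen for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    For each query, the weights state how important every position is,
    and the output is the correspondingly weighted average of the values.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    # Each query is processed independently of the others, which is what
    # allows the computation to be parallelised across positions.
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        output = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Toy example: the query points in the same direction as the first key,
# so the first position receives the larger weight.
toy_outputs, toy_weights = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
```

In a full transformer, the queries, keys and values are obtained from the token representations through learned projections, and several such attention heads are combined in each layer.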

4 Evaluation Methods

A variety of CA evaluation methods have been used (Table 1). These usually follow the ISO 9241 usability guidelines [19]. The most popular methods for evaluating CAs are those based on efficiency. Other methods used are those based on satisfaction and effectiveness [20].

Table 1. Methods for evaluating chatbots against ISO 9241 [20]

5 Conversational Agents in Education

Many works can be found on CAs applied in teaching and learning [21, 22], assessment [23], administrative service delivery [24], consulting [25] or research and development [26].

The main advantages of using CAs in education include [27]: content delivery, for example the ability of teachers and tutors to provide information on an online platform; quick and easy access; and the stimulation and engagement of learners. CAs in education also provide instant support during individual learning by facilitating activities such as delivering homework and evaluations [1], replying to e-mails [28], adapting to students’ actions and emotions [29], and responding quickly to their queries [30]. Future paths for CA research in education include the development of ethical and functionality principles and usability testing. This indicates that the framework for chatbot development and implementation, as well as design and content functionality, needs to be improved [27].

6 Conclusions

The development and use of conversational agents is increasing rapidly in multiple application domains. These agents are emerging in the form of virtual assistants, chatbots and other language-based interfaces, interacting with humans as digital assistants, sales bots and customer support agents, among many others. This paper has analysed how these systems are able to carry out coherent conversations with humans in natural language, through text, speech or both. For this purpose, while focusing on the application of conversational agents in education, the paper has identified several of the most promising approaches for implementing conversational agents, reviewed how their performance is evaluated, and identified some relevant paths for future research and development, which include the development of functionality and ethical principles for chatbots and the improvement of usability testing.