
Introducing Conversational Agents

In the field of artificial intelligence, the terms “agent” or “intelligent system” refer to any entity that perceives its environment through sensors and acts upon it using effectors (Franklin & Graesser, 1997). However, through the prism of e-learning, a “pedagogical agent” refers to a computer-generated character typically employed to fulfill a series of pedagogical aims in an educational system (Gulz, Haake, Silvervarg, Sjödén, & Veletsianos, 2011).

In our work, we focus on “conversational agents,” a subgroup of pedagogical agents that engage in conversation with learners using natural language. The type of communication occurring between a conversational agent and a learner can be text-based, oral, or even nonverbal, including body movements and facial expressions (Kerly, Ellis, & Bull, 2009).

Moreover, the graphical representation of conversational agents may also vary, ranging from a two-dimensional cartoonish appearance to a three-dimensional photo-realistic character (Veletsianos, 2010). Conversational agents that have a visual representation are frequently referred to as “embodied” conversational agents (Cassell, Sullivan, Prevost, & Churchill, 2000). Research has repeatedly identified the agent’s visual appearance as an important design element, which affects learners’ stereotypes or expectations of the agent’s intelligence (e.g., Haake & Gulz, 2008; Veletsianos, 2010). Indeed, agent embodiment has had a major impact on the evolution of conversational agents from the impersonal characters found in the intelligent tutoring systems (ITSs) of the past to the tangible personalized pedagogical agents of today (Gulz et al., 2011).

Another important factor in agent design is the role the agent plays in the learning environment. Conversational agents have been developed to serve multiple pedagogical roles, including—but not limited to—coaches, tutors, motivators, and learning partners (Haake & Gulz, 2009). Many studies have explored the various roles and uses of conversational agents in individual learning settings, where the agent engages in one-to-one interactions with the learner (Kerly et al., 2009). Agents acting as peer learners have been shown to lower students’ anxiety and promote students’ empathy (Chase, Chin, Oppezzo, & Schwartz, 2009). Additionally, such agents have been reported to be less intrusive than agents acting as instructors (Sklar & Richards, 2010).

More recently, taking into account the pedagogical benefits of computer-supported collaborative learning (CSCL) (Dillenbourg, 1999), researchers have expressed interest in assessing the use of conversational agents for providing dynamic collaborative learning support (e.g., Chaudhuri, Kumar, Howley, & Rosé, 2009; Walker, Rummel, & Koedinger, 2011). A study revealed that conversational agents can efficiently utilize both social and task-oriented intervention strategies to support students’ collaboration (Kumar, Ai, Beuth, & Rosé, 2010). Furthermore, other studies explored the positive impact of conversational agents on collaborative learning settings by emphasizing discourse scaffolding (Stahl, Rosé, O’Hara, & Powell, 2010), reflective prompting (Walker et al., 2011), or reasoning elicitation (Kumar, Rosé, Wang, Joshi, & Robinson, 2007).

Encouraging as such findings may be, several key questions have emerged. For instance, what types of collaborative problems are best suited for such conversational agent systems (Harrer, McLaren, Walker, Bollen, & Sewall, 2006)? Should the supportive prompts provided by the agent be solicited or unsolicited (Chaudhuri et al., 2009)? How can the different roles of the agent (e.g., tutor, peer, motivator) affect peer dialogue? What is the impact of the agent’s presence (the “persona effect”) on the behavior of students working together (Veletsianos & Russell, 2014)?

Following this potentially promising research direction, we have argued that conversational agents for collaborative learning can be designed by focusing on the role of the teacher as well as the peers’ interactions occurring while students work together (Tegos, Demetriadis, & Tsiatsos, 2012). Based on this rationale, we have developed a prototype conversational agent system, named MentorChat (Tegos, Demetriadis, & Tsiatsos, 2014). In the following sections, we present an overview of the MentorChat system and an evaluation study exploring how the students’ perceptions of the agent and their conversational behavior may be affected by the different roles (peer or tutor) of a conversational agent.

MentorChat System Overview

MentorChat is a cloud-based multimodal dialogue system that utilizes an embodied conversational agent to scaffold learners’ discussions (Tegos et al., 2014). We have developed MentorChat as a domain-independent dialogue system that (a) promotes constructive peer interactions using facilitative agent interventions (prompts) and (b) enables the teacher to configure the support provided by the conversational agent.

MentorChat can support discussions in English or Greek and was implemented using modern web technologies, such as HTML5, CSS3, and AJAX. The system infrastructure is based on a client–server model, which allocates workloads between the server and the clients. Its architecture comprises three main modules: the student, the teacher, and the conversational agent module (Fig. 1).

Fig. 1 MentorChat system architecture

The conversational agent of MentorChat is based upon the following three models (a minimal illustrative sketch follows the list):

  • The peer interaction model, which records and stores the students’ interactions in a computational format

  • The domain model, which utilizes the teacher’s domain knowledge representation in conjunction with pattern-matching algorithms to determine whether an agent intervention would be appropriate

  • The intervention model, which examines a series of micro-parameters (e.g., the time passed since the last agent intervention) to determine whether an intervention will eventually be displayed
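
Although the MentorChat implementation itself is not published here, the interplay of the three models can be illustrated with a minimal Python sketch. All names in it (Rule, AgentModels, on_utterance, min_gap) are our own illustrative assumptions, not the actual MentorChat API:

```python
from dataclasses import dataclass, field
from time import time


@dataclass
class Rule:
    concept: str       # teacher-defined keyword or phrase, e.g., "wiki"
    intervention: str  # reflective question the agent displays when matched


@dataclass
class AgentModels:
    rules: list                      # domain model: teacher-defined rules
    min_gap: float = 120.0           # intervention model: seconds between prompts
    history: list = field(default_factory=list)  # peer interaction model
    last_intervention: float = 0.0

    def on_utterance(self, speaker, text):
        # Peer interaction model: record the utterance in a computational format.
        self.history.append((speaker, text))
        # Domain model: naive pattern matching against teacher-defined concepts.
        for rule in self.rules:
            if rule.concept.lower() in text.lower():
                # Intervention model: micro-parameters (here, only the time
                # elapsed since the last prompt) decide whether the matched
                # intervention is eventually displayed.
                if time() - self.last_intervention >= self.min_gap:
                    self.last_intervention = time()
                    return rule.intervention
        return None
```

In this reading, every incoming chat message is logged, scanned for teacher-defined concepts, and only turned into a visible prompt when the intervention model’s micro-parameters (here, a single timing check) allow it.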

A teacher can use MentorChat to design, deploy, and monitor an online dialogue-based learning activity. These tasks can be accomplished using the MentorChat administration panels, which are available in the teacher’s interface. More specifically, the teacher may set up the discussion topics/phases of the collaborative activity (activity structure panel), manage the participating users and groups (user management panel), monitor groups’ discussions (monitoring panel), or configure the agent domain model for the activity by inserting a set of rules or creating a concept map (domain modeling panel).

Students entering MentorChat are asked to collaborate with their partner(s) on a given task (Fig. 2, A) using text-based synchronous communication. During the students’ dialogue, an animated humanlike conversational agent (Fig. 2, B) analyzes their discussion and provides supportive interventions that trigger fruitful peer interactions around key domain concepts. Each agent intervention is dynamically displayed in a pop-up frame next to the peers’ chat frame (Fig. 2, C), allowing learners to complete their ongoing conversational exchange before answering the agent’s question.

Fig. 2 A (translated) screenshot of the MentorChat student interface

Method

MentorChat was used in an experimental activity conducted in the context of a computer science course offered by the Second Chance School of Thessaloniki in Greece. The aim of the study was to explore the impact of two different agent roles (peer vs. tutor) on students’ perceptions and behavior. The participants were 24 Second Chance School students (13 males and 11 females) who had not been able to attend mainstream secondary education for various socioeconomic reasons. All were adults, with ages ranging from 19 to 67 years (N = 24, M = 37.4, SD = 13.36). Although their nationalities varied (e.g., Albanian, Bulgarian, Greek), all of them spoke Greek in class.

Procedure

In the course lectures, the students were introduced to the concepts of the Internet and web-based applications. The classroom sessions involved many discussions around synchronous and asynchronous communication tools, micro-blogging, blogging, and social networking. The MentorChat environment was also presented to the students as an example of an online collaborative learning environment.

After a 3-week period, an experimental activity was carried out in the computer lab of the Second Chance School for 2 teaching hours (90 min). The participating students were asked to use MentorChat to discuss the web applications they had learned about and used in class. They were also informed that during their conversation a virtual peer or tutor would raise some questions, which they should discuss within their group in order to provide a joint answer.

The agent was configured by the two classroom teachers to raise issues regarding social networking, search engines, and modern communication tools. In particular, the teachers used the MentorChat domain modeling panel to form the agent domain model by entering a set of rules. Each rule consisted of a domain concept (a keyword or phrase—e.g., “Mozilla Firefox”) along with a particular intervention. The interventions were reflective questions that asked students to elaborate on the subject and provide a thoughtful joint response (e.g., “If you want to create a webpage featuring articles in a chronological order, should you use a blog or a wiki? Why?”).
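
In terms of the hypothetical sketch given earlier, such a teacher-defined rule could be expressed as data; the wording below is taken from the examples in the text, while the Rule type remains our own illustrative construct:

```python
# Teacher-defined rules pairing a domain concept with a reflective question.
rules = [
    Rule(concept="blog",
         intervention="If you want to create a webpage featuring articles in "
                      "a chronological order, should you use a blog or a wiki? "
                      "Why?"),
    Rule(concept="Mozilla Firefox",
         intervention="..."),  # each concept carries its own teacher-authored question
]
```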

During the students’ discourse, the conversational agent displayed the teacher-defined intervention whenever the associated domain concept (keyword or phrase) was identified. Subsequently, the students were encouraged to discuss with each other and provide a joint response, typing their answer into the agent answer box (Fig. 2, C). In addition to this intervention method, which was active throughout the students’ discussion, a final intervention was also made by the agent at the end of the activity. More specifically, before the students exited the activity, the agent reminded them of all the teacher-defined domain concepts that had not been mentioned during their discussion, offering them the option to continue their conversation on the suggested topics (e.g., “It seems that you have not discussed wikis. Do you want to continue your discussion or finish the activity?”). If the students’ discussion included all the teacher-defined domain concepts, the agent did not display any final intervention.
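
This closing behavior amounts to a set difference between the teacher-defined concepts and the concepts actually mentioned. A minimal sketch, reusing the hypothetical structures above (closing_intervention is our own name):

```python
def closing_intervention(rules, history):
    """Remind students of teacher-defined concepts they have not discussed."""
    transcript = " ".join(text.lower() for _, text in history)
    missing = [r.concept for r in rules if r.concept.lower() not in transcript]
    if not missing:
        return None  # every concept was covered: no final intervention
    return (f"It seems that you have not discussed {', '.join(missing)}. "
            "Do you want to continue your discussion or finish the activity?")
```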

Compared Conditions

The teachers assigned the students to small groups of two or three members (six dyads and four triads). Each student was given a score indicating his or her expertise in computer science (based on course grades and in-class performance), and the final groups were formed so as to be slightly heterogeneous. According to Rovai (2007), this method constitutes an effective strategy for creating an educational context that facilitates peers’ online discussions and promotes equal participation.

Furthermore, a combined score was also calculated for each group as the average of its members’ individual scores. Taking this score into account, the instructors stratified the student groups by their domain knowledge and assigned them to two conditions so that both conditions were balanced in terms of the overall scores of the groups.
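
The chapter does not spell out the exact assignment procedure; one standard way to achieve the described balance is to rank the groups by their combined score and alternate assignment down the ranking. A toy sketch with invented scores (the function name and values are ours):

```python
def assign_conditions(group_scores):
    """Split groups into two score-balanced conditions by alternating
    assignment down a ranking of combined expertise scores."""
    ranked = sorted(group_scores, key=group_scores.get, reverse=True)
    return ranked[0::2], ranked[1::2]  # P and T conditions, respectively


# Invented combined scores for ten groups; the real scores derived from
# course grades and in-class performance.
scores = {f"G{i}": s for i, s in enumerate(
    [8.2, 7.9, 7.5, 7.1, 6.8, 6.4, 6.0, 5.7, 5.3, 4.9], start=1)}
p_groups, t_groups = assign_conditions(scores)
```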

Our two-condition experimental design involved (a) five groups (four dyads and one triad) interacting with a conversational agent that enacted the role of a peer (P condition), displaying its interventions in an informal manner (Fig. 3, A), and (b) five groups (two dyads and three triads) interacting with an agent that enacted the role of a tutor (T condition), employing a more formal appearance and communication style (Fig. 3, B). Each of the teacher-defined agent interventions was tailored to the agent communication style of its condition (Fig. 3).

Fig. 3 The agents acting as peer (A) and tutor (B) in the P and T conditions, respectively

Data Collection and Analysis

Post-task Questionnaires

After the activity, students were asked to fill in a post-task questionnaire that explored their opinions about the MentorChat interface and the agent interventions. The questionnaire included three multiple-choice questions, two open-ended questions, and ten Likert-scale questions. Measures of central tendency were computed for all questionnaire items. Additionally, a series of Pearson product-moment correlation coefficients was calculated to examine the relationships among the questionnaire variables.
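
For reference, each reported correlation maps onto a short computation of this form; the vectors below are invented placeholders, not the study’s data:

```python
from scipy.stats import pearsonr

# Invented example: learners' age against their ratings of system ease of
# use on a 5-point scale (the study's raw data are not published here).
age = [19, 24, 31, 35, 38, 42, 47, 55, 61, 67]
ease_of_use = [5, 5, 4, 4, 4, 3, 4, 3, 2, 2]

r, p = pearsonr(age, ease_of_use)
print(f"r = {r:.2f}, p = {p:.3f}")
```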

Interviews

Interviews were conducted in order to record details of how the students worked and how they perceived the learning activity. The interviews were semi-structured and lasted about 10 min each. They focused on students’ opinions about (a) the collaborative dialogue-based activity as a whole, (b) the usability of the MentorChat tool, and (c) the pedagogical efficacy of the conversational agent interventions. Students were interviewed individually. All interviews were transcribed verbatim and analyzed in search of common themes using the open-coding process of the constant comparative method (Corbin & Strauss, 1990).

Discourse Data Observations

Following the completion of the activity, the authors examined the text files of all group discussions. In particular, the authors acted as independent raters assessing the degree of formality/informality of users’ responses to the agent. A scoring rubric, derived from Moskal’s study (2000), was used to measure the formality of students’ utterances on a simple 2-point scale (0 for formal and 1 for informal). The inter-rater reliability of the scoring process was found to be high (Kappa = 0.82; ICC = 0.83). Following this independent scoring process, the raters participated in a roundtable discussion, elaborating on each group dialogue to draw joint inferences.
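
The reported agreement statistic can, in principle, be reproduced with standard tooling; the binary ratings below are invented for illustration only:

```python
from sklearn.metrics import cohen_kappa_score

# Invented binary formality ratings (0 = formal, 1 = informal) from two
# raters over the same set of student responses to the agent.
rater_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1]
rater_b = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1]

print(f"Cohen's kappa = {cohen_kappa_score(rater_a, rater_b):.2f}")
```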

It should be noted that the individual student constituted the unit of analysis for the post-task questionnaires and the interviews, whereas the discourse data observations involved both individual- and group-level analyses.

Results

Post-task Questionnaire Analysis

The post-task questionnaire results revealed that most students were familiar with instant messaging applications (70.83 %), while fewer used them on a daily basis (41.2 %). They also rated their typing speed as slightly below average (N = 24, M = 2.38, SD = 1.27) on a 5-point scale ranging from 1 (slow) to 5 (fast).

Table 1 presents the descriptive statistics computed for the Likert-scale questions that measured the user acceptance of the MentorChat tool. Likewise, Table 2 presents a selection of the results relating to the agent behavior.

Table 1 The questionnaire results concerning the MentorChat tool
Table 2 The questionnaire results concerning the interventions of the agent

Furthermore, a Pearson product-moment correlation analysis revealed two significant correlations among the questionnaire variables. First, there was a negative correlation between “learners’ age” and “system ease of use” (r = −0.51, p = 0.01). Second, “learners’ typing skill” was found to be positively correlated with the “comprehensiveness of the interface options” (r = 0.48, p = 0.02). These correlations were anticipated, since younger students are typically more familiar and experienced with the interface and functionality of instant messaging applications.

Given that the normality and homogeneity of variance criteria were satisfied, we proceeded to apply parametric statistics to our individual-level questionnaire data. More specifically, a series of independent samples t tests was performed comparing the scores of the questionnaire variables in the P and T conditions. The analysis did not reveal any significant differences in the scores for the two conditions. Nevertheless, although nonsignificant (t[22] = 4.76, p = 0.07), the difference in how students in the two conditions perceived the content of the agent interventions is worth mentioning. With respect to the agent interventions appearing during students’ discussions, the students who interacted with the peer agent (P condition) considered its interventions more comprehensible (M = 4.00, SD = 0.35) than did the students who interacted with the tutor agent (T condition) (M = 4.92, SD = 0.05).
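
The comparison reported above corresponds to a standard independent samples t test; the ratings below are invented placeholders, not the questionnaire data:

```python
from scipy import stats

# Invented 5-point comprehensibility ratings per student in each condition.
p_condition = [4, 5, 4, 4, 5, 4, 5, 4, 4, 5, 4]
t_condition = [5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 4, 5, 5]

t_stat, p_value = stats.ttest_ind(p_condition, t_condition)
df = len(p_condition) + len(t_condition) - 2
print(f"t({df}) = {t_stat:.2f}, p = {p_value:.3f}")
```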

Interview Analysis

The qualitative data that derived from the analysis of the interview transcripts indicated five common themes, as presented in Table 3.

Table 3 Common themes identified

Discourse Data Observations

The examination of the groups’ discussions revealed a series of patterns in the students’ interaction and conversational behavior. First of all, we observed that when participants entered their group discussions, they posted a number of messages (N = 10 groups, M = 7.10, SD = 4.15) that served a purely social function and were not related to the task. Although many students (10 out of 24) began their discussion typing in “Greeklish” (i.e., writing in Greek using Latin characters), all of them switched to Greek when, at the beginning of the activity, they saw the agent prompt urging them to use Greek characters.

Furthermore, the agent interventions, especially those displayed at an early stage of the activity, seemed to have caused some confusion in half of the groups (5 out of 10). More specifically, a relatively high number of task coordination contributions was identified in students’ utterances after the first agent intervention occurred. Taking a closer look at the students’ dialogues, we found that, despite the agent’s explicit instructions (Fig. 2, C), some peers (8 out of 24) were initially unsure whether they should answer the agent individually or provide a joint response. In fact, on some occasions, these students rushed to provide a response without reaching an agreement with their partner.

It should also be noted that even though all students communicated with each other in a friendly manner, we observed a considerable difference in the way learners interacted with the agent in the P and T conditions. Specifically, the descriptive statistical analysis of the rubric scores indicated that the student groups (N = 5) in the P condition responded to the agent questions in a far more informal way (M = 0.84, SD = 0.21) than the student groups (N = 5) in the T condition (M = 0.18, SD = 0.16). In particular, the students in condition P responded to the agent as they would respond to a question from their human partner(s) (e.g., “Hi Elena! I can help you with the webpage …”), while the students in condition T engaged in more formal communication with the agent (e.g., “From my point of view, the webpage should be developed …”), much as a student would answer the question of a human tutor in class.

Moreover, an examination of the agent interventions revealed that most of the groups (seven out of ten) did not discuss all the teacher-proposed topics and, hence, triggered an agent intervention at the end of the activity. Although these interventions reminded the groups of the domain topics not yet discussed, only some of them (four out of seven) decided to resume their discussion on the proposed topics. We attribute this mainly to the limited duration of the learning activity.

Discussion and Conclusions

The main goal of the study was to investigate whether the different roles (peer or tutor) of the conversational agent may affect students’ perceptions of the agent and their utterances. The study results indicate that the different appearance and communication styles of the agent affected the formal/informal style of the students’ responses to it (discourse data observations) but did not produce any significant difference in students’ opinions about the agent (post-task questionnaires).

More specifically, all students had a favorable opinion regarding the user interface and the usability of MentorChat (Table 1). Moreover, students believed that the agent interventions were simple and comprehensible and made the discussion more interesting. They also stated that the agent interventions helped them recall and identify valuable points of the topics being discussed (Table 2, rows 3 and 4) or understand the domain subject better through answering the agent questions (Table 2, row 6).

Furthermore, all the students seemed to appreciate the agent interventions, and the analysis did not reveal any significant difference between the P and T conditions. The students in both conditions perceived the agent as a valuable discussion facilitator, whether it acted as a tutor or a peer (Table 2). Students’ perceptions were not adversely affected by the different appearance or communication styles of the agent.

However, although no statistically significant differences emerged in the post-task questionnaires, there is some evidence to suggest that students in the P condition considered the agent interventions more comprehensible than did students in the T condition. Based on our findings, we argue that the friendlier interventions of the peer agent had a more positive impact on students, making them feel as if they were engaged in a human-to-human conversation and, eventually, more willing to focus on the information in the prompts.

This result seems to support the “personalization principle” of multimedia learning theory as described by Clark and Mayer (2011). This principle suggests that instructional designers should use a conversational rather than formal communication style so that learners interact with the interface in a way that resembles human-to-human conversations. Indeed, although the students interviewed reported being aware that the virtual character was not in an actual conversation with them, they seemed more likely to act as if the agent was their conversation partner in the P condition.

Moreover, the discourse data revealed that students in the P condition responded to the agent questions in a more friendly/informal way than students in the T condition. This result suggests that the appearance of the agent or the conversational style of its interventions may influence students’ conversational behavior. Hence, the different role of the agent in a collaborative learning activity can affect specific social characteristics of students’ utterances.

This finding appears to be consistent with the outcomes of other studies exploring how the different roles and appearances of contextually relevant conversational agents can affect learners’ impressions, stereotypes, or engagement (Veletsianos, 2010). For instance, Rosenberg-Kima et al.’s study (2010) indicates that the strategic use of pedagogical agents of various races and genders can provide learners with social models that are similar to them, thus increasing their interest towards the agent.

Furthermore, Gulz et al. (2011) highlight that a key challenge for agent design is to manage students’ expectations about the social profile of the conversational agent. Students have expectations of both what the agent may be able to say to them and how it will address them. Thus, a good match between the students’ expectations and the agent’s social profile can shape how the students perceive the agent’s general personal features (e.g., a humorous or a serious character, a figure of authority, or a peer), as well as serve the pedagogical objective of making the conversation engaging and motivating.

In spite of the study’s limitations, such as the small number of participants and the limited duration of the activity, we consider that it provides preliminary evidence and valuable insights into the potential effects of the conversational agent’s role (peer or tutor), and of the associated appearance and communication styles, in collaborative learning settings. We consider that teacher-configurable conversational agents have a pedagogically beneficial role to play in the e-learning systems of the future. We expect this study to contribute towards exploring the various agent roles and interventions that can improve collaboration in everyday instructional situations.