The interactive potential of the computer naturally draws comparisons to social behavior. For example, the Turing test proposed that if a human interacts with a computer, and the human believes the computer is a person, then the computer has achieved human intelligence (Turing 1950). A number of computer programs were engineered to challenge the validity of the Turing test. ELIZA, for instance, successfully impersonated the dialog of a Rogerian therapist, but the computer used such simple rules that it would be absurd to consider it truly intelligent (Weizenbaum 1976). Whether or not the Turing test is adequate for deciding the intelligence of a computer, it is useful to note that the test is really about the social behavior of the computer. There could have been other tests of human intelligence; for example, could the computer learn language? Instead, the test assessed whether people would treat the computer as a social entity. Here, we use the natural social attractions of the computer to improve students’ science learning.

Computers readily draw forth people’s social schemas. Even when they explicitly know they are interacting with a computer, people will behave in socially appropriate ways (Reeves and Nass 1998). People’s tendency to attribute social intelligence to computers has fueled the creation of graphical worlds that commingle human and computer intelligence. Examples include Second Life, the Sims, and World of Warcraft—where people interact with graphical characters that may represent a live person or a computer character. These human-computer hybrids not only boost natural social inclinations but can also produce novel social configurations that sustain unusual psychological states. For instance, game players can program graphical characters to act (and interact) in virtual social worlds, even when the players are no longer at their computer.

The novel social configuration presented here involves software agents that blend student and computer intelligence. We have created a computer-based learning environment that features a Teachable Agent (TA)—a graphical computer character that students teach. The TA uses artificial intelligence to learn and reason about what it has been taught. TAs are a hybrid; they reflect their owners’ knowledge, yet have minds of their own. This social arrangement has benefits for learning. For example, students are likely to adopt their TAs’ reasoning methods (Schwartz et al. 2009). Here, we focus on the motivational consequences.

We begin with a brief review of agents and avatars, which are the two main classes of virtual characters used in educational applications. We then introduce TAs, which combine properties of agents and avatars. This sets the stage for two studies that demonstrate what we term the protégé effect: students make greater effort to learn for their TAs than they do for themselves. The first study produces this effect, even when the only difference between conditions is whether students believe they are learning for their TAs or for themselves. The second study shows the social nature of the interaction with the TA and how it contributes to the protégé effect. We conclude with some initial thoughts on the role of TAs in creating a distinctly social set of motivations to learn, which are supported by an ego-protective buffer, an incrementalist approach to learning, and a sense of responsibility.

Learning and Motivation with Agents, Avatars, and Hybrids

Interactive computer characters traditionally come in one of two forms: avatar and agent (Bailenson and Blascovich 2004). An avatar is a character that represents and is controlled by a human. For example, in a video game, the characters manipulated by the players are avatars. In contrast, an agent is a character controlled by the computer. When people play a hockey video game by themselves, they each control their own avatars, while the computer controls the other players (agents) on the team. One of the interesting things about these computer games is that the users can jump from character to character, so they control whichever player happens to have the hockey puck. This is a nice example of a novel social configuration that computers support.

Agents and avatars each have advantages for education. A number of useful learning situations can be created by agents (for a nice collection of instances, see Baylor 2007). For example, agents can provide role models for how to think or act. Ryokai et al. (2003) used an embodied conversational agent named Sam to engage children in collaborative story-telling. Children who interacted with Sam adopted his conversational behaviors and used more advanced narrative skills than children who conversed with peers. Another type of agent is a pedagogical agent, which provides advice to learners. For instance, Shimoda et al. (2002) used a panoply of agents to deliver meta-cognitive tips during scientific inquiry. Clarebout et al. (2002) have created a typology of pedagogically relevant agent behaviors such as showing, explaining, and questioning.

Agents can also be used to improve motivation. Lester et al. (1997) experimented with five varieties of Herman the Bug, a pedagogical agent who worked with middle school students as they designed a plant. In a condition where the agent gave no advice but exhibited social behaviors of encouragement, students gave the agent high ratings on entertainment value and chose to have Herman help them with homework. Lester et al. (1997) dubbed this the persona effect, claiming that the socialness of the agent helped to engage students with the software. Similarly, Baylor and Kim (2005) found that pedagogical agents equipped with encouraging dialogue were perceived as more motivating and showed a moderate trend for enhancing student self-efficacy.

Like agents, avatars (which humans control) may also have benefits for learning. For example, people may learn to take on the attributes of their avatars. Yee and Bailenson (2007) termed this the Proteus effect. In one study, participants were assigned to use either a tall or short avatar. They then played a negotiation game with another person in virtual reality. The people who played as the tall avatar were tougher negotiators and were more likely to come out ahead. Presumably, they took on the stereotype that height confers power and authority. This tendency for adoption has educational potential when the attributes to be adopted are useful dispositions for learning.

Avatars can also motivate students to take risks. If the avatar makes a mistake, the user does not necessarily suffer the consequences. When getting checked into the boards in a virtual hockey game, players not only avoid getting hurt, they can also “laugh it off”. Just as computer simulations of nuclear fusion are physically safer than the real thing (Perkins et al. 2006), avatars can make it psychologically safer to try new things, without experiencing the real consequences of failure.

A hybrid agent/avatar blends the properties of an agent and an avatar. It is a character that includes a bit of the computer and a bit of the human user. A key element of a hybrid agent/avatar is its ability to behave without explicit human control while still reflecting prior interactions with a human user. A growing number of hybrids vary the mix of human dependence and independence. Some applications have the user try to “program” a character so it lives and acts exactly the way the user intends (Gerhard et al. 2004; Imbert and de Antonio 2000). For example, in The Sims, a popular commercial game, computer characters behave based on the attributes supplied by their users plus some amount of their own apparent “free will”.

Another example hybrid is the Tamagotchi—a digital pet housed in a small, egg-shaped computer. Children are responsible for feeding, cleaning, and nurturing their Tamagotchis. The pets respond and grow based on the children’s care. Children (especially girls) find the responsibility and nurturing highly motivating (Pesce 2000). The research presented here shows that a sense of responsibility towards a hybrid can lead to educationally relevant outcomes as well.

A TA is a “sentient” hybrid agent/avatar that has been specifically designed for educational outcomes. The TA engages learners in a teacher-pupil metaphor and takes on the role of protégé. The student teaches the TA, so the TA is dependent on the student. At the same time, the TA contains artificial intelligence that allows it to behave independently. For instance, the TA can reason, answer questions, and complete various assessments based on how it was taught. Moreover, a TA possesses the educational benefits of both agents and avatars. Like an agent, a TA provides an independent social presence that motivates students to interact with it, plus it offers new models of thinking and reasoning. Like an avatar, the TA has properties that students can adopt, without the intellectual risks that come with learning something on one’s own.

A Teachable Agent Called Betty’s Brain

There are several types of TA software (see Schwartz et al. 2007); here we focus on Betty’s Brain. Betty was designed to model chains of cause and effect relationships. For example, when the brain’s temperature set point rises, several multi-step pathways cause the body’s temperature to increase, producing a fever (see Fig. 1). Betty is especially relevant to science domains where long chains of qualitative causes are a useful way to explain phenomena. Biology content like food webs and ecosystems, bodily systems, and global warming is well modeled by Betty’s architecture.

Fig. 1

The teachable agent Betty’s Brain. Using the Betty software, each student teaches her own TA (in this case, named “Dee”) by constructing a concept map as its “brain”. Through basic artificial intelligence techniques, the TA can answer questions based on the relationships depicted in its map. Students can query the TA using a pull-down menu. The highlighted links and nodes in the figure show how the TA answers the question, “If ‘blood flow to skin’ increases, what happens to ‘body temperature’?”

Before teaching in Betty’s Brain, each student names and designs the appearance of her own TA (Betty’s Brain is the name of the software; each student creates her own character). A student then teaches her TA by creating a concept map of nodes connected by qualitative causal links; for example, ‘heat release’ decreases ‘body temperature’. The map fancifully symbolizes the interior of the TA’s brain. Once taught, a TA can answer questions. For instance, Betty includes a simple query feature. In Fig. 1, the TA uses the map it was taught to answer the query, “If blood flow to skin increases, what happens to body temperature?” Using basic artificial intelligence techniques, the TA animates its reasoning process by successively highlighting each node and link in the causal chain (see Biswas et al. 2005). A student can trace her TA’s reasoning, and then remediate its knowledge (and her own) if necessary. A TA always reasons logically, but depending on the nodes and links it was taught, it will reach a right or wrong answer.
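
To make this style of qualitative inference concrete, the following is a minimal sketch in Python. The map representation, the function name ask, and the first-found-path propagation rule are illustrative assumptions on our part; the actual Betty’s Brain reasoning engine (Biswas et al. 2005) is more elaborate.

    # The "brain" is a set of directed causal links: (cause, effect) -> sign,
    # where +1 stands for "increases" and -1 for "decreases".
    brain = {
        ("blood flow to skin", "heat release"): +1,
        ("heat release", "body temperature"): -1,
    }

    def ask(brain, start, end, change=+1, visited=None):
        # Answer "If `start` changes by `change`, what happens to `end`?" by
        # following causal chains depth-first and multiplying link signs.
        # Returns +1 (increase), -1 (decrease), or None (no traceable path).
        if start == end:
            return change
        visited = (visited or set()) | {start}
        for (cause, effect), sign in brain.items():
            if cause == start and effect not in visited:
                result = ask(brain, effect, end, change * sign, visited)
                if result is not None:
                    return result
        return None

    # The query from Fig. 1: "If 'blood flow to skin' increases,
    # what happens to 'body temperature'?"
    print(ask(brain, "blood flow to skin", "body temperature"))  # -1: it decreases

Animating the answer, as the software does, amounts to highlighting each node and link as this traversal visits it.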

Betty’s Brain is not meant to be the sole means of instruction; rather, it provides a way for students to organize and reason about content they have learned in the classroom, complementing many styles of instruction rather than replacing them (Schwartz et al. 2007). One of Betty’s complementary strengths is feedback. Betty comes with a number of software options that provide feedback in various forms, some of which can spark classroom discussion. The option shown in Fig. 2a enables a teacher to project multiple TAs’ maps using a classroom projector. The teacher can ask the same question of all the TAs simultaneously, then zoom in to focus the discussion on one or two maps. Figure 2b shows the All Possible Questions (APQ) matrix—a tool that asks the TA every possible question. It then compares the answers of the TA with those of a hidden, pre-programmed expert map to produce a grid that indicates which questions the TA got right and wrong.

Fig. 2

Software options for various types of feedback. Panel A shows a front-of-the-class (FOC) display, where teachers project and query multiple “brains” (maps) simultaneously. The highlights around each concept map indicate correct and incorrect answers. Panel B shows the All-Possible-Questions (APQ) matrix. The matrix indicates a TA’s accuracy when asked the complete population of possible questions in a hidden expert map. All concepts are displayed on both axes. Each cell displays feedback to the question, “If Y increases, what happens to X?” For both applications, green indicates a correct answer, red indicates incorrect, and yellow indicates correct but by the wrong causal path. A version of the Betty’s Brain environment and teacher tools can be found at <aaalab.stanford.edu>. (Color figure online)

Several of Betty’s attributes were designed to encourage students to treat their TAs as social beings. For instance, a TA can draw inferences from questions, take quizzes, play games, and even comment on its own knowledge (depending on the configuration of the software). Betty’s Brain also comes with narratives and graphical elements to help support the mindset of teaching. Finally, each student can customize her TA’s appearance and give it a name, which makes her TA more personal than a sterile, generic computerized icon. In reality, students are simply programming their TAs in a high-level graphical language, and children know the computer is not really alive. Nevertheless, as we demonstrate in Study 2, students suspend disbelief enough to treat the computer as possessing knowledge and feelings (e.g., Reeves and Nass 1998; Turkle 1995).

One of a TA’s most social elements is its ability to externalize its thought processes. When a TA animates its reasoning on the screen, it literally makes its “thinking” visible. A study with 6th-graders indicated that students do learn from the TA’s overt model of causal reasoning (Schwartz et al. 2009). In one condition, students worked with their TAs to organize what they had learned from various readings, films, and hands-on activities. In another condition, students learned the same content, but worked with a commercial concept mapping program called Inspiration. Students took periodic paper and pencil tests across 3 weeks of a curriculum about global warming. Over time, the TA students increasingly outperformed the Inspiration students, and TA students demonstrated the greatest advantage on questions that required longer chains of causal inference. These results indicate that students adopted the reasoning process modeled by the TAs in Betty’s Brain.

Other studies have also found learning benefits when students work with Betty’s Brain. A 2-month study had 5th-graders learn river ecology (Wagster et al. 2007). In the Teach condition, each student taught Betty (in this study all students taught the same graphical character called Betty rather than creating their own TAs). In the Being-Taught condition, Betty’s image was replaced with a “mentor agent” named Mr. Davis, and students also created maps. When a student asked a question of her map, the mentor agent traced through the map (in exactly the same way that Betty did for students in the Teach condition). Thus, the primary difference between conditions was quite subtle—the mindset of teaching versus being taught. Students in the Teach condition produced more accurate concept maps. The benefits also transferred to a unit on land ecology, when the students were no longer in their respective treatments. Students who had been in the Teach condition again made better concept maps.

Overview of Studies

Given evidence of cognitive gains, the current research was designed to get a closer look at the motivational properties of TAs. The first study demonstrates the protégé effect: students are willing to work harder to learn for their TAs than for themselves, and this is especially true for low-achieving students. The second study finds that students treat their TAs as social, thinking beings. Students closely monitor and take responsibility for their TAs’ failures, which motivates them to revise their own understanding so they can teach better. Both studies were short in duration, only one to three hours, so there was minimal expectation of finding learning differences. Instead, the research focused specifically on affective elements that may have contributed to the learning benefits found in earlier research.

In the current studies, one of Betty’s features was particularly important—the Triple-A-Challenge Gameshow. The Gameshow is an online environment where multiple TAs, each taught by a different student, can interact and compete with one another (Fig. 3). Students can log on from home to teach their TAs (by accessing the Betty software), chat with other students, and eventually have their TAs play in a game. During game play, the host poses questions of the form, “If X increases/decreases, what happens to Y?” After each question, the student wagers from 0 to 500 points, and the TA answers based on what it has been taught. Then, the host reveals the correct answer and awards points. Students normally play the Gameshow in rounds, with each round consisting of about six questions, and subsequent rounds including more difficult questions (i.e., requiring longer chains of reasoning).
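
To make the game mechanics concrete, here is a rough sketch of a single Gameshow question, reusing the ask() function and map format from the earlier reasoning sketch. The scoring rule shown (the wager is won on a correct answer and lost otherwise) is our assumption for illustration; the text does not specify exactly how the host awards points.

    expert_map = dict(brain)  # stand-in for the hidden, pre-programmed expert map

    def play_question(ta_brain, cause, effect, change, wager):
        # The TA answers from the map it was taught; the host checks the answer
        # against the expert map, and the student wins or loses the wager.
        ta_answer = ask(ta_brain, cause, effect, change)
        correct = ask(expert_map, cause, effect, change)
        return wager if ta_answer == correct else -wager

    # A 300-point wager on "If 'heat release' decreases,
    # what happens to 'body temperature'?"
    points = play_question(brain, "heat release", "body temperature", -1, wager=300)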

Fig. 3

Triple-A-Challenge Gameshow. (a) Students log on from home or school. (b) They customize the look of their individual TAs and give them names. (c) They teach their TAs. (d) Students can chat, see their progress, and find other students who want to play a game. (e) Students can play in a game show, where a host asks questions, and they wager on whether their TAs will answer correctly

The Gameshow was developed to make homework more interactive, social, and fun. In one study, Schwartz et al. (2009) found high levels of homework compliance when students used the Gameshow with TAs, and the Gameshow prepared students to learn related content in class over the next few days. In the current studies, the Gameshow was not used for homework, but was used in the classroom in Study 1, and for individual sessions in Study 2. In both studies, the manipulation was whether the character in the software represented a TA, or whether the character was an Avatar that represented the student. In the TA condition, the TAs answered the host’s questions while students wagered on their protégés. In the Avatar condition, the students answered the host’s questions and wagered on themselves.

Our predictions were simple. Students in both conditions would be engaged by the novelty of the technologies, especially in the context of school. However, the TA would yield a specific type of engagement. Students would be more motivated to learn for their protégés than for themselves. Specifically, they would spend more time reading and revising their knowledge. Furthermore, this motivation would be partially driven by the “make-believe” that their TAs have thoughts and feelings and by the sense of responsibility students would develop towards their digital pupils.

Study 1: The Protégé Effect

One of the interesting benefits of new technologies is that they permit “clean tests” that are hard to match in the physical world. For example, most research that claims to have demonstrated a benefit of social interaction for learning has been confounded by the many differences between a social and non-social interaction (e.g., Kuhl et al. 2003; Moreno et al. 2001). Demonstrating that an individual learns more by working in a group than working alone, for instance, may be attributed to the increase in information exchange and not to the fact that the individual was in a social exchange. Chi et al. (2008), recognizing this distinction, proposed that learning from social interaction may be due to the same processes involved in self-explanation (e.g., elaborating on a topic by explaining to oneself).

New technologies provide fresh possibilities for untangling these matters (Blascovich et al. 2002). For example, Okita et al. (2007) had adults interact with a graphical character in immersive virtual reality. The participants and the character discussed the biological mechanisms that sustain a fever. The interactions were covertly scripted so that each participant said and heard the same things at the same times. The experimental manipulation was simply whether the participants were told that the character was a computer agent or that the character represented a person in another room (in reality, it was always a computer program). When participants thought the character was the avatar of another person, they learned more about fever mechanisms and were able to apply their learning to new situations. They also showed higher levels of arousal as measured by skin conductance, and this arousal was correlated with how well they had learned. Even though all the information and behaviors were held constant, the mere belief in a social interaction led to better learning. More recent research (Chen et al. 2009) suggests that believing an experience is social activates the brain’s reward circuitry, which helps to cement the learning of new associations (e.g., Davachi et al. 2003).

The current study also adopts a “mere belief” manipulation. In the Okita et al. study, social was operationalized as interacting with another person versus interacting with a computer. In the current studies, social is operationalized as other versus self, or to be more precise, protégé versus self. On the first day of the study, the sole difference between conditions was whether students thought they were teaching their TAs or making concept maps for their own learning. Ideally, the results of this clean comparison will illuminate some of the mechanisms that underlie the benefits of learning-by-teaching more generally (e.g., Renkl 1995) and not just those found in this particular technology environment.

The study was designed to examine whether students would put forth greater effort to learn for their TAs than for themselves. In addition to the direct comparison of treatments, a second question was whether the TA treatment would have positive effects for lower achieving students. In prior implementations of the Teachable Agent software, teachers reported that their lower achieving students seemed to benefit especially from the Teachable Agents. It is conceivable that TAs may protect the students from being wrong themselves (it was their TAs, and not the students, who got it wrong). Moreover, the TA provides a new way to learn. Students who have not had much success with traditional approaches may find this a welcome change. In either case, it is important to gather direct evidence regarding the teachers’ observations.

In this study, 8th-grade students used Betty’s Brain over two 50-min class periods. During this time, they learned how to use the software, read about fever mechanisms, created and tested their concept maps, chatted with each other online, and played the Gameshow. Figure 4 is a screenshot of the expert map that was used by the software to judge the TA’s or student’s knowledge (depending on condition). Students did not see this map. It is included here to show the complex interrelationships represented in the content. To learn about the mechanisms of a fever, students could access a one-page reading document through the Gameshow environment (see “Appendix” for the fever passage).

Fig. 4

Expert map of the fever passage. Students received the same nodes as in the expert map, but the links were removed and the nodes were not neatly organized. The expert map was used to generate questions for the quizzes and Gameshow and to check the accuracy of answers

Figure 5 shows the time course of the study. The key points of difference between conditions are underlined. In both conditions, students used Betty’s Brain to create concept maps. In the TA condition, the characters represented the students’ pupils, and students were told they were making and testing concept maps to help their protégés learn. In the Avatar condition, the characters represented the students themselves, and they were told to use the concept mapping activities to help themselves learn. In either case, the software was intelligent and could answer questions based on the maps the students had created. For example, students in either condition could submit their maps to a quiz feature that scored the maps on a set of questions. The difference on Day 1 was only in the cover story, and students in the TA condition did not know their TAs would be playing in a Gameshow. On Day 2, the manipulation was less subtle. All students played the Gameshow. Students in the Avatar condition answered questions for themselves, while students in the TA condition watched their TAs answer the Gameshow questions.

Fig. 5

Overview of Study 1. The underlined elements indicate experimental differences between treatments

Methods

Participants. Sixty-two 8th-graders, drawn evenly from four different classes, participated in the study. The children attended a diverse San Francisco Bay Area middle school, composed of 35% Asian, 25% Hispanic, 22% Filipino, 11% White, and 4% African–American students. Thirty-seven percent of the students qualified for free or reduced-price lunch programs. All students had the same 8th-grade science teacher. Each class was split in half, and the halves were assigned intact to treatment: one half from each of two classes completed the Avatar condition, and one half from each of the other two classes completed the TA condition (the remaining class halves completed an entirely different study). Stratified random sampling of the children from each class ensured that pre-existing achievement scores were the same across the two conditions (M Avatar = 78.5, SD = 6.5; M TA = 78.2, SD = 8.5). Achievement was based on the cumulative score the children had earned over the prior 8 months in science class. Nevertheless, issues of intact assignment need to be kept in mind when attempting to generalize the results.

Design and Procedures. There were two conditions: TA and Avatar. In the TA condition, the graphical characters represented the students’ protégés; students used the mapping software to teach about fever mechanisms; students answered the questions themselves in the Gameshow on Day 1; and on Day 2, students’ TAs answered the questions. In the Avatar condition, the graphical characters represented the students; students used the mapping software to learn about fever mechanisms themselves; and students themselves answered the Gameshow questions on both days. Since two to four students (or TAs) played against each other at once, there were up to nine different games going at the same time within a class.

On Day 1, all students logged on to the Triple-A Gameshow system. They learned to customize their TAs, chat, access reading resources, create causal maps, ask questions of the maps, and use the quiz feature. Students received the relevant fever nodes, and their task was to link them up using the reading passage as a guide. The manipulation was given in the instructions and framing of the concept mapping software: students were making concept maps either for their own learning or to teach their TAs. The last 10 min were devoted to showing students the Gameshow: how to join a game, wager, and answer questions. At this time, all students played the game in self-answering mode.

On Day 2, all students logged on to play a preliminary game. Students in the Avatar condition continued to answer the Gameshow questions themselves. However, unlike the day before, students in the TA condition now had the questions answered for them by their TAs. After this preliminary round, all students received a brief tutorial on “best practices” for making a map, followed by 8 min of map revision time (during which they could also chat, read, and so forth). Each student then played the Gameshow against one other opponent. Afterwards, the class was given free time to prepare for and/or continue play in the Gameshow. On Day 3, all students completed a paper and pencil posttest on the mechanisms of fever.

Measures and Coding. The study included three sources of data. One was the computer-generated logging data that indicated how students used their time with the software. A second source of data was the quality of the concept maps. At the end of each day after the students were gone, each map was evaluated using automated scoring as described in the “Results” section. The final data source was the posttest, which had three levels of questions: factual, integration, and application (see “Appendix”). Factual questions asked about facts that were stated explicitly in the passage. Integration questions required integration of information across the passage. Application questions required applying the fever mechanisms to situations not discussed in the passage. Each question was scored on a 0–2 point scale for incorrect, partially correct, and fully correct answers. Two independent coders scored a minimum of 30% of the data for each question. Reliability ranged from 95 to 100% for all questions. A single coder then scored the remaining data.

Results

When students worked with the software, they could complete a number of different activities that ranged from chatting to reading to game playing. A fairly prototypical sequence of activities for the first day comes from John Doe in the TA condition. John spent the first 8 min customizing his agent and chatting with other students online. He then read the science passage for 3 min. He spent the next 9 min alternating between connecting the nodes in the agent’s map and referring to the reading passage. After having made headway with his agent’s map, John spent a minute formulating a question from the drop-down menus and then observed his agent’s answer. He gave his agent one of the pre-made quizzes and edited the map based on the feedback for two more minutes. For the following 9 min, he alternated between reading the passage, formulating and asking his agent questions, and editing the map based on the reading and the feedback. In the next 4 min he chatted online while looking for other students to play with in the Gameshow. He then played the Gameshow and chatted for the remaining time.

Other students followed similar patterns of moving between different activities. Some of the activities were directly relevant to learning, such as reading the passage, creating the map, formulating questions, and seeking feedback and revising. Other activities were less directly relevant to learning, for example, chatting, customizing the look of the character, and playing the game. The differences between the two conditions appeared in the relative distributions of activities that were directed towards learning and those that were not. The following sections describe the differences in activity distributions, and the evidence that students in the TA condition learned more.

Effort Towards Learning. Students in the TA condition showed greater effort towards learning. Figure 6 shows how students spent their time in the software. The key difference is the greater time the TA condition spent on learning activities (working on the map or reading the passage). A repeated measures analysis crossed the factors of Day and Condition using proportion of time spent on learning activities as the dependent measure. There was a main effect for Day, with students making greater effort to learn on Day 1, F(1, 59) = 431.7, MSE = 0.008, p < 0.001. More importantly, there was a main effect of Condition, with TA students spending a greater proportion of their time learning, F(1, 59) = 21.9, MSE = 0.015, p < 0.001. There were no interactions. So despite the attractions of chatting and playing, the TA students chose to spend more time learning for their TAs.

Fig. 6

How students used their time when logged on

Table 1 shows the average number of times that students engaged in different learning activities (excluding reading, which is treated below). Map Edits refers to adding, deleting, or changing a link in the concept map. Quizzes refers to how many times students submitted their maps to get scored against a set of questions. Asks refers to how often students asked their maps to answer questions they posed. Explains refers to how often students asked their maps to trace out an answer in more detail. These variables were entered in a multivariate analysis with Condition as a between-subjects variable and Day as a within-subjects variable. Both Day, F(4, 56) = 26.6, p < 0.0001 and Condition, F(4, 56) = 2.7, p < 0.05 showed significant main effects. Looking at specific activities, the number of map edits and quizzes were significantly greater for the TA condition, p’s < 0.01. Thus, students in the TA condition spent more time working on the concept maps and checking those maps with a quiz. It is also worth noting that students in the Avatar condition took advantage of the intelligence of the system by using the quiz, ask, and explain features. Both conditions made use of the same interactive affordances; the TA students simply used them more.

Table 1 Frequency of different learning activities (and SE of means)

The TA students’ extra effort towards learning was not confined to working on the map, which might be expected on Day 2 because performance in the Gameshow was contingent on the map in the TA condition. The TA students also spent nearly twice as much time studying the fever passage. Figure 7 shows the time spent reading the passage. A repeated measures analysis used Day as a within-subjects factor and Condition as a between-subjects factor with Reading Time as the dependent measure. Students in the TA condition read longer, F(1, 59) = 10.9, MSE = 17.5, p < 0.005. Students in both conditions read more on Day 1, F(1, 59) = 213.1, MSE = 9.8, p < 0.001. There was also an interaction, F(1, 59) = 9.2, p < 0.005, which indicates that the TA students showed the greatest reading difference on Day 1, even before they knew there was a performance venue for their TAs (i.e., the Gameshow). The mere belief of teaching a TA led to greater effort towards learning.

Fig. 7

Reading times by day and condition

Effects on Learning. Given the extra effort towards learning, the next question is whether it led to better learning, as measured by the posttest. Based on prior research (Schwartz et al. 2009), we did not expect differences on the basic fact questions. Rather, differences, if any, would show up on the harder integration and applications questions that required reasoning through causal chains. A second question was whether there would be a condition by prior achievement interaction. To get the most precise data possible, we removed five students who did not complete the full implementation. One student was not present on all 3 days of the study. Four students did not complete any questions on the posttest (fortuitously, they were distributed equally across condition and achievement level).

A repeated measures analysis crossed Question Type with Condition, and used prior Achievement as a covariate crossed with the other two factors. There was a Condition by Question Type interaction with the largest TA advantage on the harder problems, F(2, 102) = 3.8, MSE = 0.5, p < 0.05. There was also an Achievement by Condition by Question Type interaction, F(2, 102) = 4.2, p < 0.05. Figure 8 shows the average scores on each of the Question Types by Condition. The figure depicts Achievement as a binary high/low variable (split at the median of all students), rather than as the continuous variable used in the statistical analyses. One way to interpret the complex interaction is to compare the low-achieving TA students with the high-achieving Avatar students. As the questions become more complex, going left to right, the low-achieving TA students catch up with the high-achieving Avatar students.

Fig. 8

Posttest scores separated by question type, condition, and achievement level

In-game Correlates of Achievement Effects on Learning. Given the positive effects of the TA condition for the low-achieving students, we examined the log files to see if there was an identifiable activity that contributed to the effect. A multivariate analysis used Condition, Day, and Achievement (high-low on a median split) as crossed factors with the frequencies of the various learning activities as the dependent measures. The only variable to exhibit a significant Condition by Achievement interaction was the number of map edits; F(1, 56) = 5.3, MSE = 52.8, p < 0.05. Figure 9 shows that the low-achieving TA students took advantage of the map editing feature much more than the low-achieving Avatar students. They were working harder to get their maps just right.

Fig. 9

Average number of map edits separated by achievement level, day, and condition

One potential concern is that the low-achieving students in the TA condition may have just been rapidly adding and deleting links in a trial-and-error fashion rather than in a thoughtful way. An analysis of the students’ concept maps indicates this was not the case. The maps were scored automatically against the expert map. Figure 2b, which shows the All-Possible-Questions (APQ) matrix, helps clarify how the scoring was completed. The APQ matrix indicates the agent’s accuracy on all possible questions of the form, “If X increases what happens to Y?” where X and Y are nodes from the expert map. From this matrix, we derived an APQ index, which is the percentage of correct answers to questions that relate two nodes with a traceable path in between. The APQ index naturally weights more central nodes in the concept map because they are involved in more questions.
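
As a concrete illustration, the sketch below computes an APQ-style index over the same map format used in the earlier sketches, again reusing ask(); it is our approximation of the scoring, and the actual automated scoring in Betty’s Brain may differ in detail.

    from itertools import permutations

    def apq_index(ta_brain, expert_brain):
        # Ask every ordered pair of expert-map concepts: "If X increases,
        # what happens to Y?" Pairs with no traceable path in the expert map
        # are skipped; the index is the percentage the TA's map answers correctly.
        nodes = {n for link in expert_brain for n in link}
        asked = correct = 0
        for x, y in permutations(nodes, 2):
            expert_answer = ask(expert_brain, x, y, +1)
            if expert_answer is None:
                continue
            asked += 1
            if ask(ta_brain, x, y, +1) == expert_answer:
                correct += 1
        return 100.0 * correct / asked if asked else 0.0

Because a central concept participates in more node pairs, errors on its links lower the index more, which is the weighting property noted above.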

Compared to other measures of system use, the APQ index was the best correlate of posttest performance; the Day 1 APQ index correlated with posttest score at r = 0.46, and the Day 2 index at r = 0.37, p’s < 0.01. The most telling data compare the APQs for the low-achieving students from the two conditions, as shown in the right-hand panel of Fig. 10. Compared to the maps of the low-achieving students in the Avatar condition, the maps of the low-achieving students in the TA condition were twice as good on Day 1 (M TA = 18.3, M Avatar = 9.5), and three times as good on Day 2 (M TA = 28.5, M Avatar = 9.9). This indicates that the low-achieving students in the TA condition were not just changing their maps arbitrarily. Rather, they were putting in the effort to make their maps better, and they were succeeding.

Fig. 10

APQ index scores separated by achievement level, day, and condition

Discussion

Study 1 was designed to determine whether students would make greater effort towards learning for their TAs than they would for themselves. On the first day, the TA students were told they were instructing their Teachable Agents, whereas the Avatar students were told they were making concept maps to help themselves learn. They used identical software, and the only difference was their belief state. The differences in the effort towards learning on the first day testify to the power of protégés to influence learning behaviors. Students had attractive alternatives to reading and map editing, namely, the opportunity to chat online and play a game with other students. Furthermore, on Day 1, performance in the Gameshow was not contingent on the maps for either condition. Nevertheless, students in the TA condition spent more time editing their maps and quizzing them, and they spent nearly twice as long reading the fever passage as students in the Avatar condition. Avatar students, by contrast, spent proportionately more time using the chat feature and playing the Gameshow.

On the second day, students in the TA condition saw their TAs play in the Gameshow, whereas students in the Avatar condition played the game themselves. Again, the TA students spent more time working on their maps, as would be expected, because their TAs had to have accurate maps to do well in the Gameshow. Interestingly, this was especially true for the low-achieving students in the TA condition, who spent much more time improving their maps than the low-achieving students in the Avatar condition. These differences led to relative gains in learning as measured by the posttest. Students in the TA condition did better on the harder questions, and this was especially noticeable for the low-achieving students. On the hard application questions, they performed as well as the high-achieving Avatar students.

It is useful to note that the motivational differences between the conditions should not be attributed to students having “more fun”. Students in both conditions enjoyed chatting and playing the Gameshow, and it is hard to imagine that reading would be more “fun” in this context. On a set of moment-to-moment measures of engagement, not reported here for the sake of simplicity, there were few reliable differences between the conditions. Rather, students in the TA condition were motivated to put greater effort towards learning. This seems like a useful motivational target for designers of educational games, where students often just want to play.

An important question is how to sustain these motivational benefits for months and not just days. One can imagine that the fiction of teaching an agent might lose its luster, and students could stop working so hard to learn on its behalf. One way to address this question is to imagine what would happen if protégés were put into games that included several motivational elements such as rich narratives, clear goals, and incremental challenges. We hypothesize that these motivators would spill over to help sustain the teaching metaphor. For example, students would be energized to learn so they could help their protégés advance to the next level in the game, perhaps even more so than if they were playing only for themselves.

Study 2: Psychological Concomitants of the Protégé Effect

Study 1 demonstrated the protégé effect: students put forth greater effort to learn for their TAs than for themselves. However, the behavioral and learning data collected in Study 1 do not shed light on the underlying mechanisms of this effect. To uncover possible causes, participants in Study 2 were asked to think aloud, externalizing their thoughts and emotions, while they worked with either a TA or an avatar. These data begin to uncover the psychological machinery behind the TA students’ increased motivation to learn.

Study 2 had a design similar to that of the first study; half of the students were in a TA condition and half were in an Avatar condition. 5th-grade children received the same fever passage and an identical set of nodes to connect within the concept map. Students were videotaped as they worked for ~1 h. The children were encouraged to think aloud, and their protocols were transcribed and coded. Analysis of the data focused on three primary questions. The first question was whether there would be a replication of the protégé effect, where students make greater effort to learn for their TAs than for themselves. The other two questions focused on the psychological mechanisms behind the protégé effect.

The first psychological question was whether the students would treat their TAs as independent, sentient beings. For example, would they talk about their TAs’ thoughts? Would they distribute responsibility for performance in the Gameshow across themselves and their TAs? If so, this would indicate that students treated the TA as a protégé, because its behavior was partly of their own making but partly independent. This could create a sense of responsibility that would lead students to try harder for their TAs than for themselves. For example, previous research with Betty’s Brain documented anecdotal evidence of students feeling responsibility towards their TAs (Biswas et al. 2005).

The second psychological question was how students would respond to failure with the TA as a mediator. This was especially relevant to the positive effects found for the low-achieving students in the preceding study. In a performance situation, students with self-perceived low ability often avoid difficult learning tasks or give up quickly, because they are afraid of failure (Elliot and Dweck 1988). More generally, sustained experiences of personal failure may lead students to opt out, losing interest altogether for certain learning activities. The TA, however, creates a situation in which responsibility for failure is distributed across teacher and pupil. Instead of blaming their own knowledge and abilities, students may fault their TA or their poor teaching. This may allow them to both acknowledge failures and address them by working harder to learn.

Methods

Participants. Twenty-four 5th-grade students from a high-performing private school participated (10 male and 14 female). Students were predominantly Caucasian and Asian American and came from a common high-achievement profile, as determined by the school. Because this population of students is younger yet higher achieving than in the prior study, attempts to generalize findings across studies must be made with caution.

Design and Procedure. The TA and Avatar conditions were similar to those of Day 2 in Study 1. Students in the TA group taught their TAs and watched them answer Gameshow questions. Students in the Avatar condition learned on their own and answered Gameshow questions themselves. Unlike in the prior study, Betty’s reasoning was turned off for the Avatar students. They were simply using graphical tools to make concept maps, while TA students were able to ask the TAs questions and view their reasoning. Also unlike the prior study, the children in both conditions played the Gameshow alone; other children were not logged on at the same time. Dependent measures included verbalizations made during the Gameshow, time spent on learning-relevant behaviors, and scores on an oral posttest of fever mechanisms.

Before beginning the protocol study, students received software training in a 45-min classroom session. During the session, students personalized characters that would represent themselves or their TAs, depending on condition. Students then watched the experimenter give a demonstration of the software. The experimenter interacted with the whole class to build a practice map projected at the front of the room.

One to three weeks later, each participant completed an individual, 60-min session. Three researchers, each trained in the research protocol, ran the sessions, randomly switching between conditions. All sessions were videotaped for later analysis. Each session had four phases: Prepare, Play, Revise, and Posttest.

In the Prepare phase, students first read the fever passage aloud. Each student then used the software to construct a concept map of fever mechanisms. Students in the TA condition were told, “Teach your agent the best you can by making this concept map”, while students in the Avatar condition were told, “Learn the best you can by making this concept map”. Participants could spend as much time as they wanted building their concept maps or looking back at the passage, and this time was recorded.

During the Play phase, students first practiced doing a think-aloud while playing Sudoku. Students then played the Gameshow while thinking aloud. There were a total of six Gameshow questions, which varied in difficulty. Every participant saw the same six questions, and the system provided feedback on answer accuracy. If students were silent for ten seconds, they were prompted with “What are you thinking now?” If students were not verbalizing at all, they were prompted with the following questions: (1) What is the answer and why? (2) Why wager that amount? (3) Will the answer be right or wrong? (4) Why is the answer wrong?

In the Revise phase, students were told they would soon play a more difficult round, and if they chose, they could prepare by reviewing the feedback from the Gameshow, re-reading the passage, and/or working on the concept map. Students received as much time as they wanted to prepare for the second round, except for one student, who spent so much time in the Prepare phase that there was no time for revision (although she wanted to revise).

In the Posttest phase, students were told, “We have run out of time to play Round Two. I’d like to ask you a few questions before sending you back to class”. Students answered nine questions orally (see “Appendix”). Similar to Study 1, posttest questions were scored on a scale of 0–2.

Results

The TA condition replicated the central finding of Study 1: students put forth more effort towards learning. The new findings come from the protocol analyses. Students in the TA condition treated their agents as sentient and partly responsible for getting an answer right or wrong. The TA students were also much more likely to acknowledge when an answer was wrong by exhibiting negative affect and making attributions. The following analyses, which also include samples of student dialog, detail these findings and suggest several ways that teachable agents lead students to put greater effort towards learning.

Efforts Towards Learning. The TA students demonstrated greater effort towards learning as measured by their combined reading and map editing times. A repeated measures analysis crossed the factors of Occasion (Prepare vs. Revise) and Condition with combined reading and map editing times as the dependent measure. There was a main effect for Occasion, with students spending more time in preparation than revision, F(1, 21) = 17.2, p < 0.001. There was also a main effect for Condition, with TA students spending more time overall, F(1, 21) = 25.1, p < 0.001. The interaction of Occasion by Condition was not significant, but descriptively, Fig. 11 shows the advantage for the TA group was greatest during the Revise period. Only 64% of Avatar students chose to revise at all, compared with 100% of TA students. Even if the analysis only includes the Avatar students who did choose to revise, the TA students persisted three times longer during the Revise phase, t(16) = 4.88, p < 0.001 (M TA = 8.6 min, SD = 3.2; M Avatar = 2.5 min, SD = 1.8). As in Study 1, students in the TA condition were more likely to choose to refine their understanding, and they spent more time doing so.

Fig. 11

Time spent on learning activities during Prepare and Revise phases

These differences in learning behaviors, however, did not translate into differences in learning outcomes. The posttest scores did not significantly differ by condition (per question on a 0–2 scale, M Avatar = 0.95, SD = 0.39; M TA = 0.85, SD = 0.39). Given the short duration of the treatment and the relative complexity of the materials (which had been designed for 8th graders), this finding was not surprising.

Coding of Protocol Data. The verbal record provides some insight into the protégé effect and why TA students were motivated to make greater effort towards learning. Verbal protocols taken during Gameshow play were transcribed and coded at the statement level. A statement was defined as any phrase or series of phrases that expressed a single sentiment or thought. Statements were first classified into three major categories: mental attributions, responsibility attributions, and affective statements.

Mental attributions were defined as statements that assigned credit for thoughts or mental actions to an entity. These statements were further coded as attributing credit to the self (“I don’t understand”), the TA (“He knows it!”), or some combination of both (“I know she knows this one”).

Responsibility attributions assigned credit for successes or failures on questions in the Gameshow (i.e., getting a question right or wrong). Like mental attributions, responsibility attributions were classified as crediting the self (“Yeah! I did it!”), the TA (“Thanks a lot, Queenworld”), or both (“That’s one of the things I didn’t teach her”). They were further subdivided into whether the attribution assigned credit for a failure (“I didn’t know that one”) or a success (“We did it!”).

Affective statements were expressions of students’ emotions. They were coded as positive or negative. Positive statements expressed enjoyment, excitement, hope, or relief (e.g., “Cool!”, “This is fun”, or “Now I’m kind of relieved”). Negative statements expressed anger, annoyance, pity, or sadness (e.g., “Poor Diokiki”, “I’m not a good teacher”, or “Oh shoot!”). Affective statements were also categorized by whether they occurred in response to success or failure in the Gameshow.

Using a subset of the transcripts (30%), two researchers applied the codes (one blind to the hypotheses). Inter-rater reliability ranged from 77 to 100%, with an average agreement rating of 90% across coding categories. A primary researcher coded the remaining transcripts. The results were tallied into three scores so that each student had a mean number of mental attributions, responsibility attributions, and affective statements per Gameshow question.

Attributions Towards the TAs. These data demonstrate that students saw the TA’s performance as a reflection of their own knowledge but also viewed the TA as a separate entity that had thoughts of its own. One-fifth of the TA students’ mental attributions were made exclusively towards the TA, suggesting that they gave the TA credit for having its own knowledge (“He totally knows this one”) and reasoning skills (“He could probably figure it out”). One-fourth of the TA students’ mental attributions were made towards a combination of student and TA (e.g., “I, err… he didn’t know it”), as if students were confused about who was doing the thinking—themselves or their digital pupils. Finally, 55% of TA students’ mental attributions were self-directed compared with 100% in the Avatar group (see the left side of Table 2). Students in the Avatar condition did not perceive their Avatars as sentient, and therefore made all attributions to themselves.

Table 2 Mean number of attributional statements per question (with SE of means)

In addition to mental attributions, students also attributed responsibility to the TA for Gameshow outcomes (both successes and failures). The right side of Table 2 shows that the TA students apportioned responsibility equally across themselves (“I got it right”), the TA (“He got it wrong”), and some combination of both (“We did it!”). To some extent, students treated the TA as a separate entity with social status, while the combined attributions of self and TA indicate they considered the TA a protégé (part self, part other). Again, the Avatar students made only self-attributions.

Response to Failure. Students from the two conditions demonstrated strikingly different affective and attributional profiles in response to an incorrect answer in the Gameshow. Table 3 shows that on average, TA students displayed more negative emotion in response to failure. Sixty percent of the TA students made at least one statement of negative affect after failure, compared to only 7% of Avatar students. Table 3 also shows that TA students were not simply more emotive or less positive. What differentiated these two groups’ affective profiles was their negative emotional response to failure.

Table 3 Mean number of affective statements per success or failure (with SE of means)

In addition to the difference in emotions expressed after failure, students in the TA condition were more likely to assign responsibility for a failure. Table 4 shows that students in the TA group made far more responsibility attributions per failed question than Avatar students. In addition, every TA student made at least one attributional statement in response to failure, compared with only 64% of Avatar students. TA students tended to distribute the blame for failure evenly amongst themselves (“I didn’t know that one”), their TAs (“He got it wrong”), or both (“We’re gonna lose this one” or “I guess I didn’t teach him that”). Avatar students, on the other hand, had no one to blame but themselves. In comparison to TA students, Avatar students made hardly any attributions after failure. However, in response to success, Avatar and TA students made similar numbers of attributions.

Table 4 Mean number of responsibility attributions per success or failure (with SE of means)

Discussion

As in the first study with 8th-grade students, this study found that 5th-grade students who worked with TAs spent more time on learning activities. During the initial preparation phase, they spent more time reading and constructing their maps. After playing the Gameshow, more TA students chose to revise, and they spent more time revising. This was expected, since the only way for a TA student to improve in the Gameshow was to edit the map. Study 1 provided evidence that simply believing one was teaching a TA led to greater effort towards learning, even without the incentive of the Gameshow. The main purpose of Study 2 was to gather students’ thoughts to examine possible mechanisms behind the increased learning effort.

Verbal protocols revealed that students acted as though the TA were a sentient, semi-independent being who engaged in mental activity and deserved partial credit for outcomes in the Gameshow. TA students indicated this by distributing and comingling mental and responsibility attributions between themselves and their TAs. One student even named his TA “Echo”, illustrating the symbolic role of the TA as protégé. Students viewed the TA as a social being that was partly them and partly another.

TA students acknowledged failure more often than the Avatar students, making more attributions for failure and expressing more negative affect. While TA students sometimes articulated frustration with their TAs (“Ughhh! Why does he keep saying large increase!?”), they more often expressed sympathy (“Poor Diokiki… I’m sorry, Diokiki”). Often, these sympathetic statements were followed by statements of intent to help their TAs perform better, as in the case of one student who said, “Whoa, I really need to teach him more”. From this dialog, one gets the sense that students felt responsible for their TAs’ performance in the Gameshow, because the TAs were enacting their teachings. At the same time, the students did not have to accept all the blame. TA students apportioned responsibility for failure across themselves, their TAs, and some combination of both (often in reference to poor teaching). The General Discussion considers how these factors may contribute to the increased effort towards learning.

General Discussion

Two studies demonstrated the existence of a protégé effect: students are more willing to make the effort towards learning on behalf of a computerized protégé than for themselves. The first study, which used a classroom-level intervention, revealed that students who taught TAs spent more time on learning behaviors and ultimately learned more than students who learned for themselves. The protégé effect was particularly beneficial for low-achieving students who, through increased effort, developed an understanding of the complex biology content that was on par with that of the high-achieving students who did not use TAs.

The second study, which gathered individual verbal protocols, also found that students spent more time engaging in learning activities for their TAs than for themselves. The verbal data provided possible reasons for the students’ greater effort towards learning. For these students, the TA existed in a middle ground between avatar and agent. Like an agent, the TA was treated as an independent, social being, credited with cognitive states and held responsible for the quality of its answers. And like an avatar, the TA was viewed as a reflection of the students themselves. The students did not simply treat the TA as computer software they had programmed. In fact, TA students were particularly attentive and emotionally responsive when their protégés failed, and they often expressed regret that they had not taught their TAs well enough.

By occupying the unique social position of part self, part other, the TA incited motivation to work harder to learn. This type of motivation is unusual in computer environments, because it removes students from the very thing that is motivating them; students leave their TAs in order to read. The protégé effect can be contrasted with common motivational features added to computer environments, like earning points towards some quantitative goal or engaging in fantasy contexts, which keep the student at the computer terminal longer. In these latter cases, learning is a side effect of sustained engagement. With the TA, the students were motivated to learn per se—so much so that they chose learning activities over attractive and novel alternatives like chatting and playing games.

Three factors may contribute to the protégé effect: an ego-protective buffer, the adoption of an incrementalist theory of TA intelligence, and a sense of responsibility. In broad strokes, the students’ egos are spared enough that they can acknowledge failure; they know there is a clear way to ameliorate the failure by teaching better; and they are inspired to do so because they feel they owe it to their TAs.

A protégé offers an ego-protective buffer (EPB). The EPB shields students from forming negative beliefs about themselves, because the blame for failure can reside elsewhere. For instance, when a TA is failing, it can absorb part of the blame. Moreover, the TA’s failure can be attributed to poor teaching, which also deflects the blame away from an “internal” property of the student. Failure attributions that identify poor teaching as the source of the error also occur in human-human teaching. Ross et al. (1974) and Ames (1975) found that professional and non-professional teachers instructing human students attributed failures to their own teaching. Without the EPB provided by the TA, learners have only themselves to blame and may be more likely to fault their own intellects.

The EPB helps students acknowledge the need for revision, but to take action, students must also believe that revision will be fruitful. Dweck’s (2000) theory of incremental versus entity beliefs about intelligence is relevant here. Individuals who hold an entity theory believe their intellectual ability is fixed and unchangeable. Incrementalists, on the other hand, believe that intelligence is malleable and fluid; to them, intelligence is more like knowledge than an innate ability. According to Dweck, students with an incremental theory put greater effort towards learning because they believe their efforts can change their intellectual abilities.

In the case of the TA, children appear to become incremental theorists about their TAs’ abilities. With the TA, it is obvious how to make incremental progress—teach better by getting the links and nodes of the concept map right. TA students are more willing to put in effort because they believe it can improve their TAs. For students who learn for themselves, there is no transparent mechanism that links a specific learning behavior to improved performance (especially for 5th graders, who may not have the metacognitive wherewithal to strategically improve their understanding). In other words, TA students know how to enhance their TAs’ knowledge, while Avatar students may not believe it is possible to change their own intelligence (or may not know how to do so). This difference may have been especially significant for the low-achieving students in the first study. On Day 2, the low-achieving TA students made many edits to their concept maps, whereas the low-achieving Avatar students made almost none. Similarly, Dweck (2000) has found that both low- and high-achieving incrementalists persist through challenging tasks by adopting high-quality learning behaviors, while low-achieving entity theorists tend to adopt self-sabotaging behaviors that signify a state of learned helplessness.
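
To illustrate why the path from teaching to performance is so transparent, consider a minimal sketch of the kind of qualitative reasoning a TA can perform over a taught concept map. The map contents, names, and sign-chaining rule below are illustrative assumptions chosen for exposition, not the actual implementation of our system:

```python
# Illustrative sketch only: a TA's knowledge represented as a causal
# concept map, where each directed link says an increase in one concept
# causes an increase ("+") or decrease ("-") in another. All concept
# names here are hypothetical examples.

concept_map = {
    "algae": [("oxygen", "+"), ("carbon dioxide", "-")],
    "oxygen": [("fish", "+")],
}

def predict(cause, effect, links):
    """Answer 'if CAUSE increases, what happens to EFFECT?' by chaining
    link signs along a path; two "-" links cancel to a "+"."""
    def search(node, sign, visited):
        if node == effect:
            return sign
        for target, edge_sign in links.get(node, []):
            if target not in visited:
                result = search(target,
                                "+" if sign == edge_sign else "-",
                                visited | {target})
                if result is not None:
                    return result
        return None  # no causal path: the TA cannot answer the question

    return search(cause, "+", {cause})

# "If algae increase, what happens to fish?" -> "+", because the TA
# chains algae -> oxygen ("+") and oxygen -> fish ("+").
print(predict("algae", "fish", concept_map))
```

Because every answer the TA gives is traceable to the nodes and links the student drew, a wrong answer points directly at a missing or mislabeled link, so the remedial action (edit the map) is immediately apparent in a way that "improving one's own intelligence" is not.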

The third factor in the protégé effect is a sense of responsibility, which can help explain why the TA students spent more time on learning activities before they received any success or failure feedback. The verbal data in Study 2 suggest that students felt responsible for their TAs’ learning. Just as parents nurture and care for their children, and coaches invest time in their players, students did the same for their TAs. Recall the students who said, “Whoa, I really need to teach him more”, and “Poor Diokiki… I’m sorry, Diokiki”. This sense of responsibility may have propelled students to persist and revise, which could explain the TA students’ greater reading and revision times.

Together, these three factors tell a distinctly social story, even though the children were interacting with a computer program. The social motivations provoked by the TA were strong enough that students wanted to learn, even more than they wanted to chat with other students online. This demonstrates the potential power of sociable technologies for learning. The EPB, incrementalist, and responsibility explanations require further research to establish their validity, but a key aspect of all three accounts is that students treat their TA as a protégé—a separate but dependent “other” with social and sentient attributes.

To further isolate the significance of the social relationship, one possible study design would replace the Avatar condition with a condition in which students are told to write a computer program. This would help distinguish the role of general production (programming) from that of social production (teaching). Given our hypothesis that the protégé effect is due to social motivations, we would expect students in the programming condition to be less inclined to acknowledge errors, more inclined to think the errors reflect their own intelligence, and less inclined to feel a sense of responsibility toward their computer programs. Ultimately, these students would make less effort to learn.

Conclusion

Over the next few years, we anticipate that avatars and intelligent agents will be increasingly blended. In a virtual environment, for example, a player’s character may provide feedback by disobeying when the player makes too many bad decisions (Arena et al. 2009). Or, in a simulation of classroom interactions, a user may create students with various traits and observe how they would behave as a group. TAs and other hybrid technologies such as these present innovative educational opportunities while raising new questions about learning. For instance, what kinds of social relationships besides tutor-tutee might be beneficial for learning? Just how “social” must the interaction between human and computer be to motivate learning? What are the boundaries of the term “social”? If future research addresses these questions, it may uncover new psychological phenomena that occur in the social interactions between human and computer. In turn, this research can help create a new generation of effective educational technologies filled with social intelligence.