Adding value with technology

If asked, many parents and educators would agree that incorporating technology into the curriculum is a good idea for schools. However, given the costs, there are concerns that computer technologies may fail to bring “added-value” to student learning, or worse, they may displace curricula that once provided “basic-value” (Clarke and Dede 2009). A second concern is that technologies may over-scaffold student learning, such that students do not learn to perform basic procedures on their own. Consider, for instance, the debates over whether students should be allowed to use hand-held calculators in school (Ellington 2003), and whether word-processing programs and spell-checkers have degraded writing skills (Galletta et al. 2005).

One way to differentiate whether students have benefited from versus become dependent on a technology is to examine whether they are better prepared to continue learning once the technology disappears. For example, Bransford and Schwartz (1999) proposed an approach to assessment called “preparation for future learning” (PFL). A PFL assessment examines how well students learn given subsequent instruction or informational resources. In the context of evaluating whether a learning technology has been a useful scaffold, a PFL assessment would examine students’ abilities to learn once the technology is removed. In the positive case, students who once used the technology would be more prepared to learn than students who had never used it. In the negative case, students who used the technology would not learn as well once it was removed.

In the current research, we describe a technology called Teachable Agents (TA) that was developed, in part, to add value to paper-and-pencil concept mapping by providing learners with automated feedback. We also explain the design rationale behind the TA. We then present a pair of added-value studies that included PFL assessments to see what new learning benefits TA might add. The first study compared TA with a more traditional concept mapping program. TA led to superior learning of causal relations, and it better prepared students to learn from a subsequent reading. The second study compared student learning from a well-established, kit-based science curriculum with and without the addition of TA. The teachers were free to implement TA as they chose. TA added value to instruction by improving student learning of causal relations without reducing the basic value provided by the science kits. TA also prepared students to learn more deeply from a subsequent month of instruction on a completely new topic when the students were no longer using the technology. We conclude by considering the source of this effect, and the possibilities of using PFL assessments for other technologies including software games.

Teachable Agents

Two paths to added-value

Concept maps are graphical representations of a person’s topical understanding. The maps consist of labeled nodes and links that represent a web of propositions (Novak 2002; Novak and Gowin 1984). Concept maps have proven to be a useful paper-and-pencil technology for improving knowledge retention and integration (for reviews, see Hilbert and Renkl 2008; Horton et al. 1993; Nesbit and Adesope 2006; O’Donnell et al. 2002). How might technology add value to concept mapping?

One approach is the development of productivity tools that capitalize on the computer’s capacities for editing, organizing, storing, sharing, and printing. Inspiration® is an example of a concept mapping program used widely in schools (www.inspiration.com). It contains a simple interface for structured map-making and a suite of productivity tools, including automated untangling of concept maps and the ability to incorporate images and hyperlinks for nodes.

A second approach is to further recruit the computer’s potential for generating interactive feedback for learners. We have taken this latter approach in creating Teachable Agents (TA). Students learn by teaching a computer character. The students create the concept map that is the character’s “brain,” and they receive feedback based on how well their computerized pupil can answer questions.

Interactivity with Teachable Agents

Figure 1a shows the main TA teaching interface. Students teach their agent by adding nodes and links using the “Teach” buttons. To add a concept, students click on “Teach Concept,” which produces a textbox in which they enter the name of the node. To create a link, students click on “Teach Link” and draw a line connecting two nodes. Next, the palette in Fig. 1b appears, and students use the palette to name the link. They must also specify the type of link, which can be “causal,” “type-of,” or “descriptive.” If students choose a “causal” link, they must further specify whether an increase to the first node causes an increase or decrease to the second node (e.g., landfills increase methane). In the following studies, these causal links are of particular importance, because they were the main source of feedback.
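
To make the interface description concrete, here is a minimal sketch of how a student-built concept map might be represented in code. The class and field names are illustrative assumptions for this article, not the TA system’s actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Link:
    source: str    # name of the starting concept node
    target: str    # name of the ending concept node
    label: str     # student-entered link name, e.g., "increases"
    kind: str      # "causal", "type-of", or "descriptive"
    sign: int = 0  # for causal links only: +1 (increase) or -1 (decrease)

@dataclass
class ConceptMap:
    nodes: set[str] = field(default_factory=set)
    links: list[Link] = field(default_factory=list)

    def teach_concept(self, name: str) -> None:
        self.nodes.add(name)

    def teach_link(self, source: str, target: str, label: str,
                   kind: str, sign: int = 0) -> None:
        # Teaching a link implicitly teaches any concept not yet on the map.
        self.teach_concept(source)
        self.teach_concept(target)
        self.links.append(Link(source, target, label, kind, sign))

# Example: the causal proposition "landfills increase methane"
brain = ConceptMap()
brain.teach_link("landfills", "methane", "increase", kind="causal", sign=+1)
```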

Fig. 1 The primary Teachable Agents interface. a A student has named her agent “Dee,” customized Dee’s look, and taught her about global warming. Dee has answered the question, “If ‘methane’ increases, what happens to ‘heat radiation?’” both graphically and in text. b The “Teach Link” window in which the student has taught Dee the causal proposition: ‘insulation’ decreases ‘heat radiation.’ c The “Ask” window by which the student can query Dee to test her understanding

To provide feedback and enhance the teaching metaphor, TA comes with a qualitative reasoning engine (see Forbus 1984; Jackson et al. 1998). The engine uses path traversal algorithms that enable the agent to reason through causal chains in the concept map (Biswas et al. 2005). For example, Fig. 1c shows the palette by which students can ask their agent a question. In this example, the student has asked the agent, “If ‘methane’ increases, what happens to ‘heat radiation’?” Figure 1a shows how the agent highlights successive nodes and links in the concept map to illuminate the chain of inference it uses to answer the question. In this case, the agent has reasoned that an increase in methane decreases heat radiation. It did so by following the path that methane is a type of greenhouse gas; greenhouse gas is a type of insulation; and an increase in insulation decreases heat radiation. The agent has also described this chain of inference in the lower text panel of Fig. 1a. In this manner, students can trace their agent’s thinking, both as a model of causal reasoning and as a way to see whether the agent has learned what they think they taught it.
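
As a rough illustration of this kind of path traversal, the following self-contained sketch propagates a qualitative change across type-of and causal links until it reaches the queried concept. It is a simplification of the engine described by Biswas et al. (2005); for brevity, links here are plain tuples rather than the dataclass sketched above, and the function names are our own.

```python
# Each link is (source, target, kind, sign); sign is +1 or -1 for causal links.
LINKS = [
    ("methane", "greenhouse gas", "type-of", 0),     # methane is a type of greenhouse gas
    ("greenhouse gas", "insulation", "type-of", 0),  # greenhouse gas is a type of insulation
    ("insulation", "heat radiation", "causal", -1),  # insulation decreases heat radiation
]

def ask(links, start, change, goal):
    """If `start` changes by `change` (+1 increase, -1 decrease), what happens
    to `goal`?  Returns (effect, path), or (None, None) if no chain connects them."""
    stack = [(start, change, [start])]
    seen = set()
    while stack:
        node, effect, path = stack.pop()
        if node == goal:
            return effect, path
        if node in seen:
            continue
        seen.add(node)
        for src, dst, kind, sign in links:
            if src != node:
                continue
            if kind == "type-of":
                stack.append((dst, effect, path + [dst]))         # change passes through
            elif kind == "causal":
                stack.append((dst, effect * sign, path + [dst]))  # keep or flip direction
            # descriptive links carry no change and are ignored
    return None, None

effect, path = ask(LINKS, "methane", +1, "heat radiation")
print(effect, path)  # -1 ['methane', 'greenhouse gas', 'insulation', 'heat radiation']
```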

A second source of interactive feedback compares agent answers against a hidden expert map entered by the teacher. Students can submit their agent for testing by clicking on the “Quiz” button (Fig. 1a, lower left corner). Questions in the quiz can be seeded by the teacher or generated automatically. The agent’s answers are compared to the answers produced by an expert map and students get feedback on how their agent did. The TA’s lower panel displays the list of quiz questions and indicates which ones the agent answered correctly. For incorrect answers, the system does not provide the student the correct answers, but instead gives more elaborated feedback and hints, for example, “A link or more is missing from your map. The Resources is a good place for more information.”
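
Conceptually, quiz scoring amounts to asking the student’s map and the hidden expert map the same questions and comparing their answers. Below is a minimal sketch, assuming the ask() traversal from the previous sketch and questions expressed as (concept, change, concept) triples; the hint and feedback text generation is not modeled.

```python
def quiz(student_links, expert_links, questions):
    """Compare the agent's answer with the expert map's answer for each question.
    Both maps use the (source, target, kind, sign) link format from the sketch above."""
    results = []
    for start, change, goal in questions:
        student_answer, _ = ask(student_links, start, change, goal)
        expert_answer, _ = ask(expert_links, start, change, goal)
        results.append((start, goal, student_answer == expert_answer))
    return results

# e.g., results = quiz(student_links, expert_links, [("methane", +1, "heat radiation")])
```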

The automated scoring of the concept map creates additional possibilities for feedback. Figure 2a shows the All-Possible-Questions matrix, which tests the agent on every possible question for a given map. The color-coded grid gives students quick, comprehensive feedback on how their agent is doing: green for correct answers and red for incorrect. Importantly, the yellow cells indicate where an agent gave the right answer but for the wrong reason. That is, the system detects that the agent has missing or incorrect links but still happens to give a correct answer for a particular question. Figure 2b shows the Front-of-Class software, designed to provide formative feedback for class discussion. The teacher can use this software with a projector and screen to show multiple agent maps at the front of the room and can ask all the agents the same question simultaneously. This display also uses the red, green, and yellow highlighting to indicate how each agent did on the question, which helps the teacher identify problem areas. The teacher can also “zoom in” on an agent to animate its reasoning for additional class discussion. Compared to the clickers used in many college classrooms, the Front-of-Class software provides a new model for class-level formative feedback and discussion (Burnstein and Lederman 2001; Judson and Sawada 2002).
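
Here is a hedged sketch of how such a matrix could be computed: quiz the agent on every ordered pair of expert-map concepts, and approximate the yellow case by checking whether the agent’s reasoning path matches the expert path (the TA system’s exact criterion for “right answer, wrong reason” may differ). It reuses ask() from the earlier sketch.

```python
def all_possible_questions(student_links, expert_links):
    """Classify every ordered concept pair as green, yellow, or red."""
    concepts = {c for s, t, _, _ in expert_links for c in (s, t)}
    grid = {}
    for start in concepts:
        for goal in concepts:
            if start == goal:
                continue
            s_ans, s_path = ask(student_links, start, +1, goal)
            e_ans, e_path = ask(expert_links, start, +1, goal)
            if s_ans != e_ans:
                grid[(start, goal)] = "red"     # wrong answer
            elif s_path != e_path:
                grid[(start, goal)] = "yellow"  # right answer, wrong reasoning path
            else:
                grid[(start, goal)] = "green"   # right answer, right reasoning
    return grid
```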

Fig. 2 TA-affiliated feedback technologies. a All-Possible-Questions matrix: automated scoring indicates TA accuracy for all possible questions [Green = correct; Red = incorrect; Yellow = correct, but reasoning path is wrong]. b Front-of-Class display: teachers can project and quiz multiple agents simultaneously to provide a visual anchor for classroom discussion. c Game Show: students can chat and have their agents compete in an online game show for homework. d Lobby: student portal to mapping, agent customization, chat, and Game Show. (Color figure online)

Figure 2c shows another application of interactive feedback. It is a screenshot of an Internet homework system called the Game Show. Students can log on from home or school to teach their agents, chat with other students online, and have their agents participate in an online game with other students’ agents. During the game, a host asks agents to answer questions on the material. After each question, students have a brief moment to decide how much to “wager” on their agent before it gives an answer. The wagering feature was designed as a prompt for students to reflect on how their agent would answer questions, and thus on their own teaching and learning. Further details about these features may be found in Schwartz et al. (2009), and the software is available by contacting the authors.

The teaching metaphor

Before describing the two added-value studies, we explain the rationale for the metaphor of teaching an agent. TA belongs to a class of instructional technologies called pedagogical agents, where students interact with a graphical character. Unlike other pedagogical agents, which primarily play the role of coach or peer (see Baylor 2007), TA takes the role of pupil. Why did we include the fiction of teaching a character, given that the interactive feedback does not require it?

One reason is that the teaching metaphor allows students to use the familiar teach-test-remediate schema for self-organizing their interactions and interpreting feedback. In a typical “teaching session,” students first read resources or complete other relevant learning activities. They then teach their agent a few nodes and links based on what they have learned. They ask their agent questions and have it take a quiz. If the agent does well, they add more nodes and links. If the agent does poorly, they use available resources to check their own understanding and then make changes to the map.

Of course, there is also the potential for a less effective learning scenario. Students may use trial and error until the agent gives a correct answer to a quiz question. The potential for trial and error is one reason to examine preparation for future learning. It is possible that students overuse the interactivity and feedback to stumble into correct concept maps without actually learning anything useful.

A second reason for the teaching metaphor is to capitalize on the growing research base that generally shows positive results from learning-by-teaching (Annis 1983; Biswas et al. 2005; Renkl 1995; Roscoe and Chi 2008). For example, people learn better when they prepare to teach someone who is about to take a test than when they prepare to take the test themselves (Bargh and Schul 1980; Biswas et al. 2001). They try harder to organize their understanding for the task of teaching another person than they do for themselves (Martin and Schwartz 2009). In the context of technology, the teaching metaphor can enlist fruitful social attitudes during interaction, including a sense of responsibility for one’s pupil. For example, Chase et al. (2009) had two groups of students use identical TA software. In both conditions, students designed the graphical look of their character, created the concept map, and used the interactive feedback. The difference was that students in one condition were told the character was an agent they were teaching, whereas students in the other condition were told the character represented them. Students who thought they were teaching engaged in more learning-relevant behaviors on behalf of their agent and demonstrated deeper learning at posttest.

A third reason for using the teaching metaphor involves metacognition (see Hacker et al. 2009). As the TA visibly reasons through its concept map, students can reflect on the structure of their agent’s reasoning. Students are applying metacognition, but in this case, the metacognition is about their agent’s thinking rather than their own. TA is specifically designed to highlight chains of qualitative causal reasoning, for example, that an increase in cars can cause an increase in flooding through the intermediary causes of atmospheric change and global warming. Ideally, metacognition about their agent’s causal reasoning improves students’ own abilities to think with and learn about causal chains. The current research examines the hypothesis that TA improves students’ abilities to learn causal relations in science, both when using the software and afterwards, once the TA is removed.

Study 1: The added-value of interactivity

Prior work has compared variations of the TA system (Biswas et al. 2005). In the studies described here, rather than trying to isolate variables within our own technology, we compared the TA system to other instructional approaches. In Study 1, two classes of 6th-grade students learned about global warming over the course of 3 weeks. They received matched curriculum and lessons. The difference was whether they organized what they learned using TA or the concept mapping program, Inspiration.

One research goal was to examine what type of learning TA produced. We do not intend to claim that TA is better than Inspiration, which has its own strengths as a productivity tool. Rather, we wanted to investigate the hypothesis that TA would help children learn to think through chains of causal reasoning. To find out, we assessed students at regular intervals during the global warming unit on how well they reasoned about causal relations.

A second goal was to gather initial evidence on whether TA prepared students for learning new content once the technology was removed. After completing the treatments, the two groups of students were given an opportunity to learn a related topic, but without support from the technologies. This PFL assessment did not involve the far transfer of learning completely new content, which is examined in Study 2. Instead, students had to integrate new content that was relevant to their previous lessons.

Methods

Participants

Two 6th-grade classes from a high SES school with the same science teacher participated. All students had broadband access at home, and students in both classes had previously used Inspiration. Logistical constraints required that the two classes be randomly assigned intact to either the TA condition (n = 28) or the Inspiration condition (n = 30). The principal reported that the school matched classes on ability, but we did not have access to measures of prior achievement. Instead, we administered a pre-test on the first day of the study, before any instruction was given.

Procedures

Students completed a three-unit course on the mechanisms, causes, and effects of global warming. The course supplemented a short section in the school’s 6th-grade science textbook. Instruction consisted of 11 lessons over a period of 3 weeks. For each unit, both classes completed learning activities that included readings, videos, hands-on experiments, and classroom discussion. To ensure consistent, matched instruction, the researchers taught both classes throughout. After each basic unit, students either worked with Inspiration or taught their agents in the TA system. In both conditions, students made causal links among pre-determined nodes to help them organize the content from the other instructional activities, and they received homework assignments to further edit their concept maps. Figure 3 shows the ideal final map partitioned to indicate the nodes introduced in each unit.

Fig. 3 Global warming expert map (Study 1). The cumulative map for the three units, a mechanisms, b causes, and c effects, contains 27 nodes, 31 links, and 3 feedback loops

Design

The main comparison was between the effects of Inspiration versus TA on causal reasoning. In the TA condition, various feedback features of the technology were introduced across the units. For Unit 1 (Mechanisms), students used the Quiz feature as they made their initial maps. For Unit 2 (Causes), students used the Quiz feature as they incorporated the new nodes for this unit into their global warming map, and the teacher used the Front-of-the-Class display to lead a discussion, after which students could revise their maps. For Unit 3 (Effects), the students updated their maps with the new nodes and played the Game Show in class and at home.

We tried to match each feature for the Inspiration condition. For example, when the quiz feature was enabled for the TA students, the Inspiration students received an identical paper-and-pencil version of the quiz to complete themselves. When the instructor led map-based discussions, she used the Front-of-Class display with TA students and PowerPoint slides of student maps with Inspiration students. For the Game Show, the TA students wagered on their agents answering the questions, while the Inspiration students played a modified version of the Game Show in which they answered the questions themselves (using a pull-down menu to indicate increase, decrease, or no change) and wagered on their own answers.

In addition to the between-subjects factor of Inspiration versus TA, there were two within-subject factors: time of assessment and the length of causal inference required to answer questions. Over time, the students were given four assessments: a pre-test with 24 short-answer, paper-and-pencil questions from across the curriculum, and three end-of-unit tests that included eight short-answer, paper-and-pencil questions. Each test included questions at three levels of complexity: short, medium, and long chains of causal inference. This created a 2 (condition: TA, Inspiration) × 4 (test: days 1, 5, 7, 11) × 3 (question complexity: short, medium, long) design. Length of causal inference was determined by how many causal steps were needed to explain the correct answer (e.g., the number of links in the expert map); a brief illustrative sketch of this classification follows the example questions below. Short chains required one or two causal steps, medium chains required three steps, and long chains required four steps or more. (Instructional materials and tests are available upon request.) Example questions include:

  • Short Chain: What does insulation do?

  • Medium Chain: How would global warming affect the rate of plant and animal extinction?

  • Long Chain: Explain why the number of cars in America may influence the number of floods around the world.
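
As promised above, here is a brief sketch of how the chain-length classification can be read off an expert map: the complexity of a question is the number of causal links on the path connecting its concepts. The expert-map fragment below is a simplified, made-up example, and networkx is used only for convenience; it is not part of the TA system.

```python
import networkx as nx

# Simplified, hypothetical expert-map fragment for the cars-to-floods question.
expert = nx.DiGraph()
expert.add_edges_from([
    ("cars", "fossil fuels burned"), ("fossil fuels burned", "carbon dioxide"),
    ("carbon dioxide", "global warming"), ("global warming", "glacier melt"),
    ("glacier melt", "sea level"), ("sea level", "floods"),
])

def question_complexity(graph, cause, effect):
    """Classify a question by the number of causal steps on the expert path."""
    steps = nx.shortest_path_length(graph, cause, effect)
    if steps <= 2:
        return "short"
    if steps == 3:
        return "medium"
    return "long"

print(question_complexity(expert, "cars", "floods"))  # 'long' (six causal steps)
```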

In the final 40 min of the study, students also completed a PFL assessment. They first watched a short video about steps that individuals and communities have taken to reduce global warming. Students then received a one-page text that described actions they could take to help prevent global warming. They were given four starter nodes, and their assignment was to construct a paper-and-pencil concept map of the text passage. Concept maps are often used to assess student understanding (e.g., Ruiz-Primo and Shavelson 1996; Taricani and Clariana 2006). In this case, the question was whether there would be differences in how well students integrated this new content into their representation of the topic.

Results

Causal understanding

Students’ answers to the causal questions were scored on how well they explained the causal chain of inference: 0 points (incorrect or no answer), ½ point (partially correct answer), or 1 point (correct answer). Below are sample answers and scores for the question “Explain why the number of cars in America may influence the number of floods around the world”:

  • 0 points: “It uses up gas.”

  • ½ point: “Cars give off CO2 which makes it hot and creates floods.”

  • 1 point: “The cars will burn fossil fuels which will produce carbon dioxide which will join the atmosphere which will heat the earth up which will melt the glaciers which will increase the sea level which will increase floods.”

Inter-rater reliability was determined by having two separate coders score 20% of the tests at random. Pearson correlations between the coders ranged from .90 to .92 across the tests. Cronbach’s alpha for reliability across tests was .79. The following analyses used each student’s mean score for the short, medium, and long chain questions for each assessment, yielding 12 data points per student (3 problem types by 4 assessment times).
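
For readers unfamiliar with these reliability statistics, the sketch below shows how the inter-rater Pearson correlation and Cronbach’s alpha could be computed. The score values are invented for illustration and are not the study’s data.

```python
import numpy as np

# Invented scores: two coders rating the same ten answers (0, .5, or 1 point each).
coder_a = np.array([1, .5, 0, 1, .5, 1, 0, .5, 1, 1])
coder_b = np.array([1, .5, 0, .5, .5, 1, 0, .5, 1, .5])

# Inter-rater reliability: Pearson correlation between the two coders' scores.
inter_rater_r = np.corrcoef(coder_a, coder_b)[0, 1]

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_students x n_tests) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Invented example: rows are students, columns are the four assessments.
test_scores = np.array([[.5, .6, .7, .8],
                        [.2, .3, .4, .3],
                        [.9, .8, .9, 1.0]])
print(round(inter_rater_r, 2), round(cronbach_alpha(test_scores), 2))
```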

Figure 4 shows the average score per question broken out by condition, time of test, and the length of inferential chain needed to answer the question. At pretest and after the first instructional unit, the two groups are similar. After the second unit, the TA students show an advantage for the medium-length inferences. By the final unit, the TA students show an advantage for short, medium, and long inferences. Our interpretation of this pattern is that the TA students were getting progressively better at reasoning about longer and longer chains of inference in the context of global warming. The following provides the relevant statistics.

Fig. 4 Mean question scores for global warming assessments. Scores are broken out by test, length of causal inference, and treatment

To rule out pre-existing differences, we first submitted the pretest data to a repeated-measures analysis of variance crossing the between-subjects factor of condition by the within-subject factor of inference length. The conditions were not significantly different; F (1, 56) = 1.5, p > .2.

To test the effect of treatment, we conducted a 2 × 4 × 3 analysis of variance with the between-subjects factor of condition crossed by the within-subject factors of time (four time points) and inference length (short, medium, long). Only students present at all test points were included (TA n = 26, Inspiration n = 27).
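
For readers who want to see how data in this 2 × 4 × 3 layout could be analyzed in code, below is a rough sketch using a linear mixed model with a random intercept per student. This is a related but not identical technique to the repeated-measures ANOVA reported here, and the file and column names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per student x test x inference length,
# with columns: student, condition (TA/Inspiration), time (1-4),
# length (short/medium/long), and score (mean question score, 0-1).
df = pd.read_csv("study1_scores_long.csv")  # assumed file, not provided with the paper

# A random intercept per student approximates the within-subject structure;
# the paper itself reports a mixed repeated-measures ANOVA.
model = smf.mixedlm("score ~ C(condition) * C(time) * C(length)",
                    data=df, groups=df["student"])
print(model.fit().summary())
```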

All three factors showed main effects, which should be interpreted in light of significant interactions. There was a main effect for time, indicating that students improved; F (3, 49) = 76.4, p < .001. There was a main effect of inference length, indicating that the separation of questions into short, medium, and long chain inferences correctly reflected problem difficulty; F (2, 50) = 77.5, p < .001. And finally, there was a main effect for condition, indicating that the TA system led to superior performance; F (1, 51) = 4.2, p < .05.

The 2-way interactions clarify the TA effect. TA students improved more over time than the Inspiration students; F (3,49) = 3.1, p < .05. There was also a two-way interaction of condition by inference length, indicating the TA students did relatively better on longer causal chains; F (2, 50) = 4.2, p < .05. Finally, there was a time by inference length interaction indicating that students in both groups did progressively worse on short-chain inferences and progressively better on long-chain inferences; F (6, 46) = 12.9, p < .001. Our best explanation for the drop in short-chain performance is that we inadvertently made the short-chain questions more difficult in the later assessments. The three-way interaction, condition by time by inference length, was not significant; F (6, 46) = 1.0, p > .4.

The best estimate of effect size comes from the final unit test, because this occurred after the full course of the two treatments. A separate analysis of variance crossed treatment by question type for this final unit test. The effect size of the TA treatment over Inspiration is d = .52; F (1, 51) = 13.6, p < .001.
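
The effect sizes reported in both studies are Cohen’s d. Assuming the standard pooled-standard-deviation definition (the paper does not state which variant it used), the statistic is:

```latex
d = \frac{\bar{X}_{\mathrm{TA}} - \bar{X}_{\mathrm{Insp}}}{s_{\mathrm{pooled}}},
\qquad
s_{\mathrm{pooled}} = \sqrt{\frac{(n_{\mathrm{TA}} - 1)\, s_{\mathrm{TA}}^{2} + (n_{\mathrm{Insp}} - 1)\, s_{\mathrm{Insp}}^{2}}{n_{\mathrm{TA}} + n_{\mathrm{Insp}} - 2}}
```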

Preparation for future learning

During the last session, students constructed paper-and-pencil concept maps on their own, given a new, one-page text passage on the prevention of global warming. These PFL maps were coded for (a) total number of concepts included, (b) number of concepts from the passage, and (c) number of passage concepts integrated with valid causal paths. Two raters coded a subset (20%) of the maps, resulting in one coding disagreement. A primary rater then coded the remaining maps. Figure 5 shows a sample student map and the coding scheme.

Fig. 5 Sample coding of a student map in the PFL assessment. Students were given a 1-page text on how to help prevent global warming and four starter concepts. Maps were coded for total concepts included, number of concepts from the passage, and number of passage concepts integrated with valid causal paths

Students in both conditions added roughly four concepts to the starter nodes provided. The TA students included an average of 3.2 concepts from the passage compared to 1.3 for Inspiration students; t (49) = 4.2, SE = .43, p < .001, d = .61. Additionally, the concepts added by the TA students were better integrated, with more correct causal paths. The TA condition showed twice as many appropriately linked nodes (2.5) compared to the Inspiration condition (1.2), t (49) = 3.2, SE = .43, p < .01, d = .45. Overall, the students’ paper-and-pencil concept maps indicated that the TA condition better prepared students to develop an integrated understanding of the reading passage, even when they were no longer using the interactive technology as support.

Discussion

Over 3 weeks, two classes of students worked with either TA or Inspiration. Students received identical information about global warming delivered in identical ways. The difference was how they used technology to organize and receive feedback about the ideas they learned. The Inspiration condition used a productivity-focused tool, and feedback was necessarily provided outside the tool. The TA condition used the social metaphor of teaching to organize computer interactions, and provided automated feedback to students through the lens of their agent’s understanding.

Early in the intervention, both groups exhibited similar levels of understanding, and both did much better on inference questions involving shorter causal chains. Over the course of instruction, TA students demonstrated relative gains in their abilities to draw inferences through longer causal chains in the context of global warming. This makes sense, because the way TA organizes and reifies knowledge makes reasoning through causal chains visible.

The PFL assessment results suggest that students adopted their agent’s reasoning patterns and ways of organizing knowledge. On this assessment, students from both conditions received an identical learning task: integrate new content from a text passage without technological support. The TA students causally integrated more passage-relevant concepts in their paper-and-pencil concept maps. The greater number of integrated nodes in the TA condition indicates that TA students had connected the concepts into potential chains of inference.

Based on this study, the PFL effect could be the result of the TA students having a better grasp of global warming from the prior units of instruction, or it could be that the students had a better grasp of causal integration and used it to make sense of the new material. The next study examines this question more closely by seeing if TA prepared students to learn new content that was topically unrelated to what they had studied with their agents.

Study 2: Added-value to a standard curriculum

Study 1 was of relatively short duration and was conducted under the strict constraints of the research design; additionally, researchers took the lead instructional roles and used specially created content. The study demonstrated that TA is particularly useful for developing an integrated understanding of causal chains. Study 2 was designed to see how TA would fare in a more complex ecology of instruction, in which school teachers used TA to complement their regular curriculum. Six 5th-grade teachers integrated TA into their district-adopted science-kit curriculum as they saw fit, over a period of several months. We were interested in three questions. First, would TA produce added-value gains, as evidenced by improved student performance on researcher-designed measures of causal reasoning? Second, would there be a change in basic-value as measured by the curriculum’s own assessments? And third, once the TA technology was withdrawn, would the students be more prepared to learn from their standard curriculum on a new and unrelated science topic?

The experiment used a cross-over design. Three teachers used TA for a science kit on biological systems, and then stopped using TA for the subsequent kit on earth science. The other three teachers worked without TA for the biology kit, but then did use TA for the subsequent earth science unit. Our prediction was that students who first used TA to learn about biology would learn to think in terms of causal chains. This causal thinking would benefit their subsequent learning of the non-overlapping content in the earth science unit, even though they were no longer using TA.

Methods

A small, local school district agreed to use the TA technology to complement their regular science curriculum in the 5th-grade. The district used the Full Option Science System (FOSS), developed by the Lawrence Hall of Science (www.lhsfoss.org). FOSS kits come complete with teacher guides, textbooks, videos, hands-on activities, worksheets, and assessments.

Participants

The study involved six teachers and 134 5th-grade students (104 with permission to analyze their data) who were assigned to one of two conditions, TA-1st or TA-2nd. To determine if there were pre-existing student differences in the two conditions, we used all available achievement data. A multivariate analysis compared students’ 4th-grade math and reading scores on the California STAR assessment (www.cde.ca.gov/ta/tg/sr/). STAR does not include a science test for this age, so our analysis also incorporated scores from the FOSS-developed pretest that comes with the first kit. There were no pre-existing differences between the two conditions; F (3, 86) = .056, p > .95; all univariate F’s < .25.

Design

The district-wide schedule required that teachers use the Living Systems kit (LS) in the winter and the Water Planet kit (WP) in the spring. In the winter, the three classrooms in the TA-1st condition integrated TA with the LS kit. The three classrooms in the TA-2nd condition used the LS kit as they normally would. In the spring, the TA-2nd condition used TA for the WP kit, while the TA-1st condition covered WP without the technology, completing a cross-over design in the use of TA.

State testing plus end-of-year school events yielded different durations for the two kits. The teachers had approximately 10 weeks for LS and 5 weeks for WP. For the LS kit, the teachers covered three sub-units: Human Body, Vascular Plants, and Photosynthesis & Cellular Respiration. For the WP kit, the teachers covered the unrelated and extensive Water Vapor unit. These differences had implications for the amount of data we could collect for each unit, as described next.

The FOSS kits come with summative assessments for each sub-unit, which served as the measure of “basic-value” to determine whether TA displaced or augmented the intended goals of the original curriculum. The tests contain multiple-choice, fill-in, and short-answer questions (FOSS tests may be requested at lhsfoss.org/components/general/k8sys.html). We sorted the FOSS items into four categories based on their use of “prompt” words: Why questions asked about causal inferences; How questions probed internal mechanisms; What questions tested factual recall; and Data questions asked students to interpret charts or tables. Examples from the Living Systems kit include:

  • Why: “Why is it important to filter waste materials from the blood?”

  • How: “Describe how water in the ground travels to the leaves at the top of a tree.”

  • What: “Which side of the heart pumps oxygen-rich blood to the body?”

  • Data: “How do the data show that plants produce their own food?” [A table showed starting and ending masses of plants grown under different experimental conditions.]

At the end of each sub-unit, we also appended four “added-value” short-answer questions to the FOSS assessments. These added-value questions tapped the types of causal reasoning modeled by TA. For example, one question asked,

  • “You go on vacation and forget to ask someone to water your house plant. What happens to the plant’s level of starch storage while you are gone? Explain your reasoning.”

The teachers did not see these added-value questions beforehand, so they could not teach to them. As a fuller example, Appendix 1 shows the four questions for the sub-unit on the Human Body.

Procedures

All the teachers were trained on TA in a one-day, in-service workshop. When the teachers used TA with their students, they were free to implement the tools within the TA system differently. Teachers used TA at different points in their lesson plans, and some preferred one feature over another. For example, one teacher preferred to use TA as a summative exercise for each sub-unit and encouraged extensive use of the Quiz feature. Another teacher preferred shorter, more interspersed TA sessions throughout each sub-unit and allowed more Game Show play for her students. Throughout the study, each teacher received as much technical and curricular support as she wanted for her TA lessons. Overall, TA-1st students averaged eight mapping sessions (337 min of TA instructional time), and TA-2nd students averaged approximately five (275 min).

Unlike the previous study, in which students worked on a single, cumulative map for their agents, separate concept maps were designed to complement the different sub-units in the FOSS curriculum. Four expert maps were created for the LS kit and two maps for the WP kit (see Appendix Fig. 8).

Because TA was built for use over the Internet, we were able to collect usage data whenever and wherever students used the system. We conducted exploratory stepwise regressions to determine whether increased usage of TA was correlated with better added-value learning outcomes.

Results

All learning assessment items were scored on a scale of 0–1. Answers received 0 points (incorrect or no answer), ½ point (partially correct answer), or 1 point (correct answer). Inter-rater reliability, using a random subset of at least 20% of the answers for each item, exhibited Pearson correlations of greater than .92 for all tests. The reliability across the added-value tests (Cronbach’s alpha = .81) matched the reliability of the FOSS basic-value items (Cronbach’s alpha = .76).

Because teachers chose to complete the LS sub-units in different orders, and because the WP implementation used fewer sub-unit tests than LS, the following analyses use students’ average question performance for each kit, rather than breaking out performance by sub-unit.

Added-value and preparation for future learning

Figure 6 shows the average added-value question score broken out by condition and science kit. A repeated-measures ANOVA compared average within-subject performance on the LS and WP added-value questions crossed by the between-subjects factor of condition. The two-way interaction evident in Fig. 6 was significant; F (1,96) = 4.7, p < .05. The interaction indicates that students improved once they used the TA software. It also indicates that the TA-1st students maintained their level of performance from the LS kit to the WP kit, even though they were no longer using the technology.

Fig. 6 Average scores on added-value questions. Scores are broken out by kit and condition

Breaking out the effect, we first consider results for the LS kit, which is when the TA-1st condition used TA, and the TA-2nd condition did not. A separate ANOVA compared the effects of TA-1st versus TA-2nd conditions on the students’ added-value scores. The TA-1st condition did better; F (1,101) = 5.2, p < .05, d = .23. Moreover, the mean scores for each of the three TA-1st classes were higher than the means for each of the three TA-2nd classes. TA provided consistent added-value for the 5th-graders, despite natural variability in the ways teachers used the software and taught their classes.

We next examine the results from the WP kit, for which the TA-2nd condition started using the technology and the TA-1st condition stopped. The TA-2nd condition improved on the added-value questions once they used the software. The effect size for the TA-2nd gain from LS to WP questions was d = .68. (The effect size is larger for this comparison, because it is within-subjects rather than between.)

Basic-value

One concern was that TA might detract from the basic-value of the FOSS kits. To examine this issue, we analyzed the students’ performance on FOSS’s own summative tests. The analysis is confined to the LS kit, because several of the teachers chose not to give the FOSS tests for the WP kit.

Figure 7 shows that the TA-1st students did better specifically on Why questions, with no condition differences for the other three question types. A 2 × 4 repeated-measures ANOVA crossed the two conditions with the four question types. The performance differences on the Why questions drove a significant two-way interaction of condition by question type; F (3, 99) = 6.6, p < .001. Taking the Why questions separately, the treatment effect was d = .40. Thus, TA did not reduce students’ learning of basic FOSS material, and it improved learning for the Why questions. The TA benefit on the Why questions fits the general pattern across both studies, because these questions asked students about cause-and-effect relationships. These results also show that the learning gain of the TA-1st condition is not simply due to better students, time on task, or other unidentified variables. If those variables had been responsible for the learning gains, then there should have been superior performance across all the items, not just the cause-effect questions.

Fig. 7 Average scores on basic-value questions. Scores are for LS kit only and broken out by item type and condition

TA system-use and learning

The preceding analyses compared experimental treatments to determine TA effectiveness. A complementary approach is to look for effects within the TA treatments. If the technology is responsible for improved learning, then we should expect to see “dosing effects”: students who more frequently use productive elements of the software should learn more. The following analyses were post hoc explorations, because we were unsure which aspects of the TA system would be especially useful for learning if used more frequently.

We conducted two multiple regressions to predict students’ added-value scores using metrics of how often students used features of the system (mapping sessions, map edits, asking the agent a question, submitting an agent to a quiz, Game Show sessions, chat messages, and reading the on-line resources on the topics). One regression predicted TA-1st student performance on the LS kit, because this was when these students used TA. The second regression predicted TA-2nd performance on the WP kit, because this was when those students used TA. Each regression followed a two-step approach. In the first step, we forced the STAR scores into the regression equation. This statistically controlled for the possibility that correlations between greater system use and greater learning performance were due to prior achievement rather than a direct relation between amount of system use and learning. In the second step, we used a stepwise regression to determine which system-use metrics, if any, predicted added-value performance.
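
A rough sketch of this two-step procedure is shown below. The data file, column names, and the simple forward-selection rule are illustrative assumptions, not the study’s exact analysis.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: one row per student with STAR covariates, TA usage metrics,
# and the added-value outcome score (all column names are illustrative).
df = pd.read_csv("ta_usage_and_scores.csv")   # assumed file
forced = ["star_math", "star_reading"]        # step 1: prior-achievement covariates
candidates = ["map_edits", "quizzes", "questions_asked",
              "game_show_sessions", "chat_messages", "resource_reads"]

def stepwise_after_forcing(df, outcome, forced, candidates, alpha=0.05):
    """Force the covariates in, then greedily add the candidate predictor with
    the smallest p-value until none is significant (a simple forward-stepwise rule)."""
    selected = list(forced)
    while True:
        best_p, best_var = 1.0, None
        for var in candidates:
            if var in selected:
                continue
            X = sm.add_constant(df[selected + [var]])
            p_value = sm.OLS(df[outcome], X).fit().pvalues[var]
            if p_value < best_p:
                best_p, best_var = p_value, var
        if best_var is None or best_p >= alpha:
            break
        selected.append(best_var)
    return sm.OLS(df[outcome], sm.add_constant(df[selected])).fit()

model = stepwise_after_forcing(df, "added_value_score", forced, candidates)
print(model.summary())
```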

For TA-1st students, the stepwise regression found that the number of map edits predicted performance on the added-value questions; F-Change (1, 40) = 4.4, p < .05, change in R² = .06; final model, F (3, 40) = 10.9, p < .001, R² = .45. For TA-2nd students, the number of quizzes entered the equation; F-Change (1, 40) = 7.2, p = .01, change in R² = .12; final model, F (3, 40) = 6.4, p < .01, R² = .33. While the number of edits predicted learning for TA-1st students, and the number of quizzes predicted learning for TA-2nd, it is useful to note that quizzing and editing are highly correlated (r = .70) and block entry of one another into the stepwise regressions. Results from both conditions suggest that greater TA use led to greater learning.

Discussion

Teachable Agents was integrated into the natural variability of regular classrooms over several months, and teachers chose how to use the software as an added value to their normal instruction. Students who used TA first exhibited a deeper causal understanding of the material, as measured by the added-value tests and the Why questions in FOSS’s own basic-value assessments. The TA activities did not displace basic learning from the FOSS kit. Moreover, the degree to which students used the map editing or feedback features correlated with learning, even after controlling for prior achievement.

After the cross-over, the TA-2nd students caught up on the added-value measures once they were using TA. The TA-2nd data provide a within-subject comparison that indicates that students did better, relative to themselves, when they used TA. This result complements the between-subjects finding for the first science kit that using TA provided added-value compared to not using TA. It also complements the findings in Study 1.

The TA-1st students did not use TA for the WP kit, but their causal learning was maintained at a relatively high level. Our preferred interpretation is that the students had learned to think in terms of integrating causal chains during the LS kit, and they were able to continue learning in terms of causal chains during the non-overlapping WP content, even without the on-going support of their agents. They had been prepared for future learning by using the TA technology.

There are, however, alternative interpretations of the PFL findings. One alternative is that the conditions did equally well on the WP assessment because it was an easier test. By this account, the TA-2nd condition had not learned causal paths using TA, but because the WP test was easier, it looks like they improved compared to the LS test. And, although the TA-1st condition appeared to hold steady, they actually regressed, because they should have done better on the easier test.

While always possible and not to be discounted, this interpretation loses force when considering the results as a whole. One leg of this alternative interpretation is that the WP test was easier. However, the reliabilities across the LS and WP tests were high, and the within-subject gains of the TA-2nd students (across the two kits) closely resemble the between-subjects gain of the TA-1st students over the TA-2nd students (on the first kit, before the cross-over). The second leg of this alternative is that the TA-2nd students did not learn causal integration for the shorter WP unit. However, both Study 1 and the results from the LS kit indicate that students improve causal integration when using TA. It therefore seems unlikely that the TA-2nd students did not learn, given that they spent over four and a half hours using the software across multiple sessions. More directly, when we regressed learning on system use, we found that TA-2nd students who used the system more also learned more, in much the same pattern as the TA-1st students. Thus, students in the TA-2nd condition did learn.

A second alternative interpretation is that the TA-1st students did well on the WP unit because the teachers had learned to emphasize causal integration when using TA, and this continued even after they stopped using the technology. This is a desirable outcome, because preparing teachers for future teaching would be a good accomplishment of any technology. However, our observations do not support this alternative. Field notes indicated that TA-1st teachers did not use concept mapping for the WP unit and did not use causal reasoning more in their teaching.

General discussion

There are many valid concerns about the adoption of new learning technologies: What will it cost in terms of school budgets and teachers’ start-up times? Will it displace “basic-value” learning from the existing curriculum? Will it provide any “added-value” that is measurably beneficial to students? Will the technology over-scaffold student learning, leaving students unable to perform once the technology is taken away? Our studies show that one way to address these concerns is to measure both added-value and basic-value learning while students are using the technology, and to examine subsequent learning once the technology has been removed.

Teachable Agents (TA) is an instructional technology that capitalizes on the social metaphor of teaching to engage students in learning. The TA system adds interactivity and feedback to concept mapping. Two classroom studies using TA with upper-elementary schoolchildren showed that students developed a better understanding of content-specific causal relations and could reason through longer chains of inference. The second study, which integrated TA into the standard science curriculum, showed that this added-value learning did not adversely affect basic-value learning, despite instructional time “lost” to TA. Indeed, TA appeared to support additional basic-value learning, as measured by the Why questions on the curriculum’s own assessments.

Both studies also provided evidence that these learning benefits can persist when the children are learning new content without the support of the technology. In particular, Study 2 showed students transferred their understanding of causal reasoning from the domain of biology to help learn in the unrelated domain of earth science. However, it is important to acknowledge that the evidence, while strong, is not definitive. Both studies had to randomly assign intact classes to treatment rather than randomly assign students. Also, Study 2 would have been logically stronger if it had been possible to include a baseline condition that never received TA at all. We did control for these weaknesses statistically, for example by using pretest and standardized achievement measures. In addition, we identified that the learning effects are specific to causal integration and not other content included in the curricula. But, as always, experimental conclusions are tentative pending replication.

In the meantime, our explanation for the cause of the transfer effects is that TA provided an explicit model of causal thinking, which helped students develop their own causal reasoning (Collins et al. 1991). TA did not teach children how to learn in general, for example, by taking notes or explicitly self-explaining. Instead, it provided them with the powerful and integrative idea of causal chains in science. We know that students already have causal schemas (Gopnik and Schulz 2007), and as Nisbett et al. (1983) found in the context of statistical reasoning, instruction that maps onto pre-existing ways of reasoning has a better chance of transfer. The lessons on causal chaining transferred because they amplified a natural and useful way of thinking about science content. Moreover, the interactivity allowed students to reflect on their agent’s thinking and accuracy, and by proxy, they applied metacognition to their own understanding to help develop a grasp of causal chains (Schwartz et al. 2009). Research has shown that tutors gain a deeper understanding through interactions with their tutee when they answer deep questions and respond to misconceptions (Chi et al. 2001; Palincsar and Brown 1984; Uretsi 2000). Roscoe and Chi (2008), for example, found that tutee questions were responsible for about two-thirds of tutors’ own reflective knowledge-building activity. The TA feedback elements (e.g., quizzes), plus the mistakes made by their TA pupils, were explicit guides that focused students on questions of causal reasoning.

The transfer measured in these studies is unusual, because students had an opportunity to learn as part of the assessment (Bransford and Schwartz 1999). In Study 1, they learned from a related passage, and in Study 2, they learned from a month of instruction on an unrelated topic. The guiding assumption was that one way to test the added value of the TA technology was to see whether it prepared students for future learning, such that they could draw on their technology-mediated experiences to learn later without the technology.

Such preparation-for-future-learning assessments may be useful for addressing other questions involving learning from technology. For example, one outstanding question involves the value of videogames and other interactive media frequently found outside of school (e.g., Barron et al. 2009; Gee 2003; Kuhl et al. 2003; Ito 2009; Stevens et al. 2008). The content of these informal learning experiences rarely maps cleanly onto curricular standards. Therefore, it seems unlikely that experiences with these media would yield direct gains on standardized or curriculum-aligned assessments. Nevertheless, some of these highly interactive experiences may provide students with important competencies, dispositions, or prior knowledge that can prepare them to learn. One way to find out which informal technologies are valuable for learning, and which are not, is to use assessments that include opportunities to learn as part of the assessment. Such an approach was used here to demonstrate that TA prepares students to learn once the technology-mediated experience is over, and it may work with other interactive technologies that provide unique experiences not normally offered as basic-value in standard instruction.