Introduction

Word problems that pertain to motion entail a complex use of verbs. There is often a tension between the temporality of the word problem, found in the conjugation of verbs and in the spatiotemporal unfolding of the depicted event, and the logical inferences entailed in solving the problem. Diagrammatic and gestural semiotic resources can facilitate problem solving in such instances, but being embodied actions, they also introduce new dimensions into this tension. A diagram is a still image, and a gesture unfolds in time, often without leaving a trace. In this paper, we focus on how diagrams and gestures are taken up by teachers and students as they engage with word problems about motion. We discuss data from a research project examining the social semiotics of classroom interaction, and we focus on the following research questions:

  1. How are temporality and logical inference related in word problems about motion?

  2. How do students and teachers jointly operationalize diagrams, gestures, and language as they grapple with complex word problems consisting of multiple verb forms?

We first discuss verb function in word problems about motion. We introduce the concept of “aspect” as a strategic means by which students can situate themselves within the temporal unfolding of the depicted event. Aspect is both a grammatical term referring to a verb tense and an embodied posture through which one engages with an event. One occupies an aspect only provisionally, shifting perspective as one changes one’s aspect. Occupying various embodied perspectives on an event corresponds to verb aspect and is a way to enhance understanding of the logical relations between the moving parts of the event. We suggest that the concept of aspect allows one to study the moving mathematical now of word problems that pertain to motion. Our findings reveal that diagrams and gestures play a crucial role in student and teacher engagement with such problems, operating as visual-haptic devices for exploring the problem from different angles. We argue that the concept of aspect sheds light on how students and teachers engage with word problems multimodally. In this paper, we discuss how three middle school teachers engaged with a nonroutine word problem, called drenched in time, a problem in which various temporal conjugations of verbs carried or conveyed the logical relations of the problem, using, for instance, conditionals, subjunctives, and past, present, and future tenses. We analyze data collected from the teachers’ lesson study group and from their classrooms.

The grammar of word problems

The mathematics word problem, according to Gerofsky (1996), is a “linguistic genre” exhibiting particular structural characteristics that reveal its derivation from routine arithmetic or algebraic tasks. Lave (1992) refers to this structure as a “school mathematics genre.” The common three-part structure involves (1) the listing of given entities and constraints, corresponding to implicit logical and quantitative relationships; (2) the description of a series of actions corresponding to implicit arithmetic or algebraic operations; and (3) the announcement of a goal corresponding to an implicit algebraic solution. On the surface of it, this structure has nothing to do with stories, despite the fact that these problems are often termed “story problems.” Moreover, the tacit structure of the word problem points to the presumed and preferred alphanumeric approach to solving it. Gerofsky (1996) also points out that students are meant to pretend that the described situation exists, and they must decode the tense and verb forms in terms of this imaginary spatiotemporal event. This is especially true when dealing with word problems pertaining to events in which motion is central, often considered the archetype of the word problem. Consider, for instance, the following routine word problem from middle school:

A truck leaves town at 10:00 a.m. travelling at 90 km/h. A car leaves town at 11:00 a.m. travelling at 110 km/h in the same direction as the truck. At about what time will the car pass the truck? (Gerofsky 1996, p.40)

This word problem uses various verb forms. It opens with the simple present (“A truck leaves”) and then shifts to the present progressive “travelling at 90 km/h.” The first verb form refers here to a single completed action, while the second refers to an event of limited duration taking place at the present time. But there are other layers of discursive complexity when one considers the word problem itself as an utterance or text. The verb form of the event (the travelling car and truck) is different from the verb form of the utterance (the word problem as a speech act), and also different from the action that the student is meant to perform (solving the problem). In other words, there are at least three events: that which is depicted in the problem, that which is enacted in reading the problem, and that which is effected through the imperative to solve the problem. It is important to keep all three of these in mind as we consider student and teacher engagement with such problems. Each of these events entails various kinds of activities, and each of these activities is associated with various kinds of verbs. Both student and teacher need to negotiate the way these different verb forms operate according to different temporalities.

The task of making sense of such word problems is compounded by the fact that students are expected to decode verbs like travelling, running, sliding, climbing, or any other verb that refers to a material process, in terms of the verbs “to be” and “to have,” which dominate the written language of mathematics, and also in terms of operative processes like multiplication and subtraction. The verbs to be and to have refer to relational processes rather than material processes. Travelling trucks and other material processes are ultimately encoded in equations that show the relationships in atemporal terms of equality or attribution. This usually emerges in statements such as “This X is that Y” or “X has Y as an attribute.” In the word problem above, for instance, one would encode the information as follows: The distance travelled by the truck at 90 km/h for t hours will be equivalent to the distance travelled by the car at 110 km/h for the same number of hours less 1. The awkwardness of the grammar points to how powerful the algebraic notation is when attempting to solve such a problem: We can use 90t = 110(t − 1), where the equal sign stands in for the conjugated verb to be (“will be”).
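The encoding 90t = 110(t − 1) can be worked through in a few lines. The following is only a minimal sketch: the function name hours_until_pass and the conversion to a 24-hour clock are our own illustrative assumptions, not part of Gerofsky's problem or of any classroom material discussed here.

```python
from fractions import Fraction


def hours_until_pass(truck_speed, car_speed, head_start):
    """Hours after the truck leaves at which the car draws level.

    Solves truck_speed * t = car_speed * (t - head_start) for t,
    where head_start is the truck's lead in hours (hypothetical helper).
    """
    if car_speed <= truck_speed:
        raise ValueError("the car never catches the truck")
    return Fraction(car_speed * head_start, car_speed - truck_speed)


t = hours_until_pass(90, 110, 1)            # 11/2 hours after 10:00 a.m.
total_minutes = int(t * 60)                 # 330 minutes
hours, minutes = divmod(total_minutes, 60)  # 5 hours, 30 minutes
clock = f"{10 + hours}:{minutes:02d}"       # 24-hour clock reading
print(t, clock)                             # 11/2 15:30
```

Under these assumptions the car passes the truck 5.5 hours after the truck's departure, at about 3:30 p.m. Note that the computation has no tense at all: the "will" of the question disappears once the two distances are equated.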

Word problems address and position the reader in various spatiotemporal locations. As a question or prompt for students, the problem addresses and positions the problem solver at various space-time locations or instants in relation to the unfolding event depicted. Motion is an event that endures through time, and the problem solver is often asked to locate herself or himself inside the temporal flow at some particular instant (imaginary or real). In the case above, the problem solver is first addressed in such a way that they are temporally positioned before the incident of the car passing the truck (“what time will the car…”), and yet, the mathematical method of equating the two distances travelled entails positioning oneself at or after this incident. Of course, there is also the need to take up a bird’s-eye perspective on the temporal unfolding of the event, in order to “see” the entirety of the event, perhaps allowing the student to dislocate from the event altogether and situate the event in an absolute space-time frame of reference.

The complexity of decoding these word problems relates to three facets of verb forms: tense, aspect, and mood. These three facets are frequently conflated in English language grammar, but it is worth unraveling them. The tense of a verb functions as a time reference for the occurrence of an event and might designate past, present, or future, or modifications of each. The mood of a verb refers to how it functions as imperative, indicative, conditional, subjunctive, etc. As we will see below, it is often through the mood that the verb structures the logical relations of the word problem. For instance, the past tense combined with the subjunctive mood is often used in the question at the end of a word problem, as in “If they were moving backwards…what would happen?”

The concept of aspect broadens the concept of tense, so as to better recognize the felt experience of time. Aspect conveys duration, completion, or frequency of an event (Bhattacharya and Hidam 2011), amplifying the ways in which the past, present, and future are experienced. The aspect of a verb defines its temporal flow (or lack thereof) and the location of a vantage point for making sense of this flow. Choice of aspect for a verb is often between perfective, which expresses an external perspective on the completed event, and imperfective, which expresses an internal perspective. Both of these can be further elaborated into different aspects on time. Imperfective aspect entails attending to the internal temporal structure of the event and may demand either more than one aspect or a moving embedded aspect. For instance, although the following are all present tense, in that the speaker “I” is referring to their current state of being, each statement conveys a radically different position in relation to the activity: “I sing” “I am singing” “I have sung” “I have been singing” “I would have been singing.” Each of these uses the present tense (either the present of to sing or the present of to be or the present of to have), but each conveys very different relations between the claim and the present moment the claim is made. In other words, aspect is that which complicates the way in which an event traverses the present moment, or its durational quality. Aspect thus positions the speaker in relation to the start and stop of durational spatiotemporal processes.

Etymologically, the term aspect derives from the Latin word for “the way it looks,” pointing to the way in which language use intersects with visualization skills. If mathematics word problems involve this strange use of tense and aspect in referencing a spatial-temporal world, then how might diagram and gesture offer a powerful way of making sense of tense and aspect? How might these other semiotic resources and modalities be an effective way of engaging with the temporal flow of events as presented in mathematics word problems?

Research on word problems about motion

Research on middle school word problems about motion tends to focus on the extent to which students are able to construct and make sense of Cartesian graphs. For instance, Bremigan (2001) and Armstrong and Sinclair (2011) have shown how dynamic software like The Geometer’s Sketchpad can be effective in developing student understanding of the mathematical relations entailed in motion problems. Sherin (2000) analyzed students’ “invented” diagrams when they were asked to represent motion using their own visualizations and found three key resources that students tapped into, prior to the use of the Cartesian graph. These were the following: (1) basic drawing conventions regarding space and size and iconic representation; (2) conventions about temporal sequences, possibly abstracted from experiences with story and text; and (3) conventions about the features of a line segment, such as length and orientation, which are commonly used to represent motion. Gol Tabaghi and Sinclair (2011), using a similar experiment with preservice teachers and a similar goal of drawing on participants’ non-normative strategies for graphing, found that most diagrams failed to include a representation of the relationship between movement and time.

The design tasks in this research, however, do not attend to the complex role of verb forms in motion problems. The task for Sherin (2000) was to represent the situation described as “A motorist is speeding across the desert, and he is very thirsty. When he sees a cactus, he stops short to get a drink from it. Then he gets back in his car and drives slowly away.” The task for Gol Tabaghi and Sinclair (2011) was to create a visual image related to the following story: “MellowYellow decides to walk to the corner store, which is less than a mile away from her house. She gets about halfway there and stops to pick up a penny. She looks at it for a while and then starts walking toward the corner store again, but faster than before, to make up for lost time.” Both of these stories are told in the present tense, and there is no change in tense, which means that the students must decode other markers in the text (genre-related markers other than tense) to grasp the relationships between time and motion.

We found no research that addressed the specific challenges of word problems that involve complex verb conjugations and grammatical structures that are linked to the logical structure of the given problem. This oversight may be related to the claim of Solomon and O’Neill (1998) that the temporal order of a word problem must be presented in the “timeless present” so as to simulate logical order (p.216). The scholars draw a sharp distinction between temporal relations and logical relations within the mathematics register. Netz (1998) suggests that mathematics texts use the timeless present tense in order to convey the “impersonal work of mathematical necessity rather than the accident of authentic discovery” (p. 146). Pimm (2004) points out, however, that the common logical connectives then, hence, since, and when, as well as the key terms ever, always, never, have a deeply chronological sense in everyday English. It is evident that the relations between language and logic are extremely complex and that questions that pertain to time and motion further complicate these relations, creating tensions for students.

Students often use visual and kinesthetic modalities, for instance gesture and diagram, in decoding the story of a word problem. The relation between logic and the visual modality, however, adds further complexity to the study of such activity. There may be radical mismatches between these modalities. For instance, O’Halloran (2007) argues that the English language does not typically afford resources for rendering logical relations in spatial terms. In other words, there is ample temporal vocabulary for rendering logical relations (next, now, then), but there seems to be no spatial language in English that might also capture logical relations. According to O’Halloran (2005), language did not evolve “the same potentiality for spatial logical relations” and instead “‘logical deductions’ based on spatiality [are] performed through visual means rather than depending upon formalized linguistic and symbolic selections” (p. 145). For example, the linguist Martin (1993, p. 179) omits all reference to spatial language when he classifies logical relations in language in terms of (1) additive (addition, alternation), (2) comparative (similarity, contrast), (3) temporal (simultaneous, successive), and (4) consequential (purpose, condition, consequence, concession, manner).

O’Halloran’s argument suggests that we need to study students’ use of diagrams and gestures and other visual and kinesthetic modalities, for their distinctive capacity to render “spatial logical relations.” Perhaps the complex grammar of verb forms is linked to these visual modalities in ways that contrast with the usual alphanumeric rendering of logical relations. Staats and Batteen (2009) have shown how word problems that entail motion verbs, like slide, drop, and pull, can be seen as indexical words because of the way they “anchor the motion of the verb” in the spatial context of the diagram rather than the actual spatial context where the “real” motion was occurring (p.59). This fact points to the complex ways in which language is fused with particular diagramming habits. This also points to the unexamined impact of tense (present, future, conditional) on the way these verbs are taken up in students’ decoding of the logical relations encapsulated in diagrams.

Radford (2009) explored how middle school students work with word problems involving relative motion, attending to the multimodality of learning, which he describes as involving the body (kinesthetic actions, gestures), signs (symbols, graphs, words), and other artifacts (rulers, calculators). He attends to the “gestural, kinesthetic, symbolic and discursive activity” as students attempt to make sense of such problems (p. 52). Attention to these other modalities helps focus our attention on the material interaction in the classroom where teachers have always had to discuss these problems with their students by “schematically enacting the moving objects through gestures or body movements” (Radford 2009, p. 47).

Radford (2009) argues that our current interest in developing student capacity with Cartesian distance-time graphing techniques reflects modern conceptions of time and motion. With reference to Medieval word problems about motion, Radford shows how time is treated in such problems only as implicit within motion, rather than as an independent variable. “It therefore does not come as a surprise that in many Medieval and early Renaissance mathematical problems that were accompanied by drawings, time remains expressed in the perceptual motion of the moving objects” (Radford 2009, p. 47). Thus, the Medieval approach to such problems is to proceed “by a comparison of traveled distance” whereas the modern approach would be to express the problem in terms of velocities. He points out that Medieval problem solvers would not have thought in terms of small quantified units of time, but chunks of duration or chunks of movement. The Medieval approach was part of a cultural tradition that kept the motion of two or more moving objects as relative to each other rather than incorporated into a “unifying system of reference” (Radford 2009, p. 49). He then suggests that the modern approach aligns with the emergence and dominance of alphanumeric or algebraic strategies. The modern approach is precisely what allows for the timeless present of the algebraic solution, in which word problem verbs are decoded in terms of the present tense of to have and to be.

For Radford (2009), the tension between a “phenomenological space of imagined motion” and a Cartesian space of representation, in which time is quantified and represented on an axis, is a partial cause of student confusion with such problems (p.59). In his case study of a group of middle school students, he shows how student gestures mediate this tension. The students’ gestures enact the motion of the event, and then enact the passing of time. He argues that a process of objectification occurs as the students begin to exhibit an understanding of space-time relations captured in the Cartesian graph.

This research suggests that word problems about relative motion create unique challenges for students because the grammar relates two movements rather than situating one movement within an absolute frame of reference. In these kinds of problems, time is embedded within the situation in ways that make it harder to extract as an independent variable. In other words, the timeless present of an algebraic solution is all the more awkward. As Sherin (2000) suggests, more research is needed on student inventive diagramming, to study how alternative diagramming conventions might serve students in engaging with such problems. Word problems with complex tense, aspect, and mood furnish opportunities to develop these alternative conventions and may also help students explore “spatial logical relations” in ways that cannot be captured in Cartesian graphs. In the case study that we discuss below, we explore a word problem that is saturated with different tenses and moods and entails complex logical relations that introduce a counterfactual. In other words, the word problem does not lend itself to Cartesian or algebraic solution strategies and indeed calls for what Radford would describe as a Medieval approach. In the next sections, we introduce the case study and discuss how teachers analyzed the word problem in a lesson study group.

Lesson study group

In this section, we discuss data from a 3-year research project studying teachers working in inner city “under-resourced” middle schools in New York City. Participating teachers joined a lesson study group that met regularly to study classroom practice. Lesson study is a form of professional development for teachers that involves collaboratively exploring, planning, implementing, observing, reflecting, and revising lesson plans. Unlike other lesson study groups, ours focused on the social semiotics of mathematics teaching and learning. Social semiotics draws on a wide range of theoretical traditions to study the way that people use “semiotic resources” defined broadly to include all “actions and artefacts” used to make meaning (Van Leeuwen 2005, p.3). Social semiotics explores how various semiotic modes (language, gesture, diagram, etc.) are integrated or combined in multimodal artifacts and events (Morgan 2006). Many of the analytic concepts in social semiotics are derived from Halliday’s (1978; 2004) systemic functional linguistics (SFL) and have been extended, elaborated, and amplified so as to study modes other than language. O’Halloran (1999, 2000, 2003, 2005, 2007) has shown how social semiotics can shed light on the “multimodal grammaticality” that binds visual, symbolic, linguistic, and other elements together in mathematics texts (O’Halloran 2007, p. 205).

During the first year of the project, we focused on the need to access and use diagrams during instruction and to facilitate student use of diagrams. We also focused on the conjunction of diagram, gesture, and language use. During lesson study sessions, teachers engaged in a series of activities that focused on diagrams (as objects) and diagramming (as activity) and on the multimodal mediation of diagrams through gesture and language. Lesson study sessions explored diagramming through such prompts as the following: What do you notice about this diagram? If this diagram had a title, what would it be? What does this diagram prove? How would you make this diagram? Does the diagram conceal its method of construction? All of these activities were meant to develop creative habits around diagramming in general (de Freitas 2012). Participants also studied classroom transcripts and created “interaction maps” (de Freitas and Zolkower 2010, 2011) to determine what role a given diagram was playing in examples of classroom interaction.

The case study: drenched in time

In year 2 of the project, we focused on a series of routine and nonroutine word problems consisting of complex verb forms and pertaining to motion. The following problem comes from an English translation of a 1956 collection of Russian word problems.

One day a young man and an older man left the village for the city, one on horse, and the other in a car. Soon it was apparent that if the older man had ridden three times as far as he had, he would have half as far to ride as he had, and if the young man had ridden half as far as he had, he would have three times as far to ride as he had. Who rode the horse? (B.A. Kordemsky 1992, p. 102)

Since one of the lesson study objectives was to cultivate teacher capacity to modify problems for classroom use, teachers were given the word problem as is, despite it being obscure and unrelated to an NYC context. Moreover, the complex use of tense, aspect, and mood in the problem offered an excellent example to work with. We draw attention to five details:

  1. This word problem breaks with the genre as described by Gerofsky (1996) insofar as it resists an algebraic decoding. The grammar of the problem encodes implicit logical relations, as she suggests, but it does not point to the algebraic as the preferred solution strategy. The relations are thus not mapped easily onto arithmetic or algebraic operations.

  2. The past tense of the event positions the problem solver outside the event, as though looking down at (or perhaps back at) the two men and the various distances between them at different moments.

  3. The expression “soon it was apparent” positions the problem solver in two ways: “soon” speaks to a future ahead of a particular moment, while “it was apparent” locates a viewing position within the story world from which to judge the race.

  4. In this manufactured present of the snapshot moment (point no. 3) in which the problem solver is positioned in the past event, there is an aspectual relation to the ongoing activity rendered grammatically in “he had ridden” by which the continuous motion of the men is stopped and perceived as a completed event.

  5. The conditional or subjunctive language of “If he had…” creates a second layer of spatial/temporal experience related to a counterfactual, requiring a second correlative diagram (or other modal rendering) relating the “is” to the “what if.”

The teachers worked on the problem individually and jointly, exploring various strategies. Five of the six teachers initially struggled with an algebraic solution, until they saw the sixth teacher Ana’s diagrams (Fig. 1). Ana was asked to explain her approach to the others, at which point she began writing on the page (adding to the page the algebra and number line down the right side) to show the others how one might use algebraic notation to think about her diagrams.

Fig. 1 Ana’s strip diagrams

The diagrams are extremely effective in capturing the relative motion of the two men. The old man and the young man are represented in terms of blocks or lengths or strips, capturing the durational aspect of their motion (see Fig. 2).

Fig. 2 Ana’s diagram: factual and counterfactual conjoined

These diagrams effectively position the problem solver within the motion, so that the durational quality is adequately captured. The extended visual strip is extremely effective in capturing this duration of the event in time while affording different visual cues (such as dotted lines, shading, and labels) to designate before and after the moment of observation. The diagrams thus allow for a visualization of the entire journey, which positions the problem solver in terms of both perfect and imperfect aspect. In addition, the subjunctive or counterfactual event is tied to the actual event through the juxtaposition or joining of the four strips.

In Ana’s main diagram (redrawn in Fig. 2), the upper left block labeled “old man” is used as the proportional or relational unit rather than an absolute distance and is thus labeled 1/5. These numerical labels came after Ana completed the diagram, but before she added the algebraic labeling seen in Fig. 1. In the top strip, the unit old man is repeated three times (since this makes the unit 1/3 as far, “if the older man had ridden three times as far as he had”), and then, this same extension is repeated to produce another two old man units (since this makes the first extension one half of the total extension, “he would have half as far to ride as he had”). The strip below is shaded to designate the counterfactual situation. The bottom two strips designate the counterfactual and the actual trip of the younger man. The young man’s actual trip is equivalent to four units of the old man. If he went half as far, he would have to travel three of those units to reach the end. In actuality, he needs to travel one such unit to reach the end. She writes “1/5 to go,” which refers to what remains to travel.
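The proportions that Ana reads off the strips (1/5 for the older man, four such units for the younger man) can be checked against the wording of the problem. The sketch below is our own illustration, not Ana's method: it takes the whole journey as 1 and introduces a hypothetical helper, fraction_ridden, that solves the single relation shared by both sentences of the problem.

```python
from fractions import Fraction


def fraction_ridden(k, m):
    """Fraction x of the whole journey already ridden, given that riding
    k times as far would leave m times the actual remaining distance:

        1 - k*x = m * (1 - x)   =>   x = (1 - m) / (k - m)

    The journey length is normalised to 1 (an assumption of this sketch).
    """
    k, m = Fraction(k), Fraction(m)
    if k == m:
        raise ValueError("no unique solution when k equals m")
    return (1 - m) / (k - m)


# Older man: three times as far ridden leaves half as far to ride.
old = fraction_ridden(3, Fraction(1, 2))
# Young man: half as far ridden leaves three times as far to ride.
young = fraction_ridden(Fraction(1, 2), 3)
print(old, young)  # 1/5 4/5
```

The result agrees with the diagram: the older man has covered one unit and the young man four. On the assumption that the car is the faster vehicle, this suggests that the older man rode the horse.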

This diagram became a pivotal tool in the development of the teachers’ lessons. The other teachers realized that the diagram fused four strips. In other words, there was a strip for each man and a strip for the event and the counterfactual event. Thus, each man was given two strips—the “actually there” and “what if.” Moreover, these diagrams were fused together to allow for comparison. The teachers we discuss below habitually engaged word problems, regardless of their content, with the usual algebraic habits of introducing variables and equations, when visual strategies, or a combination of diagram and symbolic notation, might have been better suited to the task in terms of both efficiency and expressiveness. One of our objectives in the research project was to disrupt this ingrained habit of generating equations, in the hope that they might expand their problem-solving strategies. In the next section, we discuss three different lessons developed by the teachers in the lesson study group based on their exploration of the word problem drenched in time.

Three different lessons

The data discussed in this section come from three different grade 8 classrooms in two different schools in Brooklyn, NY. Three lessons were videotaped in each teacher’s classroom, each term for 2 years. These lessons were based on activities in the lesson study sessions. Video data were transcribed and analyzed for how the teachers were using multimodal moves, using a rubric published in de Freitas and Zolkower (2011). In the first example, there were 31 students, 14 of whom were ELL, primarily of Russian and South Asian descent. In the second and third examples, there were 27 students and 26 students, over half in each case using English as a second language. During discussions in the lesson study sessions, teachers brainstormed ways of teaching with this word problem, given the particular needs of their context. Each teacher indicated that they would use Ana’s strip diagram in some capacity, but also altered the problem or introduced it to their students differently. We focus on these three teachers because they offer evidence of how the activities in the lesson study sessions were taken up in classroom practice. For all three teachers, the data discussed here were obtained from the first and only attempt to explore the word problem with their students during the research project. In each case, diagrams and gestures were conjoined with language and other semiotic resources as a way of exploring the different aspects embedded in the problem.

Lesson 1: positioning the observer

The teacher Bonnie changed the context for the word problem to racing turtles, and radically altered the grammar of the written word problem. She introduced an action—taking a picture or photograph of the race as it occurred—that emphasized the instantaneous act of measuring the relative positions, and thus instantiated the arbitrariness of the moment when the observation was performed. This use of the picture taking also invited the students to begin visualizing and, potentially, begin diagramming. She also introduced a given quantity for the race (10 m) that allowed students to access measurement skills as they began to unpack the grammar and logic of the relative positions.

Two turtles are racing on a 10 m table. Someone took a picture of the race. In the picture, the blue turtle had traveled some distance. If he had traveled three times that distance, the distance he would have left is equal to half of his actual remaining distance. How far did he go? The yellow turtle also went some distance. Imagine the turtle had gone half that far. His imaginary remaining distance is three times his actual remaining distance. How far did he go?

Compare the two turtles. Who went further? By how much? If they continue on, who will win?

In this case, the word problem has been situated more in the present tense (“Two turtles are racing”), and there is an attempt to explore the aspect of the observer who takes the picture at some moment during the race. However, rather than position the student as the picture taker, and the act of picture taking in the present (“You take a picture of the race while they are racing”), Bonnie introduced a third person perspective or aspect and used the past tense (“Someone took a picture”). This phrase forced students to occupy a perfect aspect or external perspective so that they might immediately make sense of the algebraic decoding of the image in terms of x and 10 − x (see Fig. 3). In particular, the inscription 10 − x demands that one construe the current position of a turtle in terms of its yet to be completed race.
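The decoding in terms of x and 10 − x can be tested directly against the two conditions in Bonnie's version of the problem. The following sketch is only an illustration under our own assumptions: it checks candidate distances on a half-metre grid rather than solving an equation, and the names remaining, blue_ok, and yellow_ok are hypothetical.

```python
from fractions import Fraction

TABLE = 10  # length of the race in metres, as given in the problem


def remaining(travelled):
    """Distance still to go after `travelled` metres (the 10 - x of the text)."""
    return TABLE - travelled


def blue_ok(x):
    # "If he had traveled three times that distance, the distance he would
    # have left is equal to half of his actual remaining distance."
    return remaining(3 * x) == remaining(x) / 2


def yellow_ok(y):
    # "Imagine the turtle had gone half that far. His imaginary remaining
    # distance is three times his actual remaining distance."
    return remaining(y / 2) == 3 * remaining(y)


# Candidate distances in half-metre steps from 0 to 10 m (an assumed grid).
candidates = [Fraction(n, 2) for n in range(0, 2 * TABLE + 1)]
blue = [x for x in candidates if blue_ok(x)]
yellow = [y for y in candidates if yellow_ok(y)]
print(int(blue[0]), int(yellow[0]))  # 2 8
```

On this check the blue turtle has gone 2 m and the yellow turtle 8 m, so the yellow turtle is 6 m ahead at the moment of the picture. Each condition compares an "imaginary" remaining distance with an "actual" one, which is the pairing Bonnie emphasizes in class.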

Fig. 3. Emphasis on describing the counterfactual as an “imaginary” situation

For Bonnie, the emphasis was on describing the counterfactual as an “imaginary” situation and on helping the students make sense of “if… then” statements. As seen in the transcript excerpt below, Bonnie worked the problem through a series of PowerPoint slides and a whole-class conversation, emphasizing the immediate translation into algebraic notation. She used the word “imaginary” repeatedly to designate the counterfactual scenario and the word “actual” for the factual scenario.

Lesson 2: haptic gestures

The teacher Malou opened the lesson with a leading question: “How can you draw time?” Students proposed the answers: “a clock,” “a timeline,” “10:00,” and “Rate = distance/time.” Before she introduced the word problem drenched in time (which was given as is, without alteration), students were asked, “What is the difference between the word ‘had’ and ‘have’? Write sentences using both.” They discussed the difference between the simple past and present. The teacher emphasized “If I had studied for the test, I would have done better,” saying “If, if, if… means that he didn’t do it,” thereby explicitly introducing the grammar of the counterfactual embedded within the word problem. Malou worked with the students in small groups for most of the lesson, moving from group to group and encouraging them to generate diagrams.

In Fig. 4, two students discuss the problem and gesture the counterfactual for the old man. Malou interacts with the group, first copying the gesture of the student. She rhythmically pinches the air and moves her hand laterally, as though forming a segmented strip, and asks “what were you doing with your hands?” Malou literally moves her gesture onto the page, so as to trigger student drawing of a segmented strip (see step 4 in Fig. 4).

Fig. 4. Teacher and student gesture the strip

A different group of students tried to generate a timeline diagram (Fig. 5). The aspect of each verb is visualized using left-right overlapping segments, so that simultaneity and duration are represented. In the far left, “he rode” is written for each racer. The conditional mood “would have to go” is placed to the right (the future) and somewhat shifted up from “could have” for the old man. The “could have” and “would” are close together and in the same segment for the younger man (below), but a dark line breaks this line to indicate the moment at which the two racers are compared.

Fig. 5. A different group of students tried to generate a timeline diagram

Fig. 6. The prosody of speech

Fig. 7. A play about a race between Mr. C and Mr. Z

Lesson 3: intoning the temporal

The teacher Allen began the lesson by taking up a homework task related to the problem. This task asked students to take a pink strip labeled AB and find “Three times as far as half as far as A to B.” The lesson began with Allen using the Elmo (an interactive document camera and projector) to discuss the students’ work with them, each time showing the strip on the Elmo and marking the strip as he spoke. One of the affordances of the Elmo is that the students see projected on the wall not only the student work and the markings as they are introduced by the teacher, but also the shadow of his hands as he points at different parts of their work. Thus, they can track the interplay of these different dimensions of visual modality (Chen and Herbst 2013) (Fig. 6).
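The nested comparative phrase in the homework task can be unpacked from the inside out. The following is a sketch of one reading of the phrase, in which the notation |AB| for the length of the pink strip is our own label and not part of the task as given.

```latex
% Read the phrase from the innermost comparison outward.
\begin{align*}
\text{half as far as } A \text{ to } B &= \tfrac{1}{2}\,|AB| \\
\text{three times as far as that} &= 3 \cdot \tfrac{1}{2}\,|AB| = \tfrac{3}{2}\,|AB|
\end{align*}
```

On this reading the answer is longer than the strip itself, so marking it requires extending the strip by half its length beyond B.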

Allen uses deictic terms (“this,” “here,” “that”) as he points at and draws on the student work. He then introduced a one-page theatrical play or staging of the word problem that he composed to make the problem more performative and to help the students rehearse the language. Allen handed out the play and invited students to read the play aloud, two or three times until they had mastered the awkward language. This allowed him (and them) to work on the intonation and other oral markers—the prosody of speech—to flesh out the meaning of the phrases. The theatrical play also highlighted the event nature of the problem and positioned the students as calling by phone at various instants to find out the current situation between the two racers, thereby stopping midrace to freeze-frame the event. The following is the play as it was distributed to the students in a hand-out (Fig. 7).

The class read the play aloud three times, struggling over particularly difficult conjugations of the verbs to be and to have. By the third time, however, they read the more challenging phrases faster. After the reading, the students were put into groups of four and told to work on the problem, without direct guidance as to method. All the students operationalized the strip method, echoing the work they had done with the pink strip and the isolated phrase. They then posted their posters throughout the room and were asked to explain their methods to the rest of the class. In this case, the teacher’s questioning focused on how many units the strip was divided into. The following is an excerpt from the class discussion, in which a student, Hong, explains his method.

Discussion

Data from these three classrooms indicates that diagrams, gestures, and intonation were crucial multimodal devices for engaging with the complex grammar of the word problem. In lesson 1, Bonnie translates the time of the event to the present tense and emphasizes the algebraic rendering of the grammar. Her lesson recasts the problem more in line with the genre characteristics listed by Gerofsky (1996), stressing the need to immediately generate algebraic expressions for each clause. She also emphasizes the distinction between the “actual” and the “imaginary” situations and separates the strip diagrams for each of these, rather than fusing them. Introducing a length of 10 m allowed Bonnie to label the “yet to go” distance as 10 − 3x (see Fig. 3). This label was crucial for Bonnie, because it completed the algebraic coding of the diagram. In other words, Bonnie moves toward a more modern approach to the problem. In comparing the three lessons, Bonnie’s lesson involved the most teacher talk and the least group work. In the lesson study group, she had expressed a dislike for all word problems pertaining to motion and time, and she was anxious about her students’ capacity to engage with the problem. She was also inclined to solve problems algebraically rather than diagrammatically. During this lesson, she used limited gesture, which was consistent with her other videotaped lessons. Thus, data from Bonnie’s lesson shows how the algebraic approach is tied to the timeless present tense of the word problem.

In lesson 2, Malou centers the issue of how to use language, diagrams, and gestures to represent time. Her pinching gesture conveys the chunk of movement or duration that Radford (2009) noted in his analysis of Medieval word problems about motion. Because the gesture unfolds in time, it marks the continuity of the unfolding event (the race) and thereby enacts or embodies an imperfect aspect, a moving mathematical now, where the duration of the event is captured. At the same time, the rhythmic segmenting or tempo of the gesture as it moves along laterally cuts out discrete intervals or units of time and evokes a segmented diagram of discrete and equal intervals, like the strip diagram. This data confirms McNeill’s (1992) assertion that “gestures, together with language, help constitute thought” (McNeill 2000, p. 245). Gestures both represent thought and facilitate the emergence of new thinking (Edwards 2009; Roth 2008). According to Alibali et al. (2000), gesture helps speakers package spatial information: “gesture helps speakers to explore alternative ways of organizing a perceptual array, and thus helps speakers to break down a perceptual array into verbalizable units” (p. 595). Malou repeats her gesture on the page, encouraging the student to generate a diagram. Thus, Malou uses the gesture to conjure a “virtual object” and then moves the object to the page (Kita 2000, p. 162). In this case, the gesture captures the relative motion of the old man, quite literally beating a rhythm that captures the proportional distance travelled in that time.

Malou uses gesture to capture the perfective and imperfective aspects of the verbs in the word problem. The duration of the gesture stroke was sustained in the same rhythm to convey the chunk of movement. Thus, the gesture was not only, if at all, an iconic representation of the movement of the men, but it captured the aspect of the verbs through being kinesthetic. This gesture enlivened that which it referenced as a way of capturing the aspect of the verb. In this case, the gesture became part of the temporal contour of the event and allowed students to access and develop a strip diagram approach. As Alibali and Nathan (2007) argue, teacher gestures can help scaffold student engagement.

Malou’s lesson was almost all group work. In Fig. 5, a different group of students mapped the tense, aspect, and mood onto a diagram that sequentially linked them. The first segment, “he rode” is followed by “he could have” and then a third segment “he would have to go.” The group considered all of these as part of one continuous timeline (although there are gaps in the diagram, as each segment is somewhat raised in relation to what comes before it). The middle segment contains the conditional mood “he could have” and also designates the counterfactual situation, while the third segment contains the perfect aspect “would have to go.” These students are confounding various diagram conventions and struggling with the difference between tense and mood. As Sherin (2000) suggested, students often rely on basic conventions about line segments and temporal sequences. In this case, the event corresponding to the conditional mood occurs within the temporal unfolding of actual time.

In the third lesson, Allen directly introduces the strip diagram as a tool before taking up the word problem. The Elmo allows him to integrate gesture, diagram, and intonation before distributing the play. Performing the play allowed students to occupy the position or aspect of the two racers as they read it aloud. The importance of such enactment cannot be overestimated, not only because it links embodied prosodic elements of speech to words, and therefore gets at how we make sense of complex verb forms through such elements, but also because it invites students to take up the aspect of the moving mathematical now. The durational aspect of the race is lived through such enactment. It is worth noting that the theatrical play also allows the students to assign an aspect to the statements of each racer, in that they can quote “Mr. Zad said….” or “Mr. Cap said…” as they work on the problem, and thus, the verb forms of the action of the race are recontextualized or resituated as first person claims by their teacher or the other teacher. In fact, in line 156, Mr. Zad says of the other teacher “that he has ½ as far to go as I…actually…have to go to get to the finish.” Thus, Mr. Zad the speaker is also the racer. The use of the pause and the word “actually” underscores the counterfactual of the problem, although this teacher has not structured the lesson around the distinction between the factual and the counterfactual, stressing instead that the appropriate relative measure must be obtained in the diagram, so that the actions of the two racers can be represented in one diagram. Thus, in this case, the focus is on the relative motion and the central role that a segmented diagram plays in comparing these two movements in one all-encompassing visual rendering.

Allen asked the groups of students to post their diagrams on the wall and explain their method to their peers. In Fig. 8, Hong suggests that “all you need to do is work backwards.” He thus emphasizes the relative positions that each man has taken. In line 151, Hong embodies the imperfect aspect of the verb when he states “and one third times three is one whole” in that the “whole” is indeed only a part of his diagram, an embedded perspective. He repeats this technique when he says, during the same contribution, “if he has half way to go that mean he would be in the middle.” In other words, we interpret Hong as successfully decoding the complex verb forms from different embodied positions within the story world. We selected this excerpt because it underscores the way one group transforms Allen’s theatrical play into a set of logical relations between parts of a diagram. Hong repeatedly uses the language of inference (“because,” “which means,” “that mean”) to argue for his diagram. His references to working backwards and to “the middle,” together with his indexical use of language, underscore how he is decoding the various verb aspects within the problem.

Fig. 8. Excerpt from the class discussion

Conclusion

This article contributes to research on the multimodal dimensions of working with mathematics word problems. Our aims were to show how tense, mood, and aspect are at play in the word problem genre and to show how particular gestures and diagrams were linked to particular verb forms. We analyzed one standard textbook example to show how various space-time relationships operate through the grammar of conventional word problems. We then discussed how teachers in a lesson study group and later in their classrooms engaged with a more complex example of a nonroutine problem called drenched in time. The analysis revealed that aspect is an important grammatical tool for broadening the concept of tense, so as to better recognize the felt experience of time. Aspect operates in language as a device to communicate or represent duration, completion, and frequency of an event, and thus allows speakers to attend to relative motion and various observational or embodied perspectives on these events (Bach 1981). The aspect of a verb pertains to the nature of the event and its degree of duration—is it sudden, quickly over, delayed, prolonged, ongoing, infinite? Aspect also situates a speaker or observer in relation to that durational event and is thus highly relevant to concerns about how students and teachers render space-time relations in diagrams.

The case study discussed in this article showed how particular gestures and diagrams served students and teachers as they grappled with complex verb tenses in a word problem. In particular, we argued that non-Cartesian diagrams that depict relative motion are powerful tools for unpacking the implicit logical relations conveyed through tense. We showed that the strip diagram approach allowed both students and teachers to engage with complex word problems about motion that entailed multiple verb forms. This research shows how diagrams can give access to relative motion problems that would otherwise be inaccessible to students. The particular affordances of this kind of diagram allow students to map verb function onto a visualization of relative motion. This contrasts with an algebraic strategy that must first translate the problem into the timeless mathematical present, and also with a Cartesian graph strategy that entails both this translation and the use of an absolute frame of reference. The strip diagram approach, on the other hand, supplied students with visual access to the relative motion. This approach engaged more directly with the grammar of the word problem.