The implementation of the No Child Left Behind (NCLB) Act in 2002 has led to mathematics tests becoming a dominant feature of schooling in the USA, as states were required to develop standards and administer tests aligned with those standards. The Common Core State Standards in Mathematics (NGA & CCSSO, 2010) went a step further in developing uniform standards across the states and promoting the development of assessments aligned with these standards. Both the standards and the assessment initiatives linked performance to school funding. The NCLB Act required schools to report test results disaggregated by subgroups (e.g., race or ethnicity and English proficiency) and demonstrate progress towards measurable goals for each subgroup over time, or face punitive sanctions. For the purposes of making comparisons between states, the U.S. Federal Government required states to participate in the National Assessment of Educational Progress (NAEP) for mathematics and reading. This test is administered to nationally representative samples of fourth, eighth, and 12th grade students every other year. State test results and NAEP results are reported widely, with the focus often being on the achievement “gaps” between various subgroups of students (e.g., Berends, Lucas, Sullivan, & Briggs, 2005; Fry, 2008; Lubienski, 2003).

Among the subgroups, the underperformance of bilingual students (see Footnote 1) has received a considerable amount of attention (Abedi & Lord, 2001; Butler, Stevens, & Castellon, 2007). In this high-stakes testing environment, there is less focus on what bilingual learners know and can do. In this study, we asked how bilingual learners would draw on multiple modes, including gestures, manipulation of concrete materials, and speech, to interact with and explain their solutions to NAEP assessment items. We suggest that a multimodal perspective (Jewitt, 2009; Kress, 2010) can provide a more nuanced understanding of bilingual students’ mathematical thinking that goes beyond the deficit perspectives that a focus on their test scores can convey. While this study is situated in the USA, issues around assessment and accountability are certainly not unique to the USA. High-stakes testing and its related issues have been raised in other countries such as the UK (House of Commons, 2010; West, 2010), Australia (Polesel, Dulfer, & Turnbull, 2012), and Slovakia (Minarechová, 2012), to name a few.

1 Theoretical framework

In this article, we draw on a multimodal social semiotic framework of representation and communication (Jewitt, 2009; Kress, 2009, 2010; Kress, Jewitt, Ogborn, & Tsatsarelis, 2001; van Leeuwen, 2005) to frame the bilingual students’ responses. According to Kress (2010), “A mode is a socially shaped and culturally given resource for making meaning” (p. 54). More recently, Edwards and Robutti (2014) describe a framework of multimodality based on the ideas of embodied cognition. They view modes “as the entire range of cultural, social, and bodily resources available for receiving, creating, and expressing meaning” (p. 12). Modes offer various affordances for making meaning that are shaped through their use in social, cultural, and historical contexts and through the nature of the mode itself (Kress, 2009). These affordances also extend to making mathematical meaning (Edwards & Robutti, 2014; Edwards, Ferrara, & Moore-Russo, 2014).

Multimodality assumes that communication and representation involve multiple modes that combine to form a “multimodal ensemble” (Jewitt, 2009, p. 14). In this context, speech and writing are not viewed as primary modes, but two of many modes that a person could draw on to communicate. Thus, modes including gestures and gaze are not seen as add-ons to speech, but are important in their own right and contribute to the meaning-making process. The nature of the mode also highlights certain features that can be drawn on during communication. Whereas speech is sequenced in time and can be used to convey events in order, images, on the other hand, are framed in space with the different elements combining simultaneously to shape meaning. For gestures, both time and space are used to develop meaning in interactions (Kress, 2009).

Within multimodality, learning is viewed as “a dynamic process of sign-making” (Kress, Jewitt, Ogborn, & Tsatsarelis, 2001, p. 27). Students draw on the various networks of resources available in the modes as part of their cognitive processes and learning. Multimodality extends previous frameworks in which learning was viewed as a student’s participation in the classroom mainly via speech and writing, i.e., language. With multimodality, student participation is viewed through a broader range of modes (Kress et al., 2001). In this study, we apply the ideas of multimodality to the assessment of bilingual students.

2 Assessing bilingual learners

Our study is based in the USA, where the proportion of bilingual students in the school population grew from 9% to 21% between 1979 and 2008 (Aud, Hussar, Planty, Snyder, Bianco, Fox, Frohlich, Kemp, & Drake, 2010). Bilingual students who speak a language at home other than the language of learning and teaching (LoLT) can be challenged in the classroom (Barwell, 2009, 2012; Gorgorió & Planas, 2001; Moschkovich, 1999; Setati, 2005). While students whose home language is different from the LoLT may be able to converse in the LoLT outside the classroom, they require more time to learn the special language features of mathematics (Cummins, 2000).

Paper-and-pencil tests can be challenging to all students, but especially bilingual students who are in the process of learning the academic language. The written mode dominates both the presentation of the task and the response (unless it is a multiple choice item). Examining the thinking of students in large-scale paper-and-pencil tests can be problematic as the students’ knowledge can be under- or overestimated (Baxter & Glaser, 1998). A number of validity and reliability issues have been raised with the use of large-scale assessments with bilingual students (Abedi, 2002; Lane & Leventhal, 2015; Martiniello, 2009; Young, Cho, Ling, Cline, Steinberg, & Stone, 2008). A major concern for the bilingual students is associated with the construct-irrelevant linguistic complexity in the assessment tasks (Abedi & Lord, 2001; Abedi, Lord, & Hofstetter, 1998; Martiniello, 2008, 2009). By analyzing two large-scale assessment tasks in detail, Campbell, Adams, and Davis (2007) illustrate how the mathematics and language demands can combine to be more challenging for bilingual students than native English speakers. As a whole, these studies raise questions about the results of large-scale assessments as indicators of the mathematical proficiency of bilingual students.

In an effort to get more accurate measures of bilingual students’ mathematical understanding, the research discusses the use of accommodations like extra time and the use of a dictionary (Abedi, Lord, Hofstetter, & Baker, 2000). Some research points to possible ways to improve validity of the assessment of bilingual students through the task. Abedi and Lord (2001) demonstrated that when the language of the items was modified to reduce the linguistic complexity, the performance of the bilingual students improved relative to that of the native English speakers. In another study, Martiniello (2009) found that including schematic representations (e.g., diagrams or symbols that indicated spatial or numerical relationships) within the task helped bilingual students mediate the linguistic complexity and increase their chances of success.

In our quest to elicit bilingual students’ mathematical thinking, we sought to use an assessment task that would afford students ways to demonstrate their thinking in modes that went beyond writing to include speech, gestures, and the manipulation of concrete materials. We elaborate on the features of the task in the methods section.

3 Multimodality in students’ mathematical explanations

Responses to test items in the NAEP and other large-scale assessments are mostly in the written mode. In our efforts to understand how other modes could be included in the student response, we isolated prominent themes related to multimodality in the literature.

McNeill (1992) was one of the pioneers in integrating gestures as part of communication. He considered speech and gesture as tightly interlinked and part of the same mental process. This work changed the view of gestures as simply an add-on to speech. A number of studies demonstrate how students can express complex and new ideas in mathematics by drawing on multiple modes in interactions (Arzarello & Sabena, 2014; Arzarello, Paola, Robutti, & Sabena, 2009; Bjuland, Luiza, & Borgersen, 2008; Edwards, 2009; Marongelle, 2007; Nemirovsky, Kelton, & Rhodehamel, 2013; Radford, 2003; Radford, Bardini, & Sabena, 2007; Rasmussen, Stephan, & Allen, 2004; Reynolds & Reeve, 2002). For example, Radford, Bardini, and Sabena (2007) illustrate how students can perceive and express a general pattern through a combination of modes such as words (speech), gestures, actions, and rhythms. In their analysis of the group interactions of ninth grade students examining patterns, different students chose different modes to forefront their sense of generality. For example, a student intertwined gestures, speech, and the diagram provided (see Fig. 1) to explain her strategy of finding the number of circles associated with Figure 10 (in the visual pattern).

Fig. 1 Sequence of figures (Radford et al., 2007, p. 508)

Since Figure 10 is not perceptually available for a simple counting, the student gestured the two rows in the air and at the same time stated the number of circles. The student’s method was based on a general scheme of counting developed from noticing that the number of circles in the top row was one more than the number of the figure and the number in the bottom row was one more than that in the top row. According to the researchers, the gestures played a key role in objectifying the general counting scheme. Later, the students in the group make gestures at a higher elevation from the desk as they work out the number of circles in Figure 100. In another case of the same problem, a prosodic analysis of the combination of gesture and speech of a student’s explanation reveals a rhythm that the student uses to emphasize certain features of the figures, at the same time de-emphasizing other features. This rhythm allowed the student to express the regularity in the pattern and, hence, convey a notion of algebraic generality. The study conducted by Radford et al. (2007) demonstrated that the students use various modes to express generality. The researchers claim that the texts produced by the students have a different “texture” from ones that consist of alphanumeric signs, but are nonetheless also valid ways of engaging in the process of algebraic generalization.

Concrete materials and drawings play an important role in students’ multimodal explanations. These modes can act as “material anchors” (see Footnote 2) on which other modes like gestures and speech can be overlaid to develop explanations (Hutchins, 2005). The stability afforded by concrete materials can hold fixed certain relationships as students test out their mathematical ideas. The anchors serve to reduce the cognitive demands by not requiring students to imagine features of the problem as they formulate their explanations. Roth and Welzel (2001) demonstrate how students can develop explanations of science experiments before being introduced to the underlying concepts (e.g., electrostatics). These explanations draw on gestures and speech and are anchored in the experimental equipment in front of the students.

In examining the explanations of young children in Piagetian conservation tasks, Alibali, Kita, and Young (2000) found that the children’s gestures occurred just prior to their speech and served to organize their verbalization. Reflecting on the same data, Alibali, Church, Kita, and Hostetter (2014) note instances when the children conveyed different information in their gestures and speech. They conclude that these gestures are reflective of the children’s perception- and action-based knowledge that they have gleaned through experience for which they are still developing verbal ability at that stage.

An interesting parallel can be drawn between the explanations of these young children and those of bilingual students who are still learning both new content and its associated academic language in a language different from their home language. There is evidence that bilingual learners employ gestures to produce a comprehensible output when they are challenged by lexical and grammatical difficulties as they try to use their L2 (Evans & Rubin, 1979; Gullberg, 1998, 2011). Further, research in communication shows that gestures are especially useful when speech may not be understood by the listener (Church, Ayman-Nolley, & Manhootian, 2004; Gullberg, 1998, 2011).

There are a few studies in mathematics that discuss the use of modes other than speech and writing by bilingual students (e.g., Domínguez, 2005; Fernandes, Civil, & Kahn, 2014; Moschkovich, 2002; Takeuchi, 2015). These studies demonstrate how bilingual students can provide mathematical explanations by drawing on gestures, speech (including home language), and representations. For example, Moschkovich (2002) provides an example of the role of gestures by a middle school Spanish bilingual student explaining her solution to finding the rectangle (with integer dimensions) with an area of 36 that had the largest perimeter. This student could not recall the proper vocabulary of length, width, and rectangle as she described her pattern. Instead, she used gestures, the drawings in front of her, and the word rángulo (incorrect but approximate word for rectangle in Spanish) to describe the correct pattern.

These studies show that students who are beginning to learn the academic language can develop sophisticated explanations when they draw on and coordinate multiple modes. The use of material anchors facilitates the process of communication as they hold fixed certain features that the students want to elaborate as part of their explanation. In our study, we seek to exploit modes that go beyond writing and speaking to understand how bilingual students solve and explain their thinking about a NAEP area problem that affords the use of multiple modes. We specifically examine how bilingual students draw on speech, gestures, and concrete materials to construct mathematical explanations. While all students could draw on multiple modes to solve and explain their thinking, the use of multimodality could mean the difference between bilingual students being positioned as knowers of mathematics and the current dominant view of them as students who struggle with mathematics.

4 The study

4.1 The participants

Our motivation to interview Latina/o bilingual students was initiated by Lubienski (2003) who reported that, among the five content areas in the 1996 NAEP, the difference between White and Hispanic students (categories labeled by NAEP) was largest for the measurement strand across the fourth, eighth, and 12th grades (the three grade levels at which the NAEP is administered). As a consequence, we wanted to understand challenges and also examine possible resources that students drew on while working with these measurement problems. The label “Hispanic” in NAEP does not imply that these students were English learners. For our study, however, we chose to focus on Latina/o students who had been classified as English Learners by the school district in which they were enrolled. For this article, we focus on 26 middle grade students (ages 11–13) who were interviewed, in English only, across two sites in the USA. Among the 26 students, there were 12 sixth graders, 12 seventh graders, and 2 eighth graders. All students were given three to four measurement questions from the NAEP. The participants were bilingual and their levels of proficiency with English were quite varied. In the sample, we have chosen to highlight a group of students that are not quite proficient in English. We chose these students because they provided more information on their solutions and we were better able to engage in conversations with them.

4.2 The task

The sample of 26 students was interviewed with a pool of six NAEP tasks in measurement (content classification by NAEP). Since the NAEP tasks were given to students in the fourth, eighth, and 12th grades, we had to balance the level of challenge for tasks that were used in the interviews with students in the other grade levels (e.g., sixth grade). We wanted to begin with a problem that we thought would be accessible to the student and gradually build on more challenging tasks towards the end of the interview. Some details about the six NAEP tasks are provided in Tables 1 and 2. These tasks were chosen based on their potential to foreground linguistic challenges and resources that the student brought to the interview.

Table 1 NAEP tasks for the interview
Table 2 NAEP task for the interview (chosen for analysis in this article)

In this article, we examine the bilingual students’ multimodal communication in the context of the Area Comparison task (Fig. 2), which is categorized as “hard” on the NAEP and was administered at the fourth, eighth, and 12th grade levels. Unlike four of the tasks that provided an associated diagram and another task that provided a grid, the Area Comparison task provided physical cutouts. Given our interest in multimodal communication of the students’ mathematical thinking, we conjectured that the manipulation of the cutouts would afford the bilingual students more ways to express their thinking. The Area Comparison task included four cutouts of shapes: two identical right triangles (each labeled P) and two identical squares (each labeled N). One leg of the triangle was the same length as the side of the square, and the other leg was double the side of the square. The task offered students various entry points to develop a solution. For example, students could draw on the requisite formulas after assuming that the sides of N are x and the height and base of P are x and 2x, respectively. They could work out the areas of N and P and determine that they are the same. On the other hand, students could associate numbers with the lengths of the sides of the shapes and try to work out the areas using the formulas. The shapes provided the students with material anchors that they could manipulate to solve the task and construct an explanation that could draw on multiple modes. One aspect of a material anchor is that it encapsulates the relationships in the task, which helps the students’ thinking (Hutchins, 2005). In this case, the cutouts encompassed the proper measurements that remain invariant as the students manipulate the shapes and explore relationships between the areas.
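The formula-based entry point described above can be written out as a short worked calculation. This is only a sketch under the assumptions stated in the text, namely that the side of N is x and that the legs of P are x and 2x; the symbols A_N and A_P are introduced here for illustration and do not appear in the NAEP item.

```latex
\begin{align*}
A_N &= x \cdot x = x^{2}, \\
A_P &= \tfrac{1}{2}\,(2x)(x) = x^{2}, \\
\text{so}\quad A_N &= A_P .
\end{align*}
```

With a hypothetical numerical choice such as x = 3, both areas come to 9 square units, which corresponds to the second entry point of assigning numbers to the side lengths.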

Fig. 2 Area Comparison task. Adapted from the NAEP (1996) published item for fourth grade described as “Compare areas of two shapes.” The original NAEP version included pictures of the children along with dialogue bubbles containing the statements (e.g., Bob saying “N and P have the same area”); we have not reproduced these pictures in this figure. There are also slight variations in wording between this fourth grade version and the eighth and 12th grade versions.

4.3 The interview

In this article, we examine how bilingual students use multiple modes to interact with the Area Comparison task and explain their solution. One or two interviewers interviewed the 26 students, and the interviews lasted approximately 45 minutes. The student worked on three or four NAEP tasks from the pool of six during each interview. The 26 interviews that we analyzed for the purposes of this article were conducted in English only and the researchers did not attempt to use the native language of the student (if the researcher could speak it). Each student was asked to first work on the task independently and then discuss her/his method with the interviewer. Note that, in some cases, the interviewer assisted the student in comprehending the task if the student asked for clarification. If the student provided one solution, the interviewer would ask the student if he/she could solve it another way. In these interviews, there was usually one primary interviewer; however, in some cases, there was a second researcher who was in charge of the video camera and who would occasionally interject a question for the student. All of the 26 interviews were digitally recorded and used for the analysis. The interviewer used multiple probes to confirm and reconfirm the student’s thinking to determine whether it was stable as the interview progressed. The interviewer made a conscious effort not to lead the student in the interactions. However, prompts were provided to extend the interview in instances where the interview stalled and the mathematics educator thought the student had more to say. For example, the interviewer encouraged some students to play with the shapes when they just looked at the shapes to make an inference. We conjectured that this task with the cutouts may have been different from the usual mathematics tasks that they encountered in class and, thus, they may have been reluctant to use the concrete materials. Note that a prompt to use the materials did not automatically lead the students to a solution of the task. In some cases, the students used the right triangles to form a parallelogram or kite, and in some cases, the students used all four cutouts to form a large square. In each of these cases, the students had to rethink their approach to the task.

5 Analysis

Our analysis of the digital recordings of 26 student interviews was guided by the seven phases of the framework of Powell, Francisco, and Maher (2003) and the multimodal interaction analysis of Norris (2004). In the framework of Powell et al., the digital records, not the written transcripts, form the primary data for analysis. Given that our goal was to examine the multimodal interactions of the students, we decided to implement the direct analysis of the video records. The seven phases, outlined by Powell et al. (2003), are attentive viewing, describing, identifying critical events, transcribing, coding, constructing storyline, and composing narratives. Powell et al. describe critical events as events that are tied to the research question being answered. In our analysis, an event was labeled “critical” if the student drew primarily on modes other than speech and writing to construct key mathematical ideas related to the solution of the task. Norris (2004) uses the construct of modal density to identify the importance of a mode in constructing the higher-level actions (e.g., constructing a mathematical explanation related to the area task). According to Norris, a high modal density is generated either through the dominance of one mode that is vital to the meaning-making process (modal intensity) or through the complex intertwining of multiple modes with no one mode achieving prominence (modal complexity).

The first author viewed all the interviews multiple times and constructed a data table that included detailed descriptions of the interview, the flow of the interactions, the questions the interviewer posed, and the students’ responses. Verbatim transcription was embedded in the table when the students’ verbal responses were not clear. Besides details of the interview and transcriptions, these entries included the students’ manipulations of the cutouts and their additional gestures that accompanied the explanations. The first author went back and forth between the table and videos to ensure that there was an accurate description of the events in the digital records. In this attentive viewing of the data and constructing of the descriptions in the table, the first author highlighted portions considered to be critical events.

A critical event included two specific determinations. The first included responses to the area problem that had a high modal density in modes other than speech and writing (e.g., gestures, manipulation of shapes, drawings, and inscriptions on paper). The second aspect was defined by whether or not modes other than speech and writing were critical to the student’s explanation of the mathematical task. If only the speech of the student was tracked, would we be able to understand the student’s strategy? In order to be considered a critical event, the answer to this question would have to have been “no.” For example, a student put the Ps together to form a rectangle and then gestured a cut through the middle with the edge of his right palm to explain that he would get two squares like the Ns. He then concluded that the areas of N and P were the same. In this case, positioning the shapes to form a rectangle and overlaying his gestures on these positioned shapes is key to understanding the student’s mathematical thinking. By just focusing on the student’s speech, the mathematical strategy is not clear. In this case, the modal density was achieved through a combination of positioning the shapes and overlaying the gestures on these positioned shapes.

It is important to note that while conducting the interviews, we were not always aware of how powerfully purely visual responses addressed the solutions to the area problems. We agreed that, in each instance, the interviewer pushed himself/herself to understand what the student was saying and encouraged the student to explain himself/herself more fully and in a variety of ways.

Once we agreed on what constituted a critical event, the first and second authors independently highlighted and time-stamped these in the data table. After completing this coding, we resolved portions that did not overlap through further discussions. Some of this analysis was shared with the third author for further confirmation.

We examined the combination of modes—speech, gestures, writing, drawings, and the use of concrete materials—the students used as they discussed the construct of area and isolated 30 episodes for further examination. These occurred with 19 of the 26 student interviews for the Area Comparison task. Each of the 19 students had between one and three critical episodes. Once we had critical events that we agreed upon, we carefully examined each one and determined 11 episodes (with 11 different students out of the 19) where the students used a combination of gestures, manipulation of shapes, and inscriptions to make their argument. This was partly determined by the trajectory of questioning that pushed the students to explain their thinking. Through this process of questioning, we were able to get a more detailed picture about how they were combining the different modes to make mathematical meaning. These 11 episodes illustrate cases where the students relied on some combination of modes, including gestures, manipulation of concrete materials, and inscriptions, to communicate the key mathematical strategies they used to solve the problem. After describing some general results of the interviews, we will highlight two vignettes from these 11 episodes that illustrate the rich multimodal explanations that these students constructed.

6 Results

In general, there were two strategies that the students used to solve the task. The first strategy used one set of shapes (N and P), which the students put on top of each other and observed that the extra part of the triangle (P) could be cut and rearranged into the non-overlapping part of the square (Fig. 3). In the second strategy, students used the two Ps and two Ns to form two congruent rectangles. They observed that the N and P were each halves of congruent rectangles and, thus, had the same area. Based on our interactions with the students, 16 students solved the problem using two shapes and 14 students solved the task using all four shapes. There were three students who could not solve the task.

Fig. 3 Cut and rearrange solution

In the two vignettes that follow, we feature student explanations where a high modal density is generated through gestures and the positioning of shapes. Note that there were students who provided a correct explanation where a high modal density was generated in their speech rather than the positioning of the shapes and gestures. For example, the student would state that the rectangles formed from the Ps and the Ns had the same area (putting the shapes together) and that each P and N was half of the respective rectangle (pulling the shapes apart). We decided to focus on explanations where the mathematical meaning was constructed in modes other than speech, because such explanations could be interpreted as incorrect if only the speech were considered and because they highlight further possibilities for mathematical thinking among bilingual students.

In the first vignette, Maicon uses the first strategy of cutting and rearranging, relying on minimal speech and a twisting gesture. In the second vignette, Adam also uses the cut-and-rearrange strategy, though he goes on to use all four shapes that were provided. These two vignettes illustrate how other bilingual students in the group used a selection of modes to explain their solution to the problem. Letting the students choose the mode that best suited the problem was a powerful way of acknowledging their capacity to reach a solution in a mode that may not be considered appropriate in assessments. These students’ use of a variety of modes was crucial in demonstrating the mathematical reasoning in the solution. Following the multimodal framework, we analyzed the communication by studying the interrelationship of the different modes to understand the intended meaning of the students.

6.1 Vignette 1

Maicon, a seventh grader, initially attempted to solve the problem by just looking at the shapes, an approach that was common with other students too. Based on his observation, Maicon arrives at two seemingly contradictory conclusions. He primarily uses speech to structure his explanations. First, he says that “P was bigger than N around the areas” and, second, that “N was bigger on the inside.” After this interaction, the interviewer (denoted by “I” in what follows) prompted Maicon to interact with the shapes that were provided.

I: See if you can play around with the shapes and come up with some sort of an explanation.

Table 3 shows the sequence of frames that unfold in time as Maicon attempts to solve the problem and explain his solution to the interviewer.

Table 3 Maicon’s explanation of his solution to the Area Comparison problem

When Maicon’s initial interactions are examined through Norris’ (2004) framework of modal density, the speech mode takes on a high intensity and is the primary mode for structuring the interaction. However, his speech remains vague (e.g., “P was bigger than N around the areas”). This vagueness could be a consequence of his still learning the academic language associated with this content area. It is also possible that Maicon was thinking of the everyday meaning of “area” as a place. Thus, by saying, “P was bigger than N around the areas,” he may have been referring to places with different boundaries. The nature of his interactions changes after the interviewer invites him to use the provided shapes. Maicon plays with the shapes and, in the process, formulates a solution. Maicon is able to construct a mathematical explanation by positioning the shapes and using this positioning to structure the interactions with the interviewer. In frames 1 and 2, Maicon correctly combines the Ps to form a rectangle and compares its area to N by placing it on top rather than on the side. Note that the placement was significant given our experience with some students who compared the lengths of the sides of the shapes and concluded that the triangle had a larger “area.” In frame 2, Maicon observes that the narrow part of P that extends beyond N has the same area as the narrow part of the P that is covered. After this comparison, Maicon removes the second triangle (frame 4) and uses the position of the remaining shapes as an anchor to set up his gestures. Note that by placing and then removing the second N, Maicon knows that the base of P is twice the length of the side of N. In frame 5, Maicon intertwines the modes of positioning, pointing, and speech. The speech indicates that he makes a cut, but the positioning of the shapes and his pointing indicate the location of the cut.

In frame 6, Maicon uses the twisting gesture to indicate that A could be cut and rearranged into B (Fig. 3). In terms of transformational geometry, the twisting gesture would represent a counterclockwise rotation of the triangular piece by 180° about a point. Once again, the position of the shapes anchors and gives meaning to the twisting gesture and the speech. By tracking Maicon’s explanation through the frames, we observe that he makes a convincing argument that the shapes have the same area.

Maicon’s explanation is driven by the tactile and visual nature of the shapes. He puts the shapes in various positions and observes the effect of this positioning on his final goal of solving the problem. If a particular position is useful, the concrete shapes allow Maicon to hold this position on the table and overlay his next steps on the positioned shapes (e.g., gesture and speech indicating a cut in frame 5). In addition to helping him solve the problem, the nature of the shapes as material anchors and the overlaid gestures minimize the need for an elaborate verbal formulation, a big advantage at this stage of his English proficiency. Based on the interactions prior to working with the shapes, we conjecture that constructing a mathematical explanation would be more challenging for Maicon if he did not have access to the modes of positioning and gesture.

6.2 Vignette 2

Adam is a sixth grade bilingual student who worked out that the areas of N and P were the same using one set of shapes and a cut-and-rearrange strategy similar to that used by Maicon (Fig. 3). After his initial method, the interviewer encouraged him to provide another explanation that would not involve cutting one piece of a shape and rearranging. After several attempts, Adam came up with a method that involved simultaneously working with the rectangles formed by two copies of N and P, respectively. However, instead of noting that there were two different ways to cut the rectangles in half, Adam focused on dividing the rectangle formed by the Ps using a vertical cut and observed that the resulting squares had the same area as the Ns. He demonstrates this with a complex intertwining of the modes of position, gestures, head movement, and speech.

Table 4 shows the sequence of frames that unfold in time as Adam attempts to solve the problem and explain his solution to the interviewer.

Table 4 Adam’s explanation of his solution to the Area Comparison problem using four shapes

I: Ok so what were you doing?

Adam made a simultaneous cut-and-rearrange argument in which he recognized that the portions A and B have the same area. Thus, instead of cutting A and moving it into the space of B (Fig. 4), he visually recognized that A and B had equal areas and that there was no need to switch their positions. Therefore, when cut down the middle, the two Ps would create two Ns (Fig. 5). Key evidence of this is provided through a careful selection and combination of positioning the shapes, gestures, and speech.
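
The equality that Adam recognizes can also be written out as a short worked computation. This is only an illustrative sketch: the symbol s for the side length of N is an assumption introduced here, not notation used in the analysis, and we assume, as the cut into two squares implies, that the rectangle formed by the two Ps measures 2s by s.

\[
\text{Area}(P) = \tfrac{1}{2}\,\text{Area}(\text{rectangle formed by two } P\text{s}) = \tfrac{1}{2}\,(2s \cdot s) = s^{2} = \text{Area}(N)
\]

Under these assumptions, cutting the rectangle down the middle yields two s-by-s squares, each matching N, without any piece having to be moved.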

Fig. 4 Non-overlapping parts A and B when the shapes are placed on top of each other

Fig. 5 Cut through the center of the rectangle formed with the Ps

Just as for Maicon, the tactile nature of the shapes is important for Adam as he tries various positions with the shapes in his quest to find a second solution. In frames 1 to 4, Adam effectively draws on the affordances of shapes to be positioned as he makes his case in speech and gestures that the areas of the rectangles are the same. Note that the way the sides are lined up could indicate a focus on the lengths instead of area. However, through his speech and pointing in frames 2 and 3, Adam confirms that he is comparing the areas by indicating the placement of the shapes on top of each other. In frames 5 to 8, Adam once again draws on the flexibility of moving the shapes to bolster his speech about the Ns being “connected” (frame 5) and “cut” and “split” (frame 6). These actions with the rectangle formed by the Ns prepare for his next set of actions on the Ps as he relates the splitting and recombining of the Ps to the Ns established in frames 5 to 8 (frames 13 and 14). According to Gullberg (2011), this strategy of anchoring entities is commonly used by bilingual speakers (when speaking in their non-native language) to help them, and their interlocutors, keep track of objects and events as they unfold in a complex narrative. In her study, she found that by locating some of the objects and entities in the space and using gestures to refer back to them, bilingual speakers are able to construct a narrative that is coherent and comprehensible. In the process, bilingual speakers can avoid the possible grammatical challenges associated with keeping track of objects, people, time, space, and actions in speech (Gullberg, 2011). In frame 9, Adam shifts his attention to the rectangle formed by Ps and holds the pieces together as he emphasizes in his speech that “these are connecting.” Note that the connection of the shapes is important in generating two squares for comparison with the Ns rather than four separate pieces.
On these connected Ps, Adam overlays familiar cutting gestures (frames 10 and 11) and, through his speech, indicates the type of cut (straight) and location of the cut (middle), an important feature of the solution. Adam’s drag gestures performed in frames 12 and 14 are key to his method and convey how two squares are generated from the rectangle formed by the Ps. Rather than just pointing to a spot, Adam’s gesture reiterates the two pieces that are combining to form a square. With both his hands occupied holding the Ps together and gesturing, Adam uses his chin to indicate that the squares cut from the Ps are equal in area to the Ns that are located just below the Ps (frame 14). Note that his actions in frames 5 to 8 ensure that “this” in frame 14 refers to one of the Ns and not the rectangle formed by both the Ns. Overall, Adam constructs a solution by effectively drawing on and intertwining various modes. Further, the modes of positioning and the gestures are major contributors to the mathematical aspects of the communication.

7 Discussion

A crucial part of assessment is to understand what students know and can do. In the current context of paper-and-pencil tests, this may be challenging to do for bilingual students. In this study, we see the richness in bilingual students’ thinking as they interact with an assessment task, something that would not be observed within the current testing format dominated by the written mode. Through the lens of multimodality, this study identified how bilingual students solve and explain a mathematics assessment task by drawing on speech, gestures, and concrete objects. Through a careful analysis of the critical events for this task, we demonstrate that bilingual students go beyond pointing and drawing attention to objects. Instead, the multiple modes are employed to complete and discuss mathematical operations. For example, among all possible arrangements of the Ps (e.g., parallelogram), the students chose the rectangle to make inferences. Further, their placement of the shapes on top of each other indicated that the students were thinking of area as the space inside the shapes rather than the lengths of the sides. The students’ use of gesture also indicated their understanding of rotations and translations in transformational geometry. For example, Maicon uses the twisting gesture to indicate a cut-and-rearrange operation that demonstrated an understanding of rotation of shapes about a point.

In this study, we observed that bilingual students built on the affordances of modes besides speech to construct multimodal ensembles to sidestep possible lexical and grammatical challenges. If we insisted on explanations that were dominated by writing or even speech, then, like Maicon’s initial attempts, the explanation could be considered vague. By using the modes of positioning of the shapes and overlaying the gestures, the students were able to develop a multimodal explanation. For students developing their proficiency in English, reliance on modes other than speech and writing is essential at the initial stages of their language learning.

We observe that the affordances of the task allow for the bilingual students to draw on a variety of modes in their explanations. In the Area Comparison task, the concrete shapes act as material anchors that provide stability so that the students could think about the relationships between the parts that were relevant to the solution (e.g., parts that were cut and moved). The shapes served to anchor the gestures, speech, and gaze as they developed their explanations. For example, Adam uses the shapes to anchor features that he refers back to in his mathematical explanation, which would be more challenging to do without the permanent nature of the cutouts. As bilingual students formulate explanations, they may not have immediate access to the academic vocabulary and grammatical resources to formulate a verbal explanation in the moment. As such, the cutouts served to reduce the cognitive demands of formulating an elaborate verbal explanation related to their mathematical actions.

Our work illustrates that multimodality could be a key resource in assessing bilingual students. A broader view of communication and assessment, along the lines discussed in this article, would allow for an expanded repertoire of mathematical solutions to include explanations that are developed in modes other than writing and/or speech. The multimodal approach allows us to move beyond the deficit perspectives that surround the assessment performance of bilingual learners. If these students are provided with appropriate tasks, which afford the use of multiple modes, then we can discover the rich possibilities in these students’ thinking.