Mathematics provides a set of tools for describing, analyzing and predicting the behaviour of systems in different domains of the real world (Burkhardt 1994). This practical usefulness of mathematics has always provided, and still provides, one of the major justifications for the important role of mathematics in the school curriculum (Blum and Niss 1991). In particular, the introduction of applications and modelling was mainly intended to develop in students the skills of knowing when and how to apply their mathematics effectively in various kinds of problem situations encountered in everyday life and at work. For example, Thorndike (1922, p. 101) stated as a guiding principle for teaching arithmetic that one should “favor … the situations which life itself will offer, and the responses which life itself will demand.”

The application of mathematics to solve problem situations in the real world—otherwise termed mathematical modelling—can be usefully thought of as a complex and cyclic process involving a number of phases (e.g., Burkhardt 1994; Blum and Niss 1991; Verschaffel et al. 2000):

  • understanding the key elements in the problem situation;

  • constructing a mathematical model of the relevant elements and relations embedded in the situation;

  • working through the mathematical model to derive mathematical result(s);

  • interpreting the outcome of the computational work;

  • evaluating if the interpreted mathematical outcome is appropriate and reasonable;

  • communicating the obtained solution of the original real-world problem.

Traditionally, word problems were used as the typical vehicle for introducing modelling and application problems in the mathematics classroom. Ideally, each of the six aforementioned components can also be distinguished in a word problem solving process (Verschaffel et al. 2000).

As Swetz (2009) has argued and illustrated in various publications, word problems have a history of several thousand years, featuring in classic texts of ancient Egypt, India, and China, for example, in forms that reflect the cultures in which they were embedded and yet, in some cases, differing remarkably little from examples to be found in contemporary textbooks. Throughout this history, word problems have appeared not only as exercises of practical importance but also as puzzles for intellectual play. For example, the first mathematics book to be printed, the Treviso Arithmetic of 1478, contained both very practical mercantile mathematics and verbal puzzles (Swetz 1987). Other functions for word problems can be identified, including their use for practicing computational procedures, and their use as “mental manipulatives” (Toom 1999) whereby a verbal description of an imaginable situation, however fanciful, serves to communicate a mathematical task. The present article only addresses the application function of word problems.

For a very long time, school word problems have played this application function without much reflection or critical concern. Of course, there have always been individuals showing (some) awareness of the bridging problem between reality and mathematics and the complexities involved (see Verschaffel et al. 2000, pp. 132–134 for a summary of an example from 1880 due to Lewis Carroll). However, many teachers, textbook writers, and researchers have been using, and still use nowadays, word problems as if there was no serious “bridging problem” at all (Verschaffel et al. 2000).

During the last 20 years, many scholars have argued in various ways that the (traditional) practice of word problems in school mathematics does not foster in students, indeed inhibits, a genuine disposition towards mathematical modelling and applied problem solving. A first line of argumentation relies on philosophical, epistemological, and sociolinguistic analyses of how the abstract structures of mathematics relate to phenomena in the real world in general and on the analysis of word problems as a text genre in particular. Such work poses penetrating questions about the problematic authenticity of school tasks that ostensibly model reality, and about the mathematical school practices within which such tasks are embedded (Lave 1992; Säljö et al. 2009). The analysis is deepened by the multiple ways in which postmodern linguistic and epistemological theories problematize the nature of reality itself (e.g., Gerofsky 1997, 2009). These kinds of analyses result in serious questionings of the “unproblematic acceptance of concepts of separable mathematical and real worlds and of word problems as a transparent bridge between the two” (Gerofsky 1997, p. 22). A second, closely related, line of argumentation relies on empirical work, mostly grounded in sociocultural, socioconstructivist, and interactional perspectives, that has documented that after being immersed for several years into a traditional mathematics educational culture many students have constructed an approach to mathematical modelling whereby this activity is reduced to the execution of one or more arithmetic operations with the numbers in the problem, without any serious consideration of possible constraints of the realities of the problem context that may jeopardize the appropriateness of their standard models and solutions (Prediger 2009). 
Together, these two closely related lines of research-based argumentation have led to serious skepticism about word problems as a vehicle for promoting the development of students’ disposition towards authentic mathematical modelling and to a search for alternative ways of promoting mathematical modelling in the classroom.

In this article we give a brief review and discussion of this research, including a summary of earlier work culminating in the book by Verschaffel et al. (2000) and with special attention to the more recent empirical work. We begin by presenting the ascertaining studies documenting and illuminating the phenomenon of “suspension of sense-making” when doing school arithmetic word problems. Then we move to studies that have contributed to the explanation of the observed effects. This explanation is followed by a review of some recent design experiments wherein the modelling perspective has been implemented and tested. Afterwards we discuss some recent studies on the difficulties encountered by teachers who try to implement this new perspective in their daily classroom practices. Finally, we discuss a number of educational implications of the research done so far and some challenges for the future of teaching mathematical modelling.

1 Manifestations of Students’ Abstention from Sense-Making when Doing Word Problems

There are many examples of responses by children to word problems that show an apparent willingness to ignore things that they know about the world, language, and logic (see the first chapter of Verschaffel et al. 2000, for a survey). The most dramatic and well-known example is probably the French study (prompted by a satirical reflection from Gustave Flaubert, see Verschaffel et al. 2000) in which elementary school children were posed questions of the following type: “There are 26 sheep and 10 goats on a ship. How old is the captain?” A large majority gave a numerical answer, while only a small minority questioned whether an answer is possible. The finding that the majority of the children were prepared to offer an answer to this and similarly nonsensical questions became a cause célèbre, both in France (Baruk 1985) and within international mathematics education circles.

Intrigued by this example and some other manifestations of “suspension of sense-making” when doing school mathematics problems, we carried out in parallel in Northern Ireland and in Flanders pencil-and-paper studies with upper elementary and lower secondary school students, using a set of somewhat different problems including the examples in Table 1 (Greer 1993; Verschaffel et al. 1994).

Table 1 Examples of P-items involved in Verschaffel et al.’s (1994) study*

All problems used in these studies are about sense-making, but differ from the “captain’s problem” and related problems in that they admit of sensible answers, albeit of types not generally sanctioned in mathematics classrooms, such as approximations or ranges. We termed each of these items “problematic” (P) in the sense that they require (from our point of view) the application of judgment based on real-world knowledge and assumptions, rather than only the routine application of basic arithmetical operation(s) cued by the problem. In both studies, the P-items were administered in a paper-and-pencil test together with a set of matched standard problems (S-items) that can be solved unproblematically (again, in our judgment) by applying the most obvious arithmetic operation(s) with the given numbers. For instance, the corresponding S-item for the first P-item in Table 1 was: “Pete organized a birthday party for his tenth birthday. He invited 8 boy friends and 4 girl friends. How many friends did Pete invite for his birthday party?” As well as recording answers, students were invited to explain or comment on their responses. When a P-item was answered in the predictable routine-based way without comment, we termed it a “non-realistic reaction” (NR). A response was classified as a “realistic reaction” (RR) if either the answer given indicated that realistic considerations had been taken into account or if a comment was added to the routine-based answer indicating that the student was aware of the modelling complexity. For example, a classification RR for the planks P-item would be given to a student who gave the (realistic) answer “8” (instead of 10) or who responded with 10 but who made a comment such as “Steve would have a hard time putting together the remaining pieces of 0.5 meters”. In both Greer’s (1993) and Verschaffel et al.’s (1994) studies, students demonstrated a very strong overall tendency to exclude real-world knowledge and realistic considerations. 
(The percentages of correct responses to the corresponding S-items were close to 100%).
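
The coding rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the instrument used in the studies: the function name `code_response`, its parameters, and the residual `OTHER` category are assumptions introduced here, while the planks values (routine answer 10, realistic answer 8) are taken from the example in the text.

```python
def code_response(answer, routine_answer, realistic_answers=(), realistic_comment=False):
    """Code one response to a P-item as 'RR', 'NR', or 'OTHER' (hypothetical helper)."""
    # RR: the answer itself shows that realistic considerations were taken into account.
    if answer in realistic_answers:
        return "RR"
    if answer == routine_answer:
        # RR if the routine answer comes with a comment showing awareness of the
        # modelling complexity; NR if it is given without such a comment.
        return "RR" if realistic_comment else "NR"
    # Assumption: any other answer (e.g., a computational slip) falls outside RR/NR.
    return "OTHER"


# Planks item: the routine answer is 10, the realistic answer is 8.
print(code_response(8, routine_answer=10, realistic_answers={8}))
print(code_response(10, routine_answer=10, realistic_answers={8}, realistic_comment=True))
print(code_response(10, routine_answer=10, realistic_answers={8}))
```

Under these assumptions the first two calls are coded RR and the third NR, mirroring the planks example above.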

The findings of Greer (1993) and Verschaffel et al. (1994) have been replicated in many countries, including Belgium (Verschaffel et al. 1999), China (Xin et al. 2007; Xin and Zhang 2009), Germany (Renkl 1999), Hungary (Csíkos 2003), Japan (Yoshida et al. 1997), Northern Ireland (Caldwell 1995), Switzerland (Reusser and Stebler 1997a), and Venezuela (Hidalgo 1997), mostly as part of more extensive investigations of the effects of certain variations in the presentation of the problems or in the experimental setting (see below). The findings were strikingly consistent across all these countries, sometimes to the great surprise and disappointment of these other researchers who had anticipated that the “disastrous” picture of the Irish and Flemish pupils would not apply to their students.

Besides the consistency across nationalities, these replication studies, as well as some other studies with a somewhat different scope and methodology, have shown how the tendency to respond to school arithmetic word problems in a stereotyped and non-realistic way is related to various kinds of task, subject, and context characteristics. For instance, with respect to task variables it has been found in all above-mentioned studies that P-items about the interpretation of a division with a remainder (e.g., the buses item and the balloons item from Table 1) elicit considerably more realistic answers than the other kinds of P-items in the problem set. A possible explanation for this recurrent finding is that the different P-problems require the problem solver to make realistic considerations at different stages of the modelling cycle. When confronted with the birthday party problem, for instance, students have to take into account realistic considerations (of friendships and birthday parties) already at the initial stage of building up a proper situational model, whereas the division-with-remainder (DWR) problems require students to use real-world knowledge only at the final stage of the modelling cycle, wherein they have to interpret the outcome of their computational work. Apparently, children seem to perform better when they have to behave realistically at the end of the modelling cycle than when such behavior is required at the beginning (Verschaffel et al. 1994; Xin 2009). However, as Verschaffel et al. (1994) have argued, there are other possible explanations for these remarkable differences between P-items as well. Csíkos (2003) tried to unravel the structure underlying Verschaffel et al.’s problem set of P-items by means of factor and cluster analysis. Whereas these analyses largely confirmed the difference between S- and P-items, the internal structure of the P-items remained obscure.

With respect to subject variables, research evidence suggests that students’ tendency to ignore plausibly relevant and familiar aspects of reality in answering word problems is associated with age, gender, and social class. For instance, Boaler (1994) observed that girls were more likely to remain within an “everyday” frame of reference when doing application problems in a school context, leading to less appropriate answers if scored from a traditional point of view, and Cooper and his colleagues (e.g., Cooper and Dunne 1998; Cooper and Harries 2009) found the same for working-class children. However, comparing boys’ and girls’ performance on the 10 P-items from Verschaffel et al.’s (1994) test, Csíkos (2003) found no significant gender differences on any problem, except for the planks item, where boys produced 19% RR’s versus only 8% for girls. According to Csíkos, this difference may be due to the fact that 10–11 year-old boys might have more real-life experience of sawing planks. Results for age are also mixed. On the one hand, recent studies with Chinese students (Xin et al. 2007; Xin and Zhang 2009) have shown improvement in the ability to solve P-problems with age. According to the authors, this finding could be attributed to a combination of a growth in children’s everyday knowledge base and their knowledge of mathematics and/or other relevant scientific subjects. On the other hand, there is some evidence that more years of experience with (traditional) schooling may lead to a decrease in percentage of RR’s to problematic word problems (Radatz 1983; Yeping and Silver 2000). For instance, Radatz (1983) observed that the percentage of children trying to reach some solution on “nonsensical” problems such as the “captain’s problem” increased with years of schooling between Kindergarten and grade 4. These contrasting findings may be explained by the fact that the problems used by Radatz were quite different from the ones used in the studies with the Chinese students (Xin et al. 2007; Xin and Zhang 2009). Nevertheless, the discrepant age trends certainly represent an aspect that invites more penetrating analysis.

In an attempt to better understand what happened in these initial studies, several follow-up studies tested the effectiveness of variations in the experimental setting. A first set of studies used an explicit warning at the beginning of the test that some of the problems in the test were non-trivial or might require an unusual way of responding. For instance, besides a replication of the original study by Verschaffel et al. (1994) with Japanese pupils, Yoshida et al. (1997) also made a comparison between groups of Japanese pupils with and without extra hints aimed at encouraging the disposition towards more realistic mathematical problem solving. The additional general warning at the start of the test, aimed at increasing the alertness of the pupils and thereby the number of RR’s on the P-items, produced only a small, statistically non-significant difference in favor of the group receiving that warning (20% as opposed to 15% RR’s overall).

In a recent study by Xin et al. (2007), fourth and fifth graders were administered the 10 pairs of S- and P-problems from Verschaffel et al.’s original study under two different conditions: a “warning instruction” and a “process-oriented instruction” printed on the top of the test sheets. In the “warning instruction” the children were told that some of the problems were not as easy as they seemed to be, whereas in the “process-oriented instruction” students were asked at the beginning of the test to consider the following two questions that would be helpful to their solutions: (1) What are the real-life situations behind the problem statements? (2) Is it appropriate to solve these problems by using straightforward arithmetic operations? The results showed that the difference in the percentages of realistic considerations reached marginal statistical significance between the two instructions (warning vs. process-oriented) (21% vs. 28%). Even though process-oriented instruction seemed, at least partially, to activate children’s real-world knowledge and experience and raise their critical awareness of the appropriateness or otherwise of straightforward arithmetic operations, compared to the standard condition (see above), the impact of these experimental warnings remained disappointingly low.

The results of these studies indicate that variations in the experimental setting intended to make pupils more alert, to sensitize them to the consideration of aspects of reality, and to legitimize alternative forms of answer produce, at best, only weak effects. Apparently, these interventions are not powerful enough to overrule pupils’ ingrained beliefs about word problems.

Another set of follow-up studies investigated the impact of another kind of experimental variation, namely increasing the authenticity of the experimental setting. In these studies, one or more categories of P-items were presented in a more authentic, performance-based setting, for instance, in the context of a group discussion and/or embedded in concrete materials and performance-based goals.

DeFranco and Curcio (1997) examined students’ interpretation of DWR problems embedded in (a) a restrictive scholastic setting, and (b) a (relatively) real-world setting. In the first part of the study, 20 sixth graders were confronted with the following version of the buses item in a restrictive context (i.e., an individual interview in which pupils were questioned about mathematical word problem solving): “328 senior citizens are going on a trip. A bus can seat 40 people. How many buses are needed so that all the senior citizens can go on the trip?”. In the second part, the same pupils were asked to make a telephone call using a teletrainer obtained from a telephone company to order minivans to take sixth-graders to a class party. The (oral) request to make a phone call was accompanied with a fact sheet with relevant information about the date, time, and place of the party, and the number of children attending the party. Only two of the 20 children responded appropriately to the buses item in the restrictive setting. Of the 18 children who produced an inappropriate response, 17 made an incorrect interpretation of the remainder (e.g., by responding with an answer involving a remainder or by rounding their result down to 8 buses without any further comment). By contrast, in the real-world setting, 16 out of the 20 students gave a realistically appropriate response. Thirteen of them ordered 7 minivans because they realized part of a vehicle could not be ordered, and the other 3 ordered 6 minivans but gave good reasons for doing so (e.g., they explained that another, smaller vehicle “like a car or something” would be needed to transport the remaining two students).
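
The arithmetic behind the buses item can be made explicit with a short Python sketch. It is an illustration only: the function name `interpret_division` and the labels in the returned dictionary are hypothetical, and the numbers (328 senior citizens, 40 seats per bus) come from the item quoted above.

```python
def interpret_division(people, seats_per_bus):
    """Contrast routine and realistic readings of a division with remainder."""
    quotient, remainder = divmod(people, seats_per_bus)
    return {
        "exact_quotient": people / seats_per_bus,  # raw computational result
        "rounded_down": quotient,                  # full buses only, people left behind
        "left_over": remainder,                    # people without a seat when rounding down
        # Realistic reading: one extra bus is needed as soon as anyone is left over.
        "buses_needed": quotient + (1 if remainder else 0),
    }


result = interpret_division(328, 40)
print(result)
# {'exact_quotient': 8.2, 'rounded_down': 8, 'left_over': 8, 'buses_needed': 9}
```

Only the last value answers the question as posed. The other three correspond to the inappropriate responses described above: an answer involving a remainder, or a result rounded down to 8 buses.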

In contrast to the research reviewed in the first set of studies involving a general warning, changing the experimental setting, in the way DeFranco and Curcio (1997) and others (see, e.g., Reusser and Stebler 1997b; Säljö et al. 2009; Wyndhamn and Säljö 1997) have done, resulted in much greater improvements in students’ performance on the P-items, and, more specifically, in their inclination and their capability to include the real-world knowledge and the realistic considerations they were so reluctant to activate under the previous, more restricted, testing conditions. More specifically, these findings suggest that when the nature of the “premises for the interactive ritual” (Wyndhamn and Säljö 1997, p. 379)—or, more generally, as Greer (1997, p. 305) has called it, the “experimental contract”—afford it, students are prepared and able to take realistic considerations into account when responding to mathematical problems.

The above-mentioned studies that analyzed student performance on a set of P-problems have been complemented with another type of investigation in which (groups of) students, after they had individually solved (some of) the above-mentioned P-items in a scholastic setting, were questioned in the context of individual or collective “debriefs” (e.g., Caldwell 1995; Hidalgo 1997; Inoue 2001; Reusser and Stebler 1997a; Selter 2001). Many pupils in these studies acknowledged, in retrospect, that they had given the NR automatically, without any hesitation, as illustrated in the following comments: “I know all these things, but I would never think to include them in a math problem. Math isn’t about things like that. It’s about getting sums right and you don’t need to know outside things to get sums right” (Caldwell 1995, p. 39). Other pupils reported that they had been thinking about the modelling difficulties and complexities, but finally decided to choose the NR by deliberately applying certain norms and tactics of “the word problem game” (Verschaffel et al. 2000), as in the following example: “I did think about the difficulty, but then I just calculated it the usual way. (Why?) Because I just had to find some sort of solution of the problem and that was the only way it worked. I’ve got to have a solution, haven’t I?” (Reusser and Stebler 1997a). The results from these debriefs seem to support the claim made by several authors that it is certainly not a strange kind of “cognitive deficit” that causes pupils’ general and strong abstention from sense-making when doing arithmetic word problems in a typical school setting. They confirm that students are not prepared for the kinds of difficulties raised by these P-items, mostly because they are obviously trained to expect the S-type of items in the classroom situation and to routinely handle that type of problem. So, as Schoenfeld (1991, p. 340) suggested, students who react to P-items in a NR way are engaged in sense-making of the deepest kind: “In the context of schooling, such behavior represents the construction of a set of beliefs and behaviors that result in praise for good performance, minimal conflict, fitting in socially, etc. What could be more sensible than that?”

Before moving to the next section wherein we try to explain what elements in students’ instructional histories have led to this adaptive behavior, we point out that some studies in which researchers have tried to gain more insight into how students view and handle P-items by talking to them (rather than by confronting them with a paper-and-pencil test and coding their answers), have suggested that some students’ NRs to P-items are not the result of an automatic or deliberate neglect of real-world considerations, but the expression of idiosyncratic rationality that does involve realistic considerations. For instance, Selter (2001) confronted 24 fourth-graders with a group of word problems one of which was a DWR problem presented in a football context: “820 supporters of Borussia Dortmund want to go to an away game by bus. In each bus 40 supporters can be seated. How many buses are needed?” The written solutions were classified as follows …

  • 21 buses (7 children).

  • 20 buses (8 children).

  • “20 1/2 buses” or “20 remainder 20 buses” (6 children).

  • The remaining children had difficulties with the arithmetical operation; two of them dealt correctly with the (wrong) remainder.

So, the general trend of the previous studies was confirmed, if the distribution of the answers with respect to the three main categories of answers is taken into account. But in order to better understand pupils’ thinking, interviews with twelve children were conducted one week later. Selter (2001) reports the case of Boris, who at the beginning of the interview worked out the division problem correctly and then put down the answer: “20 buses have to drive.” He continued to think about the problem and added 1/2 to his written answer. When the interviewer asked him what he had put down, he answered: “Well, 20 buses, plus 1/2 … not really 1/2, actually one bus, … one bus with half of the places occupied … plus one bus, thus 21 is the answer.” Without the interview information Boris’ reaction would have been classified as a clear example of suspension of sense-making. But he did not mean “half a bus”, but “a bus half full”. He thought things through deeply, and his interpretation of the remainder is more complex and nuanced than his (initial) result 20 may suggest. Two other children from the third category justified their answers in similar ways. Thus, a closer look at students’ thinking may reveal more sense-making rationality than observed at first. In another study Selter (1994) interviewed a group of third-graders on how they viewed and solved problems such as “A shepherd owns 19 sheep and 13 goats. How old is the shepherd?” or “There are 13 boys and 15 girls sitting in a classroom. How old is the teacher?”. An analysis of the videos revealed that quite a lot of the children who “solved” the problems by “simply” adding the two given numbers, showed some slight irritation at the moment they were given the problems (a short laugh or any other sign of astonishment), but then immediately devoted themselves to a kind of seemingly thoughtless ritual stating, for example “Actually, our result cannot be really right … Shall we write it down anyway? … Let’s put it down here.” Clearly, these children knew that it was strange to combine the numbers in order to get the result, but they had the feeling that the solution must have been hidden somewhere in the problem (as in a riddle). Thus, several children tried to see the problems from a different perspective that allowed them to somehow connect the given numbers with the arithmetical operation carried out, at least after being asked how they arrived at their solution. Some examples of their creative constructions (for the shepherd problem) were:

  • “The shepherd was given a sheep or a goat on each of his birthdays.”

  • “He bought one animal for each year of his life; so he always knows his age.”

In other words, it was not always an automatic or deliberate decision to neglect real-world knowledge and considerations that was underlying students’ non-realistic responses to P-items; at least in some cases their seemingly unrealistic responses might have been the result of idiosyncratic but sophisticated sense-making processes (see also Inoue 2005). A major methodological problem remains unresolved, however, namely that it is extremely difficult to distinguish solutions based on such idiosyncratic interpretations that took place during problem solving from post-hoc rationalizations in defense of an initially automatically given non-realistic response.

These studies remind us of a pattern found in many other contexts whereby pencil and paper tests give both false positive and false negative results that can be shown up by simply talking with the students. Besides individual idiosyncratic notions, we should also be aware of societal variations that might influence interpretations. For example, in the United States where there are many regulations and schools are wary of possible litigation, it may be more likely that the full number of buses, in relation to the number allowed per bus, will be ordered for a school trip whereas in a country with fewer resources, people will be disposed to pack a few more students in if it reduces the number of buses needed.

2 Looking for an Explanation in (Traditional) Mathematics Education

The previous results and their interpretation in terms of students’ school histories compel the question: How do these views on, and tactics for, doing school arithmetic word problems develop? Although there are some documented cases where they are explicitly and directly taught, we would claim that, typically, this development is not the result of explicit or direct teaching. Rather, it normally occurs implicitly, gradually, and tacitly in students through being immersed in the culture and practice of the mathematics classrooms in which they engage (Lave 1992). Putting it another way, students’ reactions to word problems develop over time from their perceptions and interpretations of the “didactical contract” (Brousseau 1997) or the “socio-mathematical norms and practices” (Yackel and Cobb 1996) within “the culture of the mathematics classroom” (Seeger et al. 1998) that tell them (explicitly to some extent, but mainly implicitly) how to behave in a word problem solving lesson, how to approach a problem, how to respond to it, how to communicate with the teacher about it, and so on (Lave 1992). More specifically, this enculturation process seems to be mainly caused by two aspects of instructional practice, namely (1) the nature of the problems given to the students and (2) the way in which these problems are conceived and treated by teachers in their daily interactions with their students (Prediger 2009; Verschaffel et al. 2000).

Let’s first have a look at the nature of the problems given. Studies in different countries that have looked at word problems in traditional textbooks have revealed that, especially in the early grades of elementary school, most word problems:

  • are phrased as semantically impoverished, stereotyped verbal vignettes;

  • contain key words and other kinds of hints that help to identify the operation(s) to perform in a routine-based way;

  • are undoubtedly solvable by accepted criteria;

  • include no irrelevant information;

  • neither require nor even allow looking outside the problem statement for additional information;

  • ask for a single, precise numerical answer;

  • rarely require more than a couple of minutes to be solved;

  • sometimes even involve presuppositions that are at odds with children’s real-world knowledge about the phenomena being evoked by the word problem statements.

If most textbook problems have these characteristics, it should not be a surprise that many students develop, gradually but inevitably, perceptions of, and tactics for, word problem solving that are characterized by a serious lack of sense-making (Reusser and Stebler 1997a; Verschaffel et al. 2000).

A second plausible explanatory factor is the way in which these problems are conceived of, and actually treated by, teachers. A study that sheds some light on this second factor is an investigation by Verschaffel et al. (1997) with a large group of pre-service elementary school teachers from three teacher training institutes in Flanders. A paper-and-pencil test was constructed consisting of 14 word problems: seven S-items and seven parallel P-items, selected from the study of Verschaffel et al. (1994). This test was given twice to all pre-service teachers. The first time, they had to answer the 14 word problems themselves. Immediately after they had finished, they were given a second test, in which they were asked to score four different answers from pupils to the same 14 word problems. These four response alternatives to the seven P-items belonged to different categories: a non-realistic answer (NA), a realistic answer (RA), a technical error (TE), and another answer (OA) derived by using the wrong operation or giving one of the numbers in the problem. At the bottom of each problem, there was a box for writing explanations and/or comments (see Fig. 1 for an example item). As expected, like the upper elementary and lower secondary students from the previous studies, these student-teachers demonstrated a strong overall tendency to exclude real-world knowledge and realistic considerations when confronted with the problematic word problems. More interesting, however, was that student-teachers’ lack of disposition towards realistic modelling was also revealed by their relative evaluations of the realistic answer (RA) and the non-realistic answer (NA) for each of the seven P-items. Overall, their evaluations of the non-realistic answers to the P-items were considerably more positive than their evaluations of the realistic answers based on context-based considerations. This study convincingly demonstrates that many future teachers have knowledge and beliefs about teaching and learning arithmetic word problems that are problematic from our point of view.

Fig. 1 Sample P-problem with alternative answers for evaluation (Verschaffel et al. 1997)

In a recent replication of Verschaffel et al.’s (1997) study, Bonotto and Wilczewski (2007) found that Italian student-teachers’ overall evaluations of the non-realistic answers were likewise considerably more positive than their evaluations of the realistic ones, suggesting that these future teachers, too, believed that the activation of realistic context-based considerations should not be stimulated but, rather, discouraged in elementary-school mathematics.

3 Beyond Ascertaining Studies: Applying the Modelling Perspective

Starting more or less explicitly from the above criticisms of the traditional practice surrounding word problems in schools and from the genuine mathematical modelling perspective described above, researchers have set up design studies wherein they developed, implemented, and evaluated experimental programs aimed at the enhancement of students’ mathematical modelling and problem solving along the lines mentioned above. To mention just a few (for more examples, see Blum et al. 2007):

  • Verschaffel and De Corte’s (1997) small-scale teaching experiment wherein they made ample use of P-items to change upper elementary pupils’ conceptions about the role of real-world knowledge in mathematical modelling and problem solving, followed by replications and/or elaborations by Renkl (1999), Mason and Scrivani (2004), and Verschaffel et al. (1999);

  • Bonotto’s series of teaching experiments in upper elementary school aimed at fostering a mindful approach toward realistic mathematical modelling (Bonotto 2009);

  • Lehrer and Schauble’s (2000) experimental curriculum for mathematics and science teaching in young children built upon the modelling approach;

  • The Jasper studies of the Cognition and Technology Group at Vanderbilt (1997), wherein mathematical problem solving is anchored in realistic contexts using new information technologies;

  • Several intervention studies aimed at the enhancement of “Mathematisches Modellieren” at the upper elementary or lower secondary school level by German researchers (see, e.g., Blum and Leiß 2007; Maaß 2004);

  • The very sustained and theoretically highly developed instructional program for mathematical modelling by Lesh and colleagues (see e.g., Lesh and Doerr 2003).

While these experimental programs differ considerably in terms of their concrete aims and scope, content, and structure, some recurrent characteristics include:

  • The use of more realistic and challenging tasks than traditional textbook problems, tasks that involve some, if not most, of the complexities of real modelling tasks (such as the necessity to formulate the problem, to seek and apply aspects of the real context to proceed, to select tools to be used, to discuss alternative hypotheses and rival models, to decide upon the level of precision, to interpret and evaluate the outcome, etc.). It should be emphasized, however, that the above-mentioned experimental programs differ quite a lot in the “radicalism” of their reaction against standard word problems. Whereas some researchers do not remove these problems from their programs but try to improve them and reconceptualize them as genuine exercises in mathematical modelling, others take a more far-reaching approach and replace them with other kinds of mathematical modelling tasks that come close to involving the full complexity and authenticity of real-world problems (e.g., Lesh and Caylor 2009).

  • A variety of teaching methods and learner activities, including expert modelling of the strategic aspects of the modelling process, small-group work, and whole-class discussions; typically, the focus is not on presenting and rehearsing established mathematical models, but rather on demonstrating, experiencing, articulating, and discussing what modelling is all about.

  • The creation of a classroom climate and, more particularly, a set of social and sociomathematical norms and practices (Prediger 2009; Yackel and Cobb 1996) that are conducive to the development of a more appropriate view of mathematical modelling, and to a more appropriate set of accompanying attitudes and beliefs, than those held by traditionally schooled children.

In most of these design experiments positive outcomes have been obtained in terms of performance, underlying processes, and motivational and affective aspects of learning. So, taken as a whole, the available research evidence shows that, to quote Niss (2001, p. 8), “application and modelling capability can be learnt, and has to be learnt, but at a cost, in terms of effort, complexity of task, time consumption, and reduction of syllabus in the traditional sense”. Hatano (1997) presented a cost-benefit analysis making similar points, and suggesting that the P-problems we have used “are too trivial for students to recognise the significance of ‘high-cost’ modelling activity” (p. 384). However, he made two suggestions, that we heartily endorse, for shifting the cost-benefit equation: “First, we can make a problem or its solution critically important for people’s lives. Alternatively, we can establish a culture that enjoys and highly evaluates comprehension activity” (p. 386).

To some extent, these characteristics of the modelling approach are beginning to be implemented in mathematical frameworks and texts, including at the elementary school level, in many countries, such as Germany, the UK, The Netherlands, and Belgium. In Germany, for instance, there are several textbooks for the elementary school that integrate authentic real-world settings and simple modelling activities from the very beginning, and these efforts have reached a considerable number of regular classrooms (although not all, of course). Moreover, modelling competencies have become part of the national curriculum standards and thus part of central exams (although not all of these yet reach the desired level of elaboration). However, according to Niss (2001), it is still the case, in general international terms, that genuine application and modelling perspectives and activities continue to be scarce in the everyday practice of mathematical education. He points to several important barriers: (1) the difficulty of getting this modelling perspective into (high-stakes) tests (partly because mathematical modelling is not viewed by many people as being a part of mathematics, partly because it is very difficult to assess these complex modelling skills appropriately in those tests), and (2) the high demands this modelling approach puts on teachers mathematically, pedagogically, and personally.

4 Implementing the Modelling Perspective in Real Classrooms

Even though it is generally accepted that the high demands this modelling approach puts on teachers are one of the major reasons why the genuine modelling perspective does not get widely and successfully implemented in instructional practice (see Niss 2001), only rarely has attention been paid to how regular teachers actually think about, and implement, connections between school mathematics and the real world, and, in particular, to how they determine whether, when, and how students are exposed to realistic modelling experiences in their daily classroom practice. Recently several researchers have started to tackle the question of how mathematics teachers conceive and approach traditional and/or realistic word problems in their actual daily teaching settings (see Chapman 2006; Depaepe et al. 2009a, 2009b; Gainsburg 2009; Kaiser and Maaß 2007). For instance, Depaepe et al. designed a study in which they investigated both the nature of the word problems actually selected and used by two teachers and the ways in which these two teachers approached these problems in two regular sixth-grade classrooms in Flanders that were not involved in any intervention study but that simply followed a curriculum and a textbook that claim to be inspired by the above-mentioned perspective on realistic mathematical modelling.

First, they were interested in whether the tasks that these teachers selected from the textbook (i.e., the most frequently used textbook in Flanders, which is representative of the standards and curricula for primary mathematics education in Flanders) reflected the features that are assumed to positively influence students’ genuine modelling skills. They used a fine-grained conceptual framework developed by Palm (2002) for analyzing the realistic nature of word problems. The basic idea of his framework lies in the notion of simulation: A word problem is considered to be realistic if the important aspects of the out-of-school situation it describes are taken into account under conditions representative of that situation. Palm’s framework includes elements such as the realistic nature of the event, the data and the question, the form in which the problem is presented, the response requirements, etc. Overall, Depaepe et al. (2009a) found that both the problems that the teachers selected from the textbook, and the ones they created themselves, seemed to simulate relatively well some aspects that are assumed to be important in designing realistic tasks according to Palm’s coding scheme, but failed to include others. Another important finding (that was not revealed by Palm’s coding scheme, however) was that the number of problems affording the possibility of experiencing the complexities and subtleties of genuine mathematical modelling, and how it differs from applying known mathematical concepts and procedures to dressed-up mathematical application problems, was disappointingly small (Depaepe et al. 2009a).

Second, Depaepe et al. (2009b) investigated how these two upper elementary school teachers handle word problems in their actual instructional practice. This aspect was investigated through in-depth analysis of videotaped lessons. This analysis relied on Chapman’s (2006) distinction between a “paradigmatic-oriented” and a “narrative-oriented” instructional treatment of a school word problem. The paradigmatic mode of knowing is based on categorization or conceptualization and focuses on context-free and universal explanations. The narrative mode of knowing, in contrast, deals with human or human-like intentions and action and, thus, focuses on context-sensitive and particular explications. Because it was supposed that both the initial phases (understanding, modelling) and the final phases (interpreting, evaluating) of the mathematical modelling cycle would lend themselves to such an analysis of the teacher’s main instructional purpose with the problem context, they further distinguished in their analysis between orientations towards a narrative vs. a paradigmatic perspective during both the initial and the final stages of the problem-solving process. Overall, their findings highlighted, first, that the two teachers’ word problem solving lessons were more characterized by a paradigmatic than a narrative approach and, second, that instructional interventions were very rare in which the complex relation between the problem to be modeled and the actual mathematical model was actually experienced and problematized.

5 Conclusions

In our work generally, and specifically in this article, we have concentrated on three things: what originally caught our attention, namely the apparent surrender of sense-making by children in relation to school word problems; our predominant response, based on analysis and research, which is to recommend adopting a modelling stance; and the educational implications of implementing that stance. It should be acknowledged that we are reflecting a judgment about what mathematics education should be about, one that is not, to say the least, shared by all those involved in determining national programs, including many mathematicians.

There is a wealth of theory and research relating to this area of mathematics education, illustrated in an edited collection (Verschaffel et al. 2009; see also Prediger 2009 for an interesting discussion of the different theoretical lenses that are currently applied to the phenomenon of children’s abstention from sense-making when doing school word problems). These analyses go more deeply into philosophical questions about relationships between our perceptions, interpretations, and constructions of reality, our repertoires of speech acts and playing of language games, and of representational acts and playing of the “Word Problem Game”. A major development among mathematics educators, one that we consider extremely important, is much more intensive attention to the nature of the interactions among students and teachers that take place in classrooms (Lave 1992), and in general “the culture of the mathematics classroom” (Seeger et al. 1998). Furthermore, once a modelling perspective is adopted, it implies cultural relativism, that is to say the “reality” of a situation such as distributing people into buses is societally variable. More generally still, the phenomena that we have documented lend themselves very naturally to analysis from the perspective of Activity Theory (Roth 2009).

While—collectively—this and related research has “problematized” the genre of word problems, the task remains to make research-based proposals to answer the mathematics educators’ question “What am I to do now that I have learned about the problematic nature of word problems?” While the intervention studies reviewed above provide very valuable building blocks to respond to this question more or less radically, recent studies, such as the one by Depaepe et al. (2009b), that have analyzed the concerns, doubts, and tensions of regular teachers working in real mathematics classrooms reveal how complex, subtle, and demanding the task is for a teacher wanting to implement the genuine modelling perspective in his/her mathematics classroom.

When we began our research on the (un)realistic nature of word problems about 15 years ago (Greer 1993; Verschaffel et al. 1994), we were “scandalized” by the examples of apparent suspension of sense-making that numerous researchers had brought to light. In order to protect what we might term “the children’s right of sense-making,” we proposed to promote word problems (of the kind described in this article) as a vehicle for teaching children important aspects of the mathematical modelling process. But, clearly, this aim could be seen as an example of a much wider aim, namely countering the tendency of mathematics education to become an education in simplistic (or, as Vinner (1997) would call it, “pseudo-analytic”) thinking. In this case, the simplistic thinking takes the form of coming to believe that it is acceptable to map aspects of the real world on to mathematical structures unthinkingly and uncritically.

This wider aim should be seen in relation to a world that is increasingly being mathematised, not just in relation to physical phenomena but also in relation to social ones (Jablonka and Gellert 2007; Skovsmose 2005). But beyond that universally recognized trend, there is a process complementary to mathematisation termed “demathematisation” (Jablonka and Gellert 2007), by which is meant the taking over by machines of functions previously carried out by humans so that the mathematics becomes hidden. Skovsmose (2005) and Frankenstein (2009) discuss revealing examples of how mathematical models lie beneath the surface of many aspects of modern life, such as the modelling used by airline companies to decide by how much to overbook flights or by managers to control the working conditions of workers. In an attempt to integrate insights about the nature of mathematical modelling and its (changing) role in contemporary life, Greer and Verschaffel (2007, p. 219) proposed a framework for thinking about, and working on, mathematical modelling consisting of three levels: (1) implicit modelling (in which the student is essentially modelling without being aware of it), (2) explicit modelling (in which attention is drawn to the modelling process), and (3) critical modelling (whereby the roles of modelling within mathematics and science, and within society, are critically examined).

Implicit modelling occurs throughout elementary schooling in relation to the basic arithmetic operations. As pointed out by Usiskin (2007), much of the standard elementary curriculum can be characterized as modelling, though it is not acknowledged as such. As Usiskin argues, it should be possible, from an early age on, for children to become aware that, for instance, not all cases where it seems, at first sight, to be reasonable to add, subtract, multiply, or divide are reasonable at second sight, and that by thinking carefully, you should be able, at least in some cases, to discriminate situations where the operation provides a very precise model, situations where it provides an approximate model, and situations where it is not appropriate (see, for example, the first P-problem in Fig. 1).

Explicit modelling is more typical of the upper years of elementary school and of secondary school. By this we mean going through the basic cycle of the modelling process, possibly with iterations (see, e.g., Niss et al. 2007). The standard depiction of modelling is often simplistic and needs to be extended at least to take account of knowledge of the phenomenon being modeled, the goals of the modelling exercise, the resources available in terms of tools (of all kinds), information, and other people, comparison of alternative models, and communication of results relative to the context in which the modelling is taking place (see also Verschaffel et al. 2000).

Critical modelling means thinking about the nature of modelling per se and its human and societal implications. As argued above, in the contemporary world where mathematisation and demathematisation are simultaneously pervasive, providing future citizens with even some rudimentary sense of how mathematical modelling works and affects our lives, and some agency to believe that they might be able to critique what is happening, is vital.

Evidently, once mathematics educators start applying this modelling perspective on a larger scale and allow students to bring in their personal experience when trying to make sense of all kinds of technical, social and cultural issues and phenomena, they will be confronted quickly and inevitably with the diversity of these experiences in terms of gender, social class, and ethnicity (see also Boaler 1994; Cooper and Dunne 1998). As Mukhopadhyay and Greer (2001) have argued, engaging students in such modelling activities, with careful attention to the relevance of the problem contexts and all the diversity in views and approaches that they elicit, is an important way to prevent students from becoming alienated by mathematics and its authority, and to help them use mathematics as a powerful personal tool for the analysis of issues important in their personal lives, in their communities, and in society in general.