1 Theoretical and Empirical Background

Contemporary reform documents and curricula in most countries assume, more or less explicitly, that one of the most important goals of mathematics education is that students gain the competence to make sense of everyday-life situations and complex systems stemming from our modern society, which can be called “modelling competencies” (Blum et al. 2002; Lesh and Lehrer 2003). Although mathematical modelling is generally associated with courses at the tertiary or, to an increasing extent, secondary level of education, early exposure to essential modelling ideas can provide a solid base for competently applying mathematics even at the primary school level (Usiskin 2007, 2008). In this chapter, we report a study on word problem classification, which proved to be a promising modelling task at the primary level, with a marked effect in breaking pupils’ well-documented tendency to overuse the linear or proportional model (De Bock et al. 2007).

Proportionality is recognised as an important mathematical topic receiving much attention throughout primary and secondary mathematics education. The reason lies in the fact that proportional relationships are the underlying model for approaching numerous practical and theoretical problem situations within mathematics and science. However, numerous documents and research reports on a wide variety of mathematical domains, and dealing with students of diverse ages, mention pupils’ tendency to apply the proportional model irrespective of the mathematical model(s) underlying the problem situation (Van Dooren et al. 2008). In a mature mathematical modelling approach (see, e.g. Verschaffel et al. 2000), essential steps would be: (1) understanding the problem, (2) selecting relevant relations and translating them into mathematical statements, (3) conducting the necessary calculations, and (4) interpreting and evaluating the result. A study using in-depth interviews (De Bock et al. 2002), however, revealed that pupils almost completely bypassed all steps except step 3. Their decision on the mathematical operations was mainly based on routinely recognising the problem type, the actual calculating work received most time and attention, and after a check for basic calculation errors, the result was immediately communicated as the answer.

In line with this interpretation, the overuse of proportionality might be broken if pupils paid more attention to the initial steps of the modelling cycle, that is, understanding the relevant aspects of the problem situation and translating them into mathematical terms. So, when pupils are engaged in a task with proportional and non-proportional word problems that does not require them to actually produce computational answers, they might be stimulated to engage in a qualitatively different kind of mathematical thinking and develop a disposition toward differentiating proportional from non-proportional problems. This assumption was tested by administering a type of task that is rather uncommon in the mathematics classroom: the classification of a set of word problems.

Interest in the value of problem classification and in reflection on the relatedness of problems is rather old. Polya (1957) indicated that, when devising a plan to solve a mathematical problem, a useful heuristic is to think of related problems. Seminal work was also done by Krutetskii (1976), who showed that high-ability students differ from low-ability students in their ability to distinguish relevant information (related to mathematical structure) from irrelevant information (contextual details), to perceive rapidly and accurately the mathematical structure of problems, and to generalise across a wider range of mathematically similar problems. Studies that actually used problem classification tasks, however, are rare.

The study by Silver (1979) is well known. Silver asked 8th graders to classify word problems according to their mathematical relatedness. Afterward, he conducted a didactical intervention in which the problems were solved and correct solutions were presented and discussed. Then he administered the classification task again. In analysing pupils’ classifications and the criteria they had used, Silver distinguished classifications based on mathematical structure, contextual details, question form, and pseudostructure (e.g. relating to the kind of quantity measured: speed, price, weight, …). Silver found strong correlations between the quality of pupils’ classifications and their problem solving performance. Moreover, classifications related more to the mathematical structure after the intervention than before, but the pseudostructure of word problems remained an important criterion in pupils’ classifications.

2 Method

2.1 Subjects, Tasks and Procedure

Seventy-five 6th graders – belonging to five classes in three different primary schools in a middle-sized Flemish city – completed a classification task and a solution task.

In the solution task, pupils were given a traditional paper-and-pencil word problem test containing nine experimental word problems with different underlying mathematical models: three proportional, three additive, and three constant ones. These different types of word problems had already been used and validated in previous research (Van Dooren et al. 2005). Proportional problems are characterised by a multiplicative relationship between the variables, implying that a proportional strategy leads to the correct answer (e.g. Johan and Herman both bought some roses. All roses are equally expensive, but Johan bought fewer roses. Johan bought 4 roses while Herman bought 20 roses. When you know that Johan had to pay 16 Euro, how much did Herman have to pay?). Additive problems have a constant difference between the two variables, so a correct approach is to add this difference to a third value (e.g. Ellen and Kim are running around a track. They run equally fast, but Kim began earlier. When Ellen has run 5 laps, Kim has run 15 laps. When Ellen has run 30 laps, how many has Kim run?). Constant problems have no relationship at all between the two variables. The value of the second variable does not change, so the correct answer is mentioned in the word problem (e.g. Jan and Tom are planting tulips. They use the same kind of tulip bulbs, but Jan plants fewer tulips. Jan plants 6 tulips while Tom plants 18 tulips. When you know that Jan’s tulips bloom after 24 weeks, how long will it take Tom’s tulips to bloom?). The word problems appeared in random order in the booklets, but a booklet never started with a proportional word problem, to prevent pupils from expecting – from the start – that the test would be about proportional reasoning. For the same reason, we also included six buffer items in the test.
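
To make the contrast between the three problem types concrete, the underlying models can be written out with the numbers from the example problems above (a worked sketch in our own notation, which was of course not part of the test booklets):

    % Proportional (roses): constant ratio between the variables
    \[ y = k\,x, \qquad k = \tfrac{16}{4} = 4 \text{ Euro per rose}, \qquad y_{\text{Herman}} = 4 \cdot 20 = 80 \text{ Euro} \]
    % Additive (laps): constant difference between the variables
    \[ y = x + c, \qquad c = 15 - 5 = 10 \text{ laps}, \qquad y_{\text{Kim}} = 30 + 10 = 40 \text{ laps} \]
    % Constant (tulips): the asked quantity does not depend on the given one
    \[ y = c, \qquad c = 24 \text{ weeks, for Jan and Tom alike} \]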

For the classification task, pupils were given a box containing an instruction sheet, a set of nine cards (each containing one word problem), nine envelopes, and a pencil. Again, three of the word problems were proportional, three additive, and three constant. The instructions for pupils were kept somewhat vague because we wanted to see which criteria pupils would use spontaneously while classifying: “This box contains 9 cards with word problems. You don’t need to solve them. Rather, you need to figure out which word problems belong together. Try to make groups of problems that – in your view – have something in common. Put each group in an envelope, and write on the envelope what the word problems have in common. Use as many envelopes as necessary.”

Both tasks were administered in immediate succession, but their order was manipulated. Half of the pupils got the solution task before the classification task (SC-condition, n = 38); the other half got the solution task after the classification task (CS-condition, n = 37). Because both tasks relied on nine experimental word problems, two parallel problem sets were constructed, each containing three proportional, three additive, and three constant word problems. In both conditions, pupils who got Set I in the classification task got Set II in the solution task and vice versa, so that, in principle, differences between the two sets would cancel out.

2.2 Analysis

Pupils’ responses to the problems in the solution task were classified as correct (C, the correct answer was given), proportional error (P, a proportional strategy was applied to an additive or constant word problem), or other error (O, another erroneous solution procedure was followed). Obviously, for proportional problems, only two categories (C- and O-answers) were used.
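
For readers who want to replicate the coding, the rule can be expressed schematically as follows (a minimal sketch under our reading of the coding scheme; the function and argument names are ours, and this is not the script actually used in the study):

    def code_answer(problem_type: str, given: float, correct: float, proportional: float) -> str:
        """Code one answer as C (correct), P (proportional error) or O (other error).

        `proportional` is the result a purely proportional strategy would yield; for
        proportional problems it coincides with the correct answer, so only C and O occur.
        """
        if given == correct:
            return "C"
        if problem_type in ("additive", "constant") and given == proportional:
            return "P"
        return "O"

For the tulip problem above, for instance, code_answer("constant", 72, 24, 72) returns "P", since 72 is what a proportional calculation (24 × 18/6) would give.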

For the classification task, the data are more complex. Two aspects of pupils’ classifications were analysed. The first aspect concerns the quality of the classifications, the second the kind of justifications pupils provided (as written on their envelopes).

The first aspect involves the extent to which pupils’ classifications took into account the different mathematical models underlying the word problems. For each pupil, scores were calculated using the following rules (a computational sketch of these rules is given after the list):

  • First, the group with the largest number of proportional problems (“P-group”) was identified. It acted as a reference group: If children experienced difficulties in distinguishing proportional from non-proportional problems, they would probably consider some non-proportional problems as proportional, and thus include non-proportional problems in the P-group.

  • Next, among the remaining problems, the “A-group” and “C-group” were identified (the groups with the largest number of additive and constant problems, respectively). When more than one group could be labelled as A- or C-group, the group having the highest score (see next point) was chosen.

  • Every group (P, A, and C) got two scores: An uncorrected and a corrected score. We explain these for the P-group. (It is completely parallel for the A- and C-group.) The uncorrected score for the P-group (Pu) is the number of proportional problems in the P-group. The corrected score (Pc) is Pu minus the number of other problems in that group. If no A- or C-group could be distinguished, these scores were set to 0.
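
The following sketch shows how these rules could be implemented (our own reconstruction, not the scoring procedure actually used in the study; in particular, the tie-breaking for the P-group and the requirement that a candidate group contains at least one problem of the target type are our assumptions):

    from typing import Dict, List, Tuple

    def score_classification(groups: List[List[str]]) -> Dict[str, Tuple[int, int]]:
        """Return (uncorrected, corrected) scores for the P-, A- and C-group.

        `groups` is one pupil's classification: a list of envelopes, each a list of
        problem types, e.g. [["P", "P", "P", "A"], ["A", "A", "C"], ["C", "C"]].
        """
        scores: Dict[str, Tuple[int, int]] = {}
        remaining = list(groups)
        for target in ("P", "A", "C"):        # the P-group is identified first, then A and C
            candidates = [g for g in remaining if target in g]
            if not candidates:                # no such group could be distinguished: scores are 0
                scores[target] = (0, 0)
                continue
            # largest number of target-type problems; ties broken by the corrected score
            best = max(candidates, key=lambda g: (g.count(target), 2 * g.count(target) - len(g)))
            uncorrected = best.count(target)
            corrected = uncorrected - (len(best) - uncorrected)
            scores[target] = (uncorrected, corrected)
            remaining.remove(best)            # A- and C-groups are chosen among the remaining groups
        return scores

For the example classification in the docstring, the sketch yields Pu = 3 and Pc = 2 (one additive problem sits in the P-group), Au = 2 and Ac = 1, and Cu = Cc = 2.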

The second aspect was the quality of the justifications given by pupils. The justifications for the P-, A-, and C-group of every pupil were labelled using the following distinctions:

  • Superficial: Referring to aspects unrelated to the mathematical model: problem contexts (e.g. “these are about plants – tulips and roses”, “they all deal with cooking”), common words (e.g. “they both have the word when”), or numbers (e.g. “there is a 4 in the problems”, “all numbers are even”).

  • Implicit: Referring to the mathematical model in the problems, but not unequivocally or explicitly to one particular model. For example, “the more pies – the more apples, the more you buy – the more you pay” does not per se refer to proportional situations, and “they all relate to the speed with which activities are done” does not capture the additive character of the situations.

  • Explicit: Referring clearly and unambiguously to the (proportional, additive, or constant) mathematical model underlying the problems (e.g. referring to a proportional model “three times this so three times that, and in the other problem both things are doubled”, or referring to the additive model “one person has more than the other, but the difference stays the same” or for the constant problems “these are tricky questions: nothing changes”).

  • Rest: There is no justification written, or it is totally incomprehensible. This label is also assigned when the particular group does not exist.

Our classification of justifications is similar to that of Silver (1979, see above), with the exception of the “question form” category (because this was controlled in our set of problems). Also, our “implicit” category was more inclusive than the “pseudostructure” category of Silver.

3 Results

3.1 Solution Task

Table 6.1 presents the answers to the solution task. A first observation is that the proportional problems elicited many more correct answers (2.68 out of 3 problems, on average) than the additive (0.88) and constant (0.61) problems. A repeated measures logistic regression analysis indicated that this difference was significant, χ²(2) = 23.87, p < 0.0001. For the additive and constant problems, almost two out of three answers were proportional, indicating that pupils strongly tended to apply proportional calculations to the two types of non-proportional problems.

Table 6.1 Mean numbers of correct (C), proportional (P), and other (O) answers on the three proportional, additive and constant problems

More importantly, pupils in the CS-condition performed significantly better than pupils in the SC-condition, χ²(1) = 10.72, p = 0.0011. Even though the Problem Type × Condition interaction effect was not significant, χ²(2) = 2.76, p = 0.2514, the most pronounced differences occurred for the additive problems (1.11 correct answers in the CS-condition vs. 0.65 in the SC-condition) and constant problems (1.00 vs. 0.24), whilst the difference was much smaller for the proportional problems (2.76 vs. 2.61).

Table 6.1 further reveals two explanations for these better performances: First, CS-condition pupils applied fewer proportional strategies than SC-condition pupils (1.70 vs. 2.08 and 1.86 vs. 2.08 proportional errors for the constant and additive problems, respectively), χ²(1) = 4.73, p = 0.0297. Second, CS-condition pupils also made significantly fewer other errors than SC-condition pupils (0.30 vs. 0.68 and 0.03 vs. 0.26 for the constant and additive problems, respectively), χ²(1) = 8.05, p = 0.0045.

3.2 Classification Task

Table 6.2 provides an overview of the different scores regarding the quality of pupils’ classifications. First of all, this table reveals a high mean Pu-score of 2.37 (out of a maximum of 3). Most pupils put at least 2 – many even all 3 – proportional problems in one single group. In contrast with the high Pu-score, the mean Pc-score is only 0.40, indicating that pupils frequently also put some (on average almost 2) additive and/or constant problems in the P-group, instead of putting them in separate groups.

Table 6.2 Mean uncorrected (Pu, Au, Cu) and corrected (Pc, Ac, Cc) scores for the classification task

For the additive and constant problems, the uncorrected scores (Au and Cu) are 1.73 and 1.71, respectively. These scores are lower than that for the proportional problems. This is inherent to our scoring rules: because we first determined a P-group (which often also included some additive and constant problems), fewer than three additive and constant problems were, on average, left to create A- and C-groups. Still, the size of the Au- and Cu-values indicates that many pupils did make separate groups for the additive and constant problems. As was the case for the proportional problems, the corrected scores for the non-proportional problems (Ac and Cc) are somewhat lower than the uncorrected ones (Au and Cu), but the difference is not as pronounced as for the proportional problems (1.73 vs. 1.35 for the additive problems and 1.71 vs. 1.27 for the constant problems, respectively). So, even though other problems were sometimes included in the A- and C-groups (on average about 0.40 word problems in each group), this happened less often than for the P-group (on average almost two word problems).
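
These “intruder” counts follow directly from the definition of the corrected scores, since the difference between the uncorrected and the corrected score of a group equals the number of problems of another type included in that group:

    \[ P_u - P_c = 2.37 - 0.40 \approx 2, \qquad
       A_u - A_c = 1.73 - 1.35 = 0.38, \qquad
       C_u - C_c = 1.71 - 1.27 = 0.44 \]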

In sum, the results presented so far indicate that most pupils created a group containing proportional word problems, but that some additive and/or constant problems were often included in this group too, suggesting that not only in their problem solving but also in their classification activities, pupils had difficulties distinguishing all non-proportional word problems from the proportional ones. Nevertheless, there was evidence that pupils did distinguish some non-proportional problems and made separate groups of proportional, additive, and constant word problems, even though their classifications were often imperfect.

We also compared the classifications in the two conditions. As can be seen in Table 6.2, the scores for the proportional problems (Pu and Pc) hardly differ for the SC- and CS-condition (a finding that parallels what was found for the solution task). However, classification scores for the additive and constant problems are somewhat higher in the CS-condition than in the SC-condition (except for the Au-scores, which are approximately equal). So, whereas doing the classification task first has a beneficial impact on performance on the solution task (particularly on non-proportional problems), the reverse is not the case: Doing the solution task first does not improve performance on the classification task. On the contrary, it has a slightly negative impact on children’s classifications.

With respect to the quality of justifications, Table 6.3 gives an overview of the various justifications for the P-, A-, and C-groups (for the explanation of the different labels, see the Analysis part of the Method section). A first observation is that explicit justifications are very rare for all three groups of word problems: they occurred in at most 7 out of 75 cases. Second, many of the justifications are implicit, particularly for the C-group, but also for the other two groups. Third, many superficial justifications were also observed in all three groups. Of course, this does not necessarily imply that pupils actually used these superficial criteria while classifying. Their classifications were often in accordance with the underlying mathematical models, so children might have used criteria that acted tacitly, with superficial justifications occurring post hoc, in response to the instruction to provide a justification for their classification. Fourth, the kinds of justifications are very comparable for the SC- and CS-condition. So, even though many pupils made appropriate – sometimes even perfect – classifications of the nine word problems in terms of their underlying mathematical models, they were rarely able to justify their classifications explicitly.

Table 6.3 Number of superficial (S), implicit (I), explicit (E), and other (R) justifications given by pupils to the P-, A-, and C-groups

4 Conclusions and Discussion

Previous studies (e.g. Van Dooren et al. 2005) have shown that pupils strongly tend to use proportional solution methods for missing-value word problems, even when this is inappropriate. It was also suggested that pupils’ immature and even distorted disposition toward mathematical modelling plays an important role: After a reflex-like recognition of the type of word problem, pupils quickly jump to the actual calculating work, and afterward the result is immediately communicated without any further interpretation or critical reflection. The current study assumed that – if pupils worked on an unfamiliar task focused not on producing computational answers but on reflecting on commonalities and differences within a set of word problems – they might engage in a deeper kind of mathematical thinking and distinguish more easily between proportional and non-proportional problems, which, in turn, might have a beneficial effect on their problem solving skills.

Taken as a whole, the results supported this assumption. On the solution task, pupils were prone to the overuse of proportional methods: Performances on the proportional problems were very good, but almost 4 out of 6 non-proportional problems were solved proportionally, as observed in previous studies (De Bock et al. 2007). As expected, pupils’ behaviour on the classification task was different. Nearly all pupils classified the proportional problems in one group, but they typically also included a few non-proportional (additive and constant) word problems. Many pupils also made a group of additive problems and another group of constant problems. Most often, pupils did not provide adequate explicit justifications for their groupings, but justified them implicitly.

The difference between the two conditions provided convincing evidence for the potentially positive effect of the classification task. Pupils who received the solution task after the classification task performed significantly better on the solution task than those who immediately started with the solution task, suggesting that the classification task made them more aware of differences among the word problems, which pupils transferred to the solution task. This observation is remarkable, considering that pupils’ overuse of proportionality is deeply rooted (De Bock et al. 2007), while the classification task was a rather subtle and limited intervention, especially for pupils as young as 6th graders: No classification criteria were provided, no feedback was given, and the usefulness of the classifications for the subsequent solution task was never mentioned.

The positive results on the classification task also have implications for educational practice. They support the assumption that the overuse of proportionality is to a large extent due to pupils’ superficial approach to word problems – jumping too quickly to the calculating work and immediately reporting the outcome – rather than to a genuine inability to distinguish proportional from non-proportional word problems. As such, explicit classroom attention to discussing similarities and dissimilarities between word problems (both in terms of superficial contextual features and in terms of deeper underlying structures) seems a very promising approach to eradicating the overuse of proportional methods.