Introduction

There are few studies on teachers’ understanding of statistics (Jacobbe & Carvahlo, 2011), but their results show that the understanding of teachers, including prospective teachers, appears not to be very different from that of students in school. Other studies indicate that prospective teachers’ understanding of statistical concepts is procedural and consists of a collection of isolated rules rather than a conceptual scheme (Leavy, 2010). Jacobbe and Carvahlo (2011) suggested that the reason for this is that a more sophisticated level of knowledge about averages has not appeared systematically in teacher education. This higher level of understanding, also known as statistical literacy, is something that teacher education needs to focus on (Ben-Zvi & Garfield, 2004; Shaughnessy, 2007). Statistical literacy implies that it is not enough to know only procedures; statistical literacy “involve(s) more than understanding the arithmetic mean” (Jacobbe & Carvahlo, 2011, p. 207). Knowledge of averages should cover, for example, how to make sense of data by linking the various meanings of average and averaging (Gal, 1995; Stack & Watson, 2010). This indicates not only that there is a gap to be addressed in teachers’ training on the concept of average but also that curricular goals need to make clear how to teach averages (Gal, 1995). As indicated, statistical literacy also includes conceptual knowledge. Knowledge of a concept can be described as a scheme, as knowledge that develops through “networks consisting of connections between discrete bits of information about the measures that are formed” (Groth & Bergner, 2006, p. 39). In this paper, prospective teachers’ conceptions of mean, median and mode are studied using the definition of a conception as “a general notion or mental structure encompassing beliefs, meanings, concepts, propositions, rules, mental images and preferences” (Philipp, 2007, p. 259).

One way to inform teacher education about prospective teachers’ thinking is to analyse descriptions of prospective teachers’ content knowledge (Groth & Bergner, 2006). In this study it is of interest to gain further knowledge about how prospective teachers locally understand statistical concepts. In the light of the above, the aim of this study is to investigate prospective teachers’ conceptions of, or at least in part their knowledge and understanding of, mean, median and mode in relation to explaining these concepts to students in years 4–6. Thus, the research question for this study is: How do prospective teachers conceptualise the concepts of mean, median and mode when explaining them to a student in years 4–6? The conceptions were analysed from a current group of prospective teachers’ answers to a questionnaire.

Background

According to Hiebert and Lefevre (1986), knowledge can be thought of as conceptual or procedural. Conceptual knowledge is rich in relationships, whereas procedural knowledge is made up of two parts: the formal language (symbol representation system) of mathematics, and algorithms and rules (ibid.). Research shows that students generally have procedural knowledge of isolated rules and rely upon procedural algorithms rather than conceptual knowledge (Jacobbe & Carvahlo, 2011). Within statistics this is apparent when it comes to the concept of mean: students show ability to calculate the mean but show little basic conceptual understanding related to the concept (Cai & Moyer, 1995; Leavy & O’Loughlin, 2006). To achieve a broader understanding of statistical concepts, students need to understand and use statistical ideas at different levels. That includes competence in and understanding of “the basic ideas, terms, and language of statistics” (Rumsey, 2002, p. 2). The statistical competences required for statistical literacy are brought up by Watson (1997), who has summarised these skills in a three-tiered hierarchy: (a) a basic understanding of statistical terminology, (b) embedding of language and concepts in a wider context and (c) questioning of claims, with the aim of developing statistical literacy. Hence, both concepts and terminology are important components. To conceptualise averages, one should not focus only on calculation; there are also other important intuitive ideas concerning averages (Jacobbe & Carvahlo, 2011; Stack & Watson, 2010). Or, as Gal (2000) puts it, doing statistics is not equivalent to understanding statistics.

Ideas about averages are discussed in various ways. There are at least three different perspectives on the term average: based on social experiences, based on media or based on the curriculum (Stack & Watson, 2010). Another way to approach students’ understanding of average is to consider the following four perspectives: (a) average as modal, (b) average as what is reasonable, (c) average as the midpoint or (d) average as an algorithmic relationship (Russell & Mokros, 1991). These different ideas are exemplified below.

Averages

Traditionally in the teaching of averages, there has been a strong focus on the teaching of mean (Jacobbe & Carvahlo, 2011; Leavy & O’Loughlin, 2006). One reason for that could be that median is computationally simpler than the mean (Lesser, Wagler, & Abormegah, 2014); another could lie in the kind of data that is used (Mayén & Diaz, 2010).

Mean is often connected to an algorithm, a procedure of adding up the values and dividing by the number of values, regardless of outliers (Jacobbe & Carvahlo, 2011). One way to bring understanding to students would be to encourage them to explain how to find the mean and to develop formulas that use everyday language, making the connection to a useful life skill clear (Rumsey, 2002). Median is viewed as easier to understand than mean (Leavy & O’Loughlin, 2006), but many aspects of the concept are still important, not least to a teacher (Lesser et al., 2014). One aspect is that the median is not always a value in the dataset; another is difficulties when ordering data (Groth & Bergner, 2006). Importantly for teacher education, research informs us that elementary teachers, in particular, have several difficulties: determining medians in graphic data (Friel & Bright, 1998), determining the median from a set of unordered data (Zawojewski & Shaughnessy, 2000), ordering datasets, and describing the median as a centre of something while being unclear about what that something is (Mayén & Diaz, 2010). However, there is little research on how mode is conceptualised (Groth & Bergner, 2006). Mode is often described as the most frequent or the most popular value in a dataset. At first in school, when only nominal data are used, it is easier to determine the mode, but when numerical data appear, some confuse the variable value with the frequency (Watson, 2014). Understanding that there might be more than one mode, and whether or not a mode is important, is also important knowledge for teachers (Watson, 2014).

If students are to choose between mean and median to describe a dataset, they often choose mean, without regard to distribution (Groth, 2013). One reason for not being able to choose could be that students “have been exposed to only non-contextual situations where the objective is to correctly perform a calculation” (Jacobbe, 2008). Another reason could be that the students’ knowledge of the concepts has developed in isolation from one another (Jacobbe, 2008). One way to improve the students’ intuitive understanding is to use real data instead of invented datasets (Stack & Watson, 2010). Working with real data and letting the students describe their choice would be one way to gain arguments and conceptual knowledge (Groth, 2013). Varying the kinds of data could be of importance to contrast mean and median. Quantitative data seems more prevalent than ordinal data when teaching averages and works for both mean and median. Some students, who do not perceive the difference, transform ordinal data to quantitative data and then calculate the mean. Working with both ordinal and quantitative data could be one way to contrast median and mean. This is a way to illuminate that you can determine a median with ordinal data, but not calculate a mean (Mayén & Diaz, 2010).
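The point that a median can be determined for ordinal data, while a mean cannot, may be illustrated with a minimal Python sketch. The scale, the responses and the function name ordinal_median are hypothetical and are not taken from the studies cited; the sketch assumes only that the categories have a known order.

```python
# Hypothetical ordinal scale, listed from lowest to highest (not from the study).
SCALE = ["poor", "fair", "good", "excellent"]


def ordinal_median(responses, scale=SCALE):
    """Return the middle response after ordering by position on the scale.

    For an even number of responses the lower of the two middle values is
    returned, because two ordinal categories cannot be averaged.
    """
    ranked = sorted(responses, key=scale.index)
    return ranked[(len(ranked) - 1) // 2]


responses = ["good", "poor", "excellent", "fair", "good", "good", "fair"]
print(ordinal_median(responses))  # the median category
```

Only the order of the categories is used here. Recoding the categories as 1 to 4 and calculating a mean would additionally assume equal distances between them, which ordinal data do not guarantee.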

In sum, research informs us that teacher education programmes need to address several conceptions and misconceptions about averages. Teacher education needs to focus on specific knowledge development, e.g. why averages tell different things about a dataset, which averages are best to use under different conditions and why they do or do not represent a dataset (Garfield, 2002). One explanation for this is that procedural knowledge and calculation have a strong standing in statistics (Rumsey, 2002). Calculating an average is just the process of gaining information; it does not demonstrate the ability to understand what an average measures or how it is used (Rumsey, 2002).

This study could give an indication of prospective teachers’ conceptual understanding and intuitive ideas of averages, thus providing indicators of what the essential content in a teacher education course should be.

Methodology and Methods

This pilot study, of a single case (Yin, 2013), reports initial findings collected through a questionnaire in a teacher education course. The respondents are prospective teachers for school years 4–6 in their sixth semester of eight. The particular course in which the study was performed is the third course out of four dealing with teaching mathematics. The aim of the pilot was exploratory, using four open questions directly related to the respondents’ conceptions of averages, three of which will be discussed here.

The questions were formulated as how questions, to identify any gaps in respondents’ knowledge related to mean, median and mode. Such questions are appropriate in case study research according to Arthur, Waring, Coe, and Hedges (2012), as they offer possibilities to compare individuals’ descriptions, definitions and understandings of conceptions, in this case, averages. The questions asked were: (1) How would you explain the concept mean to a student in years 4–6? (2) How would you explain the concept median to a student in years 4–6? (3) How would you explain the concept mode to a student in years 4–6? The questions are open in character; the linguistic elements how, explain and years 4–6 were included to give the answers structure and context and to direct them towards a teaching situation.

Anonymity was an important condition for the prospective teachers to feel confident that their answers would not affect their grades in the course. This was confirmed in the questionnaire as well as orally declared in the course introduction. The response rate was 63% (29 out of 46).

The data have been analysed through a constant comparative method consisting of initial and selective coding (Glaser & Strauss, 1967). Initial coding implies staying close to the data and being open to what is going on in the data. Selective coding implies selecting the most frequent codes and examining how they relate to other codes, identifying important relationships and differences (Arthur et al., 2012). The initial coding took place in several steps, analysing the questions back and forth in order to identify what codes are to be found in the data. A starting point for the coding was procedural and conceptual knowledge in accordance with Hiebert and Lefevre (1986) and the statistical competences required for statistical literacy (Watson, 1997). The final initial codes were the following: conceptual knowledge, procedural knowledge, context, colloquial concepts, usefulness, statistics (mathematics) and didactics (teaching). After grouping and comparing the initial coding between the three averages, three tentative categories emerged: use of words, understanding averages and teaching explanation. These categories are a synthesis of what was seen as the result of this study. Alongside the coding process, the codes and categories were read in conjunction with the definitions of conceptual and procedural knowledge by Hiebert and Lefevre (1986) and the skills required for statistical literacy by Watson (1997). The definitions the students used of the averages arithmetic mean (referred to as mean in this text), median and mode were read in conjunction with the following definitions: mean, the sum of the numbers divided by their quantity; median, the middle of an odd number of observations and the mean of the two middle observations of an even number of observations; mode, the observation value or observation values with the largest frequency (Kiselman & Mouwitz, 2008).
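The three reference definitions can be made concrete with a short Python sketch using the standard library statistics module. The datasets are hypothetical and serve only as illustration; they are not data from the questionnaire.

```python
from statistics import mean, median, multimode

odd = [3, 5, 8, 9, 10]       # hypothetical data, five observations
even = [3, 5, 8, 9, 10, 13]  # hypothetical data, six observations

print(mean(odd))     # sum of the values divided by their number: 35 / 5
print(median(odd))   # middle observation of the ordered data
print(median(even))  # mean of the two middle observations, 8 and 9
print(multimode([1, 1, 2, 3, 3]))  # every value with the largest frequency
```

The even-sized dataset shows that a median need not be a value in the dataset, and the last line shows that a dataset can have more than one mode, two aspects the definitions above cover explicitly.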

Results

The results will be presented and discussed in terms of the three categories that emerged in the analysis. In each category, similarities and differences revealed between the participants’ ideas about each average will be presented.

When comparing the initial codes for mean, median and mode, more varied combinations among the codes were found for mean. For median and mode, most of the respondents used definitions when explaining these averages. For mean, however, fewer definitions were used. The mean is more often explained in a context and with significantly more colloquial words. Another difference that emerged is that there are more words used for explaining mean and median than for explaining mode.

In Swedish the mean is called medelvärde, which is a compound of the two words medel (middle) and värde (value). This means that the word medelvärde signals a middle value. When describing what is measured or what is calculated, 15 participants used the word value and 9 used the word number or other synonyms indicating quantitative values. Sixteen used examples from a context, for example, age or weight. Another word often used was genomsnitt (used 29 times when describing the mean). The word genomsnitt is composed of the words genom (through) and snitt (cut), and snitt has many meanings. Snitt can also be used as shorthand for genomsnitt and was used three times. As a statistical term, it can describe average as well as mean, median or mode. It could also describe a typical variation or spread (NE, n.d.). In some of the answers, these different words are used in a way that could confuse. For example, in the following quote, a variety of words is used to explain mean.

Medelvärdet (the mean) is the same as genomsnitt. If one wants to find genomsnittet of, for instance, age in a family with four family members, then you add all the ages with each other and then divide with the number of family members – thus four. Then one will have medelåldern (the mean age) and genomsnittsåldern (the average age) of the family.

In Swedish, the word median does not have any particular synonym; it is natural to use the word median or to say something like the middle observation. The word median does not signal a value in the same way as medelvärde (middle value, i.e. mean) does. Nevertheless, when describing what is measured or what is calculated, 17 participants used the word value, 10 used the word number or other synonyms for number and 5 used examples from the context they had chosen, for instance, age or weight. Specifically for median, the word number line was used seven times, and the word number series was also used seven times. The word observation was used just once. A typical description is “To calculate the median you line up the values in order of size, the smallest first and largest last. Then one looks up the number that is in the middle. That number is the median”.

In Swedish the mode is called typvärde, a compound of the two words typ (typical) and värde (value), which signals a value in the same way as medelvärde (mean) does. Twenty-four participants used the word value, nine used the word number or other synonyms for number and seven used examples from the context they had chosen, for instance, age or weight. In particular for mode, the word frequency was used twice. The word observation was used once. A typical description was “The mode is the value that appears most times. For example: 1, 1, 1, 1, 2, 1, 1. Here the mode is one”.

How the students appear to understand averages is interpreted from the data in two ways. One way is through the definitions and the other through numerical examples that accompany some of the definitions. Mean appears to be more familiar than median and mode. All but one of the definitions of mean are correct, and when connecting the definition to a context, participants used it in an accurate way. In contrast, the definitions of median and mode are incomplete or incorrect to a greater degree. Many show what happens to the median when having an odd number of observations but not for an even number, for example, “the value that is in the middle, neither largest nor least, but the middle”. Seven respondents involved odd numbers in their definitions.

Understanding averages could also be seen through the code usefulness. Few students explained averages this way, but those who did appear to show some kind of conceptual knowledge. Examples of answers in this code are “the mean is an average suitable to compare different observations”; “if you know that the mean is 10 years, then you know what activities could be suitable (e.g. for a party)”; “mean is interesting when you want to set an age on a group of people with various ages”, or “the mean is 9.5. 9.5 is quite close to all the values, so that the mean we have calculated is a measure on the approximate price”.

Some students compared mean and median in their explanations using outliers. The intention was to show that the median is more reliable in certain situations, for example, “important to choose an appealing and obvious example, for instance, five people’s monthly salary where four people have about the same and one has twice the salary”.
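The salary example suggested by the respondent can be worked through in a short Python sketch. The amounts are hypothetical, chosen only to match the description of four similar salaries and one that is twice as large.

```python
from statistics import mean, median

# Hypothetical monthly salaries: four similar values and one twice as large.
salaries = [30000, 31000, 29000, 30000, 60000]

print(mean(salaries))    # pulled upwards by the single high salary
print(median(salaries))  # stays at the typical salary
```

With these figures the mean is higher than four of the five salaries, whereas the median coincides with what most of the group earns, which is the contrast the respondent’s example aims at.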

Using a context can be a way to place an explanation in a teaching situation and/or to present data to be used in an explanation. All but one of these examples involved quantitative data. In the one case of qualitative data, the frequency is confused with the variable: “The most frequent number in your series. For example, if you have 10 cars, 4 red, 3 blue, 2 white, 1 green, your mode is the most frequent number 4. It is the number that occurs most times”. The context was predominantly used to complement a definition, rather than as an example of how one can teach averages.
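The distinction between the value of the variable and its frequency in the car example can be shown in a minimal Python sketch. The data follow the respondent’s example; the variable names are chosen here for illustration.

```python
from collections import Counter

# The respondent's example: ten cars recorded by colour (nominal data).
cars = ["red"] * 4 + ["blue"] * 3 + ["white"] * 2 + ["green"]

counts = Counter(cars)
mode_value, frequency = counts.most_common(1)[0]
print(mode_value)  # the mode is a value of the variable: a colour
print(frequency)   # how often that value occurs; this is not the mode
```

Following the definition of mode as the observation value with the largest frequency, the mode in this example is the colour red, and 4 is its frequency.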

The teaching examples provided by the participants were varied. Two examples suggested using concrete material. One simply stated that this is a good idea when teaching mean, but the other offered a way to use it, e.g. “if we divide 16 in 4 piles, there will be 4 in each which means that the mean is 4”. There were also a few examples of how data could be used in an explanation, for instance, to show that certain data make the median a more appropriate average than the mean, or the other way around; to be aware of ordering the data before determining the median; or to problematise mode by using data with more than one mode. Finally, there were a couple of suggestions lacking argumentation. One asserted the importance of teaching mean and median closely together in time, in order to teach the difference between the two concepts. Another stressed the importance of explaining the purpose of mode.
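The idea behind the piles example, the mean as an equal share, can be sketched in Python as a levelling process. The starting piles and the function name level_out are assumptions made for illustration; the respondent only stated that 16 items in 4 piles give 4 in each.

```python
def level_out(piles):
    """Move one item at a time from the largest pile to the smallest.

    Stops when no pile differs from another by more than one item. If the
    total divides evenly by the number of piles, all piles end up equal.
    """
    piles = list(piles)
    while max(piles) - min(piles) > 1:
        piles[piles.index(max(piles))] -= 1
        piles[piles.index(min(piles))] += 1
    return piles


start = [7, 2, 4, 3]             # hypothetical piles, 16 items in total
print(level_out(start))          # every pile now holds the same amount
print(sum(start) / len(start))   # the algorithm: add up and divide
```

The levelled pile size and the result of adding up and dividing agree, which links the concrete material to the procedure.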

Discussion

The aim of this study was to investigate how this group of prospective teachers conceptualise the concepts mean, median and mode in relation to explaining these concepts to students in years 4–6. The results will now be discussed through conceptual and procedural knowledge according to Hiebert and Lefevre (1986) and the statistical competences required for statistical literacy (Watson, 1997). Finally, the results will be discussed through Philipp’s (2007) definition of conception, quoted in the Introduction.

The results show that a high proportion of the prospective teachers’ explanations were predominantly related to definitions, rules or algorithms of the averages investigated. This resonates strongly with Hiebert and Lefevre’s (1986) definition of procedural knowledge. Any contexts used in their explanations are mainly descriptions of where numerical data can be found and show few examples of relationships within the concepts. It could be argued that expressing the definitions of averages can be equated to having a basic understanding of statistical terminology. However, some of the participants’ definitions are incomplete, and few embed the language and concepts in a wider context, the second point in Watson’s (1997) hierarchy. One reason for these results could be that there are many colloquial words used around the teaching and learning of averages, especially for mean. The results indicate how important both the concept and the terminology are when learning this concept, as many colloquial words or synonyms were used in the explanations here, just as Watson (1997) highlighted in her study. My conclusion is that words such as medeltal (middle number) and genomsnitt (“cut through” value), used prolifically in this study by the participants for both mean and median, need to be understood and used in a very considered way.

As already mentioned, definitions were the main source of the explanations by the participants in this study. Yet when trying to interpret their understanding, one cannot say more than whether the definition is correct or not. The conclusion is that they show a strong procedural knowledge, which Watson (1997) connects to the first level of understanding in her hierarchy of knowledge. Few prospective teachers showed any understanding of mean and median other than definitions. Mode seems to be more unfamiliar to the prospective teachers. This result is consistent with previous research (Groth & Bergner, 2006) and probably a result of their own schooling (Jacobbe & Carvahlo, 2011; Leavy & O’Loughlin, 2006). Altogether the results show few indications of understanding of averages. A possible interpretation is that having only definitions, as a way to explain averages, does not give one the language to demonstrate conceptual knowledge. The conclusion is that working with different data levels of measurement and real data needs to be covered in teacher education on statistics in the future (cf. Groth, 2013; Jacobbe, 2008; Mayén & Diaz, 2010; Stack & Watson, 2010).

The few examples provided by the participants were of two kinds: first, brief instructions without argument and, second, more explicit examples of teaching. The more explicit examples cover levels 1 and 2 of Watson’s (1997) criteria for statistical literacy but also show some relationships within the concept, which could be defined as conceptual knowledge according to Hiebert and Lefevre (1986). My conclusion is that the prospective teachers are not experienced in teaching averages, and the knowledge they have is not appropriate for doing this.

There are limitations in the study. The result presents only 29 prospective teachers’ comments. The design of the questions affects the answers, the result and the analyses. However, there is a strong implication that the prospective teachers show mainly procedural knowledge and a basic understanding of statistical terminology to a greater or lesser extent. The prospective teachers’ conceptions about the concepts mean, median and mode can, in this pilot study, be described through Philipp’s (2007) definition of conception as consisting of (I) concepts, mainly procedural; (II) rules, definitions of averages; (III) mental images, mainly definitions (few cases as didactic or mathematical); and (IV) beliefs and preferences, being a student rather than becoming a teacher. The knowledge the prospective teachers show is probably a result of their own schooling, as they have not yet had any course in teaching statistics in their teacher education. When comparing their knowledge to previous research, we can see that it is highly consistent. The implication of this is that there are three key aspects to consider in the future: (1) a need to strengthen prospective teachers’ conceptual knowledge in order to develop statistical literacy; (2) a need to expand their network of knowledge within and between mean, median and mode; and (3) a need to teach the different data levels of measurement (e.g. nominal, ordinal, interval and ratio) in order to develop statistical literacy and gain a language to reason about statistics as a teacher. This pilot study also revealed an overuse of colloquial terminology, which needs further investigation.