1 Introduction

Statistics and statistical thinking have become increasingly important in a society that relies more and more on information and demands evidence. Hence the need to develop statistical skills and thinking across all levels of education has grown, and is of core importance in a century which will place even greater demands on society for statistical capabilities throughout industry, government and education.

The past two decades have seen considerable discussion, research and developments across all levels of education to meet the challenges of facilitating the learning of statistical thinking and reasoning. These have included data-driven approaches, more emphasis on data production and the measuring and modelling of variability (Moore, 1997), real data and contexts, and generally a more holistic approach that reflects the practice of statistics. The emphasis in creating environments for learning has been on active learning, hands-on experience and problem-solving.

Although gathering and interpreting data, and statistical thinking, pervade everyday living, disciplines, workplaces and research, they are remarkably challenging to learn and teach. Data are inherently messy, and interpretations of models and analyses, from the very simple to the most complex, require judgement and understanding that depend on assumptions and context while avoiding susceptibility to contextual “intuitions”. Whether considering development of school curricula and resources, or pre-service and in-service programmes for teachers, the learning and teaching of statistical thinking require gradual building up of concepts, understanding and skills, in a coherent, consistent and cumulative way that engages students in real contexts and authentic learning experiences. This is an ongoing challenge requiring cooperation and contributions from statisticians, educators and teachers.

The term “statistical thinking” will be used here with the broad meaning of making sense of information in which variation and/or uncertainty is present. It is thus inclusive of chance and data, which should be regarded as intertwining and interacting elements of statistical thinking. Section 2 outlines some of the commonalities in frameworks that have been advanced for statistical thinking and how it is learnt experientially. Some of the shared elements include focus on what statisticians do in the process of data investigations, and on articulating this. The statistical investigative process is often described as the data investigative or enquiry cycle, and this is increasingly emphasised as a vehicle for both statistical problem-solving and learning statistical thinking. The key elements of the process and some articulations of it are described in Sect. 2, and the stages of two articulations provide the structure for Sects. 4–7 in discussing the value of its explicit use in learning and teaching statistical thinking.

One context in which a description of the data investigative cycle was advanced was the United Kingdom (UK) National School Curriculum in the mid-1970s. Holmes (1997) described the introduction and development, in the final 2 years of school study in the UK, of a compulsory project to encourage a more holistic and practical approach to statistics, reflecting what statisticians do. The project-based approach is ideal for the statistical investigative process, and Sect. 3 outlines the approach of learning statistical thinking within the vehicle of the data investigative cycle. Reference is made to this approach throughout Sects. 4–7. Section 8 provides a small selection of examples of practical and accessible projects that have proved of value in engaging students and teachers in developing statistical thinking within particular stages or in the full process of the investigative cycle.

2 Statistical Thinking and the Data Investigative Cycle

As stated, the term “statistical thinking” is used here in an inclusive and encompassing sense as envisaged by the analysis of Wild and Pfannkuch (1999). The framework of their model for statistical thinking is that of empirical enquiry, and aims to generalise and synthesise from a combination of the literature and qualitative research into the approach and thinking of professional statisticians and statistics students during the process of investigating and solving real, vaguely described problems. The authors comment that they are “not concerned with finding some neat encapsulation of statistical thinking” (Wild & Pfannkuch, 1999, p. 224), nor do they address the full spectrum of statistical thinking. Their focus is on what professional statisticians do in solving real problems involving the need for modelling and analysing context information, uncertainty and data. Dimension one of their model is an articulation of the data investigative cycle.

This is also the focus of Cameron (2009) in a paper written from the viewpoint of training collaborative research statisticians. Cameron commented on Chambers’ (1993) description of what statisticians do as “greater” statistics involving three components: preparing data (including planning, collecting, organising and validating); analysing data; and presenting information from data. In adding an initial stage of formulating a problem so that it can be tackled statistically, and another possible stage of researching the interplay of observation, experiment and theory to develop new methods, Cameron’s (2009) model of what professional statisticians do in the practice of statistics is not only consistent with dimension one (the investigative cycle) of Wild and Pfannkuch’s (1999) model of statistical thinking, but also reflects dimension two of their model in the types of thinking fundamental to statistics. These include recognition of the need for data; changing the representation to assist understanding and problem-solving; investigating variation; reasoning with statistical models; and incorporating statistics and context.

Thus an expression describing the data investigative cycle provides a practical framework for demonstrating and learning statistical thinking. Exact descriptions of the cycle vary slightly but all share common concepts and structure. Cameron’s (2009) description was based on descriptions by professional statisticians. Wild and Pfannkuch’s (1999) description is the Problem, Plan, Data, Analysis, Conclusion (PPDAC) cycle adapted from MacKay and Oldfield (1994) that reflects the statistical process (see, for example, Shewhart & Deming, 1986). The description of the data-handling cycle that featured in the UK National School Curriculum since at least the mid-1970s (Holmes, 1997) has become the Plan, Collect, Process, Discuss (PCPD) cycle that is at the heart of the extensive pedagogies and resources produced by the Royal Statistical Society’s Centre for Statistical Education (www.rsscse.org.uk/). Marriott, Davies, and Gibson (2009) included a mapping of the problem-solving approach of the PCPD cycle onto Bloom’s taxonomy of the cognitive skills of educational objectives as revised by Anderson and Krathwohl (2001). A mapping of the learning objectives of this form of the cycle onto a two-way classification that combines the cognitive and the knowledge dimensions of Anderson and Krathwohl (2001) is given in www.rsscse.org.uk/qca/doc/PSAtwowaymap.pdf

Statisticians and statistical educators are increasingly emphasising the importance in statistical education of including all stages of the investigative cycle, particularly those that produce data to be investigated – those that involve identifying the problem or issue, planning the investigation and collecting the data – and the stage of interpreting the results of analysis or exploration of the data in context. That is, the stage described as “analysis” in the PPDAC description, or as “process” in the PCPD description, should be taught as part of the overall process of statistical thinking. Such emphasis requires not just real contexts and real data, but placement of components of learning within an overall framework representing whole and complex problems needing the full gamut of the knowledge dimension in statistics of Anderson and Krathwohl (2001): factual, conceptual, procedural and metacognitive.

3 Projects and the Investigative Cycle

Thus statisticians and statistical educators advocate enquiry and investigation approaches in the development of statistical thinking, and use of the investigative cycle as a framework. Learning experiences, small or large, can be couched in terms of investigating real problems with real data, and can be explicitly embedded in the framework recommended for the development of statistical thinking. That is, learning experiences can target parts or all of the investigative cycle – the key pedagogies are emphasis on investigation and identification of the stages of the investigative cycle in problem-solving.

Holmes (1997) identified and discussed the advantages of projects in statistics as natural vehicles for the data investigative cycle and holistic experiential learning. Although the context is senior school, the comments could equally well apply to all levels of education across and beyond schooling. Projects may vary in size and in the time allocated to them, but are characterised by incorporating a whole process from identifying a problem or issue of interest through to presenting a report. Holmes (1997, p. 156) described a project as a piece of work “that would start with defining a problem, collecting the appropriate data, analysing the data and drawing appropriate inferences. All this was to be presented in a written project report of about 15 pages”. Statistical projects, whether large or small, provide experiential learning of statistical investigations. Such learning brings together concepts, knowledge and skills in contexts that can engage and motivate students as well as teach them about the nuances of statistical thinking, the vagaries of data and the challenges of communicating interpretations in context.

This also applies within teacher education, and as the use of projects is also accepted as an integral part of school learning experiences, the use of data investigation projects is ideal to develop both statistical understanding and pedagogy for teachers. Associating the statistical investigative cycle with projects will assist in educating teachers to teach statistics, as the cycle may be used to capture statistical thinking within a pedagogical framework of active learning through projects. For example, the PPDAC form of the data investigative cycle is being used in an ongoing study designed to understand primary school teachers’ experiences as they develop confidence in teaching statistical enquiry (Makar, 2008).

The projects discussed in Holmes (1997) are free-choice data investigations in which students identify their topic to be investigated, plan and implement a data collection strategy to investigate the topic, explore and analyse their data, and produce a written report. This is often a group project because the task needs a group at all stages, particularly in free-choice data investigations in identifying the topic, planning, and collecting the data. The strong sense of ownership also facilitates teamwork as the project moves through the full process of data handling, exploration, analysis (if appropriate), interpretation and reporting in context. At all educational levels, such projects are advocated for experiential learning of the process of statistical enquiry, because they capture the challenges of turning ideas and questions into plans for investigation, the practicalities and messiness of data collection and handling, the essentials of choosing and using statistical tools, and the synthesis of statistical interpretations in real and authentic contexts. As students move from primary school through the secondary school levels and beyond, the same general approach is followed, but the level of statistical sophistication increases and the level of teacher direction decreases.

A significant impetus for learning is student ownership – of the ideas, the data and therefore, the analysis (MacGillivray, 1998, 2002; Chance, 2005; Lee, 2005). If students do not choose the topic to be investigated or if the topic and data are supplied, then teaching strategies need to address student engagement in the problem and all the stages of the data investigative cycle. No matter what the size or restrictions of a learning experience, what makes it a statistical data investigation is its placement within the process of statistical enquiry. That is, if a learning experience focuses on part of the data investigation cycle, it is important for students to understand where it fits and, if possible, at least consider the other aspects of the cycle relevant to the topic.

Sections 4–7 below consider learning experiences within the stages of the statistical investigative cycle as discussed in Sect. 2, referring to the descriptions PPDAC and PCPD of the data investigative cycle, as these are probably the best known. Section 4 considers the Problem and Plan stages of the PPDAC description and the Plan stage of the PCPD description. Section 5 considers the Data stage of PPDAC and the Collect stage of PCPD. Section 6 considers the Analysis stage of PPDAC and the Process stage of PCPD. Finally, Sect. 7 considers the Conclusion stage of PPDAC and the Discuss stage of PCPD.

4 Identifying the Problem and Planning

Whether data are to be collected, selected or provided, identification of the problem or topic to be investigated and the plan for investigation are essential and significant aspects of statistical thinking and problem-solving (Wild & Pfannkuch, 1999; Nolan & Lang, 2007) and need much more emphasis than has traditionally been given. Once a general topic or aspects of a topic are identified for investigation, other questions can be asked to assist the planning, irrespective of whether data are to be collected, selected or provided. These include:

  • What do we want to find out about? What can we find out about?

  • What can we measure or observe? Can we measure what we want?

  • Is there anything else we should observe or record … just in case?

  • Should we do a preliminary experiment?

  • How can we collect data that are representative?

Whatever the school level, statistical projects and statistical learning experiences should involve a number of variables. This provides authenticity in that almost all real problems are complex and involve – actually or potentially – many variables. This also provides experience in identifying questions, selecting data and method, and reporting interpretations in context. Even when illustrating methods involving just one or two variables, selecting from within a wider context or a more complex dataset facilitates a more holistic and realistic approach, and therefore more statistical thinking. This applies across educational levels, particularly as students mature. Ridgway, McCusker, and Nicholson (2005), MacGillivray (2005), and Schield (2005) all advocated the value of working within contexts with more than two variables in developing statistical thinking as students progress. Examples of problems and datasets that can involve many variables may be found in CensusAtSchool data (for example, Turner, 2006).

The Problem and Plan stage(s) of the cycle include identifying variables, identifying the subjects of the study, and considering the practicalities. When students are familiar with appropriate software, considering what the resultant spreadsheet of raw data will look like is an excellent aid in planning – if the spreadsheet cannot be visualised or described, then the planning is not complete.

Whether data are to be collected (primary) or provided (secondary), the Planning stage must consider the question of representativeness of data. Data can be used to make inferences about a larger group or a more general situation if the data can be considered to be representative of that larger group or more general situation with respect to the question(s) of interest. As students mature and come to consider the concept of inferring, the challenges of planning data collections to ensure such representativeness, or identifying the representativeness of secondary data, are greatly assisted through experiencing the Problem/Plan stage of the cycle. Each type of statistical investigation – experiment, observational study, survey or a mixture of types – has its own challenges in planning to achieve data relevant to, and representative of, the problem.

The thinking involved in considering the topic and its context, what variables to use, what data to obtain and how to obtain representative data, is profoundly statistical and incorporates almost all of the types of thinking fundamental to statistical thinking of dimension two of Wild and Pfannkuch’s (1999) model. It can be seen why statisticians and statistical educators are emphasising the importance of inclusion in teaching statistics of these aspects of data investigations.

Progressive development of statistical concepts and methods should proceed gradually through types of data, and therefore types of variables, as well as through the number of variables to be considered and the complexity of contexts and topics. The simplest data are categorical, and early learning is of simple categorical variables. A delightful paper by du Feu (2005) demonstrated how statistical thinking and projects can commence at an early age with categorical data, and motivate and help develop the earliest concepts of presentation of data and commenting on features.

5 Collecting the Data

Data can be generated through surveys, experiments, observations, or from pre-existing datasets such as those available through the Internet and other sources. Offices of National Statistics may also provide a useful source of data for projects at all levels, although such data often tend to be in at least some form of summary; for example, it can be quite difficult to access original data or data on more than two variables at once. The CensusAtSchool data aim to use students’ interest in data about themselves to engage them in statistical questions and explorations. The Internet is a rich source of data of interest to older students, and also encourages them to choose their own topics to explore. Topics of interest include sport, weather, music charts, movies, as well as more serious topics of social issues relevant to teenagers. Examples of work involving the analysis of media data can be found in Watson (1997).

The full Collecting the Data stage of the data investigation cycle involves collection, handling and cleaning of the data. In projects with data either provided or from secondary sources, these key elements can still be discussed, with the emphasis on understanding the need for well-collected, representative data. Pilot studies help in planning collection of good data. For simple projects at primary and even secondary school, such preliminaries may be as straightforward as trialling questions with each other, but no matter how simple the project, identifying the issue to be investigated, and planning the acquisition of, or access to, good data are essential in learning statistical thinking. In both collecting and preparing data for exploration or analysis, the representation of the problem and of the variables must be considered, whether we are dealing with simple or complex categorical variables, or measurement challenges. Data cleaning and data entry, whether students are using spreadsheets or summarising data themselves, have many aspects of challenge and fun. Is there really a student who lives 5 km from school but takes 40 min to reach school? Is there really a student who estimated 5 s by 3 s but 10 s by 15 s? We recorded 16 different colours of cars – how should we group them?
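Plausibility questions of this kind can also be posed in code once data are in a spreadsheet or list. The following Python sketch is illustrative only: the records, the field layout and the speed threshold are hypothetical assumptions, not data from any project described here. It flags journeys whose implied average speed looks unusually low, so that they can be discussed rather than silently removed.

```python
# Hypothetical records: (student id, distance to school in km, travel time in minutes).
records = [
    ("A", 5.0, 40),
    ("B", 2.0, 10),
    ("C", 12.0, 25),
    ("D", 0.5, 8),
]


def flag_slow_journeys(rows, min_speed_kmh=10.0):
    """Return ids of records whose implied average speed is below min_speed_kmh.

    The threshold is an arbitrary illustrative assumption. A flagged record is
    a prompt for discussion (walking? traffic? entry error?), not a deletion.
    """
    flagged = []
    for student, km, minutes in rows:
        speed = km / (minutes / 60.0)  # implied average speed in km/h
        if speed < min_speed_kmh:
            flagged.append(student)
    return flagged


flagged_ids = flag_slow_journeys(records)
print(flagged_ids)
```

A flagged record may be perfectly genuine (a student who walks a short distance, for instance), which is exactly the kind of judgement in context that data cleaning is meant to provoke.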

6 Analysing and Processing

At the school level, this stage refers mainly to choosing and using data representations and summaries for data exploration. In many ways, the word “process” of the PCPD version of the data investigation cycle is more appropriate for school levels, and the word “analysis” for post-school levels. It is this component that is most closely dependent on the student cohort’s level and the details of the curriculum. But at all levels, this stage involves not only investigating variation but also reasoning with statistical models and incorporating statistics and context.

Types of data representations, summaries and commenting on features of data are underpinned by considerations of types and number of variables; these are essential building blocks of statistical models. Thus there is naturally a gradual development from categorical to count to continuous, and from single variables to two variables and more. There is also a need for a gradual development of awareness of variation, including the important learning development from consideration of variation within a dataset to variation between groups of data to variation across datasets from the same or similar situations or contexts. If this is done in association with simulations (it is sufficient for these to be produced and demonstrated by the teacher), strong foundations can be laid in students’ understanding of variation and sampling variation. Also important is the key concept of representativeness of data with respect to the questions or topics under investigation.
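As one possible form such a teacher-led simulation might take, the Python sketch below repeatedly draws samples from a population with a known proportion and compares how much the sample proportions vary for a small and a larger sample size. The true proportion (0.6), the sample sizes, the number of repetitions and the seed are all arbitrary assumptions chosen for illustration.

```python
import random
import statistics


def simulate_sample_proportions(true_p, sample_size, n_samples, seed=1):
    """Draw n_samples samples of the given size; return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so a demonstration is repeatable
    proportions = []
    for _ in range(n_samples):
        successes = sum(1 for _ in range(sample_size) if rng.random() < true_p)
        proportions.append(successes / sample_size)
    return proportions


# Assumed settings: true proportion 0.6, 1000 repeated samples of each size.
small = simulate_sample_proportions(0.6, 20, 1000)
large = simulate_sample_proportions(0.6, 200, 1000)

sd_small = statistics.pstdev(small)
sd_large = statistics.pstdev(large)
print(f"sample size 20:  spread (SD) of sample proportions = {sd_small:.3f}")
print(f"sample size 200: spread (SD) of sample proportions = {sd_large:.3f}")
```

Plotting the two lists as dot plots or histograms, rather than only printing a summary, keeps the focus on seeing that estimates from different samples differ, and that they differ less when samples are larger.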

An emphasis on exploring data through graphical representations includes development of summary statistics with associated discussion of both the strengths and weaknesses of single-valued quantities in representing features of data. Early introduction and ongoing use of words such as “estimate” assist in providing a lasting foundation for future statistical learning. Data should be linked with chance at every opportunity, not only to reinforce concepts of estimation but also to assist in embedding understanding of probability in real and everyday contexts.

7 Interpreting and Discussing

Projects embedded in the data investigative cycle provide a natural environment for developing both verbal and written communication skills within each stage of the cycle, and facilitate coherent and gradual development of such skills as students mature. However, as commented in Forster, Smith, and Wild (2005), there is a need to systematically teach and develop skills in communicating, particularly in the Conclusion/Discuss stage of PPDAC and PCPD. Benefits of the integration of verbal and written communication within statistical projects go beyond specific development of these skills. Lipson and Kokonis (2005) pointed out that report writing in statistics is a metacognitive activity that facilitates the learning of statistical literacy and thinking.

It is also at this stage that students can learn about the nuances and pitfalls of commenting on variation, and allowing for variation in commenting on features of data, as well as commenting in context. It is of great importance in developing statistical thinking to emphasise commenting, interpreting and discussing, and NOT the definiteness of “answering the question”. There are certainly incorrect comments and interpretations that can be made, but the focus should be on appropriate rather than “right” comments. There should also be emphasis on distinguishing between what the data are telling us and what might be the reasons. Interpreting data in context does not mean drawing conclusions based on contextual intuition.

Wherever possible, the word “estimate” should be used. If syllabi and school level permit the study of the concept of error of estimate, then it is the concept and understanding that are vital, not the names or jargon. Introduction of interval estimates through proportions avoids the messiness and complications inherent in estimating means. The worst misconceptions in interval estimates for means are those that interpret the interval as one in which most individual values lie. In contrast, such misinterpretation is almost impossible in estimating a proportion; this is just one reason statisticians are increasingly suggesting introducing formal inference through inference for proportions.
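To make the idea of an interval estimate for a proportion concrete, the Python sketch below uses the common normal-approximation formula, estimate ± 1.96 × √(p̂(1 − p̂)/n), for an approximate 95% interval. This is one textbook approximation chosen here for simplicity, not a method prescribed in this chapter, and the counts (60 out of 100) are hypothetical.

```python
import math


def proportion_interval(successes, n, z=1.96):
    """Approximate interval estimate for a proportion (normal approximation).

    z=1.96 corresponds to roughly 95% coverage; the approximation is
    reasonable only when n is fairly large and the proportion is not
    close to 0 or 1.
    """
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - z * standard_error, p_hat + z * standard_error


# Hypothetical data: 60 of 100 people surveyed give a particular response.
p_hat, lower, upper = proportion_interval(60, 100)
print(f"estimate {p_hat:.2f}, interval ({lower:.2f}, {upper:.2f})")
```

The interval describes plausible values for the proportion in the wider group, so there is no temptation to read it as a range containing most individual observations.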

8 Some Examples of Projects

The following three examples, selected from free-choice projects conducted by students, have proved popular in workshops for teachers conducted in Australia, South Africa, and New Zealand. They illustrate some of the variety and learning potential in real and accessible contexts for statistical projects; Sects. 8.1 and 8.2 have been particularly popular for hands-on experience in planning investigations and trialling data collection, while Sect. 8.3 illustrates connecting chance and data. The Royal Statistical Society Centre’s ExperimentsAtSchool also provides a number of well-constructed project activities.

8.1 An Experiment Involving Measurement Choices

Jelly snakes are a confectionery that appeals to the consumer because of its stretchiness. However, the apparently simple idea of investigating the stretchiness of jelly snakes can produce a wide range of ideas and designs for experimentation. Factors can include one or all of colour, brand and temperature, with the latter lending itself to linking with science discussion. But the most challenging aspect, which leads to the greatest variation in ideas and some very interesting planning discussions, is the question of how to measure the stretchiness, and what measures of the unstretched snake to include in the investigation. Suggestions from students and teachers for measuring the stretch include: stretch to break and record length at breaking; stretch to a selected length, let go and measure length to which snake returns; stretch at constant speed; stretch vertically; and remove head of snake and stretch remainder. This is an example of a topic for investigation that can be made as simple or as complicated as desired. Because of its appeal and potential for diversity in approach, Conker Statistics (www.conkerstatistics.co.uk) have chosen it as an activity for the development of resources.

8.2 A Survey that Involves Human Characteristics

Human characteristics are always a fascinating topic for students. Surveys are popular choices in free-choice projects but the design of questions is usually far harder than it first appears. An example of a survey without question design problems and which can also include experimental design aspects is investigating how people clasp their hands. Some reports say people tend to place the left thumb on top (see, for example, http://humangenetics.suite101.com/article.cfm/dominant_human_genetic_traits). Is it related to how people fold their arms? Because the data are categorical, a reasonable amount of data is required for meaningful discussion based on plots and tables, but the data are also quick to collect. One such investigation (MacGillivray, 2007) found that key aspects included the importance of a pilot study and the randomisation of the order in which subjects were asked to do the clasping and folding.

8.3 An Observational Investigation that Links with Chance

Many aspects of human behaviour provide categorical data, and relationships can be explored through two-way tables and side-by-side or segmented bar charts. These also lead naturally to estimating conditional probabilities without the need for any theoretical concepts or jargon. Table 14.1 shows data from an observational study of the use of stairs or lifts at a bus station during a peak period and an off-peak period.

Table 14.1 Numbers of commuters at a bus station

The probability that a person going up uses the lift can be estimated by 62/85 = 0.73 during off-peak times, and 16/24 = 0.67 during peak periods. The probability that a person using the lift is going up can be estimated by 62/97 = 0.64 during off-peak periods, and by 16/164 = 0.10 during peak periods. These and other estimates lead to a wealth of discussion, and key questions about the context and the data collection methods – questions that are core to the data investigation cycle.
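The arithmetic behind these estimates can be sketched in a few lines of Python. Only the counts quoted in the paragraph above are used; the dictionary keys and function name are illustrative labels, not names from the original study.

```python
# Counts quoted in the text: people going up who used the lift, all people
# going up, and all people using the lift, for each observation period.
counts = {
    "off-peak": {"lift_and_up": 62, "all_up": 85, "all_lift": 97},
    "peak": {"lift_and_up": 16, "all_up": 24, "all_lift": 164},
}


def conditional_estimates(c):
    """Estimate P(lift | going up) and P(going up | lift) from counts."""
    p_lift_given_up = c["lift_and_up"] / c["all_up"]
    p_up_given_lift = c["lift_and_up"] / c["all_lift"]
    return p_lift_given_up, p_up_given_lift


estimates = {period: conditional_estimates(c) for period, c in counts.items()}
for period, (lift_given_up, up_given_lift) in estimates.items():
    print(f"{period}: P(lift | up) = {lift_given_up:.2f}, "
          f"P(up | lift) = {up_given_lift:.2f}")
```

The same numerator appears in both estimates for a period; only the denominator (the group being conditioned on) changes, which is the point students most often need to talk through.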

9 Conclusion

Statistics is a very challenging area to teach. In addition, many teachers have limited statistical content knowledge as well as little, if any, exposure to any specific pedagogy related to the teaching of statistics. While one can hope that this situation might change over time, the question facing statistics educators is how to educate teachers to be effective teachers of statistics. The word educate, rather than train, is used because to be effective, teachers need to learn about the appropriate pedagogy as well as update their knowledge and understanding. Governments should work towards a significant increase in the number of teachers specifically educated to teach statistics, but this requires time and planning to achieve. The authors make the following specific recommendations:

  1. Statistical projects should be included in the mathematics education components of both pre-service and in-service teacher education programmes. This is feasible since the development and discussion of projects is already an integral part of most mathematics education teacher training.

  2. The statistical projects used should emphasise the data investigative cycle as a vehicle to teach statistical thinking and to develop teachers’ own statistical understanding and knowledge. The PCPD or PPDAC description can provide a framework. Burgess (2008) demonstrated how teachers’ statistical knowledge for teaching statistics could be usefully benchmarked using the PPDAC framework.

  3. The projects used as part of both pre-service and in-service teacher education should, wherever possible, be projects that can be adapted to school use. Many statistical questions can be approached at different levels of sophistication. By using projects that can be used with their students it is more likely that the pre-service and in-service teachers will utilise the materials and ideas in their teaching.

  4. Teachers need to undergo the same learning experiences as their students (Burgess, 2008; Pfannkuch, 2008), and teachers of teachers may also need to be trained in substantive statistical content and pedagogical knowledge.

More research should be undertaken into frameworks that will help to structure the teacher “learning” of statistics. Very little is known about effective mechanisms. Much of the research thus far has been on the nature of statistical understanding and thinking; effective pedagogy in the statistical education of teachers, per se, is a key area in which research still needs to be undertaken.