There has been a substantial increase in the number of studies examining the association between spatial reasoning and mathematics in recent years. This increased attention has been driven, in part, by the view that an individual’s spatial reasoning capacity is essential for society’s new science, technology, engineering and mathematics (STEM) workforce development (Wai et al. 2009). Spatial reasoning refers to a broad suite of cognitive skills involved in the mental manipulation of two-dimensional and three-dimensional relations between and within objects. Importantly, spatial reasoning skills can be learned and improved (e.g., Uttal et al. 2013), and such improvements can transfer to mathematics learning (e.g., Hawes et al. 2017; Lowrie et al. 2017, 2019, 2020).

In this paper, we provide a description of how spatial reasoning has been defined and measured in both psychology and education contexts. We then explore how interdisciplinary alignment can be achieved. A cornerstone of our argument is that spatial reasoning skills are learned and developed in context (not in a void). Children engage in spatially demanding real-world activities, and through this process, develop a range of contextualised and interrelated spatial skills. As such, theoretical models and assessment measures should account for context to better understand the basic cognitive skills involved in complex spatial tasks, and how they support mathematics education. This analytic process is required for the consumers of each discipline to build upon each other’s intervention work.

Comparison of psychological and educational approaches to spatial reasoning

Descriptions of mental imagery

Psychologists have been identifying and characterising different kinds of spatial abilities since Galton (1883) investigated individual differences in the fidelity and power of individuals’ mental imagery. From this work emerged two distinct traditions of imagery research, namely, imagistic representations and propositional representations. The first view described a mental image as like a picture or a photograph in the head, which is a replica of a previous pattern of sensory activity. According to Kosslyn (1975), a visual image has two structures: the “deep structure,” which is the abstract format in which a visual image is stored in long-term memory, and the “surface structure,” which is generally associated with the experiences of visual imagery. Kosslyn argued that the perception of concrete objects, and experiences of events or episodes stored in the senses, influence a person’s interpretation of an image. By contrast, Pylyshyn (1981) maintained that images could not be stored in a raw form because it would limit information retrieval. He argued that both images and verbal information form underlying structures (i.e., propositional representations) that originate from knowledge an individual has about objects in the real world.

Both traditions appear to have some merit in mathematics education, with exposure to concrete materials important in spatial concept formation, while symbolic representations appear more automated as expertise develops. For example, Bishop (1980) described the importance of students sharing their idiosyncratic imagery to develop more powerful mathematics representations. According to Presmeg (1986), students use a range of different kinds of visual imagery even within the restricted domain of mathematics engagement. Presmeg identified five different kinds of mental imagery, namely, (1) concrete, pictorial imagery (pictures in the mind’s eye); (2) pattern imagery (pure relationships depicted in a visual-spatial scheme); (3) memory images of formulae; (4) kinaesthetic imagery (involving muscular activity) and (5) dynamic (moving) imagery. Presmeg argued that students had little trouble generating visual images, with the types of images evoked changing when encountering different types of problems. However, these images are not always helpful in problem solving. Vivid mental images can be counter-productive because the detail may interfere with extracting relevant task information (Hegarty and Kozhevnikov 1999; Presmeg 1986).

Measurement of spatial reasoning skills

In both psychology and education, most definitions of spatial reasoning skills have been closely aligned to descriptions from psychometric tests. These tests were initially developed to determine one’s suitability for specific occupations, including defence (Hegarty and Waller 2005). The use of these tests became more influential when they were modified and adapted for testing of students on entry into university and during high school. These tests, most developed more than 50 years ago, are used extensively in the spatial reasoning literature today. They are intended to be presented in a pencil-and-paper form and require students to decode pictorial representations of shapes and objects in ways that evoke mental imagery. Consequently, most measures demonstrate one’s ability to mentally manoeuvre 2D shapes or 3D objects in space. Sample test items assessing three distinct spatial skills are presented in Fig. 1. Descriptions of the individual spatial skills are provided in the next section describing how these skills have manifested in numeracy assessment. These examples are not exhaustive, but they do represent a subset of spatial skills that overlap with those found in mathematics curricula.

Fig. 1 Sample psychometric test items

Despite these tests being the best attempts at replicating spatial skills, they are limited in their ability to replicate the complexity of spatial manoeuvres embedded within a context (Atit et al. 2020). For example, while spatial orientation (see Fig. 1) may be one spatial skill involved in finding your way through a crowded city, such a task likely requires a host of other skills (e.g., navigation, symbolic reasoning, mental rotation, perceptual acuity). This gap between foundational spatial skills captured in psychometric tests and real-world situations is further amplified within the complex spatial problem solving undertaken across the STEM disciplines in today’s dynamic and digital environments. Mathematics tasks associated with proportion and reasoning, for example, are highly spatial in nature—and are certainly not fully measurable within these established psychometric tests—but are representative of the application of spatial reasoning beyond performance on psychometric tests.

More recently, researchers at the Spatial Intelligence and Learning Center (SILC) in the USA have been working to create psychometrically sound measures of spatial reasoning involved in the STEM disciplines, with a focus on geoscience. Tests have been developed to isolate spatial reasoning skills involved in the practice of geoscience without requiring content knowledge. For example, the mental brittle transformation test (Resnick and Shipley 2013) assesses one’s ability to visualise putting broken pieces back together, as geologists do when they reason about geologic faults. Despite these advancements, similar tests are needed to capture the complex spatial reasoning involved in mathematics, and then translate this work into psychology and education research more broadly.

Spatial reasoning within nationally standardised mathematics assessment

National mathematics tests are becoming more spatially demanding. Lowrie and Diezmann (2009) noted that 85% of the inaugural Year 3 and 5 Australian numeracy tests (MCEETYA, 2008a; 2008b) contained information or contextual graphics. Most of these items required students to decode or encode spatial information, with approximately 25% of all items encouraging (requiring) students to evoke some form of spatial visualisation, orientation or mental rotation of shapes or objects. Such items either require students to utilise spatial skills or access spatial tools including gesture and language (Newcombe 2018). The number of spatial items used to assess numeracy competency in Australia is higher than in equivalent Trends in International Mathematics and Science Study (TIMSS) instruments, where spatial tasks comprise approximately 16% of all items. State-based tests in the USA contain much lower proportions of spatial items (see Lowrie and Logan 2018 for an analysis of spatial items across education jurisdictions). Examples of items in Australia’s national numeracy test (NAPLAN; ACARA 2016) by spatial skill are presented in Fig. 2. These numeracy items tend to mimic the spatial skills measured by the traditional psychological assessments and theories, with the exception of spatial structuring. Spatial structuring is identified as a foundational spatial precursor to mathematics understanding (Battista and Clements 1996; Mulligan et al. 2020), which has not yet been widely examined in psychological literature.

Fig. 2 Sample numeracy test items. Note. Figures reproduced with permission from the Australian Curriculum, Assessment and Reporting Authority

Spatial visualisation

Spatial visualisation may involve completing a sequence of mental transformations, where it may be required to hold intermediate products in memory (Salthouse et al. 1990). It can involve the mental manipulation of an entire spatial configuration and generally necessitates a greater number of processing steps. The spatial visualisation task from the NAPLAN is very similar to the Ekstrom et al. (1976) Paper Folding Test. Both ask the student to imagine/visualise the folds, the cut and then the unfolding of the paper. To do so, they must hold in memory the way the paper is folded after two folds, where the joined corner is located, and the angle of the cut. Then, they must imagine/visualise reversing the folds, keeping in mind the missing section of the paper and how this would be represented. This type of systematic visual manipulation is a key example of the type of spatial processing associated with other spatial visualisation skills. Approximately 38% of year 3 students and 57% of year 5 students across Australia were able to answer this task correctly (Queensland Curriculum and Assessment Authority 2014).

Mental rotation

Mental rotation involves imagining a whole figure rotating. Both the psychometric and NAPLAN items ask the student to identify a figure in a different position after rotation. The NAPLAN item, for example, could be solved by imagining two structured movements (rotating the figure 90 degrees around the Y axis and then 90 degrees around the X axis towards the viewer) or a single three-dimensional diagonal movement through space. This is contrasted with the Shepard and Metzler Rotation Test (Shepard and Metzler 1971), which only moves the geometric figure within a single plane (either the plane of the page or in depth). Approximately 63% of year 3 and 81% of year 5 students in Australia answered this task correctly (Queensland Studies Authority 2013). This could be indicative of the level of rotation content covered in the Australian Mathematics curriculum under the location and transformation sub-strand in the measurement and geometry strand.

Spatial orientation

Spatial orientation involves imagining the perspective from a location that is not your own, that is, anticipating locations from different vantage points (Newcombe and Huttenlocher 1992). This skill has been referred to as “perspective-taking” (e.g., Frick et al. 2014) and “spatial orientation” (e.g., Hegarty and Waller 2004), though both terms involve imagining the view from another vantage point. Available psychological measures involve the student imagining moving along a horizontal plane. For example, Hegarty and Waller’s (2004) task asks students to imagine moving to one location (e.g., where the cat is standing), facing another location (e.g., where the tree is) and then pointing to a third location (e.g., where the flower is). In a test designed for children aged 4–8 years old (Frick et al. 2014), children are asked to pick which photo a toy Lego figure could have taken given their perspective. The photo options included four possible perspectives: front (aligned with the child’s view), back, left and right. The NAPLAN question requires the student to imagine the front perspective of a series of cans, where the anticipated viewpoint is different from the one they are currently provided. Note that two of the response options are from a bird’s eye view. Approximately 41% of year 5 students across Australia answered this correctly.

Spatial structuring

The construct of spatial structuring draws predominantly on skills identified by Battista and Clements (1996). Building on their definition, Pittalis and Christou (2010) identified spatial structuring as:

…the mental act of constructing a structure or a form for an object or set of objects. Spatially structuring an object means identifying its spatial components and then establishing interrelations among components and composites (pp. 193–194).

The example provided by Pittalis and Christou (2010) related to 3D spatial structuring, where they suggested “students that spatially structure a cube with an edge of five units understand that 25 unit-sized cubes are needed to cover its base, which constitute a new ‘unit/layer’ and that five ‘units/layers’ fit in the cube” (p. 194).

The spatial structuring task (see Fig. 2) requires students to mentally deconstruct the object in order to count all the cubes. Since some cubes are hidden, students need to rely on their knowledge of array structures to visualise the rows and columns, utilising the top view perspective to help. Students might approach this task by considering the horizontal layers of the object first, given they are provided with the bottom layer of the structure. In this example, they might identify that the bottom layer has 12 cubes. Then, they might work from the base layer up, considering each horizontal layer, continuing to add the set of cubes that form that layer. Another approach might be to count the cubes in each column, starting with the single layer cubes at the front of the object and working back to the columns that are “4 high” at the back of the structure. In either approach, students are identifying the spatial and structural components of the object and considering the relationship among those components. Perhaps not surprisingly, only 8% of Australian students were able to correctly navigate the spatial demands of this task (Queensland Studies Authority 2009).
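The two counting strategies described above can be sketched in code. The following is a minimal illustration only, assuming a hypothetical staircase structure recorded as a top-view grid of column heights. The grid is invented for this sketch and is not the NAPLAN item, although it shares the features named in the text (a 12-cube bottom layer, single cubes at the front and columns four high at the back).

```python
# Hypothetical top-view height map (NOT the actual NAPLAN item).
# Each number is how many cubes are stacked at that grid position;
# the first row is the back of the structure, the last row the front.
HEIGHTS = [
    [4, 4, 4],
    [3, 3, 3],
    [2, 2, 2],
    [1, 1, 1],
]


def count_by_layers(heights):
    """Return the number of cubes in each horizontal layer, base first."""
    tallest = max(max(row) for row in heights)
    layer_sizes = []
    for level in range(1, tallest + 1):
        # A grid position contributes to a layer if its stack reaches that level.
        layer_sizes.append(sum(1 for row in heights for h in row if h >= level))
    return layer_sizes


def count_by_columns(heights):
    """Return the height of each vertical column, working front to back."""
    return [h for row in reversed(heights) for h in row]


layers = count_by_layers(HEIGHTS)
columns = count_by_columns(HEIGHTS)
print("cubes per layer (base first):", layers)
print("total counted by layers:", sum(layers))
print("total counted by columns:", sum(columns))
```

Both strategies partition the same set of cubes, so they must agree on the total. The difference lies in which structural unit (layer or column) the student treats as the composite.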

While the individual items of the psychometric test and standardised mathematics test feel similar, the structure of the overall tests differs. Standardised tests typically include a few items on a range of different content, with enough time provided to complete the test in full. This is contrasted with psychometric tests that focus on a single skill in a timed setting. This allows psychometric tests to differentiate analytic approaches, whereas this may not be the case for standardised tests. While the spatial items described above are intended to engage the student in spatial reasoning by imagining movement of objects or of themselves holistically, many students use verbal-analytic approaches to complete these tasks (Harris et al. 2013).

Interdisciplinary alignment

In this section, we consider the pitfalls of, and potential for, interdisciplinary alignment around spatial reasoning. We do not advocate for total alignment; indeed, this would stymie progress in the respective fields. Rather, we argue that it is important to understand the affordances of the distinct methodology paradigms—from controlling as many variables as possible in laboratory-based environments to single site-based case studies within a classroom. In a similar vein, we maintain that instruments that measure fine-grain differences within a spatial construct are as important as tools that measure spatial constructs within life-like contexts. In this section, we describe how methodological considerations (in terms of study design and analyses) and assessment tools (how performance is measured) are interwoven across the distinctly different discipline approaches and viewpoints.

Methodological viewpoints

Bruce et al. (2017) highlighted the fact that cognitive psychologists and mathematics educators seem to be focused on similar problems, yet the disciplines seem to function independently of one another. Their network analysis revealed several factors inhibiting transdisciplinary connections including discipline-based validity, outcome expectations and a lack of awareness of work outside researchers’ own fields. Most spatial intervention studies conducted in psychology disciplines tend to be devoid of context. Such studies are usually designed in a laboratory environment with participants assigned randomly to two or more groups. The intervention is typically administered by an expert, with dosage that is typically short in duration. These studies are tightly controlled, with little connection to life-like experiences. A recent example is work undertaken by Gilligan et al. (2019), who found transfer from improved spatial reasoning skills to mathematics with an intervention that lasted approximately 4 min. Post-test measures were captured within 5 min of intervention completion. Because of this tight control, it is difficult to ascertain how such interventions could be scaled up effectively within classroom contexts. In the Gilligan et al. (2019) example, would such an intervention improve mathematics achievement within the classroom? If not, what kinds of modifications would be required to reach this desired outcome: increased length, more embedded context, scaffolding, and so on?

By contrast, interventions conducted through an “education design” tend to be administered by the classroom teacher (see Hawes et al. 2017; Lowrie et al. 2017, 2018, 2019, 2020; Mulligan et al. 2020; Patahuddin et al. 2020). The interventions are administered over extended periods of time in situ. Because the interventions balance many components within uncontrolled settings, it is difficult to identify specific causal mechanisms that explain why the intervention worked. Take, for example, the Lowrie et al. (2017, 2020) interventions, which comprised a range of activities that engaged students in spatial visualisation, mental rotation and spatial orientation using the ELPSA pedagogical framework. It is not clear whether the transfer to mathematics was due to practice with one or more specific spatial reasoning skills, to the ELPSA framework, or to the combination of the two. Additionally, teacher buy-in, efficacy and content knowledge may also influence student outcomes. Since students within these classrooms are not assigned randomly to specific intervention or control groups, nesting analysis needs to be considered. Such designs are costly, because a minimum of twenty intervention and twenty control classes is typically needed to gain sufficient statistical power to undertake nested (hierarchical) analysis. Nevertheless, cluster randomised controlled trials (C-RCTs) should be considered the gold standard in education designs because practice authenticity is heightened. Only at this level can one be sure the intervention can actually “work” in practice.
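One way to see why nested designs demand so many classes is the standard design-effect approximation for clustered samples, DEFF = 1 + (m − 1) × ICC, where m is the number of students per class and ICC is the intraclass correlation. The sketch below is illustrative only: the class size of 25 and the ICC of 0.15 are hypothetical values chosen for the example, and the calculation is not presented as how the twenty-class guideline above was derived.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from sampling whole classes rather than individuals."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Approximate number of independent observations in one study arm."""
    total_students = n_clusters * cluster_size
    return total_students / design_effect(cluster_size, icc)


# Hypothetical inputs (assumptions for illustration, not values from any study).
CLASSES_PER_ARM = 20
STUDENTS_PER_CLASS = 25
ICC = 0.15

deff = design_effect(STUDENTS_PER_CLASS, ICC)
n_eff = effective_sample_size(CLASSES_PER_ARM, STUDENTS_PER_CLASS, ICC)
print(f"students per arm: {CLASSES_PER_ARM * STUDENTS_PER_CLASS}")
print(f"design effect: {deff:.2f}")
print(f"effective sample size per arm: {n_eff:.0f}")
```

Under these assumed values, students in the same class resemble one another enough that each arm carries far less independent information than its headcount suggests. Adding classes raises the effective sample size more efficiently than adding students to existing classes.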

Assessment tools

The measures and instruments psychologists use to measure spatial reasoning cannot, or should not be expected to, capture measures of spatial thinking in context (Atit et al. 2020). Nevertheless, for measures of spatial reasoning to be relevant in education fields, more consideration needs to be given to the parameters in which specific spatial skills are defined. At present, these definitions remain narrow.

Many of the measures currently used to assess spatial development in intervention studies, and to assess the relation between spatial reasoning and broad mathematics achievement, are designed for controlled, lab-based implementation. For example, Hawes et al. (2017) used an adapted version of Levine et al.’s (1999) mental transformation tasks (CMTT; Form D) to measure 2D mental rotation and the Raven’s Progressive Matrices (Raven 2008) to measure visual-spatial analogical reasoning. In their high school intervention study, Lowrie et al. (2020) employed traditional psychological measures of object-based spatial skills from Ekstrom et al.’s (1976) Kit of Factor-Referenced Cognitive Tests (mental rotation and spatial visualisation). Studies situated within education contexts report more moderate relationships between spatial measures and curriculum-based standardised tests (Frick 2019; Gilligan et al. 2019; Sorby and Panther 2020) than can be found in lab-based, tightly controlled studies (e.g., Mix et al. 2016). The paucity of appropriate tools to assess a variety of spatial manoeuvres leaves researchers limited to the measures that best meet their needs in terms of validity and ease of implementation. These choices are further influenced by what is deemed acceptable when it comes to publication. While such measures support the generalisability of results, the question remains whether the use of psychological spatial tests truly reflects the nature of spatial thinking in mathematics education environments.

Few studies have examined the extent to which psychological measures of spatial skills align to the spatially rich mathematics tasks students encounter in school-based numeracy and mathematics assessments. Instead, discussions tend to be focused on the mechanisms connecting space and mathematics in controlled lab-based studies removed from applied mathematical problem solving (Hawes and Ansari 2020; Hawes et al. 2019). Data models continually suggest that mathematics and spatial reasoning remain distinct but related constructs (Hawes et al. 2019; Mix et al. 2016). Therefore, future studies need to unpack the presentation of spatial skills within mathematical problem solving.

In small-scale studies that afford opportunities for one-on-one testing, instruments include the Pattern and Structure Assessment (Mulligan and Mitchelmore 2009) for assessing mathematical pattern and structure which is foundational to mathematical development (Mulligan et al. 2020). The scale of these studies also allows for video-recording to enable analysis of gesture and children’s ability to reason spatially (Bruce and Hawes 2015). Studies which are implemented on a smaller scale have opportunities for greater fidelity due to ongoing researcher involvement (Clements and Sarama 2007; Mulligan et al. 2020). The detailed qualitative and longitudinal data analysed at a classroom level affords in-depth understanding and modelling of student learning (Clements and Sarama 2007).

Connections between spatial intervention and mathematics achievement

The consistent findings of the efficacy of spatial intervention have led researchers from a wide range of disciplines to explore ways to leverage spatial intervention for STEM outcomes (Uttal et al. 2013). Although there is now a body of work that points to the affordances of spatial reasoning for mathematics development, the field is still emerging, with a limited number of studies characterising specific causal connections (Stieff and Uttal 2015), and most taking place in Western countries (e.g., Cheng and Mix 2014; Gilligan et al. 2019; Lowrie et al. 2017). Intervention tends to take one of two forms: (1) spatial learning programs (Hawes et al. 2017; Lowrie et al. 2017, 2019, 2020) or (2) spatial skill training (Cheng and Mix 2014; Gilligan et al. 2019).

Classroom-based spatial learning programs typically embed spatial learning within classroom content. For example, a learning program delivered over 3 weeks (age range = 10–12) that focused on spatial visualisation skills (i.e., reflections, symmetry, paper folding, nets of solids, hidden figures, and cross-section) resulted in mathematics performance improvements in both geometry- and word-based problems (Lowrie et al. 2019). A 47-h classroom intervention that focused on similar spatial visualisation skills with 4–7-year-olds improved student performance on symbolic number comparison tasks (Hawes et al. 2017). Spatial reasoning programs have also been balanced across a variety of different kinds of spatial skills, such as spatial visualisation, mental rotation and spatial orientation (see Fig. 1). Such programs have increased primary students’ knowledge of geometry problems (Lowrie et al. 2017) and high school students’ understanding of geometry, word and number-based problems (Lowrie et al. 2020).

Spatial training, by contrast, tends to be narrow and focused, with the intervention implemented by researchers. For example, training is typically aligned to specific spatial skills including mental rotation (e.g., Cheng and Mix 2014) and spatial scaling (Gilligan et al. 2019). A single 40-min session of mental rotation practice improved 6- to 8-year-olds’ mental rotation skills and performance on missing-term mathematics problems in the Cheng and Mix (2014) intervention. Gilligan et al. (2019) delivered 3–4 min of training and found improvements in missing-term problems, number line estimation and geometry tasks in a test 5 min after training.

It is crucial to note that the spatial learning and training programs described above vary on a number of attributes, which makes it challenging to consider the impact of an intervention in a broader classroom context. The studies range in length (a few minutes to 32 weeks), age of students, researcher involvement (single-dose versus ongoing professional development) and focus on different spatial and mathematics skills. In addition, spatial and mathematics reasoning both involve partially dissociable skills (Mix et al. 2016; Resnick et al. 2019), which can explain why the relation between spatial reasoning and mathematics performance varies by task (e.g. Holmes et al. 2008; Mix et al. 2016). For example, in each of the studies described above, transfer was only observed in specific tasks: the work by Lowrie et al. did not transfer to number-based problems (2017) or non-geometry graphic tasks (2019), Cheng and Mix (2014) did not see improvements on number-fact problems or multidigit calculation, and Hawes et al. (2017) did not see gains in the non-symbolic comparison test or the measure of number knowledge.

To further establish the causal mechanisms that connect spatial reasoning to mathematics, it is imperative that the field has clear definitions of spatial reasoning—including what constructs are contained within the broader spatial term. It is also necessary for the two main disciplines of psychology and education to better understand the respective fields’ viewpoints in order to build upon each other’s interdisciplinary strength (see Mix and Battista 2018, for a comprehensive description of these challenges).

Conclusion

Newcombe (2018) argued that researchers need to examine how spatial tools, including gesture, sketching and the encoding of diagrams and graphs, are best used to enhance mathematics learning. This argument is based on the premise that spatial tools can influence learning environments and classroom practices in ways that go beyond skill development. With evidence emerging that spatial reasoning improvement can impact on mathematics achievement, it is now important to consider why and under what conditions spatial reasoning transfers to mathematics. We propose that this approach is twofold: (1) lining up spatial skills with mathematics concepts to understand the mechanisms that connect mathematics and spatial reasoning and (2) examining what is happening in classrooms to understand how implementation of spatial intervention impacts student thinking and learning. To this end, it is helpful to conceptualise spatial learning as a continuum, with an entirely embedded [mathematics] context at one end and an entirely controlled context at the other end. Researchers sitting at either extreme should begin to consider how their work may be extended to populate the middle of the continuum: How might lab-based studies be replicated or scaled up into a range of classroom contexts? How could classroom-based studies offer more controlled settings or systematic variation? The results of such work will inform our understanding in both directions along the continuum, and thus better position researchers to identify the causal mechanisms connecting specific spatial and mathematics skills in real-world contexts and to develop targeted interventions that are efficient and effective.