Introduction

When we think of numbers, we automatically map them on space. These spatial numerical mappings were originally reported by Dehaene, Bossini, and Giraux, in 1993, via number parity experiments. When Western participants judged the parity of numbers, they unconsciously responded faster on the left to smaller numbers and faster on the right to larger numbers; it was concluded that a faster response on the left reflected a spatial representation of smaller values to the left, and vice versa. This phenomenon is known as the Spatial Numerical Association of Response Code (SNARC) effect and considerable research effort has been invested to investigate its properties. Along with numbers, the seminal SNARC paper also investigated whether letters could be spatially mapped (Dehaene et al. 1993, Experiment 4). Yet, when earlier letters (e.g. A, C, E) in the alphabet were not found to associate with the left side of space and later letters (e.g., R, U, and X) to the right, it was suggested that the spatial effect was specific to only numbers (Dehaene et al. 1993).

It was not until a decade later that a new wave of research started to investigate the possibility of spatial associations beyond numbers (Gevers et al. 2003; Rusconi et al. 2006; Santiago et al. 2007). An experiment on ordinal sequences, using months of the year as stimuli, was one of the first to report a significant SNARC-like effect for a non-numerical domain (Gevers et al. 2003): early months of the year were responded to faster with the left hand and later months with the right hand. Torralbo, Santiago and Lupianez (2006) also used a temporal reference, past and future, and found a SNARC-like effect with faster left responses for past and faster right responses for future. Similarly, musical experiments have found faster responses to lower-pitched sounds on the left and to higher-pitched sounds on the right (Cho et al. 2012; Nishimura and Yokosawa 2009).

As non-numerical domains have become a growing focus of the spatial association literature, the term ‘SNARC’ no longer adequately captures all the different constructs that can be spatially mapped onto the left and right. In fact, a more appropriate name for the SNARC has already been suggested: the SQUARC (Spatial Quantity Association of Response Codes; Walsh 2003), which accounts for the relationships between numerous magnitude quantities and space, not just numerical quantities. Yet even the SQUARC does not fully capture the extent of the non-numerical SNARC literature, as there are magnitudes demonstrating spatial mappings that do not have quantifiable differences, e.g., letters. Thus, a simplified name for this field could just be the SARC (Spatial Association of Response Codes), which does not rely on a specific domain to represent the relationship between magnitude and space. For the remainder of this paper, all SNARC-like effects will be referred to as SARC effects.

The techniques that have been employed to measure spatial associations in the last two decades typically comprise either implicit or explicit task demands. Implicit SARC experiments ask participants to make judgements on properties of the presented stimuli which are orthogonal to their magnitude (e.g., parity of presented number, colour of target, and type of instrument; Dehaene et al. 1993; Chang and Cho 2015; Rolke et al. 2013). In explicit task demands, however, the magnitude of the presented stimulus is critical for the judgements, as participants judge stimuli in comparison to a referent stimulus (e.g., lower/higher than referent number, shorter/longer than referent duration, smaller/larger than referent object; Fabbri et al. 2013; Sellaro et al. 2015; Viarouge et al. 2014). Synthesizing the SARC literature makes it possible to examine whether the type of task instruction affects spatial mappings.

The primary purpose of this systematic review and meta-analysis is to identify all empirical papers that have investigated a left–right SARC effect in non-numerical domains. While numerical spatial associations have been comprehensively reviewed (Wood et al. 2008), no systematic review has brought together the literature on non-numerical spatial associations. This is the first opportunity to unify the literature on all non-numerical magnitude mappings, identify which domains have the greatest propensity to be spatially associated, and determine whether the use of implicit or explicit instructions induces a larger SARC effect. Finally, as the influence that expertise of a domain can have on spatial mappings has been widely debated in the literature (Cipora and Nuerk 2013; Hoffmann et al. 2014), a final analysis of expert vs non-expert participants will be conducted for the domain of musical associations (i.e., musicians and non-musicians).

Methods

Literature search and inclusion criteria

A systematic literature search was conducted to identify all papers relating to non-numerical spatial associations. This search was performed in accordance with the PRISMA guidelines (Liberati et al. 2009). The search terms used for this review included SNARC; Spatial Numerical Association of Response Code*; SCARC; Spatial Chromatic Association of Response Code*; SMARC; Spatial Musical Association of Response Code*; SPARC NOT ‘secreted protein acidic’ NOT ‘cysteine’; Spatial Pitch Association of Response Code*; SQUARC; Spatial Quantity Association of Response Code*; STARC; STEARC; Spatial Temporal Association of Response Code*; Mental Number Line*; Mental Time Line*. These terms were based off already established non-numerical spatial associations in this field.

The systematic search was conducted on 15 November 2016 in EMBASE, Medline, and PsycINFO. No terms were mapped to subject headings. Because the SNARC effect was first published in 1993, publications before 1990 were considered irrelevant and were excluded. Thus, the search only collected papers published between 1 January 1990 and 15 November 2016.

Papers were included in the review if they investigated horizontal mappings of left and right spatial associations in non-numerical magnitudes; included healthy adult participants (no clinical disorders or synaesthetes, although healthy controls from studies of non-healthy populations, e.g., clinical or case studies, were included); and were available in English (they could originally be published in a different language if then translated into English). Papers were excluded if they were non-peer-reviewed publications (conference abstracts, dissertations, review articles, etc.); investigated right-to-left spatial mappings only (papers were included if the right-to-left data were a separate condition or analysis from the left-to-right data); investigated vertical mappings only (papers were included if the vertical data were a separate condition or analysis from the horizontal data); used numerical stimuli only (papers were included if the numerical stimuli were a separate condition or analysis from the non-numerical stimuli); or incorporated a crossed-hands manipulation (i.e., if hand responses for buttons were used, the hands physically switched buttons). For papers reporting multiple experiments, the paper was included when at least one experiment adhered to these eligibility criteria; data from the non-eligible experiments were not reported in this review.

From the eligible experiments, information regarding the number of participants, their ages, and gender was extracted; in addition to the magnitude investigated, the methodology, statistically significant findings (p < .05), and finally whether or not a significant SARC effect was found. This information was collated into a systematic review table.

The experiments were further categorized into groups based upon specific criteria. The magnitude domains were operationalized into four classifications (consensus agreement by the authors), as described in Table 1: temporal associations, musical associations, size associations, and additional domain associations. The temporal associations were then separated into three more subcategories in relation to the type of time construct: D-time (past/present/future), S-time (earlier/later than relationships), and T-span (durations) (Nunez and Cooperrider 2013).

Table 1 Spatial association classifications

The experiments were categorized as having either implicit or explicit processing demands. The musical association experiments were also classed as having either expert (musician) or non-expert (non-musician) participants. If no information was provided on the participants’ level of musical experience, it was assumed that they were non-musicians.

Meta-analytic procedure and effect size calculation

Meta-analyses were carried out using the Comprehensive Meta-Analysis Software (Version 3.3.070, Biostat, Englewood, NJ). Experiments reporting inadequate data for effect size calculations were not considered for the meta-analysis. The means and standard deviations were used to calculate the SARC effect sizes. If these data were not provided, effect sizes were computed from other data, including F, t, p, and χ² statistics. For each experiment, the standardized mean difference (d), the 95% confidence interval, and the corresponding Z and p values were calculated. The effect size was pooled using a random-effects model. Separate analyses were run to assess overall and domain-specific effect sizes. Moreover, effect sizes were calculated for task demands (implicit vs explicit magnitude processing), expertise in the musical domain (musicians vs non-musicians), and the subcategories of the temporal domain (D-time, S-time, and T-span).
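To make the pooling step concrete, the following is a minimal sketch of random-effects pooling of standardized mean differences. It assumes the DerSimonian-Laird method-of-moments estimator for the between-study variance; the text names the software and the random-effects model but not the estimator, so this choice, along with the names `PooledEffect` and `pool_random_effects`, is an assumption made for illustration.

```python
import math
from typing import NamedTuple, Sequence


class PooledEffect(NamedTuple):
    d: float        # pooled standardized mean difference
    se: float       # standard error of the pooled estimate
    ci_low: float   # lower bound of the 95% confidence interval
    ci_high: float  # upper bound of the 95% confidence interval
    z: float        # z statistic for H0: pooled d = 0
    q: float        # Cochran's Q
    tau2: float     # between-study variance estimate


def pool_random_effects(effects: Sequence[float],
                        variances: Sequence[float]) -> PooledEffect:
    """Pool effect sizes with the DerSimonian-Laird estimator (assumed here)."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two effects with matching variances")

    # Fixed-effect (inverse-variance) weights and mean, needed for Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * di for wi, di in zip(w, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (di - fixed_mean) ** 2 for wi, di in zip(w, effects))
    df = len(effects) - 1

    # Method-of-moments between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to every study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    d = sum(wi * di for wi, di in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)
    return PooledEffect(d, se, d - 1.96 * se, d + 1.96 * se, d / se, q, tau2)
```

Each experiment contributes one effect size and its sampling variance; when the experiments disagree more than sampling error alone would predict, `tau2` grows and the weights become more equal across experiments.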

Overall effect size calculation

In instances where an individual paper reported different outcomes in the same experiment (e.g., error and reaction time data) or multiple experiments (e.g., one experiment assessing musical associations and another experiment assessing temporal associations) using the same participant sample, effect sizes were aggregated to obtain one effect size per included paper. However, in cases where a paper reported multiple experiments with different participant samples, an effect size was calculated for each sample separately.
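One way to aggregate several effect sizes from the same participant sample is sketched below. The text does not say how the aggregation was computed, so the simple average, the assumed correlation `r` between outcomes, and the function name `combine_dependent_effects` are all illustrative assumptions. Setting `r = 1.0` is the conservative choice, because it gives the combined estimate the largest variance.

```python
import math
from typing import Sequence, Tuple


def combine_dependent_effects(
    effects: Sequence[float],
    variances: Sequence[float],
    r: float = 1.0,  # assumed correlation between outcomes (not from the source)
) -> Tuple[float, float]:
    """Average effect sizes from one sample; return (mean effect, its variance)."""
    m = len(effects)
    if m == 0 or m != len(variances):
        raise ValueError("effects and variances must be non-empty and equal in length")

    mean_effect = sum(effects) / m

    # Variance of a mean of correlated estimates:
    # (1/m)^2 * (sum of variances + sum over i != j of r * sd_i * sd_j)
    sds = [math.sqrt(v) for v in variances]
    total = sum(variances)
    for i in range(m):
        for j in range(m):
            if i != j:
                total += r * sds[i] * sds[j]
    return mean_effect, total / (m * m)
```

For example, reaction time and error effects of 0.4 and 0.6 with equal variances average to 0.5; with `r = 1.0` the combined variance equals the variance of a single outcome, whereas with `r = 0.0` it is halved.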

Domain-specific effect size calculation

When obtaining the effect sizes for a specific domain (e.g., temporal associations), each experiment investigating that domain was considered. That is, when an experiment used the same participants sample but measured multiple domains (e.g., temporal and musical associations), a separate effect size was calculated for each domain.

Publication bias

Publication bias was assessed with four approaches. First, the fail-safe N (Rosenthal 1979) was computed. This number indicates how many missing studies with null results would need to be included before the p value became non-significant (Rosenthal 1979; Borenstein et al. 2009). As a rule of thumb, the fail-safe N should be at least five times the number of included studies plus ten (e.g., for 20 studies, the fail-safe N should be at least 110; Rosenthal 1979). Second, a funnel plot depicting standard errors on the y axis and effect sizes on the x axis was visually inspected for the overall effect. In the absence of publication bias, the distribution of the effect sizes should be symmetric and form a funnel shape (Borenstein et al. 2009). Third, funnel plot asymmetry was statistically assessed by the intercept from the regression of the standard normal deviate against the inverse of the standard error (Egger et al. 1997). A p value smaller than .05 (one-sided) is indicative of an asymmetric funnel (i.e., publication bias). Finally, Duval and Tweedie’s trim and fill procedure (Duval and Tweedie 2000) was employed for the overall effect. The procedure imputes the missing studies based on the asymmetry of the funnel and provides an adjusted estimate of the effect size.
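The fail-safe N and its rule-of-thumb threshold can be sketched as follows. The threshold of 5k + 10 comes from the text; the formula in `fail_safe_n` is the classic Rosenthal form based on Stouffer's combined Z, and the default critical value of 1.645 (one-tailed α = .05) is an assumption, since the criterion used by the software is not stated. Both function names are illustrative.

```python
from typing import Sequence


def rosenthal_threshold(k: int) -> int:
    """Rule-of-thumb minimum fail-safe N for k included studies: 5k + 10."""
    return 5 * k + 10


def fail_safe_n(z_values: Sequence[float], z_alpha: float = 1.645) -> float:
    """Number of null studies needed to bring the combined Z down to z_alpha."""
    k = len(z_values)
    # Stouffer's combined Z is sum(Z) / sqrt(k + N); solving for N at
    # combined Z = z_alpha gives N = (sum(Z) / z_alpha)^2 - k.
    return (sum(z_values) ** 2) / (z_alpha ** 2) - k
```

With 20 included studies, `rosenthal_threshold(20)` returns 110, matching the example in the text; a computed fail-safe N below that value would leave a file drawer explanation open.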

Results

A total of 4186 publications were identified through the database search. These were imported into EndNote, where 1970 were removed as duplicates. From the remaining 2216 papers, title and abstract screening identified 79 papers for full-text screening, resulting in 48 selected papers. The reference lists from these texts were examined, using a snow-balling strategy, which helped to identify 9 more eligible papers. Thus, 57 papers were identified from the systematic review, representing 112 experiments. Details regarding the spatial association being investigated, the methods, findings, and presence of a SARC effect, were collated and are presented in Appendix 1 in supplementary material (systematic review table).

Systematic review summary

A significant SARC effect was found in 70 of the 112 experiments. The general trend across all domains was to associate the smaller magnitude (e.g., low, short, early, slow, soft, loss, etc.) with the left and the larger magnitude (e.g., high, tall, late, fast, loud, gain, etc.) with the right. Twenty-two experiments reported no significant SARC effect, while 20 experiments provided partial support. A partially supported SARC effect refers to an experiment with multiple task demands or multiple participant groups in which a significant SARC effect was found for only one of them: for example, a SARC effect for explicit instructions but not for implicit instructions (Cho et al. 2012), or a SARC effect for musicians but not for non-musicians (Lidji et al. 2007).

The magnitudes most frequently investigated with a SARC paradigm were the temporal (Ne = 60, 53.6%) and musical (Ne = 23, 20.5%) domains (see Table 2 for a summary of significance in each domain). However, while the temporal associations were significant for 66% of experiments, only 39% of musical experiments found a significant effect. The size (Ne = 7, 6.3%) and various additional domains (Ne = 22, 19.6%) all provided significant (or partial) evidence for a spatial mapping within each magnitude, except for the additional domain experiment on handwriting.

Table 2 Summary of the significant effects in each magnitude domain

Meta-analysis results

Seventeen experiments were excluded from 13 papers (Fuhrman and Boroditsky 2010; Hartmann 2016; Lega et al. 2014; Masson et al. 2016; Miles et al. 2011; Perrone et al. 2010; Schwarz and Eiselt 2009; Shaki and Gevers 2011; Stocker et al. 2016; Tonkovic 2013; Vallesi et al. 2008; Vu et al. 2014; Yang and Sun 2016) as they did not provide the required data for the meta-analysis. Hence, 95 experiments, representing 47 papers, were included in the effect size calculations.

Overall effect

The overall pooled effect size across all experiments was d = .488 (z = 11.556; p < .001), a small-to-medium effect size for the SARC. Notably, there was considerable variability in the effect sizes across the included experiments, as indicated by significant tests for heterogeneity (I² = 78.5%, Cochran’s Q = 465.7, p < .001).
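The relationship between Cochran's Q and I² can be illustrated with a short sketch. The formula is the standard one, I² = (Q − df) / Q expressed as a percentage and truncated at zero; the function name `i_squared` is illustrative. The degrees of freedom are not reported in this section, so the value of 100 used in the usage note is a hypothetical input, chosen only because it roughly reproduces the reported percentage.

```python
def i_squared(q: float, df: int) -> float:
    """I^2 in percent: share of total variability beyond sampling error."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0
```

Under that hypothetical input, `i_squared(465.7, 100)` is approximately 78.5, in line with the value reported above. A Q no larger than its degrees of freedom yields an I² of zero.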

Domain-specific effects

The effect sizes of the four investigated domains ranged from .339 (size) to .562 (temporal), with .429 (additional) and .433 (musical) in between. There was no evidence for differences in the estimated effect sizes across the domains (Q(3) = 5.29; p = .15). See Fig. 1 and Table 3 for more detailed statistics.

Fig. 1 Pooled effect sizes and 95% confidence intervals of SARC effects for overall and domain-specific categories

Table 3 Pooled SARC effect sizes for overall effects and specific domains (temporal, musical, size, and additional)

Effects for temporal subcategories

As shown in Table 4, there were medium effect sizes for all temporal subcategories (d = .620, .515, and .559 for T-span, D-time, and S-time, respectively). The effect sizes did not differ across the three subcategories \((Q(2)=.524,\; p=.769)\). The heterogeneity across the T-span experiments (27.7%) was less than half of the heterogeneity found in the S-time (76.0%) and D-time (88.6%) experiments.

Table 4 Pooled SARC effect sizes for analyses of temporal subcategories (D-time, S-time, and T-span), task demands (implicit and explicit), and expertise (musicians and non-musicians)

Task demand effects

For the task demands analysis, the effect size for implicit instructions \((d=.347)\) was significantly smaller than for explicit instructions (\(d=.611\); \(Q(1)=11.5\), \(p=.001\); see Table 4 and Fig. 2).

Fig. 2 Pooled effect sizes and 95% confidence intervals of SARC effects for temporal subcategories, task demands, and expertise

Expertise effects

The effect size for musicians \((d=.447)\) did not differ significantly from the effect size for non-musicians \((d=.424;\; Q(1)=.018,\; p=.893)\), suggesting that expertise did not influence the size of the SARC effect (see Table 4; Fig. 2).
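The between-group Q statistics reported for task demands and expertise each have one degree of freedom, so their p values can be checked with a few lines of code. The sketch below uses the standard identity that, for two groups, the between-group Q equals the squared difference of the pooled estimates divided by the sum of their variances. The subgroup standard errors are not given in this section, so any values passed to `q_between_two_groups` are hypothetical inputs; both function names are illustrative.

```python
import math


def q_between_two_groups(d1: float, se1: float, d2: float, se2: float) -> float:
    """Q statistic (df = 1) comparing two pooled subgroup estimates."""
    return (d1 - d2) ** 2 / (se1 ** 2 + se2 ** 2)


def chi2_p_value_df1(q: float) -> float:
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    # For df = 1, P(X > q) = P(|Z| > sqrt(q)) = erfc(sqrt(q / 2)).
    return math.erfc(math.sqrt(q / 2.0))
```

Applied to the reported statistics, `chi2_p_value_df1(0.018)` is about .893 and `chi2_p_value_df1(11.5)` is just under .001, consistent with the p values given for expertise and task demands.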

Publication bias

Publication bias was identified by all four approaches. As revealed by the fail-safe N (see Table 5), the number of studies needed to nullify a significant effect within the size domain and the musician expertise group does not meet the rule-of-thumb criterion to exclude the possibility of a file drawer problem (Rosenthal 1979). The remaining domains would all require more than a hundred studies (if not thousands) to induce a non-significant SARC effect. The overall effect in particular would require an additional 1565 studies.

Table 5 Publication bias analyses for overall effects and specific domains (temporal, musical, size, and additional), temporal subcategories (D-time, S-time, and T-span), task demands (implicit and explicit), and expertise (musicians and non-musicians)

A funnel plot analysis, presented only for the overall effect, also suggested publication bias. The experiments’ effect sizes were not distributed symmetrically (see Fig. 3), but were biased towards the right (SARC effect). Egger’s intercept test assessed the funnel plot statistically. A significant outcome for the overall effect confirmed the asymmetric distribution of the experiments’ effect sizes (Table 5 also presents significant p values for the majority of the domains and subcategories). Duval and Tweedie’s trim and fill procedure for the overall effect imputed 22 studies and yielded an adjusted effect size of .350 (95% CI .27–.43).
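The regression behind Egger's intercept test can be sketched in a few lines. Each study's standard normal deviate (effect size divided by its standard error) is regressed on its precision (the inverse of the standard error); an intercept far from zero signals funnel plot asymmetry. The sketch returns only the point estimates and omits the significance test of the intercept; the function name `egger_regression` and any input values are illustrative.

```python
from typing import Sequence, Tuple


def egger_regression(effects: Sequence[float],
                     std_errors: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) from regressing d/SE on 1/SE by least squares."""
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least three studies with matching standard errors")

    precision = [1.0 / se for se in std_errors]           # predictor
    snd = [d / se for d, se in zip(effects, std_errors)]  # standard normal deviate

    n = len(effects)
    mean_x = sum(precision) / n
    mean_y = sum(snd) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, snd))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

When every study estimates the same effect regardless of its precision, the intercept is zero and the slope equals that common effect. When small, imprecise studies report larger effects, the intercept moves away from zero, which is the pattern described above for the overall SARC effect.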

Fig. 3 Funnel plot illustrating publication bias for the overall effect size

Discussion

This review systematically brought together all published papers on horizontal spatial mapping experiments using non-numerical stimuli. The systematic review found a SARC effect across four domains (temporal, musical, size, and additional), with earlier/smaller/slower constructs typically being associated with the left side of space (indexed by faster reaction times) and later/larger/faster constructs being associated with the right side of space. This was further supported by the meta-analysis, in which the pooled effect size for all experiments was d = .488 (d adjusted for publication bias = .350), indicating a small-to-medium effect for spatial mappings of non-numerical stimuli. More than 20 years after the seminal SNARC paper reported the effect to be number specific (Dehaene et al. 1993), this cumulative evidence provides overwhelming support for the assertion that SARC effects are, in fact, a general property of quantity processing systems.

There was considerable variability in the reported effect sizes as indicated by a test for heterogeneity, suggesting the need to assess the impact of different domains and experimental factors. The findings of these assessments will be explored first, before a general discussion on what the similarities in the spatial mapping metrics across domains suggest in terms of the higher order processing of magnitude information.

Temporal associations

Investigations into the mapping of time on space accounted for around 50% of all identified SARC experiments and there were more than double the number of experiments included in the meta-analysis than in any other domain. The dominance of temporal experiments in this review should not surprise given the importance of time in our everyday lives. The meta-analysis yielded a medium effect size \((d=.562)\). This finding of an effect size conceptualized as being “large enough to be visible to the naked eye” (Cohen 1988) highlights how ingrained a horizontal timeline is in our thinking and how we rely on this spatial association when processing temporal information (Bonato et al. 2012; Korallo et al. 2012; Nunez and Cooperrider 2013).

Our broad operationalization of the temporal domain included very different aspects of temporal cognition. The perception of duration and the ordinal categorization of life events, for example, involve very distinct cognitive processes. Combining such different processes into one broad measure might have contributed to the high variability of effect sizes across the temporal experiments.

In light of the high variability, a series of meta-analyses on temporal subcategories were conducted. Adhering to Nunez and Cooperrider’s (2013) proposed operationalization of temporal constructs, three subcategories (D-time, S-time, and T-span) were formed. D-time is when time is conceptualized in reference to the present moment—the ‘now’. This is reflective of past/future associations. S-time is external, and is related to events beyond the present moment, e.g., earlier or later than an activity. This correlates with early/late and before/after associations. Finally, T-span refers to quantifiable durations and passages of time; therefore, it is comparable to short/long durations (see Nunez and Cooperrider (2013) for further information on the definitions of the three time categories).

The additional analyses revealed medium-sized effect sizes for all three subcategories and there was no evidence that one subcategory had stronger spatial association than the others. Of note is that the T-span experiments had considerably less variability than the D-time and S-time experiments. One possible reason for the lack of homogeneity in the D- and S-time experiments is that these categories are still too broad and more fine-grained categorisation might need to be considered for future research.

The current review based the temporal categories on Nunez and Cooperrider (2013), but it is important to acknowledge that there is no general consensus on how to operationalise time-related constructs. This issue of operationalising constructs has also been discussed by Bonato et al. (2012), in a review of associations between time and space. Characteristics from one magnitude are often evident in others and may lead researchers to categorise the associations differently. The experiments on months of the year and on before/after time associations are perfect examples of this. Experiments on months of the year were operationalized as ordinal associations by their authors (Dodd et al. 2008; Gevers et al. 2003), yet asked participants to judge whether stimuli were before or after a certain time point (month). This is nearly identical to the instruction given to participants in experiments on specific before/after temporal constructs (Previtali et al. 2010; Santiago et al. 2010).

For the experiments on ordinal associations, the researchers treated the months of the year as sequences and the differences in mappings as related to their order rather than time. However, as all magnitudes could be considered sequences, especially in number and time domains where there is a specific order to the increasing magnitude (Bonato et al. 2012), we believe that these constructs would be more appropriately categorized as temporal, in accordance with the operationalizations of this review (refer back to Table 1). In summary, the temporal domain of spatial associations provides much evidence in support of SARC mappings. However, there still appear to be discrepancies surrounding the proper format for categorising time and the overlapping commonalities between various temporal constructs.

Musical associations and expertise

There was a small-to-medium effect size for spatial musical association mappings \((d=.433)\) on a horizontal axis. Experiments exploring the spatial associations of music have thus successfully established that musical elements (e.g., pitch, tempo, and intensity) can be mapped onto space.

An interesting aspect of the horizontal mapping is that some of the musical elements are equally suited to vertical categorisations (e.g., ‘low’ and ‘high’ pitch). The horizontal mappings of these elements might be shaped by the spatial arrangement of musical instruments, i.e., piano (Chang and Cho 2015; Fischer et al. 2013; Nishimura and Yokosawa 2009). Since vertical arrangements of response buttons were not reviewed in this paper, the evidence for a horizontal mapping of elements, such as pitch, does not mean that the same elements cannot also be mapped onto vertical space. Spatial associations are generally very adaptive in nature, in that the brain can modify abstract concepts depending on the context of the situation (Fischer et al. 2013).

The musical domain provided an ideal opportunity to investigate the effects of expertise within a domain on the SARC effect. Musical expertise has been established as increasing an individual’s sensitivity to variations in pitch (Kishon-Rabin et al. 2001), a skill that should be an advantage in a pitch judgement task. However, such an advantage had no significant effect on the SARC, as the estimated effect sizes for musicians and non-musicians did not differ. The lack of evidence for an effect of expertise in the musical domain seems to mirror the varying effects of expertise reported in spatial numerical mappings. Some papers suggest that there is no difference between arithmetically skilled and unskilled participants in performance on the SNARC (Cipora and Nuerk 2013); other data indicate that mathematically proficient participants produce weaker SNARC effects than those less proficient (Hoffmann et al. 2014); and yet further results show that professional mathematicians demonstrate no SNARC effect at all in comparison to a group of matched controls (Cipora et al. 2016).

Size associations

The pooled SARC effect was small-to-medium in size \((d=.339)\) and based on a relatively small number of experiments. However, the spatial associations for size were consistent despite the use of different stimuli (Bulf et al. 2014; Ren et al. 2011; Sellaro et al. 2015; Shaki et al. 2012). Objects, both physical and conceptual, displayed the expected small/left and large/right mapping. Sizes can be sequentially ordered from smaller to larger stimuli. For example, people can be easily placed in sequential order of their height (Shaki et al. 2012). This sequential order might be implicitly mapped on space and give rise to the observed spatial associations.

Additional domains

The additional domain comprised a wide variety of stimuli, yet there was still a small-to-medium SARC effect size \((d=.429)\). The stimuli in this category included letters, words, colours, luminance, and emotional expressions. The findings in this domain highlight that spatial mappings can also occur without a quantifiable difference between stimuli and that even abstract constructs can be mapped onto the left and right sides of space. For example, a SARC effect has been shown for left and right wing political party names (van Elk et al. 2010). Stimuli such as these are clearly not being processed and spatially mapped based on quantity, but rather orders and categories. The language and abstract concept experiments are particularly intriguing, because they suggest that spatial associations extend to language processing (Hutchinson and Louwerse 2014; Tonkovic 2013; Vicario and Rumiati 2014).

Task demands

An analysis of task demands revealed significantly larger effect sizes for explicit \((d=.611)\) than implicit \((d=.347)\) structured tasks. That is, tasks requiring the processing of the magnitude associated with the stimuli (e.g., is a target beat slower or faster than a referent beat?; Prpic et al. 2013) will elicit a stronger mapping of this magnitude on space than tasks in which the magnitude of the stimuli is irrelevant for solving the task (e.g., is the target beat a metronome or a synthesized beep?; Prpic et al. 2013). While it may not be surprising that explicit processing of a dimension elicits a stronger spatial association, employing implicit task demands is still an important experimental manipulation to demonstrate the automaticity and intrinsic link between the investigated dimension and its association with space.

General discussion

The meta-analysis provides substantial evidence that spatial associations of response codes go beyond numerical stimuli. However, the comparison of the SARC overall effect size (\(d=.488\); adjusted \(d=.350\)) with the reported effect size for SNARC (\(d=.99\); Wood et al. 2008) clearly suggests that the spatial associations for numbers are considerably stronger than for other domains.

The strength of spatial associations was not modulated by the stimuli categories operationalized in this review. The effect sizes for temporal, musical, size, and the additional domains did not differ significantly from each other. Given that most of us encounter links between space and time more often in everyday life (Bonato et al. 2012) than links between space and abstract concepts (such as political parties), it may be surprising that there was no statistical evidence for a difference in the strength of temporal and additional domain associations. It is worth noting, however, that the effect size estimate for the temporal domain was numerically the largest of the four domains (see Fig. 1). The standard error of all estimates was relatively large due to the considerable variability of effect sizes within a specific domain. It therefore cannot be ruled out that a different and more fine-grained operationalization of the domains could yield statistical evidence for differences in the strength of associations.

Regardless of the question relating to operationalising stimuli, the results from the systematic review and meta-analysis illustrate robust spatial associations in non-numerical domains. Categorisations of magnitudes, regardless of stimuli type, tie in nicely with the idea of a general magnitude processing system (Cantlon et al. 2009; Cohen Kadosh et al. 2008; Fias et al. 2003; Pinel et al. 2004; Walsh 2003)—a system which is involved in the processing of any domain with a difference in magnitude (e.g., small/large, low/high, short/long, slow/fast, etc.). As such, there are limitless opportunities for research in this field; potentially exploring spatial associations for low and high risks, weak and strong smells, low and high temperatures, soft and hard textures, and more.

A prominent generalized magnitude system has already been proposed: A Theory of Magnitude [ATOM] (Walsh 2003). ATOM theorises that information on space, time, and quantity likely share a common spatial processing mechanism within the brain due to similarities in their mapping metrics (Bueti and Walsh 2009; Walsh 2003). Regions within the parietal cortex have been identified in SARC tasks, just as in SNARC tasks as a possible centre for this mechanism (Cutini et al. 2014; Dehaene et al. 2003; Gevers and Lammertyn 2005; Pariyadath et al. 2012; Viarouge et al. 2014; Walsh 2003). In the strict sense, ATOM only accounts for magnitudes associated with motor action, such as size, speed, and time (Walsh 2003, 2015). An interesting question is whether ATOM can or should be expanded to include other magnitudes and dimensions, such as the abstract and metaphorical representations included in this meta-analysis (see also Walsh 2015).

Computational and neurobiological modelling approaches may help to identify common and distinct principles of magnitude processing across different domains. Neurobiological evidence from other cognitive processes such as statistical learning (Frost et al. 2015) suggests that both domain-specific and non-specific processing, even at high levels of the hierarchy, underlie the processing of magnitude. It may be that domain-specific processing plays a larger role for domains such as size and music, as compared to more tightly spatially mapped domains such as numbers and time.

Generalised magnitude processing systems do not account for why magnitudes are mapped onto horizontal space. This review cannot answer whether the spatial mapping is related to stimulus–response compatibility between mental representations of magnitudes and response codes, or whether the mapping is independent of a mental magnitude line (cf. Gevers et al. 2006; Proctor and Cho 2006). What the review can say, however, is that the direction of spatial mappings is likely to be modulated by cultural influences.

Left-to-right and right-to-left readers typically show opposite associations between space and magnitudes. While the meta-analysis did not collate the data from experiments involving right-to-left readers, seven experiments (Fuhrman and Boroditsky 2010; Ouellet et al. 2010; Shaki and Gevers 2011; Vallesi et al. 2014) did involve participants who grew up predominantly reading from right to left (see Appendix 1 in supplementary material). Two temporal experiments were unable to substantiate a mapping (in either direction) in right-to-left reading Israeli participants (Ouellet et al. 2010; Vallesi et al. 2014), while a significant reverse mapping was present in another series of temporal tasks (Fuhrman and Boroditsky 2010) and in a task from an additional domain, letters (Shaki and Gevers 2011). These findings corroborate the original findings by Dehaene et al. (1993, Experiment 7), who reported a reverse SNARC mapping (right-to-left) for a group of Iranian participants. Importantly, each of these experiments also included left-to-right readers demonstrating significant left-to-right SARC effects. Overall, there thus seems to be strong evidence that the association between magnitude and space is biased by a person’s habitual reading and writing direction.

Publication bias was assessed with four different approaches. The approaches, particularly Egger’s test of the intercept (Egger et al. 1997), strongly suggest the presence of publication bias in the SARC literature. Assessing the impact of this bias is challenging, as the common bias detection tests do not work well with large between-study heterogeneity such as that observed in our review. At least for the overall SARC effect size, we suspect that the impact is probably modest, meaning that the true effect size is very likely smaller than reported here but would still be significant if all relevant studies were published (Borenstein et al. 2009). First, Rosenthal’s fail-safe number suggests that 1565 unpublished studies with null findings would be needed to change the SARC effect from significant to non-significant. This large number of studies suggests that the results are unlikely to be explained only by a file drawer problem (Rosenthal and Rosnow 2008). Second, Duval and Tweedie’s trim and fill procedure (Duval and Tweedie 2000) yielded an adjusted effect size of \(.350\) (\(95\%\) CI \(.27 - .43\)), which is around 28% lower than the unadjusted effect size of \(.488\); however, the confidence interval of the adjusted effect size does not cross 0 and the SARC effect is still evident. Thus, while there is evidence for publication bias, we are confident that the significance of the overall SARC effect is not merely the result of publication bias in the field.
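
As a check on the figures above, the relative reduction implied by the trim and fill adjustment can be worked out directly from the two reported effect sizes. The labels \(ES_{\text{unadj}}\) and \(ES_{\text{adj}}\) are introduced here purely for illustration and are not notation used in the analysis itself:

\[
\frac{ES_{\text{unadj}} - ES_{\text{adj}}}{ES_{\text{unadj}}} = \frac{.488 - .350}{.488} = \frac{.138}{.488} \approx .283,
\]

that is, a reduction of roughly 28%. For orientation, Rosenthal’s fail-safe number is conventionally written as

\[
N_{fs} = \frac{\left(\sum_{i=1}^{k} Z_i\right)^{2}}{z_{\alpha}^{2}} - k,
\]

where \(k\) is the number of included effects, \(Z_i\) is the standard normal deviate of effect \(i\), and \(z_{\alpha}\) is the one-tailed critical value (\(1.645\) for \(\alpha = .05\)). This is the textbook form of the formula; it is assumed, not confirmed, that it corresponds to the exact computation behind the value of 1565 reported above, and the symbols \(k\), \(Z_i\), and \(z_{\alpha}\) are likewise illustrative.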

In summary, this systematic review and meta-analysis has for the first time brought together magnitude associations from a variety of domains to consolidate the literature on SARC mappings in non-numerical stimuli. Substantial evidence is provided that spatial associations of response codes are a general principle in non-numerical domains, although these associations seem to be weaker than for the numerical domain. The review also investigated the influence of selected factors on the strength of the spatial associations. Clear evidence was found that tasks relying on magnitude activation (explicit tasks) lead to stronger spatial mappings than tasks in which the magnitude of the stimuli is irrelevant (implicit tasks). Interestingly, there was no evidence that expertise affects the strength of spatial mappings, as the mapping of musical elements onto space did not differ between musicians and non-musicians.

The investigations into the effects of task demands and expertise are a first step towards better understanding the strength of spatial associations. The high variability in the reported effect sizes across the different domains suggests that many more factors affect the spatial association than we currently understand. For example, one factor not considered in the current meta-analysis is the effect of cognitive load and working memory demands on the SARC. The standard number-based SNARC effect is not evident when cognitive load increases (Herrera et al. 2008; van Dijck et al. 2009), for example, when participants are required to memorise information in addition to completing a SNARC task (Herrera et al. 2008). The cognitive load associated with making judgements may vary across domains and is thus one of the factors that needs to be investigated in the future. Further research in the SARC field should investigate new domains and corroborate current ones but, most importantly, isolate the factors that modulate the strength of the spatial associations.