Introduction

Visual short-term memory (VSTM) pieces together the visual world across interruptions, generating a continuous representation from a discontinuous world. These interruptions occur via eye movements, occlusions, and distraction to form disconnected temporal segments of visual input (Irwin, 1991). Our subjective perceptual experience is a result of VSTM piecing together these segments into meaningful representations that support many cognitive and motor processes.

A common assumption is that visual memory utilizes similar cognitive and neural processes as visual perception and attention. Indeed, the same representations have been suggested to support visual perception and VSTM (Cowan, 1999; Jonides et al., 2005; Souza & Oberauer, 2016; Theeuwes et al., 2009). In particular, the link between visual attention and memory processes is strong, with some suggesting that VSTM employs visual perception and attention to maintain information across sensory lapses (Postle, 2006) while other studies found evidence for a dissociation between attention and memory processes (Hakim et al., 2019; Sheremata et al., 2018).

One of the fundamental properties of visual perception is that it represents information in retinotopic coordinates (Golomb & Kanwisher, 2012; Inouye, 1909). In other words, perception of objects occurs relative to the location of the retina onto which it is projected. Because of its interactions with visual perception, VSTM might be assumed to maintain information retinotopically. However, many cognitive processes utilize memory representations and encode information in other coordinate systems, such as body-centered or world-centered coordinates—here, collectively referred to as spatiotopic coordinates (Burgess, 2006; Culham et al., 2008). Recent studies linking VSTM to long-term memory suggest that representations may be more similar to those found in long-term memory (Beck & van Lamsweerde, 2011; Xie & Zhang, 2017). While there is some debate as to the coordinate systems used to encode long-term memory representations, it is clear that retrieving memories and acting upon objects in novel contexts and spatial locations requires the ability to represent information in a viewpoint-independent form. Together these studies suggest that VSTM may transform visual information into spatiotopic coordinates to interact with higher order cognitive and motor processes.

However, many studies investigating the spatial coordinates of attended visual representations suggest that attention modulates visual representations in a retinotopic manner (Awh et al., 2005; Golomb et al., 2008; Jiang & Swallow, 2013; McKyton & Zohary, 2008). Retinotopic and spatiotopic coordinates have been teased apart using eye movements that render attended stimuli in the same retinotopic or spatiotopic locations (Awh et al., 2005; Golomb et al., 2008). In these experiments, two complementary components of visual attention, target enhancement (Golomb et al., 2008) and spatially determined distractor probability (Awh et al., 2005), were shown to be retinotopically organized. Spatially specific attention training effects have also been shown to be retinotopic or viewer centered (Jiang & Swallow, 2013; McKyton & Zohary, 2008). When participants were trained to attend to stimuli within a visual quadrant that moved to the same retinotopic (McKyton & Zohary, 2008) or viewer-centered coordinates (Jiang & Swallow, 2013), training effects occurred only when the stimuli remained in the same location relative to the observer. Furthermore, training effects were shown to be independent of eye movements. Therefore, a preponderance of evidence highlights a retinotopic coordinate system for attentional allocation. Because of the close link between attention and VSTM, it is likely that memory representations are maintained in a retinotopic coordinate frame.

An elegant way to probe the coordinate system of items stored in VSTM is to investigate naturally occurring spatial biases. VSTM performance for single-feature items is better in the left visual field (Carlei & Kerzel, 2014; Sander et al., 2019; Sheremata & Shomstein, 2014, 2017). These asymmetries are modulated by top-down expectations, as expected task demands modulate visual field biases (Sheremata & Shomstein, 2017). Therefore, asymmetries in memory performance are flexible and reflect modulation of the representation rather than an inflexible bias based upon the spatial location of the stimulus.

While behavioral benefits for remembered items presented in the left visual field have been consistently demonstrated, it is not clear whether the coordinate of ‘left’ refers to spatial locations in the external world or location relative to the retina. When stimuli are presented to the left of the computer monitor (spatiotopic coordinates) and participants fixate at the center of the monitor, the stimulus is similarly projected to the left of the eye (retinotopic coordinates), rendering it impossible to disentangle the two coordinate systems. However, changing the location where participants fixate can tease apart these coordinate systems, thus revealing the nature of spatial representation in VSTM. With a change in eye position, an object changes its location on the retina but remains in the same location in all other coordinate frames. If attentional biases in retinotopic coordinates are read out to VSTM representations, then visual field biases should change with changes in eye position. However, if memory encoding transforms the representation into a spatiotopic coordinate system, then visual field biases should remain consistent independent of eye position.

Here, visual field biases were measured to determine the coordinate frame underlying VSTM biases. In a set of two experiments we manipulated the location of the retina onto which stimuli were projected while independently manipulating spatiotopic locations. Our findings demonstrate that visual field biases changed with fixation location. These results reveal that the spatial biases seen in VSTM occur in a retinotopic coordinate system, consistent with visual perception and attention biases.

Experiment 1

Methods

Participants

Twenty-four (11 male, mean age 24.8 ± 5.9 years) right-handed participants were recruited from The George Washington University community, all with normal or corrected-to-normal vision. The sample size was chosen based upon previous studies of visual field asymmetries in short-term memory and taking into account the need to counterbalance a Latin-square design using a multiple of four participants: 2 (Fixation Locations) × 2 (Stimulus Locations). Our critical effect in this experiment was a stimulus position by eye position interaction. Sheremata and Shomstein (2014) in Experiment 1 demonstrated a visual hemifield bias for color (d = 0.69, R pwr.t.test). Calculating the required sample size from this analysis, it was estimated that 19 participants would be needed to find a similar effect size with 80% power. Taking power estimates and counterbalancing requirements into account, 24 participants were recruited. Two participants were excluded, either for having greater than 20% of trials rejected due to eye movements (1) or for having a memory capacity under two items (1). All of the experimental procedures were approved by the Institutional Review Board of The George Washington University, and all participants gave informed consent.
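The sample-size estimate above can be roughly checked without R. The sketch below is not the pwr.t.test routine named in the text; it is a closed-form normal approximation with Guenther's small-sample correction, and it assumes a paired (within-participant) comparison, a two-sided test, and alpha = .05, none of which are stated explicitly in the source. The function name approx_paired_n is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def approx_paired_n(d, alpha=0.05, power=0.80, two_sided=True):
    """Approximate number of participants for a paired t-test.

    Uses the normal approximation plus Guenther's correction term
    (z_alpha^2 / 2). This is an approximation, not an exact
    noncentral-t calculation.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_power = z(power)
    n = ((z_alpha + z_power) / d) ** 2 + z_alpha ** 2 / 2
    return ceil(n)


# Effect size taken from the text (d = 0.69); alpha and test type are assumed.
n_required = approx_paired_n(0.69)
print(n_required)  # 19
```

Under these assumptions the approximation agrees with the 19 participants reported above; for other effect sizes or one-sided tests it may differ from an exact calculation by a participant or so.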

Stimuli were presented on a 21-in. ViewSonic G225f CRT monitor (ViewSonic, London, UK) positioned 90 cm from participants (25.5° × 19.1°) with a 140-Hz refresh rate. Participants sat with their head in a chin rest and made responses using a button box. Eye movements were recorded with an SR Research EyeLink 1000 (SR Research; Mississauga, Ontario, Canada), sampling monocularly at 500 Hz.

Participants performed a change-detection task in which colored squares were presented against a mean gray luminance background. Maximally discriminable, common colors (dark blue, orange red, green, yellow, purple, plum, and maroon) were pseudorandomly chosen without repeat (Fig. 1).

Fig. 1

Stimuli and visual short-term memory (VSTM) trial structure for Experiments 1 and 2. a In Experiment 1, stimuli were presented to the left and right of the monitor while participants fixated a central fixation cross or a peripheral fixation cross. In peripheral fixation blocks, participants always fixated the cross on the same side as, but more peripheral to, the stimuli, thereby reversing the location in spatiotopic and retinotopic space. b In Experiment 2, stimuli were always presented at the center of the screen, maintaining their location in spatiotopic coordinates. When participants fixated to the right, stimuli were projected onto the left side of the retina, and when participants fixated to the left, stimuli were projected onto the right side of the retina. (Color figure online)

Four colored squares were presented in a square configuration with each square subtending 0.8° of visual angle along each edge. Each stimulus configuration was located approximately 4.7° from the center in the horizontal dimension, with each square offset 1.4° in both the horizontal and vertical directions. Fixation and stimulus location were presented in a blocked design, counterbalanced across participants. Sixteen blocks were presented with 10 trials/block for a total of 160 trials. In half of the blocks, items were presented left of the center of the screen, and in the other half of the blocks items were presented right of the center of the screen, with visual field order counterbalanced across participants. In the central fixation condition, participants maintained fixation at the center of the screen, while in the peripheral fixation condition, participants maintained fixation on the same side as, but 4.7° more peripheral than, the stimuli. This resulted in the stimuli being projected to the opposite location of the eye (retinotopic location) as compared with the screen location (spatiotopic location).
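To make the stimulus geometry concrete, the visual angles above can be converted to physical sizes on the screen. The following is a minimal sketch using only the 90-cm viewing distance reported in the Methods; the function names are hypothetical, and a flat screen perpendicular to the line of sight is assumed.

```python
from math import atan, degrees, radians, tan

VIEWING_DISTANCE_CM = 90.0  # viewing distance reported in the Methods


def extent_deg_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Physical size (cm) of an extent centered on the line of sight."""
    return 2 * distance_cm * tan(radians(angle_deg) / 2)


def extent_cm_to_deg(size_cm, distance_cm=VIEWING_DISTANCE_CM):
    """Inverse of extent_deg_to_cm."""
    return degrees(2 * atan(size_cm / (2 * distance_cm)))


def eccentricity_deg_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Distance (cm) from the fixated point to a point at this eccentricity."""
    return distance_cm * tan(radians(angle_deg))


# Values from the text: 0.8 deg squares, 4.7 deg horizontal offset,
# and a 25.5 deg x 19.1 deg display.
print(round(extent_deg_to_cm(0.8), 2))        # edge of one square, in cm
print(round(eccentricity_deg_to_cm(4.7), 2))  # array offset from center, in cm
print(round(extent_deg_to_cm(25.5), 1), round(extent_deg_to_cm(19.1), 1))
```

Under these assumptions each square is a little over 1 cm wide and the display is roughly 41 cm × 30 cm, which is plausible for the viewable area of a 21-in. CRT.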

Stimuli were presented for 500 ms, followed by a 1,000-ms memory-delay period (Fig. 1). After the memory delay, the items were again presented. In half of the trials, one of the items changed in color and participants responded to indicate whether all items remained the same or if there was a change. Trials in which participants’ eye position deviated from fixation by more than 1.56° of visual angle were aborted and not repeated (average across participants = 10.7% in Experiment 1 and 9.5% in Experiment 2). Visual feedback was given after each trial to indicate whether the participant answered correctly. Fixation location (central/peripheral) and visual field location (left/right hemifield) order were counterbalanced across participants.
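The fixation criterion can be illustrated with a short sketch. It assumes gaze samples are already expressed in degrees of visual angle and that deviation is measured as Euclidean distance from the fixation cross; the source does not specify either detail, and the function names are hypothetical.

```python
from math import hypot

MAX_DEVIATION_DEG = 1.56  # fixation window reported in the Methods


def trial_aborted(gaze_samples, fixation, max_dev=MAX_DEVIATION_DEG):
    """Return True if any gaze sample lies farther than max_dev from fixation.

    gaze_samples: iterable of (x, y) positions in degrees (assumed units).
    fixation: (x, y) position of the fixation cross in degrees.
    """
    fx, fy = fixation
    return any(hypot(x - fx, y - fy) > max_dev for x, y in gaze_samples)


def rejection_rate(aborted_flags):
    """Proportion of trials aborted for one participant."""
    return sum(aborted_flags) / len(aborted_flags)


# Hypothetical gaze traces, for illustration only.
steady = [(0.1, 0.2), (1.0, -0.5)]
drifting = [(0.0, 0.0), (1.2, 1.2)]
flags = [trial_aborted(t, (0.0, 0.0)) for t in (steady, drifting)]
print(flags, rejection_rate(flags))
```

A participant whose rejection rate exceeded .20 would meet the exclusion criterion described in the Participants section.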

Results

The central question was whether visual field asymmetries inherent in VSTM could reveal whether visual field representations are maintained in a retinotopic or spatiotopic coordinate system. To directly test whether there was a significant effect of retinotopic stimulus location, we recoded trials based upon the location relative to the retina. Therefore, a trial was considered retinotopic left when the stimulus was presented to the left and participants fixated at the central location or when the stimulus was presented to the right of the screen and participants fixated at the right peripheral location. We conducted an analysis of variance (ANOVA), with accuracy (percentage correct) as the dependent measure and retinotopic stimulus location and fixation location (central or peripheral) as factors. There was a significant main effect of retinotopic stimulus location, F(1, 21) = 4.96, p = .037, ηp2 = .19, but no significant effect of fixation position, F(1, 21) = 0.59, p = .45, ηp2 = .03, or interaction between retinotopic stimulus location and fixation location, F(1, 21) = 1.22, p = .28, ηp2 = .05. Therefore, hemifield biases were present for retinotopic visual field locations, independent of fixation location.
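The recoding rule for Experiment 1 can be written out explicitly. This is a minimal sketch of the mapping described above, relying on the design feature that the peripheral fixation cross was always on the same side as, but more peripheral than, the stimuli; the function name and string labels are hypothetical.

```python
def retinotopic_side(stimulus_side, fixation):
    """Side of fixation on which the stimulus array falls in Experiment 1.

    stimulus_side: 'left' or 'right' of screen center (spatiotopic location).
    fixation: 'central' or 'peripheral'.
    """
    if stimulus_side not in ("left", "right"):
        raise ValueError(f"unknown stimulus side: {stimulus_side!r}")
    if fixation == "central":
        # Screen side and retinotopic side coincide.
        return stimulus_side
    if fixation == "peripheral":
        # Fixation lies beyond the stimuli on the same side, so the
        # stimuli fall on the opposite side of fixation.
        return "right" if stimulus_side == "left" else "left"
    raise ValueError(f"unknown fixation condition: {fixation!r}")


for side in ("left", "right"):
    for fix in ("central", "peripheral"):
        print(side, fix, "->", retinotopic_side(side, fix))
```

For example, stimuli on the right of the screen with right peripheral fixation are coded as retinotopic left, matching the definition in the text.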

To confirm that accuracy was greater in the left visual field, we collapsed across trials regardless of fixation condition (Fig. 2). Accuracy was higher for stimuli presented in retinotopic left as compared with retinotopic right positions across conditions (retinotopic left > retinotopic right), M = 89.3%, SD = 5.9% vs. M = 86.7%, SD = 6.1%, t(21) = 2.27, p = .034, d = 0.44.

Fig. 2

Results for Experiment 1. Visual field biases were reversed with changes in retinotopic location, resulting in better VSTM performance for stimuli presented in left retinotopic space. Error bars reflect standard error of the mean difference. Asterisks indicate two-sample t-tests with greater performance in the peripheral fixation condition when stimuli were presented left vs. right of fixation (p = .022), and an interaction between stimulus and fixation positions (p = .034) tested by repeated-measures ANOVA. (Color figure online)

To confirm that the spatiotopic location of stimuli did not contribute to visual field biases, we then conducted an ANOVA with accuracy (percentage correct) as the dependent measure and fixation location (central or peripheral) and spatiotopic stimulus location (stimuli on the left or right on the screen) as factors. If stimuli are represented in spatiotopic coordinates, then the location of the stimuli on screen should cause a spatial bias regardless of fixation location. However, there was no main effect of spatiotopic stimulus location, F(1, 21) = 0.593, p = .450, d = .12.

Planned comparisons between the left and right stimulus position for each of the fixation conditions, however, demonstrated a difference between the peripheral and central fixation conditions. During the peripheral fixation condition, accuracy was higher when stimuli were presented to the right of the screen than the left (retinotopic left > retinotopic right), M = 89.2%, SD = 7.4% vs. M = 85.6%, SD = 7.8%, t(21) = 2.475, p = .022, d = 0.47. During central fixation, there was no significant difference between performance for stimuli in the left as compared with the right visual field, M = 89.5%, SD = 6.4% vs. M = 87.8%, SD = 6.3%, t(21) = 1.145, p = .266, d = 0.27, though there was a bias in the same direction as in the peripheral fixation condition. These results reveal that left visual field biases for single-feature items occur in retinotopic coordinates, linking behavioral asymmetries to retinotopic properties in the brain.
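The reported effect sizes can be approximately reproduced from the summary statistics alone. The sketch below assumes that d was computed as the mean difference divided by the root mean square of the two condition standard deviations; this is an inference from the reported numbers, not a formula stated in the source, and the function name is hypothetical.

```python
from math import sqrt


def cohens_d_from_summary(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using the root mean square of the two SDs as the scale.

    Assumed formula; the source does not state how d was computed.
    """
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Means and SDs (percentage correct) taken from the planned comparisons.
d_peripheral = cohens_d_from_summary(89.2, 7.4, 85.6, 7.8)
d_central = cohens_d_from_summary(89.5, 6.4, 87.8, 6.3)
print(round(d_peripheral, 2), round(d_central, 2))  # 0.47 0.27
```

Because the inputs are themselves rounded, values recomputed this way can differ from the reported d in the second decimal place.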

Experiment 2

In Experiment 1, performance was better overall when stimuli were projected onto the left side of fixation regardless of location on the screen. There was a significant difference between visual field locations when participants fixated peripheral locations, but the difference failed to reach significance when a central location was fixated. This could be due to the fact that fixating the peripheral locations increased cognitive demands required by the participants. Alternatively, the difference in conditions could possibly reflect a right spatiotopic rather than a left retinotopic bias. Without a significant bias in the central fixation condition, it is impossible to rule out this possibility. Therefore in order to determine whether task difficulty or competing coordinates could account for this visual field bias, we conducted a second experiment. To confirm that the difference reflects biases in retinotopic representations, here stimuli were always presented at the same visual field location and what changed spatial coding was where the participant was instructed to fixate (either left or right of the stimulus). If the visual field biases observed in Experiment 1 were due to a right spatiotopic bias, there should not be an effect when stimuli are presented at the same spatiotopic location across conditions.

Methods

Thirty-six (14 male, mean age 19.9 +/- 3.8 years) right-handed participants were recruited for Experiment 2 from The George Washington University community. The sample size was chosen based on an effect size analysis comparing performance for the peripheral fixation condition in Experiment 1 (d = 0.47, R pwr.t.test). Because we hypothesized better performance for items in the left visual field, we used a one-sided (greater) power analysis which suggested that 29 participants would be needed to demonstrate a significant effect. Using the exclusion criteria from Experiment 1, a larger sample size was needed due to a greater number of participants unable to maintain fixation. This was likely due to longer blocks of solely peripheral fixation in Experiment 2 as compared with Experiment 1. Seven participants were excluded due to excessive eye movements (six participants, >20% trials aborted due to eye movements) or failing to remember at least two items (one participant).

The paradigm for Experiment 2 was the same as Experiment 1 except for the following. Stimuli were always centered at the middle of the screen (Fig. 1b). In each block (12 blocks, 20 trials/block), participants were instructed to fixate a cross presented 4.7° left or right of the center of the screen. Fixation location varied by block and the block order was counterbalanced across participants. Each participant was presented with set sizes of 4 and 5 squares, and each participant completed the set size 4 blocks before the set size 5 blocks. Stimuli were offset from the center of the stimulus array by 1.7° visual angle.

Results

An ANOVA was performed, with accuracy (percentage correct) as the dependent measure and fixation location and set size as factors. There was a significant effect of fixation location, F(1, 28) = 6.95, p = .014, ηp2 = .20 (Fig. 3), indicating that even though the location of the stimuli on the screen was the same across conditions, the location on the retina resulted in visual field biases. Consistent with left retinotopic biases from Experiment 1, performance was better across set sizes when participants fixated to the right of the stimuli as compared with the left of the stimuli (84.5% vs. 82.6%). A main effect of set size, F(1, 28) = 12.05, p = .0017, ηp2 = .30, indicated better accuracy for detecting changes at a set size of 4 items compared with a set size of 5 items.
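For readers checking the effect sizes, partial eta squared can be recovered from an F statistic and its degrees of freedom. The sketch below uses the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error); the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom taken from the two main effects above.
print(round(partial_eta_squared(6.95, 1, 28), 2))   # fixation location
print(round(partial_eta_squared(12.05, 1, 28), 2))  # set size
```

Both main effects reported above are consistent with this identity to two decimal places.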

Fig. 3

Results for Experiment 2. Visual field biases occurred even when stimuli were presented at the same spatiotopic location and were more robust at the higher set size. Error bars reflect standard error of the mean difference. Asterisks indicate two-sample t-tests with greater performance in the set size 5 condition (p = .013), and an interaction between fixation position and set size (p = .037) tested by repeated-measures ANOVA. (Color figure online)

A significant interaction between fixation location and set size, F(1, 28) = 3.51, p = .037, ηp2 = .15, supported our hypothesis that visual field biases are dependent upon task difficulty. There was no significant difference for Set Size 4, left, M = 85.3%, SD = 8.1%, right, M = 85.1%, SD = 7.0%, t(28) = 0.272, p = .787, d = 0.03, but a significant difference at Set Size 5, left, M = 83.6%, SD = 6.2%, right, M = 80.1%, SD = 7.8%, t(28) = 2.648, p = .013, d = 0.50. These results confirm greater performance for left-retinotopic coordinates found in Experiment 1 and further corroborate our findings that visual field biases can be found in retinotopic coordinates without any change in spatiotopic location.

Discussion

Our findings strongly support the notion that visual field biases during visual short-term memory (VSTM) are coded in retinotopic rather than spatiotopic space. In Experiment 1, visual field biases, a marker of retinotopic coding, changed with eye position, demonstrating that memory performance for stimuli in the same location on the screen could differ based upon location where it falls on the retina. In Experiment 2, stimuli were always presented at the same location while eye position and the number of items were varied. These results ruled out any alternative explanations that visual field biases might have been coded in spatiotopic coordinates. Importantly, across both studies we observed systematic asymmetries for stimuli projected onto the left side of fixation regardless of location on the screen. Furthermore, these findings bolster previous observations that visual field asymmetries (Sheremata & Shomstein, 2014) and asymmetries in the brain (Sheremata et al., 2010; Sheremata & Silver, 2015) emerge when task demands require greater allocation of resources to the stimuli.

The current results also reflect visual field biases predicted by asymmetric processing in the brain. Visual field asymmetries have been argued to reflect right hemisphere dominance during visuospatial processing (e.g., Bowers & Heilman, 1980). Asymmetric processing has been documented in cortical regions associated with visual attention (Schotten et al., 2011; Sheremata et al., 2018; Szczepanski et al., 2010) and encoding and storage (Sander et al., 2019; Sheremata et al., 2010; Sheremata & Silver, 2015). Importantly, activity in these brain regions has been shown to be retinotopic (Golomb & Kanwisher, 2012), thereby furthering the relationship between visual field biases in VSTM and asymmetries in the brain. Future studies may further be able to determine whether these biases reflect attention processes during the perception and encoding of memory items, storage of remembered items, or both.

We suggest that stronger visual field biases in Experiment 2 as compared with Experiment 1 are due to greater task demands imposed by a larger set size. A central debate in the VSTM literature concerns whether behavioral performance reflects participants remembering a fixed number of discrete items or deploying resources across the memory items. We have previously found that the optimal set size for demonstrating visual field biases is 1 greater than measured maximum capacity, or K + 1 (Sheremata & Shomstein, 2014). Our reasoning is that at set sizes at or below K, performance approaches ceiling. At larger set sizes, performance may reflect differences in performance or strategy, for instance, if a participant selects only a subset of items presented. In Experiment 2, increasing set size resulted in both significantly poorer accuracy and significantly higher measured maximum capacity. The purpose of this experiment was not to tease apart these theories. Instead, we argue that our results are consistent with either interpretation.
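For reference, capacity K in change-detection tasks is commonly estimated from hit and false-alarm rates. The source does not state which estimator was used, so the sketch below shows two widely used formulas, Cowan's K and Pashler's K, purely as an illustration; the function names are hypothetical and the rates in the example are made up.

```python
def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Cowan's K = N * (H - FA), typically used with single-probe displays."""
    return set_size * (hit_rate - false_alarm_rate)


def pashler_k(set_size, hit_rate, false_alarm_rate):
    """Pashler's K = N * (H - FA) / (1 - FA), typically used with whole-display probes."""
    return set_size * (hit_rate - false_alarm_rate) / (1 - false_alarm_rate)


# Hypothetical rates, not data from the experiments.
for n in (4, 5):
    print(n, round(cowan_k(n, 0.9, 0.1), 2), round(pashler_k(n, 0.9, 0.1), 2))
```

Either estimate is bounded by the set size, which is one reason measured capacity can rise when the set size increases from 4 to 5.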

Previous studies have demonstrated that behavioral asymmetries reflect cognitive functions served by the parietal cortex such as object complexity (Sheremata & Shomstein, 2014) and task set (Sheremata & Shomstein, 2017), and have tied visual field asymmetries during VSTM to asymmetric representation of objects in the brain (Sander et al., 2019; Sheremata et al., 2010; Sheremata & Silver, 2015). Higher order processing occurs in a retinotopic coordinate frame (Golomb & Kanwisher, 2012), supporting the hypothesis that visual field spatial biases result from hemispheric biases in representations in the brain. One area in particular that may be involved in memory processes directly linked to behavior is the parietal cortex, which shows hemispheric asymmetries; within it, the intraparietal sulcus plays an essential role in supporting VSTM representations beyond the attention demands inherent in memory tasks (Sheremata et al., 2018).

While there is some evidence for greater precision of spatial working memory in retinotopic as compared with spatiotopic coordinates (Golomb & Kanwisher, 2012), other studies have found better memory for motion in spatiotopic coordinates (Ong et al., 2009). Importantly, both of these studies required participants to make eye movements during the memory delay, suggesting that differences in coordinate systems may be tied to demands such as updating memory representations across eye movements. It has been shown that increasing the number of eye movements required between encoding and retrieval can modify the coordinate system for long-term memory (Zhang et al., 2013). It is therefore possible that changes in coordinate systems may also be seen with increased eye-movement demands during short-term memory. For this reason, we suggest that investigating inherent spatial biases in VSTM performance reveals the intrinsic coordinate system for VSTM more directly than requiring eye movements during the memory delay. However, to create ecologically valid measures of short-term memory demands, future studies should investigate the contribution of eye and body movements during short-term memory performance.

In contrast to retinotopic organization of visual information, the coordinate system(s) underlying long-term memory representations are less clear. It has been suggested that objects are stored in memory relative to both external objects (allocentric) as well as to the self (egocentric) (Burgess, 2006). Indeed, there has even been evidence for retinotopic storage of remembered items (Slotnick, 2009). While there is debate as to how long-term memories are stored, the ability to retrieve memories in novel contexts and spatial locations requires the ability to store information in a viewpoint-independent form.

It is not clear how these asymmetries arise in behavior. One current line of research suggests that these asymmetries are tied to reading direction, as asymmetries are not seen in cultures that read right to left (Ransley et al., 2018). However, tellingly, these studies do not find a reversal of asymmetries, suggesting that reading direction and spatial biases may be separate, competing factors. Alternatively, biases may be specific to behavior with no direct link to brain activity. While behavioral asymmetries differ between right- and left-handed individuals, and asymmetries in the brain also differentiate individuals based upon handedness, no specific relationship between behavioral and brain asymmetries has emerged. Future neuroimaging studies utilizing well-controlled behavioral tasks may reveal a distinct relationship between asymmetries in brain and behavior.

Finally, these results highlight the need for models of VSTM to account for asymmetries inherent in behavior. One of the guiding tenets of cognitive neuroscience research is that patterns of behavioral performance reflect the processing properties of cortical areas supporting cognitive functions. Recent studies diverge on whether there is a single locus of short-term memory storage and, if so, whether it is the same area that supports perceptual representations of the same objects, typically visual cortex. However, as the occipital cortex does not demonstrate asymmetric processing, memory models must account for how higher order cortex exerts its influence on VSTM representations.