Attention plays a pivotal role in working memory (WM). One of the core functions of attention is that of a “gatekeeper” (Awh, Vogel, & Oh, 2006). Selective attention controls which information gains access to WM, filtering out irrelevant inputs and thereby allowing processing capacity to be allocated efficiently (Gazzaley, 2011; Gazzaley & Nobre, 2012; Schmidt, Vogel, Woodman, & Luck, 2002). Attentional influences that arise after perceptual processing is complete have also been found in several WM tasks, indicating that attention can be used during maintenance to prioritize individual representations in WM over others (Gazzaley & Nobre, 2012; Oberauer & Hein, 2012; Olivers, 2008; Souza & Oberauer, 2016). Accordingly, many researchers have proposed that attention protects or strengthens information in WM during the maintenance period (Awh et al., 2006; Chun, 2011; Cowan, 1999; Engle, 2002; Kiyonaga & Egner, 2012; Olivers, 2008; Theeuwes, Belopolsky, & Olivers, 2009). Here, we are concerned with the role of attention during the maintenance of visual information in WM. We ask whether directing attention to individual representations in WM improves their quality or the probability of retaining them, and, if so, whether visual or central attention is responsible for this beneficial effect.

Visual and central attention

Attention is an umbrella term for selection mechanisms operating in different domains (Chun, Golomb, & Turk-Browne, 2011). One classical distinction is between visual attention on the one hand and central attention on the other. According to Johnston, McCann, and Remington (1995), visual attention limits the parallel processing of information in the visual field, whereas central attention limits parallel processing in higher mental functions such as response selection and memory retrieval (see also Pashler, 1994); these two forms of attention can be engaged independently. In support of this distinction, Johnston et al. (1995) demonstrated that increasing the perceptual demands of a letter-identification task (thereby varying the demands on visual attention) does not increase dual-task costs when this task is combined with a tone-classification task (which engages only central attention). In contrast, increasing the perceptual demands of letter identification does delay responding when the task is combined with a visual cueing task requiring participants to move attention to spatial locations. This is the pattern expected if visual stimuli in the environment can be selected even while central attention is occupied by response selection for another task, but not when the other task also engages visual attention. Tasks that do and do not require central attention can also be distinguished by how the imperative stimulus must be processed (Frith & Done, 1986). Tasks that require stimulus identification in order to select a response, namely choice reaction-time (RT) tasks, require central attention. Conversely, tasks that require only detection of the stimulus, also called simple RT tasks, do not engage central attention.

Attention during maintenance in visual WM

Both visual attention and central attention have been argued to affect how well visual information is maintained in WM. We next briefly review these arguments and the evidence speaking to them.

Visual attention

Some authors have proposed that visual WM shares a common capacity limit with visual attention (Chun, 2011; Franconeri, Alvarez, & Cavanagh, 2013; Olivers, 2008). This suggestion is in line with the finding that asking participants to make eye movements away from locations they are trying to maintain in WM impairs performance (Awh, Jonides, & Reuter-Lorenz, 1998; Lawrence, Myerson, & Abrams, 2004; Pearson & Sahraie, 2003). Moreover, preventing participants from moving their eyes freely can hamper visual WM performance. Williams, Pouget, Boucher, and Woodman (2013) measured performance in a color change-detection task with free or constrained eye movements during the retention interval. To constrain eye movements, participants were asked to keep fixating a cross in the center of the screen. Performance was slightly lower in the fixation condition than in the free eye-movement condition (a decrease of about 2% in accuracy). Moreover, requiring participants to detect a change in the brightness of the fixation cross (a simple RT task) while holding visual representations in WM impaired performance more than requiring them to perform a simple RT task on an auditory stimulus. These results are consistent with the idea that visual attention supports the maintenance of information in visual WM.

Central attention

In contrast to this view, other researchers have argued that WM maintenance, for all types of information, relies on central attention (Barrouillet, Bernardin, & Camos, 2004; Barrouillet, Bernardin, Portrat, Vergauwe, & Camos, 2007; Barrouillet, Portrat, & Camos, 2011; Vergauwe & Cowan, 2014). These authors argue that central attention is deployed to individual items in WM in a cyclical fashion, strengthening them and protecting them from forgetting. Following earlier work by Johnson and colleagues (Johnson, 1992; Johnson & Hirst, 1991; Johnson, Reeder, Raye, & Mitchell, 2002), this process is called refreshing. There is substantial evidence that engaging central attention in an unrelated, distracting task during the retention interval (RI) reduces WM recall accuracy compared to conditions without distraction. This decrease is a linear function of the cognitive load, defined as the time during which attention is diverted away from memory representations divided by the total duration of the RI (Barrouillet et al., 2004; Barrouillet et al., 2011). The cognitive load effect has been observed in tasks with visuospatial and verbal materials (Vergauwe, Barrouillet, & Camos, 2009, 2010; Vergauwe, Langerock, & Barrouillet, 2014). As expected if WM maintenance draws on central attention, the cognitive load effect is not observed for simple RT tasks that do not require response selection (Barrouillet et al., 2007).
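The cognitive load ratio can be written out as a short computation. The sketch below is illustrative only: the function names and the example timings are hypothetical, and the linear prediction uses arbitrary intercept and slope values rather than estimates from the cited studies.

```python
def cognitive_load(attention_ms: float, total_ri_ms: float) -> float:
    """Proportion of the retention interval (RI) during which central
    attention is captured by the distractor task."""
    if total_ri_ms <= 0:
        raise ValueError("total RI must be positive")
    if not 0 <= attention_ms <= total_ri_ms:
        raise ValueError("attention time must lie within the RI")
    return attention_ms / total_ri_ms


def predicted_recall(load: float, intercept: float, slope: float) -> float:
    """Linear cognitive-load function: recall declines linearly with load.
    Intercept and slope are free parameters (hypothetical values below)."""
    return intercept + slope * load


# Hypothetical example: two 400-ms processing episodes within a 2,500-ms RI.
load = cognitive_load(attention_ms=2 * 400, total_ri_ms=2500)
print(load)
print(predicted_recall(load, intercept=0.9, slope=-0.5))
```

Because load is a ratio, doubling the processing time and the RI together leaves the predicted recall unchanged, which is the property that distinguishes this account from a purely time-based one.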

Whereas refreshing was originally conceptualized as a domain-general mechanism applicable to any information in WM, researchers have recently claimed that some visual materials—in particular unfamiliar, non-categorical materials—are non-refreshable (Ricker & Cowan, 2010; Vergauwe, Camos, & Barrouillet, 2014). These observations cast some doubt on the contribution of central attention to maintenance of visual information in WM. One aim of the present work was to revisit this issue, using a method that directly tests for a beneficial effect of refreshing: the guided refreshing technique.

Guided refreshing in visual WM

Although the cognitive load effect has been taken as one piece of evidence for the role of refreshing in WM, this effect does not show that attending to memory representations improves their retention in WM. To provide direct evidence for the beneficial effect of refreshing, we have recently developed a technique to guide refreshing in a visual WM task. In the study of Souza, Rerko, and Oberauer (2015), participants had to remember the precise color of an array of dots. After a short retention interval (RI), participants reconstructed the color of one dot by selecting it from a color wheel. During the RI, a sequence of four cues was presented, each cue pointing to the location of one of the colored dots in the array. Participants were instructed to think of (i.e., refresh) the cued items, with individual items being cued zero, one, or two times. Recall error decreased as the number of refreshing steps directed to the recall target increased, showing that refreshing benefits memory in a cumulative fashion, with each refreshing step boosting memory. Furthermore, Souza et al. (2015) compared performance in this refreshing condition with performance in a baseline-short condition. The RI in the baseline-short condition was as short as the time between memory-array offset and presentation of the first refreshing cue in the guided-refreshing condition. Compared to the baseline-short condition, items never refreshed in the guided-refreshing condition were recalled equally well despite their longer RI, and items refreshed twice were recalled better. This is the result to be expected if refreshing improves memory of the refreshed items, while nonrefreshed items retain their status in WM, neither getting weaker nor stronger over time.

Given that Souza et al. (2015) used spatial cues to direct attention to individual WM contents, the refreshing benefit they observed may be explained by shifts of visual attention to the cued locations. Studies have found that participants tend to look at the locations previously occupied by the memoranda and that this tendency is correlated with better memory accuracy, a phenomenon known as the “looking at nothing” effect (Ferreira, Apel, & Henderson, 2008). Such a benefit would be consistent with the idea that visual WM requires sustained visual attention. Alternatively, the cues used by Souza et al. (2015) could be interpreted as an instruction to direct central attention to the cued items in WM, in line with the notion of refreshing as defined by Barrouillet et al. (2007). Hence, the beneficial attentional effect observed by Souza et al. (2015) could so far reflect either visual or central attention.

The present study

In this article, our goal was to examine whether attention can be used to improve maintenance of visual information in WM, and if so, which kind of attention—visual or central—supports visual WM. Experiment 1 used the guided-refreshing technique to examine whether participants can use attention to improve WM performance during the RI. Furthermore, to gauge the contributions of visual and of central attention to maintenance, we combined the visual WM task with attention-demanding distractor tasks during the RI in Experiments 1 and 3. One distractor task engaged visual attention, and another distractor task engaged central attention. If visual WM draws on visual attention, then the visual task should yield a dual-task cost; conversely, if visual WM draws on central attention, the central attention task should yield a dual-task cost. Jointly, the two manipulations allowed us to demonstrate the benefits of focusing attention on WM contents as well as the costs of directing attention away from these contents in a single task and a single group of participants.

To foreshadow our results, we observed that attending to WM representations improves their retention, and more so the more often those items are refreshed. Distracting attention during the RI impaired visual WM performance only when the secondary task engaged central attention. Preventing participants from sustaining visual attention on visual WM contents had only a negligible effect on memory. To ensure that our visual-attention distractor task engages visual attention sufficiently, in Experiment 2 we used a classical visual-attention task, multiple-object tracking, as the primary task and combined it with either the visual-attention or the central-attention distractor task. Only the visual distractor task yielded dual-task costs; the central distractor task did not. These results establish a double dissociation between visual and central attention, and between multiple-object tracking and visual WM: Tracking multiple objects in the visual field relies more on visual attention, whereas visual WM relies more on central attention.

Experiment 1

The first aim of Experiment 1 was to demonstrate that attention benefits the maintenance of representations in visual WM. To this end, we replicated the refreshing manipulation used by Souza et al. (2015): we presented cues during the RI of a continuous color delayed-estimation task (Prinzmetal, Amiri, Allen, & Edwards, 1998; Wilken & Ma, 2004; W. Zhang & Luck, 2008). The cues instructed participants to think of the cued items, thereby refreshing them. We varied the frequency with which individual items were refreshed (zero, one, or two times) to test whether the beneficial effect of attention is cumulative.
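One way to build cue sequences for this manipulation is sketched below, assuming six memory items and four refreshing cues per RI (as in the refreshing condition), with the recall target cued zero, one, or two times. The rule that no item is cued more than twice, the random placement of cues, and the names `make_cue_sequence`, `N_ITEMS`, and `N_CUES` are hypothetical choices for illustration, not the exact constraints of the original design.

```python
import random

N_ITEMS = 6            # colored dots in the memory array
N_CUES = 4             # refreshing cues presented during the RI
MAX_CUES_PER_ITEM = 2  # assumption: no item is cued more than twice


def make_cue_sequence(target: int, target_count: int, rng: random.Random) -> list[int]:
    """Return N_CUES item indices in which `target` is cued exactly
    `target_count` times (0, 1, or 2); remaining cues go to other items."""
    if target_count not in (0, 1, 2):
        raise ValueError("target must be refreshed 0, 1, or 2 times")
    counts = {i: 0 for i in range(N_ITEMS)}
    counts[target] = MAX_CUES_PER_ITEM  # block any further cues to the target
    cues = [target] * target_count
    while len(cues) < N_CUES:
        eligible = [i for i, c in counts.items() if c < MAX_CUES_PER_ITEM]
        item = rng.choice(eligible)
        counts[item] += 1
        cues.append(item)
    rng.shuffle(cues)  # random order of cues within the RI
    return cues


rng = random.Random(42)
levels = [0, 1, 2] * 100  # 100 trials per refreshing level
rng.shuffle(levels)
trials = []
for k in levels:
    target = rng.randrange(N_ITEMS)
    trials.append((target, k, make_cue_sequence(target, k, rng)))
print(trials[0])
```

Shuffling the cue order spreads the target's refreshing steps across the RI, which is what makes an analysis of the cue position at which an item was last refreshed possible.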

Next, we assessed the role of different kinds of attention in visual WM by examining the pattern of dual-task costs produced by combining the continuous visual WM task with secondary tasks tapping either visual attention or central attention. This strategy follows from the premise that distracting attention during the RI should impair visual WM performance to the degree that attention supports the maintenance of information in visual WM. To test for a role of visual attention, we used a task developed by Williams et al. (2013). Participants were asked to monitor the fixation cross for a potential small change in brightness. This task binds visual attention to the fixation cross, leading to slower processing of other visual objects (Poth, Petersen, Bundesen, & Schneider, 2014). The change occurred on a small proportion of trials, and only on these trials did participants have to make an overt response. Because participants had to detect a change but were not required to identify which type of change occurred, this task does not tap central attention. Moreover, this task has the advantage of minimizing visual interference, because trials without a change in the fixation cross are visually identical to trials without this secondary task. At the same time, participants need to continuously monitor the fixation cross for a subtle change, which prevents covert and overt shifts of attention to the locations previously occupied by memory items. If participants in the single-task condition use visuospatial shifts of attention to bolster visual WM, then this task should yield a dual-task cost for visual WM performance.

To test for a role of central attention, we used a tone-classification task requiring identification of tones as being of low or high pitch. This task requires participants not only to detect that a tone was presented but also to process the identity of the tone in order to select the appropriate response. Response selection does recruit central attention (Pashler, 1994). At the same time, this task makes no demand on visual processing. If maintenance of information in visual WM relies on central attention, then this task should yield a dual-task cost.

Method

Participants

Forty-four students (mean age = 24 years; 9 men) from the University of Zurich took part in Experiment 1. Participants completed one of two experimental versions, referred to here as Experiment 1a (n = 24) and Experiment 1b (n = 20). Participants received course credit or 45 Swiss francs for taking part in three 1-hour sessions. Participants read and signed an informed consent form at the beginning of the experiment and were debriefed at the end. The study protocol was in accordance with the guidelines of the Ethics Committee of the Faculty of Arts and Social Sciences of the University of Zurich.

Stimuli and procedure

All experiments reported in this paper were programmed using MATLAB and the Psychtoolbox (Brainard, 1997; Pelli, 1997). Participants were tested in individual booths, where they sat about 50 cm from the computer screen (viewing distance was not constrained).

Baseline condition.

As shown in the first row of Fig. 1, participants performed a continuous color delayed-estimation task (Wilken & Ma, 2004; W. Zhang & Luck, 2008) as the main visual WM task. First, a white fixation cross was shown for 500 ms. Next, six colored dots (2.11° × 2.14° of visual angle) were presented simultaneously for 1,000 ms. The dots were arranged equidistantly on an invisible circle (radius 4.9° of visual angle) centered on the middle of the screen. The colors of the memoranda were sampled from 360 values evenly distributed along a circle in the CIE L*a*b* color space (L* = 70; center at a* = 20, b* = 38; radius = 60). Colors in each memory array were selected randomly with the constraint that all six colors were at least 20° apart from each other on the color wheel. The offset of the memory dots was followed by an RI of 2,500 ms, during which the screen was gray and only the white fixation cross was visible. At the end of this interval, a test display was shown containing the color wheel (randomly rotated from trial to trial), a white frame around the location of one memory item (hereafter the target), and the mouse cursor in the center of the screen (which replaced the fixation cross). Participants reported the color of the target by clicking on a point on the color wheel. The next trial started 1,000 ms after the response. Instructions emphasized accuracy rather than speed. Performance in this standard task constituted the baseline against which we compared the effects of three attention manipulations: guided refreshing (refreshing condition), distraction of visual attention (visual condition), and distraction of central attention (central condition).
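The color sampling described above can be sketched as follows. The wheel parameters (L* = 70, center a* = 20, b* = 38, radius 60, 360 positions, 20° minimum separation) are taken from the text; the rejection-sampling strategy and the function names are assumptions for illustration. Converting L*a*b* values to monitor RGB requires a calibrated transformation and is not shown.

```python
import math
import random

# Color circle in CIE L*a*b* space, as described in the text.
L_STAR, A_CENTER, B_CENTER, RADIUS = 70.0, 20.0, 38.0, 60.0
N_COLORS = 360       # wheel positions, 1 degree apart
MIN_SEPARATION = 20  # minimum distance between any two array colors (degrees)


def wheel_to_lab(angle_deg: int) -> tuple[float, float, float]:
    """L*a*b* coordinates of a position on the color wheel."""
    theta = math.radians(angle_deg)
    return (L_STAR,
            A_CENTER + RADIUS * math.cos(theta),
            B_CENTER + RADIUS * math.sin(theta))


def circular_distance(a: int, b: int) -> int:
    """Shortest distance between two wheel positions, in degrees (0-180)."""
    d = abs(a - b) % N_COLORS
    return min(d, N_COLORS - d)


def sample_array_colors(rng: random.Random, n_items: int = 6) -> list[int]:
    """Rejection sampling: redraw until every pair of colors is at least
    MIN_SEPARATION degrees apart on the wheel."""
    while True:
        angles = rng.sample(range(N_COLORS), n_items)
        if all(circular_distance(a, b) >= MIN_SEPARATION
               for i, a in enumerate(angles) for b in angles[i + 1:]):
            return angles


colors = sample_array_colors(random.Random(0))
print(colors, [wheel_to_lab(c) for c in colors[:2]])
```

Rejection sampling is simple and unbiased over all valid arrays; with six colors and a 20° constraint, a sizeable fraction of draws already satisfies the constraint, so the loop terminates quickly.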

Fig. 1

The first row illustrates the flow of events in the continuous color delayed-estimation task (baseline condition) used in Experiment 1. The conditions depicted in rows 2–4 differ from the baseline in the events unfolding during the retention interval (marked in light gray). In the refreshing condition, four arrows were shown sequentially, and participants were instructed to think of the item each arrow pointed to, thereby refreshing it. In the visual condition, participants were instructed to monitor the fixation cross for a potential change in brightness (occurring in only 25% of the trials) and to press the spacebar only if they detected a change. In the central condition, participants were instructed to listen to tones and to quickly say aloud whether each tone was of high or low pitch. (Color figure online)

Visual condition.

In this condition, a secondary task requiring visual attention was carried out during the RI of the main visual WM task. Participants were instructed to monitor the fixation cross during the RI for a potential change in brightness (Williams et al., 2013). The change consisted of the fixation cross turning from white to light gray (RGB: 166, 166, 166) for 100 ms. A brightness change was scheduled to occur, once during the RI, on a low proportion of trials (25%). When a change was scheduled, it occurred at least 500 ms after the offset of the memory array and at least 900 ms before the presentation of the test display, to ensure that participants had sufficient time to respond. Upon detecting a change, participants had to press the spacebar; if no change occurred, they did not have to press any key. Because participants only had to detect a change, not report which type of change occurred, this task does not require response selection and hence does not tap central attention (Frith & Done, 1986). Furthermore, to eliminate any possible contribution of response selection to performance in this condition, we excluded trials in which participants reported a change. This also ensured that, on the analyzed trials, participants kept visual attention on the fixation cross for the whole duration of the RI.
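A possible scheduling scheme for the brightness change is sketched below. The 2,500-ms RI, the 500-ms and 900-ms limits, the 100-ms change duration, and the 25% rate come from the text. Two details are assumptions: that the 900-ms limit refers to the change onset, and that exactly a quarter of the trials in a block are change trials (rather than each trial having an independent 25% chance). The function name is hypothetical.

```python
import random
from typing import Optional

RI_MS = 2500             # retention interval
EARLIEST_ONSET_MS = 500  # change no earlier than this after memory-array offset
MIN_LEAD_MS = 900        # change onset at least this long before the test display (assumed)
CHANGE_MS = 100          # duration of the white-to-light-gray change
P_CHANGE = 0.25          # proportion of trials with a change


def schedule_brightness_changes(n_trials: int, rng: random.Random) -> list[Optional[int]]:
    """For each trial, return the change onset in ms after memory-array
    offset, or None if no change occurs on that trial."""
    n_change = round(n_trials * P_CHANGE)
    has_change = [True] * n_change + [False] * (n_trials - n_change)
    rng.shuffle(has_change)
    latest_onset = RI_MS - MIN_LEAD_MS
    return [rng.randint(EARLIEST_ONSET_MS, latest_onset) if c else None
            for c in has_change]


schedule = schedule_brightness_changes(100, random.Random(2))
print(sum(x is not None for x in schedule), "change trials")
```

Under these assumptions, onsets fall between 500 and 1,600 ms, so even the latest change ends 800 ms before the test display appears.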

Central condition.

In this condition, a secondary task requiring response selection (hence tapping central attention) but no visual processing was carried out during the RI. Participants judged whether tones (75 ms duration) played through headphones were of low (300 Hz) or high (1000 Hz) pitch by saying aloud “low” or “high,” respectively, into a microphone. The task used auditory stimuli and vocal responses to avoid demands on visual and spatial attention. Experiments 1a and 1b differed in the time constraints for performing the tone task. In Experiment 1a, the pace at which the tones were presented was determined by each individual’s speed of processing. In Experiment 1b, we set a fixed time limit for responding to the tones, thereby imposing time pressure. Experiment 1a has the advantage of minimizing errors by allowing sufficient time for responding to each tone. However, it also allowed for two (opposing) confounds that we tried to minimize in Experiment 1b. On the one hand, participants could strategically postpone processing of the tones in favor of using central attention to maintain representations in visual WM (Vergauwe, Camos, et al., 2014), which would reduce dual-task costs in Experiment 1a. Imposing time pressure in Experiment 1b prevents such a strategy. On the other hand, allowing sufficient time to process the tones introduced variability in the total RI in Experiment 1a and allowed it to be longer than in the remaining conditions. This raised the possibility that worse performance in this condition could be explained by time-based decay. Experiment 1b therefore kept the total RI equal across conditions to rule out these possible confounds. Combined evidence from both experiments allows us to rule out the possibility that dual-task costs engendered by the tone-classification task are due to time pressure (and the associated error processing that may follow) or to decay.

The central condition in Experiment 1a consisted of the sequential presentation of two tones (henceforth T-2 trials) during the RI. The first tone occurred 500 ms after the offset of the memory array. Participants said their answer aloud, which automatically triggered a voice recorder that registered the response and saved a sound file of it for offline accuracy checks. The second tone was played 100 ms after this response. The response to the second tone was followed by another 100-ms interval before presentation of the test display. Hence, the total RI in the central condition of Experiment 1a depended on the reaction times (RTs) to the tones. In contrast, in Experiment 1b, we fixed the RI of the central condition to 2,500 ms to match the RI in the remaining conditions. Moreover, we created two conditions that differed only in the number of tones that had to be judged during the RI. In T-1 trials, a single tone was played 500 ms after the offset of the memory array, and participants had 1,925 ms to respond before onset of the test display. In T-2 trials, the first tone was played 500 ms after the memory-array offset, and participants had 925 ms to respond to it, after which a second tone was played, followed by another 925-ms response period. Manipulating the number of tones allowed us to test for a potential cognitive-load effect in our task (Barrouillet et al., 2007). As in Experiment 1a, vocal responses were recorded for offline accuracy checks.
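The timing arithmetic of the central condition can be checked with a few lines of code. The durations are those given in the text; the assumptions, labeled in the comments, are that each Experiment 1b response window starts at tone offset and that Experiment 1a RTs are measured from tone onset. The function names and the example RTs are hypothetical.

```python
PRE_TONE_MS = 500  # memory-array offset to first tone onset
TONE_MS = 75       # tone duration
RESPONSE_WINDOW_MS = {1: 1925, 2: 925}  # Experiment 1b, per tone


def ri_experiment_1b(n_tones: int) -> int:
    """Total RI in Experiment 1b, assuming each response window starts at tone offset."""
    return PRE_TONE_MS + n_tones * (TONE_MS + RESPONSE_WINDOW_MS[n_tones])


def ri_experiment_1a(rt1_ms: float, rt2_ms: float, gap_ms: float = 100) -> float:
    """Total RI in the self-paced Experiment 1a, assuming RTs are measured from
    tone onset and each response is followed by a 100-ms gap (before tone 2 or
    before the test display)."""
    return PRE_TONE_MS + rt1_ms + gap_ms + rt2_ms + gap_ms


for n in (1, 2):
    print(n, ri_experiment_1b(n))  # both 2500
print(ri_experiment_1a(900, 1100))  # hypothetical RTs
```

Under these assumptions, both T-1 and T-2 trials fill exactly the 2,500-ms RI used in the other conditions, whereas in Experiment 1a the RI exceeds 2,500 ms whenever the two RTs sum to more than 1,800 ms.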

Participants completed three sessions in each experiment. In Experiment 1a, each session comprised 300 trials. The refreshing condition was performed in a single session (yielding 100 trials per refreshing level). Each of the two other sessions comprised two blocks of trials: one block of baseline trials (120 trials) and one dual-task block with one of the secondary attention tasks (180 trials of either the visual or the central condition). Session order was counterbalanced across participants using a Latin square, as was block order (baseline or dual-task block) within a session. Before the start of each session or block, participants were informed of the requirements in effect (baseline, refreshing, visual, or central condition); this instruction was followed by six practice trials (excluded from subsequent analyses).
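Latin-square counterbalancing of the three session types can be sketched with a cyclic square, in which each session type appears once in every serial position across each group of three participants. The session labels, the cyclic construction, and the block-order alternation rule are hypothetical illustrations, not the assignment scheme actually used.

```python
# Hypothetical labels for the three session types of Experiment 1a.
SESSIONS = ("refreshing", "baseline + visual", "baseline + central")


def latin_square(n: int) -> list[list[int]]:
    """Cyclic n x n Latin square: each value appears once per row and once per column."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]


def session_order(participant: int) -> list[str]:
    """Session sequence for a participant (0-based index), cycling through the rows."""
    square = latin_square(len(SESSIONS))
    return [SESSIONS[i] for i in square[participant % len(SESSIONS)]]


def baseline_block_first(participant: int) -> bool:
    """Hypothetical rule for block order within a session: alternate between
    successive passes through the square."""
    return (participant // len(SESSIONS)) % 2 == 0


for p in range(6):
    order = "baseline first" if baseline_block_first(p) else "dual-task first"
    print(p, session_order(p), order)
```

A cyclic square balances serial position but not immediate carryover between specific session pairs; with three conditions, full carryover balancing would require using both the square and its mirror image.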

In Experiment 1b, the refreshing condition was also performed in a separate session comprising 300 trials (100 trials per refreshing level). In a second session, participants completed 100 trials of the baseline condition, and 100 trials of the visual condition. These conditions were performed in separate blocks of trials, whose order was counterbalanced across participants. In this experiment, participants also received a single-task practice block (comprising 50 trials) with the brightness task alone prior to being exposed to the visual (dual-task) condition. In a third session, participants performed 200 trials of the central attention conditions (i.e., 100 T-1 trials and 100 T-2 trials). T-1 and T-2 alternated every 10 trials, and a message informed participants of the impending change in the number of tones. Before starting the central dual-task condition, participants completed a practice block with the tone task alone for 60 trials. Session order was counterbalanced across participants using a Latin square in Experiment 1b. In each session, the test trials with the visual WM task were preceded by six practice trials (with the dual-task requirements when applicable) which were discarded from subsequent analyses.

Data analysis

We submitted our data to Bayesian t tests, Bayesian ANOVAs, and Bayesian linear mixed-effects model analyses (Rouder, Morey, Speckman, & Province, 2012). These analyses were performed in R (R Core Team, 2014) using the BayesFactor 0.9.11 package (Morey & Rouder, 2014). To compute Bayesian mixed-effects models, we used the lmBF function. This function computes the Bayes factor (BF10) of a specified model (M1) against a null model (M0), that is, the ratio of the marginal likelihood of the data under M1 to that under M0. To assess the evidence for individual predictors (and their interactions), we computed the ratio of the BF10 of a model including the predictor of interest to the BF10 of a model omitting it. This ratio quantifies the strength of the evidence for including the predictor in the model. We used a top-down procedure: starting from a full model with all predictors and interactions, we assessed the evidence for each predictor by removing it from the full model.
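The ratio-of-ratios logic can be illustrated as follows: because both models are compared against the same null, dividing their BF10 values cancels the null and yields the Bayes factor of the full model against the reduced model. In the BayesFactor package, dividing one `lmBF` result by another with the same denominator yields this ratio; the Python sketch below only reproduces the arithmetic, using hypothetical BF values and term names.

```python
import math


def predictor_bf(bf_full: float, bf_reduced: float) -> float:
    """Evidence for including one predictor: BF10 of the full model divided by
    BF10 of the model without that predictor (both against the same null).
    Computed in log space to stay stable for very large Bayes factors."""
    if bf_full <= 0 or bf_reduced <= 0:
        raise ValueError("Bayes factors must be positive")
    return math.exp(math.log(bf_full) - math.log(bf_reduced))


# Hypothetical BF10 values against a common null model (illustration only).
bf_full_model = 1.2e9
bf_without = {"refreshing": 40.0, "experiment": 1.5e9, "refreshing x experiment": 4.0e9}

# Top-down procedure: drop each term from the full model in turn.
for term, bf in bf_without.items():
    print(f"{term}: BF = {predictor_bf(bf_full_model, bf):.3g}")
```

A result above 1 favors keeping the term; a result below 1 favors the simpler model without it.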

For all models, we entered a random intercept for participants and random slopes for the effects of the within-subjects predictors. We report the BF for the alternative model (1) over the null model (0), namely BF10. Its reciprocal (1/BF10) gives the evidence for the null hypothesis (BF01). The BF is the factor by which the prior odds of the compared models should be updated in light of the data. BFs below 3 are usually regarded as weak evidence, between 3 and 10 as substantial evidence, between 10 and 100 as strong evidence, and above 100 as decisive evidence in favor of the model under consideration (Kass & Raftery, 1995).
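The conventions above can be expressed as a small helper: inverting BF10 gives BF01, the verbal label follows the stated cut-offs, and posterior odds are the BF multiplied by the prior odds. The function names and the handling of values exactly at the cut-offs are assumptions for illustration.

```python
def describe_bf(bf10: float) -> tuple[str, str]:
    """Return (favored model, verbal label) using the cut-offs given in the text:
    <3 weak, 3-10 substantial, 10-100 strong, >100 decisive."""
    if bf10 <= 0:
        raise ValueError("Bayes factors must be positive")
    favored = "alternative" if bf10 >= 1 else "null"
    bf = bf10 if bf10 >= 1 else 1.0 / bf10  # BF01 = 1 / BF10
    if bf < 3:
        label = "weak"
    elif bf < 10:
        label = "substantial"
    elif bf <= 100:
        label = "strong"
    else:
        label = "decisive"
    return favored, label


def posterior_odds(bf10: float, prior_odds: float = 1.0) -> float:
    """Posterior odds of M1 over M0: BF10 multiplied by the prior odds."""
    return bf10 * prior_odds


for value in (7.1e7, 1.2, 0.29, 0.04):
    print(value, describe_bf(value))
```

For example, BF10 = 0.29 corresponds to BF01 of about 3.4, that is, substantial evidence for the null.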

We also fitted a mixture model to the data to estimate the probability that responses were informed by memory as opposed to guessing, and the precision of the representations in memory (W. Zhang & Luck, 2008). Details of the modeling method, and the results of analyzing mixture-model parameters, can be found in the Online Supplementary Materials.
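The two-component mixture model treats each response error either as drawn from a von Mises distribution centered on the target color (with probability Pm and concentration κ) or as a random guess from a uniform distribution on the circle. The sketch below fits this model by brute-force maximum-likelihood grid search using only the Python standard library. The grid-search approach, the grids, and the simulated data are assumptions for illustration; this is not necessarily the estimation procedure described in the Online Supplementary Materials.

```python
import math
import random

UNIFORM_DENSITY = 1.0 / (2.0 * math.pi)  # guessing: flat over the circle


def log_i0(kappa: float, n_terms: int = 200) -> float:
    """log of the modified Bessel function I0(kappa), via its power series."""
    half_log = math.log(kappa / 2.0)
    logs = [2 * m * half_log - 2 * math.lgamma(m + 1) for m in range(n_terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def wrap(angle: float) -> float:
    """Wrap an angle in radians to [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def fit_mixture(errors, pm_grid, kappa_grid):
    """Grid-search ML fit of pm * VonMises(0, kappa) + (1 - pm) * Uniform.
    `errors` are signed response errors in radians."""
    cosines = [math.cos(e) for e in errors]
    best_ll, best_pm, best_kappa = -math.inf, None, None
    for kappa in kappa_grid:
        log_norm = math.log(2.0 * math.pi) + log_i0(kappa)
        vm = [math.exp(kappa * c - log_norm) for c in cosines]  # von Mises densities
        for pm in pm_grid:
            ll = sum(math.log(pm * v + (1.0 - pm) * UNIFORM_DENSITY) for v in vm)
            if ll > best_ll:
                best_ll, best_pm, best_kappa = ll, pm, kappa
    return best_pm, best_kappa


def simulate_errors(n, pm, kappa, rng):
    """Simulated response errors (radians), for illustration only."""
    return [wrap(rng.vonmisesvariate(0.0, kappa)) if rng.random() < pm
            else rng.uniform(-math.pi, math.pi) for _ in range(n)]


rng = random.Random(1)
errors = simulate_errors(800, pm=0.8, kappa=8.0, rng=rng)
pm_hat, kappa_hat = fit_mixture(
    errors,
    pm_grid=[i / 50 for i in range(1, 50)],
    kappa_grid=[float(k) for k in range(1, 41)],
)
print(pm_hat, kappa_hat)
```

A grid search is slow but transparent; a gradient-based optimizer over the same log-likelihood would give finer-grained estimates.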

All data and analysis scripts are available on the Open Science Framework (https://osf.io/arve5/).

Results

Distractor task performance

We first checked performance in the brightness-change task and tone-classification task used as distractors during the visual and central conditions, respectively. We identified one participant in Experiment 1a who never responded to the brightness change, and we excluded this person from the final analysis. Moreover, another participant in Experiment 1a was excluded due to extremely slow responses in the tone task that led to extremely long RIs (~10 s) in the central condition. Hence, the final sample size in Experiment 1a was n = 22.

Performance in the brightness-change task was assessed by computing a measure of detection performance (hit rate minus false-alarm rate). In Experiment 1a, we did not assess performance in the brightness-change task when it was performed alone. In Experiment 1b, we included a single-task block with the brightness-change task before the visual condition. Accuracy in the single-task block, after excluding the first 10 trials as a familiarization period, served to gauge performance in the absence of a visual WM load. As shown in Table 1, accuracy in the brightness-change task was high overall, and the evidence concerning a decline in performance in the visual condition relative to the single-task condition in Experiment 1b was ambiguous (BF10 = 1.2).
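The detection measure can be computed from per-trial records as sketched below. The record format (pairs of change-present and responded flags), the function name, and the example counts are hypothetical.

```python
def detection_score(trials):
    """Hit rate minus false-alarm rate.

    `trials` is an iterable of (change_present, responded) boolean pairs."""
    trials = list(trials)
    change = [resp for present, resp in trials if present]
    no_change = [resp for present, resp in trials if not present]
    if not change or not no_change:
        raise ValueError("need both change and no-change trials")
    hit_rate = sum(change) / len(change)
    false_alarm_rate = sum(no_change) / len(no_change)
    return hit_rate - false_alarm_rate


# Hypothetical block: 25 change trials (20 detected), 75 no-change trials (3 false alarms).
block = ([(True, True)] * 20 + [(True, False)] * 5
         + [(False, True)] * 3 + [(False, False)] * 72)
print(detection_score(block))
```

Dropping a familiarization period amounts to scoring a slice of the trial list, for example `detection_score(single_task_trials[10:])`, where `single_task_trials` is a hypothetical list of records in the same format.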

Table 1 Mean performance in the secondary tasks in Experiments 1a and 1b

Performance in the tone-classification task was assessed as the proportion of correct decisions. Table 1 presents the accuracy of tone classification in the central condition (dual-task block). Due to a programming error, vocal responses were not recorded during the single-task block of Experiment 1b, and we could not analyze accuracy in this block. Overall, tone accuracy was high in Experiment 1a, in which no response deadline was imposed for the tones (which led to a somewhat longer mean RI in this condition, M = 3.21 s, SD = 0.339 s, than in the other conditions). Accuracy in the equivalent condition (i.e., T-2 trials) of Experiment 1b was considerably lower, owing to the fixed response window.

Recall error

For each trial of the continuous color delayed-estimation task, we computed recall error as the absolute angular distance on the color wheel between the reported color and the target’s true color. From the visual condition, we excluded all trials in which a change was detected in the brightness-change task, so that performance in this condition purely reflects the cost of sustaining visual attention on the fixation cross, without any contribution from response selection or motor action. The pattern of results remains the same if we include all trials performed in this condition. We examined performance in the central condition with and without trials with incorrect tone judgments and found the same qualitative pattern of results. Therefore, we included trials with incorrect tone responses to keep as many trials as possible, particularly in Experiment 1b, in which tone accuracy was lower (see Footnote 1).
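Recall error and the visual-condition exclusion rule can be sketched as follows. Colors are assumed to be coded as positions (in degrees) on the unrotated color wheel, so that the random rotation of the wheel at test does not affect the error. The record format, field names, and demo values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def recall_error(reported_deg: float, target_deg: float) -> float:
    """Absolute angular distance (0-180 deg) between reported and target colors,
    both expressed as positions on the unrotated color wheel."""
    d = abs(reported_deg - target_deg) % 360
    return min(d, 360 - d)


def mean_error_by_condition(trials):
    """Mean recall error per condition. Visual-condition trials on which a
    brightness change was detected are dropped, as described in the text."""
    errors = defaultdict(list)
    for t in trials:
        if t["condition"] == "visual" and t.get("change_detected", False):
            continue
        errors[t["condition"]].append(recall_error(t["reported"], t["target"]))
    return {cond: mean(vals) for cond, vals in errors.items()}


demo_trials = [
    {"condition": "baseline", "reported": 350, "target": 10},
    {"condition": "baseline", "reported": 100, "target": 90},
    {"condition": "visual", "reported": 0, "target": 180, "change_detected": True},
    {"condition": "visual", "reported": 45, "target": 15, "change_detected": False},
]
print(mean_error_by_condition(demo_trials))
```

Taking the shorter of the two arcs keeps errors in the range 0° to 180°, so a response 20° clockwise and one 20° counterclockwise of the target count equally.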

Figure 2a presents the mean recall error in each condition of Experiments 1a and 1b. As Fig. 2a shows, when attention was directed to specific memory items during the RI (by asking participants to refresh a cued item), recall improved as refreshing frequency increased. The requirement to carry out the brightness-change task during the RI (visual condition) only mildly impaired visual WM performance compared to the baseline condition. Performing the tone task had a substantial impact when participants had to judge two tones (T-2 trials) but only a small cost when a single tone had to be judged (T-1 trials in Experiment 1b). We used Bayesian mixed-effects models and Bayesian t tests to gauge the amount of evidence for each of the effects of interest. Experiment was treated as a between-subjects categorical predictor and condition as a within-subjects categorical predictor.

Fig. 2

a Mean recall error as a function of condition in Experiment 1. b Posterior distribution of the refreshing effect on recall error (i.e., change in recall error per refreshing step) as estimated from the Bayesian regression analysis of the data from Experiments 1a (red) and 1b (black). The 95% credible intervals are shown by the line bars under each curve. c Mean recall error as a function of the cue position at which the target was last refreshed. Error bars depict 95% within-subjects confidence intervals (Cousineau, 2005; Morey, 2008). Ref = refreshing; E = experiment. (Color figure online)

Refreshing frequency effect.

Our first goal was to replicate the effect of refreshing frequency reported by Souza et al. (2015). Bayesian linear mixed-effects models were run with refreshing frequency and experiment as fixed effects. We found overwhelming evidence for a main effect of refreshing frequency (BF10 = 7.1 × 10^7). There was no evidence for a main effect of experiment (BF10 = 0.7), and the evidence regarding an interaction between experiment and refreshing frequency favored the null (BF10 = 0.29, i.e., BF01 ≈ 3.4). This is not surprising, given that the refreshing manipulation was identical in Experiments 1a and 1b, so there was no reason why the two samples of participants should differ in the effect of refreshing.

We pooled the data of both experiments to perform follow-up Bayesian t tests comparing the three refreshing levels with each other. There was overwhelming evidence supporting a cumulative effect of refreshing on recall error: zero-refreshing vs. one-refreshing, BF10 = 207; one-refreshing vs. two-refreshings, BF10 = 128. We also computed the size of the refreshing effect by entering refreshing frequency as a numerical predictor in a Bayesian linear regression model, which yielded clear evidence for a linear trend: Experiment 1a, BF10 = 6.6 × 10⁹; Experiment 1b, BF10 = 2.6 × 10⁷. The posterior distributions of the refreshing effect in Experiments 1a and 1b are shown in Fig. 2b. This analysis showed that each refreshing step reduced recall error by about 5° on average.

Refreshing over the RI.

We also analyzed recall error as a function of the position in the cue sequence at which the target of recall was last refreshed. Figure 2c shows the results of this analysis with the data of both experiments pooled together. We assessed the evidence for the effect of cue position on recall error by entering serial cue position in a Bayesian ANOVA. For targets refreshed once, cue position yielded a BF10 = 0.04, and for targets refreshed twice, cue position yielded a BF10 = 0.30. Hence, there was evidence against an effect of the position within the RI at which an item was refreshed.

Refreshing vs. baseline.

Next, we compared recall error in the refreshing conditions against the baseline. If participants do not refresh spontaneously in the baseline condition, performance in the zero-refreshing condition should be equivalent to baseline, whereas the one-refreshing and two-refreshing conditions should yield better performance. In contrast, if participants refreshed items in the baseline condition, they would spend on average about 0.42 s (i.e., 2.5 s / 6 items) refreshing each item, roughly equivalent to one cue-controlled refreshing step in the refreshing condition. In this case, performance in the baseline condition should be roughly equivalent to performance in the one-refreshing condition.

First, we assessed whether participants in Experiments 1a and 1b differed regarding their baseline performance. An independent-samples t test yielded a BF10 = 0.32, supporting the null by a factor of about 3. Therefore, we again pooled the data of both experiments to perform Bayesian t tests comparing each refreshing condition against the baseline. Performance in the zero-refreshing condition was worse than in the baseline (BF10 = 111). Performance in the one-refreshing condition was similar to that in the baseline (BF10 = 0.38, hence favoring the null hypothesis by about 2.6). The two-refreshing condition tended to yield smaller errors than the baseline, but the evidence for better performance in this condition was inconclusive (BF10 = 1). These results speak in favor of spontaneous refreshing in the baseline condition. On that basis, we can expect that a secondary task that disrupts refreshing during the RI should impair memory. This is what we examine next.

Dual-task costs.

We compared performance across the baseline, visual, and central (T-2 trials) conditions of Experiments 1a and 1b. There was overwhelming evidence for an effect of condition (BF10 = 3.3 × 10⁷). There was not enough evidence to support an effect of experiment (BF10 = 0.7), and the evidence for an interaction between condition and experiment tended to favor the null (BF10 = 0.2, hence supporting the null by a factor of about 5). Experiments 1a and 1b differed regarding the implementation of the tone task (self-paced processing of the tones in Experiment 1a vs. time pressure in Experiment 1b). Nevertheless, this analysis shows that both implementations of the central-attention task yielded comparable impairments in WM performance.

We performed pairwise comparisons of the two dual-task conditions against the baseline, pooling the data of both experiments together. A t test comparing the visual and baseline conditions yielded a BF10 = 0.94, which provides ambiguous evidence concerning an effect of the visual task. Comparison of the central T-2 condition against the baseline yielded overwhelming evidence for an increase in recall error due to processing of the tone task (BF10 = 3.7 × 10⁸).

Cognitive load effect.

In Experiment 1b, we also varied the number of tones to be processed during the fixed 2.5-s RI. This manipulation allows us to assess the impact of cognitive load on performance in our color memory task. The comparison of performance in the T-1 and T-2 conditions yielded a BF10 = 114, supporting worse performance when processing two tones, as shown in Fig. 2a. Processing one tone had at best a mild impact on the average recall error: Comparison of this condition against the baseline yielded a BF10 = 1.5.

Mixture modeling

We also submitted our data to mixture modeling. The results of the modeling can be found in the Online Supplementary Materials. Here we only briefly report the qualitative pattern of results. We found that refreshing frequency linearly increased the probability of having information in memory (PM). Regarding memory precision (σ), there was evidence that two-refreshed items were remembered with higher precision, but zero-refreshed and one-refreshed items did not differ from each other. Regarding the pattern of dual-task costs, we only found an effect of distraction on PM: Distraction of central attention, but not visual attention, reduced PM to about half of the estimate obtained for the baseline.
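The fitted models are reported in the Online Supplementary Materials. Purely as an illustration of what PM and precision mean, the Python sketch that follows fits the common two-component mixture (a von Mises distribution centered on the target plus a uniform guessing component) to recall errors by coarse grid search. The model family is our assumption, not the authors' fitting code; the sketch parameterizes precision by the von Mises concentration κ rather than by σ (higher κ corresponds to smaller σ), and all function names are hypothetical.

```python
import math
import random


def bessel_i0(x, terms=80):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2 * k)) ** 2  # ratio of successive series terms
        total += term
    return total


def neg_log_likelihood(cos_errors, p_mem, kappa):
    """Negative log-likelihood of a von Mises + uniform mixture.

    cos_errors: cosines of recall errors (radians, target at 0).
    p_mem: probability that the target is in memory (PM).
    kappa: von Mises concentration; higher kappa means higher precision.
    """
    vm_norm = 2 * math.pi * bessel_i0(kappa)
    guess = (1 - p_mem) / (2 * math.pi)  # uniform density over the color wheel
    return -sum(math.log(p_mem * math.exp(kappa * c) / vm_norm + guess)
                for c in cos_errors)


def fit_mixture(errors, kappas=(1, 2, 4, 6, 8, 10, 12, 16, 20, 25, 30)):
    """Coarse grid-search maximum-likelihood fit; returns (p_mem, kappa)."""
    cos_errors = [math.cos(e) for e in errors]
    best = None
    for p_mem in (i / 100 for i in range(1, 100)):
        for kappa in kappas:
            nll = neg_log_likelihood(cos_errors, p_mem, kappa)
            if best is None or nll < best[0]:
                best = (nll, p_mem, kappa)
    return best[1], best[2]


def simulate_errors(n, p_mem, kappa, seed=1):
    """Draw hypothetical recall errors in radians, wrapped to [-pi, pi)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        if rng.random() < p_mem:
            angle = rng.vonmisesvariate(0.0, kappa)  # remembered: centered on target
        else:
            angle = rng.uniform(0.0, 2 * math.pi)  # guess: anywhere on the wheel
        errors.append((angle + math.pi) % (2 * math.pi) - math.pi)
    return errors
```

For simulated (not experimental) data, `fit_mixture(simulate_errors(1000, 0.7, 8.0))` should recover values close to PM = 0.7 and κ = 8; a real analysis would use a proper optimizer rather than a grid.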

Discussion

The refreshing frequency effect

Experiment 1 showed that directing attention to WM contents improves recall, with larger benefits the more often an item is refreshed (Souza et al., 2015). It also demonstrated that the refreshing frequency effect does not depend on when in the RI an item is refreshed. This shows that the refreshing benefit does not arise merely from directing attention to the first-cued item and ignoring all subsequent cues, or from holding only the last-cued item in the focus of attention until the time of test. Furthermore, mixture modeling showed that the refreshing frequency manipulation increased the probability of retrieving the target item from WM. When items were refreshed twice, they were also recalled with higher precision.

Overall, zero-refreshed items were recalled worse than items in the baseline, whereas the evidence for a benefit of two-refreshed items over the baseline was ambiguous. We interpret these findings as indicating that participants spontaneously refresh all items in memory in the baseline, whereas in the refreshing condition we forced them to refresh only a subset of their WM contents. This interpretation is in line with the results of Souza et al. (2015), who found that zero-refreshed items were recalled as well as items from a short baseline (with an RI matched to the time until presentation of the first cue in the refreshing condition). At the same time, zero-refreshed items were recalled worse than items from a long baseline (with an RI as long as in the refreshing condition). The baseline used here corresponds to the long baseline in Souza et al. (2015). Our earlier study showed that zero-refreshed items do not deteriorate while other items are being refreshed, relative to their state before the refreshing period began. However, they do lose opportunities for refreshing that become available over the long RI. Taken together, the present results and those of Souza et al. (2015) are best explained by the assumption that refreshing strengthens the refreshed items in WM, while items that are not refreshed maintain their initial status.

One question that may remain is why we could not find support for a benefit of two-refreshed items over the baseline. Although our hypothesis predicts such a benefit in theory, in practice this comparison may be complicated by the fact that the refreshing condition is also a dual-task condition: Participants have to hold in mind a task set to follow the cues and refresh the cued items as instructed. Hence, two opposing effects might drive performance in the refreshing condition: A small cost of having a secondary task to perform partially counteracts the benefit of attending to a representation in WM. Whereas comparisons of refreshing levels within the same condition show strong support for the benefit of refreshing, comparisons between the refreshing condition and the baseline may be somewhat distorted by putative dual-task costs in the refreshing condition.

One may wonder whether the refreshing frequency effect we observed could be explained by a redistribution of WM resources as a function of cueing. During the baseline, all items would have a roughly equal share of WM resources. As items are cued, resources would be flexibly redistributed, with cued items receiving a larger share of resources at the expense of noncued items. According to this hypothesis, attention would not yield a genuine benefit for performance but would involve a zero-sum game in which benefits and costs would balance out. This could potentially explain the results we observed here. We cannot rule out this possibility, but we note that it is difficult to reconcile with the results of the short and long baselines obtained by Souza et al. (2015). Furthermore, retro-cue studies have shown that cueing attention to a single item in WM can yield a benefit for retrieval of the cued representation without costs for other information concurrently retained in WM (Gunseli, Fahrenfort, Daoultzis, Meeter, & Olivers, 2015; Li & Saiki, 2014; Rerko & Oberauer, 2013). This finding is also difficult to explain with a redistribution-of-resources hypothesis. Hence, future studies will be needed to systematically test predictions of this resource-redistribution hypothesis against our strengthening hypothesis (Rerko & Oberauer, 2013; Rerko, Souza, & Oberauer, 2014; Souza et al., 2015).

Although our study cannot distinguish between these two alternative ways in which refreshing helps WM, this is inconsequential to the examination of our question regarding the contribution of visual and central attention to the maintenance of information in WM.

Visual versus central attention in WM

Given that attending to representations in WM is beneficial to their retention, our next question was which kind of attention supports visual WM performance. Experiment 1 showed that the dual-task costs engendered by the visual condition were modest at best, and the BF was ambiguous, indicating that the data were equally consistent with the absence of a cost and with the presence of a small but nonzero cost. This finding meshes well with the low costs of distracting visual attention observed by Williams et al. (2013) in a color change-detection task. Our results therefore suggest that visual attention contributes little to visual WM. Conversely, the dual-task costs yielded by the central condition were substantial, with mixture modeling indicating that the probability of recalling the target dropped to about half of its baseline value. This indicates that central attention contributes to the maintenance of representations in visual WM.

Furthermore, Experiment 1b demonstrated an effect of cognitive load (Barrouillet et al., 2007; Barrouillet et al., 2011) with continuous visual information. Recall was poorer when participants had to process tones at a faster pace (T-2 trials vs. T-1 trials), consistent with the assumption that low cognitive load allows central attention to be shared between refreshing and response selection. We are aware of only one other study that varied cognitive load in a continuous color WM task (Hardman, Vergauwe, & Ricker, 2017). Together, their study and ours show that the cognitive load imposed by choice RT tasks affects the retention of continuous visual information in WM.

Experiment 2

Experiment 1 showed that a task engaging central attention disrupts maintenance of visual representations in WM, whereas a task engaging visual attention has at best a negligible effect on visual WM. However, one may wonder to what degree our brightness-change task engages visual attention. To find out, we measured performance in a classical visual-attention task, the multiple-object tracking task, or MOT for short (Cavanagh & Alvarez, 2005; Pylyshyn & Storm, 1988), combined with the requirement to perform concurrently our brightness-change task or our tone-classification task. Based on the assumption that the MOT task requires mainly visual attention, we predict that MOT performance shows a larger dual-task cost when combined with the brightness-change task than when combined with the tone-classification task. Combined with the opposite pattern of dual-task interference in Experiment 1, this would constitute a double dissociation, confirming the distinction of visual and central attention (Johnston et al., 1995) and demonstrating that visual WM and MOT rely largely on different forms of attention.

Method

Participants

Twenty-five students (M = 25 years old; eight men) of the University of Zurich took part in two 1-hour sessions in exchange for course credit or 30 CHF. One participant was excluded from the analyses due to chance-level performance in the practice blocks, leaving a sample size of n = 24.

Stimuli and tasks

The main task in this experiment was a MOT task in which participants tracked the locations of four target dots as they moved around on the screen, intermixed with identical-looking distractors. At the beginning of each trial, participants were presented with 10 static dots (0.8° × 0.8° of visual angle) scattered within an invisible square (radius = 6°) centered in the middle of the screen (see Fig. 3). The radius of this area roughly corresponds to the visual display area in Experiment 1. Six of the 10 dots were white (distractors) and four were yellow (tracking targets). This display was shown for 2 s, after which the yellow dots turned white, thereby becoming indistinguishable from the distractors. After 500 ms, the dots started moving in randomly selected directions at a constant speed. The dots bounced away from each other (when their edge-to-edge distance equaled their diameter), from the borders of the invisible square, and from the area around the fixation cross (radius = 1.33°). The dots moved for a total of 3.5 s. When the dots stood still again, the fixation cross was replaced by the mouse cursor, and participants were instructed to click on four of the dots to indicate them as the tracking targets. When participants clicked on a dot, it turned black. After participants had clicked on four dots, the correctly identified tracking targets turned yellow for 1 s, thereby providing feedback about the accuracy of the responses. Next, a blank intertrial interval of 2,000 ms elapsed before the start of the next trial.
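The constant-speed, bouncing motion described above can be sketched as a per-frame update. This simplified Python sketch is only an illustration under stated assumptions: dots are treated as points, the stated 6° "radius" of the square is read as its half-width, and dot-dot and fixation-area bounces are omitted. Function names are hypothetical.

```python
import math
import random


def random_velocities(n_dots, speed, rng):
    """Give every dot a random direction at the same constant speed (deg/frame)."""
    velocities = []
    for _ in range(n_dots):
        theta = rng.uniform(0.0, 2 * math.pi)
        velocities.append([speed * math.cos(theta), speed * math.sin(theta)])
    return velocities


def step_dots(positions, velocities, half_width):
    """Advance every dot by one frame and reflect it off the square's borders.

    Reflection flips one velocity component, so each dot keeps its speed.
    Assumes speed < 2 * half_width, so a dot cannot overshoot past the far side.
    Dot-dot and fixation-area bounces are omitted in this simplified sketch.
    """
    for pos, vel in zip(positions, velocities):
        for axis in (0, 1):
            pos[axis] += vel[axis]
            if pos[axis] > half_width:
                pos[axis] = 2 * half_width - pos[axis]
                vel[axis] = -vel[axis]
            elif pos[axis] < -half_width:
                pos[axis] = -2 * half_width - pos[axis]
                vel[axis] = -vel[axis]
```

Reflecting the position by the overshoot, rather than clamping it to the border, keeps the distance travelled per frame equal to the speed.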

Fig. 3

Flow of events in the multiple object tracking (MOT) task used in Experiment 2. In the baseline condition, the MOT task was performed alone. In the visual condition, the MOT task was combined with the brightness-change task, and hence participants had to monitor the fixation cross during the tracking period for a change in brightness. In the central condition, the MOT task was combined with the tone-classification task, and hence participants had to judge whether tones were of high or low pitch during the tracking period. (Color figure online)

Participants performed the MOT task alone (baseline) or in combination with a secondary task. The secondary tasks were the same as in Experiment 1, namely, the brightness-change task (visual condition) and the tone-classification task (central condition). One of these tasks was inserted during the tracking period of the MOT task. In the brightness-change task, the fixation cross turned light gray for 100 ms on 25% of the trials; on the remaining trials, nothing happened. Participants were instructed to press the spacebar whenever they detected a change in brightness. The change occurred at least 500 ms after the onset of the tracking period and at least 900 ms before the end of tracking, to allow sufficient time to respond. We were mainly interested in trials in which participants did not respond to a brightness change (75% of the trials), because on these trials they had to sustain visual attention to the fixation cross for the whole tracking period. In the tone-classification task, two tones were presented one after the other, and participants had to say aloud whether each tone was of high or low pitch. The first tone was presented 500 ms after the onset of the tracking period, and the second tone 1,500 ms after the first. We used a 1.5-s response window to impose only mild time pressure. Oral responses were recorded for offline accuracy checks.

Procedure

Single-task blocks.

At the beginning of each session, participants completed three single-task blocks: a MOT block, a brightness-change block, and a tone-classification block. In each block, participants were first instructed on how to respond in the task and then completed 25 trials. During the MOT block, we implemented a staircase procedure to calibrate the speed of the dots so that each participant's tracking accuracy would converge on about 75% (i.e., three out of four dots correctly identified). This ensured that the MOT task was difficult enough to fully tax visual processing capacity. Dot speed was initially 2.5 pixels (0.08° of visual angle) per screen-refresh interval. At the end of each trial, the speed for the next trial was adjusted based on performance on the current trial: If all four dots were correctly identified, speed was increased by 0.5 pixels per refresh interval; if three out of four dots were correctly identified, speed remained unchanged; if fewer than three dots were correctly identified, speed was decreased by 0.5 pixels per refresh interval. The speed reached at the end of the 25th trial was used for the test blocks. During the single-task blocks of the brightness-change and tone-classification tasks, participants practiced each of these tasks alone.
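The staircase rule can be written in a few lines. In this Python sketch, speeds are in pixels per screen-refresh interval, as in the text; the lower bound `min_speed` is our assumption, since the procedure does not say what happens if speed keeps falling, and the function names are hypothetical.

```python
def update_speed(speed, n_correct, step=0.5, min_speed=0.5):
    """One staircase step: faster after 4/4, unchanged after 3/4, slower otherwise.

    min_speed is a hypothetical floor, not part of the described procedure.
    """
    if n_correct == 4:
        return speed + step
    if n_correct == 3:
        return speed
    return max(min_speed, speed - step)


def run_staircase(outcomes, start=2.5):
    """Apply the rule after every trial; returns the speed used for the test blocks."""
    speed = start
    for n_correct in outcomes:  # number of targets correctly identified per trial
        speed = update_speed(speed, n_correct)
    return speed
```

Because the speed stays put only after exactly three correct targets, the procedure settles where 3/4 is the typical outcome, i.e., around the intended 75% accuracy.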

Test blocks.

After completing the three single-task blocks, the test phase started. The test phase was also divided into three blocks comprising different experimental conditions. In the baseline condition, the MOT task was performed without any secondary task. In the visual condition, the MOT task was combined with the brightness-change task. In the central condition, the MOT task was combined with the tone-classification task. The order of these three blocks was counterbalanced across participants using a Latin square. There were 50 trials in each condition per session, and given that participants completed two sessions, a total of 100 trials were obtained for each test condition.
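The text does not specify which Latin square was used; a cyclic square is one common choice. This Python sketch (hypothetical names) assigns participants to the three rows in rotation, so with 24 participants each block order would be used by 8 people. With an odd number of conditions, a single cyclic square balances serial position but not immediate carryover between blocks.

```python
def cyclic_latin_square(conditions):
    """Row r is the condition list rotated left by r positions."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def block_order(participant_index, conditions):
    """Assign participants to the rows of the square in rotation."""
    square = cyclic_latin_square(conditions)
    return square[participant_index % len(square)]
```

For example, `block_order(4, ["baseline", "visual", "central"])` returns `["visual", "central", "baseline"]`.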

Results

Table 2 shows the mean accuracy and 95% within-subjects confidence intervals for performance in each task (brightness change, tone classification, and MOT) across the different experimental conditions.

Table 2 Average accuracy in each task as a function of condition in Experiment 2

Distractor-task performance

As shown in Table 2, detection accuracy (hit rate minus false-alarm rate) in the brightness-change task during the single-task block was high overall. Detection accuracy in the visual (dual-task) condition was lower than in the single-task block, and a Bayesian t test comparing these two conditions provided strong evidence for this difference, BF10 = 15.97.
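Detection accuracy was scored as hit rate minus false-alarm rate. A minimal Python sketch of that score, assuming one Boolean response flag and one Boolean change flag per trial (the function name is ours):

```python
def detection_accuracy(responded, change_present):
    """Hit rate minus false-alarm rate for a yes/no change-detection task.

    responded: per-trial flags, True if the participant pressed the key.
    change_present: per-trial flags, True if a brightness change occurred.
    """
    pairs = list(zip(responded, change_present))
    hits = sum(1 for r, c in pairs if r and c)
    false_alarms = sum(1 for r, c in pairs if r and not c)
    n_signal = sum(1 for _, c in pairs if c)
    n_noise = len(pairs) - n_signal
    return hits / n_signal - false_alarms / n_noise
```

A score of 1 means every change was detected with no false alarms; a score near 0 means responding was unrelated to whether a change occurred.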

Regarding the tone-classification task, due to a programming error, only vocal responses to the first tone were recorded during the single-task block. Accuracy of these responses was overall high. During the central condition, responses to the tones were highly accurate as well for both tones (see Table 2). We compared responses to the first tone between the single-task block and the central condition using a Bayesian t test, which yielded a BF10 = 0.22, indicating that the null hypothesis that responses in these two conditions are indistinguishable should be preferred by a factor of 4.5 over the alternative hypothesis. These results show that the concurrent performance of the MOT task impaired performance of the brightness-change task but not of the tone-classification task.

MOT performance

Average tracking accuracy is shown in Table 2. The staircase procedure implemented in the single-task practice block successfully selected a dot speed that yielded performance levels close to 75% in the baseline condition. For the analysis of performance in the visual condition, we excluded all trials in which a brightness change was detected, to ensure that response selection did not influence our results; including all trials did not change the pattern of results. MOT accuracy dropped by about 4.5% in the visual condition compared to the baseline, a reduction supported by a BF10 = 19.66. MOT accuracy, however, was unaffected by the concurrent performance of the tone-classification task; if anything, tracking accuracy was slightly higher in the central condition than in the baseline. The Bayesian t test comparing these conditions yielded a BF10 = 0.54, indicating that the null hypothesis should be preferred by a factor of about 2 over the alternative hypothesis.

Discussion

The results of Experiment 2 confirm that our brightness-change task taps the ability to attend to visual objects, whereas the tone-classification task does not. The dual-task costs of the brightness-change task on MOT performance (and, conversely, of the MOT task on the brightness-change task) are in line with the assumption that both tasks require visual attention. Previous eye-tracking studies have shown that the MOT task can be carried out without fixating the tracking targets (Fehd & Seiffert, 2008, 2010; Oksama & Hyönä, 2016; Zelinsky & Neider, 2008); indeed, these studies have shown that target fixations are associated with lower MOT accuracy. Hence, the costs of our brightness-change task are unlikely to arise from a reduction in target fixations. Our results are consistent with the hypothesis that both tasks compete for visual attention (Tran & Hoffman, 2016).

The tone task did not yield a dual-task cost when combined with the MOT task. This finding is at odds with some previous reports. Alvarez, Horowitz, Arsenio, DiMase, and Wolfe (2005) combined a MOT task with a tone task requiring participants to determine whether a target tone presented randomly in a sequence of distractor tones was shorter or longer than the distractors. Responses to the MOT and tone tasks were given at the end of the trial, one after the other. Performing this concurrent auditory task reduced MOT accuracy by a magnitude similar to that produced by concurrently performing a visual search task. In the study of Allen, McGeorge, Pearson, and Milne (2006), tones were presented at a rate of one per second during the tracking period, and participants had to say aloud whether they were of high or low pitch. Compared to a condition without a secondary task, processing the tones reduced the number of targets successfully tracked. This reduction was similar to the one observed in a visual version of the task in which visually displayed digits (at fixation) were judged as high or low. In the study of Tombu and Seiffert (2008), tracking difficulty was increased (by increasing speed, proximity between objects, or both) either concurrently with or after the presentation of a tone to be judged as high or low. MOT performance was more impaired when the tone coincided with an increase in tracking difficulty than when the two events were separated in time.

There are many procedural differences between the present study and those listed above that may explain the discrepant results. We asked participants to provide an immediate response to the tones within a comfortable time window. In the Allen et al. (2006) study, participants had to respond to the tones at a faster pace, and tone accuracy was lower than observed here; posterror processing may therefore have contributed to performance in their study. Whether the response to the secondary task is immediate or delayed may also be an important factor limiting performance. In the study of Alvarez et al. (2005), responses in the tone task were delayed until after participants had reported the tracking targets, which required participants to keep their tone response in WM during the delay period. Tombu and Seiffert (2008) used a somewhat different setup, in which increases in tracking difficulty did or did not coincide with the tone task. It is possible that adjusting to increases in tracking difficulty requires central attention. Last, in our task version, objects moved at a constant speed, whereas in the studies that found an impairment of MOT by concurrent response-selection demands, speed of movement varied considerably over time. Conceivably, adjusting the tracking mechanisms of visual attention to different speed ranges involves central attention, so that variable speed makes MOT vulnerable to distraction of central attention.

In sum, other studies found that MOT performance is also impaired by distractor tasks engaging central attention. Our results imply that this is not always the case: The MOT task can be performed at maximal processing capacity concurrently with a task requiring response selection. Clearly, the MOT task can be implemented in a more complex setup than the one used here. The more complex the task, the higher the likelihood that different cognitive processes may come into play for performance of the task, and in these cases MOT performance may no longer be a pure measure of visual attention.

Taken together, Experiments 1 and 2 demonstrate a double dissociation between maintenance in visual WM on the one hand and tracking of visual objects on the other: Maintenance of colors in visual WM relies mostly on central attention, whereas MOT relies mostly on visual attention.

Experiment 3

Our conclusion that visual attention contributes little to visual WM is based on the double dissociation we established across Experiments 1 and 2: the visual task impaired MOT but not visual WM; the central task impaired visual WM but not MOT. One reviewer raised the objection that the two attention tasks were not matched with regard to the number of responses required: The visual-attention task required at most a single response, and none in the 75% of trials we used for analyzing primary-task performance. In contrast, the central-attention task required two responses, and when only a single response was required (Experiment 1b), the dual-task cost on visual WM was fairly small. Against this objection, we maintain that the number of responses required by a secondary task does not matter for its impact on visual WM as the primary task. The brightness-change task requires sustained visual attention to the fixation cross whether or not a change occurs, simply because a change could occur at any moment (see Poth et al., 2014). The tone-classification task requires central attention for as long as selecting each response takes; nevertheless, prior research on the effect of concurrent central-attention demands on verbal and spatial working memory has shown that the memory impairment caused by such secondary tasks depends not on the number of responses but on the cognitive load they impose (Barrouillet et al., 2007). To test these assumptions, we carried out Experiment 3, with the aim of answering two questions: First, does the dual-task cost of a visual-attention task on visual WM increase with the number of responses? Second, does the dual-task cost of a central-attention task increase with the number of responses when the number of responses is deconfounded from cognitive load?

We modified the dual-task conditions implemented in Experiment 1 in the following ways. First, in the visual-attention distractor task, we asked participants to detect between zero and four brightness changes in the fixation cross. The number of changes varied randomly across trials, such that participants could not anticipate when and how many changes would occur. This allowed us to control more strictly that visual attention was indeed sustained on the fixation cross throughout the RI, and to assess whether substantially increasing the processing requirements of the visual task would lead to a cost for visual WM. Although the brightness-change task with responses is a less pure visual-attention task than the 75% of brightness-change trials without responses that we focused on in Experiments 1 and 2, these responses do not involve response selection and therefore should not impose a substantial demand on central attention (Barrouillet et al., 2007; Pashler, 1994).

Second, in the central-attention distractor task, we presented zero, one, or two tones to be processed. In Experiment 1b, we observed that processing one tone did not substantially impair visual WM performance, and we reasoned that this was due to the lower cognitive load on those trials (which allowed participants to process the tone and still have free time to refocus attention on information in WM). Experiment 3 aimed to deconfound the number of tone classifications from cognitive load, and therefore we kept cognitive load constant between the one-tone and two-tone conditions. If cognitive load, rather than the number of responses, drives the effect of the central-attention distractor task on WM, then one-tone and two-tone trials should yield similar performance. Furthermore, we included a zero-tone condition to assess dual-task costs engendered by simply preparing for a secondary task. If there is some general but small dual-task cost of preparing for another task during the RI, the zero-tone condition should yield a mild cost to WM performance. This cost may be similar to the mild cost observed for the visual condition in Experiment 1.

Method

Participants

Twenty-four students (M = 24.3 years old; three men) of the University of Zurich took part in two 1-hour sessions in exchange for course credit or 30 CHF. None of the participants took part in Experiment 1.

Stimuli and procedure

Participants completed the same continuous color delayed-estimation task used in Experiment 1. There were three experimental conditions: baseline, visual, and central.

Baseline condition.

The baseline condition was exactly as described in Experiment 1, with one exception: The duration of the RI was increased to 3 s. Participants completed a block with 30 trials of the baseline condition in each session (yielding a total of 60 baseline trials).

Visual conditions.

In this condition, participants completed the brightness-change distractor task during the RI of the visual WM task. As in Experiment 1, this task required the detection of a subtle change in brightness of the fixation cross. Whenever a change was detected (signal), participants had to press the spacebar. We varied the number of signals that occurred within the RI: zero, one, two, three, or four. Changes were scheduled to occur at least 500 ms after the offset of the memory array and at least 500 ms before the end of the RI. In Experiment 1, mean RT to the brightness change was between 300 and 400 ms; hence, we scheduled signals to appear at least 500 ms apart from each other to allow sufficient time for responding. There were 200 trials in this condition. The number of changes varied across trials, with the only constraint that trials were evenly split across the change conditions. The zero-change condition is equivalent to the visual condition in Experiment 1: It required sustained attention to the fixation cross with no demands on response execution. The one- to four-change conditions required participants to respond to changes throughout the RI, and hence served as a tight control that sustained visual attention was indeed directed to the fixation cross during the whole RI. Before entering the dual-task condition, participants also completed a single-task block (40 trials) with the brightness-change task. In this block, zero to four brightness changes had to be detected within a 3-s period. This block served as initial practice (first 10 trials) and to gauge the single-task level of performance in this task (remaining 30 trials).
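The scheduling constraints above (onsets between 500 ms and 2,500 ms of the 3-s RI, at least 500 ms apart, 200 trials evenly split across zero to four changes) can be met without trial-and-error resampling. This Python sketch is our own construction: it assumes onsets are drawn uniformly over all valid schedules, which the text does not specify, and the function names are hypothetical.

```python
import random


def schedule_changes(n_changes, rng, ri_ms=3000, margin_ms=500, min_gap_ms=500):
    """Return sorted onsets (ms after memory-array offset) for n brightness changes.

    Onsets fall in [margin_ms, ri_ms - margin_ms] and are >= min_gap_ms apart.
    Spacing trick: draw n points in a window shrunk by the mandatory gaps, sort
    them, then shift the k-th point right by k * min_gap_ms. This is uniform over
    all valid schedules and never needs rejection sampling.
    """
    lo, hi = margin_ms, ri_ms - margin_ms
    slack = (hi - lo) - max(n_changes - 1, 0) * min_gap_ms
    if slack < 0:
        raise ValueError("too many changes for the available window")
    base = sorted(rng.uniform(0, slack) for _ in range(n_changes))
    return [lo + b + k * min_gap_ms for k, b in enumerate(base)]


def balanced_change_counts(rng, n_trials=200, levels=(0, 1, 2, 3, 4)):
    """Shuffle an equal number of trials per change count (40 each for 200 trials)."""
    if n_trials % len(levels):
        raise ValueError("trials cannot be split evenly across levels")
    counts = [level for level in levels for _ in range(n_trials // len(levels))]
    rng.shuffle(counts)
    return counts
```

With four changes the shrunken window is only 2,000 − 3 × 500 = 500 ms wide, which shows how close the four-change condition comes to the limit of the 3-s RI.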

Central conditions.

In these conditions, the tone task was completed during the RI of the visual WM task. The tones were as described in Experiment 1. We varied, in an unpredictable fashion, the number of tones presented during the RI: zero, one, or two tones. There were 198 trials, which were evenly split across conditions.

When tones were scheduled to occur in the trial, the first tone was presented 500 ms after the offset of the memory array. In Experiment 1b, a 1,000-ms window was insufficient for responding to the tones in a substantial number of trials. Hence, in Experiment 3, we allowed 1,250 ms for responding to each tone. Furthermore, in Experiment 1b, we observed that processing a single tone within the RI did not impair performance compared to the baseline, potentially because the cognitive load on one-tone trials was lower than on two-tone trials. In Experiment 3, we kept the cognitive load constant across one-tone and two-tone trials. In the one-tone condition, participants were allowed 1,250 ms to respond to the tone, after which the RI ended with the presentation of the test array (total RI = 1,750 ms). In the two-tone condition, a second tone followed the response deadline of the first tone, and another 1,250-ms response window was inserted before the onset of the test array (total RI = 3 s). By varying the number of tones and the total length of the RI together, we kept the cognitive load constant across trials with one and two tones. If cognitive load drives the impairment in WM performance, one-tone and two-tone trials in Experiment 3 should yield the same level of performance. If instead the number of tones to be processed is what matters, one-tone trials should yield better performance than two-tone trials. Last, in the zero-tone condition, no tone occurred during the whole 3-s RI. This allowed us to measure the pure cost of preparing for the tone task.
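The timing of the three central conditions can be summarized in a short sketch. It assumes, as described above, that the first tone starts 500 ms after memory-array offset and that a second tone starts exactly at the first tone's 1,250-ms response deadline; `tone_trial_timeline` is a hypothetical helper for illustration. It makes concrete that the response window per tone is identical in the one-tone and two-tone conditions, while the total RI differs.

```python
TONE_OFFSET_MS = 500        # first tone starts 500 ms after memory-array offset
RESPONSE_WINDOW_MS = 1250   # response window allotted to each tone
FULL_RI_MS = 3000           # RI used on zero-tone trials

def tone_trial_timeline(n_tones):
    """Hypothetical helper: (tone onsets in ms after memory-array offset, total RI in ms)."""
    if n_tones == 0:
        return [], FULL_RI_MS  # no tone, but the full 3-s RI
    onsets = [TONE_OFFSET_MS + i * RESPONSE_WINDOW_MS for i in range(n_tones)]
    total_ri = TONE_OFFSET_MS + n_tones * RESPONSE_WINDOW_MS
    return onsets, total_ri

for n in (0, 1, 2):
    print(n, *tone_trial_timeline(n))
# 0 [] 3000
# 1 [500] 1750
# 2 [500, 1750] 3000
```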

Participants also completed a single-task block with the tone task (40 trials) prior to the start of the dual-task condition. In this single-task baseline, participants responded to one or two tones within the same time constraints as in the dual-task condition. This block served as practice with the tone task (first 10 trials) and to gauge the accuracy of the tone task when carried out on its own (remaining 30 trials).

Participants completed one block of baseline trials and one block of the visual condition in one session, and one block of baseline trials and one block of the central condition in another session. The order of the blocks within each session (e.g., baseline-visual or visual-baseline), as well as the order of the conditions across sessions, was fully counterbalanced across participants.

Results

Distractor-task performance

For the visual-attention distractor task, we computed accuracy in each trial as hits minus false alarms, divided by the number of brightness changes (signals). When no changes occurred, we computed accuracy as one minus the number of false alarms. Finally, we set the lower bound on accuracy in each trial to zero (negative values occurred when participants made more false alarms than hits in a trial). Figure 4a shows the average accuracy as a function of the number of brightness changes in the single-task block (excluding the first 10 trials as practice) and in the dual-task block (i.e., the visual condition). Accuracy decreased as the number of signals per trial increased in both the single-task and the dual-task blocks (BF10 = 3.8 × 10^9), and accuracy was overall lower in the dual-task block (BF10 = 854.8). There was modest evidence against an interaction between these factors (BF10 = 0.3).
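The per-trial scoring rule for the brightness-change task can be written as a small function. This is a sketch of the rule as stated above, assuming hits and false alarms are counted per trial; the function name is hypothetical.

```python
def brightness_accuracy(hits, false_alarms, n_signals):
    """Hypothetical helper: per-trial accuracy in the brightness-change task."""
    if n_signals == 0:
        score = 1.0 - false_alarms                 # no change: one minus false alarms
    else:
        score = (hits - false_alarms) / n_signals  # hits minus false alarms, per signal
    return max(score, 0.0)                         # floor negative scores at zero
```

For example, three hits and one false alarm on a four-change trial give (3 - 1) / 4 = 0.5, and any trial with more false alarms than hits scores 0.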

Fig. 4

Results of Experiment 3. a Accuracy in the brightness-change task in the single-task and dual-task blocks (i.e., visual condition). b Accuracy in the tone task in the single-task and dual-task blocks (i.e., central condition). c Recall error across experimental conditions. Error bars show 95% within-subjects confidence intervals. (Color figure online)

For the central-attention distractor task, we computed accuracy in each trial as the proportion of correct responses. Figure 4b shows the accuracy in discriminating the presented tones in the single-task block (excluding the first 10 trials as practice) and in the dual-task block (i.e., central condition). Responses were overall highly accurate. There was modest evidence against an effect of number of tones (BF10 = 0.27), ambiguous evidence concerning the main effect of block (BF10 = 0.73), and modest evidence against their interaction (BF10 = 0.32).

In sum, this analysis showed that the visual-attention distractor task became more demanding as the number of brightness changes increased, and that it was harder to perform under the dual-task condition. In contrast, the difficulty of the central-attention task did not increase with the number of tones presented.

Recall error

Our first analysis focused on the raw recall error. For the visual and central conditions, only trials in which the distractor task was performed correctly were considered for analysis (which led to the removal of 15.5% of the available trials). This selection was particularly influential for the visual condition, in which accuracy was lower. At the same time, it guaranteed that the retained trials were the ones in which attention was indeed engaged in the distractor tasks. Figure 4c shows the average recall error as a function of condition (baseline, visual, and central) and of the levels within each condition (number of cross changes or number of tones).
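The recall-error measure and the trial-selection rule can be sketched as follows. The sketch assumes that colors are coded as angles (in degrees) on a 360° color wheel and that recall error is the absolute circular distance between the reported and the studied color, as is usual in continuous delayed-estimation tasks; Experiment 1's Method, which defines the task, is not reproduced here. The trial fields (`condition`, `distractor_accuracy`, `response`, `target`) and all function names are hypothetical, and a dual-task trial is kept only when its distractor accuracy equals 1.

```python
def recall_error(response_deg, target_deg):
    """Absolute circular distance (0 to 180 deg) between reported and studied color."""
    diff = (response_deg - target_deg) % 360
    return min(diff, 360 - diff)

def keep_for_analysis(trial):
    """Hypothetical filter: keep baseline trials, and dual-task trials only when
    the distractor task was performed perfectly (accuracy == 1)."""
    return trial["condition"] == "baseline" or trial["distractor_accuracy"] == 1.0

def mean_recall_error(trials):
    """Return (mean recall error of kept trials, proportion of trials removed)."""
    kept = [t for t in trials if keep_for_analysis(t)]
    removed = 1 - len(kept) / len(trials)
    errors = [recall_error(t["response"], t["target"]) for t in kept]
    return sum(errors) / len(errors), removed
```

Taking the shorter way around the wheel means a response at 350° to a target at 10° counts as a 20° error, not 340°.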

Our main aim was to assess dual-task costs as a function of the processing requirements of the distractor tasks. To test for visual-attention costs, we compared performance across the baseline and the visual condition, using condition and number of changes as factors in a Bayesian mixed-effects model. Given the nonorthogonal design, we did not allow for an interaction between these factors. There was substantial evidence against a main effect of number of changes (BF10 = 0.12) and ambiguous evidence concerning the main effect of condition (BF10 = 1.1). Recall error was slightly higher in the visual condition, replicating the mild general cost of the visual condition observed in Experiment 1, but the evidence for this effect remained inconclusive. At the same time, Experiment 3 shows that the number of visual changes processed during the RI does not matter: Visual WM performance stayed the same.

To test for central-attention costs, we compared performance across the baseline and central conditions, with condition and number of tones as factors (again, no interaction was included). There was moderate evidence for a main effect of number of tones (BF10 = 3.64) and weak evidence for a main effect of condition (BF10 = 2.33). Visual inspection of Fig. 4c suggests that the effect of number of tones is due to differences between no tones (zero) and tones (one and two), whereas one and two tones yielded comparable levels of performance, as expected if these two conditions impose the same level of cognitive load. To assess this possibility, we compared conditions with Bayesian t tests. The one-tone and two-tone conditions did not differ from each other (BF10 = 0.28). There was ambiguous evidence for a general dual-task cost, assessed by comparing the zero-tone condition to the baseline (BF10 = 1.43; see Footnote 2).
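To make the reported Bayes factors more concrete, the following sketch computes BF10 for a paired (one-sample) t test under a JZS prior, that is, a Cauchy prior on standardized effect size with scale r, written as a mixture of normals over a variance parameter g. The section does not specify the prior the authors used; the scale r = √2/2 is an assumption (it is a widely used default). The function `jzs_bf10` is hypothetical and integrates numerically over log g with Simpson's rule using only the standard library, so its values are approximations.

```python
import math

def jzs_bf10(t, n, r=math.sqrt(2) / 2, grid_points=4001):
    """Approximate BF10 for a one-sample/paired t test with n observations.

    H1: effect size ~ Cauchy(0, r), written as N(0, g) with g ~ InvGamma(1/2, r^2/2).
    The integral over g is evaluated on u = log(g) with Simpson's rule.
    """
    nu = n - 1

    def log_integrand(u):  # includes the Jacobian dg = g du
        g = math.exp(u)
        return (-0.5 * math.log1p(n * g)
                - (nu + 1) / 2 * math.log1p(t * t / ((1 + n * g) * nu))
                + math.log(r) - 0.5 * math.log(2 * math.pi)
                - 0.5 * u - r * r / (2 * g))

    lo, hi = -30.0, 30.0
    h = (hi - lo) / (grid_points - 1)  # grid_points must be odd for Simpson's rule
    total = 0.0
    for i in range(grid_points):
        weight = 1 if i in (0, grid_points - 1) else (4 if i % 2 else 2)
        total += weight * math.exp(log_integrand(lo + i * h))
    marginal_h1 = total * h / 3
    marginal_h0 = (1 + t * t / nu) ** (-(nu + 1) / 2)
    return marginal_h1 / marginal_h0
```

For a paired comparison such as one-tone vs. two-tone, t and n would come from the per-participant difference scores; BF10 < 1 favors the null hypothesis, as in the reported BF10 = 0.28.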

Mixture modeling

We also submitted the data of Experiment 3 to mixture modeling. The full results can be found in the Online Supplementary Materials; for brevity, we summarize the main findings here. Both dual-task conditions were associated with lower precision than the baseline (main effect of condition), irrespective of the number of stimuli processed in these conditions (i.e., no effect of number of brightness changes or number of tones). Only the central condition was associated with a lower probability of recalling the test item. This was due to a reduction in the probability of recall from 0.43 in the baseline to about 0.29 in the one-tone and two-tone conditions (which did not differ from each other).
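The mixture model summarized above can be illustrated with a minimal maximum-likelihood fit. The sketch assumes the standard two-component model of W. Zhang and Luck (2008): With probability Pm, a recall error (in radians) comes from a von Mises distribution centered on the target with concentration κ; otherwise it comes from a uniform guessing distribution. The fit uses a coarse grid search for transparency; the Online Supplementary Materials may use a different model variant or estimation procedure, and `fit_mixture` and `log_i0` are hypothetical helpers. A higher κ means higher precision, that is, a smaller σ.

```python
import math
import random

def log_i0(kappa):
    """log of the modified Bessel function I0(kappa), summed from its power series."""
    term = total = 1.0
    half_sq = (kappa / 2.0) ** 2
    m = 0
    while term > 1e-16 * total:
        m += 1
        term *= half_sq / (m * m)  # ratio of successive series terms
        total += term
    return math.log(total)

def fit_mixture(errors_rad, pm_grid=None, kappa_grid=None):
    """Grid-search ML fit of p(e) = Pm * VonMises(e; 0, kappa) + (1 - Pm) / (2*pi).

    Returns the (Pm, kappa) pair with the highest log-likelihood on the grid.
    """
    if pm_grid is None:
        pm_grid = [i / 50 for i in range(1, 50)]            # 0.02 ... 0.98
    if kappa_grid is None:
        kappa_grid = [0.5 * 1.12 ** i for i in range(40)]   # about 0.5 ... 41
    cosines = [math.cos(e) for e in errors_rad]
    uniform = 1.0 / (2 * math.pi)
    best_ll, best = -math.inf, (None, None)
    for kappa in kappa_grid:
        log_norm = math.log(2 * math.pi) + log_i0(kappa)
        vm = [math.exp(kappa * c - log_norm) for c in cosines]  # von Mises densities
        for pm in pm_grid:
            ll = sum(math.log(pm * d + (1 - pm) * uniform) for d in vm)
            if ll > best_ll:
                best_ll, best = ll, (pm, kappa)
    return best
```

Fitting simulated errors with known Pm and κ (e.g., drawn with `random.vonmisesvariate`) is a quick way to check that the grid is fine enough to recover the parameters.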

Discussion

Experiment 3 yielded two main findings. First, we again observed that distraction of visual attention had, at best, only a mild cost for visual WM performance. In Experiment 3, we tested the possibility that the visual task would impair visual WM more if participants had to process more stimuli and make more responses. This was not the case: Visual WM performance was not related to the number of visual stimuli processed during the RI in the visual condition. This null effect does not reflect a visual task that was too easy: Detection accuracy decreased monotonically with the number of changes. Nor can it be explained by participants prioritizing the visual WM task at the expense of the visual distractor task, because we retained for analysis only trials in which participants' accuracy in the visual-attention distractor task was 100%. Hence, any trials in which participants may have deprioritized the visual task were excluded.

There is one caveat, though: Performance in the visual task did suffer from dual-task costs. One may wonder whether this indicates that detecting a brightness change and maintaining information in visual WM do compete for visual attention to some extent. This is one possibility. An alternative possibility is that maintenance of colors in visual WM interfered with making fast responses in the brightness-change task. In Experiment 3, the brightness-change task involved two demands: keeping visual attention sustained on the fixation cross and making a speeded response whenever a change occurred. Only the latter demand increased with the number of brightness changes. The dual-task cost of visual WM maintenance on brightness-change performance was not observed in Experiment 1, but emerged in the present experiment, in which participants had to make up to four speeded responses per trial. Therefore, it could be response execution rather than visual attention that is impaired by the concurrent maintenance of colors in WM.

We cannot rule out that visual attention contributes to some degree to visual WM, but we note that the results of Experiments 1 and 2 show that this contribution is smaller than for a task clearly tapping visual attention (i.e., MOT). In Experiment 2, concurrent performance of the brightness task and MOT yielded costs for both tasks (see Table 2). In contrast, in Experiment 1 the brightness task with the same processing requirements did not yield measurable costs for either task (see Table 1, Experiment 1b). In Experiment 3, we pushed the processing requirements of the brightness-change task to the limit; only then did we observe dual-task costs in this visual-attention task, and still no cost for visual WM. Hence, altogether, the pattern emerging from the three experiments is that visual attention plays at best a minor role for visual WM.

The second main finding of Experiment 3 is that, once again, distraction of central attention impaired visual WM performance. When analyzed with the mixture model, distracting central attention mainly affected the parameter reflecting the probability of recalling the test item. Across Experiments 1 and 3, the central-attention condition was associated with a drop of about 40% to 50% in PM compared to the baseline. Experiment 3 confirmed that this impairment was not a function of the number of tones to be processed, or of the duration of the RI, when cognitive load was held constant. Hence, Experiments 1b and 3 taken together confirm that visual WM performance depends on the cognitive load imposed by the central-attention distractor task.

The comparison of the baseline to the zero-change and zero-tone conditions suggests that maintaining the task set for a secondary task might entail some performance impairment, consistent with a general, but mild, dual-task cost. Mixture modeling suggested that this cost affected the precision of the representations in WM: The visual and central conditions were both associated with a larger σ than the baseline, with similar values in the two conditions. As we had no theoretically motivated prediction for this comparison, and the evidence for the cost of maintaining a task set was far from compelling, we refrain from speculating about an explanation of this effect.

General discussion

The present study was concerned with the role of attention during maintenance of visual information in WM. We showed that attending to representations in visual WM improves recall. This attentional effect, which we termed a refreshing benefit, is a function of how often an item was refreshed. Given that attention benefits WM maintenance, diverting attention away should be costly. Here we investigated the putative role of two kinds of attention in continuous visual WM: visual attention and central attention. By examining the pattern of dual-task costs produced by distractor tasks engaging these two forms of attention, we established a double dissociation: Visual WM draws on central attention, whereas MOT depends on visual attention.

Visual attention and central attention in WM

Many theoretical views assume a crucial role of attention in WM. Here we investigated the putative role of two forms of attention for visual WM: visual attention and central attention.

Visual attention.

Many authors have assumed that sustained visual attention is critical for the maintenance of visual information in WM (Awh et al., 2006; Chun, 2011; Gazzaley & Nobre, 2012; Kiyonaga & Egner, 2012; Olivers, 2008; Theeuwes et al., 2009). Nevertheless, there is growing evidence that visual WM and visual attention are dissociated. Participants can attend to visual representations while at the same time keeping other visual information in WM with little cost (Hollingworth, 2004; Maxcey-Richard & Hollingworth, 2013). Moreover, one visual WM object can be prioritized without sustained visual attention being directed to it (Hollingworth & Maxcey-Richard, 2013; Rerko et al., 2014). A corresponding dissociation has also been reported for spatial WM and spatial attention (Belopolsky & Theeuwes, 2009). Here, we showed that maintenance of visual representations in WM is hardly affected when a secondary task prevents participants from moving visuospatial attention to the locations previously occupied by memory items, thereby preventing visuospatial rehearsal mechanisms from being used (Williams et al., 2013).

The observation that visual attention is not needed to maintain representations in visual WM is in line with the findings of H. Zhang, Xuan, Fu, and Pylyshyn (2010). In their study, performance in a visual WM task was not impaired by concurrent performance of an MOT task, unless spatial representations were encoded in visual WM. Only spatial WM representations were susceptible to the interference produced by tracking objects across different spatial locations in the visual field (see also Fougnie & Marois, 2006). The interference yielded by the MOT task in these studies therefore arises not from the general requirement to sustain visual attention on other representations during the RI, but from the fact that doing so promotes the encoding of irrelevant, interfering information.

The results of Experiment 2 may be informative for theories of visual attention and its role in the MOT task. Some authors have proposed that limitations in the MOT task arise solely from spatial interference between the tracked targets: Targets inhibit each other when they come close together, which happens more often the larger the number of targets (Franconeri et al., 2013). According to this view, MOT performance should be constrained solely by how close the targets come to each other. Other authors have proposed that tracking capacity is limited by a resource for processing or analyzing incoming information. According to this view, tracking performance should also be impaired by additional nonspatial demands on that resource. Experiment 2 can be seen as providing one such nonspatial demand: Participants had to track moving targets while monitoring the fixation cross, a location to which targets never came close. Nevertheless, tracking was impaired by the requirement to monitor the fixation cross. This dual-task cost is consistent with the predictions of resource theories of visual attention (Alvarez & Franconeri, 2007; Franconeri et al., 2013).

Central attention.

Experiments 1 and 3 showed that central attention contributes to the maintenance of continuous information in visual WM: When central attention is diverted away during the RI, memory suffers. This effect was observed even though our visual WM task and our central-attention task had little representational overlap, which limited the opportunity for interference between their contents. Moreover, we observed that the degree of memory impairment depended on the cognitive load imposed by the central task. When participants had less time to process each tone (as in the two-tone vs. one-tone contrast of Experiment 1b), their memory impairment was more severe (see also Hardman et al., 2017), and this effect was not due to the number of tones to be processed during the RI (see Experiment 3). These findings are in line with the cognitive-load effect observed in complex span tasks with both visuospatial and verbal materials (Barrouillet & Camos, 2012; Vergauwe et al., 2009, 2010; Vergauwe, Camos, et al., 2014), and also in a change-detection task with visual materials (Vergauwe, Langerock, et al., 2014). Jointly, these results suggest that central attention plays a role in the maintenance of different forms of representations in WM, irrespective of whether these representations are visual or verbal.

What does central attention contribute to WM? The simplest explanation is that it enables refreshing. This explanation also dovetails with the observation that refreshing and distraction of central attention modulate the same parameter of the mixture model: Refreshing increases the probability of recalling the refreshed information from WM, whereas distraction of central attention reduces it. Refreshing could benefit maintenance for several reasons. It could protect memory representations from decay (Barrouillet et al., 2011; Pertzov, Bays, Joseph, & Husain, 2013). Alternatively, it could strengthen the refreshed memory contents, or their bindings to their locations, thereby reducing the chance of retrieving the wrong item from WM and protecting the refreshed items from visual interference at test (Rerko & Oberauer, 2013; Souza, Rerko, & Oberauer, 2016).

Whereas this account is simple and in line with previous research, we cannot rule out a more complex scenario in which the beneficial effect of guided refreshing and the detrimental effect of distracting central attention are independent of each other. In such a scenario, refreshing does not rely on central attention, and distracting central attention does not impair refreshing. This scenario remains a possibility, but we find it unattractive because it lacks parsimony and it demands some implausible assumptions: First, because refreshing is assumed not to rely on central attention, and visual attention demonstrably plays little, if any, role in WM maintenance, refreshing would have to rely on yet another form of attention, and the literature on attention does not offer a plausible candidate for that role. Second, if a secondary task demanding central attention does not disrupt refreshing, we would have to find another reason why it impairs WM. One possibility is that our tone task introduces representations into WM that interfere with the representations of the to-be-remembered colors (Oberauer, Farrell, Jarrold, Pasiecznik, & Greaves, 2012). Because we designed the tone task to minimize any involvement of visual or spatial information, this possibility is not plausible. Another possibility is that central attention is a domain-general limited resource (Navon & Miller, 2002; Tombu & Jolicœur, 2003), and that this resource is also needed for maintaining precise visual representations in WM (Bays & Husain, 2008; Cowan, 1999; Luck & Vogel, 2013; Ma, Husain, & Bays, 2014; W. Zhang & Luck, 2008). This account faces the challenge of explaining the refreshing benefit. Refreshing could be assumed to shift resources to the refreshed item, taking them away from all other items. However, we have not observed costs of refreshing for not-refreshed items (Souza et al., 2015).
In light of these considerations, we argue that the simplest and most convincing explanation of the present results is that refreshing requires central attention, and a secondary task engaging central attention impairs visual WM by disrupting refreshing.

Conclusion

In sum, the present study showed that visual WM improves when attention is guided to individual representations in WM (thereby refreshing them). Most likely this effect relies on central but not visual attention: Distracting central attention, but not visual attention, during the retention interval impairs visual WM. The reverse pattern was observed for a task engaging visual attention, namely the MOT task: Performance in this task suffers when visual attention, but not central attention, is engaged elsewhere. These findings provide a double dissociation between visual WM, on the one hand, and attention to currently perceived visual stimuli, on the other hand.