Introduction

Visual working memory (VWM) temporarily stores and manipulates a limited set of visual information (Beukers et al., 2021; Lorenc et al., 2021; Luck & Vogel, 1997; Oberauer et al., 2018) and is critical for both low- and high-level cognitive activities (e.g., Cowan, 2017; Hollingworth et al., 2008; Honig et al., 2020; for reviews, see Luck & Vogel, 2013). Various studies have focused on the storage of VWM, including VWM capacity (e.g., Adam et al., 2017; Cowan, 2001; Luck & Vogel, 1997), representation resolution (e.g., Bays & Husain, 2008; for reviews, see Ma et al., 2014; van den Berg & Ma, 2018; Zhang & Luck, 2008), the interaction between VWM and attention (Bocincova & Johnson, 2019; Serences et al., 2009; Woodman & Vogel, 2008), and the format of representations (e.g., Fougnie et al., 2013; Schneegans & Bays, 2017; Schneegans et al., 2022; Treisman & Zhang, 2006). Recently, with the emphasis on its “working” nature, VWM’s manipulation function has been the focus. Examples include directed forgetting (removing specified representations in VWM; e.g., Dames & Oberauer, 2022; Festini & Reuter-Lorenz, 2013; Lintz & Johnson, 2021) and memory updating (replacing certain representations in VWM with new ones; e.g., Ecker et al., 2010; Kim et al., 2020; Shan & Postle, 2022). In the present study, we concentrated on another aspect reflecting the working nature of VWM—that is, the selective maintenance of VWM, which refers to selectively maintaining certain representations in VWM while ignoring the others. Although this function has been investigated (e.g., Gunseli et al., 2015; van Moorselaar et al., 2015; Williams & Woodman, 2012), its mechanism remains to be elucidated.

Previous studies on VWM selective maintenance have primarily focused on objects displayed at distinct locations (e.g., Dube et al., 2019; Gözenman et al., 2014; Griffin & Nobre, 2003; Gunseli et al., 2015; Makovski et al., 2008; Maxcey-Richard & Hollingworth, 2013; Souza et al., 2014; but see Park et al., 2017; Sasin & Fougnie, 2020), in which retro-cueing tasks were commonly employed. Participants initially memorized a set of items and then selectively maintained a subset of them according to a cue presented after the offset of the memory array (e.g., Pertzov et al., 2013). These studies found that the fidelity of retained representations increased and that the probability of recalling the task-irrelevant items dropped dramatically (Gunseli et al., 2015; van Moorselaar et al., 2015). Electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) studies have suggested that task-irrelevant information may be in a neurally silent state after the retro-cue, while the cued information is in an active state of VWM (Lewis-Peacock et al., 2012; Rose et al., 2016; Wolff et al., 2017). Alternatively, task-irrelevant and task-relevant information may both be represented actively, but with opposite neural codes (Yu et al., 2020).

However, in some cases in everyday life, the specific content of an item is selectively maintained. Examples include remembering the contour of a car while ignoring its color. Park et al. (2017) partially explored the selective maintenance of an object’s feature in VWM. They instructed participants to remember a Gabor’s color and orientation. Then, a retro-cue indicated which feature should be reproduced later with 80% validity. The results showed that the stored object representations could be unbound as individual features so that active maintenance could focus on the task-relevant features. This finding aligns with the accumulating evidence suggesting that objects can be stored as unbound individual features rather than as bound units (e.g., Hardman & Cowan, 2015; Wang et al., 2017). Nevertheless, because Park et al. (2017) required participants to partially retain irrelevant features to fulfill the task, whether VWM can selectively maintain an object’s feature while ignoring the others (i.e., selective maintenance of an object’s feature) remains unknown. In addition, Sasin and Fougnie (2020) investigated the selective maintenance of an object’s feature by adopting the retro-cue paradigm combined with a visual search task. They observed that the color of a dual-feature object in VWM can retain one’s attention even after becoming task-irrelevant, implying that VWM cannot selectively maintain an object’s single feature while ignoring its color. In their study, the search display comprised dual-feature objects containing both task-irrelevant and task-relevant feature dimensions. Consequently, the performance difference between conditions with and without the task-irrelevant feature presented as a distractor in the search display may not solely be attributed to the task-irrelevant feature because different feature values of search distractors on the task-relevant feature dimension may possess varying priorities in capturing attention. 
Therefore, a more plausible approach would be to utilize features specifically from the task-irrelevant feature dimension, rather than using dual-feature objects as the search stimuli. This would allow for a purer assessment of the interference effect caused by the task-irrelevant feature and aid the investigation regarding the existence of selective maintenance.

In light of the interactive model of perception and VWM (T. Gao et al., 2011; Gao & Bentin, 2011; Gao et al., 2010), which posits that VWM dynamically engages in different stages of perception, we argue that the selective maintenance of an object’s feature in VWM is determined by the stage of feature processing during perception. Traditional theories of visual perception have characterized it as comprising two processing stages (Neisser, 1967): The first consists of parallel preattentive processing, which enables quick detection of distinctive features, and the second is attentive processing, which binds multiple features from the same object and recognizes its detailed information (Treisman & Gelade, 1980; Wolfe, 1994). By applying these principles to object-based storage (Adam et al., 2017; Luck & Vogel, 1997; Zhang & Luck, 2008), the interactive model of perception and VWM proposes that object-based storage in VWM is not the final product of perception. Instead, it originates from the “preattentive objects” (Wolfe & Bennett, 1997) or “proto-objects” (Rensink, 2000) created by parallel perception. Once these proto-objects are selected for storage in VWM, online perception progresses to the next stage. During this stage, focal attention gradually integrates detailed information into the proto-objects until final coherent object representations are constructed (Ullman, 1984).

Based on this model, the selection, consolidation, and maintenance of objects in VWM are affected by the stages of perception in which the features are extracted (Gao et al., 2011; Z. Gao & Bentin, 2011; Gao et al., 2010). To represent information extracted at different stages (parallel vs. serial) of perception, two kinds of features were chosen: highly discriminable and fine-grained features. Treisman’s feature integration theory posits that highly discriminable features (e.g., colors and shapes) can be identified via parallel processing, while the processing of fine-grained features of objects (e.g., random polygons) necessitates attentive processing to bind features together (Treisman & Gelade, 1980; for a review, see Wolfe, 2003). According to the interactive model, the conjunctions of simple features are automatically extracted as object-based units in VWM because they require only preattentive processing, after which they are represented as object-based storage. Consequently, they are selected, consolidated, and maintained as integrated objects in VWM. Conversely, the conjunctions of complex features cannot be represented as part of an integrated object by the end of parallel perceptual processes. Instead, they require further attentive processing, during which focal attention is needed to incorporate detailed information into object representations in VWM. Therefore, complex features cannot be selected automatically for storage in VWM when they are task-irrelevant. The efficiency of consolidating complex features should be lower compared with that of simple features due to the additional time required for further perceptual processing. Furthermore, the conjunctions of complex features may exhibit instability during maintenance in VWM. This model has received support from behavioral, electroencephalographic, and clinical studies (Z. Gao et al., 2010, 2013; Yin et al., 2012; Zhao et al., 2018).

Inspired by the interactive model of perception and VWM, we hypothesized that distinct mechanisms underlie the selective maintenance of an object’s feature in VWM, depending on the perceptual processing nature of the features. Specifically, given that the conjunctions of simple features are automatically encoded as objects in VWM, they may be resistant to selective maintenance manipulations. In contrast, for objects containing fine-grained features, focal attention is required to further integrate detailed information into object representations, and this process is gradual and susceptible to interference. Therefore, under top-down demands, objects can be unbound so that task-irrelevant features are removed from the active state in VWM.

We tested this hypothesis by adopting a retro-cue paradigm combined with a visual search task (Sasin & Fougnie, 2020). Colored shapes were used as the memorized stimuli. A colored shape was first memorized, followed by a retro-cue informing participants to selectively retain one feature and ignore the other (color or shape). The stimuli could contain two highly discriminable features (colored simple shapes; Experiment 1), or one highly discriminable feature and a fine-grained feature (colored polygons; Experiments 2–4; e.g., Z. Gao et al., 2009). Participants then had to complete a visual search task. The search display was composed of features for which the dimensions were the same as those of the irrelevant features. The key manipulation was whether the ignored feature appeared as a distractor in the search display. Evidence suggests that VWM content can automatically guide attention (e.g., Bahle et al., 2018; Hollingworth et al., 2013; Ort et al., 2017). Gao et al. (2016) demonstrated that this phenomenon could be used to probe whether a task-irrelevant feature was in the active state of VWM. If the selective maintenance of an object’s feature does not hold (i.e., the irrelevant feature is not removed from the active state of VWM), the irrelevant feature will be actively maintained in VWM regardless of the retro-cue. The irrelevant feature will capture attention as a distractor in the visual search task, impairing one’s search performance. If selective maintenance of an object’s feature occurs, the ignored feature will be removed from the active state of VWM after the retro-cue. As a result, the irrelevant feature will not affect performance in the visual search task. We utilized this paradigm to examine whether individuals could effectively remove irrelevant features of an object, thereby controlling which features were selectively stored in VWM. 
Hence, we assessed the effectiveness of selective maintenance by examining the fate of unselected information in VWM, rather than directly investigating the selection of information in VWM.

Experiment 1: Selective maintenance of highly discriminable features

Methods

Participants

Twenty-four volunteers (six males and 18 females, M = 21.8 ± 2.2 years old) from Zhejiang University participated in this experiment for payment or course credit in 2017. The sample size was determined a priori using PANGEA (Westfall, 2016). Based on the t test reported by Z. Gao et al. (2016) (neutral condition vs. irrelevant-match condition in Experiment 1, n = 22, p = 0.001), whose design was similar to ours, we calculated the effect size of that t test as Cohen’s d = 0.81. We therefore predicted an effect size of Cohen’s d = 0.81 (equivalent to ηp2 = 0.14) for the main effect of distractor presence in the repeated-measures analysis of variance (ANOVA) conducted for this experiment. The suggested sample size was approximately 16 to obtain at least 95% power for the main effect of distractor presence in the repeated-measures ANOVA at a significance level of 0.05. Twenty-four participants were recruited in Experiment 1 to ensure adequate power. All were right-handed and reported normal or corrected-to-normal visual acuity. Signed informed consent was obtained before the study. The study was approved by the Research Ethics Board of Zhejiang University and performed according to the approved guidelines.
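The correspondence between Cohen’s d = 0.81 and ηp2 = 0.14 can be checked with the conventional two-condition relations f = d / 2 and ηp2 = f² / (1 + f²). The sketch below is only an illustration; that this exact conversion was the one used for the power analysis is an assumption, and the function names are hypothetical.

```python
def d_to_partial_eta_squared(d: float) -> float:
    """Convert Cohen's d to partial eta squared via Cohen's f = d / 2.

    Assumes a two-condition contrast; not necessarily the conversion
    applied in the original power analysis.
    """
    f = d / 2.0
    return f ** 2 / (1.0 + f ** 2)


def partial_eta_squared_to_d(eta_p2: float) -> float:
    """Inverse conversion: partial eta squared back to Cohen's d."""
    f = (eta_p2 / (1.0 - eta_p2)) ** 0.5
    return 2.0 * f


predicted_d = 0.81  # effect size stated in the text
print(round(d_to_partial_eta_squared(predicted_d), 2))  # 0.14
```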

Stimuli and apparatus

The experiment was run on a 19-inch CRT monitor with a viewing distance of 57 cm and a resolution of 1,024 × 768 pixels at a 100-Hz refresh rate. The background was gray (RGB: 128, 128, 128). The experiment was programmed using MATLAB (MathWorks, Natick, MA, USA) with Psychtoolbox extensions (Brainard, 1997; Pelli, 1997).

The color and shape of the memory item were each randomly selected from six possible values, as illustrated in Fig. 1. Each item was presented at the screen center (1.27° × 1.27° visual angle). The retro-cue was a word naming either the color or the shape dimension (in black Chinese characters; RGB: 0, 0, 0; 2° × 1°), presented above fixation to indicate the relevant memory dimension.

Fig. 1

Three types of features used in this study. (Color figure online)

The search display was composed of four white shapes or four colored “clouds,” each with a line (0.75° × 0.15°) on top of it. The shapes and colors were chosen from the same six possible values used for the memory stimuli. The target line was tilted 45° either to the left or to the right of vertical, while the three other lines, which served as distractors, were either vertical or horizontal. The shapes or colored clouds were evenly spaced on an invisible circle (radius = 2.5°).
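The display geometry can be sketched as follows: four locations evenly spaced on a circle of radius 2.5°, and the on-screen eccentricity that this radius corresponds to at the 57-cm viewing distance. The starting angle of the first item (45°) is a hypothetical choice, since the text does not specify it, and the function names are illustrative only.

```python
import math

VIEWING_DISTANCE_CM = 57.0  # viewing distance from the apparatus description


def eccentricity_to_cm(angle_deg: float,
                       distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """On-screen distance from fixation (cm) for a given eccentricity (deg)."""
    return distance_cm * math.tan(math.radians(angle_deg))


def search_positions(n_items: int = 4, radius_deg: float = 2.5,
                     start_angle_deg: float = 45.0):
    """Return (x, y) offsets from fixation, in degrees, evenly spaced on a circle.

    start_angle_deg is an assumption; the source does not state it.
    """
    step = 360.0 / n_items
    positions = []
    for i in range(n_items):
        theta = math.radians(start_angle_deg + i * step)
        positions.append((radius_deg * math.cos(theta),
                          radius_deg * math.sin(theta)))
    return positions


for x, y in search_positions():
    print(f"x = {x:+.2f} deg, y = {y:+.2f} deg")
print(f"radius on screen: {eccentricity_to_cm(2.5):.2f} cm")
```

At 57 cm, 1° of visual angle corresponds to roughly 1 cm on the screen, which is why this viewing distance is convenient.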

Experiment design and procedure

The experimental procedure is illustrated in Fig. 2. After a 500-ms fixation, a memory item was presented for 500 ms. Participants were instructed to memorize its color and shape. Following a 500-ms blank interval, a retro-cue appeared for 300 ms to indicate which feature was relevant for the report at the end of the trial. The search display appeared after another 500-ms blank interval. Participants were asked to search for the target (tilted) line as quickly as possible and judge whether it tilted to the left or right (“J” for right, “F” for left, 50% of trials in each case). Responses were to be completed within 2,000 ms. Finally, a shape or colored cloud appeared as a memory probe after a 400-ms blank interval. Participants indicated whether the probe matched the relevant feature indicated by the retro-cue by pressing the corresponding key (“J” for yes, “F” for no, 50% of trials in each case). Responses were to be completed within 2,000 ms. Once the participants responded, the probe was removed. The intertrial interval was randomly selected from a uniform distribution between 1,000 and 1,500 ms. Feedback was provided only during practice sessions for each task.
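As an illustration of how this timeline maps onto the 100-Hz display, the sketch below lists the trial events with their durations and converts each to a whole number of refresh frames. It is not the authors’ MATLAB/Psychtoolbox code; the event labels and helper names are hypothetical, and the search and probe durations are upper bounds set by the response deadlines.

```python
REFRESH_HZ = 100  # monitor refresh rate from the apparatus description

# (event, duration in ms) for one Experiment 1 trial, as described in the text.
EXPERIMENT_1_TRIAL = [
    ("fixation", 500),
    ("memory item", 500),
    ("blank", 500),
    ("retro-cue", 300),
    ("blank", 500),
    ("search display", 2000),  # upper bound: response deadline
    ("blank", 400),
    ("probe", 2000),           # upper bound: response deadline
]


def ms_to_frames(duration_ms: int, refresh_hz: int = REFRESH_HZ) -> int:
    """Convert a duration to refresh frames; reject non-integer frame counts."""
    if (duration_ms * refresh_hz) % 1000 != 0:
        raise ValueError(f"{duration_ms} ms is not a whole number of frames")
    return duration_ms * refresh_hz // 1000


def frame_schedule(trial):
    """Return (event, frames) pairs for a list of (event, ms) pairs."""
    return [(name, ms_to_frames(ms)) for name, ms in trial]


for name, frames in frame_schedule(EXPERIMENT_1_TRIAL):
    print(f"{name:15s} {frames:4d} frames")
```

Every duration in the procedure is a multiple of the 10-ms frame period, so each event can be presented for an exact number of frames.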

Fig. 2

Illustration of the procedure and experimental conditions in Experiment 1. Participants were first required to remember the color and shape of the memory item. After a 500-ms blank interval, a retro-cue appeared, indicating which feature was relevant for the report at the end of the trial. A visual search display then appeared, in which the irrelevant feature was presented as a distractor (distractor condition) or not (nondistractor condition). Finally, participants reported whether the colored cloud or shape in the center matched the relevant feature. (Color figure online)

The experiment used a 2 (relevant feature: color/shape) × 2 (distractor presence: distractor/nondistractor) within-subjects design. The first factor indicated whether the color or shape was relevant for the report at the end of the trials. The ignored feature dimension appeared as background stimuli in the visual search task. In the distractor condition, the ignored feature appeared as the background color/shape of a distractor. In the nondistractor condition, all colors/shapes in the search display were different from the ignored features. Each combined condition contained 32 trials, which were randomly divided into four blocks. Before formal trials, 16 practice trials were required to ensure that participants understood the procedure. The entire duration was approximately 30 min.
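The design above implies 2 × 2 × 32 = 128 trials, or 32 per block. A minimal sketch of one way to build such a trial list is given below. It shuffles all trials and cuts them into four blocks, which is an assumption: the text does not say whether conditions were balanced within blocks. All names are hypothetical.

```python
import itertools
import random

RELEVANT_FEATURE = ("color", "shape")
DISTRACTOR_PRESENCE = ("distractor", "nondistractor")


def build_blocks(trials_per_cell: int = 32, n_blocks: int = 4, seed=None):
    """Return a list of blocks; each trial is (relevant_feature, distractor_presence).

    Assumption: trials are shuffled across the whole session and then split
    into equal blocks, without balancing conditions within each block.
    """
    rng = random.Random(seed)
    cells = list(itertools.product(RELEVANT_FEATURE, DISTRACTOR_PRESENCE))
    trials = [cell for cell in cells for _ in range(trials_per_cell)]
    rng.shuffle(trials)
    block_size = len(trials) // n_blocks
    return [trials[i * block_size:(i + 1) * block_size]
            for i in range(n_blocks)]


blocks = build_blocks(seed=1)
print(len(blocks), [len(block) for block in blocks])  # 4 [32, 32, 32, 32]
```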

Data analysis

We analyzed the accuracy of the change detection task and reaction time (RT) in the visual search task with a 2 (relevant feature: color/shape) × 2 (distractor presence: distractor/nondistractor) repeated-measures ANOVA. Only trials with correct responses in both tasks were included in the RT analysis of the visual search task. The Bayes factor (BF) was calculated using the R package BayesFactor (http://bayesfactorpcl.r-forge.r-project.org; Rouder et al., 2009, 2012). We reported BF10 from models using the BayesFactor package’s default prior width (r scale = 0.5) on the main and interaction effects. We also ran versions that used the BayesFactor package’s two other prior widths (r scale = 0.707 and 1) to check the sensitivity of BF10 to the choice of priors. According to Jeffreys (1961), a BF10 above 3 indicates substantial evidence for the corresponding main effect or interaction effect, whereas a BF10 below 1/3 indicates substantial evidence for the null hypothesis (i.e., no effect) for the corresponding main effect or interaction effect.
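The decision rule for BF10 can be written out explicitly. The sketch below encodes only the two thresholds stated above (3 and 1/3); labeling the range in between as “inconclusive” is our wording for this illustration, and the function name is hypothetical.

```python
def interpret_bf10(bf10: float) -> str:
    """Classify a Bayes factor BF10 using the thresholds of 3 and 1/3."""
    if bf10 <= 0:
        raise ValueError("BF10 must be positive")
    if bf10 > 3:
        return "substantial evidence for the effect"
    if bf10 < 1 / 3:
        return "substantial evidence for the null"
    return "inconclusive"


# Illustrative values only.
for value in (10.0, 1.0, 0.1):
    print(value, "->", interpret_bf10(value))
```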

Transparency and openness

All data and materials have been made publicly available via the Open Science Framework and can be accessed online (https://osf.io/mxn86/). Data analyses were conducted in SPSS 25.0 and R (Version 4.4.1). None of the experiments in this study were preregistered.

Results and discussion

Overall, participants performed well on the VWM task (mean accuracy: 94.79%). The ANOVA revealed that none of the effects were significant for accuracy, distractor presence: F(1, 23) = 0.53, p = 0.480, ηp2 = 0.022, BF10 = 0.27; relevant feature: F(1, 23) = 0.16, p = 0.692, ηp2 = 0.007, BF10 = 0.22; interaction effect: F(1, 23) = 0.52, p = 0.480, ηp2 = 0.022, BF10 = 0.35. The results of BF10 were consistent with those from null hypothesis significance testing (NHST), which favored the null hypothesis for all effects compared with the alternative hypothesis. By changing prior widths, we found that all BF10 estimates were similar to those from models that used default prior widths.

The ANOVA for searching accuracy yielded only a significant interaction, distractor presence: F(1, 23) = 3.63, p = 0.069, ηp2 = 0.14, BF10 = 0.55; relevant feature: F(1, 23) = 0.84, p = 0.370, ηp2 = 0.05, BF10 = 0.32; interaction effect: F(1, 23) = 4.67, p = 0.041, ηp2 = 0.17, BF10 = 2.59. Given that the BF10 of the interaction effect was not larger than 3, the significant interaction effect might not have been reliable. Participants searched for targets less accurately in the distractor condition than in the nondistractor condition when they were cued to retain the color [96.88% vs. 98.83%], t(23) = −2.39, p = 0.025, Cohen’s d = −0.49, BF10 = 2.61. According to the BF10 value, however, this significant effect should be interpreted cautiously. Performance did not differ between the two conditions when shape had to be retained [98.57% vs. 98.18%], t(23) = 0.77, p = 0.450, Cohen’s d = 0.02, BF10 = 0.35. Under different prior widths, the results of BF10 were similar.

The ANOVA of searching RT (Fig. 3) revealed that only the main effect of distractor presence was significant, distractor presence: F(1, 23) = 23.80, p < 0.001, ηp2 = 0.51, BF10 = 2710.89; relevant feature: F(1, 23) = 1.13, p = 0.298, ηp2 = 0.05, BF10 = 0.45; interaction effect: F(1, 23) = 0.61, p = 0.441, ηp2 = 0.03, BF10 = 0.32. Importantly, the results of BF10 also confirmed the significant effect of distractor presence and the null effect of the interaction effect. Under different prior widths, the results of BF10 were similar. Participants responded more slowly when the ignored feature appeared as a distractor in the visual search task than when it did not (878 ms vs. 834 ms), suggesting that selectively maintaining one feature dimension of the object composed of highly discriminable features and ignoring the other was quite difficult. Additionally, we discerned no trade-off between response speed and accuracy.
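The partial eta squared values reported for this ANOVA follow from the F statistics and their degrees of freedom through the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The short check below is a reader’s sketch, not part of the original analysis, and the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values for searching RT in Experiment 1, each with df = (1, 23).
reported = {
    "distractor presence": 23.80,
    "relevant feature": 1.13,
    "interaction": 0.61,
}
for effect, f_value in reported.items():
    print(effect, round(partial_eta_squared(f_value, 1, 23), 2))
# distractor presence 0.51
# relevant feature 0.05
# interaction 0.03
```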

Fig. 3

Searching RT results in Experiment 1. Bars represent group means; error bars indicate within-subject 95% confidence intervals. **p < .01. ***p < .001. n.s. p > .05. (Color figure online)

Experiment 2: Selective maintenance of fine-grained features

Methods

In Experiment 2, we again examined whether the irrelevant features interfered with visual search performance as distractors (i.e., the effect of distractor presence). The sample size was determined in the same way as in Experiment 1. Based on previous studies (Gao et al., 2016), we conservatively predicted an effect size of Cohen’s d = 0.81 (equivalent to ηp2 = 0.14) for the main effect of distractor presence in the repeated-measures ANOVA of Experiment 2, although the main effect of distractor presence observed in Experiment 1 (ηp2 = 0.51) was much larger than predicted. The suggested sample size was approximately 16 to obtain at least 95% power for the main effect of distractor presence at a significance level of 0.05. Twenty-four volunteers (14 males and 10 females, M = 19.4 ± 2.2 years old) from Zhejiang University were recruited in 2017 to ensure adequate power. The experimental procedure was similar to that of Experiment 1 (Fig. 4). The only difference was that the memory items were colored polygons composed of fine-grained features instead of simple shapes (Fig. 1). All other parameters were the same as in Experiment 1.

Fig. 4

Procedure illustration and experimental conditions in Experiments 2 and 3. The procedure and design in Experiments 2 and 3 were the same as in Experiment 1, except that the stimuli were colored polygons composed of fine-grained features. The interval before and after the retro-cue in Experiment 3 was shortened from 500 to 200 ms. (Color figure online)

Results and discussion

Overall, participants performed well in the VWM task (mean accuracy: 93.23%). The ANOVA revealed two significant main effects, distractor presence: F(1, 23) = 5.87, p = 0.024, ηp2 = 0.203, BF10 = 0.76; relevant feature: F(1, 23) = 25.37, p < 0.001, ηp2 = 0.524, BF10 = 3,628,295; the interaction effect: F(1, 23) = 0.26, p = 0.615, ηp2 = 0.011, BF10 = 0.31. Participants performed better when required to retain the color than when required to retain the polygon (96.29% vs. 90.17%). Accuracy was higher in the distractor condition than in the nondistractor condition (94.01% vs. 92.45%). Notably, the BF10 for distractor presence favored the null hypothesis over the alternative hypothesis, in contrast to the NHST result; the effect of distractor presence should therefore be interpreted cautiously. Except for the effect of distractor presence, the results of BF10 were consistent with those from NHST. Under different prior widths, the results of BF10 were similar.

The ANOVA for searching accuracy did not yield any significant effects, distractor presence: F(1, 23) = 0.92, p = 0.347, ηp2 = 0.039, BF10 = 0.30; relevant feature: F(1, 23) = 2.76, p = 0.110, ηp2 = 0.107, BF10 = 1.06; the interaction effect: F(1, 23) = 1.88, p = 0.183, ηp2 = 0.076, BF10 = 0.56. Unlike the NHST results, the BF10 for relevant feature provided no substantial evidence for either the null hypothesis or the alternative hypothesis. Under different prior widths, the results of BF10 were similar, and none showed substantial evidence supporting significant effects.

The ANOVA for searching RT (Fig. 5) showed that the main effect of distractor presence and the interaction effect were significant, distractor presence: F(1, 23) = 13.91, p = 0.001, ηp2 = 0.38, BF10 = 5.09; relevant feature: F(1, 23) = 0.004, p = 0.949, ηp2 < 0.01, BF10 = 0.21; interaction effect: F(1, 23) = 38.49, p < 0.001, ηp2 = 0.63, BF10 = 60.24. There was no significant difference in RT between the distractor and nondistractor conditions when participants had to selectively retain the color [800 ms vs. 807 ms], t(23) = −0.99, p = 0.331, Cohen’s d = −0.20, BF10 = 0.41. Because the corresponding BF10 was larger than 1/3, this nonsignificant result should be interpreted with caution. However, participants responded more slowly in the distractor condition than in the nondistractor condition when they had to retain the polygon [833 ms vs. 772 ms], t(23) = 5.67, p < 0.001, Cohen’s d = 1.16, BF10 = 1281.41. Under different prior widths, the results of BF10 were similar and consistent with those from NHST. These results indicated that participants were able to selectively maintain the simple feature (color) of colored polygons (i.e., successfully ignoring polygons), but not the fine-grained feature (polygon) of colored polygons (i.e., failing to ignore colors).
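The Cohen’s d values for these paired comparisons are numerically consistent with the paired-samples effect size d = t / √n, with n = 24 participants. That this is the formula behind the reported values is an inference from the numbers, not something stated in the text. A brief check, with a hypothetical function name:

```python
import math


def cohens_d_paired(t_value: float, n: int) -> float:
    """Paired-samples effect size d_z = t / sqrt(n).

    Assumption: the reported Cohen's d is d_z; the text does not say so.
    """
    return t_value / math.sqrt(n)


N_PARTICIPANTS = 24  # df = 23 in the reported t tests

print(round(cohens_d_paired(-0.99, N_PARTICIPANTS), 2))  # -0.2  (color relevant)
print(round(cohens_d_paired(5.67, N_PARTICIPANTS), 2))   # 1.16  (polygon relevant)
```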

Fig. 5

Searching RT results in Experiment 2. Bars represent group means; error bars indicate within-subject 95% confidence intervals. ***p < .001. n.s. p > .05. (Color figure online)

Cross-experiment analysis

To confirm whether the task-irrelevant simple shapes and complex polygons induced different attentional capture effects, we further conducted a cross-experiment analysis comparing the color-relevant conditions in Experiments 1 and 2. A mixed ANOVA was performed on searching RT, with shape type (simple shapes in Experiment 1 vs. complex polygons in Experiment 2) as a between-subject factor and distractor presence (distractor vs. nondistractor) as a within-subject factor. The results showed that the main effect of distractor presence and the interaction effect were significant, distractor presence: F(1, 46) = 4.48, p = 0.040, ηp2 = 0.09, BF10 = 1.23; shape type: F(1, 46) = 2.09, p = 0.155, ηp2 = 0.04, BF10 = 0.83; interaction effect: F(1, 46) = 9.26, p = 0.004, ηp2 = 0.17, BF10 = 7.68. When the task-irrelevant features were simple shapes, participants responded more slowly in the distractor condition than in the nondistractor condition [881 ms vs. 842 ms], t(23) = 2.92, p = 0.008, Cohen’s d = 0.60, BF10 = 6.20. However, for the task-irrelevant complex polygons, participants’ RT did not differ significantly between the distractor and nondistractor conditions [800 ms vs. 807 ms], t(23) = −0.99, p = 0.331, Cohen’s d = −0.20, BF10 = 0.41. Under different prior widths, the results of BF10 were similar and consistent with those from NHST. These results provided further evidence that participants were able to selectively ignore the complex polygons, but not the simple shapes.

Experiment 3: Do the polygons decay from VWM over a long interval?

Fine-grained features (polygons) caused no interference when they were task-irrelevant in Experiment 2. This absence of interference may reflect decay from VWM rather than removal from the active state of VWM. Previous studies have implied that task-irrelevant features could be forgotten 1,000–1,500 ms after encoding (Logie et al., 2011). The interval used in Experiment 2 was 1,300 ms. In Experiment 3, the interval between the memory item and the visual search task was shortened to 700 ms (cf. Fig. 4). The decay account predicted that, with this shorter interval, the polygon interference effect would occur, as the shape interference effect did in Experiment 1.

Methods

The sample size was determined in the same way as in Experiment 2. Twenty-four volunteers (five males and 19 females, M = 22.7 ± 2.5 years old) were recruited in 2017. The experimental procedure was similar to that of Experiment 2. The only difference was that the blank intervals before and after the retro-cue were both shortened to 200 ms. All other parameters were the same as those used in Experiment 1.
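The retention intervals quoted for Experiments 2 and 3 follow directly from the trial timing: the interval from memory-item offset to search-display onset is the pre-cue blank plus the 300-ms retro-cue plus the post-cue blank. A small worked check is given below (the function name is hypothetical, and the cue duration is taken to be unchanged in Experiment 3, as the text implies):

```python
RETRO_CUE_MS = 300  # retro-cue duration, unchanged across Experiments 1-3


def memory_to_search_interval(blank_ms: int, cue_ms: int = RETRO_CUE_MS) -> int:
    """Interval from memory-item offset to search onset: blank + cue + blank."""
    return blank_ms + cue_ms + blank_ms


print(memory_to_search_interval(500))  # 1300 (Experiments 1 and 2)
print(memory_to_search_interval(200))  # 700  (Experiment 3)
```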

Results and discussion

The overall accuracy in the VWM task was 93.72%. The ANOVA revealed that only the main effect of relevant feature was significant, relevant feature: F(1, 23) = 29.93, p < 0.001, ηp2 = 0.57, BF10 = 10,825,111; distractor presence: F(1, 23) = 2.29, p = 0.144, ηp2 = 0.09, BF10 = 0.42; the interaction effect: F(1, 23) = 2.84, p = 0.106, ηp2 = 0.11, BF10 = 0.63. Participants memorized colors better than polygons (96.81% vs. 90.63%). Under different prior widths, the results of BF10 were similar and only showed substantial evidence for the significant effect of relevant feature.

The ANOVA for searching accuracy showed that none of the effects were significant, distractor presence: F(1, 23) < 0.001, p = 1.000, ηp2 < 0.001, BF10 = 0.23; relevant feature: F(1, 23) < 0.001, p = 1.000, ηp2 < 0.001, BF10 = 0.22; the interaction effect: F(1, 23) = 0.90, p = 0.354, ηp2 = 0.04, BF10 = 0.39. Under different prior widths, the results of BF10 were similar and consistent with those from NHST.

The ANOVA for searching RT (Fig. 6) revealed that the main effect of distractor presence and the interaction effect were significant, distractor presence: F(1, 23) = 5.85, p = 0.024, ηp2 = 0.20, BF10 = 0.46; relevant feature: F(1, 23) = 3.09, p = 0.092, ηp2 = 0.12, BF10 = 2.04; interaction effect: F(1, 23) = 9.36, p = 0.006, ηp2 = 0.29, BF10 = 5.87. In contrast to NHST, the results of BF10 showed substantial evidence only for the interaction effect. Participants’ RT did not differ significantly between the distractor and nondistractor conditions when they were cued to selectively retain the color [1,124 ms vs. 1,136 ms], t(23) = −1.26, p = 0.221, Cohen’s d = −0.26, BF10 = 0.53. Because the corresponding BF10 was larger than 1/3, this nonsignificant result should be interpreted with caution. However, participants responded more slowly in the distractor condition than in the nondistractor condition when the polygons had to be retained [1,171 ms vs. 1,133 ms], t(23) = 3.80, p = 0.001, Cohen’s d = 0.78, BF10 = 28.65. Under different prior widths, the results of BF10 were similar. These results replicated the findings from Experiment 2, even with a shortened delay duration. Therefore, the decay hypothesis was excluded. In addition, with the stimuli presented centrally and sufficient searching time, the problem of fuzzy information was likely not a concern for the complex stimuli.

Fig. 6

Searching RT results in Experiment 3. Bars represent group means; error bars indicate within-subject 95% confidence intervals. **p < .01. n.s. p > .05. (Color figure online)

Cross-experiment analysis

We also conducted a cross-experiment analysis comparing the color-relevant conditions in Experiments 1 and 3. A mixed ANOVA was performed on searching RT, with shape type (simple shapes in Experiment 1 vs. complex polygons in Experiment 3) as a between-subject factor and distractor presence (distractor vs. nondistractor) as a within-subject factor. The results showed that the main effect of shape type and the interaction effect were significant, distractor presence: F(1, 46) = 2.74, p = 0.105, ηp2 = 0.06, BF10 = 0.55; shape type: F(1, 46) = 38.91, p < 0.001, ηp2 = 0.45, BF10 = 36,496.35; interaction effect: F(1, 46) = 9.66, p = 0.003, ηp2 = 0.17, BF10 = 46.25. When the task-irrelevant features were simple shapes, participants responded more slowly in the distractor condition than in the nondistractor condition [881 ms vs. 842 ms], t(23) = 2.92, p = 0.008, Cohen’s d = 0.60, BF10 = 6.20. However, for the task-irrelevant complex polygons, participants’ RT did not differ significantly between the distractor and nondistractor conditions [1,124 ms vs. 1,136 ms], t(23) = −1.26, p = 0.221, Cohen’s d = −0.26, BF10 = 0.30. Under different prior widths, the results of BF10 were similar and consistent with those from NHST. These results provided further evidence that participants could selectively ignore the complex polygons, but not the simple shapes.

Experiment 4: Does a polygon in VWM capture attention?

Searching interference was not observed in Experiments 2 and 3 when color was the task-relevant feature. An alternative explanation is that a polygon cannot guide attention even when it is in the active state of VWM. To rule out this alternative, we required participants to retain a polygon in VWM and examined whether it would capture their attention in a search task.

Methods

Participants

Twenty volunteers (three males and 17 females, M = 21.0 ± 1.3 years old) from Sun Yat-sen University participated in this experiment for payment or course credit in 2022. The sample size was determined in a manner similar to that of Experiment 1. Based on previous studies (Gao et al., 2016), we predicted the effect size for distractor presence in the paired t test of Experiment 4 to be Cohen’s d = 0.81. The suggested sample size was approximately 16 to obtain at least 95% power for the effect of distractor presence at a significance level of 0.05. Twenty participants were recruited to ensure adequate power. All the participants were right-handed and reported normal or corrected-to-normal visual acuity. Signed informed consent was obtained before the study. The study was approved by the Research Ethics Board of Sun Yat-sen University and performed according to the approved guidelines.

Stimuli and apparatus

The apparatus was the same as that in Experiment 1. The memory item was randomly selected from six different polygons, as in Experiment 1, and was always white (RGB: 255, 255, 255). Each item was presented at the screen center (1.27° × 1.27° visual angle). The search display was the same as that in the color-relevant condition of Experiment 2.

Experiment design and procedure

The experimental procedure was similar to that of the color-relevant condition in Experiment 2. The only difference was that the retro-cue was removed and participants always memorized a polygon. After a 500-ms fixation, a memory item was presented for 500 ms. Participants were instructed to memorize its shape. After a 1,300-ms blank interval, the search display appeared. Participants searched for the target (tilted) line as quickly as possible and judged whether it tilted to the left or right (“J” for right, “F” for left, 50% of trials in each case). Responses had to be completed within 2,000 ms. Finally, a probe polygon appeared after a 400-ms blank interval. Participants indicated whether the memory item matched the probe by pressing the corresponding key (“J” for yes, “F” for no, 50% of trials in each case). This response also had to be completed within 2,000 ms. Once participants responded, the probe was removed. The intertrial interval was randomly selected from a uniform distribution between 1,000 and 1,500 ms. Feedback was provided only during practice in both tasks (Fig. 7).

Fig. 7

Illustration of the procedure and the experimental conditions in Experiment 4. Participants were first required to remember the shape of the memory item. Then a visual search display appeared in which the memory item was presented as a distractor (distractor condition) or not (nondistractor condition). Finally, participants reported whether the probe at the center was the same as the memory item.

The experiment used a one-factor (distractor presence: distractor/nondistractor) within-subjects design. In the nondistractor condition, all shapes in the search display were different from the memory item. In the distractor condition, the memory item appeared as a distractor in the search display. Each condition contained 32 trials, which were randomly divided into two blocks. Before the formal trials, 16 practice trials were given to ensure that participants understood the procedure. The entire experiment lasted approximately 15 min.
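The trial timing and design described above can be summarized as a small configuration sketch. The durations, trial counts, and 50% splits come from the text; the names, the full crossing of tilt direction with probe match, and the choice to shuffle all 64 trials before splitting them into two equal blocks are assumptions made only for illustration.

```python
import random
from dataclasses import dataclass

# Durations in ms, taken from the procedure description
FIXATION_MS = 500
MEMORY_MS = 500
DELAY_MS = 1300
SEARCH_TIMEOUT_MS = 2000
PRE_PROBE_BLANK_MS = 400
PROBE_TIMEOUT_MS = 2000
ITI_RANGE_MS = (1000, 1500)


@dataclass
class Trial:
    condition: str    # "distractor" or "nondistractor"
    target_tilt: str  # "left" or "right"
    probe_match: bool
    iti_ms: float


def build_blocks(trials_per_condition=32, n_blocks=2, seed=None):
    """Return a list of blocks, each a list of Trial objects."""
    rng = random.Random(seed)
    trials = []
    for condition in ("distractor", "nondistractor"):
        for i in range(trials_per_condition):
            # Assumed balancing: tilt and probe match are crossed evenly
            tilt = "left" if i % 2 == 0 else "right"
            match = (i // 2) % 2 == 0
            iti = rng.uniform(*ITI_RANGE_MS)
            trials.append(Trial(condition, tilt, match, iti))
    rng.shuffle(trials)
    size = len(trials) // n_blocks
    return [trials[b * size:(b + 1) * size] for b in range(n_blocks)]


blocks = build_blocks(seed=1)
```

If each block must contain equal numbers of distractor and nondistractor trials, the shuffle would need to be done within condition before splitting; the text does not specify this.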

Data analysis

Accuracy in the VWM task, and accuracy and RT in the visual search task, were each analyzed with a paired t test (distractor presence: distractor/nondistractor).
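As a minimal sketch of this analysis, the function below computes the paired t statistic and a standardized effect size from per-participant condition means. It assumes that Cohen's d is the mean difference divided by the standard deviation of the differences (often written d_z), which this section does not state explicitly; the effect sizes reported in the results are numerically consistent with d_z = t / √n. The data values are hypothetical, and the p value is omitted because it requires a t distribution (a library routine such as scipy.stats.ttest_rel would supply it).

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df, d_z) for paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    sd = stdev(diffs)  # sample SD of the difference scores
    t = mean(diffs) / (sd / math.sqrt(n))
    d_z = mean(diffs) / sd  # assumed definition of Cohen's d
    return t, n - 1, d_z


# Hypothetical per-participant mean RTs (ms), not data from the study
rt_distractor = [1012, 980, 1050, 995, 1023]
rt_nondistractor = [990, 975, 1010, 1001, 998]
t_stat, df, d_z = paired_t(rt_distractor, rt_nondistractor)
```

Because t = d_z × √n under this definition, a reported pair such as t(19) = 2.574 and d = 0.58 can be cross-checked directly.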

Results and discussion

Participants performed well on the VWM task (mean accuracy: 88.97%), and accuracy was not modulated by distractor presence [88.40% vs. 89.55%], t(19) = −0.865, p = 0.398, Cohen’s d = −0.19, BF10 = 0.41.

The t test for searching accuracy did not yield a significant effect of distractor presence [96.00% vs. 97.03%], t(19) = −1.037, p = 0.313, Cohen’s d = −0.23, BF10 = 0.46. Critically, the results of searching RT (Fig. 8) revealed that the effect of distractor presence was significant [1,010 ms vs. 977 ms], t(19) = 2.574, p = 0.019, Cohen’s d = 0.58, BF10 = 3.02. Importantly, BF10 also supported this effect, and the BF10 results were similar under different prior widths. Participants responded more slowly when the memory item appeared as a distractor in the visual search task. These results were consistent with previous studies (e.g., Bahle et al., 2018; Hollingworth et al., 2013; Ort et al., 2017), indicating that polygons in VWM can automatically guide attention. This ruled out the alternative that polygons could not guide attention even when in the active state of VWM. Additionally, based on the results reported in Luria and Vogel (2011), the memory performance of participants was similar when required to memorize a polygon or a color-polygon conjunction. Therefore, the effect of fine-grained features on visual search in Experiment 4 is unlikely to result from the reduction in features (memory load) relative to Experiment 2. These findings further support our claim that the results of Experiments 2 and 3 were due to the removal of task-irrelevant polygons from active VWM.

Fig. 8

The results of searching RT in Experiment 4. Bars represent group means, with error bars indicating within-subject 95% confidence intervals. *p < .05. (Color figure online)

Cross-experiment analysis

To verify that the attentional capture effect elicited by polygons in Experiment 4 differed from that in Experiment 2, we conducted a cross-experiment analysis between Experiment 4 and the color-relevant condition in Experiment 2. A mixed ANOVA was performed for searching RT, with task-relevance of polygons (task-irrelevant in Experiment 2 vs. task-relevant in Experiment 4) as a between-subject factor and distractor presence (distractor vs. nondistractor) as a within-subject factor. The results showed that the main effect of task-relevance and the interaction effect were significant, whereas the main effect of distractor presence was not; distractor presence: F(1, 42) = 3.54, p = 0.067, ηp2 = 0.08, BF10 = 0.50; task-relevance: F(1, 42) = 14.97, p < 0.001, ηp2 = 0.26, BF10 = 32.75; interaction effect: F(1, 42) = 8.20, p = 0.006, ηp2 = 0.16, BF10 = 15.27. When polygons in VWM were task-relevant, participants responded more slowly in the distractor condition than in the nondistractor condition [1,010 ms vs. 977 ms], t(19) = 2.574, p = 0.019, Cohen’s d = 0.58, BF10 = 3.02. However, for the task-irrelevant complex polygons, participants’ RT was not significantly different between the distractor and nondistractor conditions [800 ms vs. 807 ms], t(23) = −0.99, p = 0.331, Cohen’s d = −0.20, BF10 = 0.41. The BF10 results were similar under different prior widths and consistent with those from NHST. These results provided additional evidence that polygons in VWM could capture attention, and that the lack of attentional capture effects for polygons in Experiment 2 was attributable to their removal from active VWM.
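The partial eta squared values in this analysis can be recovered from the F statistics and their degrees of freedom through the standard identity ηp2 = (F × df1) / (F × df1 + df2). The sketch below applies it to the three effects reported above; the function name is hypothetical, and the assumption is only that the reported effect sizes follow this identity.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its dfs."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and dfs must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics reported for the Experiment 2 vs. Experiment 4 analysis
effects = {
    "distractor presence": (3.54, 1, 42),
    "task-relevance": (14.97, 1, 42),
    "interaction": (8.20, 1, 42),
}
eta_p2 = {
    name: round(partial_eta_squared(*stats), 2)
    for name, stats in effects.items()
}
```

Rounded to two decimals, the recovered values agree with the reported 0.08, 0.26, and 0.16.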

General discussion

The current study uncovered selective maintenance mechanisms over the constituent features of object representation in VWM. Objects containing highly discriminable features (colored shapes) and objects composed of fine-grained features (colored polygons) were used as stimuli. The results showed that the irrelevant, highly discriminable feature interfered with search performance when it reoccurred in the subsequent search task (Experiment 1), which is consistent with the findings of previous studies (Park et al., 2017; Sasin & Fougnie, 2020). However, this was not true for the irrelevant fine-grained feature (Experiment 2). These findings suggest that it is easier to ignore fine-grained features than to ignore highly discriminable features, which is consistent with the prediction of the perception-VWM interactive model. Furthermore, fine-grained features could still be ignored when the interval between the memory item and the visual search task was shortened (Experiment 3). Finally, the results of Experiment 4 confirmed that a polygon in the active state of VWM could guide one’s attention automatically, suggesting that the ignored polygon was indeed removed from the active state of VWM in Experiments 2 and 3. Therefore, a dissociation in the mechanisms underlying selective maintenance in VWM was supported.

Our study contributes to the understanding of the selective maintenance of objects’ constituent features in VWM in three ways; as a comparison, we refer to the study by Park et al. (2017). First, unlike that study, in which both cued and un-cued features were task-relevant, we set retro-cues as 100% valid, making the un-cued features entirely task-irrelevant. Hence, we conducted a pure and strict examination of the selective maintenance of an object’s feature in VWM. Second, we used a new and more sensitive measurement to probe the state of task-irrelevant items in VWM by exploring whether the irrelevant feature caused interference as a distractor in a visual search task during the delay. Third (and importantly), we were motivated by the interactive model of perception and VWM and suggested that the fate of highly discriminable features and fine-grained features be separately examined to have a better understanding of selective maintenance (Gao et al., 2011; Gao et al., 2009, 2010, 2013). The results supported our view, as the mechanisms underlying the selective maintenance of features were dissociable.

The findings of our study partially align with the research conducted by Sasin and Fougnie (2020), which revealed that task-irrelevant color stimuli can disrupt search performance. However, in contrast to our results, Sasin and Fougnie reported that task-irrelevant orientation, typically considered a simple feature, did not capture attention. This discrepancy may arise from the unique nature of orientation as a feature dimension within VWM. The findings in previous studies (Huang, 2015, 2020, 2022) suggest that orientation possesses distinct properties in VWM. For instance, while colors can be consolidated in parallel, orientations are consolidated strictly in a serial manner (Becker et al., 2013; Liu & Becker, 2013; Miller et al., 2014). Moreover, unlike colors, which can be involuntarily extracted and maintained in VWM (T. Gao et al., 2011; Z. Gao et al., 2010), orientations enter VWM only when prompted by top-down requirements (Woodman & Vogel, 2008), as proposed by the interactive model between perception and VWM (T. Gao et al., 2011). In this conceptual framework, orientations are akin to polygons, indicating a specialized mechanism for the selective maintenance of orientation information. From this perspective, the finding by Sasin and Fougnie (2020) that task-irrelevant orientations failed to capture attention does not contradict the results of our study, which reveal dissociable selective maintenance mechanisms for colors and polygons. The differential patterns observed between colors and polygons in our study suggest that these features engage distinct attentional processes and undergo separate mechanisms for selective maintenance.

The finding that the highly discriminable features of a dual-feature object could not be actively ignored contrasts with previous observations of the successful dropping of highly discriminable features at distinct locations (Gözenman et al., 2014; van Moorselaar et al., 2015). This difference rules out the alternative that the dual-feature object is maintained in a feature-based manner in VWM, which predicts a similar dropping mechanism for the aforementioned two conditions. We argue that, although we simply instructed participants to remember both color and shape without emphasizing the binding between them, VWM automatically bound the two features as an integrated unit (see also Song et al., 2016). Further, the fact that VWM can remove an object’s fine-grained features from the active state of VWM implies that fine-grained features may not be integrated into the object files; rather, they may be in a floating state in VWM. Alternatively, fine-grained features may be integrated into the object files just as simple features are, but with lower strength. In this sense, this finding adds new evidence supporting the interactive model of perception and VWM, which suggests that fine-grained features entering VWM require top-down control and are not stably retained. However, whether the state of fine-grained features in VWM can be changed by emphasizing the binding between the two features requires future elucidation.

Our findings provide empirical evidence supporting the interactive model of perception and VWM in the context of VWM manipulation. This model suggests that the selection, consolidation, and maintenance of information within VWM depend on its processing characteristics in terms of perception. As such, the model initially focuses on VWM encoding and storage. However, whether the manipulation of VWM representations under the top-down requirement is compatible with the predictions of the interactive model has remained unknown. Our study fills this gap. Our study is the first to demonstrate that the perceptual characteristics of a feature can affect the selective maintenance of an object’s features in VWM, broadening the scope of the interactive model. Taking previous VWM encoding and the current manipulation findings together, we argue that the property of a feature’s perceptual processing is a critical factor in determining the processing mechanisms in VWM.

Our results also demonstrate that in addition to highly discriminable features, fine-grained features in VWM can also automatically capture attention. Although substantial evidence has shown that VWM contents can capture attention, consideration has mainly been given to simple features, such as colors (e.g., Bahle et al., 2018; Hollingworth & Beck, 2016) and simple shapes (e.g., Hollingworth et al., 2013; Soto et al., 2006). Our finding is consistent with that of previous studies showing that complex features, like real-world objects (e.g., Houtkamp & Roelfsema, 2006; Ort et al., 2017) and faces (e.g., Moriya et al., 2014; Rutkowska et al., 2022), can guide one’s attention as well.

Alternative explanations are possible to address the lack of attentional capture effects for complex shapes. First, a complex shape may have weaker representational strength when encoded along with a color feature in VWM, resulting in the lack of attentional capture effects. This alternative can be ruled out considering the findings of a past study by Luria and Vogel (2011) and the results of Experiment 4. Luria and Vogel (2011) highlighted that the memory precision for a polygon does not significantly differ from that for a conjunction of color and polygon. This finding indicates that the representational strengths of polygons in Experiments 2 and 4 could be comparable. Meanwhile, polygons within VWM in Experiment 4 automatically guided attention. Therefore, the absence of attentional capture effects for polygons in Experiment 2 cannot be solely attributed to weaker representational strength arising from memory load. Second, polygons may be encoded as abstract representations, which have been demonstrated to have limited influence on visual search (e.g., Gayet et al., 2018; Olivers et al., 2006). This notion is also not likely in that our study utilized randomly generated complex shapes, which inherently posed challenges in terms of verbal labeling and abstract representation. Besides, if the lack of attentional capture effects for polygons in Experiment 2 were due to their abstract representations, then the polygons in Experiment 4 should have likewise failed to capture attention. However, as we observed contrary results in Experiment 4, abstract representation alone may not fully explain the observed differences in attentional capture. Finally, the saliency or familiarity of the feature could also affect attentional capture. The results of Experiment 4 address this concern. Specifically, if the lack of attentional capture effects for complex shapes in Experiment 2 were solely attributed to their low saliency or familiarity, one would anticipate a similar lack of attentional capture for polygons in Experiment 4. However, the results in Experiment 4 indicate otherwise, suggesting that factors beyond saliency or familiarity contributed to the attentional capture effects observed in our study.

Another possibility is that the processing characteristics of perception suggested by the interactive model of perception and VWM are not the only factors determining the fate of selective maintenance. One such factor is verbal coding: simple features, such as colors and shapes, differ from complex polygons in how easily they can be verbally coded. As mentioned previously, in the current study, we employed randomly generated complex polygons, which are inherently difficult to encode verbally. In contrast, simple features are more likely to benefit from verbal coding. Consequently, verbal coding may play a parallel role in producing the differential fate of task-irrelevant simple and complex features in VWM. Future research can isolate the decisive factors that give rise to these disparate effects of task-irrelevant simple and complex features in the visual search task. The results of our study may also be explained by certain more intrinsic, environmentally neutral characteristics of stimuli, such as complexity or featural strength (Huang, 2022), which can be continuously quantified (Huang, 2022; Sablé-Meyer et al., 2022). However, only two extreme and separate types of stimuli were examined in the current study, so stimuli that vary continuously in these characteristics could be used in the future to identify the determinants of selective maintenance and to test whether selective maintenance changes continuously as the stimuli change.

Finally, we acknowledge that it remains unclear whether the irrelevant fine-grained feature is merely outside active VWM or is discarded from VWM entirely. Neuroscience research has recently discovered that information in VWM can be divided into two states: the “active” state, in which information is activated when related to the current task, and the “silent” state, wherein irrelevant information is implicitly latent but not excluded from VWM (Christophel et al., 2018; Rose et al., 2016). These two states are reversible under certain conditions. Neurally silent representations can be reactivated when they become task-relevant (LaRocque et al., 2013; Mikael et al., 2018). They can also be reactivated by an exogenous manipulation of neural activity, such as an impulse or transcranial magnetic stimulation (Rose et al., 2016; Wolff et al., 2017). In this case, irrelevant information is outside the focus of attention within VWM (LaRocque et al., 2013; Lewis-Peacock et al., 2012). However, if irrelevant information is removed from VWM, no latent representation remains to be reactivated. These two conditions share certain common points. For instance, neither information outside the focus of attention within VWM nor information outside VWM can guide attention (Peters et al., 2009). Therefore, the finding that task-irrelevant polygons did not capture attention in the current study cannot distinguish whether the irrelevant information was discarded or was in a latent state, which should be addressed in future studies.

Constraints on generality

The stimuli included six colors, six shapes, and six polygons, the former two representing highly discriminable features and the last one representing fine-grained features. All of them were randomly chosen; therefore, we expect our results to generalize to other colors, shapes, and polygons. Besides, given that both colors and simple shapes induced similar effects as highly discriminable features, we expect other commonly used simple memory stimuli (e.g., orientation) to yield similar results. However, without a commonly accepted standard for complexity, we are cautious about generalizing the effects of fine-grained features to other complex features (e.g., faces). Besides, our research primarily regards complex polygons as fine-grained features, while considering colors and regular shapes as simple features. We acknowledge that we have not examined the distinction between simple and fine-grained variations within color itself, which is a limitation of our study. Undergraduate and graduate students were sampled from the subject pool at Sun Yat-sen University and Zhejiang University. We believe the results will be reproducible with students from similar subject pools serving as participants. Meanwhile, we do not expect the effect to depend on cultural norms. We have no reason to believe that the results depend on other characteristics of the participants, materials, or context.

Data availability

All data and materials have been made publicly available via Open Science Framework and can be accessed online (https://osf.io/mxn86/).