Behavioral signatures of the rapid recruitment of long-term memory to overcome working memory capacity limits

Adam, Kirsten C. S.; Zhao, Chong; Vogel, Edward K.

doi:10.3758/s13421-024-01566-z

Published: 14 May 2024

Memory & Cognition

Abstract

Working- and long-term memory are often studied in isolation. To better understand the specific limitations of working memory, effort is made to reduce the potential influence of long-term memory on performance in working memory tasks (e.g., asking participants to remember artificial, abstract items rather than familiar real-world objects). However, in everyday life we use working- and long-term memory in tandem. Here, our goal was to characterize how long-term memory can be recruited to circumvent capacity limits in a typical visual working memory task (i.e., remembering colored squares). Prior work has shown that incidental repetitions of working memory arrays often do not improve visual working memory performance – even after dozens of incidental repetitions, working memory performance often shows no improvement for repeated arrays. Here, we used a whole-report working memory task with explicit rather than incidental repetitions of arrays. In contrast to prior work with incidental repetitions, in two behavioral experiments we found that explicit repetitions of arrays yielded robust improvement to working memory performance, even after a single repetition. Participants performed above chance at recognizing repeated arrays in a later long-term memory test, consistent with the idea that long-term memory was used to rapidly improve performance across array repetitions. Finally, we analyzed inter-item response times and we found a response time signature of chunk formation that only emerged after the array was repeated (inter-response time slowing after two to three items); thus, inter-item response times may be useful for examining the coordinated interaction of visual working and long-term memory in future work.

Introduction

Because of working memory’s limited capacity, we need to flexibly recruit long-term memory to accomplish our everyday goals. For example, when performing a routine grocery shopping trip, it is not feasible to hold 20 different items actively in working memory. However, by taking advantage of associations in long-term memory, we can strategically retrieve “chunks” of items to effectively shop for all 20 items (e.g., “I should buy the ingredients I need to make caesar salad, lasagna, and tiramisù”) (Bower, 1972; Cowan, 2001; Ebbinghaus, 1885, 1913). Arguably, scenarios like this one are the most common way that we use working memory in the real world. We frequently need to flexibly shuttle information back and forth between working- and long-term memory. Doing so allows us to take advantage of the strengths of each memory system: information in working memory is easily accessible and manipulable but capacity-limited (Baddeley & Hitch, 1974; Cowan, 2001; Luck & Vogel, 1997), whereas information in long-term memory is nearly unlimited in capacity but often takes time and effort to retrieve (Beck & van Lamsweerde, 2011; Brady et al., 2008; Mandler & Ritchey, 1977; Squire & Zola, 1996; Standing, 1973; Standing et al., 1970; Wolfe et al., 2023).

Probing interactions of visual working and long-term memory

Although working and long-term memory are typically used in tandem, researchers make an effort to devise working memory experiments that prevent the contribution of long-term memory so that the unique constraints of working memory can be better characterized. For example, visual working memory is often studied by asking people to remember simple shapes or colors. Because these abstract items have low meaningfulness, these tasks help us to estimate the capacity of working memory in the absence of support from long-term memory associations.^{Footnote 1} Over the past many decades, careful work dissociating working and long-term memory has been important for our understanding of these memory systems and their neural correlates (Baddeley & Warrington, 1970; Christophel et al., 2017; Jeneson & Squire, 2011; Luck & Vogel, 1997; Milner & Penfield, 1955; Scoville & Milner, 1957; Serences et al., 2009). However, by focusing primarily on each memory system in isolation, we may miss important insights about how information is flexibly shuttled between both working and long-term memory in everyday life. One approach to closing this gap has been to use more realistic stimuli in working memory tasks, such as photographs of real-world objects and scenes (Brady et al., 2016; Brady & Störmer, 2022; Endress & Potter, 2014; Quirk et al., 2020). One advantage of these stimuli is that they may allow participants to draw on long-term memory via familiarity and meaningful associations (Asp et al., 2021; Jackson & Raymond, 2008; Ngiam et al., 2019b; Reder et al., 2013; Xie & Zhang, 2017, 2022). However, a disadvantage of using real-world objects is that they have associations in long-term memory that are already pre-formed, and the experimenter cannot directly control or observe how the formation of long-term memory associations influences working memory performance.

Rather than changing the memoranda to be more realistic, a second approach for studying interactions of working and long-term memory is to use artificial stimuli, but to introduce controlled opportunities for long-term memory to aid performance. In this vein, prior work has examined how incidental repetitions of memoranda may improve visual working memory performance (i.e., via Hebbian or implicit learning). Surprisingly, initial work on this topic found that visual working memory capacity was stubbornly resistant to improvement (Fukuda & Vogel, 2019; Logie et al., 2009; Olson & Jiang, 2004). For example, Olson and Jiang (2004) found that even after 24 repetitions of the same memory array, participants performed no better than as if they were seeing the array for the first time. The lack of effect of repetitions on visual working memory performance is puzzling, because it contrasts with a rich body of work that shows that memory for verbal memoranda is improved with incidental repetitions (Hebb, 1961; Page et al., 2013; Sukegawa et al., 2019). As such, recent work has begun to systematically explore which factors may prevent versus allow Hebbian learning from incidental repetitions of visual working memory arrays (Musfeld et al., 2023a, 2023b; Souza & Oberauer, 2022). For example, Musfeld et al. (2023b) found that retrieval practice and the expected difficulty of the test both influence whether or not working memory performance improves when arrays are incidentally repeated over time.

Here, we turned our focus from incidental to explicit repetitions of working memory arrays. Explicit repetition of visual working memory arrays has been infrequently examined, so our main goal was to characterize how quickly and to what extent participants can use long-term memory to overcome visual working memory capacity limits when directed to do so intentionally. Indeed, prior work suggests that working memory plays a particularly important role when learning is intentional as opposed to incidental. For example, Unsworth and Engle (2005) found that individual differences in working memory capacity predicted learning in a serial reaction time task in conditions with intentional, but not incidental, learning. To this end, we devised an experimental paradigm to probe the explicit coordination of working and long-term memory. Specifically, we simply instructed participants that the same visual array would repeat for many trials in a row, and that they should use any strategy available to them to improve their performance across repetitions. We predicted that we should initially find that participants’ performance is bound by typical working memory capacity limits (i.e., ~3 items), but after many repetitions participants may begin to use long-term memory to augment performance.

Inter-response times as a measure of chunk formation

In addition to examining how accuracy improves with repetitions, we also planned to examine how response latencies may track the formation of new “chunks” in a visuospatial memory task. Here, we used a “whole-report” visual working memory task, where participants are asked to report the color of all memory items. Because participants report multiple items, we may examine not only the number of correctly reported items, but also how quickly participants make individual responses. In particular, we were inspired by prior work measuring inter-response times, defined as the time between pairs of responses as participants make many responses in a row (Anderson & Matessa, 1997; Broadbent, 1975; Browman & O’Connell, 1976; Chase & Ericsson, 1982; Chase & Simon, 1973; Lovelace & Snodgrass, 1971; Lovelace & Spence, 1972; Murdock & Okada, 1970; Reitman, 1976). During free recall and search through long-term memory, inter-response times tend to increase as the memory set^{Footnote 2} is exhausted (Bousfield & Sedgewick, 1944; Murdock & Okada, 1970; Rohrer, 1996; Wixted & Rohrer, 1994). In addition to this general slowing over time, clustering of inter-response times can be a useful, quantifiable signature of chunk utilization, whereby “intra-chunk” response times are faster than “inter-chunk” response times (McLean & Gregg, 1967). McLean and Gregg (1967) articulated a framework in which chunks may be formed in three key ways: (1) via prior knowledge, (2) via grouping cues during encoding, and (3) via top-down associations (i.e., new associations formed by attention, rehearsal, or some other process).^{Footnote 3} This framework has remained important for later theories of the role of chunking in working memory (e.g., Cowan, 2001).

Inter-response time signatures of chunking have previously been observed when chunks are formed via prior knowledge or during encoding. First, studies examining recall of previously learned sets (e.g., countries of Europe) have found slowing of inter-response times in clusters of three to four items (Broadbent, 1975; Graesser & Mandler, 1978). Second, studies introducing grouping cues during encoding of letter and digit sequences have found a slowing of inter-response times when the recall of a new group begins (Anderson & Matessa, 1997; McLean & Gregg, 1967). Few studies, to date, have looked at inter-response time signatures of chunking when groups are formed via new top-down associations, as we plan to do here (Miller & Unsworth, 2018). However, other putative signatures of chunk utilization have been observed when observers repeatedly recall a word list multiple times in a row (i.e., “multitrial free recall”). Rather than response times, prior work has examined response consistency during multitrial free recall. Namely, when participants encode the same word list multiple times in a row (with the words presented in a randomized order during each list presentation), participants begin to recall the items in a consistent order each time they recall the list (Dunlosky & Salthouse, 1996; Miller & Unsworth, 2018; Sternberg & Tulving, 1977; Tulving, 1962, 1966).

In the current study, we were particularly interested in observing how chunks formed via new, top-down associations may benefit performance in a visual memory task. Our whole-report task with explicit repetitions is a novel, visuospatial analogue of the verbal “multitrial free recall” task. The present study will test if classic behavioral signatures of chunk utilization that have been established with verbal memoranda will generalize to visuospatial tasks. Few studies have examined inter-response times in the context of visual memory (with the notable exception of expertise and chess; Chase & Simon, 1973; Reitman, 1976), in part because of the popularity of change detection measures of visual memory that collect only a single response on each trial (e.g., Luck & Vogel, 1997). In the present experiments, we predicted that when participants recruit long-term memory to improve performance beyond typical capacity limits, we would see behavioral signatures of chunking such as “pauses” in the inter-response times and/or increased consistency of response order during recall.

Summary

To preview results, we found a rapid and robust effect of explicit repetitions on performance. Even after only one repetition, participants’ performance exceeded typical capacity limits. By the eighth and final repetition of an array, participants had a modal performance that was perfect (six to eight items correct). In addition, an analysis of inter-response times is consistent with the idea that participants organize their responses by retrieving ‘chunks’ from long-term memory. Together, these findings illustrate how long-term memory may rapidly assist cognition in tasks that overwhelm working memory capacity, and that inter-item response times can be used to track the formation and deployment of chunking strategies in visual memory tasks.

Experiment 1: Six-item arrays

Methods

Participants

Participants were recruited from the University of Chicago and the surrounding community. Participants provided informed consent under procedures approved by the University of Chicago Institutional Review Board. All participants (27 female, 25 male) were 18 years or older (M = 21.88 years, SD = 3.66 years, range = [18,36]), had self-reported normal color vision and normal or corrected-to-normal visual acuity, and received course credit or cash ($10/h) for their participation. A total of 52 participants took part in the study. Data from two participants were excluded for failure to comply with task instructions (i.e., chance-level performance), leaving a total of 50 participants for analysis. The study procedures were not pre-registered, and the sample size was determined by convenience (i.e., data collection up to a conference deadline). With 50 subjects, we would be powered to detect medium within-subjects effects at 90% power (e.g., within-subjects t-test, critical t(49) = 2.01, d_z = .47; repeated-measures ANOVA with one within-subjects factor (e.g., repetition, eight levels), critical F(7,343) = 2.04, Cohen’s f = 0.15, η² = 0.02) (Faul et al., 2007; Kim, 2016).

Stimuli

Participants were seated in a dimly lit room and viewed a 24-in. BenQ LCD monitor with a 1,920 × 1,080 resolution from a distance of ~67 cm. Stimuli were presented with MATLAB (The MathWorks, Natick, MA, USA) and the Psychophysics toolbox (Brainard, 1997; Kleiner et al., 2007). A fixed set of nine highly discriminable colors was used for the colored square stimuli in all three memory tasks (red: 255 0 0; green: 0 255 0; blue: 0 0 255; yellow: 255 255 0; magenta: 255 0 255; cyan: 0 255 255; white: 255 255 255; black: 1 1 1; orange: 255 128 0), and colors were always chosen without replacement for each memory array. Throughout each task, a black fixation dot was drawn at the center of the screen (radius = 6 pixels, 0.14°) and stimuli were presented on a medium-gray background (RGB = 85 85 85).
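The pixel-to-degree conversions reported in this section can be approximately reproduced from the monitor size, resolution, and viewing distance. The following is a minimal sketch, not the authors' code; it assumes a 16:9 panel whose 24-in. diagonal is fully viewable, square pixels, and a stimulus centered on the line of sight. The function name `pixels_to_degrees` is hypothetical.

```python
import math


def pixels_to_degrees(n_pixels, diagonal_in=24.0, res_x=1920, res_y=1080,
                      distance_cm=67.0):
    """Visual angle (in degrees) subtended by n_pixels on the screen.

    Assumes square pixels and a stimulus centered on the line of sight.
    """
    diagonal_cm = diagonal_in * 2.54
    # Physical panel width follows from the diagonal and the pixel aspect ratio.
    width_cm = diagonal_cm * res_x / math.hypot(res_x, res_y)
    size_cm = n_pixels * width_cm / res_x
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


print(round(pixels_to_degrees(72), 1))  # extent of one colored square
print(round(pixels_to_degrees(6), 2))   # radius of the fixation dot
```

Under these assumptions, 72 pixels comes out to about 1.7° and 6 pixels to about 0.14°, in line with the values reported for the squares and the fixation dot.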

Discrete whole-report task with repetitions

A total of 30 unique arrays were generated by picking six semi-random locations and assigning a unique color (drawn from the set of nine possible colors) to each location. The locations were semi-random in that they were chosen with some constraints, such that items were separated by a minimum distance of 36 pixels (~0.9° of visual angle) and were split evenly across the left and right hemifields. Each colored square had a diameter of 72 pixels (~1.7°) and the possible locations were in a portion of the screen centered on fixation and subtending 1,066 × 600 pixels (7.1° above/below fixation and 12.6° left/right of fixation).
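One way to generate arrays under constraints like these is rejection sampling. The sketch below is illustrative only and is not the authors' MATLAB code. It assumes that the 36-pixel minimum separation is measured edge to edge (so centers must be at least 72 + 36 pixels apart) and that squares may not straddle the vertical midline; the source specifies neither. The names `make_array` and `COLORS` are hypothetical.

```python
import math
import random

COLORS = ["red", "green", "blue", "yellow", "magenta",
          "cyan", "white", "black", "orange"]


def make_array(n_items=6, region=(1066, 600), item_px=72, min_gap_px=36,
               rng=random, max_tries=10000):
    """Return a list of ((x, y), color) pairs, with coordinates relative to fixation."""
    half_w, half_h = region[0] / 2, region[1] / 2
    # Assumption: the minimum separation is an edge-to-edge gap between squares.
    min_center_dist = item_px + min_gap_px
    per_side = n_items // 2
    for _attempt in range(max_tries):
        locs = []
        for side in (-1, 1):  # left hemifield, then right hemifield
            for _ in range(per_side):
                x = side * rng.uniform(item_px / 2, half_w - item_px / 2)
                y = rng.uniform(-half_h + item_px / 2, half_h - item_px / 2)
                locs.append((x, y))
        far_enough = all(
            math.dist(a, b) >= min_center_dist
            for i, a in enumerate(locs) for b in locs[i + 1:]
        )
        if far_enough:
            # Colors are drawn without replacement from the nine-color set.
            colors = rng.sample(COLORS, n_items)
            return list(zip(locs, colors))
    raise RuntimeError("could not place items under the spacing constraint")


unique_arrays = [make_array() for _ in range(30)]
```

Passing `n_items=8` would give eight-item arrays with the same constraints.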

Surprise long-term memory recognition task

For the long-term memory recognition task, we showed participants a total of 60 arrays (30 old, 30 new). The old arrays were identical to those used in the whole-report task. The new arrays were randomly generated with the same size, color, and location constraints as in the working memory task.

Color change detection task

On each trial, a new array containing four, six, or eight colored squares was drawn. The stimuli were the same size and drawn in the same nine colors as the whole-report task, and the same location constraints were used.

Procedures

Participants completed a discrete whole-report task with array repetitions, a surprise long-term memory recognition task for the arrays presented in the whole-report task, and a color change detection task. These three tasks were presented in a fixed order for all participants.

Repeated-arrays working memory task

Participants completed a variant of a discrete whole-report working memory task (Adam et al., 2015; Huang, 2010) in which arrays repeated eight trials in a row. On each trial, participants saw a briefly presented array of six colored squares (150 ms). After a short delay (1 s), participants reported the colors of the squares. A 3 x 3 grid of possible color choices appeared at each location, and participants selected the color that belonged at each response grid location. Participants were required to make a response to all six squares before they could advance to the next trial. After the last response was made, the next trial began after an inter-trial interval of 1 s. Critically, the same array was repeated eight times in a row. On the first trial of a set of repetitions, a new configuration of square colors and locations was randomly chosen. This array was then used for the next seven working memory trials in a row (i.e., trials 1–8 were array #1, trials 9–16 were array #2, etc). Participants were given explicit instructions that each unique array would be repeated for eight trials in a row, and that they should try to improve their performance across the eight repetitions. Participants completed a total of 240 working memory trials (eight trials each of 30 unique arrays).
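The blocked trial structure and the whole-report scoring can be made concrete with a short sketch. This is an illustration with hypothetical function names, assuming 1-indexed trial numbers as in the description above.

```python
def array_and_repetition(trial, reps_per_array=8):
    """Map a 1-indexed trial number to (array number, repetition number)."""
    array_idx, rep_idx = divmod(trial - 1, reps_per_array)
    return array_idx + 1, rep_idx + 1


def n_correct(reported, presented):
    """Number of locations at which the reported color matches the presented color."""
    return sum(r == p for r, p in zip(reported, presented))


print(array_and_repetition(9))  # first presentation of the second array
print(n_correct(["red", "blue", "cyan", "white", "black", "green"],
                ["red", "blue", "cyan", "orange", "yellow", "green"]))
```

With eight repetitions per array, trial 9 maps to array 2, repetition 1, and trial 240 maps to array 30, repetition 8. The example response scores four of six locations correct.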

Old-new recognition task

After completing the repeated arrays working memory task, participants completed an old-new recognition task for the 30 arrays that were used in the repeated arrays working memory task. Participants were not informed beforehand that they would be tested on their long-term memory of the arrays in the previous task. On each trial, participants viewed an array of colored squares. On half of the trials, the participants were shown an old array (an array that was previously seen in the discrete whole-report task). On the other half of the trials, the participants were shown a new, randomly generated array with the same stimulus constraints (i.e., six colored squares drawn at new random locations). Participants reported via keypress if they thought the array was “old” (“Z” key) or “new” (“/” key) and they reported their confidence about the decision on a 5-point scale (using the number keys 1–5 on the keyboard). All responses were unspeeded.

Color change detection task

To assess baseline working memory performance with an independent task, we used a standard color change detection task (Luck & Vogel, 1997). On each trial, participants saw a briefly presented array of four, six, or eight colored squares (150 ms), and remembered the colors of the squares across a blank delay (1 s). At test, a memory probe was shown at one of the squares’ locations. On half of the trials (“same” trials), the probe was the same color as the remembered item at the same location. On the other half of trials (“different” trials), the probe was a different color. Participants responded via keypress whether they thought the probe square was the same (“Z” key) or different (“/” key) from the remembered color of the square presented at the probe’s location. Participants completed a total of 180 trials of the color change detection task (60 trials per set size).

Analysis

Analyses were performed using MATLAB 2018a (The MathWorks) and Python 3.9.7 (conda 4.12.0). Data from the raw .mat files were processed in MATLAB and converted into aggregate .csv files for the main analyses in Python. Key open source packages for Python analyses include Jupyter (Kluyver et al., 2016), pandas (McKinney, 2010), seaborn (Waskom, 2021), pingouin (Vallat, 2018), and pymer4 (Jolly, 2018). See Fig. 1.

Results

Performance rapidly increased across array repetitions

To characterize how performance changed as a function of repetition, we first analyzed mean performance (Fig. 2A). In the whole-report task, performance is quantified as the number of locations for which participants correctly recalled the item’s color, and this value ranges from 0 to 6 on each trial. The first time the participants saw an array (Repetition 1), mean performance was in line with typical estimates of working memory capacity (M = 2.79 items correct, SD = 0.45). Mean performance significantly increased across repetitions, F(7,343) = 330.6, p < 1x10^-45, η_p² = 0.871.^{Footnote 4} By the final repetition, participants’ performance was near ceiling and had nearly doubled from the first repetition (M = 5.32, SD = 0.91). To quantify the rate of performance improvement on average, we calculated difference scores for adjacent repetitions (e.g., Repetition 2–1, Repetition 3–2, etc.). On average, participants’ performance improved by 0.36 items per repetition (SD = .11), with faster learning across the first four repetitions (M = .71, SD = .24) compared to the last four repetitions (M = .10, SD = .09), t(49) = 16.4, p < 1x10^-20. In Experiment 1, the ceiling was six items correct. As such, the slowing of learning at later repetitions may have been driven by participants hitting the performance ceiling for the task.
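The adjacent-repetition difference scores can be sketched as follows. Because adjacent differences telescope, their mean depends only on the first and last repetition: (5.32 - 2.79) / 7 ≈ 0.36, which matches the reported average improvement per repetition. In the example, only the first and last values come from the text; the intermediate values are hypothetical placeholders, and the function names are illustrative.

```python
def adjacent_differences(means):
    """Difference scores for adjacent repetitions (Repetition 2-1, 3-2, ...)."""
    return [later - earlier for earlier, later in zip(means, means[1:])]


def mean_improvement(means):
    """Average change in items correct per repetition."""
    diffs = adjacent_differences(means)
    return sum(diffs) / len(diffs)


# First and last values are the reported means; the middle six are placeholders.
example_means = [2.79, 3.50, 4.20, 4.70, 5.00, 5.10, 5.20, 5.32]
print(round(mean_improvement(example_means), 2))
```

The same helper applied to the first four and last four difference scores separately would give the early and late learning rates that the text compares.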

In addition to looking at mean performance for each repetition, we also looked at the distribution of performance outcomes (Fig. 2B). One notable aspect of the performance distributions is the increase in the number of trials where participants correctly recalled six out of six items. In a typical whole-report working memory task, participants rarely get six items correct, and these rare “perfect” trials can be explained by guessing inflation (i.e., participants never really store six items, but they sometimes get lucky and get six correct by chance because they are required to make a response to every item; see Adam et al., 2015). The first time participants saw an array (Repetition 1), we found a similar pattern of performance. There was a strong modal tendency toward getting three items correct, and participants very rarely got all six items correct (M = 0.6%, SD = 1.87%). As early as the second encounter with the array (Repetition 2), the number of perfect trials increased 25-fold (from 0.6% to 15%). By the final encounter with the array (Repetition 8), the modal tendency was six out of six correct (M = 65.5%, SD = 24.6%).

Inter-response times and chunk utilization

Prior work has hypothesized that inter-response times can be used as a signature of retrieving a new chunk from long-term memory (e.g., Broadbent, 1975). Inter-response times are calculated as the time in between successive responses, and a long pause may indicate that a participant is engaging in planning for the next series of responses and/or retrieving information from long-term memory. The inter-response times are shown in Fig. 3A. The response time was longest the first time participants saw an array (Repetition 1), and the successive responses became quicker. Starting on the second repetition of the array (Repetition 2), we observed a marked slowing at the fourth response. A two-way repeated-measures ANOVA with within-subjects factors Response Number and Repetition confirmed that there was a significant interaction of Response Number and Repetition on inter-response times, F(35,1715) = 6.34, p < .001, η_p² = .11. To better understand the effect of repetition on inter-response times, we conducted separate one-way repeated-measures ANOVAs with factor Repetition separately for each response. To visualize the meaning of these tests, the data for each response are replotted in separate subplots in Fig. 3B.
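As a concrete illustration of the measure, inter-response times can be derived from the timestamps of the successive responses on a trial. The sketch below assumes that the time for the first response is measured from the onset of the response screen, which the text does not state explicitly. The function name and the example timestamps are hypothetical.

```python
def inter_response_times(response_times, response_screen_onset=0.0):
    """Time between successive responses, in the same units as the input.

    Assumption: the first value is measured from the response-screen onset.
    """
    stamps = [response_screen_onset] + list(response_times)
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# Hypothetical timestamps (s) for six responses on one trial.
irts = inter_response_times([1.2, 1.7, 2.1, 3.4, 3.9, 4.3])
slowest_response = irts.index(max(irts[1:])) + 1  # 1-indexed, ignoring the first response
```

In this made-up trial the longest pause after the first response falls on the fourth response, the kind of pattern that would be read as a boundary between two three-item chunks.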

We found significant effects of repetition on inter-response times for the first response, F(7,343) = 8.81, p < .001, η_p² = .15, for the fourth response, F(7,343) = 5.27, p < .001, η_p² = 0.10, for the fifth response F(7,343) = 8.12, p < .001, η_p² = 0.14, and for the sixth response, F(7,343) = 4.94, p < .001, η_p² = 0.09. In contrast, there was no effect of repetition on inter-response times for Response 2 (p = .92) and Response 3 (p = .74). This pattern of response times would be consistent with a chunking strategy where participants formed an initial chunk of three items on Repetition 1 that they used throughout the subsequent repetitions. However, starting at Repetition 2, participants appear to have become more efficient at using their already formed chunk (faster response times for response 1 with repetition) and devote extra time during the fourth response to form and recall a second chunk of three items (slower response times for response 4 with repetition).

In addition to response times, we also examined whether participants recalled items in a consistent order by computing transition probabilities between all pairs of items. A transition probability of 100% would indicate that participants reported a pair of items in the same order for all eight repetitions of the array. We found that participants reported items in an order that is more consistent than would be expected by chance (p < .001), with the highest two transition probabilities exceeding 90% (see Online Supplemental Material (OSM) Analysis S2; Fig. S1A). The empirical pattern that we observed is consistent with an account in which participants formed links between the first three items starting on Repetition 1 (i.e., two transition probabilities: Item 1->2 and Item 2->3), and then developed a consistent response order for the remaining items during later repetitions. Furthermore, participants’ response order was more consistent for the later repetitions of the array (Repetitions 5–8) than for the early repetitions of the array (Repetitions 1–4; Fig. S1B (OSM)). This is consistent with the notion that participants first successfully remembered a few items, and then added in more items as the array was repeated (see also Fig. S2A (OSM)).
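The exact computation is described in the supplemental analysis and is not reproduced here. As a rough sketch under an assumed definition, the transition probability for an ordered pair of items can be taken as the proportion of repetitions on which the second item was reported immediately after the first. The function name and the example orders are hypothetical.

```python
from collections import Counter


def transition_probabilities(orders):
    """Proportion of repetitions on which item b was reported right after item a.

    `orders` holds one response order per repetition of the same array;
    each order lists item labels in the sequence they were reported.
    """
    counts = Counter()
    for order in orders:
        for a, b in zip(order, order[1:]):
            counts[(a, b)] += 1
    return {pair: count / len(orders) for pair, count in counts.items()}


# Hypothetical response orders across eight repetitions of one array.
orders = [[0, 1, 2, 3, 4, 5]] * 6 + [[0, 1, 2, 5, 4, 3]] * 2
probs = transition_probabilities(orders)
```

Here the transitions 0 -> 1 and 1 -> 2 occur on every repetition (probability 1.0), whereas 2 -> 3 occurs on six of eight repetitions (0.75).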

Successful recognition of repeated arrays in an old-new recognition task

We hypothesized that participants were able to exceed typical working memory capacity limits by rapidly recruiting long-term memory. Given this hypothesis, we next tested whether participants could reliably distinguish learned arrays from novel arrays in the old-new recognition task. We quantified long-term memory performance as d-prime (d’; Fig. 4A), calculated as the normalized difference between hit rate and false alarm rate (d’ = z(H) - z(FA) where z() is the z-transform) (e.g., Banks, 1970). Overall recognition performance was d’ = 0.45 (SD = 0.40), and this was significantly above chance t(49) = 7.94, p < .001 (one-tailed t-test^{Footnote 5}). For correlations between individual differences in change detection performance, learning rate in the whole-report task, and recognition memory performance, see Analysis S1 (OSM).
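The d’ formula above can be sketched with the Python standard library, where the z-transform is the inverse of the standard normal cumulative distribution function. This is a minimal illustration rather than the authors' analysis code. Returning `None` when a rate is exactly 0 or 1 is a choice made for this sketch, since the z-transform is undefined at those values; the trial counts in the example are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(H) - z(FA); returns None when either rate is exactly 0 or 1."""
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    if hit_rate in (0.0, 1.0) or fa_rate in (0.0, 1.0):
        return None  # z-transform is undefined at the extremes
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for 30 old and 30 new arrays.
print(d_prime(hits=18, misses=12, false_alarms=12, correct_rejections=18))
```

A participant who responds “old” equally often to old and new arrays gets d’ = 0, which is the chance level that the t-test compares against.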

We next examined whether recognition memory performance varied as a function of confidence. Figure 4B shows the distribution of confidence ratings. Overall, the distribution of confidence scores was fairly even; a one-way repeated-measures ANOVA revealed no difference in the frequency with which participants used each confidence level (p = .12). To examine recognition memory performance as a function of confidence, we divided trials into “low-confidence” (< 3) and “high-confidence” (> 3) bins. Given the total number of trials available for analysis (60), not all participants had sufficient numbers of trials to determine d’ for both the low- and high- confidence bins (i.e., 0 hit or false alarm trials in a given confidence bin). After excluding subjects with insufficient data, there were 41 subjects available for a within-subjects analysis. A paired t-test revealed a significant effect of confidence on memory performance, t(40) = 2.72, p = .009, where memory performance was significantly better for high-confidence trials (M = 1.23, SD = 1.56) compared to low-confidence trials (M = 0.32, SD = 1.24)^{Footnote 6} (Fig. 4C). Memory performance was significantly above chance for high-confidence trials (p < .001) but not for low-confidence trials (p = .06).

Experiment 2: Eight-item arrays

To replicate our results, we performed a second experiment that closely parallels Experiment 1. The only key change that we made was to increase the set size to eight items instead of six items for each array. By raising the set size, we increased the potential performance ceiling even further beyond typical capacity limits of three to four items.