Introduction

In the present study we addressed the issue of working memory (WM) updating in aging. Currently, WM is considered “the key to understanding cognitive processes and failures in many domains” [1]. In fact, WM appears to be more sensitive to aging than other forms of memory, such as short-term memory (STM). These two constructs both refer to memories that are active over a brief period of time. However, STM requires retention and subsequent recall of a given set of information (e.g., retention and recall of a new phone number), while WM requires retention of, and subsequent action on, a given set of information (e.g., recognizing that only 3 digits out of 7 have changed in the new phone number; thus the 4 old, still relevant digits should be retained, while the 3 irrelevant ones are replaced).

Updating information is one of the most crucial mechanisms through which WM operates and adapts rapidly to environmental change. Updating consists of selecting and maintaining relevant information and removing it from memory once it is no longer relevant; in other words, it allows part of a representation in memory to be modified while the rest remains unaltered (see seminal work by [2]). Other mechanisms also contribute to this cognitive flexibility (part of general executive functions), such as the ability to override irrelevant information or to switch between different information sets (see [3] for a core view of executive functions).

Previous empirical evidence [4, 5, 6, 7, 8] has shown clear age-related differences in the WM updating process. Other recent contributions have advanced investigation of the role of updating for verbal and visuospatial material [9], pinpointing how the focus of attention may keep information activated in WM, and how updating efficiency clearly declines with age [10].

Old and Naveh-Benjamin [11] proposed distinguishing between memory for isolated/single contents and memory for contents enclosed in a context. In their meta-analysis (originally focusing on episodic long-term memories), they found that the deficit observed in older adults was specific to associating a single content with the context in which it was embedded, rather than to memorizing isolated contents.

Looking at age differences from this perspective is clearly valuable and has been evaluated across various tasks and memory systems. Chen and Naveh-Benjamin [12] found a specific associative deficit across short-term and long-term memories, while Oberauer [13] found evidence that deficit in content-context bindings is more likely to explain recollection performance in older adults, rather than inhibition. Boujut and Clarys [14] supported the role of content-context bindings in episodic recollection, finding that age-related decline in remembering is due to failures in binding updating processes. Similarly, Van Geldorp et al. [15] reported the decline across WM and episodic memory systems as following a specific associative process.

However, despite the relevance contextual bindings may have in accounting for WM decline, no studies have specifically embedded it within an updating task. In fact, traditional tasks usually test updating of single memory items. For example, in Morris and Jones’ running memory span task and tasks modelled after this, participants had to recall single items (i.e., digits) outside (and regardless of) the context in which they were embedded (i.e., the other adjacent items) [2].

Therefore, the present study aimed to explore this issue by using an updating task that required updating of single contents or content-context bindings, in order to observe specific differences. To this end, we used a task adapted from previous work with adults [16, 17] that also proved sensitive to differences within the elderly population [18]. The task was self-paced and used a response time (RT) measure (see [17] for details of the benefits of RT measures in addition to accuracy). The task was extremely easy for participants, with very low demands on memory: participants needed to encode a set of three memory contents (i.e., 3 consonants) and, in each trial, maintain part of the set unchanged (one element, two elements, or the whole set) and, when requested, update it (that is, substitute part of the set information; see the example in Fig. 1).

Fig. 1

An example of an updating trial. After encoding the first triplet (TRB), participants had to maintain it actively in memory (+++). Next, they were requested to update the binding (i.e., replace the binding T-RB with S-RB). Lastly, they had to maintain the recently updated triplet. At the end, a single red probe was displayed: here, they had to recognize whether the probed consonant belonged to the most recently studied/updated item or not. In the example, a positive probe was presented, to which they had to give a positive answer.

To discriminate between single contents and content-context bindings specifically here, we manipulated the memory load (i.e., the number of items to be maintained from a set of three items). Thus, we compared different load conditions, related to a content updating process (i.e., where all WM contents needed to be updated), or to a content-context binding updating process (i.e., where only part of the WM content needed to be updated; see also Method section).

By comparing these different load conditions, we hypothesized a global cost for updating content-context bindings over single contents, as they can tap into the age-related decline specifically (e.g., [11]). Both content-context binding updating conditions should be more demanding (i.e., longer RT), compared to the content updating condition, and, obviously, to the control condition (where no actual updating occurs, and participants had to maintain memory contents only).

In addition, and specifically related to the content-context binding updating conditions, we hypothesized that if memory load is relevant during updating, this should interact with the updating efficiency. In other words, the more items have to be maintained, the more time the updating should take, with selective impairment seen in older adults (vs. younger adults). Age differences in load sensitivity would demonstrate the role of memory load in updating specifically when WM loses efficiency, as happens in aging.

In sum, by manipulating an updating task and measuring RTs elicited by this process, we aimed to clarify and differentiate content updating from content-context binding updating across different age groups. In addition, within binding updating, we aimed to investigate the role of memory load and its impact on the WM task specifically in aging.

Method

Participants

There were 89 participants in this experiment, divided into three age groups: 29 young adults (12 males; age range = 20–30; M = 25.52, SD = 3.02), 30 young-old adults (12 males; age range = 60–70; M = 64.79, SD = 3.11), and 30 old-old adults (12 males; age range = 71–85; M = 76.13, SD = 4.09).

Participants were recruited through local advertisement and volunteered for the experiment. All participants completed an informed consent form prior to starting the study. None of the participants in the two older groups (young-old and old-old adults) reported any severe health problem potentially interfering with their cognitive functions. Young-old and old-old adults were tested individually, using the Mini Mental State Examination [19] to assess their general cognitive level. The mean score was 27 (SD = 1.60) for the young-old group and 26 (SD = 1.22) for the old-old group; therefore, all participants were included in the study.

The three age groups were matched for years of education: young adults (M = 13.52 years, SD = 3.87), young-old adults (M = 13.17 years, SD = 4.04) and old-old adults (M = 12.83 years, SD = 4.03). Importantly, there were no age group differences evident in this respect, F(2, 88) = .22, ηp² = .06, p = .81.

Materials and procedure

Participants were presented with a computerized memory updating task, adapted from previous work (e.g., [16]). The experimental session lasted approximately 40 min. The task was administered on a standard PC running the SuperLab software. Stimuli were 12 high-frequency consonants from the Italian alphabet.

Each trial consisted of four phases. It always started with an initial encoding phase, followed by a maintenance-before-updating phase, an updating phase, and a final maintenance-after-updating phase. This last maintenance phase was implemented to minimize the use of recency-based strategies, which could bias performance [20]. At the end of the trial, participants received a probe recognition task, where a single consonant was displayed in red in the center of the screen. During the trial, only new consonants were presented; when a consonant did not change, the plus symbol [+] indicated this, in order to encourage active maintenance of previously presented information.

Participants were instructed to memorize a triplet of consonants, maintaining it unchanged or modifying it (i.e., updating it), in a self-paced fashion. Thus, participants needed to press the spacebar when they were ready to see the next screen, and the RT was recorded at this key press.

In the probe recognition task, participants had to recognize whether the single probe belonged to the most recently studied triplet or not. They answered by pressing one of two keys on the keyboard (i.e., M for a ‘Yes’ response, Z for ‘No’). Letters belonging to the final triplet of updated consonants required a positive answer (positive probes), whereas letters that were not presented (negative probes) required a negative answer. See Fig. 1.

Design

Two within-subjects factors were manipulated: trial and phase. We created four trial types (summarized in Table 1). To manipulate content-context binding updating, we created two conditions: one where one consonant was to be updated, with two maintained (high memory load), and another where two consonants were to be updated, with one maintained (low memory load). To manipulate content updating, we created a condition where all the memory contents (i.e., all the memory load) had to be overwritten/replaced, so that no content-context binding had to be updated. In addition, we had a control condition, where no updating occurred and all memory contents had to be maintained in WM.

Table 1 Summary of trials and relative examples: Time course of each trial from the initial memory set to the final set: (a) encoding, (c) updating, (b/d) maintenance, (e) final set from which the probed item is recognized

We had four phases as previously described, that is, encoding, maintenance before updating, updating, and maintenance after updating. At encoding, participants studied the triplet presented, and in the two maintenance phases, they had to keep this actively in memory. At updating, they needed to change the memorized triplet; this entailed replacing part of the triplet (i.e., one consonant, two consonants, or all three consonants), with one, two, or three new consonants. The position in which the single to-be-updated consonant appeared was randomized across trials; thus, it could be in the right, left, or central position. In the control condition, no such replacement occurred.
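The trial structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SuperLab script used in the study: the consonant pool is a placeholder (the 12 consonants actually used are not listed), and the names `CONSONANTS`, `N_UPDATED`, and `make_trial` are hypothetical.

```python
import random

# Hypothetical pool: the study used 12 high-frequency Italian consonants
# but does not list them, so these letters are placeholders.
CONSONANTS = list("BCDFGLMNPRST")

# Number of consonants replaced at the updating phase for each trial type.
N_UPDATED = {"control": 0, "high_load": 1, "low_load": 2, "overwriting": 3}


def make_trial(trial_type, rng):
    """Return the four screens of one trial plus the final memory set."""
    n_new = N_UPDATED[trial_type]
    initial = rng.sample(CONSONANTS, 3)
    # Positions to replace are drawn at random (left, center, right).
    positions = rng.sample(range(3), n_new)
    unused = [c for c in CONSONANTS if c not in initial]
    replacements = rng.sample(unused, n_new)

    final = list(initial)
    update_screen = ["+", "+", "+"]  # "+" marks an unchanged position
    for pos, letter in zip(positions, replacements):
        final[pos] = letter
        update_screen[pos] = letter

    return {
        "encoding": "".join(initial),
        "maintenance_before": "+++",
        "updating": "".join(update_screen),
        "maintenance_after": "+++",
        "final_set": "".join(final),
    }


print(make_trial("high_load", random.Random(1)))
```

In this sketch a high memory load trial shows one new consonant and two plus signs at updating, a low memory load trial shows two new consonants and one plus sign, an overwriting trial shows three new consonants, and a control trial shows only plus signs.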

One hundred and twenty trials were administered, divided into two blocks. Each block contained equal numbers of trials, with their order of presentation randomized within blocks. Each trial appeared once per block. Half of the trials required a positive response, whereas the other half needed a negative response. After receiving instructions, each participant was presented with 16 practice trials and subsequently, with the two experimental blocks.

Summary of analyses

In the following section we report analyses on the error rate at recognition, on self-paced RT and on probe recognition RT. As dependent measures for the self-paced RT analysis, we recorded the RTs of each phase separately (i.e., encoding, maintenance, and updating phases). As dependent measures for the probe recognition, we recorded RTs on each type of probe, i.e., positive and negative.

Results

Error rate

Error rate was taken as the number of trials in which participants failed to detect the probe of an updated triplet correctly. On average, young adults made 3.45 errors out of 120 trials; young-old adults made on average 3.63 errors, and old-old adults made on average 3.80 errors. Error rates did not differ between groups, F(2, 88) = .53, ηp² = .02, p = .60. Only RTs for trials that ended with correct probe recognition were analyzed. Further, trials with RTs below 150 ms, or exceeding a participant’s mean RT for each condition by more than three intra-individual standard deviations, were considered outliers, and therefore excluded from analyses (1.81 %). These marginal percentage patterns replicated previous findings with young adults (e.g., [17]).
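The outlier rule just described can be expressed as a short Python function. This is a minimal sketch under assumptions the text does not settle: the mean and standard deviation are computed once per participant and condition on the untrimmed correct-trial RTs, and the sample standard deviation is used. The function name `trim_rts` and its parameters are hypothetical.

```python
from statistics import mean, stdev


def trim_rts(rts_ms, floor_ms=150.0, n_sd=3.0):
    """Trim the RTs (in ms) of one participant in one condition.

    Drops RTs below `floor_ms` and RTs that exceed the cell mean by more
    than `n_sd` standard deviations. The mean and SD are computed on the
    untrimmed input (an assumption; the text does not specify this).
    """
    if len(rts_ms) < 2:
        # An SD cannot be estimated from fewer than two values,
        # so only the lower bound is applied.
        return [rt for rt in rts_ms if rt >= floor_ms]
    upper = mean(rts_ms) + n_sd * stdev(rts_ms)
    return [rt for rt in rts_ms if floor_ms <= rt <= upper]
```

Only the upper tail is trimmed relative to the mean, mirroring the wording "exceeding a participant's mean RT"; the fixed 150 ms floor handles anticipatory responses.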

Analysis on self-paced RT

A mixed ANOVA, with age group (young, young-old, old-old) as between-subjects factor, and trial (high memory load, low memory load, overwriting, control) and phase (encoding, maintenance before updating, updating, maintenance after updating) as within-subjects factors, was conducted on RTs resulting from each phase.

We found a significant effect of age group, F(2, 86) = 14.73, ηp² = .26, p < .001, (1–β) = .99. Pairwise comparisons showed that overall, young adults were faster compared to both young-old adults (p < .001) and old-old adults (p < .001). Conversely, mean response latencies of young-old and old-old adults were comparable (p = .14).
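For readers who wish to check the effect sizes reported with these F tests, partial eta squared can be recovered from an F value and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The helper below is an illustrative sketch; the function name is hypothetical and is not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of age group on self-paced RT: F(2, 86) = 14.73
print(round(partial_eta_squared(14.73, 2, 86), 2))  # 0.26
```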

We also found main effects of trial and phase. The main effect of trial, F(3, 258) = 70.06, ηp² = .45, p < .001, (1–β) = .99, showed that all conditions took longer than the control (p < .001), and that both high and low memory load had longer response latencies than content updating (i.e., overwriting), p < .001; overall, high and low memory load conditions were comparable, p = .37.

The main effect of phase was also significant, F(3, 258) = 121.12, ηp² = .59, p < .001, (1–β) = .99. RTs were longer for the encoding and updating phases than for both maintenance phases (p < .001); in addition, the two maintenance phases were comparable, p = .27.

Most importantly, all interactions were statistically significant. All of them are reported, but we focus on the three-way interaction, because it subsumes the other results. The interaction between age group and trial was significant, F(6, 258) = 5.34, ηp² = .11, p < .001, as was the interaction between age group and phase, F(6, 258) = 9.65, ηp² = .18, p < .001. Similarly, trial and phase interacted, F(9, 774) = 87.94, ηp² = .51, p < .001.

The three-way interaction between age group, trial and phase also reached significance, F(18, 774) = 6.24, ηp² = .13, p < .001, (1–β) = .99. To aid interpretation of this interaction, Fig. 2 plots RTs by age group and trial in four separate graphs (one for each phase).

Fig. 2

Mean trial RTs as a function of age group, shown separately for the four phases: encoding (a), maintenance before updating (b), updating (c) and maintenance after updating (d)

We conducted post hoc comparisons to test specifically for: (1) an overall difference between phases, with analogous patterns across all age groups; (2) a global slowing of RTs across phases for older participants, relative to younger ones; and (3) differences at the critical updating phase, to show whether the high and low load binding updating conditions differ from the content updating condition, and how any such differences vary across groups.

Initially, we ran paired-sample t tests (1) and found that both encoding and updating were slower than the maintenance phases, t(88) = 9.19, p < .001. In addition, the two maintenance phases were comparable across trial and group, t(88) = .42, p = .18. Finally, updating was significantly slower than encoding, t(88) = 8.49, p < .001.

Next, we ran independent-sample t tests (2) and found that, across all trials, young adults were faster than young-old adults for encoding, t(57) = 3.60, p < .001, updating, t(57) = 4.44, p < .001, and maintaining information, t(57) = 3.86, p < .001. Similarly, young adults were faster than the old-old group for encoding, t(57) = 5.86, p < .001, updating, t(57) = 5.57, p < .001, and maintaining information, t(57) = 3.51, p < .001. Comparing the two groups of older participants, we found that the young-old were significantly faster than the old-old at encoding information, t(58) = 2.30, p < .05. In contrast, no differences were observed at updating, t(58) = .46, p = .15, or at either maintenance phase, t(58) = .23, p = .82. Thus, the additional slowing for old-old relative to young-old participants observed in Fig. 2c/d did not reach statistical significance.

For comparison (3), we ran paired-sample t tests to compare the updating phase in content-context binding updating (high memory load, low memory load) versus content updating (i.e., overwriting and control). We found that content updating trials were always easier (i.e., shorter RTs) compared to content-context binding updating ones; this occurred across all groups, p < .001, indicating that replacing a single content or simple maintenance (where no actual updating occurs) are not markers of aging.

Within content-context binding updating, we found no difference between high and low memory load in young adults, t(28) = .27, p = .80. However, in both young-old and old-old adults, high memory load took longer than low memory load, as shown in Fig. 2c (young-old: t(29) = 3.08, p = .005; old-old: t(29) = 2.08, p = .046).

Control analysis

From the analyses reported above, it can be observed that there is a specific slowing at encoding (first phase, beginning of the task), that could bias performance at updating. Thus, to control for this potential bias, we ran a further analysis on updating phase RTs, with the encoding phase RT as covariate. Importantly, the results did not change. In fact, we found that the encoding did not significantly interact with the updating, p > .01.

Analysis on probe recognition RTs

A mixed ANOVA, with age group (young, young-old, old-old) as between-subjects factor and trial (high memory load, low memory load, overwriting, control) and probe (positive, negative) as within-subjects factors, was conducted on RTs of correctly recognized probes.

The effect of age group was significant, F(2, 86) = 21.59, ηp² = .33, p < .001. Young adults were faster than both older groups (p < .001), and young-old adults were faster than old-old adults, p < .05. The main effect of probe also reached significance, F(1, 86) = 100.82, ηp² = .54, p < .001, with negative probes requiring more time to be recognized than positive ones.

No other effect or interaction reached significance. In particular, the effect of trial was not significant, F(3, 258) = 1.20, p = .077, nor was the three-way interaction, F(6, 258) = 1.50, p = .18.

Discussion

Our data showed that older adults appear to have a selective difficulty in updating content-context bindings in WM, as opposed to single contents. In addition, binding updating performance was further delayed as a function of memory load.

We administered an updating task where participants were asked to update content-context bindings or single contents in WM. Overall, we observed a global delay (i.e., longer RT) in older relative to younger participants, across all phases and trials. This fits well with classical approaches to aging, suggesting that physiological deterioration of neural systems with increasing age is associated with parallel decrements in many cognitive abilities, including memory systems [21, 1, 22].

We compared four conditions where single memory contents (i.e., overwriting and control) or content-context bindings (high and low memory load) were manipulated. We found that both the high and low memory load binding updating conditions were more demanding than the content updating condition (i.e., overwriting) and the control condition. This is particularly interesting to note, as in both of these latter trial types, participants needed to act on single contents only. In fact, in control trials, participants are only required to maintain a set of information, and in overwriting trials, they have to maintain and then substitute a complete memory set (although no specific content-context binding was required). However, the high/low memory load conditions included acting upon content-context bindings (rather than contents only) and therefore more cognitive operations were possibly needed (see [16]). The finding of a general cost in binding updating supports previous findings obtained with young adults (e.g., [17]). Here, in addition, we showed this effect in older adults, a finding that is consistent with other work investigating the ability to create and dismantle contextual bindings, even with more complex stimuli such as words [14].

More specifically, considering the content-context binding updating only, we showed that for older adults, the load (i.e., the number of items to maintain) made a difference during an updating task: high memory load was more demanding than low memory load. In contrast, this did not occur with younger participants, where high and low loads were comparable. It is worth mentioning that the effect happened at the updating phase only (Fig. 2c); load did not affect any other phase (Fig. 2a, b, d). Thus, in addition to the slowing due to updating of the content-context binding, we also found that loading WM, even marginally, produced a cost. Similarly, other studies have shown evidence that manipulating load during updating (not as number of to-be maintained items, but via item similarity) has produced age-related differences [10].

Concerning recognition probe performance, we cannot avoid noting that this measure was not sensitive enough to show differences between memory systems, only a global effect of positive probes over negative ones. This is likely to be a consequence of the task, which was specifically designed to focus on processing measures, i.e., self-paced RT, rather than post-task measures, such as probe recognition [17]. Therefore, the absence of an effect at recognition is neither unexpected, nor particularly important to our current aims; these were focused primarily on sensitivity to small processing differences.

Interestingly, we did not observe notable differences between young-old and old-old adults, but rather a global decline starting from 60 years of age. However, we have to note that the old-old participants were numerically slower than the young-old, although this difference did not reach statistical significance. Only at encoding did we observe a significant slowing in the old-old group relative to the young-old group.

Our results show a difference between single memory contents and content-context bindings; speculatively, this difference might be taken to represent a difference between memory systems. As briefly mentioned in the introduction, WM and STM differ as to the type of operations that can be enacted on the to-be-studied material; that is, processes that are more or less active. In the light of our results, another important way to distinguish between them could be on the basis of the type of to-be-memorized information. In fact, traditional STM tasks are based on recall of isolated memory contents (see, for instance, the classical span tasks), whereas WM is based more on manipulating memory associations, such as between adjacent contents or features, or between task modalities (such as complex span tasks or dual tasks, e.g., [23]). Indeed, it could then be argued that STM is memory for contents, whereas WM is memory for content-context bindings [11].

Therefore, we believe our results can contribute to further differentiation between these two memory systems, and analyzing their relative decline in aging. Indeed, our data show that when content-context binding is manipulated, there is a clear decline in aging. In contrast, when single contents are manipulated, no evidence of decline is shown. However, without doubt, further experimental evidence is needed to support this distinction.

In conclusion, our results contribute to analyzing different objects of updating and, potentially, to differentiating between memory systems and their selective impairment during aging. We found evidence that in updating, as in other memory processes, older adults are impaired in acting on contextual bindings, but not on single memory contents. Therefore, this process appears to be the best marker of age-related memory decline within the WM system, which, crucially, also appears sensitive to increases in memory load, selectively for an already-slowed system. That said, these results call for further evidence clarifying the specific nature of contextual binding, and contributing to a fuller analysis of its core mechanisms.