Definition, Models, and the Significance of Working Memory

Working memory (WM) allows for simultaneously maintaining and processing information in a controlled manner (Baddeley and Hitch 1994). Several competing theoretical models of WM exist and are still actively debated (Baddeley 2012; Oberauer et al. 2018). Most WM models have contributed substantially to our current understanding of WM and largely agree on the basic assumptions that WM capacity is limited and that reliable individual differences in this capacity exist, which constrain performance on a wide range of other cognitive activities (e.g., Baddeley 2012; Oberauer 2009). In other words, WM is a limited-capacity system providing the temporary storage and manipulation of information that is necessary for higher cognitive functioning (e.g., for reasoning; Baddeley 2012). The WM models do, however, differ significantly in their assumptions about the structure of this limited-capacity system. We will briefly introduce the main ideas of the models of Oberauer, Baddeley, and Miyake and colleagues because they are particularly helpful for understanding well-known WM training paradigms.

Oberauer defines WM as the cognitive system that allows for building, maintaining, and updating structural representations via dynamic bindings (cf. Oberauer 2009; Wilhelm et al. 2013). This WM system consists of two parts: Bindings temporarily organize information such as words, objects, or events in a declarative part, and connect this information to allowed or inhibited responses in a procedural part (Oberauer 2009). Baddeley, however, defines WM as a cognitive system with at least three components: The central executive, which is responsible for focusing and dividing attention and for coordinating the information flow between at least two temporary storage systems, one for phonological and one for visuo-spatial information (Baddeley and Hitch 1994). Miyake emphasizes the special role of WM updating (i.e., monitoring and refreshing information held in WM) as an executive function (Friedman and Miyake 2017; Miyake et al. 2000; Karbach and Kray, this volume).

Taken together, these WM models differ in the assumed underlying structure of the WM system but agree that it allows for simultaneously maintaining and processing information. Because of this fundamental function, it is not surprising that WM has been shown to be a central determinant of fluid intelligence (e.g., Fuhrmann et al. 2019; Kane et al. 2004), school achievement in various domains (e.g., Peng et al. 2016, 2018), and performance on a large number of other cognitive tasks that are highly relevant in daily life (e.g., language comprehension, following directions, and writing; Barrett et al. 2004, for a review).

The Rationale Behind Working Memory Training

The idea that WM capacity is the main limiting factor for performing a wide range of cognitive activities (e.g., Baddeley 2012) implies that WM training could benefit not only WM functioning but also a wide range of other cognitive functions. Thus, in addition to performance improvements on the trained WM tasks and near transfer to other nontrained WM tasks, one might even expect far transfer to a range of other functions. For example, given the close relation between WM capacity and fluid intelligence (e.g., Kane et al. 2004), one could assume that WM training might also benefit reasoning. Improving WM functioning even slightly might therefore have enormous practical implications for everyday life, which is why this topic has attracted so much attention in several areas of psychology.

Two general mechanisms could mediate transfer effects: Enhanced WM capacity and/or enhanced efficiency using the available WM capacity (cf. von Bastian and Oberauer 2014). Enhancing WM capacity is the traditional goal of WM training and a classic explanation for transfer effects (Klingberg 2010, for a review). Enhanced efficiency has long been considered to be largely material- or process specific, for example, through the acquisition of strategies suited for a specific task paradigm only. Although there is evidence that enhanced efficiency could also work on a more general level, such as faster visual encoding or faster attentional processes (von Bastian and Oberauer 2014), enhancing WM capacity remains the aim and focus of most training studies. WM training is assumed to enhance general WM capacity if there is evidence for transfer effects to multiple WM tasks varying in the type of material and mode of testing (Klingberg 2010).

Enhanced WM capacity can theoretically be explained by training-induced cognitive plasticity (Lövdén et al. 2010; see also Karbach and Kray, this volume). Plasticity denotes that a prolonged mismatch between cognitive resources and situational demands can foster reactive changes in the possible ranges of individual cognitive performance, such as changes in WM capacity (cf. Lövdén et al. 2010). To create a prolonged mismatch, WM training needs to be challenging but manageable with a high degree of effort. No mismatch arises if the WM tasks can be solved with the existing WM capacity or if they are so frustrating that participants give up. Therefore, WM training groups are often assigned to adaptive task-difficulty conditions to foster plasticity by keeping WM demands perpetually at the individual limit, whereas active control groups are assigned to consistently low WM task-difficulty conditions or to tasks tapping functions other than WM (cf. Lövdén et al. 2010).
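The adaptive principle described above can be illustrated with a minimal sketch. The rule used here (one step up after a block with high accuracy, one step down after a block with low accuracy) and the thresholds of 80% and 50% are assumptions chosen for illustration; they are not taken from any specific training program, and all function and parameter names are hypothetical.

```python
def next_level(level, accuracy, up=0.8, down=0.5, min_level=1, max_level=12):
    """Return the difficulty level for the next training block.

    Hypothetical rule: step up when accuracy is high (the task can be solved
    with existing capacity), step down when accuracy is low (risk that the
    participant gives up), otherwise stay at the current level.
    """
    if accuracy >= up:
        level += 1
    elif accuracy < down:
        level -= 1
    # Keep the level inside the range the task supports.
    return max(min_level, min(max_level, level))


def run_adaptive(accuracies, start_level=3):
    """Trace the difficulty levels across blocks for observed accuracies."""
    levels = [start_level]
    for acc in accuracies:
        levels.append(next_level(levels[-1], acc))
    return levels
```

A low-difficulty active control condition would correspond to holding the level fixed at a small value instead of calling `next_level`.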

The cognitive routine framework suggested by Gathercole et al. (2019) follows a similar idea. WM task features which create unfamiliar and challenging cognitive demands require participants to develop novel cognitive routines, because the demands cannot be met by existing mechanisms. New cognitive routines can then be applied to untrained tasks sharing the same requirements, which is the basis for transfer effects. This principle is largely in line with the concept of plasticity, but the framework also focuses on specific predictions about which common features will likely generate transfer and which will not (cf. Gathercole et al. 2019). For example, a crucial feature of recall paradigms (the recall of lists) is the presence or absence of distractor interference (distraction during the encoding of lists). Distractor interference requires cognitive routines to reduce the impact of interference (e.g., the removal of distractor representations), which can only be transferred to tasks sharing this requirement. Notably, these routines are automated cognitive procedures and more general than task-specific strategies. The process of constructing a new cognitive routine follows conventional models of skill acquisition and draws on general cognitive resources such as intelligence (cf. Gathercole et al. 2019).

Selected Training Regimes

A basic distinction can be drawn between (1) single-paradigm training regimes, focusing on one WM paradigm, (2) multiparadigm regimes including multiple WM paradigms (both 1 and 2 are single-domain regimes focusing only on the domain WM), and (3) multidomain regimes including not only WM tasks but also tasks drawing on other abilities (e.g., on processing speed; von Bastian and Oberauer 2014). Naturally, single-paradigm regimes have the advantage that training and transfer effects can be attributed to specific mechanisms more easily. Multiparadigm or multidomain regimes could in theory be more effective because they require more heterogeneous cognitive processes, but the effects cannot be isolated. A recent meta-analysis provided the first evidence on the effectiveness in older adults: Single-domain training resulted in larger effect sizes on near-transfer outcomes (compared to far-transfer outcomes), whereas multidomain training obtained larger effect sizes on far-transfer outcomes (compared to near-transfer outcomes; Nguyen et al. 2019). This pattern directly corresponds to training contents (training specific vs. numerous cognitive processes), but needs further validation (e.g., in other age groups). Only a few studies directly compared different WM training regimes (Holmes et al. 2019; von Bastian and Oberauer 2013). Most studies investigate the effectiveness of a specific regime. We will briefly introduce a selection of well-known WM training regimes.

Simple Span Training

In simple span tasks, participants have to recall a list of stimuli (e.g., digits or colors) after a brief retention interval. After successful recall, they are given a longer list of stimuli. Recall takes place either in the presented order (e.g., digit span forwards) or in reverse order (e.g., digit span backwards). Recall in the presented order requires temporary storage and thus draws on the storage systems assumed in Baddeley’s WM model. Backward span tasks additionally draw on central executive functioning. Therefore, training regimes based on Baddeley’s WM model usually include both forward and backward span tasks to train all components of WM. Probably the best-known regime based on simple span tasks is Cogmed WM training (www.cogmed.com), which is widely used, particularly with children with ADHD. Cogmed has been tested in a large number of studies and is the topic of several ongoing discussions and current reviews (e.g., Aksayli et al. 2019; Shinaver et al. 2014).
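As a rough illustration of how forward and backward recall might be scored, consider the following sketch. The scoring rule (span equals the longest list recalled without error) is one common convention and an assumption here; it is not the scoring procedure of Cogmed or any other specific program, and the function names are hypothetical.

```python
def is_correct(presented, recalled, backward=False):
    """Check one trial: recall must match the list in the required order."""
    target = list(reversed(presented)) if backward else list(presented)
    return list(recalled) == target


def span_score(trials, backward=False):
    """Return the length of the longest correctly recalled list (0 if none).

    `trials` is a sequence of (presented, recalled) pairs.
    """
    correct_lengths = [
        len(presented)
        for presented, recalled in trials
        if is_correct(presented, recalled, backward)
    ]
    return max(correct_lengths, default=0)
```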

Complex Span Training

Complex span tasks combine simple span tasks with a simultaneous and often unrelated secondary task, such as evaluating equations or pictures. Thus, they draw on both storage and processing, which particularly corresponds with Baddeley’s WM model (which includes storage and processing units). Empirical evidence suggests that they are almost perfectly correlated with binding and updating tasks (e.g., Wilhelm et al. 2013) and can thus also be mapped to Oberauer’s WM model. Complex span tasks are well-established and popular indicators of WM capacity (e.g., Kane et al. 2004), which are regularly used as training tasks in cognitive training. For example, they are implemented in the WM training battery Braintwister (Buschkuehl et al. 2008) and the WM tasks in Tatool (von Bastian et al. 2013).

N-Back Training

In the n-back task, participants are presented with sequences of stimuli and must decide whether the current stimulus matches the one presented n items back in a given modality (e.g., visuo-spatial or auditory). Importantly, n is a variable number that can be adjusted to increase or decrease task difficulty. Dual n-back tasks combine two modalities and are considered to be more difficult and effective than single n-back tasks. The n-back task is a valid indicator of WM capacity (e.g., Wilhelm et al. 2013; but see Jaeggi et al. 2010) and particularly corresponds with the theoretical understanding of Oberauer and Miyake as it requires the updating of information in WM. Cognitive training with n-back tasks is common in various age groups and is implemented in, for example, the Braintwister WM training battery (Buschkuehl et al. 2008) and the Lumosity cognitive training battery (e.g., Hardy et al. 2015).
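The matching rule of the n-back task can be made concrete with a short sketch. The scoring by hit rate and false-alarm rate is an assumption for illustration (studies differ in the performance index they report), and the function names are hypothetical. The first n stimuli have no item n positions back and are treated as nontargets here.

```python
def nback_targets(stimuli, n):
    """Mark each position whose stimulus matches the one n items back."""
    return [i >= n and stimuli[i] == stimuli[i - n] for i in range(len(stimuli))]


def score_nback(stimuli, responses, n):
    """Compute hit and false-alarm rates for yes/no match responses.

    `responses` holds one boolean per stimulus (True = "match" response).
    """
    targets = nback_targets(stimuli, n)
    hits = sum(1 for t, r in zip(targets, responses) if t and r)
    false_alarms = sum(1 for t, r in zip(targets, responses) if not t and r)
    n_targets = sum(targets)
    n_nontargets = len(targets) - n_targets
    return {
        "hit_rate": hits / n_targets if n_targets else 0.0,
        "false_alarm_rate": false_alarms / n_nontargets if n_nontargets else 0.0,
    }
```

Increasing `n` lengthens the lag that has to be bridged, which is how task difficulty is adjusted.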

Training and Transfer Effects

To evaluate the effectiveness of WM training, one considers whether a training group (compared to a control group) showed (1) performance improvements on the trained WM tasks, (2) near transfer to nontrained WM tasks, and (3) far transfer to different cognitive functions.

Training Effects

WM training studies ubiquitously report that trained participants significantly improve their performance on the trained WM task(s) over the course of training (cf. Morrison and Chein 2011). This applies to a wide variety of training regimes and age ranges of the participants. Even generally critical reviews acknowledge that participants typically advance considerably (e.g., Shipstead et al. 2012). One meta-analytical integration of 12 WM training effects derived from studies with older adults found a large average standardized increase between pre- and posttest of d = 1.1 compared to the control groups (Karbach and Verhaeghen 2014), which was confirmed recently by a different research group (Hedges’s g = 1.2 across 15 effect sizes; Nguyen et al. 2019). While average comparisons of standardized pre- and posttest performance are a classical requirement in WM training studies, analyzing individual performance trajectories over the course of training sessions can provide additional information. For example, growth modeling with N = 190 younger and older adults revealed that individual performance substantially increased across the training phase, with a steeper increase at the beginning (Guye et al. 2017), which is in line with the power law of practice (Heathcote et al. 2000). By comparing the individual performance growth of younger and older adults, Bürki et al. (2014) demonstrated that older adults showed on average slower WM performance growth during training than younger adults.
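For readers less familiar with the effect size metrics reported here, the following sketch shows one way a standardized group difference in pre-to-post gains could be computed. It assumes Cohen's d with a pooled standard deviation of the gain scores and the usual approximate small-sample correction for Hedges's g; the cited meta-analyses may have used different standardizers, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(gains_training, gains_control):
    """Standardized difference in mean gains, using the pooled SD."""
    n1, n2 = len(gains_training), len(gains_control)
    s1, s2 = stdev(gains_training), stdev(gains_control)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(gains_training) - mean(gains_control)) / pooled_sd


def hedges_g(gains_training, gains_control):
    """Cohen's d multiplied by the approximate small-sample correction."""
    df = len(gains_training) + len(gains_control) - 2
    correction = 1 - 3 / (4 * df - 1)
    return cohens_d(gains_training, gains_control) * correction
```

Because the correction factor is below 1, g is always slightly smaller in magnitude than d, and the two converge as sample sizes grow.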

However, improved performance on a training task does not necessarily imply an enhanced WM capacity (Shipstead et al. 2012). The conclusion of training-induced increases in WM capacity is only valid in comparison to an adequate control group (e.g., Green et al. 2014, for a review) and with evidence for near transfer effects to multiple WM tasks varying in the type of material and mode of testing (Klingberg 2010).

Near Transfer Effects

A large number of meta-analyses and reviews agree that WM training produces near transfer to nontrained WM tasks in children, younger adults, and older adults (e.g., Karbach and Verhaeghen 2014; Melby-Lervåg and Hulme 2013; Nguyen et al. 2019; Sala et al. 2019; Schwaighofer et al. 2015). For example, in a meta-analytical integration of 18–21 near transfer effects derived from studies with children and adults, Melby-Lervåg and Hulme (2013) found moderate and large average standardized increases on visuo-spatial/verbal WM tasks of d = 0.5/0.8 between pre- and posttest compared to control groups. Age was a significant moderator of the effect on verbal WM, with children showing larger benefits than adolescents (Melby-Lervåg and Hulme 2013). However, the effects are found across the whole lifespan (e.g., children, younger adults, and older adults in Sala et al. 2019) and also hold when only comparisons between trained groups and active control groups are considered in the analysis (Sala et al. 2019). Notably, near transfer effects are usually smaller than training effects. For example, with Cogmed training for children, improvements in trained tasks were about 30–40%, whereas improvements in nontrained WM tasks were about 15% (cf. Klingberg 2010; see also Karbach and Verhaeghen 2014, for similar findings on older adults).

Despite this promising evidence, it is important to consider that not all studies have minimized task-specific overlaps between the training and near transfer tasks (cf. Shipstead et al. 2012). This is particularly relevant for n-back training, because some learning processes that occur during n-back are assumed to be paradigm specific and thus not directly transferable to other WM paradigms (Shipstead et al. 2012). A recent meta-analysis found that a substantial part of near transfer following n-back training was indeed paradigm specific (Soveri et al. 2017). This demonstrates why transfer should be evaluated on the latent ability level (see Könen and Auerswald, this volume for details). Evidence for near transfer on the latent ability level would be strong evidence for training-induced increases in WM capacity and thus an optimal foundation for the investigation of far transfer effects.

Far Transfer Effects

The question of whether valid far transfer effects to different cognitive functions exist is highly controversial. They would be a central determinant of the value of WM interventions because training outcomes need to generalize to other cognitive abilities to optimally support participants in their daily life. Most views on transfer suggest that the likelihood and strength of far transfer vary as a function of the similarity in processing demands between the training and transfer tasks (see Taatgen, this volume for details). Thus, one would expect transfer to abilities that are generally known to be strongly related to WM, such as fluid intelligence, executive functions, and academic achievement (e.g., Kane et al. 2004; Peng et al. 2016, 2018). The evidence for far transfer effects, however, is mixed. Meta-analyses on WM training differed in their conclusions about the presence (Au et al. 2015, 2016; Karbach and Verhaeghen 2014; Schwaighofer et al. 2015) or absence of far transfer effects (Melby-Lervåg and Hulme 2013, 2016; Sala et al. 2019).

For example, the meta-analysis of Au et al. (2015) focused on fluid intelligence as the transfer outcome. They integrated 24 effect sizes of n-back training with healthy adults (18–50 years of age) and found small average standardized increases on fluid intelligence tasks of Hedges’s g = 0.2 between pre- and posttest compared to control groups. The meta-analysis of Schwaighofer et al. (2015) came to a similar conclusion on this issue, whereas two others did not (Melby-Lervåg and Hulme 2013; Nguyen et al. 2019). This is not surprising because different selection criteria can result in different samples and findings. For instance, Melby-Lervåg and Hulme (2013) included studies investigating different age groups from across the lifespan (up to 75 years of age) and did not differentiate between healthy and cognitively impaired participants. Considering the large individual differences in the magnitude of transfer effects, it is not surprising that data averaged over these very diverse groups do not show any significant far transfer effects on the group level. However, more evidence is needed before a converging view on far transfer to fluid intelligence can evolve in the field. Interestingly, Bürki et al. (2014) analyzed individual performance growth in WM training with younger and older adults and found that those who improved more during training showed higher gains in a fluid intelligence transfer task. This is a correlational and by no means a causal finding, but it can help to understand individual differences in transfer outcomes.

Further, recent evidence shows far transfer to executive functions (e.g., Melby-Lervåg and Hulme 2013; Nguyen et al. 2019; Salminen et al. 2012), but a complete picture with findings on all age groups and all executive functions is still missing. The meta-analysis of Melby-Lervåg and Hulme (2013), which included children and adults, demonstrated small transfer effects to inhibition (Stroop task, d = 0.3, 10 effect sizes). There is further meta-analytical evidence for small transfer effects to executive functioning (inhibition and flexibility) in adults in general (Soveri et al. 2017) and specifically in older adults (e.g., Hedges’s g = 0.2, 15 effect sizes; Nguyen et al. 2019). One meta-analysis, however, tested transfer of WM training to executive control together with other measures (fluid intelligence, processing speed, and language) and found no evidence for transfer effects over all measures in children and adults (Sala et al. 2019). Given the close theoretical and empirical relations between WM and executive functions (Friedman and Miyake 2017, for a review), it is rather surprising that we still lack a more differentiated understanding of the transfer of WM training to executive functions.

Concerning far transfer to academic achievement, the present findings on children provide converging evidence for positive effects on reading but not mathematics (Titz and Karbach 2014, for a review; see also Johann and Karbach, this volume). Findings for children and adults combined, however, do not show transfer effects to either reading or mathematical abilities (Melby-Lervåg and Hulme 2013; Schwaighofer et al. 2015). Future meta-analyses including only children will have to determine whether this transfer effect holds only for children who are still developing their reading skills.

Moderating Variables

The current controversy about the existence of far transfer effects demonstrates the importance of considering moderating variables in evaluating training and transfer effects. Possible moderating variables are training-specific features (e.g., type, intensity, and duration of training; von Bastian and Oberauer 2014, for a review), individual differences (e.g., baseline performance, age, and personality; see Katz et al., this volume, for a review), and within-person processes during training (e.g., the strength of the relation between daily motivation and WM performance; Könen and Karbach 2015). As elaborate reviews on these issues do already exist (see above), we do not repeat their empirical findings here. We are, however, strongly convinced that the failure to consider moderating variables – not only in meta-analyses but also in primary studies – could mask training and transfer effects.

Maintenance

The longevity of training-induced benefits is a key aspect of the value of WM interventions. Near transfer effects appear to be mostly stable, which even generally critical reviews acknowledge (e.g., Shipstead et al. 2012). A meta-analysis on studies with children and adults provided valuable evidence as it included 42 immediate effect sizes of near transfer to verbal WM and 11 long-term effect sizes derived from follow-up tests conducted on average 8 months after the posttests. After the removal of outliers, immediate near transfer effect sizes were moderate (Hedges’s g = 0.3–0.6) and long-term effect sizes were small to moderate (Hedges’s g = 0.2–0.4). The meta-analysis further demonstrated comparable immediate and long-term effects for visuo-spatial WM, albeit based on fewer effect sizes (Schwaighofer et al. 2015). Thus, even several months after WM training, near transfer effects to other WM tasks are still present.

The longevity of far transfer effects, however, is unclear. Important evidence comes from the COGITO study (Schmiedek et al. 2014), in which a sample of younger adults practiced 12 tests of perceptual speed, WM, and episodic memory for over 100 daily 1-hr sessions. The findings demonstrated a net far transfer effect of 0.23 to a latent factor of reasoning 2 years later (compared to a passive control group), which did not differ in size from the immediate effect 2 years earlier. This shows that intensive cognitive training interventions can have long-term broad transfer at the level of cognitive abilities. However, as this was a multidomain training, the contribution of the WM training component cannot be isolated. This is essential, since a meta-analysis on single-domain WM training studies provided no evidence for the longevity of far transfer effects (Schwaighofer et al. 2015).

Neuropsychological and Everyday Correlates

Identifying correlates to both neural functions and behavior in everyday life is another key aspect when assessing the value of WM interventions. Neuroimaging studies provided the first evidence that training-induced increases of WM performance were related to changes within a network of brain regions generally known for its association with WM functioning (i.e., dorsolateral prefrontal cortex, posterior parietal cortex, and basal ganglia; Morrison and Chein 2011, for a review). They suggest that WM training leads to neuroplastic processes that represent a reduced demand for attentional control with increasing practice (e.g., Clark et al. 2017; Thompson et al. 2016). Training-induced transfer was related to changes within networks of brain regions associated with performance on both the training and transfer tasks (cf. Morrison and Chein 2011). This could indicate that far transfer is more likely if the training and transfer tasks engage specific overlapping neural processing mechanisms and brain regions (Dahlin et al. 2008; see also Wenger and Kühn, this volume).

Correlates to behavior in everyday life are mostly tested in the context of ADHD symptoms. A meta-analysis integrated 13 effect sizes of studies with children and adults and indicated a moderate training-induced decrease of inattention in daily life (d = −0.5). Seven effect sizes from follow-up tests conducted 2–8 months after the posttests suggested persisting training benefits for inattention (d = −0.3; Spencer-Smith and Klingberg 2015). Thus, benefits of WM training might generalize to improvements in everyday functioning.

Methodological Issues

As the review above indicated, there is a huge controversy on far transfer effects of WM training. Many arguments apply to cognitive training in general but are largely discussed in the context of WM training. We briefly review three main methodological issues that have been repeatedly discussed over the years (see Schmiedek, this volume, for more details).

Adequate Control Groups

A major concern in the field of WM training is the appropriateness of the control condition(s). The field fundamentally agrees on the advantages of active control groups and the necessity of considering the type of control group in interpreting findings (passive control groups receive no treatment, whereas active control groups receive a treatment that either does not qualify as WM training or is less cognitively demanding than the WM training). The type of control group is a standard moderator tested in meta-analyses and the topic of several reviews (e.g., Green et al. 2014). There is, however, disagreement on the potential benefit of passive control groups. Some emphasize the risks of overestimating training and transfer effects and of false claims of causality in passive control designs (they cannot control for expectancy and other nonfocal effects; e.g., Melby-Lervåg and Hulme 2016). Others in turn emphasize the difficulty of finding an adequate active control condition, which produces the same nonfocal effects (e.g., which is motivating and challenging) but does not draw on WM (cf. Oberauer 2015). If the active control condition draws significantly on WM, an underestimation of training and transfer effects is likely. An obvious way to address these risks would be to include both passive and active control groups and to assess motivation and expectancy in active control groups.

Underpowered Studies

Underpowered studies with too few participants per training group are a common problem in the field. Naturally, null findings in underpowered studies should not be interpreted, but underpowered studies can theoretically produce spurious significant effects, too. Meta-analytic procedures typically weight effect sizes by the sample sizes of the included studies, but the estimates can still be affected. Given the currently large number of meta-analyses in the field summarizing mostly the same, partly underpowered studies, the field would profit greatly from carefully designed new studies and carefully conducted replications of known effects with adequate power (e.g., Brandt et al. 2014, for a tutorial). One solution for this issue is a more rigorous peer-review system that requests power estimates. A few notable exceptions exist, for example, a study on a multidomain online training (including WM training) with N = 4715 participants. It demonstrated moderate transfer effects to several cognitive functions such as WM and reasoning compared to an active control condition (Hardy et al. 2015).
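To give a sense of the sample sizes that adequate power implies, the following sketch approximates the number of participants per group needed to detect a given standardized effect in a two-sided comparison of two independent groups. It uses the normal approximation, which slightly underestimates the exact t-test requirement; the default values for alpha and power are conventional choices, not recommendations from the cited literature, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided two-group test.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)
```

Under these assumptions, a small effect of the size reported for far transfer (about 0.2) requires several hundred participants per group, far more than a moderate effect of 0.5.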

Research Bias

Research labs clearly differ in whether they hold an optimistic or a pessimistic view of WM training outcomes, particularly of far transfer. This could be very valuable because it could be the foundation of a fruitful discussion. However, the current debate is far too heated, which could, in the worst case, result in biased research. That is, it could result in a biased publication of one’s own work and a biased reading of other work. In our view, four things are helpful to address this issue: (1) consideration of labs/authors as a moderating factor in meta-analyses (e.g., in Au et al. 2015), (2) reports of Bayesian analyses, which allow for quantifying the strength of evidence in favor of both the null and the alternative hypothesis (e.g., in Gathercole et al. 2019), (3) preregistration of methods and hypotheses (e.g., Weicker et al. 2018, for a registered clinical trial), and (4) endorsement of a more differentiated perspective and language by senior researchers (e.g., Oberauer 2015) and in peer review.

Taken together, the necessary tools to overcome research bias already exist and should be applied. A recently published consensus of 48 scientists discusses further aspects of methodological standards in cognitive training research (Green et al. 2019; see also Cochrane & Green, this volume (Chap. 3)).

Conclusion

In summary, consistent evidence suggests significant average training effects and significant near transfer to nontrained WM tasks. However, evidence for far transfer to other cognitive functions is mixed, which has caused a lively controversy in the field. Still, the prospect of successful WM training has so many significant theoretical and practical implications that we should be more than motivated to investigate conflicting findings. If the existing evidence for transfer could be further validated, it would significantly impact our theoretical understanding of both WM and the transfer constructs (e.g., in terms of plasticity). It could also positively impact intervention programs, where even small gains in WM capacity and transfer constructs could make a difference relevant to everyday life (e.g., for school children relying on WM capacity to improve learning processes). Further, the large individual differences in training outcomes (Katz et al., this volume) should also motivate us to understand these differences. We agree with Colzato and Hommel (this volume) that the current controversy about the effectiveness of training is likely partly due to the failure to consider individual differences. Not considering the personality of the trained participants, their experiences, and life contexts during training could mask training effects. We should ask not only whether WM training works on average but also for whom it works and in which contexts and situations it works.