Introduction

Working memory (WM) is a key feature of our cognitive system, allowing for the simultaneous maintenance and processing of information in a controlled manner (Baddeley and Hitch 1994). WM is an excellent predictor of academic development in childhood, as it helps children to process complex tasks, update and maintain task goals, and remember and follow instructions (Alloway et al. 2009). Consistent with this, WM, which improves most during the preschool period but continues to develop well into adolescence (Gathercole et al. 2004), supports learning across academic domains, including reading and mathematics (Alloway et al. 2004, 2005; Cowan 2014; Titz and Karbach 2014, for reviews). Several WM models are currently under discussion (e.g., Baddeley 2000; Barrouillet and Camos 2012; Oberauer 2009). All of them have contributed substantially to our understanding of WM, and they agree on the basic assumptions that WM capacity is limited and that there are reliable individual differences in this capacity (e.g., Oberauer 2009). They do, however, differ in their assumptions about the components of WM. For example, there is debate about whether verbal and visuospatial information are processed and stored in different components (e.g., Alloway et al. 2006). One way to investigate this is to examine the developmental trajectories of verbal and visuospatial WM.

Among the most commonly used tasks for assessing WM abilities in childhood are complex WM span tasks. In WM span tasks, participants are presented with a sequence of stimuli (encoding phase) and are asked to repeat them in the correct or reversed order after a short delay (recall phase). Complex WM span tasks increase the demands on executive functioning by adding a processing task to the encoding phase; this task captures attention and thus prevents attention from being devoted to storage during processing episodes. For instance, participants may be asked to decide for each stimulus whether it is large or small while encoding the sequence. In large cross-sectional data sets, research has repeatedly demonstrated substantial age-related improvements in WM performance across childhood on various WM span tasks (e.g., Michalczyk et al. 2013).

While different components of WM are clearly separable in adults (e.g., Jarrold and Towse 2006; Vallar and Papagno 2002), the extent to which these components differentiate during childhood remains to be clarified. Findings related to Baddeley’s model suggest that the phonological loop and the visuospatial sketchpad are at least partially separable in children aged 4–8 years (Pickering et al. 1998; Roebers and Zoelch 2005) and 11–14 years (Jarvis and Gathercole 2003). However, this differentiation is not yet as well established as the one between the two slave systems and the central executive (Alloway et al. 2006; Gathercole et al. 2004), indicating that the phonological loop and the visuospatial sketchpad may continue to differentiate across middle childhood. Consistent with this, performance on tasks requiring the phonological loop or the visuospatial sketchpad correlates with different outcomes: performance on verbal span tasks correlates significantly with attainment in linguistic domains (e.g., English), whereas performance on spatial span tasks correlates significantly with attainment in mathematics and science (Jarvis and Gathercole 2003).

Therefore, our first aim was to clarify whether verbal and visuospatial WM processing differentiate during childhood by comparing performance on complex verbal WM (VWM) and visuospatial WM (VSWM) span tasks in children aged 4–6 and 8–10 years.

Although research has repeatedly demonstrated substantial age-related improvements in WM performance across childhood on various WM span tasks (e.g., Michalczyk et al. 2013), it is unclear whether children’s performance on complex span tasks is influenced by the verbal or visuospatial nature of the information to be recalled. Young children may perform better on VSWM than on VWM complex span tasks because they have had less practice with verbal rehearsal strategies. For instance, Flavell, Beach, and Chinsky (1966) tested 5-, 7-, and 10-year-olds on a serial recall task and found that the use of rehearsal strategies increased with age and improved item recall. Whereas younger children rehearsed only passively and mostly repeated a single item, older children and adults used rehearsal strategies actively, for example by repeating a series of items (e.g., Cuvo 1975; Hitch et al. 1993; Ornstein et al. 1977). Further evidence that young children rehearse less comes from studies with adults showing that articulatory suppression during item presentation significantly reduced recall performance and led to recall patterns similar to those found in 5-year-olds (Cowan et al. 1987). Direct evidence from studies with children showed that long and similar item names reduced rehearsal accuracy in 11-year-olds (based on VWM span tasks only; Hitch et al. 1991). Once children get older and learn to use rehearsal strategies effectively, performance differences between VWM and VSWM may disappear. However, recent evidence based on directly comparable tasks tapping both VWM and VSWM within the same sample is lacking. Thus, the present study addressed this question by directly comparing children’s performance on a complex VWM span task and a structurally comparable complex VSWM span task.

In the current study, we investigated two questions: (1) Do verbal and visuospatial WM progressively differentiate with age? (2) Do younger children show an asymmetry in performance between the verbal and visuospatial domains that is no longer present in older children? To investigate these questions, we examined children in two age groups (younger: 4–6 years; older: 8–10 years) who performed two complex WM span tasks, one verbal and one visuospatial. If VWM and VSWM progressively differentiate with age, we expected (1) a stronger correlation between the two latent factors representing VWM and VSWM in younger than in older children. Moreover, if younger children are less experienced in using verbal rehearsal than visuospatial WM processing strategies, we expected (2) that they would perform better on the VSWM span task than on the conceptually comparable VWM span task, and that this difference would be reduced or non-significant in older children.

Method

Participants

The sample comprised 226 children: 125 children aged 4–6 years (MAge = 59.7 months, SE = 0.59; 70 boys) and 101 children aged 8–10 years (MAge = 106.4 months, SE = 0.64; 59 boys). Eleven additional children were excluded from the analyses because they did not finish both tasks (n = 4), had been diagnosed with ADHD, ASD, or epilepsy (n = 6), or because of technical errors (n = 1). In both age groups, half of the data were collected in Germany and the other half in Scotland. The children were tested either in the laboratory or in a quiet room at their kindergarten or daycare. Participants were randomly assigned to start with one of the two adaptive complex WM span tasks (VWM or VSWM). Informed parental consent was obtained, and children received a certificate for participation (see Footnote 1).

Materials, design and procedures

The tasks were presented on laptop computers with the software package E-Prime 2.0 (http://www.pstnet.com/products/Eprime/; Schneider et al. 2002). The test session started with a short game to familiarize the children with the use of the computer mouse. Afterwards, they were introduced to the WM tasks.

Adaptive complex verbal working memory (VWM) span task: In the encoding phase, children heard a series of words denoting food items while seeing a picture of each item on the screen (e.g., bread, milk…). After each word, children indicated whether the pictured item was large or small by clicking on the corresponding response buttons in the lower left and right corners of the screen. At the end of the series, they saw an array of eight food items, including those that had just been presented, and were asked to reproduce the sequence of items in the correct order by clicking on the corresponding pictures. After each click, a grey dot was presented at the selected location for 500 ms to indicate that the click had been registered. Children performed four practice trials followed by 12 test trials. To ensure comparability between countries, we used one-syllable words as stimuli in both English and German. Furthermore, we considered the age of acquisition of the words (Kuperman et al. 2012) to ensure that all children were familiar with the stimuli (e.g., bread, milk, salt, egg, tea, juice, nuts, and rice).

Adaptive complex visuospatial working memory (VSWM) span task: The task procedure was identical to that of the VWM span task, except for the following differences. Children saw an apple presented at different locations in a 3 × 3 grid. For each apple, they were asked to indicate whether it was complete or had bites taken out of it by clicking on the corresponding response button in the lower left or right corner of the screen. At the end of the series, children were asked to reproduce the sequence of locations by clicking on the corresponding grid cells.

To control for between-group differences, we also assessed children’s verbal abilities and processing speed. Analyses of variance (ANOVAs) revealed no differences in language abilities or processing speed, either between children tested in Germany and Scotland or between those who performed the VWM or the VSWM task first. However, older children outperformed younger ones on both measures (see Supplementary Material for details on the tasks, descriptive statistics, reliabilities, and ANOVA results).

Task difficulty adjustment and scoring

Task difficulty (i.e., number of items to be remembered) was individually adapted as a function of each child’s performance, starting with one item in the first trial and continuing for a total of 12 trials (see Studer-Luethi et al. 2015, for a similar procedure). If children responded correctly to both the WM span task and the processing task, task difficulty was increased by one item in the following trial. If they responded incorrectly to both tasks or to the span task only, task difficulty was decreased by one item in the next trial. If they responded correctly to the span task, but incorrectly to the processing task, task difficulty did not change. We calculated two scores for each child and each WM span task (VWM and VSWM) representing average (average trial length) and maximum (maximum correct trial length) performance.
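The adjustment rule and the two scores described above can be sketched as a simple one-up/one-down staircase. This is a minimal illustration, not the E-Prime implementation used in the study: the names `Trial`, `next_length`, `run_staircase`, and `score` are hypothetical, and three details are assumptions rather than reported facts: a trial counts as "processing correct" only if every large/small (or complete/bitten) judgment was correct, trial length never drops below one item, and average performance is the mean presented length over the 12 test trials.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    length: int               # number of items presented on this trial
    span_correct: bool        # whole sequence reproduced in the correct order
    processing_correct: bool  # all processing judgments correct (assumed pass/fail rule)


def next_length(length: int, span_correct: bool, processing_correct: bool,
                min_length: int = 1) -> int:
    """Up one item after a fully correct trial, down one after a span error,
    unchanged after a correct span combined with a processing error."""
    if span_correct and processing_correct:
        return length + 1
    if not span_correct:
        return max(min_length, length - 1)  # floor of one item (assumption)
    return length


def run_staircase(responses, n_trials: int = 12, start_length: int = 1) -> list[Trial]:
    """responses: one (span_correct, processing_correct) pair per test trial."""
    trials, length = [], start_length
    for span_ok, proc_ok in responses[:n_trials]:
        trials.append(Trial(length, span_ok, proc_ok))
        length = next_length(length, span_ok, proc_ok)
    return trials


def score(trials: list[Trial]) -> tuple[float, int]:
    """Average performance: mean presented length across test trials (assumed).
    Maximum performance: longest length whose sequence was recalled correctly."""
    average = mean(t.length for t in trials)
    maximum = max((t.length for t in trials if t.span_correct), default=0)
    return average, maximum


# Hypothetical response pattern for one child (not study data)
T, F = True, False
responses = [(T, T), (T, T), (T, F), (T, T), (F, T), (T, T),
             (T, T), (F, F), (T, T), (F, T), (T, T), (F, T)]
trials = run_staircase(responses)
print([t.length for t in trials])  # [1, 2, 3, 3, 4, 3, 4, 5, 4, 5, 4, 5]
print(score(trials))               # average 43/12 (about 3.58), maximum 4
```

Because a span error lowers the length while a processing error alone only holds it constant, the sketch keeps children near the length at which they recall about half of the sequences correctly, which is why the average score tends to sit slightly below the maximum.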

Statistical modeling procedures

All models were confirmatory factor analysis (CFA) models estimated by maximum likelihood with robust standard errors (MLR; Mplus 7.4). For model identification, the first factor loading of each latent variable was fixed to one. Model fit was evaluated with the χ2 test, the Comparative Fit Index (CFI), the Root Mean Square Error of Approximation (RMSEA), and the Standardized Root Mean Square Residual (SRMR), following Beauducel and Wittmann (2005). We tested measurement invariance by comparing successively more constrained levels of invariance (configural, metric, scalar, and strict; e.g., Cheung and Rensvold 2002). There were no missing data.
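The four invariance levels named above can be summarized as nested equality constraints on a linear factor model. The notation below (child $i$, indicator $j$, group $g$, intercept $\tau$, loading $\lambda$, factor $\eta$, residual variance $\theta$, and $f(j)$ for the factor on which indicator $j$ loads) is our own; it is a sketch of the textbook formulation and not necessarily the exact Mplus parameterization used here.

```latex
\begin{aligned}
x_{ijg} &= \tau_{jg} + \lambda_{jg}\,\eta_{i f(j) g} + \varepsilon_{ijg},
  \qquad \operatorname{Var}(\varepsilon_{ijg}) = \theta_{jg},\\
&\quad j \in \{\mathrm{VWM}_{\mathrm{avg}}, \mathrm{VWM}_{\mathrm{max}}, \mathrm{VSWM}_{\mathrm{avg}}, \mathrm{VSWM}_{\mathrm{max}}\},
  \qquad f(j) \in \{\mathrm{VWM}, \mathrm{VSWM}\},\\
&\quad \lambda_{jg} = 1 \ \text{for the first indicator of each factor (identification)}.\\[4pt]
\text{configural:}&\quad \text{same loading pattern in every group; } \tau_{jg},\ \lambda_{jg},\ \theta_{jg} \text{ free}\\
\text{metric:}&\quad \lambda_{jg} = \lambda_j\\
\text{scalar:}&\quad \lambda_{jg} = \lambda_j,\ \tau_{jg} = \tau_j\\
\text{strict:}&\quad \lambda_{jg} = \lambda_j,\ \tau_{jg} = \tau_j,\ \theta_{jg} = \theta_j
\end{aligned}
```

Each level adds constraints to the previous one, so adjacent levels are nested and can be compared with a χ2-difference test; equal intercepts (scalar level) are what make latent mean comparisons across groups meaningful.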

Results

Descriptive statistics and reliabilities are shown in Table S1, and correlations among all measures are shown in Table S2. We tested the measurement invariance of VWM and VSWM performance separately for age group (4–6 vs. 8–10 years), country (Germany vs. Scotland), and sex (female vs. male). In each case, invariance was evaluated in a two-group CFA with two factors (VWM and VSWM), each measured by two indicators (average and maximum performance; cf. Fig. 1 for an example). The fit indices for all invariance levels are reported in Table 1. Overall, all models fitted the data perfectly (all χ2 ps > .11), and eight of the nine χ2-difference tests were non-significant (i.e., assuming a higher level of measurement invariance did not significantly worsen model fit). The exception was metric invariance across age groups, which resulted in a significantly higher χ2 but not in a severe drop in CFI (ΔCFI = .003), indicating a small but not severe violation of measurement invariance (e.g., Cheung and Rensvold 2002). Thus, overall, the data supported measurement invariance across age group, country, and sex (Table 1), allowing for comparisons of means and variances between these groups (e.g., Steenkamp and Baumgartner 1998).
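Because the models were estimated with MLR, the plain difference between two scaled χ2 values is not itself χ2-distributed; the Satorra–Bentler scaled difference test described in the Mplus documentation corrects it with each model's scaling correction factor. The text does not state which correction was applied, so the following is a sketch under that assumption. It also encodes the decision logic used above, a χ2-difference test plus a CFI-drop criterion of .01 (a common reading of Cheung and Rensvold 2002). The names `chi2_sf`, `scaled_chi2_diff`, and `invariance_step` are hypothetical, and the example inputs are invented, not taken from Table 1.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_i x^(i - 1/2) / (1*3*...*(2i - 1))
    term, acc = math.sqrt(x), 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        acc += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * acc


def scaled_chi2_diff(t0, c0, d0, t1, c1, d1):
    """Satorra-Bentler scaled chi-square difference test for nested MLR models.

    t0, c0, d0: MLR chi-square, scaling correction factor, and df of the more
    constrained model; t1, c1, d1: the same for the less constrained model.
    Returns (statistic, df difference, p-value).
    """
    if d0 <= d1:
        raise ValueError("model 0 must be the more constrained model (d0 > d1)")
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)  # scaling correction of the difference
    trd = (t0 * c0 - t1 * c1) / cd        # t * c recovers the uncorrected ML chi-square
    ddf = d0 - d1
    return trd, ddf, chi2_sf(trd, ddf)    # a negative trd (small samples) gives p = 1


def invariance_step(p_value, cfi_free, cfi_constrained, alpha=0.05, max_cfi_drop=0.01):
    """Flag whether one constraint step passes the chi-square test and the CFI-drop rule."""
    cfi_drop = cfi_free - cfi_constrained
    return {"chi2_ok": p_value >= alpha, "cfi_ok": cfi_drop <= max_cfi_drop}


# Hypothetical values for illustration only (not taken from Table 1)
trd, ddf, p = scaled_chi2_diff(t0=6.0, c0=1.1, d0=6, t1=2.0, c1=1.2, d1=4)
print(round(trd, 3), ddf, round(p, 3))  # 4.667 2 0.097
print(invariance_step(p, cfi_free=1.0, cfi_constrained=0.997))  # {'chi2_ok': True, 'cfi_ok': True}
```

A step that fails the χ2 test but passes the CFI rule, as with metric invariance across age groups above, is the case the text describes as a small but not severe violation.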

Fig. 1 Multi-group model with two age groups (younger/older children) and strict measurement invariance across age groups. Read from top to bottom, the digits represent latent factor correlations and standardized factor loadings, respectively, for both age groups (4–6/8–10 years).

Table 1 Tests of measurement invariance for verbal and visuospatial working memory performance in children

In a CFA model comparison, we found that two factors for WM performance (VWM and VSWM) explained the data better than one factor (WM) in both younger and older children. Two WM factors fitted the data well in both the younger [χ2(df = 2) = 0.49, p = .79; CFI = 1; RMSEA = 0 (90% CI = 0–.12); SRMR < .01] and the older age group [χ2(df = 2) = 0.15, p = .93; CFI = 1; RMSEA = 0 (90% CI = 0–.06); SRMR < .01]. Constraining the correlation between the two factors to 1, however, resulted in a severe drop in model fit in both age groups [younger: Δχ2(df = 1) = 83.06, p < .01; ΔCFI = .28; older: Δχ2(df = 1) = 157.99, p < .01; ΔCFI = .41], indicating that two factors represent the data better than one. This is not surprising given the moderate estimated factor correlations of .57 in the younger and .44 in the older age group. Using a two-group CFA (Fig. 1) with strict measurement invariance across age groups and perfect model fit (Table 1, row 4), we found that the two correlations differed significantly from each other [Δχ2(df = 1) = 9.60, p < .01; ΔCFI = .01], indicating that the relation between VWM and VSWM was stronger in younger than in older children.
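The fit statistics reported above can be cross-checked with the standard formulas. This sketch assumes the single-group RMSEA point estimate with N − 1 in the denominator (programs differ, e.g., using N or adjusting for the number of groups) and the usual CFI definition relative to an independence baseline model. The baseline χ2 passed to `cfi` is hypothetical, because baseline fit is not reported here; the helper names are likewise ours. When χ2 is smaller than its degrees of freedom, both indices reach their bounds, which is why the two-factor models show RMSEA = 0 and CFI = 1.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Single-group RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (N - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative Fit Index of model m relative to the baseline (independence) model b."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_b - df_b, chi2_m - df_m, 0.0)
    return 1.0 if den == 0.0 else 1.0 - num / den


def p_df1(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 df: erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Two-factor models reported above: chi2 < df, so RMSEA and CFI sit at their bounds
print(rmsea(0.49, 2, 125), rmsea(0.15, 2, 101))  # 0.0 0.0
print(cfi(0.49, 2, chi2_b=90.0, df_b=6))          # 1.0 (baseline chi2 is hypothetical)

# Reported 1-df difference tests: each p-value falls below .01
for delta in (83.06, 157.99, 9.60):
    print(delta, p_df1(delta) < 0.01)  # True for all three
```

With four indicators, the independence model has 4 × 3 / 2 = 6 degrees of freedom, which is why `df_b=6` is used in the example call.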

Using this two-group CFA (Fig. 1) with strict measurement invariance across age groups, we compared latent factor means (see Table S3). Older children performed significantly better than younger children on both WM factors, with very large effect sizes for both. Furthermore, performance on VWM was significantly better than on VSWM in both age groups. This effect was large in the younger and medium-sized in the older age group (see Table S3).
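The text does not specify how the effect sizes in Table S3 were standardized. One common approach, sketched here under that assumption, divides a latent mean difference by a latent standard deviation: the square root of the averaged latent variances for the between-group comparison, and the standard deviation of the latent difference score for the within-group comparison of VWM and VSWM. The latter is meaningful only if both factors share a metric, which we assume here because both tasks are scored in trial length. The function names and example inputs are hypothetical; the labels follow the conventional 0.2/0.5/0.8 benchmarks plus 1.2 for "very large".

```python
import math


def d_between(mean_ref: float, mean_cmp: float, var_ref: float, var_cmp: float) -> float:
    """Latent mean difference (cmp - ref) over the root of the averaged latent variances."""
    return (mean_cmp - mean_ref) / math.sqrt((var_ref + var_cmp) / 2.0)


def d_within(mean_x: float, mean_y: float, var_x: float, var_y: float, cov_xy: float) -> float:
    """Difference between two correlated factors in the same children, divided by the
    SD of the latent difference score (a d_z-type index); assumes a shared metric."""
    return (mean_x - mean_y) / math.sqrt(var_x + var_y - 2.0 * cov_xy)


def label(d: float) -> str:
    """Cohen (1988) benchmarks, extended with 'very large' from |d| >= 1.2 (Sawilowsky 2009)."""
    a = abs(d)
    if a < 0.2:
        return "negligible"
    if a < 0.5:
        return "small"
    if a < 0.8:
        return "medium"
    if a < 1.2:
        return "large"
    return "very large"


# Hypothetical latent parameters for illustration only (not study estimates)
print(label(d_between(0.0, 2.0, 1.0, 1.0)))       # very large
print(label(d_within(1.0, 0.5, 1.0, 1.0, 0.5)))   # medium
```

Because VWM and VSWM are positively correlated, the within-group standardizer shrinks as the covariance grows, so the same raw mean difference yields a larger d_z in a group with a stronger VWM–VSWM correlation.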

Discussion

In the current study, we investigated younger and older children’s performance on adaptive complex VSWM and VWM span tasks. Specifically, we tested (1) whether VWM and VSWM progressively differentiate with age and (2) whether younger children show performance differences between VWM and VSWM span tasks that are no longer evident in older children. Older children performed significantly better than younger children on both span tasks, replicating previous findings (e.g., Gathercole et al. 2004; Michalczyk et al. 2013). Using confirmatory factor modeling, we found that VSWM and VWM were significantly correlated. Critically, the correlation was significantly higher in younger than in older children (see Footnote 2). Moreover, both age groups performed significantly better on the VWM than on the VSWM span task, which is inconsistent with previous data (e.g., Cowan et al. 1987).

The current findings show that WM processes in the verbal and visuospatial domains are already partially separable in children, which is consistent with the distinction between the phonological loop and the visuospatial sketchpad proposed by Baddeley (Baddeley and Hitch 1994). However, our findings are also consistent with other WM models that do not include distinct visuospatial resources (e.g., Barrouillet and Camos 2012; Oberauer and Lewandowsky 2013). Such models posit a separable verbal component but assume that visuospatial processing draws on the general WM resource. For instance, the latest time-based resource-sharing (TBRS) model of Barrouillet and Camos (2014) differs from Baddeley’s model in that it does not include a visuospatial WM subsystem. It comprises a verbal rehearsal component and a general resource (coordinating attentional refreshing), but no visuospatial counterpart to the verbal rehearsal component. Consistent with this, there is evidence suggesting that visuospatial measures may reflect the general WM resource (i.e., sustained attention; Vergauwe et al. 2010; see also Gray et al. 2017), whereas verbal WM seems to be a distinct component (e.g., Kane et al. 2004; Park et al. 2002). The separable but correlated VWM and VSWM processes found in the current study are thus consistent both with the WM structure of Baddeley’s model (i.e., representing the phonological loop and the visuospatial sketchpad) and with the structure proposed by Barrouillet and Camos (i.e., representing the verbal rehearsal component and the general resource). However, although the VWM task relied primarily on verbal rehearsal, it included visual stimuli. Thus, one could argue that children were able to rely on visual processes instead of verbal rehearsal to solve the task.
If children had mainly used such a visual strategy (even though visuospatial processing is more demanding than verbal processing), this would have favored a one-factor solution. Given that our data clearly supported a two-factor model, we consider it unlikely that visual strategies were the primary means of solving the task. Although the two tasks used in this study do not allow us to distinguish between general and domain-specific variance in the model, the higher correlation between the complex VWM and VSWM span tasks in younger than in older children suggests at least an age-related differentiation into distinct components.

Notably, we did not test different WM models against each other (see, for example, Wilhelm et al. 2013), and our data would not allow for such an approach. We were interested in whether VWM and VSWM progressively differentiate with age, a pattern we found to be consistent with prominent WM models (e.g., Baddeley 2000; Barrouillet and Camos 2012). Our main focus was the comparison of age groups in childhood. Age-related differentiation and age-related performance differences in childhood are important characteristics of WM processing in particular and of cognitive functioning in general, because WM capacity constrains performance in a wide range of other cognitive activities (e.g., Oberauer 2009).

Both older and younger children performed better on the VWM than on the VSWM span task. This is surprising given our hypothesis that younger children in particular should perform better on the VSWM span task because their use of rehearsal strategies might not yet be fully developed. During testing, we observed that children in both age groups tended to name the stimuli on the screen aloud in the VWM span task (cf. Karbach and Kray 2007), indicating that even the younger children already used verbal rehearsal strategies. This is consistent with the fact that the younger sample in this study already had strong verbal abilities (see Table S1). Interestingly, children did not use overt verbal strategies in the VSWM span task. Moreover, the VWM and VSWM span tasks used in the present study were structurally and conceptually very similar, which may also have contributed to the fact that children did not perform worse on the VWM span task.

The present study has some limitations. First, one might argue that the less pronounced differentiation of WM processes in younger children is due to greater measurement error at that age. However, the correlations were tested in a multi-group model with strict measurement invariance. Thus, factor loadings, intercepts, and residual variances were constrained to be equal across both groups, which allows the correlations between the factors to be interpreted and compared. Second, we acknowledge a slight difference between the two complex span tasks with respect to the processing task used during the encoding phase. Whereas the VWM task taps verbal memory and combines it with visual processing of the stimuli in the processing task (a cross-domain combination), the VSWM task taps spatial memory and visual processing only (a within-domain combination). Thus, interference between storage and processing may have been stronger in the VSWM task, which may have contributed to the finding that children performed better on the VWM than on the VSWM task. However, this remains speculative, since we did not include simple span tasks that measure storage alone.

In sum, we found age-related improvements in both VWM and VSWM from 4–6 to 8–10 years of age. We demonstrated measurement invariance across the age groups, which is encouraging for future longitudinal research because it shows that these age groups can be compared on the same WM measures. At the same time, VWM and VSWM became more separable with age, pointing to significant changes in the structure of WM during middle childhood. Further research with a broader set of WM tasks is needed to distinguish between theoretical models and their subcomponents and to understand how the WM system matures during normal development.