Introduction

Anticipation represents a ubiquitous and central characteristic of sequence learning (Cleeremans & McClelland, 1991; Dale, Duran, & Morehead, 2012). However, the exact nature of the relation between anticipation during sequence learning and memory remains poorly understood (Dale et al., 2012; Janacsek & Nemeth, 2013; Schwarb & Schumacher, 2012). For example, working memory (WM) could be involved in sequence learning, because WM is needed for retrieval of information from long-term memory (Martini, Furtner, & Sachse, 2013). On the other hand, it is possible that WM is not related to sequence learning because of the relative automaticity of this type of learning (e.g., Unsworth & Engle, 2005). Understanding how these systems interact appears to be critical in elucidating the mechanisms underlying sequential learning. To this end, we investigated whether individuals’ anticipatory behavior changes as a function of working memory (WM) capacity during implicit sequence learning.

Sequence learning is one of the fundamental cognitive abilities that enables individuals to acquire representations of their environment. Specifically, sequence learning is a mechanism responsible for the acquisition of abstract knowledge of regularities present in the environment. Previous research has reported evidence that such distributional learning can take place implicitly—without awareness and explicit instructions (Stadler & Frensch, 1998; Turk-Browne, Scholl, Chun, & Johnson, 2009). Implicit sequence learning underpins many aspects of human behavior, such as language and various motor skills (Cleeremans & McClelland, 1991; Conway, Bauernschmidt, Huang, & Pisoni, 2010; Hunt & Aslin, 2001; Masters, 1992). For example, Conway et al. (2010) demonstrated a correlation between performance on an implicit learning task and sensitivity to word predictability in speech. The authors suggested that increased implicit learning capabilities result in more detailed representations of the word order probabilities, leading to improvements in speech perception.

The relation between implicit sequence learning [typically measured by reaction time (RT) in the serial reaction time (SRT) task] and WM capacity has been studied extensively (e.g., Bo, Jennett, & Seidler, 2012; Caljouw, Veldkamp, & Lamoth, 2016; Feldman, Kerr, & Streissguth, 1995; Guzmán, 2018; Kaufman et al., 2010; Unsworth & Engle, 2005; Weitz, O’Shea, Zook, & Needham, 2011; Yang & Li, 2012). However, previous studies reported mixed results regarding the relation between the two systems, potentially because researchers used not only different WM capacity tests (i.e., visuospatial, verbal, or numerical) but also different measures of learning in the SRT task (e.g., the difference in average RT between blocks with a training sequence and blocks with a random sequence, or the rate of RT improvement across blocks of the SRT task; for a review, see Janacsek & Nemeth, 2013).

For example, a number of previous studies found no relation between implicit sequence learning and WM capacity (Caljouw et al., 2016; Guzmán, 2018; Jimenez & Vazquez, 2005; Jongbloed-Pereboom, Nijhuis-van der Sanden, & Steenbergen, 2019; Masters, 1992; Meissner, Keitel, Südmeyer, & Pollok, 2016; Unsworth & Engle, 2005; Yang & Li, 2012). Unsworth and Engle (2005) reported that there were no differences in implicit learning in a manual version of the SRT between high and low WM capacity individuals. The authors measured implicit learning as a difference in average RT between sequence blocks and random trials, while WM capacity was measured using operation span (a numerical task). On the other hand, the authors reported WM capacity effects on explicit sequence learning when participants were aware of the sequence and the learning goal (see also Bo, Borza, & Seidler, 2009; Weitz et al., 2011). Together, these findings were interpreted to indicate the importance of WM capacity in tasks that require some form of control (i.e., explicit learning), compared to relative automaticity of implicit learning (see also Kaufman et al., 2010).

In contrast, evidence from different domains suggests that implicit learning and WM capacity could be related. For example, Bo et al. (2012) reported a significant positive correlation between visuospatial WM capacity and the rate of RT improvement in the SRT task. Visuospatial WM was measured using a change detection task, while implicit learning was measured as the change in median RT across sequential blocks. The authors argued that learning in the SRT task relies on the number of items that individuals can hold in WM.

Interestingly, there is also evidence that WM and implicit learning could be negatively correlated. Based on findings obtained with artificial neural network models of language learning, Elman (1993) reported that learning can be improved under conditions of limited WM capacity. Likewise, Newport (1990) hypothesized that developmentally immature language learners focus on simpler linguistic structures because of their limited WM capacity (see also Erickson & Thiessen, 2015). In this way, early learners or learners with limited WM capacity grasp only constituents of speech, and then combine them into more complex structures. Late learners or learners with larger WM capacity, conversely, can take in more complex structures, which appears to put them at a disadvantage for learning (we will return to this in our “General Discussion”).

A series of recent behavioral and brain imaging studies has demonstrated that weaker executive functions can lead to better implicit learning (Nemeth, Janacsek, Polner, & Kovacs, 2013; Tóth et al., 2017; Virag et al., 2015). For example, Nemeth et al. (2013) reported increased sequence learning in hypnosis, compared to a waking alert condition. The authors attributed such improved learning to the disruption to the executive system caused by hypnotic instructions. In addition, using EEG data, Tóth et al. (2017) found that a better statistical learning score was related to a lower strength of connectivity between the sensorimotor and cognitive control brain regions.

Moreover, recent studies have set out to demonstrate that WM capacity and implicit learning rely, at least partly, on shared brain networks, which would imply a relation between the two systems (e.g., Hasson, Cashdollar, Weisz, & Ruhnau, 2016; Janacsek & Nemeth, 2013). Using magnetoencephalography recordings, Hasson et al. (2016) found increased neural activity in higher WM capacity individuals when visual stimuli occurred with greater statistical regularity. In addition, the SRT task has been demonstrated to recruit a brain network that includes the dorsolateral prefrontal cortex (DLPFC), which also plays a role in conscious executive processes (Torriero et al., 2007).

Present investigation

In the current study, we investigate the relation between WM capacity and anticipatory measures during implicit sequence learning. The WM capacity measure reported here represents a combined score of three different complex verbal and visuospatial span tasks. This more encompassing WM capacity measure should account for the potential differential correlations of different WM capacity measures with implicit sequence learning (cf., Janacsek & Nemeth, 2013). In addition, implicit learning was measured using anticipation measures only, as they represent strong indicators of learning (Dale et al., 2012; Nissen & Bullemer, 1987; Schvaneveldt & Gomez, 1998; Stadler, 1989). We used the oculomotor variation of the serial reaction time task (SRT task; Kinder, Rolfs, & Kliegl, 2008; Marcus, Karatekin, & Markiewicz, 2006; Vakil, Bloch, & Cohen, 2017) previously demonstrated to provide measures of anticipatory responses, a feature that is hard to examine in manual versions of the SRT task. This is because manual responses in the SRT task are typically made after the stimulus onset, with only a limited number of pre-target button presses that would indicate anticipations (Marcus et al., 2006). In contrast, anticipatory eye movements appear at a relatively high rate in this task, and can be measured directly by recording eye movements (i.e., anticipations appear if participants transition gaze towards a next position before the subsequent stimulus appears; Marcus et al., 2006; Vakil et al., 2017). We investigate the relation between direct measures of anticipation, namely the overall number of anticipations and the number of correct anticipations, and individual differences in WM capacity. The anticipation measures reported here could be conceptualized as indicators of learning strategies (i.e., the overall number of anticipations) and learning outcomes (i.e., the number of correct anticipations) in the SRT task. 
Such conceptualization allows us to investigate how these processes unfold during sequence learning, and how they relate to individual differences in WM capacity. In addition, we provide a more nuanced analysis of learning efficiency, and by extension of learning strategies, by investigating the relation between WM capacity and individual differences in the number of consecutive correct anticipations (i.e., chunking patterns). Previous research investigating explicit sequence learning has reported a positive correlation between WM capacity and chunking, defined by faster RTs of groups of movements (Bo & Seidler, 2009; Kennerley, Sakai, & Rushworth, 2004; Sakai, Kitaguchi, & Hikosaka, 2003; Shea, Park, & Braden, 2006). This positive correlation is usually taken to indicate a WM imposed limit on sequence elements that can be considered during explicit learning (on average around three items in length; e.g., Bo & Seidler, 2009). Thus, lower and higher WM spans potentially use different sequence learning strategies. To our knowledge, the relation between WM capacity and chunking, here directly measured by the length of correct anticipation sequences, has not been reported in the context of implicit sequence learning.

During a typical SRT task, participants are asked to follow a target on the screen. Critically, unknown to the participants, the target presentation follows a fixed (i.e., to-be-learned) sequence, and at some point, a different (i.e., interfering) sequence is intercalated. We expect that the number of correct anticipations (henceforth correct anticipations) should increase across the learning blocks (i.e., during presentation of a to-be-learned sequence), decrease in the interference block (i.e., during presentation of a different sequence), and then again increase in the recovery block (i.e., another presentation of the learning sequence; Marcus et al., 2006; Vakil et al., 2017). In addition, we report the changes in overall number of anticipations (both correct and incorrect; henceforth anticipations).

Based on a number of previous studies (e.g., Caljouw et al., 2016; Guzmán, 2018; Kaufman et al., 2010; Unsworth & Engle, 2005), a relation between WM capacity and implicit sequence learning would not be expected. However, if WM capacity and implicit learning at least partly rely on shared mechanisms, we should expect that the two systems are related to some extent (Janacsek & Nemeth, 2013). As noted in the introduction, the potential direction of this relation remains a matter of debate. For example, participants with higher WM capacity could have additional resources available during the learning phase, which could lead to more anticipations and more correct anticipations compared to participants with lower spans. By this account, WM capacity imposes an upper bound on the number of items that can be considered during the SRT task. A positive relation between WM capacity and the number of consecutive correct anticipations (i.e., chunks) would further corroborate this account.

Method

Participants

Participants were 35 students and staff (28 female; mean age 27.4; age range 18–56) from the University of Sheffield. Participants received a £7 Amazon voucher. All participants had normal or corrected-to-normal vision. Two additional participants were excluded, because they failed to complete the experiment. The study was a part of a larger research project on language learning.

Materials

The serial reaction time task

We introduced a rapid oculomotor version of a deterministic SRT task, in an attempt to attenuate potential explicit awareness of the task (see Footnote 1). The oculomotor SRT task, based on the digital SRT task by Nissen and Bullemer (1987; see also Kinder et al., 2008; Marcus et al., 2006; Vakil et al., 2017), was implemented using OpenSesame (Mathôt, Schreij, & Theeuwes, 2012). Eye movements were recorded using an EyeLink Portable Duo eye tracker (SR Research, ON, Canada), tracking at a sampling rate of 500 Hz in the head-stabilized mode. A nine-point calibration procedure was used. Tracking was monocular, using each participant’s dominant eye, while viewing was binocular. Overall, the right eye was recorded for 69% of participants (n = 24). Stimuli consisted of five slides, with a resolution of 1024 × 768 pixels. Each slide contained four white 65 × 65 mm squares on a gray background (the white squares were also our areas of interest (AOIs); see Fig. 1). The target (a black circle) with a diameter of 20 mm appeared in different white squares across four slides, while the fifth slide was the “anticipatory” slide and contained blank squares only. The slides were presented centrally on a 21 in. monitor (refresh rate: 60 Hz), 70 cm away from participants’ eyes. Individual white squares subtended visual angles of 5.5° horizontally and vertically, while the target subtended a visual angle of 1.7°. We used second-order conditional sequences (SOC; Gabriel et al., 2013; Vakil et al., 2017; Wilkinson & Shanks, 2004), meaning that a target location could be predicted only if the two preceding locations were considered. We used two sequences: “342312143241” and “341243142132” (adopted from Wilkinson & Shanks, 2004). Here, the numbers 1–4 correspond to the four positions: down, left, right, and up. Each sequence served either as the learning or the interfering sequence, and the order of sequences was counterbalanced across the participants.
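The second-order conditional property described above can be checked mechanically. The following is a minimal Python sketch, not the authors' code; the function names are illustrative, and it assumes that each sequence is treated as cyclic because it repeats within a block.

```python
from collections import defaultdict


def successors(sequence, order):
    """Map each run of `order` consecutive locations to the set of
    locations that follow it (sequence treated as cyclic: an assumption)."""
    n = len(sequence)
    following = defaultdict(set)
    for i in range(n):
        context = tuple(sequence[(i + k) % n] for k in range(order))
        following[context].add(sequence[(i + order) % n])
    return following


def is_second_order_conditional(sequence):
    """True if one preceding location never determines the next location,
    while two preceding locations always do."""
    first = successors(sequence, 1)
    second = successors(sequence, 2)
    ambiguous_alone = all(len(nxt) > 1 for nxt in first.values())
    determined_by_pair = all(len(nxt) == 1 for nxt in second.values())
    return ambiguous_alone and determined_by_pair


for seq in ("342312143241", "341243142132"):
    print(seq, is_second_order_conditional(seq))
```

Under this definition, a participant who tracks only the current location cannot predict the next one, which is why correct anticipations can be read as evidence of having learned transitions that span two locations.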

Fig. 1 Anticipatory slide (left panel) and one of the target slides (right panel)

Explicit knowledge questionnaire

To assess sequence awareness, participants were asked the following questions after they had completed the task: (1) Did you notice anything special about the experiment?; (2) Did you notice any patterns during the experiment?; (3) If so, could you explicitly recall the pattern?; (4) If so—please write the pattern down.

Working memory tasks

Working memory capacity was measured using automated versions of three complex span tasks: the operation span, reading span, and symmetry span. The tasks were administered using Tatool (von Bastian, Locher, & Ruflin, 2013), a Java-based programming framework.

In the operation span task, participants were shown a random number that needed to be remembered. Each number was followed by a math problem (e.g., 3 × 7 = 21) and participants were asked to make a decision on the veracity of the provided answer (half of the answers were correct). At recall, participants were asked to type in the random numbers which they had seen, in the order of presentation. Their final score was the number of correct items in the correct order. Set sizes (number–math problem pairs) ranged from 3 to 7, and each set was presented three times in random order.

In the reading span task, participants were presented with a number that needed to be remembered. Each number was followed by a sentence and participants were asked to determine whether the sentence made sense or not (half of the sentences made sense). At recall, participants were asked to report the presented numbers, in the order of presentation. The final score was the number of correct items in the correct order. Set sizes (number–sentence pairs) ranged from 2 to 6, and each set was presented three times in random order.

In the symmetry span task, participants saw a 4 × 4 grid with one of the cells filled in blue. This was followed by a presentation of an 8 × 8 grid where some squares were filled, and participants were asked to decide whether the filled square pattern was symmetrical about the vertical axis (the pattern was symmetrical half of the time). At recall, participants were asked to reconstruct the sequence of the previously filled cells, in the order of appearance. The final score was the number of cell locations recalled in the correct order. Set sizes ranged from 2 to 5, and each set was presented three times in random order.

We created a composite WM capacity score for each participant by z-transforming the scores on the three complex span tasks and averaging them (e.g., Harrison, Shipstead, & Engle, 2015; Kane et al., 2007; Unsworth, 2017; the average correlation among WM capacity measures was 0.47; descriptive statistics for WM capacity measures are provided in Table S1 in Supplementary Material).
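The composite score can be illustrated with a short Python sketch. The scores below are invented for four hypothetical participants, and the use of the sample standard deviation is an assumption, as the text does not specify it.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize one task's scores (sample SD: an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite(*tasks):
    """Average the z-scores of each participant across tasks."""
    standardized = [z_scores(task) for task in tasks]
    return [mean(per_participant) for per_participant in zip(*standardized)]


# Invented scores for four hypothetical participants (not data from the study).
operation = [40, 55, 62, 70]
reading = [30, 41, 50, 47]
symmetry = [18, 25, 22, 33]

wmc = composite(operation, reading, symmetry)
print([round(score, 2) for score in wmc])
```

Standardizing before averaging puts the three tasks on a common scale, so that a task with a wider score range (here, larger set sizes) does not dominate the composite.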

Procedure

Participants were seated in front of the monitor, with head position controlled using a chinrest. The calibration was performed at the beginning of the experiment, while drift check and correction (if required) were performed at the beginning of each block. In the SRT task, participants were instructed to follow the target on the screen. The experiment began with 12 practice trials (randomly generated sequences). The experiment consisted of six blocks, each containing a 12-element sequence repeated five times (i.e., 60 trials within a block). At the beginning of each trial, the anticipatory slide (i.e., blank squares) was presented for 500 ms, followed by the presentation of the target (i.e., a black circle inside the square) for 1100 ms. The first four blocks were learning blocks (Block 1–Block 4). Each of these blocks started from a different point in the sequence. The learning blocks were followed by an interfering block, containing a different 12-element sequence (Block 5). Finally, the original sequence was reintroduced in a recovery block (Block 6). The SRT task, administered in one session, took approximately 15 min to complete. After the SRT task, participants filled out the explicit knowledge questionnaire. Finally, the WM capacity battery was administered.
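The block structure described above can be sketched in Python as follows. This is an illustration only: the starting offsets of the learning blocks and the starting point of the recovery block are hypothetical, since the text states only that each learning block started from a different point in the sequence.

```python
REPETITIONS_PER_BLOCK = 5
BLANK_MS, TARGET_MS = 500, 1100


def rotate(sequence, start):
    """Start the cyclic sequence from position `start`."""
    return sequence[start:] + sequence[:start]


def build_session(learning, interfering, starts=(0, 3, 6, 9)):
    """Return six blocks of target locations.
    `starts` holds hypothetical offsets for Blocks 1-4."""
    blocks = [rotate(learning, s) * REPETITIONS_PER_BLOCK for s in starts]
    blocks.append(interfering * REPETITIONS_PER_BLOCK)  # Block 5: interference
    blocks.append(learning * REPETITIONS_PER_BLOCK)     # Block 6: recovery
    return blocks


session = build_session("342312143241", "341243142132")
n_trials = sum(len(block) for block in session)
presentation_minutes = n_trials * (BLANK_MS + TARGET_MS) / 60000
print(n_trials, presentation_minutes)
```

With these timings, stimulus presentation alone accounts for a little under 10 min; the remainder of the reported session time would cover practice trials, calibration, and drift checks.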

Results

We used the R Environment for Statistical Computing (R Core Team, 2018) and the lme4 package (version 1.1-17; Bates, Maechler, Bolker, & Walker, 2015) to fit generalized linear mixed-effects models with a binomial family and logit link function (i.e., logistic GLMMs). Type II Wald Chi-square tests of models and parameter confidence intervals were obtained using the car package (Fox & Weisberg, 2011), while slope analyses were performed and plotted using the jtools package (Long, 2018). Additional data visualization was done using the sjPlot package (Lüdecke, 2018).

We fit a series of mixed-effects logistic regression analyses to our two dependent variables of interest (DVs are analyzed separately): (1) anticipations—the overall number of anticipations (correct plus incorrect); and (2) correct anticipations—the number of correct anticipations. Anticipations appeared if participants transitioned their gaze towards a different (potential) target location during the presentation of the blank slide (i.e., during the first 500 ms of each trial); anticipations were correct if the participant’s gaze remained within the correct AOI at the time the target appeared (otherwise, anticipations were incorrect).
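The scoring rule for the two dependent variables can be sketched as follows. This Python sketch is a simplification, not the authors' scoring pipeline. It assumes that gaze during the blank slide is available as a list of AOI indices (1–4, or None when gaze is outside all AOIs), that the last sample approximates gaze position at target onset, and that an anticipation means gaze entering an AOI other than the previous target location.

```python
def classify_trial(blank_gaze, previous_target, next_target):
    """Classify one trial from gaze samples recorded during the blank slide.

    blank_gaze: AOI index (1-4) per sample, None when outside all AOIs.
    Returns "no anticipation", "correct", or "incorrect".
    """
    visited = [aoi for aoi in blank_gaze if aoi is not None]
    anticipated = any(aoi != previous_target for aoi in visited)
    if not anticipated:
        return "no anticipation"
    at_onset = blank_gaze[-1]  # assumed proxy for gaze at target onset
    return "correct" if at_onset == next_target else "incorrect"


trials = [
    ([3, 3, None, 4, 4], 3, 4),  # gaze moves to the upcoming location
    ([3, 3, 2, 2], 3, 4),        # gaze moves to a wrong location
    ([3, 3, 3], 3, 4),           # gaze stays on the previous location
]
labels = [classify_trial(*trial) for trial in trials]
n_anticipations = sum(label != "no anticipation" for label in labels)
n_correct = labels.count("correct")
print(labels, n_anticipations, n_correct)
```

The overall number of anticipations is the sum of correct and incorrect trials, so the two dependent variables differ only in whether incorrect anticipations are counted.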

We entered Block (factor), WM capacity (covariate), and their interaction as fixed effects, and a by-participant random intercept: dv ~ block × wmc + (1|participant). The fit of this model was compared with that of a reduced model without the interaction term: dv ~ block + wmc + (1|participant), and with that of a null model containing only a constant term (the intercept): dv ~ 1 + (1|participant). Fixed-effects structures were compared using the anova function and on the basis of the Akaike information criterion (AIC), with lower values indicating a better model. For additional information about the model selection, see Supplementary Material (AIC values across the models are presented in Table S2; see also Footnote 2).
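The AIC-based comparison can be illustrated with a small Python sketch. The log-likelihoods below are invented for illustration and are not results from the study; the parameter counts assume a four-level Block factor (as in the learning phase) plus one random-intercept variance. The formulas are written with `*`, the R operator for main effects plus interaction.

```python
def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_parameters - 2 * log_likelihood


# (log-likelihood, number of parameters); log-likelihoods are invented.
candidates = {
    "dv ~ 1 + (1|participant)": (-1250.0, 2),
    "dv ~ block + wmc + (1|participant)": (-1235.0, 6),
    "dv ~ block * wmc + (1|participant)": (-1220.0, 9),
}

scores = {formula: aic(ll, k) for formula, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
for formula, value in scores.items():
    print(f"{value:8.1f}  {formula}")
print("preferred:", best)
```

Because AIC adds a penalty of 2 per parameter, the interaction model is preferred only when its gain in log-likelihood outweighs its extra parameters.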

Below, we report the results from different phases of performance: learning (Blocks 1–4), interference (Block 4 vs. Block 5), recovery (Block 5 vs. Block 6), and baseline (Block 1 vs. Block 5; see Footnote 3).

In addition, we used the rle (run length encoding) function from the base package (R Core Team, 2018) to compute the lengths of runs of consecutive correct anticipations. This index represents a measure of chunking, or the grouping size of consecutive correct anticipations, across the entire SRT task. Chunk size (i.e., the number of consecutive correct anticipations; the outcome) was regressed on WM capacity (i.e., the predictor) using the generalized linear model function (glm) from the stats package (i.e., consecutive_correct_anticipations ~ wmc).
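The run-length computation can be sketched in Python with itertools.groupby, which plays the role of R's rle here. The trial outcomes are invented, and the function name and the `min_length` parameter are illustrative.

```python
from itertools import groupby


def correct_run_lengths(outcomes, min_length=1):
    """Lengths of runs of consecutive correct anticipations.

    outcomes: one boolean per trial, True for a correct anticipation.
    """
    lengths = []
    for is_correct, run in groupby(outcomes):
        n = len(list(run))
        if is_correct and n >= min_length:
            lengths.append(n)
    return lengths


# Invented outcomes for eight trials of one hypothetical participant.
outcomes = [True, True, False, True, True, True, False, True]
print(correct_run_lengths(outcomes))                # all runs of correct trials
print(correct_run_lengths(outcomes, min_length=2))  # runs of at least two
```

Setting `min_length=2` keeps only runs that contain at least two consecutive correct anticipations, which matches the chunk criterion applied in the final analysis.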

Anticipations

In our first set of analyses, we investigated the relation between WM capacity and anticipations (as indicators of learning processes) in the SRT task.

Learning (Blocks 1–4)

There was an interaction between Block and WM capacity, such that the relation between WM capacity and the predicted probability of an anticipation differed across blocks: χ2 (3) = 31.59, p < 0.001 (Slopes: Block 1: b = − 0.56, SE = 0.19, p = 0.003; Block 2: b = − 0.41, SE = 0.19, p = 0.03; Block 3: b = − 0.31, SE = 0.19, p = 0.09; Block 4: b = − 0.09, SE = 0.19, p = 0.63). Figure 2 presents the predicted values of anticipations as a function of WM capacity across learning blocks.

Fig. 2 Interaction plot of predicted probabilities of anticipations calculated for working memory capacity levels across individual learning blocks (WMC working memory capacity, z score)

Thus, the results of our first analysis suggest that individual differences in WM capacity could be related to implicit sequence learning strategies. Specifically, there was a negative relation between WM capacity and anticipations: anticipations decreased with increased WM span. The strength of this relation decreased across learning blocks, with eventual attenuation in Block 4.

Interference (Block 4 vs. Block 5)

There was an interaction between Block and WM capacity: χ2 (1) = 13.73, p < 0.001 (Slopes: Block 4: b = − 0.11, SE = 0.20, p = 0.58; Block 5: b = − 0.43, SE = 0.20, p = 0.03). Thus, the results of the analysis revealed that WM capacity affected implicit sequence learning strategies. Specifically, there was no effect of WM capacity in the last learning block (Block 4). However, when another sequence was introduced in the interference block (Block 5), there was a negative relation between WM capacity and anticipations, similar to the starting blocks of the learning phase (Blocks 1 and 2). Again, increased WM capacity predicted fewer anticipations.
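The link between a slope, its standard error, and its p value can be sketched in Python. The sketch assumes that the slope p values are two-sided Wald z tests, which the text does not state explicitly; under that assumption, the Block 4 and Block 5 slopes reported above give p values that round to the reported 0.58 and 0.03.

```python
from math import erfc, sqrt


def wald_p(b, se):
    """Two-sided p value for a Wald z test of a coefficient (assumed test)."""
    z = b / se
    return z, erfc(abs(z) / sqrt(2))


for label, b, se in (("Block 4", -0.11, 0.20), ("Block 5", -0.43, 0.20)):
    z, p = wald_p(b, se)
    print(f"{label}: z = {z:.2f}, p = {p:.2f}")
```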

Recovery (Block 5 vs. Block 6)

There was an interaction between Block and WM capacity: χ2 (1) = 9.65, p = 0.002 (Slopes: Block 5: b = − 0.40, SE = 0.18, p = 0.03; Block 6: b = − 0.14, SE = 0.18, p = 0.44). While increased WM capacity was related to a decrease in anticipations when another sequence was introduced (Block 5), there was no effect of WM capacity on anticipations in Block 6, when the original sequence was reintroduced. The Block 6 slope resembled the slopes from the late stages of the learning phase (Block 4 in particular).

Baseline (Block 1 vs. Block 5)

There was a significant effect of Block on anticipations: χ2 (1) = 19.63, p < 0.001, such that anticipations increased in the interference block compared to the baseline. Moreover, there was a significant effect of WM capacity: χ2 (1) = 7.81, p = 0.005, such that anticipations decreased with increased WM capacity. There was no interaction. Consistent with the results from the other phases of the SRT task, there was a negative relation between WM capacity and anticipations.

The analyses demonstrated that anticipatory behavior changed dynamically across the SRT task as a function of WM capacity. In addition, WM capacity was negatively related to the overall number of anticipations across the task: the greater the WM capacity, the less probable the anticipation.

Correct anticipations

Next, we investigated the relation between WM capacity and correct anticipations (or learning outcomes).

Learning (Blocks 1–4)

Analysis demonstrated that correct anticipations increased across learning blocks: χ2 (3) = 17.94, p < 0.001 (see Fig. 3). There was no effect of WM capacity: χ2 (1) = 0.29, p = 0.587, and no interaction. These results indicate that, in contrast to the frequency of anticipations, correct anticipations were not affected by individual differences in WM capacity. On the other hand, correct anticipations increased gradually over the learning blocks.

Fig. 3 Predicted probabilities of correct anticipations across learning blocks

Interference (Block 4 vs. Block 5)

There was a significant effect of Block on correct anticipations: χ2 (1) = 6.27, p = 0.012, such that predicted probabilities of correct anticipations decreased in the interference block. There was no effect of WM capacity: χ2 (1) = 0.53, p = 0.468, and no interaction. Thus, anticipation accuracy was not affected by WM capacity. Overall, participants’ anticipations were more accurate in the last learning block, compared to the interference block, where another sequence was introduced.

Recovery (Block 5 vs. Block 6)

There was no effect of Block or WM capacity, and no interaction (see Footnote 4). Thus, similar to previous phases of the SRT task, WM capacity did not affect anticipation accuracy.

Baseline (Block 1 vs. Block 5)

None of the models outperformed the null model. Consistent with the results from the other stages of the SRT task, WM capacity did not affect anticipation accuracy.

Taking all results together, participants clearly demonstrated learning in the SRT task, as indexed by an increase in anticipation accuracy over the four learning blocks, followed by reduced accuracy in the interference block. These results are consistent with previous studies (Marcus et al., 2006; Vakil et al., 2017). Critically, changes in anticipation accuracy during the SRT task were not related to individual differences in WM capacity.

Correlation analysis: WM capacity and errors

The results of our previous analyses demonstrating a negative relation between WM capacity and anticipations, in conjunction with the lack of a relation between WM capacity and correct anticipations, suggest a negative relation between WM capacity and errors (or incorrect anticipations). To examine this relation, we computed the bivariate correlation between the overall number of errors and WM capacity. There was a negative correlation between the two measures, r(33) = − 0.453, p = 0.006. The partial correlation coefficient between the two measures, controlling for age, gender, and education, was comparable in magnitude, r(30) = − 0.451, p = 0.010. Thus, the number of errors decreased with increased WM capacity.
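The degrees of freedom reported for the two correlations follow from the sample size and the number of covariates. The Python sketch below uses the standard t transformation of a correlation coefficient; the function name is illustrative, and it is assumed that the reported tests are of this form.

```python
from math import sqrt


def correlation_t(r, n, n_covariates=0):
    """Degrees of freedom and t statistic for a (partial) correlation.

    df = n - 2 - number of covariates; t = r * sqrt(df / (1 - r^2)).
    """
    df = n - 2 - n_covariates
    t = r * sqrt(df / (1 - r * r))
    return df, t


print(correlation_t(-0.453, 35))                  # bivariate correlation
print(correlation_t(-0.451, 35, n_covariates=3))  # age, gender, education
```

With 35 participants, the bivariate test has 33 degrees of freedom, and controlling for three covariates leaves 30, matching the values reported above.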

WM capacity and the number of consecutive correct anticipations

In our final analysis, we investigated the relation between WM capacity and the number of consecutive correct anticipations (i.e., grouping sizes). The latter measure represents the size of chunks of correct anticipations across the SRT task. Here, we considered the chunks that contained at least two consecutive correct anticipations (M = 2.50, SD = 1.03). The number of consecutive correct anticipations increased with increased WM capacity, b = 0.20 [0.08, 0.32], SE = 0.06, p < 0.001.
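The bracketed interval can be reproduced from the coefficient and its standard error. The Python sketch below assumes a 95% Wald interval (b ± 1.96 × SE), which is one common way such intervals are obtained; the text does not state which method was used.

```python
def wald_ci(b, se, z_crit=1.96):
    """Wald confidence interval for a coefficient (assumed method)."""
    return b - z_crit * se, b + z_crit * se


lower, upper = wald_ci(0.20, 0.06)
print(f"[{lower:.2f}, {upper:.2f}]")
```

Because the interval excludes zero, it agrees with the reported significance of the slope.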

The finding that the grouping sizes (or chunks) of consecutive correct anticipations were related to WM capacity represents a strong indication that WM capacity imposes an upper bound on the number of items considered during the SRT task.

Additional experimental control: explicit sequence awareness

Of the participants who filled out the questionnaire assessing awareness of the sequence (n = 35), 31% (n = 11) reported that they noticed something special about the experiment, 49% (n = 17) reported that they noticed a pattern, and 23% (n = 8) indicated that they could recall a pattern. Those participants who indicated that they could recall a pattern (n = 8) were asked to generate the sequence; they produced correct strings ranging from 2 to 7 (M = 3.10, SD = 1.45). These results suggest that although about half of the participants reported that they detected some regularities in the task, few were able to reproduce any chunks from the sequence.

Discussion

The current study investigated the mechanisms underlying implicit learning as measured by a deterministic oculomotor SRT task. We used different anticipation measures as indicators of learning processes and investigated how these measures are affected by individual differences in WM capacity. Our results suggest that the two systems interact in intriguing ways. Specifically, WM capacity influences learning strategies (as measured by the overall number of anticipations), but not learning outcomes (as measured by correct anticipations). More specifically, our results demonstrate that WM capacity is negatively related to the overall number of errors and positively related to the grouping sizes (or chunks) of consecutive correct anticipations. We will discuss each of our findings in turn, bearing in mind the limitations associated with any correlational approach in establishing a causal link between implicit sequence learning and the underlying cognitive processes.

Anticipation measures and WM capacity

Our results demonstrate a negative relation between WM capacity and the overall number of anticipations in the SRT task. Thus, WM capacity seems to influence learning processes by biasing individuals to engage in a more proactive (i.e., more anticipations—lower spans) or reactive (i.e., fewer anticipations—higher spans) mode during the SRT task. Moreover, the negative relation between WM capacity and anticipations varied in strength across different phases of the SRT task. Specifically, in the learning phase, the effect of WM on anticipations decreased gradually across the four learning blocks, followed by a more pronounced effect in the interference block (Block 5) and another decrease in the recovery block (Block 6). Thus, the results of the current study suggest that implicit sequence learning relies on anticipatory processes. Moreover, we demonstrated that learners’ anticipatory behavior in the SRT task changed as a combination of individual differences in WM capacity and the environmental cues.

At this point, the results showing a negative relation between the predicted probabilities of anticipations and WM capacity could be interpreted from the competition theory point of view (Galea, Albert, Ditye, & Miall, 2010; Nemeth et al., 2013): weakening the reliance on executive processes (underlying attention-based learning) could have heightened the sensitivity to statistical probabilities (critical for procedural learning). Thus, by this account, lower spans could outperform higher spans in this task.

On the other hand, consistent with a number of previous studies (e.g., Caljouw et al., 2016; Frensch & Miner, 1994; Guzmán, 2018; Kaufman et al., 2010; Unsworth & Engle, 2005), our results indicated that there was no relation between implicit sequence learning outcomes (in this case indexed by correct anticipations) and WM capacity. Thus, higher WM capacity did not lead to more overall accurate anticipations. Previously, these findings were interpreted to indicate the relative automaticity of implicit learning (Kaufman et al., 2010; Unsworth & Engle, 2005). Results from the explicit awareness questionnaire in the current study further support the notion that the majority of participants lacked awareness about the sequentiality of the stimuli presentation in the SRT task.

Finally, although the results of the current study indicated no relation between WM capacity and learning outcomes (indexed by correct anticipations), a finding in line with previous research (e.g., Caljouw et al., 2016; Frensch & Miner, 1994; Guzmán, 2018; Kaufman et al., 2010; Unsworth & Engle, 2005), we demonstrated that higher WM capacity was related to larger groupings (or chunks) of correct anticipations. While previous studies have consistently reported a positive correlation between WM capacity and chunking during explicit sequence learning (Bo et al., 2009; Kennerley et al., 2004; Sakai et al., 2003; Shea et al., 2006), our study is the first to demonstrate the existence of this relation in implicit sequence learning. In previous studies where sequence learning occurred explicitly (i.e., with conscious intent), the relation between WM and chunking was interpreted to indicate WM-dependent performance strategies during learning (Bo et al., 2009; Wymbs, Bassett, Mucha, Porter, & Grafton, 2012). Thus, the qualitative differences in implicit learning outcomes reported in the current study may likewise indicate that lower and higher spans rely on different, yet efficient, learning strategies during implicit learning. This interpretation is further supported by the negative relation between WM capacity and errors.
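To illustrate how the overall number of correct anticipations can be identical while the grouping of those anticipations differs, the following minimal Python sketch may be helpful. It is not the analysis code used in the study. It assumes, for illustration only, that each trial is coded as a correct anticipation (True) or not (False) and that a chunk is a run of consecutive correct anticipations; the function names and the two response sequences are hypothetical.

```python
from itertools import groupby


def correct_run_lengths(correct):
    """Return the length of each run of consecutive correct anticipations.

    `correct` is a sequence of booleans, one per trial (assumed coding).
    """
    return [sum(1 for _ in run) for is_correct, run in groupby(correct) if is_correct]


def summarize(correct):
    """Return (total correct anticipations, mean run length)."""
    runs = correct_run_lengths(correct)
    total = sum(runs)
    mean_run = total / len(runs) if runs else 0.0
    return total, mean_run


# Two hypothetical participants with the same number of correct anticipations
# but different grouping of those anticipations.
scattered = [True, False] * 6                      # isolated correct anticipations
grouped = [True, True, True, False, False, False] * 2  # correct anticipations in runs of three

print(summarize(scattered))  # (6, 1.0)
print(summarize(grouped))    # (6, 3.0)
```

Under these assumptions, both sequences yield six correct anticipations, yet the mean run length differs by a factor of three, which is the kind of dissociation between overall accuracy and chunk size described above.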

The results of the current study are theoretically important because they demonstrate that individual differences in WM capacity could account for differences in learning processes, and ultimately change individuals’ anticipatory behavior, even when learning is implicit, without intention and awareness. Specifically, the negative relation between WM capacity and anticipations, together with the positive relation between WM capacity and the number of consecutive correct anticipations, indicates that WM capacity imposes an upper bound on the number of sequential items that can be considered during the task. Anticipating at a higher rate in lower spans could be conceptualized as “starting small”, a notion that has been the focus of several studies across relatively independent domains (cf. Elman, 1993; Medimorec, Mander, & Risko, 2018; Newport, 1990). “Starting small” reveals economies in learning by leveraging an individual’s processing capacity against the size of their ideal learning unit or chunk. In the same vein, a higher rate of anticipation can be argued to represent a straightforward strategy to reduce memory load with the aim of optimizing learning. Higher spans, by contrast, are less constrained and do not need to reduce the information load as much as lower spans. This allows higher spans to chunk probable patterns into larger units. Hence, the results of the current study indicate that individuals’ anticipatory reactions to environmental cues change as a function of WM capacity. The results also suggest that WM processes, such as context-relevant updating of information and the formation of anticipations, can take place implicitly.

Implications for future research

The results of the current study clearly support the notion that individual differences in WM capacity are, at least partly, involved in implicit sequence learning. Deeper insights are much needed into the relations between implicit sequence learning (using more complex probabilistic sequences) and different aspects of WM (i.e., storage and processing), as well as different executive functions (e.g., relational integration, imagery, attention, updating, switching and inhibition; Janacsek & Nemeth, 2015). Given increased interest in examining the links between language cognition and implicit learning (Arciuli & Simpson, 2012; Daltrozzo et al., 2017; Kidd, 2012; Kidd & Arciuli, 2016; Milin, Divjak, & Baayen, 2017; Misyak & Christiansen, 2012; Shafto, Conway, Field, & Houston, 2012), determining how language learning and processing are supported by implicit learning remains an important question.

In addition, recent research across a number of domains has demonstrated that individuals use different strategies, such as explorative and exploitative actions, when learning and adapting to environmental (conditional) probabilities (Dale et al., 2012; Fischer & Holt, 2017; Milin et al., 2017; Stafford & Dewar, 2014). Investigating whether anticipatory behavior during implicit learning demonstrates similar explorative/exploitative dynamics should reveal hitherto un(der)explored dimensions of learning and advance our understanding of how individuals adapt to environmental statistics, for example during language learning.

Conclusion

Our results provide support for the existence of general cognitive strategies that are employed spontaneously during implicit sequence learning. Critically, such strategies seem to be efficient in detecting regularities in the environment and are modulated by individual differences in WM capacity. Specifically, WM capacity affects learning strategies, as demonstrated by the negative relations between WM capacity and the rate of anticipation and between WM capacity and error rate, and by the positive relation between WM capacity and the grouping size (or chunk size) of consecutive correct anticipations. On the other hand, WM capacity does not affect learning outcomes, as measured by the overall number of accurate anticipations.