Introduction

Implicit sequence learning refers to the ability to combine isolated movements into one smooth, coherent action without conscious awareness of what was learned or the fact that learning occurred (Reber 1993). Despite being a subconscious process, it is thought to engage cognitive resources. For example, disruptive transcranial magnetic stimulation applied to the dorsolateral prefrontal cortex (DLPFC), a structure involved in working memory (e.g., Jonides et al. 1993), impairs implicit sequence learning (e.g., Pascual-Leone et al. 1996; Robertson et al. 2001). Schwarb and Schumacher (2009) have demonstrated that DLPFC is selectively involved in spatial response selection, which is the underlying mechanism of sequence learning (Deroost and Soetens 2006; Hazeltine 2002; Schumacher and Schwarb 2009; Schwarb and Schumacher 2010). While there is some evidence supporting the role of DLPFC in implicit sequence learning, one model suggests that DLPFC only contributes to sequence learning under explicit conditions (Willingham 1998). Several neuroimaging studies including our recent work (e.g., Bapi et al. 2006; Bo et al. 2011) report no significant DLPFC activation for implicit sequence practice. A recent model suggests that implicit sequence learning begins in the motor and premotor cortical areas. The early stage of implicit learning typically does not involve prefrontal areas (Ashe et al. 2006). Thus, the current study examined the relationship between short-term working memory (both visuospatial and verbal) capacity and performance change in a single session using the implicit SRT task.

A previous study reported that verbal working memory performance correlated with the magnitude of implicitly learned complex sequences [i.e., under dual-task conditions with a short response-to-stimulus interval, (Frensch and Miner 1994)]. However, recent work has suggested that the tasks used by Frensch and Miner (1994) to measure working memory may overestimate capacity because they allow for the use of chunking strategies and rehearsal (Cowan 2001; Jonides et al. 2008). Chunking refers to the process of forming a familiar pattern, which contains multiple elements (Ericsson and Kintsch 1995). When learning a new task, participants have to select individual elements one by one. Once a task is learned, a single representation (i.e., a motor chunk) for the task can be stored in and retrieved from long-term memory. Behaviorally, an unevenly distributed temporal pattern between each movement element (i.e., inter-response time) can be observed after repeated practice (Shea et al. 2006; Verwey 1996, 2001). Ericsson et al. (1980) have reported a case where a participant with average memory abilities increased his memory span from 7 to 79 digits. This individual learned to group chunks of digits together to form “supergroups”, which allowed him to dramatically increase his digit span. Thus, it is not clear whether the correlation reported in Frensch and Miner (1994) was due to “chunking” occurring both during the sequence learning and the working memory tasks. In order to avoid contamination by this common “chunking” process, we adapted change detection working memory assessments (Luck and Vogel 1997; Thomason et al. 2009), which identify the number of items that individuals can hold and operate upon in short-term working memory (Awh et al. 2007).

It has been suggested that there are separate working memory resources for visuospatial and verbal information (MacDonald and Christiansen 2002; Shah and Miyake 1996). According to this domain-specific view, one might expect differential correlations for the two types of working memory with implicit performance of the SRT task. Alternatively, it has been hypothesized that working memory is a unitary domain, which might explain why individual differences in visuospatial working memory are often strongly correlated with individual differences in verbal working memory (Conway and Engle 1996; Kane et al. 2004). According to this domain-general view, one might expect correlations for both verbal and visuospatial working memory with implicit performance of the SRT task. Recent work also demonstrates that visuospatial and verbal working memory engage partially overlapping networks (Thomason et al. 2009), further supporting the prediction that both might equally correlate with implicit SRT task performance.

Methods

Participants

Twenty-one right-handed [determined by self-report and the Edinburgh handedness inventory (Oldfield 1971), mean = 0.90] adults (M = 20.09 ± 3.2 years, 11 men) participated in this study. They gave their informed consent and were paid for their participation. The experimental procedures were approved by the Institutional Review Board of the University of Michigan.

Procedure

We administered measures of short-term visuospatial memory: Corsi block tapping test (Kessels et al. 2000) and card rotation and cube comparison tests [Educational Testing Service, (Thurston 1951)]; and short-term verbal memory: forward and backward digit span tasks from WAIS-R (Wechsler 1981) and the reading span task (Daneman and Carpenter 1980).

Participants also performed computerized visuospatial working memory (VSWM), verbal working memory (VWM), and implicit SRT tasks (E-Prime version 1.0, Psychology Software Tools, Pittsburgh, PA). The working memory tasks were modified from Luck and Vogel (1997) and Thomason et al. (2009). Common between the two tasks, each trial began with a central fixation cross, followed by a sample array for 100 ms, a 900-ms blank screen delay, and a 2,000-ms presentation of a test array. Participants were then prompted to indicate whether the test array was the same (s) or different (d) from the sample array by keypress (Fig. 1). For VSWM, the arrays consisted of 2–8 (array size) colored circles (radius = 1, randomly selected red, orange, yellow, green, blue, violet, pink, white, black, and brown). For each trial, the test array was either the same as the sample array or different with only one of the colors changed. Therefore, this task relied on the detection of a change in color at different locations. In the VWM task, the arrays consisted of letters (size = 1, randomly from Q, R, G, B, H, A, N, F, K, Z, S, and V). Letters were uppercase in the sample arrays and lowercase in the test arrays, forcing participants to encode the letters. One of the letters was changed if the test array was different from the sample array. In these tasks, all the colored circles or letters were arranged along an invisible concentric circle around a fixation cross. The working memory capacity was calculated using the formula: K = Size of the array * (observed hit rate − false alarm rate) (Vogel and Machizawa 2004). Then, the average K across all array sizes was computed to represent the working memory capacity for each participant.
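The capacity computation described above can be sketched in a few lines. This is a minimal illustration rather than the authors' analysis script: the function names and the example rates are hypothetical, and it assumes the conventional change-detection scoring in which a hit is a correct "different" response on a change trial and a false alarm is a "different" response on a no-change trial.

```python
from statistics import mean


def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Capacity estimate K = array size * (hit rate - false alarm rate)."""
    return set_size * (hit_rate - false_alarm_rate)


def mean_capacity(rates_by_size):
    """Average K across array sizes.

    rates_by_size maps array size -> (hit rate, false alarm rate).
    """
    return mean(cowan_k(n, h, f) for n, (h, f) in rates_by_size.items())


# Hypothetical rates for illustration only; these are NOT data from the study.
example_rates = {2: (1.0, 0.0), 4: (0.75, 0.25), 8: (0.5, 0.25)}
capacity = mean_capacity(example_rates)
print(capacity)
```

Averaging K across array sizes, as in the text, gives one capacity value per participant, so performance at small array sizes (where K cannot exceed the array size) also contributes to the estimate.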

Fig. 1

Illustration of working memory tasks. a VSWM task. The sample arrays consisted of colored circles. b VWM task. The sample arrays consisted of uppercase letters and the test arrays of lowercase letters

It could be argued that participants might not encode the letters Z, K, S, and V in the VWM task due to the high similarity between uppercase and lowercase forms of these letters. In order to address this, we computed verbal working memory capacity using the trials both with and without inclusion of these letters. A paired-sample t-test showed no significant difference between capacity using the two trial types (t (20) = 0.79, P > 0.10), and capacity was strongly correlated across participants when computed both ways (R = 0.99, P < 0.01). For consistency with the literature in which similar tasks have been used, however, we report VWM capacity computed without the inclusion of these potentially vague letters.

During the implicit SRT task, participants placed the middle and index fingers of both hands on four buttons located on a response box. Upon seeing an “X” in one of four boxes on a computer screen, they were instructed to press the corresponding button as quickly and accurately as possible. If the correct button was pressed, the stimulus appeared in a different box. If the wrong button was pressed, the stimulus repeated. Participants performed two blocks of finger pressing in response to randomly presented stimuli (blocks 1 and 2), followed by 5 blocks of tapping in response to sequentially presented stimuli (blocks 3–7, 8 sequence repetitions per block), and then an additional two random blocks (blocks 8 and 9). Each block consisted of 96 trials (8 repetitions of the 12 element sequence) with a 0-s response-to-stimulus interval [RSI, see (Destrebecqz and Cleeremans 2001)]. Each time the sequence was repeated, the presentation would start at a different random point within the sequence. Participants were not informed of the random or sequence blocks. Each participant was exposed to one of three sequences (121423413243, 342312143241, and 341243142132) in the study. Sequences contained no repeating elements (12), no trills (1212), and no runs (1234). In the random blocks, all the elements were randomly generated. Changes in the median response time (RT) across trials and blocks were used to measure performance change in this task.
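The structural constraints on the three sequences (no repeating elements, no trills, no runs) can be checked mechanically. The sketch below is illustrative only. It assumes that a trill is any four-element pattern of the form abab, that a run is four successive positions that step up or down by one (e.g., 1234 or 4321), and that the constraints also hold across the wrap-around point, since the sequence was repeated within a block. The helper names are hypothetical.

```python
def cyclic_windows(seq, width):
    """All windows of `width` elements, wrapping around the end of the sequence."""
    n = len(seq)
    return [tuple(seq[(i + j) % n] for j in range(width)) for i in range(n)]


def has_repeat(seq):
    """True if any element is immediately followed by itself (e.g., 11)."""
    return any(a == b for a, b in cyclic_windows(seq, 2))


def has_trill(seq):
    """True if any four-element window has the form abab (e.g., 1212)."""
    return any(a == c and b == d for a, b, c, d in cyclic_windows(seq, 4))


def has_run(seq):
    """True if any four-element window steps by +1 or by -1 throughout (assumed definition)."""
    for w in cyclic_windows(seq, 4):
        steps = {w[k + 1] - w[k] for k in range(3)}
        if steps == {1} or steps == {-1}:
            return True
    return False


def is_valid(seq_string):
    """Check a sequence given as a digit string against all three constraints."""
    seq = [int(ch) for ch in seq_string]
    return not (has_repeat(seq) or has_trill(seq) or has_run(seq))


# The three sequences listed in the Procedure section.
SEQUENCES = ["121423413243", "342312143241", "341243142132"]
all_valid = all(is_valid(s) for s in SEQUENCES)
print(all_valid)
```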

Three sequence awareness tests were utilized in an effort to be conservative about participants gaining explicit awareness of the sequence. During the first generate task (Seidler et al. 2002, 2005), participants were asked whether they had noticed that the stimuli were sequenced. Then, they were told that the stimuli were in fact sequential in some of the blocks. Participants positioned their fingers on the response box as they had in the SRT task, and were asked to reproduce the sequence to the best of their ability for 20 trials.

In the second test, we used the process dissociation procedure to probe awareness (Destrebecqz and Cleeremans 2001). Participants completed both inclusion and exclusion tasks. For inclusion, participants were asked to recall any fragment of the sequence they could remember. They were presented with a single stimulus that appeared at a random location and were asked to generate a series of 96 trials that “resembled the original sequence”. During exclusion, participants were asked to make up their own sequence that was different from the sequence in the SRT task. They were instructed to repeatedly generate elements for the duration of 96 trials and to avoid repeating elements, trills, and runs. During the test, participants were not told to produce a 12-element sequence. Generally, inclusion instructions are used to identify explicit knowledge of the practiced sequence, while exclusion instructions are used to identify implicit knowledge. However, Destrebecqz and Cleeremans (2001) have demonstrated that both tasks should be used to evaluate the awareness of the sequence. Normally, if an inclusion task yields a score that is higher than chance level, learning is concluded to be explicit. However, a different conclusion can emerge when one also considers exclusion task performance. If participants have no control over their knowledge of the sequence, both inclusion and exclusion scores are higher than chance level and implicit learning should be concluded (Destrebecqz and Cleeremans 2001). Thus, the current study employed both the inclusion and exclusion tasks to evaluate sequence awareness.

In the final recognition task, participants were presented with 24 3-trial fragments. Half were part of the SRT sequence, and the other half were random. Participants were asked to respond to the stimuli as in the SRT task and then to rate how certain they were that the fragment was part of the sequence they had practiced (ranging from 1 = certain it was, to 6 = certain it was not) (Shanks and Johnstone 1999).

Results

The mean working memory capacity scores were 3.31 and 2.51 for VSWM (SD ± 0.70) and VWM (SD ± 0.63), respectively. A significant difference was found between the two tasks (t (20) = 3.70, P < 0.01), suggesting that the VWM task may have been more difficult. VSWM and VWM were significantly correlated with each other across participants (R = 0.53, P < 0.01).

Figure 2a shows the mean of the median RT (i.e., median RT for every 12 elements) for each block of the implicit SRT task. The mean RT improvement between blocks 1 and 2 (random trials) reflects general practice effects. Figure 2a insert shows that the RT did not change much within the second random block, suggesting that participants’ performance stabilized before they were exposed to sequence blocks. To measure the rate of performance improvement in the sequence blocks, we calculated 8 median RTs per block (one for every sequence repetition) and then fit three different functions (linear, exponential, and power) to the 40 consecutive median RTs across blocks 3–7. The goodness-of-fit parameters on individual data sets (Table 1) revealed that the power function fit worst, while the exponential function fit slightly better than the linear function. Because the exponential function gave the poorest fit for the fewest participants, the exponential decay parameter [b in equation a * exp(b * x)] was chosen to represent behavior change rate. To verify that the performance change during sequential blocks was not mainly due to general practice effects (i.e., not sensitive to the initial decrease in response times), we correlated the RT differences between blocks 2–3 with the RT differences between blocks 4–7. No significant correlation was found (R = 0.23, P > 0.05), suggesting that although we could not completely rule out general practice effects, the fitting parameters were still valid for capturing overall performance improvement across the sequence blocks. The median RT for block 8 (random) was significantly different from the median RT for block 7 (the last sequence block), suggesting positive performance improvement (t (20) = 5.32, P < 0.01) during the sequence blocks. The mean accuracy for this task ranged from 94 to 99% among all participants across the 9 blocks. A one-way ANOVA on accuracy revealed no effect of block (F (8,188) = 1.43, P > 0.10).
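The rate measure can be illustrated with a short sketch. The fitting procedure is not specified in the text, so the code below substitutes a simple technique: ordinary least squares on log-transformed RTs, which recovers a and b in a * exp(b * x) exactly for noise-free data but can differ slightly from a nonlinear least-squares fit on real data. The function names and the synthetic values are hypothetical.

```python
import math
from statistics import median


def medians_per_repetition(rts, seq_len=12):
    """Median RT for each consecutive run of `seq_len` trials."""
    return [median(rts[i:i + seq_len]) for i in range(0, len(rts), seq_len)]


def fit_exponential(y):
    """Fit y = a * exp(b * x) for x = 1..n by least squares on log(y).

    Assumption: a log-linear fit stands in for the unspecified method.
    Returns (a, b); a negative b means RT decreases with practice.
    """
    n = len(y)
    xs = list(range(1, n + 1))
    logs = [math.log(v) for v in y]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    b = sxy / sxx
    a = math.exp(mean_log - b * mean_x)
    return a, b


# Synthetic, noise-free medians for illustration only (NOT data from the study):
# 40 values, one per sequence repetition across five blocks of eight repetitions.
synthetic_medians = [500.0 * math.exp(-0.01 * x) for x in range(1, 41)]
a, b = fit_exponential(synthetic_medians)
print(a, b)
```

In use, the raw RTs of blocks 3–7 would first be reduced with `medians_per_repetition`, and the resulting 40 values passed to `fit_exponential`; the per-participant b would then serve as the rate of performance change.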

Fig. 2

a Mean response time for each block. Blocks 1, 2, 8, and 9 were random, and blocks 3–7 were sequence practice; a insert Median response time for every 12 trials in blocks 1 and 2; b Mean accuracy scores on the inclusion and exclusion tasks; c Mean rating scores on the sequence and random elements in the recognition task, the bars in the figure stand for the standard error of the mean; d Significant correlation between VSWM capacity and the exponential decay parameter; e Significant correlation between VWM capacity and the exponential decay parameter

Table 1 The goodness-of-fit (Adj-R-square, rmse) for the exponential, linear, and power functions on the individual data set

In the sequence generate task, we considered a participant to have explicit knowledge if he or she could correctly produce five or more successive positions of the sequence (Willingham et al. 1997). None of our 21 participants met this criterion in the current study.

In the process dissociation task, we used the procedure described in Destrebecqz and Cleeremans (2001). We first computed the number of generated chunks of three elements that were part of the training sequence in both inclusion and exclusion tasks. Since each generated sequence was 96 trials long, it contained 94 overlapping three-element chunks, so the maximum number of correct chunks was 94. To obtain inclusion and exclusion scores for each participant, we then divided the corresponding number of correct chunks by 94. Since participants were told not to produce repetitions, the chance level was 0.33 (Destrebecqz and Cleeremans 2001).
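The scoring just described can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names are hypothetical, and it assumes that the training triplets are taken cyclically (the sequence wrapped around during practice) and that generated chunks are counted with a sliding window of three trials.

```python
def training_triplets(sequence):
    """Set of three-element chunks in the training sequence, taken cyclically."""
    n = len(sequence)
    return {
        sequence[i] + sequence[(i + 1) % n] + sequence[(i + 2) % n]
        for i in range(n)
    }


def generation_score(generated, sequence):
    """Proportion of overlapping generated triplets that occur in the training sequence."""
    targets = training_triplets(sequence)
    windows = [generated[i:i + 3] for i in range(len(generated) - 2)]
    return sum(w in targets for w in windows) / len(windows)


TRAINED = "121423413243"   # one of the three sequences used in the study
perfect = TRAINED * 8      # hypothetical 96-trial output that reproduces the sequence
score = generation_score(perfect, TRAINED)
n_targets = len(training_triplets(TRAINED))
print(score, n_targets)
```

Under these assumptions the chance level of 0.33 can be recovered by counting: the training sequence contains 12 distinct triplets, and there are 4 × 3 × 3 = 36 possible triplets with no immediate repetition, giving 12/36 ≈ 0.33.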

Figure 2b shows average inclusion and exclusion scores. A paired-sample t-test revealed no significant difference between the two scores (t (20) = 0.84, P > 0.10). To examine whether generation performance reflects knowledge acquired during the SRT task, one-tailed t tests were used to compare generation scores with the chance level. Neither of the two scores was significantly different from 0.33 (both P > 0.10). Individually, we found that twelve participants had inclusion and exclusion scores lower than 0.33. Eight participants had inclusion and exclusion scores that were both higher than 0.33. One participant showed 0.29 and 0.34 on the inclusion and exclusion scores, respectively. It has been claimed that implicit learning can be concluded if the mean correct percentage is higher than 0.33 in both inclusion and exclusion tasks because participants had no control over their knowledge of the sequence (Destrebecqz and Cleeremans 2001).

In the recognition task, there was no significant difference (t (20) = 1.02, P > 0.10) in rating scores between sequence (M = 3.09) and random (M = 3.11) elements (Fig. 2c). Individual data showed that the rating scores were within 2.08–3.92 and 1.95–3.92 for sequence and random, respectively.

The rate of performance change, i.e., exponential decay, was significantly correlated with both VSWM (R = −0.65, P < 0.01) and VWM (R = −0.53, P < 0.05) capacity, indicating that individuals with higher working memory capacity had faster improvements (Fig. 2d, e). However, the RT differences between blocks 8 and 7 (i.e., sequence-specific performance advantage) were not correlated with either working memory measure (R = 0.32, P = 0.15; R = 0.24, P = 0.30 for VSWM and VWM). Since the SRT task contains visuospatial stimuli, we first ran a stepwise regression analysis and found that VSWM explained most of the variance in performance (i.e., exponential decay parameter, R² = 0.43, P < 0.05). Adding VWM did not significantly improve the regression model (P > 0.05). We then ran the same stepwise regression analysis using VWM as the first predictor. Results showed that VWM alone did not explain the variance in SRT performance (P > 0.10). Adding VSWM significantly improved the model (P < 0.01). Finally, the rate of performance improvement also significantly correlated with card rotation scores (R = 0.44, P < 0.05), as we have shown for rate of sensorimotor adaptation (Anguera et al. 2010). No other correlations were found between the traditional neuropsychological assessments and either the working memory or SRT measures (all P > 0.05).
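For readers less familiar with the statistic, the correlation between capacity and the decay parameter is an ordinary Pearson coefficient, sketched below. The function name and the five value pairs are hypothetical and chosen only to show the sign convention: a more negative decay parameter means faster RT reduction, so a negative R corresponds to faster improvement at higher capacity.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only; these are NOT data from the study.
capacity_k = [2.0, 2.5, 3.0, 3.5, 4.0]
decay_b = [-0.002, -0.004, -0.006, -0.008, -0.010]
r = pearson_r(capacity_k, decay_b)
print(r)
```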

It could be argued that using the median RT for each sequence repetition might occlude the chunk patterns developed through learning. To address this possibility, we first carefully inspected the individual data sets and did not find any clear and consistent chunks. We also ran an additional analysis using the median RT for each block. The exponential decay parameters on this data set were used to represent behavioral change rates across the blocks. Similar to our main findings above, significant correlations were found between the block behavioral change rate and VSWM (R = −0.65, P < 0.01) as well as VWM (R = −0.61, P < 0.01) capacity.

Discussion

Although previous studies have suggested that implicit sequence learning involves cognitive processes such as working memory (Frensch and Miner 1994), direct evidence has been lacking. Our main results revealed that VSWM, VWM, and card rotation performance were significantly correlated with the rate of reaction time change in the SRT task, supporting a link between working memory and implicit sequence performance improvements.

As in Frensch and Miner (1994), we did not find correlations between span tasks and performance of the SRT task. In contrast, we found a significant relationship between working memory capacity measured with a change detection task and implicit performance of the SRT task. These results further underscore the importance of using time-limited tasks to assess working memory capacity without contamination by chunking strategies. In contrast to the span tasks, the change detection assessments of working memory measure the number of items that individuals can hold and operate upon in working memory (Awh et al. 2007). Consistent across several studies, it has been reported that the average working memory capacity is between 2 and 4 items for young adults (Luck and Vogel 1997; Vogel and Machizawa 2004). Thus, our results suggest that implicit performance improvements on the SRT task rely on the number of items that individuals can hold and operate upon in working memory, as opposed to the ability for chunking or rehearsal. Note that although the SRT task is a widely used paradigm to investigate implicit sequence learning, a primary limitation of the current work is that we did not administer a delayed retention test to confirm the learning outcomes. Our current findings suggest, however, that even within a single session of the SRT task, improvement rate is related to working memory capacity.

We have previously shown that VSWM predicts the rate of learning during explicit motor sequence learning (Bo and Seidler 2009). It has been argued that implicit and explicit sequence learning rely on partially distinct neural processes. For example, Willingham (1998) has proposed that a ventral cortical system is engaged for explicit learning, in which task goals are transferred from the prefrontal cortex to the posterior temporal lobe. The dorsal cortical learning system, including the parietal and premotor areas, operates in the implicit mode. Similarly, Keele et al. (2003) suggest that the dorsal system contributes to implicit sequence learning while the ventral system can operate under either an implicit or explicit learning mode. However, Ashe et al. (2006) have proposed that overlapping neural networks contribute to both implicit and explicit processes. When the intention to learn a sequence is explicit, the processes originate in the prefrontal cortex and then later in learning the premotor and motor cortical regions are engaged. In contrast, when sequences are acquired implicitly, learning begins in the motor cortical areas and then propagates to premotor regions and eventually the prefrontal cortex. Recently, Galea et al. (2010) reported a TMS study where disruption of DLPFC degraded the consolidation of a sequence within the declarative system and thus facilitated consolidation within the procedural learning system. The authors argued that reduced involvement of the declarative system allowed additional recruitment of resources for procedural consolidation. Thus, the current results, together with our previous findings (Bo and Seidler 2009) as well as others (Galea et al. 2010), support the idea of overlapping processes between explicit and implicit learning and indicate that VSWM may be involved in both forms of sequence learning.

One might argue that the sequence generation task may not be a true test for explicit awareness since it allows the implicit motor system to “cue” explicit memory. In the recognition test for awareness, one might object that the statistical properties of the random elements differed from those of the trained sequence. As a consequence, performance differences for random and systematic sequences might not reflect explicit awareness. Because we could not rule out these possibilities, three sequence awareness tests were utilized in an effort to be conservative about participants gaining explicit awareness of the sequence. In addition, all the participants performed two random blocks at the beginning of the task. Although this potentially introduces order effects, we think that it was important to familiarize participants with the task and minimize general practice effects before performing the sequence blocks.

One limitation of the current study is the use of random trials in block 8 to evaluate sequence-specific performance improvements. It has been argued that learning should be measured by comparing RT differences between trained-sequence and untrained-sequence trials (i.e., “transfer effects”, Cohen et al. 1990; Keele et al. 1995; Willingham et al. 1989). Reed and Johnson (1994) demonstrated that comparing a trained and an uncontrolled sequence could produce substantial transfer effects that might inflate the learning effects. A different second-order conditional sequence, instead of random trials, should be used to test final learning. Thus, it is possible that our use of random trials inflated the size of the transfer effect (difference between blocks 7 and 8) in the current experiment, which might explain the nonsignificant correlation between the transfer effect and working memory scores.

The correlation that we observed between VSWM and VWM does not support the domain-specific view in the literature. It could be argued that such a correlation might be due to the similar spatial layout on the computer screen between the two tasks. Although we could not rule out this possibility, the significant difference between VSWM and VWM suggested that participants performed differently in the two tasks and the VWM task might have been harder for participants.

Finally, since both VSWM and VWM were related to performance of the SRT task, we determined the relative contribution of each to performance improvements. It has been reported that disrupting left DLPFC impairs implicit sequence learning (Pascual-Leone et al. 1996; Robertson et al. 2001). Further, it has been demonstrated that right DLPFC is specifically involved in spatial response selection (Schwarb and Schumacher 2009). Multiple linear regression analysis revealed that VSWM explained a significant portion of the variance in rate of performance change, and the addition of VWM did not significantly improve the model. These findings demonstrate that VSWM plays a role in the implicit performance improvement of second-order conditional sequences and that VSWM and VWM may engage partially overlapping networks. In addition, it is not surprising that there was also a significant correlation between rate of performance change and card rotation scores. Both the card rotation task and our VSWM task involve visuospatial processing. In the card rotation task, participants have to mentally rotate the figure to make comparisons, relying on working memory. The significant correlation between SRT performance change and the card rotation task further supports the importance of VSWM for implicit performance changes in the SRT task.