Introduction

The diversity of functionally distinct memory systems likely evolved in response to divergent selection pressures that vary with environmental conditions and across development (Sherry and Schacter 1987). A habit memory system appears early in development, facilitates the gradual learning of an indefinite number of habits and skills that are stable over long intervals and is not accessible to cognitive monitoring and control (Bachevalier 1990; Gasbarri et al. 2014). In contrast, working memory facilitates the rapid acquisition and relatively brief retention of a limited amount of information (Baddeley 1992; Shettleworth 2010, Chapter 7). In humans, and possibly in some non-humans, the contents of working memory are actively maintained and accessible to cognitive monitoring and control (Baddeley 2003; Basile et al. 2015; Basile and Hampton 2013b; Cowan 2008; Hampton 2001; Tu and Hampton 2014).

Independent memory systems may act in parallel to regulate behavior (Hay and Jacoby 1996; McDonald and White 1993; Poldrack and Packard 2003; Tu and Hampton 2013; Tu et al. 2011). Dissociation of memory systems is established when altering the contribution one memory system makes to behavior leaves the contribution of another system relatively intact. For example, Tu and Hampton (2013) studied the relative contributions of habits and “one-trial memories,” the latter being a type of memory of indeterminate status: relatively short term compared to habit, but not clearly working memory. These authors found that these two types of memory can be controlled independently. Decreasing the likelihood of reward following a stimulus reduced the control of behavior by habit memory, but did not affect control by “one-trial memory” in rhesus monkeys. Lengthening the duration of retention intervals decreased control by one-trial memory, but left control by habit intact (Tu and Hampton 2013). Such behavioral dissociations are often related to neurobiological dissociations. In the above example, one-trial memory, but not habit, depended critically on the perirhinal cortex in rhesus monkeys (Tu et al. 2011). The amount of experience with a given task can also dissociate the contributions of distinct memory systems. Rats trained to retrieve food rewards on a plus maze initially used allocentric spatial cues to locate the food. After repeatedly starting from the same location and turning in a particular direction to retrieve the food, the behavior of the rats came under the control of egocentric cues. Although the control of behavior switched from predominantly allocentric to predominantly egocentric cues with training, both types of memory remained present and capable of controlling behavior. Inactivation of the dorsal striatum, which is critical for the control of behavior by egocentric cues, resulted in a return of control by allocentric cues (Packard and McGaugh 1996).

Shifts in the relative control of behavior by distinct memory systems may also occur during the formation of learning sets. In Harlow’s seminal Formation of Learning Sets (1949), the term learning-to-learn was used to describe a shift from gradual to rapid acquisition of discrimination learning tasks as rhesus monkeys completed successive discriminations. Neuroimaging of non-human primates indicates that the formation of a learning set co-occurs with a shift from striatal to lateral prefrontal cortical activity (Yokoyama et al. 2005). Given that shifts in dominance among multiple memory systems contribute to the development of a learning set, it is likely that such shifts also contribute to other learning-to-learn tasks, such as serial reversal learning.

In serial reversal learning, subjects are repeatedly presented with discrimination trials containing the same two objects or images. At any given time, only one of the two stimuli is rewarded when selected. Within every reversal, the positive stimulus (S+) is rewarded if selected and will remain positive until a predetermined performance criterion is met. Upon reaching criterion, the contingencies of reinforcement reverse (i.e., S+ becomes S− and S− becomes S+). Subjects are then required to meet criterion by selecting the formerly non-reinforced stimulus. This process may be repeated for many reversals.

Reversal learning improves with reversal experience. Naïve subjects reverse gradually, making many perseverative choices of the stimulus that was rewarded before the most recent reversal (e.g., Mackintosh et al. 1968). After experiencing many reversals, naïve reversers become experts and show flexible, win-stay, lose-shift responding, sometimes making only a single error before reliably selecting the previously incorrect stimulus (Bessemer and Stollnitz 1971; Shettleworth 2010, Chapter 6). The appearance of the win-stay, lose-shift response pattern occurs in the absence of any change in external task demands, suggesting that the development of expertise is facilitated by a shift in the relative control of choice behavior by distinct memory systems.

One account of performance improvements in serial reversal learning is that responding becomes less perseverative as proactive interference accumulates (Mackintosh et al. 1968). After both stimuli have been extensively reinforced in successive reversals, the difference in associative strength between them may be only modestly affected by current reinforcement. It has therefore been rather counterintuitively argued that the resulting difficulty in discriminating the associative value of the two stimuli reduces perseveration, allowing subjects to respond more flexibly at the onset of a reversal (Clayton 1966; Gonzalez et al. 1967; Kraemer and Golding 1997; Strang and Sherry 2014). However, an inherent issue with a proactive interference explanation is that it can only account for reversal improvement when reversals (i.e., the exchange from S1+/S2− to S1−/S2+) are separated by long intervals. If instead reversals occur in rapid succession, such that the inter-reversal interval is no different from the inter-trial interval, the contributions by proactive interference will likely be outweighed by recency of the last rewarded choice, and thus, preference for the previous S+ will persist into the new reversal (Kraemer and Golding 1997; Mackintosh et al. 1968). Given that performance improvements occur even when reversals are experienced in rapid succession, it seems likely that alternative mechanisms also contribute to the development of expertise in serial reversal learning.

The development of serial reversal learning expertise may be facilitated by a shift in the relative control of choice by working memory and habit. We hypothesize that choice in naïve reversers is under greater relative control by a habit system, while choice in expert reversers is under greater relative control by working memory. Control of choice by habit would explain the relatively gradual reversing, marked by perseveration, observed in naïve reversers. Control of choice by working memory would account for the flexible, rapid reversing when these reversers become experts. If habit controls choice in naïve reversers and working memory controls choice in expert reversers, then manipulations that attenuate working memory should impair reversal learning in expert but not naïve reversers.

Experiment 1

We tested whether the development of serial reversal learning expertise in rhesus monkeys is facilitated by an increase in the relative control of choice by working memory rather than habit. The contents of working memory are typically available for short periods of time while habits remain intact over long intervals (Baddeley 2000; Grant and Roberts 1973; Mishkin et al. 1984, Chapter 2). This difference in availability after the passage of time allowed us to assess the relative contributions of working memory and habit to choice by manipulating the interval between successive discrimination trials. In successive discriminations, the inter-trial interval (ITI) is the interval over which information from the last trial must be maintained to inform choice on the current trial. Working memory for the outcome of the last discrimination should be substantially attenuated after long ITIs, whereas habit resulting from previous trials should persist. We compared accuracy on discrimination trials following short 1-s and long 30-s ITIs across many reversals to determine whether the extent to which habit and working memory controlled choice changed as monkeys changed from naïve reversers at the beginning of training to expert reversers by the end of training. If choice in naïve reversers is controlled primarily by habit, there should be no difference in discrimination performance following 1- and 30-s ITIs. To the extent that serial reversal expertise is under the control of working memory, discrimination performance should be significantly better following 1-s ITIs than 30-s ITIs.

Methods

Subjects and apparatus

Six adult, male rhesus monkeys (Macaca mulatta; mean age = 9.16 years) were used. Monkeys received full daily food rations and ad libitum access to water. Two of the six monkeys were pair-housed at the time of this study. The other four monkeys were individually housed, in line with veterinary guidance, but had visual contact with other monkeys. Testing occurred for up to seven hours a day, six days a week. Monkeys were tested in their home cages using portable testing rigs. Each testing rig was equipped with a 15-inch color LCD touch-sensitive screen (Elo TouchSystems, Menlo Park, CA), running at a resolution of 1024 × 768 pixels, and two automatic food dispensers (Med Associates, Inc., St. Albans, VT), which delivered nutritionally balanced primate pellets (Bio-Serv, Frenchtown, NJ). Tests were controlled by a personal computer running a custom program written in Presentation (Neurobehavioral Systems, Albany, CA). All six subjects had previous experience with touch screen tasks, including image discrimination; however, none of the six had previous experience with reversal learning. Pair-housed monkeys were separated during testing by a panel that allowed limited visual, auditory and tactile contact but prevented access to the other monkey’s computer screen.

Procedure

Figure 1 depicts the sequence of events in a trial. Subjects initiated each trial by touching a 100 × 100 pixel green square twice (FR 2). Two images (350 × 350 pixels) appeared, each placed 250 pixels left or right of the center of the touch screen. The left–right position of the two images was counterbalanced and pseudo-randomly determined such that a stimulus could appear on the same side no more than 4 times in a row. The same two images were used throughout all reversals. Monkeys selected one of the two images by touching it twice (FR 2). Touching the S+ cleared the screen and produced a positive sound and a food pellet. Touching the S− cleared the screen and produced a negative sound. Either a 1- or 30-s ITI ensued. The 1- and 30-s ITIs alternated, regardless of trial outcome.
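The position and ITI constraints in this procedure can be illustrated with a short Python sketch. This is not the Presentation program used in the study. The function names, the rejection-sampling approach, the assumption that sides are balanced within a block of 16 trials, and the choice of which ITI comes first are all illustrative assumptions.

```python
import random


def longest_run(seq):
    """Return the length of the longest run of identical consecutive items."""
    best = run = 0
    prev = None
    for item in seq:
        run = run + 1 if item == prev else 1
        prev = item
        best = max(best, run)
    return best


def side_sequence(n_trials=16, max_run=4, seed=None):
    """Left/right positions for one image: balanced, with no run longer than max_run.

    Assumption: balancing is done within a block of n_trials; the source only
    says positions were counterbalanced and pseudo-randomly determined.
    """
    if n_trials % 2:
        raise ValueError("n_trials must be even to balance left and right")
    rng = random.Random(seed)
    while True:
        # Shuffle a balanced list and reject orders that violate the run limit.
        sides = ["L", "R"] * (n_trials // 2)
        rng.shuffle(sides)
        if longest_run(sides) <= max_run:
            return sides


def iti_for_trial(trial_index, short=1.0, long=30.0):
    """Alternate 1-s and 30-s ITIs regardless of outcome.

    Assumption: even-indexed trials are followed by the short ITI.
    """
    return short if trial_index % 2 == 0 else long
```

Because the ITI depends only on the trial index, the outcome of a trial never influences which interval follows it, which matches the alternation rule stated in the procedure.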

Fig. 1

Order of events for every trial in Experiment 1. All trials contained the three depicted events. The duration of the inter-trial interval (ITI) alternated between 1 s and 30 s on successive trials

The same image was the S+ until monkeys reached a performance criterion of 15 out of 16 correct discrimination trials. This criterion was assessed once every block of 16 trials. If criterion was met, a reversal occurred; if criterion was not met, 16 additional trials were administered continuing the same S+/S− arrangement. The first trial in which the reversed contingencies were in place, Trial 0, was not included in the performance criterion because the monkeys could not know that the reversal had occurred until they received feedback on this trial. Thus, every reversal contained at least 17 trials: Trial 0, followed by blocks of 16 trials, 15 of which had to be correct to trigger a reversal. The odd number of trials ensured that Trial 1 of each reversal followed a 1- or 30-s ITI equally often. Testing continued until monkeys had completed a total of 90 reversals. Any reversal that had not been completed by the end of a testing day was administered at the start of the next testing day. Thus, both the reversal number and S+/S− configuration carried over across days; however, any progress toward reaching reversal criterion did not. Trials from incomplete reversals were not included in the data analysis.
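The criterion rule can be sketched in Python as a small state tracker. The class name, the "A"/"B" stimulus labels, and the treatment of the very first phase as also beginning with a Trial 0 are assumptions made for illustration; the study itself used a custom Presentation program.

```python
class ReversalTracker:
    """Track progress toward the reversal criterion (15 of 16 correct per block)."""

    def __init__(self, block_size=16, criterion=15):
        self.block_size = block_size
        self.criterion = criterion
        self.s_plus = "A"              # hypothetical label for the rewarded image
        self.reversals_completed = 0
        self._start_phase()

    def _start_phase(self):
        self.awaiting_trial_zero = True  # Trial 0 is excluded from the criterion
        self.block_trials = 0
        self.block_correct = 0

    def record(self, correct):
        """Record one trial outcome; return True if it triggered a reversal."""
        if self.awaiting_trial_zero:
            self.awaiting_trial_zero = False
            return False
        self.block_trials += 1
        self.block_correct += bool(correct)
        if self.block_trials < self.block_size:
            return False
        # The criterion is assessed once per completed block of 16 trials.
        met = self.block_correct >= self.criterion
        self.block_trials = 0
        self.block_correct = 0
        if met:
            self.s_plus = "B" if self.s_plus == "A" else "A"
            self.reversals_completed += 1
            self._start_phase()
        return met
```

Under this rule every reversal lasts 1 + 16k trials for some whole number k of blocks, which is always odd, so with strictly alternating ITIs the interval preceding Trial 1 switches from one reversal to the next.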

We calculated the proportion of correct discrimination trials following 1- and 30-s ITIs to assess the relative control of choice by habit and working memory. Proportion correct scores were arcsine transformed before analysis (Aron and Aron 1999, Chapter 14) to better approximate normality. We hypothesized that if choice behavior in naïve reversers was largely controlled by habit, with little contribution of working memory, discrimination performance would not differ following 1- or 30-s ITIs. Complementarily, we hypothesized that if choice behavior in expert reversers was under greater relative control by working memory, accuracy would be higher on discrimination trials following 1-s ITIs compared to 30-s ITIs.
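For readers unfamiliar with the transformation, a minimal Python sketch follows. The source does not state which variant was applied; the form below, the arcsine of the square root of the proportion, is one common choice and is assumed here. Some texts use twice this value.

```python
import math


def arcsine_transform(p):
    """Arcsine square-root transform of a proportion in [0, 1].

    Assumption: the asin(sqrt(p)) variant; the source only says scores
    were "arcsine transformed".
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(p))
```

The transform stretches the scale near 0 and 1, where raw proportions are compressed, which is why it is often applied before parametric tests on accuracy scores.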

Results and discussion

Monkeys were scheduled to complete 90 reversals in Experiment 1. However, on the 4th day of testing, 4 of the 6 monkeys were accidentally tested with only 1-s ITIs, rather than alternating long and short ITIs. These four monkeys performed one day of reversals under this erroneous condition, averaging 56.5 reversals. Three of these four monkeys had already completed at least 35 reversals before receiving the incorrect version of the program. The fourth monkey had completed only 8 reversals. Because analysis required 10 reversals under the alternating ITI conditions, this monkey was not included in the analysis of Experiment 1. All monkeys were given an additional 30 reversals of the correct testing with alternating ITIs. As a result, the five monkeys completed an average of 5.2 testing days and 120.8 reversals. Thus, despite the experimental error, we acquired a block of at least 10 reversals of data from 5 monkeys when they were novice reversers and another block of 10 reversals after we expected them to be expert.

We compared the first 10 and the last 10 reversals performed by each monkey to determine whether discrimination performance improved across the intervening reversals. Accuracy was assessed by averaging the number of errors committed in Trials 1–16 of each of the reversals. Only Trials 1–16 were used in the analysis because monkeys could, and sometimes did, reach criterion in the first block of 16 trials in a reversal. Monkeys made significantly more errors in the first 10 reversals than the last 10 reversals, suggesting that they had developed serial reversal expertise over the course of training (first 10: M = 13.54, SD = 7.278; last 10: M = 4.66, SD = 3.274).

Figure 2 shows that performance following 1- or 30-s ITIs did not differ early in the reversal task; however, after experiencing many reversals, monkeys performed significantly better following 1-s ITIs than following 30-s ITIs. To determine whether the control of choice by working memory changed as a function of reversal experience, we compared the proportion of correct discrimination trials preceded by 1- and 30-s ITIs during the first and last 10 reversals. The difference in accuracy following 1- and 30-s ITIs was significantly greater during the last 10 reversals, and accuracy differed significantly between the 1- and 30-s ITI types (two-factor repeated measures ANOVA; reversal experience: F (1,4) = 39.9, P = 0.003; ITI type: F (1,4) = 101.6, P = 0.001; interaction: F (1,4) = 22.0, P = 0.009). Follow-up analyses confirmed that accuracy was significantly higher following 1-s than 30-s ITIs in the last 10 reversals, while this difference was not present in the first 10 reversals (paired samples t tests; first 10 reversals: t(4) = −.801, P = 0.468; last 10 reversals: t(4) = 11.706, P < 0.001). Follow-up analyses also showed that discrimination following 30-s ITIs was significantly more accurate in the last 10 reversals than in the first 10 reversals (paired samples t test; reversal experience: t(4) = −3.834, P = 0.019). Interestingly, as is shown in Fig. 3, accuracy following both 1- and 30-s ITI durations improved equally across the first 10 reversals. The absence of a difference in accuracy between the two ITI types in the first 10 reversals suggests that monkeys were initially aided by a process other than working memory.
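The paired samples t tests reported above compare two scores from the same monkey. A minimal Python sketch of the statistic is given below; it is a generic textbook computation, not the authors' analysis script, and the function name is an assumption. The inputs would be one transformed accuracy score per monkey for each of the two conditions.

```python
import math


def paired_t(first, second):
    """Return (t, df) for a paired-samples t test on two equal-length score lists."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the within-subject differences (n - 1 denominator).
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean / math.sqrt(variance / n)
    return t, n - 1
```

With scores from five monkeys the function returns 4 degrees of freedom, which matches the degrees of freedom of the t tests reported for Experiment 1.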

Fig. 2

Proportion correct in discrimination trials preceded by a 1-s (striped red bars) or 30-s (solid blue bars) ITI in Experiment 1. Scores are from Trials 1 through 16 for each of the first and last 10 reversals. Monkeys were equally accurate following 1- and 30-s ITIs during the first 10 reversals. Monkeys were significantly more accurate following 1-s, compared to 30-s ITIs, during the last 10 reversals (color figure online)

Fig. 3

Proportion correct in discrimination trials preceded by 1-s (dashed red line) and 30-s (solid dark blue line) ITIs in Experiment 1. Monkeys were equally accurate following 1- and 30-s ITIs; however, performance improved with reversal experience (color figure online)

We replicated the finding that animals become more proficient at reversing with more experience (Dufort et al. 1954; Mackintosh et al. 1968; Ploog and Williams 2010). Longer ITIs impaired learning by monkeys once they became expert reversers, but did not affect them while they were still naïve reversers, suggesting that working memory is critical for reversal expertise. We propose that working memory increases reversal efficiency because it allows subjects to update their representation of the current S+/S− based on feedback from the outcome of the previous trial. Thus, the difference between the contributions of a working memory system and a habit system is likely to be most pronounced immediately after a reversal has occurred.

To determine whether working memory was especially critical early in reversals in expert reversers, we compared accuracy following 1- and 30-s ITIs early and late in reversals. We averaged accuracy on Trials 1–4 under each ITI condition and on Trials 13–16 under each ITI condition. We determined these scores for the first 10 reversals, while monkeys were naïve, and for the last 10 reversals, when monkeys were expert. If working memory was critical for rapid reversal in expert reversers, but not in naïve reversers, we should find that monkeys were especially accurate early in reversals after they were expert. This pattern, evident in Fig. 4, is supported by a three-way interaction between level of expertise, phase of reversal and type of ITI (three-factor repeated measures ANOVA; reversal experience: F (1,4) = 38.280, P = 0.003; early versus late: F (1,4) = 38.238, P = 0.003; ITI type: F (1,4) = 81.304, P = 0.001; reversal experience * early versus late: F (1,4) = 28.684, P = 0.006; reversal experience * ITI Type: F (1,4) = 28.362, P = 0.006; early versus late * ITI Type: F (1,4) = .035, P = 0.860; reversal experience * early versus late * ITI Type: F (1,4) = 12.543, P = 0.024). This result shows that performance increased most rapidly early in reversals after monkeys became expert at reversing, consistent with a strong influence of working memory early in reversals after expertise was established.
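The early versus late comparison amounts to averaging accuracy within four cells defined by reversal phase and ITI type. The Python sketch below shows one way to compute these cell means. The record layout, a tuple of trial number, preceding ITI in seconds, and whether the choice was correct, is a hypothetical format chosen for illustration.

```python
from collections import defaultdict


def phase_accuracy(trials, early=range(1, 5), late=range(13, 17)):
    """Mean accuracy per (phase, ITI) cell.

    `trials` is an iterable of (trial_number, iti_seconds, correct) tuples.
    Trials outside the early and late windows, including Trial 0, are ignored.
    """
    totals = defaultdict(lambda: [0, 0])  # cell -> [number correct, number of trials]
    for trial_number, iti_seconds, correct in trials:
        if trial_number in early:
            phase = "early"
        elif trial_number in late:
            phase = "late"
        else:
            continue
        cell = totals[(phase, iti_seconds)]
        cell[0] += int(correct)
        cell[1] += 1
    return {key: hits / n for key, (hits, n) in totals.items()}
```

Running this separately on the first 10 and the last 10 reversals would give the eight cell means that enter a three-factor analysis of the kind described above.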

Fig. 4

Proportion correct in discrimination trials following 1-s (red dashed line) and 30-s (dark blue solid line) ITIs during the first (left) and last (right) 10 reversals. Scores are plotted as a function of trial number, where each trial number was averaged across the first and last 10 reversals, respectively. Monkeys were more accurate after short ITIs than long ITIs only after they became expert reversers. Because Trial 0 is the first trial on which reversed reward contingencies are in effect, monkeys should respond according to the contingencies in effect prior to reversal. Trial 0 is not included in statistical analyses (color figure online)

Our findings suggest that control of choice by habit and working memory differed between naïve and expert reversers. Choice behavior in naïve reversers appeared to be under greater relative control of a habit system. When the monkeys were naïve, they were less accurate than when they were expert and were unaffected by ITI duration. When monkeys became expert, choice behavior appeared to be under greater relative control by working memory.

Our findings are consistent with those from pigeons, in which performance on serial reversal learning tasks is significantly worse when reversals contain only long ITIs than when they contain only short ITIs (Ploog and Williams 2010; Williams 1976). Similar ITI-sensitive performance has been reported in rhesus monkeys performing an object discrimination learning set task (Deets et al. 1970). Together these findings suggest that reversal expertise is contingent on working memory for the outcome of the previous trial.

An alternative interpretation is that subjects are disproportionately more likely to be affected by proactive interference after long than short ITIs. According to this account, proactive interference causes the relative validity of memories A+/B− and B+/A− to become equal, allowing for greater flexibility after a long delay interval (Clayton 1966; Kraemer and Golding 1997; Mackintosh et al. 1968). The proactive interference account thus also coheres well with previous research. Specifically, subjects exhibit less perseveration at the onset of a new reversal as reversal experience accrues, so long as consecutive reversals are separated by long intervals. Furthermore, overall performance on serial reversal learning tasks is worse when all trials are separated by long ITIs compared to short ITIs. Because the associative strength of both stimulus representations becomes similar over long intervals between trials or reversals, one might postulate that our operational definition of expertise in rhesus monkeys can be accounted for by an accumulation of proactive interference.

Our results from Experiment 1 do not provide enough evidence to conclude whether working memory or proactive interference is responsible for the development of expertise in rhesus monkeys. However, these accounts can be distinguished experimentally because proactive interference accounts depend on experience with specific stimuli, while the working memory account posits a general shift in information processing. If a monkey has learned to actively maintain the previous trial in mind, it should be able to continue using this strategy if given a new pair of images to discriminate. By contrast, proactive interference depends on experience with specific stimuli, such that introducing a new image pair should eliminate expertise until proactive interference accrues again over multiple reversals. In our next experiment, therefore, we contrasted the working memory and proactive interference accounts of reversal expertise by administering the same serial reversal task with a new pair of images. We hypothesize that if monkeys developed expertise through a generalizable shift in control of choice by working memory, then their performance will continue to be affected by ITI duration across all reversals with the new images. If instead monkeys developed expertise through an accumulation of proactive interference, we hypothesize that performance will be affected by ITI duration only after they have experienced numerous reversals with the new images.

Experiment 2

In Experiment 2, we tested the same rhesus monkeys on the same serial reversal learning task, using two new images. An alternative account for serial reversal learning improvement suggests that responding becomes more flexible as proactive interference accumulates (Clayton 1966; Gonzalez et al. 1967). By using two new images, we eliminate proactive interference from Experiment 1. The working memory account of expertise predicts that the difference between short and long ITI trials should appear immediately, within the first 10 reversals with the new images, while the proactive interference account predicts that expertise will emerge gradually as monkeys experience reversals with the new images.

Methods

Subjects and apparatus

All 6 monkeys from Experiment 1 were used. The same apparatus was used.

Procedure

Testing procedures used in Experiment 2 were identical to those described in Experiment 1. The images used in Experiment 1 were replaced with two new 350 × 350 pixel color photograph images.

Results and discussion

Monkeys completed 60 reversals in an average of 3.5 testing days. Figure 5 shows that monkeys transferred serial reversal learning expertise to new images, showing superior performance following 1-s ITIs in both the first and last block of 10 reversals (two-factor repeated measures ANOVA; ITI type: F (1,5) = 15.577, P = 0.011; reversal experience: F (1,5) = 3.531, P = 0.116; interaction: F (1,5) = .513, P = 0.506). Follow-up analysis confirmed that monkeys performed more accurately following 1-s ITIs than 30-s ITIs during both the first and last 10 reversals (paired samples t tests; first 10 reversals: t(5) = 3.948, P = 0.011; last 10 reversals: t(5) = 3.399, P = 0.019). Our results suggest that the control of choice by working memory transferred across stimulus sets and that it is unlikely that the development of expertise in Experiment 1 was due to the accumulation of proactive interference.

Fig. 5

Proportion correct in discrimination trials following 1-s (red striped bar) and 30-s (solid blue bar) ITIs for the first and last 10 reversals of Experiment 2. Monkeys performed significantly better on discrimination trials that were preceded by a 1-s ITI in both the first and last 10 reversals (color figure online)

As in Experiment 1, we compared accuracy following 1- and 30-s ITIs early (Trials 1–4) and late (Trials 13–16) in reversals to evaluate whether working memory was an especially strong determinant of accuracy early in reversals. Because there was no main effect of reversal experience between the first and last 10 reversals, we used data from all 60 reversals. Figure 6 depicts learning curves for Experiment 2. The claim that accuracy is greater in the short ITI condition early in reversals is supported by the two-way interaction between phase of reversal and type of ITI (two-factor repeated measures ANOVA; early versus late: F (1,5) = 336.998, P < 0.001; ITI Type: F (1,5) = 31.368, P = 0.003; interaction: F (1,5) = 13.832, P = 0.014). Thus, the pattern of accuracy is consistent with a strong influence of working memory, specifically early in reversals.

Fig. 6

Proportion correct in discrimination trials following 1-s (red dashed line) and 30-s ITIs (solid dark blue line). Discrimination accuracy was averaged across all 60 reversals of Experiment 2. Monkeys performed significantly better following 1-s ITIs, compared to 30-s ITIs, for Trials 1 through 4, but not Trials 13 through 16. Because Trial 0 is the first trial on which reversed reward contingencies are in effect, monkeys should respond according to the contingencies in effect prior to reversal. Trial 0 is not included in statistical analyses (color figure online)

The findings from Experiment 2 support the hypothesis that expertise appears when choice is under greater relative control by working memory. Furthermore, this working memory expertise appears to be robust and transferable across stimulus sets. Our findings indicate that the development of expertise in Experiment 1 was due to an increase in the relative contribution by working memory, rather than from an accumulation of proactive interference. We did not counterbalance the discriminanda between Experiments 1 and 2, although the discriminanda for both experiments were color photographic images. Thus, it is possible, although very unlikely, that monkeys showed expertise at the onset of Experiment 2 because the particular discriminanda used in Experiment 2 were easier to discriminate or remember than those used in Experiment 1.

Working memory is characterized by active, effortful maintenance (Baddeley 2003; Cowan 2008). In humans, information can be held in mind over relatively long delays, as long as the information is rehearsed (Baddeley 2000; Baddeley et al. 1975; Milner 1970, p. 29). In rhesus monkeys, the active maintenance of familiar images is disrupted when subjects are required to perform a cognitively demanding task during the retention interval of a matching-to-sample task (Basile and Hampton 2013b). If monkeys actively maintain the outcome of the previous trial in working memory during the serial reversal learning task, then performance should be attenuated if a cognitively demanding task is introduced between discrimination trials. We use concurrent cognitive load to target working memory rehearsal in Experiment 3, thus providing a converging test of whether working memory is important for reversal expertise.

Experiment 3

We assessed the role of working memory in serial reversal learning expertise by alternating low and high concurrent cognitive loads across trials. We compared performance on the serial reversal learning task when discrimination trials were preceded by a classification task or an empty interval. If working memory is important for reversal expertise, we should observe lower accuracy on trials following the classification task, compared to yoked control trials.

Methods

Subjects and apparatus

All 6 monkeys from Experiments 1 and 2 were used. The same equipment was used.

Classification training

All monkeys used in Experiment 3 had previous experience with classifying images as containing birds, fish, flowers or people (Basile and Hampton 2013a, b; Diamond et al. 2016). Monkeys were retrained on the classification task before classification and reversal tasks were combined. The stimulus set for classification contained 425 unique images from each of the four categories, resulting in a total of 1700 images. Images were collected from the online photograph repository Flickr (Yahoo!, Sunnyvale, CA). The entire stimulus set was screened for duplicates using DupDetector (Prismatic Software, Anaheim, CA) and visual inspection. The stimulus set was screened to ensure that no image contained exemplars from more than one category (Gazes et al. 2013).

Figure 7 depicts the sequence of events in classification training. Monkeys initiated trials by touching a green start square (FR 2). A 400 × 300 pixel image corresponding to one of the four categories then appeared in the center of the screen. After monkeys touched the image (FR 2), four 100 × 100 pixel classification icons, each corresponding to one of the four image categories, appeared in fixed positions in the four corners of the touch screen. Incorrect classifications resulted in a correction trial containing the same to-be-classified image. Incorrect correction trials were followed by a second correction trial. Second correction trials included the same image; however, only the correct category icon was presented on the screen. This ensured that a correct response would occur. All correct classification and correction trials were paired with positive auditory feedback and food reinforcement. All incorrect classification and correction trials were paired with negative auditory feedback and a 5-s time-out interval.

Fig. 7

Categorization training trials. Monkeys started trials by touching a green square. An image from one of the 4 categories appeared and monkeys were required to touch it. Monkeys then selected from among 4 symbols corresponding to the 4 categories (color figure online)

Monkeys received at least two classification sessions consisting of 600 trials. Images from each of the four classification groups were presented pseudo-randomly, and each group was represented equally within each session. Correction trials did not contribute to the maximum number of trials; thus, every subject viewed 150 images from each category within a session. Monkeys trained until they completed two consecutive classification sessions with at least 80% correct classifications.
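The training criterion, two consecutive sessions at or above 80% correct, can be expressed as a simple streak check. The Python sketch below is illustrative only; the function name and the representation of sessions as a list of proportions correct are assumptions.

```python
def sessions_to_criterion(accuracies, threshold=0.80, consecutive=2):
    """Return the 1-based session on which the criterion was met, or None.

    `accuracies` lists the proportion correct for each session in order.
    """
    streak = 0
    for session, accuracy in enumerate(accuracies, start=1):
        # A session below threshold resets the run of qualifying sessions.
        streak = streak + 1 if accuracy >= threshold else 0
        if streak == consecutive:
            return session
    return None
```

Because at least two sessions are always required, the earliest possible return value is 2, consistent with the statement that monkeys received at least two classification sessions.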

Procedure

We used the same serial reversal learning procedure as in Experiments 1 and 2; however, instead of alternating the ITI duration, we alternated two concurrent cognitive demand conditions: a classification task and an empty interval yoked in duration to the amount of time it took to complete the classification on the previous trial. All trials followed the same sequence: self-start, concurrent cognitive load, discrimination and ITI. Figure 8 depicts the sequence of events for Experiment 3. Monkeys completed 60 reversals with this alternating cognitive load procedure.

Fig. 8

Order of events for high (a) and low (b) concurrent load trial types in Experiment 3. At the start of every day of the testing phase of Experiment 3, the first trial was of the high concurrent cognitive load type. The following trial was a low concurrent cognitive load type with an empty interval yoked in duration to the time taken to complete the category phase of the previous trial

Images from each category were pseudo-randomly presented so that each category was represented twice in every block of 16 discrimination trials, 8 of which contained the intervening category task. Monkeys viewed a centrally located 400 × 300 pixel image, with the four category icons in each corner. Correct classifications were paired with positive auditory feedback, but no food reward, and allowed subjects to progress to the discrimination trial. Incorrect classifications were paired with negative auditory feedback and resulted in the immediate presentation of a different to-be-classified image. This same process repeated until an image was correctly classified. On the following trial, instead of classifying, monkeys experienced a yoked empty interval. During this yoked empty interval phase, monkeys viewed a black screen for the same time it took to complete the entire category phase, including category corrections, in the previous trial.
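The yoking of empty intervals to the preceding category phase can be illustrated with a small sketch. This is a hedged illustration in Python rather than the authors' task code; the function name `build_trial_schedule`, the trial-type labels and the input format are all assumptions. In the experiment the category-phase duration was measured online, whereas here it is simply passed in as a list.

```python
def build_trial_schedule(category_phase_durations_ms):
    """Interleave high-load trials with yoked low-load trials.

    category_phase_durations_ms: time (ms) each category phase took,
    including any category corrections (hypothetical input format).
    Returns a list of (trial_type, duration_ms) tuples in presentation order.
    """
    schedule = []
    for duration in category_phase_durations_ms:
        # High load: classification continues until one image is classified correctly.
        schedule.append(("high_load", duration))
        # Low load: black screen for the same time as the preceding category phase.
        schedule.append(("low_load_yoked", duration))
    return schedule
```

Because each empty interval copies the duration of the category phase immediately before it, the two trial types are matched in delay and differ only in what fills that delay.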

After monkeys completed the concurrent cognitive load phase, they were given image discrimination. The discrimination phase was identical to discrimination phases from Experiments 1 and 2; however, two novel images were used. To avoid contamination between discrimination and category phases, discriminanda were two color images that did not contain birds, fish, flowers or people. This was also true for Experiments 1 and 2, as the discriminanda from the previous two experiments also did not contain representations from any of the 4 categories. Monkeys were required to select the currently positive image to receive positive auditory feedback and a food reinforcement. If monkeys selected the incorrect stimulus, they were presented with negative auditory feedback and no food reinforcement. A 1000-ms ITI was presented after each discrimination, regardless of whether the trial was correct or incorrect.

Results and discussion

Monkeys required significantly more corrections of their classification responses during the first 10 reversals than during the last 10 reversals (paired samples t test: t(5) = 2.670, P < 0.05). This improvement in categorization indicates that there was competition for cognitive resources between reversal learning and categorization, supporting the premise for using this experimental intervention. Performance on the categorization task is important to note because empty intervals were yoked to the duration of the classification phase of the previous trial. Thus, trials with long category phases were followed by trials with long empty intervals. Because we found in Experiments 1 and 2 that long empty intervals impair working memory performance, we expect longer intervals to have the same effect here. To mitigate this effect, we compared discrimination accuracy as a function of concurrent cognitive load for the last 10 reversals only. This comparison maximizes the likelihood of comparing performance under conditions of relatively low and high concurrent cognitive demands with the shortest delay intervals possible. The interval between discrimination trials for Experiment 3 fell between the two ITI durations used in Experiments 1 and 2 (Median: 7964 ms; Range: 3200–118,907 ms). We examined Trials 1–16, regardless of whether a category correction was needed. Figure 9 shows that monkeys performed significantly better when discrimination trials followed an empty interval rather than the classification task (paired samples t test: t(5) = 14.055, P < 0.001).
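For readers less familiar with the test reported above, the paired samples t statistic is the mean of the within-subject differences divided by its standard error, with n − 1 degrees of freedom (5 for six subjects). The sketch below, in Python using only the standard library, shows that computation. The function name `paired_t` is an assumption, and the example scores are arbitrary placeholders, not data from this study.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, df) for a paired-samples t test on two equal-length lists."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    n = len(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Arbitrary placeholder scores for 6 subjects; NOT data from the study.
low_load = [0.90, 0.88, 0.92, 0.85, 0.91, 0.89]
high_load = [0.80, 0.79, 0.81, 0.77, 0.80, 0.78]
t_value, df = paired_t(low_load, high_load)
```

With six paired observations the function returns 5 degrees of freedom, matching the t(5) notation used in the text.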

Fig. 9

Proportion correct in discrimination trials following low (striped red bar) or high (solid blue bar) concurrent cognitive load. Trials 1 through 16 for the last 10 reversals of Experiment 3 are shown (color figure online)

We compared accuracy following low and high concurrent cognitive load conditions early (Trials 1–4) and late (Trials 13–16) in reversals to evaluate whether working memory was an especially strong determinant of accuracy early in reversals. Figure 10 shows learning curves for Experiment 3. While inspection of Fig. 10 gives the impression that accuracy differed most dramatically between cognitive load conditions early in reversals, the interaction between phase of reversal and cognitive load was not statistically significant (two-factor repeated measures ANOVA; early versus late: F (1,5) = 8.840, P = 0.031; concurrent cognitive load type: F (1,5) = 23.588, P = 0.005; interaction: F (1,5) = 4.643, P = 0.084). Thus, statistical analysis of accuracy in Experiment 3 strongly indicates that working memory was important for reversal accuracy overall, but only weakly supports the conclusion from Experiments 1 and 2, that working memory was especially important early in reversals.

Fig. 10

Proportion correct in discrimination trials following a yoked interval (red dashed line) and category trial (solid dark blue line). Trial numbers averaged across the last 10 reversals of Experiment 3. Because Trial 0 is the first trial on which reversed reward contingencies are in effect, monkeys should respond according to the contingencies in effect prior to reversal. Trial 0 is not included in statistical analyses (color figure online)

Experiment 3 varied the difficulty of discrimination trials by alternating the concurrent cognitive load. Monkeys were significantly more accurate on yoked delay trials, when the concurrent cognitive demand was low, compared to categorization trials, when the concurrent cognitive demand was high. Because the two conditions differed only in cognitive demands, and not in duration, these results indicate that monkeys actively maintained the outcome of the previous trial in working memory. The results from Experiment 3 provide converging evidence that serial reversal learning expertise is facilitated by working memory.

General discussion

We applied interventions intended to selectively attenuate working memory and found that the development of serial reversal expertise in monkeys was facilitated by an increase in the relative control of choice behavior by working memory. Discrimination accuracy when the monkeys were naïve reversers in the first 10 reversals of Experiment 1 was the same regardless of whether a 1- or 30-s ITI preceded choice. Insensitivity to delay suggests that responding was largely controlled by habits that were not diminished by the passage of time. In the last 10 reversals of Experiment 1, when monkeys were expert and reversing rapidly, discrimination accuracy was significantly better following 1-s ITIs than 30-s ITIs, suggesting that responding was under greater relative control by delay-sensitive working memory. In Experiment 2, we found that reversal expertise, and the use of working memory, was not limited to stimuli with which monkeys had extensive training. Monkeys given new discriminanda were immediately more accurate after 1-s than after 30-s ITIs. Immediate generalization to new discriminanda indicates that rapid reversal learning in rhesus monkeys cannot be fully explained by the build-up of proactive interference. In Experiment 3, we used a concurrent cognitive load in the place of long ITIs to further assess whether expertise depended on working memory. Susceptibility to concurrent cognitive load is a signature of working memory (Basile and Hampton 2013b). Concurrent cognitive load disrupted reversal learning in expert reversers, further strengthening the case that working memory is important for reversal learning expertise.

The fact that reversal expertise generalized immediately to new stimuli in Experiment 2 suggests that PI does not account for improved reversal learning performance in rhesus monkeys. However, it is possible that PI develops very rapidly, perhaps after just one reversal. With only 6 monkeys, it is not possible to conduct a reliable comparison of accuracy in the long and short ITI conditions in the first reversal alone, so these data cannot entirely exclude the possibility of very rapid build-up of PI.

Another account of reversal expertise posits that the outcome of the previous trial becomes an increasingly salient source of information for guiding choice in the current trial as reversal experience is gained (Williams 1976). The author did not invoke working memory per se in this account, but our hypothesis that control of choice by working memory increases with successive reversals invokes the same change in the source of control of choice. The working memory account and the response–outcome account share a weakness in that neither clearly explains why habit would initially control choice and working memory would control choice only after considerable experience. We found that choice behavior in naïve reversers was not under the control of working memory, but we cannot be certain that the monkeys did not remember the outcome of the last trial from the beginning. It is therefore not clear whether monkeys only begin to remember the outcome of the last trial with experience or whether working memory for the outcome of the last trial is always present and the change in the contribution of working memory occurs by a process more like a shift in strategy. According to the exponentially weighted moving average (EWMA) model, memories are exponentially weighted to favor more recent events over more distant events, especially when environmental conditions are regularly changing (Killeen 1994; McNamara and Houston 1987). While the EWMA model does not invoke memory systems, like our approach, the model describes a change in the weightings of memories, which we propose results from a shift in priority of memory systems.
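The weighting idea behind an EWMA can be made concrete with a brief sketch. The code below uses the standard recursive form of an exponentially weighted moving average; it is offered as an illustration only and is not the specific formulation of Killeen (1994) or McNamara and Houston (1987). The parameter name `beta`, the function names and the example outcome sequence are assumptions.

```python
def ewma_update(estimate, outcome, beta):
    """One EWMA step: weight the newest outcome by beta, the past by 1 - beta."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return (1.0 - beta) * estimate + beta * outcome


def ewma_series(outcomes, beta, initial=0.0):
    """Return the running EWMA estimate after each outcome."""
    estimates = []
    estimate = initial
    for outcome in outcomes:
        estimate = ewma_update(estimate, outcome, beta)
        estimates.append(estimate)
    return estimates


# Illustrative sequence: a stimulus rewarded on 5 trials (1),
# then unrewarded for 3 trials after a reversal (0).
outcomes = [1, 1, 1, 1, 1, 0, 0, 0]
slow = ewma_series(outcomes, beta=0.1)  # distant events retain substantial weight
fast = ewma_series(outcomes, beta=0.9)  # recent events dominate
```

In this toy sequence, the estimate computed with the larger `beta` falls much further in the three trials after the reversal than the estimate computed with the smaller `beta`, which is the sense in which heavier weighting of recent events supports rapid updating when conditions change.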

We have stressed the importance of working memory in serial reversal learning improvement. However, working memory does not appear to account for all the improvements that occur as animals gain experience with reversals. As shown in Fig. 3, performance following 1- and 30-s ITI durations improved equivalently across the first 10 reversals in Experiment 1. Because manipulations of delay interval did not affect performance in these reversals, this initial improvement does not appear to be due to increasing control of choice by working memory. Perhaps, instead, the difference in the associative strengths of the two stimuli decreased as both stimuli were rewarded, resulting in less perseveration. Thus, proactive interference may have aided in the initial performance improvements across the first 10 reversals. However, if indeed PI aided early performance on the reversal task, it appears to have little effect on choice behavior after expertise has developed, because expertise transferred to new discriminanda in Experiment 2. Both PI and working memory may facilitate reversal learning, making different contributions depending on stage of training, the specific parameters of testing and possibly the species tested. It is interesting to consider the possibility that the contributions of PI and working memory might differ among species. Perhaps the effect of PI is strong in animals with comparatively weak working memory, as might be the case with pigeons, but plays a smaller role in animals that have comparatively robust working memory, like monkeys.

Indirect evidence supports this idea that the robustness of working memory in a given species determines the extent to which working memory is critical for expertise. First, proactive interference has been proposed to be the primary mechanism underlying serial reversal learning improvement in rodents, bumblebees and goldfish (Gonzalez et al. 1967; Mackintosh et al. 1968; Strang and Sherry 2014). Second, it has been suggested that dependence on habit in reversal learning is decreased in animals with larger brains. In many samples, larger brain size may accompany enlargement of the frontal lobes and thus enhancement of working memory. The so-called mediational paradigm has been used to assess the extent to which animals exhibit associative or rule-based strategies in reversal learning (Rumbaugh 1971). The mediational paradigm is a variation of a reversal learning task where animals learn an A+/B− discrimination. Once animals learn the A+/B− discrimination, they are given one A−/B+ reversal trial. Following this single reversal trial, animals are presented with one of the three conditions: a control A−/B+ condition, a new positive stimulus A−/C+ condition or a new negative stimulus C−/B+ condition. If an animal has learned the original discrimination through associative rules, such as “approach A” or “avoid B,” it will succeed on one or two of the conditions, but not all three. In contrast, if an animal has learned the original discrimination through a rule-based strategy, such as win-stay, lose-shift, it will perform equally well on all three conditions. The mediational paradigm has been tested on a variety of primate species, and rule-based learning is associated with larger brain size (Beran et al. 2008; Rumbaugh 1971, 1997; Rumbaugh and Pate 1984, Chapter 31).
In light of our findings, it seems likely that the degree to which a species exhibits either associative or rule-based learning may be largely influenced by the extent to which their behavior is under greater relative control by either habit or working memory, respectively. Future comparative studies may address the extent to which “rule-based learning” depends on working memory.
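The win-stay, lose-shift rule mentioned above depends only on the choice and outcome of the immediately preceding trial, which is the information working memory would need to hold. A minimal sketch for the ordinary two-choice reversal case follows. It is an illustration in Python, not a model fitted to any data; the function name and argument format are assumptions, and it does not cover mediational-paradigm conditions in which a stimulus is replaced.

```python
def win_stay_lose_shift(previous_choice, previous_rewarded, options):
    """Pick the next choice from two options using win-stay, lose-shift.

    previous_choice: the stimulus chosen on the last trial
    previous_rewarded: True if that choice was rewarded
    options: the two stimuli available on the current trial
    """
    if len(options) != 2 or previous_choice not in options:
        raise ValueError("expected two options that include the previous choice")
    if previous_rewarded:
        return previous_choice  # win: stay with the rewarded stimulus
    # lose: shift to the other stimulus
    return options[1] if previous_choice == options[0] else options[0]
```

A subject following this rule makes exactly one error after each reversal, on the first trial with the reversed contingencies, provided the previous outcome is remembered at the time of choice.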

There has been a resurgence in interest in reversal learning, manifest in a raft of recent studies of “midsession reversal” (McMillan et al. 2014; Rayburn-Reeves et al. 2011; Smith et al. 2016; Stagner et al. 2013). Generally, these studies find that pigeons make many anticipatory and perseverative errors when a reversal predictably occurs in the middle of a testing session. This result clearly shows that the choice behavior of subjects is not controlled by working memory for the outcome of the last trial. If it were, subjects would make no anticipatory errors and very few, if any, perseverative errors. Instead, time since session onset (Rayburn-Reeves et al. 2011; Stagner et al. 2013) appears to influence midsession reversal choice behavior in pigeons (but see McMillan and Roberts 2012). Because estimates of time are fuzzy, anticipatory and perseverative errors occur even though the outcome of the last trial would be a nearly perfect cue for correct choice. Pigeons are not the only species to have been tested on the midsession reversal task, and near-optimal responding has been observed in humans, rhesus monkeys and rats (Rayburn-Reeves et al. in press; Rayburn-Reeves et al. 2011, 2013). As in the mediational paradigm, species differences on the midsession reversal task may reflect the degree to which choice is controlled by working memory.

Both the effects of concurrent cognitive load (Experiment 3; Basile and Hampton 2013b) and studies of directed forgetting (Tu and Hampton 2013) indicate that working memory is an active process in monkeys. Our analyses looking at early and late phases within reversals indicate that accuracy is reduced by long delays and high concurrent cognitive demands within the first 4 trials of a reversal. However, the effects of long ITIs and concurrent cognitive load reliably disappear with additional trials within a reversal. When active rehearsal of the positive stimulus is disrupted early in a reversal, choice is more greatly controlled by a habit that is incongruent with the current S+/S− conditions, causing perseverative errors. However, upon experiencing numerous trials under that particular S+/S− condition, subjects displayed near-optimal performance regardless of whether working memory was disrupted by long ITIs or concurrent cognitive load. From this finding, we posit that the relative associative strengths of the discriminanda flip within a reversal: early in the reversal, the new S− has the greater associative strength, and later in the reversal, the S+ has the greater associative strength. If this is indeed the case, then there is little need to allocate limited attentional resources to actively maintaining the S+ “in mind” late within a reversal. In humans, the ability to multitask, measured by having subjects perform two tasks simultaneously, is substantially better when one of the two tasks can be solved through habit, compared to when both tasks require attentional resources (e.g., Lisman and Sternberg 2013). If monkeys are able to strategically shift attentional resources according to changes in concurrent cognitive demands, then performance on a secondary task would improve later in a reversal when the serial reversal learning task can be solved through habit alone.
Future work should determine whether monkeys continue to actively maintain the S+ “in mind” late within a reversal, after it is no longer necessary, or instead adaptively reallocate cognitive resources.

Our results highlight the importance of working memory for the development of serial reversal expertise. However, other processes may also contribute, including inhibition of responses to previously rewarded stimuli. Our procedure and results do not directly address the role that inhibition might play in reversal expertise. We highlighted the control of choice by working memory and by habit, and we selectively attenuated the contribution of working memory, establishing a single dissociation. Our work did not selectively manipulate habit. Future work might be directed at generating a double dissociation with procedures that attenuate both habit and working memory.

We found that both habit and working memory contribute to choice in serial reversal learning. The development of expertise coincided with a shift from inflexible, habitual responding, to flexible, rapidly updated responding, suggesting that working memory is critical for reversal expertise. Using both ITI duration and concurrent cognitive load, we found converging evidence to support the hypothesis that working memory is critical for serial reversal learning expertise in rhesus monkeys. Furthermore, results from Experiment 2, in which use of working memory generalized to new stimuli, suggested that proactive interference played little role in determining choice behavior in experts. Our novel approach to the study of mechanisms underlying serial reversal learning expertise indicates that habit and working memory together determine the pattern of performance in expert reversers.