Abstract
Functionally distinct memory systems likely evolved in response to incompatible demands placed on learning by distinct environmental conditions. Working memory appears adapted, in part, for conditions that change frequently, making rapid acquisition and brief retention of information appropriate. In contrast, habits form gradually over many experiences, adapting organisms to contingencies of reinforcement that are stable over relatively long intervals. Serial reversal learning provides an opportunity to simultaneously examine the processes involved in adapting to rapidly changing and relatively stable contingencies. In serial reversal learning, selecting one of the two simultaneously presented stimuli is positively reinforced, while selection of the other is not. After a preference for the positive stimulus develops, the contingencies of reinforcement reverse. Naïve subjects adapt to such reversals gradually, perseverating in selection of the previously rewarded stimulus. Experts reverse rapidly according to a win-stay, lose-shift response pattern. We assessed whether a change in the relative control of choice by habit and working memory accounts for the development of serial reversal learning expertise. Across three experiments, we applied manipulations intended to attenuate the contribution of working memory but leave the contribution of habit intact. We contrasted performance following long and short intervals in Experiments 1 and 2, and we interposed a competing cognitive load between trials in Experiment 3. These manipulations slowed the acquisition of reversals in expert subjects, but not naïve subjects, indicating that serial reversal learning expertise is facilitated by a shift in the control of choice from passively acquired habit to actively maintained working memory.
Introduction
The diversity of functionally distinct memory systems likely evolved in response to divergent selection pressures that vary with environmental conditions and across development (Sherry and Schacter 1987). A habit memory system appears early in development, facilitates the gradual learning of an indefinite number of habits and skills that are stable over long intervals, and is not accessible to cognitive monitoring and control (Bachevalier 1990; Gasbarri et al. 2014). In contrast, working memory facilitates the rapid acquisition and relatively brief retention of a limited amount of information (Baddeley 1992; Shettleworth 2010, Chapter 7). In humans, and possibly in some non-humans, the contents of working memory are actively maintained and accessible to cognitive monitoring and control (Baddeley 2003; Basile et al. 2015; Basile and Hampton 2013b; Cowan 2008; Hampton 2001; Tu and Hampton 2014).
Independent memory systems may act simultaneously in parallel to regulate behavior (Hay and Jacoby 1996; McDonald and White 1993; Poldrack and Packard 2003; Tu and Hampton 2013; Tu et al. 2011). Dissociation of memory systems is established when altering the contributions one memory system makes to behavior leaves the contribution of another system relatively intact. For example, Tu and Hampton (2013) studied the relative contributions of habits and “one-trial memories,” the latter being a type of memory of indeterminate status: relatively short term compared to habit, but not clearly working memory. These authors found that these two types of memory can be controlled independently. Decreasing the likelihood of reward following a stimulus reduced the control of behavior by habit memory, but did not affect control by “one-trial memory” in rhesus monkeys. Lengthening the duration of retention intervals decreased control by one-trial memory, but left control by habit intact (Tu and Hampton 2013). Such behavioral dissociations are often related to neurobiological dissociations. In the above example, one-trial memory, but not habit, depended critically on the perirhinal cortex in rhesus monkeys (Tu et al. 2011). The amount of experience with a given task can also dissociate the contributions of distinct memory systems. Rats trained to retrieve food rewards on a plus maze initially used allocentric spatial cues to locate the food. After repeatedly starting from the same location and turning in a particular direction to retrieve the food, the behavior of the rats came under the control of egocentric cues. Although the control of behavior switched from predominantly allocentric to predominantly egocentric cues with training, both types of memory remained present and capable of controlling behavior. Inactivation of the dorsal striatum, which is critical for the control of behavior by egocentric cues, resulted in a return of control by allocentric cues (Packard and McGaugh 1996).
Shifts in the relative control of behavior by distinct memory systems may also occur during the formation of learning sets. In Harlow’s seminal Formation of Learning Sets (1949), the term learning-to-learn was used to describe a shift from gradual to rapid acquisition of discrimination learning tasks as rhesus monkeys completed successive discriminations. Neuroimaging of non-human primates indicates that the formation of a learning set co-occurs with a shift from striatal to lateral prefrontal cortical activity (Yokoyama et al. 2005). Given that shifts in dominance among multiple memory systems contribute to the development of a learning set, it is likely that such shifts also contribute to other learning-to-learn tasks, such as serial reversal learning.
In serial reversal learning, subjects are repeatedly presented with discrimination trials containing the same two objects or images. At any given time, only one of the two stimuli is rewarded when selected. Within every reversal, the positive stimulus (S+) is rewarded if selected and will remain positive until a predetermined performance criterion is met. Upon reaching criterion, the contingencies of reinforcement reverse (i.e., S+ becomes S− and S− becomes S+). Subjects are then required to meet criterion by selecting the formerly non-reinforced stimulus. This process may be repeated for many reversals.
Reversal learning improves with reversal experience. Naïve subjects reverse gradually, making many perseverative choices of the stimulus that was rewarded before the most recent reversal (e.g., Mackintosh et al. 1968). After experiencing many reversals, naïve reversers become experts and show flexible, win-stay, lose-shift responding, sometimes making only a single error before reliably selecting the previously incorrect stimulus (Bessemer and Stollnitz 1971; Shettleworth 2010, Chapter 6). The appearance of the win-stay, lose-shift response pattern occurs in the absence of any change in external task demands, suggesting that the development of expertise is facilitated by a shift in the relative control of choice behavior by distinct memory systems.
One account of performance improvements in serial reversal learning is that responding becomes less perseverative as proactive interference accumulates (Mackintosh et al. 1968). After both stimuli have been extensively reinforced in successive reversals, the difference in associative strength between them may be only modestly affected by current reinforcement. It has therefore been rather counterintuitively argued that the resulting difficulty in discriminating the associative value of the two stimuli reduces perseveration, allowing subjects to respond more flexibly at the onset of a reversal (Clayton 1966; Gonzalez et al. 1967; Kraemer and Golding 1997; Strang and Sherry 2014). However, an inherent issue with a proactive interference explanation is that it can only account for reversal improvement when reversals (i.e., the exchange from S1+/S2− to S1−/S2+) are separated by long intervals. If instead reversals occur in rapid succession, such that the inter-reversal interval is no different from the inter-trial interval, the contributions by proactive interference will likely be outweighed by recency of the last rewarded choice, and thus, preference for the previous S+ will persist into the new reversal (Kraemer and Golding 1997; Mackintosh et al. 1968). Given that performance improvements occur even when reversals are experienced in rapid succession, it seems likely that alternative mechanisms also contribute to the development of expertise in serial reversal learning.
The development of serial reversal learning expertise may be facilitated by a shift in the relative control of choice by working memory and habit. We hypothesize that choice in naïve reversers is under greater relative control by a habit system, while choice in expert reversers is under greater relative control by working memory. Control of choice by habit would explain the relatively gradual reversing, marked by perseveration, observed in naïve reversers. Control of choice by working memory would account for the flexible, rapid reversing when these reversers become experts. If habit controls choice in naïve reversers and working memory controls choice in expert reversers, then manipulations that attenuate working memory should impair reversal learning in expert but not naïve reversers.
Experiment 1
We tested whether the development of serial reversal learning expertise in rhesus monkeys is facilitated by an increase in the relative control of choice by working memory rather than habit. The contents of working memory are typically available for short periods of time while habits remain intact over long intervals (Baddeley 2000; Grant and Roberts 1973; Mishkin et al. 1984, Chapter 2). This difference in availability after the passage of time allowed us to assess the relative contributions of working memory and habit to choice by manipulating the interval between successive discrimination trials. In successive discriminations, the inter-trial interval (ITI) is the interval over which information from the last trial must be maintained to inform choice on the current trial. Working memory for the outcome of the last discrimination should be substantially attenuated after long ITIs, whereas habit resulting from previous trials should persist. We compared accuracy on discrimination trials following short 1-s and long 30-s ITIs across many reversals to determine whether the extent to which habit and working memory controlled choice changed as monkeys changed from naïve reversers at the beginning of training to expert reversers by the end of training. If choice in naïve reversers is controlled primarily by habit, there should be no difference in discrimination performance following 1- and 30-s ITIs. To the extent that serial reversal expertise is under the control of working memory, discrimination performance should be significantly better following 1-s ITIs than 30-s ITIs.
Methods
Subjects and apparatus
Six adult, male rhesus monkeys (Macaca mulatta; mean age = 9.16 years) were used. Monkeys received full daily food rations and ad libitum access to water. Two of the six monkeys were pair-housed at the time of this study. The other four monkeys were individually housed, in line with veterinary guidance, but had visual contact with other monkeys. Testing occurred for up to seven hours a day, six days a week. Monkeys were tested in their home cages using portable testing rigs. Each testing rig was equipped with a 15-inch color LCD touch-sensitive screen (Elo TouchSystems, Menlo Park, CA), running at a resolution of 1024 × 768 pixels, and two automatic food dispensers (Med Associates, Inc., St. Albans, VT), which delivered nutritionally balanced primate pellets (Bio-Serv, Frenchtown, NJ). Tests were controlled by a personal computer running a custom program written in Presentation (Neurobehavioral Systems, Albany, CA). All six subjects had previous experience with touch screen tasks, including image discrimination; however, none of the six had previous experience with reversal learning. Pair-housed monkeys were separated during testing by a panel that allowed limited visual, auditory and tactile contact but prevented access to the other monkey’s computer screen.
Procedure
Figure 1 depicts the sequence of events in a trial. Subjects initiated each trial by touching a 100 × 100 pixel green square twice (FR 2). Two images (350 × 350 pixels) appeared, each placed 250 pixels left or right of the center of the touch screen. The left–right position of the two images was counterbalanced and pseudo-randomly determined such that a stimulus could appear on the same side no more than 4 times in a row. The same two images were used throughout all reversals. Monkeys selected one of the two images by touching it twice (FR 2). Touching the S+ cleared the screen and produced a positive sound and a food pellet. Touching the S− cleared the screen and produced a negative sound. Either a 1- or 30-s ITI ensued. The 1- and 30-s ITIs alternated, regardless of trial outcome.
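The left–right constraint described above can be illustrated with a short sketch. The source does not say how the sequence was generated, so the rejection-sampling approach, the function names `side_sequence` and `longest_run`, and the 16-trial block length used for balancing are all assumptions made for illustration only.

```python
import random


def longest_run(seq):
    """Return the length of the longest run of identical adjacent items."""
    best = run = 0
    prev = None
    for item in seq:
        run = run + 1 if item == prev else 1
        best = max(best, run)
        prev = item
    return best


def side_sequence(n_trials=16, max_run=4, rng=None):
    """Hypothetical generator for the S+ side on each trial.

    Sides are counterbalanced (equal numbers of 'L' and 'R') and no side
    repeats more than max_run times in a row. Uses simple rejection
    sampling: reshuffle until the run constraint holds.
    """
    if n_trials % 2:
        raise ValueError("n_trials must be even to counterbalance sides")
    if rng is None:
        rng = random.Random()
    sides = ["L", "R"] * (n_trials // 2)
    while True:
        rng.shuffle(sides)
        if longest_run(sides) <= max_run:
            return list(sides)
```

Rejection sampling is used here only because it is easy to verify; any generator that satisfies the same two constraints would serve equally well.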
The same image was the S+ until monkeys reached a performance criterion of 15 out of 16 correct discrimination trials. This criterion was assessed once every block of 16 trials. If criterion was met, a reversal occurred; if criterion was not met, 16 additional trials were administered continuing the same S+/S− arrangement. The first trial in which the reversed contingencies were in place, Trial 0, was not included in the performance criterion because the monkeys could not know that the reversal had occurred until they received feedback on this trial. Thus, every reversal contained at least 17 trials: Trial 0, followed by blocks of 16 trials, 15 of which had to be correct to trigger a reversal. The odd number of trials ensured that Trial 1 of each reversal followed a 1- or 30-s ITI equally often. Testing continued until monkeys had completed a total of 90 reversals. Any reversal that had not been completed by the end of a testing day was administered at the start of the next testing day. Thus, both the reversal number and S+/S− configuration carried over across days; however, any progress toward reaching reversal criterion did not. Trials from incomplete reversals were not included in the data analysis.
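As a minimal sketch of the criterion rule described above, the following code scores one reversal from a list of trial outcomes and shows why an odd reversal length alternates the ITI that precedes Trial 1. The function names, the 0-based bookkeeping, and the assumption that the first ITI of testing is the 1-s ITI are illustrative and not taken from the authors' program.

```python
def trials_to_criterion(outcomes, block_size=16, criterion=15):
    """Return the total number of trials in a reversal, or None if unfinished.

    outcomes[0] is Trial 0 (the first trial under the reversed contingencies)
    and is excluded from scoring. The remaining trials are checked in
    non-overlapping blocks of block_size; the reversal ends at the first
    block containing at least `criterion` correct trials.
    """
    scored = outcomes[1:]  # drop Trial 0
    for start in range(0, len(scored) - block_size + 1, block_size):
        if sum(scored[start:start + block_size]) >= criterion:
            return 1 + start + block_size  # Trial 0 + completed blocks
    return None


def iti_before_trial_one(reversal_lengths, itis=(1, 30)):
    """ITI (in s) preceding Trial 1 of each reversal.

    Assumes ITIs strictly alternate after every trial, starting with
    itis[0] after the very first trial (an assumption for illustration).
    """
    result = []
    trials_so_far = 0
    for length in reversal_lengths:
        # The ITI after Trial 0 of this reversal is the one before Trial 1.
        result.append(itis[trials_so_far % 2])
        trials_so_far += length
    return result
```

Because every completed reversal contains 1 + 16k trials, its length is always odd, so under strict alternation the ITI preceding Trial 1 switches between 1 s and 30 s from one reversal to the next.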
We calculated the proportion of correct discrimination trials following 1- and 30-s ITIs to assess the relative control of choice by habit and working memory. Proportion correct scores were arcsine transformed before analysis (Aron and Aron 1999, Chapter 14) to better approximate normality. We hypothesized that if choice behavior in naïve reversers was largely controlled by habit, with little contribution of working memory, discrimination performance would not differ following 1- or 30-s ITIs. Complementarily, we hypothesized that if choice behavior in expert reversers was under greater relative control by working memory, accuracy would be higher on discrimination trials following 1-s ITIs compared to 30-s ITIs.
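The scoring step above can be sketched as follows. The source does not state which form of the arcsine transformation was applied; this sketch assumes the common arcsine-square-root form, asin(√p), and the function names and the `(iti_s, correct)` trial format are hypothetical.

```python
import math
from collections import defaultdict


def proportion_correct_by_iti(trials):
    """Proportion correct for each ITI duration.

    trials: iterable of (iti_s, correct) pairs, where iti_s is the ITI
    (in seconds) that preceded the trial and correct is a bool.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for iti_s, correct in trials:
        counts[iti_s] += 1
        hits[iti_s] += int(correct)
    return {iti: hits[iti] / counts[iti] for iti in counts}


def arcsine_transform(p):
    """Arcsine-square-root transform of a proportion (assumed variant)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p))
```

The transform stretches proportions near 0 and 1, where raw accuracy scores are compressed, which is why it is often applied before parametric tests on proportion data.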
Results and discussion
Monkeys were scheduled to complete 90 reversals in Experiment 1. However, on the 4th day of testing, 4 of the 6 monkeys were accidentally tested with only 1-s ITIs, rather than alternating long and short ITIs. These four monkeys performed one day of reversals under this erroneous condition, averaging 56.5 reversals. Three of these four monkeys had already completed at least 35 reversals before receiving the incorrect version of the program. The fourth monkey had completed only 8 reversals. Because analysis required 10 reversals under the alternating ITI conditions, this monkey was not included in the analysis of Experiment 1. All monkeys were given an additional 30 reversals of the correct testing with alternating ITIs. As a result, the five monkeys completed an average of 5.2 testing days and 120.8 reversals. Thus, despite the experimental error, we acquired a block of at least 10 reversals of data from 5 monkeys when they were novice reversers and another block of 10 reversals after we expected them to be expert.
We compared the first 10 and the last 10 reversals performed by each monkey to determine whether discrimination performance improved across the intervening reversals. Accuracy was assessed by averaging the number of errors committed in Trials 1–16 of each of the reversals. Only Trials 1–16 were used in the analysis because monkeys could, and sometimes did, reach criterion in the first block of 16 trials in a reversal. Monkeys made significantly more errors in the first 10 reversals than the last 10 reversals, suggesting that they had developed serial reversal expertise over the course of training (first 10: M = 13.54, SD = 7.278; last 10: M = 4.66, SD = 3.274).
Figure 2 shows that performance following 1- or 30-s ITIs did not differ early in the reversal task; however, after experiencing many reversals monkeys performed significantly better following 1-s ITIs relative to 30-s ITIs. To determine whether the control of choice by working memory changed as a function of reversal experience, we compared the proportion of correct discrimination trials preceded by 1- and 30-s ITIs during the first and last 10 reversals. The difference in accuracy following 1- and 30-s intervals was significantly greater during the last 10 reversals, and there was a significant difference in accuracy between 1- and 30-s ITI types (two-factor repeated measures ANOVA; reversal experience: F (1,4) = 39.9, P = 0.003; ITI type: F (1,4) = 101.6, P = 0.001; interaction: F (1,4) = 22.0, P = 0.009). Follow-up analyses confirmed that accuracy was significantly higher following 1-s than 30-s ITIs in the last 10 reversals, while this difference was not present in the first 10 reversals (paired samples t tests; first 10 reversals: t 4 = −.801, P = 0.468; last 10 reversals: t 4 = 11.706, P < 0.001). Follow-up analyses also showed that discrimination following 30-s ITIs was significantly more accurate in the last 10 reversals than in the first 10 reversals (paired samples t test; reversal experience: t 4 = −3.834, P = 0.019). Interestingly, as is shown in Fig. 3, accuracy following both 1- and 30-s ITI durations improved equally across the first 10 reversals. The absence of a difference in accuracy between the two ITI types in the first 10 reversals suggests that monkeys were initially aided by a process other than working memory.
We replicated the finding that animals become more proficient at reversing with more experience (Dufort et al. 1954; Mackintosh et al. 1968; Ploog and Williams 2010). Longer ITIs impaired learning by monkeys once they became expert reversers, but did not affect them while they were still naïve reversers, suggesting that working memory is critical for reversal expertise. We propose that working memory increases reversal efficiency because it allows subjects to update their representation of the current S+/S− based on feedback from the outcome of the previous trial. Thus, the difference between the contributions by a working memory system, relative to a habit system, is likely to be most pronounced immediately after a reversal has occurred.
To determine whether working memory was especially critical early in reversals in expert reversers, we compared accuracy following 1- and 30-s ITIs early and late in reversals. We averaged accuracy on Trials 1–4 under each ITI condition and on Trials 13–16 under each ITI condition. We determined these scores for the first 10 reversals, while monkeys were naïve, and for the last 10 reversals, when monkeys were expert. If working memory was critical for rapid reversal in expert reversers, but not in naïve reversers, we should find that monkeys were especially accurate early in reversals after they were expert. This pattern, evident in Fig. 4, is supported by a three-way interaction between level of expertise, phase of reversal and type of ITI (three-factor repeated measures ANOVA; reversal experience: F (1,4) = 38.280, P = 0.003; early versus late: F (1,4) = 38.238, P = 0.003; ITI type: F (1,4) = 81.304, P = 0.001; reversal experience * early versus late: F (1,4) = 28.684, P = 0.006; reversal experience * ITI Type: F (1,4) = 28.362, P = 0.006; early versus late * ITI Type: F (1,4) = .035, P = 0.860; reversal experience * early versus late * ITI Type: F (1,4) = 12.543, P = 0.024). This result shows that performance increased most rapidly early in reversals after monkeys became expert at reversing, consistent with a strong influence of working memory early in reversals after expertise was established.
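The early-versus-late scoring described above can be sketched as a small aggregation over trials. The `(trial_number, iti_s, correct)` tuple format and the function name `phase_accuracy` are assumptions for illustration; the trial windows (Trials 1–4 and 13–16) come from the text.

```python
from collections import defaultdict

EARLY_TRIALS = range(1, 5)    # Trials 1-4 of a reversal
LATE_TRIALS = range(13, 17)   # Trials 13-16 of a reversal


def phase_accuracy(trials):
    """Mean accuracy keyed by (phase, iti_s).

    trials: iterable of (trial_number, iti_s, correct) tuples, where
    trial_number is the position within the reversal (Trial 0 is the
    first reversed trial), iti_s is the preceding ITI in seconds, and
    correct is a bool. Trials outside both windows are ignored.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for trial_number, iti_s, correct in trials:
        if trial_number in EARLY_TRIALS:
            phase = "early"
        elif trial_number in LATE_TRIALS:
            phase = "late"
        else:
            continue
        counts[(phase, iti_s)] += 1
        hits[(phase, iti_s)] += int(correct)
    return {key: hits[key] / counts[key] for key in counts}
```

Running this separately on the first 10 and the last 10 reversals would yield the cell means that enter the three-factor comparison of expertise, reversal phase and ITI type.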
Our findings suggest that control of choice by habit and working memory differed between naïve and expert reversers. Choice behavior in naïve reversers appeared to be under greater relative control of a habit system. When the monkeys were naïve they were both less accurate, relative to when they were experts, and were unaffected by ITI duration. When monkeys became expert, choice behavior appeared to be under greater relative control by working memory.
Our findings are consistent with those from pigeons in which performance on serial reversal learning tasks is significantly worse when reversals contained only long ITIs than when they contained only short ITIs (Ploog and Williams 2010; Williams 1976). Similar ITI-sensitive performance has been reported in rhesus monkeys performing an object discrimination learning set task (Deets et al. 1970). Together these findings suggest that reversal expertise is contingent on working memory for the outcome of the previous trial.
An alternative interpretation is that subjects are disproportionately more likely to be affected by proactive interference after long than short ITIs. According to this account, proactive interference causes the relative validity of memories A+/B− and B+/A− to become equal, allowing for greater flexibility after a long delay interval (Clayton 1966; Kraemer and Golding 1997; Mackintosh et al. 1968). The proactive interference account thus also coheres well with previous research. Specifically, subjects exhibit less perseveration at the onset of a new reversal as reversal experience accrues, so long as consecutive reversals are separated by long intervals. Furthermore, overall performance on serial reversal learning tasks is worse when all trials are separated by long ITIs compared to short ITIs. Because the associative strength of both stimulus representations becomes similar over long intervals between trials or reversals, one might postulate that our operational definition of expertise in rhesus monkeys can be accounted for by an accumulation of proactive interference.
Our results from Experiment 1 do not provide enough evidence to conclude whether working memory or proactive interference is responsible for the development of expertise in rhesus monkeys. However, these accounts can be distinguished experimentally because proactive interference accounts depend on experience with specific stimuli while the working memory account posits a general shift in information processing. If a monkey has learned to actively maintain the previous trial in mind, it should be able to continue using this strategy if given a new pair of images to discriminate. By contrast, proactive interference depends on experience with specific stimuli, such that introducing a new image pair should eliminate expertise until proactive interference accrues again over multiple reversals. In our next experiment, therefore, we contrasted the working memory and proactive interference accounts of reversal expertise by administering the same serial reversal task with a new pair of images. We hypothesize that if monkeys developed expertise through a generalizable shift in control of choice by working memory, then their performance will continue to be affected by ITI duration across all reversals with the new images. If instead monkeys developed expertise through an accumulation of proactive interference, we hypothesize that performance will be affected by ITI duration only after they have experienced numerous reversals with the new images.
Experiment 2
In Experiment 2, we tested the same rhesus monkeys on the same serial reversal learning task, using two new images. An alternative account for serial reversal learning improvement suggests that responding becomes more flexible as proactive interference accumulates (Clayton 1966; Gonzalez et al. 1967). By using two new images, we eliminate proactive interference from Experiment 1. The working memory account of expertise predicts that the difference between short and long ITI trials should appear immediately, within the first 10 reversals with the new images, while the proactive interference account predicts that expertise will emerge gradually as monkeys experience reversals with the new images.
Methods
Subjects and apparatus
All 6 monkeys from Experiment 1 were used. The same apparatus was used.
Procedure
Testing procedures used in Experiment 2 were identical to those described in Experiment 1. The images used in Experiment 1 were replaced with two new 350 × 350 pixel color photograph images.
Results and discussion
Monkeys completed 60 reversals in an average of 3.5 testing days. Figure 5 shows that monkeys transferred serial reversal learning expertise to new images, showing superior performance following 1-s ITIs in both the first and last block of 10 reversals (two-factor repeated measures ANOVA; ITI type: F (1,5) = 15.577, P = 0.011; reversal experience: F (1,5) = 3.531, P = 0.116; interaction: F (1,5) = .513, P = 0.506). Follow-up analysis confirmed that monkeys performed more accurately following 1-s ITIs than 30-s ITIs during both the first and last 10 reversals (paired samples t tests; first 10 reversals: t 5 = 3.948, P = 0.011; last 10 reversals: t 5 = 3.399, P = 0.019). Our results suggest that the control of choice by working memory transferred across stimulus sets and that it is unlikely that the development of expertise in Experiment 1 was due to the accumulation of proactive interference.
As in Experiment 1, we compared accuracy following 1- and 30-s ITIs early (Trials 1–4) and late (Trials 13–16) in reversals to evaluate whether working memory was an especially strong determinant of accuracy early in reversals. Because there was no main effect of reversal experience between the first and last 10 reversals, we used data from all 60 reversals. Figure 6 depicts learning curves for Experiment 2. The case that accuracy is greater in the short ITI condition early in reversals is supported by the two-way interaction between phase of reversal and type of ITI (two-factor repeated measures ANOVA; early versus late: F (1,5) = 336.998, P < 0.001; ITI Type: F (1, 5) = 31.368, P = 0.003; interaction: F (1,5) = 13.832, P = 0.014). Thus, the pattern of accuracy is consistent with a strong influence of working memory, specifically early in reversals.
The findings from Experiment 2 support the hypothesis that expertise appears when choice is under greater relative control by working memory. Furthermore, this working memory expertise appears to be robust and transferable across stimulus sets. Our findings indicate that the development of expertise in Experiment 1 was due to an increase in the relative contribution by working memory, rather than from an accumulation of proactive interference. We did not counterbalance the discriminanda between Experiments 1 and 2, although the discriminanda for both experiments were color photographic images. Thus, it is possible, although very unlikely, that monkeys showed expertise at the onset of Experiment 2 because the particular discriminanda used in Experiment 2 were easier to discriminate or remember than those used in Experiment 1.
Working memory is characterized by active, effortful maintenance (Baddeley 2003; Cowan 2008). In humans, information can be held in mind over relatively long delays, as long as the information is rehearsed (Baddeley 2000; Baddeley et al. 1975; Milner 1970, p. 29). In rhesus monkeys, the active maintenance of familiar images is disrupted when subjects are required to perform a cognitively demanding task during the retention interval of a matching-to-sample task (Basile and Hampton 2013b). If monkeys actively maintain the outcome of the previous trial in working memory during the serial reversal learning task, then performance should be attenuated if a cognitively demanding task is introduced between discrimination trials. We use concurrent cognitive load to target working memory rehearsal in Experiment 3, thus providing a converging test of whether working memory is important for reversal expertise.
Experiment 3
We assessed the role of working memory in serial reversal learning expertise by alternating low and high concurrent cognitive loads across trials. We compared performance on the serial reversal learning task when discrimination trials were preceded by a classification task or an empty interval. If working memory is important for reversal expertise, we should observe lower accuracy on trials following the classification task, compared to yoked control trials.
Methods
Subjects and apparatus
All 6 monkeys from Experiments 1 and 2 were used. The same equipment was used.
Classification training
All monkeys used in Experiment 3 had previous experience with classifying images as containing birds, fish, flowers or people (Basile and Hampton 2013a, b; Diamond et al. 2016). Monkeys were retrained on the classification task before classification and reversal tasks were combined. The stimulus set for classification contained 425 unique images from each of the four categories, resulting in a total of 1700 images. Images were collected from the online photograph repository Flickr (Yahoo!, Sunnyvale, CA). The entire stimulus set was screened for duplicates using DupDetector (Prismatic Software, Anaheim, CA) and visual inspection. The stimulus set was screened to ensure that no image contained exemplars from more than one category (Gazes et al. 2013).
Figure 7 depicts the sequence of events in classification training. Monkeys initiated trials by touching a green start square (FR 2). A 400 × 300 pixel image corresponding to one of the four categories then appeared in the center of the screen. After monkeys touched the image (FR 2), four 100 × 100 pixel classification icons, each corresponding to one of the four image categories, appeared in fixed positions in the four corners of the touch screen. Incorrect classifications resulted in a correction trial containing the same to-be-classified image. Incorrect correction trials were followed by a second correction trial. Second correction trials included the same image; however, only the correct category icon was presented on the screen. This ensured that a correct response would occur. All correct classification and correction trials were paired with positive auditory feedback and food reinforcement. All incorrect classification and correction trials were paired with negative auditory feedback and a 5-s time-out interval.
Monkeys received at least two classification sessions consisting of 600 trials. Images from each of the four classification groups were presented pseudo-randomly, and each group was represented equally within each session. Correction trials did not contribute to the maximum number of trials; thus, every subject viewed 150 images from each category within a session. Monkeys trained until they completed two consecutive classification sessions with at least 80% correct classifications.
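The training criterion above (two consecutive sessions at or above 80% correct) can be expressed as a simple check. This is a minimal sketch; the function name and the representation of each session as a single proportion-correct score are assumptions.

```python
def met_training_criterion(session_scores, threshold=0.80, run_length=2):
    """Return True once run_length consecutive sessions reach threshold.

    session_scores: proportion correct per classification session, in the
    order the sessions were run (correction trials excluded).
    """
    streak = 0
    for score in session_scores:
        # A session below threshold resets the count of consecutive passes.
        streak = streak + 1 if score >= threshold else 0
        if streak >= run_length:
            return True
    return False
```

With the default arguments, a monkey scoring 0.85, 0.75, 0.82 and 0.90 across four sessions would meet the criterion only after the fourth session, because the second session breaks the first streak.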
Procedure
We used the same serial reversal learning procedure as in Experiments 1 and 2; however, instead of alternating the ITI duration, we alternated two concurrent cognitive demand conditions: a classification task and an empty interval yoked in duration to the amount of time it took to complete the classification on the previous trial. All trials followed the same sequence: self-start, concurrent cognitive load, discrimination and ITI. Figure 8 depicts the sequence of events for Experiment 3. Monkeys completed 60 reversals with this alternating cognitive load procedure.
Images from each category were pseudo-randomly presented so that each category was represented twice in every block of 16 discrimination trials, 8 of which contained the intervening category task. Monkeys viewed a centrally located 400 × 300 pixel image, with the four category icons in each corner. Correct classifications were paired with positive auditory feedback, but no food reward, and allowed subjects to progress to the discrimination trial. Incorrect classifications were paired with negative auditory feedback and resulted in the immediate presentation of a different to-be-classified image. This same process repeated until an image was correctly classified. On the following trial, instead of classifying, monkeys experienced a yoked empty interval. During this yoked empty interval phase, monkeys viewed a black screen for the same time it took to complete the entire category phase, including category corrections, in the previous trial.
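The alternation between classification and yoked empty intervals can be made concrete with a short sketch. It assumes that each block of 16 discrimination trials begins with a classification trial, so that every empty interval has a preceding classification phase to be yoked to, and that a plain shuffle is an acceptable stand-in for the pseudo-random category order. The names `build_block` and `yoked_durations` are hypothetical.

```python
import random

# The four image categories named in the text.
CATEGORIES = ["birds", "fish", "flowers", "people"]


def build_block(seed=None):
    """Return 16 trial specs alternating classification and yoked empty interval.

    Each category appears twice, giving 8 classification trials per block.
    Assumption: the block starts with a classification trial.
    """
    cats = CATEGORIES * 2
    random.Random(seed).shuffle(cats)
    block = []
    for cat in cats:
        block.append({"load": "classification", "category": cat})
        block.append({"load": "yoked_empty", "category": None})
    return block


def yoked_durations(block, classification_ms):
    """Return the load-phase duration (ms) for every trial in the block.

    `classification_ms` holds the measured duration of each classification
    phase, including any category corrections, in trial order. Each empty
    interval copies the duration of the classification phase before it.
    """
    measured = iter(classification_ms)
    durations = []
    last = None
    for trial in block:
        if trial["load"] == "classification":
            last = next(measured)
        durations.append(last)
    return durations
```

Because every empty interval inherits the preceding classification duration, the two load conditions are matched in elapsed time within each pair of trials.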
After monkeys completed the concurrent cognitive load phase, they were given image discrimination. The discrimination phase was identical to discrimination phases from Experiments 1 and 2; however, two novel images were used. To avoid contamination between discrimination and category phases, discriminanda were two color images that did not contain birds, fish, flowers or people. This was also true for Experiments 1 and 2, as the discriminanda from the previous two experiments also did not contain representations from any of the 4 categories. Monkeys were required to select the currently positive image to receive positive auditory feedback and a food reinforcement. If monkeys selected the incorrect stimulus, they were presented with negative auditory feedback and no food reinforcement. A 1000-ms ITI was presented after each discrimination, regardless of whether the trial was correct or incorrect.
Results and discussion
Monkeys required significantly more corrections of their classification responses during the first 10 reversals compared to the last 10 reversals (paired samples t test: t(5) = 2.670, P < 0.05). This improvement in categorization indicates that there was competition for cognitive resources between reversal learning and categorization, supporting the premise for using this experimental intervention. Performance on the categorization task is important to note because empty intervals were yoked to the duration of the classification phase of the previous trial. Thus, trials with long category phases were followed by trials with long empty intervals. Because we found in Experiments 1 and 2 that long empty intervals impair working memory performance, we expected longer intervals to have the same effect here. To mitigate this effect, we compared discrimination accuracy as a function of concurrent cognitive load for the last 10 reversals only. This comparison maximizes the likelihood of comparing performance under conditions of relatively low and high concurrent cognitive demands with the shortest delay intervals possible. The interval between discrimination trials for Experiment 3 fell between the two ITI durations used in Experiments 1 and 2 (Median: 7964 ms; Range: 3200–118,907 ms). We examined Trials 1–16, regardless of whether a category correction was needed. Figure 9 shows that monkeys performed significantly better when discrimination trials followed an empty interval rather than the classification task (paired samples t test; t(5) = 14.055, P < 0.001).
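For readers unfamiliar with the test reported here, the following minimal sketch shows how a paired-samples t statistic with n − 1 degrees of freedom is computed from per-subject scores. The function name `paired_t` is hypothetical, no data from the study are reproduced, and the sketch does not compute a P value.

```python
import math
import statistics


def paired_t(x, y):
    """Return (t, df) for a paired-samples t test on two equal-length samples.

    t is the mean of the pairwise differences divided by its standard error;
    df is the number of pairs minus one (5 for six subjects).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    if sd_d == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1
```

With six subjects, as in this experiment, the statistic is evaluated against a t distribution with 5 degrees of freedom.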
We compared accuracy following low and high concurrent cognitive load conditions early (Trials 1–4) and late (Trials 13–16) in reversals to evaluate whether working memory was an especially strong determinant of accuracy early in reversals. Figure 10 shows learning curves for Experiment 3. While inspection of Fig. 10 gives the impression that accuracy differed most dramatically between cognitive load conditions early in reversals, the interaction between phase of reversal and cognitive load was not statistically significant (two-factor repeated measures ANOVA; early versus late: F(1,5) = 8.840, P = 0.031; concurrent cognitive load type: F(1,5) = 23.588, P = 0.005; interaction: F(1,5) = 4.643, P = 0.084). Thus, statistical analysis of accuracy in Experiment 3 strongly indicates that working memory was important for reversal accuracy overall, but only weakly supports the conclusion from Experiments 1 and 2, that working memory was especially important early in reversals.
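The structure of this analysis can be illustrated with a short sketch. In a fully within-subject 2 × 2 design, each effect has one numerator degree of freedom, so its F ratio equals the square of a one-sample t statistic computed on per-subject contrast scores, with (1, n − 1) degrees of freedom. The function names below are hypothetical, the input layout is an assumption made for illustration, and no data from the study are reproduced.

```python
import math
import statistics


def one_df_f(contrast_scores):
    """Return (F, (df1, df2)) for a one-df within-subject contrast.

    F is the squared one-sample t statistic of the contrast scores
    tested against zero.
    """
    n = len(contrast_scores)
    mean_c = statistics.mean(contrast_scores)
    sd_c = statistics.stdev(contrast_scores)  # sample SD, n - 1 denominator
    if sd_c == 0:
        raise ValueError("contrast scores have zero variance; F is undefined")
    t = mean_c / (sd_c / math.sqrt(n))
    return t * t, (1, n - 1)


def rm_anova_2x2(subjects):
    """Compute the three effects of a 2 x 2 repeated measures ANOVA.

    `subjects` is a list of tuples, one per subject, holding accuracy in
    the order (early_low, early_high, late_low, late_high), where low and
    high refer to concurrent cognitive load.
    """
    phase = [(el + eh) - (ll + lh) for el, eh, ll, lh in subjects]
    load = [(el + ll) - (eh + lh) for el, eh, ll, lh in subjects]
    interaction = [(el - eh) - (ll - lh) for el, eh, ll, lh in subjects]
    return {
        "phase": one_df_f(phase),
        "load": one_df_f(load),
        "interaction": one_df_f(interaction),
    }
```

The interaction contrast is the difference between the load effect early and the load effect late, which is the quantity that would need to be reliably non-zero to conclude that load mattered more early in reversals.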
Experiment 3 varied the difficulty of discrimination trials by alternating the concurrent cognitive load. Monkeys were significantly more accurate on yoked delay trials, when the concurrent cognitive demand was low, compared to categorization trials, when the concurrent cognitive demand was high. Because the two conditions differed only in cognitive demands, and not in duration, these results indicate that monkeys actively maintained the outcome of the previous trial in working memory. The results from Experiment 3 provide converging evidence that serial reversal learning expertise is facilitated by working memory.
General discussion
We applied interventions intended to selectively attenuate working memory and found that the development of serial reversal expertise in monkeys was facilitated by an increase in the relative control of choice behavior by working memory. Discrimination accuracy when the monkeys were naïve reversers in the first 10 reversals of Experiment 1 was the same regardless of whether a 1- or 30-s ITI preceded choice. Insensitivity to delay suggests that responding was largely controlled by habits that were not diminished by the passage of time. In the last 10 reversals of Experiment 1, when monkeys were expert and reversing rapidly, discrimination accuracy was significantly better following 1-s ITIs than 30-s ITIs, suggesting that responding was under greater relative control by delay-sensitive working memory. In Experiment 2, we found that reversal expertise, and the use of working memory, was not limited to stimuli with which monkeys had extensive training. Monkeys given new discriminanda were immediately more accurate after 1-s than after 30-s ITIs. Immediate generalization to new discriminanda indicates that rapid reversal learning in rhesus monkeys cannot be fully explained by the build-up of proactive interference. In Experiment 3, we used a concurrent cognitive load in the place of long ITIs to further assess whether expertise depended on working memory. Susceptibility to concurrent cognitive load is a signature of working memory (Basile and Hampton 2013b). Concurrent cognitive load disrupted reversal learning in expert reversers, further strengthening the case that working memory is important for reversal learning expertise.
The fact that reversal expertise generalized immediately to new stimuli in Experiment 2 suggests that PI does not account for improved reversal learning performance in rhesus monkeys. However, it is possible that PI develops very rapidly, perhaps after just one reversal. With only 6 monkeys, it is not possible to conduct a reliable comparison of accuracy in the long and short ITI conditions in the first reversal alone, so these data cannot entirely exclude the possibility of very rapid build-up of PI.
Another account of reversal expertise posits that the outcome of the previous trial becomes an increasingly salient source of information for guiding choice in the current trial as reversal experience is gained (Williams 1976). The author did not invoke working memory per se in this account, but our hypothesis that control of choice by working memory increases with successive reversals invokes the same change in the source of control of choice. The working memory account and the response–outcome account share a weakness in that neither clearly explains why habit would initially control choice and working memory would control choice only after considerable experience. We found that choice behavior in naïve reversers was not under the control of working memory, but we cannot be certain that the monkeys did not remember the outcome of the last trial from the beginning. It is therefore not clear whether monkeys only begin to remember the outcome of the last trial with experience or whether working memory for the outcome of the last trial is always present and the change in the contribution of working memory occurs by a process more like a shift in strategy. According to the exponentially weighted moving average (EWMA) model, memories are exponentially weighted to favor more recent events over more distant events, especially when environmental conditions are regularly changing (Killeen 1994; McNamara and Houston 1987). While the EWMA model does not invoke memory systems, like our approach the model describes a change in the weightings of memories, which we propose results from a shift in priority of memory systems.
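The weighting idea behind an EWMA can be illustrated with a generic update rule, V(t) = V(t − 1) + α[r(t) − V(t − 1)], in which the weight given to an outcome falls off exponentially with its age. The sketch below uses this textbook form; it is not the specific formulation of either cited paper, and the weighting parameter `alpha` and the function name `ewma` are assumptions introduced for illustration.

```python
def ewma(outcomes, alpha, initial=0.0):
    """Return the running exponentially weighted average after each outcome.

    alpha near 1 weights the most recent outcome almost exclusively
    (rapid updating); alpha near 0 changes slowly over many outcomes
    (gradual, habit-like accumulation).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    value = initial
    history = []
    for r in outcomes:
        value += alpha * (r - value)  # move toward the newest outcome
        history.append(value)
    return history


if __name__ == "__main__":
    # Hypothetical outcome stream for one stimulus: rewarded on ten
    # trials, then unrewarded on three trials after a reversal.
    stream = [1] * 10 + [0] * 3
    print(ewma(stream, alpha=0.9)[-1])  # large alpha: value collapses quickly
    print(ewma(stream, alpha=0.1)[-1])  # small alpha: value declines slowly
```

On this reading, a shift toward larger weights on recent outcomes with reversal experience would produce the same behavioral signature as increased control of choice by working memory.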
We have stressed the importance of working memory in serial reversal learning improvement. However, working memory does not appear to account for all the improvements that occur as animals gain experience with reversals. As shown in Fig. 3, performance following 1- and 30-s ITI durations improved equivalently across the first 10 reversals in Experiment 1. Because manipulations of delay interval did not affect performance in these reversals, this initial improvement does not appear to be due to increasing control of choice by working memory. Perhaps, instead, the difference in the associative strengths of the two stimuli decreased as both stimuli were rewarded, resulting in less perseveration. Thus, proactive interference may have aided in the initial performance improvements across the first 10 reversals. However, if indeed PI aided early performance on the reversal task, it appears to have little effect on choice behavior after expertise has developed, because expertise transferred to new discriminanda in Experiment 2. Both PI and working memory may facilitate reversal learning, making different contributions depending on stage of training, the specific parameters of testing and possibly the species tested. It is interesting to consider the possibility that the contributions of PI and working memory might differ among species. Perhaps the effect of PI is strong in animals with comparatively weak working memory, as might be the case with pigeons, but plays a smaller role in animals that have comparatively robust working memory, like monkeys.
Indirect evidence supports this idea that the robustness of working memory in a given species determines the extent to which working memory is critical for expertise. First, proactive interference has been proposed to be the primary mechanism underlying serial reversal learning improvement in rodents, bumblebees and goldfish (Gonzalez et al. 1967; Mackintosh et al. 1968; Strang and Sherry 2014). Second, it has been suggested that dependence on habit in reversal learning is decreased in animals with larger brains. In many samples, larger brain size may accompany enlargement of the frontal lobes and thus enhancement of working memory. The so-called mediational paradigm has been used to assess the extent to which animals exhibit associative or rule-based strategies in reversal learning (Rumbaugh 1971). The mediational paradigm is a variation of a reversal learning task where animals learn an A+/B− discrimination. Once animals learn the A+/B− discrimination, they are given one A−/B+ reversal trial. Following this single reversal trial, animals are presented with one of the three conditions: a control A−/B+ condition, a new positive stimulus A−/C+ condition or a new negative stimulus C−/B+ condition. If an animal has learned the original discrimination through associative rules, such as “approach A” or “avoid B,” it will succeed on one or two of the conditions, but not all three. In contrast, if an animal has learned the original discrimination through a rule-based strategy, such as win-stay, lose-shift, it will perform equally well on all three conditions. The mediational paradigm has been tested on a variety of primate species, and rule-based learning is associated with larger brain size (Beran et al. 2008; Rumbaugh 1971, 1997; Rumbaugh and Pate 1984, Chapter 31).
In light of our findings, it seems likely that the degree to which a species exhibits either associative or rule-based learning may be largely influenced by the extent to which their behavior is under greater relative control by either habit or working memory, respectively. Future comparative studies may address the extent to which “rule-based learning” depends on working memory.
There has been a resurgence in interest in reversal learning, manifest in a raft of recent studies of “midsession reversal” (McMillan et al. 2014; Rayburn-Reeves et al. 2011; Smith et al. 2016; Stagner et al. 2013). Generally, these studies find that pigeons make many anticipatory and perseverative errors when a reversal predictably occurs in the middle of a testing session. This result clearly shows that the choice behavior of subjects is not controlled by working memory for the outcome of the last trial. If it were, subjects would make no anticipatory errors and very few, if any, perseverative errors. Instead, time since session onset (Rayburn-Reeves et al. 2011; Stagner et al. 2013) appears to influence midsession reversal choice behavior in pigeons (but see McMillan and Roberts 2012). Because estimates of time are fuzzy, anticipatory and perseverative errors occur even though the outcome of the last trial would be a nearly perfect cue for correct choice. Pigeons are not the only species to have been tested on the midsession reversal task, and near-optimal responding has been observed in humans, rhesus monkeys and rats (Rayburn-Reeves et al. in press; Rayburn-Reeves et al. 2011, 2013). As in the mediational paradigm, species differences on the midsession reversal task may reflect the degree to which choice is controlled by working memory.
Both the effects of concurrent cognitive load (Experiment 3; Basile and Hampton 2013b) and studies of directed forgetting (Tu and Hampton 2013) indicate that working memory is an active process in monkeys. Our analyses looking at early and late phases within reversals indicate that accuracy is reduced by long delays and high concurrent cognitive demands within the first 4 trials of a reversal. However, the effects of long ITIs and concurrent cognitive load reliably disappear with additional trials within a reversal. When active rehearsal of the positive stimulus is disrupted early in a reversal, choice is more greatly controlled by a habit that is incongruent with the current S+/S− conditions, causing perseverative errors. However, upon experiencing numerous trials under that particular S+/S− condition, subjects displayed near-optimal performance regardless of whether working memory was disrupted by long ITIs or concurrent cognitive load. From this finding, we posit that the relative associative strengths of the discriminanda flip within a reversal: early in the reversal, the new S− has the greater associative strength, and later in the reversal, the S+ has the greater associative strength. If this is indeed the case, then there is little need to allocate limited attentional resources to actively maintaining the S+ “in mind” late within a reversal. In humans, the ability to multitask, measured by having subjects perform two tasks simultaneously, is substantially better when one of the two tasks can be solved through habit, compared to when both tasks require attentional resources (e.g., Lisman and Sternberg 2013). If monkeys are able to strategically shift attentional resources according to changes in concurrent cognitive demands, then performance on a secondary task would improve later in a reversal when the serial reversal learning task can be solved through habit alone.
Future work should determine whether monkeys continue to actively maintain the S+ “in mind” late within a reversal, after it is no longer necessary, or instead adaptively reallocate cognitive resources.
Our results highlight the importance of working memory for the development of serial reversal expertise. However, other processes may also contribute, including inhibition of responses to previously rewarded stimuli. Our procedure and results do not directly address the role that inhibition might play in reversal expertise. We highlighted the control of choice by working memory and by habit, and we selectively attenuated the contribution of working memory, establishing a single dissociation. Our work did not selectively manipulate habit. Future work might be directed at generating a double dissociation with procedures that attenuate both habit and working memory.
We found that both habit and working memory contribute to choice in serial reversal learning. The development of expertise coincided with a shift from inflexible, habitual responding, to flexible, rapidly updated responding, suggesting that working memory is critical for reversal expertise. Using both ITI duration and concurrent cognitive load, we found converging evidence to support the hypothesis that working memory is critical for serial reversal learning expertise in rhesus monkeys. Furthermore, results from Experiment 2, in which use of working memory generalized to new stimuli, suggested that proactive interference played little role in determining choice behavior in experts. Our novel approach to the study of mechanisms underlying serial reversal learning expertise indicates that habit and working memory together determine the pattern of performance in expert reversers.
References
Aron A, Aron E (1999) Statistics for psychology. Prentice Hall, Upper Saddle River
Bachevalier J (1990) Ontogenetic development of habit and memory formation in primates. Ann NY Acad Sci 608(1):457–484
Baddeley A (1992) Working memory. Science 255(5044):556–559
Baddeley A (2000) The episodic buffer: a new component of working memory? Trends Cogn Sci 4(11):417–423
Baddeley A (2003) Working memory: looking back and looking forward. Nat Rev Neurosci 4(10):829–839
Baddeley AD, Thomson N, Buchanan M (1975) Word length and the structure of short-term memory. J Verb Learn Verb Be 14(6):575–589
Basile BM, Hampton RR (2013a) Monkeys show recognition without priming in a classification task. Behav Process 93:50–61
Basile BM, Hampton RR (2013b) Dissociation of active working memory and passive recognition in rhesus monkeys. Cognition 126(3):391–396
Basile BM, Schroeder GR, Brown EK, Templer VL, Hampton RR (2015) Evaluation of seven hypotheses for metamemory performance in rhesus monkeys. J Exp Psychol Gen 144(1):85–102
Beran MJ, Klein ED, Evans TA, Chan B, Flemming TM, Harris EH, Washburn DA, Rumbaugh DM (2008) Discrimination reversal learning in capuchin monkeys (Cebus apella). Psychol Rec 58(1):3–14
Bessemer DW, Stollnitz F (1971) Retention of discriminations and an analysis of learning set. In: Behavior of nonhuman primates vol 4, pp 1–58
Clayton KN (1966) T-Maze acquisition and reversal as a function of intertrial interval. J Comp Physiol Psychol 62(3):409–414
Cowan N (2008) What are the differences between long-term, short-term, and working memory? Prog Brain Res 169:323–338
Deets AC, Harlow HF, Blomquist AJ (1970) Effects of intertrial interval and Trial 1 reward during acquisition of an object-discrimination learning set in monkeys. J Comp Physiol Psychol 73(3):501–505
Diamond RF, Stoinski TS, Mickelberg JL, Basile BM, Gazes RP, Templer VL, Hampton RR (2016) Similar stimulus features control visual classification in orangutans and rhesus monkeys. J Exp Anal Behav 105(1):100–110
Dufort RH, Guttman N, Kimble GA (1954) One-trial discrimination reversal in the white rat. J Comp Physiol Psychol 47(3):248–249
Gasbarri A, Pompili A, Packard MG, Tomaz C (2014) Habit learning and memory in mammals: behavioral and neural characteristics. Neurobiol Learn Mem 114:198–208
Gazes RP, Brown EK, Basile BM, Hampton RR (2013) Automated cognitive testing of monkeys in social groups yields results comparable to individual laboratory based testing. Anim Cogn 16(3):445–458
Gonzalez RC, Behrend ER, Bitterman ME (1967) Reversal learning and forgetting in bird and fish. Science 158(3800):519–521
Grant DS, Roberts WA (1973) Trace interaction in pigeon short-term memory. J Exp Psychol 101(1):21–29
Hampton RR (2001) Rhesus monkeys know when they remember. Proc Natl Acad Sci USA 98(9):5359–5362
Harlow HF (1949) The formation of learning sets. Psychol Rev 56(1):51–65
Hay JF, Jacoby LL (1996) Separating habit and recollection: memory slips, process dissociations, and probability matching. J Exp Psychol Learn 22(6):1323–1335
Killeen PR (1994) Mathematical principles of reinforcement. Behav Brain Sci 17(1):105–135
Kraemer PJ, Golding JM (1997) Adaptive forgetting in animals. Psychon B Rev 4(4):480–491
Lisman J, Sternberg EJ (2013) Habit and nonhabit systems for unconscious and conscious behavior: implications for multitasking. J Cogn Neurosci 25(2):273–283
Mackintosh NJ, McGonigle B, Holgate V, Vanderver V (1968) Factors underlying improvement in serial reversal learning. Can J Psychol 22(2):85–95
McDonald RJ, White NM (1993) A triple dissociation of memory systems: hippocampus, amygdala, and dorsal striatum. Behav Neurosci 107(1):3–22
McMillan N, Roberts WA (2012) Pigeons make errors as a result of interval timing in a visual, but not a visual-spatial, midsession reversal task. J Exp Psychol Anim B 38(4):440–445
McMillan N, Kirk CR, Roberts WA (2014) Pigeon (Columba livia) and rat (Rattus norvegicus) performance in the midsession reversal procedure depends upon cue dimensionality. J Comp Psychol 128(4):357–366
McNamara JM, Houston AI (1987) Memory and the efficient use of information. J Theor Biol 125(4):385–395
Milner B (1970) Memory and the medial temporal regions of the brain. In: Pribram KH, Broadbent DE (eds) Biology of memory. Academic Press, New York, pp 29–50
Mishkin M, Malamut B, Bachevalier J (1984) Memories and habits: two neural systems. In: Lynch G, McGaugh JL, Weinberger NM (eds) Neurobiology of learning and memory. Guilford Press, New York
Packard MG, McGaugh JL (1996) Inactivation of hippocampus or caudate nucleus with lidocaine differentially affects expression of place and response learning. Neurobiol Learn Mem 65(1):65–72
Ploog BO, Williams BA (2010) Serial discrimination reversal learning in pigeons as a function of intertrial interval and delay of reinforcement. Learn Behav 38(1):96–102. doi:10.3758/LB.38.1.96
Poldrack RA, Packard MG (2003) Competition among multiple memory systems: converging evidence from animal and human brain studies. Neuropsychologia 41(3):245–251
Rayburn-Reeves RM, Molet M, Zentall TR (2011) Simultaneous discrimination reversal learning in pigeons and humans: anticipatory and perseverative errors. Learn Behav 39(2):125–137
Rayburn-Reeves RM, Stagner JP, Kirk CR, Zentall TR (2013) Reversal learning in rats (Rattus norvegicus) and pigeons (Columba livia): qualitative differences in behavioral flexibility. J Comp Psychol 127(2):202–211
Rumbaugh DM (1971) Evidence of qualitative differences in learning processes among primates. J Comp Physiol Psych 76(2):250–255
Rumbaugh DM (1997) Competence, cortex, and primate models: a comparative primate perspective. In: Krasnegor NA, Lyon GR, Goldman-Rakic PS (eds) Development of the prefrontal cortex: evolution, neurobiology, and behavior. Paul H. Brookes, Baltimore, pp 117–139
Rumbaugh DM, Pate JL (1984) The evolution of cognition in primates: a comparative perspective. In: Roitblat HL, Bever TG, Terrace HS (eds) Animal cognition. Erlbaum, Hillsdale, pp 569–587
Sherry DF, Schacter DL (1987) The evolution of multiple memory systems. Psychol Rev 94(4):439–454. doi:10.1037/0033-295X.94.4.439
Shettleworth SJ (2010) Cognition, evolution, and behavior. Oxford University Press, New York
Smith AP, Pattison KF, Zentall TR (2016) Rats’ midsession reversal performance: the nature of the response. Learn Behav 44(1):49–58
Stagner JP, Michler DM, Rayburn-Reeves RM, Laude JR, Zentall TR (2013) Midsession reversal learning: why do pigeons anticipate and perseverate? Learn Behav 41(1):54–60
Strang CG, Sherry DF (2014) Serial reversal learning in bumblebees (Bombus impatiens). Anim Cogn 17(3):723–734
Tu HW, Hampton RR (2013) One-trial memory and habit contribute independently to matching-to-sample performance in rhesus monkeys (Macaca mulatta). J Comp Psychol 127(3):319–328
Tu HW, Hampton RR (2014) Control of working memory in rhesus monkeys (Macaca mulatta). J Exp Psychol Anim Learn Cogn 40(4):467–476
Tu HW, Hampton RR, Murray EA (2011) Perirhinal cortex removal dissociates two memory systems in matching-to-sample performance in rhesus monkeys. J Neurosci 31(45):16336–16343
Williams BA (1976) Short-term retention of response outcome as a determinant of serial reversal learning. Learn Motiv 7(3):418–430
Yokoyama C, Tsukada H, Watanabe Y, Onoe H (2005) A dynamic shift of neural network activity before and after learning-set formation. Cereb Cortex 15(6):796–801
Acknowledgements
The authors followed the ethical guidelines set by the National Research Council’s Guide for the Care and Use of Laboratory Animals, as well as the regulations set by the Office of Laboratory Animal Welfare and the United States Department of Agriculture. All procedures were consistent with United States law and were approved by the Institutional Animal Care and Use Committee at Emory University (protocol #: YER2002833). This work was supported by National Science Foundation Grants Nos. IOS-1146316 and BCS-1632477, and by the Office of Research Infrastructure Programs/P51OD11132.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
We thank Tara A. Dove-VanWormer for assistance with testing animals and animal care.
Cite this article
Hassett, T.C., Hampton, R.R. Change in the relative contributions of habit and working memory facilitates serial reversal learning expertise in rhesus monkeys. Anim Cogn 20, 485–497 (2017). https://doi.org/10.1007/s10071-017-1076-8
DOI: https://doi.org/10.1007/s10071-017-1076-8