Change in the relative contributions of habit and working memory facilitates serial reversal learning expertise in rhesus monkeys

Hassett, Thomas C.; Hampton, Robert R.

doi:10.1007/s10071-017-1076-8

Original Paper
Published: 09 February 2017

Volume 20, pages 485–497, (2017)
Animal Cognition

Abstract

Functionally distinct memory systems likely evolved in response to incompatible demands placed on learning by distinct environmental conditions. Working memory appears adapted, in part, for conditions that change frequently, making rapid acquisition and brief retention of information appropriate. In contrast, habits form gradually over many experiences, adapting organisms to contingencies of reinforcement that are stable over relatively long intervals. Serial reversal learning provides an opportunity to simultaneously examine the processes involved in adapting to rapidly changing and relatively stable contingencies. In serial reversal learning, selecting one of the two simultaneously presented stimuli is positively reinforced, while selection of the other is not. After a preference for the positive stimulus develops, the contingencies of reinforcement reverse. Naïve subjects adapt to such reversals gradually, perseverating in selection of the previously rewarded stimulus. Experts reverse rapidly according to a win-stay, lose-shift response pattern. We assessed whether a change in the relative control of choice by habit and working memory accounts for the development of serial reversal learning expertise. Across three experiments, we applied manipulations intended to attenuate the contribution of working memory but leave the contribution of habit intact. We contrasted performance following long and short intervals in Experiments 1 and 2, and we interposed a competing cognitive load between trials in Experiment 3. These manipulations slowed the acquisition of reversals in expert subjects, but not naïve subjects, indicating that serial reversal learning expertise is facilitated by a shift in the control of choice from passively acquired habit to actively maintained working memory.

Introduction

The diversity of functionally distinct memory systems likely evolved in response to divergent selection pressures that vary with environmental conditions and across development (Sherry and Schacter 1987). A habit memory system appears early in development, facilitates the gradual learning of an indefinite number of habits and skills that are stable over long intervals and is not accessible to cognitive monitoring and control (Bachevalier 1990; Gasbarri et al. 2014). In contrast, working memory facilitates the rapid acquisition and relatively brief retention of a limited amount of information (Baddeley 1992; Shettleworth 2010, Chapter 7). In humans, and possibly in some non-humans, the contents of working memory are actively maintained and accessible to cognitive monitoring and control (Baddeley 2003; Basile et al. 2015; Basile and Hampton 2013b; Cowan 2008; Hampton 2001; Tu and Hampton 2014).

Independent memory systems may act in parallel to regulate behavior (Hay and Jacoby 1996; McDonald and White 1993; Poldrack and Packard 2003; Tu and Hampton 2013; Tu et al. 2011). Dissociation of memory systems is established when altering the contributions one memory system makes to behavior leaves the contribution of another system relatively intact. For example, Tu and Hampton (2013) studied the relative contributions of habits and “one-trial memories,” the latter being a type of memory of indeterminate status: relatively short term compared to habit, but not clearly working memory. These authors found that these two types of memory can be controlled independently. Decreasing the likelihood of reward following a stimulus reduced the control of behavior by habit memory, but did not affect control by “one-trial memory” in rhesus monkeys. Lengthening the duration of retention intervals decreased control by one-trial memory, but left control by habit intact (Tu and Hampton 2013). Such behavioral dissociations are often related to neurobiological dissociations. In the above example, one-trial memory, but not habit, depended critically on the perirhinal cortex in rhesus monkeys (Tu et al. 2011). The amount of experience with a given task can also dissociate the contributions of distinct memory systems. Rats trained to retrieve food rewards on a plus maze initially used allocentric spatial cues to locate the food. After repeatedly starting from the same location and turning in a particular direction to retrieve the food, the behavior of the rats came under the control of egocentric cues. Although the control of behavior switched from predominantly allocentric to predominantly egocentric cues with training, both types of memory remained present and capable of controlling behavior. Inactivation of the dorsal striatum, which is critical for the control of behavior by egocentric cues, resulted in a return of control by allocentric cues (Packard and McGaugh 1996).

Shifts in the relative control of behavior by distinct memory systems may also occur during the formation of learning sets. In Harlow’s seminal Formation of Learning Sets (1949), the term learning-to-learn was used to describe a shift from gradual to rapid acquisition of discrimination learning tasks as rhesus monkeys completed successive discriminations. Neuroimaging of non-human primates indicates that the formation of a learning set co-occurs with a shift from striatal to lateral prefrontal cortical activity (Yokoyama et al. 2005). Given that shifts in dominance among multiple memory systems contribute to the development of a learning set, it is likely that such shifts also contribute to other learning-to-learn tasks, such as serial reversal learning.

In serial reversal learning, subjects are repeatedly presented with discrimination trials containing the same two objects or images. At any given time, only one of the two stimuli is rewarded when selected. Within every reversal, the positive stimulus (S+) is rewarded if selected and will remain positive until a predetermined performance criterion is met. Upon reaching criterion, the contingencies of reinforcement reverse (i.e., S+ becomes S− and S− becomes S+). Subjects are then required to meet criterion by selecting the formerly non-reinforced stimulus. This process may be repeated for many reversals.

Reversal learning improves with reversal experience. Naïve subjects reverse gradually, making many perseverative choices of the stimulus that was rewarded before the most recent reversal (e.g., Mackintosh et al. 1968). After experiencing many reversals, naïve reversers become experts and show flexible, win-stay, lose-shift responding, sometimes making only a single error before reliably selecting the previously incorrect stimulus (Bessemer and Stollnitz 1971; Shettleworth 2010, Chapter 6). The appearance of the win-stay, lose-shift response pattern occurs in the absence of any change in external task demands, suggesting that the development of expertise is facilitated by a shift in the relative control of choice behavior by distinct memory systems.
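The win-stay, lose-shift pattern described above can be written as a one-line decision rule. The following is a minimal illustrative sketch, not the authors' analysis code; the function names and the toy reward schedule are hypothetical. It shows that an agent following this rule makes exactly one error per reversal, on the first trial after the contingencies change.

```python
def win_stay_lose_shift(last_choice, last_rewarded):
    """Repeat the last choice after a reward; switch to the other stimulus otherwise."""
    return last_choice if last_rewarded else 1 - last_choice


def run_wsls(s_plus_schedule, first_choice=0):
    """Simulate the rule over a schedule where s_plus_schedule[i] is the rewarded stimulus (0 or 1)."""
    choice = first_choice
    outcomes = []
    for s_plus in s_plus_schedule:
        rewarded = choice == s_plus
        outcomes.append(rewarded)
        choice = win_stay_lose_shift(choice, rewarded)
    return outcomes


# Hypothetical schedule: stimulus 0 is rewarded, then two reversals occur.
schedule = [0] * 5 + [1] * 5 + [0] * 5
outcomes = run_wsls(schedule)
error_trials = [i for i, ok in enumerate(outcomes) if not ok]
```

In this toy schedule the only errors fall on the two trials where the contingencies have just reversed, which is the single-error signature attributed to expert reversers.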

One account of performance improvements in serial reversal learning is that responding becomes less perseverative as proactive interference accumulates (Mackintosh et al. 1968). After both stimuli have been extensively reinforced in successive reversals, the difference in associative strength between them may be only modestly affected by current reinforcement. It has therefore been rather counterintuitively argued that the resulting difficulty in discriminating the associative value of the two stimuli reduces perseveration, allowing subjects to respond more flexibly at the onset of a reversal (Clayton 1966; Gonzalez et al. 1967; Kraemer and Golding 1997; Strang and Sherry 2014). However, an inherent issue with a proactive interference explanation is that it can only account for reversal improvement when reversals (i.e., the exchange from S₁+/S₂− to S₁−/S₂+) are separated by long intervals. If instead reversals occur in rapid succession, such that the inter-reversal interval is no different from the inter-trial interval, the contributions by proactive interference will likely be outweighed by recency of the last rewarded choice, and thus, preference for the previous S+ will persist into the new reversal (Kraemer and Golding 1997; Mackintosh et al. 1968). Given that performance improvements occur even when reversals are experienced in rapid succession, it seems likely that alternative mechanisms also contribute to the development of expertise in serial reversal learning.

The development of serial reversal learning expertise may be facilitated by a shift in the relative control of choice by working memory and habit. We hypothesize that choice in naïve reversers is under greater relative control by a habit system, while choice in expert reversers is under greater relative control by working memory. Control of choice by habit would explain the relatively gradual reversing, marked by perseveration, observed in naïve reversers. Control of choice by working memory would account for the flexible, rapid reversing when these reversers become experts. If habit controls choice in naïve reversers and working memory controls choice in expert reversers, then manipulations that attenuate working memory should impair reversal learning in expert but not naïve reversers.

Experiment 1

We tested whether the development of serial reversal learning expertise in rhesus monkeys is facilitated by an increase in the relative control of choice by working memory rather than habit. The contents of working memory are typically available for short periods of time while habits remain intact over long intervals (Baddeley 2000; Grant and Roberts 1973; Mishkin et al. 1984, Chapter 2). This difference in availability after the passage of time allowed us to assess the relative contributions of working memory and habit to choice by manipulating the interval between successive discrimination trials. In successive discriminations, the inter-trial interval (ITI) is the interval over which information from the last trial must be maintained to inform choice on the current trial. Working memory for the outcome of the last discrimination should be substantially attenuated after long ITIs, whereas habit resulting from previous trials should persist. We compared accuracy on discrimination trials following short 1-s and long 30-s ITIs across many reversals to determine whether the extent to which habit and working memory controlled choice changed as monkeys changed from naïve reversers at the beginning of training to expert reversers by the end of training. If choice in naïve reversers is controlled primarily by habit, there should be no difference in discrimination performance following 1- and 30-s ITIs. To the extent that serial reversal expertise is under the control of working memory, discrimination performance should be significantly better following 1-s ITIs than 30-s ITIs.

Methods

Subjects and apparatus

Six adult, male rhesus monkeys (Macaca mulatta; mean age = 9.16 years) were used. Monkeys received full daily food rations and ad libitum access to water. Two of the six monkeys were pair-housed at the time of this study. The other four monkeys were individually housed, in line with veterinary guidance, but had visual contact with other monkeys. Testing occurred for up to seven hours a day, six days a week. Monkeys were tested in their home cages using portable testing rigs. Each testing rig was equipped with a 15-inch color LCD touch-sensitive screen (Elo TouchSystems, Menlo Park, CA), running at a resolution of 1024 × 768 pixels, and two automatic food dispensers (Med Associates, Inc., St. Albans, VT), which delivered nutritionally balanced primate pellets (Bio-Serv, Frenchtown, NJ). Tests were controlled by a personal computer running a custom program written in Presentation (Neurobehavioral Systems, Albany, CA). All six subjects had previous experience with touch screen tasks, including image discrimination; however, none of the six had previous experience with reversal learning. Pair-housed monkeys were separated during testing by a panel that allowed limited visual, auditory and tactile contact but prevented access to the other monkey’s computer screen.

Procedure

Figure 1 depicts the sequence of events in a trial. Subjects initiated each trial by touching a 100 × 100 pixel green square twice (FR 2). Two images (350 × 350 pixels) appeared, each placed 250 pixels left or right of the center of the touch screen. The left–right position of the two images was counterbalanced and pseudo-randomly determined such that a stimulus could appear on the same side no more than 4 times in a row. The same two images were used throughout all reversals. Monkeys selected one of the two images by touching it twice (FR 2). Touching the S+ cleared the screen and produced a positive sound and a food pellet. Touching the S− cleared the screen and produced a negative sound. Either a 1- or 30-s ITI ensued. The 1- and 30-s ITIs alternated, regardless of trial outcome.
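The stimulus placement and ITI rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the original Presentation program: it assumes counterbalancing means equal left and right placements within a block of 16 trials, it enforces the run limit only within a block, and every name in it is hypothetical.

```python
import random
from itertools import groupby


def max_run(seq):
    """Length of the longest run of identical consecutive items."""
    return max(len(list(group)) for _, group in groupby(seq))


def side_sequence(n_trials=16, max_same=4, rng=None):
    """Counterbalanced left/right positions for one stimulus, with no run longer than max_same.

    Assumes n_trials is even so that left and right occur equally often.
    """
    rng = rng or random.Random()
    sides = ["L", "R"] * (n_trials // 2)
    while True:  # rejection sampling: reshuffle until the run constraint holds
        rng.shuffle(sides)
        if max_run(sides) <= max_same:
            return list(sides)


def iti_sequence(n_trials, short=1, long=30):
    """Strictly alternating ITIs in seconds, independent of trial outcome."""
    return [short if i % 2 == 0 else long for i in range(n_trials)]


sides = side_sequence(rng=random.Random(0))
itis = iti_sequence(16)
```

Rejection sampling is used here only because it is simple to read; the source does not say how the pseudo-random order was generated.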

The same image was the S+ until monkeys reached a performance criterion of 15 out of 16 correct discrimination trials. This criterion was assessed once every block of 16 trials. If criterion was met, a reversal occurred; if criterion was not met, 16 additional trials were administered continuing the same S+/S− arrangement. The first trial in which the reversed contingencies were in place, Trial 0, was not included in the performance criterion because the monkeys could not know that the reversal had occurred until they received feedback on this trial. Thus, every reversal contained at least 17 trials: Trial 0, followed by blocks of 16 trials, 15 of which had to be correct to trigger a reversal. The odd number of trials ensured that Trial 1 of each reversal followed a 1- or 30-s ITI equally often. Testing continued until monkeys had completed a total of 90 reversals. Any reversal that had not been completed by the end of a testing day was administered at the start of the next testing day. Thus, both the reversal number and S+/S− configuration carried over across days; however, any progress toward reaching reversal criterion did not. Trials from incomplete reversals were not included in the data analysis.
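The criterion rule and the odd-length argument above can be checked with a short simulation. This is a hedged sketch with hypothetical names: it ignores the day boundaries described above and assumes the ITI alternation runs continuously across reversals.

```python
def run_reversal(is_correct, block_size=16, criterion=15):
    """Return the number of trials in one reversal.

    is_correct(t) gives the outcome of trial t within the reversal.
    Trial 0 is run but excluded from the criterion, which is checked once per block.
    """
    is_correct(0)  # Trial 0: the subject cannot yet know the contingencies reversed
    n_trials = 1
    while True:
        block = [is_correct(n_trials + i) for i in range(block_size)]
        n_trials += block_size
        if sum(block) >= criterion:
            return n_trials


def iti_before_trial_1(reversal_lengths, short=1, long=30):
    """ITI (s) that follows Trial 0 of each reversal, given strictly alternating ITIs."""
    itis = []
    start = 0  # index of Trial 0 counted across all reversals
    for length in reversal_lengths:
        itis.append(short if start % 2 == 0 else long)
        start += length
    return itis


def expert(t):
    """Hypothetical learner that errs only on Trial 0."""
    return t != 0


def slow(t):
    """Hypothetical learner that fails the whole first block."""
    return t > 16


lengths = [run_reversal(expert), run_reversal(slow), run_reversal(expert), run_reversal(expert)]
preceding_itis = iti_before_trial_1(lengths)
```

Because every reversal contains 1 + 16k trials, an odd number, the position of Trial 0 in the alternating ITI sequence flips from one reversal to the next, so Trial 1 follows the short and long ITI in turn.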

We calculated the proportion of correct discrimination trials following 1- and 30-s ITIs to assess the relative control of choice by habit and working memory. Proportion correct scores were arcsine transformed before analysis (Aron and Aron 1999, Chapter 14) to better approximate normality. We hypothesized that if choice behavior in naïve reversers was largely controlled by habit, with little contribution of working memory, discrimination performance would not differ following 1- or 30-s ITIs. Complementarily, we hypothesized that if choice behavior in expert reversers was under greater relative control by working memory, accuracy would be higher on discrimination trials following 1-s ITIs compared to 30-s ITIs.
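The text does not state which form of the arcsine transformation was applied. A common choice for proportions is the arcsine square-root transform, sketched below as an assumption; some textbooks use twice this value instead. The function name is hypothetical.

```python
import math


def arcsine_transform(p):
    """Arcsine square-root transform of a proportion in [0, 1], returned in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(p))


# Proportions near 0 or 1 are stretched relative to those near 0.5.
transformed = [arcsine_transform(p) for p in (0.5, 0.9, 1.0)]
```

The transform stretches the ends of the proportion scale, which is why it is applied before parametric tests on accuracy data that sit near ceiling.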

Results and discussion

Monkeys were scheduled to complete 90 reversals in Experiment 1. However, on the 4th day of testing, 4 of the 6 monkeys were accidentally tested with only 1-s ITIs, rather than alternating long and short ITIs. These four monkeys performed one day of reversals under this erroneous condition, averaging 56.5 reversals. Three of these four monkeys had already completed at least 35 reversals before receiving the incorrect version of the program. The fourth monkey had completed only 8 reversals. Because analysis required 10 reversals under the alternating ITI conditions, this monkey was not included in the analysis of Experiment 1. All monkeys were given an additional 30 reversals of the correct testing with alternating ITIs. As a result, the five monkeys completed an average of 5.2 testing days and 120.8 reversals. Thus, despite the experimental error, we acquired a block of at least 10 reversals of data from 5 monkeys when they were naïve reversers and another block of 10 reversals after we expected them to be expert.

We compared the first 10 and the last 10 reversals performed by each monkey to determine whether discrimination performance improved across the intervening reversals. Accuracy was assessed by averaging the number of errors committed in Trials 1–16 of each of the reversals. Only Trials 1–16 were used in the analysis because monkeys could, and sometimes did, reach criterion in the first block of 16 trials in a reversal. Monkeys made significantly more errors in the first 10 reversals than the last 10 reversals, suggesting that they had developed serial reversal expertise over the course of training (first 10: M = 13.54, SD = 7.278; last 10: M = 4.66, SD = 3.274).

Figure 2 shows that performance following 1- or 30-s ITIs did not differ early in the reversal task; however, after experiencing many reversals, monkeys performed significantly better following 1-s ITIs than following 30-s ITIs. To determine whether the control of choice by working memory changed as a function of reversal experience, we compared the proportion of correct discrimination trials preceded by 1- and 30-s ITIs during the first and last 10 reversals. The difference in accuracy following 1- and 30-s intervals was significantly greater during the last 10 reversals, and there was a significant difference in accuracy between the 1- and 30-s ITI types (two-factor repeated measures ANOVA; reversal experience: F(1,4) = 39.9, P = 0.003; ITI type: F(1,4) = 101.6, P = 0.001; interaction: F(1,4) = 22.0, P = 0.009). Follow-up analyses confirmed that accuracy was significantly higher following 1-s than 30-s ITIs in the last 10 reversals, while this difference was not present in the first 10 reversals (paired samples t tests; first 10 reversals: t(4) = −0.801, P = 0.468; last 10 reversals: t(4) = 11.706, P < 0.001). Follow-up analyses also showed that discrimination following 30-s ITIs was significantly more accurate in the last 10 reversals than in the first 10 reversals (paired samples t test; reversal experience: t(4) = −3.834, P = 0.019). Interestingly, as is shown in Fig. 3, accuracy following both 1- and 30-s ITI durations improved equally across the first 10 reversals. The absence of a difference in accuracy between the two ITI types in the first 10 reversals suggests that monkeys were initially aided by a process other than working memory.
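The P values reported above can be recomputed from the test statistics alone. With five monkeys, the paired t tests have 4 degrees of freedom, and an F statistic with 1 and 4 degrees of freedom is the square of such a t. The sketch below uses the closed-form cumulative distribution of Student's t for 4 degrees of freedom and assumes two-tailed tests; it is a cross-check written for illustration, not the authors' analysis, and the function names are hypothetical.

```python
import math


def t_cdf_df4(t):
    """Cumulative distribution function of Student's t with 4 degrees of freedom."""
    x = t / math.sqrt(1.0 + t * t / 4.0)
    return 0.5 + 0.375 * x * (1.0 - x * x / 12.0)


def two_tailed_p_df4(t):
    """Two-tailed P value for a t statistic with 4 degrees of freedom."""
    return 2.0 * (1.0 - t_cdf_df4(abs(t)))


def p_from_f_1_4(f):
    """P value for an F statistic with (1, 4) degrees of freedom, via t = sqrt(F)."""
    return two_tailed_p_df4(math.sqrt(f))


p_first_10 = two_tailed_p_df4(-0.801)   # reported as P = 0.468
p_last_10 = two_tailed_p_df4(11.706)    # reported as P < 0.001
p_experience = p_from_f_1_4(39.9)       # reported as P = 0.003
p_interaction = p_from_f_1_4(22.0)      # reported as P = 0.009
```

Under these assumptions the recomputed values agree with the reported ones to the precision given in the text.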

We replicated the finding that animals become more proficient at reversing with more experience (Dufort et al. 1954; Mackintosh et al. 1968; Ploog and Williams 2010). Longer ITIs impaired learning once monkeys became expert reversers but did not affect them while they were still naïve reversers, suggesting that working memory is critical for reversal expertise. We propose that working memory increases reversal efficiency because it allows subjects to update their representation of the current S+/S− based on feedback from the outcome of the previous trial. Thus, the difference between the contributions by a working memory system, relative to a habit system, is likely to be most pronounced immediately after a reversal has occurred.

To determine whether working memory was especially critical early in reversals in expert reversers, we compared accuracy following 1- and 30-s ITIs early and late in reversals. We averaged accuracy on Trials 1–4 under each ITI condition and on Trials 13–16 under each ITI condition. We determined these scores for the first 10 reversals, while monkeys were naïve, and for the last 10 reversals, when monkeys were expert. If working memory was critical for rapid reversal in expert reversers, but not in naïve reversers, we should find that monkeys were especially accurate early in reversals after they were expert. This pattern, evident in Fig. 4, is supported by a three-way interaction between level of expertise, phase of reversal and type of ITI (three-factor repeated measures ANOVA; reversal experience: F(1,4) = 38.280, P = 0.003; early versus late: F(1,4) = 38.238, P = 0.003; ITI type: F(1,4) = 81.304, P = 0.001; reversal experience × early versus late: F(1,4) = 28.684, P = 0.006; reversal experience × ITI type: F(1,4) = 28.362, P = 0.006; early versus late × ITI type: F(1,4) = 0.035, P = 0.860; reversal experience × early versus late × ITI type: F(1,4) = 12.543, P = 0.024). This result shows that performance increased most rapidly early in reversals after monkeys became expert at reversing, consistent with a strong influence of working memory early in reversals after expertise was established.

Our findings suggest that control of choice by habit and working memory differed between naïve and expert reversers. Choice behavior in naïve reversers appeared to be under greater relative control of a habit system. When the monkeys were naïve, they were both less accurate than when they were experts and unaffected by ITI duration. When monkeys became expert, choice behavior appeared to be under greater relative control by working memory.

Our findings are consistent with those from pigeons, in which performance on serial reversal learning tasks is significantly worse when reversals contain only long ITIs than when they contain only short ITIs (Ploog and Williams 2010; Williams 1976). Similar ITI-sensitive performance has been reported in rhesus monkeys performing an object discrimination learning set task (Deets et al. 1970). Together these findings suggest that reversal expertise is contingent on working memory for the outcome of the previous trial.

An alternative interpretation is that subjects are disproportionately more likely to be affected by proactive interference after long than short ITIs. According to this account, proactive interference causes the relative validity of memories A+/B− and B+/A− to become equal, allowing for greater flexibility after a long delay interval (Clayton 1966; Kraemer and Golding 1997; Mackintosh et al. 1968). The proactive interference account thus also coheres well with previous research. Specifically, subjects exhibit less perseveration at the onset of a new reversal as reversal experience accrues, so long as consecutive reversals are separated by long intervals. Furthermore, overall performance on serial reversal learning tasks is worse when all trials are separated by long ITIs compared to short ITIs. Because the associative strength of both stimulus representations becomes similar over long intervals between trials or reversals, one might postulate that our operational definition of expertise in rhesus monkeys can be accounted for by an accumulation of proactive interference.

Our results from Experiment 1 do not provide enough evidence to conclude whether working memory or proactive interference is responsible for the development of expertise in rhesus monkeys. However, these accounts can be distinguished experimentally because proactive interference accounts depend on experience with specific stimuli, while the working memory account posits a general shift in information processing. If a monkey has learned to actively maintain the previous trial in mind, it should be able to continue using this strategy if given a new pair of images to discriminate. By contrast, proactive interference depends on experience with specific stimuli, such that introducing a new image pair should eliminate expertise until proactive interference accrues again over multiple reversals. In our next experiment, therefore, we contrasted the working memory and proactive interference accounts of reversal expertise by administering the same serial reversal task with a new pair of images. We hypothesize that if monkeys developed expertise through a generalizable shift in control of choice by working memory, then their performance will continue to be affected by ITI duration across all reversals with the new images. If instead monkeys developed expertise through an accumulation of proactive interference, we hypothesize that performance will be affected by ITI duration only after they have experienced numerous reversals with the new images.

Experiment 2

In Experiment 2, we tested the same rhesus monkeys on the same serial reversal learning task, using two new images. An alternative account for serial reversal learning improvement suggests that responding becomes more flexible as proactive interference accumulates (Clayton 1966; Gonzalez et al. 1967). By using two new images, we eliminate proactive interference from Experiment 1. The working memory account of expertise predicts that the difference between short and long ITI trials should appear immediately, within the first 10 reversals with the new images, while the proactive interference account predicts that expertise will emerge gradually as monkeys experience reversals with the new images.