Organisms, human and nonhuman alike, must make a multitude of second-by-second decisions about how to adaptively respond to the environment in order to optimize gains and minimize losses associated with their behaviors. These decisions are colored by their innate drives, reinforcement history, and cognitive expectations. The selection of appropriate action involves a balance of processes that permit hasty action in the absence of deliberative consideration (i.e., impulsive responses and habits) and those that involve a slow set of cognitions necessary for optimized response selection in a complex, conditional and/or changing world. By virtue of repetition and embedded in a long-term reinforcement rule, there are certain behaviors (e.g., stimulus–response habits) that require little cognitive effort to be elicited and/or emitted (Balleine and O’Doherty 2010). For controlled decision-making processes to be optimized, individuals must be able to exert inhibitory control over these rapid, automatic response systems, when appropriate (Roberts and Wallis 2000; Aron et al. 2004; Eagle et al. 2008).

Our understanding of the role of inhibitory control in human decision-making processes has been greatly advanced through the study of cognitively guided behavior of laboratory animals. The power of this experimental approach depends heavily on our ability to accurately measure a latent process of interest using a behavioral task (i.e., construct validity) as well as on whether we can use an array of tasks to measure a unitary construct in humans and laboratory animals (i.e., predictive or translational validity). For relatively simple behavioral phenomena (e.g., fear conditioning), it is easy to see how these criteria can be satisfied, but for processes like decision-making and its associated executive functions, the challenge is more substantial. Fortunately, studies of a phenomenon called reversal learning, in a variety of species, have been critical both in uncovering the brain circuitry that supports flexible choices during decision-making and in identifying relevant neurochemical and genomic determinants.

Discrimination reversal learning involves repeated pairing of an action (digging in a bowl, physically interacting with a lever or touchscreen, or displacing an object) with an outcome (e.g. provision of a food reward). In these contexts, subjects can learn about reward contingencies through the sensory properties of cues that predict reward availability and the actions required to procure that reward. Operationally, the subject first learns that discriminative stimuli carry information about whether a particular response instrumentally generates a reward (e.g., dig in a bowl scented with aroma A but not B; press the left, but not right, lever; or touch visual stimulus A on a screen, but not B to obtain food). Over the course of training, subjects become proficient at providing discriminated behavior, consistent with the associative rules. The rules learned can be deterministic or probabilistic.

Typically, after reaching a learning criterion for accuracy on this discrimination problem, the reversal phase is implemented, and the reward contingencies are reversed. At reversal, the trained response no longer results in reward, though it remains at least temporarily dominant (prepotent) because of the initial training history. For this reason, reversal learning (unlike the initial stage of discrimination learning or acquisition) emphasizes the need for the subject to effortfully withhold the initially trained response and, instead, emit those responses it previously learned to be useless. Reversal learning is, therefore, thought to measure flexibility of response, referred to in the literature using an array of terminology: “cognitive flexibility,” “behavioral flexibility,” “cognitive control,” “inhibitory control,” “impulse control,” “response inhibition,” and “behavioral inhibition,” to name a few. The ability to adapt to changes in reward contingency during reversal learning relies on a circumscribed neural circuitry (Fineberg et al. 2010; Chudasama 2011), an orchestrated balance of neurotransmitters (Robbins 2005; Dalley et al. 2008; Flagel et al. 2011) and genes (Laughlin et al. 2011).
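The two-phase procedure described above (train to criterion, then swap the contingencies) can be illustrated with a short simulation. This is only a sketch under stated assumptions: the simulated subject is a simple delta-rule learner with occasional random exploration, and the function name `simulate_reversal`, the learning rate, the exploration rate, and the criterion of 85% correct over 20 trials are all hypothetical choices made for illustration, not values drawn from any study cited here.

```python
import random

def simulate_reversal(alpha=0.2, epsilon=0.1, criterion=0.85, window=20,
                      max_trials=1000, seed=0):
    """Two-choice discrimination followed by one reversal (deterministic reward).

    All parameter values are illustrative assumptions.
    Returns the number of trials needed to reach criterion in each phase.
    """
    rng = random.Random(seed)
    values = {"A": 0.0, "B": 0.0}   # learned value of choosing each stimulus
    rewarded = "A"                  # stimulus A is correct during acquisition
    phase = "acquisition"
    recent = []                     # sliding window of correct (1) / incorrect (0)
    trials_to_criterion = {}
    count = 0
    for _ in range(max_trials):
        # Mostly choose the higher-valued stimulus; explore at random otherwise.
        if rng.random() < epsilon or values["A"] == values["B"]:
            choice = rng.choice(["A", "B"])
        else:
            choice = max(values, key=values.get)
        reward = 1.0 if choice == rewarded else 0.0
        # Delta-rule update of the chosen stimulus only.
        values[choice] += alpha * (reward - values[choice])
        count += 1
        recent = (recent + [int(reward)])[-window:]
        if len(recent) == window and sum(recent) / window >= criterion:
            trials_to_criterion[phase] = count
            if phase == "reversal":
                break
            # Criterion met: swap the contingencies and begin the reversal phase.
            rewarded, phase, recent, count = "B", "reversal", [], 0
    return trials_to_criterion

print(simulate_reversal())
```

At the moment of reversal the learner still assigns a high value to the previously rewarded stimulus, so its early reversal choices are dominated by the old response. This is the prepotent tendency that the subject must overcome.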

The study of individual differences in flexible responding during reversal learning is potentially relevant to our understanding of normal human behavior and temperament. For example, individual differences in the propensity to make quick, poorly considered choices may underlie dimensions of impulsive temperament and personality. Evidence suggests that inflexible responding in reversal learning is related to impulsivity (Franken et al. 2008; Crews and Boettiger 2009; Romer et al. 2009; Fineberg et al. 2010). This poses the interesting possibility of employing the reversal learning paradigm to quantify impulsive behavior across species and to index biological vulnerability for disorders characterized by extreme impulsivity, such as attention deficit hyperactivity disorder or ADHD (Itami and Uno 2002), impulse control disorders, and the propensity for the initiation of illicit substance use (Brewer and Potenza 2008; Winstanley et al. 2010).

Another conceptualization of reversal learning emphasizes the notion that difficulty disengaging from ongoing behavior after a contingency shift reflects a compulsive or habitual response tendency. This suggests that the task measures a set of processes related to automatized behavior, which may also be germane to psychiatric disorders like addiction. In essence, drug dependence involves the compulsion to consume an illicit substance and a loss of control over intake, despite negative consequences (DSM-IV, American Psychiatric Association 1994). Therefore, reversal learning abilities may be informative about both the impulsive and the compulsive aspects of a variety of forms of psychopathology.

In this review, we first describe common variants of reversal learning and identify shared and unique characteristics relative to other commonly used tasks in animal behavioral neuroscience research to measure impulsive or compulsive behavior, specifically instrumental extinction and go/no-go tasks. We also review the neural circuitry of reversal learning and its supporting neuropharmacology and genetic determinants. We then relate reversal learning to the impulsive and compulsive aspects of addiction.

Discrimination reversal learning

The literature on the neuroanatomy and neuropharmacology of reversal learning is vast, particularly considering there are many different task variants and implementations. Because of an interest in using these phenotypes to better understand impulsive and compulsive reward seeking (addictions), we restrict this review to instrumental, reward-related reversal learning as opposed to tasks that involve reversal of Pavlovian rules or tasks that primarily involve aversively motivated learning. Additionally, we limit our discussion to tasks that emphasize updating of behavior in response to changes in reinforcement contingencies as opposed to shifts in attentional focus or strategy.

There are many kinds of reversal tasks studied, each varying according to the sensory modality of the discrimination, the concurrent or sequential nature of stimulus presentation and the type of response made. Examples include reversal of: concurrent or sequential odor discriminations (McAlonan and Brown 2003; Schoenbaum et al. 2003), spatial discriminations in maze contexts (Jentsch and Taylor 2001; Bannerman et al. 2003), operant discriminations (El-Ghundi et al. 2003; Boulougouris et al. 2007; Laughlin et al. 2011) and visual discriminations. The latter frequently employ touchscreen response methodology (such as that shown in Fig. 1a) and are emergent paradigms in both mice (Bussey et al. 2001; Izquierdo et al. 2006; Brigman et al. 2010; Barkus et al. 2011; Bissonette and Powell 2011) and rats (Chudasama and Robbins 2003; Izquierdo et al. 2010), being similar to methods used in monkeys (Dias et al. 1996; Rudebeck et al. 2008) and humans (Robbins et al. 1998). Most discrimination reversal studies administer two stimuli (or objects) that the subject is required to discriminate (Fig. 1b). Often, post hoc analyses are used to assess the kinds of errors committed during reversal learning: stages of learning (below chance, at chance, and above chance performance; Jones and Mishkin 1972), trial-by-trial rewarded and unrewarded choices (Rudebeck and Murray 2008) as well as perseveration indices representing consecutive or “correction” errors over the total number of errors committed (Chudasama and Robbins 2003; Izquierdo et al. 2006). Other methods have incorporated the use of three or more concurrent items to dissect error type within the task itself (Lee et al. 2007; Seu et al. 2009).
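The post hoc error measures mentioned above can be made concrete with a small example. The sketch below assumes that each reversal trial has been scored as correct or incorrect. It computes a perseveration index as consecutive errors over total errors, and labels a block of trials as below, at, or above chance. The function names and the chance margin of 0.15 are hypothetical; published studies define consecutive errors and learning stages with their own criteria.

```python
def perseveration_index(correct):
    """Proportion of errors that immediately follow another error.

    `correct` is an ordered list of booleans, one per reversal trial.
    """
    total_errors = sum(1 for c in correct if not c)
    if total_errors == 0:
        return 0.0
    # An error counts as "consecutive" when the preceding trial was also an error.
    consecutive = sum(
        1 for prev, cur in zip(correct, correct[1:]) if not prev and not cur
    )
    return consecutive / total_errors

def classify_block(correct, chance=0.5, margin=0.15):
    """Label a block of trials relative to chance (margin is an assumed value)."""
    p = sum(correct) / len(correct)
    if p < chance - margin:
        return "below chance"
    if p > chance + margin:
        return "above chance"
    return "at chance"

trials = [False, False, False, True, False, True]
print(perseveration_index(trials))                # 2 consecutive errors / 4 errors
print(classify_block([False] * 8 + [True] * 2))   # early reversal block
```

A subject that keeps choosing the previously rewarded stimulus produces long runs of errors, and therefore a high index and many "below chance" blocks. A subject that has abandoned the old response but not yet acquired the new one produces scattered errors and "at chance" blocks.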

Fig. 1

An example of the apparatus and stimuli used in touchscreen-based operant methods for visual discrimination reversal learning. a An operant chamber is modified to accommodate a touchscreen (this setup is used for testing mice). Animals are required to nose poke the touchscreen on one end of the chamber and procure a pellet on the opposite side of the chamber. b Equiluminant stimuli used for discrimination and reversal learning. Adapted from Izquierdo et al. 2006 Behav Brain Res

As noted above, the relationships between actions and outcomes in reversal learning tasks can be either deterministic (fully predictive) or probabilistic. In most animal studies, deterministic rules are used, though probabilistic rules are commonly used in human subjects in order to slow down the rate of initial and reversal learning. This represents an important dimension of discrepancy, though recent studies in rats have begun to use probabilistic rules (Bari et al. 2010) and other experiments in human subjects have used deterministic rules (Ghahremani et al. 2010), helping to bridge the gap.
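The difference between deterministic and probabilistic rules can be shown in a few lines. In this sketch, the function `reward_delivered` and the 0.8/0.2 reward probabilities are assumptions chosen for illustration; they are not taken from the studies cited above.

```python
import random

def reward_delivered(choice, correct_choice, p_correct=1.0, p_incorrect=0.0,
                     rng=random):
    """Return True if a reward follows this choice.

    The defaults (1.0 / 0.0) give a deterministic rule; intermediate
    probabilities give a probabilistic rule. Values are illustrative.
    """
    p = p_correct if choice == correct_choice else p_incorrect
    return rng.random() < p

rng = random.Random(1)
# Deterministic rule: the correct choice is always rewarded.
print(reward_delivered("A", "A", rng=rng))
# Probabilistic rule (assumed 80/20): the correct choice is usually rewarded,
# and the incorrect choice is occasionally rewarded.
print(sum(reward_delivered("A", "A", 0.8, 0.2, rng) for _ in range(1000)))
```

Under a probabilistic rule, a single unrewarded trial is ambiguous because it may reflect either noise or a true contingency shift. This is one reason such rules slow both initial and reversal learning.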

A final factor of relevance in considering implementations of reversal learning tasks relates to whether subjects learn one reversal or multiple (serial) reversals. If administered serially, the task encourages an automatized switching tendency, possible rule learning, acquisition of a reversal learning “set” as well as prospective planning for anticipated reward contingencies (Murray and Gaffan 2006). These task details theoretically correspond to different underlying neural mechanisms.

Comparison with related tasks

Reversal learning procedures are just one of an array of tasks designed to measure aspects of impulsive actions and choices. Tests of impulsive choice, like delay and effort discounting, juxtapose the possibility of procuring an immediate or low-effort reward of small magnitude with that of a delayed or effortful, albeit larger, reward. When compared to reversal learning, the effects of lesions and pharmacological manipulations on these tasks appear to vary more frequently with the methods used. Due to space limitations, we restrict our comparison of reversal learning to more qualitatively similar tests of inhibitory control (that also have the capability of measuring impulsive action): namely, instrumental extinction and go/no-go.

Reversal learning, tests of instrumental extinction, and go/no-go tasks all implement dynamic adjustments in the occurrence of reward and measure functions related to response inhibition or inhibitory control. During reversal learning, subjects must suppress one response while engaging actively in another to obtain reward; therefore, there remains a strong motivational impulse to respond postreversal. In extinction, however, the subject may simply inhibit the conditional response altogether, reflecting the importance of conserving energy when actions no longer result in reward. Therefore, inhibition of response in reversal and extinction exists in different motivational contexts. Reversal learning may reflect selective response inhibition as opposed to more general behavioral inhibition mechanisms.

In the go/no-go task, the subject is also required to inhibit a response in the presence of a discriminative stimulus. These procedures are conditional discriminations in the sense that the subject must respond to “go” cues (e.g., nose poke an aperture in the presence of a green light) and inhibit response to the “no-go” cues (e.g., withhold nose poke in the presence of a red light). Usually these two types of trials are presented randomly throughout the testing session, with go cues being more frequent in order to establish a strong prepotent tendency to respond. One main difference between go/no-go and reversal learning is the presentation of the discriminative stimuli or cues: in the go/no-go task they are presented serially whereas in reversal learning they are usually presented concurrently. Consequently, the “no-go” trials are more similar to instrumental extinction trials, supporting general behavioral or response inhibition mechanisms rather than eliciting a more selective reallocation and control of response. Additionally, in the go/no-go task, withholding a response on “no-go” trials may be rewarded, whereas in reversal learning only an instrumental response yields a reward.
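The serial structure of a go/no-go session can be sketched as follows. The 80% proportion of go trials, the function names, and the labels "omission" and "commission" for the two error types are assumptions made for this illustration rather than details given in the text.

```python
import random

def make_gonogo_session(n_trials=100, p_go=0.8, seed=0):
    """Randomly interleave go and no-go trials, with go trials more frequent.

    p_go = 0.8 is an assumed value used only for illustration.
    """
    rng = random.Random(seed)
    return ["go" if rng.random() < p_go else "no-go" for _ in range(n_trials)]

def score_gonogo(trials, responded):
    """Score a session given the trial types and whether a response occurred."""
    go = [r for t, r in zip(trials, responded) if t == "go"]
    nogo = [r for t, r in zip(trials, responded) if t == "no-go"]
    # Omission: failing to respond on a go trial.
    omission = go.count(False) / len(go) if go else 0.0
    # Commission: responding on a no-go trial (a failure of inhibition).
    commission = nogo.count(True) / len(nogo) if nogo else 0.0
    return {"omission_rate": omission, "commission_rate": commission}

trials = ["go", "go", "no-go", "no-go"]
responded = [True, False, True, False]
print(score_gonogo(trials, responded))
```

Only one cue is present on each trial, so the subject's options are to respond or to withhold. In a concurrent reversal task, by contrast, the subject redirects its response from one stimulus to the other.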

Though we do not describe in detail the underlying neural substrates of instrumental extinction and go/no-go in this review, there is much overlap with those supporting reversal learning (described below), consonant with their shared features.

Neural circuitry of reversal learning

Reversal learning recruits frontocorticostriatothalamic loops often implicated in psychiatric and neurological disorders (Haber 2003) (Fig. 2). Studies of the effects of discrete brain lesions on reversal learning have traditionally placed greater emphasis on determining the neural substrates of reversal, and not initial discrimination, learning. Such reports of “reversal-specific” effects across many species have greatly contributed both to the predictive validity of the task itself as a measure of inhibitory control and to its discriminant validity (e.g., its utility as a diagnostic tool).

Fig. 2

Frontocorticostriatal circuitry in discrimination reversal learning. Left of the dotted line: a general ‘loop’ of connectivity. Frontal cortex and its connections with different levels of the striatum are rerouted back to itself by the thalamus. Right of the dotted line: one of several nested anatomical loops within the general loop, this one specifically implicated in discrimination reversal learning. Connections between orbitofrontal cortex (OFC) and striatum are modulated by dopamine (DA). Additional abbreviations: GPi internal segment of globus pallidus, SNr substantia nigra pars reticulata, VAmc ventralis anterior pars magnocellularis, MDmc medialis dorsalis pars magnocellularis, m medial, rm rostromedial, mdm medial dorsomedial. Adapted from Chudasama and Robbins, 2006 Biol Psychol

Orbitofrontal cortex

If we restrict our description of the circuitry to that which supports two-stimulus odor or visual discrimination reversal learning, the literature is quite consistent across species. Mice (Bissonette et al. 2008), rats (Chudasama and Robbins 2003; McAlonan and Brown 2003) and monkeys (Jones and Mishkin 1972; Dias et al. 1996; Izquierdo et al. 2004) with lesions that include the orbitofrontal cortex (OFC) exhibit normal acquisition of the initial discrimination but are impaired at reversal, often exhibiting perseverative responding to the previously rewarded stimulus (Rudebeck and Murray 2008). Conversely, medial and dorsal subregions of the frontal cortex are not critical to discrimination reversal learning in mice, rats, or monkeys (Birrell and Brown 2000; Boulougouris et al. 2007; Bissonette et al. 2008; Rudebeck et al. 2008). Instead, the medial wall of the frontal cortex in the rodent brain (to include prelimbic and infralimbic cortices) appears to be more important in task and strategy switching (Ragozzino et al. 1999a, b; Floresco et al. 2008; Ghods-Sharifi et al. 2008; Rich and Shapiro 2009), and these cortices have been described as serving more of a working memory, action-monitoring function (Ragozzino et al. 1998; Delatour and Gisquet-Verrier 1999, 2000; Gisquet-Verrier and Delatour 2006). Similar results have been obtained in monkey studies (Dias et al. 1996; Rudebeck et al. 2008) with a recent report showing that a subregion within the OFC, Walker’s area 14, may be most critical in learning to inhibit responses to a previously rewarded stimulus (Rudebeck and Murray 2011). Importantly, behavioral and neuroimaging studies in human subjects with OFC damage (Fellows and Farah 2003; Hornak et al. 2004), patients diagnosed with OCD (Remijnse et al. 2006) and healthy controls (O’Doherty et al. 2001; Ghahremani et al. 2010) all show that this region is important for accurate performance across a variety of reversal learning paradigms. In sum, the functional localization of discrimination reversal learning within the frontal cortex is well-preserved across species (Chudasama 2011).

Striatum

The idea that areas of the striatum may also mediate “frontocortical”-like processes was proposed and supported by empirical evidence as early as the late 1960s (Divac et al. 1967; Winocur and Mills 1969; Winocur and Eskes 1998). The anatomical interconnectivity (Fig. 2) may account for some functional overlap in reversal learning: lesions of the medial striatum, like damage limited to the OFC, produce perseverative responding on reversal learning in rats (Castane et al. 2010) and in marmoset monkeys (Clarke et al. 2008; Man et al. 2008). Additionally, mediodorsal thalamus, heavily interconnected with OFC, is a contributor to accurate performance during reversal learning (Chudasama et al. 2001), yet its circumscribed contribution has not been fully identified.

Medial temporal lobe structures

Severe discrimination and reversal learning impairments are found following lesions of rhinal cortex, specifically perirhinal cortex, probably due to this region’s essential role in object recognition and object identification (Murray and Richmond 2001). Damage restricted to the hippocampus also results in object reversal learning impairments in monkeys (Murray et al. 1998); yet in rodents, hippocampal recruitment appears to depend on the spatial nature of the reversal task (Morellini et al. 2010). Certain regions of the medial temporal lobe, though substantially involved in discrimination learning and in forming stimulus–reward associations, generally, are not critically important to reversal learning per se. For example, selective lesions of the amygdala do not disrupt reversal learning (Izquierdo and Murray 2007; Stalnaker et al. 2007) and even facilitate reversal learning (Stalnaker et al. 2007) and instrumental extinction (Izquierdo and Murray 2005). Thus, medial temporal lobe structures are critical to associative learning processes involving reward, yet frontocorticostriatal circuitry is most reliably implicated in support of reversal-specific learning.

Neurotransmitter mechanisms in reversal learning

An extensive literature exists on the pharmacological regulation of reversal learning abilities in birds, rodents and nonhuman and human primates. Some of this work involves the use of pharmacological agents that produce wide-ranging, nonspecific impairments in cognitive and executive function. This section describes the modulatory role of monoamine systems that, while not proposed to exert selective actions on reversal abilities, do nevertheless exert a more constrained influence on inhibitory control mechanisms, rather than working more generally on a broad array of cognitive and/or executive functions.

Serotonin

Some of the first evidence linking serotonin mechanisms to discrimination reversal came from studies of the effects of ondansetron—a centrally active 5-HT3 receptor antagonist—on object discrimination acquisition and reversal in monkeys. In both marmosets and rhesus monkeys, ondansetron improved performance, but this effect was noted for both initial acquisition and reversal learning, meaning that the pharmacological effect could not be localized to the inhibitory control processes measured in reversal (Barnes et al. 1990; Domeney et al. 1991; Arnsten et al. 1997).

Studies in which serotonin concentrations were reduced by dietary tryptophan deficiency have largely found no effect of low indoleamine transmission on reversal learning (Murphy et al. 2002; Evers et al. 2005a, b; van der Plasse and Feenstra 2008). On the other hand, toxin-mediated depletions of serotonin (which potentially produce larger magnitude reductions in transmitter) have often been reported to affect reversal learning in a behaviorally selective manner. Focal destruction of serotonin terminals in the OFC impairs the ability to update behavior during reversal in marmosets (Clarke et al. 2004, 2005, 2007). Similar findings have been reported after cortical serotonin depletions in rats (Masaki et al. 2006; Lapiz-Bluhm et al. 2009), yet this manipulation appears to have no effect on reversal learning in mice, even though chronic fluoxetine treatment produces a reversal enhancement in this species (Brigman et al. 2010). Few studies have focused on the receptor mechanisms within prefrontal cortex that mediate effects on reversal learning, though 5-HT2 subtype receptors appear to be plausible candidates (Boulougouris et al. 2008b; Boulougouris and Robbins 2010).

Additional evidence relating serotonin to reversal learning derives from genetic findings in laboratory animals. In rhesus monkeys, variation in the noncoding regions of the gene encoding the serotonin transporter (that inactivates synaptic serotonin) is linked to reversal learning (Izquierdo et al. 2007; Vallender et al. 2009; Jedema et al. 2010); since the studies conducted so far linked independent, nonsegregating variants with reversal learning, the relationship between this gene and reversal appears to be a robust finding. Experimental studies in mice and rats (involving genetic or pharmacological inhibition of the serotonin transporter) support these findings as well (Brigman et al. 2010; Lapiz-Bluhm et al. 2009; Nonkes et al. 2011). The influence of genetic variation in the serotonin transporter on reversal learning probably depends on an interaction of factors, including stress (Graybeal et al. 2011).

Unfortunately, much remains unknown about the details of the control of reversal learning by central serotonin systems. The types of manipulations conducted so far (dietary and pharmacological depletions, reductions in reuptake) are not specific enough to uncover the cellular and molecular mechanisms that mediate these influences and that are relevant to treatment of disorders linked with reversal learning problems. These findings are summarized in Table 1.

Table 1 Summary of evidence for serotonergic modulation of reversal learning

Dopamine

Though depletion of dopamine in OFC does not impair reversal learning the way that serotonin depletion does (Clarke et al. 2007), there are clear data linking dopaminergic systems to reversal learning. A recent study found that depletion of dopamine, but not serotonin, in the medial caudate nucleus results in a nonperseverative impairment on reversal learning in marmosets (Clarke et al. 2011).

Pharmacological studies have also provided convincing evidence for dopaminergic modulation of reversal learning. Using an object discrimination task, Ridley et al. reported that both an indirect dopamine agonist and a dopamine receptor antagonist, d-amphetamine and haloperidol, respectively, produced behaviorally specific problems with reversal (Ridley et al. 1981a, b). While both selectively impaired performance in reversal, amphetamine increased perseverative errors and haloperidol produced “nonperseverative” responding. Haloperidol was found to block the effect of amphetamine (Ridley et al. 1981b), suggesting that amphetamine-induced perseveration was due to increased dopamine output produced by the drug (as opposed to an action on another monoamine system).

Studies of the effects of amphetamine, or its analogues, on reversal learning in rodent models have generated variable results including impairments (Idris et al. 2005; McLean et al. 2010), no effect (Wilpizeski and Hamilton 1964) or improvements (Kulig and Calhoun 1972; Weiner and Feldon 1986); the variability in types of reversal used (spatial vs. visual; appetitively reinforced vs. escape-reinforced; Pavlovian vs. instrumental) and the inconsistency in doses and routes of administration make a direct resolution of these disparate findings challenging.

The fact that haloperidol blocked the effects of amphetamine, while producing an impairment itself (Ridley et al. 1981b; Idris et al. 2005), raises the hypothesis that over- or understimulation of dopamine D2-like receptor function causes impairments in reversal learning, and over the years, this hypothesis has received considerable support. In rats, activation or antagonism of D2-like receptors (D2 and D3 receptors, in particular) affects reversal learning (Idris et al. 2005; Boulougouris et al. 2008a), and the particularly important role of the D2 receptor gene has been confirmed through the study of dopamine D2 receptor gene knockout mice (Kruzich and Grandy 2004; Kruzich et al. 2006; De Steno and Schmauss 2009). In nonhuman primates, activation or antagonism of D2 and D3 receptors disrupts reversal learning (Smith et al. 1999; Lee et al. 2007) with similar effects in human subjects (Mehta et al. 2001). It is important to note that not all these studies agree on the symmetrical effects of agonists and antagonists, with some showing effects of one alone, with no effects of the other; however, these discrepancies are likely related to variation in doses administered and the difficulty of the reversal problem being solved. These studies do not imply that dopamine D1 receptors are unrelated to reversal learning; indeed, systemic treatment with the D1 agonist SKF81297 results in a selective early reversal learning impairment in mice, leaving visual discrimination learning unaffected (Izquierdo et al. 2006).

Beyond the behavioral pharmacological data, recent studies in a variety of species indicate that individual differences in reversal learning are related to D2-mediated dopamine transmission, with low receptor availability predicting poor reversal in otherwise normal mice (Laughlin et al. 2011), nonhuman primates (Groman et al. 2011) and humans (Jocham et al. 2009). Additionally, the reversal learning deficits in patients with psychostimulant addiction, a condition associated with low D2 receptor availability (Volkow et al. 2001; Lee et al. 2009), are reversed by administration of a D2-like receptor agonist (Ersche et al. 2011), suggesting that the treatment implications of this dopamine-mediated effect on reversal learning are strong. These findings are summarized in Table 2.

Table 2 Summary of evidence for dopaminergic modulation of reversal learning

Together, these data support the notion that dopamine D2 receptors are major players in coordinating effective behavioral flexibility during reversal learning and that low receptor complement/function is associated with poor inhibitory control in this task.

Noradrenaline

Increases in synaptic noradrenaline levels elicited by selective reuptake inhibitors or alpha-2 adrenergic autoreceptor antagonists are sufficient to trigger improvements in reversal learning (Lapiz and Morilak 2006; Lapiz et al. 2007; Seu and Jentsch 2009; Seu et al. 2009). Though it is tempting to speculate that this effect is mediated by increased noradrenaline output into OFC, this remains untested. Consequently, it remains possible that noradrenergic mechanisms in another cortical or subcortical region are mediating this effect or that these mechanisms are secondary to increases in dopamine output in the frontal cortex (Millan et al. 2000; Bymaster et al. 2002). This is an area worthy of much more systematic study.

Reversal learning and addiction

Evenden (1999) described impulsivity as an all-encompassing term for “actions that are poorly conceived, prematurely expressed, unduly risky, or inappropriate to the situation and that often result in undesirable outcomes.” The term compulsivity, on the other hand, denotes a state of being compelled to engage in behaviors as if “driven to perform” them, despite one’s own volition (DSM-IV, American Psychiatric Association 1994). Though impulsivity and compulsivity are each multidimensional constructs relevant to behavioral addictions, they may both be considered endpoint exemplars of maladaptive decision-making, with impulsivity clustered at the initiation of behavior and compulsivity most prevalent at its cessation. Similarly, they can each be positioned along a single, unitary dimension of cognitive rigidity, both characterized by frontocorticostriatal imbalance or dysfunction (Dalley et al. 2011). An important question is whether impulsive and compulsive behaviors, measured using reversal learning tasks, represent predisposing processes to addictions or whether they instead reflect sequelae arising from neuroadaptations caused by drug experience.

Poor inhibitory control: cause and consequence?

Addiction and cognitive rigidity are related in a complex circuitous relationship that is not yet fully understood (Schoenbaum and Shaham 2008). The observation that substance abusers exhibit inhibitory control deficits such as increased perseveration in reversal learning (Ersche et al. 2008) and risky decision-making may indicate premorbid vulnerability factors for addiction, direct consequences of long-term drug intake, or a combination of these (Verdejo-Garcia et al. 2008). Animal studies have substantiated both views: that exposure to addictive drugs causes cognitive deficits (Jentsch et al. 1997; Jentsch et al. 2002; Shoblock et al. 2003; Schoenbaum et al. 2004; Kantak et al. 2005) and that individual variation in inhibitory control influences addiction vulnerability (Dalley et al. 2007; Belin et al. 2008; Diergaarde et al. 2008; Perry and Carroll 2008).

Daily exposure to cocaine over a 2-week period produces persistent deficits in reversal learning (Jentsch et al. 2002). This is strong evidence for deficits of inhibitory control following prolonged stimulant exposure; a similar long-lasting impairment has been found in rodent models of cocaine or methamphetamine administration using a range of reversal-like tasks (Shoblock et al. 2003; Schoenbaum et al. 2004; Kantak et al. 2005; Schoenbaum and Setlow 2005). Additionally, even brief exposure to methamphetamine results in selective impairments on a wide range of reversal tasks in rats: spatial reversal (White et al. 2009), response reversal (Cheng et al. 2007), and discrimination reversal learning (Izquierdo et al. 2010). The learning impairment observed in the latter study was reversal-specific, leaving initial discrimination learning, as well as attentional set shifting, unaffected. Follow-up studies in our lab confirm that impaired reversal learning is just one example of cognitive rigidity after relatively brief exposures to methamphetamine that do not produce any significant, measured dopaminergic neurotoxicity (Kosheleff et al. 2011).

Rats that, in turn, exhibit poor inhibitory control (assessed prior to any experience with drug) have been shown to acquire cocaine or nicotine self-administration faster and to exhibit greater overall intake and diminished extinction of the drug-taking response (Dalley et al. 2007; Belin et al. 2008; Diergaarde et al. 2008). While the precise relationship between poor inhibitory control and addiction liability is still unknown, this difference in propensity may be due to the fact that impulsive individuals are more sensitive to the acute stimulus effects of the drugs (Perkins et al. 2008). Additionally, difficulty inhibiting prepotent responses may be a phenotype that contributes to a more rapid change in drug-taking behavior from a “goal-directed” pattern of use to a more compulsive pattern (Groman et al. 2008). This relationship has not yet been shown for reversal learning measures, specifically.

A putative mechanism for the loss of control over drug use involves frontocorticostriatal adaptations after protracted, and perhaps even brief, exposure to drug. For example, imaging studies in humans reveal a progressive ventral-to-dorsal shift in striatal involvement with increasing severity of stimulant craving and habit (Volkow et al. 2006). This may represent the neurophysiological correlate of the increased automaticity and habitual behavior (Takahashi et al. 2007) measured in reversal learning. Plasticity within frontocorticostriatal circuitry resulting from drug exposure could contribute to the transition from recreational use to addiction (Everitt et al. 2007). As striatal control increases in addiction, there is a concomitant decrease in frontocortical (e.g., OFC) involvement in inhibitory control and, consequently, decreased control over use of the drug.

The neurochemical mechanisms by which premorbid differences in inhibitory control function affect addiction liability are unknown, but recent studies suggest that low D2-like receptor function is a potential mechanistic determinant (Dalley et al. 2007; Zald et al. 2008; Ersche et al. 2011; Groman and Jentsch 2011). These results suggest that low D2-like receptor function is a molecular convergence point for premorbid genetic factors influencing impulsivity and for chronic stimulant drug-induced deficits of inhibitory control. This underscores three possibilities: that premorbid differences in inhibitory control affect vulnerability, that addiction further impairs inhibitory control, and that a common molecular alteration may mediate both associations.

Summary

Reversal learning is impaired in individuals affected by addictions, and we have conceptually linked it to both the impulsive and compulsive aspects of drug-seeking and -taking. Though many questions remain unanswered (see “Questions for Future Research”), this review has described the phenomenon of reversal learning, underscored its relevance for understanding impulse control disorders and addictions, and defined what is known about the underlying biological determinants of the inhibitory control processes measured in these tasks. The literature reviewed here suggests that discrimination reversal learning may continue to be used and further developed as a diagnostic tool for pathology typified by poor inhibitory control. Preclinical and clinical research point to two interrelated neuroadaptations in addiction related to poor reversal learning: frontocorticostriatal circuitry dysregulation and poor dopamine (D2 receptor) modulation of this circuitry. New therapeutics that mitigate these adaptations would have the potential to enhance the chance of abstinence and reduce the risk of relapse in addiction.

Questions for future research

Frontocorticostriatal circuits and dopamine D2 receptors are central to the mechanisms mediating inhibitory control abilities, yet little is known about the genetic factors that contribute to individual differences in reversal learning. Candidate gene studies have confirmed roles for the serotonin and dopamine systems (Kruzich and Grandy 2004; Izquierdo et al. 2007; De Steno and Schmauss 2009; Jocham et al. 2009; Vallender et al. 2009; Brigman et al. 2010), and whole genome strategies have been initiated in an attempt to localize major effect loci in novel molecular systems (Laughlin et al. 2011). Because these genes theoretically also represent liability factors for disorders associated with extreme variations in impulsivity, they are crucial targets of future research.

It remains unclear whether effective pharmacological treatment of inhibitory control problems will translate into a clinically meaningful benefit. Testing this question first requires a treatment that reliably improves inhibitory control. In theory, cognitive-enhancing therapeutics could ameliorate problematic inhibitory control and help with drug abstinence. One study showed that impairments in reversal could be rescued by subchronic citalopram treatment (Lapiz-Bluhm et al. 2009). To date, atomoxetine and other noradrenaline reuptake inhibitors remain some of the best characterized tools (Lapiz et al. 2007; Seu and Jentsch 2009; Seu et al. 2009), yet their ability to modulate problematic drug use remains undemonstrated.