Introduction

Learning to predict profitable outcomes from environmental stimuli is crucial for survival. After repeated pairing with rewards, stimuli gain incentive valence through Pavlovian conditioning, act as conditioned stimuli (CSs) and further affect instrumental learning, action selection and performance [1,2,3,4]. Although this associative learning is usually adaptive, it can sometimes become maladaptive. An extreme case is substance use disorder, i.e., addiction, one of the main symptoms of which is compulsive drug seeking or taking [5,6,7]. Addicts showing this behaviour spend unusual amounts of time and effort seeking drugs that may be scarce or absent, regardless of their health and financial circumstances. In other situations, drug-paired contextual cues can maladaptively and independently attract addicts toward sources of drugs [8]. Although human studies can provide insights in this regard, the complexity of drug-using contexts and other unavoidable factors makes such studies difficult. Animal research, in contrast, allows neurobiological mechanisms to be examined in depth. Previous studies based on rodent models of addiction have seldom drawn a sharp distinction between Pavlovian and instrumental conditioning; thus, the neural basis underlying these behavioural phenotypes has been largely neglected.

Learning theory emphasizes that Pavlovian and instrumental conditioning are two separate processes mediated mainly by the ventral striatum (VS) and the dorsal striatum (DS), respectively [9]. Here we integrate results, mainly from animal research, that highlight how the heterogeneous structures of the striatum are involved in these two processes and contribute to addiction. Furthermore, it has been recognized that addiction involves difficulty in shifting between goal-directed and habitual behaviour (two forms of instrumental behaviour), which probably results from abnormal hypoactivity of the goal-directed system [7]. However, evidence for the relevant neural basis is still lacking. Therefore, we discuss the rationale for this hypothesis and propose other possibilities to further unravel the probable mechanisms underlying compulsive drug-seeking behaviour.

Briefly, we elaborate on the striatal circuits mediating the switch from recreational drug use to compulsive drug seeking and provide other lines of thinking for understanding addiction.

Learning theory in addiction

Learning theory emphasizes that the behavioural aspects of addiction result from abnormal interactions between Pavlovian and instrumental conditioning [7]. Put simply, after repeated presentation with an unconditioned stimulus (US), a CS alone is enough to elicit preparatory, consummatory or approach behaviours [10]. For instance, a tone-food relationship can control salivation, which reflects a specific physical arousal state of the individual. Thus, the stimulus-outcome (S–O) association is critical for establishing Pavlovian conditioning [11]. In contrast, instrumental conditioning is more about taking actions to obtain CSs or USs, and it is maintained by a response-outcome (R-O) or stimulus–response (S-R) contingency [11]. Compared with Pavlovian conditioning, instrumental conditioning can be seen as a more active process (Fig. 1).

Fig. 1
Different contingencies controlling Pavlovian and instrumental behaviour (goal-directed and habitual behaviour). Pavlovian behaviour is sustained by the S–O association and is sensitive to changes in outcome value. Goal-directed behaviour is maintained by the R-O association and is sensitive to changes in both outcome value and R-O contingency. Habitual behaviour depends on the S-R association and is insensitive to both outcome value and R-O contingency
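The sensitivity pattern summarized in Fig. 1 can be made concrete with a toy calculation. The following is a minimal sketch and not a model taken from the studies reviewed here: the multiplicative forms, the function names (`pavlovian_response`, `goal_directed_response`, `habitual_response`) and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Conditions of a hypothetical probe test (illustrative values only)."""
    outcome_value: float  # current value of the outcome; 0.0 after devaluation
    contingency: float    # P(outcome | response); low after contingency degradation


def pavlovian_response(so_strength: float, probe: Probe) -> float:
    # S-O control (assumed form): tracks current outcome value, ignores R-O contingency.
    return so_strength * probe.outcome_value


def goal_directed_response(probe: Probe) -> float:
    # R-O control (assumed form): tracks both R-O contingency and outcome value.
    return probe.contingency * probe.outcome_value


def habitual_response(sr_strength: float, probe: Probe) -> float:
    # S-R control (assumed form): a cached strength that ignores both factors.
    return sr_strength


baseline = Probe(outcome_value=1.0, contingency=0.9)
devalued = Probe(outcome_value=0.0, contingency=0.9)
degraded = Probe(outcome_value=1.0, contingency=0.1)

for name, probe in [("baseline", baseline), ("devalued", devalued), ("degraded", degraded)]:
    print(
        name,
        pavlovian_response(0.8, probe),
        goal_directed_response(probe),
        habitual_response(0.8, probe),
    )
```

Under these assumptions, only the habitual term is unchanged by both devaluation and contingency degradation, which is the logic behind using those two tests to classify instrumental behaviour.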

The two processes often intermingle in drug addiction. Initially, drugs induce subjective pleasure, which increases the likelihood of the actions or responses that produce them via instrumental conditioning. Meanwhile, drug-paired environmental stimuli acting as CSs gain incentive salience through Pavlovian conditioning and influence or even determine subsequent actions [12]. They put individuals in an aroused or motivated state, focusing attention, orienting them in certain directions and preparing them for the next appearance of drugs. The interaction of these two processes contributes to high relapse rates after drug withdrawal and even to compulsive drug-seeking behaviour. For instance, when exposed to a drug-paired stimulus, addicts often experience the same physiological and psychological arousal as when taking drugs. Consequently, they cannot resist the urge to seek and take drugs.

Reinforcement, Pavlovian conditioning and the nucleus accumbens

The mesolimbic dopamine (DA) system, also called the “reward system,” originates in the ventral tegmental area (VTA), the main target of drugs. Each drug has its own range of molecular targets, but all share a common feature: they lead to DA accumulation in the VTA as well as in its downstream regions [13, 14]. Stimulants such as cocaine, amphetamine and ecstasy hijack DA transporters and interfere with DA reuptake, while nicotine can depolarize DA neurons directly [15]. Finally, although whether opioid reinforcement requires mesolimbic DA is still under debate, it is certain that opioids can disinhibit DA neurons by decreasing the activity of GABAergic interneurons in the VTA [16].

Apart from producing feelings of extreme pleasure, the abundant DA release induced by drugs can also serve as a teaching signal, an idea that stems from a revised Hebbian rule. The rule holds that neurons that fire together wire together if they receive a burst of DA [17, 18]. Put simply, if neuron A activates neuron B and neuron B elicits a behaviour that produces a reward, then the released DA strengthens the connection between these two neurons and reinforces the behaviour. Through this mechanism, the value of drugs is passed on to the CSs; individuals then learn to predict USs from the appearance of CSs and react quickly. Learning these predictive rules is usually adaptive and enables individuals to choose appropriate behaviours to maximize rewards. However, addictive drugs lead to abnormal quantities of DA in the synaptic cleft, cause long-term functional changes within reward circuits and finally result in individuals overreacting to drugs or CSs [19].
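The DA-gated Hebbian rule described above can be illustrated with a short numerical sketch. This is a simplified, assumed formulation (a product of presynaptic activity, postsynaptic activity and a DA term); the function name `da_gated_hebbian_update`, the learning rate and the DA magnitudes are hypothetical and are not taken from refs. [17, 18].

```python
def da_gated_hebbian_update(weight, pre, post, dopamine, learning_rate=0.1):
    """Return the updated A->B connection weight.

    Assumed rule: co-activity of the two neurons (pre * post) changes the
    weight only in proportion to the DA signal that accompanies it.
    """
    return weight + learning_rate * pre * post * dopamine


initial_weight = 0.5

# Neurons A and B fire together, but no DA burst follows: no change.
no_da = da_gated_hebbian_update(initial_weight, pre=1.0, post=1.0, dopamine=0.0)

# Same co-activity followed by a DA burst (illustrative magnitude): strengthened.
natural_reward = da_gated_hebbian_update(initial_weight, pre=1.0, post=1.0, dopamine=1.0)

# A larger, drug-like DA signal (illustrative magnitude): strengthened more.
drug_reward = da_gated_hebbian_update(initial_weight, pre=1.0, post=1.0, dopamine=3.0)

print(no_da, natural_reward, drug_reward)
```

In this toy form, an abnormally large DA term produces a proportionally larger weight change for the same co-activity, which is one simple way to picture how drugs could exaggerate learning about CSs.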

As a major hub of the reward system, the nucleus accumbens (NAc) integrates sensory signals from mesolimbic dopaminergic neurons (mainly from the VTA) and from glutamatergic neurons residing in the basolateral amygdala (BLA), medial prefrontal cortex (mPFC) and ventral hippocampus (VH) into motor outputs (the NAc also has extensive connections with the visceral and skeletal motor systems) [20,21,22,23]. Previous studies have found that the NAc is mainly involved in the acquisition of Pavlovian conditioning and integrates the Pavlovian influence on instrumental behaviour [20].

The NAc can be further divided into two subdivisions (NAc core and shell), whose anatomically and functionally heterogeneous neuronal subpopulations serve different functions [24]. It has long been postulated that the NAc core plays a vital role in the Pavlovian S–O association, while the NAc shell is crucial for maintaining steady instrumental responding (performance) and for the rewarding effects of drugs [9, 11, 20]. Indeed, using the cocaine conditioned place preference (CPP) paradigm, researchers can readily observe the Pavlovian incentive functions of CSs by measuring the time rodents spend on the drug-paired side. After several days of cocaine infusions, enhanced synaptic connections were found in the BLA-NAc core circuitry, and optical or chemogenetic inhibition of this circuit dampened cue-evoked expression of cocaine CPP [25, 26]. Similarly, lesions of the NAc core abolished stimulus-evoked approach behaviour (which can be referred to as goal tracking), while inactivation of the NAc shell did not [20]. Silencing the NAc core also blocked the acquisition of instrumental responding under a second-order schedule of cocaine reinforcement but did not affect baseline performance once the animals had learned this behaviour [27]. More importantly, neural plasticity in the NAc core was detected only after contingent cocaine administration and not after non-contingent cocaine regimens, suggesting that these changes were provoked not by the drug itself but probably by learning [28]. The above findings highlight that the NAc core is either necessary or sufficient for learning the S–O association and for facilitating the acquisition of instrumental responding.

In comparison, the NAc shell energizes responding once behaviours are well established, but it may not be involved in the acquisition of instrumental behaviours. In one study, food-restricted rodents received 7 days of instrumental training under a fixed ratio (FR1) procedure for food rewards and were then transferred to intracranial optical self-stimulation (ICOSS), in which self-stimulation replaced food as the reinforcer. Optical stimulation of the NAc shell induced higher response rates than food-reinforced performance under the FR1 procedure but had no influence on the acquisition of instrumental responding [20]. In addition, the NAc shell mediates the psychomotor effects of drugs [29]. Acute intravenous injections of amphetamine, cocaine and opioids preferentially increased extracellular DA concentration and rates of glucose utilization in the NAc shell, whereas these changes were not found in the NAc core [30]. Repeated administration of amphetamine led to long-lasting neural plasticity in the NAc shell that was linked to locomotor sensitization [31]. Moreover, injection of amphetamine into the NAc shell, but not the NAc core, increased locomotor activity, and blockade of the NAc shell weakened drug-induced locomotor sensitization.

A parallel or sequential system—dorsal striatum and instrumental behaviour

Instrumental conditioning comprises two different processes [7, 11]. The goal-directed process is maintained by the association between a response and its outcome (R-O) and is sensitive to changes in outcome value or R-O contingency. In contrast, the habitual process is sustained by the association between a stimulus and a response (S-R); it is triggered directly by CSs and involves the goal itself to a lesser extent [11]. During the development from recreational drug use to compulsive drug seeking and taking, the NAc transfers its control over behaviour to the dorsal striatum (DS), which mediates the expression of instrumental conditioning. Indeed, after long-term training under second-order schedules, DA release measured by microdialysis was detected in the DS during CS presentation, whereas DA levels remained relatively stable in the NAc [32].

The DS can be further divided into two subregions based on their different cortical afferents: the dorsomedial striatum (DMS) and the dorsolateral striatum (DLS) [33, 34]. The DMS, also called the associative striatum, mainly receives inputs from the mPFC and orbitofrontal cortex (OFC), while the DLS, also called the sensorimotor striatum, is primarily innervated by the motor cortex. A considerable body of literature supports the view that goal-directed and habitual behaviour develop sequentially and are mediated by the DMS and DLS, respectively. This hypothesis holds that, at the early stage of training, almost all individuals show goal-directed behaviour, but only some of them shift to habitual behaviour after extended training. Indeed, lesion studies have shown that inactivation of the DMS biases performance toward a habitual form that is impervious to both devaluation and contingency degradation tests, whereas the same manipulation in the DLS results in more goal-directed responding even after extended training [35,36,37]. More interestingly, as already mentioned, silencing the NAc does not affect performance in instrumental behaviour assays [11]. These findings suggest that goal-directed behaviour can gradually become habitual with repeated practice. Importantly, during this process, the DLS replaces the DMS in dominating behaviour.

However, recent data have challenged this sequential hypothesis. A study using electrophysiological recordings found that DMS and DLS activity patterns became similar, rather than diverging, after extended instrumental training [38]. Photosilencing the DLS during discriminated choice learning accelerated behavioural acquisition at an early stage of training, whereas the same manipulation of the DMS markedly decreased correct responding at the late stage of training [39]. These findings suggest that the DMS and the DLS might not be involved in goal-directed or habitual behaviour in an all-or-none way. More specifically, it is probable that different neural populations within the DMS or DLS engage in goal-directed and habitual behaviour. Around 95% of DS neurons are medium spiny neurons (MSNs), which are divided into two types based on their different efferents: D1-MSNs and D2-MSNs [40]. Although the contributions of these two populations to goal-directed and habitual behaviour are still controversial, functional differences between the DMS and DLS have been identified. Optogenetic activation of DMS A2A adenosine receptors, which can enhance the activity of D2-MSNs, impaired goal-directed behaviour as measured by devaluation [41]. Similarly, injecting an A2A receptor antagonist into the DMS rescued the deficits in goal-directed behaviour caused by pre-exposure to amphetamine [42]. Since activation of D2-MSNs can reduce the overall influence of the DMS on the cortex through the cortico-basal ganglia-thalamo-cortical loop, D2-MSNs in the DMS might serve as a mediator that determines the contribution of the DMS to instrumental behaviour.
In contrast, in the DLS, a study that used ICOSS of DLS D1-MSNs and D2-MSNs to establish operant behaviour found that activation of D1-MSNs made rodents favour goal-directed behaviour, with greater sensitivity to outcome devaluation and contingency degradation. D2-MSN self-stimulation instead induced slower acquisition of operant behaviour, generalization between active and inactive nose pokes, and insensitivity to contingency degradation, indicating that D1-MSNs support goal-directed behaviour and D2-MSNs support habitual behaviour [43]. However, this result is inconsistent with another study that monitored the two groups of neurons directly during operant behaviour assays via two-photon laser scanning microscopy. That study found a shift in the relative firing timing between D1-MSNs and D2-MSNs that was strongly correlated with the degree of habitual behaviour in mice [44]. More interestingly, another finding indicates that the relative firing timing also predicts the likelihood of action cancellation, and striatum-projecting pallidal neurons probably play a crucial role in action inhibition [45]. The above evidence implies that the mechanism of striatal instrumental control is more complex than previously assumed.

Compulsive drug seeking: a maladaptive habitual behaviour?

Drug addiction is characterized by maladaptive behaviour patterns that reflect overreliance on the habitual system and deficits in the goal-directed system [7]. Specifically, the habitual system continuously dominates actions, and the goal-directed system cannot regain dominance. As a result, addicts are often unable to re-evaluate changes in outcome value and contingency and to adjust their behaviour in their best interest.

The behavioural manifestations of addiction allow this hypothesis to be developed theoretically. For instance, almost all addictive drugs can tip the balance between goal-directed and habitual behaviour. In humans, participants addicted to alcohol displayed overreliance on the habit system in computer tasks [46]. In rodents, either contingent or non-contingent pre-exposure to alcohol, amphetamine or cocaine expedited the development of habitual behaviour reinforced by natural rewards [47]. It is therefore reasonable to speculate that addictive drugs induce functional deficits in the goal-directed system, which then gives way to the habit system in controlling behaviour. Mechanistically, however, these findings should be interpreted with caution: they may reflect either a loss of top-down cortical control or an impaired goal-directed system, two explanations that are often conflated.

Cortical regions such as the mPFC and the OFC serve as executive controllers that determine which actions are appropriate in specific situations [48,49,50]. Impairment of these regions leads to loss of inhibitory control over behaviour, increased impulsive-like behaviour (premature responses) and decision-making deficits, which in turn promote the development of compulsive seeking behaviour or even addiction [51,52,53]. Indeed, drug use is probably one of the main causes of reduced cortical function in addicts, although predisposing factors should also be taken into account. For instance, in humans, abusers of stimulants, alcohol and nicotine consistently showed a loss of grey matter volume in the cortex, including the mPFC and anterior cingulate cortex (ACC), compared with healthy participants [54]. Similarly, rodents that underwent extended cocaine self-administration training showed reduced dendritic spine density, apical dendritic length and activity in the prelimbic cortex (PL), a subregion of the mPFC [55]. Although the DMS receives abundant glutamatergic inputs from these cortical regions, this evidence is not enough to conclude that DMS-mediated goal-directed behaviour is lost in addiction.

Counterintuitively, studies have found that, after chronic self-administration of amphetamine, rodents showing compulsive-like behaviour displayed enhanced engagement of the OFC-DMS circuit together with decreased activity of the PL-VS circuit [56]. Similarly, a recent study applied the optogenetic dopamine neuron self-stimulation (oDASS) paradigm, which directly stimulates VTA DA neurons to mimic the effects of drugs, to establish compulsive behaviour in mice and found potentiated synaptic strength in the OFC-DMS pathway [57, 58]. Moreover, activation of OFC-DMS fibre terminals led to repetitive compulsive-like behaviours. These findings raise some questions. For instance, if the DMS is genuinely responsible for value-based goal-directed behaviour, this appears inconsistent with the habit hypothesis described above. One possibility is that DMS neural populations receiving afferents from different cortical subregions coordinate different or even opposing cognitive components of goal-directed behaviour (Fig. 2). Indeed, in rats showing compulsive drug-seeking behaviour, the enhanced OFC-DMS connection strength subsided when they were not facing noxious foot shock, implying that the OFC-DMS circuit is probably involved in processing conflict situations in which individuals need to weigh loss against gain [56]. This speculation is supported by previous findings showing that the OFC is more sensitive to value changes and conveys signals about upcoming action selection, whereas the PL plays a relatively minor role in this process. For instance, OFC neural assemblies were reported to change their activity with fluctuations in the current reward prediction error (RPE) [59]. A recent study also reported that OFC neurons projecting to the striatum encode integrated value and are critical for value prediction based on previous errors [60].
Therefore, the above evidence suggests that the gain of function in the OFC is probably related to value judgement in the face of aversive stimuli, but we still cannot exclude the possibility that compulsive drug-seeking behaviour has a goal-directed (value-based) basis (for instance, rodents may want drugs so strongly that they tolerate punishment).
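For readers unfamiliar with the RPE mentioned above, the quantity can be written as the difference between the reward received and the value that was predicted, with the prediction then corrected by a fraction of that error. The sketch below uses a simple delta-rule update as an assumed illustration; the function names, the learning rate `alpha` and the reward sequence are hypothetical and do not reproduce the analyses in refs. [59, 60].

```python
def update_value(value, reward, alpha=0.2):
    """One trial of an assumed delta-rule update.

    rpe is the reward prediction error: received reward minus predicted value.
    The prediction is then moved toward the reward by a fraction alpha of rpe.
    """
    rpe = reward - value
    return value + alpha * rpe, rpe


def run_trials(rewards, value=0.0, alpha=0.2):
    """Return a list of (value_after_update, rpe) pairs, one per trial."""
    history = []
    for reward in rewards:
        value, rpe = update_value(value, reward, alpha)
        history.append((value, rpe))
    return history


# Ten rewarded trials followed by one trial in which the reward is omitted.
history = run_trials([1.0] * 10 + [0.0])

for trial, (value, rpe) in enumerate(history, start=1):
    print(f"trial {trial:2d}: value={value:.3f}, rpe={rpe:+.3f}")
```

In this toy run, the RPE is large on the first rewarded trial, shrinks as the prediction improves and becomes negative when an expected reward is omitted, which is the sense in which value prediction is "based on previous errors".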

Fig. 2
Neural populations involved in habitual or goal-directed actions in addiction. The traditional hypothesis holds that the goal-directed and habitual systems are separate and develop sequentially. However, recent findings suggest that different neural populations might serve different cognitive functions in mediating goal-directed or habitual behaviour and that these two systems seem to develop in parallel. For instance, in the DMS, D1-MSNs are involved in goal-directed learning, while D2-MSNs participate in updating instrumental contingency. In comparison, in the DLS, D1-MSNs support goal-directed actions, while D2-MSNs support habitual actions. These neural populations are like different paths leading to their own destinations. The same path model can also be applied to cortical regions. Previous findings suggest a loss of cortical function in addicts. However, a recent study also found a gain of function in the OFC-DMS circuit. Indeed, the mPFC is probably involved in inhibitory control, and the OFC takes part in value judgement; together they co-regulate goal-directed performance

In contrast, the PL is more engaged in inhibitory control. For instance, inactivation of the PL markedly impaired inhibitory control in a discriminative stimulus task in rats [61]. In rats trained to withhold lever pressing to avoid punishment, inactivation of the PL also impaired performance in avoidance tests [62]. Notably, after 2 weeks of cocaine self-administration training, only 20% of rats continued to respond despite mild noxious foot shock. Decreased intrinsic excitability was measured in the PL, especially in shock-resistant rats. Compensatory stimulation of the PL dampened compulsive cocaine-seeking behaviour but did not affect baseline performance without foot shock [55]. Integrating these findings, we speculate that DMS circuits collaborate with each other to regulate different components of goal-directed behaviour. Specifically, when facing aversive stimuli, the OFC-DMS circuit is mainly responsible for value judgement, while the PL-DMS circuit is involved in suppressing inappropriate behaviours. Therefore, we believe that the previous view emphasizing an impaired goal-directed system is too simple to explain these complicated behaviours. Future work should analyse the functions of different neural populations in the DMS and test behaviours in multidimensional paradigms.

Summary

In this review, we elaborated on the intrastriatal functional shifts during the development of addiction. We drew a distinction between Pavlovian and instrumental behaviour: Pavlovian conditioning is maintained by the S–O association, while instrumental conditioning is sustained by the R-O or S-R association. We also identified the behavioural mechanisms underlying them and unravelled the neural basis that contributes to addiction. The NAc, as the prominent part of the VS, is mainly involved in Pavlovian conditioning and enables CSs to affect and mobilize instrumental behaviour. However, the roles that the NAc subregions play in this process are not fully understood. We integrated recent findings proposing that the NAc core plays an important role in the Pavlovian S–O association, while the NAc shell is crucial for maintaining instrumental responding and conveys the rewarding effects of drugs. We then illustrated the role of the DS (DMS and DLS) in regulating goal-directed and habitual behaviour and highlighted the importance of the DS in mediating instrumental behaviour. We also took this opportunity to incorporate the latest findings, which raise the possibility that different DS neural populations might mediate goal-directed and habitual behaviour cooperatively or antagonistically.

Finally, compulsive drug-seeking behaviour, the hallmark of addiction, is hypothesized to reflect difficulty in shifting between goal-directed and habitual behaviour, probably owing to a loss of goal-directed control. However, recent data contradict this hypothesis and reveal a gain of function in the OFC-DMS circuit. We proposed one possibility: DMS neurons receiving different cortical afferents mediate different cognitive components of goal-directed behaviour. Future studies should further unravel how these neural populations change their function and thereby contribute to addiction.