Introduction

Defining Goal-Directed and Habitual Instrumental Behaviour

Purposeful instrumental behaviour can be explained by both intentional and automatic theories of behavioural control (Heyes & Dickinson, 1990). On the goal-directed account, instrumental behaviour to obtain a reward is controlled by knowledge of the causal contingency between the instrumental response and the rewarding outcome, and knowledge of the predictive contingency between the current state of the agent and the value of the reward in that state. For example, the value of food is expected to be higher in a state of hunger, which motivates instrumental behaviour known to produce food (Dickinson, 1997). By contrast, according to the habit account, instrumental behaviour can also be controlled by S-R associations between external stimulus context (S) and the response (R), which are strengthened by contiguous reinforcement. That is, if in a particular stimulus context a response is performed which produces reward, the link between the stimulus context and the response is strengthened by the reward, such that the stimulus becomes able to elicit the response. This habitual form of behavioural control is automatic in the sense that the S elicits the R without retrieving an expectation of the reward to be obtained. It has been argued that drug addiction is driven by a propensity to habitual, as opposed to goal-directed, control over behaviour (Everitt, Dickinson, & Robbins, 2001).

Effect of Drug Exposure on Outcome-Devaluation in Animals

To test the habit theory of drug dependence, animal studies have examined whether chronic drug exposure modifies performance in the outcome-devaluation procedure. The design of these studies can be broken into four types (see the Habit Research in Action box).

  1.

    In the most compelling set of studies, animals learn that one instrumental lever press response produces drug reward (alcohol, cocaine, nicotine in different studies), and in separate training blocks, learn that another instrumental response produces food. Then, in separate test phases, each outcome is devalued by pairing it with lithium chloride-induced sickness. Finally, animals are given the opportunity to perform the instrumental response for the devalued outcome in an extinction test. Four studies using this design have found that the drug-seeking response is habitual in not being reduced by the devaluation treatment in the extinction test (Dickinson, Wood, & Smith, 2002; Loughlin, Funk, Coen, & Lê, 2017; Mangieri, Cofresí, & Gonzales, 2012; Miles, Everitt, & Dickinson, 2003). Such insensitivity to devaluation suggests that the drug-seeking response is not goal-directed (not controlled by knowledge of the current value of the outcome) but rather, is habitual, i.e. elicited automatically by the stimulus context. By contrast, the food-seeking response in these studies is shown to be reduced by devaluation, indicating that it is goal-directed in being controlled by the expected low value of the outcome and knowledge of which response produces that outcome. These four studies provide the core empirical basis for the claim that drug-seeking (in animals) is especially prone to habitual control.

  2.

    In the second type of design, animals are chronically exposed to a drug (experimenter administered or consumed in the home cage), and then trained on a single lever for food; food is then devalued, and finally the food-seeking response is tested in extinction. Eight studies have shown that in the extinction test, food-seeking is habitual in the drug-exposed animals (insensitive to devaluation) and goal-directed in the non-drug-exposed animals (Corbit, Chieng, & Balleine, 2014; Corbit, Nie, & Janak, 2012; LeBlanc, Maidment, & Ostlund, 2013; Nelson & Killcross, 2006, 2013; Nordquist et al., 2007; Schmitzer-Torbert et al., 2015; Schoenbaum & Setlow, 2005), although three studies have failed to demonstrate this effect (Ripley, Borlikova, Lyons, & Stephens, 2004; Shiflett, 2012; Son, Latimer, & Keefe, 2011). These data suggest that drug exposure renders animals generally prone to habitual control of rewarded instrumental behaviour, which could conceivably play a role in dependence formation by promoting general behavioural autonomy, although how this could promote drug dependence specifically remains unclear.

  3.

    In the third type of design, animals are trained on a single lever for the drug, and sensitivity to devaluation is tested after minimal training versus extensive training. Three studies have demonstrated that the drug-seeking response is initially goal-directed, but then becomes habitual with extensive training (Clemens, Castino, Cornish, Goodchild, & Holmes, 2014; Corbit et al., 2012; Zapata, Minney, & Shippenberg, 2010). However, given that food-seeking also transitions from being goal-directed to habitual with training (Dickinson, Balleine, Watt, Gonzalez, & Boakes, 1995), these findings tell us nothing about the unique habit-forming status of drug-seeking.

  4.

    In the fourth type of design, animals are trained on a single lever for the drug and tested for sensitivity to devaluation following a fixed amount of training. These studies have revealed drug-seeking to be goal-directed in some cases (Hutcheson, Everitt, Robbins, & Dickinson, 2001; Olmstead, Lafond, Everitt, & Dickinson, 2001) and habitual in others (Corbit, Nie, & Janak, 2014). Again, these studies tell us nothing about the unique habit-forming status of drug-seeking.

Criticisms of Animal Outcome-Devaluation Studies

There are two main criticisms of the animal outcome-devaluation model of habitual drug-seeking. First, habitual instrumental behaviour is only found when animals have access to a single lever in each session. By contrast, when rats have concurrent access to two levers for different rewards in each session, drug-seeking remains goal-directed (Halbout, Liu, & Ostlund, 2016), food-seeking remains goal-directed despite chronic drug exposure (Phillips & Vugler, 2011; Son et al., 2011), food-seeking remains goal-directed despite overtraining (Colwill & Rescorla, 1985; Colwill & Triola, 2002; Holland, 2004; Kosaki & Dickinson, 2010), and drug self-administration remains sensitive to shock punishment (Pelloux, Murray, & Everitt, 2015). It has been suggested that concurrent access to two responses for different rewards maintains memory for the response–outcome relationships, thereby abolishing habitual control (Klossek, Yu, & Dickinson, 2011; Kosaki & Dickinson, 2010). If one accepts that the natural environment of human drug users contains access to multiple responses for different rewards, then it must be concluded that the form of habitual control demonstrated in the animal model could not play a role in human addictive behaviour (Heather, 2017; Singer, Fadanelli, Kawa, & Robinson, 2018).

The second criticism is that habitual control in the outcome-devaluation procedure is fragile, in that the sensitivity of drug-seeking (Dickinson et al., 2002; Loughlin et al., 2017; Mangieri et al., 2012; Miles et al., 2003) and food-seeking in chronically drug-exposed animals (Nelson & Killcross, 2006, 2013) to outcome-devaluation is restored in reacquisition tests where the instrumental response produces the devalued reinforcer. Sensitivity to devaluation may be restored in the reacquisition test either because the response can be modified by S-R learning or because animals are reminded of the response–outcome contingencies (Dickinson et al., 2002). If it is accepted that in the natural environment of human drug users instrumental responses are typically reinforced (i.e. that this environment is more comparable to the reacquisition than the extinction condition), then it must be concluded that habitual control demonstrated in the extinction test of the animal model can play little role in human addictive behaviour.

To negate the problem that habitual control is limited to extinction conditions, subsequent theories of habit and compulsivity (e.g. Everitt & Robbins, 2016) have proposed that drug-seeking may become permanently insensitive to devaluation, based on the finding that impulsive or chronically drug-exposed animals are less sensitive to the suppressive effect of shock punishment on drug-seeking (Belin, Mar, Dalley, Robbins, & Everitt, 2008; Economidou, Pelloux, Robbins, Dalley, & Everitt, 2009; Pelloux et al., 2015; Pelloux, Everitt, & Dickinson, 2007; Vanderschuren & Everitt, 2004). However, persistence of punished self-administration appears to be driven by the greater reinforcement value ascribed to the drug (Bentzley, Jhou, & Aston-Jones, 2014), which was inadequately assessed by single lever self-administration procedures in the earlier studies (Ahmed, 2010). In sum, the restriction of habitual control to single lever tests, the abolition of habitual control in reinforced conditions, and the attribution of persistent punished drug-seeking to heightened drug value together weaken the claim that habit or compulsion plays a role in human addiction (Becker & Greig, 2010; Bentzley et al., 2014; Heather, 2017; Markou, Chiamulera, Geyer, Tricklebank, & Steckler, 2009; Pierce, O’Brien, Kenny, & Vanderschuren, 2012).

Outcome-Devaluation Studies with Human Drug Users

Nine outcome-devaluation experiments (published in six papers) have tested whether habit is more pronounced in human drug users versus non-users, or as a function of dependence severity in the user group. In two experiments, student smokers first learned that two key press responses earned tobacco and chocolate points, respectively (Hogarth & Chase, 2011). Tobacco was devalued by smoking to satiety or by health warnings (in Experiments 1 and 2, respectively), before choice between the two responses was tested in extinction. Devaluation reduced tobacco choice in the extinction test of both experiments, indicating that tobacco choice was goal-directed. Crucially, there was no correlation between sensitivity to devaluation and tobacco dependence, contradicting habit theory. The third study used the same protocol but tobacco was devalued by a 1 mg dose of nicotine nasal spray (Hogarth, 2012). This devaluation treatment reduced goal-directed tobacco choice in less-dependent smokers, and primed goal-directed tobacco choice in more-dependent smokers, demonstrating different motivational effects of the 1 mg dose. Nevertheless, more-dependent smokers were demonstrably goal-directed, again contradicting habit theory. In the final experiment of this series (Hogarth, Chase, & Baess, 2012), student smokers learned that two responses earned chocolate and water points respectively, one outcome was then devalued, and choice was measured in extinction. Daily and non-daily smokers differed markedly in dependence severity but showed no differential propensity to habit in the extinction test. These four studies contradict the prediction of habit theory that propensity to habit should be more pronounced as a function of dependence severity.

One possibility is that habit is exclusively found in drug users who are clinically dependent. This was tested in two experiments where treatment-seeking addicts learned that two responses earned food and drink points respectively, before one outcome was devalued (Hogarth, Lam-Cassettari, et al., 2018) and choice was tested in extinction. In both experiments, treatment-seeking drug users and controls were equally goal-directed, contradicting the prediction that habit would be more evident in clinically dependent users.

Only two outcome-devaluation studies suggest that habit learning is more pronounced in drug users. One study trained alcohol-dependent and control participants on an instrumental discrimination task in which a left or right response was rewarded with points depending on which ‘stimulus fruit’ picture was present (Sjoerds et al., 2013). When points were earned, an ‘outcome fruit’ picture was also presented, which was reliably associated with the left or right response. In the outcome-devaluation test, two outcome fruits were presented together, associated with the left and right response, respectively. One outcome fruit had a cross through it and participants were told to choose the response associated with the uncrossed outcome fruit, as only this response would be rewarded. Alcohol-dependent participants were less accurate in choosing the correct (rewarded) response, indicating that they had weaker knowledge of the association between the responses and the outcome fruits. However, there is a problem with interpreting this finding as evidence for propensity to habit or impaired goal-directed control (De Houwer, Tanaka, Moors, & Tibboel, 2018). Alcohol-dependent participants may have been goal-directed in that they learned which response produced points in the presence of each stimulus fruit, and simply ignored the outcome fruits that accompanied the points because they were incidental to task performance. Alcohol-dependent participants may have been more inclined to ignore the incidental outcome fruits because of general cognitive impairments or task disengagement (Stavro, Pelletier, & Potvin, 2013), rather than because they have a specific deficit in goal-directed control or propensity to habit.

The final study tested an appetitive and aversive version of the outcome-devaluation procedure in cocaine-dependent individuals versus controls (Ersche et al., 2016). The appetitive task was very similar to the task used by Sjoerds et al. (2013) described above. The key finding was that in the outcome-devaluation test, cocaine-dependent participants showed poorer accuracy, again indicating weaker knowledge of the relationships between the left and right responses and the incidental outcome stimuli. As before, this impairment could be due to cocaine-dependent participants simply ignoring the incidental outcome stimuli, while acquiring goal-directed knowledge of the response-points contingencies (De Houwer et al., 2018). More damaging still, cocaine-dependent participants showed poorer accuracy (and slower response latencies) during initial discrimination learning, which accounted for a substantial proportion of the variance in accuracy in the outcome-devaluation test. Also, cocaine-dependent participants verbally reported less knowledge of the relationships between discriminative stimuli, responses and outcome stimuli in the task. The implication is that cocaine-dependent participants’ impaired performance in the outcome-devaluation test was due to general cognitive impairment or task disengagement (Potvin, Stavro, Rizkallah, & Pelletier, 2014), rather than a specific propensity for habit learning.

Habit theory was further weakened by cocaine-dependent and control participants showing comparable outcome-devaluation performance in the aversive procedure (Ersche et al., 2016). In the aversive procedure, discriminative stimuli were presented which signalled that the left or right wrist would be imminently shocked unless the foot pedal on the corresponding side was pressed to cancel the shock (cocaine-dependent participants again showed poorer discrimination accuracy). In the outcome-devaluation test, one wrist was disconnected from the shock generator and participants were told that wrist would not be shocked—the implication being that there was no need to press the foot pedal on the same side as the disconnected wrist to cancel signalled shock. The groups equally reduced foot pedal responses corresponding to the disconnected wrist, suggesting they were able to integrate the instructions into decision-making, i.e. are equally goal-directed, contradicting habit theory.

In sum, seven outcome-devaluation studies have shown that propensity to habit is not more pronounced in drug users versus controls, or as a function of dependence severity, whereas two studies have claimed evidence for weaker goal-directed knowledge in alcohol and cocaine-dependent participants. This gives a ratio of 7:2 studies against habit theory. Furthermore, the two studies which claim evidence for weaker goal-directed knowledge in drug users can be criticised for inadequately assessing the nature of users’ knowledge (De Houwer et al., 2018), and may be explained by general cognitive deficits or task disengagement. The human outcome-devaluation task has provided little or no evidence for habit theory of dependence.

Two-Stage Task in Human Drug Users

The two-stage task is another procedure used to quantify the balance between goal-directed and habitual control in humans (see the Habit Research in Action box). The measure of goal-directed versus habitual control in the two-stage task correlates with sensitivity to outcome-devaluation, suggesting these tasks assess a common capacity (Gillan, Otto, Phelps, & Daw, 2015), but it remains unclear to what extent poor performance also reflects general cognitive impairments or task disengagement.

There are currently eight studies (reported in seven papers) which have used the two-stage task to compare drug users versus controls, or examine variation across dependence severity. One study found that in a large general sample obtained through online testing, greater self-reported alcohol use disorder severity was weakly associated with reduced goal-directed (model-based) control (Gillan, Kosinski, Whelan, Phelps, & Daw, 2016). Another study found a weak but significant reduction in goal-directed control in alcohol-dependent patients versus control participants in a one-tailed test (Sebold et al., 2014), but this difference was abolished when a group difference in cognitive speed was controlled. The third study found that methamphetamine-dependent participants were less goal-directed than control participants (Voon et al., 2015). However, alcohol-dependent participants reported in the same paper (the fourth study) had comparable goal-directed capacity to control participants. The fifth study found no association between goal-directed control and binge drinking severity in 18-year-old social drinkers (Nebe et al., 2017). The sixth study found no relationship between goal-directed control and frequency of alcohol consumption in a general sample of young adults (Deserno et al., 2015). The seventh study found no reduction in goal-directed control in children of alcoholic fathers compared to control participants (Reiter, Deserno, Wilbertz, Heinze, & Schlagenhauf, 2016). Finally, the eighth study found no reduction in goal-directed control in alcohol-dependent participants compared to healthy controls (Sebold et al., 2017). In sum, the two-stage task has yielded five studies against habit theory and three studies in favour (although one favourable effect was abolished when cognitive speed was controlled), giving a ratio of 5:3 against versus for habit theory (at best). It remains unclear to what extent the other positive effects were due to general cognitive impairment or task disengagement.

Interpreting Human Evidence for Habit in Addiction: The Role of Explicit Contingency Knowledge

Human studies using the outcome-devaluation and two-stage tasks have collectively yielded 12 negative studies showing no greater propensity for habit in drug users or as a function of dependence severity, and five positive studies which have reported such effects, i.e. a ratio of 12:5 negative to positive findings. A key question is whether there is any obvious distinction between positive and negative studies which accounts for their differential findings. It cannot be claimed that negative studies all used concurrent choice procedures militating against habit (see section “Criticisms of Animal Outcome-Devaluation Studies”), because the five positive studies also used choice designs. It also cannot be claimed that positive studies all used clinical samples whereas negative studies used sub-clinical samples, because there are four negative studies with clinical samples (Ersche et al., 2016; Hogarth, Lam-Cassettari, et al., 2018; Voon et al., 2015) and one positive study with a general online sample (Gillan et al., 2016).
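As a quick arithmetic check on the tallies above, the following Python sketch sums the counts stated in the text for each task. The dictionary keys and variable names are illustrative only, and the simple vote count treats every study as equally weighted, which is an assumption of the sketch rather than a claim made in this chapter.

```python
# Study counts as stated in the text: (negative, positive) with respect to habit theory.
study_counts = {
    "outcome-devaluation": (7, 2),
    "two-stage": (5, 3),
}

# Sum the negative and positive counts across the two tasks.
negative_total = sum(neg for neg, _ in study_counts.values())
positive_total = sum(pos for _, pos in study_counts.values())

# Proportion of all studies that found no greater propensity to habit.
negative_share = negative_total / (negative_total + positive_total)

print(f"negative:positive = {negative_total}:{positive_total}")
print(f"share of negative studies = {negative_share:.2f}")
```

The totals reproduce the 12:5 ratio given in the text; a vote count of this kind says nothing about effect sizes or study quality.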

The explanation offered here is that in tasks where drug users acquire explicit contingency knowledge, they also show goal-directed/model-based control, whereas in tasks where drug users do not acquire explicit contingency knowledge, they show a general deficit in task performance which is misinterpreted as evidence for a propensity to habit/model-free learning. It is notable that accurate contingency knowledge was acquired by drug users in the seven outcome-devaluation studies which reported intact goal-directed control in drug users (Ersche et al., 2016; Hogarth, 2012; Hogarth et al., 2012; Hogarth & Chase, 2011; Hogarth, Lam-Cassettari, et al., 2018). In these studies, the contingencies were often simple, sometimes a small number of contingency unaware participants were excluded, and sometimes training continued until contingency knowledge was acquired. By contrast, of the two outcome-devaluation studies which claimed evidence for habit learning in drug users, one did not publish data on contingency knowledge (Sjoerds et al., 2013), and the other reported impaired explicit contingency knowledge (Ersche et al., 2016). Comparison of the two tasks reported in this final study is particularly telling (Ersche et al., 2016). In the aversive learning outcome-revaluation task where cocaine-dependent participants showed goal-directed control, they also showed accurate contingency knowledge. To quote: “All participants demonstrated intact awareness about the task contingences (100% accuracy in both groups)” (page 7 of the supplementary material). By contrast, in the appetitive outcome-devaluation task where the cocaine-dependent group (CUD) showed impaired goal-directed control, they also had impaired contingency knowledge. To quote: “Compared with control volunteers, CUD demonstrated significant deficits in explicit knowledge in terms of stimulus-outcome (mean: U = 985, p < 0.001), response–outcome (U = 1250, p = 0.007) and stimulus-response (U = 1023, p < 0.001) relationships” (page 6 of supplementary material). Finally, all of the two-stage studies gave participants explicit instructions about the transitional structure and reward probabilities operating in the task, but knowledge of these contingencies was not assessed at the end of the procedure, and so the relationship between contingency knowledge and model-based performance remains unknown. In conclusion, where it is possible to assess their co-occurrence, goal-directed control and explicit knowledge of task contingencies do co-occur. The implication is that the excessive habit/model-free learning in drug users shown in a small number of studies is probably due to an impairment in explicit contingency knowledge, arising from a general cognitive deficit or weaker motivation to engage in the task (Potvin et al., 2014; Stavro et al., 2013). This impairment produces a general deficit in task performance (Hogarth & Duka, 2006; Mitchell, De Houwer, & Lovibond, 2009), which is misinterpreted as evidence for a specific propensity to habit learning.

Excessive Goal-Directed Drug-Seeking as an Alternative to Habit Theory

In contrast to the weak evidence for habit theory, there is substantial evidence that dependence is underpinned by excessive goal-directed drug-seeking. First, dependence severity in both sub-clinical and clinical drug users is reliably associated with greater economic demand for drugs, that is, willingness to work or pay for drugs (Bruner & Johnson, 2014; Chase, MacKillop, & Hogarth, 2013; Gray & MacKillop, 2014; MacKillop et al., 2008, 2010; MacKillop & Murphy, 2007; MacKillop & Tidey, 2011; Murphy & MacKillop, 2006; Murphy, MacKillop, Skidmore, & Pederson, 2009; Murphy, MacKillop, Tidey, Brazil, & Colby, 2011; Petry, 2001). Furthermore, economic demand for drugs prospectively predicts relapse following a cessation attempt, consistent with a causal role (MacKillop, 2016; MacKillop & Murphy, 2007; Murphy et al., 2006, 2015; Murphy, Correia, Colby, & Vuchinich, 2005). Similarly, in animals, economic demand for cocaine prospectively predicts persistent responding in extinction, cued- and drug-induced reinstatement (relapse), and insensitivity to shock punishment of self-administration (Bentzley et al., 2014). Thus, dependence is associated with greater value ascribed to the drug.

Studies measuring concurrent choice between drugs and natural rewards suggest that dependence is mediated by excessive goal-directed drug-seeking. Specifically, dependence severity in both sub-clinical and clinical drug users is reliably associated with preferential choice of the drug (Chase et al., 2013; Hardy & Hogarth, 2017; Hardy, Mitchell, Seabrooke, & Hogarth, 2017; Hogarth, 2012; Hogarth & Chase, 2011, 2012; Hogarth & Hardy, 2018; Hogarth, Hardy, Mathew, & Hitsman, 2018; Miele et al., 2018; Moeller et al., 2009, 2013). Choice of the drug in concurrent choice designs is also demonstrably goal-directed, as shown by sensitivity to devaluation in an extinction test (Hogarth, 2012; Hogarth et al., 2015; Hogarth & Chase, 2011; Hogarth, Field, & Rose, 2013). In rats, preferential concurrent choice of drugs is associated with a greater number of neurons in the orbitofrontal cortex (OFC) that activate in preparation for that choice (Guillem & Ahmed, 2017; Guillem, Brenot, Durand, & Ahmed, 2018), and the OFC is known to play a role in encoding goal-directed outcome values in humans (Valentin, Dickinson, & O’Doherty, 2007; see: Balleine & O’Doherty, 2010; Mannella, Mirolli, & Baldassarre, 2016). Together, these data suggest that dependence is mediated by the ascription of greater value to the drug, which drives excessive goal-directed drug-seeking.

Excessive incentive learning may further promote goal-directed drug-seeking in drug users with comorbid psychiatric illness. Specifically, goal-directed drug choice is reliably increased by aversive states of withdrawal (Hogarth, Mathew, & Hitsman, 2017; Hutcheson et al., 2001) and acute negative mood (Hardy & Hogarth, 2017; Hogarth et al., 2015; Hogarth & Hardy, 2018). Individuals with depression symptoms and those who report using drugs to cope with negative affect are more sensitive to the motivational impact of withdrawal and negative mood-induced priming of goal-directed drug choice (Fucito & Juliano, 2009; Hogarth et al., 2017; Hogarth & Hardy, 2018; Hogarth, Hardy, et al., 2018). Furthermore, depression (Crum et al., 2008) and drinking to cope with negative affect (Crum et al., 2013) are both excellent prospective markers for the development of dependence. The implication of the foregoing data is that dependence is mediated by excessive goal-directed drug choice, combined with comorbid psychiatric states that confer increased sensitivity to the motivational effects of adverse states, which promote further goal-directed drug choice via incentive learning (Hogarth et al., 2015; Hogarth & Hardy, 2018; Mathew, Hogarth, Leventhal, Cook, & Hitsman, 2017). Certainly, the evidence for excessive goal-directed drug-seeking driving dependence is more compelling than the evidence for habit theory.

Implications for Treatment

The studies reviewed in this chapter have indicated that human drug dependence is not reliably associated with propensity to habit, but is reliably associated with excessive goal-directed drug choice and sensitivity to adverse state triggers of goal-directed choice. If drug-seeking in dependent individuals is not a habit, then treatments designed to target habits, for example, implementation intentions (Webb, Sheeran, & Luszczynska, 2009) or avoidance training (Eberl et al., 2013), may ultimately be less effective (see also Chap. 16 in this volume). By contrast, if drug-seeking in dependent individuals is a goal-directed choice driven by the expected value of the drug, then treatments should seek to: (a) Decrease the value of the drug, for example, by health education (Kleinot & Rogers, 1982), mood/stress management (Bradizza et al., 2017; Pettinati, O’Brien, & Dundon, 2013), drug replacement medication (Mariani, Khantzian, & Levin, 2014; Stead, Perera, Mant, & Lancaster, 2008); (b) Increase the costs associated with the drug, for example, by taxation or minimum price policies (Chaloupka, Grossman, & Saffer, 2002) or prohibition (MacCoun & Reuter, 2011); (c) Increase the value of competing alternative rewards, for example, by contingency management (Higgins, Heil, & Lussier, 2004; Regier & Redish, 2015), behavioural activation (Ross et al., 2016) or community-reinforcement (Meyers, Roozen, & Smith, 2011); and (d) Decrease the costs associated with the alternative competing rewards, for example, by prescription access to exercise facilities (Sanchez, Bully, Martinez, & Grandes, 2015) or funding access to work (Silverman et al., 2007). Ideally, treatments should target the value and costs ascribed to both drugs and natural rewards simultaneously, to maximise the impact on goal-directed drug choice.

Conclusion

Animal studies have suggested that drug-seeking and natural reward-seeking are more prone to habitual control following chronic drug exposure. However, these effects are restricted to single choice situations where there is no direct experience of the devalued outcome (extinction), so these effects are unlikely to operate in complex human decision-making environments (Heather, 2017). Twelve human outcome-devaluation and two-stage studies have shown that habit learning is not more pronounced in drug users, or as a function of dependence severity, whereas five studies have reported these effects. These five human studies favouring habit theory may be trivially explained by general cognitive deficits/task disengagement giving rise to weaker explicit contingency knowledge and hence poorer general task performance. By contrast, there is compelling evidence that human drug dependence is driven by excessive goal-directed drug-seeking and that psychiatric comorbidity confers greater sensitivity to acute adverse states triggering further goal-directed drug-seeking through incentive learning. Treatments should focus on the decision-making processes involved in weighing the relative value and costs of drugs versus competing natural goals.

Habit Research in Action

The outcome-devaluation task: In the outcome-devaluation task, subjects learn that two responses (R1 and R2) earn different rewarding outcomes (O1 and O2). One outcome is then devalued by pairing it with sickness or consumption to satiety. Finally, choice between the two responses is tested in extinction (no rewards are provided). A reduction in choice of the response that earned the devalued outcome must be mediated by an expectation of the current low value of that outcome (i.e. must be goal-directed). The test is conducted in extinction because if the response produced the devalued outcome, the propensity to make this response could be reduced by a weakening of the habitual stimulus-response association controlling the response.
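The logic of the extinction test can be illustrated with a minimal Python sketch. It assumes two idealised agents: a goal-directed agent that scores each response by the current value of the outcome it is known to produce, and a habitual agent that scores each response by a fixed stimulus-response strength. The softmax choice rule, the inverse temperature of 3.0, the outcome values and all function names are hypothetical choices made for illustration; none is taken from the studies reviewed.

```python
import math


def softmax_p_r1(score_r1, score_r2, inverse_temp=3.0):
    """Probability of choosing R1 over R2 under a softmax rule (assumed choice rule)."""
    e1 = math.exp(inverse_temp * score_r1)
    e2 = math.exp(inverse_temp * score_r2)
    return e1 / (e1 + e2)


# Response-outcome knowledge acquired in training: R1 earns O1, R2 earns O2.
RESPONSE_OUTCOME = {"R1": "O1", "R2": "O2"}

# S-R strengths stamped in by equal amounts of reinforced training (assumed values).
HABIT_STRENGTH = {"R1": 1.0, "R2": 1.0}


def goal_directed_p_r1(outcome_value):
    """Choice is driven by the current value of the outcome each response produces."""
    scores = {r: outcome_value[o] for r, o in RESPONSE_OUTCOME.items()}
    return softmax_p_r1(scores["R1"], scores["R2"])


def habitual_p_r1():
    """Choice is driven by S-R strength alone; outcome values are never consulted."""
    return softmax_p_r1(HABIT_STRENGTH["R1"], HABIT_STRENGTH["R2"])


value_before = {"O1": 1.0, "O2": 1.0}
value_after = {"O1": 0.0, "O2": 1.0}  # O1 devalued by sickness or satiety

gd_before = goal_directed_p_r1(value_before)
gd_after = goal_directed_p_r1(value_after)
hab_before = habitual_p_r1()
# The test is in extinction, so no outcome is delivered and S-R strengths are unchanged.
hab_after = habitual_p_r1()

print(f"goal-directed P(R1): {gd_before:.2f} -> {gd_after:.2f}")
print(f"habitual      P(R1): {hab_before:.2f} -> {hab_after:.2f}")
```

In this sketch only the goal-directed agent reduces its choice of R1 after O1 is devalued, which is the behavioural signature the extinction test is designed to detect.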

The two-stage task: At the start of each trial, the same first-stage pair of stimuli is always presented. When participants select one stimulus from this first-stage pair, a ‘common’ second-stage stimulus pair is produced on 70% of occasions, whereas a different ‘rare’ second-stage pair is produced on 30% of occasions. By contrast, if the other stimulus from the first-stage pair is selected, the common and rare second-stage pairs are reversed. Participants are informed about the transitional structure between choices made between stimuli at the first stage, and the production of the rare and common second-stage pairs. Upon production of a second-stage pair, participants select one stimulus, and this yields monetary reward with a probability between 0.25 and 0.75, which varies slowly over trials, and independently of the other second-stage stimuli. Thus, on any given trial, participants can maximise payoff by learning that selection of a first-stage stimulus reliably produces the common and rare second-stage pairs with different probabilities, from which the second-stage stimulus can be selected that is most likely to pay off based on recent experience of which second-stage stimuli are being rewarded. Participants who are goal-directed (‘model-based’) can be distinguished from those who are habitual (‘model-free’) on the basis of their choices following trials in which a rare transition was rewarded. Specifically, on trials where choice of a first-stage stimulus produced the rare second-stage pair, and a stimulus selected from this second-stage pair was reinforced, participants face an interesting conundrum when choosing a first-stage stimulus in the next trial. If participants are goal-directed (model-based), they will switch to the first-stage stimulus they did not choose previously, to give a 70% chance of producing the same second-stage pair as the previous trial, and thereby select the second-stage stimulus that was just rewarded. By contrast, if participants are habitual (model-free), they will choose the same first-stage stimulus as they chose on the previous trial because that previous trial was reinforced, even though this choice gives only a 30% chance of producing the same second-stage pair as the previous trial, from which the previously rewarded stimulus could be selected. The task therefore measures whether participants are using knowledge of the rare and common transitional structure of the task to chase the second-stage stimuli that are currently paying off.
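The contrast between the two strategies after a rewarded trial can be sketched in a few lines of Python. The sketch assumes idealised, deterministic agents and considers rewarded trials only; the stimulus labels (`A`, `B`, `pair1`, `pair2`) and function names are hypothetical, and published analyses typically estimate a graded mixture of the two strategies rather than a strict dichotomy.

```python
COMMON_P = 0.7  # probability of the common transition, as described in the text

# Hypothetical labels: first-stage stimulus -> second-stage pair it commonly produces.
COMMON_PAIR = {"A": "pair1", "B": "pair2"}


def p_reach(first_choice, pair):
    """Probability that a first-stage choice produces the given second-stage pair."""
    return COMMON_P if COMMON_PAIR[first_choice] == pair else 1.0 - COMMON_P


def model_free_next(prev_choice):
    """After a rewarded trial, repeat the rewarded first-stage choice (transitions ignored)."""
    return prev_choice


def model_based_next(prev_pair):
    """After a rewarded trial, pick the first-stage stimulus most likely to reach the same pair."""
    return max(COMMON_PAIR, key=lambda stim: p_reach(stim, prev_pair))


# Rewarded trial with a rare transition: chose A but arrived at pair2.
rare = {"choice": "A", "pair": "pair2"}
mf_rare = model_free_next(rare["choice"])
mb_rare = model_based_next(rare["pair"])

# Rewarded trial with a common transition: chose A and arrived at pair1.
common = {"choice": "A", "pair": "pair1"}
mf_common = model_free_next(common["choice"])
mb_common = model_based_next(common["pair"])

print("rare, rewarded:   model-free ->", mf_rare, "| model-based ->", mb_rare)
print("common, rewarded: model-free ->", mf_common, "| model-based ->", mb_common)
```

Under these assumptions the two agents make the same first-stage choice after a rewarded common transition and diverge only after a rewarded rare transition, which is why the latter trials are diagnostic.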