1 Introduction

Imagine that you ritualistically purchase your morning coffee from the same place every day, but one day, you have a bad experience and the coffee is subpar. What will you do? Will you continue to habitually follow your routine or will you take your business elsewhere? It seems like a relatively simple computation, but it is remarkable to think about how many different brain signals come into play in situations like these. At the very least, you need to break the habit and be motivated to pursue new goals. This involves detection of errors and allocation of attention so that new associations can be formed. Subsequently, you must consider potential options and their economic value, as well as the probability of achieving those options. In this chapter, we examine neural correlates of these functions in animals performing tasks during which past and potential experiences—both positive and negative—modify behavior.

Over the last two decades, the number of brain areas in which activity has been shown to be modulated by rewards and by cues that predict reward has escalated dramatically. In fact, all the brain areas illustrated in Fig. 1a contain neurons that increase firing to cues that predict reward and to rewards themselves (and even this is not an all-inclusive list). This raises an important question: Are all these brain areas encoding the exact same information, or are they subserving different functional aspects of reward processing? Recent work has tried to tease apart these functions to determine what exactly is being encoded by nodes in this circuit. Below, we describe single-neuron recordings in behaving animals that have addressed this issue.

Fig. 1

a Circuit diagram demonstrating connectivity between brain regions involved in reward-guided decision-making. Arrows represent direction of information flow, where single-headed arrows are unidirectional and double-headed arrows are reciprocal. Words in each shaded box specify functions, and the shading of the box gives a general idea of how strongly the nearby anatomical regions contribute to those functions. b Interplay of functions related to reward-guided decision-making. Orbitofrontal cortex OFC, dorsal-lateral prefrontal cortex PFC, basolateral amygdala ABL, anterior cingulate cortex ACC, parietal cortex Parietal, premotor cortex PM, nucleus accumbens NA, dorsal-medial striatum DMS, dorsolateral striatum DLS, ventral tegmental area VTA, dopamine DA, substantia nigra compacta SNc, globus pallidus GP, thalamus Thal, substantia nigra reticulata SNr, prediction error PE. Adapted from Bissonette et al. (2014) and Burton et al. (2014)

This chapter is broken down into four sections. Each section describes experiments that have tried to parse “reward-related” neural activity into the functions illustrated in Fig. 1b. This figure tries to encapsulate all possible functions that go into making simple decisions based on anticipated reward and punishment. It is clear that this is an extremely complex computation! Several of the arrows linking proposed functions are bidirectional, illustrating that most mechanisms involved are highly interrelated and influence each other. When examining neural correlates of these functions, great care must be taken when linking firing patterns of single units to functions related to decision-making.

In the first section, we examine studies that distinguish value from other signals that covary with value, such as motivation and salience. Learned value can be defined as the relative anticipated worth, either positive or negative, that some cue predicts. Motivation is the enhancement or decrement of motor output based on an increased or decreased level of arousal. More specifically, a stimulus is “salient” if it leads to a general increase in arousal or is attention grabbing, whereas something is “motivational” if it enhances motor behaviors. Notably, the dissociation between the latter two is much more difficult to study and requires further investigation; however, signals directly related to motivated motor output should be observed in the period leading up to the behavioral response, whereas signals related to attention or salience might be present solely during the presentation of cues (i.e., salient events), but not necessarily during the actions associated with them.

In the second section, we examine neurons that increase firing to the delivery of unexpected outcomes, which is critical for reporting surprising events so that learning can occur. Several brain areas respond to reward delivery, but few specifically report errors in reward prediction. To uncover these correlates, experimental paradigms must violate expectations in both positive (outcome better than expected) and negative (outcome worse than expected) directions. This allows researchers to dissociate signed from unsigned prediction error signals. Signed prediction error signals change the associative strength of conditioned stimuli that predict anticipated value, whereas unsigned prediction error signals increase attention so that learning can occur.

All of these signals (value, motivation, prediction errors, attention, etc.) must modulate systems that guide behavior. The striatum is thought to be one major interface that integrates this information with motor output. In the third section, we will describe studies that demonstrate a basic trend along the diagonal of striatum (ventral-medial to dorsal-lateral) by which correlates are more reward-related in ventral-medial regions, whereas correlates are more associative and motor-related in more dorsal-lateral sections. Finally, in the fourth section, we discuss where all of these “reward-related” signals might be integrated into a common signal.

2 Value Versus Motivation and Salience

Neurons that increase firing prior to a desirable reward might be encoding value, or they might reflect the increased motivation or salience associated with receiving this reward. It is not trivial to dissociate value from motivation or salience because they covary in most situations; that is, the more you value something, the more motivated you are to work to obtain it (Solomon and Corbit 1974; Daw et al. 2002; Lang and Davis 2006; Phelps et al. 2006; Anderson 2005; Anderson et al. 2011). The literature has extensively examined neural systems involved in making decisions based on potential outcomes, both good and bad, but whether these signals reflect value or motivation/salience is still not entirely clear. As described above, we define motivation as a process that invigorates motor responding and salience as the property of cues that are attention grabbing or arousing. The two are intertwined; salient cues might lead to increased motivation, and both can be triggered by cues that predict valued reward and induce faster responding. Importantly, neural correlates related to value can be dissociated from motivation and salience by examining firing to aversive stimuli, which have low value but are highly salient and motivating (Fig. 2a). A classic example is that of the carrot and the stick. A donkey finds both salient and motivating, but the carrot has high value, whereas the stick has low value.

Fig. 2

a Tasks that dissociate value from motivation/salience correlates by varying appetitive and aversive outcomes. In these tasks, one trial type promises a large reward with little punishment (app = appetitive); another promises a small reward with little punishment (neu = neutral); and a third promises the small reward, but threatens the animal with a large punishment (aver = aversive). Both primates and rats performing tasks like these prefer large reward and dislike large punishment (aver) relative to neutral, but are highly motivated by both, as indicated by faster reaction times and better performance. Thus, theoretically, if neurons in the brain participate in value computations, then they should fire highest for appetitive trial types and lowest for aversive trial types (“value”). If neurons participate in signals that reflect motivation/salience, their activity should be high for both appetitive and aversive trial types. b Monkey study that dissociated value from motivation. Accuracy and RT data from primates, illustrating that monkeys were faster and more accurate on large reward and punishment trials than on neutral trials. c Neural recordings in primate OFC demonstrate higher activity for large over small reward cues, whereas activity in premotor cortex (PM) (d) reflected the level of motivation associated with those outcomes. e–g Behavioral data from rats performing a similar task. Rats licked more for large, appetitive outcomes and less on punishing trials (e), and were more accurate (f) and faster (g) for large rewards and potentially punishing trials compared to neutral trials. h Neural recordings in nucleus accumbens show higher activity for high-valued cues than for lower-valued cues in one neural population, while other NAc neurons (i) showed salience signals for both high-valued reward and possible punishment trial types. Firing rates were normalized by subtracting the baseline and dividing by the standard deviation. Ribbons represent standard error of the mean (SEM). Gray dashed, aversive (aver); black, appetitive (app); gray solid, neutral (neu). Adapted from Bissonette et al. (2013)

By manipulating both anticipated appetitive and aversive events, experimental procedures can dissociate value from motivation and salience. That is, cues that predict appetitive and aversive outcomes have opposite values, but both are highly salient and motivational. Several studies have done just that, dissociating these signals by motivating behavior with both the promise of reward and the threat of punishment. In these experiments, animals learn that conditioned stimuli (CS) predict potential rewards or the possibility of punishment. Typically, there are three trial types in which the CS predicts: (1) a large reward (e.g., sucrose, juice); (2) a neutral condition or a small (or no) reward; and (3) a small (or no) reward with the threat of an aversive outcome, such as delivery of a bitter quinine solution, electric shock, or air puff to the eye (Rolls et al. 1989; Roesch and Olson 2004; Roesch et al. 2010a; Bissonette et al. 2013; Matsumoto and Hikosaka 2009; Brischoux et al. 2009; Anstrom et al. 2009; Lammel et al. 2011). If neurons encode value, neural activity should vary monotonically across appetitive, neutral, and aversive trials (Fig. 2a; theoretical neural signals). If activity is modulated by factors like motivation or salience that vary with the strength of appetitive and aversive stimuli, neurons should respond with the same “sign” on appetitive and aversive trials relative to neutral trials. This approach has been applied to several brain areas thought to represent value in one form or another.
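
To make this classification logic concrete, below is a minimal sketch (our illustration, not analysis code from the cited studies) of how a neuron’s cue-evoked firing, z-scored against a pre-cue baseline as in Fig. 2, might be labeled value-like or motivation/salience-like. The baseline rates, example responses, and criterion threshold are all hypothetical.

```python
import numpy as np

def normalize(rate, baseline):
    """Z-score a firing rate against a pre-cue baseline epoch
    (subtract the baseline mean, divide by its SD), as in Fig. 2."""
    return (rate - baseline.mean()) / baseline.std()

def classify_cue_response(app, neu, aver, criterion=1.0):
    """Classify a neuron's normalized responses on app/neu/aver trials.

    Value-like coding:        monotonic, app > neu > aver.
    Motivation/salience-like: V-shaped, both app and aver exceed neu.
    'criterion' is an arbitrary effect-size cutoff for illustration;
    the actual studies used trial-wise statistics, not a fixed threshold.
    """
    if app - neu > criterion and neu - aver > criterion:
        return "value"
    if app - neu > criterion and aver - neu > criterion:
        return "motivation/salience"
    return "unclassified"

baseline = np.array([2.1, 1.8, 2.4, 2.0, 1.9])  # hypothetical pre-cue rates (spikes/s)

# Hypothetical cue-epoch rates on app/neu/aver trials for two example neurons:
n1 = classify_cue_response(normalize(9.5, baseline),
                           normalize(2.2, baseline),
                           normalize(1.0, baseline))
n2 = classify_cue_response(normalize(9.0, baseline),
                           normalize(2.2, baseline),
                           normalize(8.5, baseline))
print(n1, n2)  # -> value motivation/salience
```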

2.1 Orbitofrontal Cortex (OFC)

Most evidence prior to the work described here suggested that activity in OFC reflects value. OFC is strongly connected with the limbic system and has proven to be a key associative structure (Pickens et al. 2003; Burke et al. 2008; Ostlund and Balleine 2007; Izquierdo et al. 2004). Neurons in OFC respond to cues that predict differently valued rewards, as well as to the anticipation and delivery of rewards themselves. Activity in OFC is modulated by a number of economic factors (e.g., probability, effort, delay, and size) for both appetitive and aversive stimuli. Its firing is also influenced by the availability of alternative rewards (i.e., relative reward value) and by how satiated the animal is. Still, all these neural representations might reflect how salient or motivating the predicted reward is, not its value.

To disentangle the two, Roesch and Olson recorded neural activity in OFC during performance of a task in which monkeys responded to cues that indicated the size of the reward they would receive if correct (one or three drops of juice) and the size of the penalty that would be incurred upon failure (a 1 s or 8 s time-out). Figure 2a illustrates the three trial types. The first promises three drops of liquid reward with no threat of punishment. This trial type has high value and high motivation, as demonstrated by high accuracy and fast reaction times. The second trial type (neutral) has low value and motivation because it only promises a small reward with no risk of punishment. The critical trial type is the third one. It also promises only a small reward, but the threat of punishment is high; thus, it has low value but is highly motivational, as evidenced by high accuracy and fast reaction times similar to those observed on high-reward trials (Fig. 2b). Thus, choice rates and response latencies showed that monkeys were more motivated by large rewards and penalties than by smaller/neutral outcomes, allowing for the dissociation of value and motivation via simultaneous manipulation of appetitive and aversive outcomes (Roesch and Olson 2004).

In this study, OFC neural activity was found to encode the value associated with cues, as opposed to their motivational properties. That is, OFC neurons fired most strongly for cues that predicted large reward and least strongly for cues that predicted large penalty, relative to neutral conditions (Fig. 2c). This was in stark contrast to neurons in an area of cortex more strongly associated with motor output, the premotor cortex, which fired at a higher rate during preparation to move both to achieve the large reward and to avoid the large penalty—i.e., they appeared to encode a factor related to the level of motivation to respond (Fig. 2d). Other studies have replicated these results in OFC and have further shown populations of neurons that encode potential reward offers, the identity of specific rewards expected or obtained, and the option that is eventually chosen during performance of choice tasks (Hosokawa et al. 2007; Morrison et al. 2011; Padoa-Schioppa and Assad 2006, 2008).

From these and other studies in primates, it has been suggested that OFC encodes an abstract “common currency” (Padoa-Schioppa and Assad 2006; Padoa-Schioppa and Cai 2011; Rudebeck and Murray 2014); however, rodent studies have emphasized that OFC can also encode specific outcomes, representing the sensory qualities of available outcomes and potential maps of task/environmental space (Wilson et al. 2014; Schoenbaum et al. 2011a; Schoenbaum and Eichenbaum 1995). Some primate studies also support this notion, showing that OFC can convey sensory and informational aspects of reward in addition to their hedonic properties (Wallis and Miller 2003; Wallis et al. 2001; Tsujimoto et al. 2011, 2012). Nevertheless, all of these studies agree that OFC is critical for signaling predictions about future outcomes in the service of reward-guided decision-making.

2.2 Nucleus Accumbens

Next, we turn our discussion to the nucleus accumbens core (NAc), which receives strong glutamatergic projections from OFC pyramidal neurons. NAc has been described as the “critic” in actor-critic models of reinforcement learning, which cast NAc as a generator of value predictions that are subsequently used by dopamine (DA) neurons to compute the prediction errors necessary for updating action policies in the “actor” (e.g., dorsal striatum) (Redish 2004; Joel et al. 2002; van der Meer and Redish 2011; Padoa-Schioppa 2011; Barto 1995; Niv and Schoenbaum 2008; Sutton and Barto 1998; Takahashi et al. 2008; Houk et al. 1995; Haber et al. 2000; Ikemoto 2007). In addition to this proposed role, NAc has traditionally been described as the “limbic-motor interface,” motivating behaviors in response to both appetitive and aversive stimuli rather than representing value per se. Consistent with both of these theories, pharmacological manipulations of NAc impact motivated behaviors dependent on value expectations during a variety of tasks (Cardinal et al. 2002a, b; Berridge and Robinson 1998; Di Chiara 2002; Ikemoto and Panksepp 1999; Salamone and Correa 2002; Di Ciano et al. 2001; Wadenberg et al. 1990; Wakabayashi et al. 2004; Yun et al. 2004; Gruber et al. 2009; Stopper and Floresco 2011; Ghods-Sharifi and Floresco 2010; Floresco et al. 2008; Blokland 1998; Giertler et al. 2003), including reward seeking (Ikemoto and Panksepp 1999), cost-benefit analysis (Stopper and Floresco 2011; Floresco et al. 2008), and delay/effort discounting (Ghods-Sharifi and Floresco 2010; Cardinal et al. 2001). Furthermore, single-unit recordings have clearly demonstrated that neural activity in NAc is modulated by the value associated with cues that predict reward in rats (Setlow et al. 2003; Janak et al. 2004; Carelli and Deadwyler 1994; Day et al. 2011; Ito and Doya 2009; Goldstein et al. 2012; Nicola et al. 2004; van der Meer et al. 2010; van der Meer and Redish 2009; Lansink et al. 2010; Kalenscher et al. 2010) and monkeys (Cromwell et al. 2005; Shidara and Richmond 2004; Schultz et al. 1992; Kim et al. 2009; Nakamura et al. 2012) performing a variety of instrumental tasks, including go/no-go (Setlow et al. 2003; Schultz et al. 1992), lever pressing (Janak et al. 2004; Carelli and Deadwyler 1994; Day et al. 2011; Cromwell et al. 2005; Shidara and Richmond 2004), discrimination (Goldstein et al. 2012; Nicola et al. 2004; van der Meer et al. 2010), maze running (van der Meer and Redish 2009; Lansink et al. 2010; Kalenscher et al. 2010), and eye movement paradigms (Kim et al. 2009; Nakamura et al. 2012).
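
To unpack this actor-critic framing, below is a minimal toy sketch (a standard textbook construction under our own assumptions, not the specific cited models): a NAc-like critic learns state-value predictions, a DA-like term carries the prediction error, and a dorsal-striatum-like actor uses that same error to update its action propensities. The two-state contingency and learning rates are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
V = np.zeros(n_states)                   # critic: value prediction per state ("NAc-like")
prefs = np.zeros((n_states, n_actions))  # actor: action propensities ("dorsal-striatum-like")
alpha_critic, alpha_actor = 0.1, 0.1

def reward(state, action):
    # Toy contingency: action 0 pays off in state 0, action 1 in state 1.
    return 1.0 if action == state else 0.0

for trial in range(1000):
    s = rng.integers(n_states)
    p = np.exp(prefs[s]) / np.exp(prefs[s]).sum()  # actor: softmax over propensities
    a = rng.choice(n_actions, p=p)
    r = reward(s, a)
    delta = r - V[s]                    # DA-like prediction error: actual minus predicted
    V[s] += alpha_critic * delta        # critic update
    prefs[s, a] += alpha_actor * delta  # actor update, gated by the same error

print(np.round(V, 2))         # value predictions approach the obtainable reward
print(prefs.argmax(axis=1))   # learned policy: action 0 in state 0, action 1 in state 1
```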

Thus, the basic finding across many of these studies is that single-unit activity in NAc is modulated by cues that predict reward after an instrumental response is performed. As above, this activity might reflect value or motivation. Therefore, application of a paradigm similar to the one described above allowed for an exploration of these different potential roles. In a study by Bissonette et al., rats performed a task in which they were motivated by both reward (sucrose solution) and threat of punishment (quinine delivery). Rats found sucrose and quinine appetitive and aversive, respectively, as illustrated by increased and decreased licking (Fig. 2e), but were most strongly motivated by cues that predicted either a large possible sucrose reward or a possible quinine punishment, as illustrated by better accuracy and faster reaction times for these cues compared to a neutral cue (Neu) that only predicted a small sucrose reward, similar to the neutral cue in the aforementioned primate study (Fig. 2f, g). In addition, the cues that predicted potential reward and punishment (odor cues) were presented before the stimuli (lights) that instructed the instrumental response, in order to dissociate value signals from specific motor planning.

Interestingly, separate populations of single neurons in NAc encoded either value or motivation prior to the response instruction (i.e., light). This result suggests that NAc might represent the value of the goal the rat is working for as well as the motivational level associated with differently signed outcomes. Activity of some neurons was stronger for conditioned stimuli that predicted large reward and weaker on punishment trials relative to neutral trials (i.e., value representation; Fig. 2h). Other NAc neurons fired strongly for both cues that predicted reward and cues that predicted punishment (i.e., motivation; Fig. 2i). These results suggest that NAc fulfills both motivational and evaluative functions via separate neuronal populations and might be critical for integrating both types of information as the “limbic-motor interface,” as well as the “critic” (Bissonette et al. 2013). Thus, NAc appears to be a common junction point for concurrently signaling value and motivation, leading to the invigoration of particular behavioral actions over others. This idea is consistent with pharmacological studies suggesting that disruption of DA signaling in the NAc impairs the ability to modify behavior based on the current value of predicted outcomes (Burton et al. 2013; Singh et al. 2010) and interferes with behavioral measures of motivation or salience (Nunes et al. 2013; Salamone 1994; Salamone and Correa 2012; Salamone et al. 1991, 2012; Koch et al. 2000; Berridge 2007; Lex and Hauber 2010; Salamone 1986; McCullough and Salamone 1992).

2.3 Parietal Cortex

Finally, the most recent brain area to be scrutinized in this way, by dissociating value from salience, is the parietal cortex. Parietal neural activity depends on the value of expected actions (action-value) (Platt and Glimcher 1999; Sugrue et al. 2004; Louie and Glimcher 2010) and is thought to be critical for making economic decisions (Louie and Glimcher 2010; Gold and Shadlen 2007; Sugrue et al. 2005; Rangel and Hare 2010). Others suggest that this signal reflects increased salience induced by the promise of a better reward, not the value associated with it. For example, Leathers and Olson reported that primate lateral intraparietal (LIP) neurons fire most strongly not only when a behavior is associated with a large versus small reward but, more importantly, also to cues that predict a large versus small penalty (Leathers and Olson 2012). They argued that this signal reflected increased salience because activity was high for both large reward and large penalty. They also suggested that it reflected salience (as opposed to motivation) because it did not span the delay between the cue and the behavioral response. This study has sparked considerable debate between influential leaders in the field (Leathers and Olson 2012, 2013; Newsome et al. 2013). Further work is necessary to determine whether salience versus value encoding in parietal cortex is task- or procedure-specific. One possible middle ground in this debate might be that neurons in LIP, like NAc, encode both properties, and that one function is emphasized over the other depending on context.

3 Signed Prediction Error Versus Attention/Salience

The above discussion focused on neural firing leading up to a decision and on whether it reflects the value of potential goals or the motivation associated with obtaining those goals. Here, we consider what happens after the predicted outcome is or is not delivered, and whether it was better or worse than expected. These signals may be used to update the previously described “value” and “motivation” signals when contingencies change.

Many brain areas increase activity in response to unexpected delivery of reward. Most commonly, these signals are interpreted as “reward prediction signals.” However, this activity might also reflect other functions, such as changes in attention that result from the delivery of unexpected reward. Note that this activity cannot represent motivation, because motivation usually increases with training as one learns to expect the more valued reward. Furthermore, signals related to motivation are maintained as long as the animal is not satiated and/or is actively pursuing reward. The signals that we describe in this section attenuate with learning, as rewards become anticipated (i.e., no longer unexpected).

We will discuss two types of prediction errors: signed and unsigned. Signed prediction error (PE) signals strengthen or weaken associations in downstream brain areas during learning and adaptive decision-making. In the coffee example, if the coffee is bad, positive associations with the coffee shop must be attenuated (negative PE) so that it is no longer sought after, but if it is really good, these associations must be strengthened (positive PE) to promote coffee-seeking at that specific coffee shop. In contrast, unsigned prediction errors modulate attention. In the example, attention increases both when the coffee is really great and when it is very bad. Increased attention is necessary to determine what in the environment caused the deviation from reward expectations (e.g., a new coffee brand or barista). Attention and prediction error signals likely work together in an intricate manner that has not been fully characterized in the literature. Prediction errors are necessary to increase attention, and attention is needed to detect and learn from prediction errors. How these two interact in the brain is an interesting question, but as a first step, we must dissociate these two highly interrelated functions. Below, we describe studies that accomplish this by manipulating both positive and negative events in an unpredictable manner.
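
In the simplest formal terms (our notation, not drawn from any one of the models discussed below), if V_t is the value predicted on trial t and r_t is the outcome actually received, the two error signals are:

```latex
\delta_t = r_t - V_t   % signed PE: updates associative strength
|\delta_t|             % unsigned PE: drives attention
```

A better-than-expected outcome makes \delta_t positive, a worse-than-expected outcome makes it negative, and both make |\delta_t| large.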

3.1 Midbrain Dopamine (DA) Neurons

Many midbrain dopamine (DA) neurons signal signed PEs that are essential for reinforcement learning (Steinberg et al. 2013). Phasic bursting of DA neurons intensifies when an outcome occurs that is “better” than predicted (positive PE), and firing of DA neurons decreases or is inhibited when an outcome is “worse” than anticipated (negative PE). When rewards are accurately predicted, outcomes elicit no change in DA neural activity (Schultz 1997, 1999). Over the course of learning, phasic DA bursting “shifts” from occurring at the time of reward delivery to the time of the reward-predicting cues that precede outcomes. Dopaminergic reward prediction error signals are commonly referred to as “teaching signals,” updating decision circuits about changes in contingencies so that behaviors can be modified when expectations are violated (Montague et al. 1996; Schultz 1998; Bromberg-Martin et al. 2010a).
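
This temporal “shift” emerges naturally from temporal-difference (TD) models of DA firing in the spirit of Montague et al. (1996). Below is a minimal sketch; the trial structure, time indices, and parameters are ours for illustration, and the one assumption beyond standard TD(0) is that no learnable prediction exists before cue onset, making the cue the earliest predictor.

```python
import numpy as np

T, CUE, REW = 12, 3, 9        # toy trial: 12 time steps, cue at t=3, reward at t=9
alpha, gamma = 0.2, 1.0
w = np.zeros(T)               # per-time-step value weights (held at zero before the cue)

def value(t):
    return w[t] if t >= CUE else 0.0   # no prediction exists before cue onset

def run_trial():
    """One TD(0) pass through the trial; returns the PE at each time step."""
    delta = np.zeros(T)
    for t in range(1, T):
        r = 1.0 if t == REW else 0.0
        # Error signaled at time t: outcome plus new prediction minus old prediction.
        delta[t] = r + gamma * value(t) - value(t - 1)
        if t - 1 >= CUE:               # only cue-driven predictions are learnable
            w[t - 1] += alpha * delta[t]
    return delta

early = run_trial()                    # first trial: reward is unexpected
for _ in range(300):
    late = run_trial()                 # after learning: the cue predicts the reward

print(np.round(early[[CUE, REW]], 2))  # [0. 1.]  -> PE fires at reward delivery
print(np.round(late[[CUE, REW]], 2))   # [1. 0.]  -> PE has shifted to the cue
```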

Although PE signaling is frequently studied in the context of animals learning to approach cues that elicit an appetitive outcome, positive prediction errors appear to be elicited in any situation that is construed as being better than expected [e.g., short delay to reward, low effort to achieve reward, high probability (Roesch et al. 2010b)]. Positive PE signals are even induced when predicted negative events do not occur. This has been observed in primates, where DA neurons that increase firing to unexpected rewards also increase firing when an expected air puff is not delivered (Bromberg-Martin et al. 2010a), and in rats that do not receive an anticipated foot shock during an avoidance procedure (Oleson et al. 2012).

The literature has recently turned its focus to how these DAergic prediction error signals may be derived. In its simplest form, the reward prediction error computation requires two pieces of information: the reward prediction and the reward that was actually received (actual minus predicted). As mentioned above, computational models refer to structures that signal predictions as the “critic”; these structures calculate expected value based on the summed values of environmental cues. Structures proposed to fill this role include the nucleus accumbens, prefrontal cortex, and/or amygdala (O’Doherty et al. 2004; Belova et al. 2008; Balleine et al. 2008; Daw et al. 2005). Indeed, several studies have shown that information from the orbitofrontal and prefrontal cortex is necessary for expectancy-related changes in phasic firing of midbrain dopamine neurons (Takahashi et al. 2009, 2011; Jo et al. 2013).

Others have demonstrated that an error signal also occurs upstream of DA neurons. The lateral habenula (LHb) (Bromberg-Martin et al. 2010a; Hikosaka 2010; Matsumoto and Hikosaka 2007), which is thought to receive error information from the globus pallidus border (GPb) (Hong and Hikosaka 2008), also signals signed reward prediction errors, but with the opposite sign: activity of these neurons is excited by negative and inhibited by positive prediction errors. Furthermore, prediction error signaling in LHb occurs earlier than that in DA neurons, and stimulation of LHb inhibits DA firing (Matsumoto and Hikosaka 2007; Bromberg-Martin et al. 2010b). DA neurons are likely to receive this information via the adjacent rostromedial tegmental nucleus (RMTg) and indirect connections through midbrain GABA neurons in the ventral tegmental area (VTA) (Hong et al. 2011; Ji and Shepard 2007; Omelchenko et al. 2009; Jhou et al. 2009; Kaufling et al. 2009; Brinschwitz et al. 2010). Together, these results suggest that DA neurons receive widespread information regarding both reward predictions and prediction errors themselves.

Signed prediction errors are not the only information conveyed by midbrain DA neurons. Some midbrain DA neurons also signal motivational salience and are excited by both rewarding and aversive events. This signal is thought to support systems for orienting, cognitive processing, and motivational drive (Matsumoto and Hikosaka 2009; Bromberg-Martin et al. 2010a). These neurons are distributed along a gradient, being most prominent in the dorsolateral substantia nigra pars compacta (SNc) and tapering toward the VTA (Matsumoto and Hikosaka 2009). These salience-signaling DA neurons show increased activity for cues that signal either positive or negative potential events, such as foot shock and air puff (Brischoux et al. 2009; Anstrom et al. 2009; Lammel et al. 2011). PE- or value-related signals, in contrast, are more commonly found in the ventromedial SNc and the lateral VTA (Matsumoto and Hikosaka 2009). Interestingly, DA neurons in these different areas of the midbrain project to different downstream regions. DA neurons in the salience-predominant regions project preferentially to areas of the prefrontal cortex such as dorsolateral prefrontal cortex (DLPFC) and to the core of the nucleus accumbens, while DA neurons in the value- and PE-predominant regions project mainly to ventromedial PFC and to the shell of the nucleus accumbens (Bromberg-Martin et al. 2010a). In this way, DA signaling is able both to modify representations of expected outcomes and to signal the need for motivation and attention to salient features of the environment when contingencies are violated (Bromberg-Martin et al. 2010a).

3.2 Basolateral Amygdala (ABL)

Several reports now suggest that, like midbrain DA neurons, ABL also signals reward prediction errors. Traditionally, ABL was thought to serve many of the same functions as OFC, consistent with their reciprocal connections (Bechara et al. 1999; Malkova et al. 1997; Kesner and Williams 1995; Hatfield et al. 1996; Cousens and Otto 2003; Parkinson et al. 2001; Jones and Mishkin 1972; Winstanley et al. 2004; Churchwell et al. 2009; Ghods-Sharifi et al. 2009; Cardinal 2006). Studies examining the encoding of both appetitive and aversive signals in OFC and ABL have shown that the two structures depend on each other for normal encoding when rats discriminate stimuli that predict appetitive (sucrose) and aversive (quinine) outcomes during reversal learning (Roesch et al. 2007a, 2010a, b; Haney et al. 2010; Saddoris et al. 2005; Schoenbaum 2004; Schoenbaum et al. 1998, 1999, 2000, 2003a, 2006, 2007, 2009, 2011a; Schoenbaum and Esber 2010; Schoenbaum and Roesch 2005; Stalnaker et al. 2007; Rudebeck and Murray 2008, 2011, 2014; Rudebeck et al. 2013a, b).

Although amygdala is important for signaling expected outcomes by acquiring and storing associative information related to both appetitive and aversive outcomes (LeDoux 2000; Murray 2007; Gallagher 2000; Schoenbaum et al. 2003b; Ambroggi et al. 2008), it also supports other functions related to associative learning, such as signaling attention to salient cues, uncertainty about outcome probability, and the intensity of stimuli (Morrison et al. 2011; Saddoris et al. 2005; Tye and Janak 2007; Tye et al. 2010; Belova et al. 2007). These studies have shown that activity in ABL is modulated by the predictability of both appetitive and aversive events, specifically when predictions are violated (Roesch et al. 2010a, b; Tye et al. 2010; Belova et al. 2007). During events that are highly salient and attention grabbing, such as when an outcome is unexpectedly delivered or when an expected outcome is omitted, ABL neurons increase firing (Roesch et al. 2010a). In rats, it has been reported that activity in ABL increases when an expected reward is not delivered during extinction (Tye et al. 2010). Modulation of neural activity in the lateral and basal nuclei of amygdala by expectation has also been described during fear conditioning in rats (Johansen et al. 2010). In primates, unexpected delivery of appetitive or aversive (air puff) outcomes caused amygdala neurons to fire more strongly than when those outcomes were completely predictable (Belova et al. 2007). Additionally, the same populations of ABL neurons that represent appetitive stimuli were also activated by aversive stimuli, regardless of the particular sensory modality through which the experience arrives (Shabel and Janak 2009). These studies suggest a critical role for ABL in multiplexing signals related to attention, which may be modulated by salience or intensity, as well as signaling the associated value of cues and outcomes.

This hypothesis is consistent with data showing that outcome-encoding ABL neurons exhibit increased firing both when rewards are unexpectedly delivered and when they are unexpectedly omitted, in a task in which reward expectations were violated by varying reward size and the time to reward delivery (Roesch et al. 2010a). Recall that these correlates differ from what we have described for DA neurons, which increase and decrease firing during unexpected delivery and omission of reward, respectively. In this experiment, unexpected up-shifts in value occurred whenever a reward was larger than expected or arrived earlier than anticipated (Fig. 3a). On the other hand, down-shifts in value occurred whenever reward was unexpectedly smaller or was delayed (Fig. 3b). In this study, neurons in ABL tended to exhibit higher firing for both up- and down-shifts, whereas DA neurons exhibited increases and decreases in firing, respectively.

Fig. 3

Neural activity in VTA and ABL in response to unexpected reward delivery and omission is consistent with the Rescorla–Wagner and Pearce–Hall attention models, respectively. a Example trial types representing up-shifts in value, with an unexpected increase in reward quantity (left) or an unexpected decrease in wait time for reward (right). Deflections reflect the time of events. Heavy black lines and fluid drops reflect unexpected reward delivery (i.e., up-shift). In this task, odors predicted short (0.5 s) or long (1–7 s) delays to reward during “delay” blocks. In “size” blocks, odors predicted large (2 boli) or small (1 bolus) reward. b Example trial types representing down-shifts in value, with an unexpected decrease in reward quantity or increase in wait time for reward. Deflections reflect the time of events. Dashed gray lines and fluid drops reflect unexpected reward omission (i.e., down-shift). c, d Signals predicted by the Rescorla–Wagner (c) and Pearce–Hall (d) models after unexpected delivery (black) and omission (gray) of reward. e, f Average firing during the 500 ms after reward delivery for dopamine neurons in VTA (e) and for ABL (f) during the first ten trials of blocks in which the value of the delivered reward was unexpectedly higher (up-shifts = black) or unexpectedly lower (down-shifts = gray), normalized to the maximum firing rate. Error bars indicate SEMs. Adapted from Roesch et al. (2010a)

Two common models for interpreting neural signals relating expected and actual outcomes are the Rescorla–Wagner (R-W) model (Rescorla and Wagner 1972) (Fig. 3c) and the Pearce–Hall (PH) model (Pearce and Hall 1980) (Fig. 3d). The R-W model uses errors to drive the change in associative strength: larger errors cause larger changes in associative strength, smaller errors drive smaller changes, and if no error occurs, there is no change in associative strength. Since these error values are computed from expected versus actual outcomes, the sign of the error may be positive (outcome is better than expected) or negative (outcome is worse than expected), as observed in the firing of DA neurons (Fig. 3e). The alternative model presented by PH uses the absolute (unsigned) value of the prediction error to determine the amount of attention allocated on subsequent trials. A large prediction error warrants a large increase in attention, as observed in the firing of ABL neurons (Fig. 3f), whereas a small prediction error elicits a correspondingly smaller change in attention. Notably, this signal takes several trials to develop because attention on previous trials must be taken into account. Importantly, both models involve changes that are proportional to the size of the prediction error, but because the R-W model uses signed PEs, the evolution of the PE signal over learning is opposite for better-than-expected versus worse-than-expected trials (Fig. 3c). In contrast, in the PH model, the changes in signal strength over training are equivalent for both types of trials (Fig. 3d). Note that although there is strong evidence that ABL signals are unsigned, consistent with PH, there is also some evidence for a population of ABL neurons that instead encode signed prediction errors (Belova et al. 2007; Klavir et al. 2013).
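
To make the contrast between the two models concrete, below is a small simulation sketch of the standard update rules (textbook forms; the initial value, learning rate, and associability decay parameter are arbitrary choices of ours). R-W updates associative strength V with the signed error, so the error trace flips sign between up- and down-shifts; PH updates an associability term from the unsigned error of preceding trials, so the trace is elevated for both shift types and declines as outcomes become expected.

```python
import numpy as np

N_TRIALS, LR, GAMMA = 10, 0.3, 0.5   # arbitrary illustrative parameters
LAM_UP, LAM_DOWN = 1.0, 0.0          # outcome value after an up-/down-shift
V0 = 0.5                             # value expected before the shift

def rescorla_wagner(lam):
    """Signed PE drives learning; its sign tracks the shift direction (Fig. 3c)."""
    V, trace = V0, []
    for _ in range(N_TRIALS):
        delta = lam - V              # signed prediction error
        V += LR * delta              # associative-strength update
        trace.append(delta)
    return np.round(trace, 2)

def pearce_hall(lam, a0=0.2):
    """Unsigned PE drives attention; associability rises for either shift (Fig. 3d)."""
    V, a, trace = V0, a0, []
    for _ in range(N_TRIALS):
        delta = lam - V
        a = GAMMA * abs(delta) + (1 - GAMMA) * a   # attention/associability update
        V += LR * a * delta          # learning rate is gated by associability
        trace.append(round(a, 2))
    return trace

print(rescorla_wagner(LAM_UP))    # positive errors, decaying toward zero
print(rescorla_wagner(LAM_DOWN))  # negative errors, decaying toward zero
print(pearce_hall(LAM_UP))        # identical elevated-then-declining traces
print(pearce_hall(LAM_DOWN))      #   for BOTH up- and down-shifts
```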

3.3 Anterior Cingulate (ACC)

One brain area that likely receives and transmits prediction error-related information to ABL is the ACC (Klavir et al. 2013). ACC has strong reciprocal connections with ABL (Dziewiatkowski et al. 1998; Cassell and Wright 1986; Sripanidkulchai et al. 1984) and has been shown to be involved in a number of functions related to error processing, conflict monitoring, behavioral feedback, and attention (Totah et al. 2009; Walton et al. 2004; Rushworth et al. 2004, 2007; Wallis and Kennerley 2010; Rudebeck et al. 2008; Rushworth and Behrens 2008; Hillman and Bilkey 2010; Matsumoto et al. 2007; Amiez et al. 2005, 2006; Sallet et al. 2007; Hayden et al. 2011; Quilodran et al. 2008; Kennerley et al. 2006, 2009; Kennerley and Wallis 2009; Ito et al. 2003; Carter et al. 1998; Holroyd and Coles 2002; Oliveira et al. 2007; Paus 2001; Scheffers and Coles 2000; Rothe et al. 2011; Magno et al. 2006). During tasks that manipulate the predictability of reward, ACC neurons signal unsigned prediction errors (Hayden et al. 2011; Bryden et al. 2011a), consistent with the Pearce–Hall theory described above. Further, it has been shown that activity in ACC was elevated at the beginning of behavioral trials after reward contingencies changed unexpectedly and that these changes in firing were correlated with behavioral measures of attention (Bryden et al. 2011a). This is different from what is typically described in ABL, where modulation after violations in reward prediction induces attention-related changes at the time of reward delivery, not during subsequent presentations of cues. While these studies are consistent with a role for ACC in allocating attention when prediction errors occur, other primate studies have suggested that ACC can also signal prediction errors of the signed variety (Klavir et al. 2013; Matsumoto et al. 2007).

4 Correlates of Motivation and Associative Encoding in Striatum

In the above sections, we dissociated value from motivation and salience and defined signals specifically related to signed prediction errors, which increase and decrease associative strength, and unsigned prediction errors, which increase attention so that learning can occur. How do these signals impact behavior? One conduit is the striatum; however, even in striatum, signals are not simply integrated and transformed into motor output. Striatum is a structure with many overlapping neural correlates related to reward-guided and stimulus-driven behaviors. Neurons there have been shown to be responsive to cues that predict reward, during the initiation of actions, during the anticipation of reward, and at the delivery of reward (Yin and Knowlton 2006). Although there are many overlapping correlates, there appears to be a basic trend along the diagonal of striatum (ventral-medial to dorsal-lateral) by which correlates are more reward-related in ventral-medial regions, whereas correlates are more associative and motor-related in more dorsal-lateral regions.

This has been demonstrated by neural recordings in primates that have divided the caudate (one of three main regions of the primate striatum, of which the dorsal-medial striatum in rats is the closest anatomical homologue) into three subdivisions (dorsal, central, and ventral) (Nakamura et al. 2012), based on anatomical connections with cortical and subcortical structures as reported by Haber and Knutson (2010) and in accordance with tripartite subdivisions observed in humans (Fig. 4b) (Karachi et al. 2002). In this study, monkeys performed a task in which they fixated on a central spot and then responded to a target to the left or the right of fixation. During different blocks of trials, one of these targets produced a large reward and the other a small reward (Fig. 4a).

Fig. 4

Neural correlates across primate striatum. a Visually guided saccade task with an asymmetric reward schedule. After the monkey fixated on the FP (fixation point) for 1200 ms, the FP disappeared and a target cue appeared immediately on either the left or right, to which the monkey made a saccade to receive a liquid reward. The dotted circles indicate the direction of gaze. In a block of 20–28 trials (e.g., left-big block), one target position (e.g., left) was associated with a big reward and the other position (e.g., right) was associated with a small reward. The position–reward contingency was then reversed (e.g., right-big block). b Subdivisions of the primate striatum. c Percentage of neurons that showed large-reward preference for each subdivision of striatum. Modified from Nakamura et al. (2012), Roesch et al. (2009), Burton et al. (2014), Roesch and Bryden (2011)

As expected from the anatomy, the functional segregation observed as the recording electrode passed from ventral-medial to dorsal-lateral striatum reflected the progression of limbic to associative to sensorimotor afferents (Haber and Knutson 2010; Lau and Glimcher 2007, 2008; Samejima et al. 2005; Stalnaker et al. 2010). Recordings obtained more ventral-medially demonstrated that activity was driven by the expected size of the reward, with fewer representations related to the action necessary to achieve it. Ventral-medial portions of the caudate, including the central and ventral parts, tended to fire more strongly for large versus small rewards than dorsal caudate (Fig. 4c), and fewer neurons were modulated by the direction of the behavioral response compared to dorsal caudate. This is in contrast to dorsal-lateral portions of the caudate, which exhibited strong direction selectivity and reward modulation but showed no preference for large over small reward. This study suggests a continuum of correlates, with ventral-medial aspects of striatum reflecting value and more dorsal-lateral regions better reflecting associative and motor aspects of behavioral output.

Similar results have been obtained in studies examining differences across the ventral-medial/dorsal-lateral divide in monkeys (Hollerman et al. 1998; Apicella et al. 1991; Cai et al. 2011) and in rats (van der Meer et al. 2010; Takahashi et al. 2007; Wang et al. 2013). For example, in a task where odor cues predicted differently valued rewards and the direction necessary to achieve those rewards (Fig. 5a), the majority of NAc neurons fired significantly more strongly for cues that predicted high-value outcomes for actions made in one particular direction (into the cell’s response field) (Roesch et al. 2009; Burton et al. 2015). Further, faster response times (a behavioral measure of more motivated behavior) were correlated with higher firing rates. These data suggest that activity in NAc represents the motivational value associated with chosen actions, necessary for translating cue-evoked value signals into motivated behavior as discussed above (Bissonette et al. 2013; Roesch et al. 2009; McGinty et al. 2013; Catanese and van der Meer 2013). These findings are consistent with those described in Sect. 2, showing that separate NAc neurons encode value and motivation. To characterize neurons, we performed an ANOVA with value (high or low) and direction (contralateral or ipsilateral to the recording site) as factors on activity collected during the decision period (cue onset to response). In NAc, roughly equal proportions of neurons were selective for contralateral and ipsilateral response directions; however, of those selective for value, the large majority fired more strongly for high-value reward (Fig. 5b; NAc).
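
For readers interested in the mechanics of this characterization, below is a minimal sketch of a two-way ANOVA of this kind using statsmodels; the trial table, column names, and simulated “NAc-like” effect sizes are hypothetical stand-ins, not the published analysis code.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)
n = 200  # hypothetical number of trials for one neuron

# Trial table: trial value and response direction for each trial.
df = pd.DataFrame({
    "value": rng.choice(["hi", "lo"], n),
    "direction": rng.choice(["contra", "ipsi"], n),
})
# Simulated NAc-like neuron: a main effect of value, no effect of direction.
df["rate"] = 5 + 3 * (df["value"] == "hi") + rng.normal(0, 1, n)

# Two-way ANOVA on decision-period firing with value, direction,
# and their interaction as factors.
model = ols("rate ~ C(value) * C(direction)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```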

Fig. 5

Neural correlates across rat NAc and DLS. a Odor-guided choice task during which the delay to, and size of, reward were independently varied in ~60-trial blocks (i.e., “blocks 1–4”). Upon illumination of house lights, rats started the trial by poking into the central port. After 500 ms, an odor signaled the trial type. For odors 1 and 2, rats had to go to a left or right fluid well to receive reward (forced-choice trials). A third odor signaled that the rat was free to choose either well to receive the reward that was associated with that response direction during the given block of trials. Across blocks 1–4, the length of the delay to reward (blocks 1 and 2) and the size of reward (blocks 3 and 4) were manipulated: short delay = 0.5 s wait before delivery of 1 bolus reward; long delay = 1–7 s wait before 1 bolus reward; big reward = 0.5 s wait for 2–3 boli reward; small reward = 0.5 s wait before 1 bolus reward. Throughout each recording session, each of these trial types was associated with both directions and all three odors, allowing us to examine different associative correlates. b Locations of recording sites in rat NAc, DMS, and DLS and percentages of significantly modulated neurons. More NAc neurons encoded high-valued options (NAc, black bar). Representations of outcome were evenly distributed in DLS, while the number of neurons encoding response direction was significantly elevated (contralateral, gray bar). Chi-square tests were used to compare counts of neurons. Hi high value; Lo low value; Con contralateral to recording site; Ipsi ipsilateral to recording site. c, d Example of a single neuron recorded in DLS for each of the trial types for forced- (c) and free- (d) choice odors. Modified from Nakamura et al. (2012), Roesch et al. (2009), Burton et al. (2014), Roesch and Bryden (2011)

In contrast to NAc, encoding in dorsolateral striatum (DLS) was remarkably associative, representing all aspects of the task: outcome type, response direction, and the specific identity of conditioned stimuli (Burton et al. 2014, 2015). For example, the neuron in Fig. 5c responded most strongly to odor cues that predicted a short delay after movement in the contralateral direction. Many DLS neurons were selective for very specific combinations of task events related to future outcomes, responses, and the stimuli that preceded them; however, there was not a preponderance of neurons that showed increased firing for any one combination over another, as was observed in NAc. Characterizing single neurons in the same way as for NAc, we see a preponderance of neurons showing a preference for contralateral movements, but of those that were value selective, equal proportions preferred high- and low-value reward (Fig. 5b; DLS). These results suggest that neural correlates in NAc are more closely tied to the motivational level associated with choosing high-value goals, whereas correlates in DLS are more associative, representing expected outcomes and response directions across a range of stimuli.

Although NAc and DLS are not directly connected, more ventral-medial regions like NAc likely impact more dorsal-lateral areas of striatum via “spiraling” connectivity with dopamine (DA) neurons (Joel et al. 2002; van der Meer and Redish 2011; Niv and Schoenbaum 2008; Takahashi et al. 2008; Houk et al. 1995; Haber et al. 2000; Ikemoto 2007). This circuitry allows propagation of information from limbic networks to associative and sensorimotor networks so that behaviors can become more stimulus driven and therefore more efficient (Haber et al. 2000; Ikemoto 2007; Haber and Knutson 2010; Haber 2003; Balleine and O’Doherty 2010). This connectivity is consistent with ideas that behavior is first goal-directed and then becomes habitual with extended training (for more information on habitual action mechanisms, please see the chapter by O’Doherty in this volume). However, several studies have now shown that NAc and DLS can function independently of each other, especially after learning. That is, these functions do not always seem to work in series, but can operate in parallel. For example, we have found that NAc lesions do not impair task-related correlates in DLS, but instead enhance neural correlates related to stimulus and response processing (Burton et al. 2013). These results suggest that rats normally use goal-directed mechanisms to perform the task, but rely more heavily on stimulus–response (S-R) processes after the loss of NAc (Lichtenberg et al. 2013). Consistent with this interpretation, rats recovered function after the initial disruption observed after NAc lesions. Similar results have been described after loss of DLS function: rats recovered function after the initial impairment produced by DLS interference (Nishizawa et al. 2012). This recovery was abolished by subsequent NAc lesions, suggesting that it was NAc that was guiding behavior in DLS’s absence (Nishizawa et al. 2012). Together, these results suggest that NAc and DLS both guide behavior during normal performance of reward-related tasks and that the two regions can compensate for each other if the need arises. Note that this is not to suggest that they do not normally interact, particularly during learning or when reward contingencies change.

5 Integration of Positive and Negative Information into a Common Output Signal

How independent representations of positive and negative valences are converted into a unified representation of expected outcome value, which ultimately leads to motivated behavior, is still unclear. A number of brain areas are thought to represent abstract value in the service of making comparisons, allowing the brain to compare apples to oranges (or coffee to tea). Under choice paradigms, Lee and colleagues have found integrative encoding of value in several brain regions (dorsal striatum, ventral striatum, and DLPFC) using an intertemporal choice task (Cai et al. 2011; Kim et al. 2008). Other studies have clearly shown neural activity reflecting value in several frontal areas of primate cortex (Kim et al. 2008; Roesch and Olson 2005a, b). In primates, several prefrontal regions contain neurons that integrate multiple economic factors, including cost, size, delay, and probability (Padoa-Schioppa and Assad 2006; Padoa-Schioppa and Cai 2011; Rich and Wallis 2014; Wallis and Kennerley 2011; Kennerley et al. 2011; Wallis 2007; Padoa-Schioppa 2007). However, others have reported that distinct prefrontal areas encode rewards and punishments, with ventral and dorsal aspects being more active for appetitive and aversive trial types, respectively (Monosov and Hikosaka 2012). It is difficult to find one clear path by which predicted outcomes lead to motivated behavior. This is likely because there are multiple paths that work in parallel, interact at different phases of behavior, and are highly dependent on context. For an in-depth discussion of valuation systems in the brain in the context of decision-making processes, see the chapter by Redish et al. in this volume. It has also been difficult to clearly separate value signals from signals that might represent attention, salience, and/or motivation. That being said, there is a trend by which brain areas more closely tied to attention and motor networks appear to better reflect the integration of economic functions.

For example, both ACC and parietal cortex are thought to be critical for functions related to attention and motor planning, but they have also been described as encoding value in the service of making economic decisions. The fact that value and salience signals have been observed in these regions might reflect the need for increased attentional control when potential rewards are available. Increases in neural firing that depend on the value of expected actions would help ensure that neural processes are prioritized according to expected events (whether positive or negative). Interestingly, value predictions and spatial attention have been shown to be integrated in clusters of neurons in primate prefrontal and parietal cortex (Kaping et al. 2011). Further research will be needed to fully appreciate how specific nodes contribute to the process of transforming value signals into executive control signals.

Out of all the rat brain areas that we have recorded from in our laboratory, only the firing of DA neurons exhibited a strong correlation between encoding of the delay to reward and the size of reward, reflecting a common representation of value. DA neurons fired more strongly to cues that predicted high reward, whether that meant a short delay or a large-sized reward, and were inhibited by cues that predicted low reward, whether that meant a long delay or a small-sized reward. Importantly, these were the same neurons that encoded reward prediction errors during unexpected reward delivery and omission. This is interesting, considering that dopaminergic input is thought to build associations regarding value in downstream brain areas; yet in our rat studies, very few brain areas appear to compute this common-value function. That is, we found correlates related to delay length and reward magnitude in multiple brain areas, including OFC, ABL, ACC, DMS, and DLS, but in all of these areas the two factors maintained dissociable activity patterns, signaled by different neurons. Even in NAc, where the population of neurons fired more strongly for cues that predicted shorter delays to reward and larger-magnitude rewards, and where activity was correlated with motivated output (i.e., reaction time), there was only a weak trend for single neurons to represent both size and delay components. Although some neurons did encode both factors at the single-cell level, many more neurons represented one economic variable but not the other. This weak trend toward common encoding likely reflects the conversion of expected outcome information into appropriate motor signals at the level of NAc (i.e., the limbic-motor interface). Consistent with this hypothesis, when looking even further downstream, activity in the substantia nigra pars reticulata (SNr) does appear to reflect a common evaluation of goals, which likely reflects its role as a motor output structure rather than a reporter of economic value (Bryden et al. 2011b). Even within SNr, however, significant correlations between delay and size encoding were relatively weak, suggesting that we might have to move even deeper into the motor system to find a common output signal for similarly valued outcomes.
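
One plausible way to quantify this kind of “common value” coding, sketched below with simulated data (the index definition, array names, and effect sizes are our hypothetical choices, not the published analysis), is to compute, for each neuron, a size-preference index and a delay-preference index, and then correlate the two across the population. A DA-like population with a shared value signal yields a strong positive correlation; a population in which size and delay are carried by different neurons yields a correlation near zero.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_neurons = 80

def pref_index(high, low):
    """Contrast index in [-1, 1]; positive = stronger firing for the
    higher-value option (big reward, or short delay)."""
    return (high - low) / (high + low)

# Hypothetical mean firing rates per neuron in the four block types,
# driven by a single latent value sensitivity (a DA-like population).
value_sens = rng.normal(0, 1, n_neurons)
big   = 10 + 2 * value_sens + rng.normal(0, 1, n_neurons)
small = 10 - 2 * value_sens + rng.normal(0, 1, n_neurons)
short = 10 + 2 * value_sens + rng.normal(0, 1, n_neurons)
long_ = 10 - 2 * value_sens + rng.normal(0, 1, n_neurons)

r, p = stats.pearsonr(pref_index(big, small), pref_index(short, long_))
print(f"size vs. delay index: r = {r:.2f}, p = {p:.2g}")  # strong positive r
```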

6 Conclusion

Conditioned stimuli that predict reward simultaneously set into motion several functions, as illustrated in Fig. 1b. These functions are highly interrelated but can be distinguished through manipulations of reward certainty, positive and negative outcomes, and division into goal- and stimulus-driven behaviors. To a degree, these functions map onto specific brain areas in the reward/decision circuit depicted in Fig. 1a, but they clearly depend on each other, and there is redundancy within the circuit. OFC appears to represent value expectancies necessary for guiding decision-making and learning. These signals are informed by ABL, which represents associative information as well as the intensity or salience of behavioral outcomes. Simultaneously, OFC and ABL broadcast their information to NAc and DA neurons. Prediction error and salience signals generated by DA neurons provide feed-forward information to more dorsal-medial and dorsal-lateral regions in striatum, which are critical for goal-directed and habitual behaviors, respectively. These regions receive specific inputs from cortex consistent with these roles. The medial regions of striatum receive dense projections from the medial prefrontal cortex (mPFC) and the anterior cingulate cortex (ACC). The DLS is densely innervated by sensorimotor cortex (SMC), but also receives projections from mPFC and ACC. Importantly, basal ganglia signals that loop back onto cortex are likely to be critical for modulating representations related to reward-guided decision-making in cortical areas and can drive behavior in their own right. Cortical input from parietal cortex and ACC likely increases attentional control in conjunction with value encoding to ensure that neural processes are prioritized in downstream areas depending on expected actions and errors in reward prediction.

Well-designed behavioral tasks that continue to dissociate these highly related brain functions, combined with techniques that can probe causality, are necessary to further disentangle the roles that neural correlates play in converting positive and negative events into abstract representations of value and motivated behavior, and to elucidate critical relationships between nodes within the circuit. Understanding how these circuits work to produce behavior allows us to look for alterations in neural signals in animal models of human disorders, such as models of psychiatric disorders, drug addiction, and aging (Hernandez et al. 2015; Roesch et al. 2007b, 2012a, b; Gruber et al. 2010; Stalnaker et al. 2009). Future therapeutic approaches (behavioral, psychological, pharmacological, etc.) should focus on normalizing the lost or altered signals observed in these animal models in order to restore the functions that are compromised in psychiatric disorders.