Introduction

Mood disorders, including major depressive disorder (MDD) and bipolar disorder, are associated with reward processing deficits, which contribute to some of the functional impairments that characterize these conditions (American Psychiatric Association 2013; Whitton et al. 2015). Reward processing deficits are often broadly classified as anhedonia, or loss of interest or pleasure, although recent evidence suggests that there may be subtle variations in more distinct reward processes, such as consummatory pleasure, motivation, and reward learning, in different psychiatric disorders (Barch et al. 2016; Gold et al. 2008; Pizzagalli et al. 2008a; Pizzagalli et al. 2008b; Treadway and Zald 2011). Moreover, these different reward processes are mediated by distinct neurobiological mechanisms (Der-Avakian and Markou 2012), which have important implications for treatment of reward deficits across psychiatric disorders.

Reward learning is defined as the ability to learn associations between environmental stimuli and behavioral actions that predict rewarding outcomes, and to subsequently repeat the behavioral actions that maximize the probability of obtaining rewards. Reward learning can be assessed objectively in humans using a probabilistic reward task (PRT) (Pizzagalli et al. 2005; modified after Tripp and Alsop 1999). In this task, individuals attempt to accurately identify which of two ambiguous stimuli (e.g., lines with a minuscule difference in length) is briefly presented in a given trial on a computer screen in order to receive a reward. Unbeknownst to participants, over the course of the test session, correct identification of one stimulus (e.g., the longer of the two lines) is reinforced three times more frequently than correct identification of the other stimulus (e.g., the shorter of the two lines). Under these experimental conditions, healthy individuals reliably develop a response bias, or preference, for the more frequently reinforced stimulus (e.g., they more frequently indicate that they saw the longer of the two lines) regardless of which stimulus was actually presented. Thus, reward learning assessed in the PRT is operationally defined as the development of a response bias toward (i.e., propensity to indicate they had seen) the stimulus that is associated with a history of more frequent reinforcement (called the rich stimulus), and reflects the rapid adaptation of behavioral choices based on prior reinforcement experiences. In contrast, individuals with current MDD (Liu et al. 2011; Pizzagalli et al. 2008b; Vrieze et al. 2013) and past MDD (Pechtel et al. 2013), euthymic subjects with bipolar disorder (Pizzagalli et al. 2008a), chronic smokers after 24 h of nicotine abstinence (Pergadia et al. 2014), and healthy subjects with elevated depressive symptoms (Pizzagalli et al. 2005) fail to develop a response bias for the more frequently reinforced stimulus. That is, despite being reinforced more for correctly identifying one stimulus over the other, these individuals respond with similar accuracy for both stimuli, reflecting impaired reward learning.

Stress can precipitate mood disorders, or the expression of related symptoms, such as anhedonia, in healthy individuals (Berenbaum and Connelly 1993; Charney and Manji 2004; Kendler et al. 1999), as well as deficits in brain reward system function in laboratory animals (Der-Avakian et al. 2014; Donahue et al. 2014). Critically, stress has also been shown to disrupt reward learning in humans as assessed with the PRT and reflected by reductions in the development of response biases directed toward the rich stimulus (Bogdan et al. 2010; Bogdan and Pizzagalli 2006; Bogdan et al. 2011; Pizzagalli et al. 2007). Thus, stress-induced disruption of reward learning may represent a biomarker of MDD and other mood disorders.

We recently designed and validated a preclinical analog of the clinical (human) PRT to assess reward learning in rats (Der-Avakian et al. 2013; Pergadia et al. 2014). As in the human version of the task, healthy rats develop a response bias for the more frequently reinforced of two ambiguous stimuli in the preclinical version of the task. Rats complete 100 trials, the ambiguous stimuli are long- and short-duration tones (auditory stimuli, owing to putative limitations in visual acuity), and the reward is food pellets. In contrast, in the clinical version of the task, participants complete 300 trials, the ambiguous stimuli are long and short lines on a computer screen (visual stimuli), and the reward is money. Indeed, despite some species-specific design features, in both humans and rats, administration of a low dose of the dopamine D2/D3 receptor agonist pramipexole (assumed to decrease striatal dopamine transmission via autoreceptor activation) and withdrawal from chronic nicotine exposure each blunted the development of response biases in a similar manner (Der-Avakian et al. 2013; Pergadia et al. 2014). Given these parallels, we predicted that stress exposure in rats would blunt response biases in the PRT, as previously reported in humans (Bogdan et al. 2010; Bogdan and Pizzagalli 2006; Bogdan et al. 2011; Pizzagalli et al. 2007). To test this hypothesis, we exposed rats to social defeat, a rodent model of psychosocial stress that has been shown to disrupt brain reward system function in rats and mice (Der-Avakian et al. 2014; Donahue et al. 2014), and assessed reward learning in our preclinical analog of the PRT.

Reward learning is thought to rely on interactions between the anterior cingulate cortex (ACC in humans; Cg1 in rodents) and striatum. Specifically, lesion, functional magnetic resonance imaging (fMRI), and electroencephalogram (EEG) studies have highlighted a key role of the ACC in integrating reinforcement over time in order to guide adaptive behavior (Amiez et al. 2006; Bogdan et al. 2011; Ernst et al. 2004; Kennerley et al. 2006; Rushworth et al. 2007; Santesso et al. 2008). Additionally, various key aspects of reinforcement learning have been hypothesized to involve mesolimbic dopamine neurotransmission (Glimcher 2011; Maia and Frank 2011). Consistent with this view, MDD, which is characterized by reward deficits, is associated with disruption of the ACC and mesoaccumbal dopamine circuit (Dunlop and Nemeroff 2007; Lambert et al. 2000; Nestler and Carlezon 2006; Nutt 2006; Whitton et al. 2016). Thus, after stressed and non-stressed rats were tested in the PRT, we examined whether social defeat altered the expression of several stress- and MDD-related genes in Cg1, ventral striatum (i.e., nucleus accumbens (NAc) shell and core), and ventral tegmental area (VTA), a dopamine-rich nucleus that projects to the ACC/Cg1 (Onn and Wang 2005) and NAc (Swanson 1982). We hypothesized that social defeat would alter expression of stress- and MDD-related genes within key nodes implicated in reward learning. Owing to recent work implicating nociceptin and dynorphin systems in motivational states and depressive behaviors (Carlezon and Krystal 2016; Gavioli et al. 2004; Post et al. 2016), our primary analyses focused on them. For comparison, we also examined expression of a broad panel of genes (reflected by mRNA levels) that encode proteins implicated in neuronal function, plasticity, and inflammation.

Materials and methods

Subjects

Forty-eight male Wistar rats (Charles River Laboratories, Raleigh, NC, USA) weighing approximately 300 g at the beginning of the experiment were housed in pairs in standard rat Plexiglas cages with food and water available ad libitum. Upon initiation of behavioral training, the rats were restricted to 32 g of food per day per cage (i.e., approximately 16 g of food per rat) to facilitate responding for food pellets during training and test sessions. In addition, each rat received approximately 4 g of food pellets (Test Diet 5TUM; Richmond, IN, USA) per day during training and test sessions. Body weights were monitored three times per week to verify that rats gained weight throughout the experiment. The rats were maintained in a climate-controlled colony room at 21 °C on a 12-h reverse light/dark cycle (lights off at 06:00); all experiments were conducted during the dark phase in rooms illuminated by red light. Of the 48 rats, 19 were included in a previously published study (Der-Avakian et al. 2013) and had received a single low dose of pramipexole (0.1 mg/kg) and amphetamine (0.5 mg/kg) after training and at least 18 days prior to the test sessions reported here. Thirty-two male and 32 female Long-Evans rats between the ages of 6 and 12 months were used as residents during the social defeat procedure (see below). All procedures were conducted in accordance with the guidelines from the National Institutes of Health and the Association for the Assessment and Accreditation of Laboratory Animal Care and were approved by the Institutional Animal Care and Use Committee.

PRT apparatus

Behavioral training and testing were conducted in operant testing chambers that consisted of two metal retractable levers, a food receptacle located between the levers, and a single speaker positioned above the food receptacle (Med Associates, St. Albans, VT, USA). Tones were generated using a multipurpose sound generator, and all programs and data collection were controlled by a computer that ran MED-PC IV software (Med Associates, St. Albans, VT, USA; see Der-Avakian et al. 2013 for details).

PRT procedure

Tone discrimination training

The training procedure was developed to mirror the instructions presented to human subjects tested with the PRT (see Pizzagalli et al. 2005 for details) and has been described in detail previously (Der-Avakian et al. 2013). Briefly, the rats were trained to discriminate between two tone stimuli that varied in duration (5 kHz, 60 dB, 0.5 or 2 s) by pressing one of the two levers associated with each tone. Tone durations and lever sides were counterbalanced across subjects, and tones were presented in a random order over 100 trials. Each trial was initiated with presentation of a tone, after which the levers were extended, and the rats had a 5-s limited hold period to respond. In each trial, correct identification of the tone stimuli resulted in delivery of a single 45 mg food pellet. Both levers retracted after a correct, incorrect, or omitted response, followed by a variable intertrial interval between 5 and 8 s. The rats were trained daily until they achieved at least 70% accuracy for 5 consecutive days. The training program was then modified to reinforce only 90% of correct responses for 2 consecutive days, followed by 2 days each with 80, 70, and 60% reinforcement for correct responses. It is important to note that in adequately trained rats, accuracy was unaffected by the gradual decrease in reinforcement.

Social defeat

Upon fulfilling the criteria for acquiring the task, the rats were assigned to receive either social defeat or no stress. Male Wistar (trained) rats receiving social defeat (i.e., intruders) were transported to a separate room housing male and female Long-Evans rats (i.e., residents) selected for aggressive behavior. Each intruder was placed inside the resident’s cage (61 × 43 × 20 cm) behind a perforated Plexiglas partition physically separating the rats, with food (restricted) and water (ad libitum) available. At 08:00 h the following day, the resident female Long-Evans rats and partitions were removed to allow physical confrontations between the two male rats (i.e., resident and intruder). Social defeat was defined as the intruder displaying a defensive, supine posture for 3 consecutive seconds. After social defeat or 3 min (whichever occurred first, i.e., all intruders displayed a supine posture, but not all intruders maintained that posture for 3 consecutive seconds), intruders were transferred to and housed within another resident’s cage (separated by a partition) for 24 h until the social defeat procedure was repeated. Intruders were exposed to 3 days of social defeat and were never paired with the same residents twice. Control (i.e., no stress) rats were briefly handled in the vivarium during the 3 days prior to testing.

PRT test

Twenty-four hours after the third social defeat session, all the rats were transported to the laboratory for behavioral testing. During the test session, tone durations that were more ambiguous than the training tones (i.e., 0.9 and 1.6 s) were reinforced for 60 and 20% of correct responses (counterbalanced across subjects) over 100 trials, which is identical to the reinforcement ratio used in the human PRT (Pizzagalli et al. 2005). The more frequently rewarded stimulus is defined as the “rich” stimulus, whereas the less frequently rewarded stimulus is defined as the “lean” stimulus. Ambiguous tones are necessary to allow for the probabilistic reinforcement schedule to shape behavior in naïve, healthy rats (i.e., to correctly identify the rich stimulus and incorrectly identify the lean stimulus, resulting in a response bias toward the rich stimulus).
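The asymmetric schedule described above can be illustrated with a minimal Python sketch. This is not the MED-PC IV program used in the study; the function names, the unbiased simulated responder, and the 65% accuracy value are hypothetical and chosen only for illustration.

```python
import random

# Reinforcement probabilities for CORRECT responses during the test session.
REWARD_PROB = {"rich": 0.60, "lean": 0.20}


def deliver_pellet(stimulus, correct, rng, reward_prob=REWARD_PROB):
    """Return True if a pellet is delivered on this trial.

    Incorrect (or omitted) responses are never reinforced; correct responses
    are reinforced with the probability assigned to the presented stimulus.
    """
    if not correct:
        return False
    return rng.random() < reward_prob[stimulus]


def simulate_session(n_trials=100, p_correct=0.65, seed=0):
    """Simulate one test session for a hypothetical, unbiased responder.

    p_correct is an assumed accuracy, not a value taken from the data.
    Returns the number of pellets earned for each stimulus.
    """
    rng = random.Random(seed)
    rewards = {"rich": 0, "lean": 0}
    for _ in range(n_trials):
        stimulus = rng.choice(["rich", "lean"])  # tones in random order
        correct = rng.random() < p_correct
        if deliver_pellet(stimulus, correct, rng):
            rewards[stimulus] += 1
    return rewards
```

Because pellets are delivered as a percentage of correct responses, the realized rich:lean reward ratio in any single session varies around the nominal 3:1 value, which is why the realized ratio is checked for each rat.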

Quantitative real-time reverse transcriptase polymerase chain reaction (qRT-PCR)

Immediately after testing, the 29 rats that were not part of the previously published study (Der-Avakian et al. 2013) were killed by decapitation, and their brains were removed, placed under dry ice until frozen, and stored at −80 °C. Isolation of brain tissue and qRT-PCR methods are based on previously described methods (Chartoff et al. 2016). Briefly, frozen brains were coronally sectioned on a cryostat (HM 505 E; Microm; Walldorf, Germany) until the following regions were exposed: Cg1 (Bregma 3.72 mm), NAc core (Bregma 2.52 mm), NAc shell (Bregma 2.52 mm), and VTA (Bregma −5.04 mm), based on the atlas of Paxinos and Watson (2007). Bilateral tissue punches 1–1.5 mm in length were taken with a 1-mm-internal-diameter corer (Fine Science Tools; Foster City, CA, USA) and placed in Eppendorf tubes kept on dry ice and then stored at −80 °C. Total RNA was extracted using PureLink RNA Mini Kit (Invitrogen; Carlsbad, CA, USA), and cDNA was synthesized from 250 ng total RNA using iScript cDNA Synthesis Kit (BioRad; Hercules, CA, USA) in a ThermoHybaid iCycler (Thermo Scientific; Waltham, MA, USA). cDNA was diluted 5X for qRT-PCR reactions, which were run on a MyiQ Single Color Real-Time PCR Detection System using iQ SybrGreen Supermix (BioRad). Each 20 μl reaction contained 10 μl SybrGreen Supermix, 2 μl ultra-pure distilled water (Gibco; Waltham, MA, USA), 2 μl each of 3 nM forward and reverse primers, and 4 μl diluted cDNA. qRT-PCR cycling conditions were 95 °C for 5 min; 40 cycles at 94 °C for 15 s, 60 °C for 15 s, and 72 °C for 15 s (except in the case of adenylate cyclase activating polypeptide 1 receptor type 1 (PAC1) reaction where 30-s elongation time was used given its larger amplicon size). Data were collected at read temperatures of 79–88 °C for 15 s (30 s for PAC1) depending on amplicon melt temperatures. Reactions were run in duplicate and the values were averaged.

Standard dilution curves (run on each reaction plate) were generated for each primer set by serially diluting (1.00-, 0.25-, 0.0625-, and 0.0156-fold) a master cDNA stock comprising an equal mix of cDNA from all treatment groups for a given brain region. MyiQ Optical System Software (BioRad) calculated a standard curve by fitting a least squares linear regression curve to the log10 of the dilution values plotted against the threshold cycle values. Unknown sample starting quantities were then determined by plotting each unknown well’s threshold cycle against the standard curve. Each plate contained wells with “no cDNA template” and “no reverse transcriptase (no RT)” as controls for contamination and amplification of genomic DNA, respectively. Reported starting quantity values for each sample were normalized to the averaged starting quantity values across three reference genes: Itm2B, Clnx, and NBA. To assess cDNA integrity, two Itm2B primer sets were used: one set specific to the 3′, and one set specific to the 5′, ends of the transcript.
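The standard-curve quantification and reference-gene normalization described above can be sketched as follows. This is an illustrative reimplementation, not the MyiQ Optical System Software; it assumes that threshold cycle is regressed on log10 of the dilution and that the fit is then inverted for unknown wells. All function names are hypothetical.

```python
import math

# Serial dilutions of the master cDNA stock used for the standard curve.
DILUTIONS = [1.0, 0.25, 0.0625, 0.0156]


def fit_standard_curve(dilutions, ct_values):
    """Least-squares line: Ct = slope * log10(dilution) + intercept."""
    xs = [math.log10(d) for d in dilutions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def starting_quantity(ct, slope, intercept):
    """Invert the standard curve to get the starting quantity of a well."""
    return 10 ** ((ct - intercept) / slope)


def normalize(target_sq, reference_sqs):
    """Divide a target gene's starting quantity by the mean starting
    quantity of the reference genes measured in the same sample."""
    return target_sq / (sum(reference_sqs) / len(reference_sqs))
```

In this sketch, duplicate wells would be averaged before normalization, and `reference_sqs` would hold the three reference-gene starting quantities for the same sample.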

Primers for qRT-PCR were designed using NCBI Primer-BLAST (http://www.ncbi.nlm.nih.gov/tools/primer-blast/) and purchased from Integrated DNA Technologies (Coralville, IA, USA). Gene names, NCBI reference sequence numbers, product sizes, and primer sequences used in this study can be found in Table 1.

Table 1 Primer sequences used for qRT-PCR

Data and statistical analyses

Behavioral data collected by the MED-PC IV software included correct, incorrect, and omitted responses and reaction times for the rich and lean stimuli for each individual trial and cumulated across blocks 1 (trials 1–33), 2 (trials 34–67), and 3 (trials 68–100). For each block, response bias (RB), the primary dependent variable, was calculated as follows:

log b = 0.5 × log[([RichCorrect + 0.5] × [LeanIncorrect + 0.5]) / ([RichIncorrect + 0.5] × [LeanCorrect + 0.5])], exactly as in the human task. A value of 0.5 was added to each cell to allow for calculations in cases of cells with a value of 0 (Pizzagalli et al. 2008b). A response bias develops when subjects correctly classify the rich stimulus (i.e., the stimulus associated with three times more frequent reward) and misclassify the lean stimulus (i.e., pressing the lever associated with the rich stimulus when the lean stimulus was presented). As in prior studies in humans (e.g., Pizzagalli et al. 2008a; Santesso et al. 2008), a reward learning score was computed as ΔRB = RB (block 3) − RB (block 1) and compared between stress and control groups using an unpaired two-tailed t test.
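The response bias and reward learning calculations can be expressed as a short Python sketch. The function names are hypothetical, and the use of base-10 logarithms is an assumption, since the formula above does not state the base.

```python
import math


def response_bias(rich_correct, rich_incorrect, lean_correct, lean_incorrect):
    """log b for one block, with 0.5 added to every cell so that blocks
    containing a zero count can still be evaluated.

    Base-10 logarithm is an assumption; the text writes only "log".
    """
    numerator = (rich_correct + 0.5) * (lean_incorrect + 0.5)
    denominator = (rich_incorrect + 0.5) * (lean_correct + 0.5)
    return 0.5 * math.log10(numerator / denominator)


def reward_learning(block1_counts, block3_counts):
    """Delta RB = RB(block 3) - RB(block 1).

    Each argument is a tuple of
    (rich_correct, rich_incorrect, lean_correct, lean_incorrect).
    """
    return response_bias(*block3_counts) - response_bias(*block1_counts)
```

With identical counts for the rich and lean stimuli, for example `response_bias(10, 5, 10, 5)`, the numerator and denominator are equal and log b is 0. Values above 0 indicate a bias toward the lever associated with the rich stimulus.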

Similar to human studies, discriminability was calculated for each block as follows:

log d = 0.5 × log[([RichCorrect + 0.5] × [LeanCorrect + 0.5]) / ([RichIncorrect + 0.5] × [LeanIncorrect + 0.5])]. Discriminability captures the ability to differentiate between the stimuli and can thus be taken as a proxy of task difficulty. In addition, accuracy (i.e., number of correct responses / [numbers of correct + incorrect responses]) and reaction time were averaged within each block for each treatment group and stimulus type (i.e., rich/lean), exactly as in the human task. Because development of response bias is dependent on the ratio of rich vs. lean reinforcements, and because rats were reinforced as a percentage of correct responses, rats were excluded if the ratio during testing was 6:1 or greater, because such ratios reflect a disproportionately high number of reinforcements for the rich stimulus.
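A minimal Python sketch of the discriminability, accuracy, and exclusion calculations is given below. The function names are hypothetical, base-10 logarithms are assumed, and the handling of a session with no lean reinforcements (treated here as exceeding the cutoff) is an assumption that is not stated in the text.

```python
import math


def discriminability(rich_correct, rich_incorrect, lean_correct, lean_incorrect):
    """log d for one block, with 0.5 added to every cell (base-10 log assumed)."""
    numerator = (rich_correct + 0.5) * (lean_correct + 0.5)
    denominator = (rich_incorrect + 0.5) * (lean_incorrect + 0.5)
    return 0.5 * math.log10(numerator / denominator)


def accuracy(correct, incorrect):
    """Correct responses divided by correct plus incorrect responses
    (omitted trials are not counted)."""
    return correct / (correct + incorrect)


def exceeds_reward_ratio(rich_rewards, lean_rewards, cutoff=6.0):
    """Return True if the realized rich:lean reward ratio is at or above
    the 6:1 exclusion criterion.

    Assumption: a session with zero lean rewards is treated as exceeding
    the cutoff, because the ratio is unbounded.
    """
    if lean_rewards == 0:
        return True
    return rich_rewards / lean_rewards >= cutoff
```

For example, 18 rich and 3 lean reinforcements give a ratio of exactly 6:1 and meet the exclusion criterion, whereas 21 rich and 6 lean reinforcements (3.5:1) do not.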

Response bias and discriminability scores were analyzed using a two-way analysis of covariance (ANCOVA; see below for covariate description), with Block as a within-subjects factor and Stress as a between-subjects factor. To determine whether prior exposure to pramipexole and amphetamine in the first cohort of rats affected response bias, Cohort was analyzed as a between-subjects factor in a separate ANCOVA and revealed no significant effects of cohort (data not shown). Accuracy and reaction time were analyzed using similar ANCOVAs, in which Stimulus Type (rich vs. lean) was an additional within-subjects factor. Some rats responded asymmetrically for one or the other stimulus when equally reinforced during training sessions, suggesting that some degree of inherent bias was present during the test session for these subjects, regardless of the differential reinforcement. Thus, as in our prior work (Der-Avakian et al. 2013), variability of inherent response patterns was controlled for using a covariate, defined as the change in response bias between the first and third blocks (ΔRB) during the training session immediately prior to the test session, when both stimuli were equally reinforced. To calculate ΔRB during the final 100-trial training session, trials were separated into blocks (1–3), exactly as described above for the test session. Bonferroni post hoc tests were performed where appropriate. None of the analyses produced violations of sphericity, and thus, no corrections were applied.

Unpaired two-tailed t tests were used to analyze fold-change differences in the starting quantities of mRNAs between the Control and Stress rats. Pearson’s product-moment correlation coefficient (Pearson’s r) tests were used to analyze relationships between fold-change differences in the starting quantities of mRNAs and reward learning (ΔRB) in (a) stressed rats only or (b) stressed and control rats combined. Non-stressed control rats were included in the correlation analyses to determine whether natural variability in reward learning contributed to potential changes in mRNA expression.
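The correlation analysis can be sketched in plain Python as follows. This is an illustration of Pearson's r and its degrees of freedom (n − 2), not the statistical software used for the reported analyses; the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation and its degrees of freedom.

    x could hold normalized mRNA values and y the reward learning scores
    (Delta RB) for the same rats, in the same order.
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("x and y must have equal length of at least 3")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return r, n - 2
```

A negative r indicates that higher mRNA levels are associated with lower reward learning scores.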

The level of significance for all tests was set at 0.05. Throughout the analyses, Cohen’s d values are reported for the main findings to quantify effect sizes.
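For illustration, the unpaired t statistic and Cohen's d can be computed as in the sketch below. The pooled-standard-deviation form of Cohen's d and the equal-variance form of the t test are assumptions, as the text does not specify which variants were used; the function names are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return math.sqrt(pooled_var)


def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled SD (assumed form)."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def unpaired_t(a, b):
    """Equal-variance unpaired t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    standard_error = pooled_sd(a, b) * math.sqrt(1 / na + 1 / nb)
    return (mean(a) - mean(b)) / standard_error, na + nb - 2
```

Under these assumptions the two statistics are related by d = t × sqrt(1/n1 + 1/n2), so an effect size can be recovered from a reported t value and the group sizes.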

Results

Behavioral results

Five stressed rats and three control rats were excluded because of high rich:lean ratios during testing (≥6:1), and one stressed rat was excluded for not responding to the tone stimuli during testing, leaving 19 stressed and 20 control rats for analyses. Critically, control and stressed rats did not differ in their rich:lean reward ratios (mean ± SD = 3.52 ± 0.75 vs. 3.28 ± 1.07; t(37) = 0.84, p > 0.40), indicating that they were exposed to a similar reinforcement schedule during the test procedures.

Response bias

The Stress × Block ANCOVA revealed a significant main effect of Stress (F 1,36 = 4.163, p < 0.05), indicating that social defeat decreased response bias compared to the no stress condition (Fig. 1a). The ANCOVA also revealed a significant Stress × Block interaction (F 2,72 = 3.278, p < 0.05), owing to the fact that stressed rats had significantly lower reward learning (ΔRB) relative to control rats (−0.004 ± 0.07 (mean ± SEM) vs. 0.20 ± 0.07; t 37 = 2.07, p < 0.05; Cohen’s d = 0.66; Fig. 1b). Post hoc analyses revealed that response bias was significantly lower in stressed compared to control rats during blocks 2 and 3 (p < 0.05; Cohen’s d value = 0.74 and 0.66, respectively). Further highlighting the robust effects of stress on the PRT, 16 of the 20 control rats (binomial p(16/20) = 0.0046), but only 7 of the 19 stressed rats (binomial p(7/19), ns), showed a reward learning score (ΔRB) greater than zero (Fisher’s exact test: p = 0.0065).

Fig. 1

Behavioral effects of social defeat or no stress in the rat PRT. a Response bias was blunted after social defeat compared to no stress, reflecting stress-induced impairment of reward learning (b). c Importantly, social defeat did not affect discriminability during the task. Accuracy (d) and reaction time (e) for rich and lean stimuli were differentially affected between socially defeated and non-stressed rats. *p < 0.05; **p < 0.01

Discriminability

The Stress × Block ANCOVA revealed a significant main effect of Block (F 2,72 = 6.646, p < 0.01). Post hoc analyses revealed that discriminability was significantly lower during block 3 compared to block 2 (p < 0.01) (Fig. 1c). Importantly, there was no effect of Stress on discriminability, indicating that response bias results were not confounded by differential task difficulty across groups.

Accuracy

The Stress × Block × Stimulus ANCOVA revealed a significant main effect of Block (F 2,72 = 6.152, p < 0.01), a significant main effect of Stimulus (F 1,36 = 4.565, p < 0.05), as well as a significant Block × Stimulus interaction (F 2,72 = 3.253, p < 0.05). Although the Stress × Block × Stimulus interaction only approached significance (F 2,72 = 2.629, p = 0.079), we tested our a priori hypothesis, based on previously published results (Der-Avakian et al. 2013; Pergadia et al. 2014), that control, but not stressed, rats would be more accurate for the rich vs. lean stimuli across blocks. Post hoc analyses revealed that control rats were significantly more accurate when responding for rich vs. lean stimuli during blocks 2 and 3 (p < 0.01). Conversely, stressed rats were similarly accurate when responding for rich and lean stimuli across all blocks (Fig. 1d).

Reaction time

The Stress × Block × Stimulus ANCOVA revealed a significant main effect of Stress (F 1,36 = 6.579, p < 0.05), a significant main effect of Block (F 2,72 = 6.453, p < 0.01), a significant main effect of Stimulus (F 1,36 = 5.314, p < 0.05), and a significant Block × Stimulus interaction (F 2,72 = 3.953, p < 0.05). Although the Stress × Stimulus interaction only approached significance (F 1,36 = 2.979, p = 0.093), we tested our a priori hypothesis, based on previously published results (Der-Avakian et al. 2013), that control, but not stressed, rats would respond more quickly for the rich vs. lean stimuli. Post hoc analyses revealed that control rats had significantly shorter reaction times when responding for rich vs. lean stimuli (p < 0.01). Conversely, reaction times were similar when responding for rich vs. lean stimuli in stressed rats (Fig. 1e). The main effect of Stress indicates that overall, rats exposed to social defeat were slower to respond to either stimulus compared to control rats.

qRT-PCR results

Of the 39 rats that were behaviorally tested in two cohorts, the brains from the second cohort were processed and analyzed using qRT-PCR. The brains from eight rats whose behavioral data were excluded from analyses (see above) were also excluded from qRT-PCR analysis, leaving 10 control and 11 stressed rats. Social defeat significantly increased N/OFQ peptide mRNA levels in the NAc shell compared to non-stressed rats (t 19 = 2.948, p = 0.008, Cohen’s d = 1.29; Fig. 2a). There was also a trend toward similar increases in N/OFQ peptide mRNA levels in the NAc core in stressed vs. non-stressed rats (t 19 = 2.037, p = 0.056, Cohen’s d = 0.89; Fig. 2a). Social defeat significantly decreased Fos mRNA expression in the VTA compared to non-stressed rats (t 19 = −3.035, p = 0.007, Cohen’s d = −1.33; Fig. 2b). No other statistically significant effects were observed.

Fig. 2

Gene expression after social defeat or no stress. a N/OFQ peptide mRNA was increased in the NAc shell (p < 0.01) and NAc core (p = 0.056) after social defeat. b Fos mRNA was decreased in the VTA after social defeat (p < 0.01). **p < 0.01; different from no stress

Correlations between behavioral and qRT-PCR data

In rats exposed to social defeat, Pearson’s r tests revealed significant negative correlations between reward learning (ΔRB) and (1) N/OFQ peptide mRNA levels in Cg1 (r 9 = −0.628, p < 0.05), (2) pro-dynorphin mRNA levels in NAc shell (r 9 = −0.656, p < 0.05), (3) pro-dynorphin mRNA levels in NAc core (r 9 = −0.684, p < 0.05), and (4) PAC1 mRNA levels in the NAc core (r 9 = −0.649, p < 0.05) (Fig. 3a). Across all the rats (i.e., stressed and control rats combined), Pearson’s r tests revealed significant negative correlations between reward learning (ΔRB) and (1) N/OFQ peptide mRNA levels in Cg1 (r 19 = −0.632, p < 0.01), (2) NOP mRNA levels in the VTA (r 19 = −0.564, p < 0.01), and (3) CREB mRNA levels in the NAc shell (r 19 = −0.552, p < 0.01) (Fig. 3b).

Fig. 3

Correlations between reward learning and gene expression. a In rats exposed to social defeat, reward learning (ΔRB) was negatively associated with N/OFQ peptide mRNA levels in Cg1, pro-dynorphin mRNA levels in NAc shell and core, and adenylate cyclase-activating polypeptide 1 receptor type 1 mRNA levels in NAc core. b In all rats (i.e., stressed and control rats combined), reward learning (ΔRB) was negatively associated with N/OFQ peptide mRNA levels in Cg1, NOP receptor mRNA levels in the VTA, and CREB mRNA levels in NAc shell. *p < 0.05; **p < 0.01

Discussion

After exposure to social defeat, rats failed to develop a response bias toward a more frequently rewarded stimulus, reflecting disrupted reward learning. The lack of response bias in stressed rats was manifested as similar accuracy for both stimuli despite the fact that one stimulus was probabilistically reinforced more frequently than the other stimulus. In contrast, non-stressed rats showed greater accuracy when responding to the more frequently reinforced rich stimulus compared to the less frequently reinforced lean stimulus, resulting in the development of a response bias. Control rats were also faster to respond for the rich stimulus compared to the lean stimulus, whereas reaction times for the two stimuli did not differ in stressed rats. Importantly, the differences in response bias, accuracy, and reaction time between stressed and non-stressed rats were not a function of the ability to discriminate between the two stimuli, considering that discriminability was similar between both groups of rats throughout the entire test session. These results suggest that stress blunted the ability to learn from prior reinforcement experiences and to adapt accordingly.

The effects of social defeat on reward learning in the present study in rats are similar to the previously published effects of stress on reward learning in humans using an analogous PRT. Various experimental (e.g., threat of shock, negative performance feedback) and naturalistic (e.g., high perceived life stress) stress conditions blunted response bias in healthy participants without a history of psychiatric diagnosis (Bogdan et al. 2010; Bogdan and Pizzagalli 2006; Bogdan et al. 2011; Pizzagalli et al. 2007), but especially among healthy controls carrying genetic variants previously linked to increased risk for MDD and/or stress reactivity (Bogdan et al. 2010; Bogdan et al. 2011; Nikolova et al. 2012). In both rats and humans, discriminability did not differ between stressed and non-stressed groups, indicating that the differences in response bias were not due to any potential stress-induced performance differences. We previously verified that the tone durations used in the rat version of the task (0.9 and 1.6 s) are ambiguous, reflected by decreased accuracy (~60–70%) in correctly identifying these tones compared to the training sessions using more distinct tone durations (~80% accuracy) (Der-Avakian et al. 2013). Such ambiguity is important for allowing the development of a response bias in control rats when the asymmetric reinforcement schedule is introduced during the test trials. Additionally, the rat version of the task was designed using auditory tones to take advantage of the greater auditory vs. visual acuity of albino rats. Because both versions of the task were designed to assess learning based on positive reinforcement, it is unlikely that the differences in sensory modalities used for stimulus detection would differentially affect behavior between species.

Critically, the stress-induced reduction in response bias mimics the pattern of responses observed under baseline (no stress) conditions in individuals with MDD (Liu et al. 2011; Pizzagalli et al. 2008b; Vrieze et al. 2013) who were tested in the human PRT. Additionally, as in individuals with MDD (Pizzagalli et al. 2008b), stressed rats had significantly longer reaction times compared to controls. While there is debate as to whether the social defeat procedure in rodents constitutes a valid model of stress-induced MDD in humans (Hollis and Kabbaj 2014; Rygula et al. 2008; Venzala et al. 2012), the data presented here provide direct cross-species evidence that social defeat produces deficits in at least some behaviors related to symptoms of MDD.

In addition to disrupting reward learning, exposure to social defeat also significantly increased mRNA levels of N/OFQ peptide in the NAc shell, while stress-induced increase in N/OFQ mRNA levels approached significance in the NAc core. Furthermore, in stressed rats, decreased reward learning was associated with increased N/OFQ mRNA levels in Cg1 and increased pro-dynorphin mRNA levels in the NAc shell and core. In all the rats (stressed and controls), decreased reward learning was associated with increased N/OFQ mRNA levels in Cg1 and increased N/OFQ peptide (NOP) receptor mRNA levels in the VTA, suggesting that N/OFQ signaling may also contribute to natural variability in reward learning in non-stressed rats. N/OFQ is a peptide that binds to the NOP receptor, both of which have high sequence homology with dynorphin and the kappa opioid receptor, respectively (Witkin et al. 2014), raising the possibility that parallel stress-related adaptations between N/OFQ and dynorphin systems in the ventral striatum may contribute to stress-induced reward learning deficits.

N/OFQ was initially thought to be involved in pain processing, but its role in stress and depression has recently gained considerable attention. For example, restraint stress increased N/OFQ expression in the hippocampus, an effect thought to be mediated by stress-induced glucocorticoids (Nativio et al. 2012). Additionally, central administration of N/OFQ activated the hypothalamic-pituitary-adrenal (HPA) axis (Devine et al. 2001). Consistent with these data, administration of a NOP receptor antagonist produced antidepressant-like effects in the rodent forced swim and tail suspension tests (Gavioli et al. 2003; Gavioli et al. 2004; Redrobe et al. 2002; Rizzi et al. 2007) and also reversed stress-induced decreases in sucrose preference in rats, a putative measure of anhedonia (Vitale et al. 2009). N/OFQ knockout mice also displayed an antidepressant-like response in the forced swim test (Gavioli et al. 2003). Interestingly, patients with MDD or bipolar disorder have high circulating plasma levels of N/OFQ (Gu et al. 2003; Wang et al. 2009), and, most recently, it was reported that chronic administration of a NOP receptor antagonist in patients with MDD provided some antidepressant effects (Post et al. 2016). Our current data further support the notion that increases in the tone of brain N/OFQ systems may contribute to the expression of stress-related mood disorders.

The role of N/OFQ specifically in stress-induced impairment of reward learning is less clear, but anatomical studies provide some insight into brain mechanisms involved in this process. Brain autoradiography studies in rodents have revealed a very high concentration of NOP receptors in the frontal cortex, and there are particularly high levels of NOP receptors in the cingulate cortex (Neal et al. 1999; Sim and Childers 1997; Sim et al. 1996). Consistent with these rodent findings, human positron emission tomography (PET) and post-mortem autoradiography studies have also reported high levels of NOP receptors in the cingulate cortex and striatum (Berthele et al. 2003; Lohith et al. 2014; Lohith et al. 2012). Moreover, NOP receptors are localized on dopaminergic nuclei in the VTA, and administration of a NOP receptor agonist inhibited dopamine neurotransmission in the VTA and NAc (Koizumi et al. 2004a; Koizumi et al. 2004b; Murphy et al. 1996; Murphy and Maidment 1999; Murphy et al. 2004). In our study, social defeat decreased Fos mRNA expression, a putative marker of neuronal activation, in the VTA. While the mechanism of this effect is not clear, decreased VTA activity may be an indirect consequence of increases in NAc N/OFQ signaling, and may reflect general reductions in the function of the mesolimbic dopamine system and depressive-like effects (Nestler and Carlezon 2006).

Reward learning is thought to involve communication between the VTA, NAc, and ACC/Cg1. In particular, the NAc encodes reward prediction errors by recognizing whether expected rewards are presented or withheld (Glimcher 2011; Maia and Frank 2011). As described above, in the PRT, expected rewards are more often presented during correct identification of rich stimuli and more often withheld during correct identification of lean stimuli. One mechanism by which stress could disrupt the normal processing of reward prediction errors may be via a stress-induced increase in N/OFQ signaling from the NAc to the VTA, which would act at NOP receptors in the VTA to decrease mesoaccumbal dopamine signaling. Furthermore, dorsal ACC neurons encode previous reward outcomes that guide future decisions (Seo and Lee 2007). The PRT requires healthy participants/subjects to encode the probabilistic reward outcomes of rich vs. lean trials, which then should guide future decisions that lead to greater accuracy in identifying rich vs. lean stimuli (i.e., development of a response bias). Thus, another possible mechanism by which stress may disrupt this process is through increased N/OFQ signaling from the NAc to the ACC/Cg1. Decreasing this signaling (e.g., with a NOP receptor antagonist) during stress may prevent the disrupted VTA and/or ACC/Cg1 activity that is associated with impaired reward learning (Bogdan et al. 2011; Bush et al. 2002; Santesso et al. 2008; Santesso et al. 2009). Further studies are required to test these hypotheses. Additionally, future studies are needed to confirm that these effects translate to changes in protein expression, as concurrent analysis of several other genes in these four brain areas may raise concerns about false-positive results.

In addition to the N/OFQ and NOP findings, decreased reward learning was associated with increased PAC1 mRNA levels in the NAc core of stressed rats and increased CREB mRNA levels in the NAc shell of all (stressed and control) rats. Consistent with our results, stress has been shown to increase brain PAC1 expression in rats (Lezak et al. 2014), and treatment with a PAC1 antagonist during stress exposure prevents several of the behavioral consequences of stress (Roman et al. 2014). Additionally, increased CREB expression in the NAc shell has been shown to increase brain stimulation reward thresholds in the intracranial self-stimulation procedure in rats (Muschamp et al. 2011), similar to the effects of social defeat (Der-Avakian et al. 2014; Donahue et al. 2014). Conversely, environmental enrichment, which is associated with a reduction in depression-like behaviors, decreases CREB activity in the NAc (Green et al. 2010), which produces antidepressant and antistress-like effects (Carlezon and Krystal 2016; Pliakas et al. 2001). Altogether, the findings from our study are consistent with previously published data and suggest several molecular mechanisms that contribute to stress-related disruption of reward learning.

One important implication of the current findings is that complementary studies may be conducted in parallel between rats and humans. As an example, we developed a preclinical analog of the human PRT and showed here that reward learning is disrupted by stress in rats, as in humans. The additional data indicating that N/OFQ may represent a novel mechanism mediating this effect can be explored in more detail in future studies. If the results are replicated, the same hypotheses may be tested in humans to determine if targeting N/OFQ signaling may provide relief from stress-induced disruption of reward learning in MDD and other stress-related mood disorders. Dr. Athina Markou envisioned that this type of cross-species translational research would strengthen the validity of preclinical behavioral assessments and advance the discovery of novel therapeutic targets for the treatment of several psychiatric disorders. Her vision, perseverance, and foresight are a continued source of inspiration in efforts to optimally align preclinical and clinical research in the context of neuropsychiatric illness.