Introduction

Adaptive reward-directed behaviour is dependent on several inter-dependent processes that bring the organism from an appetitive to a consummatory relationship with the primary reward stimulus. As indicated by the research domain criteria framework (RDoC)1,2, reward or positive-valence processing comprises a number of inter-dependent constructs, such as responsiveness (e.g. anticipation/expectancy, salience, satiation), learning (e.g. stimulus association, reinforcement, prediction error) and valuation (e.g. predictability, delay, effort). In stress-related neuropsychiatric disorders, including major depressive disorder (MDD) and schizophrenia (SZ), pathologies of reward processing are common. In MDD and SZ these include the syndromes of anhedonia (markedly reduced interest or pleasure in daily activities) and apathy (diminished motivation for physical or cognitive goal-directed behaviour and/or diminished emotional reactivity); both are syndromes of amotivation, and are often co-morbid in both MDD and SZ3,4,5. Identifying their specific contributory processes and underlying neural circuits, and then their etio-pathophysiology, is key to much needed improved treatments. Functional imaging (fMRI) studies have compared MDD patients with healthy controls in terms of event-related changes in local BOLD signal, with several using the monetary incentive delay task which allows for separate assessment of BOLD signal during reward anticipation and then reinforcement6. Some such studies report reduced BOLD signal during reward anticipation but no difference at reinforcement, in the ventral striatum, which includes the nucleus accumbens (NAc)7,8,9.

The RDoC positive-valence processes overlap extensively with those proposed to account for appetitive-to-consummatory goal-directed behaviour in animals10. Incentive motivation refers to the activation and reinforcement of goal-directed behaviour by reward stimuli per se as well as stimuli that predict them, and has clear overlap with reward interest and anticipation11,12. Animal studies are essential for elucidation of the neural circuitry of specific reward processes. The mesolimbic DA neurons in the ventral tegmental area (VTA) that send long-range projections to the GABA medium spiny neurons (MSNs) in the NAc constitute a critical pathway in the neural circuitry of reward processing. The NAc MSNs express either the excitatory (Gs) protein-coupled receptor, DA receptor 1 (D1R), or the inhibitory (Gi) protein-coupled receptor, D2R13. Nucleus accumbens MSNs encode primary reward stimuli, conditioned and discriminative stimuli that predict reward, and incentive-motivated behaviour including reward approach and operant responses; VTA-NAc DA signalling is integral to these processes10,13,14,15. Recently, the development of genetically encoded G-protein-coupled receptor (GPCR)-activation-based sensors for DA (GRABDA) has enabled the in vivo imaging of DA release with high spatial and temporal resolution. It is possible to measure changes in region-specific extracellular DA activity coincident with specific reward events/processes at intervals of ≤0.1 s16,17,18. Studies to date include demonstrations in mice that onset of sucrose consummation or interactions with socio-sexual stimuli co-occur with transient increase in NAc DA activity17,19. In rats, a sequential operant task required responding to discriminative cues with nose-poke behaviour to trigger food release; transient increases in NAc DA activity occurred in response to cues, and directly prior to nose-poking at each of trial initiation and reward retrieval/reinforcement20.

Animal models are also essential for detailed study of causal inter-relationships between chronic stress, deficits in specific reward processes, and associated/mediating changes in neural circuitry4,5,21. A substantial number of rodent studies have combined chronic unpredictable mild stress (CUMS), comprising exposure to stressors such as 18 h water deprivation and 1 h physical confinement on an unpredictable schedule for several weeks, with a sucrose (or saccharin) versus water preference test, where CUMS leads to reduced sucrose/saccharin preference22. Whilst this model has good reproducibility, it is challenging to equate reduced sucrose preference with a specific human reward pathology: it clearly involves reward consummation, whereas pleasure in response to sweet tastes is intact in MDD23. Concerning underlying neural changes, in rats, CUMS led to reduced self-stimulation of the VTA24; in mice, CUMS led to decreased frequency of burst firing events and number of spikes per burst in VTA neurons, and photoactivation of VTA DA neurons reversed reduced sucrose preference in CUMS mice25. A 10-day chronic social defeat (CSD) procedure resulted in a lower sucrose preference than in control mice, an effect specific to a “susceptible” subgroup that displayed passive avoidance of the aggressor mouse strain in a social interaction test26. In male mice, chronic (15-day) social stress (CSS), comprising a short daily placement in the cage of an unfamiliar, dominant and aggressive resident male mouse followed by continuous distal exposure, leads to deficits in reward processing: tone cue-motivated sucrose responding in a discriminative reward learning-memory test is reduced, as is operant responding in a reward-to-effort valuation test27,28,29,30,31. Interestingly, CSS does not lead to reduced saccharin preference32. Relative to controls, CSS mice have reduced DA turnover in the NAc32. 
In the current and previous experiments (e.g.30,31,33), the inter-individual variation in both behavioural and neurobiological measures is similar when comparing the control and CSS groups. Accordingly, it would not be justifiable to divide the CSS mice, specifically, into subgroups such as “susceptible” and “resilient”, as is commonly done in the case of mice that have undergone CSD followed by testing social interaction (e.g.26,34). Indeed, it has been demonstrated that mice already display these susceptible versus resilient behaviours as traits in the social interaction test before CSD, indicating the caution that needs to be exercised when identifying subgroups, in order to not mix traits and states35.

In the current study, NAc GRABDA sensor fibre photometry was integrated into the mouse CSS-reward deficit model for detailed assessment of: (1) in control mice, changes in NAc DA activity related to specific reward processes; (2) in CSS mice, changes in NAc DA activity associated with and potentially contributing to deficits in reward learning and motivation. (3) In addition, a population-level analysis of CSS effects on the transcriptome of VTA DA neurons was conducted. Whilst control mice demonstrated distinct increases in NAc DA activity during reward anticipation/expectancy and reward reinforcement, in CSS mice specifically the former were attenuated, analogous to the fMRI findings in human stress-related disorders. The transcriptome evidence indicated that this CSS deficit was not related to fundamental changes in the status of VTA DA neurons, such that the basis of deficient reward anticipation-specific NAc DA signalling might lie in a specific sub-population of these neurons or be located elsewhere in the neural circuitry of reward processing.

Results

Reduced tone-sucrose discriminative learning co-occurs with attenuated tone and tone+sucrose related NAc DA activity in CSS mice

Mice underwent conditioning (training) for tests of discriminative reward learning-memory (DRLM) and reward-to-effort valuation (REV) with sucrose pellet reinforcement (Fig. 1A). This was followed by unilateral stereotactic surgery in the NAc (bregma 1.1 mm, core, primarily, and shell) for injection of AAV vector expressing GRABDA sensor and placement of an optic fibre (Fig. 1H, supplementary Fig. 1A, B). Mice then underwent CSS (n = 20) (Fig. 1B) or control handling (CON, n = 14). In CSS, the mouse is placed in the cage of a dominant-aggressive mouse for 30–60 s of proximal attack, followed by physical separation and continuous distal exposure for the next 24 h, repeated with different resident mice for 15 days. Mean duration of daily attack experienced by CSS mice was 50.6 ± 4.8 s and all CSS mice were submissive during proximal exposure. The CON mice remained in littermate pairs and were handled daily. The day after CSS/CON, the NAc DA signal was checked in the photometry-behaviour test chamber, and testing began on the next day. Across the testing period, in order that chocolate-sucrose pellets provided reinforcement as gustatory reward and not hunger satiety, mice received sufficient normal diet in the home cage to maintain body weight close to baseline (95–100%; supplementary Table S1); as expected, CSS mice required more normal diet than did control mice (for further details, see Methods).

Fig. 1: Effects of chronic social stress on behaviour and NAc DA activity in the discriminative reward learning-memory test.

A Experimental design. BBW + FC: measurement of baseline body weight and food consumption during handling; 90-95% BBW: conditioning under food restriction that reduced BW to 90-95% BBW; Surgery + R + AAVV: stereotactic surgery, recovery, expression of AAV vector; C + OF: conditioning sessions with patch cord attached to optic fibre; CSS/CON: CSS protocol or control handling; re-BBW + FC: BW and food consumption under ad libitum feeding on days 5–12 of CSS/CON provided re-BBW values; 95-100% re-BBW: mice were mildly food restricted to be tested at 95–100% re-BBW; SIG: fibre photometry signal test; DRLM: discriminative reward-learning memory test; REV: reward-to-effort valuation test; BP: brains were perfused-fixed for histology. B CSS mice were placed in the cage of a dominant-aggressive CD-1 mouse to receive 30-60 s physical attack followed by 24 h sensory exposure through a divider; this was repeated with a different CD-1 mouse on each of 15 days. CON mice were kept in littermate pairs and were handled for 1 min on each of 15 days. C Schematic of discriminative reward learning-memory test with fibre photometric recording. Tone discriminative stimulus (DS) signalled chocolate-sucrose pellet (gustatory reward) availability following a feeder response; maximum DS duration was 25 s per trial and inter-trial intervals (ITIs) were 20–60 s (mean = 40 s). Mice received 3 daily tests of 25 trials each and fibre photometry data are presented for tests 1 and 3. Behaviour: D Number of gustatory rewards obtained, i.e. DS trials with a response, across tests (left) and individual mean scores (right). Group main effect: F(1, 31) = 34.74, p < 0.0001. E DS response latency i.e. from DS onset to time of response, with 25 s assigned to no-response trials. Group main effect: F(1, 31) = 31.39, p < 0.0001. F ITI response interval i.e. average latency between successive responses in ITIs. Group main effect: F(1, 31) = 17.69, p = 0.0002. 
G Learning ratio (mean ITI response interval/DS response latency), across tests (left) and individual mean scores (right). Group main effect: F(1, 31) = 20.78, p < 0.0001; Test main effect: F(2, 62) = 5.24, p < 0.008. Data are given as group mean+s.e.m. and individual data points. Statistical analysis was conducted using 2-way mixed-model ANOVA with between subjects factor of group and within-subjects factor of test. Test days indicated by different letters were significantly different from each other in Tukey’s multiple comparisons test. NAc DA activity: H Schematic showing unilateral injection site of AAV GRABDA sensor in NAc, and fibre optic probe implantation directly dorsal to the injection site. I Representative traces from individual CON and CSS mice of z-scored NAc DA activity during 2 consecutive trials in each of tests 1 and 3. For each of the 2 trials per trace, z-scores were calculated using the trial-specific baseline. J Baseline phase DA activity expressed as ΔF/F across the 10 s before DS onset. For each mouse and time point, the mean score for 25 trials was calculated and data are given as mean±s.e.m. per group and test. K DS-on phase z-scored DA activity, for trials with a DS response, following time-normalisation using 10 equal intervals, across intervals (left) and individual mean scores (right). Group main effect: F(1, 31) = 12.12, p < 0.002; Interval main effect: F(9, 279) = 4.79, p < 0.002. L DS-feeder phase z-scored DA activity, for trials with a DS response, divided into 10 intervals of 0.5 s, across intervals (left) and individual mean scores (right). Group x Test x Second interaction effect: F(9, 279) = 5.67, p < 0.0001. Asterisks indicate CSS < CON in test 3: *p < 0.05, **p < 0.01, ***p < 0.001. M, N Scatterplots showing mean DS-feeder phase z-scored DA activity versus trial number in (M) test 1 and (N) test 3. Statistical analysis was conducted using linear regression and significance of the regression was assessed using ANOVA. 
O ITI feeder response z-scored DA activity, specifically the 1st feeder response per ITI, from 2 s pre- to 5 s post-feeder response. Test x Second interaction effect: F(13, 403) = 2.78, p < 0.01. For (J–L, O), statistical analysis was conducted using 3-way mixed-model ANOVA with between subjects factor of group and within-subjects factors of test and interval. Time-specific group effects are for Sidak’s multiple comparisons test; intervals indicated by different letters were significantly different from each other in Tukey’s multiple comparisons test. Images (B, C, H) were created with BioRender.com.

The DRLM test was applied on 3 consecutive days with 25 trials per test (Fig. 1C): an initially neutral tone discriminative stimulus (DS) indicated the period (maximum 25 s per trial) within which a nose-poke response in the feeder port triggered reward delivery, with a delay of 0.3–0.5 s, and DS termination after 1 s. Such trials were separated by variable inter-trial intervals (ITIs: mean 40 s, range 20–60 s) when responses were counted but without consequence. Decreased DS response latency relative to ITI average interval between consecutive responses provides a measure of discriminative reward learning (learning ratio: average ITI response interval/DS response latency). As in previous experiments (e.g.29,30,31), CSS mice made fewer DS-coincident feeder responses, and therefore obtained fewer rewards, than CON mice (Fig. 1D). They had longer DS response latencies than CON mice (Fig. 1E), and also longer ITI response intervals (Fig. 1F). The learning ratio was close to 1 in CON and CSS mice in test 1 when the DS was largely neutral; in tests 2 and 3 the learning ratio increased in CON mice, primarily due to decreased DS response latency, whereas in CSS mice it remained close to 1, with DS response latency unchanged (Fig. 1G). Across the 3 tests, CSS mice made moderately fewer feeder responses (52 ± 9, mean ± SD) than CON mice (70 ± 8), and were therefore less exposed to the DS-reward contingency.
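The learning ratio defined above can be illustrated with a minimal sketch. The function names, the use of `None` to mark a no-response trial, and the pooling of all within-ITI response gaps before averaging are assumptions made for illustration; the text specifies only the ratio itself and the 25-s latency assigned to no-response trials.

```python
from statistics import mean

DS_MAX_S = 25.0  # maximum DS duration; assigned as latency on no-response trials


def mean_ds_latency(ds_latencies):
    """Mean DS response latency in seconds; None marks a no-response trial."""
    return mean(DS_MAX_S if lat is None else lat for lat in ds_latencies)


def mean_iti_interval(iti_response_times):
    """Mean gap between successive feeder responses, pooled over all ITIs.

    iti_response_times holds one list of response times (s) per ITI.
    Pooling the gaps across ITIs is an assumption of this sketch.
    """
    gaps = [
        later - earlier
        for times in iti_response_times
        for earlier, later in zip(times, times[1:])
    ]
    return mean(gaps)


def learning_ratio(ds_latencies, iti_response_times):
    """Average ITI response interval divided by DS response latency."""
    return mean_iti_interval(iti_response_times) / mean_ds_latency(ds_latencies)


# Toy data: three DS trials (one without a response) and two ITIs.
example_ratio = learning_ratio([5.0, None, 10.0], [[0.0, 10.0, 20.0], [3.0, 23.0]])
```

In this toy case the mean ITI interval equals the mean DS latency, so the ratio is 1, as expected when the DS is still neutral; a ratio above 1 would reflect responding that is faster during the DS than during the ITI.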

In these mice, measurement of event-related bulk GRABDA sensor activity, dependent on DA release and binding in NAc, was conducted for each DRLM trial as follows: Across the 10 s prior to DS onset (F0), per 0.05 s time point, changes in DA activity (ΔF) relative to overall mean DA activity were similar in mice across groups and tests (Fig. 1J); therefore, F0 was used as the baseline against which to assess event-related DA activity. During these baseline periods, CSS mice performed fewer (non-rewarded) feeder-port responses than did CON mice (supplementary Fig. 2A). From DS onset, per 0.05 s time point (t), event-related NAc DA activity (F) was z-scored using F0 and its standard deviation (SD0), i.e. z(t) = (F(t) − F0)/SD0. Representative examples of z-scored NAc DA signal from individual mice across two successive trials are shown in Fig. 1I. For CON and CSS groups, event-related NAc DA activity is shown for DRLM tests 1 and 3; data for each mouse are derived exclusively from trials in which it made a DS feeder response and therefore was reinforced (Fig. 1K–N) (Supplementary Data 1, 2). The latency from DS onset to feeder response (Fig. 1E) is referred to as the DS-on phase; because these durations were variable, data were normalised and divided into 10 equal intervals. In CON mice, across trials 1-25, whilst DA activity remained close to baseline, it did increase monotonically and was highest towards the end of the DS, i.e. directly prior to feeder-port responding, and similarly so in tests 1 and 3. DS-on phase NAc DA activity remained at or close to baseline in CSS mice and was lower than in CON mice, and similarly so in tests 1 and 3 (Fig. 1K). In non-response trials, NAc DA activity increased towards the end of the 25-s DS in test 3 in CON mice, whereas it remained at baseline in CSS mice (supplementary Fig. 2C). 
Trials with a DS feeder-port response progressed to the DS-feeder phase, which had a duration of 5 s, divided into 0.5-s intervals, that included reward delivery-retrieval and consumption. The same trial-specific F0 and SD0 values were used for z-scoring, with scores averaged per 0.5 s (Fig. 1L, supplementary Fig. 2B). In CON mice: at test 1, NAc DA activity increased beginning at 1 s after the feeder response coincident with reward delivery-retrieval and DS offset; at test 3, there was an initial peak coincident with the feeder response at 0.5 s and a larger peak at 1 s, which was also larger than the activity peak at test 1. In CSS mice: at test 1, NAc DA activity was similar to CON mice except that post-peak activity decreased sooner; at test 3, activity was similar to test 1 and therefore low relative to CON mice. Concerning stability of DS-feeder phase DA activity across consecutive trials, at test 1 there was a decrease in CON mice (Fig. 1M) and at test 3 there was a decrease in CON and CSS mice (Fig. 1N). Confirmation that DS-feeder phase DA activity was reward-related, i.e. related to DS and/or sucrose, is provided by comparison with ITI feeder responses: activity increased slightly at ITI feeder responding but otherwise remained at/near baseline (Fig. 1O); activity also remained at baseline in the post-DS phase of non-response trials (supplementary Fig. 2D). Further confirmation that GRABDA fluorescence-signal changes were indicative of NAc DA activity and not artefacts caused by, for example, head movements, was provided by negative-control mice expressing NAc EGFP (supplementary Fig. 1C, D): whilst these mice behaved similarly to GRABDA mice in the DRLM test, they did not display any change from baseline signal activity at any test phase (supplementary Fig. 4).
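The trial-specific z-scoring and 0.5-s averaging described above can be sketched as follows. This is an illustration under stated assumptions and not the analysis code used in the study: the function names are hypothetical, the baseline SD is taken as the population SD, and the signal is assumed to arrive as plain lists sampled every 0.05 s.

```python
from statistics import mean, pstdev

SAMPLE_DT = 0.05  # seconds per time point, as stated in the text


def zscore_trial(baseline, event):
    """z(t) = (F(t) - F0) / SD0, with F0 and SD0 from the pre-DS baseline.

    baseline: samples from the 10 s before DS onset (200 points at 0.05 s).
    event: samples from the phase being analysed.
    Using the population SD (pstdev) is an assumption of this sketch.
    """
    f0 = mean(baseline)
    sd0 = pstdev(baseline)
    if sd0 == 0:
        raise ValueError("baseline has zero variance; z-score undefined")
    return [(f - f0) / sd0 for f in event]


def bin_means(z, bin_s=0.5, dt=SAMPLE_DT):
    """Average z-scores in consecutive bins of bin_s seconds (10 samples each)."""
    n = round(bin_s / dt)
    return [mean(z[i:i + n]) for i in range(0, len(z) - n + 1, n)]


# Toy trial: baseline alternates between 1 and 3 (F0 = 2, SD0 = 1),
# followed by 0.5 s at F = 4 and 0.5 s back at F = 2.
toy_baseline = [1.0, 3.0] * 100
toy_event = [4.0] * 10 + [2.0] * 10
toy_bins = bin_means(zscore_trial(toy_baseline, toy_event))
```

Because the same F0 and SD0 are reused for the DS-on and DS-feeder phases of a trial, `zscore_trial` would be called once per phase with the same `baseline` argument.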

The integrated behavioural and NAc DA activity data for CON mice are consistent with acquisition of the causal association between DS and reinforcement of feeder-port responding: NAc DA activity increased slightly as CON mice approached the feeder and, by test 3, increased transiently coincident with DS-feeder responding and markedly coincident with DS-sucrose reinforcement. In comparison, by test 3 CSS mice displayed slower DS-on phase responding and reduced DS-feeder response-reward learning; these effects suggest lower DS-mediated reward anticipation and co-occurred with and were possibly caused by attenuated DS-related NAc DA activity.

Reduced effortful motivation for tone-sucrose reinforcement co-occurs with attenuated tone and normal sucrose-related NAc DA activity in CSS mice

In the same mice (Fig. 2A), an operant nose-poke port was added to the test chamber and the REV test was applied on three consecutive days (Fig. 2B): reinforcement was now dependent on operant responding at the port, and a progressive ratio (PR) was used so that required effort increased across successive trials. Attaining the required number of responses triggered a 1-s tone DS that signalled reward delivery into the feeder, such that mice could leave the operant port, approach the feeder, and retrieve the sucrose reward. Test 1 was used to allow mice to adjust to the new test conditions following DRLM testing. The data for tests 2 and 3 were analysed; in test 3, a pellet of normal food was provided as a low-reward/low-effort choice to test for any CSS-CON mice differences in hunger. Both CON and CSS mice obtained more sucrose rewards in test 3 than in test 2; the data for tests 3 and 2 are presented in Fig. 2 and supplementary Fig. 3, respectively.
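As an illustration of how effort increases under the progressive ratio, the sketch below generates the schedule given in the Fig. 2 legend (5 trials at PR 1, then 5 each at PR 3, 5, 7, …). The constant step of 2 beyond the listed values is an assumption, as are the function names and the definition of the final ratio as the ratio of the last completed trial.

```python
def progressive_ratio(trial, trials_per_step=5, start=1, step=2):
    """Operant responses required on a given trial (trials are 1-indexed).

    The legend lists PR 1, 3, 5, 7; continuing in steps of 2 is assumed.
    """
    return start + step * ((trial - 1) // trials_per_step)


def rewards_and_final_ratio(n_responses):
    """Rewards earned and last completed ratio for a total response count.

    Assumes every response counts towards the current trial's requirement.
    """
    rewards, final_ratio, trial = 0, 0, 1
    while True:
        required = progressive_ratio(trial)
        if n_responses < required:
            break
        n_responses -= required
        rewards += 1
        final_ratio = required
        trial += 1
    return rewards, final_ratio


# Requirement on the first trial of each block of five.
schedule_preview = [progressive_ratio(t) for t in (1, 6, 11, 16, 21)]
```

Under these assumptions, a mouse making 35 operant responses would complete the five PR 1 trials, the five PR 3 trials and three PR 5 trials, earning 13 rewards with a final ratio of 5.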

Fig. 2: Effects of chronic social stress on behaviour and NAc DA activity in the reward-to-effort valuation test.

A Experimental design. For definitions of abbreviations, see Fig. 1A. B Schematic of reward-to-effort valuation test with fibre photometric recording. Nosepoke responses at an operant stimulus triggered a tone discriminative stimulus (DS) and chocolate-sucrose pellet delivery on a progressive ratio (PR) schedule (5 trials at PR 1, 5 x PR 3, 5 x PR 5, 5 x PR 7, …); attaining the PR resulted in 1 s tone DS and pellet delivery, and the ITI was 5 s. Mice received 3 daily tests and in test 3 normal food was provided as a low-reward/low-effort choice. Fibre photometry data are presented for test 3 at PR 5. Behaviour: C Number of operant responses: t(31) = 5.56, p < 0.0001. D Number of gustatory rewards earned: t(31) = 6.48, p < 0.0001. E Final ratio attained: t(31) = 6.56, p < 0.0001. F Post-reinforcement pause, i.e. latency from end of ITI to 1st operant response of subsequent trial: t(31) = 3.60, p < 0.002. G Weight of normal food eaten during the test: t(31) = 0.36, p = 0.72. Data are given as individual data points and group means. Statistical analysis was conducted using unpaired t-tests. NAc DA activity and associated behavioural measures: H Representative traces from individual CON and CSS mice of z-scored NAc DA activity during PR 5 trials. For each of the 2 trials per trace, z-scores were calculated using the trial-specific baseline. I Operant phase duration i.e. time from 1st until 5th operant response: t(25) = 3.17, p = 0.004. J Operant phase z-scored DA activity following time normalisation using 10 equal intervals. For each mouse, the mean score for 5 trials at PR 5 was calculated and data are given as mean±s.e.m. per group. Interval main effect: F(9, 225) = 5.12, p < 0.0001. K DS phase duration i.e. time from DS onset until feeder response: t(25) = 1.72, p = 0.10. L DS phase z-scored DA activity following time normalisation using 10 equal intervals. 
Group x Interval interaction effect: F(9, 225) = 2.47, p < 0.02; Group main effect: F(1, 25) = 14.30, p = 0.0009. M Feeder phase z-scored DA activity divided into 10 intervals of 0.5 s. Group x Second interaction effect: F(9, 225) = 2.27, p < 0.02. N ITI feeder response z-scored DA activity, specifically the 1st feeder response per ITI, from 2 s pre- to 5 s post-feeder response. O In CON mice, comparison of z-scored DA activity during the DS phase at PR 3, 5, 7 and 9. Interval main effect: F(9, 450) = 19.07, p < 0.0001. P In CSS mice, comparison of z-scored DA activity during the DS phase at PR 3, 5, and 7. Statistical analysis was conducted using 2-way mixed-model ANOVA with between subjects factor of group and within-subjects factor of interval. Intervals indicated by different letters were significantly different in Tukey’s multiple comparisons. Image (B) was created with BioRender.com.

At REV test 3, compared with CON mice, CSS mice made fewer operant responses (Fig. 2C), consequently earned fewer rewards (Fig. 2D), attained a lower final PR (Fig. 2E) and had a longer post-reinforcement pause (Fig. 2F). CON mice (0.1 ± 0.1 g) and CSS mice (0.1 ± 0.1 g) (p = 0.72) consumed a similar and low amount of normal diet (Fig. 2G), indicating that both groups were close to satiety regarding low-reward food. Similar to DRLM testing, trial-specific NAc DA activity during 10 s prior to onset of operant responding provided baseline activity for z-score analysis of each test phase, of which there were three per trial: operant phase, comprising 10 time-normalised intervals across the time period from first to last nose poke; DS phase, 10 time-normalised intervals from onset of 1-s DS to feeder response; feeder phase, from feeder response-reward retrieval until elapsing of 5 s, divided into 0.5-s intervals (Supplementary Data 3). All 14 CON mice and 13 of 19 CSS mice reached at least PR 5 (Fig. 2E) and this ratio was used to investigate DA activity. In the operant phase there was no consistent relationship between operant responses and DA activity, as indicated in the representative data from individual mice in Fig. 2H. The operant phase required longer in CSS than CON mice (Fig. 2I); there was a small increase in NAc DA activity coincident with operant response 1, after which activity was at baseline across the operant phase in CON and CSS mice (Fig. 2J). The DS phase was of a similar duration, 2-3 s, in CON and CSS mice (Fig. 2K); whilst both groups showed increased NAc DA activity, in several normalised time intervals the increase was lower in CSS than CON mice (Fig. 2L). Feeder-phase NAc DA activity was similar in CON and CSS mice: it peaked at 0.5 s in CON mice and at 1 s in CSS mice, followed by gradual decline to baseline (Fig. 2M). 
Confirmation that feeder phase DA activity was sucrose reward-related is provided by comparison with ITI feeder responses, during and after which activity remained at baseline (Fig. 2N). To investigate whether NAc DA activity was sensitive to the PR ratio (i.e. effort), the DS phase was compared at PR 3, 5, 7 and 9 in CON mice (Fig. 2O), and at PR 3, 5, and 7 in CSS mice (Fig. 2P): whilst activity was lower in CSS than CON mice at each PR value, there was no consistent change in DS-phase NAc DA activity related to increasing PR within either group.
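Because the operant and DS phases differ in duration between trials and mice, they are compared after time normalisation into 10 equal intervals. One simple way to do this is sketched below; the exact procedure used in the study is not described in this section, so the integer bin edges and the function name are assumptions.

```python
from statistics import mean


def time_normalise(z, n_intervals=10):
    """Collapse a variable-length z-scored phase into n_intervals bin means.

    Bin edges are placed at k * len(z) / n_intervals (floored), so every
    sample is used exactly once and bins differ in size by at most one sample.
    """
    if len(z) < n_intervals:
        raise ValueError("phase is shorter than the number of intervals")
    edges = [(k * len(z)) // n_intervals for k in range(n_intervals + 1)]
    return [mean(z[a:b]) for a, b in zip(edges, edges[1:])]


# A 1-s phase (20 samples at 0.05 s) and a 1.25-s phase (25 samples)
# both reduce to 10 values that can be averaged across trials.
short_phase = time_normalise([float(i) for i in range(20)])
long_phase = time_normalise([float(i) for i in range(25)])
```

After this step, interval 1 always corresponds to the start of the phase and interval 10 to its end (for example, directly before the feeder response in the DS phase), whatever the absolute duration.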

At test 2, behavioural effects of CSS were similar to test 3 (supplementary Fig. 3A–F, H, J). For NAc DA activity analysis, we again used PR 5, although fewer CSS mice reached this PR compared with test 3. In the DS phase, mean NAc DA activity was lower in CSS than CON mice but not significantly (in part related to the smaller sample size; supplementary Fig. 3K). In the feeder phase, activity was lower in CSS than CON mice immediately after feeder responding (supplementary Fig. 3L). (It is noteworthy that in CON mice DS phase NAc DA activity was higher at test 3 versus 2, whilst feeder phase NAc DA activity was lower at test 3 versus 2; these shifts are consistent with DS-reward learning.) At ITI feeder responses, NAc DA activity decreased below baseline directly after the feeder response (supplementary Fig. 3M). Comparing DS-phase NAc DA activity at increasing PRs, as for test 3, activity was consistently lower in CSS than CON mice and there was no change in response to increasing effort within either group (supplementary Fig. 3N, O). As for the DRLM test, in negative-control mice expressing NAc EGFP, there was no significant change from baseline signal activity coincident with any phase of the single REV test that was conducted, indicating that the DA signal was not confounded by non-specific factors in experimental mice (supplementary Fig. 4).

The integrated behavioural and NAc activity data for CON mice are consistent with acquisition of the causal association between effortful operant responding and DS reinforcement: NAc DA activity in response to the DS was similarly high to that in response to sucrose. In comparison, CSS mice displayed slower operant responding; this suggests lower DS-mediated reward anticipation and co-occurred with and was possibly caused by attenuated DS-related NAc DA activity. In contrast, their NAc DA activity in response to sucrose was similar to that of CON mice.

Reduced motivation for sociosexual reinforcement precedes normal female-related behaviour and NAc DA activity in CSS mice

In a separate experiment, mice underwent conditioning (training) with sucrose pellets and then distal female mouse interaction, for a test of sociosexual motivation (SOM) (Fig. 3A). The conditioning/test chamber (Fig. 3B) was divided into two compartments by a wall that incorporated a sliding door and tunnel: the operant compartment contained an operant nose-poke port that was LED-illuminated when active; responding triggered opening of a sliding door that allowed access to the stimulus compartment via the short tunnel. Conditioning was followed by unilateral stereotactic surgery in the NAc (bregma 1.1 mm, core, primarily, and shell) for injection of AAV vector expressing GRABDA sensor and placement of an optic fibre. Mice then underwent CSS (n = 16) or CON (n = 16); mean duration of daily attack experienced by CSS mice was 47.7 ± 5.5 s and all CSS mice were submissive during proximal exposure. The day after CSS/CON, the NAc DA signal was checked in the test chamber. This was followed by placing a female in the test chamber for 10 min to provide the male with a first proximate exposure to sociosexual interaction. On each of the next 2 days (test days 1–2) mice were given a test session comprising 5 trials at fixed ratio (FR) 3, 5, 5, 5 and 5, respectively, with reinforcement in the form of 60-s distal interaction with a (pro-)estrous female under an inverted cup (Fig. 3B). After a 2-day interval, on each of the next 2 days (test days 3–4) mice were given a test session comprising 2 trials each at FR 10, with reinforcement in the form of 180-s proximal interaction with a (pro-)estrous female (Fig. 3E). On each test day, each trial was initiated by placing the mouse in the operant compartment. After the mouse completed the required FR, the sliding door opened immediately, and the mouse could enter the stimulus compartment via the tunnel. 
All trials on test days 1–4 included an operant phase (time from operant response 1 until the final operant response), and a post-operant phase: onset at time mouse first entered the tunnel separating the two compartments (after door opening) and offset after 5 s, divided into 1-s intervals. The DA signal across the entire operant phase was used to calculate baseline F0 and SD0 for the operant and post-operant phases. In proximal test trials on days 3–4 there was also a social phase: each social episode began with social approach and ended with social leave. Social episode onset was designated as t = 0 s, the DA signal in the 5 s prior to 0 s was used to calculate baseline F0 and SD0, and the peri-event signal was measured until t = 5 s, with data binned into 1-s intervals. Social episodes 1–5 were analysed per trial.

Fig. 3: Effects of chronic social stress on behaviour and NAc DA activity in the sociosexual motivation test.

A Experimental design. BBW + FC: measurement of baseline body weight and food consumption during handling; 90–95% BBW: conditioning with sucrose as reinforcer under food restriction that reduced BW to 90-95% BBW; ♀D, conditioning with (distal) female under cup as reinforcer; Surgery + R + AAVV: stereotactic surgery, recovery, expression of AAV vector; OF: session in test chamber with patch cord attached to optic fibre; CSS/CON: CSS protocol or control handling; SIG♀: fibre photometry signal test with proximal exposure to female; ♀D: distal sociosexual motivation test; ♀P: proximal sociosexual motivation test; BP: brains were perfused-fixed for histology. B Schematic of distal sociosexual motivation test with fibre photometric recording. Nosepoke responses at an operant stimulus triggered door opening on a fixed ratio (FR) schedule (1 trial at FR 3, 4 trials at FR 5); attaining the FR resulted in immediate opening of a sliding door, so that the mouse could access the tunnel to the stimulus compartment. A (pro-)estrous female was placed under a pencil cup through which the male could interact with the female; the stimulus phase of each trial was 1 min. Mice received two daily tests (days 1 and 2). C Distal SOM test operant phase duration i.e. time from 1st until 3rd/5th operant response. Group x Day interaction effect: F(1, 842) = 8.49, p = 0.004). D Post-operant phase z-scored DA activity from 3 s prior to until 5 s after the mouse first entered the tunnel to the stimulus compartment, for day 1 and trial 1 (left) and day 1 and trial 5 (right). Group x Time interaction effect: F(3, 1340) = 3.15, p < 0.02. E Schematic of proximal sociosexual motivation test with fibre photometric recording. Nosepoke responses at an operant stimulus triggered door opening on a fixed ratio (FR) schedule (2 trials at FR 10); attaining the FR resulted in immediate opening of a sliding door, so that the mouse could access the tunnel to the stimulus compartment. 
A (pro-)estrous female was placed in the social compartment and the male and female could interact; this stimulus phase of each trial was 3 min. Mice received two daily tests (days 3 and 4). F Representative traces from a CON mouse (left) and a CSS mouse (right) of z-scored NAc DA activity during FR 10 trials. For each trial, z-scores were calculated using the trial-specific baseline. G Proximal SOM test operant phase duration, i.e. time from 1st until 10th operant response. Group main effect: F(1, 28) = 4.56, p < 0.05. H Post-operant phase z-scored DA activity from 3 s prior to until 5 s after the mouse first entered the tunnel to the stimulus compartment, for day 3 and trial 1 (left) and day 3 and trial 2 (right). I Proximal SOM test percent of social phase spent in social episodes. Day main effect: F(1, 84) = 14.54, p < 0.001. J Social phase z-scored DA activity from 3 s prior to until 5 s after the onset of a social episode, for day 3, trial 1 and social episode 1 (left) and day 3, trial 1 and social episode 5 (right). Group x Social episode interaction effect: F(4, 2814) = 6.87, p < 0.001; CSS > CON in social episode 1: p < 0.001, Sidak’s multiple comparisons test. In (C, G and I), statistical analysis was conducted using a linear mixed model with fixed effects of group and day and random effect of subject. In (D and H), statistical analysis was conducted using a linear mixed model with fixed effects of group, day, trial and time and random effect of subject. In (J), statistical analysis was conducted using a linear mixed model with fixed effects of group, day, trial, social episode and time and random effect of subject. Post hoc comparisons were conducted using Sidak’s multiple comparisons test. Images (B) and (E) were created with BioRender.com.

In the distal tests on days 1-2 (Fig. 3B), CSS mice required longer to complete operant responding than did CON mice, particularly on day 2 (Fig. 3C). In this operant phase there was no consistent relationship between nose-pokes and NAc DA activity and no consistent change in NAc DA activity, as was also the case during the sucrose REV test. In the post-operant phase, NAc DA activity peaked at 1 s, which coincided with door opening and the mouse entering the tunnel to the stimulus compartment, and then declined and was consistent thereafter (Fig. 3D). NAc DA activity was similar on days 1 and 2; it was higher at trial 1 than at each of trials 2–5, across which it was consistent (Fig. 3D) (Supplementary Data 4). Whilst there was no significant effect of CSS on the NAc DA peak at 1 s, activity at 4 s was higher in CSS than in CON mice. We did not analyse peri-event NAc DA activity associated with interactions with the female under the cup. In the proximal tests on days 3–4 (Fig. 3E), CSS mice again required longer to complete operant responding than did CON mice (Fig. 3G). There was no consistent relationship between nose-poke responses and NAc DA activity, as exemplified by the representative data in Fig. 3F; there was also no consistent change in NAc DA activity following time-interval normalisation. In the post-operant phase (Fig. 3H), NAc DA activity peaked at 1-2 s and then decreased but remained above baseline across the 5 s. NAc DA activity was higher on day 3 than day 4 and higher at trial 1 than trial 2; there was also a trend to higher NAc DA activity in CSS compared with CON mice (Fig. 3H). In the subsequent 3-min social phase, the % time spent in social contact was higher on day 3 than day 4, higher at trial 1 than trial 2, and was similar in CSS and CON mice (Fig. 3I). The mean duration of each social episode was 12.7 ± 6.7 s (mean ± SD) in CON mice and 9.3 ± 3.3 s in CSS mice. 
With respect to copulation, in CON mice, 5/16 and 3/16 mice copulated with the female on at least 1 of the 2 trials on days 3 and 4, respectively, and in CSS mice, 3/14 and 7/14 mice copulated with the female on at least 1 of the 2 trials on days 3 and 4, respectively. Peri-event NAc DA activity was analysed for social episodes 1–5 of each trial (Fig. 3J; Supplementary Data 5): NAc DA activity peaked at 1–2 s and declined monotonically to 4 s. NAc DA activity was higher at day 3 than day 4, higher at trial 1 than trial 2, and higher at social episode 1 than at subsequent episodes; social episode 2 was likewise higher than the episodes that followed it. At social episode 1, NAc DA activity was higher in CSS compared with CON mice (Fig. 3J). Confirmation that fluorescence-signal changes were indicative of NAc DA activity was provided by negative-control mice expressing NAc EGFP: whilst these mice behaved similarly to GRABDA mice in the SOM test, they did not display any change from baseline signal activity at any test phase (Supplementary Fig. 5).
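The peri-event z-scoring used for the traces above (3 s before until 5 s after an event, relative to a trial-specific baseline) can be illustrated with a minimal sketch. The exact baseline interval and sampling rate are not stated in this section, so the baseline window of −3 s to −1 s, the 10 Hz rate in the usage note, and the function name `peri_event_zscore` are all assumptions made for illustration only; this is not the authors' analysis code.

```python
from statistics import mean, stdev


def peri_event_zscore(signal, fs, event_idx, pre_s=3.0, post_s=5.0,
                      baseline_s=(-3.0, -1.0)):
    """Z-score a peri-event window against a baseline taken from the same window.

    signal     : sequence of fluorescence samples
    fs         : sampling rate in Hz (assumed value, not from the paper)
    event_idx  : sample index of the event (e.g. tunnel entry)
    baseline_s : (start, end) of the baseline in seconds relative to the
                 event; ASSUMED here, the paper only says "trial-specific"
    """
    n_pre = round(pre_s * fs)
    n_post = round(post_s * fs)
    start, stop = event_idx - n_pre, event_idx + n_post + 1
    if start < 0 or stop > len(signal):
        raise ValueError("event window extends beyond the recording")

    # Time of each sample relative to the event, in seconds
    times = [(i - event_idx) / fs for i in range(start, stop)]
    window = [float(x) for x in signal[start:stop]]

    # Baseline samples: baseline_s[0] <= t < baseline_s[1]
    base = [x for t, x in zip(times, window)
            if baseline_s[0] <= t < baseline_s[1]]
    if len(base) < 2:
        raise ValueError("baseline needs at least two samples")
    mu, sd = mean(base), stdev(base)
    if sd == 0:
        raise ValueError("baseline has zero variance")

    return times, [(x - mu) / sd for x in window]
```

For example, with a hypothetical 10 Hz recording, `peri_event_zscore(trace, 10.0, event_idx)` returns 81 time points spanning −3 s to +5 s together with the corresponding z-scores, which could then be entered into a mixed model with time as a fixed effect.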

The CSS mice displayed slower operant responding than did the CON mice. The absence of a discrete DS that signalled the completion of operant responding precludes analysis of whether the reduced operant motivation was associated with lower DS-mediated NAc DA activity, as proposed for the REV test. The NAc DA activity of CSS mice during female-related appetitive interaction (post-operant phase) and sociosexual interaction (social phase) was similar to (or even higher than) CON mice, as for sucrose.

Absence of CSS effect on transcriptome expression of ventral tegmental DA neurons

The differential effect of CSS on NAc DA activity responses to reward predictive cues versus reward per se indicates that the responsible CSS-induced changes in neural circuitry are specific to reward anticipation signalling and therefore complex. Nonetheless, as a first level of analysis, it is justifiable to investigate CSS effects on the VTA DA neurons, which constitute the major source of DA release onto NAc neurons. To do this, mice were injected in VTA with two viral vectors, each of which expressed a fluorescent protein, one under the control of the promoter for tyrosine hydroxylase (Th) to label DA neurons (EGFP+), and the other under the control of the promoter for glutamic acid decarboxylase 67 (Gad1) to label GABA interneurons (mScarlet-I+) (Fig. 4A–C). After recovery, mice underwent CSS (n = 6) or CON (n = 6) and, after an interval of 3 days (to achieve uniformity with the neuro-behavioural experiments), were euthanized and perfused with PBS for blood-free brain collection. From the frozen brains, coronal sections including the VTA were cut at 10 µm, mounted onto PET membrane slides and dehydrated-fixed. Using laser capture microdissection, samples (Ø = 35 µm) of EGFP+ tissue, i.e., the putative cell bodies of DA neurons, were collected, whilst avoiding any tissue samples that were also mScarlet-I+, i.e., overlapping putative cell bodies of GABA interneurons (Fig. 4D–F). Per mouse, n = 500 samples were collected, pooled and lysed, and RNA extraction and library preparation were followed by RNA sequencing.

Fig. 4: Absence of effect of chronic social stress on population-level transcriptome expression of VTA DA neurons.

A Experimental design. Surgery + R + AAVV: stereotactic surgery, recovery, expression of AAV vectors; CSS/CON: CSS protocol or control handling; BP: brains were perfused with PBS; LCM: laser capture microdissection; RNA-Seq: RNA-sequencing and differential gene expression analysis. B Representative coronal image (20x) from brain of a mouse injected with AAV mTH-EGFP in the VTA at bregma −3.1 mm, and ex vivo immunostaining for TH. Both the AAV signal and the immuno-TH signal are concentrated in the VTA, whilst there is also immuno-TH signal in the substantia nigra pars compacta (SNc). Scale bar = 500 µm. C Schematic showing bilateral injection site of AAV mTH-EGFP and AAV hGAD67-mScarlet-I in VTA. D Figure of coronal section from mouse atlas49 at bregma level −3.08 with VTA highlighted. E Representative coronal image (5x) from brain of a control mouse at bregma −3.1 mm pre- and post-LCM collection of EGFP+ tissue. Scale bar = 1000 µm. Inset: representative coronal image (20x), with white circles indicating areas of EGFP+ tissue demarcated for LCM. Scale bar = 100 µm. F Representative coronal image (20x) using the FITC channel to visualise EGFP+ tissue and the TRITC channel to visualise mScarlet-I+ tissue. 1 = EGFP+ tissue samples for collection; 2 = EGFP+/mScarlet-I+ tissue samples not for collection; 3 = mScarlet-I+ tissue samples not for collection. Scale bar = 100 µm. G Expression levels (transcript per million; individual and group mean values) of cell type-specific marker genes in CON and CSS mice: N = neuron, N-DA = DA neuron, N-Gl = glutamate neuron, N-GB = GABA interneuron, A = astrocyte, OL = oligodendrocyte, OP = oligodendrocyte progenitor cell, M = microglia. H Volcano plot for differential gene expression in CSS compared with CON mice: significantly up-regulated genes in CSS mice are shown in red and significantly down-regulated genes are shown in blue. Image (C) was created with BioRender.com.
Image (D) was used with permission of Elsevier, from The Mouse Atlas, G. Paxinos & K.B.J Franklin, 2nd edition, 2001; permission conveyed through Copyright Clearance Center, Inc.

Firstly, to determine whether samples comprised primarily DA neuronal somata, expression levels of brain cell type-specific marker genes were compared (Fig. 4G): there was relatively high expression of neuronal gene Snap25 (synaptosomal associated protein 25) and DA-neuron gene Th, whilst expression levels of marker genes for GABA interneurons (Gad1, Gad2) and for all other cell types were low. After filtering out genes with low expression, a median of 12,250 genes were detected in all mice. Using the 500 genes with the highest between-mouse expression variability to conduct principal component analysis (PCA), PCA and subsequent heatmap analysis identified intermingling of CSS and CON mice rather than group-level separation. Indeed, conducting differential gene expression analysis at thresholds of absolute log2-fold change (FC) > 0.5 and nominal p < 0.001 identified that of the 12,250 detected genes, only 5 were up- and 7 down-regulated in CSS compared with CON mice (Fig. 4H). Functional enrichment analysis with mouse-specific KEGG pathways identified that these dysregulated genes were not enriched in any specific gene sets/pathways. Therefore, sampling DA neurons throughout the VTA did not yield robust evidence for altered basal transcriptome status of these neuronal populations in CSS mice.
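The thresholding step described above (absolute log2 FC > 0.5 and nominal p < 0.001) can be sketched as a simple filter over per-gene results. This is an illustrative sketch only: the function name `classify_genes`, the input layout (a dictionary mapping gene to a `(log2_fc, p_value)` pair) and the gene labels in the usage note are hypothetical, and the statistical test that produces the p values is not shown.

```python
def classify_genes(results, lfc_threshold=0.5, p_threshold=0.001):
    """Split genes into up- and down-regulated sets (CSS relative to CON).

    results : dict mapping gene name -> (log2_fc, nominal_p)
              (hypothetical input layout, for illustration)
    A gene is called dysregulated when |log2_fc| > lfc_threshold
    and nominal_p < p_threshold, matching the thresholds in the text.
    """
    up, down = [], []
    for gene, (log2_fc, p_value) in results.items():
        if p_value < p_threshold and abs(log2_fc) > lfc_threshold:
            if log2_fc > 0:
                up.append(gene)
            else:
                down.append(gene)
    return sorted(up), sorted(down)
```

As a usage example with made-up values, `classify_genes({"gene_a": (0.8, 0.0002), "gene_b": (-0.9, 0.0005), "gene_c": (1.2, 0.02)})` places `gene_a` in the up-regulated list and `gene_b` in the down-regulated list, while `gene_c` is excluded because its nominal p value does not pass the threshold.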

Discussion

The association between environmental stressors and reward pathologies that are major symptoms in various mental disorders is well-recognised. It is widely assumed that changes in dopamine signalling contribute causally to this inter-relationship, but the empirical evidence for this is sparse4,5. Animal studies with genetically-encoded DA sensors enable imaging of DA release related to event-behaviour and behaviour-event associations, with high spatial and temporal resolution17,18,19,20,36. When incorporated into animal models of stress-induced deficient reward-directed behaviour30,31, the DA sensors provide a novel opportunity to increase understanding of the changes in region-specific DA function associated with, and potentially causal to, specific behavioural deficits. The behavioural tests need to allow for the study of distinct reward processes, including anticipation (incentive motivation) and reinforcement. This is not possible with the widely used sucrose preference test; furthermore, CSS32 and several other environmental manipulations e.g., early life stress37, do not lead to a reduced preference for sweet taste. In the present study, we obtained evidence that chronic social stress-induced reduced discriminative reward learning and effortful reward valuation both co-occur with lower nucleus accumbens DA activity at some specific test phases and not at others. As such, this study provides insights into the specific reward processes that are impaired by chronic stress and related to decreased NAc DA activity, and for which restoration of typical NAc DA activity could constitute an effective treatment strategy. Whilst it would be interesting to know whether there is sexual dimorphism in the effects of chronic (social) stress on reward-directed behaviour and dopamine signalling in mice, the CSS protocol as applied in this study only functions with males. 
Given that our final aim for such experiments is to identify potential novel molecular targets for psychotropic drugs, then assessing whether the molecular targets are similarly expressed in the brains of female and male mice is certainly an essential part of the preclinical validation. Other laboratories use the approach of manipulating the resident male mice so that, atypically, they are also aggressive towards intruder females, for example using chemogenetic activation of the hypothalamus, allowing for the study of chronic stress effects in male and female mice38.

In the DRLM test, a novel tone DS signals that an operant response at a feeder port will result in sucrose reinforcement. In CON mice, by DRLM test 3, the higher learning ratio indicated acquisition of reward anticipation. Discriminative learning-memory co-occurred with increases in NAc DA release at DS-feeder approach, -feeder response and -sucrose reinforcement. In test 3, the high NAc DA activity concomitant with expected reward could reflect on-going learning of the causal sequence “DS causes response causes reward”, in which NAc DA conveys and guides retrospective causal learning39. In a study of DA neuron activity during novel cue-reward learning, neuron activity resulted from the summation of sensory cue responding and reward-directed behaviour40; such summation of NAc DA activities related to continuous DS-feeder response - reward could account for the current findings in CON mice. (It is important to note that high-DA reward responding was specific to the discriminative learning phase and DA activity decreased post-learning40). That NAc DA activity at sucrose reinforcement increased positively with reward anticipation suggests that whilst reward prediction error (RPE) is likely to be relevant to discriminative reward learning, three tests of 25 trials were insufficient for RPE to be established. In a mouse study in which a large number of tone-sucrose discriminative learning sessions were applied, it was indeed the case that the major NAc core DA activity shifted forward from sucrose retrieval to discriminative tone onset, in accordance with the RPE model41. Furthermore, in the present study, that CON mouse NAc DA activity at sucrose reinforcement increased as the interval between DS onset and reward reinforcement decreased, is also to some extent consistent with the RPE model42,43.

Turning to the effects of CSS, the reduced DS-on phase NAc DA activity in CSS relative to CON mice was already present at DRLM test 1. At test 3, the DS-on phase NAc DA activity in CSS mice was still close to baseline and now even lower relative to that in CON mice. This co-occurred with a learning ratio in CSS mice that indicated minimal acquisition of the sequence, DS-feeder response - reward, i.e., the latency from DS onset to feeder response remained similar to the interval between feeder responses during ITIs, and long relative to CON mice. In tests 1 and 3, CSS mice displayed increased NAc DA activity at DS-sucrose reinforcement, equivalent to that in CON mice at test 1. Therefore, CSS attenuated NAc DA signalling of reward anticipation in terms of DS causes response causes reward, whilst being without effect on NAc DA signalling of sucrose reward per se.

In the REV test, a progressively increasing number of operant responses was required for successive triggering of a 1-sec tone DS that signalled sucrose reward availability. In CON mice, operant responding did not co-occur with consistent changes in NAc DA release. This contrasts with a rat study in which nose-poke responses to discriminative cues on a FR 1 schedule triggered food release: transient increases in NAc DA activity occurred both in response to discriminative cues and directly prior to nose-poking responses for trial initiation and reward retrieval20. The absence of a relationship between NAc DA activity and operant responding in the present study could be due to the unpredictable and/or the increasingly effortful PR schedule of reinforcement used. On completion of the required ratio as signalled by the presentation of the DS, and independently of the current ratio, there was a marked increase in NAc DA activity. The NAc DA activity then declined gradually during the 2 sec required to approach the feeder and retrieve the sucrose; the latter resulted in another increase in NAc DA activity similar in amplitude to that elicited by the DS.

In CSS mice, relative to CON mice, the number of operant trials completed was reduced and the duration of the operant phase was prolonged; during the latter, NAc DA activity remained basal, as in CON mice. The increase in NAc DA activity in response to the DS was lower in CSS than CON mice: this suggests that CSS leads to reduced NAc DA signalling of the association “operant response causes DS”. This could be due to a reduced “response causes DS” association, reduced “DS causes reward” association, or both. CSS-induced attenuation of NAc DA signalling of “response causes DS” would be the inverse of “DS causes response” in the DRLM test, and again places focus on deficits in NAc DA signalling of the reward anticipation processes that precede primary reward reinforcement. Reduced anticipation in the response-DS association could then account for slower operant responding and longer post-reinforcement pause. The progressively effortful schedule deployed could be particularly sensitive to detecting such a deficit. In contrast, CSS mice had a similar increase in NAc DA activity at sucrose retrieval, indicative of intact responsiveness to primary reinforcement. Furthermore, the duration of the DS phase was similar in CSS and CON mice (as was consummatory sociosexual behaviour in the SOM test (see below)). This provides additional evidence that the CSS-induced deficits in reward-directed behaviour were specific to anticipation, and the absence of deficits in sucrose- and female-directed consummatory behaviours also indicates that CSS does not lead to general impairment of motor behaviour.

In the SOM test, the primary reinforcers were distal and then proximal contact with a (pro-)estrous female mouse, stimuli known to increase NAc DA release transiently and markedly17,19. The CSS mice required more time than CON mice to complete the FR reinforcement schedule, as was the case in the REV test that used a PR schedule. Also as in the REV test, there was no consistent change in NAc DA activity during operant responding, neither in CON nor CSS mice. The completion of the operant ratio triggered door opening and the post-operant phase. The absence of a DS to signal operant completion precludes direct comparison with the REV test on whether reduced operant motivation co-occurred with attenuated NAc DA release to a discrete DS. The CSS and CON mice had a similar, robust increase in NAc DA activity on first entering the tunnel to the social compartment, which likely constitutes a constellation of conditioned and primary (e.g. female visual and olfactory stimuli) reinforcers. This evidence for intact NAc DA activity during primary reinforcement in the SOM test added to that obtained in the REV test. Furthermore, for the first social contact episode, NAc DA release was actually higher in CSS than CON mice, which was perhaps indicative of an increase in salience of social contact per se following the absence thereof during CSS.

Therefore, the overall evidence is that the CSS-induced reductions in reward learning and motivation are associated with decreased NAc DA activity during discriminative stimulus—operant response and operant response—discriminative stimulus phases of reward anticipation, whilst CSS leaves NAc DA activity during primary reinforcement largely intact. Concerning the pathways that could contribute to these deficits, one candidate would be the VTA DA neurons themselves. To investigate this, we assessed whether CSS resulted in consistent changes in the population-level basal transcriptome expression of VTA DA neurons, but this was not the case. Of course, this does not preclude the possibility that CSS alters the responsiveness of the transcriptome to reward stimuli. In a study26 of the effects of 10-day chronic social defeat (CSD), CSD mice were divided into susceptible or resilient depending on whether or not, respectively, they passively avoided a mouse of the aggressor strain in a subsequent social interaction test. Susceptible mice displayed a sucrose preference that was lower than that in control mice whereas resilient mice did not. Using VTA tissue to extract RNA and conduct gene expression microarray, differential gene expression at uncorrected p < 0.05 identified about 70 dysregulated genes in susceptible mice and more than twice as many dysregulated genes in resilient mice. Slice preparation electrophysiology demonstrated that spontaneous VTA DA neuron firing was increased in susceptible mice specifically. In a subsequent study, it was demonstrated that optogenetic induction of phasic, but not tonic, firing of VTA DA neurons projecting to NAc led to passive avoidance in the social interaction test and a decrease in sucrose preference after only 1 day of social defeat34. 
That there were more dysregulated genes in the above-mentioned study than in the present study could be due to the analysis of VTA tissue rather than VTA DA neurons and certainly to the more relaxed significance level. Our current finding of reduced NAc DA activity coincident with reduced reward anticipation in CSS mice is difficult to reconcile with the increased phasic firing of VTA DA-NAc neurons in susceptible CSD mice, although it is important to note that reward anticipation-related NAc DA release has been demonstrated to be independent of the phasic firing of VTA DA-NAc neurons20. Somewhat in contrast to CSD, chronic unpredictable mild stress in mice led to decreases in the frequency of burst firing events and the number of spikes per burst in VTA neurons25, which might reflect changes inherent to VTA neurons or their afferent projections. With respect to the neural circuitry underlying behaviour in the DRLM test, we have reported recently that the glutamate neurons projecting from the basal amygdala to NAc are in a state of increased activity during the DS-on phase and that this is inhibited by CSS. Furthermore, chronic, viral vector-mediated tetanus toxin inhibition of basal amygdala-NAc neurons replicated the behavioural effects of CSS in the DRLM test30. The lateral and basal amygdala nuclei, including the basal amygdala neurons projecting to NAc, are major regions in the neural circuitry of Pavlovian, discriminative and operant reward processing30,44,45, as are the bidirectional amygdala-medial prefrontal cortex pathways and particularly with respect to operant reward processing46. The current findings indicate the importance of identifying the neural pathways that are responsible for regulating NAc DA activity in relation to reward anticipation specifically, and that are sensitive to stress.

With the design of this mouse study having been informed by the human evidence, it is essential to integrate its findings with this evidence, in particular with the monetary incentive delay task-fMRI studies reporting that BOLD signal is reduced in ventral striatum during reward anticipation and not reinforcement in MDD patients relative to healthy controls7,8,9. The translational findings are therefore consistent with MDD and chronic stress leading to reduced NAc DA signalling during reward anticipation/incentive-motivation, specifically. As such, this mouse model can now be applied to provide further advances. First, in CSS mice, it will be important to augment NAc DA activity during predictive cue exposure specifically, for example using optogenetic photostimulation, to determine whether the deficit causes the impaired reward learning and motivation observed. Second, if the relationship is causal, the model will allow for identification of the mediating neural pathways and aetio-pathophysiological processes, and then discovery of a neuropharmacological mechanism-of-action that restores adaptive NAc DA signalling to treat amotivational symptoms such as anhedonia and apathy.

Methods

Animals

Experiments were conducted with C57BL/6J (BL/6) male mice bred in-house, aged 12–14 weeks and weighing 26–30 g at experiment onset. Mice were weaned with same-sex littermates at age 4 weeks, and caged in littermate pairs from age 5–6 weeks until the end of the experiment or the onset of chronic social stress. Cages measured 33 × 21 × 14 cm in an individually ventilated caging system. The temperature was kept at 21–23 °C and humidity at 50–60%, and the light cycle was reversed with lights off from 07:00 to 19:00 h. Standard diet (Complete pellet, Provimi, Kliba AG, Kaiseraugst, Switzerland) was provided ad libitum except during behavioural conditioning/testing (see below). Water was provided ad libitum including during conditioning/testing. All experimental procedures were conducted during the dark phase, between 09:00 and 17:00 h. The experiments were conducted under animal experiment licenses issued by the Veterinary Office of Canton Zurich (ZH-155/2018 and ZH-038/2022). We have complied with all relevant ethical regulations for animal use.

Experimental designs

Three experiments were conducted: (1) Effects of chronic social stress (CSS) on sucrose-rewarded behaviour and NAc DA activity were investigated in CSS mice (n = 20) versus control mice (n = 14), and n = 6 mice for NAc EGFP control of the NAc DA signal. (2) Effects of CSS on female (sociosexual)-rewarded behaviour and NAc DA activity were investigated in CSS mice (n = 16) versus control mice (n = 16), and n = 6 mice for NAc EGFP as control for the validity of the GRABDA sensor signal. Both experiments began with the handling of each mouse for 5 min/day on 3 consecutive days. In the first week, daily baselines for body weight and food consumption were determined. Mice were conditioned with sucrose pellets for testing of reward-directed behaviour in the case of the sucrose reward experiment, and with sucrose pellets and then a female in the case of the sociosexual reward experiment. This was followed by stereotactic surgery for viral vector-GRABDA sensor injection and optic fibre implantation in NAc, and 10 days of recovery. Mice underwent CSS or control handling, and then behavioural testing combined with fibre photometry. Ex vivo histological assessment of the viral vector injection site and optic fibre placement was conducted. (3) Effect of CSS on the population-level transcriptome expression of VTA DA neurons was investigated in CSS mice (n = 6) versus control mice (n = 6). Mice were handled, followed by stereotactic surgery for DA- and GABA-neuron viral vector injection in VTA and 14 days for recovery and expression. Mice underwent CSS or control handling, and after an interval of 3 days to correspond to the interval between CSS and behavioural testing in experiments 1 and 2, mice were euthanized and brains PBS perfused for laser capture microdissection of populations of VTA DA neurons, followed by RNA extraction and transcriptome sequencing.

Conditioning for behavioural testing

Sucrose reward experiment

Controlled feeding and body weight

Prior to conditioning (training), body weight (BW) per mouse and food intake per littermate pair were measured for each 24 h across 1 week. Beginning the following week, mice were food restricted so that BW was reduced to 90-95% of baseline (BBW); this ensured adequate motivation for conditioning using sucrose pellet reinforcement. On the day prior to the onset of conditioning, mice were familiarised with the sucrose pellets to be used as reinforcement in the home cage.
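As a worked illustration of the food-restriction target, the 90–95% BBW window can be computed per mouse from the baseline week. The sketch below assumes that BBW is the mean of the daily weights recorded across the baseline week, which is an assumption rather than a stated detail; the function names are hypothetical.

```python
from statistics import mean


def baseline_body_weight(daily_weights_g):
    """Baseline body weight (BBW) in grams.

    ASSUMPTION: BBW is taken as the mean of the daily weights from the
    baseline week; the text does not state how the daily values are combined.
    """
    if not daily_weights_g:
        raise ValueError("at least one baseline weight is required")
    return mean(daily_weights_g)


def restriction_window(bbw_g, lower=0.90, upper=0.95):
    """Return the (minimum, maximum) target weight for 90-95% of BBW."""
    return lower * bbw_g, upper * bbw_g


def in_window(current_bw_g, bbw_g):
    """True if the current weight lies within the 90-95% BBW window."""
    low, high = restriction_window(bbw_g)
    return low <= current_bw_g <= high
```

For a mouse with a BBW of 28 g, which is within the 26–30 g range at experiment onset, the target window is approximately 25.2–26.6 g.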

Apparatus

Modular chambers had inner dimensions of 20 × 17 × 18 cm and a house light provided 10 lux illumination; four such chambers, each placed within an attenuation chamber into which background white noise was presented, were run in parallel by a control PC and interface (TSE Systems, Bad Homburg, Germany)29,30,31,47. A feeder port was located in the middle of one side wall. Food pellets were delivered singly into the feeder port from a pellet dispenser and could be retrieved by the mouse extending its snout into the feeder (feeder response); each such response into the feeder was detected via an infrared motion sensor and recorded. A nose-poke port for operant responding could be inserted into the side of the feeder (centre-to-centre distance = 55 mm); a white LED set into its rear was illuminated to indicate it was active, and operant responses were detected via an infrared motion sensor and recorded. Water was available from a bottle opposite the feeder and operant stimulus. The chamber floor and walls were wiped with 70% ethanol between mouse runs.

After stereotactic surgery (see below), for the last stage of conditioning and for testing, a photometry chamber running IntelliMaze software and connected with a TTL module was used (TSE Systems)30. It had inner dimensions of 21 × 27 × 27 cm and an opening along the centre of the ceiling allowed for unrestricted movement of a patch cord. It was fitted with a house light providing 10 lux. A feeder port located in the centre of one side wall extended into the chamber, and thereby enabled mice fitted with a cranial optic fibre and patch cord to retrieve pellets. Each response into the feeder was detected via an infrared motion sensor. Reward pellets were delivered from a dispenser directly into the feeder. An operant nose-poke port, enlarged to accommodate the mouse’s head with optic fibre, could be inserted to the left of the feeder on the same side wall; a white LED set into its rear indicated it was active, and operant responses were detected via an infrared motion sensor. The centre-to-centre distance between operant port and feeder port could be set to 55 mm (“near”) or 110 mm (“far”). Water was available from a bottle placed at the opposite side wall. The set-up was placed within an attenuation chamber into which white noise was presented.

Conditioning

Conditioning sessions were conducted on consecutive days and each had a maximum duration of 30 min29,30,31,47. Mice were trained with sucrose pellets (14 mg, F05684 Dustless Precision Pellets, Bio-Serv). All training steps were conducted in the absence of tone stimuli. At stage 1, without an operant port in the chamber, mice learned that sucrose pellets were available in the feeder port. Firstly, 15 pellets were placed in the feeder at session onset and 1 further pellet was delivered automatically each 45 s. At stage 2, 1 pellet was placed in the feeder at session onset and 1 further pellet was delivered automatically each 45 s, and mice were required to retrieve and eat at least 30 pellets on 2 consecutive sessions. At stage 3, mice were required to make a response in the feeder port to trigger pellet delivery (0.3-0.5 s delay) and the learning criterion was 2 consecutive sessions with at least 30 pellets retrieved and eaten. At stage 4, the operant port was introduced, and mice learned that 1 operant response (fixed ratio 1, FR1) into the illuminated port was required to extinguish the LED and trigger pellet delivery; the subsequent feeder port response for pellet retrieval was followed by a 5 s time out and the operant port was then active (LED on) again. In FR1 sessions 1-3, 5, 3 and 1 pellets, respectively, were placed in the operant port, and thereafter no pellet. Mice were required to complete at least 30 FR1 trials and consume at least 30 pellets in 2 consecutive sessions. At stage 5, mice were transferred into the photometry conditioning chamber, and were required to complete at least 20 FR1 trials and consume at least 20 sucrose pellets (20 mg, F0071 Dustless Precision Pellets, Bio-Serv) with the operant port “near” and then “far”, respectively. 
In the final FR1 “far” session, chocolate-flavoured sucrose pellets (20 mg, F05301 Dustless Precision Pellets, Bio-Serv) were used; mice preferred these to the training pellets, and they were the relatively novel gustatory stimulus used for testing. Mice required 15-17 days to complete the 5 training stages.
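The learning criterion used at several stages (a minimum number of pellets on 2 consecutive sessions) can be expressed as a short check. The following is a minimal sketch only: the function name and the example session counts are hypothetical, and it covers the pellet criterion alone, not the additional FR1 trial count required at stages 4 and 5.

```python
def criterion_session(pellets_per_session, minimum=30, consecutive=2):
    """Return the 1-based index of the session at which the criterion is first
    met (at least `minimum` pellets on `consecutive` sessions in a row),
    or None if it is never met."""
    run = 0
    for index, pellets in enumerate(pellets_per_session, start=1):
        # A session below the minimum resets the run of qualifying sessions.
        run = run + 1 if pellets >= minimum else 0
        if run >= consecutive:
            return index
    return None


# Hypothetical pellet counts for one mouse across five sessions.
example = criterion_session([12, 31, 28, 30, 35])
```

For stage 5 the same check would be called with `minimum=20`.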

At days 13-14 post-surgery (see below), mice experienced operant responding and sucrose pellet retrieval with the patch cord attached to the optic fibre: they had a conditioning session with operant port present (REV-test condition, see below) and the following day a conditioning session with operant port absent (DRLM-test condition, see below).

Sociosexual reward experiment

Controlled feeding and body weight

Because sucrose was used as the initial reinforcer for mice that were tested with sociosexual reinforcement, BW and food intake were measured as described above.

Apparatus

The test arena was constructed from transparent Plexiglas and measured 48 × 38 × 21 cm. The arena was divided at the centre of its long side by a wall (depth = 8 cm) that contained: (1) An operant port activated by nose-poke; a white LED indicated it was active and operant responses were detected via an infrared motion sensor and recorded. (2) A sliding door at the opening of a tunnel that connected the two compartments. The system was connected with a TTL module and ran IntelliMaze software (TSE Systems). An opening along the centre of the removable lid allowed for unrestricted movement of the patch cord. The arena was placed within an attenuation chamber that contained a house light (10 lux) and a loudspeaker for white noise. The arena floor, walls and door were wiped with 70% ethanol between mouse sessions.

Conditioning

Conditioning sessions were conducted on consecutive days. At stage 1, mice were food restricted so that BW was 90-95% BBW; the sliding door was open, and mice explored and ate chocolate sucrose pellets placed in a small dish in each compartment. At stage 2, food-restricted mice were placed in the operant compartment and underwent 5 operant FR trials per daily session; the switching on of the LED in the operant port signalled trial onset. Operant conditioning began at FR 1,1,1,1,1, with a maximum of 120 s allowed per trial, and 60 s allowed for passing through the tunnel and collecting and eating the 2 pellets. If all trials were completed, mice progressed daily to a more effortful schedule (e.g. FR 1,1,1,3,5) until the final condition of FR 3,5,5,5,5. On completing the final FR session, mice were placed on ad libitum feeding. At stage 3, on each of two days, FR 3,5,5,5,5 was used, and the reinforcer was an adult female mouse placed underneath an inverted stainless steel wire pencil cup, with which the mouse could interact for 60 s. Mice required 13–16 days to complete the 3 training stages.

Stereotactic surgery and adeno-associated viral vectors

Stereotactic surgery was conducted according to our previously published protocol30,48. Both mice per littermate pair were operated successively on the same day, either both in the left or right hemisphere, with alternation between successive littermate pairs. For analgesia, buprenorphine (Temgesic, 0.1 mg/kg s.c.) was administered 0.5-1.0 h pre-operatively. Mice were anaesthetized using isoflurane in pure oxygen, 4% for induction followed by 1.5-1.75% for maintenance. The mouse was placed in a stereotactic frame (Angle Two™, Leica) and a heating pad was used to maintain body temperature. Ophthalmic ointment was applied to the eyes (Viscotears, Novartis) and disinfectant (Betadine) was applied to the incision site. An incision was made at the cranial midline, and local anaesthetic (lidocaine 10 mg/kg and bupivacaine 3 mg/kg) was applied. Skin and connective tissue were pulled to the sides, and a burr hole (Ø = 300 µm) was drilled into the cranium.

In experiments 1 and 2, to quantify release of DA in the NAc (referred to here as NAc DA release or activity), a GRABDA sensor adeno-associated viral vector, pAAVss_hsyn-GRAB-DA4.4 (1.1 × 10¹³ vg/ml; Boehringer Ingelheim Pharma GmbH)17, was injected in a volume of 350 nl. As a control to determine whether certain behaviours (e.g. operant responding, pellet retrieval) generated movement-related artifacts in the fibre photometry signal, additional mice were injected in the NAc with an EGFP viral vector, ssAAV-9/2-hSyn1-EGFP-WPRE-hGHp(A) (2.9 × 10¹³ vg/ml, 350 nl; Viral Vector Facility, ETH and University of Zurich). Injection of viral vector was conducted using a 10 µl NanoFil™ microsyringe fitted with a 33 G bevelled stainless-steel needle and connected to an ultra-micro pump (UMP3, Micro4, World Precision Instruments), at a rate of 50 nl/min. After injection the microsyringe remained in position for 10 min and was then withdrawn slowly. A fibre-optic probe (Ø = 200 µm) was implanted directly dorsal to the injection site. Stable adhesion of the probe onto the cranium was achieved as described previously48. The coordinates were set to inject into the nucleus accumbens (NAc) core (at the border with the lateral shell) at bregma anterior-posterior (AP) + 1.10 mm, medial-lateral (ML) ± 1.50 mm, dorsal-ventral (DV) −4.60 mm, according to a mouse brain atlas49. These coordinates resulted in minimal injection into the anterior commissure. The fibre-optic probe was implanted 0.15 mm above the injection site (bregma AP + 1.10 mm, ML ± 1.50 mm, DV −4.45 mm). The mouse was returned to its home cage and remained on a heating pad until it was observed to be active, which required 0.5-1.0 h. Buprenorphine was injected at 4–5 h and 8–10 h post-surgery and administered via the drinking water for 3 days. Mice were weighed and wound healing was monitored for 10 days post-surgery.

Chronic social stress (CSS)

In the sucrose reward experiment, mice were allocated to CON (n = 14) and CSS (n = 20), and in the sociosexual reward experiment to CON (n = 16) and CSS (n = 16); in each experiment littermate pairs were allocated to group by counterbalancing on BBW and required number of conditioning sessions. The chronic social stress (CSS) procedure used is based on the resident-intruder paradigm and includes refinements from similar procedures50,51. Resident mice were unfamiliar, aggressive, ex-breeder CD-1 males aged 8–10 months and weighing 40–55 g, caged singly. On the day prior to the onset of CSS, a transparent, perforated plastic divider was placed along the length of the home cage of each CD-1 mouse, separating the cage into two equal compartments. On day 1 of CSS, BL/6 littermate pairs allocated to the CSS group were separated and placed singly in the cages of CD-1 mice: the CSS mouse and CD-1 mouse remained together for a cumulative total of 60 s physical attack or 10 min maximum. In contrast to the standard CSS protocol, the central divider was removed from the cage to prevent the optic fibre from becoming caught in the divider perforations30,47. After this acute proximal stressor, the divider was re-inserted in the cage and the CSS and CD-1 mice were placed in separate compartments and remained in distal (visual, olfactory, auditory) contact for 24 h. The following day, the CSS/CD-1 mouse pairings were rotated so that each CSS mouse was placed with a novel CD-1 mouse, firstly for proximal attack and then for distal exposure, and this continued across days. The total duration of the CSS protocol was 15 days. It is essential that the emotional stressor of CSS is not confounded by bite wounds; therefore, in addition to the refinement of timing and restricting the daily attacks to 60 s maximum, the lower incisor teeth of CD-1 mice were trimmed every 3 days50.
In the sucrose reward experiment the mean cumulative duration of daily attack experienced by CSS mice was 50.6 ± 4.8 s (mean ± SD; range: 41.4–56.0 s) and in the sociosexual reward experiment was 47.7 ± 5.5 s (43.9–54.3 s). All CSS mice displayed submissive behaviour and vocalisation during the proximal stressor. The mice in the control or comparison group (CON) comprised littermate pairs that were handled for 1 min on each of the 15 days. From day 15 of CSS until the end of the experiment, each CSS mouse remained in the same divided cage with the same CD-1 mouse without further attacks50.

In the sucrose reward experiment, at days 5–12 of the CSS/CON protocol, BW and food intake were measured daily; mean values of BW and daily food intake were used as re-baseline values for these parameters (re-BBW, re-B-food intake) and applied during testing (Table S1).

Fibre photometry

Fibre photometry for optical recording of neural activity in freely moving mice was conducted as described previously30,48. Briefly, a laser as excitation light source, a high sensitivity photoreceiver, and customised software for signal processing, were used. A 488 nm laser light was focused into a fibre patch cord and delivered at the optic fibre tip in the NAc. Openings in the centre of the ceilings of attenuation chambers and behavioural test arenas allowed for unrestricted movement of the patch cord. The latter was connected to the optic fibre ferrule on the mouse cranium via a ceramic sheath. Back-propagated GRABDA-sensor or EGFP fluorescence was focused on the photoreceiver, and custom-written software code was used for data acquisition (LabVIEW, 2020). Fibre photometry data were analysed using MATLAB. According to experiment and specific test, one or more of feeder port response, operant port response and tone-onset each generated a TTL signal that was recorded simultaneously with the photometry signal. Optical signal data were demodulated at 970 Hz and downsampled to a sampling frequency of 20 Hz.
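As a rough illustration of how a TTL event time could be mapped onto the 20 Hz photometry trace, the sketch below converts an event time and a window (both in seconds) into sample indices. The function name is hypothetical, and the assumption that TTL timestamps and the photometry trace share a clock starting at 0 s is illustrative rather than taken from the acquisition software.

```python
FS_HZ = 20  # sampling frequency of the downsampled trace (0.05 s per sample)


def window_indices(event_time_s, start_s, stop_s, fs=FS_HZ):
    """Return (first, last_exclusive) sample indices for a window defined
    relative to an event, e.g. start_s=-10, stop_s=0 for a 10 s baseline.
    Assumes sample 0 of the trace was recorded at time 0 s."""
    first = round((event_time_s + start_s) * fs)
    last = round((event_time_s + stop_s) * fs)
    if first < 0:
        raise ValueError("window starts before the beginning of the recording")
    return first, last


# Event at 12 s: 10 s baseline before it, 5 s response window after it.
baseline = window_indices(12.0, -10.0, 0.0)
response = window_indices(12.0, 0.0, 5.0)
```

At 20 Hz a 10 s baseline corresponds to 200 samples and a 5 s window to 100 samples, i.e. ten 0.5 s intervals of 10 samples each.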

Behavioural testing and NAc DA activity imaging

Sucrose reward experiment

On the day after completion of CSS/CON, mice were placed in the conditioning chamber without any stimuli and connected to the patch cord: the GRABDA or EGFP photometry signal of each mouse was recorded for 15 min to check for stimulus-related peaks in the signal; one CSS mouse did not show any signal peaks and was excluded from the experiment thereafter. Starting on day 13 of CSS/CON and continuing until the last day of testing, mice were mildly food restricted to yield 95–100% re-BBW directly prior to each test session: the required amount of normal diet was placed in the home cage 2–3 h after testing and all food was consumed prior to testing on the next day. Using only mild food restriction minimises the effect of homoeostatic hunger on behaviour and thereby maximises test sensitivity to gustatory reward salience (Table S1)29,31. Chronic social stress leads to an increase in daily food intake required to maintain stable BW; this is associated with lower plasma leptin and higher plasma ghrelin levels28,29,31,52. Therefore, CSS mice needed to be provided with more normal diet to maintain their BW at 95–100% re-BBW during testing29,30,31. To verify that there were no differences in homoeostatic hunger between groups/subjects, in the final behavioural test (see below), a pellet (3 g) of normal diet was placed on the chamber floor as a low-effort/low-reward alternative to chocolate pellets (choice test): mice would consume a large amount of normal diet relative to chocolate pellets only if behaviour was motivated primarily by homoeostatic hunger and not by gustatory reward; typically, control and CSS mice consume a low and similar amount of normal diet under these test conditions29,30,31.

Discriminative reward learning-memory (DRLM) test

Beginning 2 days after the CSS/CON protocol, mice underwent a DRLM-fibre photometry test on 3 consecutive days30. The chamber contained the feeder port and no operant port. Following 30 s delay, trial 1 was initiated by presenting a novel tone at 5 kHz and 80 dB; the tone had a maximum duration of 25 s and during this time one feeder port response triggered chocolate pellet delivery (delay 0.3-0.5 s) and tone termination after 1 s. The interval between consecutive tones was 40 ± 20 s (variable inter-trial interval, ITI). Feeder responses during the ITIs were counted but without consequence. Therefore, the tone serves as a discriminative stimulus (DS) that signals when a feeder port response will be rewarded; the higher the reward salience, the greater the amount of discriminative learning expected, measured as a relative decrease in response latency during DS compared with ITI. Successive tests allowed for the study of discriminative learning-memory. Per DRLM test, the maximum number of DS trials was 25 and session duration was set to 30 min (maximum) to ensure that all mice received 25 trials. In each test, all 25 trials were analysed and the measures of interest were: number of chocolate pellets obtained (=number of trials on which a DS feeder response was made), median DS response latency (no DS response = 25-s latency), median ITI response interval (ITI duration (s)/feeder responses per ITI), and discriminative learning ratio calculated as median ITI response interval/median DS response latency.
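The behavioural measures defined above can be computed per test as in the following minimal sketch. The function name and example values are hypothetical. ITIs without any feeder response are skipped here to avoid division by zero; this is an assumption, as the handling of such ITIs is not specified.

```python
from statistics import median

DS_MAX_S = 25.0  # maximum tone duration; latency assigned when no DS response occurs


def drlm_measures(ds_latencies, itis):
    """ds_latencies: one entry per DS trial, latency in s or None if no response.
    itis: one (duration_s, n_feeder_responses) tuple per inter-trial interval."""
    latencies = [DS_MAX_S if lat is None else lat for lat in ds_latencies]
    # ITI response interval = ITI duration / number of feeder responses in that ITI.
    # Assumption: ITIs with zero responses are left out of the median.
    intervals = [duration / n for duration, n in itis if n > 0]
    median_latency = median(latencies)
    median_interval = median(intervals)
    return {
        "pellets": sum(lat is not None for lat in ds_latencies),
        "median_ds_latency_s": median_latency,
        "median_iti_interval_s": median_interval,
        "learning_ratio": median_interval / median_latency,
    }


# Hypothetical data: four DS trials (one without response) and four ITIs.
example = drlm_measures(
    [2.0, 4.0, None, 3.0],
    [(40.0, 4), (30.0, 2), (60.0, 0), (50.0, 5)],
)
```

A ratio above 1 indicates that responses were made faster during the DS than the typical spacing of responses during ITIs.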

For analysis of fibre photometry signal data (NAc DA activity, EGFP), all 25 DS trials of each test were analysed; they were categorised as trials with response or without response. Each trial with response was analysed individually and was subdivided into the following phases: the 10 s prior to DS onset was the trial-specific baseline phase in terms of signal intensity. From DS onset until a feeder response was the DS-on phase, which was time normalised and divided into 10 equivalent intervals. Time normalisation involves fixing a time phase of variable length to one standard size of arbitrary units; the time-normalised period can be divided into n intervals of equal duration53. From feeder-response onset until 5 s had elapsed was the DS-feeder phase and was divided into 10 × 0.5 s intervals. After the DS-feeder phase, the first ITI feeder response marked the onset of the ITI feeder phase, which lasted for 5 s and was divided into 10 × 0.5 s intervals. For each trial with a response, during the DS-on phase, DS-feeder phase, or ITI feeder phase, for each 0.05 s time bin (t), the z-scored (normalised) signal intensity (F) was calculated using the formula ((F(t) – F0)/SD0), where F0 and SD0 denote mean and standard deviation of baseline phase signal intensity. The mean z-scored F(t) for trials with response in trials 1–25 was calculated for each t and each test and mouse. These mean z-scored signal F(t) values were then binned into time-normalised intervals or 0.5 s intervals for statistical analysis30.
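The z-scoring and time normalisation described above can be sketched as follows. This is an illustrative Python version, not the MATLAB code used for the analysis; the function names are hypothetical, and the use of the sample standard deviation and of floor-based interval edges are assumptions.

```python
from statistics import mean, stdev


def zscore(signal, baseline):
    """z-score each sample as (F(t) - F0) / SD0, where F0 and SD0 are the mean
    and standard deviation of the baseline phase."""
    f0 = mean(baseline)
    sd0 = stdev(baseline)  # assumption: sample (n - 1) standard deviation
    if sd0 == 0:
        raise ValueError("baseline has zero variance")
    return [(f - f0) / sd0 for f in signal]


def time_normalise(samples, n_intervals=10):
    """Divide a phase of variable length into n_intervals of equal duration
    and return the mean of the samples falling in each interval."""
    n = len(samples)
    if n < n_intervals:
        raise ValueError("phase is shorter than the number of intervals")
    # Integer edges guarantee that every interval holds at least one sample.
    edges = [(i * n) // n_intervals for i in range(n_intervals + 1)]
    return [mean(samples[edges[i]:edges[i + 1]]) for i in range(n_intervals)]


# Toy values: a 3-sample baseline and a 20-sample (1 s at 20 Hz) phase.
z = zscore([2.0, 3.0, 4.0, 5.0], baseline=[1.0, 2.0, 3.0])
binned = time_normalise([float(i) for i in range(20)])
```

The same `time_normalise` call with a 100-sample, 5 s phase yields the 10 × 0.5 s intervals used for the fixed-length phases.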

Reward-to-effort valuation (REV) test

Beginning 1 day after DRLM testing, mice underwent a REV-fibre photometry test on 3 consecutive days, the final day being a chocolate pellet versus normal diet choice test30. The chamber now also contained the operant port. The session duration was 30 min and no break point was used. Each test session was initiated with operant stimulus LED illumination and progressive ratio (PR) 1: one operant port response elicited simultaneous extinguishing of the LED, 1 s tone DS (6 kHz, 80 dB), and chocolate pellet delivery into the feeder; feeder response/pellet retrieval was followed by a 5 s time out. A shallow PR reinforcement schedule was used as follows: trials 1–5 at PR 1, trials 6–10 at PR 3, trials 11–15 at PR 5, trials 16–20 at PR 7, and so on. The REV test measures reward valuation/incentive motivation, and because reinforcement is on a PR schedule it allows for measurement of reward valuation relative to aversive effort valuation in terms of nose-poke activity and time required to obtain reward. The initial test served as a transition test from the DRLM test conditions, and the data from REV tests 2 and 3 were used for analysis. The measures of interest were: total number of operant responses, number of chocolate pellets earned, final ratio attained, duration of operant responding, pellet retrieval latency, and post-reinforcement pause.
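The shallow PR schedule can be written as a simple function of trial number. The sketch below assumes that the pattern stated for trials 1–20 (blocks of 5 trials, ratio increasing by 2 per block) continues unchanged beyond trial 20; the function names are hypothetical.

```python
def required_responses(trial, block_size=5, step=2):
    """Operant responses required on a given 1-based trial:
    trials 1-5 -> 1, 6-10 -> 3, 11-15 -> 5, 16-20 -> 7, ..."""
    if trial < 1:
        raise ValueError("trial numbers start at 1")
    block = (trial - 1) // block_size
    return 1 + step * block


def cumulative_responses(n_trials):
    """Total operant responses needed to complete the first n_trials trials."""
    return sum(required_responses(t) for t in range(1, n_trials + 1))


schedule = [required_responses(t) for t in (1, 5, 6, 11, 16, 21)]
total_for_20 = cumulative_responses(20)
```

Completing the first 20 trials therefore requires 5 × (1 + 3 + 5 + 7) = 80 operant responses.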

For analysis of NAc DA activity (and EGFP signal), trials were grouped and analysed according to the progressive ratio (e.g. PR 3, PR 5) to which they pertained. Each trial was divided into the following phases: 10 s prior to the first operant response was the trial baseline phase of signal intensity. From operant response 1 until final operant response required to complete the current PR was the operant phase; it was time normalised and divided into 10 equivalent intervals. From final operant response and the 1-s DS that it elicited until feeder response was the DS phase; it was time normalised and divided into 10 equivalent intervals. From feeder-response onset until 5 s had elapsed was the feeder phase, divided into 10 × 0.5 s intervals. After the end of a feeder phase, the first ITI feeder response marked the ITI feeder phase which lasted for 5 s and was divided into 10 × 0.5 s intervals. For each completed trial at PR 3, PR 5 or PR 7, during the operant phase, DS phase or feeder phase, signal activity was z-scored as for the DRLM test. The mean z-scored F(t) for completed trials at PR 3, PR 5, PR 7 or PR 9 was calculated for each t and each test and mouse, and these mean z-scored values were then binned into time-normalised intervals or 0.5 s intervals30.
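The phase boundaries of a single REV trial follow directly from the event timestamps. The following is a minimal sketch with a hypothetical function name and hypothetical example times; it returns (start, end) times in seconds and leaves the subsequent z-scoring and binning to separate steps.

```python
def rev_trial_phases(operant_times, feeder_time, baseline_s=10.0, feeder_s=5.0):
    """operant_times: sorted times (s) of the operant responses of one trial.
    feeder_time: time (s) of the feeder response that follows reinforcement."""
    if not operant_times:
        raise ValueError("a completed trial needs at least one operant response")
    first, last = operant_times[0], operant_times[-1]
    if feeder_time < last:
        raise ValueError("feeder response must follow the final operant response")
    return {
        "baseline": (first - baseline_s, first),  # 10 s before operant response 1
        "operant": (first, last),                 # time normalised into 10 intervals
        "ds": (last, feeder_time),                # time normalised into 10 intervals
        "feeder": (feeder_time, feeder_time + feeder_s),  # 10 x 0.5 s intervals
    }


# Hypothetical PR 3 trial: three operant responses, then pellet retrieval.
phases = rev_trial_phases([100.0, 101.5, 103.0], feeder_time=104.5)
```

In this sketch a PR 1 trial has an operant phase of zero duration, because the first and final operant responses coincide.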

Sociosexual reward experiment

Adult female BL/6 mice were screened for reproductive stage: vaginal lavage was conducted by gently pipetting and triturating 50 µL sterile ddH2O at the opening of the vagina. The derived cell suspension was transferred onto a glass slide and then placed at 37 °C until dry. The cells were then stained with 50 µL 0.1% cresyl violet, cover-slipped and assessed at the microscope54. Females that were at proestrus or oestrus were included as social reward stimuli.

Sociosexual motivation (SOM) test

On the day after completion of CSS/CON, a signal test was conducted: mice were connected to the patch cord and then placed in the social test chamber with sliding door open: the GRABDA or EGFP photometry signal was recorded for 15 min to check for a sufficient and stable signal; one CSS mouse did not show any signal peaks and was excluded from the experiment thereafter. A female was then placed with the virgin male in the social test chamber and they remained together for 10 min. On each of the next 2 days, mice underwent a distal test session at FR 3,5,5,5,5, with 60-s distal interaction with a proestrous or oestrous female under an inverted cup as reinforcement on each trial. After a 2-day interval, on each of the next 2 days, mice underwent a proximal test session at FR 10, 10, with 180-s proximal interaction with a proestrous or oestrous female as reinforcement on each trial. On each test day, each trial was initiated by placing the mouse in the operant compartment and simultaneous operant-port LED illumination. After the mouse completed the required FR, the sliding door immediately opened, and the mouse could enter the stimulus compartment. A camera (model C920, Logitech) was fixed to the underside of the ceiling of the attenuation chamber and allowed for simultaneous video recording of sessions on the control PC running LabVIEW. The measures of interest were: duration of operant responding; the number and duration of the social episodes approach + contact, approach + mount, approach + copulation, regardless of whether approach was initiated by male or female.

For analysis of NAc DA activity (and EGFP signal), LabVIEW files of video recording and optical signal data were used; social events were manually time stamped onto the optical signal data. Each trial was analysed individually and divided into the following phases: from operant response 1 until the final operant response required to complete the FR was the operant phase; z-scored signal intensity was calculated using signal intensity across the entire operant phase to compute baseline F0 and SD0. The mouse entering the tunnel for the first time after door opening and the next 5 s was the post-operant phase. Thereafter was the social phase, and each social approach initiated a social episode. The mean NAc DA (or EGFP) activity during the 5 s prior to social episode onset at t = 0 s provided the measure of baseline activity. For 5 s after episode onset, regardless of the duration of the social episode that it initiated, for each 0.05 s (t), the z-scored signal intensity (F) was calculated using the formula ((F(t) – F0)/SD0), where F0 and SD0 denote mean and standard deviation of 5-s baseline activity. After the onset and offset of a social episode, if the onset of the next social episode occurred within 10 s, this latter episode was not analysed; this ensured separation between baseline signals and social episode-related signals.
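The exclusion rule for closely spaced social episodes can be illustrated as below. This is a sketch under stated assumptions: the 10 s gap is measured from the offset of the preceding episode (whether or not that episode was itself analysed), and an onset exactly 10 s after the offset is retained. The function names and example times are hypothetical.

```python
def episodes_for_analysis(episodes, min_gap_s=10.0):
    """episodes: list of (onset_s, offset_s) tuples sorted by onset.
    Drops any episode that starts less than min_gap_s after the offset of
    the preceding episode, so that its 5 s baseline is free of social contact."""
    kept = []
    previous_offset = None
    for onset, offset in episodes:
        if previous_offset is None or onset - previous_offset >= min_gap_s:
            kept.append((onset, offset))
        # Assumption: the gap is always measured from the most recent episode.
        previous_offset = offset
    return kept


def analysis_windows(onset_s, window_s=5.0):
    """Baseline (5 s before onset) and response (5 s after onset) windows."""
    return (onset_s - window_s, onset_s), (onset_s, onset_s + window_s)


# Hypothetical manually time-stamped episodes (onset, offset) in seconds.
kept = episodes_for_analysis([(5.0, 8.0), (12.0, 15.0), (30.0, 33.0), (44.0, 50.0)])
baseline_window, response_window = analysis_windows(30.0)
```

In this example the second episode is dropped because it begins 4 s after the first episode ended.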

Fibre photometry target validation

After completion of behaviour-fibre photometry testing, mice were deeply anaesthetised and underwent brain perfusion-fixation for histological assessment in terms of NAc probe placement and NAc GRABDA or EGFP expression. As described in detail elsewhere48, the optic fibre implant was removed, and the brain was sectioned coronally at 100 μm using a vibratome (Leica). Sections underwent Nissl staining (NeuroTrace 640/660 Deep-Red Fluorescent Nissl Stain, Thermo Fisher), followed by washing in PBS, mounting on microscope slides, addition of Dako/DAPI fluorescence mounting medium (Sigma Aldrich), and cover-slipping. Using an epifluorescence microscope (Axio Observer.Z.1, Zeiss), mounting medium allowed for localisation of GRABDA or EGFP expression, and Nissl staining allowed for localisation of the optic fibre placement. Using a mouse brain atlas49 the bregma level of the NAc section that included the most ventral position of the fibre tip in the NAc combined with GRABDA or EGFP expression, was identified. For the CSS-sucrose reward experiment, supplementary Fig. 1 provides representative examples of histological verification of GRABDA sensor or EGFP expression and optic fibre tip placement in NAc, as well as the estimated descriptive statistics for NAc locations of optic fibre tip and GRABDA and EGFP expression in CON and CSS mice based on histological assessments. For the CSS-sociosexual reward experiment, the estimated NAc locations of optic fibre tip and GRABDA were: CON mice, n = 16: AP: 1.18, range 1.38–0.90, ML: 1.38 ± 0.09, DV: −4.50 ± 0.15; CSS mice, n = 16: AP: 1.15, range 1.45–0.80, ML: 1.36 ± 0.10, DV: −4.42 ± 0.15.

Statistical analysis

Statistical analysis was conducted using Prism (GraphPad, version 9) or SPSS (IBM, version 29). In each of experiments 1 and 2, data sets were first assessed for outliers, using the ROUT test in Prism and Boxplot analysis in SPSS; any outliers identified were removed (one CSS mouse in the SOM test). Next, data were checked to ensure normal distribution, using the D’Agostino-Pearson normality test in Prism and the Shapiro-Wilk test in SPSS. For t tests, homogeneity of variance was ensured using the F test in Prism. For linear mixed models in SPSS, Levene’s test of homogeneity of variance was used. In the DRLM test: for each behavioural measure 2-way mixed-model ANOVA was applied with a between-subjects factor of group (CON, CSS) and a within-subjects factor of test (1-3). For each fibre-photometry phase a linear mixed model was applied with fixed effects of group (CON, CSS), test (1, 3) and sampling interval/time (1-10) and a random effect of mouse subject. In the REV test: for each behavioural measure a t test of group means was applied; for each fibre-photometry phase at a specific progressive ratio, 2-way mixed-model ANOVA was applied with a between-subjects factor of group and a within-subjects factor of sampling interval. In the SOM test: for each behavioural measure a linear mixed model was applied with fixed effects of group (CON, CSS), day (1, 2 or 3, 4) and trial (1-5 or 1, 2) and a random effect of mouse subject. For each fibre-photometry phase a linear mixed model was applied with fixed effects of group (CON, CSS), day (1, 2 or 3, 4), trial (1-5 or 1, 2), time (1-5) and in the case of social phase, social episode (1-5), and a random effect of mouse subject. In the case of significant main or interaction effects, Tukey’s or Sidak’s posthoc multiple comparison test was conducted. Data are reported primarily as mean ± standard error of the mean (S.E.M.). Statistical significance was set at p ≤ 0.05.

VTA dopamine neuron population transcriptomics

Stereotactic surgery and adeno-associated viral vectors

Stereotactic surgery and injection of AAVs were conducted as described above for experiments 1 and 2. To enable identification of VTA DA neurons, mice were injected with a cocktail of 2 AAV vectors, each in a volume of 300 nl per hemisphere: ssAAV-9/2-mTH-EGFP-WPRE-SV40p(A) (AAV mTH-EGFP, 7.0 × 10¹¹ vg/ml; Viral Vector Facility, ETH and University of Zurich), to achieve EGFP expression in DA neurons; ssAAV-9/2-hGAD67-chI-mScarlet-I-SV40p(A) (AAV hGAD67-mScarlet-I, 8.0 × 10¹¹ vg/ml; Viral Vector Facility, ETH and University of Zurich), to achieve mScarlet-I expression in GABA interneurons. In each vector, expression of a specific fluorescent protein was therefore dependent on a promoter-region sequence of a neuron type-specific marker gene: EGFP under the control of tyrosine hydroxylase (Th) promoter for DA neurons, and monomeric bright red fluorescent protein under the control of glutamate decarboxylase 67 (Gad1) promoter for GABA (inter)neurons. Stereotactic coordinates were set to inject into VTA at AP −3.1 mm, ML ± 0.5 mm, DV −4.9 mm49. Mice were weighed and wound healing was monitored for 10 days post-surgery.

To validate the specificity of the AAV vectors, pilot mice were injected with AAV mTH-EGFP and/or AAV hGAD67-mScarlet-I, and brains were perfused-fixed with PBS and then ice-cold paraformaldehyde (PFA, 4%). Brains were extracted and post-fixed in PFA, and then transferred into 30% sucrose solution for 48 h prior to freezing. Using a freezing microtome (Leica), brains were sectioned coronally at 40 µm from bregma −2.8 to −3.5 mm for VTA sections, and stored in tissue collection solution (TCS; glycerine and ethylene glycol in 0.2 M phosphate buffer; Sigma-Aldrich) at −20 °C. Using a 24-well plate, sections were placed free-floating in Tris-Triton buffer (pH 7.4) and then underwent immunofluorescence staining for TH or GAD67. For TH, a primary antibody of rabbit anti-TH (1:2500; AB152, Chemicon) and a secondary antibody of donkey anti-rabbit IgG-Alexa Fluor 647 (1:1000; A31573, Invitrogen) were used. For GAD67, a primary antibody of mouse anti-GAD67 (1:200; ab26116, Abcam) and a secondary antibody of donkey anti-mouse IgG-Alexa Fluor 647 (1:1000, A31571, Invitrogen) were used. Images including the VTA and surrounding regions were acquired using a confocal laser scanning microscope (Leica SP8) at x20 magnification. Separate laser channels were used for DAPI (405 nm), EGFP (488 nm), mScarlet-I (552 nm) and Alexa Fluor 647 (638 nm).

Chronic social stress

Littermate pairs were allocated to CSS (n = 6 mice) and CON (n = 6 mice) by counterbalancing on body weight. Mean cumulative duration of daily attack experienced by CSS mice was 49.6 ± 5.4 s (range: 43.0-55.5 s); all CSS mice displayed submissive behaviour and vocalisation during the proximal stressor. From day 15 of CSS until the end of the experiment, each CSS mouse remained in the same divided cage with the same CD-1 mouse without further attacks.

Brain collection

At 3 days after completion of CSS/CON, mice were deeply anaesthetized and then perfused with PBS (20 mL) at RT. The brain was removed and placed in a cryo-mould (E6032-ICS, Sigma) with embedding medium (Tissue-Tek OCT Compound). The cryo-mould was then placed on dry ice, wrapped in aluminium foil and a polythene bag and stored at −80 °C.

Laser capture microdissection

Frozen brains were processed using RNA- and RNAse-free conditions throughout. Using a cryostat set at −18 °C, coronal sections that included the VTA at AP −2.9 to −3.3 mm were cut at 10 µm and mounted (3 sections/slide) on RNAse-free PET membrane slides (50102, Molecular Machines & Industries, MMI). Sections then underwent fixation and dehydration: 100% EtOH at RT for 20 s and xylene at RT for 20 s. Slides/sections were placed on their edge in a covered box at RT for 10 min or until completely dried, and then in a capped 50 ml Falcon tube for storage at −80 °C for 3 days maximum. Tissue samples that were EGFP+ were collected from these coronal sections using a laser capture microdissection (LCM) system (CellCut, MMI). Fluorescence settings were optimised for visualisation of EGFP+ tissue (channel FITC) or mScarlet-I+ tissue (channel TRITC). The membrane slide was positioned and, using 4x magnification, VTA tissue areas that were EGFP+ were each encircled at Ø = 35 µm using the MMI CellTools software. Selected EGFP+ areas that were also mScarlet-I+ were deselected. There were 20-30 EGFP+/mScarlet-I- samples per VTA hemisphere/section; these were encircled for both hemispheres for each of the 3 sections on the membrane slide. An MMI Universal UV laser (355 nm, 2 µJ, 4 kHz frequency, 500 ps pulse duration) at 88% laser power was activated (velocity = 51 µm/s, focus = 2233 µm) and the designated tissue areas were collected on the adhesive cap of an MMI isolation tube (0.5 ml). The procedure was conducted with 3 membrane slides (7-9 sections) and isolation tubes per mouse, to yield a total of 500 EGFP+/mScarlet-I- tissue samples per mouse; the exception was one CSS mouse in which the EGFP/mScarlet-I signals were weak (likely due to misplaced injection), and this mouse was excluded from the experiment.
Following tissue collection, tissue lysis was conducted by adding QIAzol (100 µl) to the tube, triturating the tissue on the cap with 20 µl volumes and returning this volume to the tube; the tube was closed, inverted for 15 min at RT, and vortexed for 1 min, inverted for 5 min and centrifuged for 5 s. The tube was then sealed with Parafilm and frozen at −80 °C until RNA extraction.

RNA isolation and quality control

Per mouse sample, lysate aliquots (3 × 100 µl per sample) were pooled to give a final lysis volume of 300 µL. Samples were transferred to 2 mL PhaseLock tubes (QuantaBio). A half volume of chloroform:isoamyl alcohol (24:1 v:v) was added before shaking, 3 min RT incubation and centrifugation at 4 °C. The aqueous phase was then transferred to a 1.5 mL Eppendorf tube and mixed with a 1.5 volume of isopropanol (Sigma). After thorough pipette mixing, the isopropanol mixture was applied to a RNeasy MinElute spin column and total RNA was extracted using the miRNeasy Micro Kit (Qiagen) with a DNase treatment. Samples were eluted in 14 µL nuclease-free water. RNA samples were assessed both quantitatively and qualitatively using the High Sensitivity Total RNA 15nt Analysis DNF-472 Kit on a 48-channel Fragment Analyser (Agilent). Total RNA yield was 1.14 ± 0.20 ng; RNA integrity could often not be computed due to low input.

Low input RNA sequencing with poly(A) enrichment

Up to 1.4 ng of total RNA was used for cDNA synthesis, conducted with the SMART-Seq® v4 Ultra Low Input RNA kit (Takara Bio); 12 amplification cycles were conducted. After clean-up, up to 10 ng of cDNA was used to generate the final sequencing libraries with the tagmentation-based DNA Prep Kit (#20018705) and the IDT® DNA/RNA UD Indexes Set A (#20026121), both Illumina®. The index PCR was performed with 9 cycles, and the final library was eluted in 30 µL EB Buffer. Low input mRNA libraries were then quantified using the High Sensitivity dsDNA Quant-iT Assay Kit (ThermoFisher) on a Synergy HTX (BioTek). Library molarity averaged 42 nM. Libraries were also assessed for size distribution and adapter dimer presence (10, Rd3: 10, Rd4: 101), reaching an average depth of 26 million Pass-Filter reads per sample (14.2% CV).

Differential gene expression and pathway analysis

Sequencing reads were mapped to the Mus musculus reference genome (mm10) using STAR v2.5.2b allowing for soft clipping of adapter sequences. An average of 20 million reads per sample was obtained, from which approximately 10 million reads were assigned to genomic features. Transcript quantification was conducted with RSEM v1.3.0 and featureCounts v1.5.1. QC and downstream bioinformatics analyses were performed with R v4.1.0 and Bioconductor v3.12 tools, respectively. Briefly, we identified expressed genes based on the distribution of median log2 raw counts across samples, and this yielded a median of 12,250 expressed genes per sample in the experiment. A Gaussian mixture model was fitted to the distribution with mclust v5.4.7 to identify two clusters: genes with median expression values belonging to the cluster with the mean closest to 0 were filtered out from the expression matrix. Then, we normalised the expression matrix using the variance stabilising transformation from package DESeq2 v1.32.0 and identified the 500 most highly variable genes (HVGs). Principal component analysis (PCA) was performed with these 500 HVGs using PCAtools 2.4.0. Using brain cell type-specific marker genes to identify the relative contribution of different cell types to the RNA sample (mouse visual cortex55), the DA neuron gene marker Th, as well as the pan-neuronal gene marker Snap25, displayed consistent and markedly higher expression than marker genes for GABA (inter)neurons (Gad1, Gad2) and each of the glial cell types (astrocyte: Aqp4, oligodendrocyte progenitor cell: Pdgfra, myelinating oligodendrocyte: Opalin, microglia: Ctss). Differential gene expression analysis (DGEA) was conducted for CSS vs CON with DESeq2 v1.32.0, using an absolute log2 fold-change of at least 0.5 and a raw p-value of ≤0.001. Functional enrichment analysis of differentially expressed genes was performed with enrichR v3.0 against the mouse-specific pathway collection from KEGG 2019.
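The two thresholds used to call differentially expressed genes can be illustrated with a small filter. The sketch below is in Python rather than R and does not reproduce the DESeq2 analysis itself; the function name, gene labels and values are hypothetical and serve only to show how the cut-offs combine.

```python
def is_differentially_expressed(log2_fold_change, p_value,
                                min_abs_lfc=0.5, max_p=0.001):
    """Apply both criteria: absolute log2 fold-change of at least 0.5
    and a raw (unadjusted) p-value of at most 0.001."""
    return abs(log2_fold_change) >= min_abs_lfc and p_value <= max_p


# Hypothetical CSS vs CON results: (gene label, log2 fold-change, raw p-value).
results = [
    ("GeneA", 0.8, 0.0004),    # passes both thresholds
    ("GeneB", -0.6, 0.001),    # down-regulated, p exactly at the cut-off
    ("GeneC", 0.3, 0.00001),   # fold-change too small
    ("GeneD", -1.2, 0.02),     # p-value too large
]

hits = [gene for gene, lfc, p in results if is_differentially_expressed(lfc, p)]
```

Because the fold-change criterion is applied to the absolute value, both up- and down-regulated genes are retained.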

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.