Introduction

When making decisions, individuals assess the relative values of different alternatives in order to make choices between them. Some choices, such as between more or less reward, are relatively easy and optimal decision-makers should choose more over less. However, when choices differ in more than one dimension, such as a choice between a large uncertain (LU) reward and a smaller certain (SC) reward, the decision becomes more complex as there is a tradeoff between risk and more reward. In decisions involving risk, normative theories such as expected value (EV) and optimal foraging theory (Herrnstein 1990; Kahneman and Tversky 1979; Stephens and Krebs 1986) suggest that decision-makers should maximize overall expected reward, yet a number of factors and biases have been identified that can produce less than optimal decision-making (Kahneman 2003; van Holst et al. 2010).

In the laboratory, tradeoffs between risk and reward are often modeled using probability discounting (PD; Rachlin et al. 1991) procedures in which individuals choose between an LU and SC reward while the probability of obtaining the LU reward gradually declines. LU choices typically decrease as the LU reward probability decreases with the rate of decrease taken as an index of risk tolerance associated with gambling behavior (Holt et al. 2003; Madden et al. 2009; Petry 2012), smoking (Reynolds et al. 2004), internet gaming (Lin et al. 2015), and obesity (Rasmussen et al. 2010). However, one factor generally unexplored in PD procedures is the role of cues that signal wins and losses (Barrus et al. 2015). For example, two rodent procedures examining risky choice outside of the PD framework have found that presenting audio and/or visual reward-paired cues simultaneously with reward delivery (Barrus and Winstanley 2016) or omission (Marshall and Kirkpatrick 2017) can bias choices towards risk taking.

Another procedure, often referred to as the suboptimal choice procedure, has shown that signaling choice outcomes prior to the actual receipt of reward greatly promotes gambling-like choices (McDevitt et al. 2016; Zentall 2016). In the suboptimal choice procedure, choice of the gambling-like alternative is followed by a signal indicating either that a win or that a loss will follow, while choice of the non-gambling alternative results in an ambiguous cue (i.e., present on both win and loss trials) that is uninformative of the forthcoming outcome, although this alternative provides greater overall reward. Despite the economic advantage of the ambiguously signaled option, it has been repeatedly demonstrated that individuals (including rats, pigeons, starlings, and self-reported gamblers; Chow et al. 2017; Molet et al. 2012; Vasconcelos et al. 2015; Zentall 2016) consistently choose the option coupled with infrequent ‘jackpots’, leading to a suboptimal preference that results in significant loss of primary reward. The suboptimal preference engendered by the suboptimal choice procedure has been suggested to be mediated by the overvaluation of the win signals (or jackpots), despite their relative infrequency (Smith et al. 2016; Smith and Zentall 2016; Zentall et al. 2015), and the undervaluation of the more-frequent loss signals (Fortes et al. 2016; Laude et al. 2014; Pisklak et al. 2015). Additionally, recent research has revealed that win signals must be capable of acquiring sufficient conditioned reinforcing value, or incentive salience, such as levers associated with sign-tracking behavior (stimulus approach and contact; Beckmann and Chow 2015; Robinson and Berridge 2008), to effectively promote suboptimal risk preferences (Chow et al. 2017). While the use of appropriately valued stimuli can explain some inconsistencies in the conditioned reinforcing efficacies of different win signals (Trujano and Orduna 2015), the relative influence of discrete loss signals (Martínez et al. 2017) on suboptimal preference has not been adequately studied.

An additional unexplored factor in suboptimal choice is the characterization of its mediating neurobehavioral processes. In particular, dopaminergic transmission is a likely mediator in how gambling cues can influence risky decision-making, as it has been implicated in stimulus reward learning (Glimcher 2011), including incentive salience attribution (Flagel et al. 2011), and a growing body of research is demonstrating its role in risky decision-making (Orsini et al. 2015). For example, Barrus and Winstanley (2016) found that dopamine D3-receptor signaling modulated the ability of gambling cues to increase rats’ risk taking. Additionally, clinical populations with aberrant dopaminergic signaling such as those with Parkinson’s disease have shown an increased tendency towards problem gambling (Clark and Dagher 2014).

The present study explored how both gambling cues and dopaminergic signaling can modulate risky decision-making in rats. Specifically, rats were given choices between an LU 4-pellet reward and a 1-pellet SC reward where the probability of receiving the LU reward decreased systematically across blocks. Three groups of rats received different reward-associated cues for choosing the LU; the unsignaled group always received an ambiguous lever cue not informative of trial outcome as a control for procedures lacking cues (e.g., St Onge and Floresco 2009), while the two remaining groups received a lever cue only when an LU choice resulted in a win (or a jackpot). The critical difference between these two groups was that, when LU choices resulted in a loss, the signaled group received no cue, as in previous research (Chow et al. 2017), while the signaled-light group received a salient light stimulus. If suboptimal choice is primarily dependent on sufficiently valued win-paired jackpot signals acting as conditioned reinforcers, the two signaled groups should show increased risk-taking relative to the unsignaled group; however, if the presence of a salient loss signal can attenuate the effect of a win-paired cue, then the signaled-light group should also show reduced risk-taking relative to the signaled group (Trujano et al. 2016). Finally, D1 and D2 receptor agonist and antagonist pretreatments were administered to assess their role in modulating these risk preferences, with the hypothesis that the signaled groups’ risk taking would be modulated to a greater extent given dopamine’s role in both risk taking and incentive salience attribution.

General methods

Subjects

Eighteen adult male Sprague-Dawley rats (Harlan Inc., Indianapolis, IN) were used in the experiment. Rats were individually housed and placed on a 12:12 h light:dark cycle (lights on at 7:00 am). Only male rats were used, and they were run during the light cycle, to make results comparable to previous research (St Onge and Floresco 2009). Rats had free access to water but were food restricted to 12 g of standard lab chow per day (Harlan Inc.). All research was approved by the University of Kentucky Institutional Animal Care and Use Committee (Protocol # 2011-0885).

Apparatus

The experiments were conducted in Med Associates (St. Albans, VT) operant conditioning chambers (ENV-008) within a sound-attenuating box (ENV-018). Inside the chamber, a pellet dispenser (ENV-203M-45) delivered 45-mg sucrose pellets (BioServ Precision Pellets, Flemington, NJ) to a magazine receptacle (ENV-200R2MA) on the front panel equipped with a head entry detector (ENV-254-CB). Two retractable response levers (ENV-122CM) were on either side of the magazine with a white cue light (ENV-221M) above each lever. On the back panel, opposite the levers, were two nosepoke response receptacles (ENV-114BM) and a house light (ENV-227M) centrally positioned at the top of the back panel.

Pretraining

The rats were trained to eat from the magazine, respond to the nosepokes, make an orienting response to the magazine, and finally discriminate one versus four pellets in magnitude training; details can be found in the supplemental materials.

Probability discounting

Probability discounting (Chow et al. 2017; Smith et al. 2017) began immediately following magnitude training with the general methods illustrated in Fig. 1. Sessions consisted of five 13-trial blocks (65 trials total) where, for each block, the first eight trials were forced trials and the last five were free choice trials. In the first block, rats chose between a 4-pellet LU reward and a 1-pellet SC reward at 100% probability. In subsequent blocks, the probability of receiving the 4-pellet LU reward decreased from 100 to 50, 25, 12.5, and 6.25%, respectively. The descending sequence of probabilities was chosen to better adhere to previous risky decision-making procedures (e.g., Cardinal and Howes 2005; Stopper et al. 2013) in order to make better comparisons between the present results and the existing literature. Forced trials started with the illumination of the house light and, following an orienting head entry response, one of the nosepoke receptacles on the back wall illuminated. If the small certain (SC) nosepoke was lit, one response produced a lever stimulus presented for 10 s on the front wall, followed by one sucrose pellet and a 10-s intertrial interval (ITI) during which all stimuli were turned off.
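The block structure and reward probabilities described above determine the expected value (EV) of each alternative. The following is a minimal sketch of that arithmetic, assuming EV is simply pellets multiplied by probability; the names `LU_PROBS`, `LU_PELLETS`, `SC_PELLETS`, and `expected_values` are illustrative and do not come from the original analysis.

```python
# Block probabilities of the large uncertain (LU) reward, as listed in the text.
LU_PROBS = [1.0, 0.5, 0.25, 0.125, 0.0625]
LU_PELLETS = 4  # large uncertain reward
SC_PELLETS = 1  # small certain reward, always delivered

# Session structure from the text: five blocks of 8 forced + 5 free-choice trials.
TRIALS_PER_SESSION = 5 * (8 + 5)


def expected_values(probs=LU_PROBS):
    """Return (p, EV_LU, EV_SC, better option) for each block."""
    rows = []
    for p in probs:
        ev_lu = LU_PELLETS * p
        ev_sc = float(SC_PELLETS)
        if ev_lu > ev_sc:
            best = "LU"
        elif ev_lu < ev_sc:
            best = "SC"
        else:
            best = "either"  # equal expected value
        rows.append((p, ev_lu, ev_sc, best))
    return rows


if __name__ == "__main__":
    for p, ev_lu, ev_sc, best in expected_values():
        print(f"p={p:<7} EV_LU={ev_lu:<5} EV_SC={ev_sc:<4} better: {best}")
```

Under this arithmetic the LU alternative has the higher EV in the first two blocks, the two alternatives are equal at 25%, and the SC alternative has the higher EV in the final two blocks.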

Fig. 1

a Schematic of the general method for the signaled and signaled-light groups on free choice trials. b Schematic of the general method for the unsignaled group on free choice trials. c Table indicating the probabilities of reinforcement used in determining LU reward delivery as well as the expected values for the LU and SC alternatives. Note: P denotes probability of event, * indicates that all stimuli offset following the event, # indicates differential signals for signaled (dark) and signaled-light (light), and bold numbers indicate optimal choice

If the large uncertain (LU) nosepoke was lit, the procedure varied across three groups (signaled, signaled-light, and unsignaled) when the probability of LU reward was less than 100%. For the unsignaled group, LU responses always resulted in a 10-s non-predictive lever stimulus that was distinct from the SC-associated lever; this stimulus was not informative of the trial outcome, and the four-pellet reward probabilistically followed after 10 s according to the reinforcement probabilities for that block (Fig. 1c). For the signaled group, a 10-s predictive lever stimulus appeared only when a win was to follow (i.e., a jackpot signal), while losses led to a 10-s blackout. Finally, the signaled-light group was similar to the signaled group except that loss trials were signaled by a 10-s white cue light above the retracted lever. Free choice trials were identical to forced choice trials except that, following an orienting response, both nosepokes illuminated (allowing a choice). For all groups, a 10-s ITI separated all trials. Training continued for 40 sessions.
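The group-specific signaling contingencies on an LU trial can be summarized in a short sketch. This is a simplified illustration, not the control code used in the experiment; the function name `lu_trial` and the cue labels are hypothetical, and timing (the 10-s stimulus and ITI) is omitted.

```python
import random


def lu_trial(group, p_win, rng):
    """Simulate one large-uncertain (LU) choice.

    Returns (cue shown during the 10-s period, pellets delivered).
    Cue labels are illustrative names, not identifiers from the study.
    """
    win = rng.random() < p_win
    if group == "unsignaled":
        cue = "ambiguous lever"  # same cue on wins and losses
    elif group == "signaled":
        cue = "win lever" if win else "blackout"
    elif group == "signaled-light":
        cue = "win lever" if win else "cue light"
    else:
        raise ValueError(f"unknown group: {group}")
    return cue, 4 if win else 0


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    for group in ("unsignaled", "signaled", "signaled-light"):
        print(group, [lu_trial(group, 0.25, rng) for _ in range(4)])
```

The only difference between the two signaled groups in this sketch is the loss cue, which mirrors the manipulation described above.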

Drugs and pharmacological manipulations

All drugs were purchased from Sigma-Aldrich (St. Louis, MO, USA) and included the D1 agonist SKF-38393, the D1 antagonist SCH-23390, the D2 agonist Quinpirole, and the D2 antagonist Eticlopride. All drugs were dissolved in 0.9% saline and injected subcutaneously at a volume of 1 ml/kg (except for SKF-38393, which was first sonicated until dissolved and injected at 2 ml/kg) 15 min prior to behavioral testing. All rats first received either antagonist, as well as two saline treatments, in a randomized Latin-square design with 2-day washouts between doses. Once all rats had received all doses of the antagonists, rats then received either agonist or two saline treatments in a randomized Latin-square design with 2-day washouts. Doses for all drugs were chosen based on previous reports in the literature showing pharmacological effects following systemic injections during choice procedures (Cooper and Al-Naser 2006; Koffarnus et al. 2011; St Onge and Floresco 2009).

Data analysis

Data were analyzed using nonlinear mixed effects (NLME) modeling (Chow et al. 2017; Smith et al. 2017; Young et al. 2009) in R (Pinheiro et al. 2016) using a concatenated matching equation of the form:

$$ \frac{\mathrm{LU}}{\mathrm{LU}+\mathrm{SC}}=\frac{1}{1+{\left(\frac{P_{\mathrm{SC}}}{P_{\mathrm{LU}}}\right)}^{S_{P}}\ast {\left(\frac{M_{\mathrm{LU}}}{M_{\mathrm{SC}}}\right)}^{S_{M}}} \tag{1} $$

in which the proportion of LU choice is a function of the relative LU and SC probabilities of reinforcement, P, and magnitudes of reward, M, raised to probability and magnitude sensitivity parameters, SP and SM (Smith et al. 2016; see supplemental materials for further details). In cognitive terms, the two free parameters of Eq. 1, SP and SM, describe how sensitive an individual is to receiving any reward (i.e., reward probability; SP) versus how large the reward is (i.e., reward magnitude; SM). Individuals that are more sensitive to relative probability (i.e., an increased SP) will decrease LU choices relatively quickly as the LU reward probability decreases, while higher values of SM indicate greater discriminability of the relative difference between four pellets and one pellet, independent of reward probability.
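To make the behavior of the two sensitivity parameters concrete, the following sketch evaluates a matching function of this kind. It is written in terms of positive (absolute-value) sensitivities, so both ratios are oriented as SC over LU; this is equivalent to Eq. 1 only if the raw magnitude exponent is negative, which is an assumed reading rather than something the text states outright. The function name `lu_choice_proportion` and its default arguments are illustrative.

```python
def lu_choice_proportion(p_lu, s_p, s_m, p_sc=1.0, m_lu=4.0, m_sc=1.0):
    """Predicted proportion of LU choices under a concatenated matching model.

    s_p and s_m are positive (absolute-value) sensitivities to probability
    and magnitude. ASSUMPTION: ratios are oriented SC/LU so that larger
    sensitivities behave as described in the text.
    """
    if p_lu <= 0:
        raise ValueError("p_lu must be positive")
    prob_term = (p_sc / p_lu) ** s_p
    mag_term = (m_sc / m_lu) ** s_m
    return 1.0 / (1.0 + prob_term * mag_term)


if __name__ == "__main__":
    for p in (1.0, 0.5, 0.25, 0.125, 0.0625):
        low = lu_choice_proportion(p, s_p=0.5, s_m=1.0)
        high = lu_choice_proportion(p, s_p=1.5, s_m=1.0)
        print(f"p={p:<7} LU share (s_p=0.5)={low:.3f}  (s_p=1.5)={high:.3f}")
```

With both sensitivities equal to 1, the function predicts indifference (0.5) at the 25% block, where the two alternatives have equal expected value; a larger probability sensitivity makes predicted LU choice fall off faster as the LU probability declines.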

Using the NLME approach with the matching equation is advantageous as it can analyze an entire choice function rather than the typical ANOVA approach that analyses each data point separately. The matching equation also expresses LU preferences relative to the alternative SC choice in terms of both probability and magnitude differences that make a more informed comparison over traditional choice measures. Model selection was based on ∆AIC criterion (Wagenmakers and Farrell 2004). Subject was treated as a nominal, random factor while group was treated as a nominal, fixed factor. In the case of pharmacological pretreatments, NLME models were fit with dose as a continuous factor to assess potential systematic changes in sensitivity to magnitude and probability as a function of dose increases. Post-hoc analyses were done using the multcomp package in R (Hothorn et al. 2008; Young 2017) with Hochberg corrections (Hochberg 1988). To aid the reader in interpretation, all parameter results reported, which are negative in their raw form, are the absolute values of the parameters.
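As an illustration of ∆AIC-based model comparison, the sketch below computes AIC differences and Akaike weights for a set of candidate models. The AIC values and model labels are hypothetical placeholders, not results from this study, and the text does not state the exact decision rule that was applied; the function name `akaike_weights` is likewise illustrative.

```python
import math


def akaike_weights(aic_values):
    """Return (delta_aic, weights) for a list of AIC values.

    delta_i = AIC_i - min(AIC); w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2)
    """
    best = min(aic_values)
    deltas = [a - best for a in aic_values]
    rel_likelihood = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel_likelihood)
    return deltas, [r / total for r in rel_likelihood]


if __name__ == "__main__":
    # HYPOTHETICAL AIC values for three candidate models (illustration only).
    models = {
        "group effect on SP and SM": 100.0,
        "group effect on SP only": 102.0,
        "no group effect": 110.0,
    }
    deltas, weights = akaike_weights(list(models.values()))
    for name, d, w in zip(models, deltas, weights):
        print(f"{name:<28} dAIC={d:5.1f}  weight={w:.3f}")
```

Lower ∆AIC and higher weight indicate relatively more support for a model within the candidate set.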

Results

Baseline

Figure 2a illustrates the proportion LU choice fit with Eq. 1 averaged over the last five training sessions for each group as a function of its relative probability. Relative probability refers to the probability that a choice for the LU alternative would be rewarded (see Fig. 1c) divided by the probability that an SC choice would be rewarded. As expected, all groups initially chose the LU when it was rewarded 100% of the time but, as LU reward probability decreased, LU preference also decreased. However, the signaled and signaled-light groups appeared to maintain higher risk preferences across LU reward probabilities.

Fig. 2

a Mean (± SEM) proportion choice of the LU alternative as a function of LU and SC relative reward probability for all groups fit with Eq. 1. b Mean (± SEM) best fit sensitivity parameter estimates for probability and magnitude used in Fig. 2a

To determine if there were significant LU preference differences between the three groups, an NLME analysis using Eq. 1 was used. The NLME analysis quantifies group differences in terms of differential sensitivities to either relative probability (SP) or relative magnitude (SM; 4 versus 1) of the LU and SC rewards, with the NLME-determined parameter (SP and SM) estimates for each condition shown graphically in Fig. 2b. The analysis confirmed that the groups were differentially sensitive to probability of reinforcement, as indicated by an effect of group on SP [F(2, 67) = 10.96, p < .001]; corrected pairwise comparisons further revealed that the unsignaled group had increased sensitivity to probability compared to the signaled and signaled-light groups (ps < .001). Thus, the NLME analysis confirmed group differences in risk taking between the unsignaled and signaled groups and demonstrated that the behavioral mechanism underlying the suboptimal preference observed in both signaled conditions was reduced sensitivity to reinforcement probability.

As the signaled and signaled-light groups had significantly greater risk taking, it was further hypothesized that the unsignaled group may be tracking the optimal expected reward values (defined as the magnitude × probability of reward) between the LU and SC alternatives better (see Fig. 1c). To corroborate the hypothesis that the unsignaled group tracked reward values better, Table 1 shows the total reward, total reinforcements, and reward earned in the final two blocks (where the LU’s EV was less than the SC’s) cumulatively over the last five sessions. The unsignaled group had a greater number of overall rewarded choices [t(10) = 5.50, p < .001, t(10) = 5.67, p < .001] and rewards earned in the final two blocks [t(10) = 4.90, p = .001, t(10) = 5.76, p < .001] than the signaled and signaled-light groups, respectively, which did not differ from each other (ps ≥ .334). Thus, the present results suggest that the unsignaled group more closely followed the EVs of the two choice alternatives, which led to an increase in reward. Furthermore, Fig. 3 shows a significant association (r = .90, p < .001) between an individual’s reinforcement probability sensitivity estimate (SP) and their cumulatively obtained food when the LU had a lower EV than the SC, implicating reinforcement probability sensitivity as a key determinant of suboptimal risk taking. Sign-tracking rates to the LU-associated signals were also analyzed but were not associated with risk preferences (p = .231) and so are left to the supplementary materials.
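The link between probability sensitivity and food earned in the final two blocks can be illustrated with a short calculation. This is a sketch under stated assumptions: it uses a matching function with positive (absolute-value) sensitivities and SC/LU ratios, counts only free-choice trials, and uses hypothetical sensitivity values; the function names are illustrative and the output is not an estimate of the data in Table 1 or Fig. 3.

```python
def lu_choice_proportion(p_lu, s_p, s_m, p_sc=1.0, m_lu=4.0, m_sc=1.0):
    """Predicted LU share; ASSUMES positive sensitivities and SC/LU ratios."""
    return 1.0 / (1.0 + (p_sc / p_lu) ** s_p * (m_sc / m_lu) ** s_m)


def expected_pellets_per_choice(p_lu, s_p, s_m):
    """Expected pellets on one free-choice trial given the predicted LU share."""
    share_lu = lu_choice_proportion(p_lu, s_p, s_m)
    return share_lu * 4.0 * p_lu + (1.0 - share_lu) * 1.0


def final_blocks_pellets(s_p, s_m, free_trials=5, sessions=5):
    """Expected pellets from free choices in the 12.5% and 6.25% blocks."""
    per_trial = sum(expected_pellets_per_choice(p, s_p, s_m)
                    for p in (0.125, 0.0625))
    return per_trial * free_trials * sessions


if __name__ == "__main__":
    # HYPOTHETICAL sensitivity values, chosen only to show the direction.
    for s_p in (0.25, 0.5, 1.0, 1.5):
        print(f"s_p={s_p:<5} expected pellets={final_blocks_pellets(s_p, 1.0):.1f}")
```

Because the SC alternative has the higher expected value in these blocks, a larger probability sensitivity shifts predicted choice toward the SC and raises expected pellets, which is the direction of the association reported above.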

Table 1 Mean (± SEM) cumulative totals over the final five sessions of training for total reward earned, total reinforcements (rewarded trials regardless of reward amount), and reward earned in the final two blocks where the expected value of the LU was less than the SC

Fig. 3

Correlation between sensitivity to reinforcement probability and the cumulatively earned food for all groups in the last two blocks over the last five sessions of training

Pharmacology

To investigate dopaminergic signaling as a possible mediator of risky choice, D1- and D2-specific receptor antagonists (SCH-23390 and Eticlopride) and agonists (SKF-38393 and Quinpirole) were administered subcutaneously 15 min prior to the session. As there was no significant difference between the signaled and signaled-light groups, the signaled-light group was not included in pharmacological testing. Table 2 shows the session completion times for both groups across drug types and doses. While there was considerable variability in the session completion times, there were no group differences between the signaled and unsignaled groups as assessed by pairwise comparisons (ps ≥ .149).

Table 2 Mean (± SEM) session completion times (in minutes) for the signaled and unsignaled groups as a function of drug type and dose

SCH-23390

Figure 4 shows the proportion LU choice for the signaled (a) and unsignaled (b) groups fit with Eq. 1. Initial doses of 0.017 and 0.03 mg/kg were attempted but, due to evidence of clear nonspecific effects (e.g., failure to eat pellets), neither dose was continued. Interestingly, SCH-23390 produced apparent group-dependent effects on both measures: the unsignaled group showed dose-dependent decreases in LU choices while the signaled group showed dose-dependent increases in LU preferences at higher doses.

Fig. 4

Top. Mean (± SEM) proportion LU choice for the signaled (a) and unsignaled group (b) across doses of the D1 antagonist SCH-23390 as a function of LU and SC relative reward probability fit with Eq. 1. Bottom. Mean (± SEM) best fit parameter estimates for sensitivity to probability (c) and magnitude (d) as a function of SCH-23390 dose used in Fig. 4a, b. Note: the error bars represent the variability between subjects at that predicted data point

To determine statistically if any group differences or effects of dose were present, the NLME model using Eq. 1 was again used, with the NLME-determined parameter estimates for sensitivity to reinforcement probability (SP; Fig. 4c) and magnitude (SM; Fig. 4d) shown graphically for each group at each dose. First, the unsignaled group was generally more sensitive to reinforcement probability (SP) [F(1, 161) = 10.04, p = .002], and sensitivity to reinforcement probability across groups was differentially affected by SCH-23390 dose [F(1, 161) = 4.42, p = .037]. The interaction on SP was due to an increase in sensitivity across doses for the unsignaled group [t(161) = −2.10, p = .037], while SP estimates in the signaled group did not change as a function of dose. Finally, magnitude sensitivity (SM) was significantly higher in the unsignaled group [F(1, 161) = 6.27, p = .013].

Eticlopride

Figure 5 shows the proportion LU choice for the signaled (a) and unsignaled (b) groups fit with Eq. 1. Systematic dose-dependent effects (shown in Fig. 5c, d) were not found in any models even allowing for a quadratic effect of dose. The only effect that appeared was the unsignaled group showing significantly higher sensitivity to reinforcement probability (SP) than the signaled group [F(1, 221) = 5.80, p = .017] similar to baseline.

Fig. 5

Top. Mean (± SEM) proportion LU choice for the signaled (a) and unsignaled group (b) across doses of the D2 antagonist Eticlopride as a function of LU and SC relative reward probability fit with Eq. 1. Bottom. Mean (± SEM) best fit parameter estimates for sensitivity to probability (c) and magnitude (d) as a function of Eticlopride dose used in Fig. 5a, b. Note: the error bars represent the variability between subjects at that predicted data point

SKF-38393

Figure 6 shows the proportion LU choice for the signaled (a) and unsignaled (b) groups fit with Eq. 1. Systematic dose-dependent effects (shown in Fig. 6c, d) were not found in any models even allowing for a quadratic effect of dose. Indeed, the only effects found were that the unsignaled group showed significantly greater sensitivity to reinforcement probability (SP) [F(1, 216) = 8.00, p = .005] and magnitude (SM) [F(1, 216) = 4.25, p = .040] than the signaled group.

Fig. 6

Top. Mean (± SEM) proportion LU choice for the signaled (a) and unsignaled group (b) across doses of the D1 agonist SKF-38393 as a function of LU and SC relative reward probability fit with Eq. 1. Bottom. Mean (± SEM) best fit parameter estimates for sensitivity to probability (c) and magnitude (d) as a function of SKF-38393 dose used in Fig. 6a, b. Note: the error bars represent the variability between subjects at that predicted data point

Quinpirole

Figure 7 shows the proportion LU choice for the signaled (a) and unsignaled (b) groups fit with Eq. 1. Quinpirole produced dose-dependent decreases in LU preferences in both groups. To quantify the changes, NLME with Eq. 1 was again used to compare estimates of sensitivity to reward probability (SP) and magnitude (SM), with the NLME-determined parameter estimates illustrated in Fig. 7c, d. The NLME analysis revealed a trend toward Quinpirole differentially affecting sensitivity to reinforcement probability (SP) across the two groups (p = .058), but the only significant effect on SP was that the unsignaled group showed heightened sensitivity to relative reinforcement probability [F(1, 161) = 9.99, p = .002]. Additionally, Quinpirole produced decreases in sensitivity to relative reward magnitude (SM) as a function of dose for both groups [F(1, 161) = 8.02, p = .005].

Fig. 7

Top. Mean (± SEM) proportion LU choice for the signaled (a) and unsignaled group (b) across doses of the D2 agonist Quinpirole as a function of LU and SC relative probability fit with Eq. 1. Bottom. Mean (± SEM) best fit parameter estimates for sensitivity to probability (c) and magnitude (d) as a function of Quinpirole dose used in Fig. 7a, b. Note: the error bars represent the variability between subjects at that predicted data point

Discussion

Behavior

The present experiment revealed several novel findings relevant to understanding suboptimal choice. First, suboptimal choice in rats was produced in the signaled group via reduced sensitivity to reinforcement probability (SP) relative to the unsignaled group, which served as an analogous control for procedures lacking any gambling-related cues; this finding corroborates previous research indicating that suboptimal choice can be demonstrated in rats when sufficiently valued signals, or conditioned reinforcers, are used (Chow et al. 2017; Smith et al. 2016). Second, an individual’s SP was directly related to how much reward was earned when choosing the LU alternative was suboptimal. Third, the signaled-light group, which had a distinctive and salient loss cue, did not display differential decision-making relative to the signaled group with a less-salient loss cue. That the two signaled groups were not different suggests that choice inhibition to losses is less of a determining factor of suboptimal choice (cf. Martínez et al. 2017; Trujano et al. 2016) relative to the value of the win signal (see supplementary materials for continued discussion). Finally, while the value of the win signal appears to be the primary determinant of suboptimal choice, sign-tracking rates (i.e., approach and contact) to the signal (an often-used measure of a signal’s value) were not associated with LU risk preferences as suggested previously (Anselme 2015; Anselme et al. 2013). Thus, similar to previous research (Chow et al. 2017; Picker and Poling 1982), the present results suggest that the value of an alternative can be independent of how much an individual responds to that stimulus (see supplementary materials for continued discussion).

Interestingly, modeling of the behavioral data showed that the unsignaled group had higher sensitivities to reinforcer magnitude (SM) in nearly all instances. Such a result is unexpected based on an analogous procedure and analysis conducted with pigeons (Smith et al. 2017), which indicated that the role of win signals was to increase the subjective magnitude of reinforcement rather than to reduce sensitivity to probability (SP). Indeed, the notion of a jackpot stimulus invokes the perception of an increased magnitude of reward, yet the current results do not directly support this. One possibility is that, rather than assessing parameters in isolation, their relative influence may be important, as expressed in the matching equation through the multiplication of the probability and magnitude terms. For example, although the unsignaled group had greater SM values, the proportion of influence of SM [defined as SM/(SP + SM)] was 0.53 for the unsignaled group in the baseline phase, while it was 0.69 for the signaled group; these results suggest that SM exerted a greater proportional influence on the signaled group’s decision-making, similar to a jackpot-like effect, while the unsignaled group tended to weigh magnitude and probability of reward equally.
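The proportion-of-influence measure can be unpacked with simple arithmetic. The sketch below implements SM/(SP + SM) and inverts it to recover the SM-to-SP ratio implied by a reported proportion; the function names are illustrative, and the only inputs taken from the text are the two reported proportions (0.53 and 0.69).

```python
def magnitude_influence(s_p, s_m):
    """Proportion of influence of magnitude sensitivity: SM / (SP + SM)."""
    return s_m / (s_p + s_m)


def implied_sm_to_sp_ratio(influence):
    """Invert SM / (SP + SM) = r to get SM / SP = r / (1 - r)."""
    return influence / (1.0 - influence)


if __name__ == "__main__":
    for group, r in (("unsignaled", 0.53), ("signaled", 0.69)):
        print(f"{group:<11} influence={r:.2f}  "
              f"implied SM/SP={implied_sm_to_sp_ratio(r):.2f}")
```

A proportion of 0.53 corresponds to SM being roughly 1.1 times SP, whereas 0.69 corresponds to SM being roughly 2.2 times SP, which is one way to read the claim that magnitude carried proportionally more weight in the signaled group.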

Pharmacology

The present experiment supports a role for dopamine in mediating the efficacy of win signals and corroborates previous studies (Barrus and Winstanley 2016). While no systematic effects were observed for the D1 agonist SKF-38393 or the D2 antagonist Eticlopride, systematic dose-dependent effects were found for the D1 antagonist SCH-23390 and the D2 agonist Quinpirole. Specifically, SCH-23390 produced a pharmacological dissociation between the two groups, namely dose-dependent increases in sensitivity to probability (SP) were exhibited by the unsignaled group, while the signaled group demonstrated a decrease in SP. The results with SCH-23390 both corroborate previous results using unsignaled conditions (St Onge and Floresco 2009) and extend the role of the D1 receptor in mediating risk sensitivity to signaled conditions.

Previous research has shown that dopaminergic activity within mesocorticolimbic circuits may encode reward value and that neuronal firing increases under uncertainty (Fiorillo et al. 2003; Schultz 2010); this has led to the suggestion that D1 antagonism may dampen dopaminergic activity and reduce an individual’s ability to effectively make decisions involving risk, leading to preferences for more certain alternatives (i.e., the SC; St Onge and Floresco 2009). The current results may corroborate that D1 antagonism promotes preferences for certainty. The unsignaled group showed reductions in their LU preferences and increases in sensitivity to relative reward probability (SP). Conversely, the signaled group showed the opposite pattern, with increases in LU preference and decreased sensitivity to relative probability. However, when the signaled group chose the LU alternative and won, they received a certain and informative cue, indicative of greater reward than the SC alternative, that served as a conditioned reinforcer. As sensitivity to reward magnitude was unaffected by D1 antagonism, the increase in risk taking by the signaled group may reflect an increased preference for certainty that, because of differences in trial signaling across groups, promoted risk taking instead of decreasing it. Further research is needed to determine the underlying neurobehavioral mechanisms.

In contrast to D1 antagonism, Quinpirole produced a dose-dependent decrease in LU choices for both groups and did so via reducing sensitivity to relative magnitude (SM) rather than relative probability (SP) of reward. The results with Quinpirole therefore reflect a change in an individual’s ability to discriminate large from small rewards, perhaps similar to findings where administration eliminates preferences for highly-palatable food rewards (Cooper and Al-Naser 2006), and affects a distinct behavioral mechanism from SCH-23390 that altered sensitivity to the relative frequency of reward. Such a result highlights the importance of using mathematical models to parse out what dimensions of reinforcement (e.g., probability and magnitude) are responsible for producing what may appear to be the same behavioral effect (i.e., reduced LU or risky choices).
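The point that two different parameter changes can both lower LU choice can be shown with a small comparison of predicted choice curves. This is a sketch under stated assumptions: the matching function uses positive (absolute-value) sensitivities with SC/LU ratios, and the baseline and shifted parameter values are hypothetical, chosen only to show the qualitative pattern rather than to reproduce any fitted estimate.

```python
PROBS = [1.0, 0.5, 0.25, 0.125, 0.0625]  # LU reward probability per block


def lu_choice_proportion(p_lu, s_p, s_m, p_sc=1.0, m_lu=4.0, m_sc=1.0):
    """Predicted LU share; ASSUMES positive sensitivities and SC/LU ratios."""
    return 1.0 / (1.0 + (p_sc / p_lu) ** s_p * (m_sc / m_lu) ** s_m)


def curve(s_p, s_m):
    """Predicted LU share in each block for one parameter pair."""
    return [lu_choice_proportion(p, s_p, s_m) for p in PROBS]


# HYPOTHETICAL parameter values for illustration only.
baseline = curve(s_p=1.0, s_m=1.0)
higher_sp = curve(s_p=1.5, s_m=1.0)  # increased probability sensitivity
lower_sm = curve(s_p=1.0, s_m=0.5)   # decreased magnitude sensitivity

if __name__ == "__main__":
    print("p       baseline  higher_sp  lower_sm")
    for p, b, h, l in zip(PROBS, baseline, higher_sp, lower_sm):
        print(f"{p:<7} {b:8.3f}  {h:9.3f}  {l:8.3f}")
```

In this sketch both manipulations reduce predicted LU choice in the probabilistic blocks, but only the reduction in magnitude sensitivity lowers LU choice in the 100% block, where the probability ratio equals 1; that difference in curve shape is what allows a choice model to separate the two mechanisms.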

While the effect of D1 antagonism on unsignaled outcomes herein was consistent with an analogous study by St Onge and Floresco (2009), nearly all other effects were inconsistent. For example, in St Onge and Floresco (2009), the D2 antagonist, D2 agonist, and D1 agonist (Eticlopride, Bromocriptine, and SKF-81297, respectively) produced changes in risky decision-making that were absent from the current study. The differences in results may have stemmed from differences in the behavioral procedure (Yates et al. 2016), our use of an additional reward probability of 6.25%, or underlying differences between the drugs used. For example, the D2 agonist Quinpirole used here has been reported to have greater D3 affinity than the Bromocriptine used by St Onge and Floresco (Levant and De Souza 1993). Such a difference may account for why Quinpirole produced a similar effect in both groups that was different from that of Bromocriptine, as D3 receptor modulation has been shown to alter signaled (Barrus and Winstanley 2016; albeit in the opposite direction) and unsignaled (St Onge and Floresco 2009) risk preferences alike.

Although a singular explanation for the discrepant results is beyond the current scope, the present results highlight the complex interaction that occurs between pharmacological agents and the environment an individual is in while under their influence. Much research on the neurobiology of risky choice has illustrated the importance of the neuroanatomy of the agent (e.g., St Onge et al. 2012; Orsini et al. 2015; Sugam et al. 2012); however, it is also important to consider how the roles of different receptor subtypes and brain areas may change when they interact with different environments. The pharmacologically dissociable effects between the signaled and unsignaled groups seen with SCH-23390 highlight such a difference and may suggest that signaling the choice outcome changes the underlying neurobehavioral process. Future research can further aid in elucidating pharmacological effects by focusing on systematic drug effects that alter an entire choice function (i.e., choice models), rather than assessing a momentary change in one block that may be due to random variation. Models can also prove a useful tool in dissociating drug mechanisms, as highlighted by the dissociation between SCH-23390 modulating SP and Quinpirole modulating SM; without the use of a choice model, the results produced by the two drugs would have simply appeared as similar LU preference decreases.

The present research has shown that win signals with sufficient conditioned reinforcing value are efficacious moderators of risky choice when they signal an increase in the magnitude of reward, that increasing the salience of a loss with a cue light signal did not reduce risk preferences, and that dopaminergic activity mediates the efficacy of the win signal. It is clear that signaling choice outcomes can produce robust changes in risk preference, but future research is needed to understand the mechanism by which the conditioned reinforcer influences risk preferences. If we are to understand how individuals engage in choices under risk, we must also understand how cues in the environment modulate risk preferences, as they may have important implications for understanding how humans make decisions as well as having clinical implications for problem gamblers (Molet et al. 2012).