Main

When making a categorical decision about a noisy stimulus, subjects commonly fluctuate between levels of commitment to a choice before reporting a decision. In some instances, the fluctuations are sufficiently strong to lead to a change of mind (CoM)2,3,4,5,6,7,8. Because these within-trial fluctuations are different from trial to trial and not necessarily tied to an external event or stimulus feature, they can only be captured using a moment-to-moment neural readout of the decision state on single trials.

To obtain such a readout, we decoded an instantaneous DV in real time from neural population activity in dorsal premotor cortex (PMd) and M1 while two monkeys performed a motion-discrimination task9,10 (Fig. 1a, Supplementary Methods 3; all methods for this paper are provided in the Supplementary Information). We used a linear decoder, trained on previously obtained data, on multielectrode spiking data from the preceding 50–100 ms, updated every 10 ms throughout each trial (Fig. 1b, Supplementary Methods 9, 10). The sign of the DV indicated which choice was predicted, enabling calculation of the decoder’s prediction accuracy. The magnitude of the DV reflected the model’s prediction confidence in units of log odds for one versus the other decision (Supplementary Methods 9). Note that the DV defined here encompasses all choice-predictive signals that can be decoded from neural activity11, including but not limited to accumulated evidence as posited in classical models.
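
The readout described above can be sketched in a few lines. This is a minimal illustration, not the decoder specified in Supplementary Methods 9 and 10: the function names, the 50-ms window (five 10-ms bins), the use of summed spike counts as features, and the convention that positive DV means a rightward choice are all assumptions made for the example.

```python
import math

def decode_dv(spike_counts, weights, bias):
    """Linear readout: DV = bias + sum_i weights[i] * spike_counts[i] (log odds)."""
    return bias + sum(w * c for w, c in zip(weights, spike_counts))

def dv_trace(binned_spikes, weights, bias, window_bins=5):
    """Decode one DV per 10-ms bin from a causal sliding window.

    binned_spikes: list of bins, each a list of spike counts (one per channel).
    window_bins=5 gives a 50-ms window with 10-ms bins (assumed value).
    """
    trace = []
    for t in range(window_bins, len(binned_spikes) + 1):
        window = binned_spikes[t - window_bins:t]
        # Sum each channel's counts over the window.
        summed = [sum(channel) for channel in zip(*window)]
        trace.append(decode_dv(summed, weights, bias))
    return trace

def predicted_choice(dv):
    """Sign of the DV gives the predicted choice (positive = right is assumed)."""
    return "right" if dv > 0 else "left"

def confidence(dv):
    """Probability assigned to the predicted choice when DV is in log odds."""
    return 1.0 / (1.0 + math.exp(-abs(dv)))
```

Because the DV is expressed in log odds, a DV of 0 corresponds to a confidence of 0.5, and confidence rises towards 1 as the DV magnitude grows.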

Fig. 1: Real-time readout of decision states during a motion discrimination task.

a, Motion discrimination task. Task design, single trial (described further in Supplementary Methods 3). Decision states were continuously decoded during all epochs of the trial. Three different decoders were used during different trial epochs (coloured boxes; Supplementary Methods 9). We focused primarily on the dots epoch in this study. b, Real-time, closed-loop setup. Neural activity was continuously recorded, processed and decoded (Supplementary Methods 7 and 10). The resulting real-time DV could be used to stop the stimulus presentation in a neurally contingent manner (red arrow), closing the loop in the experiment. c, Choice prediction accuracy from real-time, open-loop readout. Black traces, mean (± s.e.m.) prediction accuracy (Supplementary Methods 9, 10) over time, pooled across monkeys (calculated for each session and averaged across sessions; 32,294 trials total). d, Average DV traces during dots period. Top, mean DV for right (red) and left (blue) choices, pooled across monkeys. Bottom, mean DV sorted by choice and stimulus coherence (correct trials only), pooled across monkeys. Darker shades correspond to higher coherences (Supplementary Methods 4). Red (blue) bars indicate time points for which coherence was a significant regressor of DV for right (left) choices (correct trials only; linear regression, P < 10^−5 uncorrected, two-sided t-statistic). e, Example single-trial DV traces, open-loop trials (monkey H).

We demonstrate that this real-time DV can predict choices on single trials beginning approximately 250 ms after visual stimulus onset, and that prediction accuracy increases throughout the course of the trial, consistent with previous offline observations2. Moreover, we employ closed-loop, neurally contingent control over stimulus timing to directly probe the behavioural significance of within-trial DV fluctuations. We quantify the behavioural effects of previously covert DV variations (1) as a function of time and instantaneous DV (experiment 1), (2) during CoM-like DV fluctuations (experiment 2), and (3) in response to subthreshold stimulus pulses (experiment 3). Using this approach, we validate the behavioural relevance and computational implications of intra-trial DV fluctuations.

Real-time choice decoding

Psychophysical performance on the discrimination task11 was better for higher coherences and stimuli of longer duration (Extended Data Fig. 1a), as expected from previous studies9,12. We first measured the accuracy of our real-time decoder in predicting monkeys’ choices as a function of time during the trial. The average prediction accuracy started near chance during the targets epoch (Fig. 1c, Extended Data Fig. 1b). During stimulus presentation, average prediction accuracy quickly departed from baseline, rising monotonically to 99% correct for the longest stimulus presentations for monkey H and 98% for monkey F. Moreover, for all 4 epochs considered, the average accuracy of our real-time readout was within ±2% of an equivalent offline decoder (Extended Data Figs. 2a–d, Supplementary Methods 12.3; comparisons between PMd and M1 in monkey H and for decoders trained in different epochs are presented in Extended Data Figs. 3, 4). Thus, our real-time decoder reproduces the prediction accuracy of our own offline analyses and of an analogous study of the prearcuate cortex2.

Our real-time decoder also reproduced the average temporal dynamics and coherence dependence expected of the DV: it started at around 0 at dots onset, separated by choice after about 200 ms, and rose (or fell) faster for higher coherence trials (Fig. 1d, Extended Data Fig. 1c). As expected from previous results13, prediction accuracy was higher for correct trials than error trials (Extended Data Fig. 5) at constant stimulus coherence.

Our decoding method yielded stable performance across multiple days, justifying combination of data across sessions (Extended Data Fig. 6).

DV fluctuations track evolving decisions

We often observed large fluctuations (over 3 natural log units) in the DV on individual trials, even within single epochs (Fig. 1e). If moment-to-moment fluctuations in DV reflect fluctuations in the animal’s decision state, we expect larger absolute values of DV to be associated with stronger preference for one of the two choices, and hence higher prediction accuracy were a decision to be required at a given time during a single trial.

Because we decoded and tracked the DV in real time, we were able to test this expectation by terminating the visual stimulus in a neurally contingent manner and probing both neural activity and behaviour with high precision and negligible latency (less than 34 ms; Supplementary Methods 11.4). In the first closed-loop test (experiment 1), we imposed virtual decision boundaries at specific DV values that, if reached, triggered stimulus termination (Fig. 2a), prompting the subject to immediately report its decision (in trials with no delay period). For example, Fig. 2b shows 22 DV traces that reached a fixed DV boundary of magnitude 3 (tolerance of ±0.25 DV units), leading to stimulus termination and the subject’s decision. In this manner, we obtained a direct mapping between the nearly instantaneous readout of decision state and the likelihood of a given choice.
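
The virtual-boundary trigger can be illustrated as follows. This is a sketch under stated assumptions rather than the procedure of Supplementary Methods 11.1: the function name, the rule that a trial triggers when |DV| falls within the tolerance band around the boundary, and the `min_bins` default (the minimum stimulus duration in 10-ms bins) are hypothetical choices; only the boundary magnitude and the ±0.25 tolerance come from the text.

```python
def first_boundary_hit(dv_trace, boundary, tolerance=0.25, min_bins=25):
    """Return (bin index, predicted sign) for the first boundary hit, else None.

    dv_trace: DV values, one per 10-ms bin from stimulus onset.
    boundary: DV magnitude of the virtual boundary (for example 3).
    tolerance: half-width of the acceptance band around the boundary.
    min_bins: bins that must elapse before a trigger is allowed (assumed value).
    """
    for t, dv in enumerate(dv_trace):
        if t < min_bins:
            continue  # still inside the minimum stimulus duration
        if abs(abs(dv) - boundary) <= tolerance:
            # The sign of the DV at termination is the decoder's prediction.
            return t, (1 if dv > 0 else -1)
    return None
```

The returned sign can then be compared with the reported choice to score prediction accuracy at that boundary height.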

Fig. 2: Choice likelihood, accurately decoded in real-time using only 100 ms of neural data.

a, Virtual boundary experiment schematic. Virtual boundaries for DV magnitude (green shading) were imposed and if reached, triggered the termination of the stimulus (Supplementary Methods 11.1). The subject then immediately reported its decision. A minimum stimulus duration was imposed (grey shading). Grey traces, cartoons of trials for which the boundary was not reached. Red (blue) traces, cartoons of terminated trials for which the decoder predicted right (left) choices. b, Example trials during the virtual boundary experiment (monkey H). Real-time DV time courses for example trials terminated using boundaries set at ±3 DV units. c, Prediction accuracy as a function of DV magnitude. Blue trace and black symbols, all 5,876 trials from both monkeys during the virtual boundary experiment. Mean prediction accuracy ± s.e.m. and median DV magnitude were calculated and plotted separately for each of six DV quantiles. Dashed black line, predicted accuracy from log-odds equation used to fit the DV model; red dashed line, chance level. d, Single trial DVs substantially increase prediction accuracy. Prediction accuracy as function of coherence for three nested models with successively more regressors (Supplementary Methods 12.1). ME, motion energy.

We systematically swept boundary heights from 0.5 to 5 DV units in increments of 0.5 (an increase of 1 DV unit multiplies the likelihood ratio of choosing one target over the other by e ≈ 2.718). Figure 2c shows that prediction accuracy increases monotonically with the DV magnitude at termination, as expected. Using only 100 ms of data to estimate the terminating DV, the observed likelihood of a given choice (solid trace) differed from that predicted by the logistic function (dashed trace) by 1.7% for monkey H and by 1.9% for monkey F (mean absolute error; Extended Data Fig. 1d). Notably, prediction accuracy falls systematically as the time window for calculating DV is moved further than 100 ms into the past (Extended Data Fig. 1g). Thus, very recent neural population activity better reflects the current decision state than earlier time intervals. In further analyses, we performed the calculation in Fig. 2c on subsets of the aggregated data: high versus low stimulus coherences and short- versus long-duration trials. The result in Fig. 2c is robust across trial duration, but differs modestly for high versus low coherences (Extended Data Fig. 1e, f), revealing a significant effect of DV derivative on prediction accuracy (Supplementary Note 1, Supplementary Table 2).
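
As a worked illustration of the log-odds relationship, the accuracy expected at each boundary height follows directly from the logistic function. The function names below are hypothetical, and the comparison metric is simply the mean absolute difference between observed and predicted accuracies, which is assumed to match the error measure quoted above.

```python
import math

def expected_accuracy(dv_magnitude):
    """Logistic prediction: P(choice matches DV sign) = 1 / (1 + exp(-|DV|))."""
    return 1.0 / (1.0 + math.exp(-dv_magnitude))

def likelihood_ratio(dv_magnitude):
    """Odds of the predicted choice over the alternative: exp(|DV|)."""
    return math.exp(dv_magnitude)

def mean_absolute_error(observed, predicted):
    """Mean absolute difference between observed and predicted accuracies."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

# Boundary heights swept in the experiment: 0.5 to 5 DV units in steps of 0.5.
for height in [0.5 * k for k in range(1, 11)]:
    print(f"|DV| = {height:.1f}  odds = {likelihood_ratio(height):7.2f}  "
          f"expected accuracy = {expected_accuracy(height):.3f}")
```

For example, a boundary of 3 DV units corresponds to odds of about 20 to 1 and an expected accuracy of about 95%.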

Overall, these results show that moment-by-moment fluctuations in PMd and M1 neural population activity captured by our decoding model are indeed reflective of a fluctuating internal decision state—fluctuations that have been covert and thus uninterpretable until now.

To quantify how much additional predictive power is gained from the real-time DV readout compared with (1) the stimulus itself, and (2) the average DV for a given stimulus coherence and time-in-trial, we built three nested logistic regression models, each using an additional regressor (Supplementary Methods 12.1). The first model, using only stimulus information (motion energy) plus an intercept, correctly predicted choice in 74.5% of trials for monkey H and 71.5% of trials for monkey F (Supplementary Table 1). Adding the average DV for the corresponding stimulus coherence and time in trial to this model increased prediction accuracy by 2–3%. By contrast, adding the single-trial DV at termination as a third regressor increased prediction accuracy by more than 10%. This effect is substantial for lower-coherence trials (Fig. 2d, Extended Data Fig. 1h). As a complementary analysis, we built four logistic regression models, three using only one of the above regressors (Supplementary Methods 12.1) and a fourth using signed motion coherence. Not only was single-trial DV by itself 10% more accurate than any other regressor, it was also only 1–2.5% less predictive than the model with all 3 regressors (Supplementary Table 1).

We emphasize that our decoded DV is model based and thus only a proxy for the actual decision state in the brain. We are sampling from a relatively small number of neurons in only one brain region, over relatively short time bins, and the underlying mechanism is unlikely to be strictly linear. Despite these caveats, our ability to predict choice likelihood within a small margin of error confirms that DV is a reliable proxy for decision state.

Neurally detected changes of mind

Validation of the mapping between DV and choice likelihood (Fig. 2c) enabled us to perform a new closed-loop experiment (experiment 2) aimed at capturing robust DV fluctuations in which the sign of the DV changed mid-trial, suggestive of a behavioural CoM (Fig. 3a, b). We established neural criteria for a candidate CoM that, when met in real time, led to stimulus termination and the monkey’s decision (Fig. 3a, Supplementary Methods 11.2).
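
A candidate-CoM detector of this kind might look like the sketch below. The actual criteria and their values are given in Supplementary Methods 11.2 and Supplementary Table 4 and are not reproduced here; the function name, the symmetric stability window, the deflection threshold and all default values are hypothetical placeholders chosen only to show the logic (sign stability before and after a zero crossing, a minimum DV deflection on each side, and a minimum stimulus duration).

```python
def detect_com(dv_trace, stable_bins=10, min_deflection=1.0, min_bins=25):
    """Return the bin at which a candidate CoM is first confirmed, else None.

    A candidate requires, around a zero crossing at bin z:
      - one DV sign for stable_bins bins before z and the opposite sign
        for stable_bins bins from z onward (temporal stability), and
      - a peak |DV| of at least min_deflection in each segment.
    Detection is only allowed once min_bins bins have elapsed.
    """
    n = len(dv_trace)
    for z in range(stable_bins, n - stable_bins + 1):
        detect_at = z + stable_bins - 1  # last bin needed to confirm stability
        if detect_at < min_bins:
            continue
        before = dv_trace[z - stable_bins:z]
        after = dv_trace[z:z + stable_bins]
        pos_to_neg = all(v > 0 for v in before) and all(v < 0 for v in after)
        neg_to_pos = all(v < 0 for v in before) and all(v > 0 for v in after)
        if not (pos_to_neg or neg_to_pos):
            continue
        if (max(abs(v) for v in before) >= min_deflection
                and max(abs(v) for v in after) >= min_deflection):
            return detect_at
    return None
```

In a closed-loop setting the returned bin would be the moment at which the stimulus is terminated and the subject is prompted to report its decision.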

Fig. 3: Putative CoMs detected in real time.

a, CoM experiment schematic. On each trial, if a DV zero crossing was detected and met remaining CoM criteria (Supplementary Methods 11.2, Supplementary Table 4; green arrows, criteria for temporal stability of DV sign; orange arrows, criteria for minimum DV deflections), the stimulus was terminated and the subject immediately reported its decision (red cartoon, example trace). Grey cartoon example traces, trials in which CoM criteria were not met. A 250-ms minimum stimulus duration was imposed (grey shading). b, Example CoM trials. Real-time DV traces for two example trials from monkey H with candidate CoMs. c, CoM count as a function of coherence level. Data from 2,712 CoMs total, pooled across monkeys. CoM frequency was negatively correlated with coherence (linear regression P < 0.001). d–f, Same data as c. d, Corrective and erroneous CoM counts as a function of coherence. Data sorted by reward outcome. e, CoM frequency as a function of time during stimulus presentation. Edge effect (increasing CoM frequency from 250 to 400 ms) results from exclusion of potential CoMs that would have resolved before the minimum 250 ms stimulus duration. f, CoM time as a function of coherence. Mean (± s.e.m.) time of the zero crossing for CoM trials. CoM time was negatively correlated with coherence (linear regression, P = 1.14 × 10^−26).

We conceptually divide a CoM trial into two segments—the initial preference before the DV sign change, and the final (opposite) preference that leads to the observed choice. The interpretation of the initial preference relies on the mapping between the DV and choice likelihood obtained from experiment 1. The observed choices allow validation of the neural estimate of the final decision state in the second segment (Extended Data Fig. 7a, Supplementary Note 2).

For monkey F, the relationship between prediction accuracy and DV at stimulus termination was very similar for CoM and non-CoM trials (compare Extended Data Fig. 1d, right with Extended Data Fig. 7a, right; mean error between predicted and observed choice likelihood: 1.9% for non-CoM trials and 3.8% for CoM trials). This relationship was also lawful and monotonic for monkey H, although observed prediction accuracy was lower than expected from the logistic model (compare Extended Data Fig. 1d, left with Extended Data Fig. 7a, left; mean error between predicted and observed choice likelihood: 1.7% for non-CoM trials and 9.3% for CoM trials), suggesting that in addition to the measured DV at stimulus termination, the decisions of monkey H were influenced by some aspect of the DV trajectory history specifically related to the CoM (Extended Data Fig. 7a, Supplementary Note 3, Supplementary Table 3, Supplementary Methods 12.4).

We combined all 985 CoMs detected in monkey H (and all 1,727 CoMs detected in monkey F) to assess whether our neurally detected CoMs conformed to three statistical regularities of CoMs established in previous psychophysical3 and electrophysiological2 studies: (1) CoMs are more frequent for low- and intermediate-coherence trials compared with high-coherence trials; (2) CoMs are more likely to be corrective than erroneous; and (3) CoMs are more frequent early in the trial than later in the trial. All three predicted regularities are true in our real-time neural detection data (Fig. 3c–e, Extended Data Fig. 7b–d).

We also discovered a new regularity associated with CoMs: the average time of zero crossing was negatively correlated with stimulus coherence (Fig. 3f, Extended Data Fig. 7e). This observation probably results from the stronger corrective effect of higher-coherence stimuli (Fig. 3d, Extended Data Fig. 7c).

Of note, the statistical regularities in the neural CoMs were not foreordained since our decoder was trained on choices made at the end of trials, completely agnostic to rare CoMs during any given trial.

Probing decisions with motion pulses

In a final closed-loop experiment (experiment 3), we tested whether neural and behavioural responses to brief pulses of additional motion information varied with DV and/or the time of pulse onset. Decision-making models involving accumulation of evidence to a bound14,15,16,17 predict that termination of deliberation and commitment to a choice become more likely at high DV values2,3,9,17,18, resulting in decreased sensitivity to stimulus information beyond the point of commitment. We therefore hypothesized that additional pulses of sensory evidence would have less effect on neural DV and behavioural choices when triggered by high DV values.

To test this prediction, we imposed virtual DV boundaries (as in Fig. 2a, b) that, if reached, triggered a 200-ms pulse of additive dots coherence (randomly assigned to be rightward or leftward on each trial) followed by stimulus termination (Fig. 4a). We swept a subset of the previously used DV values for the boundary (integers from 1 to 4 DV units). Pulses were only presented on trials with motion coherences near or below the subject’s psychophysical threshold; pulse strength was calibrated to yield very small but significant effects on behaviour, to avoid making the pulses sufficiently salient to change the animals’ integration strategy (∆coherence = 2% (monkey H), 4.5% (monkey F)). Pulse information had no bearing on reward9,19. Motion pulses slightly but significantly biased the monkeys’ choices in the direction of the pulse (Fig. 4b).

Fig. 4: Neurally triggered motion pulses nonlinearly bias choice and DV.

a, Motion pulse task trial structure (Supplementary Methods 11.3). b, Psychometric functions for closed-loop motion pulse trials. Curves fit to trials with motion pulses only (Supplementary Methods 11.3, 16). Data points show mean (± s.e.m.) proportion of rightward choices. Black, highest coherence trials, no pulses presented. c, Average change in post-pulse DV aligned to estimated PERL, with mean subtracted. 𝛿DV, difference between DV and DVPERL at each time point. Traces (mean ± s.e.m.) are terminated 100 ms before each subject’s median reaction time, or 150 ms before the single-trial reaction time (whichever came first; Supplementary Methods 16). The mean 𝛿DV across pulse directions in each time bin has been subtracted for visualization. Black dots, time bins in which 𝛿DV differs significantly for leftward versus rightward pulse trials (false discovery rate 0.05). Same trials as b, pooled across monkeys. d, Average change in post-pulse DV for each DV boundary, aligned to PERL (mean subtracted). Same trials and conventions as c, sorted by DV boundary. e, Residual behavioural pulse effects over |DVPERL| (Δ(mean choiceresD)), difference between the mean residuals for stimulus-congruent minus incongruent pulse trials (Supplementary Methods 16.1). Variable is signed according to the direction of the motion of the baseline stimulus. Black, mean (± s.e.m.) residual pulse effects on choice for trials in each |DVPERL| bin. Asterisks denote significantly non-zero means at 95% confidence (bootstrapped; Supplementary Methods 16.1). Blue, nonlinear regression model fit of the residuals to a half Gaussian over |DVPERL| (Levenberg–Marquardt algorithm, using the MATLAB fitnlm function), including the P value for the fit amplitude coefficient (two-sided t-statistic). Same trials as d. f–h, Same trials, conventions and statistics as e. f, Residual neural pulse effects over |DVPERL| (Δ(mean ΔDVresD)). ∆DV, DV averaged over the last 50 ms of the time window described in c, minus the DV averaged over the 50 ms pre-PERL (Supplementary Methods 16.1). Black, mean (± s.e.m.) residual pulse effects on ∆DV for each |DVPERL| bin. g, Residual behavioural pulse effects over time (Δ(mean choiceresT)). Black, mean (± s.e.m.) residual pulse effects on choice for trials in each stimulus-duration quantile. h, Residual neural pulse effects over time (Δ(mean ΔDVresT)). Black, mean (± s.e.m.) residual pulse effects on ∆DV for each stimulus-duration quantile.

We reasoned that, to detect the small effects of these weak motion pulses on DV, and to best estimate the DV at the time when pulse information could actually influence the momentary decision state, we should account for a processing delay between pulse presentation and measured effects on our recorded neural populations in PMd and M1. We refer to this delay, estimated from an independent set of open-loop trials, as the evidence representation latency (ERL), which is 170 ms for monkey H and 180 ms for monkey F (Supplementary Methods 16). For each pulse trial, we then measured the initial DV at the time of pulse onset plus the ERL (DVPERL) (Supplementary Methods 16), as well as the change in DV (𝛿DV) for each subsequent time bin. On average, motion pulses slightly but significantly biased 𝛿DV in the direction of the pulse (Fig. 4c, Extended Data Fig. 8a).
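
The alignment to pulse onset plus ERL can be made concrete with a short sketch. The ERL values (170 ms and 180 ms) and the 10-ms bin width come from the text; the function name and the assumption that the DV trace is indexed in 10-ms bins from stimulus onset are illustrative only.

```python
def delta_dv(dv_trace, pulse_onset_bin, erl_ms, bin_ms=10):
    """Return (DV_PERL, list of deltaDV values for each later bin).

    dv_trace: DV values, one per bin of width bin_ms.
    pulse_onset_bin: bin index at which the motion pulse began.
    erl_ms: evidence representation latency (170 ms monkey H, 180 ms monkey F).
    """
    perl_bin = pulse_onset_bin + erl_ms // bin_ms  # pulse onset plus ERL
    dv_perl = dv_trace[perl_bin]
    # deltaDV at each subsequent bin is the DV relative to DV_PERL.
    return dv_perl, [dv - dv_perl for dv in dv_trace[perl_bin + 1:]]
```

Averaging these per-trial 𝛿DV traces separately for rightward and leftward pulses, and comparing them, would give the kind of pulse-direction contrast shown in Fig. 4c.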

In simple, unbounded linear integration, we expect the magnitude of DV change in response to a fixed motion pulse to remain constant regardless of the initial state of the DV, as suggested above. By contrast, Fig. 4d (Extended Data Fig. 8b) shows that motion pulses led to larger DV changes when triggered by low DV values compared with high DV values, consistent with the presence of an absorbing decision bound.

We next addressed whether the decision bound is stationary or changing with time. For models with stationary bounds, the effect of the motion pulse would depend solely on the state of the DV at the time of the pulse, whereas for models with time-varying bounds16,17,18,20,21, the pulse effect would also depend on the pulse time. Devising an analysis that disentangles the effects of the DVPERL from pulse time also addresses a potential confound in the 𝛿DV analysis presented above: the motion pulse was always delivered at the end of the stimulus and, on average, longer stimulus durations were required to generate higher DVs in our experiment, as expected from standard evidence-accumulation models. Thus the 𝛿DV analysis results could have been partially shaped by elapsed time. We therefore conducted an additional analysis to determine whether the reduced pulse effect was attributable specifically to higher DVs, to later pulse times (longer stimulus durations), or both.

We adopted a data-driven approach to separate the effects of DVPERL and stimulus duration (Supplementary Methods 16.1). In brief, to isolate the effect of the magnitude of DVPERL (|DVPERL|), we (1) divided trials into eight quantiles for stimulus duration, (2) calculated a residual pulse effect for each trial by subtracting the mean pulse effect for each combination of stimulus-duration quantile and baseline motion strength, (3) recombined the data across duration quantiles to obtain statistical power, and (4) analysed how the residuals varied with |DVPERL|. We refer to this as the time-adjusted effect of DVPERL, that is the effect of DVPERL that cannot be accounted for by stimulus duration or baseline motion strength (Fig. 4e, f). Conversely, to isolate the DV-adjusted effect of stimulus duration, we (1) divided trials into |DVPERL| bins, (2) calculated residual pulse effects by subtracting the mean effects for each combination of |DVPERL| bin and baseline motion strength, (3) recombined the data across all |DVPERL| bins, and (4) analysed how the single-trial residuals varied with stimulus duration (Fig. 4g, h).
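
Steps (1) to (3) of the time-adjusted analysis can be sketched as below. This is a simplified illustration, not the analysis of Supplementary Methods 16.1: the function names are hypothetical, the per-trial pulse effect is treated as a single number (a choice or ∆DV measure), equal-count quantile bins are assumed, and the final congruent-minus-incongruent contrast and bootstrap statistics are omitted. Swapping the roles of stimulus duration and |DVPERL| gives the DV-adjusted version.

```python
from collections import defaultdict

def quantile_bins(values, n_bins):
    """Assign each value an equal-count bin index in 0..n_bins-1 by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

def time_adjusted_residuals(effects, durations, coherences, n_quantiles=8):
    """Subtract the mean effect of each (duration quantile, coherence) cell.

    effects: per-trial pulse effect (choice or deltaDV measure).
    durations: per-trial stimulus duration.
    coherences: per-trial baseline motion strength.
    Returns one residual per trial, pooled across duration quantiles.
    """
    duration_bin = quantile_bins(durations, n_quantiles)
    cells = defaultdict(list)
    for effect, q, coh in zip(effects, duration_bin, coherences):
        cells[(q, coh)].append(effect)
    cell_mean = {key: sum(vals) / len(vals) for key, vals in cells.items()}
    return [effect - cell_mean[(q, coh)]
            for effect, q, coh in zip(effects, duration_bin, coherences)]
```

The residuals returned here would then be examined as a function of |DVPERL| (step 4) to obtain the time-adjusted effect.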

The time-adjusted magnitudes of both behavioural and neural pulse effects decreased systematically with |DVPERL| (Fig. 4e, f, Extended Data Fig. 8c, d, g–j), and the DV-adjusted magnitudes of both behavioural and neural pulse effects decreased systematically with stimulus duration (Fig. 4g, h, Extended Data Fig. 8e, f).

Discussion

In this study, we have combined neural population recordings with closed-loop, neurally contingent stimulus control to probe moment-to-moment fluctuations in decision states and validate their significance for behaviour. We show that large fluctuations in a decoded DV in premotor and primary motor cortices are nearly instantaneously (<100 ms) predictive of choices. Notably, these intra-trial DV fluctuations are not driven predominantly by intra-trial fluctuations in stimulus strength, as quantified by motion energy, even in CoM trials (Extended Data Fig. 9, Supplementary Note 4). This advance enabled real-time detection of covert cognitive events (such as CoM) at the neural level.

We exploited this approach to test current models of evidence accumulation and termination in decision making. We introduced weak motion pulses at known DV values during naturally evolving decisions. Strictly linear, unbounded accumulation models predict a constant effect of stimulus pulses irrespective of the momentary decision state or the time of pulse presentation during the trial. By contrast, we found that the neural and behavioural effects of stimulus pulses were strongest when delivered at low DV values or short stimulus durations.

Each of these two results establishes constraints on models of decision making. First, the decreased efficacy of stimulus pulses at higher DV values constitutes direct evidence for absorbing bounds, a feature of the decision-making process that is widely assumed in models of decision formation15,22,23,24,25. Our result indicates that the system becomes resistant to further motion input as the DV becomes larger, reflecting a stronger state of commitment to a choice. Evidence for a decision bound also emerged from experiment 1: DV variability decreases late in the trial, consistent with inferences from previous studies9,19 that the system becomes more resistant to new stimulus input at longer stimulus durations (Extended Data Fig. 10).

Second, the decreased efficacy of pulses with longer stimuli suggests that the amplitude of the terminating bound decreases with time during the trial. Two large groups of models that lack a time-dependent termination mechanism cannot explain our data, because they predict that the pulse effect will be determined by the strength of the pulse and the state of the model when the pulse is delivered, with no independent effect of time: (1) models that assume a static termination mechanism, including commonly used drift diffusion models with fixed decision bounds; and (2) models that lack a termination criterion for fixed and variable duration tasks, relying instead on dynamic competition between two alternatives to determine the final choice15,26. Extensions of the drift diffusion models—and the broader class of bounded accumulation models—that include time-dependent decision bounds27 or an urgency signal16,17,18,20,21,28 are compatible with our experimental observations (Supplementary Note 5).

Our study builds on a substantial literature of single-unit9,19 and neural population2,4,5,8,29 studies of decision mechanisms, and leverages the technical power of intracortical brain–computer interfaces developed for real-time control of prosthetic devices30,31,32,33,34,35,36. Our findings were enabled by the ability to accurately decode decision states in real time (Supplementary Note 6), which could bring the concept of cognitive prostheses37,38,39,40,41 closer to reality by providing another means of decoding subjects’ goals for use in prosthetic control. More broadly, our real-time closed-loop approach may also be applicable to other cognitive phenomena such as working memory and attention42,43, and even to affective processes41.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.