Introduction

The ability to estimate elapsed time and produce timed responses is critical for a wide range of behaviors. Interval timing, which is on the scale of hundreds of milliseconds to tens of seconds, contributes importantly to cognitive behaviors, such as associative learning, sensory-motor processing, and decision making [1,2,3]. Examples of interval timing tasks include the peak-interval timing procedure [4, 5], the time production task [6], and Pavlovian conditioning [7, 8], in which animals are trained to produce a motor action after a specific time interval [3]. A prominent feature of interval timing is the scalar property, i.e., the variability of timing scales with the interval being timed [6, 9,10,11], which is reminiscent of Weber’s law in the psychophysics of sensory discrimination [12]. Although many studies on interval timing have focused on performance in the steady state, there is also evidence showing adaptive timing behavior in a changing temporal context. For instance, during a switching interval variance task in which the distribution of action-reward delay switches between blocks, mice gradually adjust their waiting time after block switches [13]. For rats trained to perform peak-interval timing procedures at two different intervals that are varied across sessions but remain constant within a session, the responding times in initial trials of the current session are biased toward the inter-reinforcement interval in the last session [14]. For rats trained on fixed-interval reinforcement schedules, unpredictable changes in the inter-food interval within a session cause rapid changes in wait time [15]. These studies demonstrate that interval timing behavior changes with temporal statistics, analogous to the influence of trial history statistics on sensory perception and decision making [16,17,18,19]. However, it remains poorly characterized how interval timing behavior is influenced by recent experience of temporal interval. 
It is also unclear which brain regions exhibit history-dependent activity in timing tasks.

Neuronal activity related to temporal information processing and timing behavior has been reported in many brain areas, including cortex, thalamus, and basal ganglia [2, 3, 20, 21]. A whole-brain functional neuroimaging study showed that supplementary motor area is consistently activated across all timing tasks [22]. In an interval-generation task that requires the monkey to hold for a specific time interval instructed by a specific visual cue, neuronal activity in the supplementary and pre-supplementary motor areas is influenced by the interval time indicated by the instructional cue [23]. During a synchronization-continuation tapping task in which subjects need to keep track of time, recordings from the medial premotor cortex of monkeys showed that neurons are tuned to the duration of the produced interval [24], and the elapsed time within intervals can be well decoded from the population activity [25]. The secondary motor cortex (M2) in rodents is a homolog of the supplementary motor area and premotor cortex in monkeys [26,27,28,29]. In an odor-guided two-interval timing task in mice, the responses of neurons in a subregion of M2 in the ALM show temporal scaling between short and long interval trials, and the elapsed time from cue onset to reward delivery can be decoded from the dynamics of neurons [8]. In a sensory discrimination task that requires the use of short-term memory for movement planning, mouse ALM neurons exhibit preparatory activity that causally contributes to behavioral performance [30, 31]. A growing number of studies also found that M2 neurons carry trial history information about choice, outcome, and sensory stimuli [19, 32,33,34,35,36,37]. These findings raise the possibility that M2 activity in timing tasks might be involved in short-term memory of the time interval and can be influenced by trial history of intervals.

In this study, we aimed to investigate how interval timing behavior is dynamically influenced by a change of inter-reinforcement interval in the previous trial. We also applied optogenetics and extracellular recordings to examine the role of ALM activity in short-term memory of time interval. Our experiments revealed that interval timing behavior is rapidly influenced by trial history, and neuronal activity in the ALM is modulated by the interval in the previous trial.

Materials and Methods

Animals

All animal procedures were approved by the Animal Care and Use Committee at the Institute of Neuroscience, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences (IACUC No. NA-013-2019). The C57BL/6 mice were from SLAC Laboratory Animal Co. (Shanghai, China). VGAT-Cre (Slc32a1tm2(cre)Lowl/J, Stock No: 016962) and VGAT-ChR2 (B6.Cg-Tg(Slc32a1-COP4*H134R/EYFP)8Gfng/J, Stock No: 014548) mice were from the Jackson Laboratory (Bar Harbor, ME, USA). Adult (2 months at the time of surgery) male mice were used for all experiments. Mice were housed in groups of 4–6 per cage in the Institute of Neuroscience animal facility (12 h: 12 h light/dark cycle), with the humidity controlled at 40%–70% and temperature at 22–23°C.

Surgery

The mice were anesthetized with a mixture of fentanyl (0.05 mg/kg), medetomidine (0.5 mg/kg), and midazolam (5 mg/kg) injected intraperitoneally before surgery, and were head-fixed in a stereotaxic apparatus. Lidocaine jelly was applied to the incision site. Two craniotomies (~1 mm diameter) were made bilaterally above the cortical region to be manipulated. The virus (2–3 × 10¹² viral particles/mL) was injected with a glass pipette (15–20 μm tip diameter) and a syringe pump (Harvard Apparatus, Holliston, USA).

To inhibit the subregion of M2 at the ALM (AP 2.46 mm, ML ±1.8 mm) (the ML coordinate ranged from 1.5 to 2 mm) [38], we used VGAT-ChR2 mice [39] or VGAT-Cre mice injected with AAV-FLEX-ChrimsonR, similar to previous studies [30, 31, 38]. In these mice, we photostimulated the excitatory opsin (ChR2 or ChrimsonR) in inhibitory neurons [39] to achieve ALM inhibition. Because our controls were C57BL/6 mice [31] (injected with AAV-hSyn-eGFP), we also included in the experimental group some C57BL/6 mice (injected with AAV-CaMKIIα-GtACR2 to express the inhibitory opsin in excitatory neurons). In total, we used 9 mice for the experiment of ALM inactivation: VGAT-ChR2 mice (n = 3), VGAT-Cre mice (n = 3, injected with AAV-FLEX-ChrimsonR), and C57BL/6 mice (n = 3, injected with AAV-CaMKIIα-GtACR2). For VGAT-Cre mice, a total of 200 nL AAV2/8-Syn-FLEX-rc[ChrimsonR-tdTomato] was bilaterally injected at a depth of 500 μm in the ALM. For C57BL/6 mice, a total of 200 nL AAV2/9-mCaMKIIα-hGtACR2-EGFP-WPRE-pA was bilaterally injected at a depth of 500 μm in the ALM. For control C57BL/6 mice, we bilaterally injected 200 nL of AAV2/8-hSyn-eGFP-3Flag-WPRE-SV40pA at a depth of 500 μm in the ALM.

To inhibit the central and medial subregion of M2 (AP 1.34 mm, ML ±0.75 mm), we used C57BL/6 mice (n = 6), in which 400 nL of AAV2/8-hSyn-eNpHR3.0-EYFP was bilaterally injected at a depth of 500 μm.

To inhibit the medial prefrontal cortex (mPFC, AP 2.0 mm, ML ±0.4 mm), we used C57BL/6 mice (n = 5), in which a total of 200 nL AAV2/9-hSyn-Jaws-KGC-GFP-ER2 was injected at a depth of 1600 μm.

A summary of the mouse lines, viruses, and numbers of mice used for each experiment is provided in Table S1.

After virus injection, the pipette was left in place for 10–15 min before retraction. Following the injection, optical fibers (200 μm diameter, NA 0.37) were inserted at the cortical surface for the injection site of M2 (either the ALM or the central and medial subregion of M2), and 350 μm above the injection site of the mPFC. In VGAT-ChR2 mice, optical fibers were bilaterally inserted above the ALM (AP 2.46 mm, ML ±1.8 mm). A stainless-steel head-plate was fixed to the skull using dental cement. For mice used in extracellular recording, the skull region above the ALM was marked with permanent ink. After the surgery, mice were given Rimadyl via drinking water for 3 days, and allowed to recover with food and water ad libitum for 10 days before behavioral training.

Behavioral Task

Mice were deprived of water for 2 days before the behavioral training. During the licking task, the mouse was head-fixed and sat in an acrylic tube. The lick spout was located ~3 mm in front of the tip of the nose and 1 mm below the lower lip. Touching the spout with the forelimbs was prevented by a plastic plate. Tongue licks were detected with a custom-made electrical lick sensor [40] or by the interruption of an infrared beam if the mice were used for electrophysiological recordings. Fluid delivery was controlled by a peristaltic valve (Kamoer, Shanghai, China). An Arduino microcontroller platform was used to measure licking, deliver fluid, and apply laser stimulation. A multifunction I/O device (USB-6001, National Instruments, Texas, USA) was used for data acquisition. The lick signals and task-related signals were sampled at 1000 Hz.

The mice first went through a habituation phase and a free-drinking phase before being trained in a fixed-interval licking task. During the habituation phase (1–2 days), the mouse was handled by the experimenter for 5–10 min and learned to lick water (300−500 nL) from a syringe. During the free-drinking phase (1–2 days), the mouse was head-fixed into the behavioral apparatus for 30 min per session, and water (2–3 μL) was delivered to the spout every 4 s. During the training in the fixed-interval licking task (10–30 days), 2–3 μL of 10% sucrose was delivered every 10 s. After mice showed a high level of anticipatory licking before reward delivery, non-rewarded probe trials were inserted. The duration of each probe trial was 3–4 times longer than the inter-reinforcement interval of the reinforced trials. Each probe trial was preceded by 4–14 reinforced trials. The probe trials represented ~10% of all trials. Mice were trained with the probe trials inserted for 20–50 days until the peri-stimulus time histogram (PSTH) of licking in the probe trials resembled that in the reinforced trials. In the final task phase, each mouse performed the task for 1 h in each session. Mice used for electrophysiological recordings further went through 14–21 days of additional training, during which they were head-fixed for ~1 h before the timing task was initiated. This 1-h period was necessary to insert the electrode into the cortex in later electrophysiological experiments.

To examine the scalar property of timing behavior, we first trained a group of mice (n = 16) in a 10-s fixed-interval task. After the mice learned the task and were tested with the peak-interval timing procedure, we trained half of the mice with a 7.5-s interval and the other half with a 15-s interval.

Because probe trials represented 10% of all trials in a session, we used three sets of experiments to test the effect of different types of trial history. Mice were first trained in a 10-s fixed-interval task in all sets of experiments. In the first set of experiments (Fig. 2), we added probe trials that were preceded by an inter-reinforcement interval of 8 s or 12 s, by delivering the preceding reward 2 s earlier (reward at −2 s) or 2 s later (reward at +2 s), while the other inter-reinforcement intervals were still at 10 s. This allowed us to determine whether time estimation in the probe trial is affected by trial history of inter-reinforcement interval.

In the second set of experiments (Fig. S1), we examined whether the effects of decreasing and increasing the inter-reinforcement interval are symmetrical. In one session, half of the probe trials were preceded by an inter-reinforcement interval of 8 s (reward at −2 s) and the other half preceded by an inter-reinforcement interval of 10 s as usual (reward at 0 s). In another session, the two types of trial history for probe trials were ‘reward at +2 s’ and ‘reward at 0 s’, respectively, again with all other inter-reinforcement intervals remaining at 10 s.

In the third set of experiments (Figs 3 and S2), we determined whether a larger change of inter-reinforcement interval causes a larger shift in peak time. In one session, the two types of trial history for probe trials were ‘reward at −2.5 s’ (the reward in the previous trial was delivered 2.5 s earlier) and ‘reward at −1 s’ (the reward in the previous trial was delivered 1 s earlier), with all other inter-reinforcement intervals at 10 s (Fig. 3). In another session, the two types of trial history for probe trials were ‘reward at −1 s’ and ‘reward at 0 s’ (Fig. S2).

In each set of experiments, the two types of trial history were interleaved in a session, and their order was randomized across sessions.

Optogenetic Stimulation

Optical activation of GtACR2 or ChR2 was induced by blue light, activation of ChrimsonR or Jaws was induced by red light, and activation of NpHR was induced by green light. A blue (473 nm), red (635 nm), or green laser (532 nm) (Shanghai Laser & Optics Century Co., Shanghai, China) was connected to an output optical fiber and the stimulation was controlled by an Arduino microcontroller.

Optogenetic experiments were performed after the mice were well trained at an inter-reinforcement interval of 10 s. For half of the probe trials, 2 s of constant laser stimulation was applied before the time of reward delivery on the trial preceding the probe trial. To prevent the mice from using laser stimulation as a cue of no-reward, we also applied laser stimulation for 2 s before the time of reward delivery in randomly-selected reinforced trials. The laser power was 5 mW, 10 mW, and 15 mW at the fiber tip for the blue, green, and red laser, respectively.

Extracellular Recording

Before extracellular recording, the mice were head-fixed to a holder attached to the stereotaxic apparatus and anesthetized with isoflurane (1%–2%). Two craniotomies (~1 mm diameter) were made bilaterally above the ALM (AP 2.46 mm, ML ±1.8 mm). The dura was removed, and the craniotomy was protected by a silicone elastomer (Kwik-Cast, WPI, Sarasota, FL, USA). The mouse was allowed to recover from anesthesia in its home cage for at least 2 h. The recordings were made with multi-site silicon probes (A1×32-Poly2-10mm-50s-177-A32, NeuroNexus Technologies, Ann Arbor, MI, USA). After recordings, the electrode was retracted. The craniotomy was cleaned with saline and covered with silicone elastomer. For recordings from behaving mice, we made a total of 18 sessions of recording from 5 mice (3 or 4 sessions per mouse). In some recordings, the electrode was coated with fluorescent dye (DiI, Invitrogen, Eugene, OR, USA), which allowed us to mark the electrode track.

The neural signals were amplified and filtered using a Cerebus 32-channel system (Blackrock Microsystems, Salt Lake City, UT, USA). Spikes were sampled at 30 kHz. To detect the waveforms of spikes, we band-pass filtered the signals at 250–7500 Hz and set a threshold at 3.5 SD of the background noise. Spikes were sorted off-line using the Offline Sorter (Plexon Inc., Dallas, TX, USA). The sorting involved cluster cutting of spike waveform features in principal component space. Spike clusters were regarded as single units if the interspike interval was >1.5 ms and the p value for multivariate analysis of variance tests on clusters was <0.05. Task-related events (licks and reward delivery) were digitized as TTL levels and recorded by the Cerebus system.

Histology

The mice were deeply anesthetized with isoflurane and perfused with 30 mL saline followed by 30 mL paraformaldehyde (PFA, 4%). Brains were removed, fixed in 4% PFA (4℃) overnight, and then transferred to 30% sucrose in phosphate-buffered saline until equilibration. Brains were cut at 60 μm on a cryostat (Leica, VT1200S, Wetzlar, Germany). Fluorescence images were captured with VS120 (Olympus, Tokyo, Japan). Images were analyzed with ImageJ (NIH, Bethesda, MD, USA).

Data Analysis

Analyses were performed in MATLAB. To analyze licking behavior in probe trials for the experiments in which all inter-reinforcement intervals were the same (at 7.5, 10, or 15 s), time zero of each probe trial was aligned to the time of previous reward delivery. For the experiments in which the inter-reinforcement interval immediately before the probe trials was shortened or lengthened, time zero of each probe trial was aligned to the time of the first lick after reward delivery in the preceding reinforced trial (i.e., aligned to the time of the first rewarded lick), because this was the time at which the mouse obtained the reward and may have started to time the interval. For optogenetic experiments, we aligned time zero to the time of reward delivery preceding the probe trial, because all trials were aligned by the time of laser onset. PSTHs for the licking behavior were constructed by averaging the licks (200 ms/bin) across trials.
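The binning step described above can be illustrated with a minimal Python sketch. This is not the original MATLAB code: the function name `lick_psth`, its arguments, and the conversion of averaged counts to a rate in licks/s are assumptions made for illustration, and lick times are assumed to be already aligned to time zero.

```python
def lick_psth(trials, t_start, t_end, bin_s=0.2):
    """Trial-averaged lick rate (licks/s) in fixed-width bins.

    trials  : list of lists; each inner list holds lick times (s) for one
              trial, already aligned to that trial's time zero (assumption).
    bin_s   : bin width in seconds (200 ms, as stated in the text).
    Returns (bin_centers, rates).
    """
    n_bins = int(round((t_end - t_start) / bin_s))
    counts = [0] * n_bins
    for licks in trials:
        for t in licks:
            idx = int((t - t_start) // bin_s)
            if 0 <= idx < n_bins:          # ignore licks outside the window
                counts[idx] += 1
    # Average over trials, then divide by bin width to express as a rate.
    rates = [c / (len(trials) * bin_s) for c in counts]
    centers = [t_start + (i + 0.5) * bin_s for i in range(n_bins)]
    return centers, rates
```

Dividing by the bin width is a presentation choice; leaving the values as mean lick counts per bin would give the same curve shape.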

For each session, the lick PSTH in the probe trial was fitted with a Gaussian function. Only licks within the range of [T – T/2, T + T/2] were used for curve fitting, in which T indicates the inter-reinforcement interval. For example, for a session of 10-s fixed-interval task, licks from 5 to 15 s of the probe trials were used for curve fitting. Peak time was defined as the time corresponding to the peak lick rate in the fitted curve. Peak width was the full width at half maximum of the Gaussian fit.
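As a rough illustration of this fitting step, the sketch below fits a zero-baseline Gaussian to the PSTH bins within [T/2, 3T/2] and reads off the peak time and the full width at half maximum. The fitting routine is a stand-in, not the procedure used in the study: it is a coarse grid search over the mean and SD with the amplitude solved in closed form, and the function name `fit_gaussian`, the grid resolution, and the absence of a baseline term are all assumptions.

```python
import math

def fit_gaussian(times, rates, interval_s, steps=100):
    """Least-squares fit of A*exp(-(t-mu)^2 / (2*sigma^2)) on [T/2, 3T/2].

    Grid search over mu and sigma (illustrative substitute for a
    nonlinear fitting routine); A is the least-squares optimum for
    each (mu, sigma) pair.
    """
    lo, hi = interval_s / 2, 3 * interval_s / 2
    pts = [(t, r) for t, r in zip(times, rates) if lo <= t <= hi]
    best = None
    for i in range(steps + 1):
        mu = lo + (hi - lo) * i / steps
        for j in range(1, steps + 1):
            sigma = (hi - lo) * j / steps
            g = [math.exp(-((t - mu) ** 2) / (2 * sigma ** 2)) for t, _ in pts]
            gg = sum(x * x for x in g)
            if gg == 0:
                continue
            amp = sum(x * r for x, (_, r) in zip(g, pts)) / gg
            sse = sum((r - amp * x) ** 2 for x, (_, r) in zip(g, pts))
            if best is None or sse < best[0]:
                best = (sse, amp, mu, sigma)
    _, amp, mu, sigma = best
    # For a Gaussian, FWHM = 2*sqrt(2*ln 2) * sigma.
    fwhm = 2 * math.sqrt(2 * math.log(2)) * sigma
    return {"peak_time": mu, "peak_width": fwhm, "amplitude": amp}
```

With 200-ms bins and T = 10 s, the default grid resolves the peak time and SD to 0.1 s; a finer grid or a gradient-based optimizer would be needed for tighter estimates.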

We defined a lick bout as a group of licks in which the first lick was preceded by at least 0.67 s of no licks, the inter-lick interval of the first three licks was <0.33 s, and the last lick was followed by ≥0.67 s of no licks. For each probe trial, the start time and end time were defined as the times of first lick and last lick in a bout. For each session, we computed the start time (end time) for each trial, and averaged the start time (end time) across trials.
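The bout criteria can be expressed as a short Python sketch. The function name `find_lick_bouts` is hypothetical, and two points not specified in the text are treated as assumptions: a lick with no neighbor before (or after) it in the record is taken to satisfy the silence criterion, and a group needs at least three licks so that the first two inter-lick intervals can be checked.

```python
def find_lick_bouts(lick_times, min_gap_s=0.67, max_ili_s=0.33):
    """Return (start, end) times of lick bouts within one trial.

    A bout is a run of licks bounded on both sides by >= min_gap_s of
    silence whose first three licks are separated by < max_ili_s.
    """
    licks = sorted(lick_times)
    groups, current = [], []
    for t in licks:
        # A silent gap of at least min_gap_s closes the current group.
        if current and t - current[-1] >= min_gap_s:
            groups.append(current)
            current = []
        current.append(t)
    if current:
        groups.append(current)

    bouts = []
    for g in groups:
        # Check the two intervals among the first three licks.
        if len(g) >= 3 and all(g[i + 1] - g[i] < max_ili_s for i in range(2)):
            bouts.append((g[0], g[-1]))
    return bouts
```

The function returns every bout in the trial; which bout supplies the trial's start and end time when several are present is left to the caller, since the text does not state it.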

The Weber fraction was computed according to the following equation [11]:

\(\text{Weber fraction} = \frac{\text{peak width}}{\text{peak time}}\).
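As a worked example of this ratio, the sketch below applies it to the population-mean peak width and peak time reported for the 10-s task in the Results (5.91 s and 10.24 s). The function name is hypothetical, and a ratio of population means is shown only for illustration; it need not equal the mean of per-mouse Weber fractions.

```python
def weber_fraction(peak_width_s, peak_time_s):
    """Weber fraction = peak width (FWHM of the Gaussian fit) / peak time."""
    if peak_time_s <= 0:
        raise ValueError("peak time must be positive")
    return peak_width_s / peak_time_s

# Population means for the 10-s task, taken from the Results section.
print(round(weber_fraction(5.91, 10.24), 3))  # prints 0.577
```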

To estimate the scalar property of lick PSTHs in probe trials of 7.5-s, 10-s, and 15-s fixed-interval tasks, time points along the X axis were normalized by the peak time, and lick rates along the Y axis were normalized by the peak amplitude.

For behaving mice used for electrophysiological recordings, the two types of trial history for probe trial were ‘reward at +2 s’ and ‘reward at −2 s’, with all other inter-reinforcement intervals remaining at 10 s. The two types of trial history were interleaved in a session. When we analyzed the probe trials, we aligned time zero to the time of the first lick after reward delivery in the preceding reinforced trial. The PSTH for spiking activity in each probe trial was constructed with a bin size of 200 ms. To test whether ALM activity was temporally modulated, we took 5 non-overlapping segments of the responses in the first 10 s of probe trials and tested the firing rate difference using one-way ANOVA. Neurons with P <0.05 were included in the subsequent analysis. On average, each session of recording yielded 9.72 ± 4.86 (mean ± SD) temporally modulated neurons. For each of the two types of trial history (‘reward at +2 s’ or ‘reward at −2 s’), we obtained a peak time (T1 or T2) by fitting the lick PSTH with a Gaussian function. To analyze ALM activity in the early part of a probe trial, we took the response profile between time zero and the peak time. The response profile between time zero and T2 in the ‘reward at −2 s’ condition was the short response profile, and that between time zero and T1 in the ‘reward at +2 s’ condition was the long response profile. We used the following equation to calculate the mean-squared error (MSE) between the short response profile and a scaled version of the long response profile (scaling factors ranging from 0.1 to 3):

$$ \text{MSE}(f) = \frac{1}{n}\sum_{i = 1}^{n} \left[ PSTH_{short}(t_{i}) - PSTH_{long}(f \times t_{i}) \right]^{2}, $$

where \(PSTH_{short}\) and \(PSTH_{long}\) are the short and long response profiles, respectively, n is the bin number, and f is the scaling factor. The best scaling factor was the one yielding the minimum MSE [41]. As the probe trials were 3–4 times longer than the inter-reinforcement interval, we also extracted two response profiles for the ‘reward at +2 s’ and ‘reward at −2 s’ conditions from the later part of probe trials, during which the mice were disengaged from timing. The lengths of the two later response profiles were the same as those of the early response profiles. We computed a best scaling factor for these two later response profiles by computing the MSE described above.
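The search for the best scaling factor can be sketched in Python as follows. Several details are assumptions made for illustration and are not stated in the text: the long profile is evaluated at the scaled times by linear interpolation, scaled times that fall outside the long profile are clamped to its edge values, and the factors are stepped by 0.01 between 0.1 and 3. The function names are hypothetical.

```python
def interp_clamped(times, values, t):
    """Linear interpolation of (times, values) at t, holding edge values
    outside the sampled range (an assumption, see lead-in)."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for i in range(len(times) - 1):
        if times[i] <= t <= times[i + 1]:
            w = (t - times[i]) / (times[i + 1] - times[i])
            return values[i] + w * (values[i + 1] - values[i])
    return values[-1]

def best_scaling_factor(t_short, psth_short, t_long, psth_long, factors=None):
    """Return (f, mse) minimizing
    MSE(f) = mean_i [PSTH_short(t_i) - PSTH_long(f * t_i)]^2."""
    if factors is None:
        # 0.10, 0.11, ..., 3.00 (step size is an illustrative choice)
        factors = [round(0.1 + 0.01 * k, 2) for k in range(291)]
    best_f, best_mse = None, float("inf")
    for f in factors:
        sq_err = [
            (y - interp_clamped(t_long, psth_long, f * t)) ** 2
            for t, y in zip(t_short, psth_short)
        ]
        mse = sum(sq_err) / len(sq_err)
        if mse < best_mse:
            best_f, best_mse = f, mse
    return best_f, best_mse
```

Clamping keeps the number of bins in the average constant across factors; dropping out-of-range bins instead would change n with f and could favor large factors.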

To decode elapsed time from the population firing rates of ALM neurons in each probe trial, we used a multiclass support vector machine (SVM) with a radial-basis function kernel, and the algorithm was implemented in the LIBSVM library [7, 42]. For each probe trial in the ‘reward at +2 s’ or ‘reward at −2 s’ condition, the period between time zero and the peak time (T1 or T2) was divided into 50 bins (i.e., the bin size was different for the two conditions). As the peak time (T1 or T2) differed across sessions (Fig. 5C, right panel), the time was scaled across sessions even for the same condition of trial history. The SVM decoder was trained on the data from the ‘reward at +2 s’ condition, and tested on the data from the same condition or on those from a different condition (‘reward at −2 s’). In addition to using responses in the early part of the probe trial, we also performed SVM decoding using responses in the later part of the probe trial.

Across all recording sessions, the minimum number of probe trials for the ‘reward at +2 s’ or ‘reward at −2 s’ condition was 18. For those sessions in which the number of probe trials was >18, we computed the correlation coefficients between the spike PSTH in each probe trial and the spike PSTH averaged over all probe trials, sorted the trials according to the correlation coefficients, and took the first 18 trials of ALM spikes for analysis. For those sessions with only 18 probe trials, ALM spikes in all probe trials were used for analysis.
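One way to read this trial-selection step is sketched below. It assumes that each probe trial is summarized by a single PSTH vector, that trials are ranked from highest to lowest correlation with the across-trial mean, and that the top 18 are kept; the sort direction and the function names are assumptions, since the text does not specify them.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def select_trials(trial_psths, n_keep=18):
    """Indices of the n_keep trials best correlated with the mean PSTH."""
    n_trials = len(trial_psths)
    n_bins = len(trial_psths[0])
    mean_psth = [sum(tr[b] for tr in trial_psths) / n_trials
                 for b in range(n_bins)]
    scored = [(pearson(tr, mean_psth), i) for i, tr in enumerate(trial_psths)]
    scored.sort(key=lambda s: s[0], reverse=True)  # assumed: highest first
    return sorted(i for _, i in scored[:n_keep])
```

Fixing the trial count at the minimum across sessions equalizes the amount of training data available to the decoder in every session.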

For training and testing on data from the same condition, we sampled 17 of the 18 probe trials as the training data to train the SVM decoder, which was used to decode elapsed time based on ALM responses in the remaining probe trial (the testing data), and we implemented this process 18 times. For training and testing on data from different conditions, we used all 18 probe trials in the ‘reward at +2 s’ condition as the training data to train the SVM decoder, which was used to decode time based on ALM responses in a probe trial from the ‘reward at −2 s’ condition (the testing data), and we implemented this process 18 times.

During training, the SVM classifier was trained to identify population activity in each of the 50 time bins, so that the population firing rates in a given bin could be distinguished from those in each of the other bins. During testing, we presented a trial of population responses to the decoder, which predicted the population firing rates in each bin as coming from one of the 50 bins. The SVM output was represented in 50 readout units, one for each bin, forming a classification matrix (Fig. 6A, B). For each actual time bin in the classification matrix, the predicted bin was chosen as the bin corresponding to the readout unit with the maximum value. For each repeat of SVM implementation, we quantified the prediction accuracy by calculating the Pearson’s correlation coefficient between the predicted and actual bin values [7].
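The readout and accuracy measure described above can be sketched in Python. The SVM itself is not reproduced here; the sketch starts from a classification matrix and assumes that rows index actual time bins and columns index readout units. The function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def decode_accuracy(classification_matrix):
    """Predicted bin per actual bin (argmax over readout units) and the
    Pearson correlation between predicted and actual bin indices.

    classification_matrix[i][j]: output of readout unit j for actual
    time bin i (orientation is an assumption).
    """
    actual = list(range(len(classification_matrix)))
    predicted = [max(range(len(row)), key=row.__getitem__)
                 for row in classification_matrix]
    return predicted, pearson(actual, predicted)
```

A correlation near 1 indicates that predicted bins track elapsed time even when individual bins are misclassified into neighboring bins, which a plain percent-correct score would penalize.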

For the early part of probe trials, we compared the capability of encoding elapsed time between scaled and unscaled ALM neurons, whose best scaling factors were within and outside ±10% of the mean T2/T1 (computed using T2 and T1 in all sessions) [41]. Because decoding accuracy depends on the number of neurons [7], we randomly sampled the same number of neurons (n = 50) for both scaled and unscaled neurons, and repeated the resampling process 10 times. In each set of resampled data, we performed SVM decoding and computed correlation coefficients as described above.
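The grouping and resampling steps can be sketched as follows. The sketch assumes that the mean T2/T1 is the average of per-session ratios, that the ±10% band is relative to that mean, and that neurons are drawn without replacement; none of these choices is stated explicitly in the text, and the function names are hypothetical.

```python
import random

def split_scaled(best_factors, t1_by_session, t2_by_session, tol=0.10):
    """Split neuron indices into scaled / unscaled groups.

    A neuron is 'scaled' if its best scaling factor lies within
    +/- tol (relative) of the mean T2/T1 across sessions.
    """
    ratios = [t2 / t1 for t1, t2 in zip(t1_by_session, t2_by_session)]
    target = sum(ratios) / len(ratios)
    lo, hi = target * (1 - tol), target * (1 + tol)
    scaled = [i for i, f in enumerate(best_factors) if lo <= f <= hi]
    unscaled = [i for i, f in enumerate(best_factors) if not lo <= f <= hi]
    return scaled, unscaled

def resample(indices, n=50, repeats=10, seed=0):
    """Draw `repeats` random subsets of n neurons without replacement."""
    rng = random.Random(seed)
    return [rng.sample(indices, n) for _ in range(repeats)]
```

Each resampled subset would then be passed to the decoder separately, so that scaled and unscaled groups are compared at a matched population size.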

Statistics

The statistical analysis was performed using MATLAB. The Wilcoxon signed rank test, one-way ANOVA, one-way ANOVA followed by the Tukey-Kramer multiple comparisons test, one-way repeated measures ANOVA followed by Sidak's multiple comparisons test, and the χ² test were used to determine the significance of the effect. Data are reported as the mean ± SEM unless otherwise stated.

Results

Licking-Based Peak-Interval Timing Task in Head-Fixed Mice

A classic behavioral paradigm to study interval timing is the peak-interval timing procedure, which includes fixed-interval reinforced trials and non-rewarded, longer probe trials (also termed peak trials) [4, 43]. Head-fixed mice were trained on a fixed-interval licking task as described in a previous study [11]. Mice voluntarily licked from a drinking spout, which delivered a drop of 10% sucrose solution every 10 s. After training, mice showed anticipatory licking that gradually increased toward the time of reward delivery (Fig. 1A). In mice well trained on the fixed-interval task, we implemented the peak-interval timing procedure by adding non-rewarded probe trials to evaluate the accuracy and precision of time estimation (Fig. 1B). Probe trials made up 10% of all trials, and each probe trial lasted 3−4 times longer than the inter-reinforcement interval in reinforced trials. While the reinforced trials contained both anticipatory and consummatory licking (Fig. 1A), licking behavior was not confounded by consummatory licks in probe trials. The lick PSTH in probe trials, which was constructed by averaging the licks (200 ms/bin) across trials, showed a peak around 10 s (time 0 aligned to the last reward delivery, Fig. 1B), corresponding to the inter-reinforcement interval in the reinforced trials. To quantify timing behavior in probe trials, we fitted the lick PSTH with a Gaussian function, which yielded a peak time and a peak width (Fig. 1C). Across a population of mice (n = 16) trained with a 10-s fixed-interval task, the peak time and peak width in probe trials were 10.24 ± 0.09 s and 5.91 ± 0.24 s (Fig. 1D), respectively, indicating that the behavior in probe trials reflects the temporal memory of the inter-reinforcement interval. The start time and end time of the licking bouts in probe trials were 7.92 ± 0.12 s and 12.80 ± 0.17 s (Fig. 1D), respectively, also suggesting that the mice time the expected reward according to the inter-reinforcement interval.

Fig. 1

Licking-based peak-interval timing task in head-fixed mice. A Lick rasters and lick PSTH from an example mouse in a 10-s fixed-interval task. B Lick rasters and lick PSTHs from an example mouse in reinforced trials and probe trials in a 10-s peak-interval timing task. In lick rasters of probe trials, blue and red dots mark the start and end times of lick bouts in each trial, respectively. C Gaussian fit of the lick PSTH in probe trials of an example session. D Parameters of licking behavior in probe trials of a 10-s peak-interval timing task (n = 16 mice). E Lick PSTHs for probe trials in 7.5-s, 10-s, and 15-s peak-interval timing tasks, averaged across 8, 16, and 8 mice, respectively. F–H Peak time, peak width, and Weber fraction across the three intervals. I Normalized lick PSTHs for probe trials across the three intervals. Data represent the mean ± SEM.

To determine whether the timing behavior has a scalar property, we next trained half of the mice (n = 8) with a 7.5-s interval and the other half with a 15-s interval (Fig. 1E−I). Analysis of probe trials in the three interval conditions (7.5 s, 10 s, and 15 s) showed that, as the peak time increased from 7.96 ± 0.06 s to 15.50 ± 0.10 s (Fig. 1F), the peak width also increased from 3.90 ± 0.32 s to 8.94 ± 0.56 s (Fig. 1G). For lick PSTHs in probe trials of the three interval conditions, the Weber fraction was not significantly different (P = 0.13, one-way ANOVA, Fig. 1H), and the three curves normalized by peak time and peak lick rate largely overlapped (Fig. 1I). These results demonstrate the scalar timing property of the fixed-interval licking task, consistent with previous reports [11, 44].

Time Estimation in Probe Trials is Dynamically Influenced by a Decrease of Inter-reinforcement Interval in the Previous Trial

We next determined whether time estimation in the probe trial is affected by a change of inter-reinforcement interval in the previous trial. Because probe trials were 10% of all trials in a session, we used several sets of experiments to test the effect of different types of trial history. The first set of experiments contained both an increase and a decrease of inter-reinforcement interval in the same session (Fig. 2). The second set of experiments contained either an increase or a decrease of inter-reinforcement interval in a session (Fig. S1), to determine whether the two effects are symmetrical. The third set of experiments (Figs 3 and S2) was designed to determine whether a larger change of inter-reinforcement interval causes a larger shift in peak time. Mice trained in a 10-s fixed-interval task were used for all sets of experiments. In well-trained mice, anticipatory licking in the probe trials mostly occurred within ±3 s of the peak time (Fig. 1B–D). We therefore restricted the increase or decrease of inter-reinforcement interval to within 3 s.

Fig. 2

Time estimation in probe trials is dynamically influenced by previous inter-reinforcement interval. A Lick rasters and lick PSTHs in probe trials and three preceding reinforced trials for an example mouse. Red, ‘reward at −2 s’ condition; black, ‘reward at +2 s’ condition. Time zero in the probe trial was aligned to the time of first lick after the delivery of reward in the previous reinforced trial. B Lick PSTHs averaged across a population of mice (n = 16), similar to A. C–G Comparison of peak time, start time, end time, peak width, or Weber fraction between ‘reward at −2 s’ and ‘reward at +2 s’ conditions. **P <0.01, ***P <0.001, n = 16 mice, Wilcoxon signed rank test. Data represent the mean ± SEM.

Fig. 3

Peak time is shifted earlier when the previous inter-reinforcement interval is shorter. A Lick rasters and lick PSTHs in probe trials and three preceding reinforced trials for an example mouse. Red, ‘reward at −2.5 s’ condition; black, ‘reward at −1 s’ condition. B Lick PSTHs averaged across a population of mice (n = 19), similar to A. C–G Comparison of peak time, start time, end time, peak width, or Weber fraction between ‘reward at −2.5 s’ and ‘reward at −1 s’ conditions. **P <0.01, n = 19 mice, Wilcoxon signed rank test. Data represent the mean ± SEM.

In the first set of experiments (Fig. 2), we used probe trials that were preceded by an inter-reinforcement interval of 8 s or 12 s, by delivering the reward 2 s earlier or 2 s later (Fig. 2A), while the other inter-reinforcement intervals were still 10 s. In the subsequent analysis, the two types of trial history were referred to as ‘reward at −2 s’ and ‘reward at +2 s’ conditions. To compare timing behavior in the probe trial between the two types of trial history, we defined time zero as the time of first lick after reward delivery, which was the time mice received a reward in the previous reinforced trial (Fig. 2A, B). We found that the peak time in probe trials was significantly earlier in the ‘reward at −2 s’ than the ‘reward at +2 s’ condition (P = 5.3 × 10⁻⁴, Wilcoxon signed rank test, n = 16, Fig. 2C). Similarly, the start and end times of the licking bout in probe trials were also earlier in the ‘reward at −2 s’ condition (P <0.01, Wilcoxon signed rank test, Fig. 2D, E). The peak width was not significantly different between the two conditions (P = 0.12, Wilcoxon signed rank test, Fig. 2F). The Weber fraction appeared to be larger in the ‘reward at −2 s’ than the ‘reward at +2 s’ condition, although the difference did not reach statistical significance (P = 0.07, Wilcoxon signed rank test, n = 16, Fig. 2G). Thus, the results suggest that time estimation is dynamically influenced by a change of inter-reinforcement interval in the previous trial.

In the second set of experiments (Fig. S1), we examined the effects of decreasing and increasing the inter-reinforcement interval separately. In one session, half of the probe trials were preceded by an inter-reinforcement interval of 8 s (‘reward at −2 s’) and the other half were preceded by an inter-reinforcement interval of 10 s as usual (‘reward at 0 s’). We found that the peak time in the probe trials was significantly earlier in the ‘reward at −2 s’ than the ‘reward at 0 s’ condition (P = 0.03, Wilcoxon signed rank test, n = 7, Fig. S1A). In another session, the two types of trial history for probe trials were ‘reward at +2 s’ and ‘reward at 0 s’, respectively, again with all other inter-reinforcement intervals remaining at 10 s. As shown in Fig. S1F, the peak time in ‘reward at +2 s’ and ‘reward at 0 s’ conditions was not statistically different (P = 0.58, Wilcoxon signed rank test, n = 7). Thus, when the amount of interval change was 2 s, a decrease in the previous inter-reinforcement interval was more effective than an increase at influencing time estimation in the probe trial.

A third group of mice (n = 19) was used for the third set of experiments. In one session, the two types of trial history for probe trials were ‘reward at −2.5 s’ (the reward in the previous trial was delivered 2.5 s earlier) and ‘reward at −1 s’ (the reward in the previous trial was delivered 1 s earlier), respectively, with all other inter-reinforcement intervals remaining at 10 s (Fig. 3A, B). We found that the peak time in probe trials was significantly earlier in the ‘reward at −2.5 s’ than the ‘reward at −1 s’ condition (P = 0.0025, Wilcoxon signed rank test, n = 19, Fig. 3C). The start time of the licking bout in the probe trial was significantly earlier in the ‘reward at −2.5 s’ condition (P = 0.0062, Wilcoxon signed rank test, Fig. 3D), although the end time was not significantly different between the two conditions (P = 0.3, Wilcoxon signed rank test, n = 19, Fig. 3E). The peak width was not significantly different between the two conditions (P = 1, Wilcoxon signed rank test, Fig. 3F). The Weber fraction appeared to be larger in the ‘reward at −2.5 s’ condition (P = 0.064, Wilcoxon signed rank test, n = 19, Fig. 3G). In another session, in which the two types of trial history for probe trials were ‘reward at −1 s’ and ‘reward at 0 s’, respectively, we found that the peak time for the ‘reward at −1 s’ condition was significantly earlier than that for the ‘reward at 0 s’ condition (P = 0.0011, Wilcoxon signed rank test, n = 19, Fig. S2). Together with the finding that peak time was earlier in the ‘reward at −2.5 s’ than the ‘reward at −1 s’ condition, the data suggest that the peak time is shifted earlier when the inter-reinforcement interval in the previous trial becomes shorter.

Optogenetic Inactivation of a Subregion of M2 at ALM Alters Timing of Licking Behavior

As the part of M2 at the ALM not only plays an important role in licking action [45, 46] but also encodes information about elapsed time in a licking-based timing task [8], we wondered whether ALM activity in the previous trial influences the timing of anticipatory licking in the current trial. To manipulate ALM activity, we used a total of 9 mice, including VGAT-ChR2 mice (n = 3), VGAT-Cre mice (n = 3, injected with AAV-FLEX-ChrimsonR), and C57BL/6 mice (n = 3, injected with AAV-CaMKIIα-GtACR2) (Fig. S3 and Fig. 4). After the mice were well trained with a 10-s fixed-interval licking task, non-rewarded probe trials (10% of all trials) were inserted to evaluate the performance of time estimation. For half of the probe trials, 2 s of constant laser stimulation was applied to inhibit the ALM (AP 2.46 mm, ML ±1.8 mm, with the ML coordinate ranging from 1.5 to 2 mm) before the time of reward delivery on the trial preceding the probe trial (Fig. 4A, B). To analyze the effect of optogenetic inhibition on timing behavior, we aligned time zero to the time of reward delivery preceding the probe trial, which was also the time of laser offset. We found that optogenetic inhibition of the ALM significantly reduced the lick rate of mice (P = 0.0039, Wilcoxon signed rank test, n = 9, Fig. 4C), consistent with previous reports that the subregion of M2 at the ALM causally contributes to licking behavior [45, 46]. Laser offset may cause a rebound in lick rate, but optogenetic inhibition did not significantly affect the interval of reward (i.e., interval between the time of first lick after reward delivery and the time of previous reward delivery) (P = 0.65, Wilcoxon signed rank test, n = 9, Fig. 4D), because the laser was turned off at the time of reward delivery. Interestingly, however, inactivation of the ALM before reward delivery induced the peak to occur significantly earlier in probe trials (P = 0.0039, Wilcoxon signed rank test, n = 9, Fig. 4E).
Optogenetic inhibition also caused an earlier shift in end time (P = 0.0078, Wilcoxon signed rank test, n = 9), without significantly affecting the start time (Fig. 4F, G). The peak width was not significantly different between laser-off and laser-on conditions (P = 0.73, Wilcoxon signed rank test, Fig. 4H). The Weber fraction in laser-on condition was significantly larger than that in laser-off condition (P = 0.039, Wilcoxon signed rank test, n = 9, Fig. 4I). Thus, inactivating ALM before reward delivery caused an earlier shift in peak time, similar to that induced by a shortening of inter-reinforcement interval preceding the probe trial (Figs. 2 and 3).
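The paired comparisons above rely on the Wilcoxon signed rank test, and with n = 9 mice the test has a coarse resolution. The following is a minimal pure-Python sketch of the exact two-sided form of the test, assuming no zero or tied differences. Whether the exact or the normal-approximation form was used in this study is not stated here, the function name is hypothetical, and the peak times in the example are made-up values rather than data from the study.

```python
def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed rank test for paired samples.

    Assumes no ties among the absolute differences; zero differences are dropped.
    Returns (W, p), where W is the smaller of the positive and negative rank sums.
    """
    d = [b - a for a, b in zip(x, y) if b != a]
    n = len(d)
    # Rank the absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = n * (n + 1) // 2
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w = min(w_plus, total - w_plus)
    # Null distribution: count[k] = number of sign assignments whose positive rank sum is k.
    count = [0] * (total + 1)
    count[0] = 1
    for r in range(1, n + 1):
        for k in range(total, r - 1, -1):
            count[k] += count[k - r]
    p = min(1.0, 2.0 * sum(count[: w + 1]) / 2 ** n)
    return w, p


# Hypothetical peak times (s) for 9 mice; every mouse shifts earlier with the laser on.
laser_off = [9.8, 10.1, 9.5, 10.4, 9.9, 10.2, 9.7, 10.0, 10.3]
laser_on = [9.7, 9.9, 9.2, 10.0, 9.4, 9.6, 9.0, 9.2, 9.4]
w_stat, p_value = wilcoxon_signed_rank_exact(laser_off, laser_on)
```

With nine pairs that all shift in the same direction, the exact two-sided P value is 2/2⁹ ≈ 0.0039, which is the smallest value this test can return at that sample size.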

Fig. 4

Optogenetic inactivation of ALM before reward delivery shifts the peak time earlier in probe trials. A Upper, schematic of the behavioral task and laser stimulation. Middle and lower, lick rasters and lick PSTHs in probe trials and 5 s of the preceding reinforced trials for an example mouse. Red, laser-on trials; black, laser-off trials; shading, duration of laser stimulation. B Lick PSTHs averaged across a population of mice (n = 9). C–I Comparison of lick rate during [−2 0] s, interval of reward (interval between the time of first lick after reward delivery and the time of previous reward delivery), peak time, start time, end time, peak width, or Weber fraction between laser-off and laser-on trials. Green, VGAT-ChR2 mice (n = 3); purple, VGAT-Cre mice (n = 3) injected with AAV-FLEX-ChrimsonR; gray, C57BL/6 mice (n = 3) injected with AAV-CaMKIIα-GtACR2. *P <0.05, **P <0.01, n = 9 mice, Wilcoxon signed rank test. Data represent the mean ± SEM.

For control C57BL/6 mice in which control virus (AAV-hSyn-eGFP) was bilaterally injected in the ALM (Fig. S3C), laser stimulation for 2 s before the time of reward delivery on the trial preceding the probe trial did not affect peak time (Fig. S4), indicating that the change in time estimation induced by ALM inactivation (Fig. 4) was not due to laser stimulation itself.

M2 occupies a large area along the rostral-caudal axis [47]. In addition to the subregion of M2 at the ALM (AP 2.46 mm, ML ±1.8 mm), we also examined a different subregion of M2 at a central and medial location (AP 1.34 mm, ML ±0.75 mm) (Fig. S3D). We found that inactivation of this central and medial subregion of M2 for 2 s before the time of reward delivery did not affect peak time in probe trials (Fig. S5). This result suggests that neuronal activity in the ALM, but not in the central and medial subregion of M2, is involved in short-term memory of time interval.

Optogenetic Inactivation of mPFC does not Affect Timing Behavior

Previous studies showed that activities related to temporal processing and timing behavior are distributed in the brain [3, 20]. Among frontal cortical areas, neurons in the mPFC also convey information about elapsed time [48, 49]. We thus further determined whether perturbation of mPFC activity before reward delivery influences interval timing in probe trials. To manipulate mPFC activity, we injected AAV-hSyn-Jaws bilaterally into the mPFC of C57BL/6 mice (Fig. S3E). Electrophysiological recordings confirmed that activating Jaws suppressed the firing rates of cortical neurons (Fig. S3G). We found that inactivation of mPFC for 2 s before the time of reward delivery on the trial preceding the probe trial did not affect peak time (Fig. S6). Together, the results suggest that neuronal activity in the ALM but not in the mPFC is important for short-term memory of time interval.

ALM Activity in Probe Trials Preceded by Long and Short Inter-reinforcement Intervals Exhibits Task-engagement-dependent Temporal Scaling

Because time estimation in probe trials was dynamically influenced by a decrease of inter-reinforcement interval in the previous trial, we next considered how ALM activity in probe trials is adaptive to the preceding inter-reinforcement interval. To address this issue, we recorded from ALM neurons in mice that were trained with a 10-s fixed-interval licking task (Fig. S7A). During the recording session, while most inter-reinforcement intervals were 10 s, those preceding the probe trials were either 8 s or 12 s, referred to as ‘reward at −2 s’ or ‘reward at +2 s’ (Fig. 5A). To analyze neural activity related to time estimation, we only included those neurons that exhibited significant temporal modulation of firing rates during the first 10 s of probe trials, as defined by significant differences in firing rates over 5 non-overlapping periods (P <0.05, one-way ANOVA).

Fig. 5

Temporal scaling of ALM responses in probe trials preceded by long and short inter-reinforcement intervals. A Schematic of reward delivery for the reinforced trial preceding a probe trial in the ‘reward at −2 s’ or ‘reward at +2 s’ condition. B Left, Z-scored responses in the ‘reward at +2 s’ condition (each row is one unit). The units are ordered by time to peak firing rate. Right, Z-scored responses in the ‘reward at −2 s’ condition. The units are ordered by the unit index at left. C Left, lick PSTHs of an example session during probe trials in the ‘reward at +2 s’ (black) and ‘reward at −2 s’ (red) conditions. T1 (T2) indicates the peak time for the ‘reward at +2 s’ (‘reward at −2 s’) condition. Right, peak time in each session. D Spike rasters and PSTHs for an example ALM neuron in the ‘reward at +2 s’ (black) and ‘reward at −2 s’ (red) conditions. E The best scaling factor for the neuron in D is the one that yields the minimum MSE between the short response profile and a scaled version of the long response profile. F Spike rasters and PSTHs for another example ALM neuron. G The best scaling factor for the neuron in F is the one that yields the minimum MSE. H Distribution of best scaling factors for ALM neurons (n = 175) in the early part of probe trials. Red, best scaling factors were within ±10% of T2/T1. I Distribution of best scaling factors in the later part of probe trials. J Number of neurons with best scaling factor within (or outside) ±10% of T2/T1 during the early vs the later part of probe trials (χ²(1) = 41.14, P = 1.42×10⁻¹⁰). In C, D and F, data represent the mean ± SEM.

For the responses during the first 10 s of probe trials, when we sorted the neurons according to the time of maximum firing rate in the ‘reward at +2 s’ condition and used the sorted index to plot response profiles for the ‘reward at −2 s’ condition (Fig. 5B), we found that the latter appeared to show sequential firing, suggesting that ALM responses in the latter condition are a temporally scaled version of those in the former. We next quantified the degree of temporal scaling for the response profiles of single ALM neurons in the probe trials. For the licks in each probe trial, we aligned time zero to the time of the first lick after reward delivery. For each of the two types of trial history, we fitted the lick PSTH with a Gaussian function, obtaining two peak times (T1 and T2 for ‘reward at +2 s’ and ‘reward at −2 s’, respectively, Fig. 5C). For each ALM neuron, we extracted a response profile within the period of [0 T1] for probe trials in the ‘reward at +2 s’ condition (black curves in Fig. 5D and F) and a response profile within the period of [0 T2] for those in the ‘reward at −2 s’ condition (red curves in Fig. 5D, F). We then calculated a best scaling factor that produced a minimum difference between response profiles in the two types of probe trial (Fig. 5E, G). For the population of ALM neurons (n = 175), the distribution of best scaling factor showed a peak at ~0.96 (Fig. 5H), close to the ratio between T2 and T1 (T2/T1 = 0.90 ± 0.01, mean ± SEM, n = 18 sessions; Fig. 5C, right). For 33.71% of ALM neurons (59 out of 175, red bars in Figs. 5H and S7B), the best scaling factor was within ±10% of the mean T2/T1, suggesting that their response profiles in probe trials preceded by long and short inter-reinforcement intervals exhibit temporal scaling.
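The search for a best scaling factor can be sketched as a grid search over candidate factors. The example below is a minimal pure-Python illustration, not the analysis code of this study: it assumes that the long response profile is rescaled in time by linear interpolation, that the MSE is computed only over the times at which the rescaled profile is defined, and that candidate factors range from 0.5 to 1.5 in steps of 0.05. The function names, the grid, and the synthetic Gaussian profiles are all hypothetical.

```python
import math


def interp_linear(t, times, values):
    """Linearly interpolate `values` sampled at increasing `times` (clamped at both ends)."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    lo, hi = 0, len(times) - 1
    while hi - lo > 1:  # binary search for the two bracketing samples
        mid = (lo + hi) // 2
        if times[mid] <= t:
            lo = mid
        else:
            hi = mid
    frac = (t - times[lo]) / (times[hi] - times[lo])
    return values[lo] + frac * (values[hi] - values[lo])


def best_scaling_factor(t_long, r_long, t_short, r_short, factors):
    """Return (factor, mse) minimizing the MSE between the short profile and
    the long profile rescaled in time by `factor`."""
    best = None
    for s in factors:
        # The rescaled long profile at time t is r_long(t / s); skip times beyond its end.
        sq_err = [(interp_linear(t / s, t_long, r_long) - r) ** 2
                  for t, r in zip(t_short, r_short) if t / s <= t_long[-1]]
        if not sq_err:
            continue
        mse = sum(sq_err) / len(sq_err)
        if best is None or mse < best[1]:
            best = (s, mse)
    return best


def bump(t):
    """Synthetic response profile: a Gaussian bump centered at 5 s (not real data)."""
    return math.exp(-0.5 * (t - 5.0) ** 2)


t_long = [0.1 * i for i in range(101)]           # 0 to 10 s
r_long = [bump(t) for t in t_long]
t_short = [0.1 * i for i in range(81)]           # 0 to 8 s
r_short = [bump(t / 0.8) for t in t_short]       # same profile compressed to 80%
factors = [round(0.5 + 0.05 * i, 2) for i in range(21)]
factor, mse = best_scaling_factor(t_long, r_long, t_short, r_short, factors)
```

In this synthetic case the search recovers the compression factor that was used to build the short profile. For recorded neurons the minimum is shallower, which is why the text compares each best factor against a ±10% band around T2/T1.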

Although the duration of a probe trial was 3–4 times longer than the inter-reinforcement interval of reinforced trials, mice exhibited anticipatory licking only in the early part but not in the later part of a probe trial (Fig. S8A, C), suggesting that mice were actively engaged in timing in the early part but less so in the later part. Using this less engaged period as a control, we applied similar analysis to ALM responses in the later part of probe trials. For these responses, when we sorted the neurons according to the time of maximum firing rate in the ‘reward at +2 s’ condition (Fig. S8B) and used the sorted index to plot response profiles of the same neurons in the ‘reward at −2 s’ condition, we found that the latter did not show sequential firing (Fig. S8D), suggesting that firing patterns in the two conditions are not temporally scaled. We also performed scaling factor analysis by using response profiles in the last T1 s of the ‘reward at +2 s’ condition and those in the last T2 s of the ‘reward at −2 s’ condition. For these responses in the later part of probe trials, the distribution of best scaling factor did not show a peak near T2/T1 (Fig. 5I), and the percentage of neurons with best scaling factor within ±10% of the mean T2/T1 was significantly lower than that in the early part of probe trials (χ²(1) = 41.14, P = 1.42×10⁻¹⁰, Fig. 5J). These results suggest that, during probe trials preceded by long and short inter-reinforcement intervals, the temporal scaling of ALM response profiles is specific to the period when mice are actively engaged in timing behavior.
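The comparison of proportions between the early and later parts of probe trials can be illustrated with a 2×2 Pearson chi-square test. The sketch below is an assumption-laden illustration rather than the authors' procedure: it uses no continuity correction, and while the early-part count (59 of 175) is taken from the text, the later-part count of 11 is a hypothetical value chosen for the example because the text does not report it. The function name is also hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability with 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


n_neurons = 175
early_within = 59   # reported in the text: 59 of 175 neurons within the band
later_within = 11   # ASSUMPTION for illustration only; not reported in the text
chi2, p = chi_square_2x2(early_within, n_neurons - early_within,
                         later_within, n_neurons - later_within)
```

Under this assumed later-part count the statistic comes out close to the reported χ²(1) = 41.14, but the actual count should be read from Fig. 5J.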

ALM Population Activity Encodes Elapsed Time

We further used a multiclass SVM [7, 42] to perform trial-by-trial decoding of time from the population activity of ALM neurons. We first performed decoding using responses in the early part of probe trials (Fig. 6A). For each probe trial from the ‘reward at +2 s’ (‘reward at −2 s’) condition, we took the spike train in the first T1 (T2) s and binned the spikes into 50 time bins. The SVM decoder was trained with the population firing rates from multiple trials of training data in the ‘reward at +2 s’ condition, and then was presented with population activity from a trial of testing data in the same condition (‘reward at +2 s’) or in a different condition (‘reward at −2 s’). The SVM output was represented in 50 readout units, one for each time bin, forming a classification matrix (Fig. 6A, upper). For each actual bin in the classification matrix, the predicted bin was the one corresponding to the readout unit with the maximum value (Fig. 6A, lower). For the classification matrix produced by using testing data in the ‘reward at +2 s’ condition (i.e., training and testing data were from the same condition, Fig. 6A, upper left), the large values were mostly along the diagonal line of the matrix, suggesting that the predicted bins largely match the actual bins. We also obtained similar classification results when we used testing data in the ‘reward at −2 s’ condition (i.e., training and testing data were from different conditions, Fig. 6A, upper right), suggesting that ALM dynamics in one type of trial-history condition can predict the elapsed time in another type of trial-history condition.
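Two steps of this decoding pipeline lend themselves to a short sketch: binning each spike train into a fixed number of bins, which puts the T1 and T2 windows on a common normalized time axis, and reading out the predicted bin as the readout unit with the maximum value. The pure-Python example below is only an illustration under stated assumptions. It omits SVM training entirely, the function names are hypothetical, and the window lengths (10 s and 9 s) and spike times are made-up values.

```python
def bin_firing_rates(spike_times, window_s, n_bins=50):
    """Bin spikes in [0, window_s) into n_bins equal bins and return rates in Hz."""
    counts = [0] * n_bins
    for t in spike_times:
        if 0.0 <= t < window_s:
            # Guard against rounding that could push a late spike into bin n_bins.
            index = min(int(t * n_bins / window_s), n_bins - 1)
            counts[index] += 1
    bin_width = window_s / n_bins
    return [c / bin_width for c in counts]


def predicted_bins(classification_matrix):
    """For each actual bin (row), return the index of the readout unit with the maximum value."""
    return [max(range(len(row)), key=row.__getitem__) for row in classification_matrix]


# Hypothetical windows: T1 = 10 s and T2 = 9 s, with spikes at the same relative times.
t1, t2 = 10.0, 9.0
spikes_long = [0.1, 0.24, 0.26, 9.9]
spikes_short = [0.09, 0.216, 0.234, 8.91]
rates_long = bin_firing_rates(spikes_long, t1)
rates_short = bin_firing_rates(spikes_short, t2)

# Toy 3-bin classification matrix (rows: actual bin, columns: readout units).
toy_matrix = [[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]]
toy_prediction = predicted_bins(toy_matrix)
```

Because both windows are divided into the same number of bins, spikes at the same fraction of T1 and T2 fall into the same bin index, which is what allows a decoder trained on one condition to be tested on the other.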

Fig. 6

Neural dynamics in the ALM can encode elapsed time. A SVM decoding using population activity in the early part of probe trials. Left, training and testing data are both from ‘reward at +2 s’. Right, training and testing data from ‘reward at +2 s’ and ‘reward at −2 s’. Upper, classification matrix averaged over all repeats of SVM implementation. For each time bin, the readout values are normalized across readout units. Lower, Predicted vs actual time bins for all repeats of SVM implementation. B SVM decoding using population activity in the later part of probe trials, similar to A. C Prediction accuracy for SVM decoding using ALM population activity in the early and later parts of probe trials. Each data point is the result of one repeat of SVM implementation. D Prediction accuracy for SVM decoding using the activity of scaled or unscaled ALM neurons in the early part of probe trials. Each data point is the result of SVM decoding using one set of resampled data. Scaled neurons: best scaling factor is within ±10% of the mean T2/T1. Unscaled neurons: best scaling factor is outside ±10% of the mean T2/T1. **P <0.01, ***P <0.001, one-way ANOVA followed by Tukey-Kramer multiple comparisons test. Data represent the mean ± SEM.

We next applied decoding analysis using responses in the later part of probe trials (Fig. 6B). The SVM decoder was trained with the training data using population firing rates in the last T1 s of the ‘reward at +2 s’ condition, and then was presented with population activity from a trial of testing data in the same condition (Fig. 6B, left) or in the ‘reward at −2 s’ condition (Fig. 6B, right). We found that the classification matrix did not show large values along the diagonal line in either case, suggesting that information about elapsed time in the later part of probe trials is not encoded by the population activity of ALM neurons.

To quantify decoding performance for each repeat of SVM implementation, we computed Pearson’s correlation coefficients between the predicted and the actual time bin (Fig. 6A lower and Fig. 6B lower). For decoding using responses in the early part of probe trials, the SVM decoder yielded high accuracy regardless of whether the training and testing data were from the same or different conditions (Fig. 6C). For decoding using responses in the later part of probe trials, however, the correlation coefficient was not significantly different from zero (Fig. 6C). Thus, when mice were actively engaged in timing behavior that was adaptive to the previous inter-reinforcement interval, the population activity of ALM neurons reliably encoded elapsed time and the representation was scalable.
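The accuracy measure described above can be sketched as a Pearson correlation between predicted and actual bin indices. The pure-Python example below is a minimal illustration, with a hypothetical function name and a made-up prediction in which the decoder is off by one bin for most of the 50 bins. It is not the analysis code of this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


actual = list(range(50))
# Made-up decoder output: neighboring bins are swapped, clamped to the range 0..49.
predicted = [min(49, max(0, b + (1 if b % 2 else -1))) for b in actual]
r_example = pearson_r(actual, predicted)
```

A correlation-based score of this kind rewards predictions that land near the correct bin, so small errors along the diagonal of the classification matrix still yield a value close to 1.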

For the early part of probe trials, we further compared the capability of encoding elapsed time between scaled and unscaled ALM neurons, whose best scaling factors were within and outside ±10% of the mean T2/T1. During each repeat of SVM implementation, we randomly sampled 50 units from the scaled or the unscaled neurons. We repeated the SVM decoding 10 times, each with a different set of 50 randomly-sampled units. As shown in Fig. 6D, for SVM decoding using training and testing data from different conditions (‘Train +2 s Test −2 s’) as well as that using data from the same condition (‘Train +2 s Test +2 s’), the correlation coefficient between the predicted and the actual time bin was significantly higher for scaled neurons than for unscaled neurons (P <0.001, one-way ANOVA followed by Tukey-Kramer multiple comparisons test). This result suggests that, compared with unscaled neurons, those neurons with temporally scalable firing patterns can encode elapsed time more reliably.
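The grouping and resampling steps can be sketched as follows. This is a minimal illustration with hypothetical function names. It assumes that "within ±10% of the mean T2/T1" means a band from 0.9 to 1.1 times the ratio, and that the 50 units in each repeat are drawn without replacement, neither of which is spelled out in the text. The best scaling factors in the example are made-up values.

```python
import random


def split_by_scaling(best_factors, ratio, tolerance=0.10):
    """Split unit indices into scaled and unscaled groups.

    A unit counts as scaled if its best scaling factor lies within
    ratio * (1 - tolerance) and ratio * (1 + tolerance), an assumed reading of the criterion.
    """
    low, high = ratio * (1.0 - tolerance), ratio * (1.0 + tolerance)
    scaled = [i for i, s in enumerate(best_factors) if low <= s <= high]
    unscaled = [i for i, s in enumerate(best_factors) if not (low <= s <= high)]
    return scaled, unscaled


def resample_units(unit_ids, n_units=50, n_repeats=10, seed=0):
    """Draw n_repeats random subsets of n_units units each, without replacement."""
    rng = random.Random(seed)
    return [rng.sample(unit_ids, n_units) for _ in range(n_repeats)]


# Made-up best scaling factors for five units, with the reported mean ratio T2/T1 = 0.90.
example_factors = [0.70, 0.85, 0.96, 1.05, 1.30]
scaled_ids, unscaled_ids = split_by_scaling(example_factors, ratio=0.90)

# The text reports 59 scaled neurons, enough to draw 50 without replacement.
draws = resample_units(list(range(59)))
```

Sampling the same number of units from each group keeps the population size equal across the scaled and unscaled decoders, so that differences in accuracy are not driven by the number of neurons.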

Discussion

In this study, we examined the influence of trial history on interval timing behavior. Using a licking-based peak-interval timing task in mice, we found that interval timing adapted rapidly to a decrease of inter-reinforcement interval in the previous trial. Bilateral inactivation of the subregion of M2 at ALM, but not the central and medial subregion of M2 or the mPFC, for a short period of time before reward delivery shifted the peak of anticipatory licks to an earlier time in the next trial, suggesting that ALM activity is essential for the short-term memory of time interval. By analyzing ALM spiking responses in probe trials preceded by short and long inter-reinforcement intervals, we demonstrated that ALM neurons showed task-engagement-dependent temporal scaling in their response profiles and could encode elapsed time. These results reveal that ALM activity not only contributes to the short-term memory of time interval but also reflects the influence of recent experience during time estimation.

Previous studies have shown that estimation of time interval can be influenced by both non-temporal and temporal contexts [50, 51]. For instance, the sense of time can be altered by attention [52], emotional experience [53], anxiety [54], or non-temporal stimulus size [55]. In terms of temporal context, a widely reported phenomenon in the time reproduction task of human subjects is the central tendency effect, in which prior temporal context influences the interval estimate in such a way that a shorter interval is overestimated and a longer interval is underestimated [56,57,58,59]. Similar to human subjects, rodents in timing tasks are also sensitive to the distribution of temporal intervals and can adjust their waiting time accordingly [13]. Using a fixed-interval schedule, previous studies have shown that the waiting time of rodents and pigeons can rapidly follow a step-function change in inter-food interval within a session [15, 60]. Our findings add to the literature showing that interval timing is adaptive to the temporal context.

The peak-interval timing procedure often requires many sessions of training to form a criterion interval in the reference memory [43, 44]. In our case, mice usually underwent 30−50 training sessions before being tested in sessions containing probe trials preceded by a shorter or longer inter-reinforcement interval. We found that a decrease in inter-reinforcement interval caused an earlier shift in the peak time in probe trials, suggesting that the interval estimation established by long-term temporal memory can be rapidly modified by short-term experience of an interval decrease. Compared to a decrease of inter-reinforcement interval by 2 s, an increase in inter-reinforcement interval by 2 s was not immediately effective at influencing time estimation in the probe trial. For the ‘reward at +2 s’ condition in which the reward delivery preceding the probe trial was postponed by 2 s, the reinforced trial might effectively become a probe trial, i.e., time integration may be terminated after the target interval passes without reinforcement and timing is reset with a reward delivery. In addition to the asymmetric effect of ‘reward at −2 s’ and ‘reward at +2 s’ on peak time, we also found that the start time was more adaptive than the end time (Figs. 3, S1, and S2). These effects may reflect the scalar property of interval timing behavior, in which the variability of timing is proportional to the interval being timed [9, 10]. Interestingly, an asymmetric effect has also been reported in the sequential effect in the variable foreperiod task [61, 62], and it has been suggested that the sequential effect involves arousal modulation [62, 63]. For the decrease of inter-reinforcement interval in the ‘reward at −2 s’ condition in our study, there might be a ‘surprise effect’ influencing the arousal level when the reward occurred earlier than expected. 
Because dopamine neurons encode the reward prediction error [64] and dopamine neuron activity can control the judgment of time [65], it is of interest for future studies to determine whether dopamine neurons are involved in the influence of trial history on interval timing. Our finding that short-term experience of an interval decrease and an interval increase had asymmetric effects may impose constraints on models of interval timing [1, 3]. Although the peak time in the probe trial was rapidly affected by a decrease in inter-reinforcement interval, the peak width was not significantly different between different conditions of preceding inter-reinforcement interval. As peak time and peak width reflect timing accuracy and timing precision, respectively [44, 66], our results suggest that, while timing accuracy is rapidly adaptive to short-term experience, timing precision established by long-term training is relatively stable.

Previous studies have shown that neuronal activity in sensory cortex [67, 68], prefrontal cortex [6, 41, 69,70,71], thalamus [6, 66, 72], tectum [73], and basal ganglia [11, 65, 74,75,76] plays an important role in timing behavior. Although many behavioral studies have examined the effect of temporal context on interval timing, few studies have investigated the underlying neural substrate. A recent study found that, in monkeys trained to reproduce time intervals drawn from short or long interval distributions, the speed of neural dynamics in dorsomedial frontal cortex is adjusted according to the mean of the interval distribution [77]. A previous study on sensory timing demonstrated that intervals in the range of hundreds of milliseconds can be encoded as specific states in a model neural network that uses short-term synaptic plasticity to keep track of the memory trace of recent stimulus history [78]. The model predicts that discrimination of a temporal target is impaired when a distractor precedes the target at random intervals, and the prediction has been confirmed by human psychophysical studies [78]. Using a waiting task, a recent study found the neural correlates of history-dependent waiting time bias in both M2 and the mPFC [79]. In our study, we used a peak-interval timing task and found that interval timing in the range of seconds was influenced by a shortening of the previous inter-reinforcement interval. Our finding is consistent with the trial history influence reported in value-based decision and perceptual decision tasks [17, 18, 35, 80,81,82,83,84,85], in which the memory of the most recent trial shows the greatest effect [19, 32, 35]. We also found that inactivation of the subregion of M2 at the ALM for a short period before reward delivery caused an underestimation of elapsed time in probe trials.
This suggests that the time period while the ALM is inactivated cannot be stored in short-term memory, likely resulting in a shortening of the perceived inter-reinforcement interval. Our results are also consistent with the important role of rodent M2 and monkey premotor cortex in memory-guided perceptual decision-making [30, 31, 86,87,88,89].

The subregion of M2 at the ALM is a region necessary for motor planning of licking action [45, 46]. We found that optogenetic inactivation of the ALM for 2 s before reward delivery resulted in a reduction of lick rate, which was associated with an earlier shift in peak time in the following probe trial (Fig. 4), whereas optogenetic inactivation of the central and medial subregion of M2 or the mPFC had no effect on lick rate or peak time (Figs. S5 and S6). This raises an interesting possibility that mice may partly rely on licking action to estimate elapsed time in the licking-based interval timing task. Such a conjecture that motor action may be used to keep track of time is supported by several studies. For instance, in rodents performing an interval duration categorization task, the temporally structured behavioral sequences during stimulus presentation predict the temporal judgment in the choice period [90]. Using a treadmill-based timing task, a recent study showed that accurate timing of rats is associated with the stereotyped motor routine on the treadmill [91]. A wealth of evidence has shown that motor-related regions, such as the premotor cortex, basal ganglia, and cerebellum, are implicated in timing [3, 20, 51]. Our study adds to the literature supporting the hypothesis that sensorimotor experience plays an important role in the representation of time [51]. Our results also suggest that, in addition to the subregion of M2 at the ALM, other licking-related regions, such as the lateral superior colliculus and the ventrolateral striatum [11, 92,93,94], may also contribute to short-term memory of interval in the licking-based interval timing task. It is also of interest to determine at the circuit level [95] whether ALM projections to these regions are involved in short-term memory of time interval.

In animals performing a timing task containing both short and long intervals, temporal scaling of neural responses has been found in several brain regions, including prefrontal cortex, M2, striatum, and thalamus [6, 8, 41, 49, 77, 96]. The activity profiles in different intervals are similar when compressed or stretched, which may allow flexible temporal control of sensorimotor and cognitive behaviors [6]. In our study, a subpopulation of ALM neurons scaled their responses in probe trials in accordance with the preceding inter-reinforcement interval, and compared with the unscaled neurons, these scaled neurons better encoded elapsed time and the representation was scalable. This temporal scaling of neuronal responses was diminished when mice were not attentively engaged in time estimation. Our results suggest that ALM activity during the period of active time-estimation is influenced by recent trial history to enable adaptive timing behavior.