Cortical processing of flexible and context-dependent sensorimotor sequences

Xu, Duo; Dong, Mingyuan; Chen, Yuxi; Delgado, Angel M.; Hughes, Natasha C.; Zhang, Linghua; O’Connor, Daniel H.

doi:10.1038/s41586-022-04478-7

Article
Published: 09 March 2022

Volume 603, pages 464–469, (2022)

Abstract

The brain generates complex sequences of movements that can be flexibly configured based on behavioural context or real-time sensory feedback¹, but how this occurs is not fully understood. Here we developed a ‘sequence licking’ task in which mice directed their tongue to a target that moved through a series of locations. Mice could rapidly branch the sequence online based on tactile feedback. Closed-loop optogenetics and electrophysiology revealed that the tongue and jaw regions of the primary somatosensory (S1TJ) and motor (M1TJ) cortices² encoded and controlled tongue kinematics at the level of individual licks. By contrast, the tongue ‘premotor’ (anterolateral motor) cortex^{3,4,5,6,7,8,9,10} encoded latent variables including intended lick angle, sequence identity and progress towards the reward that marked successful sequence execution. Movement-nonspecific sequence branching signals occurred in the anterolateral motor cortex and M1TJ. Our results reveal a set of key cortical areas for flexible and context-informed sequence generation.

Main

The world presents itself to us as a series of sensations arising from our own actions, which in turn elicit further actions in an intricate sensorimotor loop. Orofacial sensorimotor control is essential for exploration, communication and survival, and is exquisitely orchestrated^{11,12,13,14}. To investigate the cortical control of complex orofacial movements, we trained head-fixed mice to use sequences of directed licks to advance a motorized port through seven consecutive positions, either from left to right or right to left, after an auditory cue (15 kHz for 0.1 s) that signalled the start of a trial (Fig. 1a, Supplementary Video 1). Each transition from one position to the next was driven in a closed loop by a single lick touching the port. Thus, if a lick missed the port, it would remain at the same position until the tongue eventually made contact. The port was no longer moveable after the mouse finished the seven positions and a water droplet was delivered as a reward after a short delay (0.25 s, or 0.5 s in two mice). The next trial then started with a sequence in the opposite direction after a random inter-trial interval (mean duration of 6 s).

We measured instantaneous tongue angle (θ), tongue length (L), vertical and lateral components of contact force (F_vert and F_lat), and contact duration during sequences (see Methods; Fig. 1b–d, Extended Data Fig. 1a–d). In addition to the continuous θ measurement, we use scalar angle value θ_shoot to denote the angle of the tongue shooting out in each lick (see Methods) and use capital Θ to represent unified tongue angles after the sign in right-to-left sequences is flipped to pool data from both sequence directions.

Mice modulated each lick to reach different target locations (Extended Data Fig. 1e, f). In addition to stereotypic licking kinematics, expert mice showed remarkable sequence execution speed, with the seven positions completed in about 1 s (Extended Data Fig. 1h). Mice performed the task in darkness with no visual cues to guide the licks. Control experiments (Methods) showed that mice did not rely on auditory (Extended Data Fig. 1i) or olfactory (Extended Data Fig. 1j) cues, but did require tactile feedback from the tongue (Extended Data Fig. 1k). Mice reached proficiency in standard sequences (Fig. 1e) after approximately 1,500 trials of training (Methods; Extended Data Fig. 1l–n).

To determine whether sequence generation was ‘ballistic’ or capable of flexible reconfiguration based on sensory feedback, we varied the task by introducing unexpected port transitions after mice learned standard sequences (Fig. 1f, Supplementary Video 2). On a randomly interleaved subset (one-third or one-quarter) of trials, when a mouse licked at the middle position, the port would backtrack two steps rather than continue to the anticipated position. Mice previously trained only with standard sequences (Methods; Extended Data Fig. 2a, b) learned to detect the change of port transition, branch out to the new position and finish the sequence (Fig. 1g, Extended Data Fig. 2c, d). On average, it took one to two missed licks before mice quickly relocated the port (Extended Data Fig. 2e). Head-fixed mice can thus learn to perform complex and flexible licking sequences guided by sensory feedback.

Optogenetic inhibition screen

To determine which brain regions contribute to the performance of our sequence licking task, and at which points during execution, we performed systematic optogenetic silencing⁶. In different sessions, bilateral inhibition was centred at each of five regions (Fig. 2a, Extended Data Fig. 3a): the anterolateral motor (ALM)¹⁵ cortex (also including part of the M1TJ cortex; ALM–M1TJ hereafter), the body region of the primary motor (M1B) cortex^{16,17}, the S1TJ cortex^{2,18}, the barrel field of the primary somatosensory (S1BF) cortex and the trunk subregion of the primary somatosensory cortex (S1Tr, including part of the posterior parietal cortex). For each region, inhibition was triggered with equal probability (10%) at sequence initiation, mid-sequence or at the start of water consumption (Extended Data Fig. 3b). Stimulation at mid-sequence and at consumption was triggered in closed loop by the middle touch and by the first touch after water delivery, respectively.

**Fig. 2: Optogenetic inhibition and single-unit activity survey across cortical regions during sequence execution.**

Somatosensory inputs both provide information about external objects and enable proprioceptive sensing of the position of the body¹⁹ for motor control^{20,21}. Missing sensory feedback can make effortless manipulations surprisingly difficult despite unchanged motor capability²². Normally executed sequences were stereotyped across trials. Therefore, in a given time bin during the sequence, across-trial variability in lick angle (quantified by the standard deviation of Θ_shoot, or SD(Θ_shoot)) was relatively low. When the S1TJ cortex was inhibited, however, sequences became disorganized and no longer stereotyped (see examples in Extended Data Fig. 3g, Supplementary Video 3). As a result, SD(Θ_shoot) increased significantly compared with no inhibition (Fig. 2b, left). Despite disorganized targeting, the ability to direct licks to the sides (that is, |Θ_shoot|) was uncompromised (Fig. 2b, middle). Inhibition of the S1TJ cortex also did not shorten the length of licks (Fig. 2b, right), although slight but statistically significant increases were observed. Full quantifications of data summarized in Fig. 2b appear in Extended Data Fig. 3j. Together, these data suggest that inhibition of the S1TJ cortex left the core motor capabilities that are required for tongue protrusions and licking intact, but corrupted their proper targeting, possibly due to missing sensory feedback.

By contrast, when inhibiting the ALM–M1TJ cortex, mice had reduced ability to direct licks to the sides (Fig. 2b, middle; see example in Extended Data Fig. 3h), and showed decreased length of lick (Fig. 2b, right). Inhibition of the M1B cortex caused only minor increases in lick angle variability with no decrease in angle deviation or lick length. Inhibition of the S1BF or S1Tr cortex changed no aspects of lick control.

The ALM cortex has been shown to be important in motor preparation of directed single licks to obtain water reward^{10,15,23}. Here we found that inhibiting the ALM–M1TJ cortex at sequence initiation strongly suppressed production of licking sequences (Fig. 2c, left, Supplementary Video 4). In four of seven mice, licks were largely absent (Extended Data Fig. 3k, top panel under ALM–M1TJ). Inhibition of the S1TJ cortex caused more moderate suppression, with no obvious change from inhibiting other regions. When applied at mid-sequence, inhibition of the ALM–M1TJ cortex also suppressed the production of licks, although less strongly. Inhibition of other regions at mid-sequence showed little or no effect. Full quantifications appear in Extended Data Fig. 3k (top and middle rows).

When a sensorimotor sequence reaches its normal stopping point, one might intuitively expect movement to cease in a passive rather than active manner. To our surprise, when inhibiting the S1TJ or M1B cortex at water consumption, mice were impaired at stopping ongoing sequences (Fig. 2c, right, Extended Data Fig. 3k, bottom row; see example in Supplementary Video 5). This prolonged licking was not due to additional attempts to reach the port for water, as mice continuously made successful contacts, nor did we inhibit the water-responsive gustatory cortex^{24,25}.

To test the possibility that inhibition of the S1TJ or M1B cortex caused persistent lick bouts due simply to spread of inhibition to other regions, we repeated the above experiments with half the illumination power (2 mW) (Extended Data Fig. 3l, m). The effects of ALM–M1TJ inhibition on sequence initiation, tongue length and angle control, and of S1TJ inhibition on angle control, remained largely consistent with, although weaker than, our previous results using higher power (4 mW). At consumption, inhibition of the S1TJ or M1B cortex resulted in similarly strong deficits in terminating ongoing sequences (Extended Data Fig. 3m, bottom row). Therefore, the observed deficit in sequence termination was not due to spread of inhibition. Rather, our results indicate that sequence termination is an active process²⁶ mediated collectively by the S1TJ and M1B cortices.

Sequence tiling of single-unit responses

We used silicon probes to record from multiple brain regions from both hemispheres (Fig. 2d) during the task, obtaining 1,537 single units and 303 multiple units (Methods; Extended Data Fig. 4a–e) from 57 recording sessions. Perievent time histograms (PETHs) of single-unit spiking (Fig. 2e–h; example neurons are shown in Extended Data Fig. 4f–h) exhibited a wide variety of patterns before, during and after sequence execution. Spiking that gave rise to the PETHs was consistent across trials (Methods; Extended Data Fig. 4i). To present these PETHs in a way that reflects the main themes observed in the population activity, we pooled neurons from all brain regions and clustered their PETHs using non-negative matrix factorization (Methods).

We observed that single-neuron responses tile the sequence progression (Fig. 2e–h, Extended Data Fig. 4k), with more ALM neurons tuned to sequence initiation (Extended Data Fig. 4l). The S1TJ and M1TJ cortices contained more neurons (for example, cluster 7; Extended Data Fig. 4k, m) that showed greater modulation by individual licks (Extended Data Fig. 4n, o). Patterns of activity arising from these single-unit responses might encode behavioural variables that are important for sequence control.

Hierarchical population coding

Our sequence licking task requires the brain to encode instantaneous tongue length (L) and angle (θ), presumably both for motor output and sensory feedback. Encoding of velocity (L′) could also be used to indirectly control tongue position. Sequence identity (I) and relative sequence time (τ) can be used to represent the sequence-level organization of individual licks beyond instantaneous control. The variable τ can also serve as a proxy for sequence progress or ‘distance to goal’. The five behavioural variables, L, L′, θ, I and τ, were measured (or derived) at 2.5-ms resolution (Fig. 3a). Conveniently, any pair of these variables was uncorrelated (Extended Data Fig. 5b). Therefore, being able to encode one is of little or no help with encoding any other.

**Fig. 3: Neuronal populations code with increasing levels of abstraction across cortical areas.**

For each recording session, we performed separate linear regressions (Methods) to obtain unit weights (and a constant) for each of the five behavioural variables, such that a weighted sum of instantaneous spike rates from simultaneously recorded units (32 ± 13 units; mean ± s.d.) plus the constant best predicted the value of a behavioural variable. We used cross-validated R² values to quantify how well the recorded population of neurons encoded each behavioural variable²⁷.
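The decoding step described above can be illustrated with a minimal sketch. It fits unit weights plus a constant by ordinary least squares and scores the fit with a cross-validated R². The interleaved five-fold split, the synthetic spike rates and all function names are assumptions made for illustration; the fold structure and any regularization used in the study are specified in its Methods, not here.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit(rates, target):
    """Least-squares unit weights; the last coefficient is the constant."""
    X = [row + [1.0] for row in rates]
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(X, target)) for i in range(p)]
    return solve(XtX, Xty)

def predict(coef, rates):
    """Weighted sum of spike rates plus the constant."""
    return [sum(w * v for w, v in zip(coef, row)) + coef[-1] for row in rates]

def cross_validated_r2(rates, target, k=5):
    """R^2 computed only from predictions on held-out samples."""
    n = len(target)
    held_out = [0.0] * n
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]    # assumed fold layout
        train = [i for i in range(n) if i % k != fold]
        coef = fit([rates[i] for i in train], [target[i] for i in train])
        for i, p in zip(test, predict(coef, [rates[i] for i in test])):
            held_out[i] = p
    mean = sum(target) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(target, held_out))
    ss_tot = sum((t - mean) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot

# Synthetic example: three hypothetical units, one variable they encode
# linearly, and one unrelated variable.
rng = random.Random(0)
rates = [[rng.random() for _ in range(3)] for _ in range(200)]
theta = [2.0 * r[0] - r[1] + 0.5 + rng.gauss(0.0, 0.1) for r in rates]
unrelated = [rng.gauss(0.0, 1.0) for _ in rates]
r2_theta = cross_validated_r2(rates, theta)
r2_unrelated = cross_validated_r2(rates, unrelated)
print(round(r2_theta, 3), round(r2_unrelated, 3))
```

Because R² is evaluated on held-out samples, a variable that the population does not encode yields a value near or below zero rather than a spuriously positive fit.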

The five behavioural variables were decoded from population activity on a single-trial basis (see examples in Extended Data Fig. 5c, d). Overall, the S1TJ and M1TJ cortices had stronger coding of L and L′ than the ALM cortex and the control region S1BF (Fig. 3b, c). S1TJ, M1TJ and ALM cortices, but not the S1BF cortex, all showed comparable encoding of θ (Fig. 3b, c). However, the traces of decoded θ in the S1TJ and M1TJ cortices contained rhythmic fluctuations that were absent in the ALM cortex, despite similar overall levels of encoding of θ (R² values). These fluctuations indicate that the M1TJ and S1TJ cortices encoded θ in a more instantaneous manner, whereas the ALM cortex encoded θ in a continuously modulated manner that may provide a control signal for the intended lick angle or represent the position of the target port.

Higher-level cortical regions are in part defined by the presence of more abstract (or latent) representations of sensory, motor and cognitive variables²⁸. Compared with L, L′ and θ, which describe the kinematics of individual licks, I and τ describe more abstract motor variables. In the ALM cortex, we found the strongest encoding of both I and τ (Fig. 3b, c). Encoding of I and τ became progressively weaker in the M1TJ, S1TJ, and S1BF cortices, respectively. Overall, these results reveal a neural coding scheme with increasing levels of abstraction across the S1TJ, M1TJ and ALM cortices during the execution of flexible sensorimotor sequences.

Good decoding may come from a small fraction of informative units or from dominant activity patterns across a population. Distinguishing these requires comparing the similarity between activity patterns captured by the coding axes (defined by the vector direction of regression weights), as shown above, and the dominant patterns in population activity identified in an unsupervised manner. In each recording session, we obtained neural trajectories in the coding subspaces (the subspaces spanned by coding axes) via linear decoding and trajectories in principal component subspaces (the subspaces spanned by the first few principal components) via principal component analysis. Trajectories in principal component subspaces depict dominant patterns in population activity, but the principal components per se need not have any behavioural relevance. To see whether neural trajectories in coding and principal component subspaces were the same except for a change (rotation and/or scaling) in the reference frame, we used canonical correlation analysis (Methods) to find the linear transformation of the two trajectories such that they were maximally correlated²⁹.

After transformation, trajectories of the ALM population in the subspace of the top three principal components aligned (Fig. 3d) and correlated (Fig. 3e; group 2 in the ALM cortex) well with the trajectories in the subspace encoding θ, I and τ. This indicates that the dominant neural activity patterns in the ALM population encoded θ, I and τ. As the ALM cortex minimally encoded L and L′, including these in the coding subspaces decreased the correlation with principal component trajectories (Fig. 3e; groups 1 and 3 in the ALM cortex). The decoded trajectories and principal component trajectories in the M1TJ and S1TJ cortices also showed a strong correlation but only when the coding subspaces included L and L′.

Across regions, the sum of variance explained by the five coding axes reached about half that of the top five principal components (Methods; Extended Data Fig. 5e). The five coding axes were largely orthogonal with each other (Extended Data Fig. 5f), indicating that they not only captured dominant neural dynamics but also did so efficiently with little redundancy.

Sequence branching signals in ALM–M1TJ

In backtracking sequences, mice licked back to a previous angle to relocate the port and then progressed through the rest of the sequence. The opposing deflections in the decoded θ from backtracking trials matched this behaviour (Fig. 3a, b, dashed curves for θ). This is not surprising as the M1TJ and ALM cortices are expected to encode the changed motor program, and the S1TJ cortex to signal the resulting proprioceptive or reafferent feedback. However, the motor cortical mechanisms that allow sensory feedback to integrate with unfolding motor programs^{30,31,32,33,34,35} could involve a movement-nonspecific signal to indicate sequence branching.

We used a linear support-vector machine to classify trials into either backtracking or standard sequences based on population activity at each time bin (Methods). Within each class, about equal numbers of left-to-right and right-to-left sequences were pooled so classifiers could not rely on the coding of specific licking movements. ALM and M1TJ activity started to predict the presence or absence of backtracking during the initial missed lick (Fig. 3f). We randomly shuffled class labels to determine chance-level classification accuracy. S1TJ populations showed only a statistically insignificant trend towards being able to distinguish backtracking from standard sequences (Fig. 3f), at much later time points (Extended Data Fig. 5g). As expected, S1BF populations showed no prediction.
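The label-shuffling control can be sketched as follows. For brevity the sketch swaps in a nearest-centroid classifier for the linear support-vector machine used in the study, and it assumes a simple even/odd train-test split; the toy data, trial counts and function names are hypothetical.

```python
import random

def centroid(rows):
    """Mean population vector of a set of trials."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def accuracy(train_X, train_y, test_X, test_y):
    """Nearest-centroid accuracy on held-out trials (labels are 0 or 1)."""
    cents = {c: centroid([x for x, l in zip(train_X, train_y) if l == c])
             for c in (0, 1)}
    hits = 0
    for x, l in zip(test_X, test_y):
        dist = {c: sum((a - b) ** 2 for a, b in zip(x, cents[c])) for c in cents}
        hits += min(dist, key=dist.get) == l
    return hits / len(test_y)

def split_accuracy(X, y):
    """Train on even-indexed trials, test on odd-indexed trials (assumed split)."""
    return accuracy(X[0::2], y[0::2], X[1::2], y[1::2])

def chance_accuracy(X, y, n_shuffles=200, seed=0):
    """Mean accuracy after randomly permuting the class labels."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_shuffles):
        shuffled = y[:]
        rng.shuffle(shuffled)
        total += split_accuracy(X, shuffled)
    return total / n_shuffles

# Toy population activity: 40 trials x 5 units; label 1 stands for backtracking.
rng = random.Random(1)
X, y = [], []
for trial in range(40):
    label = trial // 2 % 2          # keeps both halves of the split balanced
    X.append([rng.gauss(2.0 * label, 1.0) for _ in range(5)])
    y.append(label)

observed = split_accuracy(X, y)
chance = chance_accuracy(X, y)
print(observed, round(chance, 3))
```

Shuffling breaks the link between activity and trial type while preserving class sizes, so the shuffled accuracy gives an empirical chance level against which the observed accuracy can be compared.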

Context-dependent coding of subsequences

Complex sequences can be composed of different combinations of subsequences. The same subsequence can be used in multiple complex sequences, and it is crucial for the brain to keep track of the context in which a subsequence is executed^{36,37,38,39}. To search for such sequence context signals, we trained mice on two new sequences where the port steps in a ‘zigzag’ manner from one side to the other, then steps back, and then again steps to the other side (Fig. 4a, b, Supplementary Video 6). The two sequences have symmetrical movements. By fixing one and shifting the other forwards or backwards in time, it is possible to find subsequences that have the same licking movements but different sequence contexts (Fig. 4c). There are in total four ways to shift and match subsequences, and we focused on the three licks in the middle (Fig. 4d) for analysis.

Three simultaneously recorded ALM neurons illustrate three types of response (Fig. 4e). The first neuron preferentially fired during blue-coloured sequences, and the second neuron fired during red-coloured sequences, whereas the third neuron responded faithfully to the physical movements with no clear sequence preference (Fig. 4e, neurons 1–3, respectively). Using population activity as a predictor, linear support-vector machine classifiers (Methods) were able to predict the sequence identity, or context, in the example session (Fig. 4f) and across sessions and mice (Fig. 4g). Chance-level classification accuracy was determined by shuffling the sequence labels.

Our results provide strong evidence that ALM neurons in mice encode complex sequences with combined information about both physical movements and the latent sequence context.

Reward modulation in ALM

In the decoding analysis for standard sequences, the τ coding axis was identified by fitting models to link neural activity and relative sequence time. We performed the same decoding analysis with zigzag sequences and found a similar ramping pattern of τ (Extended Data Fig. 5h). The monotonic coding of τ therefore does not require a constant sequence direction. However, if τ faithfully represents time, the downward deflection of traces from backtracking sequences (Fig. 3b) should not appear, as time advances regardless of what the animals do. This suggests representation of a distance to goal⁴⁰, which might correspond to arrival at the last port position, water delivery, finishing water consumption, and so on.

The ALM cortex contained single neurons (Extended Data Fig. 6a) that fired actively during sequence execution but abruptly decreased firing upon tongue contact with water, even though mice continued with approximately five consummatory licks (Extended Data Fig. 6b) of similar or more strongly modulated kinematics and force (Extended Data Fig. 6c). The τ decoded from ALM populations showed similar time courses (Extended Data Fig. 6d, top left).

ALM activity was thus modulated by reward⁴¹ so as to signal reward expectation in a manner that smoothly increased as mice approached water delivery, regardless of sequence direction or lick angle, that was suppressed by the delay of progress upon backtracking, and that terminated at water delivery despite continued licking. Coding of I and θ followed more complex time courses than τ (Extended Data Fig. 6d, e).

ALM encodes upcoming sequences

In our task, sequences alternated direction across trials (Extended Data Fig. 7a). Before each trial, there was no cue to indicate the starting side. Expert mice nevertheless usually initiated sequences from the correct side without exploring the other (Extended Data Fig. 7b), suggesting internal maintenance of information about target position during inter-trial intervals. Brain regions maintaining such information may contribute to organizing higher-level sequences across trials.

In the ALM cortex, we found simultaneously recorded units that fired persistently to specific target position values during the inter-trial interval (Extended Data Fig. 7c). A linear model fitted using data from the second before cue onset showed smooth population decoding of target position across the span of many trials (Extended Data Fig. 7d). On average, ALM populations showed stronger encoding of target position (Extended Data Fig. 7e, f) than other regions. When using this model to decode during sequence execution, the resulting traces from two sequence directions crossed at mid-sequence (Extended Data Fig. 7g), showing similar structure as θ. None of the regions, including the ALM cortex, encoded time or a distance to trial start (Extended Data Fig. 7h), perhaps because our inter-trial interval contained an exponential portion (Methods) that made the time to trial start unpredictable⁷.

Together, our results from behaviour analysis, population electrophysiology and optogenetics define key sensory and motor cortices in mice that govern hierarchical execution of flexible, feedback-driven sensorimotor sequences.

Methods

Mice

All procedures were in accordance with protocols approved by the Johns Hopkins University Animal Care and Use Committee (protocols: MO18M187 and MO21M195). Mice were housed in a room on a reverse light–dark cycle, with each phase lasting 12 h, and maintained at 20–25 °C and 30–70% humidity. Before surgery, mice were housed in groups of up to five, but afterwards were housed individually. Fifteen mice (12 male and 3 female) were obtained by crossing VGAT-IRES-Cre (Jackson Labs: 028862; B6J.129S6(FVB)-Slc32a1^{tm2(cre)Lowl}/MwarJ)⁴² with Ai32 (Jackson Labs: 012569; B6;129S-Gt(ROSA)26Sor^{tm32(CAG-COP4*H134R/EYFP)Hze}/J)⁴³ lines. Two (one male and one female) were heterozygous VGAT-ChR2-EYFP (Jackson Labs: 014548; B6.Cg-Tg(Slc32a1-COP4*H134R/EYFP)8Gfng/J)⁴⁴ mice. Twelve (nine male and three female) were wild-type mice, including nine C57BL/6J (Jackson Labs: 000664) mice and one wild-type littermate for each of VGAT-ChR2-EYFP, TH-Cre (Jackson Labs: 008601; B6.Cg-7630403G23Rik^{Tg(Th-cre)1Tmd}/J)⁴⁵, and Etv1-Cre^{−/−} (Jackson Labs: 013048)⁴⁶. Two were male TH-Cre mice. Two (one male and one female) were Advillin-Cre (Jackson Labs: 032536; B6.129P2-Avil^{tm2(cre)Fawa}/J)⁴⁷ mice. Mice ranged in age from approximately 2 to 9 months at the start of training. A set of behavioural testing sessions typically lasted approximately 1 month (Supplementary Table 1).

Surgery

Before behavioural testing, mice underwent implantation of a metal headpost. For surgical procedures, mice were anaesthetized with isoflurane (1–2%) and kept on a heating blanket (Harvard Apparatus). Lidocaine or bupivacaine was used as a local analgesic and injected under the scalp at the start of surgery. Ketoprofen was injected intraperitoneally to reduce inflammation. All skin and periosteum above the dorsal surface of the skull were removed. The temporal muscle was detached from the lateral edges of the skull on either side and the bone ridge at the temporal–parietal junction was thinned using a dental drill to create a wider accessible region. Metabond (C & B Metabond) was used to cover the entirety of the skull surface in a thin layer, seal the skin at the edges and cement the headpost onto the skull over the lambda suture.

To make the skull transparent, a layer of cyanoacrylate adhesive was then dropped over the entirety of the Metabond-coated skull and left to dry. A silicone elastomer (Kwik-Cast) was then applied over the surface to prevent deterioration of skull transparency before photostimulation. Buprenorphine was used as a post-operative analgesic and the mice were allowed to recover over 5–7 days following surgery with free access to water.

For silicon probe recording, a small craniotomy of about 600 μm in diameter was made for implantation of a ground screw. The skull was thinned using a dental bur until the remaining bone could be carefully removed with a tungsten needle and forceps. Following this, one or more craniotomies of about 1 mm in diameter were made over the sites of interest for silicon probe recording. Craniotomies were protected with a layer of silicone elastomer (Kwik-Cast) on top. Additional craniotomies were usually made in new locations after finishing recordings in previous ones.

Task control

Task control was implemented with an Arduino-based system (Teensy 3.2 and Teensyduino), including the generation of audio (Teensy Audio Shield). Custom MATLAB-based software with a graphical user interface was developed to log task events and change task parameters. Touches between the tongue and the port were registered by a conductive lick detector (Svoboda lab, HHMI Janelia Research Campus), in which the mouse acted as a mechanical switch that opened (no touch) or closed (with touch) the circuit. Any mechanical switch has electrical bouncing issues when a contact is weak and unstable. To handle bouncing during loose touches, we merged any contact signals with intervals less than 60 ms.
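The debouncing rule can be sketched as follows, assuming that 'interval' refers to the gap between the offset of one contact and the onset of the next. The function name and the example timestamps are hypothetical; times are in milliseconds.

```python
def merge_touches(touches, max_gap_ms=60):
    """Merge contacts separated by gaps shorter than max_gap_ms.

    touches: list of (onset_ms, offset_ms) tuples sorted by onset.
    """
    merged = []
    for onset, offset in touches:
        if merged and onset - merged[-1][1] < max_gap_ms:
            # Short gap: treat as electrical bounce within one touch.
            merged[-1] = (merged[-1][0], max(merged[-1][1], offset))
        else:
            merged.append((onset, offset))
    return merged

# Two bouncy contacts 5 ms apart, followed by a separate lick 120 ms later.
raw = [(0, 10), (15, 40), (160, 200)]
print(merge_touches(raw))  # [(0, 40), (160, 200)]
```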

The auditory cue that signalled the beginning of each trial was a 0.1 s long, 65 dB SPL and 15 kHz pure tone. Touches that occurred during the auditory cue were not used to trigger port movement as they were probably due to impulsive licking rather than a reaction to the cue.

The lick port was motorized in the horizontal plane by two perpendicular linear stages (LSM050B-T4 and LSM025B-T4, Zaber Technologies), one for anterior and posterior movement and the other for left and right. A manual linear stage (MT1/M, Thorlabs) installed in the vertical direction controlled the height of the lick port. The motors were driven by a controller (X-MCB2, Zaber Technologies), which was in turn commanded by the Teensy board via serial interface communication. Although the linear stages were set up in cartesian coordinates, we specified the movement of the port using a polar coordinate system. For a chosen origin of the polar coordinates, the seven port positions were arranged in an arc symmetrical to the midline with equal spacing (in arc length) between adjacent positions (Fig. 1a).
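As an illustration of this geometry, the sketch below places seven positions on an arc that is symmetrical about the midline and converts them to cartesian targets for the two stages. On an arc of constant radius, equal arc-length spacing is equivalent to equal angular spacing. The radius and angular span in the example are hypothetical values chosen for illustration, as are the axis and sign conventions.

```python
import math

def arc_positions(radius_mm, span_deg, n=7):
    """Return n (x, y) targets on an arc symmetrical about the midline (x = 0).

    x is lateral and y is anterior-posterior, both relative to the chosen
    polar origin (axis conventions are assumptions).
    """
    half = math.radians(span_deg) / 2.0
    step = 2.0 * half / (n - 1)          # equal angular (hence arc) spacing
    points = []
    for i in range(n):
        angle = -half + i * step         # angle measured from the midline
        points.append((radius_mm * math.sin(angle),
                       radius_mm * math.cos(angle)))
    return points

# Hypothetical geometry: 10 mm radius, 60 degree total span.
for x, y in arc_positions(10.0, 60.0):
    print(round(x, 2), round(y, 2))
```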

A movement of the lick port was triggered by the onset of a touch during sequence performance. A second port movement could not be triggered within a refractory period of 80 ms, which prevented mice from driving a sequence by constantly holding the tongue on the port (although we never observed such behaviour). When a movement was triggered, the port first accelerated (477 or 715 mm s⁻²) until the maximal speed (39.3 mm s⁻¹) was reached, then maintained the maximal velocity, and decelerated until it stopped at the end position. The acceleration and deceleration phases were always symmetrical, such that the maximal velocity might not be reached if the distance of travel was short.
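The symmetrical acceleration and deceleration phases imply a trapezoidal speed profile for long moves and a triangular one for short moves. The sketch below computes peak speed and duration under the assumption of constant acceleration; the default values are the 477 mm s⁻² and 39.3 mm s⁻¹ quoted above, and the function name is hypothetical.

```python
import math

def port_move(distance_mm, accel=477.0, v_max=39.3):
    """Return (peak_speed_mm_s, duration_s) for a straight-line move."""
    ramp_distance = v_max ** 2 / accel   # covered by full accel plus decel
    if distance_mm >= ramp_distance:
        # Trapezoid: ramp up, cruise at v_max, ramp down.
        cruise_time = (distance_mm - ramp_distance) / v_max
        return v_max, 2.0 * v_max / accel + cruise_time
    # Triangle: the port starts to decelerate before v_max is reached.
    peak = math.sqrt(accel * distance_mm)
    return peak, 2.0 * peak / accel

short_peak, short_time = port_move(1.0)
long_peak, long_time = port_move(10.0)
print(round(short_peak, 1), round(long_peak, 1))
```

With these parameters, any move shorter than v_max² / accel (about 3.2 mm) ends before the maximal speed is reached.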

The movement was typically in a straight line. For four of the nine mice, when the two positions were not adjacent (for example, at backtracking and the following transition), the port would move in an outward half circle whose diameter was the linear distance separating the two positions. This arc motion minimized the chance of mice occasionally catching the port prematurely before the port stopped. Nevertheless, catching the port prematurely did not trigger the next transition in a sequence because, in this case, the port movement could only be triggered again after 200 ms from the start of backtracking (and 300 ms after the following touch). As a result, mice always needed to touch the port at the fully backtracked position to continue progress in a sequence.

The control of port movement was similar for zigzag sequences except that five port positions were used instead of seven, the refractory period before the next trigger was 100 ms, the acceleration was 2,000 mm s⁻², the maximal speed was 75 mm s⁻¹ and every port movement travelled along an outward half circle.

Mice performed the task in darkness with no visual cues about the position of the port. To prevent mice from using sounds emitted by the motor to guide their behaviour, we played two types of noise throughout a session. The first was a constant white noise (cut-off at 40 kHz; 80 dB SPL) and the second was a random playback (with 150–300-ms interval) of previously recorded motor sounds during 12 different transitions.

Two-axis optical force sensors

A stainless steel lick tube was fixed on one end to form a cantilever. Mice licked the other free end, producing a small displacement (approximately less than 0.1 mm at the tip for 5 mN) of the tube. Two photointerrupters (GP1S094HCZ0F, Sharp) placed along the tube (Extended Data Fig. 1c, d) were used to convert the vertical and horizontal components of displacement into voltage signals. Specifically, the cantilever normally blocked about half of the light passing through, outputting a voltage value in the middle of the measurement range. Pushing the tip down caused the cantilever to block more light at the vertical sensor and thereby decreased the output voltage; conversely, less force applied at the tip resulted in increased voltage. For the horizontal sensor, pushing the tube to the left or right decreased or increased the voltage output, respectively. Output was amplified by an op-amp then recorded via an RHD2000 Recording System (Intan Technologies).

By design (the circuit diagram and the displacement–response curve are available in the GP1S094HCZ0F datasheet), the force applied at the tip of the lick tube and the output voltage of the sensor follow a near linear relationship within a range of forces. To find this range, we measured the voltages (relative to baseline) with different weights added to the tip. Excellent linearity (R² = 0.9999) was achieved up to more than 20 mN (Extended Data Fig. 1d). By contrast, the maximal force of a lick was on average about 4 mN (Extended Data Fig. 1f).

The motorization of the lick tube introduced mechanical noise to the force signals. The spectral components of these noises were mainly at 300 Hz and its higher harmonics, presumably due to the resonance frequency of the tube, whereas the force signal induced by licking occupied much lower frequencies. Therefore, we low-pass (at 100 Hz) filtered the original signal (sampled at 30 kHz) to remove the motor noise. Additional interference came from the 850-nm illumination light used for high-speed video, which leaked into the optical sensors (mainly in early experiments with two mice) and caused slow fluctuations in the baseline over seconds. To mitigate this slow drift, we used a baseline estimated separately for each individual lick as follows. We first masked out the parts of the signal when the tongue was touching the port, then linearly interpolated to fill in these masked out lick portions using the neighbouring (that is, no touch) values. These interpolated time series served as the baseline for each lick. As the lick force was only a function of voltage change compared to baseline, the above procedure would at most negligibly affect the force estimation. Owing to the dependency of this procedure on complete touch detection, we excluded eight sessions from behavioural quantifications in Fig. 1 and Extended Data Figs. 1, 2 in which only touch onsets were correctly registered.
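The per-lick baseline estimate described above can be sketched as follows. This is a minimal illustration in Python rather than the original MATLAB code. The names `interpolate_baseline`, `lick_force_signal`, `signal` and `touch_mask` are hypothetical, and holding the nearest touch-free value at the edges of a recording is an assumption.

```python
def interpolate_baseline(signal, touch_mask):
    """Replace samples recorded during touch with a linear interpolation
    between the neighbouring touch-free samples.

    signal: list of voltage samples; touch_mask: list of bools, True while the
    tongue touches the port (both names are hypothetical).
    """
    n = len(signal)
    baseline = list(signal)
    i = 0
    while i < n:
        if not touch_mask[i]:
            i += 1
            continue
        start = i                      # first masked sample of this touch
        while i < n and touch_mask[i]:
            i += 1
        end = i                        # first touch-free sample after the touch
        left = signal[start - 1] if start > 0 else None
        right = signal[end] if end < n else None
        if left is None and right is None:
            raise ValueError("signal has no touch-free samples")
        if left is None:
            left = right               # assumption: hold the value at the edges
        if right is None:
            right = left
        span = end - start + 1         # steps between the two anchor samples
        for k in range(start, end):
            frac = (k - start + 1) / span
            baseline[k] = left + frac * (right - left)
    return baseline


def lick_force_signal(signal, touch_mask):
    """Voltage change relative to the interpolated baseline."""
    baseline = interpolate_baseline(signal, touch_mask)
    return [s - b for s, b in zip(signal, baseline)]
```

Because only the change relative to baseline is used, a slow drift that is roughly linear over the duration of one lick is removed by this subtraction.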

High-speed videography and tongue tracking

High-speed video (400 Hz, 0.6-ms exposure time, 32 µm per pixel, 800 × 320 pixels) providing side and bottom views of the mouth region was acquired using a ×0.25 telecentric lens (55–349, Edmund Optics), a PhotonFocus DR1-D1312-200-G2-8 camera and Streampix 7 software (Norpix). Illumination was via an 850-nm LED (LED850-66-60, Roithner Laser) passed through a condenser lens (Thorlabs).

Three deep convolutional neural networks were constructed (MATLAB 2017b, Neural Network Toolbox v11.0) to extract tongue kinematics and shape from these videos. The first network classified each frame as ‘tongue-out’ if a tongue was present, or ‘tongue-in’ otherwise. This network was based on ResNet-50 (ref. ⁴⁸) (pretrained on ImageNet), but the final layers were redefined to classify the two categories using a softmax layer and a classification layer that computes cross-entropy loss. A total of 37,658 frames were manually labelled, of which 1,611 frames were set aside as testing data. Image augmentation was performed to expand the training dataset. A standard training scheme was used with a mini-batch size of 32 and a learning rate of 1 × 10⁻⁴ to 1 × 10⁻⁵. The fully trained network achieved a high accuracy in classifying the validation data (Extended Data Fig. 1a).

The second network assigned a vector from the base to the tip of the tongue in each frame classified as tongue-out. L and θ were derived from this vector (Fig. 1c). A total of 12,095 frames were manually labelled, of which 643 frames were used only for testing. The architecture and training parameters of this network were similar to those of the classification network except that the final layers were redefined to output the x and y image coordinates of the base, tip and two bottom corners (not used in analysis) of the tongue with mean absolute error loss. The regression error of the fully trained network in testing data was 3.1 ± 5.4° for θ and 0.00 ± 0.13 mm for L (mean ± s.d.). This performance was comparable to human level (Extended Data Fig. 1b). Specifically, a subset of frames (separate from testing data) was labelled by each of the five human labellers. The variability in human judgement was quantified by the differences between L and θ from individual humans and the human mean for each frame. We also computed the differences between L and θ from the network and the human mean for each frame. The two distributions showed a comparable variability, although the network showed small biases (L: humans 0 ± 0.11 mm, network −0.05 ± 0.10 mm; θ: humans 0 ± 5.7°, network 3.3 ± 5.5°; mean ± s.d.).

In a subset of trials and in frames classified as tongue-out, the third network, a VGG13-based SegNet⁴⁹, extracted the shape of the tongue by semantic image segmentation, that is, classifying each pixel as belonging to a tongue or not. Human labellers used a 10-vertex polygon to encompass the area of the tongue in a total of 3,856 frames. The training parameters were similar to the other networks except for a mini-batch size of eight and a learning rate of 1 × 10⁻³.

Behavioural training

Behavioural sessions occurred once per day during the dark phase and lasted for approximately 1 h or until the mouse stopped performing, whichever came earlier. Mice would receive all of their water from these sessions, unless it was necessary to supply additional water to maintain a stable body weight. The amount of water consumed during behaviour was measured by subtracting the post-session volume of water in the dispenser from the pre-session volume. On days in which their behaviour was not tested, mice received 1 ml of water. Mice were water restricted (1 ml daily) for at least 7 days before beginning training. Whiskers and hairs around the mouth were trimmed frequently to avoid contact with the port.

The precise position of the implanted headpost varied across mice, so each mouse required an initial setup of the positions of the lick port. The lick port moved in an arc with respect to a chosen origin (see ‘Task control’). The origin was initially set at the midline of the animal and 2 mm posterior from the posterior face of the upper incisors. If there was any yaw of the head, the whole arc was rotationally shifted accordingly. The height of the lick port was manually adjusted until it was approximately 1 mm below the interface between the upper and lower lips when the mouth was closed.

In initial training sessions, the distance between the leftmost (L3) and the rightmost (R3) lick port position was reduced, the radius of the arc was shortened and the water reward was larger. As mice learned the task, both the L3 to R3 distance and the radius of the arc were gradually increased over a few days of training (Extended Data Fig. 1m). The difficulty of the task was increased whenever the mouse showed improvements in performing the task at the current port distance, radius and reward size. The difficulty remained constant in two conditions: either when the maximum set of parameters had been met (a radius of 5 mm for male mice and 4.5 mm for female mice) or if the mouse appeared demotivated (typically indicated by a notable decrease in the number of trials and licks). During the initial training sessions, water was occasionally supplemented at other points during the sequence to encourage licking behaviour. The amount of water reward per trial was eventually lowered to approximately 3 μl. For 3 of the 33 mice included in this study, we first trained them to lick in response to the auditory cue with the lick port staying at fixed positions. After mice responded consistently to the go cue, we shifted to the complete task with gradually increased difficulty. Although the three mice performed similarly to others when well trained, this procedure proved to be less efficient than beginning with the complete task.

Once a mouse had become adept at standard sequences, they were trained on the backtracking sequences. The first nine fully trained mice were used in backtracking related analyses; later, mice used for other purposes were not always fully trained in backtracking. For five of the nine mice, we first trained them with backtracking trials in only one direction and added the other direction once they mastered the first. For three of the nine mice, backtracking trials and standard trials were organized into separate blocks of 30 trials each. In developing this task, we tested subtle variations in the detailed organization of trial types, such as varying the percentage of backtracking trials in a block, or different forms of jumps in the port position. Details appear in Supplementary Table 1. Two of these three mice continued to perform the block-based backtracking trials during recording sessions. All nine mice eventually learned backtracking sequences but showed mixed learning curves (Extended Data Fig. 2a, b). About three mice were more biased towards previously learned standard sequences and tended to miss the port many times before relocating the lick port through exploration. The other six mice more readily made changes.

The shaping processes for zigzag sequences in a total of four mice all differed. Empirically, however, training on standard sequences first until proficiency and then on zigzag sequences could produce desirable performance.

Hearing loss

Hearing loss experiments were performed to exclude the possibility that mice used sounds produced by the motors to localize the motion of the lick port during sequence performance. To induce temporary hearing loss (approximately 27.5 dB attenuation)⁵⁰, we inserted two earplugs made of malleable putty (BlueStik Adhesive Putty, DAP Products Inc.) into the openings of the ear canal bilaterally under microscopic guidance. Earplugs were shaped like balls and then formed appropriately to cover the unique curvature of each ear canal. When necessary, the positioning of the earplugs was readjusted, or larger balls were inserted. Five well-trained mice performed one ‘earplug’ session and one control session. Mice did not have experience with earplugs before the earplug session. In earplug sessions, mice were first anaesthetized under isoflurane to implant earplugs (taking 11–12.5 min), then were put back in the homecage to recover from anaesthesia (taking 10–11.5 min), and performed the task after recovery. In control sessions, mice were anaesthetized for the same duration and allowed to recover for the same duration before performing the task.

Odour masking

Odour masking experiments were performed to exclude the possibility that mice used potential odours emanating from the lick port to localize its position during sequence performance. A fresh air outlet (1.59 mm in diameter) was placed in front of the mouse and aimed at the nose from approximately 2 cm away with an approximately 45° downward angle. We checked the coverage of air flow (2 LPM) by testing whether a water droplet (approximately 3 μl) would vigorously wobble in the flow at various locations, and confirmed that both the nose and all seven port positions were covered. Before the test session, head-fixed mice were habituated to occasional air flows when they were not performing sequences. In the test session, the air flow was turned off first and turned on continuously after the one-hundredth trial (in four mice) until the end of the session, or turned on first and turned off after the one-hundredth trial (in two mice). The air-off period served as the control condition for the air-on period.

Tongue numbing

Tongue numbing experiments were performed to directly test whether proper sequence execution depended on tactile feedback from the tongue. The sodium channel blocker lidocaine is used clinically to block signals from somatosensory afferents in the periphery. Before a behavioural session, mice were anaesthetized under isoflurane, and a cotton ball soaked with 2% lidocaine (for numbing) or saline (as control) was inserted into the oral cavity, covering the tongue. After 10 min, the cotton ball was removed, the anaesthesia was terminated and the mice woke up in a behavioural setup to perform standard sequences. As lidocaine has a relatively short half-life, we limited the analysis to trials performed within approximately 30 min after removing the cotton ball. One of the six mice was excluded from analysis as it was unable to perform the task within approximately 30 min after its tongue was numbed.

Electrophysiology

Two types of silicon probe were used to record extracellular potentials. One (H3, Cambridge Neurotech) had a single shank with 64 electrodes evenly spaced at 20-µm intervals. The other (H2, Cambridge Neurotech) had two shanks separated by 250 µm, where each shank had 32 electrodes evenly spaced at 25-µm intervals. Before each insertion, the tips of the silicon probe were dipped in either DiI (saturated), CM-DiI (1 mg ml⁻¹) or DiD (5–10 mg ml⁻¹) ethanol solution and allowed to dry. Probe insertions were either vertical or at 40° from the vertical line depending on the anatomy of the recorded region and surgical accessibility. Once the probe was fully inserted, the brain was covered with a layer of 1.5% agarose and ACSF, and was left to settle for approximately 10 min before recording. On the basis of the depth of the probe tip, the angle of penetration and the position of these sites, the location of units could be determined. Units recorded outside the target structure were excluded from analysis.

Extracellular voltages were amplified and digitized at 30 kHz via an RHD2164 amplifier board and acquired by an RHD2000 system (Intan Technologies). No filtering was performed at the data acquisition stage. Kilosort⁵¹ was used for initial spike clustering. We configured Kilosort to high-pass filter the input voltage time series at 300 Hz. The automatic clustering results were manually curated in Phy for putative single-unit isolation. We noticed a previously reported issue of Phy double counting a small fraction of spikes (with exact same timestamps) after manually merging certain clusters, thus duplicated spike times in a cluster were fixed post-hoc to keep only one.

Cluster quality was quantified using two metrics (Extended Data Fig. 4a–c, e). The first was the percentage of inter-spike intervals violating the refractory period (RPV). We set 2.5 ms as the duration of the refractory period and used 1% as the RPV threshold above which clusters were regarded as multi-units. It has been argued that RPV does not represent an estimate of false alarm rate of contaminated spikes⁵²,⁵³ as units with low spike rates tend to have lower RPV, whereas units with high spike rates tend to show higher RPV even if they are contaminated with the same percentage of false-positive spikes. Therefore, we estimated the contamination rate based on a reported method⁵². A modification was that we computed the mean spike rate of a cluster from periods during which the spike rate was at least 0.5 spikes per second rather than from an entire recording session. As a result, the mean spike rate reflected more about neuronal excitability than task involvement. Any clusters with more than 15% contamination rate were regarded as multi-units. Combining these two criteria in fact classified fewer single units than using a single, although more stringent, RPV of 0.5%. A low RPV can fail potentially well-isolated fast-spiking interneurons whose inter-spike intervals can frequently be shorter than the set threshold.
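The RPV criterion and the way the two thresholds combine can be illustrated with a short Python sketch. The function names are hypothetical. The contamination rate is taken as a precomputed input, because the estimation method of ref. 52 is not reproduced here.

```python
def refractory_violation_percent(spike_times, refractory_s=0.0025):
    """Percentage of inter-spike intervals shorter than the refractory period.

    spike_times: spike timestamps of one cluster, in seconds.
    """
    spikes = sorted(spike_times)
    isis = [b - a for a, b in zip(spikes, spikes[1:])]
    if not isis:
        return 0.0
    violations = sum(1 for isi in isis if isi < refractory_s)
    return 100.0 * violations / len(isis)


def is_multi_unit(rpv_percent, contamination_percent,
                  rpv_threshold=1.0, contamination_threshold=15.0):
    """A cluster is treated as multi-unit if it exceeds either threshold.

    contamination_percent is assumed to come from a separate estimate.
    """
    return (rpv_percent > rpv_threshold
            or contamination_percent > contamination_threshold)
```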

Photostimulation

We used the ‘clear-skull’ preparation⁶, a method that greatly improves the optical transparency of intact skull (see the ‘Surgery’ section), to non-invasively photoactivate channelrhodopsin-expressing GABAergic neurons and thus indirectly inhibit nearby excitatory neurons (Extended Data Fig. 3a).

Bilateral stimulation of the brain was achieved using a pair of optic fibres (0.39 NA, 400-µm core diameter) that were manually positioned above the clear skull before the beginning of each behavioural session. These optic fibres were coupled to 470-nm LEDs (M470F3, Thorlabs). The illumination power was externally controlled via WaveSurfer (http://wavesurfer.janelia.org). Each stimulation had a 2-s long 40-Hz sinusoidal waveform with a 0.1-s linearly modulated ramp-down at the end. The peak powers in the main experiments were 16 mW and 8 mW. We used the previously reported 50% transmission efficiency of the clear-skull preparation⁶ and report the estimated average power in the main text. There was a 10% chance of light delivery triggered at each of the following points in a sequence: cue onset, the middle touch or the first touch after water delivery. To ensure that the light from photostimulation did not affect the performance of the mouse through vision, we set up a masking light with two blue LEDs directed at each eye of the mouse. Each flash of the masking light was 2 s long separated by random intervals of 5–10 s. This masking light was introduced several training sessions in advance of photostimulation to ensure that the light no longer affected the behaviour of the mouse. In addition, the optic fibres were positioned to shine light from approximately 5 to 10 mm above the head of the mouse on these days leading up to photostimulation.
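As a worked check on the reported powers, the estimated average power can be related to the peak power under two assumptions that are not stated explicitly here: that the 40-Hz sinusoid swings between zero and the peak power, so that its time average is half of the peak, and that the brief 0.1-s ramp-down can be ignored. With the 50% transmission efficiency of the clear-skull preparation, this gives

$${P}_{{\rm{est}}}=0.5\times \frac{{P}_{{\rm{peak}}}}{2},\qquad 0.5\times \frac{16\,{\rm{mW}}}{2}=4\,{\rm{mW}},\qquad 0.5\times \frac{8\,{\rm{mW}}}{2}=2\,{\rm{mW}}$$

Under these assumptions, the 16-mW peak power corresponds to the 4 mW per hemisphere quoted for the main experiments.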

In a subset of silicon probe recording sessions (related to Extended Data Fig. 3c–f), we used an optic fibre (0.3 NA, 400-µm core diameter) to simultaneously photoinhibit the same cortical region (within 1 mm) or a different cortical region (approximately 1.5 or approximately 3 mm away) via a craniotomy. The tip of the fibre was kept approximately 1 mm away from the brain surface. For testing the efficiency of photoinhibition, the same 2-s photostimulation was applied but only at the mid-sequence, with 7.5% probability for each of the four powers (1, 2, 4 and 8 mW). For each isolated unit, the photo-evoked spike rate was normalized to that obtained during the equivalent 2-s time window without photostimulation. To avoid a floor effect, we also excluded units that on average fired less than one spike during the no stimulation windows. We classified units as putative pyramidal neurons if the width of the average spike waveform (defined as time from trough to peak) was greater than 0.5 ms, and as putative fast-spiking interneurons if shorter than 0.4 ms or if units had more than twice the firing rate during 8-mW photostimulations than during periods of no stimulation.

With the light powers we used in the main experiments (4 mW each hemisphere), light within a 1-mm distance reduced the mean spike rate of putative pyramidal cells (Extended Data Fig. 3c–e) by 91%, light at approximately 1.5 mm away by 61%, and at approximately 3 mm away by 19% in behaving animals (Extended Data Fig. 3f). The mean spike rate of putative fast-spiking neurons at approximately 3 mm away was also reduced by 19%, rather than showing an increase due to photoactivation, suggesting that the decreased activity of both pyramidal and fast-spiking neurons was probably due to a reduction of cortical input. By contrast, light shined within 1 mm increased the mean spike rate of fast-spiking neurons by 739% and at approximately 1.5 mm by 140%.

Histology

Mice were perfused transcardially with PBS followed by 4% PFA in 0.1 M PB. The tissue was fixed in 4% PFA at least overnight. The brain was then suspended in 3% agarose in PBS. A vibratome (HM 650V, Thermo Scientific) cut coronal sections of 100 μm that were mounted and subsequently imaged on a fluorescence microscope (BX41, Olympus). Images showing DiI and DiD fluorescence were collected to recover the location of silicon probe recordings. The plotted coordinates of recording sites (Fig. 2d) were randomly jittered by ±0.05 mm to avoid visual overlap.

General data analysis

All analyses were performed in MATLAB (MathWorks) version 2019b unless noted otherwise.

The first trial and the last trial were always removed due to incomplete data acquisition. Trials in which mice did not finish the sequence before video recording stopped were excluded from the analyses that involved kinematic variables of tongue motion.

We assigned mice of appropriate genotypes to experimental groups arbitrarily, without randomization or blinding. We did not use statistical methods to predetermine sample sizes. Sample sizes are similar to those reported in the field.

Behavioural quantifications

The duration of individual licks was variable. To average quantities within single licks (Fig. 1, Extended Data Figs. 1, 2, 6), we first linearly interpolated each quantity using the same 30 time points spanning the lick duration (from the first to the last video frame of a tracked lick). L′ was computed before interpolation. When the tongue was short, the regression network showed greater variability in determining θ and sometimes produced outliers. Thus, we detected and replaced outliers using the MATLAB ‘filloutliers’ function (with ‘nearest’ and ‘quartiles’ options), and only included θ when L was longer than 1 mm. In addition, any ‘lick’ with a duration shorter than 10 ms was excluded.
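The resampling of each lick onto a common 30-point time base can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that frames within a lick are evenly spaced in time.

```python
def resample_lick(values, n_points=30):
    """Linearly interpolate one quantity (for example L or θ) of a single lick
    onto n_points evenly spaced between its first and last video frame."""
    if len(values) < 2:
        raise ValueError("need at least two frames to interpolate")
    last = len(values) - 1
    out = []
    for j in range(n_points):
        pos = j * last / (n_points - 1)   # fractional frame index
        lo = min(int(pos), last - 1)      # left neighbour frame
        frac = pos - lo
        out.append(values[lo] + frac * (values[lo + 1] - values[lo]))
    return out


def average_licks(licks, n_points=30):
    """Average a quantity across licks of different durations."""
    resampled = [resample_lick(v, n_points) for v in licks]
    return [sum(col) / len(col) for col in zip(*resampled)]
```

At 400 frames per second, the 10-ms minimum duration corresponds to about four frames, so every retained lick has enough frames for this interpolation.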

For licks occurring at the most lateral positions, the tongue would typically ‘shoot’ out and quickly but briefly reach a maximal deviation from midline (|θ|_max) (Extended Data Fig. 1g). As a result, the onset of touch mostly occurred around |θ|_max. When analysing licks that may or may not have made contact, we used θ_shoot, defined as θ at the time when L reached 0.84 of its maximum (L_max), to succinctly depict the lick angle (Extended Data Fig. 1g).
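A minimal Python sketch of θ_shoot follows. It assumes that the value is read at the first video frame at which L reaches the threshold, without interpolating between frames. The function name is hypothetical.

```python
def theta_shoot(lengths, angles, fraction=0.84):
    """Return θ at the first frame where L reaches `fraction` of L_max.

    lengths, angles: per-frame L and θ of a single lick (same length).
    Taking the first frame at or above threshold is an assumption.
    """
    l_max = max(lengths)
    for length, angle in zip(lengths, angles):
        if length >= fraction * l_max:
            return angle
    raise ValueError("threshold never reached")  # not expected when L_max > 0
```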

The instantaneous lick rate was computed as the reciprocal of the inter-lick interval (ILI). The instantaneous sequence speed was defined as the reciprocal of the duration from the touch onset of a previous port position to the touch onset of the next.

Values in the learning curves (Extended Data Figs. 1l, m, 2a, b) were averaged in bins of 100 trials, with 50% overlap of consecutive bins.
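The overlapping binning can be written compactly. The Python sketch below assumes that incomplete bins at the end of training are dropped, which is not specified above. The function name is hypothetical.

```python
def binned_means(values, bin_size=100, overlap=0.5):
    """Mean of a per-trial value in bins of bin_size trials, where consecutive
    bins share a fraction `overlap` of their trials."""
    step = max(1, int(bin_size * (1 - overlap)))
    means = []
    # Assumption: only complete bins are kept.
    for start in range(0, len(values) - bin_size + 1, step):
        window = values[start:start + bin_size]
        means.append(sum(window) / bin_size)
    return means
```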

The behavioural effects of photoinhibition (Extended Data Fig. 3j–m) were quantified in two steps. First, we used 0.2-s time bins to compute Θ_shoot, L_max, the rate of licks and the rate of touches as functions of time for each trial. The time series of SD(Θ_shoot) was computed from binned Θ_shoot across trials in each experimental condition and each session. Second, bins within a time window during photoinhibition (or equivalent time for trials without inhibition) were averaged to yield a single number. The time window was typically 1 s following the start of photoinhibition. The shorter window helped to minimize the effects ‘bleeding over’ from mid-sequence to initiation, and from consumption to mid-sequence. However, this was not an issue for the consumption period, and we instead used the 2-s window during which light was delivered (Fig. 2c, right; ‘Cons’ in Extended Data Fig. 3k, m). Figure 2b, c presents the same results quantified in Extended Data Fig. 3j, k but directly plotting changes in means between conditions on schematic brain images.

Standardization of ILIs within lick bouts

Owing to individual variability, different mice tended to lick at slightly different rates within lick bouts. The same mouse might also perform a bit faster in one sequence direction than the other. Even in a given direction, a mouse might start faster and then slow down a little, or go slower first and faster later. When aligning trials from heterogeneous sources, a 10% difference in lick rate, for instance, will result in a complete mismatch (reversed phase) of lick cycle after only five licks. Therefore, before the analyses that were sensitive to inconsistent lick rates (Figs. 2e–h, 3, 4, Extended Data Figs. 4–7, except for Extended Data Fig. 4f–h), we linearly stretched or shrunk ILIs within each lick bout to a constant value of 0.154 s (that is, 6.5 licks per second), which is around the overall mean. The lick timestamps used to compute ILIs were the mid-time of the duration of each lick. A lick bout was operationally defined as a series of consecutive licks in which every ILI must be shorter than 1.5× the median of all ILIs in the entire behavioural session. ILIs outside lick bouts were unchanged. For ease of programming, we compensatorily scaled the time between the last lick of a trial and the start of the next trial to maintain an unchanged global trial time. Original time series, including spike rates and L′, were obtained before standardizing ILIs. After standardization, the behavioural and neural time series were resampled uniformly at 400 samples per second.
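The core of the ILI standardization can be sketched in Python as below. This is a simplified illustration: it takes the lick mid-times of a whole session, sets every within-bout ILI to 0.154 s, and maps any other timestamp piecewise-linearly between neighbouring licks. The compensatory rescaling of the time between trials is omitted, and the function names are hypothetical.

```python
from statistics import median

TARGET_ILI = 0.154  # seconds, about 6.5 licks per second


def standardize_lick_times(lick_times, target_ili=TARGET_ILI, bout_factor=1.5):
    """Return new lick times in which ILIs within lick bouts equal target_ili.

    An ILI counts as within a bout if it is shorter than bout_factor times the
    median ILI of the session; other ILIs are left unchanged.
    """
    ilis = [b - a for a, b in zip(lick_times, lick_times[1:])]
    if not ilis:
        return list(lick_times)
    threshold = bout_factor * median(ilis)
    new_times = [lick_times[0]]
    for ili in ilis:
        new_times.append(new_times[-1] + (target_ili if ili < threshold else ili))
    return new_times


def warp_time(t, old_times, new_times):
    """Map a timestamp from original to standardized time by linear
    interpolation between the surrounding licks."""
    if t <= old_times[0]:
        return t - old_times[0] + new_times[0]
    if t >= old_times[-1]:
        return t - old_times[-1] + new_times[-1]
    for i in range(len(old_times) - 1):
        if old_times[i] <= t <= old_times[i + 1]:
            frac = (t - old_times[i]) / (old_times[i + 1] - old_times[i])
            return new_times[i] + frac * (new_times[i + 1] - new_times[i])
```

Spike times and behavioural samples can be passed through `warp_time` so that neural and behavioural series stay aligned to the same standardized lick cycle.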

Trial selection for standard and backtracking sequences

After standardizing lick bout ILIs, we used a custom algorithm to select a group of trials with the most similar sequence performance. First, all trials of the same sequence type in a behavioural session were collected and a time window of interest was determined. In Fig. 2e–h and Extended Data Fig. 4, we used 0–0.5 s from cue onset, −1 to 1 s from middle touch, and −0.5 to 0.7 s from last consummatory touch for the respective periods. In Fig. 3, we used −1 to 1 s from middle touch. In Extended Data Fig. 6, we used −0.5 to 1 s from the first lick touching water. Next, for each trial, we created three time histograms (with a 10-ms bin size): one for all licks, one for all touches and one for touches that triggered port movements. The three time histograms were then smoothed by a Gaussian filter (100-ms kernel width, 20-ms s.d.). Concatenating them along time gave a single feature vector that depicts the licking pattern and performance for the trial. Last, pairwise Euclidean distances were computed among feature vectors of all candidate trials and we chose a subset of n trials with the lowest average pairwise distance, that is, those that have the most similar lick and touch patterns. The number n was set to one-third of the available candidate trials with a minimal limit of n = 10 trials. We used this relatively low fraction mainly to handle the greater behavioural variability in sequences with backtracking. To handle trial-to-trial variability in sequence initiation time (defined as the interval from the cue onset to the onset of the first touch), which was not captured in our feature vectors, before clustering we limited trials to those with a sequence initiation time of less than 1 s.
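The final selection step can be illustrated with a greedy heuristic in Python. This is not the custom algorithm used in the study, whose details are not given here. It repeatedly discards the trial with the largest summed distance to the remaining trials until n are left, which is not guaranteed to find the optimal subset. The feature vectors are assumed to be already built from the smoothed, concatenated histograms. Capping n at the number of candidates and rounding down are further assumptions, and all names are hypothetical.

```python
import math


def choose_n(n_candidates, fraction=1 / 3, minimum=10):
    """Number of trials to keep: one-third of candidates, at least `minimum`,
    and never more than the number available (the cap is an assumption)."""
    return min(n_candidates, max(minimum, int(n_candidates * fraction)))


def select_similar_trials(features, n):
    """Greedy heuristic for a subset of n trials with low pairwise distance.

    features: one feature vector (list of floats) per candidate trial.
    Returns the sorted indices of the retained trials.
    """
    count = len(features)
    dist = [[math.dist(a, b) for b in features] for a in features]
    keep = set(range(count))
    totals = [sum(row) for row in dist]       # summed distance to kept trials
    while len(keep) > n:
        worst = max(keep, key=lambda i: totals[i])
        keep.remove(worst)
        for i in keep:
            totals[i] -= dist[i][worst]       # forget the discarded trial
    return sorted(keep)
```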

Trial selection and subsequence matching for zigzag sequences

After standardizing lick bout ILIs, we limited candidate trials to those with perfect sequence execution, that is, no missed licks or breaks. To find the time shift that gave the best match between two subsequences, as illustrated in Fig. 4c, we first computed the median time series of tongue angles (θ) for each of the two sequence types. Next, we identified the best time shifts as those corresponding to the peaks of a cross-correlogram between the two time series.
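The time-shift search can be sketched in Python with a simple cross-correlogram of two median θ time series. The sketch returns only the single highest peak within a search range, whereas several peaks may be of interest in practice. Dividing by the number of overlapping samples is an assumption, and the function names are hypothetical.

```python
def cross_correlogram(a, b, max_shift):
    """Mean product of the mean-subtracted series a[i] and b[i + lag] for each
    lag in [-max_shift, max_shift]. A positive lag means b is delayed
    relative to a."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    a0 = [x - mean_a for x in a]
    b0 = [x - mean_b for x in b]
    result = {}
    for lag in range(-max_shift, max_shift + 1):
        pairs = [(a0[i], b0[i + lag])
                 for i in range(len(a0)) if 0 <= i + lag < len(b0)]
        result[lag] = sum(x * y for x, y in pairs) / len(pairs) if pairs else 0.0
    return result


def best_shift(a, b, max_shift):
    """Lag (in samples) at the highest peak of the cross-correlogram."""
    cc = cross_correlogram(a, b, max_shift)
    return max(cc, key=cc.get)
```

With series resampled at 400 samples per second, a lag of k samples corresponds to a shift of k × 2.5 ms.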

Analysis of zigzag sequences was intended to reveal whether neurons encoded sequence context (that is, identity) during periods with the same subsequence movements. To aid this purpose, we further selected trials whose θ were closest to the median θ computed from trials of either sequence type pooled together, unless the resulting number of trials was less than one-third of all candidate trials.

Hierarchical bootstrap

Directly averaging trials pooled across animals assumes that data from different animals, acquired in different sessions, come from the same distribution. Potentially meaningful animal-to-animal and session-to-session variability is thereby underestimated. To account for this variability, where noted, we performed a hierarchical bootstrap procedure⁵⁴ when computing confidence intervals and performing statistical tests. In each iteration of this procedure, we first randomly sampled animals with replacement, then, from each of these resampled animals, sampled sessions with replacement, and then trials from each of the resampled sessions. The statistic of interest was then computed from each of these bootstrap replicates.
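The three-level resampling can be sketched in Python as follows. The nested-dictionary data layout, the percentile confidence interval and the function names are assumptions made for illustration.

```python
import random


def hierarchical_resample(data, rng):
    """One bootstrap replicate: resample animals, then sessions within each
    resampled animal, then trials within each resampled session, all with
    replacement.

    data: {animal: {session: [trial values]}} (hypothetical layout).
    """
    animals = list(data)
    sample = []
    for animal in rng.choices(animals, k=len(animals)):
        sessions = list(data[animal])
        for session in rng.choices(sessions, k=len(sessions)):
            trials = data[animal][session]
            sample.extend(rng.choices(trials, k=len(trials)))
    return sample


def bootstrap_ci(data, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval of `statistic` over bootstrap replicates."""
    rng = random.Random(seed)
    stats = sorted(statistic(hierarchical_resample(data, rng))
                   for _ in range(n_boot))
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[min(n_boot - 1, int(n_boot * (1 - alpha / 2)))]
    return lo, hi
```

For example, `bootstrap_ci(data, lambda x: sum(x) / len(x))` would give an interval for the mean that reflects animal-to-animal and session-to-session variability as well as trial-to-trial variability.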

PETH and NNMF clustering

Spike rates were computed by temporal binning (bin size of 2.5 ms) of spike times followed by smoothing (15-ms s.d. Gaussian kernel). The smooth PETHs were computed by averaging spike rates across trials. Each unit had six PETHs: three time windows (for sequence initiation, mid-sequence and sequence termination) each in two standard sequences (left to right and right to left). We excluded inactive units whose maximal spike rate across the six PETHs was less than 10 spikes per second. For the rest, we normalized PETHs of each unit to this maximal spike rate.
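The PETH computation can be sketched in Python as below. Truncating the Gaussian kernel at four standard deviations and zero-padding at the window edges are assumptions, and the function names are hypothetical.

```python
import math


def gaussian_kernel(sd_bins, half_width_sds=4):
    """Normalized Gaussian kernel, truncated at half_width_sds (assumption)."""
    half = int(math.ceil(half_width_sds * sd_bins))
    weights = [math.exp(-0.5 * (k / sd_bins) ** 2) for k in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def spike_rate(spike_times, t_start, t_stop, bin_s=0.0025, sd_s=0.015):
    """Bin spike times (s) into bin_s bins and smooth with a Gaussian kernel."""
    n_bins = int(round((t_stop - t_start) / bin_s))
    counts = [0.0] * n_bins
    for t in spike_times:
        idx = int((t - t_start) / bin_s)
        if t >= t_start and 0 <= idx < n_bins:
            counts[idx] += 1.0
    rates = [c / bin_s for c in counts]        # spikes per second in each bin
    kernel = gaussian_kernel(sd_s / bin_s)
    half = len(kernel) // 2
    smoothed = []
    for i in range(n_bins):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n_bins:                # zero padding at the edges
                acc += w * rates[j]
        smoothed.append(acc)
    return smoothed


def peth(trials, t_start, t_stop):
    """Average the smoothed spike rate across trials (lists of spike times)."""
    per_trial = [spike_rate(s, t_start, t_stop) for s in trials]
    return [sum(col) / len(col) for col in zip(*per_trial)]


def normalize_unit(peths, min_peak_hz=10.0):
    """Divide all PETHs of a unit by their overall peak; return None for
    inactive units whose peak is below min_peak_hz."""
    peak = max(max(p) for p in peths)
    if peak < min_peak_hz:
        return None
    return [[v / peak for v in p] for p in peths]
```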

To evaluate the consistency of neuronal spiking across trials, we quantified the uncertainty in PETHs using a variant of bootstrap cross-validation. Specifically, for each neuron and in a given run, we randomly split the trials into two halves and computed PETHs with each half. We then computed the root mean squared error (RMSE) between the two sets of PETHs, producing a single RMSE value. This procedure was performed for every neuron and was repeated 200 times. The mean RMSE value for each neuron across the 200 runs is shown in Extended Data Fig. 4i.
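The split-half consistency measure for one neuron can be sketched in Python as follows. It assumes that each trial is represented by one rate vector, for example the concatenated windows of that trial. The function name is hypothetical.

```python
import math
import random


def split_half_rmse(trial_rates, n_runs=200, seed=0):
    """Mean RMSE between PETHs computed from two random halves of the trials.

    trial_rates: list of per-trial rate vectors of equal length.
    """
    rng = random.Random(seed)
    idx = list(range(len(trial_rates)))
    half = len(idx) // 2
    n_samples = len(trial_rates[0])

    def mean_trace(indices):
        return [sum(trial_rates[i][k] for i in indices) / len(indices)
                for k in range(n_samples)]

    rmses = []
    for _ in range(n_runs):
        rng.shuffle(idx)                       # new random split on every run
        a = mean_trace(idx[:half])
        b = mean_trace(idx[half:])
        mse = sum((x - y) ** 2 for x, y in zip(a, b)) / n_samples
        rmses.append(math.sqrt(mse))
    return sum(rmses) / n_runs
```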

To construct inputs to non-negative matrix factorization (NNMF), the six PETHs of each unit were downsampled from 2.5 ms per sample to 25 ms per sample and were concatenated along time to form a single feature vector.

NNMF is a close relative of principal component analysis (PCA) and has gained increasing popularity for processing neural data⁵⁵. The algorithm finds a small number of activity patterns (non-negative left factor, analogous to principal components in PCA) along with a set of weights for each neuron (non-negative right factor), so that the original PETHs can be best reconstructed by weighted sums of those activity patterns. As a result, a small number of activity patterns (or dimensions) is usually able to capture the main structure of the original PETHs, and the weights of the neuron quantify the degree to which its activity reflects each pattern. In the context of clustering, each pattern describes representative activity of a cluster, and the pattern with the greatest weight for a neuron determines its cluster membership.

NNMF was performed using the MATLAB function ‘nnmf’ with default options. To find the best number of clusters, we tested a range of numbers with bootstrap cross-validation to see what cluster number produced the most consistent cluster membership. In each bootstrap iteration, NNMF with a given cluster number was applied using 50% of randomly sampled neurons. The extracted activity patterns were used to compute cluster memberships for the other 50% of neurons that were held-out. This process was repeated 1,000 times. The final cluster membership of a neuron was the one that had the highest likelihood of containing that neuron. We ran this method with the number of clusters set to each value from 6 to 20, and found that 13 clusters achieved the best consistency (Extended Data Fig. 4j), quantified as the mean likelihood that a neuron was grouped in the same cluster across all bootstrap iterations.

Quantification of rhythmic licking modulation in spike PETHs

Neuronal responses modulated by rhythmic licking should show a modulation frequency that matches the rate of licks (approximately 6.5 licks per second during sequence execution), with a phase shift that may vary from neuron to neuron. Therefore, we first quantified the rhythmicity by fitting a sinusoidal function, f(t) = A × sin(2πω_lick t + Φ) + C, to each PETH (Extended Data Fig. 4n), where the free parameter Φ shifts the function in phase, A and C scale and offset the function vertically to match the neuronal firing rate, and ω_lick is a constant of 6.5 Hz. Next, a Pearson’s correlation coefficient (r) was computed between a mid-sequence PETH and its best-fitted sinusoid. Every neuron had two r values, one for each sequence direction. The final rhythmicity was represented by the average of the two (r_avg).
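Because the frequency is fixed, the sinusoid can be fitted by ordinary least squares: A × sin(x + Φ) + C equals a × sin(x) + b × cos(x) + C with a = A cos Φ and b = A sin Φ. The Python sketch below uses this identity. It is one possible way to obtain the fit, not necessarily the procedure used in the study, and the function names are hypothetical.

```python
import math

LICK_HZ = 6.5  # fixed modulation frequency


def fit_lick_sinusoid(t, y, freq=LICK_HZ):
    """Least-squares fit of A*sin(2*pi*freq*t + phase) + offset to y.

    Returns (amplitude, phase, offset, fitted values)."""
    n = len(t)
    s = [math.sin(2 * math.pi * freq * ti) for ti in t]
    c = [math.cos(2 * math.pi * freq * ti) for ti in t]
    ms, mc, my = sum(s) / n, sum(c) / n, sum(y) / n
    s0 = [v - ms for v in s]
    c0 = [v - mc for v in c]
    y0 = [v - my for v in y]
    sss = sum(v * v for v in s0)
    scc = sum(v * v for v in c0)
    ssc = sum(u * v for u, v in zip(s0, c0))
    ssy = sum(u * v for u, v in zip(s0, y0))
    scy = sum(u * v for u, v in zip(c0, y0))
    det = sss * scc - ssc * ssc              # 2 x 2 normal equations
    a = (ssy * scc - scy * ssc) / det
    b = (scy * sss - ssy * ssc) / det
    amplitude = math.hypot(a, b)
    phase = math.atan2(b, a)
    offset = my - a * ms - b * mc
    fitted = [amplitude * math.sin(2 * math.pi * freq * ti + phase) + offset
              for ti in t]
    return amplitude, phase, offset, fitted


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)
```

The rhythmicity of a neuron would then be the mean of `pearson_r(peth, fitted)` over the two sequence directions.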

PCA

The input to PCA was the normalized spike rates of simultaneously recorded single units and multi-units (Extended Data Fig. 4d). The original spike rates were first computed by temporal binning (2.5-ms bin size, that is, 400 samples per second) of spike times followed by smoothing (15-ms s.d. Gaussian kernel). To obtain normalized spike rates, we divided the original spike rates by the maximum spike rate or 5 Hz, whichever was greater. We adopted this ‘soft’ normalization technique²⁹ to prevent weakly firing units from contributing as much variance as actively firing units. The percent variance explained by principal components was simply derived from the singular values.
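The soft normalization and the relation between singular values and explained variance can be written in a few lines of Python. The sketch assumes that the maximum is taken over the spike-rate series supplied for each unit and that the data matrix is mean-centred before the singular value decomposition. The function names are hypothetical.

```python
def soft_normalize(rates, floor_hz=5.0):
    """Divide a unit's spike rates by its maximum rate or floor_hz,
    whichever is greater, so weakly firing units are not scaled up to 1."""
    scale = max(max(rates), floor_hz)
    return [r / scale for r in rates]


def percent_variance_explained(singular_values):
    """Percent variance per principal component: the squared singular value
    divided by the sum of squares (assumes mean-centred data)."""
    squared = [s * s for s in singular_values]
    total = sum(squared)
    return [100.0 * v / total for v in squared]
```

A unit peaking at 2 Hz is therefore scaled to a maximum of 0.4 rather than 1, which limits its contribution to the total variance.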

Linear regression and decoding

A linear model can be expressed as

$${y}_{t}={w}^{1}{r}_{t}^{1}+{w}^{2}{r}_{t}^{2}+{w}^{3}{r}_{t}^{3}+\ldots +{w}^{n}{r}_{t}^{n}+c+{\epsilon }_{t}={{\bf{r}}}_{t}^{{\rm{\top }}}{\bf{w}}+c+{\epsilon }_{t}$$

where t is the time in a recording session, n is the number of simultaneously recorded units, y_t is the behavioural variable at t, ${r}_{t}^{i}$ is the normalized spike rate of the i-th unit at t, ${w}^{i}$ is the regression coefficient for the i-th unit, c is the intercept, ${\epsilon }_{t}$ is the error term, and ${{\bf{r}}}_{t}^{{\rm{\top }}}{\bf{w}}$ is the matrix notation form of the summed multiplications.

The normalized population spike rates were computed in the same way as those for PCA. Note that, although the normalization was only necessary for PCA, it did not affect the goodness of fit, R², of linear models. The behavioural variable was either tongue length (L), tongue velocity (L′), tongue angle (θ), sequence identity (I), target position (TP) or relative sequence time (τ) (Fig. 3a, Extended Data Figs. 5, 7). L, L′ and θ were directly available at 400 samples per second. However, these variables had values only when the tongue was outside of the mouth. Therefore, samples without observed values were either set to zero (for L) or excluded from regression (for L′ and θ). I was defined as 1 if the sequence was from right to left and 2 if left to right. τ simply took sample timestamps as its values. TP was the same as I but defined based on the upcoming sequence.

Predicting single responses with dozens of predictors is prone to overfitting. Therefore, we chose the elastic-net⁵⁶ variant of linear regression (using the MATLAB function ‘lasso’ with ‘Alpha’ set to 0.1), which penalizes big coefficients for redundant or uninformative predictors. A parameter λ controls the strength of this penalty. To find the best λ, we configured the lasso function to compute a tenfold cross-validated mean squared error (cvMSE) of the fit for a series of λ values. The smallest cvMSE indicates the best generalization, that is, the least overfit. We conservatively chose the largest λ value such that the cvMSE was within one standard error of the minimum cvMSE. For each model, we derived the R² from this cvMSE and reported it in Fig. 3 and Extended Data Figs. 5, 7.
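
The selection rule for λ can be sketched as follows, in Python rather than MATLAB. The inputs stand for the λ series and the cross-validated errors that the fitting routine reports; the function names are hypothetical. The conversion from cvMSE to R² shown here, 1 − cvMSE / Var(y), is an assumption of this sketch, because the text does not give the formula.

```python
def pick_lambda_1se(lambdas, cv_mse, cv_se):
    """Largest lambda whose cvMSE is within one standard error of the minimum.

    lambdas, cv_mse and cv_se are parallel lists: the penalty strengths, the
    cross-validated mean squared errors and their standard errors.
    """
    i_min = min(range(len(cv_mse)), key=cv_mse.__getitem__)
    threshold = cv_mse[i_min] + cv_se[i_min]
    eligible = [lam for lam, mse in zip(lambdas, cv_mse) if mse <= threshold]
    return max(eligible)

def r2_from_cv_mse(cv_mse, response_variance):
    """Cross-validated R^2, assuming R^2 = 1 - cvMSE / Var(y)."""
    return 1.0 - cv_mse / response_variance
```

Choosing the largest eligible λ favours the most strongly penalized model whose error is statistically indistinguishable from the best one.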

Linear decoding can be expressed as

$${\hat{y}}_{t}={w}^{1}{r}_{t}^{1}+{w}^{2}{r}_{t}^{2}+{w}^{3}{r}_{t}^{3}+\ldots +{w}^{n}{r}_{t}^{n}+c={{\bf{r}}}_{t}^{{\rm{\top }}}{\bf{w}}+c$$

where ${\hat{y}}_{t}$ is the decoded behavioural variable at t, w and c are the coefficients obtained from regression, and ${{\bf{r}}}_{t}$ is the vector of normalized population spike rates at t. We did not perform additional cross-validation in decoding because (1) 30% of the decoding for standard sequences (0.5–0.8 s in Fig. 3 and −1.3 to −1 s in Extended Data Fig. 7) was from new data; (2) all decoding in backtracking sequences and during consumption periods was from new data; and (3) the cross-validated choice of λ had already selected the model for its ability to generalize.

The matrix notation form of the equation, ${{\bf{r}}}_{t}^{{\rm{\top }}}{\bf{w}}$, shows that the linear decoding can be geometrically interpreted as projecting the vector of population spike rates r onto the axis in the direction of vector w, and reading out the length of the projection (scaled by ||w||, plus the intercept c). We therefore referred to this axis as the coding axis. To compute the variance explained for each coding axis, we first obtained its unit vector and projected population spike rates onto it. The variance of the projected values is Var(explained). The total variance, Var(total), of the population activity is the sum of variance of all units. Finally, variance explained equals Var(explained) / Var(total) × 100%.
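
The decoding and the variance-explained calculation can be sketched as below, in Python rather than MATLAB. The data layout (a list of time samples, each a list of unit rates) and the function names are hypothetical.

```python
import math

def decode(rates, w, c):
    """Linear decoding: the dot product of each rate vector with w, plus c."""
    return [sum(wi * ri for wi, ri in zip(w, sample)) + c for sample in rates]

def variance(values):
    """Sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def percent_variance_explained(rates, w):
    """Percent of population variance captured along the coding axis w.

    rates: list of time samples, each a list of n normalized unit rates.
    w: the n regression coefficients that define the coding axis.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    unit = [wi / norm for wi in w]
    # Project every population vector onto the unit-length coding axis.
    projected = [sum(u * r for u, r in zip(unit, sample)) for sample in rates]
    # Total variance is the sum of the variances of all units.
    total = sum(variance([sample[i] for sample in rates]) for i in range(len(w)))
    return 100.0 * variance(projected) / total
```

Because the axis is normalized before projection, rescaling w changes the decoded values but not the variance explained.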

Support-vector machine classification

First, to prepare a denoised version of the predictors for more robust classification, we performed PCA with normalized population spike rates, and projected the spike rates onto the first 12 principal components. The projected activity was then downsampled from 400 to 66.7 samples per second (Fig. 3f) or 200 samples per second (Fig. 4f, g) to reduce subsequent computation time. Class labels were the sequence identity values, including standard versus backtracking types (Fig. 3f), or the two types of zigzag sequence (Fig. 4).

Classification was performed independently for each time bin with the MATLAB ‘fitcsvm’ function. Linear kernels were used for all classifications. Trials were weighted so that the chance classification accuracy was 0.5 even if the two classes did not have equal numbers of trials. The results were computed with tenfold cross-validation. All other function parameters were kept as the defaults. The null classification results were obtained using the same procedure but with randomly shuffled class labels.
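
One way to obtain a chance accuracy of 0.5 with unequal class sizes is to give every class the same total weight. The Python sketch below illustrates this scheme; it is an assumption, as the text does not specify the weights that were passed to the classifier, and the function names are hypothetical.

```python
from collections import Counter

def balanced_trial_weights(labels):
    """Per-trial weights such that every class carries the same total weight."""
    counts = Counter(labels)
    n_classes = len(counts)
    return [1.0 / (n_classes * counts[label]) for label in labels]

def weighted_accuracy(true_labels, predicted_labels, weights):
    """Fraction of the total trial weight that was classified correctly."""
    correct = sum(w for t, p, w in zip(true_labels, predicted_labels, weights)
                  if t == p)
    return correct / sum(weights)
```

With such weights, a classifier that always predicts the more frequent class scores 0.5 rather than the majority-class proportion.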

Canonical correlation analysis

The canonical correlation analysis seeks linear transformations of two vectors of random variables such that the Pearson’s correlation coefficients between the transformed vectors are maximized:

$$\mathop{\text{arg}\,\text{max}}\limits_{{{\bf{a}}}_{i},{{\bf{b}}}_{i}}\,{\rm{corr}}({{\bf{U}}}_{i},{{\bf{V}}}_{i}),\,{{\bf{U}}}_{i}={{\bf{a}}}_{i}^{{\rm{\top }}}{\bf{X}},\,{{\bf{V}}}_{i}={{\bf{b}}}_{i}^{{\rm{\top }}}{\bf{Y}},\,i=1,2,\ldots ,n$$

where X and Y are vectors of random variables, ${{\bf{a}}}_{i}$ and ${{\bf{b}}}_{i}$ are transformation vectors for the i-th iteration, and n is the number of dimensions in X or Y, whichever is smaller. Matrices A and B will be used to represent the concatenated transformation vectors across all iterations.

In the present analysis, X and Y were matrices of sampled data for each session. X contained the time series of the decoded behavioural variables (L, L′, θ, I, τ; zero centred). Y contained the projection of neural activity onto the top principal components obtained from PCA. We focused our analysis on standard sequences, with a time window of −0.5 to 0.8 s relative to the middle touch. The linearly decoded or principal component-projected data were averaged across trials with the same sequence direction. Averaged data from the two sequence directions were concatenated along time.

Canonical correlations were computed using the MATLAB ‘canoncorr’ function between matrices with a selected subset of dimensions. In Fig. 3d, Y was transformed using $({{\bf{A}}}^{{\rm{\top }}})^{-1}{{\bf{B}}}^{{\rm{\top }}}{\bf{Y}}$ so that the pattern could be best aligned with the patterns of X. In Fig. 3e, n correlation coefficients (r) quantified the correlation between each pair of U_i and V_i. The average r across the n values reflected the overall alignment between the two transformed matrices.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability

Data are available from the corresponding author upon request.

Code availability

The MATLAB code used to analyse the data is available at GitHub and from the corresponding author upon request.

References

Rosenbaum, D. A. Human Motor Control (Elsevier, 2010).
Mayrhofer, J. M. et al. Distinct contributions of whisker sensory cortex and tongue-jaw motor cortex in a goal-directed sensorimotor transformation. Neuron 103, 1034–1043.e5 (2019).
Chen, T.-W., Li, N., Daie, K. & Svoboda, K. A map of anticipatory activity in mouse motor cortex. Neuron 94, 866–879.e4 (2017).
Economo, M. N. et al. Distinct descending motor cortex pathways and their roles in movement. Nature 563, 79–84 (2018).
Gao, Z. et al. A cortico-cerebellar loop for motor planning. Nature 563, 113–116 (2018).
Guo, Z. V. et al. Flow of cortical activity underlying a tactile decision in mice. Neuron 81, 179–194 (2014).
Inagaki, H. K., Fontolan, L., Romani, S. & Svoboda, K. Discrete attractor dynamics underlies persistent activity in the frontal cortex. Nature 566, 212–217 (2019).
Li, N., Chen, T.-W., Guo, Z. V., Gerfen, C. R. & Svoboda, K. A motor cortex circuit for motor planning and movement. Nature 519, 51–56 (2015).
Li, N., Daie, K., Svoboda, K. & Druckmann, S. Robust neuronal dynamics in premotor cortex during motor planning. Nature 532, 459–464 (2016).
Komiyama, T. et al. Learning-related fine-scale specificity imaged in motor cortex circuits of behaving mice. Nature 464, 1182–1186 (2010).
Kurnikova, A., Moore, J. D., Liao, S.-M., Deschênes, M. & Kleinfeld, D. Coordination of orofacial motor actions into exploratory behavior by rat. Curr. Biol. 27, 688–696 (2017).
McElvain, L. E. et al. Circuits in the rodent brainstem that control whisking in concert with other orofacial motor actions. Neuroscience 368, 152–170 (2018).
Welker, W. I. Analysis of sniffing of the albino rat. Behaviour 22, 223–244 (1964).
Chartier, J., Anumanchipalli, G. K., Johnson, K. & Chang, E. F. Encoding of articulatory kinematic trajectories in human speech sensorimotor cortex. Neuron 98, 1042–1054.e4 (2018).
Svoboda, K. & Li, N. Neural mechanisms of movement planning: motor cortex and beyond. Curr. Opin. Neurobiol. 49, 33–41 (2018).
Ayling, O. G. S., Harrison, T. C., Boyd, J. D., Goroshkov, A. & Murphy, T. H. Automated light-based mapping of motor cortex by photoactivation of channelrhodopsin-2 transgenic mice. Nat. Methods 6, 219–224 (2009).
Guo, J.-Z. et al. Cortex commands the performance of skilled movement. eLife 4, e10774 (2015).
Clemens, A. M., Fernandez Delgado, Y., Mehlman, M. L., Mishra, P. & Brecht, M. Multisensory and motor representations in rat oral somatosensory cortex. Sci. Rep. 8, 13556 (2018).
Proske, U. & Gandevia, S. C. The proprioceptive senses: their roles in signaling body shape, body position and movement, and muscle force. Physiol. Rev. 92, 1651–1697 (2012).
Franklin, D. W. & Wolpert, D. M. Computational mechanisms of sensorimotor control. Neuron 72, 425–442 (2011).
Shadmehr, R., Smith, M. A. & Krakauer, J. W. Error correction, sensory prediction, and adaptation in motor control. Annu. Rev. Neurosci. 33, 89–108 (2010).
Chesler, A. T. et al. The role of PIEZO2 in human mechanosensation. N. Engl. J. Med. 375, 1355–1364 (2016).
Inagaki, H. K., Inagaki, M., Romani, S. & Svoboda, K. Low-dimensional and monotonic preparatory activity in mouse anterior lateral motor cortex. J. Neurosci. 38, 4163–4185 (2018).
Stapleton, J. R. Rapid taste responses in the gustatory cortex during licking. J. Neurosci. 26, 4126–4138 (2006).
Accolla, R., Bathellier, B., Petersen, C. C. H. & Carleton, A. Differential spatial representation of taste modalities in the rat gustatory cortex. J. Neurosci. 27, 1396–1404 (2007).
Jin, X. & Costa, R. M. Start/stop signals emerge in nigrostriatal circuits during sequence learning. Nature 466, 457–462 (2010).
Kriegeskorte, N. & Douglas, P. K. Interpreting encoding and decoding models. Curr. Opin. Neurobiol. 55, 167–179 (2019).
Russo, A. A. et al. Neural trajectories in the supplementary motor area and motor cortex exhibit distinct geometries, compatible with different classes of computation. Neuron 107, 745–758.e6 (2020).
Russo, A. A. et al. Motor cortex embeds muscle-like commands in an untangled population response. Neuron 97, 953–966.e8 (2018).
Evarts, E. V. & Tanji, J. Reflex and intended responses in motor cortex pyramidal tract neurons of monkey. J. Neurophysiol. 39, 1069–1080 (1976).
Heindorf, M., Arber, S. & Keller, G. B. Mouse motor cortex coordinates the behavioral response to unpredicted sensory feedback. Neuron 99, 1040–1054.e5 (2018).
Pruszynski, J. A. et al. Primary motor cortex underlies multi-joint integration for fast feedback control. Nature 478, 387–390 (2011).
Scott, S. H., Cluff, T., Lowrey, C. R. & Takei, T. Feedback control during voluntary motor actions. Curr. Opin. Neurobiol. 33, 85–94 (2015).
Stavisky, S. D., Kao, J. C., Ryu, S. I. & Shenoy, K. V. Motor cortical visuomotor feedback activity is initially isolated from downstream targets in output-null neural state space dimensions. Neuron 95, 195–208.e9 (2017).
Bollu, T. et al. Cortex-dependent corrections as the tongue reaches for and misses targets. Nature 594, 82–87 (2021).
Tanji, J. Sequential organization of multiple movements: involvement of cortical motor areas. Annu. Rev. Neurosci. 24, 631–651 (2001).
Desrochers, T. M., Burk, D. C., Badre, D. & Sheinberg, D. L. The monitoring and control of task sequences in human and non-human primates. Front. Syst. Neurosci. 9, 185 (2016).
Shima, K. & Tanji, J. Neuronal activity in the supplementary and presupplementary motor areas for temporal organization of multiple movements. J. Neurophysiol. 84, 2148–2160 (2000).
Tanji, J. & Shima, K. Role for supplementary motor area cells in planning several movements ahead. Nature 371, 413–416 (1994).
Sohn, J.-W. & Lee, D. Order-dependent modulation of directional signals in the supplementary and presupplementary motor areas. J. Neurosci. 27, 13655–13666 (2007).
Chabrol, F. P., Blot, A. & Mrsic-Flogel, T. D. Cerebellar contribution to preparatory activity in motor neocortex. Neuron 103, 506–519.e4 (2019).
Vong, L. et al. Leptin action on GABAergic neurons prevents obesity and reduces inhibitory tone to POMC neurons. Neuron 71, 142–154 (2011).
Madisen, L. et al. A toolbox of Cre-dependent optogenetic transgenic mice for light-induced activation and silencing. Nat. Neurosci. 15, 793–802 (2012).
Zhao, S. et al. Cell type-specific channelrhodopsin-2 transgenic mice for optogenetic dissection of neural circuitry function. Nat. Methods 8, 745–752 (2011).
Savitt, J. M. Bcl-x is required for proper development of the mouse substantia nigra. J. Neurosci. 25, 6721–6728 (2005).
Taniguchi, H. et al. A resource of Cre driver lines for genetic targeting of GABAergic neurons in cerebral cortex. Neuron 71, 995–1013 (2011).
Zhou, X. et al. Deletion of PIK3C3/Vps34 in sensory neurons causes rapid neurodegeneration by disrupting the endosomal but not the autophagic pathway. Proc. Natl Acad. Sci. USA 107, 9424–9429 (2010).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. Proc. IEEE Conf. Computer Vision and Pattern Recognition 770–778 (2016).
Badrinarayanan, V., Kendall, A. & Cipolla, R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2481–2495 (2017).
Mowery, T. M., Kotak, V. C. & Sanes, D. H. Transient hearing loss within a critical period causes persistent changes to cellular properties in adult auditory cortex. Cereb. Cortex 25, 2083–2094 (2015).
Pachitariu, M., Steinmetz, N. A., Kadir, S. N., Carandini, M. & Harris, K. D. Fast and accurate spike sorting of high-channel count probes with KiloSort. Adv. Neural Inf. Process. Syst. https://papers.nips.cc/paper/2016/file/1145a30ff80745b56fb0cecf65305017-Paper.pdf (2016).
Hill, D. N., Mehta, S. B. & Kleinfeld, D. Quality metrics to accompany spike sorting of extracellular signals. J. Neurosci. 31, 8699–8705 (2011).
Navratilova, Z., Godfrey, K. B. & McNaughton, B. L. Grids from bands, or bands from grids? An examination of the effects of single unit contamination on grid cell firing fields. J. Neurophysiol. 115, 992–1002 (2016).
Saravanan, V., Berman, G. J. & Sober, S. J. Application of the hierarchical bootstrap to multi-level data in neuroscience. Preprint at https://arxiv.org/abs/2007.07797 (2020).
Williams, A. H. & Linderman, S. W. Statistical neuroscience in the single trial limit. Curr. Opin. Neurobiol. 70, 193–205 (2021).
Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67, 301–320 (2005).


Acknowledgements

We thank W. Olson, R. Dasgupta, Y.-T. Chang, V. Chokshi, J. Cohen, M. Economo and K. Svoboda for comments on the manuscript; A. Aly, Y. Hwang and H. Deng for assistance with experiments and data curation; V. Chokshi and W. Olson for sharing animals; B. Bari and J. Cohen for VGAT-Cre;Ai32 mice; and T. Babola for suggestions on the hearing loss experiment. This work was supported by NIH grants R01NS089652 and 1R01NS104834-01 to D.H.O., and NIH center grant P30NS050274.

Author information

Authors and Affiliations

The Solomon H. Snyder Department of Neuroscience, Krieger Mind/Brain Institute, Kavli Neuroscience Discovery Institute, Brain Science Institute, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
Duo Xu, Mingyuan Dong, Linghua Zhang & Daniel H. O’Connor
Undergraduate Studies, Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD, USA
Yuxi Chen, Angel M. Delgado & Natasha C. Hughes

Authors

Duo Xu
Mingyuan Dong
Yuxi Chen
Angel M. Delgado
Natasha C. Hughes
Linghua Zhang
Daniel H. O’Connor

Contributions

D.X., M.D., Y.C., A.M.D., N.C.H. and L.Z. performed the experiments. D.X. developed custom software, hardware, the analysis code and analysed the data, with input from all authors. D.X., M.D., N.C.H. and D.H.O. wrote the paper with input from all authors.

Corresponding author

Correspondence to Daniel H. O’Connor.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review information

Nature thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Behavioral measurements, performance, and control experiments.

a, Confusion matrix showing the performance of the classification network. The numbers represent percentages within each (true) class (n = 1696 frames). b, Performance of the regression network. Top, the gray probability distribution shows how L from five human individuals varied from the mean L across the five. The red distribution shows how predicted L varied from the human mean. Bottom, similar quantification as the top but for θ. n = 573 frames. c, CAD images of the sensor core (left) and the assembly (right) with a lick tube. d, Linear relationship between the applied force and the sensor output voltage. e, Two example trials showing the trajectories of the tongue tip when a mouse sequentially reached the 7 port positions, for both sequence directions. Arrows indicate the direction of time within each trajectory. f, Patterns of kinematics and forces of single licks at each port position (n = 25683 trials from 17 mice; mean ± 95% bootstrap confidence interval). The duration of individual licks was normalized. g, Top, the pattern of angle deviation from midline (|θ|) of single licks pooled from R3 and L3. The vertical line indicates maximum |θ| (|θ|_max). Middle, tongue length (L) expressed as a fraction of its maximum (L_max). The horizontal line indicates, on average, the fraction where |θ|_max occurred. Bottom, time aligned probability distributions showing when touch onset, |θ|_max, L_max, or θ_shoot occurred. Red lines mark quartiles. n = 25683 trials from 17 mice. Lick patterns show mean ± 95% bootstrap confidence interval. h, Top, probability distributions of L_max and θ_touch for licks at each port position. Bottom, probability distributions of the change in Θ_touch (ΔΘ_touch) and instantaneous sequence speed (Methods) for each interval separating port positions. Distributions show mean ± s.d. across n = 17 mice. 
i, Median time to first touch (top) and the average number of missed licks during sequence performance (bottom) in control (Sham) versus hearing loss (Earplug) conditions. Bars show group means and lines show data from individual mice. ∗∗∗ p < 0.001, n.s. p > 0.05, paired one-tailed bootstrap test, n = 5 mice. j, Average number of missed licks before first touch (top) and during sequence performance (bottom) in control (Normal) versus odor masking (Masked) conditions. Same statistical tests as in (i), n = 6 mice. k, Similar to (j) but comparing control (Saline) versus tongue numbing (Lidocaine) conditions. n = 5 mice. l, Learning curves for 15 individual mice (gray) and the mean (black) showing a reduction in sequence initiation time (left) in response to the auditory cue and an increase in sequence speed (right). The three red asterisks correspond to the three examples of sequence performance shown in (n). m, Gradual increase in task difficulty (Methods) accompanying the improved performance shown in (l). n, Depiction of example sequences performed by a mouse in alternating directions across consecutive trials at different stages of learning. Trial onsets are marked by yellow bars. Port positions shown in the black trace are overlaid with touch onsets (dots).

Extended Data Fig. 2 Performance in backtracking sequences.

a, Learning curves for 9 individual mice (gray) and the mean (black) showing the duration of time spent to perform standard (left) and backtracking (right) sequences. b, Similar to (a) but limited to the interval following the middle lick in standard (left) or backtracking (right) sequences. c, L, L’ and Θ patterns for seven consecutive licks aligned at the Mid touch (number 0). Licks in standard sequences (n = 7458 trials) are shown in black, those in backtracking sequences (n = 2695 trials) are in green. Mean ± s.d. d, Probability distributions of instantaneous lick rate for each interval separating consecutive pairs of the seven licks during standard (black) or backtracking (green) sequences (n = 8 mice; mean ± s.d.). e, Top, time to locate the port at its next position during the 4th interval, for standard sequences (black) or for sequences when the port backtracked (green). Bottom, the number of missed licks during the 4th interval. Mean ± 95% bootstrap confidence interval. n = 7458 standard and 2695 backtracking sequences from 47 total sessions.

Extended Data Fig. 3 Closed-loop optogenetic inhibition defines cortical areas involved in sequence control.

a, Left, dorsal view of an example “clear-skull” preparation. Right, table shows the center coordinates used for illumination for each target region. b, Triggering scheme for photoinhibition at sequence initiation, mid-sequence and water consumption. c, Average spike waveform of putative pyramidal cells (black; n = 224) and putative FS neurons (blue; n = 117), normalized to the amplitude of negative peaks. d, Relationship between spike widths (defined as the trough to peak time of average waveform) and changes in mean spike rate under opto illumination (4 mW, within 1 mm) relative to baseline. Pyramidal cells (black; n = 42) and FS neurons (blue; n = 41) were classified by the two thresholds (dashed lines at 0.4 and 0.5 ms) with ambiguous units (gray; n = 6) in the middle. e, Distributions of spike widths from neurons in (d) (filled bars; n = 89) and from all neurons (empty bars; n = 414) including those where illuminations were not at recording sites. Classification thresholds are shown in dashed lines. f, Left, inhibition efficiency of putative pyramidal cells as a function of light power and distance away from the center of illumination (n = 224 units total). Right, similar to left but showing the excitation efficiency of putative FS neurons (n = 117 units total). Mean ± 95% hierarchical bootstrap confidence interval. g, Example trial with S1TJ inhibition triggered at mid-sequence. Instantaneous tongue angle (Θ) and length (L) are shown in lighter traces. Shooting angles (Θ_shoot) and maximum length (L_max) of each lick are marked using stems on top of the instantaneous traces. The blue waveform indicates photostimulation. Traces and markers during photostimulation are colored blue. h, Similar to (g) but inhibiting ALM–M1TJ. i, Similar to (g) but inhibiting S1BF. j, Changes in licking kinematics (rows) when inhibiting each of the five brain regions (columns), quantified across all three inhibition periods (Methods). 
Bar plots show mean ± 99% hierarchical bootstrap confidence interval. Gray lines show the data of individual mice. Two-tailed hierarchical bootstrap test, ∗∗∗ p < 0.001, ∗∗ p < 0.01, ∗ p < 0.05, n.s. p ≥ 0.05, after Bonferroni correction for 15 comparisons. k, Changes in the rate of lick (solid bars) and touch (dashed bars) at each of the inhibition periods (rows) when inhibiting each of the five brain regions (columns). Plot style and statistical tests are the same as in (j) but using Bonferroni correction for 30 comparisons. l, Same convention as in (j) but showing results with half-power (2 mW) inhibition. m, Same convention as in (k) but showing results with half-power (2 mW) inhibition.

Extended Data Fig. 4 Characterization of single-unit responses.

a, Contamination rates and refractory period violation rates of all recorded single- (green) and multi-units (black). The shaded region shows the thresholds for assignment as multi- vs single-unit. b, CDF of contamination rate including single- (green) and multi-units (gray). c, Same as (b) but for refractory period violation rate. d, The number of single- (green) and multi-units (black) recorded in each session, grouped by brain area. e, ISI histograms of randomly selected single-units. Refractory period violation rates (RPV) and contamination rates (Con) are labeled on the top (in percent). f, Responses of three simultaneously recorded S1TJ neurons during right-to-left (top half) or left-to-right (bottom half) licking sequences, aligned at cue onset (left column), middle touch (middle column), and the last consummatory touch (right column). For each sequence direction, the first row shows rasters of lick times (touches in black and misses in gray) from 10 selected trials (Methods). Stacked below are spike rasters and the corresponding PETHs (mean ± s.e.) from the same 10 trials for each example neuron. g, Same as (f) but for three example neurons from M1TJ. h, Same as (f) but for three example neurons from ALM. i, Uncertainty in mean spike rate (normalized to peak) estimated by bootstrap crossvalidation (Methods). Each data point is the bootstrap average value of the root mean squared error (RMSE) for a single neuron. Data (n = 804 neurons) are grouped by brain region and presented in whisker-box plots (centre mark: median, bounds of box: 25th and 75th percentiles, max whisker length: 1.5 times IQR, no max or min limit). j, The probability (mean ± 95% bootstrap confidence interval) of a PETH being consistently grouped into the same cluster across bootstrap iterations for different total numbers of clusters. Maximal consistency was achieved when using thirteen clusters for NNMF (arrow). k, NNMF components that represent each of the thirteen PETH clusters. 
Right-to-left (blue) and left-to-right (red) activities (mean ± 95% bootstrap confidence interval) are overlaid together. The vertical lines are located at time zero in each period. The height of the lines represents the scale of normalized neuronal activity from 0 to 1. l, Histograms of PETH peak times. Plot organization and time alignment are the same as in (f). m, Proportions of neurons from different clusters at different cortical depths. Some clusters with similar types of response were grouped together for better readability. ALM (n = 324), M1TJ (n = 233) and S1TJ (n = 119). n, Quantification of rhythmicity in PETHs. Black traces are mid-sequence PETHs of three example neurons in (f), (g), and (h). Colored traces show the best fit licking rhythms (6.5 Hz sinusoids). Average Pearson’s correlation coefficients (r_avg) of the left-to-right and right-to-left fits are shown beneath neuron IDs. o, Empirical CDFs of r_avg for neurons in S1TJ, M1TJ, and ALM. Circles mark the values of the 9 example neurons in (f), (g), and (h).

Extended Data Fig. 5 Additional analyses for population coding.

a, Relative goodness of fit of tongue angle regressions with a range of shifts in spike times. Black traces and error bars show mean ± 95% bootstrap confidence interval. Lighter traces show individual recordings. (S1TJ, n = 8 sessions; M1TJ, n = 9 sessions; ALM, n = 13 sessions). b, Absolute pairwise Pearson’s correlation coefficients among the five behavioral variables (mean; n = 35 sessions). c, Single-trial decoding of the five behavioral variables (rows; black traces) from 29 simultaneously recorded S1TJ units in a right-to-left (left) and a left-to-right (right) sequence. d, Same as (c) but decoding from 58 simultaneously recorded units in ALM. e, Total percent variance explained (VE) by the first five principal components (left in each region) versus that by the five coding axes (right in each region) during sequence execution. Lighter lines show individual recording sessions and thicker lines show the means. f, Absolute pairwise cosine values among coding axes (mean; n = 35 sessions). g, Cumulative time histograms showing the fraction of trials that could be correctly classified as a standard vs backtracking sequence as time progresses. Two-tailed bootstrap test, ∗∗∗ p ≈ 0, n.s. p = 0.91. h, Same as sequence progress in Fig. 3a, b, but for “zigzag” sequences.

Extended Data Fig. 6 Reward modulation of activity in ALM.

a, Responses of two simultaneously recorded ALM neurons (number 1 and number 2) aligned at the first lick (specifically the middle of a tongue-out period) that touched water reward. For each sequence direction, shown at top are rasters of lick times (touches in black and misses in gray) and the duration of water delivery (blue) from 20 selected trials (Methods). Stacked below are spike rasters and the corresponding PETHs from the same 20 trials for each example neuron. b, The probability of licking (i.e. tongue-out) as a function of time. Licks are sequentially indexed with respect to the first lick (number 0) touching the water. c, Patterns of kinematics and force for single licks around the first lick (number 0) touching water (n = 25289 trials; mean ± 95% bootstrap confidence interval). The duration of individual licks was normalized. The total force (F_total) is the vector sum of vertical and lateral forces. d, Decoding of τ, I and θ (mean ± 99% bootstrap confidence interval) from neuronal populations recorded in ALM (n = 13 sessions), M1TJ (n = 9 sessions), and S1TJ (n = 8 sessions) in right-to-left (blue) or left-to-right (red) trials around the consumption period. e, The difference between the decoded θ traces in right-to-left versus left-to-right trials. Same data source, mean and error presentation as in (d).

Extended Data Fig. 7 Coding of upcoming sequences in ALM.

a, Depiction of sequences performed by a mouse in alternating directions across 14 consecutive trials. Trial onsets are marked by yellow lines. Port positions shown in the black trace are overlaid with touch onsets (dots). b, Probability distributions of Θ_shoot (left) and L_max (middle) for the first 3 licks at the start of a sequence (n = 8 mice; mean ± s.d.). The negative y-axis of Θ_shoot points to the side at which the port is located. The CDF (right; 8 individual mice in gray and the mean in black) of the maximal Θ_shoot explored before touching the port (at the side of negative Θ_shoot). The blue line shows the probability of successfully locating the port without exploring beyond the midline. c, Top, rasters of two example neurons which had persistent and target position (TP) selective firing during the 14 consecutive trials in (a). Bottom, normalized and smoothed (0.25 s s.d. Gaussian kernel) spike rates of the two neurons. d, Decoded instantaneous TP (dark trace) from 58 simultaneously recorded units in ALM, overlaid with normalized port position (light trace). e, Decoding of TP from ALM (mean ± 99% bootstrap confidence interval) before upcoming right-to-left trials (blue) or left-to-right trials (red). Crossvalidated R² is shown (mean ± s.d.; n = 13 sessions). f, Goodness of fit for linear models that predict TP during ITIs, quantified by crossvalidated R². g, Using the same linear models in (e) to decode TP during execution of standard right-to-left (blue) or left-to-right (red) sequences (mean ± 99% bootstrap confidence interval). h, Same as (f) but for τ.
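
Two operations named in this legend, Gaussian smoothing of spike rates with a 0.25 s s.d. kernel (panel c) and cross-validated R² for a linear decoder (panels e, f, h), can be sketched as below. The data are simulated, and the bin width, fold count, ordinary least-squares fit and function names are assumptions made for illustration rather than the settings used in the article.

```python
import numpy as np

def gaussian_smooth(rate, dt, sd=0.25):
    """Smooth a binned rate (bin width dt, seconds) with a Gaussian of s.d. sd."""
    half = int(np.ceil(4 * sd / dt))          # truncate the kernel at 4 s.d.
    t = np.arange(-half, half + 1) * dt
    kernel = np.exp(-0.5 * (t / sd) ** 2)
    kernel /= kernel.sum()                    # unit area preserves the mean rate
    return np.convolve(rate, kernel, mode="same")

def crossval_r2(X, y, n_folds=10, seed=0):
    """R² of held-out predictions from least-squares linear regression."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    y_pred = np.empty(len(y), dtype=float)
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)
        A = np.column_stack([X[train_idx], np.ones(len(train_idx))])
        coef, *_ = np.linalg.lstsq(A, y[train_idx], rcond=None)
        B = np.column_stack([X[test_idx], np.ones(len(test_idx))])
        y_pred[test_idx] = B @ coef           # predict only on held-out samples
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Simulated population activity and a linearly encoded target variable.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=400)
r2 = crossval_r2(X, y)

# A constant rate stays constant away from the edges after smoothing.
rate = np.ones(1000)
smoothed = gaussian_smooth(rate, dt=0.01)
```

Because `mode="same"` pads with zeros, values within about 4 s.d. of either end of the trace are biased toward zero and are best excluded or handled with explicit padding.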

Supplementary information

Reporting Summary

Supplementary Table 1

Mouse information. Excel workbook detailing assignment of individual mice to experiments and figures.

Supplementary Video 1

Example performance in a standard sequence. Top, high-speed video capturing the bottom and side views of the mouse. The tracked base and the tip of the tongue are labeled by red asterisks. Bottom, time-aligned behavioral variables and events similar to Fig. 1d. A moving bar indicates the position of the current frame. The playback was slowed down 5-fold.

About this article

Cite this article

Xu, D., Dong, M., Chen, Y. et al. Cortical processing of flexible and context-dependent sensorimotor sequences. Nature 603, 464–469 (2022). https://doi.org/10.1038/s41586-022-04478-7

Received: 10 January 2020
Accepted: 26 January 2022
Published: 09 March 2022
Issue Date: 17 March 2022
DOI: https://doi.org/10.1038/s41586-022-04478-7
Springer Nature Limited
