Introduction

The interactions between cortical neuronal spiking and extracellular local-field potentials [LFPs, slow electrophysiological signals that represent the global activity of a certain neuronal ensemble (Buzsaki et al. 2012)] contribute considerably to the processing of sensory stimuli. A plethora of empirical evidence supports the notion that low- and high-frequency LFPs mediate a series of mechanisms underlying neural computations. For example, delta (1–4 Hz) and theta (4–8 Hz)-band oscillations modulate neuronal excitability (Lakatos et al. 2005) and subserve attentional and predictive processes, sensory selectivity, and multisensory integration (Lakatos et al. 2005, 2007; Kayser et al. 2008; Schroeder et al. 2008; Schroeder and Lakatos 2009; Luo et al. 2010; Arnal and Giraud 2012; Lakatos et al. 2013; van Atteveldt et al. 2014; Levy et al. 2017). In addition, spike synchronization to oscillatory activity in the gamma (> 30 Hz) range plays an important role in neuronal computations, interareal communication, and selective attention, all well-documented phenomena in the visual system (Konig et al. 1995; Womelsdorf and Fries 2007; Womelsdorf et al. 2007; Fries 2009, 2015). In the auditory cortex (AC), the phase of low-frequency oscillatory activity affects performance in acoustic detection tasks [(Stefanics et al. 2010; Henry and Obleser 2012; Ng et al. 2012), but see also Zoefel and Heil (2013)] and may provide a temporal basis for the accurate neuronal representation of temporally well-structured auditory stimuli (Nourski and Brugge 2011; Arnal and Giraud 2012; Giraud and Poeppel 2012; Teng et al. 2017; García-Rosales et al. 2018b).

The laminar and columnar organization of the neocortex is one of the major features of the mammalian brain (Mountcastle 1997). In the visual, somatosensory, and auditory modalities, middle layers (typically bottom layer III and layer IV) in a cortical column receive strong thalamic inputs, whereas superficial and deep layers are mostly involved in local corticocortical or distant interareal/corticofugal connections (Linden and Schreiner 2003). In the visual cortex, the functional implications of such laminar specificities in the columnar microcircuit are in agreement with models of top-down and bottom-up neuronal processing, which in turn correlate with differences in neuronal coherence and patterns of causality across cortical depths (Buffalo et al. 2011; Hansen and Dragoi 2011; van Kerkoerle et al. 2014). Similarly, studies in the auditory cortex have shown distinct depth-specific dynamics indicating a differential involvement of supragranular, granular and infragranular layers in auditory attentional processes (O’Connell et al. 2014; De Martino et al. 2015; Francis et al. 2018) and multisensory interactions (Lakatos et al. 2007).

Although low-frequency LFPs and their phase synchronization with cortical spiking could be important for the neuronal representation of complex, temporally well-organized sounds (Kayser et al. 2009; Panzeri et al. 2010; Giraud and Poeppel 2012; García-Rosales et al. 2018a, b), the laminar patterns of spike–LFP coherence in the auditory modality remain largely unexplored. This is especially true for the processing of naturalistic acoustic sequences. Here, we aim to bridge this gap by quantifying laminar coherence patterns in the auditory cortex of awake Seba’s short-tailed bats (Carollia perspicillata), while animals listened to conspecific distress vocalizations which comprise discernible temporal scales [i.e., syllables and multisyllabic bouts; (Hechavarria et al. 2016a)]. In previous research, we have shown that cortical spiking activity synchronizes to low-frequency LFPs, which is a fingerprint of the neuronal ability to represent slow acoustic rhythms (García-Rosales et al. 2018b), and that high-frequency LFPs track the fast-varying temporal structure of artificial amplitude-modulated sounds and natural conspecific vocalizations (Hechavarria et al. 2016a; García-Rosales et al. 2018a, b). However, these data described mainly responses at cortical depths of ~ 300 to 500 μm, corresponding specifically to input layers of C. perspicillata’s AC. Thus, the question remains as to whether such auditory cortical synchronization patterns show a laminar dependence that would correlate with proposed anatomical models of the columnar microcircuit. This question is directly addressed here by systematically quantifying, in a depth-specific manner, spike–LFP and LFP–stimulus phase synchrony in the AC. Consistent with previous results, we observed that spiking activity in the AC synchronizes to ongoing LFPs in low frequencies, and that LFPs follow the slow and fast temporal structure of natural sound sequences. Critically, the strength of such spike–LFP or LFP–stimulus synchrony depends on cortical depth, pointing towards a spatial specificity of coherence in auditory cortical columns.

Results

Responses to conspecific distress vocalizations in auditory cortical columns

Electrophysiological data were recorded from the auditory cortex of seven awake bats (Carollia perspicillata) using linear laminar electrodes (one shank per penetration) spanning cortical depths from 0 to 750 μm measured orthogonally from the cortical surface, and reaching all six cortical layers (Fig. 1a). In total, we obtained responses from 80 independent penetrations to two natural vocalization sequences (seq1 and seq2, Fig. 1b). Both sequences have been used in previous studies addressing the representation of distress vocalizations in the AC of C. perspicillata (Hechavarria et al. 2016b; García-Rosales et al. 2018a), and constitute typical examples of distress calls from this bat species (Hechavarria et al. 2016a). The calls are composed of individual syllables emitted at fast rates (> 30 Hz), grouped into multisyllabic bouts emitted at rates lower than 10–15 Hz (see Fig. 1b). Notably, seq1 has a marked bout periodicity at ~ 4 Hz (8 bouts in approximately 1.96 s) which is also present, albeit more weakly, in seq2. Figure 1c illustrates the autocorrelograms of the amplitude envelope of each sequence (left, seq1; right, seq2). The oscillatory structure of seq1’s autocorrelation reflects clearly its ~ 4 Hz bout repetition rate.

Fig. 1

Neuronal responses to conspecific vocalizations in a representative auditory cortical column. a Nissl-stained histological section in the AC. A silicon probe’s trace (region marked on the left image, zoomed into on the right) spans all layers of a cortical column, and a lesion in the tissue is clearly visible at middle-to-deep locations. The overlaid probe diagram depicts electrode depths, in scale, along the column. Channel separation was of 50 μm. b Spectrotemporal representation of the natural distress sequences used as stimuli. c Envelope autocorrelogram of the vocalizations shown in b (left, seq1; right, seq2). Note the clear periodic temporal structure of the first sequence, with a principal rhythmicity of ~ 4 Hz. d Example responses along a cortical column to seq1, for all recording electrodes. Peri-stimulus time histograms (PSTH, 1 ms bins) are shown on the left subpanel, whereas average LFP responses (n = 50 trials) are shown on the right. Note the oscillogram of the sequence used as stimulus at the top of the panel. The stimulation window is delimited with vertical red dashed lines. e Same as in d, but in response to seq2. The oscillogram of the sequence is shown at the top. f Normalized mean population MUA per channel (shown as a heatmap), in response to both calls (seq1, top; seq2, bottom), across all penetrations. Overlaid white traces show grand-average LFPs, also from all penetrations, at illustrative depths (150, 350, 550, and 750 μm). Marker lines accompanying depth axes depict the extent of all six cortical layers (see a)

Multi-unit activity (MUA) recordings from auditory cortical columns indicate that the temporal structure of the vocalizations was well represented. Figure 1d, e (left subpanels) shows peri-stimulus time histograms (PSTH; 1 ms bin size) obtained from MUA responses in a representative column to the calls analyzed (Fig. 1d, responses to seq1; Fig. 1e, responses to seq2; see Supplementary Fig. 1 for the frequency tuning characteristics across penetrations). At a population level, the MUA typically followed the calls’ slow temporal envelope (i.e., the bouts: 47.66%, 610/1280 responses across 16 electrodes and 80 columns; see “Methods”), a smaller subset of MUA responses phase-locked to the calls’ fast envelope (i.e., the syllables: 10.47%, 134/1280), while another subgroup tracked neither the fast nor the slow temporal structure of the vocalizations (41.87%, 536/1280). The distribution of bout-, syllable- and non-tracking responses was depth dependent: spiking activity phase-synchronized to either bouts or syllables was more common in layers III to VI (syllable-tracking responses were often found in layers IV–V), whereas non-tracking responses most frequently occurred in layers I and II (see Supplementary Fig. 2b). This was mirrored by the MUA’s synchronization strength to the sequences’ temporal structure, indicating that the locking to the calls’ temporal envelopes was strongest in layers III–VI of the AC (Supplementary Fig. 2c, d). That, for example, units in layers III and IV tracked the slow bout structure of the sequences well is also visible in the representative PSTHs of Fig. 1d, e, and in the population average MUA depicted in Fig. 1f.

Average LFP responses to both acoustic sequences are shown in Fig. 1d, e (right), for all recording depths. The LFP traces exhibited phase consistencies with the slow and fast temporal envelopes of the calls, although more prominently so at depths of 0–100 and 250–750 μm (mostly layers I–II and IV–VI, respectively; see also the overlaid grand-average LFPs at representative depths in Fig. 1f, white traces). The MUA responses and LFP traces depicted in Fig. 1 also suggested that the spiking activity synchronized with cortical oscillations more strongly at middle depths of the cortex.

Laminar and condition dependence of spike-field coherence in the AC

Synchrony between spikes and LFPs was calculated using the spike-field coherence (SFC) metric (Rutishauser et al. 2010). In brief, the SFC is a frequency-dependent, normalized coherence index that quantifies the phase-locking between spikes and ongoing field potentials. Because the metric is biased by the number of spikes used to calculate it, we included only columns for which at least 150 spikes were detected in all electrodes, in response to each call and during spontaneous activity (79/80 columns). The SFC values in both conditions (spontaneous and sound-driven activity) were compared to assess the laminar specificity of spike–LFP synchrony in the auditory cortex. For the quantification of laminar coherence patterns, all penetrations were pooled regardless of the temporal response properties of individual electrodes within them, so that paired tests could be performed when statistically comparing across channels (see, for example, comparisons in Fig. 3d; note that the uneven distribution of BT, NT or ST responses across depths would have prevented such an approach).
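As an illustration of what such an index measures, the following minimal sketch computes a spike-triggered-average-based coherence: the power spectrum of the spike-triggered LFP average divided by the mean power spectrum of the individual spike-centered LFP segments, expressed as a percentage. It is not the implementation used in this study. The function name, the window length, and the use of a single Hann taper (rather than a multitaper estimate) are assumptions made for the example.

```python
import numpy as np

def spike_field_coherence(lfp, spike_idx, half_win, fs):
    """Illustrative SFC: 100 * power(STA) / mean power of spike-centered segments.

    lfp       : 1-D LFP trace
    spike_idx : sample indices of spikes
    half_win  : half window length in samples (hypothetical choice)
    fs        : sampling rate in Hz
    """
    lfp = np.asarray(lfp, dtype=float)
    # Keep only spikes whose window fits entirely inside the recording
    segs = np.array([lfp[i - half_win:i + half_win] for i in spike_idx
                     if i - half_win >= 0 and i + half_win <= lfp.size])
    taper = np.hanning(segs.shape[1])  # single taper (assumption)

    def power(x):
        return np.abs(np.fft.rfft(x * taper, axis=-1)) ** 2

    sta_power = power(segs.mean(axis=0))   # spectrum of the spike-triggered average
    seg_power = power(segs).mean(axis=0)   # average of the per-segment spectra
    freqs = np.fft.rfftfreq(segs.shape[1], d=1.0 / fs)
    return freqs, 100.0 * sta_power / seg_power

# Toy data: a 5 Hz "LFP" with spikes either at every peak or at random times
fs = 1000
t = np.arange(10 * fs) / fs
lfp = np.sin(2 * np.pi * 5 * t)
locked = np.arange(250, 9700, 200)                      # samples at the 5 Hz peaks
random = np.random.default_rng(0).integers(200, 9800, 200)
freqs, sfc_locked = spike_field_coherence(lfp, locked, 200, fs)
_, sfc_random = spike_field_coherence(lfp, random, 200, fs)
```

With few spikes, even unlocked spiking yields a non-zero value because the spike-triggered average does not fully cancel, which is one way to see why a minimum spike count is a sensible inclusion criterion.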

Layer-specific SFC was evident during spontaneous activity and acoustic processing (Fig. 2). Independently of the condition considered (with or without auditory stimulation), electrodes located in putative layers IV–VI typically had the strongest coherence values (Fig. 2a, b; spontaneous activity, left; seq1, middle; seq2, right). In line with previous results in this bat species (García-Rosales et al. 2018a), spike–LFP synchronization was prominent in low frequencies of the spectrum, encompassing the delta-band (1–4 Hz) during spontaneous activity, and the theta-band (4–8 Hz) during acoustic stimulation, particularly in response to seq1. Besides the apparent frequency shift of the SFC from delta to theta-bands in response to the first sequence, an interesting effect of acoustic stimulation was the laminar reorganization in relative strength of spike–LFP locking. That is, while the highest SFC values during spontaneous activity occurred at depths > ~ 450 μm (corresponding mostly to layers V and VI), in response to either sequence the highest coherence estimates were found at depths of ~ 350 to 600 μm (spanning mostly layers IV and V; Fig. 2a, b) instead. To further visualize these patterns, we calculated the average gradient of the difference between the SFC during auditory processing and spontaneous activity across columns (Fig. 2c). The gradient indicates the overall tendency of SFC strength as it shifts in depth and frequency from spontaneous to sound-driven activity. Gradient vectors (black arrows in Fig. 2c) located in the region spanning depths ≥ 500 μm and SFC frequencies < 4 Hz (i.e., the lower left quadrant of the heatmaps) suggested a change in the depth-dependent SFC strength from lower to middle layers, and an SFC frequency shift (in response to seq1 but not as clearly in response to seq2) from delta to theta-bands of the spectrum (the latter corresponding to seq1’s periodicity; Fig. 2d). Note that the directionality of the vectors supports the observed shifts in depth and frequency (in response to seq1) of the SFC (Rayleigh tests, significant circular directionality when p < 0.05; p = 8.03 × 10⁻⁷ for seq1 and p = 4.66 × 10⁻⁴ for seq2).
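To make the directionality test concrete, the sketch below applies a Rayleigh test for non-uniformity to a set of vector angles. It uses a standard closed-form approximation of the p value; the function name and the example angles are hypothetical, and how angles were extracted from the gradient field in this study is not specified in this section.

```python
import numpy as np

def rayleigh_test(angles):
    """Rayleigh test for circular non-uniformity.

    Returns the mean resultant length r, the statistic z = n * r**2,
    and an approximate p value (a standard closed-form approximation).
    """
    angles = np.asarray(angles, dtype=float)
    n = angles.size
    r = np.abs(np.mean(np.exp(1j * angles)))   # mean resultant length in [0, 1]
    big_r = n * r                              # resultant length
    z = big_r ** 2 / n
    p = np.exp(np.sqrt(1 + 4 * n + 4 * (n ** 2 - big_r ** 2)) - (1 + 2 * n))
    return r, z, p

# Hypothetical example: angles clustered around pi/2 versus evenly spread angles
clustered = np.random.default_rng(1).normal(np.pi / 2, 0.3, 50)
spread = np.linspace(0, 2 * np.pi, 60, endpoint=False)
r_c, z_c, p_c = rayleigh_test(clustered)
r_s, z_s, p_s = rayleigh_test(spread)
```

A small p value indicates that the angles share a preferred direction, whereas evenly spread angles give a mean resultant length near zero and a p value near one.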

Fig. 2

Laminar patterns of spike–field coherence in the AC. a Spike–field coherence (SFC) at exemplary depths (50, 300, 450 and 700 μm; n = 79 columns, data shown as mean ± SEM), during spontaneous activity (left) and sound processing (middle, responses to seq1; right, responses to seq2). b Population-level laminar patterns of SFC, during spontaneous activity (left) and sound processing (middle, responses to seq1; right, responses to seq2). c Normalized mean difference, for low frequencies (< 40 Hz LFPs) and across cortical depths, between SFC in response to natural vocalizations (seq1, left; seq2, right) and SFC during spontaneous activity. Red colors in the heatmaps indicate SFC during acoustic processing > SFC during spontaneous activity. Gradient vectors are indicated as arrows. d Gradient vectors corresponding to the region in c with depth ≥ 500 μm and LFP frequencies < 4 Hz (delta-band; i.e., lower left quadrant of heatmaps in c). The vectors’ directionalities (Rayleigh tests; p = 8.03 × 10⁻⁷ in response to seq1, p = 4.66 × 10⁻⁴ in response to seq2) demonstrate the shift in depth and frequency of the SFC. Phases of π/2 (3/2π) imply shifts towards lower (higher) depths, while phases of π (0) imply shifts towards lower (higher) frequencies. This is also indicated in the figure. Marker lines accompanying depth axes depict the extent of all six cortical layers (see also Fig. 1a)

Because of the above, we sought to quantify and statistically address the laminar dynamics of low-frequency SFC at the population level. We focused on the theta-band because we had previously shown a significant increase of theta-SFC in the bat AC during the processing of natural sequences (García-Rosales et al. 2018a), and we thus aimed to unravel laminar dependencies of such theta spike–LFP synchronization. Moreover, the data depicted in Fig. 2 suggested that the SFC shift in depth also occurred in this frequency range. Regarding the theta-SFC in response to seq1, there was a significant coherence increase (from spontaneous to stimulus-related activity; FDR-corrected Wilcoxon signed rank tests, pcorr ≤ 0.0419) in most electrodes with depths up to 350 μm (layer IV), whereas in response to seq2, the most consistent changes occurred at depths of 200–300 μm (mostly layers III and IV; pcorr ≤ 0.0485; see Fig. 3a, b). The increase was on average higher than 40% at the latter depths in response to both communication calls (Fig. 3b), whereas the effect size of such differences, measured using Cliff’s delta, was medium to small at depths ranging from ~ 150 to 400 μm (layers III and IV), and typically negligible in layers V–VI (i.e., depths > 500 μm; Fig. 3c). The boundaries for negligible (absolute values < 0.147), small (absolute values < 0.33), medium (absolute values < 0.474) and large (absolute values ≥ 0.474) effects were set after Romano et al. (2006).
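For readers unfamiliar with Cliff’s delta, the following minimal sketch computes it for two samples and labels its magnitude with the boundaries given above. The function names and the example values are hypothetical, and the sketch treats the two conditions as independent samples, which is an assumption for illustration rather than a statement about how the paired data were handled in this study.

```python
import numpy as np

def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y), estimated over all pairs (x_i, y_j)."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[None, :]
    return (np.sum(x > y) - np.sum(x < y)) / (x.size * y.size)

def effect_magnitude(delta):
    """Label |delta| using the boundaries quoted in the text (after Romano et al. 2006)."""
    a = abs(delta)
    if a < 0.147:
        return "negligible"
    if a < 0.33:
        return "small"
    if a < 0.474:
        return "medium"
    return "large"

# Hypothetical theta-SFC values for one depth (not data from this study)
sfc_spont = [1.0, 2.0, 3.0]
sfc_sound = [4.0, 5.0, 6.0]
delta = cliffs_delta(sfc_sound, sfc_spont)
label = effect_magnitude(delta)
```

Because the statistic only counts how often values in one condition exceed those in the other, it is insensitive to the scale of the SFC and ranges from −1 to 1.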

Fig. 3

SFC in spontaneous activity and during auditory processing. a Theta-band (4–8 Hz) SFC across recording depths during spontaneous activity and the processing of both communication sequences used as stimuli (left, responses to seq1; right, responses to seq2). Data depicted in red or green correspond to population-level theta SFC for either seq1 (left) or seq2 (right), respectively; data depicted in black correspond to theta-SFC during spontaneous activity (data shown as mean ± SEM). b Theta-SFC change (in percentage) from spontaneous activity to sound processing (i.e., [SFCseq − SFCspont]/SFCspont × 100). Stars indicate changes that were significantly different from zero (FDR-corrected Wilcoxon signed rank tests; *pcorr < 0.05, **pcorr < 0.01; red stars depict statistics related to seq1, whereas green stars depict statistics related to seq2). c Effect size (Cliff’s δ) of the differences between theta-band SFC during sound processing and spontaneous activity (seq1, red; seq2, green). Boundaries for magnitude assessment were negligible, absolute values < 0.147; small, absolute values < 0.33; medium, absolute values < 0.474; and large, absolute values ≥ 0.474. d Significance matrices of pairwise comparisons, between all channel pairs, of theta-band SFC during spontaneous activity (left), or sound stimulation (middle, responses to seq1; right, responses to seq2). Each cell (i, j) in a matrix represents the FDR-corrected p value of a Wilcoxon signed rank test comparing theta-band SFC between depths i and j. Comparisons inside the red contour lines were statistically significant (pcorr ≤ 0.0252 during spontaneous activity; pcorr ≤ 0.0416 in response to seq1; and pcorr ≤ 0.0337 in response to seq2). e Collapsed (summed across cortical depths) population-level SFC during spontaneous activity (black), and in response to seq1 and seq2 (red and green, respectively). Data are shown as mean ± SEM. f Peak frequency (up to 20 Hz) of the SFC in response to seq1 (red), seq2 (green), or during spontaneous activity (black; shown for clarity on a separate subpanel). Note the median of each distribution marked as a triangle at the top of the histograms. There were significant differences between the SFC peak frequency in response to seq1 and during spontaneous activity (FDR-corrected sign tests, pcorr = 0.0074), but not between peak frequencies in response to seq2 and spontaneous activity (pcorr = 0.22). Marker lines accompanying depth axes depict the extent of all six cortical layers (see also Fig. 1a)

Considering the laminar dependence of the low-frequency SFC estimates during spontaneous activity (shown in black in Fig. 3a), pairwise comparisons across electrodes confirmed that theta-band spike–LFP synchrony was strongest in layers V–VI (Fig. 3d; FDR-corrected Wilcoxon signed rank tests; pcorr ≤ 0.0252). The significance matrices in Fig. 3d summarize all pairwise comparisons (left, spontaneous activity; middle and right, responses to seq1 and seq2, respectively; see below). A cell (i, j) in any of the matrices represents the log-scaled p value of statistically comparing the theta-SFC from electrodes at depths i and j (note that layers are indicated in the depth axes). The regions within red contour lines correspond to statistically significant differences in the SFC. These comparisons corroborated depth variations of theta-band SFC from spontaneous activity to auditory processing: in response to either call, the highest SFC values were no longer confined to depths > ~ 450 μm. Instead, strongest coherence occurred at depths of 350–550 μm (i.e., layers IV–V), and the trends depicted in the significance matrices of Fig. 3d (middle and right) corresponded with the inverted ‘V’ (passband) shape of the stimulus-related theta-SFC shown in Fig. 3a. Collectively, our results indicate that the relative laminar strength of theta-band SFC shifted during auditory processing, with the effect of rendering highest coherence values in putative input layers of the AC (see “Discussion”).

Together with the depth-dependent SFC change from spontaneous to stimulus-related activity, the data shown in Fig. 2 also suggested a frequency shift from delta to theta-bands, particularly in response to seq1. We explored this by collapsing laminar coherence estimates in the depth dimension (i.e., calculating the cumulative SFC across electrodes for each column), during spontaneous or sound-driven activity (Fig. 3e). Whereas during spontaneous activity, or when the animals listened to seq2, the peak SFC typically occurred in the delta-band (black and green traces in Fig. 3e, respectively), in response to seq1 the SFC peaked in the theta-band across columns (red trace in Fig. 3e), consistent with the sequence’s slow temporal periodicity. The distribution of peak SFC frequency for all cases analyzed is shown in Fig. 3f (left, responses to seq1 and seq2; right, spontaneous activity). Even though the frequency shifts could be masked after collapsing in the depth dimension due to depth-dependent differences in the SFC (see above), 46.84% (37/79) of the columns had peak coherence in the theta-range (4–8 Hz) in response to seq1, whereas this was true for only 37.97% (30/79) and 31.65% (25/79) of the columns in response to seq2 and during spontaneous activity, respectively. Consequently, we observed a significant change in the peak SFC frequency in response to seq1 compared to spontaneous activity (FDR-corrected sign test, pcorr = 0.0074), but not for columnar responses to seq2 (pcorr = 0.22). The medians of these distributions (Fig. 3f) are shown as triangles on top of the histograms.
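The collapsing step can be sketched as follows for a single column: the SFC is summed across electrodes, and the frequency of the maximum is read out below an upper limit of 20 Hz (the limit used for the peak-frequency histograms). The array layout, the function names, and the convention that exactly 4 Hz counts as theta are assumptions made for this example.

```python
import numpy as np

def peak_sfc_frequency(sfc, freqs, f_max=20.0):
    """Sum the SFC over depth and return the frequency of the peak up to f_max.

    sfc   : array of shape (n_depths, n_freqs) for one column (assumed layout)
    freqs : array of shape (n_freqs,), in Hz
    """
    freqs = np.asarray(freqs, dtype=float)
    collapsed = np.asarray(sfc, dtype=float).sum(axis=0)  # cumulative SFC across electrodes
    keep = freqs <= f_max
    return freqs[keep][np.argmax(collapsed[keep])]

def band_of(freq):
    """Band label; assigning exactly 4 Hz to theta is an arbitrary choice here."""
    if 1 <= freq < 4:
        return "delta"
    if 4 <= freq <= 8:
        return "theta"
    return "other"

# Hypothetical column with two electrodes and a 1 Hz frequency grid
freqs = np.arange(1, 41)
sfc = np.zeros((2, freqs.size))
sfc[0, 4] = 1.0    # 5 Hz
sfc[1, 4] = 2.0    # 5 Hz
sfc[1, 29] = 10.0  # 30 Hz, ignored because it lies above f_max
peak = peak_sfc_frequency(sfc, freqs)
band = band_of(peak)
```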

Transient, spontaneous high-excitability events are synchronized to delta-band LFPs

The relatively strong delta-band SFC in layers V–VI of the AC without the presence of acoustic stimuli resonates with previous studies suggesting that transient events of abundant spiking (termed “UP-states” in this study) occur spontaneously in cortex, and originate mostly in deep laminae [see Sakata and Harris (2009), Beltramo et al. (2013) and “Discussion”]. These studies, together with our own coherence results, may indicate that such high-excitability events could be phase-synchronized to cortical oscillations, mostly in low-frequency bands (delta), during spontaneous activity. We tested this hypothesis by correlating the onset of UP-states in our data with the phase of delta LFPs (Fig. 4). Briefly, UP-states were defined as an increase of spiking activity that lasted for at least 50 ms, preceded by at least 100 ms of “silence” (Fig. 4a; see “Methods” for a detailed description of UP-state detection). UP-state onsets were then related to the phase of ongoing delta LFPs, and their coupling strength computed by means of circular statistics (mean vector, R). The distribution of UP-state phases relative to delta oscillations in a representative column is shown in Fig. 4b at all recording depths. In line with previous reports (Sakata and Harris 2009), our data showed that spiking activity during UP-states was typically strongest and earliest in layers V–VI of the AC (Fig. 4c, d). In fact, the spike latency (quantified as the time point when the spike count reached 10% of the maximum across depths) was robustly anticorrelated with electrode depth (Fig. 4e; latency = 21.98 + (− 0.019) × depth (μm); R = 0.82, p = 1.16 × 10⁻⁴), suggesting that these high-excitability events originated and were strongest in deep layers, as observed by Sakata and Harris (2009).
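As a worked example of the reported regression, the short sketch below evaluates the fitted line at the shallowest and deepest recording depths. The function name is hypothetical, and the latency unit is assumed to be milliseconds, since the unit is not stated alongside the equation.

```python
def predicted_latency(depth_um):
    """Latency predicted by the reported fit: 21.98 + (-0.019) * depth (um).

    The output unit is assumed to be milliseconds.
    """
    return 21.98 + (-0.019) * depth_um

shallow = predicted_latency(0)     # at the cortical surface
deep = predicted_latency(750)      # at the deepest electrode
difference = shallow - deep        # 0.019 * 750 = 14.25
```

Under this fit, spiking at the deepest electrode (about 7.73) leads spiking at the surface (21.98) by roughly 14.25 units of latency.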

Fig. 4

Spontaneous UP-states in the auditory cortex of C. perspicillata are phase-synchronized with delta-band LFPs. a Representative UP-state (transient event of increased spiking) during spontaneous activity. The moment in which the UP-state is detected (see “Methods”) is marked as time 0. Gray traces correspond to the raw LFP; red, to delta-band (1–4 Hz)-filtered LFPs; and blue triangles indicate the occurrence of a spike. b Distribution of UP-state event onsets relative to the phase of delta-band LFPs in a representative penetration (same column from which the UP-state in a was taken; n = 40 event onsets), across cortical depths. c Depth-dependent population spiking activity (normalized) around the onset of an UP-state (time 0; n = 1960 total detected UP-states). Note in such events how the spiking is on average earliest and strongest in layers V–VI. d Cumulative spike count (normalized) around the time of an UP-state (onset at time 0) at exemplary depths. e The latency of the time point when spiking reached 10% of the maximum, across depths, relative to the onset of the UP-state. Latency and depth were negatively correlated (latency = 21.98 + (− 0.019) × depth (μm); R = 0.82, p = 1.16 × 10⁻⁴). f Locking strength (mean resultant vector, R; shown as mean ± SEM, n = 49 penetrations) between UP-state onsets and the phase of delta LFPs across cortical depths. UP-states were detected using spikes from all depths, whereas the LFP phases were channel specific. g Significance matrix resulting from pairwise comparisons of R values across electrodes (FDR-corrected Wilcoxon signed rank tests; significance when p ≤ 0.0146). Conventions as in Fig. 3d. h Similar to f, but UP-states were either detected using only spikes from layers I–III (white bars), or spikes from layers V–VI (gray bars). The phase of the LFP was depth-specific. UP-state onsets detected using spikes from deep laminae (layers V–VI) were always significantly better synchronized to delta LFPs than UP-state onsets detected using spikes from only layers I–III (FDR-corrected Wilcoxon signed rank tests; p ≤ 1.0621 × 10⁻⁵). Marker lines accompanying depth axes depict the extent of all six cortical layers (see also Fig. 1a)

UP-state onsets showed relatively high synchrony with delta oscillations in layers I and II, as well as bottom layer IV to layer VI (Fig. 4f), a trend confirmed by statistically comparing R values across channels (Fig. 4g; FDR-corrected Wilcoxon signed rank tests; p ≤ 0.0146). The appearance of synchronization between onsets and delta LFPs in top layers was surprising but, because UP-states were detected considering spiking along the entire column, could be an effect of oscillatory activity in superficial layers being well phase-locked to the strong spiking activity of deep laminae, characteristic of UP-state events (see Fig. 4c). To explore whether high R values in superficial layers could be truly attributed to superficial spiking interacting with local delta-band LFPs, we detected UP-states only considering the spiking activity of layers I–III, and related it to the phase of local (i.e., recorded in individual, corresponding electrodes) delta oscillations. The resulting synchronization strength was compared to the synchrony of UP-state onsets and delta LFPs, but when the former were detected using only layer V–VI spiking. Figure 4h shows that UP-state onsets detected using spikes from layers V–VI were always significantly better synchronized to delta LFPs than UP-state onsets detected using spikes from only layers I–III (FDR-corrected Wilcoxon signed rank tests; p ≤ 1.0621 × 10⁻⁵). The above also corroborates our observations regarding spike-field coherence (Figs. 2, 3), and suggests that spontaneous UP-state activity onset is phase coherent with low-frequency oscillations in the AC, with the strongest effect in deep laminae.
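The locking strength R used here can be illustrated with a minimal sketch: the instantaneous phase of a band-limited LFP is read out at each event onset, and R is the length of the mean of the corresponding unit vectors. The sketch assumes the LFP has already been filtered to the delta-band and obtains the phase from an FFT-based analytic signal; the function names and the toy signal are hypothetical and do not reproduce the detection or filtering described in “Methods”.

```python
import numpy as np

def analytic_phase(x):
    """Instantaneous phase from an FFT-based analytic signal (Hilbert transform)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spec = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[1:n // 2] = 2.0     # double the positive frequencies
        gain[n // 2] = 1.0       # keep the Nyquist bin
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.angle(np.fft.ifft(spec * gain))

def locking_strength(phases):
    """Mean resultant vector length R, between 0 (no locking) and 1 (perfect locking)."""
    return np.abs(np.mean(np.exp(1j * np.asarray(phases, dtype=float))))

# Toy example: a 2 Hz oscillation standing in for a delta-filtered LFP
fs = 1000
t = np.arange(10 * fs) / fs
delta_lfp = np.cos(2 * np.pi * 2 * t)
phase = analytic_phase(delta_lfp)

locked_onsets = np.arange(500, 10000, 500)   # one onset at every oscillation peak
random_onsets = np.random.default_rng(0).integers(0, t.size, locked_onsets.size)
r_locked = locking_strength(phase[locked_onsets])
r_random = locking_strength(phase[random_onsets])
```

Onsets that always fall at the same phase give R close to 1, whereas onsets scattered across the cycle give a much smaller value.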

LFP–stimulus synchrony strength is layer specific

The synchronization between cortical oscillations and the temporal structure of the vocalizations was quantified with the stimulus–field coherence metric [StimFC; see García-Rosales et al. (2018a, b)]. Similar to the SFC, the StimFC is a frequency-dependent, normalized index that measures the phase consistency of the LFPs in relation to the stimulus (see “Methods”). Stimulus–field coherence values from each channel and penetration were z-normalized to a surrogate distribution in which the phase relationships present in oscillatory activity were disrupted. This z-normalized StimFC (zStimFC) was used for the evaluation of laminar variability in oscillatory coherence for responses to either seq1 or seq2. These analyses were performed in the same 79 columns used to quantify the SFC.
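The z-normalization step can be sketched as follows: the observed coherence at each frequency is expressed in units of standard deviations above the mean of the surrogate coherences. The function name, the array layout, and the use of the sample standard deviation are assumptions; how the surrogates were generated is described in “Methods” and is not reproduced here.

```python
import numpy as np

def z_normalize(observed, surrogates):
    """z-score observed coherence against a surrogate distribution.

    observed   : array of shape (n_freqs,)
    surrogates : array of shape (n_surrogates, n_freqs) (assumed layout)
    """
    observed = np.asarray(observed, dtype=float)
    surrogates = np.asarray(surrogates, dtype=float)
    mu = surrogates.mean(axis=0)
    sd = surrogates.std(axis=0, ddof=1)   # sample standard deviation (assumption)
    return (observed - mu) / sd

# Hypothetical values for two frequencies and three surrogates
observed = [5.0, 2.0]
surrogates = [[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]]
z = z_normalize(observed, surrogates)
```

In this toy case the first frequency lies three surrogate standard deviations above the surrogate mean, while the second is indistinguishable from it.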

Conforming with previous measurements in C. perspicillata’s AC (Hechavarria et al. 2016a; García-Rosales et al. 2018a), auditory cortical LFPs were well synchronized to the temporal structure of natural vocalizations in low and high frequencies (matching the calls’ bout or syllabic rhythms, respectively; Fig. 5). After examining the strength of coherence (zStimFC) across electrodes, we observed that low-frequency synchronization was more prominent in layers I–II, and IV–VI, whereas high-frequency coherence occurred markedly in layers IV–VI of the cortex in response to both calls tested (Fig. 5a–d). Note that since the syllabic rate of seq2 is lower than that of seq1 (average inter-syllable interval of seq1: 15.7 ms; average inter-syllable interval of seq2: 26.4 ms), high-frequency LFP–stimulus synchrony occurred mainly in the 30–50 Hz range of the spectrum in response to seq2, and in the 50–100 Hz range of the spectrum in response to seq1 (see Fig. 5c, d). Figure 5e, f shows the average zStimFC in theta- and high-frequency (50–100 Hz for seq1; 30–50 Hz for seq2; n = 79 columns) bands across cortical depths in response to each sequence. In both cases, theta-band zStimFC was the strongest in superficial and middle-to-deep layers (layer I to the top of layer II, and IV–VI, respectively; see above), whereas high-frequency LFP synchrony was highest in layers IV–VI of the AC, but not in superficial ones. Pairwise statistical comparisons of theta-band or high-frequency zStimFC values across depths are summarized in Fig. 5g, h, and corroborated the abovementioned trends (FDR-corrected Wilcoxon signed rank tests). The matrices follow the same conventions as those in Fig. 3d. In Fig. 5g (corresponding to responses to seq1), the significance threshold was pcorr = 0.0169 for theta-band coherence, and pcorr = 0.0284 for high-frequency (50–100 Hz) coherence. In Fig. 5h (responses to seq2), the significance thresholds were pcorr = 0.0274 and pcorr = 0.0336, for theta-band and high-frequency (30–50 Hz) zStimFCs, respectively. Taken together, these results confirm that the strength of LFP–stimulus coherence in the auditory cortex was layer specific.

Fig. 5

Laminar profile of LFP–stimulus synchronization in the AC. a z-Normalized stimulus-field coherence (zStimFC) at different depths (50, 300, 450 and 700 μm; n = 79 columns, data shown as mean ± SEM) in response to seq1. b Same as in a, but data correspond to responses to seq2. c Laminar distribution of zStimFC values, in response to seq1. d Same as in c, but considering responses to seq2. e Theta-band (4–8 Hz, black) and high-frequency (50–100 Hz, green) zStimFC across depths, in response to seq1. f Same as in e, but considering responses to seq2. Note that the high-frequency range in this case was 30–50 Hz. g, h Significance matrices illustrating the results of pairwise statistical comparisons (FDR-corrected Wilcoxon signed rank tests) of zStimFC values in the theta- (left) and high-frequency (right) bands between channels. Each cell (i, j) shows the corrected p value obtained after comparing the zStimFC in either frequency range, between channels i and j. The contour lines delimit regions of statistical significance. g Data in response to seq1 (significance threshold of pcorr = 0.0169 for the theta-band, and pcorr = 0.0284 for the high-frequency range), whereas h shows data in response to seq2 (significance threshold of pcorr = 0.0274 for the theta-band, and pcorr = 0.0336 for the high-frequency range). Marker lines accompanying depth axes depict the extent of all six cortical layers (see also Fig. 1a)

Discussion

In this study, we directly addressed the laminar patterns of spike–LFP and LFP–stimulus coherence in the auditory cortex in response to natural communication sequences. We present two major findings that demonstrate the depth dependence of neuronal synchronization in the AC, namely (1) the strength of cortical spike–LFP coherence is highest in layers IV and V of the cortex during acoustic processing; and (2) low- and high-frequency LFP–stimulus synchronization exhibits layer-specific patterns, with strongest coherence values in layers I–II (for low frequencies) and IV–VI (for low and high frequencies) of the cortex. This is, to our knowledge, the first report of layer-specific spike–LFP and LFP–stimulus synchronization in the AC for the representation of naturalistic acoustic streams. Our main results are summarized in Fig. 6.

Fig. 6

Summary diagram of the observed laminar properties of neuronal synchronization in the auditory cortex (AC). The left part of the figure shows a (simplified) schematic of connections within an auditory cortical column. Thalamic afferents (mostly from the ventral region of the medial geniculate body, MGBv in the figure) arrive more densely into bottom layer III, layer IV, and the border between layers V and VI. Superficial and deep layers are typically involved in connections with regions inside the AC, as well as with other ipsi- and contralateral cortical structures. Deep layers could also receive inhibitory inputs (marked as “inhibitory?”) from higher order structures. Rich connections also exist within a column, including network motifs which involve inhibitory interneurons (represented as a red “cell”). During spontaneous activity, spikes and LFPs in an auditory cortical column are weakly synchronized in layers I–III, whereas the spiking in deep laminae (middle-to-deep layers IV–VI) is coupled with oscillations in the delta-band. During acoustic processing, spike–LFP synchrony shifts towards middle (mostly input regions: bottom layers III and IV) layers, at preferred frequency bands of delta and theta. Strong and consistent thalamic inputs drive LFP entrainment to high frequencies present in the stimulus’ envelope particularly in layers IV–VI, whereas superficial and middle-to-deep laminae exhibit low-frequency LFP–stimulus synchrony. Layer-specific LFP–stimulus and spike–LFP synchrony may be important for the representation of spectrotemporally complex acoustic sequences

Anatomical correlates of layer-specific neuronal coherence

Before addressing functional implications of depth-dependent spike–LFP and stimulus–LFP synchrony, it is important to consider the underlying anatomical scaffolding supporting such spatial coherence patterns. In terms of columnar architecture, the auditory cortex itself does not differ much from the standard koniocortical model (Mountcastle 1997; Linden and Schreiner 2003; Douglas and Martin 2004; Harris and Mrsic-Flogel 2013). The AC receives lemniscal thalamic afferents from the ventral section of the medial geniculate body (MGBv in Fig. 6) mostly into lower layer III, layer IV, and the boundary between layers V and VI (Romanski and LeDoux 1993; Linden and Schreiner 2003; Winer and Lee 2007). In contrast, supra- and infragranular layers (most of layers I–III and V–VI, respectively) do not receive the same proportion of inputs from the MGBv, but mediate interareal (corticocortical/corticofugal) connections and multimodal interactions instead (Linden and Schreiner 2003; Winer and Lee 2007). This organization is also consistent, for example, with data from the visual cortex (Felleman and Van Essen 1991; van Kerkoerle et al. 2014). In fact, it has been proposed that feedback projections consistently target deep layers of the cortex, initiate spontaneous high excitability events (UP-states), and are mainly of inhibitory nature (Sakata and Harris 2009), whereas supragranular layers appear engaged in multisensory interactions involving somatosensory or visual domains (Lakatos et al. 2007; Atilgan et al. 2018).

Taking into account the canonical circuitry of an auditory cortical column, the difference between patterns of neuronal synchronization during spontaneous activity and acoustic processing (Figs. 2, 3) is not unexpected. As mentioned above, UP-states (i.e., discrete events of increased spiking) during spontaneous activity originate in deep layers (V and VI), potentially as an effect of putative feedback projections from higher order structures (Sakata and Harris 2009), mostly of inhibitory nature (Sanchez-Vives and McCormick 2000; Bastos et al. 2012). The predominance of spike–LFP coherence in the delta-band in layers V and VI can then be straightforwardly explained, if one speculates that these low-frequency oscillations correlate with such inhibitory rhythms during spontaneous activity. Here, we quantified the presence of UP-states during spontaneous activity (Fig. 4). Our observations align with those reported in the literature as we found evidence indicating that such events may originate in deep layers of the cortex (see Fig. 4c–e). Importantly, we found a correlation between the phase of spontaneous delta oscillations and UP-state onsets (Fig. 4f–h), with strongest effects in deep laminae, which further supports the SFC results. Nevertheless, beyond the phase correlation between high-activity onsets and low-frequency LFPs during spontaneous activity, our data do not allow us to make strong assertions about the role of UP-states, for example, whether they are modulated by low-frequency oscillations or are causal in their generation. This could be addressed with experimental approaches that allow the activation/inactivation of specific cortical circuits (see Beltramo et al. 2013).

On the other hand, during acoustic stimulation strong thalamic volleys arrive into input layers of the AC (where afferents are more predominant: lower III, IV, and the boundary between V and VI; see above), altering the spatial distribution of spike–LFP coherence within a cortical column. Thus, neuronal synchrony is, in this case, stimulus related, and may then explain why its relative strength shifts from layer V/VI (in spontaneous activity) towards classical thalamorecipient layers during acoustic processing (Fig. 3a, d). In principle, it is also possible to hypothesize a complementary scenario where spatial changes of spontaneous vs. stimulus-related SFC occur due to the activation of distinct mechanisms which generate cortical oscillations, located at different depths. While layers II/III pyramidal cells have been linked to the generation of low-frequency oscillations during spontaneous activity, Beltramo and colleagues demonstrated via optogenetic manipulations in vivo that only neurons located in deep infragranular laminae (mainly layer V, and not II/III) were necessary and sufficient to account for low-frequency (~ 1 Hz) dynamics in the cortex (Beltramo et al. 2013). During sensory processing, however, distinct oscillatory mechanisms could prevail due, in part, to the strong effect of thalamic afferents. It has been proposed that stimulus-related low-frequency oscillations in the theta-band might be mediated by pyramidal-interneuron theta (PINT)-networks located in input layers [lower III and IV, mostly; (Giraud and Poeppel 2012)]. These networks’ properties would not differ much from those of classical, well-studied pyramidal-interneuron gamma (PING) networks (Tiesinga and Sejnowski 2009; Buzsaki and Wang 2012), other than in their temporal dynamics. The presence of intrinsic oscillatory mechanisms in the cortex that would phase-lock to the auditory input is still actively debated. Nevertheless, computational studies have shown the utility of low-frequency oscillations for the accurate processing of temporally complex stimuli (Hyafil et al. 2015), and recent data have advanced evidence for an intrinsic oscillator capable of synchronizing with slow acoustic streams in the auditory cortex (Doelling et al. 2019). We believe that the above could theoretically also serve as a mechanism contributing to the shift of SFC from deep layers to input layers between spontaneous and sound-driven activity, but our observations remain correlational and the distinct-oscillator view remains speculative. Whether such cortical oscillators are present in the bat AC (as well as details regarding their spatial location within a column) cannot be thoroughly elucidated with our data.

Layer specificity of LFP–stimulus coherence

In previous studies from the AC of C. perspicillata, we reported that high-frequency LFPs synchronize to fast temporal envelopes of natural stimuli (including both sequences tested in this study) and artificial amplitude-modulated sounds (Hechavarria et al. 2016b; García-Rosales et al. 2018a, b). Here, we further show that fast LFP–stimulus synchronization (quantified with the zStimFC) occurs more strongly in middle-to-deep layers of the cortex (Fig. 5). High-frequency oscillatory synchrony could in principle be explained by the passive propagation of stimulus-evoked potentials originating at subcortical auditory structures, whose neurons follow fast acoustic rhythms with higher reliability than those of the AC (Herdman et al. 2002; Joris et al. 2004; Farahani et al. 2017). An alternative explanation for the phase-locking at high-LFP frequencies is that synaptic inputs into the thalamorecipient layers of the cortex modulate neuronal subthreshold potentials [and consequently the LFP (Buzsaki et al. 2012)], as thalamic neurons can reliably track temporal periodicities above 100 Hz (Creutzfeldt et al. 1980; Joris et al. 2004). The data presented here align with the latter hypothesis, which is supported by four complementary and converging observations: first, subthreshold membrane potentials shape the LFP (Buzsaki et al. 2012; Einevoll et al. 2013; Reimann et al. 2013; Haider et al. 2016); second, subthreshold membrane potentials in the AC synchronize to fast varying acoustic periodicities (Gao and Wehr 2015; Gao et al. 2016); third, spiking activity in the AC can phase-lock to the fast temporal structure of natural communication sequences (García-Rosales et al. 2018a); and fourth, LFP–stimulus synchronization is strongest in input layers of the AC (this study). Taken together, these observations indicate that synchronized high-frequency LFPs in C. perspicillata’s AC are an effect of stimulus-related thalamocortical projections into the cortex (at middle-to-deep layers), and at least not solely attributable to passive propagation of evoked potentials from lower order auditory structures. In other words, synchronized high-frequency LFPs carry signatures of thalamic inputs at fast rates, and therefore could be a direct and useful reference for the representation of fast varying temporal envelopes present in acoustic streams (see García-Rosales et al. 2018a).

On the other hand, low-frequency oscillations in the theta-range were phase-locked to the stimulus’ slow temporal dynamics. Phase synchrony to the slow envelope of acoustic streams is an effective cortical mechanism for the representation of temporal or even spectral regularities present in auditory stimuli (Lakatos et al. 2005; Luo and Poeppel 2007; Kayser et al. 2009; Arnal and Giraud 2012; Giraud and Poeppel 2012; Henry and Obleser 2012; Kayser et al. 2012; Gross et al. 2013; Molinaro and Lizarazu 2018). The underlying principles of synchronous LFP signals involve a phase-resetting of the ongoing oscillations by salient elements in the sensory stream, which ‘align’ ongoing activity to putatively relevant features in the temporal structure of sounds [e.g., edges in the acoustic inputs; see Gross et al. (2013)]. The true nature of low-frequency synchronization remains controversial, because the phase alignment of the LFP may be a consequence of evoked (i.e., transient, large synaptic events) or modulatory (the entrainment of intrinsic low-frequency oscillations) activity in the AC. The first proposition is consistent, for example, with findings by Szymanski et al. (2011) in the AC of anesthetized rats. In that study, the authors report that strong bursts of synaptic activity in thalamorecipient layers account for the phase-resetting and the information content of theta-alpha (7–11 Hz) LFPs in superficial and middle cortical depths (Szymanski et al. 2011). However, an alternative hypothesis proposes the entrainment of ongoing low-frequency oscillations to the regularities of periodic or even quasi-periodic sounds (see Doelling et al. 2019), to some extent mediated by top-down attentional or predictive influences (Arnal and Giraud 2012; Giraud and Poeppel 2012). Although it is clear that LFPs synchronize to the slow temporal structure of communication signals, our current data do not allow us to disentangle whether high values of low-frequency LFP–stimulus coherence in superficial and middle layers occur due to evoked activity or oscillatory entrainment (particularly in response to seq1, which has a clear quasi-periodic structure, see Fig. 1c). Both phenomena may well co-occur in our recordings.

Functional implications of layer-specific spike–LFP coherence

Spikes and local-field potentials in the auditory cortex synchronized in low frequencies of the spectrum with depth-dependent strength (Figs. 2 and 3). In previous studies, we reported that high theta-band SFC is associated with the neuronal capability to represent the relatively slow temporal envelope of artificial and natural sounds (García-Rosales et al. 2018a, b). Since low-frequency oscillatory activity modulates neuronal excitability in the AC (Lakatos et al. 2005; O’Connell et al. 2015), phase-locked fluctuations in the field potentials (see above) could act as an orchestrating mechanism for the organization of the neuronal representation of the stimulus’ temporal structure (Arnal and Giraud 2012; Giraud and Poeppel 2012; Hyafil et al. 2015; O’Connell et al. 2015). Critically, the strength of theta-band SFC was depth-dependent: during sound processing, the highest coherence values were obtained in input layers in response to either call. Together with the fact that stimulus-synchronized MUA responses were most common in input and deep layers (lower III to VI; Supplementary Fig. 2), the laminar SFC patterns observed while animals listened to the sequences support the relationship between spike–LFP synchrony and the neuronal representation of slow acoustic envelopes in input layers of the cortex.

If stimulus-locked low-frequency cortical oscillations orchestrate neuronal firing, it is conceivable that such oscillatory activity in input (and superficial; see Fig. 4) laminae could be top-down modulated to regulate the effective representation of naturalistic sequences (Arnal and Giraud 2012; O’Connell et al. 2014; Barczak et al. 2018). It is noteworthy that top-down interactions improve sensory perception in the somatosensory system (Manita et al. 2015), whereas in the AC attention (a top-down process) sharpens the frequency tuning of neurons (Lakatos et al. 2013) in a layer-specific manner (O’Connell et al. 2015; Francis et al. 2018). In the auditory cortex of the bat Pteronotus parnellii, feedback projections from the dorsal fringe (DF) onto the so-called FM–FM area (FM: frequency modulation; both DF and FM–FM areas are comprised of neurons selective to echo-delay) augment best-delay responses in area FM–FM (putative lower order region) when stimulated neurons in area DF (putative higher order region) had comparable delay tuning (Tang and Suga 2008). That low-frequency oscillations also participate in top-down interactions is supported by empirical evidence from the visual [(von Stein et al. 2000; van Kerkoerle et al. 2014), but see also Spyropoulos et al. (2018)], somatosensory (Haegens et al. 2011), and auditory modalities (Lakatos et al. 2013; Park et al. 2015). Furthermore, the modulation of excitability in input layers of the AC via low-frequency oscillations could also have implications for multisensory integration, based on the fact that somatosensory or visual stimulation paired with acoustic cues enhances the representation of acoustic stimuli in the auditory system (van Wassenhove et al. 2005; Lakatos et al. 2007; Kayser et al. 2008; Luo et al. 2010; Zion Golumbic et al. 2013; Atilgan et al. 2018). Since cross-modality inputs into the AC modulate slow oscillations in superficial layers according to the temporal structure of the cross-modal stimulus (Lakatos et al. 2007; Atilgan et al. 2018), a tempting proposition is that delta-theta LFPs in middle layers serve as a temporal reference frame within which multisensory interactions in the AC would be most (or least) effective, allowing for the organization of auditory-evoked spiking according to temporally congruent signals from other cortical regions. This may prove a strong mechanism for auditory scene analysis and acoustic stream selectivity in noisy environments, such as in “cocktail party” scenarios (Zion Golumbic et al. 2012; Atilgan et al. 2018).

Methods

Animal preparation and surgical procedures

The study was performed on seven adult bats of the species Carollia perspicillata (four males). Animals were obtained from the colony in the Institute for Cell Biology and Neuroscience, Goethe University, Frankfurt am Main, Germany. All experimental procedures were in compliance with current European regulations on animal experimentation and were approved by the Regierungspräsidium Darmstadt (experimental permit #FU-1126).

Bats were anesthetized before surgery with a mixture of ketamine (10 mg kg−1, Ketavet, Pfizer) and xylazine (38 mg kg−1, Rompun, Bayer). Local anesthesia (ropivacaine hydrochloride, 2 mg/ml, Fresenius Kabi, Germany) was applied subcutaneously to the scalp prior to surgery and to any subsequent handling of the wounds. A rostro-caudal midline incision was made in the skin covering the superior part of the head, after which skin and muscle tissues were carefully removed to expose the skull in the region of the AC. Enough tissue was removed to expose a sufficient area of the skull for placing a metal rod (1 cm length, 0.1 cm in diameter), used to fix the animal’s head during electrophysiological recordings. The rod was attached to the bone using dental cement (Paladur, Heraeus Kulzer GmbH, Germany). The location of the AC was macroscopically assessed with the aid of well-described landmarks (see Esser and Eiermann 1999; Hechavarria et al. 2016a), and the cortex was exposed by cutting a hole (~ 1 mm²) in the skull using a scalpel blade. After surgical procedures, animals had at least one full day of recovery before undergoing experiments.

Recordings were performed chronically in awake bats, and lasted no more than 4 h per day. Water was offered to the animals every ~ 1 to 1.5 h. Between sessions, the bats were able to recover for at least one whole day. No animal was used more than six times in total.

Electrophysiological recordings

All experiments were conducted inside a sound-proofed and electrically isolated chamber, containing a custom-made holder in which the bats were placed throughout the sessions. The temperature of the holder was kept constant at 30 °C with a heating blanket (Harvard, Homeothermic blanket control unit). A speaker (NeoCD 1.0 Ribbon Tweeter; Fountek Electronics, China), used for free-field stimulation and located inside of the chamber, was positioned 12 cm away from the bat’s right ear, contralateral to the hemisphere from which data were acquired. The speaker was calibrated using a ¼-inch microphone (Brüel & Kjaer, model 4135, Denmark), connected to a custom-made microphone amplifier.

Neurophysiological recordings were performed with 16-channel laminar probes (Model A1x16, NeuroNexus, MI; impedance: 0.5–3 MΩ), which were carefully inserted orthogonal to the cortical surface and advanced into the brain with the aid of a piezo manipulator (PM-101, Science Products GmbH, Hofheim, Germany), until the top channel was just visible on the surface of the cortex (Schaefer et al. 2015, 2017). The distance between channels was 50 μm, and thus the deepest electrode reached 750 μm from the surface. Probe length was chosen after visualizing C. perspicillata’s cortex in Nissl stained preparations; electrodes spanned all six cortical layers (Fig. 1a). All recordings were performed in primary AC, although we cannot rule out the presence of columns belonging to high-frequency fields (see Esser and Eiermann 1999). Electrodes were connected to a microamplifier (MPA 16, Multichannel Systems MCS GmbH, Reutlingen, Germany), and signal acquisition was done with a portable multichannel system with integrated analog-to-digital converter (Multi Channel Systems MCS GmbH, model ME32 System, Germany), using a sampling frequency of 20 kHz (16-bit precision). The data were monitored online and stored in a recording computer with MC_Rack_Software (Multi Channel Systems MCS GmbH, Reutlingen, Germany; version 4.6.2).

Acoustic stimulation

Acoustic stimulation was controlled using custom-written Matlab software (version 7.9.0.529 (R2009b), MathWorks, Natick, MA). Frequency tuning curves (FTC) for the recorded multi-unit activity were calculated from responses to pure tone stimuli (10 ms duration, 0.5 ms rise/fall time), whose frequencies were in the range of 5–90 kHz. The sound pressure level (SPL) of the stimuli was 75 dB SPL for 50 of the penetrations, and 60 dB SPL for the remaining 30 penetrations. Each frequency-level combination was pseudo-randomly presented 8 times for cases where the sound intensity was 75 dB SPL, and 50 times in cases where the SPL remained constant at 60 dB SPL.

Two distress calls (sequences) of C. perspicillata were used as natural acoustic stimuli, referred to as seq1 and seq2 throughout the text (see Fig. 1b). Procedures used for the recording of these vocalizations are described in Hechavarria et al. (2016a). Vocalizations were digital-to-analog converted with a sound card (M2Tech Hi-face DAC, 384 kHz, 32 bit), and analog-amplified (Rotel power amplifier, model RB-1050, Rotel, Japan), before being presented through the stimulation speaker located inside the chamber. Prior to presentation, distress calls were down-sampled to 192 kHz and low-pass filtered (80 kHz cutoff). The sequences were multiplied by a linear fading window (10 ms long) at the beginning and end to avoid acoustic artifacts, and were presented in a pseudo-random manner a total of 50 times with an interstimulus interval of 1 s, a pretime (silence before the call’s onset) of 300 ms, and a post-time (silence after the call’s offset) of 500 ms.

Histology

We performed Nissl staining of histological sections in the AC to visualize and corroborate the extent, along a cortical column, of the laminar probe. For histology, the animal was euthanized with an intraperitoneal injection of 0.1 ml sodium pentobarbital (160 mg/ml, Narcoren, Boehringer–Ingelheim, Germany). To preserve and fixate the anatomical structure of the brain tissue, the bat was transcardially perfused using a peristaltic pump (Ismatec, Wertheim, Germany) with a pressure rate of 3–4 ml/min. Perfusion was performed with 0.1 M phosphate buffer saline (PBS) for 5 min, and subsequently with a 4% paraformaldehyde (PFA) solution for 30 min. After removing the surrounding tissue, muscles and skull, the brain was carefully extracted, fixed in 4% PFA at 4 °C for two nights and placed in an ascending sucrose sequence solution (1 h in 10%, 2–3 h in 20%, 1 night in 30%) at 4 °C to avoid the formation of ice crystals in the tissue. Next, the brain was frozen in an egg yolk embedding encompassing the fixation in glutaraldehyde (25%) with CO2. For sectioning the frozen brain, a cryostat (Leica CM 3050S, Leica Microsystem, Wetzlar, Germany) was utilized and coronal slices (50 µm thick) were prepared and mounted on gelatin-coated slides. Nissl staining was performed on these slides as follows. Brain slices were immersed in 96% ethanol overnight and 70% ethanol (5 min), hydrated in distilled water (3 × 3 min), stained in 0.5% cresyl violet (10 min), rinsed in diluted glacial acetic acid (30 s), differentiated in 70% ethanol + glacial acetic acid until neuronal somata were still red-violet stained with only faint coloration of the background, fixed in an ascending alcohol sequence (2 × 5 min in 96% ethanol, 2 × 5 min in 100% isopropyl alcohol), cleaned by Rotihistol I, II and III solution (Carl-Roth GmbH, Karlsruhe, Germany) and covered with DPX mounting medium. The inspection of the lesion was facilitated by a bright-field, fluorescence microscope (Keyence BZ-9000, Neu-Isenburg, Germany), with which high-resolution photographs were taken to illustrate the electrode track in the cortical tissue (Fig. 1a).

Separation of MUA and LFPs

All data analyses were performed offline using custom-written Matlab (version 8.6.0.267246 (R2015b)) scripts. Multi-unit activity and local-field potentials were separated by bandpass filtering (fourth-order Butterworth digital filter) the demeaned raw electrophysiological signal, with cutoffs of 0.1–300 Hz for LFPs, and 300–3000 Hz for MUA. Spikes were detected based on their amplitude relative to the recording noise baseline. A spike was detected if the peak voltage of the signal was above 4 standard deviations from the baseline, and if no other spike occurred for at least 2 ms before the current event.
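
The detection rule described above can be sketched in a few lines of Python (the study itself used Matlab). This is an illustrative sketch rather than the authors' code: the function name `detect_spikes` is hypothetical, and it assumes that the noise baseline is given by the mean and standard deviation of the whole band-passed trace and that spikes are positive-going peaks, neither of which is specified in the text.

```python
import statistics


def detect_spikes(signal, fs, n_sd=4.0, refractory_ms=2.0):
    """Return sample indices of detected spikes in a band-passed MUA trace.

    Assumptions (not stated in the text): the baseline is the mean of the
    whole trace, the noise level is its standard deviation, and spikes are
    positive-going local maxima.
    """
    baseline = statistics.fmean(signal)
    noise_sd = statistics.pstdev(signal)
    threshold = baseline + n_sd * noise_sd
    # Minimum distance (in samples) to the previously accepted spike.
    refractory = int(round(refractory_ms * 1e-3 * fs))

    spikes = []
    for i in range(1, len(signal) - 1):
        is_peak = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if not (is_peak and signal[i] > threshold):
            continue
        # Accept the event only if no spike occurred in the preceding 2 ms.
        if not spikes or i - spikes[-1] >= refractory:
            spikes.append(i)
    return spikes
```

With the 20 kHz sampling rate used here, the 2 ms criterion corresponds to 40 samples.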

MUA synchronization to the syllabic or bout structures of the calls

The ability of the spiking to synchronize to either the slow or the fast temporal structure of the calls was assessed as described in a previous study (García-Rosales et al. 2018a). This approach allowed us to classify MUA responses as bout-tracking, syllable-tracking or non-tracking. In brief, we used the slow (1–15 Hz) or fast envelopes (50–100 Hz) of the calls, corresponding to the temporal dynamics of the bout or the syllabic rhythms, respectively. Spike times from MUA responses were expressed relative to the instantaneous phase of either envelope, and circular statistics [Rayleigh tests, by means of the Circular Statistics Toolbox, CircStat (Berens 2009)] were performed to evaluate whether the spiking was synchronized to the temporal structure of the vocalization sequences. Units were considered as syllable trackers if they were significantly synchronized (Rayleigh test, p < 0.001) to the fast envelope of both calls. Units that did not fulfill this criterion, but were synchronized to the slow envelope of both calls were defined as bout-trackers. MUA responses that did not fulfill any of the above criteria were classified as non-trackers. This analysis was performed across all depths and columns to obtain the laminar distributions shown in Supplementary Fig. 2. The strength of synchronization (Rs for the slow envelope, or Rf for the fast one) was calculated as the mean circular vector of the distribution of spike phases relative to either the slow or the fast envelopes of the calls. Mathematically, either Rs or Rf can be expressed as follows:

$$R = \left| \frac{1}{n} \sum_{k = 1}^{n} \mathrm{e}^{i\phi_{k}} \right|, \tag{1}$$

where R is the mean resulting circular vector, n is the number of spikes, and ϕk is the phase of the kth spike relative to either the slow or the fast envelope.
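
As a minimal sketch of Eq. 1, the vector strength can be computed directly from a list of spike phases in radians. The function name `resultant_vector_length` is hypothetical, and the example uses only the Python standard library (the study relied on the CircStat toolbox in Matlab).

```python
import cmath
import math


def resultant_vector_length(phases):
    """Length of the mean resultant vector R (Eq. 1) for phases in radians."""
    phases = list(phases)
    if not phases:
        raise ValueError("at least one phase is required")
    # Average the unit vectors e^{i*phi_k} and take the modulus of the result.
    mean_vector = sum(cmath.exp(1j * phi) for phi in phases) / len(phases)
    return abs(mean_vector)


# Identical phases give R close to 1; evenly spread phases give R close to 0.
locked = resultant_vector_length([0.3] * 20)
spread = resultant_vector_length([0.0, math.pi / 2, math.pi, 3 * math.pi / 2])
```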

Spike-field coherence

Synchronization between spiking activity and LFPs was quantified with the spike-field coherence (SFC) metric (Fries et al. 2001; Rutishauser et al. 2010; García-Rosales et al. 2018a). The SFC is a normalized, frequency-dependent synchronization index that quantifies how well-locked the spiking activity and ongoing oscillations are (0 no coherence; 1 perfect coherence). Briefly, the method relies on the selection and averaging of a number of LFP segments (150 in this study) centered at spike times. The averaging yields traces (spike-triggered average, STA) in which only phase-consistent oscillatory components remain. The power of the STAs is then normalized by the average power of the original individual LFP windows (spike-triggered power, STP). Such normalized value is the SFC, which can be mathematically expressed as follows:

$$\text{SFC}\left( f \right) = \frac{\varPsi \left( \text{STA} \right)}{\frac{1}{n} \sum_{i = 1}^{n} \varPsi \left( w_{i} \right)}, \tag{2}$$

where \(\varPsi \left( \cdot \right)\) is the function used to calculate power spectra, and the denominator represents the STP of a series of LFP windows \(w_{1}, w_{2}, \ldots, w_{n}\).

To analyze coherence estimates, we used only columns in which 150 spikes were elicited in response to both sequences, across all channels. Because that number of spikes was unlikely to be fired in a single trial, 150 spikes were randomly selected across trials and their associated LFP segments (480 ms long) were used. To reduce sampling biases, the random selection of spikes was repeated 500 times, and from the obtained distribution of coherence spectra, we chose the median as the ‘true’ SFC of a particular channel and column. A similar procedure was conducted to estimate spike–LFP coherence during spontaneous activity, maintaining the same parameters as described above. All power spectra used to calculate the SFC were computed with the multitaper method (Percival and Walden 1993), implemented in the Chronux toolbox (Bokil et al. 2010), using two tapers with a time-bandwidth (TW) product of 2.
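
The resampling procedure above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses NumPy, treats the LFP as one continuous trace instead of pooling across trials, and replaces the multitaper estimate with a single Hann-tapered periodogram. The function name `spike_field_coherence` and its parameters are hypothetical.

```python
import numpy as np


def spike_field_coherence(lfp, spike_idx, fs, win_s=0.48,
                          n_spikes=150, n_draws=500, seed=0):
    """Median SFC (Eq. 2) over repeated random draws of n_spikes spikes."""
    lfp = np.asarray(lfp, dtype=float)
    half = int(round(win_s * fs / 2))
    # Keep only spikes with a complete LFP window around them.
    valid = np.array([s for s in spike_idx if half <= s <= len(lfp) - half])
    if len(valid) < n_spikes:
        raise ValueError("not enough spikes with a full LFP window")

    segments = np.stack([lfp[s - half:s + half] for s in valid])
    taper = np.hanning(segments.shape[1])
    # Assumption: one Hann taper stands in for the multitaper estimate.
    spectra = np.fft.rfft(segments * taper, axis=1)
    power = np.abs(spectra) ** 2
    freqs = np.fft.rfftfreq(segments.shape[1], d=1.0 / fs)

    rng = np.random.default_rng(seed)
    draws = np.empty((n_draws, freqs.size))
    for d in range(n_draws):
        pick = rng.choice(len(valid), size=n_spikes, replace=False)
        # The FFT is linear, so the spectrum of the STA equals the mean of
        # the individual spectra; its squared modulus is the STA power.
        sta_power = np.abs(spectra[pick].mean(axis=0)) ** 2
        stp = power[pick].mean(axis=0)  # spike-triggered power
        draws[d] = sta_power / stp
    return freqs, np.median(draws, axis=0)
```

Because the numerator and denominator use the same taper, each draw is bounded between 0 and 1, in line with the definition of the SFC given above.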

UP-state detection and analyses

UP-states of spontaneous activity were defined as an increase in the spiking rate considering the activity across a whole penetration, or either from superficial (I–III) or deep (V–VI) layers only (see Fig. 4). Spikes across the considered electrodes were pooled and the time-dependent spiking rate was calculated via a smoothed PSTH with a 20 ms Gaussian kernel (ksdensity function, Matlab). Similar to previous studies (Sakata and Harris 2009), an UP-state was detected if the spiking activity rose above the mean of the non-zero time-varying spiking rate (hereafter, UP threshold) for ≥ 50 ms. Moreover, a period of “silence” (spiking rate at any point below half the UP threshold) was required to extend at least 100 ms before the onset of the candidate UP-state. Therefore, UP-states were transient increases of activity (lasting no less than 50 ms) that appeared after a relatively prolonged time without elevated spiking.
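
The detection criteria above can be illustrated with a short sketch. It is a simplified reading of the procedure, not the authors' code: the function name `detect_up_states` is hypothetical, the input is assumed to be an already smoothed spiking rate sampled every `dt_ms` milliseconds, and the silent period is assumed to end at the sample immediately preceding the onset.

```python
def detect_up_states(rate, dt_ms=1.0, min_up_ms=50.0, min_silence_ms=100.0):
    """Return onset indices of UP-states in a smoothed spiking-rate trace."""
    nonzero = [r for r in rate if r > 0]
    if not nonzero:
        return []
    # UP threshold: mean of the non-zero time-varying spiking rate.
    threshold = sum(nonzero) / len(nonzero)
    min_up = int(round(min_up_ms / dt_ms))
    min_silence = int(round(min_silence_ms / dt_ms))

    onsets = []
    i, n = 0, len(rate)
    while i < n:
        if rate[i] <= threshold:
            i += 1
            continue
        # Find the end of this supra-threshold episode.
        j = i
        while j < n and rate[j] > threshold:
            j += 1
        long_enough = (j - i) >= min_up
        # Assumption: silence must fill the window right before the onset.
        silent = i >= min_silence and all(
            r < threshold / 2 for r in rate[i - min_silence:i]
        )
        if long_enough and silent:
            onsets.append(i)
        i = j
    return onsets
```

With real, Gaussian-smoothed rates the rise towards the threshold is gradual, so the placement of the silent window relative to the onset would need to follow whatever convention the original analysis used.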

UP-state onsets were related to delta-band LFPs as follows. First, raw LFP traces were bandpass filtered (fourth-order Butterworth filter) between 1 and 4 Hz (delta-band), and the instantaneous phase of the filtered signal was extracted by means of a Hilbert transform. An UP-state onset at any time t was associated with the instantaneous phase occurring at the same time t for a particular electrode’s LFP. Note that although UP-states were always detected using spiking activity from multiple electrodes, their onsets were paired with LFP traces across channels individually, thus rendering phase distributions, per penetration, across multiple depths. The strength of the synchronization between UP-state onsets and the phase of delta LFPs for any given channel was calculated as the mean resulting vector (R) of the phase distribution using Eq. 1 (see above), but with phases in the distribution not related to individual spike times (as when calculating spike–stimulus synchrony, Eq. 1), but rather to the onset of UP-states during spontaneous activity relative to delta-band LFPs. Because the value of R depends on the sample size of the phase distribution (and hence the number of UP-states), we considered only columns for which at least 40 onsets were detected, and if there were more than 40 events we randomly discarded the excess. In total, 49 columns were used in which at least 40 UP-states were detected considering spikes from all channels (n = 1960 events), and 62 columns were used where 40 UP-states could be detected using superficial and deep spiking separately (see above). The latter allowed us to compare in the same column the effect of laminar location on UP-state onset and delta-LFP synchrony (Fig. 4h).
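
The pairing of UP-state onsets with delta-band phase can be sketched as below. This is an illustration under stated assumptions rather than the authors' pipeline: it uses NumPy, and a brick-wall FFT mask between 1 and 4 Hz stands in for the fourth-order Butterworth filter, with the analytic signal (Hilbert transform) obtained in the same step. The names `delta_phase` and `onset_phase_locking` are hypothetical.

```python
import numpy as np


def delta_phase(lfp, fs, band=(1.0, 4.0)):
    """Instantaneous phase of the band-limited LFP via its analytic signal."""
    lfp = np.asarray(lfp, dtype=float)
    spectrum = np.fft.fft(lfp)
    freqs = np.fft.fftfreq(lfp.size, d=1.0 / fs)
    # Keep positive frequencies inside the band only and double them: this
    # yields the analytic signal of the band-passed trace in one step.
    mask = (freqs >= band[0]) & (freqs <= band[1])
    analytic = np.fft.ifft(2.0 * spectrum * mask)
    return np.angle(analytic)


def onset_phase_locking(lfp, onsets, fs, n_events=40, seed=0):
    """Vector strength R (Eq. 1) of delta phases at n_events random onsets."""
    onsets = np.asarray(onsets, dtype=int)
    if onsets.size < n_events:
        raise ValueError("fewer UP-state onsets than required")
    rng = np.random.default_rng(seed)
    # Randomly discard the excess so that every column uses the same count.
    keep = rng.choice(onsets, size=n_events, replace=False)
    phases = delta_phase(lfp, fs)[keep]
    return float(np.abs(np.mean(np.exp(1j * phases))))
```

Fixing the number of events at 40 keeps R comparable across columns, since its expected value under no phase locking depends on sample size.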

Stimulus–field coherence

Coherence between the LFP and stimuli was calculated using the stimulus–field coherence metric [StimFC; see García-Rosales et al. (2018b)]. Similar to the SFC, the StimFC is a normalized, frequency-dependent synchronization index that measures how well LFPs phase-lock to the vocalizations. In brief, LFP segments were chosen spanning the entire stimulus presentation window (note that window lengths differed between seq1 and seq2). These segments, obtained from all 50 trials in response to a certain call, on a per-column and per-channel basis, were averaged, and the power of the resulting trace (stimulus-triggered average, StimTA) was normalized by the average power across individual segments (stimulus-triggered power, StimTP). Mathematically, this can be expressed as follows:

$$\text{StimFC}(f) = \frac{\varPsi(\text{StimTA})}{\frac{1}{n}\sum_{i=1}^{n} \varPsi(w_{i})},$$
(3)

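where Ψ denotes the power spectrum, w_i is the i-th LFP segment, and n is the number of segments. Terms are analogous to those in Eq. (2), except that for StimFC calculations LFP segments are not centered at spike times, but are temporally fixed to the stimulus onset.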
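Eq. (3) can be sketched in Python as follows. For simplicity, the sketch estimates power with a plain FFT periodogram rather than the multitaper method used in the study, so the values it returns are only a coarse approximation. The function names `power_spectrum` and `stim_fc`, and the input layout (one row per trial), are assumptions made for illustration.

```python
import numpy as np

def power_spectrum(x, fs):
    """Periodogram along the last axis (simplified stand-in for a multitaper estimate)."""
    x = np.asarray(x, dtype=float)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    power = np.abs(np.fft.rfft(x, axis=-1)) ** 2
    return freqs, power

def stim_fc(trials, fs):
    """Stimulus-field coherence following Eq. (3).

    trials : array of shape (n_trials, n_samples) with stimulus-aligned LFP segments.
    fs     : sampling rate in Hz.
    """
    trials = np.asarray(trials, dtype=float)
    # Numerator: power of the stimulus-triggered average (StimTA).
    freqs, p_stimta = power_spectrum(trials.mean(axis=0), fs)
    # Denominator: average power of the individual segments (StimTP).
    _, p_trials = power_spectrum(trials, fs)
    return freqs, p_stimta / p_trials.mean(axis=0)
```

Components with a consistent phase across trials survive the averaging and yield values close to 1, whereas components with a random phase cancel out in the StimTA and yield values close to 0.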

The StimFC was z-normalized to a surrogate distribution in which phase consistency across trials was destroyed as follows: for each trial, the LFP was split at a random time point and the resulting segments were swapped (see also García-Rosales et al. 2018a). As a result, surrogate StimTAs lacked the phase-locked components observed in the original data. Based on the manipulated LFP windows, a surrogate StimFC was calculated, representing coherence values at chance level. This procedure was repeated 500 times, and the observed StimFC, per column and channel, in response to either sequence, was z-normalized to the surrogate distribution. Coherence estimates derived from original and surrogate data were calculated using the same window length, and the power spectra were computed with the multitaper method using five tapers and a TW product of 8.
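The surrogate procedure can be sketched as below. The code is a simplified illustration under stated assumptions: coherence is estimated with an FFT periodogram instead of the multitaper method, and the names `periodogram_coherence`, `split_swap`, and `surrogate_z` are hypothetical. Any function that maps a trials array to a coherence spectrum can be passed as `metric`.

```python
import numpy as np

def periodogram_coherence(trials):
    """Power of the trial average divided by the mean single-trial power."""
    num = np.abs(np.fft.rfft(trials.mean(axis=0))) ** 2
    den = (np.abs(np.fft.rfft(trials, axis=-1)) ** 2).mean(axis=0)
    return num / den

def split_swap(trial, rng):
    """Split a trial at a random sample and swap the two resulting segments."""
    cut = rng.integers(1, trial.size)    # 1 .. size-1, so both segments are non-empty
    return np.concatenate([trial[cut:], trial[:cut]])

def surrogate_z(trials, metric, n_surrogates=500, rng=None):
    """Z-normalize the observed metric to a split-and-swap surrogate distribution."""
    rng = np.random.default_rng() if rng is None else rng
    trials = np.asarray(trials, dtype=float)
    observed = metric(trials)
    surrogates = np.empty((n_surrogates,) + observed.shape)
    for k in range(n_surrogates):
        # An independent random cut per trial destroys phase consistency across trials.
        shuffled = np.stack([split_swap(tr, rng) for tr in trials])
        surrogates[k] = metric(shuffled)
    return (observed - surrogates.mean(axis=0)) / surrogates.std(axis=0)
```

Because the swap preserves the samples of each trial, single-trial power is largely retained while the phase relation to stimulus onset is randomized, so the surrogate values approximate chance-level coherence.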