Introduction

Since the pioneering work by Regen (1913) demonstrated that female field crickets approach the male calling song even when transmitted via a telephone, crickets have become a well-studied model system in neuroethology due to the simple structure of their acoustic signals, their robust singing and phonotaxis behavior, and their rather simple nervous system, which allows analyzing neural mechanisms at the level of single, identified neurons (Huber et al. 1989; Gerhardt and Huber 2002).

Male crickets (Gryllus bimaculatus) rub their front wings together to produce a calling song consisting of chirps with stereotyped sequences of 3–5 pure-tone pulses. Females that are ready to mate recognize the conspecific song pattern and phonotactically approach the singing male or an experimental signal source, even when walking on a trackball system. As phonotaxis is selectively tuned to the temporal features of the conspecific calling song (Thorson et al. 1982; Doherty 1985; Hedwig 2006), female crickets are an ideal system to approach two fundamental questions in neurobiology related to signal processing and motor control: How is temporal selectivity of auditory processing established within the pattern-recognizing network, and how does the network finally trigger phonotactic behavior? For decades these questions have been a central topic in cricket neurobiology. In parallel to the analysis of phonotactic behavior and of the structural and functional organization of the auditory pathway, very different concepts such as template matching, band-pass filtering or delay-line coincidence detection were proposed to underlie or describe the neural mechanisms of temporal pattern recognition (Weber and Thorson 1989; Bush and Schul 2006). Most of these concepts rely on behavioral data and so far lack strong neural evidence to back up the suggested claims. With this review we aim to present the different concepts proposed for pattern recognition in crickets and to relate their validity to current neuronal evidence.

Pattern recognition based on an internal template

A recurring concept in cricket pattern recognition is the comparison of the acoustic signal with some form of an “internal template”. This may either be a reflection of the singing motor activity or it may correspond to rather vaguely defined functional properties of the pattern recognition network. The proposition is that the pulse pattern is recognized when its characteristics match the internal template (Fig. 1a).

Fig. 1

a Temporal processing by template matching. The temporal pattern of the auditory signal is compared with the activity of an internal template, e.g. derived from the central pattern generating network, or from a rhythmically active network that is part of the pattern recognition network. b Neurons ON1, AN1 and DN1 of the auditory pathway (left) and an ascending and a descending opener interneuron of the singing central pattern generator (right) occupy different regions in the cricket CNS. a Modified from Weber and Thorson (1989) and Bush and Schul (2006); b from Wohlers and Huber (1982) and Schöneich and Hedwig (2012)

Coupling of networks for song pattern generation and pattern recognition

Well before the neural circuits underlying singing and auditory processing were understood, one of the early concepts of auditory pattern recognition proposed that signal processing and central pattern generation for singing could be coupled at the neural network level. Studying acridid grasshoppers, Haskell (1956) suggested that the discrimination mechanism for song patterns must be housed in the central nervous system, as auditory afferent recordings in four species revealed a similar response pattern when the animals were exposed to the different species-specific songs. As both male and female grasshoppers stridulate, Haskell suggested that pattern recognition may be established by a mechanism that compares an internal efference copy (von Holst and Mittelstaedt 1950), generated by the singing motor system, with the auditory afferent signal to enable discrimination between the songs. In a similar way Alexander (1962) suggested that in crickets the neural components necessary for song production may also reside in the mute females. He based this assumption on behavioral studies indicating that aggressive females perform silent wing movements corresponding to the rivalry singing of fighting males (Huber 1962). Alexander subsequently speculated that the structure of the central neural networks for singing might be linked with the structure of the networks for pattern recognition, as this would also simplify evolutionary change: signaler and receiver could evolve together and guarantee a persisting communication system. Akin to these premises, Hoy (1978) proposed that a feature detector for the conspecific song pattern could be established by comparing the auditory neural input pattern with an internal template. This template for auditory feature detection was suggested to be a corollary discharge from the singing CPG, which would be active at a low level even in the mute female crickets. However, to what degree do the neural networks for auditory processing and singing overlap? Recent progress in the characterization of the singing CPG (Schöneich and Hedwig 2011, 2012) now allows a detailed comparison with the auditory pathway.

In males and females, about 40–60 primary afferents forward auditory activity from the hearing organ in the tibiae of the forelegs to the auditory neuropil within the ventral, anterior prothoracic ganglion. The axonal arborizations of the afferents synapse with local, ascending and descending auditory interneurons (Wohlers and Huber 1982; Schildberger et al. 1989). While the local omega neurons (ON1 and ON2) connect the left and right sides of the auditory neuropil, the two ascending neurons (AN1, AN2) project directly towards the brain; two descending neurons (TN and DN1) send axons towards the mesothoracic ganglion (Fig. 1b). Thus, the main neuropil structures and neurons of the auditory pathway are housed in the prothoracic ganglion and in the brain. How does this relate to the singing motor pathway?

Male crickets sing with rhythmic opening–closing movements of the forewings. Although the brain controls singing behavior via descending command neurons (Bentley 1977; Hedwig 2000), the temporal motor pattern underlying the rhythmic wing movements for sound production is established by neural networks in the thoracic and abdominal ganglia (Fig. 1b) (Schöneich and Hedwig 2011, 2012). Each wing movement is controlled by a pool of opener and closer motoneurons with dendrites located in the dorsal neuropil of the mesothoracic ganglion. Although it is tempting to assume that the singing CPG is also housed within that ganglion, recent experiments demonstrate that the metathoracic (Hennig and Otto 1996) and, moreover, the abdominal ganglia play a crucial role in central pattern generation for singing. Severing the abdominal connectives in front of the third abdominal ganglion causes an immediate stop of pharmacologically induced singing (Schöneich and Hedwig 2011). Metathoracic interneurons that are activated in phase with the opener motoneuron activity extend along the chain of abdominal ganglia. Most importantly, the cell bodies and dendrites of neurons which reset the chirp pattern and which are part of the singing CPG are housed in the third abdominal ganglion, with axonal projections towards the mesothoracic motoneurons (Schöneich and Hedwig 2012). Thus, the interneuronal network for singing motor pattern generation in the male is distributed between the metathoracic and the first free abdominal ganglia. Apart from incidental observations of singing-like wing movements in females (Huber 1962; Hoy 1978), there is no information as to what degree the singing CPG neurons are also present and functional in the female CNS.

Besides genetic evidence linking pattern recognition and song pattern generation (Hoy et al. 1977), so far the only link at the neuronal level between the singing network and the auditory pathway is established by a corollary discharge interneuron that mediates inhibition to auditory interneurons when the singing CPG is activated in males (Poulet and Hedwig 2006). This inhibition maintains the responsiveness of the auditory interneurons during singing; the corollary discharge interneuron itself, however, does not respond to stimulation with the calling song and thus cannot interfere with pattern recognition. Accordingly, there is no convincing neural evidence that the singing motor network provides a calling song template for auditory pattern recognition in female crickets. Also, the calling song command neurons which descend from the brain do not carry information about the pulse pattern but rather show unpatterned tonic activity (Hedwig 2000). As the organization of the singing network within the CNS is very different from that of the auditory pathway, a direct coupling between the networks for pattern recognition and singing motor pattern generation appears unlikely, although it cannot be excluded.

Cross-correlation with an internal template

Pollack and Hoy (1979) tested the mechanism of template matching in T. oceanicus females. The male calling song consists of a chirp and a trill section with different pulse intervals. Females, however, showed positive phonotaxis to song patterns even when the specific temporal order of the correct pulse intervals of chirps and trills was shuffled and randomized. This contradicts a mechanism based on matching the acoustic pattern to a specific internal template of the song structure by a cross-correlation-like process and rather points to the processing of single pulse periods. Rapid steering to pairs of sound pulses has recently been demonstrated in Teleogryllus with a fast trackball system (Cros and Hedwig 2014).

Hennig (2003) proposed that pattern recognition may be based on cross-correlation processing of the external sound pattern with an internal template. Hennig based this model on phonotaxis experiments in two cricket species that revealed different preferences for temporal patterns. Phonotaxis in T. oceanicus (chirps with a pulse period of 60–70 ms) is based on a pulse-period filter, whereas T. commodus (chirps with a pulse period of 50–60 ms) uses a pulse-duration filter. This means that behavioral selectivity in T. oceanicus is strongly affected by a change in pulse period while different pulse durations have only a marginal effect on behavior; the opposite is true for T. commodus. The cross-correlation model proposes that the similarity of the acoustic pulse pattern with the internal template is determined over a relatively short, specific time window. According to the behavioral data, both species seem to share a similar internal template, but the evaluation time window differs significantly: it is 180–400 ms for T. oceanicus and 90–160 ms for T. commodus. These two time windows are within the range that is also used for auditory analysis by acridid grasshopper species (Ronacher et al. 2000). The data indicate that the differences in temporal selectivity of the two Teleogryllus species rely on differences in the evaluation time window. A change of only one parameter, instead of a major change in the internal template, would provide an evolutionary mechanism to drive species-specific pattern recognition underlying speciation and the maintenance of species isolation. Such a cross-correlation mechanism would also guarantee that the responses of filter neurons do not depend on sound intensity, a substantial requirement of pattern recognition, as also pointed out by Schildberger (1984).
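A minimal numerical sketch of this idea is given below; the binary song envelopes, the shape of the hypothetical template and the two evaluation windows are illustrative assumptions, not values taken from Hennig (2003). The correlation is evaluated at a single alignment, i.e. without a trigger or lag search.

```python
# Minimal sketch of template matching by cross-correlation over a limited
# evaluation time window (cf. Hennig 2003). All numbers are illustrative
# assumptions, not parameters taken from the cited studies.
import numpy as np

DT = 1.0  # time step in ms

def pulse_train(pulse_ms, period_ms, total_ms):
    """Binary sound envelope: 1 during a pulse, 0 during the interval."""
    t = np.arange(0.0, total_ms, DT)
    return ((t % period_ms) < pulse_ms).astype(float)

def windowed_correlation(signal, template, window_ms):
    """Normalized cross-correlation, evaluated only over one time window."""
    n = int(window_ms / DT)
    s, m = signal[:n], template[:n]
    return float(np.dot(s, m) / (np.linalg.norm(s) * np.linalg.norm(m) + 1e-12))

# Hypothetical internal template: 20 ms pulses repeated with a 40 ms period.
template = pulse_train(pulse_ms=20, period_ms=40, total_ms=400)

# Compare songs with different pulse periods against the template, using a
# short (120 ms) and a long (360 ms) evaluation window.
for period in (30, 40, 50, 60, 80):
    song = pulse_train(pulse_ms=20, period_ms=period, total_ms=400)
    short = windowed_correlation(song, template, window_ms=120)
    long_ = windowed_correlation(song, template, window_ms=360)
    print(f"period {period:2d} ms  short window: {short:.2f}  long window: {long_:.2f}")
```

In this toy version the longer evaluation window discriminates more sharply between matching and non-matching pulse periods, which is the core of the argument that the size of the evaluation window can set the temporal selectivity.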

Although it is not clear how internal templates and processing by cross-correlation are implemented in the pattern recognition network, Hennig suggests that oscillatory properties of neurons or networks operating at physiologically plausible time scales could shape both the template pattern and the required evaluation time window, as indicated by the data of Crawford (1997) and Hutcheon and Yarom (2000). Depending on the time window, only parts of the external pattern and internal template are compared, providing different temporal selectivity. However, even the simplest version of cross-correlation requires a trigger mechanism to align the neural response to the initial sound pulses and to achieve an appropriate comparison of the external pattern and the internal template, for example by an oscillating network (Bush and Schul 2006). So far there is no neural evidence that supports such a trigger, and neural evidence in favor of an internal template in the auditory pathway of crickets is lacking.

Template matching based on Gabor filters

More recently, Clemens and Hennig (2013) proposed a mechanism for pattern recognition based on two feature detectors derived from linear–nonlinear computational models. The linear filter part acts as a template that is compared to the envelope of a song signal by cross-correlation. The result of this computation is then transformed by the nonlinearity and integrated over a given time window to obtain a feature value for the song pattern. The linear filters, or templates, can be much shorter than the actual calling song signal. These filters can be well approximated by Gabor filters (Smith and Lewicki 2006; Priebe and Ferster 2012), i.e. a sine wave multiplied by a Gaussian function. The positive and negative lobes of a Gabor filter can be viewed as “excitatory” and “suppressive” weights, and it is possible to adjust these so that the output of the computational process matches the female phonotactic preference function. By adapting the filter functions and the nonlinearities, the model allows calculating templates and feature detectors for the variety of temporal patterns of insect songs. An application of this model to large behavioral data sets of two cricket species shows that Gabor filters can describe the specific behavioral preference functions. For example, G. locorojo appears to employ two band-pass Gabor filters, whereas G. bimaculatus revealed one band-pass Gabor filter and one filter for longer pulse periods. Specific parameters of the filters were related to the shape of the preference functions, as the model's preference functions shifted when parameters of the Gabor filters were adjusted accordingly.
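The following sketch illustrates such a linear–nonlinear computation with a Gabor-shaped template; the filter parameters and the choice of a simple rectifying nonlinearity are assumptions made for illustration, not the filters fitted by Clemens and Hennig (2013).

```python
# Sketch of a linear-nonlinear (LN) feature detector with a Gabor filter as
# template (cf. Clemens and Hennig 2013). Filter parameters and the choice of
# nonlinearity are illustrative assumptions, not fitted values.
import numpy as np

DT = 1.0  # ms per sample

def gabor(length_ms, sigma_ms, period_ms, phase=0.0):
    """Gabor filter: a sine carrier multiplied by a Gaussian envelope."""
    t = np.arange(-length_ms / 2, length_ms / 2, DT)
    return np.exp(-t**2 / (2 * sigma_ms**2)) * np.sin(2 * np.pi * t / period_ms + phase)

def song_envelope(pulse_ms, period_ms, total_ms):
    t = np.arange(0.0, total_ms, DT)
    return ((t % period_ms) < pulse_ms).astype(float)

def feature_value(envelope, filt, window_ms):
    """Cross-correlate envelope and filter, rectify, integrate over a window."""
    filtered = np.correlate(envelope, filt, mode="valid")   # linear stage
    rectified = np.maximum(filtered, 0.0)                   # static nonlinearity
    n = int(window_ms / DT)
    return float(rectified[:n].mean())                      # integration window

# Hypothetical filter tuned to a ~40 ms pulse period; much shorter than the song.
filt = gabor(length_ms=80, sigma_ms=20, period_ms=40)

for period in (25, 40, 60, 90):
    env = song_envelope(pulse_ms=20, period_ms=period, total_ms=600)
    print(f"pulse period {period:2d} ms -> feature value {feature_value(env, filt, 400):.3f}")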

Due to the computational versatility of linear–nonlinear models and Gabor filters, the concept provides a phenomenological description of how, in principle, the selective responses of the behavior may be formed. It also provides computational algorithms for pattern recognition, especially regarding the selectivity of specific chirp-period filters, as integration occurs over longer time scales. Although the concept does not provide suggestions for the actual structural and functional organization at the neuronal level, detailed experimental studies of the pattern recognition process may now allow comparing this theoretical framework with the neurophysiological evidence. As both the computational model and the pattern recognition network are adapted to solve the same problem of temporal processing, and as the model may reflect the overall possible operations in the auditory pathway, similarities between neural data and the filter procedures can be expected. Modeling approaches can be useful to corroborate theoretical concepts like Gabor filters, and they may indicate which principal mechanisms could be appropriate, but in any case they will need well-grounded neurophysiological validation.

In the bushcricket Tettigonia cantans, experiments suggest pulse-rate recognition by oscillatory neurons. Females responded phonotactically to signals with half the normal pulse rate and also to shifted patterns when these were in phase with the standard pattern. Thus, in this species, oscillations of central neurons that resonate with the pulse period of the acoustic signal may form a template that is crucial for the pattern recognition process (Bush and Schul 2006).

Temporal pattern recognition by filter neurons in the brain

To unravel the function and organization of the pattern recognizing network it is crucial to analyze the responses of high-order interneurons involved in signal processing that match the characteristic behavioral response (Bullock 1961). With the first systematic study of auditory brain neurons Schildberger (1984) provided neural evidence for a different mechanism of pattern recognition. His experiments suggested that the band-pass tuning of female phonotaxis in G. bimaculatus is a result of specific low-pass and high-pass neurons that in combination shape the activity of band-pass neurons which finally establish the selectivity of the behavior (Fig. 2a).

Fig. 2

a Pattern recognition based on low-pass and high-pass filter neurons in the brain, the outputs of which are combined into a band-pass filter matching phonotactic behavior. b Neural evidence for band-pass processing by auditory interneurons. Note the different response dynamics of band-pass neurons as compared to low-pass and high-pass neurons. c Tuning curves of the different filter neurons; the tuning of the band-pass neuron matches the tuning of female phonotaxis. Modified from Schildberger (1984)

The ascending neuron AN1 is not tuned to the temporal pattern of the male calling song

Only two identified interneurons (AN1 and AN2) ascend from the prothoracic ganglion to the brain (Wohlers and Huber 1982). The axonal arborizations of AN1 and AN2 form a ring-like structure within the anterior protocerebrum. AN1 is tuned to the carrier frequency of the male calling song and is a key neuron for phonotaxis whereas AN2 may play a minor role (Schildberger and Hörner 1988). AN2 is tuned to high frequencies and is important for negative phonotaxis when crickets respond to calls of predatory bats (Moiseff and Hoy 1983). Schildberger analyzed the responses of AN1 and of local brain neurons to a range of pulse-repetition rates. He varied the pulse periods within the chirps according to the paradigm by Thorson et al. (1982) maintaining a constant sound energy for all pulse periods.

When crickets were stimulated with different pulse periods (5 kHz, 80 dB SPL), interneuron AN1 responded with about 40 APs (action potentials) per chirp (see also Wohlers and Huber 1982); at sound intensities of 60 and 70 dB SPL, AN1 generated about 20 and 30 AP/chirp, respectively, at all pulse periods. This indicates that AN1 simply copies any acoustic stimulation pattern and forwards it to the brain (Fig. 2b). This is strong evidence that no temporal filtering occurs at the level of the thoracic ganglia and that temporal pattern recognition must occur in the brain. Similar to AN1, AN2 is also not selective for specific pulse periods (Wohlers and Huber 1982); however, a recently reported response decrement in AN2 may correlate with phonotaxis (Stout et al. 2011; Samuel et al. 2013).

Low-pass and high-pass neurons may shape the response of band-pass neurons

According to their response properties and structure, brain neurons were divided into two classes (Schildberger 1984, 1985). The first class of neurons (BNC1) revealed arborizations within the anterior protocerebrum overlapping with the ascending interneurons. The second class of neurons (BNC2) exhibited more posterior arborizations, mainly within the deutocerebrum. Some BNC1 neurons have projections that overlap with the arborizations of BNC2 neurons. Generally, BNC1 neurons revealed a phasic response to the sound pulses of chirps, whereas BNC2 neurons responded with a rather tonic pattern of APs over the entire chirp. When tested with the pulse period paradigm, BNC1 cells either responded unselectively, similar to AN1 (BNC1a), or, as in the case of BNC1d, revealed a stronger response at longer pulse periods, i.e. low pulse rates. In contrast, BNC2 cells responded more strongly to shorter pulse periods (BNC2d) or revealed band-pass characteristics with maximal responses at the pulse period of the calling song, as evident in BNC2a. According to these data, BNC1d would represent a low-pass filter neuron and BNC2b a high-pass filter neuron. The conclusion drawn was that the combined activity of BNC1d and BNC2b shapes the band-pass response of BNC2a, which finally corresponds to the tuning of the phonotactic behavior and may represent the last stage of pattern recognition underlying phonotaxis (Fig. 2b, c). A similar concept had been proposed by Rose and Capranica (1983) for temporal filtering in the anuran brain. Although Schildberger's data are appealing for explaining pattern recognition, some crucial questions on how the response of the band-pass neuron is shaped remained open. The structure of BNC1d shows no overlap with the arborizations of BNC2a, and BNC1d is not tuned to the carrier frequency of the male calling song but rather to high sound frequencies. We would not expect a high-frequency neuron to be crucially involved in pattern recognition, as high-frequency processing is related to negative phonotaxis (Moiseff and Hoy 1983). Also, BNC1d responds with spikes only to the first sound pulse of a chirp, while the band-pass neuron BNC2a responds with action potentials over the entire duration of the calling song. Furthermore, at high pulse rates the strong activity of the high-pass neuron BNC2b does not lead to activity in the band-pass neuron BNC2a. At the medium pulse rates of the calling song, however, the activity of BNC2b apparently drives the band-pass neuron over the entire chirp, although there is no response in the low-pass neuron BNC1d after the first sound pulse. This indicates that at high pulse rates some inhibitory mechanism may restrain the activity of BNC2a. Notwithstanding these open points, these experiments were an important step in identifying auditory brain neurons and in comparing neural response patterns with a theoretical concept.
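The logic of combining a low-pass and a high-pass element into a band-pass output can be illustrated with a toy calculation; the sigmoidal tuning curves and the multiplicative combination below are assumptions chosen only to visualize the principle, not measured response functions or the actual synaptic operation.

```python
# Toy illustration of Schildberger's (1984) scheme: a band-pass response
# emerges when low-pass and high-pass filter outputs are combined, here by
# simple multiplication (a soft AND). The sigmoidal tuning curves are
# hypothetical, chosen only to make the principle visible.
import numpy as np

pulse_periods = np.arange(10, 101, 10)           # ms

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

low_pass = sigmoid((pulse_periods - 30) / 5.0)   # responds to long pulse periods (low rates)
high_pass = sigmoid((50 - pulse_periods) / 5.0)  # responds to short pulse periods (high rates)
band_pass = low_pass * high_pass                 # strong only where both are active

for p, lp, hp, bp in zip(pulse_periods, low_pass, high_pass, band_pass):
    print(f"{p:3d} ms  low-pass {lp:.2f}  high-pass {hp:.2f}  band-pass {bp:.2f}")
```

With these assumed curves the product peaks near the calling-song pulse period, which is the essence of the proposed low-pass/high-pass combination; it does not address the open points raised above about how such a combination is implemented synaptically.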

Autocorrelation: processing by a delay-line and coincidence-detection mechanism

An early model by Weber and Thorson (1989), based on Reiss's (1964) concept of resonant neural networks, suggests processing by a specific delay-line and coincidence-detector mechanism. The principal premise is that auditory activity within the CNS is processed via a direct and a delayed pathway, equivalent to an autocorrelation. Pattern recognition is established when the pulse period of the sound pattern corresponds to the internal neural delay of the second pathway, as the activity of both pathways is integrated by a coincidence detector (Fig. 3). If the response of the delayed pathway coincides with the response of the direct pathway to the subsequent pulse, both generate a consistently stronger response in the coincidence detector. According to autocorrelation processing, phonotactic behavior should also occur at multiples of the pulse period; however, such responses were never observed during phonotaxis in crickets or in neural recordings.
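A minimal sketch of this autocorrelation-like scheme is given below; the binary sound envelope, the 40-ms internal delay and the multiplicative coincidence rule are illustrative assumptions, not parameters from Weber and Thorson (1989).

```python
# Minimal delay-line / coincidence-detection sketch (cf. Weber and Thorson
# 1989): the auditory response is fed forward directly and via an internal
# delay; a coincidence detector multiplies both copies. Numbers are
# illustrative assumptions.
import numpy as np

DT = 1.0           # ms
DELAY_MS = 40      # hypothetical internal delay = species-specific pulse period

def pulse_train(pulse_ms, period_ms, total_ms):
    t = np.arange(0.0, total_ms, DT)
    return ((t % period_ms) < pulse_ms).astype(float)

def coincidence_response(envelope, delay_ms):
    delayed = np.roll(envelope, int(delay_ms / DT))
    delayed[: int(delay_ms / DT)] = 0.0          # no signal before the delay has elapsed
    return float(np.sum(envelope * delayed))     # coincidence = product of both pathways

for period in (20, 30, 40, 50, 60, 80):
    env = pulse_train(pulse_ms=20, period_ms=period, total_ms=600)
    print(f"pulse period {period:2d} ms -> coincidence output {coincidence_response(env, DELAY_MS):6.1f}")
```

In this toy version a pulse period that divides the assumed internal delay evenly (e.g. 20 ms) drives the detector as strongly as the matching 40-ms period, illustrating the side effect of pure autocorrelation noted above, which is not reflected in the behavior.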

Fig. 3

The delay-line and coincidence detection concept for temporal pattern recognition is based on a direct and a delayed pathway feeding into a coincidence detector. If the auditory response is delayed by the species-specific pulse period the coincidence detector responds best to the species-specific pulse interval, providing an autocorrelation-like processing, modified after Weber and Thorson (1989)

In a classical sense, delay lines are considered to be implemented by specific anatomical adaptations such as extended axonal projections over longer distances (Carr and Konishi 1988). Axonal extensions for a 40-ms delay, however, would require an additional length of 40 mm at a conduction velocity of 1 m/s; they would not be economical and are unlikely to be present within the small brains of insects. Reiss (1964) argued that in resonant networks the long delay lines required for the processing of communication signals may instead result from inhibitory mechanisms, as apparent in reciprocal inhibition. Our recent neural data suggest that temporal filtering of species-specific pulse patterns is established by fast interactions between inhibition and excitation and that neural processing at least shares functional similarities with the delay-line and coincidence-detector concept (Zorovic and Hedwig 2011; Kostarakos and Hedwig 2012).

Evidence for a local pattern recognition network in the brain

In order to approach a deeper understanding of the mechanism for temporal filtering, we recorded the responses of auditory brain neurons to different pulse patterns and compared these with the tuning of female phonotactic behavior (Kostarakos and Hedwig 2012). Following an approach by Zorovic and Hedwig (2011), we focused intracellular recordings on the vicinity of the axonal output region of AN1, where we expected the crucial steps of auditory processing to occur. In this region we identified local brain neurons (Fig. 4a) that respond to the calling song. All neurons have in common that they are located within the anterior protocerebrum, with a cell body next to the optic nerve. Their neurites form a ring-like structure that overlaps with the axonal arborizations of the ascending interneuron AN1. One of these neurons (B-LC3) has an axon projecting to the contralateral side of the brain. This connection between the two hemispheres may guarantee pattern recognition in the brain independent of the direction of the sound. The arborizations of neurons B-LI2 and B-LI4 are restricted to one side of the brain, and the latter reveals axonal projections towards the midline with a beaded appearance typical of axons. The anatomical data provide evidence that these neurons form an auditory neuropil linked to a very early stage of auditory processing in the brain. Despite their structural similarity, the neurons showed clearly differing response properties.

Fig. 4

a Identified local auditory brain neurons (B-LI2, B-LC3 and B-LI4) form a ring-like arborization pattern in each half of the protocerebrum and match the output structures of the ascending neuron AN1. b Response patterns of the interneurons to three different pulse periods; the pulse period of 34 ms corresponds to the species-specific calling song. c Comparison between the tuning of phonotactic behavior (gray lines) and the response of the local neurons (black lines). Whereas B-LI2 just copies the patterns, the activity of B-LC3 shows some tuning and the activity of B-LI4 matches the tuning of phonotaxis. d Relative responses of phonotaxis and brain neurons plotted against pulse duration (abscissa) and pulse interval (ordinate). Note the different tuning and the close match between phonotaxis and the response pattern of B-LI4. a From Kostarakos and Hedwig (2012) with permission

Inhibition and excitation shape the temporal selectivity of auditory brain neurons tuned to the phonotactic behavior

We used three different temporal paradigms to test the selectivity of the phonotactic behavior and to compare this with the tuning of the brain neurons (see Kostarakos and Hedwig 2012 for details). One paradigm corresponded to the pulse period paradigm used by Thorson et al. (1982). All presented brain neurons revealed a clear phasic response to individual sound pulses of the calling song (Fig. 4b, c). We found no evidence for ongoing rhythmic oscillations in any of these auditory brain neurons.

Neuron B-LI2 responded with rather short-latency postsynaptic potentials (PSPs; 21.2 ms) and followed the activity pattern of AN1. Accordingly, this neuron revealed no selectivity for the temporal features of the song and is likely to receive direct input from AN1. B-LC3 was tuned to the temporal features of the calling song but still responded with 30–40 % of its maximum response to pulse rates higher or lower than that of the calling song. Interestingly, at the species-specific pulse period it responded more strongly to the second, third and fourth sound pulses of a chirp. These data indicate that B-LC3 may be the first neuron within the pattern-processing network that exhibits temporal selectivity for features of the calling song. Whereas neurons B-LI2 and B-LC3 summed excitatory inputs in response to sound pulses, neuron B-LI4 integrated inhibitory and excitatory synaptic inputs. B-LI4 responded with inhibition to single sound pulses or to the first pulse of a calling song chirp, and only the second and consecutive sound pulses elicited spikes. The inhibitory response was pronounced at long pulse durations, as occur during long pulse periods or in paradigms with very short pulse intervals. Thus the strongest selectivity occurred in the activity of B-LI4, which was inhibited by pulse patterns with high or low repetition rates and responded with action potentials only within a narrow range of pulse periods corresponding to the temporal range of the calling song. Interestingly, at a constant pulse period of 40 ms the neuron B-LI4 also shows a high selectivity for pulse duration (Kostarakos and Hedwig 2012), indicating that pattern recognition is not just based on a pulse-period filter/detector as suggested by Thorson et al. (1982).

When comparing the tuning to pulse intervals and pulse durations for the different neurons (Fig. 4c), B-LI4 showed a nearly perfect match with the temporal selectivity of phonotactic behavior. Based on its high response selectivity it may be considered the feature detector for the calling song and may represent the last stage of pulse pattern recognition. This indicates that pattern recognition is already established at a very early stage of auditory processing within the anterior protocerebrum of the cricket brain.

Temporal selectivity at the chirp level

Cricket phonotaxis also depends on the chirp period and the number of pulses that comprise a chirp (Doherty 1985). In preliminary experiments, in which the local brain neurons were stimulated with chirps of different pulse numbers while the chirp period was kept constant, at least one neuron showed response properties that reflect a tuning to chirp duration. To single sound pulses, neurons B-LI2 and B-LC3 respond with suprathreshold excitation, whereas B-LI4 receives an inhibition (Fig. 5a). Overall, the spiking response of the neurons increases with the number of sound pulses, and in B-LI2 and B-LC3 this increase is almost linear (Fig. 5b). The phonotactic behavior, however, is strongest towards chirps with 4 pulses and then declines again. This tuning of phonotaxis is reflected in the spike activity of the local brain neuron B-LI4, where the spike response towards chirps with 3–5 pulses matches the tuning of the behavior and then reaches a plateau; note that the increase for chirps with 6 pulses corresponds to only 0.5 AP. This finding is similar to Schildberger's (1985) report that band-pass neurons generated the same number of APs to calling song chirps once the pulse number within a chirp was 5 or higher. As the B-LI4 response at least partially matches the tuning of the behavior, the response of this neuron not only reflects phonotactic tuning towards the pulse pattern (Fig. 4c) but may also reflect mechanisms involved in filtering the chirp duration. Our data suggest that inhibition contributes to shaping the selective response of these brain neurons.

Fig. 5

a Responses of local brain neurons to chirps comprising a different number of sound pulses. Single pulses elicit an excitatory response in B-LI2 and B-LC3 and an inhibition in B-LI4. For chirps with 6 pulses the response of B-LC3 and B-LI4 is best for pulses 2–4 and then decays. b The response magnitude of phonotaxis is given relative to the maximum response and the response of neurons as AP/chirp. Phonotaxis is strongest to chirps with 4 pulses (gray line). The response of the neurons (black lines) gradually shows a better match to the tuning of the behavior and overall is best for the activity of B-LI4

Temporal selectivity is linked to sparse coding

In Fig. 6a we arranged the activity of the local brain neurons in response to a calling song chirp according to their PSP latencies, which indicates the flow of information within the network. Since the PSP latencies of B-LC3 and B-LI2 are similarly short (21.1 and 21.2 ms, respectively), we assume that these neurons may receive direct input from AN1. The more complex response pattern of B-LI4 will require differently timed inhibitory and excitatory inputs (IPSP latency = 25.7 ms). We speculate that B-LI2 may mediate inhibitory inputs to the network, especially to B-LI4.

Fig. 6

Sparse coding in neurons of the pattern recognition system. Original recordings (a) and PST histograms (b). Whereas the spike latency of the auditory response increases from 22.6 ms in B-LI2 to 35.6 ms in B-LC3 and 46.7 ms (or 34.1 ms to the second sound pulse) in B-LI4, the spike activity in response to the species-specific sound pattern decreases from AN1 to B-LI4 by about 90 %

Temporal selectivity of the neurons increases from B-LI2 to B-LC3 to B-LI4 (Fig. 4). This increase in selectivity corresponds to a reduction in spike activity, as evident in Fig. 6b. The absolute spike activity decreased from AN1 to B-LI4 by nearly 90 %. The neuron that receives inhibitory inputs and shows the best match with behavior also shows the lowest spike activity. This continuous decrease of spike activity corresponds to concepts of sparse coding and the transformation to a place code (Olshausen and Field 2004). According to this concept, sensory information is encoded by a small number of specific neurons, ensuring a robust neural representation of the pattern with an energetic advantage.
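The arithmetic behind this reduction can be made explicit with a short calculation; only the AN1 spike count (about 40 AP/chirp at 80 dB SPL, see above) and the overall reduction of nearly 90 % are taken from the data, while the intermediate counts are hypothetical placeholders.

```python
# Illustrative arithmetic for the decrease of spike activity along the pathway.
# Only the AN1 count (~40 AP/chirp) and the ~90 % overall reduction come from
# the data described in the text; the intermediate counts are hypothetical.
spikes_per_chirp = {"AN1": 40, "B-LI2": 25, "B-LC3": 12, "B-LI4": 4}  # AP/chirp

reference = spikes_per_chirp["AN1"]
for neuron, count in spikes_per_chirp.items():
    reduction = 100.0 * (1.0 - count / reference)
    print(f"{neuron:6s} {count:3d} AP/chirp  reduction vs AN1: {reduction:4.0f} %")
```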

We have some preliminary evidence of a presumably non-spiking neuron responding to individual sound pulses with an inhibition followed by a depolarization. The depolarization seems to occur after a specific time and coincides with responses of AN1 to the consecutive sound pulses of the calling song, but not at longer or shorter pulse periods.

Neural evidence supporting the delay-line concept

In order to test if an internal delay line could be involved in auditory processing, we stimulated the auditory neurons with sound pulses of gradually increasing pulse intervals; in Fig. 7a we compare the activity of AN1 with the responses of B-LC3. The first and last pulse intervals (21 ms) correspond to the pulse interval of the calling song. AN1 revealed a very similar response in terms of AP/pulse and spike rate to all presented sound pulses, independent of the preceding pulse interval. In contrast to AN1, the activity of B-LC3 strongly depended on the preceding pulse interval. In response to song pulses following a silent interval longer than 71 ms, the neuron generated only 1–2 AP/pulse. However, when the preceding pulse interval corresponded to the pulse interval of the calling song, B-LC3 responded with considerably stronger activity of 3–4 AP/pulse and a transient spike rate of 200 AP/s. Its activity decreased again as the pulse interval increased to 31 and 41 ms. At the end of the sequence, when the pulse interval again corresponded to that of the calling song, the response of B-LC3 was again enhanced.

Fig. 7

a Evidence for processing by a delay line and coincidence detection in the auditory pathway. The ascending neuron AN1 responds to all sound pulses with different intervals in a similar way. The response of the local interneuron B-LC3 is strongest to sound pulses that are preceded by an interval of 21 ms. b Responses of B-LI4 to pairs of pulses with different intervals demonstrate that the interneuron responds after the species-specific pulse interval. Second sound pulses of 20 ms (blue) and 50 ms (red) duration elicit the same excitatory response, which is merely shifted by 30 ms when they follow an initial 50-ms pulse. Paired with a 50-ms pulse, a 20-ms pulse (green) elicits the same excitatory response as a 50-ms pulse (red). a From Kostarakos and Hedwig (2012) with permission

In a similar way, Fig. 7b shows responses of B-LI4 when challenged with two pulses of 20 ms (blue line) and/or 50 ms (red line) duration, separated by an interval of 20 ms. The response to the first pulse was inhibitory, but the second pulse elicited a clear excitatory PSP and a spiking response, demonstrating a fundamental change in activity. When the duration of the first pulse was increased to 50 ms, the duration of the inhibition was extended and the excitatory response to the subsequent 20-ms pulse was shifted by exactly 30 ms. Thus, an increase of the first pulse duration by 30 ms shifted the response latency to the second pulse by exactly this value. This indicates that the network involves a specific time constant that is triggered by the end of the preceding sound pulse, independent of its duration (within the range tested). When the recordings are aligned to the end of the first sound pulse (bottom), it becomes obvious that a critical time window for a response to a second sound pulse is triggered by the end of the first sound pulse and not by its beginning. Furthermore, the response to the second pulse appears to be independent of the duration of the second pulse; pulses of either 20 ms (green and blue lines) or 50 ms (red line) elicit a very similar excitatory response. This may indicate a filtering for pulse duration in the network.
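A toy model of this offset-triggered timing is sketched below; the assumed internal delay after pulse offset is a hypothetical value chosen to reproduce the qualitative observation, not a measured time constant of B-LI4.

```python
# Toy model of the offset-triggered response window of B-LI4 (cf. Fig. 7b):
# each sound pulse is assumed to evoke an inhibition that outlasts the pulse
# by a fixed internal delay; excitation to a later pulse only becomes
# suprathreshold once this delay has elapsed. Durations are assumptions.
DELAY_AFTER_OFFSET_MS = 20   # hypothetical internal delay triggered by pulse offset

def response_onset(first_pulse_ms, interval_ms):
    """Earliest time (re stimulus onset) of a response to the second pulse."""
    window_opens = first_pulse_ms + DELAY_AFTER_OFFSET_MS   # counted from first-pulse offset
    second_pulse_onset = first_pulse_ms + interval_ms
    return max(window_opens, second_pulse_onset)

# Lengthening the first pulse from 20 to 50 ms shifts the predicted response
# to the second pulse by exactly 30 ms, matching the observation for B-LI4.
for first in (20, 50):
    print(f"first pulse {first} ms -> response to second pulse at "
          f"{response_onset(first, interval_ms=20)} ms after stimulus onset")
```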

These data suggest that temporal pattern recognition is shaped by fast processing of excitation and inhibition. The selective response of the local auditory brain neurons to pulses occurring after the species-specific pulse interval could correspond to the model of a delay line and coincidence detector, which would be most strongly activated by two pulses occurring with an interval that matches the internal species-specific delay. If the presumably non-spiking interneuron were part of the delay line, it could explain the stronger activity to the consecutive sound pulses in B-LC3 and B-LI4. However, the functional significance of this neuron awaits further analysis.

Conclusions

Is there a delay-line and coincidence detector for pattern recognition?

Pattern recognition in crickets occurs within the anterior protocerebrum of the brain. The selective responses of some auditory brain neurons for temporal features of the song match the temporal selectivity of behavior. This temporal filtering appears to be a result of fast interactions between excitation and inhibition. We found no evidence for ongoing oscillations in the network. However, we still do not understand how temporal filtering is established in the local brain neurons that match the tuning of phonotaxis. Our neural data may indicate that this selectivity is shaped by an autocorrelation-like processing based on a delay-line and coincidence detection mechanism as outlined by Weber and Thorson (1989).

The response of B-LI4 is formed by integrating inhibitory and excitatory inputs. The importance of the interaction between inhibition and excitation, depending on the pulse rate, has also been shown in anurans (Edwards et al. 2007; Rose et al. 2011). Buonomano (2000) demonstrated in network simulations that cells can be tuned to respond selectively to different intervals by changing the weights of their synaptic inputs. Long-lasting inhibition can establish a delay line necessary for coincidence detection with a second event after a specific pulse interval. In his simulations, delays generated by long-lasting inhibition can last up to 200 ms and are, therefore, more than sufficient to establish a delay of 40 ms corresponding to the pulse period of the cricket song. Recently, local brain neurons in bushcrickets have been shown to maintain an extended inhibition correlating with the male–female communication time interval (Ostrowski and Stumpner 2013).
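A strongly simplified sketch of this idea is given below: a pulse drives the cell only if it arrives within a short coincidence window that opens once the inhibition triggered by the preceding pulse has ended. Both durations are hypothetical, and the scheme abstracts away the synaptic dynamics of Buonomano's (2000) simulations.

```python
# Sketch of interval tuning by a long-lasting inhibition (cf. Buonomano 2000):
# every pulse is assumed to trigger an inhibition of fixed duration followed by
# a brief coincidence window; a pulse only elicits a suprathreshold response if
# it arrives within the window opened by the preceding pulse. Durations are
# hypothetical.
INHIBITION_MS = 30   # assumed duration of the pulse-triggered inhibition
WINDOW_MS = 15       # assumed width of the coincidence window after the inhibition

def suprathreshold_responses(intervals_ms):
    """Count pulses that fall into the coincidence window of their predecessor."""
    responses = 0
    for interval in intervals_ms:
        if INHIBITION_MS <= interval < INHIBITION_MS + WINDOW_MS:
            responses += 1
    return responses

# Too short an interval falls into the inhibition, too long an interval misses
# the coincidence window; only intermediate intervals drive the cell.
for interval in (10, 20, 35, 40, 60, 80):
    n = suprathreshold_responses([interval] * 5)
    print(f"pulse interval {interval:2d} ms -> {n} suprathreshold responses out of 5")
```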

The selective responses of B-LI4 to pulse pairs with different intervals corroborate the concept of a delay-line and coincidence-detection mechanism. The selectivity for a specific pulse interval seems to be the result of an internal delay line, the nature of which remains to be established. Support for a delay-line mechanism is also apparent in auditory brain neurons within the lateral accessory lobes (Zorovic and Hedwig 2011). Local neurons connecting the lobes responded with spikes only to the second and consecutive pulses of the species-specific chirps and were tuned to the pulse period of the calling song. As the response to the second pulse is independent of pulse duration (Fig. 7b), the delay line may be coupled to the end of a sound pulse. Accordingly, the selectivity for a specific pulse period may be the result of selectivity for a specific pulse interval. This is also evident in the tuning of the behavior and in the response functions of B-LI4 to different pulse durations: even when the pulse period was constant (40 ms), there was a clear selectivity for a specific pulse duration and pulse interval (Kostarakos and Hedwig 2012).

What is a template?

Central to concepts of pattern recognition is the idea of a “template”, an innate representation of the correct song pattern in a given cricket species, against which the incoming signal is compared. Both the template and the comparison process need to be reflected in neural processing. The template may be of an oscillatory nature, so that the patterned sensory signal can be compared against the timing of internal oscillations. In the past, the template has been discussed on a long (Pollack and Hoy 1979) or a short time scale (Hennig 2003; Clemens and Hennig 2013). In computational approaches the template is given by a filter function and compared to the song signal by cross-correlation. It is less clear how these computations are physiologically implemented in the neural processing of the auditory pathway. Our data on cricket brain neurons do not favor a template concept that relies on an internal representation of the song pattern. They rather point to a network of a few local excitatory and inhibitory auditory brain neurons that, owing to their synaptic connections and specific processing properties, successively process the incoming sensory information so that the final pattern recognition neurons, the feature detectors, generate maximum activity when the system is stimulated with the species-specific pulse pattern. Properties of the pattern recognition network may be specifically adapted to the temporal patterns that are detected and recognized, in a similar way as different central pattern generators are based on very different neural and network properties. Our preliminary data indicate that processing in the cricket may be similar to autocorrelation-like processing by a delay line and a coincidence detector, as suggested by Weber and Thorson (1989). A model by Large and Crawford (2002) demonstrated that temporal selectivity in the acoustically communicating fish Pollimyrus adspersus could be the result of a coincidence between excitatory inputs and an intrinsic post-inhibitory rebound excitation. Pattern recognition in crickets could be in line with mechanisms proposed for lower vertebrates, but so far neither the delay line nor the coincidence detector has been explicitly identified.

The problem of pattern recognition in crickets may be an example where the number of proposed concepts is not balanced by the number of neurophysiological studies addressing the actual processing by brain neurons. Following Bullock (1961) and Konishi (1991), we think that analyzing the activity of higher-order brain neurons will be essential to reveal and verify the principles of temporal pattern recognition. Unraveling the basic mechanisms of pattern recognition within the rather simple neural network of crickets may help to understand pattern recognition in other modalities and in the even more complex auditory networks of vertebrates.

Pattern recognition and steering

Further insight into the neural control of phonotaxis may be obtained from comparing the dynamics of pattern recognition and auditory steering. G. bimaculatus and T. oceanicus females show rapid steering responses towards single sound pulses (Hedwig and Poulet 2004, 2005; Cros and Hedwig 2014) and even towards the very first sound pulses of chirps, whereas pattern recognition requires at least two subsequent pulses (Schildberger 1984; Hedwig and Poulet 2004). Females even steer towards unattractive pulses when these are inserted into a sequence of calling song chirps, and continue to do so for several seconds after listening to a calling song (Poulet and Hedwig 2005). These fast and unspecific phonotactic steering responses cannot rely on template matching by cross-correlation if pattern recognition requires a time window of hundreds of milliseconds (Hennig 2003), and they are not in accord with serial processing of pattern recognition and steering. They rather indicate that auditory steering responses are elicited by a faster, more direct, parallel pathway that is activated once the pattern has been recognized (see also the review by von Helversen and von Helversen 1995). Since the non-specific auditory steering responses only occur once pattern recognition is established, the output of the recognition process appears to modulate and enhance processing in the parallel steering pathway with a time constant of several seconds. Such fast and non-specific steering responses would not be expected if the input signal were always first compared against an internal template before any orientation response is initiated.