Introduction

Source–filter coupling is apparently the rarest acoustic phenomenon not only in Iberian red deer, but in any mammal. Instructive spectrograms illustrating this phenomenon in a non-human mammal are not available to date. Most mammalian vocalizations are well described in the framework of the source–filter theory of voice production (Fant 1960; Taylor and Reby 2010). According to this theory, the fundamental frequency (f0) of mammalian vocalizations is generated by vibrations of the vocal folds in the larynx (the source). Subsequently, it is filtered by the supralaryngeal vocal tract, revealing formants representing the vocal tract resonances. The source–filter theory suggests the independence of source and filter; that is, vocal tract filtering should not affect the f0 of the sound created in the larynx (Fant 1960; Titze 1994; Taylor and Reby 2010). Some apparent departures from this theory (e.g., higher f0 in the oral than in the nasal calls) may result from a rotational movement of the thyroid cartilage at mouth opening, and thus do not contradict the predictions of the theory (Efremova et al. 2011; Volodin et al. 2011).

Spectrographic analysis of mammalian vocalizations may provide insights into the involvement of source- and filter-related effects in sound production (Reby and McComb 2003; Taylor and Reby 2010; Volodin et al. 2011) and may also help to reveal the existence of source–filter coupling (Titze et al. 2008). Vocal apparatus oscillations at one of the formant frequencies, as in a wind instrument, may be the result of source–filter coupling (Titze 2008). Modeling has shown that source–filter coupling is possible in red deer Cervus elaphus (Titze and Riede 2010), but the phenomenon has not yet been observed in live animals. Earlier, close values of f0 and the first formant were suggested as a mechanism facilitating the appearance of other acoustic nonlinear phenomena (Wilden et al. 1998), such as biphonation (Herzel and Reuter 1997; Mergell and Herzel 1997), subharmonics (Peters et al. 2004; Hatzikirou et al. 2006), frequency jumps, and chaos (Hatzikirou et al. 2006). The coincidence of fundamental and formant frequencies increases the call amplitude in human singers (Titze et al. 2008) and in songbirds (Riede et al. 2006; Riede and Suthers 2009).

The species C. elaphus originated in Middle Asia and then dispersed in two opposite directions: to Siberia and further to North America, and to Europe (Mahmut et al. 2002; Ludt et al. 2004). The ancient Bactrian subspecies of red deer Cervus elaphus bactrianus lives in the center of origin. A common origin, a rather recent evolutionary history, and the existence of many subspecies within the European, Asian, and American lineages of red deer show that C. elaphus may be considered in a broad sense as a continuum of subspecies and recent species (Mahmut et al. 2002; Ludt et al. 2004; Zachos and Hartl 2011).

In different parts of the distribution area, red deer stags produce rutting calls that differ in acoustic structure. European subspecies produce low-frequency roars (Reby and McComb 2003; Kidjo et al. 2008; Frey et al. 2012), whereas North American and Siberian subspecies mainly produce extremely high-frequency bugles (Struhsaker 1968; Feighny et al. 2006; Nikol’skii 2011). The ancient Middle Asian Bactrian subspecies produces bugles and roars as well as biphonic rutting calls, comprising both low and high fundamental frequencies in their spectra (Nikol’skii 1975; Nikol’skii et al. 1979). Roars of European red deer may be subdivided into common roars, with a clearly visible f0 and its harmonics, and harsh roars, where f0 is masked by deterministic chaos and subharmonics for most of the call duration (Reby and McComb 2003; Frey et al. 2012). Figure 1 shows male rutting calls of several red deer subspecies, recorded by the authors either in the wild or in captivity.

Fig. 1

Diversity of stag rutting calls across subspecies of red deer C. elaphus: a bugle of a captive Cervus elaphus canadensis (Tierpark Berlin, Germany); b bugle of a captive Cervus elaphus sibiricus (Saint-Petersburg Zoo, Russia); c bugle of a captive Cervus elaphus nannodes (Tierpark Berlin, Germany); d common roar of a wild C. elaphus hispanicus (Andalusia, Spain); e common roar of a captive C. elaphus elaphus (Tierpark Berlin, Germany); f transitional from harsh to common roar of a captive Cervus elaphus hippelaphus (Kaliningrad Zoo, Russia); g1 common roar, g2 biphonic rutting call, and g3 bugle of a wild C. elaphus bactrianus (North Uzbekistan). Spectrograms were created with 11,025 Hz sampling frequency, Hamming window, FFT 1,024 points, frame 50%, and overlap 93.75%. Audio files of these calls can be downloaded at http://www.bioacoustica.org/gallery/mammals_eng.html#Artiodactyla

Male red deer retract the larynx down to the sternum during their rutting calls (Fitch and Reby 2001; Feighny et al. 2006; Frey et al. 2012). This results in vocal tract elongation and a lowering of formants, which are clearly visible in spectrograms of the low-frequency roars of European red deer (Reby and McComb 2003; Kidjo et al. 2008). However, similar retraction of the larynx in high-frequency bugles of North American/Asian subspecies does not reveal a clear formant structure (Feighny et al. 2006; Titze and Riede 2010; Nikol’skii 2011).

Despite very different patterns of rutting calls, larynges and vocal folds of European and North American/Asian red deer are very similar morphologically and potentially both capable of producing an f0 range of 60–1,200 Hz (Riede and Titze 2008; Titze and Riede 2010; Frey et al. 2012). The mechanisms underlying such great vocal variation among subspecies with basically the same larynx are poorly understood to date. Potentially, source–filter coupling may account, at least partially, for such vocal flexibility of red deer (Titze and Riede 2010).

Iberian red deer Cervus elaphus hispanicus is a relict subspecies of the Iberian refugium, from where red deer re-colonized Western Europe after the Late Pleistocene glacial maximum, which occurred 25,000–12,000 years ago (Zachos and Hartl 2011). Iberian stags produce relatively low-frequency roars, with an average maximum f0 of 223 Hz and an average mean f0 of 186 Hz (Frey et al. 2012). Among acoustic nonlinear phenomena, subharmonics and deterministic chaos occur regularly in rutting calls of Iberian stags, but we never detected biphonation (Frey et al. 2012). This study documents and describes an unusual mode of vocal production in rutting roars of Iberian red deer, pointing to source–filter coupling.

Materials and methods

Audio recordings of 2,928 rutting roars were collected from about 75 free-ranging unmarked adult male Iberian red deer stags during September 2007 in Andalusia and Extremadura, Spain. These recordings were part of a combined study of the underlying anatomy, the vocal behavior and the acoustics of Iberian red deer male rutting calls (Frey et al. 2012). The distance of the animals to the microphone varied from 10 to 150 m. For the audio recordings (48 kHz, 16 bits), we used a Fostex Field Recorder FR–2LE (Fostex Company, Tokyo, Japan) and a Sennheiser MKH 70 P48 condenser directional microphone (Sennheiser, Wedemark, Germany). We analyzed one natural three-call bout from a wild Iberian red deer stag.

To create spectrograms and take acoustic measurements, we used Avisoft SASLab Pro software (Avisoft Bioacoustics, Berlin, Germany) and the Praat DSP package (Boersma and Weenink 2009). Single Fast Fourier Transform (FFT) spectra (6.4 ms wide) were created with 5,000 Hz sampling frequency, Hamming window, FFT 1,024 points, frame 50%, and overlap 96.87%. For comparison with FFT spectra, linear predictive coding (LPC) spectra were created in 20-ms fragments around the same time points, using the “To Formants (Burg)” command in Praat, with window length 0.04 s, time step 0.01 s, maximum number of formants 8, and maximum formant frequency 2,000 Hz. Peak formant values were extracted with the “To Spectrum (slice)” command in Praat.
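
The relation between these spectrogram settings and the resulting resolution can be checked with a short calculation. The sketch below is not part of the original analysis; it assumes that the overlap percentage is expressed relative to the FFT length, so that the hop between successive spectra is the FFT length times (1 − overlap). The function name `spectrogram_resolution` is hypothetical.

```python
def spectrogram_resolution(sample_rate_hz, fft_points, overlap_percent):
    """Return (bin_width_hz, hop_samples, time_step_ms) for an FFT spectrogram.

    Assumption: the overlap is given as a percentage of the FFT length,
    so the hop between successive spectra is fft_points * (1 - overlap).
    """
    bin_width_hz = sample_rate_hz / fft_points
    # Rounding absorbs truncated percentages such as 96.87 (i.e. 96.875).
    hop_samples = round(fft_points * (1.0 - overlap_percent / 100.0))
    time_step_ms = 1000.0 * hop_samples / sample_rate_hz
    return bin_width_hz, hop_samples, time_step_ms


# Settings quoted above for the single FFT spectra.
bin_hz, hop, step_ms = spectrogram_resolution(5000, 1024, 96.87)
print(f"bin width {bin_hz:.2f} Hz, hop {hop} samples, time step {step_ms:.1f} ms")
# prints: bin width 4.88 Hz, hop 32 samples, time step 6.4 ms
```

Under this assumption, the 6.4 ms figure quoted for the FFT spectra corresponds to the time step between successive spectra (32 samples at 5,000 Hz).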

Results

The spectrogram in Fig. 2 visualizes a natural bout of three rutting roars of an Iberian red deer stag. The first call in this bout is a short common roar. The second call is a longer common roar. The third call is a harsh roar with an inserted segment showing an unusual acoustic phenomenon.

Fig. 2

Spectrogram (below) and wave form (above) illustrating different modes of phonation in a natural bout of three rutting calls of Iberian red deer. The first and second calls are common roars. In the second call, the f0 and its harmonics and the lowering formants at the beginning of the call are clearly visible. The third call shows the transition from a chaotic mode to a probable source–filter coupling mode at 5.1 s, where the frequency of the fourth formant comes into the new fundamental frequency band. Spectrogram was created with 11,025 Hz sampling frequency, Hamming window, FFT 1,024 points, frame 50%, and overlap 93.75%. Audio files of these calls can be downloaded at http://www.bioacoustica.org/gallery/mammals_eng.html#Artiodactyla

For the second call of the bout, the FFT spectra at 1.5 and 2.2 s show a clear fundamental frequency (f0) and its integer-multiple harmonics (Fig. 3). The f0 is 174 Hz at the beginning of this call (at 1.5 s) and rises to 239 Hz by the middle of the call (at 2.2 s). The acoustic energy is distributed non-uniformly over the spectrum, showing local maxima at about 0.6 and 1.4 kHz and local minima at about 0.3 and 1.0 kHz.
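
As an illustration of how a harmonic series relates to these spectral maxima, the following sketch lists the integer multiples of the two measured f0 values and finds the harmonic closest to each approximate energy maximum. It is only a worked example under the values quoted above, not a reproduction of the original measurements; the helper names `harmonics` and `nearest_harmonic` are hypothetical, and the maxima of 600 and 1,400 Hz are the rounded values given in the text.

```python
def harmonics(f0_hz, max_hz):
    """List the integer multiples of f0 that do not exceed max_hz."""
    return [n * f0_hz for n in range(1, int(max_hz // f0_hz) + 1)]


def nearest_harmonic(f0_hz, target_hz):
    """Return (harmonic number, frequency in Hz) of the harmonic closest to target_hz."""
    n = max(1, round(target_hz / f0_hz))
    return n, n * f0_hz


# f0 at 1.5 s and 2.2 s of the second call; approximate energy maxima in Hz.
for f0 in (174, 239):
    for peak in (600, 1400):
        n, freq = nearest_harmonic(f0, peak)
        print(f"f0 {f0} Hz: harmonic {n} ({freq} Hz) lies closest to {peak} Hz")
```

Because the harmonics are fixed multiples of f0 while the energy maxima reflect the vocal tract, different harmonics fall near the same maximum as f0 rises from 174 to 239 Hz.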

Fig. 3

FFT spectra taken along the bout of rutting calls of Iberian red deer. The labels in the upper right corner represent the time points on the spectrogram in Fig. 2. The curves superimposed on the FFT spectra at time points 5.0 and 6.0 s represent the LPC spectra, created in 20-ms fragments around these time points. f0, fundamental frequency; F3, third formant; F4, fourth formant; F5, fifth formant; F6, sixth formant

In the spectrogram of the third call, the onset segment also exhibits the f0 with harmonics (Fig. 2). The FFT spectrum at 4.0 s also clearly reveals the f0 (186 Hz) and its integer-multiple harmonics (Fig. 3). This spectrum matches the one created for the onset part of the second call (at 1.5 s) and demonstrates approximately the same distribution of local power maxima and minima (Fig. 3). The onset segment of the third call also shows the lowering formants (Figs. 2 and 4), similar to the onset part of the second call (Fig. 2). The next segment of the third call represents deterministic chaos, revealing formants, especially the third, fourth, fifth, and sixth (F3–F6, Fig. 4). Formants in this segment are nearly horizontal and descend very steadily. The FFT spectrum at 5.0 s does not reveal harmonic structure (Fig. 3). However, the four clearly visible spectral peaks correspond to the central values of the four formants (F3–F6), as confirmed by the LPC spectrum created at the same time point (Fig. 3, Table 1).

Fig. 4

Spectrogram of the third call of the bout with the first six formants labeled (F1–F6). A notable lowering of the formants at the onset part of the call results from the prominent retraction of the larynx and the corresponding elongation of the vocal tract. Time scale is the same as in Fig. 2. Spectrogram was created with 5,000 Hz sampling frequency, Hamming window, FFT 1,024 points, frame 50%, and overlap 96.87%

Table 1 Comparison of values of spectral peaks from FFT and LPC spectra, taken at the same time points (5.0 and 6.0 s) along the spectrogram in Fig. 2

At 5.1 s, the segment of deterministic chaos changes abruptly into a harmonic segment with f0 and its integer-multiple harmonics (Fig. 2). The new fundamental frequency has a much higher value (728 Hz at 5.4 s) than the maximum f0 values reached in the middle of the second call of the bout (Figs. 2 and 3). At the same time, the new f0 is very close in value to the preceding value of the fourth formant (723 Hz at 5.0 s). The spectrogram shows that the frequency of the fourth formant passes smoothly into the fundamental frequency band (Fig. 4). The waveform above the spectrogram shows that the call amplitude increases strongly at the point of the transition (Fig. 2). As a result, the sound changes from a harsh roar to a siren.
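
The closeness of the new f0 to the fourth formant, and its distance from the earlier f0, can be expressed numerically. The sketch below is an illustrative calculation on the values reported above (728, 723, and 239 Hz), not part of the original analysis; the function name `compare_frequencies` and the use of cents as an interval measure are choices made here for illustration.

```python
import math


def compare_frequencies(f_a_hz, f_b_hz):
    """Return (relative difference in percent, interval in cents) of f_a relative to f_b."""
    rel_percent = 100.0 * (f_a_hz - f_b_hz) / f_b_hz
    cents = 1200.0 * math.log2(f_a_hz / f_b_hz)
    return rel_percent, cents


new_f0_hz = 728.0          # f0 of the harmonic segment at 5.4 s
f4_before_hz = 723.0       # fourth formant at 5.0 s
f0_second_call_hz = 239.0  # f0 in the middle of the second call (2.2 s)

rel_f4, cents_f4 = compare_frequencies(new_f0_hz, f4_before_hz)
rel_f0, cents_f0 = compare_frequencies(new_f0_hz, f0_second_call_hz)

print(f"new f0 vs F4:         {rel_f4:+.2f}% ({cents_f4:+.1f} cents)")
print(f"new f0 vs earlier f0: {rel_f0:+.1f}% ({cents_f0:+.0f} cents)")
```

With these inputs, the new f0 differs from the preceding fourth formant by less than 1%, while it is roughly three times the f0 measured in the middle of the second call.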

The final segment of the call is represented by deterministic chaos (Figs. 2 and 4), and the FFT spectrum at 6.0 s does not reveal harmonic structure (Fig. 3). Spectral peaks match formant frequencies (Fig. 3) whose values are close to those of the formants in the preceding segment with deterministic chaos (Table 1). The high fundamental frequency passes into the frequency of the fourth formant, whereas the harmonics of the high fundamental frequency stop abruptly and do not continue along the final segment of the call (Fig. 4). This moment corresponds to an abrupt decrease of the call amplitude (Fig. 2).

Discussion

This study describes different modes of phonation in a natural bout of rutting calls of Iberian red deer. The second call is produced by a normal mode of phonation (Wilden et al. 1998), in which the effects of the sound source and the vocal tract are independent of each other (Fant 1960; Taylor and Reby 2010). The spectrogram of the second call clearly depicts the f0 and its harmonics, resulting from the oscillation of the vocal folds within the larynx (sound source), and the descending formants at the beginning of the call, resulting from an elongation of the vocal tract caused by larynx retraction during the roar. The third call starts with the f0 and its harmonics, followed by deterministic chaos with clearly accented formants. Then the fourth formant continues into the new high fundamental frequency band, which creates its own harmonics. At the point where the sound source starts producing the fundamental frequency at the frequency of the fourth formant, the sound amplitude of the call increases strongly, suggesting a resonance effect, as in a wind instrument. As a result, the sound changes from a harsh roar to a siren. The final part of the call again represents deterministic chaos, returning to the initial formants of the roar. Thus, the third call shows the transition from a chaotic mode to a probable source–filter coupling mode (Titze 2008), in which the vocal folds start to vibrate at the frequency of the fourth vocal tract formant.

The third call, with the probable source–filter coupling, was produced immediately after the common roars, which show the usual vocal pattern for Iberian red deer. This suggests that the caller was a healthy stag with normal vocal function, i.e., also capable of normal vocal production. However, source–filter coupling was a rare acoustic phenomenon in Iberian red deer, as it was found in less than 0.1% of the rutting roars, compared with, e.g., deterministic chaos, found in more than 10% of roars (Frey et al. 2012).

The new high fundamental frequency of 728 Hz of the siren fragment of the third call corresponded to the fourth formant of the preceding harsh part of the same roar (Fig. 2). Also, it approximately corresponded to the average value of the fourth formant (744 Hz), measured for this sample of rutting roars of male Iberian red deer (Frey et al. 2012). The f0 value of 728 Hz was substantially higher than the average maximum f0 values, found in the roars of Iberian stags (223 Hz, Frey et al. 2012) or in any European subspecies of red deer: 52 Hz in Cervus elaphus corsicanus (Kidjo et al. 2008) and 137 Hz in Cervus elaphus scoticus (Reby and McComb 2003).

Maximum f0 values of roars recorded at the study site never exceeded 338 Hz (Frey et al. 2012); therefore, there is no reason to expect that this was an occasional increase of the f0 of the source up to 728 Hz. Also, the observed mode of production could not be a true whistle phenomenon (a vortex-shedding obstruction in the vocal tract): in a true whistle, the harmonics of the fundamental frequency are poorly visible in the call spectrum, whereas the siren part of the call clearly shows harmonics, which argues against the possibility that the sound is a whistle. Thus, source–filter coupling (Titze 2008) seems the most plausible explanation for the production mode of the siren part of the call in Iberian red deer.

At the same time, the f0 value of 728 Hz, achieved due to source–filter coupling in an Iberian stag in this study, was substantially lower than the f0 of the bugles reported for North American and Asian subspecies, from 1,000 to over 2,000 Hz (Struhsaker 1968; Feighny et al. 2006; Riede and Titze 2008; Titze and Riede 2010; Nikol’skii 2011). In bugles, harmonics of this fundamental frequency are accordingly spaced by 1,000 to over 2,000 Hz, whereas formants of red deer stags are spaced four times more closely (Fitch and Reby 2001; Reby and McComb 2003; Frey et al. 2012). Formants only increase the energy already present in call spectra. They are invisible in bugles, as most of them fall between harmonics, where acoustic energy is lacking, whereas the few that fall on harmonics are indistinguishable from them. Thus, in the absence of alternation between coupled and uncoupled modes, it is unclear whether the bugle fundamental frequency coincides with one of the formants or not. This could be the reason why source–filter coupling has not yet been found in bugling deer.

Nevertheless, we may hypothesize that source–filter coupling can play some role in production of extremely high-frequency bugles of these subspecies (Feighny et al. 2006; Titze and Riede 2010; Nikol’skii 2011; see also Fig. 1). As the larynx retraction inevitably changes the resonance characteristics of the vocal tract, the deer should be capable of adjusting to which formant of the call the f0 will be coupled. This would suggest a different function of larynx retraction in the production of high-frequency bugles compared to the low-frequency roars by rutting stags. In roars, retraction of the larynx accents the lowering formants assumed to provide acoustic cues to male body size (Fitch and Reby 2001; Fitch and Hauser 2002; Reby and McComb 2003). In bugles, formants are not accented because of widely spaced harmonics, but retraction of the larynx nevertheless shortens the distance between neighboring formants, thereby facilitating the tuning of the f0 of the source to the resonance frequencies of the vocal tract.

The unusual vocalization found in this study gives insight into the bugling of red deer. Potentially, a vocal fold 3 cm long, as in American stags, can be damaged or even broken when producing calls higher than 1,000 Hz (Riede and Titze 2008). Source–filter coupling makes it possible to avoid traumatic tension of the vocal folds, thus explaining how the extremely high-frequency and high-amplitude bugles can be produced without special histological adaptations (Titze and Riede 2010). However, if bugling is achieved by source–filter coupling, this acoustic phenomenon should be widespread rather than extremely rare, at least in the North American/Asian subspecies of C. elaphus.