Introduction

Many mammals use the acoustic channel for mother-offspring recognition: e.g., subantarctic fur seals Arctocephalus tropicalis (Charrier et al. 2002); walrus Odobenus rosmarus (Charrier et al. 2010); Australian sea lions Neophoca cinerea (Pitcher et al. 2010); fallow deer Dama dama (Torriani et al. 2006); white-tailed deer Odocoileus virginianus (Lingle et al. 2012); red deer Cervus elaphus (Sibiryakova et al. 2015; Volodin et al. 2016a); domestic sheep Ovis aries (Sèbe et al. 2007); domestic goats Capra hircus (Briefer and McElligott 2011a; Briefer et al. 2012); domestic cattle Bos taurus (Padilla de la Torre et al. 2016); and saiga Saiga tatarica (Volodin et al. 2014). The individuality of the calls is important for mothers, allowing them to recognize their own offspring and to reject alien young (Torriani et al. 2006; Lingle et al. 2007a; Sèbe et al. 2007) and thereby avoid potential allosuckling (Nowak et al. 2000; Brandlová et al. 2013). For the young, it is important to vocally advertise their own identity in order to be recognized and nursed by their mother (Espmark 1971; Sèbe et al. 2007) and to recognize the mother by her individualistic voice among other females (Sèbe et al. 2010). Neonates are expected to vocalize less than mothers, because neonates need to reduce the energy costs of thermoregulation to avoid hypothermia in cold weather or dehydration in hot weather, and to minimize the risk of disclosing their presence to predators (Kjellander et al. 2012; Roberts and Rubenstein 2014).

In ruminants, the acoustic individuality of mother and young calls varies between species (Espmark 1975; Shillito-Walser et al. 1981; Charrier et al. 2002; Terrazas et al. 2003; Torriani et al. 2006; Sèbe et al. 2010; Briefer and McElligott 2011a; Sibiryakova et al. 2015). Call-based mother-offspring recognition may be unidirectional, as in fallow deer, in which the young recognize their mothers but not vice versa (Torriani et al. 2006), or mutual, as in domestic goats (Terrazas et al. 2003; Briefer and McElligott 2011a) or domestic cattle (Padilla de la Torre et al. 2016). Both the directionality of mother-offspring vocal recognition and the degree of acoustic individuality are influenced by the offspring anti-predator strategy (Torriani et al. 2006; Briefer and McElligott 2011b). Neonate ruminants use two main strategies against predation, “hiding” and “following” (Lent 1974; Fisher et al. 2002). Hider neonates stay hidden in vegetation for the first 1–3 weeks after birth and are approached by their mothers only for feeding, as in goitred gazelles Gazella subgutturosa (Jevnerov 1984; Blank 1998; Blank et al. 2015), fallow deer (Torriani et al. 2006), red deer (Vaňková and Málek 1997; Sibiryakova et al. 2015), and domestic goats (Terrazas et al. 2003). In contrast, follower neonates, such as reindeer Rangifer tarandus (Espmark 1971), domestic sheep (Sèbe et al. 2007), and saiga (Sokolov and Zhirnov 1998; Danilkin 2005), hide for only 1–2 days after birth and then follow their mothers.

Within 30 min after birth, the “follower” saiga neonates are already capable of standing, suckling, and walking, and they even try to run (Danilkin 2005; Kokshunova 2012). A few hours after birth, they transfer to another place together with their mothers; after 2–3 days, they follow their mothers permanently; and after 10 days, they are capable of following the herd and running as quickly as adults in the case of danger (Sokolov and Zhirnov 1998; Danilkin 2005). For the rest of the year, saigas forage in herds of many thousand individuals in the steppes of Russia and Kazakhstan (Bannikov et al. 1961; Sokolov and Zhirnov 1998; Danilkin 2005). Saiga females start breeding at the age of 8 months (Bekenov et al. 1998; Kühl et al. 2007), whereas males start in the second year of life (Bannikov et al. 1961; Fadeev and Sludskii 1982). Saiga migrate twice a year for many thousands of kilometers in their steppe habitats in Kazakhstan (Bannikov et al. 1961). The saiga population in Kazakhstan repeatedly rises and collapses: it fell from over a million animals in 1993 to 178,000 in 2000 and to 20,000 in 2003, then increased to 216,500 in 2014, and collapsed again in 2015 to about 70,000 individuals (Milner-Gulland et al. 2001; Orynbayev et al. 2016).

In saiga, the maternal tactics for decreasing neonate mortality from predation involve the gathering of pregnant females in huge aggregations of tens of thousands of animals. Within these aggregations, the females give birth within a short period of 5–8 days in groups of 15–20 individuals, with distances of around 20 m between individuals and up to 200–300 m between groups (Bannikov et al. 1961; Bekenov and Milner-Gulland 1998; Sokolov and Zhirnov 1998; Danilkin 2005). As the main saiga predators are wolves Canis lupus, which are strictly territorial during the saiga calving season in May, only a few wolf packs hunt in a saiga breeding area (Bannikov et al. 1961; Sokolov and Zhirnov 1998; Danilkin 2005). Other factors influencing the saiga calving aggregations are pasture productivity, water availability, and avoidance of human disturbance (Singh et al. 2010). These aggregations of thousands of vocalizing mother and neonate saigas (Electronic Supplementary Material 1) create a “cocktail-party” challenge, in which the individual voices of many animals merge into a loud disorganized chorus and individual mother-neonate recognition becomes strongly complicated (Aubin and Jouventin 1998).

All sex and age classes of saiga vocalize: adult males during the rut (Frey et al. 2007), and adult females, adolescents, and neonates when contacting each other in herds (Volodin et al. 2009, 2014). Adult females also vocalize postpartum and neonates after birth (Kokshunova 2012; Volodin et al. 2014), so a mother and her young have an opportunity to memorize each other’s voices just after parturition. In addition, neonate saigas vocalize when captured by a predator and when begging for nursing (Volodin et al. 2014). Saiga adolescents and adult males vocalize through the nose (Frey et al. 2007; Volodin et al. 2009), whereas adult females and neonates produce calls both through the nose (nasal calls) and through the mouth (oral calls) (Volodin et al. 2014).

The fundamental frequency (f0) of mammalian vocalizations is generated by vibrations of the vocal folds in the larynx (the source). Subsequently, while passing through the supralaryngeal vocal tract towards the mouth or nose, it is subjected to an acoustic filtering process revealing vocal tract resonances (formants). Formant frequencies are inversely related to vocal tract length (Fitch and Reby 2001; Taylor and Reby 2010). Mammalian nasal vocal tracts are longer than oral vocal tracts in the same individual (Efremova et al. 2016), especially in saigas as a result of their trunk-like nose (Frey et al. 2007; Volodin et al. 2009, 2014). Therefore, in mammals, formants of nasal calls are lower than formants of oral calls (Efremova et al. 2011; Volodin et al. 2011, 2014; Stoeger et al. 2012). As the acoustics of the oral and nasal calls differ, analyses of the effects of individuality on the acoustics should be conducted separately for the nasal and oral calls.

The purpose of this study was to investigate the individuality of saiga mother and young contact calls in a large aggregation of females and their offspring on their breeding grounds in the wild. We compare the call structure and the individual distinctiveness between the nasal and oral contact calls of mothers and between the oral calls of mothers and neonates.

Methods

Study sites, subjects, and dates

Saiga (Saiga tatarica tatarica) mother and neonate oral and nasal contact calls were collected between 13 and 18 May 2014 from adult female (1 year and older) and neonate (1–2 days postpartum) saigas on their natural breeding grounds in the Turgai steppe of Northern Kazakhstan (49°53′N, 65°48′E). In May 2014, the entire saiga population of Kazakhstan comprised about 200,000 animals (Orynbayev et al. 2016). At the start of the study, the studied Betpakdala subpopulation of the Turgai steppe comprised approximately 30,000 pregnant females. The location of this herd was identified by satellite collars used in parallel studies of saiga ecology and diseases (Kock et al. 2015; Orynbayev et al. 2016).

Call collection

After initial observations by car of both the location and spread of the birth aggregation, straight-line transect routes were selected with the aim of crossing the area of the highest concentration of neonates. The GPS-guided transect routes varied daily in position and direction, with a minimum distance of 1 km between neighboring transects. Transects were walked by five people, each 12 m apart, covering a width of 60 m and a length of 10 km. This transect method is applied for neonate saiga censuses (Kühl et al. 2007). While people walked the transects, the saiga mothers retreated and stayed at a distance of 400–500 m from the people and returned to their neonates within 30 min after the humans had proceeded along the transect, whereas the neonates remained hidden in the grass.

Three automated Song Meter SM2+ devices (Wildlife Acoustics Inc., Maynard, MA, USA) were used for recording saiga mother and neonate vocalizations. They were positioned while passing along transects at places with a high concentration of hiding neonates (10–15 neonates per 100 m of transect). Automatic recording of the calls started 1 h after the humans had left. For the acoustic recordings (22.05 kHz, 16 bit, stereo), each Song Meter device was positioned horizontally, 20 cm above the ground. The Song Meter devices were collected 2–3 days later, when mother and neonate saigas had already left these transects, and were transferred to other transects that saiga mothers and neonates had not yet left. In total, 397 sound files (each of 29-min duration) were collected at 7 recording sites.

Body data collection

While people passed along the transects, 42 male and 26 female saiga neonates were hand-captured, weighed with 10 g precision using electronic scales Voltcraft HS-10L (Voltcraft, Hirschau, Germany) and sexed by the presence of tiny horn primordia in males and by external sexual traits. The entire handling lasted 3–7 min per animal; then, the animals were released at the place of capture.

Nine mother cadavers and 13 neonate cadavers (6 males, 6 females, 1 of undetermined sex) were detected during the transect walks. The cadavers served for measuring the resting lengths of the nasal and oral vocal tracts using a tape with 1 mm precision from the approximate position of the vocal folds (laryngeal prominence) up to the nostrils or lips, respectively. In addition, the lengths of a hind leg and of the head were measured in the cadavers as proxies of mother and neonate body size. Each body measurement was repeated three times for each cadaver and the mean value was calculated.

For one male neonate cadaver, we measured the dorsoventral and rostrocaudal vocal fold lengths after an on-site field dissection. After cutting the entire larynx medio-sagittally into two halves, the maximal dorsoventral length and the maximal rostrocaudal length of the right and left vocal folds were measured with electronic calipers (Aerospace, Brüder Mannesmann Werkzeuge GmbH, Remscheid, Germany) with 0.5 mm precision (following Efremova et al. 2016). The dorsoventral length was measured along the medial surface of the vocal folds facing the glottis, from their ventral attachment to the thyroid cartilage dorsally up to their attachment to the vocal process of the arytenoid cartilage. The maximal rostrocaudal length was measured at the maximal rostrocaudal diameter in the dorsal third of the vocal fold, from its rostral edge to its caudal edge. The body measurements were used for establishing the settings for measuring formants of the oral and nasal calls and for estimating sex differences in neonates.

Call samples

To minimize the bias in call selection and analyses, all acoustic analyses were performed using a blind approach by a researcher (OS) who was not involved in data collection. Selection of calls for analysis and subsequent spectrographic analysis was done using Avisoft SASLab Pro software (Avisoft Bioacoustics, Berlin, Germany). Mother and neonate saigas produced calls when looking for each other (Volodin et al. 2014). Calls within a series following at near-regular intervals were considered as emitted by one individual, and only one series per individual was included in the spectrographic analysis (see example call series in Electronic Supplementary Material 2). We took call series from different recording sites separated by >500 m and from different files separated by large time intervals (>30 min). Many dozens of mother and neonate saigas moved and vocalized around each recording site, so the probability of including more than one call series per individual in the analyses was negligible.

Calls were classified as originating from a mother or from a neonate by visual analysis of spectrograms in Avisoft according to substantial differences between the f0 ranges of adult females and neonates (Fig. 1), following previous studies (Volodin et al. 2009, 2014). Adult males were lacking on the breeding grounds, and male yearlings occurred very rarely, so the probability of occurrence of their calls within the recordings was very low. Mother and neonate calls were further subdivided into nasal calls (produced with a closed mouth) and oral calls (produced via an opened mouth) based on the corresponding spectrograms of these calls, where we could visually estimate differences in energy distribution (Fig. 1), and based on the specific “nasal” sound quality of calls produced through the nose (Volodin et al. 2011, 2014).

Fig. 1

Spectrogram (below) and waveform (above) of example saiga antelope mother and neonate calls: mother nasal call (A), mother oral call (B), neonate nasal call (C), and neonate oral call (D). The spectrogram was created with Hamming window, 22.05 kHz sampling rate, FFT 1024 points, frame 50%, and overlap 93.75%. Both female and young calls are tonal, but at the same spectrogram settings the calls show either a pulsed (mother) or a harmonic (neonate) representation due to the lower fundamental frequency of mother calls. See Electronic Supplementary Material 3 for the sound file, which was used for creating this spectrogram

A single call series per individual mother or neonate was sufficient for the analysis of vocal individual identity, as this fits natural conditions where mother and offspring should discriminate each other’s voices among other individuals based on a single series of oral or nasal calls (e.g., Matrosova et al. 2009, 2010). Within series, all calls were uniformly either nasal or oral. Mixed call series (containing both oral and nasal calls within a series) were not analyzed.

We included in spectrographic analyses call series of good quality, not disrupted by wind and not overlapped by noise or other calls. For mothers, we selected 8–10 nasal calls per call series (=individual) from 18 individuals, and 8–10 oral calls per call series (=individual) from 21 individuals, 168 nasal, and 192 oral calls in total. For neonates, we selected 8–10 oral calls per call series (=individual) from 22 individuals, 197 oral calls in total. As the number of neonate nasal calls was small, we selected 1–10 nasal calls per call series (=individual) from 16 individuals, 78 nasal calls in total.

For each of these 77 individual callers, we calculated the average value for each acoustic variable to compare the variables of nasal and oral calls within mothers and within neonates. For the analysis of individuality encoded in nasal and oral calls, we selected 168 nasal calls from 18 mothers (8–10 calls per individual), 164 oral calls from 18 mothers (8–10 calls per individual) and 161 oral calls from 18 neonates (8–10 calls per individual).

Call analysis

For each mother and neonate oral or nasal call, we measured the same 6 acoustic variables: the duration, the fundamental frequency period, and the first four formant frequencies (Fig. 2). Prior to analysis, calls were downsampled to 22.05 kHz. We measured call duration from the screen with the standard marker cursor in the main window of Avisoft. The mean f0 period (i.e., the mean distance from a previous pulse to the following pulse) was measured from the screen with the standard marker cursor in the main window of Avisoft, displaying the spectrogram and the waveform, with the following settings: Hamming window, FFT-length 512, frame 50%. The frequency resolution of the spectrographic analysis was 43 Hz, and the time resolution varied between 0.3 and 0.5 ms, depending on call duration. All measurements were exported automatically to Microsoft Excel (Microsoft Corp., Redmond, WA, USA). Then, we calculated the mean f0 of each call as the inverse of the mean f0 period of the call (Fig. 2).
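As a minimal illustration of the two calculations above (not the Avisoft procedure itself), the frequency resolution is the sampling rate divided by the FFT length, and the mean f0 is the reciprocal of the mean f0 period. The function names and the 20-ms example period are hypothetical and serve only to show the arithmetic.

```python
def frequency_resolution_hz(sampling_rate_hz: float, fft_length: int) -> float:
    """Width of one FFT frequency bin in Hz."""
    return sampling_rate_hz / fft_length


def mean_f0_hz(mean_period_s: float) -> float:
    """Mean fundamental frequency as the inverse of the mean f0 period."""
    if mean_period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / mean_period_s


# Settings stated in the text: 22.05 kHz sampling rate, FFT length 512.
resolution = frequency_resolution_hz(22050, 512)
print(f"frequency resolution: {resolution:.1f} Hz")

# Hypothetical mean pulse-to-pulse period of 20 ms (illustrative only).
print(f"mean f0: {mean_f0_hz(0.020):.1f} Hz")
```

With these settings the bin width comes out at about 43 Hz, in line with the resolution stated above.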

Fig. 2

Measured acoustic variables from waveform (above) and spectrogram (below) of the oral (left) and nasal (right) calls of two different mother saigas: duration, fundamental frequency period (period f0), and tracks of the first four formants (F1–F4). The LPC settings were: Burg analysis, window length 0.04 s, time step 0.01 s, maximum number of formants 4, maximum formant frequency 3400 Hz (for the oral call), and 2000 Hz (for the nasal call)

The four first formants (F1, F2, F3, and F4) were measured using LPC with Praat DSP package (P. Boersma and D. Weenink, University of Amsterdam, Netherlands, www.praat.org). The generally agreed basic model for the analysis of formants is that of a uniform tube closed at one end, considering the sound source (larynx and vocal folds) as the closed end, while the mouth or nostrils represent the open end (Fitch and Reby 2001). According to this model, expected formant frequencies can be calculated as:

$$ F_n = \frac{(2n - 1)\, c}{4L} $$

where n is the formant number (1, 2, 3, etc.), L is the vocal tract length, and c is the speed of sound in air, approximated as 350 m s−1. The age of our subject animals over the study period corresponded to that of the dissected specimens. Therefore, the anatomically measured mean values of vocal tract lengths obtained from the cadaver specimens served to establish the settings (maximum number of formants and the upper limit of the frequency range) for linear prediction coding (LPC) and further analysis of the formant frequencies of the nasal and oral calls with the Praat DSP package. For mothers, the maximum formant frequency (the upper limit of the frequency range) was 3000–3700 Hz for oral calls and 2000–2500 Hz for nasal calls. For neonates, the maximum formant frequency was 5800–6700 Hz for oral calls and 4900–5300 Hz for nasal calls. Point values of formant tracks were extracted and exported to Excel, and the value of each formant for a given call was calculated as the average of the point values. Applying the model of a uniform tube closed at one end, we calculated the formant dispersion (dF) for nasal and oral calls of mother and neonate saigas by using linear regression according to Reby and McComb (2003).
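The tube model and the regression step can be sketched as follows. This is an illustration under stated assumptions, not the authors' Praat or Excel workflow: the function names are hypothetical, the regression is taken to be a zero-intercept least-squares fit of F_i on (2i − 1)/2 (our reading of the Reby and McComb 2003 approach), and the input formants are generated from the model itself rather than measured.

```python
SPEED_OF_SOUND = 350.0  # m/s, the approximation used in the text


def tube_formants(vtl_m, n_formants=4, c=SPEED_OF_SOUND):
    """Expected formants (Hz) of a uniform tube closed at one end."""
    return [(2 * n - 1) * c / (4 * vtl_m) for n in range(1, n_formants + 1)]


def formant_dispersion(formants_hz):
    """Slope of a zero-intercept regression of F_i on (2i - 1) / 2."""
    xs = [(2 * i - 1) / 2 for i in range(1, len(formants_hz) + 1)]
    return sum(x * f for x, f in zip(xs, formants_hz)) / sum(x * x for x in xs)


def vtl_from_dispersion(df_hz, c=SPEED_OF_SOUND):
    """Apparent vocal tract length (m) implied by a formant dispersion."""
    return c / (2 * df_hz)


# Mean oral vocal tract length of mothers measured in cadavers (218 mm).
expected = tube_formants(0.218)
df = formant_dispersion(expected)
vtl = vtl_from_dispersion(df)
print([round(f) for f in expected], round(df), round(vtl, 3))
```

For a 218-mm tube, the fourth expected formant lies at roughly 2.8 kHz, below the 3000–3700 Hz ceiling used for mother oral calls. With measured formants in place of the synthetic ones, the same regression would return the dF of the recorded calls.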

Statistical analyses

Statistical analyses were conducted using STATISTICA v. 6.0 (StatSoft, Tulsa, OK, USA) and the statistical software package R (R Development Core Team 2012). Means are given as mean ± SD, all tests were two-tailed, and differences were considered significant whenever p < 0.05. Fifty of 54 distributions of measured acoustic parameter values and all anatomical measurements did not depart from normality (Kolmogorov-Smirnov test, p > 0.05). As parametric ANOVA and discriminant function analysis (DFA) are relatively robust to departures from normality (Dillon and Goldstein 1984), this was not an obstacle to the application of these tests.

We used one-way ANOVA to compare the average parameter values between oral and nasal calls within mothers and neonates. We used DFA standard procedure to calculate the probability of the assignment of calls to the correct individual for oral calls and nasal calls of mothers and for oral calls of neonates (the number of neonate nasal calls was insufficient for conducting the DFA for individuality). We included all 6 measured call variables (f0, duration, F1, F2, F3, and F4) in all the DFAs. For comparison of individuality between oral and nasal calls and between mothers and neonates, we used unified samples of 18 callers for mother oral calls, mother nasal calls and neonate oral calls.

We used Wilks’ lambda values to estimate how strongly the acoustic variables of the calls contribute to the discrimination among individuals. With a 2 × 2 Yates’ chi-squared test, we compared the values of correct assignment of oral and nasal calls to individual within and between mothers and neonates. To validate our DFA results, we calculated the random values of correct assignment of calls to individual by applying a randomization procedure with macros, created in R. The random values were averaged from DFAs performed on 1000 randomized permutations on the data sets as described by Solow (1990). For example, to calculate the random value of classifying oral calls to individual mothers, each permutation procedure included the random permutation of 164 calls among 18 randomization groups, respectively the 18 individual mothers which were examined, and followed by DFA standard procedure. All other permutation procedures were made similarly. Using a distribution obtained by the permutations, we noted whether the observed value exceeded 95, 99, or 99.9% of the values within the distribution (Solow 1990). If the observed value exceeded 95, 99, or 99.9% of values within this distribution, we established that the observed value did differ significantly from the random one with a probability p < 0.05, p < 0.01, or p < 0.001, respectively (Solow 1990; Sibiryakova et al. 2015).
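The logic of the randomization procedure can be sketched as below. This is not the authors' R macro: to stay dependency-free, the sketch replaces DFA with a simple nearest-centroid classifier, and the data, function names, and number of permutations are made up for illustration. The p-value is taken as the share of permuted accuracies that reach or exceed the observed one.

```python
import random
from statistics import mean


def centroid_accuracy(features, labels):
    """Share of calls whose nearest group centroid is their own group."""
    groups = {}
    for x, y in zip(features, labels):
        groups.setdefault(y, []).append(x)
    centroids = {y: [mean(col) for col in zip(*rows)] for y, rows in groups.items()}

    def nearest(x):
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return sum(nearest(x) == y for x, y in zip(features, labels)) / len(labels)


def permutation_test(features, labels, n_perm=1000, seed=0):
    """Return observed accuracy, mean permuted accuracy, and p-value."""
    rng = random.Random(seed)
    observed = centroid_accuracy(features, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign calls to callers at random
        null.append(centroid_accuracy(features, shuffled))
    p_value = sum(v >= observed for v in null) / n_perm
    return observed, mean(null), p_value


# Synthetic example: 3 hypothetical callers, 8 calls each, 2 variables.
rng = random.Random(1)
centers = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0)}
features, labels = [], []
for caller, (cx, cy) in centers.items():
    for _ in range(8):
        features.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
        labels.append(caller)

observed, chance, p_value = permutation_test(features, labels, n_perm=200)
print(f"observed {observed:.2f}, chance {chance:.2f}, p = {p_value:.3f}")
```

The mean of the permuted accuracies plays the role of the random value of correct assignment, and the position of the observed value within the permuted distribution gives the significance level.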

Results

The average oral vocal tract length was 218 mm in the mother and 111 mm in the neonate saigas (Table 1). The average nasal vocal tract length was 259 mm in the mother and 130 mm in the neonate saigas. The average body mass of the neonate saigas was 3.37 kg (Table 1). One-way ANOVA did not reveal sex differences in any body variable in the neonate saigas (Table 1). The dorsoventral vocal fold length in the male neonate saiga was 11 mm, and the rostrocaudal vocal fold length was 8 mm.

Table 1 Mean ± SD values of body variables for mother and neonate saigas and ANOVA results for comparison between sexes in neonates

Of the total of 397 sound files, neonate calls were present in 154 (38.8%) files and mother calls were present in 291 (73.3%) files. The first four formants of mother nasal calls were significantly lower than the respective formants of mother oral calls (Table 2). The formant distances F2–F1 and F3–F2 were significantly shorter in nasal calls than in oral calls, whereas the distance F4–F3 did not differ between nasal and oral calls. Mother nasal calls were shorter than oral calls; the mean f0 did not differ between nasal and oral calls (Table 2). The estimated formant dispersion of 581 Hz corresponded to an acoustically estimated nasal vocal tract length of 301 mm during the emission of nasal calls (Fig. 3). The estimated formant dispersion of 832 Hz corresponded to an acoustically estimated oral vocal tract length of 210 mm during the emission of oral calls.

Table 2 Mean ± SD values of acoustic variables for mother saiga oral and nasal calls and ANOVA results for their comparison

Fig. 3

Estimation of formant dispersion (dF) in saiga by using linear regression: mother nasal calls (a), mother oral calls (b), neonate nasal calls (c), and neonate oral calls (d). Central points show the means of the first four formants (F1–F4); whiskers show the SD

As in mothers, in neonates the values of the first four formants of the nasal calls were lower than the respective values of the formants of the oral calls (Table 3). Unlike in mothers, only the F2–F1 distance was shorter in the nasal than in the oral calls; the distances F3–F2 and F4–F3 did not differ between the nasal and oral calls. Neonate nasal calls were shorter in duration than oral calls; the mean f0 did not differ between nasal and oral calls (Table 3). The estimated formant dispersion of 1224 Hz corresponded to an estimated nasal vocal tract length of 143 mm during emission of the nasal calls (Fig. 3). The estimated formant dispersion of 1571 Hz corresponded to an estimated oral vocal tract length of 111 mm during emission of the oral calls.
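Under the uniform-tube model given in the Methods, successive formants are spaced by dF = c/(2L), so the apparent vocal tract length follows as L = c/(2 dF). As a worked check, assuming c = 350 m s−1, this relation reproduces the vocal tract lengths reported above for mothers (nasal, oral) and neonates (nasal, oral):

$$ L = \frac{c}{2\, dF}: \qquad \frac{350}{2 \times 581} \approx 0.301\ \mathrm{m}, \quad \frac{350}{2 \times 832} \approx 0.210\ \mathrm{m}, \quad \frac{350}{2 \times 1224} \approx 0.143\ \mathrm{m}, \quad \frac{350}{2 \times 1571} \approx 0.111\ \mathrm{m} $$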

Table 3 Mean ± SD values of acoustic variables for neonate saiga oral and nasal calls and ANOVA results for their comparison

For each of the three equal (18 individuals) samples of mother oral calls, mother nasal calls and neonate oral calls, the average value of correct classification to individual with DFA (99.4% for mother oral calls, 89.3% for mother nasal calls, 94.4% for neonate oral calls) exceeded our random expectations (19.1 ± 2.6, 18.7 ± 2.5, and 19.4 ± 2.6%, respectively, all p < 0.001) (Fig. 4). The average value of correct classification to individual was higher in mother oral than in mother nasal calls (χ²₁ = 13.89, p = 0.002) and in mother oral calls than in neonate oral calls (χ²₁ = 5.19, p = 0.02); no significant difference of individuality was observed between mother nasal and neonate oral calls (χ²₁ = 2.23, p = 0.14) (Fig. 4). In order of decreasing importance, the mean f0, F3, and F2 were mainly responsible for discriminating individuals for mother oral calls, the mean f0, F3, and F4 were mainly responsible for discriminating individuals for mother nasal calls and the mean f0, F2, and F3 were mainly responsible for discriminating individuals for neonate oral calls. Thus, in all three DFAs, similar sets of key discriminating variables were found.
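The 2 × 2 comparisons above can be retraced with a short sketch. The counts of correctly classified calls are not given in the text; here they are reconstructed, as an assumption, by rounding the reported percentages of the sample sizes stated in the Methods (164, 168, and 161 calls). The function names are hypothetical.

```python
def yates_chi2(a, b, c, d):
    """Yates-corrected chi-squared for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def correct_incorrect(percent_correct, n_calls):
    """Reconstruct (correct, incorrect) counts from a rounded percentage."""
    correct = round(percent_correct / 100 * n_calls)
    return correct, n_calls - correct


# Reported classification rates and sample sizes (counts are reconstructed).
mother_oral = correct_incorrect(99.4, 164)
mother_nasal = correct_incorrect(89.3, 168)
neonate_oral = correct_incorrect(94.4, 161)

for name, x, y in [
    ("mother oral vs mother nasal", mother_oral, mother_nasal),
    ("mother oral vs neonate oral", mother_oral, neonate_oral),
    ("mother nasal vs neonate oral", mother_nasal, neonate_oral),
]:
    print(f"{name}: chi2 = {yates_chi2(*x, *y):.2f}")
```

Under this reconstruction, the three statistics come out close to the reported values of 13.89, 5.19, and 2.23.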

Fig. 4

Individual discrimination of saigas based on mother oral calls, mother nasal calls, and neonate oral calls. Gray bars indicate values of discriminant function analysis (DFA) and white bars indicate random values, calculated with randomization procedure. Comparisons between observed and random values and between mother oral calls, mother nasal calls, and neonate oral calls with χ² test are shown by brackets above

Discussion

We found prominently individualistic calls of saiga mothers and neonates in the calving aggregations in Kazakhstan. The strong vocal individuality may serve as a behavioral adaptation to maternal care and neonate survival in a densely crowded social environment. Our results also suggest that saiga neonates vocalize substantially less than saiga mothers, probably to decrease the risk of disclosing their location to predators. Similarly, in red and fallow deer, the calves are mostly silent, whereas the mothers vocalize when looking for their hiding young (Vaňková et al. 1997; Torriani et al. 2006; Sibiryakova et al. 2015). Another important reason why saiga neonates are more silent than their mothers may be the higher risk of dehydration, because they only receive water from milk, whereas mothers were regularly observed drinking from sheep water pools, both before and after parturition (own observations of the authors).

Our observations confirmed that saiga neonates do not display a definitive hiding phase like true hiders, such as red deer (Vaňková et al. 1997; Sibiryakova et al. 2015) and fallow deer (Torriani et al. 2006), because when in danger they flee instead of remaining immobile. So, they can be considered as “super-followers” on the gradual scale of hiding and following anti-predator strategies of neonates (Lent 1974; Fisher et al. 2002; Volodin et al. 2011). In contrast to those ungulates that defend their offspring against predators (Ralls et al. 1986; Smith 1987; Fisher et al. 2002; Lingle et al. 2007a, 2007b; Jacques and Jenks 2010; Bonar et al. 2016; Scornavacca and Brunetti 2016), we never observed adult saigas defending their young. So, in case of danger, saiga neonates rely only on their own fast running. We hypothesize that individualistic calls may serve for mother-offspring reunions after predatory attacks and for supporting the spatial cohesion between mother and offspring in large herds during migrations.

In contrast, the Mongolian subspecies of saiga (S. t. mongolica) apparently uses a different strategy of a decreased group size for facilitating mother-offspring recognition and maintaining the spatial cohesion between mother and young. In this subspecies in Mongolia, nursery groups during the calving period consist solely of a mother and her young, whereas during summer, the group size increases to 6–8 individuals (Buuveibaatar et al. 2013a). Unlike in the nominate subspecies S. t. tatarica in Kazakhstan, in which the main predators are wolves (Bannikov et al. 1961), the main predators of the Mongolian subspecies are raptors (golden eagles Aquila chrysaetos and cinereous vultures Aegypius monachus) and foxes Vulpes vulpes and V. corsac (Buuveibaatar et al. 2013a). So, large breeding aggregations that serve as an anti-predator strategy against wolves may be unnecessary for this subspecies. Alternatively, the strategy of small nursery groups in Mongolian saiga might have developed as a consequence of the depopulation of this subspecies due to poaching or as a defensive strategy against mass infectious diseases (primarily pasteurellosis) that spread very fast in dense saiga aggregations and play an important role in saiga population collapses in Kazakhstan (Orynbayev et al. 2016). Further study is necessary to estimate whether calls of mother and young of the Mongolian subspecies of saiga are less individualistic than in the nominate subspecies.

In the wild-living saiga population of this study, the acoustics of mother and neonate calls were similar to those reported for captive saigas of the same subspecies S. t. tatarica originating from the Kalmykian population in Russia (Volodin et al. 2014). The differences in formants were less than 5% in calls of either mothers or neonates, and the calculated VTLs based on the formant dispersion were respectively similar in the two studies. In both studies, the duration was shorter in nasal than in oral calls; however, only in the Kalmykian population (Volodin et al. 2014) the fundamental frequency was significantly lower in nasal than in oral calls. Therefore, the acoustic structure of mother and neonate contact calls in saiga does not display geographical variation and is the same in wild-living and captive individuals.

In this study, tape measurements of non-dissected cadavers were used for collecting data on the oral and nasal vocal tract resting lengths of saiga mothers and neonates, to obtain reasonable settings for formant measuring by spectrographic software. We confirmed that the variation of VTL values was related to the variation in body size rather than to measurement errors, as standard deviation around the average VTL approximately corresponded to variations in head length and hind leg length (Table 1). However, all VTLs measured in the cadavers were shorter than the respective values reported for one dissected adult female (oral VTL 240 mm, nasal VTL 320 mm) and one male neonate specimen measured using computer tomography (oral VTL 116 mm, nasal VTL 142 mm) (Volodin et al. 2014). The oral VTLs were only slightly shorter, whereas the nasal VTLs were substantially shorter, probably because of dehydration shrinkage of the soft trunk-like nose of saiga in the cadavers.

Using automated recording systems for collecting calls allowed us to investigate the vocal individuality of undisturbed saigas in the wild. Unlike manual call collection with hand-held microphones, automated recorders can be programmed to work day and night throughout the entire period of saiga presence on their breeding grounds. This automated recording avoids the negative effects of human presence (Obrist et al. 2010; Llusia et al. 2011; Volodin et al. 2015a). Studies of vocal individuality in ungulates commonly apply experimental separations of mother and offspring (Sèbe et al. 2007; Briefer and McElligott 2011a, 2011b) or direct observations of calling animals (Volodin et al. 2014; Sibiryakova et al. 2015; Padilla de la Torre et al. 2016). However, these methods may affect the acoustic variables (Weary and Chua 2000) and are inapplicable to wild-living species, including saigas. For ungulates, automated recording systems have previously been applied for collecting rutting calls of free-ranging red deer (Volodin et al. 2013, 2015a, 2015b), farmed red deer (Volodin et al. 2016a, 2016b), and captive giraffes Giraffa camelopardalis (Baotic et al. 2015).

We could easily separate the calls of saiga mothers and neonates in the automated recordings, as their acoustics differ strongly (Volodin et al. 2014). This would be impossible, however, for species in which the acoustics are indistinguishable between ages, as in Siberian wapiti Cervus elaphus sibiricus (Volodin et al. 2016a), piebald shrews Diplomesodon pulchellum (Volodin et al. 2015c), and some species of ground squirrels (Matrosova et al. 2007; Swan and Hare 2008; Volodina et al. 2010).

For saiga, we could extract individual call series from the automated recordings. This approach is relevant because each recognition event is based on a single call series, produced when a mother is looking for her offspring or a neonate is calling for its mother. For saiga and other timid animals living in open habitats, this approach might be the only way to investigate vocal communication between mother and young on the breeding grounds. The probability of repeatedly recording call series from the same individuals was very low. We used three automated recording systems, each working for 2–3 days, at a total of 7 sites separated by >500 m, so the 7 samples of acoustic files were independent of one another. In addition, the range of good-quality automated recording did not exceed 100 m, whereas neonate saigas move to another place a few hours after birth (Sokolov and Zhirnov 1998; Danilkin 2005). The next feeding therefore occurred at another place, and the accompanying vocalizations of previously recorded animals could not be captured by the same recording device. When creating call samples for analysis, we further reduced the probability of including call series from the same individuals by taking call series from the same recording site only when they were separated by large time intervals (>30 min). Moreover, the DFA results make it unlikely that the call series came from the same or only a few individuals; otherwise, we would expect lower classification rates.
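The time-separation rule described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the greedy selection order, the data layout (site identifier and start time in minutes), and the function name are hypothetical and do not describe the procedure actually used to build the call samples.

```python
# Minimal sketch: keep call series from the same site only if they are
# separated by more than a minimum time gap.
# ASSUMPTIONS (not from the study): greedy earliest-first selection,
# (site_id, start_minute) tuples, hypothetical function name.
from collections import defaultdict

MIN_GAP_MIN = 30.0  # the >30 min separation mentioned in the text


def select_separated_series(series, min_gap=MIN_GAP_MIN):
    """Return (site_id, start_minute) tuples that are > min_gap apart within a site."""
    starts_by_site = defaultdict(list)
    for site_id, start_minute in series:
        starts_by_site[site_id].append(start_minute)

    selected = []
    for site_id in sorted(starts_by_site):
        last_kept = None
        for start_minute in sorted(starts_by_site[site_id]):
            # Keep the earliest series, then any series starting more than
            # min_gap minutes after the most recently kept one.
            if last_kept is None or start_minute - last_kept > min_gap:
                selected.append((site_id, start_minute))
                last_kept = start_minute
    return selected


if __name__ == "__main__":
    # Hypothetical start times (minutes from the start of recording) at two sites
    example = [(1, 0), (1, 20), (1, 45), (1, 70), (2, 10), (2, 40)]
    print(select_separated_series(example))
```

Series from different sites are never compared with one another, which mirrors the argument that recordings from sites more than 500 m apart are independent.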

An important limitation of the automated call collection method is that the recorded call series do not allow determining which neonate belongs to which mother. Therefore, we could not identify particular mother-offspring dyads or triads (twins occur in 20–35% of births in different saiga populations) (Kühl et al. 2007, 2009; Buuveibaatar et al. 2013b). A similar approach, based on individual call series, has been applied in studies of vocal individuality in alarm communication (Matrosova et al. 2009, 2010). However, data on individual identity based on individual call series are not directly comparable with studies in which call samples are selected from different call series (Volodin et al. 2011; Sibiryakova et al. 2015).

In conclusion, our data suggest that the degree of vocal identity in mother and neonate saiga antelopes might be sufficient to promote reliable mother-offspring vocal recognition in the dense social surroundings of the breeding grounds. This pronounced individual vocal identity also seems sufficient to support the spatial cohesion of mother and young in large herds in the wild. These findings support the hypothesis that high individual identity represents a specialization for breeding in the crowded social environment of saiga antelopes, where mother and offspring face the challenge of mutual recognition among numerous other vocalizing individuals.