Introduction

One of the key goals for scholars in acoustic communication is to clarify the information content of vocal signals. Vocal signals vary with the context in which they are given (Seyfarth et al. 1980; Zuberbühler 2000; Fischer et al. 2001; Fichtel and Kappeler 2002), they reveal information about the individual identity of the caller (e.g. Hammerschmidt and Todt 1995; Rendall 2003), and they may vary with fighting ability (Fischer et al. 2004) or hormonal state (Boulet and Oddens 1996; Abitol et al. 1999; Amir and Kishon-Rabin 2002, 2004). In addition, vocal signals may provide listeners with information about signaller characteristics such as size. Body size should be of particular importance in signals used during intra-sexual competition or territory defence, but also for other social interactions, such as intra-group aggression. However, in free-ranging animals, data about body size are difficult to obtain, and therefore, age and sex are often used as proxies for evaluating the influence of body size on acoustic features of vocalizations.

Among the various modalities of communication, signal production in the vocal domain is one of the best understood. Thanks to studies based on human speech (Fant 1960; Lieberman and Blumstein 1988) and musical acoustics (Sundberg 1987, 1991; Benade 1990), the mechanisms of sound production for terrestrial mammals, including non-human primates, are well known (Fitch and Hauser 1995; Owren and Linker 1995; Fitch 2003). To briefly summarise, most vocal signals are produced by an outward flowing air stream generated in the lungs (Fitch and Hauser 1995; Ball and Rahilly 1999; Reetz 1999). The lung capacity and the control of the emptying speed allow variations in duration of the air flow, and therefore of the produced sound. The speed of the air stream also determines the amplitude of the produced sound. The signal then passes the laryngeal system, which includes the larynx and vocal folds. The tension and the size of the vocal folds determine the characteristics of the fundamental frequency (i.e. the lowest frequency at which the vocal folds are oscillating). The tenser the vocal folds, the higher their oscillation rate and hence the fundamental frequency; conversely, the longer and thicker the vocal folds, the slower they oscillate and the lower the fundamental frequency. Oscillations at the fundamental frequency are accompanied by oscillations at integer multiples of the fundamental frequency (i.e. the harmonics) (Fig. 1a).
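The harmonic structure of the source signal can be illustrated with a minimal sketch. The 200 Hz fundamental is an arbitrary illustrative value, not a measurement from any of the studies cited here, and `harmonic_series` is a hypothetical helper name.

```python
def harmonic_series(f0_hz, n_harmonics):
    """Return the first n harmonics of a source signal.

    The k-th harmonic is the integer multiple k * f0, with k = 1
    being the fundamental frequency itself.
    """
    return [k * f0_hz for k in range(1, n_harmonics + 1)]


# Illustrative value only (assumption): a 200 Hz fundamental.
harmonics = harmonic_series(200.0, 4)
print(harmonics)  # [200.0, 400.0, 600.0, 800.0]
```

Because successive harmonics are spaced one fundamental apart, a caller with a lower fundamental frequency also produces more closely spaced harmonics.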

Fig. 1a, b Illustration of some acoustic parameters used in the study. (a) Spectrogram presenting some parameters related to the duration and the fundamental frequency (F0) of a call. (b) Power spectrum representing some parameters related to energy and energy distribution (DFA: distribution of frequency amplitude; f1, f2, f3: first, second and third formants)

The source signal then passes the supralaryngeal system, made up of the oral and nasal cavities as well as the hard and soft palate, and is modified depending on the length, shape and boundary conditions of this system. The vocal tract acts as a filter allowing only a narrow range of frequencies (i.e. the formant frequencies, which are emphasized frequencies among the harmonics) to pass (Riede and Fitch 1999). The filter characteristics are related to the size of the vocal tract. An increase in the length of the vocal tract will lead to a decrease in the average spacing between successive formants, that is, a decrease in formant dispersion (Fant 1960; Lieberman and Blumstein 1988). Movement of the tongue, lips, velum and/or epiglottis (the articulators) can also alter the sound. In human speech, for instance, the filter characteristics determine the formation of the different vowels. Finally, the sound radiates from the mouth or, in fewer cases, from the nose (for a more detailed review of vocal production, see Fitch and Hauser 1995; the neural circuitry underlying vocal production is discussed elsewhere, e.g. in Hammerschmidt and Fischer 2006).
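The inverse relation between vocal tract length and formant dispersion can be sketched under a strong simplifying assumption: the vocal tract is treated as a uniform tube closed at the glottis and open at the lips (the textbook quarter-wave resonator), which real vocal tracts only approximate. The speed of sound and the two tract lengths below are illustrative assumptions, and the function names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 350.0  # assumed value for warm, humid air in the vocal tract


def tube_formants(vocal_tract_length_m, n_formants=3):
    """Resonances of a uniform tube closed at one end (quarter-wave model).

    The i-th resonance is (2i - 1) * c / (4 * L).
    """
    return [
        (2 * i - 1) * SPEED_OF_SOUND_M_S / (4.0 * vocal_tract_length_m)
        for i in range(1, n_formants + 1)
    ]


def formant_dispersion(formants_hz):
    """Average spacing between successive formant frequencies."""
    gaps = [hi - lo for lo, hi in zip(formants_hz, formants_hz[1:])]
    return sum(gaps) / len(gaps)


# Illustrative tract lengths (assumption): 10 cm versus 20 cm.
for length_m in (0.10, 0.20):
    formants = tube_formants(length_m)
    print(length_m, formants, formant_dispersion(formants))
```

In this idealized model the dispersion reduces to c / (2L), so doubling the tract length halves the spacing between formants.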

The anatomy of the vocal apparatus appears to determine the acoustic features of vocal signals. Such a relationship allows researchers to predict some acoustic features in vocal signals according to the body size of the emitter. For instance, since larger animals possess larger lungs and therefore have a greater air volume available for calling, they should emit longer calls than smaller animals. Additionally, since they also have longer (and possibly thicker) vocal folds, it can be predicted that they utter calls with a lower fundamental frequency. Moreover, because large animals have a longer vocal tract than small ones, they should also give signals in which formants are less dispersed and with energy concentrated in lower frequencies. Consequently, measures that reflect the distribution of the amplitude in the spectrum should have lower values in larger animals than in smaller ones. Examples are the peak frequency (i.e. the frequency with the highest amplitude), the general distribution of frequency amplitude in the spectrum (“DFA”, Fig. 1b), and the location of the dominant frequency band (i.e. the frequency that exceeds a certain energy threshold).
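The three spectral measures named above can be sketched on a discretized power spectrum. This is a simplified reading of the parenthetical definitions in the text; the analysis software used in the cited studies may define these parameters differently. The two toy spectra, the threshold value and all function names are invented for illustration.

```python
def peak_frequency(freqs_hz, power):
    """Frequency of the bin with the highest power (first one if tied)."""
    best = max(range(len(power)), key=lambda i: power[i])
    return freqs_hz[best]


def energy_quartile_frequency(freqs_hz, power, fraction=0.25):
    """Lowest frequency at which cumulative power reaches `fraction` of the total."""
    target = fraction * sum(power)
    running = 0.0
    for f, p in zip(freqs_hz, power):
        running += p
        if running >= target:
            return f
    return freqs_hz[-1]


def first_dominant_frequency(freqs_hz, power, threshold):
    """Lowest frequency whose power exceeds `threshold`, or None if none does."""
    for f, p in zip(freqs_hz, power):
        if p > threshold:
            return f
    return None


# Toy spectra (assumption): energy sits lower for the larger caller.
FREQS_HZ = [500, 1000, 1500, 2000, 2500, 3000]
LARGE_CALLER = [4.0, 8.0, 3.0, 2.0, 2.0, 1.0]
SMALL_CALLER = [1.0, 2.0, 3.0, 8.0, 4.0, 2.0]

for name, spectrum in (("large", LARGE_CALLER), ("small", SMALL_CALLER)):
    print(
        name,
        peak_frequency(FREQS_HZ, spectrum),
        energy_quartile_frequency(FREQS_HZ, spectrum),
        first_dominant_frequency(FREQS_HZ, spectrum, threshold=3.5),
    )
```

With these toy inputs all three measures take lower values for the larger caller, which is the direction the predictions describe.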

Only a few studies have examined these predictions by using direct measurements of body size in human and non-human primates (e.g. Hauser 1993; Fitch 1997; Hammerschmidt et al. 2000; González 2004; Pfefferle and Fischer 2006). Most other studies have used age and sex as proxies for body size (Table 1) because direct measures of body size are difficult to obtain in the wild, where most studies of vocal behaviour have been conducted.

Table 1 Table summarizing the studies cited in the review. A. Influence of body size; B. Influence of age; C. Influence of sex

Body size increases with increasing age until the animals reach adulthood (e.g. in Japanese macaques, Macaca fuscata: Inoue 1988; in squirrel monkeys, Saimiri sciureus: Hammerschmidt et al. 2001; in Chacma baboons, Papio hamadryas ursinus: Johnson 2003). Sexually dimorphic species also present test cases for the influence of body size on acoustics. In Old World monkeys, males are usually larger and heavier than females. In New World monkeys, by contrast, most species are sexually monomorphic in terms of body size and weight, with the exception of one species (Ateles paniscus; Ford and Davis 1992).

In this review, we examine age- and sex-related variations in non-human primate vocalizations, using both studies in which direct measures of body size were available and studies in which such differences were inferred from differences in age and/or sex. The goal of this study was to assess whether the observed acoustic variation meets the predictions generated from our knowledge about sound production. We aim to complement the work of Hauser (1993), who examined the relationship between frequencies of vocalizations and body size among taxa, and of Fitch and Hauser (1995), who reviewed the physical constraint of body size on vocal production.

Evidence

We conducted an exhaustive search of the available literature using the Web of Knowledge and the Science Direct databases. Despite extra effort, we could not find studies on prosimians that examined the effects of age or sex on the structure of their vocalizations. For size- and age-related analyses, we indicate the age classes if the analyses were not conducted with continuous measures of age from birth to adulthood, and we note the sex if they do not concern both males and females. For sex-related analyses, we state the age classes used in studies that do not focus only on adults.

Duration of the vocalization

Our predictions are supported by one study that examined the relationship between body size and call duration: large (in terms of body weight) infant rhesus macaques (Macaca mulatta) utter longer vocalizations than smaller ones (coo calls: Hammerschmidt et al. 2000). In another study, however, the positive correlations between the duration of grunts of Hamadryas baboons (Papio hamadryas hamadryas) and the vocal tract length or a compound measure of various body measures were not significant (vocal tract length: n = 12, r = 0.264, P = 0.408; compound measure: n = 13, r = 0.113, P = 0.714; Pfefferle 2003).
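The reported correlation coefficients, sample sizes and P values can be checked for internal consistency. The sketch below assumes that the P values come from the standard two-tailed t-test of a Pearson correlation against zero; the text does not state which test was used, so this is an assumption. The function names are hypothetical, and the t distribution is integrated numerically to keep the example free of external libraries.

```python
import math


def t_statistic(r, n):
    """t value for testing a Pearson correlation r from n pairs against zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2.0) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2.0)
    )
    return coef * (1.0 + x * x / df) ** (-(df + 1) / 2.0)


def two_tailed_p(r, n, steps=2000):
    """Two-tailed P value, integrating the t density with Simpson's rule.

    `steps` must be even for Simpson's rule.
    """
    df = n - 2
    upper = abs(t_statistic(r, n))
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3.0  # probability mass between 0 and |t|
    return 1.0 - 2.0 * central_area


# Values as reported in the text (Pfefferle 2003).
print(round(two_tailed_p(0.264, 12), 3))  # vocal tract length
print(round(two_tailed_p(0.113, 13), 3))  # compound body measure
```

Under this assumption both computed values agree with the reported P values to within rounding, which illustrates how weak correlations in samples of this size fall far short of significance.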

When age is considered, duration appears to be positively correlated with age, that is, vocal signals of older animals are longer than those of young ones (isolation peeps of squirrel monkeys between 1 day and 2 years of age: Lieblich et al. 1980; trills and J-calls of pygmy marmosets, Cebuella pygmaea: reviewed in Snowdon 1988, 1989; Elowson et al. 1992; chirps of cotton-top tamarins, Saguinus oedipus: Castro and Snowdon 2000; coo calls of rhesus macaques, from birth until 5 months of age: Hammerschmidt et al. 2000; twitters of squirrel monkeys between birth and 20 months of age: Hammerschmidt et al. 2001; contact barks of male Chacma baboons, in adolescents, sub-adults and adults: Fischer et al. 2002; loud calls of male Thomas langurs, Presbytis thomasi, in juveniles, sub-adults, young adults, and old adults: Wich et al. 2003; phees, trillphees, trills and twitters of infant and juvenile common marmosets, Callithrix jacchus: Pistorio et al. 2006). To our knowledge, in only three studies was duration found to be either negatively correlated with age (grunts of vervet monkeys, Chlorocebus aethiops: Seyfarth and Cheney 1986; trillphees of common marmosets, between infants and adults: Pistorio et al. 2006) or uncorrelated (grunts of Hamadryas baboons: Pfefferle 2003; trills of common marmosets, between infants and adults: Pistorio et al. 2006).

In addition, as predicted, the duration of the call, or of parts of the call, appears to be longer in the vocalizations of the larger sex. The first part of the male Chacma baboon alarm wahoo (the equivalent of the female alarm bark) is of longer duration than the female alarm bark (Fischer et al. 2002; Fig. 2). A similar result is also found when data of Fischer et al. (2001, 2002) are compared: the first syllable of adult male contact barks is longer than the barks of adult females. The temporal parameters of the phees of juvenile common marmosets (about 20 weeks of age) also have higher values in males than in females, though there are no sexual differences in calls of infants of 5 weeks of age (Pistorio et al. 2006). In contrast, in cotton-top tamarins, the whistle duration of the combination long call appears to be significantly shorter in males than in females (Miller et al. 2004); this is also the case in the trillphees of juvenile common marmosets (Pistorio et al. 2006). Notably, this reversal appears in New World monkey species, in which sexual size dimorphism is often weak. For instance, the cotton-top tamarin and the common marmoset are sexually monomorphic (Ford and Davis 1992; Rowe 1996). We might therefore expect no differences between sexes in these species, but this is apparently not the case. Furthermore, these findings suggest that the influence of sex also varies with the type of call considered, for instance between phees and trillphees in juvenile common marmosets (Pistorio et al. 2006).

Fig. 2 Spectrograms of an adult female (left) and an adult male (right) Chacma baboon contact call

Fundamental frequency of the vocalization

Fundamental frequency was held to be unreliable for assessing body size in some species; this was assumed but not explicitly tested by Fitch (1997). For instance, Rendall et al. (2005) found no relation between body weight, length, and neck circumference and fundamental frequency in adult humans. However, a number of studies showed that large animals utter vocalizations with a lower fundamental frequency than smaller ones in many other species [body weight in Japanese macaques: Inoue 1988; adult body weight across species and taxa with control for phylogeny: Hauser 1993; Fitch and Hauser 1995; adult body weight across species and taxa, but no control for the influence of phylogeny: Mitani and Stuht 1998; body weight in infant rhesus macaques: Hammerschmidt et al. 2000; body component (i.e. a compound measure of body weight, body length, other various body measures, vocal tract length) and all these individual body measurements in Hamadryas baboons: Pfefferle and Fischer 2006].

The fundamental frequency of a call also decreases with increasing age in many species. As scientists observed in various species and in various types of calls, young individuals generally utter calls with a higher fundamental frequency than older animals (e.g. grunts of vervet monkeys: Seyfarth and Cheney 1986; food call of Japanese macaques: Inoue 1988; trills of pygmy marmosets: reviewed in Snowdon 1988; screams of pig-tailed macaques, Macaca nemestrina: Gouzoules and Gouzoules 1989; inter-group wrrs of vervet monkeys, infant and juvenile males and females, adult females: Hauser 1989; various call types in Barbary macaques, Macaca sylvanus: Hammerschmidt et al. 1994; Hammerschmidt and Fischer 1998; coos of rhesus macaques, from birth to 5 months: Hammerschmidt et al. 2000; grunts of Hamadryas baboons: Pfefferle 2003; phees and twitters of common marmosets: Pistorio et al. 2006). Moreover, the fundamental frequency can be more variable within a call in young animals than in older ones, and so young animals produce more modulated calls compared to adults (e.g. grunts of vervet monkeys: Seyfarth and Cheney 1986; inter-group wrrs in vervet monkeys, in infant and juvenile males and females, adult females: Hauser 1989; various call types in Barbary macaques: Hammerschmidt and Fischer 1998).

Variations between the sexes also seem to reflect the variations due to body size in fundamental frequency. For instance, Chacma baboon male and female grunts are similar, but the fundamental frequency is 50% lower in male grunts than in female grunts (Rendall et al. 2004). In some other species and call types, male calls also present lower frequency characteristics than those of females (screams of bonobos, Pan paniscus, and chimpanzees, Pan troglodytes: Mitani and Groslouis 1995; alarm barks of Chacma baboons: Fischer et al. 2002). The expectation is that this tendency might be very weak in New World monkey species, in which sexual dimorphism in body size and mass is not as pronounced as in Old World monkeys. However, in common marmosets, phee-call frequency characteristics are higher in males than in females (peri- and postpubertal animals: Norcross and Newman 1993; Norcross et al. 1999; trillphees, trills and twitters of juveniles: Pistorio et al. 2006).

Peak frequency of the vocalization

The mechanisms of sound production allow us to predict that the peak frequency (i.e. the frequency with the highest amplitude) should decrease with increasing body size. We found only two studies examining the direct influence of body size on the peak frequency. In coo calls of infant rhesus macaques, Hammerschmidt et al. (2000) found that the mean peak frequency decreased when body weight increased. Pfefferle (2003) found the same trend in grunts of Hamadryas baboons when she examined the correlation of the peak frequency and a compound measure of various body measures. Likewise, the peak frequency should decrease with increasing age and should be lower in males than in females.

Indeed, the mean peak frequency decreases with increasing age in agonistic screams of pig-tailed macaques (Gouzoules and Gouzoules 1989), in various call types of Barbary macaques (Hammerschmidt et al. 1994) where it appears to be a well-suited parameter for determining age, in contact barks of male Chacma baboons (Fischer et al. 2002), and in grunts of Hamadryas baboons (Pfefferle 2003). However, changes in peak frequency can be reversed, as they are in squirrel monkey “yap” mobbing calls and “chuck” calls uttered in relaxed situations: peak frequency increases with increasing age between birth and 5 months of age (Hammerschmidt et al. 2001).

The prediction is also verified when sex is considered. Peak frequency appears to be higher in females than in males in alarm barks of Chacma baboons (Fischer et al. 2002), and in screams of bonobos and chimpanzees (Mitani and Groslouis 1995). This means that females concentrate energy in higher frequencies than males. Nevertheless, in phee calls of peri- and postpubertal common marmosets, a monomorphic New World monkey species, the peak frequency appears to be lower in females than in males (Norcross and Newman 1993; Norcross et al. 1999).

Formant dispersion within the vocalization

Bigger animals have a longer vocal tract (Fig. 3). This is reflected in at least three studies by a smaller formant dispersion (i.e. the average difference between successive formant frequencies) in large animals than in small ones (vocal tract length, body length and weight in threat vocalizations of rhesus macaques: Fitch 1997; vocal tract length and a compound measure in grunts of Hamadryas baboons: Pfefferle 2003; body weight in roars of adult male black and white colobus, Colobus guereza: Harris et al. 2006). However, this is not the case in adult humans (body height and weight: González 2004; body height and weight, neck circumference, length of third digit: Rendall et al. 2005). Pfefferle and Fischer (2006) also found a weaker correlation between formant dispersion and a compound body measure than between fundamental frequency and the compound body measure in grunts of Hamadryas baboons.

Fig. 3 Bivariate plot illustrating the correlation between a body component (i.e. a compound measure of body size, without the vocal tract length) and the vocal tract length of Hamadryas baboons of all age classes (n = 12, r = 0.897; P < 0.001)

The results concerning the influence of age on formant dispersion are also ambiguous. On the one hand, the frequency difference between the first and the second formants appears to decrease with increasing age in male Chacma baboon contact barks both during the whole lifespan (adolescents, sub-adults, and adults; Fischer et al. 2002) and among adult males only (Fischer et al. 2004). On the other hand, the correlation of the difference between the first and the second formants (as well as formant dispersion) with body size may not hold within an age class (here adult females), even if it is significant when all age classes are considered together (Hamadryas baboons: Pfefferle and Fischer 2006). In this study, formant dispersion and the frequency difference between the first and the second formants were significantly correlated with body size across age classes (from 1 to 28 years of age; Fig. 4), but these relations did not hold among adult females. For fundamental frequency, in contrast, the relation with body size held both across age classes and within adult females (Pfefferle and Fischer 2006).

Fig. 4 Bivariate plot illustrating correlations between body component (i.e. a compound measure of body size) and (a) formant dispersion (r = −0.925; R² = 0.855; P = 0.0001) and (b) distance between the first and the second formants (r = −0.887; R² = 0.786; P = 0.0005). Vertical lines represent SD. Figure redrawn from Pfefferle and Fischer (2006)

We failed to find any study that directly investigated sexual differences in formant dispersion in non-human primates. Therefore, we are not able to examine the relationship between sex and this acoustic parameter to see if it matches our predictions. In sum, though, formant dispersion might not always be a better predictor of body size than fundamental frequency. Whether or not formant dispersion can be considered to be a useful measure appears to depend on the call type under study.

Energy distribution within the vocalization

According to the mechanisms of sound production, we predict that the energy should be concentrated in lower frequencies in signals of animals with a large body size. In grunts of Hamadryas baboons, the frequency at which the first quartile of global energy is reached (distribution of the amplitude in the frequency spectrum, dubbed ‘DFA’, see Fig. 1b) decreased with increasing body size (Pfefferle 2003).

In contact barks of male Chacma baboons, variables that are related to the distribution of energy in the spectrum (DFA) at the beginning and across the call also decrease with increasing age from adolescents to sub-adults and adults (Fischer et al. 2002). In grunts of Hamadryas baboons, the frequency at which the first quartile of global energy is reached (DFA1) and the first dominant frequency band (i.e. the first frequencies that exceed a certain energy threshold) also decrease with increasing age (Pfefferle 2003).

The sex of the caller also appears to influence the energy distribution in the predicted direction. For instance, in alarm and contact barks of Chacma baboons (Fischer et al. 2001, 2002), the distribution of the frequency amplitude is concentrated in lower frequencies in males than in females. In addition, the mean value of the first dominant frequency band and its value at the beginning of the call have been found to be significantly lower in males than in females in alarm barks of Chacma baboons (Fischer et al. 2002).

Conclusion

In this review, we examined whether acoustic variables vary in the same way with age and sex as with changes in body size. Overall, variations directly linked to body size confirmed the predictions based on the mechanics of sound production. Larger animals, i.e. older animals or animals of the sex with the larger size, utter longer calls, with energy concentrated in lower frequencies, as well as with a lower and a less modulated fundamental frequency than smaller ones (Table 1). The homogeneity of age- and sex-related variations is quite remarkable, even for the acoustic parameter fundamental frequency, which was recently discussed as being generally unreliable because signallers can modulate their fundamental frequency. Therefore, age and sex (and age more reliably than sex) seem to represent generally reliable proxies to evaluate the influence of body size on acoustic features (at least duration, fundamental frequency and energy distribution), if data on body size are not available. This conclusion, however, is valid only for large differences in body size, such as in adults versus juveniles.

Age seemed to be a more reliable proxy than sex. Sexual selection might have decoupled acoustic properties from body size. Indeed, in some call types used, for instance, for mate recognition and advertisement of territory, or in human speech, the variations in acoustic features exceed those predicted by body size dimorphism (Rendall et al. 2005). These vocalizations might have been shaped through evolutionary time by sexual selection, which could have enhanced sexual differentiation in the vocal folds for instance, independently of body size (Rendall et al. 2005).

According to our evaluation of the published data, sex-related variations do not reliably reflect variations in body size in New World monkey vocalizations. The finding that females produce calls that are different from males in terms of duration, fundamental frequency and peak frequency is puzzling, since the considered New World monkey species do not present any obvious sexual dimorphism in body size or mass (Ford and Davis 1992; Rowe 1996). Possibly other factors, such as sexual selection, may have a stronger influence on acoustic variables than body size. For instance, sexual selection might affect vocalizations in another way in New World monkey species than in Old World monkeys, favouring the transfer of exaggerated information about size in females, especially in calls used for mate recognition, in New World monkey species. However, the reversal of tendencies also appears in vocalizations such as isolation calls, in which sexual selection is not expected to play a determining role. In these types of calls, sexual differences in arousal, social context, or growth rate in juveniles might have an influence. Remarkably, no sexual differences were found in acoustic parameters belonging to all the categories studied in various call types of Barbary macaques, which are strongly sexually dimorphic in body size and mass (Hammerschmidt and Fischer 1998). Such particular cases may need further investigation.

Other results from our review highlight the complexity of the relationship between body size and acoustic features. This relationship is highly predictable when body size variations are large, but can become unpredictable and less obvious when variations in body size are more subtle (e.g. within an age- and sex-class). Indeed, Rendall et al. (2005) did not find any relationship between fundamental frequency or formant frequencies and body size in human adult females, and only a weak relationship between body length and formants in human adult males. González (2004) found similar results in adult humans of both sexes. Collins (2000) also highlighted the fact that, in human adult males, fundamental frequency and formants are not correlated with body size, height and shoulder width, even though human females can reliably estimate weight (but not height or age) with these two acoustic variables. In non-human primates, Fischer et al. (2004) also failed to find any relationship in contest barks between fundamental frequency and shoulder height or weight among adult male Chacma baboons. When these subtle variations in body size cannot explain variations in acoustic features, variations in the social context or in the internal state of the caller, such as hormones or arousal, might play a role. Indeed, Fischer et al. (2004) found that high ranking adult male Chacma baboons produce contest barks with a higher fundamental frequency (Fig. 5), which might be a by-product of a higher call amplitude. Unfortunately, it was not possible to test this assumption in the field. These other factors may interact with or override the influence of body size, and this interaction needs to be investigated in more detail.

Fig. 5 Scatterplots showing the relationship between absolute rank and the mean fundamental frequency (R² = 0.594; F = 20.056; P = 0.001). Vertical lines represent SD. Figure redrawn from Fischer et al. (2004)

Another explanation for such a complication of the relationship between body size and acoustic features may be related to honest signalling and the advertisement of quality through communicative signals. First, the mechanisms of sound production impose global physical limits on vocal production due to body size. The physical constraint of body size on acoustic features defines a basic range for each acoustic feature within which the acoustic feature can vary without any additional investment from the caller. This provides reliable information about the caller’s intrinsic attributes, such as its size, with relatively low cost for the signallers, simply because of the mode of signal production. Vocalizations in which acoustic features vary only in relation to these cost-free signaller attributes (i.e. without additional investment) are termed “index” signals (Vehrencamp 2000; Fitch and Hauser 2003). However, within the range imposed by the anatomy of the caller, vocalizations may also vary with the quality of the caller. Individuals of high quality can afford some additional costs, such as longer calls with a higher call amplitude, and they may therefore modify their vocalizations in such a way that the acoustic features are shifted to the extremes of the range defined by the physical constraint of body size. If this shift to the extreme bears some cost, such vocalizations fulfil the criteria for “quality” signals (Vehrencamp 2000). They provide honest information about the quality of the caller, for instance his competitive ability, because only high quality individuals can afford the additional costs (e.g. energy used for production, higher exposure to predation, higher vulnerability to receiver’s attack) of shifting to the extremes of the range determined by physical constraints (Vehrencamp 2000; Zahavi 2003; summarised in Fischer et al. 2004).

Studies on age-related variations in acoustic features have traditionally been conducted without separation between sexes. The factor “age” is therefore often confounded with the factor “sex”. In addition, studies concerning sex-related variations were most often conducted only on adult animals. Therefore, few, if any, studies have investigated at which particular stage of development sexual differences emerge in non-human primate vocalizations. This constitutes a gap that remains to be filled.