6.1 Introduction

The focus of a chapter on communication in family-living and pair-bonded primate species may seem at first blush to be unusual since most reviews are organized by phylogeny. However, there is merit in organizing a review around the social system rather than phylogeny. Much of the interest in studying primate vocal communication has arisen because of the phylogenetic closeness of nonhuman primates to human beings. Hence, research on our closest living relatives, the great apes, may have great importance for understanding the evolution of human communication, including language. However, none of the great apes shares a social system similar to that in which most humans today live, a system of pair-bonded adults raising offspring within the context of a family. Although the study of communication in great apes may illuminate some of the important cognitive underpinnings of the evolution of human communication, the divergence in social systems may obscure many other important social aspects of communication that may be specific to family-living and pair-bonded species. The goal of this chapter is to examine how vocal communication in family-living and pair-bonded primates might differ from that of species in other social systems.

What is different about family-living primates? These species are typically characterized by close social and emotional bonds between mates. Rather than simply mate and separate, an adult male and female will spend much of their time together; for example, pair-bonded common marmosets (Callithrix jacchus) may spend up to one-fifth of their daily activity (2–3 hours) grooming each other (Lazaro-Perea et al. 2004). Pair-bonded primates are known to show emotional responses when separated from each other, and they often prefer to associate with each other instead of other same-sex and opposite-sex members of their species or even their own infants (Mendoza and Mason 1986). In contrast to great apes, pair-bonded, family-living primates typically share parental care, foraging, and territorial defense duties, often exchanging roles throughout the day (Savage et al. 1996). Thus, careful and precise communication is important to share these different roles among group members, and one should expect to find greater attention to signals produced by group mates than might be the case in other species. In contrast to great apes, pair-bonded primates are generally sexually monomorphic (i.e., no obvious physical differences between males and females) and intra- and intersexual selection acts on both sexes equally. Thus, one might expect vocal communication to be less sexually dimorphic than that found in species with other social systems. Vocal communication also plays a role in developing and sustaining a pair bond and reducing stress in partners. Pair-bonded, family-living primates have exhibited the best evidence to date of cooperative foraging, rapid social learning, and even teaching of young (e.g., Rapaport 2006; Burkart et al. 2007; Humle and Snowdon 2008). Each of these activities is facilitated by vocal communication.

Family-living primates are noteworthy for the extensive energy that both parents invest in parental care. This joint investment is a major difference between family-living nonhuman primates and other nonhuman primates for which mothers are the major or sole providers of infant care. However, shared parental care leads to critical problems for each parent. Unlike a female mammal, whose maternity is obvious from pregnancy and parturition, a male mammal can never be certain of the paternity of his offspring. Consequently, a male needs to develop some trust that his mate will be unlikely to mate with anyone else, and at the same time, a female needs to trust that her mate will still be with her when her infants are born to help her care for them. Thus, it is important for both sexes to be selective about their mates and for both sexes to exhibit traits that can be used to evaluate mate quality. Both mates have a shared interest in defending their relationship from intruders of either sex, and territorial defense is often marked by vocal signaling.

Developmental processes, as well as plasticity of adult communication, appear to be different in these species, with evidence of babbling (highly variable versions of adult calls that become more stable with age), contingent actions by adults that shape vocal development, and the capacity to change vocal structure when an individual joins a new group. It is almost canonical in the literature that primate vocal development follows a fixed trajectory with no vocal learning or modification of call structure and usage (Hammerschmidt and Fischer 2008), yet social processes do play an important role in vocal development in many pair-bonded, family-living primates, with adults reinforcing vocal signals in the young and facilitating their development of adult structure and usage (Snowdon 2009). Evidence of population-specific dialects and of vocal convergence when individuals change group membership illustrates vocal plasticity throughout the life span, not just in young animals. These primates also demonstrate remarkable flexibility in using vocal signals in novel contexts and in altering vocal structure in response to environmental noise and distance between conspecifics.

The pair-bonded, family-living species for which there is the most information about vocal communication include gibbons (Hylobates spp.) and siamangs (Symphalangus spp.), lesser apes found in Asia, and owl (or night) monkeys (Aotus spp.), titi monkeys (Callicebus spp.), Goeldi’s monkeys (Callimico goeldii), marmosets (Cebuella pygmaea, Callibella spp., Callithrix spp.), and tamarins (Saguinus spp., Leontopithecus spp.), all from the South American tropics. Each of these species is almost exclusively arboreal and lives where visual contact between group members is minimal except with animals in close proximity, which often makes vocal signals more important in communication than visual signals.

The remainder of this chapter reviews research on vocal communication in pair-bonded, family-living primates. The topics examined include sexual selection and sex differences (Sect. 6.2); formation and maintenance of pair bonds, including identification of mate (Sect. 6.3); cognitive aspects of vocalizations (Sect. 6.4); vocal signals used in social learning and teaching (Sect. 6.5); developmental processes, including babbling and adult contingent reinforcement of infant calls (Sect. 6.6); flexible adult vocal structure, including dialects and response to environmental noise (Sect. 6.7); and habitat acoustics and use of vocalizations (Sect. 6.8).

6.2 Sexual Selection and Sex Differences

Most family-living primates display minimal sexual dimorphism (differences between males and females) in body structure and hair or skin coloration, and a logical prediction is that sexual dimorphism should be reduced in vocal communication as well. In many primate species living in multi-male, multi-female groups or in one male, multi-female groups, males may be up to twice the size of females, and in many species there are sex-specific vocal adaptations, including the flanges of mature male orangutans and the resonating throat pouches of male howling monkeys (Alouatta palliata palliata). Gauthier and Gauthier (1977) described several sex differences in vocalizations of Old World monkeys, including many species that have loud calls produced exclusively by males. In addition, males often had smaller vocal repertoires than females and called less frequently (Gauthier and Gauthier-Hion 1982).

Most family-living species are socially monogamous with males often playing an important role in infant care. This leads to predictions that differ from those usually made concerning sexual selection, in which males are said to compete with each other for mates, whereas females should be coy and choose carefully among many potential mates. In monogamous species, sexual selection should apply equally to both sexes. If a male is going to be more heavily involved in energetically costly parental care activities, then he should be selective about his partner, and one might expect greater competition among females for the best mate. At the same time, if a female relies on assistance from her mate for successful infant care, then she also needs to be selective, and males should compete with each other for the best mate. This should lead to similar displays indicating mate quality in both sexes, including vocal displays. In particular, both sexes should be similar in the pitch of their vocalizations and might be expected to have similar vocal repertoires and to use them in similar contexts.

Although overall sex differences are predicted to be minimal in family-living nonhuman primates, some sex differences in vocalizations have been observed, such as differences in temporal parameters and usage, and some subtle differences in fundamental frequency rather than the large differences in fundamental frequency seen clearly in humans (e.g., Puts et al. 2012). Where fundamental frequency is sexually dimorphic, it is often males with higher frequencies. In the species of gibbons in which duetting between mates is common, males and females typically produce different sequences (Marshall and Marshall 1976) with females appearing to induce male singing in some species (Deputte 1982). Lan (1993) reported that morning singing was dominated by males and that males and females produced different calls. Kloss’s gibbons (Hylobates klossii) do not show the coordinated singing (duets) found in most other gibbon species (Dooley et al. 2013). Playbacks of male H. klossii solo songs elicited responses only from resident males, whereas playbacks of female songs elicited responses only from resident females (Raemakers and Raemakers 1985). Song structure may provide information relevant to mate choice in gibbons. Barelli et al. (2013) measured male song structure and fecal androgens, and they found that males with higher androgen levels produced longer calls with higher pitch.

In common marmosets, males have higher frequency and greater variability in phee calls than females (Norcross and Newman 1993). In Wied’s black-tufted-ear marmosets (Callithrix kuhlii), differences in frequency parameters distinguish male and female phee calls, and marmosets responded differently to playbacks of male and female calls (Smith et al. 2009). Miller et al. (2004) reported sex differences in the combination long calls of cotton-top tamarins (Saguinus oedipus) with males having shorter calls than females. Females were more attracted to male long calls with shorter notes, and males were more attracted to female calls with longer note duration, suggesting to Miller et al. (2004) that these long calls may play a role in sexual selection.

A natural playback experiment designed to see how cotton-top tamarins would respond to hearing calls of unfamiliar monkeys found a sex-specific response (McConnell and Snowdon 1986). Males gave chirps and females gave long calls in the early minutes, but both sexes converged on chirp plus long call vocalizations at the peak of arousal. However, a replication of the experiment 20 years later on subsequent generations of the same colony (Scott et al. 2006) found a complete reversal, with males giving long calls and females giving chirps in the initial response to hearing an unfamiliar group. This replication provides a caution about attributing sex differences to tamarins. Although these studies have been done with captive animals, there is little reason to suspect different results from wild populations.

In summary, although some sex differences in vocalizations (and in response to vocalizations) have been reported in several family-living species, these are often quite subtle, requiring discriminant analyses of calls using multiple acoustic parameters to uncover sex differences. In comparison with species with other breeding systems, the sex differences in family-living species are relatively minor and, given the reversal of results in the same colony after 20 years, might be labile.

6.3 Formation and Defense of the Pair Bond

A strong pair bond is critical for socially monogamous species and in mammals is a necessary precursor to male parental care (Lukas and Clutton-Brock 2013). All male mammals face the problem of never being certain of paternity, and a monogamous relationship can provide some confidence that the infants the male is helping to rear are likely to be his own offspring. Vocalizations play an important role in forming and maintaining a pair bond and in keeping other individuals away. Most family-living species have long calls or songs that are often coordinated between mates and may serve to both reinforce the pair bonds and exclude others.

6.3.1 Duetting, Coordinated Songs, and Long Calls

Most gibbon species, as well as titi monkeys, show coordinated duetting or singing behavior, and there has been much interest in its coordination and function. Duetting is found among monogamous species of several Old World primates, including prosimians [e.g., tarsiers (Tarsius spectrum), indris (Indri indri)], a langur species (Mentawai langur, Presbytis potenziani), and gibbons (Hylobates spp.). Haimoff (1986) noted similarities in the structure and timing of duetting across these diverse species, including narrowband calls at dawn with a restricted frequency range and few harmonics, suggesting a convergence of duetting in monogamous Old World primates. Although duetting in gibbons and titi monkeys may have a role in pair-bond formation and in strengthening the relationship between mates, most of the research, including playback experiments and naturalistic observations, suggests that these calls primarily function to exclude intruders and maintain spacing.

6.3.2 Vocal Responses to Intruders

Several studies on gibbons have looked at responses to playbacks of vocalizations from familiar and unfamiliar animals. Raemakers and Raemakers (1985) found that male white-handed gibbons (H. lar) would respond as if to evict intruders if the songs were from solo males or pairs but not from solo females, whereas females reacted to the songs of solo females only. As with white-handed gibbons, female Bornean gibbons (H. muelleri) led group approaches and initiated singing to playback of female song, whereas males led group approaches and initiated singing to male songs. Studies on the Bornean gibbon found that playback location influenced responses (Mitani 1984, 1985). When the playback speaker was placed within the territory, mated males led approaches toward the songs, whereas songs played on the periphery led most commonly to singing behavior by the mated pair. Playbacks from deep within a neighbor’s territory yielded neither approaches nor singing.

The role of duetting in territory maintenance has been studied in several species of titi monkeys. There are many curious species differences among titi monkeys that do not lead to any simple conclusions about the territorial functions of complex calls. In the red-bellied titi (Callicebus moloch), calling and counter calling led neighboring groups to approach each other and served to reinforce territory boundaries (Robinson 1979b, 1981), whereas in the collared titi (Callicebus torquatus), playbacks of solo male calls led to avoidance of the caller and playbacks of paired song led to counter calling but not approaching (Kinzey and Robinson 1983). However, in masked titi monkeys (Callicebus personatus), group encounters were rare and exclusively vocal with few signs of territorial behavior (Price and Piedade 2001). Caselli et al. (2014) found several variations of loud calls in the black-fronted titi (Callicebus nigrifrons) and argued that the calls were not used between groups and did not defend access to mates, but instead they regulated access to resources.

Marmosets and tamarins produce multi-syllabic whistle-like calls that appear to be used in multiple contexts. In some species, individual syllables are relatively flat in frequency, whereas in other species, syllables are frequency modulated. Three types of long calls were identified in cotton-top tamarins: one used in response to hearing calls of unfamiliar animals; another used when pair mates were separated or at a distance from each other; and a third form, the combination long call that includes both chirps and whistle notes, used mainly by nonreproductive individuals (Cleveland and Snowdon 1982). But Miller et al. (2005) found the combination long call to be common among reproductive adults in their colony. Playback studies found each of these three call types elicited different behavioral responses in cotton-top tamarins (Snowdon et al. 1983).

In a captive study that involved open doors between colony rooms to simulate the approach of unfamiliar animals, cotton-top tamarins increased their rates of long calling, suggesting that long calls play a role in territorial behavior (McConnell and Snowdon 1986; Scott et al. 2006). Playbacks of long calls of an unfamiliar cotton-top tamarin elicited antiphonal calling from residents and were used to census populations in the wild (Savage et al. 2010). These results imply a territorial function for long calling. Norcross and Newman (1993) found that phee calls from separated marmosets differed in structure from phee calls used in territorial contexts from the home cage. Furthermore, Norcross and Newman (1997) found that common marmosets rarely produced territorial phee calls when living in their natal group, but they began producing phee calls within four days after being paired with a mate. Golden lion tamarins (Leontopithecus rosalia) also have three distinct forms of long calls that are used for within-group cohesion, by animals separated from their group, and in territorial encounters, respectively (Halloy and Kleiman 1994). Thus, some long calls have a clear territorial function but other variants are used in other contexts.

6.3.3 Partner Separation

It is common for many species to call when separated from their group, but in marmosets and tamarins, calling is stronger when separated specifically from their mates. Playback of calls from a mate can reduce the stress of separation. Porter (1994) separated cotton-top tamarin mates into different rooms for 30 min and recorded a high rate of long calls from both sexes with males giving significantly more calls than females. Similarly, increased calling rates (and elevated cortisol levels) have been reported in marmosets and golden lion tamarins when housed alone or in novel social environments (Smith et al. 1998; Norcross and Newman 1999; Shepherd and French 1999). In a captive experiment, Ruckstalis and French (2005) played back vocalizations of mates to isolated marmosets and found that cortisol levels were significantly reduced compared with levels under control (no playback of mate calls) conditions. Thus, marmosets and tamarins display distress through increased long calls when separated from their mates, but this distress can be alleviated simply through playback of the mate’s vocalizations. These results imply the ability to recognize specific individuals on the basis of call structure, and this has been shown explicitly in studies of pygmy marmosets (Cebuella pygmaea) (Snowdon and Cleveland 1980) and cotton-top tamarins (Snowdon et al. 1983).

6.3.4 Summary: Formation and Defense of Pair Bonds

Although coordinated duetting or singing behavior is often thought to be involved in indicating or maintaining a pair bond, there is little direct evidence of this. Instead, the available evidence suggests that the main function of coordinated calls is to indicate territory boundaries or maintain spacing between groups. However, in marmosets and tamarins there are also acoustic differences between the long calls used when bonded animals are separated from each other and the long calls used in territorial displays. Although separation induces long calling that is associated with increased stress hormone levels, playback of the mate’s calls is sufficient to reduce cortisol levels, suggesting that the mate’s voice has a stress-reducing effect.

6.4 Cognitive Aspects of Vocalizations

Many scientists have been interested in the cognitive components of communication. This research has mainly focused on Old World primates and great apes, but there has been increasing research on family-living primates. Among the topics that have been studied are whether signals are purely emotive or can also reference objects or events outside of the communicator, whether there is any syntactic structure to call sequences, the ordering of turn taking among individuals within a group, long-term memory for vocalizations, and perception of signals.

6.4.1 Referential Signals

Referential signals are calls that refer to a specific object or event in the environment. Some investigators (e.g., Zuberbühler 2000) have equated these signals with the prototypes of words in human language, but there is an emotional component in these calls as well. An animal communicating about food may also be communicating about its own desire for or interest in food. An animal that gives a predator-specific alarm call is not just identifying a predator but is also likely to be indicating some state of fear or arousal. Both food-associated calls and predator alarm calls have been studied in family-living primates.

6.4.1.1 Food Calls

Many nonhuman primates have specific calls that they give when they discover food. Elowson et al. (1991) measured individual food preferences for six foods in cotton-top tamarins and subsequently recorded calls associated with each of these foods. They reported two subtly different forms of calls: C-chirps were given as an animal approached the food and D-chirps were given after animals had taken the food. The rate of anticipatory calls (C-chirps) correlated directly with an individual’s preference for foods. Benz (1993) replicated this study with golden lion tamarins and twelve different types of food and also found a correlation between an individual’s preference and the rate of calling and specific call variants for protein, dried fruit, and grapes.

Caine et al. (1995) studied food calls in red-bellied tamarins and found more food calls with larger quantities and more palatable foods. However, they failed to find food calls during food exchanges between adults, similar to the results for cotton-top tamarins (Joyce and Snowdon 2007) but in contrast to adult lion tamarins (Brown and Mack 1978). Caine et al. (1995) also found that red-bellied tamarins called more often when they could see other group members than if they found food alone. In contrast, Roush and Snowdon (1994) failed to replicate the relationship between food preferences and rate of calling in cotton-top tamarins.

6.4.1.2 Predator Alarm Calls

Several nonhuman primates produce calls that are specific either to predator species or to the general context in which a predator operates (i.e., aerial/canopy versus ground). These predator-specific calls, most famously those of vervet monkeys (Chlorocebus aethiops), have provoked considerable interest as a possible semantic signal parallel to words in speech (e.g., Seyfarth et al. 1980). Family-living primates are no exception. White-handed gibbons produce predator-specific calls to tigers and leopards and nonspecific alarm calls to eagles and pythons (Clarke et al. 2012). Black-fronted titi monkeys produce one type of alarm to raptors and to capuchin monkeys (Cebus capucinus) found in the canopy, and a different type of call is given to terrestrial threats (Cäsar et al. 2012a). In a study that played these calls back to groups of black-fronted titi monkeys, the monkeys looked up to the sky and canopy when the aerial alarm was played and looked at the caller when the terrestrial alarm was played, suggesting that the monkeys made inferences about the type or location of a predator based on call structure alone (Cäsar et al. 2012b). Sympatric saddleback and moustached tamarins (Saguinus fuscicollis and S. mystax) also have predator-specific alarms for aerial and terrestrial predators, and the two species responded in a similar fashion to each type of alarm (Kirchhof and Hammerschmidt 2006). Both species responded equally to the calls of their own species and to those of the other species, illustrating cross-species recognition of alarm calls.

6.4.1.3 Other Signals

White-handed gibbons produce hoo calls (moderate to soft calls with a broad frequency range, given as a single call or in bouts of two to three calls) that sound similar across a variety of contexts. However, when these calls were analyzed in terms of structure, several subtle variants were identified that were consistently correlated with specific contexts: feeding, separation from group, encountering predators, interacting with neighbors, and duet songs (Clarke et al. 2015). Similar results were reported much earlier in cotton-top tamarins: eight different varieties of chirps were each associated with different contexts (alarm, mobbing, unfamiliar animal, approaching feeding, feeding, within group coordination; Cleveland and Snowdon 1982) (see Fig. 6.1). The differentiation of variants, in what initially sounds to human observers like a single call, indicates a greater complexity of vocal structure and contextual reference for these variants than previously appreciated.

Fig. 6.1 Chirp variants in cotton-top tamarins: type A used in mobbing; type B used in investigation of novel objects; type C used during foraging; and type D used during eating. Type E chirps serve as alarm calls. Type F chirps are given in response to hearing calls of novel animals. Type G chirps are exchanged between calm animals within a group, and type H chirps are used as mild alarms. (Modified from Snowdon 1982)

6.4.2 Syntax

Syntax in animal signals refers to the orderly sequencing of multiple calls or notes. Much of bird song is highly organized in terms of the structure and sequencing of different notes or themes, and there is also evidence of this in family-living primates. The songs of gibbons are highly structured with a series of notes produced in the duetting song and coordination of singing between the male and female (see Sect. 6.2). While the same notes are also found in songs that are given in response to predators, the structural organization of the notes differs. In white-handed gibbons, out-of-sight animals responded differently to the two types of songs, indicating that they were using the sequencing of notes rather than the notes themselves to discriminate between the two types (Clarke et al. 2006). In red-bellied titi monkeys, several calls are repeated, and these calls are organized into sequences involving different call types. These sequences were quite regular, and when playbacks of calls in altered sequences were presented to titi monkeys, they showed some ability to discriminate between normal and abnormal sequences (Robinson 1979a).

Tamarins and marmosets also show examples of syntax. Cleveland and Snowdon (1982) described several sequences in calls of cotton-top tamarins with a few general rules. Chirp-like calls always preceded longer constant-frequency calls within a sequence and, within a series of constant-frequency calls, each successive note was higher pitched than the previous one. In most cases, the sequence could not be decomposed into separate parts. That is, the sequenced call did not have the same function as each of the component parts did individually. This is phonological syntax, akin to the use in speech of different phonemes to create different meanings, such as “dog” versus “god.” However, cotton-top tamarins showed a few examples of lexical syntax, wherein each component of the sequence has its own context and the sequence represents the combination of these contexts. For example, after an alarming event an animal will combine an alarm call with an affiliation call, and after this, other group members become active again. A second example is calling in response to the calls of novel animals: the male and female initially each use different calls but combine both types of calls at the peak of arousal (McConnell and Snowdon 1986). Miller et al. (2005) presented tamarins with manipulated long calls and found that recognition of call type and of caller occurred in separate stages of sensory processing.

6.4.3 Turn Taking

Duetting between mated pairs was discussed previously in Sect. 6.3.1 on pair bonding, but coordination of calling among group members is also seen outside of the calling between mates. In a group of three pygmy marmosets, Snowdon and Cleveland (1984) found that each animal within the group was likely to call once before any animal called a second time, and one possible order of turn taking (e.g., ABC, BCA, or CAB) was more common than the other order (CBA, BAC, or ACB). The development of turn taking is dependent upon the ability to recognize each individual based on voice alone.
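
The two turn-taking orders named above are the two cyclic orderings of three callers. As a minimal sketch only, and not the analysis used by Snowdon and Cleveland (1984), the following Python code shows one way that successive triplets of calls could be sorted into the two orders. The function name, the category labels, and the example call sequence are all hypothetical and invented for illustration.

```python
from collections import Counter

# The two cyclic orders for three callers labeled A, B, and C.
ORDER_1 = {"ABC", "BCA", "CAB"}
ORDER_2 = {"CBA", "BAC", "ACB"}


def classify_triplets(callers):
    """Count sliding windows of three successive calls by turn-taking order.

    callers: sequence of single-letter caller labels, e.g. "ABCAB".
    Windows in which any animal calls twice are counted as "repeat".
    """
    counts = Counter()
    for i in range(len(callers) - 2):
        window = "".join(callers[i:i + 3])
        if window in ORDER_1:
            counts["order_1"] += 1
        elif window in ORDER_2:
            counts["order_2"] += 1
        else:
            counts["repeat"] += 1
    return counts


# Hypothetical call sequence (assumption: not real data).
example = "ABCABCACBABC"
print(classify_triplets(example))
```

In this invented sequence most windows fall into the first cyclic order, which is the kind of asymmetry the text describes; a real analysis would also need to compare the counts with those expected by chance.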

Several studies have looked at antiphonal calling (the exchange of calls between two or more individuals or groups), which is common among marmosets and tamarins. The results included evidence of individual recognition within antiphonal calling (Miller and Thomas 2012), different structure in initial calls versus answering calls (Miller et al. 2010), and evidence of learning turn-taking behavior during development (Chow et al. 2015). Vocal turn taking by marmosets shows dynamics similar to those of vocal turn taking by humans, implying convergent evolution of cooperative vocal behavior in these two cooperatively breeding species (Takahashi et al. 2013).

6.4.4 Vocal Memory

Individual recognition by voice is critically important in any social group of primates, and recognition of voices of mates and of other family members is important in family-living species (also see Sect. 6.3). Little work has been done on long-term memory for vocalizations. However, in the natural environment where animals of both sexes disperse and form new family groups, recognition of the voices of relatives might be important in avoiding inbreeding. One study of cotton-top tamarins demonstrated that memories of calls of former family members last up to 5.5 years (Matthews and Snowdon 2011). To date, this is the longest duration of vocal memory in any nonhuman primate.

6.4.5 Perception

In human speech, phonemes are produced along a variety of continua, such as voice onset time or place of articulation, and human perceptual systems organize these vocal continua into discrete categories that allow the perception of distinct phonemes instead of multiple variations. Do similar processes exist in other species? Pygmy marmosets produce many variants of trills, which are sinusoidal, frequency-modulated calls varying in bandwidth and duration (see Fig. 6.2). Although several variants are used in similar contexts (see Sect. 6.8), two trill types are used in distinct contexts: the closed mouth trill is used as an affiliative contact call, whereas the open mouth trill is used in agonistic contexts. The main structural difference between these two calls in a captive population was duration with all closed mouth trills being shorter than 250 ms and all open mouth trills being longer. Snowdon and Pola (1978) synthesized trills and varied them along dimensions of bandwidth, rate of frequency modulation, and duration and played these synthesized trills to the marmosets. On the duration dimension, there was a clear category boundary at 250 ms with calls on either side of the boundary (varying by only 8 ms) eliciting different responses. Closed mouth trills elicited an immediate antiphonal response, whereas open mouth trills did not.

Fig. 6.2 Trill variants in pygmy marmosets: (A) closed mouth trill; (B) open mouth trill; (C) quiet trill; (D) juvenile trill; (E) J-call. (From Snowdon 1982, reprinted with permission of Cambridge University Press)

Masataka (1983) played synthesized alarm calls to Goeldi’s monkeys (Callimico goeldii) and found that an increase of 0.2 kHz in the frequency range of the modulating sweep was sufficient to induce different behavioral responses, from a response appropriate to a mobbing call (i.e., approaching the caller to attack a predator) at a low-frequency range to a response appropriate to an alarm call (i.e., freezing) at a higher frequency range. Thus, both pygmy marmosets and Goeldi’s monkeys show a human-like categorical perception of their own calls.

In a perceptual study of cotton-top tamarins, Ghazanfar et al. (2001) played back partial phrases or complete combination long calls and found that isolated tamarins responded significantly more to the entire call than to any component parts. They concluded that, from a tamarin’s perspective, the entire long call forms the unit of perception. Bauers and Snowdon (1990) selected the two most acoustically similar of the eight chirps produced by cotton-top tamarins (F and G chirps, see Fig. 6.1) and found a clear difference in behavioral responses between the two playbacks.

6.4.6 Summary: Cognitive Aspects

There is considerable evidence for cognitive complexity in vocal communication in family-living primates. Referential signals communicate about food quality and predator types, and there is evidence of subtle variation in call structure that is correlated with specific contexts. Several species have call sequences that are consistent and predictable, and different sequences are used in different contexts. Many species show turn-taking behavior that indicates rule-based structures governing who will call as well as individual recognition of group members. There is some evidence of long-term vocal memory that may be important in avoiding inbreeding, and the perception of vocalizations has several parallels to the perception of speech sounds by humans.

6.5 Vocalizations in Social Learning and Teaching

Studies of social learning and teaching rarely mention the role of communication, yet vocal communication may play an important role. This section examines two sets of findings: one on how vocal communication might influence social learning and the other on putative teaching behavior in tamarins.

6.5.1 Social Learning

Although there is good evidence that rodents and birds can learn from others to avoid noxious foods (Galef and Giraldeau 2001), there has been little such evidence among nonhuman primates. An illustrative example comes from tufted capuchin monkeys (Cebus apella), which are neither pair bonded nor cooperatively breeding. When invisible white pepper was added to a familiar preferred food, mozzarella cheese, Visalberghi and Addessi (2000) found that capuchin monkeys learned to avoid the food individually. That is, there was no effect of watching other animals sample the adulterated food.

In a replication of the food avoidance study, in this case with cotton-top tamarins, Snowdon and Boe (2003) added white pepper to highly preferred tuna fish and found that only a third of the tamarins ever sampled the adulterated tuna, meaning that the other two-thirds of the animals avoided this previously preferred food. Furthermore, when tuna was later presented without any pepper, several animals continued to avoid eating tuna for more than a year after the initial experiment. What could account for the difference between these two studies? There was no evidence of any communication between the non-family-living capuchin monkeys that Visalberghi and Addessi studied, whereas cotton-top tamarins that sampled the adulterated tuna significantly reduced the number of food calls produced and increased the number of alarm calls (a novel use of alarm calls, see Sect. 6.7.4). The monkeys that first sampled the food also gave an increased frequency of visual disgust responses. Thus, the use of vocalizations (and visual signals) by the tamarins that first sampled the adulterated tuna may have facilitated the rapid and enduring social learning to avoid tuna.

6.5.2 Teaching

The existence of teaching in nonhuman animals has long been controversial. However, Caro and Hauser (1992) provided a simple operational definition. They have four criteria: (1) the teacher must alter its behavior only in the presence of a naïve animal; (2) the teacher must incur some cost or at least no immediate benefit; (3) the teacher’s behavior encourages, punishes, or sets an example for the naïve animals; and (4) as a result, the naïve animal acquires a skill faster than it might otherwise. An additional criterion might be that the teacher is sensitive to the changes in the learner’s behavior and alters its own behavior accordingly.

Tamarin and marmoset species are interesting because adults often share food with infants beginning at the time of weaning. This appears to modulate any weaning conflict and leads to young animals being able to feed on solid food at an earlier age than they might otherwise. Vocalizations play an important role in this process. Infants of many species beg for food, but adult tamarins who are prepared to share food with infants give distinct variations on normal food calls (see Sect. 6.4.1.1). Adults not only produce more bouts of food calls but also produce many more calls within a bout, and at a much faster rate, than they do with only adults present (Joyce and Snowdon 2007). The probability of an infant being able to obtain food from an adult is dependent on the adult producing the call (Roush and Snowdon 2001; Joyce and Snowdon 2007). Adults have modified their vocal behavior specifically for use in the food sharing context. Since these calls are energetically more costly than normal food calls and the adults are giving up some of their food, they are clearly incurring a cost. When twins are present (twinning is common among marmosets and tamarins), adults begin to give these rapid food calls and to share food almost a month earlier than when there is only a single infant present. Twins who receive food sharing at an earlier date also begin to forage on their own earlier than singletons, suggesting that the initially naïve animals are acquiring skills as a result of the adult behavior.

Food sharing begins at the end of the second month of life, peaks during the third month, and is rarely seen by five months of age. At this point all young tamarins are foraging successfully by themselves and giving food-associated calls similar to those of adults. Humle and Snowdon (2008) tested juvenile cotton-top tamarins seven months and older on a novel foraging task. Two opaque tubes with a food container suspended inside each tube were introduced first to the parents, and each parent was trained on a different method of solution. One solution was to walk along a branch and reach up into a tube to obtain food. The other solution was to hang suspended from the ceiling and pull up the food container hand over hand. Once the adults were well-trained, a juvenile was introduced. Even though food sharing and infant forms of food calling had not been observed for more than two months, the adults again began to give infant food calls and shared with the juveniles, but they only did this in the presence of the novel task and not on control days when food was present in a food dish. However, as soon as the juvenile was successful in obtaining food from the apparatus, the adult model stopped vocalizing and no longer engaged in food sharing. This is clear evidence that adult tamarins are sensitive to the changes in the learner’s behavior and are adjusting their own behavior.

Parallel results have been reported in both captive and field studies of golden lion tamarins. Captive golden lion tamarins are more likely to share novel or difficult-to-process foods with infants (Rapaport 1999), and in the wild, where young tamarins have difficulty catching insect prey, adults successively withhold assistance from juveniles as their insect-catching skills improve (Rapaport 2006; Rapaport and Ruiz-Miranda 2006). In both golden lion tamarins and cotton-top tamarins, adults have been observed calling near a prey source or assisting a young animal in obtaining food. This scaffolding behavior is a mark of human teaching, and its presence in tamarins contrasts sharply with the absence of any coaching or scaffolding behavior in chimpanzees, even when young individuals are feeding on potentially painful biting ants (Humle et al. 2009). However, despite the evidence that adults appear to be sensitive to the abilities of young animals in cotton-top tamarins and lion tamarins, research on common marmosets did not show evidence of such sensitivity (Brown et al. 2005).

6.5.3 Summary: Vocalizations in Social Learning and Teaching

Vocal signals play an important role in both social learning and teaching behavior in tamarins, and one is tempted to argue that such communication may be responsible for facilitating the rapid social learning seen in these species and absent in capuchin monkeys and chimpanzees. However, this is a hypothesis that needs to be tested closely in other family-living species as well as in nonhuman primates with other forms of social organization. Most researchers on social learning have not been interested in the role of communication, but this may prove to be important.

6.6 Vocal Development

As noted in Section 6.1, it is commonly thought that vocal structures are innate in primates with little or no developmental modification. However, family-living primates appear to demonstrate a greater influence of social and environmental factors on vocal structure than has been seen in other nonhuman primates. This section first reviews various models and methods of studying vocal development followed by information about babbling and consideration of some naturalistic and experimental studies that suggest that vocal development of family-living primates is sensitive to social and environmental factors. Section 6.7 then examines plasticity in adult vocal structure and usage.

6.6.1 Models and Methods of Vocal Development

Three aspects of the development of vocal communication can be distinguished: (1) signal structure; (2) appropriate usage; and (3) comprehension of signals. Each of these may be subjected to different developmental processes. Four models can be used to explain developmental processes in vocal communication. These include (1) innate or genetic determination, whereby signal structure, usage, or comprehension are fixed at birth; (2) maturation, whereby signal structure, usage, or comprehension changes as a function of physical or social maturation but without any explicit learning process; (3) limited learning, whereby only certain aspects of signal structure, usage, or comprehension can be developed and only during a limited period in development; and (4) open-ended learning where structure, usage, or comprehension can be modified throughout an animal’s life span.

It is generally accepted that nonhuman primates display developmental flexibility in the usage and comprehension of signals, but vocal structures are innate and not susceptible to modification by experience (Seyfarth and Cheney 1997). Janik and Slater (1997, 2000) have argued that evidence of vocal learning requires that an animal be able to acquire vocalizations from outside their natural species-specific repertoire. They further state that only songbirds and a few other genera of birds, cetaceans, bats, and humans show this ability, whereas no nonhuman primates do. This view has been reinforced by early studies of squirrel monkeys (Saimiri sciureus) and rhesus macaques (Macaca mulatta) that were reared in isolation. The isolate-reared squirrel monkeys had a normal adult vocal repertoire and responded with appropriate vocalizations in the proper contexts (i.e., giving alarm calls to predators never seen before) (Winter et al. 1973; Herzog and Hopf 1983, 1984). Similarly, isolate-reared rhesus macaques showed only minor perturbations in the structure of their coo vocalizations (Newman and Symmes 1974). When isolate-reared rhesus macaques were tested in a situation where one animal saw a stimulus that indicated a shock and a second animal could only see the facial expression of the monkey seeing the stimulus but had to respond to save both animals from getting shocked, the isolate-reared animals were effective communicators, but they could not “read” the signals of another monkey when they had to respond (Miller 1967). This suggests that, whereas the production of the signal and its use in an appropriate context were not affected by isolate rearing, the comprehension of the signal was impaired.

Isolate rearing of nonhuman primates is not ethically acceptable today, but cross-fostering and hybridization are two less invasive methods. In a study that cross-fostered rhesus and Japanese macaques with mothers of the other species, there was no evidence that the cross-fostered infants acquired the vocalizations of their foster species, but the foster mothers rapidly learned to respond appropriately to the calls of the foster infant (Owren et al. 1993). Hybridization between two species of squirrel monkeys found that the hybrid offspring tended to acquire the call characteristics of their mothers (Newman and Symmes 1982). However, in the wild, male squirrel monkeys are typically excluded from the group after mating, so it is possible that infant squirrel monkeys normally learn call structure from their mothers. Two studies on hybrid gibbon infants found that the calls of infants did not resemble those of either parent and, in some cases, contained aspects of the vocal structure of unrelated species. The mechanisms of vocal development in gibbons are complex and not easily related either to direct inheritance from one or both parents or to vocal learning from parents (Geissmann 1984; Tenaza 1985).

However, with the exception of the gibbons, none of the species reviewed so far is family living. Would developmental processes be different in family-living species? There are two types of examples: the spontaneous babbling-like behavior of pygmy marmosets (Cebuella pygmaea) and the naturalistic study of vocal development combined with some experimental manipulations in pygmy marmosets, common marmosets, and cotton-top tamarins. Little is known about other family-living species, and this material is reviewed in the final section.

6.6.2 Babbling-Like Behavior

From the first two weeks of life, young pygmy marmosets engage in long vocal bouts that contain a variety of call types (Elowson et al. 1998). These bouts share many characteristics with the babbling behavior of human infants. The majority of the calls produced are similar to adult calls and, indeed, represent a subset of adult calls. The calls (e.g., alarm calls, food calls, contact calls) are given out of context, given in a haphazard order, and often repeated several times with no relationship to the normal adult context for calls. Finally, adults respond to calling infants by approaching them and making physical contact. The main difference in comparison to human babbling is that pygmy marmoset vocalizations do not have a phonetic structure; thus babbling consists of calls rather than phonemes. Often the subsong and plastic song of songbirds are treated as a parallel to human babbling behavior (Marler 1970), but there are some fundamental differences. Song is typically produced only by male birds, and subsong and plastic song appear only as birds undergo puberty. In contrast, pygmy marmoset babbling begins in infancy and is seen equally in both sexes.

What are the consequences of babbling? Snowdon and Elowson (2001) reported that greater babbling early in infancy led to improved vocal production and a greater number of adult-like vocalizations after weaning. However, vocal development was not completed at weaning. The most commonly used adult call is the trill, and marmosets continued to improve on the production of adult trills throughout puberty and adolescence, reaching adult-like trill structure only as breeding adults, much like the food-associated calls of cotton-top tamarins (see below). Interestingly, submissive adult marmosets regress to babbling behavior during aggressive encounters, implying a plasticity of usage of infant vocalizations.

6.6.3 Naturalistic and Noninvasive Experimental Approaches

Studies of cotton-top tamarins found some plasticity in vocal development. In a feeding context, in which adults gave specific food-associated calls when approaching and leaving food (Elowson et al. 1991), infant and juvenile tamarins produced calls that did not match adult structure and were considerably more variable. These young animals also produced other vocalizations (not heard from adults) in feeding contexts (Roush and Snowdon 1994). Curiously, there was no developmental progression toward the production of adult-like vocalizations in this context, even in animals that were past puberty. In an experimental study, Roush and Snowdon (1999) recorded feeding vocalizations in cotton-top tamarins while they were living in family groups and again after they were paired with a mate and separated from their natal families. There was a rapid (within 2–3 weeks) change in feeding vocalizations, including the elimination of the other calls and development of a clear adult structure for the food calls. This suggests that social context may serve as a constraint on adult vocal production. As tamarins are cooperative breeders, in which only the adult pair reproduces and other group members act as nonreproductive helpers, it may be that young animals inhibit their adult usage of calls until they become reproductively active themselves.

Cotton-top tamarins produce eight chirp-like vocalizations (short, high-pitched, frequency-modulated calls, see Fig. 6.1) with each chirp type being used in a discrete context (e.g., feeding, mobbing, alarming, responding to a stranger’s call, and responding to a group member) (Cleveland and Snowdon 1982). Castro and Snowdon (2000) carried out an experimental study of how infant tamarins used these calls. Adult tamarins used the appropriate chirp type in each of the different contexts. Infants, unlike adults, typically did not produce discrete chirps but instead produced a sequence of chirps with descending frequency. Over the period of infant dependence through weaning, each of the infants tested produced some of the chirp types in an appropriate context, but no one individual produced all of the chirp types and no experimental context elicited an appropriate chirp type from each infant. These results suggest a relatively slow process of development and show that young tamarins are not able to produce adult calls at birth, in marked contrast to non-family-living squirrel monkeys. Although cotton-top tamarins did not show the babbling-like behavior seen in pygmy marmosets, they did show great variation in chirp structure and only rarely produced adult-like calls. If there are innate templates for vocal structures, they need to be shaped and sharpened through experience.

Elowson et al. (1992) recorded pygmy marmoset trills throughout ontogeny and found that trills changed during the course of development, suggesting they are not produced in adult-like ways at birth. If maturational processes alone accounted for development, all animals should show a similar pattern of vocal development. However, young marmosets, even twins within a litter, showed different patterns of trill development that were not consistent with a simple maturational model. Evidence of adult plasticity in vocal production and usage (presented in the next section) suggests that marmosets and tamarins can adjust vocal production throughout their lives.

A study of common marmosets shows quite elegantly that adult caregivers play an important role in shaping the vocal development of their offspring. Takahashi et al. (2015) studied the development of the phee call, a frequent call given when marmosets are separated from one another. They found that the calls became more stereotyped over the first two months with increased duration, decreased central frequency, and decreased entropy. Four discrete clusters of calls were seen in neonates, but these were reduced to one or two clusters by two months of age. At first glance this may seem to support a simple maturational model of vocal development. However, changes in phee quality were not correlated with age, body weight, or physiological development of the respiratory system. Takahashi et al. (2015) recorded infants both when alone and when in vocal contact with one of their parents. Parents generally respond to infant calls with well-formed adult phees. Rates of parental responsiveness to infants correlated directly with the age at which infants began producing well-formed phees of their own, suggesting that parental responsiveness to infant cries directly influences an infant’s trajectory toward an adult call. Although studies of babbling in pygmy marmosets showed a higher rate of adult social interaction with babbling versus nonbabbling infants (Elowson et al. 1998), this is the first experimental demonstration of parental influence on vocal development in any nonhuman primate. However, there are clear parallels to vocal development in other taxa, including birds and humans (West and King 1988; Goldstein et al. 2003).

6.6.4 Vocal Development in Other Family-Living Species

In hybrid gibbons, the song structure was complicated with few direct structural features inherited or learned from parents (Geissmann 1984; Geissmann and Orgeldinger 2000). However, Merker and Cox (1999), studying a single female gibbon, reported that vocal development was a slow process with different components of female great call structure appearing at different ages, much like the relatively slow development reported for marmosets and tamarins. There was also increased coordination of the infant’s calling with that of its mother as the infant grew older, suggesting that the mother may serve as a model. Further support for mothers serving as models for gibbon vocal development comes from Koda et al. (2013) who found acoustic matching of songs between mothers and daughters. Mothers adjusted their songs to be more stereotyped when co-singing with daughters, especially with daughters who co-sang less. Thus, for female gibbons at least, there appears to be a form of coaching behavior that may serve like the contingent responding in marmosets to shape vocal development in the young.

6.6.5 Summary: Vocal Development

In contrast to the general view that primate vocal structures are innate and not modified through learning processes, the data from family-living primates clearly show that development of adult vocal structures is a gradual process that cannot be attributed solely to maturation. Social variables, such as contingent responding by adults to infant babbling in pygmy marmosets and to infant cries in common marmosets, and coaching of songs by gibbon mothers, can influence the rapidity of acquisition of adult-like calling. At the same time, the suppression of breeding in adult helpers, inherent in the structure of cooperative breeding, may also inhibit the expression of some adult-like vocalizations until animals achieve breeding status. There are several parallels between development in family-living primates and that of humans that have not yet been reported in species with other breeding systems. Does this plasticity seen in young animals carry over into adult vocal production?

6.7 Flexible Adult Vocal Structure and Usage

Another characteristic of family-living primates is that vocal communication can be used flexibly by adults, with evidence for change in structure and usage in different social and environmental contexts. This is especially evident in four areas: (1) adjustment and convergence of vocal structures with pair or group formation (Sect. 6.7.1); (2) population-specific dialects (Sect. 6.7.2); (3) structural change in response to environmental noise (Sect. 6.7.3); and (4) novel responses to captive environments (Sect. 6.7.4).

6.7.1 Modification and Convergence of Calls with Pair Formation

In a wide array of species, ranging from birds through dolphins to humans, there is evidence of vocal convergence with preferred social partners (Snowdon and Hausberger 1997), but there has been little evidence of vocal change in nonhuman primates. The primary examples again occur among family-living primates. Elowson and Snowdon (1994) recorded calls from two different colonies of pygmy marmosets and subsequently combined the colonies. Within 10 weeks of housing the colonies together, adult and subadult members of both colonies showed an increase in bandwidth of trill calls as well as an increase in pitch. There is no obvious reason for calls to change in this way, but the results demonstrate vocal flexibility in response to a changed social environment. In a parallel study on Wied’s black-tufted-eared marmosets, Rukstalis et al. (2003) recorded phee calls in marmosets under stable social conditions and reported strong individual differences in call structure. Subsequently, some of the animals remained in the same colony room, but others were moved to a different colony room with unfamiliar conspecifics. When phees were recorded eight weeks later, phee calls of the marmosets in the stable social condition could still be identified, whereas those with changed social environments also exhibited changes in their individual call structure.

Another study by Snowdon and Elowson (1999) examined trill structure of pygmy marmosets while animals were living in their natal family groups and then paired each individual with an unfamiliar mate and tracked trill structure for the first six weeks after pairing. Some pairs were followed for up to three years. In every case where the calls of individual monkeys were distinct before pairing, there was a convergence in call structure to a common “pair trill” within the first six weeks of pairing. Although there were changes in call structure over the course of three years, the similarity in call structure between pair mates remained. Jorgensen and French (1998) also noted that there were clear individual differences in marmoset call structure within a year of pairing, but over the course of three years, the individually distinct signatures changed. Although they could not identify any specific cause of the vocal change, the implications of these studies are clear: marmosets are able to change their individual signatures in response to changes in their social environments, and as a consequence of this, listeners must also be able to track these changes perceptually in order to maintain individual relationships.

There is less evidence concerning vocal convergence in other family-living species. In both coppery titi monkeys (Callicebus cupreus) and siamangs (Hylobates syndactylus), adult mates alternate in producing duets. Although newly formed pairs appear to engage in duets with their partners soon after pairing, they do not match the duetting ability of long-term pairs. In the case of coppery titi monkeys, the phrases are much more variable in nearly all acoustic features of the duets (Müller and Anzenberger 2002), whereas in siamangs the pair may take several months to reach the level of coordination and pair specificity seen in long-term pairs (Geissmann and Orgeldinger 2000). In gibbons, several variables appear to influence singing. Although general rewards such as food and water have no effect on vocalizations, social influences, such as a new mate, the maturation of offspring attempting to sing themselves, the ability to adjust to “mistakes” in calling by others, and the presence or absence of familiar or unfamiliar members of other groups, can influence singing patterns in gibbons (Haraway and Maples 1998). As with marmosets and tamarins, we see evidence of adult vocal flexibility in response to social change in other family-living species.

6.7.2 Population-Specific Dialects

Among the best evidence for song learning in birds is the existence of vocal dialects in different populations (e.g., Marler and Tamura 1964). The apparent lack of population-specific calls in nonhuman primates has been another factor in arguing against environmental influences on vocal development. An early study hypothesized the existence of dialects in the food calls of provisioned Japanese macaques (Green 1975), although the differences may have resulted from humans rewarding variant calls with food. Dialects have been described in different populations of squirrel monkeys (Saimiri sp.) (Newman and Symmes 1982), and members of each population responded preferentially to playback of infant separation calls from those in the same population but were indifferent to infant calls from the other population (Snowdon et al. 1985). However, genetic analyses have revealed that these phenotypically distinct populations are actually separate species (rather than subspecies). Subspecies-typical calls have been reported for wild saddleback tamarins (Saguinus fuscicollis) in Peru (Hodun et al. 1981), although the pelage differences are quite pronounced, again raising the question of whether these should be considered as subspecies or separate species.

The clearest data on population differences in vocalizations come from different populations of pygmy marmosets in Ecuador. De la Torre and Snowdon (2009) analyzed the structure of trill and J-call vocalizations from five populations. After accounting for individual and pair-specific differences, they showed that there were acoustic differences that differentiated each population from the others for both call types (see Fig. 6.3). Measurements of the spectrum of ambient noise and call playbacks with re-recording at different distances showed different patterns of ambient noise and reverberation in the local habitat of each population. However, the differences in habitat acoustics did not predict the call structures found in each habitat (de la Torre and Snowdon 2009). Preliminary genetic analyses (de la Torre, personal communication) provide no evidence for any genetic diversity in parallel with the vocal diversity. Given the results on vocal flexibility in captive marmosets (described above), the most parsimonious interpretation of the results is that social learning or socially induced plasticity is responsible for the dialects.

Fig. 6.3

Dialect differences in wild pygmy marmoset trill (A) and J-calls (B). Calls are from the La Hormiga population (left) and the Amazoonica population (right). (Modified from de la Torre and Snowdon 2009)

Maeda and Masataka (1987) also reported the presence of dialects in the long calls of two populations of the red-bellied tamarin (Saguinus labiatus) in Bolivia with peak frequency and the range of frequency modulation differing between the populations. Although they did not evaluate possible effects of habitat acoustics on call structure, it seems unlikely that forest vegetation differed significantly between the two populations. In a follow-up study, Masataka (1988) played back calls to captive monkeys of each population and reported that females responded selectively to the long calls of males from their own population. However, females showed no difference in response to female long calls, and males failed to differentiate between male and female calls of their own population versus the other population. This suggests that either differentiation between populations was not yet well-developed or that there was no functional significance to the dialect differences, at least for males.

An interesting experimental study played back the affiliative calls of common marmosets to simulate amicable neighbors (Watson et al. 2014). Over the course of the playback, the listeners demonstrated increased affiliation, decreased aggression, and decreased anxious behavior. Although the behavioral changes did not continue after the playbacks ended, the study does suggest how vocalizations might vary between groups and lead to distinct cultural styles.

6.7.3 Structural Change in Response to Ambient Noise

Environmental noise can have an important influence on vocal signals (Brown and Waser, Chap. 4). In the natural environments of pygmy marmosets in Peru and Ecuador, the principal frequency of marmoset calls was above the spectral range of the majority of ambient noise (mainly from birds, frogs, and insects) (Snowdon and Hodun 1981; de la Torre and Snowdon 2002). These calls were also above the hearing range of many birds that might be predators, and natural selection appears to have influenced the frequency range of calls in this species.

But are nonhuman primates capable of responding to short-term changes in the acoustic environment? One common response seen in humans and some Old World primates is the Lombard effect, an increase in vocal amplitude with increased ambient noise. Common marmosets showed evidence of the Lombard effect, increasing the amplitude of their twitter calls with increasing amplitude of ambient noise and increasing the duration of individual units within their twitter calls (Brumm et al. 2004). Similar results were found with cotton-top tamarins (Egnor and Hauser 2006), which also adjusted the timing of their calls amidst bursts of white noise to call during the silent periods (Egnor et al. 2007). Using a different methodology involving presentation of a burst of white noise in the middle of an on-going long call in cotton-top tamarins, Miller et al. (2003) found that the noise would interrupt the production of long calls, with the call terminating after completion of the syllable that was interrupted. This led the authors to conclude that the long call was not organized as a complete call, either cognitively or with respect to motor pattern, but rather the syllables of the call were formed discretely, suggestive of grammar.

In an extension of this paradigm, Egnor et al. (2006) found that white noise bursts during long call production led to shorter notes and calls with higher amplitude and longer interpulse intervals, consistent with both the Lombard effect and the idea that tamarins can adjust their calling in a flexible way to environmental noise. Roy et al. (2011) played noise bursts that varied in duration and predictability to common marmosets. They found that the monkeys initiated calling in silent intervals under all conditions, suggesting vocal control with respect to noise. The Lombard effect, the truncation of a call in response to a burst of white noise, and the ability to initiate calls during quiet periods imply that marmosets and tamarins must have some degree of control over the structure of their vocalizations. Vocalizations are not simply due to fixed motor patterns.

6.7.4 Responses in Novel Environments

Although the structure, usage, and understanding of vocal signals by conspecifics have been shaped by natural selection in wild populations, many nonhuman primates are faced with novel environments either through captivity in zoological parks and research institutions or through increasing anthropogenic influences on natural environments. How do nonhuman primates adjust to these novel environments?

A study of pygmy marmosets in Ecuador compared groups living with high levels of anthropogenic noise before and after the capture of one or more animals for the pet trade with another population that experienced little anthropogenic influence (de la Torre et al. 2000). In groups with extensive human exposure, social play was greatly reduced, and the monkeys used higher strata within the forest compared with more isolated groups. After the capture of animals from a group, calling rates were greatly reduced. Duarte et al. (2011) reported that black-tufted marmosets (Callithrix penicillata) living in a park in the middle of the city of Belo Horizonte, Brazil, actively avoided the periphery of the park and more often frequented the central areas away from traffic noise. Although the authors did not record vocal activity, the monkeys may have been minimizing potential masking noise from human activities.

Captivity can be viewed as a novel environment, and one can ask whether features of the captive environment can affect vocal communication. Although comparative studies of captive versus wild populations of pygmy marmosets and cotton-top tamarins have failed to reveal any significant differences in vocal structure or in vocal repertoire, the use of calls in captivity does vary from usage in the wild. This is best illustrated in two examples. In the first example (also see Sect. 6.5), cotton-top tamarins that sampled a familiar, highly preferred food that had been adulterated by invisible white pepper produced alarm calls in this completely novel context (Snowdon and Boe 2003). The second example focused on how captive-born tamarins would react when exposed to cues of natural predators, either a live snake or audio recordings of natural predators. Captive-born tamarins did not give alarm or mobbing calls when exposed to live boa constrictors (Hayes and Snowdon 1990; Campbell and Snowdon 2007), but they did give mobbing calls to a human dressed as a veterinarian and also to a brush used to clean the light fixtures (Campbell and Snowdon 2009). Thus, captive-born monkeys failed to give alarm calls to a natural predator but did give them to features of the captive environment. Despite attempting to use several different conditioning paradigms, Campbell and Snowdon (2009) were unable to train captive-born tamarins to fear snakes by associating snakes with conspecific alarm calls. When captive tamarins were played calls of natural predators and harmless sympatric herbivores, they responded to vocalizations that had low-frequency and broadband components whether or not the calls were from a predator or herbivore (Friant et al. 2008). The lack of response to natural predators or calls of natural predators suggests strongly that these captive-born monkeys do not have an innate response to predators but learn to use alarm calls in contexts that are appropriate to their captive environments.

6.7.5 Summary: Vocal Flexibility in Adults

This section has shown that adult family-living monkeys have a great deal of flexibility in vocal production and usage. When new pairs are formed or previously separated colonies are merged, there is evidence of vocal convergence toward a common pair or group structure. In duetting species, the development of a well-coordinated pair song may take several weeks or months and might be an indicator of the state of pair bonding. Whereas evidence of vocal dialects in squirrel monkeys and Japanese macaques is questionable, the presence of dialects in pygmy marmosets is clear and at present cannot be explained by adaptation to habitat acoustics or genetic divergence. The results are most parsimoniously explained as reflecting social learning processes, given the evidence of social influences from captive studies. Marmosets and tamarins are sensitive to their auditory environments and either avoid areas of possible masking, reduce calling when human activities have been disruptive, or adjust call structure by making calls louder or longer, or they interrupt calling. Finally, marmosets and tamarins adjust to captivity as an ecological niche and fail to respond to stimuli from nature, but they can direct alarm calling to novel situations found only in captivity.

6.8 Primates as Psychophysicists

Are monkeys able to adjust their calling according to principles of psychophysics? One problem faced by all species that depend on vocal communication is the accurate localization of sound sources. This is important not only for localizing predators but also for locating conspecifics. Most research on sound localization has involved the two-dimensional space in which humans and other terrestrial animals live, but localization in three dimensions creates additional problems.

An early study of vocalizations in captive pygmy marmosets described several trill-like vocalizations that were sinusoidal frequency-modulated calls that varied in bandwidth, duration, and whether the modulation was continuous or interrupted (Fig. 6.2) (Pola and Snowdon 1975). Three of these trill variants appeared to be used in similar contexts of vocal contact with other group members, but the structural differences between the calls suggested that they contained different cues for sound localization. The softest call, the quiet trill, was short and continuous and had a narrow bandwidth. The closed mouth trill was also continuous but had a larger bandwidth, and the J-call was a series of separate sinusoidal frequency-modulated notes with an even greater bandwidth. Based on principles of sound localization (Thurlow 1971), these three calls represent a continuum that is increasingly locatable.

Since vocal communication is risky in natural environments, one might predict that, ideally, callers would use the most cryptic calls when close to other group members and reserve the calls most easily localized to contexts when group members are widely separated. To test this prediction, two field studies on pygmy marmosets in the Amazon have been completed. When one animal called, the distance between the caller and the closest visible conspecific was measured and the distances between animals plotted as a function of the call type recorded (Snowdon and Hodun 1981). With the most cryptic call, the quiet trill, the vast majority of nearest conspecifics were located within 5 m, whereas with the more locatable J-call, the majority of nearest neighbor distances were 10–15 m. In a second study, an additional call, the long call, was added and was primarily used when caller and recipient were more than 16 m apart (de la Torre and Snowdon 2002).

Broadcasts of pygmy marmoset calls were made in the habitat with re-recording done between 1 and 40 m from the speaker (de la Torre and Snowdon 2002). Trills and J-calls were highly distorted at 20 m, and only long calls could be re-recorded with minimal distortion at 40 m. The upper frequency range of each call type was degraded, as predicted by the inverse square law and the excess attenuation found in an arboreal habitat. The reduction of the upper frequency range with increasing distance provides a potential mechanism for ranging distance (Owings and Morton 1998). By using the amount of attenuation in higher frequency components, the listener could compute the distance to the caller. This distance estimating ability may allow those who respond to vocalizations to select from the repertoire of trills and J-calls the one that is most likely to be heard by others while minimizing risk of detection by potential predators.
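
The ranging argument above can be illustrated with a minimal numerical sketch. It assumes spherical spreading (the inverse square law, about 6 dB lost per doubling of distance) plus an excess attenuation that grows linearly with distance and is larger for higher frequencies. The function names and the attenuation coefficients are hypothetical illustrations, not values measured by de la Torre and Snowdon (2002), and the linear model is a simplification of real forest acoustics.

```python
import math


def spreading_loss_db(distance_m, ref_m=1.0):
    """Loss from spherical spreading (inverse square law), in dB,
    relative to the level at the reference distance."""
    return 20.0 * math.log10(distance_m / ref_m)


def received_level_db(source_db, distance_m, excess_db_per_m, ref_m=1.0):
    """Level at the listener: source level minus spreading loss minus
    excess attenuation (assumed here to be linear in distance)."""
    excess = excess_db_per_m * (distance_m - ref_m)
    return source_db - spreading_loss_db(distance_m, ref_m) - excess


def estimate_distance_m(low_db, high_db, excess_low, excess_high,
                        source_diff_db=0.0, ref_m=1.0):
    """Estimate caller distance from the level difference between a low
    and a high frequency band. Spreading loss is identical in both bands
    and cancels, so only the difference in excess attenuation remains."""
    level_diff = (low_db - high_db) - source_diff_db
    return ref_m + level_diff / (excess_high - excess_low)


if __name__ == "__main__":
    # Hypothetical coefficients: the high band loses more energy per metre.
    EXCESS_LOW, EXCESS_HIGH = 0.1, 0.5   # dB per metre (assumed)
    SOURCE_DB = 60.0                     # same level in both bands at 1 m (assumed)

    for d in (5.0, 20.0, 40.0):
        low = received_level_db(SOURCE_DB, d, EXCESS_LOW)
        high = received_level_db(SOURCE_DB, d, EXCESS_HIGH)
        est = estimate_distance_m(low, high, EXCESS_LOW, EXCESS_HIGH)
        print(f"true {d:5.1f} m  low {low:6.2f} dB  high {high:6.2f} dB  "
              f"estimated {est:5.1f} m")
```

Under these assumptions, the gap between the low and high bands widens steadily with distance, which is the cue a listener could in principle use for ranging. A real listener would also need to know the typical spectrum of the call at the source, represented here by the `source_diff_db` parameter.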

Thus, pygmy marmosets appear to be good psychophysicists, adjusting the structure of the calls they use to maintain contact with other group members based on how far away the recipient is and how far the call is likely to travel. These results imply another type of vocal flexibility not described in many other primates, the selective use of contact call types depending on the distance from others.

6.9 Chapter Summary and Future Directions

This chapter emphasized vocal communication in family-living primates and illustrated several unique features in these species. Compared with primates with other forms of social organization, family-living primates display less sexual dimorphism in vocal communication and often use vocal signals to maintain spacing and to defend territories and mates. Vocalizations from partners reduce the stress of separation. Nonetheless, these species show cognitive aspects of vocal communication similar to those seen in other species, including referential signals, syntax, turn taking, and long-term vocal memory. Among the consequences of family living is an increased role for vocal signals in social learning and teaching, and to date, some of the strongest evidence for social influences on learned vocal development arises from research on family-living primates. There is good evidence for vocal flexibility and plasticity throughout adulthood as these primates are able to adjust to anthropogenic noise and the ecological niche of captivity, and they are able to apply calls in novel settings. There is good evidence of population-specific dialects, at least in pygmy marmosets. All of these findings suggest that if one is to understand the evolution of human communication, one needs to look not only at the vocal signaling of our closest relatives but also to consider the role of evolutionary convergence as illustrated by family-living primates.

However, there remain several gaps in the literature that require future research. Most of the work on social influences on vocal development and on the role of vocalization in social learning and teaching has been carried out on marmosets and tamarins, which are cooperative breeders. Will these same findings also be seen in family-living species without cooperative infant care or are these adaptations unique to cooperative breeders? Although duetting and coordinated singing in titi monkeys and gibbons appear to be important in pair bonding as well as territory defense, the current evidence provides stronger support for these calls being used by each sex to defend against same-sex intruders. More studies are needed of these species to see if duetting calls serve to strengthen or form a pair bond and whether calls from a mate can have a stress-reducing effect as seen in marmosets. Research on titi monkeys has shown the physiological and behavioral effects of mate separation, but there appears to be no work on vocal communication. The major work has been carried out in field studies on gibbons and titi monkeys and in captive studies with marmosets and tamarins. Increasing the breadth of research to include captive titi monkeys and gibbons and determining whether results from captive studies are seen in wild populations of marmosets and tamarins would be most welcome.