Introduction

One widespread mechanism for increasing acoustic flexibility in animal communication is to concatenate sounds into more complex vocal sequences. This phenomenon is common in songbirds and other species that produce utterances composed of a series of notes or ‘syllables’ (e.g. Passeri: Catchpole and Slater 1995; quacking frog Crinia georgiana: Gerhardt et al. 2000; humpback whales Megaptera novaeangliae: Payne and McVay 1971). Although there have been repeated efforts to compare such animal communication systems with syntax in human language (e.g. Marler 1977), the gulf has remained vast with major differences in terms of generativity and semanticity (Chomsky 1981; Hauser et al. 2002). For example, animal syntax is typically based on elements with little or no independent meaning that could be linked to the organisational principles of the sequence. Moreover, there is no clear evidence for generative use of sound combinations, and as a consequence, the debate on the phylogenetic origins of human language has not yet made much progress (Bickerton and Szathmáry 2009).

However, due to their close phylogenetic proximity to humans, the vocal behaviour of non-human primates is relevant to investigate the evolutionary pathways of human language (Lemasson 2011). The mainstream hypothesis here is that human speech has emerged as an evolutionary derivative of a gesture-based communication system, with a subsequent transition from the visual to the vocal domain (Corballis 2003). One alternative view is that ancestral humans initially relied on a primate-like vocal communication system, perhaps complemented by gestural signals, but then experienced an evolutionary process of gaining increasing motor control over their vocal apparatus, which eventually enabled them to imitate sound patterns and produce arbitrary vocal patterns (e.g. Enard et al. 2002). Social complexity may have favoured this process (Dunbar 1998). One prediction of the vocal transition hypothesis therefore is that enhanced acoustic flexibility should be found, to various degrees, in primate call types that are primarily used while interacting socially.

A growing number of primate studies have demonstrated acoustic flexibility within some of the species-specific (i.e. ‘genetically’ predetermined) call types (Cebuella pygmaea: Elowson and Snowdon 1994; Snowdon and Elowson 1999; Macaca fuscata: Koda et al. 2008; Papio anubis: Ey et al. 2009; Cercopithecus campbelli: Lemasson and Hausberger 2004; Pan troglodytes: Slocombe et al. 2010). A second source of acoustic flexibility is in the form of combinations of existing calls (P. troglodytes: Crockford and Boesch 2005; Pan paniscus: Clay and Zuberbühler 2009; Hylobates lar: Clarke et al. 2006; Colobus guereza: Schel et al. 2009; Cercopithecus nictitans: Arnold and Zuberbühler 2006; C. campbelli: Ouattara et al. 2009a, b; Saguinus oedipus: Cleveland and Snowdon 1982; Cebus olivaceus: Robinson 1984) with evidence that some of these sequences can be ‘meaningful’ to others (C. nictitans: Arnold and Zuberbühler 2008; Cercopithecus diana: Zuberbühler 2002; C. guereza: Schel et al. 2010; P. paniscus: Clay and Zuberbühler 2011).

One drawback is that studies of call combinations in primates have focused on long-distance communication or calls to predators. For example, male putty-nosed monkeys (C. nictitans) combine two types of loud calls into sequences that reliably predict forthcoming group progression (Arnold and Zuberbühler 2008). Similarly, male Campbell’s monkeys (C. campbelli) transform highly specific alarm calls into general alert calls by an affixation mechanism (Ouattara et al. 2009a) and concatenate individual calls into sequences that are context-specific and related to external events (Ouattara et al. 2009b). However, a largely unaddressed question is whether close-range social calls in primates show similar or even increased flexibility in terms of acoustic properties and sequential structure, as hypothesised by Lemasson and Hausberger (2011).

Many primate species produce short-distance social calls, usually referred to as ‘clear calls’ or ‘contact calls’ (e.g. Uster and Zuberbühler 2001). They tend to be amongst the most frequently emitted calls of the vocal repertoire and can encode information on the caller’s identity, social affinities, or spatial positioning (Harcourt and Stewart 1996; Gautier-Hion 1988; Lemasson and Hausberger 2004, 2011). For example, Seyfarth and Cheney (1984) showed that vervet monkeys give acoustically distinct grunts in different social contexts, such as when approaching a dominant or subordinate group member, and that these acoustic differences are ‘meaningful’ to conspecifics. In terms of acoustic flexibility, various studies have found subtle contact call subtypes, and in some cases, there is evidence for semantic content [e.g. Japanese macaque ‘coo’ calls: Green 1975; Pygmy marmoset ‘trill’ calls: Pola and Snowdon 1975; Baboon ‘grunts’: Owren et al. 1997; Campbell’s monkey ‘CH’ calls: Lemasson et al. 2004; Lemasson and Hausberger 2011; review by Snowdon (2009)]. Further evidence for socially determined acoustic flexibility is in the form of converging acoustic structure of contact calls between affiliated females (Pygmy marmosets: Snowdon and Elowson 1999; Campbell’s monkeys: Lemasson and Hausberger 2004, Lemasson et al. 2005). Here, we define ‘social’ calls broadly as vocalisations to communicate with other group members over short distances in non-predatory contexts.

To address this possibility that primate social calls also have combinatorial properties, we carried out a study on wild Diana monkeys, Cercopithecus diana diana, a guenon species closely related to Campbell’s and putty-nosed monkeys (Gautier 1988). Although Diana monkeys’ alarm calls have been extensively studied (Zuberbühler et al. 1997, 1999; Zuberbühler 2000a, b), little attention has been paid to females’ other types of vocalisations (Gautier 1988; Hill 1994; Zuberbühler et al. 1997; Uster and Zuberbühler 2001). This was partly due to the difficulties in identifying and describing these animals’ behaviour in detail, because they spend much of their time in the upper forest canopy (McGraw 2007). Unlike savannah-dwelling primates, forest guenons are often out of sight from each other. Social interactions are much less common because they spend more effort monitoring each other’s behaviour and adjusting their own spatial position accordingly (Rowell and Olson 1983; Rowell 1988). Instead, guenons typically emit social calls to overcome the constraints of poor visibility in the forest and maintain group cohesion (e.g. Gautier and Gautier 1977; Uster and Zuberbühler 2001). Calling tends to be contagious, and call rates are increased when visibility is poor. Still, the specific contexts of emission of these social calls remain unknown. It is hence both interesting and challenging to try and better understand these females’ social communicative system.

We were interested in the influence of social and environmental factors on the acoustic structure of female Diana monkeys’ vocalisations at several organisational levels of their repertoire. Given the complexity of their alarm calling system and the importance of indirect social interactions via vocal communication, we hypothesised that their social calls contained similar or even greater levels of acoustic diversity in relation to contextual variables.

Methods

Study site and subjects

Data were collected from February to May 2009 and from January to June 2010 from two groups (DIA1 and DIA2) of free-ranging Diana monkeys (C. diana diana) in Taï National Park, Ivory Coast. The study area is located in the south-western part of the park, adjacent to the CRE (Centre de Recherche en Ecologie) research station (5°50′N, 7°21′W). Both groups had been under observation since the early 1990s and were fully habituated to the presence of human observers. Both groups consisted of about 20–25 individuals, including one adult male, 9–10 adult females (individuals with visible nipples and at least one offspring), several sub-adults, juveniles and infants.

Data collection

The DIA1 and DIA2 groups were followed alternately. Data were collected between 07:30 and 17:00 h GMT. Every 30 min, a scan sample (Altmann 1974) was taken on a number of variables that, according to previous studies, had the potential to influence the monkeys’ vocal behaviour (Ouattara et al. 2009a). Specifically, we scored the location of the group within its territory (using a map and a grid system), the degree of group scattering, the group’s main activity, general luminosity and the presence of a neighbouring Diana monkey group (Table 1).

Table 1 Definition of the scan and focal variables

Between scans, adult females were monitored alternately following a 10-min focal animal sampling procedure (Altmann 1974). We systematically described the female’s behaviour, according to the behavioural categories described in Table 1. Efforts were made to equalise the amount of observation effort for each female.

Recordings were made 5–25 m from the focal female (depending on her elevation in the canopy) with a Sennheiser K6/ME66 directional microphone and a Marantz PMD660 solid-state recorder (sampling rate, 44.1 kHz; resolution, 16 bits). The observer (AC) complemented her observations with a running commentary on the behaviour of focal individuals, recorded on the recorder’s second channel with a lavalier microphone and later transcribed.

Acoustic analyses

Spectrograms were generated with RAVEN 1.3 software (Cornell Laboratory of Ornithology, Ithaca, New York). Poor-quality recordings were discarded (3.7%). From the remaining sample, we first categorised the recordings according to the main call types, following visual and auditory assessments and taking into account previous findings from work on Campbell’s monkeys’ vocal behaviour (Lemasson and Hausberger 2011; Fig. 1a). We then validated our classification with a basic acoustic analysis of call structure conducted on a subset of calls from the same females to control for individual differences (Fig. 1b; Table 2). It was based on total duration, minimum fundamental frequency (F0min) and maximum fundamental frequency (F0max). We also took a number of measurements that were more suited to some call types, such as amplitude and duration of frequency modulation in trilled calls and the number of units and duration of the first unit in the multi-unit calls.

Fig. 1

Spectrographic representation of the calls and acoustic parameters measured. Spectrograms were produced using a Hanning window function; filter bandwidth, 124 Hz; frequency resolution, 86.1 Hz; grid time resolution, 5.80 ms. a Calls were classified according to two criteria: in columns, the presence and type of arched frequency modulation, and in rows, the presence and type of introductory unit. b Total duration (D), minimum (F0min) and maximum (F0max) fundamental frequency measured

Table 2 Acoustic parameters

Contextual analyses

Our goal was to investigate the link between a given call type and its context of emission. Consequently, behaviours not associated with a vocalisation by the focal individual were not further considered. The influence of context on call production was investigated at two levels. ‘General’ context was based on data collected during scan sampling, while ‘immediate’ context was based on data collected during focal animal sampling. Continuous observations from focal sampling were divided into 30-s intervals to determine which of the ten aforementioned behavioural categories were produced by the focal animal when calling (see Lemasson et al. 2004). Our prospective analysis on detailed behavioural categories showed trends that led us to pool the different behaviours into more general biologically relevant categories, as follows: (a) socio-positive or relaxed situations (‘resting’, ‘foraging’, ‘feeding’ and ‘positive social interaction’), (b) neutral situations (‘scanning’, ‘walking’ and ‘neutral social interaction’) or (c) socio-negative or potentially dangerous situations (‘jumping’, ‘negative social interaction’ and ‘vigilance’).
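The pooling of the ten behavioural categories into three immediate contexts can be written down as a simple lookup. The sketch below is only an illustration: the event format (time stamp in seconds from the start of the focal sample, behaviour label), the function names and the rule that a call inherits every behaviour scored in its 30-s interval are assumptions, not details taken from the study.

```python
# Pooled categories as listed in the text; the short keys are hypothetical labels.
CONTEXT_OF_BEHAVIOUR = {
    "resting": "positive",
    "foraging": "positive",
    "feeding": "positive",
    "positive social interaction": "positive",
    "scanning": "neutral",
    "walking": "neutral",
    "neutral social interaction": "neutral",
    "jumping": "negative",
    "negative social interaction": "negative",
    "vigilance": "negative",
}

INTERVAL_S = 30  # focal observations were divided into 30-s intervals


def interval_index(time_s):
    """Index of the 30-s interval containing a time stamp (seconds)."""
    return int(time_s // INTERVAL_S)


def contexts_for_call(call_time_s, behaviour_events):
    """Pooled contexts scored in the same 30-s interval as the call.

    behaviour_events: iterable of (time_s, behaviour) pairs (assumed format).
    """
    target = interval_index(call_time_s)
    return {
        CONTEXT_OF_BEHAVIOUR[behaviour]
        for time_s, behaviour in behaviour_events
        if interval_index(time_s) == target
    }
```

A call can fall in an interval containing behaviours from more than one pooled category; how such cases were resolved is not stated in the text, so the sketch returns a set rather than a single label.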

In Diana monkeys, social calls typically trigger a vocal response by another group member within a few seconds (>60% of cases; Uster and Zuberbühler 2001). We thus counted the number of calls emitted in the 3 s before and after a focal animal’s call to determine whether the call was (a) isolated (no other call in the 3 s before or after), (b) exchanged (1–3 other calls separated by less than 3 s, with no call overlap; see Lemasson et al. 2010) or (c) chorused (at least 4 other calls, with overlap).
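The three vocal contexts can be expressed as a small decision rule. The following sketch assumes that each call is available as a (start, end) pair in seconds and that the 3-s window is measured from the onset and offset of the focal call; both are assumptions. Cases that the definitions do not cover (for example, one to three neighbouring calls that overlap) are returned as "unclassified" rather than forced into a category.

```python
WINDOW_S = 3.0  # window before and after the focal call, as stated in the text


def overlaps(a, b):
    """True if two (start, end) calls overlap in time."""
    return a[0] < b[1] and b[0] < a[1]


def vocal_context(focal, others):
    """Classify a focal call as isolated, exchanged or chorused.

    focal: (start, end) of the focal call; others: list of (start, end)
    for calls by other group members (assumed input format).
    """
    start, end = focal
    nearby = [c for c in others
              if c[1] >= start - WINDOW_S and c[0] <= end + WINDOW_S]
    if not nearby:
        return "isolated"
    group = [focal] + nearby
    any_overlap = any(
        overlaps(group[i], group[j])
        for i in range(len(group))
        for j in range(i + 1, len(group))
    )
    if len(nearby) <= 3 and not any_overlap:
        return "exchanged"
    if len(nearby) >= 4 and any_overlap:
        return "chorused"
    return "unclassified"  # combination not covered by the definitions
```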

Statistical analyses

To test for morphological differences between the call types, we performed a discriminant function analysis (DFA) based on the three basic acoustic variables that were measurable on every call type: total duration and the minimum and the maximum fundamental frequency. To control for individual differences, we used the same number of calls per call type from each female. The classification results were based on equal probabilities of class (call type) membership. After generating the discriminant function, we used the leave-one-out classification procedure to verify our subjective classification. In this cross-validation procedure, each call is classified by the functions derived from all other calls.

The ideal procedure to investigate the influence of context on call structure would have been to conduct a multivariate analysis including all possible contexts of emission. Unfortunately, this was not possible due to insufficient sample size. Instead, we conducted separate tests for each contextual variable while avoiding multiple comparisons on the same data set. The relations between call types and context of emission were examined at the individual level, except for rare call types where small sample size precluded this level of analysis. Although less rigorous, we decided to carry out analyses at the population level because this provided us with a crucial basis for comparisons with combined calls. We performed G tests of independence on contingency tables of call types versus contextual categories to assess which associations were the strongest (see Bouchet et al. 2010). When the expected values were small, we corrected the G statistics for continuity, according to Williams (1976). For the analyses at the individual level, all females were included, provided we had recordings of their calls in the respective context, and the data were analysed with Wilcoxon signed-rank tests. Statistical analyses were performed using SPSS 17.0 software. All tests were two-tailed, and significance was set at α = 0.05.
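For readers who want to reproduce the population-level tests outside SPSS, the G test of independence can be sketched in a few lines. The code below is an illustration, not the authors' procedure: the function names are hypothetical, the Williams correction is written in the form usually given in biometry textbooks, and all row and column totals are assumed to be non-zero. The P-value helper is valid only for Df = 2, where the chi-square upper tail reduces to exp(−G/2).

```python
import math


def g_test_independence(table, williams=False):
    """G test of independence on an r x c table of counts.

    Returns (G, df). With williams=True, G is divided by Williams' q.
    Assumes every row and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a cell with 0 observations contributes 0
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    g *= 2.0
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if williams:
        q = 1.0 + (
            (n * sum(1.0 / t for t in row_totals) - 1.0)
            * (n * sum(1.0 / t for t in col_totals) - 1.0)
        ) / (6.0 * n * df)
        g /= q
    return g, df


def p_value_df2(g):
    """Chi-square upper-tail probability; valid only for df = 2."""
    return math.exp(-g / 2.0)
```

As a plausibility check on the Df = 2 shortcut, `p_value_df2(8.9)` gives roughly 0.012, in line with the P = 0.0115 reported for G = 8.9 once rounding of G to one decimal is taken into account.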

Results

Acoustic morphology analysis

Call types

A total of N = 2,129 vocalisations were collected during 58 h of focal sampling. We found four different call types referred to as ‘H’ (high-pitched trilled calls), ‘L’ (low-pitched trilled calls), ‘R’ (repeated-unit calls) and ‘A’ (arched frequency modulation calls). ‘H’ calls were continuous high-pitched quavered structures with a descending frequency modulation ranging from 1,237 ± 616 to 358 ± 87 Hz (Table 2). ‘L’ calls were continuous low-pitched quavered structures with a general ascending frequency modulation ranging from 247 ± 84 to 664 ± 354 Hz. Importantly, ‘H’ and ‘L’ calls were structurally discrete, not variants of a graded continuum. Although both types of call structure were trilled, we found no intermediate forms, suggesting they were separate types. ‘R’ calls were composed of one to four brief (25–34 ms) generally atonal sounds, separated by short (40–57 ms) periods of silence. ‘A’ calls were characterised by a tonal arched-shape frequency modulation of 3,047 ± 774 Hz. We were able to distinguish two subtypes of ‘A’ call, based on whether the arch was continuous (‘Af’: full arch) or broken (‘Ab’: broken arch).

Three acoustic parameters (D, F0min and F0max) were sufficient to discriminate significantly between the four call types (DFA: Wilks’ λ = 0.111, χ² = 707.295, Df = 6, P < 0.001, Fig. 2). The discriminant analysis derived three functions (one less than the number of categories) with the first accounting for 84.7% of the variance and the second for an additional 15.3%. The success rate of classification was higher than expected from a random assignment, both in the original (88.9%, N = 323) and in the leave-one-out cross-validation procedure (88.0%). In addition, ‘Ab’ subtypes differed from ‘Af’ subtypes by the presence of a long silence gap in the arched modulation, representing on average 37% of the total duration (mean ± SD = 114 ± 65 ms; N = 119 calls from 6 females; range, 87–142 ms).
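The logic of the leave-one-out check can be illustrated without a full discriminant analysis. The sketch below deliberately swaps the DFA for a much simpler nearest-centroid classifier, so it shows the cross-validation loop only and would not reproduce the percentages above. With four call types and equal prior probabilities, random assignment would be correct for 25% of calls, which is the baseline the reported success rates are compared against. The feature order (D, F0min, F0max) and the toy values are invented for illustration.

```python
import math


def centroid(rows):
    """Mean feature vector of a list of equal-length feature lists."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest_centroid(train, features):
    """Label whose class mean is closest (Euclidean) to `features`."""
    by_label = {}
    for feats, label in train:
        by_label.setdefault(label, []).append(feats)
    return min(by_label,
               key=lambda lab: math.dist(centroid(by_label[lab]), features))


def leave_one_out_accuracy(calls):
    """Classify each call with a model built from all other calls."""
    hits = 0
    for i, (feats, label) in enumerate(calls):
        train = calls[:i] + calls[i + 1:]
        if nearest_centroid(train, feats) == label:
            hits += 1
    return hits / len(calls)


# Invented toy values (D in ms, F0min and F0max in Hz), not study measurements.
toy_calls = [
    ([300, 360, 1240], "H"), ([310, 350, 1200], "H"), ([290, 370, 1260], "H"),
    ([280, 250, 660], "L"), ([270, 240, 650], "L"), ([290, 260, 680], "L"),
]
accuracy = leave_one_out_accuracy(toy_calls)
```

Because each call is excluded from the model that classifies it, the cross-validated rate is a more conservative estimate than the rate obtained on the original sample.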

Fig. 2

Results of the discriminant function analysis

We further confirmed the generality of our classification by showing that each type and subtype was present in at least two adult females of both habituated groups (Table 3).

Table 3 Female’s individual vocal repertoires

Call combinations

Our results showed that females could produce four call types (‘H’, ‘L’, ‘R’ and ‘A’) either alone or combined in the following three ways (Fig. 1a). We found combinations of ‘H’ and ‘A’ calls (‘HA’ combinations), ‘L’ and ‘A’ calls (‘LA’ combinations) and ‘R’ and ‘A’ calls (‘RA’ combinations), with either full (‘Af’) or broken (‘Ab’) arched components. Although other combinations would have been possible, we did not find them. Instead, combined calls were always introduced by an ‘H’, ‘L’ or ‘R’ call followed by one of the two arched call subtypes. The most common utterances were uncombined ‘A’ calls and ‘LA’ combinations (respectively 17 calls per hour and almost 20 calls per hour), while all other structures were much rarer (‘RA’: 2.7 calls per hour; ‘H’: 1.3 calls per hour; ‘HA’, ‘L’ and ‘R’: less than 1 call per hour; Table 3).

Contextual analyses

Call types

Call types could be discriminated by their context of emission. ‘H’ calls were significantly associated with high mobility, high spatial cohesion, being outside of the territory, high luminosity and the presence of neighbours (G tests of independence, Table 4). ‘H’ calls were also significantly associated with socio-positive or relaxed situations and were often uttered in isolation. ‘L’ calls were significantly associated with high mobility, low spatial cohesion, being in the center of the territory, high luminosity and vocal chorusing (Table 4). ‘R’ calls were significantly associated with being in the center of the territory, high spatial cohesion, low luminosity and socio-negative situations. ‘R’ calls were uttered mainly in isolation of other vocal behaviour. Finally, ‘A’ calls were associated with group resting, being in the core area of the territory, low spatial cohesion, low luminosity, neutral situations and vocal exchanges (Table 4). Although ‘L’ was the only type to show no association with an immediate non-vocal context, it was significantly different from the ‘R’ type (‘R’ associated with socio-negative situations and ‘L’ with neutral situations, G test of independence, G = 8.9, Df = 2, P = 0.0115), while it did not differ significantly from the ‘H’ or ‘A’ type (G tests of independence, G = 2.2, Df = 2, P = 0.3357 and G = 2.2, Df = 2, P = 0.3368, respectively). In sum, each call type had a particular contextual profile. Specifically, the ‘A’ call type was contextually more neutral than the other calls and was the only type preferentially used during vocal exchanges.

Table 4 Contextual analyses of call types

Arched call subtypes

The arched call type ‘A’ occupies a key position in the vocal repertoire of female Diana monkeys (>95% of all calls; Table 3) with the two subtypes ‘Af’ and ‘Ab’ differing in contextual use. The ‘Af’ subtype was emitted significantly more frequently than the ‘Ab’ if neighbours were nearby (Wilcoxon two-tailed test: N = 14, Z = 2.229, P exact = 0.026), the luminosity was low (N = 14, Z = 2.103, P exact = 0.035), the caller jumped (N = 15, Z = 2.045, P exact = 0.041) or was engaged in an agonistic interaction (N = 15, Z = 2.032, P exact = 0.047). The ‘Af’ subtype was also significantly more frequent than the ‘Ab’ during choruses (N = 15, Z = 2.480, P exact = 0.01). Conversely, the ‘Ab’ subtype was more frequent when the neighbours were absent (Wilcoxon two-tailed test: N = 14, Z = 2.229, P exact = 0.026) and when the caller was resting (N = 15, Z = 2.556, P exact = 0.008). ‘Ab’ subtypes were also more frequent, though not significantly, when calls were uttered in isolation (N = 15, Z = 1.915, P exact = 0.058). Table 5 summarises the main effects. In sum, there were significant differences in the contextual use of the two arched subtypes, with ‘Af’ subtype preferentially used in situations when providing identity cues was important.
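The paired comparisons reported here rely on exact P values, which for samples of 14 or 15 females can be obtained by enumerating all sign assignments of the ranks. The following sketch shows one way to do this; it is not the SPSS routine used in the study. It assumes one pair of values per female (for instance, how often each subtype was given in a context; the exact measure is an assumption), drops zero differences and gives tied differences their average rank. It returns the sum of positive ranks rather than the Z statistic quoted in the text.

```python
def wilcoxon_exact(x, y):
    """Two-sided exact Wilcoxon signed-rank test for paired samples.

    Returns (W_plus, p_value), where W_plus is the sum of ranks of the
    positive differences.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    # Work with doubled ranks so that average ranks for ties stay integers.
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # twice the mean of ranks i+1..j+1
        i = j + 1
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    total2 = sum(ranks2)
    # counts[s] = number of sign assignments whose doubled positive sum is s
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]
    # Two-sided: assignments at least as far from the null mean as observed.
    observed_dev = abs(2 * w_plus2 - total2)
    extreme = sum(c for s, c in enumerate(counts)
                  if abs(2 * s - total2) >= observed_dev)
    return w_plus2 / 2, extreme / 2 ** n
```

With five pairs that all differ in the same direction and have no ties, the smallest attainable two-sided P is 2/32, which is why small samples cannot reach significance at α = 0.05 with this test.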

Table 5 Contextual profiles of arched calls depending on the introduction (‘LA’ vs. ‘RA’) or the subtype of arch (‘Ab’ vs. ‘Af’)

Call combinations

Both ‘L’ and ‘R’ calls were found in combination with ‘A’ calls (i.e. ‘LA’ and ‘RA’ combinations), depending on the context of emission. ‘LA’ combinations were emitted significantly more often than ‘RA’ combinations when the group was foraging (Wilcoxon two-tailed test, N = 15 females, Z = 2.954, P exact = 0.002), during call exchanges (N = 15 females, Z = 2.124, P exact = 0.001), when the caller was resting (N = 15, Z = 2.271, P exact = 0.021), involved in a friendly social interaction (N = 15, Z = 2.201, P exact = 0.031) and more generally during positive situations (N = 15, Z = 1.978, P exact = 0.047). ‘LA’ combinations were more frequent, though not significantly, when the groups were at the periphery of their territory (Wilcoxon two-tailed test, N = 10, Z = 1.955, P exact = 0.055) and when individuals were scattered (N = 15, Z = 1.867, P exact = 0.067). Conversely, ‘RA’ combinations were uttered significantly more in isolation than ‘LA’ calls (N = 15, Z = 2.354, P exact = 0.017) and were more frequent, though not significantly, when the group was not scattered (N = 15, Z = 1.956, P exact = 0.054). ‘HA’ combinations also existed but were too rare to be included in this analysis. Table 5 summarises the main effects obtained when conducting the analysis at the individual level. Interestingly, at the population level, ‘LA’ combinations were still significantly associated with positive situations, while ‘RA’ combinations were significantly associated with negative situations (G test, G = 13.5, Df = 2, P exact = 0.0012). In sum, there were significant differences in the contextual use of ‘LA’ and ‘RA’ call combinations.

Discussion

We carried out an observational study to investigate the levels of flexibility in female Diana monkeys’ social calls. We found flexibility at two levels: variability in acoustic structures and combinations of these structures into more complex utterances. Both mechanisms significantly enlarged the females’ vocal repertoire, which consisted of only four basic call types (‘H’, ‘R’, ‘L’ and ‘A’). First, we observed non-random combinations of the four basic calls, which increased the repertoire to seven types of utterances (‘H’, ‘L’, ‘R’, ‘A’, ‘HA’, ‘LA’ and ‘RA’). Second, we found that, within the most frequently emitted call type (‘A’), females produced two subtypes characterised by differences in the frequency modulation, which in turn increased the repertoire to eleven utterances.
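The step from four basic call types to seven and then eleven utterances follows directly from the combination rule (an introductory ‘H’, ‘L’ or ‘R’ call followed by an arched call). A short enumeration makes the count explicit; the function and variable names are hypothetical, and labels such as ‘LAf’ are shorthand introduced here for an ‘L’ call followed by a full arch.

```python
from itertools import product

INTRODUCTORY = ["H", "L", "R"]   # calls that can precede an arched call
ARCH_SUBTYPES = ["Af", "Ab"]     # full and broken arch


def repertoire(split_arch):
    """List utterance types, with or without distinguishing arch subtypes."""
    arches = ARCH_SUBTYPES if split_arch else ["A"]
    single = INTRODUCTORY + arches
    combined = [intro + arch for intro, arch in product(INTRODUCTORY, arches)]
    return single + combined


without_subtypes = repertoire(split_arch=False)  # H, L, R, A, HA, LA, RA
with_subtypes = repertoire(split_arch=True)      # 5 single + 6 combined
```

Treating ‘A’ as one type gives 4 + 3 = 7 utterances; splitting it into ‘Af’ and ‘Ab’ gives 5 + 6 = 11.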

The shape of the frequency modulation of ‘A’ calls (arch broken or full) is a pattern also seen in the calls of other guenon species (Gautier 1988). For instance, female Campbell’s monkeys produce six subtypes of ‘CH’ calls, which seem to be the structural and contextual analogue of the ‘LA’ combinations of Diana monkeys. Campbell’s monkeys also produce broken and full arches in relation to different contexts, regardless of the caller’s age (Lemasson and Hausberger 2004, 2011). In individuals raised in captivity, the full arch encoded information about the caller’s identity and affiliative bonds. Call structure changed across years in adult females, and playback experiments showed that females reacted differently to current variants and to variants no longer produced by familiar females (Lemasson et al. 2005). Although presumably other calls are also individually distinctive, we found that Diana females preferentially used the full arched calls when revealing identity was particularly important, such as during periods of low visibility, when facing an opponent and in acoustically confusing environments such as call choruses. The full arched frequency modulation is an acoustic structure that has a high potential for individual coding.

Although the contextual variables used in this study were somewhat crude, especially if compared with studies on the social calls of savannah-dwelling primates, they generated biologically relevant links to the observed vocal patterns. Indeed, both levels of flexibility—acoustic modulation and combination—turned out to be context-related in this species, showing that the cohesion–contact calls system of Diana monkeys contains subtleties that go beyond a simple function of individual identification and spatial positioning, as originally proposed by early studies (Gautier 1988). When uttered alone, ‘H’ and ‘R’ types were associated with social activities and contexts relating to high group spatial cohesion and were uttered in isolation. ‘H’ calls were given when in the outer parts of the home range, in the presence of neighbours and when luminosity was high, while ‘R’ calls were given in the center of the territory and when luminosity was poor. ‘L’ and ‘A’ types were more typically associated with neutral contexts, when the group was scattered and when the vocal activity was high. Importantly, ‘H’ calls were emitted in situations that were ‘socially positive or relaxed’ for the emitter while ‘R’ calls were emitted in ‘socially negative or potentially dangerous situations’. The majority of ‘L’ calls uttered alone were emitted during a ‘neutral situation’, although this result was not statistically significant. It is hence possible that these three call types form a gradient reflecting the general motivational state of the caller. In contrast, ‘A’ calls uttered alone differed from the previous call types in several ways. They were emitted much more frequently, were contextually neutral and were typically used during vocal exchanges.

Call combinations were optional and always in the form of a two-compound utterance with the first call used as an introductory unit followed by one of two subtypes of arched calls. In addition, when females produced call combinations, their contexts of emission were not fundamentally different from the contextual profile of the same calls emitted alone (either the introductory unit or the arched call). Instead, call combination seemed to modulate the utterance of an ‘A’ call with a contextual value regarding the immediate situation faced by the emitter in terms of ‘positive or relaxed’, ‘negative or potentially dangerous’ or ‘neutral’ situation. One hypothesis is that the ‘A’ call functions as an individual identifier that can be given with or without an introductory call marking the contextual situation. A similar finding has recently been reported in Campbell’s monkeys, where females emit ‘LA’-like combinations in which the ‘L’-like part reveals something about the caller’s kin relatedness and the ‘A’-like part the caller’s social bonds (Lemasson and Hausberger 2011). For Diana monkeys, further work is needed to explore the kind of information conveyed by differences in arch structures.

Combinatorial properties may be more widespread in primate communication than previously reported, although very little is still known about the informational content of these combinations compared to the single units (Zuberbühler 2002; Crockford and Boesch 2005; Ouattara et al. 2009a, b). Traditionally, analyses of primate vocal behaviour have been carried out at the level of the individual call type, but as stated by Hauser (2000), sequences may also be communicatively relevant (see Bouchet et al. 2010). In non-primate taxa, sequence-based investigations are more common (e.g. songbirds: Kroodsma 1982; killer whales Orcinus orca: Shapiro et al. 2010), although this has not generated much progress in terms of context-specific production.

When compared to previous studies in closely related species, the combinatorial system of social calls in Diana monkeys showed some parallels with the affixation system in Campbell’s monkeys (Ouattara et al. 2009a), although a number of important differences were also present. Specifically, there was no evidence that Diana monkeys’ combinations of social calls carried strong semantic content relating to specific events, such as a falling tree, the approach of a neighbouring group (Ouattara et al. 2009b) or a signal for group progression (Arnold and Zuberbühler 2006). Instead, the combinations of social calls seen in Diana monkeys appear to convey the individual identity of the caller (most likely, though not exclusively, to be found in the arched frequency modulation) and something about the immediate motivational state the caller finds herself in, that is, whether she assesses the current situation as positive, negative or neutral (found in the introductory call). A particularly interesting case is the rare ‘HA’ combinations, whose communicative function will require further investigation.

Whatever the function of non-random concatenation of calls is, it is clear that this behaviour can significantly enlarge the vocal repertoire of a species and expand the functional use of calls, which may be particularly relevant for species that have little control over call morphology. Similar arguments have been made for male Campbell’s monkeys, where affixation broadens the ‘meaning’ from predator-specific alarm calls to calls given to a broader class of disturbances (Ouattara et al. 2009a). In male putty-nosed monkeys, ‘pyow-hack’ combinations carry different ‘meanings’ than pure ‘pyow’ or ‘hack’ series (Arnold and Zuberbühler 2006). In Diana monkeys, the concatenation of one of several possible introductory calls to the arched call unit seems to function as a contextual refiner of this contextually neutral call. The degree to which these subtleties are intentionally produced, mere reflections of a caller’s motivational state (Owings and Morton 1998; Owren and Rendall 2001) or both has not been addressed by this study and will require further investigation.

To conclude, we documented optional and potentially partially redundant combinatorial properties in the social calling system of female Diana monkeys, the first evidence of this kind for short-distance vocalisations used in social contexts. This study brings new insights into the mechanism by which non-human primates can achieve enhanced acoustic flexibility, something that may be especially important during social interactions. The degree to which this and other non-human primate combinatorial calling systems are relevant for understanding the early biological roots of human language is currently unclear and much debated. The outcome of this debate will also largely depend on whether similar properties can be found in the calling systems of our closest relatives, the chimpanzees and bonobos.