Introduction

Inner states can change both quantitatively (intensity) and qualitatively (type) with considerable speed following changes in the external or internal environment of the individual (e.g., Carlson et al. 1989; Fredrickson and Levenson 1998). Agonistic interactions are examples of such dynamic events. Where aggressive behaviors occur in response to a specific threat, they are complexly modulated by features of the attacker and the environment (Nelson and Trainor 2007). Changing these external features (e.g., the nearness of the attacker or noticeable signs of aggression from it) will dynamically modulate the inner state of the recipient and this will affect its signaling behavior. During such interactions, vocalizations carry both indexical and affective cues. While in most cases indexical cues are hard to modify and are therefore honest (see Fitch and Hauser 2003), the changes of inner state have the potential to dynamically modulate certain parameters of the sound.

The source–filter framework (Fant 1960; Titze 1994) is a widely applicable mechano-functional explanatory model of how the acoustic features of sound relate to the anatomical constraints resulting in such indexical cues and to the affective states of the signaler (for review, see Taylor and Reby 2010). According to the theory, indexical features (such as body size, age and sex), as well as the affective state of the signaler, may modify the sound at the level of the ‘source’ (larynx, laryngeal and sublaryngeal structures) and of the ‘filter’ (vocal tract) (Fitch and Hauser 2003). For example, basic changes in respiration might affect the amplitude, tempo and absolute fundamental frequency (f 0) of the calls, while changes in vocal fold tension and coordination (resulting from changes in overall muscle tonus and control) might affect the pitch of the sound (Rendall 2003).

Basic indexical features, such as body size, have a well-described effect on particular acoustic parameters, such as f 0 (which is a ‘source-based’ parameter) and formant frequencies (F m, which are typical ‘filter-characterized’ parameters) (e.g., Fitch 1997; Briefer and McElligott 2011). Changes in the fundamental frequency are the result of the active modification of vocal fold tension and depend on the morphology and size of the larynx, which, in turn, is only weakly linked to the size of individuals within a particular species. On the other hand, ‘formant dispersion’ (dF), which is the average spacing between neighboring formants (spectral peaks in the vocal signal), is directly dependent on the vocal tract length (Fitch 1997). The length of the vocal tract is closely connected to overall body size; thus, formant dispersion represents an indexical cue in various mammalian species which is hard to manipulate, for example, in dogs, Canis familiaris: Riede and Fitch (1999); koala (Phascolarctos cinereus): Charlton et al. (2011); and American bison (Bison bison): Wyman et al. (2012).

In her review, Briefer (2012) emphasizes that non-human animals are excellent subjects for modeling the link between inner states and their vocal correlates due to their almost complete lack of cognitive control, indicating that these vocalizations will represent a ‘direct expression of underlying emotions.’ According to the so-called motivational–structural (MS) rules outlined by Morton (1977), the inner state of the individual is reflected by the acoustic features of the call that are dependent on the ‘source’: harsh (broadband), lower-frequency vocalizations are used in agonistic contexts; tonal, higher-frequency calls in appeasing or non-agonistic contexts. According to Morton’s theory, in terms of the evolution of producing competitive signals, the production of harsh, low-frequency sound is linked to a relatively larger body size. If the receiver reacts accordingly, such signals may also determine the outcome of an agonistic encounter. Due to their size dependency, such vocalizations may help to avoid direct confrontation in evolutionary terms (‘expressive size symbolism’ Morton 1994). However, it is important to note that ‘size dependency’ is a relative term. While f 0, due to its flexibility, has become ‘detached’ from the size of the signaler in many species, thereby opening up an easier possibility for dishonest signaling (e.g., in green frogs, Rana clamitans, Bee et al. 2000; or humans, Rendall et al. 2007), dF on the other hand has remained a more or less reliable indicator of the signaler’s size. Ohala (1984) drew a parallel between minor changes of formant dispersion (due to specific face expressions) and the signaling of inner states due to an evolutionary process in which the body size communicated has become ‘symbolized’ to the corresponding inner states. 
Owings and Morton (1998) also emphasized that the opposite endpoints of motivational tendencies will be expressed as lower and harsher, or higher and more tonal vocalizations (Rendall 2003; Gogoleva et al. 2010a).

Although the physical parameters of the caller show a more or less strong association with the acoustical signals, certain anatomical adaptations may allow a passive or active modulation of even the indexical content of the calls. Consequently, the vocal signal will not reflect the signaler’s actual body size during particular interactions (Taylor and Reby 2010). Examples of ‘passive size exaggeration’ appear in particular bird species where the trachea has become longer than anatomy requires it to be (Fitch 1999). The elongated nasal region of elephant seals (Mirounga leonina) is another case of passive size exaggeration (Sanvito et al. 2007). When animals modify the length of their vocal tract by using specific groups of muscles during vocalization, we consider these to be cases of ‘active size exaggeration.’ While the larynx in male red deer (Cervus elaphus) and fallow deer (Dama dama) sits at a relatively low position in the neck, during the production of mating calls it can be retracted even lower down to the sternum (Reby et al. 2005). Active acoustical ‘exaggeration’ can also be achieved in other ways—as in male saiga antelopes (Saiga tatarica) which use a specific posture while emitting courtship calls (Volodin et al. 2009).

In canines, our knowledge of the vocal imprints of different affective states is slender due to the limited number of experimental studies. However, there have been investigations of affect-related vocalizations and their acoustical representations in African wild dogs (Robbins and McCreery 2003) and also in silver foxes (Gogoleva et al. 2010b). The barks of domestic dogs (C. familiaris) also show acoustical dissimilarities across various social contexts reflecting the assumed differences in the subjects’ inner state (Feddersen-Petersen 2000; Yin and McCowan 2004). Moreover, it has been found that humans are able to correctly categorize barking contexts and the assumed inner state of the barking dog by listening to pre-recorded bark samples (Pongrácz et al. 2005, 2006).

Other studies have investigated size-related information of dog vocalizations. It has been shown that the indexical cues conveyed by dog growls are different in agonistic and playful situations (Faragó et al. 2010b). Furthermore, this cue can be perceived by dogs (Faragó et al. 2010a; Bálint et al. 2013) as well as humans (Taylor et al. 2008). It is known in other species that size-related vocal parameters may be altered to some extent in agonistic situations (Fitch and Reby 2001; Taylor and Reby 2010); however, while size exaggeration in the roars of red deer is not known to be affected by the social environment, we expect a threat-level-dependent effect in the case of dog growls. This assumption is supported as humans tend to perceive dog growls with exaggerated size information (based on both f 0 and dF) as more aggressive (Taylor et al. 2010).

While ethologists focus their interest mostly on the affiliative/cooperative aspect of the dog–human relationship (e.g., Miklósi and Topál 2013), there are also well-documented and often problematic agonistic encounters between the two species (e.g., Klausz et al. 2014). Human-directed aggression by dogs is often classified as defensive aggression (Reisner 2003), and it is often stated that humans are prone to misjudge the affective state/mood of dogs, which can easily escalate conflicts between the two species (e.g., Besser 2007; Meints and de Keuster 2009). While it is generally accepted that in most agonistic interactions (holding a territory, defensive behaviors, etc.), the physical attributes of the opponents, e.g., their body size, are the crucial determinant of the outcome of the contest (Owings and Morton 1998; Taylor and Reby 2010), there is an additional factor emerging as the elicitor of differential agonistic reactions of dogs: human gender. There is fast-growing evidence that dogs are not only capable of distinguishing men from women (e.g., Nagasawa et al. 2011; Ratcliffe et al. 2014; Ruffman and Yong 2015), but also (at least in the case of shelter dogs) show more affiliative behaviors when approached by a female assistant (Hennessy et al. 1998) and behave more aggressively when approached by men (Lore and Eisenberg 1986; Wells and Hepper 1999). Ratcliffe et al. (2014) also reported that dogs’ ability to make a correct cross-modal match between male or female voices and two assistants of different gender was dependent on the amount of a priori experience with humans (i.e., only dogs living with more than two adults in a household were capable of successful gender-specific matching). Therefore, we cannot rule out a role for gender experience in dogs’ different reactions to men and women in agonistic interactions.

In this experiment, our aim was to investigate the vocal reaction of dogs in a situation where the dog encounters a ‘threatening’ unfamiliar person. The main goal was to investigate (1) whether the acoustic parameters of the dogs’ vocal response are affected by particular aspects of the threatening human (such as sex or body size), and (2) how the different levels of threat affect the vocal parameters associated with the inner state and indexical parameters (in our study the body size) of the dog.

In our experimental setup, we compared the dogs’ vocal response in two consecutive trials. We manipulated the level of threat by using differently sized men and women as ‘threateners’ in two experimental trials. Our hypothesis was that if dogs show a stronger fear/agonistic reaction to men or to a larger person, their inner state might be reflected in certain acoustical parameters of their growls, mainly resulting in growls with lower fundamental frequencies and narrower formant dispersion.

Materials and methods

Subjects

All of our subjects were family dogs kept as companion animals in Hungarian households. To represent the natural variability of commonly kept companion dogs, we recruited dogs of a large number of different breeds (38); an additional 48 dogs were mixed breeds (for details, see Online Resource 3) with high variability in size and morphology. All owners participated in our experiment voluntarily, and they were fully informed about the experiment and the potential stress the procedure might cause to the subjects. When recruiting dog–owner dyads for the experiments, in order to reduce the potential risk to the experimenters, we excluded a priori those subjects that the owner reported had shown human-directed aggression in the past (attacks resulting in bites).

In total, 138 dogs participated in the experiment; however, only 96 were tested in both trials for various reasons (e.g., logistical problems, owners’ lack of collaboration, or in a few cases, the experimenters decided that the dog showed strong signs of stress during the first trial). From the 96 subjects who were tested twice, we excluded 32 from the analysis because these dogs did not emit growls during both trials; the excluded dogs barked, whined or remained silent, and our main focus here was on the within-subject comparison of growls. The dogs’ sex ratio was 50:50, and their age ranged from 9 months to 11 years (M = 3.77, SD = 2.44). Their height at the withers ranged from 15 cm to 64 cm (M = 45.53, SD = 11.07), while their mass ranged from 4.7 to 47 kg (M = 18.13, SD = 8.7).

Experimental procedure

We used a within-subject experimental design, that is, all subjects were tested twice, with different experimenters in the role of the threatening approaching stranger. At least 3 days passed between the two tests (M = 10.1, SD = 8.95). To evoke mild stress in the subjects in each test, we applied the method developed by Vas et al. (2005), which has also been used in other studies (Faragó et al. 2010b; Klausz et al. 2014; Gácsi et al. 2013). The core of the so-called Threatening Stranger (‘Stranger’) method is an interaction in which the dog and its owner are approached by an unfamiliar person in a slow, stalking manner, who steadily stares into the dog’s eyes. The process was terminated when an average of 20–30 s of vocalizations had been recorded, similar to the procedure of Faragó et al. (2010b). The entire test was in general 60–90 s long. The duration of the interaction varied considerably because of the dogs’ varying response manners and dynamics (first test: M = 67.54 s, SD = 28.52, second test: M = 72.63 s, SD = 29.76). Also, in those cases where the dog showed a strong behavioral reaction (struggling, shaking, strong salivation, urination, etc.), we immediately terminated the trial, excluded that subject from further testing and, in cooperation with the owner, performed stress-relieving exercises.

We strongly emphasized the importance of preventing the dogs from experiencing undue stress in the test. Owners were informed that they could interrupt the ongoing test at any time if they felt that their dog was experiencing an unacceptable level of stress. No owners decided to do so. Similarly, the experimenter who acted as the ‘Stranger’ assessed the dog’s behavior during the test, looking for signs of elevated stress (fear or aggression) levels. As the goal was to record an adequate number of growls for acoustic analysis, the ‘Stranger’ did not approach the dog closer than was necessary to elicit the growls. After the dog started to growl, the experimenter remained at that distance until the required number of growls had been recorded. With this protocol, we were able to avoid severe signals of fear/aggression, such as snarling or barking. At the end of the test, as in the original Vas et al. (2005) study, the experimenter changed his/her approach from threatening to friendly behavior. He/she stopped, stepped back, crouched or sat down on the floor and called the dog’s name in a friendly, beckoning way (repeatedly calling with a high-pitched voice) with a calm or happy expression. At the same time, the experimenter asked the owner to let the dog free. If necessary, the owner encouraged the dog in a calm manner to approach the experimenter. If the dog approached the stranger, she/he gently petted the dog and initiated a friendly, playful interaction with it. In rare cases when the dog was still showing stress or was unwilling to approach the stranger, the owner approached the experimenter and interacted with her/him in a friendly way. The owner then called and encouraged the dog to approach them to show that the ‘Stranger’ meant no harm.

Threatening strangers

All ‘Strangers’ were adult Caucasian men and women, recruited from the research staff of the Department of Ethology. The age of the ‘Strangers’ ranged from 24 to 60 (men), and from 24 to 32 (women). Since we intended to measure the effects of gender and size of the ‘threatening’ human on the acoustical response of dogs, we categorized the ‘Strangers’ according to these two features. To reduce the probability of pseudoreplication, we used a substantial sample of ‘Strangers’ (eight women and eight men). The body size of the ‘Strangers’ (‘large’ or ‘small’) was categorized based on a value labeled as ‘Frontally visible body surface.’ This is the apparent surface of the body, seen from the front, and is calculated as follows: [height of person (cm)] × ∛[mass of person (kg)], that is, height multiplied by the cube root of body mass. This value combines different measures of body size (height, mass) and is proportional to the visible surface of the approaching human. We calculated the medians of these values for the men and women separately and defined ‘large’ as above the median, while ‘small’ was below it. For men, the median was 776.77; thus, four men were categorized as ‘large’ and four as ‘small.’ For women, the median was 638.69, and there were also four ‘large’ and four ‘small’ participants. See Online Resource 2.
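The size categorization described above can be sketched in a few lines of Python. This is only an illustration of the stated formula and median split: the function names and the example heights and masses are hypothetical and are not taken from Online Resource 2.

```python
from statistics import median


def frontal_surface(height_cm, mass_kg):
    """'Frontally visible body surface': height (cm) times the cube root of mass (kg)."""
    return height_cm * mass_kg ** (1.0 / 3.0)


def median_split(people):
    """Label each person 'large' if above the group median, otherwise 'small'.

    `people` maps an identifier to a (height_cm, mass_kg) pair.
    """
    values = {name: frontal_surface(h, m) for name, (h, m) in people.items()}
    cutoff = median(values.values())
    return {name: ("large" if v > cutoff else "small") for name, v in values.items()}


# Hypothetical measurements, for illustration only (not the study's data).
men = {"M1": (190, 95), "M2": (185, 88), "M3": (176, 70), "M4": (170, 64)}
labels = median_split(men)
```

With an even number of people per gender, as in the study, the median falls between two values, so the split yields equally sized ‘large’ and ‘small’ groups.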

Experimental setup

The experiment was conducted in a 4 m × 6 m room. The dog was standing with its owner in one of the corners of the room. The owner was asked to stand behind the dog during the test, next to the wall and to avoid any interaction with the dog. The dog was held by a leash that was fixed to the floor, 40 cm away from the corner for safety reasons and also to keep the distance between the dog and the microphone within a relatively stable, close range to obtain comparable loudness measurements from the recordings. The leash was 110 cm long, allowing a comfortable movement area for the dog. The ‘Stranger’ was hiding outside the room, behind a door in the opposite corner of the room, until the test started. The distance between the dog and the door where the ‘Stranger’ appeared was approximately 4.5 m (see Fig. 1).

Fig. 1 The experimental setup. The dog was standing in front of the owner, on a leash which was fixed to the floor. The ‘Threatening Stranger’ approached from the opposite corner of the experimental room. A microphone and a camcorder were placed on the floor in order to record the growling and movements of the dog

The tests were recorded by four video cameras: three UI-2230-C (USB), which were placed on three different walls of the room, and one Panasonic NV-GS27 camcorder, which was placed on the floor, 1.80 m away from the dog, to get the best close view of its responses.

A Sennheiser ME-65 type microphone with K6p powering module was placed at 1.30 m from the dog, oriented toward the dog in order to obtain the best quality of recording (see Fig. 1). The microphone was phantom-powered with 42 V from an H4n recording device used as a USB sound card. VirtualDub software was used for the simultaneous recording of the video (compressed AVI) and uncompressed sound material (Windows PCM WAV, 44.1 kHz sampling rate, 16 bit).

Experimental design

Corresponding to our experimental hypotheses, we had three experimental groups. These were the ‘LargeMan–SmallMan’ (LM–SM), ‘LargeWoman–SmallWoman’ (LW–SW) and a ‘Man–Woman’ (M–W) group. The basic details of the participating dogs are shown in Online Resource 3.

In the LM–SM group, the dogs were approached both times by men, in the LW–SW group, both times by women, but the size of the ‘Stranger’ was different (‘large’ vs. ‘small’) in the two trials. In both groups, 19 dogs were tested with balanced gender arrangement (11 male dogs in both LM–SM and LW–SW). The order of the approaching ‘Stranger’—large or small—was balanced between the subjects.

Small and large ‘Strangers’ were assigned to various pairs, creating the highest possible number of different pairings. Each particular pair was used for only one dog, with the exception of one pair in the LM–SM (M5–M1), and three pairs (F6–F3, F3–F6, F5–F2) in the LW–SW, which were used twice. Online Resource 3 shows which threatening humans were used for each dog.

In the M–W group, we tested 25 dogs (number of male dogs: 10). In this group, the size categories of the ‘Strangers’ were determined by the absolute size difference of the available male and female ‘Stranger.’ All possible size and sex combinations were used, based on a predetermined schedule. The corresponding sizes, ‘large’ (‘l’) and ‘small’ (‘s’), are shown in Online Resource 3. The order of the gender of the approaching ‘Stranger’ was also balanced between the subjects: 12 dogs encountered a man, and 13 dogs encountered a woman during the first test (see Online Resource 3). With this mixed within- and between-subject design, we were able to minimize the stress we put on our subjects and avoid the potential confounding order effect of repeated testing, but were still able to test both gender and size effects.

Acoustical analysis

We used a similar, but extended set of acoustic measures (N = 50 variables) as in Molnár et al. (2008) and Larrañaga et al. (2015), extracted from bark samples with a custom-made ‘Praat’ script (Boersma and Weenink 2001); the scripts used are included in Online Resource 1. During the analysis, the script extracted the individual growls from the recordings in a semiautomatic way, using Praat’s built-in annotating algorithm [To Text Grid (silences)…] to mark the boundaries of each growl. Then, these boundaries were checked by the operator of the script (TF), who if necessary modified the boundaries, or selected and marked the missed growls. While performing this task, the operator was blind to the testing conditions in which the particular sound samples had been collected. Also in this stage, the operator excluded growls that overlapped with background noise. To optimize the extraction of acoustic parameters, we set up the search range of fundamental frequency, the number of formants and the maximum frequency of the highest formant manually, dog-by-dog, after visual inspection of the sonograms and using Praat’s View Pitch and View Formants functions in the editor window (this was necessary due to the high individual variation, see also Riede and Fitch 1999). All the acoustical parameters were extracted automatically from each individual growl. The extracted parameters, which were either source (fundamental frequency contour parameters, tonality measures), filter (formant frequencies, dispersion of spectral energy) or intensity related, have potential communicative function and are widely used in bioacoustical studies (e.g., Briefer 2012; for details, see Table 1). The calculation of formant dispersion used the same method as in Faragó et al. (2010b), which was based on Riede and Fitch’s (1999) approach. We extracted the formant frequencies using the Burg method and then averaged the difference between successive formants, \(\mathrm{d}F = \frac{\sum_{i=1}^{m-1}(F_{i+1} - F_{i})}{m-1}\), where \(F_{i}\) is the frequency of the i-th formant and \(m\) is the number of formants. The measured parameters were averaged over growls throughout each recording, and in the later analysis, we used this average for describing each dog’s vocal behavior.
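As a minimal sketch of the formant dispersion formula and the per-recording averaging described above, the following Python functions compute dF for one growl and the mean dF over several growls. The function names and formant values are hypothetical, and the formant extraction itself (done in Praat in the study) is not reproduced here.

```python
def formant_dispersion(formants_hz):
    """Average spacing between neighboring formants: sum(F[i+1] - F[i]) / (m - 1)."""
    if len(formants_hz) < 2:
        raise ValueError("at least two formants are needed")
    ordered = sorted(formants_hz)
    gaps = [upper - lower for lower, upper in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def mean_dispersion(per_growl_formants):
    """Average dF over all growls of one recording."""
    values = [formant_dispersion(f) for f in per_growl_formants]
    return sum(values) / len(values)


# Hypothetical formant frequencies (Hz) for two growls of one recording.
growls = [[500, 1500, 2500, 3500], [400, 1300, 2500]]
dF_first = formant_dispersion(growls[0])   # (3500 - 500) / 3
dF_recording = mean_dispersion(growls)
```

Because the differences telescope, dF is equal to the distance between the highest and lowest formant divided by m − 1, which is why the number of formants set for each dog matters for the result.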

Table 1 Description of the measured acoustic variables

Statistical analysis

Since the size arrangement of the ‘Strangers’ in the LM–SM and LW–SW groups was not equivalent to that in the M–W group, we analyzed the first two groups and the third group separately in a ‘Same Gender Groups analysis’ and a ‘Mixed Gender Group analysis.’ In the ‘Same Gender Groups analysis,’ the size labels described in "Materials and methods" were applied, while in the ‘Mixed Gender Group analysis,’ we determined the size category of the ‘Strangers’ separately for each ‘Stranger’ pair, based on their absolute size differences. This was necessary because the previously formed size categories (‘large,’ ‘small’) of the men and women ‘Strangers’ were suitable for within-gender comparisons and were not directly applicable to the size differences of men and women in the mixed gender group. The corresponding size labels are shown in Online Resource 3.

Principal component analysis

Since we had a large set of variables (50 initial acoustical variables), we first performed a principal component analysis (PCA) based on correlations between variables with varimax rotation to assess whether there were significant associations among the acoustical variables. The number of PCA components was chosen using the break point of the scree plot (see Cattell 1966) and validated by parallel analysis based on O’Connor’s method (O’Connor 2000) using the rawpar function of the paramap R package. This method determines the number of factors by calculating the eigenvalues from the original dataset and comparing them with eigenvalues calculated from a large number (in our case 10,000) of random permutations of the original dataset. To further simplify the components, we applied a backward elimination approach, excluding step-by-step those parameters that had a low loading (<0.5) or contributed to more than one component with similar absolute loading (this approach is commonly used in PCA). Cronbach’s alpha was calculated to assess the internal consistency of the final extracted factors and to test the repeatability of the measurement (DeVellis 1991).
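For readers unfamiliar with the internal consistency measure mentioned above, the standard formula for Cronbach’s alpha can be sketched as follows. This is an illustrative implementation of the textbook definition, not the SPSS routine used in the study; the function name and the example scores are hypothetical, and in practice the variables of a component would typically be standardized first.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k variables measured on the same n observations.

    `items` is a list of k equal-length lists (one list per variable).
    alpha = k / (k - 1) * (1 - sum of item variances / variance of the summed score)
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two variables are needed")
    totals = [sum(observation) for observation in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores of four dogs on two variables of one component.
alpha_identical = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]])
alpha_partial = cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])
```

Two perfectly consistent variables give an alpha of 1, while weaker agreement between the variables lowers the value.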

First, PCA was performed on the first dataset, consisting of the first trials of each dog, and then, the obtained factors were tested on the second dataset (which came from the second tests of the subjects). The acoustical variables selected for the components during the first PCA matched the second dataset very well, resulting in a very similar item distribution and loadings for the same number of components. A detailed description of the acoustic parameters that were included in the components is given in Table 1.

The first PCA performed on the initial 50 acoustical variables resulted in four components, labeled as ‘Intensity_1,’ ‘Pitch_1,’ ‘Dynamics_1’ and ‘Tonality_1’ (Table 2). Applying the variables associated with the components obtained in the PCA performed on the data of the first test, we ran a PCA on the second dataset as well. The acoustic variables with their loadings at the particular components, the individual and cumulative percentage of variance of each component and the reliability (Cronbach’s alpha) of the components are shown in Table 3.

Table 2 Results of the PCA on the acoustic variables of dogs’ vocalizations in the first test series
Table 3 Results of the PCA on the acoustic variables of dogs’ vocalizations in the second test series

The extracted components

‘Intensity’: This component consists of variables describing the energy content (amplitude) of the sound, intensity measures and the change of sound energy in time.

‘Pitch and Formants’: Variables with the highest loadings are related to the fundamental frequency parameters of the sound and the average spacing of formants; ‘formant dispersion’ also contributes to this component.

‘Dynamics’: Variables describe the length of the sound (call duration), as well as the latency (time point of appearance) of the minimum/maximum intensity and frequency values in the sound.

‘Tonality’: This component consists of two variables; both are related to the tonality of the sound; ‘jitter’ describes the frequency alteration of consecutive voice cycles, while ‘mean harmonicity’ is the mean tonality of the sound.

Generalized linear mixed models

Using the components generated by the PCA as individual dependent variables, we performed generalized linear mixed model analyses on all four components separately, both in the ‘Same Gender Groups’ (LM–SM group, LW–SW group) and in the ‘Mixed Gender Group’ (M–W group) analyses. The residuals were not normally distributed in the case of the components ‘Pitch and Formants’ and ‘Dynamics.’ Therefore, the scores of these components were log-transformed before entering them in the analysis (‘log-P&F,’ ‘log-Dynamics’). We used the gender of the ‘Stranger’ (male vs. female), the size of the ‘Stranger’ (smaller vs. larger) and the order of the experiments (1st vs. 2nd), and based on Ratcliffe et al.’s (2014) finding, we also included the owner’s gender (male owner vs. female owner vs. mixed household) as fixed effects and the ID of the individuals as a random factor. To rule out the possibility of a confounding effect of dog size differences between the household categories, we tested this with a one-way ANOVA. We found no significant difference in dog weight between the categories [F(2,61) = 1.198; P = 0.309]. Since the body size of the dogs could have an impact on particular acoustic parameters of their vocalizations (Riede and Fitch 1999), we included the body mass of the dogs (‘mass,’ in kg) as a covariate. Besides the main effects, we measured the 2-way and 3-way interaction effects of the above-mentioned four fixed effects. To obtain the simplest model that sufficiently explains our data, we applied backward elimination model selection, excluding the interactions with the highest P value step-by-step, until only effects with P values lower than 0.05 remained. In the following, we report the obtained final models only. Post hoc tests were performed as pairwise comparisons of the levels of the significant effects and controlled for multiple comparisons with the sequential Sidak method. These corrected P values are reported in all post hoc results. All of the statistical analyses were performed with the IBM SPSS Statistics 22.0 software.
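The sequential Sidak correction mentioned above is a step-down procedure, and its usual textbook form (often called Holm–Sidak) can be sketched as follows. This is an illustration under that assumption, not a reproduction of the SPSS implementation; the function name and the example P values are hypothetical.

```python
def sequential_sidak(p_values):
    """Step-down Sidak adjustment of a family of P values.

    The smallest P value is adjusted for all m comparisons, the next one
    for m - 1, and so on; adjusted values are forced to be non-decreasing
    in the order of the raw P values and are capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        candidate = 1.0 - (1.0 - p_values[index]) ** (m - rank)
        running_max = max(running_max, candidate)
        adjusted[index] = min(1.0, running_max)
    return adjusted


# Hypothetical raw P values of three pairwise comparisons.
corrected = sequential_sidak([0.01, 0.04, 0.03])
```

In this example the largest raw P value (0.04) receives the same adjusted value as the second largest, because the adjusted values are not allowed to decrease as the raw values increase.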

Results

Same gender groups

Intensity

We found no significant effect on the ‘Intensity’ component [F(6,71) = 1.46; P = 0.204], neither in the main effects, nor in the two-way or three-way interaction effects.

Pitch and formants

The model showed a significant effect [F(18,59) = 5.466; P < 0.001] with the main effect of the covariate weight [F(1,59) = 31.951; P < 0.001] and three three-way interactions (gender of ‘Stranger’ × gender of owner × order: F(1,59) = 7.032; P = 0.01; gender of ‘Stranger’ × gender of owner × size of ‘Stranger’: F(1,59) = 4.073; P = 0.048; order × gender of ‘Stranger’ × size of ‘Stranger’: F(1,59) = 11.018; P = 0.002).

The post hoc test revealed that when the ‘Stranger’ was male, in the first encounter dogs with a male owner had growls with a higher Pitch and Formants factor score than dogs with a female owner [t(59) = 4.097; P < 0.001] or from a mixed household [t(59) = −4.479; P < 0.001], while the latter two did not differ significantly [t(59) = −1.162; P = 0.25; Fig. 2]. During the second encounter, each pairwise comparison showed the same pattern of differences [male vs. female: t(59) = 3.029; P = 0.004; male vs. mixed: t(59) = −4.886; P = 0.001; female vs. mixed: t(59) = −3.792; P < 0.001]. We found a slight order effect which was only present in dogs from mixed households encountering male ‘Strangers’: They growled with a significantly lower Pitch and Formants score in the second trial [t(59) = 2.418; P = 0.019]. Furthermore, those dogs from a mixed household that faced a male ‘Stranger’ had lower scores than dogs that encountered a female ‘Stranger’ [1st trial: t(59) = −2.686; P = 0.009; 2nd trial: t(59) = −4.659; P < 0.001], while dogs with a male owner had higher scores when facing a male ‘Stranger’ [1st trial: t(59) = 2.745; P = 0.008; 2nd trial: t(59) = 2.340; P = 0.023]. Dogs with a female owner emitted growls with significantly lower scores to male ‘Strangers’ only if it was their first encounter [t(59) = −2.696; P = 0.009].

Fig. 2 The interaction between the gender of the owner, the gender of the stranger and the order of the trials affecting the Pitch and Formants factor. The dogs from mixed households had the lowest factor score when encountering a male stranger

Within the LW–SW group, the post hoc comparison showed no effect of the size of the ‘Stranger’ or the gender of the owner, while dogs that encountered a male ‘Stranger’ showed different reactions based on their household composition (Fig. 3). Dogs with male owners had higher scores on the factor than mixed-household [large ‘Stranger’: t(59) = −5.462; P < 0.001; small ‘Stranger’: t(59) = −4.475; P < 0.001] or female-owned dogs [large ‘Stranger’: t(59) = 3.927; P < 0.001; small ‘Stranger’: t(59) = 3.927; P < 0.001]. However, the difference between the latter two was present only when the dog was facing a large male stranger [large ‘Stranger’: t(59) = −3.334; P = 0.001; small ‘Stranger’: t(59) = −1.611; P = 0.113]. Mixed-household dogs growled with lower scores when facing a male stranger than when facing a female [large ‘Stranger’: t(59) = −4.329; P < 0.001; small ‘Stranger’: t(59) = −3.016; P = 0.004], while male-owned dogs had growls with higher scores when facing a male ‘Stranger,’ regardless of the size of the ‘Stranger’ [large ‘Stranger’: t(59) = 3.016; P = 0.004; small ‘Stranger’: t(59) = 2.482; P = 0.016]. Dogs with a female owner showed such different growling behavior only when facing a small ‘Stranger’ [large ‘Stranger’: t(59) = −1.346; P = 0.183; small ‘Stranger’: t(59) = −2.685; P = 0.009]. We found no size effect within the household and ‘Stranger’ gender-based groups.

Fig. 3

The interaction between the gender of the owner, the gender and the size category of the stranger affecting the Pitch and Formants factor

Finally, we found an opposite order effect in the LM–SM group depending on the ‘Stranger’s’ size. Dogs encountering a large male first had growls with higher scores than those that met a large male the second time [t(59) = 4.109; P < 0.001], and those that met a small male first had lower scores than those that met the small male stranger the second time [t(59) = −2.855; P = 0.006]. Dogs that met a small male first had growls with lower scores than those that met a small female [t(59) = −2.020; P = 0.048], while those that encountered a large male in the second trial had lower scores than those that encountered a large female second [t(59) = −2.650; P = 0.01]. Moreover, dogs that faced a large male first emitted growls with higher Pitch and Formants scores than those that met a small male ‘Stranger’ first [t(59) = 3.713; P < 0.001], and meeting a large male the second time resulted in lower scores than meeting a small male in the second trial [t(59) = −3.609; P = 0.001].
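
As a plausibility check on post hoc statistics of this kind, a two-sided P value can be recomputed from a t statistic and its degrees of freedom. The sketch below is not the analysis code used in the study; it assumes the reported comparisons are two-sided t tests, and the function names (`t_pdf`, `two_sided_p`) are hypothetical. It integrates the Student's t density numerically with Simpson's rule, using only the Python standard library.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided P value for a t statistic (assumption: two-sided test).

    Integrates the density from 0 to |t| with Simpson's rule
    (steps must be even) and doubles the remaining tail mass.
    """
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_mass)


# Example: the comparison reported as t(59) = -2.020; P = 0.048
p_value = two_sided_p(-2.020, 59)
print(f"P = {p_value:.3f}")
```

Under these assumptions the recomputed value should agree with the reported P to the printed precision; a clear mismatch would point to a one-sided test, a multiplicity correction or a transcription error rather than to any property of the growls themselves.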

Dynamics

In the case of the Dynamics component, we found a significant effect [F(4,73) = 4.754; P = 0.002] with main effects of the gender of the owner [F(2,73) = 5.591; P = 0.006] and the gender of the ‘Stranger’ [F(1,73) = 8.626; P = 0.004]. The post hoc tests showed that dogs with a female owner growled more briefly and more variably than dogs in the other two categories [female vs. mixed: t(73) = 2.471; P = 0.031; female vs. male: t(73) = 2.642; P = 0.03; mixed vs. male: t(73) = −0.853; P = 0.397]. In addition, male ‘Strangers’ also evoked shorter and faster-changing growls (Fig. 4).

Fig. 4

The effect of the stranger’s gender on the Dynamics factor. Dogs facing a male stranger had shorter and faster-changing growls

Tonality

In the case of tonality, we found no significant effect [F(6,71) = 0.136; P = 0.991].

Mixed gender groups

Intensity

We found no significant effect on the ‘Intensity’ component [F(6,43) = 0.805; P = 0.571].

Pitch and formants

In the mixed gender group, we found a significant effect on the Pitch and Formants component [F(8,41) = 3.068; P = 0.008]. Again, the body mass of the dogs had a strong negative effect on this factor [F(1,41) = 10.644; P = 0.002], and the household composition and the ‘Stranger’s’ gender showed a significant interaction effect [F(2,41) = 4.553; P = 0.016]. Dogs from a mixed household tended to have lower scores on the Pitch and Formants factor than dogs in the other two categories when facing a male ‘Stranger’ [mixed vs. male: t(41) = −2.369; P = 0.054; mixed vs. female: t(41) = −2.455; P = 0.054; male vs. female: t(41) = 0.632; P = 0.531], and these dogs also emitted growls with lower scores at male ‘Strangers’ than at female ‘Strangers’ [t(41) = −2.307; P = 0.026], regardless of the ‘Stranger’s’ size category (Fig. 5).

Fig. 5

The interaction of the owner’s and the stranger’s gender affecting the Pitch and Formants factor. The same pattern emerges as in the same gender groups: the lowest scores are found in dogs from mixed households, although this effect is prominent only when the dogs encountered male strangers. Dogs from mixed households growled with lower fundamental frequency and formant dispersion at male strangers than at female ones

Dynamics

We found a significant effect on the Dynamics component [F(7,42) = 2.497; P = 0.031], with a significant positive main effect of body mass [F(1,42) = 4.656; P = 0.037] and an interaction of order and the gender of the ‘Stranger’ [F(1,42) = 4.649; P = 0.037]. The post hoc test showed a marginally significant difference in the case of female ‘Strangers’: dogs meeting a female ‘Stranger’ first produced shorter and more variable growls than dogs meeting the female ‘Stranger’ in the second trial [t(42) = −2.010; P = 0.051; Fig. 6]. Dogs that encountered a male ‘Stranger’ on the second occasion also produced shorter and faster-changing growls than dogs that met a female ‘Stranger’ second [t(42) = −2.659; P = 0.011].

Fig. 6

The interaction between the stranger’s gender and the order of the encounters affecting the Dynamics factor. Dogs facing a female stranger first produced shorter and more variable growls than dogs facing a female stranger the second time. The same pattern is found with a male stranger

Tonality

In this group, we found no significant effect on tonality [F(6,43) = 0.429; P = 0.856].

Discussion

In this study, where dogs were approached by a stranger (‘Stranger’) in a threatening manner, we investigated whether the vocal response of dogs was influenced by the gender and/or the body size of the ‘Stranger.’ Both in the ‘Same Gender’ and in the ‘Mixed Gender’ Groups analyses, the ‘Pitch–Formant’ component showed significant variation in response to ‘Strangers’ of different genders, depending on the owner’s gender. Male ‘Strangers’ elicited a lower Pitch–Formant response when the dog had a female owner or came from a multi-gender household. The ‘Pitch–Formant’ component describes the fundamental frequency and its properties as well as the formant dispersion of the growl. This suggests that female-owned or partly female-owned dogs growled at lower frequencies and the growls had narrower formant dispersions if the encountered ‘Stranger’ was a man.

Additionally, in the ‘Same Gender Groups’ analysis, the ‘Dynamics’ component was significantly lower in the case of male ‘Strangers’ and in general when the dog was owned by a woman. The ‘Dynamics’ component includes the length of the growl, as well as the latencies of the minimum and maximum fundamental frequency and intensity values. This result suggests that dogs growled in shorter bouts at male ‘Strangers,’ and (consequently) the extreme values in fundamental frequency and intensity were reached faster when the ‘Strangers’ were men.
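
To make the temporal variables behind the ‘Dynamics’ component concrete, the following minimal sketch derives growl length and the latencies of the fundamental frequency extremes from a sampled f0 contour. It is an illustration only, not the measurement procedure of the study: the function name `dynamics_measures`, the input format (sample times in seconds and f0 values in Hz) and the choice to measure latency from growl onset are all assumptions, and the contour values are invented.

```python
def dynamics_measures(times_s, f0_hz):
    """Return growl length and latencies of the f0 minimum and maximum.

    times_s: sample times in seconds, in ascending order (assumed format)
    f0_hz:   fundamental frequency at each sample time, in Hz
    Latencies are measured from the first sample (assumed growl onset).
    """
    if len(times_s) != len(f0_hz) or len(times_s) < 2:
        raise ValueError("need two equally long sequences with >= 2 samples")
    onset = times_s[0]
    i_min = min(range(len(f0_hz)), key=lambda i: f0_hz[i])
    i_max = max(range(len(f0_hz)), key=lambda i: f0_hz[i])
    return {
        "length_s": times_s[-1] - onset,
        "latency_f0_min_s": times_s[i_min] - onset,
        "latency_f0_max_s": times_s[i_max] - onset,
    }


# Hypothetical contour of a short growl, sampled every 0.1 s
measures = dynamics_measures([0.0, 0.1, 0.2, 0.3, 0.4],
                             [90.0, 110.0, 100.0, 80.0, 95.0])
print(measures)
```

Because absolute latencies are bounded by the length of the call, a shorter growl necessarily reaches its extremes sooner, which is the ‘consequently’ in the interpretation above. Latencies expressed as a fraction of call length would separate the two effects.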

In the ‘Mixed Gender Group analysis,’ according to the two-way interaction effect of the order of the trials and the gender of the ‘Stranger,’ we found that if the second ‘Stranger’ was male (in these cases the first ‘Stranger’ was a female), the growls had lower ‘Dynamics’ values than if the second ‘Stranger’ was female (in this case the first ‘Stranger’ was male). Also, the ‘Pitch–Formant’ component was lower, while the ‘Dynamics’ component had higher values in the case of larger dogs.

Interestingly, we did not find any significant effects in the case of the other two components, ‘Intensity’ and ‘Tonality,’ in either of our experimental groups. Most of the acoustical variables contained in these components (e.g., jitter, amplitude, intensity measures, harmonicity) are usually linked to the arousal state of the caller (Scheumann et al. 2012; Briefer 2012). Although the potential threat caused by the approaching human most probably affected the arousal state of the animals, the gradual approach of the threatening stranger also induced dynamic changes in the vocalizations (e.g., a crescendo effect), which may have masked the variation in other related acoustical parameters.

We also found, in both the ‘Same Gender Groups’ and ‘Mixed Gender Group’ analyses, that the ‘Pitch–Formant’ component was lower in the case of the growls of larger dogs. This is in line with observations about the relationship between the anatomical structure and vocalization of animals which suggest that the vocalizations of larger animals usually have lower fundamental frequencies (Taylor and Reby 2010) and also narrower formant dispersions (Fitch 1997; Fitch and Fritz 2006).

As some of our significant results were also subject to an order effect imposed by our experimental design, showing the interactions of multiple variables, conclusions should only be drawn cautiously, avoiding over-interpretation. However, by analyzing the details rigorously, based on the post hoc test, we can still derive sound conclusions. Considering Morton’s ‘motivational–structural’ rules, our results for the ‘Pitch–Formant’ component suggest that as regards their affective states, women-owned or mixed-household dogs facing male ‘Strangers’ are more likely to be on the highly aroused, aggressive end of this motivational continuum. Schassburger (1993) studying the vocal repertoire of wolves distinguished the ‘Moan’ vocalization that represents a transient form between growls and whines and he suggested that it is linked to an ambivalent inner state. If we assume that growls represent a low-pitched and harsh endpoint of a continuum in a graded vocalization with the high-pitched and tonal whines on the other end, and the moans in between, it is possible that the gradual change in the fundamental frequency will reflect the transition from aggression to fear. Based on our findings, it may be that the acoustical difference between growls in response to male and female ‘Strangers’ reflects dogs’ altered inner state because of the different experience they have had with human genders (i.e., in different households).

The ‘Pitch–Formant’ component also contains the parameter ‘formant dispersion’ (‘dF’). Our results suggest that the dF of growls was lower if the ‘Stranger’ was a man than if it was a woman. Formant dispersion has been previously described to be related to overall body size in many species (for review, see Taylor and Reby 2010) and thus has the potential to serve as a vocal indexical cue. Specifically, more closely spaced formants (lower formant dispersion) are related to larger body size. Notably, it has been found in many instances that this size-related cue can be modified to some extent by anatomical and/or behavioral adaptations (Fitch and Reby 2001; red deer: Reby and McComb 2003; fallow deer: McElligott et al. 2006; Mongolian gazelles: Frey et al. 2008; and saiga antelopes: Volodin et al. 2009). In dogs it has been shown that contextually different growl types of the same dog differ in their dF (Faragó et al. 2010b) and that the conveyed size-related cue can be perceived by conspecifics (Faragó et al. 2010a; Bálint et al. 2013), where ‘food-guarding’ growls seemed to depict the adequate body size of the caller and playful growls appeared to express a larger stature. According to our results, dogs living with female owners or with both genders seem to communicate a larger body size in their growls when approached by a male ‘Stranger’ than by a female ‘Stranger,’ suggesting that this indexical cue may show alterations in dog vocalizations according to the caller’s different affective states.
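
The link between formant spacing and apparent size can be illustrated with a short calculation. The sketch below computes dF as the average spacing between neighboring formants, following the definition given in the Introduction, and converts it to an apparent vocal tract length under the simplifying assumption that the vocal tract behaves as a uniform tube closed at one end, for which dF = c / (2 × VTL). The function names, the speed of sound of 350 m/s and the formant values are illustrative assumptions, not measurements or procedures from this study.

```python
def formant_dispersion(formants_hz):
    """Average spacing (Hz) between neighboring formant frequencies."""
    if len(formants_hz) < 2:
        raise ValueError("at least two formants are needed")
    ordered = sorted(formants_hz)
    spacings = [hi - lo for lo, hi in zip(ordered, ordered[1:])]
    return sum(spacings) / len(spacings)


def apparent_vocal_tract_length(df_hz, speed_of_sound_m_s=350.0):
    """Apparent vocal tract length (m), assuming a uniform tube closed
    at one end, so that dF = c / (2 * VTL)."""
    return speed_of_sound_m_s / (2.0 * df_hz)


# Hypothetical formant values (Hz) for two growls of the same dog
growl_a = [500.0, 1500.0, 2500.0, 3500.0]   # dF = 1000 Hz
growl_b = [450.0, 1350.0, 2250.0, 3150.0]   # dF = 900 Hz

for name, formants in (("a", growl_a), ("b", growl_b)):
    d_f = formant_dispersion(formants)
    vtl_cm = 100.0 * apparent_vocal_tract_length(d_f)
    print(f"growl {name}: dF = {d_f:.0f} Hz, apparent VTL = {vtl_cm:.1f} cm")
```

In this idealized model a lower dF corresponds to a longer apparent vocal tract, which is why narrower formant spacing can be read by a listener as a cue to larger body size even when the caller's anatomy has not changed.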

Importantly, the gender of the owner significantly influenced the dogs’ reaction to the different threatening strangers. While dogs growled with lower Pitch–Formant values at approaching men if they had women owners or they were coming from mixed-gender households, those dogs that were owned by men emitted growls with higher Pitch–Formant values. Ratcliffe et al. (2014) did not report the exact gender composition of the dogs’ household, but they found that if dogs had more than two adults at home, they performed successfully in a cross-modal (gender-specific voice to correct gender person) matching task. The authors concluded that when dogs have experience with multiple representatives of human genders, they can assess the genders more easily (Ratcliffe et al. 2014). Our results add new details to the gender-specific responses of dogs to humans—in an agonistic context. Whether the above-mentioned effect of the different owner background of the subjects was caused mainly by habituation to specific genders, or was the consequence of specifically learned strategies of the dogs in relation to particular genders as opponents, would require further research.

It was also found that dog growls had smaller ‘Dynamics’ values if the ‘Stranger’ was a man in the ‘Same Gender Groups analysis’ and that they were also smaller if the second ‘Stranger’ was a man in the ‘Mixed Gender Group analysis.’ The latter result shows that dogs, after first facing a woman ‘Stranger,’ produced growls with lower ‘Dynamics’ values than those that first met the male ‘Stranger’ and after that the female. According to the variables included in the ‘Dynamics’ component, this indicates that dogs uttered shorter growls at male ‘Strangers’ and also that the extreme values of the fundamental frequency and intensity of the sound appeared earlier in time. Call duration and other temporal changes in animal vocalizations have been shown to be associated with different arousal and affective states in a number of species (e.g., hyena: Theis et al. 2007, meerkats: Manser 2001, and orangutans: Spillmann et al. 2010, Altenmüller et al. 2013). Thus, the shorter growl bouts of dogs in response to male ‘Strangers’ might reflect a more motivated, higher arousal state when encountering male ‘Strangers.’ Interestingly, recent results have shown that human listeners rate longer and more tonal dog vocalizations as less intense (Faragó et al. 2014), although this was true over different types of vocalizations. This suggests that the longer growls in response to female ‘Strangers’ might also be considered as less aroused/aggressive by humans, who were the intended receivers of the vocal signal during the experimental episodes.

An interesting additional finding was that the growls of larger dogs had higher ‘Dynamics’ values, indicating that the calls were longer in these cases. Supposing that shorter calls indicate more aroused motivational states (e.g., Manser 2001), this suggests that the larger dogs’ reaction was less aroused, e.g., less fearful of the approaching human. This might indicate that the inner/motivational state of dogs in an agonistic situation is related to their body size, suggesting a connection between fighting potential and affective states in agonistic/defensive reactions.

It is worth noting that the size of the threatening stranger had no direct (main) effect on the acoustical variables in either of our experimental groups, only as part of two- or three-way interactions. Although it is widely accepted that overall body size is an essential factor in determining the outcome of agonistic encounters, there are also exceptions to this rule in a number of species (e.g., fallow deer: Jennings et al. 2004; collared lizard: Lappin and Husak 2005; and humans: Sell et al. 2009). Sell et al. (2009) showed that when humans had to gauge the physical power of different subjects, the extracted cues were largely independent of general body size, but corresponded to objective measures of a certain body region, namely upper body strength. It might be that when faced with a potential human opponent, dogs do not use overall body size as a pertinent attribute in their assessment, but rather base their decision on more specific details like gender, body regions or distinctive features. Additionally, it could also be that the categorization of body size used in our experiment (‘Frontally visible body surface’) did not represent adequately, or even masked, those bodily attributes that dogs use in assessing the physical characteristics of human (and other) opponents.

A limitation of our design is the omission of subjects that did not emit growls in both trials, or remained silent during both occasions. Fear and aggression can have other vocal indicators apart from growls (e.g., barks; Pongrácz et al. 2005), and some dogs simply do not vocalize in agonistic encounters (Vas et al. 2005). However, as our primary goal was to detect the dynamic changes in the acoustic parameters of one particular vocalization type, growls, we had to limit our sample to those dogs that provided these vocalizations during both encounters.

Taken together, our results indicate that (depending on their prior ownership experience) dogs are in a higher arousal-motivational state when encountering male ‘Strangers,’ mirrored in the variation of a number of different acoustical variables, such as lower fundamental frequency, narrower formant dispersion or the shorter growl lengths uttered in response to the threatening approach. Aggression induced by fear is a known behavioral phenomenon (Blackshaw 1991), and much experimental evidence suggests that fear triggers and potentiates aggressive responses in such agonistic situations as the threatening approach (Guy et al. 2001; Pageat 2004; O’Sullivan et al. 2008; Klausz et al. 2014; Landsberg et al. 2013). We suggest that the higher experienced threat caused by men might evoke a more aggressive response from the dogs that can be detected in their vocal signals, for example, by expressing a larger body size or emitting lower-frequency vocalizations, in line with the findings of Taylor et al. (2010) in the case of human listeners.