Broadly defined, social norms reflect a community’s expectations of how one should behave in a given context, and thus the interest of the society (de Waal 1991; Flack et al. 2004; Hechter and Opp 2001; Horne 2001). In humans, social norms form an integral part of social life (Jasso 2001; Sober and Wilson 1998). They are not only the key component of human moral behavior, they also structure our daily routines so thoroughly that we often fail to recognize their omnipresence and the automatic nature of our adherence to them (Young 2002). The fundamental function of these norms consists of guiding the behavior of individuals in situations where the interests of the society collide with the interests of a particular individual (Hechter and Opp 2001; Rudolf von Rohr et al. 2011).

If a community’s members adhere to social norms, they produce observable behavioral regularities. Since behavioral regularities can emerge through a variety of mechanisms, it is all but impossible to infer the presence of social norms by behavioral observation in naturalistic contexts (Rudolf von Rohr et al. 2011). In humans, this difficulty can arguably be overcome by asking people what they consider appropriate or inappropriate behavior in their society (but see Haidt 2007), but in nonlinguistic individuals (young children, other species) this is obviously impossible. The best approach to identifying social norms in nonlinguistic individuals is to focus on events in which norm violations occur. In humans, norm violations provoke strong emotional reactions (Haidt 2007), which may also lead to corrective interventions or altruistic punishment. Obviously, whether the emotional reactions actually do so depends on additional situational factors that determine the costs of such interventions.

Importantly, the reactions to putative norm violations shown by individuals who are directly affected by the deviant behavior can readily be explained without recourse to the existence of social norms (Fehr and Gächter 2002) because those reactions most likely reflect damage to individual interests. In contrast, uninvolved bystanders’ (i.e., third parties) reactions to norm violations (Fehr and Fischbacher 2004) pertain to the realm of moral behavior, for they provide no immediate benefits to the performers. Indeed, they may produce costs in terms of emotional unease and risk of provoking retaliation (Horne 2001). In sum, in order to demonstrate that behavioral regularities in nonlinguistic individuals are the result of an underlying social norm, one must crucially demonstrate that uninvolved bystanders show reactions to events that represent a violation of this putative norm.

The reactions of personally uninvolved bystanders to putative norm violations are compatible with the presence of quasi social norms, proto social norms, or collective social norms (as defined by Rudolf von Rohr et al. 2011). Hence, additional tests are necessary to distinguish between these possibilities. In quasi social norms, bystander reactions are automatically triggered by specific cues or striking features, such as screams by the victims. Although bystander reactions that are simply triggered by salient cues may appear moral to the outside observer, they are not. Bystander reactions based on quasi social norms may simply express emotional contagion rather than a more elaborate empathetic competence (Preston and de Waal 2002), or they may be the result of simple rules, such as “when an infant screams in a particular way, then attack (if hierarchically possible) the individual that is closest to it.” Note that these reactions are only moral from a functional perspective; hence the term quasi social norm.

In proto social norms, bystander reactions cannot be explained by simple stimulus–response mechanisms or emotional contagion; instead, they arise as a response to the specific context, such as “an individual harms an infant.” Therefore, the subject responds to the content of the norm violation per se, and thus the violation of its social expectations. The reactions can be characterized by high arousal and goal-directed actions that correspond to the nature of the norm violation. Striking features, such as infant screams or waa barks (see below), may still play a role in attracting a bystander’s attention (orienting reaction), but they are not direct releasers for the behavioral reactions of the bystanders. Rather, these reactions result from the evaluation of the entire situation and ensue if this situation is perceived as a norm violation. Collective social norms, finally, are most likely uniquely human, for they also include an awareness that the community shares the same social expectations as ego, and thus an internalized preference for an impartial rule.

To date, anecdotal and observational data support the possibility of social norms in chimpanzees (Goodall 1983, 1986), but systematic evidence is still rather scarce (de Waal 1991; Flack and de Waal 2002; Flack et al. 2004). For example, chimpanzees (and other species of animals as well) may react strongly to certain incidents, specifically toward harmful behavior, in their midst. Captive female chimpanzees have been reported to mediate between former opponents by facilitating grooming and reconciliation between them (de Waal 1982; de Waal and van Roosmalen 1979). Furthermore, chimpanzee bystanders have been reported to comment on dramatically escalating aggression by uttering waa barks. These loud and sharp vocalizations have been interpreted as protests against the violent incident (de Waal 1996; Goodall 1986; Killen and de Waal 2000). Such examples of pacifying behaviors and protest vocalizations are likely to go psychologically beyond pure egoism and reflect at least some social expectations about how others should behave. Yet, what exactly underlies the above-mentioned examples may be diverse and difficult to disentangle. For example, what appears to be arbitration may simply reflect annoyance at the noisy disturbance of a conflict and action to put a stop to it (Goodall 1986).

More systematic evidence is available for policing behavior—impartial interventions by third parties in ongoing conflicts (Boehm 1994; de Waal 1982, 1984; de Waal and Hoekstra 1980; de Waal and van Hooff 1981; Flack et al. 2005, 2006; Goodall 1986). A recent analysis of 5,500 conflicts that included 94 events of impartial third-party interventions revealed that arbitrators are most often high-ranking males and females (Rudolf von Rohr et al. 2012) and their behavior is most consistent with the hypothesis that it reflects a concern about the conflicts of others, or “community concern” (de Waal 1996). Of course, more selfish benefits, and thus selfish motivations, can again not be entirely ruled out since the arbitrators, even though not directly involved in the conflict, are always part of the same social group, and group stability may be an important individual interest. Nonetheless, in both humans and chimpanzees, it is unlikely that the individuals showing community concern at the proximate level mentally represent its ultimate goal—group stability and the individual benefits this entails.

Perhaps most suggestive of social norms in chimpanzees are the behaviors reported in response to infanticidal acts, which may represent violations of the putative norm not to harm infants. Chimpanzee infants universally enjoy high levels of tolerance (reviewed in Rudolf von Rohr et al. 2011), but on rare occasions they become the victims of severe aggression in the form of inter- as well as intra-community infanticide. Whereas infanticide between communities can be understood in the context of the high territoriality that includes coalitional killing of all catchable neighbors (Wrangham 1999), intra-community infanticide is more puzzling. It has been reported for non-kin of both sexes (Townsend et al. 2007; van Schaik 2000) and is not the result of a general aggressiveness, but presumably of particular individual interests, which is reflected by the fact that chimpanzee infanticidal behavior is infrequent (Murray et al. 2007) and highly selective (Hamai et al. 1992). Anecdotal reports show that such incidents provoke strong reactions in both female and male group members. These reactions include high arousal and persistent screaming, vocal protests in the form of waa barks, and even risky, highly goal-directed behaviors such as attempts to intervene and coalitions in defense of the mother-infant pair (Goodall 1977; Hamai et al. 1992; Murray et al. 2007; Sakamaki et al. 2001; Townsend et al. 2007).

But how can the juxtaposition of within-group infanticide and a social expectation of not harming infants be explained? Most likely in the same way as in humans: Social norms (in the broad sense) and their underlying social expectations reflect the interest of society (e.g., a predictable social environment), which may collide with the interest of particular individuals (e.g., increased reproductive success). Indeed, it is exactly under such conditions that the presence of social norms becomes necessary (Rudolf von Rohr et al. 2011). Whereas the apparent goal-directedness of the group’s behavior is suggestive of proto social norms, more selfish interests can again not be excluded.

The aim of our study was to systematically assess the reaction to the violation of a putative social norm by truly uninvolved bystanders in captive chimpanzees, examine the possible existence of social expectations, and hence identify candidates of evolutionary precursors of social norms. Based on the reports summarized above, we hypothesized the existence of a putative norm in chimpanzees not to harm infants (Rudolf von Rohr et al. 2011). We made sure to test truly uninvolved bystanders by presenting video clips of completely unfamiliar conspecifics that engaged in multiple instances of four categories of behavior: nut cracking; severe aggression against infants, including infanticide; hunting of a colobus monkey of similar size to an infant, including its killing; and severe aggression among adult chimpanzees. Captive chimpanzees demonstrably understand information presented as video clips (Parr 2001; Parr and Hopkins 2000; Poss and Rochat 2003). Specifically, we exposed them to multiple instances of behavioral sequences of each behavioral category and recorded their reaction in terms of looking times and behavioral indicators of negative emotional arousal and threat behavior directed at the television screen. The infanticide condition represented a violation of the putative social expectation of not harming infants, whereas the control videos and targeted analyses were used to exclude the possibility that the reactions were caused by stimuli other than the norm violation, such as the presence of unfamiliar conspecifics, general excitement and arousal in adult chimpanzees, severe aggression directed not at an infant but at a monkey of similar size or at adult chimpanzees, and the presence of screaming infants.

If chimpanzees have a social expectation that infants must not be harmed, and thus have proto social norms, the infanticide video represents a violation of this putative social norm for them. In this case, two predictions can be made. First, subjects should look longer at video clips depicting severe aggression against infants than at the control videos because nonhuman primates, like human infants (Hamlin et al. 2007; Kuhlmeier et al. 2003; Wang et al. 2004), look longer at unexpected events. In the past, looking times have been successfully used to investigate nonhuman primates’ physical (Cacchione and Burkart 2012; Cacchione and Krist 2004; Santos and Hauser 2002) and social (Bergman et al. 2003; Cheney and Seyfarth 1999) expectations. Our second prediction was that the subjects would show higher levels of negative emotional arousal during these video clips than during the control videos.

Methods

Subjects and Housing

Data were collected among two captive social groups of chimpanzees (Pan troglodytes) housed in the zoological gardens in Gossau (n = 14) and Basel (n = 10), both Switzerland. Only adult and subadult individuals were included in the study (n = 17, Table 1), and one female in Gossau had to be excluded because she was not regularly present during the experiments.

Table 1 Overview of study animals. Individuals are ranked according to group and sex

All adult females had been a mother at least once. The subadult females had ample experience with handling infants in their respective group. Furthermore, at both sites there had been at least one incident in which rejected newborns were killed, and thus all subjects were familiar with lethal aggression against infants. In all these cases, the perpetrators were female.

The chimpanzees in Gossau had access to a 900-m² outdoor facility and a 300-m² indoor facility. In Basel, the chimpanzees had access to a 50-m² outdoor facility and a 200-m² indoor facility. The indoor facilities at both sites were split into at least two compartments, separated by walls with large passageways. All subjects were tested in a suitable (and for the animals most comfortable) compartment of their indoor facility. All indoor facilities contained ample three-dimensional climbing structures, nets, ropes, and artificial termite mounds, and were regularly supplied with enrichment items. The chimpanzees at both sites were fed several times a day on a mixture of fruit, vegetables, and seeds, and had ad libitum access to water and also received tea or juice. Subjects were neither food- nor water-deprived, did not receive any rewards during or after the experiments, and had never been exposed to the video clips prior to the experiment.

The study complied with all regulations regarding the ethical treatment of animals and was formally approved by the veterinary offices of St. Gallen and Basel, both Switzerland. Both zoos belong to the European Association of Zoos and Aquaria (EAZA) and therefore complied with its welfare requirements. The experiments presented in this paper did not induce any aggressive behavior in the subjects toward other group members.

General Procedure

The experiment consisted of a habituation phase and a test phase. In the habituation phase, we showed the whole group a neutral video clip in which unfamiliar conspecifics performed socially neutral behaviors, such as nut cracking and walking around (Neutral condition). The clip was presented six times, once a day on three consecutive days in two consecutive weeks. During this phase, the animals were habituated to watching video clips and seeing unfamiliar chimpanzees.

The experimental phase followed right after the habituation phase and lasted for 6 additional weeks. During three consecutive days in each week, we presented the animals with one video clip per day showing unfamiliar conspecifics (a) performing severe aggression against a chimpanzee infant, including infanticide (Infanticide condition); (b) hunting, including the killing of a small colobus monkey (Hunt condition); or (c) involved in social aggression among adults in various contexts (Aggression condition). Each clip was composed of three events from the respective category, and the clips were presented in a counterbalanced order.

The video clips were always presented in the morning after feeding time, using a Philips DVD player and a 31.5-inch (80 cm) Sony LCD color television screen. Both devices were placed in the animal keeper area in front of the enclosure. To obtain better ecological validity and to avoid separation effects, the video clips were presented to the whole group of chimpanzees. In Gossau, subjects were tested in one large compartment, which permitted all subjects to have visual and auditory access to the video. In Basel, subjects had access to two compartments, a large one and a smaller, adjacent one. From the large compartment, all animals had visual and auditory access to the video clip, but from the smaller one, they only had auditory access.

All experiments were videotaped with three Sony HDV video cameras from three different perspectives. One video camera was placed right next to the television screen and the other two were placed in the visitor area, to capture what happened in the remaining part of the test compartment. No zoo visitors were present during the experiments, and the animal keepers followed their daily routine.

Video Standardization

The stimulus material was taken from research films as well as documentary footage and depicted unfamiliar chimpanzees, largely from the wild. For the video clips used as stimuli, we selected film footage that showed the most representative forms of the behavior of interest, had the least accompanying background noise, and had close-ups of individuals. For each of the four conditions (Neutral, Infanticide, Hunt, and Aggression), the stimulus consisted of three different instances of the respective behavioral category. By presenting multiple instances of the behavioral category in each condition, we made sure that the animals would not respond to specific perceptual features of a single video clip, but instead to the general stimulus significance.

Video stimuli were standardized as follows (Fig. 1). Each video clip started with an intro consisting of a short sequence showing the Teletubbies, followed by a test frame accompanied by a buzzer, for a total of 15 s. This intro functioned as an attention getter for the subjects. After the intro came three sequences of 20 s each. These sequences were separated by breaks (black screens) of 5 s. The third sequence was followed by a break of 10 s. After this, the whole unit of the three sequences (without the intro sequence) was repeated five times in order to ensure that subjects comprehended the content of the sequences and to provide subordinate subjects with enough time to approach the television screen. The entire video stimulus per test session thus lasted for 8 min. Audio was available and the volume of the video clips equalized and kept constant in all experiments. All film footage was edited using Apple Final Cut software.
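The timing described above can be checked with a short calculation. The following is a minimal sketch that uses only the durations stated in the text; it assumes that "repeated five times" means one initial presentation plus five repeats (six presentations in total), and the constant and function names are illustrative only.

```python
# Durations in seconds, as stated in the text.
INTRO_S = 15         # attention getter (Teletubbies clip plus test frame with buzzer)
SEQUENCE_S = 20      # one behavioral sequence
SHORT_BREAK_S = 5    # black screen between sequences
LONG_BREAK_S = 10    # black screen after the third sequence
N_SEQUENCES = 3

# Assumption: one initial showing plus five repeats of the three-sequence unit.
N_PRESENTATIONS = 6


def unit_duration() -> int:
    """Length of one unit: three sequences, two short breaks, one long break."""
    sequences = N_SEQUENCES * SEQUENCE_S
    short_breaks = (N_SEQUENCES - 1) * SHORT_BREAK_S
    return sequences + short_breaks + LONG_BREAK_S


def stimulus_duration(include_intro: bool = False) -> int:
    """Total length of the stimulus, with or without the intro."""
    total = N_PRESENTATIONS * unit_duration()
    return total + INTRO_S if include_intro else total


print(unit_duration())                  # 80
print(stimulus_duration() / 60)         # 8.0
print(stimulus_duration(True) / 60)     # 8.25
```

Under this reading, the six presentations alone add up to the reported 8 min, and the intro adds a further 15 s.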

Fig. 1 Schematic design of the video clips

Data Collection and Analysis

The collected video material was analyzed with INTERACT software (Version 8.0.2). Each subject was coded individually. The behaviors coded included looking time durations and behavioral indicators of negative emotional arousal, including events of scratching, yawning, and unrest (walking around) (Aureli and van Schaik 1991; Baker and Aureli 1997; Das et al. 1998; Troisi 2002). We also coded threat behaviors directed at the television screen, including arm-raising, stamping, slapping the ground, swaggering, and piloerection (Goodall 1986; Nishida et al. 1999). Arousal was calculated as the sum of events of scratching, yawning, and unrest per minute; likewise, threat behavior was calculated as the sum of the corresponding behaviors per minute. Looking time (time spent watching the video) was defined as the percentage of time the subjects spent watching the video clips in front of the television screen up to a distance of 5 m. The time during which subjects looked at the intro was excluded from the analysis. To assess inter-rater reliability, 59% of the data from the test phase (180 of 306 individual/condition combinations) was coded by an additional rater who was blind with regard to the experimental condition. The correspondence between the first and second raters was high (Hunt: r = 0.992, n = 60, P < 0.001; Aggression: r = 0.998, n = 60, P < 0.001; Infanticide: r = 0.990, n = 60, P < 0.001).
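The three dependent measures defined above can be written out as a small sketch. The record layout, field names, and example values below are hypothetical and are not taken from the INTERACT coding scheme; only the definitions (events per minute, percentage of time looking) follow the text.

```python
from dataclasses import dataclass


@dataclass
class SessionCoding:
    """Hypothetical coding record for one subject in one test session."""
    observed_s: float   # analysed observation time in seconds (intro excluded)
    looking_s: float    # seconds spent watching the screen from within 5 m
    scratching: int     # number of scratching events
    yawning: int        # number of yawning events
    unrest: int         # number of unrest (walking around) events
    threats: int        # number of threat behaviors directed at the screen


def per_minute(events: int, observed_s: float) -> float:
    """Convert an event count into a rate per minute of observation."""
    if observed_s <= 0:
        raise ValueError("observation time must be positive")
    return events / (observed_s / 60.0)


def arousal_rate(c: SessionCoding) -> float:
    """Sum of scratching, yawning, and unrest events per minute."""
    return per_minute(c.scratching + c.yawning + c.unrest, c.observed_s)


def threat_rate(c: SessionCoding) -> float:
    """Threat behaviors per minute."""
    return per_minute(c.threats, c.observed_s)


def looking_percentage(c: SessionCoding) -> float:
    """Percentage of the analysed time spent looking at the video."""
    return 100.0 * c.looking_s / c.observed_s


# Made-up example values, for illustration only.
example = SessionCoding(observed_s=480, looking_s=120,
                        scratching=3, yawning=1, unrest=4, threats=2)
print(arousal_rate(example), threat_rate(example), looking_percentage(example))
```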

Because of the spatial arrangement of the compartments in Basel, individuals had no visual access to the video screen when they were in the smaller compartment. Furthermore, when individuals were watching from a very specific location in the larger compartment it was not possible to unambiguously identify their gaze direction. For the main analysis, we therefore excluded from the total observation time the durations when individuals were in the smaller compartment or in this specific location and calculated how often the relevant behaviors occurred in the remaining observation time. These exclusions in Basel averaged around 10.48 ± 2.05% (Mean ± SE). Alternatively, one might argue that individuals in the smaller compartment were there because they were uninterested in the video (note that the vocal stimuli from the videos were audible in the entire enclosure) and that these 10.5% should be included in the main analysis and coded as “not looking.” To control for this possibility, we ran the main analyses twice, once including and once excluding these data (see below).
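The difference between the two treatments of the excluded periods comes down to the denominator of the looking-time percentage. The sketch below illustrates this with made-up numbers; the function names are hypothetical, and it assumes that no looking is coded during the excluded periods.

```python
def looking_pct_excluding(looking_s: float, total_s: float, excluded_s: float) -> float:
    """Excluded periods are removed from the observation time (main analysis)."""
    remaining_s = total_s - excluded_s
    if remaining_s <= 0:
        raise ValueError("no observation time left after exclusion")
    return 100.0 * looking_s / remaining_s


def looking_pct_including(looking_s: float, total_s: float) -> float:
    """Excluded periods stay in the observation time, coded as 'not looking'."""
    if total_s <= 0:
        raise ValueError("observation time must be positive")
    return 100.0 * looking_s / total_s


# Made-up example: 480 s session, 48 s (10%) without visual access, 108 s looking.
print(looking_pct_excluding(108, 480, 48))   # 25.0
print(looking_pct_including(108, 480))       # 22.5
```

Including the excluded periods can only lower a subject's looking percentage, so the comparison shows how sensitive the results are to that coding decision.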

In Gossau three females had recently joined the group, resulting in a period of general social upheaval (Rudolf von Rohr et al. 2012). The Gossau chimpanzees, in particular the adult males, were highly sensitive to disturbances occurring from the outside, such as maintenance activities, and/or from within the group (i.e., conflicts). The males then typically responded with intense bluffing behavior that affected the entire group. Such episodes during testing may have affected the subjects’ looking and emotional distress behavior. In order to control for this effect, the time in which these disturbances occurred was excluded from the main analysis. In Gossau, such disturbances averaged around 12.12 ± 3.69% (Mean ± SE). They never occurred in Basel.

To quantify the effect of these exclusions, we compared the looking times both when including and when excluding the periods in which animals had either no visual access to the screen or gaze directions were too ambiguous (Basel) or disturbances had occurred (Gossau). Both measures were strongly correlated (data for all six sessions, n = 17: Hunt: r = 0.973, P < 0.001; Aggression: r = 0.967, P < 0.001; Infanticide: r = 0.992, P < 0.001). Most importantly, the main pattern of results reported below (longer looking times in Infanticide compared with Hunt and Aggression conditions) is significant regardless of whether these periods are included or excluded. In the main analyses, we therefore used the more conservative data set in which these periods were excluded. All statistical analyses were performed in SPSS 20.

Results

Looking Times

A repeated-measures ANOVA on the looking times, including the within-subject factor test condition and the between-subject factors group and sex, revealed a significant main effect of test condition (F 3,11 = 6.5, P = 0.008) but no effect of the other factors (group: F 1,13 = 0.055, P = 0.818; sex: F 1,13 = 0.8, P = 0.39) or of the two- and three-way interactions between any of the factors (F always < 1.3, P always > 0.27). Pair-wise post-hoc comparisons based on paired t tests revealed that the main effect of condition reflected that the chimpanzees looked longer at the infanticide than at the other three conditions (Fig. 2a). The pattern was similar during the first exposure to the stimuli in the first week (Fig. 2b) but much stronger than over all six test sessions, with looking time for infanticide roughly four times higher than for the other treatments and the control.

Fig. 2 Looking times in the four conditions (Mean ± SE). a = entire test phase, b = first exposure to the stimuli. The chimpanzees spent more time looking at severe aggression toward infants (Infanticide) compared to severe aggression against a non-conspecific monkey (Hunt), severe aggression against adult chimpanzees (Aggression), or conspecifics engaging in neutral behaviors (Neutral)

Excluding Alternative Explanations

The chimpanzees’ looking behavior provides clear evidence that they looked longer at the infanticide scenes. However, to make sure that this looking pattern was driven by the perception of the violation of the putative norm not to harm infants per se, several alternative explanations have to be ruled out.

First, it could be argued that the looking time pattern resulted from a preference for novelty, or that it is more difficult to interpret events involving infanticide. However, novelty effects cannot account for increased looking times in the Infanticide condition because lethal aggression against infants had previously occurred at least once in both groups and hence did not represent a novel event for the subjects. In contrast, hunting behavior, including killing of a monkey, was definitely new to them, but did not elicit looking times beyond control levels. Thus, novelty cannot account for the looking pattern.

Second, the result may have been driven by females with dependent offspring. These mothers may have been concerned about the safety of their own infants, rather than responding to a perceived norm violation. We therefore compared the looking times in the Infanticide condition between females with dependent offspring and other females and also with all other group members, but found no differences (t 8 = −0.616, P = 0.56; and t 15 = 0.409, P = 0.69, respectively). However, even though not all females had dependent offspring, all females either had been a mother at least once or had ample experience with infants. We therefore also compared all females with all males, but again found no difference in looking behavior during the Infanticide condition (t 15 = 1.14, P = 0.27). Taken together, these results do not support the second alternative hypothesis, that mothers’ concern for their own dependent infants drove the looking pattern.

A third alternative for the finding that the chimpanzees paid more attention to the infanticide videos than to the other videos is that the subjects may have looked longer at the Infanticide condition because it contained striking features other than the violation of the social expectation (i.e., harming an infant) per se. Excluding this possibility is particularly important in order to determine whether the behavior was guided by quasi or proto social norms. Such striking features may include infants (representing attractive objects), unfamiliar males (representing potential enemies), hectic movement (running around, representing high arousal), and screaming, waa barks, and infant screaming (all representing salient vocalizations).

In order to examine these alternative possibilities, we first quantified the presence of the respective striking features in the stimulus video clips. Based on the rates of the various striking features occurring in the Infanticide condition, we calculated the expected looking time for each control condition under the assumption that the looking times in the Infanticide condition were driven entirely by the respective striking feature. For example, infants are present in the Neutral condition almost as often as they are in the Infanticide condition. We would therefore expect to find similar looking time durations in the two conditions if the presence of infants alone was driving the subjects’ looking behavior. However, a comparison of the expected looking time calculated for the baseline condition (Mean ± SE = 18.62 ± 3.82) against the observed looking time for this condition (Mean ± SE = 5.82 ± 1.41) revealed a significant difference (Sign test: N = 17, z = −2.91, exact P = 0.002), indicating that subjects did not look longer at Infanticide owing to the presence of infants in this video clip.
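The logic of this comparison can be illustrated with a short sketch. The text does not give the exact formula used, so the version below assumes a simple proportional scaling: if a feature alone drove looking, the expected looking time in a control condition would be the observed Infanticide looking time multiplied by the ratio of the feature's rate in the control clip to its rate in the Infanticide clip. The function name and all numbers are hypothetical.

```python
def expected_looking_time(infanticide_looking: float,
                          feature_rate_infanticide: float,
                          feature_rate_control: float) -> float:
    """Expected control-condition looking time if one feature alone drove looking.

    Assumption (not stated in the text): looking time scales linearly with
    the rate at which the feature occurs in a clip.
    """
    if feature_rate_infanticide <= 0:
        raise ValueError("feature must occur in the Infanticide clip")
    return infanticide_looking * feature_rate_control / feature_rate_infanticide


# Made-up values: the feature is 90% as frequent in the control clip.
expected = expected_looking_time(infanticide_looking=20.0,
                                 feature_rate_infanticide=10.0,
                                 feature_rate_control=9.0)
print(expected)  # 18.0
```

An observed control value far below such an expectation, as reported for the presence of infants, speaks against that feature as the sole driver of looking behavior.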

Similar results were found for the presence of unfamiliar males and screaming, which are also almost equally present in the video clips Hunt and Aggression as they are in Infanticide (Table 2). The presence of hectic movement also could not account for the looking time pattern we found, since this feature is actually twice as common in the Hunt and Aggression conditions as it is in the Infanticide condition (see Table 2 for exact P values).

Table 2 Possible alternative explanations for the subjects’ looking behavior

Both waa barks and infant screaming were only present in the Infanticide condition. However, the three sequences in the Infanticide video did not all contain these striking features, and when they were present, their rates varied. For instance, waa barks were present in the first and third sequence, but the chimpanzees nevertheless did not look longer at the first and third sequence. In fact, the looking times did not differ significantly between the sequences (Friedman’s test: N = 17, χ² = 3.836, df = 2, exact P = 0.15). The analysis of infant screams led to the same conclusion (Table 2). Thus, the presence of waa barks or infant screams was not responsible for the variation in looking times.

To capture even more fine-grained differences in looking times, we analyzed the looking behavior of the chimpanzees with a 1-s resolution. We determined how the striking features were distributed across the three sequences of the Infanticide clip, and whether the timing of the attention paid by the chimpanzees was contingent on the presence of waa barks and infant screams. We then compared the percentage of time subjects looked at the screen in the Infanticide condition when screams or waa barks were present vs. absent. This analysis was performed for the first test session, and for Gossau only. The looking times did not significantly differ (paired-sample t test, infant screams: t 9 = 2.14, p = 0.061; waa barks: t 9 = −0.184, p = 0.858; Fig. 3). However, because the infant screams may have shown a trend, we also added 5 s after each infant scream or waa bark to detect potential aftereffects, that is, cases in which the striking feature attracts the subject's attention to the screen and the subject then checks the event on the TV for a couple of seconds. Even during these additional 5 s, the subjects did not watch the screen for a longer percentage of time than they did for the parts of the clip in which these features were absent (infant screams: t 9 = 2.07, p = 0.068; waa barks: t 9 = −2.006, p = 0.071), although the trend was still positive for the infant screams (and negative for waa barks). The mean difference in looking times when infant screams were present vs. absent was very similar when comparing both the exact duration when this feature was present (Mean = 3.57%, SD = 5.27%) and the same duration plus 5 s (Mean = 3.17%, SD = 5.27%). In other words, there probably was a small initial effect of looking at the screen when infant screams were present, but no aftereffect. Furthermore, the looking times increased only moderately when screams were present vs. absent (by 17% and 16%, for the full duration and the duration plus 5 s, respectively) and therefore cannot account for the overall pattern of results, with looking times that were up to 500% longer in Gossau in the Infanticide condition (Mean = 25.5% of the video, SEM = 8.7%) compared with the other conditions (Neutral: Mean = 4.2%, SEM = 3.8%; Hunt: Mean = 5.2%, SEM = 2%; Aggression: Mean = 11.2%, SEM = 5.88%).
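The size of the condition effect relative to the scream effect can be verified from the reported Gossau means. The sketch below uses only the four means given in the text; the variable and function names are illustrative.

```python
# Mean looking times (% of the video) reported for Gossau.
gossau_means = {
    "Infanticide": 25.5,
    "Neutral": 4.2,
    "Hunt": 5.2,
    "Aggression": 11.2,
}


def percent_longer(target: float, reference: float) -> float:
    """How much longer (in %) the target value is than the reference value."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return 100.0 * (target - reference) / reference


for condition in ("Neutral", "Hunt", "Aggression"):
    increase = percent_longer(gossau_means["Infanticide"], gossau_means[condition])
    print(f"Infanticide vs. {condition}: {increase:.0f}% longer")
```

The largest difference, against Neutral, is roughly 507%, which corresponds to the "up to 500%" stated above and is far larger than the 16 to 17% increase associated with infant screams.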

Fig. 3 Looking times during the infanticide condition (during first session, exposure to the stimuli: Gossau), when infant screams were present vs. absent (Mean ± SE, light gray), and when infant screams were present + 5 s afterwards vs. the remaining time without screams (Mean ± SE, dark gray). The second analysis was added to control for potential aftereffects. The differences are not significant

Arousal and Threat Behaviors

Next, we analyzed whether, in addition to the longer looking times, chimpanzees would also show higher levels of arousal in the Infanticide condition. A repeated-measures ANOVA revealed a significant main effect of the within-subject factor test condition (F 3,11 = 6.153, P = 0.01), but not for the between-subject factors group (F 1,13 = 1.34, P = 0.267) or sex (F 1,13 = 4.37, P = 0.057). In addition, we found a strong interaction effect for test condition*group (F 3,11 = 7.77, P = 0.005) and a weak interaction for sex*group (F 1,13 = 4.7, P = 0.058). We therefore analyzed the data for both groups separately, including the factors test condition and sex in the repeated measures ANOVAs.

In neither group was there a main effect of test condition (Basel: F(3,3) = 3.9, P = 0.148; Gossau: F(3,6) = 2.6, P = 0.147). There was a significant effect of sex in Basel (F(1,5) = 16.8, P = 0.009), but not in Gossau (F(1,8) = 0, P = 0.992), and the interaction test condition*sex was not significant in either group (Basel: F(3,3) = 0.368, P = 0.783; Gossau: F(3,6) = 0.641, P = 0.616).

Post-hoc analyses based on paired t tests revealed that the females in Basel exhibited higher arousal than males when watching videos showing severe aggression against adults (Aggression: t(5) = 2.7, P = 0.043), and there was a trend for higher arousal in females when watching hunting (Hunt: t(5) = 2.54, P = 0.052) and neutral videos (Neutral: t(5) = 2.48, P = 0.056). Neither of the sexes, in either group, showed higher arousal in the Infanticide condition than in the other test conditions. This pattern was present both in the overall data (Fig. 4a) and during the first exposure to the stimuli in the first week (Fig. 4b). When looking at each arousal component separately (scratching, yawning, unrest), we also found no evidence for higher arousal in the Infanticide condition compared with the other conditions, either in the first test session or in all test sessions combined.

Fig. 4

Arousal in the four conditions (Mean ± SE). a = entire test phase, b = first exposure to the stimuli. The chimpanzees showed comparable levels of arousal across all conditions

Over all conditions, we observed 583 instances of threat behaviors directed toward the screen: 41% occurred in the Infanticide condition, 28.87% in Aggression, 16.12% in Hunt, and 18.01% in Neutral. However, 88.16% of all threat behaviors were performed by a single female (Xindra), and we therefore restricted the statistical analysis to this individual. Xindra performed significantly more threat behaviors in the Infanticide condition (41.05%) compared with Aggression (18.87%), Hunt (13.62%) and Neutral (26.46%; χ² = 87.76, df = 3, P < 0.001).
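The reported test statistic for Xindra is consistent with a chi-square goodness-of-fit test against equal expected frequencies in the four conditions. The sketch below rests on two assumptions that are not stated in the text: that the null hypothesis was a uniform distribution across conditions, and that the raw counts were 211, 97, 70, and 136. These counts are reconstructed from the reported percentages of 88.16% of 583, i.e., about 514 events. The function names are hypothetical.

```python
import math


def chi_square_uniform(observed):
    """Goodness-of-fit chi-square statistic against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square distribution with 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Counts reconstructed from the reported percentages (assumption, not raw data).
counts = {"Infanticide": 211, "Aggression": 97, "Hunt": 70, "Neutral": 136}

chi2 = chi_square_uniform(list(counts.values()))
p_value = chi2_sf_df3(chi2)
print(f"chi-square = {chi2:.2f}, df = {len(counts) - 1}, p = {p_value:.2e}")
```

With these reconstructed counts the statistic matches the reported χ² = 87.76, which suggests, but does not confirm, that the test was run against a uniform expectation.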

Discussion

We found that chimpanzees discriminated between a video clip depicting severe aggression against an infant and video clips depicting other forms of social aggression or neutral behavior. Specifically, they showed significantly longer looking times in the infanticide condition than in the control conditions. This result is consistent with the idea that severe aggression against infants did not match chimpanzees’ social expectations of a certain tolerance normally afforded to infants.

Several alternative explanations could be ruled out experimentally. First, our results could not be explained by novelty, that is, by the chimpanzees reacting more strongly to infanticide because it was an unknown situation that was difficult for them to interpret: in both groups infanticide had occurred previously, whereas hunting had not. Second, our results could not be explained by mothers with dependent offspring being primarily concerned for their own offspring, because the effect was not stronger in current mothers than in other individuals, nor in parous females compared with other individuals. Third, and most important, our results could not be explained as a direct response triggered by striking features, such as the presence of infants, unfamiliar males, hectic movements, screaming by adults, waa barks, or infant screaming. There was a nonsignificant trend for the animals to orient to the screens upon hearing an infant scream, but this effect was short-lived and far too weak to explain the main pattern of results. The looking time pattern of the chimpanzees thus suggests that they paid preferential attention to the infanticide scenes as a whole, rather than only responding to the infant screams. This result is most consistent with the presence of proto social norms, where individuals react as bystanders to the violation of a certain expectation of how others should behave.

Although the chimpanzees clearly looked longer at the infanticide videos, we found no evidence that this event elicited higher levels of arousal in the subjects. This lack of evidence may have three explanations. First, it may be an artifact arising from the experimental approach. Perhaps for most subjects (as for most humans) watching television lacks the immediacy, and hence the dramatic nature, of realistic events and therefore does not systematically elicit an overt behavioral response. This is consistent with the near-absence of threat behaviors against the video screen, but more difficult to reconcile with increased arousal in some conditions. Second, it might be that our method of measuring behavioral indicators of negative emotional arousal was too crude to detect more subtle emotional responses to the different video clips. Physiological measures that can be collected in normally behaving chimpanzees, such as peripheral skin temperature, tympanic membrane temperature (Parr 2001; Parr and Hopkins 2000), or startle reactions (Lang et al. 1997), may provide more accurate data on negative emotional arousal. Such measures would allow for the detection of subtle physiological changes in the animals that do not necessarily translate into observable behavior. However, this interpretation does not explain why the aggression videos did elicit a measurable increase in arousal, at least in the females in Basel. Even though more fine-grained follow-up studies are desirable, this suggests that the absence of arousal observed in the infanticide condition is real and not a methodological artifact.

The third possibility is that chimpanzees as uninvolved bystanders detect norm violations, but that these events are not accompanied by negative emotional reactions if they do not occur within their own group. Both the presence of quasi social norms and the presence of proto social norms would have predicted higher arousal, either because of contagion (quasi social norms) or some empathetic competence (Rudolf von Rohr et al. 2011). These notions had been developed based on naturalistic observations of spontaneous events that suggest strong bystander reactions to harmful behavior (de Waal 1991, 1996; Flack and de Waal 2002), including protesting waa barks (de Waal 1996; Goodall 1971; Killen and de Waal 2000), policing (Boehm 1994; de Waal 1982, 1984; de Waal and van Hooff 1981; Goodall 1986; Rudolf von Rohr et al. 2012), and in the case of infanticide, risky, goal-directed behaviors (interventions and defense) in favor of the mother-infant pair (Goodall 1977; Hamai et al. 1992; Murray et al. 2007; Sakamaki et al. 2001; Townsend et al. 2007). Indeed, severe aggressive acts toward infants are typically highly dramatic and provoke massive protests (waa barking) and high levels of arousal in bystanders (de Waal 1996; Goodall 1971; reviewed in Rudolf von Rohr et al. 2011).

When viewed in the light of these naturalistic observations, the pattern of results suggests that chimpanzees detect norm violations both within their group and in a group of unfamiliar individuals, but that they only respond emotionally to such norm violations within their own group. It has been argued that human social norms emerged through within-group social interactions (Ellikson 2001; Ehrlich and Levin 2005) and hence probably were applied only, or most strongly, to in-group members (Bowles and Gintis 2004). Chimpanzees combine high within-group solidarity with high out-group hostility (Boesch 2009). Thus, taken together, the evidence suggests that chimpanzees are likely to have strong social expectations that infants must not be harmed but that the violation of these expectations only releases an urgent emotional reaction when it occurs within their own community.

Nevertheless, the reaction of the one female in this study who did show threat behaviors may warrant the working hypothesis that chimpanzees sometimes may generalize their social expectations and also apply them to out-group conspecifics. This adult female (Xindra) regularly exhibited very strong behavioral indicators of negative arousal and threat behavior to close-ups depicting a male chimpanzee committing infanticide but did not do so in the other conditions. These fierce reactions persisted over the entire experimental phase. An intriguing working hypothesis is thus that under some conditions, social expectations may gradually increase the social reach of their validity. Future studies aiming at pinpointing these conditions will significantly contribute to a better understanding of how such an extension became more widespread during human evolution.

In conclusion, our quantitative study provides the first tentative evidence that chimpanzees, like humans, are sensitive to the appropriateness of behaviors that do not affect themselves. Chimpanzees distinguish severe aggression against infants from other forms of aggression and harmful behavior, indicating that such incidents do not match the social expectations of tolerance toward infants. This tolerance afforded to infants might, contrary to other behavioral regularities in chimpanzees, constitute a proto social norm, whereby individuals react to the norm violation per se (Rudolf von Rohr et al. 2011). This finding adds to the growing body of research investigating possible building blocks of human morality (de Waal 2006) in our closest living relatives, including consolation (Fraser and Aureli 2008; Fraser et al. 2008; Koski and Sterck 2009), inequity aversion (Brosnan and de Waal 2014, but see Bräuer and Hanus 2012), instrumental helping (Melis et al. 2011; Warneken and Tomasello 2006; Yamamoto et al. 2009), and spontaneous altruism (Warneken et al. 2007). Each of these building blocks can be identified to some extent in chimpanzees: Together they form the basis upon which the uniquely human forms of normativity were built (for a more detailed discussion, see Rudolf von Rohr et al. 2011; van Schaik et al. 2014). Although this study focused exclusively on chimpanzees, other species of animals “endowed with well-marked social instincts” (Darwin 1871 [1982]) might also form social expectations about how others, specifically infants, should be treated. This fruitful topic for future research might provide us with important insights into the evolution of specific social norms in humans and into why some of them are widely accepted and others are more difficult to establish.