Introduction

The speed and ease with which we recognize highly familiar faces belies the difficulty we experience performing the same task with unfamiliar faces. In studies of face identification in human adults, behavioural data suggest that we find it difficult to match two photographs of an unfamiliar face because of the various ways face structure can be degraded by circumstance (Bruce 1994; Bruce et al. 1999; Bruce and Young 1986; Burton et al. 2005, 2011, 2016; Hancock et al. 2000; Hill et al. 1997; Kemp et al. 1997; Megreya and Burton 2006; O’Toole et al. 1998). For example, every photograph of a face will be degraded to some extent by changes in viewing distance, eccentricity, expression and viewpoint, among other things. The image of a face on the retina is influenced by these same contextual variables, which alter the appearance of facial features so that each encounter with a person’s face can provide a very different visual pattern for analysis. This variability in retinal images from one encounter with a face to the next is thought to explain why unfamiliar face identification is such a difficult task (Andrews et al. 2015, 2016; Bruce et al. 1994, 1999; Burton et al. 2005, 2011, 2016; Dowsett et al. 2016; Hancock et al. 2000; Jenkins and Burton 2011; Jenkins et al. 2006; Johnston and Edmonds 2009; Megreya and Burton 2006; Ritchie and Burton 2016). Importantly, we recognize the faces of the many people we come to know over our lifetime automatically and effortlessly, despite the numerous ways a retinal image can be distorted and the physical changes that come with ageing (Andrews et al. 2016; Megreya and Burton 2006, 2008; Megreya et al. 2013). This suggests that the human brain processes familiar and unfamiliar faces differently, a hypothesis supported by evidence from several neuroscientific techniques (Balas et al. 2010; Caharel et al. 2009, 2011, 2014; Ewbank et al. 2008; Gobbini and Haxby 2006; Itier and Taylor 2002; Ramon et al. 2015). Open questions about familiar face recognition concern both the nature of the underlying representations and how these representations might change over time to make familiar face recognition seem effortless [for reviews see Jenkins and Burton (2011), Johnston and Edmonds (2009)].

To understand how exposure might improve recognition, researchers have investigated the theory of ‘stability from variation’ posited by Bruce (1994). Although this is an older psychological theory, it remains promising and continues to provide testable hypotheses that address humans’ exceptional ability to recognize familiar faces (Andrews et al. 2015, 2016; Burton et al. 2005, 2016; Clutterbuck and Johnston 2005; Dowsett and Burton 2015; Dowsett et al. 2016; Jenkins et al. 2006; Megreya and Burton 2006, 2008; Megreya et al. 2013; Ritchie and Burton 2016; Robertson et al. 2015, 2016; White et al. 2014a, b). The cornerstone of this theory is that the variable appearance of a person’s face allows the perceiver to distil a robust representation of that individual, one that emphasizes the aspects of their appearance that are relevant for identification (that is, stable, diagnostic information) while discarding the non-diagnostic information that arises in any particular set of images because of contextual variability (Bruce 1994; Burton et al. 2005, 2011). Such a representation is not dissimilar to a mathematical average, a statistical method that preserves information that is consistent across samples while attenuating information that varies from one sample to the next. An average face can be created experimentally by marking the shape of a person’s face in multiple photographs with a series of discrete xy coordinates and then aligning those coordinates to a standardized space, so that the pixel values of the aligned images can be averaged. Because non-diagnostic information, such as lighting direction, is uncorrelated with identity, it tends to cancel out across photographs, and the averaged image converges on the pixel information that is reliably present. The process thus dilutes aspects of the image that change from one photograph to the next, while preserving aspects of the image that are constant.
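The claim that averaging preserves consistent information while attenuating variable information can be illustrated with a toy simulation. The sketch below rests on simplifying assumptions that are not part of the original study: each ‘photograph’ is modelled as a fixed identity pattern plus independent, zero-mean Gaussian variation, and all images are assumed to be already aligned to a common shape. Under these assumptions, the deviation of an n-image average from the identity pattern should shrink by roughly a factor of √n. The function names are hypothetical.

```python
import numpy as np


def simulate_photos(identity, n_photos, noise_sd, rng):
    """Toy model (assumption): photo = stable identity pattern + image-specific variation."""
    variation = rng.normal(0.0, noise_sd, size=(n_photos,) + identity.shape)
    return identity + variation


def rms_deviation(image, reference):
    """Root-mean-square pixel difference between an image and the reference pattern."""
    return float(np.sqrt(np.mean((image - reference) ** 2)))


rng = np.random.default_rng(0)
identity = rng.uniform(0.0, 1.0, size=(64, 64))  # stand-in for stable, diagnostic structure
photos = simulate_photos(identity, n_photos=10, noise_sd=0.3, rng=rng)
average = photos.mean(axis=0)  # pixel-wise mean of the (pre-aligned) photographs

single_error = rms_deviation(photos[0], identity)
average_error = rms_deviation(average, identity)
print(f"single photo: {single_error:.3f}, 10-photo average: {average_error:.3f}")
```

With ten images, the deviation of the average should be close to 0.3/√10 ≈ 0.095, compared with about 0.3 for a single photograph. Real variation in viewpoint or expression is neither additive nor independent of face shape, which is why averaging is carried out only after the landmark coordinates have been aligned.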
Behavioural support for the idea that the brain uses average faces to encode individual identity comes from studies in which researchers have compared recognition performance for familiar faces presented as either mathematical averages of multiple photographs or single photographs (Burton et al. 2005, 2011; Jenkins and Burton 2011; Johnston and Edmonds 2009). Human subjects reportedly found it easier to recognize familiar or famous faces presented as averages than individual photographs of the same people. This finding was used to argue that, for humans recognizing human faces, averages are more powerful representations of individual identity than individual photographs. But is this true for all primates?

Turning to comparative psychology, although a number of primate species have face-selective neural mechanisms (marmosets, Hung et al. 2015; chimpanzees, Parr et al. 2009; rhesus monkeys, Tsao et al. 2003), it does not follow that all primate species are as finely tuned to familiar faces as humans. Our aim in this paper was to compare behaviour towards average faces in two species of nonhuman primates: chimpanzees (Pan troglodytes) and rhesus monkeys (Macaca mulatta). This comparison does not require the substantial change in protocol needed to compare human and nonhuman primate behaviour (i.e. the difference between measuring a conditioned behaviour after generalizing to a novel stimulus set and measuring behavioural responses to verbal instructions in a university setting). In this paper, we compare the behaviour of our closest living relative, the chimpanzee, with whom we last shared a common ancestor approximately six million years ago and with whom we are more likely to share cognitive and neural mechanisms, with that of the rhesus monkey, which last shared a common ancestor with humans approximately 23 million years ago. Moreover, most research to date shows similar face processing mechanisms in chimpanzees and humans, including expertise (Parr et al. 2011; Taubert et al. 2012a, b; Weldon et al. 2013) and face space organization (Parr et al. 2012). More importantly, here we take the opportunity to compare chimpanzee behaviour with that of a species that is more distantly related but far more commonly used as an animal model of social cognition (i.e. the rhesus monkey), without making substantial changes to the experimental protocol. While face discrimination in chimpanzees appears to be sensitive to increases in familiarity, as one would expect from reports of human behaviour (Parr et al. 2011), there is some doubt as to whether familiarity affects face discrimination in rhesus monkeys (Parr et al. 2008). It is therefore possible that rhesus monkeys recognize and discriminate individual faces using different psychological mechanisms from those available to great apes such as humans and chimpanzees.

In service of our research goals, we compared how chimpanzees and rhesus monkeys matched single photographs of unfamiliar conspecifics or digital averages in an identity-matching task. Based on previous reports of human behaviour and the assumption that the stored representation resembles an average face more than a single instance (Jenkins and Burton 2011), we expected that discriminating between two comparison stimuli in a two-alternative forced-choice task would be easier when the comparison stimuli were averages than when they were single instances (i.e. a main effect of comparison stimuli). The same theory also predicts that it should be easier to match an unseen instance of a face (a viewpoint deviant) to an average image than to another single instance. However, if the brain stores multiple representations of familiar faces rather than an average, then we would predict proficient performance regardless of whether the subjects matched to averages or to single instances (i.e. no effect of comparison stimuli and no interaction between sample stimuli and comparison stimuli).

Materials and methods

Subjects

Five adult chimpanzees (P. troglodytes, three male) aged between 17 and 24 years served as subjects, together with five mature rhesus monkeys (M. mulatta, two male) aged 8 years. All subjects were captive-born. The chimpanzees were nursery-reared in peer groups by humans until they were 4 years old, at which time they joined established social groups at the Yerkes Main Station. The monkeys were mother-reared in large social groups at the Yerkes National Primate Research Center Field Station until they were 4 years old, after which they were relocated to the Yerkes Main Station, where they were pair-housed in a colony room and able to maintain visual and auditory contact with a large number of other rhesus monkeys. All subjects had participated in cognitive experiments before and had repeatedly demonstrated their ability to perform the matching task employed here (Parr and Heintz 2009; Parr et al. 2008, 2011, 2012; Parr and Taubert 2011; Taubert et al. 2012a, b; Taubert and Parr 2009, 2011, 2012; Weldon et al. 2013).

Stimuli

Photographs of twenty individuals were used as stimuli in these experiments (ten female chimpanzees and ten female rhesus monkeys). These individuals were personally unfamiliar to the subjects, but numerous photographs of them had been presented as stimuli in other experiments conducted over several years. Although the exact number of exposures each subject had to each stimulus is unknown, each stimulus in the current experiment had been seen in at least six other experiments completed in the 2 years prior to testing. Additionally, these experiments often involved a high degree of image manipulation applied to completely independent images (Parr et al. 2011, 2012; Parr and Taubert 2011; Taubert et al. 2012a, b; Taubert and Parr 2011, 2012; Weldon et al. 2013). For each species, 5 of the 10 individuals were randomly selected to serve as ‘targets’, i.e. the matching images, while the remaining five served as ‘foils’, or nonmatching images. Examples of the stimuli used in this experiment are shown in Fig. 1. For each of the 20 individuals in the stimulus set, we collected a number of photographs. These photographs are part of a large database maintained by LA Parr (lisaaparr.com).

Fig. 1

Examples of the experiment stimuli for two target individuals. Top row: chimpanzee known as ‘Cheopi’ from six different viewpoints; the image on the far right is a computer-generated average of ten photographs. Bottom row: monkey known as ‘Dc08’ from six different viewpoints, with the ten-image average on the far right.

For each stimulus identity, we selected six photographs, referred to as instances. The first instance selected was considered the ‘best instance’ (see Fig. 1). The best instance was selected on the following criteria: the most frontal viewpoint, a neutral expression, and no shadows or obstructions. These stimuli had all been used in previous experiments. We refer to the remaining five instances as ‘deviants’ because the depicted individual’s face varied in viewpoint and the lighting conditions were not ideal. We also made no effort to control mouth position or gaze direction in these images, although the faces all displayed a neutral expression and were not occluded by other objects (i.e. all facial features were visible; see Fig. 1). Importantly, because these deviant images were considered undesirable photographs with which to represent individual identity, they had never been used in any previous experiment and were therefore completely unfamiliar to the subjects.

The averages of each individual (targets and foils) were created using Psychomorph software (Tiddeman et al. 2001). Each average was composed of ten photographs that were delineated by hand and morphed into a single image; this procedure has been described in detail elsewhere (Parr et al. 2012). None of the six instances used in this experiment (the best instance or the five deviants) was used to create the corresponding average. Like the best instances described above, the averages had all been seen and matched by the subjects in a prior experiment (Parr et al. 2012).
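Psychomorph’s own implementation is not described here. As a hedged illustration of the shape-averaging step only, the sketch below uses generalized Procrustes alignment, a standard way to average hand-delineated landmark sets. Each face is assumed to be a (k, 2) array of xy landmark coordinates listed in the same order; translation and scale are removed, each shape is rotated onto the running mean, and the aligned shapes are averaged. All function names are hypothetical, and the subsequent steps of warping each photograph’s texture to the mean shape and averaging pixel values are not shown.

```python
import numpy as np


def normalize(shape):
    """Centre a (k, 2) landmark array on its centroid and scale it to unit Frobenius norm."""
    centred = shape - shape.mean(axis=0)
    return centred / np.linalg.norm(centred)


def align_to(shape, reference):
    """Rotate a normalized shape onto a normalized reference (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(shape.T @ reference)
    if np.linalg.det(u @ vt) < 0:
        u[:, -1] *= -1  # exclude reflections so a face is never mirrored
    return shape @ (u @ vt)


def average_shape(shapes, n_iter=10):
    """Generalized Procrustes analysis: iteratively align all shapes to their mean."""
    aligned = [normalize(np.asarray(s, dtype=float)) for s in shapes]
    mean = aligned[0]
    for _ in range(n_iter):
        aligned = [align_to(s, mean) for s in aligned]
        mean = normalize(np.mean(aligned, axis=0))
    return mean, aligned


def similarity_transform(shape, angle, scale, shift):
    """Rotate, scale and translate a landmark set (used here to mimic different photos)."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    return scale * shape @ rotation.T + shift


# Usage: three 'photographs' of one hypothetical five-landmark face
base = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0], [0.2, 0.4], [0.8, 0.4]])
params = [(0.1, 1.0, 5.0, 2.0), (-0.3, 2.5, -1.0, 0.0), (0.7, 0.8, 3.0, 3.0)]
copies = [similarity_transform(base, a, sc, np.array([sx, sy])) for a, sc, sx, sy in params]
mean, aligned = average_shape(copies)
```

Because the three copies differ only in position, size and rotation, Procrustes alignment maps them onto an identical shape; with real delineations, the residual differences that remain after alignment are what the mean shape averages over.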

All stimuli were cropped and resized to fit on a square canvas (350 × 350 pixels) and converted to 256 shades of grey using Adobe Photoshop. A customized black mask was then applied to cover the background, so that only the head was shown (see Fig. 1). In addition, the averages and best instances were rotated so that the two eyes were aligned on the horizontal axis, and their contrast and brightness were adjusted so that all the exemplars matched as closely as possible (see Fig. 1). In sum, we created three kinds of images: best instances, deviants and averages. The best instances and averages had all been seen in previous experiments and served as comparison stimuli. Deviant images had never been seen before and were used only as sample stimuli in the deviant trials.
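The stimuli were prepared manually in Adobe Photoshop, so the exact adjustments are not recoverable. The sketch below approximates the described steps programmatically under assumptions of our own: greyscale conversion uses the ITU-R BT.601 luma weights, the tilt of the inter-ocular line is computed from two hypothetical eye coordinates, and ‘matching contrast and brightness’ is interpreted as equating the mean luminance and RMS contrast (standard deviation) of the pixels inside the face mask. All function names, the oval mask and the target values are hypothetical.

```python
import numpy as np


def to_greyscale_uint8(rgb):
    """Convert an (h, w, 3) RGB array with values in [0, 255] to 256 grey levels (BT.601)."""
    grey = rgb[..., :3].astype(float) @ np.array([0.299, 0.587, 0.114])
    return np.clip(np.round(grey), 0, 255).astype(np.uint8)


def eye_tilt_degrees(left_eye, right_eye):
    """Angle of the inter-ocular line relative to horizontal, in image (x, y) coordinates.

    Rotating the image by the opposite of this angle levels the eyes; the sign
    convention of the rotation itself depends on the image library used.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return float(np.degrees(np.arctan2(dy, dx)))


def match_luminance_contrast(grey, face_mask, target_mean, target_sd):
    """Set the mean and standard deviation of face pixels; paint the background black."""
    out = grey.astype(float)
    face = out[face_mask]
    out[face_mask] = (face - face.mean()) / face.std() * target_sd + target_mean
    out[~face_mask] = 0.0  # black mask over the background
    return np.clip(np.round(out), 0, 255).astype(np.uint8)


rng = np.random.default_rng(1)
rgb = rng.integers(0, 256, size=(350, 350, 3))  # stand-in for a 350 x 350 colour photograph
grey = to_greyscale_uint8(rgb)
yy, xx = np.mgrid[:350, :350]
face_mask = (xx - 175) ** 2 / 140**2 + (yy - 175) ** 2 / 170**2 <= 1.0  # hypothetical oval
matched = match_luminance_contrast(grey, face_mask, target_mean=128.0, target_sd=40.0)
print(eye_tilt_degrees((120, 150), (230, 160)))
```

Applying the same target mean and standard deviation to every exemplar is one simple, reproducible way to make images ‘match as much as possible’; it is not necessarily what was done by hand in Photoshop.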

General procedure

Subjects were tested twice daily with computerized systems that have been described in detail in other papers (Parr et al. 2011, 2012; Weldon et al. 2013). For the chimpanzees, the system consisted of a 19″ computer, a colour monitor and an industrial joystick. The custom-made testing rig was mobile and could be wheeled in front of the subject’s home cage. For the monkeys, testing also took place in the home cage using a mobile testing rig equipped with two 15″ touch screen monitors built into a metal frame. These frames allowed the monitors to be attached to the front of the subject’s home cage and left for extended periods of time.

All subjects were tested using a standard match-to-sample procedure programmed in Visual Basic. Each discrete trial began with the presentation of a single image, the sample, in one of four positions (centre top, centre bottom, centre left or centre right), selected at random. Subjects were required to orient towards the sample, either by contacting it with a joystick-controlled cursor (chimpanzees) or by touching it three times in rapid succession on the touchscreen monitor (monkeys), after which two additional stimuli appeared on the opposite side of the screen, equidistant from the sample (see Fig. 2a). These were the two comparison stimuli (one target and one foil), and the side on which the target appeared was randomized. When subjects correctly selected the target, using the same response as for the sample, they were rewarded with a squirt of juice (chimpanzees) or a small piece of food (monkeys), followed by a short inter-trial interval of 2 s. An incorrect response to the foil was followed by an inter-trial interval of 6 s and no reinforcement.
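The trial logic described above can be summarized in a few lines. The following is a minimal sketch, not the original Visual Basic program: the position labels, class and function names are hypothetical, the two comparison stimuli are represented simply as slots 0 and 1 on the side opposite the sample, and no display, response hardware or timing is modelled.

```python
import random
from dataclasses import dataclass

POSITIONS = ("top", "bottom", "left", "right")
OPPOSITE = {"top": "bottom", "bottom": "top", "left": "right", "right": "left"}
ITI_CORRECT_S = 2.0    # inter-trial interval after a rewarded (correct) choice
ITI_INCORRECT_S = 6.0  # longer, unrewarded interval after choosing the foil


@dataclass
class TrialLayout:
    sample_position: str  # where the sample appears
    comparison_side: str  # side on which the two comparison stimuli appear
    target_slot: int      # 0 or 1: which comparison slot holds the target


def lay_out_trial(rng: random.Random) -> TrialLayout:
    """Randomize the sample position and which comparison slot holds the target."""
    sample_position = rng.choice(POSITIONS)
    return TrialLayout(sample_position, OPPOSITE[sample_position], rng.randrange(2))


def score_response(layout: TrialLayout, chosen_slot: int) -> tuple[bool, float]:
    """Return (rewarded, inter-trial interval in seconds) for a choice between comparisons."""
    correct = chosen_slot == layout.target_slot
    return correct, ITI_CORRECT_S if correct else ITI_INCORRECT_S


rng = random.Random(42)
layout = lay_out_trial(rng)
print(layout, score_response(layout, chosen_slot=0))
```

Keeping layout randomization separate from scoring makes it easy to confirm that the target side is balanced across trials independently of how subjects respond.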

Fig. 2

a Schematic of the behavioural task. b An example of a visual display from each of the four unique conditions. In each example, the comparison stimuli are presented below the sample. The target image is on the left and the foil is on the right

Experimental design

To test (1) whether our subjects could discriminate averages better than single instances and (2) whether they could match novel instances (deviants) to averages more easily than to single instances, we designed an experiment with four conditions in a 2 × 2 factorial structure. The two factors were sample stimuli (same image vs. viewpoint deviant) and comparison stimuli (instances vs. averages). The first factor, sample stimuli, describes a manipulation of the sample, which could be either the same image as the target (effectively an image-matching task) or a deviant image (a novel exemplar of an individual that had been seen and matched in previous experiments; see Fig. 2b). We expected the subjects to be more accurate when the sample was the same image as the target (image matching) than when the sample was a viewpoint deviant (identity matching). We also manipulated the comparison stimuli (both the target and the foil), which were either best instances or averages (see Fig. 2b); both best instances and averages had been used and matched in previous experiments.

In each of the four conditions (same image/instances, same image/averages, viewpoint deviant/instances and viewpoint deviant/averages), each of the five target identities was presented twice, each time paired with one of the five foil identities, giving ten trials per condition. All 40 unique trials were presented in a single test session, in random order. This test session was repeated ten times over 5 days, so that each subject completed 400 trials in total. Data from all consecutive test sessions were included; no a priori performance criterion was set.
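The session structure (a 2 × 2 design, ten trials per condition, 40 trials per session in random order, ten sessions) can be sketched as follows. The exact target/foil pairings are not recoverable from the text, so the pairing rule below, in which each target appears twice per condition with a foil chosen by a simple rotation, is a hypothetical placeholder, as are the identity labels.

```python
import itertools
import random
from collections import Counter

SAMPLE_TYPES = ("same_image", "viewpoint_deviant")
COMPARISON_TYPES = ("instances", "averages")
TARGETS = [f"T{i}" for i in range(1, 6)]  # hypothetical labels for the 5 target identities
FOILS = [f"F{i}" for i in range(1, 6)]    # hypothetical labels for the 5 foil identities
N_SESSIONS = 10


def build_session(rng):
    """One session: 2 x 2 conditions x 10 trials = 40 trials in random order."""
    trials = []
    for sample_type, comparison_type in itertools.product(SAMPLE_TYPES, COMPARISON_TYPES):
        for repeat in range(2):  # each target appears twice per condition
            for i, target in enumerate(TARGETS):
                foil = FOILS[(i + repeat) % len(FOILS)]  # hypothetical pairing rule
                trials.append((sample_type, comparison_type, target, foil))
    rng.shuffle(trials)
    return trials


rng = random.Random(0)
sessions = [build_session(rng) for _ in range(N_SESSIONS)]
per_condition = Counter((s, c) for s, c, _, _ in sessions[0])
print(len(sessions[0]), sum(len(s) for s in sessions), per_condition)
```

Shuffling within each session, rather than across the whole experiment, keeps every condition equally represented in every session while still preventing subjects from anticipating the trial order.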

Results

Chimpanzees

The 2 × 2 repeated-measures ANOVA performed on the chimpanzee data revealed a main effect of sample stimulus [F(1, 4) = 68.436, p = 0.001, partial η² = 0.945], indicating that, as expected, the subjects responded more accurately when the same image appeared as both the sample and the match (image matching) than when the sample and the match depicted different views of the same individual. The main effect of comparison stimuli [F(1, 4) = 15.714, p = 0.017, partial η² = 0.797; see Fig. 3] confirmed one of the experimental hypotheses: it was easier for the subjects to match individual faces when the comparison stimuli were averages generated from multiple photographs than when they were single photographs. The interaction between sample stimulus and comparison stimuli was not significant [F(1, 4) = 0.507, p = 0.516, partial η² = 0.113; see Fig. 3]. Nonetheless, a planned contrast (paired t test, one-tailed) comparing the two viewpoint-deviant conditions (deviant sample/instances vs. deviant sample/averages) confirmed that it was easier for the subjects to match a deviant sample to an average than to a single instance (p = 0.04).
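Because each effect in this 2 × 2 repeated-measures design has one numerator degree of freedom and, with five subjects, four error degrees of freedom, partial η² can be recovered directly from a reported F ratio. Since F = (SS_effect/df_effect)/(SS_error/df_error), partial η² = SS_effect/(SS_effect + SS_error) = F·df_effect/(F·df_effect + df_error). The short check below is an illustrative consistency check, not part of the original analysis; it applies this identity to the chimpanzee values reported above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio: F*df1 / (F*df1 + df2)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


reported = {  # chimpanzee effects as reported above: (F, partial eta squared)
    "sample stimulus": (68.436, 0.945),
    "comparison stimuli": (15.714, 0.797),
    "interaction": (0.507, 0.113),
}
for effect, (f_value, eta_reported) in reported.items():
    eta = partial_eta_squared(f_value, 1, 4)
    print(f"{effect}: recomputed {eta:.3f}, reported {eta_reported}")
```

The recomputed values agree with the reported ones to within about 0.001, the small residual for the interaction being attributable to rounding of the F ratio.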

Fig. 3

a Summary of the data collected from chimpanzees. Error bars reflect standard error. b A summary of the data collected from rhesus monkeys. Error bars reflect standard error

Monkeys

The 2 × 2 repeated-measures ANOVA carried out on the monkey data yielded a different pattern of results from that of the chimpanzees. The main effect of sample stimulus was still present [F(1, 4) = 18.606, p = 0.013, partial η² = 0.823], although the effect was smaller. Critically, there was no evidence that the monkeys performed better when the comparison stimuli were averages rather than instances [F(1, 4) = 0.099, p = 0.769, partial η² = 0.024]. The interaction between sample stimulus and comparison stimuli was also not significant [F(1, 4) = 0.064, p = 0.812, partial η² = 0.016]. The same a priori contrast (paired t test, one-tailed) was run to determine whether it was easier for the subjects to match a deviant sample to an average target than to a single instance; we found no evidence that this was the case (p = 0.48).
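With one numerator degree of freedom, an F(1, ν) statistic is the square of a t statistic with ν degrees of freedom, so the p value of the F test equals the two-sided p value of t = √F. For ν = 4 the t cumulative distribution function has a closed form, CDF(t) = 1/2 + (3/8)·(t/√(1 + t²/4))·[1 − (1/12)·t²/(1 + t²/4)], which allows the reported ANOVA p values to be checked without a statistics library. This is an illustrative consistency check, not the authors’ analysis code; the one-tailed paired t tests cannot be reproduced here because the per-subject data are not given.

```python
import math


def p_value_f1_4(f_value):
    """Upper-tail p value of F(1, 4), computed as the two-sided p of t = sqrt(F) with 4 df."""
    t = math.sqrt(f_value)
    s = 1.0 + t * t / 4.0
    # Closed-form CDF of Student's t distribution with 4 degrees of freedom
    cdf = 0.5 + 0.375 * (t / math.sqrt(s)) * (1.0 - (t * t / s) / 12.0)
    return 2.0 * (1.0 - cdf)


reported_monkeys = {  # (F, reported p) from the monkey ANOVA above
    "sample stimulus": (18.606, 0.013),
    "comparison stimuli": (0.099, 0.769),
    "interaction": (0.064, 0.812),
}
for effect, (f_value, p_reported) in reported_monkeys.items():
    print(f"{effect}: recomputed p = {p_value_f1_4(f_value):.3f}, reported {p_reported}")
```

The same function reproduces the chimpanzee p values (0.001, 0.017 and 0.516) to within rounding.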

Discussion

The present data imply that chimpanzees, like humans, find it easier to discriminate digital averages of faces than single instances, which fits the predictions of ‘stability from variation’ theory (Bruce et al. 1994; Burton et al. 2005, 2011; Jenkins and Burton 2011; Ritchie and Burton 2016). Similarly, we found some indication that when matching a viewpoint deviant sample (a novel photograph of a previously encountered individual whose appearance had been degraded by natural variation in viewpoint and lighting), performance was better when the comparison stimuli were averages than when they were single instances, even though the single instances used as comparison stimuli were high-quality photographs that had been seen in previous experiments. This finding suggests that average faces were not only easier to discriminate, but also easier to match to novel instances. Collectively, these two observations suggest that chimpanzees store representations of identity that become increasingly robust and maximize diagnostic information through a process similar to image averaging. A more direct implication is that chimpanzees appear to process faces in a manner similar to our own.

Although the ‘stability from variation’ theory was first posited by Bruce in 1994, it has since gained considerable traction in psychology and computer science, yielding substantial advances in our understanding not only of how the brain accomplishes face recognition over time (Andrews et al. 2015, 2016; Clutterbuck and Johnston 2005; Dowsett and Burton 2015; Dowsett et al. 2016; Faerber et al. 2016; Jenkins and Burton 2011; Jenkins et al. 2006, 2011; Megreya and Burton 2006; Ritchie and Burton 2016), but also of how to engineer this behaviour in machines (Phillips et al. 2010; Robertson et al. 2015). The recent surge in its popularity is likely due, in part, to its applied implications for national security and improved identity protection through face recognition software (Burton et al. 2005; Jenkins et al. 2006, 2011; White et al. 2014a, b). Here, we provide empirical evidence suggesting that chimpanzees encode individual faces in a way similar to humans. Previous reports have also pointed to the striking similarity between human and chimpanzee behaviour towards faces in cognitive experiments (Dahl et al. 2013a, b; Parr et al. 2011, 2012; Taubert and Parr 2011, 2012; Taubert et al. 2012b; Weldon et al. 2013) and have argued that evidence of common cognitive mechanisms may reflect the demands of a similar social system.

In contrast, we found no evidence of an advantage for averages over instances when we tested rhesus monkeys. It is important to note that these data point only to a difference across species; they do not suggest that rhesus monkeys rely on more rudimentary processes than a great ape species. The processes that underlie familiar face recognition in rhesus monkeys could well be equally complex and sophisticated. Indeed, the monkeys performed with greater overall accuracy than the chimpanzees in this experiment. A plausible explanation is that the rhesus monkeys adopted a more efficient strategy for this task, one that bypassed the mechanisms underlying familiar face recognition. Such an alternative route for discriminating individual faces might also explain the results of a previous lesion study in which monkeys were first trained to perform a face discrimination task and then retested after bilateral lesions of the superior temporal sulcus (STS; Heywood and Cowey 1992). Although the STS is thought to house the core face processing system, these lesions did not incur a behavioural cost (Heywood and Cowey 1992). Taken together with our current data, these findings provide some indication that rhesus monkeys need not use their face processing system to perform discrimination tasks. Importantly, in terms of the current experiment, this would still represent a systematic difference in the approach taken by different species tested with the same task under the same experimental conditions. In sum, we found no evidence of an advantage for averages in rhesus monkeys, which represents a discontinuity with the results reported here for chimpanzees and previously for humans.

This is not the first time that a similarity in face perception between humans and chimpanzees has failed to extend to rhesus monkeys. Another potentially important discontinuity concerns lateralization. Both humans (Ellis and Shepherd 1975; Hilliard 1973; Levy et al. 1972) and chimpanzees (Dahl et al. 2013b) are biased towards facial features presented in the left visual field. In humans, at least, this bias is thought to reflect neural lateralization (Gauthier et al. 1999; Grill-Spector et al. 2004); unfortunately, comparable functional maps do not exist for chimpanzees. Studies of rhesus monkey behaviour, on the other hand, have yielded mixed results: many earlier studies reported no evidence of a hemifield advantage (Hamilton 1977; Hamilton and Vermeire 1983; Overman and Doty 1982) until a study of split-brain monkeys in 1988 (Hamilton and Vermeire 1988), and these authors later proposed that the advantage may be mediated by gender (Vermeire et al. 1998). Nonetheless, the functional activity data have been more definitive, with no evidence of right-hemisphere dominance at the system level in the rhesus monkey brain (Bell et al. 2011; Popivanov et al. 2012; Tsao et al. 2003). Importantly, differences across primate species, like those indicated in this paper, do not diminish the value of the rhesus monkey as a nonhuman primate model for social cognition. Instead, these data serve as a timely reminder that the rhesus monkey is a different species that evolved in its own lineage under a different set of social and cognitive demands. We need to investigate and quantify these differences so that models can be adjusted, rather than assuming absolute similarity or overlooking cognitive differences entirely.

In this study, we compared two species of nonhuman primate with each other using a similar experimental protocol. Moreover, we used the same number of chimpanzees and rhesus monkeys, and all subjects had already been trained to match to sample using positive reinforcement and had participated in a number of studies investigating face perception (Parr and Heintz 2009; Parr et al. 2008, 2011, 2012; Parr and Taubert 2011; Taubert et al. 2012a, b; Taubert and Parr 2011, 2012; Weldon et al. 2013). This overlap in research experience makes it unlikely that general discrepancies in training and learning contributed to the final result, and allows us to compare the two species directly. Therefore, the difference we report between the behavioural responses of chimpanzees and rhesus monkeys to average faces could reflect a genuine difference in the way identity representations are stored. This conclusion has broad implications for understanding perceptual learning and social cognition in these species, because it indicates that increasing familiarity and experience might have a different impact on rhesus monkeys.