Introduction

As humans, we intuitively recognize individuals, assess their relationships with others, and perceive the disposition and intentions of others on a daily basis (Bruce and Young 1986). In particular, faces provide us with information on the age, sex, and identity of other individuals as well as information about the emotional states of others (Ekman and Oster 1979; Tranel et al. 1988). This adaptive specialization for socio-cognitive processes such as the recognition of individuals and facial expressions is thought to be possessed by other closely related primates as well (Barton 1998; Brothers 1990; Cheney and Seyfarth 1990; Hinde 1976). In fact, a number of researchers have posited that large brains and the cognitive capability we humans know as “intelligence” evolved in conjunction with group living and the social complexities that have arisen with it (Byrne and Whiten 1988; Humphrey 1976; Jolly 1966). Yet, there is little research investigating how sociality influences these socio-cognitive processes, largely because most primates are highly social, group-living species.

Numerous studies have found that faces are highly salient social stimuli for many animals, including nonhuman primates (Brown and Dooling 1992; Fujita 1993; Parr et al. 2000). Like humans (Pascalis et al. 2002), many nonhuman primates are better at recognizing individuals of their own species than those of other species (Macaca nemestrina: Fujita 1993; Macaca mulatta: Pascalis and Bachevalier 1998; Macaca tonkeana, Cebus apella: Dufour et al. 2006), implying that there was a strong evolutionary pressure to process conspecific faces differently, and presumably more efficiently, than the faces of other species. This would be under strong selection pressure in group living species, where individuals would do best if they recognized each other individually and remembered individuals with whom they had interacted (Porkorny and de Waal 2009).

However, other evidence indicates that the ability to discriminate conspecifics better than other species may merely be a result of the duration of visual exposure to the different species. For example, rhesus macaques were presented with pictures of conspecifics and domestic animals (images included full bodies). Using a habituation-dishabituation paradigm, macaques became habituated to a picture of a conspecific and then dishabituated when shown a picture of a different conspecific, suggesting discrimination between the two pictures. This effect was not found when the domestic animal stimuli were used; subjects treated domestic animals of the same species as similar, suggesting that rhesus macaques were able to discriminate conspecifics, but not domestic animals. However, after several months of exposure to the domestic animals, the macaques could individually discriminate them as well (Humphrey 1974). Similarly, chimpanzees (Pan troglodytes) with more exposure to human faces than to other chimpanzee faces were better at discriminating human faces than they were at discriminating chimpanzee faces (Martin-Malivel and Okada 2007). Together, these studies imply that experience and/or exposure may be a major factor in the ability to discriminate individuals.

Supporting this, human performance decreases when discriminating individuals across viewpoints or lighting conditions with unfamiliar but not familiar individuals (Bruce et al. 1999; Bruce et al. 2001; Hill and Bruce 1996). Likewise, chimpanzees tested on a face recognition task performed better when individuating highly familiar conspecifics across photographs displaying different viewpoints of the same individual compared to moderately familiar conspecifics (subjects had prior exposure to these individuals only as test or training stimuli), and worse when individuating unfamiliar conspecifics (Parr et al. 2011). In contrast, one study found that a gorilla and four orangutans performed at similar levels when discriminating familiar conspecifics and unfamiliar heterospecifics across various ages (Vonk and Hamilton 2014). Interestingly, orangutans preferred to look at currently unfamiliar as opposed to familiar faces unless they were paired with historically familiar faces, in which case they preferred to look at old acquaintances (Hanazuka et al. 2013). The only other study to evaluate familiarity on a face discrimination task in nonhuman primates found that capuchin monkeys, a highly social New World primate, performed equally well when tested on familiar and unfamiliar faces (Porkorny and de Waal 2009). However, this study did not examine the effect of familiarity directly.

Because socio-cognitive skills such as face recognition have been proposed to have evolved as a response to social complexity (Byrne and Whiten 1988; Humphrey 1976; Jolly 1966), it is perhaps not surprising that these skills have primarily been demonstrated in highly gregarious, group-living species (Parr et al. 2000; Porkorny and de Waal 2009; Rosenfeld and van Hoesen 1979), leaving open the question of whether all primates discriminate faces in similar ways, or whether there have been specialized adaptations based on whether or not individuals spend the majority of time in social groups. Therefore, in order to examine how sociality influences socio-cognitive skills such as face recognition or discrimination, additional work is required on closely related but less social species.

Orangutans are prime candidates to test questions of sociality among primates because they are far less gregarious than most primate species, and in particular, than the other great apes. In captivity, orangutans are social (Edwards and Snowdon 1980) and there have been arguments that their lack of sociality in the wild is an evolutionarily recent adaptation due to habitat loss in the Indonesian archipelago (Meijaard et al. 2010). In the wild, the home ranges of orangutans overlap (Mitani et al. 1991; te Boekhorst et al. 1990) and occasionally larger aggregations form during periods of high fruit abundance (MacKinnon 1974; Rijksen 1978; Singleton et al. 2009), providing the opportunity for social encounters. Nevertheless, orangutans do not form coalitions and alliances to the same degree as other apes (van Schaik 2004) and currently spend a significantly smaller proportion of their time in groups than do other species (Galdikas 1988; Rijksen 1978).

A recent study tested four orangutans on a matching-to-sample discrimination task in which subjects were presented with full-body images of familiar conspecifics and unfamiliar heterospecific individuals. All subjects performed above chance on matching familiar conspecifics across viewpoints, and three of the four orangutans performed above chance matching unfamiliar gorillas (Vonk and Hamilton 2014). However, in this study, familiarity was confounded with species identity. In order to address the question of familiarity more directly, one should examine the performance across familiarity within the same species.

Thus, in this study, we sought to expand upon previous results by investigating face discrimination performance as a function of familiarity in a different population of orangutans (Pongo spp.) to determine if they exhibit similar face discrimination behavior in familiar versus unfamiliar conspecifics as compared to other, more gregarious, great apes. Importantly, the individuals we tested came from established social groups at Zoo Atlanta, where they spend their entire day in the company of other orangutans, eliminating the possibility that any differences we might find would be due to different exposure of the individuals to other orangutans, as opposed to species differences. Because orangutans share many cognitive traits with chimpanzees (Herrmann et al. 2007; Russon 1998; Shumaker et al. 2001; Tomasello and Call 1994), which have demonstrated skills of individual discrimination (Parr et al. 2000; Parr et al. 2011), and a recent study indicated that orangutans discriminate full body images (Vonk and Hamilton 2014), it is reasonable to predict that orangutans will perform similarly well on a conspecific face discrimination task. However, as humans and chimpanzees are better able to discriminate familiar as opposed to unfamiliar conspecific faces across viewpoints (Hill and Bruce 1996; Parr et al. 2011) and exposure may improve one’s ability to identify individuals (Fujita 1990; Martin-Malivel and Okada 2007; Tanaka 2003), we expected orangutans to better discriminate familiar as opposed to unfamiliar faces.

Materials and Methods

Subjects and Housing

We tested three orangutans socially housed at Zoo Atlanta, Atlanta, GA, USA. Subjects included one Sumatran female, Madu, age 28; one Sumatran male, Junior, age 9; and one Bornean male, Satu, age 8. Test subjects came from two social groups. Madu and Junior were housed with a hybrid male (age 34) and two other Sumatran males (ages 1 and 5). Satu was housed with a Bornean male (age 18) and a Bornean female (age 19).

Madu was reared in a computer-enriched environment at the Georgia State University Language Research Center (Washburn et al. 2007) and participated in cognitive tasks there (e.g., Beran 2002). Madu and Satu had previous training with a variety of cognitive tasks using the matching-to-sample paradigm on a computerized-touchscreen testing apparatus including matching social stimuli such as faces.

All subjects had indoor/outdoor access, extensive material enrichment (climbing structures, ropes and swings, barrels, and other toys), and were fed their usual diet consisting of primate chow, fruits, and vegetables throughout the course of the study. In addition, feeding enrichment was provided on a daily basis as part of the husbandry routine. At no time were the subjects food or water deprived. Studies involved a single ape at a time. All subjects participated voluntarily, being called in from their social groups and tested in one of the indoor dens of their living area. When possible, subjects were separated from other individuals to limit distractions (unweaned infants always accompanied their mothers). All procedures used in this research were approved by the Scientific Review Committee of Zoo Atlanta and the Institutional Animal Care and Use Committee of Georgia State University and were in accordance with the American Psychological Association’s guidelines for ethical conduct in the care and use of nonhuman animals in research.

Face Stimuli

Stimuli consisted of high-quality digital color photographs of both familiar and unfamiliar individuals. Hereafter, “unfamiliar” individuals refers to orangutans housed at various zoos and sanctuaries in the USA with whom subjects have never before interacted or seen, whereas “familiar” individuals refers to other orangutans housed at Zoo Atlanta from both within their social group and from neighboring groups. All subjects had daily visual and vocal access to each of the familiar individuals.

Using a standard graphics software package (PhotoShop CS3), photographs were cropped to only include heads and faces, making sure full flanges were visible for adult males. Backgrounds of the photographs were homogenized by filling in the area around the face with solid white. Brightness and contrast were standardized to control for differences in lighting.

Stimuli included multiple photographs of males and females of all ages displaying different head positions and gaze orientations. Two sets of photographs were compiled of the same set of individuals, one for training and one for testing. Training stimuli included 29 photographs of 14 unfamiliar conspecifics. Test stimuli included 72 photographs of 11 familiar individuals (5 to 40 years of age) and 62 photographs of 14 unfamiliar individuals (6 to 41 years of age). Presentation size of the images was 12.27 cm by 17.8 cm with a resolution of 300 dots per inch. The viewing distance of the subjects was approximately 40 cm.
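As a rough check on the stimulus geometry described above, the following sketch converts the stated presentation size to pixel dimensions at 300 dots per inch and estimates the visual angle each image dimension subtends at a viewing distance of about 40 cm. The function names are illustrative, and the visual-angle formula assumes the image is viewed head-on and centered, which the text does not state.

```python
import math

CM_PER_INCH = 2.54

def cm_to_pixels(length_cm, dpi=300):
    """Convert a printed length in centimetres to pixels at the given resolution."""
    return round(length_cm / CM_PER_INCH * dpi)

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of size_cm viewed head-on."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# Values taken from the text: 12.27 cm by 17.8 cm, 300 dpi, roughly 40 cm away.
width_px = cm_to_pixels(12.27)
height_px = cm_to_pixels(17.8)
width_deg = visual_angle_deg(12.27, 40)
height_deg = visual_angle_deg(17.8, 40)

print(f"{width_px} x {height_px} px")
print(f"{width_deg:.1f} x {height_deg:.1f} degrees of visual angle")
```

Under these assumptions, each image spans roughly 17 by 25 degrees of visual angle. Because the viewing distance was only approximate, these figures are estimates.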

Apparatus and General Procedure

We implemented a simultaneous matching-to-sample (MTS) paradigm (Nissen et al. 1948; Parr et al. 2000; Porkorny and de Waal 2009) in which subjects were presented with a sample image and matched one of two comparison images to the sample. The correct comparison image matched the sample on some predetermined stimulus dimension while the other image did not match. As not all subjects in this study were computer trained, we adapted this procedure to a manual task that allowed the experimenter to be completely blind to both the subjects’ options and the correct answer. To standardize the location of the stimuli, we presented sample stimuli on a presentation board, hereafter referred to as “sample board,” that faced the orangutan (and away from the experimenter). After the orangutan had seen the sample, the comparison stimuli were presented simultaneously on a second presentation board, the comparison board, that was placed against the sample board, underneath the presented sample (Fig. 1a). This board also faced the orangutan and away from the experimenter, who could not see any of the images and therefore could not inadvertently cue the subject (see additional details on this procedure below). The background of the presentation board was colored to specify the MTS rule or task. For example, the presentation board for the identical photo-matching task had a pink background, while the board for the different photo-matching task had a black background.

To avoid experimenter cueing, stimuli were randomized, pre-sorted, and kept upside down during all training and test sessions. The boards and the images had Velcro, so the experimenter could display the appropriate image (the one from the top of the pile) without viewing them. This way, the experimenter did not see any of the images until after the subject had made a choice and therefore did not know which images were being presented or which side the correct choice was on, and thus was blind to the correct response on every trial. For each trial, the experimenter first drew the two comparison stimuli from the pile of upside-down stimuli and placed them, still upside down, on the floor, equidistant to the sample board, one to the left of the sample board and the other to the right. At this point, neither the experimenter nor the subject could see the comparison stimuli. Then the experimenter drew the upside-down sample image and held it up to the subject until the subject oriented towards the image by touching it. The sample was then fastened with Velcro onto the center of a sample board, centered in front of the subject (the experimenter still could not see the sample). Finally, the experimenter picked up the comparison board and placed it on top of the two comparison images, attaching them by Velcro, then rotated the board such that the orangutans could simultaneously see both comparison stimuli, while the experimenter could not. The comparison board was placed at the bottom of the sample board (see Fig. 1). In this way, the subjects saw both comparison images at exactly the same time, equidistant from the sample, while the experimenter could not see any of the three images. The picture the subject pointed towards was accepted as their choice. Subjects were familiar with this pointing methodology (similar to the behavioral command “hand” which they had used in previous studies; Stoinski personal communication; see Fig. 1b). If subjects pointed ambiguously (i.e., pointed between the two comparison images, made a second choice, or simultaneously pointed at both), the experimenter removed all images (still hidden from the experimenter), gave the subject a 3 s inter-trial interval (ITI), and restarted the trial.

Fig. 1 a Photograph of the experimental setup. The large presentation board displays the sample and the smaller comparison board displays the comparison stimuli. The comparison board allowed us to present both comparison stimuli simultaneously to the orangutans, equidistant from the sample stimulus. Note that the experimenter sat behind the display board (b) and could not see either the sample or comparison stimuli at any time during the trial, and therefore did not know what the correct choice was. See text for more details

Once the subjects made their choice, the experimenter examined the images to determine whether the subjects chose the correct comparison stimulus. When the correct choice was made, the experimenter verbally rewarded the ape by saying “Good Job,” placed the correct choice next to and then over the sample, providing visual feedback, and then rewarded subjects appropriately. If an incorrect choice was made, the experimenter indicated this was not correct by saying “No”, placed the incorrect image up next to the sample, showing that it did not match, and no reward was given to the subject. For every trial, the location of the image was recorded (left or right). All test sessions were videotaped for later reliability analysis.

Training Procedure

During training, a correction procedure was employed so that following an incorrect choice subjects received an ITI of 3 s and then repeated the trial. In this manner, the trial was repeated up to four times or until the subject selected the correct response, whichever occurred first (Porkorny and de Waal 2009). Only correct choices were rewarded. Training rewards consisted of cereal, flavored pellets, and Crystal Light brand juice. Subjects were given a maximum of one training session per day of up to 40 trials or 30 min. The number and duration of training sessions were recorded. Training included three phases or tasks (shape, identical, different). Training on each of these phases continued until the subjects’ performance reached a criterion of a minimum of 80 % success rate on two consecutive sessions (consisting of 20 trials each) on two different days (as in Parr et al. 2000). This criterion was used to verify that subjects understood the task before moving on to the next, more complex, task.
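The training criterion can be expressed as a simple check over session records. The sketch below is illustrative only: the Session record and the example data are hypothetical, and it assumes that sessions with fewer than 20 trials are skipped when looking for two consecutive qualifying sessions.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Session:
    day: date        # calendar day of the session
    n_trials: int    # trials completed in the session
    n_correct: int   # trials answered correctly

def reached_criterion(sessions, min_trials=20, threshold_pct=80):
    """True if two consecutive full sessions on different days both meet the threshold."""
    # Assumption: short sessions are ignored rather than breaking the streak.
    full = [s for s in sessions if s.n_trials >= min_trials]

    def passes(s):
        # Integer comparison avoids floating-point rounding at exactly 80 %.
        return s.n_correct * 100 >= threshold_pct * s.n_trials

    return any(
        prev.day != cur.day and passes(prev) and passes(cur)
        for prev, cur in zip(full, full[1:])
    )

# Hypothetical training history for one subject.
history = [
    Session(date(2024, 1, 1), 20, 13),
    Session(date(2024, 1, 2), 12, 11),  # short session, not counted
    Session(date(2024, 1, 3), 20, 16),
    Session(date(2024, 1, 4), 20, 17),
]
print(reached_criterion(history))
```

With one session per day, the different-days requirement is met automatically; the explicit check is kept so the function also works on data where that is not guaranteed.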

Shape

Simultaneous MTS training was conducted with two-dimensional colored shapes. Subjects were trained to match the identical shape to the sample shape. Shape stimuli varied on two dimensions: shape and color. Thus, subjects could use either or both of these perceptual cues to make their choice. Subjects were trained using 62 shape stimuli or 31 matching pairs, all of which were congruent for shape and color.

Identical

Next, face stimuli were introduced. This training phase examined the ability of orangutans to match identical portraits of unfamiliar conspecifics. Subjects were trained to choose the identical comparison image that matched the sample. This phase was not designed to address whether subjects viewed these images as representations of specific individuals, but rather to provide an initial assessment of how quickly and accurately orangutans acquired the ability to discriminate complex two-dimensional face images.

Different

Subjects were then trained on the individual discrimination task to familiarize them with the task and MTS rule. Subjects were required to match two different photographs of the same individual; photos displayed different head positions and gaze orientations. This phase verified that subjects were not relying on irrelevant perceptual features of the photographs, such as symmetry or lighting, to match stimuli.

Transfer Test

Once subjects met criterion on these three phases of training, they were then transferred to all novel stimuli, that is, none of the photographs used in the transfer test had ever been used in training and so none of the stimuli had ever been seen by the subjects. In the transfer test, multiple stimulus sets were presented for each individual (i.e., each individual was presented as the sample individual more than once, but samples were never used more than once). Successful transfer to novel stimuli allows us to rule out performances based upon irrelevant factors such as memorization of the training stimuli (Thompson and Oden 2000). Thus, the transfer test was designed to evaluate how well subjects discriminated individuals without previous exposure to the stimuli, but after they had received sufficient experience with the paradigm to be sure that they understood the task. To consider how individual discrimination abilities might differ for familiar versus unfamiliar faces, during testing we used novel stimuli of both familiar and unfamiliar faces.

Subjects participated voluntarily and were called in from their social groups by the experimenter to be tested separately. Rewards were two sugar-free colored pellets. Test trials using novel photographs were randomly inserted among training trials (with no more than two consecutive test trials) for a total of 84 familiar test trials and 87 unfamiliar test trials. Each session consisted of 24 trials including six test trials (balanced between familiar and unfamiliar conditions) and 18 previously seen training trials. In order to avoid providing feedback, test trials were always rewarded. Within each trial, familiarity (familiar or unfamiliar) and sex category (flanged male, unflanged male, female) were held constant, and all individuals were juveniles or adults (see Table 1 for number of individuals and photos per category) so that performance could not be based on recognizing features specific to these parameters; only the feature in question, identity, could accurately predict the correct choice. Stimulus sets for test trials were only presented once to each subject. Subjects were never presented with images of themselves. The location (left or right) of the correct comparison stimuli was randomized; however, within a test session (and within test trials), there were an equivalent number of correct choices located on each side. Subjects received no more than one test session per day. Test sessions were given on different (and, when possible, consecutive) days.
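The session constraints described above (24 trials, six test trials split evenly between familiar and unfamiliar, no more than two consecutive test trials, and correct sides balanced both overall and within test trials) can be sketched as a small schedule generator. This is a hypothetical illustration using rejection sampling, not the procedure the authors used to pre-sort their stimuli.

```python
import random

def longest_test_run(kinds):
    """Length of the longest run of consecutive test (non-training) trials."""
    longest = run = 0
    for kind in kinds:
        run = run + 1 if kind != "training" else 0
        longest = max(longest, run)
    return longest

def balanced_sides(n, rng):
    """Return n 'left'/'right' labels, half each (n assumed even), in random order."""
    sides = ["left"] * (n // 2) + ["right"] * (n - n // 2)
    rng.shuffle(sides)
    return sides

def build_session(rng, n_training=18, n_familiar=3, n_unfamiliar=3, max_test_run=2):
    """Return a list of (trial kind, side of the correct comparison) pairs."""
    kinds = (["training"] * n_training
             + ["familiar"] * n_familiar
             + ["unfamiliar"] * n_unfamiliar)
    # Reshuffle until no more than max_test_run test trials occur in a row.
    while True:
        rng.shuffle(kinds)
        if longest_test_run(kinds) <= max_test_run:
            break
    # Balance sides separately for test and training trials, which also
    # balances them across the whole session.
    test_sides = iter(balanced_sides(n_familiar + n_unfamiliar, rng))
    train_sides = iter(balanced_sides(n_training, rng))
    return [(k, next(train_sides if k == "training" else test_sides)) for k in kinds]

session = build_session(random.Random(0))
for number, (kind, side) in enumerate(session, start=1):
    print(number, kind, side)
```

Rejection sampling keeps the code short; because test trials make up only a quarter of each session, a valid ordering is usually found within a few shuffles.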

Table 1 Number of individuals and novel photos for each category of stimuli

Responses were immediately recorded on data sheets by the experimenter and test sessions were videotaped. Random number strings consisting of six digits were used to identify each photograph (i.e., each individual photograph had a unique random number, and numbers were different for the sample and its correct match). Inter-observer reliability was later assessed to verify the experimenter’s accuracy in deciding whether or not the correct match was chosen. Forty-two percent of the data were recorded from the videotapes by a coder who was blind to the hypotheses to verify the experimenter data. We found perfect agreement for all variables examined (sample stimulus: Cohen’s κ = 1; left stimulus: Cohen’s κ = 1; right stimulus: Cohen’s κ = 1; choice: Cohen’s κ = 1).
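For readers unfamiliar with the reliability statistic, the sketch below shows how Cohen's κ is computed from two coders' labels for the same trials. The labels are invented for illustration; only the formula, κ = (observed agreement − chance agreement) / (1 − chance agreement), is standard.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of trials")
    n = len(rater_a)
    # Proportion of trials on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2
    if expected == 1.0:
        # Both raters used a single identical label throughout; kappa is
        # formally undefined, and this sketch reports it as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical choice codes from the experimenter and the video coder.
experimenter = ["left", "right", "right", "left", "right", "left"]
video_coder = ["left", "right", "right", "left", "right", "left"]
print(cohens_kappa(experimenter, video_coder))
```

Identical label sequences, as in this example, give κ = 1, which corresponds to the perfect agreement reported above.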

Data Analysis

For training phases, we reported the number and duration of training sessions needed to reach criteria on each task for each subject. For each test session, the experimenter recorded information on the subject, date, session number, random number strings for all of the images presented in stimulus sets, the location (left or right) and random number strings of the choices that were selected by the subject, and whether each trial was correct or incorrect. The primary dependent variable of interest was the response (correct/incorrect) and the independent variable was the condition (familiar/unfamiliar). Due to our small sample size, we ran a heterogeneity G-test to determine whether overall results deviated from expected proportions for familiar and unfamiliar performance (Sokal and Rohlf 1995). For each subject, we used binomial z scores to determine if a given number of test trials were significantly above chance (p < 0.05) as well as to examine proficiency on familiar versus unfamiliar trials and between sex categories (flanged male, unflanged male, female). All p values are two tailed. Finally, we calculated monotonic curves for each individual to evaluate performance over time on both familiar and unfamiliar trials (Royden and Fitzpatrick 2010). Nondecreasing monotonic curves are indicative of learning while nonincreasing monotonic curves would indicate that subjects were more likely to choose at random as the trials progressed.
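A binomial z score against a chance level of 50 % can be computed as in the following sketch. The trial counts are hypothetical, and the continuity correction is an assumption: the text does not say whether one was applied, so it is exposed as an option.

```python
import math

def binomial_z(n_correct, n_trials, p_chance=0.5, continuity=True):
    """Normal-approximation z score and two-tailed p value for a binomial count."""
    expected = n_trials * p_chance
    sd = math.sqrt(n_trials * p_chance * (1 - p_chance))
    diff = n_correct - expected
    if continuity:
        # Shrink the deviation by 0.5 toward zero (never past zero).
        diff = math.copysign(max(abs(diff) - 0.5, 0.0), diff)
    z = diff / sd
    # Two-tailed p value from the standard normal distribution.
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_tailed

# Hypothetical example: 20 correct choices out of 28 test trials.
z, p = binomial_z(20, 28)
print(f"Z = {z:.2f}, two-tailed p = {p:.3f}")
```

For small numbers of trials, an exact binomial test is a common alternative to the normal approximation and may give slightly different p values.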

Results

Training

For all training, performance criterion was set at 80 % correct on two consecutive sessions consisting of 20 trials each. However, because subjects had the opportunity to choose not to participate, some training sessions were shorter than 20 trials. These short training sessions were not counted towards criterion; however, they were included in the total number of training trials. For shape training, designed to familiarize subjects with the matching-to-sample paradigm, subjects achieved criterion in an average of 270 trials (Junior 461, Madu 120, Satu 230). On the identical photo-matching task, subjects achieved criterion in an average of 350 trials (Junior 465, Madu 334, Satu 251). Finally, in the different photo-matching task, subjects achieved criterion in an average of 230 trials (Junior 312, Madu 180, Satu 199). There was no significant difference in acquisition speed between the three training conditions (shape, identical, and different; Friedman’s test: χ² = 4.67, d.f. = 2, p = 0.097).
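The Friedman statistic can be recomputed from the trials-to-criterion counts listed above. The sketch below ranks the three training conditions within each subject and applies the standard formula without a tie correction (there are no ties in these data). The p value uses the closed-form chi-square tail for 2 degrees of freedom, so the code as written only applies to three conditions.

```python
import math

# Trials to criterion from the text, ordered as (shape, identical, different).
trials_to_criterion = {
    "Junior": (461, 465, 312),
    "Madu": (120, 334, 180),
    "Satu": (230, 251, 199),
}

def average_ranks(values):
    """Rank values from 1 (smallest), assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def friedman_chi2(rows):
    """Friedman chi-square for n subjects (rows) by k conditions (columns)."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)

chi2 = friedman_chi2(list(trials_to_criterion.values()))
p = math.exp(-chi2 / 2)  # chi-square upper tail, valid for df = 2 only
print(f"chi2 = {chi2:.2f}, df = 2, p = {p:.3f}")
```

The rank sums are 5 (shape), 9 (identical), and 4 (different), giving χ² = 4.67 and p = 0.097, in agreement with the values reported above.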

Transfer Test

Transfer to novel photographs allowed us to evaluate how well orangutans recognized individuals without previous experience with the stimuli, and to see whether or not they could extrapolate their knowledge of the task to these new photographs. Individual performances were not significantly heterogeneous (Gh(2) = 1.16, p = 0.56), so we pooled the results across individuals revealing that, overall, our subjects performed significantly above 50 % when discriminating familiar individuals (Gp(1) = 4.72, p = 0.029). On the individual level, two of the three subjects performed significantly above chance on the familiar trials indicating successful transfer (Fig. 2, Madu: Z = 2.08, p = 0.036; Satu: Z = 2.08, p = 0.036). No subject performed above chance on the unfamiliar trials (Fig. 2).
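The pooled and heterogeneity G statistics reported here follow the general scheme of a replicated goodness-of-fit test: a G statistic is computed for each subject against the 50 % chance expectation, the heterogeneity G is the sum of the individual statistics minus the pooled statistic, and each is compared with a chi-square distribution. The sketch below illustrates this with hypothetical counts. It is not a reanalysis of the study data, and the chi-square tail helper only covers 1 or 2 degrees of freedom.

```python
import math

def g_statistic(n_correct, n_trials, p_chance=0.5):
    """G goodness-of-fit statistic for correct/incorrect counts versus chance."""
    observed = [n_correct, n_trials - n_correct]
    expected = [n_trials * p_chance, n_trials * (1 - p_chance)]
    # Cells with zero observations contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def chi2_sf(x, df):
    """Upper-tail chi-square probability (closed forms for df = 1 and df = 2)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")

def heterogeneity_g_test(counts):
    """counts: list of (n_correct, n_trials) pairs, one per subject."""
    g_total = sum(g_statistic(c, n) for c, n in counts)
    g_pooled = g_statistic(sum(c for c, _ in counts), sum(n for _, n in counts))
    g_het = max(g_total - g_pooled, 0.0)  # guard against rounding below zero
    df_het = len(counts) - 1
    return {
        "G_h": g_het, "df_h": df_het, "p_h": chi2_sf(g_het, df_het),
        "G_p": g_pooled, "df_p": 1, "p_p": chi2_sf(g_pooled, 1),
    }

# Hypothetical counts for three subjects: (correct, total) on familiar trials.
hypothetical_counts = [(20, 28), (17, 28), (20, 28)]
result = heterogeneity_g_test(hypothetical_counts)
print(result)
```

A non-significant heterogeneity G suggests that the subjects' proportions do not differ detectably from one another, which is the condition under which pooling across individuals is usually considered reasonable.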

Fig. 2 Individual performance: percentage of correct responses on the transfer test for each individual for both familiar (dark grey bars) and unfamiliar conditions (light grey bars). Madu and Satu performed significantly above chance level (50 %), indicated by the dashed line; *p < 0.05

Because we were interested in examining spontaneous categorization of both familiar and unfamiliar individuals, we also analyzed the first presentation of each individual as a sample stimulus (although each photograph was presented only once, multiple photographs of each individual were presented). Again, individual performances by the orangutans were not significantly heterogeneous (Gh(2) = 4.23, p = 0.12), and the stronger pooled analysis shows that, as a group, subjects discriminated faces on familiar trials at a rate significantly higher than expected by chance (Gp(1) = 17.41, p < 0.0001). On the individual level, only Madu performed significantly above chance when discriminating familiar individuals on the first presentation (Fig. 3, Z = 2.85, p = 0.002), and again, she did not do so for unfamiliar individuals (Z = 0.00, p = 1.00; see Fig. 3 for the performance of the other subjects).

Fig. 3 Initial presentation: percentage of correct responses on the initial presentation of each sample individual for both familiar (dark grey bars) and unfamiliar conditions (light grey bars). Madu performed significantly above chance level (50 %), indicated by the dashed line; **p < 0.01

One potential issue with looking at overall task performance is that we expected individuals to increase their accuracy throughout the test. Thus, in addition to overall performance, we assessed whether individuals showed evidence of learning in each condition (familiar and unfamiliar) throughout the testing period by plotting monotonic curves for each individual (Fig. 4). To do this, a cumulative performance index was calculated by taking the number of correct choices minus the number of incorrect choices across trial blocks (the total number of trials divided into four blocks; Glicksohn et al. 2007; Proctor et al. 2014). In order to classify learning, the cumulative performance index had to result in a monotonic function (Fig. 4). Monotonic functions are those that are either entirely nonincreasing or entirely nondecreasing (i.e., the first derivative of the function does not change sign; Royden and Fitzpatrick 2010). Nondecreasing monotonic functions are indicative of learning. Madu and Satu displayed nondecreasing monotonic functions for familiar individuals, whereas Junior did not (the slope of Junior’s function changed direction; Fig. 4). In contrast, no subject displayed a monotonic function for unfamiliar individuals. Thus, Madu and Satu showed evidence of learning to discriminate familiar individuals throughout the testing period, whereas no subject showed evidence of learning to discriminate unfamiliar individuals.
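One way to read the cumulative performance index is sketched below: trials are split into four consecutive blocks, each block is scored as correct minus incorrect choices, the block scores are accumulated, and the resulting curve is classified by the sign of its successive differences. This is an interpretation of the description above rather than the authors' exact computation, and the outcome sequences are hypothetical.

```python
from itertools import accumulate

def split_into_blocks(outcomes, n_blocks=4):
    """Split a trial sequence into n_blocks consecutive, near-equal blocks."""
    base, extra = divmod(len(outcomes), n_blocks)
    blocks, start = [], 0
    for i in range(n_blocks):
        size = base + (1 if i < extra else 0)
        blocks.append(outcomes[start:start + size])
        start += size
    return blocks

def cumulative_performance_index(outcomes, n_blocks=4):
    """Running total of (correct - incorrect) over consecutive trial blocks."""
    per_block = [sum(1 if correct else -1 for correct in block)
                 for block in split_into_blocks(outcomes, n_blocks)]
    return list(accumulate(per_block))

def classify_curve(curve):
    """Label a curve by whether its successive differences ever change sign."""
    diffs = [b - a for a, b in zip(curve, curve[1:])]
    if all(d >= 0 for d in diffs):
        return "nondecreasing"  # a completely flat curve also lands here
    if all(d <= 0 for d in diffs):
        return "nonincreasing"
    return "non-monotonic"

# Hypothetical outcome sequences (True = correct choice), eight trials each.
steady = [True, True, True, False, True, True, True, True]
erratic = [True, True, False, False, True, True, False, False]
for name, outcomes in [("steady", steady), ("erratic", erratic)]:
    curve = cumulative_performance_index(outcomes)
    print(name, curve, classify_curve(curve))
```

Under this reading, a curve is nondecreasing exactly when no block contains more incorrect than correct choices, so the classification is sensitive to where the block boundaries fall.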

Fig. 4 Performance over time: Monotonic curves for each individual: a Madu, b Junior, and c Satu. Madu and Satu displayed monotonically increasing functions on trials depicting familiar individuals (dark grey line), indicative of learning

Finally, it is possible that sex and secondary sexual characteristics may be an important social factor which influences facial discriminations; therefore, we examined performance across three different sex categories (flanged males, unflanged males, and females) of the stimulus sets to explore any differences in discrimination performance. Although our sample size is very small, we did find that one individual, Madu, performed significantly above chance, discriminating unflanged males in particular (binomial tests: Z = 2.29, p = 0.019).

Discussion

Orangutans are one of the first non-gregarious species to be tested on this type of socio-cognitive task, but like other primates that spend more time in social groups, they are better able to discriminate familiar, compared to unfamiliar, individuals from photographs (Homo sapiens: Hill and Bruce 1996; P. troglodytes: Parr et al. 2011; but see Porkorny and de Waal 2009 on C. apella). These results support previous findings that orangutans are sensitive to the social information present in two-dimensional images (Vonk and Hamilton 2014), which, at least for familiar individuals, allows them to discriminate members of their own species. If subjects were simply using feature matching to make these discriminations, one would expect them to perform equally well on both familiar and unfamiliar photographs. Providing further support that the orangutans were discriminating faces, one of the orangutans (Madu) performed above chance on trials depicting familiar individuals on the very first presentation of the individuals. Thus, the evidence indicates that at least some orangutans can immediately and spontaneously match conspecific faces when learned associations between the photographs are not possible.

One interesting possibility is that experience with cognitive testing leads to improved performance on this type of task. While our sample was too small to test this further, it is worth noting that our most successful subject, Madu, who distinguished familiar individuals on the first presentation, had more extensive testing history with the matching-to-sample paradigm than the other orangutans. We know that experience influences responses in cognitive tasks. For instance, chimpanzees with substantially greater testing experience were more likely to find the optimal outcome in an economic game than were chimpanzees with little previous history of cognitive testing (e.g., in the Assurance game; Brosnan et al. 2011). Likewise, on individual discrimination tasks, chimpanzees with an extensive testing history with MTS tasks were able to quickly perform at above chance levels (Parr et al. 2000) and orangutans that had prior experience with the testing paradigm outperformed naive individuals (Vonk and Hamilton 2014). These results highlight the importance of verifying that subjects have enough experience with the paradigm before concluding that a species cannot perform a task.

Aside from experience, it is also possible that exposure to many conspecific faces during critical developmental periods may aid recognition. For instance, Sugita (2008) found that after 6 months of face deprivation, monkeys first exposed to either human faces or conspecific faces demonstrated better discrimination of, and greater viewing preference for, the species to which they were first exposed. All of our subjects were from a zoo and had been in social housing with numerous other conspecifics for their entire lives. If the social environment in which captive animals are housed intrinsically influences results on a task of this nature, different results might be expected in animals with less social experience. Therefore, future studies should consider the influence of both social history and expertise, especially during early development, when possible.

Although rarely considered in studies of face recognition, sex and secondary sexual characteristics may be important social factors that influence face discrimination. In humans, for example, male faces are more distinctive than female faces, partly due to the pronounced nose/brow and chin/jaw area (Bruce et al. 1993). Moreover, in one study, rhesus monkeys performed better at matching two different pictures of male monkeys than of female monkeys (Parr et al. 2010). Thus, while this was not a specific aim of our study, we examined performance as a function of sex, specifically across three categories of stimuli: flanged males, unflanged males, and females. Although all subjects performed above chance on all of these categories, we found that one individual, Madu, performed particularly well when discriminating unflanged males as compared to the other sex categories. Given our small sample size, we do not think this result can be generalized. However, it highlights the need to consider sex as a variable in future studies to better evaluate the role this social factor may play in face recognition.

The most parsimonious explanation for the finding that orangutans are better able to discriminate familiar than unfamiliar faces is that orangutans and the African apes, who are more gregarious, evolved from a common ancestor who could discriminate faces. This is supported by the evidence for face discrimination in other, more distantly related, primates. The fact that orangutans have not lost this ability suggests two possibilities. Although our study cannot discriminate between the two, it is useful to discuss them. First, if the ability to discriminate faces is the result of a homology, it may be a holdover from the common ancestor of the apes, as there is no obvious selective pressure against facial recognition. Second, this ability may still be important to orangutans, who, despite being relatively more solitary than the other apes, do form small social groups when food constraints do not exist (e.g., during mast fruiting and in captivity; MacKinnon 1974; Rijksen 1978; Singleton et al. 2009) and, at some sites, currently maintain some form of social unit (Morrogh-Bernard et al. 2009; van Schaik and van Hooff 1996). Moreover, orangutans living in zoos easily form social groups, suggesting they encountered conspecifics more frequently in their evolutionary history (Meijaard et al. 2010), including very recently (e.g., within the last few hundred years, prior to extensive deforestation of Indonesia). Thus, while these findings could be “left over” from a common ancestor, it may also be that it is advantageous for orangutans to recognize other orangutans in their vicinity. Experience with or exposure to those individuals may enable the formation of more robust representations of these individuals as they view them across a broad range of conditions and viewpoints in their everyday encounters. Future comparative work examining the social factors and cognitive processes underlying face recognition in nonhuman primates may help us better elucidate the evolution of face processing skills.