Introduction

The visual search task, in which a human observer searches for a target among numerous distracters, is one of the best methods for distinguishing two different ways of searching for and detecting an object. If the target contains a feature that attracts attention, the observer searches quickly, in an efficient, parallel manner, for a target that “pops out”; otherwise, the observer searches slowly, in an inefficient, serial manner (Treisman and Gelade 1980; Wolfe 2001). Numerous studies have reported that observers can efficiently detect emotional facial expressions in a crowd of other faces (Becker et al. 2011). Additionally, there have been studies in which subjects efficiently searched for their own faces (Tong and Nakayama 1999) or visually familiar faces (Hansen and Hansen 1988; Levin 1996; Montoute and Tiberghien 2001). Clearly, at least certain limited types of faces can be searched for efficiently among others in a visual search task.

In most of these studies, both the target and the distracters had similar characteristics. By using a visual search task with realistic and varied distracters, an appreciable face “pop-out” effect has been observed. Hershler and Hochstein (2005) demonstrated that a real human face pops out from among varied distracter objects even as the number of simultaneously presented stimuli increases, and that this “pop-out” effect disappears when a car, a house, or an animal face is presented as the target object.

There have been only a few studies of visual search for faces in non-human primates. We recently tested visual search for faces in Japanese monkeys (Nakata et al. 2014) and showed that the monkeys searched efficiently for upright face-like stimuli among inverted face-like stimuli, but not for inverted face-like stimuli among upright ones. In that study, the face-like stimuli consisted of circles arranged in an approximation of a face, and therefore we could not determine whether the monkeys indeed recognized the face-like-configured drawings (circles) as faces in a cognitive sense. Humans certainly recognize such schematic drawings as faces, but whether monkeys do remains unsettled. Visual search for a realistic face among non-face objects, corresponding to an actual scene, is critical for examining this possibility in monkeys. If monkeys do recognize a schematic face as a member of the facial category, a task using naturalistic stimuli may produce an even clearer pop-out effect. So far, however, only a very limited number of visual search studies using realistic faces and varied distracters have been conducted with non-human primates. One of the few examples is a series of chimpanzee studies (Tomonaga and Imura 2015). In that series, chimpanzees detected conspecific faces more efficiently than they detected cars and houses. Efficient detection was also shown for adult human faces, with which the chimpanzees had rich visual experience. Shared underlying mechanisms of facial recognition may explain this similarity in processing between humans and chimpanzees.

Given the evidence that Old World monkeys (e.g., baboons and macaques) demonstrate various facial recognition abilities (Parr and Hecht 2011) and that our macaque study (Nakata et al. 2014) showed a pop-out effect during the search for a schematic face, it is possible that the facial recognition abilities of monkeys parallel those of apes. In addition, it is important to examine visual search in monkeys under more natural conditions by allowing them to view the objects freely. Macaque studies exploring the efficiency of searching for faces under a free viewing condition (overt search) may provide greater understanding of face processing in a search task with natural faces and natural objects.

In the present study, we examined efficient search for a face by Japanese macaques (Macaca fuscata) and humans under a free viewing condition. The primary purpose of the study was to determine the factors involved in the efficient detection of a face among non-face objects by Japanese macaques and humans. The study’s secondary purpose was to explore the extent and the limitations of the facial recognition efficiency of both species. In Experiment 1, we examined the effect of face category on efficient detection by using several kinds of natural faces (e.g., conspecific faces and faces of other species). We hypothesized that each species would search more efficiently for faces of their own and closely related species and less efficiently for other-species faces that are clearly different from conspecific faces, given that search efficiency for faces is likely to be based on the subjects’ familiarity with conspecific faces and/or the social relevance of conspecific faces. In Experiment 2, we further examined which facial information is important for the pop-out effect by modulating the faces’ spatial frequency components.

Methods

Animals

Two Japanese monkeys (M. fuscata; one male, one female; ages 4–5 years; weight 4.0–7.0 kg) were used for the experiments. They were housed singly, one per cage, but were allowed visual and auditory communication with Japanese monkeys in other cages. Their few human caretakers wore facemasks and protective clothing to limit the animals’ experience of whole human faces.

Human participants

The study’s human participants were 8 university students (2 females, 6 males; ages 19–25 years, average 19.8; all Japanese) with normal or corrected-to-normal vision.

Stimuli

Target stimuli were 35 colored or gray-scale facial images chosen from the PrimFace database (http://visiome.neuroinf.jp/primface) or our own face set. One set of 28 faces was always presented without filtering; the remaining 7 faces were presented with spatial frequency filtering. A wide variety of non-face objects from our own stimulus set (480 photographs) served as distracter stimuli after standardization of their dimensions (for an example, see Fig. 1). Some of the non-face objects were human-made, namely buildings, clothing, electric appliances, furniture, kitchen goods, miscellaneous goods, stationery goods, tools, vehicles, and food products. Others were natural objects, namely trees, flowers, rocks, vegetables, fruits, and natural scenes (Fig. S1 and Table S1). The images were adjusted to the same standard size (70 × 70 pixels, approximately 2.0° of visual angle). Image processing was carried out using Adobe Photoshop CS2.
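The standardization step is simple to script. The following minimal sketch is an illustration only, not the authors’ actual workflow (which used Adobe Photoshop CS2): it resizes a photograph to the 70 × 70 pixel format and checks the resulting visual angle, where the Pillow library, the pixel pitch, and the viewing distance are assumed values, since the viewing geometry is not specified here.

```python
# Illustrative sketch only: the study used Adobe Photoshop CS2 for this step.
# Pillow, the pixel pitch, and the viewing distance below are assumed values.
import math
from PIL import Image

STIM_PX = 70            # 70 x 70 pixels, as stated in the Methods
PIXEL_PITCH_CM = 0.03   # assumed physical size of one pixel on the monitor
VIEW_DIST_CM = 60.0     # assumed viewing distance

def standardize(path_in, path_out):
    """Resize a stimulus photograph to the common 70 x 70 pixel format."""
    Image.open(path_in).resize((STIM_PX, STIM_PX)).save(path_out)

def visual_angle_deg(size_px):
    """Visual angle subtended by size_px pixels under the assumed geometry."""
    size_cm = size_px * PIXEL_PITCH_CM
    return 2 * math.degrees(math.atan(size_cm / (2 * VIEW_DIST_CM)))

print(f"{visual_angle_deg(STIM_PX):.1f} deg")  # ~2.0 deg with these assumed values
```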

Fig. 1

a An example of a stimulus presentation. The screen positions of each stimulus were selected randomly from 20 positions. In this example, the target face was positioned in the leftmost column of the 5th row of the stimulus array; 19 non-face distracters were presented at the 19 other positions. b Examples of face stimuli presented in the test trials of each generalization phase. (a, b) A Japanese face and a Japanese monkey face, respectively, used in Experiment 1. (c, d) A high-pass filtered human face for human participants and a low-pass filtered monkey face for monkey subjects, used in Experiment 2. Photographs of human faces and objects were prepared by R. Nakata. Photographs of monkey faces are from the PrimFace database: http://visiome.neuroinf.jp/primface

Apparatus

The monkeys were head-restrained and seated in a primate chair during the experiment. To record the monkeys’ eye movements, a scleral search coil was implanted in one eye. A real-time experimental control system, TEMPO (Reflective Computing, USA), running on a dedicated personal computer, was used to generate the stimuli, control the behavioral task events, and record eye movements with a time resolution of 4 ms. For humans, a keypad was used to register responses, and E-Prime® (Psychology Software Tools, Inc., USA) generated the stimuli, controlled the behavioral task events, and recorded the data.

Procedure

Behavioral task

In both experiments, the subjects searched for a face (the target stimulus) among non-face objects (distracter stimuli). During the experiments, the subjects could view the stimuli freely (overt search).

For the monkeys, each trial began with the presentation of a circle at the center of the CRT monitor. After the monkey had fixated the circle for an average of 2000 ms (range: 1500–2500 ms), the target and distracters were presented simultaneously. The distracters in each trial were selected randomly from all distracter stimuli, regardless of category. The screen position of each stimulus was selected randomly from 20 possible positions. If the monkey fixated the target for more than 1000 ms (a correct trial), a reinforcement (0.2 ml of apple juice) was delivered. Reaction time was defined as the interval between the onset of the search stimulus array and the start of the correct fixation on the target stimulus. After delivery of the reward, there was an intertrial interval (ITI) of approximately 10 s on average. If, instead, the monkey fixated any one of the distracter stimuli for more than 1000 ms (an error trial), the screen blackened and the ITI began. Error trials were not rewarded.
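As a schematic illustration of this trial logic (the experiment itself was controlled by the TEMPO system, not by the code below), the following Python sketch scores a single trial. The gaze-sampling, display, and reward calls are hypothetical stand-ins passed in by the caller; the timing constants come from the description above.

```python
# Schematic sketch of the monkeys' trial logic; NOT the TEMPO control code actually used.
# get_fixated_item, give_reward, show_array, and blank_screen are hypothetical stand-ins.
import random
import time

FIX_HOLD_MS = (1500, 2500)   # required central fixation, mean 2000 ms
TARGET_HOLD_MS = 1000        # fixating the target this long -> correct trial
DISTRACTER_HOLD_MS = 1000    # fixating a distracter this long -> error trial

def run_trial(get_fixated_item, give_reward, show_array, blank_screen):
    """Run one trial and return ('correct', reaction_time_ms) or ('error', None)."""
    show_array(fixation_only=True)
    time.sleep(random.uniform(*FIX_HOLD_MS) / 1000.0)  # central fixation held (simplified)
    t_onset = time.monotonic()
    show_array(fixation_only=False)                    # target and distracters appear together

    current, since = None, None
    while True:
        item = get_fixated_item()                      # 'target', 'distracter', or None
        now = time.monotonic()
        if item != current:                            # gaze moved to a different item
            current, since = item, now
        held_ms = (now - since) * 1000 if current else 0
        if current == "target" and held_ms >= TARGET_HOLD_MS:
            rt_ms = (since - t_onset) * 1000           # RT: array onset -> start of correct fixation
            give_reward(0.2)                           # 0.2 ml of apple juice
            return ("correct", rt_ms)
        if current == "distracter" and held_ms >= DISTRACTER_HOLD_MS:
            blank_screen()                             # error: screen blackens, ITI begins unrewarded
            return ("error", None)
        time.sleep(0.004)                              # ~4-ms sampling, matching the recording resolution

# Toy usage with stand-in hardware (gaze locked on the target):
print(run_trial(lambda: "target", lambda ml: None, lambda **kw: None, lambda: None))
```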

The humans’ behavioral task sequence was identical to the monkeys’ in terms of the presentation of the fixation point and the search stimulus array, but for the behavioral response the human participants used a keypad to indicate when and where the target was presented. They were required to press any key as soon as possible when they found the target. After that, four numbered squares were presented on the display, with screen positions corresponding to the four quadrants: upper left, upper right, lower left, and lower right. The participants then identified the quadrant in which the target had appeared by pressing the corresponding key. Accordingly, reaction time was defined as the interval between the onset of the search stimulus array and the first keypad response.
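For the human task, scoring reduces to taking the first keypress latency as the reaction time and then checking the quadrant choice. A minimal sketch follows (illustrative only; the experiment itself was run in E-Prime, and the key-to-quadrant mapping below is an assumption).

```python
# Illustrative scoring sketch for the human task; NOT the E-Prime script actually used.
# The key-to-quadrant mapping is an assumed example.
KEY_TO_QUADRANT = {"1": "upper_left", "2": "upper_right",
                   "3": "lower_left", "4": "lower_right"}

def score_trial(array_onset_ms, first_keypress_ms, quadrant_key, target_quadrant):
    """RT = array onset -> first keypress; accuracy = chosen vs. actual target quadrant."""
    rt_ms = first_keypress_ms - array_onset_ms
    correct = KEY_TO_QUADRANT.get(quadrant_key) == target_quadrant
    return rt_ms, correct

print(score_trial(0.0, 812.5, "2", "upper_right"))  # -> (812.5, True)
```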

Experimental conditions

In Experiment 1, we used 4 types of target stimuli, with 7 stimuli per type, corresponding to the following 4 conditions: (1) a frontal view of the face of a member of the subject’s own species or race (Japanese monkey faces for the Japanese monkey subjects and Japanese faces for the human participants, all of whom were Japanese), (2) a back-of-the-head view of a member of the subject’s own species (backs of the head), (3) a frontal view of the face of a member of a species or race closely related to the subject’s (rhesus monkey faces for the monkey subjects and Caucasian faces for the human participants), and (4) a frontal view of the face of a member of a clearly different species (human faces, i.e., Japanese and Caucasian faces, for the monkeys; macaque faces, i.e., Japanese monkey and rhesus monkey faces, for the humans).

In Experiment 2, the target stimuli were 7 frontal face images of different individuals of the subject’s own species. We set 4 experimental conditions by modulating the spatial frequency content of each face, using either of two high-pass cutoff filters, (1) > 16.8 cycles/face width (c/fw) (HPF-16.8 faces) or (2) > 12.6 c/fw (HPF-12.6 faces), or either of two low-pass cutoffs, (3) < 8.3 c/fw (LPF-8.3 faces) or (4) < 4.8 c/fw (LPF-4.8 faces). Average stimulus luminance did not differ between image types.
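The filtering described above can be expressed compactly in the Fourier domain. The sketch below is a minimal numpy illustration, not the authors’ actual stimulus-preparation code; it assumes a square grayscale image in which the face spans roughly the full image width, so that cycles per image approximate cycles per face width, and it keeps the DC component and restores the mean so that average luminance does not differ between image types, as stated above.

```python
# Minimal illustration of spatial-frequency filtering in cycles per face width (c/fw).
# Assumes a square grayscale face image whose width ~ face width; not the authors' code.
import numpy as np

def sf_filter(img, cutoff_cfw, mode="low"):
    """Low- or high-pass filter an image at cutoff_cfw cycles per face width."""
    h, w = img.shape
    fy = np.fft.fftfreq(h) * h                         # vertical frequency in cycles per image
    fx = np.fft.fftfreq(w) * w                         # horizontal frequency in cycles per image
    radius = np.sqrt(fx[None, :] ** 2 + fy[:, None] ** 2)
    if mode == "low":
        mask = radius <= cutoff_cfw                    # e.g. LPF-8.3: keep < 8.3 c/fw
    else:
        mask = (radius >= cutoff_cfw) | (radius == 0)  # e.g. HPF-16.8; keep DC (mean luminance)
    out = np.real(np.fft.ifft2(np.fft.fft2(img) * mask))
    out += img.mean() - out.mean()                     # equate average luminance across image types
    return np.clip(out, 0, 255)

face = np.random.default_rng(0).uniform(0, 255, (70, 70))  # stand-in for a 70 x 70 face image
lpf_8_3 = sf_filter(face, 8.3, mode="low")
hpf_16_8 = sf_filter(face, 16.8, mode="high")
```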

Experimental blocks and sessions

For the monkeys, the present study consisted of 2 consecutive blocks corresponding to Experiments 1 and 2. In each block, we first trained the monkeys to search for a target among 3 distracters under the 4 conditions described above under (1)–(4) in Experimental conditions. In each condition, we trained the monkeys with 3 target stimuli, one target per trial. The monkeys performed 480 trials per session (4 conditions × 3 target stimuli × 20 possible screen positions × 2 repetitions), with 1 or 2 sessions per day. This training phase continued until the monkey achieved 99% correct performance in 2 consecutive sessions. Once the monkey reached this criterion, we introduced a test phase: in addition to the same number of distracters as in the training phase (i.e., 3), arrays of 7, 11, and 19 distracters were used to test the effect of the number of distracters (array sizes of 8, 12, and 20). The test phase was identical to the immediately preceding training phase except for the target stimuli: some targets were new, although they were of the same types as those used in training. A block therefore used 28 target stimuli (12 used in both the training and test phases and 16 used only in the test phase). A test session consisted of 528 trials (480 duplicating the training phase and 48 test trials: 3 array sizes × 4 conditions × 4 target stimuli). The test phase continued for 8 sessions, regardless of the behavioral results of the particular animal. After the 8th session of the test phase in the first block (Experiment 1), the first training session of the next block (Experiment 2) began. In the test trials, the positions of the target faces were limited to 8 of the 20 positions. Across the 8 sessions of the test phase, each target appeared once at each position; therefore, the average eccentricity of each target in the test trials was equalized.
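The session sizes quoted above follow directly from the factorial design; a quick arithmetic check:

```python
# Quick check of the session arithmetic described above.
training_trials = 4 * 3 * 20 * 2      # conditions x training targets x positions x repetitions
assert training_trials == 480
test_trials = 3 * 4 * 4               # added array sizes x conditions x test targets
assert training_trials + test_trials == 528
assert 12 + 16 == 28                  # targets reused from training plus test-only targets
```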

Humans participated only in the test phases. There were 384 trials: 240 with an array size of 4 and 144 with the remaining array sizes. The participants completed Experiment 1 and Experiment 2 consecutively on 1 day, with a short break between the two experiments.

Data analysis

We collected reaction time data for each subject and calculated the means for each condition. We analyzed the data with analyses of variance (ANOVAs). For the macaques, we used a two-way ANOVA with the factors of array size, i.e., the number of stimuli (target plus distracters: 8, 12, 20), and the variation in the target object (the four conditions in each experiment), using each session of the two macaques (eight sessions each) as a statistical subject; the variation across sessions served to estimate the error variance. For the humans, a two-way ANOVA was used with the factors of array size and the variation in the target object. Previous studies have used the slope of reaction time as a function of an increasing number of distracter items as a measure of search efficiency (Wolfe 2001); we therefore further analyzed the slope of the mean reaction time as a function of array size (ms/item) for each data set, using linear regression.
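A minimal analysis sketch in Python is given below. It is not the authors’ analysis code: the column names, the repeated-measures ANOVA via statsmodels’ AnovaRM, and the synthetic reaction times exist only to make the example self-contained. The slope computation mirrors the linear regression of mean reaction time on array size (ms/item) described above.

```python
# Illustrative analysis sketch (not the authors' code): two-way ANOVA plus search slopes.
# Column names and the synthetic reaction times below are assumptions for the example.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
subjects = [f"s{i}" for i in range(8)]            # e.g. 8 human participants, or 8 monkey sessions
conditions = ["own", "back_of_head", "related", "other"]
array_sizes = [8, 12, 20]

rows = []
for s in subjects:
    for c in conditions:
        for n in array_sizes:
            # synthetic mean RT per cell, purely so the sketch runs end to end
            rt = 400 + (40 * n if c in ("back_of_head", "other") else 2 * n) + rng.normal(0, 20)
            rows.append({"subject": s, "condition": c, "array_size": n, "rt": rt})
df = pd.DataFrame(rows)

# Two-way repeated-measures ANOVA: array size x target-object condition
print(AnovaRM(df, depvar="rt", subject="subject",
              within=["array_size", "condition"]).fit())

# Search slope (ms/item): linear regression of mean RT on array size, per condition
for c, sub in df.groupby("condition"):
    means = sub.groupby("array_size")["rt"].mean()
    slope, _ = np.polyfit(means.index.to_numpy(), means.to_numpy(), 1)
    print(f"{c}: slope = {slope:.2f} ms/item")
```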

Experiment 1

Results

The macaques

The two monkeys required 17 and 30 sessions, respectively, to satisfy the 99% criterion in the training phase. The mean correct-response rate in the test phase was 99.48% (SE = 0.19), and there were no significant differences in the mean correct-response rate between the 4 experimental conditions. There were also no significant differences in the mean reaction times between the conditions with the array size of 4 in this phase (Fig. S2).

We compared the mean reaction times for the three increased array sizes using new targets in the test phase. Figure 2a shows the mean reaction times for these conditions. The ANOVA revealed significant main effects of the array size (F(2,30) = 10.43, P < 0.001, η² = 0.06) and the variations in target objects (F(3,45) = 3.25, P < 0.05, η² = 0.05). Notably, a significant interaction between these two factors (F(6,90) = 2.70, P < 0.05, η² = 0.07) was also revealed. With respect to the array size, a simple main effect was confirmed for human faces (F(2,30) = 10.89, P < 0.001, η² = 0.66) and backs of the head (F(2,30) = 5.37, P < 0.01, η² = 0.33), but not for the faces of the two species of monkey (Japanese macaque and rhesus macaque).

Fig. 2

Mean reaction times measured in Experiment 1 for the monkey (a) and human (b) subjects. Data are shown for the three array sizes (8, 12, 20) and the four target face conditions. The numbers to the right are the slopes of the mean reaction time as a function of array size. Error bars correspond to standard errors

The analysis of the slopes of the mean reaction time as a function of the array size (Fig. 2a) showed flatter gradients for the Japanese macaque faces (− 1.68 ms per item) and rhesus macaque faces (− 0.09 ms per item) compared to those for the backs of the head (38.12 ms per item) and the human faces (54.66 ms per item).

The humans

The mean correct-response rate in the test phase was 99.71% (SE = 0.14), and there were no significant differences in this rate between the 4 experimental conditions. There were also no significant differences in the mean reaction times between the conditions with the array size of 4 in this phase (Fig. S1).

Figure 2b shows the mean reaction times for the three increased array sizes. The ANOVA revealed significant main effects of the array size (F(2,14) = 8.97, P < 0.01, η² = 0.09) and the variations in target objects (F(3,21) = 34.29, P < 0.001, η² = 0.43), and a significant interaction between these two factors (F(6,42) = 2.67, P < 0.05, η² = 0.08). With respect to the array size, a simple main effect was confirmed for macaque faces (F(2,30) = 7.84, P < 0.001, η² = 0.45) and backs of the head (F(2,30) = 9.24, P < 0.001, η² = 0.53), but not for the human faces (both Japanese and Caucasian faces).

The analysis of the slopes of the mean reaction time as a function of the array size (Fig. 2b) showed flatter gradients for the Japanese faces (− 0.61 ms per item) and Caucasian faces (7.54 ms per item) compared to those for the backs of the head (43.70 ms per item) and the macaque faces (39.82 ms per item).

Discussion

We tested the Japanese macaques’ and human participants’ face-searching ability by administering similar tasks. The monkeys searched efficiently for realistic faces among varied distracters. These results replicate, in a more natural setting, our previous finding (Nakata et al. 2014) that monkeys searched efficiently for a face-like drawing among non-varied distracters. No significant increase in reaction time with increasing array size was observed when the Japanese macaques searched for frontal faces of their own genus or when the humans searched for frontal faces of their own species. It thus seems reasonable to conclude that not only humans and chimpanzees but also Japanese macaques detect the frontal-view faces of conspecifics efficiently. The range of distracter objects presented in this experiment was exceedingly large, and the monkeys’ visual experience with some of these distracters might differ from that of the human participants. In other words, the monkeys might not have encountered most of the human-made objects in their lives, whereas the human participants presumably had sufficient experience with all of the presented objects. Nevertheless, the pattern of results was almost the same for humans and monkeys. Therefore, differences in daily experience with some of the distracter objects did not appear to affect the present results.

The faces used here, i.e., both types of human faces (Japanese and Caucasian) and both types of macaque faces (Japanese macaque and rhesus macaque), share several facial features in terms of function and shape, and the arrangement of these facial features is largely common. However, these shared properties, which support sensitivity to the first-order spatial configuration of facial features (Diamond and Carey 1986) that marks a stimulus as “a face,” were less effective for the efficient detection of the faces. In contrast, the facial information used to infer “a species” was effective for detection. The macaques in this experiment detected the macaque faces more efficiently than they detected the human faces, and the human participants detected human faces more efficiently than macaque faces. These results concur with the results of Experiment 4 in the above-mentioned chimpanzee study (Tomonaga and Imura 2015), in which the chimpanzees did not efficiently detect the faces of Japanese macaques.

The advantage in recognizing congeneric faces for monkeys, like that for conspecific faces in humans, has been referred to as the own-species or other-species effect (OSE) in behavioral studies (Campbell et al. 1997; Pascalis and Bachevalier 1998; Dufour et al. 2006; Scott and Fava 2013) and neuroscience studies (de Haan et al. 2002; Scott et al. 2005; Sigala et al. 2011). Our present findings suggest a robust OSE on the efficient detection of faces. We observed that efficient detection is likely to occur not for every face stimulus sharing the first-order spatial configuration, but for congeneric face stimuli, which may correspond to a basic, entry-level classification (Tarr and Cheng 2003) in the description of faces.

Numerous studies indicate that faces of a human subject’s own race are recognized or discriminated better than faces of other races, a phenomenon known as the other-race effect (ORE) in humans (Hugenberg et al. 2010; Anzures et al. 2013; O’Toole and Natu 2013), which is analogous to the OSE. Japanese macaques clearly discriminated pictures of their own species from those of closely related species (Fujita 1987), indicating that Japanese macaques scan faces of other macaque species in a manner similar to that observed in humans who are subject to the ORE.

In the present study, in contrast to the ORE confirmed in previous studies, the human participants (who were Japanese) were able to efficiently detect faces of another race (Caucasian) among non-face objects. The human participants had likely seen many Caucasian faces in real life or on the internet, although Japanese people generally have less experience with Caucasian faces than with Japanese faces. It should be noted that Japanese faces were detected only slightly faster than Caucasian faces. This may indicate that the Japanese participants processed Caucasian faces as having only very subtle differences in the configuration of facial features compared with faces of their own ethnicity. The Japanese macaques likewise efficiently detected the faces of another macaque species, the rhesus faces. The present results suggest that, in contrast to the OSE, the ORE has little impact on the efficient detection of faces at the basic level of classification. Thus, the subtle information that is useful for the discrimination of race is likely to be ignored when searching for a face among non-face objects, and each face would be processed similarly, as a face.

Stronger holistic or configural processing underlies the own-race advantage in face memory, in humans (Tanaka et al. 2004; Michel et al. 2006) and in chimpanzees (Tomonaga and Imura 2015). In addition, studies with human subjects obtained convincing evidence of the importance of high-level, holistic face characteristics (Maurer et al. 2002; Burke and Sulikowski 2013) for the efficient detection of conspecific faces (Hershler and Hochstein 2005). Considering that Old World monkeys use a holistic or configural process when they recognize the differences between conspecific faces (Martin-Malivel et al. 2006; Adachi et al. 2009; Dahl et al. 2009; Gothard et al. 2009), it is likely that the efficient detection confirmed in the present experiment was due to a process based on the holistic information of own-class faces rather than a process based on distinctive facial parts.

However, in another human study (VanRullen 2006), a pop-out effect similar to that observed in the above human study (Hershler and Hochstein 2005) was confirmed, but it was not limited to faces, was not holistic, and was attributed to low-level properties of the images. Although the pop-out of faces from a variety of heterogeneous distracters in humans is now widely accepted, considerable disagreement remains about which information in the face is important for the pop-out effect (Hershler and Hochstein 2006). Therefore, in Experiment 2, we further examined which facial information (holistic, high-level information or low-level visual information contained in faces) is important for the pop-out effect.

Experiment 2

Results

The macaques

In the training phase, the two macaques required 5 and 8 sessions, respectively, to satisfy the 99% criterion. The mean correct-response rate in the subsequent test phase was 99.13% (SE = 0.28), and there were no significant differences in this rate between the experimental conditions. There were also no significant differences in the mean reaction times between the conditions with the array size of 4 in this phase (Fig. S1).

Figure 3a shows the mean reaction times for the three increased array sizes in the test phase. The ANOVA revealed significant main effects of the array size (F(2,30) = 25.07, P < 0.001, η² = 0.08) and the variation in target objects (F(3,45) = 17.91, P < 0.001, η² = 0.14) and a significant interaction between these two factors (F(6,90) = 4.52, P < 0.001, η² = 0.09). The simple main effect of the array size was confirmed only for the HPF-16.8 faces (F(2,30) = 30.08, P < 0.001, η² = 0.84); there were no significant differences between the array sizes for each of the other three types of faces.

Fig. 3

Mean reaction times measured in Experiment 2 for the monkey (a) and human (b) subjects. Faces of the subject’s own species were used. HPF faces were filtered using a high-pass cutoff, and LPF faces using a low-pass cutoff. Other descriptions are as for Fig. 2

Our analysis of the slopes of the mean reaction time as a function of the array size showed a steeper gradient (34.43 ms per item) for the HPF-16.8 faces compared to the two low-pass filtered faces (LPF-8.3 faces, 5.69 ms per item; LPF-4.8 faces, 8.64 ms per item). The slope for the HPF-12.6 faces (9.99 ms per item) was between these steep and flat gradients.

The humans

The mean correct-response rate in the test phase was 99.64% (SE = 0.11), and there were no significant differences in this rate between the experimental conditions. There were also no significant differences in the mean reaction times between the conditions with the array size of 4 in this phase.

Figure 3b shows the mean reaction times for the three increased array sizes in the test phase. The ANOVA revealed significant main effects of the array size (F(2,14) = 9.27, P < 0.01, η² = 0.12) and the variation in target objects (F(3,21) = 18.80, P < 0.001, η² = 0.34), and a significant interaction between these two factors (F(6,42) = 3.12, P < 0.05, η² = 0.08). With respect to the array size effect, the simple main effect was confirmed for the two types of HPF faces (HPF-16.8 faces, F(2,30) = 14.52, P < 0.001, η² = 0.70, and HPF-12.6 faces, F(2,30) = 5.48, P < 0.01, η² = 0.26). There were no significant differences between the array sizes for the two types of LPF faces.

Our analysis of the slopes of the mean reaction time as a function of the array size showed steeper gradients for the HPF-16.8 faces (30.30 ms per item) and the HPF-12.6 faces (21.81 ms per item) compared to the two types of low-pass filtered faces (LPF-8.3 faces, 6.90 ms per item; LPF-4.8 faces, 4.91 ms per item).

Discussion

In the present experiment, when the array size was increased, both the humans and macaques showed almost no or only slight increases in reaction times when searching for the LPF faces, whereas they showed steeper slopes for the HPF faces; in particular, the slope for the HPF-16.8 faces showed a clearly different trend compared to that for the LPF faces. These results suggest that the information provided by low-spatial-frequency components in faces is more important than the information provided by high-spatial-frequency components for efficient facial searching among non-face objects. This may also be true when naturalistic own-class faces are presented as the target, as in Experiment 1.

Previous studies have shown that high-spatial-frequency components convey information about distinct facial parts relatively well, whereas low-spatial-frequency components convey holistic information (Vuilleumier et al. 2003; Goffaux and Rossion 2006). Indeed, the LPF faces appeared blurred, whereas the HPF faces looked very sharp, with each facial part appearing distinct. Therefore, in searching for the LPF faces, it was difficult for the subject to focus on individual parts of the face, and this difficulty may have augmented processing based on holistic information. In contrast, in searching for the HPF faces, it was easier for the subject to focus on individual parts of the face, which may have augmented processing based on local information from individual facial parts. Our findings thus suggest that processing based on holistic information, which cannot be ascribed to individual facial parts, is more critical for efficient face searching than processing based on local information. In this experiment, we did not include intact faces whose spatial frequency content was unmodulated. It is thus debatable whether the efficient search for the LPF faces is similar to the search for intact faces (e.g., the own-species faces in Experiment 1). Further research is needed to address this issue.

General discussion

Our findings have two important implications. First, the results for the humans and the Japanese macaques showed similar trends on most points in both experiments, suggesting that both species possess a mechanism that allows them to search efficiently for a face among non-face objects. Some studies have claimed that, unlike chimpanzees, rhesus monkeys do not show the Thatcher illusion (Thompson 1980), a measure of face processing ability in which changes in the configuration of facial features are difficult to detect in an inverted face (Weldon et al. 2013), so there may be evolutionary differences in face processing among primates. However, other studies have systematically shown that monkeys do perceive the Thatcher illusion (Adachi et al. 2009; Dahl et al. 2010). Furthermore, the Thatcher illusion has been shown even in squirrel monkeys, a New World monkey that differs greatly from humans in terms of ecology, morphology, and anatomy (Nakata and Osada 2012). Other studies likewise found no differences in face processing among primates (Dahl et al. 2013). Therefore, it seems reasonable that the ability to search efficiently for faces derives from fundamental social cognition abilities broadly shared among primate species, although further investigation is needed to determine whether social species other than humans (Hershler and Hochstein 2005), chimpanzees (Tomonaga and Imura 2015), and Japanese macaques can efficiently search for a realistic face among varied non-face objects; candidates include, for example, New World monkeys, whose face recognition abilities are somewhat similar to those of humans (Weiss et al. 2001; Neiworth et al. 2007; Nakata and Osada 2012).

Second, efficient face detection was confirmed when the searched-for faces retained lower-spatial-frequency components and when they shared a visual description with “own-species faces” (e.g., faces of both Japanese and Caucasians for the human participants and faces of both Japanese and rhesus macaques for the monkey subjects). The slope of the increase in reaction time to detect the target among additional distracter items has been used as a measure of search efficiency in human studies, with a slope close to zero indicating efficient, parallel search (Wolfe 2001). Our results showed this trend relatively well in the search for conspecific faces and LPF faces. These results suggest that (1) processing based on the holistic information of faces is important for their efficient detection; in other words, holistic processing of faces is clearly meaningful in the earlier stages of visual processing as well as in the later stages; and (2) the processing developed through sufficient experience with conspecific faces can be generalized over the extent of that experience.

Why, then, under our experimental settings, did humans and monkeys search for the same faces in different manners? For example, why did humans search for a human face efficiently while monkeys searched for the same face inefficiently? The most likely explanation is that, in both humans and Japanese macaques, the faces that are searched for efficiently became specialized through perceptual narrowing in early life (Pascalis et al. 2002) and/or daily experience in later years. This specialization may be innately endowed, or it may occur at a very early stage of life, as one study indicated that newborn monkeys can acquire knowledge of the basic structure of own-species faces (Sugita 2009). Such specialization would increase the useful information available (i.e., holistic information), which would further augment the efficiency of the search.

Morphological similarity between frequently encountered faces and infrequently encountered ones (e.g., faces of the subjects’ closely related species) allows similar configural processing to be applied to both (Dahl et al. 2014). With respect to our present results, it is thus likely that the differences in the efficient detection of the face classes, and the difference between the OSE and the ORE in efficient face detection, are due to differences in the morphological similarity among face classes.

In conclusion, our results strongly suggest that conspecific faces are detected efficiently among non-face objects through a face recognition mechanism that is shared not only by humans but also by at least one species of Old World monkey, the Japanese macaque.