Introduction

Pictorial stimuli are often used in experiments to represent the entities they depict. Using photographs, line drawings, slides, and video to represent live animals or natural objects provides an alternative to presenting the real items as stimuli. Using images provides researchers with greater control over the stimuli presented to an animal and allows for repeated exposure of the same stimuli to all subjects in a study (D’Eath 1998; Fagot et al. 1999; Oliveira et al. 2000; Rosenthal 1999). However, an overriding concern when using images as stimuli rather than the actual items is whether the animals conceptualize the two-dimensional image as the three-dimensional object it is intended to represent (see review by Bovet and Vauclair 2000). Animals across a wide range of taxa from spiders to primates appear to exhibit a capacity to conceptualize the content of two-dimensional images (see Table 2 in Bovet and Vauclair 2000).

The techniques used to confirm picture recognition vary widely (see Bovet and Vauclair 2000, for a full review) with perhaps the most common being observation of “appropriate responses” when presented with an image. Examples include rhesus monkeys (Macaca mulatta) showing fear when presented with a picture of a threatening individual (Sackett 1965), squirrel monkeys (Saimiri sciureus) responding to video images of predators with alarm calls or fear (Herzog and Hopf 1986), squirrel monkeys attempting to grasp a moving video image of insect food (Herzog and Hopf 1986), and female jumping spiders (Maevia inclemens) responding to videos of courting males with receptive behavior (Uetz and Smith 1999). A variation of an appropriate response test is a “preference” test in which an animal selects between images in a manner that implies it recognizes the content. For example, nonhuman primates provided with images of many primate species tend to selectively view their own species, implying recognition of content (Dufour et al. 2006; Fujita 1987; Fujita and Watanabe 1995). Similarly, sheep (Ovis aries) given a choice between images in the arms of a Y-maze tend to select sheep faces more than human faces (Kendrick et al. 1992).

Experimental approaches for testing picture recognition include transfer experiments in which animals successfully transfer a learned discrimination from objects to pictures or pictures to objects. For example, pigeons taught to discriminate seeds from inedible objects (e.g., sticks) were later able to discriminate photographs of the seeds from the objects (Watanabe 1993, 1997). Matching of a three-dimensional object to a two-dimensional image also implies picture recognition (Cabe 1976; Delius 1992; Malone et al. 1980; Spetch and Friedman 2006; Tanaka 1996; Truppa et al. 2009). Successful sorting of images into prearranged categories also implies recognition, as, for example, when nonhuman primates reliably sort images into the categories food or non-food (Bovet and Vauclair 1998; Savage-Rumbaugh et al. 1980). Learning by viewing images or watching video also implies recognition, as when chimpanzees learn the location of a food reward by watching a video (Poss and Rochat 2003).

Despite widespread evidence for picture recognition in animals, Fagot et al. (1999) have cautioned that the ability should not be assumed in nonhuman subjects. They recommend that picture recognition should be tested directly before pictures are used as stimuli because animals may not recognize the content of images as a human experimenter would. Accordingly, Fagot et al. (1999) defined three levels at which animals might comprehend pictures. The first is “independence,” in which an animal has no comprehension of the image and does not translate the patterns of shape and color in the two-dimensional stimulus into any recognizable object. The mental processes that occur when an animal views the image are independent of the object in the picture. The second is “confusion,” in which the animal recognizes the content of the image but confuses the image with the entity depicted, as, for example, when monkeys grab at pictures of food in an attempt to place them in their mouth (Bovet and Vauclair 1998; Parron et al. 2008). The third is “equivalence,” in which the animal not only recognizes the content of the images, but also realizes that the image is a representation of an object and not the actual object. Humans achieve equivalence, although this comprehension is a developmental process. Children approximately 9 months of age treat images with confusion as if the picture were the object, but by approximately 19 months, children have learned through experience that the picture is a representation of and a referent to an object (DeLoache et al. 1998). In their review, Fagot et al. (1999) conclude that animals can recognize the content of images, but that evidence for equivalence in nonhuman animals is weak or contradictory.

The capacity to recognize pictures is not universal and can depend on a variety of factors, however (Bovet and Vauclair 2000; Fagot et al. 1999). For example, chimpanzees (Pan troglodytes), a species that typically performs well in picture recognition tasks, failed to match objects to their photographs in one experiment (Winner and Ettlinger 1979). Baboons (Papio papio) can also show poor performance in matching pictures to objects and vice versa (Martin-Malivel 1998). Birds often do not appear to recognize two-dimensional images (Bird and Emery 2008; D’Eath and Dawkins 1996; Dawkins 1996; Dittrich et al. 2010; Patterson-Kane et al. 1997; Ryan and Lea 1994), probably due to physiological and perceptual differences in their visual system compared to humans (Delius et al. 2000). Animals may show some degree of picture recognition but only when the images are shown in a particular medium, and as the image is abstracted (e.g., from video to photograph or photograph to silhouette), recognition usually declines (Bird and Emery 2008; Cabe 1976; Delius 1992; Ganea et al. 2008; Pierroutsakos and DeLoache 2003; Tolan et al. 1981). Familiarity or experience with the objects depicted also enhances picture recognition (Aust and Huber 2010; Fagot et al. 1999; Neiworth and Wright 1994). Further, individual differences in the ability to recognize the content of images appear to occur within species as some individuals perform successfully on picture recognition tasks while others do not (e.g., Bovet and Vauclair 1998; Martin-Malivel 1998; Tanaka 1996). The discrepancy could be accounted for by differences in task motivation, attention, or the capability of animals to carry out expected experimental procedures, but may also indicate individual differences in the cognitive ability to translate the two-dimensional image into a mental representation of the item depicted.

An additional issue inherent in picture–object recognition studies that involve discrimination, matching, and categorization is whether animals conceptualize the image as the object it represents. Animals can discriminate, match, and categorize pictures and objects using features common to both stimuli (e.g., shape, color) without understanding the content of the images. Aust and Huber (2006, 2010) emphasize the importance of demonstrating “representational insight” in picture–object recognition studies in which it is shown that animals understand the relation between the content of pictures and the objects they represent. However, they point out that this relationship is rarely tested. The authors tested for this experimentally by training pigeons to select incomplete pictures (e.g., humans with heads out of frame) and then testing whether the birds would selectively choose images with the unseen portion (human heads) versus control stimuli. The birds tended to select images that would complete the picture they were trained to select, and the authors concluded the birds exhibited representational insight or understanding of image content.

Accordingly, methodologies should be developed that not only indicate animals perceive correspondence between an image and an object, but also suggest animals interpret the image as the object it represents. We tested these capacities by assessing monkeys’ preferences among a group of objects and comparing those preferences to preferences for images of those same objects. We assumed that if animals recognized the images, they would select images of preferred items in order to receive them. Specifically, we assessed monkeys’ preferences for a wide range of food items and then exposed them to photographic images of two of those foods on a touch screen monitor. We provided them with a piece of whichever food they touched and animals learned to select the image of the preferred food in the pair in order to receive the food item. Once animals demonstrated that they would reliably select images of their preferred foods on one set of food items, we transferred them to images of a second set of different familiar foods and evaluated their choices. We predicted that if animals recognized the images presented on the screen, then they would spontaneously select images of their preferred foods on the second set of food images in order to receive the preferred food. Spontaneous transfer would rule out that animals were quickly learning an association between an unrecognizable image and its contingent food reward. We also tested whether animals touched images of their preferred foods more quickly than less-preferred foods. Quicker reaction times for images of preferred foods might indicate an expectancy for the real item the animal was about to receive and provide further support for picture/object correspondence.

We used familiar food as stimuli because biological relevance is thought to improve performance on picture recognition tasks (Bovet and Vauclair 1998). In addition, since the appearance of food items often changes, it has been suggested that animals should be more likely to recognize and identify food items despite visual variation (Santos et al. 2001). We used lion-tailed macaques (Macaca silenus) as subjects because nonhuman primates generally outperform other taxa on picture recognition tasks (Bovet and Vauclair 2000). Further, studies of chimpanzees (Savage-Rumbaugh et al. 1980) and baboons (Papio anubis; Bovet and Vauclair 1998) have demonstrated that primates can categorize images as food versus non-food items. Numerous studies have tested macaques with pictorial stimuli to assess their cognitive abilities, particularly studies that test the extent to which macaques can successfully categorize images containing similar items (e.g., Wright et al. 1984). However, as previously mentioned, animals can perform successfully at such tasks without necessarily understanding the content of the images. One exception was a study in which macaques (M. mulatta) categorized images of objects with which they had had active experience more accurately than objects with which they had had only passive experience (Neiworth and Wright 1994). Only a few studies have been designed as systematic tests for picture recognition in macaques, and these indicate that macaques can recognize the content of pictorial images (Malone et al. 1980; Tolan et al. 1981; Zimmermann and Hochberg 1970). Ours is perhaps the first study to systematically test for picture recognition of food items in macaques.

Spontaneous selection of preferred food items on the novel transfer images would provide rather convincing evidence that animals recognized the content of the images; however, we designed a second experiment that would further support picture recognition and rule out rapid association learning of an unrecognizable stimulus with a food reward. In this experiment, we paired an image of a moderately preferred food with either a low-preference food or a high-preference food. We predicted that, if the animals recognized the content, an image of the same moderately preferred food would be chosen when paired with an image of a low-preference food and would not be selected when paired with an image of a high-preference food. If animals did not recognize image content and were using association learning to pair an unrecognizable stimulus with receipt of a particular food reward, then, in this experiment, the stimulus of the same medium-preference food would serve as a positive discriminative stimulus on some trials and as a negative discriminative stimulus on others. Macaques have the ability to learn such complex context-specific stimulus associations, but it takes hundreds of trials and extensive training to acquire the task (e.g., Gaffan 1979). We designed this experiment with a low number of trials (N = 50) and with unique food pairings on each trial. Medium-preference foods were always paired with a different food stimulus, providing no opportunity to learn under which cases they were positive or negative discriminative stimuli. Thus, operant association learning, rather than picture recognition, would be a very unlikely explanation for successful performance.

Methods

Subjects and housing

The animals tested were a group of five adult male lion-tailed macaques (Macaca silenus) housed at the Bucknell University primate facility in Lewisburg, Pennsylvania (Bert, Max, Pierre, Henri, and Ranier). The group was established in 2002 from animals on loan from the San Diego Zoo. All animals had experience using a touch screen from previous experiments. Stimuli in prior experiments consisted of geometric shapes (black and gray squares) and patterns of color (as in Saito et al. 2003), but no animal had ever viewed images of food or any other naturalistic objects. Animals were housed in an indoor/outdoor enclosure consisting of a 9 × 11 × 4.5 m outdoor compound and a 9 × 6 × 2.25 m indoor quarter. The indoor quarter was subdivided into three approximately 3 × 6 × 2.5 m compartments. The three compartments were joined through interconnecting doorways and each had a doorway leading to the outdoor compound. High-protein monkey biscuits and water were available ad libitum. Once daily, this diet was supplemented with an assortment of nuts, fruits, grains, cereals, and/or vegetables.

Procedures

Assessment of food preferences

Animals were trained to enter the indoor quarters and move into separate compartments for training and testing. Thirty-eight food items were used, eighteen of which were foods already routinely offered in the animals’ diet. Twenty were new foods introduced into their diet in order to provide a sufficient number of choices to complete the planned regimen of training and testing. Animals were introduced to the new foods in the days prior to preference assessment. We used a wide variety of visually distinct foods: cakes, candies, cookies, crackers, cereals, earthworms, fruits, monkey chow, nuts, and vegetables. Individual food preferences were assessed by presenting paired combinations of the 38 food items to each subject. A pair of food items was placed 40 cm apart on the surface of a 75 × 51 cm white horizontal platform, which was rolled up to the animals’ caging. Animals would reach through the caging to take a food item, and the platform was retracted before they could take the second item. A pair of foods was presented to each animal twice, and, on the second presentation of each pair, the right or left orientation of the foods was reversed in order to control for a handedness bias. Preference tests were conducted over a period of 25 days, with each food being paired with another food an equivalent number of times. Initial preferences were assigned based on the total number of times each food was chosen over other foods. Preferences varied widely across individuals, and, like humans, the monkeys tended to favor less healthful sugary items and to eschew their vegetables (Table 1). We reassessed food preferences after each phase of training and testing to determine whether preferences changed on the food pairs used in each phase. We intended to remove trials from analyses in which a food preference reversed, but no preferences changed for any trial pair throughout the course of the study.
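The tallying step described above (assigning preferences from the total number of times each food was chosen) can be sketched in a few lines. This is a minimal illustration only: the function name `rank_by_choices`, the tuple layout, and the three example foods are assumptions made for the sketch and are not records from the study.

```python
from collections import Counter

def rank_by_choices(choices):
    """Rank foods by how often each was chosen in paired presentations.

    `choices` is a list of (left_food, right_food, chosen_food) tuples
    (a hypothetical record format, assumed for this sketch).
    """
    wins = Counter()
    for left, right, chosen in choices:
        if chosen not in (left, right):
            raise ValueError(f"{chosen!r} was not offered in ({left!r}, {right!r})")
        # Touch both keys so that never-chosen foods still appear with a count of 0
        wins[left] += 0
        wins[right] += 0
        wins[chosen] += 1
    # Highest count first; ties are broken alphabetically so the order is deterministic
    return sorted(wins.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical records: each pair is offered twice with left/right reversed
trials = [
    ("grape", "celery", "grape"),
    ("celery", "grape", "grape"),
    ("grape", "peanut", "grape"),
    ("peanut", "grape", "peanut"),
    ("peanut", "celery", "peanut"),
    ("celery", "peanut", "peanut"),
]
ranking = rank_by_choices(trials)
```

Presenting every pair in both orientations, as in the example records, keeps a side bias from inflating the count for any one food.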

Table 1 Of the 38 foods presented, the five most preferred and least preferred for each subject

Stimuli

Digital photographs of each food were taken using a 3.34-megapixel Nikon Coolpix 995 camera. Foods were photographed on a plain white background in a state that was similar to the way they were provided during feeding. For example, apples were photographed as slices rather than as whole fruit because that was how they were fed to the animals. Photographs were taken from the same distance (31 cm) and with the same lighting to control for size, color, shadow, and contrast. Each image was also edited using Adobe Photoshop™ to attain a pure white background but retain the shadows. Three or four different pieces of each food type were photographed from a variety of perspectives so that, when a food was used more than once during a testing session, animals never viewed the same image of that food (Fig. 1a). Using multiple images of each food reduced the possibility that animals were rapidly learning an association between an unrecognizable stimulus and a particular food reward.

Fig. 1 Exemplars of (a) three different stimulus images of the food item “Broccoli” with different pieces photographed from different perspectives and (b) a pair of stimuli as they would appear on the touch screen during a trial

Training

Prior to testing, animals needed to learn that they would receive a piece of the food depicted in an image if they touched that image on a touch screen. They also needed to learn that if two images were displayed on a screen, they received a piece of the food in the image they elected to touch and not the other food item. Finally, they needed to learn that they would receive a piece of the food they touched even though a piece of that food was not in view at the time of a selection. Each animal progressed through three training phases to acquire these concepts. Images were presented using a 15″ Elo touch screen monitor and a Macintosh G3 computer running PsyScope experiment-generating software (Cohen et al. 1993). A cart containing this apparatus was wheeled up to the caging, and animals could reach through and touch the screen.

In the first phase of training, a single image of food was presented on the screen, and, when an animal touched the image, it was rewarded with a corresponding piece of food. A trial began with a “start screen” containing a green rectangle at the bottom of the screen that subjects were required to touch to begin each trial. The start screen ensured that subjects were in front of the screen and ready to participate when the test stimulus appeared. A start screen was used to begin all trials throughout the remaining training phases and experiments. Once the start screen bar was touched, a 5.5 × 5.5 cm image of a food item was displayed in the center of the screen. Images were 546 × 410 pixels in resolution. During the trial, the experimenter held a piece of the food depicted in the image above the testing apparatus approximately 60 cm from the subject. When an animal touched the image, a piece of that food was dropped into a box affixed 16.5 cm to the right of the screen. Animals reached their hand through a 7.6 × 10.2 cm hole to retrieve the food from the box. The picture remained on the screen while the animal consumed the food to allow the animal to associate the image with the food. In the case of animals that recognized the two-dimensional image, they might learn more quickly that they were receiving the food that they touched. In the case of animals that were not recognizing the two-dimensional image, additional exposure to the image might allow them to learn that the three-dimensional object they received corresponded to the two-dimensional image. After the food appeared to be fully consumed, the experimenter advanced to the next trial. The experimenter also advanced to the next trial if the animal discarded a food item. Six of the thirty-eight food items were used in this phase of training. Animals received four sessions of one-food training with twenty trials per session. The 20 trials consisted of a randomized list of three exemplars of each food. 
The randomized list of images presented was formulated before each session, so the experimenter could arrange the foods in a holding tray in the proper order and be prepared to proffer the correct food for each trial. The apparatus also contained a second computer monitor displaying the screen observed by the animal to the experimenter, providing further coordination between the image displayed and the reward presented by the experimenter.

In the second phase of training, animals were presented with two images of food on a trial and were provided with the one that they touched. A trial began with two 5.5 × 5.5 cm images appearing in the center of the screen 3 cm apart (Fig. 1b). While the food images were presented, the experimenter held a piece of each food above the apparatus in view of the subject. One piece was held in each hand approximately 20 cm apart. When an image was touched, it remained on the screen while the second image disappeared. The animal was then given a piece of food corresponding to the image selected. Selected images remained visible on the screen until the food was consumed, and then the experimenter removed the selected food image from the screen by advancing to the next trial. Selected images were kept visible while the animals consumed the food to help the animals learn that they received the item from the pair that they touched. If animals did not consume the food item, usually by quickly discarding it, the rejection was recorded and the experimenter advanced to the next trial.

Twenty-four trials were conducted in each session. Food pairs consisted of 16 foods that were not used in the first phase of training. A food pair in each trial was based on the results of initial preference assessments obtained for the 16 foods as described above. Highly preferred foods were randomly paired with low-preference foods to create 24 food pairs, under the condition that no food was ever paired with the same food twice. If a food recurred among the 24 food pairs (i.e., it happened to be paired with more than one food), we used images of that food photographed from different perspectives so that a particular image was not used more than once in the session. A unique set of 24 image pairs was created for each subject that was tailored to their individual preferences. Food images were randomly presented on either the left or right side of the screen. Food items displayed to the animal by the experimenter were also randomly presented in either the left or right hand so the side on which that food was held did not necessarily correspond to the side that the food image was presented on the screen. As in the first phase of training, randomized schedules of presentation were constructed prior to training sessions so that the experimenter could prepare a tray containing each pair of foods in the order they occurred in the session and be ready to display the foods to the animal and provide the selected food.

To advance through training, subjects had to demonstrate a capacity to select the image of their preferred food. Our criterion was selection of images of preferred food items in a session significantly more often than expected by chance. Using 24 pairs in a session with a 50% chance of randomly selecting the preferred food, 17 out of 24 selections of preferred food would indicate one-tailed statistical significance according to a chi-square distribution. In addition, we required that each subject complete three consecutive training sessions of over-chance selection of preferred foods in order to complete training. For each session, the same 24 food pairs were presented in a randomized order.
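The training criterion can be checked numerically. The sketch below assumes an uncorrected one-degree-of-freedom chi-square goodness-of-fit test against a 50:50 split, with the one-tailed probability taken as half the two-tailed value when the deviation is in the predicted direction; the text does not say whether a continuity correction was applied, so that choice is an assumption. All function names are hypothetical.

```python
import math

def chi_square_one_tailed_p(preferred, n_trials):
    """One-tailed p for a 1-df chi-square test of `preferred` out of `n_trials` vs. 50:50."""
    expected = n_trials / 2
    other = n_trials - preferred
    chi2 = (preferred - expected) ** 2 / expected + (other - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 df: P(X > x) = erfc(sqrt(x / 2))
    two_tailed = math.erfc(math.sqrt(chi2 / 2))
    if preferred > expected:
        return two_tailed / 2
    return 1 - two_tailed / 2

def min_preferred_for_criterion(n_trials, alpha=0.05):
    """Smallest above-chance count whose one-tailed p is at or below alpha."""
    for k in range(n_trials + 1):
        if k > n_trials / 2 and chi_square_one_tailed_p(k, n_trials) <= alpha:
            return k
    return None

def sessions_to_criterion(session_counts, n_trials=24, run_length=3):
    """1-based index of the session that completes `run_length` consecutive
    over-chance sessions, or None if the criterion is never met."""
    threshold = min_preferred_for_criterion(n_trials)
    run = 0
    for index, count in enumerate(session_counts, start=1):
        run = run + 1 if count >= threshold else 0
        if run == run_length:
            return index
    return None
```

Under these assumptions, `min_preferred_for_criterion(24)` returns 17, which agrees with the threshold stated above, and a subject at or above that count in its first three sessions would finish in the minimum of three sessions.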

The third training phase was identical to the previous phase except that the food items were no longer displayed to the animal during each trial. The training was necessary because we wished to test spontaneous picture recognition in the transfer experiment and no foods could be displayed concurrently with the images. The image pairs in each session were the same as those used in the previous training with the order of pairs randomized in each session. Again, animals were required to choose the image of their preferred food significantly more often than chance on three consecutive 24-trial sessions to complete training.

Transfer experiment

The procedures for the transfer experiment were identical to those used in the third phase of training except that we used the final 16 foods from the original pool of 38 as stimulus images. Foods represented the most and least preferred items from the original preference assessments. Animals selected between images of familiar foods they had never viewed as images, and the foods depicted were not displayed to the animals during trials. Images of 8 preferred and 8 non-preferred foods were semi-randomly paired to form a 24-trial session of preferred and non-preferred pairs with the conditions that each food appeared three times during the session and no food was ever paired with the same food. In addition, the three presentations of each food in a session were a different depiction of the food (e.g., Fig. 1a). Providing unique exemplars of the foods would prevent the learning of a rapid association between a particular unrecognizable stimulus and a contingent food reward. If animals were spontaneously recognizing the food images, we expected them to select images of preferred foods significantly over chance on the first transfer session. Unsuccessful transfer would suggest that the animals did not recognize the images and were able to complete their training by learning that particular stimuli, although unrecognizable as food, were associated with preferred rewards. To test for a possible learning effect, in which performance would improve with repeated presentations, we conducted two additional transfer sessions by presenting the 24 pairs from the first transfer session in a random order.
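One way to build a session that satisfies these constraints (8 preferred and 8 non-preferred foods, each food shown three times, no pair repeated, a different photograph on each appearance) is sketched below. The text says only that foods were "semi-randomly paired", so the cyclic-shift construction here is an assumption and not necessarily the procedure the authors used; the function name, the dictionary layout, and the placeholder food labels are likewise hypothetical.

```python
import random

def build_transfer_session(preferred, non_preferred, repeats=3, seed=None):
    """Return a shuffled list of trials pairing preferred with non-preferred foods."""
    rng = random.Random(seed)
    n = len(preferred)
    if len(non_preferred) != n or repeats > n:
        raise ValueError("need equal-sized food lists and repeats <= list length")
    pref, non = preferred[:], non_preferred[:]
    rng.shuffle(pref)
    rng.shuffle(non)
    # Distinct cyclic shifts guarantee that no preferred/non-preferred pair repeats
    offsets = rng.sample(range(n), repeats)
    next_photo = {food: 0 for food in pref + non}  # next unused exemplar per food
    trials = []
    for offset in offsets:
        for i in range(n):
            p, q = pref[i], non[(i + offset) % n]
            pair = [(p, next_photo[p]), (q, next_photo[q])]
            next_photo[p] += 1
            next_photo[q] += 1
            rng.shuffle(pair)  # randomize left/right screen position
            trials.append({"left": pair[0], "right": pair[1], "preferred": p})
    rng.shuffle(trials)  # randomize trial order within the session
    return trials

# Placeholder labels stand in for one subject's 8 preferred and 8 non-preferred foods
session = build_transfer_session(
    [f"P{i}" for i in range(8)], [f"N{i}" for i in range(8)], seed=1
)
```

Each trial records a (food, exemplar index) tuple for each side, so the experimenter's holding tray could be ordered from the same list that drives the screen.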

Relative preference experiment

We tested for “relative” preferences by dividing foods into high-, medium-, and low-preference categories based on each animal’s original preference assessments. Using 5 high-preference foods, 5 low-preference foods, and 10 medium-preference foods, we created 50 pairings, each of which contained a medium-preference food. In half of the pairs, the medium-preference food was paired with a lower-preference food, and in the other half, the medium-preference food was paired with a higher-preference food. A unique pair of foods was used in each of the 50 trials, and the two foods in each pair had not been paired in any previous training or transfer trials. The 50 pairings were presented in two 25-trial sessions. Each food was viewed twice in a session, but a different depiction of that food was used the second time it was displayed. Animals were tested using the same procedures as the transfer experiment: pairs of images were presented on the screen without the foods in view, and animals were provided with the food corresponding to the image they selected.

Data analyses

For the transfer experiment, we tallied trials on which animals chose the image of their preferred food based on their known preferences for those foods and conducted two-tailed binomial tests to determine whether each animal selected preferred foods significantly more than non-preferred foods in each of the three transfer sessions. With 24 trials in a session, selecting 18 out of 24 preferred foods (75%) would attain two-tailed statistical significance at P ≤ .05. For the relative preference experiment, we pooled data from the two 25-trial sessions for each animal, resulting in 25 trials in which a medium-preference food was paired with a low-preference food and 25 trials in which those same medium-preference foods were paired with high-preference foods. We tallied the number of trials in which a medium-preference food was selected under each pairing type and conducted two-tailed binomial tests to determine whether the medium-preference food was selected significantly more or less often. With 25 trials of each pairing type, 18 out of 25 selections (72%) would be significantly more than expected and 7 out of 25 selections (28%) would be significantly less than expected at P ≤ .05.
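The significance thresholds quoted above can be reproduced with an exact binomial calculation. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the two-tailed probability is obtained by doubling the upper tail, which is valid here because the null distribution with p = 0.5 is symmetric.

```python
from math import comb

def binomial_two_tailed_p(successes, n_trials):
    """Exact two-tailed binomial p-value against a chance level of 0.5."""
    # Work with the more extreme side; the p = 0.5 distribution is symmetric
    k = max(successes, n_trials - successes)
    upper_tail = sum(comb(n_trials, i) for i in range(k, n_trials + 1)) / 2 ** n_trials
    return min(1.0, 2 * upper_tail)

def critical_counts(n_trials, alpha=0.05):
    """Return (largest count significantly below chance,
    smallest count significantly above chance)."""
    upper = next(
        k for k in range(n_trials // 2 + 1, n_trials + 1)
        if binomial_two_tailed_p(k, n_trials) <= alpha
    )
    return n_trials - upper, upper

transfer_bounds = critical_counts(24)   # 24-trial transfer sessions
relative_bounds = critical_counts(25)   # 25 trials per pairing type
```

For 24 trials the upper bound is 18, and for 25 trials the bounds are 7 and 18, matching the values given in the text.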

The speed with which animals selected preferred versus non-preferred food images during paired presentations was evaluated by obtaining reaction times for these two categories of image for each subject in each phase of training, the transfer test, and the relative preference test. We used each subject’s median reaction times for preferred and non-preferred items in analyses, rather than means, in order to reduce the influence of extreme cases. Animals were free to take long intervals before responding, sometimes creating long response latencies. Animals also anticipated the arrival of the stimuli and sometimes rapidly touched an area of the screen where stimuli were due to arrive. Using the median response time that an animal took to select images of preferred versus non-preferred food reduced the influence of these extremes. Since the goal for analyzing reaction times was to determine whether the animals responded more quickly because they recognized and anticipated the preferred reward from a pair, we used the data from the last three sessions of the two training phases because this was the point at which animals were reliably choosing images of their preferred foods and had reached our criterion for apparent picture recognition. Examination of reaction times in the early sessions of training would not provide an appropriate test because, initially, some animals did not show any indication that they recognized the images. We examined the first transfer session because this was the first time the animals were viewing images of the foods depicted. If they selected images of their preferred foods more quickly on their first exposure to them, the result would support an expectancy for the object in the image.
Finally, in the relative preference test, we compared pairings when the preferred food was selected (high preference over medium preference plus medium preference over low preference) to pairings when the non-preferred food was selected (medium preference over high preference plus low preference over medium preference). We wished to compare response latencies between preferred and non-preferred food images across subjects in each phase of training and testing, but, with five subjects, there were too few degrees of freedom (df = 4) to conduct conventional paired t tests. To estimate the probability of obtaining the differences between preferred and non-preferred food latencies, we ran a resampling version of a paired t test developed by Howell (2010). The test randomly assigns a positive or negative sign to the paired difference scores of each subject and conducts a t test under the assumption that, under a null hypothesis, each difference would have an equal chance of being positive or negative. After numerous random permutations and accompanying t values, the tests indicate the probability of obtaining the t value for the observed difference scores in relation to those for the random differences. We conducted the tests using 100,000 permutations and a two-tailed probability of .05.
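The sign-flipping logic of the randomization test can be illustrated as follows. This sketch is not Howell's (2010) program: instead of drawing 100,000 random sign assignments, it enumerates every assignment exhaustively, which is feasible because five subjects yield only 2^5 = 32 patterns. The function names and the latency values are hypothetical and serve only to show the mechanics.

```python
from itertools import product
from math import copysign, inf, sqrt
from statistics import mean, median, stdev

def paired_t(diffs):
    """Paired t statistic computed from per-subject difference scores."""
    m, sd = mean(diffs), stdev(diffs)
    if sd == 0:
        # Degenerate case: identical differences give an unbounded t (or 0 if all zero)
        return copysign(inf, m) if m != 0 else 0.0
    return m / (sd / sqrt(len(diffs)))

def sign_flip_p(diffs):
    """Two-tailed p: share of sign patterns whose |t| is at least the observed |t|."""
    observed = abs(paired_t(diffs))
    count = total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        t = abs(paired_t([s * d for s, d in zip(signs, diffs)]))
        total += 1
        if t >= observed - 1e-12:  # small tolerance for floating-point ties
            count += 1
    return count / total

# Hypothetical latencies in seconds for one subject; the median resists the outlier
preferred_rts = [0.62, 0.58, 0.71, 4.90, 0.66]
subject_median = median(preferred_rts)

# Hypothetical per-subject differences (non-preferred median minus preferred median)
differences = [0.20, 0.10, 0.30, 0.15, 0.25]
p_value = sign_flip_p(differences)
```

A random-sampling version would draw sign patterns with a random number generator instead of iterating over `product`, and its estimate converges on the exhaustive value as the number of permutations grows.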

Results

Training

All five monkeys completed training, but exhibited a wide range of individual variation in the number of sessions to meet our training criterion of three consecutive testing sessions of over-chance responding (Table 2). When first exposed to pairs of stimuli in the second phase of training, in which the foods in each pair were shown to the animal, Bert required the minimum of three testing sessions to meet our criterion. He selected his preferred foods over chance levels on his first exposure to the images and continued to do so. In the third phase of training, in which the foods in the pairs were no longer shown by the experimenter, he continued to select his preferred foods and again completed training in the minimum of three consecutive sessions. Unlike Bert, the other monkeys did not reach our criterion spontaneously. Henri required eight sessions to reach criterion when foods matching the images were displayed during trials, but required no extra training when the foods were no longer visible in the last training phase. The other three monkeys required numerous sessions to reach our training criterion. Max had the greatest difficulty learning to select images of his preferred foods in order to receive them, requiring 19 sessions to learn the procedure even while the foods depicted in the images were in view during selections.

Table 2 Results of training for each subject indicating the number of 24-trial sessions needed to reach three consecutive sessions of over-chance responding

Although some animals took many sessions to reach our training criterion of over-chance selection of preferred food images in three consecutive testing sessions, performance during the first exposure to pairs of images in the second phase of training suggested that they were recognizing the images to some extent. Although the differences were not all statistically significant, all five animals selected more preferred food images than non-preferred food images on their first day of training (Fig. 2). Like Bert, Pierre selected images of preferred items significantly above chance during his first two training sessions. As training progressed toward criterion, the animals, with few exceptions, continued to select more images of their preferred foods (Fig. 2).

Fig. 2 Frequency of preferred food image selections in 24-trial training sessions by each subject during the second phase of training in which animals were first exposed to pairs of food images and the foods were held in view. Animals were trained until they reached three consecutive sessions of over-chance performance (17 out of 24 preferred food selections). Chance performance was 12 selections of preferred food (indicated by the dashed line)
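The 17-of-24 criterion is what a one-tailed binomial test against chance (p = .5) at the .05 level would give. The sketch below checks that arithmetic; the choice of a one-tailed binomial test is an assumption, since this section does not name the test used, and the function names `binomial_tail` and `min_successes` are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_successes(n, alpha=0.05, p=0.5):
    """Return the smallest count k whose upper-tail probability is below alpha."""
    for k in range(n + 1):
        if binomial_tail(k, n, p) < alpha:
            return k
    return None  # no count reaches significance for this n


if __name__ == "__main__":
    n_trials = 24  # trials per training session, as stated in the text
    k = min_successes(n_trials)
    print(k)                                      # 17
    print(round(binomial_tail(k, n_trials), 3))   # 0.032
    print(round(binomial_tail(16, n_trials), 3))  # 0.076, not significant
```

If sessions are treated as independent (also an assumption), the chance of meeting this criterion in three consecutive sessions by random responding is about 0.032 cubed, or roughly 3 in 100,000.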

Transfer experiment

When viewing images of familiar foods for the first time, three of the five subjects (Bert, Henri, and Pierre) chose images of preferred foods over non-preferred foods significantly more often than expected by chance during their first transfer session (Fig. 3). These monkeys continued to select images of their preferred foods during their two subsequent transfer sessions. Max did not select images of preferred foods significantly more often than chance on his first day of transfer, but performed significantly above chance during his second and third transfer sessions. Ranier did not select images of preferred foods significantly more often than chance on his first day of transfer or in the two subsequent transfer sessions. Because Ranier was not consistently selecting images of his preferred foods, he was not used in the relative preference experiment.

Fig. 3 Percentage of preferred food images chosen in the first through third transfer sessions for each subject. The black bar highlights the critical first transfer session. The dashed line represents expected performance on the two-choice task if subjects were responding randomly (50%). Percentages that reached or exceeded the solid line (75%) represent selections significantly above chance levels

Relative preference experiment

Three of the four macaques tested selected the medium-preference foods significantly more often when they were paired with lower-preference foods and selected the same medium-preference foods significantly less often when they were paired with higher-preference foods (Fig. 4). The fourth monkey, Max, selected the medium-preference foods significantly more often when they were paired with lower-preference foods, but did not select those same medium-preference foods significantly less often when they were paired with higher-preference foods.

Fig. 4 Percentage of trials on which each subject selected the medium-preference food rather than a lower-preference food (black) and the medium-preference food rather than a higher-preference food (gray). The dashed line indicates the 50% expected by chance on the two-choice task. Percentages that exceeded the upper-solid line were significantly above chance. Percentages that did not attain the lower-solid line were significantly below chance

Reaction times

Animals tended to select images of preferred foods more quickly than those of non-preferred foods in three of the four conditions (Fig. 5), but the resampling procedure indicated that only the difference in the relative preference test had less than a 5% likelihood of occurring by chance: training with food in view (P = .69), training with food out of view (P = .06), the first session of transfer (P = .31), and the relative preference test (P < .001).

Fig. 5 Mean (+SE) median reaction times when selecting between images of preferred foods (white) and non-preferred foods (black) in each of the four conditions
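The "mean (+SE) median" summary in the caption is read here as a two-step statistic: take each subject's median latency, then average those medians across subjects and compute their standard error. The sketch below illustrates that reading; it is an assumption about how the figure was built, and the function names are hypothetical. The input values are the five subject medians for non-preferred food images quoted in the Discussion; the text does not say which condition they come from, so the output is not meant to reproduce any bar in Fig. 5.

```python
import math
import statistics


def subject_median(latencies_ms):
    """Step 1: collapse one subject's trial latencies to a median."""
    return statistics.median(latencies_ms)


def mean_and_se(values):
    """Step 2: mean and standard error (SD / sqrt(n)) across subjects."""
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, se


if __name__ == "__main__":
    # Subject medians (ms) reported in the Discussion. Trial-level latencies
    # are not given in the text, so step 1 is not exercised here.
    reported_medians_ms = [301, 652, 861, 970, 1254]
    mean, se = mean_and_se(reported_medians_ms)
    print(f"mean = {mean:.1f} ms, SE = {se:.1f} ms")  # mean = 807.6 ms, SE = 159.6 ms
```

Using medians within subjects limits the influence of occasional very slow responses, which is a common reason for summarizing reaction times this way.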

Discussion

Three of five monkeys (Bert, Pierre, and Henri) showed clear evidence of picture recognition by selecting images of their preferred foods during their first transfer session, in which they viewed the foods as images for the first time. For one of these monkeys (Bert), picture recognition was spontaneous, as he began selecting images of his preferred foods upon first exposure to pairs of food images and continued to do so throughout training, transfer, and the relative preference experiment. Bert’s results alone indicate that nonhuman primates are capable of representing a 3D object from a 2D picture. For Pierre and Henri, it is difficult to conclude whether their picture recognition abilities were spontaneous or learned because it took them several training sessions to begin selecting their preferred foods reliably. To eventually demonstrate picture recognition, an animal needed to learn or understand two concepts. One was that the images represented real objects and the second was that the animal would receive a piece of the food it selected. Animals may have spontaneously realized the images represented real objects when they first viewed them, but took many sessions to learn the reward contingency that they would receive the item that they touched. Or, conversely, animals may not have recognized the images at first but gradually learned to do so as they also learned the reward contingency. An animal could also perform successfully without recognizing the content of the images by associating an unrecognized stimulus with a particular reward and learning to select the stimuli that produced preferred rewards. The latter was a possibility with Pierre and Henri at the beginning of the study, but we know they eventually understood the picture–object translation because they spontaneously selected preferred foods in the transfer experiment. We cannot distinguish which of these three avenues to picture recognition Pierre and Henri took, and there may be others. Pierre’s over-chance selections of preferred-food images on his first day of training (Fig. 2) suggest that he spontaneously recognized the images. In any case, their transfer results indicate that Pierre and Henri eventually demonstrated the ability to recognize pictures.

Results for Max and Ranier provided, at best, equivocal evidence for picture recognition. They took longer than the other animals to learn to select the images of their preferred foods during training, particularly when the food items depicted in the images were no longer being held in view by the experimenter (Table 2). Their pattern of results may be more consistent with animals that did not recognize the images and learned to associate the unrecognized stimuli with preferred rewards. Max’s performance in the transfer experiment is also consistent with gradual learning without recognition. He did not select preferred foods significantly more often in his first transfer session, but did so in subsequent sessions, perhaps learning which stimuli provided which rewards. Ranier did not even exhibit evidence for gradual learning in his three transfer sessions.

The relative preference experiment provided corroboration for the transfer experiment and additional evidence that three animals recognized the content of the images. The three animals that showed picture recognition in the transfer experiment (Bert, Pierre, and Henri) also performed as predicted in the relative preference experiment. All three animals selected images of medium-preference foods when they were paired with low-preference foods but did not select images of the same medium-preference foods when they were paired with images of highly preferred foods. If animals were treating the images as unrecognizable stimuli that were associated with preferred rewards, then it may have been possible for them to learn to select particular stimuli to receive preferred rewards. They would have to learn to select the stimulus when paired with some stimuli and not select it when paired with others. In other words, the stimulus for the medium-preference food would have to act as a positive discriminative stimulus when paired with some stimuli and a negative discriminative stimulus when paired with several others. Monkeys are capable of such complex stimulus associations (Gaffan 1979), but this form of learning was not possible since the medium-preference foods were paired with unique items in each trial. In addition, if animals viewed the same food twice in a session, it was a different exemplar of the food, so they would have to generalize the stimulus across the three or four different exemplars of that stimulus to succeed through learning rather than recognition. Therefore, success did not depend on learning of stimulus associations, but on the memory of preferences for the objects depicted in the images. The relative preference study also indicated that their preferences were not all or none, but arranged along a continuum in which a medium-preference food can be considered non-preferred in one context (when paired with a more preferred item) but preferred in another (when paired with a less desirable food).

The large range of individual differences among the five monkeys may reflect differences in motivation, attention, testing ability, or temperament. Ranier, the animal that performed most poorly on the transfer experiment, tended to respond rather quickly compared to the other animals, especially on trials in which non-preferred food images were selected. His median response latency for non-preferred food images was less than half that of each of the other four subjects (301 ms versus 652, 861, 970, and 1,254 ms). We assume he often selected impulsively and consequently received many foods he did not desire. We recorded whether animals ate the foods they selected and, during his transfer experiment, Ranier always ate the preferred foods he selected and never ate the low-preference foods he selected. Since he would receive some food no matter which image he selected, perhaps he began to touch any image as soon as possible to see what he received. If he did not want the food item, he would discard it and respond quickly again to see what he received on the next trial. Since he completed training, we know Ranier was capable of selecting stimuli in order to receive preferred foods, but he may have been successful due to association learning without recognizing the images. The rapid pattern of responding may have been a simpler solution for obtaining preferred foods than memorizing the rewards associated with a whole new set of transfer stimuli. We cannot draw any conclusions concerning Ranier’s picture recognition ability; however, poor performance does not necessarily indicate that he could not recognize the images. Another factor that is rarely considered is variation in visual acuity among the animals being tested. Many of the food images looked rather similar when photographed as they are prepared for feeding (e.g., small slices of sweet potato and carrot) and would be difficult to distinguish if an animal were simply nearsighted. Similar studies of baboons also show individual variability in their ability to recognize pictures of food (Bovet and Vauclair 1998) and other objects (Martin-Malivel 1998). Another possibility is that there may be individual differences in the ability of animals to recognize the content of the two-dimensional images presented on the touch screen. In interpreting any experiment in which images are intended to represent actual objects, an investigator should take into consideration that some animals may not translate the two-dimensional image into a mental representation of the item depicted.

Selection of images of preferred food items during transfer indicated that the animals recognized the content of the images, but did not necessarily mean that the animals made a connection between the image on the screen and the object they received. Animals may have recognized the content of the images and learned the general rule “select preferred food” knowing they would receive some form of preferred reward for their response, but may not have made the connection between the image and the particular object they received. The reaction time data could have been more helpful in inferring which cognitive process was occurring because faster reaction times for preferred foods might imply that animals expected to receive the particular piece of food in the image. Preferred foods were selected faster than non-preferred foods in three of the four conditions examined (Fig. 5), and the difference was statistically significant for the relative preference test, providing suggestive evidence that animals expected to receive what was depicted in the image. Additionally, based solely on anecdotal evidence, we would contend that the animals expected what they touched. Experimenters observed that the monkeys would become noticeably excited when a particularly preferred item was displayed on the screen and they would more quickly place their hand in the hole to be given those foods after touching the stimulus for that item. We recommend that reaction time analyses be pursued further to address this issue because our tests had few subjects and the low sample size reduced the power of the tests. We should also point out that reaction times were quite fast and there may have been a ceiling effect in that the animals could not have responded much faster to preferred foods as they were responding quite quickly to both types of items. In any case, the cognitive processes underlying choices deserve further investigation and continued testing of reaction times may inform the issue.

Our novel design combined elements of several techniques used to test for picture recognition. Ours was a “preference” study in that animals selected images of their preferred food items. We did not have to infer preferences because we recorded their known preferences for a variety of foods beforehand and these did not change. As such, we showed that animals would make “appropriate responses” to pictures to receive their preferred foods. Our design was similar to a recent study that attempted to take advantage of animals’ prior learning of a discrimination to test for picture recognition (Dittrich et al. 2010). Pigeons that were shown to react differently to the person who fed them than to other individuals were then tested to determine whether they would select images of their caretaker rather than images of other people. They did not, and the authors concluded that the birds did not recognize the images. The result shows the promise of the preferred image design in testing for picture recognition and, perhaps, a fundamental difference between macaques and pigeons.

Our design also incorporated an experimental approach in that we trained animals on one set of stimuli and tested them after transferring them to a novel set of stimuli. The transfer aspect is not unlike other studies in which the experimenter trains the animal to discriminate a particular type of image and then transfers it to a novel set of stimuli to determine whether it will continue to select the training stimulus (Watanabe 1993, 1997). One advantage of our method is that we did not train the animals to make a discrimination (preferred versus non-preferred food), but took advantage of their already-established discriminations between food items to test for picture recognition. If one trains an animal to discriminate one set of objects from another, one can never know whether the animal recognized the images as the actual entity when initially learning or whether it was using other attributes in the images to learn the discrimination. An animal could then use the same attributes on the transfer images without necessarily understanding what the pictures represented.

This confound was a possibility with our design if the preferred foods used in both training and transfer contained common physical cues that allowed animals to select images of preferred foods without recognizing the content, perhaps because preferred foods shared the same shape or color. Our comparison of the images indicated this was not the case. For example, “Banana” and “Peanut” were both among three of five animals’ most preferred foods (Table 1) and animals tended to select those images, yet the images of the foods bore no resemblance to each other. The “Banana” images were pictures of a transverse section of a thawed fruit, so they were circular with a dark brown peel circumscribing tan banana fruit. The “Peanut” images were of two nuts within their single shell. Similar discrepancies in appearance could be noted about many of the other preferred food images. Animals appeared to be selecting based on flavor rather than a generalized visual stimulus common to the preferred versus non-preferred foods. For example, four of five monkeys had both “Banana” and “Banana cake” in their top five choices and Henri preferred both “Peanut” and “Peanut butter cracker,” items similar in taste but not appearance. At the other end of the preference spectrum, many of the least preferred foods were green vegetables (Table 1), so animals may have generalized a “pick images without green” rule that helped in obtaining preferred foods without recognizing the content of the images. However, even though no animal’s top five foods were green, some fairly high-preference foods were (e.g., apple-flavored breakfast cereal). Also, many low-preference foods were vegetables that were not green (e.g., yellow squash and cauliflower), or were neither raw vegetables nor green (e.g., earthworms and popcorn). If an animal generalized a “pick images without green” rule, it would often receive non-preferred foods and sometimes forgo preferred ones. By testing animals on a non-visual cue (i.e., flavor preference), we reduced or removed visual cues as a source of confound for picture recognition and could conclude that they understood content. Aust and Huber (2006, 2010) emphasize the importance of demonstrating such “representational insight” in picture recognition studies.

Taken together, results of the transfer and relative preference experiments provide strong evidence that some macaques recognized the content of two-dimensional images displayed on a touch screen. Relating each animal’s known food preferences to its choices of food images allowed us to demonstrate picture recognition. Fagot et al. (1999) recommend such systematic testing because few studies using images as stimuli actually test for recognition of content. One change we might consider in replicating the experiment with another group of subjects would be to begin training without presenting real exemplars of the food in view of the animals. Successful initial training using this procedure would provide even stronger evidence for picture recognition and a picture–object association. We did not use this procedure because we were concerned that the animals would have difficulty learning the experimental protocol and that initial testing without exemplars in view might interfere with later training. Our concern was borne out as some animals took many sessions to learn the testing procedure even with the foods pictured in the images in view.

Finally, our data may help elucidate the cognitive processes the monkeys used as they evaluated the images, namely whether animals regarded the images with independence, confusion, or equivalence (Fagot et al. 1999). Three of our monkeys (Bert, Pierre, and Henri) showed evidence of picture recognition, ruling out independence between the visual stimuli presented on the screen and the objects they represented, although independence cannot be ruled out for the other two monkeys (Max and Ranier). Bert and Pierre may have treated the images with confusion at first as indicated by their spontaneous selection of preferred food images on their first day of paired training; however, several factors lead us to discount this explanation. First, they did not grab for the images as if to pick them up. Second, they had gone through the first phase of training in which they touched a single image on the screen while the experimenter was holding the item over the screen. The experimenter then placed the item where the monkeys could retrieve it through a hole to the right of the screen. The image remained on the screen during this process, so the monkeys could see that the image was not what they were receiving. Thus, prior to paired testing, they had experience learning that the image was different from the object they received. Third, animals had previous experience using the touch screen and were accustomed to touching stimuli on the hard flat surface with their fingertips and receiving rewards through the hole to the lower right of the screen. When they touched a correct stimulus, they would place their hand through the hole and await the reward. Bert and Pierre treated the food images in much the same way: touching them with their fingertip and quickly proffering their hand through the hole. They did not appear to treat the image on the screen as the object they expected to receive in their hand. Finally, by the time animals had completed training and testing, they had gained much experience touching the hard image on the screen and receiving something else through the hole while the image stayed on the screen. Taken together, these observations lead us to assume that they differentiated the image from the food.

If animals demonstrating picture recognition did not show independence or confusion, we must conclude that they regarded the images with equivalence and understood that the image was a representation of a real object. Henri’s pattern of performance may be the best evidence for possible equivalence. He did not exhibit confusion because he did not immediately begin to select his preferred foods during paired training. He did not seem to recognize the images as food at first, but gradually learned after many training sessions with food in view that the images could represent food (Table 2). Having learned this association with food items in view, he quickly completed the final phase of training, in which the foods in the images were no longer in view, in the minimum number of sessions. He then used his image recognition ability to obtain preferred foods when he was transferred to images of different familiar foods. Since he did not seem to treat the images as real objects initially, we would not expect him to start treating them as such (i.e., confusion) after realizing they could represent food. As such, his performance implied that a macaque developed equivalence, realizing that an image can represent an object without being the object, a process also referred to as dual representation (DeLoache 2004).

Interestingly, Henri’s performance appears similar to that of human infants, who gradually learn that a picture can represent an object and, as such, act as a symbol for an object (DeLoache et al. 1998; DeLoache 2004). On the other hand, if Bert treated the images with equivalence, then this adult monkey probably developed equivalence differently than a human infant. He did not seem to show confusion at first, as do human infants (DeLoache et al. 1998), and he did not seem to come to the realization of equivalence gradually since he spontaneously began selecting images of his preferred foods. Our study did not test for equivalence directly, so we can only speculate as to whether our monkeys perceived equivalence or thought symbolically. However, before one uses images to investigate these issues, one must first demonstrate that animals recognize the content of the image and the connection between a 2D image and a 3D object. Our design reliably allowed us to make these assertions, providing avenues for further investigation into the origins of symbolic thought in monkeys and other species.