Introduction

Since Herrnstein and Loveland (1964) showed that pigeons could learn to peck for reinforcement whenever pictures of people appeared on a screen and not to peck whenever pictures without people were presented, a lively area of research into the issue of animal categorization has developed (see Huber 2001). Many experiments following the initial one by Herrnstein and Loveland have yielded evidence of the amazing classification abilities of pigeons and other bird species, including, e.g., chickens, finches, and blue jays. Regarding mammals, research has mainly been carried out with primates (for overviews of the most relevant literature on animal categorization see, e.g., Cook 2001; Fagot 2000; Matsuzawa 2001; Zentall and Wasserman 2006).

In recent years, dogs have gained importance as subjects for studies investigating animal cognition (e.g. Adachi et al. 2007; Kaminski et al. 2004; Kubinyi et al. 2003; Miklosi et al. 2004; Pongracz et al. 2003; Svartberg 2005). This is partly due to their availability, as keeping lab animals is becoming more and more difficult. Furthermore, dogs have been selected to cooperate and communicate with humans (Hare et al. 2002; Hare and Tomasello 2005; Miklosi et al. 2003), which makes them exceptional among non-human animals in regard to their sensitivity to human-given communicative cues (Bräuer et al. 2006; Miklosi and Soproni 2006; Pongracz et al. 2004) and their trainability to perform actions that are not causally linked to a reward (Frank 1980).

Despite the increasing interest in dogs on the part of researchers in animal cognition, their categorization abilities have hardly ever been studied. The only experiment we know of that investigated categorization in dogs used acoustic stimuli. Heffner (1975) trained dogs to discriminate between two categories of sounds (“dog” vs. “non-dog” sounds). In a subsequent test, the dogs could also successfully categorize sounds to which they had not been exposed during training. Regarding visual categorization, evidence is lacking completely. Therefore, the present experiment was aimed at answering some basic questions concerning such abilities in dogs.

(1) Are dogs able to classify complex color photographs, which were chosen according to a perceptual class rule inferred by a human experimenter, as has been shown for birds and primates? What may be expected from an experiment requiring discrimination of pictorial stimuli will, of course, depend not only on the cognitive abilities of the subject under investigation, but also on its perceptual abilities, determined by the specific properties of that species’ visual system. Canid retinas contain predominantly rods, and only 3% of all photoreceptors are cones (Peichl 1991). Recent behavioral studies (Coile et al. 1989; Neitz et al. 1989) and visual-evoked potential studies (Aguirre 1978; Odom et al. 1983) have demonstrated that dogs possess dichromatic color vision with two classes of cone pigments, having spectral peaks at 429 and 555 nm. The temporal resolution of the cones seems to be a little higher in dogs (70–80 Hz) than in humans (50–60 Hz), whereas in rods it seems to be similar in both species (about 20 Hz; Aguirre 1978; Coile et al. 1989; Wadenstein 1956). Moreover, the retina of the dog contains about 150,000 ganglion cells (Arey and Gore 1942). The optic chiasm has a crossover of about 75% in the dog, consistent with good binocular vision. Although dogs have reduced color perception, image classification has already been shown not to be crucially dependent on color in pigeons (Aust and Huber 2001; Herrnstein and Loveland 1964; Huber et al. 2000), monkeys and humans (Delorme et al. 2000). We therefore expected no severe physiological limitations of the dog’s ability to classify color photographs, provided the category-specific aspects were not restricted to shades of red or very tiny fragments of the pictures.

(2) If dogs can sort color photographs according to an experimenter-intended rule, will they do so by actually attending to properties of the target? The first question to be answered was whether the dogs would learn individual category instances by rote on a pixel-by-pixel basis (e.g., in the form of fixed templates) or would form a representation allowing for some generalization. This could be examined by testing the dogs for transfer to novel pictures. The more interesting question, however, was whether the dogs would also be able to distinguish between category-relevant and category-irrelevant features. As the visual features relevant to a perceptual class rule obtain some coherence and constitute the “target”, the problem may be considered a target search task or a figure-ground separation task (Aust and Huber 2001; Greene 1983). Generally, categorization can be accomplished by relying on either item- or category-specific information. An item-specific strategy would require the subject to learn about the individual properties of each stimulus and their associations with (non-) reinforcement, i.e., class membership. A category-specific strategy would, instead, require the subject to extract and combine the features common to most (or maybe even all) instances of a class and then react in the same way to all stimuli possessing those features (Cook et al. 1990).

A simple and elegant way of assessing the role of item- and category-specific features is to bring information about the presence or absence of a target into conflict with information about the background. Although animals (and humans) are known to gather information not only about category-relevant features but also about properties of individual stimuli (e.g., Aust and Huber 2001; Greene 1983), possessing a class rule requires an ability to give precedence to the former when in conflict with item-specific information. In Aust and Huber (2001), it was investigated what types of information pigeons would use in a people-present/people-absent discrimination task. The key idea was to find out about the control exerted by any item-specific background feature as compared to that exerted by the experimenter-intended people-present/people-absent rule by pitting the two against each other. Aust and Huber (2001) showed that the most demanding task was that of novel people being presented on familiar backgrounds, as the familiar backgrounds had previously been paired with non-reinforcement and covered a large area in the pictures, whereas the people in the pictures were unfamiliar. In the present experiment, a similar test was carried out with the dogs in order to find out if they actually attended to properties of a target figure or rather relied on irrelevant background cues confounded with the presence of a target.

(3) If dogs can respond to properties of the target, do they accomplish the task by a rule that is equivalent to ours? Generally, correct classification behavior does not necessarily imply that an animal’s classification rules are identical to those of the experimenter and sometimes there is evidence that they are not. Instead, the animal may employ an alternative rule that parallels the intended one. Above all, conclusions on target representations are limited by the fact that it is usually not clear whether an animal recognizes the representational nature of pictures. It is well possible that a subject forms a similarity-based response rule simply by extracting common features from otherwise “meaningless” patterns. Such a strategy would clearly differ from the ones usually applied by humans, who immediately recognize figures shown in photographs as representations of their 3D-referents (see, e.g., Aust and Huber 2006). Furthermore, humans and animals may differ in the features that they are able to detect, find salient, and/or judge relevant.

There are, however, two possibilities to gain at least some insight into the nature of the response rule applied by an animal. First, one may conduct tests with pictures, whose informational content is systematically varied, as did, for example, Troje et al. (1999) and Huber et al. (2000). Second, one may analyze the cases in which classification “goes wrong”. Classification “errors” are a useful source of information regarding the nature of the internal representation, as persistent reliance on irrelevant features argues against an accurate, subtly differentiated target representation (D’Amato and Van Sant 1988).

The present experiment investigated those three basic questions in the context of categorization by dogs. Their task was to discriminate color photographs showing dogs from photographs showing landscapes without any dogs. In contrast to former studies with pigeons (see Huber 2001; Huber and Aust 2006 for reviews), we used a simultaneous presentation of a positive and a negative stimulus. In particular, the animals were first trained to choose the former in preference to the latter in a forced two-choice procedure and were then tested for generalization to novel instances of the two classes. Successful transfer would indicate an ability to classify the pictures on a basis beyond that of strict rote learning on a pixel-by-pixel basis. In the subsequent “reversed contingencies” test, it was investigated whether categorization was actually coupled to the features of the target or rather guided by memorizing item-specific stimulus properties. Therefore, the subjects were tested with pictures showing novel dog figures mounted onto previously negative backgrounds. Finally, classification errors were analyzed to get further insight into the nature of the representation underlying the dogs’ categorization performance.

Method

Subjects

The subjects were one Border Collie (Maggie), one Border Collie mix (Lucy), one Australian Shepherd (Bertl), and one mongrel (Todor). Two dogs were male (Bertl, Todor), two were female (Maggie, Lucy). At the time of the experiment, the dogs were between 1.5 and 3.5 years old. All dogs were maintained on a normal diet that was not changed during the testing days (e.g. three dogs had a small breakfast before the tests). Three owners came with their dogs to the testing sessions and one dog was picked up at the owner’s house by the experimenter twice a week. The dogs were companion dogs with basic obedience training and were naïve to the experimental task.

Stimuli

The stimuli were taken from the World Wide Web and consisted of a total of 120 dog pictures and 120 landscape pictures. The dog pictures involved a variety of settings, and the dogs shown varied with respect to number, identity, sex, breed, age, and size. Furthermore, they differed from each other with regard to their position within the picture, their posture, the context in which they were acting, and the viewing angle. Some pictures showed close-ups of the head, whereas others showed full-body shots. The landscape photos also varied, from mountains to plains and from summer to winter shots. Examples of the 40 dog and 40 landscape training stimuli are shown in Fig. 1a–d. In the generalization test, 40 novel dog and 40 novel landscape test stimuli were presented (Test 1). Examples of test stimuli are shown in Fig. 1e–h.

Fig. 1

a–d Training stimuli (a, b: S+; c, d: S−); e–h Test 1 stimuli (e, f: S+; g, h: S−); i example of the combined S+ in Test 2. It is derived from a novel dog picture (j) and a familiar training stimulus (k)

The stimuli used in Test 2 consisted of novel dog pictures presented against backgrounds of familiar landscape pictures used in the training. Figure 1i shows such a composed stimulus as well as the original pictures (dog and landscape) from which it was derived. The stimuli (novel dogs on familiar backgrounds) provided a contradiction between the class rule (dog-present) and previously experienced background contingency (landscapes). A total of 40 stimuli was created and presented with novel landscape pictures.

Apparatus

Testing was conducted in a separate room at the university to prevent distraction of the dogs. The test apparatus stood on the floor and consisted of a closed rectangular box housing the pellet dispenser (feeder box; 40 × 70 × 40 cm, width × height × depth) and an adjacent rectangular testing enclosure (40 × 70 × 40 cm), separated from the feeder box by an opaque partition (Fig. 2). The testing enclosure allowed the dogs to reach the touch-screen but also shielded their view to avoid distractions from the side and above. Inside the testing enclosure, a 15-inch TFT display was mounted onto the partition. The monitor was equipped with an infrared touch frame (Carroll Touch, Round Rock, TX; 32 vertical × 42 horizontal resolution; Huber et al. 2005; Pisacreta and Rilling 1987). The distance between the array of light-emitting diodes and the screen was 1 cm. The base of the touch-screen was 42 cm above the ground. Stimuli were presented at a size of 150 × 111 pixels, producing a 5.29 × 3.92 cm picture on the monitor. Reinforcement was administered in the form of small commercial dog food pellets, which were made available through a small hole beneath the touch-screen (its base was about 3 cm above the floor). They were delivered by an automated feeding device that was hidden inside the feeder box.
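The relation between the stimulus size in pixels and its physical size on the monitor can be checked with a short calculation. The sketch below is illustrative only: the value of 72 pixels per inch is an assumption inferred from the reported figures (it reproduces 5.29 × 3.92 cm), not a display specification given in the text, and the function name is hypothetical.

```python
CM_PER_INCH = 2.54


def pixels_to_cm(pixels, pixels_per_inch=72):
    """Convert a length in pixels to centimetres.

    pixels_per_inch=72 is an assumed value chosen because it matches the
    reported picture size; the actual monitor density is not stated.
    """
    return pixels * CM_PER_INCH / pixels_per_inch


width_cm = round(pixels_to_cm(150), 2)   # stimulus width of 150 pixels
height_cm = round(pixels_to_cm(111), 2)  # stimulus height of 111 pixels
print(width_cm, height_cm)
```

Under this assumption the computed size agrees with the reported 5.29 × 3.92 cm picture.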

Fig. 2

Schematic drawing of the apparatus (a) and photograph of a dog working in the box (b)

Data acquisition and device control were handled with a microcomputer interfaced through a digital input–output board. To reduce nervousness in the dogs, a human needed to be present in the testing room during the experiments. In order to control for possible social cues, the experimenter was standing or sitting next to the testing enclosure but was unaware which stimulus was being presented.

Procedure

Before we introduced the experimental task, the dogs were accustomed to the apparatus and the food delivery system during several sessions. First, they were trained to touch the monitor with their nose by means of a “clicker”-aided shaping procedure, defined as a subset of operant conditioning using positive reinforcement, extinction and negative punishment. We used the same food pellets that were delivered as rewards by the feeding device. Once the dogs were accustomed to touching the screen with their nose, they were trained to touch a stimulus (yellow circle or square) appearing on the otherwise black screen. The stimulus was randomly placed on the screen and changed its position from trial to trial. This feature forced the dogs to search the whole screen in order to locate the stimulus. This second step of training was a combination of clicker training and automated responses. If the dogs hit the stimuli directly with their nose and thereby interrupted the infrared light grid in front of the screen, they triggered an acoustic signal and delivery of a food pellet by the feeding device. However, in order not to frustrate the dogs (who need some practice to learn how to touch the screen so that a response is registered), the clicker was still used to reward attempts.

When the dogs successfully responded to the stimuli presented, discrimination of simple forms (circle vs. rectangle) was introduced to familiarize the dogs with the forced two-choice procedure used for all following experiments. Each trial consisted of one positive and one negative training stimulus, presented simultaneously on a black background in fixed positions (i.e., at about the animal’s eye level, with one stimulus appearing somewhat left of the middle of the screen and the other one appearing somewhat right). The positions (left/right) of S+ and S− varied randomly from trial to trial. Each session consisted of 30 trials. Touching the correct stimulus (S+) immediately terminated presentation (i.e., both stimuli disappeared from the screen) and food was provided. Touching the wrong stimulus (S−) resulted in a correction trial, i.e., stimulus presentation was terminated and the color of the touch-screen turned to red for 3 s. Then the stimuli of the previous trial were shown again in the same positions as before. Another wrong choice then led to another correction trial, whereas a correct choice terminated the trial and led to food access. To enhance learning, correct choices were indicated by a short tone followed by a food reward, and incorrect choices were indicated by a buzz. Each trial (except correction trials) was followed by a 4 s inter-trial interval (ITI), during which an empty black background was shown.
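The trial logic described above (random left/right placement, correction trials repeated until a correct choice, only first choices scored) can be summarized in a short simulation. This is a minimal sketch, not the control software used in the study: the function names and the `choose` callback, which stands in for the dog, are hypothetical, and timing (3 s red screen, 4 s ITI) is noted only in comments.

```python
import random


def run_trial(choose, rng):
    """Run one forced two-choice trial, including any correction trials.

    `choose(s_plus_side)` is a hypothetical stand-in for the subject: it is
    given the side showing S+ and returns the side touched ("left"/"right").
    Returns (first_choice_correct, number_of_correction_trials).
    """
    s_plus_side = rng.choice(["left", "right"])  # S+ position varies randomly per trial
    first_correct = None
    corrections = 0
    while True:
        correct = choose(s_plus_side) == s_plus_side
        if first_correct is None:
            first_correct = correct  # only the first choice is scored
        if correct:
            # tone and food reward; a 4 s inter-trial interval would follow
            return first_correct, corrections
        # buzz and 3 s red screen; same stimuli shown again in the same positions
        corrections += 1


def run_session(choose, n_trials=30, seed=0):
    """Return (number of correct first choices, total correction trials)."""
    rng = random.Random(seed)
    results = [run_trial(choose, rng) for _ in range(n_trials)]
    return sum(first for first, _ in results), sum(c for _, c in results)
```

A simulated subject that always touches S+ yields 30 correct first choices and no correction trials; one that always errs first and then corrects itself yields 0 correct first choices and 30 correction trials.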

The animals were transferred to a second discrimination training as soon as they reliably performed at or above the learning criterion. The criterion required ≥22 correct first choices in 30 trials (which equals 73%) in three out of five consecutive sessions (correction trials were not considered). In the second training step, the dogs were accustomed to discriminating between pictures. Three underwater pictures had to be distinguished from three pictures of paintings or vice versa. The same criterion for successful discrimination as before was used. During these training tasks, we increased the number of sessions the dogs had to finish on a given day to four. Thus, when they started with the crucial dog-landscape discrimination task, all four dogs were able to complete four sessions, with a 10 min break after the first two sessions, on a given testing day. The dogs usually completed all four sessions within 45 min. Three dogs (Lucy, Bertl, Todor) came twice a week for testing, whereas the remaining dog came once a week.

The dog-landscape discrimination was introduced as soon as the dogs reliably solved the form and picture discrimination problems. We used a total of 40 dog and 40 landscape training stimuli and each session consisted of 30 trials. The S+/S− pairings were varied so that a single combination occurred only once in 340 trials. The sequences of presented stimuli were randomized across sessions. After 340 trials, the combinations of stimuli and the sequence of presentation were repeated. The criterion was set at ≥24 correct first choices in 30 trials (which equals 80%) in four out of five consecutive sessions (i.e., correction trials were not considered).
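The learning criteria used here (≥24 of 30 correct first choices in four out of five consecutive sessions, and ≥22 of 30 in three out of five for the preliminary tasks) amount to a sliding-window check over per-session scores. The following is a sketch under one stated assumption: a full window of five sessions must have been completed before the criterion can be met, which the text does not specify. The function name and example scores are hypothetical.

```python
def sessions_to_criterion(correct_per_session, min_correct=24, needed=4, window=5):
    """Return the 1-based session number at which the criterion is first met.

    correct_per_session: correct first choices per 30-trial session
    (correction trials are not counted). Returns None if never met.
    Assumes a complete window of `window` sessions is required.
    """
    for end in range(window, len(correct_per_session) + 1):
        recent = correct_per_session[end - window:end]
        if sum(score >= min_correct for score in recent) >= needed:
            return end
    return None


# Hypothetical scores, for illustration only (not data from the study).
scores = [20, 25, 24, 23, 26, 24, 27]
print(sessions_to_criterion(scores))                             # dog-landscape criterion
print(sessions_to_criterion(scores, min_correct=22, needed=3))   # form/picture criterion
```

With these illustrative scores, the stricter dog-landscape criterion is met one session later than the criterion used for the form and picture discriminations.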

In Test 1 (S+ = novel dog picture; S− = novel landscape) and Test 2 (S+ = novel dog picture on familiar landscape; S− = novel landscape), 40 test stimuli pairs were interspersed into sequences of ordinary training stimuli at a rate of 10 per session, thereby replacing an equal number of arbitrarily selected training stimuli. Each test thus consisted of four consecutive sessions conducted on the same day. Correct responses on test trials were rewarded as well since none of the test stimuli were presented more than once.
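The interspersing of test pairs into training sessions can be sketched as follows. This is an illustrative reconstruction rather than the original scheduling code: the function name, the random selection of the 20 training pairs per session, and the shuffling of trial order are assumptions, while the counts (4 sessions, 30 trials, 10 test pairs per session, each test pair shown once) follow the text.

```python
import random


def build_test_sessions(training_pairs, test_pairs, n_sessions=4,
                        trials_per_session=30, tests_per_session=10, seed=0):
    """Distribute test pairs across sessions, each shown exactly once.

    Every session holds `tests_per_session` test trials; the remaining
    slots are filled with randomly chosen training pairs (an assumption).
    Returns a list of sessions; each trial is a (kind, pair) tuple.
    """
    rng = random.Random(seed)
    shuffled_tests = list(test_pairs)
    rng.shuffle(shuffled_tests)
    sessions = []
    for s in range(n_sessions):
        tests = shuffled_tests[s * tests_per_session:(s + 1) * tests_per_session]
        training = rng.sample(list(training_pairs), trials_per_session - len(tests))
        trials = [("training", p) for p in training] + [("test", p) for p in tests]
        rng.shuffle(trials)  # test trials are interspersed among training trials
        sessions.append(trials)
    return sessions
```

Because no test pair is repeated, rewarding correct test responses cannot produce learning about a specific test stimulus that would carry over to a later presentation.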

Results

Figure 3 depicts the results of the training phase. To meet the criterion, the dogs had to perform at 80% correct or better in four out of five consecutive sessions. All dogs learned to discriminate between the two classes and reached the learning criterion. However, variance among the four dogs was high, with one dog needing as few as 24 30-trial sessions and one dog requiring as many as 68. Interestingly, the two male dogs (Australian Shepherd and mongrel) needed about twice as many sessions as the two Border Collie females. See Figs. S1 and S2 for examples of correctly and incorrectly classified stimuli.

Fig. 3

Percentage of correct first choices of the four subjects in the training trials. The dashed line represents chance level (50%), the solid line represents the one-sided significance level of 0.05 (66.66% correct first choices) and the dotted line 80% correct first choices, which was the criterion required to be reached in four out of five consecutive sessions. Lucy and Maggie were females, Bertl and Todor males

The results of Test 1 are illustrated separately for each dog in Fig. 4 as percent correct discrimination on the test trials in comparison with performance on the training stimuli presented in the test sessions. Discrimination performance in both tests (Tests 1 and 2) was assessed by means of one-sided binomial tests (“chance” probability = 0.5). Three subjects performed ≥80% correct on the familiar training stimuli as well as on the novel stimuli, which is significantly above chance (Binomial test, P < 0.0001 for each subject). Although Todor dropped to 72.5% correct discrimination in the transfer trials (Binomial test; P = 0.0032), he showed a similarly low level of performance on the familiar training trials, which suggests a general concentration problem on the day of testing.
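The one-sided binomial test against a chance probability of 0.5 can be computed exactly from the binomial distribution. The sketch below uses only the Python standard library; the function names are illustrative and not taken from the original analysis. It reproduces the P value reported for Todor (29 of 40 correct, i.e., 72.5%) and the smallest numbers of correct choices that reach the one-sided 0.05 level for 30, 40 and 80 trials.

```python
from math import comb


def binom_p_one_sided(k, n, p=0.5):
    """Exact probability of k or more successes in n trials (upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def min_significant_correct(n, alpha=0.05, p=0.5):
    """Smallest number of correct choices with one-sided P below alpha."""
    for k in range(n + 1):
        if binom_p_one_sided(k, n, p) < alpha:
            return k
    return None


print(round(binom_p_one_sided(29, 40), 4))  # 72.5% correct in 40 test trials
for n in (30, 40, 80):
    k = min_significant_correct(n)
    print(n, k, round(100 * k / n, 2))       # threshold as count and percentage
```

The thresholds obtained this way (20 of 30, 26 of 40 and 48 of 80) correspond to the 66.66%, 65% and 60% significance lines given in the figure captions.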

Fig. 4

Percentage of correct choices in Test 1 of the four subjects. Light grey bars depict the dogs’ performance on the familiar training trials during the test sessions and dark grey bars the performance on the 40 novel S+ and S− stimuli. The dashed line represents chance level (50%), the solid line represents the one-sided significance level of 0.05 (65% correct first choices; n = 40) for the test trials. The dotted line represents the one-sided significance level of 0.05 (60% correct first choices; n = 80) for the training trials. Results of the Binomial test: *P < 0.05; **P < 0.01; ***P < 0.001

Figure 5 summarizes the results of the second test, in which picture combinations of new dog photos and familiar landscape photos were presented together with novel landscapes. In the combined stimuli, the class rule was pitted against item-specific information about former background contingency. All dogs performed significantly above chance on the test trials (Binomial test: Lucy: P < 0.0011; Maggie: P < 0.0082; Bertl: P < 0.040; Todor: P < 0.019), although three of them performed at a lower level than on trials with training stimuli.

Fig. 5

Percentage of correct choices in Test 2 of the four subjects. Light grey bars depict the dogs’ performance on the familiar training trials during the test sessions and dark grey bars the performance on the 40 combined S+ versus 40 new S− stimuli. The dashed line represents chance level (50%), the solid line represents the one-sided significance level of 0.05 (65% correct first choices; n = 40) for the test trials. The dotted line represents the one-sided significance level of 0.05 (60% correct first choices; n = 80) for the training trials. Results of the Binomial test: *P < 0.05; **P < 0.01; ***P < 0.001

Discussion

The aim of the present study was to gain some insight into the visual categorization abilities of dogs. To this end, we trained dogs to classify photographs according to the presence or absence of dogs and tested them for transfer to novel stimuli as well as to stimuli providing contradictive information regarding their content of item- versus category-specific properties.

Although we found a high variance among dogs in the number of sessions they required to reach the criterion, all dogs eventually mastered the task. Whether the observed difference in learning between the two males and two females reflected an actual sex difference or a breed difference (two Border Collies vs. Australian Shepherd and mongrel), or was due to individual variation (regarding, e.g., attention or motivational state), cannot be assessed on the basis of such a small sample.

All dogs (regardless of sex and breed) showed excellent transfer to novel stimuli in Test 1 (≥72% correct), with only very small decrements as compared to training performance. Such transfer may be accounted for equally well by physical similarities between individual familiar and novel exemplars, i.e., item-specific information (Cook et al. 1990; D’Amato and Van Sant 1988; Greene 1983; Lea 1984), and by the acquisition of a category-specific representation of the underlying class rule. Thus, successful transfer per se was not indicative of what information had entered the dogs’ representation of the training stimuli. Determining whether they accomplished the task by means of item- or category-specific information thus needed a more stringent test.

An item-specific strategy requires a subject to learn about the individual properties of each stimulus and their associations with (non-) reinforcement, i.e., class membership. A category-specific strategy, instead, requires a subject to extract and combine features common to most (or maybe even all) instances of a class and then to react in the same way to all stimuli possessing those features (Cook et al. 1990). To assess which strategy was used by the dogs, we conducted the second test, where information about the presence or absence of a target (i.e., a dog figure) was brought into conflict with information about the background. We found that although performance was poorer for three dogs on the test stimuli than on the training stimuli, discrimination of the former was nevertheless significant in all subjects. Similarly, Aust and Huber (2001) found that in pigeons the presentation of pictures providing contradictive information did not disrupt performance either (at least in birds for which the person-present class was the positive one), which suggests that both dogs and pigeons made use of a category-based response rule with classification being coupled to category-relevant features. These results are in sharp contrast with what Greene (1983) reported from a similar test. She found that when new positive slides were introduced, consisting of the target (a particular person) added to former negatives, pigeons treated these slides as if they were still negatives. In turn, all new negatives, generated by removing the target from former positives, were treated as if they were still positives. Obviously, the pigeons’ responding was controlled by irrelevant background cues rather than by the experimenter-intended class rule.

Although our subjects classified the stimuli according to the experimenter-defined class rule (dog/non-dog), the possibility remains that they employed an alternative response strategy that paralleled the intended one. In the extreme case, no class rule at all might have been used, with performance being instead based on the extraction of simple invariants that inadvertently correlated with the presence or absence of dogs (see, e.g., Monen et al. 1998). In fact, animals have repeatedly been found to use quite surprising cues that help them distinguish between categories (e.g., D’Amato and Van Sant 1988; Greene 1983; Huber et al. 2000; Troje et al. 1999).

Studies carried out with pigeons (Aust and Huber 2001, 2002, 2003; Herrnstein and De Villiers 1980; Herrnstein and Loveland 1964; Herrnstein et al. 1976; Huber et al. 2000; Troje et al. 1999) and monkeys (D’Amato and Van Sant 1988) have shown that the subjects did not extract exactly the same information from people-present/people-absent photographs and/or organize it in the same way as do humans. Furthermore, in several experiments pigeons were found to respond accurately to what, to the human observer, were atypical instances of an experimenter-defined category (Herrnstein and De Villiers 1980; Herrnstein et al. 1976; Roberts and Mazmanian 1988). Such results have raised serious doubts on the equivalence of target representations built by humans and other species (see also McIlvane et al. 2000). At best, one may tentatively conclude that they are working with overlapping, but not precisely equivalent, categories (Herrnstein 1990). The findings of the present study with dogs lend further support to that notion.

Within this context it should also be stressed that extraction of category-relevant features does not mean that the dogs recognized the positive instances as representations of real dogs (and the negative ones as representations of mountains, rivers, or forests), as would certainly have been the case with human subjects. This would require the ability to see the equivalence of pictures and objects, which cannot be inferred from the present study. Therefore, it is impossible not only to draw any conclusions about the level on which the dogs recognized the pictures (e.g., dogs, quadrupeds, animals), but even to decide whether they saw anything “meaningful” in them at all. However, these are aspects our experiment was not aimed at investigating. Still, the mere fact that the dogs were able to classify photographs of natural stimuli by means of a perceptual response rule already answers some essential (and hitherto open) questions on this species’ visual categorization abilities.

Regarding the methodological aspect of the present study, the results show that dogs, like other animal species (such as, e.g., pigeons or primates), can be trained to solve visual discrimination tasks carried out with a two-choice touch-screen procedure. This is an important finding for two reasons. First, it shows that the specific properties of the visual system of dogs do not constitute an actual obstacle to successfully handling such tasks. However, it is possible that the reduced acuity (Odom et al. 1983) and the way in which dogs perceive color (Neitz et al. 1989) impair learning speed and/or discrimination abilities in comparison with other species that have different visual systems (e.g., primates and birds). Pigeons, for example, may use different features (e.g. color, brightness) as important sources of information for classifying novel stimuli (see, for reviews, Huber 1999, 2001; Huber and Aust 2006). Thus, especially if stimuli are very complex and require attention towards specific features and/or colors, performance of dogs might be inferior to that of other species. However, further studies are needed to elucidate in more detail the visual discriminative abilities of dogs and their performance has to be compared to that of other species in similar tasks before any strong conclusions can be drawn.

Second, successful demonstration of learning in our automated touch-screen procedure yields evidence of the dogs’ categorization abilities with social cueing being almost excluded. Actually, one big concern in regard to traditional dog experiments is the close relationship between owners and dogs, which, depending on the raising and training of a dog, can be analogous to the mother–child relationship (Topal et al. 1998). As a consequence, dogs usually need to be tested in the presence of their caretakers in order to establish a relaxed and natural experimental situation, which bears the risk of the owner consciously or subconsciously influencing the dog’s behavior (“Clever Hans Effect”; Pfungst 1907). The method employed in the present study is innovative insofar as the owner and/or the experimenter (though present) do not see the stimuli presented on the touch-screen and are thus unable to influence the dog. Furthermore, interestingly and rather unexpectedly, all dogs that have so far been subjected to touch-screen training in our lab (N = 15) have shown high motivation to work.

In summary, we found that the dogs were able to classify photographs of natural stimuli by means of a perceptual response rule. Moreover, we are confident that the touch-screen method will allow for investigating a number of questions including individual learning abilities of dogs, memory, differences among subjects based on training experiences, sex differences etc. At the same time, an automated touch-screen procedure allows for escaping the usual trade-off between the risk of social cueing and a decrease in the dogs’ motivation to work. And, finally, the present procedure may prove a powerful means for testing a wide variety of bird and mammal species on the same tasks under almost identical experimental conditions.