Introduction

Learning abstract concepts involves the ability to compare stimuli to one another and to judge their relationship according to rules (e.g., ‘same as’, ‘different from’, ‘larger than’, and ‘smaller than’). One of the most important relational concepts is the identity concept, i.e., the ability to respond to the identity relationship among stimuli. The extent to which a species is able to process this kind of relationship is indicative of its relative capacity for processing abstract concepts (Herrnstein 1990; Premack 1978; Thomas 1980; Thompson and Oden 2000). It has been proposed that animals may learn identity between two stimuli in two different ways: (1) in a very specific way that applies only to familiar stimuli after training (i.e., by item-specific learning) and (2) in a way that goes beyond the stimuli used in the original discrimination training and that might be applied to completely novel stimuli (i.e., by relational learning) (D’Amato and Salmon 1984). In the latter case, the learning process involves judging a relationship between stimuli based on an abstract rule that can be applied to novel stimuli, i.e., abstract concepts (Premack 1978).

Studies of concepts in animals have almost exclusively employed two instrumental tasks, the same/different (S/D) discrimination and the matching-to-sample (MTS) tasks. In S/D discrimination tasks, a pair of stimuli is presented either simultaneously or successively, and the subject is rewarded for giving different responses depending on whether the stimuli are the same or different. In the MTS task, the subject is first presented with an individual stimulus, the sample, which is followed by two or more comparison stimuli. The subject is rewarded for responding to the comparison stimulus that matches the sample. In conditional or symbolic matching tasks, the sample and comparison stimuli are not identical but there is a specific, albeit arbitrary, relationship between them; in the identity matching task, the relationship that is involved requires a judgment based on the physical features of the stimuli. In both MTS and S/D tasks, the acquisition of abstract concepts is inferred from the subject’s ability to match/discriminate novel stimuli in transfer tests. In fact, the ability to apply an abstract rule to novel stimuli cannot be ascribed to learning the stimulus characteristics and represents a case of true relational concepts (Katz et al. 2007; Premack 1978; Thomas 1980, 1986).

To date, evidence of the ability to acquire an identity concept has been reported in a number of non-human species: chimpanzees (Nissen et al. 1948; Oden et al. 1988); capuchin monkeys (Barros et al. 2002; D’Amato and Salmon 1984; D’Amato et al. 1985); baboons (Bovet and Vauclair 1998, 2001); macaques (Katz et al. 2002; Mishkin and Delacour 1975; Wright et al. 2003); sea lions (Kastak and Schusterman 1994; Pack et al. 1991); dolphins (Herman and Gordon 1974; Herman et al. 1989; Mercado et al. 2000); rats (Peña et al. 2006); pigeons (Cook et al. 1997; Wright 1997; Wright et al. 1988; Zentall et al. 1981); parrots (Pepperberg 1987); and corvids (Wilson et al. 1985). However, a growing body of experimental data suggests that the methodologies used to determine many of the conditions under which this kind of ability occurs are far from straightforward.

It has been argued that there are fundamental differences in the way primates and non-primates process identity relationships (D’Amato and Salmon 1984; D’Amato et al. 1985, 1986). Monkeys trained to match samples with a limited number of stimuli apparently exhibited a good degree of transfer to novel stimuli (e.g., D’Amato 1971; Weinstein 1941), whereas pigeons under comparable procedures showed little generalisation (D’Amato et al. 1986; Urcuioli and Nevin 1975; Zentall and Hogan 1978). Two different explanations, based on the specific characteristics of the stimuli used during the training (i.e., item-specific learning), have been proposed for the failure of pigeons to transfer to novel stimuli. Animals could solve the MTS task either by configural pattern learning (i.e., by learning the correct responses for the different gestalt configurations formed by the stimuli presented on the display) or by using an if-then rule (i.e., by learning specific stimulus response associations between the sample and the matching stimulus) (Carter and Werner 1978; Wright 1997, 2001). Further studies demonstrated that when pigeons failed to learn a full-abstract concept, they learned to solve the matching task using both an if-then learning strategy as well as a configural pattern strategy (Katz et al. 2008; Wright 1997) and were able to use an abstract identity rule using particular procedures (Lombardi 2008; Wright and Delius 2005). For example, Wright (1997, 2001) showed that pigeons tested in a matching-to-sample task were able to fully transfer their matching ability to completely novel stimuli if they were required to peck the sample item 20 times (fixed-ratio 20, FR20). Moreover, pigeons tested in a matching-to-sample task showed a better transfer performance when trained with large stimulus sets than with small stimulus sets (Bodily et al. 2008; Wright et al. 1988). The set size effect also occurred when pigeons were faced with an S/D task (Katz et al. 2002, 2007; Katz and Wright 2006; Wright and Katz 2006).

The important role of experimental procedures used to investigate sameness–difference judgments is also evident in studies on non-human primates. Similarly to pigeons, both rhesus macaques and capuchin monkeys administered an S/D task showed better transfer performance after training with a large set of stimuli (Katz et al. 2002; Wright and Katz 2006; Wright et al. 2003). Moreover, macaques, like pigeons, showed better transfer when trained with 10 sample observing responses (FR10) compared with no sample observing responses (FR0) (Wright et al. 2003). Finally, Galvão et al. (2005, 2008) adopted an individualised approach and reported that tufted capuchin monkeys were able to match stimuli following an identity rule when trained with a step-by-step procedure in which each individual gradually acquired complex repertoires. In fact, in this kind of procedure, an individual’s performance was analysed in every phase to avoid the selection of correct responses due to unplanned variables and to obtain a response behaviour that is controlled by the relationship between the sample and the matching stimulus (i.e., true matching-to-sample).

Some studies indicated that capuchins can acquire the ability to use an identity concept using a relatively small number of stimuli in the training phase (Barros et al. 2002; D’Amato and Salmon 1984; D’Amato et al. 1985). Unfortunately, a very limited number of stimuli were also repeatedly presented in the transfer phases in these studies. Reiterated stimulus presentation, together with differential rewarding for correct and incorrect choices in the transfer tests, does not completely rule out the possibility that the monkeys solved the problem with the new stimuli using very rapid associative learning processes. In their review on the recent advancements in abstract-concept learning, Katz et al. (2007) suggested several criteria that distinguish the acquisition of abstract concepts from alternative explanations. First, they proposed that the transfer stimuli must be novel and dissimilar to the training stimuli. Second, the transfer stimuli should not be repeated, and when repetition is necessary, the appropriate statistical analyses should be applied to assess the stability of performance across repetitions. Third, performance in the transfer stimuli should not differ from performance in the training stimuli.

In the present study, an identity matching-to-sample (Id-MTS) task was used to evaluate the ability of tufted capuchin monkeys (Cebus apella), a New World monkey species, to discriminate between individual items on the basis of an abstract identity rule (i.e., identity concept). The main aim of this study was to assess the conditions necessary for acquisition of a widely applicable matching concept in this species. To this end, we examined the effect of the stimulus set size on the capuchins’ ability to transfer to novel stimuli. Moreover, we assessed the capacity of these monkeys to transfer the matching ability from one visual dimension to others and maintain good performance when familiar stimuli were presented in new spatial dispositions. The study included two experiments. Experiment 1 was aimed at evaluating the ability of capuchin monkeys to match the shape of novel stimuli after a training with small stimulus sets. The main aim of Experiment 2 was to assess the ability of the monkeys to match novel stimuli according to shape after a training with a large stimulus set and the successive ability to transfer the concept to novel visual dimensions, such as colour and size. Moreover, in Experiment 2, we examined the capacity of capuchin monkeys to maintain/restore good performance when the spatial arrangement of the stimuli was varied from that used during the previous phases of the study.

Experiment 1

In Experiment 1, the subjects had to match figures according to identity by shape. Pairs of novel figures were gradually presented, one at a time. A new pair was introduced only after the monkeys reached the learning criterion on the previous pair of stimuli presented. We hypothesised that the monkeys would require fewer trials to reach the learning criterion as the number of pairs presented increased. Moreover, on the basis of apparent success in immediate transfers, we expected the capuchins to perform better in matching the last pairs presented than with the first ones.

Method

Subjects

The subjects consisted of six tufted capuchin monkeys (Cebus apella), of which three were males (Robot, Sandokan, and Rubens) and three were females (Pippi, Carlotta, and Roberta). All subjects were adults (5–26 years old) born in captivity. Monkeys were hosted at the Primate Center of the Institute of Cognitive Sciences and Technologies, CNR, Rome, Italy. They belonged to four groups, each housed in an indoor–outdoor enclosure (indoor: 5 m² × 2.5 m high; outdoor: 40–130 m² × 3 m high). Capuchins were individually tested in an experimental cage (0.76 m long × 1.70 m wide × 0.73 m high), to which they had access through a sliding door from the adjacent indoor cage. Each subject was separated from the group just before the daily testing session solely for the purpose of testing. The testing occurred between 10:30 a.m. and 4:00 p.m. Water was freely available at all times. Fresh fruit, vegetables, and monkey chow were provided in the afternoon after testing.

Two monkeys, Pippi and Rubens, were already familiar with the matching-to-sample procedure because they had been previously trained on tasks involving pattern discrimination and categorisation of visual stimuli (for details, Spinozzi et al. 2003, 2004). However, they had never been tested with a touch screen apparatus. The four remaining subjects (Carlotta, Roberta, Robot, and Sandokan) had never performed MTS tasks before.

Apparatus

The computerised test apparatus consisted of a personal computer (Model AMD Athlon 1200) connected to a 19″ touch screen (Model E96f+SB, CRT, ViewSonic) and an automatic food dispenser (Model ENV-203-45, MED Associates, Inc., Georgia, VT) (see Fig. 1). The E-Prime software (Psychology Software Tools, Inc.) was used as the stimulus generator and served both to present the stimuli and to record the response behaviour. The food dispenser was designed to deliver one 45-mg banana-flavoured pellet (TestDiet, Richmond, IN, USA) when the monkey provided a correct response during the experimental trial. The pellet was delivered into a Plexiglass feeding cup (10 cm wide × 5 cm deep × 3.5 cm high) located 16 cm below the touch screen in the centre. A wooden frame (48 cm wide × 64 cm high × 30 cm deep) with a central aperture (36 cm wide × 26 cm high) surrounded the touch screen. The food dispenser was placed behind the wooden frame, out of sight of the subject. Moreover, a supplemental LCD monitor was placed at the back of the touch screen to allow the experimenter to follow the progress of the session and to remove the apparatus at the end of the session.

Fig. 1. Experimental apparatus

The touch screen, food dispenser, and supplemental LCD monitor were mounted on the top shelf of a trolley (81 cm long × 45 cm wide × 80 cm high), whereas the personal computer was mounted on the bottom shelf of the trolley.

The apparatus was placed approximately 15 cm from the grid of the experimental cage (see Fig. 1). The grid was made of horizontal metal bars (0.5 cm thick) that were separated by 4.5 cm. To touch the stimuli displayed on the touch screen, the subject could stretch its arm through the bars.

Stimuli

Two different sets of stimuli were used. Set I included 22 white shapes, created using Microsoft PowerPoint, that were presented in 11 pairs (Fig. 2a). Set II included 200 computer icons (which comprised both colour as well as black and white shapes), presented in 100 pairs such that the two figures that formed a pair had the same colour/s (Fig. 2b). Each figure was on average 3 cm × 3 cm (11.3° of visual angle) and was presented within a black frame (6.5 cm × 6.5 cm). All stimuli were transformed into bitmap images for stimulus presentation on the computer screen.

Fig. 2. Examples of figures included in the different sets of stimuli used in Experiments 1 and 2

General procedure

A simultaneous matching-to-sample (MTS) procedure was adopted, in which three stimuli, the sample (SS) and the two comparison stimuli, the matching stimulus (S+) and the non-matching stimulus (S−), were simultaneously presented on the computer screen. At the beginning of each trial, the sample stimulus appeared on the upper half of the screen in the centre. Then, immediately after the subject touched the sample stimulus, the two comparison stimuli were displayed simultaneously 4 cm below the sample, to the right and left, at a distance of 5 cm apart (see Fig. 3). The initial touch to the sample was adopted to ensure that the monkeys were paying attention to the sample stimulus at the beginning of each trial. The sample remained visible for the duration of the trial. The right/left positions of S+ and S− were randomly determined in each trial. The subject had to indicate its choice by touching one of the comparison stimuli on the screen; the computer automatically recorded the choice. If the comparison stimulus was chosen correctly (S+), a food pellet was dispensed. If the selected stimulus was incorrect (S−), no pellet was dispensed. After the response, the display was immediately extinguished. A correct response was followed by a 5-s inter-trial interval (ITI), whereas an incorrect response was followed by both a 10-s time out (TO) and a 5-s ITI. During the experimental trials and the ITI, the screen was light grey; during the TO, the screen was green.

Fig. 3. The monkey was required to touch the sample stimulus to generate two comparison stimuli, which were simultaneously displayed below the sample at the right and left, and then the subject had to touch the stimulus that matched the sample

Training

The monkeys were initially trained to match two figures from Set I. The training trial began with the presentation on the display of a sample stimulus and two comparison stimuli. The subject had to identify which of the two comparison stimuli was identical to the sample.

During the early training sessions, a correction procedure was adopted in which an incorrect trial was repeated until the animal made the correct response. Each session lasted until at least 12 correct responses were collected in which the two stimuli appeared as the sample an equal number of times. When the subject completed at least 12 correct consecutive responses with no more than three corrections (corresponding to 80% correct responses), non-correction sessions were administered.

Each non-correction session consisted of 24 trials. Within the same session, the stimuli of Set I appeared as samples an equal number of times in a random order. Also, each comparison stimulus appeared randomly at the left and the right of the screen with equal frequencies. When a subject met the predetermined criterion of 80% or more correct responses on two consecutive non-correction sessions, it was given 10 transfer tests.
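The non-correction criterion can be illustrated with a minimal sketch. It assumes that "80% or more" is applied to each 24-trial session separately (so at least 20 correct responses, since 19.2 rounds up) and that the two qualifying sessions must be adjacent; the function names and the example scores are hypothetical and are not taken from the study's software.

```python
def min_correct(n_trials: int, percent: int = 80) -> int:
    """Smallest number of correct responses that reaches `percent` of n_trials.

    Integer ceiling division avoids floating-point rounding issues.
    """
    return -(-n_trials * percent // 100)


def sessions_to_criterion(correct_counts, n_trials=24, percent=80, run=2):
    """Return the 1-based index of the session that completes the first run of
    `run` consecutive sessions at or above criterion, or None if never met."""
    needed = min_correct(n_trials, percent)
    streak = 0
    for index, correct in enumerate(correct_counts, start=1):
        # A below-criterion session resets the run of qualifying sessions.
        streak = streak + 1 if correct >= needed else 0
        if streak >= run:
            return index
    return None


# Hypothetical sequence of correct responses per 24-trial session.
example_scores = [15, 20, 18, 21, 22]
criterion_session = sessions_to_criterion(example_scores)
```

In this hypothetical sequence the criterion is met at the fifth session, because the isolated 20-correct session is followed by a session below criterion.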

During both correction and non-correction training, subjects received an average of six (range: 1–15) training sessions per day for 5 days per week. The daily number of sessions presented to each monkey varied according to both the subject’s motivation and needs and the scheduled alternation of experiments of different studies in the experimental rooms.

Transfer tests

In the first 10 transfer tests, novel pairs of stimuli from Set I were presented. Transfer tests consisted of four 24-trial sessions. In each session, half of the trials were based on a familiar stimulus pair (to assess the extent to which the original matching performance was maintained) and half on a new pair (to assess the transfer). For example, in the first transfer test, the familiar stimuli were the two figures of the first pair used during the training, whereas the new stimuli were two figures of a novel pair that the subjects had never seen before. Trials were randomly intermixed, and each comparison stimulus appeared randomly at the left and the right of the screen with equal frequencies. If the monkey succeeded in performing a transfer test within the first 48 trials with the novel pair (that is, its overall performance in the first four sessions showed an accuracy of 80% or more correct responses for the novel stimuli), a new transfer test was presented. Otherwise, 24-trial sessions were repeated until the subject reached the learning criterion with the novel stimuli. This procedure was repeated for a total of 10 new pairs of stimuli. Each transfer included a new pair of stimuli, and all subjects received the different pairs in the same order. During transfer tests with single novel pairs, subjects received on average five (range: 1–13) sessions per day on 5 days per week.

Finally, a transfer test consisting of a single 100-trial session of novel stimuli from Set II was presented. Within the session, positive stimuli appeared at the left and the right side of the screen in a random order and with equal frequencies.

Data analysis

Performance was defined as the frequency of correct responses per test condition. We used binomial scores to assess whether the individual accuracy scores were significantly above the level of random chance. Binomial z scores were also used to evaluate the subjects’ performance on a large number of trials. Statistical significance was judged using an alpha level of 0.01 when the same stimuli were repeatedly presented within a session and/or when the same sessions were repeated over time. In contrast, we used an alpha level of 0.05 to evaluate the subjects’ performance when the sessions included all trials with different stimuli that were not repeated over time. Since the Kolmogorov–Smirnov test showed that the group data were normally distributed, we used parametric statistics to compare the accuracy scores both in different conditions and in different phases within the same condition. Finally, since transfer trials were always reinforced, to rule out the potential effects of training on performance in the transfer tests in which the same figures were repeatedly presented, we also verified for each transfer how the subjects responded at the first presentation of each new stimulus pair (Trial 1 analysis).
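As an illustration of the binomial z score, the following minimal sketch uses the normal approximation to the binomial with a chance level of 0.5 and a one-tailed test. The text does not state whether a continuity correction was applied, so none is used here; the function names are hypothetical.

```python
import math


def binomial_z(correct: int, n_trials: int, p_chance: float = 0.5) -> float:
    """z score of an observed count under the normal approximation."""
    expected = n_trials * p_chance
    sd = math.sqrt(n_trials * p_chance * (1.0 - p_chance))
    return (correct - expected) / sd


def one_tailed_p(z: float) -> float:
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def min_correct_for_significance(n_trials: int, alpha: float = 0.05,
                                 p_chance: float = 0.5):
    """Smallest count whose one-tailed P value falls below alpha, or None."""
    for k in range(n_trials + 1):
        if one_tailed_p(binomial_z(k, n_trials, p_chance)) < alpha:
            return k
    return None


# For a 100-trial session with two alternatives (chance = 50%).
threshold_100 = min_correct_for_significance(100, alpha=0.05)
```

Under these assumptions, a 100-trial session requires at least 59 correct responses to exceed chance at the 0.05 level, since 58 correct gives z = 1.6 and 59 correct gives z = 1.8.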

Results and discussion

There was great inter-individual variability in the number of trials required to reach the learning criterion, and the two subjects that were already familiar with the MTS task, Pippi and Rubens, did not perform better than the others. On average, considering the sessions with and without a correction procedure, the capuchins needed 10,852 trials (2,445; 5,082; 8,070; 10,413; 12,976; 26,128) to satisfy the learning criterion of 80% or more correct responses with the first pair of stimuli (see Fig. 4). This range of trials partly overlapped those reported for tufted capuchin monkeys tested with a computerised identity-MTS task by D’Amato et al. (1985, N = 4, range: 624–4,673) and Barros et al. (2002, N = 2, range: 862–2,543). Nevertheless, some of our monkeys required more trials than those in previous studies, possibly because the other studies adopted methodological procedures designed to minimise unwanted effects due to stimulus control by location and stimulus novelty (Barros et al. 2002).

Fig. 4. Experiment 1: Number of trials required for acquisition by each subject for the first pair of figures and for each of the following 10 pairs

Figure 4 reports the number of trials required for acquisition by each subject for the first pair and for each of the 10 novel pairs of figures, which ranged from 26,128 to 48. Binomial z scores were calculated to assess whether, for each pair, the number of correct responses performed by each subject was significantly different from that expected for a distribution with 50% correct responses. A transfer test was considered to be passed successfully only when the learning criterion was satisfied within the first 48 trials. As shown in Fig. 4, the average number of successful transfers within the first 48 trials (i.e., 4 sessions) was four (Rubens, N = 6; Sandokan, Robot, and Pippi, N = 5; Roberta, N = 3; and Carlotta, N = 0), most of which occurred in the second half of the set of pairs, i.e., from pair 7 to pair 11 (Rubens, Sandokan, Robot, and Pippi, N = 4; and Roberta, N = 3). Carlotta, an adult female, never succeeded in performing a transfer within the first 48 trials. Interestingly, after the first successful transfer, the subjects did not always succeed in the subsequent transfers; thus, they did not show a linear pattern of acquisition.

Furthermore, for the transfer tests in which subjects reached the criterion within the first 48 trials, there were no differences between the mean percentages of correct responses for familiar (M = 85.9%, SD = 2.20, N = 3) and novel (M = 88.3%, SD = 2.35, N = 3) stimuli [t(4) = −0.44, P = 0.684, Cohen’s d = −1.05]. Additionally, there were no differences in accuracy for novel figures among the four 12-trial blocks (i.e., the performance across transfer sessions), thus ruling out the possibility of learning during the transfer session [Rubens, F(3, 15) = 2.16, P = 0.135; Sandokan, F(3, 12) = 1.78, P = 0.205; Robot, F(3, 12) = 0.43, P = 0.733; Pippi, F(3, 12) = 0.52, P = 0.674; Roberta, F(3, 6) = 0.45, P = 0.727]. In addition, binomial tests on the first trial where a given novel pair of figures was presented showed that Rubens correctly responded to the initial trial with novel stimuli only in two out of six immediate transfers (33.3%, P = 0.344), Pippi correctly responded in four out of five transfers (80%, P = 0.969), Robot and Sandokan correctly responded in three out of five transfers (60%, P = 0.812), and Roberta correctly responded in two out of three transfers (66.7%, P = 0.875). Thus, in some cases, the ability to transfer to novel stimuli was not immediate. Unfortunately, since they solved few tests within the first 48 trials, in total, the analyses were based on a few novel pairs of stimuli (3–6). This small number of transfer stimuli produces little if any statistical power.
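The Trial 1 probabilities can be checked with an exact binomial computation. In the sketch below, the counts are those reported above; the reported P values coincide numerically with the cumulative probability P(X ≤ k) under a chance level of 0.5, but treating that as the authors' exact procedure is an assumption. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float = 0.5) -> float:
    """Exact cumulative probability P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def binom_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Exact upper-tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# (correct first trials, number of immediate transfers) as reported in the text.
trial_1 = {
    "Rubens": (2, 6),
    "Pippi": (4, 5),
    "Robot": (3, 5),
    "Sandokan": (3, 5),
    "Roberta": (2, 3),
}

cumulative = {name: binom_cdf(k, n) for name, (k, n) in trial_1.items()}

# With only three transfers, even a perfect first-trial record is not significant.
best_case_three = binom_upper_tail(3, 3)
```

The last line illustrates the power problem noted above: with three immediate transfers, three correct first trials out of three would still give P = 0.125.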

Moreover, to quantify the decrease in the number of trials to acquisition, we calculated the percentage decrease from the first to the second pair of stimuli. Overall, capuchins showed an average percentage decrease of 87.8% (SD = 9.90, N = 6), with each subject exhibiting a high percentage decrease (Robot, 75.9%; Sandokan, 82.6%; Rubens, 96.4%; Pippi, 96.4%; Carlotta, 97.2%; Roberta, 78.5%).
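The percentage decrease and its group summary can be sketched as follows. The formula (first − second) / first × 100 is the conventional definition and is assumed here rather than quoted from the text. The individual percentages are those reported above, whereas the trial counts in the last line are purely hypothetical.

```python
from statistics import mean, stdev


def percentage_decrease(first: float, second: float) -> float:
    """Percentage decrease in trials to criterion from one pair to the next."""
    return (first - second) / first * 100.0


# Individual percentage decreases as reported in the text.
reported = {
    "Robot": 75.9,
    "Sandokan": 82.6,
    "Rubens": 96.4,
    "Pippi": 96.4,
    "Carlotta": 97.2,
    "Roberta": 78.5,
}

values = list(reported.values())
group_mean = mean(values)   # arithmetic mean across the six subjects
group_sd = stdev(values)    # sample standard deviation (N - 1 denominator)

# Hypothetical example: 1,000 trials for the first pair, 250 for the second.
example = percentage_decrease(1000, 250)
```

With the rounded individual values, the group mean is about 87.8% and the sample standard deviation about 9.9, in line with the reported summary.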

Finally, Fig. 5 shows the individual accuracy levels for the transfer test in which 100 novel pairs of stimuli were presented. According to the binomial z scores, the performance of the subjects could be considered significantly above the level of chance when they made 59% or more correct responses (P < 0.05). Three out of six monkeys performed above the level of chance: Robot had a high level of accuracy (74%), whereas Rubens (64%) and Roberta (60%) barely passed the criterion. The percentages of correct responses for Sandokan (56%), Pippi (55%), and Carlotta (58%) were numerically above 50% but did not reach the 59% threshold for significance.

Fig. 5. Experiment 1: Individual accuracy levels of the six subjects in the 100-trial transfer test with novel figures. The dashed line indicates the chance level of performance. (One-tailed binomial z scores: * P < 0.05; *** P < 0.001)

Overall, Experiment 1 demonstrated that using small stimulus sets in both training and transfer might lead to incorrect conclusions about the acquisition of a full-abstract identity concept. In fact, when the ability to transfer was tested using a large number of novel stimuli, only one capuchin performed well above the level of chance. This strongly suggests the need to use a large stimulus set to assess the extent to which animals are able to transfer to completely novel stimuli.

Experiment 2

Experiment 2 included a training phase and four test phases. As in Experiment 1, in the training phase and test phase 1 of Experiment 2, subjects had to match figures according to identity by shape. In particular, we evaluated whether further training the capuchins with a large stimulus set allowed them to transfer to a large number of novel stimuli better than in Experiment 1. According to previous findings on abstract-concept learning (for a review see Katz et al. 2007), we hypothesised that, after being trained with a large stimulus set, all of the monkeys would be able to immediately succeed in solving a transfer task involving a large number of novel figures. Moreover, in test phase 2 and test phase 3, we evaluated whether capuchin monkeys, which exhibited an identity matching concept widely applicable to the stimulus shape, were also able to solve an Id-MTS task according to the rule of identity by colour and size, respectively. Finally, in test phase 4, we assessed whether capuchins were able to maintain a good level of accuracy when discriminating highly familiar stimuli by shape identity when the spatial arrangement of the stimuli on the screen was changed compared with that used in the previous experiments.

Method

Subjects and apparatus

The subjects and apparatus were the same as those used in Experiment 1. The subjects received the different experimental conditions in the same order, with the exception of the two conditions of test phase 4, which were counterbalanced among subjects.

Stimuli and procedure

Training

The stimulus set (Set II) was the same as that used in the last transfer test of Experiment 1 (Fig. 2b). As in Experiment 1, each figure was presented on a black frame of 6.5 cm × 6.5 cm.

Training sessions were repeated until the capuchins satisfied the learning criterion of 80% or more correct responses in a session of 100 trials. The figures varied within each session, i.e., the 100 trials involved one presentation each of the 100 Set II pairs. Moreover, the figure shown as the sample within each pair varied systematically across sessions, and each comparison stimulus appeared randomly at the left and the right of the screen with equal frequencies.

Test phase 1. 100 novel pairs

The stimulus set (Set III) included 200 novel computer icons (both colour as well as black and white 3 cm × 3 cm images, 11.3° of visual angle) (Fig. 2c). Similarly to the figures of Set II, the figures of Set III were presented in 100 pairs. Figures were combined so that the two figures in each pair had the same colour/s. Each figure was presented on a black frame of 6.5 cm × 6.5 cm.

The transfer test consisted of a single 100-trial session of novel stimuli from Set III. None of the stimuli were repeated during the test. Positive stimuli appeared at the left and the right side of the screen in a random order and with equal frequencies within the session.

Test phase 2. Only colour

The stimulus set (Set IV) included circle-shaped figures (2.7 cm in diameter, 10° of visual angle) presented on a black frame of 6.5 cm × 6.5 cm. Circles were of four different colours: two achromatic (white and grey) and two chromatic (yellow and blue) (Fig. 2d). These colours were chosen because their hues could be easily discriminated by all subjects: capuchin monkeys possess a colour vision polymorphism characterised by dichromatic phenotypes in all males and X chromosome homozygous females (Jacobs 1998; Saito et al. 2005), and dichromatic colour vision makes it difficult to distinguish between colours in the red and green wavelength ranges.

The test consisted of a single 96-trial session. Stimuli were presented in all possible combinations: white/grey, white/yellow, white/blue, grey/yellow, grey/blue, and yellow/blue. Within the session, the trials were arranged in four blocks. In each block, each of the four colours was presented as the sample an equal number of times; moreover, within each block, the positive stimuli appeared at the left and the right side of the screen in a random order and with equal frequencies. The four blocks were presented consecutively within the same session.
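One way to obtain a session with these properties is sketched below. It assumes that each 24-trial block fully crosses the 12 ordered sample/non-matching colour pairs with the two screen sides, which satisfies the stated balancing constraints but is not necessarily how the original E-Prime lists were built. All names and the random seed are hypothetical.

```python
import random
from itertools import permutations

COLOURS = ["white", "grey", "yellow", "blue"]


def build_block(colours, rng):
    """One block: every ordered (sample, foil) pair on each side, shuffled."""
    trials = []
    for sample, foil in permutations(colours, 2):   # 12 ordered colour pairs
        for side in ("left", "right"):              # side of the matching stimulus
            trials.append({"sample": sample, "foil": foil, "match_side": side})
    rng.shuffle(trials)                             # random order within the block
    return trials


def build_session(colours=COLOURS, n_blocks=4, seed=0):
    """Concatenate independently shuffled blocks into one session."""
    rng = random.Random(seed)
    session = []
    for _ in range(n_blocks):
        session.extend(build_block(colours, rng))
    return session


session = build_session()
first_block = session[:24]
samples_in_block = {c: sum(t["sample"] == c for t in first_block) for c in COLOURS}
left_in_block = sum(t["match_side"] == "left" for t in first_block)
```

Under this assumed design, four blocks of 24 trials give the 96-trial session, each colour serves as the sample six times per block, and the matching stimulus appears on each side 12 times per block.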

Test phase 3. Only size

The stimulus set (Set V) included six different white shapes (circle, square, triangle, diamond, arrow, and pentagon) presented in two different sizes (large size: 3 cm × 3 cm, 11.3° of visual angle; small size: 1 cm × 1 cm, 3.8° of visual angle) (Fig. 2e). The figures were all novel to the subjects with the exception of the large circle, which was essentially the same as that used in test phase 2 of Experiment 2. Each figure was presented on a black frame of 6.5 cm × 6.5 cm.

The test consisted of a single 96-trial session. Figures with the same shape but different sizes were compared in a total of six ‘large vs. small’ combinations. As in test phase 2, the trials were arranged in four blocks within the session. In each block, the 12 figures were presented as the sample an equal number of times. Moreover, within each block, the positive stimuli appeared on the left and the right side of the screen in a random order and with equal frequencies. The four blocks were presented consecutively within the same session.

Test phase 4. Stimulus location

The stimuli were the same two white shapes used in the initial training of Experiment 1 (Fig. 2a, first two shapes of Set I). Each figure was presented on a black frame of 6.5 cm × 6.5 cm.

The test consisted of two 96-trial sessions. Each session featured a different spatial arrangement of the stimuli. In particular, the first arrangement (vertical translation) was specular on the vertical plane to that used in the previous experiments; in the vertical translation, the sample stimulus appeared in the centre of the bottom part of the screen, and the two comparison stimuli appeared above the sample, on the left and right. In the second spatial arrangement (vertical column), the sample appeared in the centre of the screen, and the two comparison stimuli appeared vertically aligned above and below it. Within each session, the trials were arranged in four 24-trial blocks. In each block, the two figures were presented as the sample an equal number of times. Moreover, within each block, the positive stimuli appeared on the left and the right side of the screen (vertical translation) or above and below the sample (vertical column) in a random order and with equal frequencies. The four blocks were presented consecutively within the same session. The order of presentation of the two sessions was counterbalanced between subjects.

Results and discussion

Training

Two subjects (Robot and Pippi) reached the learning criterion within three sessions. Three subjects (Sandokan, Rubens, and Roberta) reached it within seven sessions, and one (Carlotta) in eleven sessions.

Test phase 1. 100 novel pairs

Figure 6 (100 novel pairs) shows the accuracy levels of each subject in test phase 1. All monkeys performed significantly above the level of chance (59%, P < 0.05) (Robot, 81%; Sandokan, 82%; Rubens, 71%; Pippi, 87%; Carlotta, 75%; Roberta, 73%). Moreover, the overall mean percentage of correct responses (M = 78.2%, SD = 6.14, N = 6) did not differ significantly from that observed at the learning criterion with the training stimulus set (M = 84.5%, SD = 4.13, N = 6) [t(5) = 1.94, P = 0.110, Cohen’s d = 1.20]. Since all stimuli were presented only once, performance could not have been affected by learning within the transfer test. However, given that we tested the same group of subjects as in Experiment 1, it was not possible to completely rule out carry-over effects from previous experience.
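The 59% chance criterion for a 100-trial test can be illustrated with a one-tailed normal approximation to the binomial distribution. The sketch below is only indicative: it assumes a chance probability of 0.5 and no continuity correction, neither of which is specified in the text, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_score(correct, trials, chance=0.5):
    """z score of an observed number of correct responses under chance."""
    expected = trials * chance
    sd = sqrt(trials * chance * (1 - chance))
    return (correct - expected) / sd


def one_tailed_p(correct, trials, chance=0.5):
    """Upper-tail probability of the z score (normal approximation)."""
    return 1.0 - NormalDist().cdf(z_score(correct, trials, chance))


def min_correct(trials, alpha, chance=0.5):
    """Smallest number of correct responses with one-tailed P below alpha."""
    for correct in range(trials + 1):
        if one_tailed_p(correct, trials, chance) < alpha:
            return correct
    return None


threshold = min_correct(100, 0.05)
print(threshold, round(z_score(threshold, 100), 2))
```

Under these assumptions, the smallest significant score is 59 correct responses out of 100 (z = 1.8), in line with the 59% criterion reported above.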

Fig. 6

Experiment 2: Individual accuracy levels of monkeys when faced with (a) the second 100-trial test including all novel figures (test phase 1, black bars), (b) the Id-MTS task of colour identity (test phase 2, grey bars), (c) the Id-MTS task of size identity (test phase 3, white bars), (d) the vertical translation condition (test phase 4, checkered bars), and (e) the vertical column condition (test phase 4, striped bars). The dashed line indicates the chance level of performance. (One-tailed binomial z scores: *** P < 0.001)

In Phase 1 of Experiment 2, all subjects transferred the learned rule to a large set of new stimuli, and Carlotta, who did not show evidence of transfer in Experiment 1, also acquired the abstract identity concept. These results suggest that a large and varied set of stimuli builds conceptual knowledge that can successfully be transferred to novel stimuli.

Test phase 2. Only colour

The individual accuracy levels of the monkeys are shown in Fig. 6 (only colour). All of the monkeys performed well above the level of chance (62.5%, P < 0.001) (Robot, 80.2%; Sandokan, 84.4%; Rubens, 81.2%; Pippi, 77.8%; Carlotta, 76.0%; Roberta, 68.7%). In particular, the three males gave more than 80% correct responses, whereas the females showed lower levels of accuracy, but the difference between the sexes was not significant [males: M = 81.9% (SD = 2.16, N = 3) success; females: M = 74.2% (SD = 4.80, N = 3) success; t(4) = 2.55, P = 0.063, Cohen’s d = 2.07]. Furthermore, there was no difference in the overall mean percentages of correct responses among the four 24-trial blocks [F(3, 15) = 0.59, P = 0.632]. In addition, examination of the first six trials according to each colour combination showed that Robot, Sandokan, Rubens, and Roberta made six correct responses (100%, P = 0.016, binomial test), Pippi made five correct responses (83.3%, P = 0.109), and Carlotta made four correct responses (66.7%, P = 0.344). Therefore, four subjects, all three males and one female, showed a good level of accuracy at the beginning of the test.
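The binomial probabilities reported for the first six trials can be reproduced with an exact one-tailed test. The following is a minimal sketch that assumes a chance probability of 0.5 on each trial and an upper-tail test; the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, chance=0.5):
    """Exact probability of at least `successes` correct responses by chance."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Six, five, and four correct responses out of the first six trials.
for correct in (6, 5, 4):
    print(correct, round(binomial_upper_tail(correct, 6), 3))
```

Under these assumptions, the sketch returns 0.016, 0.109, and 0.344 for six, five, and four correct responses, matching the values given above.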

The ability to transfer from shape to colour indicates that capuchins used the identity rule in a flexible way and that the acquisition of a full-abstract identity concept goes beyond the original training conditions and is transferred to a novel visual dimension.

Test phase 3. Only size

Individual percentages of correct responses are shown in Fig. 6 (only size). All of the monkeys performed above the level of chance (62.5%, P < 0.001) (Robot, 86.5%; Sandokan, 90.6%; Rubens, 85.4%; Pippi, 84.4%; Carlotta, 83.3%; Roberta, 80.2%). All of the subjects gave more than 80% correct responses. Moreover, there was no difference in the overall mean percentages of correct responses among the four 24-trial blocks [F(3, 15) = 3.26, P = 0.051]. Furthermore, examination of the first six trials according to each shape presented showed that Robot and Rubens made a total of six correct responses (100%, P = 0.016, binomial test), Sandokan made five correct responses (83.3%, P = 0.109), Pippi and Carlotta made three correct responses (50%, P = 0.656), and Roberta made two correct responses (33.3%, P = 0.344). Thus, considering a trial 1 analysis, two male capuchins were able to accurately match the stimuli on the first trials of the test.

Finally, the capuchins’ overall mean performance in the identity by size task (M = 85.1%, SD = 3.46, N = 6) was significantly higher than that observed for the identity by colour task (M = 78.0%, SD = 5.41, N = 6) [t(5) = 7.08, P = 0.001].
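The comparison between the two tasks can be illustrated with a paired t statistic computed from the individual percentages reported for test phases 2 and 3. This is a sketch under the assumption that a paired-samples t-test on those rounded percentages was used; the variable and function names are hypothetical, and the P value is not computed here.

```python
from math import sqrt
from statistics import mean, stdev

# Individual percentages of correct responses as reported in the text.
colour = {"Robot": 80.2, "Sandokan": 84.4, "Rubens": 81.2,
          "Pippi": 77.8, "Carlotta": 76.0, "Roberta": 68.7}
size = {"Robot": 86.5, "Sandokan": 90.6, "Rubens": 85.4,
        "Pippi": 84.4, "Carlotta": 83.3, "Roberta": 80.2}


def paired_t(first, second):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [first[name] - second[name] for name in first]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # stdev uses the n - 1 denominator
    return mean(diffs) / standard_error, n - 1


t_value, df = paired_t(size, colour)
print(round(t_value, 2), df)
```

With the percentages listed above, the statistic comes out at about 7.08 with 5 degrees of freedom, consistent with the reported value.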

These findings further confirm that capuchins acquired a widely applicable identity concept. Interestingly, subjects had levels of accuracy that exceeded those shown in the previous matching test (i.e., identity by colour). Two explanations are possible: first, the capuchins gained experience from the previous transfer from shape to colour, and second, the visual dimension explored in test phase 3 was easier to process than that in test phase 2. The perception of two white shapes of different sizes on a black background could involve very rapid processing of visual information. For example, it has been shown by means of a visual search procedure that the size of objects is processed pre-attentively and in parallel across the visual field (Stuart et al. 1993).

Test phase 4. Stimulus location

Figure 6 (vertical translation and vertical column) reports the percentage of correct responses given by each subject in each testing condition. All of the monkeys, except Carlotta in the vertical column arrangement, showed an overall performance above the level of chance (62.5%, P < 0.001) (vertical translation: Robot, 84.4%; Sandokan, 90.6%; Rubens, 68.7%; Pippi, 91.6%; Carlotta, 67.7%; Roberta, 72.9%. Vertical column: Robot, 82.3%; Sandokan, 86.5%; Rubens, 72.9%; Pippi, 80.2%; Carlotta, 48.9%; Roberta, 63.5%). Moreover, in each test condition, half of the subjects gave more than 80% correct responses. Overall, there were no significant effects of the type of stimulus arrangement [F(1, 4) = 5.81, P = 0.074], the order of presentation of the test [F(1, 4) = 0.11, P = 0.763], or the interaction of these two factors [F(1, 4) = 2.47, P = 0.191]. Moreover, there was no difference in the overall mean percentages of correct responses among the four 24-trial blocks in either the vertical translation [F(3, 15) = 0.57, P = 0.644] or the vertical column arrangement [F(3, 15) = 2.95, P = 0.066]. In this test phase, only one trial for each type of condition was available for the trial 1 analysis. Examination of the first trial showed that in the vertical translation, where the absolute position of the comparison stimuli was varied and the relative positions of the two stimuli were preserved, four subjects (Robot, Rubens, Pippi, and Carlotta) correctly responded at the beginning of the test, whereas two subjects (Sandokan and Roberta) provided the wrong response. Moreover, in the vertical column, where both the spatial relationship between the sample and the comparison stimuli and the relative positions between the comparison stimuli were changed, four subjects (Robot, Sandokan, Pippi, and Carlotta) correctly responded on the first trial, whereas two (Rubens and Roberta) provided the wrong response. Therefore, as in the previous condition, four out of six capuchins were able to match the stimuli in trial 1.

Overall, capuchins solved a matching problem with highly familiar stimuli presented in novel spatial arrangements. However, the original training performance was 80% or more correct responses for all subjects, whereas in this experiment, Rubens, Carlotta, and Roberta performed worse than that. Therefore, some capuchins did not treat matching problems involving familiar stimuli in novel spatial arrangements as analogous to those previously experienced in the standard spatial arrangement. These data suggest that transfer to novel spatial arrangements was subject to greater individual differences than transfer to other visual dimensions, such as colour and size.

General discussion

To our knowledge, the present study is the first to demonstrate that the size of the training set positively affects the acquisition of an abstract identity concept in an MTS task in non-human primates. Moreover, we showed that when capuchins acquire a widely applicable identity concept based on the stimulus shape, they are able to transfer this matching ability to other visual dimensions, such as colour and size, and to new spatial arrangements of the stimuli.

Capuchins required a very long learning process to acquire an identity matching concept. Thousands of trials were necessary to master the first pair of stimuli; the learning criterion was achieved more quickly in the following pairs of novel stimuli, with the sharpest decrease occurring between the first and second pairs. D’Amato et al. (1985) also found that capuchins reached the criterion in an MTS task for the second pair of stimuli much faster than for the first pair. Similarly, rhesus macaques trained to solve a simple discrimination task between two objects took a decreasing number of trials in the following problems involving new pairs of objects (Harlow 1949). According to Harlow (1949), this learning-to-learn pattern, labelled the ‘learning set’, is a cognitive process indicating that the animal learns not only new stimulus-response associations but also how to efficiently deal with a particular type of problem it encounters over and over again. Here, the animal does more than adapt by trial and error to environmental changes; it gains an appreciation of the relationship between responding and reinforcement (Harlow 1949; for a review on learning set studies, see Fobes and King 1982). More recently, Wright and Katz (2009) argued that the lack of a full transfer to novel stimuli is not necessarily evidence of item-specific learning strategies (i.e., if-then or configural learning), but it can also indicate restricted-domain relational learning. In fact, pigeons and rhesus macaques that learned an S/D task using a training set of eight stimuli did not show evidence of abstract-concept learning (by transferring to novel stimuli), although they performed well with novel combinations of training stimuli and when the training stimuli were inverted. These findings suggest that pigeons and macaques learned the task on the basis of the relationship between the stimuli presented in each trial. Therefore, since it is possible that relational learning processes are circumscribed to the training stimuli, and unless control tests are applied, the failure to fully transfer cannot be taken as evidence of item-specific learning (Wright and Katz 2009).

The lack of transfer of matching-to-sample abilities from a familiar experimental set-up to the new touch screen system is an intriguing point. Two of our subjects had been successful in MTS tasks presented in a wooden apparatus in which the sample stimulus was placed at the centre of a vertical panel and two comparison stimuli (S+ and S−) were placed at the right and the left of the sample, either below the sample (e.g., Spinozzi et al. 2004) or at the same level (e.g., Spinozzi et al. 2003); the food reward was hidden behind one of the two comparison stimuli and the subject obtained the reward by moving the matching stimulus (S+). These two subjects did not transfer their matching ability and were not more proficient than the other naïve subjects. Therefore, capuchins do not seem to treat the same MTS problem presented with different apparatuses/procedures as analogous, and their ability to identify dimensions of similarity across sets of problems seems to be context specific.

In Experiment 1, five out of six capuchins transferred to novel stimuli after being presented with four to seven pairs, though they often failed in trial 1. Moreover, when faced with a large number of novel stimuli, only one capuchin performed well above the level of chance. Therefore, overall, there was a lack of a widely applicable matching concept. In contrast, in Experiment 2, the training with a large stimulus set led to a good level of accuracy, which for all subjects did not differ from that of the training trials. The transfer to a large set of stimuli that were never repeated (Experiment 2, test phase 1) rules out the possibility that stimulus-feature learning could account for the performance and demonstrates that capuchins relied on the relationship between the stimuli (Wright and Katz 2007).

Pigeons also successfully transfer to novel stimuli when trained with a large variety of stimuli and do not when trained with two stimuli (Wright et al. 1988); moreover, their transfer performance in MTS improved by increasing the stimulus set size from 3 to 768 items (Bodily et al. 2008). The positive effect of expansion of the stimulus set on abstract-concept learning is also evident in S/D discrimination tasks, in which there is a positive relationship between the number of stimuli used during the training and the percentage of correct responses in the transfer phase (pigeons: Katz and Wright 2006; capuchins: Wright et al. 2003; and rhesus macaques: Katz et al. 2002; for a review see Katz et al. 2007). Hence, set size expansion improves performance in MTS and S/D, though these tasks may be solved by strategies involving different cognitive abilities (Premack 1978, 1983; Katz et al. 2007).

The process by which the use of a large number of exemplars improves abstract-concept acquisition is still unclear (Wright and Katz 2006; Wright et al. 2003), and species might be affected differently by the stimulus set size. According to Wright et al. (2003), although apes and humans learn abstract concepts with smaller stimulus sets compared to other animals, this difference does not necessarily imply qualitatively different cognitive abilities. Moreover, since chimpanzees learned a matching concept with a training set of two items that were familiar to them, a lock and a cup (Oden et al. 1988), it is impossible to rule out that the chimpanzees tested, like other primate species trained with a larger set of unfamiliar stimuli, might also show an improvement in performance as the size of the stimulus set increases (Wright et al. 2003).

The ability to transfer to stimuli belonging to dimensions other than that used in the training set has been considered to be clear evidence of relational learning (Premack 1978). In the present study, capuchins were able to extend the identity relationship from shape to both colour (Experiment 2, test phase 2) and size (Experiment 2, test phase 3). The acquisition of a widely applicable identity concept in one dimension (shape) led to the use of the identity rule in a flexible way. Our findings contrast with those reported by D’Amato et al. (1985), in which capuchins, after experience with one pair of static visual stimuli, did not transfer the identity concept from static visual stimuli to dynamic visual stimuli, even after more than 15 years of experience with about a dozen static stimuli (D’Amato and Colombo 1989). In our opinion, the small number of stimuli might be insufficient to acquire an abstract concept and to use it in a flexible way. Moreover, in the transfer test, D’Amato et al. (1985) presented static and dynamic stimuli within the same trial; the presence of training stimuli in the transfer trials could have biased the animals’ responses (Katz et al. 2007). In addition, the experiments by Jackson and Pegram (1970) in which rhesus macaques were unable to transfer from colour matching to shape matching had similar procedural flaws, since a small stimulus set was used and transfer trials simultaneously presented highly familiar training stimuli (two discs of different colours) and a novel transfer stimulus (a white triangle).

In early studies, pigeons also seemed unable to transfer from one visual dimension to another because they needed re-learning processes (Zentall and Hogan 1974, 1976, 1978). However, Lombardi (2008) has recently found that pigeons trained to solve a simultaneous MTS task following an identity rule by colour transferred to novel stimuli of a different dimension (i.e., shape) if the comparison stimuli were about 1 cm from the sample, but not if they were fully adjacent to the sample. These findings, and similar ones from other studies (Wright 1997; Wright and Delius 2005), led Lombardi (2008) to suggest that pigeons have to be explicitly required to process the sample to acquire abstract concepts.

The stimulus location affects the strategies used to solve a matching task (Barros et al. 2002; Iversen et al. 1986), and when stimuli are presented in new positions, performance is likely to decrease. The finding that five out of six of our capuchins solved a matching problem with highly familiar stimuli presented in novel spatial arrangements (Experiment 2, test phase 4) further demonstrates their flexible use of the identity concept. We can speculate about the possible reasons for the failure of Carlotta in test phase 4 of Experiment 2 by examining her possible strategies of acquisition in Experiment 1. Carlotta was the only capuchin that was successful with the first pair of stimuli but did not transfer the identity rule in Experiment 1 when the stimulus set was small; her performance suggests the use of a reiterated strategy based on configural learning (in which the arrangements of the stimuli were used as predictors of the position of the matching stimulus). In test phase 4 of Experiment 2, the use of the first pair of stimuli presented in the training phase of Experiment 1 made it possible to vary only the spatial location of the stimuli, but performance could have been biased by previous experience. In particular, the use of the first pair of stimuli could have recalled previously learned stimulus configurations that could not be applied to the novel spatial arrangements. Recent studies demonstrated that both inter-individual differences and proactive interference processes strongly affect relational learning. Pigeons can learn an identical S/D task either by relational learning or by item-specific learning (Elmore et al. 2009), and the training conditions could have detrimental effects on later transfers (Nakamura et al. 2009). Recently, Katz et al. (2010) trained pigeons with stimulus sets of different sizes. In phase 1, subjects received a small stimulus set until the criterion was reached and failed to transfer to novel stimuli; in phase 2, they received a large stimulus set until the criterion was reached and showed full transfer; and finally in phase 3, they received the small stimulus set again and their performance in the transfer was above the level of chance, though significantly lower than the baseline (i.e., partial transfer). The latter experiment demonstrates that in pigeons, further training with a small training set can once again restrict the broad domain established by a large training set.

The above studies demonstrate that animals acquire abstract concepts when trained with a large and varied set of stimuli, whereas they perform worse when the set of stimuli is small (see also Katz et al. 2007). In the light of these findings, we must reconsider the manner in which we evaluate cognition in captive primates. In particular, since experience affects cognitive development, a more accurate appreciation of intelligence requires the systematic exposure of captive individuals to a wide variety of challenges.

Overall, our results provide further support for the idea that exploring changes in concept learning processes over a wide range of controlled parameters is the key to understanding the underlying mechanisms of abstract-concept learning (Cook and Wasserman 2006; Katz et al. 2007; Wasserman and Young 2010; Wright and Katz 2006, 2007, 2009). Future studies are needed to clarify how learning strategies are involved in the acquisition of abstract concepts across species as well as in different individuals and to examine the role of carry-over effects due to early learning processes.