Introduction

Studies of the development of nonhuman numerical competence are replete with controversy. Even for humans, some researchers still disagree on what constitutes various stages of numerical competence (e.g. Gelman and Gallistel 1986; Davis and Pérusse 1988; Gallistel 1988; Fuson 1988; review in Mix et al. 2002), which are the most complex, advanced stages (e.g. Gallistel 1988; Fuson 1988; Starkey and Cooper 1995; see Benoit et al. 2004), what mechanisms are involved (see Benoit et al. 2004) and even what is enumerated (e.g. Clearfield 2004; Feigenson 2005). Controversy also surrounds the role of language in numerical competence, not only for preverbal children but also for animal studies (note Watanabe and Huber 2006). Some scientists argue (e.g. Lenneberg 1971) that language and number skills require the same cognitive capacities and that animals, lacking human language, cannot succeed on number tasks; others view the issue differently and suggest that humans and animals have similar simple, basic number capacities but that only humans’ language skills enable development of numerical representation and thus abilities such as verbal counting, addition, etc. (e.g. Carey 2004; Feigenson et al. 2004; Spelke and Tsivkin 2001). Of course, label acquisition may direct children (and by implication, label-trained nonhumans) to attend more closely to characteristics involved in set formation (e.g. Waxman and Markow 1995; review in Mix et al. 2002), and thus provides preparation for dealing with number sets.

Scientists still do not even know whether number abilities require specific human brain centers. In some brain-damaged humans, numerical abilities may be impaired when other cognitive abilities are not harmed, but number capacities can be spared when skills involving memory, language, and reasoning are all damaged (Butterworth 1995). Other data do not support the need for a single brain area devoted to storing numerical knowledge (see Dehaene and Cohen 1994). More recent studies seem to demonstrate that certain human brain areas (e.g. intraparietal sulcus) are crucial for number work, but these areas have also been implicated in global tasks necessary for nonhuman survival, such as spatial attention; analogous (if not homologous) areas would thus be expected in animals (see Pepperberg 2006a). Other studies on possible parallels in brain areas underlying numerical judgments in humans and monkeys (review in Göbel and Rushworth 2004; see also Dehaene et al. 2003, 2004; Nieder and Miller 2003) support an evolutionary continuum in numerical abilities, but such data do not help for nonprimate subjects.

Despite controversies, comparisons of animal–human numerical competence continue. Researchers agree that, for any organism, “number sense” requires handling abstract concepts–representations and relations (Lenneberg 1971; Dehaene et al. 1999). A subject must have a representation of quantity that transfers across modalities and applies to any items (Seibt 1982). Number sense is not necessarily the same as true symbolic “counting” (Fuson 1988; Gelman and Gallistel 1986), which requires subjects to (1) produce a standard sequence of number tags, (2) apply a unique number tag to each item to be counted, (3) remember what already has been counted, and (4) know that the last number tag tells how many objects are there. Both humans and animals may use other mechanisms to quantify collections, such as subitizing or estimating; subitizing being defined (e.g. Kaufman et al. 1949) as a fast, effortless, accurate perceptual apprehension of number usually ≤4 that uses preattentive mechanisms and generally involves linear or canonical patterns (think dice or dominoes); estimation being a perceptual apprehension of larger numbers influenced by density and regularity of object distribution that enables approximations (e.g. between “80 and 100”; Dehaene 1997, pp. 70–72). Other researchers describe different noncounting mechanisms (e.g. object files or accumulators) to explain how very young children and animals achieve precise numeration for quantities<4 and approximate values for quantities ≥4 (Carey 2004; Mix et al. 2002; Xu 2003).

Whatever the mechanism used, animals exhibit various numerical abilities. Chimpanzees have enumeration skills, though not capacities identical to those of educated humans (Beran 2001, 2004; Beran and Rumbaugh 2001); monkeys, pigeons, and rats seem sensitive to ordinality and numerosity (e.g. Brannon and Terrace 1998, 2000; Brannon et al. 2001; Emmerton et al. 1997; Jordan and Brannon 2006; Nieder et al. 2002; Olthof et al. 1997; Orlov et al. 2002; Smith et al. 2003; Sulkowski and Hauser 2001; Xia et al. 2000, 2001; but see Dehaene 2001); crows (Smirnova et al. 2000; Thompson 1968), coots (Lyon 2003), orangutans (Call 2000; Shumaker et al. 2001), lions (McComb et al. 1994), lemurs (Lewis et al. 2005; Santos et al. 2005), and dolphins (Kilian et al. 2003; Mitchell et al. 1985) have some number concept; domestic dogs are sensitive to mass if not number (West and Young 2002), and even salamanders may represent more versus less (Uller et al. 2003; Footnote 1). Nevertheless, the extent to which animals understand number when compared to young children (e.g. Mix et al. 2002) is still unclear, and as noted above, several researchers consider more advanced numerical abilities–exact counting of quantities ≥4 and arithmetic operations–to be uniquely human and based on language skills (Spelke and Tsivkin 2001).

Apes referentially trained with Arabic numerals (chimpanzees, Pan troglodytes; Biro and Matsuzawa 2001; Boysen and Berntson 1989, 1990; Kawai and Matsuzawa 2000; Matsuzawa 1985; Murofushi 1997), or a Grey parrot (Psittacus erithacus) referentially using vocal English number words (Pepperberg 1987, 1994, 2006a, b; Pepperberg and Gordon 2005) would seem to be intermediary links between animals lacking such training and children (note Watanabe and Huber 2006); some of these animals have indeed demonstrated advanced numerical skills (e.g. addition and ordinality). Here I review the parrot work to show similarities and differences between human children and a nonhuman, nonprimate, nonmammalian species trained to use certain English labels referentially. I describe the parrot's data, briefly discuss possible mechanisms he might use, and in the process compare his numerical abilities to those of children.

Basic numerical abilities

After several years of training with a modeling technique, the Model-Rival (M/R) procedure (Pepperberg 1981), a Grey parrot, Alex, used vocal English labels to identify various objects, materials, colors, and shapes (review in Pepperberg 1999). He understood concepts of categories–that an item could be classified with respect to its color, shape, material, or label. He had functional use of phrases such as “I want X” and “Wanna go Y”, X and Y being appropriate object or location labels. He understood concepts of “same” and “different”: for any object pair he could label the attribute that was the same or different, and state “none” if nothing was same or different (Pepperberg 1999). But could he form an entirely new categorical class consisting of labels for quantity?

Could he be trained, for example, to reclassify items he currently labeled “key” or “green key” as “five key” (Pepperberg 1999)? To succeed, he would have to recognize that a new set of labels, “one”, “two”, “three”, “four”, “five”, and “six” represented a novel class: a means to categorize items based on both physical similarity within a group and a group's quantity, rather than solely by physical characteristics of group members. He would also have to generalize this new class of number labels to sets of novel items, items in random arrays, and heterogeneous collections. The study would provide data on a parrot's symbolic concept of number, that is, his ability to vocally designate the exact quantity of an array with an appropriate numerical, referential utterance.

Years before, Koehler (1943, 1950) and colleagues (Braun 1952; Lögler 1959) demonstrated Grey parrots’ sensitivity to quantity–numerosity and numerousness. Koehler's birds learned, for example, to open boxes randomly containing 0, 1, or 2 baits until they obtained a fixed number (e.g. 4). The number of boxes to be opened to obtain the precise number of baits varied across trials; the number being sought depended upon independent visual cues: black box lids denoted two baits, green lids three, etc. Koehler claimed his birds performed four different problems of this kind simultaneously. He did not state, however, if he presented different colored lids randomly in a single series of trials, and thus whether colors indeed “represented” particular quantities (see Pepperberg 1987, 1999). Lögler (1959) transferred such behavior to flashes of light and notes of a flute, thus going from simultaneous visual representations to sequential visual and auditory ones. But could Alex, like Matsuzawa's (1985) chimpanzee Ai, go beyond these tasks and use number as a symbolic, categorical label?

General training procedure

Alex was trained to identify small sets of objects with English number labels via the M/R procedure (see Pepperberg 1981, 1999), in which two humans demonstrate the targeted vocal behavior. Students and I used a subset of all objects available in the laboratory (e.g. keys and pieces of paper), training him first on three and four, next two and five, then six and finally one. Details are in Pepperberg (1987). This ordering was chosen because (a) he already identified triangles and squares as “three-corner” and “four-corner” and (b) it avoided cuing him on ordinality, so that I could see whether this concept would emerge. Individuals taught number labels in consecutive order might use this information to learn that each successive label represents one more item than the one before it and form a number line, thereby simplifying the acquisition of labels for larger quantities. Such a procedure was avoided to ensure that Alex was building his concept of number solely by forming one-to-one associations between specific quantities and their respective number labels. Training also eschewed use of plurals so that Alex had to use “How many X?” rather than a final “s” to distinguish a number question from one involving color or shape (e.g. “What color X?”). Ordinality has indeed emerged (Pepperberg 2006b), and training on seven and eight is currently under way.

As described in Pepperberg (1987, 1999), Alex's speed of acquisition of any label (color, shape, number, etc.) was more often constrained by his need to learn how to control the many different parts of his vocal tract to produce a vocal English response (Patterson and Pepperberg 1994, 1998) rather than by the difficulty of the cognitive task. Thus I do not report number of trials to acquisition. Labels that contain familiar sounds may be learned in a single session; labels that contain sounds difficult to produce in an organism lacking lips (e.g. /p/, /wh/, /f/) can take weeks or even months to acquire. Note, however, that preliminary data from experiments designed to dissociate his ability to learn to produce vocal numerical labels from his ability to comprehend the meaning of these labels suggest that, much like children (Carey 2004), Alex may master new number meanings either immediately or within one or two trials (Pepperberg et al., unpublished data).

General testing procedure

All studies described here share a general testing procedure. The procedure, including descriptions of precautions against inadvertent and expectation cuing, is summarized below; details are in Pepperberg (1981, 1990, 1994). Specific queries, of course, differed with the type of numerical task.

Test sessions, at most one per day, involving one numerical question (i.e. on a single array) occurred two to five times per week. Test questions were presented intermittently either during free periods (when birds were requesting various foods or interactions) or during sessions on current (and thus unrelated) topics (e.g. using Alex to train another parrot on color labels) until all questions for the experiments were presented. As in all studies with Alex, the protocol differed from the ones used with other animals in that the task capitalized on his ability to work in the vocal mode. A question was repeated in a session only if his initial answer was incorrect (Pepperberg 1981, 1987). Thus, the number of times an array was presented depended on Alex's accuracy. If he produced the appropriate label, he received praise and the items to which the question referred or was allowed to request an alternative reward. No further presentations of the same array then occurred; that is, there was one “first trial” response, and that was the only number question that day. If an identification was incorrect or indistinct, the examiner removed the array, turned his/her head, and emphatically said “No!” Only under this condition was an array immediately, repeatedly presented in order to penalize a “win-stay” strategy, and presentation continued until a correct identification was made or four attempts occurred; errors were recorded. One number session could thus involve up to four trials on the same question, but only if Alex erred. Questions were randomly ordered by someone not involved in testing, and different numbers were thus generally tested on successive days. Only by chance could the same number be presented successively. Note that each array was unique: even if similar objects were used (e.g. pieces of paper), collections would involve different colors, shapes, and/or textures.
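The correction procedure just described can be sketched in code. The following is only an illustrative sketch, not the laboratory's actual protocol software: the names `run_session`, `respond`, and `MAX_ATTEMPTS` are hypothetical, and the bird's answer is stood in for by a caller-supplied function.

```python
from typing import Callable, List, Tuple

MAX_ATTEMPTS = 4  # from the text: presentation stops after four attempts


def run_session(respond: Callable[[int], str],
                correct_label: str) -> Tuple[bool, List[str]]:
    """Present one array; re-present it only after an error.

    `respond(attempt)` is a hypothetical stand-in for the vocal answer
    given on the attempt with that number (1-based).
    Returns (first_trial_correct, recorded_errors).
    """
    errors: List[str] = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        answer = respond(attempt)
        if answer == correct_label:
            # A correct answer ends the session; it counts as a
            # "first trial" success only if no error preceded it.
            return (attempt == 1, errors)
        # Incorrect or indistinct: record the error, then re-present.
        errors.append(answer)
    return (False, errors)


# Hypothetical usage: an error on the first attempt, then a correction.
answers = iter(["four", "five"])
result = run_session(lambda attempt: next(answers), "five")
```

Under this sketch, "first trial" scores count only sessions whose first element is true, whereas "all trials" scores would also count the recorded errors.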

Test situations included specific precautions to avoid cuing. One control was that each test session was, as noted above, presented intermittently during free periods or work on unrelated topics. A variety of objects were used in testing, including ones used for training the other birds; thus particular items would not cue Alex that a number test was in progress. The same was true for the tray on which objects were placed: it was used for a variety of other experiments (see Pepperberg 1999, 2006a for details). Alex's responses had to be chosen from his entire repertoire (>90 vocalizations, including labels for locations and foods) and from among numerous possible topics concerning various exemplars and questions during each session; each session contained only a single number array. This design not only increased task complexity, but prevented several forms of cuing (see Premack 1976, p. 132; Pepperberg 1999). A tester who, for example, poses a series of similar questions may expect a particular answer and unconsciously accept an indistinct (and by our criteria incorrect) response of, for example, “gree” (a mix of “green” and “three”) for “three”. Second, in general, a human other than the one presenting the tray (one of three to five possible individuals in these experiments), who did not know what was on the tray, confirmed the answer; his/her interpretation of Alex's response was thus unlikely to be influenced by an expectation of a certain color or object label. Only after such confirmation was Alex rewarded (Pepperberg 1981). Third, Alex could not have picked up on trainer-induced cues specific to a given label (Pepperberg 1981) because students who did not train number labels did the testing; also, for the majority of studies described here, no overlap occurred between training and testing situations–training on color, object, and number labels had occurred years before and involved 90 utterances. 
Fourth, because several different humans (at least three, often as many as five) were involved in testing, the presence of a particular individual could not cue a number session. Fifth, the evaluator was unlikely to be influenced by hearing the type of question posed: In a previous study, transcriptions of contextless tapes of Alex's responses in a session agreed with original evaluations to within 98.2% (Pepperberg 1992; Footnote 2).

Binomial tests were most often used to evaluate Alex's results for statistical significance, basing chance on number of labels relevant to a task; chance can vary from 1/2 to 1/7, depending on the query. Calculations are conservative in that they assume 100% comprehension of a query and, when appropriate, labels identifying a subset (i.e. that he understands labels such as “color” [and specific color labels], “matter” [and specific material labels], etc. in queries such as “How many green wood?” or “What color is five?”). Less conservative calculations would include the probability of erring on nonnumber labels; for example, correctly comprehending the number but incorrectly labeling or targeting the relevant color or object. Because he was ∼80% correct on color–shape–material–object comprehension tasks (Pepperberg 1990, 1992), probability for such a misidentification was small but not nonexistent (Footnote 3). Chance could also be based on production of all possible color or object labels, as if he randomly guessed after limiting his choice to colors, materials, or objects after hearing “What color …” (on queries such as “What color [is the set of] three?”). Note that anything in his large repertoire is a possible response; all calculations assume he will always (p=1) attend and respond correctly to the numerically relevant part of the question (i.e. not provide a random label with no connection to the task at hand, e.g. a shape label). In all cases the most conservative value of chance was used (see Pepperberg 1999 for details).
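To make the role of the chance level concrete, a binomial test of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the studies: it assumes a one-sided (upper-tail) test, the function name `binomial_tail` is hypothetical, and the 8-of-10 score is invented purely for illustration.

```python
from math import comb


def binomial_tail(successes: int, trials: int, chance: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, chance).

    Assumption: a one-sided test, i.e. the probability of scoring at
    least this well if every answer were a guess at the chance rate.
    """
    return sum(
        comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical score (not from the studies): 8 correct in 10 first trials,
# evaluated at the two extremes of chance mentioned in the text.
p_two_labels = binomial_tail(8, 10, 1 / 2)    # two relevant labels
p_seven_labels = binomial_tail(8, 10, 1 / 7)  # seven relevant labels
```

The same score yields a larger p value when chance is 1/2 than when it is 1/7, which is why taking the highest plausible chance level is the conservative choice.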

Results

The data showed Alex had some concept of quantity, but not necessarily one matching that of a human child (Fuson 1988; Mix et al. 2002; Pepperberg 1999). He could label sets of ≤6 different physical items (78.9%, all trials; Pepperberg 1987; Table 1); items did not need to be familiar nor in any particular pattern, such as a square or a triangle. Moreover, if presented with a heterogeneous set–of X's and Y's–he could respond appropriately to “How many X?”, “How many Y?”, or “How many toy?” (70.0%, first trials; Pepperberg 1987). His level was beyond that of some children, who generally see only homogeneous sets (e.g. Starkey and Cooper 1995) and who, if asked about subsets in a heterogeneous set, usually label the total number of items if, like Alex, they have been taught to label homogeneous sets exclusively (see Siegel 1982; Greeno et al. 1984). By involving a variety of exemplars, tests ruled out Alex's use of cues such as mass, brightness, surface area, odor, object familiarity, or canonical pattern recognition (Pepperberg 1987, 1999). I had not, however, ruled out use of a noncounting strategy such as subitizing for the smallest collections or “clumping” or “chunking”–a form of subitizing (e.g. perception of six as two groups of three; see Jevons 1871; Mandler and Shebo 1982; von Glasersfeld 1982)–to correctly quantify larger collections. Other tests were needed to rule out these possibilities.

Table 1 Alex's use of number labels to categorize arrays of objects

Confounded number sets

An inferential test to distinguish perceptual recognition (subitizing) from counting in humans is based on visual processing mechanisms and involves “distractors” (Trick and Pylyshyn 1989, 1994): Subjects enumerate items in two different fields of distractors: (1) white or vertical lines among green horizontals; (2) white vertical lines among green vertical and white horizontals. Subitizing occurs for 1–3 only for the first condition. Subitizing thus seems to fail when conjunctive attentive processing is required for enumeration—when subjects must distinguish among various items defined by a collection of features (e.g. color and shape; see Pepperberg 1999). Such findings are consistent with the suggestion that the number of items that can be apprehended simultaneously decreases as the amount to be perceived about them increases (Glanville and Dallenbach 1929). Similar tests could be given to Alex because he already used a conjunctive condition to identify a single object within a collection (e.g. a red key within a collection of colored keys and other red items; Pepperberg 1992). He could now be asked to label the quantity of a similarly defined subset. Success would demonstrate whether Alex's competency, if not necessarily his strategy, was equivalent to that of humans (Pepperberg 1994).

Task and results

Alex was thus shown “confounded number sets” (quantities of four sets of items varying in two colors and two object categories–e.g. blue and red wood and blue and red wool) and was asked to enumerate items uniquely defined by both one color and one object category (e.g. “How many blue wood?”). His accuracy of 45/54, 83.3% (Pepperberg 1994) replicated that of humans in a comparable study (Trick and Pylyshyn 1989). Of additional importance was whether his scores varied according to number (Table 2) because a subitizing mechanism would be implied by data that (a) demonstrate a break in accuracy between, for example, 3 and 4 and (b) for larger numerosities, conform at least qualitatively to Weber's law (Gallistel and Gelman 1992; roughly stated, the greater the reference numerosity, the more imprecisely will a subject distinguish between it and nearby numerosities). That is, if the subject has high accuracy for small numbers but lower accuracy for larger ones, the subject is likely subitizing the smaller ones and using some other noncounting procedure for the larger ones. Thus, if Alex used a perceptual strategy similar to that used by humans, rather than some form of counting, he would make no errors for 1 and 2, few for 3, and more errors for larger numbers (Mandler and Shebo 1982). Sequential canonical analysis (Gorsuch and Figueredo 1991), however, showed that errors varied randomly with respect to number of items to be identified (Pepperberg 1994).

Table 2 Results and errors on heterogeneous “confounded” number task, listed for the targeted quantity

Discussion of errors

A detailed examination of Alex's responses revealed that some errors might involve misunderstanding questions, not numerical incompetence (Pepperberg 1994, 1999). Errors could, in fact, arise from four sources unrelated to number (Pepperberg 1992): (1) confusion of labels that sound alike; (2) misunderstanding the label directing a search; (3) problems with perceptual boundaries, e.g. differences in avian and human color perception; or (4) failure to understand that information from two categories must be used to identify targeted items (conjunction of information). Interestingly, only one error was random (i.e. did not correspond to any presented quantity). In a previous label comprehension study (Pepperberg 1992), almost half Alex's errors involved labels that, at least to humans, sound similar (e.g. rock and block). Such confounds were mostly eliminated in the present task, but a few trials combined block and truck. Such labels match only in the final consonant cluster, but Alex twice gave the number of trucks, rather than blocks, in a relevant trial. On one such trial, however, he also erred with respect to color (Pepperberg 1994). Determining if Alex misunderstood one or both labels directing the search is difficult because such errors cannot be distinguished from numerical errors (Pepperberg 1994, 1999). Previously, however, he scored ∼80% on tests requiring comprehension of attribute labels (Pepperberg 1990, 1992) and scores on the present task were comparable. Of course, he may have misinterpreted the defining labels, then correctly quantified the incorrectly targeted subset. Indeed, eight of his nine errors were the correct number for an alternative subset (Pepperberg 1994).

Perceptual errors were also possible. Few of Alex's nine errors, however, could be attributed to perceptual confounds (Pepperberg 1994). Tests avoided collections combining items such as rocks (Playdoh) and rawhide–difficult even for humans to distinguish by sight. A separate perceptual issue is that parrots and humans have different visual color boundaries (Bowmaker et al. 1994, 1996); thus Alex may confuse orange with red or yellow; purple with blue or red. Such combinations were avoided as much as possible. In two trials in which he responded with the number of the wrong colored subset, he labeled the number of green, not yellow, wools and yellow, not purple, papers.

In all cases, Alex used the conjunctive condition (Pepperberg 1994). He never quantified a subset defined by only one attribute (e.g. tallied all trucks). In eight out of nine errors, he labeled the quantity of a nontargeted subset that was still defined by a conjunction of color and object labels.

Discussion of mechanisms

Alex's mechanisms may not be identical to those of humans (Pepperberg 1994, 1999). He might subitize all quantities if avian visual perceptual capacities are superior to–or different from–those of humans. For example, unlike humans, might he perceptually segregate a set to be enumerated, then subitize (Dehaene and Changeux 1993)? Because targeted items are scattered among >9 distractors, such a strategy is unlikely for humans but may be possible for birds. Note that avian numerical perception may surpass that of humans in the auditory, sequential mode (Thompson 1968; Wolfgramm and Todt 1982); that is, because numbers of notes repeated in songs or calls may indicate different danger or aggression levels (e.g. Templeton et al. 2005), birds may be particularly good at distinguishing among various numbers of rapidly presented auditory stimuli, and conceivably transfer such ability not only between auditory and visual modes but also between sequential and simultaneous processes (Seibt 1982). To ensure Alex was beyond subitizing range, he would have to be tested on visual simultaneous quantities larger than those sequentially perceived auditorially in nature, but no data exist on how Greys process natural vocalizations. So, though results suggested that a bird had a competence level that, in an ape, would be considered comparable to a human with respect to quantifying sets of items (Pepperberg 1999), more was needed to determine the extent of Alex's abilities.

Comprehension of number labels

Although Alex achieved high scores on labeling numerical sets, such data did not show if he had formed only one-directional associations (i.e. could produce but not comprehend labels, which was a problem in some early studies on apes’ communicative and cognitive abilities and can occur in some instances in children as well; Savage-Rumbaugh et al. 1980, 1993; Fuson 1988), or if he fully understood the interchangeability of numerical questions (comprehension as well as production). For example, children who succeed on “How many marbles?”, showing that they can produce an appropriate number label, may fail on “Give me X marbles”; thus demonstrating that they really do not understand the relationship between the number label and quantity (see complete discussion in Wynn 1990). Note that this situation is the exact opposite from the usual comprehension–production distinction, in which children often provide evidence that they comprehend a label (e.g. via a looking task) before they can produce it (Golinkoff et al. 1987). If labeling indeed separates animal and human numerical abilities (see above discussion; Watanabe and Huber 2006), such equivalence is crucial to demonstrate nonhuman numerical competence (Fuson 1988). Even data demonstrating comprehension would not conclusively determine whether Alex was counting, but would provide additional evidence of his understanding of numerical symbols (Pepperberg and Gordon 2005).

Procedure (after Pepperberg and Gordon 2005)

Alex was shown collections with either three different numerical sets of the same objects of different colors (e.g. two orange blocks, three blue blocks, and six green blocks; see Fig. 1) or three different numerical sets of different objects of the same color (e.g. one red block, four red keys, five red sticks) all intermixed. He was asked “What color (object) [is] number X?”, where X=1–6. He received no training prior to testing (Pepperberg and Gordon 2005). Given the arrangement of items on the tray, the closeness of the tray to Alex's eye (5–10 cm), and the distance of the tray from the questioner's face (20–25 cm), Alex could not be cued as to the relevant subset by following eye gaze of the experimenter (e.g. Peignot and Anderson 1999; Vick and Anderson 2003).

Fig. 1 Alex doing the number comprehension trials

The procedure required that he comprehend the auditorially presented numeral label (e.g. “6”) and use its meaning to direct a search for the cardinal amount specified by that label (e.g. six things), that is, know exactly what a set of “X” individual items is, even when intermixed with other items representing different numerical sets. Items for each number were not clumped together, and each item of a particular set was generally closer to an object of another set than to one of its own. Alex could not perform the task without comprehending the number label. Each query also retested his ability to identify the object or color of the set specified by the numerical label. To respond correctly, he had to process all types of information errorlessly. Some or all this behavior likely occurred as separate steps, each adding to task complexity (Premack 1983).

Results

Alex's score was 58/66, or 87.9% (first trials, binomial test, p<0.001, chance 1/3; Table 3). He made no errors on the first 10 trials, two errors in the second 10 trials, one in each 10 of the subsequent 20 trials, two in the next 10 trials, and two in the last 6 trials (Pepperberg and Gordon 2005). His error pattern suggested lack of focus or inattention as testing proceeded, not learning from mistakes. Number label comprehension matched production; that is, he understood what his number labels represent. He thus surpassed children up to about 3 years old, who, for example, may point to each item in a set, state “1, 2, 3”, but not understand that three items actually are present (Fuson 1988). How he compares to somewhat older children (≥3.5 years), who have generally begun to understand number labels fully and to count in the traditional sense (Fuson 1988; Wynn 1990), is unclear. He seems to have little difficulty with numbers differing by small amounts. Five of eight errors appear in such trials, but some may be due to color perception or phonological confusion. He often erred in distinguishing orange from red or yellow, which, as noted above, is a consequence of differences in parrot and human color vision (Bowmaker et al. 1994, 1996). He also sometimes confused “wool” and “wood” and “truck” and “chalk”, the latter being pronounced a bit like “chuck”. Given that Alex was not trained on the comprehension task, his results are compelling.
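The reported score can be checked against chance with a short calculation. The sketch below uses only the figures given in the text (58 correct of 66 first trials, chance 1/3); the one-sided form of the test and the name `upper_tail` are assumptions made for illustration.

```python
from math import comb

CORRECT, TRIALS, CHANCE = 58, 66, 1 / 3  # values reported in the text


def upper_tail(successes: int, trials: int, chance: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, chance); one-sided (assumed)."""
    return sum(
        comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


percent_correct = round(100 * CORRECT / TRIALS, 1)
expected_by_chance = TRIALS * CHANCE  # about 22 correct if guessing
p_value = upper_tail(CORRECT, TRIALS, CHANCE)
```

Guessing among three options would yield about 22 correct answers in 66 trials, so a score of 58 lies far in the upper tail, consistent with the reported p<0.001.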

Table 3 Results and errors for Alex's comprehension trials for the targeted number (binomial test, chance 1/3, except for “none”, where chance was 1/4) (from Pepperberg and Gordon 2005)

Use of “none”

Of particular interest, however, was the 10th trial within the first dozen. Alex was asked “What color 3?” to a set of two, three, and six objects. He replied “five”; the questioner asked twice more, and each time he replied “five”. The questioner, not attending to the tray, finally said “OK, Alex, tell me, what color 5?” Alex immediately responded “none”. He had learned to state “none” if no category (color, shape, or material) was same or different when queried about similarity or difference for two objects (Pepperberg 1988), and had spontaneously transferred this response to “What color bigger?” for two objects of identical size in a study of relative size (Pepperberg and Brezinsky 1991), but had never been taught the concept of absence of quantity nor to respond to absence of an exemplar (Footnote 4). Note that he not only provided the correct response, but also set up the question himself. The query was repeated randomly throughout other trials with respect to absence of each possible number to ensure that this situation was not an odd happenstance. On these “none” trials, Alex's accuracy was 5/6 (83.3%; binomial test, p<0.01, chance of … [three relevant color labels plus “none”]). His one error was to label a color not on the tray.

The conventional term, “zero”, had not been introduced to indicate absence of quantity; Alex's use of “none” for this purpose was unexpected and impressive for at least four reasons (Pepperberg and Gordon 2005). First, labeling a null set, whether by “zero” or “none”, is a fairly recent human development (Bialystok and Codd 2000). That Alex, with a walnut-sized brain whose ancestral evolutionary history with humans likely dates from the dinosaurs, represented zero, even if not in a manner identical to that of humans, is striking. Second, the notion of none is abstract and relies on violation of expectation of presence (Pepperberg 1988); even though Alex already associated “none” with absence of similarity and difference (Pepperberg 1988) and lack of size difference (Pepperberg and Brezinsky 1991), he transferred the notion across domains to quantity, without training or prompting. Third, if parrots represent quantity as do children, then his comprehension of zero/none should have lagged behind that of other small numbers (Wellman and Miller 1986). Children may have a none/nothing concept before learning that this quantity has a special label, “zero” (Wellman and Miller 1986); Alex was not taught “zero” but deliberately used “none” in a number comprehension task.Footnote 5 Finally, and likely most importantly, he initiated the topic. He repeatedly stated “five” when asked about “three”; when asked about the nonexistent “five”, he responded appropriately. The cognitive processes leading to this behavior are unknown. Possibly his action, occurring soon after a period of task noncompliance (Pepperberg and Gordon 2005), resulted from lack of interest and an attempt to make the procedure more challenging: Alex, when noncompliant, occasionally stated, then repeated, all colors not present on the trial; such behavior may have been a precursor to using “none”.
Alex may not have understood none or zero in an ordinal sense; young children and apes have some difficulty with ordinal use of zero (Biro and Matsuzawa 2001; Wellman and Miller 1986), and relating cardinal and ordinal meaning is a hallmark of abstract numerical sense (Gelman and Gallistel 1986).

As usual, Alex's abilities raised more questions than they did answers. How closely did his notion of “none” match children's and animals’ understanding of zero? Might his understanding of number match that of chimpanzees who add and subtract (Boysen and Berntson 1989)?

Addition and further study of “none”

Few studies examine true addition in animals. True addition requires a subject to observe two (or more) separate quantities and provide the exact label for their total (Dehaene 1997). Only one study, on an ape, involved summation and required symbolic labeling of the sum (Boysen and Berntson 1989), demonstrating that the ape knew exactly how many objects were present at the end of the procedure; the study, however, used quantities totaling only four. Other studies, involving additive and subtractive tasks and using larger numbers of objects (up to 10), used only one type of token and required the subject to choose the larger amount, not label the final quantity (e.g. Beran 2001, 2004; Rumbaugh et al. 1987, 1988).Footnote 6 These procedural differences are important. First, when only one token type (e.g. marshmallows) is used in studies of relative amounts, evaluations of contour and mass, not number, could be responsible for the responses (Rousselle et al. 2004; see Mix et al. 2002 for a review), as was the case for pigeons (Columba livia; Olthof and Roberts 2000) and, on occasion, children (Feigenson 2005). Second, when the correct response is based on choice of relative amount, no information is obtained on whether the subject has “… a digital or discrete representation of numbers” (Dehaene 1997, p. 27).Footnote 7

And not all arithmetical studies on animals involve zero (Pepperberg 2006a). Zero is unique in that counting and adding presuppose something to add or count; absence of quantity may initially confuse children (see Bialystok and Codd 2000). Also, apes’ understanding of zero is not fully equivalent to that of humans. Although the ape Sheba was tested using a placard “0” in her addition trials, and could match one empty food tray to this placard, she never experienced total absence of objects to add (Boysen and Berntson 1989). The apes Sherman and Austin had to choose the greater quantity between two collections in which one food well could be empty, but were not asked to label the results (Rumbaugh et al. 1987, 1988); Beran's studies (2001, 2004) also did not involve labeling zero. Ai, trained both to produce and comprehend “0” with respect to absence of quantity, was not tested on zero in terms of arithmetic (Biro and Matsuzawa 2001).

Despite the need to know more about Alex's abilities, additional experiments were unplanned (Pepperberg 2006a). My students and I had begun a sequential auditory number session (training to respond to, e.g. hearing three computer-generated clicks with the vocal label “three”) with another bird in the standard manner, saying “Listen”, clicking (this time, twice), and then asking “Griffin, how many?” When he refused to answer, we replicated the trial. Alex, who often interrupts Griffin's sessions with phrases like “Talk clearly” or who occasionally answers even though he is not part of the procedure, said “four”. Alex was told to be quiet, as the answer for the specific trial was “two”. The trial was replicated yet again with Griffin, who remained silent; Alex now said “six”, which implied that he had summed all the clicks. A decision was thus made to replicate, as closely as possible, the addition study of Boysen and Berntson (1989) and to extend the study to further work on zero (Pepperberg 2006a).

Procedure (from Pepperberg 2006a)

Without prior training, addition trials began when a human, out of Alex's sight, placed items, counterbalancing number sets right and left across trials, on a tray and covered the items with plastic cups. Items such as randomly shaped pieces of nuts or different sized jelly beans were used to avoid mass-contour issues (Mix et al. 2002). On a few trials, identical candy hearts were used to see if allowing responses with respect to mass-contour affected accuracy (see Feigenson 2005).

Each total amount was presented eight times, in random order, such that no collection was shown sequentially; collections totaled to every amount from 1 to 6. Alex was also asked “How many bean/nut/heart?” eight times when nothing was under any of the cups. Addends were displayed an equal number of times, such that, for example, amounts adding to 6 were presented as 6+0, 5+1, 4+2, and 3+3, two times each, alternating quantities under right/left cups; amounts adding to 5 were displayed as 5+0, 4+1, 3+2, etc.; examiners could present 1 only as 1+0, randomizing quantity under right/left cups. (NB: Unless otherwise stated, X+Y collections refer to X+Y and Y+X forms.) All possible addend collections were also randomized. When multiple objects were under a single cup, each object was less than 1 cm from other nearest items and generally the distance was less.
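The addend collections described above can be enumerated mechanically. The sketch below lists, for each total from 1 to 6, every unordered pair of cup quantities (treating X+Y and Y+X as one collection, as in the NB). The text spells out the pairs only for totals of 6, 5, and 1; the lists for 2, 3, and 4 are simply what the same rule produces and are an assumption about the design. The function name `addend_pairs` is hypothetical.

```python
def addend_pairs(total: int) -> list[tuple[int, int]]:
    """Return unordered pairs (x, y) with x >= y >= 0 and x + y == total."""
    return [(total - y, y) for y in range(total // 2 + 1)]


# Print the candidate collections for every total used in the study (1 to 6).
for total in range(1, 7):
    pairs = ", ".join(f"{x}+{y}" for x, y in addend_pairs(total))
    print(f"{total}: {pairs}")
```

For a total of 6 this yields 6+0, 5+1, 4+2, and 3+3, which at two presentations each accounts for the eight trials stated in the text; how the eight trials were divided among the three collections for a total of 5 is not specified and is not modeled here.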

Trials proceeded as follows. The human brought the tray to Alex's face, lifted the cup on his left, showed him what was under the cup for 2–3 s in initial trials, and then replaced the cup over the quantity; the procedure was replicated for the cup on his right. In trials comprising the last third of the experiment, he had ∼10–15 s to view the items under each cup, including (for reasons that become apparent below) a replication of all 5+0 trials that had been given for 2–3 s. The experimenter made eye contact with Alex, who was then asked, vocally, and without any training, to respond to queries such as “How many nut total?” No objects were visible during questioning. To respond correctly, he had to remember the quantity under each cup, perform some combinatorial process, then produce a label for the total amount. He had no time limit in which to respond, but if he did not answer within about 5 s, the question was restated; if he grabbed and overturned a cup, items were covered, he was again shown both sets of objects sequentially, and the query was restated. Given that his time to respond generally correlated with his current interest in the items being used in the task, rather than the task itself (Pepperberg 1988), response latency was not recorded. In the trials for which nothing was under both cups, the goal was to determine the extent to which he could generalize use of “none” without instruction.

Results and discussion

Alex's scores were calculated several ways and examined for several issues (Table 4; details of each trial are in Pepperberg 2006a). One issue of interest was that for X+0 trials, he had difficulty only with X=5. He could not do 5+0 trials in 2–3 s; he consistently said “six”, repeating the answer even when told he was wrong. Retaining the errors for the 5+0 trials given in 2–3 s, Alex's accuracy was 41/48 or 85.4% for first trial responses (binomial test, p<0.005, chance 1/2 or 1/6), and 48/60 or 80% for all trials. If replications of 5+0 trials under the 10–15 s time interval are substituted, his first trial accuracy was 43/48 or 89.6%, p<0.005, and 48/53 or 90.6% for all trials. His accuracy for quantities summing to 1 or 2 was 15/16 or 93.8% (p<0.005); his accuracy for sets summing to 5 or 6 when given the longer time period was 13/16 or 81.3% (p<0.005); the difference is nonsignificant (comparing errors and correct scores for small versus large trials, p=0.599, Fisher's exact test); note that all errors for larger sums involved apparent labeling of addends before labeling the total. Of particular interest was his first-trial score for each sum; his accuracy was 5/6 or 83.3% (binomial test, p<0.02, chance 1/3). Candy hearts did not help him with the second 5+0 trial or the single 3+1 error. Though he did not err on any of the other four trials using hearts, number of errors overall was too small to suggest that using objects of equal mass and contour made any difference. Interestingly, three of his four errors on queries other than 5+0 involved situations where the larger addend was on his left, that is, in reverse of the ordinal number line; however, he was correct on 18 of these reversed-order trials, suggesting that errors were random (comparing errors and correct scores for reversed versus nonreversed trials, p=0.606, Fisher's exact test). Initially, when given only 2–3 s, he was always wrong on the 5+0 sum, consistently stating “6”.
When given 10–15 s, his accuracy went to 100% on 5+0 trials; the difference in accuracy between the shorter and longer interval trials was significant (counting all queries for 0+5 and 5+0, Fisher's exact test, p=0.01). For other trials, he went from 26/29 (89.7%, p<0.005) for 2–3 s to 15/17 (88.2%, p<0.005) for 10–15 s, that is, remained constant.
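The small-versus-large comparison reported above (15/16 correct for sums of 1 or 2 against 13/16 for sums of 5 or 6) can be reproduced with a brief calculation. The sketch below assumes the two-sided form of Fisher's exact test, in which the probabilities of all tables no more likely than the observed one are summed; the text does not state which form was used, and the function name `fisher_exact_two_sided` is illustrative only.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lowest, highest + 1)]
    # Small relative tolerance guards against floating-point ties.
    return sum(q for q in probs if q <= observed * (1 + 1e-9))


# Rows: small sums (1 or 2), large sums (5 or 6, longer viewing time).
# Columns: correct responses, errors. Counts are taken from the text.
p = fisher_exact_two_sided(15, 1, 13, 3)
print(f"p = {p:.4f}")  # prints p = 0.5996
```

Under this assumption the result agrees with the reported p=0.599 to within rounding. The reversed-order and 5+0 comparisons are not reproduced here because the text does not give complete cell counts for them.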

Table 4 Results and errors for Alex's addition trials, with respect to the total number of objects presented (from Pepperberg 2006a)

Alex thus demonstrated some ability to sum small quantities. Results seem independent of number or type of objects involved, except for 5+0 trials given in 2–3 s that he labeled as “6”, and some trials involving three where he seemingly labeled addends before providing the sum. If long-interval 5+0 trials are used, he was as accurate on small sums of one or two as on those summing to five or six. His performance was apparently independent of mass or contour; having equal mass-contour did not help when time was restricted for 5+0. Thus, in general, his data are comparable to those of young children (Mix et al. 2002) and apes (Boysen and Hallberg 2000). His responses on 5+0 trials suggest he may be using a counting strategy for 5, precisely because he needed additional time to achieve a correct answer (see below, Pepperberg 2006a).

Alex did not respond “none” when nothing was under any cup. On the first four trials, he looked at the tray and said nothing. He sometimes tried to lift the cups; he was then shown again, by a trainer who lifted cups one at a time, that nothing was present. On the fifth, sixth, and seventh trials, he said “one”. On the last trial, he again refused to answer. These two responses to absence were intriguing (Pepperberg 2006a). His failure to respond on five trials suggests he recognized a difference from other trials, that is, that standard number answers would be incorrect. He did not, as he does when bored with a task (e.g. Pepperberg 1992; Pepperberg and Gordon 2005), give strings of wrong answers or request treats or to return to his cage. He understood that the query (e.g., “How many nut total?”) did not correspond to the number of cups; he never said “two”. Overall, he acted more like autistic children (Sherman, personal communication, January 17, 2005), who simply stare at the questioner when asked “How many X?” if nothing exists to count. His response of “one” on three trials (Pepperberg 2006a) suggests comparison to Ai, who confused “one” with “zero”. Although Alex was never trained on ordinality, and had learned numbers in random order (see above), he, like Ai, seemed to grasp that “none” and “one” represent the lower end of the number spectrum. Data suggest that on tasks involving a number line even humans may treat zero differently from other numbers (Brysbaert 1995; Butterworth et al. 2001): more time is needed to process zero than other numerals and its processing may be based on different principles. As noted earlier, Alex previously used “none” to denote absence of a designated number of items (Pepperberg and Gordon 2005), an attribute of the overall collection, which was a logical extension of his use of “none” to mark absence of same–different with respect to various attributes of object pairs, including absence of size difference.
Here he was asked to denote the total absence of labeled objects, a different task.

Specifically, Alex's use of “none” is zero-like, but is not isomorphic with the adult human use of “zero”. He does not use “none”, as he does his number labels (Pepperberg 1987), to denote a specific numerosity (Pepperberg 2006a). In that sense, he is like humans in earlier cultures,Footnote 8 or young (∼3-year-old) children, who seem to have to be about 4 years old before they achieve full adult-like understanding of the labels for zero and other numerals (Bialystok and Codd 2000; Wellman and Miller 1986). Thus, whether Alex can acquire full understanding of the equivalence of “none” to the concept of zero is still to be determined.

General discussion

In sum, Alex demonstrated a range of numerical concepts that in many ways resemble those of young children in that he referentially comprehends and produces number labels with respect to specific exact quantities (≤6), can sum small amounts (again ≤6) and has a zero-like concept; with respect to the confounded number task, his abilities are similar to those of adult humans. His capacities are not, however, isomorphic with those of adult humans (e.g. he has not been tested on subtraction, and he has only recently been exposed to numbers greater than 6), and the extent to which his limitations relate to the forms of training he received, his limited language abilities, or both issues, remains unclear. Remember, unlike children, he did not learn number labels in order, was not trained with a plural marker to denote quantity, and was not given explicit instruction on ordinality (e.g. to produce a vocal number line). I also have not yet discussed in detail the mechanisms he might use in these tasks, or the brain structures that might be involved.

Alex, like humans, likely uses different mechanisms for different tasks and for smaller (≤4) and larger (>4) numbers. Alex's task, to enumerate simultaneously presented visual quantities, argues against use for ≤4 of an accumulator, which, although often used to explain data from many types of number-related tasks (Gallistel and Gelman 1992; Meck and Church 1983), is more appropriate for tracking sequences, such as tones or light flashes (see Benoit et al. 2004 and particularly Mix et al. 2002 for detailed discussions).Footnote 9 Alex's data are not consistent with such a model,Footnote 10 as his errors do not increase with amount if time to respond is not an issue. Similarly, although object file mechanisms for ≤4 have also been used to explain various numerical tasks (e.g. Uller et al. 1999), such a model is more useful for detecting matches or mismatches between different arrays, rather than direct enumeration (Benoit et al. 2004). Detailed reviews of these mechanisms and Alex's data can be found in Pepperberg and Gordon (2005) and Pepperberg (2006a). What is most likely for Alex is some form of comparison of ≤4 arrays with representations of memorized canonical sets (von Glasersfeld 1992). Unless birds have better perceptual mechanisms than humans, Alex's accuracy on sets >4, and particularly his need for a longer time to quantify 5, also argues against use of a perceptual mechanism for this quantity. For 5, he likely uses a mechanism more like human counting.Footnote 11 When 6 was his largest known number label, it may have come to represent anything ≥6. Note that, if given larger quantities (7, 8, 9), Alex, before he had labels for these arrays, also generally referred to them as “6” (Pepperberg, unpublished data).
I suggest he uses some form of canonical comparison for quantities less than 5, switches to a counting-style mechanism for 5 if given adequate time to process what is presented, and before studies began on 7 and 8, probably used a perceptual mechanism (“lots”) for ≥6.

As has been argued previously, animals’ abilities to learn in the laboratory are likely based on an existent cognitive architecture (Pepperberg 1999, 2006a, b); their training merely provides a way to examine the extent to which this architecture matches that of educated humans (see Jarvis et al. 2005). As noted above, some data suggest that specific human brain areas are involved in numerical processing (e.g. Lemer et al. 2003; Dehaene et al. 2003); how might a parrot brain function on tasks such as those given Alex? Does he have a homologue or analogue of human inferior parietal cortices, particularly the intraparietal sulcus (IPS) and inferior parietal lobule, the human areas supposedly tied to numerical competence (e.g. Lemer et al. 2003)? Many number tasks involve issues of spatial attention and nonsymbolic comparisons, which also correlate with IPS activity (Coull and Nobre 1998; Fias et al. 2003; Göbel et al. 2004; Jordan et al. 2004; Simon 1999)Footnote 12 and are essential for nonhuman survival; thus nonhumans likely have analogous brain areas (note Walsh 2003). Studies also suggest that different human brain areas are activated for processes involving small versus large numbers (Colvin et al. 2005; Göbel et al. 2001a, b); such might also be the case for Alex. Clearly, his numerical abilities are not identical to those of children, but his exposure to and training on such tasks are also limited compared to those of an average preschooler. Whatever brain areas he uses likely function in an analogous manner at least to nonhuman primates, given the similarities in data (Boysen 1993; Boysen and Berntson 1989, 1990; Boysen and Hallberg 2000; Pepperberg 2006a, b).

Alex's training on human number labels may have enabled him, like Matsuzawa's Ai (Matsuzawa 1985), Boysen's Sheba (Boysen 1993), and Premack's and Boysen's Sarah (Boysen 1993), to use representational abilities that would otherwise be inaccessible (see Watanabe and Huber 2006). These animals not only have access to symbols, but also extensive enculturation to a variety of human cognitive tasks; their data suggest that numerical concepts beyond those involving very small quantities (i.e. ≤4) are functional in at least some nonhumans (Pepperberg 2006a). Enculturation issues are emphasized by data on the human Pirahã tribe, who lack number labels and whose numerical abilities (they seem to have “one”, “two”, and “many”; Gordon 2004) appear to be less complex than those of enculturated nonhumans. Thus, the ability to form symbolic representations, whether for collections or concepts, appears to enhance numerical competence, allowing a bird such as Alex and various apes to perform at the level of young children who are also beginning to understand symbolic representations. Additional studies are needed to determine the full extent of Alex's numerical capacities. He recently demonstrated some understanding of ordinality (Pepperberg 2006b), but further experiments are needed to determine if he can learn both to produce and fully understand the meaning of an ordinal number sequence, and use “none” as a numeral. He, like apes and young children, must also be tested on subtraction (e.g. Boysen and Berntson 1989, 1990; Fuson 1988), and on whether both addition and subtraction can be extended to symbolic use of Arabic numbers. Although full-blown language, completely isomorphic with that of humans, may be necessary for more advanced numerical concepts, such a level seems not to be required for those concepts so far understood by Alex.