The apes’ edge: positional learning in chimpanzees and humans

Endress, Ansgar D.; Carden, Sarah; Versace, Elisabetta; Hauser, Marc D.

doi:10.1007/s10071-009-0299-8

Original Paper
Published: 11 December 2009

Animal Cognition, Volume 13, pages 483–495 (2010)

Abstract

A wide variety of organisms produce actions and signals in particular temporal sequences, including the motor actions recruited during tool-mediated foraging, the arrangement of notes in the songs of birds, whales and gibbons, and the patterning of words in human speech. To accurately reproduce such events, the elements that comprise such sequences must be memorized. Both memory and artificial language learning studies have revealed at least two mechanisms for memorizing sequences, one tracking co-occurrence statistics among items in sequences (i.e., transitional probabilities) and the other one tracking the positions of items in sequences, in particular those of items in sequence-edges. The latter mechanism seems to dominate the encoding of sequences after limited exposure, and to be recruited by a wide array of grammatical phenomena. To assess whether humans differ from other species in their reliance on one mechanism over the other after limited exposure, we presented chimpanzees (Pan troglodytes) and human adults with brief exposure to six-item auditory sequences. Each sequence consisted of three distinct sound types (X, A, B), arranged according to two simple temporal rules: the A item always preceded the B item, and the sequence-edges were always occupied by the X item. In line with previous results with human adults, both species primarily encoded positional information from the sequences; that is, they kept track of the items that occurred in the sequence-edges. In contrast, the sensitivity to co-occurrence statistics was much weaker. Our results suggest that a mechanism to spontaneously encode positional information from sequences is present in both chimpanzees and humans and may represent the default in the absence of training and with brief exposure. As many grammatical regularities exhibit properties of this mechanism, it may be recruited by language and constrain the form that certain grammatical regularities take.

Introduction

Sounds, actions and physical events all unfold over time. Thus, for example, when a songbird or whale sings, it strings together species-specific notes into phrases that, together, provide information on individual, population and species identity (e.g., Prather et al. 2008; Suzuki et al. 2006). Similarly, when a chimpanzee or crow prepares to use a tool for extractive foraging, it must coordinate into a precise sequence the identification of a target resource, the gathering and preparation of a relevant tool, the use of this tool in a particular fashion, often repeating the same action with the tool until the target food is obtained (e.g., Byrne 1999). And when humans perceive speech, they must recall not only which words were produced, but where in the sequence each word occurred relative to the others, each with a specific meaning. To adaptively generate, perceive and comprehend such events, therefore, animals—humans included—must be equipped with mechanisms that process and memorize sequential input (Conway and Christiansen 2001; Terrace 2005).

Sequences can be memorized in different ways, relying on distinct mechanisms. On the one hand, it is possible to memorize the sequential relations among the items in a sequence. On the other hand, it is possible to encode the positions of items in a sequence. In experimental studies of humans, the second encoding mechanism seems to dominate memory processes after limited exposure and seems to be particularly important for language (Endress and Bonatti 2007). Indeed, many grammatical regularities are defined by the positions of certain elements in both artificial and natural grammars (Endress et al. 2009). For example, grammatical morphemes (e.g., the English plural “s”) occur in the first or the last position of words, but much more rarely in other positions (see general discussion for additional examples). However, little is known about the encoding of positional information in non-human animals. That is, there is substantial evidence that certain non-human primates and birds can encode the positions of items in a sequence (e.g., Chen et al. 1997; Hailman and Ficken 1987; Orlov et al. 2000, 2006; Terrace et al. 2003; Treichler et al. 2003). However, it is unknown whether positional information is encoded in a similar way as in humans, especially when tested under comparable conditions, and whether positional memories would dominate memory encoding also in non-human animals. Here, we start addressing these questions. Specifically, we contrast the capacity of chimpanzees (Pan troglodytes) and human adults to spontaneously encode the positions of items in sequences and to use this mechanism for extracting positional regularities from sequences. If they share this capacity, then it most likely evolved independently of language, and may not require language-input for developing ontogenetically in our own species.

We begin with a brief and selective review of the substantial literature on memory and sequence encoding and then turn to the evidence from our comparative experiments.

Two ways to encode sequences

Studies of memory encoding of sequences date back at least to Ebbinghaus (1885/1913). One important result of this research tradition is that sequences such as ABCD can be encoded in (at least) two different ways. On the one hand, it is possible to encode that A goes to B, B to C and C to D. Following Henson (1998), we will call such memories “chaining memories.” However, sequences such as ABCD can also be encoded using a different mechanism, namely by remembering that A was in the first position, D in the last position and B in the second position. In other words, it is possible to link the items in the sequence to abstract positional codes (see, among many others, Conrad 1960; Henson 1998, 1999; Hicks et al. 1966; Ng and Maybery 2002; Schulz 1955). These codes are abstract because they are not bound to any specific sequence or sequence item. This is most apparent in so-called intrusion errors in memory experiments, where participants recall an element in its correct position—but in the wrong sequence (e.g., Conrad 1960). For example, following exposure to ABCD and EFGH, and when recalling the sequence EFGH, participants may erroneously recall the sequence EBGH; that is, the B item was erroneously included in the sequence, but it kept its correct position from its original sequence. As the B item was not “chained” to any of the items in the sequence EFGH, the positional codes must be sufficiently abstract to be generalized from one sequence to another. We call these kinds of memories “positional memories” and come back to their precise nature in the following text.

Artificial language learning experiments in humans and other animals have identified two similar sequence-learning mechanisms. Chaining memories are usually characterized in statistical terms as “transitional probabilities” (Aslin et al. 1998; Saffran et al. 1996). In a sequence ABCD, transitional probabilities reflect the conditional probabilities of A going to B, B to C and so on (rather than deterministic transitions as those studied in traditional memory research). While such computations were first demonstrated using continuous speech streams as stimuli, they work equally well on visual stimuli or musical tones (e.g., Fiser and Aslin 2002; Saffran et al. 1999). Also, the same mechanisms have been observed in a non-human primate (Hauser et al. 2001) and in rats (Toro and Trobalón 2005). They may thus reflect an evolutionarily ancient learning capacity.
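As a minimal sketch of the chaining statistic described above, the following Python snippet estimates forward transitional probabilities, that is, the proportion of occurrences of an item x (as the first member of an adjacent pair) that are immediately followed by y. The function name and the toy sequences are illustrative assumptions and are not materials from the studies cited.

```python
from collections import Counter

def transitional_probabilities(sequences):
    """Estimate forward TPs: P(next item = y | current item = x)."""
    pair_counts = Counter()   # how often x is immediately followed by y
    first_counts = Counter()  # how often x is followed by anything at all
    for seq in sequences:
        for x, y in zip(seq, seq[1:]):
            pair_counts[(x, y)] += 1
            first_counts[x] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# Toy input (hypothetical): two four-item sequences that share a prefix.
tps = transitional_probabilities(["ABCD", "ABCE"])
# A -> B and B -> C are deterministic here (1.0);
# C -> D and C -> E each occur half of the time (0.5).
```

With deterministic sequences such as those of traditional memory research, every observed transition would simply have a probability of 1.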

Although less well studied, it has been shown that human adults can also acquire positional memories in the situation that is usually employed to investigate transitional probabilities, namely when participants are exposed to quasi-continuous speech (Endress and Bonatti 2007; Endress and Mehler 2009). In these experiments, participants learned that certain syllables had to occur word-initially or word-finally and generalized this regularity to new items they had never heard before. Much evidence, both from memory research and artificial grammar learning (e.g., Conrad 1960; Endress and Bonatti 2007; Endress and Mehler 2009; Henson 1998, 1999; Hicks et al. 1966; Ng and Maybery 2002; Schulz 1955), suggests that such memory for positions is distinct and independent from chaining memory. In Endress and Bonatti’s (2007) studies, for example, positional memories required different cues than chaining memories, seemed to dominate participants’ representations after little exposure (while chaining memories came to dominate after prolonged exposure), broke down under different conditions (Endress and Mehler 2009), and behaved differently under temporal reversal of the test items (that is, chaining memories worked equally well forward and backward, while positional memories broke down when the order of elements in the test items was reversed; Endress and Wood, in preparation; Turk-Browne and Scholl 2009). Moreover, phenomena such as the aforementioned intrusion errors are difficult to explain with chaining memories (e.g., Henson 1998, 1999). It thus seems reasonable to conclude that these two kinds of memories are indeed mediated by different mechanisms.

Previous experiments targeting serial learning abilities have revealed that different non-human species have some sensitivity to positional information (e.g., Chen et al. 1997; Hailman and Ficken 1987; Orlov et al. 2000, 2006; Terrace et al. 2003). In chick-a-dee calls, for example, certain note-types have to occur call-initially and others call-finally (Hailman and Ficken 1987), suggesting that these birds have a mechanism to track such positions. Likewise, Orlov et al. (2000) showed that macaque monkeys spontaneously link items to their sequential position. In each trial, the monkeys saw a sequence of visual shapes. Then, they saw all shapes of the sequence simultaneously together with a distracter shape on a touch screen and had to touch the shapes in the order in which they had previously seen them (without touching the distracter). Importantly, the distracter shapes were taken from other sequences the monkeys had seen. When the monkeys touched the distracter shapes, they tended to do so in sequential positions where the shape had occurred in its original sequence, suggesting that they had linked these items to their sequential positions. This pattern of errors is, therefore, reminiscent of the aforementioned intrusion errors in humans (e.g., Conrad 1960; Smith 1967).

How are sequential positions encoded?

Much memory research suggests that only edge positions may be encoded precisely, while all other positions appear to be encoded relative to the edges, and thus less precisely (e.g., Henson 1998). That is, according to most models of memory for sequential positions, items in a sequence become linked to edge-based markers, and their sequential position is derived from their distance to these marker points (e.g., Henson 1998; Hitch et al. 1996; Ng and Maybery 2002; Page and Norris 1998), even if the implementations in these models vary widely. It is important to note that the possibility that positions are encoded relative to sequence-edges cannot be reduced to a classic serial position effect (that is, the observation that items in sequence-edges are memorized better than items in middles); rather, memory for positions seems to show its own serial position effect that is independent of the classic serial position effect in the Ebbinghaus tradition. This follows from the aforementioned observation that positional memory is distinct from other forms of sequential memory (e.g., Conrad 1960; Endress and Bonatti 2007; Endress and Mehler 2009; Henson 1998, 1999; Hicks et al. 1966; Ng and Maybery 2002; Schulz 1955). If so, serial position effects in these other forms of memory cannot explain the possibility that positional memory is edge-based. Rather, people seem to be endowed with specific, edge-based positional codes to which items in a sequence get linked; this allows them to reconstruct the sequential positions of items even if the items appear in a new sequence.

Further evidence for edge-based constraints comes from positional phenomena in artificial grammar learning experiments. While human adults can extract regularities involving the positions of items when the crucial positions are at the edges of sequences, they fail to do so when the crucial positions are in the middle of sequences (Endress et al. 2005; Endress and Mehler 2009). For example, learners notice that a syllable occurs in a particular position in a sequence when that position is at a sequence-edge, but have greater difficulty determining the position of a syllable when it is in another, non-edge position. Note that these results are not just due to the salience of the edges, as learners can process middle syllables perfectly well when they can rely on cues other than the positions (Endress et al. 2005).

In contrast to the aforementioned studies, results with long-tailed macaques seem to suggest that these animals encode sequential positions in absolute terms (that is, in terms of the first, second, third position and so on) rather than relative to the sequence-edges (Orlov et al. 2006). The basic paradigm in these experiments was similar to that used in Orlov et al.’s (2000), that is, the monkeys first saw a visual shape sequence and then had to touch the simultaneously presented shapes in the order in which they had been seen. In contrast to the previous experiments, however, monkeys were not shown the initial sample sequences but had to select the shapes according to their long-term memories of the previously trained sequences. Crucially, monkeys were trained on sequences of different lengths, namely three- and four-item sequences. This allowed the authors to contrast the predictions of absolute and relative encoding of the positions. For example, if positions were encoded relative to the sequence-edges, the proportion of intrusions of distracters in the last position of three-item sequences should be as high when the distracters are the last elements of other three-item sequences as when they are the last elements of other four-item sequences (because they would be the last elements in either case). In contrast, if positional encoding is absolute, the monkeys should be more likely to make intrusion errors if the distracter is the last element of a three-item sequence than when it is the last element of a four-item sequence, because positions 3 and 4 are not equivalent in terms of absolute positions. Results showed that the monkeys were indeed less likely to touch distracters from four-item sequences when recalling three-item sequences than to touch distracters from sequences of the same length. Accordingly, Orlov et al. (2006) concluded that positional codes are absolute rather than relative.
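The contrast between the two hypotheses can be made concrete with a small sketch. The two coding functions below are illustrative assumptions (they are not taken from Orlov et al. or from any specific memory model): one labels an item by its absolute ordinal position, the other by its distance from the nearer sequence-edge. The shape labels are hypothetical.

```python
def absolute_code(sequence, item):
    """1-based ordinal position of item (first = 1, second = 2, ...)."""
    return sequence.index(item) + 1

def edge_relative_code(sequence, item):
    """Nearer edge plus distance from it; ties are assigned to the start edge."""
    i = sequence.index(item)
    from_start = i
    from_end = len(sequence) - 1 - i
    if from_start <= from_end:
        return ("start", from_start)
    return ("end", from_end)

three_item = ["P", "Q", "R"]        # hypothetical three-item sequence
four_item = ["S", "T", "U", "V"]    # hypothetical four-item sequence

last3_abs = absolute_code(three_item, "R")        # 3
last4_abs = absolute_code(four_item, "V")         # 4
last3_rel = edge_relative_code(three_item, "R")   # ("end", 0)
last4_rel = edge_relative_code(four_item, "V")    # ("end", 0)
```

Under absolute coding, the final items of the two sequences carry different codes (3 versus 4), so a final-position distracter from a four-item sequence should intrude less often into the last position of a three-item sequence. Under edge-relative coding, both final items share the same code, so intrusion rates should be equal.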

There is an alternative interpretation of the Orlov experiments, one that directly connects with the experiments presented here: subjects may not only encode the sequential positions, but also the length of the sequences. If so, they may reject distracters from sequences of incorrect length, and one would expect the same pattern of results as that observed by Orlov et al. (2006). In fact, at least humans can remember in some circumstances (such as the tip-of-the-tongue experience) the length of words even when they cannot access the words (e.g., Brown and McNeill 1966; Koriat and Lieblich 1974). We also would argue that monkeys may plausibly encode the length of sequences when they have to learn them. Moreover, Orlov et al.’s (2006) data actually offer partial support for the relative encoding hypothesis. In their experiments, intrusions in incorrect positions (e.g., a distracter from position 2 that intrudes in position 3) are much more frequent in the second and third position than in the first and the last position (see their Figure 5A). This would be unexpected if positions were encoded absolutely. But the increased positional uncertainty in sequence-middles fits well with models encoding sequential positions relative to the sequence-edges. We thus believe that the current evidence is consistent with the idea that animals, humans included, encode sequential positions in relative terms. The following experiments attempt to provide additional support for this interpretation.

The current experiments

In the experiments reported in the following text, we presented chimpanzees (Pan troglodytes) and human adults with a situation in which they could encode both chaining regularities among items and regularities involving the positions of items, asking what kinds of information they extract from these sequences. While it is highly plausible that chimpanzees can process chaining dependencies among items, given that species as distant as humans, cotton-top tamarins and rats can do so (Hauser et al. 2001; Saffran et al. 1996; Toro and Trobalón 2005), it is less clear how they encode positions of items. Based on studies of human adults, one would expect that participants would initially encode information about an item’s position, leaving for subsequent processing and additional exposure information about dependencies among items (Endress and Bonatti 2007). This raises the following question: are humans particularly good at encoding positions in edges of sequences because many grammatical regularities are based on the positions of items, which, in turn, are encoded relative to the edges of different linguistic units? Humans may therefore know (consciously or unconsciously) from their extensive experience with language that edges constitute positions to “watch out” for. Another possibility (that is not necessarily incompatible with the first one) is that edge-based positional coding appears in other, non-linguistic domains, acting as a constraint on the structure of language. From this perspective, we might expect evidence for edge effects in non-human animals, a proposal that should not be overinterpreted. Specifically, we are not claiming that studies such as these will show that animals have language. In fact, our claim is almost exactly the opposite: that is, certain crucial properties of language might have non-linguistic origins that constrain the structure of language.

In the following experiments, we presented chimpanzees with materials that consisted of both positional and chaining regularities, and asked whether, given limited exposure, they were more likely to learn about edge-based positional information than about the dependencies among items. In other words, if chimpanzees follow the same general pattern as evidenced in human studies, then initially they should notice which items occur in the sequence-edges, while their sensitivity to other regularities should be much weaker.

Experiment 1: sequence learning in chimpanzees

Experiment 1 asks what kinds of information chimpanzees extract from sequences containing both positional and chaining regularities. Since these regularities are tracked by independent mechanisms in humans, chimpanzees may initially extract either of these regularities or both. More specifically, participants could learn that certain items occurred in the sequence-edges (i.e., a positional regularity) and that some items predicted others (i.e., a chaining regularity). To increase the chance of observing chaining information, we decided to make the items carrying the chaining information stand out in two ways. First, we used only three possible items in sequences of only six items, with the expectation that the limited number of items should make the chaining regularity easily detectable. Second, we set up the chaining regularity by associating the key items with more salient (i.e., in terms of acoustical dimensions such as pitch and amplitude) and functionally significant chimpanzee vocalizations than the item carrying the positional regularity.

Participants were first habituated to such sequences and then tested on new sequences that either respected or violated the aforementioned chaining and positional regularities.

Materials and method

Participants

We tested 27 chimpanzees (20 females, mean age 5.37 years, range 1–21 years) from the Tchimpounga sanctuary, Republic of Congo. This is a relatively naïve experimental population, but has been presented with various behavioral experiments over the last few years (Herrmann et al. 2007; Wood et al. 2007). Fifteen animals were included in the final analyses (seven adults, eight infants; see the following text for exclusion criteria). Approximately 1 year prior to this experiment, a subset of the present test subjects had been presented with some of the same tokens in a different habituation/discrimination paradigm (testing an AAB pattern; Hauser and Hare, in preparation). Thus, there was a long gap between the experiments, and minimal overlap in test subjects and test items. All subjects were born in the wild, lived in rich social and physical environments, and since an early age, they have been in close contact with human caretakers.

Stimuli

To explore the learning of chaining and positional regularities, we created sound sequences with three unique items: a pant grunt (X), a scream (A) and a copulation call (B), all recorded from wild chimpanzees unfamiliar to the test population. Both the chimpanzee scream and copulation call are very distinct from one another and are, we assume, both acoustically and functionally more salient than the grunt. Screams are emitted by a subordinate individual during moments of heightened aggression initiated by a dominant. Copulation calls occur, as indicated by their name, during copulation. Pant grunts, in contrast, are a more common occurrence, frequently emitted during relatively mild interactions between subordinates and dominants (e.g., Crockford and Boesch 2005). These items were combined into six-item sequences with the stimuli arranged in different orders.

The average duration of the items was 240 ms (X = 130 ms, A = 310 ms, B = 280 ms, SD = 96.4). They were recorded to mono wav files with a sample rate of 44.1 kHz and a sample width of 16 bits. Sequences were created in Audacity (http://audacity.sourceforge.net) by pasting the items into new wav files (44.1 kHz, 16 bits, mono) in the order in which they were to appear in the sequences; items were separated by silences of 120 ms. Thus, the duration of each sequence was 1.71 s. Sequences were played at an intensity of 65–72 dB SPL.
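The timing figures above can be checked with a short calculation. The sketch below assumes that each six-item sequence contains four X items, one A and one B (the composition of the habituation sequences listed in the Procedure) and that the reported SD is the sample standard deviation of the three item durations; the variable and function names are illustrative.

```python
from statistics import mean, stdev

ITEM_MS = {"X": 130, "A": 310, "B": 280}  # item durations reported in the text
GAP_MS = 120                               # silence between adjacent items

def sequence_duration_ms(sequence):
    """Total duration: all item durations plus one gap between each pair of neighbors."""
    items = sum(ITEM_MS[symbol] for symbol in sequence)
    gaps = GAP_MS * (len(sequence) - 1)
    return items + gaps

mean_ms = mean(ITEM_MS.values())                      # 240
sd_ms = stdev(ITEM_MS.values())                       # sample SD, about 96.4
duration_s = sequence_duration_ms("XABXXX") / 1000    # 1.71
```

Four X items, one A and one B sum to 1,110 ms, and the five inter-item silences add 600 ms, which gives the reported 1.71 s.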

Apparatus

The layout of the experiment is schematized in Fig. 1. Stimuli were played using an iPod Hi-fi speaker (Apple Inc., Cupertino, CA). Prior to testing, the speaker was placed out of the subject’s sight and operated by the experimenter by means of a laptop computer. The experimenter was always positioned such that she was not in the line of sight between the subject and the speaker (see Fig. 1). The tests were also captured by a digital camcorder, which was operated by and positioned next to the experimenter.

Infant chimpanzees younger than 3 years were kept on a keeper’s lap facing 180° away from the speaker. Animals of at least 3 years of age were tested alone in cages. Keepers directed subjects’ attention away from the speaker by approximately 90°, either by playing with subjects or using food items; none of the keepers were aware of the goals or design of the experiments, and therefore were blind to our hypotheses. Furthermore, they were instructed not to react to the auditory stimuli.

Procedure

Subjects were tested individually using a habituation/discrimination paradigm. Sequences were played when an individual looked away from the speaker according to the coding criteria. Subjects were habituated to a series of auditory sequences that adhered to two patterns: (1) A preceded B and (2) X was located in the sequence-edges. There were four habituation sequences: XABXXX, XAXBXX, XAXXBX and XXXABX. These sequences were presented in random order until habituation was reached. We defined habituation as a failure to respond on three consecutive trials but also required subjects to hear a minimum of ten habituation trials. In cases where subjects produced three no-response trials before listening to ten habituation trials, we continued to present subjects with habituation trials until they had fulfilled both requirements.
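The two habituation rules and the habituation criterion can be written down as a brief sketch. The function names are illustrative, and the criterion check assumes that the three consecutive non-responses must be the three most recent trials; the text does not state this explicitly.

```python
HABITUATION = ["XABXXX", "XAXBXX", "XAXXBX", "XXXABX"]

def respects_positional(seq):
    """Positional rule: both sequence-edges are occupied by X."""
    return seq[0] == "X" and seq[-1] == "X"

def respects_chaining(seq):
    """Chaining rule: A occurs somewhere before B (not necessarily adjacent)."""
    return "A" in seq and "B" in seq and seq.index("A") < seq.index("B")

def is_habituated(responses, min_trials=10, run_length=3):
    """responses: list of booleans, True = orienting response on that trial.

    Assumption: habituation requires at least min_trials trials and
    no response on the last run_length trials.
    """
    if len(responses) < min_trials:
        return False
    return not any(responses[-run_length:])

all_legal = all(respects_positional(s) and respects_chaining(s) for s in HABITUATION)
```

For example, a subject that stops responding after three trials is not yet habituated under this check, because fewer than ten trials have been presented.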

Following habituation, we presented subjects with the six new test sequences indicated in Table 1. As we could not control the length of time subjects would remain attentive, we used a pseudorandom counterbalancing design. Specifically, the first four test trials consisted of two sequences that involved manipulations of the chaining regularity and two that involved manipulations of the positional regularity (sequences 1–4 in Table 1); these test sequences were presented in random order, and were then followed by two additional test sequences exploring the internal chaining relationships. As the data showed that participants kept responding at similar levels when reaching the last two sequences, these sequences were also included in the analyses (see Footnote 1).

Table 1 Summary of the test item types used

Data acquisition and analysis

Upon completion of the experiment, SC blind-coded the final three habituation trials and all test trials, each in separate video clips with no sound; sound onset and offset were digitally flagged so that the response could be assessed relative to the playback. Due to the different positions of the speaker, different criteria were used to establish an orienting response in infants and adults, namely a turn toward the speaker of at least 90° or 45°, respectively. Trials not fulfilling these criteria were coded as non-responses. Trials were excluded from analysis if subjects left the camera view, were significantly distracted by another subject, or if there was a significant noise disturbance. We defined significant distraction as moments when the subject oriented toward or interacted with chimpanzees in neighboring areas. We defined noise disturbances as any sound (i.e., chimpanzee calls or the clanking of a cage) above the general background noise level. As the sounds were recorded using a camcorder, we could not use an objective criterion to define noise disturbances. (Camcorders automatically modify the recording gain depending on the general sound level; amplitudes in the recordings thus have no unique relation to the sound level in the environment.) While disturbances were thus defined subjectively, we observed only three such trials in the animals included in the final analysis, and two coders (SC and MH) agreed to exclude these three trials.

We assessed inter-observer reliability by having a second experienced coder (MH) analyze, blind to condition, a randomly chosen subset of 35 habituation and test trials. The two coders’ judgments coincided on 34 of the 35 trials. All χ² values are corrected for continuity. N values given with χ² values reflect the cumulative number of trials.

Subjects were excluded from analysis if, after blind-coding, it turned out that they did not have three non-responses in a row prior to starting the test phase or if they did not respond to any sequence at all during the test phase.

Results

On average, participants required 16.6 trials to habituate (SD = 6.0, range 10–27). As shown in Fig. 2 and Table 1, evidence of successful discrimination from the habituation material is revealed by a relatively stronger response to sequences violating the positional regularity (sequences 1 and 2 in Table 1; percentage of trials responded to: 46.7%) than to sequences respecting it (sequences 3–6 in Table 1; 20.0%; χ²(1, N = 80) = 5.14, P = 0.023, φ = 0.282). In contrast, subjects did not respond more to sequences violating the chaining relation (sequences 2, 4 and 6 in Table 1; 33.3%) than to sequences respecting it (sequences 1, 3, 5 in Table 1; 26.3%; χ²(1, N = 80) = 0.15, P = 0.696, φ = 0.071, ns). Power analysis revealed that the failure to observe a sensitivity to the chaining regularity was not due to insufficient statistical power, as one would need at least 260 chimpanzees (corresponding to 1,560 responses) to achieve a power of 80%.
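As an illustration of the continuity-corrected test used for the positional comparison, the sketch below computes a Yates-corrected χ² for a 2 × 2 table together with φ. The cell counts are not given in the text; they are inferred here from the reported percentages and N = 80 (14 of 30 trials with a response for violating sequences, 10 of 50 for respecting sequences) and should be treated as an assumption. The function name is illustrative.

```python
import math

def chi2_yates_2x2(a, b, c, d):
    """Return (chi2 with continuity correction, p-value, phi) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    diff = abs(a * d - b * c)
    chi2 = n * max(diff - n / 2, 0) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2))   # chi-square survival function for 1 df
    phi = diff / math.sqrt(denom)        # effect size from the uncorrected statistic
    return chi2, p, phi

# Rows: violating vs. respecting the positional regularity.
# Columns: response vs. no response. Counts are inferred, not reported.
chi2, p, phi = chi2_yates_2x2(14, 16, 10, 40)
# chi2 is about 5.14, p about 0.023, phi about 0.282
```

Under these inferred counts, the sketch reproduces the values reported for the positional regularity.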

To analyze our results more fully, we submitted our data to a binomial ANOVA (Venables and Ripley 2002) with the factors regularity type (positional vs. chaining) and test item type (legal vs. foil). We obtained a main effect of test item type (χ²(1, N = 160) = 4.86, P = 0.028), but no main effect of regularity type (χ²(1, N = 160) = 0, P > 0.999, ns), nor an interaction between these factors (χ²(1, N = 160) = 1.78, P = 0.181, ns).

The non-significant interaction raises the question of whether we had sufficient statistical power to detect it. Unfortunately, we are unaware of a well-accepted method of power analysis for binomial ANOVAs. We thus evaluated the power of the interaction as a function of the number of chimpanzees in the following way. For each number of chimpanzees, we generated 10,000 sample experiments. In each sample experiment, we randomly generated the number of orientations to each test item as an independent sample drawn from a binomial distribution, using the response rates observed in our experiment as orientation probabilities. Then we submitted the data from the sample experiment to the same binomial ANOVA as our empirical data and counted the proportion of sample experiments for which the interaction was significant at the 0.05 level. The results of our simulations showed that the interaction was significant in at least 80% of the simulations with at least 64 chimpanzees.
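The logic of this simulation can be sketched as follows. This is not the analysis described above: in place of the binomial ANOVA, the sketch substitutes a simpler Wald test on the interaction contrast of the four log odds, and it treats the four design cells as independent, which is a simplifying assumption. The response rates are those reported in the Results, the number of trials per subject per cell follows the assignment of test sequences to conditions, and all names are illustrative.

```python
import math
import random

# (response rate, test trials per subject) for each design cell.
CELLS = {
    ("positional", "foil"):  (0.467, 2),
    ("positional", "legal"): (0.200, 4),
    ("chaining", "foil"):    (0.333, 3),
    ("chaining", "legal"):   (0.263, 3),
}

def adjusted_logit(successes, trials):
    """Log odds with 0.5 added to both counts, and its approximate variance."""
    s = successes + 0.5
    f = trials - successes + 0.5
    return math.log(s / f), 1 / s + 1 / f

def interaction_significant(n_subjects, rng, z_crit=1.96):
    """Simulate one experiment and test the interaction contrast (Wald z-test)."""
    contrast, variance = 0.0, 0.0
    for (regularity, item_type), (rate, per_subject) in CELLS.items():
        trials = n_subjects * per_subject
        hits = sum(rng.random() < rate for _ in range(trials))  # binomial draw
        logit, var = adjusted_logit(hits, trials)
        # +1 for positional/foil and chaining/legal, -1 for the other two cells
        sign = 1 if (regularity == "positional") == (item_type == "foil") else -1
        contrast += sign * logit
        variance += var
    return abs(contrast) / math.sqrt(variance) > z_crit

def estimate_power(n_subjects, n_sims=1000, seed=0):
    """Proportion of simulated experiments with a significant interaction."""
    rng = random.Random(seed)
    hits = sum(interaction_significant(n_subjects, rng) for _ in range(n_sims))
    return hits / n_sims
```

Calling estimate_power for a range of sample sizes and finding the smallest one whose estimate reaches 0.8 mirrors the procedure described above, although the exact sample size obtained with this substitute test need not match the reported one.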

To assess whether the keepers’ behavior might have influenced that of the chimpanzees, SC also blind-coded the behavior of the keepers during the test trials, using the same criteria as used with the chimpanzees. Out of a total of 64 trials (excluding those trials (N = 26) in which the keeper was not visible in the recording), the keepers oriented toward the speaker only on four occasions (6.25%). Their orientation rates differed neither depending on whether the test sequence respected the positional regularity (χ²(1, N = 64) = 0.13, P = 0.724, φ = 0.026, ns) nor depending on whether the test sequence conformed to the chaining regularity (χ²(1, N = 80) = 0.34, P = 0.561, φ = 0.137, ns). Crucially, in trials in which both the chimpanzees’ and the keepers’ behavior could be coded, the chimpanzees’ orientation responses did not depend on whether the keepers oriented toward the speaker or not (χ²(1, N = 58) = 0.10, P = 0.756, φ = 0.132, ns).

For the adult chimpanzees, we also separately analyzed the trials in which the subject was actually looking at the keeper. This was the case for 58.3% of the trials with adult subjects. However, as the keepers never oriented toward the speaker in these trials, it was not possible to evaluate the association between the chimpanzees’ behavior and that of the keepers. Together, these results indicate that the chimpanzees’ responses to playbacks were independent of the keepers’ behavior.

Discussion

The results of Experiment 1 suggest that when given the opportunity to learn a chaining regularity, a positional regularity or both, chimpanzees initially and spontaneously extracted the positional regularity under the test conditions presented. While the limited sample size used in Experiment 1 and, as a result, the limited statistical power do not allow for strong conclusions about the computational abilities of chimpanzees, it is striking that we found a reliable orienting response to violations of the positional regularity, but not to violations of the chaining regularity. We would argue that these results suggest that positional regularities are extracted more readily than chaining regularities, although we would ideally need more subjects to answer this question conclusively.

Before accepting this tentative conclusion, however, we need to rule out several alternative accounts of our data, relating to how the behavior of the keepers might have influenced that of the chimpanzees, whether the chimpanzees had equivalent exposure to both kinds of regularities and whether they simply might have attended only to the first and the last elements in the sequences.

First, the chimpanzees might simply have reacted to cues provided by the keepers, without paying attention to the sequence at all, a possibility that is akin to the Clever Hans effect. In fact, although chimpanzees are not particularly attentive to human-provided cues except in competitive settings, they can follow human gaze (Bräuer et al. 2005, 2006). In Experiment 1, we minimized the possibility that subjects were just following human cues in several ways. First, the keepers were not informed about the experiment’s goals and design and were specifically asked not to react to the stimuli. Second, to rule out the possibility that they provided subconscious cues to the subjects, SC blind-coded their reactions during the test trials, using the same criteria as those used with the chimpanzees. Results showed that the keepers’ behavior was not correlated with that of the chimpanzees. Moreover, in trials in which the (adult) chimpanzees actually looked at the keepers, the keepers never oriented toward the speaker, suggesting that they did not cue the chimpanzees either. Together, these results suggest that the chimpanzees’ behavior was not based on cues provided by the keepers.

A second alternative interpretation of our results is that chimpanzees may simply have had more experience with the positional regularity than with the chaining regularity. In fact, all habituation items respected the positional regularity. The chaining regularity, in contrast, was implemented in a more variable way: in half of the habituation sequences, the A and B items were adjacent, while the remaining sequences had an intervening X item between the A and the B items. While this manipulation was necessary to avoid associating A and B items with any particular position within the sequences and to prevent participants from simply learning AB “chunks”, it might have made it more difficult to learn the chaining regularity. For the moment, we leave this possibility open. In Experiment 3, however, we present data suggesting that it is unlikely that this design prevented chimpanzees from learning the chaining regularity.

A final alternative interpretation of these results is that the chimpanzees attended exclusively to the first (or the last) element of the sequence and ignored the rest. If so, they would dishabituate to sequences violating the positional regularity not because they extracted that regularity, but because the only sequence element they attended to (that is, the first or the last item in the sequence) was “illegal”. This possibility is unlikely for three reasons.

First, in a pilot study, the same chimpanzees attended to sequence-internal stimuli in a similar habituation/dishabituation paradigm (Hauser and Hare, in preparation). In particular, following habituation to chimpanzee vocalizations arranged in an AAB pattern, chimpanzee subjects were more likely to respond to novel sequences arranged in an ABB pattern than to novel sequences arranged in the familiar AAB pattern. To detect this difference, subjects must have noted either the change in the position of the identity relationship (i.e., from the first and second slots [AA] to the second and third slots [BB]) or the difference between identity (AA) and non-identity (AB). Either way, the chimpanzees were attending to more than the first or last element in these short, three-item strings.

Second, the sequence-internal vocalizations used during habituation were much more salient than those occurring in the sequence-edges, both acoustically and functionally; in fact, it is highly unlikely that chimpanzees would ignore screams or copulation calls (which were used sequence-internally) while attending to a pant grunt (which was used in the sequence-edges).

Third, given that all X items were physically identical, the A and B items were expected to “pop out” even though they were internal to the sequence. That is, these two call types were expected to stand out against the uniform background of X items. (The reader can verify this effect with an example sequence using human-produced sounds at the following address: http://tinyurl.com/humanpopout.) In vision, pop-out effects have been observed in humans (e.g., Treisman and Gelade 1980; Treisman and Gormican 1988), rhesus macaques (e.g., Bichot and Schall 1999) and chimpanzees (e.g., Tomonaga 1995). In audition, similar pop-out effects have been observed in humans (Cusack and Carlyon 2003). While we are not aware of any comparative work on auditory pop-out effects, they are highly likely to be shared by a wide variety of animals, given that they follow from general principles of auditory scene analysis and that these principles have been observed in many animals, including humans (Bregman 1990), Japanese macaques (Izumi 2002), European starlings (MacDougall-Shackleton et al. 1998), finches (Benney and Braaten 2000), bats (Moss and Surlykke 2001), and goldfish (Fay 1998, 2000). As a result, the A and B items are very likely to pop out, which would make it hard for the chimpanzees to ignore them.

We therefore suggest that it is unlikely that chimpanzees restricted their attention to the first (or the last) element of the sequences. More likely, the chimpanzees noticed that pant grunts occurred in the first, last or both positions, and when this positional regularity was violated, they responded by orienting to the speaker. Alternatively, they may have noticed that the scream and the copulation call did not occur in the sequence-edges during habituation and may have reacted to test sequences where these items were placed in the edges. In either case, after limited exposure, chimpanzees seem to have extracted a positional regularity rather than a chaining regularity.

Experiment 2: sequence learning in humans

Experiment 2 asked whether human adults would show the same pattern of responses as chimpanzees when tested under similar conditions. Thus, although certain aspects of the experimental design were necessarily different, we controlled for the amount of exposure and the particular details of the input to determine whether adult participants would preferentially detect violations at the edges, while being much less sensitive to violations concerning chaining regularities.

To make the materials used with human participants as comparable as possible to those used with chimpanzees, we ran two different experimental conditions. In Experiment 2a, we used human-produced non-speech sounds of different saliency, such that the A and the B items were (presumably) more salient than the X item. However, while these items were human-produced, they were not human speech; in Experiment 2b, we test for the potential significance of this distinction by presenting human participants with human speech syllables.