
Based on the idea that the mental abilities of all beings were subject to evolutionary continuity (Darwin, 1859, 1871), this chapter explores the uniqueness of human infant social cognition. Focusing on infants in this regard makes sense, because as soon as children reach preschool age, most tasks they are presented with in order to evaluate their social-cognitive abilities involve verbal responses and require sophisticated levels of other cognitive abilities such as attention and memory (see specific sections below). Thus, my approach is to compare human infant social cognition to that of our closest relatives, the nonhuman great apes – chimpanzees, bonobos, gorillas, and orangutans. This comparison is the most fruitful given that great apes are closely related to humans phylogenetically. The high degree of genetic relatedness (Homo sapiens and members of the genus Pan share between 98.3% and 98.7% of their DNA; e.g., Wildman et al., 2003) suggests that differences in other areas, such as cognition, might also be smaller than expected.

In this chapter, I summarize empirical findings on human infants’ and great apes’ understanding of four mental states (goals, intentions, desires, and beliefs), examining in each case whether human infants and great apes share an understanding of the mental state in question. If we identify a social-cognitive ability that human infants possess but other great apes lack, this might be the key to human uniqueness and, consequently, a hint toward which abilities evolved specifically in humans and equipped our species with the tool kit needed to cooperate and build the kind of sophisticated culture we see in modern humans.

Reading others’ mental states helps individuals to explain and interpret others’ behavior (Premack & Woodruff, 1978), and, in consequence, to regulate their own intentional actions in accordance with the mental states underlying others’ actions. Thus, understanding that others act according to how they represent the current (and future) world equips individuals with a tremendous advantage. Although several authors have claimed that humans are unique in how they deal with others’ psychological states compared to all other animals (e.g., see Call & Tomasello, 2008; Povinelli et al., 2000; Tomasello & Rakoczy, 2003), in recent years comparative research has demonstrated that humans might not be the only species that discerns what others perceive, intend, desire, and believe when reasoning about others.

1 Understanding of Others’ Desires

The term desire has been used differently by different authors (e.g., Repacholi & Gopnik, 1997; Schult, 2002; Searle, 1983). What all explanations have in common is that desires are considered mental states that cause people to act to bring about changes in the world. Following Repacholi and Gopnik’s (1997) use of the term “desire,” it could easily be exchanged with other terms that describe an individual’s attitude toward outside entities (e.g., the term “preferences”). Such use allows us to differentiate desires from goals. At least in humans, an individual’s desire typically is reflected by emotional expressions (e.g., a happy face when liking an object).

1.1 Understanding Others’ Desires in Human Infants

The development of infants’ understanding of emotional expressions seems to start early in ontogeny. By 4 months of age, human infants discriminate between some facial expressions such as fear and happiness (Nelson, 1987). Repacholi (1998) tested 14-month-olds’ understanding of emotional expressions and found that the infants understood both the directedness and the valence of emotional signals. In this study, infants saw an adult approach two boxes, open each one in turn, and show an emotional expression according to the content of each box (either happiness or disgust). When handed both boxes afterward, the infants correctly interpreted the emotional expressions, as shown by their predominantly opening the box to which the adult had responded with a happy expression.

By 18 months of age, infants understand that an individual’s emotional expression toward an object can be considered an indicator of the individual’s desire (for this particular object). In their study, Repacholi and Gopnik (1997) showed that 18-month-old infants, after viewing an adult’s emotional responses to different food items, responded to the adult’s ambiguous request gesture by handing over her preferred food item even when it did not match their own preference. They thus demonstrated that 18-month-old infants understand that desires are mental states underlying behavior and action. Specifically, infants at this age – but not 14-month-olds – have a non-egocentric understanding of the differences between their own desires and those of others.

1.2 Understanding Others’ Desires in Nonhuman Great Apes

Nonhuman great apes have shown some ability to discriminate the emotional states of conspecifics (Parr et al., 1998) and are also able to match videos with emotional content (e.g., a conspecific being injected) to the appropriate conspecific facial expression (Parr, 2001). Apes thus can differentiate emotional expressions. Little research has investigated their understanding that emotional expressions reflect internal states (i.e., desires) directed referentially to outside entities. In one study (Buttelmann et al., 2009a, Exp. 1), behavioral settings previously presented to infants were modified and presented to great apes of all four species. As in Repacholi (1998), the authors had an experimenter emote neutrally or with disgust to the content of one box and with a happy emotional expression to the content of another box. At least for disgust versus happiness, the apes significantly preferred the box associated with the happy expression. Although this does not yet demonstrate that the apes considered emotional expressions as indicators for desires, it at least demonstrates an understanding of the valence and directedness of emotional expressions. To check whether apes indeed chose the happy box and did not just avoid the disgust box, and to see whether the apes used emotional expressions to infer desires and to interpret consequent behavior, the authors modified Repacholi and Gopnik’s (1997) procedure (Exp. 3). First, the experimenter hid two high-value food items in each of two containers. He then emoted to the content of each of the two containers: to one with an expression of happiness and to the other with an expression of disgust. Out of apes’ view, he then took the food out of the “happy box” and pretended to eat. Thus, when subsequently given a choice between both containers, the apes had to choose the disgust box in order to retrieve a food item. Although the effect was small, this is what the apes did.
This research indicates that nonhuman great apes, like human 18-month-olds, can use the experimenter’s emotional expression to infer his desire.

2 Understanding Others’ Goals

As with the term desire, the use of the term goal has provoked several misunderstandings (e.g., see Want & Harris, 2001). In general, there are two main ways this term is used in the developmental literature. On the one hand, authors talk about goals when referring to a certain state of the environment; i.e., a person’s desired result as a state in the world. On the other hand, the term goal is used to refer to an internal representation of this desired state in the world that guides a person’s behavior (Tomasello et al., 2005). Since this chapter deals with infants’ and nonhuman apes’ understanding of others’ mental states, the term “goal” refers to this mental representation of the desired state in the world.

2.1 Understanding Others’ Goals in Human Infants

Understanding others’ intentional actions in terms of goals tells the observer what an organism is trying to do. Thus, this mental state derives from a desire and forms a more concrete representation that affects the organism’s behavior. Several types of tasks have been used to investigate goal understanding in human infancy. A very prominent task is the habituation-dishabituation paradigm, which measures infants’ looking time at video scenes in which an actor either acts according to his or her goal or performs an action that is not directed at that goal. Using this paradigm, research has shown that human infants seem to understand others’ goals from a very early age. For example, Woodward (1998) presented 6-month-olds with an actor who repeatedly reached for one of two objects that were always positioned next to each other (with each object staying in the same position during familiarization). At test, the two objects were switched, and the actor either reached for the old object (following a new path) or followed her previous path and consequently reached for the alternative object. Results indicated that 6-month-olds understand others’ actions as object-directed: The infants looked at the scene longer when the actor reached for the alternative object (although the path was identical to the previous one) compared to when she reached for the old object (although she had to use a new path).

In related research, a number of subsequent studies have provided evidence that infants between 6.5 and 12 months already understand that agents pursue goals around obstacles (e.g., Csibra et al., 1999; Gergely et al., 1995). Watching a computer screen, infants are habituated to an animate dot jumping over a wall and approaching another dot on the other side of this wall. When the wall is then removed in the test trial, infants seem to expect the dot to approach the other dot directly (e.g., horizontally, without a jump). This expectation is inferred from their longer looking times at the screen when the dot in the test trial continues the previous jumping action although it is no longer necessary (violation of expectation).

Another clever task to measure whether human infants have an understanding of goal-directed actions is to investigate how they segment others’ actions (e.g., Baldwin et al., 2001; Saylor et al., 2007). Both on video and live, infants in the last quarter of their first year of life observed a human (or a human-like array of light dots) perform a continuous everyday action sequence. These action sequences were then interrupted, either at a point at which a goal-directed action was finished or at a point at which a goal-directed action was still in progress. What was measured was how long infants looked at these interruptions. Infants demonstrated their understanding of goal-directed actions by looking longer at interruptions that stopped a goal-directed action from being completed than at interruptions that occurred after a goal-directed action had been completed.

Beginning in the second half of infants’ first year of life, their goal understanding starts to guide their behavior: 7-month-olds imitate an actor’s goal-directed actions in a differentiated manner. That is, they perform modeled actions only if these actions are goal-directed (Hamlin et al., 2008). Around their first birthdays, infants use their understanding of others’ goals when imitatively learning from others, copying others’ actions selectively according to the others’ goals. In one series of studies, participants were presented with an experimenter performing two identical but differently marked actions on a novel apparatus. Infants tended to imitate the action that was vocally marked as intentional (e.g., by the word “There!”), whereas they ignored the same action when marked as an accident (e.g., by the word “Whoops!”) by the experimenter (e.g., Carpenter et al., 1998; Olineck & Poulin-Dubois, 2005). In another set of imitation studies, 18-month-old infants even imitated an actor’s intended goal after they had only seen the actor’s failed attempt to achieve her goal (e.g., Bellagamba & Tomasello, 1999; Carpenter et al., 2005; Meltzoff, 1995).

Finally, studies testing infants in interactive non-imitative settings found that at around their first birthdays, infants start to understand that others persist in pursuing their goals when they are not achieved, for example, in the case of failed attempts or accidents. For example, Behne et al. (2005) presented infants sitting at a table with an experimenter handing over objects to the participants in two different ways. Infants from 9 months of age onwards became more impatient (reaching, banging on the table, turning away) when the experimenter was unwilling to pass over the object, e.g., by holding it out and moving back in a teasing fashion, than when the experimenter was unable to do so, e.g., because he accidentally dropped it during the attempt to hand it over to the participant. Further, infants infer an actor’s goal when presented with referential gestures, such as gazing and pointing in combination with intentional reaching for or grasping of an object, with some studies also including emotional cues (see previous section) (e.g., Moses et al., 2001; Phillips et al., 2002; Sodian & Thoermer, 2004). Taken together, there is an abundance of experimental evidence showing that human infants have a relatively complex understanding of others’ goals by around their first birthdays.

2.2 Understanding Others’ Goals in Nonhuman Great Apes

Studies investigating the understanding of others’ goals in great apes apply different types of paradigms. In the first kind of study investigating goal understanding in great apes, a human experimenter communicates the location of hidden food to the subject by using various gestures (e.g., head orientation, pointing at the container) to show the subject which of these containers is baited. Across different studies, great apes show variable performance. Some studies indicate that, despite their ability to follow a human’s gaze (Call et al., 1998), chimpanzees still fail to use human pointing or gazing cues in such tasks (e.g., Call & Tomasello, 1994; Itakura et al., 1999; Tomasello et al., 1997). Other studies, however, indicate that chimpanzees, orangutans, and gorillas are all successful at using at least some of these cues, such as pointing, head and eye orientation, or a physical marker put intentionally on top of the baited container by the experimenter to locate hidden food (Barth et al., 2005; Byrnit, 2004; Call & Tomasello, 1998; Itakura & Tanaka, 1998; Miklósi & Soproni, 2006). However, it should be mentioned that in most cases when subjects did use these cues successfully, they first experienced extensive training where they might have learned the cues.

The fact that in this task subjects are presented with communicative cues such as pointing, tapping the container, or gazing in a cooperative setting might account for the apes’ failures to apply social cognition spontaneously (see also Herrmann et al., 2007). While all of these communicative cues are common behaviors for humans, they may not normally be used between conspecifics in other primate species (Goodall, 1986; Menzel, 1973). Yet great apes may be able to interpret others’ goals when observing them act in a more non-cooperative context, such as food acquisition or food processing. Unlike communicative cues, such behavioral cues consist of fully functional behaviors and do not reflect the actor’s intent to communicate specific information to other individuals (Buttelmann et al., 2008a). Indeed, several studies involving more behavioral cues provide evidence for great apes’ ability to understand the actions of others in terms of the goals they are pursuing. For example, although chimpanzees fail to retrieve the container with food in object-choice tasks that are set up in a cooperative situation, i.e., when the container is indicated by a communicative cue, they can successfully use a very similar behavioral cue to locate the hidden food in a competitive context, namely a human or conspecific reaching for the baited cup but not paying attention to the subject (Hare & Tomasello, 2004). Thus, even though both cues involve a very similar arm movement, subjects differentiated these two cues. However, there are also mixed findings with non-cooperative object-choice tasks. For instance, Bräuer et al. (2006) presented chimpanzees and bonobos with two behavioral cues: a cue in which the experimenter attempted to remove the lid from the baited cup, and the reaching cue from Hare and Tomasello (2004), and subjects failed to use these cues when choosing between locations.
One important difference between this study and the one by Hare and Tomasello (2004) may account for the subjects’ failures in the reaching condition: in Hare and Tomasello’s study the experimenter first established a competitive relationship with the subject before the cue was given, whereas no such relationship was established in the Bräuer et al. study. This competitive context may have especially motivated the subjects. I will return to this point later.

As with children (e.g., Behne et al., 2005), some studies also used natural interaction paradigms to investigate great apes’ ability to understand others’ goals. For example, Call et al. (2004) had a human experimenter repeatedly give chimpanzees food through a glass panel. Then, on some trials, the experimenter did not give the food. The experimental manipulation was that sometimes the experimenter did not give food because he was unwilling to, whereas other times he did not give it because he was unable to do so. Unwillingness was expressed in one of three ways: the experimenter either simply stared at the food on the table in front of him without giving it, he ate it himself, or he teased the chimpanzee with it, pulling it back when the chimpanzee reached for it. Matched with each of these three unwilling actions were two unable actions that closely resembled their counterparts behaviorally – with respect to how and where the food was moved and where the experimenter looked. The basic finding was that chimpanzees reacted similarly to the different unwilling actions by expressing frustration and impatience, and they reacted similarly to the different unable actions by being patient (possibly because they inferred that the experimenter was “trying”). This similarity of reaction across the different manifestations of the two experimental conditions suggests that the chimpanzees understood the different goals of the experimenter in the different conditions – regardless of how they were expressed behaviorally (see also Buttelmann et al., 2012, Study 1). In another interactive paradigm, Buttelmann et al. (2012, Study 2) had an experimenter feed subjects at two feeding sites and change sites from time to time. The subjects were free to move between sites, and the authors measured how quickly the subjects left a site after the experimenter got up at this site.
The manipulation was that sometimes the experimenter got up wanting to feed the subject at the other site, whereas at other times his getting up was caused by cues in the context (e.g., a walkie-talkie making noise). The apes left the feeding site more slowly when the experimenter had a different goal (e.g., to answer the walkie-talkie) than the goal to feed them.

Interestingly, chimpanzees also demonstrated their understanding of others’ goals in other types of interaction studies previously applied to human infants, that is, interactive tasks in which the measure was more prosocial. Using a helping paradigm, Warneken and Tomasello (2006) had a human experimenter drop an object accidentally in the presence of each of three human-raised chimpanzees and then strain and reach toward it (with several different objects in several different situations). The chimpanzees retrieved it for him. Importantly, they did not retrieve it for him in various control conditions in which he threw the object away or otherwise indicated a lack of interest. The chimpanzees’ different behavior in the experimental and control conditions could be interpreted as indicating an understanding of the experimenter’s different goals in the two situations (see also Premack & Woodruff’s, 1978, study with a single human-raised chimpanzee). Warneken et al. (2007) set up a much more novel situation in which one chimpanzee might also help a conspecific. In this study, one chimpanzee was attempting to get into an adjoining room, often shaking the door in his attempt. Another chimpanzee then quite often, from her advantageous location, pulled a chain that unlocked the door so that the first chimpanzee could have access to the room he wanted to get into. The helpers did this more often in this condition than in a control condition in which the first chimpanzee was trying to get out through another door. These studies of instrumental helping suggest that chimpanzees can tell when someone needs help achieving their goal.

Finally, Tomasello and Carpenter (2005) used two imitation paradigms (from research with human infants) with the same three human-raised, juvenile chimpanzees used in the Warneken and Tomasello (2006) study. In one paradigm, based on the study by Meltzoff (1995), a human tried but failed to perform various actions on objects. The chimpanzees successfully discerned the action the human was attempting to perform and performed it themselves (as often as when they had seen her perform it successfully, and more often than when she had just manipulated the object). In the other paradigm, based on the study by Carpenter et al. (1998), a human performed two actions on a series of apparatuses, one intentional and one accidental, before it was the chimpanzees’ turn. The human-raised chimpanzees copied the intentional action more often than the accidental action (see also Call et al., 2005, and Myowa-Yamakoshi & Matsuzawa, 2000, for other studies of this with less clear results).

Taken together, these studies provide fairly convincing evidence that human infants and great apes, chimpanzees in particular, understand others’ intentional actions in terms of their goals. Great apes seem to show this ability more readily when they observe others acting in a non-communicative manner. However, one might argue that in most of these studies, the (human) actor behaved slightly differently in the experimental and control conditions (i.e., the slightly different behavior was the cue to the different underlying goals involved). This means that any one of these studies could be given an explanation in terms of behavioral rules – an association between an antecedent and a consequent behavior of the actor – that great apes are either born with or learn. In my view, this explanation is unlikely because of the novelty and diversity of behaviors used as both cues and responses in the great number of different studies. In addition, in some of these studies (e.g., Buttelmann et al., 2012), the experimenter’s observable behavior at test was identical in all conditions.

3 Understanding Others’ Intentions

Understanding what state in the world an individual is trying to achieve is already helpful for explaining and interpreting the individual’s behavior. However, the explanatory power of this ability is very restricted given that in most situations there are several ways to achieve a specific goal. It is thus also important to understand others’ intentions. Understanding intentions in this chapter refers to an understanding of others’ actions in terms of a goal – an internal representation of the desired state in the world – in combination with a rationally chosen means (Bratman, 1989). Thus, an individual that understands others’ intentions has an idea of what the acting organism is trying to do and how it wants to achieve it. The chosen action is rational since for the choice of a means, the person takes into account her knowledge and skills and her mental model of the current reality (Tomasello et al., 2005). Therefore, this choice is influenced by reasons (e.g., why the organism chooses a specific means). Some reasons for such a choice can be observed, others cannot (especially if these are mental as well, such as ignorance, see next section). Whereas goals can be fulfilled in different ways, intentions are characterized by what Searle calls causal self-referentiality (Searle, 1983). Intentions have to be carried out actively as originally represented. Achieving the desired outcome in some other way does not fulfill the intention (Astington & Gopnik, 1991; Schult, 2002). When comparing intentions with desires, Bratman (1987) further argues that there is a greater commitment to action with an intention than with a desire. Unlike desires, intentions have to be made consistent with each other due to this greater commitment. In addition, because a person commits herself to an action plan in order to form an intention, intentions are resistant to reconsideration (Bratman, 1987, p. 18).

3.1 Understanding of Others’ Intentions in Human Infants

There is some evidence that infants understand others’ intentions around their first birthdays. Gergely et al. (2002) conducted a study in which 14-month-old infants watched an adult switch on a lamp with her forehead. For half of the infants the adult was forced to use this unusual means because her hands were occupied (she was holding a blanket around her shoulders; Hands-Occupied condition). The other half of the infants saw the adult use the same unusual means even though she could have more easily used her hands, which were not occupied (Hands-Free condition). When later given the possibility to act on the lamp themselves (with no constraints), more infants reproduced the adult’s unusual action when her hands had been free (69%) than when they had been occupied (21%). The most mentalistic interpretation of this result (Buttelmann et al., 2008b) is that in the Hands-Free condition, infants apparently understood that the adult must have chosen the unusual action for some reason (even if they did not know exactly what that reason was), and they copied the action in order to find out what that reason might have been. In the Hands-Occupied condition, in contrast, the adult’s reason for choosing that action was clear – her hands were constrained – and, since infants were not constrained in this way, they did not copy the action. Since the model’s goal was identical in both conditions (i.e., to illuminate the lamp), this study shows that infants took into account other aspects besides behavior, namely environmental variables, to identify the model’s intentions.

Subsequent research has made three important extensions to the original head-touch study. First, a study including the imitation of the use or choice of a tool to obtain a reward replicated the head-touch finding conceptually (Buttelmann et al., 2008b). In this study, a model either had to use a tool because the direct access to the reward was blocked by a physical barrier or he freely decided to use the tool although direct access was available (mirroring the Hands-Occupied and the Hands-Free conditions). In a set of two studies, 14-month-old infants used the tool themselves more often in the not-blocked than in the blocked condition. In a third study, the model had to use a tool in both conditions. In the not-blocked condition, he could freely choose between a rather unusual tool for pulling (i.e., a wooden block) or a tool usually associated with pulling (i.e., a rope), while in the blocked condition, he had access to the unusual tool only. The infants imitated the use of the unusual tool more often in the not-blocked than in the blocked condition. Another study had 12-month-olds watch as an adult made a toy animal enter a toy house through the chimney instead of the door – either while access through the door was blocked or while access through the door was possible (Schwier et al., 2006). Even these younger infants made the toy animal enter the house through the chimney more often in the door-not-blocked than in the door-blocked condition. Thus, infants make use of their intention understanding not only in body-part imitation tasks but in different types of imitation tasks.

Second, using two types of body-part-imitation tasks (i.e., head-touch and sit-touch), Géllen and Buttelmann (2017) tested 14-month-olds in a within-subjects design. The infants flexibly alternated their imitative response in accordance with a model’s changing physical constraints, demonstrating their ability to adapt their intention understanding in the light of the model’s physical constraints.

Third, two very recent studies made the most important extension of the head-touch experiment given the topic of this chapter by attempting to test the mentalistic account described above. Both studies presented 14-month-olds with a model performing the exact same unusual action (e.g., a head touch) in two conditions. What differed between conditions was whether or not the model was mentally – instead of physically – constrained. More specifically, in the knowledgeable condition (mirroring the hands-free condition), the model knew that the unusual action was unnecessary and that there was a much more efficient action to achieve the goal (e.g., hand use), whereas in the ignorant condition (mirroring the hands-occupied condition) the model was ignorant of this fact. Thus, if infants paid attention to the model’s knowledge state, they might differ in their response just as they did when being presented with physical constraints. The infants – who always knew of the efficient means either from observing an assistant or from self-experience – differed in their imitative response behavior as hypothesized in both a body-part imitation task (Géllen & Buttelmann, 2021) and in an action-on-object task (Buttelmann et al., 2021a). That is, although all observable parameters of the model’s demonstration were identical in both conditions, infants imitated the model’s use of the unusual means more in the knowledgeable than in the ignorant condition. In sum, 1-year-old infants pay attention to the mental dimensions included in others’ intention formation, and they use this understanding when deciding what to imitate from them.

3.2 Understanding Others’ Intentions in Nonhuman Great Apes

Given our definition of intentions, for subjects to pass studies investigating the understanding of others’ actions in terms of intentions, it does not help to focus only on the question of what the other is trying to do. Additionally, the subjects have to pay attention to the means, that is, how the actor is trying to achieve the desired result. This seems to be difficult for great apes. When being presented with a model (human or conspecific) that performs a specific action in order to achieve a certain goal, they mostly seem to focus their attention on the end result of another’s action rather than the exact behavioral means that brings about the change in the world. For example, in the study by Tomasello et al. (1987), a chimpanzee was trained with a certain two-action method on a T-shaped tool that could be used to obtain out-of-reach food. In contrast to a control group without any model, the chimpanzees that had the possibility to observe the trained ape obtaining food with the tool also used the tool to get food that was out of their reach. Interestingly, although those subjects learned to use the tool whereas the subjects of the control group did not, the successful subjects never copied the specific two-action method. They had merely learned causal relations without paying attention to the exact strategies of the demonstrator. Thus, apes mostly reproduce the end result of an action within social-learning situations (emulation learning, Tomasello, 1990, 1996) without copying the exact behavioral means that led to that result as human children do (e.g., Call et al., 2005; Call & Tomasello, 1994; Tennie et al., 2006). What remained unclear was whether this failure to imitate was due to an inability to focus their attention on the means used by the model.
However, this might not be the case: Although the subjects did not imitate a successful means when it was their turn in a subsequent test phase, they successfully distinguished between a model trying to use a means that was successful in producing a reward in the past and a model who tried to open a reward box using a means not being demonstrated to be successful in the past (Buttelmann et al., 2013).

However, under some circumstances great apes do imitate. One such circumstance is a specific rearing environment. Indeed, human-raised or enculturated great apes often copy the specific actions of others (Bjorklund & Bering, 2003; Bjorklund et al., 2002; Tomasello & Carpenter, 2005; Tomasello et al., 1993). For instance, in the study of Bering et al. (2000), after a baseline period, a human demonstrator showed human-reared chimpanzees and orangutans a specific way to handle several objects, e.g., using tongs bimanually to lift a cloth from a flat surface. After a delay of 10 min, the participants were then handed the objects again and scored for copying the demonstrator’s actions. All six participants displayed deferred imitation in at least one trial, with each species showing deferred imitation in approximately half of the trials they were tested in. Several reasons could account for the human-raised apes’ success in imitating. For example, it is possible that those apes are exposed to tools and objects much more often than their mother-raised conspecifics, that they have been trained to copy actions, or that they actually possess a more detailed understanding of others’ intentions (see Call & Tomasello, 1996). I will come back to this issue later.

There is also recent evidence that non-enculturated great apes can learn to use specific foraging techniques when observing a conspecific. This is important because it opens up the possibility of applying imitation methods, which are usually used with human infants, to investigate an understanding of intentions even in mother-reared great apes. Whiten et al. (2005) trained two adult chimpanzees with different tool-use techniques on how to obtain food from an apparatus (i.e., a lifting versus a poking technique) unobserved by their group mates. When the two model chimpanzees were then re-introduced to their respective groups, most group members mastered the new technique after observing their local expert (no ape did in a control group without an expert). Most chimpanzees adopted the exact method that had been seeded in their group. This and other studies show not only that chimpanzees acquire specific techniques but also that the adopted methods are transmitted from one group member to the other (e.g., Horner et al., 2006; Whiten et al., 2007). Further, it has been shown that the movement of the involved parts of the apparatus alone (ghost condition) does not lead to the accomplishment and the transmission of the technique (Hopper et al., 2007).

Although it is impressive that great apes do imitate others, it remains unclear whether we can actually take this as evidence for an understanding of others’ intentions as rational choices of action plans. Actually, this is questionable for virtually all studies that tested mother-raised or human-raised great apes with only classical imitation tasks. The mere fact that subjects copy the means a model is performing and achieve the goal the model achieved before does not necessarily prove that successfully copying subjects understood the mental dimension behind this action (see Tomasello & Call, 1997). Still, two studies attempted to apply a rational-imitation paradigm like Gergely et al.’s (2002) head-touch task to great apes. Buttelmann et al. (2007) had enculturated chimpanzees watch a human model perform either a head touch, a foot touch, or a sit touch to illuminate a novel lamp or to play a sound from a novel sound box. The model either had to use these unusual means because of some physical constraints or he freely chose to use them without any constraint present. Like human infants, the chimpanzees imitated the model’s choice of means significantly more often when he freely chose to use them than when he was forced to use them. This was the first study to demonstrate that great apes not only imitate others but also do so selectively. This finding might be less surprising given that the study subjects were enculturated. A subsequent study, however, found a similar pattern of results with great apes raised by their natural mothers. The study involving the rational choice of tools described above in the infant section (Buttelmann et al., 2008b) also tested great apes of all four species, i.e., chimpanzees, bonobos, gorillas, and orangutans, on the very same tasks as the ones applied to human infants. The interesting finding was that species differences appeared: While mother-reared chimpanzees, bonobos, and gorillas did not differentiate between conditions, the mother-reared orangutans chose/used the tool the model chose more often in the not-blocked condition (mirroring the hands-free condition) than in the blocked condition (mirroring the hands-occupied condition). Thus, while chimpanzees seem to need some enculturation to put them in a position to imitate rationally, orangutans’ imitative responses suggest that they applied an understanding of the model’s intentions as rational choices of action plans.

4 Human Infant and Nonhuman Great Ape Differences and Similarities in Their Understanding of Others’ Desires, Goals, and Intentions

Summing up the previous three sections, we can thus conclude that both human infants long before their second birthdays and great apes understand at least three different mental components in others’ intentional action. They can use others’ emotional expressions as indicators for their desires, they can interpret what others are trying to do (their goals), and they infer why an actor has chosen a specific means for how to achieve this goal (their intentions) and imitate this actor accordingly. Infants and great apes can use this understanding when learning from others and when interpreting others’ behavior. Still, given that studies on great apes’ understanding of intentions focus solely on rational imitation, more studies are needed to investigate whether great apes’ understanding of others’ intentions is similar to that of human infants. Assuming it is, understanding others’ desires, goals, and intentions does not seem to be the crucial element that differentiates infant social cognition from that of nonhuman great apes. What these three mental states have in common is that they have very strong behavioral correlates. In virtually all examples described above, subjects infer these mental states while watching a protagonist act. Thus, human infants and great apes might be able to extract mental states from protagonists’ immediate actions (e.g., facial expressions, hand movements, and so on) (e.g., Gergely & Csibra, 2003). Another thing that desires, goals, and intentions have in common is that they are non-propositional, that is, they cannot be true or false. Although someone’s goal might be improper in some moral or societal sense, it is never false regarding an objective view. The next section will investigate whether differences between human infants’ and great apes’ social cognition might lie in their understanding of propositional mental states, that is, beliefs.

5 Understanding Others’ Beliefs

In the last 15 years, the investigation of preverbal children’s understanding of others’ beliefs experienced a true revolution. This was because researchers found ways to present infants with language-reduced tasks. In contrast to the verbal tasks applied to toddlers and preschoolers for decades (e.g., Gopnik & Astington, 1988; Perner et al., 1987; Wimmer & Perner, 1983), these tasks neither included long verbal instructions or stories nor required subjects to produce verbal responses. They thus measured subjects’ social-cognitive competence implicitly (see Footnote 1). Different types of tasks have been developed: tasks measuring (1) gaze behavior, (2) infants’ performance in interactions with others, and (3) neural activation.

5.1 Understanding of Others’ Beliefs in Human Infants

Although the first, pioneering study applying a gaze-behavior measure to investigate belief understanding was run more than 25 years ago (Clements & Perner, 1994), the rise of implicit false-belief tasks measuring gaze behavior started in earnest with Onishi and Baillargeon’s (2005) application of a violation-of-expectation paradigm. The authors presented 15-month-old infants with live theatre scenarios in which an actor either watched (true-belief condition) or did not watch (false-belief condition) an object switch locations before she reached to either the previous location or the new location of the object (i.e., the test event). When the actor reached, the researchers measured infants’ looking time at this event. Assuming that the actor reached because she wanted to have the object, then infants should expect her to reach for it where she knew or believed it to be (i.e., the new location in the true-belief condition and the previous location in the false-belief condition). A violation of this expectation should become visible in an enhanced looking time at the incongruent event. This is what the authors found. The interpretation was that infants considered the actor’s representation of the object’s current location when interpreting the actor’s actions. A large body of subsequent violation-of-expectation false-belief studies added empirical evidence to this interpretation (Kovács et al., 2010; see Baillargeon et al., 2010; Scott & Baillargeon, 2017, for reviews).

Another gaze-based measure of infant false-belief understanding is the measure of anticipatory looking. For this, subjects receive a light or sound cue right before the actor re-enters the scene after a change of the target object’s location. Researchers measure at which of several areas of interest (i.e., possible locations the actor might reach) infants are looking while getting this cue. This anticipatory look is considered the infants’ anticipation of the actor’s reach. If infants included the actor’s belief about the object’s location into their prediction of the actor’s acting, they should look at the current location if the actor held a true belief and they should look at the previous location of the object if the actor falsely believed the object to be where it was before the switch. Eighteen-month-olds and slightly older children have been shown to expect an agent to approach the location where the agent believes the toy to be (Senju et al., 2011; Southgate et al., 2007; Thoermer et al., 2012).

Tasks measuring infants’ false-belief understanding in interactive settings also put infants in situations in which they observe an actor who either holds a true belief (i.e., she knows the true location of the target object) or a false belief (i.e., she believes the object to be in a previous location). At test, infants are placed in a situation in which they can interact with the actor (e.g., they can help her or communicate with her). In the first study of this kind (Buttelmann et al., 2009b), 18-month-olds needed to infer the actor’s goal from the actor’s behavior that was based on the actor’s (true or false) belief about the location of an object. More specifically, after the actor had placed a toy in one of two boxes and left the room, an assistant relocated the toy into a second box. Thus, when the actor re-entered the scene and tried to open the now-empty box, he falsely believed the toy to be in the box he tried to open. The children then helped the actor by opening the other box (i.e., the one that actually contained the toy) because they inferred that the actor wanted to find his toy. Importantly, when the actor had witnessed the relocation and nevertheless tried to open the empty box, the infants helped him by opening the empty box rather than the one currently containing the toy. For 16-month-olds this effect was less clear: Although they differed in their choice of box between conditions in the same pattern as did the older infants, in the true-belief condition they performed at chance level (instead of choosing the empty box significantly more often than the box with the object).

Several subsequent interactive studies provided converging evidence for Buttelmann et al.’s (2009b) finding with 17- and 18-month-old infants (e.g., Buttelmann et al., 2014, 2015; Knudsen & Liszkowski, 2012a, b; Southgate et al., 2010). These studies also expanded the range of belief contents tested to include beliefs about the content of a box or the identity of an object. In order to investigate whether the 18-month-olds in the Buttelmann et al. (2009b) study acted based on an understanding of beliefs (i.e., the actor believed the empty box to contain the toy) rather than an understanding of ignorance (i.e., the actor did not know where the toy was), Buttelmann et al. (2021b) presented 18-month-olds with a replication of the false-belief condition and a new ignorance condition, in which the actor did not know where the toy was in the first place. The infants’ helping behavior differed significantly between conditions: Whereas they opened the box with the object in the false-belief condition (replicating the original finding), they performed at chance level in the ignorance condition, where it was unclear whether the actor wanted his toy or wanted the empty box open. This might suggest that 18-month-olds indeed react to where an interaction partner represents an object to be.

The third class of infant false-belief tasks measures infants’ spontaneous neural responses to scenarios in which an actor acts while holding false or true beliefs (Hyde et al., 2015, 2018; Southgate & Vernetti, 2014). For example, in the Southgate and Vernetti (2014) study, using EEG, the authors measured 6-month-old infants’ sensorimotor alpha suppression as an indicator of motor-cortex activation. The infants showed such activation when an actor faced an empty box she believed to contain a toy, but they did not show activation of the motor cortex when the actor faced a box with an object she believed to be empty. This difference in motor activation suggests that the infants expected the actor to reach for the box if she believed it to contain a toy, but not if she believed it to be empty.

Recently, the replicability of implicit infant false-belief tasks was questioned because of different levels of success in replication attempts. While some laboratories successfully replicated and broadened previous results (e.g., Fizke et al., 2017; Oktay-Gür et al., 2018; Scott et al., 2015; Träuble et al., 2010), others failed to replicate earlier findings (e.g., Grosse Wiesmann et al., 2017; Yott & Poulin-Dubois, 2016; Zmyj et al., 2015). Two conclusions can be drawn from this: On the one hand, one needs to be careful when interpreting unsuccessful replication attempts because a large number of factors might influence children’s performance in such tasks (see Scott & Baillargeon, 2017; Schulze & Buttelmann, 2020). Even small changes in the set-up and the procedure might lead to results that differ from those of the original studies (see Baillargeon et al., 2018; Buttelmann, 2017). On the other hand, the lack of success in some of the replications suggests that infants’ understanding of others’ beliefs might be relatively fragile, being subject to task demands and the specific circumstances of the testing situation.

5.2 Understanding of Others’ Beliefs in Nonhuman Great Apes

The interesting question is how nonhuman great apes might perform in such implicit false-belief tasks. Given that these tasks do not require verbal instructions or responses, they are highly suitable candidates to be applied to species that lack language. Indeed, after a number of unsuccessful attempts to find belief tracking in chimpanzees, bonobos, and orangutans (Call & Tomasello, 1999; Kaminski et al., 2008; Krachun et al., 2009, 2010), two studies – both using modified infant false-belief tasks – indicate that great apes do track others’ beliefs. In one of these studies (Krupenye et al., 2016), bonobos, chimpanzees, and orangutans as a group correctly anticipated a human actor’s goal-directed actions according to his belief about the location of an object (similar to how human infants did in the studies described above). More specifically, the apes observed videos in which an actor holding a false belief about the location of an object approached two target locations. As indicated by the subjects’ first look, they anticipated that the actor was likely to act on the location where he falsely believed the object to be, in contrast to the location that actually held the object (as known by the subjects). Interestingly, great apes of the same three species also correctly anticipated a human actor’s searching actions in the absence of behavioral cues (Kano et al., 2019): After gaining self-experience with translucent and opaque “windows,” the apes observed a human actor who saw an object being placed in one of two locations. The actor then positioned himself behind either a translucent or an opaque window (both appeared identical without self-experience), and the object was switched from the first location to the other. Then the actor positioned himself between the two locations. The subjects correctly anticipated that the actor might search for the object where he believed it to be by looking at this location (i.e., the second location after observing the switch [translucent window] or the first location after not observing [opaque window] the switch).

Like human infants, great apes also demonstrated their tracking of others’ beliefs in a more active task. Buttelmann et al. (2017) adopted the change-of-location interactive helping paradigm (Buttelmann et al., 2009b) and tested chimpanzees, bonobos, and orangutans. The authors found that the apes differed in their helping behavior according to whether the human actor held a true or a false belief about the location of an object. When observing the actor trying to open the empty box, they opened the box actually containing the object more when the actor believed this pulled box to contain the object than when the actor knew the pulled box was empty.

Thus, one might conclude that human infants and great apes are highly similar in their tracking of others’ beliefs, and they indeed might be. However, there might be one crucial difference, and this is the kind of actor the different species ascribe beliefs to. Human preschoolers and adults are known for their willingness to attribute mental states including goals, intentions, and beliefs not only to human actors but also to self-propelled inanimate objects (Buttelmann & Buttelmann, 2017; Heider & Simmel, 1944; Montgomery & Montgomery, 1999), and even human infants ascribe goals and beliefs to self-propelled geometric shapes (Luo & Baillargeon, 2005; Surian & Geraci, 2012). In contrast, when Krupenye et al. (2017) replaced the human actor from their 2016 study with self-propelled geometric shapes, chimpanzees, bonobos, and orangutans did not adapt their anticipatory looks to the “beliefs” of the geometric shape. Thus, great apes seem to ascribe beliefs to animate actors only. Human infants’ preparedness to ascribe beliefs (and other mental states) to virtually all kinds of self-propelled objects might be a specific feature of human social cognition.

Except for this small difference, no big gap can be found in tracking others’ desires, goals, intentions, and beliefs between human infants and great apes. Thus, although understanding others’ mental states might provide the foundation for the emergence of human acquisitions like the creation and use of complicated tools and technologies, mathematical and graphic symbols, or the development of complex social institutions such as religions, states, or social norms, it cannot be the critical feature because much of it is shared with our closest genetic relatives who do not use mathematics or graphic symbols. But what is it then that sets human infants onto a way leading to the type of culture that outperforms virtually all other species in complexity and richness?

6 The Crucial Difference Between Humans and Great Apes

All the sophisticated acquisitions mentioned in the paragraph above do not rely on a single individual’s knowledge or ideas. They are most often products of collective cultural activities. Humans possess the motivation to take what was invented by their ancestors and accumulate and modify these inventions over generations (what Tomasello, 1999, calls the ratchet effect) to improve or create better artifacts. Thus, even though understanding others’ intentions is a fundamental aspect of social cognition and social behavior, there seems to be another important capacity in situations that require the collaboration of several people.

With regard to this capacity, Tomasello et al. (2005) introduced a second line of human development: the uniquely human motivation to share psychological states with others. Human infants, at the age of only a few months, seem to have a strong interest in sharing emotions, as reflected by their interactions with adults in protoconversations – social interactions in which the adult and the infant look, touch, smile, and vocalize toward each other in turn-taking sequences. Interestingly, what most observers have noted and researchers have found is that these protoconversations are held together through the exchange of emotions (Hobson, 2002; Stern, 1985; Trevarthen, 1979). Later, at the age of around 9 months, infants begin to engage with adults in triadic activities (Ratner & Bruner, 1978; Ross & Lollis, 1987). Infants start to share goals with adults as they act together to change the state of the world in some way and to perceive the world together in acts of joint perception (shared, or joint, attention). The infants’ motivation to share psychological states with others culminates shortly after their first birthday. At the age of 14 months, they not only understand others as intentional agents, they also have the desire to share their own intentions with others and begin to collaborate with others (Ross & Lollis, 1987; Warneken & Tomasello, 2007). At this stage of ontogeny, infants also start to use language, which in some theoretical perspectives is seen as an inherently collaborative activity by itself (see Clark, 1996).

As described before, an understanding of others’ mental states allows individuals to interpret others’ actions very explicitly. Being equipped with such a sophisticated understanding, an observer can make assumptions not only about the goal the observed individual is trying to achieve, but also about the particular manner with which this individual has chosen to act toward the goal. The observer can then develop some ideas about why the individual might have chosen to achieve the goal in that particular way based on the individual’s beliefs. Based on these assumptions, observers are provided with a powerful system of information about the other individuals’ action. This information can be used in two different contexts – competitive and cooperative. On the one hand, it can help in competitive situations, in which an individual tries to interpret a competitor’s behavior or tries to predict what a competitor might do next – something apes are already quite good at (Hare et al., 2000, 2001). In situations like these, the observing individual has an egocentric goal that it wants to fulfill (e.g., to steal food from the competitor) and plans to act to fulfill its goal for its own benefit. Acting this way might help the individual to survive and might even be adaptive for a whole species by allowing only the carrier of the best genes to reproduce (“survival of the fittest,” Spencer, 1864), but it averts the development of the achievements that need more skills and power than one individual alone can provide.

On the other hand, this understanding is especially helpful in more cooperative activities, such as cultural learning and collaboration, when an individual needs to decide what it should copy from another individual or how best it can help or coordinate with others (e.g., Bratman, 1992; Sebanz et al., 2006). In cooperative activities, two (or more) individuals have the same goal they want to fulfill (e.g., to build a house), and they act to fulfill their goal for their shared benefit. When using their sophisticated understanding of others’ mental states in such a way, a group of conspecifics can meet challenges that one individual alone would not be able to cope with. So, although some apes apparently have the same understanding of others’ intentional actions as human infants do, they do not copy others or collaborate with others to anywhere near the same extent as humans (see Tomasello, 2019; Tomasello et al., 2005, for reviews). Although an understanding of others’ intentions is necessary for human-like cultural learning and collaboration, it clearly is not sufficient.

Nonhuman great apes seem to lack the puzzle piece that turns more individualistic skills of social learning and group action into their collectively based, uniquely human counterparts of cultural learning and collaboration. To suggest what this missing piece might be, I need to reiterate Tomasello et al.’s (2005) idea that what is needed for “the uniquely human aspects of social cognition” to emerge is an interaction of a sophisticated understanding of others’ intentional actions and a strong motivation to share intentions and other psychological states with others. Therefore, for an action to be performed in a collaborative manner it would not be enough to have the social-cognitive skills to know what action plan the other individual has rationally chosen in order to achieve its specific goal. What is necessary for human-like collaboration are specific social-motivational skills that, in combination with the social-cognitive skills, form shared intentionality, sometimes called “we” intentionality (Tomasello, 2019). For shared intentionality, both (or more) individuals must coordinate their action plans and commit themselves to the same goal and act according to their commitment (Gilbert, 1989; Searle, 1995; Tomasello & Carpenter, 2007).

The results reviewed above suggest that great apes seem to have the social-cognitive abilities necessary for shared intentionality: they understand others’ goals, intentions, and beliefs. What they might lack are the social-motivational prerequisites necessary to transform these individual social-cognitive skills into shared intentionality that then allows cultural learning and collaboration in the narrow sense of the term to take place.

However, in humans, the roots of this unique motivation to share experiences with others can already be observed in very young infants. Infants from very early in ontogeny share emotional states with others in turn-taking sequences (Trevarthen, 1979). Studies with slightly older children suggest that 24-month-olds prefer to perform an action that reveals an effect together with an adult, although they themselves could bring about the exact same effect when performing the action alone (Gräfenhain et al., 2009). In contrast, great apes (with enculturated ones and maybe orangutans as an exception) make use of their understanding of others’ mental states to interpret and predict others’ behavior (especially in competitive situations, see above) but do not copy an actor’s chosen action plan when they cannot find a plausible reason for that choice. They do not consider these means as being important or necessary and therefore insist on using their own means to solve a problem (i.e., emulation learning versus imitative learning). Thus, even if there were a plausible reason for the actor to choose her specific action plan that would also apply to the observer (e.g., maybe because this technique is more efficient for foraging), I would not expect mother-reared great apes (in the wild) to imitate this strategy (see Tomasello et al., 1987, for experimental evidence for this statement). Interestingly, in many instances in everyday human life, it seems on the surface to be even more adaptive not to copy the means of an actor but to emulate, and humans still do emulate on many occasions, as great apes predominantly do when learning socially (Tomasello & Call, 1997). The question this fact raises then is why did Mother Nature “allow” the human-innate social-motivational skills necessary for imitation (and therefore human-like culture) to evolve?

7 Future Directions

Because my major claim is that it is not predominant differences in social-cognitive skills but in pre-humans’ distinct social-motivational abilities that laid the foundation for the development of human-like culture, future studies investigating this area should compare different species’ motivational levels to share psychological states with others. Now that there is evidence that under certain circumstances chimpanzees are motivated enough to help others or cooperate in a problem-solving task (Melis et al., 2006; Warneken et al., 2007; Warneken & Tomasello, 2006, 2007), studies are needed that investigate whether great apes possess some kinds of motivations but lack the specific one to share psychological states with others. However, because the chimpanzees in Melis et al.’s (2006) study did not recruit cooperating partners when they were able to solve the task on their own, apes in general might perform differently than human children in tasks like the ones used by Gräfenhain et al. (2009).

There are two more questions of importance which deserve investigation. On the one hand, there is a need for more longitudinal studies that investigate the development of the sophisticated social-cognitive skills during great apes’ ontogeny (Bjorklund & Bering, 2003; Bjorklund et al., 2000; Wobber et al., 2014) to clarify to what degree it is similar to that of human infants. On the other hand, and this poses a huge challenge for comparative research, it would be interesting to see whether great apes also attribute the mental states they seem to understand in humans to their conspecifics. This could be made possible by presenting them with (trained) great ape “demonstrators” and “models.” These are central questions for future research attempting to establish the evolutionary roots of human-specific social-cognitive abilities and motivations.

8 Conclusion

Although some groundbreaking studies enhanced our understanding of infant and nonhuman great-ape social cognition, many more studies are needed, and we need to draw conclusions carefully. However, the evidence so far helps to narrow down species differences in understanding others’ mental states – insofar as at least on the level of desires (in the meaning of preferences), goals, intentions, and beliefs, the gap between humans and nonhuman primates is smaller than was previously thought (e.g., Tomasello & Call, 1997). The perfect area to look for crucial differences between humans and nonhuman primates, in my opinion, is the field of social-motivational aspects that drive individuals’ behavior. Perhaps it is a high motivation or interest to share psychological states with others, which great apes seem to lack (and less so differences in social-cognitive abilities), that made the huge difference in human evolution and enabled our (Homo) ancestors to collaborate, to create cultures, and so to go beyond accomplishments that one individual alone could achieve.